Author Archive

Rubik’s cubes and Selmer groups

One of my biggest regrets when I left academic math and number theory behind in 2007 was that I never finished writing up and publishing some cool results I’d been working on with Manjul Bhargava about what we called “3x3x3 Rubik’s cubes”.

Just a teeny bit of background. Say you have a 3x3x3 matrix filled with numbers, including in the very center. So you have 27 numbers in a special 3-dimensional configuration. Since there are three axes for such a cube, there are three ways of slicing it into three 3×3 matrices A, B, and C. Once you do that you can get a cubic form by computing

det(Ax + By + Cz),

which gives you a cubic equation in three variables, or in other words a genus one curve.
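To make the construction concrete, here is a minimal Python sketch; it is not taken from the paper. It assumes the cube is stored as a nested list `cube[a][b][c]`, and the helper names `slices`, `perm_sign`, and `cubic_form` are made up for this illustration. It slices the cube along a chosen axis and expands det(Ax + By + Cz) by brute force into the coefficients of a cubic form.

```python
from itertools import permutations, product

def slices(cube, axis):
    """Cut a 3x3x3 cube (nested lists) into three 3x3 matrices along `axis`."""
    mats = []
    for s in range(3):
        mat = [[0] * 3 for _ in range(3)]
        for i, j in product(range(3), repeat=2):
            idx = [i, j]
            idx.insert(axis, s)  # put the slice index at position `axis`
            mat[i][j] = cube[idx[0]][idx[1]][idx[2]]
        mats.append(mat)
    return mats

def perm_sign(p):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(p[a] > p[b] for a in range(len(p)) for b in range(a + 1, len(p)))
    return -1 if inversions % 2 else 1

def cubic_form(A, B, C):
    """Coefficients of det(Ax + By + Cz) as {(i, j, k): c}, meaning c * x^i y^j z^k."""
    mats = (A, B, C)
    coeffs = {}
    for p in permutations(range(3)):              # Leibniz formula for a 3x3 determinant
        for pick in product(range(3), repeat=3):  # which of A, B, C each row contributes
            term = perm_sign(p)
            for row in range(3):
                term *= mats[pick[row]][row][p[row]]
            exps = (pick.count(0), pick.count(1), pick.count(2))
            coeffs[exps] = coeffs.get(exps, 0) + term
    return {e: c for e, c in coeffs.items() if c != 0}

# Toy cube with 1s on the long diagonal and 0s everywhere else.
cube = [[[1 if a == b == c else 0 for c in range(3)] for b in range(3)] for a in range(3)]
for axis in range(3):
    print(axis, cubic_form(*slices(cube, axis)))
```

For this toy cube the slices along any axis are the three diagonal matrix units, so every axis gives the same form, xyz. A cube with less symmetric entries will generally give three different cubic forms, one per axis.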

Actually you get three different genus one curves, since you can do it along any of the three axes. Turns out there are crazy interesting relationships between those curves, as well as in the space of all 3x3x3 cubes.

Just talking about that stuff gets me excited, because it’s first of all a really natural construction, second of all number theoretic, and third of all it actually makes me think of solving Rubik’s cubes, which I’ve always loved.

Anyhoo, I gave my notes to a grad student Wei Ho when I left math, and she and Manjul recently came out with this preprint entitled “Coregular Spaces and genus one curves”, which is posted on the mathematical arXiv.

First, what’s freaking cool about their paper, to me personally, is that my work with Manjul has been incorporated into the paper in the form of parts of sections 3.2 and 5.1.

But what’s even more incredibly cool, to the mathematical world, is that Wei and Manjul are going to use this paper as background to understand the average size of Selmer groups of elliptic curves, a really fantastic result. Here’s the full abstract of their paper:

A coregular space is a representation of an algebraic group for which the ring of polynomial invariants is free. In this paper, we show that the orbits of many coregular irreducible representations where the number of invariants is at least two, over a (not necessarily algebraically closed) field k, correspond to genus one curves over k together with line bundles, vector bundles, and/or points on their Jacobians. In forthcoming work, we use these orbit parametrizations to determine the average sizes of Selmer groups for various families of elliptic curves.

One last thing. I am lucky enough to be a neighbor of Wei right now, as she finishes up a post-doc at Columbia, and she’s agreed to explain this stuff to me in the coming weeks. Hopefully I will remember enough number theory to understand her!

Categories: math

Someone explain to me how accountants think about Capital Appreciation Bonds (#OWS)

Book Club

Yesterday we had a great discussion in our Alternative Banking Book Club about municipal financing, based on the sixth chapter in our book Occupy Finance called A Civics Lesson: Wall Street Feasts on the Commons. The conversation was kindly led by Tom Sgouros, a policy analyst and author from Rhode Island, which seems to be a hotbed of super terrible muni financing.

It was explained that shady deals in muni finance are all over the map, from price fixing in municipal bond deals, which is corruption strictly on the side of the big banks who finance the town’s deals, to accounting tricks, where it takes the collusion of town officials to enter into shady and inappropriate contracts.

The thing I’d never really understood until yesterday was how people used Capital Appreciation Bonds to play tricks with their accounting, and specifically with their town’s debt limits.

Context around muni financing

A little more context, although I’m no expert (experts, please add details or correct me if I’ve misrepresented anything). Please also read the chapter, which is excellent and much broader.

First of all, by “municipalities” we mean towns and cities (actually, states and counties, too, not to mention water authorities, economic development agencies, school departments, and all the rest of the “not-federal-not-corporate”). So a town needs to borrow money for something, maybe to pay its workers, maybe to build something or maintain its roads. It borrows money from investors by issuing a muni bond, and the big banks help set that up. Investors invest in these bonds because they have special tax treatment, because they rarely default, and because they want to support their local communities.

But as you can imagine, the big banks have much more expertise on what kind of prices to expect and the level of sophistication it requires to do due diligence, and then if you add into that mix the fact that local town officials are often temporary, ignorant, and desperate, we get a toxic environment. There are lots of examples of this problem, and often they are covered up by the local towns because of associated embarrassment, complicity, and shame. Seriously, it’s awful, and we only hear about some of them like in Detroit and Stockton, when things are incredibly awful. Matt Taibbi has done an amazing job chronicling this stuff.

Anyhoo, with that backdrop, you can imagine that there are bad situations handed to town officials when they enter office, and they are confronted with a major league problem: they need to come up with money now to pay something basic like school teachers or firemen, but there’s no cash. And plus there’s a debt limit which they’re already pushing up against.

Zero coupon

Enter the Capital Appreciation Bond (CAB). It’s a zero-coupon bond, which is already weird. For most muni bonds, towns regularly – quarterly or annually – pay interest or so-called amortizing sums, very much as an individual homeowner might pay monthly for their mortgages, where most of it goes to interest but every month a little bit of the principal is paid off too.

But for CABs, you get some money now and you pay nothing at all until it’s due, at which point you pay it all back at once.

Very very long term debt

You may have even forgotten about it by then, though, because the second weird thing about CABs is that the loans are often very very long term – as in 30 or 40 years. So, given the nature of the set-up and the nature of compound interest, you can end up paying something like 7 times the original amount after that much time.

For example, we see a school district like San Mateo in California recently borrowing $190 million and owing $1 billion when the bond comes due. And it’s widespread in California: according to this article, 200 California school and community college districts issuing these bonds will end up paying 10 to 20 times more than they borrowed.
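To see how compound interest gets you to multiples like these, here is a back-of-the-envelope Python sketch. It assumes simple annual compounding with no fees, which is an idealization rather than the terms of any actual bond, and the function names are made up for illustration.

```python
def growth_multiple(annual_rate, years):
    """How many times the borrowed amount is owed at maturity, with annual compounding."""
    return (1 + annual_rate) ** years

def implied_annual_rate(multiple, years):
    """Annual rate at which a zero-coupon loan grows to `multiple` times its size."""
    return multiple ** (1 / years) - 1

# "Something like 7 times" the original amount over 30 or 40 years:
for years in (30, 40):
    print(years, f"{implied_annual_rate(7, years):.2%}")

# San Mateo figures from the post: $190 million borrowed, $1 billion owed.
# The term of that bond is not stated in the post, so 40 years is purely an assumption.
san_mateo_multiple = 1_000 / 190
print(f"{san_mateo_multiple:.2f}x", f"{implied_annual_rate(san_mateo_multiple, 40):.2%}")
```

Under these assumptions, a rate of about 5% a year compounds to roughly 7 times the original amount over 40 years, which is why a loan that looks modest today can turn into an enormous bill for whoever is in office when it comes due.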

Accounting practices and CABs

That brings me to the third weird property of CABs, namely how they look on balance sheets for accounting purposes.

Namely, and here I need to confess that I’ve been a very bad accounting student, the towns only have to write the original loan down on the balance sheet as a liability, not the eventual pay-out. This is in contrast with other kinds of very similar zero-coupon bonds where you have to write down the eventual payment you will owe, not the amount you start with.

Someone please explain this discrepancy in the field of accounting!! It makes no sense to me. If the cash flows are the same for two different kinds of bonds, how do you get to account for them differently?

Conclusion

In any case, the consequences of this accounting trick are real. In particular it means that, for desperate town officials trying to pay their workers, or even shady town officials trying to get away with stuff, CABs are very attractive indeed, because they look kind of innocuous and the town’s overall debt limits don’t get breached even though the officials have essentially sold the future of the town to a big Wall Street bank. Plus they won’t be in office when that bill comes due, and they might well be dead.

Some California officials are trying to make CABs illegal or at least restricted, and some states like Michigan and Ohio have already passed laws against them. But given how much money they make for big banks, there are serious headwinds for reasonable rules.

Categories: #OWS, finance

It’s a good day to join a revolution #OWS

I’m too angry today to dole out advice.

Instead I think I’ll join a revolution. This one, that Russell Brand is talking about.

Categories: #OWS

The multiple arms races of the college system

I’m reading a fine book called Nobody Makes You Shop at Walmart, which dispels many of the myths surrounding market populism, otherwise described in the book as “MarketThink”, namely the rhetoric which “portrays the world (governments aside) as if it works like an ideal competitive market, even when proposing actions that contradict that portrayal,” according to the author Tom Slee.

I’ve gotten a lot out of this book, and I suggest that you guys read it, especially if you are libertarians, so we can argue about it afterwards.

One thing Slee does is distinguish between different kinds of competitive and power-dynamic systems, and fingers certain situations as “arms races”, in which there are escalating costs but no long-lasting added value for the participants. They often involve relative rankings.

Slee’s example is a neighborhood block where all the men on the block compete to have the nicest cars. Each household spends a bunch of money to rise in the rankings just to have others respond by spending money too, and at the end of a year they’ve all spent money and none of the rankings have actually changed.

One of Slee’s overall points about arms races is that the way to deal with them is through armament agreements, which everyone involved needs to sign onto. Later in the book he also talks about how hard it is to get large groups of people to agree to anything at all, especially vague social contracts, when there’s an advantage to cheating, something he calls “free riding.” (As a commenter pointed out to me, free riding is more like getting something for nothing, as when a worker benefits from the work of a union without being in the union and paying dues. What’s described here is just cheating.)

I’d argue, and I believe the book even uses this example, that education can be seen as an arms race as well. Take the statistics in this Opinionator blog from the New York Times, written by Jonathan Cowan and Jim Kessler, and entitled “The Middle Class Gets Wise.”

It describes how much more money the average high school graduate, versus two-year college, versus four-year college, versus professional degree graduate makes. In other words, it describes the payoffs to being higher ranked in that system. The money is real, of course, and everyone is aware of it as an issue even if they don’t know the exact numbers, so it is very analogous to the car status thing.

Cowan and Kessler describe in their article how, in the face of recession, lots more people have gone to college. That makes sense, since many of them didn’t have jobs and wanted to make themselves employable in the future, and at the same time people knew the job climate was even more rank-oriented since it had become tighter. People responded, in other words, to the incentives.

There’s a feedback loop going on in colleges as well, of course, and paired with the federal loan program and the fact that students cannot get rid of student debt in bankruptcy, we’ve seen a predictable (in direction if not size) and dramatic increase in tuition and student debt load for the younger generation.

My reaction to this is: we need an armament agreement, but it’s really not clear how that’s going to all of a sudden appear or how it would work, considering the number of entities involved, and the free rider problems due to the cash money incentives everywhere.

From the point of view of employers, rankings are great and they can be sure to pick the highest ranked individuals from that system, even if that means – as it often does – having Ph.D. graduates working in mailrooms. So don’t expect any help from them to add sanity to this system.

From the point of view of the colleges, they’re getting to hire more and more administrators, which means growth, which they love.

Finally, from the point of view of the individual student, it makes sense to go into debt, with almost no limit (to a point, but people rarely do that calculation explicitly, and if they did there’d be intense bias) to get significantly higher in the ranking.

In other words, it’s a shitshow, and possibly the only real disruption that could improve it would be widespread and universally respected basic and free-ish education. At least that would solve some of the arms race problems, for employers and for students. It would not make colleges happy.

The authors of the Opinionator piece, Cowan and Kessler, don’t agree with me. They have a goal, which is for even more people to go to school, and for tuition to be somehow magically decreased as well. In other words, up the antes for one feedback loop and hope its partner feedback loop somehow relaxes. Here’s the way they describe it:

So what can we do? Anya Kamenetz, the author of “Generation Debt,” has put together some excellent ideas for Third Way, the centrist policy organization where we both work. Let’s start by reducing the number of college administrators per 100 students, which jumped by 40 percent between 1993 and 2007. We should demand a cease-fire to the perk wars in which colleges build ever-more-luxurious living, dining and recreational facilities. Blended learning, which uses online teaching tools together with professors and teaching assistants, could also help students master coursework at less cost.

There are 37 million Americans with some college experience, but no degree. So pegging government tuition aid to college graduation rates would entice schools to find ways of keeping students in class. And eliminating some of the offerings of rarely chosen majors could bring some market efficiencies now lacking in education.

That really just doesn’t seem like a viable plan to me, and pegging government money to graduation rates is really stupid, as I described here, but maybe I’m just being negative. Cowan and Kessler, please tell me how that “demand” is going to work in practice.

Also, what’s funny about their idealistic demand is that they also think of a couple other things to do but dismiss them as unrealistic:

The most commonly discussed solutions to the problem of income inequality seem unlikely to get to the heart of the problem. Yes, we could raise additional taxes on the wealthy, but we just did that. Bumping up the minimum wage would help, but how high would lawmakers allow it to go? We should look instead at what Americans are already doing to solve this problem and help them do it far more successfully and at less cost.

Am I the only one who thinks raising the minimum wage would help more to address income inequality and is easier to imagine working?

Categories: modeling, musing

Occupy for a Fair and Living Wage #OWS

I wanted to mention an important action that’s happening today at 5:00pm in Herald Square (34th Street and 6th Avenue) in case you are nearby and can join us.

The action is focused on raising the minimum wage. It was planned by OccuEvolve in conjunction with other Occupy groups, including Alternative Banking, which made a bunch of signs this past weekend, in solidarity with the 75th Anniversary of the passing of the minimum wage law. The idea is to demand raising the minimum wage to at least 15 dollars an hour.

For a little context, here’s a chart showing the history of the U.S. Federal minimum wage since it began:

[Chart: history of U.S. federal minimum wage increases]

Many states have their own minimum wage laws that are either higher or lower than the federal law, and some cities have even more local minimum wages as well. Since federal law supersedes state law, I’m going to assume the states with lower rates are just behind recent increases in federal rates. Here’s a picture of the state-by-state minimum wage landscape:

[Map: state-by-state minimum wage]

I’ve never done the math on how it would be even close to possible to live on an hourly wage of $7.25 but it’s clearly not possible to, say, budget for emergencies even in the most frugal of approaches.

That general fact is embedded in this Bloomberg Businessweek article which argues that Walmart is subsidized by taxpayers and is a drag on growth. The article refers to a report put out by the Democratic staff of the U.S. House Committee on Education and the Workforce entitled The Low-Wage Drag on Our Economy: Wal-Mart’s low wages and their effect on taxpayers and economic growth. It contains this excerpt:

While employers like Wal-Mart seek to reap significant profits through the depression of labor costs, the social costs of this low-wage strategy are externalized. Low wages not only harm workers and their families—they cost taxpayers.

Here’s a graphic showing which big employers are the worst culprits:

[Infographic: minimum wage]

Let’s demand better tonight at 5:00pm in Herald Square.

Categories: #OWS

The scienciness of economics

A few of you may have read this recent New York Times op-ed (hat tip Suresh Naidu) by economist Raj Chetty entitled “Yes, Economics is a Science.” In it he defends the scienciness of economics by comparing it to the field of epidemiology. Let’s focus on these three sentences in his essay, which for me are his key points:

I’m troubled by the sense among skeptics that disagreements about the answers to certain questions suggest that economics is a confused discipline, a fake science whose findings cannot be a useful basis for making policy decisions.

That view is unfair and uninformed. It makes demands on economics that are not made of other empirical disciplines, like medicine, and it ignores an emerging body of work, building on the scientific approach of last week’s winners, that is transforming economics into a field firmly grounded in fact.

Chetty is conflating two issues in his first sentence. The first is whether economics can be approached as a science, and the second is whether, if you are an honest scientist, you push as hard as you can to implement your “results” as public policy. Because that second issue is politics, not science, and that’s where people like myself get really pissed at economists, when they treat their estimates as facts with no uncertainty.

In other words, I’d have no problem with economists if they behaved like the people in the following completely made-up story based on the infamous Reinhart-Rogoff paper with the infamous Excel mistake.

Two guys tried to figure out what public policy causes GDP growth by using historical data. They collected their data and did some analysis, and they later released both the spreadsheet and the data by posting them on their Harvard webpages. They also ran the numbers a few times with slightly different countries and slightly different weighting schemes and explained in their write-up that they got different answers depending on the initial conditions, so therefore they couldn’t conclude much at all, because the error bars are just so big. Oh well.

You see how that works? It’s called science, and it’s not what economists are known to do. It’s what we all wish they’d do though. Instead we have economists who basically get paid to write papers pushing for certain policies.

Next, let’s talk about Chetty’s comparison of economics with medicine. It’s kind of amazing that he’d do this considering how discredited epidemiology is at this point, and how truly unscientific it’s been found to be, for essentially exactly the same reasons as above: changes in initial conditions, even just changing which standard database you use for your tests, switch the sign of most of the results in medicine. I wrote this up here based on a lecture by David Madigan, but there’s also a chapter in my new book with Rachel Schutt based on this issue.

To briefly summarize, Madigan and his colleagues reproduce a bunch of epidemiological studies and come out with incredibly depressing “sensitivity” results. Namely, that the majority of “statistically significant findings” change sign depending on seemingly trivial initial condition changes that the authors of the original studies often didn’t even explain.

So in other words, Chetty defends economics as “just as much science” as epidemiology, which I would claim is in the category “not at all a science.” In the end I guess I’d have to agree with him, but not in a good way.

Finally, let’s be clear: it’s a good thing that economists are striving to be scientists, when they are. And it’s of course a lot easier to do science in microeconomic settings where the data is plentiful than it is to answer big, macro-economic questions where we only have a few examples.

Even so, it’s still a good thing that economists are asking the hard questions, even when they can’t answer them, like what causes recessions and what determines growth. It’s just crucial to remember that actual scientists are skeptical, even of their own work, and don’t pretend to have error bars small enough to make high-impact policy decisions based on their fragile results.

Categories: modeling, rant, statistics

Disorderly Conduct with Alexis and Jesse #OWS

Podcast

So there’s a new podcast called Disorderly Conduct which “explores finance without a permit” and is hosted by Alexis Goldstein, whom I met through her work on Occupy the SEC, and Jesse Myerson, an activist and a writer.

I was recently a very brief guest on their “In the Weeds” feature, where I was asked to answer the question, “What is the single best way to rein in the power of Wall Street?” in three minutes. The answers given by:

  1. me,
  2. The Other 98% organizer Nicole Carty (@nacarty),
  3. Salon.com contributing writer David Dayen (@ddayen),
  4. Americans for Financial Reform Policy Director Marcus Stanley (@MarcusMStanley), and
  5. Marxist militant José Martín (@sabokitty)

can be found here or you can download the episode here.

Occupy Finance video series

We’ve been having our Occupy Finance book club meetings every Sunday, and although our group has decided not to record them, a friend of our group and a videographer in her own right, Donatella Barbarella, has started to interview the authors and post them on YouTube. The first few interviews have made their way to the interwebs:

  1. Linda talking about Chapter 1: Financialization and the 99%.
  2. Me talking about Chapter 2: the Bailout
  3. Tamir talking about Chapter 3: How banks work

Doing Data Science now out!

O’Reilly is releasing the book today. I can’t wait to see a hard copy!! And when I say “hard copy,” keep in mind that all of O’Reilly’s books are soft cover.

Categories: #OWS, data science, finance

Occupy the SEC, Dodd-Frank, and who has standing in financial regulation #OWS

A bit more than a week ago Akshat Tewery came to my Occupy group to discuss his chapter in the book we wrote called Occupy Finance [1].

Akshat is a member of Occupy the SEC [2] and came to talk to us about a short history of financial regulation, and how impressively well things worked in the middle of the last century, when Glass-Steagall was in effect and before it was gamed.

One thing he mentioned in his fascinating hour-long lecture was this lawsuit which I hadn’t heard about. Namely, Occupy the SEC sued the Federal Reserve, SEC, CFTC, OCC, FDIC and U.S. Treasury over not doing their jobs, specifically for the delay in finishing and implementing the Volcker Rule.

You see, Dodd-Frank is a law, and the Volcker Rule, which is supposed to be something like a modern Glass-Steagall act, is part of it. But the law just outlines the rules, and the regulators are supposed to actually turn that law into regulations which they then implement.

There was a deadline for that, and it has passed. So Occupy the SEC sued to make those guys get the job done.

And guess what? The judge found that they didn’t have standing to sue. I’m no lawyer but from what I can understand this means they were deemed not sufficiently relevant to the implementation of Dodd-Frank. They didn’t have enough skin in the game, in other words. Because they’re just, you know, citizens who care about having a functional regulatory environment. Not to mention taxpayers who have bailed out the banks and want to avoid continuing doing that.

That raises the question, who has skin in the regulation game? Answer: banks being regulated. So only those guys can complain to the courts about the regulation. And obviously their complaints will be different from Occupy the SEC’s complaints.

It seems like whenever I look around I see examples like this, where people get away with crappy policies or even crappier deeds because the negative effect, though real, is so dispersed that most people don’t have enough “standing” to sue or to even effectively quantify how they’ve been affected.

And I guess this is the land of class-action lawsuits, but that doesn’t seem sufficient. It really seems like there needs to be legal representation for taxpayers somehow. Who is looking out for the average non-insider? Who is keeping tabs on overall systemic risk? In an ideal world that would exist inside the regulators themselves, but we all know it’s not that ideal.

1. We’re out of copies, but if you don’t have a copy of Occupy Finance but you want one, go to our IndieGogo campaign and donate $10 and you’ll get a copy of the book as a thank you.

2.  Which, if you don’t know, consists of an amazing and wonky group of occupiers who write public commenting letters on financial regulation. Their Volcker Rule comments have made quite an impression on regulators, but they’ve also written numerous amicus briefs on various issues as well. Keep an eye on their work on their webpage.

Categories: #OWS, finance

Aunt Pythia’s advice

October 19, 2013

Aunt Pythia has a 5-year-old’s birthday party to manage this morning, so she’s going to be more to the point, less philosophical, and overall slightly less fun and sexy than usual, for which she apologizes.

On second thought, they say less is more, so let’s assume it’s just as sexy if not more sexy.

Apology rescinded.

And, please, Aunt Pythia readers: I’ve been plowing through questions faster than I’ve been receiving them, so please

ask me a question at the bottom of the page!

By the way, if you don’t know what the hell I’m talking about, go here for past advice columns and here for an explanation of the name Pythia.

——

Dear Aunt Pythia,

Seeing as Halloween is coming up soon, I was just thinking about what to dress up as (well, looking online at pictures of other people’s ridonculous costumes). In the middle of my search, my brother walked into the room. Thinking that he may be of some help, I asked him what I should dress up as. He answered that I should just go as myself; it’ll be the scariest costume guaranteed.

How should I respond?

Sad Face Pumpkin

Dear SFP,

I think your brother is right, and you should acknowledge that.

Let’s face it, our society is filled with phonies getting up every morning and putting on costumes for work to hide their true inner selves. Being an authentic human being is incredibly intimidating to such people, and they might be terrified when they see you.

Partly this is because it’s just so incredibly rare to see someone be an unqualified human being that the “unknown” aspect is scary, and partly because they’re worried that, if you’re doing it, then they might be expected to do it too. Persevere though, and be brave. It’s worthwhile in spite of such reactions.

Aunt Pythia

——

Dear Aunt Pythia,

Since I had my first baby (a four month old little boy), my mother has started buying him gifts frequently. Most of these are completely unnecessary, or superfluous, or more expensive than what we need or I would consider affordable.

I don’t want them and it stresses me out because I don’t think my mother can afford them either. She is completely innumerate. In fact, she doesn’t even seem to comprehend large numbers at all. 100 and 1000 and 10000 all mean the same thing to her.

Instead of budgeting with numbers, she tries to balance out a sense of deprivation (so she’ll try to balance out spending $100 on luxuries by buying cheap bread that tastes bad for a month, even though that doesn’t work at all).

Even though she is in her sixties, she constantly has a credit card debt, has kept the same mortgage for the last twenty years, and has minimal retirement savings. I wish she would stop buying us baby clothes from expensive department stores and save it instead. I’ve tried returning them and giving her the money back, and asking her not to buy any more, but often I can only get store credit. In any case she won’t take the money back, and then a few weeks later she’ll come over with a new set of clothes that are already almost too small for him.

Sometimes I lie awake at night stressing about it. I feel powerless to stop her but when she gets too old to work I think it will become my problem as well and I unfortunately don’t earn very much money. What should I do?

Anxious

Dear Anxious,

It’s a huge problem, and your mom is obviously not the only person in that situation. In fact I expect to hear more and more about retirees in huge debt problems in the next few years. Of course some retirees have saved a bunch of money, but not all of them to be sure.

My advice, and this is just on first reflection and I’d invite other readers to give their input, is to stay far away from your mom’s money, legally speaking. She is likely not going to accept your advice, and although it’s probably worth suggesting she take a non-profit community finance class on budgeting, like one at a local credit union, I don’t expect this will actually make her instantaneously frugal.

Here’s what I wouldn’t do if I were you: pay off her debts. There would just be more where those came from. When she is unable to pay her debts, by all means help her connect with a lawyer to declare bankruptcy, and help her cope with debt collectors (read the Debt Resistors Operations Manual to learn more about her rights and theirs).

Here’s another thing I wouldn’t do: in any way shape or form become a co-signatory on anything with her. Then you will be liable for her debts.

In the best of worlds, your mom will run up pretty big debts, the credit card companies will figure out she’s never going to pay back those debts, she will declare bankruptcy, and then nobody will give her any more credit. To be sure you will want to make sure she always has food and a place to live and medicine, but think of that as a separate issue from her piling-up debts, which is in the end the problem of the banks that gave her credit cards she couldn’t be trusted with.

Good luck, and enjoy motherhood!

Aunt Pythia

——

Hi Auntie P,

Thank you for answering my “sock” question, but my apologies for not phrasing it properly, and so misleading you as to my intention. Perhaps you will permit me to resubmit it, and – having seen your “not enough sex” comment on 21st – I will try to put some of that into it, instead of boring old socks. 

Let’s imagine that 44 men and 116 women sign up for a dating evening. Each is given a number, and they are drawn at random – the organizer forgetting to ask any basic questions like “sexual orientation?” or to put the men’s numbers in one pot and the women’s in another. As the numbers are drawn out, the first person is paired with the second, the third with the fourth, etc.

So my question is this: how many M/M pairings will there be? Alternatively, what are the chances of getting exactly n such couples?

Socks Maniac

Dear Socks Maniac,

I don’t usually do this, but I’m gonna steal a commenter’s answer whole hog from that post, which I guess you didn’t see. This is from Michael Kleber, whom I’ve known approximately 20 years, and I’ve adjusted it to be sexy like I know we want it:

I think Socks Maniac’s dungeon contains lots of individual people who get paired up blindfolded. That gives you X male pairs, Y female pairs, and Z mixed male-female ones, and the question is the probability that X is exactly 10.

This can also be answered by counting, but it’s a little uglier. There are 160-choose-44 orders in which you can pull the blindfolded people out of the dungeon, of course. To count the number of ways to get exactly X/Y/Z male/female/mixed pairs, you can think of lining up 80 dungeon lairs and picking X of them to get two blindfolded men, Y of the remaining 80-X to get two blindfolded women, and then for the remaining Z dungeon lairs you need to pick whether a man or a woman was pulled out first, so that’s another 2^Z choices to worry about. So 80-choose-X * (80-X)-choose-Y * 2^Z.

Since Socks Maniac told us X=10, that accounts for 20 of the 44 blindfolded men, leaving 24 blindfolded men paired with 24 blindfolded women (so Z=24), and the other 92 blindfolded women paired up into Y=46 women-on-women pairs. So the number of ways to get exactly 10 male pairs is (80 choose 10) * (70 choose 46) * 2^24. Dividing by the 160-choose-44 ways to pull people out of the dungeon in the first place, Wolfram Alpha says you get around 0.01854, or a little under a 2% chance.

Hmm, I see I can’t post links, or even mention the Wolfram Alpha web site by name, without sounding like spam. But anyway, it will happily evaluate

((80 choose 10) * (70 choose 46) * 2^24) / (160 choose 44).

Thanks, Michael!
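For anyone who would rather not lean on Wolfram Alpha, here is a minimal Python sketch of the same counting argument. The function name `pairing_distribution` and its arguments are made up for illustration; the formula inside is the 80-choose-X * (80-X)-choose-Y * 2^Z count from the comment above, divided by 160-choose-44. It uses exact fractions, so the probabilities over all possible X add up to exactly 1.

```python
from fractions import Fraction
from math import comb


def pairing_distribution(men, women):
    """Exact distribution of the number of man/man pairs (hypothetical helper).

    People are drawn in random order and paired first with second,
    third with fourth, and so on.
    """
    total = men + women
    if total % 2:
        raise ValueError("need an even number of people to pair everyone up")
    slots = total // 2                 # number of pairs
    ways_total = comb(total, men)      # orders in which men/women can come out
    dist = {}
    for x in range(men // 2 + 1):      # x = number of man/man pairs
        z = men - 2 * x                # remaining men each land in a mixed pair
        y = slots - x - z              # the rest are woman/woman pairs
        if y < 0:
            continue
        ways = comb(slots, x) * comb(slots - x, y) * 2 ** z
        dist[x] = Fraction(ways, ways_total)
    return dist


dist = pairing_distribution(44, 116)
p_ten = dist[10]                                       # chance of exactly 10 man/man pairs
expected_pairs = sum(x * p for x, p in dist.items())   # average number of man/man pairs

print(float(p_ten))
print(float(expected_pairs))
```

The same table also answers the first half of the question: the average number of M/M pairs works out to 80 * (44/160) * (43/159), or about 5.95.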

Aunt Pythia

——

Dear Aunt Pythia,

From reading your blog/column, you sound like an outgoing, extroverted type. So maybe you can give a few pointers to us introverts: what are some good ways to start conversations with strangers? I tend to do OK once I’m actually talking to somebody, but I always feel awkward when trying to initiate contact with other people.

I’m single and I don’t have a ton of friends, so this seems like a useful skill to develop.

I’m Nervous To Join

Dear INTJ,

I think the key is to project a friendliness and openness to the stranger you are talking to, and if it turns out you’re wrong and the person is unfriendly or closed off, then not taking it personally.

So for example, when I see people knitting awesome stuff on the subway, I am pretty much always going to pipe up and tell them how beautiful that piece is. About 65% of the time this leads to an excited conversation about how awesome and useless knitting skills are, and sometimes even leads to the discovery of a new yarn shop or sale or website for one of us. But the rest of the time the person has no interest in talking, and I just walk away. I don’t feel bad for being friendly and wanting to connect with someone, because that is frankly what humans do and it’s not something to be ashamed of.

Note one thing: there was a “reason” for me to talk in the above scenario, and that’s key. It doesn’t make sense to walk up to someone with absolutely no cause and strike up a conversation. Having said that, the reason doesn’t have to be all that good, especially if there’s alcohol involved. It could be as simple as, “I love your shirt!!”, although that’s an opener for truly extroverted people.

One last thing. The more confident you are that most people are friendly and open, the higher your chances are of making a connection, so that leaves you with a bit of a tough feedback loop to get into. I suggest having an extroverted wingwoman or wingman the first few times to show you some ropes and to demonstrate how fun it is to be friendly. And good luck!

Aunt Pythia

——

Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!

Categories: Aunt Pythia

*Doing Data Science* now available on Kindle!

My book with Rachel Schutt is now available on Kindle. I’ve tested this by buying it myself from amazon.com and looking at it on my computer’s so-called cloud reader.

Here’s the good news. It is actually possible to do this, and it’s satisfying to see!

Here’s the bad news. The Kindle reader doesn’t render LaTeX well, or for that matter many of the various fonts we use for various reasons. The result is a pretty comical display of formatting inconsistency. In particular, whenever a formula comes up it might seem like we’re

screaming about it

and often the quoted passages come in

very very tiny indeed.

I hope it’s readable. If you prefer less comical formatting, the hard copy edition is coming out on October 22nd, next Tuesday.

Next, a word about the book’s ranking. Amazon has this very generous way of funneling down into categories sufficiently so that the ranking of a given book looks really high. So right now I can see this on the book’s page:

but for a while, before yesterday, it took a few more iterations of digging to get to single digits, so it was more like:

But, you know, I’ll take what I can get to be #1! It’s all about metrics!!!

One last thing, which is that the full title is now “Doing Data Science: Straight Talk from the Frontline” and for the record, I wanted the full title to be something more like “Doing Data Science: the no bullshit approach” but for some reason I was overruled. Whatevs.

Categories: data science

The case against algebra II

There’s an interesting debate described in this essay, Wrong Answer: the case against Algebra II, by Nicholson Baker (hat tip Nicholas Evangelos), about whether algebra II should be required to go to college. I’ll do my best to summarize the positions briefly. I’m making some of the pro-side up since it wasn’t well-articulated in the article.

On the pro-algebra side, we have the argument that learning algebra II promotes abstract thinking. It’s the first time you go from thinking about ratios of integers to ratios of polynomial functions, and where you consider the geometric properties of these generalized fractions. It is a convenient litmus test for even more abstraction: sure, it’s kind of abstract, but on the other hand you can also for the most part draw pictures of what’s going on, to keep things concrete. In that sense you might see it as a launching pad for the world of truly abstract geometric concepts.

Plus, doing well in algebra II is a signal for doing well in college and in later life. Plus, if we remove it as a requirement we might as well admit we’re dumbing down college: we’re giving the message that you can be a college graduate even if you can’t do math beyond adding fractions. And if that’s what college means, why have college? What happened to standards? And how is this preparing our young people to be competitive on a national or international scale?

On the anti-algebra side, we see a lot of empathy for struggling and suffering students. We see that raising so-called standards only gives them more suffering but no more understanding or clarity. And although we’re not sure if that’s because the subject is taught badly or because the subject is inherently unappealing or unattainable, it’s clear that wishful thinking won’t close this gap.

Plus, of course doing well in algebra II is a signal for doing well in college: it’s a freaking prerequisite for going to college. We might as well have embroidery as a prerequisite and then be impressed by all the beautiful piano stool covers that result. Finally, the standards aren’t going up just because we’re training a new generation in how to game a standardized test in an abstract rote-memorization skill of formulas and rules. It’s more like measuring students’ capacity for drudgery.

OK, so now I’m going to make comments.

While it’s certainly true that, in the best of situations, the content of algebra II promotes abstract and logical thinking, it’s easy for me to believe, based on my very small experience in the matter, that it’s much more often taught poorly, with students expected to memorize formulas and rules. This makes it easier to test but doesn’t add to anyone’s love for math, including people who actually love math.

Speaking of my experience, it’s an important issue. Keep in mind that asking the population of mathematicians what they think of removing a high school class is asking for trouble. This is a group of people who pretty much across the board didn’t have any problems whatsoever with the class in question and sailed through it, possibly with a teacher dedicated to teaching honors students. They likely can’t remember much about their experience, and if they can it probably wasn’t bad.

Plus, removing a math requirement, any math requirement, will seem to a mathematician like an indictment of their field as not as important as it used to be to the world, which is always a bad thing. In other words, even if someone’s job isn’t directly on the line with this issue of algebra II, which it undoubtedly is for thousands of math teachers and college teachers, it’s still got a slippery slope feel, and pretty soon we’re going to have math departments shrinking over this.

In other words, it shouldn’t surprise anyone that we have defensive and unsympathetic mathematicians on one side who cannot understand the arguments of the empathizers on the other.

Of course, it’s always a difficult decision to remove a requirement. It’s much easier to make the case for a new one than to take one away, except of course for the students who have to work for the ensuing credentials.

And another thing: not so long ago we’d hear people say that women don’t need education at all, or that peasants don’t need to know how to read. Saying that a basic math course should become an elective kind of smells like that too, if you want to get histrionic about things.

For myself, I’m willing to get rid of all of it, all the math classes ever taught, at least as a thought experiment, and then put shit back that we think actually adds value. So I still think we all need to know our multiplication tables and basic arithmetic, and even basic algebra so we can deal with an unknown or two. But from then on it’s all up in the air. Abstract reasoning is great, but it can be done in context just as well as in geometry class.

And, coming as I now do from data science, I don’t see why statistics is never taught in high school (at least in mine it wasn’t, please correct me if I’m wrong). It seems pretty clear we can chuck trigonometry out the window, and focus on getting the average high school student up to the point of scientific literacy that she can read a paper in a medical journal and understand what the experiment was and what the results mean. Or at the very least be able to read media reports of the studies and have some sense of statistical significance. That’d be a pretty cool goal, to get people to be able to read the newspaper.

So sure, get rid of algebra II, but don’t stop there. Think about what is actually useful and interesting and mathematical and see if we can’t improve things beyond just removing one crappy class.

Categories: math education, statistics

MAA Distinguished Lecture Series: Start Your Own Netflix

I’m on my way to D.C. today to give an alleged “distinguished lecture” to a group of mathematics enthusiasts. I misspoke in a previous post where I characterized the audience as consisting of math teachers. In fact, I’ve been told it will consist primarily of people with some mathematical background, with typically a handful of high school teachers, a few interested members of the public, and a number of high school and college students included in the group.

So I’m going to try my best to explain three different ways of approaching recommendation engine building for services such as Netflix. I’ll be giving high-level descriptions of a latent factor model (this movie is violent and we’ve noticed you like violent movies), of the co-visitation model (lots of people who’ve seen stuff you’ve seen also saw this movie) and the latent topic model (we’ve noticed you like movies about the Hungarian 1956 Revolution). Then I’m going to give some indication of the issues in doing these massive-scale calculations and how they can be worked out.

And yes, I double-checked with those guys over at Netflix, I am allowed to use their name as long as I make sure people know there’s no affiliation.

In addition to the actual lecture, the MAA is having me give a 10-minute TED-like talk for their website as well as an interview. I am psyched by how easy it is to prepare my slides for that short version using prezi, since I just removed a bunch of nodes on the path of the material without removing the material itself. I will make that short version available when it comes online, and I also plan to share the longer prezi publicly.

[As an aside, and not to sound like an advertiser for prezi (no affiliation with them either!), but they have a free version and the resulting slides are pretty cool. If you want to be able to keep your prezis private you have to pay, but not as much as you’d need to pay for powerpoint. Of course there’s always Open Office.]

Train reading: Wrong Answer: the case against Algebra II, by Nicholson Baker, which was handed to me emphatically by my friend Nick. Apparently I need to read this and have an opinion.

Categories: math, math education, modeling

Are PayDay lenders better than banks? #OWS

Sometimes my plan of getting up super early to write on my blog fails, and this is one of those days. But I’m still going to ask you to read this article from the New Yorker written by Lisa Servon and entitled, “The High Cost, For The Poor, Of Using A Bank.” Here’s a key passage, but the whole thing is amazing, and yes, I’ve invited her to my Occupy group already:

To understand why, consider loans of small amounts. People criticize payday loans for their high annual percentage rates (APR), which range from three hundred per cent to six hundred per cent. Payday lenders argue that APR is the wrong measure: the loans, they say, are designed to be repaid in as little as two weeks. Consumer advocates counter that borrowers typically take out nine of these loans each year, and end up indebted for more than half of each year.

But what alternative do low-income borrowers have? Banks have retreated from small-dollar credit, and many payday borrowers do not qualify anyway. It happens that banks offer a de-facto short-term, high-interest loan. It’s called an overdraft fee. An overdraft is essentially a short-term loan, and if it had a repayment period of seven days, the APR for a typical incident would be over five thousand per cent.

It makes me wonder whether, if someone did a careful analysis with all-in costs including time and travel, PayDay Lenders are not actually a totally rational choice for the poor.
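To get a feel for the APR numbers in the quoted passage, here is a minimal Python sketch of the usual simple (non-compounding) annualization: fee divided by amount borrowed, scaled up from the loan period to 365 days. The function names and the $15-per-$100 example are my own illustrative assumptions, not figures from the article, and the article may be using a slightly different convention.

```python
def simple_apr_percent(fee, principal, days):
    """Annualize a one-time fee on a short loan, without compounding."""
    return fee / principal * 365 / days * 100


def fee_per_hundred(apr_percent, days):
    """Invert the formula: the fee per $100 borrowed implied by a given APR."""
    return apr_percent / 100 * days / 365 * 100


# Hypothetical two-week loan: $15 fee on $100 borrowed.
print(simple_apr_percent(15, 100, 14))

# Fees per $100 implied by the quoted APRs over the quoted repayment periods.
print(fee_per_hundred(300, 14), fee_per_hundred(600, 14))
print(fee_per_hundred(5000, 7))
```

Under this convention, an APR over five thousand per cent on a seven-day term means the fee is roughly as large as the amount overdrawn, which is the comparison the passage is making.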

Categories: #OWS, finance, modeling, news

Indiegogo campaign for 2nd edition of Occupy Finance is up! (#OWS)

October 14, 2013

Many of you have probably already received copies of Occupy Finance. Here’s my personal evidence, for which I was nearly arrested in the post office (who knew you’re not allowed to take pictures in the post office? not me.):

[Image: Occupy_Finance_USPS]

We’re hoping you loved your book, or if you haven’t gotten a copy yet, that you’d like to get one soon.

The thing is, we’re very nearly out of copies, plus there was a missing page and a few typos, for which we have forgiven ourselves since we got it out in time for September 17th, but which we were happy to fix.

Our plan, if we manage to raise enough money (hopefully $3500), is to print a few thousand more copies (hopefully 5,000) and distribute them to places like libraries and bookstores, not to mention to any people we hear of who want to read the book. We’d prefer to raise money for the printing and then give them away over selling them, since we’d like anyone who wants one to have one regardless of their financial situation.

So here’s the Indiegogo page, and I hope you’ll go take a look and send it to your friends who might be interested in contributing. It features our favorite street performer and Action Committee Head Marni, whom for this campaign we refer to as our “Indie GoGo Girl”. She does a really fantastic job explaining our goals in the campaign video, located here. You may also know her as the money bunny; she’s kinda famous. She also has a law degree.

[Image: moneybunny]

The starting donation is $10, and if you’ve already given money to our group, don’t feel like I’m asking you a second time (I don’t wanna be like that!) but just go ahead and tell your friends about us. Thanks!

Also feel free to share the shortlink on twitter or what have you: http://tinyurl.com/occfinindie

Categories: #OWS

Plumping up darts

Someone asked me a math question the other day and I had fun figuring it out. I thought it would be nice to write it down.

So here’s the problem. You are getting to see sample data and you have to infer the underlying distribution. In fact you happen to know you’re getting draws – which, because I’m a basically violent person, I like to think of as throws of a dart – from a uniform distribution from 0 to some unknown d, and you need to figure out what d is. All you know is your data, so in particular you know how many dart throws you’ve gotten to see so far. Let’s say you’ve seen n draws.

In other words, given x_1, x_2, x_3, \dots, x_n, what’s your best guess for d?

First, in order to simplify, note that all that really matters in terms of the estimate of d is the value of \max_{i \in \{1, \dots, n\}} x_i and how big n is.

Next, note you might as well assume that d=1 and you just don’t know it yet.

With this set-up, you’ve rephrased the question like this: if you throw n darts at the interval [0,1], then where do you expect the right-most dart – the maximum – to land?

It’s obvious from this phrasing that, as n goes to infinity, you can expect a dart to get closer and closer to 1. Moreover, you can look at the simplest case, where n=1, and since the uniform distribution is symmetric, you can see the answer is 1/2. Then you might guess the overall answer, which depends on n and goes to 1 as n goes to infinity, might be n/(n+1). It makes intuitive sense, but how do you prove that?

Start with a small case where you know the answer. For n=1 we just need to know what the expected value of max(x_1) is, and since there’s one dart, the max is just x_1 itself, which is to say we need to compute a simple integral to find the expected value (note it’s coming in handy here that I’ve normalized the interval from 0 to 1 so I don’t have to divide by the width of the interval):

\int_0^1 x \, dx = (x^2/2) |_0^1 = 1/2,

and we recover what we already know. In the next case, we need to integrate over two variables (same comment here, don’t have to divide by area of the 1×1 square base):

\int_0^1 \int_0^1 \max(x_1, x_2) \, dx_1 dx_2.

If you think about it, though, x_1 and x_2 play symmetric parts in this matter, so you can assume without loss of generality that x_1 is bigger, as long as we only let x_2 range between 0 and x_1, and then multiply the end result by 2:

 = 2 \int_0^1 \int_0^{x_1} x_1 \, dx_2 dx_1.

But that simplifies to:

= 2 \int_0^1 x_1^2 \, dx_1 = 2 (x_1^3/3) |_0^1 = 2/3.

Let’s do the general case. It’s an n-fold integral over the maximum of all n darts, and again without loss of generality x_1 is the maximum as long as we remember to multiply the whole thing by n. We end up computing:

= n \int_0^1 \int_0^{x_1} \int_0^{x_1} \cdots \int_0^{x_1} x_1 \, dx_n \cdots dx_3 dx_2 dx_1.

But this collapses to:

n \int_0^1 x_1^n \, dx_1 = n (x_1^{n+1}/(n+1)) |_0^1 = n/(n+1).

To finish the original question, take the maximum value in your collection of draws and multiply it by the plumping factor (n+1)/n to get an unbiased estimate of the parameter d: the maximum lands at d n/(n+1) on average, so the plumped-up maximum lands at d on average.
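If you’d rather check the n/(n+1) claim by throwing darts than by integrating, here is a minimal Python simulation sketch. The function names, the number of trials, and the fixed seeds are all illustrative assumptions; it estimates the average position of the right-most dart on [0,1] and then applies the plumping factor to draws from a uniform distribution on [0,d].

```python
import random


def average_max(n, trials=20000, seed=0):
    """Monte Carlo estimate of the expected maximum of n uniform darts on [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n))
    return total / trials


def plumped_estimate(draws):
    """Estimate d from draws on [0, d]: the maximum times (n+1)/n."""
    n = len(draws)
    return max(draws) * (n + 1) / n


def average_plumped(d, n, trials=20000, seed=1):
    """Average the plumped estimate over many simulated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += plumped_estimate([rng.uniform(0, d) for _ in range(n)])
    return total / trials


print(average_max(5), 5 / 6)     # the two numbers should be close
print(average_plumped(3.0, 5))   # should be close to d = 3
```

The simulation only agrees with the formula up to sampling noise, so expect the printed averages to match in the first couple of decimal places rather than exactly.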

Categories: math, statistics

Aunt Pythia’s advice

Hello and good morning, dear Aunt Pythia readers. Aunt Pythia is feeling bright-eyed and bushy tailed this morning and can’t wait to dig into the juicy questions and ethical dilemmas she is sure are awaiting her in her beloved and glamorous google spreadsheet.

Aunt Pythia has taken a few minutes today already to count her blessings, and high among them are the chance to interact with you kind people through this blog and particularly this Saturday morning column. Thank you all! Please feel generous for being here, you are appreciated!

And as always please:

ask a question at the bottom of the page!!

By the way, if you want more, go here for past advice columns and here for an explanation of the name Pythia.

——

Dear Auntie,

1. Do body parts that are not for public purview (read “genitals”) show greater physical diversity because they have not been acted upon by marketing and evolution?

2. Does the use of wigs by Orthodox Jewish women lead to baldness, as they don’t have to demonstrate good hair and so theirs is kind of …meh? I have two data points; albeit from the same family.

No disrespect to genitals or Orthodox Jews intended.

Sexual Evolution Xpounded

Dear SEX,

First of all, I’m in a new phase where I am really into using the phrase “particulars”. So I’m really glad you asked this question, since it gives me tremendous opportunity in that regard. I’m no expert in particulars, of course, but I’ll talk about particulars anyway, since you asked.

First, let’s think about whether particulars have escaped evolution untouched: for sure not, but it has presumably been more about procreation probabilities and not dying in childbirth than about beauty per se.

Here’s my argument along those lines, specifically when it comes to women’s particulars and the issue of marketing standardization: my impression is that no man has ever gotten that close to sex and then said, “whoa, your vagina has a slightly peculiar shape and/or positioning relative to your clitoris. Maybe we should not procreate after all!!”

I mean, it may have happened but I haven’t heard about it. Tell me if you have evidence to the contrary.

That’s not to say there’s no beauty there in something that is varied and idiosyncratic, to be sure. And things might be slightly different for men in this regard, since let’s face it, men’s particular particulars are more obvious pieces of apparatus and therefore more easily scrutinized.

As for baldness and wigs: no freaking clue, but I do have something to say about wigs in general, which is that there are a TON of wigs out there if you know how to spot them. In fact if you go onto the NY subway and take a look around, you’ll see that a good portion of rush hour commuting women are wearing wigs, and I don’t think it’s because New Yorkers are more likely to be bald. It’s just a big thing, particularly for Jewish and for African-American women. Bigger than you might think, and essentially never discussed, which always piques my interest.

Hope that helps,

Aunt Pythia

——

Dear Aunt Pythia,

Here is my Career dilemma. I am what you would consider an “Engineer” in the Analytics industry. I have had a good career in building Analytics Products aimed at analyzing data and finally implementing some ‘algorithms’ after enough study to take the human out of the process (one example is a routing algorithm that considers 10-15 price, quality and other factors).

Lately, I feel less excited about ‘normal’ analytics projects (because the initial study is smaller and the rest is all about creating pipelines to set up algorithms to work autonomously). Instead the new ‘Data Science’ field seems more interesting, fun and challenging. I had a good math background, but that was a decade ago…ideally, I would be part of a Data Science team and learn in the process, but as soon as I say I am not a math major, nobody takes me seriously.

I am relearning some of my math skills but I can hardly refresh years of algebra, calculus and operations research skills that easily.

I am NOT dreaming of being the math nerd in a Data Science team but I cannot figure out if Data Science teams need people like me, who have years of Decision Science + Data Processing background. Yes building 1 model does not make someone a Data Scientist, on the other hand writing a couple of python mapreduce jobs or a few SQL queries does not make someone a Data Architect either.

I apply for jobs, get no response and get frustrated and stop looking…and then repeat that after a few weeks. I am almost at the point of giving up and going back to the Analytics + Data Architecture field. Do you think Data Science teams would welcome people who have a more traditional Data background?

Confused about Career Options

Dear Confused,

A couple of things. First, my new book with Rachel Schutt is coming out in a week and a half and is ideal for someone like you. Get it, read it, and build a few of the things discussed in it with publicly available data so you have a portfolio of projects.

Next, it’s hard to get hired as a data science person with your background, even with projects under your belt. So try to get a job as an engineer in a data-driven business, and worm your way into the data group. Tell them that is your intention, and that you are willing to prove your mad data skillz. I’d be surprised if someone didn’t pick you up under such conditions.

Good luck!

Aunt Pythia

——

Aunt Pythia,

I have, belatedly, come in contact with the “Youth Sports Industrial Complex” and the insane, existential battle parents wage for their children’s future through travel soccer and the like.

Literally, people seem to think that their kid will get into Harvard on the strength of their parents’ SERIOUS COMMITMENT to youth sports. Winning at all costs seems to be the one and only goal.

The thing is, my kid could be very competitive at this particular sport – if we were to join one of the competitive clubs and hand our souls over to the dark side. I don’t expect to get a scholarship or something, frankly that’s nuts.

Am I a looney for suggesting to my kids that playing well and having fun – and exhibiting excellent sportsmanship – are the goal if they never seem to beat the hyper-aggressive kids? Am I setting them up for a life as outcasts if we reject this ethos? As a mom, what do you think?

Maximize, Or Maintain?

Dear MOM,

What a fucking great question, thank you for asking it.

As a mom, I am definitely on the radical fringe when it comes to this. Specifically, I have taken my kids out of all grown-up organized activities, mostly at their request (but secretly because I think that shit is nuts). That means no sports, no nothing (they do student-organized stuff sometimes). They are expected to exercise but they get to choose how, and they are expected to do interesting stuff – so not play video games after school – but it’s up to them what to do.

Because for my family, it’s not just offensive to think that “winning is the goal” at all times. It’s even offensive to think that adults should define the goal for growing children in their free time.

[Rant to those people: What’s wrong with you people, isn’t it enough that these kids will probably have to live by other people’s rules when they’re working in jobs later? Why do we have to start that crap so soon?]

This stance makes it easy for me to never have to deal with the question you’re currently dealing with, namely having a kid who likes a team sport and is good at it, and how to think about the rest of the lunatics. My kids, to be clear, hate team sports and suck at them, like good nerds.

My advice is to be consistently sane and give them absolute agency on these decisions. Be utterly honest about what you think of the attitude displayed by the other kids, and ask your kid what they want considering the dire conditions. They might want to do it anyway, and they will definitely benefit from having a sane person to look to when emotions and goals get distorted and out of hand. Most importantly, if they decide to quit the team, let them.

Good luck!

Aunt Pythia

——

Dear Aunt Pythia,

With an electrical engineering background but no research experience, I want to study mathematics. I am quite certain that I want to be in research. Without an undergraduate background in mathematics (though I’ve taken a few applied mathematics courses), what’s the best way to move forward? I don’t know what exactly would end up being the outcome – I would like it to be either in cognitive sciences or mathematical physics/geology. It’s rather broad, because I can’t tell unless I know more. Should I take a year out and prepare for something, get another bachelors (which I dread, I don’t want to do the 4 year university) or …?

Slowkill

Dear Slowkill,

Pardon me for saying it, but WTF?? How would you know you want to do math research if you don’t have experience in math? That makes no sense, because it means you want to devote yourself to something you don’t understand at all and have no experience in. It really has nothing to do with math at all, unless you are assuming that stories you heard about living the math life are true. And I’m here to tell you, they’re not. If Good Will Hunting were to be believed, all math professors have personal secretaries scurrying around getting them coffee – NOT!!

My advice is to think about what it is you really want to do – or to escape. I’m sensing more escapism than desire in your words. Go see Gravity, it’s supposed to be awesome and totally escapist.

Good luck,

Auntie P

——

Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!

Categories: Aunt Pythia

Cumulative covariance plots

One thing I do a lot when I work with data is figure out how to visualize my signals, especially with respect to time.

Lots of things change over time – relationships between variables, for example – and it’s often crucial to get deeply acquainted with how exactly that works with your in-sample data.

Say I am trying to predict “y”: so for a data point at time t, we’ll say we try to predict y(t). I’ll take an “x”, a variable that is expected to predict “y”, and I’ll demean both series x and y, hopefully in a causal way, and I will rename them x’ and y’, and then, making sure I’ve ordered everything with respect to time, I’ll plot the cumulative sum of the product x'(t) * y'(t).

In the case that both x'(t) and y'(t) have the same sign (so they’re both bigger than average or they’re both smaller than average), this product is positive, and otherwise it’s negative. So if you plot the cumulative sum, you get an upwards trend if things are positively correlated and a downwards trend if things are negatively correlated. If you think about it, you are computing the numerator of the correlation function, so it is indeed just an unscaled version of total correlation.

Plus, since you ordered everything by time first, you can see how the relationship between these variables evolved over time.

Also, in the case that you are working with financial models, you can make a simplifying assumption that both x and y are pretty well demeaned already (especially at short time scales) and this gives you the cumulative PnL plot of your model. In other words, it tells you how much money your model is making.
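Here is a minimal Python sketch of the cumulative covariance series described above. The function name is made up for illustration, and it takes one shortcut worth flagging loudly: it demeans with the full-sample average, which is not causal. A causal version would subtract only an average computed from earlier data points. The inputs are assumed to be already ordered by time.

```python
from itertools import accumulate


def cumulative_covariance(x, y):
    """Running sum of x'(t) * y'(t) for two time-ordered series.

    Simplifying assumption: x' and y' are formed by subtracting the
    full-sample mean, which peeks at future data (not causal).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not x:
        return []
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    products = [(a - mean_x) * (b - mean_y) for a, b in zip(x, y)]
    return list(accumulate(products))


# Toy data: y moves with x, so the series trends upward.
together = cumulative_covariance([1, 2, 3], [1, 2, 3])
# Toy data: y moves against x, so the series trends downward.
against = cumulative_covariance([1, 2, 3], [3, 2, 1])
print(together)
print(against)
```

Feeding the returned list to whatever plotting tool you like, with time on the horizontal axis, gives the picture discussed here; the last entry is the unscaled overall covariance.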

So I was doing this exercise of plotting the cumulative covariance with some data the other day, and I got a weird picture. It kind of looked like a “U” plot: it went down dramatically at the beginning, then was pretty flat but trending up, then it went straight up at the end. It ended up not quite as high as it started, which is to say that in terms of straight-up overall correlation, I was calculating something negative but not very large.

But what could account for that U-shape? After some time I realized that the data had been extracted from the database in such a way that, after ordering my data by date, it was hugely biased in the beginning and at the end, in different directions, and that this was unavoidable, and the picture helped me determine exactly which data to exclude from my set.

After getting rid of the biased data at the beginning and the end, I concluded that I had a positive correlation here, even though if I’d trusted the overall “dirty” correlation I would have thought it was negative.
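To see how a few bad points at the ends can flip the sign of the overall number, here is a toy reconstruction. The data is entirely synthetic and is not the data from the story: the first two and last two points stand in for the biased extraction, and the `pearson` helper is written out by hand so that nothing beyond the standard library is assumed.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic series, ordered by time. The first two and last two points
# play the role of the biased data; the middle six are the "clean" part.
x = [3, -3, 1, -1, 1, -1, 1, -1, 2, -2]
y = [-3, 3, 1, -1, -1, -1, 1, 1, 2, -2]

dirty = pearson(x, y)            # all ten points
clean = pearson(x[2:8], y[2:8])  # middle six points only

print(dirty)  # -0.25
print(clean)  # about 0.33
```

The overall correlation is negative, while the correlation of the trimmed data is positive, which is the same kind of reversal described above.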

This is good information, and confirmed my belief that it’s always better to visualize data over time than it is to believe one summary statistic like correlation.

Categories: data science, modeling

Data Skeptic post

I wrote a blog post for O’Reilly’s website to accompany my essay, On Being a Data Skeptic. Here’s an excerpt:

I left finance pretty disgusted with the whole thing, and because I needed to make money and because I’m a nerd, I pretty quickly realized I could rebrand myself a “data scientist” and get a pretty cool job, and that’s what I did. Once I started working in the field, though, I was kind of shocked by how positive everyone was about the “big data revolution” and the “power of data science.”

Not to underestimate the power of data––it’s clearly powerful! And big data has the potential to really revolutionize the way we live our lives for the better––or sometimes not. It really depends.

From my perspective, this was, in tenor if not in the details, the same stuff we’d been doing in finance for a couple of decades and that fields like advertising were slow to pick up on. And, also from my perspective, people needed to be way more careful and skeptical of their powers than they currently seem to be. Because whereas in finance we need to worry about models manipulating the market, in data science we need to worry about models manipulating people, which is in fact scarier. Modelers, if anything, have a bigger responsibility now than ever before.

Categories: data science, finance, modeling

Make Rich People Read Chekhov

There have been two articles in the New York Times very recently concerning empathy.

First, there was this Opinionator piece about how rich people have less empathy. Second, there was this Well blogpost which reports on a study that implies you can improve your empathy skills, at least in the short term, by reading literary fiction like Chekhov.

Empathy means understanding and sharing the feelings of other people. So what do these two columns actually refer to?

For rich people, it’s mostly about attention rather than empathy. The idea is that researchers study how people pay attention to people (answer: they pay attention to high status people more), and found that rich people don’t do it much at all. They claim attention is a prerequisite for empathy, and that there’s a negative feedback loop going on with the rich, a lack of empathy, and increasing inequality.

As for the literary fiction column, it cites a study in which what they measure is something a little bit different, namely the “theory of mind” of a person after reading Chekhov versus something else. The concept of the theory of mind is that we have internal models of other people’s mindset, and actually they claim to be able to separate this into two parts, cognitive and affective. So if I have a realistic impression of what you’re feeling, we say that my affective theory of mind is good, whereas if I have a realistic impression of how you’re planning to act, that’s called nailing a cognitive theory of mind.

A few comments:

  1. I’m not so sure about the attention-leads-to-empathy assumption. Sometimes I am on a subway and I start sensing people’s emotions around me whether I like it or not, even when I’m trying not to pay attention to them. For me empathy is like smell, and some people are incredibly smelly, especially on the subway.
  2. On the other hand it resonates with me that rich people have less empathy. Certainly this seemed to be the case when I worked at D.E. Shaw, although it might have been a self-selection thing: maybe people who are not empathetic are attracted to working at a hedge fund.
  3. In any case, there’s a tremendous disconnect between regular people and the attitude of finance people, along the lines of “I’m smarter than those people so I deserve to be rich”, and I ascribe much of this disconnect to a lack of empathy.
  4. In both of these columns, though, the question was how well do you pay attention to, and read, people in the same room with you. Unfortunately that’s not a good enough question, at least if you’re worried about that negative feedback loop, if you think about the real world. In the real world, even in New York, rich people don’t spend lots of time in the same room with anyone except other rich people. So it’s a bigger problem to address than what you might at first think.
  5. Having said that, I don’t claim that if everyone just had more empathy all our problems would be solved. Even so I do think it might help. Certainly my sensitivity to other people’s emotions deeply affects me and my actions and goals, but of course that’s too little evidence to go by.
  6. In any case it’s an interesting thought experiment to imagine a world of increased empathy. I like that it’s being considered as a basic attribute of interest, and that it seems tweakable.
  7. Conclusion: before talking to someone I perceive as unempathetic, I will bust out a Chekhov short story (this one) and demand they read it on the spot. That should really help.
Categories: finance, musing, news

Guest post: Rage against the algorithms

This is a guest post by , a Tow Fellow at the Columbia University Graduate School of Journalism where he is researching the use of data and algorithms in the news. You can find out more about his research and other projects on his website or by following him on Twitter. Crossposted from engenhonetwork with permission from the author.

How can we know the biases of a piece of software? By reverse engineering it, of course.

When was the last time you read an online review about a local business or service on a platform like Yelp? Of course you want to make sure the local plumber you hire is honest, or that even if the date is a dud, at least the restaurant isn’t lousy. A recent survey found that 76 percent of consumers check online reviews before buying, so a lot can hinge on a good or bad review. Such sites have become so important to local businesses that it’s not uncommon for scheming owners to hire shills to boost themselves or put down their rivals.

To protect users from getting duped by fake reviews, Yelp employs an algorithmic review reviewer which constantly scans reviews and relegates suspicious ones to a “filtered reviews” page, effectively de-emphasizing them without deleting them entirely. But of course that algorithm is not perfect, and it sometimes de-emphasizes legitimate reviews and leaves actual fakes intact (oops). Some businesses have complained, alleging that the filter can incorrectly remove all of their most positive reviews, leaving them with a lowly one- or two-star average.

This is just one example of how algorithms are becoming ever more important in society, for everything from search engine personalization, discrimination, defamation, and censorship online, to how teachers are evaluated, how markets work, how political campaigns are run, and even how something like immigration is policed. Algorithms, driven by vast troves of data, are the new power brokers in society, both in the corporate world as well as in government.

They have biases like the rest of us. And they make mistakes. But they’re opaque, hiding their secrets behind layers of complexity. How can we deal with the power that algorithms may exert on us? How can we better understand where they might be wronging us?

Transparency is the vogue response to this problem right now. The big “open data” transparency-in-government push that started in 2009 was largely the result of an executive memo from President Obama. And of course corporations are on board too; Google publishes a biannual transparency report showing how often they remove or disclose information to governments. Transparency is an effective tool for inculcating public trust and is even the way journalists are now trained to deal with the hole where mighty Objectivity once stood.

But transparency knows some bounds. For example, though the Freedom of Information Act facilitates the public’s right to relevant government data, it has no legal teeth for compelling the government to disclose how that data was algorithmically generated or used in publicly relevant decisions (extensions worth considering).

Moreover, corporations have self-imposed limits on how transparent they want to be, since exposing too many details of their proprietary systems may undermine a competitive advantage (trade secrets), or leave the system open to gaming and manipulation. Furthermore, whereas transparency of data can be achieved simply by publishing a spreadsheet or database, transparency of an algorithm can be much more complex, resulting in additional labor costs both in creation as well as consumption of that information—a cognitive overload that keeps all but the most determined at bay. Methods for usable transparency need to be developed so that the relevant aspects of an algorithm can be presented in an understandable way.

Given the challenges to employing transparency as a check on algorithmic power, a new and complementary alternative is emerging. I call it algorithmic accountability reporting. At its core it’s really about reverse engineering—articulating the specifications of a system through a rigorous examination drawing on domain knowledge, observation, and deduction to unearth a model of how that system works.

As interest grows in understanding the broader impacts of algorithms, this kind of accountability reporting is already happening in some newsrooms, as well as in academic circles. At the Wall Street Journal a team of reporters probed e-commerce platforms to identify instances of potential price discrimination in dynamic and personalized online pricing. By polling different websites they were able to spot several, such as Staples.com, that were adjusting prices dynamically based on the location of the person visiting the site. At the Daily Beast, reporter Michael Keller dove into the iPhone spelling correction feature to help surface patterns of censorship and see which words, like “abortion,” the phone wouldn’t correct if they were misspelled. In my own investigation for Slate, I traced the contours of the editorial criteria embedded in search engine autocomplete algorithms. By collecting hundreds of autocompletions for queries relating to sex and violence I was able to ascertain which terms Google and Bing were blocking or censoring, uncovering mistakes in how these algorithms apply their editorial criteria.

All of these stories share a more or less common method. Algorithms are essentially black boxes, exposing an input and output without betraying any of their inner organs. You can’t see what’s going on inside directly, but if you vary the inputs in enough different ways and pay close attention to the outputs, you can start piecing together some likeness for how the algorithm transforms each input into an output. The black box starts to divulge some secrets.
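The common method can be sketched in a few lines of Python. Everything here is hypothetical: `probe` is an illustrative helper, and `toy_pricer` is a made-up stand-in for an opaque system, not the logic of any real site. The idea is to fix a baseline input, change one field at a time, and record which changes move the output.

```python
def probe(black_box, baseline, variations):
    """Vary one input field at a time and record values that change the output.

    black_box:  any callable taking keyword arguments (the opaque system).
    baseline:   dict of reference inputs.
    variations: dict mapping a field name to alternative values to try.
    """
    baseline_output = black_box(**baseline)
    findings = {}
    for field, values in variations.items():
        for value in values:
            trial = dict(baseline, **{field: value})
            output = black_box(**trial)
            if output != baseline_output:
                findings.setdefault(field, []).append((value, output))
    return findings


# Hypothetical black box: the price depends on location but not on browser.
def toy_pricer(zip_code, browser):
    return 19.99 if zip_code.startswith("9") else 17.99


findings = probe(
    toy_pricer,
    baseline={"zip_code": "10001", "browser": "firefox"},
    variations={
        "zip_code": ["94105", "60601"],
        "browser": ["chrome", "safari"],
    },
)
print(findings)  # {'zip_code': [('94105', 19.99)]}
```

In a real investigation the black box would be a live system queried many times, and the hard part is choosing which inputs to vary and controlling for everything else.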

Algorithmic accountability is also gaining traction in academia. At Harvard, Latanya Sweeney has looked at how online advertisements can be biased by the racial association of names used as queries. In searches for “black names” as opposed to “white names,” ads using the word “arrest” appeared more often for the online background check service Instant Checkmate. She thinks the disparity in the use of “arrest” suggests a discriminatory connection between race and crime. Her method, as with all of the other examples above, does point to a weakness though: Is the discrimination caused by Google, by Instant Checkmate, or simply by pre-existing societal biases? We don’t know, and correlation does not equal intention. As much as algorithmic accountability can help us diagnose the existence of a problem, we have to go deeper and do more journalistic-style reporting to understand the motivations or intentions behind an algorithm. We still need to answer the question of why.

And this is why it’s absolutely essential to have computational journalists not just engaging in the reverse engineering of algorithms, but also reporting and digging deeper into the motives and design intentions behind algorithms. Sure, it can be hard to convince companies running such algorithms to open up in detail about how their algorithms work, but interviews can still uncover details about larger goals and objectives built into an algorithm, better contextualizing a reverse-engineering analysis. Transparency is still important here too, as it adds to the information that can be used to characterize the technical system.

Despite the fact that forward thinkers like Larry Lessig have been writing for some time about how code is a lever on behavior, we’re still in the early days of developing methods for holding that code and its influence accountable. “There’s no conventional or obvious approach to it. It’s a lot of testing or trial and error, and it’s hard to teach in any uniform way,” noted Jeremy Singer-Vine, a reporter and programmer who worked on the WSJ price discrimination story. It will always be a messy business with lots of room for creativity, but given the growing power that algorithms wield in society it’s vital to continue to develop, codify, and teach more formalized methods of algorithmic accountability. In the absence of new legal measures, it may just provide a novel way to shed light on such systems, particularly in cases where transparency doesn’t or can’t offer much clarity.