### Archive

Archive for the ‘rant’ Category

## What privacy advocates get wrong

There’s a wicked irony when it comes to many privacy advocates.

They are often narrowly focused on their own individual privacy issues, but when it comes down to it they are typically super educated, well-off nerds with few revolutionary thoughts. In other words, the very people obsessing over their privacy are people who are not particularly vulnerable to the predatory attacks of either the NSA or the private companies that make use of private data.

Let me put it this way. If I’m a data scientist working at a predatory credit card firm, seeking to build a segmentation model to target the most likely highly profitable customers – those that ring up balances and pay off minimums every month, sometimes paying late to accrue extra fees – then if I am profiling a user and notice an ad blocker or some other signal of privacy concerns, chances are that becomes a wealth indicator and I leave them alone. The mere presence of privacy concerns signals that this person isn’t worth pursuing with my manipulative scheme.

If you don’t believe me, take a look at a recent Slate article written by Cyrus Nemati and entitled Take My Data Please: How I learned to stop worrying and love a less private internet.

In it he describes how he used to be privacy obsessed, for no better reason than that he liked to stick up a middle finger to those who would collect his data. I think that article should have been called something like, Well-educated white guy was a privacy freak until he realized he didn’t have to be because he’s a well-educated white guy.

He concludes that he really likes how well customized things are to his particular personality, and that shucks, we should all just appreciate the web and stop fretting.

But here’s the thing, the problem isn’t that companies are using his information to screw Cyrus Nemati. The problem is that the most vulnerable people – the very people that should be concerned with privacy but aren’t – are the ones getting tracked, mined, and screwed.

In other words, it’s silly for certain people to be scrupulously careful about their private data if they are the types of people who get great credit card offers and have a stable well-paid job and are generally healthy. I include myself in this group. I do not prevent myself from being tracked, because I’m not at serious risk.

And I’m not saying nothing can go wrong for those people, including me. Things can, especially if they suddenly lose their jobs or they have kids with health problems or something else happens which puts them into a special category. But generally speaking those people with enough time on their hands and education to worry about these things are not the most vulnerable people.

I hereby challenge Cyrus Nemati to seriously consider who should be concerned about their data being collected, and how we as a society are going to address their concerns. Recent legislation in California is a good start for kids, and I’m glad to see the New York Times editors asking for more.

Categories: data science, rant

## Intentionally misleading data from Scott Hodge of the Tax Foundation

Scott Hodge just came out with a column in the Wall Street Journal arguing that reducing income inequality is way too hard to consider. The title of his piece is Scott Hodge: Here’s What ‘Income Equality’ Would Look Like, and his basic argument is as follows.

First of all, the middle quintile already gets too much from the government as it stands. Second of all, we’d have to raise taxes to 74% for the top quintile to even stuff out. Clearly impossible, QED.

As to the first point, his argument, and his supporting data, is intentionally misleading, as I will explain below. As to his second point, he fails to mention that the top tax bracket has historically been much higher than 74%, even as recently as 1969, and the world didn’t end.

Hodge argues with data he took from a report from the CBO called The Distribution of Federal Spending and Taxes in 2006. This report distinguishes between transfers and spending. Here’s a chart to explain what that looks like, before taxes are considered and by quintile, for non-elderly households (page 5 of the report):

The stuff on the left corresponds to stuff like food stamps. The stuff in the middle is stuff like Medicaid. The stuff on the right is stuff like wars.

Here are a few things to take from the above:

1. There’s way more general spending going on than transfers.
2. Transfers are very skewed towards the lowest quintile, as would be expected.
3. If you look carefully at the right-most graph, the light green version gives you a way of visualizing how much more money the top quintile has versus the rest.

Now let’s break this down a bit further to include taxes. This is a key chart that Hodge referred to from this report (page 6 of the report):

OK, so note that in the middle chart, for the middle quintile, people pay more in taxes than they receive in transfers. On the right chart, for the middle quintile, which includes all spending, the middle quintile is about even, depending on how you measure it.
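A quick way to see why "about even, or slightly ahead" is what one should expect for the middle quintile once all spending is counted: the sketch below uses entirely made-up numbers, not CBO data. It assumes lognormal (right-skewed) household incomes, a flat tax as the mildest non-regressive case, a balanced budget, and spending spread evenly across households. The variable names and parameters are illustrative only.

```python
import random

random.seed(0)

# Hypothetical economy: right-skewed (lognormal) household incomes, sorted.
incomes = sorted(random.lognormvariate(10.5, 1.0) for _ in range(100_000))

TAX_RATE = 0.2  # flat tax: an assumption, the mildest non-regressive case

taxes = [TAX_RATE * income for income in incomes]
# Balanced budget, with spending spread evenly across households (assumption).
spending_per_household = sum(taxes) / len(incomes)


def quintile(values, q):
    """Return the q-th fifth (0 = bottom, 4 = top) of an already sorted list."""
    size = len(values) // 5
    return values[q * size:(q + 1) * size]


def spending_per_tax_dollar(q):
    """Dollars of spending received per dollar of tax paid by quintile q."""
    paid = quintile(taxes, q)
    return spending_per_household * len(paid) / sum(paid)


ratios = [spending_per_tax_dollar(q) for q in range(5)]
middle_ratio = ratios[2]

for q, ratio in enumerate(ratios):
    print(f"quintile {q + 1}: ${ratio:.2f} of spending per $1 of tax")
```

Under these assumptions the middle quintile comes out ahead simply because its average income sits below the overall mean, and the top quintile comes out behind. A progressive tax would only widen that gap.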

Now let’s go to what Hodge says in his column (emphasis mine):

Looking at prerecession data for non-elderly households in 2006 in “The Distribution of Federal Spending and Taxes in 2006,” the CBO found that those in the bottom fifth, or quintile, of the income scale received $9.62 in federal spending for every $1 they paid in federal taxes of all kinds. This isn’t surprising, since people with low incomes pay little in taxes but receive a lot of transfers.

Nor is it surprising that households in the top fifth received 17 cents in federal spending for every $1 they paid in all federal taxes. High-income households hand over a disproportionate amount in taxes relative to what they get back in spending.

What is surprising is that the middle quintile – the middle class – also got more back from government than they paid in taxes. These households received $1.19 in government spending for every $1 they paid in federal taxes.

In the first paragraph Hodge intentionally conflates the concepts of “transfers” and “spending”. He continues to do this for the next two paragraphs, and in the last sentence, it is easy to imagine a middle-quintile family paying $100 in taxes and receiving $119 in food stamps. This is of course not true at all.

What’s nuts about this is that it’s mathematically equivalent to complaining that half the population is below median intelligence. Duh.

Since we have a skewed distribution of incomes, and therefore a skewed distribution of tax receipts as well as transfers, then in the context of a completely balanced budget, we would expect the middle quintile – which has a below-mean average income – to pay slightly less than the government spends on them. It’s a mathematical fact as long as our federal tax system isn’t regressive, which it’s not.

In other words, this guy is just framing stuff in a “middle class is lazy and selfish, what could rich people possibly be expected to do about that?” kind of way.

Who is this guy anyway? Turns out that Hodge is the President of the Tax Foundation, which touts itself as “nonpartisan” but which has gotten funding from Big Oil and the Koch brothers. I guess it’s fair to say he has an agenda.

Categories: modeling, news, rant

## A visit to Fair Foods in winter

It’s been a few days since I last posted. The reason was my trip to WPI for my talk (slides from my talk available here), and then on to Boston, where I stayed with my good friends over at Fair Foods in Dorchester, near Fields Corner.

I mentioned Fair Foods before, for example in this post from two and a half years ago. I worked with the director Nancy in high school back in 1988 and 1989, when me and my sometimes guest blogger Becky would go work a couple of days a week. Here’s a picture I stole from the site with Becky in overalls:

We’d drive at 6am with Nancy and her beat up old box truck to the Chelsea Produce Market and ask for donations of pallets of vegetables that were too old to be sold to supermarkets but still fresh enough to be eaten that same day. We’d also grab similarly oldish bread from an Arnold’s Bread bakery in Cambridge and then we’d distribute the food at “dollar bag” sites, raking in less than the amount of money we’d spent on gas and insurance for the truck.

The program is still going, scraping by with sometime grants and contributions, many from ex-volunteers like me (feel free to send a contribution yourself – a check made out to “Fair Foods” and mailed to PO Box 220168, Dorchester, MA 02122 would be very welcome). And the hard working people there have my undying love and admiration for their incredible commitment and work ethic.

To give you some idea, they live in an old drafty Victorian and heat their house with woodstoves. All I can say about this past weekend is thank god for union suits and wool socks.

But here’s the thing, you get a pretty ground-level view of hardship and poverty working in a program like that, especially when you’ve done it for more than 25 years, and especially when you see increasingly long lines of people willing to wait for vegetables and bread in bitterly cold weather. Business is booming this winter.

Many of the customers of Fair Foods are old friends by now, they’ve been coming weekly to various sites for many years to feed their children and their grandchildren. Many of them are immigrants with very little money, and the $2 it now costs for a big bag of vegetables and fruit is a great deal.

I guess my point is this. I worked for Fair Foods back in the crack epidemic of the late 1980s and the early 1990s, which was a hellish time for Dorchester and of course other parts of the country. But nowadays, when the crime rate is so much lower, we’re seeing another kind of hell. It’s a lot quieter.

It’s incredibly sad to see how much more demand there is for salvaged food now than there was 25 years ago, and how many of those old beautiful Victorian family houses are abandoned or at risk of foreclosure, and how few cars there are on the street compared to then. And most especially, how many of the kids I used to play with on the street are now in prison.

Before I left Saturday I made a delicious soup out of the vegetables that Jason and Liz had collected from the Chelsea Market, and I made banana bread that the banana guy had given them. A little boy from the neighborhood who came to talk to Nancy about his report card ate about half of that banana bread in one sitting. A small attempt to try to feed the people who work so hard to feed other people.

It makes me wonder what kind of country we’ve created where people are so hungry, we’re reducing food stamps, and Jamie Dimon is getting an extra big bonus. Where is the justice in that?

Categories: Becky Jaffe, rant

## If it’s hocus pocus then it’s not math

A few days ago there was a kerfuffle over this “numberphile” video, which was blogged about in Slate here by Phil Plait in his “Bad Astronomy” column, with a followup post here with an apology and a great quote from my friend Jordan Ellenberg.

The original video is hideous and should never have gotten attention in the first place. I say that not because the subject couldn’t have been done well – it could have, for sure – but because it was done so poorly that it ends up being destructive to the public’s most basic understanding of math and in particular positive versus negative numbers. My least favorite line from the crappy video:

I was trying to come up with an intuitive reason for this and I just couldn’t. You have to do the mathematical hocus pocus to see it.

What??

Anything that is hocus pocus isn’t actually math. And people who don’t understand that shouldn’t be making math videos for public consumption, especially ones that have MSRI’s logo on them and get written up in Slate. Yuck!

I’m not going to just vent about the cultural context, though, I’m going to mention what the actual mathematical object of study was in this video. Namely, it’s an argument that claims to “prove” that we have the following identity:

$1 + 2 + 3 + 4 + \dots = - \frac{1}{12}.$

Wait, how can that be? Isn’t the left hand side positive and the right hand side negative?!

This mathematical argument is familiar to me – in fact it is very much along the lines of stuff we sometimes cover at HCSSiM, the math summer program where I occasionally teach (see my notes from 2012 here). But in the case of HCSSiM, we do it quite differently. Specifically, we use it as a demonstration of flawed mathematical thinking. Then we take note and make sure we’re more careful in the future.

If you watch the video, you will see the flaw almost immediately. Namely, it starts with the question of what the value is of the infinite sum

$1 -1 + 1 -1 + \dots.$

But here’s the thing, that doesn’t actually have a value. That is, it doesn’t have a value until you assign it a value, which you can do, but then you absolutely positively must explain how you’ve done so. Instead of that explanation, the guy in the video just acts like it’s obvious and uses that “fact,” along with a bunch of super careless moving around of terms in infinite sums, to infer the above outrageous identity.

To be clear, sometimes infinite sums do have pretty intuitive and reasonable values (even though you should be careful to acknowledge that they too are assigned rather than “true”). For example, any geometric series where each successive term gets smaller has an actual “converging sum”. The most canonical example of this is the following:

$1/2 + 1/4 + 1/8 + \dots + 1/2^k + \dots = 1.$

What’s nice about this sum is that it is naively plausible. Our intuition from elementary school is corroborated when we think about eating half a cake, then another quarter, and then half of what’s left, and so on, and it makes sense to us that, if we did that forever (or if we did that increasingly quickly) we’d end up eating the whole cake.

This concept has a name, and it’s convergence, and it jibes with our sense of what would happen “if we kept doing stuff forever (again at possibly increasing speed).” The amounts we’ve measured on the way to forever are called partial sums, and we make sure they converge to the answer. In the example above the partial sums are $1/2, 3/4, 7/8,$ and so on, and they definitely converge to 1.

There’s a mathematical way of defining convergence of series like this that the geometric series follows but that the $1-1+1-1 \dots$ series does not. Namely, you guess the answer, and to make sure you’ve got the right one, you make sure that all of the partial sums are very very close to that answer if you go far enough, for any definition of “very very close.”

So if you want it to get within 0.00001, there’s a number N so that, after the Nth partial sum, all partial sums are within 0.00001 of the answer. And so on.
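Here is a minimal sketch of that check for the cake series, using exact fractions so no rounding gets in the way. The function names are made up for illustration, and the shortcut of stopping at the first partial sum within tolerance is only valid because the gaps $1/2^n$ keep shrinking.

```python
from fractions import Fraction


def partial_sums(n_terms):
    """Exact partial sums of 1/2 + 1/4 + ... + 1/2**n_terms."""
    total = Fraction(0)
    sums = []
    for k in range(1, n_terms + 1):
        total += Fraction(1, 2**k)
        sums.append(total)
    return sums


def first_index_within(eps, guess=Fraction(1), max_terms=200):
    """Smallest n whose partial sum is within eps of the guessed answer.

    For this series the gap to 1 is 1/2**n, which only shrinks, so every
    later partial sum is within eps as well.
    """
    for n, s in enumerate(partial_sums(max_terms), start=1):
        if abs(guess - s) < eps:
            return n
    return None


first_three = partial_sums(3)                        # 1/2, 3/4, 7/8
n_needed = first_index_within(Fraction(1, 100000))   # tolerance 0.00001
print(first_three, n_needed)
```

For a tolerance of 0.00001 this finds N = 17, since $1/2^{17}$ is the first gap below that tolerance.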

Notice that if you take the partial sums of the $1-1+1-1 \dots$ series you get the sequence $1, 0, 1, 0,1,0,1, \dots,$ which doesn’t get closer and closer to anything. That’s another way of saying that there is no naively plausible value for this infinite sum.
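The same kind of check fails for this series no matter what answer you guess. A small sketch, with a hypothetical helper `worst_gap` that measures how far the guess is from the partial sums it is supposed to be close to:

```python
from itertools import accumulate, cycle, islice

# First ten terms of 1 - 1 + 1 - 1 + ...
terms = list(islice(cycle([1, -1]), 10))
grandi_partial_sums = list(accumulate(terms))   # 1, 0, 1, 0, ...


def worst_gap(guess, sums):
    """Largest distance between a guessed answer and any of the partial sums."""
    return max(abs(guess - s) for s in sums)


print(grandi_partial_sums)
for guess in (0, 0.5, 1):
    print(guess, worst_gap(guess, grandi_partial_sums))
```

Because the partial sums keep visiting both 0 and 1, any guess stays at least 1/2 away from one of them, so no tolerance smaller than 1/2 can ever be met.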

As for the first infinite sum we came across, the $1 +2 + 3 + 4 +\dots,$ that does have a naively plausible value, which we call “infinity.” Totally cool and satisfying to your intuition that you worked so hard to achieve in high school.

But here’s the thing. Mathematicians are pretty clever, so they haven’t stopped there, and they’ve assigned a value to the infinite sum $1-1+1-1 \dots$ in spite of these pesky intuition issues, namely $\frac{1}{2}$, and in a weird mathematical universe of their construction, which is wildly useful in some contexts, that value is internally consistent with other crazy-ass things. One of those other crazy-ass things is the original identity $1 + 2 + 3 + 4 + \dots = - \frac{1}{12}.$
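The post doesn’t say which construction is meant, so take this as an assumption: one standard way to assign the value $\frac{1}{2}$ is Cesàro summation, where you average the partial sums and ask whether the averages converge. A minimal sketch (the function name is illustrative):

```python
from fractions import Fraction
from itertools import accumulate, cycle, islice


def cesaro_means(terms):
    """Running averages of the partial sums of a series (Cesàro means)."""
    partial = list(accumulate(terms))   # 1, 0, 1, 0, ...
    running = accumulate(partial)       # 1, 1, 2, 2, 3, 3, ...
    return [Fraction(total, n) for n, total in enumerate(running, start=1)]


means = cesaro_means(list(islice(cycle([1, -1]), 1000)))
print(means[:4], means[-1])
```

The averages settle down to 1/2 even though the partial sums themselves never do. Averaging does not rescue $1 + 2 + 3 + 4 + \dots$, though: its partial sums and their averages both blow up, and the $-\frac{1}{12}$ comes from a different, more delicate assignment (zeta-function regularization is the usual one) that this sketch does not attempt.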

[Note: what would be really cool is if a mathematician made a video explaining the crazy-ass universe and why it's useful and in what contexts. This might be hard and it's not my expertise but I for one would love to watch that video.]

That doesn’t mean the identity is “true” in any intuitively plausible sense of the word. It means that mathematicians are scrappy.

Now here’s my last point, and it’s the only place I disagree somewhat (I think) with Jordan in his tweets. Namely, I really do think that the intuitive definition is qualitatively different from what I’ve termed the “crazy-ass” definition. Maybe not in a context where you’re talking to other mathematicians, and everyone is sufficiently sophisticated to know what’s going on, but definitely in the context of explaining math to the public where you can rely on number sense and (hopefully!) a strong intuition that positive numbers can’t suddenly become negative numbers.

Specifically, if you can’t make any sense of it, intuitive or otherwise, and if you have to ascribe it to “mathematical hocus pocus,” then you’re definitely doing something wrong. Please stop.

Categories: math, math education, rant

## Parents fighting back against sharing children’s data with InBloom

There is a movement afoot in New York (and other places) to allow private companies to house and mine tons of information about children and how they learn. It’s being touted as a great way to tailor online learning tools to kids, but it also raises all sorts of potential creepy modeling problems, and one very bad sign is how secretive everything is in terms of privacy issues. Specifically, it’s all being done through school systems and without consulting parents.

In New York it’s being done through InBloom, which I already mentioned here when I talked about big data and surveillance. In that post I related an EducationNewYork report which quoted an official from InBloom as saying that the company “cannot guarantee the security of the information stored … or that the information will not be intercepted when it is being transmitted.”

The issue is super important and timely, and parents have been left out of the loop, with no opt-out option, and are actively fighting back, for example with this petition from MoveOn (h/t George Peacock). And although the InBloomers claim that no data about their kids will ever be sold, that doesn’t mean it won’t be used by third parties for various mining purposes and possibly marketing – say for test prep tools. In fact that’s a major feature of InBloom’s computer and data infrastructure, the ability for third parties to plug into the data. Not cool that this is being done on the downlow.

Who’s behind this? InBloom is funded by the Bill & Melinda Gates Foundation, and the operating system for InBloom is being developed by the Amplify division (formerly Wireless Generation) of Rupert Murdoch’s News Corp. More about the Murdoch connection here.

Wait, who’s paying for this? Besides Gates and Murdoch, New York has spent $50 million in federal grants to set up the partnership with InBloom. And it’s not only New York that is pushing back, according to this Salon article:

InBloom essentially offers off-site digital storage for student data – names, addresses, phone numbers, attendance, test scores, health records – formatted in a way that enables third-party education applications to use it. When inBloom was launched in February, the company announced partnerships with school districts in nine states, and parents were outraged. Fears of a “national database” of student information spread. Critics said that school districts, through inBloom, were giving their children’s confidential data away to companies who sought to profit by proposing a solution to a problem that does not exist. Since then, all but three of those nine states have backed out.

Finally, according to this nydailynews article, Bill de Blasio is coming out on the side of protecting children’s privacy as well. That’s a good sign, let’s hope he sticks with it.

I’m not against using technology to learn, and in fact I think it’s inevitable and possibly very useful. But first we need to have a really good, public discussion about how this data is being shared, controlled, and protected, and that simply hasn’t happened. I’m glad to see parents are aware of this as a problem.

Categories: data science, modeling, news, rant

## What does a really efficient market look like?

The raison d’être of hedge funds is to make the markets efficient. Or at least that’s one of the raisons d’être, the others being 1) to get rich and 2) to leave early on Fridays in the summer (resp. winter) to get a jump on traffic to the Hamptons (resp. ski area, possibly in Kashmir). And although having efficient markets sounds like a great thing, it makes sense to ask what that would look like from the perspective of a non-insider.

This recent Wall Street Journal article on high-tech snooping does a pretty good job setting the tone here. First, the kind of thing they’re doing:

Genscape is at the vanguard of a growing industry that employs sophisticated surveillance and data-crunching technology to supply traders with nonpublic information about topics including oil supplies, electric-power production, retail traffic and crop yields.

Next, who they’re doing it for:

The techniques, which are perfectly legal, represent the latest advance in the longtime Wall Street practice of searching for every possible trading advantage. But the high cost of much of the new information – Genscape’s oil-supply report costs $90,000 a year – means that some forms of trading are becoming even more the province of firms with substantial resources.

Let’s put these two things together from the perspective of the public. The market is getting information from hidden cameras and sensors, and all that information is being fed to “the market” via proprietary hedge funds via channels we will never tap into. The end result is that the prices of commodities are being adjusted to real-world events more and more quickly, but these are events that are not truly known to the real world.

[Aside: I'm going to try to avoid talking about the "true price" of things like gas, because I think that's pretty much a fool's errand. In any case, let me just say that, in addition to the potentially realtime sensor information that goes into a commodity's price, we also have people trading on it because they are adjusting their exposure to some other historically correlated or anti-correlated instrument, or because they've decided to liquidate their books, or because they've decided the Fed has changed its macroeconomic policy, or because Spain needs to deal with its bank problems, or because someone wants to take money out of the market to rent their summer house in the Hamptons. In other words, I'm not ready to argue that we're getting close to the "true price" of gas here. It's just tradable information like any other.]

I am now prepared, as you hopefully are as well, to question what good this all does for people like us, who are not privy to the kind of expensive information required to make these trades. From our perspective, nothing happens, the price fluctuates, and the market is deemed efficient. Is this actually an improvement over the alternative version where something happens, and then the price adjusts? It’s an expensive arms race, taking up vast resources, where things have only become more opaque.

How vast are those resources? Having worked in finance, I know the answer is a shit-ton, if it is profitable in a short-term edgy kind of way. Just as those guys dug a hole through mountains to make the connection between New York and Chicago a few milliseconds faster, they will go to any length to get the newest info on the market, as long as it is deemed to have a profitable edge in some time frame – i.e. the amount of time it will take a flood of competitors to do the same thing.

Just as there’s a kind of false myth that most of the web is porn, I’d like to perpetuate a new somewhat false myth that most data gathering and mining happens for the benefit of trading. And if that’s false now, let’s talk about it again in 100 years, when the market for celebrities is mature, and you can make money shorting a bad marriage.

Categories: finance, modeling, rant

## Do we really want elite youth to get more elite?

I don’t know if you guys read this recent New York Times editorial entitled Even Gifted Students Can’t Keep Up: In Math and Science, the Best Fend for Themselves.

In it, they claim there’s some kind of crisis going on in this country for smart kids (defined as good test-takers). Mostly their evidence for this is that super good test takers aren’t as prevalent here as in other countries. Turns out we’re in the middle of the pack in terms of super scorers. From the article:

On the 2012 Program for International Student Assessment test, the most recent, 34 of 65 countries and school systems had a higher percentage of 15-year-olds scoring at the advanced levels in mathematics than the United States did.

Why is this a problem? As far as I can see they’ve come up with two reasons.

First, it’s “bad for American competitiveness,” whatever that means. Last time I checked we were still pretty dominant in various ways in terms of technology and science, and there are still plenty of very well-educated young people trying desperately to get visas to enter or stay in this country.

As an aside: it’s a super interesting question to think about how we, as a country, are increasingly ignorant about how our technology works, because so much technical knowledge has been off-shored. But that’s not the crisis these guys are addressing.

Second, it’s bad for the smart kids in this country, because “when the brightest students are not challenged academically, they lose steam and check out.”

I’ll pause in my summary of their article to make the following point. If that’s true, if bright kids who aren’t academically challenged at school start checking out more and more, then it makes just as much sense to me to see if there’s something they can check out towards.

In other words, what else is there for bright teenagers to do besides school? I’ll speak as a former bright teenager. When I lost interest in school, I got a lot of odd jobs in town cleaning houses and raking lawns, and then I used the money to buy lots of softcover books from a local bookstore. I don’t know why I didn’t just take them out of the library, where I also worked. It just didn’t seem as cool as owning my very own Brothers Karamazov.

I learned a lot with my odd jobs in high school, which included being a part-time secretary, a barista, a math tutor, and working on a truck at the New England Produce Center. In fact I learned way more about how the world worked than I would have if I’d followed the advice of this editorial, which was to take lots more AP classes and then enter college when I was 14.

My theory is that, instead of obsessing over math scores in standardized tests, we should concentrate on allowing our children to enrich their lives with adventures and experiences that they come up with and that are reasonably safe. So let’s start by encouraging widespread internships for younger kids, and not just minimum wage jobs at fast food joints. And not just for super test-takers either. Enrichment happens when kids learn about stuff that’s outside their usual rhythm and when there are no adults scripting their activities and telling them what to do or how many laps to swim.

Notice my emphasis on letting kids choose stuff. What drives me nuts just as much as the idea of further separating and isolating and venerating great test-takers, which as far as I’m concerned is the opposite of how you should treat future successful people, is the idea that there should be such a well-defined funnel for children at all.

Yes, kids should all go to school and learn basic things. But the idea that, just because someone’s good at tests they should be treated as if they’re already running the Fed only increases the weird worshippy aspect of how our culture treats math nerds.

Plus, it’s a bizarre time to come up with this idea, considering how many online and live resources there are for nerd kids now compared to when I was a kid. If I’m a nerd teenager now, I can find plenty of ways to share nerdy questions and learn nerdy things online if I decide not to work in a coffee shop.

Finally, let me just take one last swipe at this idea from the perspective of “it’s meritocratic therefore it’s ok”. It’s just plain untrue that test-taking actually exposes talent. It’s well established that you can get better at these tests through practice, and that richer kids practice more. So the idea that we’re going to establish a level playing field and find minority kids to elevate this way is rubbish. If we do end up focusing more on the high end of test-takers, it will be completely dominated by the usual suspects.

In other words, this is a plan to make elite youth even more elite. And I don’t know about you, but my feeling is that’s not going to help our country overall.

Categories: news, rant

## International trade agreements and big money bullying #OWS

The New York Times just put out an amazing and outrageous story, entitled Tobacco Industry Tactics Limit Poorer Nations’ Smoking Laws and written by Sabrina Tavernise.

In it she describes the bullying tactics tobacco companies use against small countries over internal health regulations that aim to protect their citizenry from cancer. From the article:

In Africa, at least four countries — Namibia, Gabon, Togo and Uganda — have received warnings from the tobacco industry that their laws run afoul of international treaties, said Patricia Lambert, director of the international legal consortium at the Campaign for Tobacco Free Kids.

“They’re trying to intimidate everybody,” said Jonathan Liberman, director of the McCabe Center for Law and Cancer in Australia, which gives legal support to countries that have been challenged by tobacco companies. In Namibia, the tobacco industry has said that requiring large warning labels on cigarette packages violates its intellectual property rights and could fuel counterfeiting.

• This happens because, in order to protect companies from being taken over by foreign nations, and in the name of free trade, it’s now possible for companies to sue countries directly. That’s what the tobacco industry is doing to small countries without the means to fight back.
• This is exactly the kind of thing that some people like Yves Smith have warned the TPP is going to do with financial regulation and other stuff. So imagine large companies or industries suing the United States or other nations for regulation that would “harm trade” or “violate their intellectual property rights.”
• In fact, it’s not going too far to say that the proliferation of these kinds of treaties is a serious threat to national sovereignty. Yves makes this case here.
• Think about it this way. It’s kind of a supranational Citizens United, in that only companies and industries that have enough lawyer power can get their way. And many companies easily have more resources than many countries. Think about Facebook and Google and their lobbying efforts over data privacy laws in Europe already. Now think how that battle might look when it’s against Namibia.
• Already, from the article, we saw that Uruguay was only able to fight back against Philip Morris because Bloomberg’s foundation helped them with the legal battle. And I’m glad Bloomberg helped, but if you count up all the money that will go to bullying and compare that to all the money available to help out the bullied, you quickly come to a sad conclusion.
• Once again, we have to remember that the TPP is being negotiated in secret, among a bunch of nations many of whom claim to be democratic.

Categories: #OWS, rant

## “People analytics” embeds old cultural problems in new mathematical models

Today I’d like to discuss a recent article from the Atlantic entitled “They’re watching you at work” (hat tip Deb Gieringer).

In the article they describe what they call “people analytics,” which refers to the new suite of managerial tools meant to help find and evaluate employees of firms. The first generation of this stuff happened in the 1950s, and relied on stuff like personality tests. It didn’t seem to work very well and people stopped using it.

But maybe this new generation of big data models can be super useful? Maybe they will give us an awesome way of throwing away people who won’t work out more efficiently and keeping those who will?

Here’s an example from the article. Royal Dutch Shell sources ideas for “business disruption” and wants to know which ideas to look into. There’s an app for that, apparently, written by a Silicon Valley start-up called Knack.

Specifically, Knack had a bunch of the ideamakers play a video game, and they presumably also were given training data on which ideas historically worked out. Knack developed a model and was able to give Royal Dutch Shell a template for which ideas to pursue in the future based on the personality of the ideamakers.

From the perspective of Royal Dutch Shell, this represents huge timesaving. But from my perspective it means that whatever process the dudes at Royal Dutch Shell developed for vetting their ideas has now been effectively set in stone, at least for as long as the algorithm is being used.

I’m not saying they won’t save time, they very well might. I’m saying that, whatever their process used to be, it’s now embedded in an algorithm. So if they gave preference to a certain kind of arrogance, maybe because the people in charge of vetting identified with that, then the algorithm has encoded it.

One consequence is that they might very well pass on really excellent ideas that happened to have come from a modest person – no discussion necessary on what kind of people are being invisibly ignored in such a set-up. Another consequence is that they will believe their process is now objective because it’s living inside a mathematical model.

The article compares this to the “blind auditions” for orchestras example, where people are kept behind a curtain so that the listeners don’t give extra consideration to their friends. Famously, the consequence of blind auditions has been way more women in orchestras. But that’s an extremely misleading comparison to the above algorithmic hiring software, and here’s why.

In the blind auditions case, the people measuring the musician’s ability have committed themselves to exactly one clean definition of readiness for being a member of the orchestra, namely the sound of the person playing the instrument. And they accept or deny someone, sight unseen, based solely on that evaluation metric.

Whereas with the idea-vetting process above, the training data consisted of “previous winners” who presumably had to go through a series of meetings and convince everyone in the meeting that their idea had merit, and that they could manage the team to try it out, and all sorts of other things. Their success relied, in other words, on a community’s support of their idea and their ability to command that support.

In other words, imagine that, instead of listening to someone playing trombone behind a curtain, their evaluation metric was to compare a given musician to other musicians that had already played in a similar orchestra and, just to make it super success-based, had made first seat.

Then you’d have a very different selection criterion, and a very different algorithm. It would be based on all sorts of personality issues, and community bias and buy-in issues. In particular you’d still have way more men.

The fundamental difference here is one of transparency. In the blind auditions case, everyone agrees beforehand to judge on a single transparent and appealing dimension. In the black box algorithms case, you’re not sure what you’re judging things on, but you can see when a candidate comes along that is somehow “like previous winners.”

One of the most frustrating things about this industry of hiring algorithms is how unlikely it is to actively fail. It will save time for its users, since after all computers can efficiently throw away “people who aren’t like people who have succeeded in your culture or process” once they’ve been told what that means.

The most obvious consequence of using this model, for the companies that use it, is that they’ll get more and more people just like the people they already have. And that’s surprisingly unnoticeable for people in such companies.
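To make that concrete, here’s a toy sketch in Python. Everything in it is made up: the two traits, the committee rule that rewards arrogance, and the “model,” which is nothing more than the average profile of previous winners. It’s not how Knack or anyone else actually builds these things, just the simplest version of “find people like the people who already won.”

```python
import random

random.seed(0)  # fixed seed so the toy example is repeatable

TRAITS = ("idea_quality", "arrogance")

def make_person():
    # Two hypothetical traits, drawn independently of each other.
    return {t: random.random() for t in TRAITS}

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical old process: the committee rewards arrogance
# far more than it rewards the quality of the idea.
def committee_approves(p):
    return 0.2 * p["idea_quality"] + 0.8 * p["arrogance"] > 0.6

past = [make_person() for _ in range(2000)]
winners = [p for p in past if committee_approves(p)]

# The "model": the average trait profile of previous winners.
profile = {t: mean([p[t] for p in winners]) for t in TRAITS}

def like_previous_winners(p):
    # Higher score means closer to the average previous winner.
    return -sum((p[t] - profile[t]) ** 2 for t in TRAITS)

def blind_audition(p):
    # One agreed-upon, visible dimension.
    return p["idea_quality"]

candidates = [make_person() for _ in range(2000)]

def top(score, n=200):
    return sorted(candidates, key=score, reverse=True)[:n]

model_picks = top(like_previous_winners)
blind_picks = top(blind_audition)

for label, picks in (("look-alike model", model_picks), ("blind audition", blind_picks)):
    print(label,
          "mean arrogance:", round(mean([p["arrogance"] for p in picks]), 2),
          "mean idea quality:", round(mean([p["idea_quality"] for p in picks]), 2))
```

In this toy world the look-alike model ends up choosing noticeably more arrogant people than the blind score does, and with lower idea quality, simply because that’s what the old committee rewarded. Nothing in the model’s output tells you that’s what it learned.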

My conclusion is that these algorithms don’t make things objective, they make things opaque. And they embed our old cultural problems in new mathematical models, giving us a false badge of objectivity.

Categories: data science, modeling, rant

## People don’t get fired enough

This might surprise some of you – or not, I’m not sure. But one of the most satisfying things about leaving academia and the tenure system and going into industry is how, at least in the ideal situation, you can get fired for not doing your job.

In fact, one of the reasons I decided to leave academia is that I really thought some of my colleagues weren’t doing right by the undergraduates, and the frustrating thing was that there was essentially no way to force them to start. Tenure has great aspects and not-so-great aspects, and a total lack of leverage is not a great one. I feel for deans sometimes.

Here’s the dirty little secret of lots of industry jobs, though: lots of times people also don’t get fired when they should. And sometimes it’s super awful bullies who yell and scream and act inappropriately but also pull in amazing sales numbers. There are things like that, of course. That’s the example of how they don’t abide by the alleged social contract but they perform on the bottom line. Social contracts are hard to quantify and somewhat squishy. You see people getting away with stuff because they’re rainmakers or higher ups.

But there are also plenty of examples of people just not doing their job, and having super awful attitudes, or even just completely apathetic attitudes, and for whatever reason they don’t get fired. This demoralizes and irritates and distracts everyone around them, because they all resent the free-rider.

Plus, retaining people who should by all accounts get fired makes the veneer of the kool-aid drinking camaraderie even more flimsy and scrutinizable – what’s so great about working here if people can just slack off and not care? Why do I give two shits about this project anyway? How does this project fit into the larger scheme of things? Maybe that scrutiny is a good thing – I engage in it myself – but you don’t want everyone thinking that all the time.

Here’s the thing, before you think I’m super vicious and mean to want people to get fired. These people I’m talking about are generally high skilled and temporarily depressed. They’re in the wrong job. And once fired, they will find another job, which will hopefully be a better one for them. I’m not saying that nobody will ever end up jobless and homeless, but very few, and moreover there are plenty of jobless and homeless people who would be psyched to do that job really well (putting aside how difficult it is for homeless people to get seriously considered for a job).

And I’m not saying you fire people out of the blue. You definitely need to tell people they’re not performing well (or that they are) and keep them in the feedback loop on whether things are working out. But in my experience people who deserve to get fired totally know it and can’t believe their luck that they’ve not been fired yet.

To conclude, I’m going on record saying I kind of agree with Jack Welch on this issue in a way I never thought I would.

Categories: musing, rant

## I’m already fat so I may as well be smart

I seem to be in a mood this week for provocative posts about body image and appearance (maybe this is what happens when I skip an Aunt Pythia column). Apologies to people who came for math talk.

I just wanted to mention something positive about the experience of being fat all my life, but especially as a school kid. Because just to be clear, this isn’t a phase. I’ve been pudgy since I was 2 weeks old. And overall it kind of works for me, and I’ll say why.

Namely, being a fat school kid meant that I was so uncool, so outside of normal social activity with boys and the like, that I was freed up to be as smart and as nerdy as I wanted, with very little stress about how that would “look”. You’re already fat, so why not be smart too? You’re not doing anything else, nobody’s paying attention to you, and there’s nothing to gossip about, so might as well join the math team.

It’s really a testament to both the pressure to be thin and the pressure to conform intellectually, i.e. not be a nerd, when you’re a young girl: they are both intense and super unpleasant. The happy truth is, one can be cover for the other. More than that, really: being fat (or “overweight” for people who are squeamish about the word “fat”) has opened up many doors that I honestly think would have, or at least could have, remained shut had I been more socially acceptable.

Going back to dress code at work for a moment: while people claim that corporate dress codes are meant to keep our minds off of sex, that is clearly a huge lie when it comes to many categories of women’s work clothes. Who are we kidding? The mere fact that many women wear high heels to work kind of says it all. And that’s fine, but let’s freaking acknowledge it.

On the other hand, it’s pretty hard to look sexy in a plus-sized suit (although not impossible), and the idea of high heels at work is just nuts. This ends up being a weirdly good thing for me, though: people take me more seriously because I have taken myself out of the sex game altogether – or at least the traditional sex game.

By the way, I’m not saying all fat women have the same perspective on it. I’m lucky enough to have figured out pretty early on how to separate other people’s projected feelings about my body from my own feelings. I am an observer of fat hatred, in other words. That doesn’t make me entirely insulated but it does give me one critical advantage: I have a lot of time on my hands to do stuff that I might otherwise spend fretting about my body.

It also might help partly explain why some girls get on the math team and others don’t. Being fat is something you don’t have control over (the continuing and damaging myth that each person does have control over it notwithstanding) but joining the math team is something you do have control over. And if you aren’t already excluded for some other reason (being fat is one but by no means the only way this could happen of course), you might not want to start that whole thing intentionally. Just a theory.

Categories: rant, women in math

## One reason corporate culture sucks for women

Am I the only person offended by the recent wave of articles wherein “senior women” at corporate offices are going around telling “younger women” about the appropriate dress code?

For example, here’s the beginning of a WSJ piece on just that subject:

Clothes may make the man. Can they undo the woman?

When female employees at Frontier Communications Corp. show up at its headquarters in very short skirts, sweatpants or sneakers, Chief Executive Maggie Wilderotter sometimes pulls them aside for a quick, private chat on dressing for success.

“I want women to be paid attention to for what they say–and not how they look,” explains Ms. Wilderotter.

Later in the article they explain why this is ok:

Women face more pitfalls because they have more clothing choices than men. And because male bosses fear being accused of sexual harassment, it usually falls to female supervisors to confront an associate about her attire.

This is one reason I hate corporate jobs. And yes, it’s because I come from academia and because I’m essentially a hippie, but seriously, why do we need so much policing? Why can’t people just leave each other alone to express themselves? It’s also a double standard:

Rosalind Hudnell, human resources vice president of Intel Corp., occasionally intervenes when she sees young female staffers clad unprofessionally, even though Intel staffers often wear shorts and jeans.

It’s just another in a long list of things you are scrutinized on if you’re a woman. In addition to whether you are a good mother, a feminine-enough-without-being-too-feminine employee, and, as a tertiary issue, if that, whether you actually do your job well. Fuck this.

Question for you readers: what does it really mean that these “senior women” are taking it upon themselves to scrutinize and criticize young women? Am I wrong, is it actually generous? Or is it some kind of hazing thing? Or is it a media invention that doesn’t actually happen?

Categories: rant

## Alan Greenspan still doesn’t get it. #OWS

Yesterday I read Alan Greenspan’s recent article in Foreign Affairs magazine (hat tip Rhoda Schermer). It is entitled “Never Saw It Coming: Why the Financial Crisis Took Economists By Surprise,” and for those of you who want to save some time, it basically goes like this:

I’ll admit it, the macroeconomic models that we used before the crisis failed, because we assumed financial firms behaved rationally. But now there are new models that assume predictable irrational behavior, and once we add those bells and whistles onto our existing models, we’ll be all good. Y’all can start trusting economists again.

Here’s the thing that drives me nuts about Greenspan. He is still talking about financial firms as if they are single people. He just didn’t really read Adam Smith’s Wealth of Nations, or at least didn’t understand it, because if he had, he’d have seen that Adam Smith argued against large firms in which the agendas of the individuals ran counter to the agenda of the company they worked for.

If you think about individuals inside the banks, in other words, then their individual incentives explain their behavior pretty damn well. But Greenspan ignores that and still insists on looking at the bank as a whole. Here’s a quote from the piece:

Financial firms accepted the risk that they would be unable to anticipate the onset of a crisis in time to retrench. However, they thought the risk was limited, believing that even if a crisis developed, the seemingly insatiable demand for exotic financial products would dissipate only slowly, allowing them to sell almost all their portfolios without loss. They were mistaken.

Let’s be clear. Financial firms were not “mistaken”, because legal contracts can’t think. As for the individuals working inside those firms, there was no such assumption about a slow exhale. Everyone was furiously getting their bonuses pumped up while the getting was good. People on the inside knew the market for exotic financial products would blow at some point, and that their personal risks were limited, so why not make systemic risk worse until then.

As a mathematical modeler myself, it bugs me to try to put a mathematical band-aid on an inherently failed model. We should instead build a totally new model, or even better remove the individual perverted incentives of the market using new rules (I’m using the word “rules” instead of “regulations” because people don’t hate rules as much as they hate regulations).

Wouldn’t it be nice if the agendas of the individuals inside a financial firm were more closely aligned with the financial firm? And if it was over a long period of time instead of just until the bonus day? Not impossible.

And, since I’m an occupier, I get to ask even more. Namely, wouldn’t it be even nicer if that agenda was also shared by the general public? Doable!

Mr. Greenspan, there are ways to address the mistake you economists made and continue to make, but they don’t involve fancier math models from behavioral economics. They involve really simple rule changes and, generally speaking, making finance much more boring and much less profitable.

Categories: #OWS, finance, modeling, rant

I had a great time at Harvard Wednesday giving my talk (prezi here) about modeling challenges. The audience was fantastic and truly interdisciplinary, and they pushed back and challenged me in a great way. I’m glad I went and I’m glad Tess Wise invited me.

One issue that came up is something I want to talk about today, because I hear it all the time and it’s really starting to bug me.

Namely, the fallacy that people, especially young people, are “happy to give away their private data in order to get the services they love on the internet”. The actual quote came from the IBM guy on the congressional subcommittee panel on big data, which I blogged about here (point #7), but I’ve started to hear that reasoning more and more often from people who insist on side-stepping the issue of data privacy regulation.

Here’s the thing. It’s not that people don’t click “yes” on those privacy forms. They do click yes, and I acknowledge that. The real problem is that people generally have no clue what it is they’re trading.

In other words, this idea of an omniscient market participant with perfect information making a well-informed trade, which we’ve already seen is not the case in the actual market, is doubly or triply not the case when you think about young people giving away private data for the sake of a phone app.

Just to be clear about what these market participants don’t know, I’ll make a short list:

• They probably don’t know that their data is aggregated, bought, and sold by Acxiom, which they’ve probably never heard of.
• They probably don’t know that Facebook and other social media companies sell stuff about them even if their friends don’t see it and even though it’s often “de-identified”. Think about this next time you sign up for a service like “Bang With Friends,” which works through Facebook.
• They probably don’t know how good algorithms are getting at identifying de-identified information.
• They probably don’t know how this kind of information is used by companies to profile users who ask for credit or try to get a job.
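To give a feel for the third point, here’s a minimal sketch of the simplest kind of re-identification: matching “de-identified” records against a public list using the fields that were left in. All the names, fields, and records below are invented for illustration, and real systems are far more sophisticated than an exact match.

```python
from collections import defaultdict

# "De-identified" records: names stripped, quasi-identifiers kept. (Invented data.)
deidentified = [
    {"zip": "10027", "birth_year": 1990, "sex": "F", "flag": "late payments"},
    {"zip": "11201", "birth_year": 1990, "sex": "F", "flag": "job search"},
    {"zip": "10027", "birth_year": 1985, "sex": "M", "flag": "gambling sites"},
]

# A hypothetical public list that does carry names (think of a voter roll).
public = [
    {"name": "Alice Example", "zip": "10027", "birth_year": 1990, "sex": "F"},
    {"name": "Bob Example", "zip": "10027", "birth_year": 1985, "sex": "M"},
    {"name": "Carol Example", "zip": "11201", "birth_year": 1990, "sex": "F"},
    {"name": "Dave Example", "zip": "11201", "birth_year": 1990, "sex": "M"},
    {"name": "Erin Example", "zip": "11201", "birth_year": 1990, "sex": "F"},
]

def key(record):
    # The fields both datasets share.
    return (record["zip"], record["birth_year"], record["sex"])

# Index the public list by those shared fields.
names_by_key = defaultdict(list)
for person in public:
    names_by_key[key(person)].append(person["name"])

# A record is re-identified when exactly one named person matches it.
reidentified = {}
for record in deidentified:
    matches = names_by_key.get(key(record), [])
    if len(matches) == 1:
        reidentified[matches[0]] = record["flag"]

print(reidentified)
```

Two of the three “anonymous” records snap back onto a single named person. The third stays ambiguous only because two people happen to share the same zip code, birth year, and sex.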

Conclusion: people are ignorant of what they’re giving away to play Candy Crush Saga[1]. And whatever it is they’re giving away, it’s something way far in the future that they’re not worried about right now. In any case it’s not a fair trade by any means, and we should stop referring to it as such.

What is it instead? I’d say it’s a trick. A trick which plays on our own impulses and short-sightedness and possibly even a kind of addiction to shiny toys in the form of candy. If you give me your future, I’ll give you a shiny toy to play with right now. People who click “yes” are not signaling that they’ve thought deeply about the consequences of giving their data away, and they are certainly not making the definitive political statement that we don’t need privacy regulation.

1. I actually don’t know the data privacy rules for Candy Crush and can’t seem to find them, for example here. Please tell me if you know what they are.

Categories: data science, modeling, rant

## The scienciness of economics

A few of you may have read this recent New York Times op-ed (hat tip Suresh Naidu) by economist Raj Chetty entitled “Yes, Economics is a Science.” In it he defends the scienciness of economics by comparing it to the field of epidemiology. Let’s focus on these three sentences in his essay, which for me are his key points:

I’m troubled by the sense among skeptics that disagreements about the answers to certain questions suggest that economics is a confused discipline, a fake science whose findings cannot be a useful basis for making policy decisions.

That view is unfair and uninformed. It makes demands on economics that are not made of other empirical disciplines, like medicine, and it ignores an emerging body of work, building on the scientific approach of last week’s winners, that is transforming economics into a field firmly grounded in fact.

Chetty is conflating two issues in his first sentence. The first is whether economics can be approached as a science, and the second is whether, if you are an honest scientist, you push as hard as you can to implement your “results” as public policy. Because that second issue is politics, not science, and that’s where people like myself get really pissed at economists, when they treat their estimates as facts with no uncertainty.

In other words, I’d have no problem with economists if they behaved like the people in the following completely made-up story based on the infamous Reinhart-Rogoff paper with the infamous excel mistake.

Two guys tried to figure out what public policy causes GDP growth by using historical data. They collected their data and did some analysis, and they later released both the spreadsheet and the data by posting them on their Harvard webpages. They also ran the numbers a few times with slightly different countries and slightly different weighting schemes and explained in their write-up that they got different answers depending on the initial conditions, so therefore they couldn’t conclude much at all, because the error bars are just so big. Oh well.

You see how that works? It’s called science, and it’s not what economists are known to do. It’s what we all wish they’d do though. Instead we have economists who basically get paid to write papers pushing for certain policies.
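For what it’s worth, the “run the numbers a few times” step in the made-up story above is not hard to do. Here’s a rough Python sketch. Every number in it is invented, and the two weighting schemes (each country counts once, or each country-year counts once) are stand-ins chosen for illustration, not a reconstruction of any actual paper.

```python
# country -> (is_high_debt, yearly growth rates in percent). All numbers invented.
data = {
    "A": (True, [-7.0]),        # a single very bad year
    "B": (True, [3.0] * 10),    # ten steady years
    "C": (True, [1.0, 1.2, 0.8]),
    "D": (False, [3.0, 3.2, 2.8]),
    "E": (False, [2.0, 2.2, 1.8, 2.0]),
}

def mean(xs):
    return sum(xs) / len(xs)

def by_country(countries, high):
    # Weighting scheme 1: each country counts once, however many years it has.
    return mean([mean(g) for h, g in countries.values() if h == high])

def by_year(countries, high):
    # Weighting scheme 2: each country-year counts once.
    return mean([x for h, g in countries.values() if h == high for x in g])

def gap(countries, scheme):
    # Estimated growth of high-debt countries minus low-debt countries.
    return scheme(countries, True) - scheme(countries, False)

# Re-run the same estimate under both schemes, with all countries
# and then with each country left out in turn.
estimates = {}
for scheme_name, scheme in (("by_country", by_country), ("by_year", by_year)):
    estimates[(scheme_name, "all")] = gap(data, scheme)
    for dropped in data:
        subset = {c: v for c, v in data.items() if c != dropped}
        estimates[(scheme_name, "drop " + dropped)] = gap(subset, scheme)

for label, value in sorted(estimates.items()):
    print(label, round(value, 2))
print("range:", round(min(estimates.values()), 2), "to", round(max(estimates.values()), 2))
```

With this toy data the estimated gap swings from strongly negative to slightly positive depending on the weighting and on which country you leave out. An honest write-up would report that whole range rather than any one number.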

Next, let’s talk about Chetty’s comparison of economics with medicine. It’s kind of amazing that he’d do this considering how discredited epidemiology is at this point, and how truly unscientific it’s been found to be, for essentially exactly the same reasons as above – initial conditions, even just changing which standard database you use for your tests, switch the sign of most of the results in medicine. I wrote this up here based on a lecture by David Madigan, but there’s also a chapter in my new book with Rachel Schutt based on this issue.

To briefly summarize, Madigan and his colleagues reproduce a bunch of epidemiological studies and come out with incredibly depressing “sensitivity” results. Namely, that the majority of “statistically significant findings” change sign depending on seemingly trivial initial condition changes that the authors of the original studies often didn’t even explain.

So in other words, Chetty defends economics as “just as much science” as epidemiology, which I would claim is in the category “not at all a science.” In the end I guess I’d have to agree with him, but not in a good way.

Finally, let’s be clear: it’s a good thing that economists are striving to be scientists, when they are. And it’s of course a lot easier to do science in microeconomic settings where the data is plentiful than it is to answer big, macro-economic questions where we only have a few examples.

Even so, it’s still a good thing that economists are asking the hard questions, even when they can’t answer them, like what causes recessions and what determines growth. It’s just crucial to remember that actual scientists are skeptical, even of their own work, and don’t pretend to have error bars small enough to make high-impact policy decisions based on their fragile results.

Categories: modeling, rant, statistics

## “Here and Now” is shilling for the College Board

Last week Here and Now’s host Jeremy Hobson set up College Board’s James Montoya for a perfect advertisement regarding a story on SAT scores going down. The transcript and recording are here (hat tip Becky Jaffe).

To set it up, they talk about how GPA’s are going up on average over the country but how, at the same time, the average SAT score went down last year.

Somehow the interpretation of this is that there’s grade inflation and that kids must be in need of more test prep because they’re dumber.

What is the College Board?

You might think, especially if you listen to this interview, that the College Board is a thoughtful non-profit dedicated to getting kids prepared for college.

Make no mistake about it: the College Board is a big business, and much of their money comes from selling test prep stuff on top of administering tests. Here are a couple of things you might want to know about College Board through its wikipedia page:

Consumer rights organization Americans for Educational Testing Reform (AETR) has criticized College Board for violating its non-profit status through excessive profits and exorbitant executive compensation; nineteen of its executives make more than $300,000 per year, with CEO Gaston Caperton earning $1.3 million in 2009 (including deferred compensation).[10][11] AETR also claims that College Board is acting unethically by selling test preparation materials, directly lobbying legislators and government officials, and refusing to acknowledge test-taker rights.[12]

Anyhoo, let’s just say it this way: College Board has the ability to create an “emergency” about SAT scores, by say changing the test or making it harder, and then the only “reasonable response” is to pay for yet more test prep. And somehow Here and Now’s host Jeremy Hobson didn’t see this coming at all.

The interview

Here’s an excerpt:

HOBSON: It also suggests, when you look at the year-over-year scores, the averages, that things are getting worse, not better, because if I look at, for example, in critical reading in 2006, the average being 503, and now it’s 496. Same deal in math and writing. They’ve gone down.

MONTOYA: Well, at the same time that we have seen the scores go down, what’s very interesting is that we have seen the average GPAs reported going up. So, for example, when we look at SAT test takers this year, 48 percent reported having a GPA in the A range compared to 45 percent last year, compared to 44 percent in 2011, I think, suggesting that there simply have to be more rigor in core courses.

HOBSON: Well, and maybe that there’s grade inflation going on.

MONTOYA: Well, clearly, that there is grade inflation. There is no question about that. And it’s one of the reasons why standardized test scores are so important in the admission office. I know that, as a former dean of admission, test scores help gauge the meaning of a GPA, particularly given the fact that nearly half of all SAT takers are reporting a GPA in the A range.

Just to be super clear about the shilling, here’s Hobson a bit later in the interview:

HOBSON: Well – and we should say that your report noted – since you mentioned practice – that as is the case with the ACT, the students who take the rigorous prep courses do better on the SAT.

What does it really mean when SAT scores go down?

Here’s the thing. SAT scores are fucked with ALL THE TIME. Traditionally, they had to make SAT’s harder since people were getting better at them. As test-makers, they want a good bell curve, so they need to adjust the test as the population changes and as their habits of test prep change.

The result is that SAT tests are different every year, so just saying that the scores went down from year to year is meaningless. Even if the same group of kids took those two different tests in the same year, they’d have different scores.

Also, according to my friend Becky who works with kids preparing for the SAT, they really did make substantial changes recently in the math section, changing the function notation, which makes it much harder for kids to parse the questions. In other words, they switched something around to give kids reason to pay for more test prep.

Important: this has nothing to do with their knowledge, it has to do with their training for this specific test.

If you want to understand the issues outside of math, take for example the essay. According to this critique, the number one criterion for essay grade is length. Length trumps clarity of expression, relevance of the supporting arguments to the thesis, mechanics, and all other elements of quality writing. As my friend Becky says:

I have coached high school students on the SAT for years and have found time and again, much to my chagrin, that students receive top scores for long essays even if they are desultory, tangent-filled and riddled with sentence fragments, run-ons, and spelling errors.

Similarly, I have consistently seen students receive low scores for shorter essays that are thoughtful and sophisticated, logical and coherent, stylish and articulate.

As long as the number one criterion for receiving a high score on the SAT essay is length, students will be confused as to what constitutes successful college writing and scoring well on the written portion of the exam will remain essentially meaningless. High-scoring students will have to unlearn the strategies that led to success on the SAT essay and relearn the fundamentals of written expression in a college writing class.

If the College Board (the makers of the SAT) is so concerned about the dumbing down of American children, they should examine their own role in lowering and distorting the standards for written expression.

Conclusion

Two things. First, shame on College Board and James Montoya for acting like SAT scores are somehow beacons of truth without acknowledging the fiddling that goes on time and time again by his company. And second, shame on Here and Now and Jeremy Hobson for being utterly naive and buying in entirely to this scare tactic.

## I’d like you to eventually die

Google has formally thrown their hat into the “rich people should never die” arena, with an official announcement of their new project called Calico, “a new company that will focus on health and well-being, in particular the challenge of aging and associated diseases”. Their plan is to use big data and genetic research to avoid aging.

I saw this coming when they hired Ray Kurzweil. Here’s an excerpt from my post:

A few days ago I read a New York Times interview of Ray Kurzweil, who thinks he’s going to live forever and also claims he will cure cancer if and when he gets it (his excuse for not doing it in his spare time now: “Well, I mean, I do have to pick my priorities. Nobody can do everything.”). He also just got hired at Google.

Here’s the thing. We need people to die. Our planet cannot sustain all the people currently alive as well as all the people who are going to someday be born. Just not gonna happen. Plus, it would be a ridiculously boring place to live. Think about how boring it is already for young people to be around old people. I bore myself around my kids, and I’m only 30 years older than they are.

And yes, it’s tragic when someone we love actually becomes one of those people whose time has come, especially if they’re young and especially if it seemed preventable. For that matter, I’m all for figuring out how to improve the quality of life for people.

But the idea that we’re going to figure out how to keep alive a bunch of super rich advertising executives just doesn’t seem right – because, let’s face it, there will have to be a way to choose who lives and who dies, and I know who is at the top of that list – and I for one am not on board with the plan. Larry Page, Tim Cook, and Ray Kurzweil: I’d really like it if you eventually died.

On the other hand, I’m not super worried about this plan coming through either. Big data can do a lot but it’s not going to make people live forever. Or let’s say it another way: if they can use big data to make people live forever, they can also use big data to convince me that super special rich white men living in Silicon Valley should take up resources and airtime for the rest of eternity.

Categories: data science, rant

## Short your kids, go long your neighbor: betting on people is coming soon

Yet another aspect of Gary Shteyngart’s dystopian fiction novel Super Sad True Love Story is coming true for reals this week.

Besides anticipating Occupy Wall Street, as well as Bloomberg’s sweep of Zuccotti Park (although getting it wrong on how utterly successful such sweeping would be), Shteyngart proposed the idea of instant, real-time and broadcast credit ratings.

Anyone walking around the streets of New York, as they’d pass a certain type of telephone pole – the kind that identifies you via your cell phone and communicates with data warehousing services and databases – would have their credit rating flashed onto a screen. If you went to a party, depending on how you impressed the other partygoers, your score could plummet or rise in real time, and everyone would be able to keep track and treat you accordingly.

I mean, there were other things about the novel too, but as a data person these details certainly stuck with me since they are both extremely gross and utterly plausible.

And why do I say they are coming true now? I base my claim on two news stories I’ve been sent by my various blog readers recently.

[Aside: if you read my blog and find an awesome article that you want to send me, by all means do! My email address is available on my "About" page.]

First, coming via Suresh and Marcos, we learn that data broker Acxiom is letting people see their warehoused data. A few caveats, bien sûr:

1. You get to see your own profile, here, starting in 2 days, but only your own.
2. And actually, you only get to see some of your data. So they won’t tell you if you’re a suspected gambling addict, for example. It’s a curated view, and they want your help curating it more. You know, for your own good.
3. And they’re doing it so that people have clarity on their business.
4. Haha! Just kidding. They’re doing it because they’re trying to avoid regulations and they feel like this gesture of transparency might make people less suspicious of them.
5. And they’re counting on people’s laziness. They’re allowing people to opt out, but of course the people who should opt out would likely never even know about that possibility.
6. Just keep in mind that, as an individual, you won’t know what they really think they know about you, but as a corporation you can buy complete information about anyone who hasn’t opted out.

In any case those credit scores that Shteyngart talks about are already happening. The only issue is who gets flashed those numbers and when. Instead of the answers being “anyone walking down the street” and “when you walk by a pole” it’s “any corporation on the interweb” and “whenever you browse”.

After all, why would they give something away for free? Where’s the profit in showing the credit scores of anyone to everyone? Hmmmm….

That brings me to my second news story of the morning coming to me via Constantine, namely this TechCrunch story which explains how a startup called Fantex is planning to allow individuals to invest in celebrity athletes’ stocks. Yes, you too can own a tiny little piece of someone famous, for a price. From the article:

People can then buy shares of that player’s brand, like a stock, in the Fantex-consumer market. Presumably, if San Francisco 49ers tight end Vernon Davis has a monster year and looks like he’s going to get a bigger endorsement deal or a larger contract in a few years, his stock would rise and a fan could sell their Davis stock and cash out with a real, monetary profit. People would own tracking or targeted stocks in Fantex that would depend on the specific brand that they choose; these stocks would then rise and fall based on their own performance, not on the overall performance of Fantex.

Let’s put these two things together. I think it’s not too much of a stretch to acknowledge a reason for everyone to know everyone else’s credit score! Namely, we can bet on each other’s futures!

I can’t think of any set-up more exhilarating to the community of hedge fund assholes than a huge, new open market – containing profit potentials for every single citizen of earth – where you get to make money when someone goes to the wrong college, or when someone enters into an unfortunate marriage and needs a divorce, or when someone gets predictably sick. An orgy in the exact center of tech and finance.

Are you with me peoples?!

I don’t know what your Labor Day plans are, but I’m readying my list of people to short in this spanking new market.

## College ranking models

Last week Obama began making threats regarding a new college ranking system and its connection to federal funding. Here’s an excerpt of what he was talking about, from this WSJ article:

The president called for rating colleges before the 2015 school year on measures such as affordability and graduation rates—“metrics like how much debt does the average student leave with, how easy is it to pay off, how many students graduate on time, how well do those graduates do in the workforce,” Mr. Obama told a crowd at the University at Buffalo, the first stop on a two-day bus tour.

Interesting! This means that Obama is wading directly into the field of modeling. He’s probably sick of the standard college ranking system, put out by U.S. News & World Report. I kind of don’t blame him, since that model is flawed and largely gamed. In fact, I made a case for open sourcing that model recently just so that people would look into it and lose faith in its magical properties.

So I’m with Obama: that model sucks, and it’s high time there were other competing models so that people have more than one thing to think about.

On the other hand, what Obama is focusing on seems narrow. Here’s what he supposedly wants to do with that model (again from the WSJ article):

Once a rating system is in place, Mr. Obama will ask Congress to allocate federal financial aid based on the scores by 2018. Students at top-performing colleges could receive larger federal grants and more affordable student loans. “It is time to stop subsidizing schools that are not producing good results,” he said.

His main goal seems to be “to make college more affordable”.

I’d like to make a few comments on this overall plan. The short version is that he’s suggesting something that will have strong, mostly negative effects, and that won’t solve his problem of college affordability.

Why strong negative effects?

What Obama seems to realize about the existing model is that it’s had side effects because of the way college administrators have gamed the model. Presumably, given that this new proposed model will be directly tied to federal funding, it will be high-impact and will thus be thoroughly gamed by administrators as well.

The first complaint, then, is that Obama didn’t address this inevitable gaming directly – and that doesn’t bode well for his ability to put into place a reasonable model.

But let’s not follow his lead. Let’s think about what kind of gaming will occur once such a model is in place. It’s not pretty.

Here are the attributes he’s planning to use for colleges. I’ve substituted reasonable numerical proxies for his descriptions above:

1. Cost (less is better)
2. Percentage of people able to pay off their loans within 10 years (more is better)
3. Graduation rate (more is better)
4. Percentage of people graduating within 4 years (more is better)
5. Percentage of people who get high-paying jobs after graduating (more is better)
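To make the gaming concern concrete, here’s a minimal sketch of what a score built on these five attributes might look like. Everything specific in it is my own assumption: the equal weights, the `MAX_COST` cap, the `College` fields, and the two made-up schools are hypothetical, and nothing here comes from the actual proposal, which hasn’t specified a formula.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class College:
    name: str
    cost: float              # annual cost in dollars (less is better)
    loan_payoff_rate: float  # share paying off loans within 10 years
    graduation_rate: float   # share who graduate at all
    on_time_rate: float      # share graduating within 4 years
    high_pay_rate: float     # share getting high-paying jobs


# Hypothetical: equal weights and an arbitrary cost cap for scaling.
WEIGHT = 0.2
MAX_COST = 60_000.0


def score(c: College) -> float:
    """Equal-weight sum of the five attributes, each scaled to [0, 1]."""
    cost_component = 1.0 - min(c.cost, MAX_COST) / MAX_COST  # cheaper is better
    return WEIGHT * (
        cost_component
        + c.loan_payoff_rate
        + c.graduation_rate
        + c.on_time_rate
        + c.high_pay_rate
    )


# Two made-up schools that differ only in how easy it is to get a degree.
honest = College("Honest U", 30_000, 0.60, 0.70, 0.50, 0.30)
gamed = replace(honest, name="Gamed U", graduation_rate=0.90, on_time_rate=0.80)

print(f"{honest.name}: {score(honest):.2f}")
print(f"{gamed.name}: {score(gamed):.2f}")
```

In this toy version the second school outranks the first without teaching anyone anything more. Any model that rewards raw rates like these would seem to share that weakness, whatever the real weights turn out to be.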

Cost

Nobody is going to argue against optimizing for lower cost. Unfortunately, what with the cultural assumption of the need for a college education, combined with the ignorance and naive optimism of young people, not to mention start-ups like Upstart that allow young people to enter indentured servitude, the pressure is upwards, not downwards.

The supply of money for college is large and growing, and the answer to rising tuition costs is not to supply more money. Colleges have already responded to the existence of federal loans, for example, by raising tuition in the amount of the loan. Ironically, much of the rise in tuition cost has gone to administrators, whose job it is to game the system for more money.

Which is to say, you can penalize certain colleges for being at the front of the pack in terms of price, but if the overall cost is rising constantly, you’re not doing much.

If you really wanted to make costs low, then fund state universities and make them really good, and make them basically free. That would actually make private colleges try to compete on cost.

Paying off loans quickly

Here’s where we get to the heart of the problem with Obama’s plan.

What are you going to do, as an administrator tasked with making sure you never lose federal funding under the new regime?

Are you going to give all the students fairer terms on their debt? Or are you going to select for students that are more likely to get finance jobs? I’m guessing the latter.

So much for liberal arts educations. So much for learning about art, philosophy, or for that matter anything that isn’t an easy entrance into the tech or finance sector. Only colleges that don’t care a whit about federal money will even have an art history department.

Graduation rate

Gaming the graduation rate is easy. Just lower your standards for degrees, duh.

Graduating within 4 years

Again, a general lowering of standards is quick and easy.

How well graduates do in the workforce

Putting this into your model is toxic, and measures a given field directly in terms of market forces. Economics, Computer Science, and Business majors will be the kings of the hill. We might as well never produce writers, thinkers, or anything else creative again.

Note this pressure already exists today: many of our college presidents are becoming more and more corporate minded and less interested in education itself, mostly as a means to feed their endowments. As an example, I don’t need to look further than across my street to Barnard, where president Debora Spar somehow decided to celebrate Ina Drew as an example of success in front of a bunch of young Barnard students. I can’t help but think that was related to a hoped-for gift.

Obama needs to think this one through. Do we really want to build the college system in this country in the image of Wall Street and Silicon Valley? Do we want to intentionally skew the balance towards those industries even further?

Building a better college ranking model

The problem is that it’s actually really hard to model quality of education. The mathematical models that already exist and are being proposed are just pathetically bad at it, partly because college, ultimately, isn’t only about the facts you learn, or the job you get, or how quickly you get it. It’s actually a life experience which, in the best of cases, enlarges your world view, and gets you to strive for something you might not have known existed before going.

I’d suggest that, instead of building a new ranking system, we on the one hand identify truly fraudulent colleges (which really do exist) and on the other, invest heavily in state schools, giving them enough security so they can do without their army of expensive administrators.

Categories: modeling, news, rant

## Staples.com rips off poor people; let’s take control of our online personas

You’ve probably heard rumors about this here and there, but the Wall Street Journal convincingly reported yesterday that websites charge certain people more for the exact same thing.

Specifically, poor people were more likely to pay more for, say, a stapler from Staples.com than richer people. Home Depot and Lowe’s do the same for their online customers, and Discover and Capital One make different credit card offers to people depending on where they live (“hey, do you live in a payday lender neighborhood? We got the card for you!”).

They got pretty quantitative for Staples.com, and did tests to determine the cost. From the article:

It is possible that Staples’ online-pricing formula uses other factors that the Journal didn’t identify. The Journal tested to see whether price was tied to different characteristics including population, local income, proximity to a Staples store, race and other demographic factors. Statistically speaking, by far the strongest correlation involved the distance to a rival’s store from the center of a ZIP Code. That single factor appeared to explain upward of 90% of the pricing pattern.

As anyone who’s ever seen a census map knows, race is highly segregated by ZIP code, and my guess is we’d see pretty high correlations along racial lines as well, although they didn’t mention it in the article except to say that explicit race-related pricing is illegal. The article does mention that things get more expensive in rural areas, which are also poorer, so there’s that acknowledged correlation.
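For anyone who wants to run this kind of test themselves, here’s a rough sketch of checking how strongly one ZIP-level factor tracks the price shown. The article doesn’t say which statistic the Journal used, so plain Pearson correlation is my assumption, and the numbers below are invented for illustration. They are not the Journal’s data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up observations, one per ZIP code (hypothetical, for illustration only):
# miles from the ZIP's center to the nearest rival store, and whether the
# higher price (1) or the discounted price (0) was shown.
miles_to_rival = [2, 5, 8, 12, 25, 30, 40, 55]
saw_high_price = [0, 0, 0, 0, 1, 1, 1, 1]

r = pearson(miles_to_rival, saw_high_price)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

Swapping in median income or a racial demographic share for `miles_to_rival` would be the obvious next check, if you had the ZIP-level data.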

But wait, how much of a price difference are we talking about? From the article:

Prices varied for about a third of the more than 1,000 randomly selected Staples.com products tested. The discounted and higher prices differed by about 8% on average.

In other words, a really non-trivial amount.

The messed up thing about this, or at least one of them, is that we could actually have way more control over our online personas than we think. It’s invisible to us, typically, so we don’t think about our cookies and our displayed IP addresses. But we could totally manipulate these signatures to our advantage if we set our minds to it.

Hackers, get thyselves to work making this technology easily available.

For that matter, given the 8% difference, there’s money on the line so some straight-up capitalist somewhere should be meeting that need. I for one would be willing to give someone a sliver of the amount saved every time they manipulated my online persona to save me money. You save me \$1.00, I’ll give you a dime.
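Here’s a back-of-the-envelope sketch of that deal. The dime-per-dollar rate is the one I just offered. The prices are made up, and I’m assuming the 8% gap is measured relative to the discounted price, which the article doesn’t spell out. `split_savings` is a hypothetical helper, not anybody’s real service.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
FEE_RATE = Decimal("0.10")  # a dime for every dollar saved


def split_savings(high_price: Decimal, low_price: Decimal) -> tuple[Decimal, Decimal]:
    """Return (fee paid to the persona service, savings the shopper keeps)."""
    saved = max(high_price - low_price, Decimal("0"))
    fee = (saved * FEE_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    return fee, saved - fee


# Hypothetical item: $25.00 discounted, about 8% more at the higher price.
low = Decimal("25.00")
high = (low * Decimal("1.08")).quantize(CENT)  # 27.00

fee, kept = split_savings(high, low)
print(fee, kept)  # 0.20 1.80
```

Twenty cents on a single stapler isn’t a business, but the same cut across every order where prices vary might be.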

Here’s my favorite part of this plan: it would be easy for Staples to keep track of how much people are manipulating their ZIP codes. So if Staples.com infers a certain ZIP code for me to display a certain price, but then in check-out I ask them to send the package to a different ZIP code, Staples will know after-the-fact that I fooled them. But whatever, last time I looked it didn’t cost more or less to send mail to California or wherever than to Manhattan [Update: they do charge differently for packages, though. That's the only differential in cost I think is reasonable to pay].
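The bookkeeping on their end could be as simple as the sketch below. The `Order` record and its field names are hypothetical. I have no idea what Staples actually logs.

```python
from typing import Iterable, NamedTuple


class Order(NamedTuple):
    inferred_zip: str  # ZIP the site guessed when it displayed the price
    shipping_zip: str  # ZIP the customer entered at check-out


def mismatch_rate(orders: Iterable[Order]) -> float:
    """Fraction of orders shipped to a ZIP other than the one inferred."""
    orders = list(orders)
    if not orders:
        return 0.0
    mismatched = sum(1 for o in orders if o.inferred_zip != o.shipping_zip)
    return mismatched / len(orders)


# Made-up orders: only the second one ships somewhere other than where
# the site thought the shopper was.
orders = [
    Order("10027", "10027"),
    Order("94110", "10027"),
    Order("60614", "60614"),
    Order("73301", "73301"),
]
print(mismatch_rate(orders))  # 0.25
```

A mismatch isn’t proof of manipulation, of course: people send gifts and shop from work. It’s only an upper bound on how often they got fooled.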

I’d love to see them make a case for how this isn’t fair to them.

Categories: data science, modeling, rant