A few of you may have read this recent New York Times op-ed (hat tip Suresh Naidu) by economist Raj Chetty entitled “Yes, Economics Is a Science.” In it he defends the scienciness of economics by comparing it to the field of epidemiology. Let’s focus on these three sentences in his essay, which for me are his key points:
I’m troubled by the sense among skeptics that disagreements about the answers to certain questions suggest that economics is a confused discipline, a fake science whose findings cannot be a useful basis for making policy decisions.
That view is unfair and uninformed. It makes demands on economics that are not made of other empirical disciplines, like medicine, and it ignores an emerging body of work, building on the scientific approach of last week’s winners, that is transforming economics into a field firmly grounded in fact.
Chetty is conflating two issues in his first sentence. The first is whether economics can be approached as a science, and the second is whether, if you are an honest scientist, you push as hard as you can to implement your “results” as public policy. That second issue is politics, not science, and it’s where people like me get really pissed at economists: when they treat their estimates as facts with no uncertainty.
In other words, I’d have no problem with economists if they behaved like the people in the following completely made-up story, based on the infamous Reinhart-Rogoff paper with its Excel mistake.
Two guys tried to figure out what public policy causes GDP growth by using historical data. They collected their data and did some analysis, and they later released both the spreadsheet and the data by posting them on their Harvard webpages. They also ran the numbers a few times with slightly different countries and slightly different weighting schemes, and explained in their write-up that they got different answers depending on the initial conditions, so they couldn’t conclude much at all, because the error bars are just so big. Oh well.
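That kind of sensitivity check is cheap to run. Here’s a minimal sketch with entirely simulated country data (nothing from the actual Reinhart-Rogoff paper): re-estimate the high-debt vs. low-debt growth gap under different country subsets and weighting schemes, and report the spread instead of a single number.

```python
# Hypothetical sketch: rerun a "debt vs. growth" estimate under varying
# country subsets and weighting schemes, and look at the spread.
import random
import statistics

random.seed(0)

# Fake panel: 20 countries, each with a debt-to-GDP ratio and a growth rate.
countries = [
    {"debt": random.uniform(10, 120), "growth": random.gauss(2.0, 2.5)}
    for _ in range(20)
]

def estimate(sample, weights):
    """Weighted mean growth gap: high-debt (>90%) minus low-debt countries."""
    hi = [(c["growth"], w) for c, w in zip(sample, weights) if c["debt"] > 90]
    lo = [(c["growth"], w) for c, w in zip(sample, weights) if c["debt"] <= 90]
    wmean = lambda pairs: sum(g * w for g, w in pairs) / sum(w for _, w in pairs)
    return wmean(hi) - wmean(lo)

estimates = []
for trial in range(200):
    sample = random.sample(countries, 15)          # drop a few countries
    scheme = random.choice(["equal", "random"])    # vary the weighting
    weights = [1.0] * len(sample) if scheme == "equal" else [
        random.uniform(0.5, 2.0) for _ in sample]
    try:
        estimates.append(estimate(sample, weights))
    except ZeroDivisionError:
        pass  # a subsample with no high-debt countries: skip it

spread = statistics.stdev(estimates)
print(f"mean estimate {statistics.mean(estimates):+.2f}, spread {spread:.2f}")
```

If the spread is comparable to the estimate itself, the honest conclusion is the one in the story above: the error bars are just too big.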
You see how that works? It’s called science, and it’s not what economists are known to do. It’s what we all wish they’d do though. Instead we have economists who basically get paid to write papers pushing for certain policies.
Next, let’s talk about Chetty’s comparison of economics with medicine. It’s kind of amazing that he’d do this considering how discredited epidemiology is at this point, and how truly unscientific it’s been found to be, for essentially exactly the same reasons as above: initial conditions – even just changing which standard database you use for your tests – switch the sign of most of the results in medicine. I wrote this up here based on a lecture by David Madigan, but there’s also a chapter on this issue in my new book with Rachel Schutt.
To briefly summarize: Madigan and his colleagues reproduced a bunch of epidemiological studies and came out with incredibly depressing “sensitivity” results. Namely, the majority of “statistically significant findings” change sign depending on seemingly trivial initial-condition changes that the authors of the original studies often didn’t even explain.
So in other words, Chetty defends economics as “just as much science” as epidemiology, which I would claim is in the category “not at all a science.” In the end I guess I’d have to agree with him, but not in a good way.
Finally, let’s be clear: it’s a good thing that economists are striving to be scientists, when they are. And it’s of course a lot easier to do science in microeconomic settings where the data is plentiful than it is to answer big, macro-economic questions where we only have a few examples.
Even so, it’s still a good thing that economists are asking the hard questions, even when they can’t answer them, like what causes recessions and what determines growth. It’s just crucial to remember that actual scientists are skeptical, even of their own work, and don’t pretend to have error bars small enough to make high-impact policy decisions based on their fragile results.
Did you think public radio doesn’t have advertising? Think again.
Last week Here and Now’s host Jeremy Hobson set up College Board’s James Montoya for a perfect advertisement regarding a story on SAT scores going down. The transcript and recording are here (hat tip Becky Jaffe).
To set it up, they talk about how GPAs are going up on average across the country but how, at the same time, the average SAT score went down last year.
Somehow the interpretation of this is that there’s grade inflation and that kids must be in need of more test prep because they’re dumber.
What is the College Board?
You might think, especially if you listen to this interview, that the College Board is a thoughtful non-profit dedicated to getting kids prepared for college.
Make no mistake about it: the College Board is a big business, and much of their money comes from selling test prep stuff on top of administering tests. Here are a couple of things you might want to know about the College Board, from its Wikipedia page:
Consumer rights organization Americans for Educational Testing Reform (AETR) has criticized College Board for violating its non-profit status through excessive profits and exorbitant executive compensation; nineteen of its executives make more than $300,000 per year, with CEO Gaston Caperton earning $1.3 million in 2009 (including deferred compensation). AETR also claims that College Board is acting unethically by selling test preparation materials, directly lobbying legislators and government officials, and refusing to acknowledge test-taker rights.
Anyhoo, let’s just say it this way: the College Board has the ability to create an “emergency” about SAT scores, by, say, changing the test or making it harder, and then the only “reasonable response” is to pay for yet more test prep. And somehow Here and Now’s host Jeremy Hobson didn’t see this coming at all.
Here’s an excerpt:
HOBSON: It also suggests, when you look at the year-over-year scores, the averages, that things are getting worse, not better, because if I look at, for example, in critical reading in 2006, the average being 503, and now it’s 496. Same deal in math and writing. They’ve gone down.
MONTOYA: Well, at the same time that we have seen the scores go down, what’s very interesting is that we have seen the average GPAs reported going up. So, for example, when we look at SAT test takers this year, 48 percent reported having a GPA in the A range compared to 45 percent last year, compared to 44 percent in 2011, I think, suggesting that there simply have to be more rigor in core courses.
HOBSON: Well, and maybe that there’s grade inflation going on.
MONTOYA: Well, clearly, that there is grade inflation. There is no question about that. And it’s one of the reasons why standardized test scores are so important in the admission office. I know that, as a former dean of admission, test scores help gauge the meaning of a GPA, particularly given the fact that nearly half of all SAT takers are reporting a GPA in the A range.
Just to be super clear about the shilling, here’s Hobson a bit later in the interview:
HOBSON: Well – and we should say that your report noted – since you mentioned practice – that as is the case with the ACT, the students who take the rigorous prep courses do better on the SAT.
What does it really mean when SAT scores go down?
Here’s the thing. SAT scores are fucked with ALL THE TIME. Traditionally, the test-makers have had to make the SAT harder because people were getting better at it. They want a good bell curve, so they need to adjust the test as the population changes and as its test-prep habits change.
The result is that SAT tests are different every year, so just saying that the scores went down from year to year is meaningless. Even if the same group of kids took those two different tests in the same year, they’d have different scores.
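To see why year-to-year comparisons are meaningless, here’s a toy illustration with invented numbers (this is not the College Board’s actual equating method): each year’s raw scores get rescaled to a target bell curve, so a scaled score only tells you where you sat in that year’s curve, not a fixed level of skill.

```python
# Toy rescaling: map each year's raw scores onto a fixed mean/SD scale.
import statistics

def scale(raw_scores, target_mean=500, target_sd=100):
    mu = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)
    return [round(target_mean + target_sd * (r - mu) / sd) for r in raw_scores]

year_a = [38, 45, 52, 60, 47, 55, 41, 49]   # raw scores on an easier form
year_b = [30, 36, 44, 51, 39, 46, 33, 40]   # same kids, a harder form

print(scale(year_a))
print(scale(year_b))
# The two forms give very different raw numbers but nearly identical
# scaled scores, because scaling mostly preserves rank within the curve.
```

The same kids, on a harder test, end up in roughly the same place on the scaled curve. So a change in the published average says as much about the re-scaling as about the kids.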
Also, according to my friend Becky who works with kids preparing for the SAT, they really did make substantial changes recently in the math section, changing the function notation, which makes it much harder for kids to parse the questions. In other words, they switched something around to give kids reason to pay for more test prep.
Important: this has nothing to do with their knowledge, it has to do with their training for this specific test.
If you want to understand the issues outside of math, take, for example, the essay. According to this critique, the number one criterion for the essay grade is length. Length trumps clarity of expression, relevance of the supporting arguments to the thesis, mechanics, and all other elements of quality writing. As my friend Becky says:
I have coached high school students on the SAT for years and have found time and again, much to my chagrin, that students receive top scores for long essays even if they are desultory, tangent-filled and riddled with sentence fragments, run-ons, and spelling errors.
Similarly, I have consistently seen students receive low scores for shorter essays that are thoughtful and sophisticated, logical and coherent, stylish and articulate.
As long as the number one criterion for receiving a high score on the SAT essay is length, students will be confused as to what constitutes successful college writing and scoring well on the written portion of the exam will remain essentially meaningless. High-scoring students will have to unlearn the strategies that led to success on the SAT essay and relearn the fundamentals of written expression in a college writing class.
If the College Board (the makers of the SAT) is so concerned about the dumbing down of American children, they should examine their own role in lowering and distorting the standards for written expression.
Two things. First, shame on the College Board and James Montoya for acting like SAT scores are somehow beacons of truth without acknowledging the fiddling his company does time and time again. And second, shame on Here and Now and Jeremy Hobson for being utterly naive and buying entirely into this scare tactic.
Google has formally thrown their hat into the “rich people should never die” arena, with an official announcement of their new project called Calico, “a new company that will focus on health and well-being, in particular the challenge of aging and associated diseases”. Their plan is to use big data and genetic research to avoid aging.
I saw this coming when they hired Ray Kurzweil. Here’s an excerpt from my post:
A few days ago I read a New York Times interview of Ray Kurzweil, who thinks he’s going to live forever and also claims he will cure cancer if and when he gets it (his excuse for not doing it in his spare time now: “Well, I mean, I do have to pick my priorities. Nobody can do everything.”). He also just got hired at Google.
Here’s the thing. We need people to die. Our planet cannot sustain all the people currently alive as well as all the people who are going to someday be born. Just not gonna happen. Plus, it would be a ridiculously boring place to live. Think about how boring it is already for young people to be around old people. I bore myself around my kids, and I’m only 30 years older than they are.
And yes, it’s tragic when someone we love actually becomes one of those people whose time has come, especially if they’re young and especially if it seemed preventable. For that matter, I’m all for figuring out how to improve the quality of life for people.
But the idea that we’re going to figure out how to keep alive a bunch of super rich advertising executives just doesn’t seem right – because, let’s face it, there will have to be a way to choose who lives and who dies, and I know who is at the top of that list – and I for one am not on board with the plan. Larry Page, Tim Cook, and Ray Kurzweil: I’d really like it if you eventually died.
On the other hand, I’m not super worried about this plan coming through either. Big data can do a lot but it’s not going to make people live forever. Or let’s say it another way: if they can use big data to make people live forever, they can also use big data to convince me that super special rich white men living in Silicon Valley should take up resources and airtime for the rest of eternity.
Yet another aspect of Gary Shteyngart’s dystopian novel Super Sad True Love Story is coming true for reals this week.
Besides anticipating Occupy Wall Street, as well as Bloomberg’s sweep of Zuccotti Park (although getting it wrong on how utterly successful such sweeping would be), Shteyngart proposed the idea of instant, real-time and broadcast credit ratings.
Anyone walking around the streets of New York, as they’d pass a certain type of telephone pole – the kind that identifies you via your cell phone and communicates with data warehousing services and databases – would have their credit rating flashed onto a screen. If you went to a party, depending on how you impressed the other party-goers, your score could plummet or rise in real time, and everyone would be able to keep track and treat you accordingly.
I mean, there were other things about the novel too, but as a data person these details certainly stuck with me since they are both extremely gross and utterly plausible.
And why do I say they are coming true now? I base my claim on two news stories I’ve been sent by my various blog readers recently.
[Aside: if you read my blog and find an awesome article that you want to send me, by all means do! My email address is available on my "About" page.]
First, coming via Suresh and Marcos, we learn that data broker Acxiom is letting people see their warehoused data. A few caveats, bien sûr:
- You get to see your own profile, here, starting in 2 days, but only your own.
- And actually, you only get to see some of your data. So they won’t tell you if you’re a suspected gambling addict, for example. It’s a curated view, and they want your help curating it more. You know, for your own good.
- And they’re doing it so that people have clarity on their business.
- Haha! Just kidding. They’re doing it because they’re trying to avoid regulations and they feel like this gesture of transparency might make people less suspicious of them.
- And they’re counting on people’s laziness. They’re allowing people to opt out, but of course the people who should opt out would likely never even know about that possibility.
- Just keep in mind that, as an individual, you won’t know what they really think they know about you, but as a corporation you can buy complete information about anyone who hasn’t opted out.
In any case those credit scores that Shteyngart talks about are already happening. The only issue is who gets flashed those numbers and when. Instead of the answers being “anyone walking down the street” and “when you walk by a pole” it’s “any corporation on the interweb” and “whenever you browse”.
After all, why would they give something away for free? Where’s the profit in showing the credit scores of anyone to everyone? Hmmmm….
That brings me to my second news story of the morning coming to me via Constantine, namely this TechCrunch story which explains how a startup called Fantex is planning to allow individuals to invest in celebrity athletes’ stocks. Yes, you too can own a tiny little piece of someone famous, for a price. From the article:
People can then buy shares of that player’s brand, like a stock, in the Fantex-consumer market. Presumably, if San Francisco 49ers tight end Vernon Davis has a monster year and looks like he’s going to get a bigger endorsement deal or a larger contract in a few years, his stock would rise and a fan could sell their Davis stock and cash out with a real, monetary profit. People would own tracking or targeted stocks in Fantex that would depend on the specific brand that they choose; these stocks would then rise and fall based on their own performance, not on the overall performance of Fantex.
Let’s put these two things together. I think it’s not too much of a stretch to acknowledge a reason for everyone to know everyone else’s credit score! Namely, we can bet on each other’s futures!
I can’t think of any set-up more exhilarating to the community of hedge fund assholes than a huge, new open market – containing profit potentials for every single citizen of earth – where you get to make money when someone goes to the wrong college, or when someone enters into an unfortunate marriage and needs a divorce, or when someone gets predictably sick. An orgy in the exact center of tech and finance.
Are you with me peoples?!
I don’t know what your Labor Day plans are, but I’m getting ready my list of people to short in this spanking new market.
Last week Obama began making threats regarding a new college ranking system and its connection to federal funding. Here’s an excerpt of what he was talking about, from this WSJ article:
The president called for rating colleges before the 2015 school year on measures such as affordability and graduation rates—”metrics like how much debt does the average student leave with, how easy is it to pay off, how many students graduate on time, how well do those graduates do in the workforce,” Mr. Obama told a crowd at the University at Buffalo, the first stop on a two-day bus tour.
Interesting! This means that Obama is wading directly into the field of modeling. He’s probably sick of the standard college ranking system, put out by U.S. News & World Report. I kind of don’t blame him, since that model is flawed and largely gamed. In fact, I made a case for open sourcing that model recently just so that people would look into it and lose faith in its magical properties.
So I’m with Obama, that model sucks, and it’s high time there are other competing models so that people have more than one thing to think about.
On the other hand, what Obama is focusing on seems narrow. Here’s what he supposedly wants to do with that model (again from the WSJ article):
Once a rating system is in place, Mr. Obama will ask Congress to allocate federal financial aid based on the scores by 2018. Students at top-performing colleges could receive larger federal grants and more affordable student loans. “It is time to stop subsidizing schools that are not producing good results,” he said.
His main goal seems to be “to make college more affordable”.
I’d like to make a few comments on this overall plan. The short version is that he’s suggesting something that will have strong, mostly negative effects, and that won’t solve his problem of college affordability.
Why strong negative effects?
What Obama seems to realize about the existing model is that it’s had side effects because of the way college administrators have gamed the model. Presumably, given that this new proposed model will be directly tied to federal funding, it will be high-impact and will thus be thoroughly gamed by administrators as well.
The first complaint, then, is that Obama didn’t address this inevitable gaming directly – and that doesn’t bode well for his ability to put into place a reasonable model.
But let’s not follow his lead. Let’s think about what kind of gaming will occur once such a model is in place. It’s not pretty.
Here are the attributes he’s planning to use for colleges. I’ve substituted reasonable numerical proxies for his descriptions above:
- Cost (less is better)
- Percentage of people able to pay off their loans within 10 years (more is better)
- Graduation rate (more is better)
- Percentage of people graduating within 4 years (more is better)
- Percentage of people who get high-paying jobs after graduating (more is better)
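To make the gaming concern below concrete, here’s an entirely made-up sketch of what a composite score built from those five proxies might look like (the weights, dollar ceiling, and numbers are all invented). The point is that any fixed formula like this rewards whoever optimizes the cheapest metric, not whoever teaches best.

```python
# Entirely invented composite score over the five proxies listed above.
WEIGHTS = {
    "cost": -0.3,            # less is better, so negative weight
    "loan_payoff_rate": 0.2,
    "grad_rate": 0.2,
    "on_time_rate": 0.15,
    "high_pay_job_rate": 0.15,
}

def score(college):
    # Normalize cost to [0, 1] against a hypothetical $80k/yr ceiling.
    feats = dict(college)
    feats["cost"] = feats["cost"] / 80_000
    return sum(WEIGHTS[k] * feats[k] for k in WEIGHTS)

honest = {"cost": 45_000, "loan_payoff_rate": 0.55, "grad_rate": 0.70,
          "on_time_rate": 0.60, "high_pay_job_rate": 0.40}

# The same college after "gaming": lower the bar for degrees and select
# students headed for finance jobs. Nothing about the teaching improved.
gamed = dict(honest, grad_rate=0.95, on_time_rate=0.90, high_pay_job_rate=0.65)

print(f"honest: {score(honest):.3f}  gamed: {score(gamed):.3f}")
```

The gamed version wins comfortably without spending a dime on education, which is exactly the failure mode discussed below.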
Cost

Nobody is going to argue against optimizing for lower cost. Unfortunately, what with the cultural assumption of the need for a college education, combined with the ignorance and naive optimism of young people, not to mention start-ups like Upstart that allow young people to enter indentured servitude, the pressure is upwards, not downwards.
The supply of money for college is large and growing, and the answer to rising tuition costs is not to supply more money. Colleges have already responded to the existence of federal loans, for example, by raising tuition in the amount of the loan. Ironically, much of the rise in tuition cost has gone to administrators, whose job it is to game the system for more money.
Which is to say, you can penalize certain colleges for being at the front of the pack in terms of price, but if the overall cost is rising constantly, you’re not doing much.
If you really wanted to make costs low, then fund state universities and make them really good, and make them basically free. That would actually make private colleges try to compete on cost.
Paying off loans quickly
Here’s where we get to the heart of the problem with Obama’s plan.
What are you going to do, as an administrator tasked with making sure you never lose federal funding under the new regime?
Are you going to give all the students fairer terms on their debt? Or are you going to select for students that are more likely to get finance jobs? I’m guessing the latter.
So much for liberal arts educations. So much for learning about art, philosophy, or for that matter anything that isn’t an easy entrance into the tech or finance sector. Only colleges that don’t care a whit about federal money will even have an art history department.
Graduation rate

Gaming the graduation rate is easy. Just lower your standards for degrees, duh.
How quickly people graduate
Again, a general lowering of standards is quick and easy.
How well graduates do in the workforce
Putting this into your model is toxic, and measures a given field directly in terms of market forces. Economics, Computer Science, and Business majors will be the kings of the hill. We might as well never produce writers, thinkers, or anything else creative again.
Note this pressure already exists today: many of our college presidents are becoming more and more corporate minded and less interested in education itself, mostly as a means to feed their endowments. As an example, I don’t need to look further than across my street to Barnard, where president Debora Spar somehow decided to celebrate Ina Drew as an example of success in front of a bunch of young Barnard students. I can’t help but think that was related to a hoped-for gift.
Obama needs to think this one through. Do we really want to build the college system in this country in the image of Wall Street and Silicon Valley? Do we want to intentionally skew the balance towards those industries even further?
Building a better college ranking model
The problem is that it’s actually really hard to model quality of education. The mathematical models that already exist and are being proposed are just pathetically bad at it, partly because college, ultimately, isn’t only about the facts you learn, or the job you get, or how quickly you get it. It’s actually a life experience which, in the best of cases, enlarges your world view, and gets you to strive for something you might not have known existed before going.
I’d suggest that, instead of building a new ranking system, we on the one hand identify truly fraudulent colleges (which really do exist) and on the other, invest heavily in state schools, giving them enough security so they can do without their army of expensive administrators.
You’ve probably heard rumors about this here and there, but the Wall Street Journal convincingly reported yesterday that websites charge certain people more for the exact same thing.
Specifically, poor people were more likely to pay more for, say, a stapler from Staples.com than richer people. Home Depot and Lowe’s do the same for their online customers, and Discover and Capital One make different credit card offers to people depending on where they live (“hey, do you live in a PayDay lender neighborhood? We got the card for you!”).
The Journal got pretty quantitative about Staples.com, running tests to determine what drove the prices. From the article:
It is possible that Staples’ online-pricing formula uses other factors that the Journal didn’t identify. The Journal tested to see whether price was tied to different characteristics including population, local income, proximity to a Staples store, race and other demographic factors. Statistically speaking, by far the strongest correlation involved the distance to a rival’s store from the center of a ZIP Code. That single factor appeared to explain upward of 90% of the pricing pattern.
Anyone who’s ever seen a census map knows race is highly segregated by ZIP code, and my guess is we’d see pretty high correlations along racial lines as well, although the article didn’t mention it except to say that explicit race-based pricing is illegal. The article does mention that things get more expensive in rural areas, which are also poorer, so there’s that acknowledged correlation.
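The kind of test the Journal describes is straightforward to sketch: regress observed price on distance to a rival’s store and see how much of the price variation that one factor explains (that’s the “upward of 90%” R² claim). Here’s a minimal version on simulated data (the prices, discount, and distances are all invented, not theirs):

```python
# Simulated price-vs-distance test: a flat base price, discounted ~8%
# when a rival store is nearby, then a one-variable regression.
import random

random.seed(1)

rows = []
for _ in range(500):
    miles_to_rival = random.uniform(0, 50)
    price = 15.79 if miles_to_rival > 20 else 15.79 * 0.92
    price += random.gauss(0, 0.10)            # a little noise
    rows.append((miles_to_rival, price))

xs, ys = zip(*rows)
n = len(rows)
mx, my = sum(xs) / n, sum(ys) / n
beta = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
alpha = my - beta * mx
ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in rows)
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

print(f"price rises ~${beta:.3f} per mile from a rival; R^2 = {r2:.2f}")
```

Run the same regression against income, race, or population and compare the R² values, and you get exactly the Journal’s kind of conclusion about which factor dominates.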
But wait, how much of a price difference are we talking about? From the article:
Prices varied for about a third of the more than 1,000 randomly selected Staples.com products tested. The discounted and higher prices differed by about 8% on average.
In other words, a really non-trivial amount.
The messed up thing about this, or at least one of them, is that we could actually have way more control over our online personas than we think. It’s invisible to us, typically, so we don’t think about our cookies and our displayed IP addresses. But we could totally manipulate these signatures to our advantage if we set our minds to it.
Hackers, get thyselves to work making this technology easily available.
For that matter, given the 8% difference, there’s money on the line so some straight-up capitalist somewhere should be meeting that need. I for one would be willing to give someone a sliver of the amount saved every time they manipulated my online persona to save me money. You save me $1.00, I’ll give you a dime.
Here’s my favorite part of this plan: it would be easy for Staples to keep track of how much people are manipulating their ZIP codes. So if Staples.com infers a certain ZIP code for me to display a certain price, but then in check-out I ask them to send the package to a different ZIP code, Staples will know after-the-fact that I fooled them. But whatever, last time I looked it didn’t cost more or less to send mail to California or wherever than to Manhattan [Update: they do charge differently for packages, though. That's the only differential in cost I think is reasonable to pay].
I’d love to see them make a case for how this isn’t fair to them.
My friend Frank Pasquale sent me this article over twitter, about New York State attorney general Eric T. Schneiderman’s investigation into possibly unfair practices by big banks using opaque and sometimes erroneous databases to disqualify people from opening accounts.
Not much hard information is given in the article but we know that negative reports stemming from the databases have effectively banished more than a million lower-income Americans from the financial system, and we know that the number of “underbanked” people in this country has grown by 10% since 2009. Underbanked people are people who are shut out of the normal banking system and have to rely on the underbelly system including check cashing stores and payday lenders.
I can already hear the argument of my libertarian friends: if I’m a bank, and I have reason to suspect you have messed up with your finances in the past, I don’t offer you services. Done and done. Oh, and if I’m a smart bank that figures out some of these so-called “past mistakes” are actually erroneously reported, then I make extra money by serving those customers that are actually good when they look bad. And the free market works.
Two responses to this. First, at this point big banks are really not private companies, being on the taxpayer dole. In return they should reasonably be expected to provide banking services to all if not most people as a public service. Of course this is a temporary argument, since nobody actually likes the fact that the banks aren’t truly private companies.
The second, more interesting point – at least to me – is this: we care about and defend ourselves against our constitutional rights being taken away, but we have much less energy to defend ourselves against good things not happening to us.
In other words, it’s not written into the constitution that we all deserve a good checking account, nor a good college education, nor good terms on a mortgage, and so on. Even so, in a large society such as ours, such things are basic ingredients for a comfortable existence. Yet these services are rare if not nonexistent for a huge and swelling part of our society, resulting in a degradation of opportunity for the poor.
The overall effect is heinous, and at some point does seem to rise to the level of a constitutional right to opportunity, but I’m no lawyer.
In other words, instead of only worrying about the truly bad things that might happen to our vulnerable citizens, I personally spend just as much time worrying about the good things that might not happen to our vulnerable citizens, because from my perspective lots of good things not happening add up to bad things happening: they all narrow future options.
I’ve blogged before about how I find it outrageous that the credit scoring models are proprietary, considering the impact they have on so many lives.
The argument given for keeping them secret is that otherwise people would game the models, but that really doesn’t make sense.
After all, the models that the big banks have to deal with through regulation aren’t secret, and they game those models all the time. It’s one of the main functions of the banks, in fact, to figure out how to game the models. So either we don’t mind gaming or we don’t hold up our banks to the same standards as our citizens.
Plus, let’s say the models were open and people started gaming the credit score models – what would that look like? A bunch of people paying their electricity bill on time?
Let’s face it: the real reason the models are secret is that the companies who set them up make more money that way, pretending to have some kind of secret sauce. What they really have, of course, is a pretty simple model and access to an amazing network of up-to-date personal financial data, as well as lots of clients.
Their fear is that, if their model gets out, anyone could start a credit scoring agency, but actually it wouldn’t be so easy – if I wanted to do it, I’d have to get all that personal data on everyone. In fact, if I could get all that personal data on everyone, including the historical data, I could easily build a credit scoring model.
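To see how little “secret sauce” is really needed once you have the historical data, here’s a deliberately crude sketch (all numbers invented): bucket people by their on-time payment rate, compute the empirical default rate per bucket, and map that onto the familiar 300–850 scale.

```python
# Crude credit "model": bucketed empirical default rates, rescaled.
import math
import random

random.seed(2)

def make_history():
    on_time = random.uniform(0.3, 1.0)            # fraction of bills paid on time
    p_default = 1 / (1 + math.exp(-(2.5 - 6.0 * on_time)))
    return on_time, 1 if random.random() < p_default else 0

history = [make_history() for _ in range(5000)]

# Group by on-time rate rounded to one decimal: 0.3, 0.4, ..., 1.0.
buckets = {}
for on_time, defaulted in history:
    buckets.setdefault(round(on_time, 1), []).append(defaulted)

def score(on_time):
    outcomes = buckets[round(on_time, 1)]
    rate = sum(outcomes) / len(outcomes)
    return round(300 + 550 * (1 - rate))          # low default rate -> high score

print("always pays on time:", score(0.99))
print("pays on time 40% of the time:", score(0.41))
```

A real scoring agency uses more features and a fancier fit, but the hard part is clearly the data pipeline, not the math – which is the point.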
So anyhoo, it’s all about money, that and the fact that we’re living under the assumption that it’s appropriate for credit scoring companies to wield all this power over people’s lives, including their love lives.
It’s like we have a secondary system of secret laws where we don’t actually get to see the rules, nor do we get to point out mistakes or reasonably refute them. And if you’re thinking “free credit report,” let’s be clear that that only tells you what data goes into the model, it doesn’t tell you how it’s used.
As it turns out, though, it’s now more than a secondary system of laws – it’s become embedded in our actual laws. Somehow the proprietary credit scoring company Equifax is now explicitly part of our healthcare laws. From this New York Times article (hat tip Matt Stoller):
Federal officials said they would rely on Equifax — a company widely used by mortgage lenders, social service agencies and others — to verify income and employment and could extend the initial 12-month contract, bringing its potential value to $329.4 million over five years.
Contract documents show that Equifax must provide income information “in real time,” usually within a second of receiving a query from the federal government. Equifax says much of its information comes from data that is provided by employers and updated each payroll period.
Under the contract, Equifax can use sources like credit card applications but must develop a plan to indicate the accuracy of data and to reduce the risk of fraud.
Thanks Equifax, I guess we’ll just trust you on all of this.
I wrote a post yesterday to discuss the fact that, as we’ve seen in Detroit and as we’ll soon see across the country, the math isn’t working out on pensions. One of my commenters responded, saying I was falling for a “very right wing attack on defined benefit pensions.”
I think it’s a mistake to think like that. If people on the left refuse to discuss reality, then who owns reality? And moreover, who will act and towards what end?
Here’s what I anticipate: just as “bankruptcy” in the realm of airlines has come to mean “a short period wherein we toss our promises to retired workers and then come back to life as a company”, I’m afraid that Detroit may signal the emergence of a new legal device for cities to do the same thing, especially the tossing out of promises to retired workers part. A kind of coordinated bankruptcy if you will.
It comes down to the following questions. For whom do laws work? Who can trust that, when they enter a legal obligation, it will be honored?
From Trayvon Martin to the people who have been illegally foreclosed on, we’ve seen the answer to that.
And then we might ask, for whom are laws written or exceptions made? And the answer to that might well be for banks, in times of crisis of their own doing, and so they can get their bonuses.
I’m not a huge fan of the original bailouts, because they ignored the social and legal contracts in the opposite direction: failures should be allowed to fail, and people who are criminals should go to jail. It didn’t seem fair then, and it still doesn’t now, as JPMorgan posts record $6.4 billion profits in the same quarter that it’s trying to settle a $500 million market manipulation charge.
It’s all very well to rest our arguments on the sanctity of the contract, but if you look around the edges you’ll see whose contracts get ripped up because of fraudulent accounting, and whose bonuses get bigger.
And it brings up the following question: if we bailed out the banks, why not the people of Detroit?
I’m finishing up an essay called “On Being a Data Skeptic” in which I catalog different standard mistakes people make with data – sometimes unintentionally, sometimes intentionally.
It occurred to me, as I wrote it, and as I read the various press conferences with departing mayor Bloomberg and Police Commissioner Raymond Kelly when they addressed the Stop and Frisk policy, that they are guilty of making one of these standard mistakes. Namely, they use a sleight of hand with respect to the evaluation metric of the policy.
Recall that an evaluation metric for a model is the way you decide whether the model works. So if you’re predicting whether someone would like a movie, you should go back and check whether your recommendations were good, and revise your model if not. It’s a crucial part of the model, and a poor choice for it can have dire consequences – you could end up optimizing to the wrong thing.
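To make the idea concrete, here’s a minimal sketch of an evaluation metric in action, with entirely made-up ratings and a hypothetical model’s predictions – the point is only that you score the model against what actually happened:

```python
import math

# Invented held-out data: ratings users actually gave some movies,
# versus what our hypothetical recommendation model predicted.
actual    = [5.0, 3.0, 1.0, 4.0, 2.0]
predicted = [4.5, 3.5, 2.0, 4.0, 1.0]

# One common evaluation metric: root mean squared error.
# Smaller is better; it punishes big misses more than small ones.
def rmse(actual, predicted):
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

score = rmse(actual, predicted)
print(round(score, 3))  # 0.707
```

The metric is a choice, and the choice matters: optimize to one metric and you get one model and one notion of “working”; pick a different metric and you may conclude the opposite.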
[Aside: as I've complained about before, the Value Added Model for teachers doesn't have an evaluation method of record, which is a very bad sign indeed about the model. And that's a Bloomberg brainchild as well.]
So what am I talking about?
Here’s the model: stopping and frisking suspicious-looking people in high-crime areas will improve the safety and well-being of the city as a whole.
Here’s Bloomberg and Kelly’s evaluation method: the murder rate has gone down in New York during the policy. However, that rate is highly variable and depends just as much on whether there’s a crack epidemic going on as on anything else. Or maybe it’s improved medical care. Truth is, people don’t really know. In any case, ascribing credit for the plunging murder rate to Stop and Frisk is a tenuous causal argument. Plus, even though Stop and Frisk events have decreased drastically recently, the murder rate hasn’t shot up.
Here’s another possible evaluation method: trust in the police. Considering that 400,000 innocent black and Latino New Yorkers were stopped last year under this policy (here are more stats), versus fewer than 50,000 whites, and that most of them were young men, it stands to reason that the average young minority male feels less trust toward police than the average young white male. In fact, this is an amazing statistic put together by the NYCLU from 2011:
The number of stops of young black men exceeded the entire city population of young black men (168,126 as compared to 158,406).
If I’m a black guy I have an expectation of getting stopped and frisked at least once per year. How does that make me trust cops?
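To put rough numbers on that expectation – a back-of-the-envelope sketch using the NYCLU figures above, and crudely assuming stops land uniformly at random (a Poisson model), which they certainly don’t:

```python
import math

stops = 168_126       # NYCLU: stops of young black men in NYC, 2011
population = 158_406  # young black men living in NYC

# Expected number of stops per person per year
rate = stops / population  # about 1.06

# Under the (crude) Poisson assumption, the chance of being
# stopped at least once in the year:
p_at_least_one = 1 - math.exp(-rate)  # about 0.65

print(round(rate, 2), round(p_at_least_one, 2))
```

Even this toy model says the average young black man should expect about one stop per year. In reality stops were concentrated, so some men were stopped many times – which does nothing good for trust either.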
Let’s choose an evaluation method closer to what we can actually control, and let’s optimize to it.
Update: a guest columnist fills in for David Brooks, hopefully not for the last time, and gives us his take on Kelly, Obama, and racial profiling.
Usually I like to think through abstract ideas – thought experiments, if you will – and not get too personal. I make exceptions for certain macroeconomists who are already public figures, but most of the time that’s it.
Here’s a new category of people I’ll call out by name: CEOs who defend creepy models using the phrase “People will trade their private information for economic value.”
That’s a quote from Douglas Merrill, CEO of ZestFinance, in this video from a recent data conference in Berkeley (hat tip Rachel Schutt). It was a panel discussion, the putative topic of which was something like “Attacking the structure of everything,” whatever that’s supposed to mean (I’m guessing it has something to do with being proud of “disrupting shit”).
Do you know the feeling you get when you’re with someone who’s smart and articulate, who probably buys organic eggs from a nice farmer’s market, but who doesn’t show an ounce of sympathy for people who aren’t successful entrepreneurs? When you’re with someone who has benefited so entirely and so consistently from the system that they have an almost religious belief that the system is perfect and that they’ve succeeded through merit alone?
It’s something in between two feelings: maybe they’re just naive because they’ve led such a blessed life, or maybe they’re actually incapable of human empathy – I don’t know which, because it’s never been tested.
That’s the creepy feeling I get when I hear Douglas Merrill speak, but it actually started earlier, when I got the following email almost exactly one year ago via LinkedIn:
Your profile looked interesting to me.
I’m seeking stellar, creative thinkers like you, for our team in Hollywood, CA. If you would consider relocating for the right opportunity, please read on.
You will use your math wizardry to develop radically new methods for data access, manipulation, and modeling. The outcome of your work will result in game-changing software and tools that will disrupt the credit industry and better serve millions of Americans.
You would be working alongside people like Douglas Merrill – the former CIO of Google – along with a handful of other ex-Googlers and Capital One folks. More info can be found on our LinkedIn company profile or at www.ZestFinance.com.
At ZestFinance we’re bringing social responsibility to the consumer loan industry.
Do you have a few moments to talk about this? If you are not interested, but know someone else who might be a fit, please send them my way!
I hope to hear from you soon. Thank you for your time.
Wow, let’s “better serve millions of Americans” through manipulation of their private data, and then let’s call it being socially responsible! And let’s work with Capital One which is known to be practically a charity.
Message to ZestFinance: “getting rich with predatory lending” doesn’t mean “being socially responsible” unless you have a really weird definition of that term.
Going back to the video, I have a few more tasty quotes from Merrill:
- First when he’s describing how he uses personal individual information scraped from the web: “All data is credit data.”
- Second, when he’s comparing ZestFinance to FICO credit scoring: “Context is developed by knowing thousands of things about you. I know you as a person, not just you via five or six variables.”
I’d like to remind people that, in spite of the creepiness here, and the fact that his business plan is a death spiral of modeling, everything this guy is talking about is totally legal. And as I said in this post, I’d like to see some pushback to guys like Merrill as well as to the NSA.
I’ve been keeping tabs on how hard it is to do my bills. I did my bills last night, and man, I’m telling you, I used all of my organizational abilities, all of my customer service experience, and quite a bit of my alpha femaleness just to get it done. Not to mention that I needed more than 2 hours, which I squeezed out by starting the bills while waiting for take-out.
By the way, I am not one of those sticklers for doing everything myself – I have an accountant, and I don’t read those forms, I just sign them and pray. But even so, removing tax issues from the conversation, the kind of expertise required to do my monthly bills is ridiculous and getting worse.
Take medical bills. I have three kids, so there’s always a few appointments pending, but it’s absolutely amazing to me how often I’m getting charged for appointments unfairly. I recently got charged for a physical for my 10-year-old son, even though I know that physicals are free thanks to ObamaCare.
So I call up my insurance company and complain, spend 15 minutes on the phone waiting, then it turns out he isn’t allowed to have more than one physical in a 12-month period which is why it was charged to me. But wait, he had one last April and one this April, what gives? Turns out last April it was on the 14th and this April it was on the 8th. So less than one year.
But surely, I object, you can’t ask for people to always be exactly 12 months apart or more! It turns out that, yes, they have a 30-day grace period for this exact reason, but for some reason it’s not automatic – it requires a person to call and complain to the insurance company to get their son’s physical covered.
Do you see what I mean? This is not actually a coincidence – insurance companies make big money from having non-automatic grace periods, because many people don’t have the time, the patience, and the pushiness to make them do it right, and that’s free money for insurance companies.
There are the (abstract) “rules” and then there’s what actually happens, and it’s a constant battle between what you know you’re paying for which you shouldn’t be and how much your time is worth. For example, if it’s less than $50 I just pay it even if it’s not reasonable. I’m sure other people have different limits.
I see this as a systemic problem. So this isn’t a diatribe against just insurance companies, because I have to jump through about 15 hoops a month like this just to get my paperwork sorted out, and they are mostly not medical issues. This is really a diatribe against complexity, and the regressive tax that complexity projects onto our society.
Rich people have people to work out their paperwork for them. People like me, we don’t have people to do this, but we have the time, skills, and patience to do it ourselves (and the money to buy takeout while we do it). There are plenty of people with no time, or who aren’t organized to have all the information they need at their fingertips when they make these calls, or are too intimidated by customer service phone lines to work it out.
And, as in the example above, there’s usually a perverse incentive for complexity to exist – people give up and pay extra because it’s not worth doing the paperwork. That means it’s always getting worse.
Bottom line: you shouldn’t need a college degree and customer service experience to do your bills. I’d love to see an estimate of how much the poor in this country pay in unnecessary fees and accounting errors.
If this article from yesterday’s New York Times doesn’t make you want to join Occupy, then nothing will.
It’s about how, if you work at a truly crappy job like Walmart or McDonald’s, they’ll pay you with a pre-paid card that charges you for absolutely everything, including checking your balance or withdrawing your money, and will even charge you for not using the card. Because we aren’t nickeling and diming these people enough.
The companies doing this stuff say they’re “making things convenient for the workers,” but of course they’re really paying off the employers, sometimes explicitly:
In the case of the New York City Housing Authority, it stands to receive a dollar for every employee it signs up to Citibank’s payroll cards, according to a contract reviewed by The New York Times.
Thanks for the convenience, payroll card banks!
One thing that makes me extra crazy about this article is how McDonald’s uses its franchise system to keep its hands clean:
Natalie Gunshannon, 27, another McDonald’s worker, says the owners of the franchise that she worked for in Dallas, Pa., refused to deposit her pay directly into her checking account at a local credit union, which lets its customers use its A.T.M.’s free. Instead, Ms. Gunshannon said, she was forced to use a payroll card issued by JPMorgan Chase. She has since quit her job at the drive-through window and is suing the franchise owners.
“I know I deserve to get fairly paid for my work,” she said.
The franchise owners, Albert and Carol Mueller, said in a statement that they comply with all employment, pay and work laws, and try to provide a positive experience for employees. McDonald’s itself, noting that it is not named in the suit, says it lets franchisees determine employment and pay policies.
I actually first heard about this newish scheme against the poor when I attended the CFPB Town Hall more than a year ago and wrote about it here. That’s where I heard people complain about Walmart doing this, and about court-ordered child support payments being put on such cards as well.
Just to be clear, these fees are illegal in the context of credit cards, but financial regulation has not touched payroll cards yet. Yet another way that the poor are financialized, which is to say they’re physically and psychologically separated from their money. Get on this, CFPB!
Update: an excellent article about this issue was written by Sarah Jaffe a couple of weeks ago (hat tip Suresh Naidu). It ends with an awesome quote by Stephen Lerner: “No scam is too small or too big for the wizards of finance.”
There’s been a tremendous amount of hubbub recently surrounding the data collection and data mining that the NSA has been discovered to be doing.
For me what’s weird is that so many people are up in arms about what our government knows about us but not, seemingly, about what private companies know about us.
I’m not suggesting that we should be sanguine about the NSA program – it’s outrageous, and it’s outrageous that we didn’t know about it. I’m glad it’s come out into the open and I’m glad it’s spawned an immediate and public debate about the citizen’s rights to privacy. I just wish that debate extended to privacy in general, and not just the right to be anonymous with respect to the government.
What gets to me are the countless articles that make a big deal of Facebook or Google sharing private information directly with the government, while never mentioning that Acxiom buys and sells from Facebook on a daily basis much more specific and potentially damning information about people (most people in this country) than the metadata that the government purports to have.
Of course, we really don’t have any idea what the government has or doesn’t have. Let’s assume they are also an Acxiom customer, for that matter, which stands to reason.
It raises the question, at least for me, of why we distrust the government with our private data but trust private companies with it. I have a few theories – tell me if you agree.
Theory 1: people think about worst case scenarios, not probabilities
When the government is spying on you, worst case you get thrown into jail or Guantanamo Bay for no good reason, left to rot. That’s horrific but not, for the average person, very likely (although, of course, a world where that does become likely is exactly what we want to prevent by having some concept of privacy).
When private companies are spying on you, they don’t have the power to put you in jail. They do increasingly have the power, however, to deny you a job, a student loan, a mortgage, and life insurance. And, depending on who you are, those things are actually pretty likely.
Theory 2: people think private companies are only after our money
Private companies who hold our private data are only profit-seeking, so the worst thing they can do is try to get us to buy something, right? I don’t think so, as I pointed out above. But maybe people think so in general, and that’s why we’re not outraged about how our personal data and profiles are used all the time on the web.
Theory 3: people are more afraid of having their rights taken away than of good things not happening to them
As my friend Suresh pointed out to me when I discussed this with him, people hold on to what they have (constitutional rights) and they fear those things being taken away (by the government). They spend less time worrying about what they don’t have (a house) and how they might be prevented from getting it (by having a bad e-score).
So even though private snooping can (and increasingly does) close all sorts of options in people’s lives, if people don’t think about those options, they don’t notice. It’s hard to know why you got denied a job, especially if you’ve been getting worse and worse credit card terms and conditions over the years. In general it’s hard to notice when things don’t happen.
Theory 4: people think the government protects them from bad things, but who’s going to protect them from the government?
This I totally get, but the fact is the U.S. government isn’t protecting us from data collectors, and has even recently gotten together with Facebook and Google to prevent the European Union from enacting pretty good privacy laws. Let’s not hold our breath for them to understand what’s at stake here.
(Updated) Theory 5: people think they can opt out of private snooping but can’t opt out of being a citizen
Two things. First, can you really opt out? You can clear your cookies and not be on gmail and not go on Facebook and Acxiom will still track you. Believe it.
Second, I’m actually not worried about you (you reader of mathbabe) or myself, for that matter. I’m not getting denied a mortgage any time soon. It’s the people who don’t know to protect themselves, who don’t know to opt out, that I worry about – they’re the ones who will get down-scored and funneled into bad options.
Theory 6: people just haven’t thought about it enough to get pissed
This is the one I’m hoping for.
I’d love to see this conversation expand to include privacy in general. What’s so bad about asking for data about ourselves to be automatically forgotten, say by Verizon, if we’ve paid our bills and 6 months have gone by? What’s so bad about asking for any personal information about us to have a similar time limit? I for one do not wish mistakes my children make when they’re impetuous teenagers to haunt them when they’re trying to start a family.
As a fat person, I’ve dealt with a lot of public shaming in my life. I’ve gotten so used to it, I’m more an observer than a victim most of the time. That’s kind of cool because it allows me to think about it abstractly.
I’ve come up with three dimensions for thinking about this issue.
- When is shame useful?
- When is it appropriate?
- When does it help solve a problem?
Note it can be useful even if it doesn’t help solve a problem – one of the characteristics of shame is that the person doing the shaming has broken off all sense of responsibility for whatever the issue is, and sometimes that’s really the only goal. If the shaming campaign is effective, the shamed person or group is exhibited as solely responsible, and the shamer does not display any empathy. It hasn’t solved a problem but at least it’s clear who’s holding the bag.
The lack of empathy which characterizes shaming behavior makes it very easy to spot. And extremely nasty.
Let’s look at some examples of shaming through this lens:
Useful but not appropriate, doesn’t solve a problem
Example 1) it’s both fat kids and their parents who are to blame for childhood obesity:
Example 2) It’s poor mothers that are to blame for poverty:
These campaigns are not going to solve any problems, but they do seem politically useful – a way of doubling down on the people suffering from problems in our society. Not only will they suffer from them, but they will also be blamed for them.
Inappropriate, not useful, possibly solving a short-term discipline problem
Hey parents: shaming your kids might solve your short-term problem of having independent-minded kids, but it doesn’t lead to long-term confidence and fulfillment.
Appropriate, useful, solves a problem
Here’s when shaming is possibly appropriate and useful and solves a problem: when there have been crimes committed that affect other people needlessly or carelessly, and where we don’t want to let it happen again.
For example, the owner of the Bangladeshi factory that collapsed, killing more than 1,000 people, was arrested and publicly shamed. This is appropriate, since he knowingly put people at risk in a shoddy building and added three extra floors to improve his profits.
Note shaming that guy isn’t going to bring back those dead people, but it might prevent other people from doing what he did. In that sense it solves the problem of seemingly nonexistent safety codes in Bangladesh, and to some extent the question of how much we Americans care about cheap clothes versus conditions in factories which make our clothes. Not completely, of course. Update: Major Retailers Join Plan for Greater Safety in Bangladesh
Another example of appropriate shame would be some of the villains of the financial crisis. We in Alt Banking did our best in this regard when we made the 52 Shades of Greed card deck. Here’s Robert Rubin:
I’m no expert on this stuff, but I do have a way of looking at it.
One thing about shame is that the people who actually deserve shame are not particularly susceptible to feeling it (I saw that first hand when I saw Ina Drew in person last month, which I wrote about here). Some people are shameless.
That means that shame, whatever its purpose, is not really about making an individual change their behavior. Shame is really more about setting the rules of society straight: notifying people in general about what’s acceptable and what’s not.
From my perspective, we’ve shown ourselves much more willing to shame poor people, fat people, and our own children than to shame the actual villains who walk among us who deserve such treatment.
Shame on us.
I recently read an article off the newsstand called The Rise of Big Data.
It was written by Kenneth Neil Cukier and Viktor Mayer-Schoenberger and it was published in the May/June 2013 edition of Foreign Affairs, which is published by the Council on Foreign Relations (CFR). I mention this because CFR is an influential think tank, filled with powerful insiders, including people like Robert Rubin himself, and for that reason I want to take this view on big data very seriously: it might reflect the policy view before long.
And come to think of it, compared to the uber-naive view I came across last week when I went to the congressional hearing about big data and analytics, that would be good news. I’ll write more about the hearing soon, but let’s just say it wasn’t everything I was hoping for.
At least Cukier and Mayer-Schoenberger discuss their reservations regarding “big data” in this article. By contrast, last week it seemed like the only background material for the hearing, at least for the congressmen, was the McKinsey report talking about how sexy data science is and how we’ll need to train an army of data scientists to stay competitive.
So I’m glad it’s not all rainbows and sunshine when it comes to big data in this article. Unfortunately, whether because they’re tied to successful business interests, or because they just haven’t thought too deeply about the dark side, their concerns seem almost token, and their examples bizarre.
The article is unfortunately behind the pay wall, but I’ll do my best to explain what they’ve said.
First they discuss the concept of datafication, and their example is how we quantify friendships with “likes”: it’s the way everything we do, online or otherwise, ends up recorded for later examination in someone’s data storage units. Or maybe multiple storage units, and maybe for sale.
They formally define it later in the article as a process:
… taking all aspects of life and turning them into data. Google’s augmented-reality glasses datafy the gaze. Twitter datafies stray thoughts. LinkedIn datafies professional networks.
Datafication is an interesting concept, although as far as I can tell they did not coin the word, and it has led me to consider its importance with respect to intentionality of the individual.
Here’s what I mean. We are being datafied, or rather our actions are, and when we “like” someone or something online, we are intending to be datafied, or at least we should expect to be. But when we merely browse the web, we are unintentionally, or at least passively, being datafied through cookies that we might or might not be aware of. And when we walk around in a store, or even on the street, we are being datafied in a completely unintentional way, via sensors or Google glasses.
This spectrum of intentionality ranges from us gleefully taking part in a social media experiment we are proud of to all-out surveillance and stalking. But it’s all datafication. Our intentions may run the gamut, but the results don’t.
They follow up their definition in the article, once they get to it, with a line that speaks volumes about their perspective:
Once we datafy things, we can transform their purpose and turn the information into new forms of value
But who is “we” when they write it? What kinds of value do they refer to? As you will see from the examples below, mostly that translates into increased efficiency through automation.
So if at first you assumed they mean we, the American people, you might be forgiven for re-thinking the “we” in that sentence to be the owners of the companies which become more efficient once big data has been introduced, especially if you’ve recently read this article from Jacobin by Gavin Mueller, entitled “The Rise of the Machines” and subtitled “Automation isn’t freeing us from work — it’s keeping us under capitalist control.” From the article (which you should read in its entirety):
In the short term, the new machines benefit capitalists, who can lay off their expensive, unnecessary workers to fend for themselves in the labor market. But, in the longer view, automation also raises the specter of a world without work, or one with a lot less of it, where there isn’t much for human workers to do. If we didn’t have capitalists sucking up surplus value as profit, we could use that surplus on social welfare to meet people’s needs.
The big data revolution and the assumption that N=ALL
According to Cukier and Mayer-Schoenberger, the Big Data revolution consists of three things:
- Collecting and using a lot of data rather than small samples.
- Accepting messiness in your data.
- Giving up on knowing the causes.
They describe these steps in rather grand fashion, claiming that big data doesn’t need to understand causes because the data is so enormous, and doesn’t need to worry about sampling error because it is literally keeping track of the truth. The way the article frames this is by claiming that the new approach of big data is letting “N = ALL.”
But here’s the thing, it’s never all. And we are almost always missing the very things we should care about most.
So for example, as this InfoWorld post explains, internet surveillance will never really work, because the very clever and tech-savvy criminals that we most want to catch are the very ones we will never be able to catch, since they’re always a step ahead.
Even the example from their own article, election night polls, is itself a great non-example: even if we poll absolutely everyone who leaves the polling stations, we still don’t count people who decided not to vote in the first place. And those might be the very people we’d need to talk to to understand our country’s problems.
Indeed, I’d argue that the assumption we make that N=ALL is one of the biggest problems we face in the age of Big Data. It is, above all, a way of excluding the voices of people who don’t have the time or don’t have the energy or don’t have the access to cast their vote in all sorts of informal, possibly unannounced, elections.
Those people, busy working two jobs and spending time waiting for buses, become invisible when we tally up the votes without them. To you this might just mean that the recommendations you get on Netflix don’t seem very good, because most of the people who bother to rate things on Netflix are young and have different tastes than you, which skews the recommendation engine toward them. But there are plenty of much more insidious consequences stemming from this basic idea.
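Here’s a toy simulation of the N=ALL fallacy, with entirely invented numbers: suppose everyone has an opinion on some 0-to-10 scale, but busy people rarely show up to be counted. “All the data you have” is not all the data there is:

```python
import random

random.seed(0)

population = []
for _ in range(100_000):
    busy = random.random() < 0.5             # half the population is busy
    opinion = 3.0 if busy else 7.0           # and busy people feel differently
    opinion += random.gauss(0, 1)            # plus individual variation
    # Busy people respond 10% of the time; everyone else, 90%.
    responds = random.random() < (0.1 if busy else 0.9)
    population.append((opinion, responds))

true_mean = sum(o for o, _ in population) / len(population)
observed = [o for o, r in population if r]
observed_mean = sum(observed) / len(observed)

# true_mean is about 5.0; observed_mean is about 6.6
print(round(true_mean, 1), round(observed_mean, 1))
```

The observed sample is huge – tens of thousands of “votes” – but it’s systematically missing the busy half, so the answer is badly wrong, and no amount of “big” fixes that.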
Another way in which the assumption that N=ALL matters is that it often gets translated into the idea that data is objective. Indeed, the article all but instructs us to make exactly that assumption:
… we need to be particularly on guard to prevent our cognitive biases from deluding us; sometimes, we just need to let the data speak.
And later in the article,
In a world where data shape decisions more and more, what purpose will remain for people, or for intuition, or for going against the facts?
This is a bitch of a problem for people like me who work with models, know exactly how they work, and know exactly how wrong it is to believe that “data speaks”.
I wrote about this misunderstanding here, in the context of Bill Gates, but I was recently reminded of it in a terrifying way by this New York Times article on big data and recruiter hiring practices. From the article:
“Let’s put everything in and let the data speak for itself,” Dr. Ming said of the algorithms she is now building for Gild.
If you read the whole article, you’ll learn that this algorithm tries to find “diamond in the rough” types to hire. A worthy effort, but one that you have to think through.
Why? Say you compare women and men with the exact same qualifications who have been hired in the past, and then, looking into what happened next, you learn that those women tended to leave more often, get promoted less often, and give more negative feedback on their environments than the men did. Your model might then be tempted to hire the man over the woman the next time the two showed up, rather than look into the possibility that the company doesn’t treat female employees well.
In other words, ignoring causation can be a flaw, rather than a feature. Models that ignore causation can add to historical problems instead of addressing them. And data doesn’t speak for itself, data is just a quantitative, pale echo of the events of our society.
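Here’s a stripped-down sketch of that feedback loop, with invented records (the “model” here is deliberately naive, just scoring candidates by the historical outcomes of people like them):

```python
# Invented historical records: (gender, qualified, good_outcome).
# The women were equally qualified, but the company treated them
# worse, so their recorded outcomes are worse.
history = [
    ("M", 1, 1), ("M", 1, 1), ("M", 1, 0), ("M", 1, 1),
    ("F", 1, 0), ("F", 1, 1), ("F", 1, 0), ("F", 1, 0),
]

def success_rate(gender):
    outcomes = [o for g, q, o in history if g == gender]
    return sum(outcomes) / len(outcomes)

# A model that "lets the data speak": score each candidate by the
# historical success rate of their group.
def score(gender, qualified):
    return qualified * success_rate(gender)

# Two identically qualified candidates:
print(score("M", 1))  # 0.75
print(score("F", 1))  # 0.25
```

The model faithfully reproduces the discrimination baked into its training data. “Letting the data speak” here just means letting history repeat itself.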
Some cherry-picked examples
One of the most puzzling things about the Cukier and Mayer-Schoenberger article is how they chose their “big data” examples.
One of them, the ability for big data to spot infection in premature babies, I recognized from the congressional hearing last week. Who doesn’t want to save premature babies? Heartwarming! Big data is da bomb!
But if you’re going to talk about medicalized big data, let’s go there for reals. Specifically, take a look at this New York Times article from last week where a woman traces the big data footprints, such as they are, back in time after receiving a pamphlet on living with Multiple Sclerosis. From the article:
Now she wondered whether one of those companies had erroneously profiled her as an M.S. patient and shared that profile with drug-company marketers. She worried about the potential ramifications: Could she, for instance, someday be denied life insurance on the basis of that profile? She wanted to track down the source of the data, correct her profile and, if possible, prevent further dissemination of the information. But she didn’t know which company had collected and shared the data in the first place, so she didn’t know how to have her entry removed from the original marketing list.
Two things about this. First, this happens all the time, to everyone, and especially to people who don’t know better than to search online for diseases they actually have. Second, the article seems particularly spooked by the idea that a woman who does not have a disease might be profiled as sick and face crazy consequences down the road. But what about a woman who actually is sick? Does she somehow deserve to have her life insurance denied?
The real worries about the intersection of big data and medical records, at least the ones I have, are completely missing from the article. They did mention that “improving and lowering the cost of health care for the world’s poor” will inevitably make it “necessary to automate some tasks that currently require human judgment.” Increased efficiency once again.
To be fair, they also talked about how Google tried to predict the flu in February 2009 but got it wrong. I’m not sure what they were trying to say except that it’s cool what we can try to do with big data.
Also, they discussed a Tokyo research team that collects data on 360 pressure points with sensors in a car seat, “each on a scale of 0 to 256.” I think that last part about the scale was added just so they’d have more numbers in the sentence – so mathematical!
And what do we get in exchange for all these sensor readings? The ability to distinguish drivers, so I guess you’ll never have to share your car, and the ability to sense if a driver slumps, to either “send an alert or automatically apply brakes.” I’d call that a questionable return for my investment of total body surveillance.
Big data, business, and the government
Make no mistake: this article is about how to use big data for your business. It goes ahead and suggests that whoever has the biggest big data has the biggest edge in business.
Of course, if you’re interested in treating your government office like a business, big data will give you an edge there too. Their example of Bloomberg’s big data initiative boils down to efficiency gains (read: we can do more with less, i.e. we can start firing government workers, or at least never hire more).
As for regulation, it is pseudo-dealt with via the discussion of market dominance. We are meant to understand that the only role government can or should have with respect to data is how to make sure the market is working efficiently. The darkest projected future is that of market domination by Google or Facebook:
But how should governments apply antitrust rules to big data, a market that is hard to define and is constantly changing form?
In particular, no discussion of how we might want to protect privacy.
Big data, big brother
I want to be fair to Cukier and Mayer-Schoenberger, because they do at least bring up the idea of big data as big brother. Their topic is serious. But their examples, once again, are incredibly weak.
Should we find likely-to-drop-out boys or likely-to-get-pregnant girls using big data? Should we intervene? Note the intention of this model would be the welfare of poor children. But how many models currently in production are targeting that demographic with that goal? Is this in any way at all a reasonable example?
Here’s another weird one: they talked about the bad metric used by US Secretary of Defense Robert McNamara in the Vietnam War, namely the number of casualties. By recasting this in the current language of statistics, though, they give us the impression that we could just be super careful about our metrics in the future and: problem solved. As we experts in data know, however, choosing a metric of success is a political decision, not a statistical one. And it’s the guy in charge who makes that decision, not some quant.
If you end up reading the Cukier and Mayer-Schoenberger article, please also read Julie Cohen’s draft of a soon-to-be published Harvard Law Review article called “What Privacy is For” where she takes on big data in a much more convincing and skeptical light than Cukier and Mayer-Schoenberger were capable of summoning up for their big data business audience.
I’m actually planning a post soon on Cohen’s article, which contains many nuggets of thoughtfulness, but for now I’ll simply juxtapose two ideas surrounding big data and innovation, giving Cohen the last word. First from the Cukier and Mayer-Schoenberger article:
Big data enables us to experiment faster and explore more leads. These advantages should produce more innovation.
Second from Cohen, where she uses the term “modulation” to describe, more or less, the effect of datafication on society:
When the predicate conditions for innovation are described in this way, the problem with characterizing privacy as anti-innovation becomes clear: it is modulation, not privacy, that poses the greater threat to innovative practice. Regimes of pervasively distributed surveillance and modulation seek to mold individual preferences and behavior in ways that reduce the serendipity and the freedom to tinker on which innovation thrives. The suggestion that innovative activity will persist unchilled under conditions of pervasively distributed surveillance is simply silly; it derives rhetorical force from the cultural construct of the liberal subject, who can separate the act of creation from the fact of surveillance. As we have seen, though, that is an unsustainable fiction. The real, socially-constructed subject responds to surveillance quite differently—which is, of course, exactly why government and commercial entities engage in it. Clearing the way for innovation requires clearing the way for innovative practice by real people, by preserving spaces within which critical self-determination and self-differentiation can occur and by opening physical spaces within which the everyday practice of tinkering can thrive.
This is a guest post by Josh Snodgrass.
As the Mathbabe noted recently, a lot of companies are collecting a lot of information about you. Thanks to two Firefox add-ons – Collusion (hat tip to Cathy) and NoScript – you can watch the process and even interfere with it to a degree.
Collusion is a beautiful app that creates a network graph of the various companies that have information about your web activity. Here is an example.
On this graph, I can see that nytimes.com has sent info on me to 2mdn.net, linkstorm.net, serving-sys.com, nyt.com and doubleclick.net. Who are these guys? All I know is that they know more about me than I know about them.
Doubleclick is particularly well-informed. They have gotten information on me from nytimes.com, yahoo.com and ft.com. You may not be able to see it in the picture, but there are faint links between the nodes. Some (few) of the nodes are sites I have visited. Most of the nodes, especially the central ones, are data collectors such as doubleclick and googleanalytics. They have gotten info from sites I’ve visited.
This graph is pretty sparse because I cleared all of my cookies recently. If I let it go for a week, the graph will be so crowded it won’t all fit on a screen.
Pretty much everyone is sharing info about me (and presumably you, too). And, I do mean everyone. Mathbabe is a dot near the top. Collusion tells me that mathbabe.org has shared info with google.com, wordpress.com, wp.com, 52shadesofgreed.com, youtube.com and quantserve.com. Google has passed the info on to googleusercontent.com and gstatic.com.
I can understand why. WordPress and presumably wp.com are hosting her blog. Google is providing search capabilities. 52shadesofgreed has an ad posted (You can still buy the decks but even better, come to Alt-Banking meetings and get one free). Youtube is providing some content. It is all innocent enough in a way but it means my surfing is being tracked even on non-commercial sites.
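Under the hood, what Collusion draws is just a directed graph: an edge from each site you visit to each third party it informs. A minimal sketch, using a handful of the edges mentioned above (the edge list here is reconstructed from my own graph, not from any Collusion API):

```python
# Minimal sketch of the kind of graph Collusion draws: edges run from
# sites I visited to the third parties they informed. This is a tiny,
# hand-picked subset of edges for illustration.
edges = [
    ("nytimes.com", "doubleclick.net"),
    ("yahoo.com", "doubleclick.net"),
    ("ft.com", "doubleclick.net"),
    ("mathbabe.org", "google.com"),
    ("mathbabe.org", "wordpress.com"),
]

# Invert the graph: which trackers know about which of my visits?
trackers = {}
for site, tracker in edges:
    trackers.setdefault(tracker, set()).add(site)

# doubleclick.net is the best informed: it can link my visits to three
# different sites into one profile of me.
best = max(trackers, key=lambda t: len(trackers[t]))
print(best, sorted(trackers[best]))
```

That inversion is the whole privacy problem in miniature: no single site knows much, but a central node that many sites feed knows a lot.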
These are the conveniences of modern life. Try blocking all cookies and you will find it pretty inconvenient to use the internet. It would be nice to be selective about cookies but that seems very hard. All of this is happening even though I’ve told my browser not to allow third-party cookies. If you look at cookie policies, it seems you have two alternatives:
- Block all cookies and the site won’t work very well
- Allow cookies and we will send your info to whomever we choose (within the law, of course).
So, it would be nice if there were a law that constrained what they do. My impression is that we Americans have virtually no protection. Europe is better from what I understand.
I’m trying to access a site and there are scripts waiting to run from:
- Po.st
- Scorecard.com
Clearly a lot of those are about tracking me or showing me ads. As with cookies, if you block all the scripts, the site probably won’t function properly. But the great thing about NoScript is that it makes it easy to allow scripts one by one. So, you can allow the ones that look more legitimate until the site works well enough. Also, you can allow them temporarily.
NoScript and Collusion are great. But mostly they are making me more aware of all the tracking that is going on. And they are also making it clear how hard it is to keep your privacy.
This isn’t just on the internet. Years ago, an economist had an idea about having people put boxes on their cars that would track where they went and charge them for driving, particularly at high-congestion times and places. The motivation was to reduce the kind of travel that causes a lot of pollution while no one actually gets anywhere. But people ridiculed the idea. Who would let themselves be tracked everywhere they went?
Well, 40 years later, nearly everyone who has a car has an EZ-pass. And, even if you don’t, they will take a picture of your license plate and keep it on file. All in the name of improving traffic flow.
And, if you use credit cards, there are some big companies that have records of your spending.
What to do about this?
I don’t know.
I like conveniences. Keeping your privacy is hard. DuckDuckGo is a search engine that doesn’t track you (another hat tip to Cathy). But their search results are not as good as Google’s.
Google has all these nice tools that are free. Even if you don’t use them, the web sites you visit surely do. And if they do, Google is getting information from them, about you.
This experience has made me even more of a fan of Firefox and the add-ons available for it. But what else should I use? And none of these tools is going to be perfect.
What information gets tracked? A lot of privacy policies say they don’t give out identifying information. But how can we tell?
Just keeping on top of what is going on is hard. For example: what are LSOs? They seem to be a kind of “supercookie.” And Better Privacy seems to be an add-on that helps with them.
“Our emails may contain a single, campaign-unique “web beacon pixel” to tell us whether our emails are opened and verify any clicks through to links or advertisements within the email”
Who knew that a pixel could do so much?
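The pixel itself does almost nothing; the trick is that fetching a uniquely named image is itself the signal. Here’s a sketch of how one gets built (the domain and parameter names are invented for illustration; real email platforms each have their own scheme):

```python
# Sketch of how a "web beacon pixel" works: the email embeds a 1x1 image
# whose URL carries a campaign- and recipient-unique token. When your mail
# client fetches the image, the sender's server logs the token -- the
# fetch itself is the "this email was opened" signal.
# (tracker.example.com and the query parameters are invented names.)
import uuid

def beacon_html(campaign, recipient):
    token = uuid.uuid4().hex  # unique per email sent
    url = (f"https://tracker.example.com/pixel.gif"
           f"?c={campaign}&r={recipient}&t={token}")
    return f'<img src="{url}" width="1" height="1" alt="">'

print(beacon_html("spring-sale", "user123"))
```

Blocking remote images in your mail client is the usual defense, which is exactly why so many clients now do it by default.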
The truth is, I want to see these sites. So I am enabling scripts (some of them, as few as I can). The question is how to make the tradeoff. Figuring that out is time consuming. I’ve got better things to do with my life.
I’m going to go read a book.
The Alternative Banking group of #OWS is showing up bright and early tomorrow morning to protest at Citigroup’s annual shareholder meeting. Details are: we meet outside the Hilton Hotel, Sixth Avenue between 53rd and 54th Streets, tomorrow, April 24th, from 8-10 am. We’ve already made some signs (see below).
Here are ten reasons for you to join us.
1) The Glass-Steagall Act, which had protected the banking system since 1933, was repealed in order to allow Citibank and Traveler’s Insurance to merge.
In fact they merged before the act was even revoked, giving us a great way to date the moment when politicians started taking orders from bankers – at the time, President Bill Clinton publicly declared that “the Glass–Steagall law is no longer appropriate.”
2) The crimes Citi has committed have not been met with reasonable punishments.
From this Bloomberg article:
In its complaint against Citigroup, the SEC said the bank misled investors in a $1 billion fund that included assets the bank had projected would lose money. At the same time it was selling the fund to investors, Citigroup took a short position in many of the underlying assets, according to the agency.
The SEC only attempted to fine Citi $285 million, even though Citi’s customers lost on the order of $600 million from the fraud. Moreover, Citi was not required to admit wrongdoing. Judge Rakoff refused to sign off on the deal and it’s still pending. Citi is one of those banks that is simply too big to jail.
3) We’d like our pen back, Mr. Weill. Going back to repealing Glass-Steagall. Let’s take an excerpt from this article:
…at the signing ceremony of the Gramm-Leach-Bliley, aka the Glass Steagall repeal act, Clinton presented Weill with one of the pens he used to “fine-tune” Glass-Steagall out of existence, proclaiming, “Today what we are doing is modernizing the financial services industry, tearing down those antiquated laws and granting banks significant new authority.”
Weill has since decided that repealing Glass-Steagall was a mistake.
4) Do you remember the Plutonomy Memos? I wrote about them here. Here’s a tasty excerpt which helps us remember when the class war was started and by whom:
We project that the plutonomies (the U.S., UK, and Canada) will likely see even more income inequality, disproportionately feeding off a further rise in the profit share in their economies, capitalist-friendly governments, more technology-driven productivity, and globalization… Since we think the plutonomy is here, is going to get stronger… It is a good time to switch out of stocks that sell to the masses and back to the plutonomy basket.
5) Robert Rubin – enough said. To say just a wee bit more, let’s look at the Bloomberg Businessweek article, “Rethinking Robert Rubin”:
Rubinomics—his signature economic philosophy, in which the government balances the budget with a mix of tax increases and spending cuts, driving borrowing rates down—was the blueprint for an economy that scraped the sky. When it collapsed, due in part to bank-friendly policies that Rubin advocated, he made more than $100 million while others lost everything.
That $100 million was made at Citigroup, which was later bailed out because of bets Rubin helped them make. He has thus far shown no remorse.
6) The Revolving Door problems Citigroup has. Bill Moyers has a great article on the outrageous revolving door going straight from banks to the Treasury and the White House. What with Rubin and Lew, Citigroup seems pretty much a close second behind Goldman Sachs for this sport.
8) The bailout was actually for Citigroup. If you’ve read Sheila Bair’s book Bull by the Horns, you’ve seen the bailout from her inside perspective, and it was this: Citigroup was really the bank that needed it most. In fact, the whole bailout was a cover for funneling money to Citi.
9) The ongoing Fed dole. The bailout is still going on – and Citigroup is currently benefitting from the easy money that the Fed is offering, not to mention the $83 billion taxpayer subsidy. WTF?!
10) Lobbying for yet more favors. Citi spent $62 million from 2001 to 2010 on lobbying in Washington. What’s their return on that investment, do you think?
Join us tomorrow morning! Details here.
Last night I went to an event at Barnard where Ina Drew, former head of JP Morgan Chase’s Chief Investment Office (CIO) and the executive who oversaw the London Whale fiasco, was warmly hosted and interviewed by Barnard president Debora Spar.
[Aside: I was going to link to Ina Drew's wikipedia entry in the above paragraph, but it was so sanitized that I couldn't get myself to do it. She must have paid off lots of wiki editors to keep herself this clean. WTF, wikipedia??]
A little background in case you don’t know who this Drew woman is. She was in charge of balance-sheet risk management and somehow managed not to notice losing $6.2 billion in the group she oversaw, which was meant to hedge risk, at least according to CEO Jamie Dimon. She made $15 million per year for her efforts and recently retired.
In her recent Congressional testimony (see Example 3 in this recent post), she threw the quants with their Ph.D.’s under the bus even though the Senate report of the incident noted multiple risk limits being exceeded and ignored, and then risk models themselves changed to look better, as well as the “whale” trader Bruno Iksil‘s desire to get out of his losing position being resisted by upper management (i.e. Ina Drew).
I’m not going to defend Iksil for that long, but let’s be clear: he fucked up, and then was kept in his ridiculous position by Ina Drew because she didn’t want to look bad. His angst is well-documented in the Senate report, which you should read.
Actually, the whole story is somewhat more complicated but still totally stupid: instead of backing out of certain credit positions the old-fashioned and somewhat expensive way, the CIO office decided to try to reduce its capital requirements via reducing (manipulated) VaR, but ended up increasing their capital requirements in other, non-VaR ways (specifically, the “comprehensive risk measure”, which isn’t as manipulable as VaR). Read more here.
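The VaR game at the heart of that story is easy to see in a toy example. VaR is just a quantile of a modeled loss distribution, so changing the model (or, as below, the lookback window) changes the reported risk number while the actual position stays exactly the same. All numbers here are invented:

```python
# Toy illustration (invented numbers) of why VaR is manipulable:
# 95% VaR here is simply the 95th-percentile loss of a modeled
# loss history. Swap the model's inputs and the reported number
# moves, even though the portfolio hasn't changed at all.
def var_95(losses):
    ordered = sorted(losses)
    idx = int(0.95 * len(ordered)) - 1
    return ordered[idx]

# Same position, two "models": a long lookback that includes the
# stress days, and a conveniently short one that excludes them.
long_window = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]   # includes bad days
short_window = [1, 2, 2, 3, 3, 4, 5, 8]          # the calm days only

print(var_95(long_window), var_95(short_window))
```

This is a deliberately crude quantile estimator, but the mechanism is the real one: whoever controls the model inputs controls the reported risk, which is why a measure like the comprehensive risk measure, which is harder to tune, bit back.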
Maybe Ina is going to claim innocence, that she had no idea what was going on. In that case, she had no control over her group and its huge losses. So either she’s heinously greedy or heinously incompetent. My money’s on “incompetent” after seeing and listening to her last night. My live Twitter feed from the event is available here.
We featured Ina Drew on our “52 Shades of Greed” card deck as the Queen of diamonds:
Back to the event.
Why did we cart out Ina Drew in front of an audience of young Barnard women last night? Were we advertising a career in finance to them? Is Drew a role model for these young people?
The best answers I can come up with are terrible:
- She’s a Barnard mom (her daughter was in the audience). Not a trivial consideration, especially considering the potential donor angle.
- President Spar is on the board of Goldman Sachs, and there’s a certain loyalty among elites, which includes publicly celebrating colossal failures. Possible, but why now? Is there some kind of perverted solidarity among women who should be in jail but insist on considering themselves role models? Please count me out of that flavor of feminism.
- President Spar and Ina Drew actually don’t think Drew did anything wrong. This last theory is the weirdest but is the best supported by the tone of the conversation last night. It gives me the creeps. In any case I can no longer imagine supporting Barnard’s mission with that woman as president. It’s sad considering my fond feelings for the place where I was an assistant professor for two years in the math department and which treated me well.
Please suggest other ideas I’ve failed to mention.
Warmup: Automatic Grading Models
Before I get to my main take-down of the morning, let me warm up with an appetizer of sorts: have you been hearing a lot about new models that automatically grade essays?
Does it strike you that there’s something wrong with that idea but you don’t know what it is?
Here’s my take. While it’s true that it’s possible to train a model to grade essays similarly to what a professor now does, that doesn’t mean we can introduce automatic grading – at least not if the students in question know that’s what we’re doing.
There’s a feedback loop, whereby if the students know their essays will be automatically graded, then they will change what they’re doing to optimize for good automatic grades rather than, say, a cogent argument.
For example, a student might download a grading app themselves (wouldn’t you?) and run their essay through the machine until it gets a great grade. Not enough long words? Put them in! No need to make sure the sentences make sense, because the machine doesn’t understand grammar!
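That gaming loop fits in a few lines of Python. The “grader” below is an invented, deliberately dumb stand-in that just rewards long words; no real essay-grading model is this simple, but the feedback dynamic is the same for any model the student can query:

```python
# The feedback loop in miniature: a student with access to the grader
# optimizes against it until the score maxes out, without ever writing
# a cogent argument. auto_grade is an invented toy, not a real model.
def auto_grade(essay):
    words = essay.split()
    long_words = sum(1 for w in words if len(w) >= 8)
    return min(100, 50 + 10 * long_words)

essay = "cats are nice"
while auto_grade(essay) < 100:
    # Not an argument, just a long word.
    essay += " antidisestablishmentarianism"

print(auto_grade(essay))
```

The moment the grading function becomes known to the people being graded, it stops measuring what it was trained to measure.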
This is, in fact, a great example where people need to take into account the (obvious when you think about them) feedback loops that their models will enter in actual use.
Job Hiring Models
Now on to the main course.
In this week’s Economist there is an essay about the new widely-used job hiring software and how awesome it is. It’s so efficient! It removes the biases of those pesky recruiters! Here’s an excerpt from the article:
The problem with human-resource managers is that they are human. They have biases; they make mistakes. But with better tools, they can make better hiring decisions, say advocates of “big data”.
So far “the machine” has made observations such as:
- Good if the candidate uses a browser you have to download, like Chrome.
- A criminal record is not as bad as one might expect.
- Neutral on job hopping.
- Great if you live nearby.
- Good if you are on Facebook.
- Bad if you’re on Facebook and every other social networking site as well.
Now, I’m all for learning to fight against our biases and hiring people who might not otherwise be given a chance. But I’m not convinced this will happen very often – the people using the software can always train the model to include their biases and then point to the machine and say, “the machine told me to do it.”
What I really object to, however, is the accumulating amount of data that is being collected about everyone by models like this.
It’s one thing for an algorithm to take my CV in and note that I misspelled my alma mater, but it’s a different thing altogether to scour the web for my online profile trail (via Acxiom, for example), to look up my credit score, and maybe even to see my persistence score as measured by my past online education activities (soon available for your 7-year-old as well!).
As a modeler, I know how hungry the model can be. It will ask for all of this data and more. And it will mean that nothing you’ve ever done wrong, no fuck-up that you wish to forget, will ever be forgotten. You can no longer reinvent yourself.
Forget mobility, forget the American Dream, you and everyone else will be funneled into whatever job and whatever life the machine has deemed you worthy of. WTF.