Archive
How much would you pay to be my friend?
I am on my way to D.C. for a health analytics conference, where I hope to learn the state of the art for health data and modeling. So stay tuned for updates on that.
In the meantime, ponder this concept (hat tip Matt Stoller, who describes it as ‘neoliberal prostitution’). It’s a dating website called “What’s Your Price?” where suitors bid for dates.
What’s creepier, the sex-for-pay aspect of this, or the it’s-possibly-not-about-sex-it’s-about-dating aspect? I’m gonna go with the latter, personally, since it’s a new idea for me. What else can I monetize that I’ve been giving away too long for free?
Hey, kid, you want a bedtime story? It’s gonna cost you.
Let’s enjoy the backlash against hackathons
As much as I have loved my DataKind hackathons, where I get to meet a bunch of friendly nerds who are spend their weekend trying to solve problems using technology, I also have my reservations about the whole weekend hackathon culture, especially when:
- It’s a competition, so really you’re not solving problems as much as boasting, and/or
- you’re trying to solve a problem that nobody really cares about but which might make someone money, so you’re essentially working for free for a future VC asshole, and/or
- you kind of solve a problem that matters, but only for people like you (example below).
As Jake Porway mentions in this fine piece, having data and good intentions do not mean you can get serious results over a weekend. From his essay:
Without subject matter experts available to articulate problems in advance, you get results like those from the Reinvent Green Hackathon. Reinvent Green was a city initiative in NYC aimed at having technologists improve sustainability in New York. Winners of this hackathon included an app to help cyclists “bikepool” together and a farmer’s market inventory app. These apps are great on their own, but they don’t solve the city’s sustainability problems. They solve the participants’ problems because as a young affluent hacker, my problem isn’t improving the city’s recycling programs, it’s finding kale on Saturdays.
Don’t get me wrong, I’ve made some good friends and created some great collaborations via hackathons (and especially via Jake). But it only gets good when there’s major planning beforehand, a real goal, and serious follow-up. Actually a weekend hackathon is, at best, a platform from which to launch something more serious and sustained.
People who don’t get that are there for something other than that. What is it? Maybe this parody hackathon announcement can tell us.
It’s called National Day of Hacking Your Own Assumptions and Entitlement, and it has a bunch of hilarious and spot-on satirical commentary, including this definition of a hackathon:
Basically, a bunch of pallid millenials cram in a room and do computer junk. Harmless, but very exciting to the people who make money off the results.
This question from a putative participant of an “entrepreneur”-style hackathon:
“Why do we insist on applying a moral or altruistic gloss to our moneymaking ventures?”
And the internal thought process of a participant in a White House-sponsored hackathon:
I realized, especially in the wake of the White House murdering Aaron Swartz, persecuting/torturing Bradley Manning and threatening Jeremy Hammond with decades behind bars for pursuit of open information and government/corporate accountability that really, no-one who calls her or himself a “hacker” has any business partnering with an entity as authoritarian, secretive and tyrannical as the White House– unless of course you’re just a piece-of-shit money-grubbing disingenuous bootlicker who uses the mantle of “hackerdom” to add a thrilling and unjustified outlaw sheen to your dull life of careerist keyboard-poking for the status quo.
New Jersey at risk of implementing untested VAM-like teacher evaluation model
This is a guest post by Eugene Stern.
A big reason I love this blog is Cathy’s war on crappy models. She has posted multiple times already about the lousy performance of models that rate teachers based on year-to-year changes in student test scores (for example, read about it here). Much of the discussion focuses on the model used in New York City, but such systems have been, or are being, put in place all over the country. I want to let you know about the version now being considered for use across the river, in New Jersey. Once you’ve heard more, I hope you’ll help me try to stop it.
VAM Background
A little background if you haven’t heard about this before. Because it makes no sense to rate teachers based on students’ absolute grades or test scores (not all students start at the same place each year), the models all compare students’ test scores against some baseline. The simplest thing to do is to compare each student’s score on a test given at the end of the school year against their score on a test given at the end of the previous year. Teachers are then rated based on how much their students’ scores improved over the year.
Comparing with the previous year’s score controls for the level at which students start each year, but not for other factors beside the teacher that affect how much they learn. This includes attendance, in-school environment (curriculum, facilities, other students in the class), out-of-school learning (tutoring, enrichment programs, quantity and quality of time spent with parents/caregivers), and potentially much more. Fancier models try to take these into account by comparing each student’s end of year score with a predicted score. The predicted score is based both on the student’s previous score and on factors like those above. Improvement beyond the predicted score is then attributed to the teacher as “value added” (hence the name “value-added models,” or VAM) and turned into a teacher rating in some way, often using percentiles. One such model is used to rate teachers in New York City.
It’s important to understand that there is no single value-added model, rather a family of them, and that the devil is in the details. Two different teacher rating systems, based on two models of the predicted score, may perform very differently – both across the board, and in specific locations. Different factors may be more or less important depending on where you are. For example, income differences may matter more in a district that provides few basic services, so parents have to pay to get extracurriculars for their kids. And of course the test itself matters hugely as well.
Testing the VAM models
Teacher rating models based on standardized tests have been around for 25 years or so, but two things have happened in the last decade:
- Some people started to use the models in formal teacher evaluation, including tenure decisions.
- Some (other) people started to test the models.
This did not happen in the order that one would normally like. Wanting to make “data-driven decisions,” many cities and states decided to start rating teachers based on “data” before collecting any data to validate whether that “data” was any good. This is a bit like building a theoretical model of how cancer cells behave, synthesizing a cancer drug in the lab based on the model, distributing that drug widely without any trials, then waiting around to see how many people die from the side effects.
The full body count isn’t in yet, but the models don’t appear to be doing well so far. To look at some analysis of VAM data in New York City, start here and here. Note: this analysis was not done by the city but by individuals who downloaded the data after the city had to make it available because of disclosure laws.
I’m not aware of any study on the validity of NYC’s VAM ratings done by anyone actually affiliated with the city – if you know of any, please tell me. Again, the people preaching data don’t seem willing to actually use data to evaluate the quality of the systems they’re putting in place.
Assuming you have more respect for data than the mucky-mucks, let’s talk about how well the models actually do. Broadly, two ways a model can fail are being biased and being noisy. The point of the fancier value-added models is to try to eliminate bias by factoring in everything other than the teacher that might affect a student’s test score. The trouble is that any serious attempt to do this introduces a bunch of noise into the model, to the degree that the ratings coming out look almost random.
You’d think that a teacher doesn’t go from awful to great or vice versa in one year, but the NYC VAM ratings show next to no correlation in a teacher’s rating from one year to the next. You’d think that a teacher either teaches math well or doesn’t, but the NYC VAM ratings show next to no correlation in a teacher’s rating teaching a subject to one grade and their rating teaching it to another – in the very same year! (Gary Rubinstein’s blog, linked above, documents these examples, and a number of others.) Again, this is one particular implementation of a general class of models, but using such noisy data to make significant decisions about teachers’ careers seems nuts.
What’s happening in New Jersey
With all this as background, let’s turn to what’s happening in New Jersey.
You may be surprised that the version of the model proposed by Chris Christie‘s administration (the education commissioner is Christie appointee Chris Cerf, who helped put VAM in place in NYC) is about the simplest possible. There is no attempt to factor out bias by trying to model predicted scores, just a straight comparison between this year’s standardized test score and last year’s. For an overview, see this.
In more detail, the model groups together all students with the same score on last year’s test, and represents each student’s progress by their score on this year’s test, viewed as a percentile across this group. That’s it. A fancier version uses percentiles calculated across all students with the same score in each of the last several years. These can’t be calculated explicitly (you may not find enough students that got exactly the same score each the last few years), so they are estimated, using a statistical technique called quantile regression.
By design, both the simple and the fancy version ignore everything about a student except their test scores. As a modeler, or just as a human being, you might find it silly not to distinguish between a fourth grader in a wealthy suburb who scored 600 on a standardized test from a fourth grader in the projects with the same score. At least, I don’t know where to find a modeler who doesn’t find it silly, because nobody has bothered to study the validity of using this model to rate teachers. If I’m wrong, please point me to a study.
Politics and SGP
But here we get into the shell game of politics, where rating teachers based on the model is exactly the proposal that lies at the end of an impressive trail of doubletalk. Follow the bouncing ball.
These models, we are told, differ fundamentally from VAM (which is now seen as somewhat damaged goods politically, I suspect). While VAM tried to isolate teacher contribution, these models do no such thing – they are simply measuring student progress from year to year, which, after all, is what we truly care about. The models have even been rebranded with a new name: student growth percentiles, or SGP. SGP is sold as just describing student progress rather than attributing it to teachers, there can’t be any harm in that, right? – and nothing that needs validation, either. And because SGP is such a clean methodology – if you’re looking for a data-driven model to use for broad “educational assessment,” don’t get yourself into that whole VAM morass, use SGP instead!
Only before you know it, educational assessment turns into, you guessed it, rating teachers. That’s right: because these models aren’t built to rate teachers, they can focus on the things that really matter (student progress), and thus end up being – wait for it – much better for rating teachers! War is peace, friends. Ignorance is strength.
Creators of SGP
You can find a good discussion of SGP’s and their use in evaluation here, and a lot more from the same author, the impressively prolific Bruce Baker, here. Here’s a response from the creators of SGP. They maintain that information about student growth is useful (duh), and agree that differences in SGP’s should not be attributed to teachers (emphasis mine):
Large-scale assessment results are an important piece of evidence but are not sufficient to make causal claims about school or teacher quality.
SGP and teacher evaluations
But guess what?
The New Jersey Board of Ed and state education commissioner Cerf are putting in place a new teacher evaluation code, to be used this coming academic year and beyond. You can find more details here and here.
Summarizing: for math and English teachers in grades 4-8, 30% of their annual evaluation next year would be mandated by the state to come from those very same SGP’s that, according to their creators, are not sufficient to make causal claims about teacher quality. These evaluations are the primary input in tenure decisions, and can also be used to take away tenure from teachers who receive low ratings.
The proposal is not final, but is fairly far along in the regulatory approval process, and would become final in the next several months. In a recent step in the approval process, the weight given to SGP’s in the overall evaluation was reduced by 5%, from 35%. However, the 30% weight applies next year only, and in the future the state could increase the weight to as high as 50%, at its discretion.
Modeler’s Notes
Modeler’s Note #1: the precise weight doesn’t really matter. If the SGP scores vary a lot, and the other components don’t vary very much, SGP scores will drive the evaluation no matter what their weight.
Modeler’s Note #2: just reminding you again that this data-driven framework for teacher evaluation is being put in place without any data-driven evaluation of its effectiveness. And that this is a feature, not a bug – SGP has not been tested as an attribution tool because we keep hearing that it’s not meant to be one.
In a slightly ironic twist, commissioner Cerf has responded to criticisms that SGP hasn’t been tested by pointing to a Gates Foundation study of the effectiveness of… value-added models. The study is here. It draws pretty positive conclusions about how well VAM’s work. A number of critics have argued, pretty effectively, that the conclusions are unsupported by the data underlying the study, and that the data actually shows that VAM’s work badly. For a sample, see this. For another example of a VAM-positive study that doesn’t seem to stand up to scrutiny, see this and this.
Modeler’s Role Play #1
Say you were the modeler who had popularized SGP’s. You’ve said that the framework isn’t meant to make causal claims, then you see New Jersey (and other states too, I believe) putting a teaching evaluation model in place that uses SGP to make causal claims, without testing it first in any way. What would you do?
So far, the SGP mavens who told us that “Large-scale assessment results are an important piece of evidence but are not sufficient to make causal claims about school or teacher quality” remain silent about the New Jersey initiative, as far as I know.
Modeler’s Role Play #2
Now you’re you again, and you’ve never heard about SGP’s and New Jersey’s new teacher evaluation code until today. What do you do?
I want you to help me stop this thing. It’s not in place yet, and I hope there’s still time.
I don’t think we can convince the state education department on the merits. They’ve made the call that the new evaluation system is better than the current one or any alternatives they can think of, they’re invested in that decision, and we won’t change their minds directly. But we can make it easier for them to say no than to say yes. They can be influenced – by local school administrators, state politicians, the national education community, activists, you tell me who else. And many of those people will have more open minds. If I tell you, and you tell the right people, and they tell the right people, the chain gets to the decision makers eventually.
I don’t think I could convince Chris Christie, but maybe I could convince Bruce Springsteen if I met him, and maybe Bruce Springsteen could convince Chris Christie.
VAM-anifesto
I thought we could start with a manifesto – a direct statement from the modeling community explaining why this sucks. Directed at people who can influence the politics, and signed by enough experts (let’s get some big names in there) to carry some weight with those influencers.
Can you help? Help write it, sign it, help get other people to sign it, help get it to the right audience. Know someone whose opinion matters in New Jersey? Then let me know, and help spread the word to them. Use Facebook and Twitter if it’ll help. And don’t forget good old email, phone calls, and lunches with friends.
Or, do you have a better idea? Then put it down. Here. The comments section is wide open. Let’s not fall back on criticizing the politicians for being dumb after the fact. Let’s do everything we can to keep them from doing this dumb thing in the first place.
Shame on us if we can’t make this right.
Fight back against surveillance using TrackMeNot, TrackMeNot mobile?
After two days of travelling to the west coast and back, I’m glad to be back to my blog (and, of course, my coffee machine, which is the real source of my ability to blog every morning without distraction: it makes coffee at the push of a button, and that coffee has a delicious amount of caffeine).
Yesterday at the hotel I grabbed a free print edition of the Wall Street Journal to read on the plane, and I was super interested in this article called Phone Firm Sells Data on Customers. They talk about how phone companies (Verizon, specifically) are selling location data and browsing data about customers, how some people might be creeped out by this, and then they say:
The new offerings are also evidence of a shift in the relationship between carriers and their subscribers. Instead of merely offering customers a trusted conduit for communication, carriers are coming to see subscribers as sources of data that can be mined for profit, a practice more common among providers of free online services like Google Inc. and Facebook Inc.
Here’s the thing. It’s one thing to make a deal with the devil when I use Facebook: you give me something free, in return I let you glean information about me. But in terms of Verizon, I pay them like $200 per month for my family’s phone usage. That’s not free! Fuck you guys for turning around and selling my data!
And how are marketers going to use such location data? They will know how desperate you are for their goods and charge you accordingly. Like this for example, but on a much wider scale.
There are a two things I can do to object to this practice. First, I write this post and others, railing against such needless privacy invasion practices. Second, I can go to Verizon, my phone company, and get myself off the list. The instructions for doing so seem to be here, but I haven’t actually followed them yet.
Here’s what I wish a third option were: a mobile version of Trackmenot, which I learned about last week from Annelies Kamran.
Trackmenot, created by Daniel C. Howe and Helen Nissenbaum at what looks like the CS department of NYU, confuses the data gatherers by giving them an overload of bullshit information.
Specifically, it’s a Firefox add-on which sends you to all sorts of websites while you’re not actually using your browser. The data gatherers get endlessly confused about what kind of person you actually are this way, thereby fucking up the whole personal data information industry.
I have had this idea in the past, and I’m super happy it already exists. Now can someone do it for mobile please? Or even better, tell me it already exists?
Mr. Ratings Reformer Goes to Washington: Some Thoughts on Financial Industry Activism
This is a guest post by Marc Joffe, the principal consultant at Public Sector Credit Solutions, an organization that provides data and analysis related to sovereign and municipal securities. Previously, Joffe was a Senior Director at Moody’s Analytics for more than a decade.
Note to readers: for a bit of background on the SEC Credit Ratings Roundtable and the Franken Amendment see this recent mathbabe post.
I just returned from Washington after participating in the SEC’s Credit Ratings Roundtable. The experience was very educational, and I wanted to share what I’ve learned with readers interested in financial industry reform.
First and foremost, I learned that the Franken Amendment is dead. While I am not a proponent of this idea – under which the SEC would have set up a ratings agency assignment authority – I do welcome its intentions and mourn its passing. Thus, I want to take some time to explain why I think this idea is dead, and what financial reformers need to do differently if they want to see serious reforms enacted.
The Franken Amendment, as revised by the Dodd Frank conference committee, tasked the SEC with investigating the possibility of setting up a ratings assignment authority and then executing its decision. Within the SEC, the responsibility for Franken Amendment activities fell upon the Office of Credit Ratings (OCR), a relatively new creature of the 2006 Credit Rating Agency Reform Act.
OCR circulated a request for comments – posting the request on its web site and in the federal register – a typical SEC procedure. The majority of serious comments OCR received came from NRSROs and others with a vested interest in perpetuating the status quo or some close approximation thereof. Few comments came from proponents of the Franken Amendment, and some of those that did were inarticulate (e.g., a note from Joe Sixpack of Anywhere, USA saying that rating agencies are terrible and we just gotta do something about them).
OCR summarized the comments in its December 2012 study of the Franken Amendment. Progressives appear to have been shocked that OCR’s work product was not an originally-conceived comprehensive blueprint for a re-imagined credit rating business. Such an expectation is unreasonable. SEC regulators sit in Washington and New York; not Silicon Valley. There is little upside and plenty of political downside to taking major risks. Regulators are also heavily influenced by the folks they regulate, since these are the people they talk to on a day-to-day basis.
Political theorists Charles Lindblom and Aaron Wildavsky developed a theory that explains the SEC’s policymaking process quite well: it is called incrementalism. Rather than implement brand new ideas, policymakers prefer to make marginal changes by building upon and revising existing concepts.
While I can understand why Progressives think the SEC should “get off its ass” and really fix the financial industry, their critique is not based in the real world. The SEC is what it is. It will remain under budget pressure for the forseeable future because campaign donors want to restrict its activities. Staff will always be influenced by financial industry players, and out-of-the-box thinking will be limited by the prevailing incentives.
Proponents of the Franken Amendment and other Progressive reforms have to work within this system to get their reforms enacted. How? The answer is simple: when a request for comment arises they need to stuff the ballot box with varying and well informed letters supporting reform. The letters need to place proposed reforms within the context of the existing system, and respond to anticipated objections from status quo players. If 20 Progressive academics and Occupy-leaning financial industry veterans had submitted thoughtful, reality-based letters advocating the Franken Amendment, I believe the outcome would have been very different. (I should note that Occupy the SEC has produced a number of comment letters, but they did not comment on the Franken Amendment and I believe they generally send a single letter).
While the Franken Amendment may be dead, I am cautiously optimistic about the lifecycle of my own baby: open source credit rating models. I’ll start by explaining how I ended up on the panel and then conclude by discussing what I think my appearance achieved.
The concept of open source credit rating models is extremely obscure. I suspect that no more than a few hundred people worldwide understand this idea and less than a dozen have any serious investment in it. Your humble author and one person on his payroll, are probably the world’s only two people who actually dedicated more than 100 hours to this concept in 2012.
That said, I do want to acknowledge that the idea of open source credit rating models is not original to me – although I was not aware of other advocacy before I embraced it. Two Bay Area technologists started FreeRisk, a company devoted to open source risk models, in 2009. They folded the company without releasing a product and went on to more successful pursuits. FreeRisk left a “paper” trail for me to find including an article on the P2P Foundation’s wiki. FreeRisk’s founders also collaborated with Cate Long, a staunch advocate of financial markets transparency, to create riski.us – a financial regulation wiki.
In 2011, Cathy O’Neil (a.k.a. Mathbabe) an influential Progressive blogger who has a quantitative finance background ran a post about the idea of open source credit ratings, generating several positive comments. Cathy also runs the Alternative Banking group, an affiliate of Occupy Wall Street that attracts a number of financially literate activists.
I stumbled across Cathy’s blog while Googling “open source credit ratings”, sent her an email, had a positive phone conversation and got an invitation to address her group. Cathy then blogged about my open source credit rating work. This too was picked up on the P2P Foundation wiki, leading ultimately to a Skype call with the leader of the P2P Foundation, Michel Bauwens. Since then, Michel – a popularizer of progressive, collaborative concepts – has offered a number of suggestions about organizations to contact and made a number of introductions.
Most of my outreach attempts on behalf of this idea – either made directly or through an introduction – are ignored or greeted with terse rejections. I am not a proven thought leader, am not affiliated with a major research university and lack a resume that includes any position of high repute or authority. Consequently, I am only a half-step removed from the many “crackpots” that send around their unsolicited ideas to all and sundry.
Thus, it is surprising that I was given the chance to address the SEC Roundtable on May 14. The fact that I was able to get an invitation speaks well of the SEC’s process and is thus worth recounting. In October 2012, SEC Commissioner Dan Gallagher spoke at the Stanford Rock Center on Corporate Governance. He mentioned that the SEC was struggling with the task of implementing Dodd Frank Section 939A, which calls for the replacement of credit ratings in federal regulations, such as those that govern asset selection by money market funds.
After his talk, I pitched him the idea of open source credit ratings as an alternative creditworthiness standard that would satisfy the intentions of 939A. He suggested that I write to Tom Butler, head of the Office of Credit Ratings (OCR) and copy him. This led to a number of phone calls and ultimately a presentation to OCR staff in New York in January. Staff members that joined the meeting were engaged and asked good questions. I connected my proposal to an earlier SEC draft regulation which would have required structured finance issuers to publish cashflow waterall models in Python – a popular open source language.
I walked away from the meeting with the perception that, while they did not want to reinvent the industry, OCR staff were sincerely interested in new ideas that might create incremental improvements. That meeting led to my inclusion in the third panel of the Credit Ratings Roundtable.
For me, the panel discussion itself was mostly positive. Between the opening statement, questions and discussion, I probably had about 8 minutes to express my views. I put across all the points I hoped to make and even received a positive comment from one of the other panelists. On the downside, only one commissioner attended my panel – whereas all five had been present at the beginning of the day when Al Franken, Jules Kroll, Doug Peterson and other luminaries held the stage.
The roundtable generated less media attention than I expected, but I got an above average share of the limited coverage relative to the day’s other 25 panelists. The highlight was a mention in the Wall Street Journal in its pre-roundtable coverage.
Perhaps the fact that I addressed the SEC will make it easier for me to place op-eds and get speaking engagements to promote the open source ratings concept. Only time will tell. Ultimately, someone with a bigger reputation than mine will need to advocate this concept before it can progress to the next level.
Also, the idea is now part of the published record of SEC deliberations. The odds of it getting into a proposed regulation remain long in the near future, but these odds are much shorter than they were prior to the roundtable.
Political scientist John Kingdon coined the term “policy entrepreneurs” to describe people who look for and exploit opportunities to inject new ideas into the policy discussion. I like to think of myself as a policy entrepreneur, although I have a long way to go before I become a successful one. If you have read this far and also have strongly held beliefs about how the financial system should improve, I suggest you apply the concepts of incrementalism and policy entrepreneurship to your own activism.
Salt it up, baby!
An article in yesterday’s Science Times explained that limiting the salt in your diet doesn’t actually improve health, and could in fact be bad for you. That’s a huge turn-around for a public health rule that has run very deep.
How can this kind of thing happen?
Well, first of all epidemiologists use crazy models to make predictions on things, and in this case what happened was they saw a correlation between high blood pressure and high salt intake, and they saw a separate correlation between high blood pressure and death, and so they linked the two.
Trouble is, while very low salt intake might lower blood pressure a little bit, it also for what ever reason makes people die a wee bit more often.
As this Scientific American article explains, that “little bit” is actually really small:
Over the long-term, low-salt diets, compared to normal diets, decreased systolic blood pressure (the top number in the blood pressure ratio) in healthy people by 1.1 millimeters of mercury (mmHg) and diastolic blood pressure (the bottom number) by 0.6 mmHg. That is like going from 120/80 to 119/79. The review concluded that “intensive interventions, unsuited to primary care or population prevention programs, provide only minimal reductions in blood pressure during long-term trials.” A 2003 Cochrane review of 57 shorter-term trials similarly concluded that “there is little evidence for long-term benefit from reducing salt intake.”
Moreover, some people react to changing their salt intake with higher, and some with lower blood pressure. Turns out it’s complicated.
I’m a skeptic, especially when it comes to epidemiology. None of this surprises me, and I don’t think it’s the last bombshell we’ll be hearing. But this meta-analysis also might have flaws, so hold your breath for the next pronouncement.
One last thing – they keep saying that it’s too expensive to do this kind of study right, but I’m thinking that by now they might realize the real cost of not doing it right is a loss of the public’s trust in medical research.
SEC Roundtable on credit rating agency models today
I’ve discussed the broken business model that is the credit rating agency system in this country on a few occasions. It directly contributed to the opacity and fraud in the MBS market and to the ensuing financial crisis, for example. And in this post and then this one, I suggest that someone should start an open source version of credit rating agencies. Here’s my explanation:
The system of credit ratings undermines the trust of even the most fervently pro-business entrepreneur out there. The models are knowingly games by both sides, and it’s clearly both corrupt and important. It’s also a bipartisan issue: Republicans and Democrats alike should want transparency when it comes to modeling downgrades- at the very least so they can argue against the results in a factual way. There’s no reason I can see why there shouldn’t be broad support for a rule to force the ratings agencies to make their models publicly available. In other words, this isn’t a political game that would score points for one side or the other.
Well, it wasn’t long before Marc Joffe, who had started an open source credit rating agency, contacted me and came to my Occupy group to explain his plan, which I blogged about here. That was almost a year ago.
Today the SEC is going to have something they’re calling a Credit Ratings Roundtable. This is in response to an amendment that Senator Al Franken put on Dodd-Frank which requires the SEC to examine the credit rating industry. From their webpage description of the event:
The roundtable will consist of three panels:
- The first panel will discuss the potential creation of a credit rating assignment system for asset-backed securities.
- The second panel will discuss the effectiveness of the SEC’s current system to encourage unsolicited ratings of asset-backed securities.
- The third panel will discuss other alternatives to the current issuer-pay business model in which the issuer selects and pays the firm it wants to provide credit ratings for its securities.
Marc is going to be one of something like 9 people in the third panel. He wrote this op-ed piece about his goal for the panel, a key excerpt being the following:
Section 939A of the Dodd-Frank Act requires regulatory agencies to replace references to NRSRO ratings in their regulations with alternative standards of credit-worthiness. I suggest that the output of a certified, open source credit model be included in regulations as a standard of credit-worthiness.
Just to be clear: the current problem is that not only is there wide-spread gaming, but there’s also a near monopoly by the “big three” credit rating agencies, and for whatever reason that monopoly status has been incredibly well protected by the SEC. They don’t grant “NRSRO” status to credit rating agencies unless the given agency can produce something like 10 letters from clients who will vouch for them providing credit ratings for at least 3 years. You can see why this is a hard business to break into.
The Roundtable was covered yesterday in the Wall Street Journal as well: Ratings Firms Steer Clear of an Overhaul – an unfortunate title if you are trying to be optimistic about the event today. From the WSJ article:
Mr. Franken’s amendment requires the SEC to create a board that would assign a rating firm to evaluate structured-finance deals or come up with another option to eliminate conflicts.
While lawsuits filed against S&P in February by the U.S. government and more than a dozen states refocused unflattering attention on the bond-rating industry, efforts to upend its reliance on issuers have languished, partly because of a lack of consensus on what to do.
I’m just kind of amazed that, given how dirty and obviously broken this industry is, we can’t do better than this. SEC, please start doing your job. How could allowing an open-source credit rating agency hurt our country? How could it make things worse?
WSJ: “When Your Boss Makes You Pay for Being Fat”
Going along with the theme of shaming which I took up yesterday, there was a recent Wall Street Journal article called “When Your Boss Makes You Pay for Being Fat” about new ways employers are trying to “encourage healthy living”, or otherwise described, “save money on benefits”. From the article:
Until recently, Michelin awarded workers automatic $600 credits toward deductibles, along with extra money for completing health-assessment surveys or participating in a nonbinding “action plan” for wellness. It adopted its stricter policy after its health costs spiked in 2012.
Now, the company will reward only those workers who meet healthy standards for blood pressure, glucose, cholesterol, triglycerides and waist size—under 35 inches for women and 40 inches for men. Employees who hit baseline requirements in three or more categories will receive up to $1,000 to reduce their annual deductibles. Those who don’t qualify must sign up for a health-coaching program in order to earn a smaller credit.
A few comments:
- This policy combines the critical characteristics of shaming, namely 1) a complete lack of empathy and 2) the shifting of blame for a problem entirely onto one segment of the population even though the “obesity epidemic” is a poorly understood cultural phenomenon.
- To the extent that there may be push-back against this or similar policies inside the workplace, there will be very little to stop employers from not hiring fat people in the first place.
- Or for that matter, what’s going to stop employers from using people’s full medical profiles (note: by this I mean the unregulated online profile that Acxiom and other companies collect about you and then sell to employers or advertisers for medical stuff – not the official medical records which are regulated) against them in the hiring process? Who owns the new-fangled health analytics models anyway?
- We do that already to poor people by basing their acceptance on credit scores.
E-discovery and the public interest
Today I want to bring up a few observations and concerns I have about the emergence of a new field in machine learning called e-discovery. It’s the algorithmic version of discovery, so I’ll start there.
Discovery is part of the process in a lawsuit where relevant documents are selected, pored over, and then handed to the other side. Nowadays, of course, there are more and more documents, almost all electronic, typically including lots of e-mails.
If you’re talking about a big lawsuit, there could be literally millions of documents to wade through, and that takes a lot of time for humans to do, and it can be incredibly expensive and time-consuming. Enter the algorithm.
With advances in Natural Language Processing (NLP), a machine algorithm can sort emails or documents by topic (after getting the documents into machine-readable form, cleaning, and deduping) and can in general do a pretty good job of figuring out whether a given email is “relevant” to the case.
And this is already happening – the Wall Street Journal recently reported that the Justice Department allowed e-discovery for a case involving the merger of two beer companies. From the article:
With the blessing of the Justice Department’s antitrust division, the lawyers loaded the documents into a program and manually reviewed a batch to train the software to recognize relevant documents. The manual review was repeated until the Justice Department and Constellation were satisfied that the program could accurately predict relevance in the rest of the documents. Lawyers for Constellation and Crown Imports used software developed by kCura Corp., which lists the Justice Department as a client.
In the end, Constellation and Crown Imports turned over hundreds of thousands of documents to antitrust investigators.
Here are some of my questions/ concerns:
- These algorithms are typically not open source – companies like kCura make good money doing these jobs.
- That means that they could be wrong, possibly in subtle ways.
- Or maybe not so subtle ways: maybe they’ve been trained to find documents that are both “relevant” and “positive” for a given side.
- In any case, the laws of this country will increasingly depend on a black box algorithm that is no accessible to the average citizen.
- Is that in the public’s interest?
- Is that even constitutional?
The NYC Data Skeptics Meetup
One thing I’m super excited about at work is the new NYC Data Skeptics Meetup we’re organizing. Here’s the description of our mission:
The hype surrounding Big Data and Data Science is at a fever pitch with promises to solve the world’s business and social problems, large and small. How accurate or misleading is this message? How is it helping or damaging people, and which people? What opportunities exist for data nerds and entrepreneurs that examine the larger issues with a skeptical view?
This Meetup focuses on mathematical, ethical, and business aspects of data from a skeptical perspective. Guest speakers will discuss the misuse of and best practices with data, common mistakes people make with data and ways to avoid them, how to deal with intentional gaming and politics surrounding mathematical modeling, and taking into account the feedback loops and wider consequences of modeling. We will take deep dives into models in the fields of Data Science, statistics, finance, economics, healthcare, and public policy.
This is an independent forum and open to anyone sharing an interest in the larger use of data. Technical aspects will be discussed, but attendees do not need to have a technical background.
A few things:
- Please join us!
- I wouldn’t blame you for not joining until we have a confirmed speaker, so please suggest speakers for us! I have a bunch of people in mind I’d absolutely love to see but I’d love more ideas. And I’m thinking broadly here – of course data scientists and statisticians and economists, but also lawyers, sociologists, or anyone who works with data or the effects of data.
- If you are skeptical of the need for yet another data-oriented Meetup (or other regular meeting), please think about it this way: there are not that many currently active groups which aren’t afraid to go into the technical weeds and also not obsesses with a simplistic, sound bite business take-away. But please tell me if I’m wrong, I’d love to reach out to people doing similar things.
- Suggest a better graphic for our Meetup than our current portrait of Isaac Asimov.
The rise of big data, big brother
I recently read an article off the newsstand called The Rise of Big Data.
It was written by Kenneth Neil Cukier and Viktor Mayer-Schoenberger and it was published in the May/June 2013 edition of Foreign Affairs, which is published by the Council on Foreign Relations (CFR). I mention this because CFR is an influential think tank, filled with powerful insiders, including people like Robert Rubin himself, and for that reason I want to take this view on big data very seriously: it might reflect the policy view before long.
And if I think about it, compared to the uber naive view I came across last week when I went to the congressional hearing about big data and analytics, that would be good news. I’ll write more about it soon, but let’s just say it wasn’t everything I was hoping for.
At least Cukier and Mayer-Schoenberger discuss their reservations regarding “big data” in this article. To contrast this with last week, it seemed like the only background material for the hearing, at least for the congressmen, was the McKinsey report talking about how sexy data science is and how we’ll need to train an army of them to stay competitive.
So I’m glad it’s not all rainbows and sunshine when it comes to big data in this article. Unfortunately, whether because they’re tied to successful business interests, or because they just haven’t thought too deeply about the dark side, their concerns seem almost token, and their examples bizarre.
The article is unfortunately behind the pay wall, but I’ll do my best to explain what they’ve said.
Datafication
First they discuss the concept of datafication, and their example is how we quantify friendships with “likes”: it’s the way everything we do, online or otherwise, ends up recorded for later examination in someone’s data storage units. Or maybe multiple storage units, and maybe for sale.
They formally define later in the article as a process:
… taking all aspect of life and turning them into data. Google’s augmented-reality glasses datafy the gaze. Twitter datafies stray thoughts. LinkedIn datafies professional networks.
Datafication is an interesting concept, although as far as I can tell they did not coin the word, and it has led me to consider its importance with respect to intentionality of the individual.
Here’s what I mean. We are being datafied, or rather our actions are, and when we “like” someone or something online, we are intending to be datafied, or at least we should expect to be. But when we merely browse the web, we are unintentionally, or at least passively, being datafied through cookies that we might or might not be aware of. And when we walk around in a store, or even on the street, we are being datafied in an completely unintentional way, via sensors or Google glasses.
This spectrum of intentionality ranges from us gleefully taking part in a social media experiment we are proud of to all-out surveillance and stalking. But it’s all datafication. Our intentions may run the gambit but the results don’t.
They follow up their definition in the article, once they get to it, with a line that speaks volumes about their perspective:
Once we datafy things, we can transform their purpose and turn the information into new forms of value
But who is “we” when they write it? What kinds of value do they refer to? As you will see from the examples below, mostly that translates into increased efficiency through automation.
So if at first you assumed they mean we, the American people, you might be forgiven for re-thinking the “we” in that sentence to be the owners of the companies which become more efficient once big data has been introduced, especially if you’ve recently read this article from Jacobin by Gavin Mueller, entitled “The Rise of the Machines” and subtitled “Automation isn’t freeing us from work — it’s keeping us under capitalist control.” From the article (which you should read in its entirety):
In the short term, the new machines benefit capitalists, who can lay off their expensive, unnecessary workers to fend for themselves in the labor market. But, in the longer view, automation also raises the specter of a world without work, or one with a lot less of it, where there isn’t much for human workers to do. If we didn’t have capitalists sucking up surplus value as profit, we could use that surplus on social welfare to meet people’s needs.
The big data revolution and the assumption that N=ALL
According to Cukier and Mayer-Schoenberger, the Big Data revolution consists of three things:
- Collecting and using a lot of data rather than small samples.
- Accepting messiness in your data.
- Giving up on knowing the causes.
They describe these steps in rather grand fashion, by claiming that big data doesn’t need to understand cause because the data is so enormous. It doesn’t need to worry about sampling error because it is literally keeping track of the truth. The way the article frames this is by claiming that the new approach of big data is letting “N = ALL”.
But here’s the thing, it’s never all. And we are almost always missing the very things we should care about most.
So for example, as this InfoWorld post explains, internet surveillance will never really work, because the very clever and tech-savvy criminals that we most want to catch are the very ones we will never be able to catch, since they’re always a step ahead.
Even the example from their own article, election night polls, is itself a great non-example: even if we poll absolutely everyone who leaves the polling stations, we still don’t count people who decided not to vote in the first place. And those might be the very people we’d need to talk to to understand our country’s problems.
Indeed, I’d argue that the assumption we make that N=ALL is one of the biggest problems we face in the age of Big Data. It is, above all, a way of excluding the voices of people who don’t have the time or don’t have the energy or don’t have the access to cast their vote in all sorts of informal, possibly unannounced, elections.
Those people, busy working two jobs and spending time waiting for buses, become invisible when we tally up the votes without them. To you this might just mean that the recommendations you receive on Netflix don’t seem very good because most of the people who bother to rate things are Netflix are young and have different tastes than you, which skews the recommendation engine towards them. But there are plenty of much more insidious consequences stemming from this basic idea.
Another way in which the assumption that N=ALL can matter is that it often gets translated into the idea that data is objective. Indeed the article warns us against not assuming that:
… we need to be particularly on guard to prevent our cognitive biases from deluding us; sometimes, we just need to let the data speak.
And later in the article,
In a world where data shape decisions more and more, what purpose will remain for people, or for intuition, or for going against the facts?
This is a bitch of a problem for people like me who work with models, know exactly how they work, and know exactly how wrong it is to believe that “data speaks”.
I wrote about this misunderstanding here, in the context of Bill Gates, but I was recently reminded of it in a terrifying way by this New York Times article on big data and recruiter hiring practices. From the article:
“Let’s put everything in and let the data speak for itself,” Dr. Ming said of the algorithms she is now building for Gild.
If you read the whole article, you’ll learn that this algorithm tries to find “diamond in the rough” types to hire. A worthy effort, but one that you have to think through.
Why? If you, say, decided to compare women and men with the exact same qualifications that have been hired in the past, but then, looking into what happened next you learn that those women have tended to leave more often, get promoted less often, and give more negative feedback on their environments, compared to the men, your model might be tempted to hire the man over the woman next time the two showed up, rather than looking into the possibility that the company doesn’t treat female employees well.
In other words, ignoring causation can be a flaw, rather than a feature. Models that ignore causation can add to historical problems instead of addressing them. And data doesn’t speak for itself, data is just a quantitative, pale echo of the events of our society.
Some cherry-picked examples
One of the most puzzling things about the Cukier and Mayer-Schoenberger article is how they chose their “big data” examples.
One of them, the ability for big data to spot infection in premature babies, I recognized from the congressional hearing last week. Who doesn’t want to save premature babies? Heartwarming! Big data is da bomb!
But if you’re going to talk about medicalized big data, let’s go there for reals. Specifically, take a look at this New York Times article from last week where a woman traces the big data footprints, such as they are, back in time after receiving a pamphlet on living with Multiple Sclerosis. From the article:
Now she wondered whether one of those companies had erroneously profiled her as an M.S. patient and shared that profile with drug-company marketers. She worried about the potential ramifications: Could she, for instance, someday be denied life insurance on the basis of that profile? She wanted to track down the source of the data, correct her profile and, if possible, prevent further dissemination of the information. But she didn’t know which company had collected and shared the data in the first place, so she didn’t know how to have her entry removed from the original marketing list.
Two things about this. First, it happens all the time, to everyone, but especially to people who don’t know better than to search online for diseases they actually have. Second, the article seems particularly spooked by the idea that a woman who does not have a disease might be targeted as being sick and have crazy consequences down the road. But what about a woman is actually is sick? Does that person somehow deserve to have their life insurance denied?
The real worries about the intersection of big data and medical records, at least the ones I have, are completely missing from the article. Although they did mention that “improving and lowering the cost of health care for the world’s poor” inevitable will lead to “necessary to automate some tasks that currently require human judgment.” Increased efficiency once again.
To be fair, they also talked about how Google tried to predict the flu in February 2009 but got it wrong. I’m not sure what they were trying to say except that it’s cool what we can try to do with big data.
Also, they discussed a Tokyo research team that collects data on 360 pressure points with sensors in a car seat, “each on a scale of 0 to 256.” I think that last part about the scale was added just so they’d have more numbers in the sentence – so mathematical!
And what do we get in exchange for all these sensor readings? The ability to distinguish drivers, so I guess you’ll never have to share your car, and the ability to sense if a driver slumps, to either “send an alert or atomatically apply brakes.” I’d call that a questionable return for my investment of total body surveillance.
Big data, business, and the government
Make no mistake: this article is about how to use big data for your business. It goes ahead and suggests that whoever has the biggest big data has the biggest edge in business.
Of course, if you’re interested in treating your government office like a business, that’s gonna give you an edge too. The example of Bloomberg’s big data initiative led to efficiency gain (read: we can do more with less, i.e. we can start firing government workers, or at least never hire more).
As for regulation, it is pseudo-dealt with via the discussion of market dominance. We are meant to understand that the only role government can or should have with respect to data is how to make sure the market is working efficiently. The darkest projected future is that of market domination by Google or Facebook:
But how should governments apply antitrust rules to big data, a market that is hard to define and is constantly changing form?
In particular, no discussion of how we might want to protect privacy.
Big data, big brother
I want to be fair to Cukier and Mayer-Schoenberger, because they do at least bring up the idea of big data as big brother. Their topic is serious. But their examples, once again, are incredibly weak.
Should we find likely-to-drop-out boys or likely-to-get-pregnant girls using big data? Should we intervene? Note the intention of this model would be the welfare of poor children. But how many models currently in production are targeting that demographic with that goal? Is this in any way at all a reasonable example?
Here’s another weird one: they talked about the bad metric used by US Secretary of Defense Robert McNamara in the Viet Nam War, namely the number of casualties. By defining this with the current language of statistics, though, it gives us the impression that we could just be super careful about our metrics in the future and: problem solved. As we experts in data know, however, it’s a political decision, not a statistical one, to choose a metric of success. And it’s the guy in charge who makes that decision, not some quant.
Innovation
If you end up reading the Cukier and Mayer-Schoenberger article, please also read Julie Cohen’s draft of a soon-to-be published Harvard Law Review article called “What Privacy is For” where she takes on big data in a much more convincing and skeptical light than Cukier and Mayer-Schoenberger were capable of summoning up for their big data business audience.
I’m actually planning a post soon on Cohen’s article, which contains many nuggets of thoughtfulness, but for now I’ll simply juxtapose two ideas surrounding big data and innovation, giving Cohen the last word. First from the Cukier and Mayer-Schoenberger article:
Big data enables us to experiment faster and explore more leads. These advantages should produce more innovation
Second from Cohen, where she uses the term “modulation” to describe, more or less, the effect of datafication on society:
When the predicate conditions for innovation are described in this way, the problem with characterizing privacy as anti-innovation becomes clear: it is modulation, not privacy, that poses the greater threat to innovative practice. Regimes of pervasively distributed surveillance and modulation seek to mold individual preferences and behavior in ways that reduce the serendipity and the freedom to tinker on which innovation thrives. The suggestion that innovative activity will persist unchilled under conditions of pervasively distributed surveillance is simply silly; it derives rhetorical force from the cultural construct of the liberal subject, who can separate the act of creation from the fact of surveillance. As we have seen, though, that is an unsustainable fiction. The real, socially-constructed subject responds to surveillance quite differently—which is, of course, exactly why government and commercial entities engage in it. Clearing the way for innovation requires clearing the way for innovative practice by real people, by preserving spaces within which critical self-determination and self-differentiation can occur and by opening physical spaces within which the everyday practice of tinkering can thrive.
Big data and surveillance
You know how, every now and then, you hear someone throw out a statistic that implies almost all of the web is devoted to porn?
Well, that turns out to be a false myth, which you can read more about here – although once upon a time it was kind of true, before women started using the web in large numbers and before there was Netflix streaming.
Here’s another myth along the same lines which I think might actually be true: almost all of big data is devoted to surveillance.
Of course, data is data, and you could define “surveillance” broadly (say as “close observation”), to make the above statement a tautology. To what extent is Google’s data, collected about you, a surveillance database, if they only use it to tailor searches and ads?
On the other hand, something that seems unthreatening now can become creepy soon: recall the NSA whistleblower who last year described how the government stores an enormous amount of the “electronic communications” in this country to keep close tabs on us.
The past
Back in 2011, computerworld.com published an article entitled “Big data to drive a surveillance society” and makes the case that there is a natural competition among corporations with large databases to collect more data, have it more interconnected (knowing now only a person’s shopping habits but also their location and age, say) and have the analytics work faster, even real-time, so they can peddle their products faster and better than the next guy.
Of course, not everyone agrees to talk about this “natural competition”. From the article:
Todd Papaioannou, vice president of cloud architecture at Yahoo, said instead of thinking about big data analytics as a weapon that empowers corporate Big Brothers, consumers should regard it as a tool that enables a more personalized Web experience.
“If someone can deliver a more compelling, relevant experience for me as a consumer, then I don’t mind it so much,” he said.
Thanks for telling us consumers how great this is, Todd. Later in the same article Todd says, “Our approach is not to throw any data away.”
The present
Fast forward to 2013, when defence contractor Raytheon is reported to have a new piece of software, called Riot, which is cutting-edge in the surveillance department.
The name Riot refers to “Rapid Information Overlay Technology” and it can locate individuals with longitude and latitudes, using cell phone data, and make predictions as well, using data scraped from Facebook, Twitter, and Foursquare. A video explains how they do it. From the op-ed:
The possibilities for RIOT are hideous at consumer level. This really is the stalker’s dream technology. There’s also behavioural analysis to predict movements in the software. That’s what Big Data can do, and if it’s not foolproof, there are plenty of fools around the world to try it out on.
US employers, who have been creating virtual Gulags of surveillance for employees with much less effective technology, will love this. “We know what you do” has always been a working option for coercion. The fantastic levels of paranoia involved in the previous generations of surveillance technology will be truly gratified by RIOT.
The future
Lest we think that our children are not as affected by such stalking software, since they don’t spend as much time on social media and often don’t have cellphones, you should also be aware that educational data is now being collected about individual learners in the U.S. at an enormous scale and with very little oversight.
This report from educationnewyork.com (hat tip Matthew Cunningham-Cook) explains recent changes in privacy laws for children, which happen to coincide with how much data is being collected (tons) and how much money is in the analysis of that data (tons):
Schools are a rich source of personal information about children that can be legally and illegally accessed by third parties.With incidences of identity theft, database hacking, and sale of personal information rampant, there is an urgent need to protect students’ rights under FERPA and raise awareness of aspects of the law that may compromise the privacy of students and their families.
In 2008 and 2011, amendments to FERPA gave third parties, including private companies,increased access to student data. It is significant that in 2008, the amendments to FERPA expanded the definitions of “school officials” who have access to student data to include “contractors, consultants, volunteers, and other parties to whom an educational agency or institution has outsourced institutional services or functions it would otherwise use employees to perform.” This change has the effect of increasing the market for student data.
There are lots of contractors and consultants, for example inBloom, and they are slightly less concerned about data privacy issues than you might be:
inBloom has stated that it “cannot guarantee the security of the information stored … or that the information will not be intercepted when it is being transmitted.”
The article ends with this:
The question is: Should we compromise and endanger student privacy to support a centralized and profit-driven education reform initiative? Given this new landscape of an information and data free-for-all, and the proliferation of data-driven education reform initiatives like CommonCore and huge databases of student information, we’ve arrived at a time when once a child enters a public school,their parents will never again know who knows what about their children and about their families. It is now up to individual states to find ways to grant students additional privacy protections.
No doubt about it: our children are well on their way to being the most stalked generation.
Privacy policy
One of the reasons I’m writing this post today is that I’m on a train to D.C. to sit in a Congressional hearing where Congressmen will ask “big data experts” questions about big data and analytics. The announcement is here, and I’m hoping to get into it.
The experts present are from IBM, the NSF, and North Carolina State University. I’m wondering how they got picked and what their incentives are. If I get in I will write a follow-up post on what happened.
Here’s what I hope happens. First, I hope it’s made clear that anonymization doesn’t really work with large databases. Second, I hope it’s clear that there’s no longer a very clear dividing line between sensitive data and nonsensitive data – you’d be surprised how much can be inferred about your sensitive data using only nonsensitive data.
Next, I hope it’s clear that the very people who should be worried the most about their data being exposed and freely available are the ones who don’t understand the threat. This means that merely saying that people should protect their data more is utterly insufficient.
Next, we should understand what policies already in place look like in Europe:
Finally, we should focus not only the collection of data, but on the usage of data. Just because you have a good idea of my age, race, education level, income, and HIV status doesn’t mean you should be able to use that information against me whenever you want.
In particular, it should not be legal for companies that provide loans or insurance to use whatever information they can buy from Acxiom about you. It should be a highly regulated set of data that allows for such decisions.
How much math do scientists need to know?
I’m catching up with reading the “big data news” this morning (via Gil Press) and I came across this essay by E. O. Wilson called “Great Scientist ≠ Good at Math”. In it, he argues that most of the successful scientists he knows aren’t good at math, and he doesn’t see why people get discouraged from being scientists just because they suck at math.
Here’s an important excerpt from the essay:
Over the years, I have co-written many papers with mathematicians and statisticians, so I can offer the following principle with confidence. Call it Wilson’s Principle No. 1: It is far easier for scientists to acquire needed collaboration from mathematicians and statisticians than it is for mathematicians and statisticians to find scientists able to make use of their equations.
Given that he’s written many papers with mathematicians and statisticians, then, he is not claiming that math itself is not part of great science, just that he hasn’t been the one that supplied the mathy bits. I think this is really key.
And it resonates with me: I’ve often said that the cool thing about working on a data science team in industry, for example, is that different people bring different skills to the table. I might be an expert on some machine learning algorithms, while someone else will be a domain expert. The problem requires both skill sets, and perhaps no one person has all that knowledge. Teamwork kinda rocks.
Another thing he exposes with Wilson’s Principle No. 1, though, which doesn’t resonate with me, is a general lack of understanding of what mathematicians are actually trying to accomplish with “their equations”.
It is a common enough misconception to think of the quant as a guy with a bunch of tools but no understanding or creativity. I’ve complained about that before on this blog. But when it comes to professional mathematicians, presumably including his co-authors, a prominent scientist such as Wilson should realize that they are doing creative things inside the realm of mathematics simply for the sake of understanding mathematics.
Mathematicians, as a group, are not sitting around wishing someone could “make use of their equations.” For one thing, they often don’t even think about equations. And for another, they often think about abstract structures with no goal whatsoever of connecting it back to, say, how ants live in colonies. And that’s cool and beautiful too, and it’s not a failure of the system. That’s just math.
I’m not saying it wouldn’t be fun for mathematicians to spend more time thinking about applied science. I think it would be fun for them, actually. Moreover, as the next few years and decades unfold, we might very well see a large-scale shrinkage in math departments and basic research money, which could force the issue.
And, to be fair, there are probably some actual examples of mathy-statsy people who are thinking about equations that are supposed to relate to the real world but don’t. Those guys should learn to be better communicators and pair up with colleagues who have great data. In my experience, this is not a typical situation.
One last thing. The danger in ignoring the math yourself, if you’re a scientist, is that you probably aren’t that great at knowing the difference between someone who really knows math and someone who can throw around terminology. You can’t catch charlatans, in other words. And, given that scientists do need real math and statistics to do their research, this can be a huge problem if your work ends up being meaningless because your team got the math wrong.
Global move to austerity based on mistake in Excel
As Rortybomb reported yesterday on the Roosevelt Institute blog (hat tip Adam Obeng), a recent paper written by Thomas Herndon, Michael Ash, and Robert Pollin looked into replicating the results of a economics paper originally written by Carmen Reinhart and Kenneth Rogoff entitled Growth in a Time of Debt.
The original Reinhart and Rogoff paper had concluded that public debt loads greater than 90 percent of GDP consistently reduce GDP growth, a “fact” which has been widely used. However, the more recent paper finds problems. Here’s the abstract:
Herndon, Ash and Pollin replicate Reinhart and Rogoff and find that coding errors, selective exclusion of available data, and unconventional weighting of summary statistics lead to serious errors that inaccurately represent the relationship between public debt and GDP growth among 20 advanced economies in the post-war period. They find that when properly calculated, the average real GDP growth rate for countries carrying a public-debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0:1 percent as published in Reinhart and Rogoff. That is, contrary to RR, average GDP growth at public debt/GDP ratios over 90 percent is not dramatically different than when debt/GDP ratios are lower.
The authors also show how the relationship between public debt and GDP growth varies significantly by time period and country. Overall, the evidence we review contradicts Reinhart and Rogoff’s claim to have identified an important stylized fact, that public debt loads greater than 90 percent of GDP consistently reduce GDP growth.
A few comments.
1) We should always have the data and code for published results.
The way the authors Herndon, Ash and Pollin managed to replicate the results was that they personally requested the excel spreadsheets from Reinhart and Rogoff. Given how politically useful and important this result has been (see Rortybomb’s explanation of this), it’s kind of a miracle that they released the spreadsheet. Indeed that’s the best part of this story from a scientific viewpoint.
2) The data and code should be open source.
One cool thing is that now you can actually download the data – there’s a link at the bottom of this page. I did this and was happy to have a bunch of csv files and some (open source) R code which presumably recovers the excel spreadsheet mistakes. I also found some .dta files, which seems like Stata proprietary file types, which is annoying, but then I googled and it seems like you can use R to turn .dta files into csv files. It’s still weird that they wrote code in R but saved files in Stata.
3) These mistakes are easy to make and they’re mostly not considered mistakes.
Let’s talk about the “mistakes” the authors found. First, they’re excluding certain time periods for certain countries, specifically right after World War II. Second, they chose certain “non-standard” weightings for the various countries they considered. Finally, they accidentally excluded certain rows from their calculation.
Only that last one is considered a mistake by modelers. The others are modeling choices, and they happen all the time. Indeed it’s impossible not to make such choices. Who’s to say that you have to use standard country weightings? Why? How much data do you actually need to consider? Why?
[Aside: I’m sure there are proprietary trading models running right now in hedge funds that anticipate how other people weight countries in standard ways and betting accordingly. In that sense, using standard weightings might be a stupid thing to do. But in any case validating a weighting scheme is extremely difficult. In the end you’re trying to decide how much various countries matter in a certain light, and the answer is often that your country matters the most to you.]
4) We need to actually consider other modeling possibilities.
It’s not a surprise, to economists anyway, that after you include more post-WWII years of data, which we all know to be high debt and high growth years worldwide, you get a substantively different answer. Excluding these data points is just as much a political decision as a modeling decision.
In the end the only reasonable way to proceed is to describe your choices, and your reasoning, and the result, but also consider other “reasonable” choices and report the results there too. And if you don’t like the answer, or don’t want to do the work, at the very least you need to provide your code and data and let other people check how your result changes with different “reasonable” choices.
Once the community of economists (and other data-centric fields) starts doing this, we will all realize that our so-called “objective results” utterly depend on such modeling decisions, and are about as variable as our own opinions.
5) And this is an easy model.
Think about how many modeling decisions and errors are in more complicated models!
War of the machines, college edition
A couple of people have sent me this recent essay (hat tip Leon Kautsky) written by Elijah Mayfield on the education technology blog e-Literate, described on their About page as “a hobby weblog about educational technology and related topics that is maintained by Michael Feldstein and written by Michael and some of his trusted colleagues in the field of educational technology.”
Mayfield’s essay is entitled “Six Ways the edX Announcement Gets Automated Essay Grading Wrong”. He’s referring to the recent announcement, which was written about in the New York Times last week, about how professors will soon be replaced by computers in grading essays. He claims they got it all wrong and there’s nothing to worry about.
I wrote about this idea too, in this post, and he hasn’t addressed my complaints at all.
First, Mayfield’s points:
- Journalists sensationalize things.
- The machine is identifying things in the essays that are associated with good writing vs. bad writing, much like it might learn to distinguish pictures of ducks from pictures of houses.
- It’s actually not that hard to find the duck and has nothing to do with “creativity” (look for webbed feet).
- If the machine isn’t sure it can spit back the essay to the professor to read (if the professor is still employed).
- The machine doesn’t necessarily reward big vocabulary words, except when it does.
- You’d need thousands of training examples (essays on a given subject) to make this actually work.
- What’s so really wonderful is that a student can get all his or her many drafts graded instantaneously, which no professor would be willing to do.
Here’s where I’ll start, with this excerpt from near the end:
“Can machine learning grade essays?” is a bad question. We know, statistically, that the algorithms we’ve trained work just as well as teachers for churning out a score on a 5-point scale. We know that occasionally it’ll make mistakes; however, more often than not, what the algorithms learn to do are reproduce the already questionable behavior of humans. If we’re relying on machine learning solely to automate the process of grading, to make it faster and cheaper and enable access, then sure. We can do that.
OK, so we know that the machine can grade essays written for human consumption pretty accurately. But it hasn’t had to deal with essays written for machine consumption yet. There’s major room for gaming here, and only a matter of time before there’s a competing algorithm to build a great essay. I even know how to train that algorithm. Email me privately and we can make a deal on profit-sharing.
And considering that students will be able to get their drafts graded as many times as they want, as Mayfield advertised, this will only be easier. If I build an essay that I think should game the machine, by putting in lots of (relevant) long vocabulary words and erudite phrases, then I can always double check by having the system give me a grade. If it doesn’t work, I’ll try again.
And the essays built this way won’t get caught via the fraud detection software that finds plagiarism, because any good essay-builder will only steal smallish phrases.
One final point. The fact that the machine-learning grading algorithm only works when it’s been trained on thousands of essays points to yet another depressing trend: large-scale classes with the same exact assignments every semester so last year’s algorithm can be used, in the name of efficiency.
But that means last year’s essay-building algorithm can be used as well. Pretty soon it will just be a war of the machines.
A public-facing math panel
I’m returning from two full days of talking to mathematicians and applied mathematicians at Cornell. I was really impressed with the people I met there – thoughtful, informed, and inquisitive – and with the kind reception they gave me.
I gave an “Oliver Talk” which was joint with the applied math colloquium on Thursday afternoon. The goal of my talk was to convince mathematicians that there’s a very bad movement underway whereby models are being used against people, in predatory ways, and in the name of mathematics. I turned some people off, I think, by my vehemence, but then again it’s hard not get riled up about this stuff, because it’s creepy and I actually think there’s a huge amount at stake.
One thing I did near the end of my talk was bring up (and recruit for) the idea of a panel of mathematicians which defines standards for public-facing models and vets the current crop.
The first goal of such a panel would be to define mathematical models, with a description of “best practices” when modeling people, including things like anticipating impact, gaming, and feedback loops of models, and asking for transparent and ongoing evaluation methods, as well as having minimum standards for accuracy.
The second goal of the panel would be to choose specific models that are in use and measure the extent to which they pass the standards of the above best practices rubric.
So the teacher value-added model, I’d expect, would fail in that it doesn’t have an evaluation method, at least that is made public, nor does it seem to have any accuracy standards, even though it’s widely used and is high impact.
I’ve had some pretty amazing mathematicians already volunteer to be on such a panel, which is encouraging. What’s cool is that I think mathematicians, as a group, are really quite ethical and can probably make their voices heard and trusted if they set their minds to it.
New creepy model: job hiring software
Warmup: Automatic Grading Models
Before I get to my main take-down of the morning, let me warm up with an appetizer of sorts: have you been hearing a lot about new models that automatically grade essays?
Does it strike you that’s there’s something wrong with that idea but you don’t know what it is?
Here’s my take. While it’s true that it’s possible to train a model to grade essays similarly to what a professor now does, that doesn’t mean we can introduce automatic grading – at least not if the students in question know that’s what we’re doing.
There’s a feedback loop, whereby if the students know their essays will be automatically graded, then they will change what they’re doing to optimize for good automatic grades rather than, say, a cogent argument.
For example, a student might download a grading app themselves (wouldn’t you?) and run their essay through the machine until it gets a great grade. Not enough long words? Put them in! No need to make sure the sentences make sense, because the machine doesn’t understand grammar!
This is, in fact, a great example where people need to take into account the (obvious when you think about them) feedback loops that their models will enter in actual use.
Job Hiring Models
Now on to the main course.
In this week’s Economist there is an essay about the new widely-used job hiring software and how awesome it is. It’s so efficient! It removes the biases of of those pesky recruiters! Here’s an excerpt from the article:
The problem with human-resource managers is that they are human. They have biases; they make mistakes. But with better tools, they can make better hiring decisions, say advocates of “big data”.
So far “the machine” has made observations such as:
- Good if candidate uses browser you need to download like Chrome.
- Not as bad as one might expect to have a criminal record.
- Neutral on job hopping.
- Great if you live nearby.
- Good if you are on Facebook.
- Bad if you’re on Facebook and every other social networking site as well.
Now, I’m all for learning to fight against our biases and hire people that might not otherwise be given a chance. But I’m not convinced that this will happen that often – the people using the software can always train the model to include their biases and then point to the machine and say “The machine told me to do it”. True.
What I really object to, however, is the accumulating amount of data that is being collected about everyone by models like this.
It’s one thing for an algorithm to take my CV in and note that I misspelled my alma mater, but it’s a different thing altogether to scour the web for my online profile trail (via Acxiom, for example), to look up my credit score, and maybe even to see my persistence score as measured by my past online education activities (soon available for your 7-year-old as well!).
As a modeler, I know how hungry the model can be. It will ask for all of this data and more. And it will mean that nothing you’ve ever done wrong, no fuck-up that you wish to forget, will ever be forgotten. You can no longer reinvent yourself.
Forget mobility, forget the American Dream, you and everyone else will be funneled into whatever job and whatever life the machine has deemed you worthy of. WTF.
Hey WSJ, don’t blame unemployed disabled people for the crap economy
This morning I’m being driven crazy by this article in yesterday’s Wall Street Journal entitled “Workers Stuck in Disability Stunt Economic Recovery.”
Even the title makes the underlying goal of the article crystal clear: the lazy disabled workers are to blame for the crap economy. Lest you are unconvinced that anyone could make such an unreasonable claim of causation, here’s a tasty excerpt from the article that spells it out:
Economic growth is driven by the number of workers in an economy and by their productivity. Put simply, fewer workers usually means less growth.
Since the recession, more people have gone on disability, on net, than new workers have joined the labor force. Mr. Feroli estimated the exodus to disability costs 0.6% of national output, equal to about $95 billion a year.
“The greater cost is their long-term dependency on transfers from the federal government,” Mr. Autor said, “placing strain on the soon-to-be exhausted Social Security Disability trust fund.”
The underlying model here, then, is that there’s a bunch of people who have the choice between going on disability or “joining the labor force” and they’ve all chosen to go on disability. I wonder where their evidence is that people really have that choice, considering the unemployment numbers and participation rate numbers we see nowadays.
For example, the unemployment rate for youths is now 22.9%, and the participation rate for them has gone from 59.2% in December 2007, to 54.5% today. This is probably not because so many kids under the age of 25 are disabled, I suspect. If you look at the overall labor participation rate, it’s dropped from 66.0 in December 2007 to 63.3 in March 2013. Most of the people who have left the work force are also not disabled. They’ve been discouraged for some other mysterious reason. I’m gonna go ahead and guess it’s because they can’t find a job.
This leads me to ask the following question from the journalists LESLIE SCISM and JON HILSENRATH who wrote the article: Where is your evidence of causation??
Here’s another example from the article of a seriously fucked-up understanding of cause and effect:
With overall participation down, the labor force—a measure of people working and people looking for work—is barely growing.
They consistently paint the picture whereby people decide to stop working, and then yucky things happen, in this case the labor force stops growing. Damn those lazy people.
They even bring in a fancy word from physics to describe the problem, namely hysteresis. Now, they didn’t understand or correctly define the term, but it doesn’t really matter, because the point of using a fancy term from physics was not to add to the clarity of the argument but rather to impress.
The goal here is, in fact, that if enough economists use sophisticated language to describe the various effects, we will all be able to blame people with bad backs, making $13.6K per year, on why our economy sucks, rather than the rich assholes in finance who got us into this mess and are currently buying $2 million dollar personal offices instead of going to jail.
Just to be clear, that’s $1,130 a month, which I guess represents so enticing a lifestyle that the people currently enjoying it ‘are “pretty unlikely to want to forfeit economic security for a precarious job market”‘ according to M.I.T. economist David Autor. I’d love to have David Autor spell out, for us, exactly what’s economically secure about that kind of monthly check.
The rest of the article is in large part a description of how people get onto SSDI, insinuating that the people currently on it are not really all that disabled or worthy of living high on the hog, and are in any case never ever leaving.
How’s this for a slightly different take on the situation: there are of course some people who are faking something, that’s always the case. But in general, the people on SSDI need to be there, and before the recession might have had the kind of employers who kept them on even though they often called in sick, out of loyalty and kindness, because they didn’t want to fire them. But when the recession struck those employers had to cut them off, or they went out of business completely. Now those people can’t find work and don’t have many options. In other words, the recession caused the SSDI program to grow. That doesn’t mean it caused a bunch of people to get sick, but it does mean that sick people are more dependent on SSDI because there are fewer options.
By the way, read the comments of this article, there are some really good ones (“What were people with injuries and no high-value job skills to do? Is the number of people in the social security disability program the problem or the symptom?”) as well as some really outrageous ones (‘The current situation makes the picture of the “Welfare Queen” of the 1980s look like an honest citizen’).
Elizabeth Fischer talks about climate modeling at Occupy today
I’m really excited to be going to the pre-meeting talk of my Occupy group today. We’re having a talk by Elizabeth Fischer, who is a post-doc at NASA GISS, a laboratory focused largely on climate change up here in the Columbia neighborhood.
She is coming to talk with us about her work investigating the long-term behavior of ice sheets in a changing climate. Before joining GISS, Dr. Fischer was a quant on Wall Street, working on statistical arbitrage, trade execution, simulation/modeling platforms, signal development, and options trading. I met her when we were both students at math camp in 1988, but we reconnected this past summer at the reunion.
The actual title of her talk is “The History of CO2: Past, Present and Future” and it’s open to the public, so please come if you can (it’s at 2:00 pm in room 409 here but more details are here).
After Elizabeth, we’ll be having our usual Occupy meeting. Topics this week include our plans for a Citigroup and HSBC picket later this month, our panel submissions to the Left Forum in June, our plans for May Day, and continued work on writing a book modeled after the Debt Resistor’s Operations Manual.
Housekeeping – RSS feed for mathbabe broken, possibly fixed
I’ve been trying to address the problem people have been having with their RSS feed for mathbabe. Thanks to my nerd-girl friend Jennifer Rubinovitz, I’ve changed some settings in my WordPress settings and now I can view all of my posts when I open up RSSOwl. But in order for your reader to get caught up I have a feeling you’ll need to somehow refresh it or maybe get rid of mathbabe and then re-subscribe. I’ll update as I learn more (please tell me what’s working for you!).
Guest post by Julia Evans: How I got a data science job
This is a guest post by Julia Evans. Julia is a data scientist & programmer who lives in Montréal. She spends her free time these days playing with data and running events for women who program or want to — she just started a Montréal chapter of pyladies to teach programming, and co-organize a monthly meetup called Montréal All-Girl Hack Night for women who are developers.
I asked mathbabe a question a few weeks ago saying that I’d recently started a data science job without having too much experience with statistics, and she asked me to write something about how I got the job. Needless to say I’m pretty honoured to be a guest blogger here 🙂 Hopefully this will help someone!
Last March I decided that I wanted a job playing with data, since I’d been playing with datasets in my spare time for a while and I really liked it. I had a BSc in pure math, a MSc in theoretical computer science and about 6 months of work experience as a programmer developing websites. I’d taken one machine learning class and zero statistics classes.
In October, I left my web development job with some savings and no immediate plans to find a new job. I was thinking about doing freelance web development. Two weeks later, someone posted a job posting to my department mailing list looking for a “Junior Data Scientist”. I wrote back and said basically “I have a really strong math background and am a pretty good programmer”. This email included, embarrassingly, the sentence “I am amazing at math”. They said they’d like to interview me.
The interview was a lunch meeting. I found out that the company (Via Science) was opening a new office in my city, and was looking for people to be the first employees at the new office. They work with clients to make predictions based on their data.
My interviewer (now my manager) asked me about my role at my previous job (a little bit of everything — programming, system administration, etc.), my math background (lots of pure math, but no stats), and my experience with machine learning (one class, and drawing some graphs for fun). I was asked how I’d approach a digit recognition problem and I said “well, I’d see what people do to solve problems like that, and I’d try that”.
I also talked about some data visualizations I’d worked on for fun. They were looking for someone who could take on new datasets and be independent and proactive about creating model, figuring out what is the most useful thing to model, and getting more information from clients.
I got a call back about a week after the lunch interview saying that they’d like to hire me. We talked a bit more about the work culture, starting dates, and salary, and then I accepted the offer.
So far I’ve been working here for about four months. I work with a machine learning system developed inside the company (there’s a paper about it here). I’ve spent most of my time working on code to interface with this system and make it easier for us to get results out of it quickly. I alternate between working on this system (using Java) and using Python (with the fabulous IPython Notebook) to quickly draw graphs and make models with scikit-learn to compare our results.
I like that I have real-world data (sometimes, lots of it!) where there’s not always a clear question or direction to go in. I get to spend time figuring out the relevant features of the data or what kinds of things we should be trying to model. I’m beginning to understand what people say about data-wrangling taking up most of their time. I’m learning some statistics, and we have a weekly Friday seminar series where we take turns talking about something we’ve learned in the last few weeks or introducing a piece of math that we want to use.
Overall I’m really happy to have a job where I get data and have to figure out what direction to take it in, and I’m learning a lot.




