Today I read this article written by Allie Gross (hat tip Suresh Naidu), a former Teach for America teacher whose former idealism has long been replaced by her experiences in the reality of education in this country. Her article is entitled The Charter School Profiteers.
It’s really important, and really well written, and just one of the articles in the online magazine Jacobin that I urge you to read and to subscribe to. In fact that article is part of a series (here’s another which focuses on charter schools in New Orleans) and it comes with a booklet called Class Action: An Activist Teacher’s Handbook. I just ordered a couple of hard copies.
I’d really like you to read the article, but as a teaser here’s one excerpt, a rant which she completely backs up with facts on the ground:
You haven’t heard of Odeo, the failed podcast company the Twitter founders initially worked on? Probably not a big deal. You haven’t heard about the failed education ventures of the person now running your district? Probably a bigger deal.
When we welcome schools that lack democratic accountability (charter school boards are appointed, not elected), when we allow public dollars to be used by those with a bottom line (such as the for-profit management companies that proliferate in Michigan), we open doors for opportunism and corruption. Even worse, it’s all justified under a banner of concern for poor public school students’ well-being.
While these issues of corruption and mismanagement existed before, we should be wary of any education reformer who claims that creating an education marketplace is the key to fixing the ills of DPS or any large city’s struggling schools. Letting parents pick from a variety of schools does not weed out corruption. And the lax laws and lack of accountability can actually exacerbate the socioeconomic ills we’re trying to root out.
Yesterday was a day filled with secrets and codes. In the morning, at The Platform, we had guest speaker Columbia history professor Matthew Connelly, who came and talked to us about his work with declassified documents. Two big and slightly depressing take-aways for me were the following:
- As records have become digitized, it has gotten easy for people to get rid of archival records in large quantities. Just press delete.
- As records have become digitized, it has become easy to trace the access of records, and in particular the leaks. Connelly explained that, to some extent, Obama’s harsh approach to leakers and whistleblowers might be explained as simply “letting the system work.” Yet another way that technology informs the way we approach human interactions.
After class we had section, in which we discussed the Computer Science classes some of the students are taking next semester (there’s a list here) and then I talked to them about prime numbers and the RSA crypto system.
I got really into it and wrote up an iPython Notebook which could be better but is pretty good, I think, and works out one example completely, encoding and decoding the message “hello”.
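For a flavor of what such a worked example involves, here is a minimal sketch of textbook RSA on the message “hello”. It is not the notebook itself: the primes 61 and 53, the exponent 17, and the helper names are assumptions chosen for illustration, and encrypting one character at a time with numbers this small is hopelessly insecure.

```python
# Toy RSA: tiny, insecure parameters chosen only for illustration.

def modular_inverse(a, m):
    """Return x with (a * x) % m == 1, via the extended Euclidean algorithm."""
    old_r, r = a, m
    old_s, s = 1, 0
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
    if old_r != 1:
        raise ValueError("a and m must be coprime")
    return old_s % m

p, q = 61, 53                # two small primes (assumed for the example)
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime to phi
d = modular_inverse(e, phi)  # private exponent

def encrypt(message):
    """Encrypt each character separately as code_point ** e mod n."""
    return [pow(ord(ch), e, n) for ch in message]

def decrypt(ciphertext):
    """Recover each character as c ** d mod n."""
    return "".join(chr(pow(c, d, n)) for c in ciphertext)

ciphertext = encrypt("hello")
recovered = decrypt(ciphertext)
print(ciphertext)
print(recovered)
```

The whole scheme rests on the fact that anyone can compute with n and e, but finding d requires knowing phi, which in turn requires factoring n into its two primes.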
When I was prepping for my Slate Money podcast last week I read this column by Matt Levine at Bloomberg on the Citigroup settlement. In it he raises the important question of how the fine amount of $7 billion was determined. Here’s the key part:
Citi’s and the Justice Department’s approaches both leave something to be desired. Citi’s approach seems to be premised on the idea that the misconduct was securitizing mortgages: The more mortgages you did, the more you gotta pay, regardless of how they performed. The DOJ’s approach, on the other hand, seems to be premised on the idea that the misconduct was sending bad e-mails about mortgages: The more “culpable” you look, the more it should cost you, regardless of how much damage you did.
I would have thought that the misconduct was knowingly securitizing bad mortgages, and that the penalties ought to scale with the aggregate badness of Citi’s mortgages. So, for instance, you’d want to measure how often Citi’s mortgages didn’t match up to its stated quality-control standards, and then compare the actual financial performance of the loans that didn’t meet the standards to the performance of the loans that did. Then you could say, well, if Citi had lived up to its promises, investors would have lost $X billion less than they actually did. And then you could fine Citi that amount, or some percentage of that amount. And you could do a similar exercise for the other big banks — JPMorgan, say, which already settled, or Bank of America, which is negotiating its settlement — and get comparable amounts that appropriately balance market share (how many bad mortgages did you sell?) and culpability (how bad were they?).
I think he nailed something here, which has eluded me in the past, namely the concept of what comprises evidence of wrongdoing and how that translates into punishment. It’s similar to what I talked about in this recent post, where I questioned what it means to provide evidence of something, especially when the data you are looking for to gather evidence has been deliberately suppressed by either the people committing wrongdoing or by other people who are somehow gaining from that wrongdoing but are not directly involved.
Basically the way I see Levine’s argument is that the Department of Justice used a lawyerly definition of evidence of wrongdoing – namely, through the existence of emails saying things like “it’s time to pray.” After determining that they were in fact culpable, they basically did some straight-up negotiation to determine the fee. That negotiation was either purely political or was based on information that has been suppressed, because as far as anyone knows the number was kind of arbitrary.
Levine was suggesting a more quantitative definition for evidence of wrongdoing, which involves estimating both “how much you knew” and “how much damage you actually did” to determine the damage, and then some fixed transformation of that damage becomes the final fee. I will ignore Citi’s lawyers’ approach since their definition was entirely self-serving.
Here’s the thing, there are problems with both approaches. For example, with the lawyerly approach, you are basically just sending the message that you should never ever write some things on email, and most or at least many people know that by now. In other words, you are training people to game the system, and if they game it well enough, they won’t get in trouble. Of course, given that this was yet another fine and nobody went to jail, you could make the argument – and I did on the podcast – that nobody got in trouble anyway.
The problem with the quantitative approach is that, first of all, you still need to estimate “how much you knew,” which again often goes back to emails (although in this case it could be estimated by how often the stated standards were breached), and second of all, the penalty formula, taken as a model, can be embedded into the overall trading model of securities.
In other words, if I’m a quant at a nasty place that wants to trade in toxic securities, and I know that there’s a chance I’d be caught but I know the formula for how much I’d have to pay if I got caught, then I could include this cost, in addition to an estimate of the likelihood for getting caught, in an optimization engine to determine exactly how many toxic securities I should sell.
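Here is a rough sketch of that calculation. Everything in it is hypothetical: the function names, the assumption that the margin per security shrinks as more are sold, the flat per-unit fine, and all of the numbers are made up for illustration.

```python
def expected_profit(n, margin, price_impact, p_caught, fine_per_unit):
    """Expected profit from selling n toxic securities.

    Revenue per unit falls as more is sold (margin - price_impact * n);
    the expected fine is the chance of getting caught times a known,
    per-unit penalty formula.
    """
    revenue = n * (margin - price_impact * n)
    expected_fine = p_caught * fine_per_unit * n
    return revenue - expected_fine

def optimal_quantity(margin, price_impact, p_caught, fine_per_unit, max_n=1000):
    """Grid-search the quantity that maximizes expected profit."""
    return max(
        range(max_n + 1),
        key=lambda n: expected_profit(n, margin, price_impact, p_caught, fine_per_unit),
    )

# All numbers below are invented for the example.
known_fine = optimal_quantity(margin=10.0, price_impact=0.01,
                              p_caught=0.2, fine_per_unit=30.0)
harsher_fine = optimal_quantity(margin=10.0, price_impact=0.01,
                                p_caught=0.2, fine_per_unit=60.0)
print(known_fine, harsher_fine)
```

Once the fine is a known formula it is just another cost term, and the optimizer happily picks a positive quantity as long as the expected fine per unit stays below the margin. In this toy setup, doubling the per-unit fine pushes the expected fine above the margin and the best quantity drops to zero.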
To avoid this scenario, it makes sense to have an element of randomness in the punishments for getting caught. Every now and then the punishment should be much larger than the quantitative model might suggest, so that there is less of a chance that people can incorporate the whole shebang into their optimization procedure. So maybe what I’m saying is that arriving at a random number, like the DOJ did, is probably better even though it is less satisfying.
Another possibility to actually deter crimes would be to arbitrarily increase the likelihood of catching people up to no good, but that has been bounded from above by the way the SEC and the DOJ actually work.
People who celebrate the monthly jobs report getting better nowadays often forget to mention a few facts:
- the new jobs are often temporary or part-time, with low wages
- the old lost jobs, which we lose each month, were often full-time with higher wages
I could go on, and I have, and mention the usual complaints about the definition of the unemployment rate. But instead I’ll take a turn into a thought experiment I’ve been having lately.
Namely, what is the future of work?
It’s important to realize that in some sense we’ve been here before. When all the farming equipment got super efficient and we lost agricultural jobs by the thousands, people swarmed to the cities and we started building things with manufacturing. So if before we had “the age of the farm,” we then entered into “the age of stuff.” And I don’t know about you but I have LOTS of stuff.
Now that all the robots have been trained and are being trained to build our stuff for us, what’s next? What age are we entering?
I kind of want to complain at this point that economists are kind of useless when it comes to questions like this. I mean, aren’t they in charge of understanding the economy? Shouldn’t they have the answer here? I don’t think they have explained it if they do.
Instead, I’m pretty much left considering various science fiction plots I’ve heard about and read about over the years. And my conclusion is that we’re entering “the age of service.”
The age of service is a kind of pyramid scheme where rich people employ individuals to service them in various ways, and then those people are paid well so they can hire slightly less rich people to service them, and so on. But of course for this particular pyramid to work out, the rich have to be SUPER rich and they have to pay their servants very well indeed for the trickle down to work out. Either that or there has to be a wealth transfer some other way.
So, as with all theories of the future, we can talk about how this is already happening.
I noticed this recent Bloomberg View article about how rich people don’t have normal doctors like you and me. They just pay out of pocket for super expensive service outside the realm of insurance. This is not new but it’s expanding.
Here’s another example of the future of jobs, which I should applaud because at least someone has a job but instead just kind of annoys me. Namely, the increasing frequency where I try to make a coffee date with someone (outside of professional meetings) and I have to arrange it with their personal assistant. I feel like, when it comes to social meetings, if you have time to be social, you have time to arrange your social calendar. But again, it’s the future of work here and I guess it’s all good.
More generally: there will be lots of jobs helping out old people and sick people. I get that, especially as the demographics tilt towards old people. But the mathematician in me can’t help but wonder, who will take care of the old people who used to be taking care of the old people? I mean, they by definition don’t have lots of extra cash floating around because they were at the bottom of the pyramid as younger workers.
Or do we have a system where people actually change jobs and levels as they age? That’s another model, where oldish people take care of truly old people and then at some point they get taken care of.
Of course, much like the Star Trek world, none of this has strong connection to the economy as it is set up now, so it’s hard to imagine a smooth transition to a reasonable system, and I’m not even claiming my ideas are reasonable.
By the way, by my definition most people who write computer programs – especially if they’re writing video games or some such – are in a service industry as well. Pretty much anyone who isn’t farming or building stuff in manufacturing is working in service. Writers, poets, singers, and teachers included. Hell, the future could be pretty awesome if we arrange things well.
Anyhoo, a whimsical post for Thursday, and if you have other ideas for the future of work and how that will work out economically, please comment.
In the past 12 hours I’ve read two fascinating articles about the crazy world of standardized testing. They’re both illuminating and well-written and you should take a look.
First, my data journalist friend Meredith Broussard has an Atlantic piece called Why Poor Schools Can’t Win At Standardized Testing wherein she tracks down the money and the books in the Philadelphia public school system (spoiler: there’s not enough of either), and she makes the connection between expensive books and high test scores.
Here’s a key phrase from her article:
Pearson came under fire last year for using a passage on a standardized test that was taken verbatim from a Pearson textbook.
The second article, in the New Yorker, is written by Rachel Aviv and is entitled Wrong Answer. It’s a close look, with interviews, of the cheating scandal from Atlanta, which I have been studying recently. The article makes the point that cheating is a predictable consequence of the high-stakes “data-driven” approach.
Here’s a key phrase from the Aviv article:
After more than two thousand interviews, the investigators concluded that forty-four schools had cheated and that a “culture of fear, intimidation and retaliation has infested the district, allowing cheating—at all levels—to go unchecked for years.” They wrote that data had been “used as an abusive and cruel weapon to embarrass and punish.”
Putting the two together, it’s pretty clear that there’s an acceptable way to cheat, which is by stocking up on expensive test prep materials in the form of testing company-sponsored textbooks, and then there’s the unacceptable way to cheat, which is where teachers change the answers. Either way the standardized test scoring regime comes out looking like a penal system rather than a helpful teaching aid.
Before I leave, some recent goodish news on the standardized testing front (hat tip Eugene Stern): Chris Christie just reduced the importance of value-added modeling for teacher evaluation down to 10% in New Jersey.
Hey my class starts today, I’m totally psyched!
The syllabus is up on GitHub here and I prepared an iPython notebook here showing how to do basic statistics in Python, culminating in an attempt to understand what a statistically significant but tiny difference means in the context of the Facebook Emotion study. Here’s a useless screenshot which I’m including because I’m proud:
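To make “statistically significant but tiny” concrete, here is a minimal sketch on synthetic data. It is not the notebook’s code and uses no data from the Facebook study: the group size, the 0.03 standard deviation shift, and the variable names are all assumptions for illustration.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 200_000              # users per group (hypothetical)
true_shift = 0.03        # tiny real difference, in standard deviations

control = [rng.gauss(0.0, 1.0) for _ in range(n)]
treated = [rng.gauss(true_shift, 1.0) for _ in range(n)]

def mean_and_variance(xs):
    """Return the sample mean and the unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

mean_c, var_c = mean_and_variance(control)
mean_t, var_t = mean_and_variance(treated)
mean_diff = mean_t - mean_c

# Two-sample z-test; a normal approximation is fine at this sample size.
z = mean_diff / math.sqrt(var_c / n + var_t / n)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

# Effect size (Cohen's d): the difference measured in standard deviations.
cohens_d = mean_diff / math.sqrt((var_c + var_t) / 2)

print(f"z = {z:.2f}, p = {p_value:.2e}, Cohen's d = {cohens_d:.3f}")
```

With groups this large the p-value comes out minuscule even though the effect size stays far below what is conventionally called “small.” The p-value tells you the difference is probably real, and the effect size tells you whether anyone should care.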
Most of the rest of the classes will feature an awesome guest lecturer, and I’m hoping to blog about those talks with their permission, so stay tuned.
There’s a CNN video news story explaining how the NYC Mayor’s Office of Data Analytics is working with private start-up Placemeter to count and categorize New Yorkers, often with the help of private citizens who install cameras in their windows. Here’s a screenshot from the Placemeter website:
You should watch the video and decide for yourself whether this is a good idea.
Personally, it disturbs me, but perhaps because of my priors on how much we can trust other people with our data, especially when it’s in private hands.
To be more precise, there is, in my opinion, a contradiction coming from the Placemeter representatives. On the one hand they try to make us feel safe by saying that, after gleaning a body count with their video tapes, they dump the data. But then they turn around and say that, in addition to counting people, they will also categorize people: gender, age, whether they are carrying a shopping bag or pushing strollers.
That’s what they are talking about anyway, but who knows what else? Race? Weight? Will they use face recognition software? Who will they sell such information to? At some point, after mining videos enough, it might not matter if they delete the footage afterwards.
Since they are a private company I don’t think such information on their data methodologies will be accessible to us via Freedom of Information Laws either. Or, let me put that another way. I hope that MODA sets up their contract so that such information is accessible via FOIL requests.
I’ve talked before about the industry of for-profit colleges which exists largely to game the federal student loan program. They survive almost entirely on federal student loans of their students, while delivering terrible services and worthless credentials.
Well, good news: one of the worst of the bunch is closing its doors. Corinthian Colleges, Inc. (CCI) got caught lying about job placement of its graduates (in some cases, they said 100% when the truth was closer to 0%). They were also caught advertising programs they didn’t actually have.
But here’s what interests me the most, which I will excerpt from the California Office of the Attorney General:
CCI’s predatory marketing efforts specifically target vulnerable, low-income job seekers and single parents who have annual incomes near the federal poverty line. In internal company documents obtained by the Department of Justice, CCI describes its target demographic as “isolated,” “impatient,” individuals with “low self-esteem,” who have “few people in their lives who care about them” and who are “stuck” and “unable to see and plan well for future.”
I’d like to know more about how they did this. I’m guessing it was substantially online, and I’m guessing they got help from data warehousing services.
After skimming the complaint I’m afraid it doesn’t include such information, although it does say that the company advertised programs it didn’t have and then tricked potential students into filling out information about them so CCI could follow up and try to enroll them. Talk about predatory advertising!
Update: I’m getting some information by checking out their recent marketing job postings.
I’m super excited about the recent “mood study” that was done on Facebook. It constitutes a great case study on data experimentation that I’ll use for my Lede Program class when it starts mid-July. It was first brought to my attention by one of my Lede Program students, Timothy Sandoval.
My friend Ernest Davis at NYU has a page of handy links to big data articles, and at the bottom (for now) there are a bunch of links about this experiment. For example, this one by Zeynep Tufekci does a great job outlining the issues, and this one by John Grohol burrows into the research methods. Oh, and here’s the original research article that’s upset everyone.
It’s got everything a case study should have: ethical dilemmas, questionable methodology, sociological implications, and questionable claims, not to mention a whole bunch of media attention and dissection.
By the way, if I sound gleeful, it’s partly because I know this kind of experiment happens on a daily basis at a place like Facebook or Google. What’s special about this experiment isn’t that it happened, but that we get to see the data. And the response to the critiques might be, sadly, that we never get another chance like this, so we have to grab the opportunity while we can.
There have been two articles very recently about how great health data mining could be if we could only link up all the data sets. Larry Page from Google thinks so, which doesn’t surprise anyone, and separately we are seeing that the consequence of the new medical payment system through the ACA is giving medical systems incentives to keep tabs on you through data providers and find out if you’re smoking or if you need to fill up on asthma medication.
And although many would consider this creepy stalking, that’s not actually my problem with it. I think Larry Page is right – we might be able to save lots of lives if we could mine this data which is currently siloed through various privacy laws. On the other hand, there are reasons those privacy laws exist. Let’s think about that for a second.
Now that we have the ACA, insurers are not allowed to deny Americans medical insurance coverage because of a pre-existing condition, nor are they allowed to charge more, as of 2014. That’s good news on the health insurance front. But what about other aspects of our lives?
For example, it does not generalize to employers. In other words, a large employer like Walmart might take into account your current health and your current behaviors and possibly even your DNA to predict future behaviors, and they might decide not to give jobs to anyone at risk of diabetes, say. Even if medical insurance costs were taken out of the picture, which they haven’t been, they’d have incentives not to hire unhealthy people.
Mind you, there are laws that prevent employers from looking into HIPAA-protected health data, but not Acxiom data, which is entirely unregulated. And if we “opened up all the data” then the laws would be entirely moot. It would be a world where, to get a job, the employer got to see everything about you, including your future health profile. To some extent this is already happening.
Perhaps not everyone thinks of this as bad. After all, many people think smokers should pay more for insurance, so why not also make them work harder to get a job? However, lots of the information gleaned from this data – even behaviors – has much more to do with poverty levels and circumstance than with conscious choice. In other words, it’s another stratification of society along the lucky/unlucky birth lottery spectrum. And if we aren’t careful, we will make it even harder for poor people to eke out a living.
I’m all for saving lives but let’s wait for the laws to catch up with the good intentions. Although to be honest, it’s not even clear how the law should be written, since it’s not clear what “medical” data is nowadays nor how we could gather evidence that a private employer is using it against someone improperly.
A tiny article in The Cap Times was recently published (hat tip Jordan Ellenberg) which describes the existence of a big data model which claims to help filter and rank school teachers based on their ability to raise student test scores. I guess it’s a kind of pre-VAM filtering system, and if it was hard to imagine a more vile model than the VAM, here you go. The article mentioned that the Madison School Board was deliberating on whether to spend $273K on this model.
One of the teachers in the district wrote her concerns about this model in her blog and then there was a debate at the school board meeting, and a journalist covered the meeting, so we know about it. But it was a close call, and this one could have easily slipped under the radar, or at least my radar.
Even so, now I know about it, and once I looked at the website of the company promoting this model, I found links to an article where they name a customer, for example in the Charlotte-Mecklenburg School District of North Carolina. They claim they only filter applications using their tool, they don’t make hiring decisions. Cold comfort for people who got removed by some random black box algorithm.
I wonder how many of the teachers applying to that district knew their application was being filtered through such a model? I’m going to guess none. For that matter, there are all sorts of application screening algorithms being regularly used of which applicants are generally unaware.
It’s just one example of the dark matter of big data. And by that I mean the enormous and growing clusters of big data models that are only inadvertently detectable by random small-town or small-city budget meeting journalism, or word-of-mouth reports coming out of conferences or late-night drinking parties with VCs.
The vast majority of big data dark matter is still there in the shadows. You can only guess at its existence and its usage. Since the models themselves are proprietary, and are generally deployed secretly, there’s no reason for the public to be informed.
Let me give you another example, this time speculative, but not at all unlikely.
Namely, big data health models arising from the quantified self movement data. This recent Wall Street Journal article entitled Can Data From Your Fitbit Transform Medicine? articulated the issue nicely:
Consumer wearables fall into a regulatory gray area. Health-privacy laws that prevent the commercial use of patient data without consent don’t apply to the makers of consumer devices. “There are no specific rules about how those vendors can use and share data,” said Deven McGraw, a partner in the health-care practice at Manatt, Phelps, and Phillips LLP.
The key is that phrase “regulatory gray area”; it should make you think “big data dark matter lives here”.
When you have unprotected data that can be used as a proxy of HIPAA-protected medical data, there’s no reason it won’t be. So anyone who stands to benefit from knowing health-related information about you – think future employers who might help pay for future insurance claims – will be interested in using big data dark matter models gleaned from this kind of unregulated data.
To be sure, most people nowadays who wear fitbits are athletic, trying to improve their 5K run times. But the article explained that the medical profession is on the verge of suggesting a much larger population of patients use such devices. So it could get ugly real fast.
Secret big data models aren’t new, of course. I remember a friend of mine working for a credit card company a few decades ago. Her job was to model which customers to offer subprime credit cards to, and she was specifically told to target those customers who would end up paying the most in fees. But it’s become much much easier to do this kind of thing with the proliferation of so much personal data, including social media data.
I’m interested in the dark matter, partly as research for my book, and I’d appreciate help from my readers in trying to spot it when it pops up. For example, I remember being told that a certain kind of online credit score is used to keep people on hold for customer service longer, but now I can’t find a reference to it anywhere. We should really compile a list at the boundaries of this dark matter. Please help! And if you don’t feel comfortable commenting, my email address is on the About page.
No time for a post this morning but go read this post by Scott Aaronson on using a PageRank-like algorithm to understand human morality and decision making. The post is funny, clever, very thoughtful, and pretty long.
One of the reasons I enjoy my blog is that I get to try out an argument and then see if readers 1) can poke holes in my argument, 2) misunderstand my argument, or 3) misunderstand something tangential to my argument.
Today I’m going to write about an issue of the third kind. Yesterday I talked about how I’d like to see the VAM scores for teachers directly compared to other qualitative scores or other VAM scores so we could see how reliably they regenerate various definitions of “good teaching.”
The idea is this. Many mathematical models are meant to replace a human-made model that is deemed too expensive to work out at scale. Credit scores were like that; take the work out of the individual bankers’ hands and create a mathematical model that does the job consistently well. The VAM was originally intended as such – in-depth qualitative assessments of teachers is expensive, so let’s replace them with a much cheaper option.
So all I’m asking is, how good a replacement is the VAM? Does it generate the same scores as a trusted, in-depth qualitative assessment?
When I made the point yesterday that I haven’t seen anything like that, a few people mentioned studies that show positive correlations between the VAM scores and principal scores.
But here’s the key point: positive correlation does not imply equality.
Of course sometimes positive correlation is good enough, but sometimes it isn’t. It depends on the context. If you’re a trader that makes thousands of bets a day and your bets are positively correlated with the truth, you make good money.
But on the other side, if I told you that there’s a ride at a carnival that has a positive correlation with not killing children, that wouldn’t be good enough. You’d want the ride to be safe. It’s a higher standard.
I’m asking that we make sure we are using that second, higher standard when we score teachers, because their jobs are increasingly on the line, so it matters that we get things right. Instead we have a machine that nobody understands that is positively correlated with things we do understand. I claim that’s not sufficient.
Let me put it this way. Say your “true value” as a teacher is a number between 1 and 100, and the VAM gives you a noisy approximation of your value, which is 24% correlated with your true value. And say I plot your value against the approximation according to VAM, and I do that for a bunch of teachers, and it looks like this:
So maybe your “true value” as a teacher is 58 but the VAM gave you a zero. That would be more than just frustrating to you, since the score is taken as an important part of your assessment. You might even lose your job. And you might get a score of zero many years in a row, even if your true score stays at 58. It’s increasingly unlikely, to be sure, but given enough teachers it is bound to happen to a handful of people, just by statistical reasoning, and if it happens to you, you will not think it’s unlikely at all.
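The thought experiment is easy to simulate. The sketch below is built on assumptions: “true value” and the VAM score are both put on a standardized bell-curve scale rather than 1 to 100, the noise is Gaussian, and the teacher count and variable names are made up. It then counts how many above-average teachers land in the bottom tenth of VAM scores.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
rho = 0.24               # target correlation between true value and VAM
n_teachers = 20_000      # hypothetical number of teachers

# Standardized "true value" plus independent noise, mixed so that the
# correlation between the two lists is rho in expectation.
true_value = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
vam_score = [rho * t + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
             for t in true_value]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(true_value, vam_score)

# Teachers who are above average in truth but fall in the bottom 10% of VAM.
cutoff = sorted(vam_score)[n_teachers // 10]
good_teachers = [(t, v) for t, v in zip(true_value, vam_score) if t > 0]
unlucky = sum(1 for t, v in good_teachers if v < cutoff)
share_unlucky = unlucky / len(good_teachers)

print(f"correlation = {r:.3f}")
print(f"above-average teachers in the bottom 10% of VAM: {unlucky} "
      f"({share_unlucky:.1%})")
```

If the score were a faithful copy of the truth, that count would be zero. At a correlation of 0.24 it is a sizable group of people, each of whom is being told something false about their teaching.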
In fact, if you’re a teacher, you should demand a scoring system that is consistently the same as a system you understand rather than positively correlated with one. If you’re working for a teachers’ union, feel free to contact me about this.
One last thing. I took the above graph from this post. These are actual VAM scores for the same teacher in the same year but for two different classes in the same subject – think 7th grade math and 8th grade math. So neither score represented above is “ground truth” like I mentioned in my thought experiment. But that makes it even more clear that the VAM is an insufficient tool, because it is only 24% correlated with itself.
Every now and then when I complain about the Value-Added Model (VAM), people send me links to recent papers written by Raj Chetty, John Friedman, and Jonah Rockoff like this one entitled Measuring the Impacts of Teachers II: Teacher Value-Added and Student Outcomes in Adulthood or its predecessor Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates.
I think I’m supposed to come away impressed, but that’s not what happens. Let me explain.
Their data set of student scores starts in 1989, well before the current value-added teaching climate began. That means teachers weren’t teaching to the test like they are now. Therefore saying that the current VAM works because a retrograded VAM worked in 1989 and the 1990’s is like saying I must like blueberry pie now because I used to like pumpkin pie. It’s comparing apples to oranges, or blueberries to pumpkins.
I’m surprised by the fact that the authors don’t seem to make any note of the difference in data quality between pre-VAM and current conditions. They should know all about feedback loops; any modeler should. And there’s nothing like telling teachers they might lose their job to create a mighty strong feedback loop. For that matter, just consider all the cheating scandals in the D.C. area where the stakes were the highest. Now that’s a feedback loop. And by the way, I’ve never said the VAM scores are totally meaningless, but just that they are not precise enough to hold individual teachers accountable. I don’t think Chetty et al address that question.
So we can’t trust old VAM data. But what about recent VAM data? Where’s the evidence that, in this climate of high-stakes testing, this model is anything but random?
If it were a good model, we’d presumably be seeing a comparison of current VAM scores and current other measures of teacher success and how they agree. But we aren’t seeing anything like that. Tell me if I’m wrong, I’ve been looking around and I haven’t seen such comparisons. And I’m sure they’ve been tried, it’s not rocket science to compare VAM scores with other scores.
The lack of such studies reminds me of how we never hear about scientific studies on the results of Weight Watchers. There’s a reason such studies never see the light of day, namely because whenever they do those studies, they decide they’re better off not revealing the results.
And if you’re thinking that it would be hard to know exactly how to rate a teacher’s teaching in a qualitative, trustworthy way, then yes, that’s the point! It’s actually not obvious how to do this, which is the real reason we should never trust a so-called “objective mathematical model” when we can’t even decide on a definition of success. We should have the conversation of what comprises good teaching, and we should involve the teachers in that, and stop relying on old data and mysterious college graduation results 10 years hence. What are current 6th grade teachers even supposed to do about studies like that?
Note I do think educators and education researchers should be talking about these questions. I just don’t think we should punish teachers arbitrarily to have that conversation. We should have a notion of best practices that slowly evolve as we figure out what works in the long-term.
So here’s what I’d love to see, and what would be convincing to me as a statistician: all sorts of qualitative ways of measuring teachers, alongside their VAM scores, so that we could compare them and make sure they agree with each other and with themselves over time. In other words, at the very least we should demand an explanation of how some teachers get totally ridiculous and inconsistent scores from one year to the next and from one VAM to the next, even in the same year.
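To make that comparison concrete, here is a minimal sketch of the kind of consistency check I have in mind. Everything in it is an assumption for illustration: the measure names, the idea that each measure is a list of scores for the same teachers in the same order, and the choice of plain Pearson correlation as the notion of "agreement." It is not how any actual VAM is evaluated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def consistency_report(scores_by_measure):
    """Correlate every pair of measures.

    scores_by_measure maps a (hypothetical) measure name, such as
    "vam_2012" or "principal_rating_2012", to a list of scores with one
    entry per teacher, in the same teacher order for every measure.
    """
    names = sorted(scores_by_measure)
    return {
        (a, b): pearson(scores_by_measure[a], scores_by_measure[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

If a model were measuring something real about teaching, you'd expect the pairs in that report (one year against the next, one VAM against another, VAM against a qualitative rating) to come out well above zero. Correlations near zero are exactly the inconsistency that needs explaining.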
We need some ground truth, people, and some common sense as well. Instead we’re seeing retired education professors pull statistics out of thin air, and it’s an all-out war of supposed mathematical objectivity against the civil servant.
Have you seen Obama’s latest response to the student debt crisis (hat tip Ernest Davis)? He’s going to rank colleges based on some criteria to be named later to decide whether a school deserves federal loans and grants. It’s a great example of a mathematical model solving the wrong problem.
Now, I’m not saying there aren’t nasty leeches who are currently gaming the federal loan system. For example, take the University of Phoenix. It’s not a college system, it’s a business which extracts federal and private loan money from unsuspecting people who want desperately to get a good job some day. And I get why Obama might want to put an end to that gaming, and declare the University of Phoenix and its scummy competitors unfit for federal loans. I get it.
But unfortunately it won’t fix the problem. Because the real problem is the federal loan system in the first place, which has grown a shitton since I was in school:
and in the meantime, our state and private schools are getting more and more expensive relative to the available grants:
And state funding for public schools has decreased while tuition has increased especially since the financial crisis:
The bottom line is that we – and especially our children – need more state school funding much more than we need a ranking algorithm. The best way to bring down tuition rates at private schools is to give them competition at good state schools.
I gave a talk to the invitation-only NYC CTO Club a couple of weeks ago about my fears about big data modeling, namely:
- that big data modeling is discriminatory,
- that big data modeling increases inequality, and
- that big data modeling threatens democracy.
I had three things on my “to do” list for the audience of senior technologists, namely:
- test internal, proprietary models for discrimination,
- help regulators like the CFPB develop reasonable audits, and
- get behind certain models being transparent and publicly accessible, including credit scoring, teacher evaluations, and political messaging models.
Given the provocative nature of my talk, I was pleasantly surprised by the positive reception I was given. Those guys were great – interactive, talkative, and very thoughtful. I think it helped that I wasn’t trying to sell them something.
Even so, I shouldn’t have been surprised when one of them followed up with me to talk about a possible business model for “fairness audits.” The idea is that, what with the recent bad press about discrimination in big data modeling (some of the audience had actually worked with the Podesta team), there will likely be a business advantage to being able to claim that your models are fair. So someone should develop those tests that companies can take. Quick, someone, monetize fairness!
One reason I think this might actually work – and more importantly, be useful – is that I focused on “effects-based” discrimination, which is to say testing a model by treating it like a black box and seeing how it works on different inputs and gives different outputs. In other words, I want to give a resume-sorting algorithm different resumes with similar qualifications but different races. An algorithmically induced randomized experiment, if you will.
From the business perspective, a test that allows a model to remain a black box feels safe, because it does not require true transparency, and allows the “secret sauce” to remain secret.
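Here is a rough sketch of what such a black-box test could look like. The names are all hypothetical: `model` stands in for whatever proprietary scorer is being audited, a resume is just a dictionary, and `group` is a made-up field for the attribute being varied. A real audit would need far more care, for example about proxies for race and about statistical significance.

```python
from typing import Callable, Dict, Iterable, List

Resume = Dict[str, object]


def paired_audit(
    model: Callable[[Resume], bool],
    base_resumes: Iterable[Resume],
    attribute: str,
    values: List[str],
) -> Dict[str, float]:
    """Acceptance rate per attribute value, holding everything else fixed.

    The model is only ever called, never inspected, so it can stay a
    black box.
    """
    accepted = {value: 0 for value in values}
    total = 0
    for base in base_resumes:
        total += 1
        for value in values:
            variant = dict(base)  # copy so the original is untouched
            variant[attribute] = value
            if model(variant):
                accepted[value] += 1
    if total == 0:
        raise ValueError("need at least one resume to audit")
    return {value: accepted[value] / total for value in values}


def biased_model(resume: Resume) -> bool:
    """Toy stand-in for a proprietary model; deliberately unfair."""
    return resume["years_experience"] >= 3 and resume["group"] != "B"


resumes = [{"years_experience": years} for years in (1, 3, 5, 8)]
rates = paired_audit(biased_model, resumes, "group", ["A", "B"])
print(rates)
```

Because each resume is submitted once per group with identical qualifications, any gap in the acceptance rates can only come from how the model treats the varied attribute.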
One thing, though. I don’t think it makes too much sense to have a proprietary model for fairness auditing. In fact the way I was imagining this was to develop an open-source audit model that the CFPB could use. What I don’t want, and which would be worse than nothing, would be if some private company developed a proprietary “fairness audit” model that we cannot trust and would claim to solve the very real problems listed above.
Update: something like this is already happening for privacy compliance in the big data world (hat tip David Austin).
I get asked pretty often whether I “believe” in open data. I tend to murmur a response along the lines of “it depends,” which doesn’t seem too satisfying to me or to the person I’m talking to. But this morning, I’m happy to say, I’ve finally come up with a kind of rule, which isn’t universal. It focuses on power.
Namely, I like data that shines light on powerful people. Like the Sunlight Foundation tracks money and politicians, and that’s good. But I tend to want to protect powerless people, like people who are being surveilled with sensors and their phones. And the thing is, most of the open data focuses on the latter. How people ride the subway or how they use the public park or where they shop.
Something in the middle is crime data, where you have a compilation of people being stopped by the police (powerless) and the police themselves (powerful). But here as well you’ll notice an asymmetry in identifying information. Looking at Stop and Frisk data, for example, there’s a precinct to identify the police officer, but no badge number, whereas a bunch of identifying information is recorded about the person being stopped.
A lot of the time you won’t even find data about powerful people. Worker bees get scored but the managers are somehow above scoring. Bloomberg never scored his lieutenants or himself even when he insisted that teachers should be scored. I like to keep an eye on who gets data collected about them. The power is where the data isn’t.
I guess my point is this. Data and data modeling are not magical tools. They are in fact crude tools, and so to focus on them is misleading and distracting from the real show, which is always about power (and/or money). It’s a boondoggle to think about data when we should be thinking about when and how a model is being wielded and who gets to decide.
One of the biggest problems we face is that all this data is being collected and saved now and the models haven’t even been invented yet. That’s why there’s so much urgency in getting reasonable laws in place to protect the powerless.
This morning I received this link on plagiarism software via the Columbia Journalism School mailing list – which is significantly more interesting than most faculty mailing lists, I might add.
In the article, the author, Neuroskeptic, describes the smallish bit of work one has to go through to “launder” text in order for the standard plagiarism detector software to deem it original. The example Neuroskeptic gives us is, ironically, from a research ethics program in Britain called PIE which Neuroskeptic is accusing of plagiarizing text:
PIE Original: You are invited to join the Publication Integrity and Ethics (herein referred to as PIE) as one of its founding members. PIE, a not-for profit organisation, offers free membership to all interested individuals. Please join us and become part of this exciting new movement in the world of publishing ethics; it is the professional home for authors, reviewers, editorial board members and editors-in-chief.
Neuroskeptic: You are invited to join Publication Integrity and Ethics (herein referred to as PIE) and become one of its founding members. PIE, a not-for profit organisation, offers interested individuals free membership. Please join this exciting new movement in the publishing ethics world; PIE is the professional home for reviewers, editorial board members, authors, and editors-in-chief.
This second, laundered piece of text got through software called Grammarly, and the first one didn’t.
Neuroskeptic made his or her point, and PIE has been adequately shamed into naming their sources. But I think the larger point is critical.
Namely, this is the problem with having standard software for things like plagiarism. If everyone uses the same tools, then anyone can launder their text sufficiently to jump through all the standard hoops and then be satisfied that they won’t get caught. You just keep running your text through the software, adding “the’s” and changing the beginning of sentences, until it comes out with a green light.
The rules aren’t that you can’t plagiarize, but instead that you can’t plagiarize without adequate laundering.
This reminds me of my previous post on essay correction software. As soon as you have a standard for that, or even a standard approach, you can automate writing essays that will get a good grade, by iteratively running a grading algorithm (or a series of grading algorithms) on your piece and adding big words or whatever until you get an “A” on all versions. You might need to input the topic and the length of the essay, but that’s it.
And if you think that someone smart enough to code this up deserves an A just for the effort, keep in mind that you can buy such software as well. So really it’s about who has money.
Far from believing that MOOCs will destroy the college campus and put everything online, in my cynical moments I’m starting to believe we’re going to have to talk to and test people face to face to be sure they aren’t using algorithms to cheat on tests.
Of course once our brains are directly wired into the web it won’t make a difference.
I finished reading Podesta’s Big Data Report to Obama yesterday, and I have to say I was pretty impressed. I credit some special people that got involved with the research of the report like Danah Boyd, Kate Crawford, and Frank Pasquale for supplying thoughtful examples and research that the authors were unable to ignore. I also want to thank whoever got the authors together with the civil rights groups that created the Civil Rights Principles for the Era of Big Data:
- Stop High-Tech Profiling. New surveillance tools and data gathering techniques that can assemble detailed information about any person or group create a heightened risk of profiling and discrimination. Clear limitations and robust audit mechanisms are necessary to make sure that if these tools are used it is in a responsible and equitable way.
- Ensure Fairness in Automated Decisions. Computerized decisionmaking in areas such as employment, health, education, and lending must be judged by its impact on real people, must operate fairly for all communities, and in particular must protect the interests of those that are disadvantaged or that have historically been the subject of discrimination. Systems that are blind to the preexisting disparities faced by such communities can easily reach decisions that reinforce existing inequities. Independent review and other remedies may be necessary to assure that a system works fairly.
- Preserve Constitutional Principles. Search warrants and other independent oversight of law enforcement are particularly important for communities of color and for religious and ethnic minorities, who often face disproportionate scrutiny. Government databases must not be allowed to undermine core legal protections, including those of privacy and freedom of association.
- Enhance Individual Control of Personal Information. Personal information that is known to a corporation — such as the moment-to-moment record of a person’s movements or communications — can easily be used by companies and the government against vulnerable populations, including women, the formerly incarcerated, immigrants, religious minorities, the LGBT community, and young people. Individuals should have meaningful, flexible control over how a corporation gathers data from them, and how it uses and shares that data. Non-public information should not be disclosed to the government without judicial process.
- Protect People from Inaccurate Data. Government and corporate databases must allow everyone — including the urban and rural poor, people with disabilities, seniors, and people who lack access to the Internet — to appropriately ensure the accuracy of personal information that is used to make important decisions about them. This requires disclosure of the underlying data, and the right to correct it when inaccurate.
This was signed off on by multiple civil rights groups listed here, and it’s a great start.
One thing I was not impressed by: the only time the report mentioned finance was to say that, in finance, they are using big data to combat fraud. In other words, finance was kind of seen as an industry standing apart from big data, and using big data frugally. This is not my interpretation.
In fact, I see finance as having given birth to big data. Many of the mistakes we are making as modelers in the big data era, which require the Civil Rights Principles as above, were made first in finance. Those modeling errors – and when not errors, politically intentional odious models – were created first in finance, and were a huge reason we first had the mortgage-backed-securities rated with AAA ratings and then the ensuing financial crisis.
In fact finance should have been in the report standing as a worst case scenario.
One last thing. The recommendations coming out of the Podesta report are lukewarm and are even contradicted by the contents of the report, as I complained about here. That’s interesting, and it shows that politics played a large part in what the authors could include as acceptable recommendations to the Obama administration.
Here’s one recommendation related to discrimination:
Expand Technical Expertise to Stop Discrimination. The detailed personal profiles held about many consumers, combined with automated, algorithm-driven decision-making, could lead—intentionally or inadvertently—to discriminatory outcomes, or what some are already calling “digital redlining.” The federal government’s lead civil rights and consumer protection agencies should expand their technical expertise to be able to identify practices and outcomes facilitated by big data analytics that have a discriminatory impact on protected classes, and develop a plan for investigating and resolving violations of law.
First, I’m very glad this has been acknowledged as an issue; it’s a big step forward from the big data congressional subcommittee meeting I attended last year for example, where the private-data-for-services fallacy was leaned on heavily.
So yes, a great first step. However, the above recommendation is clearly insufficient to the task at hand.
It’s one thing to expand one’s expertise – and I’d be more than happy to be a consultant for any of the above civil rights and consumer protection agencies, by the way – but it’s quite another to expect those groups to be able to effectively measure discrimination, never mind combat it.
Why? It’s just too easy to hide discrimination: the models are proprietary, and some of them are not even apparent; we often don’t even know we’re being modeled. And although the report brings up discriminatory pricing practices, it ignores redlining and reverse-redlining issues, which are even harder to track. How do you know if you haven’t been made an offer?
Once they have the required expertise, we will need laws that allow institutions like the CFPB to deeply investigate these secret models, which means forcing companies like Larry Summers’s Lending Club to give access to them, where the definition of “access” is tricky. That’s not going to happen just because the CFPB asks nicely.