Archive
Guest post: the age of algorithms
Artie has kindly allowed me to post his thoughtful email to me regarding my NYU conversation with Julia Angwin last month.
This is a guest post by Arthur Doskow, who is currently retired but remains interested in the application and overapplication of mathematical and data-oriented techniques in business and society. Artie has a BS in Math and an MS that is technically in Urban Engineering, but the coursework was mostly in Operations Research. He spent the largest part of his professional life working for a large telco (that need not be named) on protocols, interconnection testing and network security. He is a co-inventor on several patents. He also volunteers as a tutor.
Dear Dr. O’Neil and Ms. Angwin,
I had the pleasure of watching the livestream of your discussion at NYU on February 15. I wanted to offer a few thoughts. I’ll try to be brief.
- Algorithms are difficult, and the ones that were discussed were being asked to make difficult decisions. Although it was not discussed, it would be a mistake to assume a priori that there is an effective mechanized and quantitative process by which good decisions can be made with regard to any particular matter. If someone cannot describe in detail how they would evaluate a teacher, or make a credit decision or a hiring decision or a parole decision, then it’s hard to imagine how they would devise an algorithm that would reliably perform the function in their stead. While it seems intuitively obvious that there are better teachers and worse teachers, reformed convicts and likely recidivist criminals and other similar distinctions, it is not (or should not be) equally obvious that the location of an individual on these continua can be reliably determined by quantitative methods. Reliance on a quantitative decision methodology essentially replaces a (perhaps arbitrary) individual bias with what may be a reliable and consistent algorithmic bias. Whether or not that represents an improvement must be assessed on a situation by situation basis.
- Beyond this stark “solvability” issue, of course, are the issues of how to set objectives for how an algorithm should perform (this was discussed with respect to the possible performance objectives of a parole evaluation system) and the devising, validating and implementing of a prospective system. This is a significant and demanding set of activities for any organization, but the alternative of procuring an outsourced “black box” solution requires, at the least, an understanding and an assessment of how these issues were addressed.
- If an organization is considering outsourcing an algorithmic decision system, the RFP process offers them an invaluable opportunity to learn and assess how a proposed system is designed and how it will work – What inputs does it use? How does its decision engine operate? How has it been validated? How will it cover certain test cases? Where has it been used? To what effect? Etc. Organizations that do not take advantage of an RFP process to ask these detailed questions and demand thorough and responsive answers have only themselves to blame.
- While a developers’ code of ethics is certainly a good thing, the development, marketing and support of a proposed solution is a shared task for which all members of the team must share responsibility – coders, system designers and specifiers, testers, marketers, trainers, support staff, executives. There is no single point of responsibility that can guarantee either a correct or an ethical implementation. Perhaps, in the same way that a CEO must personally sign off on all financial filings, the CEO of a company offering an evaluative system should be required to sign off on the legality, effectiveness and accuracy of claims made regarding the system.
- Software contracts are notoriously developer-friendly, basically absolving the developer of all possible consequences arising out of the use of their product. This needs to change, particularly in the case of systems sold as “black box” solutions to a purchaser’s needs, and contracts should be negotiated in which the developer retains significant responsibility and liability.
- As I think was pointed out, there is a broad range of analysis and modeling techniques, ranging from expert systems that seek to encode human knowledge, to heuristic learning systems such as neural nets. While heuristic systems have the potential to ferret out non-intuitive relationships, their results obviously require a much higher degree of scrutiny. Part of me wonders how IBM and Watson would do at developing decision systems.
- Extensive testing and analysis should be required before any system “goes live”. It is disappointing to hear that “algorithm auditing” does not seem to be a thriving business, and, depending on the definition of “algorithm auditing”, I may be suggesting even more. Perhaps “algorithm testing” would be a more attractive sounding service name. Beyond requiring an analytical assessment of underlying data requirements and assessment algorithms, systems should be tested using an extensive set of test cases. Test cases should be assessed in advance by other (e.g., human expert) means, and system results should be examined for plausibility and for sanity. Another set of test cases should assess performance with extreme (e.g., best case, worst case) scenarios to check for system sanity. Another possibility is “side by side” testing, in which the system will “shadow” the current implementation, either concurrently or in retrospect and the results will be compared.
- Psychological and other pre-employment tests, described in Weapons of Math Destruction, are problematic in two ways. First is whether it is appropriate to conduct them at all, and second is whether they are effective in their stated purpose (i.e., to select the best prospective employees, or those best matched to the position in question). Certainly, competency testing is an appropriate part of candidate selection, but whether psychological characteristics are a component of competency is arguable, at best. At the very least, however, such testing should be assessed as to whether it predicts what it claims to predict, and whether that characteristic is emblematic of work effectiveness. How to conduct such testing would require some creativity. Testing could be conducted on an “incoming class” of employees, whether prior to hiring, or after hiring with the test results being sequestered (neither reported to company management nor used in any evaluation process). After some period (1 – 2 years), the qualitative measures of employee performance and effectiveness could be compared to the sequestered test results and examined for correlation. Another possibility would be to identify a disinterested company with employees performing similar work. (By disinterested, I mean disinterested in using the evaluative test in question.) Employees of that company could be asked to undergo “risk free” testing, with results again being sequestered from their employer. The quantitative test results could then be compared to the qualitative measures of employee performance and effectiveness used by that employer. Whatever one thinks of such testing, as Weapons of Math Destruction correctly points out, to the extent to which it is used, efforts should be made to test and improve its efficacy. To the extent that such testing is promoted by an outside party, that party should be ready, willing and able to demonstrate observed effectiveness.
- An interesting alternative to a proprietary black box system would be what might be called a meta-system, a configurable engine which would allow its procurer to specify the inputs, weightings and the manner in which they are used to formulate a decision, perhaps offering a drag and drop software interface to specify the decision algorithm. Such a system would leave the fundamentals of the decision algorithm design to the purchasing company, but simply facilitate its implementation.
- One must always be cautious about the possibility of inherent bias in data. As a simple example, recidivism is most easily estimated by the proportion of released convicts who are re-arrested. But if recidivism is actually defined by the percentage of released convicts who return to criminal life, then the estimate is likely skewed in several ways. Some recidivists will be caught; others will not. For example, some types of crime are more heavily investigated than others, leading to higher re-arrest rates. Further, even among perpetrators of the same crime, investigation and enforcement may well be targeted more to some areas than to others.
- As was pointed out during the discussion, being fair, being humane may cost money. And this is the real issue with many algorithms. In economists’ terms, the inhumanity associated with an algorithm could be referred to as an externality. Optimization has its origins with the solutions to problems in the inanimate world, how to inspect mass produced parts for flaws, how to cut a board to obtain the most salable pieces of lumber, how to minimize the lengths of circuit traces on a PC board. There were problems that touched on human behavior, scheduling issues, or traveling salesman type problems, but not to the extent that they ignored humane considerations. We are now to the point where we have human beings being compared to poisonous Skittles, and where life altering decisions of great import (hiring, firing, parole, assessment, scheduling, etc.) are being subjected to optimization processes, often of questionable validity, which objectify people, view them as resources or threats, and give little or no consideration to the very human consequences of their deployment. Assuming that your good work can drive to this consensus, there is a fork in the road as to how it can be addressed. One way would be to attempt to implement humane costs, benefits and constraints into the models being deployed and optimize on that basis. The other is to stand back and monitor applications for their human costs and attempt to address them iteratively. Or, as Yogi said, you can come to the fork and take it.
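The point above about re-arrest data can be made concrete with a minimal Python sketch. Every number in it is invented for illustration, and the names (`Group`, `detection_rate`, `observed_rearrest_rate`) are hypothetical rather than taken from any real parole system. It assumes two groups of released convicts who return to crime at exactly the same rate but are policed with different intensity.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    released: int               # number of people released
    true_reoffense_rate: float  # hypothetical share who actually return to crime
    detection_rate: float       # hypothetical share of reoffenders who get re-arrested


def observed_rearrest_rate(group: Group) -> float:
    """Re-arrests divided by releases: the figure that is easy to measure."""
    reoffenders = group.released * group.true_reoffense_rate
    rearrested = reoffenders * group.detection_rate
    return rearrested / group.released


# Identical true behavior, different enforcement intensity (made-up values).
groups = [
    Group("heavily policed area", released=1000, true_reoffense_rate=0.30, detection_rate=0.80),
    Group("lightly policed area", released=1000, true_reoffense_rate=0.30, detection_rate=0.40),
]

for g in groups:
    print(f"{g.name}: true rate {g.true_reoffense_rate:.0%}, "
          f"observed re-arrest rate {observed_rearrest_rate(g):.0%}")
```

Under these assumptions the measured rate in the first group is twice that of the second even though the underlying behavior is identical, so a model trained on re-arrests would learn the enforcement pattern and not the recidivism.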
Dystopian Bloomberg Posts: Price Discrimination and Snap
This is just out on Bloomberg:
The Dystopian Future of Price Discrimination
And this came out Monday:
Snap Needs to Get Inside Your Head
We don’t know why we’re all so fat
One of the most ridiculous aspects of the “blame the individual” approach to obesity is the overall trend of fatness throughout the country and the world over the past few decades.


The rates of obesity have skyrocketed and we simply don’t know why. Here are a few reasons that are consistently trotted out:
- People have become lazy. (Actually this one’s easy. In fact we’re getting more exercise and it doesn’t seem to help. And for that matter, exercise doesn’t make you lose weight.)
- We’re eating too much fast food.
- We’re drinking too much soda and generally eating too much sugar.
- We’re watching too much TV/ playing too many video games.
- There are too many food options constantly surrounding us.
- Our portion sizes are too big.
- We’re just eating too much of everything. In some sense this is a tautology. The question is why.
- Glyphosates in our grains are making us fat.
- Our internal stomach biomes are messed up and make us fat.
- Bad genetics make us fat. This one’s easy too, since our gene pool hasn’t changed that much recently.
- Dieting itself makes us fat.
So, there are tons of reasons, and I’m sure I missed some. The tricky thing is, all of them sound plausible, and none of them are likely the single answer. Likely it’s a combination of a bunch of them.
But the truth is, we don’t know. And people hate not knowing stuff, so they pretend they know. That’s not helpful. We need to be scientists and try testing out hypotheses.
The biggest problem with testing the above hypotheses is that many of them are hard-to-avoid environmental factors of modern life. You can take the soda machine away from a high school but then the kids will just buy soda at the nearby corner store.
Until we’ve figured it out, I’d like us to admit we don’t know why we’re all so fat. And I’d like us to stop blaming individuals, especially children.
How to think statistically (about dieting)
There are lots of ways to get statistical thinking wrong, not so many ways to get it right. Here’s a series of examples from wrong to right:
- I did this, and it’s not a “diet,” it’s a lifestyle change, and it works for me!
- I know people who live or interact with the world in a certain way, and it seems to work for them! After all, French women are thin. We should all do what they do.
- There was a study of volunteers, and for the people who stayed in the study to the end, they lost weight doing such and such lifestyle change!
- There was a study of volunteers, and they tracked people down who tried to leave the study, and the average weight loss was still real, among the people they found!
- There was a study of doctors giving advice or enrolling people in programs to help overweight people lose weight, and 97% of people lost no weight and plenty of people gained weight, maybe even more than half.
What I’d love is for people to understand how much difference there is between a personal experience (1) and advice we’d have on public health (5).
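The gap between examples 3 and 4 above is easy to see with a toy calculation. This is a minimal Python sketch with made-up numbers, not data from any real study: each entry is a hypothetical participant's weight change in pounds, plus whether they stayed in the study to the end.

```python
from statistics import mean

# (weight change in pounds, stayed to the end?) for invented participants.
# Negative numbers are weight lost; positive numbers are weight gained.
participants = [
    (-12, True), (-8, True), (-10, True),                    # stayed and lost weight
    (5, False), (9, False), (4, False), (-2, False), (6, False),  # dropped out
]


def completers_only_mean(rows):
    """Average change among people who finished (example 3)."""
    return mean(change for change, completed in rows if completed)


def everyone_mean(rows):
    """Average change when dropouts are tracked down and counted (example 4)."""
    return mean(change for change, _ in rows)


print("completers only:", completers_only_mean(participants))
print("everyone enrolled:", everyone_mean(participants))
```

With these invented values the completers average a 10 pound loss, while the whole enrolled group averages a 1 pound loss. Counting only the people who stayed makes the same program look ten times more effective.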
Here’s the gold standard: if you can come up with something to tell Medicare about how to have a population of morbidly obese people become a population of regular weight people, then you win. Otherwise, if you’re tempted to tell me about a lifestyle change that worked for you, please don’t, because that’s not statistical.
Also, I’d like a word about the theory that with enough discipline and willpower, anyone can lose weight. I think it’s fair to say I have discipline and willpower. In fact, I’m a fucking poster child for them. I wrote a Ph.D. as one of few women in a male-dominated field. I wrote a book or two. I’ve had three kids and I’ve never struck one of them in anger. In fact I’m pretty nice to people most of the time, even though I’m relatively often filled with rage at the unfairness of the world. That’s hard. It takes willpower.
I even ran a sprint triathlon at 275 pounds, really fast, which took months of ridiculous training. Also, I know all about healthy habits, I don’t eat “emotionally,” just when I’m hungry, and I love Brussels sprouts and other healthy foods. I just get really fucking hungry, often.
Readers, I’m the fucking center of the discipline and willpower universe over here.
Given all of that, if anything I’d argue my willpower is one reason I’m so heavy. When I was 22 or so, I went on a fat-free diet, on the advice of my doctor, that fucked me up; I lost 30 pounds but then gained something like 75. I think I messed up my insulin resistance. In fact I believe that also happened to me on my first starvation diet when I was 14.
I’m guessing I’d be thinner if I’d had less willpower, in other words. I wouldn’t be better off, though, because I kind of like my books and my Ph.D. and my kids who don’t fear their parents.
Anyway, from now on let’s talk statistically, shall we?
The nature of choice in diets
There’s a lot of statistical evidence that dieting doesn’t work. I’ll postpone the documentation of the highlights of that evidence for a later post, but you can google it for yourself (avoid, if you can, the links that are trying to get you to buy something).
And when I say “diets don’t work,” here’s what I mean. I mean that, statistically speaking, people who go on diets don’t successfully lose and keep off weight for more than about six months. So, after two years or so, the average weight is about the same or higher in a group of dieters.
Can we take that as a given for now? Thanks. We can argue about it later if you want.
Here’s the thing. That statement confounds lots of people, I think because it’s statistical in nature. They will always imagine that, because they are themselves examples of someone who has lost weight and kept it off for more than two years through dieting, dieting does in fact work, and we should all try what they’ve tried.
It’s annoying to be told this over and over again, especially when you’re someone who’s tried a million things. And believe me, almost every fat person I know has tried a million things. For that reason I’d appreciate no more such advice, although in a later post I will be asking for zany pseudo-scientific theories about why fat people stay fat (there are so many!).
So yeah, people don’t understand statistical facts. But I think there’s something more going on here. Namely, the illusory nature of choice when it comes to dieting.
Because diets do seem to work short term, people think they’ve gotten control over their eating, at least temporarily. And then, at some point, people drop off their diets. They sometimes do it with a “what the fuck” attitude, but my guess is most of them don’t even remember doing it. It’s a kind of momentary amnesia, and before they know what’s happened they’re eating something they shouldn’t have. That is certainly my experience.
From the outsider’s perspective, that’s a person who has chosen to go off their diet, and in a certain sense it’s obviously true, since for example anyone who was locked in a cell with no food would not have the ability to go off a diet, nor would someone who cannot feed themselves. Indeed, it requires the access to food and the action of eating to go off a diet. So in that sense it takes a certain amount of freedom.
But, there’s another sense in which, I’d argue, there’s no choice in the matter at all. After all, dieting requires a positive declaration of a desire to lose weight. Sometimes it even requires forking over cash, maybe a lot of cash. People are trying hard to lose weight, in other words, and yet they can’t, and even statistically speaking they cannot.
Said another way: if 1000 people went to a lot of trouble to do something, and they all tried but 990 of them failed to do it, would we decide they had made the choice not to do it?
I’m ready to say there’s something else at work here, something more basic than free will. It’s like our choice to breathe. We can’t decide not to do it. Or we can, but only for a bit.
Commenters, please stick to the question of the nature of choice in dieting. I will delete other stuff, thanks!
Updates: TED and bariatric surgery
Readers, I’ve got two announcements today.
First, I’ll be giving a TED talk in April in Vancouver. And yes, for those of you who remember, I haven’t always been the biggest fan of such things. But I’ve changed my mind/ sold out/ decided that it might just be great.
As a friend of mine explained to me, sometimes things get so douchey they come out the other side and are super cool. Also, I’m giving a talk in the section called Our Robotic Overlords, so that’s a very good sign.
Second, I’ve decided to undergo bariatric surgery. I’m jumping through the many insurance-qualifying hoops for now but if all goes well it will happen later this year, possibly as soon as July.
And… I’m planning to chronicle my journey on mathbabe. If that kind of thing doesn’t interest you, feel free to never come back, but if that kind of thing does interest you, then buckle up!
I’m not planning to keep myself to the subject of the bariatric surgery; in fact that’s just an excuse to think about a lot more, specifically:
- the nature of scientific understanding and how it does or does not percolate throughout society as a whole,
- how money and shame corrupt our understanding of scientific evidence,
- how bad data and bad technologies and biased academic publishing prevent us from learning optimally,
- the nature of individual choice, willpower, and control,
- my historical self-image as a dieter, a fat person, a woman, a feminist, and a thinker,
- how I gathered evidence and made this decision, and of course
- the process itself.
So I’m thinking kind of big and I’m going to have fun with it. Please feel free to comment, I’d love your help!
How Data Can Make Immigrants Look Like Criminals
My newest Bloomberg View column:
How Data Can Make Immigrants Look Like Criminals
Bigger Data Isn’t Always Better Data
My newest piece on Bloomberg:
Bigger Data Isn’t Always Better Data
Insurance and Big Data Are Incompatible
My newest Bloomberg View piece about how that FitBit could be bad for your health:
That Free Health Tracker Could Cost You
New links!
- I wrote about how big data is undermining our understanding and faith in historical facts and in statistics in my newest Bloomberg column, Do You Trust Big Data? Try Googling the Holocaust
- Last week this Vice piece came out, which I contributed to along with lots of writers I really admire like Astra Taylor, on how technology can be made to work for us: Man Versus Machine
- My buddy Paul-Olivier Dehaye is on fire over at medium.com with his newest approach to disrupting the big data surveillance state. He now has devised a way to request your file from Cambridge Analytica, and I’m totally doing this: Quick guide to asking Cambridge Analytica for your data

Age of Algorithms: Data, Democracy and the News Event at NYU Journalism 2/15
Next Wednesday evening I’ll be talking data, democracy, and the news with the amazing Julia Angwin at the NYU Journalism School moderated by Robert Lee Hotz. More information here.
Please come! Or if you can’t come, you can watch the livestream.

Dear President Bannon…. #PostcardsToBannon
How do you get rid of the influence of Steve Bannon’s whispering in Trump’s ear? The best strategy I’ve heard is to make Trump jealous of the attention. And one way to do that is to refer to Bannon as the president.
The hashtag #PostcardsToBannon blew up on Twitter yesterday, with all sorts of people posting pics of their postcards:

From Justin Hendrix via Twitter
In fact, it got so much attention that it was featured overnight on USA Today.
It’s a small act but it might make you feel great to do it.
Donald Trump is the Singularity
I have a new fun piece over at Bloomberg this morning:
Becky Jaffe: Resources to #Resist
This is a guest post by Becky Jaffe.
Per your request, I drafted a quick list of progressive organizations that we will want to support now more than ever. This list of national organizations is by no means comprehensive, just a good place to start if you want to get plugged in to community organizations that build power for the most marginalized sectors of our society. Each of these is a clickable link that will take you directly to the organization’s website so you can learn more about their mission. Please add to this list and circulate widely. I will be creating a Bay Area-specific list soon for people who want to support local community organizations and I encourage you to make a similar list for your region.
Let’s get busy supporting each other, people! We have our work cut out for us and much joyful organizing ahead.
Immigrant/Refugee rights:
- National Network for Immigrant and Refugee Rights
- National Immigration Project of the National Lawyers Guild
- National Immigration Law Center
- Catholic Charities
- the New American Leaders Project
- Presente
- Define American
Civil Rights, social justice and legal defense organizations:
- CAIR, the Council on American-Islamic Relations
- SURJ, Showing Up for Racial Justice
- NAACP, National Association for the Advancement of Colored People
- Black Lives Matter
- the Anti-Defamation League
- Race Forward
- Fred T. Korematsu Institute for Civil Rights and Education
- Bend the Arc: a Jewish partnership for Justice
- Center for Constitutional Rights
- Human Rights Watch: United States
- ACLU, the American Civil Liberties Union
- NLG, the National Lawyers Guild
- Legal Aid Society
- SPLC, the Southern Poverty Law Center
- The Innocence Project
- Schools Not Prisons
- Anti-Eviction Mapping Project
- SEIU, Service Employees International Union
- Planned Parenthood
- National Organization for Women
LGBTQ rights:
- GLAAD: Gay And Lesbian Alliance Against Defamation
- National Center for Lesbian Rights
- Human Rights Campaign
- Lambda Legal Defense and Education Fund
- Transgender Law Center
Disability rights:
Building democracy:
- Women’s March on Washington: 10 Actions for the first 100 Days
- the Equal Justice Society
- The Highlander Research and Education Center
- Fight for the Future
- Indivisible: Former congressional staffers reveal best practices for making Congress listen
- Common Cause
- FAIR: Fairness and Accuracy in Reporting
- Center for Digital Democracy
- Brennan Center for Justice
- Public Citizen
- Inequality Media
Environmental organizations:
Cambridge Analytica
My newest Bloomberg post is out, in response to this article about Cambridge Analytica:
Get a New York ID Card #Resist
This is a guest post by Elizabeth Hutchinson, an Associate Professor of Art History at Barnard College/Columbia University who supports social justice initiatives at work and in her community. She is also a yarn whisperer who likes nothing better than knitting with Mathbabe.
If you are a regular reader of Mathbabe, you may already be putting your time, money and intellectual labor to work in support of organizations that defend the rights of vulnerable groups and our vulnerable environment (#BlackLivesMatter, Make the Road New York, Planned Parenthood, SURJ, 350.org, NYCStandswithStandingRock, and many others).
But if you are a New York City resident, here’s another practical thing you can do: apply for an ID NYC card.

ID NYC is a program established by the de Blasio administration in 2014 that allows city residents to obtain a photo identification without requiring the same government-generated documents required for a driver’s license or passport. These residents then have a municipal ID that can help them open bank accounts, apply for library cards and gain access to other services as well as free membership to a range of NYC cultural institutions like the Museum of Modern Art.

In lieu of a Social Security card or equivalent document, applicants for the ID NYC could use non-U.S. government-generated forms of identification, including, among other things, a combination of a utility bill verifying a local address and a foreign passport or consular identification.
Even if you have a photo ID and a library card, here’s why you should get an ID NYC: this program is widely used by the undocumented immigrants in our midst, and the records of their applications are vulnerable to seizure by federal government authorities charged with expanding the pursuit of both undocumented and documented immigrants.
How is this so, you might ask, knowing that New York is a sanctuary city? Well, it is true that New York is committed to not aiding Immigration and Customs Enforcement (ICE) in a number of ways. For example, it has pledged not to use its city precincts or jails to house immigrants detained by ICE (though it does cooperate when ICE requests individuals already in NYC custody who were convicted of a serious felony) and to not share city agency information with federal immigration authorities.

Sanctuary Cities according to this site. For a more complete list click here.
The ID NYC program was set up to be in line with this stance: the law establishing the program ordered that the copies of documents used in applying for the ID be destroyed at the end of the first two years, or in December 2016, in the meantime sharing them with law enforcement only through judicial subpoena (something that happened only a handful of times). However, a case brought by Republican members of the State Assembly from Staten Island in December resulted in a ruling that all records be retained indefinitely.
After Trump’s election, Mayor de Blasio pledged to change the record keeping system and stop retaining copies of the applicants’ documents beginning in 2017. However, the city will continue to retain significant information about applicants, including their name, gender, address, birthdate, and the photo taken when the ID was made.
The ID NYC program DOES NOT ask applicants about their immigration status. Nevertheless, because this program is well used by members of New York’s immigrant communities (according to the Gothamist, over a third of NYC residents are foreign-born), these applications could be used for fishing expeditions looking for our undocumented neighbors.
Yes, the Mayor has pledged to fight to keep this paperwork private. But we can’t be sure how the courts will act when push comes to shove.
The solution? Gum up the works.

Blast the program with lots and lots of applications from NYC residents so that any authority that does manage to subpoena applications has an immense archive to wade through. Estimates suggest that about 1 million people have applied for ID NYC to date. That leaves about 6.8 million New Yorkers who still can. (Yes, kids can apply, too, as long as they are 14.)
Applying is easy, though it will take you a little time. You start by making an appointment at one of the 25 enrollment centers. There’s a form to fill out (applications are available in more than 25 languages) that you can do ahead of time and print out or fill out when you get there. Bring along your documents. Once you check in, you wait for an agent to go over the application and take your picture and then you can arrange to receive the ID in the mail or pick it up. I got mine at the Mid-Manhattan Library. I made the appointment about a month ahead of time, though there were appointments sooner, and waited less than an hour to see the agent. It was about as much hassle as mailing a package at the post office.

Maybe this isn’t the most effective form of resistance, but it is an easy one that may do some good.
I look forward to seeing you in the streets. And the public library. And MoMA.

To report incidents of discrimination or hate
- The Governor’s Office – 1-888-392-3644
- The Mayor’s Office of Immigrant Affairs 311 or 212-788-7654. Translation is available. You can also go to www1.nyc.gov for many other resources for NYC immigrants.
Additional Resources
- ImmigrationLawHelp.org – Helps low-income immigrants find legal help.
- National Immigration Law Center: Explains your rights, no matter who is president.
- New York Immigrant Coalition and Make the Road: Provide policy updates and resources to support immigrants in NYC
- New York Communities for Change
- Causa Justa/Just Cause
Immigrant protests #JFKTerminal4 and 2pm at Battery Park today
I was excited to join the protest at JFK Airport last night. Here’s some footage:
And here are two nice pictures:


One of the cool things about the protest is how messages were sent and spread through the chants. In particular I learned about another planned protest today at Battery Park at 2pm, which I believe is being organized by immigrant rights group Make the Road.

More information available here.
By the way, in case you’ve heard that a judge put a stay on the Executive Order about immigrants, there are plenty of reasons to question that. It’s also possible that border patrol agents are not obeying those orders.
Bloomberg post: When Algorithms Come for Our Children
Hey all, my second column came out today on Bloomberg:
When Algorithms Come for Our Children
Also, I reviewed a book called Data for the People by Andreas Weigend for Science Magazine. My review has a long name:
Bloomberg View!
Great news! I’m now a Bloomberg View columnist. My first column came out this morning, and it’s called If Fake News Fools You, It Can Fool Robots, Too. Please take a look and tell me what you think!


