
Book Tour Events!

Readers, I’m so happy to announce upcoming public events for my book tour, which starts in 2 weeks! Holy crap!

The details aren’t all entirely final, and there may be more events added later, but here’s what we’ve got so far. I hope I see some of you soon!

Events for Cathy O’Neil

Author of

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

(Crown; September 6, 2016)

Thursday, September 8


Reading/Signing/Talk with Felix Salmon

Barnes & Noble Upper East Side

150 E 86th St.

New York, NY 10028


Tuesday, September 13


In Conversation Event

Town Hall Seattle

1119 8th Ave.

Seattle, WA 98101 

Wednesday, September 14

Democracy/Citizenship Series

Mechanics’ Institute Library

57 Post St.

San Francisco, CA 94104


Wednesday, September 14

In Conversation with Lianna McSwain

Book Passage

51 Tamal Vista Blvd.

Corte Madera, CA 94925


Thursday, September 15


Privacy.Security.Risk. 2016

San Jose Marriott

301 S. Market Street

San Jose, CA 95113

Tuesday, September 20


In Conversation with Jen Golbeck

Busboys and Poets (w/Politics & Prose)

1025 5th Street NW

Washington, D.C. 20001

Monday, October 3



Harvard Book Store

1256 Mass Ave.

Cambridge, MA 02138

Saturday, October 22


Wisconsin Book Festival

Wisconsin Institutes for Discovery

DeLuca Forum

For more information or to schedule an interview contact:
Sarah Breivogel, 212-572-2722, or
Liz Esman, 212-572-6049,

Categories: Uncategorized

Chicago’s “Heat List” predicts arrests, doesn’t protect people or deter crime

A few months ago I publicly pined for a more scientific audit of the Chicago Police Department’s “Heat List” system. The excerpt from that blogpost:

…the Chicago Police Department uses data mining techniques of social media to determine who is in gangs. Then they arrest scores of people on their lists, and finally they tout the accuracy of their list in part because of the percentage of people who were arrested who were also on their list. I’d like to see a slightly more scientific audit of this system.

Thankfully, my request has officially been fulfilled!

Yesterday I discovered, via Marcos Carreiro on Twitter, that a paper has been written entitled Predictions put into practice: a quasi-experimental evaluation of Chicago’s predictive policing pilot, by Jessica Saunders, Priscillia Hunt, and John S. Hollywood, published in the Journal of Experimental Criminology.

The paper’s main result upheld my suspicions:

Individuals on the SSL are not more or less likely to become a victim of a homicide or shooting than the comparison group, and this is further supported by city-level analysis. The treated group is more likely to be arrested for a shooting.

Inside the paper, they make the following important observations. First, crime rates have been going down over time, and the “Heat List” system has not affected that trend. An excerpt:

…the statistically significant reduction in monthly homicides predated the introduction of the SSL, and that the SSL did not cause further reduction in the average number of monthly homicides above and beyond the pre-existing trend.

Here’s an accompanying graphic:

Screen Shot 2016-08-18 at 6.14.39 AM.png

This is a really big and important point, one that smart people like Gillian Tett get thrown off by when discussing predictive policing tools. We cannot automatically attribute success to any policing policy in the context of meta-effects.

Next, being on the list doesn’t protect you:

However, once other demographics, criminal history variables, and social network risk have been controlled for using propensity score weighting and doubly-robust regression modeling, being on the SSL did not significantly reduce the likelihood of being a murder or shooting victim, or being arrested for murder.

But it does make it more likely for you to get surveilled by police:

Seventy-seven percent of the SSL subjects had at least one contact card over the year following the intervention, with a mean of 8.6 contact cards, and 60 % were arrested at some point, with a mean of 1.53 arrests. In fact, almost 90 % had some sort of interaction with the Chicago PD (mean = 10.72 interactions) during the year-long observation window. This increased surveillance does appear to be caused by being placed on the SSL. Individuals on SSL were 50 % more likely to have at least one contact card and 39 % more likely to have any interaction (including arrests, contact cards, victimizations, court appearances, etc.) with the Chicago PD than their matched comparisons in the year following the intervention. There was no statistically significant difference in their probability of being arrested or incapacitated (see Table 4). One possibility for this result, however, is that, given the emphasis by commanders to make contact with this group, these differences are due to increased reporting of contact cards for SSL subjects.

And, most importantly, being on the list means you are more likely to be arrested for a shooting, but the list doesn’t cause that to be true:

In other words, the additional contact with police did not result in an increased likelihood for arrests for shooting, that is, the list was not a catalyst for arresting people for shootings. Rather, individuals on the list were people more likely to be arrested for a shooting regardless of the increased contact.

That also comes with an accompanying graphic:

Screen Shot 2016-08-18 at 6.29.13 AM.png

From now on, I’ll refer to Chicago’s “Heat List” as a way for the police to predict their own future harassment and arrest practices.


What is alpha?

Last week on Slate Money I had a disagreement, or at least a lively discussion, with Felix Salmon and Josh Barro on the definition of alpha.

They said it was anything that a portfolio returned above and beyond the market return, given the amount of risk the portfolio was carrying. That’s not different from how Wikipedia defines alpha, and I’ve seen it said in more or less this way in a lot of places. Thus the confusion.

However, while working as a quant at a hedge fund, I was taught that alpha was the return of a portfolio that was uncorrelated to the market.

It’s a confusing thing to discuss, partly because the concept of “risk” is somewhat self-referential – more on that soon – and partly because we sometimes embed what’s called the capital asset pricing model (CAPM) into our assumptions when we talk about how portfolio returns work.

Let’s start with the following regression, which refers to stock-based portfolios, and which defines alpha:

R_{i, t} - R_f = \alpha + \beta (R_{M, t} - R_f) + \epsilon_t

Now, the term R_f refers to the risk-free rate, or in other words how much interest you get on US treasuries, which we can approximate by 0 because it’s easier to ignore it and because it’s actually pretty close to 0 anyway. That cleans up our formula:

R_{i, t} = \alpha + \beta R_{M, t} + \epsilon_t

In this regression, we are fitting the coefficients \alpha and \beta to many instances of time windows where we’ve measured our portfolio’s return R_{i, t} and the market’s return R_{M, t}. Think of market as the S&P500 index, and think of the time windows as days.

So first, defining alpha with the above regression does what I claimed it would do: it “picks off” the part of the portfolio’s returns that is correlated to the market and puts it in the beta coefficient, and the rest is left to alpha. If beta is 1, alpha is 0, and the error terms are all zero, you are following the market exactly.
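Here’s a minimal sketch of that regression in Python, fitted by ordinary least squares. Everything in it is made up for illustration: the function name `fit_alpha_beta`, the simulated daily market returns, and the two hypothetical portfolios (one that tracks the market exactly, one built with a known alpha of 0.0002 and beta of 0.5 plus noise). It assumes the risk-free rate is 0, as above.

```python
import random
from statistics import fmean


def fit_alpha_beta(portfolio_returns, market_returns):
    """Least-squares fit of R_i = alpha + beta * R_M + eps."""
    mean_p = fmean(portfolio_returns)
    mean_m = fmean(market_returns)
    # Slope = covariance(market, portfolio) / variance(market)
    cov = sum((m - mean_m) * (p - mean_p)
              for m, p in zip(market_returns, portfolio_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    # Intercept = whatever average return the market term leaves unexplained
    alpha = mean_p - beta * mean_m
    return alpha, beta


rng = random.Random(0)

# Simulated (not real) daily market returns, roughly ten years of trading days
market = [rng.gauss(0.0004, 0.01) for _ in range(2500)]

# Hypothetical portfolio 1: follows the market exactly
tracker = list(market)

# Hypothetical portfolio 2: true alpha 0.0002, true beta 0.5, plus noise
mixed = [0.0002 + 0.5 * m + rng.gauss(0.0, 0.005) for m in market]

alpha_t, beta_t = fit_alpha_beta(tracker, market)
alpha_m, beta_m = fit_alpha_beta(mixed, market)

print(f"tracker: alpha={alpha_t:.6f}, beta={beta_t:.3f}")
print(f"mixed:   alpha={alpha_m:.6f}, beta={beta_m:.3f}")
```

The tracker comes out with beta 1 and alpha 0, up to floating-point error. The mixed portfolio’s estimates land near the values it was built with, but only approximately, since the fit is made on a finite noisy sample.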

On the other hand, the above formulation also seems to support Felix’s suggestion that alpha is the return that is not accounted for by risk. The thing is, it’s true, at least according to the CAPM theory of investing, which says you can’t do better than the market, that you’re rewarded for your market risk in a direct way, and that everyone knows this and refuses to take on other, unrewarded risks. In particular, alpha in the above equation should be zero, but anything “extra” that you earn beyond the expected market returns would be represented by alpha in the above regression.

So, are we actually agreeing?

Well, no. The two approaches to defining alpha are very different. In particular, my definition has no reference to CAPM. Say for a moment we don’t believe in CAPM. We can still run the regression above. All we’re doing, when we run that regression, is measuring the extent to which our portfolio’s returns are “explained” by its overlap with the market.

In particular, we do not expect the true risk of our portfolio to be apparent in the above equation. Which brings us to how risk is defined, and it’s weird, because it cannot be directly measured. Instead, we typically infer risk from the volatility – computed as standard deviation – of past returns.

This isn’t a terrible idea, because if something moves around wildly on a daily basis, it would appear to be pretty risky. But it’s also not the greatest idea, as we learned in 2008, because lots of credit instruments like credit default swaps move very little on a daily basis but then suddenly lose tremendous value overnight. So past performance is not always indicative of future performance.

But it’s what we’ve got, so let’s hold on to it for the discussion. The key observation is the following:

The above regression formula only displays the market-correlated risk, and the remaining risk is unmeasured. A given portfolio might have incredibly wild swings in value, but as long as they are uncorrelated to the market, they will be invisible to the above equation, showing up only in the error terms.

Said another way, alpha is not truly risk-adjusted. It’s only market-risk-adjusted.
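To make that concrete, here’s a small simulation sketch. The two portfolios are hypothetical and built to have the same true alpha (0.0003) and the same true beta (0.2); they differ only in how much market-uncorrelated noise they carry. The numbers and the helper names (`fit_alpha_beta`, `residual_volatility`) are assumptions for illustration, not anything from a real fund.

```python
import random
from statistics import fmean, pstdev


def fit_alpha_beta(portfolio, market):
    """Least-squares fit of R_i = alpha + beta * R_M + eps."""
    mean_p, mean_m = fmean(portfolio), fmean(market)
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(market, portfolio))
    var = sum((m - mean_m) ** 2 for m in market)
    beta = cov / var
    return mean_p - beta * mean_m, beta


def residual_volatility(portfolio, market):
    """Standard deviation of the error terms left over after the fit."""
    alpha, beta = fit_alpha_beta(portfolio, market)
    residuals = [p - alpha - beta * m for p, m in zip(portfolio, market)]
    return pstdev(residuals)


rng = random.Random(1)
market = [rng.gauss(0.0, 0.01) for _ in range(20000)]

# Same true alpha and beta; only the uncorrelated noise differs (10x larger)
calm = [0.0003 + 0.2 * m + rng.gauss(0.0, 0.002) for m in market]
wild = [0.0003 + 0.2 * m + rng.gauss(0.0, 0.02) for m in market]

for name, series in (("calm", calm), ("wild", wild)):
    alpha, beta = fit_alpha_beta(series, market)
    print(f"{name}: alpha={alpha:.5f} beta={beta:.3f} "
          f"total vol={pstdev(series):.4f} "
          f"residual vol={residual_volatility(series, market):.4f}")

resid_calm = residual_volatility(calm, market)
resid_wild = residual_volatility(wild, market)
_, beta_calm = fit_alpha_beta(calm, market)
_, beta_wild = fit_alpha_beta(wild, market)
```

The fitted alpha and beta are close for both portfolios (the wild one’s estimates are noisier), yet its swings are about ten times larger. That extra risk only appears if you look at the residuals, which the alpha and beta numbers on their own never report.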

We might have an investment portfolio with a large alpha and a small beta, and someone who only follows CAPM theory would tell us we’re amazing investors. In fact hedge funds try to minimize their relationship to market returns – that’s the “hedge” in hedge funds – and so they’d want exactly that, a large alpha, a tiny beta, and quite a bit of risk. [One caveat: some people stipulate that a lot of that uncorrelated return is fabricated through sleazy accounting.]

It’s not like I am alone here – for a long time people have been aware that there’s lots of risk that’s not represented by market risk – for example, other instrument classes and such. So instead of using a simplistic regression like the one above, people generalize everything in sight and use the Sharpe ratio, which is the ratio of returns (often relative to some benchmark or index) to risks, where risks are measured by more complicated volatility-like computations.

However, that more general concept is also imperfect, mostly because it’s complicated and highly gameable. Portfolio managers are constantly underestimating the risk they take on, partly because – or entirely because – they can then claim to have a high Sharpe ratio.
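Here’s a toy sketch of why a Sharpe ratio built on past volatility can flatter a portfolio. It uses the simplest version of the ratio, mean excess return divided by the plain standard deviation of excess returns, which is an assumption; real desks often use more elaborate risk measures. The return series is invented: a hypothetical strategy that earns small, steady gains for 250 days, and the same strategy with one sudden 30% loss tacked on.

```python
from statistics import fmean, stdev


def sharpe_ratio(returns, benchmark=0.0):
    """Mean excess return over the sample standard deviation of excess returns.

    Not annualized; the benchmark is a constant per-period return.
    """
    excess = [r - benchmark for r in returns]
    return fmean(excess) / stdev(excess)


# Hypothetical strategy: small gains every day, very little day-to-day movement
quiet_year = [0.002 if day % 2 == 0 else 0.0005 for day in range(250)]

# The same strategy, followed by one overnight 30% loss
with_blowup = quiet_year + [-0.30]

sharpe_quiet = sharpe_ratio(quiet_year)
sharpe_blowup = sharpe_ratio(with_blowup)

print(f"before the blowup: {sharpe_quiet:.2f}")
print(f"after the blowup:  {sharpe_blowup:.4f}")
```

Measured before the loss, the strategy looks superb, because nothing in its history hints at the risk. A single bad day wipes out almost the entire ratio, which is the sense in which risk inferred from past volatility can be badly underestimated.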

How much does this matter? People have a colloquial use for the word alpha that’s different from my understanding, which isn’t surprising. The problem lies in the possibility that people are bragging when they shouldn’t, especially when they’re hiding risk, and especially especially if your money is on the line.


The truth about clean swimming pools

There have been a lot of complaints about the Olympic pools turning green and dirty in Rio. People seem worried that the swimmers’ health may be at risk and so on.

Well, here’s what I learned last month when my family rented a summer house with a pool. Pools that look clean are not clean. They would be better described as “so toxic that algae cannot live in them.”

I know what I’m talking about. One weekend my band was visiting the house, and the pool guy had been missing for 2 weeks straight. This is what my pool looked like:


Album cover, obviously.

Then we added an enormous vat of chemicals, specifically liquid chlorine, and about 24 hours later this is what happened:


It wasn’t easy to recreate this. I had to throw the shark’s tail at Jamie like 5 times because it kept floating away. Also, back of the album, obviously.

Now you might notice that it’s not green anymore, but it’s also not clear. To get to clear, blue water, you need to add yet another tub of some other chemical.

Long story short: don’t be deceived by “clean” pool water. There’s nothing clean about it.

Update: I’m not saying “chemicals are bad,” and please don’t compare me to the – ugh – Food Babe! I’m just saying “clean water” isn’t an appropriate description. It’s not as if it’s pure water, and we pour tons of stuff in to get it to look like that. So yes, algae and germs can be harmful! And yes, chlorine in moderate amounts is not bad for you!


Donald Trump is like a biased machine learning algorithm

Bear with me while I explain.

A quick observation: Donald Trump is not like normal people. In particular, he doesn’t have any principles to speak of, that might guide him. No moral compass.

That doesn’t mean he doesn’t have a method. He does, but it’s local rather than global.

Instead of following some hidden but stable agenda, I would suggest Trump’s goal is simply to “not be boring” at Trump rallies. He wants to entertain, and to be the focus of attention at all times. He’s said as much, and it’s consistent with what we know about him. A born salesman.

What that translates to is a constant iterative process whereby he experiments with pushing the conversation this way or that, and he sees how the crowd responds. If they like it, he goes there. If they don’t respond, he never goes there again, because he doesn’t want to be boring. If they respond by getting agitated, that’s a lot better than being bored. That’s how he learns.

A few consequences. First, he’s got biased training data, because the people at his rallies are a particular type of weirdo. That’s one reason he consistently ends up saying things that totally fly within his training set – people at rallies – but rub the rest of the world the wrong way.

Next, because he doesn’t have any actual beliefs, his policy ideas are by construction vague. When he’s forced to say more, he makes them benefit himself, naturally, because he’s also selfish. He’s also entirely willing to switch sides on an issue if the crowd at his rallies seems to enjoy that.

In that sense he’s perfectly objective, as in morally neutral. He just follows the numbers. He could be replaced by a robot that acts on a machine learning algorithm with a bad definition of success – or in his case, a penalty for boringness – and with extremely biased data.

The reason I bring this up: first of all, it’s a great way of understanding how machine learning algorithms can give us stuff we absolutely don’t want, even though they fundamentally lack prior agendas. Happens all the time, in ways similar to the Donald.

Second, some people actually think there will soon be algorithms that control us, operating “through sound decisions of pure rationality” and that we will no longer have use for politicians at all.

And look, I can understand why people are sick of politicians, and would love them to be replaced with rational decision-making robots. But that scenario means one of three things:

  1. Controlling robots simply get trained by the people’s will and do whatever people want at the moment. Maybe that looks like people voting with their phones or via the chips in their heads. This is akin to direct democracy, and the problems are varied – I was in Occupy after all – but in particular it means that people are constantly weighing in on things they don’t actually understand. That leaves them vulnerable to misinformation and propaganda.
  2. Controlling robots ignore people’s will and just follow their inner agendas. Then the question becomes, who sets that agenda? And how does it change as the world and as culture changes? Imagine if we were controlled by someone from 1000 years ago with the social mores from that time. Someone’s gonna be in charge of “fixing” things.
  3. Finally, it’s possible that the controlling robot would act within a political framework to be somewhat but not completely influenced by a democratic process. Something like our current president. But then getting a robot in charge would be a lot like voting for a president. Some people would agree with it, some wouldn’t. Maybe every four years we’d have another vote, and the candidates would be both people and robots, and sometimes a robot would win, sometimes a person. I’m not saying it’s impossible, but it’s not utopian. There’s no such thing as pure rationality in politics; it’s much more about picking sides and appealing to some people’s desires while ignoring others.

Holy crap – an actual book!

Yo, everyone! The final version of my book now exists, and I have exactly one copy! Here’s my editor, Amanda Cook, holding it yesterday when we met for beers:


Here’s my son holding it:


He’s offered to become a meme in support of book sales.

Here’s the back of the book, with blurbs from really exceptional people:


In other exciting book news, there’s a review by Richard Beales from Reuters Breakingviews, and it made a list of new releases in Scientific American as well.


I want to apologize in advance for all the book news I’m going to be blogging, tweeting, and otherwise blabbing about. To be clear, I’ve been told it’s my job for the next few months to be a PR person for my book, so I guess that’s what I’m up to. If you come here for ideas and are turned off by cheerleading, feel free to temporarily hate me, and even unsubscribe from whatever feed I’m in for you!

But please buy my book first, available for pre-order now. And feel free to leave an amazing review.


Who Counts as a Futurist? Whose Future Counts?

This is a guest post by Matilde Marcolli, a mathematician and theoretical physicist, who also works on information theory and computational linguistics. She studied theoretical physics in Italy and mathematics at the University of Chicago. She worked at the Massachusetts Institute of Technology and the Max Planck Institute for Mathematics, and is currently a professor at Caltech. This post is in response to Cathy’s last post.

History of Futurism

For a good part of the past century the term “futurism” conjured up the image of a revolutionary artistic and cultural movement that flourished in Russia and Italy in the first two decades of the century. In more recent times and across the Atlantic, it has acquired a different connotation, one related to speculative thought about the future of advanced technology. In this later form, it is often explicitly associated with the speculations of a group of Silicon Valley tycoons and their acolytes.

Their musings revolve around a number of themes: technological immortality in the form of digital uploading of human consciousness, space colonization, and the threat of an emergent superintelligent AI. It is easy to laugh off all these ideas as the typical preoccupations of a group of aging narcissist wealthy white males, whose greatest fear is that an artificial intelligence may one day treat them the way they have been treating everybody else all along.

However, in fact none of these themes of “futurist speculation” originates in Silicon Valley: all of them have been closely intertwined in history and date back to the original Russian Futurism, and the related Cosmist movement, where mystics like Fedorov alternated with scientists like Tsiolkovsky (the godfather of the Soviet space program) envisioning a future where science and technology would “storm the heavens and vanquish death”.

The crucial difference in these forms of futurism does not lie in the themes of speculation, but rather in the role of humanity in this envisioned future. Is this the future of a wealthy elite? Is this the future of a classless society?

Strains of Modern Futurism

Fast forward to our time again: there are still widely different versions of “futurism” and not all of them are a capitalist protectorate. Indeed, there is a whole widely developed Anarchist Futurism (usually referred to as Anarcho-Transhumanism) which is anti-capitalist but very pro-science and technology. It has its roots in many historical predecessors: the Russian Futurism and Cosmism, naturally, but also the revolutionary brand of the Cybernetic movement (Stafford Beer, etc.), cultural and artistic movements like Afrofuturism and Solarpunk, Cyberfeminism (starting with Donna Haraway’s Cyborg), and more recently Xenofeminism.

What some of the main themes of futurism look like in the anarchist limelight is quite different from their capitalist shadow.

Fighting Prejudice with Technology

“Morphological Freedom” is one of the main themes of anarchist transhumanism: it means the freedom to modify one’s own body with science and technology. In the capitalist version of transhumanism this gets immediately associated with Hollywood-style enhanced botox therapies for those incapable of coming to terms with their natural aging process, whereas in the anarchist version the primary model of morphological freedom is transgender rights, the freedom to modify one’s own sexual and gender identity.

It also involves a fight against ableism, in that there is nothing especially ideal about the (young, muscular, male, white, healthy) human body.

The Vitruvian Man, which was the very symbol of Humanism, was also a symbol of the intrinsically exclusionary nature of Humanism. Posthumanism and Transhumanism are also primarily an inclusionary process that explodes the exclusionary walls of Humanism, without negating its important values (for example Humanism replaced religious thinking by a basis for ethical values grounded in human rights).

An example of Morphological Freedom against ableism is the rethinking of the notion of prosthetics. The traditional approach aimed at constructing artificial limbs that as much as possible resemble the human limbs, implicitly declaring the user of prosthetics in some way “defective”.

However, professional designers have long realized that prosthetic arms that do not imitate a human arm, but that work like an octopus tentacle can be more efficient than most traditional prosthetics. And when children are given the possibility to design and 3D print their own prosthetics, they make colorful arms that launch darts and flying saucers and that make them look like superheroes. Anarchist transhumanism defends the value and importance of neurodiversity.

Protesting with Technology

The mathematical theory of networks and of complex systems and emergent behavior can be used to make protests and social movements more efficient and successful. Sousveillance and anti-surveillance techniques can help protect people from police brutality. Hacker and biohacker spaces help spread scientific literacy and directly involve people in advanced science and technology: the growing community of DIY synthetic biology, with biohacker spaces like CounterCulture Labs, has been one of the most successful grassroots initiatives involving advanced science. These are all important aspects and components of the anarchist transhumanist movement.

Needless to say, the community of people involved in Anarcho-Transhumanism is a lot more diverse than the typical community of Silicon Valley futurists. Anarchism itself comes in many different forms, anarcho-communism, anarcho-syndicalism, mutualism, etc. (no, not anarcho-capitalism, that is an oxymoron not a political movement!) but at heart it is an ethical philosophy aimed at increasing people’s agency (and more generally the agency of any sentient being), based on empathy, cooperation, mutual aid.


Science and technology have enormous potential, if used inclusively and for the benefit of all and not with goals of profit and exploitation.

For people interested in finding out more about Anarcho-Transhumanism there is an Anarcho-Transhumanist Manifesto currently being written (which is still very much in the making): the parts that are written at this point can be accessed here.

There is also a dedicated Facebook page, which posts on a range of topics including anarchist theory, philosophy, transhumanism and posthumanism and their historical roots, and various thoughts on science and technology and their transformative role.

The opinions expressed by the author are solely her own: her past and current affiliations are listed for identification purposes only.
