Yesterday one of my long-standing fears was confirmed: futurists are considered moral authorities.
The Intercept published an article entitled Microsoft Pitches Technology That Can Read Facial Expressions at Political Rallies, written by Alex Emmons, which described a new Microsoft product meant to be used at large events like the Super Bowl, or a Trump rally, to discern “anger, contempt, fear, disgust, happiness, neutral, sadness or surprise” in the crowd.
Spokesperson Kathryn Stack, when asked whether the tool could be used to identify dissidents or protesters, responded as follows:
“I think that would be a question for a futurist, not a technologist.”
Can we parse that a bit?
First and foremost, it is meant to convey that the technologists themselves are not responsible for the use of their technologies, even when they’ve intentionally designed those technologies for sale to political campaigns.
So yeah, I created this efficient plug-and-play tool of social control, but that doesn’t mean I expect people to use it!
Second, beyond the deflecting of responsibility, the goal of that answer is to point to the person who really is in charge, which is for some reason “a futurist.” What?
Now, my experience with futurists is rather limited – although last year I declared myself to be one – but even so, I’d like to point out that futurism is male dominated, almost entirely white, and almost entirely consists of Silicon Valley nerds. They spend their time arguing about the exact timing and nature of the singularity, and about whether we’ll live forever in bliss or forever under the control of rampant, hostile AI.
In particular, there’s no reason to imagine that they are well-versed in the history or in the rights of protesters or of political struggle.
In Star Wars terms, the futurists are the Empire, and Black Lives Matter are the scrappy Rebel Alliance. It’s pretty clear, to me at least, that we wouldn’t go to Emperor Palpatine for advice on ethics.
People, can we face some hard truths about how Americans save for retirement?
It Isn’t Happening
Here’s a fact: most people aren’t seriously saving for retirement. Ever since we chucked widespread employer-based pension systems for 401(k)s and personal responsibility, people just haven’t done very well saving. They take money out to pay for their kids’ college, or an unforeseen medical expense, or they never put money in in the first place. Very few people are saving adequately.
In Fact, It Shouldn’t Happen
Next: it’s actually, mathematically speaking, extremely dumb to have 401(k)s instead of a larger pool of retirement money like pensions or Social Security.
Why do I say that? Simple. Imagine everyone was doing a great job saving for retirement. This would mean that everyone “had enough” for the best-case scenario, which is to say living to 105 and dying an expensive, long-winded death. That’s a shit ton of money they’d need to be saving.
But most people, statistically speaking, won’t live until 105, and their end-of-life care costs might not always be extremely high. So for everyone to prepare for the worst is total overkill. Extremely inefficient to the point of hoarding, in fact.
Pooled Retirement Systems Are Key
Instead, we should think about how much more efficient it is to pool retirement savings. Then lots of people die young and are relatively “cheap” for the pool, and some people live really long but since it’s all pooled, things even out. It’s a better and more efficient system.
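Here’s a toy simulation of the pooling argument above. Every number in it is made up for illustration – the point is just the arithmetic: individuals must each save for the worst case, while a pool only needs to cover the average.

```python
import random

# Toy numbers, all made up for illustration: everyone retires at 65,
# each retirement year costs $50,000, and the worst case is living to 105.
random.seed(0)
ANNUAL_COST = 50_000
WORST_CASE_YEARS = 40  # retiring at 65, living to 105

def years_in_retirement():
    # Crude lifespan model: most people get 10-25 years, a few get the max.
    return min(WORST_CASE_YEARS, max(0, round(random.gauss(18, 8))))

population = [years_in_retirement() for _ in range(100_000)]

# Individual accounts: everyone must cover the worst case on their own.
individual_per_person = WORST_CASE_YEARS * ANNUAL_COST

# Pooled system: the pool only needs to cover the *average* need.
pooled_per_person = ANNUAL_COST * sum(population) / len(population)

print(f"individual savings needed: ${individual_per_person:,.0f}")
print(f"pooled per-person savings: ${pooled_per_person:,.0f}")
```

With these made-up numbers the pooled requirement comes out to well under half the individual one, which is the whole point: everyone preparing for the worst case is hoarding.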
Most pension plans work like this, but they’ve fallen out of favor politically. And although some people complain that it’s hard to reasonably fund pension funds, most of that is actually poorly understood. Even so, I don’t see employer-based pension plans making a comeback.
Social Security is actually the best system we have, and given how few people have planned and saved for retirement, we should invest heavily in it, since at its current levels it’s not sufficient to keep elderly people above the poverty line. And, contrary to popular opinion, Social Security isn’t going broke; it could easily be made whole and then some, and it is the right thing to do – both morally and mathematically – for our nation.
Stuff’s going on, peoples, and some of it’s actually really great. I am so happy to tell you about it now that I’m back from vacation.
- The Tampon Tax is gone from New York State. This is actually old news but I somehow forgot to blog it. As my friend Josh says, we have to remember to celebrate our victories!!
- Next stop, Menstrual Equality! Jennifer Weiss-Wolf is a force of nature and she won’t stop until everyone has a free tampon in their… near vicinity.
- There’s a new “bail” algorithm in San Francisco, built by the Arnold Foundation. The good news is, they aren’t using educational background and other race and class proxies in the algorithm. The bad news is, they’re marketing it just like all the other problematic WMD algorithms out there. According to Arnold Foundation vice president of criminal justice Matt Alsdorf, “The idea is to provide judges with objective, data-driven, consistent information that can inform the decisions they make.” I believe the consistent part, but I’d like to see some data about the claim of objectivity. At the very least, Arnold Foundation, can you promise a transparent auditing process of your bail algorithms?
- In very very related news, Julia Angwin calls for algorithmic accountability.
- There’s a new method to de-bias sexist word corpora using vector algebra and Mechanical Turks. Cool! I might try to understand the math here and tell you more about it at a later date.
- Speaking of Mechanical Turk, are we paying the workers enough? The answer is no. Let’s require a reasonable hourly minimum wage for academic work. NSF?
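For the word-vector de-biasing item above: the core vector algebra is simple enough to sketch. Here’s a toy numpy version – the 4-dimensional vectors are made up, and the real method does more (it also equalizes word pairs and uses crowdsourced judgments of which words should be gender-neutral) – but the “project out the gender direction” step looks roughly like this:

```python
import numpy as np

# Toy 4-dimensional "embeddings" (made up for illustration; real word
# vectors have hundreds of dimensions and come from a trained model).
vecs = {
    "he":         np.array([ 1.0, 0.1, 0.3, 0.0]),
    "she":        np.array([-1.0, 0.1, 0.3, 0.0]),
    "programmer": np.array([ 0.4, 0.9, 0.2, 0.1]),
}

# 1. Estimate the "gender direction" from a definitional pair.
g = vecs["he"] - vecs["she"]
g = g / np.linalg.norm(g)

# 2. Neutralize: remove the component of a (supposedly) gender-neutral
#    word along that direction.
def neutralize(v, direction):
    return v - np.dot(v, direction) * direction

debiased = neutralize(vecs["programmer"], g)

# After neutralizing, "programmer" has no component along the gender
# direction at all.
print(np.dot(debiased, g))  # ~0.0
```

The rest of the words in the vocabulary keep whatever gender content they had; only the words judged (by the Turkers) to be gender-neutral get neutralized.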
The Computer Fraud and Abuse Act is badly in need of reform. It currently criminalizes violations of terms of services for websites, even when those terms of service are written in a narrow way and the violation is being done for the public good.
Specifically, the CFAA keeps researchers from understanding how algorithms work. As an example, Julia Angwin’s recent work on recidivism modeling, which I blogged about here, was likely a violation of the CFAA.
A more general case has been made for CFAA reform in this 2014 paper, Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms, written by Christian Sandvig, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort.
They make the case that discrimination audits – wherein you send a bunch of black people and then white people to, say, try to rent an apartment from Donald Trump’s real estate company in 1972 – have clearly violated standard ethical guidelines (by wasting people’s time and not letting them in on the fact that they’re involved in a study), but since they represent a clear public good, such guidelines should have been set aside.
Similarly, we are technically treating employers unethically when we have fake (but similar) resumes from whites and blacks sent to them to see who gets an interview, but the point we’ve set out to prove is important enough to warrant such behavior.
Their argument for CFAA reform is a direct expansion of the aforementioned examples:
Indeed, the movement of unjust face-to-face discrimination into computer algorithms appears to have the net effect of protecting the wicked. As we have pointed out, algorithmic discrimination may be much more opaque and hard to detect than earlier forms of discrimination, while at the same time one important mode of monitoring—the audit study—has been circumscribed. Employing the “traditional” design of an audit study but doing so via computer would now waste far fewer resources in order to find discrimination. In fact, it is difficult to imagine that a major internet platform would even register a large amount of auditing by researchers. Although the impact of this auditing might now be undetectable, the CFAA treats computer processor time and a provider’s “authorization” as far more precious than the minutes researchers have stolen from honest landlords and employers over the last few decades. This appears to be fundamentally misguided.
As a consequence, we advocate for a reconceptualization of accountability on Internet platforms. Rather than regulating for transparency or misbehavior, we find this situation argues for “regulation toward auditability.” In our terms, this means both minor, practical suggestions as well as larger shifts in thinking. For example, it implies the reform of the CFAA to allow for auditing exceptions that are in the public interest. It implies revised scholarly association guidelines that subject corporate rules like the terms of service to the same cost-benefit analysis that the Belmont Report requires for the conduct of ethical research—this would acknowledge that there may be many instances where ethical researchers should disobey a platform provider’s stated wishes.
When it comes to alternative credit scoring systems, look for the phrase “we give consumers more access to credit!”
That’s code for a longer phrase: “we’re doing anything at all we want, with personal information, possibly discriminatory and destructive, but there are a few people who will benefit from this new system versus the old, so we’re ignoring costs and only counting the benefits for those people, in an attempt to distract any critics.”
Unfortunately, the propaganda works a lot of the time, especially because tech reporters aren’t sufficiently skeptical (and haven’t read my upcoming book).
The alt credit scoring field has recently been joined by another player, and it’s the stuff of my nightmares. Specifically, ZestFinance is joining forces with Baidu in China to assign credit scores to Chinese citizens based on the history of their browsing results, as reported in the LA Times.
- ZestFinance is the American company, led by ex-Googler Douglas Merrill, who likes to say “all data is credit data” and claims he cannot figure out why people who spell, capitalize, and punctuate correctly are somehow better credit risks. Between you and me, I think he’s lying. I think he just doesn’t like to say he happily discriminates against poor people who have gone to bad schools.
- Baidu is the Google of China. So they have a shit ton of browsing history on people. Things like, “symptoms for Hepatitis” or “how do I get a job.” In other words, the company collects information on a person’s most vulnerable hopes and fears.
Now put these two together, which they already did thankyouverymuch, and you’ve got a toxic cocktail of personal information, on the one hand, and absolutely no hesitation in using information against people, on the other.
In the U.S. we have some pretty good anti-discrimination laws governing credit scores – albeit incomplete, especially in the age of big data. In China, as far as I know, there are no such rules. Anything goes.
So, for example, someone who recently searched for how to treat an illness might not get that loan, even if they were simply trying to help a friend or family member. Moreover, they will never know why they didn’t get the loan, nor will they be able to appeal the decision. Just as an example.
Am I being too suspicious? Maybe: at the end of the article announcing this new collaboration, after all, Douglas Merrill from ZestFinance is quoted touting the benefits:
“Today, three out of four Chinese citizens can’t get fair and transparent credit,” he said. “For a small amount of very carefully handled loss of privacy, to get more easily available credit, I think that’s going to be an easy choice.”
I’ve started a company called ORCAA, which stands for O’Neil Risk Consulting and Algorithmic Auditing and is pronounced “orcaaaaaa”. ORCAA will audit algorithms and conduct risk assessments for algorithms, first as a consulting entity and eventually, if all goes well, as a more formal auditing firm, with open methodologies and toolkits.
No worries! I’m busy learning everything I can about the field, small though it is. Today, for example, my friend Suresh Naidu suggested I read this fascinating study, referred to by those in the know as “Oaxaca’s decomposition,” which separates differences of health outcomes for two groups – referred to as “the poor” and the “nonpoor” in the paper – into two parts: first, the effect of “worse attributes” for the poor, and second, the effect of “worse coefficients.” There’s also a worked-out example of children’s health in Viet Nam which is interesting.
The specific formulas they use depend crucially on the fact that the underlying model is a linear regression, but the idea doesn’t: in practice, we care about both issues. For example, with credit scores, it’s obvious we’d care about the coefficients – the coefficients are the ingredients in the recipe that takes the input and gives the output, so if they fundamentally discriminate against blacks, for example, that would be bad (but “discriminate” has to be carefully defined!). At the same time, though, we also care about which inputs we choose in the first place, which is why there are laws about not being able to use race or gender in credit scoring.
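To make the two-part split concrete, here’s a toy Oaxaca-style decomposition in Python. All the data is synthetic (I made the numbers up), the model is a one-variable linear regression, and I use the poor group’s coefficients as the reference point, which is one of several conventional choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a health outcome depends linearly on one attribute,
# say access to clean water. All numbers are made up for illustration.
n = 10_000
# "nonpoor" group: better attribute levels AND a bigger coefficient
x_np = rng.normal(1.0, 1.0, n)
y_np = 2.0 + 1.5 * x_np + rng.normal(0, 1, n)
# "poor" group: worse attribute levels and a smaller coefficient
x_p = rng.normal(0.0, 1.0, n)
y_p = 2.0 + 1.0 * x_p + rng.normal(0, 1, n)

def fit(x, y):
    # OLS with intercept; polyfit returns (slope, intercept)
    b, a = np.polyfit(x, y, 1)
    return a, b

a_np, b_np = fit(x_np, y_np)
a_p, b_p = fit(x_p, y_p)

gap = y_np.mean() - y_p.mean()
# Part 1, "worse attributes": the gap you'd see if the poor group's
# coefficients applied to both groups' attribute levels.
explained = b_p * (x_np.mean() - x_p.mean())
# Part 2, "worse coefficients": the gap due to different intercepts
# and slopes, evaluated at the nonpoor group's attribute levels.
unexplained = (a_np - a_p) + (b_np - b_p) * x_np.mean()

print(gap, explained + unexplained)  # the two parts sum to the gap
```

The identity holds exactly because an OLS line with an intercept passes through the means of its data, so the mean gap splits cleanly into the two pieces.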
And, importantly, this analysis won’t necessarily tell us what to do about the differences we pick up. Indeed many of the tests I’ve been learning about and studying have that same limitation: we can detect problems but we don’t learn how to address them.
If you have any suggestions for me on methods for either auditing algorithms or for how to modify problematic algorithms, I’d be very grateful if you’d share them with me.
Also, if there are any artists out there, I’m on the market for a logo.
This is a guest post by Brian D’Alessandro, who daylights as the Head of Data Science at Zocdoc and as an Adjunct Professor with NYU’s Center for Data Science. When not thinking probabilistically, he’s drumming with the indie surf rock quartet Coastgaard.
I’d like to address the recent study by Roland Fryer Jr from Harvard University, and associated NY Times coverage, that claims to show zero racial bias in police shootings. While this paper certainly makes an honest attempt to study this very important and timely problem, it ultimately suffers from issues of data sampling and subjective data preparation. Given the media attention it is receiving, and the potential policy and public perceptual implications of this attention, we as a community of data people need to comb through this work and make sure the headlines are consistent with the underlying statistics.
First things first: is there really zero bias in police shootings? The evidence for this claim is, notably, derived from data drawn from a single police department. This is a statistical red flag and might well represent selection bias. Put simply, a police department with a culture that successfully avoids systematic racial discrimination may be more willing to share its data than one that doesn’t. That’s not proof of cherry-picking, but as a rule we should demand that any journalist or author citing this work preface any statistic with “In Houston, using self-reported data,…”.
For that matter, if the underlying analytic techniques hold up under scrutiny, we should ask other cities to run the same tests on their data and see what the results are more widely. If we’re right, and Houston is rather special, we should investigate what they’re doing right.
On to the next question: do those analytic techniques hold up? The short answer is: probably not.
How The Sampling Was Done
As discussed here by economist Rajiv Sethi and here by Justin Feldman, the means by which the data instances were sampled to measure racial bias in Houston police shootings is in itself potentially very biased.
Essentially, Fryer and his team sampled “all shootings” as their set of positively labeled instances, and then randomly sampled “arrests in which use of force may have been justified” (attempted murder of an officer, resisting/impeding arrest, etc.) as the negative instances. The analysis then measured racial bias using the union of these two sets.
Here is a simple Venn diagram representing the sampling scheme:
In other words, the positive population (those who were shot) is not drawn from the same distribution as the negative population (those arrests where use of force was justified). The article implies that there is no racial bias conditional on there being an arrest where use of force was justified. However, the fact that they used shootings that were outside of this set of arrests means that this is not what they actually tested.
Instead, they only show that there was no racial bias in the set that was sampled. That’s different. And, it turns out, a biased sampling mechanism can in fact undo the bias that exists in the original data population (see below for a light mathematical explanation). This is why we take great pains in social science research to carefully design our sampling schemes. In this case, if the sampling is correlated with race (which it very likely is), all bets are off on analyzing the real racial biases in police shootings.
What Is Actually Happening
Let’s accept for now the two main claims of the paper: 1) black and hispanic people are more likely to endure some force from police, but 2) this bias doesn’t exist in an escalated situation.
Well, how could one make any claim without chaining these two events together? The idea of an escalation, or an arrest reason where force is justified, is unfortunately an often subjective concept reported after the fact. Could it be possible that an officer is more likely to find his/her life in danger when a black, as opposed to a white, suspect reaches for his wallet? Further, while unquestioned compliance is certainly the best life-preserving policy when dealing with an officer, I can imagine that an individual being roughed up by a cop is liable to push back with an adrenalized, self-preserving, and instinctual use of force. I’ll say that this is equally likely for black and white persons, but if the black person is more likely to be in that situation in the first place, the black person is more likely to get shot, measured from a pre-stop position.
To sum up, the issue at hand is not whether cops are more likely to shoot at black suspects who are pointing guns straight back at the cop (which is effectively what is being reported about the study). The more important questions, which are not addressed, are: why are black men more likely to be pushed up against the wall by a cop in the first place, and does race matter when a cop decides his/her life is in danger and believes lethal force is necessary?
What Should Have Happened
While I empathize with the data prep challenges Fryer and team faced (the Times article mentions they put in a collective 3,000 person-hours), the language of the article and its ensuing coverage unfortunately does not fit the data distribution induced by the sampling method.
I don’t want to suggest in any way that the data may have been manipulated to engineer a certain result; more likely, the analysis team mistakenly committed a fundamental sampling error. The paper does indeed caveat the challenge here, but given that admission, I wonder why the authors were so quick to release an un-peer-reviewed working version and push it out via the NY Times.
Peer review would likely have pointed out these issues and at least pushed the authors to temper their conclusions. For instance, the paper uses multiple sources to show that non-lethal violence is much more likely if you are black or hispanic, controlling for other factors. I see the causal chain being unreasonably bisected here, and this is a pretty significant conceptual error.
Overall, Fryer is fairly honest in the paper about the given data limitations. I’d love for him to take his responsibility to the next level and make his data, both in raw and encoded forms, public. Given the dependency on both subjective, manual encodings of police reports and a single, biased choice of sampling method, more sensitivity analysis should be done here. Also, anyone reporting on this (Fryer himself), should make a better effort to connect the causal chain here.
Headlines are sticky, and first impressions are hard to undo. This study needs more scrutiny at all levels, with special attention to the data preparation that has been done. We need a better impression than the one already made.
The coverage of the results comes down to the following:
P(Shooting | Black, Escalation) = P(Shooting | White, Escalation)
(here I am using ‘Escalation’ as the set of arrests where use of force is considered justified. And for notational simplicity I have omitted the control variables from the conditional above).
However, the analysis actually shows that:
P(Shooting | Black, Sampled) = P(Shooting | White, Sampled),
where (Sampled = True) if the person was either shot, or the situation escalated and the person was not shot. This makes a huge difference, because with the right bias in the sampling, we could have a situation in which there is in fact bias in police shootings but not in the sampled data. We can show this with a little application of Bayes’ rule:
P(Shot|B, Samp) / P(Shot|W, Samp) = [P(Shot|B) / P(Shot|W)] * [P(Samp|W) / P(Samp|B)]
The above should be read as: the bias in the study depends on both the racial bias in the population (P(Shot|B) / P(Shot|W)) and the bias in the sampling. Any bias in the population can therefore effectively be undone by a sampling scheme that is also racially biased. Unfortunately, the data summarized in the study doesn’t allow us to back out the four terms on the right-hand side of the above equality.
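To see the masking effect with concrete numbers (all made up), here’s a quick sanity check of that identity in Python. The population has a genuine 2-to-1 disparity in shooting rates, every shooting is sampled with probability 1 as in the paper, and the non-shooting sampling rates are chosen by race so that the disparity vanishes in the sample:

```python
# Toy numbers (made up) showing how biased sampling can mask a real
# disparity. In the population, blacks are shot at twice the rate of whites.
p_shot = {"B": 0.20, "W": 0.10}

# Sampling scheme, mirroring the paper's: every shooting is sampled
# (probability 1), non-shootings are sampled at race-dependent rates.
q_not_shot = {"B": 0.45, "W": 0.20}

def p_shot_given_sampled(race):
    # Bayes: P(Shot|Samp) = P(Samp|Shot)P(Shot) / P(Samp), with P(Samp|Shot)=1
    p_samp = 1.0 * p_shot[race] + q_not_shot[race] * (1 - p_shot[race])
    return p_shot[race] / p_samp

print(p_shot["B"] / p_shot["W"])                              # 2.0: real bias
print(p_shot_given_sampled("B") / p_shot_given_sampled("W"))  # 1.0: bias gone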