Three Ideas for defusing Weapons of Math Destruction

June 3, 2016 Cathy O'Neil, mathbabe

This is a guest post by Kareem Carr, a data scientist living in Cambridge. He’s a fellow at Harvard’s Institute for Quantitative Social Science and an associate computational biologist at the Broad Institute of MIT and Harvard. He has previously held positions at Harvard’s Molecular and Cellular Biology Department and the National Bureau of Economic Research.

When your algorithms can potentially affect the way physicians treat their dying patients, it really brings home how critical it is to do data science right. I work at a biomedical research institute where I study cancer data and ways of using it to help make better decisions. It’s a tremendously rewarding experience. I get the chance to apply data science on a massive scale and in a socially relevant way. I am passionate about the ways in which we can use automated decision processes for social good, and I spend the vast majority of my time thinking about data science.

A year ago, I started working on a framework for assessing the performance of an algorithm used heavily in cancer research. The first part of the project involved gathering all the data we could get our hands on. The datasets had been created by different processes, and each had its own advantages and disadvantages. The most valued, but also the most labor-intensive to create, were datasets that had been manually curated by multiple people. More plentiful were datasets that had not been manually curated but had been assessed by so many different algorithms that they were considered extremely well-characterized. Finally, there were artificial datasets created by simulation, for which the truth was known but which lacked the complexity and depth of real data. Each type of dataset required careful consideration of the kind of evidence it provided that the algorithm was performing properly. I came to really understand that validating an algorithm and characterizing its typical errors are an essential part of data science. The project taught me a few lessons that I think might be generally applicable.
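
To make validation against simulated data concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `simulate` plants a known set of true signals among noisy scores, and `threshold_detector` is a stand-in for whatever algorithm is under test, not the algorithm from this project. Because the truth is known by construction, precision and recall can be computed exactly, and repeating the run at several noise levels shows how performance degrades as the data get harder.

```python
import random

def simulate(n_sites, n_true, noise_sd, effect, seed=0):
    """Simulated scores where the set of true signals is known by construction."""
    rng = random.Random(seed)
    truth = set(rng.sample(range(n_sites), n_true))
    scores = [rng.gauss(effect if i in truth else 0.0, noise_sd) for i in range(n_sites)]
    return scores, truth

def threshold_detector(scores, cutoff):
    """Hypothetical stand-in for the algorithm under test: call every site above a cutoff."""
    return {i for i, s in enumerate(scores) if s > cutoff}

def precision_recall(called, truth):
    """Compare the algorithm's calls against the known truth."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

results = {}
for noise_sd in (0.5, 1.0, 2.0):
    scores, truth = simulate(n_sites=10_000, n_true=200, noise_sd=noise_sd, effect=4.0)
    called = threshold_detector(scores, cutoff=2.5)
    results[noise_sd] = precision_recall(called, truth)
    p, r = results[noise_sd]
    print(f"noise_sd={noise_sd}: precision={p:.3f}, recall={r:.3f}")
```

At the lowest noise level both numbers should be close to 1; as noise grows, the detector both misses true signals and picks up false ones. Simulated data can show that kind of trend cleanly, but, as noted above, it cannot stand in for the complexity of real data.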

Use open datasets

In most cases, it is preferable that algorithms be open-source and available for all to examine. If algorithms must be closed-source and proprietary, then open, curated datasets are essential for comparing them. These may include real data that has been cleared for general use, anonymized data, or high-quality artificial data. Open datasets allow us to analyze algorithms even when they are too complex to understand or when the source code is hidden. We can observe where and when they make errors and discern patterns in those errors. We can determine in which circumstances other algorithms perform better. This insight can be extremely powerful when it comes to applying algorithms in the real world.
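
As a rough illustration of comparing closed algorithms on an open dataset, the sketch below treats each algorithm purely as a black-box function and breaks its error rate down by category. The benchmark, the category labels, and the two models (`model_a`, `model_b`) are toy stand-ins invented for this example; with a real open dataset, the same loop would show where each algorithm fails and which one to prefer in which circumstance.

```python
from collections import defaultdict

# Toy open benchmark, invented for illustration: (features, category, true_label).
BENCHMARK = [
    ({"x": 0.9}, "A", 1),
    ({"x": 0.2}, "A", 0),
    ({"x": 0.6}, "A", 1),
    ({"x": 0.1}, "A", 0),
    ({"x": 0.4}, "A", 0),
    ({"x": 0.4}, "B", 1),
    ({"x": 0.35}, "B", 1),
    ({"x": 0.2}, "B", 0),
    ({"x": 0.45}, "B", 1),
]

# Hypothetical closed-source models: we can call them but not inspect them.
def model_a(features):
    return int(features["x"] > 0.5)

def model_b(features):
    return int(features["x"] > 0.3)

def error_rates_by_category(predict, benchmark):
    """Fraction of benchmark items the black box gets wrong, per category."""
    errors, totals = defaultdict(int), defaultdict(int)
    for features, category, label in benchmark:
        totals[category] += 1
        if predict(features) != label:
            errors[category] += 1
    return {c: errors[c] / totals[c] for c in totals}

for name, model in [("model_a", model_a), ("model_b", model_b)]:
    print(name, error_rates_by_category(model, BENCHMARK))
```

On this toy data, `model_a` is perfect on category A but wrong on 3 of the 4 items in B, while `model_b` is perfect on B but wrong on 1 of the 5 items in A. Neither is uniformly better, and that is exactly the kind of pattern an overall accuracy number would hide.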

Take manual curation seriously

Domain-specific experts, such as doctors in medicine or coaches in sports, are generally a very powerful source of information. Panels of experts are even better. While humans are by no means perfect, when experts carefully consider an algorithmic result and conclude that the algorithm has failed, that message deserves to be taken seriously. We should investigate whether and why the algorithm failed. Even if the problem is never fixed, it is important to understand the types of errors the algorithm makes and to measure its failure rate in various circumstances.
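
One simple way to put this into practice is sketched below: treat the majority verdict of an expert panel as the reference, flag every case where the algorithm disagrees, and report the failure rate separately for each circumstance. The case records, field names (`setting`, `algorithm`, `experts`), and labels are hypothetical. Cases where the panel is split are set aside rather than forced into a verdict, since a split panel is itself worth a closer look.

```python
from collections import Counter, defaultdict

def panel_consensus(votes):
    """Majority label from an expert panel, or None when the top labels tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def failure_report(cases):
    """Per setting: (failure rate against expert consensus, ids of failed cases)."""
    totals, failures = defaultdict(int), defaultdict(list)
    for case in cases:
        consensus = panel_consensus(case["experts"])
        if consensus is None:
            continue  # split panel: set aside for separate review
        totals[case["setting"]] += 1
        if case["algorithm"] != consensus:
            failures[case["setting"]].append(case["id"])
    return {s: (len(failures[s]) / totals[s], failures[s]) for s in totals}

# Hypothetical cases: the algorithm's output and each expert's independent verdict.
cases = [
    {"id": 1, "setting": "routine", "algorithm": "negative",
     "experts": ["negative", "negative", "positive"]},
    {"id": 2, "setting": "routine", "algorithm": "positive",
     "experts": ["positive", "positive", "positive"]},
    {"id": 3, "setting": "rare subtype", "algorithm": "negative",
     "experts": ["positive", "positive", "negative"]},
    {"id": 4, "setting": "rare subtype", "algorithm": "negative",
     "experts": ["negative", "negative", "negative"]},
    {"id": 5, "setting": "rare subtype", "algorithm": "positive",
     "experts": ["positive", "negative"]},
]

print(failure_report(cases))
```

Keeping the ids of failed cases, not just the rate, is deliberate: the point is to go back and investigate why the algorithm failed on those specific cases.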

Demand causal models

While it has become very easy to build systems that generate high-performing black-box algorithms, we must push for explainable results wherever possible. Furthermore, we should demand truly causal models rather than merely predictive ones. Predictive models perform well as long as the system they describe is not modified from the outside. Causal models continue to be accurate despite exogenous shocks and policy interventions. Frequently, we create the former, yet try to deploy them as if they are the latter with disastrous consequences.
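
The gap between predictive and causal models shows up in a small simulation. The sketch below assumes a toy structural model, invented for illustration, in which a confounder Z drives both X and Y, and X has a true effect of 1.0 on Y. A regression of Y on X alone fits the observational data but overstates the effect (its slope comes out near 2.5), while adjusting for Z recovers the true effect. When a hypothetical policy then sets X = 2 for everyone, only the adjusted model predicts the resulting outcome. Adjustment works here only because Z is assumed to be the sole confounder and to be measured; real systems rarely grant that.

```python
import random

def simulate(n, rng, set_x=None):
    """Toy structural model (an assumption for illustration):
    Z -> X, Z -> Y, and X -> Y with a true causal effect of 1.0.
    Passing set_x simulates a policy intervention that fixes X regardless of Z."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = set_x if set_x is not None else z + rng.gauss(0.0, 1.0)
        y = 1.0 * x + 3.0 * z + rng.gauss(0.0, 1.0)
        rows.append((x, z, y))
    return rows

def cov_sum(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def fit(rows):
    """Return mean X, mean Y, the slope of Y ~ X (predictive only),
    and the X coefficient of Y ~ X + Z (adjusted for the confounder)."""
    xs, zs, ys = zip(*rows)
    sxx, szz, sxz = cov_sum(xs, xs), cov_sum(zs, zs), cov_sum(xs, zs)
    sxy, szy = cov_sum(xs, ys), cov_sum(zs, ys)
    predictive_slope = sxy / sxx
    adjusted_slope = (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)
    return sum(xs) / len(xs), sum(ys) / len(ys), predictive_slope, adjusted_slope

rng = random.Random(42)
mean_x, mean_y, b_pred, b_causal = fit(simulate(50_000, rng))

# Policy: set X = 2 for everyone, breaking the link between Z and X.
x_new = 2.0
predicted_by_predictive = mean_y + b_pred * (x_new - mean_x)
predicted_by_causal = mean_y + b_causal * (x_new - mean_x)
intervened = simulate(50_000, rng, set_x=x_new)
actual = sum(y for _, _, y in intervened) / len(intervened)

print(f"predictive slope {b_pred:.2f}, adjusted slope {b_causal:.2f}")
print(f"after the policy: predictive model says {predicted_by_predictive:.2f}, "
      f"adjusted model says {predicted_by_causal:.2f}, actual mean {actual:.2f}")
```

The predictive model is not wrong about the data it was trained on; it is wrong about a world in which someone has intervened on X.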

All three principles share one underlying idea. Bad data science obscures and ignores the real-world performance of its algorithms. It relies on little to no validation, and when it does validate, it relies on canned approaches. It doesn’t critically examine instances of bad performance with an eye towards understanding how and why these failures occur. It doesn’t make the nature of these failures widely known so that consumers of these algorithms can deploy them with discernment and sophistication.

Good data science does the opposite. It creates algorithms which are deeply and widely understood. It allows us to understand when algorithms fail and how to adapt to those failures. It allows us to intelligently interpret the results we receive. It leads to better decision making.

Let’s stop the proliferation of weapons of math destruction with better data science!

Comments (12)

kew100

June 3, 2016 at 8:05 am

This post has sparked a considerable breakfast conversation in this household. My spouse is involved in studying and designing curriculum and experiences for students with “big data,” for lack of a better term. Underlying that, however, is a keen interest in statistical reasoning. We have both been involved in studying “statistics in the wild,” doing close studies of statistical consulting sessions between master statisticians and health/medical researchers.

All that said, and really beside the point, I would like to have this statement explained a bit further…

“datasets that had not been manually curated, but had been assessed by so many different algorithms that they were considered extremely well-characterized”

1. How is well-characterized defined, and its degrees assessed?
2. Are manually created datasets by definition well-characterized?
3. When “truth is known” is that related to well-characterized data, or does it relate to variance/co-variance, or some other thing?

Thank you for this post and the blog. I look forward to O’Neil’s upcoming book.

- Guest2
  
  June 3, 2016 at 9:50 am
  
  I also noticed this phrase, and immediately thought of death registers.
  
  “Death” is, or can be, well-characterized (although I did talk to a woman yesterday whose husband did not receive his SSDI check this month because SSA listed him as dead! He walked into the SSA office to show that he was not dead, but I do not know if they believed him or not).
  
  What this gets into is the social construction of knowledge, that is, the epistemological questions about data AS A CONSTRUCT. All hermeneutics involve the construction and subsequent interpretation of knowledge, and communities and organizations compete with one another over what is “true.” (If there are Weapons of Math Destruction, then there must be a WAR going on. I am talking about the war.)
  
  As the list of Big Datasets grows beyond comprehension, each group in control of its datasets effectively controls what can be said about that data. But the important question, as demonstrated here, is whether these “communications” (for lack of a better term) are viewed as legitimate or not. All of the listed elements here can be said to “index” greater or lesser legitimacy.
  
  The sociology of knowledge is the ongoing exploration of how knowledge is locally constructed and utilized, and the discipline was actively involved in the recent culture wars regarding scientific knowledge. Ludwik Fleck’s ‘Genesis and Development of a Scientific Fact’ (1935) is probably the first treatment of this process, and it has a Foreword by Thomas S. Kuhn in which he discusses the book’s influence on his own thinking.
  
  - kew100
    
    June 3, 2016 at 10:09 am
    
    Thanks for the Fleck reference. The line of thought running from Fleck forward into Susan Leigh Star’s work, as well as Geoff Bowker’s (and of course, my ultimate favorite for so many reasons, Bruno Latour’s), undergirds most of my understanding (for good and bad, I must say).
    
    I was hoping that either O’Neil or Carr would describe what they mean within their mathematical/statistical world by those terms.
    
  - kew100
    
    June 4, 2016 at 8:20 am
    
    I also apologize to you, learned poster, as I sounded dismissive rather than appreciative of a sociology/construction of knowledge perspective.
    
  - kew100
    
    June 4, 2016 at 9:16 am
    
    Like a dog worrying a bone… I will give you one of my favorite examples culled from my field experience.
    
    I worked with a physician from Britain who came to the States in the ’90s to work on standardized vocabularies (e.g., precursors to SNOMED). He had this lovely anecdote from the British health system, which had been computerized for some time and used a taxonomic system called Read Codes for characterizing findings and diagnoses.
    
    At some point, his practice group wanted to know how many patients had tattoos. They created a report and found that patients with tattoos were fairly evenly distributed across providers, except for one outlier. This outlier was an older physician who had found the electronic Read Codes somewhat difficult to use, so they looked into his practice.
    
    What they found is that he was coding for a finding that was not well formed by newly created best practices, but quite well formed in historical medical practices of older physicians and patients. Patients would often complain of feeling tired, run down, lethargic, etc. Physicians called this finding “tired all the time”. So, he coded his patients with the Read Code TATT — for tattoo.
    
    - Guest2
      
      June 4, 2016 at 7:30 pm
      
      Funny story!
      
      Yes, Bruno Latour, of course. Along the same line of sociological thought is Stephan Fuchs, Against Essentialism (2001).
      
      You might also be interested in Joseph V. Rees’ article on the work of E. A. Codman (1869-1940) in Regulation & Governance (2008) 2, 9–29, “The orderly use of experience: Pragmatism and the development of hospital industry self-regulation.”
      
      https://en.wikipedia.org/wiki/Ernest_Amory_Codman
      
      Codman eventually established his own hospital to develop and demonstrate his ideas — ideas that Medicare used for about a decade.
      
      Codman is an excellent example of how ideas gain currency. I am partial to the competitive aspect.
      
- Kareem Carr
  
  June 3, 2016 at 1:33 pm
  
  The initial discussion of datasets was meant as context for the later remarks and not as a point of contention itself. I am mostly just describing a specific project I worked on. That being said, in the context of my article:
  
  1. ‘Well-characterized’ datasets are datasets that are well understood by the research community.
  2. ‘Manually curated’ just means verified by human experts (so yes, they would be well-characterized).
  3. I am saying the truth is known about simulated data. That just means that, hopefully, we know what parameters and algorithm were used to generate the simulated output.
  
  - kew100
    
    June 4, 2016 at 8:14 am
    
    Oh dear. I was not being contentious but I seem so in my probing when I let my big feet get in the way. I apologize. The breakfast conversation was a very good one as well…
    
    I am quite interested in your terms, how you use them, and what distinctions you make. I appreciate your update and help in educating this big footed one.
    
  - Guest2
    
    June 4, 2016 at 7:45 pm
    
    Thank you for the clarification.
    
    “‘[W]ell-characterized’ datasets are datasets that [are] well-understood by the research community” is, I think (the way I read it), sociological. But there are many, if not innumerable, sub-communities involved in Big Data research, competing among themselves but, more importantly, fighting to grasp the unseen threats now that the genie is out of the bottle.
    
    It is an open question if we (or anyone, for that matter) can even keep track of the data being produced, let alone assess it. What organization is tracking all these datasets and their uses? None that I can think of.
    
    Here are a few examples from health care and higher ed that need to be included:
    
    https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/MCBS-Public-Use-File/index.html
    
    http://www.ihep.org/postsecdata/mapping-data-landscape/national-postsecondary-data-infrastructure
    
    http://www.ihep.org/events/envisioning-national-postsecondary-data-infrastructure-21st-century-paper-series-launch
    
    http://www.ihep.org/events/ihep-staff-presentations
    
    - Kareem Carr
      
      June 7, 2016 at 5:55 am
      
      There is at least one project called Dataverse that allows researchers to upload the datasets associated with their papers and can store metadata associated with those datasets. Full disclosure: I’m part of the organization (IQSS) that is developing this system. http://dataverse.org/about
      
nikos2evangelos

June 7, 2016 at 5:44 am

“Frequently, we create the former [predictive], yet try to deploy them as if they are the latter [causal] with disastrous consequences.” Feeling lazy so the first two things I thought of as the former were 538 and most mainstream economic modeling. Now I’m taking my insomniac head back to bed for an hour.

- kew100
  
  June 7, 2016 at 8:56 am
  
  I assume that, just as quibbling and disagreement over measures, taxonomies, coding, etc. affect the datasets, practice affects causal models and reasoning as well.
  
  I worked on a project with a PhD/MD in medical informatics who created a system that supported ontologies. His take on the enterprise was approximately… Yeah, everything is socially constructed, but closure around something is required in order to take action (i.e., we must represent and act to create cause). Who and what provide that closure, e.g., Nate Silver, Janet Yellen, Edward Shortliffe, the superdelegate system, the Fed, or the SNOMED consortium?
  
  IQSS looks incredibly wonderful within this universe. Thank you for the pointer…
  