What constitutes evidence?

July 7, 2014 Cathy O'Neil, mathbabe

My most recent Slate Money podcast with Felix Salmon and Jordan Weissmann was more than usually combative. I mean, we pretty much always have disagreements, but Friday it went beyond the usual political angles.

Specifically, Felix thought I was jumping too quickly towards a dystopian future with regards to medical data. My claim was that, now that the ACA has motivated hospitals and hospital systems to keep populations healthy – a good thing in itself – we’re seeing dangerous side-effects involving the proliferation of health profiling and things like “health scores” attached to people much like we now have credit scores. I’m worried that such scores, which are created using data not covered under HIPAA, will be used against people when they try to get a job.

Felix asked me to point to evidence of such usage.

Of course, it’s hard to do that, partly because it’s just the beginning of such data collection – although the FTC’s recent report pointed to data warehouses that already put people into categories such as “diabetes interest” – and also because it’s proprietary all the way down. In other words, web searches and the like are being legally collected and legally sold and then it’s legal to use risk scores or categories to filter job applications. What’s illegal is to use HIPAA-protected data such as disability status to remove someone from consideration for a job, but that’s not what’s happening.

Anyhoo, it’s made me think. Am I a conspiracy theorist for worrying about this? Or is Felix lacking imagination if he requires evidence to believe it? Or some combination? This is super important to me because if I can’t get Felix, or someone like Felix, to care about this issue, I’m afraid it will be ignored.

This kind of thing came up a second time on that same show, when Felix complained that the series of articles (for example this one from NY Magazine) talking about money laundering in New York real estate also lacked evidence. But that’s also tricky since the disclosure requirements on real estate are not tight. In other words, they are avoiding collecting evidence of money laundering, so it’s hard to complain there’s a lack of data. From my perspective the journalists investigating this article did a good job finding examples of laundering and showing it was easy to set up (especially in Delaware). But Felix wasn’t convinced.

It’s a general question I have, actually, and I’m glad to be involved with the Lede Program because it’s actually my job to think about this kind of thing, especially in the context of journalism. Namely, when do we require data – versus anecdotal evidence – to believe in something? And especially when the data is being intentionally obscured?

Categories: data journalism, education, journalism

Comments (28)

Aaron

July 7, 2014 at 9:51 am

Seems to me that if this behavior is clearly legal then it is reasonable to be worried about it, whether it is already happening or not. If it’s something we don’t want to happen, the laws should be set up to prevent it.

If you’re accusing someone specific of something specific and illegal, then you certainly need evidence, which means you need to have the ability to collect evidence.

Ernest Davis

July 7, 2014 at 9:57 am

You obviously don’t need data to be concerned that a problem will arise, or might arise. In particular, if you can make a convincing case that people will have both an incentive and an opportunity to use this kind of data as a filter for hiring or other decisions and that it’s legal for them to do so, then it’s not much of a stretch to be concerned that they might actually do so.

The question of when and how you can use anecdotal data in cases where no systematic data has been collected, or where you have reason to think that data has been hidden or distorted, is a really difficult one; there are slippery slopes in both directions. If you find anything good that’s been written, I’d be very interested to hear about it.

Jon Awbrey

July 7, 2014 at 10:14 am

There is a basic principle of systems engineering that applies here —

EATCODO • Every Abuse That Can Occur Does Occur

- Jon Awbrey
  
  July 7, 2014 at 10:32 am
  
  In view of your recent cow analogy, I guess we could change this to:
  
  EATCOW • Every Abuse That Can Occur, Will
  
medicalquackblog

July 7, 2014 at 11:00 am

You are absolutely correct, Cathy. I write about medical data and am a former developer myself. I just made a blog post this morning from doctors about “taking medicine back,” and fraud is all over healthcare data. Along with my own opinions on data, I’m constantly in contact with doctors. Years ago I lived at a family practice office when writing my medical records system and, believe me, I saw the front, back and the middle.

FTC is useless, as I have been writing to an attorney there for quite a while, and I also get information on things from a confidential former CMS employee. I’m not an expert on everything, but all should look and see whose models are being used by HHS and CMS. Look at the annual report from United Healthcare and explore the huge number of subsidiaries they have and the fact that they make 1/3 of their income from areas not involved with insurance. I have told everyone for years to look at the subsidiaries in health care, as that’s where all the action is. If you have read all the Lewin reports over the years (another subsidiary of United), it’s been a great marketing tool for them to use as well. The average consumer has no clue what doctors are going through. Scoring doctors is past time to cut pay as United pays many of them in the OC at rates less than Medicare. It’s all data in a context to make money. A leopard doesn’t change its spots either, so when you think of UHC and all their models, don’t forget (and I think it still stands from 2008) that they still hold the largest fine for derivatives on record for blatantly back dating stocks, and the AMA lawsuit for using algorithmic formulas to short pay doctors on out of network for 15 years, formulas that all the other insurers came along and licensed. They all ended up with lawsuits too.

Anyway, follow the politics and see now that the #2 man at CMS is a former United executive, so there’s your model for the US with healthcare and data, scary and I have been blogging about it for years and read the UHC annual report, the 90 page plan for children’s health, with the future of healthcare stratification.

http://ducknetweb.blogspot.com/2014/07/taking-back-medicine-maintenance-of.html

abekohen

July 7, 2014 at 11:19 am

Start with:

http://investing.businessweek.com/research/stocks/private/snapshot.asp?privcapId=216797895

Then go on to:

http://wellfra.me/

and:

http://www.onehealthscore.com/tos

“Aggregated information is your Anonymous Information that is combined with the Anonymous Information of other users and does not allow you to be identified or contacted. Depending on the circumstances, we may or may not charge third parties for this aggregated information, or limit the third parties’ use of the aggregated information.”

Cathy, sounds like your fears are justified.

As for money laundering in real estate you just have to know the right people. Obviously there will be no “big data” on it.

- Cathy O'Neil, mathbabe
  
  July 7, 2014 at 11:48 am
  
  Thanks, Abe!
  
Josh

July 7, 2014 at 12:56 pm

Aaron and Ernest expressed most of what I have to say. In the cases you cite, we should be concerned even if it is not happening, yet.

But I’ll add one thing: we are overly hung up on black and white views of things. We should be willing to accept uncertainty but not use it as a reason for inaction. If the likelihood is high enough, or the consequences dire enough, that is cause for action even recognizing, as we generally should, uncertainty.

Proof beyond a reasonable doubt is an appropriate criterion in some situations but not most.

Arthur Wilke

July 7, 2014 at 1:03 pm

While there are conspiracy theorists who find devilish things under every pebble, the tag “conspiracy theory” is increasingly used to discount questions and questioners. The latter is a part of a conservative world view, one that assumes or asserts that unless a question or concern has been officially certified and there’s clear and compelling empirical evidence, interrogating existing conditions is subversive.

The late C. Wright Mills as a methodological stance urged formulating what he termed the “outrageous hypothesis,” outrageous to popular and intellectually canonized thought. This prompts inquiry rather than a common rediscovery of embedded assumptions, where the answer is known before the demonstrated results are in. This is one of the flaws in some dominant economic models. Those who have undertaken to illuminate critical questions include Thomas Kuhn (The Structure of Scientific Revolutions) and, more recently, Stuart Firestein (Ignorance: How It Drives Science).

A political cultural mechanism for breaking the resistance to questions has been the social issue-cum social problem. In terms of promoting empirical research, often debunking of ideological and folk claims, this was once a province of a liberal intelligentsia. However, over time the social issue has become more generic and patrolling of what constitutes empirical evidence has become fuzzier with doubt being a more aggressively organized challenge (e.g., Naomi Oreskes & Erik Conway’s Merchants of Doubt and Harry Collins’ Are We All Scientific Experts Now?). As a result diversionary or faux issues become the product of an elite, news-cycle driven system. And there is empirical evidence for this.

As to the Affordable Care Act, there are accounting elements, indicators, that from a policy oversight position look promising. However, some of the design, as the Citizens Council for Health Care Freedom suggests (and this is worth empirical investigation) (http://hcrenewal.blogspot.com/2014/06/citizens-council-for-health-freedom.html), is problematic. And, as this piece (http://www.cjr.org/the_second_opinion/six_times_medicare_caved.php.) indicates, the Centers for Medicare and Medicaid Services (CMS), which exercises oversight on some medical expenditures, is vulnerable to political pressure. As Rob Johnson, head of the Institute for New Economic Thinking (INET) noted from his experience as a Congressional staffer, lobbyists in a half hour could undo six months’ work of a policy analyst. Government capture isn’t a conspiratorial notion, it’s a part of the strategy that’s been openly admitted to by financial and some insurance interests.

Questions, challenges and returning attention to some empirical tests remain a worthy project. And journalists who are not simply stenographers can assist some of us about matters that might be worth a second look. The obstacles they face in the political economics of news production and distribution and the competition from cheaper Internet delivered “filter bubbles” (Eli Pariser, The Filter Bubble) are challenging.

Shecky R

July 7, 2014 at 1:47 pm

A very important topic in many ways, evidence being difficult to define consistently and empirically… I’m not even comfortable with terms like “evidence-based science” or “evidence-based medicine” because ULTIMATELY all evidence (or what passes for it) is based on trust, faith, belief, and the like.
I don’t know the answer to your end-questions, but NO, you’re not a “conspiracy theorist” — you’re thinking/acting rationally and responsibly given what has preceded — profit-driven companies gaming the system and doing end-runs around laws to acquire an edge (and these days, laws can’t keep up with innovations). Felix is an ostrich with his head in the sand 😉

JSE

July 7, 2014 at 2:03 pm

Did you know Barry gave a course on the notion of evidence together with a law professor?

- Cathy O'Neil, mathbabe
  
  July 7, 2014 at 2:10 pm
  
  hey I forgot about that, thanks!
  
kpclancy

July 7, 2014 at 3:30 pm

Perhaps check out some of the European health systems? They have been in this game a lot longer than the ACA.
http://www.biomedcentral.com/1471-2458/12/944

dbk

July 7, 2014 at 4:27 pm

Thank you for this post – medical profiling is definitely on the horizon (here already for many), and I think your analogy to credit profiling is quite apt. I wonder what the purpose of HIPAA will be in future given the prospect that confidential health data will be for sale to the highest bidder in future …

wgersen

July 7, 2014 at 5:05 pm

What evidence is there that ANY statistical models used to draw conclusions from aggregated data are valid and reliable? We want to believe that there is a way to quantify things to replace human judgment. Human judgment based on data can be flawed and colored by misjudgment… but, in a well conceived bureaucracy it can be subjected to further review by others. We seem to think that “dispassionate quantification” that substitutes precision for accuracy requires no review.

An example of this bogus quantification is best found in education where statistically flawed algorithms are used as the basis for judging performance with no recourse available. You’ve written frequently and eloquently about the VAM hoax… what makes you think the medical field is going to develop a more sophisticated means of “scoring” doctors or patients?

- Arthur Wilke
  
  July 7, 2014 at 7:12 pm
  
  Under the old canon, matters of validity and reliability were concerns of measurement, matters dealing with observation and the challenging concern of situated perspective: from those observed (which are or can be systematized) or from trained observers. Determining what measures trained observers would use was informed and critiqued by other notions such as assumptions, theories, etc. and possible uses of resulting calculated findings.
  
  Under the old canon, statistics were: 1) the way to handle measured data using appropriate (per the canons of trained observers) arithmetic/mathematical procedures, and 2) decision-making conventions per the nature of the data and meeting the assumptions of the arithmetic/mathematical procedures.
  
  In an ever expanding information age in which there are lots of things that are measured or can serve as proxies for measured things, especially with regard to Big Data, there are some additional challenges. Fashioning techniques for observation, including competing teams of technical personnel, and using statistical decision-making (5 sigma) were deployed in the work on identifying the Higgs boson. But as noted, there are a number of things being observed, such as educational outcomes, that are less controlled and influenced by particular interests (e.g., anti-public education interests).
  
  The use of algorithms on problematic measures (GDP, stock prices) can create advantages/disadvantages for some but may not be illuminating regarding how things “work.”
  
  While there are questions about how much information can and should be gathered and what liabilities are entailed by whom and what, the big data modelers can at times find things that might otherwise be ignored. This is noted in a story in today’s New York Times regarding detecting automobile defects via the GM OnStar system. The political economy of data and data analysis is a challenge, added to by the growth of ignorance (the focus of agnotology) which seems to be attracting large constituencies even transcending ideological divides.
  
  Criticism of measured things and analysis can mask preferences for other measures (and procedures). While not wholly unproblematic, promoting claims such as the protective benefits of throwing salt over one’s shoulder rather than getting vaccinations entails political cultural matters that transcend measures and models with a focus on who uses information/knowledge and for what ends.
  
  - Guest2
    
    July 8, 2014 at 8:57 am
    
    I liked this STS perspective, but wanted to add the Holy Grail question — Whom does the re-contextualized data serve? As Arthur says, What is its political economy? Even the origins of modern statistical practice — British Galtonians — show the importance of historical context for “meaning.”
    
    Data has no meaning without interpretation, and ALL data must be interpreted to mean anything. This is where all the fun and games occur.
    
WRD

July 7, 2014 at 11:23 pm

Felix won that round. I thought the argument shifted quickly. You argued this could be a potential problem. Felix argued it wasn’t. Then you said it’s not just a potential problem, but is already happening. Felix asked for evidence. Things fell apart after that. I thought Felix had a perfectly reasonable point of view while yours never got off the ground.

There are lots of people who come up with doomsday predictions, insist this is just around the corner (or already happening!!) and it’s the next huge societal problem. I came away believing Felix pushed back on that rather effectively. He seemed to argue “tell us why this is a problem” convincingly.

Perhaps I’m especially disinclined to take this seriously unless I hear a more comprehensive argument. (which is why I subscribed to your RSS feed, I guess). I especially thought Jordan’s comment was spot-on. Hard for me to get behind doomsday predictions about evil big data if my general reaction is just meh.

- Cathy O'Neil, mathbabe
  
  July 8, 2014 at 7:03 am
  
  Interesting. I do think it’s a question of what your prior is in the first place. In your case, your prior is “meh,” and you left unconvinced. Your prior is also that my conversation with Felix is meant to be a competition where someone wins, another place where we differ.
  
  I guess I will not be selling any books to you!
  
  - Guest2
    
    July 8, 2014 at 8:49 am
    
    The idea that “my conversation with Felix [was] meant to be a competition where someone wins” is a consequence of the chosen medium-context.
    
- bsantry
  
  July 8, 2014 at 5:13 pm
  
  Dear Cathy and WRD,
  
  I’m going to chime in again, as I did in a lengthy and detailed way in “The dark matter of big data” post. I agree with WRD and with Felix.
  
  With respect to Fitbit data and potential employment discrimination, I made evidence-based predictions that mining Fitbit data is unlikely to produce significant incremental information about risk.
  
  We currently know that middle aged and older workers and members of minority groups have higher risk of cardiovascular disease and cancer, the major diseases affecting the working age population. I believe that part of the reason that older workers and minorities are overrepresented in the long-term unemployed is employer concern for their health care spending.
  
  If one mined Fitbit databases for blood pressure and wanted to discriminate, available evidence suggests that one would wind up discriminating against the groups we are already discriminating against (older workers and minorities).
  
  You responded that I was mistaken to think that massive big data work would give us the same answers. Surprises do happen, but markedly different findings would be a discontinuous surprise. However, this is tangential to my point that the present dystopian plight of our long-term unemployed brothers and sisters is more worthy of attention than a Fitbit-enhanced dystopian future.
  
  I added to the conversation that ACA has essentially neutralized the health care spending economics (to the employer) of selecting employees on the basis of expected disease risk. You passed this observation by.
  
  - Cathy O'Neil, mathbabe
    
    July 8, 2014 at 6:08 pm
    
    I choose to interpret your and WRD’s semi-hostile comments as evidence that I’ve struck a nerve. Fair warning, I delete hostile comments.
    
    As for me “passing the observation by” that ACA has improved the problem of a person not getting health insurance based on a pre-existing condition, you must have listened to the wrong podcast, because that’s exactly what I said. I also mentioned it in this recent post.
    
    And by the way, I still disagree with your assumption that the only thing we could possibly do with detailed information about biometric data is reconstruct well-known epidemiological studies. I’m basically agreeing with Larry Page from Google when I say this: big data techniques on the microscale can do far more than what we’ve already seen in health. Specifically, we could save many more lives than we do now.
    
    Having said that, it doesn’t mean the current laws are water tight. They aren’t, particularly when it comes to hiring practices. We already see plenty of companies charging their employees more for certain medical issues, which disproportionately affects minorities (see Judith Lichtman’s fine testimony for the U.S. Equal Employment Opportunity Commission on this issue).
    
    In other words, we see employers scoring employees on their health while they’re on the job, so what’s to prevent them from doing that before they’re hired? It’s legal and, as I said on the podcast, health scores already exist (hat tip Abe). My biggest questions are: how widespread will they become, how much data will we be able to legally collect to improve them, and when are the laws going to catch up with this scheme?
    
  - abekohen
    
    July 9, 2014 at 4:31 pm
    
    Interesting point that medical scoring would further discriminate against older workers. Even before that, a law meant to protect the rights of workers over-40 (ADEA 1967) had the unintended consequence of greater reluctance to hire older workers.
    
Guest73248

July 8, 2014 at 3:09 pm

I think your concerns are valid. I do think Felix had a point with big data in general, when people quickly jump to all kinds of scary dystopian possibilities in most discussions about it. However, I think Felix was too quick to label your points as being a part of that. You had valid evidence that shows we should be concerned about this kind of thing. Honestly, with all the individual data that is collected, bought and sold for the purposes of advertising and targeting, I don’t think it’s a big jump to see this as something worrying, so I’m not sure why Felix was so quick to discount it. Perhaps he has faith that as that problem approaches, the appropriate regulations and laws will be passed to prevent the problem from ever actually coming to pass. I’m not sure I am so optimistic.

The podcast is great by the way, it is nice to hear people challenge each other on different topics.

Savonarola

July 9, 2014 at 8:23 pm

Did you see the piece, maybe in the WaPo, about using facial recognition to judge how long people are going to live? They used two examples of people and computer aged them – looking for elasticity in the skin, etc. They are using biometrics, not even health data that they haven’t compiled yet, with the goal of being able to weed out those who aren’t healthy enough to insure. There was also a discussion in the article, if I remember correctly, about employers taking a picture of potential candidates and using it to decide who might cost the company health plan too much to hire. . . . .

fredbo

July 10, 2014 at 3:18 am

It’s time to put on your chodaboy hat, if you want to believe that some activity that tens of thousands of people in large profit motivated organizations can do, and have many billions of dollars in collective incentive to achieve, simply will all be good boyscouts and sing kumbaya.

There are no drug dealers because it is illegal… oh wait, there’s a conspiracy.
The NSA wouldn’t track, monitor, the communications of most US citizens, because it’s illegal… oh wait, there’s a conspiracy.
Banks wouldn’t falsify mortgage documents, because it is illegal… oh wait, there’s robo-signing.

If we want to assume that no corporation will use medical data to discriminate, you might as well believe that no CEO of a multi-billion-dollar corporation would be stupid enough to retain emails where they promise to illegally collude on discriminatory hiring practices, even though it’s part of the legal public record.

People do stupid things, and corporations seek competitive advantage… count on it!

LBB

July 11, 2014 at 7:10 am

Delete “feel” where u wrote “feel require.”

- Cathy O'Neil, mathbabe
  
  July 11, 2014 at 7:15 am
  
  thanks!
  