Bigger Data Isn’t Always Better Data

March 1, 2017 Cathy O'Neil, mathbabe

My newest piece on Bloomberg:

Bigger Data Isn’t Always Better Data


Comments (6)

tomslee

March 1, 2017 at 12:42 pm

Nice piece as always.

It prompts one thought about discrimination. Our current protected classes are coarse (race, gender) partly because our measurement tools have been coarse. There may be discrimination based on other attributes, or combinations of attributes, that happens all the time but is not covered by “protected classes” because it is too difficult to track. Height, weight and attractiveness have all been linked to career success, and in the UK accent can still be a big deal.

Now we have new technologies, as you describe, which may reflect our existing discrimination patterns more precisely (for example, and this is purely hypothetical, accent or attractiveness may modify discrimination based on race or gender). The number of attributes that may correlate with career success but which would be excluded from consideration in a fair system should grow.

Rather like your previous insurance post, maybe the use of big data & sophisticated machine learning algorithms will turn out to be incompatible with fairness in hiring.

- Cathy O'Neil, mathbabe
  
  March 1, 2017 at 12:45 pm
  
  I was thinking the exact same thing. By the way, classism isn’t illegal, as I am sure you know. In a world suffering from increasing inequality, that doesn’t make a lot of sense.
  
Guest2

March 2, 2017 at 9:32 pm

Here is a slightly different twist to ‘big data’ — hi-tech transcripts! What will they think of next?
https://www.insidehighered.com/views/2017/03/01/three-important-questions-ask-about-credentialing-essay

It is an open question whether this digital-cloud approach will amplify or dampen signals with discriminatory potential. I suspect that the main reason for credential innovation like this is the opportunity to hook up the school with a high-status, high-prestige employer. Discrimination is multi-dimensional; for this reason alone we must broaden our focus.

dmf

March 5, 2017 at 12:24 pm

Perhaps, in addition to ethics, we might also need to add a bit of philosophy to the math mix, along the lines of category mistakes and misplaced concreteness:
https://motherboard.vice.com/en_us/article/society-is-too-complicated-to-have-a-president-complex-mathematics-suggest

- Guest2
  
  March 6, 2017 at 8:50 pm
  
  I completely agree — except with the “Civilization Complexity” graph, which needs to be corrected to show increasing complexity, not decreasing. The editor of the piece should have caught this obvious error.
  
mclaren

March 21, 2017 at 2:34 am

A salient article. Alas, methinks the cloud of metadata surrounding the actual data itself can probably (almost certainly will) be used to infer the income/gender/class of the applicant. Wish there was a simple solution to the problem of the reinforcing feedback loop of the rich and the Ivy League, but the only one I can think of involves using a random lottery to select applicants. Which, oddly enough, has a solid mathematical foundation:
https://arxiv.org/abs/0907.0455
