Strata and swag
Yesterday I gave a 5-minute lightning talk at a corporate big data conference here in New York called Strata+Hadoop World, put on by O’Reilly and Cloudera.
My talk was part of a session run by DataKind, aimed at talking about the ethics of algorithms. My 5 minutes were taken up discussing 5 ideas:
- In order to do good with data, first you have to not do bad. Data scientists aren’t trained to think through the ethics and social impact of their work, so this is non-trivial.
- We haven’t actually figured out the difference between correlation and causation. That means, in the context of social algorithms, that we blame the victim constantly. Think about the HR algorithm that decides never to hire another woman engineer because it notices how badly women engineers fare in the workplace.
- Or, we could take the example of the justice system, where we use recidivism algorithms to figure out that poor black people are more likely to be arrested, and we decide to punish them even more as a result, instead of asking why the justice system isn’t serving to help them as much as it helps white or rich people.
- Or, we could take the example of teacher assessment, where we blame teachers for student test scores, even though they have little power over them.
- Conclusion: data scientists are de facto policy makers. We shouldn’t be.
So, the talk I gave was sparsely attended, with maybe 40 people in the room (which is actually more than we expected). I was happy to see those people, and many of them were earnest and thoughtful, to be sure. Danah Boyd spoke in the second session, as usual very eloquently, and I felt like there were far too few people in the room compared to who might benefit from hearing her.
But let’s face it, Strata is a celebration of big data in the corporate setting, and few people there were spending too much time fretting about ethics. It was dominated by its expo room, where dozens of data science platforms extending the hype of the power of big data were set to sell you magical thinking. There were also a few groups doing good stuff, to be sure, but the overall feel was similar to how it felt back in 2011, except bigger.
Not to be cynical! There’s plenty of other stuff going on that wasn’t in 2011, so really it’s fine. And plus, I did manage to meet up with some colorful ladies:
and I picked up an enormous amount of Strata swag (more here) because teenage sons:
If I had stayed longer I could have gotten plenty of free beer and food, not to mention more pens than I could ever use. There were even Lego data science characters, but to get those I would have had to stay and listen to the pitch, which was a dealbreaker for me.
Conclusion: Strata fills a niche not unlike the New York Coffee Festival. Almost completely frivolous but fun for the participants, as long as you don’t get caffeine poisoning.
I like the picture of the colorful ladies. Say hello to Debbie for me.
At a meetup this week I got a Cloudera tee shirt that said: “Data is the new bacon.” What does that mean? That data is not kosher? I don’t get it.
Maybe it’s like Kevin Bacon, everyone’s favorite co-star …
(sp) lightning
(unless you meant it to be en-lightening, maybe …)
Thanks!
I am in the big data/corporate setting myself. One of the biggest time savers I’ve found in the last year has been to make myself the contact for big data vendors, instead of the higher-ups. What has happened in the past is that these big data vendors make pitches to higher execs where they lie, flat-out lie, about how quickly and automatically their platform turns data into money for the company. I then have to spend hours upon hours debunking those lies. Now, I can debunk those lies at the initial pitch, which saves me an enormous amount of time.
This makes me look forward to reading your book, Cathy.
It sounds like you didn’t have time to discuss your challenging argument:
In the context of social analysis, correlation -> causation -> blame.
Too bad there wasn’t time for discussion.
One response might be: “I’m not a policy maker. I just do my job. It’s not my job to blame anyone. I write reports that are very cautious about inference, so I avoid the path you are warning against.”
Maybe that’s what you are hoping to hear, but I suspect not. If that’s the conclusion, it would not affect anyone’s behavior or judgment, except maybe making reports more rigorous.
It’s possible that the conclusion you want for your argument is that data scientists *are* policy makers, whether they want it or not, unless they quit their jobs. This conclusion would seem to obligate data scientists to make judgments, and to work for good agendas and policies, not bad ones.
That conclusion is a tough sell, but I suspect it’s really the conclusion you want.
I was bummed I couldn’t attend your talk because I was speaking at the same time. Am looking forward to seeing the video from your talk though – such a critical topic and especially for Strata folks (even if they don’t realize it).