The smell test for big data
The other day I was chatting with a data scientist (who didn’t know me), and I asked him what he does. He said that he used social media graphs to see how we might influence people to lose weight.
Whaaaa? That doesn’t pass the smell test.
If I can imagine it happening in real life, between people, then I can imagine it happening in a social medium. If it doesn’t happen in real life, it doesn’t magically appear on the internet.
So if I have a huge crush on LeBron James (true), and if he tweets that I should go out and watch “Life of Pi” because it’s a great movie (true), then I’d do it, because I’d imagine he is here with me in my living room suggesting that I see that movie, and I’d do anything that man says if he’s in my living room, especially if he’s jamming with me.
But if LeBron James tells me to lose weight while we’re hanging, then I just feel bad and weird. Because nobody can influence someone else to lose weight in person*.
Bottomline: there’s a smell test, and it states that real influence happening inside a social graph isn’t magical just because it’s mathematically formulated. It is at best an echo of the actual influence exerted in real life. I have yet to see a counter-example to that. If you have one, please challenge me on this.
Any data scientist going around claiming they’re going to surpass this smell test should stop right now, because it adds to the hype and adds to the noise around big data without adding to the conversation.
* I’ll make an exception if they’re a doctor wielding a surgical knife about to remove my stomach or something, which doesn’t translate well into social media, and might not always work long-term. And to be fair, you (or LeBron) can influence me to not eat a given thing on a given day, or even to go on a diet, but by now we should know that doesn’t have long term effects. There’s a reason Weight Watchers either doesn’t publish their results or relies on survivorship bias for fake results.