What you tweet could cost you

Yesterday I came across this Reuters article by Brenna Hughes Neghaiwi: 

In insurance Big Data could lower rates for optimistic tweeters.


The title employs a common marketing rule: frame bad news as good news. Instead of saying "big data shifts costs to pessimistic tweeters," mention only those who will benefit.

So, what’s going on? In the usual big data fashion, it’s not entirely clear. But the idea is that your future health will be inferred from your tweets, and your premium will go up if the prediction is bad news. From the article:

In a study cited by the Swiss group last month, researchers found Twitter data alone a more reliable predictor of heart disease than all standard health and socioeconomic measures combined.

Geographic regions represented by particularly high use of negative-emotion and expletive words corresponded to higher occurrences of fatal heart disease in those communities.

To be clear, no insurance company is currently using Twitter data against anyone (or for anyone), at least not openly. The idea outlined in the article is that people could set up accounts to share their personal data with insurers, as a way of showing off their healthiness. They’d be using a company like digi.me to do this. Monetize your data and so on. Of course, that would be the case at the beginning, to train the algorithm. Later on, who knows.

While we’re on the topic of Twitter, I don’t know if I’ve had time to blog about University of Maryland Computer Science Professor Jennifer Golbeck. I met Professor Golbeck in D.C. last month when she interviewed me at Busboys and Poets. During that discussion she mentioned her paper, Predicting Personality from Social Media Text, in which she inferred personality traits from Twitter data. Here’s the abstract:

This paper replicates text-based Big Five personality score predictions generated by the Receptiviti API—a tool built on and tied to the popular psycholinguistic analysis tool Linguistic Inquiry and Word Count (LIWC). We use four social media datasets with posts and personality scores for nearly 9,000 users to determine the accuracy of the Receptiviti predictions. We found Mean Absolute Error rates in the 15–30% range, which is a higher error rate than other personality prediction algorithms in the literature. Preliminary analysis suggests relative scores between groups of subjects may be maintained, which may be sufficient for many applications.
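The 15–30% error figure in that abstract is easier to interpret with a toy computation. Here’s a minimal sketch of Mean Absolute Error on a 0–1 personality scale (all scores below are invented for illustration; they are not from the paper or the Receptiviti API):

```python
# Toy illustration of Mean Absolute Error (MAE) for Big Five score
# predictions. All numbers are invented; they are not from the
# Golbeck paper or the Receptiviti API.

def mean_absolute_error(true_scores, predicted_scores):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(true_scores, predicted_scores)) / len(true_scores)

# Hypothetical "openness" scores for five users, on a 0-1 scale.
true_openness = [0.62, 0.45, 0.80, 0.33, 0.71]
pred_openness = [0.45, 0.65, 0.60, 0.45, 0.50]

mae = mean_absolute_error(true_openness, pred_openness)
print(f"MAE: {mae:.2f}")  # → MAE: 0.18, inside the 15-30% range the paper reports
```

An error of 0.18 on a 0–1 scale means a typical prediction is off by nearly a fifth of the whole scale, which is why the paper frames it as a higher error rate than other algorithms in the literature.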

Here’s how the topic came up. I was mentioning Kyle Behm, a young man I wrote about in my book who was denied a job based on a “big data” personality test. The case is problematic: it could represent a violation of the Americans with Disabilities Act, and a lawsuit is pending.

What Professor Golbeck demonstrates with her research is that, in the future, employers won’t even need to notify applicants that their personalities are being scored at all; it could happen without their knowledge, through their social media posts and other culled information.

I’ll end with this quote from Christian Mumenthaler, CEO of Swiss Re, one of the insurance companies dabbling in Twitter data:

I personally would be cautious what I publish on the internet.

Categories: Uncategorized

At the Wisconsin Book Festival!

I arrived in Madison last night and had a ridiculously fantastic meal at Forequarter, thanks to my friends Shamus and Jonny of the Underground Food Collective.


I’m here to give a talk at the Wisconsin Book Festival, which will take place today at noon, and I’m excited to have my buddy Jordan Ellenberg introduce me at my talk.

I’ll also stop by beforehand at WORT for a conversation with Patty Peltekos on her show called A Public Affair, as well as afterwards at the local NPR station, WPR, for a show called To The Best of Our Knowledge. These might be recorded; I don’t know when they’re airing.

What a city! Very welcoming and fun. I should visit more often.

Facebook’s Child Workforce

I’ve become comfortable with my gadfly role in technology. I know that Facebook would characterize its new “personalized learning” initiative, Summit Basecamp, as innovative if not downright charitable (hat tip Leonie Haimson). But again, gadfly.

What gets to me is how the students involved – about 20,000 students in more than 100 charter and traditional public schools – are really no more than an experimental and unpaid workforce, spending classroom hours training the Summit algorithm and getting no guarantee of real learning in return.

Their parents, moreover, are being pressured to sign away all sorts of privacy rights for those kids. And, get this, Basecamp’s terms “require disputes to be resolved through arbitration, essentially barring a student’s family from suing if they think data has been misused.” Here’s the quote from the article that got me seriously annoyed, from Summit CEO Diane Tavenner herself:

“We’re offering this for free to people,” she said. “If we don’t protect the organization, anyone could sue us for anything — which seems crazy to me.”

To recap: Facebook gets these kids to train its algorithm for free, whilst removing them from their classroom time, offering no evidence that they will learn anything, making sure it can use the children’s data for everything short of targeted ads, and also ensuring the parents can’t even hire a lawyer to complain. That sounds like a truly terrible deal.

Here’s the thing. The kids involved are often poor, often minority. They are the most surveilled generation and the most surveilled subpopulation out there, ever. We have to start doing better for them than unpaid work for Facebook.

Guest post: An IT insider’s mistake

This is a guest post by an IT Director for a Fortune 500 company who has worked with many businesses and government agencies.

It was my mistake. My daughter’s old cell phone had died. My wife offered to get a new phone from Verizon, give it to me, and hand my old phone down to my daughter. Since I work with Microsoft, it made sense for me to get the latest Nokia Lumia model. It’s a great looking phone, with a fantastic camera and a much bigger screen than my old model. I told my wife not to wipe all the data off my old phone but just to get the phone numbers switched; we could then delete all my contacts from the old phone by hand.

Here’s the catch: while you can remove an email account on the phone, you can’t change the account that is associated with Windows Phone’s cloud. My daughter manually deleted all my phone contacts from my old phone and added her own – but before that I had synced up my new phone to the cloud and downloaded all my contacts to it. Within 24 hours, the Microsoft Azure cloud had re-synced both phones, so all the deletes my daughter made propagated to my new phone.

I lost all my contacts.

I panicked and went back to the Verizon store, where they told me we had to reset my old phone to factory settings. But they didn’t have a way for me to get my contacts back, and they had no way for me to contact Microsoft directly to get them back either. The Windows Phone website lists no customer support phone number – Microsoft relies on the phone carriers to provide this, apparently believing that being a phone manufacturer doesn’t require a call center that can resolve consumer issues. I see this as a policy flaw.

I was left with the painstaking process of figuring out how to get my phone contacts back, maybe one at a time.
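The failure mode here is a classic one in naive cloud syncing: a deletion is just another change, and the most recent change wins on every synced device. Here’s a toy sketch of that dynamic – this is not Microsoft’s actual sync protocol, and all names are invented:

```python
# Toy model of "last write wins" contact syncing. This is NOT the
# actual Windows Phone / Azure protocol - just a sketch of how a
# delete on one device can propagate to every other device.

class Cloud:
    def __init__(self):
        self.contacts = {}  # name -> phone number

    def push(self, device_contacts):
        """Whichever device syncs last overwrites the cloud copy."""
        self.contacts = dict(device_contacts)

    def pull(self):
        return dict(self.contacts)

cloud = Cloud()

# The old phone starts with the author's contacts and syncs up.
old_phone = {"Alice": "555-0101", "Bob": "555-0102"}
cloud.push(old_phone)

# The new phone pulls everything down - so far so good.
new_phone = cloud.pull()

# The daughter deletes the old contacts and adds her own,
# and the old phone syncs first...
old_phone = {"Carol": "555-0199"}
cloud.push(old_phone)

# ...so when the new phone re-syncs within 24 hours,
# the deletes propagate and the original contacts are gone everywhere.
new_phone = cloud.pull()
print(sorted(new_phone))  # → ['Carol']
```

Sync systems that keep per-contact version history (or a trash folder) avoid this, because a bulk delete can be rolled back instead of silently winning.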

But the whole cloud syncing episode made me think about how we’ve come to trust that we can keep everything on our phones without thinking about adequately backing it up. In 2012, the Wired reporter Mat Honan wrote about how a hacker systematically deleted all the personal information, including baby photos, that he had saved to the cloud from his Apple devices. The big three phone platform makers (Apple, Google and Microsoft) now hold a lot of personal information in their clouds about all of us cell phone users. Each company, on its own, can create a Kevin Bacon style “six degrees of separation” contacts map that would make the NSA proud. I lost more than 100 phone contacts; each of those people likely has a similar number of contacts, or more, plugged into their phones, and so on.

If the big three (AGM, not to be confused with Annual General Meetings) colluded, they could even create a real-time locator map showing where all our contacts are right now, all around the world. Think of the possibilities for tracking: cheating spouses, late lunches at work, what time you quit drinking at the local, what sporting events you go to, which clients your competitors are meeting with, etc. Microsoft’s acquisition of LinkedIn makes this sharing of information even more powerful. Now they’ll have our phone numbers and email contacts and some professional correspondence too.
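That “six degrees of separation” contacts map is, mechanically, just a shortest-path search over pooled contact lists. Here’s a minimal sketch (all names invented) of how an aggregator could measure the hops between any two people:

```python
from collections import deque

# Toy contact graph: each person's phone contact list, as a
# hypothetical aggregator might pool them. All names are invented.
contacts = {
    "you":   ["alice", "bob"],
    "alice": ["you", "carol"],
    "bob":   ["you", "dave"],
    "carol": ["alice", "eve"],
    "dave":  ["bob"],
    "eve":   ["carol"],
}

def degrees_of_separation(graph, start, target):
    """Breadth-first search for the fewest contact-list hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == target:
            return hops
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None  # not connected

print(degrees_of_separation(contacts, "you", "eve"))
# → 3  (you -> alice -> carol -> eve)
```

With real contact lists numbering in the hundreds per person, a few hops is enough to connect nearly everyone, which is the point of the paragraph above.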

I don’t trust Google. Their motto, “don’t be evil,” almost raises the question: why do they have to remind themselves of that? Some years ago they were reported to be scanning emails written to and from Gmail accounts. Spying on what your customers think of as private correspondence strikes me as evil. And just last week Yahoo admitted to doing the same thing on behalf of the government, scanning for a very specific search phrase. I hope the NSA got their suspect with that request, and that it wasn’t just a trial balloon to see how far they could go in pressuring the big data providers and aggregators. Yes, I can see the guys in suits and dark glasses approaching Marissa Mayer: “Trust us, this will save lives. We believe there’s the risk of an imminent terrorist attack.” I hope they arrest someone and bring charges, if only to justify Marissa’s position.

So why do I bring all that up? I believe we need consumer personal data protection rights, almost like credit reporting. The big three (AGM) personal data aggregators, along with Facebook and LinkedIn, collect a lot of personal data about each of us. We should have the right to know what they keep about us, and to correct that record, as we can with the credit bureaus. We should be able to get a free digital copy of our personal data at least annually. The personal data aggregators should also have to report who they share that information with, and in what form. Do they pass along our phone contacts or email accounts to third-party providers, or license them to other companies to help them do their business? The Europeans are ahead of America in protecting privacy rights on the internet, with the right to be forgotten and the right to correct data. We should not be left behind in making our lives safer from invasion of privacy and loss of personal security.

We need to know. The personal data aggregators need to be held to higher standards.

New America event next Monday

Hey D.C. folks!

I’ll be back in your area next Monday for an event at the New America Foundation from noon to 1:30pm. It will also be livestreamed.

It’s going to be a panel discussion with some super interesting folks:

David Robinson is co-founder and principal at Upturn, a team of technologists working to give people a meaningful voice in how technology shapes their lives. David leads the firm’s work on automated decisions in the criminal justice system.

Rachel Levinson-Waldman is senior counsel to the Liberty and National Security Program at the Brennan Center for Justice. She is an expert on surveillance technology and national security issues, and a frequent commentator on the intersection of policing, technology and civil rights.

Daniel Castro is vice president at the Information Technology and Innovation Foundation (ITIF) and director of ITIF’s Center for Data Innovation. He was appointed by U.S. Secretary of Commerce Penny Pritzker to the Commerce Data Advisory Council.

K. Sabeel Rahman is an assistant professor of law at Brooklyn Law School, an Eric and Wendy Schmidt fellow at New America, and a Four Freedoms fellow at the Roosevelt Institute. He is the author of Democracy Against Domination (Oxford University Press 2017), and studies the history, values, and policy strategies that animate efforts to make our society more inclusive and democratic, and our economy more equitable.

Also, I wrote an essay for New America in preparation for the event, entitled Alien Algorithms.

I hope I see you next Monday!

Three upcoming NY events, starting tonight at Thoughtworks

I’ve got three upcoming New York events I wanted people to know about.

Thoughtworks / Data-Pop Alliance tonight

First, I’ll be speaking tonight starting at 6:30pm at Thoughtworks, at 99 Madison Ave. It’s co-hosted by Data-Pop Alliance, and after giving a brief talk about my book I’ll be joined for a panel discussion by Augustin Chaintreau (Columbia University), moderated by Emmanuel Letouzé (Data-Pop Alliance and MIT Media Lab). There will be Q&A as well. More here.


Betaworks next week

Next I’ll be talking with the folks at Betaworks about my book next Thursday evening, starting at 6:30pm, at 29 Little West 12th Street. You can get more information and register for the event here.

Data & Society in two weeks

Finally, if the world of New York City data hasn’t gotten sick of hearing from me, I’ll be giving a “Databite” (with whiskey!) at Data & Society the afternoon of Wednesday, October 26th, starting at 4pm. Data & Society is located at 36 West 20th Street, 11th Floor. I will update this post with a registration link when I have it.

Facial Recognition is getting really accurate, and we have not prepared

There’s reason to believe facial recognition software is getting very accurate. According to a WSJ article by Laura Mills, Facial Recognition Software Advances Trigger Worries, a Russian company called NTechLab has built software that “correctly matches 73% of people to large photo database.” The stat comes from recognizing celebrities in a database of a million pictures.
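That 73% is a top-1 identification rate: given a probe photo, is the nearest match in the database the right person? Here’s a toy sketch of how that measurement works, with random vectors standing in for real face embeddings (a real system would use a learned face encoder, and real photos are far harder to match than this toy):

```python
import random

random.seed(0)

# Toy stand-in for face identification. Each "embedding" is a random
# vector; a real system would compute embeddings with a face encoder.

def distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rand_vec(dim=8):
    return [random.gauss(0, 1) for _ in range(dim)]

# Database: one reference embedding per identity.
database = {f"person_{i}": rand_vec() for i in range(50)}

def probe(identity, noise=0.1):
    """A new photo of the same person: their embedding plus some noise."""
    return [x + random.gauss(0, noise) for x in database[identity]]

# Top-1 identification rate: the fraction of probes whose nearest
# database entry is the correct identity - what the "73%" measures.
correct = 0
for identity in database:
    p = probe(identity)
    nearest = min(database, key=lambda name: distance(p, database[name]))
    correct += nearest == identity
print(f"top-1 rate: {correct / len(database):.0%}")
```

In this easy toy setup the rate comes out near 100%; in-the-wild matching against a million photos is much harder, which is why 73% there is a striking result.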

Now comes the creepy part. The company, headed by two 20-something Russian tech dudes, is not worried about the ethics of its algorithms. Here are their reasons:

  1. Because it’s already too late to worry. In the words of one of the founders, “There is no private life.”
  2. They don’t need to draw a line in the sand for who they give this technology to, because “we don’t receive requests from strange people.”
  3. Also, the technology should be welcomed, rather than condemned, because, in the words of one of the founders, “There is always a conflict between progress and some scared people. But in any way, progress wins.”

Thanks for the assurance!

Let’s compare those reasons not to worry with some reasons we do have to worry:

  1. The founders are in negotiations to sell their products to state-affiliated security firms from China and Turkey.
  2. Moscow’s city government is planning to install NTechLab’s technology on security cameras around the city.
  3. They were already involved in a scandal in which people used their software to identify and harass Russian women who had allegedly appeared in pornographic films online.