This is a guest post by Angela Grammatas, a digital analytics consultant specializing in worldwide implementations of online analytics tools. She loves powerful data, but doesn’t love having artificial intelligence use it in creepy ways. She also has synesthesia and paints numbers (Instagram @angelagrammatas).
This week, the US Congress voted to allow ISPs (Internet Service Providers) to collect and sell your internet data without your consent. Erasing your web data – or not allowing any to be collected in the first place – is getting more difficult, and less effective. Hiding from data collection isn’t working.
It’s time for a completely different approach. Instead of restricting our data, it’s time to create more – a lot more. A flood of meaningless data could create a noisy cover that makes our true behavior hard to understand. This could be a path to bursting the filter bubble, one person at a time. And if enough people participate, we could collectively render some datasets completely meaningless.
Why should we care about where our data goes?
Organizations rely on data to “target” users online and serve them relevant (read: “more likely to be clicked”) advertisements. Plenty of targeting is innocuous and can be genuinely helpful. For example, getting a sale offer on a recently-viewed product can be a win-win; the company makes a sale, and the customer is happy about the discount. Targeting (and re-targeting) makes that possible.
But when the pool of data gets larger and more integrated, the implications change. For example: let’s imagine that “Jane Internet” loves cats, and visits cats.com daily. One day, she’s considering how to vote on a local proposition, and she does some research by visiting two political news sites at opposite ends of the spectrum. She reads a relevant article on each site, getting a balanced view of the issue. Now suppose that the “Yes on Prop A” campaign has access to retargeting tools built on that large, blended dataset. Soon, Jane starts to see “Vote Yes on Prop A” advertisements on many unrelated websites, with the message that Prop A will be great for local wildlife.
Jane has no way of knowing this, but that pro-wildlife message has been chosen specifically for her, because of her past visits to cats.com. The ads are everywhere online (for Jane), so Jane believes that this message is a primary “Yes on A” talking point, and she’s encouraged to vote in agreement. The “No on A” campaign never has any opportunity to discuss or debate the point. They may not even know that the cats-related topic has been raised, because they’ve never been exposed to it – that message is reserved for retargeting campaigns directed at people like Jane. Jane’s attempt to be a well-informed voter has been usurped by retargeting. And, perhaps most importantly, Jane doesn’t even know this has happened.
How could meaningless data help?
Jane was targeted because of her visits to cats.com, and the (reasonable) assumption that cats.com visitors are animal lovers. What if she’d spent just as much time visiting sites related to other topics – desserts.com, running.com, and supportthelibrary.com? Many organizations want to reach potential customers who are interested in desserts, running, and libraries. If Jane were visiting all of those sites, she’d see a variety of targeted messages, exposing her to different points of view while also diluting the impact of any single message. Jane would start to break out of the “filter bubble” created by targeted ads. In that case, Jane might not see any ads related to Prop A – or she might see ads that address the issue from a variety of perspectives. For Jane, the playing field would be leveled again.
But if Janes all over the country also began to visit a much wider variety of sites, they could level the playing field for everyone. Targeting algorithms that identify “people like Jane” look for similarities in web browsing behavior, and assume that these people will have similar ad-clicking behavior. If the dataset becomes more randomized, those correlations will be weaker, and even when similar groups are identified, they won’t result in as many clicks – driving the cost of the ads up, and reducing the incentive to retarget.
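To see why noise weakens lookalike targeting, here is a toy model (my own illustration, not any real ad platform’s algorithm): represent each user’s profile as the set of sites they visit, and measure how similar two users look with the Jaccard index. Padding each profile with random, uncorrelated sites dilutes that similarity. All the names and site lists below are hypothetical.

```python
import random

def jaccard(a, b):
    """Similarity between two users' sets of visited sites (0 to 1)."""
    return len(a & b) / len(a | b)

# Two users who look alike to a targeting algorithm: both visit cats.com
# and the same news site. (Hypothetical profiles for illustration.)
jane = {"cats.com", "localnews.com", "statenews.com"}
lookalike = {"cats.com", "localnews.com", "gardening.com"}

base = jaccard(jane, lookalike)  # high: 2 shared sites out of 4 total

# Each user now adds 30 random "noise" sites drawn from a large pool.
pool = [f"site{i}.com" for i in range(1000)]
rng = random.Random(42)
jane_noisy = jane | set(rng.sample(pool, 30))
lookalike_noisy = lookalike | set(rng.sample(pool, 30))

# The random additions rarely overlap, so the shared signal is drowned
# out: the intersection barely grows while the union grows a lot.
noisy = jaccard(jane_noisy, lookalike_noisy)
```

With noise added, `noisy` falls well below `base`: the two users no longer look like a coherent segment, so a "people like Jane" audience built this way gets less reliable.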
The reality, of course, is that Jane doesn’t have the time or inclination to spend hours clicking random links online just to create her own personal meaningless dataset. That’s why I created Noiszy (http://noiszy.com), a free browser plugin that runs in the background on Jane’s computer (or yours!) and creates real-but-meaningless web data – digital “noise.” It visits and navigates around websites from within the user’s browser, leaving misleading digital footprints wherever it goes.
When organizations lose the ability to “figure us out” from our browsing data, they’ll have to work harder to build products and content that people willingly engage with and share data with, rather than simply chasing clicks and impressions. Could “fake data” lead to the end of “fake news”? Targeting algorithms are happily churning away on our data, pushing whatever messages the highest bidder wants us to see, and we have no obvious way to feed back into the cycle. Meaningless data can help us hack this system, and bring about a conversation we deeply need to have: how should algorithms be (re)built for the greatest good?