
Guest post: Make Your Browsing Noiszy

March 31, 2017

This is a guest post by Angela Grammatas, a digital analytics consultant specializing in worldwide implementations of online analytics tools.  She loves powerful data, but doesn’t love having artificial intelligence use it in creepy ways. She also has synesthesia and paints numbers (Instagram @angelagrammatas).

 

This week, the US Congress voted to allow ISPs (Internet Service Providers) to collect and sell your internet data without your consent.  Erasing your web data – or not allowing any to be collected in the first place – is getting more difficult, and less effective.  Hiding from data collection isn’t working.

It’s time for a completely different approach.  Instead of restricting our data, it’s time to create more – a lot more.  A flood of meaningless data could create a noisy cover that makes our true behavior hard to understand.  This could be a path to bursting the filter bubble, one person at a time.  And if enough people participate, we could collectively render some datasets completely meaningless.

Why should we care about where our data goes?

Organizations rely on data to “target” users online and serve them relevant (read, “more likely to be clicked”) advertisements.  Plenty of targeting is innocuous and can be genuinely helpful.  For example, getting a sale offer on a recently-viewed product can be a win-win; the company makes a sale, and the customer is happy about the discount.  Targeting (and re-targeting) makes that possible.

But when the pool of data gets larger and more integrated, the implications change.  For example: let’s imagine that “Jane Internet” loves cats, and visits cats.com daily.  One day, she’s considering how to vote on a local proposition, and she does some research by visiting two political news sites at opposite ends of the spectrum.  She reads a relevant article on each site, getting a balanced view of the issue.  Let’s imagine that the “Yes on Prop A” campaign has access to retargeting capabilities that utilize that large, blended dataset.  Soon, Jane starts to see “Vote Yes on Prop A” advertisements on many unrelated websites, with the message that Prop A will be great for local wildlife.

Jane has no way of knowing this, but that pro-wildlife message has been chosen specifically for her, because of her past visits to cats.com.  The ads are everywhere online (for Jane), so Jane believes that this message is a primary “Yes on A” talking point, and she’s encouraged to vote in agreement.  The “No on A” campaign never has any opportunity to discuss or debate the point.  They may not even know that the cats-related topic has been raised, because they’ve never been exposed to it – that message is reserved for retargeting campaigns directed at people like Jane.  Jane’s attempt to be a well-informed voter has been usurped by retargeting.  And, perhaps most importantly, Jane doesn’t even know this has happened.

How could meaningless data help?

Jane was targeted because of her visits to cats.com, and the (reasonable) assumption that cats.com visitors are animal lovers.  What if she’d spent just as much time visiting sites related to other topics – desserts.com, running.com, and supportthelibrary.com?  Many organizations want to reach potential customers who are interested in desserts, running, and libraries.  If Jane were visiting all of those sites, she’d be seeing a variety of targeted messages, exposing her to different points of view while also decreasing the impact of any single message.  Jane would start to break out of the “filter bubble” created by targeted ads.  In that case, Jane might not see any ads related to Prop A – or she might see ads that address the issue from a variety of perspectives.  For Jane, the playing field would be leveled again.

But if Janes all over the country also began to visit a much wider variety of sites, they could level the playing field for everyone.  Targeting algorithms that identify “people like Jane” look for similarities in web browsing behavior, and assume that these people will have similar ad-clicking behavior.  If the dataset becomes more randomized, those correlations will be weaker, and even when similar groups are identified, they won’t result in as many clicks – driving the cost of the ads up, and reducing the incentive to retarget.
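
A rough way to see why randomness hurts this kind of targeting: audience-matching systems often treat each person as a vector of site visits and look for people whose vectors point in similar directions.  The toy sketch below is a minimal illustration in plain Python, with made-up site names and numbers (it is not how any particular ad platform actually works), showing how a batch of random visits pulls two “similar” browsing profiles apart.

```python
import math
import random

SITES = ["cats.com", "desserts.com", "running.com", "supportthelibrary.com",
         "news-a.example", "news-b.example", "gardening.example", "chess.example"]

def cosine(u, v):
    """Cosine similarity between two visit-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def vector(counts):
    """Turn a {site: visit count} dict into a vector over SITES."""
    return [counts.get(site, 0) for site in SITES]

# Two "people like Jane": both heavy cats.com visitors.
jane = {"cats.com": 30, "news-a.example": 2}
alex = {"cats.com": 25, "news-b.example": 3}
print("similarity before noise:", round(cosine(vector(jane), vector(alex)), 2))

# Add random, meaningless visits to Jane's history and compare again.
noisy_jane = dict(jane)
for _ in range(100):
    site = random.choice(SITES)
    noisy_jane[site] = noisy_jane.get(site, 0) + 1
print("similarity after noise: ", round(cosine(vector(noisy_jane), vector(alex)), 2))
```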

The reality, of course, is that Jane doesn’t have the time or inclination to spend hours clicking random links online just to create her own personal meaningless dataset.  That’s why I created Noiszy (http://noiszy.com), a free browser plugin that runs in the background on Jane’s computer (or yours!) and creates real-but-meaningless web data – digital “noise.”  It visits and navigates around websites from within the user’s browser, leaving misleading digital footprints wherever it goes.
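
For the curious, the core idea can be sketched in a few lines.  The snippet below is not Noiszy’s actual code (Noiszy runs as a browser plugin and navigates within sites, as described above); it is just a standalone illustration, in Python, of what “generating noise” means: periodically fetching pages from a list of harmless sites so those requests show up in the traffic your ISP sees.  The site list and timing are invented for the example.

```python
import random
import time
import urllib.request

# A made-up list of harmless, well-known sites for illustration;
# a real tool would ship its own curated list.
NOISE_SITES = [
    "https://www.wikipedia.org",
    "https://www.nasa.gov",
    "https://www.bbc.com",
    "https://www.gutenberg.org",
]

def make_noise(visits=10, min_pause=5.0, max_pause=30.0):
    """Fetch a handful of random pages with human-ish pauses in between."""
    for _ in range(visits):
        url = random.choice(NOISE_SITES)
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                response.read(1024)  # touch the page; the content itself doesn't matter
            print("visited", url)
        except OSError as err:
            print("skipped", url, "-", err)
        time.sleep(random.uniform(min_pause, max_pause))

if __name__ == "__main__":
    make_noise()
```

A plugin like Noiszy goes further than this sketch: it runs inside the browser and navigates around within each site, so the resulting traffic looks much more like ordinary human browsing.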

When organizations lose the ability to “figure us out” from our browsing data, they’ll have to work harder to build products and content that people willingly engage with and share data with, rather than simply chasing clicks and impressions.  Could “fake data” lead to the end of “fake news”?  Targeting algorithms are happily churning away on our data, pushing whatever messages the highest bidder wants us to see, and we have no obvious way to feed back into the cycle.  Meaningless data can help us hack this system, and bring about a conversation we deeply need to have: how should algorithms be (re)built for the greatest good?

  1. David B
    March 31, 2017 at 8:13 am

    Alternatively, wouldn’t it be fun to purchase and expose the browsing histories of, say, all the members of Congress, regulators, and lobbyists? Sunshine may show them the correct path.


  2. March 31, 2017 at 8:35 am

    I like your approach. I may just try it.
    I worry about getting more computer infections.

    Considering the new legislation:
    You should include the fact that Google has been doing this for a long time. Due to an Obama edict from the past, Google had privileges that the other ISPs did not. This new action levels the playing field.
    It probably would have been better to disallow Google rather than allowing everyone else.


  3. March 31, 2017 at 9:02 am

    This is a brilliant concept, thank you for this post.


  4. msobel
    March 31, 2017 at 9:39 am
    • April 3, 2017 at 12:45 pm

      Chaff vs. Chum!


  5. March 31, 2017 at 9:41 am

    Sorry to say, I can’t imagine there would be enough of a user base for Noiszy to work (…and it requires a large number of users to work well). Moreover, no matter what randomness you introduce into the tracking experience, algorithms will still find patterns, even if false ones. And will people want false patterns attributed to them?
    My own approach (for what little impact it has) is simply to “blacklist” companies/products that send me repeated annoying, unsuitable ads and NEVER buy from them, nor ever recommend them to others — they don’t just mis-target me, they lose sales well into the future (and I do this even though I realize they may be perfectly good companies & products who just have some crappy outfit doing their digital promotion). It’s just the old-style voting with your feet and your dollars, rather than even trying to outwit them, because they will always THINK they have the “ability to figure us out.”


  6. Todd Marshall
    March 31, 2017 at 9:42 am

    How is browsing to those 7 sites noisy? Shouldn’t it be some large, randomized collection that would nonetheless be checked for hazardous or unethical content? Anyway, a great beginning concept. It seems to sit in “waiting” mode.


  7. john
    March 31, 2017 at 10:04 am

    So, MathBabe, you created Noiszy?

    So why are you forcing us all to pony up our email addresses to get a download link for Noiszy? It seems if you really wanted to show you are not part of the marketing crowd, you would just freely distribute the plugin.

    As one HN reader pointed out, your plugin is for Chrome only and has Google Analytics attached to it.


    • tim
      March 31, 2017 at 10:43 pm

      i don’t have to put my email in? just click on ‘get plugin’?


  8. March 31, 2017 at 11:44 am

    Cathy/Angela, I see global benefits if widely adopted, but how would you summarize the personal benefits, in the form of protection for any single user, even without widespread adoption?


  9. DBear
    March 31, 2017 at 12:04 pm

    Internet Noise https://slifty.github.io/internet_noise/ is a website that takes the same approach, but doesn’t require users to download anything or provide any information (though you will need to leave the tab open).


  10. March 31, 2017 at 12:46 pm

    What about a program from a trusted vendor that takes over your browser and simply visits a shitload of diverse sites? That is, an easy-to-use “noise machine.”


    • March 31, 2017 at 12:50 pm

      Haha. Don’t leave your browser open around friends. Lesson learned. 🙂 “What about male models.” — Zoolander


  11. Josh
    March 31, 2017 at 1:14 pm

    Thank you very much.

    Do you plan to create a version for Firefox or know of any similar plugins?


  12. RTG
    March 31, 2017 at 2:03 pm

    I’ve often thought about doing something like this. I actually try to do it with my broader digital footprint, cordoning off emails between multiple accounts etc. But I’m sure that anything I do intentionally is unlikely to add enough noise to my data.

    It’s an interesting concept, and a way to circumvent Congress’s unwillingness to protect basic privacies.


  13. March 31, 2017 at 2:15 pm

    I check the sales of my four published novels regularly, and they try to sell me my own books.


  14. Margaret Thorpe
    March 31, 2017 at 4:55 pm

    Great idea! I will definitely be trying Noiszy 🙂 But, I imagine for this really to have much of an effect on the bigger problem, a majority of online users would have to be generating this kind of random data, no?


  15. April 1, 2017 at 1:07 am

    This will destroy your ability to use your own browsing history for anything useful, as well as increase the data you use, and the amount of temporary internet files crufting up your computer. I like the concept, but it needs tweaking.


  16. April 1, 2017 at 11:49 pm

    Seems like a lot of work. Maybe we could just write an algorithm to automate this. What could go wrong?!


  17. LBY
    April 2, 2017 at 5:15 pm

    Is Noiszy open source? Where is the code?


  18. Peter Mork
    April 3, 2017 at 11:32 am

    How does Noiszy compare with efforts like TrackMeNot? The latter has the advantage of also obfuscating search engine profiles.


  19. Tam Doey
    April 3, 2017 at 4:16 pm

    I worry about random visits to porn sites or other undesirable destinations, which might end up blowing up in my face! That’s my paranoia for you.


    • April 3, 2017 at 4:17 pm

      Think about it, though, you can always say hey, that wasn’t me, that was my randomized browser plug-in. I’m thinking of using that excuse even though I haven’t installed it.


    • Peter Mork
      April 3, 2017 at 4:26 pm

      Noiszy only generates traffic to news sites: NBC, CBS, ABC, CNN, Atlantic, MSNBC, Fox, Breitbart, and Shareblue. The last two are off by default, it seems.


  20. dmf
    April 4, 2017 at 11:45 am

    Reblogged this on Deterritorial Investigations.


  21. Klondike Jack
    April 5, 2017 at 9:51 am

    There is a broader advantage to doing this, no matter what the downsides might be, as others have pointed out. An example from the bill collection industry: they buy and sell accounts receivable on a massive scale, and the majority of what is sold are accounts that are known to be useless, with bad contact info to match. We have been getting calls for years, over a decade, from a long series of collection agencies, all looking for the same non-existent person. I amuse myself by telling them this, but we keep getting calls. Bad names/data NEVER get purged. The industry generates its own noise, its own inefficiency, because selling the data is as much a part of revenue as actual collections, maybe more. When personal data becomes a commodity in and of itself, there is a big disincentive to discarding any of it, to throwing money away. Talk about externalizing your costs! To others in your own industry!! Not the kind of competition we want to see.
    Marketers try to sell us stuff we really do not need and that we have no room for. In the direct mail industry of old, a 2% or so response rate (sales) was considered good. The inefficiency in marketing has always been there. The self storage industry thrives on this, but I consider their growth to be a good proxy for the many downsides of runaway consumerism, of things produced and sold for the sole purpose of generating income, not because they are useful to the buyers. We are owned by our possessions, a nation of hoarders and shopaholics.
    The system of blindly commodified data needs to be as severely burdened as possible to try to send a wake up call about the proliferation of server farms, not just to improve the data itself. They consume a surprisingly big fraction of produced energy, and the portion of that being used to store and process useless data can only increase. Extrapolate this to the surveillance state. Decisions made based on bad data, ones that can change lives. I think you get the idea. Has anyone examined what the point of diminishing returns is when it comes to big (noisy) data and how willing its users are to absorb that inefficiency in pursuit of their goals? Viewed from afar, all this looks a whole lot like a Complexity Problem death spiral.
    There is no way to avoid entropy; it simply can’t be done. So the decision has to be made based on where the least damaging and wasteful place to locate the entropy is, and on what kind of inefficiency is the least damaging and wasteful. I seriously doubt that these “bad data plaques” building up in the circulatory systems of our IT infrastructure are the right place for that. Where the better places might actually be escapes me at the moment, but my strong suspicion is that keeping it in the non-commodified, flesh-and-blood gray matter sector is better, due to our innate ability to ignore stuff we see as useless and then re-acquire stuff that becomes useful.


  22. Laura H. Chapman
    April 11, 2017 at 9:06 am

    I have yet to see serious discussion of how ditching privacy will influence K-12 education, where online testing is the rule and the only privacy protections were FERPA and COPPA. Think also about HIPAA and the mandated use of e-filings for the IRS. Anyone who has looked at ISP privacy statements knows they were practically worthless; now Congress has said, in effect, make as much money as you can on data sales. We have leveled the playing field for you so you can compete with Google, Amazon, etc. Maps of ISP services show that vast areas of the country have one ISP. The big winners are well known, top rated in speed, users, and integration of services.


  23. Ralph Trickey
    April 14, 2017 at 1:18 pm

    Someone just released another tool to help fight ISPs. It uses Google’s Safe Search, which should help avoid visiting bad sites. It’s harder to set up than Noiszy but seems like a robust approach.
    https://github.com/essandess/isp-data-pollution

