Gaming the Google mail filter and the modeling feedback loop

October 15, 2012 Cathy O'Neil, mathbabe

The gmail filter

If you’re like me, a large part of your life takes place in your gmail account. My gmail address is the only one I use, and I am extremely vigilant about reading emails – probably too much so.

On the flip side, I spend quite a bit of energy removing crap from my gmail. When I have the time and opportunity, and I receive an unwanted email, I will set a gmail filter instead of just deleting it. This is usually in response to mailing lists I get on by buying something online, so it’s not quite spam. For obvious spam I just click on the spam icon and it disappears.

You see, when I check out online to pay for my stuff, I am not incredibly careful about making sure I’m not signing up to be on a mailing list. I just figure I’ll filter anything I don’t want later.

Which brings me to the point. I’ve noticed lately that, more and more often, the filter doesn’t work, at least on the automatic setting. If you open an email you don’t want, you can click on “filter messages like these” and it will automatically fill out a filter form with the “from” email address that is listed.

More and more often, these quasi-spammers are getting around this somehow. I don’t know how they do it, because it’s not as simple as changing their “from” address every time, which would work pretty well. Somehow not even the email I’ve chosen to filter is actually deleted through this process.

I end up having to copy and paste the name of the product into a filter, but this isn’t a perfect solution either, since then if my friend emails me about this product I will automatically delete that genuine email.
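
To make the trade-off concrete, here is a minimal sketch in Python of the two kinds of rule described above: one keyed on the “from” address and one keyed on a product name. The `Email` class, the function names, the addresses and the product name “WidgetCo” are all hypothetical, and this is not how gmail implements its filters. The rotated sender is only one imaginable way a sender-based rule can miss; as noted above, the actual mechanism the marketers use is unknown.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    subject: str
    body: str


def sender_filter(email: Email, blocked_senders: set[str]) -> bool:
    """Return True if the message would be deleted based on its From address."""
    return email.sender.lower() in blocked_senders


def keyword_filter(email: Email, keyword: str) -> bool:
    """Return True if the keyword appears anywhere in the subject or body."""
    text = f"{email.subject} {email.body}".lower()
    return keyword.lower() in text


# Hypothetical messages: a promo, the same promo from a new sender, and a friend.
blocked = {"deals@widgetco.example"}
promo = Email("deals@widgetco.example", "WidgetCo sale", "Buy a WidgetCo gadget today")
rotated = Email("offers-17@mail.widgetco.example", "WidgetCo sale", "Buy a WidgetCo gadget today")
friend = Email("friend@example.org", "Question", "Have you tried that WidgetCo gadget?")

for name, msg in [("promo", promo), ("rotated", rotated), ("friend", friend)]:
    print(name, sender_filter(msg, blocked), keyword_filter(msg, "WidgetCo"))
```

The sender rule catches only the first message, while the keyword rule catches all three, including the genuine email from the friend.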

The modeling feedback loop

This is a perfect example of the feedback loop of modeling; first there was a model which automatically filled out a filter form, then people in charge of sending out mailing lists for products realized they were being successfully filtered and figured out how to game the model. Now the model doesn’t work anymore.

The worst part of the gaming strategy is how well it works. If everybody uses the filter model, and you are the only person who games it, then you have a tremendous advantage over other marketers. So the incentive for gaming is very high.

Note this feedback loop doesn’t always exist: the stars and planets didn’t move differently just because Newton figured out his laws, and people don’t start writing with poorer penmanship just because we have machine learning algorithms that read envelopes at the post office.

But this feedback loop does seem to be associated with especially destructive models (think rating agency models for MBS’s and CDO’s). In particular, any model which is “gamed” to someone’s advantage probably exhibits something like this. It will work until the modelers strike back with a better model, in an escalation not unlike an arms race (note to ratings agency modelers: unless you choose to not make the model better even when people are clearly gaming it).

As far as I know, there’s nothing we can do about this feedback loop except to be keenly aware of it and be ready for war.

Categories: data science, finance

Comments (8)

Jonathan

October 15, 2012 at 9:14 am

Yes, this is a ubiquitous problem. Often the victim isn’t in a position to fight it. For instance, college testing is “gamed” and whatever the SAT measured when I was a kid is very different from what it measures now. It is very hard for people (particularly poor people who are disadvantaged) to fight this. The testing services and/or colleges could fight it. They do, to some extent, but it’s not clear they really want to fix it.

This is also a problem for economic regulators. It’s also not clear they want to fight this war (read Sheila Bair’s “Bull by the Horns”) and in any case they are outgunned.

David Wees

October 15, 2012 at 11:15 am

A really simple trick that works if you are using Gmail is to add +nospam or something similar to the name portion of your Gmail address. For example, if your regular Gmail address is davidwees@ etc…. you’d write davidwees+nospam@ etc… when signing up for the service. This produces a “to” address that is stable (since very few companies will create a filter to strip the +nospam from your email address) and easily filtered.

Unfortunately, for some providers davidwees+nospam@ etc… isn’t considered a valid email address. Another method is to create a secondary email address that you use to sign up for services, and have it forwarded to your Gmail account. That way you’ll still get the notification links required, and you can probably create a rule for the forwarded mail.
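
As a rough illustration of why the “to” address makes a stable filter key, here is a small Python sketch that pulls the tag after the + out of a recipient address and checks it against a set of tags to filter. The function names and the address `alice+nospam@example.com` are hypothetical; in Gmail itself you would simply put the tagged address in the filter’s “To” field.

```python
from typing import Optional


def plus_tag(address: str) -> Optional[str]:
    """Return the tag after '+' in the local part, or None if there is none."""
    local, _, _domain = address.rpartition("@")
    _base, sep, tag = local.partition("+")
    return tag if sep else None


def should_filter(to_address: str, filtered_tags: set[str]) -> bool:
    """Return True if the message was sent to one of the tagged addresses."""
    tag = plus_tag(to_address)
    return tag is not None and tag.lower() in filtered_tags


# Hypothetical recipient addresses.
print(should_filter("alice+nospam@example.com", {"nospam"}))
print(should_filter("alice@example.com", {"nospam"}))
```

Because the rule looks only at the recipient, it keeps working even if the sender changes its “from” address or subject line.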

- brianvan (@brianvan)
  
  October 15, 2012 at 2:18 pm
  
  It’s incredibly annoying when an email signup form for a company or organization rejects the “+” symbol, since RFC 5322 specifically mentions it IS a valid character. It’s a bad filtering practice that simply breaks the Internet. Grrrr.
  
  - David Wees
    
    October 15, 2012 at 2:57 pm
    
    It’s just lazy developing. They are using some regular expression to evaluate the legality of your email address, and they’ve just chosen a regular expression which has a limitation that probably no one other than the developer is aware of. How many people know that a + sign is a perfectly acceptable character in an email address?
    
Diddy

October 15, 2012 at 3:16 pm

If anyone had unlimited access to gmail, imagine all the value that information has.
Oh, and deleting mails and documents doesn’t work. Their shadow copies still exist on Google servers.

Ross

October 18, 2012 at 6:22 am

A better version of the +nospam tip above is to add a random full stop (period) in your email address, so rossjamesparker@ becomes ross.jamesparker@ . Gmail ignores the full stop for delivery, but you can use it as a to: element in the filter.
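
A minimal Python sketch of the idea, assuming Gmail’s behaviour of ignoring dots (and anything after a +) in the local part for delivery. The function names and the `jane.doe@gmail.com` addresses are hypothetical, and the canonicalization is only an approximation of what Gmail does.

```python
def gmail_canonical(address: str) -> str:
    """Approximate the mailbox Gmail delivers to: drop any +tag and all dots."""
    local, _, domain = address.rpartition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local.lower()}@{domain.lower()}"


def sent_to_variant(to_address: str, signup_variant: str) -> bool:
    """Return True if the message used the exact dotted variant given at signup."""
    return to_address.lower() == signup_variant.lower()


# Both spellings reach the same mailbox...
print(gmail_canonical("jane.doe@gmail.com") == gmail_canonical("janedoe@gmail.com"))
# ...but only mail addressed to the dotted variant matches the filter.
print(sent_to_variant("jane.doe@gmail.com", "jane.doe@gmail.com"))
print(sent_to_variant("janedoe@gmail.com", "jane.doe@gmail.com"))
```

The dotted spelling also passes signup forms that reject the + character, which is the advantage over the +nospam approach.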

Adam

October 18, 2012 at 2:53 pm

I know this isn’t the substantive point, but given that you have your own domain you could give a different unique email address to each site.

Ryan Berryman

November 13, 2012 at 1:14 pm

Great article, Cathy. Here’s another example of a model feedback loop, this one in sports, that I just came across: the NCAA Football BCS rankings:

http://sports.yahoo.com/news/ncaaf--undefeated-notre-dame-stuck-at-no-3-need-help-archaic-preseason-polls-bcs.html
