I’m in Berkeley this week, where I gave two talks (here are my slides from Monday’s talk on recommendation engines, and here are my slides from Tuesday’s talk on modeling) and I’ve been hanging out with math nerds and college friends and enjoying the amazing food and cafe scene. This is the freaking life, people.
Here’s what’s been on my mind lately: the urgent need for good data journalism. If you read this Washington Post blog by Max Fisher you will get at one important angle of the problem. The article talks about the need for journalists to be competent in basic statistics and exploratory data analysis to do reasonable reporting on data, in this case the state of journalistic freedoms.
And you might think that, as long as journalists report on other stuff that’s not data heavy, they’re safe. But I’d argue that the proliferation of data is leaking into all corners of our culture, and basic data and computing literacy is becoming increasingly vital to the job of journalism.
Here’s what I’m not saying (a la Miss Disruption): learn to code, journalists, and everything will be cool. To be clear, having data skills is necessary but not sufficient.
So it’s more like, if you don’t learn to code, and even more importantly if you don’t learn to be skeptical of the models and the data, then you will have yet another obstacle between you and the truth.
Here’s one way to think about it. A few days ago I wrote a post about different ways to define and regulate discriminatory acts. On the one hand you have acts or processes that are “effectively discriminatory” and on the other you have acts or processes that are “intentionally discriminatory.”
In this day and age, we have complicated, opaque, and proprietary models: in other words, a perfect hiding place for bad intentions. It would be idiotic for someone with the intention of being discriminatory to do so outright. It’s much easier to embed such a thing in an opaque model where it will seem unintentional and will probably never be discovered at all.
But how is an investigative journalist going to even approach that? The first thing they need is to arm themselves with the right questions and the right attitude. And it wouldn’t hurt if they or their team could perform a test on the data and algorithm as well.
I’m not saying that we’re going to suddenly have do-everything superhuman journalists. Just as the list of job requirements for data scientists is outrageously long and nobody can be expert at everything, we will have to form teams of journalists that as a whole have lots of computing and investigative expertise.
The alternative is that the models go unchallenged, which is a really bad idea.
Here’s a perfect example of what I think needs to happen more: when ProPublica reverse-engineered Obama’s political messaging model.
There’s a wicked irony when it comes to many privacy advocates.
They are often narrowly focused on their own individual privacy issues, but when it comes down to it they are typically super educated well-off nerds with few revolutionary thoughts. In other words, the very people obsessing over their privacy are people who are not particularly vulnerable to the predatory attacks of either the NSA or the private companies that make use of private data.
Let me put it this way. If I’m a data scientist working at a predatory credit card firm, seeking to build a segmentation model to target the most likely highly profitable customers – those that ring up balances and pay off minimums every month, sometimes paying late to accrue extra fees – then if I am profiling a user and notice an ad blocker or some other signal of privacy concerns, chances are that becomes a wealth indicator and I leave them alone. The mere presence of privacy concerns signals that this person isn’t worth pursuing with my manipulative scheme.
If you don’t believe me, take a look at a recent Slate article written by Cyrus Nemati and entitled Take My Data Please: How I learned to stop worrying and love a less private internet.
In it he describes how he used to be privacy obsessed, for no better reason than that he liked to stick up a middle finger to those who would collect his data. I think that article should have been called something like, Well-educated white guy was a privacy freak until he realized he didn’t have to be because he’s a well-educated white guy.
He concludes that he really likes how well customized things are to his particular personality, and that shucks, we should all just appreciate the web and stop fretting.
But here’s the thing, the problem isn’t that companies are using his information to screw Cyrus Nemati. The problem is that the most vulnerable people – the very people that should be concerned with privacy but aren’t – are the ones getting tracked, mined, and screwed.
In other words, it’s silly for certain people to be scrupulously careful about their private data if they are the types of people who get great credit card offers and have a stable well-paid job and are generally healthy. I include myself in this group. I do not prevent myself from being tracked, because I’m not at serious risk.
And I’m not saying nothing can go wrong for those people, including me. Things can, especially if they suddenly lose their jobs or they have kids with health problems or something else happens which puts them into a special category. But generally speaking those people with enough time on their hands and education to worry about these things are not the most vulnerable people.
I hereby challenge Cyrus Nemati to seriously consider who should be concerned about their data being collected, and how we as a society are going to address their concerns. Recent legislation in California is a good start for kids, and I’m glad to see the New York Times editors asking for more.
This is a guest post by Leopold Dilg.
There’s little chance we can underestimate our American virtues, since our overlords so seldom miss an opportunity to point them out. A case in point – in fact, le plus grand du genre, though my fingers tremble as I type that French expression, for reasons I’ll explain soon enough – is the Cadillac commercial that interrupted the broadcast of the Olympics every few minutes.
A masterpiece of casting and directing and location scouting, the ad follows a middle-aged man, muscular enough but not too proud to show a little paunch – manifestly a Master of the Universe – strutting around his chillingly modernist $10 million vacation house (or is it his first or fifth home? no matter), every pore oozing the manly, smirky bearing that sent Republican country-club women swooning over W.
It starts with Our Hero, viewed from the back, staring down his infinity pool. He pivots and stares down the viewer. He shows himself to be one of the more philosophical species of the MotU genus. “Why do we work so hard?” he puzzles. “For this? For stuff?….” We’re thrown off balance: Will this son of Goldman Sachs go all Walden Pond on us? Fat chance.
Now, still barefooted in his shorts and polo shirt, he’s prowling his sleek living room (his two daughters and stay-at-home wife passively reading their magazines and ignoring the camera, props in his world no less than his unused pool and The Car yet to be seen) spitting bile at those foreign pansies who “stop by the café” after work and “take August off!….OFF!” Those French will stop at nothing.
“Why aren’t YOU like that,” he says, again staring us down and we yield to the intimidation. (Well gee, sir, of course I’m not. Who wants a month off? Not me, absolutely, no way.) “Why aren’t WE like that” he continues – an irresistible demand for totalizing merger. He’s got us now, we’re goose-stepping around the TV, chanting “USA! USA! No Augusts off! No Augusts off!”
No, he sneers, we’re “crazy, hardworking believers.” But those Frogs – the weaklings who called for a double-check about the WMDs before we Americans blasted Iraqi children to smithereens (whoops, someone forgot to tell McDonald’s, the official restaurant of the U.S. Olympic team, about the Freedom Fries thing; the offensive French Fries are THERE, right in our faces in the very next commercial, when the athletes bite gold medals and the awe-struck audience bites chicken nuggets, the Lunch of Champions) – might well think we’re “nuts.”
“Whatever,” he shrugs, end of discussion, who cares what they think. “Were the Wright Brothers insane? Bill Gates? Les Paul?… ALI?” He’s got us off-balance again – gee, after all, we DO kinda like Les Paul’s guitar, and we REALLY like Ali.
Of course! Never in a million years would the hip jazz guitarist insist on taking an August holiday. And the imprisoned-for-draft-dodging boxer couldn’t possibly side with the café-loafers on the WMD thing. Gee, or maybe…. But our MotU leaves us no time for stray dissenting thoughts. Throwing lunar dust in our eyes, he discloses that WE were the ones who landed on the moon. “And you know what we got?” Oh my god, that X-ray stare again, I can’t look away. “BORED. So we left.” YEAH, we’re chanting and goose-stepping again, “USA! USA! We got bored! We got bored!”
Gosh, I think maybe I DID see Buzz Aldrin drumming his fingers on the lunar module and looking at his watch. “But…” – he’s now heading into his bedroom, but first another stare, and pointing to the ceiling – “…we got a car up there, and left the keys in it. You know why? Because WE’re the only ones goin’ back up there, THAT’s why.” YES! YES! Of COURSE! HE’S going back to the moon, I’M going back to the moon, YOU’RE going back to the moon, WE’RE ALL going back to the moon. EVERYONE WITH A U.S. PASSPORT is going back to the moon!!
Damn, if only the NASA budget wasn’t cut after all that looting by the Wall Street boys to pay for their $10 million vacation homes, WE’D all be going to get the keys and turn the ignition on the rover that’s been sitting 45 years in the lunar garage waiting for us. But again – he must be reading our mind – he’s leaving us no time for dissent, he pops immediately out of his bedroom in his $12,000 suit, gives us the evil eye again, yanks us from the edge of complaint with a sharp, “But I digress!” and besides he’s got us distracted with the best tailoring we’ve ever seen.
Finally, he’s out in the driveway, making his way to the shiny car that’ll carry him to lower Manhattan. (But where’s the chauffeur? And don’t those MotUs drive Maseratis and Bentleys? Is this guy trying to pull one over on the suburban rubes who buy Cadillacs stupidly thinking they’ve made it to the big time?)
Now the climax: “You work hard, you create your own luck, and you gotta believe anything is possible,” he declaims.
Yes, we believe that! The 17 million unemployed and underemployed, the 47 million who need food stamps to keep from starving, the 8 million families thrown out of their homes – WE ALL BELIEVE. From all the windows in the neighborhood, from all the apartments across Harlem, from Sandy-shattered homes in Brooklyn and Staten Island, from the barren blast furnaces of Bethlehem and Youngstown, from the foreclosed neighborhoods in Detroit and Phoenix, from the 70-year-olds doing Wal-mart inventory because their retirement went bust, from all the kitchens of all the families carrying $1 trillion in college debt, I hear the national chant, “YOU MAKE YOUR OWN LUCK! YOU MAKE YOUR OWN LUCK!”
And finally – the denouement – from the front seat of his car, our Master of the Universe answers the question we’d all but forgotten. “As for all the stuff? That’s the upside of taking only two weeks off in August.” Then the final cold-blooded stare and – too true to be true – a manly wink, the kind of wink that makes us all collaborators and comrades-in-arms, and he inserts the final dagger: “N’est-ce pas?”
I am looking into the history of anti-discrimination laws like the Equal Credit Opportunity Act (ECOA) and how it got passed, and hopefully I’ll find data to measure how well it’s worked since it got passed in 1974.
Putting aside the history of this legislation for now – although it is fascinating – I’d like to talk this morning about this paper from 1989 written by Gregory Elliehausen and Thomas Durkin from the Board of Governors of the Federal Reserve System, which discusses the abstract question of approaches to defining and regulating discrimination.
This came up because when Congress passed ECOA, they left it to the regulators – in this case the Federal Reserve – to decide exactly how to write the rules, which pertain to credit decisions (think credit card offerings). From the article:
The term “discriminate against an applicant” was defined in Section 202.2(n) as meaning “to treat an applicant less favorably than other applicants.” By itself, this rule does not offer an unquestionably unambiguous operational definition of socially unacceptable discrimination in a screening context where limited selections are constantly being made from a longer list of applicants.
The paper then goes on to list 3 separate regulatory approaches to anti-discrimination regulation. I have found these three definitions really interesting and thought-provoking. I won’t even go into the rest of the paper on this post because I think just this list of three approaches is so interesting. Tell me if you agree.
1) The “effects-based” approach to regulation. This is the idea that we don’t need to know how you actually make credit decisions, but if the effect is that no women or minorities ever get credit from you, then you’re doing something wrong. If you want to be really extreme in this category you get to things like quotas. If you want to be less extreme you think about studying applications that are similar except for one thing like race or gender, kind of like the male vs. female science lab application test studied here. Needless to say, effects-based regulation is not in use; it’s considered too extreme.
2) The “intent-based” approach to regulation. This is where you have to prove intent to discriminate. It’s super rare that you can do that, because it’s super rare that people aiming to discriminate are dumb enough to make it obvious. Far easier to embed discrimination in a model where you can maintain plausible deniability. Although intent-based regulation is considered too extreme in the other direction, it seems to be what surfaces when there’s a legal case (although I’m not a legal expert).
3) The “practices-based” approach to regulation. This is where you make a list of acceptable or unacceptable practices in extending credit and hope you cover everything. So for example you aren’t allowed to explicitly use race or marital status or governmental assistance status in your credit models. This is what the Fed finally decided to use, and it makes sense in that it’s easy to implement, but of course the lists change over time, and that’s the key issue (for me anyway): we need to update those lists in the age of big data.
Tell me if you think there’s yet another approach not mentioned. And note these regulatory approaches correspond to different ways of thinking about or even defining discrimination, which is itself a great reason to list them comprehensively. I think my future discussions about what constitutes discrimination will be informed by which of the above approaches would pick up on a given instance.
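For the nerds in the audience, here is a minimal Python sketch of what the milder, effects-based kind of test could look like: compare approval rates across groups, and then re-run each application with only one attribute flipped to see whether the decision changes. Everything here is hypothetical and made up for illustration: `toy_decide` stands in for an opaque credit model we can query but not read, the three applications are invented, and `approval_rates` and `paired_flip_test` are names I chose, not anything from the paper or from a regulator.

```python
def approval_rates(decisions):
    """Map each group to its approval rate, given (group, approved) pairs."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {group: approved[group] / totals[group] for group in totals}


def paired_flip_test(decide, applications, attribute, value_a, value_b):
    """Count applications whose decision changes when only `attribute` changes."""
    changed = 0
    for app in applications:
        decision_a = decide({**app, attribute: value_a})
        decision_b = decide({**app, attribute: value_b})
        if decision_a != decision_b:
            changed += 1
    return changed


def toy_decide(app):
    # Hypothetical stand-in for an opaque model: it quietly applies a
    # higher income threshold to women.
    threshold = 40_000 if app["gender"] == "M" else 50_000
    return app["income"] >= threshold


# Made-up applications, for illustration only.
apps = [
    {"income": 45_000, "gender": "F"},
    {"income": 60_000, "gender": "M"},
    {"income": 30_000, "gender": "F"},
]

rates = approval_rates((app["gender"], toy_decide(app)) for app in apps)
flips = paired_flip_test(toy_decide, apps, "gender", "M", "F")
print(rates)
print(f"decisions that change when only gender changes: {flips}")
```

Note that neither function needs to see inside the model, which is the whole point of looking at effects: all you need is the ability to feed it applications and record what comes back.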
Aunt Pythia has some exciting news.
After spending about 5 days of the last 7 in bed with an awful flu, and finishing off both seasons of House of Cards (with the associated feeling of being simultaneously drowned in cynicism and phlegm), Aunt Pythia started in on Battlestar Galactica, which she honestly should have done years ago.
And do you know who stars in that series, at least in Season 2? None other than yours truly, Pythia the Oracle of Delphi! I am honored, and I hope you are honored by association. Go ahead, feel the honor.
After you enjoy my column (and the honor!) today please don’t forget to:
think of something to ask Aunt Pythia at the bottom of the page!
Dear Aunt Pythia,
I gave a talk at this year’s JMM in Baltimore. It was one of those super rushed 10-minute talks. But giving any talk at all sufficed for my university to pay for my travel and lodging. That’s not to say that I didn’t take it seriously. I did. I even dressed nice for it, which I don’t normally do as a grad student and mother of a toddler. I bothered to care about a talk that has only enough time to explain its title because this year is an important year for me. It’s my last year of my PhD and I’m applying for postdocs and jobs. It’s why I attended the JMM.
My talk went well enough. I got a few questions at the end and I didn’t go over my time. And that should be the end of it. JMM is over. I can get back to stressing over my dissertation. But I got an email. An email from someone who was in the audience. He wrote to me that he enjoyed my talk and would like to meet me for dinner. He even added that this is “to be clear, a non-math invitation.”
My first thought was that I should send a reply correcting the many grammatical errors I found in his very short email. But that thought quickly changed into anger. I traveled a very long distance to work. I’m taking time away from my research, away from my 2-year-old so that I can present myself professionally to an audience of my peers and potential employers. I hope and expect to be treated like a real scientist. I remembered all the stories, all of the frustration of so many of my friends and colleagues, scientists who also happen to be women, who were treated with anything but respect just because they weren’t born with a penis. I was insulted, furious that some stupid little boy thought that this sort of behavior is appropriate.
But there was always the small chance that he is, in fact, stupid – in certain ways. After all, this is a math conference. There are mathematicians who, while brilliant, may not have (let’s just say) mature social skills. (Though this guy’s probably not too clever since a quick Google search would have revealed that I have a webpage containing a photo of me and my family and am therefore not likely to be interested in dating.)
I replied with an invitation to meet for lunch. So that I can verify that he’s not developmentally challenged and confirm his implied intention. And then yell at him to his face. He didn’t end up showing, even though he sounded eager to meet in the multiple emails he sent following my response. He was probably scared away by the large crowd of my friends that had gathered around our meeting place to support me or, more likely, to witness the spectacle.
Most of the men I spoke to about this incident were sympathetic to the poor idiotic horny kid who clearly had no idea how to talk to girls. They recalled some embarrassing moments from their youth and said that I should have just mercifully sent him a gentle rejection.
I, on the other hand, find his action to be a stark example of how women are not taken seriously in science and feel he should be told that this sort of behavior is not excusable. Granted, a public shaming may not have been warranted. But I think that I am right to feel insulted in this situation.
I’m still thinking about emailing this guy and telling him off. My friend (who is usually a feminist) thinks that while the guy had absolutely no tact and needs some guidance on interacting with other humans, finding a speaker attractive and approaching her at a conference is not wrong. He thinks that had the guy joined me and my friends for drinks after my talk and then later admitted to his interest in me, I would not have been offended. I disagree.
What do you think? Am I overreacting?
Scientista (in training)
Wow, that was a really long question, but I decided to publish it all anyway, because I can see you earnestly want my advice. Not so sure you’re going to like my advice though.
Because here’s the thing, you are absolutely overreacting. I mean, that’s ok, and no actual harm done, but what a huge amount of time wasted at JMM where you could have been doing math, drinking bourbon, or playing bridge.
That’s not to say I like what the guy did, it was definitely obtuse to the point of idiocy, but there you have it, he’s an idiot. Best thing to do in that situation is to delete the email and not give it another thought.
I mean, I guess there might have been a side benefit for the rest of the math community in this planned public shaming, if word had gotten out that this guy had written such an unsolicited and unwelcome email. It might have given pause to the 450 other such emails that happened that weekend. Or not.
Also, I think we should be careful to separate your efforts in preparing your talk and coming to the conference, which were real, from this guy’s sexual interest. I’m guessing that, had you gotten 5 emails talking about the math and how awesome it is, and this email to boot, you would have been able to shrug this one off. It’s the unfortunate nature of short talks that they take a lot to prepare for but there’s little chance of getting good feedback. But let’s not take out that frustration on him entirely.
In one way I’d like to defend this guy: at least he made his explicit desires known. It would have been worse, in my opinion, if he’d come up with some math pretext for meeting and then put his hand on your knee at lunch.
Plus, I’d like to take this opportunity to defend sex at math conferences in general. I mean, it’s one of the classic ways of blowing off some steam after a long day of whirlwind 10-minute talks, married or unmarried.
Finally, and I hope this doesn’t sound too harsh, I’d like to give you some general advice. You are a woman in math, which means you are a warrior, even if you didn’t want to sign up for that. And the best and easiest way to be a warrior is to have a thick skin, to remember the victories, and to ignore the defeats.
And I don’t mean stay quiet about awful, actionable sexism that threatens your job or your responsibilities at work, but I do mean deleting idiotic emails without a second thought, from now on.
Dear Aunt Pythia,
Given that the entire financial industry seems to be loaded with unethical behavior, what do you think are ethical ways to invest your money? Certainly choosing credit unions over large banks seems to be a good way for your savings but I am curious about how you would invest for retirement. Do you think there are ethical ways to invest in stocks, bonds, etc?
Serious Pondering About Money
I get asked this a lot, but I don’t have a good answer. And honestly I worry more about people who don’t have any money saved for retirement at all, and are stuck in student or medical debt.
If you really want my advice, I’d say there are three things you could or should worry about regarding savings: liquidity, risk, and ethics. You may have more things you worry about, but this is just a starting point. I’d suggest you divide your money up into those categories, depending on how you weight the associated concerns.
For the liquidity part, keep cash in a savings account (FDIC insured) or a money market account (not FDIC insured). For the risk part, invest in an ETF for the overall market, because we’ve seen that the government props up the market so you want to ride that buffered wave whilst minimizing fees. For the ethical part, track down a company – or even an individual – doing stuff you think is good for the world and invest in it. It’s highly illiquid and highly risky to do that, but you’ve already taken care of those concerns.
Dear Aunt Pythia,
Is it OK to review NSA grant proposals?
You might have seen Beilinson’s letter to the AMS Notices exhorting mathematicians to break ties with the NSA. I kind of sympathize with it. The AMS helps the NSA administer its grants program and I recently got two proposals to referee. These were from young mathematicians that I hold in high regard and think deserve to be funded. As NSF funding is dwindling, if they don’t get the NSA grant they might be unfunded. Moreover, I am knowledgeable about their work and felt that if I turned down the request it would be bad for them, so I decided to review the proposals. Have I done the right thing?
Not Sure Actually
I feel your pain. The funding is drying up for these worthy researchers, but you’d rather not feel like a collaborator. Those are directly conflicting issues.
And it’s exactly what I fear when I think of the oncoming MOOC revolution and the end of math research. Who is going to fund math research when calculus is gone? The obvious answer is private companies, private individuals, and places like the NSA. Not a pretty picture.
My best advice for you is to review the proposals because you want those researchers funded – and feel slightly better that they’re doing research external to the NSA – and at the same time get involved with solving the larger funding problem for mathematics. This could mean going to talk to your congressperson about the need for mathematical funding or it could mean spreading the word more generally about the importance of math research.
Dear Auntie Pythia,
The Facebook Data Science folks posted a series of blog posts about love (or at least relationships). As a data scientist and sex oracle, what do you make of the results and/or on the use of social network data for these kinds of studies?
Lots Of Valentine’s Extrapolations
Wow, thanks for the link. I happen to know the author, Mike Develin, of those posts, first because he was a (brilliant) student of mine at math camp way back in like 1993, and second because we worked at D.E. Shaw together – although he worked in the California office.
So anyhoo, I like the posts. They’re smart. The one thing I’d say, for example about the age difference of couples in different countries, is that I have to assume there’s a bias away from older middle-aged couples and towards couples where the husband is old and the wife is young. Here’s a picture:
I say this because, even if both members of the couple are on Facebook (and that already skews somewhat young), I would guess older people are less likely to divulge their marital status. That kind of thing makes me think we should look at these charts with the caveat that they are true “in the context of Facebook data”.
In terms of the ethics of this kind of use of aggregated data, I’d say it’s great. The stuff I think is scary is the stuff that isn’t aggregated and is hidden from us.
Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!
Every now and then you see a published result that has exactly the right kind of data, in sufficient amounts, to make the required claim. It’s rare but it happens, and as a data lover, when it happens it is tremendously satisfying.
Today I want to share an example of that happening, namely with this paper entitled Regulating Consumer Financial Products: Evidence from Credit Cards (hat tip Suresh Naidu). Here’s the abstract:
We analyze the effectiveness of consumer financial regulation by considering the 2009 Credit Card Accountability Responsibility and Disclosure (CARD) Act in the United States. Using a difference-in-difference research design and a unique panel data set covering over 150 million credit card accounts, we find that regulatory limits on credit card fees reduced overall borrowing costs to consumers by an annualized 1.7% of average daily balances, with a decline of more than 5.5% for consumers with the lowest FICO scores. Consistent with a model of low fee salience and limited market competition, we find no evidence of an offsetting increase in interest charges or reduction in volume of credit. Taken together, we estimate that the CARD Act fee reductions have saved U.S. consumers $12.6 billion per year. We also analyze the CARD Act requirement to disclose the interest savings from paying off balances in 36 months rather than only making minimum payments. We find that this “nudge” increased the number of account holders making the 36-month payment value by 0.5 percentage points.
That’s a big savings for the poorest people. Read the whole paper, it’s great, but first let me show you some awesome data broken down by FICO score bins:
This data, and the results in this paper, fly directly in the face of the myth that if you regulate away predatory fees in one way, they will pop up in another way. That myth is based on the assumption of a competitive market with informed participants. Unfortunately the consumer credit card industry, as well as the small business card industry, is not filled with informed participants. This is a great example of how asymmetric information causes predatory opportunities.
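If you haven’t run into a difference-in-difference design before, the basic idea fits in a few lines of Python: take the change in the group affected by the regulation and subtract off the change in a comparison group that wasn’t affected, so that whatever was happening to everyone anyway cancels out. This is only a sketch of the textbook estimator, not the paper’s actual specification, and the four numbers below are hypothetical values I made up for illustration, not figures from the paper.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical average fees as a percent of balances; NOT from the paper.
effect = diff_in_diff(
    treated_before=5.0,
    treated_after=3.0,
    control_before=4.0,
    control_after=3.5,
)
print(f"estimated effect: {effect:+.1f} percentage points")
```

With these toy numbers the treated group dropped by 2 points and the comparison group by half a point, so the estimate attributes a 1.5 point drop to the regulation. The estimate is only as good as the assumption that the two groups would have moved in parallel without the rule change.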
Yesterday a couple of people sent me this article about mysterious deaths at JP Morgan. There’s no known connection between them, but maybe it speaks to some larger problem?
I don’t think so. A little back-of-the-envelope calculation tells me it’s not at all impressive, and this is nothing but media attention turned into conspiracy theory with the usual statistics errors.
Here are some numbers. We’re talking about 3 suicides over 3 weeks. According to Wikipedia, JP Morgan has 255,000 employees, and also according to Wikipedia, the U.S. suicide rate for men is 19.2 per 100,000 per year, and for women is 5.5. The suicide rates for Hong Kong and the UK, where two of the suicides took place, are much higher.
Let’s eyeball the overall rate at 19 since it’s male dominated and since many employees are overseas in higher-than-average suicide rate countries.
Since 3 weeks is about 1/17th of a year, we’d expect to see about 19/17 suicides per 100,000 employees in 3 weeks, and since we have 255,000 employees, that means about 19/17*2.55 = 2.85 suicides in that time. We had three.
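Here is the same back-of-the-envelope calculation as a small Python sketch, using the eyeballed rate of 19 per 100,000 per year, 255,000 employees, and a window of 1/17th of a year. The second function goes one step past the arithmetic above and is my own added assumption: if suicides were independent events arriving at a constant rate (a Poisson model, which is surely too simple), it gives the chance of seeing at least three in the window.

```python
import math


def expected_events(rate_per_100k_per_year, population, fraction_of_year):
    """Expected number of events in the window, assuming a constant rate."""
    return rate_per_100k_per_year * (population / 100_000) * fraction_of_year


def poisson_prob_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean); the Poisson model is an assumption."""
    prob_below_k = sum(
        math.exp(-mean) * mean**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - prob_below_k


expected = expected_events(19, 255_000, 1 / 17)
p_three_or_more = poisson_prob_at_least(3, expected)
print(f"expected in 3 weeks: {expected:.2f}")
print(f"P(at least 3) under the Poisson assumption: {p_three_or_more:.2f}")
```

Under that assumption, seeing three or more in a three-week window is a bit more likely than not.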
This isn’t to say we’ve heard about all the suicides, just that we expect to see about one suicide a week considering how huge JP Morgan is. So let’s get over this, it’s normal. People commit suicide pretty regularly.
It’s very much like how we heard all about suicides at Foxconn, but then heard that the suicide rate at Foxconn is lower than that of the general Chinese population.
There is a common statistical problem called the clustering illusion, whereby actually random events look clustered sometimes. Here’s a 2-dimensional version of the clustering illusion:
Actually my calculation above points to something even dumber, which is that we expected 2.85 suicides and we saw 3, so it’s not even a proven cluster. Although it could be, because again we probably didn’t hear about all of them. Maybe it’s a cluster of “really obvious jump-from-a-building” suicides.
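If you want to see the clustering illusion for yourself, here is a rough simulation sketch in Python. It scatters a year’s worth of events uniformly at random over 52 weeks and reports the busiest stretch of 3 consecutive weeks, averaged over many simulated years. The setup is an assumption on my part: 48 events per year is just the eyeballed 19 per 100,000 times 2.55 rounded down, the events are treated as independent, and the function names are made up.

```python
import random


def busiest_window(n_events, n_weeks=52, window=3, rng=None):
    """Scatter n_events uniformly over n_weeks; return the largest count
    found in any run of `window` consecutive weeks."""
    if rng is None:
        rng = random.Random()
    per_week = [0] * n_weeks
    for _ in range(n_events):
        per_week[rng.randrange(n_weeks)] += 1
    return max(
        sum(per_week[start:start + window])
        for start in range(n_weeks - window + 1)
    )


def average_busiest_window(n_years=1000, n_events=48, seed=0):
    """Average busiest 3-week count over many simulated years."""
    rng = random.Random(seed)
    peaks = [busiest_window(n_events, rng=rng) for _ in range(n_years)]
    return sum(peaks) / len(peaks)


average_peak = average_busiest_window()
print(f"average busiest 3-week window: {average_peak:.1f} events")
```

The thing to compare against is the roughly 2.8 events you’d expect in a typical 3-week window. The busiest window in a purely random year should come out noticeably higher than that, and the busiest window is exactly the one that would make the news.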
And I’m not saying JP Morgan is a nice place to work. I feel suicidal just thinking about working there myself. But I don’t want us to jump to any statistically unsupported conclusions.