I was taught that justice is a right that every American should have. Also justice should be the goal of every American. I think that’s what makes this country. To me, justice means the innocent should be found innocent. It means that those who do wrong should get their due punishment. Ultimately, it means fair treatment. So a call for justice shouldn’t offend or disrespect anybody. A call for justice shouldn’t warrant an apology.
Those who support me, I appreciate your support. But at the same time, support the causes and the people and the injustices that you feel strongly about. Stand up for them. Speak up for them. No matter what it is because that’s what America’s about and that’s what this country was founded on.
I think I will take him up on that suggestion, this morning at Citigroup Headquarters, 399 Park Avenue (near 54th Street) at 10:30am, in part inspired by Liz Warren’s speech from last week. See you there!
I’ve recently seen two very different versions of what a more data-driven Congress might look like, both emerging from the cruddy Cromnibus bill mess.
First, there’s this Bloomberg article, written by the editors, about using data to produce evidence on whether a given policy is working or not. Given what I know about how data is produced, and how definitions of success are politically manipulated, I don’t have much hope for this idea.
Second, there was a reader’s comment on this New York Times article, also about the Cromnibus bill. Namely, the reader was calling on the New York Times not only to explore a few facts about what was contained in the bill, but to lay them out with more numbers and more consistency. I think this is a great idea. What if, when Congress gave us a shitty bill, we could see stuff like:
- how much money is allocated to each thing, both raw dollars and as a percentage of the whole bill,
- who put it in the omnibus bill,
- the history of that proposed spending, and the history of voting,
- which lobbyists were pushing it, and who gets paid by them, and ideally
- all of this would be in an easy-to-use interactive.
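To make the first two bullets concrete, here is a minimal Python sketch of the kind of table I mean: each line item with its raw dollars, its share of the whole bill, and who put it there. Everything in it is an assumption for illustration: the `LineItem` fields, the function name `allocation_table`, and the dollar figures, which are made up and do not come from any real bill.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    # Hypothetical fields; no real schema for bill data is implied.
    description: str
    dollars: float
    sponsor: str


def allocation_table(items):
    """Return (description, sponsor, dollars, percent of bill) rows, largest first."""
    total = sum(item.dollars for item in items)
    rows = [
        (item.description, item.sponsor, item.dollars, 100.0 * item.dollars / total)
        for item in items
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    bill = [
        LineItem("Program A", 600.0, "Rep. X"),
        LineItem("Program B", 300.0, "Sen. Y"),
        LineItem("Rider C", 100.0, "Rep. Z"),
    ]
    for description, sponsor, dollars, pct in allocation_table(bill):
        print(f"{description:<10} {sponsor:<8} ${dollars:>8,.0f} {pct:5.1f}%")
```

The voting history and lobbying columns would just be more fields on each line item; the hard part is getting the data, not displaying it.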
That’s the kind of data that I’d love to see. Data journalism is an emerging field, and we might not be there yet, but it’s something to strive for.
As I already wrote, last Friday I attended a one-day workshop in Montreal called FATML: Fairness, Accountability, and Transparency in Machine Learning. It was part of the NIPS conference for computer science, and there were tons of nerds there, and I mean tons. I want to give a report on the day, as well as some observations.
First of all, I am super excited that this workshop happened at all. When I left my job at Intent Media in 2011 with the intention of studying these questions and eventually writing a book about them, they were, as far as I know, on nobody else’s radar. Now, thanks to the organizers Solon and Moritz, there are communities of people from law, computer science, and policy circles coming together to exchange ideas and strategies for tackling these problems. This is what progress feels like!
OK, so on to what the day contained and my copious comments.
Sadly, I missed the first two talks, and an introduction to the day, because of two airplane cancellations (boo American Airlines!). I arrived in the middle of Hannah Wallach’s talk, the abstract of which is located here. Her talk was interesting, and I liked her idea of partnering social scientists with data scientists and machine learning specialists. But I do want to mention that, although there’s a remarkable history of social scientists working within tech companies (say at Bell Labs and Microsoft), we don’t see that in finance at all, nor does it seem poised to happen. In other words, we certainly can’t count on social scientists being on hand when important mathematical models are getting ready for production.
Also, I liked Hannah’s three categories of models: predictive, explanatory, and exploratory. Even though I don’t think a given model will necessarily fall neatly into one category or another, they still give you a way to think about what we do when we build models. For example, we think of recommendation models as ultimately predictive, but they are (often) predicated on the ability to understand people’s desires as made up of distinct and consistent dimensions of personality (like when we use PCA or something equivalent). In this sense we are also exploring how to model human desire and consistency. For that matter, I guess you could say any model is at its heart an exploration of whether the underlying toy model makes any sense, but that question is dramatically less interesting when you’re using linear regression.
Anupam Datta and Michael Tschantz
An issue I enjoyed talking about was brought up in this talk, namely whether such a finding is entirely evanescent or whether we can call it “real.” Since Google constantly updates its algorithm, and since ad budgets come and go, even the same experiment performed an hour later might give different results. In what sense, then, can we call any such experiment statistically significant, or even persuasive? Also, IRL we don’t have clean browsers, so what happens when we have dirty browsers and we’re logged into Gmail and Facebook? By then there are so many variables that it’s hard to say what leads to what, but should that make us stop trying?
From my perspective, I’d like to see more research into questions like, of the top 100 advertisers on Google, who saw the majority of the ads? What was the economic, racial, and educational makeup of those users? A similar but different (because of the auction) question would be to reverse-engineer the advertisers’ Google ad targeting methodologies.
Finally, the speakers pointed out a transparency failure on Google’s part. In your advertising profile, for example, you cannot see (and therefore cannot change) your marital status, but advertisers can target you based on that variable.
Sorelle Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian
Next up, Sorelle talked to us about her work with two guys with enormous names. They think about how to make stuff fair, which is the central question of this workshop.
First, if we included race in a resume-sorting model, we’d probably see a negative impact because of historical racism. Even if we removed race but included other attributes correlated with race (say, zip code), this effect would remain. And it’s hard to know exactly when we’ve removed all the relevant attributes, but one thing these guys did was define that precisely.
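One way to see why dropping race but keeping zip code doesn’t help: ask how well the remaining attribute lets you guess the protected one. Below is a minimal sketch of that idea, assuming toy records of (zip code, group) pairs, a “predict the majority group in each zip code” guesser, and the balanced error rate (BER) as the score. It is a sketch of the general idea, not necessarily the exact definition Sorelle and her coauthors use; the function names and the data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_by_group(records):
    """Map each zip code to the most common protected-group label seen in it."""
    counts = defaultdict(Counter)
    for zip_code, group in records:
        counts[zip_code][group] += 1
    return {z: c.most_common(1)[0][0] for z, c in counts.items()}


def balanced_error_rate(records, predictor):
    """Average, over the true groups, of the fraction of that group misclassified.

    A BER near 0 means the proxy almost perfectly reveals the protected
    attribute; near 0.5 (with two groups) means it reveals very little.
    """
    totals, errors = Counter(), Counter()
    for zip_code, group in records:
        totals[group] += 1
        if predictor[zip_code] != group:
            errors[group] += 1
    return sum(errors[g] / totals[g] for g in totals) / len(totals)


# Toy data: zip code 10001 is mostly group A, 10002 entirely group B.
records = [("10001", "A"), ("10001", "A"), ("10001", "B"),
           ("10002", "B"), ("10002", "B"), ("10002", "B")]
print(balanced_error_rate(records, majority_by_group(records)))
```

For simplicity this scores the guesser on the same records it was built from, which flatters it; a real check would use held-out data. Still, a low BER is the warning sign: the “race-blind” model can effectively still see race.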
Second, say now you have some idea of the categories that are given unfair treatment, what can you do? One thing suggested by Sorelle et al is to first rank people in each category – to assign each person a percentile in their given category – and then to use the “forgetful function” and only consider that percentile. So, if we decided at a math department that we want 40% women graduate students, to achieve this goal with this method we’d independently rank the men and women, and we’d offer enough spots to top women to get our quota and separately we’d offer enough spots to top men to get our quota. Note that, although it comes from a pretty fancy setting, this is essentially affirmative action. That’s not, in my opinion, an argument against it. It’s in fact yet another argument for it: if we know women are systemically undervalued, we have to fight against it somehow, and this seems like the best and simplest approach.
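Here is a minimal sketch of the rank-within-category idea and the quota example, as I’ve described it above; it is not the authors’ implementation. The function names `within_group_percentiles` and `admit_with_quota`, the (name, group, score) data layout, and the tie-breaking are all assumptions.

```python
from collections import defaultdict


def within_group_percentiles(candidates):
    """Replace each raw score with its percentile rank inside the candidate's own group.

    candidates: list of (name, group, score). Returns {name: percentile in (0, 1]}.
    Ties are broken by sort order, which is a simplification.
    """
    by_group = defaultdict(list)
    for name, group, score in candidates:
        by_group[group].append((score, name))
    percentiles = {}
    for members in by_group.values():
        members.sort()  # ascending by score
        n = len(members)
        for rank, (_, name) in enumerate(members, start=1):
            percentiles[name] = rank / n
    return percentiles


def admit_with_quota(candidates, n_spots, quotas):
    """Fill n_spots using per-group quotas, e.g. {"women": 0.4, "men": 0.6}.

    Each group is ranked only against itself via within-group percentiles,
    so raw scores are "forgotten" once the percentiles are computed.
    """
    pct = within_group_percentiles(candidates)
    admitted = []
    for group, share in quotas.items():
        seats = round(n_spots * share)
        pool = [name for name, g, _ in candidates if g == group]
        pool.sort(key=lambda name: pct[name], reverse=True)
        admitted.extend(pool[:seats])
    return admitted
```

Note that `round` can leave the seat counts summing to one more or one fewer than `n_spots` for awkward numbers; a real admissions process would have to decide how to handle that.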
Ed Felten and Josh Kroll
After lunch, Ed Felten and Josh Kroll jointly described their work on making algorithms accountable. Basically, they proposed a trustworthy, encrypted system of paper trails that would accompany a given algorithm (it doesn’t really matter which) and create verifiable proofs that the algorithm was used faithfully and fairly in a given situation. Of course, we’d really only consider an algorithm to be used “fairly” if the algorithm itself is fair, but putting that aside, this addresses the question of whether the same algorithm was used for everyone, and things like that. In lawyer speak, this is called “procedural fairness.”
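To show the shape of the idea, here is a toy sketch using nothing fancier than a hash: commit to the decision rule in advance, log every decision against that commitment, and let an auditor check that the same rule was applied to everyone. This is emphatically not their protocol, which relies on heavier cryptography so that the rule and the inputs need not be revealed; here the auditor sees everything. The function names and the threshold rule are hypothetical, and a real commitment would also add a random salt so the hash can’t be brute-forced.

```python
import hashlib
import json


def commit(policy: dict) -> str:
    """Hash the decision rule's parameters; publish this before any decisions are made."""
    blob = json.dumps(policy, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def decide(policy: dict, applicant: dict) -> bool:
    """A hypothetical toy decision rule: approve if the score clears the threshold."""
    return applicant["score"] >= policy["threshold"]


def log_decision(log: list, policy: dict, applicant: dict) -> None:
    """Record which committed policy was used, the input, and the output."""
    log.append({
        "policy_hash": commit(policy),
        "applicant": applicant,
        "decision": decide(policy, applicant),
    })


def audit(log: list, published_hash: str, revealed_policy: dict) -> bool:
    """Check the revealed policy matches the commitment and reproduces every logged decision."""
    if commit(revealed_policy) != published_hash:
        return False
    return all(
        entry["policy_hash"] == published_hash
        and entry["decision"] == decide(revealed_policy, entry["applicant"])
        for entry in log
    )
```

If someone quietly swaps in a different threshold for one applicant, that log entry carries a different hash and the audit fails.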
So, for example, if we could, we might want to run the algorithm that determines punishment for drug use through this system, and we might find that the rules are applied differently to different people. This system would catch that kind of problem, at least ideally.
David Robinson and Harlan Yu
Next up we talked to David Robinson and Harlan Yu about their work in Washington D.C. with policy makers and civil rights groups around machine learning and fairness. These two have been active with civil rights groups and were an important part both of the Podesta Report, which I blogged about here, and of drafting the Civil Rights Principles of Big Data.
The question of what policy makers understand and how to communicate with them came up several times in this discussion. We decided that, to combat cherry-picked examples we see in Congressional Subcommittee meetings, we need to have cherry-picked examples of our own to illustrate what can go wrong. That sounds bad, but put it another way: people respond to stories, especially to stories with innocent victims that have been wronged. So we are on the look-out.
Closing panel with Rayid Ghani and Foster Provost
I was on the closing panel with Rayid Ghani and Foster Provost, and we each had a few minutes to speak and then there were lots of questions and fun arguments. To be honest, since I was so in the moment during this panel, and also because I was jonesing for a beer, I can’t remember everything that happened.
As I remember, Foster talked about an algorithm he had created that does its best to “explain” the decisions of a complicated black box algorithm. So in real life our algorithms are really huge and messy and uninterpretable, but this algorithm does its part to add interpretability to the outcomes of that huge black box. The example he gave was to understand why a given person’s Facebook “likes” made a black box algorithm predict they were gay: by displaying, in order of importance, which likes added the most predictive power to the algorithm.
[Aside, can anyone explain to me what happens when such an algorithm comes across a person with very few likes? I’ve never understood this very well. I don’t know about you, but I have never “liked” anything on Facebook except my friends’ posts.]
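To make Foster’s example a bit more concrete (and to poke at my aside), here is a toy sketch of one way such an explanation could work. It assumes a simple logistic model over binary “like” features, which is not necessarily how Foster’s method works; his approach may well differ. The function names, the weights, the `bias`, and the page names are all hypothetical.

```python
import math


def predict_proba(weights, bias, likes):
    """Logistic model: probability computed from the weights of the pages a person liked."""
    z = bias + sum(weights.get(page, 0.0) for page in likes)
    return 1.0 / (1.0 + math.exp(-z))


def explain(weights, bias, likes, threshold=0.5):
    """Rank a person's likes by how much each pushes the score up, and find the
    smallest set of likes whose removal drops the prediction below threshold.

    Returns (ranked_likes, flipping_set); flipping_set is None if there is
    nothing to flip or removing every positive like still doesn't flip it.
    """
    ranked = sorted(likes, key=lambda page: weights.get(page, 0.0), reverse=True)
    if predict_proba(weights, bias, likes) < threshold:
        return ranked, None  # the model didn't make this prediction at all
    remaining, removed = set(likes), []
    for page in ranked:
        if weights.get(page, 0.0) <= 0:
            break  # removing this like would not lower the score
        remaining.discard(page)
        removed.append(page)
        if predict_proba(weights, bias, remaining) < threshold:
            return ranked, removed
    return ranked, None  # e.g. the bias alone keeps the prediction above threshold
```

For a linear model like this one, removing likes greedily in order of weight does give the smallest flipping set. It also suggests one toy answer to my aside: someone with very few likes mostly gets the prediction implied by `bias`, the model’s baseline, and there may be nothing to explain.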
Rayid talked about his work trying to develop a system for teachers to understand which students were at risk of dropping out, and for that system to be fair, and he discussed the extent to which that system could or should be transparent.
Oh yeah, and that reminds me that, after I described my book, we had a pretty great argument about whether credit scoring models should be open source, what that would mean, what feedback loops it would engender, and who would benefit.
Altogether a great day, and a fantastic discussion. Thanks again to Solon and Moritz for their work in organizing it.
I don’t have enough time for a full post today, but if you haven’t already, please watch Liz Warren’s speech from last Friday. She lays out the facts about Citigroup in an uncomplicated way. Surprising and refreshing coming from a politician.
Aunt Pythia has something in the works for you dear people, but it’s not quite ready yet, and you’ll have to wait another week. Rest assured, it will be worth it. And apologies to mathbabe.org subscribers who received an errant test post this week.
In the meantime, Aunt Pythia is going to write a quick column today from a Montreal hotel room after an amazing workshop yesterday which she will comment on later in the week.
So quick, get some tea and some flannel-lined flannel, because damn it’s wintery outside, all snowy and shit. Aunt Pythia’s about to spew her usual unreasonable nonsense!
LET’S DO THIS PEOPLES!!! And please, even if you’ve got nothing interesting to say for yourself, feel free to make something up or get inspired by Google auto complete and then go ahead and:
ask Aunt Pythia your question at the bottom of the page!
Dear Aunt Pythia,
This may not really be an “Aunt Pythia” question. But could either you or Mathbabe comment on this article on sexism in academic science?
I can imagine many ways they could be misrepresenting the statistics, but I don’t know which.
No Bias, Really?
Dear No Bias,
I was also struck by the inflammatory tone and questionable conclusions of this article. But you know, controversy sells.
So, here are a couple of lines I’ll pull out. First:
Our country desperately needs more talented people in these fields; recruiting more women could address this issue. But the unwelcoming image of the sexist academy isn’t helping. Fortunately, as we have found in a thorough analysis of recent data on women in the academic workplace, it isn’t accurate, either.
Many of the common, negative depictions of the plight of academic women are based on experiences of older women and data from before the 2000s, and often before the 1990s. That’s not to say that mistreatment doesn’t still occur — but when it does, it is largely anecdotal, or else overgeneralized from small studies.
I guess right off the bat I’d ask, how are you collecting data? The data I have personally about sexist treatment at the hands of my colleagues hasn’t, to my knowledge, been put in any database. The sexist treatment I’ve witnessed for pretty much all of my female mathematics colleagues has, equally, never been entered into a database to my knowledge. So yeah, not convinced these people know what they are talking about. It’s famously hard to prove something doesn’t exist, especially when you don’t have a search algorithm.
One possibility for the data they seem to have: they interviewed people after the fact, perhaps decades after the fact. If that’s the case, then you’d expect more and better data on older women, which is what we are currently seeing. In other words, there is a lag in this data collection. That’s not the same as “it doesn’t exist.” This is a common mistake researchers make: they take the data as “objective truth” and forget that collecting it (or not collecting it!) is a human process. Think police shootings.
The article then goes on to say that the data on retention, promotion, and other issues for women in math and other science fields isn’t so bad. To that I’d say: these women have already gone through a mighty selection process, so in general you’d expect them to be smarter than their colleagues, and their promotion rates should therefore be higher. But they aren’t. So that’s also a sign of sexism.
I mean, whatever. That’s not actually what I claim is true, so much as another interpretation of this data. My overall point is that they have some data, and they are making strong and somewhat outrageous claims which I can dismiss without much work.
I hope that helps!
Dear Aunt Pythia,
In his November “Launchings” column, David Bressoud presents some interesting data on differences between male and female college calculus students. As much as I’ve appreciated all of Bressoud’s careful explorations of mathematics education, I find I’m a bit irritated by his title, “MAA Calculus Study: Women Are Different,” because it appears to take the male experience as the norm.
Perhaps I was already annoyed because of a NYTimes op-ed, “Academic Science Isn’t Sexist”, in which Wendy Williams and Steven Ceci claim that “[w]e are not your father’s academy anymore,” and that the underrepresentation of women in math-intensive fields is “rooted in women’s earlier educational choices, and in women’s occupational and lifestyle preferences.” Here, too, the message seems to be “don’t worry about changing the academy — women are different from the norm, which is (naturally) that which works for men.”
My question for you, Aunt Pythia, is this: am I overreacting here?
I received my PhD in mathematics in 1984, and I’ve seen significant change for the better in the academy since then. Child care at AMS meetings? A crowd in the women’s rest room at same? Unthinkable when I started. But if women are still disproportionately “choosing” to go into other fields, might we look a little more closely at the environments in which girls and women are making their educational and “lifestyle” choices?
I welcome your thoughts. If you’re eager for more data analysis, I’d also love to hear your take on the paper by Williams, Ceci, and their colleagues.
Still One of the Underrepresented After All These Years
Without even reading that article, I can say without hesitation that yes, it’s a ridiculous title, and it’s infuriating and YOU ARE NOT OVERREACTING. To be clear, that is bold-faced, italicized, and all caps. I mean it.
The word “different” forces us to compare something to a baseline, and given that there is no baseline even mentioned, we are forced to guess at it, and that imposes the “man as default” mindset. Fuck that. I mean, if the title had been “There are differences between male and female calculus students,” I would not have been annoyed, because even though “male” comes first, I’m not a stickler. I just want us to acknowledge that if we mention one category, we should mention the other as well.
To illustrate this a bit more, we don’t title a blog post “Whites are different” and leave it at that, because we’d be like, different from whom? From blacks? From Asians? From Asian-Americans? See how that works? Saying “different” means different from some assumed baseline, and the assumed baseline has to be a cultural norm. And right now it’s white male. Which is arguably one reason that calculus students act differently when they are men (har!).
As for the other article, I already shit on that in the previous answer but I’m happy to do it once again. It’s bullshit, and I’m disappointed that the Times published it.
As for the paper by Williams, Ceci, and their colleagues, I don’t have time now but I’ll take a look, thanks!
Dear Aunt Pythia,
I am twenty years old, near the halfway point in my senior year of a mathematics BS at a large, well-regarded public university in the Northeast. I’ve been aiming my energies at graduate school, and I am now looking at PhD program applications. Most apps ask for two or three letters of recommendation from faculty members who are familiar with your work. This poses a very big problem, because all of my professors hate me.
Okay, maybe it’s not quite like that. But I’ve had a really lousy time in the math department at LWRPUN. My fellow students are dispassionate, unresponsive, and unfriendly. My professors are dry, uncommitted to their students, and the ones who aren’t mathematically incompetent are lousy teachers. On top of all this, a crippling bureaucracy has prevented me countless times from taking classes I’m interested in (few as they are in this catalog), substituting instead ANOTHER REQUIRED SEMESTER OF ANALYSIS.
So I haven’t made any personal connections of the sort that might benefit me in the form of a letter of rec. My work hasn’t even been that good; my depression and anxiety (in general as well as re all this) have increasingly prevented me from completing even easy homework assignments. Nobody here has seen my best mathematical work, and for that matter, nobody anywhere else has either*.
And for four years, everyone I’ve come to with this gathering creeping progressively life-eating concern has given me the same old BS about You should really put yourself out there! and It’s just so important to go to your professor’s office hours! without considering maybe — I’ve tried, I really have.
What can I do, Aunt Pythia? I’m really passionate about mathematics, but I’m worried I won’t be able to pursue my studies without these magic papers.
Reports Embargoed by Crummy Lecturers, Earnestly Seeking Solace
*I thankfully have a professor from an outside experience willing to write about my teaching credentials, but that one letter is surely not sufficient to show my potential as a graduate student and researcher.
I am afraid I will have to call bullshit on you, RECLESS. Plus your sign-off doesn’t actually spell anything.
Here’s the thing, there are no mathematically incompetent lecturers at large, well-regarded public universities. There are, in fact, mathematically very competent people who can’t get jobs at such places. Such is the pyramid-shaped job market of mathematics. So whereas I believe you when you say your lecturers have been uninspired, and uncommitted to their students, the fact that you added “mathematically incompetent” just makes me not believe you at all, in anything.
Here’s what I think is happening. You think you’re really into math, but you’ve never really understood your classes, nor have you understood that you’ve never understood your classes, because your self-image is that you’re already a mathematician, and that people have just not acknowledged your brilliance.
But that’s not how math actually works. Math is a social endeavor, where you have to communicate your ideas well enough for others to understand them, or else you aren’t doing math.
I’m not saying you haven’t had bad luck with teachers. It’s a real possibility. But there’s something else going on as well, and I don’t think you can honestly expect to go to the next level without sorting stuff out. In other words, even if you don’t love the teacher, if you loved the subject, got into it, and did the proofs, you’d still be getting adequate grades to ask for letters. The thing about writing letters, as a math prof, is that you don’t have to like the student personally to write a good letter, you just need to admire their skills. But since you can’t do that either, you won’t get good letters, and moreover I don’t think you’d deserve good letters. And therefore I don’t think you should go to grad school.
Suggestion: look carefully at your own behavior, and figure out what it is you are doing that isn’t working. Maybe think of what you love about math, or about your own image of being a mathematician, and see if there’s something you really know you’re good at, and other people know it too, and develop that.
Dearest Aunt Pythia,
I have a sex question for you! Kind of. You have to get through the boring back story first…I’m a 19-year-old female physics major. I’m quiet, rather mousy, and awkward. A lot of the time I feel like I have more to prove than the boys do, because I’m a girl, and because of the aforementioned shyness.
People seem to automatically assume I’m unintelligent. I think I’m just as intelligent as the boys in my program, but I don’t come off that way! Point is, I want to be this cool, strong, independent, successful, respectable girl who doesn’t take shit from misogynistic people who assume I’m inferior.
However, I feel extremely guilty about my sexual preferences. I’m pretty submissive. I’d like power exchange in my relationships…hair pulling, bondage, spanking, being bossed around, the whole bit. I like to be dominated by men. Older men. Smart older men. Hopefully I’ve successfully conveyed my dilemma. I want to be respected by the men (and women, and others) I’m surrounded by in my academic life, but taken control of as a girlfriend.
Why does what I despise happening to me in an academic setting please me so much in a romantic/sexual one? Agh, I feel like such a bad girl! (and not in the arousing way…)
This is such a relief – finally, a sex question! – and it’s honestly one of the best questions I’ve ever gotten, ever, in Aunt Pythia or elsewhere. I’m so glad I can answer this for you.
There is absolutely no conflict in wanting something in a sexual context that is abhorrent to you in normal life. It is in fact a well-known pattern! You shouldn’t feel at all weird about it! Lots (LOTS!) of the submissives I’ve met are, in their day jobs, literally the boss. They have companies and are extremely fancy and in control. And then they love to be bossed around and spanked. Seriously. If anything, my feeling is that your sexual proclivities point to your being alpha in real life, but maybe I’m going overboard.
So yeah, no problem here. You are killing it. And in 3 or 4 years I want you to write back and explain to me how you’ve found an amazing lover who gives you what you want in the bedroom and worships your physics prowess outside it. There will, in fact, be people lining up for this role.
And those people in your program? Do your best to ignore them. Men are just impossibly arrogant at that age, but time will humble them somewhat even as your confidence will rise as you learn more. I’m not saying it ever evens out entirely but it does improve.
Also: find other women (and super cool men) to study with. Surround yourself with supportive people. Take note of obnoxious people and avoid them. Trade up with friends whenever possible.
Well, you’ve wasted yet another Saturday morning with Aunt Pythia! I hope you’re satisfied! Please if you could, ask me a question. And don’t forget to make an amazing sign-off, they make me very very happy.
There’s some tricky business going on right now in politics, with a bunch of ridiculous last-minute negotiations to roll back elements of Dodd-Frank and aid Wall Street banks in the current budget deal. Hell, it’s the end of the year, and people are distracted, so the public won’t mind if the banks get formal government backing for their risky trades, right?
Occupy the SEC has a petition you can sign, located here, which is opposed to these changes. You might remember Occupy the SEC for their incredible work in public comments on the Dodd-Frank bill in the first place. I urge you to go take a look at their petition and, if you agree with them, sign it.
After you sign the petition, feel free to treat yourself to some holiday satire and cheer, namely The 2014 Haters Guide To The Williams-Sonoma Catalog.
As many thoughtful people have pointed out already, Eric Garner’s case proves that video evidence is not a magic bullet to combat and punish undue police brutality. The Grand Jury deemed such evidence insufficient for an indictment, even if the average person watching the video cannot understand that point of view.
Even so, it would be a mistake to dismiss video cameras on police as entirely a bad idea. We shouldn’t assume no progress could be made simply because there’s an example which lets us down. I am no data evangelist, but neither am I someone who dismisses data. It can be powerful and we should use its power when we can.
And before I try to make the general case for video cameras on cops, let me make one other point. The Eric Garner video has already made progress in one arena, namely public opinion. Without the video, we wouldn’t be seeing nationwide marches protesting the outrageous police conduct.
A few of my data nerd thoughts:
- If cops were required to wear cameras, we’d have more data. We should think of that as building evidence, with the potential to use it to sway grand juries, criminal juries, judges, or public opinion.
- One thing I said time after time to my students this summer at the data journalism program I directed is the following: a number by itself is usually meaningless. What we need is to compare that number to a baseline. The baseline could be the average number for a population, or the median, or some range of 5th to 95th percentiles, or how it’s changed over time, or whatnot. But in order to gauge any baseline you need data.
- So in the case of police videotapes, we’d need to see how cops usually handle a situation, or how cops from other precincts handle similar situations, or the extremes of procedures in such situations, or how police have changed their procedures over time. And if we think the entire approach is heavy handed, we can also compare the data to the police manual, or to other countries, or what have you. More data is better for understanding aggregate approaches, and aggregate understanding makes it easier to fit a given situation into context.
- Also, the cops might change their behavior when they are policing, knowing they are being taped. That’s believable, but we shouldn’t depend on it.
- Finally, we have to be super careful about how we use video evidence, and make sure it isn’t badly biased by careful and unfair selectivity on the part of the police. Already, some cops are getting in trouble for turning off their cameras at critical moments, or never turning them on at all.
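Here is a minimal sketch of the “compare to a baseline” habit from the bullets above, using only Python’s standard `statistics` module. The function name `compare_to_baseline` is hypothetical, and so is the kind of measurement it is meant for (say, some count recorded per shift for one officer, compared against the same count for that officer’s peers); no real police data is involved.

```python
import statistics


def compare_to_baseline(value, baseline):
    """Summarize where a single number sits relative to a baseline sample."""
    cuts = statistics.quantiles(baseline, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentiles
    share_below = sum(1 for x in baseline if x < value) / len(baseline)
    return {
        "value": value,
        "mean": statistics.mean(baseline),
        "median": statistics.median(baseline),
        "p5": cuts[0],
        "p95": cuts[-1],
        "share_of_baseline_below": share_below,
        "outside_5_95_range": value < cuts[0] or value > cuts[-1],
    }


# Illustrative only: a baseline of 1..100 and a value of 99.
print(compare_to_baseline(99, list(range(1, 101))))
```

The number 99 means nothing on its own; the output tells you it sits above the 95th percentile of the baseline, which is the kind of context the cameras would let us build. Tracking how the baseline itself changes over time would just mean running this on each period’s data.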
Let’s take a step back and think about how large-scale data collection and mining works, for example in online advertising. A marketer collects a bunch of data. Knowing a lot about one person doesn’t necessarily help them, but if they know a lot about most people, it does, statistically speaking, help them sell stuff. A given person might not be in the mood to buy, or might be broke, but if you dangle desirable goods in front of a whole slew of people, you make sales. It’s a statistical play which, generally speaking, works.
In this case, we are the marketer, and the police are the customers. We want a lot of information about how they do their job so when the time comes we have some sense of “normal police behavior” and something to compare a given incident to or a given cop to. We want to see how they do or don’t try to negotiate peace, and with whom. We want to see the many examples of good and great policing as well as the few examples of terrible, escalating policing.
Taking another step back, if the above analogy seems weird, there’s a reason for that. In general, data is being collected on the powerless: on consumers, on citizens, on job applicants. We should be pushing for more and better data to be collected instead on the powerful: on the police, on corporations, and on politicians. There’s a reason there is a burgeoning privacy industry for rich and powerful people.
For example, we want to know how many people have been killed by the police, but even a statistic that important is incredibly hard to come by (see this and this for more on that issue). However, it’s never been easier for the police to collect data on us and act on suspicions of troublemakers, however that is defined.
Another example – possibly the most extreme example of all – comes this very week from the reports on the CIA and torture. That is data and evidence we should have gotten much earlier, and as the New York Times demands, we should be able to watch videos of waterboarding and decide for ourselves whether it constitutes torture.
So yes, let’s have video cameras on every cop. It is not a panacea, and we should not expect it to solve our problems overnight. In fact, video evidence by itself will not solve any problem. We should think of it as a mere evidence-collecting device, and use it in the public discussion of how the most powerful among us treat the least powerful. But more evidence is better.
Finally, there’s the very real question of who will have access to the video footage, and whether the public will be allowed to see it at all. It’s a tough question, which will take a while to sort out (FOIL requests!), but until then, everyone should know that it is perfectly legal to videotape police in every place in this country. So go ahead and make a video with your camera when you suspect weird behavior.