Knowing the Pythagorean Theorem
This guest post is by Sue VanHattum, who blogs at Math Mama Writes. She teaches math at Contra Costa College, a community college in the Bay Area, and is working on a book titled Playing With Math: Stories from Math Circles, Homeschoolers, and Passionate Teachers, which will be published soon.
Here’s the Pythagorean Theorem:
In a right triangle, where the lengths of the legs are given by $a$ and $b$, and the length of the hypotenuse is given by $c$, we have $a^2 + b^2 = c^2$.
Do you remember when you first learned about it? Do you remember when you first proved it?
I have no idea when or where I first saw it. It feels like something I’ve always ‘known’. I put known in quotes because in math we prove things, and I used the Pythagorean Theorem for way too many years, as a student and as a math teacher, before I ever thought about proving it. (It’s certainly possible I worked through a proof in my high school geometry class, but my memory kind of sucks and I have no memory of it.)
It’s used in beginning algebra classes as part of terrible ‘pseudo-problems’ like this:
Two cars start from the same intersection with one traveling southbound while the other travels eastbound going 10 mph faster. If after two hours they are 10 times the square root of 24 [miles] apart, how fast was each car traveling?
After years of working through these problems with students, I finally realized I’d never shown them a proof (this seems terribly wrong to me now). I tried to prove it, and didn’t really have any idea how to get started.
This was 10 to 15 years ago, before Google became a verb, so I searched for it in a book. I eventually found it in a high school geometry textbook. Luckily it showed a visually simple proof that stuck with me. There are hundreds of proofs, many of them hard to follow.
There is something wrong with an education system that teaches us ‘facts’ like this one and knocks the desire for deep understanding out of us. Pam Sorooshian, an unschooling advocate, said in a talk to other unschooling parents:
Relax and let them develop conceptual understanding slowly, over time. Don’t encourage them to memorize anything – the problem is that once people memorize a technique or a ‘fact’, they have the feeling that they ‘know it’ and they stop questioning it or wondering about it. Learning is stunted.
She sure got my number! I thought I knew it for all those years, and it took me decades to realize that I didn’t really know it. This is especially ironic – the reason it bears Pythagoras’ name is that the Pythagoreans were the first to prove it (that we know of).
It had been used long before Pythagoras and the Greeks – most famously by the Egyptians. Egyptian ‘rope-pullers’ surveyed the land and helped build the pyramids, using a taut circle of rope with 12 equally-spaced knots to create a 3-4-5 triangle: since $3^2 + 4^2 = 5^2$, this is a right triangle, giving them the right angle that’s so important for building and surveying.
Ever since the Greeks, proof has been the basis of all mathematics. To do math without understanding why something is true really makes no sense.
Nowadays I feel that one of my main jobs as a math teacher is to get students to wonder and to question. But my own math education left me with lots of ‘knowledge’ that has nothing to do with true understanding. (I wonder what else I have yet to question…) And beginning algebra students are still using textbooks that ‘give’ the Pythagorean Theorem with no justification. No wonder my Calc II students last year didn’t know the difference between an example and a proof.
Just this morning I came across an even simpler proof of the Pythagorean Theorem than the one I have liked best over the past 10 to 15 years. I was amazed that I hadn’t seen it before. Well, perhaps I had seen it but never took it in before, not being ready to appreciate it. I’ll talk about it below.
My old favorite goes like this:
- Draw a square.
- Put a dot on one side (not at the middle).
- Put dots at the same place on each of the other 3 sides.
- Connect them.
- You now have a tilted square inside the bigger square, along with 4 triangles. At this point, you can proceed algebraically or visually.
Algebraic version:
- big square = small tilted square + 4 triangles, i.e. $(a+b)^2 = c^2 + 4 \cdot \frac{1}{2}ab$
Visual version:
- Move the triangles around.
- What was $c^2$ is now $a^2 + b^2$.
- Also check out Vi Hart’s video showing a paper-folding proof (with a bit of ripping). It’s pretty similar to this one.
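Spelling out the algebra in that first bullet (the big square has side $a + b$, and each triangle has legs $a$ and $b$):

$$(a+b)^2 = c^2 + 4 \cdot \tfrac{1}{2}ab \quad\Rightarrow\quad a^2 + 2ab + b^2 = c^2 + 2ab \quad\Rightarrow\quad a^2 + b^2 = c^2.$$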
To me, that seemed as simple as it gets. Until I saw this:
[Figure: a right triangle split by the altitude from its right angle into two smaller triangles, A and B, with a square drawn on the hypotenuse of each of the three triangles, making three similar ‘houses’.]
This is an even more visual proof, although it might take a few geometric remarks to make it clear. In any right triangle, the two acute (less than 90 degrees) angles add up to 90 degrees. Is that enough to see that the original triangle, triangle A, and triangle B are all similar? (Similar means they have exactly the same shape, though they may be different sizes.) Which makes the ‘houses with asymmetrical roofs’ also all similar. Since the big ‘house’ has an ‘attic’ equal in size to the two other ‘attics’ combined, its ‘room’ must also be equal in area to the two other ‘rooms’ combined. Wow! (I got this language from Alexander Bogomolny’s blog post about it, which also tells a story about young Einstein discovering this proof.)
Since all three houses are similar (exact same shape, different sizes), the size of the room is some given multiple of the size of the attic. More properly, area(square) = $k \cdot$ area(triangle), where $k$ is the same for all three figures. The square attached to triangle $A$ (whose area we will say is also $A$) has area $kA$, similarly for the square attached to triangle $B$. But note that $A + B = C$, so $kA + kB = kC$, which is the area of the square attached to the triangle labeled $C$. But $kA = a^2$, and $kB = b^2$, so $kA + kB = a^2 + b^2$, and it also equals $kC = c^2$, giving us what we sought: $a^2 + b^2 = c^2$.
I stumbled on the article in which this appeared (The Step to Rationality, by R. N. Shepard) while trying to find an answer to a question I have about centroids. I haven’t answered my centroid question yet, but I sure was sending out some google love when I found this.
What I love about this proof is that the triangles stay central in our thoughts throughout, and the focus stays on area, which is what this is really about. It’s all about self-similarity, and that’s what makes it so beautiful.
I think that, even though this proof is simpler in terms of steps than my old favorite, it’s a bit harder to see conceptually. So I may stick with the first one when explaining to students. What do you think?
Profit as proxy for value
I enjoyed my discussion with Doug Henwood yesterday at the Left Forum moderated by Suresh Naidu.
At the very end Doug defined capitalism pretty much like this wikipedia article:
Capitalism is an economic system based on the private ownership of the means of production, with the goal of making a profit.
Doug went on to make the point that, as a society, we might decide to replace our general pursuit of profit by a pursuit of improving our collective quality of life.
It occurred to me that Doug had identified a proxy problem much like I talked about in a recent post called How Proxies Fail. The general history of failed proxies I outlined goes like this:
- We’d like to measure something
- We can’t measure it directly so let’s come up with a proxy
- We’re aware of the problems at first
- We start to use it and it works pretty well
- We slowly forget the problems we had understood, and at the same time
- People start gaming the proxy in various ways and it loses its connection with the original object of interest.
In this example, the thing we’re trying to measure is something along the lines of “human value,” although we’d probably also want to consider value to the rest of mother nature as well. For context, we were discussing the financial system – what the purported function of the financial system is and what monstrous proportions it has taken on due to the brutal pursuit of profits over goals we might consider reasonable and useful to society.
So the proxy for value is profit. And of course we measure profit in money.
Going back to my history of proxies, the discussion of “whether money is a good proxy for value” started a long time ago, and a large part of economic theory, I guess, is devoted to considering the extent to which this proxy fails. I say “I guess” because I’m no economist, but I am aware of the economic concept of externality, which grapples with this discrepancy between money paid or earned, and to whom, versus actual harm or benefit, and to whom.
It could be argued that the concept and industry of regulation has been erected to deal with externalities of our profit proxy: when a chemical company pollutes the water, causing harm to nearby nature and people, regulators step in, sometimes (and sometimes people sue, of course, but most of the time they’re not even aware of the value being extracted from them, or are helpless to confront it adequately).
This is obviously more than an academic or regulatory topic: it pervades our collective lives. When an individual loses sight of the failures of the profit proxy, they value themselves or others in terms of how much money they have or the rate at which they get paid. They infer that if someone is highly paid or rich, they must be valuable. If someone’s poor, they must hold no value. There are a lot of people like this, I’m sure you’ve met them.
And that brings us to the part of the history of a failed proxy, which is that people game the proxy. We’ve seen this happen a lot lately, especially in finance and technology. And if you think about it, it’s no surprise since so much money goes through the financial system, and the financial system is now entirely technologically driven, and the systems are so complex that the regulators can’t keep up with the manufactured externalities. Someone could probably write a book reframing large parts of the financial system as purely devoted to exploiting the difference between value and money.
I don’t think I’ll start coming to different conclusions now that I have this framework to think through, but I do think it will be easier for me to spot instances of the “profit proxy failure” when I come across them. It’s especially timely for me to be thoughtful about this kind of thing, since I’m hoping to create something valuable, rather than merely profitable. I don’t want to avoid profit, obviously, but I don’t want to measure my progress with the wrong stick.
Aunt Pythia’s advice
Aunt Pythia is psyched to be able to answer your questions and dispense (self-described) invaluable advice today as always.
If you don’t know what you’re in for, go here for past advice columns and here for an explanation of the name Pythia. Most importantly,
Submit your question for Aunt Pythia at the bottom of this page!
——
Dear Aunt Pythia,
I already have great sex with my hot bearded husband, and I’ve been on hormonal birth control pills for years. So how amazing will it be if I switch to a non-hormonal copper IUD, if Obamacare makes my insurance cover it now? Please be specific. I am weighing my options.
Thank you!
Considering a Change
Dear CAC,
I love that you mentioned your husband’s beard. I needed to know that.
How amazing will it be? I’m guessing somewhat more amazing than it already is.
But I’m not sure, because I have a feeling every woman’s body responds differently to being on the pill. I’m a woman who naturally has a lot of testosterone, among other things, and so it throws me totally out of whack. For you it might not, although I’m guessing it does but just less so.
Also, I’ve been on the copper IUD, and when they say you bleed more on that, they aint lyin. But that problem doesn’t start for a few years.
Kind of annoying that the most obvious choices are so hard on a woman’s body, isn’t it?
If you’re avoiding pregnancy but it wouldn’t be the end of the world, let me recommend spermicidal inserts, although you really do need to follow the instructions whereby you wait 10 minutes after insertion before any sperm enters (many women would consider this a feature, not a bug).
They obviously don’t protect you from STD’s or anything, though, so I suggest you go with spermicidal inserts for your hot bearded (bearded!) husband and condoms with anyone else.
Aunt Pythia
——
Dear Aunt Pythia,
Two questions. I googled “talk to a mathematician” because I wanted to see if anyone had an idea how I could split a league of 10 soccer teams into two groups, in order to minimise travel (well, it’s a bit more complex than that, but that’s sorta the gist).
But then I read your page, and of course the Sex questions, and a far more interesting one came to mind. So you said “Just to be clear, it is possible to see real female orgasms but you have to look for them, and they aren’t really considered mainstream porn.” And my question is, “Where?”
Love,
CurvePurve
Dear CurvePurve,
First the soccer question: I’d say cluster by geographic area, so nobody has to drive very far, but as you said it’s more complicated so I don’t have enough information to answer it. Even so, I’m going to take this moment to point out that the amount of traveling my friends do for their daughters’ soccer teams is super insane. They pretty much don’t have a life because of how much driving they do, even the ones who live in Manhattan. WTF?!
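For what it’s worth, here’s a minimal sketch of that geographic clustering idea in Python. The coordinates are made up, and note that k-means won’t necessarily split the league into two groups of five, so a real schedule-maker would want a balanced variant:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (x, y) locations for the 10 teams; swap in real coordinates.
rng = np.random.default_rng(0)
locations = rng.uniform(0, 100, size=(10, 2))

# Two clusters = two travel-minimizing groups (roughly: k-means minimizes
# squared distance from each team to its group's center).
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(locations)
print(groups)  # a 0/1 group label per team; it may not be a 5-5 split
```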
As for the second question, maybe try this.
Auntie P
——
Dear Aunt Pythia,
Okay, I’ll bite. Why do men and women report differing numbers of sexual partners? I imagine it’s partially due to social expectations.
But I did have this conversation with my partner, and I found out we defined “sex” differently – I said oral counted, she did not (I guess this could be a reflection of social pressures as well). Is that issue of differing definitions sufficient to explain the different numbers?
More generally, how do the curves compare between men and women for number of partners?
OK I’ll Bite
p.s. I promise to ask a more interesting sex question next time.
Dear OK,
I chose your question out of the remarkable collection of people doing as I asked last week and asking me this same question (thank you everyone!) because you promised to follow up with another sex question. I totally cannot hold you to that promise, since I don’t collect email addresses or anything, but I figure by putting it in italics it’s as good as a blood pledge.
It also inspires me to appeal to my readers more generally, since last week worked so well: please follow up with another, more interesting sex question next week, thanks!
On to your question. I love what you pointed out, that the different definitions of sex come into play. And I think that makes a lot of sense, especially the example you gave of oral sex.
Because why?
Because, as I think you’ll agree, when “oral sex” occurs, it’s often only the guy getting it! And then in what sense has the woman really had sex? Unless she’s Monica Lewinsky (one of my heroes), there’s really nothing much there there. Which is why it totally makes sense that she wouldn’t “count” it.
Now, going back to the discrepancy.
Let’s just agree, once and for all, that if you actually got a good sample of all (straight) men and all (straight) women, meaning you got some normal men and a few slutty men, in proportion to the population at large, and if you got normal women and slutty women, again in proportion to the population at large, then the average number of sexual partners would have to be equal. It’s just a statistical truth.
One caveat: if we all had a bunch of sex, and then there was some war or illness that only affected men, and for whatever reason only affected slutty men, then we’d get a bias if we did the poll after all the slutty men died. But I don’t think that issue is in play here, and so we can’t explain the discrepancy in any way except that women and/or men are lying about the number of people they’ve slept with.
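A toy sanity check of that bookkeeping (invented numbers, deliberately skewed): every straight partnership adds one partner to the men’s total and one to the women’s, so with equal-sized populations the averages come out identical no matter how lopsided either side’s distribution is.

```python
import numpy as np

# Each partnership is one edge between a man and a woman, so both sides'
# partner counts have the same sum. Equal population sizes => equal means.
rng = np.random.default_rng(0)
n = 10_000                               # same number of men and women
pairs = 30_000
weights = 1.0 / np.arange(1, n + 1)      # a few very "active" men
men = rng.choice(n, size=pairs, p=weights / weights.sum())
women = rng.integers(0, n, size=pairs)   # women roughly uniform
print(np.bincount(men, minlength=n).mean())    # 3.0
print(np.bincount(women, minlength=n).mean())  # 3.0
```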
Reader Artem commented last week with a link to a nifty article explaining this, called Men and women lie about sex to match gender expectations. The study is published here (thanks, other anonymous reader!).
From the article:
But when it came to sex, men wanted to be seen as “real men:” the kind who had many partners and a lot of sexual experience. Women, on the other hand, wanted to be seen as having less sexual experience than they actually had, to match what is expected of women.
Well, that’s the interpretation anyway. In any case they saw big discrepancies between men and women’s reported sexual experience, although these are college kids so nobody seemed to have much. Next they hooked people up to lie-detector tests and they changed their tune. This had been done before:
Back in 2003, women went from having fewer sexual partners than men (when not hooked up to a lie detector) to being essentially even to men (when hooked up to the lie detector.)
Here’s a link to an article on that 2003 study, which satisfies my statistician’s heart. The result was not exactly replicated when they did it more recently:
In this new study, women actually reported more sexual partners than men when they were both hooked up to a lie detector and thought they had to be truthful.
Hold on a second. What? That doesn’t even jibe with the oral sex issue we talked about above. There must be some other thing going on. Maybe there’s a selection bias among college kids who do these studies. Maybe we should study whether women are more honest than men when they’re attached to lie detectors. Maybe women have an urge to brag when attached to lie detectors.
Next week: stay tuned for OK’s (and y’alls) even more interesting sex question! I’m counting on you guys!
Love,
Aunt Pythia
p.s. I have no idea about the distribution of sexual partners for men and women. We’d have to get our hands on the raw data, which would be awesome. One of the reasons I’m proud to call myself a data scientist.
——
Please submit your sex or data science or other question to Aunt Pythia!
Moneyball Diplomacy
I’m on a train again to D.C. to attend a conference on how to use big data to enhance U.S. diplomacy and development.
I’ll be on a panel in the afternoon called Diving Into Data, which has the following blurb attached to it:
Facebook processes over 500 terabytes of data each day. More than a half billion tweets are sent daily. And so the volume of data grows. Much of this data is superfluous and is of little value to foreign policy and development experts. But a portion does contain significant information and the challenge is how to find and make use of that data. What will a rigorous economic analysis of this data reveal and how could the findings be effectively applied? Looking beyond real-time awareness and some of the other well-known uses of big data, this panel will explore how a more thorough in-depth analysis of big data could prove useful in providing insights and trends that could be applied in the formulation and implementation of foreign policy.
Also on the schedule today, two keynote speakers: Nassim Taleb, author of a few books I haven’t read but everyone else has, and Kenneth Neil Cukier, author of a “big data” article I really didn’t like which was published in Foreign Affairs and which I blogged about here under the title of “The rise of big brother, big data”.
The full schedule of the day is here.
Speaking of big brother, this conference will be particularly interesting to me considering the remarkable amount of news we’ve been learning this week about the U.S. as a surveillance state. Actually nothing I’ve read has surprised me, considering what I learned when I read this opinion piece on the subject, and when I watched this video with a former NSA mathematician turned whistleblower, which I blogged about here back in August 2012.
Who speaks Hebrew?
I got covered in an Israeli newspaper talking about Occupy.
Here’s the article. If you can read Hebrew, please tell me how it reads.
Update: here’s a pdf version of it.
Book out for early review
I’m happy to say that the book I’m writing with Rachel Schutt called Doing Data Science is officially out for early review. That means a few chapters which we’ve deemed “ready” have been sent to some prominent people in the field to see what they think. Thanks, prominent and busy people!
It also means that things are (knock on wood) wrapping up on the editing side. I’m cautiously optimistic that this book will be a valuable resource for people interested in what data scientists do, especially people interested in switching fields. The range of topics is broad, which I guess means that the most obvious complaint about the book will be that we didn’t cover things deeply enough, and perhaps that the level of assumed prerequisites is uneven. It’s hard to avoid.
Thanks to my awesome editor Courtney Nash over at O’Reilly for all her help!
And by the way, we have an armadillo on our cover, which is just plain cool:
How proxies fail
A lot of the time perfectly well-meaning data goals end up terribly wrong. Certain kinds of these problems stem from the same issue, namely using proxies.
Here’s how it works. People focus on a problem. It’s a real problem, but it’s hard to collect data on the exact question that one would like (how well are students learning? how well is the company functioning? how do we measure risk?).
People have trouble measuring the object in question directly, so they reasonably ask, how do we measure this problem?
They’re smart, so they come up with something, say some metric (standardized test scores, share price, VaR). It’s not perfect, though, and so they discuss in detail all the inadequacies of the metric. Even so, they’d really like to address this issue, so they decide to try it.
Then they start using it – hey, it works pretty well in spite of its known issues! We have something to focus on, to improve on!
Then two things happen. First, the people who were so thoughtful at the beginning slowly forget the inadequacies of the metric, or are replaced by people who never had that conversation. Slowly the community involved with this proxy starts thinking the metric is a perfect measurement of the thing we actually care about. For all intents and purposes, of course, it is, because that’s what we’re measuring, and that’s how their paycheck is defined.
Second, the discrepancy between the proxy and the original underlying problem becomes more and more of a problem itself, and as people game the proxy, the effectiveness of the proxy is weakened. It no longer does a good job as a stand-in for the original problem, due to gaming and intense focus on the proxy. Sadly, that original problem, which was important, is ignored.
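Here’s a toy illustration of that decay (all numbers invented): model the proxy as true quality plus measurement noise, and let ‘gaming’ add variance that has nothing to do with quality. As gaming effort grows, the proxy’s correlation with the thing we care about falls apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
quality = rng.normal(size=n)  # the thing we actually care about

for gaming in [0.0, 0.5, 1.0, 2.0]:
    # proxy = quality + measurement noise + variance from gaming the metric
    proxy = quality + rng.normal(scale=0.5, size=n) + gaming * rng.normal(size=n)
    r = np.corrcoef(quality, proxy)[0, 1]
    print(f"gaming effort {gaming:.1f}: corr(proxy, quality) = {r:.2f}")
```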
This is a tough problem to solve because we always have the urge to address problems, and we always make do with imperfect proxies and metrics. My best guess at dealing with the ensuing problems is to always keep several different ways to look at and quantify a problem in play, and to keep in mind each of their inadequacies. Have a dashboard approach, and of course always be on the look-out for metrics that are being gamed. It’s a hard sell of course because it requires deeper understanding and thoughtful interpretation.
How much would you pay to be my friend?
I am on my way to D.C. for a health analytics conference, where I hope to learn the state of the art for health data and modeling. So stay tuned for updates on that.
In the meantime, ponder this concept (hat tip Matt Stoller, who describes it as ‘neoliberal prostitution’). It’s a dating website called “What’s Your Price?” where suitors bid for dates.
What’s creepier, the sex-for-pay aspect of this, or the it’s-possibly-not-about-sex-it’s-about-dating aspect? I’m gonna go with the latter, personally, since it’s a new idea for me. What else can I monetize that I’ve been giving away too long for free?
Hey, kid, you want a bedtime story? It’s gonna cost you.
Let’s enjoy the backlash against hackathons
As much as I have loved my DataKind hackathons, where I get to meet a bunch of friendly nerds who spend their weekends trying to solve problems using technology, I also have my reservations about the whole weekend hackathon culture, especially when:
- It’s a competition, so really you’re not solving problems as much as boasting, and/or
- you’re trying to solve a problem that nobody really cares about but which might make someone money, so you’re essentially working for free for a future VC asshole, and/or
- you kind of solve a problem that matters, but only for people like you (example below).
As Jake Porway mentions in this fine piece, having data and good intentions do not mean you can get serious results over a weekend. From his essay:
Without subject matter experts available to articulate problems in advance, you get results like those from the Reinvent Green Hackathon. Reinvent Green was a city initiative in NYC aimed at having technologists improve sustainability in New York. Winners of this hackathon included an app to help cyclists “bikepool” together and a farmer’s market inventory app. These apps are great on their own, but they don’t solve the city’s sustainability problems. They solve the participants’ problems because as a young affluent hacker, my problem isn’t improving the city’s recycling programs, it’s finding kale on Saturdays.
Don’t get me wrong, I’ve made some good friends and created some great collaborations via hackathons (and especially via Jake). But it only gets good when there’s major planning beforehand, a real goal, and serious follow-up. Actually a weekend hackathon is, at best, a platform from which to launch something more serious and sustained.
People who don’t get that are there for something other than that. What is it? Maybe this parody hackathon announcement can tell us.
It’s called National Day of Hacking Your Own Assumptions and Entitlement, and it has a bunch of hilarious and spot-on satirical commentary, including this definition of a hackathon:
Basically, a bunch of pallid millenials cram in a room and do computer junk. Harmless, but very exciting to the people who make money off the results.
This question from a putative participant of an “entrepreneur”-style hackathon:
“Why do we insist on applying a moral or altruistic gloss to our moneymaking ventures?”
And the internal thought process of a participant in a White House-sponsored hackathon:
I realized, especially in the wake of the White House murdering Aaron Swartz, persecuting/torturing Bradley Manning and threatening Jeremy Hammond with decades behind bars for pursuit of open information and government/corporate accountability that really, no-one who calls her or himself a “hacker” has any business partnering with an entity as authoritarian, secretive and tyrannical as the White House– unless of course you’re just a piece-of-shit money-grubbing disingenuous bootlicker who uses the mantle of “hackerdom” to add a thrilling and unjustified outlaw sheen to your dull life of careerist keyboard-poking for the status quo.
Aunt Pythia’s advice: finally some sex questions!
I’m psyched to be able to answer your sex-related questions today, really. I just don’t know how to thank you guys. Please keep them coming.
And by the way, if you don’t know what you’re in for, go here for past advice columns and here for an explanation of the name Pythia. Most importantly,
Submit your question for Aunt Pythia at the bottom of this page!
——
Dear Aunt Pythia,
A bit of a fake [sex question] and [fake sex] question. Is there a correlation between the time taken from initiation to climax in fake sex, a.k.a. porn, and the real thing in studies anywhere?
I’m assuming that the requirements for filming, and the ability of editing, and various other factors (chemicals?) might mean that the simulated stimulation will be longer than the real-life version. If so, does the modal time change over the years (e.g movie theatre production runs vs video tape vs internet streaming times)? Or to put it another way, has our ability to maintain attention actually altered as distribution means have changed?
Fake Name
Dear Fake Name,
For a moment I was confused when I read this, because I so wanted it to be another question entirely, which it really isn’t.
Namely, I was wanting it to be a question of why women’s orgasms in porn never actually happen. I have never once in my entire porn-watching adult life seen a real woman have a real orgasm. WTF? Discussion needs to ensue here, it’s very messed up. Just to be clear, it is possible to see real female orgasms but you have to look for them, and they aren’t really considered mainstream porn.
Now that I know you’re talking about men orgasming, I have the following response: who cares?
Fake questions deserve fake answer,
Aunt Pythia
——
Dear Aunt Pythia,
Help!!! An all male cast of Mathdinosaurs sat on stage at the May 2013 Math Graduation. I wanted to puke from their smugness. We need a token alpha female mathematician here! Will you ask Mathbabe to speak here, please? Can she talk about how amazing little Mathbabes are and make the Mathdinosaurs cry? At least a little?
Will you ask Mathbabe to deliver the commencement address at Berkeley Math graduation in May 2014?
Puking and in need of rehydration
Dear Puking,
I included your question even though it’s not about sex because you’ve invented the phrase “token alpha female” which needed to happen. Also you referred to the Mathbabe in the third person, which she always appreciates.
I’m pretty sure she’d say yes if asked, she loves Berkeley! And she also loves talking about young female mathematicians and how awesome they are.
Aunt Pythia
——
Dear Aunt Pythia,
I recently saw the following statistic:
The survey also questioned students about their sex lives, finding that 72 percent enroll at Harvard as virgins and 27 percent graduate without having sex.
Surely this can’t be right!
Dance Off Pants Off
Dear DOPO,
Wait, why? Does it surprise you that quite a few people start having sex whilst in college? Or does it surprise you that not everyone has had sex by the time they leave? Or are you reading it incorrectly? Note it says: 72% of people didn’t enter actively sexing it up, 27% of people left not actively sexing it up. There’s no contradiction in terms here.
As an aside: my experience while a resident tutor at Harvard, after being an undergrad at UC Berkeley, was that those Ivy League students could really do with some more sex. It might relax them a bit – too stressed out by far.
Again, this is not the question I was hoping for, though. I was hoping for someone to ask me about how it’s possible that on average, when polled, straight men have more sexual partners than the average straight woman. Someone please ask me that, because it’s one of my favorite subjects in statistics.
Aunt Pythia
——
Dear Aunt Pythia,
I’ve started experimenting with some kink stuff—nothing too crazy, but sometimes I have rope marks or bruises on my ass. I’m still doing vanilla dating, though. What do I do to explain the marks/bruises when I get intimate with a vanilla guy? Thanks!
Boldly Daringly Sexually Mixing-it-up
Dear BDSM,
Yes! Yes! YES!!! Finally a straight up sex question. I thank you from the bottom of my heart.
I have asked a bunch of my favorite kinky people this question in the last couple of days and I’ve gotten a pretty consistent response. I will put them all together in a kind of decision-tree format just to be incredibly nerdy:
1) Only explain it if he asks.
2) If he asks, depending on your mood and how much you enjoy fucking with him and/or how worried you are about his reaction, you might either just tell him the truth outright or you might want to ask him “do you really want to know?”
3) If he answers to that “No I guess not”, depending on how you feel you might want to say, “Oh it’s just from playing rugby”, which for whatever reason seems to be a catch-all explanation of any bodily harm.
4) If he answers “I’m interested in knowing” then tell him the truth outright.
5) Important: when you tell him the truth, it has to be like you’re sharing an awesome thing which he’s lucky to know about. Don’t act ashamed of your kinks, because his reaction to it will be very dependent on how you present it. In other words, talk about it like it’s a secret Star Trek series that nobody’s ever heard about but which is now on Netflix.
I hope that helps!
Aunt Pythia
——
Please submit your sex or data science or other question to Aunt Pythia!
Technocrats and big data
Today I’m finally getting around to reporting on the congressional subcommittee I went to a few weeks ago on big data and analytics. Needless to say it wasn’t what I’d hoped.
My observations are somewhat disjointed, since there was no coherent discussion, so I guess I’ll just make a list:
- The Congressmen and women seem to know nothing more about the “Big Data Revolution” than what they’d read in the now-famous McKinsey report which talks about how we’ll need 180,000 data scientists in the next decade and how much money we’ll save and how competitive it will make our country.
- In other words, with one small exception I’ll discuss below, the Congresspeople were impressed, even awed, at the intelligence and power of the panelists. They were basically asking for advice on how to let big data happen on a bigger and better scale. Regulation never came up, it was all about, “how do we nurture this movement that is vital to our country’s health and future?”
- There were three useless panelists, all completely high on big data and making their money being like that. First there was a schmuck from the NSF who just said absolutely nothing, had been to a million panels before, and was simply angling to be invited to yet more.
- Next there was a guy who had started training data-ready graduates in some masters degree program. All he ever talked about is how programs like his should be funded, especially his, and how he was talking directly with employers in his area to figure out what to train his students to know.
- It was especially interesting to see how this second guy reacted when the single somewhat thoughtful and informed Congressman, whose name I didn’t catch because he came in and left quickly and his name tag was minuscule, asked him whether he taught his students to be skeptical. The guy was like, I teach my students to be ready to deal with big data just like their employers want. The congressman was like, no that’s not what I asked, I asked whether they can be skeptical of perceived signals versus noise, whether they can avoid making huge costly mistakes with big data. The guy was like, I teach my students to deal with big data.
- Finally there was the head of IBM Research who kept coming up with juicy and misleading pro-data tidbits which made him sound like some kind of saint for doing his job. For example, he brought up the “premature infants are being saved” example I talked about in this post.
- The IBM guy was also the only person who ever mentioned privacy issues at all, and he summarized his, and presumably everyone else’s position on this subject, by saying “people are happy to give away their private information for the services they get in return.” Thanks, IBM guy!
- One more priceless moment was when one of the Congressmen asked the panel if industry has enough interaction with policy makers. The head of IBM Research said, “Why yes, we do!” Thanks, IBM guy!
I was reminded of this weird vibe and power dynamic, where an unchallenged mysterious power of big data rules over reason, when I read this New York Times column entitled Some Cracks in the Cult of Technocrats (hat tip Suresh Naidu). Here’s the leading paragraph:
We are living in the age of the technocrats. In business, Big Data, and the Big Brains who can parse it, rule. In government, the technocrats are on top, too. From Washington to Frankfurt to Rome, technocrats have stepped in where politicians feared to tread, rescuing economies, or at least propping them up, in the process.
The column was written by Chrystia Freeland and it discusses a recent paper entitled Economics versus Politics: Pitfalls of Policy Advice by Daron Acemoglu from M.I.T. and James Robinson from Harvard. A description of the paper from Freeland’s column:
Their critique is not the standard technocrat’s lament that wise policy is, alas, politically impossible to implement. Instead, their concern is that policy which is eminently sensible in theory can fail in practice because of its unintended political consequences.
In particular, they believe we need to be cautious about “good” economic policies that have the side effect of either reinforcing already dominant groups or weakening already frail ones.
“You should apply double caution when it comes to policies which will strengthen already powerful groups,” Dr. Acemoglu told me. “The central starting point is a certain suspicion of elites. You really cannot trust the elites when they are totally in charge of policy.”
Three examples they discuss in the paper: trade unions, financial deregulation in the U.S., and privatization in Russia. In each case, something economists suggested would make the system better also acted to reinforce the power of already powerful people.
If there’s one thing I might infer from my trip to Washington, it’s that the technocrats in charge nowadays, whose advice is being followed, may have subtly shifted away from deregulation economists and towards big data folks. Not that I’m holding my breath for Bob Rubin to be losing his grip any time soon.
Left Forum panels next weekend: #OWS Alt Banking meeting and a debate with Doug Henwood
Next weekend at Pace University in New York City I’ll be taking part in two panels at the Left Forum, a yearly conference of progressives that everybody who’s anybody seems to know about, although this will be my first year there. For example Noam Chomsky is coming this year.
First, from noon til 1:40 on Saturday June 8th, I’ll be debating how to shrink the financial sector with Doug Henwood, author of Wall Street: how it works and for whom. The panel will be moderated by my buddy Suresh Naidu, an occupier profiled in the Huffington Post. The announcement for this panel is here and includes room information.
Second, from 3:40 til 5:20, also on Saturday June 8th, I’ll be facilitating a meeting of the Alternative Banking group of OWS, which will be loads of fun. The idea is to explain to the panel audience how we roll in Alt Banking, to have a discussion about breaking up the banks, and to get the audience to participate as well. We expect them to enjoy getting on stack. The announcement for this panel is here, please come!
Registration for the Left Forum is still open and is affordable. Go here to register, and see you next weekend!
Huge fan of citibikes
In spite of the nasty corporate connection to megabank Citigroup, I’m a huge fan of the new bike share program in downtown Manhattan and Brooklyn. I got my annual membership for $95 last week, activated it online, and already used it three times yesterday even though it was raining the whole time.
It helps that I work on 21st street near 6th avenue, right by one of the 300 stations set up with bikes so far. I biked downtown along Broadway to NYU to have lunch with Johan, and since we’d walked along Bleecker Street for some distance, I grabbed a bike from a different station on the way up along 6th.
Then later in the day I was meeting someone at Bryant Park so I biked up there, getting ridiculously wet but being super efficient. Now you know where my priorities are.
Here’s the map I’ve been staring at for the past week. It’s interactive, but just to give you an idea I captured a screenshot:
Friday I’m meeting my buddy Kiri near her work in downtown Brooklyn for lunch. Yeah!!
Sign up today, people!
New Jersey at risk of implementing untested VAM-like teacher evaluation model
This is a guest post by Eugene Stern.
A big reason I love this blog is Cathy’s war on crappy models. She has posted multiple times already about the lousy performance of models that rate teachers based on year-to-year changes in student test scores (for example, read about it here). Much of the discussion focuses on the model used in New York City, but such systems have been, or are being, put in place all over the country. I want to let you know about the version now being considered for use across the river, in New Jersey. Once you’ve heard more, I hope you’ll help me try to stop it.
VAM Background
A little background if you haven’t heard about this before. Because it makes no sense to rate teachers based on students’ absolute grades or test scores (not all students start at the same place each year), the models all compare students’ test scores against some baseline. The simplest thing to do is to compare each student’s score on a test given at the end of the school year against their score on a test given at the end of the previous year. Teachers are then rated based on how much their students’ scores improved over the year.
Comparing with the previous year’s score controls for the level at which students start each year, but not for other factors beside the teacher that affect how much they learn. This includes attendance, in-school environment (curriculum, facilities, other students in the class), out-of-school learning (tutoring, enrichment programs, quantity and quality of time spent with parents/caregivers), and potentially much more. Fancier models try to take these into account by comparing each student’s end of year score with a predicted score. The predicted score is based both on the student’s previous score and on factors like those above. Improvement beyond the predicted score is then attributed to the teacher as “value added” (hence the name “value-added models,” or VAM) and turned into a teacher rating in some way, often using percentiles. One such model is used to rate teachers in New York City.
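To make the recipe concrete, here’s a minimal sketch (not any particular city’s model; the factors and numbers are invented): fit a predicted score from prior score and observable factors, then call each teacher’s average residual their ‘value added’.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
prior = rng.normal(70, 10, n)          # last year's score
attendance = rng.uniform(0.8, 1.0, n)  # one non-teacher factor
teacher = rng.integers(0, 20, n)       # each student assigned to a teacher
score = 0.8 * prior + 20 * attendance + rng.normal(0, 5, n)

# Predicted score via least squares on the observable factors.
X = np.column_stack([np.ones(n), prior, attendance])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
residual = score - X @ beta            # leftover, attributed to the teacher

for t in range(3):
    print(f"teacher {t}: 'value added' = {residual[teacher == t].mean():+.2f}")
```

Note that in this fake data the true teacher effect is zero, so any spread in those printed ratings is pure noise, which previews the discussion below.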
It’s important to understand that there is no single value-added model, rather a family of them, and that the devil is in the details. Two different teacher rating systems, based on two models of the predicted score, may perform very differently – both across the board, and in specific locations. Different factors may be more or less important depending on where you are. For example, income differences may matter more in a district that provides few basic services, so parents have to pay to get extracurriculars for their kids. And of course the test itself matters hugely as well.
Testing the VAM models
Teacher rating models based on standardized tests have been around for 25 years or so, but two things have happened in the last decade:
- Some people started to use the models in formal teacher evaluation, including tenure decisions.
- Some (other) people started to test the models.
This did not happen in the order that one would normally like. Wanting to make “data-driven decisions,” many cities and states decided to start rating teachers based on “data” before collecting any data to validate whether that “data” was any good. This is a bit like building a theoretical model of how cancer cells behave, synthesizing a cancer drug in the lab based on the model, distributing that drug widely without any trials, then waiting around to see how many people die from the side effects.
The full body count isn’t in yet, but the models don’t appear to be doing well so far. To look at some analysis of VAM data in New York City, start here and here. Note: this analysis was not done by the city but by individuals who downloaded the data after the city had to make it available because of disclosure laws.
I’m not aware of any study on the validity of NYC’s VAM ratings done by anyone actually affiliated with the city – if you know of any, please tell me. Again, the people preaching data don’t seem willing to actually use data to evaluate the quality of the systems they’re putting in place.
Assuming you have more respect for data than the mucky-mucks, let’s talk about how well the models actually do. Broadly, two ways a model can fail are being biased and being noisy. The point of the fancier value-added models is to try to eliminate bias by factoring in everything other than the teacher that might affect a student’s test score. The trouble is that any serious attempt to do this introduces a bunch of noise into the model, to the degree that the ratings coming out look almost random.
You’d think that a teacher doesn’t go from awful to great or vice versa in one year, but the NYC VAM ratings show next to no correlation in a teacher’s rating from one year to the next. You’d think that a teacher either teaches math well or doesn’t, but the NYC VAM ratings show next to no correlation in a teacher’s rating teaching a subject to one grade and their rating teaching it to another – in the very same year! (Gary Rubinstein’s blog, linked above, documents these examples, and a number of others.) Again, this is one particular implementation of a general class of models, but using such noisy data to make significant decisions about teachers’ careers seems nuts.
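A back-of-the-envelope simulation shows how that happens (variances invented): if each year’s rating is a stable teacher effect plus independent classroom-level noise, and the noise dominates, year-over-year correlation is tiny even though the model is unbiased.

```python
import numpy as np

rng = np.random.default_rng(2)
teachers = 1_000
effect = rng.normal(0, 1, teachers)          # stable teacher quality
year1 = effect + rng.normal(0, 3, teachers)  # noisy rating, year 1
year2 = effect + rng.normal(0, 3, teachers)  # noisy rating, year 2
print(np.corrcoef(year1, year2)[0, 1])       # about 0.1: near random
```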
What’s happening in New Jersey
With all this as background, let’s turn to what’s happening in New Jersey.
You may be surprised that the version of the model proposed by Chris Christie‘s administration (the education commissioner is Christie appointee Chris Cerf, who helped put VAM in place in NYC) is about the simplest possible. There is no attempt to factor out bias by trying to model predicted scores, just a straight comparison between this year’s standardized test score and last year’s. For an overview, see this.
In more detail, the model groups together all students with the same score on last year’s test, and represents each student’s progress by their score on this year’s test, viewed as a percentile across this group. That’s it. A fancier version uses percentiles calculated across all students with the same score in each of the last several years. These can’t be calculated explicitly (you may not find enough students that got exactly the same score each the last few years), so they are estimated, using a statistical technique called quantile regression.
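Here’s a minimal sketch of the simple version, with fabricated scores (the fancier multi-year version would replace the groupby with quantile regression, e.g. statsmodels’ QuantReg):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "last_year": rng.integers(200, 301, 2_000),  # prior-year scaled score
    "this_year": rng.integers(200, 301, 2_000),  # current-year scaled score
})

# SGP: percentile of this year's score among all students who had
# the same score last year.
df["sgp"] = df.groupby("last_year")["this_year"].rank(pct=True) * 100
print(df.head())
```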
By design, both the simple and the fancy version ignore everything about a student except their test scores. As a modeler, or just as a human being, you might find it silly not to distinguish a fourth grader in a wealthy suburb who scored 600 on a standardized test from a fourth grader in the projects with the same score. At least, I don’t know where to find a modeler who doesn’t find it silly, because nobody has bothered to study the validity of using this model to rate teachers. If I’m wrong, please point me to a study.
Politics and SGP
But here we get into the shell game of politics, where rating teachers based on the model is exactly the proposal that lies at the end of an impressive trail of doubletalk. Follow the bouncing ball.
These models, we are told, differ fundamentally from VAM (which, I suspect, is now seen as politically damaged goods). While VAM tried to isolate the teacher’s contribution, these models do no such thing – they simply measure student progress from year to year, which, after all, is what we truly care about. The models have even been rebranded with a new name: student growth percentiles, or SGP. SGP is sold as just describing student progress rather than attributing it to teachers (there can’t be any harm in that, right?), and as nothing that needs validation, either. And because SGP is such a clean methodology, if you’re looking for a data-driven model to use for broad “educational assessment,” don’t get yourself into that whole VAM morass – use SGP instead!
Only before you know it, educational assessment turns into, you guessed it, rating teachers. That’s right: because these models aren’t built to rate teachers, they can focus on the things that really matter (student progress), and thus end up being – wait for it – much better for rating teachers! War is peace, friends. Ignorance is strength.
Creators of SGP
You can find a good discussion of SGP’s and their use in evaluation here, and a lot more from the same author, the impressively prolific Bruce Baker, here. Here’s a response from the creators of SGP. They maintain that information about student growth is useful (duh), and agree that differences in SGP’s should not be attributed to teachers (emphasis mine):
Large-scale assessment results are an important piece of evidence but are not sufficient to make causal claims about school or teacher quality.
SGP and teacher evaluations
But guess what?
The New Jersey Board of Ed and state education commissioner Cerf are putting in place a new teacher evaluation code, to be used this coming academic year and beyond. You can find more details here and here.
Summarizing: for math and English teachers in grades 4-8, 30% of their annual evaluation next year would be mandated by the state to come from those very same SGP’s that, according to their creators, are not sufficient to make causal claims about teacher quality. These evaluations are the primary input in tenure decisions, and can also be used to take away tenure from teachers who receive low ratings.
The proposal is not final, but is fairly far along in the regulatory approval process, and would become final in the next several months. In a recent step in the approval process, the weight given to SGP’s in the overall evaluation was reduced from 35% to 30%. However, the 30% weight applies next year only, and in the future the state could increase the weight to as high as 50%, at its discretion.
Modeler’s Notes
Modeler’s Note #1: the precise weight doesn’t really matter. If the SGP scores vary a lot, and the other components don’t vary very much, SGP scores will drive the evaluation no matter what their weight.
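A quick fabricated demo of that point: even at a nominal 10% weight, a high-variance component can dominate the composite ranking.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
sgp = rng.normal(50, 20, n)   # high-variance component
other = rng.normal(50, 2, n)  # low-variance components, e.g. observations

for w in [0.5, 0.3, 0.1]:
    total = w * sgp + (1 - w) * other
    print(f"SGP weight {w:.1f}: corr(total, SGP) = "
          f"{np.corrcoef(total, sgp)[0, 1]:.2f}")
```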
Modeler’s Note #2: just reminding you again that this data-driven framework for teacher evaluation is being put in place without any data-driven evaluation of its effectiveness. And that this is a feature, not a bug – SGP has not been tested as an attribution tool because we keep hearing that it’s not meant to be one.
In a slightly ironic twist, commissioner Cerf has responded to criticisms that SGP hasn’t been tested by pointing to a Gates Foundation study of the effectiveness of… value-added models. The study is here. It draws pretty positive conclusions about how well VAM’s work. A number of critics have argued, pretty effectively, that the conclusions are unsupported by the data underlying the study, and that the data actually shows that VAM’s work badly. For a sample, see this. For another example of a VAM-positive study that doesn’t seem to stand up to scrutiny, see this and this.
Modeler’s Role Play #1
Say you were the modeler who had popularized SGP’s. You’ve said that the framework isn’t meant to make causal claims, then you see New Jersey (and other states too, I believe) putting a teaching evaluation model in place that uses SGP to make causal claims, without testing it first in any way. What would you do?
So far, the SGP mavens who told us that “Large-scale assessment results are an important piece of evidence but are not sufficient to make causal claims about school or teacher quality” remain silent about the New Jersey initiative, as far as I know.
Modeler’s Role Play #2
Now you’re you again, and you’ve never heard about SGP’s and New Jersey’s new teacher evaluation code until today. What do you do?
I want you to help me stop this thing. It’s not in place yet, and I hope there’s still time.
I don’t think we can convince the state education department on the merits. They’ve made the call that the new evaluation system is better than the current one or any alternatives they can think of, they’re invested in that decision, and we won’t change their minds directly. But we can make it easier for them to say no than to say yes. They can be influenced – by local school administrators, state politicians, the national education community, activists, you tell me who else. And many of those people will have more open minds. If I tell you, and you tell the right people, and they tell the right people, the chain gets to the decision makers eventually.
I don’t think I could convince Chris Christie, but maybe I could convince Bruce Springsteen if I met him, and maybe Bruce Springsteen could convince Chris Christie.
VAM-anifesto
I thought we could start with a manifesto – a direct statement from the modeling community explaining why this sucks. Directed at people who can influence the politics, and signed by enough experts (let’s get some big names in there) to carry some weight with those influencers.
Can you help? Help write it, sign it, help get other people to sign it, help get it to the right audience. Know someone whose opinion matters in New Jersey? Then let me know, and help spread the word to them. Use Facebook and Twitter if it’ll help. And don’t forget good old email, phone calls, and lunches with friends.
Or, do you have a better idea? Then put it down. Here. The comments section is wide open. Let’s not fall back on criticizing the politicians for being dumb after the fact. Let’s do everything we can to keep them from doing this dumb thing in the first place.
Shame on us if we can’t make this right.
Aunt Pythia’s advice: fake sex, boredom, peeing in the toilet, knitting Klein bottles, and data project management
If you don’t know what you’re in for, go here for past advice columns and here for an explanation of the name Pythia. Most importantly,
Please submit your questions for Aunt Pythia at the bottom of this column!
——
Dear Aunt Pythia,
Do you prefer that we ask you fake [sex questions] or [fake sex] questions? From your website it seems that you prefer the former, but would you also be amused by the latter?
Fakin’ Bacon
Dear Fakin,
I can’t tell, because I’ve gotten neither kind (frowny face).
If I started getting a bunch then I could do some data collecting on the subject. If I had to guess I’d go with the latter though.
Bring it on!
Aunt Pythia
——
Aunt Pythia,
Are boredom and intelligence correlated?
Bored
Dear Bored,
It has been my fantasy for quite a few years to be bored. Hasn’t happened. All I can conclude from my own experience is that being a working mother of three, blogger, knitting freak, and activist is not correlated to boredom.
Aunt P
——
Dear Aunt Pythia,
How can I get my husband to pee IN the toilet?
Pee I Shouldn’t See Ever, Dammit
Dear PISSED,
Start by asking him to be in charge of cleaning the bathroom. If that’s insufficient ask him to sit down to pee – turns out men can do that. If he’s unwilling, suggest that you’re going to pee standing up now for women’s lib reasons (whilst he’s still in charge of cleaning the bathroom).
Hope I helped!
Aunt Pythia
——
Dear Aunt Pythia,
How can I get my wife to stop nagging me about peeing in the toilet?
Isaac Peter Freely
Dear I.P. Freely,
Look for a nearby gas station and do your business there. That’ll shut her up.
Auntie P
——
Dear Aunt Pythia,
Is there any reason I should bother knitting a Klein Bottle? Isn’t just knowing I could do it enough? Or would it actually impress (or give pleasure to) others?
Procrastinating Parametricist
Dear PP,
If you’re looking for an excuse to knit a Klein Bottle, find a high school math teacher that would be psyched to use it as an exhibit for their class.
If you’re trying to understand how to rationalize the act of knitting anything ever, give up immediately, it makes no sense. We knitters do it because we love doing it.
Love and kisses,
Aunt Pythia
——
Dear Aunt Pythia,
I’m a Data Scientist (or a Business Analyst pretending to be a Data Scientist) and I’m the leader of a small team at the company I work for. We have to analyse data, fit models and so on. I’m struggling right now trying to figure out the best way to manage our analysis.
I’m reading some stuff related to project management, and some stuff related to Scrum. However, at least for now, I don’t think they exactly fit our needs. Scrum seems great for software development, but I’m not so sure it works well for model development or statistical prototyping. Do you have any ideas on this? Should I just try Scrum anyway?
Typically, most of our projects begin with some loose requirements (we want to understand this and that, or to predict this and that, or to learn the causal effect of this and that). Then we get some data, spend some time cleaning and aggregating it, do some descriptive analysis, fit some models, and then prepare to present our results. I always have in mind what our results will look like, but something I didn’t expect always intervenes.
Say I’m calculating the size of a control group, and then I realize my variables of interest aren’t normally distributed, so I have to adapt the way we compute the control group’s sample size. Then either I do a rough calculation based on assumptions of normality, or we study and adopt new ways to better approximate our data (say, using a lognormal distribution). Either way, I’ll probably delay our results or deliver results of inferior quality.
So, my question is, do you know of any software or methodology to use with data science or data analysis in the same ways as there is Scrum for software development?
Brazilian (fake?) Data Scientist
Dear B(f)DS,
I agree with you, data projects aren’t the same kettle of fish as engineering projects. By their very nature they take whimsical turns that can’t be drawn up beforehand.
Even so, I think forcing oneself to break down the steps of a data project can be useful, and for that reason I like using project management tools when I do data projects – not that they give me a perfect estimate of time til completion, which they won’t, but they give me a sense of the project’s trajectory.
It helps, for example, if I say something like, “I’ll try to be done with exploratory data analysis by the end of the second day.” Otherwise I might just keep doing that without really getting much in return, but if I know I only have two days to squeeze out the juice, I’ll be more thoughtful and targeted in my work.
The other thing about using those tools is that upper-level managers love them. I think they love them so much that it’s worth using them even knowing they will be inaccurate in terms of time, because it makes people feel more in control. And actually being inaccurate doesn’t mean they’re meaningless – there’s more information in those estimates than in nothing.
Finally, one last thing that’s super useful about those tools is that, if your data team is being overloaded with work, you can use the tool to push back. So if someone is giving you a new project, you can point to all the other projects you already have and say, “these are all the projects that won’t be getting done if I take this one on.” Make the tool work for you!
To sum up, I say you try Scrum. After a few projects you can start doing a data analysis on Scrum itself, estimating how much of a time fudge factor you should add to each estimate due to unforeseen data issues – something like the sketch below.
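Here’s a hypothetical back-of-the-envelope version of that analysis in Python – the numbers are completely made up, but you get the idea:

# Made-up numbers: sprint estimates vs. what actually happened.
estimated_days = [2, 5, 3, 8, 4]
actual_days = [3, 9, 4, 15, 5]

# Per-project overrun ratio, then the average as a fudge factor.
ratios = [a / e for a, e in zip(actual_days, estimated_days)]
fudge_factor = sum(ratios) / len(ratios)
print(f"average fudge factor: {fudge_factor:.2f}")  # about 1.55 here
# Multiply future estimates by this before putting them on the board.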
I hope that’s helpful,
Aunt Pythia
——
Please submit your question to Aunt Pythia!
The Bounded Gaps Between Primes Theorem has been proved
There’s really exciting news in the world of number theory, my old field. I heard about it last month but it just hit the mainstream press.
Namely, mathematician Yitang Zhang just proved that there are infinitely many pairs of primes that differ by at most 70,000,000. His proof is available here and, unlike Mochizuki’s claim of a proof of the ABC Conjecture, this has already been understood and confirmed by the mathematical community.
Go take a look at number theorist Emmanuel Kowalski’s blog post on the subject if you want to understand the tools Zhang used in his proof.
Also, my buddy and mathematical brother Jordan Ellenberg has an absolutely beautiful article in Slate explaining why mathematicians believed this theorem had to be true, due to the extent to which we can consider prime numbers to act as if they are “randomly distributed.” My favorite passage from Jordan’s article:
It’s not hard to compute that, if prime numbers behaved like random numbers, you’d see precisely the behavior that Zhang demonstrated. Even more: You’d expect to see infinitely many pairs of primes that are separated by only 2, as the twin primes conjecture claims.
(The one computation in this article follows. If you’re not onboard, avert your eyes and rejoin the text where it says “And a lot of twin primes …”)
Among the first N numbers, about N/log N of them are primes. If these were distributed randomly, each number n would have a 1/log N chance of being prime. The chance that n and n+2 are both prime should thus be about (1/log N)^2. So how many pairs of primes separated by 2 should we expect to see? There are about N pairs (n, n+2) in the range of interest, and each one has a (1/log N)^2 chance of being a twin prime, so one should expect to find about N/(log N)^2 twin primes in the interval.
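If you want to see the heuristic in action, here’s a quick sanity check in Python (my sketch, not Jordan’s): sieve the primes up to N, count the twin pairs, and compare to N/(log N)^2. The naive estimate gets the order of magnitude right, though it’s off by a constant factor – the true asymptotic involves the twin prime constant.

from math import log

def prime_sieve(n):
    # Sieve of Eratosthenes: sieve[k] is True iff k is prime.
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return sieve

N = 1_000_000
sieve = prime_sieve(N)

# Count actual twin prime pairs (p, p + 2) with p + 2 <= N.
twins = sum(1 for p in range(2, N - 1) if sieve[p] and sieve[p + 2])

# The naive estimate: ~N pairs, each prime "with probability 1/log N".
heuristic = N / log(N) ** 2

print(f"actual twin prime pairs up to {N:,}: {twins:,}")
print(f"naive heuristic N/(log N)^2: {heuristic:,.0f}")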
Congratulations!
Fight back against surveillance using TrackMeNot, TrackMeNot mobile?
After two days of travelling to the west coast and back, I’m glad to be back to my blog (and, of course, my coffee machine, which is the real source of my ability to blog every morning without distraction: it makes coffee at the push of a button, and that coffee has a delicious amount of caffeine).
Yesterday at the hotel I grabbed a free print edition of the Wall Street Journal to read on the plane, and I was super interested in this article called Phone Firm Sells Data on Customers. They talk about how phone companies (Verizon, specifically) are selling location data and browsing data about customers, how some people might be creeped out by this, and then they say:
The new offerings are also evidence of a shift in the relationship between carriers and their subscribers. Instead of merely offering customers a trusted conduit for communication, carriers are coming to see subscribers as sources of data that can be mined for profit, a practice more common among providers of free online services like Google Inc. and Facebook Inc.
Here’s the thing. It’s one thing to make a deal with the devil when I use Facebook: you give me something free, in return I let you glean information about me. But in terms of Verizon, I pay them like $200 per month for my family’s phone usage. That’s not free! Fuck you guys for turning around and selling my data!
And how are marketers going to use such location data? They will know how desperate you are for their goods and charge you accordingly. Like this for example, but on a much wider scale.
There are two things I can do to object to this practice. First, I can write this post and others, railing against such needless invasions of privacy. Second, I can go to Verizon, my phone company, and get myself off the list. The instructions for doing so seem to be here, but I haven’t actually followed them yet.
Here’s the third option I wish existed: a mobile version of TrackMeNot, which I learned about last week from Annelies Kamran.
TrackMeNot, created by Daniel C. Howe and Helen Nissenbaum at what looks like the CS department of NYU, confuses the data gatherers by giving them an overload of bullshit information.
Specifically, it’s a Firefox add-on that sends your browser to all sorts of websites while you’re not actually using it. The data gatherers get endlessly confused about what kind of person you actually are, thereby fucking up the whole personal data information industry.
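To give a flavor of the idea, here’s a toy sketch in Python. I haven’t read the add-on’s source (the real thing is a Firefox extension written in JavaScript), and the word list and search URL below are placeholders I made up, not TrackMeNot’s:

import random
import time
import urllib.parse
import urllib.request

# Placeholder vocabulary for generating meaningless queries.
DECOY_WORDS = ["gardening", "mortgages", "llamas", "carburetors",
               "knitting", "volcanoes", "jazz", "sneakers", "orchids"]

def send_decoy_query():
    # Fire off one random, meaningless search to muddy the profile.
    phrase = " ".join(random.sample(DECOY_WORDS, 2))
    url = "https://search.example.com/?q=" + urllib.parse.quote(phrase)
    try:
        urllib.request.urlopen(url, timeout=5)
    except OSError:
        pass  # decoys are fire-and-forget

while True:
    send_decoy_query()
    time.sleep(random.uniform(30, 300))  # irregular gaps look more human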
I have had this idea in the past, and I’m super happy it already exists. Now can someone do it for mobile please? Or even better, tell me it already exists?
Mr. Ratings Reformer Goes to Washington: Some Thoughts on Financial Industry Activism
This is a guest post by Marc Joffe, the principal consultant at Public Sector Credit Solutions, an organization that provides data and analysis related to sovereign and municipal securities. Previously, Joffe was a Senior Director at Moody’s Analytics for more than a decade.
Note to readers: for a bit of background on the SEC Credit Ratings Roundtable and the Franken Amendment see this recent mathbabe post.
I just returned from Washington after participating in the SEC’s Credit Ratings Roundtable. The experience was very educational, and I wanted to share what I’ve learned with readers interested in financial industry reform.
First and foremost, I learned that the Franken Amendment is dead. While I am not a proponent of this idea – under which the SEC would have set up a ratings agency assignment authority – I do welcome its intentions and mourn its passing. Thus, I want to take some time to explain why I think this idea is dead, and what financial reformers need to do differently if they want to see serious reforms enacted.
The Franken Amendment, as revised by the Dodd Frank conference committee, tasked the SEC with investigating the possibility of setting up a ratings assignment authority and then executing its decision. Within the SEC, the responsibility for Franken Amendment activities fell upon the Office of Credit Ratings (OCR), a relatively new creature of the 2006 Credit Rating Agency Reform Act.
OCR circulated a request for comments – posting the request on its web site and in the federal register – a typical SEC procedure. The majority of serious comments OCR received came from NRSROs and others with a vested interest in perpetuating the status quo or some close approximation thereof. Few comments came from proponents of the Franken Amendment, and some of those that did were inarticulate (e.g., a note from Joe Sixpack of Anywhere, USA saying that rating agencies are terrible and we just gotta do something about them).
OCR summarized the comments in its December 2012 study of the Franken Amendment. Progressives appear to have been shocked that OCR’s work product was not an originally-conceived comprehensive blueprint for a re-imagined credit rating business. Such an expectation is unreasonable. SEC regulators sit in Washington and New York, not Silicon Valley. There is little upside and plenty of political downside to taking major risks. Regulators are also heavily influenced by the folks they regulate, since these are the people they talk to on a day-to-day basis.
Political theorists Charles Lindblom and Aaron Wildavsky developed a theory that explains the SEC’s policymaking process quite well: it is called incrementalism. Rather than implement brand new ideas, policymakers prefer to make marginal changes by building upon and revising existing concepts.
While I can understand why Progressives think the SEC should “get off its ass” and really fix the financial industry, their critique is not based in the real world. The SEC is what it is. It will remain under budget pressure for the foreseeable future because campaign donors want to restrict its activities. Staff will always be influenced by financial industry players, and out-of-the-box thinking will be limited by the prevailing incentives.
Proponents of the Franken Amendment and other Progressive reforms have to work within this system to get their reforms enacted. How? The answer is simple: when a request for comment arises they need to stuff the ballot box with varied and well-informed letters supporting reform. The letters need to place proposed reforms within the context of the existing system, and respond to anticipated objections from status quo players. If 20 Progressive academics and Occupy-leaning financial industry veterans had submitted thoughtful, reality-based letters advocating the Franken Amendment, I believe the outcome would have been very different. (I should note that Occupy the SEC has produced a number of comment letters, but they did not comment on the Franken Amendment and I believe they generally send a single letter.)
While the Franken Amendment may be dead, I am cautiously optimistic about the lifecycle of my own baby: open source credit rating models. I’ll start by explaining how I ended up on the panel and then conclude by discussing what I think my appearance achieved.
The concept of open source credit rating models is extremely obscure. I suspect that no more than a few hundred people worldwide understand this idea, and fewer than a dozen have any serious investment in it. Your humble author and one person on his payroll are probably the world’s only two people who dedicated more than 100 hours to this concept in 2012.
That said, I do want to acknowledge that the idea of open source credit rating models is not original to me – although I was not aware of other advocacy before I embraced it. Two Bay Area technologists started FreeRisk, a company devoted to open source risk models, in 2009. They folded the company without releasing a product and went on to more successful pursuits. FreeRisk left a “paper” trail for me to find including an article on the P2P Foundation’s wiki. FreeRisk’s founders also collaborated with Cate Long, a staunch advocate of financial markets transparency, to create riski.us – a financial regulation wiki.
In 2011, Cathy O’Neil (a.k.a. Mathbabe), an influential Progressive blogger with a quantitative finance background, ran a post about the idea of open source credit ratings, generating several positive comments. Cathy also runs the Alternative Banking group, an affiliate of Occupy Wall Street that attracts a number of financially literate activists.
I stumbled across Cathy’s blog while Googling “open source credit ratings”, sent her an email, had a positive phone conversation and got an invitation to address her group. Cathy then blogged about my open source credit rating work. This too was picked up on the P2P Foundation wiki, leading ultimately to a Skype call with the leader of the P2P Foundation, Michel Bauwens. Since then, Michel – a popularizer of progressive, collaborative concepts – has offered a number of suggestions about organizations to contact and made a number of introductions.
Most of my outreach attempts on behalf of this idea – either made directly or through an introduction – are ignored or greeted with terse rejections. I am not a proven thought leader, am not affiliated with a major research university and lack a resume that includes any position of high repute or authority. Consequently, I am only a half-step removed from the many “crackpots” that send around their unsolicited ideas to all and sundry.
Thus, it is surprising that I was given the chance to address the SEC Roundtable on May 14. The fact that I was able to get an invitation speaks well of the SEC’s process and is thus worth recounting. In October 2012, SEC Commissioner Dan Gallagher spoke at the Stanford Rock Center on Corporate Governance. He mentioned that the SEC was struggling with the task of implementing Dodd Frank Section 939A, which calls for the replacement of credit ratings in federal regulations, such as those that govern asset selection by money market funds.
After his talk, I pitched him the idea of open source credit ratings as an alternative creditworthiness standard that would satisfy the intentions of 939A. He suggested that I write to Tom Butler, head of the Office of Credit Ratings (OCR), and copy him. This led to a number of phone calls and ultimately a presentation to OCR staff in New York in January. The staff members who joined the meeting were engaged and asked good questions. I connected my proposal to an earlier SEC draft regulation which would have required structured finance issuers to publish cashflow waterfall models in Python – a popular open source language.
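For readers who have never seen one, here’s a toy example of what a published waterfall model in Python might look like. The tranche names, balances and rates are hypothetical, and real deal documents are vastly more complicated:

# A toy sequential-pay waterfall for a single payment period.
def run_waterfall(collections, tranches):
    cash = collections
    for t in tranches:  # interest first, in order of seniority
        due = t["balance"] * t["rate"]
        t["interest_paid"] = min(due, cash)
        cash -= t["interest_paid"]
    for t in tranches:  # then principal, same order
        t["principal_paid"] = min(t["balance"], cash)
        t["balance"] -= t["principal_paid"]
        cash -= t["principal_paid"]
    return cash  # whatever is left flows to the equity holder

tranches = [
    {"name": "Class A (senior)", "balance": 800.0, "rate": 0.05 / 12},
    {"name": "Class B (junior)", "balance": 200.0, "rate": 0.09 / 12},
]
residual = run_waterfall(collections=50.0, tranches=tranches)
for t in tranches:
    print(t["name"], "interest:", round(t["interest_paid"], 2),
          "principal:", round(t["principal_paid"], 2))
print("residual to equity:", round(residual, 2))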
I walked away from the meeting with the perception that, while they did not want to reinvent the industry, OCR staff were sincerely interested in new ideas that might create incremental improvements. That meeting led to my inclusion in the third panel of the Credit Ratings Roundtable.
For me, the panel discussion itself was mostly positive. Between the opening statement, questions and discussion, I probably had about 8 minutes to express my views. I put across all the points I hoped to make and even received a positive comment from one of the other panelists. On the downside, only one commissioner attended my panel – whereas all five had been present at the beginning of the day when Al Franken, Jules Kroll, Doug Peterson and other luminaries held the stage.
The roundtable generated less media attention than I expected, but I got an above average share of the limited coverage relative to the day’s other 25 panelists. The highlight was a mention in the Wall Street Journal in its pre-roundtable coverage.
Perhaps the fact that I addressed the SEC will make it easier for me to place op-eds and get speaking engagements to promote the open source ratings concept. Only time will tell. Ultimately, someone with a bigger reputation than mine will need to advocate this concept before it can progress to the next level.
Also, the idea is now part of the published record of SEC deliberations. The odds of it getting into a proposed regulation remain long in the near future, but these odds are much shorter than they were prior to the roundtable.
Political scientist John Kingdon coined the term “policy entrepreneurs” to describe people who look for and exploit opportunities to inject new ideas into the policy discussion. I like to think of myself as a policy entrepreneur, although I have a long way to go before I become a successful one. If you have read this far and also have strongly held beliefs about how the financial system should improve, I suggest you apply the concepts of incrementalism and policy entrepreneurship to your own activism.
Eben Moglen teaches us how not to be evil when data-mining
This is a guest post by Adam Obeng, a Ph.D. candidate in the Sociology Department at Columbia University. His work encompasses computational social science, social network analysis and sociological theory (basically anything which constitutes an excuse to sit in front of a terminal for inadvisably long periods of time). This post is Copyright Adam Obeng 2013 and licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Crossposted on adamobeng.com.
Eben Moglen’s delivery leaves you in no doubt as to the sincerity of this sentiment. Stripy-tied, be-hatted and pocket-squared, he took to the stage at last week’s IDSE Seminar Series event without slides, but with engaging – one might say, prosecutorial – delivery. Lest anyone doubt his neckbeard credentials, he let slip that he had participated in the development of almost certainly the first networked email system in the United States, as well as mentioning his current work for the Freedom Box Foundation and the Software Freedom Law Center.
A superorganism called humankind
The content was no less captivating than the delivery: we were invited to consider a world where every human consciousness is connected by an artificial extra-skeletal nervous system, linking everyone into a new superorganism. What we refer to as data science is the nascent study of flows of neural data in that network. And having access to the data will entirely transform what the social sciences can explain: we will finally have a predictive understanding of human behaviour, based not on introspection but on empirical science. It will do for the social sciences what Newton did for physics.
The reason the science of the nervous system – “this wonderful terrible art” – is optimised to study human behaviour is that consumption and entertainment are a large part of economic activity. The subjects of the network don’t own it. In a society which is more about consumption than production, the technology of economic power will be that which affects consumption. Indeed, what we produce becomes information about consumption, which is itself used to drive consumption. Moglen is matter-of-fact: this will happen, and is happening.
And it’s also ineluctable that this science will be used to extend the reach of political authority, and it has the capacity to regiment human behaviour completely. It’s not entirely deterministic that it should happen at a particular place and time, but extrapolation from history suggests that somewhere, that’s how it’s going to be used, that’s how it’s going to come out, because it can. Whatever is possible to engineer will eventually be done. And once it’s happened somewhere, it will happen elsewhere. Unlike the components of other super-organisms, humans possess consciousness. Indeed, it is the relationship between sociality and consciousness that we call the human condition. The advent of the human species-being threatens that balance.
The Oppenheimer moment
Moglen’s vision of the future is, as he describes it, both familiar and strange. But his main point is, as he puts it, very modest: unless you are sure that this future is absolutely 0% possible, you should engage in the discussion of its ethics.
First, when the network is wrapped around every human brain, privacy will be nothing more than a relic of the human past. He believes that privacy is critical to creativity and freedom, but really the assumption that privacy – the ability to make decisions independent of the machines – should be preserved is axiomatic.
What is crucial about privacy is that it is not personal, or even bilateral, but ecological: how others behave determines the meaning of the actions I take. As such, dealing with privacy requires an ecological ethics. It is irrelevant whether you consent to be delivered poisonous drinking water; we don’t regulate such resources by allowing individuals to make decisions about how unsafe they can afford their drinking water to be. Similarly, whether you opt in or opt out of being tracked online is irrelevant.
The existing questions of ethics that science has had to deal with – how to handle human subjects – are of no use here: informed consent is only sufficient when the risks of investigating a human subject apply only to that individual.
These ethical questions are for citizens, but perhaps even more so for those in the business of making products from personal information. Whatever goes on to be produced from your data will be trivially traced back to you. Whatever finished product you are used to make, you do not disappear from it. What’s more, the scientists are beholden to the very few secretive holders of data.
Consider, says Moglen, the question of whether punishment deters crime: there will be increasing amounts of data about it, but we’re not even going to ask – because no advertising sale depends on it. Consider also the prospect of machines training humans, which is already beginning to happen. The Coursera business model is set to do to the global labour market what Google did to the global advertising market: auctioning off the good learners, found via their learning patterns, to employers. Granted, defeating ignorance on a global scale is within grasp. But there are still ethical questions here, and evil is ethics undealt with.
One of the criticisms often levelled at techno-utopians is that the enabling power of technology can very easily be stymied by human factors – the politics, the constants of our species – which cannot be overwritten by mere scientific progress. Moglen could perhaps be called a techno-dystopian, but he has recognised that while the technology is coming, inevitably, how it will affect us depends on how we decide to use it.
But these decisions cannot just be made at the individual level, Moglen pointed out: we’ve changed everything except the way people think. I can’t say that I wholeheartedly agree with either Moglen’s assumptions or his conclusions, but he is obviously asking important questions, and he has shown the form in which they need to be asked.
Another doubt: as a social scientist, I’m also not convinced that having all these data available will make all human behaviour predictable. We’ve catalogued a billion stars, the Large Hadron Collider has produced a hundred thousand million million bytes of data, and yet we’re still trying to find new specific solutions to the three-body problem. I don’t think that just having more data is enough. I’m not convinced, but I don’t think it’s 0% possible.
Money, food, and the local
I take the Economist into the bath with me on the weekend when I have time. It’s relaxing for whatever reason, even when it’s describing horrible things or when I disagree with it. I appreciate the Economist for at least discussing many of the issues I care about.
Last night I came across this book review of “Money: The Unauthorised Biography” by Felix Martin. It tells the story of an ad hoc currency system that popped up in Ireland during a financial crisis more than 40 years ago. The moral of that story is supposed to be something about how banking should operate, but I was struck by this line in the review:
It helped that a lot of Irish life is lived locally: builders, greengrocers, mechanics and barmen all turned out to be dab hands at personal credit profiling.
It occurs to me that “living locally” is exactly what most people, at least in New York, don’t do at all.
At this point I’ve lived in my neighborhood near Columbia University for 8 years, which is long enough to know Bob, the guy at the hardware store who sells me air conditioners and spatulas. If our currency system froze and we needed to use IOU notes, I’m pretty sure Bob and I would be good.
But, even though I shop at Morty’s (Morton Williams) regularly, the turnover there is high enough that I have never connected with anyone working there. I’m shit out of luck for food, in other words, in the case of a currency freeze.
Bear with me for one more minute. When I read articles like this one, which is called Pay People to Cook at Home – in which the author proposes a government program that will pay young parents to stay home and cook healthy food – it makes me think two things.
First, that people sometimes get confused between what could or should happen and what might actually happen, mostly because they don’t think about power and who has it and what their best interests are. I’m not holding my breath for this government program, in other words, even though I think there’s definitely a link between a hostile food environment and bad health among our nation’s youth.
Second, that in some sense we traditionally had pretty good solutions to child care and home cooking, namely we lived together with our families and not everyone had a job, so someone was usually on hand to cook and watch the kids. It’s a natural enough arrangement, which we’ve chucked in favor of a cosmopolitan existence.
And when I say “natural”, I don’t mean “appealing”: my mom has a full-time job as a CS professor in Boston and is not interested in staying home and cooking. Nor am I, for that matter.
In other words, we’ve traded away localness for something else, which I’m personally benefitting from, but there are other direct cultural effects that aren’t always so awesome. Our dependence on international banking and credit scores, and having very little time to cook for our kids, are a few examples.