I’m very gratified to say that my Lede Program for data journalism at Columbia is over, or at least the summer program is (some students go on to take Computer Science classes in the Fall).
My adorable and brilliant students gave final presentations on Tuesday and then we had a celebration Tuesday night at my house, and my bluegrass band played (didn’t know I have a bluegrass band? I play the fiddle! You can follow us on twitter!). It was awesome! I’m hoping to get some of their projects online soon, and I’ll definitely link to them when that happens.
It’s been an exciting week, and needless to say I’m exhausted. So instead of a frothy rant I’ll just share some reading with y’all:
- Andrew Gelman has a guest post by Phil Price on the worst infographic ever, which sadly comes from Vox. My students all know better than this. Hat tip Lambert Strether.
- Private equity firms are buying stuff all over the country, including Ferguson. I’m actually not sure this is a bad thing, though, if nobody else is willing to do it. Please discuss.
- Bloomberg has an interesting story about online PayDay loans and the world of investing. I am still on the search for someone who knows exactly how those guys target their ads online. Hat tip Aryt Alasti.
- Felix Salmon, now at Fusion, has set up a nifty interactive to help you figure out your lifetime earnings.
- Felix also set up this cool online game where you can play as a debt collector or a debtor.
- Is it time to end letter grades? Hat tip Rebecca Murphy.
- There’s a reason fast food workers are striking nationwide. The ratio of average CEO pay to average full-time worker pay is around 1252.
- People lie to women in negotiations. I need to remember this.
Have a great weekend!
I’ve been sent this recent New York Times article by a few people (thanks!). It’s called Grading Teachers, With Data From Class, and it’s about how standardized tests are showing themselves to be inadequate to evaluate teachers, so a Silicon Valley-backed education startup called Panorama is stepping into the mix with a data collection process focused on student evaluations.
Putting aside for now how much this is a play for collecting information about the students themselves, I have a few words to say about the signal which one gets from student evaluations. It’s noisy.
So, for example, I was a calculus teacher at Barnard, teaching students from all over the Columbia University community (so, not just women). I taught the same class two semesters in a row: first in Fall, then in Spring.
Here’s something I noticed. The students in the Fall were young (mostly first semester frosh), eager, smart, and hard-working. They loved me and gave me high marks on all categories, except of course for the few students who just hated math, who would typically give themselves away by saying “I hate math and this class is no different.”
The students in the Spring were older, less eager, probably just as smart, but less hard-working. They didn’t like me or the class. In particular, they didn’t like how I expected them to work hard and challenge themselves. The evaluations came back consistently less excited, with many more people who hated math.
I figured out that many of the students had avoided this class and were taking it for a requirement, didn’t want to be there, and it showed. And the result was that, although my teaching didn’t change remarkably between the two semesters, my evaluations changed considerably.
Was there some way I could have gotten better evaluations from that second group? Absolutely. I could have made the class easier. That class wanted calculus to be cookie-cutter, and didn’t particularly care about the underlying concepts and didn’t want to challenge themselves. The first class, by contrast, had loved those things.
My conclusion is that, once we add “get good student evaluations” to the mix of requirements for our country’s teachers, we are asking for them to conform to their students’ wishes, which aren’t always good. Many of the students in this country don’t like doing homework (in fact most!). Only some of them like to be challenged to think outside their comfort zone. We think teachers should do those things, but by asking them to get good student evaluations we might be preventing them from doing those things. A bad feedback loop would result.
I’m not saying teachers shouldn’t look at student evaluations; far from it, I always did and I found them useful and illuminating, but the data was very noisy. I’d love to see teachers be allowed to see these evaluations without there being punitive consequences.
I was having a wonderful ramen lunch with the mathbabe and, as is all too common when two broad-minded Ph.D.’s in math get together, we started talking about the horrible state math education is in for both advanced high school students and undergraduates.
One amusing thing we discovered pretty quickly is that we had independently come up with the same (radical) solution to at least part of the problem: throw out the traditional sequence which goes through first and second year calculus and replace it with a unified probability, statistics, calculus course where the calculus component was only for the smoothest of functions and moreover the applications of calculus are only to statistics and probability. Not only is everything much more practical and easier to motivate in such a course, students would hopefully learn a skill that is essential nowadays: how to separate out statistically good information from the large amount of statistical crap that is out there.
Of course, the downside is that the (interesting) subtleties that come from the proofs, the study of non-smooth functions and for that matter all the other stuff interesting to prospective physicists like DiffEQ’s would have to be reserved for different courses. (We also were in agreement that Gonick’s beyond wonderful “Cartoon Guide To Statistics” should be required reading for all the students in these courses, but I digress…)
The real point of this blog post is based on what happened next: but first you have to know I’m more or less one generation older than the mathbabe. This meant I was both able and willing to preface my next point with the words: “You know when I was young, in one way students were much better off because…” Now it is well known that using this phrase to preface a discussion often poisons the discussion but occasionally, as I hope in this case, some practices from days gone by can, if brought back, help solve some of today’s educational problems.
By the way, and apropos of nothing, there is a cure for people prone to too frequent use of this phrase: go quickly to YouTube and repeatedly make them watch Monty Python’s Four Yorkshiremen until cured:
Anyway, the point I made was that I am a member of the last generation of students who had to use slide rules. Another good reference is: here. Both these references are great and I recommend them. (The latter being more technical.) For those who have never heard of them, in a nutshell, a slide rule is an analog device that uses logarithms under the hood to do (sufficiently accurate in most cases) approximate multiplication, division, roots etc.
The key point is that using a slide rule requires the user to keep track of the “order of magnitude” of the answers, because slide rules only give you four or so significant digits and not the position of the decimal point. This meant students of my generation, when taking science and math courses, were continuously exposed to order of magnitude calculations; you just couldn’t escape from having to make them all the time. Students nowadays, not so much. Calculators have made skill at doing order of magnitude calculations (or Fermi calculations as they are often lovingly called) an add-on rather than a baseline skill, and that is a really bad thing. (Actually my belief that bringing back slide rules would be a good thing goes back a ways: when I was a Program Director at the NSF in the 90’s, I actually tried to get someone to submit a proposal which would have been called “On the use of a hand held analog device to improve science and math education!” Didn’t have much luck.)
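For readers who have never held one, here is a minimal Python sketch of the idea, not a model of any particular slide rule. It assumes positive inputs and a scale that can be read to about four significant digits; the function names `split` and `slide_rule_multiply` are made up for illustration. The "rule" only ever sees the leading digits and adds their logarithms, while the power of ten is bookkeeping the user has to do separately.

```python
import math

def split(x):
    """Split a positive number into (mantissa, exponent), 1 <= mantissa < 10."""
    exponent = math.floor(math.log10(x))
    return x / 10 ** exponent, exponent

def slide_rule_multiply(a, b, digits=4):
    """Multiply two positive numbers the way a slide rule user would.

    Returns (mantissa, exponent). The mantissa is what you read off the
    scale; the exponent is the order of magnitude you track in your head.
    """
    ma, ea = split(a)
    mb, eb = split(b)
    # The rule adds lengths proportional to the logs of the mantissas.
    length = math.log10(ma) + math.log10(mb)
    reading = 10 ** length          # somewhere in [1, 100)
    carry = 0
    if reading >= 10:               # the scale wraps around: carry a power of ten
        reading /= 10
        carry = 1
    # You can only read the scale to a few significant digits (assumed: 4).
    reading = round(reading, digits - 1)
    return reading, ea + eb + carry

mantissa, exponent = slide_rule_multiply(4217, 0.03156)
print(f"{mantissa} x 10^{exponent}")   # exact product is 133.08852
```

The exponent arithmetic in the last line of the function is exactly the part a calculator does silently and a slide rule leaves to the student.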
Anyway, if you want to try a slide rule out, alas, good vintage slide rules have become collectible and so expensive— because baby boomers like me are buying the ones we couldn’t afford when we were in high school – but the nice thing is there are lots of sites like this one which show you how to make your own.
Finally, while I don’t think they will ever be as much fun as using a slide rule, you could still allow calculators in classrooms.
Why? Because it would be trivial to have a mode in the TI calculator or the Casio calculator that all high school students seem to use, called “significant digits only.” With the right kind of problems this mode would require students to do order of magnitude calculations because they would never be able to enter trailing or leading zeroes and we could easily stick them with problems having a lot of them!
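As a rough sketch of what such a mode might display, the hypothetical function below strips a number down to its leading significant digits and throws away the decimal point and all leading and trailing zeroes. No TI or Casio calculator is known to offer this; the name `significant_digits_only` and the default of four digits are assumptions made purely for illustration.

```python
def significant_digits_only(x, digits=4):
    """Return the leading significant digits of x as a bare digit string.

    The decimal point and any leading or trailing zeroes are dropped,
    so the reader has to supply the order of magnitude.
    """
    if x == 0:
        return "0"
    # Scientific notation gives e.g. "1.302e+02"; keep only the mantissa part.
    mantissa = format(abs(x), f".{digits - 1}e").split("e")[0]
    bare = mantissa.replace(".", "").rstrip("0")
    return ("-" if x < 0 else "") + bare

# Two very different products show the same display in this mode.
print(significant_digits_only(4200 * 0.031))   # true value is about 130.2
print(significant_digits_only(0.0042 * 3.1))   # true value is about 0.01302
```

Both lines show the same digits, so a student working in this mode has to decide independently whether the answer is around a hundred or around a hundredth.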
But calculators really bug me in classrooms, so I can’t resist pointing out one last flaw in their omnipresence: it makes students believe in the possibility of ridiculously high precision results in the real world. After all, nothing they are likely to encounter in their work (and certainly not in their lives) will ever need (or even have) 14 digits of accuracy and, more to the point, when you see a high precision result in the real world, it is likely to be totally bogus when examined under the hood.
I am pushing an unusual way of considering economic health. I call it “distributional thinking.” It requires that you not aggregate everything into one statistic, but rather take a few samples from different parts of the distribution and consider things from those different perspectives.
So instead of saying “things are great because the economy has expanded at a rate of 4%” I’d like us to think about more individual definitions of “great.”
For example, it’s a good time to be rich right now. Really good. The stock market keeps hitting all-time highs, the jobs market is great in tech, and it’s still absolutely possible to hide wealth in off-shore tax havens.
It’s not so good to be middle class. Wages are stagnant and have been forever, and jobs are drying up due to automation and a lack of even maintenance-level infrastructure work. Colleges are super expensive, and the best the government can do is fiddle around the edges with interest rates.
It’s a really bad time to be poor in this country. Jobs are hard to find and conditions are horrible. There are more and more arrests for petty crimes as the violent crime rate goes down. Those petty crime arrests lead to big fees and sometimes jail time if you can’t pay the fee. Look at Ferguson as an example of what this kind of frustration can lead to.
Once you are caught in the court system, private probation companies act as abusive debt collectors, and nobody controls their fees, which can be outrageous. To be clear, we let this happen in the name of saving money: private for-profit companies like this guarantee that they won’t cost anything to the local government because they make the people on probation pay for services.
And even though that’s an outrageous and predatory system, it’s not likely to go away. Once they are officially branded as criminals, the poor often lose their voting rights, which means they have little political recourse to protect themselves. On the flip side, they are largely silent about their struggles for the same reason.
Once you think about our economic health this way, you realize how comparatively meaningless the GDP is. It is no longer a good proxy for true economic health, where all classes would be more or less better off as it went up.
And until we get on the same page, where we all go up and down together, it is a mathematical fact that no one statistic could possibly capture the progress we are or are not making. Instead, we need to think distributionally.
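To make that concrete, here is a toy Python sketch. The incomes are entirely made up (ten hypothetical households, in thousands of dollars) and chosen so that the total grows by exactly 4% while the middle stays flat; nothing here is real data, and the function names are just for illustration. It compares sorted positions in the distribution, not the same households over time.

```python
from statistics import median

# Made-up household incomes in thousands of dollars. NOT real data.
YEAR_1 = [12, 18, 25, 32, 40, 48, 60, 80, 150, 535]
YEAR_2 = [11, 17, 25, 32, 40, 48, 60, 82, 160, 565]

def pct_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old

def distributional_summary(before, after):
    """Compare one aggregate number with a few samples from the distribution."""
    before, after = sorted(before), sorted(after)
    return {
        "aggregate": pct_change(sum(before), sum(after)),
        "poorest": pct_change(before[0], after[0]),
        "median": pct_change(median(before), median(after)),
        "richest": pct_change(before[-1], after[-1]),
    }

summary = distributional_summary(YEAR_1, YEAR_2)
for name, change in summary.items():
    print(f"{name:>9}: {change:+.1f}%")
```

In this invented example the single aggregate says "up 4%", while the bottom of the distribution goes down, the median does not move, and the top grows faster than the total.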
Aunt Pythia is ginormously and ridonkulously excited to be here. She just got back from a nifty bike ride to the other side of the Hudson and took this picture of this amazing city on this amazing day:
OK, so full disclosure. Aunt Pythia kind of blew her load, so to speak, on the sex questions last week, so she’s making do with coyly answering nerdy questions. Because that’s what we got.
I hope you enjoy her efforts, and even if you despise them – especially if you despise them – don’t forget to:
please think of something to ask Aunt Pythia at the bottom of the page!
Hi Aunt Pythia,
I’m a math student at MIT, where you did a postdoc. I’m also into computers, and am considering working in some finance classes. I could see myself being happy working for some big financial company that I don’t really care about, as long as I have interesting problems to work on, make a ton of money, and have bright people I get to work with.
My interests right now are in very pure math; I get chills just thinking about category-theoretic concepts. I’m planning to learn commutative algebra and algebraic geometry soon. I’m also likely to take stochastic calculus.
What kind of math did you do? Any tips on if taking the pure math I love will be of use, or at least get me “cred” with financial companies?
I do love math, and seeing that you did math at MIT and have seen this world of things, maybe you have some advice to offer me.
Thank you dearly.
Don’t do it!
Don’t take the math to get “cred” with financial companies. Do what is sexy and beautiful to you. If you love category theory, do that, then do algebra and algebraic geometry. I did number theory in the form of arithmetic algebraic geometry myself. It’s awesomely beautiful and I don’t regret one moment of it.
Let’s say you do decide to go into the “real world.” At the end of the day, if you can do that math stuff we’ve been talking about, you can learn other stuff too. So I’m not going to worry about you on the technical side of things.
On the other side of things, I’d like you to rethink the idea that you “don’t mind who you work for as long as you have interesting problems.” Is that really true? Once you leave pure math there are real applications of your work, and they affect real people. Shit gets real real quick and stuff matters, and I urge you to think it through some more.
Dear Aunt Pythia,
Do all mathematicians visualize their problems? From a logical viewpoint there are a lot of mathematical spaces that don’t map onto an imagined 3d workspace, but based on limited conversations with working mathematicians, they seem to me to do it at least at some stage of problem solving.
(I’m more of a physicist who visualizes nearly everything so maybe I’m misreading them.)
Most, but not all. I once had a conversation with someone who couldn’t understand my drawing of a geometric map between spaces. I was explaining the concept visually (or at least I thought I was!) but he forced me to write it down with double sums and formulas, and I thought that was the weirdest thing ever, but that’s how it became understandable to him.
In general we do think visually, although we really can’t think beyond three dimensions (even though we pretend we can). I guess time makes it 4. Most geometers I know, ironically, don’t have a very good working sense of 3 dimensions, and definitely don’t have a good sense of direction!
Come to think of it my sample is too small, so I’m mostly just saying that for fun. It would be neat to get actual statistics on that. Maybe if I’m ever pulled into going to JMM again I’ll make people fill out forms. Oh wait, I’m going to JMM this January.
I can ask about this, it’s a nice question! Readers, what else should I poll math nerds on?
Dear Aunt Pythia,
I’m an American mutt and for a while I was annoyed when people asked “Where are you from” or “What’s your nationality”. I think I was sensitive to it because kids wanted to narrow down exactly which ethnic slurs to use. But as an adult, mostly people are just curious, and I’m happy to share since I’m curious about them too.
When I meet someone with an accent, I’m curious about them and their background, what it’s like in their home country, how they came to the US, etc.
What is an appropriate way to ask about someone’s ethnic background or country of origin? It seems like you should be able to ask anyone this question; it just seems rude when that person is different from you. Do you know what I mean?
WHy Ask That Rude qUestion
I like the subtle sign-off!
Here’s the thing, I think you nailed it. If your intention is to be mean, then don’t ask it. If your intention is to be friendly and to make a connection, then go ahead and ask it! I always ask cabbies where they come from, and then I get to learn about their countries. I have never experienced someone who doesn’t want to talk to me about their home country, and I’ve made quite a few friends. I’ve been invited to so many countries for visits, and that is always so incredibly generous and sweet! People are amazing.
Of course, some people just don’t do this kind of small-talk, and I get that too. It’s not for everyone. But it’s super fun for us extroverts.
Dear Aunt Pythia,
First off, your blog is both entertaining and informative, and you’ve found the sweet spot combination of the two that makes it addictive.
I find your work with the Lede program at Columbia fascinating and relevant to the growing, amorphous “big data” movement. I am a frequent visitor of websites such as Fivethirtyeight, which Nate Silver has rebranded as a news source that derives its stories from statistics and big data analytics. Even other sources, such as The Atlantic, have begun to follow suit and incorporate large statistical analyses into some of their stories. This experiment of basing our news stories on statistics brings hope that we can move closer to the ideal of an unbiased account.
In light of this new format (and your school), what sources do you consider the best? Are there any that you visit to get an insightful statistical perspective on the news? Or do you side with the criticism that many of these sites fuel a sensationalist, biased view of the world intended to spawn viral stories?
Will we ever find the right place for statistics in the news?
Considering unbiased reality in our ubiquitous (news)stories
Holy crap, nice sign-off. And thanks for being addicted to mathbabe! All my evil plans are working. Time to start on the next phase… moo-hooo-hahahahahaha.
OK, so here’s the thing. We will never have unbiased accounts. Never. At the very least we will have bias in the way that data is collected.
What I’ve spent the summer talking to my students about is getting used to the fact that there will always be biases, and how we therefore do our best to be at least somewhat aware of them and try very hard not to obscure them. Transparency is the new objectivity!
This is of course disappointing to people who want there to be “one truth,” but that’s how science is. After a while we get used to the disappointment and we can all appreciate some really good signal/noise ratios.
As for the right place for statistics in the news, I think we’re figuring that out right now, and I’m excited to be part of it. And holy shit, have you seen the new ProPublica work on the Louisiana coast? Those guys are killing it.
Please submit your well-specified, fun-loving, cleverly-abbreviated question to Aunt Pythia!
Any time I see an article about the evaluation system for teachers in New York State, I wince. People get it wrong so very often. Yesterday’s New York Times article written by Elizabeth Harris was even worse than usual.
First, her wording. She mentioned a severe drop in student reading and math proficiency rates statewide and attributed it to a change in the test to the Common Core, which she described as “more rigorous.”
The truth is closer to “students were tested on stuff that wasn’t in their curriculum.” And as you can imagine, if you are tested on stuff you didn’t learn, your score will go down (the Common Core has been plagued by a terrible roll-out, and the timing of this test is Exhibit A). Wording like this matters, because Harris is setting up her reader to attribute the falling scores to bad teachers.
Harris ends her piece with a reference to a teacher-tenure lawsuit: ‘In one of those cases, filed in Albany in July, court documents contrasted the high positive teacher ratings with poor student performance, and called the new evaluation system “deficient and superficial.” The suit said those evaluations were the “most highly predictive measure of whether a teacher will be awarded tenure.”’
In other words, Harris is painting a picture of undeserving teachers sneaking into tenure in spite of not doing their job. It’s ironic, because I actually agree with the statement that the new evaluation system is “deficient and superficial,” but in my case I think it is overly punitive to teachers – overly random, really, since it incorporates the toxic VAM model – but in her framing she is implying it is insufficiently punitive.
Let me dumb Harris’s argument down even further: How can we have 26% English proficiency among students and 94% effectiveness among teachers?! Let’s blame the teachers and question the legitimacy of tenure.
Indeed, after reading the article I felt like looking into whether Harris is being paid by David Welch, the Silicon Valley dude who has vowed to fight teacher tenure nationwide. More likely she just doesn’t understand education and is convinced by simplistic reasoning.
In either case, she clearly needs to learn something about statistics. For that matter, so do other people who drag out this “blame the teacher” line whenever they see poor performance by students.
Because here’s the thing. Beyond obvious issues like switching the content of the tests away from the curriculum, standardized test scores everywhere are hugely dependent on the poverty levels of students. Some data:
It’s not just in this country, either:
The conclusion is that, unless you think bad teachers have somehow taken over poor schools everywhere and booted out the good teachers, and good teachers have taken over rich schools everywhere and booted out the bad teachers (which is supposed to be impossible, right?), poverty has much more of an effect than teachers.
Just to clarify this reasoning, let me give you another example: we could blame bad journalists for lower rates of newspaper readership at a given paper, but since newspaper readership is going down everywhere we’d be blaming journalists for what is a cultural issue.
Or, we could develop a process by which we congratulate specific policemen for a reduced crime rate, but then we’d have to admit that crime is down all over the country.
I’m not saying there aren’t bad teachers, because I’m sure there are. But by only focusing on rooting out bad teachers, we are ignoring an even bigger and harder problem. And no, it won’t be solved by privatizing and corporatizing public schools. We need to address childhood poverty. Here’s one more visual for the road:
For a while now I’ve been thinking I should build a decision tree for deciding which algorithm to use on a given data project. And yes, I think it’s kind of cool that “decision tree” would be an outcome on my decision tree. Kind of like a nerd pun.
I’m happy to say that I finally started work on my algorithm decision tree, thanks to this website called gliffy.com which allows me to build flowcharts with an easy online tool. It was one of those moments when I said to myself, this morning at 6am, “there should be a start-up that allows me to build a flowchart online! Let me google for that” and it totally worked. I almost feel like I willed gliffy.com into existence.
So here’s how far I’ve gotten this morning:
I looked around the web to see if I’m doing something that’s already been done and I came up with this:
I appreciate the effort but this is way more focused on the size of the data than I intend to be, at least for now. And here’s another one that’s even less like the one I want to build but is still impressive.
Because here’s what I want to focus on: what kind of question are you answering with which algorithm? For example, with clustering algorithms you are, you know, grouping similar things together. That one’s easy, kind of, although plenty of projects have ended up being clustering or classifying algorithms whose motivating questions did not originally take on the form “how would we group these things together?”.
In other words, the process of getting at algorithms from questions is somewhat orthogonal to the normal way algorithms are introduced, and for that reason it is taking me some time to decide what the questions are that I need to ask in my decision tree. Right about now I’m wishing I had taken notes when my Lede Program students asked me to help them with their projects, because embedded in those questions were some great examples of data questions in search of an algorithm.
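Just to show the shape of the thing, here is a tiny Python sketch of a question-first decision tree. The three questions and the function name `suggest_algorithm` are placeholders invented for illustration, not the questions the finished tree would ask; the point is only that the branches are about what you are trying to find out, and an algorithm family falls out at the leaves.

```python
def suggest_algorithm(has_labels, target_is_numeric=False, want_groups=False):
    """Toy question-first decision tree. The questions are placeholders.

    has_labels:        do you already know the right answer for some examples?
    target_is_numeric: is the thing you want to predict a number?
    want_groups:       are you trying to group similar things together?
    """
    if has_labels:
        if target_is_numeric:
            return "regression"
        # A decision tree is itself one possible classifier here.
        return "classification"
    if want_groups:
        return "clustering"
    return "explore the data first; the question is not specific enough yet"

# "How would we group these things together?" with no known answers:
print(suggest_algorithm(has_labels=False, want_groups=True))
```

A real version would need many more branches, and probably several questions whose answer is "go back and sharpen the question."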
Please give me advice!