I’m back from Haiti! It was amazing and awesome, and please stand by for more about that, with cultural observations and possibly a slide show if you’re all well behaved.
Today, thanks to my math camp buddy Lenore Cowen, I am going to share with you an amazing blog post by Pamela Ribon. Her post is called Barbie Fucks It Up Again and it describes a Barbie book entitled Barbie: I Can Be a Computer Engineer
Just to give you an idea of the plot, Barbie’s sister finds Barbie engaged on a project on her computer, and after asking her about it, Barbie responds:
“I’m only creating the design ideas,” Barbie says, laughing. “I’ll need Steven and Brian’s help to turn it into a real game!”
What the fucking shit, Barbie?
Today I want to tell you guys about core-econ.org, a free (although you do have to register) textbook my buddy Suresh Naidu is using this semester to teach out of and is also contributing to, along with a bunch of other economists.
It’s super cool, and I wish a class like that had been available when I was an undergrad. In fact I took an economics course at UC Berkeley and it was a bad experience – I couldn’t figure out why anyone would think that people behaved according to arbitrary mathematical rules. There was no discussion of whether the assumptions were valid, no data to back it up. I decided that anybody who kept going had to be either religious or willing to say anything for money.
Not much has changed, and that means that Econ 101 is a terrible gateway for the subject, letting in people who are mostly kind of weird. This is a shame because, later on in graduate level economics, there really is no reason to use toy models of society without argument and without data; the sky’s the limit when you get through the bullshit at the beginning. The goal of the Core Econ project is to give students a taste for the good stuff early; the subtitle on the webpage is teaching economics as if the last three decades happened.
What does that mean? Let’s take a look at the first few chapters of the curriculum (the full list is here):
- The capitalist revolution
- Innovation and the transition from stagnation to rapid growth
- Scarcity, work and progress
- Strategy, altruism and cooperation
- Property, contract and power
- The firm and its employees
- The firm and its customers
Once you register, you can download a given chapter in pdf form. So I did that for Chapter 6, The firm and its employees, and here’s a screenshot of the first page:
The chapter immediately dives into a discussion of Apple and Foxconn. Interesting! Topical! Like, it might actually help you understand the newspaper!! Can you imagine that?
The project is still in beta version, so give it some time to smooth out the rough edges, but I’m pretty excited about it already. It has super high production values and will squarely compete with the standard textbooks and curriculums, which is a good thing, both because it’s good stuff and because it’s free.
Specifically, he looked at the 2013 cab rides in New York City, which was provided under a FOIL request, and he stalked celebrities Bradley Cooper and Jessica Alba (and discovered that neither of them tipped the cabby). He also stalked a man who went to a slew of NYC titty bars: found out where the guy lived and even got a picture of him.
Previously, some other civic hackers had identified the cabbies themselves, because the original dataset had scrambled the medallions, but not very well.
The point he was trying to make was that we should not assume that “anonymized” datasets actually protect privacy. Instead we should learn how to use more thoughtful approaches to anonymizing stuff, and he proposes a method called “differential privacy,” which he explains here. It involves adding noise to the data, in a certain way, so that at the end any given person doesn’t risk too much of their own privacy by being included in the dataset versus being not included in the dataset.
Bottomline, it’s actually pretty involved mathematically, and although I’m a nerd and it doesn’t intimidate me, it does give me pause. Here are a few concerns:
- It means that most people, for example the person in charge of fulfilling FOIL requests, will not actually understand the algorithm.
- That means that, if there’s a requirement that such a procedure is used, that person will have to use and trust a third party to implement it. This leads to all sorts of problems in itself.
- Just to name one, depending on what kind of data it is, you have to implement differential privacy differently. There’s no doubt that a complicated mapping of datatype to methodology will be screwed up when the person doing it doesn’t understand the nuances.
- Here’s another: the third party may not be trustworthy and may have created a backdoor.
- Or they just might get it wrong, or do something lazy that doesn’t actually work, and they can get away with it because, again, the user is not an expert and cannot accurately evaluate their work.
Altogether I’m imagining that this is at best an expensive solution for very important datasets, and won’t be used for your everyday FOIL requests like taxicab rides unless the culture around privacy changes dramatically.
I’ve been reading Head First Java this past week and I’m super impressed and want to tell you guys about it if you don’t already know.
I wanted to learn what the big fuss was about object-oriented programming, plus it seems like all the classes my Lede students are planning to take either require python or java, so this seemed like a nice bridge.
But the book is outstanding, with quirky cartoons and a super fun attitude, and I’m on page 213 after less than a week, and yes that’s out of more than 600 pages but what I’m saying is that it’s a thrilling read.
My one complaint is how often the book talks about motivating programmers with women in tight sweaters. And no, I don’t think they were assuming the programmers were lesbians, but I could be wrong and I hope I am. At the beginning they made the point that people remember stuff better when there is emotional attachment to things, so I’m guessing they’re getting me annoyed to help me remember details on reference types.
Here’s another Head First book which my nerd mom recommended to me some time ago, and I bought but haven’t read yet, but now I really plan to: Head First Design Patterns. Because ultimately, programming is just a tool set and you need to learn how to think about constructing stuff with those tools. Exciting!
And by the way, there is a long list of Head First books, and I head good things about the whole series. Honestly I will never write a technical book in the old-fashioned dry way again.
Yesterday was a day filled with secrets and codes. In the morning, at The Platform, we had guest speaker Columbia history professor Matthew Connelly, who came and talked to us about his work with declassified documents. Two big and slightly depressing take-aways for me were the following:
- As records have become digitized, it has gotten easy for people to get rid of archival records in large quantities. Just press delete.
- As records have become digitized, it has become easy to trace the access of records, and in particular the leaks. Connelly explained that, to some extent, Obama’s harsh approach to leakers and whistleblowers might be explained as simply “letting the system work.” Yet another way that technology informs the way we approach human interactions.
After class we had section, in which we discussed the Computer Science classes some of the students are taking next semester (there’s a list here) and then I talked to them about prime numbers and the RSA crypto system.
I got really into it and wrote up an iPython Notebook which could be better but is pretty good, I think, and works out one example completely, encoding and decoding the message “hello”.
Yesterday was the end of the first half of the Lede Program, and the students presented their projects, which were really impressive. I am hoping some of them will be willing to put them up on a WordPress site or something like that in order to showcase them and so I can brag about them more explicitly. Since I didn’t get anyone’s permission yet, let me just say: wow.
During the second half of the program the students will do another project (or continue their first) as homework for my class. We’re going to start planning for that on the first day, so the fact that they’ve all dipped their toes into data projects is great. For example, during presentations yesterday I heard the following a number of times: “I spent most of my time cleaning my data” or “next time I will spend more time thinking about how to drill down in my data to find an interesting story”. These are key phrases for people learning lessons with data.
Since they are journalists (I’ve learned a thing or two about journalists and their mindset in the past few months) they love projects because they love deadlines and they want something they can add to their portfolio. Recently they’ve been learning lots of geocoding stuff, and coming up they’ll be learning lots of algorithms as well. So they’ll be well equipped to do some seriously cool shit for their final project. Yeah!
In addition to the guest lectures I’m having in The Platform, I’ll also be reviewing prerequisites for the classes many of them will be taking in the Computer Science department in the fall, so for example linear algebra, calculus, and basic statistics. I just bought them all a copy of How to Lie with Statistics as well as The Cartoon Guide to Statistics, both of which I adore. I’m also making them aware of Statistics Done Wrong, which is online. I am also considering The Cartoon Guide to Calculus, which I have but I haven’t read yet.
Keep an eye out for some of their amazing projects! I’ll definitely blog about them once they’re up.
My schedule nowadays is to go to the Lede Program classes every morning from 10am until 1pm, then office hours, when I can, from 2-4pm. The students are awesome and are learning a huge amount in a super short time.
So for instance, last time I mentioned we set up iPython notebooks on the cloud, on Amazon EC2 servers. After getting used to the various kinds of data structures in python like integers and strings and lists and dictionaries, and some simple for loops and list comprehensions, we started examining regular expressions and we played around with the old enron emails for things like social security numbers and words that had four or more vowels in a row (turns out that always means you’re really happy as in “woooooohooooooo!!!” or really sad as in “aaaaaaarghghgh”).
Then this week we installed git and started working in an editor and using the command line, which is exciting, and then we imported pandas and started to understand dataframes and series and boolean indexes. At some point we also plotted something in matplotlib. We had a nice discussion about unsupervised learning and how such techniques relate to surveillance.
My overall conclusion so far is that when you have a class of 20 people installing git, everything that can go wrong does (versus if you do it yourself, then just anything that could go wrong might), and also that there really should be a better viz tool than matplotlib. Plus my Lede students are awesome.