Archive for the ‘modeling’ Category

Nerding out: RSA on an iPython Notebook

Yesterday was a day filled with secrets and codes. In the morning, at The Platform, we had guest speaker Columbia history professor Matthew Connelly, who came and talked to us about his work with declassified documents. Two big and slightly depressing take-aways for me were the following:

  • As records have become digitized, it has gotten easy for people to get rid of archival records in large quantities. Just press delete.
  • As records have become digitized, it has become easy to trace the access of records, and in particular the leaks. Connelly explained that, to some extent, Obama’s harsh approach to leakers and whistleblowers might be explained as simply “letting the system work.” Yet another way that technology informs the way we approach human interactions.

After class we had section, in which we discussed the Computer Science classes some of the students are taking next semester (there’s a list here) and then I talked to them about prime numbers and the RSA crypto system.

I got really into it and wrote up an iPython Notebook which could be better but is pretty good, I think, and works out one example completely, encoding and decoding the message “hello”.
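For readers who want the gist without opening the notebook, here is a minimal sketch of RSA on the message “hello”. It is not the notebook's code: the tiny primes 61 and 53, the public exponent 17, the helper names, and the choice to encrypt one character at a time are all assumptions made for illustration, and nothing this small is remotely secure.

```python
# Toy RSA, for illustration only. The primes, exponent, and
# character-by-character encoding are assumptions, not the notebook's choices.

def mod_inverse(a, m):
    """Return x with (a * x) % m == 1, via the extended Euclidean algorithm."""
    old_r, r = a, m
    old_s, s = 1, 0
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_s, s = s, old_s - quotient * s
    if old_r != 1:
        raise ValueError("a and m are not coprime")
    return old_s % m

p, q = 61, 53                # two small primes (assumed)
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime to phi (assumed)
d = mod_inverse(e, phi)      # private exponent

def encrypt(message, e, n):
    # Each character code must be smaller than n for this to round-trip.
    return [pow(ord(ch), e, n) for ch in message]

def decrypt(cipher, d, n):
    return "".join(chr(pow(c, d, n)) for c in cipher)

cipher = encrypt("hello", e, n)
plain = decrypt(cipher, d, n)
print(cipher)
print(plain)
```

The public key is the pair (n, e) and the private key is d. Anyone can encrypt, but decrypting requires d, which is easy to compute only if you know how n factors into p and q.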

The underlying file is here but if you want to view it on the web just go here.

Crime and punishment

When I was prepping for my Slate Money podcast last week I read this column by Matt Levine at Bloomberg on the Citigroup settlement. In it he raises the important question of how the fine amount of $7 billion was determined. Here’s the key part:

 Citi’s and the Justice Department’s approaches both leave something to be desired. Citi’s approach seems to be premised on the idea that the misconduct was securitizing mortgages: The more mortgages you did, the more you gotta pay, regardless of how they performed. The DOJ’s approach, on the other hand, seems to be premised on the idea that the misconduct was sending bad e-mails about mortgages: The more “culpable” you look, the more it should cost you, regardless of how much damage you did.

I would have thought that the misconduct was knowingly securitizing bad mortgages, and that the penalties ought to scale with the aggregate badness of Citi’s mortgages. So, for instance, you’d want to measure how often Citi’s mortgages didn’t match up to its stated quality-control standards, and then compare the actual financial performance of the loans that didn’t meet the standards to the performance of the loans that did. Then you could say, well, if Citi had lived up to its promises, investors would have lost $X billion less than they actually did. And then you could fine Citi that amount, or some percentage of that amount. And you could do a similar exercise for the other big banks — JPMorgan, say, which already settled, or Bank of America, which is negotiating its settlement — and get comparable amounts that appropriately balance market share (how many bad mortgages did you sell?) and culpability (how bad were they?).

I think he nailed something here, which has eluded me in the past, namely the concept of what constitutes evidence of wrongdoing and how that translates into punishment. It’s similar to what I talked about in this recent post, where I questioned what it means to provide evidence of something, especially when the data you would need has been deliberately suppressed, either by the people committing wrongdoing or by other people who are somehow gaining from that wrongdoing but are not directly involved.

Basically the way I see Levine’s argument is that the Department of Justice used a lawyerly definition of evidence of wrongdoing – namely, the existence of emails saying things like “it’s time to pray.” After determining that Citi was in fact culpable, they basically did some straight-up negotiation to determine the fine. That negotiation was either purely political or was based on information that has been suppressed, because as far as anyone knows the number was kind of arbitrary.

Levine was suggesting a more quantitative definition for evidence of wrongdoing, which involves estimating both “how much you knew” and “how much damage you actually did” to determine the damage, and then some fixed transformation of that damage becomes the final fine. I will ignore Citi’s lawyers’ approach since their definition was entirely self-serving.

Here’s the thing: there are problems with both approaches. For example, with the lawyerly approach, you are basically just sending the message that you should never ever write some things in email, and most or at least many people know that by now. In other words, you are training people to game the system, and if they game it well enough, they won’t get in trouble. Of course, given that this was yet another fine and nobody went to jail, you could make the argument – and I did on the podcast – that nobody got in trouble anyway.

The problem with the quantitative approach is twofold. First, you still need to estimate “how much you knew,” which again often goes back to emails, although in this case it could be estimated by how often the stated standards were breached. Second, once the penalty formula is known, it can be treated as a model and embedded into the overall trading model for securities.

In other words, if I’m a quant at a nasty place that wants to trade in toxic securities, and I know that there’s a chance I’d be caught but I know the formula for how much I’d have to pay if I got caught, then I could include this cost, in addition to an estimate of the likelihood for getting caught, in an optimization engine to determine exactly how many toxic securities I should sell.
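To make that concrete, here is a minimal sketch of such an optimization. Every number and functional form in it is made up for illustration: the profit per unit, the fine per unit, and especially the assumption that the chance of getting caught rises linearly with the volume sold.

```python
# Hypothetical toy model: all parameters and the linear detection curve
# are assumptions for illustration, not estimates of anything real.

MARGIN_PER_UNIT = 1.0      # profit from selling one unit of toxic securities
FINE_PER_UNIT = 3.0        # known penalty formula: fine = 3 * units sold
DETECTION_RATE = 0.001     # chance of getting caught grows by this per unit

def probability_caught(volume):
    # Probability of detection, capped at 1.
    return min(1.0, DETECTION_RATE * volume)

def expected_profit(volume):
    # Revenue minus the expected fine under the known formula.
    expected_fine = probability_caught(volume) * FINE_PER_UNIT * volume
    return MARGIN_PER_UNIT * volume - expected_fine

# Brute-force search over candidate volumes.
best_volume = max(range(0, 1001), key=expected_profit)
print(best_volume, round(expected_profit(best_volume), 3))
```

In this toy setup the expected profit is a downward-facing parabola in the volume, so there is a single sweet spot, and the fine becomes just another cost of doing business. That is exactly what a predictable penalty invites.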

To avoid this scenario, it makes sense to have an element of randomness in the punishments for getting caught. Every now and then the punishment should be much larger than the quantitative model might suggest, so that there is less of a chance that people can incorporate the whole shebang into their optimization procedure. So maybe what I’m saying is that arriving at a random number, like the DOJ did, is probably better even though it is less satisfying.

Another possibility for actually deterring crimes would be to arbitrarily increase the likelihood of catching people who are up to no good, but that has been bounded from above by the way the SEC and the DOJ actually work.

Categories: finance, modeling

The future of work

People who celebrate the monthly jobs report getting better nowadays often forget to mention a few facts:

  • the new jobs are often temporary or part-time, with low wages
  • the old lost jobs, which we lose each month, were often full-time with higher wages

I could go on, and I have, and mention the usual complaints about the definition of the unemployment rate. But instead I’ll take a turn into a thought experiment I’ve been having lately.

Namely, what is the future of work?

It’s important to realize that in some sense we’ve been here before. When all the farming equipment got super efficient and we lost agricultural jobs by the thousands, people swarmed to the cities and we started building things with manufacturing. So if before we had “the age of the farm,” we then entered into “the age of stuff.” And I don’t know about you but I have LOTS of stuff.

Now that all the robots have been trained and are being trained to build our stuff for us, what’s next? What age are we entering?

I kind of want to complain at this point that economists are kind of useless when it comes to questions like this. I mean, aren’t they in charge of understanding the economy? Shouldn’t they have the answer here? I don’t think they have explained it if they do.

Instead, I’m pretty much left considering various science fiction plots I’ve heard about and read about over the years. And my conclusion is that we’re entering “the age of service.”

The age of service is a kind of pyramid scheme where rich people employ individuals to service them in various ways, and then those people are paid well so they can hire slightly less rich people to service them, and so on. But of course for this particular pyramid to work out, the rich have to be SUPER rich and they have to pay their servants very well indeed for the trickle down to work out. Either that or there has to be a wealth transfer some other way.

So, as with all theories of the future, we can talk about how this is already happening.

I noticed this recent Bloomberg View article about how rich people don’t have normal doctors like you and me. They just pay out of pocket for super expensive service outside the realm of insurance. This is not new but it’s expanding.

Here’s another example of the future of jobs, which I should applaud because at least someone has a job but which instead just kind of annoys me. Namely, the increasing frequency with which I try to make a coffee date with someone (outside of professional meetings) and have to arrange it with their personal assistant. I feel like, when it comes to social meetings, if you have time to be social, you have time to arrange your social calendar. But again, it’s the future of work here and I guess it’s all good.

More generally: there will be lots of jobs helping out old people and sick people. I get that, especially as the demographics tilt towards old people. But the mathematician in me can’t help but wonder, who will take care of the old people who used to be taking care of the old people? I mean, they by definition don’t have lots of extra cash floating around because they were at the bottom of the pyramid as younger workers.

Or do we have a system where people actually change jobs and levels as they age? That’s another model, where oldish people take care of truly old people and then at some point they get taken care of.

Of course, much like the Star Trek world, none of this has a strong connection to the economy as it is set up now, so it’s hard to imagine a smooth transition to a reasonable system, and I’m not even claiming my ideas are reasonable.

By the way, by my definition most people who write computer programs – especially if they’re writing video games or some such – are in a service industry as well. Pretty much anyone who isn’t farming or building stuff in manufacturing is working in service. Writers, poets, singers, and teachers included. Hell, the future could be pretty awesome if we arrange things well.

Anyhoo, a whimsical post for Thursday, and if you have other ideas for the future of work and how that will work out economically, please comment.

Categories: economics, modeling

Two great articles about standardized tests

In the past 12 hours I’ve read two fascinating articles about the crazy world of standardized testing. They’re both illuminating and well-written and you should take a look.

First, my data journalist friend Meredith Broussard has an Atlantic piece called Why Poor Schools Can’t Win At Standardized Testing wherein she tracks down the money and the books in the Philadelphia public school system (spoiler: there’s not enough of either), and she makes the connection between expensive books and high test scores.

Here’s a key phrase from her article:

Pearson came under fire last year for using a passage on a standardized test that was taken verbatim from a Pearson textbook.

The second article, in the New Yorker, is written by Rachel Aviv and is entitled Wrong Answer. It’s a close look, with interviews, at the cheating scandal in Atlanta, which I have been studying recently. The article makes the point that cheating is a predictable consequence of the high-stakes “data-driven” approach.

Here’s a key phrase from the Aviv article:

After more than two thousand interviews, the investigators concluded that forty-four schools had cheated and that a “culture of fear, intimidation and retaliation has infested the district, allowing cheating—at all levels—to go unchecked for years.” They wrote that data had been “used as an abusive and cruel weapon to embarrass and punish.”

Putting the two together, it’s pretty clear that there’s an acceptable way to cheat, which is by stocking up on expensive test prep materials in the form of testing company-sponsored textbooks, and then there’s the unacceptable way to cheat, which is where teachers change the answers. Either way the standardized test scoring regime comes out looking like a penal system rather than a helpful teaching aid.

Before I leave, some recent goodish news on the standardized testing front (hat tip Eugene Stern): Chris Christie just reduced the importance of value-added modeling for teacher evaluation down to 10% in New Jersey.

The Platform starts today

Hey my class starts today, I’m totally psyched!

The syllabus is up on github here and I prepared an iPython notebook here showing how to do basic statistics in python, and culminating in an attempt to understand what a statistically significant but tiny difference means, in the context of the Facebook Emotion study. Here’s a useless screenshot which I’m including because I’m proud:

Screen Shot 2014-07-15 at 7.04.05 AM

If you want to follow along, install Anaconda on your machine and type “ipython notebook --pylab inline” into a terminal. Then you can just download this notebook and run it!
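To give a flavor of what “statistically significant but tiny” means, here is a minimal sketch using simulated data and only the standard library. It is not the notebook's code and it does not use the Facebook data: the sample size, the true difference of 0.02 standard deviations, and the helper names are all assumptions chosen for illustration.

```python
import math
import random

random.seed(0)

# Simulated data (assumed): two huge groups whose true means differ
# by only 0.02 standard deviations.
N = 500_000
control = [random.gauss(0.0, 1.0) for _ in range(N)]
treated = [random.gauss(0.02, 1.0) for _ in range(N)]

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

mean_diff = mean(treated) - mean(control)
var_c = sample_variance(control)
var_t = sample_variance(treated)

# Two-sample z-test; with samples this large the normal approximation is fine.
std_err = math.sqrt(var_c / N + var_t / N)
z = mean_diff / std_err
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value

# Cohen's d: the difference measured in units of standard deviation.
cohens_d = mean_diff / math.sqrt((var_c + var_t) / 2)

print("difference in means:", mean_diff)
print("p-value:", p_value)
print("effect size (Cohen's d):", cohens_d)
```

With enough data the p-value gets very small even though the effect size stays tiny. The p-value tells you the difference is probably real, not that it is big enough to matter.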

Most of the rest of the classes will feature an awesome guest lecturer, and I’m hoping to blog about those talks with their permission, so stay tuned.

Surveillance in NYC

There’s a CNN video news story explaining how the NYC Mayor’s Office of Data Analytics is working with private start-up Placemeter to count and categorize New Yorkers, often with the help of private citizens who install cameras in their windows. Here’s a screenshot from the Placemeter website:



You should watch the video and decide for yourself whether this is a good idea.

Personally, it disturbs me, but perhaps because of my priors on how much we can trust other people with our data, especially when it’s in private hands.

To be more precise, there is, in my opinion, a contradiction coming from the Placemeter representatives. On the one hand they try to make us feel safe by saying that, after gleaning a body count with their video tapes, they dump the data. But then they turn around and say that, in addition to counting people, they will also categorize people: gender, age, whether they are carrying a shopping bag or pushing strollers.

That’s what they are talking about anyway, but who knows what else? Race? Weight? Will they use face recognition software? Who will they sell such information to? At some point, after mining videos enough, it might not matter if they delete the footage afterwards.

Since they are a private company I don’t think such information on their data methodologies will be accessible to us via Freedom of Information Laws either. Or, let me put that another way. I hope that MODA sets up their contract so that such information is accessible via FOIL requests.

Great news: for-profit college Corinthian to close

I’ve talked before about the industry of for-profit colleges which exists largely to game the federal student loan program. They survive almost entirely on federal student loans of their students, while delivering terrible services and worthless credentials.

Well, good news: one of the worst of the bunch is closing its doors. Corinthian Colleges, Inc. (CCI) got caught lying about job placement of its graduates (in some cases, they said 100% when the truth was closer to 0%). They were also caught advertising programs they didn’t actually have.

But here’s what interests me the most, which I will excerpt from the California Office of the Attorney General:

CCI’s predatory marketing efforts specifically target vulnerable, low-income job seekers and single parents who have annual incomes near the federal poverty line. In internal company documents obtained by the Department of Justice, CCI describes its target demographic as “isolated,” “impatient,” individuals with “low self-esteem,” who have “few people in their lives who care about them” and who are “stuck” and “unable to see and plan well for future.”

I’d like to know more about how they did this. I’m guessing it was substantially online, and I’m guessing they got help from data warehousing services.

After skimming the complaint I’m afraid it doesn’t include such information, although it does say that the company advertised programs it didn’t have and then tricked potential students into filling out information about them so CCI could follow up and try to enroll them. Talk about predatory advertising!

Update: I’m getting some information by checking out their recent marketing job postings.

Categories: feedback loop, modeling
