Author Archive

The continued surveillance of poor black kids

There’s a new data-driven app out there called Kinvolved, featured this morning in the New York Times, and it’s exactly my worst fear. It tracks Harlem school children’s whereabouts, sending text messages to parents when their children are tardy or absent from school.


When you look at the user agreement, it seems to say that the data is relatively safe and presumably not available for resale to marketers, but they also say they are allowed to change the agreement at any time.

Here’s my specific fear: what about when they go out of business? I’m thinking the data might be valuable at that point, and their investors might want some money back. And there’s a market, too: data brokers would love to get their grubby little hands on such data to add a layer to their profiles of poor black and brown kids.

This is a situation where FERPA, which is the federal student privacy law, is clearly not strong enough. Right now FERPA allows Kinvolved to be designated as “school officials” who have a “legitimate interest” in using and accessing any education records. And once they have that data, I don’t think there are real constraints to its use.

I’m not singling out Kinvolved for bad intentions; for all I know they mean well and they might even help some kids and families. But I don’t think the data the app is generating is being adequately protected, and it is yet again data concerning the nation’s most vulnerable population.

Categories: Uncategorized

Race and the race to the top

Bloomberg has a pretty amazing article today with two fantastic graphs. Here’s the article, but the graphs pretty much say it all.



Todd Schneider’s “medium data”

Last night I had the pleasure of going to a Meetup given by Todd Schneider, who wrote this informative and fun blogpost about analyzing taxi and Uber data.

You should read his post; among other things it will tell you how long it takes to get to the airport from any NYC neighborhood by the time of day (on weekdays). This corroborates my fear of the dreaded post-3pm flight.


His Meetup was also cool, and in particular he posted a bunch of his code on github, and explained what he’d done as well.

For example, the raw data was more than half the size of his personal computer’s storage, so he kept the raw data on an external hard drive and converted it into a SQL database on his personal computer for later use (he used PostgreSQL).

Also, in order to load various types of data into R (which he uses instead of Python, but I forgive him because he’s so smart about it), he reduced the granularity of the geocoded events, and worked with them via the database as weights on square blocks of NYC (I think about 10 meters by 10 meters) before turning them into graphics. So if he wanted to map “taxicab pickups”, he first split the geographic area into little boxes, then counted how many pickups were in each box, then graphed that result instead. It reduced the number of rows of data by a factor larger than 10.
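Here’s a minimal sketch of that binning idea in Python. To be clear, this is not Todd’s code (his is in R and PostgreSQL); the function name `bin_pickups`, the cell width `cell_deg`, and the sample coordinates are all made up for illustration.

```python
from collections import Counter
import math


def bin_pickups(points, cell_deg=0.0001):
    """Count events per grid cell.

    points: iterable of (latitude, longitude) pairs.
    cell_deg: cell width in degrees. This value is an assumption;
        0.0001 degrees of latitude is very roughly 11 meters.
    Returns a Counter mapping (row, col) cell indices to event counts.
    """
    counts = Counter()
    for lat, lon in points:
        # Snap each event to the integer index of the box it falls in.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical pickups: three in one box, one a few blocks away.
pickups = [
    (40.75001, -73.99001),
    (40.75002, -73.99002),
    (40.75003, -73.99003),
    (40.76055, -73.98055),
]
grid = bin_pickups(pickups)
print(len(pickups), "rows in,", len(grid), "rows out")
```

The graphing step then works with one weighted row per box instead of one row per pickup, which is where the reduction in rows comes from.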

Todd calls this “medium data” because, after some amount of work, you can do it on a personal computer. I dig it.

Todd also gave a bunch of advice for people to follow if they want to do neat data analysis that gets lots of attention (his taxicab/Uber post got a million hits from Reddit, I believe). It was really useful and good advice, the most important of which was: if you’re not interested in your topic, nobody else will be either.

One interesting piece of analysis Todd showed us, which I can’t seem to find on his blog, was a picture of overall rides in taxis and Ubers, which seemed to indicate that Uber is taking over market share from taxis. That’s not so surprising, but it actually seemed to imply that the overall number of rides hasn’t changed much; it’s been a zero-sum game.

The reason this is interesting is that de Blasio’s contention has been that Uber is increasing traffic. But the above seems to imply that Uber doesn’t increase traffic (if “the number of rides” is a good proxy for traffic); rather, it’s taking business away from medallion cabs. Not a final analysis by any stretch but intriguing.

Finally, Todd more recently analyzed Citibike rides; take a look!


I don’t want more women at Davos

There was a New York Times article yesterday entitled A Push for Gender Equality at the Davos World Economic Forum, and Beyond. It was about how only 18% of the attendees of the yearly dick-measuring contest called the World Economic Forum – or Davos for the initiated – are women, and how they are planning to force companies to bring more women to improve this embarrassing attendance statistic.

One thing the article didn’t consider is the question of whether it’s actually a good thing that women aren’t at Davos. I think it is; I’m proud that women have better things to do than spend their time in high-security luxury to disingenuously discuss the world’s poor.

Davos is a force of inequality. It brings together dealmakers in finance and technology, and also the TED-talkish Big Idea promoters and “thought leaders,” and it encourages them to mingle and make deals. And while they might discuss the world’s big problems – like increasing inequality itself – I’m pretty sure they try much harder to help themselves than to solve those problems. In any case, I have little faith in their proposed solutions, especially after talking to Bill Easterly on Slate Money last week.

Let’s just cancel Davos altogether, shall we? That will do the world more good than getting more women to attend.


Crank up New York real estate taxes

There are two reasons to own a house. The first one is to live in it. The second is to sell it later at a profit.

These two reasons have led to two different housing markets in New York City. The first one is what we might call the affordable housing market, and it simply refers to normal people who need to live somewhere but don’t have extra millions of dollars to spend. The second one is the luxury real estate market of New York, which is exactly for people who have large pots of investment money.

Those two housing markets compete with each other, and lately the luxury market is entirely dominating. This is partly due to the large amount of foreign money being laundered and funneled into real estate. (Update: the U.S. Treasury has said it will look into this, but some people are already claiming it won’t be enough.) It’s also partly due to general global inequality, which produces quite a few millionaires.

Finally, it’s partly due to the bizarre constellation of tax breaks we give new developments, even if only temporarily. It makes holding on to apartments relatively frictionless, even if they are empty, which many of them are. On a permanent basis owners of luxury apartments pay a tiny fraction of the real estate tax that other New Yorkers do relative to the sale price of their apartment (h/t Nathan Newman).

And that’s where we come to the problem. The people who want to live in New York are being shut out by the people who want to own apartment-shaped assets.

If you were a developer, looking for your next building project, you might succumb. Given the expense of land, it makes sense to maximize your profits and build 3- or 4-bedroom apartments that will be snatched up by Russian oligarchs rather than a large number of studios that will actually be lived in. It just makes you more money.

What should we do? Well, we could do nothing. In the long run we might have a city that consists of mostly empty apartments.

Or, we could decide that people should actually live here. In that case we should increase real estate taxes until things change.

Right now we create the exact wrong incentives: first, because non-residents don’t pay city income taxes, and second, because we often delay taxes on new apartments and make taxes too low overall. If you think about that, we are actually setting up incentives for the situation we have: empty luxury apartments.

Instead we should make sure that luxury apartments pay more than their fair share of taxes, instead of less, and especially when they’re empty. Don’t worry, the billionaire owners can afford it, and if they can’t, then they can sell it to a mere millionaire who lives in Park Slope.

You see, if an apartment – especially an empty apartment – actually costs the owner a lot of money, they’d sell it, and they’d sell it to a person that would actually live there. That would bring prices down on those assets, because the rich people could simply shift their interest to the fine art market or some other place where holding assets doesn’t cost as much.

Finally, if real estate taxes went up, people might worry that their rent would go up too. But if the market as a whole became a market for normal people, instead of just for rich foreigners, the overall costs would become more reasonable, not less.


The SHSAT matching algorithm isn’t that hard

My 13-year-old took the SHSAT in November, but we haven’t heard the results yet. In fact we’re expecting to wait two more months before we do.

What gives? Is it really that complicated to match kids to test schools?

A bit of background. In New York City, kids write down a list of their preferred public high schools that are not “SHSAT” schools. Separately, if they decide to take the SHSAT, they rank their preferences for those, which fall into a separate category and which include Stuyvesant and Bronx Science. They are promised that they will get into the first school on the list that their SHSAT score allows them to.

I often hear people say that the algorithm to figure out what SHSAT school a given kid gets into is super complicated and that’s why it takes 4 months to find out the results. But yesterday at lunch, my husband and I proved that theory incorrect by coming up with a really dumb way of doing it.

  1. First, score all the tests. This is the time-consuming part of the process, but I assume it’s automatically done by a machine somewhere in a huge DOE building in Brooklyn that I’ve heard about.
  2. Next, rank the kids according to score, highest first. Think of it as kids waiting in a supermarket check-out line, but in this scenario they just get their school assignment.
  3. Next, repeat the following step until all the schools are filled or the line is empty: take the first kid in line and give them the highest pick still on their list. Before moving on to the next kid, check to see if you just gave away the last possible slot to that particular school. If so, label that school with the score of that kid (it will be the cutoff score) and make everyone still in line erase that school from their list because it’s full and no longer available.
  4. By construction, every kid gets the top school that their score warranted, so you’re done.
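The steps above can be sketched in a few lines of Python. This is only my reading of the lunch-table algorithm, not anything the DOE has published; the function name `assign_schools`, the kids, their scores, and the one-seat capacities are all hypothetical. Instead of literally erasing a full school from everyone’s list, the sketch just skips schools with no seats left, which comes to the same thing.

```python
def assign_schools(students, capacity):
    """Hand out seats in score order.

    students: list of (name, score, preferences), where preferences is
        the kid's ranked list of school names, favorite first.
    capacity: dict mapping school name to number of seats.
    Returns (assignment, cutoffs): assignment maps each name to a school
    or None; cutoffs maps each filled school to the score of the kid who
    took its last seat.
    """
    seats = dict(capacity)  # copy so the caller's dict is untouched
    assignment = {}
    cutoffs = {}
    # Step 2: line the kids up, highest score first.
    for name, score, prefs in sorted(students, key=lambda s: s[1], reverse=True):
        assignment[name] = None
        # Step 3: give the kid their highest pick that still has room.
        for school in prefs:
            if seats.get(school, 0) > 0:
                assignment[name] = school
                seats[school] -= 1
                if seats[school] == 0:
                    cutoffs[school] = score  # last seat just went
                break
    return assignment, cutoffs


# Hypothetical kids, scores, and capacities.
students = [
    ("Ana", 560, ["Stuyvesant", "Bronx Science"]),
    ("Ben", 540, ["Stuyvesant", "Bronx Science"]),
    ("Cy", 520, ["Bronx Science"]),
    ("Dee", 500, ["Stuyvesant"]),
]
assignment, cutoffs = assign_schools(students, {"Stuyvesant": 1, "Bronx Science": 1})
print(assignment)  # Ana gets Stuyvesant, Ben gets Bronx Science, Cy and Dee get None
print(cutoffs)     # {'Stuyvesant': 560, 'Bronx Science': 540}
```

The whole thing is one sort plus one pass down the line, so it should scale comfortably to a full citywide cohort.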

A few notes and one caveat to this:

  1. Any kid with no schools in their list, either because they didn’t score high enough for the cutoffs or because the schools all filled up before they got to the head of the line, won’t get into an SHSAT school.
  2. The above algorithm would take very little time to actually run. As in, 5 minutes of computer time once the tests are scored.
  3. One caveat: I’m pretty sure they need to make sure that two kids with the same exact score and the same preference would both either get in or get out (because think of the lawsuit if not). So the actual way you’d implement the algorithm is when you ask for the next kid in line, you’d also ask for any other kid with the same score and the same top choice to step forward. Then you’d decide whether there’s room for the whole group or not.
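Here’s a rough Python sketch of the tie-handling caveat. The post only says the tied group gets in together or stays out together; what happens when the group doesn’t fit is my assumption: here the school closes and the whole group falls through to its next choice, which can leave a seat empty. A real policy might admit the whole group instead. The name `assign_with_ties` and the sample kids are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def assign_with_ties(students, capacity):
    """Score-order assignment where tied kids with the same top pick
    are admitted together or not at all.

    students: list of (name, score, preferences), favorite school first.
    capacity: dict mapping school name to number of seats.
    Returns a dict mapping each name to a school or None.
    """
    seats = dict(capacity)
    assignment = {name: None for name, _, _ in students}
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    # Walk down the line one score at a time, so ties step forward together.
    for score, tier in groupby(ranked, key=lambda s: s[1]):
        pending = list(tier)
        while pending:
            # Group the tied kids by their top school that still has seats.
            groups = defaultdict(list)
            for kid in pending:
                top = next((s for s in kid[2] if seats.get(s, 0) > 0), None)
                if top is not None:
                    groups[top].append(kid)
            pending = []
            for school, group in groups.items():
                if len(group) <= seats[school]:
                    # Room for the whole group: everyone gets in.
                    for name, _, _ in group:
                        assignment[name] = school
                    seats[school] -= len(group)
                else:
                    # Assumed policy: no room for the whole group, so the
                    # school closes and the group tries its next choice.
                    seats[school] = 0
                    pending.extend(group)
    return assignment


# Hypothetical example: Ben and Cy tie at 540 for the last Stuyvesant seat.
students = [
    ("Ana", 560, ["Stuyvesant", "Bronx Science"]),
    ("Ben", 540, ["Stuyvesant", "Bronx Science"]),
    ("Cy", 540, ["Stuyvesant", "Bronx Science"]),
    ("Dee", 520, ["Bronx Science"]),
]
result = assign_with_ties(students, {"Stuyvesant": 2, "Bronx Science": 2})
print(result)  # Ana: Stuyvesant; Ben and Cy: Bronx Science; Dee: None
```

Each pass through the inner loop either places every waiting kid or closes at least one school, so it always finishes, and it is still seconds of computer time rather than months.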

So, why the long wait? I’m pretty sure it’s because the other public schools, the ones where there’s no SHSAT exam to get in (but there are myriad other requirements and processes involved, see e.g. page 4 of this document) don’t want people to be notified of their SHSAT placement 4 months before they get their say. It would foster too much unfair competition between the systems.

Finally, I’m guessing the algorithm for matching non-SHSAT schools is actually pretty complicated, which is I think why people keep talking about a “super complex algorithm.” It’s just not associated with the SHSAT.


O’Neil family anthem

I’m working through final edits today, and it’s terribly stressful, so I’m glad I spent last night with my three sons listening to their favorite music.

The most important songs to share with you come from Rob Cantor, who just happens to be incredibly talented. I want to see him live with my kids but so far I haven’t found out about any concerts he’s planning. Here’s my fave Cantor tune (obviously, because I’m an emo):

Next, my 7-year-old’s favorite Cantor tune, Shia LaBeouf:

And my 13-year-old’s favorite, Old Bike:

Just in case you think we only listen to this guy, I wanted to share with you the song that all of us sing regularly, for whatever reason. We make up reasons to sing this song, and it can fairly be called the O’Neil/de Jong family anthem. It’s called First Kiss Today, and made – or constructed anyway – by Songify This. Bonus footage from Biden:

