Flint residents don’t need water bottles, they need democracy

I’ve been unimpressed with the recent coverage of the Flint water crisis. The overall message is that there’s been a “run of bad luck” but that certain generous people and corporations are coming to the rescue. If you believe the reports, we should be grateful for all the water bottles being flown in from Nestle and Walmart, and we should rest assured that water filters are being handed out and installed, even though they are inadequate.

In many of the articles on Flint, the switch from Detroit to the Flint River is mentioned, as is the concept of water as a human right, but not much more is explained. Specifically, there are two important questions left unanswered. First, how did this happen? And second, where else is it going to happen?

When you think about how Flint residents got into this situation, it’s critical to remember it was directly caused by a suspension of democracy. It was an emergency manager appointed by Michigan Governor Rick Snyder who made the switch to the Flint River as a water source. I’ve talked a bit about which municipalities get their democratic powers taken away; turns out that process often involves poor people of color. The entire point of emergency management is to remove accountability from the actors who put people’s lives in danger under the guise of saving money. Rick Snyder is, unbelievably, still in office.

Speaking of money, what’s the larger story here? It’s that, as a country, we can’t seem to pony up the resources to keep up our infrastructure, especially when it comes to water. A 2012 report by USA Today found that water prices had doubled in a quarter of the cities surveyed since 2000. This is because federal funding for water and waste systems has been reduced by 80% since 1977. And that would make sense if our water infrastructure were robust, but it’s not. In fact it’s in crisis, and we’d need $1 trillion to update it. The result is widespread crappy water, expensive water, and privatized water system disasters. We just let it rot at the local level, in other words, and deal with it – or not – in the most expensive ways, when it’s already an urgent situation.

Guess where the pipes are the oldest and most decrepit? You guessed it, where poor people live. When we ignore infrastructure we are inviting yet another punitive tax on the poor, and as it happens, a life-long debilitating level of lead poisoning.

So, let’s answer the second question: where else is this going to happen? The answer is pretty much everywhere unless we get our priorities straight. And I’m not talking about water bottles.

Categories: Uncategorized

Raising kids the right way

Hey there’s finally been a New York Times column that agrees with me about how to raise kids, so I’m totally going to blog about it.

Seriously, I know that I’m 100% biased, as is anyone who tells you how to raise your kids, but I think Adam Grant has hit upon the perfect explanation of how I think about things in his recent column, How to Raise a Creative Child. Step One: Back Off.

The dumbed down version goes like this: yes, we all know it takes a huge amount of practice to get good at the violin. But that doesn’t mean you should force your kids to practice all the time so they’ll become musicians. That’s confusing causation with correlation, the most common of all parental crimes. Instead, ask your kids to be ethical and trust them to find their passion.

The idea is if you give them a strong education in ethics, and then set them free within that framework, they might just decide they love the violin. If they do, then as long as you support their passion, they might just practice all the time and become musicians.

I’ve written a bunch about this exact issue over the years, because although I played the piano as a child, I don’t encourage my kids to play instruments. Because they aren’t begging for it like I did.

To be fair, this isn’t because I’m nervously trying to construct creative kids and want the conditions to be perfect. Mostly it’s common sense. Said plainly, why would I pay for expensive lessons that they don’t want? Why would I set myself up to remind them to practice when they couldn’t care less? It sounds like torture for everyone involved, and I honestly don’t understand parents who do it.

I grew up in Lexington, Massachusetts, a hotbed of striving, upwardly mobile parenthood, and I was absolutely surrounded by kids – especially second-generation Asian kids – who were being forced to display precocity in all kinds of ways. These kids were miserable, and they hated their violins and cellos. Not all the time, and not in every way, but let me say it like this: very few of them still play music. (Whereas I do, and by the way my bluegrass band has a gig, stay tuned.)

I know, it’s not a lot of evidence, but I still think I’m right, because it’s parenting and people are totally irrational when it comes to this kind of thing, so bear with me, and read the references in Adam Grant’s piece as well, maybe they’re scientific-y.

Of course, it all depends on the definition of creative, which is not obvious, and I could easily imagine the result changing depending on how you define it. Not to mention that “creativity” isn’t the only thing you’d want from your children. In fact, it’s not my personal goal for my kids to be creative. If I had to choose, I’d say I want my kids to be generous and ethical.

Here’s a bit more background on this very question: a Harvard Education School report called THE CHILDREN WE MEAN TO RAISE: The Real Messages Adults Are Sending About Values found the following:

About 80% of the youth in our survey report that their parents are more concerned about achievement or happiness than caring for others. A similar percentage of youth perceive teachers as prioritizing students’ achievements over their caring. Youth were also 3 times more likely to agree than disagree with this statement: “My parents are prouder if I get good grades in my classes than if I’m a caring community member in class and school.” Our conversations with and observations of parents also suggest that the power and frequency of parents’ daily messages about achievement and happiness are drowning out their messages about concern for others.

When I read this report I performed an exceptionally biased poll in my own household and made sure my kids knew what’s up. And they all do, most probably because I am not forcing them to practice the piano.

At CPDP, thinking about privacy

Brussels is a pretty nice place for a hellhole (according to Trump). I got here early yesterday and walked around; obviously I bought a bunch (technically an asston) of chocolate and took pictures of impudent statues.

I know this sounds entirely unhistorical and arrogant, but I can’t help thinking that Brussels was created out of some indulgent American fantasy of Europe that confused Paris and Amsterdam and added a bunch of chocolate stores, beer, and waffles. Oh, and gold leaf.

It’s a great city; possibly it’s replaced Amsterdam as the place I’d like to live if I moved away from New York (which will never happen). It’s pedestrian dominated, there are plenty of sex shops, and the recycling bins are covered with graffiti. In other words, it’s got the right values and it’s not overly sanitized. Trump’s got it wrong once again.

I’m here for an annual conference called CPDP, which stands for Computers, Privacy, and Data Protection. This morning I attended a super interesting panel on privacy and the world’s poor. In that panel I learned about an algorithm being used to sort unemployed people in Poland. As is typical of many of the algorithms I’m interested in, it’s both entirely opaque and high impact; the open information laws also don’t apply for inscrutable reasons.

Later today I’ll be on a panel in which we’ll discuss software tools that investigate privacy and data protection in the real world. Besides me, the people on the panel are working within the context of European privacy and protection laws, which are both very different from and much more protective than what we have in the States (although the UK is an exception). I will surely learn a lot, both about how people think about data and privacy over here and about what the obstacles are to enforcing the strong laws.

The continued surveillance of poor black kids

There’s a new data-driven app out there called Kinvolved, featured this morning in the New York Times, and it’s exactly my worst fear. It tracks Harlem school children’s whereabouts, sending text messages to parents when they are tardy or absent from school.

When you look at the user agreement, it seems to say that the data is relatively safe and presumably not available for resale to marketers, but they also say they are allowed to change the agreement at any time.

Here’s my specific fear: what about when they go out of business? I’m thinking the data might be valuable at that point, and their investors might want some money back. And there’s a market, too: data brokers would love to get their grubby little hands on such data to add a layer to their profiles of poor black and brown kids.

This is a situation where FERPA, which is the federal child privacy law, is clearly not strong enough. Right now FERPA allows Kinvolved to be designated as “school officials” who have a “legitimate interest” in using and accessing any education records. And once they have that data, I don’t think there are real constraints to its use.

I’m not singling out Kinvolved for bad intentions; for all I know they mean well and they might even help some kids and families. But I don’t think the data the app is generating is being adequately protected, and it is yet again data concerning the nation’s most vulnerable population.

Race and the race to the top

Bloomberg has a pretty amazing article today with two fantastic graphs. Here’s the article, but the graphs pretty much say it all.

millionaire-school

millionaire-age.png

Todd Schneider’s “medium data”

Last night I had the pleasure of going to a Meetup given by Todd Schneider, who wrote this informative and fun blogpost about analyzing taxi and Uber data.

You should read his post; among other things it will tell you how long it takes to get to the airport from any NYC neighborhood by the time of day (on weekdays). This corroborates my fear of the dreaded post-3pm flight.

His Meetup was also cool, and in particular he posted a bunch of his code on github, and explained what he’d done as well.

For example, the raw data was more than half the size of his personal computer’s storage, so he used an external hard drive to hold the raw data and convert it to a SQL database on his personal computer for later use (he used PostgreSQL).

Also, in order to load various types of data into R (which he uses instead of python, but I forgive him because he’s so smart about it), he reduced the granularity of the geocoded events and worked with them via the database as weights on square blocks of NYC (I think about 10 meters by 10 meters) before turning them into graphics. So if he wanted to map “taxicab pickups”, he first split the geographic area into little boxes, then counted how many pickups were in each box, then graphed that result instead. It reduced the number of rows of data by a factor larger than 10.
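For the curious, here is a minimal sketch of that boxing-and-counting idea in Python. It is not Todd’s code (he worked in R and PostgreSQL); the function name `bin_pickups`, the sample coordinates, and the default cell size of 0.0001 degrees (very roughly 10 meters of latitude) are all my own assumptions for illustration.

```python
import math
from collections import Counter


def bin_pickups(points, cell_size_deg=0.0001):
    """Count how many points fall into each square grid cell.

    points: iterable of (latitude, longitude) pairs.
    cell_size_deg: width of a cell in degrees. The default is a
        hypothetical value, very roughly 10 meters of latitude.
    Returns a Counter mapping (row, col) cell indices to counts.
    """
    counts = Counter()
    for lat, lon in points:
        # Snap each point to the index of the box that contains it.
        cell = (math.floor(lat / cell_size_deg),
                math.floor(lon / cell_size_deg))
        counts[cell] += 1
    return counts


# Made-up pickup locations, only to show the call.
pickups = [(40.75001, -73.99001), (40.75002, -73.99002), (40.76000, -73.98000)]
grid = bin_pickups(pickups)
# Each key in grid is one box; each value is the weight you would plot.
```

The payoff is that you graph one row per occupied box rather than one row per pickup, which is where the reduction in row count comes from.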

Todd calls this “medium data” because, after some amount of work, you can do it on a personal computer. I dig it.

Todd also gave a bunch of advice for people to follow if they want to do neat data analysis that gets lots of attention (his taxicab/ Uber post got a million hits from Reddit I believe). It was really useful and good advice, the most important of which was, if you’re not interested in this topic, nobody else will be either.

One interesting piece of analysis Todd showed us, which I can’t seem to find on his blog, was a picture of overall rides in taxis and Ubers, which seemed to indicate that Uber is taking over market share from taxis. That’s not so surprising, but it actually seemed to imply that the overall number of rides hasn’t changed much; it’s been a zero-sum game.

The reason this is interesting is that de Blasio’s contention has been that Uber is increasing traffic. But the above seems to imply that Uber doesn’t increase traffic (if “the number of rides” is a good proxy for traffic); rather, it’s taking business away from medallion cabs. Not a final analysis by any stretch but intriguing.

Finally, Todd more recently analyzed Citibike rides, take a look!

I don’t want more women at Davos

There was a New York Times article yesterday entitled A Push for Gender Equality at the Davos World Economic Forum, and Beyond. It was about how only 18% of the attendees of the yearly dick-measuring contest called the World Economic Forum – or Davos for the initiated – are women, and how they are planning to force companies to bring more women to improve this embarrassing attendance statistic.

One thing the article didn’t consider is the question of whether it’s actually a good thing that women aren’t at Davos. I think it is; I’m proud that women have better things to do than spend their time in high-security luxury to disingenuously discuss the world’s poor.

Davos is a force of inequality. It brings together dealmakers in finance and technology, and also the TED-talkish Big Idea promoters and “thought leaders,” and it encourages them to mingle and make deals. And while they might discuss the world’s big problems – like increasing inequality itself – I’m pretty sure they try much harder to help themselves than to solve those problems. In any case, I have little faith in their proposed solutions, especially after talking to Bill Easterly on Slate Money last week.

Let’s just cancel Davos altogether, shall we? That will do the world more good than getting more women to attend.

Crank up New York real estate taxes

There are two reasons to own a house. The first one is to live in it. The second is to sell it later at a profit.

These two reasons have led to two different housing markets in New York City. The first one is what we might call the affordable housing market, and it simply refers to normal people who need to live somewhere but don’t have extra millions of dollars to spend. The second one is the luxury real estate market of New York, which is exactly for people who have large pots of investment money.

Those two housing markets compete with each other, and lately the luxury market is entirely dominating. This is partly due to the large amount of foreign money being laundered and funneled into real estate. (Update: the U.S. Treasury has said it will look into this, but some people are already claiming it won’t be enough.) It’s also partly due to general global inequality, which produces quite a few millionaires.

Finally, it’s partly due to the bizarre constellation of tax breaks we give new developments, even if only temporarily. It makes holding on to apartments relatively frictionless, even if they are empty, which many of them are. On a permanent basis owners of luxury apartments pay a tiny fraction of the real estate tax that other New Yorkers do relative to the sale price of their apartment (h/t Nathan Newman).

And that’s where we come to the problem. The people who want to live in New York are being shut out by the people who want to own apartment-shaped assets.

If you were a developer, looking for your next building project, you might succumb. Given the expense of land, it makes sense to maximize your profits and build 3- or 4-bedroom apartments that will be snatched up by Russian oligarchs rather than a large number of studios that will actually be lived in. It just makes you more money.

What should we do? Well, we could do nothing. In the long run we might have a city that consists of mostly empty apartments.

Or, we could decide that people should actually live here. In that case we should increase real estate taxes until things change.

Right now we create the exact wrong incentives. First, because non-residents don’t pay city income taxes, and second because we often delay taxes on new apartments and make taxes too low overall. If you think about that, we are actually setting up incentives for the situation we have: empty luxury apartments.

Instead we should make sure that luxury apartments pay more than their fair share of taxes, instead of less, and especially when they’re empty. Don’t worry, the billionaire owners can afford it, and if they can’t, then they can sell it to a mere millionaire who lives in Park Slope.

You see, if an apartment – especially an empty apartment – actually costs the owner a lot of money, they’d sell it, and they’d sell it to a person that would actually live there. That would bring prices down on those assets, because the rich people could simply shift their interest to the fine art market or some other place where holding assets doesn’t cost as much.

Finally, if real estate taxes went up, people might worry that their rent would go up too. But if the market as a whole became a market for normal people, instead of just for rich foreigners, the overall costs would become more reasonable, not less.

The SHSAT matching algorithm isn’t that hard

My 13-year-old took the SHSAT in November, but we haven’t heard the results yet. In fact we’re expecting to wait two more months before we do.

What gives? Is it really that complicated to match kids to test schools?

A bit of background. In New York City, kids write down a list of their preferred public high schools that are not “SHSAT” schools. Separately, if they decide to take the SHSAT, they rank their preferences for those, which fall into a separate category and which include Stuyvesant and Bronx Science. They are promised that they will get into the first school on the list that their SHSAT score allows them to get into.

I often hear people say that the algorithm to figure out what SHSAT school a given kid gets into is super complicated and that’s why it takes 4 months to find out the results. But yesterday at lunch, my husband and I proved that theory incorrect by coming up with a really dumb way of doing it.

  1. First, score all the tests. This is the time-consuming part of the process, but I assume it’s automatically done by a machine somewhere in a huge DOE building in Brooklyn that I’ve heard about.
  2. Next, rank the kids according to score, highest first. Think of it as kids waiting in line at a supermarket check-out line, but in this scenario they just get their school assignment.
  3. Next, repeat the following step until all the schools are filled or the line is empty: take the first kid in line and give them their highest pick. Before moving on to the next kid, check to see if you just gave away the last possible slot to that particular school. If so, label that school with the score of that kid (it will be the cutoff score) and make everyone still in line erase that school from their list because it’s full and no longer available.
  4. By construction, every kid gets the top school that their score warranted, so you’re done.
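Here’s a minimal sketch of the steps above in Python, starting from already-scored tests. To be clear, this is not the DOE’s actual code: the function name `assign_schools`, the kids’ names, the scores, and the seat counts are all made up for illustration, and ties are ignored (kids with equal scores are simply taken in input order).

```python
def assign_schools(students, capacity):
    """Hand out seats in score order.

    students: list of (name, score, preferences) tuples, where
        preferences is a list of school names, favorite first.
    capacity: dict mapping school name to number of seats.
    Returns (assignment, cutoffs): assignment maps each name to a
    school or None; cutoffs maps each filled school to the score of
    the kid who took its last seat.
    """
    seats = dict(capacity)  # copy so the caller's dict is untouched
    assignment = {}
    cutoffs = {}
    # Step 2: line the kids up, highest score first.
    for name, score, prefs in sorted(students, key=lambda s: s[1], reverse=True):
        # Step 3: give the kid their highest pick that still has a seat.
        # Skipping full schools here has the same effect as making
        # everyone in line erase them from their lists.
        choice = next((s for s in prefs if seats.get(s, 0) > 0), None)
        assignment[name] = choice
        if choice is not None:
            seats[choice] -= 1
            if seats[choice] == 0:
                cutoffs[choice] = score  # last seat taken: this is the cutoff
    return assignment, cutoffs


# Hypothetical data, only to show the call.
kids = [
    ("Ana", 650, ["Stuyvesant", "Bronx Science"]),
    ("Ben", 640, ["Stuyvesant", "Bronx Science"]),
    ("Cy", 600, ["Bronx Science"]),
    ("Di", 590, ["Stuyvesant"]),
]
placement, cutoffs = assign_schools(kids, {"Stuyvesant": 1, "Bronx Science": 1})
```

With one seat per school, Ana gets Stuyvesant, Ben falls through to Bronx Science, and Cy and Di get nothing. The whole thing is one sort plus one pass down the line.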

A few notes and one caveat to this:

  1. Any kid with no schools in their list, either because they didn’t score high enough for the cutoffs or because the schools all filled up before they got to the head of the line, won’t get into an SHSAT school.
  2. The above algorithm would take very little time to actually run. As in, 5 minutes of computer time once the tests are scored.
  3. One caveat: I’m pretty sure they need to make sure that two kids with the same exact score and the same preference would both either get in or get out (because think of the lawsuit if not). So the actual way you’d implement the algorithm is when you ask for the next kid in line, you’d also ask for any other kid with the same score and the same top choice to step forward. Then you’d decide whether there’s room for the whole group or not.
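The tie caveat can be sketched in Python too. Everything here is an assumption on my part: the function name `assign_with_ties`, the made-up kids and seat counts, and especially the policy for a tied group that doesn’t fit. In this sketch the school simply closes and the whole group moves on to its next choice, which is one possible rule and not necessarily what the DOE does. It also only handles the case described above (same score, same top remaining choice).

```python
from itertools import groupby


def assign_with_ties(students, capacity):
    """Score-order assignment where tied kids step forward together.

    students: list of (name, score, preferences) tuples.
    capacity: dict mapping school name to number of seats.
    Returns a dict mapping each name to a school or None.
    """
    seats = dict(capacity)
    assignment = {name: None for name, _, _ in students}
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    # Walk down the line one score at a time.
    for _, tier in groupby(ranked, key=lambda s: s[1]):
        waiting = list(tier)
        while waiting:
            # Group the tied kids by their top choice that still has seats.
            groups = {}
            for student in waiting:
                prefs = student[2]
                choice = next((s for s in prefs if seats.get(s, 0) > 0), None)
                if choice is not None:
                    groups.setdefault(choice, []).append(student)
            waiting = []
            for school, group in groups.items():
                if len(group) <= seats[school]:
                    # Room for the whole group: everybody gets in.
                    for name, _, _ in group:
                        assignment[name] = school
                    seats[school] -= len(group)
                else:
                    # ASSUMPTION: no room for the whole group, so nobody
                    # gets in, the school closes, and they try again.
                    seats[school] = 0
                    waiting.extend(group)
    return assignment


# Hypothetical data: Ben and Cy are tied and want the same school.
kids = [
    ("Ana", 650, ["Stuyvesant"]),
    ("Ben", 640, ["Stuyvesant", "Bronx Science"]),
    ("Cy", 640, ["Stuyvesant", "Bronx Science"]),
    ("Di", 600, ["Bronx Science"]),
]
placement = assign_with_ties(kids, {"Stuyvesant": 2, "Bronx Science": 2})
```

Here Ana takes one Stuyvesant seat, Ben and Cy can’t both fit in the remaining one, so both land at Bronx Science, and Di is shut out. Either way it’s still seconds of computer time, not months.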

So, why the long wait? I’m pretty sure it’s because the other public schools, the ones where there’s no SHSAT exam to get in (but there are myriad other requirements and processes involved, see e.g. page 4 of this document) don’t want people to be notified of their SHSAT placement 4 months before they get their say. It would foster too much unfair competition between the systems.

Finally, I’m guessing the algorithm for matching non-SHSAT schools is actually pretty complicated, which is I think why people keep talking about a “super complex algorithm.” It’s just not associated with the SHSAT.

O’Neil family anthem

I’m working through final edits today, and it’s terribly stressful, so I’m glad I spent last night with my three sons listening to their favorite music.

The most important songs to share with you come from Rob Cantor, who just happens to be incredibly talented. I want to see him live with my kids but so far I haven’t found out about any concerts he’s planning. Here’s my fave Cantor tune (obviously, because I’m an emo):

Next, my 7-year-old’s favorite Cantor tune, Shia LaBeouf:

And my 13-year-old’s favorite, Old Bike:

Just in case you think we only listen to this guy, I wanted to share with you the song that all of us sing regularly, for whatever reason. We make up reasons to sing this song, and it can fairly be called the O’Neil/de Jong family anthem. It’s called First Kiss Today, and made – or constructed anyway – by Songify This. Bonus footage from Biden:

Surveillance and wifi in NYC subways

This morning I heard some news from the Cuomo administration (hat tip Maxine Rockoff).

Namely, we’re set to get mobile tickets in the NYC subways:

MobileTicketing.jpg

In addition, they’re saying we will have wi-fi in the stations, as well as surveillance cameras on all the subways and buses. Oh, and charging stations for USB chargers.

My guess: the surveillance cameras will continue to function long after the USB chargers get filled with gum.

Which Michigan cities are in receivership?

Yesterday at my Occupy meeting we watched a recent Rachel Maddow piece on the suspension of democracy in Michigan:

If it’s too long, the short version is that instead of having elected officials, some specially chosen towns have instead ‘Emergency Managers,’ who do things like save money by pumping in poisonous water.

So, as usual, my group had a bunch of questions, among them: what is the racial make-up of the towns who are in receivership?

Well first, here’s a list of towns currently under receivership, which I mapped on Google Maps:

Screen Shot 2016-01-11 at 7.47.53 AM

You can interact with my map here.

And next I looked at a census map of where black people live in Michigan:

Screen Shot 2016-01-11 at 7.47.12 AM

Taken from this website which displays 2010 census data

I also wanted to zoom into the Detroit area:

and compared that to the municipalities under receivership in the area:

Screen Shot 2016-01-11 at 7.45.46 AM.png

Take a closer look here.

Just in case you’re wondering, that teal spot on the left is exactly where Inkster is. And Wayne County’s government is also in receivership, but it’s a county, not a town.

The economics of weight loss

Tomorrow’s recording of Slate Money will concern New Year’s resolutions. We’re talking about gym memberships and health classes, Fitbits and other “quantified self” devices, and the economics of Weight Watchers and other weight loss industry companies.

I’m in charge of researching the weight loss industry, which was estimated at $64 billion in 2014. That’s huge, but actually it’s dwindling, as people formally diet less often and instead try to informally “eat healthy.”

In fact, Weight Watchers is an old person’s company; the average age is 48, and Oprah’s recent help notwithstanding, younger people are more likely to be interested in quantified self devices which can track calories burned and so on than they are in getting together in person and talking with people about the struggle.

Also, Obamacare doesn’t cover weight programs outside of a doctor’s office, so that has dried up funds as well.

This is good news, because there’s really no evidence that weight loss programs work long-term, but they are expensive. They keep doing studies but they never come out with any positive results beyond 12 months. That’s because they don’t have any evidence.

For example, if I joined Weight Watchers, I’d pay $44.95 per month, although I’d get a refund if I lost 10 pounds quickly enough. I’d be able to go to meetings two blocks away from my house every Wednesday. The plan would auto-renew and charge my credit card unless I canceled it, which is tantamount to admitting defeat. I’m wondering what the statistics are on people who are paying monthly but no longer attending meetings.

If you want an extreme example of the current dysfunction around dieting, look no further than the show The Biggest Loser, which the Guardian featured recently with the tag line, “It’s a miracle no one has died yet.”

So, given how much money people put into this stuff even now, why are they doing it? After all, if we were expected to pay a doctor to set the bones of our broken leg, but it only worked for a few months before our leg started breaking again, we’d call the doctor a quack and demand our money back. But somehow with diets it’s different. Why?

I have a complicated theory.

The first level is the “I’m an exception” law of human nature, whereby everyone thinks they somehow will prove to be an exception to statistical rules. It’s the same magical reasoning that makes people buy lottery tickets when they know their chances of winning are slim, and they even know their expected value is negative.

The second level is entertainment. This is also taken directly from the lottery mindset; even if you know you’re not winning the lottery, the momentary fantasy of possibly winning is delicious, and you relish it. The cost of that fantasy is a small price to pay for the freedom to believe in this future for one day.

I think the same kind of thing happens when people join diets. They get to fantasize about how great their lives will be once they’re finally thin. And of course the prevalent fat shaming helps this myth, as does the advertising from the diet industry. It’s all about imagining a “new you,” as if you also get a personality transplant along with losing weight.

But there’s something even more seductive about weight loss regimens that lotteries don’t have, namely public support. When someone announces that they’re on a diet, which happens pretty often, everyone around them has been trained to “be supportive” in their endeavor. At the same time, people rarely announce they’ve gone off their diet. So you’ve got asymmetrical dieting attention.

That attention also has a moral flavor to it. Since people are expected to have control over their weight, they are given moral standing when they announce their diet; it is a sign they are finally “taking control.” Never mind that their chances of long-term success are minimal.

The third and final level, which is the saddest, is guilt. Because we’ve bought in to the idea that people have direct control over their weight, when people end up giving up, they feel personally guilty and end up paying extra money for basically nothing in return.

Of course, no part of this story is all that different from the story of gym memberships or even Fitbit-like device acquisition. Seen together, it’s just a question of what quasi-moralistic self-help fad happens to be popular at any given moment. And there’s tons of money in all of it.

Finishing up Weapons of Math Destruction

Great news, you can now pre-order my book on Amazon. That doesn’t mean it’s completely and utterly finished, but nowadays I’m working on endnote formatting rather than having existential crises about the content. I’m also waiting to see the proposed design of the book’s cover, for which I sent in a couple of screen shots of my python code. And pretty soon I get to talk about stuff like font, which I don’t care about at all.

But here’s the weird part. This means it’s beginning.

You see, when you’ve lived your life as a mathematician and quant, projects are usually wrapped up right around now. You do your research, give talks, and finally write your paper, and then it’s over. I mean, not entirely, because sometimes people read your paper, but actually that mostly doesn’t happen for the published version but instead with the preprint archive. By the time you’ve finished submitting your paper, you’re kind of over your result and you want to move on.

When you do a data science project, a similar thing happens. The bulk of the work happens pre-publishing. Once the model is published, it’s pretty much over for you, and you go on to the next project.

Not so with book publishing. This whole process, as long and as arduous and soul-sucking as it’s been, is just a precursor to the actual event, which is the publication of the book (September 6th of this year). Then I get to go around talking about this stuff with people for weeks if not months. And although I’m very familiar with the content, the point of writing the book wasn’t simply for me to know the stuff and then move on, it’s for me to spread that message. So it’s also exciting to talk to other people about it.

I also recently got a booking agent, which you can tell if you’ve noticed my new Contact Page. That means that when people invite me to give a talk they’re going to deal with her first, and she’s going to ask for real money (or some other good reason I might want to do it). This might offend some people, especially academics who are used to having people available to donate their time for free, but I’m really glad to have her, given how many talk requests I get on a weekly basis.

Categories: talks

Racial identity and video games

Yesterday I stumbled upon an article entitled The Web is not a post-racial utopia, which concerns a videogame called Rust. It explains that when a player enters the world of the game, they are “born” naked and alone. The game consists of surviving the wilderness. I’m guessing it’s like a grown-up version of Minecraft in some sense.

In the initial version of the game, all the players were born bald and white. In a later version, race was handed out randomly. And as you can guess, the complaints came pouring in after the change, as well as a marked increase in racially hostile language.

This is all while blacks and Hispanics play more videogames than whites. They were not complaining about being cast as a white man in the initial version, because it’s so common. Videogame designers are almost all white guys.

I’ll paraphrase from a great interview with one of the newest Star Wars heroes, John Boyega, when I say, I’m pretty sure there wouldn’t have been any complaints if everybody were born a randomly colored alien. White people are okay with being cast as a green alien avatar, but no way they’re going to be cast as a black man. WTF, white people?

Of course, not everyone’s complaining. In fact the reactions are interesting although extreme. The game’s developers are thinking of setting up analytics to track the reactions. They’re also thinking of assigning gender and other differences randomly to avatars. And by the way, it looks like they’ve recently been attacked by hackers.

For what it’s worth, I’d love to see men in video games dealing with getting their period. Actually, that’s a great idea. Why not have that as part of the 7th grade ‘Health and Sexuality’ curriculum for both boys and girls? Those who advance to the next level can experience being pregnant and suffering sciatica. Or maybe even hot flashes and menopause, why not?

Categories: Uncategorized

Parenting is really a thing

I’d been skating along with the parenting thing for quite a while. I have three sons, the oldest of whom is 15 and the youngest 7. It’s been a blizzard of pancakes and lost teeth, and almost nothing has really fazed me.

Until about 3 months ago, when my little guy broke his leg. The pain was excruciating, and traumatic for both him and anyone near him, even after his cast was set. He was in a wheelchair for 7 weeks all told, which was probably too long, but we had conflicting advice and went with what we were told by the doctor.

Then, finally, the cast came off three weeks ago. I thought this episode was finally over. But my son refused to walk.

It was more important for him to go to school than anything, so back he went into his wheelchair for the next few days. I figured he’d get back to walking over the weekend. He didn’t. The doctor who took off the cast had dismissed his fear, saying he’d be walking “by the afternoon.” Another doctor told us there was “nothing physically wrong with him.” But after a week of begging him to try, and threatening to take away his computer, we were all a mess.

Then, when my husband was out of town, I got even more anxious. I made the mistake of taking him to see a pediatrician who I don’t trust, but it was right before Christmas and I was desperate. Mistake. The guy told me he had “hysterical paralysis” and gave me the number of a psychiatrist who charges $1500 per hour and doesn’t take insurance.

Luckily, friends of mine suggested physical therapy. I found an amazing pediatric physical therapist who came to our house and convinced him to try stepping while leaning on the table for support. Then came days and days of grueling and stressful practice. We didn’t see much progress, but at least it was some exercise.

Finally, I decided it was all too intense and stressful. I drove him and me to a hotel near my favorite yarn store in Massachusetts – a yearly tradition but it’s usually the whole family – and we just went swimming for hours and hours in the hotel pool. I could see how joyous he became in the water, where there was no sense of gravity and he was once again fully able-bodied. I had to drag him out of the pool every time. I think he would have slept in it if I’d let him.

Yesterday morning we checked out of the hotel. We had stopped talking days before about when he’d start walking; we’d just enjoyed each other’s company and snuggled every chance we got. On the way out of the elevator and on to the check-out desk, my son said to me, “I’m just going to walk now.” And he did.

So, parenting is really a thing. The hardest part has been learning to trust my kids to get through difficult things even when I can’t help them directly. I knew that about homework already, but from now on I guess it just gets bigger and harder.

Categories: Uncategorized

We could use some tools of social control to use on police

You may have noticed I’ve not been writing much recently. That’s because I turned in the latest draft of my book, and then I promptly took a short vacation from writing. In fact I ensconced myself in a ridiculous crochet project:

crochet.jpeg

which is supposed to be a physical manifestation of this picture proof:

wallet_back.jpg

which I discussed a few months ago.

Anyhoo, I’ve gotten to thinking about the theme of my book, which is, more or less, how black box algorithms have become tools of social control. I have a bunch of examples in my book, but two of the biggies are the Value-Added Model, which is used against teachers, and predictive policing models, which are used by the police against civilians (usually, you guessed it, young men of color).

That makes me think – what’s missing here? Why haven’t we built, for example, models which assess police?

If you looked for it, the closest you’d come might be the CompStat data-driven policing models that measure a cop by how many arrests and tickets he’s made. Basically the genesis of the quota system.

But of course that’s only one side of it, and the less interesting one; how about how many people the policeman has shot or injured? As far as I know, that data isn’t analyzed, if it’s even formally collected.

That’s not to say I want a terrible, unaccountable model that unfairly judges police like the one we have for teachers. But I do think our country has got its priorities backwards when we put so much focus and money towards getting rid of the worst teachers but we do very little towards getting rid of the worst cops.

The example I have in mind is, of course, the police that shot 12-year-old Tamir Rice and didn’t get indicted. The prosecutor was quoted as saying, “We don’t second-guess police officers.” I maintain that we should do exactly that. We should collect and analyze data around police actions as long as children are getting killed.

Categories: Uncategorized

Forecasting precipitation

What does it mean when you’re given a precipitation forecast? And how do you know if it’s accurate? What does it mean when you see that there’s a 37% chance of rain?

Screen Shot 2015-12-28 at 6.20.12 AM

Well, there are two input variables you have to keep in mind: first, the geographic location – where you’re looking for a forecast, and second, the time window you’re looking at.

For simplicity let’s fix a specific spot – say 116th and Broadway – and let’s also fix a specific one hour time window, say 1am-2am.

Now we can ask again, how would we interpret a “37% chance of rain” for this location during this time? And when do we decide our forecast is good? It’s trickier than you might think.

***

Let’s first think about interpretation. Hopefully what that phrase means is something like, 37 out of 100 times, with these exact conditions, you’ll see a non-zero, measurable amount of rain or other precipitation during an hour. So far so good.

Of course, we only have exactly one of these exact hours. So we cannot directly test the forecast with that one hour. Instead, we should collect a lot of data on the forecast. Start by building 101 bins, labeled from 0 to 100, and throw each forecasted hour into the appropriate bin, along with a record of the actual precipitation outcome.

So if it actually rains between 1am and 2am at 116th and Broadway, I’d throw this record into the “37” bin, along with a note that said “YES IT RAINED.” I’d shorthand this note by attaching a “1” to the record, which stands for “100% chance of rain because it actually rained.” I’d attach a “0” to each hour where it didn’t rain.

I’d do this for every single hour of every single day and at every single location as well, of course not into the “37” bin but into the bin with the forecasted number, along with the note of whether rain came. I’d do this for 100 years, or at least 1, and by the end of it I’d presumably have a lot of data in each bin.
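Here is a minimal sketch of that bookkeeping in Python. The function names and the handful of records are made up for illustration; a real version would read a year or more of hourly forecasts and outcomes.

```python
from collections import defaultdict


def bin_forecasts(records):
    """Group (forecast_percent, rained) pairs into bins keyed by forecast percent.

    forecast_percent is an integer from 0 to 100. rained is 1 if measurable
    precipitation fell during that hour and 0 if it did not.
    """
    bins = defaultdict(list)
    for forecast_percent, rained in records:
        bins[forecast_percent].append(rained)
    return bins


def observed_frequency(bins):
    """For each non-empty bin, the fraction of hours in which it actually rained."""
    return {label: sum(outcomes) / len(outcomes) for label, outcomes in bins.items()}


# Hypothetical records, not real data: (forecast percent, did it rain?)
records = [(37, 1), (37, 0), (37, 0), (0, 0), (0, 0), (0, 1), (100, 1)]
bins = bin_forecasts(records)
freq = observed_frequency(bins)
```

With enough data, comparing each bin’s label to its entry in `freq` is exactly the check described next.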

So for the “0” bin I’d have many many hours where there wasn’t supposed to be rain. Was there sometimes rain? Yeah, probably. So my “0” bin would have a bunch of records with “0” labels and a few with “1” labels. Each time a “1” record made its way into the “0” bin, it would represent a failure of the model. I’d need to count such a failure against the model somehow.

But then again, what about the “37” bin? Well I’d want to know, for all the hours forecasted to have a 37% chance of rain, how often it actually happened. If I ended up with 100 examples, I’d hope that 37 out of the 100 examples ended up with rain. If it actually happened 50 times out of 100, I’d be disappointed – another failure of the model. I’d need to count this against the model.

Of course to be more careful I’d rather have 100,000 examples accumulated in bin “37” and see 50,000 of those hours actually had rain. With that data I’d be fairly certain this forecasting engine is inaccurate.

Or, if 37,003 of those examples actually saw rain, then I’d be extremely pleased. I’d be happy to trust this model when it says 37% chance of rain. But then again, it might still be kind of inaccurate when it comes to the bin labeled “72”.

We’ve worked so hard to interpret the forecast that we’re pretty close to determining if it’s accurate. Let’s go ahead and finish the job.

***

Let’s take a quick reality check first though. Since I’ve already decided to fix on a specific location, namely at 116th and Broadway, the ideal forecast would always just be 1 or 0: it’s either going to rain or it’s not.

In other words, we have the ideal forecast to measure all other forecasts against. Let’s call that God’s forecast, or if you’re an atheist like me, call it “mother nature’s forecast,” or MNF for short. If you tried to test MNF, you’d set up your 101 bins but you’d only ever use 2 of them. And they’d always be right.

***

OK, but this is the real world, and forecasting weather is hard, especially when it’s a day or two in advance, so let’s try instead to compare two forecasts head to head. Which one is better, Google or Dark Sky?

I’d want a way to assign scores to each of them and choose the better precipitation model. Here’s how I’d do it.

First, I’d do the bin thing for each of them, over the same time period. Let’s say I’m still obsessed with my spot, 116th and Broadway, and I’ve fixed a year or two of hourly forecasts to compare the two models.

Here’s my method. Instead of rewarding a model for accuracy, I’m going to penalize it for inaccuracy. Specifically, I’ll assign it a squared error term for each time it forecast wrong.

To see how that plays out, let’s look at the “37” bin for each model. As we mentioned above, any time the model forecasts 37% chance of rain, it’s wrong. It either rains, in which case it’s off by 1.00-0.37 = 0.63, or it doesn’t rain, in which case the error term is 0.37. I will assign it the square of those terms as penalty for its wrongness.
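As a sketch, the penalty for a single hour and the total over a set of hours might look like this in Python. The function names are my own, and forecasts are assumed to arrive as whole percentages.

```python
def squared_error_penalty(forecast_percent, rained):
    """Squared gap between the forecast probability and the outcome (1 or 0)."""
    p = forecast_percent / 100
    return (rained - p) ** 2


def total_penalty(records):
    """Sum of squared error penalties over (forecast_percent, rained) pairs."""
    return sum(squared_error_penalty(f, r) for f, r in records)


rainy_hour = squared_error_penalty(37, 1)  # (1.00 - 0.37)^2, roughly 0.3969
dry_hour = squared_error_penalty(37, 0)    # 0.37^2, roughly 0.1369
```

Averaged over all forecasts, this quantity is commonly known as the Brier score.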

***

How did I come up with the square of the error term? Why is it a good choice? For one, it has the following magical property: it will be minimized when the label “37” is the most accurate.

In other words, if we fix for a moment the records that end up in the “37” bin, the sum of the squared error terms will be the smallest when the true proportion of “1”s among the records in that bin is 37%.

Said another way, if we have 100,000 records in the “37” bin, and actually 50,000 of them correspond to rainy hours, then the sum of all the squared error terms ends up much bigger than if only 37,000 of them turned into rain. So that’s a way of penalizing a model for inaccuracy.

To be more precise, if our true chance of rain is t but our bin is actually labeled t + \epsilon, then the average penalty term, assuming we’ve collected enough data to ignore measurement error, will be t (1-(t+\epsilon))^2 + (1-t)(t+\epsilon)^2, or

t (1-t) + \epsilon^2.

The crucial fact is that \epsilon^2 is never negative, so the above penalty term will be minimized when \epsilon is zero, or in other words when the label of the bin perfectly corresponds to the actual chance of rain.
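For anyone who wants to check the algebra, here is the expansion written out step by step, using the same t and \epsilon as above. The terms that are linear in \epsilon cancel, which is what leaves the clean \epsilon^2.

```latex
\begin{align*}
t\,(1-(t+\epsilon))^2 + (1-t)\,(t+\epsilon)^2
  &= t\left[(1-t)^2 - 2(1-t)\epsilon + \epsilon^2\right]
   + (1-t)\left[t^2 + 2t\epsilon + \epsilon^2\right] \\
  &= t(1-t)^2 + (1-t)t^2
   + \left[-2t(1-t) + 2t(1-t)\right]\epsilon
   + \left[t + (1-t)\right]\epsilon^2 \\
  &= t(1-t)\left[(1-t) + t\right] + \epsilon^2 \\
  &= t(1-t) + \epsilon^2.
\end{align*}
```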

Moreover, other ways of penalizing a specific record in the “37” bin, say by summing up the absolute value of the error term, don’t have this property.
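A quick numerical check of that claim, as a sketch: assume the true chance of rain in a bin is 0.37, try every label from 0 to 1 in steps of 0.01, and see which label gives the smallest average penalty under each scheme. The function names are hypothetical.

```python
def average_squared_penalty(t, q):
    """Average squared penalty when the true rain chance is t and the label is q."""
    return t * (1 - q) ** 2 + (1 - t) * q ** 2


def average_absolute_penalty(t, q):
    """Average absolute penalty when the true rain chance is t and the label is q."""
    return t * abs(1 - q) + (1 - t) * abs(q)


def best_label(penalty, t):
    """Label in {0.00, 0.01, ..., 1.00} with the smallest average penalty."""
    labels = [n / 100 for n in range(101)]
    return min(labels, key=lambda q: penalty(t, q))


t = 0.37
best_squared = best_label(average_squared_penalty, t)    # lands on 0.37
best_absolute = best_label(average_absolute_penalty, t)  # lands on 0.0
```

With the absolute value penalty, the best strategy for any chance below 50% is to always say “no rain,” which is not what we want from a forecast.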

***

The above has nothing to do with “bin 37,” of course. I could have chosen any bin. To compare two forecasting models, then, we add up all the squared error terms of all the forecasts over a fixed time period.

Note that any model that ever spits out “37” is going to get some error no matter what. Or in other words, a model that wants to be closer to MNF would minimize the number of forecasts to put into the “37” bin and try hard to put forecasts into either the “0” bin or the “100” bin, assuming of course that it had confidence in the forecast.

Actually, the worst of all bins – the one the forecast accumulates the most penalty for – is the “50” bin. Putting an hourly forecast into the “50” bin is like giving up – you’re going to get a big penalty no matter what, because again, it’s either going to rain or it isn’t. Said another way, the average penalty t (1-t) for a perfectly labeled bin is maximized at t = 0.5.

But the beauty of the square error penalty is that it also rewards certainty. Another way of saying this is that, if I am a forecast and I want to improve my score, I can either:

  1. make sure the forecasts in each bin are as accurate as possible, or
  2. try to get some of the forecasts in each bin out of their current bins and closer to either 0 or 1.

Either way my total sum of squared errors will go down.

I’m dwelling on this because there’s a forecast out there that we want to make sure is deeply shamed by any self-respecting scoring system. Namely, the forecast that says there’s an n% chance of rain for every hour of every day, where n is chosen to be the average hourly chance of rain. This is a super dumb forecast, and we want to make sure it doesn’t score as well as God or mother nature, and thankfully it doesn’t (even though it’s perfectly accurate within each bin, and it only uses one bin).

Then again, it would be good to make sure Google scores better than the super dumb forecast, which I’d be happy to do if I could actually get my hands on this data.
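I don’t have the real data, but a toy comparison shows how the scoring plays out. Everything below is invented for illustration: 100 hours of which 10 are rainy, mother nature’s forecast, the super dumb forecast, and a hypothetical forecast that is accurate within its bins but less certain than mother nature.

```python
def mean_squared_penalty(forecast_percents, outcomes):
    """Average squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecast_percents) != len(outcomes):
        raise ValueError("need exactly one forecast per outcome")
    total = sum((r - f / 100) ** 2 for f, r in zip(forecast_percents, outcomes))
    return total / len(outcomes)


# Invented outcomes: 10 rainy hours followed by 90 dry hours.
outcomes = [1] * 10 + [0] * 90

mnf = [100 * r for r in outcomes]  # always exactly right
dumb = [10] * 100                  # the average hourly chance, every single hour
sharper = [50] * 20 + [0] * 80     # "50" for 20 hours (10 of them rainy), "0" otherwise

score_mnf = mean_squared_penalty(mnf, outcomes)          # 0
score_sharper = mean_squared_penalty(sharper, outcomes)  # about 0.05
score_dumb = mean_squared_penalty(dumb, outcomes)        # about 0.09
```

All three forecasts are accurate within their bins, but the penalty still ranks them by how much they commit: mother nature first, the sharper forecast second, the super dumb forecast last.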

***

One last thing. This entire conversation assumed that the geographic location is fixed at 116th and Broadway. In general, forecasts are made over some larger land mass, and that fact affects the precipitation forecast. Specifically, if there’s an 80% chance that precipitation will occur over half the land mass and a 0% chance it will occur over the other half for a specific time window, the forecast will read 40%. This is something like the chance that an average person in that land mass will experience precipitation, assuming they don’t move and that people are equidistributed over the land mass.
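The arithmetic behind that 40% is just an area-weighted average. A tiny sketch, with a hypothetical function name and the region fractions assumed to sum to 1:

```python
def area_average_forecast(regions):
    """Area-weighted chance of precipitation, in percent.

    regions is a list of (area_fraction, chance_percent) pairs whose
    fractions are assumed to sum to 1.
    """
    return sum(fraction * chance for fraction, chance in regions)


# Half the land mass at 80%, the other half at 0%.
blended = area_average_forecast([(0.5, 80), (0.5, 0)])  # 40.0
```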

Then again with the proliferation of apps that intend to give forecasts for pinpointed locations, this old-fashioned forecasting method will probably be gone soon.

Categories: Uncategorized

My favorite scams of 2015

Am I the only person who’s noticed just a whole lot of scams recently? I blame it on our global supply chain that’s entirely opaque and impenetrable to the outsider. We have no idea how things are made, what they’re made with, or how the end results get shipped around the world.

Seriously, anything goes. And that’s probably not going to change. The question is, will scams proliferate, or will we figure out an authentication system?

Who knows. For now, let’s just remind ourselves of a few recent examples (and please provide more in the comments if you think of any!).

  1. VW’s cheating emissions scandal, obviously. That’s the biggest scam that came out this year, and it happened to one of the biggest car companies in the world. We’re still learning how it went down, but clearly lots of people were in on it. What’s amazing to me is that no whistleblower did anything; we learned about it from road tests by an external group. Good for them.
  2. Fake artisanal chocolate from Brooklyn. The Amish-looking hipsters just melted chocolate they bought. I mean, the actual story is a bit more complicated and you should read it, but it just goes to show you how much marketing plays a part in this stuff. But expert chocolate lovers could tell the difference, which is kind of nice to know.
  3. Fake bamboo products at Bed, Bugs, & Beyond. I call it that because whenever one of my friends gets bedbugs (it happens periodically in NYC) I go with them to B, B & B for new sheets and pillows. It’s fun. Anyhoo, they were pretending to sell bamboo products but they were actually made from rayon. And before you tell me that rayon is made from plant cellulose, which it is, let me explain that the chemical process that turns plant cellulose into rayon (which includes a step called extruding) is way more involved and harmful to the environment than simply grabbing bamboo fibers. That’s why people pay more for bamboo products, because they think they’re having less environmental impact.
  4. We eat horsemeat all the fucking time, including in 2015. This is a recurring story (I’m looking at you, Ikea) but yes, it also happened in 2015.
  5. And last but not least, my favorite scam of 2015, a yarn distributor called Trendsetter Yarns was discovered to be selling Mimi, a yarn from Lotus Yarns in China, which was labeled as “100% mink” when it was in fact an angora mix with – and this is the most outrageous part – 17% nylon fibers mixed in!!! As you can imagine, the fiber community was thrown into a tizzy when this came out; we yarn snobs turn up our noses at nylon. The story is that a woman who is allergic to angora, and who had bought the “100% mink” yarn specifically so that she’d have no reaction, did, and got suspicious, and sent it to a lab. Bingo, baby.
notmink

This skein might look 100% mink, but it’s not.

Categories: Uncategorized

Star Wars Christmas Special

Look, I don’t smoke pot. I’m allergic to it or something, it’s not a principle or anything. But sometimes I wish I did, because sometimes I find an activity that’s so perfect for the state of being high that I am deeply jealous of the people who can achieve it.

That happened yesterday, when my teenagers introduced me to the Star Wars Christmas Special, which is a truly extraordinary feature length movie, and is really a perfect stoner flick.

I’m really not giving anything away by telling you that there’s a lot of scenes involving Chewbacca’s family, hoping he makes it home in time for “Life Day.” Each of those scenes is inexplicably long and devoid of subtitles.

In fact, it’s really not a stretch to say that every scene in the entire movie is inexplicably long. But that’s perfect for high folks, who are known to drive at 15 miles an hour on the highway and worry they’re speeding.

For those of you who are not high: I suggest you skip this one. I watched it because I’m a huge Star Wars nerd, but even I couldn’t remember why while I was doing it, except that I like hanging out with teenagers rolling on the rug in laughter because it’s so bad it’s good.

According to my kids, George Lucas himself said about this film that if he “had enough time, he’d track down every copy of this film and destroy it.” You have been warned.

Categories: Uncategorized