Datadive update


October 16, 2011 Cathy O'Neil, mathbabe

I left my datadive team at 9:15pm last night hard at work, visualizing the data in various ways as well as finding interesting inconsistencies. I will try to post some actual results later, but I want to wait for them to be (somewhat) finalized. For now I can make some observations.

First, I really can’t believe how cool it is to meet all of these friendly and hard-working nerds who volunteered their entire weekend to clean and dig through data. It’s a really amazing group and I’m proud of how much they’ve done.
Second, about half of the data scientists are women. Awesome and unusual to see so many nerd women outside of academia!
Third, data cleaning is hard work and is a huge part of the job of a data scientist. I should never forget that. Having said that, though, we might want to spend some time before the next datadive pre-cleaning and formatting the data so that people have more time to jump into the analytics. As it is we learned a lot about data cleaning as a group, but next time we could learn a lot about comparing methodology.
Statistical software packages such as Stata have more trouble with large (250MB) files than Python does, probably because of the way they load everything into memory at once. So it’s cool that everyone comes to a datadive with their own laptop and language, but some thought should be put into which project each person works on, given what their tools can handle.
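To make the memory point concrete, here is a minimal sketch of reading a large file one row at a time in Python instead of loading it all at once. It assumes the data is a CSV with a header row; the file name `stops.csv` and the column name `race` are hypothetical stand-ins, not the actual layout of the datadive files.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Tally the values of one column while reading rows one at a time.

    `lines` is any iterable of text lines, such as an open file, so only
    the current row and the running tally are held in memory.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Usage with a file on disk (hypothetical file and column names):
# with open("stops.csv", newline="") as f:
#     print(count_by_column(f, "race"))
```

Because the file is streamed, memory use depends on the number of distinct values being counted rather than on the size of the file.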
We read Gelman, Fagan and Kiss’s article about using the Stop and Frisk data to understand racial profiling, with the idea that we could test it out on more data or modify their methodology to slightly change the goal. However, they used crime statistics data that we don’t have and can’t find and which are essential to a good study.
As an example of how crucial crime data like this is, if you hear the statement, “10% of the people living in this community are black but 50% of the people stopped and frisked are black,” it sounds pretty damning, but if you add “50% of crimes are committed by blacks” then it sounds less so. We need that data for the purpose of analysis.
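The arithmetic behind that example can be sketched in a few lines. This is only a simple ratio of shares using the made-up percentages above, not the methodology from the Gelman, Fagan and Kiss article, and the function name is my own.

```python
def disparity_ratio(stop_share, benchmark_share):
    """Ratio of a group's share of stops to its share of some benchmark.

    A value near 1 means stops are proportional to the benchmark;
    a value well above 1 means the group is stopped more than the
    benchmark alone would suggest.
    """
    return stop_share / benchmark_share


# Hypothetical shares taken from the example statement, not real data.
population_share = 0.10  # share of residents
stop_share = 0.50        # share of people stopped and frisked
crime_share = 0.50       # share of crimes

vs_population = disparity_ratio(stop_share, population_share)
vs_crime = disparity_ratio(stop_share, crime_share)

print(f"relative to population: {vs_population:.1f}")  # prints 5.0
print(f"relative to crime:      {vs_crime:.1f}")       # prints 1.0
```

The same stop numbers give a ratio of 5 against one benchmark and 1 against the other, which is why the choice of benchmark, and the crime data behind it, matters so much.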
Why is crime statistics data so hard to find? If you go to NYPD’s site and search for crime statistics, you get very little information, and what there is isn’t broken down by area (never mind x and y coordinates) or ethnicity. That stuff should be publicly available. In any case it’s interesting that the Stop and Frisk data is publicly available but the crime stats data isn’t.
Oh my god check out our wiki, I just looked and I’m seeing some pretty amazing graphics. I saw some prototypes last night and I happen to know that some of these visualizations are actually movies, showing trends over time. Very cool!
One last observation: this is just the beginning. The data is out there, the wiki is set up, and lots of these guys want to continue their work after this weekend is over. That’s what I’m talking about.

The final presentation is this morning, and I can’t wait to see what all the teams came up with. Go, Data Without Borders!

Categories: data science, open source tools, statistics

Comments (3)

Angela Zadi

October 20, 2011 at 3:18 pm

From “Up Against the Wall” to Up In Their Faces
STOP, STOP & FRISK!
On October 21st at 1 pm be at the State Office Building in Harlem as:
Cornel West, Professor, Author, Public Intellectual
Carl Dix, Revolutionary Communist Party
Rev. Stephen Phelps, Interim Senior Minister of Riverside Church
Rev. Earl Kooperkamp, Rector of St. Mary’s Episcopal Church
Debra Sweet, National Director of World Can’t Wait
Rev. Omar Wilks, Union Pentecostal Church
Prof. Jim Vrettos, John Jay College of Criminal Justice
Elaine Brower, Military Mom and World Can’t Wait
Commit Non-Violent Civil Disobedience to STOP “Stop & Frisk”
The New York Police Department is on pace to “Stop & Frisk” over 700,000 people in 2011! That’s more than 1,900 people each day. More than 85% of those stopped are Black or Latino, many are as young as 11 or 12, and more than 90% of them were doing nothing wrong when the police stopped, humiliated, brutalized them or worse.
Everyone knows it is wrong. It is illegal, racist, unconstitutional and intolerable! But THIS FRIDAY people are putting themselves on the line to STOP IT. This is the beginning; this is serious; we won’t stop until Stop & Frisk is ended.
Join the non-violent civil disobedience – OR – BE THERE TO BEAR WITNESS & SUPPORT!
WEAR BLACK
Friday, October 21
1pm Rally at Harlem State Office Building
1:30 March to NYPD 28th Precinct at West 123rd and Frederick Douglass Boulevard
Endorsed by:
Rev. Luis Barrios, John Jay College of Criminal Justice
Herb Boyd, journalist, author, Harlem NY
Eve Ensler, Tony Award winning Playwright, Creator of VDay
Brian Figueroux, Esq.
Chris Hedges, Pulitzer Prize winning journalist
Nicholas Heyward, Father of Nicholas Heyward, Jr. who was killed by police
Sikivu Hutchinson, author
Lawrence Lucas, Our Lady of Lourdes RC Church
Cynthia McKinney, former Congressperson
Efia Nwangaza, Malcolm X Center, Greenville, SC
Bill Quigley, Loyola Law New Orleans
Michael Ratner, President Emeritus Center for Constitutional Rights
Mark Lewis Taylor, Princeton University
Sunsara Taylor, writer Revolution Newspaper and World Can’t Wait Advisory Board
The Stop Mass Incarceration Network: PO Box 941, New York, NY 10002
stopmassincarceration@ymail.com * 973.756.7666 * stopmassincarceration.tumblr.com

jenfns

October 21, 2011 at 9:35 am

My mom is a crime data analyst in a Bay Area suburb. I’ll ask her about data availability. I do know that for public records requests, she has to pull and clean the data for the lawyers.

Cathy O'Neil, mathbabe

October 21, 2011 at 9:38 am

Awesome, thanks!
