Stockholm Tutorial on Data Science

October 5, 2015

I’m super excited to be teaching a day-long technical tutorial in data science in Stockholm in one month. Stockholm is gorgeous and Sweden is an amazing country. Last time I was there with the entire family, the husband was giving talks the whole time and the little guy had an ear infection, so it was kind of a bust (although not entirely; I did become the bus queen of Stockholm). This time I’m going alone. Cheese fondue and meatballs will be eaten.

Here’s the flier for the event:


According to my calculation, 1000 SEK is equivalent to $120. That’s with coffee and lunch though, so I feel like as long as I explain k-nearest neighbors we’re good. Also, this is a draft of the flier. I told them to change the “prior knowledge” to be less focused on statistics. After all, data scientists are not all stats majors.

So far around 20 people have signed up, mostly affiliated with Statistics Sweden, the Swedish government agency responsible for producing official statistics regarding Sweden, established in 1749. This means I’ll be addressing the important question: what’s the difference between statistics and data science?

Well, it’s kind of hard to answer that question abstractly. I need to supply examples of realistic “found data” which we use in data science. So that’s my plan for the day: to create a few IPython notebooks with examples of the kind of data and algorithmic techniques that you’d typically find in nature. I think once these statisticians see those examples they will be comfortable knowing how much better off they are in Sweden measuring the inflation rate (currently at -0.2%) than we are trying to understand whether people like specific brand names by scouring Twitter.

However! I’m totally not above stealing other people’s examples to make my points, so if you know of a nice example or two that involves scraping (or API’ing), cleaning, and algorithmizing, and especially if it does all this in Python, then please make suggestions. Otherwise I’ll look up some topics in my book and try to do it myself.

[Update: Holy crap, look at this repository of IPython notebooks which explain data science stuff! Amazing.]

I definitely want to spend at least some time showing the audience how much the answer can depend on seemingly benign choices of hyperparameters and so on. If I end up with good examples I’ll be sure to share them here.
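To make that concrete, here is a toy sketch of the kind of thing I mean, using k-nearest neighbors. Everything in it is made up for illustration: the eight training points, the labels "A" and "B", and the `knn_predict` helper are all assumptions, not anything from the tutorial materials. The only thing that changes between runs is k.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of ((x, y), label) pairs. Hypothetical helper,
    written from scratch for this illustration.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up data: one "A" right next to the query, a ring of "B" points
# at distance 1, and a ring of "A" points at distance 2.
train = [
    ((0.1, 0.0), "A"),
    ((1.0, 0.0), "B"), ((0.0, 1.0), "B"), ((-1.0, 0.0), "B"),
    ((2.0, 0.0), "A"), ((0.0, 2.0), "A"), ((-2.0, 0.0), "A"), ((0.0, -2.0), "A"),
]
query = (0.0, 0.0)

# Same data, same query; only the hyperparameter k varies.
results = {k: knn_predict(train, query, k) for k in (1, 3, 5, 7)}
for k, label in results.items():
    print(f"k={k}: predicted {label}")
```

With these particular points the predicted label goes A, B, B, A as k goes 1, 3, 5, 7, so the "answer" flips twice without a single data point changing. Real data is rarely this tidy, but the same effect shows up whenever a query sits near a boundary.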

Beyond my tutorial, I’m also giving a keynote talk at an associated conference taking place at the very nice Hotel Sign, which is also where they’re putting me up.

  1. October 5, 2015 at 7:08 am

    Have fun in Sweden!
    I was just wondering what happened to the original computer adage “Garbage in, garbage out”.


  2. G.
    October 5, 2015 at 7:20 am

    If only I had been able to travel home to Stockholm for this! I’ve shared it on FB, so hopefully some interested friends will attend.


  3. October 13, 2015 at 12:44 pm

    Wish I was going to be in Sweden at that time. It’d be rad if it were recorded and made available somehow. Perhaps some online ed site would be a good fit for the material.


