The pandas module and the IPython notebook

September 15, 2011

Last night I attended a Meetup on a cool package that Wes McKinney has been writing (in Python and in Cython, which I guess is like Python but is as fast as C). That guy has been ridiculously prolific in his code, and we can all thank him for it, because pandas looks really useful.

To sum up what he’s done: he’s imported the concept of the R data frame into Python, with SQL query-like capabilities as well, and potentially with some map-reduce functionality, although he hasn’t tested it on huge data. He’s also in the process of adding “statsmodels” functionality to the data frame context (the class is called DataFrame, and a single column of one is a Series), with more to come soon, he’s assured us.
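To make the data frame idea concrete, here is a minimal sketch of my own, not code from the talk. The table, tickers, and prices are made up, and the SQL in the comments is only a rough analogy for what each line does.

```python
import pandas as pd

# Hypothetical price table; the tickers and values are made up for illustration.
prices = pd.DataFrame(
    {
        "ticker": ["AAA", "AAA", "BBB", "BBB"],
        "close": [10.0, 11.0, 20.0, 19.0],
    }
)

# Selecting a single column of a DataFrame gives back a Series.
close = prices["close"]

# Roughly: SELECT * FROM prices WHERE close > 10.5
filtered = prices[prices["close"] > 10.5]

# Roughly: SELECT ticker, AVG(close) FROM prices GROUP BY ticker
mean_close = prices.groupby("ticker")["close"].mean()
```

Boolean indexing plays the role of a WHERE clause here, and groupby followed by an aggregation plays the role of GROUP BY.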

So, for example, he demonstrated how quickly one could regress various stocks against each other, and if we had columns of months and years (so actually hierarchical labels on the data), then we could use a “groupby” statement to regress within each month and year. Very cool!
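Here is a rough sketch of what a per-month regression with groupby might look like. This is my own reconstruction, not the code from the demo: the returns are synthetic, the names stock_a, stock_b, and fit_line are hypothetical, and I use numpy's polyfit for the least-squares fit rather than whatever regression routine was shown.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for two hypothetical stocks. stock_b is built as
# 2 * stock_a plus a little noise, so each fitted slope should land near 2.
rng = np.random.default_rng(0)
dates = pd.date_range("2011-01-03", periods=120, freq="B")
stock_a = rng.normal(0.0, 0.01, size=len(dates))
stock_b = 2.0 * stock_a + rng.normal(0.0, 0.001, size=len(dates))
returns = pd.DataFrame({"stock_a": stock_a, "stock_b": stock_b}, index=dates)


def fit_line(group):
    # Ordinary least squares of stock_b on stock_a within one group.
    slope, intercept = np.polyfit(group["stock_a"], group["stock_b"], deg=1)
    return pd.Series({"slope": slope, "intercept": intercept})


# Group by (year, month) taken from the date index, then fit within each group.
by_month = returns.groupby([returns.index.year, returns.index.month])
fits = by_month.apply(fit_line)
```

The result, fits, has one row per (year, month) pair, which is the hierarchical labeling mentioned above, with the slope and intercept as columns.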

He demonstrated all of this within his IPython Notebook, which seems to offer a lot of what I liked when I learned about Elastic-R (though not everything; the cloud computing part of Elastic-R is just awesome), namely the ability to share your Python session with someone as a URL, like a website, and to collaborate. Note that I only saw the demo, so I can’t speak from personal experience, but hopefully I will be able to soon! It’s a cool way to use a powerful machine remotely without worrying about your local setup.

  1. September 15, 2011 at 7:53 am

    Just an aside: Cython is a language which allows you to mix Python and C, using a syntax which is basically the same as Python. You can sometimes use it to speed up Python code, but much of the time people use it to wrap a C library for use within Python. William Stein et al.’s Sage project uses Cython extensively. Sage also has a notebook interface, which you can use with unmodified Python if you want. I haven’t tried the IPython notebook, so I can’t compare.


  2. Aaron Schumacher
    September 15, 2011 at 12:20 pm

    Why is it called “pandas”?


  3. September 15, 2011 at 7:54 pm

    The name “pandas” came from pan-el da-ta (http://en.wikipedia.org/wiki/Panel_data) =)


    • September 15, 2011 at 8:32 pm

      Nice! Wes himself!


    • Aaron Schumacher
      September 15, 2011 at 9:46 pm

      Thanks! I wanted to check out your talk in person but it said it was full. 😦


  4. Thomas Kluyver
    November 16, 2011 at 8:51 am

    Technical note: a Series in pandas is a column in a data frame – the data frame class itself is DataFrame.

