The pandas module and the IPython notebook
Last night I attended a Meetup on a cool package called pandas that Wes McKinney has been writing (in Python and in Cython, which as I understand it is a Python-like language that compiles to C and so runs about as fast as C). That guy has been ridiculously prolific in his code, and we can all thank him for it, because pandas looks really useful.
To sum up what he’s done: he’s imported the concept of the R data frame into Python, with SQL query-like capabilities as well, and potentially with some map-reduce functionality, although he hasn’t tested it on huge data. He’s also in the process of hooking “statsmodels” functionality into the data frame context (in pandas the table itself is a DataFrame, and a single labeled column of it is a Series), with more to come soon, he’s assured us.
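To make the data frame and SQL-like ideas a bit more concrete, here is a minimal sketch of my own, not code from the talk. The tickers, prices, and sector table are made-up toy data, and the variable names are just for illustration. It shows a filter (like WHERE), a join, and a grouped aggregate (like GROUP BY).

```python
import pandas as pd

# Hypothetical toy data, invented for this example (not from the talk).
prices = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "date": pd.to_datetime(
        ["2011-01-03", "2011-01-04", "2011-01-03", "2011-01-04"]
    ),
    "close": [10.0, 11.0, 20.0, 19.0],
})
sectors = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "sector": ["tech", "energy"],
})

# Pulling one column out of a DataFrame gives a Series.
closes = prices["close"]

# Roughly: SELECT * FROM prices WHERE close > 10.5
expensive = prices[prices["close"] > 10.5]

# Roughly: prices LEFT JOIN sectors USING (ticker)
joined = prices.merge(sectors, on="ticker", how="left")

# Roughly: SELECT sector, AVG(close) FROM joined GROUP BY sector
mean_close = joined.groupby("sector")["close"].mean()

print(expensive)
print(mean_close)
```

The boolean mask plays the role of the WHERE clause, and `merge` and `groupby` cover the join and aggregate cases.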
So for example he demonstrated how quickly one could regress various stocks against each other, and if the data carry month and year labels (so really hierarchical labels on the data), then you can use a “groupby” statement to run the regression within each month of each year. Very cool!
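Here is a rough sketch of what a per-month regression with groupby could look like. This is my own reconstruction, not the code from the demo: the two return series are simulated, the "true" slopes are numbers I picked, and I use `np.polyfit` for the least-squares fit as a simple stand-in for whatever regression routine was actually shown.

```python
import numpy as np
import pandas as pd

# Simulated daily returns for two hypothetical stocks, "a" and "b".
# The data and the slopes below are made up for illustration.
rng = np.random.default_rng(0)
dates = pd.date_range("2011-01-01", "2011-03-31", freq="B")
ret_a = rng.normal(0.0, 0.01, size=len(dates))

# Give each month a different slope so the grouped fit has something to find.
true_beta = np.where(dates.month == 1, 0.5,
                     np.where(dates.month == 2, 1.0, 1.5))
ret_b = true_beta * ret_a + rng.normal(0.0, 0.001, size=len(dates))

returns = pd.DataFrame({"a": ret_a, "b": ret_b}, index=dates)


def slope(group):
    """Least-squares slope of b on a within one group."""
    # A degree-1 polyfit returns [slope, intercept].
    beta, _intercept = np.polyfit(
        group["a"].to_numpy(), group["b"].to_numpy(), 1
    )
    return beta


# Group by (year, month) taken from the date index, then fit within each group.
betas = returns.groupby(
    [returns.index.year, returns.index.month]
).apply(slope)
betas.index.names = ["year", "month"]

print(betas)
```

The result is a Series indexed by (year, month), which is the hierarchical labeling mentioned above: one fitted slope per month of each year.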
He demonstrated all of this within his IPython Notebook, which seems to offer a lot of what I liked when I learned about Elastic-R (though not all of it; the cloud computing part of Elastic-R is just awesome), namely the ability to basically share your Python session with someone as a website URL and to collaborate. Note that I just saw the demo, so I can’t speak from personal experience, but hopefully I will be able to soon! It’s a cool way to remotely use a powerful machine and not need to worry about your local setup.