
What’s Mahout?

March 29, 2012

Mahout is an Apache project, which means it’s open source software.

Specifically, Mahout is a library of machine learning algorithms that are (typically) map-reducible and implemented on a map-reduce framework (Hadoop), which means you can run them either in the cloud or on your own distributed cluster of machines.
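To make “map-reducible” a bit more concrete: a computation qualifies when it can be split into independent per-chunk work (the map step) whose partial results can be merged in any order (the reduce step). Here is a minimal pure-Python sketch of that idea, assuming the data has already been split into chunks the way Hadoop splits input across machines. It uses no Hadoop or Mahout API, and the ratings are made up for illustration.

```python
from functools import reduce


def map_chunk(chunk):
    # Map step: each worker summarizes only its own chunk, independently.
    return (sum(chunk), len(chunk))


def combine(a, b):
    # Reduce step: (sum, count) pairs merge associatively, so the order
    # in which workers finish does not change the result.
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(chunks):
    # On a real cluster these map calls would run in parallel.
    partials = [map_chunk(c) for c in chunks]
    total, count = reduce(combine, partials, (0, 0))
    return total / count


# Hypothetical ratings spread across three "machines".
chunks = [[4, 5, 3], [2, 4], [5, 5, 1, 3]]
print(distributed_mean(chunks))
```

The map step emits (sum, count) rather than a per-chunk mean on purpose: averaging the three chunk means (4, 3 and 3.5) gives 3.5, while the true mean of all nine ratings is 32/9. Choosing partial results that combine correctly is most of the work of making an algorithm map-reducible.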

So in other words, if you have a massive amount of data up in the cloud, and you want to apply some machine-learning algorithm to your data, then you may want to consider using Mahout.

Yesterday, by reading Mahout in Action, I learned about a recommendation algorithm and how to map-reduce it (i.e., make it as fast as I want by distributing the work across many machines). The cool thing is that it’s already implemented and optimized, which is good, because there’s a big difference between thinking I know how to map-reduce something and actually making it fast.
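For a flavor of what map-reducing a recommender can look like, here is a small in-memory sketch of an item co-occurrence recommender. It is a simplified illustration under my own assumptions, not Mahout’s implementation or API: the map step emits every pair of movies a single user rated together, the reduce step sums those pairs into a co-occurrence matrix, and a user’s unseen movies are scored by weighting co-occurrence counts with that user’s ratings. The shuffle that Hadoop performs between map and reduce is simulated by just iterating over all emitted pairs, and the ratings data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical ratings: user -> {movie: rating}
ratings = {
    "alice": {"Alien": 5, "Brazil": 3, "Casablanca": 4},
    "bob": {"Alien": 4, "Brazil": 5},
    "carol": {"Brazil": 4, "Casablanca": 5, "Dune": 4},
    "dave": {"Alien": 5, "Dune": 3},
}


def map_user(prefs):
    # Map step: for one user, emit each ordered pair of movies they both rated.
    for a, b in permutations(prefs, 2):
        yield (a, b), 1


def reduce_counts(pairs):
    # Reduce step: sum counts per (movie, movie) key -> co-occurrence matrix.
    cooc = defaultdict(int)
    for key, count in pairs:
        cooc[key] += count
    return cooc


def recommend(prefs, cooc, top_n=2):
    # Score each unseen movie b by sum over rated movies a of
    # cooc[a, b] * rating[a], then return the highest-scoring movies.
    scores = defaultdict(float)
    for (a, b), count in cooc.items():
        if a in prefs and b not in prefs:
            scores[b] += count * prefs[a]
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]


# Every user's map output is independent, so this could be spread across machines.
pairs = (kv for prefs in ratings.values() for kv in map_user(prefs))
cooc = reduce_counts(pairs)
print(recommend(ratings["bob"], cooc))
```

With this toy data, bob (who rated Alien and Brazil) gets Casablanca with a score of 14 and Dune with a score of 9, because Casablanca co-occurs more often with the movies he rated highly.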

So if Netflix ever fails, but their data is miraculously left intact, they can send me and a few other nerds in as a kind of data scientist rescue squad, and we can help figure out how to rebuild their recommendations of new movies based on what people have already watched and rated.

If that ever really happens, I hope we’d get t-shirts that say “data scientist rescue squad” on the back.

Update: a mahout is also someone who drives an elephant. And Mahout drives Hadoop, which is named after Doug Cutting’s son’s toy elephant. Doug is the guy who started Hadoop at Yahoo!, but now he’s at Cloudera.

Categories: data science
  1. wanderinginpr
    March 29, 2012 at 7:49 am

    Yeah, a neon t-shirt and perhaps a jazzy little tune to go with it! I can just hear Ghostbusters right now. 🙂


    • March 29, 2012 at 8:37 am

      Oooh, Ghostbusters-style is a great idea! Now I’m thinking we will need our own van (with data science logo) to go with the theme music.


  2. March 29, 2012 at 9:42 am

    Additionally the book Mahout in Action makes for a pretty good introductory textbook on machine learning. I could imagine building a class around it. Also, contrary to many people’s first instinct, “Mahout” rhymes with “trout” and not “boot”, and Gene Kelly was an early proponent.


  3. March 30, 2012 at 12:49 pm

    Love the t-shirts bit! [https://twitter.com/#!/drjerrynyc/status/185769357366923265] Thanks too for the implicit recommendation of Mahout in Action.


  4. April 4, 2012 at 4:39 pm

    Has anyone tried Radoop?


