Specifically, Mahout consists of machine learning algorithms that are (typically) map-reducible, and implemented in a map-reduce framework (Hadoop), which means you can use them either in the cloud or on your own personal distributed cluster of machines.
In other words, if you have a massive amount of data up in the cloud, and you want to apply some machine-learning algorithm to it, then you may want to consider using Mahout.
Yesterday I learned about a recommendation algorithm and how to map-reduce it (i.e. make it as fast as I want by distributing the work on many machines) by reading Mahout in Action. And the cool thing is that it’s already implemented and optimized, which is good because there’s a big difference between thinking I know how to map-reduce something and making it fast.
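To make "map-reducing a recommender" a bit more concrete, here is a toy sketch in Python. It is not Mahout's actual (Java) code, and I'm assuming an item co-occurrence style of recommender, which may not be exactly the one in the book. The ratings are made up, and `run_map_reduce` is a hypothetical single-process stand-in for what Hadoop would do across many machines.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy ratings (user, movie, rating); not real data.
RATINGS = [
    ("alice", "A", 5.0), ("alice", "B", 3.0),
    ("bob", "A", 4.0), ("bob", "B", 4.0), ("bob", "C", 2.0),
    ("carol", "B", 5.0), ("carol", "C", 4.0),
]


def run_map_reduce(records, mapper, reducer):
    """Single-process stand-in for one job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Job 1: collect each user's ratings into one record per user.
def map_by_user(record):
    user, movie, rating = record
    yield user, (movie, rating)


def reduce_user_prefs(user, movie_ratings):
    yield user, dict(movie_ratings)


# Job 2: count how many users rated each ordered pair of movies.
def map_pairs(record):
    _user, prefs = record
    for pair in permutations(sorted(prefs), 2):
        yield pair, 1


def reduce_count(pair, ones):
    yield pair, sum(ones)


user_prefs = dict(run_map_reduce(RATINGS, map_by_user, reduce_user_prefs))
cooc = dict(run_map_reduce(user_prefs.items(), map_pairs, reduce_count))


def recommend(prefs, cooc):
    """Score unseen movies by co-occurrence count times the user's rating.

    On a real cluster this step would be another job (a join); here the
    co-occurrence counts are simply held in memory.
    """
    scores = defaultdict(float)
    for (seen, other), count in cooc.items():
        if seen in prefs and other not in prefs:
            scores[other] += count * prefs[seen]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend(user_prefs["alice"], cooc))  # [('C', 11.0)]
```

The point of splitting things up this way is that each mapper only looks at one record and each reducer only looks at one key, so the work can be spread over as many machines as you have. Doing that efficiently at scale is the hard part that Mahout has already done.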
So if Netflix ever fails, but their data is miraculously left intact, they can send me and a few other nerds in as a kind of data scientist rescue squad and we can help figure out how to reassemble the recommendations of new movies based on what people have already watched and rated.
If that ever really happens, I hope we’d get t-shirts that say “data scientist rescue squad” on the back.
Update: a mahout is also someone who drives an elephant. And Mahout drives Hadoop, which is the name of Doug Cutting’s son’s toy elephant. Doug is the guy who started Hadoop at Yahoo! but now he’s at Cloudera.