Apache Mahout is the Hadoop service in charge of what is often called “data science”. Mahout is all about machine learning algorithms, pattern recognition and the like. Interestingly, newer Mahout releases replaced MapReduce with Spark as the execution engine under the hood.
Mahout is in charge of the following tasks:
- Machine Learning. Learning from existing data.
- Recommendation Mining. This is what we often see on websites. Remember the “You bought X, you might be interested in Y” suggestion? This is exactly what Mahout can do for you.
- Clustering. Mahout can group documents and other data by their similarities.
- Classification. Learning from data that has already been classified in order to assign new items to the right category.
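The recommendation idea above can be illustrated without Mahout at all. The following is a minimal, dependency-free sketch, not Mahout's API: the class `CooccurrenceSketch`, the method `alsoBought` and the sample purchase data are hypothetical. It counts how often each other item shows up in the histories of users who bought a given item and ranks the candidates by that count, which is the simplest form of item co-occurrence recommendation.

```java
import java.util.*;

/**
 * Hypothetical, dependency-free sketch of "you bought X, you might be interested in Y".
 * This is NOT Mahout's API: it simply counts how often two items appear in the
 * same user's purchase history and ranks candidate items by that count.
 */
public class CooccurrenceSketch {

    /** Returns up to k items most often bought together with {@code item}, excluding the item itself. */
    public static List<String> alsoBought(Map<String, Set<String>> purchases, String item, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> basket : purchases.values()) {
            if (!basket.contains(item)) continue;           // only users who bought X matter
            for (String other : basket) {
                if (!other.equals(item)) counts.merge(other, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // highest count first
            return byCount != 0 ? byCount : a.compareTo(b);              // alphabetical tie-break
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        // Hypothetical purchase histories: user -> items bought
        Map<String, Set<String>> purchases = Map.of(
            "alice", Set.of("book", "lamp", "pen"),
            "bob",   Set.of("book", "pen"),
            "carol", Set.of("book", "mug"),
            "dave",  Set.of("lamp", "mug"));
        System.out.println(alsoBought(purchases, "book", 2)); // prints [pen, lamp]
    }
}
```

Raw co-occurrence counts tend to favour popular items; real recommenders, Mahout's included, usually replace them with a similarity measure such as log-likelihood or Pearson correlation.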
Mahout programs are typically written in Java. The next listing shows how a recommender, created by a RecommenderBuilder, is evaluated against existing preference data.
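// Reads userID,itemID,preference lines from a delimited text file
DataModel model = new FileDataModel(new File("/home/var/mydata.xml"));
RecommenderEvaluator eval = new AverageAbsoluteDifferenceRecommenderEvaluator();
// MyRecommenderBuilder is the application's own RecommenderBuilder implementation
RecommenderBuilder builder = new MyRecommenderBuilder();
// null = default DataModelBuilder; train on 90% of each user's preferences, evaluate all users
double res = eval.evaluate(builder, null, model, 0.9, 1.0);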
A Mahout program
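The value `res` in the listing is the score produced by `AverageAbsoluteDifferenceRecommenderEvaluator`: roughly, the average absolute difference between the preferences the recommender estimates and the preferences users actually expressed in the withheld portion of the data, so lower is better and 0.0 is perfect. The sketch below recomputes that metric by hand in plain Java. `MeanAbsoluteDifference`, `score` and the sample ratings are hypothetical, and the sketch leaves out how Mahout splits the data and how it treats preferences it cannot estimate.

```java
/**
 * Hypothetical sketch of the metric behind AverageAbsoluteDifferenceRecommenderEvaluator:
 * the mean absolute difference between actual (held-out) and estimated preferences.
 * Lower is better; 0.0 means every estimate was exact.
 */
public class MeanAbsoluteDifference {

    public static double score(double[] actual, double[] estimated) {
        if (actual.length != estimated.length || actual.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]); // error on one held-out preference
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        double[] actual    = {4.0, 2.0, 5.0, 3.0}; // ratings withheld from training
        double[] estimated = {3.5, 2.5, 4.0, 3.0}; // what the recommender predicted
        System.out.println(score(actual, estimated)); // prints 0.5
    }
}
```

An average error of 0.5 on a 1 to 5 rating scale means the estimates are, on average, half a star away from what users actually gave.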