Apr 11, 2014

Mahout in Action - KMeans Clustering - Make things work

I wanted to learn Mahout. I first tried to build it from source with Maven, but that turned out to be a bad idea at the time, so I downloaded a release from the official download page instead. I created a Java project in Eclipse and added the jars: both the mahout-*.jar files and the *.jar files in the lib folder of the downloaded distribution. Yes, there were lots of jars! :) Then I wrote a class containing exactly the code shown on page 123 of "Mahout in Action", 2012 edition (Listing 7.2), and ran the program as a Java application.

IT DID NOT WORK!

Fortunately, both Stack Overflow and a few other blogs had solutions.

  1. First of all, the book targets Mahout 0.5 and I was using 0.9. Several places mention that the examples stopped working after 0.6 because of changes in the code. So if you have a distribution newer than 0.5 and you are using the book I referred to, you have probably faced the same problems I did.
  2. Change the Class name "Cluster" to "Kluster". 
  3. Change the call KMeansDriver.run(conf, new Path("input/points"), new Path("input/clusters"), new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10, true, false); to KMeansDriver.run(new Path("input/points"), new Path("input/clusters"), new Path("output"), convergenceDelta, maxIterations, runClustering, clusterClassificationThreshold, runSequential); where convergenceDelta = 0.001, maxIterations = 10, runClustering = true, clusterClassificationThreshold = 0, runSequential = true.
  4. In SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("output/" + Kluster.CLUSTERED_POINTS_DIR + "/part-m-0000"), conf); change the last string from "/part-m-0000" to "/part-m-0".
  5. Change WeightedVectorWritable to WeightedPropertyVectorWritable.
After making these changes, it worked.
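
For anyone wondering what the convergenceDelta and maxIterations arguments in step 3 actually control, here is a minimal plain-Java sketch of k-means with Euclidean distance. It is not Mahout's implementation and uses no Mahout classes; the class name KMeansSketch, the method names, and the sample points are all my own, chosen only for illustration. The loop stops either when no center moves more than convergenceDelta or when maxIterations is reached, whichever comes first.

```java
import java.util.Arrays;

// Illustrative sketch only: a tiny in-memory k-means, not Mahout code.
public class KMeansSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Index of the center closest to point p.
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (distance(p, centers[c]) < distance(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs k-means starting from the given centers (the rows of centers are
    // replaced in place) and returns the cluster index of every point.
    static int[] cluster(double[][] points, double[][] centers,
                         double convergenceDelta, int maxIterations) {
        int dim = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            // Sum up the points assigned to each center.
            double[][] sums = new double[centers.length][dim];
            int[] counts = new int[centers.length];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            // Move each center to the mean of its points and track the largest move.
            double maxShift = 0.0;
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its old center
                }
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                maxShift = Math.max(maxShift, distance(centers[c], updated));
                centers[c] = updated;
            }
            if (maxShift <= convergenceDelta) {
                break; // converged: no center moved more than the threshold
            }
        }
        // Final pass: classify every point against the final centers.
        int[] assignment = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            assignment[i] = nearest(points[i], centers);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two visually obvious groups.
        double[][] points = {
            {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
            {8, 8}, {9, 8}, {8, 9}, {9, 9}
        };
        // Start from the first two points as initial centers.
        double[][] centers = {{1, 1}, {2, 1}};
        int[] assignment = cluster(points, centers, 0.001, 10);
        System.out.println("assignment: " + Arrays.toString(assignment));
        System.out.println("centers: " + Arrays.deepToString(centers));
    }
}
```

The final classification pass at the end plays roughly the role that runClustering = true plays in the Mahout call: the iterations only move the centers, and a separate step assigns each point to a cluster.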