In a project I'm going to use clustering algorithms implemented in Python, such as k-means.
Clustering
http://stackoverflow.com/questions/1545606/python-k-means-algorithm
scipy.cluster has been reported to have some problems, so for now I'll use PyCluster (following the advice given on Stack Overflow).
Install PyCluster:
pip install PyCluster
The example from Stack Overflow for k-means:
>>> import numpy
>>> import Pycluster
>>> points = numpy.vstack([numpy.random.multivariate_normal(mean,
...                                                         0.03 * numpy.diag([1,1]),
...                                                         20)
...                        for mean in [(1, 1), (2, 4), (3, 2)]])
>>> labels, error, nfound = Pycluster.kcluster(points, 3)
>>> labels  # Cluster number for each point
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
>>> error   # The within-cluster sum of distances for the solution
1.7721661785401261
>>> nfound  # Number of times this solution was found
1
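To get a feel for what the three return values mean, here is a rough pure-Python sketch of k-means with random restarts. It is not Pycluster's implementation: the function names (kmeans, squared_distance, canonical), the npass, max_iter and seed parameters, and the choice of squared Euclidean distance are all my own assumptions for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points (an assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def canonical(labels):
    """Renumber labels by first appearance so equal partitions compare equal."""
    mapping = {}
    return [mapping.setdefault(label, len(mapping)) for label in labels]


def kmeans(points, k, npass=10, max_iter=100, seed=0):
    """Return (labels, error, nfound) for the best of npass random restarts."""
    rng = random.Random(seed)
    best_labels, best_error, nfound = None, float("inf"), 0
    for _ in range(npass):
        # Start each pass from k distinct data points chosen at random.
        centers = rng.sample(points, k)
        labels = None
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest center.
            new_labels = [
                min(range(k), key=lambda c: squared_distance(p, centers[c]))
                for p in points
            ]
            if new_labels == labels:
                break  # Assignments stopped changing, so this pass converged.
            labels = new_labels
            # Update step: move each center to the mean of its members.
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if members:
                    centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
        # Within-cluster sum of (squared) distances for this pass.
        error = sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))
        labels = canonical(labels)
        if error < best_error - 1e-12:
            best_labels, best_error, nfound = labels, error, 1
        elif labels == best_labels:
            nfound += 1  # The same best partition was found again.
    return best_labels, best_error, nfound
```

Calling kmeans(points, 3) on a list of (x, y) tuples gives back one label per point, the error of the best partition, and how many of the restarts ended in that same partition. A small nfound relative to npass suggests the passes are landing in different local optima, so it may be worth running more of them.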
Plotting
http://stackoverflow.com/questions/9847026/plotting-output-of-kmeanspycluster-impl
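As a starting point for plotting, here is a minimal sketch that groups the points by their cluster label and draws one scatter series per cluster. It assumes matplotlib is installed; the helper names group_by_label and plot_clusters and the output file name are made up for this example, and none of this is taken from the linked answer.

```python
from collections import defaultdict


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[int(label)].append(tuple(point))
    return dict(groups)


def plot_clusters(points, labels, path="clusters.png"):
    """Save a scatter plot with one colour per cluster (2-D points only)."""
    # Imported here so group_by_label stays usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # Render to a file; no display needed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, members in sorted(group_by_label(points, labels).items()):
        xs, ys = zip(*members)
        ax.scatter(xs, ys, label="cluster %d" % label)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

With the points array and labels from the k-means example, plot_clusters(points, labels) should write clusters.png with the three groups in different colours.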