Clustering in Python

For a project, I’m going to use clustering algorithms implemented in Python, such as k-means.

Clustering

http://stackoverflow.com/questions/1545606/python-k-means-algorithm

scipy.cluster has been reported to have some problems (see the Stack Overflow question linked above), so for now I’ll use PyCluster, following the advice given there.

Install PyCluster:

pip install PyCluster

The k-means example from Stack Overflow:

>>> import numpy
>>> import Pycluster
>>> # Draw 20 points from a 2-D Gaussian around each of three means
>>> points = numpy.vstack([numpy.random.multivariate_normal(mean,
...                                                         0.03 * numpy.diag([1,1]),
...                                                         20)
...                        for mean in [(1, 1), (2, 4), (3, 2)]])
>>> labels, error, nfound = Pycluster.kcluster(points, 3)
>>> labels  # Cluster number for each point
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
>>> error   # The within-cluster sum of distances for the solution
1.7721661785401261
>>> nfound  # Number of times this solution was found
1
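
To make clearer what a call like kcluster is doing, here is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) using only the standard library. This is not PyCluster's implementation. The names kmeans and sq_dist, the n_iter and seed parameters, and the choice of squared Euclidean distance for the error are my own assumptions for illustration, so the error value need not match what PyCluster reports.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Illustrative Lloyd's algorithm on a list of equal-length tuples.

    Returns (labels, centroids, error), where error is the within-cluster
    sum of squared Euclidean distances (an assumption of this sketch).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    error = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, error
```

Like any k-means run, the result depends on the random starting centroids and may be a local optimum. As far as I understand it, that is why a library would repeat the run several times and report how often the best solution was found (the nfound value above).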

Plotting

http://stackoverflow.com/questions/9847026/plotting-output-of-kmeanspycluster-impl
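
As a starting point for plotting, here is a rough sketch of how the points and labels from the k-means example could be drawn with matplotlib, one colour per cluster. The helper names group_by_label and plot_clusters and the default file name clusters.png are hypothetical, and I'm assuming 2-D points and that matplotlib is installed. This is not the code from the linked question.

```python
from collections import defaultdict


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[int(label)].append(tuple(point))
    return dict(groups)


def plot_clusters(points, labels, filename="clusters.png"):
    """Scatter-plot 2-D points, one colour per cluster, and save to a file."""
    # Imported here so that group_by_label works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # assumption: write to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, members in sorted(group_by_label(points, labels).items()):
        xs, ys = zip(*members)  # split (x, y) pairs into two sequences
        ax.scatter(xs, ys, label="cluster %d" % label)
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)
```

With the arrays from the example above, plot_clusters(points, labels) should write a scatter plot with three colour groups to clusters.png.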
