Resources:
Wikipedia k-means article: http://en.wikipedia.org/wiki/K-means_clustering
Wikipedia hierarchical clustering article: http://en.wikipedia.org/wiki/Hierarchical_clustering
Basic clustering strategies:
 - Partitioning: divide the data points into groups based on a criterion such as closeness (k-means).
 - Hierarchical: start by grouping the closest points and build up clusters based on distance (bottom-up, or agglomerative); there are also top-down (divisive) methods.
 - Model-based: try to fit the best model (e.g., a mixture of Gaussians) to the data.
K-means clustering:
 1. Fix the number of clusters in advance.
 2. Assign each data point to a cluster at random (random partition method).
 3. Recompute the cluster centers as the average (centroid) of the points in each cluster.
 4. Reassign each data point to the cluster with the closest centroid.
 5. Repeat steps 3 and 4 a fixed number of times or until the clusters no longer change.
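These steps can be sketched in a few lines of Python (a toy 1-D version using the random-partition initialization; `kmeans1d` is a hypothetical helper name for illustration — in R you would simply call the built-in kmeans()):

```python
import random

def kmeans1d(points, k, iters=100, seed=0):
    """Toy sketch of the k-means steps above for 1-D points."""
    rng = random.Random(seed)
    # Step 2: random partition -- assign each point to a cluster at random.
    labels = [rng.randrange(k) for _ in points]
    for _ in range(iters):
        # Step 3: recompute each center as the mean of its cluster's points.
        centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster gets a dummy center at infinity and stays empty.
            centers.append(sum(members) / len(members) if members else float("inf"))
        # Step 4: reassign each point to the closest center.
        new = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        if new == labels:          # Step 5: stop once nothing changes.
            break
        labels = new
    return labels, centers

labels, centers = kmeans1d([1, 2, 3, 4, 6, 8], k=2)
```

At convergence every point sits in the cluster whose center is nearest to it, which is exactly the fixed-point condition of steps 3 and 4.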
Subtleties:
 - It is usually not obvious how many clusters to use, so people often try several choices and look for the best result.
 - Because of the random initialization, the results may change each time you run the algorithm. Typically you repeat the clustering many times (say 100) and keep the run with the smallest within-cluster sum of squared distances to the centroids.
 - The k-means problem is NP-hard, so the algorithm is not guaranteed to find a global optimum.
Example 1: Go through the k-means clustering algorithm by hand in one dimension with the points 1, 2, 3, 4, 6, 8, using 2 clusters and two different initial assignments of points.
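For instance, both {1, 2, 3} | {4, 6, 8} (centroids 2 and 6, with the tie at point 4 left in place) and {1, 2, 3, 4} | {6, 8} (centroids 2.5 and 7) are fixed points of steps 3 and 4, so which one you reach depends on the start. A quick Python check of their within-cluster sums of squares (the helper name wcss is made up here):

```python
def wcss(clusters):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    total = 0.0
    for pts in clusters:
        centroid = sum(pts) / len(pts)
        total += sum((p - centroid) ** 2 for p in pts)
    return total

# Two partitions that k-means can converge to, depending on the start:
print(wcss([[1, 2, 3], [4, 6, 8]]))   # centroids 2 and 6   -> 10.0
print(wcss([[1, 2, 3, 4], [6, 8]]))   # centroids 2.5 and 7 -> 7.0
```

The second partition has the smaller objective, which is why one repeats the clustering from several starts and keeps the best run.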
Example 2: Generate two distinct clusters of 50 two-dimensional vectors (Gaussian with mean 0 and Gaussian with mean 1) and plot.

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
plot(x)
points(0, 0, pch = "+", col = "blue", cex = 3)
points(1, 1, pch = "+", col = "green", cex = 3)
Example 3: Cluster the data from Example 2 using k-means clustering with two clusters. Plot the results and look at the return values.

nclusts <- 2
cl <- kmeans(x, centers = nclusts)
plot(x, col = cl$cluster)
points(cl$centers, col = 1:nclusts, pch = "x", cex = 3)
points(0, 0, pch = "+", col = "blue", cex = 3)
points(1, 1, pch = "+", col = "green", cex = 3)
Now adjust the code to try it with 3 clusters and with 4 clusters.
Example 4: Look at the change in within-cluster variance as a function of the number of clusters.

wss <- numeric(15)
wss[1] <- (nrow(x) - 1) * sum(apply(x, 2, var))
for (k in 2:15) wss[k] <- sum(kmeans(x, centers = k)$withinss)
plot(1:15, wss, type = "b", xlab = "Number of Clusters",
     ylab = "Within groups sum of squares")
Hierarchical clustering: uses agglomeration to successively merge clusters.
 1. Initialization: calculate the distance between all pairs of points. The initial cluster list is just the single points. Set the level L(0) = 0 and the step counter m = 0.
 2. Update step:
    - Find the pair of clusters that are closest (depends on which linkage method you choose).
    - Merge them into a new cluster and add it to the cluster list.
    - Increment m and set L(m) to the linkage distance of the merged pair.
    - In the distance matrix, delete the rows and columns corresponding to the merged clusters, and add a row and column for the new cluster.
 3. If more than one cluster remains, repeat step 2.
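A minimal sketch of this loop in Python (1-D points for brevity; `agglomerate` is a hypothetical helper for illustration — in R you would simply call hclust()):

```python
def agglomerate(points, linkage=min):
    """Bottom-up clustering following the steps above, for 1-D points.
    `linkage` combines pairwise point distances: min -> single, max -> complete.
    Returns the merge levels L(1), L(2), ... recorded at each step."""
    clusters = [[p] for p in points]   # initial cluster list: single points
    levels = []                        # linkage distance of each merge

    def dist(a, b):
        return linkage(abs(p - q) for p in a for q in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        levels.append(dist(clusters[i], clusters[j]))
        # Merge the pair and drop the old entries (the delete-rows/columns step).
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return levels

print(agglomerate([1, 2, 3, 4, 6, 8]))   # single-linkage merge levels
```

Recomputing all pairwise distances each round keeps the sketch short; a real implementation updates the distance matrix in place, as the steps above describe.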
Linkage methods:
 - Complete: the distance between clusters A and B is the maximum distance between a point from A and a point from B (favors compact clusters).
 - Single: the distance between clusters A and B is the minimum distance between a point from A and a point from B (friend-of-a-friend; prone to chaining).
 - Average: the distance between clusters A and B is the average of the distances between points in A and points in B.
 - Ward: the distance between clusters A and B is the increase in within-cluster variance when they are merged (favors compact, spherical clusters).
 - Centroid: the distance between clusters A and B is the distance between their centroids.
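To see how the linkages differ, compare the distances they assign between two small made-up clusters A = {0, 1} and B = {4, 7} on the line:

```python
A, B = [0, 1], [4, 7]
pairwise = [abs(a - b) for a in A for b in B]   # |0-4|, |0-7|, |1-4|, |1-7|

single   = min(pairwise)                        # closest pair: 1 and 4 -> 3
complete = max(pairwise)                        # farthest pair: 0 and 7 -> 7
average  = sum(pairwise) / len(pairwise)        # mean of 4, 7, 3, 6 -> 5.0
centroid = abs(sum(A)/len(A) - sum(B)/len(B))   # |0.5 - 5.5| -> 5.0

print(single, complete, average, centroid)
```

The same pair of clusters can thus be "close" under single linkage and "far" under complete linkage, which is why the choice of linkage changes the dendrogram.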
Example 1: Simple example (row numbers are displayed).

hc0 <- hclust(dist(x), "complete")
plot(hc0)
Example 2: Simple example with row names.

names(x) <- as.character(x)
hc1 <- hclust(dist(x), "complete")
plot(hc1)
Example 3: Simple example using single linkage, with the labels hanging down from the baseline.

hc2 <- hclust(dist(x), "single")
plot(hc2, hang = -1)  # hang = -1 draws all labels at the same level
Example 4: More complicated example from the manual (notice the ^2).

hc3 <- hclust(dist(USArrests)^2, "ward.D")  # method was called "ward" in older versions of R
plot(hc3)
