Week 7, Wednesday

Hierarchical Clustering

Hierarchical Clustering often produces clusters very similar to those of K-means; in fact, the outcome can sometimes be identical. However, the procedure itself is quite different. There are two types: agglomerative and divisive. Agglomerative is a bottom-up approach, while divisive works top-down, in the opposite direction. Today, I concentrated mainly on the agglomerative approach.

Step 1: Make each data point into a single point cluster, resulting in N clusters.

Step 2: Combine the two closest data points into one cluster. This produces N-1 clusters.

Step 3: Combine the two closest clusters into one. This results in the formation of N-2 clusters.

Step 4: Repeat Step 3 until there is only one large cluster.

Step 5: Finish.
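
Here is a minimal sketch of these steps in Python. The tiny 2-D dataset is made up for illustration, and SciPy's hierarchical clustering routine does the merging for us rather than being a line-by-line implementation of the steps above.

import numpy as np
from scipy.cluster.hierarchy import linkage

# Step 1: each of the N points starts as its own single-point cluster.
X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0]])

# Steps 2-4: linkage() repeatedly merges the two closest clusters until
# only one big cluster remains, returning one row per merge (N-1 rows).
Z = linkage(X, method="ward")

# Each row: [cluster_a, cluster_b, merge_distance, size_of_new_cluster]
print(Z)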

Closeness between clusters is not the same as closeness between individual data points, which can be measured with something like the Euclidean distance between the points. To compare whole clusters we also need a linkage rule, for example the distance between their closest points (single linkage) or their farthest points (complete linkage).
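
A small illustration of that difference, assuming complete linkage (cluster distance = largest pairwise point distance); the two toy clusters below are made up for the example.

import numpy as np

cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [5.0, 0.0]])

# Point-to-point closeness: Euclidean distance between two points.
point_dist = np.linalg.norm(cluster_a[0] - cluster_b[0])  # 4.0

# Cluster-to-cluster closeness under complete linkage: the maximum
# Euclidean distance over all pairs of points, one from each cluster.
pairwise = np.linalg.norm(cluster_a[:, None, :] - cluster_b[None, :, :], axis=-1)
complete_link = pairwise.max()  # 5.0

print(point_dist, complete_link)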

I learned about dendrograms, whose vertical axis shows the dissimilarity (for example the Euclidean distance) at which two clusters are merged, and whose horizontal axis shows the data points. As a result, the higher up a merge happens, the more dissimilar the clusters being joined. We can set a dissimilarity threshold and cut the dendrogram at that height: each vertical line the cut crosses corresponds to one cluster, and those are the largest clusters whose merges stay below the threshold.
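
A sketch of reading clusters off a dendrogram by cutting it at a threshold; the data and the threshold value of 3.0 are assumptions for illustration, and SciPy handles the plotting and the cut.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0]])
Z = linkage(X, method="ward")

# Vertical axis: distance at which clusters merge; horizontal axis: data points.
dendrogram(Z)
plt.axhline(y=3.0, linestyle="--")  # the dissimilarity threshold
plt.show()

# Every vertical line the threshold crosses corresponds to one cluster.
labels = fcluster(Z, t=3.0, criterion="distance")
print(labels)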
