Clustering based on Instruction Mix Data

Clustering is performed on the instruction mix data across the following nine dimensions (various instruction categories).

We use the hierarchical (agglomerative) clustering approach. The clustering is generated dynamically from live performance data in PADS.

The algorithm begins with a forest of clusters that will finally form a hierarchical tree. At the outset, each data point forms its own cluster. When two clusters a, b from this forest are combined into a single cluster c, a and b are removed from the forest, and c is added to the forest. When only one cluster remains in the forest, the algorithm stops, and this cluster becomes the root.
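The forest-merging procedure described above can be sketched in Python as follows. This is a minimal illustration, not the code that PADS runs: the names `Cluster`, `build_tree`, and `min_gap` are hypothetical. The rule that picks which pair to merge (here, the pair with the smallest distance) is an assumption, and the cluster-to-cluster distance is passed in as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Cluster:
    """A node of the hierarchical tree (hypothetical helper type)."""
    members: List[Point]
    left: Optional["Cluster"] = None
    right: Optional["Cluster"] = None


def build_tree(
    points: Sequence[Point],
    cluster_distance: Callable[[List[Point], List[Point]], float],
) -> Cluster:
    """Merge clusters pairwise until a single root cluster remains."""
    if not points:
        raise ValueError("need at least one data point")

    # At the outset, each data point forms its own cluster.
    forest = [Cluster([p]) for p in points]

    while len(forest) > 1:
        # Assumption: merge the pair of clusters with the smallest distance.
        best = None
        for i in range(len(forest)):
            for j in range(i + 1, len(forest)):
                d = cluster_distance(forest[i].members, forest[j].members)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best

        # Remove a and b from the forest and add the combined cluster c.
        a, b = forest[i], forest[j]
        c = Cluster(a.members + b.members, left=a, right=b)
        forest = [x for k, x in enumerate(forest) if k not in (i, j)]
        forest.append(c)

    # The last remaining cluster is the root of the tree.
    return forest[0]


def min_gap(a: List[Point], b: List[Point]) -> float:
    """Placeholder distance for 1-D points, used only for the demo below."""
    return min(abs(p[0] - q[0]) for p in a for q in b)


root = build_tree([(0.0,), (1.0,), (10.0,)], min_gap)
```

In this small demo the points 0 and 1 are merged first, and the point 10 joins them only at the root. The naive pair search is quadratic per iteration, which is fine for illustration but not for large data sets.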

We use the Euclidean distance as the distance metric between any two data points. At every iteration, the closest clusters are merged, where the distance between two clusters is the average distance between their members (average linkage).
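As a small worked example of the distance computation, the sketch below takes the Euclidean distance between points and averages it over all cross-cluster pairs of members. The function names `euclidean` and `average_linkage` are hypothetical, and the points are two-dimensional only to keep the arithmetic readable; the instruction mix data would use one coordinate per dimension.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two data points."""
    return math.dist(p, q)


def average_linkage(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Average of the distances over every pair (p in a, q in b)."""
    total = sum(euclidean(p, q) for p, q in product(a, b))
    return total / (len(a) * len(b))


cluster_a = [(0.0, 0.0)]
cluster_b = [(3.0, 4.0), (6.0, 8.0)]

# Pairwise distances are 5 and 10, so the average is 7.5.
distance_ab = average_linkage(cluster_a, cluster_b)
```

Averaging over all member pairs makes the merge decision less sensitive to a single outlying point than using only the closest or the farthest pair.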

The figure below shows the hierarchical clusters as a dendrogram. Please see this for an introduction to dendrogram interpretation.
The horizontal axis shows the distance between clusters.


[Figure: dendrogram of the hierarchical clusters. Horizontal axis: Distance. Legend: Cluster assignments 1 to 20.]