Silhouette coefficient

The silhouette coefficient is a clustering metric that doesn't require the true labels of the dataset. It measures how well separated the clusters are, based only on the samples and the cluster assigned to each of them.

It is composed of two elements, computed for each sample:

  • The mean distance between a sample and all other points in the same cluster (a)
  • The mean distance between a sample and all the points in the nearest other cluster (b)

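The coefficient s for a single sample is defined as follows:

  s = \frac{b - a}{\max(a, b)}

The value of s lies between -1 and 1. Values close to 1 indicate a sample that is much closer to its own cluster than to the nearest other one, and negative values suggest the sample may have been assigned to the wrong cluster.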

The silhouette coefficient is only defined if the number of clusters is at least two, and the coefficient for a whole sample set is the mean of the coefficients of all its samples.
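
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes Euclidean distance, the function names silhouette_samples and silhouette_score are chosen here for illustration, and assigning s = 0 to a sample that is alone in its cluster is a common convention that the text does not specify.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def silhouette_samples(points, labels):
    """Return s = (b - a) / max(a, b) for every sample."""
    # Group sample indices by their assigned cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Assumption: a sample alone in its cluster gets s = 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other points of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        # Guard against identical points, where both a and b are zero.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_score(points, labels):
    """Mean silhouette coefficient over the whole sample set."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)


# Two tight pairs of points that are far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # clusters match the pairs
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters split the pairs
```

With this toy data, the labeling that follows the two pairs yields a clearly positive score, while the labeling that splits them yields a negative one, which is the behavior the metric is meant to capture.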
