This score is useful to check whether the clustering algorithm meets an important requirement: a cluster should contain only samples belonging to a single class. It's defined as:
It's bounded between 0 and 1, with low values indicating a low homogeneity. In fact, when the knowledge of Ypred reduces the uncertainty of Ytrue, H(Ytrue|Ypred) becomes smaller (h → 1) and viceversa. For our example, the homogeneity score can be computed as:
from sklearn.metrics import homogeneity_scoreprint(homogeneity_score(digits['target'], Y))0.739148799605
The digits['target'] array contains the true labels while Y contains the predictions ...