18

Clustering

Machine learning methods generally fall into two main categories: supervised learning and unsupervised learning. Thus far, we have been working with supervised learning models, because we train them with a target, or response variable, y. In other words, the training data for our models contains the “correct” answer. Unsupervised models are modeling techniques for which the “correct” answer is unknown. Many of these methods involve clustering, and the two main clustering methods are k-means clustering and hierarchical clustering.

18.1 k-Means

The technique known as k-means works by first selecting how many clusters, k, exist in the data. The algorithm randomly selects k points in the data and calculates the distance ...
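Because the description above is cut off, the following is only a minimal sketch of the commonly described k-means iteration, not necessarily the procedure this chapter goes on to present. It assumes Euclidean distance and a fixed iteration cap, and it uses only the Python standard library. The function name `k_means`, its parameters, and the toy data are hypothetical names chosen for illustration.

```python
import math
import random


def k_means(points, k, n_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Illustrative sketch only: Euclidean distance, random initial centers.
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct data points at random as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = k_means(data, k=2)
print(labels)   # one cluster index per point
print(centers)  # one center per cluster
```

The cluster indices themselves are arbitrary; what matters is which points share an index. In practice, a library implementation such as scikit-learn's `KMeans` is the more usual choice, since it handles initialization and repeated restarts more carefully than this sketch does.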
