CHAPTER 6 Clustering

One of the more common machine learning strands you'll come across is clustering, mainly because it's very useful. For example, marketing companies love it because they can group customers into segments. This chapter describes the details of clustering and illustrates how clusters work and where they are used.

What Is Clustering?

If you boil down all the definitions of clustering out there, you get “organizing a group of objects that share similar characteristics.” It's classed as an unsupervised learning method, which means there are no labeled examples from which to learn; the algorithm has to find the groups on its own. In Figure 6.1 you see there are three distinct groupings of data; each one of those groups is a cluster.

The main aim is to find structure within a given set of data. Clustering covers a lot of ground because there are many algorithms to choose from, and which one is the right choice often comes down to experimentation. Sometimes you just need to put some code together and play with it. You'll do that shortly.


Figure 6.1: A graph representation of a cluster
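
To make the idea of "three distinct groupings" concrete, here is a minimal sketch in plain Python. It uses k-means, which is only one of the many clustering algorithms and is picked here purely for illustration; it is not necessarily the approach this chapter goes on to use. The sample points, the function names (farthest_first, kmeans), and the deterministic farthest-point seeding are all assumptions made for this example.

```python
import math

def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from the centroids chosen so far (an illustrative, deterministic
    seeding choice, not a standard requirement of k-means)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iterations=100):
    """Basic k-means: returns (centroids, labels), one label per point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # nothing moved, so the grouping is stable
            break
        centroids = new_centroids
    return centroids, labels

# Made-up 2-D data with three visually separate groups (hypothetical values).
points = [
    (1, 1), (1.5, 2), (2, 1),        # lower left
    (8, 8), (9, 9), (8.5, 9.5),      # upper right
    (1, 9), (2, 8.5), (1.5, 9.5),    # upper left
]

centroids, labels = kmeans(points, k=3)
for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```

Note that no labels are supplied up front: the algorithm is given only the points and the number of clusters to look for, which is what makes this unsupervised. Changing k or the data is a quick way to experiment with how the groupings shift.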

Where Is Clustering Used?

Clustering is a widely used machine learning approach. Although it might seem simple, do not underestimate the importance of grouping ...
