Chapter 43. In Depth: Support Vector Machines
Support vector machines (SVMs) are a particularly powerful and flexible class of supervised algorithms for both classification and regression. In this chapter, we will explore the intuition behind SVMs and their use in classification problems.
We begin with the standard imports:
In [1]: %matplotlib inline
        import numpy as np
        import matplotlib.pyplot as plt
        plt.style.use('seaborn-whitegrid')
        from scipy import stats
Note
Full-size, full-color figures are available in the supplemental materials on GitHub.
Motivating Support Vector Machines
As part of our discussion of Bayesian classification (see Chapter 41), we learned about a simple kind of model that describes the distribution of each underlying class, and experimented with using it to probabilistically determine labels for new points. That was an example of generative classification; here we will consider instead discriminative classification. That is, rather than modeling each class, we will simply find a line or curve (in two dimensions) or manifold (in multiple dimensions) that divides the classes from each other.
As an example of this, consider the simple case of a classification task in which the two classes of points are well separated (see Figure 43-1).
In [2]: from sklearn.datasets import make_blobs
        X, y = make_blobs(n_samples=50, centers=2,
                          random_state=0, cluster_std=0.60)
        plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn');
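To make the generative versus discriminative distinction concrete, the following minimal sketch fits one model of each kind to the same blobs. It is an illustration under stated assumptions rather than the chapter's own method: the choice of `GaussianNB` as the generative model, the `SVC` settings `kernel='linear'` and `C=1.0`, and the variable names are all picked here for demonstration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Same synthetic data as in the cell above
X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)

# Generative: Gaussian naive Bayes estimates a distribution for each class.
# theta_ holds the per-class feature means, one row per class.
gnb = GaussianNB().fit(X, y)
class_means = gnb.theta_

# Discriminative: a linear classifier learns only the dividing line
# w . x + b = 0 and says nothing about how each class is distributed.
# kernel='linear' and C=1.0 are illustrative assumptions.
svc = SVC(kernel='linear', C=1.0).fit(X, y)
w, b = svc.coef_[0], svc.intercept_[0]

# Classify by checking which side of the line each point falls on
# (labels from make_blobs are 0 and 1 here)
manual_pred = (X @ w + b > 0).astype(int)
agreement = np.mean(manual_pred == svc.predict(X))

gnb_acc = gnb.score(X, y)
svc_acc = svc.score(X, y)

print("per-class means (generative):")
print(class_means)
print("line coefficients w, b (discriminative):", w, b)
print("training accuracy:", gnb_acc, svc_acc)
print("side-of-line rule agrees with svc.predict:", agreement)
```

The generative model summarizes each class (here through its means and variances), while the discriminative model keeps only the two coefficients and the intercept that define the boundary.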