The Gini coefficient

The metric that the decision tree uses to choose the root node is called the Gini coefficient. The higher the value of this coefficient, the better that particular feature separates the data into distinct groups. To learn how to compute the Gini coefficient for a feature, let's consider the following diagram:

Computing the Gini coefficient

In the preceding diagram, the following happens:

  1. The feature splits the data into two groups.
  2. In the left-hand group, we have two triangles and one circle.
  3. Therefore, the Gini for the left-hand group is (2 triangles/3 total data points)^2+ (1 circle/3 ...
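
The per-group calculation in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code: the function name `gini_score` and the string labels are hypothetical, and it assumes the per-group value is the sum of squared class proportions, as step 3 begins to show. It covers a single group only. The right-hand group and the way the groups are combined are not visible in the text above, so they are left out.

```python
from collections import Counter


def gini_score(labels):
    """Sum of squared class proportions for one group.

    Assumption: this follows the convention in step 3 above, where a
    higher value means a purer group. `gini_score` is a hypothetical
    helper name, not a scikit-learn function.
    """
    total = len(labels)
    if total == 0:
        raise ValueError("group must contain at least one data point")
    counts = Counter(labels)  # number of data points per class
    return sum((count / total) ** 2 for count in counts.values())


# Left-hand group from the diagram: two triangles and one circle.
left_group = ["triangle", "triangle", "circle"]
left_score = gini_score(left_group)
print(round(left_score, 4))  # (2/3)^2 + (1/3)^2 = 5/9 -> 0.5556
```

Under this convention, a group that contains only one class scores 1.0, the maximum. Note that scikit-learn's `criterion="gini"` setting uses the Gini impurity, which is 1 minus this quantity, so there a lower value indicates a purer group.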
