Chapter 5. Dimension Reduction
As described in the Assessing a model/overfitting section of Chapter 2, Data Pipelines, indiscriminate reliance on a large number of features may cause overfitting; the model may become so tightly coupled with the training set that different validation sets produce vastly different outcomes and quality metrics, such as AuROC.
Dimension reduction techniques alleviate these problems by detecting features that have little influence on the overall model behavior.
This chapter introduces three categories of dimension reduction techniques, two of which are implemented in Scala:
- Divergence with an implementation of the Kullback-Leibler distance (a minimal sketch follows this list)
- Principal components analysis
- Estimation of low dimension feature space for ...
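To give a flavor of the first item, here is a minimal Scala sketch of the discrete form of the Kullback-Leibler divergence, D(p||q) = Σ p(i)·log(p(i)/q(i)). The object name `KullbackLeibler` and the `divergence` method are hypothetical placeholders, not the book's actual implementation, which is presented later in the chapter.

```scala
// Hypothetical sketch of the discrete Kullback-Leibler divergence:
// D(p || q) = sum_i p(i) * log(p(i) / q(i))
object KullbackLeibler {
  def divergence(p: Seq[Double], q: Seq[Double]): Double = {
    require(p.size == q.size, "Distributions must share the same support")
    p.zip(q).collect {
      // Terms with p(i) == 0 contribute 0 by convention; skip undefined terms with q(i) == 0
      case (pi, qi) if pi > 0.0 && qi > 0.0 => pi * math.log(pi / qi)
    }.sum
  }
}

// Example: divergence between two discrete distributions over three outcomes
val p = Seq(0.5, 0.3, 0.2)
val q = Seq(0.4, 0.4, 0.2)
val kl = KullbackLeibler.divergence(p, q)  // small positive value; 0 only when p == q
```

Note that the divergence is asymmetric (D(p||q) ≠ D(q||p) in general), which is why it is called a divergence rather than a true distance.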