Chapter 6. Memorization methods
This chapter covers
- Building single-variable models
- Cross-validated variable selection
- Building basic multivariable models
- Starting with decision trees, nearest neighbor, and naive Bayes models
The simplest methods in data science are what we call memorization methods. These are methods that generate answers by returning the majority category (in the case of classification) or the average value (in the case of scoring) of a subset of the original training data. These methods range from models that depend on a single variable (similar to the analyst’s pivot table), to decision trees (similar to what are called business rules), to nearest neighbor and naive Bayes methods.[1] In this chapter, you’ll learn how to ...
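To make the idea of memorization concrete, here is a minimal Python sketch of a single-variable model, written for illustration rather than taken from the chapter. It groups the training rows by the level of one categorical variable and memorizes either the majority category (classification) or the average value (scoring) for each level. The function names (`fit_majority`, `fit_average`, `predict`), the toy columns (`plan`, `churned`, `spend`), and the choice to fall back to the overall majority or mean for unseen levels are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from statistics import mean


def fit_majority(levels, labels):
    """Memorize the most common label for each level of one variable."""
    by_level = defaultdict(Counter)
    for level, label in zip(levels, labels):
        by_level[level][label] += 1
    table = {lvl: counts.most_common(1)[0][0] for lvl, counts in by_level.items()}
    # Assumption: levels never seen in training get the overall majority label.
    fallback = Counter(labels).most_common(1)[0][0]
    return table, fallback


def fit_average(levels, values):
    """Memorize the mean outcome for each level of one variable."""
    by_level = defaultdict(list)
    for level, value in zip(levels, values):
        by_level[level].append(value)
    table = {lvl: mean(vals) for lvl, vals in by_level.items()}
    # Assumption: levels never seen in training get the overall mean.
    return table, mean(values)


def predict(model, level):
    """Look up the memorized answer, using the fallback for unseen levels."""
    table, fallback = model
    return table.get(level, fallback)


# Hypothetical toy training data, invented for this sketch.
plan = ["basic", "basic", "premium", "premium", "premium", "basic", "premium"]
churned = [True, True, False, False, True, False, False]
spend = [10.0, 20.0, 50.0, 70.0, 60.0, 30.0, 60.0]

classifier = fit_majority(plan, churned)  # classification: majority category
scorer = fit_average(plan, spend)         # scoring: average value
```

Calling `predict(classifier, "basic")` returns the label that most "basic" training rows carried, and `predict(scorer, "premium")` returns the mean spend of the "premium" rows. The model is essentially a lookup table built from subsets of the training data, which is what makes the pivot-table comparison apt.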