Book description
Machine learning without advanced math! This book presents a serious, practical look at machine learning, preparing you to gain valuable insights from your own data. The Art of Machine Learning is packed with real dataset examples and sophisticated advice on how to make full use of powerful machine learning methods. Readers will need only an intuitive grasp of charts, graphs, and the slope of a line, as well as familiarity with the R programming language. You'll become skilled in a range of machine learning methods, starting with the simple k-Nearest Neighbors method (k-NN) and moving on to random forests, gradient boosting, linear/logistic models, support vector machines, the LASSO, and neural networks. Final chapters introduce text and image classification, as well as time series. You'll learn not only how to use machine learning methods but also why these methods work, providing the strong foundational background you'll need in practice. Additional features:
- How to avoid common problems, such as dealing with “dirty” data and factor variables with large numbers of levels
- A look at typical misconceptions, such as those surrounding unbalanced data
- Exploration of the famous Bias-Variance Tradeoff, central to machine learning, and how it plays out in practice for each machine learning method
- Dozens of illustrative examples involving real datasets of varying size and field of application
- Use of standard R packages throughout, with a simple wrapper interface providing convenient access (illustrated in the sketch after this list)
After finishing this book, you will be well equipped to start applying machine learning techniques to your own datasets.
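To give a feel for that wrapper interface, here is a minimal sketch of a first k-NN fit. It assumes, based on the chapter titles rather than anything stated on this page, that the qe*-series wrappers are those of the qeML package and that the mlb baseball player dataset ships with it.

```r
# Minimal sketch of the qe*-series wrapper style (assumptions: the
# wrappers come from the qeML package, which includes the mlb dataset)
library(qeML)

data(mlb)                       # players' position, height, weight, age
knnOut <- qeKNN(mlb, "Weight")  # one call fits k-NN to predict Weight,
                                # forming a random holdout set automatically
knnOut$testAcc                  # mean absolute prediction error on holdout
```

The one-call pattern is the convenience the description refers to: a data frame and the name of the outcome column go in, and a fitted model with a holdout accuracy estimate comes out.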
Table of contents
- Cover Page
- Title Page
- Copyright Page
- About the Author
- About the Technical Reviewer
- BRIEF CONTENTS
- CONTENTS IN DETAIL
- ACKNOWLEDGMENTS
- INTRODUCTION
- PART I: PROLOGUE, AND NEIGHBORHOOD-BASED METHODS
- 1 REGRESSION MODELS
- 1.1 Example: The Bike Sharing Dataset
- 1.2 Machine Learning and Prediction
- 1.3 Introducing the k-Nearest Neighbors Method
- 1.4 Dummy Variables and Categorical Variables
- 1.5 Analysis with qeKNN()
- 1.6 The Regression Function: The Basis of ML
- 1.7 The Bias-Variance Trade-off
- 1.8 Example: The mlb Dataset
- 1.9 k-NN and Categorical Features
- 1.10 Scaling
- 1.11 Choosing Hyperparameters
- 1.12 Holdout Sets
- 1.13 Pitfall: p-Hacking and Hyperparameter Selection
- 1.14 Pitfall: Long-Term Time Trends
- 1.15 Pitfall: Dirty Data
- 1.16 Pitfall: Missing Data
- 1.17 Direct Access to the regtools k-NN Code
- 1.18 Conclusions
- 2 CLASSIFICATION MODELS
- 2.1 Classification Is a Special Case of Regression
- 2.2 Example: The Telco Churn Dataset
- 2.3 Example: Vertebrae Data
- 2.4 Pitfall: Error Rate Improves Only Slightly Using the Features
- 2.5 The Confusion Matrix
- 2.6 Clearing the Confusion: Unbalanced Data
- 2.7 Receiver Operating Characteristic and Area Under Curve
- 2.8 Conclusions
- 3 BIAS, VARIANCE, OVERFITTING, AND CROSS-VALIDATION
- 4 DEALING WITH LARGE NUMBERS OF FEATURES
- PART II: TREE-BASED METHODS
- 5 A STEP BEYOND K-NN: DECISION TREES
- 6 TWEAKING THE TREES
- 7 FINDING A GOOD SET OF HYPERPARAMETERS
- PART III: METHODS BASED ON LINEAR RELATIONSHIPS
- 8 PARAMETRIC METHODS
- 8.1 Motivating Example: The Baseball Player Data
- 8.2 The lm() Function
- 8.3 Wrapper for lm() in the qe*-Series: qeLin()
- 8.4 Use of Multiple Features
- 8.5 Dimension Reduction
- 8.6 Least Squares and Residuals
- 8.7 Diagnostics: Is the Linear Model Valid?
- 8.8 The R-Squared Value(s)
- 8.9 Classification Applications: The Logistic Model
- 8.10 Bias and Variance in Linear/Generalized Linear Models
- 8.11 Polynomial Models
- 8.12 Blending the Linear Model with Other Methods
- 8.13 The qeCompare() Function
- 8.14 What’s Next
- 9 CUTTING THINGS DOWN TO SIZE: REGULARIZATION
- PART IV: METHODS BASED ON SEPARATING LINES AND PLANES
- 10 A BOUNDARY APPROACH: SUPPORT VECTOR MACHINES
- 11 LINEAR MODELS ON STEROIDS: NEURAL NETWORKS
- 11.1 Overview
- 11.2 Working on Top of a Complex Infrastructure
- 11.3 Example: Vertebrae Data
- 11.4 Neural Network Hyperparameters
- 11.5 Activation Functions
- 11.6 Regularization
- 11.7 Example: Fall Detection Data
- 11.8 Pitfall: Convergence Problems
- 11.9 Close Relation to Polynomial Regression
- 11.10 Bias vs. Variance in Neural Networks
- 11.11 Discussion
- PART V: APPLICATIONS
- 12 IMAGE CLASSIFICATION
- 13 HANDLING TIME SERIES AND TEXT DATA
- A LIST OF ACRONYMS AND SYMBOLS
- B STATISTICS AND ML TERMINOLOGY CORRESPONDENCE
- C MATRICES, DATA FRAMES, AND FACTOR CONVERSIONS
- D PITFALL: BEWARE OF “P-HACKING”!
- INDEX
Product information
- Title: The Art of Machine Learning
- Author(s): Norman Matloff
- Release date: January 2024
- Publisher(s): No Starch Press
- ISBN: 9781718502109