Book description
This practical guide provides more than 200 self-contained recipes to help you solve machine learning challenges you may encounter in your work. If you're comfortable with Python and its libraries, including pandas and scikit-learn, you'll be able to address specific problems, from loading data to training models and leveraging neural networks.
Each recipe in this updated edition includes code that you can copy, paste, and run with a toy dataset to ensure that it works. From there, you can adapt these recipes according to your use case or application. Recipes include a discussion that explains the solution and provides meaningful context.
Go beyond theory and concepts by learning the nuts and bolts you need to construct working machine learning applications. You'll find recipes for:
- Vectors, matrices, and arrays
- Working with data from CSV, JSON, SQL, databases, cloud storage, and other sources
- Handling numerical and categorical data, text, images, and dates and times
- Dimensionality reduction using feature extraction or feature selection
- Model evaluation and selection
- Linear and logistic regression, trees and forests, and k-nearest neighbors
- Support vector machines (SVM), naive Bayes, clustering, and tree-based models
- Saving, loading, and serving trained models from multiple frameworks
Table of contents
- Preface
- 1. Working with Vectors, Matrices, and Arrays in NumPy
- 1.0. Introduction
- 1.1. Creating a Vector
- 1.2. Creating a Matrix
- 1.3. Creating a Sparse Matrix
- 1.4. Preallocating NumPy Arrays
- 1.5. Selecting Elements
- 1.6. Describing a Matrix
- 1.7. Applying Functions over Each Element
- 1.8. Finding the Maximum and Minimum Values
- 1.9. Calculating the Average, Variance, and Standard Deviation
- 1.10. Reshaping Arrays
- 1.11. Transposing a Vector or Matrix
- 1.12. Flattening a Matrix
- 1.13. Finding the Rank of a Matrix
- 1.14. Getting the Diagonal of a Matrix
- 1.15. Calculating the Trace of a Matrix
- 1.16. Calculating Dot Products
- 1.17. Adding and Subtracting Matrices
- 1.18. Multiplying Matrices
- 1.19. Inverting a Matrix
- 1.20. Generating Random Values
- 2. Loading Data
- 2.0. Introduction
- 2.1. Loading a Sample Dataset
- 2.2. Creating a Simulated Dataset
- 2.3. Loading a CSV File
- 2.4. Loading an Excel File
- 2.5. Loading a JSON File
- 2.6. Loading a Parquet File
- 2.7. Loading an Avro File
- 2.8. Querying a SQLite Database
- 2.9. Querying a Remote SQL Database
- 2.10. Loading Data from a Google Sheet
- 2.11. Loading Data from an S3 Bucket
- 2.12. Loading Unstructured Data
- 3. Data Wrangling
- 3.0. Introduction
- 3.1. Creating a DataFrame
- 3.2. Getting Information about the Data
- 3.3. Slicing DataFrames
- 3.4. Selecting Rows Based on Conditionals
- 3.5. Sorting Values
- 3.6. Replacing Values
- 3.7. Renaming Columns
- 3.8. Finding the Minimum, Maximum, Sum, Average, and Count
- 3.9. Finding Unique Values
- 3.10. Handling Missing Values
- 3.11. Deleting a Column
- 3.12. Deleting a Row
- 3.13. Dropping Duplicate Rows
- 3.14. Grouping Rows by Values
- 3.15. Grouping Rows by Time
- 3.16. Aggregating Operations and Statistics
- 3.17. Looping over a Column
- 3.18. Applying a Function over All Elements in a Column
- 3.19. Applying a Function to Groups
- 3.20. Concatenating DataFrames
- 3.21. Merging DataFrames
- 4. Handling Numerical Data
- 4.0. Introduction
- 4.1. Rescaling a Feature
- 4.2. Standardizing a Feature
- 4.3. Normalizing Observations
- 4.4. Generating Polynomial and Interaction Features
- 4.5. Transforming Features
- 4.6. Detecting Outliers
- 4.7. Handling Outliers
- 4.8. Discretizing Features
- 4.9. Grouping Observations Using Clustering
- 4.10. Deleting Observations with Missing Values
- 4.11. Imputing Missing Values
- 5. Handling Categorical Data
- 6. Handling Text
- 6.0. Introduction
- 6.1. Cleaning Text
- 6.2. Parsing and Cleaning HTML
- 6.3. Removing Punctuation
- 6.4. Tokenizing Text
- 6.5. Removing Stop Words
- 6.6. Stemming Words
- 6.7. Tagging Parts of Speech
- 6.8. Performing Named-Entity Recognition
- 6.9. Encoding Text as a Bag of Words
- 6.10. Weighting Word Importance
- 6.11. Using Text Vectors to Calculate Text Similarity in a Search Query
- 6.12. Using a Sentiment Analysis Classifier
- 7. Handling Dates and Times
- 7.0. Introduction
- 7.1. Converting Strings to Dates
- 7.2. Handling Time Zones
- 7.3. Selecting Dates and Times
- 7.4. Breaking Up Date Data into Multiple Features
- 7.5. Calculating the Difference Between Dates
- 7.6. Encoding Days of the Week
- 7.7. Creating a Lagged Feature
- 7.8. Using Rolling Time Windows
- 7.9. Handling Missing Data in Time Series
- 8. Handling Images
- 8.0. Introduction
- 8.1. Loading Images
- 8.2. Saving Images
- 8.3. Resizing Images
- 8.4. Cropping Images
- 8.5. Blurring Images
- 8.6. Sharpening Images
- 8.7. Enhancing Contrast
- 8.8. Isolating Colors
- 8.9. Binarizing Images
- 8.10. Removing Backgrounds
- 8.11. Detecting Edges
- 8.12. Detecting Corners
- 8.13. Creating Features for Machine Learning
- 8.14. Encoding Color Histograms as Features
- 8.15. Using Pretrained Embeddings as Features
- 8.16. Detecting Objects with OpenCV
- 8.17. Classifying Images with PyTorch
- 9. Dimensionality Reduction Using Feature Extraction
- 10. Dimensionality Reduction Using Feature Selection
- 11. Model Evaluation
- 11.0. Introduction
- 11.1. Cross-Validating Models
- 11.2. Creating a Baseline Regression Model
- 11.3. Creating a Baseline Classification Model
- 11.4. Evaluating Binary Classifier Predictions
- 11.5. Evaluating Binary Classifier Thresholds
- 11.6. Evaluating Multiclass Classifier Predictions
- 11.7. Visualizing a Classifier’s Performance
- 11.8. Evaluating Regression Models
- 11.9. Evaluating Clustering Models
- 11.10. Creating a Custom Evaluation Metric
- 11.11. Visualizing the Effect of Training Set Size
- 11.12. Creating a Text Report of Evaluation Metrics
- 11.13. Visualizing the Effect of Hyperparameter Values
- 12. Model Selection
- 12.0. Introduction
- 12.1. Selecting the Best Models Using Exhaustive Search
- 12.2. Selecting the Best Models Using Randomized Search
- 12.3. Selecting the Best Models from Multiple Learning Algorithms
- 12.4. Selecting the Best Models When Preprocessing
- 12.5. Speeding Up Model Selection with Parallelization
- 12.6. Speeding Up Model Selection Using Algorithm-Specific Methods
- 12.7. Evaluating Performance After Model Selection
- 13. Linear Regression
- 14. Trees and Forests
- 14.0. Introduction
- 14.1. Training a Decision Tree Classifier
- 14.2. Training a Decision Tree Regressor
- 14.3. Visualizing a Decision Tree Model
- 14.4. Training a Random Forest Classifier
- 14.5. Training a Random Forest Regressor
- 14.6. Evaluating Random Forests with Out-of-Bag Errors
- 14.7. Identifying Important Features in Random Forests
- 14.8. Selecting Important Features in Random Forests
- 14.9. Handling Imbalanced Classes
- 14.10. Controlling Tree Size
- 14.11. Improving Performance Through Boosting
- 14.12. Training an XGBoost Model
- 14.13. Improving Real-Time Performance with LightGBM
- 15. K-Nearest Neighbors
- 15.0. Introduction
- 15.1. Finding an Observation’s Nearest Neighbors
- 15.2. Creating a K-Nearest Neighbors Classifier
- 15.3. Identifying the Best Neighborhood Size
- 15.4. Creating a Radius-Based Nearest Neighbors Classifier
- 15.5. Finding Approximate Nearest Neighbors
- 15.6. Evaluating Approximate Nearest Neighbors
- 16. Logistic Regression
- 17. Support Vector Machines
- 18. Naive Bayes
- 19. Clustering
- 20. Tensors with PyTorch
- 20.0. Introduction
- 20.1. Creating a Tensor
- 20.2. Creating a Tensor from NumPy
- 20.3. Creating a Sparse Tensor
- 20.4. Selecting Elements in a Tensor
- 20.5. Describing a Tensor
- 20.6. Applying Operations to Elements
- 20.7. Finding the Maximum and Minimum Values
- 20.8. Reshaping Tensors
- 20.9. Transposing a Tensor
- 20.10. Flattening a Tensor
- 20.11. Calculating Dot Products
- 20.12. Multiplying Tensors
- 21. Neural Networks
- 21.0. Introduction
- 21.1. Using Autograd with PyTorch
- 21.2. Preprocessing Data for Neural Networks
- 21.3. Designing a Neural Network
- 21.4. Training a Binary Classifier
- 21.5. Training a Multiclass Classifier
- 21.6. Training a Regressor
- 21.7. Making Predictions
- 21.8. Visualizing Training History
- 21.9. Reducing Overfitting with Weight Regularization
- 21.10. Reducing Overfitting with Early Stopping
- 21.11. Reducing Overfitting with Dropout
- 21.12. Saving Model Training Progress
- 21.13. Tuning Neural Networks
- 21.14. Visualizing Neural Networks
- 22. Neural Networks for Unstructured Data
- 23. Saving, Loading, and Serving Trained Models
- Index
- About the Authors
Product information
- Title: Machine Learning with Python Cookbook, 2nd Edition
- Author(s):
- Release date: August 2023
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098135720