Python Machine Learning By Example - Fourth Edition

Book description

Author Yuxi (Hayden) Liu teaches machine learning from the fundamentals to building NLP transformers and multimodal models, with best practice tips and real-world examples using PyTorch, TensorFlow, scikit-learn, and pandas.

Key Features

  • Discover new and updated content on NLP transformers, PyTorch, and computer vision modeling
  • Learn from a dedicated chapter on best practices, plus additional best practice tips throughout the book, to improve your ML solutions
  • Implement ML models, such as neural networks and linear and logistic regression, from scratch
  • Purchase of the print or Kindle book includes a free PDF copy

Book Description

The fourth edition of Python Machine Learning By Example is a comprehensive guide for beginners and experienced machine learning practitioners who want to learn more advanced techniques, such as multimodal modeling. Written by experienced author and ex-Google machine learning engineer Yuxi (Hayden) Liu, this edition emphasizes best practices, providing invaluable insights for machine learning engineers, data scientists, and analysts.

Explore advanced techniques in two new chapters covering natural language processing transformers with BERT and GPT, and multimodal computer vision models with PyTorch and Hugging Face. You’ll learn key modeling techniques through practical examples, such as predicting stock prices and creating an image search engine.

This hands-on book navigates complex challenges, bridging the gap between theoretical understanding and practical application. Elevate your machine learning and deep learning expertise, tackle intricate problems, and unlock the potential of advanced techniques with this authoritative guide.

What you will learn

  • Follow machine learning best practices throughout data preparation and model development
  • Build and improve image classifiers using convolutional neural networks (CNNs) and transfer learning
  • Develop and fine-tune neural networks using TensorFlow and PyTorch
  • Analyze sequence data and make predictions using recurrent neural networks (RNNs), transformers, and CLIP
  • Build classifiers using support vector machines (SVMs) and boost performance with PCA
  • Avoid overfitting using regularization, feature selection, and more

Who this book is for

This expanded fourth edition is ideal for data scientists, ML engineers, analysts, and students with Python programming knowledge. Its real-world examples, best practices, and code will prepare anyone taking on their first serious ML project.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. Getting Started with Machine Learning and Python
    1. An introduction to machine learning
      1. Understanding why we need machine learning
      2. Differentiating between machine learning and automation
      3. Machine learning applications
    2. Knowing the prerequisites
    3. Getting started with three types of machine learning
      1. A brief history of the development of machine learning algorithms
    4. Digging into the core of machine learning
      1. Generalizing with data
      2. Overfitting, underfitting, and the bias-variance trade-off
        1. Overfitting
        2. Underfitting
      3. The bias-variance trade-off
      4. Avoiding overfitting with cross-validation
      5. Avoiding overfitting with regularization
      6. Avoiding overfitting with feature selection and dimensionality reduction
    5. Data preprocessing and feature engineering
      1. Preprocessing and exploration
        1. Dealing with missing values
        2. Label encoding
        3. One-hot encoding
        4. Dense embedding
        5. Scaling
      2. Feature engineering
        1. Polynomial transformation
        2. Binning
    6. Combining models
      1. Voting and averaging
      2. Bagging
      3. Boosting
      4. Stacking
    7. Installing software and setting up
      1. Setting up Python and environments
      2. Installing the main Python packages
        1. NumPy
        2. SciPy
        3. pandas
        4. scikit-learn
        5. TensorFlow
        6. PyTorch
    8. Summary
    9. Exercises
    10. Join our book’s Discord space
  3. Building a Movie Recommendation Engine with Naïve Bayes
    1. Getting started with classification
      1. Binary classification
      2. Multiclass classification
      3. Multi-label classification
    2. Exploring Naïve Bayes
      1. Bayes’ theorem by example
      2. The mechanics of Naïve Bayes
    3. Implementing Naïve Bayes
      1. Implementing Naïve Bayes from scratch
      2. Implementing Naïve Bayes with scikit-learn
    4. Building a movie recommender with Naïve Bayes
      1. Preparing the data
      2. Training a Naïve Bayes model
    5. Evaluating classification performance
    6. Tuning models with cross-validation
    7. Summary
    8. Exercises
    9. References
    10. Join our book’s Discord space
  4. Predicting Online Ad Click-Through with Tree-Based Algorithms
    1. A brief overview of ad click-through prediction
    2. Getting started with two types of data – numerical and categorical
    3. Exploring a decision tree from the root to the leaves
      1. Constructing a decision tree
      2. The metrics for measuring a split
        1. Gini Impurity
        2. Information Gain
    4. Implementing a decision tree from scratch
    5. Implementing a decision tree with scikit-learn
    6. Predicting ad click-through with a decision tree
    7. Ensembling decision trees – random forests
    8. Ensembling decision trees – gradient-boosted trees
    9. Summary
    10. Exercises
    11. Join our book’s Discord space
  5. Predicting Online Ad Click-Through with Logistic Regression
    1. Converting categorical features to numerical – one-hot encoding and ordinal encoding
    2. Classifying data with logistic regression
      1. Getting started with the logistic function
      2. Jumping from the logistic function to logistic regression
    3. Training a logistic regression model
      1. Training a logistic regression model using gradient descent
      2. Predicting ad click-through with logistic regression using gradient descent
      3. Training a logistic regression model using stochastic gradient descent (SGD)
      4. Training a logistic regression model with regularization
      5. Feature selection using L1 regularization
      6. Feature selection using random forest
    4. Training on large datasets with online learning
    5. Handling multiclass classification
    6. Implementing logistic regression using TensorFlow
    7. Summary
    8. Exercises
    9. Join our book’s Discord space
  6. Predicting Stock Prices with Regression Algorithms
    1. What is regression?
    2. Mining stock price data
      1. A brief overview of the stock market and stock prices
    3. Getting started with feature engineering
      1. Acquiring data and generating features
    4. Estimating with linear regression
      1. How does linear regression work?
      2. Implementing linear regression from scratch
      3. Implementing linear regression with scikit-learn
      4. Implementing linear regression with TensorFlow
    5. Estimating with decision tree regression
      1. Transitioning from classification trees to regression trees
      2. Implementing decision tree regression
    6. Implementing a regression forest
    7. Evaluating regression performance
    8. Predicting stock prices with the three regression algorithms
    9. Summary
    10. Exercises
    11. Join our book’s Discord space
  7. Predicting Stock Prices with Artificial Neural Networks
    1. Demystifying neural networks
      1. Starting with a single-layer neural network
        1. Layers in neural networks
      2. Activation functions
      3. Backpropagation
      4. Adding more layers to a neural network: DL
    2. Building neural networks
      1. Implementing neural networks from scratch
      2. Implementing neural networks with scikit-learn
      3. Implementing neural networks with TensorFlow
      4. Implementing neural networks with PyTorch
    3. Picking the right activation functions
    4. Preventing overfitting in neural networks
      1. Dropout
      2. Early stopping
    5. Predicting stock prices with neural networks
      1. Training a simple neural network
      2. Fine-tuning the neural network
    6. Summary
    7. Exercises
    8. Join our book’s Discord space
  8. Mining the 20 Newsgroups Dataset with Text Analysis Techniques
    1. How computers understand language – NLP
      1. What is NLP?
      2. The history of NLP
      3. NLP applications
    2. Touring popular NLP libraries and picking up NLP basics
      1. Installing famous NLP libraries
        1. Corpora
      2. Tokenization
      3. PoS tagging
      4. NER
      5. Stemming and lemmatization
      6. Semantics and topic modeling
    3. Getting the newsgroups data
    4. Exploring the newsgroups data
    5. Thinking about features for text data
      1. Counting the occurrence of each word token
      2. Text preprocessing
      3. Dropping stop words
      4. Reducing inflectional and derivational forms of words
    6. Visualizing the newsgroups data with t-SNE
      1. What is dimensionality reduction?
      2. t-SNE for dimensionality reduction
      3. Representing words with dense vectors – word embedding
      4. Building embedding models using shallow neural networks
      5. Utilizing pre-trained embedding models
    7. Summary
    8. Exercises
    9. Join our book’s Discord space
  9. Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
    1. Learning without guidance – unsupervised learning
    2. Getting started with k-means clustering
      1. How does k-means clustering work?
      2. Implementing k-means from scratch
      3. Implementing k-means with scikit-learn
      4. Choosing the value of k
    3. Clustering newsgroups dataset
      1. Clustering newsgroups data using k-means
      2. Describing the clusters using GPT
    4. Discovering underlying topics in newsgroups
      1. Topic modeling using NMF
      2. Topic modeling using LDA
    5. Summary
    6. Exercises
    7. Join our book’s Discord space
  10. Recognizing Faces with Support Vector Machine
    1. Finding the separating boundary with SVM
      1. Scenario 1 – identifying a separating hyperplane
      2. Scenario 2 – determining the optimal hyperplane
      3. Scenario 3 – handling outliers
      4. Implementing SVM
      5. Scenario 4 – dealing with more than two classes
        1. One-vs-rest
        2. One-vs-one
        3. Multiclass cases in scikit-learn
      6. Scenario 5 – solving linearly non-separable problems with kernels
      7. Choosing between linear and RBF kernels
    2. Classifying face images with SVM
      1. Exploring the face image dataset
      2. Building an SVM-based image classifier
      3. Boosting image classification performance with PCA
    3. Estimating with support vector regression
      1. Implementing SVR
    4. Summary
    5. Exercises
    6. Join our book’s Discord space
  11. Machine Learning Best Practices
    1. Machine learning solution workflow
    2. Best practices in the data preparation stage
      1. Best practice 1 – Completely understanding the project goal
      2. Best practice 2 – Collecting all fields that are relevant
      3. Best practice 3 – Maintaining the consistency and normalization of field values
      4. Best practice 4 – Dealing with missing data
      5. Best practice 5 – Storing large-scale data
    3. Best practices in the training set generation stage
      1. Best practice 6 – Identifying categorical features with numerical values
      2. Best practice 7 – Deciding whether to encode categorical features
      3. Best practice 8 – Deciding whether to select features and, if so, how to do so
      4. Best practice 9 – Deciding whether to reduce dimensionality and, if so, how to do so
      5. Best practice 10 – Deciding whether to rescale features
      6. Best practice 11 – Performing feature engineering with domain expertise
      7. Best practice 12 – Performing feature engineering without domain expertise
        1. Binarization and discretization
        2. Interaction
        3. Polynomial transformation
      8. Best practice 13 – Documenting how each feature is generated
      9. Best practice 14 – Extracting features from text data
        1. tf and tf-idf
        2. Word embedding
        3. Word2Vec embedding
    4. Best practices in the model training, evaluation, and selection stage
      1. Best practice 15 – Choosing the right algorithm(s) to start with
        1. Naïve Bayes
        2. Logistic regression
        3. SVM
        4. Random forest (or decision tree)
        5. Neural networks
      2. Best practice 16 – Reducing overfitting
      3. Best practice 17 – Diagnosing overfitting and underfitting
      4. Best practice 18 – Modeling on large-scale datasets
    5. Best practices in the deployment and monitoring stage
      1. Best practice 19 – Saving, loading, and reusing models
        1. Saving and restoring models using pickle
        2. Saving and restoring models in TensorFlow
        3. Saving and restoring models in PyTorch
      2. Best practice 20 – Monitoring model performance
      3. Best practice 21 – Updating models regularly
    6. Summary
    7. Exercises
    8. Join our book’s Discord space
  12. Categorizing Images of Clothing with Convolutional Neural Networks
    1. Getting started with CNN building blocks
      1. The convolutional layer
      2. The non-linear layer
      3. The pooling layer
    2. Architecting a CNN for classification
    3. Exploring the clothing image dataset
    4. Classifying clothing images with CNNs
      1. Architecting the CNN model
      2. Fitting the CNN model
      3. Visualizing the convolutional filters
    5. Boosting the CNN classifier with data augmentation
      1. Flipping for data augmentation
      2. Rotation for data augmentation
      3. Cropping for data augmentation
    6. Improving the clothing image classifier with data augmentation
    7. Advancing the CNN classifier with transfer learning
      1. Development of CNN architectures and pretrained models
      2. Improving the clothing image classifier by fine-tuning ResNets
    8. Summary
    9. Exercises
    10. Join our book’s Discord space
  13. Making Predictions with Sequences Using Recurrent Neural Networks
    1. Introducing sequential learning
    2. Learning the RNN architecture by example
      1. Recurrent mechanism
      2. Many-to-one RNNs
      3. One-to-many RNNs
      4. Many-to-many (synced) RNNs
      5. Many-to-many (unsynced) RNNs
    3. Training an RNN model
    4. Overcoming long-term dependencies with LSTM
    5. Analyzing movie review sentiment with RNNs
      1. Analyzing and preprocessing the data
      2. Building a simple LSTM network
      3. Stacking multiple LSTM layers
    6. Revisiting stock price forecasting with LSTM
    7. Writing your own War and Peace with RNNs
      1. Acquiring and analyzing the training data
      2. Constructing the training set for the RNN text generator
      3. Building and training an RNN text generator
    8. Summary
    9. Exercises
    10. Join our book’s Discord space
  14. Advancing Language Understanding and Generation with the Transformer Models
    1. Understanding self-attention
      1. Key, value, and query representations
        1. Attention score calculation and embedding vector generation
      2. Multi-head attention
    2. Exploring the Transformer’s architecture
      1. The encoder-decoder structure
      2. Positional encoding
      3. Layer normalization
    3. Improving sentiment analysis with BERT and Transformers
      1. Pre-training BERT
        1. MLM
        2. NSP
      2. Fine-tuning of BERT
      3. Fine-tuning a pre-trained BERT model for sentiment analysis
      4. Using the Trainer API to train Transformer models
    4. Generating text using GPT
      1. Pre-training of GPT and autoregressive generation
      2. Writing your own version of War and Peace with GPT
    5. Summary
    6. Exercises
    7. Join our book’s Discord space
  15. Building an Image Search Engine Using CLIP: a Multimodal Approach
    1. Introducing the CLIP model
      1. Understanding the mechanism of the CLIP model
        1. Vision encoder
        2. Text encoder
        3. Contrastive learning
      2. Exploring applications of the CLIP model
        1. Zero-shot image classification
        2. Zero-shot text classification
        3. Image and text retrieval
        4. Image and text generation
        5. Transfer learning
    2. Getting started with the dataset
      1. Obtaining the Flickr8k dataset
      2. Loading the Flickr8k dataset
      3. Architecting the CLIP model
        1. Vision encoder
        2. Text encoder
      4. Projection head for contrastive learning
        1. CLIP model
    3. Finding images with words
      1. Training a CLIP model
      2. Obtaining embeddings for images and text to identify matches
      3. Image search using the pre-trained CLIP model
      4. Zero-shot classification
    4. Summary
    5. Exercises
    6. References
    7. Join our book’s Discord space
  16. Making Decisions in Complex Environments with Reinforcement Learning
    1. Setting up the working environment
    2. Introducing OpenAI Gym and Gymnasium
      1. Installing Gymnasium
    3. Introducing reinforcement learning with examples
      1. Elements of reinforcement learning
      2. Cumulative rewards
      3. Approaches to reinforcement learning
        1. Policy-based approach
        2. Value-based approach
    4. Solving the FrozenLake environment with dynamic programming
      1. Simulating the FrozenLake environment
      2. Solving FrozenLake with the value iteration algorithm
      3. Solving FrozenLake with the policy iteration algorithm
    5. Performing Monte Carlo learning
      1. Simulating the Blackjack environment
      2. Performing Monte Carlo policy evaluation
      3. Performing on-policy Monte Carlo control
    6. Solving the Blackjack problem with the Q-learning algorithm
      1. Introducing the Q-learning algorithm
      2. Developing the Q-learning algorithm
    7. Summary
    8. Exercises
    9. Join our book’s Discord space
  17. Other Books You May Enjoy
  18. Index

Product information

  • Title: Python Machine Learning By Example - Fourth Edition
  • Author(s): Yuxi Liu
  • Release date: July 2024
  • Publisher(s): Packt Publishing
  • ISBN: 9781835085622