Machine Learning with LightGBM and Python

Book description

Take your software to the next level and solve real-world data science problems by building production-ready machine learning solutions using LightGBM and Python

Key Features

  • Get started with LightGBM, a powerful gradient-boosting library for building ML solutions
  • Apply data science processes to real-world problems through case studies
  • Elevate your software by building machine learning solutions on scalable platforms
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Machine Learning with LightGBM and Python is a comprehensive guide to learning the basics of machine learning and progressing to building scalable machine learning systems that are ready for release.

This book will get you acquainted with the high-performance gradient-boosting LightGBM framework and show you how it can be used to solve various machine learning problems to produce highly accurate, robust, and predictive solutions. Starting with simple machine learning models in scikit-learn, you’ll explore the intricacies of gradient boosting machines and LightGBM. You’ll be guided through various case studies to better understand the data science processes and learn how to practically apply your skills to real-world problems. As you progress, you’ll elevate your software engineering skills by learning how to build and integrate scalable machine learning pipelines to process data, train models, and deploy them behind secure APIs using Python tools such as FastAPI.

By the end of this book, you’ll be well equipped to use various state-of-the-art tools that will help you build production-ready systems, including FLAML for AutoML, PostgresML for operating ML pipelines using Postgres, Dask for high-performance distributed training and serving, and AWS SageMaker for creating and running models in the cloud.

What you will learn

  • Get an overview of ML and working with data and models in Python using scikit-learn
  • Explore decision trees, ensemble learning, gradient boosting, DART, and GOSS
  • Master LightGBM and apply it to classification and regression problems
  • Tune and train your models using AutoML with FLAML and Optuna
  • Build ML pipelines in Python to train and deploy models with secure and performant APIs
  • Scale your solutions to production readiness with AWS SageMaker, PostgresML, and Dask

Who this book is for

This book is for software engineers who want to become better machine learning engineers, and for data scientists unfamiliar with LightGBM who want to gain in-depth knowledge of the library. Basic to intermediate Python programming knowledge is required to get started with the book. The book is also an excellent resource for ML veterans, thanks to its strong focus on ML engineering and its up-to-date, thorough coverage of platforms such as AWS SageMaker, PostgresML, and Dask.

Table of contents

  1. Machine Learning with LightGBM and Python
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Conventions used
    6. Get in touch
    7. Share Your Thoughts
    8. Download a free PDF copy of this book
  6. Part 1: Gradient Boosting and LightGBM Fundamentals
  7. Chapter 1: Introducing Machine Learning
    1. Technical requirements
    2. What is machine learning?
      1. Machine learning paradigms
    3. Introducing models, datasets, and supervised learning
      1. Models
      2. Hyperparameters
      3. Datasets
      4. Overfitting and generalization
      5. Supervised learning
      6. Model performance metrics
      7. A modeling example
    4. Decision tree learning
      1. Entropy and information gain
      2. Building a decision tree using C4.5
      3. Overfitting in decision trees
      4. Building decision trees with scikit-learn
      5. Decision tree hyperparameters
    5. Summary
    6. References
  8. Chapter 2: Ensemble Learning – Bagging and Boosting
    1. Technical requirements
    2. Ensemble learning
    3. Bagging and random forests
      1. Random forest
    4. Gradient-boosted decision trees
      1. Gradient descent
      2. Gradient boosting
      3. Gradient-boosted decision tree hyperparameters
      4. Gradient boosting in scikit-learn
    5. Advanced boosting algorithm – DART
    6. Summary
    7. References
  9. Chapter 3: An Overview of LightGBM in Python
    1. Technical requirements
    2. Introducing LightGBM
      1. LightGBM optimizations
      2. Hyperparameters
      3. Limitations of LightGBM
    3. Getting started with LightGBM in Python
      1. LightGBM Python API
      2. LightGBM scikit-learn API
    4. Building LightGBM models
      1. Cross-validation
      2. Parameter optimization
      3. Predicting student academic success
    5. Summary
    6. References
  10. Chapter 4: Comparing LightGBM, XGBoost, and Deep Learning
    1. Technical requirements
    2. An overview of XGBoost
      1. Comparing XGBoost and LightGBM
      2. Python XGBoost example
    3. Deep learning and TabTransformers
      1. What is deep learning?
      2. Introducing TabTransformers
    4. Comparing LightGBM, XGBoost, and TabTransformers
      1. Predicting census income
      2. Detecting credit card fraud
    5. Summary
    6. References
  11. Part 2: Practical Machine Learning with LightGBM
  12. Chapter 5: LightGBM Parameter Optimization with Optuna
    1. Technical requirements
    2. Optuna and optimization algorithms
      1. Introducing Optuna
      2. Optimization algorithms
      3. Pruning strategies
    3. Optimizing LightGBM with Optuna
      1. Advanced Optuna features
    4. Summary
    5. References
  13. Chapter 6: Solving Real-World Data Science Problems with LightGBM
    1. Technical requirements
    2. The data science life cycle
      1. Defining the data science life cycle
    3. Predicting wind turbine power generation with LightGBM
      1. Problem definition
      2. Data collection
      3. Data preparation
      4. EDA
      5. Modeling
      6. Model deployment
      7. Communicating results
    4. Classifying individual credit scores with LightGBM
      1. Problem definition
      2. Data collection
      3. Data preparation
      4. EDA
      5. Modeling
      6. Model deployment and results
    5. Summary
    6. References
  14. Chapter 7: AutoML with LightGBM and FLAML
    1. Technical requirements
    2. Automated machine learning
      1. Automating feature engineering
      2. Automating model selection and tuning
      3. Risks of using AutoML systems
    3. Introducing FLAML
      1. Cost Frugal Optimization
      2. BlendSearch
      3. FLAML limitations
    4. Case study – using FLAML with LightGBM
      1. Feature engineering
      2. FLAML AutoML
      3. Zero-shot AutoML
    5. Summary
    6. References
  15. Part 3: Production-ready Machine Learning with LightGBM
  16. Chapter 8: Machine Learning Pipelines and MLOps with LightGBM
    1. Technical requirements
    2. Introducing machine learning pipelines
      1. Scikit-learn pipelines
    3. Understanding MLOps
    4. Deploying an ML pipeline for customer churn
      1. Building an ML pipeline using scikit-learn
      2. Building an ML API using FastAPI
      3. Containerizing our API
      4. Deploying LightGBM to Google Cloud
    5. Summary
  17. Chapter 9: LightGBM MLOps with AWS SageMaker
    1. Technical requirements
    2. An introduction to AWS and SageMaker
      1. AWS
      2. SageMaker
      3. SageMaker Clarify
    3. Building a LightGBM ML pipeline with Amazon SageMaker
      1. Setting up a SageMaker session
      2. Preprocessing step
      3. Model training and tuning
      4. Evaluation, bias, and explainability
      5. Deploying and monitoring the LightGBM model
      6. Results
    4. Summary
    5. References
  18. Chapter 10: LightGBM Models with PostgresML
    1. Technical requirements
    2. Introducing PostgresML
      1. Latency and round trips
    3. Getting started with PostgresML
      1. Training models
      2. Deploying and prediction
      3. PostgresML dashboard
    4. Case study – customer churn with PostgresML
      1. Data loading and preprocessing
      2. Training and hyperparameter optimization
      3. Predictions
    5. Summary
    6. References
  19. Chapter 11: Distributed and GPU-Based Learning with LightGBM
    1. Technical requirements
    2. Distributed learning with LightGBM and Dask
    3. GPU training for LightGBM
      1. Setting up LightGBM for the GPU
      2. Running LightGBM on the GPU
    4. Summary
    5. References
  20. Index
    1. Why subscribe?
  21. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Machine Learning with LightGBM and Python
  • Author(s): Andrich van Wyk
  • Release date: September 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781800564749