Machine Learning with Spark and Python, 2nd Edition

Book description

Machine Learning with Spark and Python: Essential Techniques for Predictive Analytics, Second Edition simplifies ML for practical uses by focusing on two key algorithms. This second edition adds Spark, an ML framework from the Apache Software Foundation. By using Spark, machine learning students can easily process much larger data sets and call the Spark algorithms using ordinary Python code.
 
Machine Learning with Spark and Python focuses on two algorithm families (linear methods and ensemble methods) that effectively predict outcomes. Prediction problems of this type cover many use cases, such as choosing what ad to place on a web page, predicting prices in securities markets, or detecting credit card fraud. The focus on two families gives enough room for full descriptions of the mechanisms at work in the algorithms. The code examples then illustrate the workings of the machinery with specific, hackable code.

Table of contents

  1. Cover
  2. Introduction
    1. Who This Book Is For
    2. What This Book Covers
    3. What Has Changed Since the First Edition
    4. How This Book Is Structured
    5. What You Need to Use This Book
    6. Reader Support for This Book
  3. CHAPTER 1: The Two Essential Algorithms for Making Predictions
    1. Why Are These Two Algorithms So Useful?
    2. What Are Penalized Regression Methods?
    3. What Are Ensemble Methods?
    4. How to Decide Which Algorithm to Use
    5. The Process Steps for Building a Predictive Model
    6. Chapter Contents and Dependencies
    7. Summary
    8. References
  4. CHAPTER 2: Understand the Problem by Understanding the Data
    1. The Anatomy of a New Problem
    2. Classification Problems: Detecting Unexploded Mines Using Sonar
    3. Visualizing Properties of the Rocks Versus Mines Data Set
    4. Real-Valued Predictions with Factor Variables: How Old Is Your Abalone?
    5. Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes
    6. Multiclass Classification Problem: What Type of Glass Is That?
    7. Using PySpark to Understand Large Data Sets
    8. Summary
    9. Reference
  5. CHAPTER 3: Predictive Model Building: Balancing Performance, Complexity, and Big Data
    1. The Basic Problem: Understanding Function Approximation
    2. Factors Driving Algorithm Choices and Performance—Complexity and Data
    3. Measuring the Performance of Predictive Models
    4. Achieving Harmony between Model and Data
    5. Using PySpark for Training Penalized Regression Models on Extremely Large Data Sets
    6. Summary
    7. Reference
  6. CHAPTER 4: Penalized Linear Regression
    1. Why Penalized Linear Regression Methods Are So Useful
    2. Penalized Linear Regression: Regulating Linear Regression for Optimum Performance
    3. Solving the Penalized Linear Regression Problem
    4. Extension of Linear Regression to Classification Problems
    5. Summary
    6. References
  7. CHAPTER 5: Building Predictive Models Using Penalized Linear Methods
    1. Python Packages for Penalized Linear Regression
    2. Multivariable Regression: Predicting Wine Taste
    3. Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines
    4. Multiclass Classification: Classifying Crime Scene Glass Samples
    5. Linear Regression and Classification Using PySpark
    6. Using PySpark to Predict Wine Taste
    7. Logistic Regression with PySpark: Rocks Versus Mines
    8. Incorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings
    9. Multiclass Logistic Regression with Meta Parameter Optimization
    10. Summary
    11. References
  8. CHAPTER 6: Ensemble Methods
    1. Binary Decision Trees
    2. Bootstrap Aggregation: “Bagging”
    3. Gradient Boosting
    4. Random Forests
    5. Summary
    6. References
  9. CHAPTER 7: Building Ensemble Models with Python
    1. Solving Regression Problems with Python Ensemble Packages
    2. Incorporating Non-Numeric Attributes in Python Ensemble Models
    3. Solving Binary Classification Problems with Python Ensemble Methods
    4. Solving Multiclass Classification Problems with Python Ensemble Methods
    5. Solving Regression Problems with PySpark Ensemble Packages
    6. Summary
    7. References
  10. Index
  11. End User License Agreement

Product information

  • Title: Machine Learning with Spark and Python, 2nd Edition
  • Author(s): Michael Bowles
  • Release date: November 2019
  • Publisher(s): Wiley
  • ISBN: 9781119561934