Introduction to Statistical and Machine Learning Methods for Data Science

Book description

Boost your understanding of data science techniques to solve real-world problems

Data science is an exciting, interdisciplinary field that extracts insights from data to solve business problems. This book introduces common data science techniques and methods and shows you how to apply them in real-world case studies. From data preparation and exploration to model assessment and deployment, this book describes every stage of the analytics life cycle, including a comprehensive overview of unsupervised and supervised machine learning techniques. The book guides you through the necessary steps to pick the best techniques and models and then implement those models to successfully address the original business need.

No software is shown in the book, and mathematical details are kept to a minimum, so you can develop an understanding of the fundamentals of data science whatever your background or experience level.

Table of contents

  1. About This Book
  2. About These Authors
  3. Acknowledgments
  4. Foreword
  5. Chapter 1: Introduction to Data Science
    1. Chapter Overview
    2. Data Science
      1. Mathematics and Statistics
      2. Computer Science
      3. Domain Knowledge
      4. Communication and Visualization
      5. Hard and Soft Skills
    3. Data Science Applications
    4. Data Science Lifecycle and the Maturity Framework
      1. Understand the Question
      2. Collect the Data
      3. Explore the Data
      4. Model the Data
      5. Provide an Answer
    5. Advanced Analytics in Data Science
    6. Data Science Practical Examples
      1. Customer Experience
      2. Revenue Optimization
      3. Network Analytics
      4. Data Monetization
    7. Summary
    8. Additional Reading
  6. Chapter 2: Data Exploration and Preparation
    1. Chapter Overview
    2. Introduction to Data Exploration
      1. Nonlinearity
      2. High Cardinality
      3. Unstructured Data
      4. Sparse Data
      5. Outliers
      6. Mis-scaled Input Variables
    3. Introduction to Data Preparation
      1. Representative Sampling
      2. Event-based Sampling
      3. Partitioning
      4. Imputation
      5. Replacement
      6. Transformation
      7. Feature Extraction
      8. Feature Selection
    4. Model Selection
      1. Model Generalization
      2. Bias–Variance Tradeoff
    5. Summary
  7. Chapter 3: Supervised Models – Statistical Approach
    1. Chapter Overview
    2. Classification and Estimation
    3. Linear Regression
      1. Use Case: Customer Value
    4. Logistic Regression
      1. Use Case: Collecting Predictive Model
    5. Decision Tree
      1. Use Case: Subscription Fraud
    6. Summary
  8. Chapter 4: Supervised Models – Machine Learning Approach
    1. Chapter Overview
    2. Supervised Machine Learning Models
    3. Ensemble of Trees
      1. Random Forest
      2. Gradient Boosting
      3. Use Case: Usage Fraud
    4. Neural Network
      1. Use Case: Bad Debt
    5. Summary
  9. Chapter 5: Advanced Topics in Supervised Models
    1. Chapter Overview
    2. Advanced Machine Learning Models and Methods
    3. Support Vector Machines
      1. Use Case: Fraud in Prepaid Subscribers
    4. Factorization Machines
      1. Use Case: Recommender Systems Based on Customer Ratings in Retail
    5. Ensemble Models
      1. Use Case Study: Churn Model for Telecommunications
    6. Two-stage Models
      1. Use Case: Anti-attrition
    7. Summary
    8. Additional Reading
  10. Chapter 6: Unsupervised Models—Structured Data
    1. Chapter Overview
    2. Clustering
    3. Hierarchical Clustering
      1. Use Case: Product Segmentation
    4. Centroid-based Clustering (k-means Clustering)
      1. Use Case: Customer Segmentation
    5. Self-organizing Maps
      1. Use Case Study: Insolvent Behavior
    6. Cluster Evaluation
      1. Cluster Profiling
      2. Additional Topics
    7. Summary
    8. Additional Reading
  11. Chapter 7: Unsupervised Models—Semi Structured Data
    1. Chapter Overview
    2. Association Rules Analysis
      1. Market Basket Analysis
      2. Confidence and Support Measures
      3. Use Case: Product Bundle Example
      4. Expected Confidence and Lift Measures
      5. Association Rules Analysis Evaluation
      6. Use Case: Product Acquisition
    3. Sequence Analysis
      1. Use Case: Next Best Offer
    4. Link Analysis
      1. Use Case: Product Relationships
    5. Path Analysis
      1. Use Case Study: Online Experience
    6. Text Analytics
      1. Use Case Study: Call Center Categorization
    7. Summary
    8. Additional Reading
  12. Chapter 8: Advanced Topics in Unsupervised Models
    1. Chapter Overview
    2. Network Analysis
      1. Network Subgraphs
      2. Network Metrics
      3. Use Case: Social Network Analysis to Reduce Churn in Telecommunications
    3. Network Optimization
      1. Network Algorithms
      2. Use Case: Smart Cities – Improving Commuting Routes
    4. Summary
  13. Chapter 9: Model Assessment and Model Deployment
    1. Chapter Overview
    2. Methods to Evaluate Model Performance
      1. Speed of Training
      2. Speed of Scoring
      3. Business Knowledge
      4. Fit Statistics
      5. Data Splitting
      6. K-fold Cross-validation
      7. Goodness-of-fit Statistics
      8. Confusion Matrix
      9. ROC Curve
      10. Model Evaluation
    3. Model Deployment
      1. Challenger Models
      2. Monitoring
    4. Model Operationalization
    5. Summary

Product information

  • Title: Introduction to Statistical and Machine Learning Methods for Data Science
  • Author(s): Carlos Reis Pinheiro, Mike Patetta
  • Release date: August 2021
  • Publisher(s): SAS Institute
  • ISBN: 9781953329622