Book description
Learn the ropes of supervised machine learning with R by studying popular real-world use cases, and understand how it drives object detection in driverless cars as well as customer churn and loan default prediction.
Key Features
- Study supervised learning algorithms by using real-world datasets
- Fine tune optimal parameters with hyperparameter optimization
- Select the best algorithm using the model evaluation framework
Book Description
R provides excellent visualization features that are essential for exploring data before using it in automated learning.
Applied Supervised Learning with R takes you through the complete process of using R to develop applications based on supervised machine learning algorithms that meet your business needs. The book starts by helping you develop the analytical thinking needed to create a problem statement from business inputs and domain research. You will then learn the evaluation metrics used to compare algorithms, and later use these metrics to select the best algorithm for your problem. After choosing an algorithm, you will study hyperparameter optimization techniques to find the optimal set of parameters. The book also demonstrates how to add different regularization terms to avoid overfitting your model.
By the end of this book, you will have the advanced skills you need to build a supervised machine learning model that precisely fulfills your business needs.
What you will learn
- Develop analytical thinking to precisely identify a business problem
- Wrangle data with dplyr, tidyr, and reshape2
- Visualize data with ggplot2
- Validate your supervised machine learning model using k-fold cross-validation
- Optimize hyperparameters with grid search, random search, and Bayesian optimization
- Deploy your model on Amazon Web Services (AWS) Lambda with plumber
- Improve your model's performance with feature selection and dimensionality reduction
Who this book is for
This book is designed for beginner and intermediate-level data analysts, data scientists, and data engineers who want to explore different methods of supervised machine learning and their use cases. Some background in statistics, probability, calculus, linear algebra, and programming will help you understand and follow the concepts covered in this book.
Table of contents
- Preface
- Chapter 1: R for Advanced Analytics
- Introduction
- Working with Real-World Datasets
- Reading Data from Various Data Formats
- Write R Markdown Files for Code Reproducibility
- Data Structures in R
- DataFrame
- Data Processing and Transformation
- The Apply Family of Functions
- Useful Packages
- The dplyr Package
- Exercise 15: Implementing the dplyr Package
- The tidyr Package
- Exercise 16: Implementing the tidyr Package
- Activity 3: Create a DataFrame with Five Summary Statistics for All Numeric Variables from Bank Data Using dplyr and tidyr
- The plyr Package
- Exercise 17: Exploring the plyr Package
- The caret Package
- Data Visualization
- Line Charts
- Histogram
- Boxplot
- Summary
- Chapter 2: Exploratory Analysis of Data
- Introduction
- Defining the Problem Statement
- Understanding the Science Behind EDA
- Exploratory Data Analysis
- Univariate Analysis
- Exploring Numeric/Continuous Features
- Exercise 19: Visualizing Data Using a Box Plot
- Exercise 20: Visualizing Data Using a Histogram
- Exercise 21: Visualizing Data Using a Density Plot
- Exercise 22: Visualizing Multiple Variables Using a Histogram
- Activity 4: Plotting Multiple Density Plots and Boxplots
- Exercise 23: Plotting a Histogram for the nr.employed, euribor3m, cons.conf.idx, and duration Variables
- Exploring Categorical Features
- Exercise 24: Exploring Categorical Features
- Exercise 25: Exploring Categorical Features Using a Bar Chart
- Exercise 26: Exploring Categorical Features Using a Pie Chart
- Exercise 27: Automate Plotting Categorical Variables
- Exercise 28: Automate Plotting for the Remaining Categorical Variables
- Exercise 29: Exploring the Last Remaining Categorical Variable and the Target Variable
- Bivariate Analysis
- Studying the Relationship between Two Numeric Variables
- Studying the Relationship between a Categorical and a Numeric Variable
- Studying the Relationship Between Two Categorical Variables
- Multivariate Analysis
- Validating Insights Using Statistical Tests
- Categorical Dependent and Numeric/Continuous Independent Variables
- Categorical Dependent and Categorical Independent Variables
- Summary
- Chapter 3: Introduction to Supervised Learning
- Introduction
- Summary of the Beijing PM2.5 Dataset
- Regression and Classification Problems
- Machine Learning Workflow
- Regression
- Exploratory Data Analysis (EDA)
- Exercise 42: Exploring the Time Series Views of PM2.5, DEWP, TEMP, and PRES variables of the Beijing PM2.5 Dataset
- Exercise 43: Undertaking Correlation Analysis
- Exercise 44: Drawing a Scatterplot to Explore the Relationship between PM2.5 Levels and Other Factors
- Activity 5: Draw a Scatterplot between PRES and PM2.5 Split by Months
- Model Building
- Exercise 45: Exploring Simple and Multiple Regression Models
- Model Interpretation
- Classification
- Evaluation Metrics
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- R-squared
- Adjusted R-squared
- Mean Reciprocal Rank (MRR)
- Exercise 47: Finding Evaluation Metrics
- Confusion Matrix-Based Metrics
- Accuracy
- Sensitivity
- Specificity
- F1 Score
- Exercise 48: Working with Model Evaluation on Training Data
- Receiver Operating Characteristic (ROC) Curve
- Exercise 49: Creating an ROC Curve
- Summary
- Chapter 4: Regression
- Introduction
- Linear Regression
- Model Diagnostics
- Residual versus Fitted Plot
- Normal Q-Q Plot
- Scale-Location Plot
- Residual versus Leverage
- Improving the Model
- Quantile Regression
- Polynomial Regression
- Ridge Regression
- LASSO Regression
- Elastic Net Regression
- Poisson Regression
- Cox Proportional-Hazards Regression Model
- NCCTG Lung Cancer Data
- Summary
- Chapter 5: Classification
- Introduction
- Getting Started with the Use Case
- Some Background on the Use Case
- Defining the Problem Statement
- Data Gathering
- Exercise 63: Exploring Data for the Use Case
- Exercise 64: Calculating the Null Value Percentage in All Columns
- Exercise 65: Removing Null Values from the Dataset
- Exercise 66: Engineer Time-Based Features from the Date Variable
- Exercise 67: Exploring the Location Frequency
- Exercise 68: Engineering the New Location with Reduced Levels
- Classification Techniques for Supervised Learning
- Logistic Regression
- How Does Logistic Regression Work?
- Evaluating Classification Models
- What Metric Should You Choose?
- Evaluating Logistic Regression
- Decision Trees
- How Do Decision Trees Work?
- Exercise 72: Create a Decision Tree Model in R
- Activity 9: Create a Decision Tree Model with Additional Control Parameters
- Ensemble Modelling
- Random Forest
- Why Are Ensemble Models Used?
- Bagging – Predecessor to Random Forest
- How Does Random Forest Work?
- Exercise 73: Building a Random Forest Model in R
- Activity 10: Build a Random Forest Model with a Greater Number of Trees
- XGBoost
- Deep Neural Networks
- Choosing the Right Model for Your Use Case
- Summary
- Chapter 6: Feature Selection and Dimensionality Reduction
- Chapter 7: Model Improvements
- Chapter 8: Model Deployment
- Chapter 9: Capstone Project - Based on Research Papers
- Introduction
- Exploring Research Work
- The mlr Package
- Problem Design from the Research Paper
- Features in Scene Dataset
- Implementing a Multilabel Classifier Using the mlr and OpenML Packages
- Constructing a Learner
- Adaptation Methods
- Transformation Methods
- Binary Relevance Method
- Classifier Chains Method
- Nested Stacking
- Dependent Binary Relevance Method
- Stacking
- Exercise 103: Generating a Decision Tree Model Using the classif.rpart Method
- Train the Model
- Exercise 104: Train the Model
- Predicting the Output
- Performance of the Model
- Resampling the Data
- Binary Performance for Each Label
- Benchmarking Model
- Conducting Benchmark Experiments
- Exercise 105: Exploring How to Conduct Benchmarking on Various Learners
- Accessing Benchmark Results
- Learner Performances
- Predictions
- Summary
- Appendix
- Chapter 1: R for Advanced Analytics
- Chapter 2: Exploratory Analysis of Data
- Chapter 3: Introduction to Supervised Learning
- Chapter 4: Regression
- Chapter 5: Classification
- Chapter 6: Feature Selection and Dimensionality Reduction
- Chapter 7: Model Improvements
- Chapter 8: Model Deployment
- Chapter 9: Capstone Project - Based on Research Papers
Product information
- Title: Applied Supervised Learning with R
- Author(s):
- Release date: May 2019
- Publisher(s): Packt Publishing
- ISBN: 9781838556334