Book description
Explore over 110 recipes to analyze data and build predictive models with simple and easy-to-use R code
About This Book
- Apply R to simplify predictive modeling with short and simple code
- Use machine learning to solve problems ranging from small to big data
- Build training and testing datasets and apply different classification methods
Who This Book Is For
This book is for data science professionals, data analysts, and anyone who has used R for data analysis and machine learning and now wishes to become the go-to person for machine learning with R. Those who want to improve the efficiency of their machine learning models and need to work with different kinds of data sets will find this book insightful.
What You Will Learn
- Create and inspect transaction datasets and perform association analysis with the Apriori algorithm
- Visualize patterns and associations using a range of graphs and find frequent itemsets using the Eclat algorithm
- Compare regression methods to discover how each one solves problems
- Detect and impute missing values in air quality data
- Predict which users are likely to churn using a classification approach
- Plot the autocorrelation function with time series analysis
- Use the Cox proportional hazards model for survival analysis
- Implement the clustering method to segment customer data
- Compress images with the dimension reduction method
- Incorporate R and Hadoop to solve machine learning problems on big data
In Detail
Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Extracting useful information from data is another, much more challenging task.
Machine Learning with R Cookbook, Second Edition uses a practical approach to teach you how to perform machine learning with R. Each chapter is divided into several simple recipes. Through the step-by-step instructions provided in each recipe, you will be able to construct a predictive model by using a variety of machine learning packages.
In this book, you will first learn to set up the R environment and use simple R commands to explore data. The next topic covers how to perform statistical analysis alongside machine learning analysis and how to assess the resulting models, subjects covered in detail later in the book. You'll also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide all the information required to start applying machine learning to individual projects. With Machine Learning with R Cookbook, machine learning has never been easier.
Style and approach
This is an easy-to-follow guide packed with hands-on examples of machine learning tasks. Each topic includes step-by-step instructions on tackling difficulties faced when applying R to machine learning.
Table of contents
- Preface
Practical Machine Learning with R
- Introduction
- Downloading and installing R
- Downloading and installing RStudio
- Installing and loading packages
- Understanding of basic data structures
- Basic commands for subsetting
- Reading and writing data
- Manipulating data
- Applying basic statistics
- Visualizing data
- Getting a dataset for machine learning
- Data Exploration with Air Quality Datasets
- Analyzing Time Series Data
R and Statistics
- Introduction
- Understanding data sampling in R
- Operating a probability distribution in R
- Working with univariate descriptive statistics in R
- Performing correlations and multivariate analysis
- Conducting an exact binomial test
- Performing a student's t-test
- Performing the Kolmogorov-Smirnov test
- Understanding the Wilcoxon Rank Sum and Signed Rank test
- Working with Pearson's Chi-squared test
- Conducting a one-way ANOVA
- Performing a two-way ANOVA
Understanding Regression Analysis
- Introduction
- Different types of regression
- Fitting a linear regression model with lm
- Summarizing linear model fits
- Using linear regression to predict unknown values
- Generating a diagnostic plot of a fitted model
- Fitting multiple regression
- Summarizing multiple regression
- Using multiple regression to predict unknown values
- Fitting a polynomial regression model with lm
- Fitting a robust linear regression model with rlm
- Studying a case of linear regression on SLID data
- Applying the Gaussian model for generalized linear regression
- Applying the Poisson model for generalized linear regression
- Applying the Binomial model for generalized linear regression
- Fitting a generalized additive model to data
- Visualizing a generalized additive model
- Diagnosing a generalized additive model
- Survival Analysis
Classification 1 - Tree, Lazy, and Probabilistic
- Introduction
- Preparing the training and testing datasets
- Building a classification model with recursive partitioning trees
- Visualizing a recursive partitioning tree
- Measuring the prediction performance of a recursive partitioning tree
- Pruning a recursive partitioning tree
- Handling missing data and split and surrogate variables
- Building a classification model with a conditional inference tree
- Control parameters in conditional inference trees
- Visualizing a conditional inference tree
- Measuring the prediction performance of a conditional inference tree
- Classifying data with the k-nearest neighbor classifier
- Classifying data with logistic regression
- Classifying data with the Naïve Bayes classifier
Classification 2 - Neural Network and SVM
- Introduction
- Classifying data with a support vector machine
- Choosing the cost of a support vector machine
- Visualizing an SVM fit
- Predicting labels based on a model trained by a support vector machine
- Tuning a support vector machine
- The basics of neural network
- Training a neural network with neuralnet
- Visualizing a neural network trained by neuralnet
- Predicting labels based on a model trained by neuralnet
- Training a neural network with nnet
- Predicting labels based on a model trained by nnet
Model Evaluation
- Introduction
- Estimating model performance with k-fold cross-validation
- Estimating model performance with Leave One Out Cross Validation
- Performing cross-validation with the e1071 package
- Performing cross-validation with the caret package
- Ranking the variable importance with the caret package
- Ranking the variable importance with the rminer package
- Finding highly correlated features with the caret package
- Selecting features using the caret package
- Measuring the performance of the regression model
- Measuring prediction performance with a confusion matrix
- Measuring prediction performance using ROCR
- Comparing an ROC curve using the caret package
- Measuring performance differences between models with the caret package
Ensemble Learning
- Introduction
- Using the Super Learner algorithm
- Using ensemble to train and test
- Classifying data with the bagging method
- Performing cross-validation with the bagging method
- Classifying data with the boosting method
- Performing cross-validation with the boosting method
- Classifying data with gradient boosting
- Calculating the margins of a classifier
- Calculating the error evolution of the ensemble method
- Classifying data with random forest
- Estimating the prediction errors of different classifiers
Clustering
- Introduction
- Clustering data with hierarchical clustering
- Cutting trees into clusters
- Clustering data with the k-means method
- Drawing a bivariate cluster plot
- Comparing clustering methods
- Extracting silhouette information from clustering
- Obtaining the optimum number of clusters for k-means
- Clustering data with the density-based method
- Clustering data with the model-based method
- Visualizing a dissimilarity matrix
- Validating clusters externally
Association Analysis and Sequence Mining
- Introduction
- Transforming data into transactions
- Displaying transactions and associations
- Mining associations with the Apriori rule
- Pruning redundant rules
- Visualizing association rules
- Mining frequent itemsets with Eclat
- Creating transactions with temporal information
- Mining frequent sequential patterns with cSPADE
- Using the TraMineR package for sequence analysis
- Visualizing sequence, Chronogram, and Traversal Statistics
Dimension Reduction
- Introduction
- Why to reduce the dimension?
- Performing feature selection with FSelector
- Performing dimension reduction with PCA
- Determining the number of principal components using the scree test
- Determining the number of principal components using the Kaiser method
- Visualizing multivariate data using biplot
- Performing dimension reduction with MDS
- Reducing dimensions with SVD
- Compressing images with SVD
- Performing nonlinear dimension reduction with ISOMAP
- Performing nonlinear dimension reduction with Local Linear Embedding
Big Data Analysis (R and Hadoop)
- Introduction
- Preparing the RHadoop environment
- Installing rmr2
- Installing rhdfs
- Operating HDFS with rhdfs
- Implementing a word count problem with RHadoop
- Comparing the performance between an R MapReduce program and a standard R program
- Testing and debugging the rmr2 program
- Installing plyrmr
- Manipulating data with plyrmr
- Conducting machine learning with RHadoop
- Configuring RHadoop clusters on Amazon EMR
Product information
- Title: Machine Learning with R Cookbook - Second Edition
- Author(s):
- Release date: October 2017
- Publisher(s): Packt Publishing
- ISBN: 9781787284395