Book description
Master the art of building analytical models using R
About This Book
- Load, wrangle, and analyze your data using the world's most powerful statistical programming language
- Build and customize stunning, publication-quality visualizations with R graphs
- Develop key skills and techniques with R to create and customize data mining algorithms
- Use R to optimize your trading strategy and build up your own risk management system
- Discover how to build machine learning algorithms, prepare data, and dig deep into data prediction techniques with R
Who This Book Is For
This course is for data scientists and quantitative analysts who want to learn R and take advantage of its powerful analytical design framework. It's a seamless journey to becoming a full-stack R developer.
What You Will Learn
- Describe and visualize the behavior of data and relationships between data
- Gain a thorough understanding of statistical reasoning and sampling
- Handle missing data gracefully using multiple imputation
- Create diverse types of bar charts using the default R functions
- Familiarize yourself with algorithms written in R for spatial data mining, text mining, and so on
- Understand relationships between market factors and their impact on your portfolio
- Harness the power of R to build machine learning algorithms with real-world data science applications
- Learn specialized machine learning techniques for text mining, big data, and more
In Detail
The R learning path created for you has five connected modules, each of which is a mini-course in its own right. As you complete each one, you'll gain key skills and be ready for the material in the next module!
This course begins with the Data Analysis with R module, which will help you navigate the R environment. You'll gain a thorough understanding of statistical reasoning and sampling. Finally, you'll be able to put best practices into effect to make your job easier and facilitate reproducibility.
The second module, R Graphs, will help you leverage powerful default R graphics and use advanced graphics systems such as lattice and ggplot2 (an implementation of the grammar of graphics). You'll learn how to produce, customize, and publish advanced visualizations using these popular and powerful frameworks.
With the third module, Learning Data Mining with R, you will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, associations, and correlations while working with R programs.
The Mastering R for Quantitative Finance module pragmatically introduces both quantitative finance concepts and their modeling in R, enabling you to build a tailor-made trading system on your own. By the end of the module, you will be well-versed in various financial techniques using R and will be able to place good bets when making financial decisions.
Finally, the Machine Learning with R module will help you discover all the analytical tools you need to gain insights from complex data and learn how to choose the correct algorithm for your specific needs. You'll also learn to apply machine learning methods to common tasks, including classification, prediction, and forecasting.
Style and approach
Learn data analysis, data visualization, data mining, and machine learning, all using R, and learn to build quantitative finance models with this powerful language.
Table of contents
R: Data Analysis and Visualization
I. Module 1: Data Analysis with R
- 1. RefresheR
- 2. The Shape of Data
- 3. Describing Relationships
- 4. Probability
- 5. Using Data to Reason About the World
- 6. Testing Hypotheses
- 7. Bayesian Methods
- 8. Predicting Continuous Variables
- 9. Predicting Categorical Variables
- 10. Sources of Data
- 11. Dealing with Messy Data
- 12. Dealing with Large Data
- 13. Reproducibility and Best Practices
II. Module 2: R Graphs
- 1. R Graphics
2. Basic Graph Functions
- Introduction
- Creating basic scatter plots
- Creating line graphs
- Creating bar charts
- Creating histograms and density plots
- Creating box plots
- Adjusting x and y axes' limits
- Creating heat maps
- Creating pairs plots
- Creating multiple plot matrix layouts
- Adding and formatting legends
- Creating graphs with maps
- Saving and exporting graphs
3. Beyond the Basics – Adjusting Key Parameters
- Introduction
- Setting colors of points, lines, and bars
- Setting plot background colors
- Setting colors for text elements – axis annotations, labels, plot titles, and legends
- Choosing color combinations and palettes
- Setting fonts for annotations and titles
- Choosing plotting point symbol styles and sizes
- Choosing line styles and width
- Choosing box styles
- Adjusting axis annotations and tick marks
- Formatting log axes
- Setting graph margins and dimensions
4. Creating Scatter Plots
- Introduction
- Grouping data points within a scatter plot
- Highlighting grouped data points by size and symbol type
- Labeling data points
- Correlation matrix using pairs plots
- Adding error bars
- Using jitter to distinguish closely packed data points
- Adding linear model lines
- Adding nonlinear model curves
- Adding nonparametric model curves with lowess
- Creating three-dimensional scatter plots
- Creating Quantile-Quantile plots
- Displaying the data density on axes
- Creating scatter plots with a smoothed density representation
5. Creating Line Graphs and Time Series Charts
- Introduction
- Adding customized legends for multiple-line graphs
- Using margin labels instead of legends for multiple-line graphs
- Adding horizontal and vertical grid lines
- Adding marker lines at specific x and y values using abline
- Creating sparklines
- Plotting functions of a variable in a dataset
- Formatting time series data for plotting
- Plotting the date or time variable on the x axis
- Annotating axis labels in different human-readable time formats
- Adding vertical markers to indicate specific time events
- Plotting data with varying time-averaging periods
- Creating stock charts
6. Creating Bar, Dot, and Pie Charts
- Introduction
- Creating bar charts with more than one factor variable
- Creating stacked bar charts
- Adjusting the orientation of bars – horizontal and vertical
- Adjusting bar widths, spacing, colors, and borders
- Displaying values on top of or next to the bars
- Placing labels inside bars
- Creating bar charts with vertical error bars
- Modifying dot charts by grouping variables
- Making better, readable pie charts with clockwise-ordered slices
- Labeling a pie chart with percentage values for each slice
- Adding a legend to a pie chart
7. Creating Histograms
- Introduction
- Visualizing distributions as count frequencies or probability densities
- Setting the bin size and the number of breaks
- Adjusting histogram styles – bar colors, borders, and axes
- Overlaying a density line over a histogram
- Multiple histograms along the diagonal of a pairs plot
- Histograms in the margins of line and scatter plots
8. Box and Whisker Plots
- Introduction
- Creating box plots with narrow boxes for a small number of variables
- Grouping over a variable
- Varying box widths by the number of observations
- Creating box plots with notches
- Including or excluding outliers
- Creating horizontal box plots
- Changing the box styling
- Adjusting the extent of plot whiskers outside the box
- Showing the number of observations
- Splitting a variable at arbitrary values into subsets
- 9. Creating Heat Maps and Contour Plots
- 10. Creating Maps
11. Data Visualization Using Lattice
- Introduction
- Creating bar charts
- Creating stacked bar charts
- Creating bar charts to visualize cross-tabulation
- Creating a conditional histogram
- Visualizing distributions through a kernel-density plot
- Creating a normal Q-Q plot
- Visualizing an empirical Cumulative Distribution Function
- Creating a boxplot
- Creating a conditional scatter plot
- 12. Data Visualization Using ggplot2
- 13. Inspecting Large Datasets
- 14. Three-dimensional Visualizations
15. Finalizing Graphs for Publications and Presentations
- Introduction
- Exporting graphs in high-resolution image formats – PNG, JPEG, BMP, and TIFF
- Exporting graphs in vector formats – SVG, PDF, and PS
- Adding mathematical and scientific notations (typesetting)
- Adding text descriptions to graphs
- Using graph templates
- Choosing font families and styles under Windows, Mac OS X, and Linux
- Choosing fonts for PostScripts and PDFs
III. Module 3: Learning Data Mining with R
- 1. Warming Up
2. Mining Frequent Patterns, Associations, and Correlations
- An overview of associations and patterns
- Market basket analysis
- Hybrid association rules mining
- Mining sequence dataset
- The R implementation
- High-performance algorithms
3. Classification
- Classification
- Generic decision tree induction
- High-value credit card customers classification using ID3
- Web spam detection using C4.5
- Web key resource page judgment using CART
- Trojan traffic identification method and Bayes classification
- Identify spam e-mail and Naïve Bayes classification
- Rule-based classification of player types in computer games and rule-based classification
- 4. Advanced Classification
- 5. Cluster Analysis
6. Advanced Cluster Analysis
- Customer categorization analysis of e-commerce and DBSCAN
- Clustering web pages and OPTICS
- Visitor analysis in the browser cache and DENCLUE
- Recommendation system and STING
- Web sentiment analysis and CLIQUE
- Opinion mining and WAVE clustering
- User search intent and the EM algorithm
- Customer purchase data analysis and clustering high-dimensional data
- SNS and clustering graph and network data
7. Outlier Detection
- Credit card fraud detection and statistical methods
- Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
- Intrusion detection and density-based methods
- Intrusion detection and clustering-based methods
- Monitoring the performance of the web server and classification-based methods
- Detecting novelty in text, topic detection, and mining contextual outliers
- Collective outliers on spatial data
- Outlier detection in high-dimensional data
- 8. Mining Stream, Time-series, and Sequence Data
- 9. Graph Mining and Network Analysis
- 10. Mining Text and Web Data
IV. Module 4: Mastering R for Quantitative Finance
- 1. Time Series Analysis
- 2. Factor Models
- 3. Forecasting Volume
- 4. Big Data – Advanced Analytics
- 5. FX Derivatives
- 6. Interest Rate Derivatives and Models
7. Exotic Options
- A general pricing approach
- The role of dynamic hedging
- How R can help a lot
- A glance beyond vanillas
- Greeks – the link back to the vanilla world
- Pricing the Double-no-touch option
- Another way to price the Double-no-touch option
- The life of a Double-no-touch option – a simulation
- Exotic options embedded in structured products
- References
- 8. Optimal Hedging
- 9. Fundamental Analysis
- 10. Technical Analysis, Neural Networks, and Logoptimal Portfolios
- 11. Asset and Liability Management
- 12. Capital Adequacy
- 13. Systemic Risks
V. Module 5: Machine Learning with R
- 1. Introducing Machine Learning
2. Managing and Understanding Data
- R data structures
- Managing data with R
Exploring and understanding data
- Exploring the structure of data
Exploring numeric variables
- Measuring the central tendency – mean and median
- Measuring spread – quartiles and the five-number summary
- Visualizing numeric variables – boxplots
- Visualizing numeric variables – histograms
- Understanding numeric data – uniform and normal distributions
- Measuring spread – variance and standard deviation
- Exploring categorical variables
- Exploring relationships between variables
- 3. Lazy Learning – Classification Using Nearest Neighbors
- 4. Probabilistic Learning – Classification Using Naive Bayes
5. Divide and Conquer – Classification Using Decision Trees and Rules
- Understanding decision trees
- Example – identifying risky bank loans using C5.0 decision trees
- Understanding classification rules
- Example – identifying poisonous mushrooms with rule learners
6. Forecasting Numeric Data – Regression Methods
- Understanding regression
- Example – predicting medical expenses using linear regression
- Understanding regression trees and model trees
- Example – estimating the quality of wines with regression trees and model trees
- 7. Black Box Methods – Neural Networks and Support Vector Machines
- 8. Finding Patterns – Market Basket Analysis Using Association Rules
- 9. Finding Groups of Data – Clustering with k-means
- 10. Evaluating Model Performance
- 11. Improving Model Performance
- 12. Specialized Machine Learning Topics
A. Reflect and Test Yourself Answers
Module 1: Data Analysis with R
- Chapter 1: RefresheR
- Chapter 2: The Shape of Data
- Chapter 3: Describing Relationships
- Chapter 4: Probability
- Chapter 5: Using Data to Reason About the World
- Chapter 6: Testing Hypotheses
- Chapter 7: Bayesian Methods
- Chapter 8: Predicting Continuous Variables
- Chapter 9: Predicting Categorical Variables
- Chapter 10: Sources of Data
- Chapter 11: Dealing with Messy Data
- Chapter 12: Dealing with Large Data
Module 2: R Graphs
- Chapter 1: R Graphics
- Chapter 2: Basic Graph Functions
- Chapter 3: Beyond the Basics – Adjusting Key Parameters
- Chapter 4: Creating Scatter Plots
- Chapter 5: Creating Line Graphs and Time Series Charts
- Chapter 6: Creating Bar, Dot, and Pie Charts
- Chapter 7: Creating Histograms
- Chapter 8: Box and Whisker Plots
- Chapter 9: Creating Heat Maps and Contour Plots
Module 4: Mastering R for Quantitative Finance
Module 5: Machine Learning with R
- Chapter 1: Introducing Machine Learning
- Chapter 2: Managing and Understanding Data
- Chapter 3: Lazy Learning – Classification Using Nearest Neighbors
- Chapter 4: Probabilistic Learning – Classification Using Naive Bayes
- Chapter 5: Divide and Conquer – Classification Using Decision Trees and Rules
- Chapter 6: Forecasting Numeric Data – Regression Methods
- Chapter 7: Black Box Methods – Neural Networks and Support Vector Machines
- Chapter 8: Finding Patterns – Market Basket Analysis Using Association Rules
Module 1: Data Analysis with R
- B. Bibliography
- Index
Product information
- Title: R: Data Analysis and Visualization
- Author(s):
- Release date: June 2016
- Publisher(s): Packt Publishing
- ISBN: 9781786463500