Book description
Perform data analysis with R quickly and efficiently with more than 275 practical recipes in this expanded second edition. The R language provides everything you need to do statistical work, but its structure can be difficult to master. These task-oriented recipes make you productive with R immediately. Solutions range from basic tasks to input and output, general statistics, graphics, and linear regression.
Each recipe addresses a specific problem and includes a discussion that explains the solution and provides insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an intermediate user, this book will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process.
- Create vectors, handle variables, and perform basic functions
- Simplify data input and output
- Tackle data structures such as matrices, lists, factors, and data frames
- Work with probability, probability distributions, and random variables
- Calculate statistics and confidence intervals and perform statistical tests
- Create a variety of graphic displays
- Build statistical models with linear regressions and analysis of variance (ANOVA)
- Explore advanced statistical techniques, such as finding clusters in your data
Table of contents
- Welcome to the R Cookbook, 2nd Edition
1. Getting Started and Getting Help
- 1.1. Downloading and Installing R
- 1.2. Installing RStudio
- 1.3. Starting RStudio
- 1.4. Entering Commands
- 1.5. Exiting from RStudio
- 1.6. Interrupting R
- 1.7. Viewing the Supplied Documentation
- 1.8. Getting Help on a Function
- 1.9. Searching the Supplied Documentation
- 1.10. Getting Help on a Package
- 1.11. Searching the Web for Help
- 1.12. Finding Relevant Functions and Packages
- 1.13. Searching the Mailing Lists
- 1.14. Submitting Questions to Stack Overflow or Elsewhere in the Community
2. Some Basics
- 2.1. Printing Something to the Screen
- 2.2. Setting Variables
- 2.3. Listing Variables
- 2.4. Deleting Variables
- 2.5. Creating a Vector
- 2.6. Computing Basic Statistics
- 2.7. Creating Sequences
- 2.8. Comparing Vectors
- 2.9. Selecting Vector Elements
- 2.10. Performing Vector Arithmetic
- 2.11. Getting Operator Precedence Right
- 2.12. Typing Less and Accomplishing More
- 2.13. Creating a Pipeline of Function Calls
- 2.14. Avoiding Some Common Mistakes
3. Navigating the Software
- 3.1. Getting and Setting the Working Directory
- 3.2. Creating a New RStudio Project
- 3.3. Saving Your Workspace
- 3.4. Viewing Your Command History
- 3.5. Saving the Result of the Previous Command
- 3.6. Displaying Loaded Packages via the Search Path
- 3.7. Viewing the List of Installed Packages
- 3.8. Accessing the Functions in a Package
- 3.9. Accessing Built-in Datasets
- 3.10. Installing Packages from CRAN
- 3.11. Installing a Package from GitHub
- 3.12. Setting or Changing a Default CRAN Mirror
- 3.13. Running a Script
- 3.14. Running a Batch Script
- 3.15. Locating the R Home Directory
- 3.16. Customizing R Startup
- 3.17. Using R and RStudio in the Cloud
4. Input and Output
- 4.1. Entering Data from the Keyboard
- 4.2. Printing Fewer Digits (or More Digits)
- 4.3. Redirecting Output to a File
- 4.4. Listing Files
- 4.5. Dealing with “Cannot Open File” in Windows
- 4.6. Reading Fixed-Width Records
- 4.7. Reading Tabular Data Files
- 4.8. Reading from CSV Files
- 4.9. Writing to CSV Files
- 4.10. Reading Tabular or CSV Data from the Web
- 4.11. Reading Data from Excel
- 4.12. Writing a Data Frame to Excel
- 4.13. Reading Data from a SAS File
- 4.14. Reading Data from HTML Tables
- 4.15. Reading Files with a Complex Structure
- 4.16. Reading from MySQL Databases
- 4.17. Accessing a Database with dbplyr
- 4.18. Saving and Transporting Objects
5. Data Structures
- 5.1. Appending Data to a Vector
- 5.2. Inserting Data into a Vector
- 5.3. Understanding the Recycling Rule
- 5.4. Creating a Factor (Categorical Variable)
- 5.5. Combining Multiple Vectors into One Vector and a Factor
- 5.6. Creating a List
- 5.7. Selecting List Elements by Position
- 5.8. Selecting List Elements by Name
- 5.9. Building a Name/Value Association List
- 5.10. Removing an Element from a List
- 5.11. Flattening a List into a Vector
- 5.12. Removing NULL Elements from a List
- 5.13. Removing List Elements Using a Condition
- 5.14. Initializing a Matrix
- 5.15. Performing Matrix Operations
- 5.16. Giving Descriptive Names to the Rows and Columns of a Matrix
- 5.17. Selecting One Row or Column from a Matrix
- 5.18. Initializing a Data Frame from Column Data
- 5.19. Initializing a Data Frame from Row Data
- 5.20. Appending Rows to a Data Frame
- 5.21. Selecting Data Frame Columns by Position
- 5.22. Selecting Data Frame Columns by Name
- 5.23. Changing the Names of Data Frame Columns
- 5.24. Removing NAs from a Data Frame
- 5.25. Excluding Columns by Name
- 5.26. Combining Two Data Frames
- 5.27. Merging Data Frames by Common Column
- 5.28. Converting One Atomic Value into Another
- 5.29. Converting One Structured Data Type into Another
6. Data Transformations
- 6.1. Applying a Function to Each List Element
- 6.2. Applying a Function to Every Row of a Data Frame
- 6.3. Applying a Function to Every Row of a Matrix
- 6.4. Applying a Function to Every Column
- 6.5. Applying a Function to Parallel Vectors or Lists
- 6.6. Applying a Function to Groups of Data
- 6.7. Creating a New Column Based on Some Condition
7. Strings and Dates
- 7.1. Getting the Length of a String
- 7.2. Concatenating Strings
- 7.3. Extracting Substrings
- 7.4. Splitting a String According to a Delimiter
- 7.5. Replacing Substrings
- 7.6. Generating All Pairwise Combinations of Strings
- 7.7. Getting the Current Date
- 7.8. Converting a String into a Date
- 7.9. Converting a Date into a String
- 7.10. Converting Year, Month, and Day into a Date
- 7.11. Getting the Julian Date
- 7.12. Extracting the Parts of a Date
- 7.13. Creating a Sequence of Dates
8. Probability
- 8.1. Counting the Number of Combinations
- 8.2. Generating Combinations
- 8.3. Generating Random Numbers
- 8.4. Generating Reproducible Random Numbers
- 8.5. Generating a Random Sample
- 8.6. Generating Random Sequences
- 8.7. Randomly Permuting a Vector
- 8.8. Calculating Probabilities for Discrete Distributions
- 8.9. Calculating Probabilities for Continuous Distributions
- 8.10. Converting Probabilities to Quantiles
- 8.11. Plotting a Density Function
9. General Statistics
- 9.1. Summarizing Your Data
- 9.2. Calculating Relative Frequencies
- 9.3. Tabulating Factors and Creating Contingency Tables
- 9.4. Testing Categorical Variables for Independence
- 9.5. Calculating Quantiles (and Quartiles) of a Dataset
- 9.6. Inverting a Quantile
- 9.7. Converting Data to z-Scores
- 9.8. Testing the Mean of a Sample (t-Test)
- 9.9. Forming a Confidence Interval for a Mean
- 9.10. Forming a Confidence Interval for a Median
- 9.11. Testing a Sample Proportion
- 9.12. Forming a Confidence Interval for a Proportion
- 9.13. Testing for Normality
- 9.14. Testing for Runs
- 9.15. Comparing the Means of Two Samples
- 9.16. Comparing the Locations of Two Samples Nonparametrically
- 9.17. Testing a Correlation for Significance
- 9.18. Testing Groups for Equal Proportions
- 9.19. Performing Pairwise Comparisons Between Group Means
- 9.20. Testing Two Samples for the Same Distribution
10. Graphics
- 10.1. Creating a Scatter Plot
- 10.2. Adding a Title and Labels
- 10.3. Adding (or Removing) a Grid
- 10.4. Applying a Theme to a ggplot Figure
- 10.5. Creating a Scatter Plot of Multiple Groups
- 10.6. Adding (or Removing) a Legend
- 10.7. Plotting the Regression Line of a Scatter Plot
- 10.8. Plotting All Variables Against All Other Variables
- 10.9. Creating One Scatter Plot for Each Group
- 10.10. Creating a Bar Chart
- 10.11. Adding Confidence Intervals to a Bar Chart
- 10.12. Coloring a Bar Chart
- 10.13. Plotting a Line from x and y Points
- 10.14. Changing the Type, Width, or Color of a Line
- 10.15. Plotting Multiple Datasets
- 10.16. Adding Vertical or Horizontal Lines
- 10.17. Creating a Boxplot
- 10.18. Creating One Boxplot for Each Factor Level
- 10.19. Creating a Histogram
- 10.20. Adding a Density Estimate to a Histogram
- 10.21. Creating a Normal Quantile–Quantile Plot
- 10.22. Creating Other Quantile–Quantile Plots
- 10.23. Plotting a Variable in Multiple Colors
- 10.24. Graphing a Function
- 10.25. Displaying Several Figures on One Page
- 10.26. Writing Your Plot to a File
11. Linear Regression and ANOVA
- 11.1. Performing Simple Linear Regression
- 11.2. Performing Multiple Linear Regression
- 11.3. Getting Regression Statistics
- 11.4. Understanding the Regression Summary
- 11.5. Performing Linear Regression Without an Intercept
- 11.6. Regressing Only Variables That Highly Correlate with Your Dependent Variable
- 11.7. Performing Linear Regression with Interaction Terms
- 11.8. Selecting the Best Regression Variables
- 11.9. Regressing on a Subset of Your Data
- 11.10. Using an Expression Inside a Regression Formula
- 11.11. Regressing on a Polynomial
- 11.12. Regressing on Transformed Data
- 11.13. Finding the Best Power Transformation (Box–Cox Procedure)
- 11.14. Forming Confidence Intervals for Regression Coefficients
- 11.15. Plotting Regression Residuals
- 11.16. Diagnosing a Linear Regression
- 11.17. Identifying Influential Observations
- 11.18. Testing Residuals for Autocorrelation (Durbin–Watson Test)
- 11.19. Predicting New Values
- 11.20. Forming Prediction Intervals
- 11.21. Performing One-Way ANOVA
- 11.22. Creating an Interaction Plot
- 11.23. Finding Differences Between Means of Groups
- 11.24. Performing Robust ANOVA (Kruskal–Wallis Test)
- 11.25. Comparing Models by Using ANOVA
12. Useful Tricks
- 12.1. Peeking at Your Data
- 12.2. Printing the Result of an Assignment
- 12.3. Summing Rows and Columns
- 12.4. Printing Data in Columns
- 12.5. Binning Your Data
- 12.6. Finding the Position of a Particular Value
- 12.7. Selecting Every nth Element of a Vector
- 12.8. Finding Minimums or Maximums
- 12.9. Generating All Combinations of Several Variables
- 12.10. Flattening a Data Frame
- 12.11. Sorting a Data Frame
- 12.12. Stripping Attributes from a Variable
- 12.13. Revealing the Structure of an Object
- 12.14. Timing Your Code
- 12.15. Suppressing Warnings and Error Messages
- 12.16. Taking Function Arguments from a List
- 12.17. Defining Your Own Binary Operators
- 12.18. Suppressing the Startup Message
- 12.19. Getting and Setting Environment Variables
- 12.20. Use Code Sections
- 12.21. Executing R in Parallel Locally
- 12.22. Executing R in Parallel Remotely
13. Beyond Basic Numerics and Statistics
- 13.1. Minimizing or Maximizing a Single-Parameter Function
- 13.2. Minimizing or Maximizing a Multiparameter Function
- 13.3. Calculating Eigenvalues and Eigenvectors
- 13.4. Performing Principal Component Analysis
- 13.5. Performing Simple Orthogonal Regression
- 13.6. Finding Clusters in Your Data
- 13.7. Predicting a Binary-Valued Variable (Logistic Regression)
- 13.8. Bootstrapping a Statistic
- 13.9. Factor Analysis
14. Time Series Analysis
- 14.1. Representing Time Series Data
- 14.2. Plotting Time Series Data
- 14.3. Extracting the Oldest or Newest Observations
- 14.4. Subsetting a Time Series
- 14.5. Merging Several Time Series
- 14.6. Filling or Padding a Time Series
- 14.7. Lagging a Time Series
- 14.8. Computing Successive Differences
- 14.9. Performing Calculations on Time Series
- 14.10. Computing a Moving Average
- 14.11. Applying a Function by Calendar Period
- 14.12. Applying a Rolling Function
- 14.13. Plotting the Autocorrelation Function
- 14.14. Testing a Time Series for Autocorrelation
- 14.15. Plotting the Partial Autocorrelation Function
- 14.16. Finding Lagged Correlations Between Two Time Series
- 14.17. Detrending a Time Series
- 14.18. Fitting an ARIMA Model
- 14.19. Removing Insignificant ARIMA Coefficients
- 14.20. Running Diagnostics on an ARIMA Model
- 14.21. Making Forecasts from an ARIMA Model
- 14.22. Plotting a Forecast
- 14.23. Testing for Mean Reversion
- 14.24. Smoothing a Time Series
15. Simple Programming
- 15.1. Choosing Between Two Alternatives: if/else
- 15.2. Iterating with a Loop
- 15.3. Defining a Function
- 15.4. Creating a Local Variable
- 15.5. Choosing Between Multiple Alternatives: switch
- 15.6. Defining Defaults for Function Parameters
- 15.7. Signaling Errors
- 15.8. Protecting Against Errors
- 15.9. Creating an Anonymous Function
- 15.10. Creating a Collection of Reusable Functions
- 15.11. Automatically Reindenting Code
16. R Markdown and Publishing
- 16.1. Creating a New Document
- 16.2. Adding a Title, Author, or Date
- 16.3. Formatting Document Text
- 16.4. Inserting Document Headings
- 16.5. Inserting a List
- 16.6. Showing Output from R Code
- 16.7. Controlling Which Code and Results Are Shown
- 16.8. Inserting a Plot
- 16.9. Inserting a Table
- 16.10. Inserting a Table of Data
- 16.11. Inserting Math Equations
- 16.12. Generating HTML Output
- 16.13. Generating PDF Output
- 16.14. Generating Microsoft Word Output
- 16.15. Generating Presentation Output
- 16.16. Creating a Parameterized Report
- 16.17. Organizing Your R Markdown Workflow
- Index
Product information
- Title: R Cookbook, 2nd Edition
- Author(s):
- Release date: June 2019
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492040682