Book description
With more than 200 practical recipes, this book helps you perform data analysis with R quickly and efficiently. The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.
Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an experienced data programmer, it will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process.
- Create vectors, handle variables, and perform other basic functions
- Input and output data
- Tackle data structures such as matrices, lists, factors, and data frames
- Work with probability, probability distributions, and random variables
- Calculate statistics and confidence intervals, and perform statistical tests
- Create a variety of graphic displays
- Build statistical models with linear regressions and analysis of variance (ANOVA)
- Explore advanced statistical techniques, such as finding clusters in your data
"Wonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language—one practical example at a time."—Jeffrey Ryan, software consultant and R package author
Table of contents
- R Cookbook
- Preface
- 1. Getting Started and Getting Help
- Introduction
- 1.1. Downloading and Installing R
- 1.2. Starting R
- 1.3. Entering Commands
- 1.4. Exiting from R
- 1.5. Interrupting R
- 1.6. Viewing the Supplied Documentation
- 1.7. Getting Help on a Function
- 1.8. Searching the Supplied Documentation
- 1.9. Getting Help on a Package
- 1.10. Searching the Web for Help
- 1.11. Finding Relevant Functions and Packages
- 1.12. Searching the Mailing Lists
- 1.13. Submitting Questions to the Mailing Lists
- 2. Some Basics
- Introduction
- 2.1. Printing Something
- 2.2. Setting Variables
- 2.3. Listing Variables
- 2.4. Deleting Variables
- 2.5. Creating a Vector
- 2.6. Computing Basic Statistics
- 2.7. Creating Sequences
- 2.8. Comparing Vectors
- 2.9. Selecting Vector Elements
- 2.10. Performing Vector Arithmetic
- 2.11. Getting Operator Precedence Right
- 2.12. Defining a Function
- 2.13. Typing Less and Accomplishing More
- 2.14. Avoiding Some Common Mistakes
- 3. Navigating the Software
- Introduction
- 3.1. Getting and Setting the Working Directory
- 3.2. Saving Your Workspace
- 3.3. Viewing Your Command History
- 3.4. Saving the Result of the Previous Command
- 3.5. Displaying the Search Path
- 3.6. Accessing the Functions in a Package
- 3.7. Accessing Built-in Datasets
- 3.8. Viewing the List of Installed Packages
- 3.9. Installing Packages from CRAN
- 3.10. Setting a Default CRAN Mirror
- 3.11. Suppressing the Startup Message
- 3.12. Running a Script
- 3.13. Running a Batch Script
- 3.14. Getting and Setting Environment Variables
- 3.15. Locating the R Home Directory
- 3.16. Customizing R
- 4. Input and Output
- Introduction
- 4.1. Entering Data from the Keyboard
- 4.2. Printing Fewer Digits (or More Digits)
- 4.3. Redirecting Output to a File
- 4.4. Listing Files
- 4.5. Dealing with “Cannot Open File” in Windows
- 4.6. Reading Fixed-Width Records
- 4.7. Reading Tabular Data Files
- 4.8. Reading from CSV Files
- 4.9. Writing to CSV Files
- 4.10. Reading Tabular or CSV Data from the Web
- 4.11. Reading Data from HTML Tables
- 4.12. Reading Files with a Complex Structure
- 4.13. Reading from MySQL Databases
- 4.14. Saving and Transporting Objects
- 5. Data Structures
- Introduction
- 5.1. Appending Data to a Vector
- 5.2. Inserting Data into a Vector
- 5.3. Understanding the Recycling Rule
- 5.4. Creating a Factor (Categorical Variable)
- 5.5. Combining Multiple Vectors into One Vector and a Factor
- 5.6. Creating a List
- 5.7. Selecting List Elements by Position
- 5.8. Selecting List Elements by Name
- 5.9. Building a Name/Value Association List
- 5.10. Removing an Element from a List
- 5.11. Flatten a List into a Vector
- 5.12. Removing NULL Elements from a List
- 5.13. Removing List Elements Using a Condition
- 5.14. Initializing a Matrix
- 5.15. Performing Matrix Operations
- 5.16. Giving Descriptive Names to the Rows and Columns of a Matrix
- 5.17. Selecting One Row or Column from a Matrix
- 5.18. Initializing a Data Frame from Column Data
- 5.19. Initializing a Data Frame from Row Data
- 5.20. Appending Rows to a Data Frame
- 5.21. Preallocating a Data Frame
- 5.22. Selecting Data Frame Columns by Position
- 5.23. Selecting Data Frame Columns by Name
- 5.24. Selecting Rows and Columns More Easily
- 5.25. Changing the Names of Data Frame Columns
- 5.26. Editing a Data Frame
- 5.27. Removing NAs from a Data Frame
- 5.28. Excluding Columns by Name
- 5.29. Combining Two Data Frames
- 5.30. Merging Data Frames by Common Column
- 5.31. Accessing Data Frame Contents More Easily
- 5.32. Converting One Atomic Value into Another
- 5.33. Converting One Structured Data Type into Another
- 6. Data Transformations
- Introduction
- 6.1. Splitting a Vector into Groups
- 6.2. Applying a Function to Each List Element
- 6.3. Applying a Function to Every Row
- 6.4. Applying a Function to Every Column
- 6.5. Applying a Function to Groups of Data
- 6.6. Applying a Function to Groups of Rows
- 6.7. Applying a Function to Parallel Vectors or Lists
- 7. Strings and Dates
- Introduction
- 7.1. Getting the Length of a String
- 7.2. Concatenating Strings
- 7.3. Extracting Substrings
- 7.4. Splitting a String According to a Delimiter
- 7.5. Replacing Substrings
- 7.6. Seeing the Special Characters in a String
- 7.7. Generating All Pairwise Combinations of Strings
- 7.8. Getting the Current Date
- 7.9. Converting a String into a Date
- 7.10. Converting a Date into a String
- 7.11. Converting Year, Month, and Day into a Date
- 7.12. Getting the Julian Date
- 7.13. Extracting the Parts of a Date
- 7.14. Creating a Sequence of Dates
- 8. Probability
- Introduction
- 8.1. Counting the Number of Combinations
- 8.2. Generating Combinations
- 8.3. Generating Random Numbers
- 8.4. Generating Reproducible Random Numbers
- 8.5. Generating a Random Sample
- 8.6. Generating Random Sequences
- 8.7. Randomly Permuting a Vector
- 8.8. Calculating Probabilities for Discrete Distributions
- 8.9. Calculating Probabilities for Continuous Distributions
- 8.10. Converting Probabilities to Quantiles
- 8.11. Plotting a Density Function
- 9. General Statistics
- Introduction
- 9.1. Summarizing Your Data
- 9.2. Calculating Relative Frequencies
- 9.3. Tabulating Factors and Creating Contingency Tables
- 9.4. Testing Categorical Variables for Independence
- 9.5. Calculating Quantiles (and Quartiles) of a Dataset
- 9.6. Inverting a Quantile
- 9.7. Converting Data to Z-Scores
- 9.8. Testing the Mean of a Sample (t Test)
- 9.9. Forming a Confidence Interval for a Mean
- 9.10. Forming a Confidence Interval for a Median
- 9.11. Testing a Sample Proportion
- 9.12. Forming a Confidence Interval for a Proportion
- 9.13. Testing for Normality
- 9.14. Testing for Runs
- 9.15. Comparing the Means of Two Samples
- 9.16. Comparing the Locations of Two Samples Nonparametrically
- 9.17. Testing a Correlation for Significance
- 9.18. Testing Groups for Equal Proportions
- 9.19. Performing Pairwise Comparisons Between Group Means
- 9.20. Testing Two Samples for the Same Distribution
- 10. Graphics
- Introduction
- 10.1. Creating a Scatter Plot
- 10.2. Adding a Title and Labels
- 10.3. Adding a Grid
- 10.4. Creating a Scatter Plot of Multiple Groups
- 10.5. Adding a Legend
- 10.6. Plotting the Regression Line of a Scatter Plot
- 10.7. Plotting All Variables Against All Other Variables
- 10.8. Creating One Scatter Plot for Each Factor Level
- 10.9. Creating a Bar Chart
- 10.10. Adding Confidence Intervals to a Bar Chart
- 10.11. Coloring a Bar Chart
- 10.12. Plotting a Line from x and y Points
- 10.13. Changing the Type, Width, or Color of a Line
- 10.14. Plotting Multiple Datasets
- 10.15. Adding Vertical or Horizontal Lines
- 10.16. Creating a Box Plot
- 10.17. Creating One Box Plot for Each Factor Level
- 10.18. Creating a Histogram
- 10.19. Adding a Density Estimate to a Histogram
- 10.20. Creating a Discrete Histogram
- 10.21. Creating a Normal Quantile-Quantile (Q-Q) Plot
- 10.22. Creating Other Quantile-Quantile Plots
- 10.23. Plotting a Variable in Multiple Colors
- 10.24. Graphing a Function
- 10.25. Pausing Between Plots
- 10.26. Displaying Several Figures on One Page
- 10.27. Opening Additional Graphics Windows
- 10.28. Writing Your Plot to a File
- 10.29. Changing Graphical Parameters
- 11. Linear Regression and ANOVA
- Introduction
- 11.1. Performing Simple Linear Regression
- 11.2. Performing Multiple Linear Regression
- 11.3. Getting Regression Statistics
- 11.4. Understanding the Regression Summary
- 11.5. Performing Linear Regression Without an Intercept
- 11.6. Performing Linear Regression with Interaction Terms
- 11.7. Selecting the Best Regression Variables
- 11.8. Regressing on a Subset of Your Data
- 11.9. Using an Expression Inside a Regression Formula
- 11.10. Regressing on a Polynomial
- 11.11. Regressing on Transformed Data
- 11.12. Finding the Best Power Transformation (Box–Cox Procedure)
- 11.13. Forming Confidence Intervals for Regression Coefficients
- 11.14. Plotting Regression Residuals
- 11.15. Diagnosing a Linear Regression
- 11.16. Identifying Influential Observations
- 11.17. Testing Residuals for Autocorrelation (Durbin–Watson Test)
- 11.18. Predicting New Values
- 11.19. Forming Prediction Intervals
- 11.20. Performing One-Way ANOVA
- 11.21. Creating an Interaction Plot
- 11.22. Finding Differences Between Means of Groups
- 11.23. Performing Robust ANOVA (Kruskal–Wallis Test)
- 11.24. Comparing Models by Using ANOVA
- 12. Useful Tricks
- Introduction
- 12.1. Peeking at Your Data
- 12.2. Widen Your Output
- 12.3. Printing the Result of an Assignment
- 12.4. Summing Rows and Columns
- 12.5. Printing Data in Columns
- 12.6. Binning Your Data
- 12.7. Finding the Position of a Particular Value
- 12.8. Selecting Every nth Element of a Vector
- 12.9. Finding Pairwise Minimums or Maximums
- 12.10. Generating All Combinations of Several Factors
- 12.11. Flatten a Data Frame
- 12.12. Sorting a Data Frame
- 12.13. Sorting by Two Columns
- 12.14. Stripping Attributes from a Variable
- 12.15. Revealing the Structure of an Object
- 12.16. Timing Your Code
- 12.17. Suppressing Warnings and Error Messages
- 12.18. Taking Function Arguments from a List
- 12.19. Defining Your Own Binary Operators
- 13. Beyond Basic Numerics and Statistics
- Introduction
- 13.1. Minimizing or Maximizing a Single-Parameter Function
- 13.2. Minimizing or Maximizing a Multiparameter Function
- 13.3. Calculating Eigenvalues and Eigenvectors
- 13.4. Performing Principal Component Analysis
- 13.5. Performing Simple Orthogonal Regression
- 13.6. Finding Clusters in Your Data
- 13.7. Predicting a Binary-Valued Variable (Logistic Regression)
- 13.8. Bootstrapping a Statistic
- 13.9. Factor Analysis
- 14. Time Series Analysis
- Introduction
- 14.1. Representing Time Series Data
- 14.2. Plotting Time Series Data
- 14.3. Extracting the Oldest or Newest Observations
- 14.4. Subsetting a Time Series
- 14.5. Merging Several Time Series
- 14.6. Filling or Padding a Time Series
- 14.7. Lagging a Time Series
- 14.8. Computing Successive Differences
- 14.9. Performing Calculations on Time Series
- 14.10. Computing a Moving Average
- 14.11. Applying a Function by Calendar Period
- 14.12. Applying a Rolling Function
- 14.13. Plotting the Autocorrelation Function
- 14.14. Testing a Time Series for Autocorrelation
- 14.15. Plotting the Partial Autocorrelation Function
- 14.16. Finding Lagged Correlations Between Two Time Series
- 14.17. Detrending a Time Series
- 14.18. Fitting an ARIMA Model
- 14.19. Removing Insignificant ARIMA Coefficients
- 14.20. Running Diagnostics on an ARIMA Model
- 14.21. Making Forecasts from an ARIMA Model
- 14.22. Testing for Mean Reversion
- 14.23. Smoothing a Time Series
- Index
- About the Author
- Colophon
- Copyright
Product information
- Title: R Cookbook
- Author(s):
- Release date: March 2011
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9780596809157