Data Wrangling with R

Book description

Take your data wrangling skills to the next level by gaining a deep understanding of tidyverse libraries and learning to prepare your data effectively for analysis.

Purchase of the print or Kindle book includes a free PDF eBook

Key Features

  • Explore state-of-the-art libraries for data wrangling in R and learn to prepare your data for analysis
  • Find out how to work with different data types such as strings, numbers, dates, and times
  • Build your first model and visualize data with ease using ggplot2 and advanced plot types

Book Description

In this information era, where large volumes of data are generated every day, companies want a better grip on that data so they can operate more efficiently. This is where skillful data analysts and data scientists come into play, wrangling and exploring data to generate valuable business insights. To do that, you need tools that let you extract the most useful knowledge from data.

Data Wrangling with R will help you gain a deep understanding of ways to wrangle and prepare datasets for exploration, analysis, and modeling. The book shows you how to get your data ready for analysis, develop your first data model, and visualize data effectively.

The book begins by teaching you how to load and explore datasets. Then, you'll get to grips with the modern concepts and tools of data wrangling. As data wrangling and visualization are intrinsically connected, you'll go over best practices for plotting data and extracting insights from it. The later chapters introduce modeling: you will go through the construction of a data science project from end to end and become familiar with RStudio, including an application built with Shiny.

By the end of this book, you'll have learned how to create your first data model and build an application with Shiny in R.

What you will learn

  • Discover how to load datasets and explore data in R
  • Work with different types of variables in datasets
  • Create basic and advanced visualizations
  • Find out how to build your first data model
  • Create graphics step by step using ggplot2, including inside Microsoft Power BI
  • Get familiar with building an application in R with Shiny

Who this book is for

If you are a professional data analyst, data scientist, or beginner who wants to learn more about data wrangling, this book is for you. Familiarity with the basic concepts of R programming or any other object-oriented programming language will help you to grasp the concepts taught in this book. Data analysts looking to improve their data manipulation and visualization skills will also benefit immensely from this book.

Table of contents

  1. Data Wrangling with R
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Conventions used
    6. Get in touch
    7. Share Your Thoughts
    8. Download a free PDF copy of this book
  6. Part 1: Load and Explore Data
  7. Chapter 1: Fundamentals of Data Wrangling
    1. What is data wrangling?
    2. Why data wrangling?
      1. Benefits
    3. The key steps of data wrangling
      1. Frameworks in Data Science
    4. Summary
    5. Exercises
    6. Further reading
  8. Chapter 2: Loading and Exploring Datasets
    1. Technical requirements
    2. How to load files to RStudio
      1. Loading a CSV file to R
    3. Tibbles versus Data Frames
    4. Saving files
    5. A workflow for data exploration
      1. Loading and viewing
      2. Descriptive statistics
      3. Missing values
      4. Data distributions
      5. Visualizations
    6. Basic Web Scraping
      1. Getting data from an API
    7. Summary
    8. Exercises
    9. Further reading
  9. Chapter 3: Basic Data Visualization
    1. Technical requirements
    2. Data visualization
    3. Creating single-variable plots
      1. Dataset
      2. Boxplots
      3. Density plot
    4. Creating two-variable plots
      1. Scatterplot
      2. Bar plot
      3. Line plot
    5. Working with multiple variables
      1. Plots side by side
    6. Summary
    7. Exercises
    8. Further reading
  10. Part 2: Data Wrangling
  11. Chapter 4: Working with Strings
    1. Introduction to stringr
      1. Detecting patterns
      2. Subset strings
      3. Managing lengths
      4. Mutating strings
      5. Joining and splitting
      6. Ordering strings
    2. Working with regular expressions
      1. Learning the basics
    3. Creating frequency data summaries in R
      1. Regexps in practice
      2. Creating a contingency table using gmodels
    4. Text mining
      1. Tokenization
      2. Stemming and lemmatization
      3. TF-IDF
      4. N-grams
    5. Factors
    6. Summary
    7. Exercises
    8. Further reading
  12. Chapter 5: Working with Numbers
    1. Technical requirements
    2. Numbers in vectors, matrices, and data frames
      1. Vectors
      2. Matrices
      3. Data frames
    3. Math operations with variables
      1. apply functions
    4. Descriptive statistics
      1. Correlation
    5. Summary
    6. Exercises
    7. Further reading
  13. Chapter 6: Working with Date and Time Objects
    1. Technical requirements
    2. Introduction to date and time
    3. Date and time with lubridate
      1. Arithmetic operations with datetime
      2. Time zones
    4. Date and time using regular expressions (regexps)
    5. Practicing
    6. Summary
    7. Exercises
    8. Further reading
  14. Chapter 7: Transformations with Base R
    1. Technical requirements
    2. The dataset
    3. Slicing and filtering
      1. Slicing
      2. Filtering
    4. Grouping and summarizing
    5. Replacing and filling
    6. Arranging
    7. Creating new variables
    8. Binding
    9. Using data.table
    10. Summary
    11. Exercises
    12. Further reading
  15. Chapter 8: Transformations with Tidyverse Libraries
    1. Technical requirements
    2. What is tidy data
      1. The pipe operator
    3. Slicing and filtering
      1. Slicing
      2. Filtering
    4. Grouping and summarizing data
    5. Replacing and filling data
    6. Arranging data
    7. Creating new variables
      1. The mutate function
    8. Joining datasets
      1. Left Join
      2. Right join
      3. Inner join
      4. Full join
      5. Anti-join
    9. Reshaping a table
    10. Do more with tidyverse
    11. Summary
    12. Exercises
    13. Further reading
  16. Chapter 9: Exploratory Data Analysis
    1. Technical requirements
    2. Loading the dataset to RStudio
    3. Understanding the data
    4. Treating missing data
    5. Exploring and visualizing the data
      1. Univariate analysis
      2. Multivariate analysis
      3. Exploring
    6. Analysis report
      1. Report
      2. Next steps
    7. Summary
    8. Exercises
    9. Further reading
  17. Part 3: Data Visualization
  18. Chapter 10: Introduction to ggplot2
    1. Technical requirements
    2. The grammar of graphics
      1. Data
      2. Geometry
      3. Aesthetics
      4. Statistics
      5. Coordinates
      6. Facets
      7. Themes
    3. The basic syntax of ggplot2
    4. Plot types
      1. Histograms
      2. Boxplot
      3. Scatterplot
      4. Bar plots
      5. Line plots
      6. Smooth geometry
      7. Themes
    5. Summary
    6. Exercises
    7. Further reading
  19. Chapter 11: Enhanced Visualizations with ggplot2
    1. Technical requirements
    2. Facet grids
    3. Map plots
    4. Time series plots
    5. 3D plots
    6. Adding interactivity to graphics
    7. Summary
    8. Exercises
    9. Further reading
  20. Chapter 12: Other Data Visualization Options
    1. Technical requirements
    2. Plotting graphics in Microsoft Power BI using R
    3. Preparing data for plotting
    4. Creating word clouds in RStudio
    5. Summary
    6. Exercises
    7. Further reading
  21. Part 4: Modeling
  22. Chapter 13: Building a Model with R
    1. Technical requirements
    2. Machine learning concepts
      1. Classification models
      2. Regression models
      3. Supervised and unsupervised learning
    3. Understanding the project
      1. The dataset
      2. The project
      3. The algorithm
    4. Preparing data for modeling in R
    5. Exploring the data with a few visualizations
    6. Selecting the best variables
    7. Modeling
      1. Training
      2. Testing and evaluating the model
      3. Predicting
    8. Summary
    9. Exercises
    10. Further reading
  23. Chapter 14: Build an Application with Shiny in R
    1. Technical requirements
    2. Learning the basics of Shiny
      1. Get started
      2. Basic functions
    3. Creating an application
      1. The project
      2. Coding
    4. Deploying the application on the web
    5. Summary
    6. Exercises
    7. Further reading
  24. Conclusion
    1. References
    2. Why subscribe?
  25. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Data Wrangling with R
  • Author(s): Gustavo R Santos
  • Release date: February 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781803235400