Book description
Become a data wrangling expert and make well-informed decisions by systematically analyzing raw, unstructured data. Purchase of the print or Kindle book includes a free PDF eBook.
Key Features
- Implement query optimization during data wrangling using SQL, with practical use cases
- Master data cleaning, work with date functions and NULL values, and write subqueries and window functions
- Practice self-assessment questions for SQL-based interviews and real-world case study rounds
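To make the NULL-handling feature above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table and its values are hypothetical, and the book may use a different database engine. The sketch shows `IS NULL` filtering, `COALESCE` and `IFNULL` defaults, and why NULL is not the same as zero.

```python
import sqlite3

# In-memory database with a hypothetical orders table; two discounts are missing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, discount REAL);
    INSERT INTO orders (amount, discount) VALUES
        (100.0, 5.0), (250.0, NULL), (80.0, 0.0), (40.0, NULL);
""")

# IS NULL finds rows with a missing discount; comparing with = NULL never matches.
missing = conn.execute(
    "SELECT id FROM orders WHERE discount IS NULL ORDER BY id"
).fetchall()
eq_null_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE discount = NULL"
).fetchone()[0]

# COALESCE (standard SQL) and IFNULL (SQLite/MySQL) substitute a default value of 0.
net = conn.execute("""
    SELECT id,
           amount - COALESCE(discount, 0) AS net_amount,
           IFNULL(discount, 0)            AS discount_filled
    FROM orders
    ORDER BY id
""").fetchall()

# NULL versus zero: AVG skips NULLs, so filling them with 0 changes the result.
avg_skip_nulls, avg_nulls_as_zero = conn.execute(
    "SELECT AVG(discount), AVG(COALESCE(discount, 0)) FROM orders"
).fetchone()

print(missing)                            # [(2,), (4,)]
print(eq_null_count)                      # 0
print(avg_skip_nulls, avg_nulls_as_zero)  # 2.5 1.25
```

Which default is appropriate depends on the data: treating a missing discount as 0 is a business decision, not a neutral one, as the two averages show.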
Book Description
The amount of data generated continues to grow rapidly, making it increasingly important for businesses to be able to wrangle this data and understand it quickly and efficiently. Although data wrangling can be challenging, with the right tools and techniques you can efficiently handle enormous amounts of unstructured data.
The book starts by introducing you to the basics of SQL, focusing on the core principles and techniques of data wrangling. You’ll then explore advanced SQL concepts that are widely used in business, such as aggregate functions, window functions, CTEs, and subqueries. The next set of chapters walks you through the SQL query constructs that slow down data transformation and helps you tell a good query from a bad one. You’ll also learn how data wrangling and data science go hand in hand. The book is filled with datasets and practical examples to help you understand the concepts thoroughly, along with best practices to guide you at every stage of data wrangling.
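As a rough illustration of how CTEs, aggregate functions, and window functions combine, the sketch below uses Python's `sqlite3` module (window functions require SQLite 3.25 or later). The `orders` table and its figures are invented for the example; this is not code from the book. A CTE first aggregates orders into daily revenue, and window functions then compute a running total, a day-over-day change with `LAG`, and a three-day moving average.

```python
import sqlite3

# Hypothetical raw orders: several rows per day that must be aggregated first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_day TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('2023-07-01', 60), ('2023-07-01', 40),
        ('2023-07-02', 120),
        ('2023-07-03', 50), ('2023-07-03', 40),
        ('2023-07-04', 150);
""")

query = """
WITH daily AS (                 -- CTE: aggregate orders into daily revenue
    SELECT order_day, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_day
)
SELECT order_day,
       revenue,
       -- running total: sum of all days up to and including this one
       SUM(revenue) OVER (ORDER BY order_day) AS running_total,
       -- day-over-day change; LAG returns NULL for the first day
       revenue - LAG(revenue) OVER (ORDER BY order_day) AS change_from_prev,
       -- moving average over the current row and the two rows before it
       AVG(revenue) OVER (
           ORDER BY order_day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3
FROM daily
ORDER BY order_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CTE keeps the aggregation step separate and readable; the same query could be written with a subquery in the `FROM` clause instead.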
By the end of this book, you’ll be equipped with essential data wrangling techniques and best practices and, above all, you’ll know how to use clean, standardized data models to make informed decisions that help businesses avoid costly mistakes.
What you will learn
- Build time series models using data wrangling
- Discover data wrangling best practices as well as tips and tricks
- Find out how to use subqueries, window functions, CTEs, and aggregate functions
- Handle missing data, data types, date formats, and redundant data
- Build clean and efficient data models using data wrangling techniques
- Remove outliers and calculate standard deviation to gauge how spread out the data is
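For the outlier item above, one common approach (assumed here, not necessarily the book's) is a z-score filter, which flags any value more than K standard deviations from the mean. The sketch below uses Python's `sqlite3` module with a hypothetical `response_times` table and an assumed cut-off of K = 2. It compares squared deviations with K² times the population variance. This is equivalent to comparing |x - mean| with K·σ, and it avoids depending on a `SQRT` function, which not every SQLite build provides.

```python
import sqlite3

K = 2  # assumed cut-off: flag values more than K standard deviations from the mean

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE response_times (id INTEGER PRIMARY KEY, ms INTEGER);
    INSERT INTO response_times (ms) VALUES (10), (12), (11), (13), (12), (95);
""")

query = """
WITH mean_cte AS (
    SELECT AVG(ms) AS mean FROM response_times
),
var_cte AS (
    -- population variance: mean squared deviation from the mean
    SELECT AVG((ms - mean) * (ms - mean)) AS var
    FROM response_times CROSS JOIN mean_cte
)
SELECT r.id,
       r.ms,
       -- squared deviation > K^2 * variance  <=>  |x - mean| > K * std dev
       (r.ms - m.mean) * (r.ms - m.mean) > :k * :k * v.var AS is_outlier
FROM response_times AS r
CROSS JOIN mean_cte AS m
CROSS JOIN var_cte AS v
ORDER BY r.id
"""
flagged = conn.execute(query, {"k": K}).fetchall()

# Keep only the values that were not flagged (SQLite returns 0/1 for comparisons).
kept = [ms for _id, ms, is_outlier in flagged if not is_outlier]
print(kept)  # [10, 12, 11, 13, 12]
```

On small samples the outlier itself inflates the standard deviation. With n = 6, a population z-score can never exceed √5 ≈ 2.24, so a more common cut-off such as K = 3 would flag nothing here.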
Who this book is for
This book is for data analysts looking for effective hands-on methods to manage and analyze large volumes of data using SQL. It will also benefit data scientists, product managers, and anyone in a role that involves gathering data insights and developing business strategies with SQL. If you are new to SQL and databases, or have a basic knowledge of them along with an understanding of data cleaning practices, this book will give you further insights into how to apply SQL concepts to build clean, standardized data models for accurate analysis.
Table of contents
- Data Wrangling with SQL
- Acknowledgements
- Dedication
- Contributors
- About the authors
- About the reviewers
- Preface
- Part 1: Data Wrangling Introduction
- Chapter 1: Database Introduction
- Chapter 2: Data Profiling and Preparation before Data Wrangling
- Part 2: Data Wrangling Techniques Using SQL
- Chapter 3: Data Wrangling on String Data Types
- Chapter 4: Data Wrangling on the DATE Data Type
- Chapter 5: Handling NULL Values
  - The impact of missing data and NULL values on data analysis
  - Understanding the importance of data validation and cleaning before analyzing data
  - Identifying NULL/missing values
  - NULL values versus zero values
  - Using the IS NULL and IS NOT NULL operators to filter and select data with NULL values
  - Using the COALESCE and IFNULL functions to replace NULL values with a default value
  - Summary
- Chapter 6: Pivoting Data Using SQL
- Part 3: SQL Subqueries, Aggregate and Window Functions
- Chapter 7: Subqueries and CTEs
- Chapter 8: Aggregate Functions
- Chapter 9: SQL Window Functions
- Part 4: Optimizing Query Performance
- Chapter 10: Optimizing Query Performance
  - Introduction to query optimization
  - Query execution plan
  - Query optimization techniques
  - Query monitoring and troubleshooting
  - Tips and tricks for writing efficient queries
  - Summary
    - In the next chapter, we will learn about descriptive statistics using SQL. Descriptive statistics provide insights into the distribution, central tendency, and variability of data, which can in turn help us identify outliers and anomalies. Common SQL functions and clauses used for descriptive statistics include COUNT, AVG, MIN, MAX, and GROUP BY. By using SQL to analyze data, researchers and analysts can efficiently extract and summarize information from large datasets.
- Part 5: Data Science and Wrangling
- Chapter 11: Descriptive Statistics with SQL
- Chapter 12: Time Series with SQL
  - Running totals
  - Lead and lag for time series analysis
  - Percentage change
  - Moving averages
  - Rank for time series analysis
  - CTE for time series analysis
  - Forecasting with linear regression
  - Summary
    - In the next chapter, we will learn different methods for easily finding outliers in data. Outlier detection is an important part of data analysis because it helps determine whether the data is correct, reveals skewness in the data, and lets us remove unexpected values.
- Chapter 13: Outlier Detection
- Index
- Other Books You May Enjoy
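The Chapter 10 summary excerpt in the table of contents names COUNT, AVG, MIN, MAX, and GROUP BY as common building blocks for descriptive statistics. Below is a minimal sketch of how they combine, using Python's `sqlite3` module and a hypothetical `sales` table; it is an illustration only and not taken from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('north', 120), ('north', 80), ('north', 100),
        ('south', 200), ('south', 50);
""")

# One row of summary statistics per region: count, mean, minimum, maximum, range.
summary = conn.execute("""
    SELECT region,
           COUNT(*)                  AS n,
           AVG(amount)               AS mean_amount,
           MIN(amount)               AS min_amount,
           MAX(amount)               AS max_amount,
           MAX(amount) - MIN(amount) AS value_range
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

for row in summary:
    print(row)
# ('north', 3, 100.0, 80.0, 120.0, 40.0)
# ('south', 2, 125.0, 50.0, 200.0, 150.0)
```

A `WHERE` clause before `GROUP BY` narrows the rows that are summarized, while `HAVING` after it filters the resulting groups.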
Product information
- Title: Data Wrangling with SQL
- Author(s):
- Release date: July 2023
- Publisher(s): Packt Publishing
- ISBN: 9781837630028
You might also like
book
Fuzzy Data Matching with SQL
If you were handed two different but related sets of data, what tools would you use …
video
Master SQL for Data Analysis
SQL is a popular language for extracting, stacking, and querying data from databases. Master SQL to …
book
SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL, 4th Edition
The #1 Easy, Common-Sense Guide to SQL Queries—Updated with More Advanced Techniques and Solutions Foreword by …
book
Business Intelligence with Databricks SQL
Master critical skills needed to deploy and use Databricks SQL and elevate your BI from the …