Chapter 2. Function Junction

Now that we’ve looked at SELECT, we can look at adding various SQL functions to it to help us “slice and dice” the data, deal with data quality issues, and transform it to the shape we need. We’ve seen COUNT already, so we’ll start with it and the rest of the aggregate functions that often help during EDA to understand some simple statistics. In this chapter, we will cover the following topics:

You will get very used to reading, and ideally very good at using, SQL functions by the end of this book. I take a highly functional approach to the problems presented, so something like the following will become a common sight to you:

SELECT
    LEFT(REPLACE(REPLACE(StreetAddress, ' ', ''), '.', ''), 10) AddressPrefix
FROM crm.CustomerAddress

Always read functions calling functions from the inside out. The previous code should read to you as follows:

  1. Replace all spaces in the results from the first REPLACE, also with an empty character.

  2. Replace all period (full-stop) characters in the value from the column Street​Ad⁠dress with an empty character.

  3. Take the left 10 characters of that result and return that truncated value in a column labeled AddressPrefix.

To illustrate most functions, we will ...

Get Fuzzy Data Matching with SQL now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.