Chapter 6. Anomaly Detection on Normally Distributed Data
There are many ways to detect outliers in your data. Chapter 4 introduced one of them: the histogram. However, spotting outliers visually in a histogram only gets you so far. What if you want to quantify an anomaly and communicate your findings to your stakeholders? In this chapter, you will learn three techniques for quantifying and visualizing outliers in Tableau.
By the end of this chapter, you will be able to use standard deviations, median with quartiles, and z-scores to flag outliers in your data and present them visually to your stakeholders. Note that these methods are intended for data that fit a normal distribution, which you learned about in previous chapters.
Understanding Standard Deviations
Standard deviation is a statistical measure that quantifies the amount of variability or dispersion within a set of data points. It measures how spread out the values are in a dataset from the mean, or average, of the data.
Mathematically, the standard deviation is the square root of the variance. The variance is the average of the squared differences between each data point and the mean. Recall the empirical rule (68–95–99.7) shown in Figure 6-1 (previously shown in Chapter 4): in a normal distribution, about 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
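To make the calculation concrete, here is a minimal sketch in Python rather than Tableau. The function names `population_std` and `share_within` and the sample values are illustrative assumptions, not taken from the book. The sketch uses the population form described above (dividing by the number of points); the sample form divides by one less than that.

```python
import math


def population_std(values):
    """Square root of the average squared difference from the mean."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = sum(values) / len(values)
    sd = population_std(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)


# Hypothetical sample data, chosen so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_std(data))   # mean is 5, variance is 4, so this prints 2.0
print(share_within(data, 1))  # 6 of the 8 values are within one SD: 0.75
```

With only eight points, the share within one standard deviation will not match the 68% of the empirical rule; the rule describes a normal distribution, and small samples only approximate it.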
Because standard deviations measure variability, they are a natural way to detect outliers within a dataset. To put it simply, ...