Appendix B. Summary Statistics and Data Wrangling: Passing the Ball

This appendix contains materials to help you understand some basic statistics. If the topics are new to you, we encourage you to read this material after Chapter 1 or Chapter 2 and before you dive too far into the book.

In Chapter 2, you looked at quarterback performance at different pass depths to understand which aspects of play were fundamental to performance and which were noisier, and therefore more likely to lead you astray when predicting future performance. You were lucky enough to have the data in more or less ready-made form for this analysis. You did have to create your own variable, but that data wrangling was minimal.

Sports analytics generally, and football analytics specifically, are still in their early stages of development. As a result, datasets are not always clean or tidy. Tidy datasets are usually in a tabular form that computers can easily read and humans can easily understand. Furthermore, data analysis in any field (and football analytics is no different) often requires using datasets that were created for other purposes. This is where data wrangling comes in handy. Because so many people have had to clean up messy data, many terms exist for this work. Synonyms for data wrangling include data cleaning, data manipulating, data mutating, shaping, tidying, and munging. More specifically, these terms describe the process ...
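To make the idea of a tidy table concrete, here is a minimal sketch in plain Python. It is not the book's own code: the passer names, week columns, yardage values, and the helper `tidy_passing_yards` are all hypothetical, invented only to show one observation per row. The sketch reshapes a "wide" table (one column per week) into a "long" one (one row per passer and week).

```python
# Hypothetical "wide" table: one row per quarterback, one column per week.
# All names and numbers are invented for illustration only.
wide_rows = [
    {"passer": "QB A", "week_1": 250, "week_2": 310},
    {"passer": "QB B", "week_1": 198, "week_2": 275},
]


def tidy_passing_yards(rows):
    """Reshape wide rows into tidy rows: one (passer, week) observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            # Only the week_* columns hold measurements; "passer" is an identifier.
            if column.startswith("week_"):
                tidy.append(
                    {
                        "passer": row["passer"],
                        "week": int(column.split("_", 1)[1]),
                        "passing_yards": value,
                    }
                )
    return tidy


tidy_rows = tidy_passing_yards(wide_rows)
for record in tidy_rows:
    print(record)
```

In practice, the same reshaping is usually done with a data-frame library, for example the `melt` method in pandas. The loop above only spells out what such a step does.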
