Chapter 5. Generalized Linear Models: Completion Percentage over Expected

In Chapters 3 and 4, you used both simple and multiple regression to adjust play-by-play data for the context of the play. In the case of ball carriers, you adjusted for the situation (such as down and yards to go) to calibrate individual player statistics at the play level, and later at the season level. The same approach can be applied to the passing game and, more specifically, to quarterbacks. As discussed in Chapter 3, Minnesota quarterback Sam Bradford set the NFL record for seasonal completion percentage in 2016, completing a whopping 71.6% of his passes.

Bradford, however, was just a middle-of-the-pack quarterback in terms of efficiency, whether measured by yards per pass attempt, expected points per passing attempt, or touchdown passes. The Vikings won only 7 of his 15 starts that year. Bradford’s completion percentage was so high because his average depth of target was just 6.6 yards (37th in the NFL, per PFF). In general, passes that are thrown longer distances are completed at a lower rate.

To see this, you will create Figure 5-1 in Python or Figure 5-2 in R. First, load the data. Then, keep only pass plays (play_type == "pass") that have both a passer (passer_id.notnull() in Python or !is.na(passer_id) in R) and a pass depth (air_yards.notnull() in Python or !is.na(air_yards) in R). In Python, use this code:

## Python
import pandas as pd
import numpy as np
import nfl_data_py as nfl
import ...
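
The three filter conditions described above can be illustrated on their own. The following is only a minimal sketch, not the book's code: the data frame pbp_toy and its four rows are hypothetical stand-ins for real play-by-play data, and the conditions are written as a boolean mask instead of a query string. Only the column names (play_type, passer_id, air_yards) come from the text.

```python
## Python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for real play-by-play data;
# the values are invented purely to exercise the filter.
pbp_toy = pd.DataFrame(
    {
        "play_type": ["pass", "run", "pass", "pass"],
        "passer_id": ["QB1", None, None, "QB2"],
        "air_yards": [7.0, np.nan, 12.0, np.nan],
    }
)

# Keep pass plays that have both a passer and a recorded pass depth.
is_pass = pbp_toy["play_type"] == "pass"
has_passer = pbp_toy["passer_id"].notnull()
has_depth = pbp_toy["air_yards"].notnull()
pbp_toy_pass = pbp_toy[is_pass & has_passer & has_depth]

print(pbp_toy_pass)
```

Only the first toy row passes all three checks: the run play fails the play-type condition, the third row has no passer, and the fourth has no air yards. Building the mask from named pieces makes each condition easy to inspect separately, though a single query string would express the same filter.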
