This article shows how to adjust a team’s EPA per play for the strength of their opponent. The benefits of adjusted EPA will be demonstrated as well!
Here we are going to take a look at how to adjust a team’s epa per play for the strength of their opponent. The technique works on weekly epa/play metrics, which can then be averaged to summarize a team’s season-long performance. The same process can also be applied to the epa of individual plays if you are so inclined.
Quick note: the adjustments were inspired by the work done in this paper. It’s a bit technical but a good additional read!
Alright, let’s get into it by first loading up our data!
# Read one play-by-play file per season and stack them into a single data frame
NFL_PBP <- purrr::map_df(2009:2019, function(x) {
  readr::read_csv(
    glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_{x}.csv.gz")
  )
})
With the data loaded, we can finally get down to business by summarizing each team’s weekly epa/play.
library(tidyverse)

# Keep only run and pass plays that have an epa value and a team in possession
plays <- NFL_PBP %>%
  dplyr::filter(!is.na(epa), !is.na(ep), !is.na(posteam), play_type %in% c("pass", "run"))

# Offensive epa/play for each team in each game
epa_data <- plays %>%
  dplyr::group_by(game_id, season, week, posteam, home_team) %>%
  dplyr::summarise(off_epa = mean(epa)) %>%
  dplyr::left_join(
    # Defensive epa/play: the average epa of the plays each team faced
    plays %>%
      dplyr::group_by(game_id, season, week, defteam, away_team) %>%
      dplyr::summarise(def_epa = mean(epa)),
    by = c("game_id", "posteam" = "defteam", "season", "week")
  ) %>%
  dplyr::mutate(opponent = ifelse(posteam == home_team, away_team, home_team)) %>%
  dplyr::select(game_id, season, week, home_team, away_team, posteam, opponent, off_epa, def_epa)
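To make the offense/defense split concrete, here is a minimal sketch on a made-up four-play game. The team names, `game_id`, and epa values are all invented for illustration, and it uses base R rather than dplyr so it runs without any packages. It is not the pipeline above, just the same idea in miniature.

```r
# Toy play-by-play rows for one hypothetical game (all values are made up).
toy_plays <- data.frame(
  game_id = "2019_01_BBB_AAA",
  posteam = c("AAA", "AAA", "BBB", "BBB"),
  defteam = c("BBB", "BBB", "AAA", "AAA"),
  epa     = c(0.5, -0.1, 0.3, -0.5)
)

# Offensive epa/play: mean epa of the plays a team ran.
toy_off <- aggregate(epa ~ game_id + posteam, data = toy_plays, FUN = mean)
names(toy_off) <- c("game_id", "team", "off_epa")

# Defensive epa/play: mean epa of the plays a team faced.
toy_def <- aggregate(epa ~ game_id + defteam, data = toy_plays, FUN = mean)
names(toy_def) <- c("game_id", "team", "def_epa")

# One row per team per game, with both sides of the ball.
toy_weekly <- merge(toy_off, toy_def, by = c("game_id", "team"))
print(toy_weekly)
```

Notice that a team’s `def_epa` is simply its opponent’s `off_epa` in that game, so a lower `def_epa` means a better defensive showing.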
Now we can get into the fun part: adjusting a team’s epa/play based on the strength of the opponent they are up against.
# Construct opponent dataset and lag the moving average of their last ten games.
opponent_data <- epa_data %>%
  dplyr::select(-opponent) %>%
  dplyr::rename(
    opp_off_epa = off_epa,
    opp_def_epa = def_epa
  ) %>%
  dplyr::group_by(posteam) %>%
  dplyr::arrange(season, week) %>%
  dplyr::mutate(
    # Smooth first, then lag so each row only reflects games played before it
    opp_def_epa = pracma::movavg(opp_def_epa, n = 10, type = "s"),
    opp_def_epa = dplyr::lag(opp_def_epa),
    opp_off_epa = pracma::movavg(opp_off_epa, n = 10, type = "s"),
    opp_off_epa = dplyr::lag(opp_off_epa)
  )
# Merge opponent data back in with the weekly epa data
# (left_join() already keeps every row of epa_data, so no all.x argument is needed)
epa_data <- epa_data %>%
  dplyr::left_join(
    opponent_data,
    by = c("game_id", "season", "week", "home_team", "away_team", "opponent" = "posteam")
  )
Don’t fret that the opponent’s epa columns will have NAs in the first week. You simply can’t lag from the first observation.
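If the smoothing-then-lagging step feels abstract, the short sketch below walks through it on four made-up weekly values. `trailing_mean()` and `lag_one()` are hypothetical base R helpers written for this illustration; they are meant to mimic the simple moving average and the lag used above, which is an assumption on my part rather than a drop-in replacement.

```r
# Hypothetical helper: simple moving average over the last `n` values,
# averaging whatever is available until `n` observations exist.
trailing_mean <- function(x, n) {
  sapply(seq_along(x), function(k) mean(x[max(1, k - n + 1):k]))
}

# Hypothetical helper: shift a vector down by one so each row only sees earlier games.
lag_one <- function(x) c(NA, head(x, -1))

toy_def_epa <- c(0.10, -0.20, 0.30, 0.00)    # made-up weekly values
toy_smoothed <- trailing_mean(toy_def_epa, n = 3)
toy_lagged   <- lag_one(toy_smoothed)

print(data.frame(week = 1:4, toy_def_epa, toy_smoothed, toy_lagged))
```

The first lagged value is `NA` because there is no earlier game to draw on, and every later row only uses information from games that were already played.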
The final piece of the equation needed to make the adjustments is the league mean for epa/play on offense and defense. We need to know how strong the opponent is relative to the average team in the league.
epa_data <- epa_data %>%
  dplyr::left_join(
    epa_data %>%
      dplyr::filter(posteam == home_team) %>%
      dplyr::group_by(season, week) %>%
      dplyr::summarise(league_mean = mean(off_epa + def_epa)) %>%
      dplyr::ungroup() %>%
      dplyr::group_by(season) %>%
      dplyr::mutate(
        # We lag because we need to know the league mean up to that point in the season
        league_mean = dplyr::lag(pracma::movavg(league_mean, n = 10, type = "s"))
      ),
    by = c("season", "week")
  )
Finally, we can get to adjusting a team’s epa/play. We’ll create an adjustment measure by subtracting the opponent’s epa/play metrics from the league mean. Then we add the adjustment measure to each team’s weekly performance.
# Adjust EPA
epa_data <- epa_data %>%
  dplyr::mutate(
    # No adjustment is made when the league mean is not yet available
    off_adjustment_factor = ifelse(!is.na(league_mean), league_mean - opp_def_epa, 0),
    def_adjustment_factor = ifelse(!is.na(league_mean), league_mean - opp_off_epa, 0),
    adjusted_off_epa = off_epa + off_adjustment_factor,
    adjusted_def_epa = def_epa + def_adjustment_factor
  )
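To see what the adjustment does to a single number, here is a small worked example. `adjust_epa()` is a hypothetical helper that mirrors the arithmetic in the step above for one value at a time, and every number fed into it is made up.

```r
# Hypothetical helper mirroring the adjustment arithmetic for a single value.
adjust_epa <- function(raw_epa, opp_epa, league_mean) {
  # With no league baseline yet (start of a season), leave the raw value as is.
  adjustment <- ifelse(!is.na(league_mean), league_mean - opp_epa, 0)
  raw_epa + adjustment
}

# An offense posts 0.15 epa/play against a defense that has been allowing
# 0.10 epa/play, while the league mean sits at 0.02.
adjust_epa(raw_epa = 0.15, opp_epa = 0.10, league_mean = 0.02)   # 0.07

# The same offense against a stingy defense allowing -0.05 epa/play.
adjust_epa(raw_epa = 0.15, opp_epa = -0.05, league_mean = 0.02)  # 0.22

# League mean not yet available: the raw value passes through unchanged.
adjust_epa(raw_epa = 0.15, opp_epa = 0.10, league_mean = NA)     # 0.15
```

The same raw performance is worth less against a leaky defense and more against a strong one, which is exactly what we want from a strength-of-opponent adjustment.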
We’re done! You can now view each team’s epa/play adjusted for their strength of schedule. Let’s check out how different the league looks by comparing unadjusted epa to adjusted epa stats.
Above, you can see that some teams are revealed to be stronger after adjusting their epa/play while other teams appear to be weaker. We can use these adjustments to make more accurate predictions of individual NFL games.
Here, each metric is used in a separate glm model to predict the outcome of games from the past two seasons. The accuracy of each model is shown below.
[1] "Adjusted EPA Accuracy"
[1] 0.6404494
[1] "Normal EPA Accuracy"
[1] 0.6348315
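For readers who want to try something similar, here is one way such a model could be set up. This is only a sketch under my own assumptions: the data are simulated (not real NFL games), the names `epa_diff` and `home_win` are hypothetical, and the train/test split is arbitrary. It does not reproduce the accuracy figures reported above.

```r
set.seed(42)

# Simulated games (not real NFL data): the home team's edge in epa/play
# and a win indicator that is more likely when that edge is larger.
n <- 400
epa_diff <- rnorm(n, mean = 0, sd = 0.15)
home_win <- rbinom(n, size = 1, prob = plogis(4 * epa_diff))
games <- data.frame(epa_diff, home_win)

# Hold out the last 100 games to mimic predicting later seasons.
train_games <- games[1:300, ]
test_games  <- games[301:400, ]

# Logistic regression of the game outcome on the epa differential.
fit <- glm(home_win ~ epa_diff, data = train_games, family = binomial)

# Call it a home win when the predicted probability exceeds 0.5.
pred <- as.integer(predict(fit, newdata = test_games, type = "response") > 0.5)
accuracy <- mean(pred == test_games$home_win)
print(accuracy)
```

Fitting the same model once with unadjusted epa and once with adjusted epa, then comparing the two held-out accuracies, gives the kind of head-to-head comparison described here.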
There is a slight edge to the adjusted EPA model: it picks about 64.0% of games correctly, compared with about 63.5% for the unadjusted model. It’s a solid start, but there is more work to be done in finding the best version of epa/play.
Thanks to Sebastian Carl and Ben Baldwin for setting this forum up! I can’t wait to see others’ work, and improvements to my own, make their way on here.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/mrcaseb/open-source-football, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Goldberg (2020, Aug. 20). Open Source Football: Adjusting EPA for Strength of Opponent. Retrieved from https://www.opensourcefootball.com/posts/2020-08-20-adjusting-epa-for-strenght-of-opponent/
BibTeX citation
@misc{goldberg2020adjusting, author = {Goldberg, Jonathan}, title = {Open Source Football: Adjusting EPA for Strength of Opponent}, url = {https://www.opensourcefootball.com/posts/2020-08-20-adjusting-epa-for-strenght-of-opponent/}, year = {2020} }