Game excitement calculation and a win probability figure.
First, we need to import our dependencies. These packages are what make this analysis possible.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Next we will read in our data from the nflfastR data repo.
# Read in data
YEAR = 2019
data = pd.read_csv(
    'https://github.com/guga31bb/nflfastR-data/blob/master/data/'
    f'play_by_play_{YEAR}.csv.gz?raw=True',
    compression='gzip', low_memory=False)
Perfect! Our data and notebook are set up and ready to go. The next step is to filter our DataFrame so it includes only the game we want to work with. We will subset by game_id, which we will also need later. The new nflfastR game ids are very convenient and use the following format:
YEAR_WEEK_AWAY_HOME
Note that the year must be in YYYY format and single-digit weeks must be zero-padded (e.g. week 9 is written 09).
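Because the id follows a fixed pattern, you can build it programmatically instead of typing it by hand. The sketch below uses make_game_id, a hypothetical helper that is not part of nflfastR. It applies the YEAR_WEEK_AWAY_HOME rule with a zero-padded week. It assumes you pass team abbreviations exactly as nflfastR spells them (e.g. MIN, KC).

```python
def make_game_id(year: int, week: int, away: str, home: str) -> str:
    """Build an nflfastR-style game id: YEAR_WEEK_AWAY_HOME.

    Hypothetical helper: the week is zero-padded to two digits,
    so week 9 becomes '09'.
    """
    return f"{year:04d}_{week:02d}_{away}_{home}"


print(make_game_id(2019, 9, 'MIN', 'KC'))  # 2019_09_MIN_KC
```

The result can be dropped straight into the filter, e.g. `data[data.game_id == make_game_id(2019, 9, 'MIN', 'KC')]`.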
#Subset the game of interest
game_df = data[
(data.game_id== '2019_09_MIN_KC')
]
#View a random sample of our df to ensure everything is correct
game_df.sample(3)
play_id game_id ... xyac_success xyac_fd
23013 1294 2019_09_MIN_KC ... 1.000000 1.000000
23080 3077 2019_09_MIN_KC ... 0.140994 0.107368
23077 2992 2019_09_MIN_KC ... NaN NaN
[3 rows x 340 columns]
The last preprocessing step for this analysis is dropping rows with null values to avoid jumps in our WP chart. To clean things up, we first keep only the columns we need. That way, dropna() only removes rows that are missing a win probability or the game clock.
cols = ['home_wp','away_wp','game_seconds_remaining']
game_df = game_df[cols].dropna()
#View new df to again ensure everything is correct
game_df
home_wp away_wp game_seconds_remaining
22960 0.560850 0.439150 3600.0
22961 0.560850 0.439150 3600.0
22962 0.599848 0.400152 3596.0
22963 0.612526 0.387474 3590.0
22964 0.629503 0.370497 3584.0
... ... ... ...
23132 0.697633 0.302367 59.0
23134 0.806030 0.193970 24.0
23135 0.910061 0.089939 4.0
23136 0.927525 0.072475 3.0
23137 1.000000 0.000000 0.0
[166 rows x 3 columns]
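Every row above has home_wp + away_wp equal to 1, which suggests away_wp is simply 1 - home_wp. If that holds, a swing in one team's WP is the same size as the swing in the other's, so tracking home_wp alone is enough to measure WP changes. The quick sanity check below is a sketch. wp_columns_complement is a hypothetical helper, and the complement relationship is inferred from this output rather than from nflfastR documentation.

```python
import numpy as np
import pandas as pd


def wp_columns_complement(df: pd.DataFrame, tol: float = 1e-6) -> bool:
    """Return True if home_wp + away_wp is 1 (within tol) on every row."""
    total = df['home_wp'] + df['away_wp']
    return bool(np.allclose(total, 1.0, atol=tol))


# A few rows copied from the output above
toy = pd.DataFrame({'home_wp': [0.560850, 0.599848, 1.0],
                    'away_wp': [0.439150, 0.400152, 0.0]})
print(wp_columns_complement(toy))  # True
```

On the real data you would call `wp_columns_complement(game_df)` after the dropna() step.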
Everything looks good to go! Before we use this data to create the WP chart, we are going to calculate the Game Excitement Index (GEI) for this game.
We are using Luke Benz's formula for GEI, which can be found here. It's simple yet effective, which is why I like it so much. As Luke notes, "the formula sums the absolute value of the win probability change from each play". Here, we create a function (inspired by ChiefsAnalytics) that follows his formula. It also scales the sum by the ratio of the season's average game length to this game's length, so a game is not rewarded simply for having more plays. The function takes a single parameter, game_id, which must use the new nflfastR game id format.
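Written out as an equation (our own notation, not Luke's), the score for a game g that the function below computes is the following. Here WP_i is the home team's win probability on play i, L_g is the number of plays in game g, and \bar{L} is the average number of plays per game. In calc_gei, \bar{L} is the per-game count of plays with a non-null epa, while L_g is the game's total row count.

```latex
\mathrm{GEI}_g = \frac{\bar{L}}{L_g} \sum_{i=2}^{L_g} \left| WP_i - WP_{i-1} \right|
```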
#Average number of plays (rows with a non-null EPA) per 2019 game, used to normalize GEI
avg_length = data.groupby(by=['game_id'])['epa'].count().mean()
def calc_gei(game_id):
    game = data[(data['game_id']==game_id)]
    #Length of game (number of rows for this game)
    length = len(game)
    #Adjust for game length: longer games get a smaller multiplier
    normalize = avg_length / length
    #Absolute change in home win probability from one play to the next
    win_prob_change = game['home_wp'].diff().abs()
    #Sum the swings and apply the length adjustment
    gei = normalize * win_prob_change.sum()
    return gei
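To see what the formula rewards, here is a toy version of the same calculation applied to made-up WP paths instead of the nflfastR data. gei_from_wp is a hypothetical helper, and avg_length=8 is an arbitrary stand-in for the season average. The two paths start and end at the same place, but the seesaw game swings more, so it scores higher.

```python
import pandas as pd


def gei_from_wp(home_wp, avg_length: float) -> float:
    """Toy version of calc_gei that takes a plain list of home win probabilities.

    avg_length stands in for the season-average game length used by calc_gei.
    """
    wp = pd.Series(home_wp, dtype=float)
    normalize = avg_length / len(wp)          # length adjustment
    return normalize * wp.diff().abs().sum()  # summed absolute WP swings


# Made-up WP paths with the same start and end but different amounts of drama
seesaw = [0.5, 0.7, 0.4, 1.0]  # swings: 0.2 + 0.3 + 0.6 = 1.1
steady = [0.5, 0.6, 0.7, 1.0]  # swings: 0.1 + 0.1 + 0.3 = 0.5
print(gei_from_wp(seesaw, avg_length=8))  # about 2.2
print(gei_from_wp(steady, avg_length=8))  # about 1.0
```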
Let’s run the function by passing in our game id from earlier.
print(f"Vikings @ Chiefs GEI: {calc_gei('2019_09_MIN_KC')}")
Vikings @ Chiefs GEI: 4.652632439280925
This seemed to be a pretty exciting game. Let’s compare it to other notable games from last season.
# Week 1 blowout between the Ravens and Dolphins
print(f"Ravens @ Dolphins GEI: {calc_gei('2019_01_BAL_MIA')}")
Ravens @ Dolphins GEI: 0.9723172478637379
# Week 14 thriller between the 49ers and Saints
print(f"49ers @ Saints GEI: {calc_gei('2019_14_SF_NO')}")
49ers @ Saints GEI: 5.190375267367869
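Comparing against two hand-picked games is a start. Scoring every game at once shows where a game ranks across the whole season. The sketch below applies the same formula as calc_gei to every game_id with a groupby. gei_by_game is a hypothetical helper, and the small DataFrame exists only so the example runs on its own.

```python
import pandas as pd


def gei_by_game(pbp: pd.DataFrame) -> pd.Series:
    """GEI for every game_id in a play-by-play frame, most exciting first.

    Same formula as calc_gei: (average plays with non-null epa per game /
    rows in this game) * summed absolute change in home_wp.
    """
    avg_length = pbp.groupby('game_id')['epa'].count().mean()
    lengths = pbp.groupby('game_id').size()
    swings = pbp.groupby('game_id')['home_wp'].apply(
        lambda wp: wp.diff().abs().sum())
    return (avg_length / lengths * swings).sort_values(ascending=False)


# Tiny made-up play-by-play frame with two four-play games
toy = pd.DataFrame({
    'game_id': ['A'] * 4 + ['B'] * 4,
    'home_wp': [0.5, 0.7, 0.4, 1.0, 0.5, 0.6, 0.7, 1.0],
    'epa': [0.1] * 8,
})
scores = gei_by_game(toy)
print(scores)  # game A (about 1.1) ranks above game B (about 0.5)
```

On the full 2019 data, `gei_by_game(data).head(10)` would list the highest-scoring games, and `gei_by_game(data).rank(pct=True)['2019_09_MIN_KC']` would give the Vikings @ Chiefs game's percentile.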
Yep, the Vikings @ Chiefs game scored nearly five times higher than the Week 1 blowout and close to the Week 14 thriller, so it was one of the more exciting regular season games of last season. Let's see how it looks visually with a WP chart!
Matplotlib and Seaborn can be used together to create some beautiful plots. Before we start, here is a useful line of code that prints all available matplotlib styles. You can also see what each one looks like in the matplotlib documentation.
#Print all matplotlib styles
print(plt.style.available)
['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']
Since our data is already set up, we can jump straight to the plot!
#Set style
plt.style.use('dark_background')
#Create a figure
fig, ax = plt.subplots(figsize=(16,8))
#Generate lineplots
sns.lineplot(x='game_seconds_remaining', y='away_wp',
             data=game_df, color='#4F2683', linewidth=2, ax=ax)
sns.lineplot(x='game_seconds_remaining', y='home_wp',
             data=game_df, color='#E31837', linewidth=2, ax=ax)
#Generate fills for the favored team at any given time
ax.fill_between(game_df['game_seconds_remaining'], 0.5, game_df['away_wp'],
where=game_df['away_wp']>.5, color = '#4F2683',alpha=0.3)
ax.fill_between(game_df['game_seconds_remaining'], 0.5, game_df['home_wp'],
where=game_df['home_wp']>.5, color = '#E31837',alpha=0.3)
#Labels
plt.ylabel('Win Probability %', fontsize=16)
plt.xlabel('', fontsize=16)
#Divider lines for aesthetics
plt.axvline(x=900, color='white', alpha=0.7)
plt.axvline(x=1800, color='white', alpha=0.7)
plt.axvline(x=2700, color='white', alpha=0.7)
plt.axhline(y=.50, color='white', alpha=0.7)
#Format and rename xticks
ax.set_xticks(np.arange(0, 3601,900))
plt.gca().invert_xaxis()
x_ticks_labels = ['End','End Q3','Half','End Q1','Kickoff']
ax.set_xticklabels(x_ticks_labels, fontsize=12)
#Titles
plt.suptitle('Minnesota Vikings @ Kansas City Chiefs',
fontsize=20, style='italic',weight='bold')
plt.title('KC 26, MIN 23 - Week 9 ', fontsize=16,
style='italic', weight='semibold')
#Creating a textbox with GEI score
props = dict(boxstyle='round', facecolor='black', alpha=0.6)
plt.figtext(.133,.85,f'Game Excitement Index (GEI): {calc_gei("2019_09_MIN_KC"):.2f}',style='italic',bbox=props)
#Citations
plt.figtext(0.131,0.137,'Graph: @mnpykings | Data: @nflfastR')
#Save figure if you wish
#plt.savefig('winprobchart.png', dpi=300)
Wow, this game had a ton of WP changes. No wonder it had a high GEI!
Things to be aware of:
Sometimes the plot shows small gaps in the fill. This happens when two consecutive data points fall on opposite sides of the 50% threshold (it happens twice to the Chiefs' WP line towards the end of the game). By default, the .fill_between() function only fills at the data points where the where condition is true, not in between them, so the stretch that crosses 50% is left unshaded. This is very minor and the dark background makes it hardly noticeable, but I wanted to address it so nobody gets confused if it happens to them.
The nflfastR win probability model is a little wonky in overtime because it does not account for ties, as Sebastian mentions here. Be mindful of this when calculating GEI or creating WP charts for OT games.
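For the fill gaps, matplotlib's fill_between accepts interpolate=True. With that option, it computes where the curve crosses the 50% line between two samples and extends the shading to that point. The sketch below shows this on a made-up WP path. fill_favored is a hypothetical helper, and the Agg backend line is only there so the example runs as a script (drop it in a notebook).

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np


def fill_favored(ax, seconds, wp, color):
    """Shade between the 50% line and wp wherever wp > 0.5.

    interpolate=True makes matplotlib find where wp crosses 0.5 between two
    samples and extend the shading to that point, instead of stopping at the
    last sample above 50%.
    """
    seconds = np.asarray(seconds, dtype=float)
    wp = np.asarray(wp, dtype=float)
    return ax.fill_between(seconds, 0.5, wp, where=wp > 0.5,
                           interpolate=True, color=color, alpha=0.3)


# Made-up WP path that crosses 50% several times
seconds = np.array([3600, 2700, 1800, 900, 0])
home_wp = np.array([0.55, 0.40, 0.62, 0.45, 1.0])

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(seconds, home_wp, color='#E31837')
ax.plot(seconds, 1 - home_wp, color='#4F2683')
fill_favored(ax, seconds, home_wp, '#E31837')
fill_favored(ax, seconds, 1 - home_wp, '#4F2683')
ax.axhline(0.5, color='grey', alpha=0.7)
ax.invert_xaxis()
```

In the chart above, the equivalent change is adding interpolate=True to the two ax.fill_between calls.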
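To act on the overtime caveat, you first need to know which games went to OT. The sketch below assumes the play-by-play data has a qtr column where overtime plays have a quarter number greater than 4. That is an assumption about nflfastR's encoding, so check it against your data. overtime_games is a hypothetical helper.

```python
import pandas as pd


def overtime_games(pbp: pd.DataFrame) -> list:
    """Return the sorted game_ids with at least one play in overtime (qtr > 4)."""
    ot_mask = pbp['qtr'] > 4
    return sorted(pbp.loc[ot_mask, 'game_id'].unique())


# Made-up frame: game B has a play in quarter 5 (overtime)
toy = pd.DataFrame({
    'game_id': ['A', 'A', 'B', 'B', 'B'],
    'qtr': [1, 4, 3, 4, 5],
})
print(overtime_games(toy))  # ['B']
```

With the real data, `data[~data['game_id'].isin(overtime_games(data))]` would set OT games aside before comparing GEI scores.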
That concludes this tutorial. Thanks for reading, I hope you learned some python in the process! Big thanks to Sebastian Carl and Ben Baldwin for everything they do; I’m looking forward to watching this platform grow! The future of sports analytics has never looked brighter.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/mrcaseb/open-source-football, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Bolger (2020, Aug. 21). Open Source Football: Game Excitement and Win Probability in the NFL. Retrieved from https://www.opensourcefootball.com/posts/2020-08-21-game-excitement-and-win-probability-in-the-nfl/
BibTeX citation
@misc{bolger2020game, author = {Bolger, Max}, title = {Open Source Football: Game Excitement and Win Probability in the NFL}, url = {https://www.opensourcefootball.com/posts/2020-08-21-game-excitement-and-win-probability-in-the-nfl/}, year = {2020} }