Wednesday, March 18, 2015

Simulating the Tournament

I'm currently in the midst of Coursera's Data Science Specialization, where I'm learning to apply statistical methods I've picked up elsewhere using the R programming language.  Because the individual course projects aren't always the most stimulating, and because I'm not really using R at work yet, I decided I needed a project that would force me to truly engage with, and thus learn, the language.  With March Madness fast approaching, the choice of project was obvious: I was going to write a program to simulate the tournament.

Of course, there are already a lot of tournament simulations out there.  Ken Pomeroy always does log5 analysis based on his rating system.  The team at FiveThirtyEight does a slightly more comprehensive look, accounting for many of my favorite rating systems, as well as a few other factors.  These simulations are great, and provide a good overview of the proceedings.  To make this whole exercise worthwhile, and not just repeat what's already out there, I had to find a different angle.  Enter the concept of matchups.  Every March, you hear commentators wax poetic about how the tournament is all about matchups, meaning certain underdogs are better equipped than others to beat a particular favorite.  While I think a lot of this is post-hoc rationalization of otherwise random events, I can also accept that there might be specific combinations of traits that lead to unexpected outcomes.  What I hope to achieve with this model is to use as much data as possible to find any nuances that other models may miss.

The question that arises, then, is what we mean when we talk about matchups.  First, we need to identify specific aspects of team play that we can both ascribe to the overall quality of a team and quantify.  The best place to start is Dean Oliver's four factors, which are:

Shooting
Turnovers
Rebounding
Free Throws

These are typically measured through effective field goal %, turnover %, offensive rebound %, and FTA/FGA.  For reference, the standard definitions computed from box-score counts look like this (the season totals below are made up):
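# Standard four-factor calculations from season box-score totals
fgm <- 850; fg3m <- 220; fga <- 1900    # illustrative season totals
fta <- 600; tov <- 400
oreb <- 380; opp_dreb <- 820

efg_pct <- (fgm + 0.5 * fg3m) / fga        # effective field goal %
to_pct  <- tov / (fga + 0.44 * fta + tov)  # turnover % (0.44 is the usual FTA coefficient)
orb_pct <- oreb / (oreb + opp_dreb)        # offensive rebound %
ft_rate <- fta / fga                       # FTA/FGA

While those metrics serve their purpose from an evaluative standpoint, I needed to drill down a little further to be able to actually simulate a basketball game.  After some trial and error, I ended up collecting the following metrics for each team (offensively and defensively):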

Two-point shooting percentage
Three-point shooting percentage
Three-point attempt percentage (what percentage of shots taken are 3-pointers)
Free throw percentage*
Foul percentage (per "play")
Offensive rebound percentage (per opportunity)
Turnover percentage (per possession)
Adjusted Pace (possessions per 40 minutes)

*Free throw percentage is the one metric not adjusted at all for defense.

As you might be able to tell, the denominator for many of these metrics differs.  In particular, the difference between "possession" and "play" was something I needed to sort out.  One possession can last several "plays."  For example, a team can miss a shot, grab the rebound, and shoot again.  Since the ball never changed hands, that counts as one possession, but for the purposes of my sim I need to treat it as two separate plays.  I thus defined a play as the period of time between when a team gets the ball and when they give it up (through a turnover, a shot, or a stoppage due to a foul).  In the example above, the team with the ball could have turned it over before either shot, so I need to convert the possession-based metrics to play-based metrics in order to account for the possibility of a turnover at all times.
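The turnover conversion, at least, has a clean closed form.  If t is the per-play turnover rate and a non-turnover play keeps the possession alive with probability c (roughly, miss rate times offensive rebound rate), then the per-possession rate satisfies TO_poss = t + (1 - t) * c * TO_poss, which you can solve for t.  A minimal sketch (names and numbers illustrative, not my actual code):

# Convert a possession-based turnover rate to a play-based one.
#   to_poss: turnover % per possession
#   cont:    chance a non-turnover play extends the possession
#            (miss rate * offensive rebound rate, roughly)
to_per_play <- function(to_poss, cont) {
  to_poss * (1 - cont) / (1 - to_poss * cont)
}

to_per_play(0.18, 0.10)   # ~0.165: a bit lower per play than per possession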

As you can probably tell, a lot of this got fairly complicated.  I was originally going to go into more depth about each part, but I won't for now.  If you're hyper-curious about something, let me know and I'll show my work.  As I clean up certain parts of the model in the coming months, I will probably make individual posts about my findings in a given area, which will include my initial work on that subject.
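To at least give a flavor of the core loop, here is a heavily stripped-down, self-contained version of the idea: no fouls, free throws, pace adjustment, or opponent adjustment, and the team numbers are made up.  This is not my actual code, just the skeleton of a play-by-play sim:

# One possession: a sequence of plays that ends with a turnover,
# a made basket, or a defensive rebound.
sim_possession <- function(tm) {
  repeat {
    if (runif(1) < tm$to_play) return(0)                 # turnover
    three <- runif(1) < tm$att3                          # take a 3 or a 2?
    made  <- runif(1) < (if (three) tm$fg3 else tm$fg2)
    if (made) return(if (three) 3 else 2)                # made basket
    if (runif(1) > tm$orb) return(0)                     # defensive board
    # offensive rebound: possession continues with another play
  }
}

sim_game <- function(a, b, poss = 66) {
  pts_a <- sum(replicate(poss, sim_possession(a)))
  pts_b <- sum(replicate(poss, sim_possession(b)))
  pts_a > pts_b                                          # ignore ties for brevity
}

# Made-up team profiles: play-level TO%, 3PA share, 3P%, 2P%, ORB%
a <- list(to_play = 0.15, att3 = 0.38, fg3 = 0.37, fg2 = 0.51, orb = 0.32)
b <- list(to_play = 0.18, att3 = 0.31, fg3 = 0.33, fg2 = 0.47, orb = 0.28)
mean(replicate(10000, sim_game(a, b)))                   # rough win prob for team a

The real model layers fouls, free throws, opponent adjustments, and pace on top of this skeleton.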

For now, let's go to the results.  I ran the sim 10,000 times, which took about three hours to process.  Each column represents the number of times each team reached the round in question (the R1 column representing the number of times each team won the title; I have ordered the table by that column).  Like me, you may be a little surprised at who's first on the list:

[Results table: appearances by round for each team across 10,000 simulations, sorted by titles won]
My ramshackle model projects Wisconsin as the most likely winner of the tourney, winning it all nearly 15% of the time.  This, of course, flies in the face of most other models, which have Kentucky as the clear favorite (with a 30-50% chance to win).  What's most amazing is that Wisconsin (who would face Kentucky in the Final Four) won two-thirds of their semifinal games, suggesting they have a bit of an advantage over the Wildcats.  I don't necessarily think that's true, but it's at least fun to imagine that it is.  Now for a summary of the rest of my feelings about how this turned out.

The Good

The top seven teams in reality are the top seven teams in my sim, combining to win the tournament about 55% of the time.  This distribution shows that the model is at least figuring out which teams are the best, and assigning them wins more frequently.  55% is still probably lower than it should be*, but it's at least approaching the level where I would feel confident in it.  Aside from those top teams, the model does a good job of picking out the under-seeded teams I would expect to do well, like Utah, Texas, and Ohio State.

*It's a very top-heavy year.

The Bad

Even if my model is correct in saying that Kentucky is overrated, it is highly unlikely that they're that overrated.  The most glaring symptom (and this goes for other high seeds as well) is that every 15 and 16 seed has a 10-20% chance of pulling a major upset.  If just one or two of those teams had an elevated chance of winning, I wouldn't be taken aback, since I expect my model to find anomalies like that.  But if the model is raising the likelihood of all the lower seeds winning, then I probably need to make some adjustments to account for just how good the best teams are.  Even though we've seen a lot of notable upsets in recent years, we still haven't seen anything to suggest a 16-seed should have a 10% chance to win.

I did do a little debugging of my code, and found one anomaly worth exploring: each game had, on average, 10-15% more possessions than I would expect based on the teams' adjusted tempos.  I'm still not sure why this is happening, but it's odd that extra possessions seem to produce more parity rather than less.  My theory is that the extra possessions lead to more fouls, which lead to more high-scoring possessions for the underdogs once they get into the bonus.  This counteracts the disadvantage of having to play more possessions against a better team, and allows the underdog to win more often than it should.  If I can fix my tempo issues, that might solve a lot of my problems.
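One candidate worth checking (purely a guess at this point) comes straight from the play/possession distinction above: if any step schedules possessions but counts plays, offensive rebounds alone would inflate the count by roughly the right amount.  With illustrative numbers:

# Expected plays per possession, if each non-turnover miss can be
# extended by an offensive rebound (all rates illustrative):
to_rate   <- 0.18   # per-play turnover rate
miss_rate <- 0.52   # miss rate on shots
orb_rate  <- 0.30   # offensive rebound rate
p_extend  <- (1 - to_rate) * miss_rate * orb_rate
1 / (1 - p_extend)  # ~1.15 plays per possession

I haven't verified that this is what's happening, but it's the first place I'd look.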

The Future

Due to time constraints, I wasn't able to make a perfect model.  That didn't stop me from thinking long and hard about what changes and enhancements I would like to make in the future.  Those future modifications can be grouped into three buckets.

1. Past Performance may not equal True Talent

This might be the biggest drawback of my current model.  The inputs are simply each team's raw statistics over the 30 to 34 games it has played this season.  While it's certainly possible that many of these metrics have stabilized to the point where they largely reflect true talent, I don't know that for sure.
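The textbook remedy, which I may try next season, is to shrink each observed rate toward the D-I average, with less shrinkage for larger samples.  A minimal sketch (the prior strength and all numbers are made up):

# Shrink an observed rate toward a league-wide prior.
#   made/att:   team's observed makes and attempts
#   prior_rate: D-I average for the metric
#   prior_n:    attempts' worth of weight to give the prior
shrink <- function(made, att, prior_rate, prior_n) {
  (made + prior_rate * prior_n) / (att + prior_n)
}

# A team shooting 40% from three on 500 attempts, vs. a 34.5% average:
shrink(200, 500, 0.345, 300)   # ~0.379, pulled back toward the mean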

2. Opponent adjustments

In my current model, I adjusted teams' raw metrics for opponents on two levels.  One, I corrected their season-long metrics by the number of standard deviations the average opponent offense and/or defense was from the mean*.  Two, for each game I took the simple average of one team's offensive metric and the other team's defensive metric to get each team's talent level for that game (a rough sketch of both levels follows the footnote below).  While I believe these basic assumptions are a good start at adjusting for opponents, I don't think of this as a final state for the model.  There are likely subtleties to the metrics, such that they respond differently to defensive influence, both over the course of the season and within a game.

*I only regressed 3-point defense by 20% of the opponent's two-point strength.  Three-point percentage allowed doesn't seem to be indicative of skill on its own, but a team that defends twos well appears to use the same skills to close out on three-point shooters.
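In rough code, those two levels look something like this (every number below is illustrative):

# Level 1: season-long correction, in SD units of the metric.
raw_off   <- 0.500   # team's raw 2P%
opp_def_z <- 0.8     # avg opponent 2P defense, in SDs better than the mean
metric_sd <- 0.020   # SD of 2P% across D-I
adj_off   <- raw_off + opp_def_z * metric_sd    # 0.516

# Level 2: for one game, average the offense's adjusted metric
# with the defense's adjusted metric allowed.
opp_adj_def <- 0.490                            # opponent's adjusted 2P% allowed
game_fg2    <- (adj_off + opp_adj_def) / 2      # 0.503 for this matchup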

3. Special Circumstances

There are a number of special circumstances I did not account for:

I did not include blocks, on the logic that they're already baked into 2-point field goal defense, along with the assumption that the shooting team recovers about as many blocked shots as it does other offensive boards.
I did not account for the fact that it's easier to grab offensive boards on three-point attempts, although this should be a relatively easy adjustment in the near future.
I did not account for end-of-game theatrics.
I did not account for jump balls.
I assumed all shooting fouls were on two-point attempts (and I didn't adjust two-point shooting percentage to account for and-ones, so teams probably shoot slightly better than they should).
I didn't account for a million other little subtleties that make this model (like all others) imperfect.

If I'm being honest, correcting the deficiencies from parts #1 and #2 is more important, so I don't imagine many of these things will be improved upon in the near future.  Still, there are a lot of ways I can make this model better and better.  Hopefully, the results will be more in line with reality next season.


