Tuesday, May 15, 2012

Beer Pong Sabermetrics II

If you thought I was going to stop at one post about beer pong, then you were sorely mistaken.  I wanted to address a few things related to my previous post, and since I'll be all graduated soon, time was running out to use the software.  Thus, I spent some time during this gorgeous day where I didn't have school in a nearly-empty computer cluster running simulations.  Yup, I'm awesome.

So maybe some of my previous results were exaggerated.  My bad.

My original intention for this post was to add some graphs to show what I was describing in the original post.  Specifically, I wanted to show the distribution of results for the GB vs. AA match that had very different outcomes depending upon which team started.  As I did this, I noticed that about 4% of the games simulated ended in a tie.  It turns out that the 25 simulated turns in my spreadsheet weren't enough for some of the games, and this skewed my results towards the team that went first.  The stat I was using to determine who won averaged a cell that should have only been returning 1 or 2 depending on which team won.  Since some games were ending without a victor, this cell would return a 0, and skew the results.  Thus, all the numbers from my original post were wrong (oops).  Here are the correct results for select scenarios:
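For the curious, here's a rough Python sketch of how that kind of bug plays out. It is not the actual spreadsheet formula: the encoding (1 = first team won, 2 = second team won, 0 = game ran out of turns), the way the win share is read off the average, and the function names are all assumptions for illustration, and the data is made up.

```python
def naive_first_team_share(outcomes):
    """Win share for the first team, read off the mean of the winner cell.

    Only valid when every entry is 1 or 2. A 0 (unfinished game) drags the
    mean down, which inflates the first team's apparent share.
    """
    mean = sum(outcomes) / len(outcomes)
    return 2 - mean


def corrected_first_team_share(outcomes):
    """Win share for the first team among games that actually finished."""
    decided = [o for o in outcomes if o in (1, 2)]
    return decided.count(1) / len(decided)


# Illustrative data only: 48 wins apiece plus 4 unfinished games (about 4%).
outcomes = [1] * 48 + [2] * 48 + [0] * 4
print(round(naive_first_team_share(outcomes), 4))      # 0.56
print(round(corrected_first_team_share(outcomes), 4))  # 0.5
```

In this toy example, a dead-even matchup looks like a 56/44 edge for the team that went first, purely because of the unfinished games.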

GG vs. GG:  The team that goes first wins 50.25% of the time.
GB vs. AA:  GB wins 37.1% of the time when it goes first and 35.56% of the time when it goes second.  The expected cup differentials for GB turned out exactly the same for both scenarios (-1.12).
GA vs. AA:  GA wins 66.15% of the time when it goes first and 66.26% of the time when it goes second.  Cup differential was once again the same for both (+1.43).

As you can see, the base scenario is still basically even while the effects of the others are greatly lessened if not completely eliminated.  It appears that GB should still elect to go first in that scenario, but it doesn't actually increase win probability by 15%.  I was pretty surprised by the initial results, so this correction makes sense to me.

33% made is a strong player?  That seems low.

I selected 33% since that was roughly my skill level during practice and no one at the tournament severely outplayed me.  Thus, I theorized that that was a good estimate for a typical strong player.  Yes, I realize that people who play beer pong for a living probably have higher true talent levels than that, but I was mostly concerned with the majority of players who aren't as good.

That being said, a friend brought up an interesting point.  If all four players in a game could make every shot, then the team that goes first would win every time (ignoring rebuttals and overtime).  Thus, he theorized that as talent levels go up, there is an added benefit to going first.  Here is what the sim said:

All four players hit 40% of shots:  The team that goes first wins 50.09% of the time
All four players hit 50% of shots:  The team that goes first wins 49.19% of the time
All four players hit 80% of shots:  The team that goes first wins 45.83% of the time
All four players hit 95% of shots:  The team that goes first wins 55.78% of the time

It appears that games stay even as talent levels rise for quite a while.  At 80%, there is actually a slight dip for the team that goes first, but then the results look to satisfy our hypothesis as we get closer to 100%.  Regardless of this, it looks like the difference is never too extreme, and of course you're unlikely to encounter a true-talent 80% shooter anywhere (Yes, I'm sure we've all had games where we've hit 80% of our shots, but good luck averaging that rate).

Doesn't beer pong get harder as it goes along?

The other thing my friend proposed was that it's harder to hit cups later in the game.  While I normally wouldn't want to make this assumption without some empirical data on whether this is true, it's definitely worth simming.  Since we don't have any idea of what this effect is, I did a quick and dirty estimate where each cup removed reduces the odds of hitting a cup by 5% of the full-rack rate (thus, you're only 55% as likely to hit a one-cup rack as you are a full rack).  Since the initial true talent level now only represents one's likelihood of hitting a full rack, I upped the numbers a bit.  Here are the results for a couple of tests:
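The penalty described above can be sketched in a few lines of Python. This assumes a 10-cup rack and a linear penalty relative to the full-rack rate, which is what the 55% figure implies; the names `CUPS`, `DECAY_PER_CUP`, and `hit_probability` are made up for illustration and don't come from the spreadsheet.

```python
CUPS = 10             # full rack size (assumed)
DECAY_PER_CUP = 0.05  # quick-and-dirty penalty per cup already removed


def hit_probability(full_rack_p, cups_left):
    """Chance of hitting a cup when `cups_left` cups remain on the table."""
    cups_removed = CUPS - cups_left
    return full_rack_p * (1 - DECAY_PER_CUP * cups_removed)


# A 40% full-rack shooter at the start, halfway through, and on the last cup.
for cups_left in (10, 5, 1):
    print(cups_left, round(hit_probability(0.40, cups_left), 3))
```

So a player who starts at 40% is down to 22% by the time only one cup is left.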

All four players with initial true talent of 40%:  The team that went first won 49.18% of the time.
All four players with initial true talent of 50%:  The team that went first won 49.28% of the time.

It looks like we still get pretty even results, although these simulations show a slight advantage to the team that defers.  This could suggest that the building advantage for the second team that I described in the original post comes more into play here.  Still, the effect is so small that it probably won't make a practical difference.

Friday, May 11, 2012

Beer Pong Sabermetrics

Two weekends ago, I had the pleasure of attending the inaugural tournament hosted by State Line BPL.  While my team's fake meijer.com sponsorship didn't quite propel us to victory, I had quite the good time.  However, it was not all fun and games in Rockford.  Important philosophical questions were raised.  That is, when given the choice, should you go first or defer?

For those of you not familiar with the official WSOBP rules, the start of the game is slightly different from how you probably played beer pong in college.  Instead of a face-off, one team gets to select going first (with the caveat that the first turn of a game is only one shot) or deferring.  My team chose to defer whenever we had the choice, since it would provide us the first chance at a bonus shot (other difference: if you make both cups, you don't get a full send-back, just one extra shot).  The prevailing wisdom, however, was that the first shot was better because it essentially gave you an extra shot before the "real game" started.  I figured that either way, it probably didn't swing things too much.  Of course, that didn't stop me from spending a bunch of time trying to find out for sure.

First, I did a little thought exercise, just to get my bearings.  Let's say you're a pretty good player who can hit 1/3 of your shots (as is your teammate).  If you went first, then your first turn would have an expected value of 1/3 cup, since only one of you would shoot.  If you went second, you would both have a 1/3 chance of making a cup, which means you would have a 1/9 chance of making both of your shots.  You would then have a 1/3 chance of making your bonus shot.  Doing the math, this gives you an expected value of 19/27 cups on your first turn (1(4/9) + 2(2/27) + 3(1/27) = 19/27).  Thus, going second gives you an advantage of 10/27 expected cups through one full turn.  Of course, the team that went first then gets to take their turn, and will then have their own advantage, although this time only by 1/3 expected cup (assuming the same skill level). 
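If you'd rather not trust my arithmetic, the thought exercise above can be checked with exact fractions. This is just a sketch of the same calculation in Python; the function name is made up, and it assumes both teammates (and whoever takes the bonus shot) hit at the same rate `p`.

```python
from fractions import Fraction


def expected_cups_full_turn(p):
    """Expected cups from a two-shot turn with one bonus shot if both hit."""
    exactly_one = 2 * p * (1 - p)        # one hit, one miss
    both_then_miss = p * p * (1 - p)     # both hit, bonus shot misses
    both_then_hit = p * p * p            # both hit, bonus shot hits
    return 1 * exactly_one + 2 * both_then_miss + 3 * both_then_hit


p = Fraction(1, 3)
full_turn = expected_cups_full_turn(p)   # the deferring team's first turn
single_shot = p                          # the first team's one-shot opener

print(full_turn)                # 19/27
print(full_turn - single_shot)  # 10/27
```

The same function works for any shot percentage, so it's easy to see how the gap between the opening single shot and a full turn changes with skill.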

If we extend this out for this example (with two completely balanced teams), then Team A will always have an expected value of 1/3 more cups made after their turn.  Likewise, Team B will always have a slightly higher expected value of 10/27 more cups made after their turn. This means that Team A will have the advantage early on since they went first, but eventually those small differences in favor of Team B will swing the overall advantage their way.  All in all, this seems pretty balanced, and the hypothesis that comes from this is that it doesn't matter who goes first if the teams are equally matched.

Of course, if you have noticed the length of this post, you know I didn't stop there.  In one of my MBA classes, we've been using process simulation software to estimate various wacky things like fast-food drive-through wait times and roulette strategies.  One such tool, an Excel add-in called @Risk, allows you to use various distributions to run Monte Carlo simulations.  I used this to simulate beer pong games with various skill levels of players in order to determine if there are times when going first is an advantage or disadvantage.  Before I share my findings, a few assumptions and limitations:

1. I started out with three levels of beer pong players for simplicity's sake:  A good player with a mean shot percentage of 0.33 and a standard deviation of 0.1, an average player with a mean shot percentage of 0.25 and a standard deviation of 0.08, and a bad player with a mean shot percentage of 0.1 and a standard deviation of 0.03.  These players will be henceforth known as good (G), average (A), and bad (B).

2. I assumed a normal distribution for one's shot-making ability.  This may or may not be appropriate for the matter at hand, but until someone does an empirical study on this, I'm just going to use a normal distribution.

3. I didn't account for "hot streaks" or anything like that.  Since it's been shown in studies of other sports that hot streaks are generally just random variation, I would assume that beer pong is not any different (And yes, I'm classifying beer pong as a sport for the purposes of this exercise).  Yes, there could be a magical zone between sober and drunk where you're especially awesome, but odds are you're probably just remembering your best games and forgetting all the times you lost.

4. I didn't account for rebuttals/overtime, since I couldn't think of a simple way to implement them.  Perhaps I will follow up on this later, but I can't imagine it will sway the results by much.  In the situations where one team has a superior player, it may sway the numbers slightly towards that team if I were to include them.
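For anyone without @Risk, here is a rough Python sketch of a simulation built on the assumptions above. It is not a port of the spreadsheet, and several details are guesses: a 10-cup rack, each player's shot percentage drawn once per game from the normal distribution (with a small floor so every game ends), and the stronger shooter taking both the one-shot opener and any bonus shot. All function and constant names are made up.

```python
import random

CUPS = 10     # rack size (assumed)
MIN_P = 0.01  # floor on shot probability so every game finishes (assumed)


def draw_skill(mean, sd, rng):
    """Draw one game's shot probability, clipped to [MIN_P, 1]."""
    return min(1.0, max(MIN_P, rng.gauss(mean, sd)))


def take_turn(probs, cups_left, rng, single_shot=False):
    """Return the number of cups made on one turn (at most cups_left)."""
    best = max(probs)
    shooters = [best] if single_shot else list(probs)
    made = sum(rng.random() < p for p in shooters)
    # Both shots made: one bonus shot, no full send-back.
    if made == 2 and rng.random() < best:
        made += 1
    return min(made, cups_left)


def first_team_wins(first, second, rng):
    """Play one game (no rebuttals); teams are lists of (mean, sd) pairs."""
    teams = [[draw_skill(m, s, rng) for m, s in t] for t in (first, second)]
    remaining = [CUPS, CUPS]  # cups each team still has to hit
    shooter, opening = 0, True
    while True:  # play until someone wins, so no game ends in a tie
        remaining[shooter] -= take_turn(
            teams[shooter], remaining[shooter], rng, single_shot=opening
        )
        if remaining[shooter] == 0:
            return shooter == 0
        shooter, opening = 1 - shooter, False


def first_team_win_rate(first, second, games=10_000, seed=0):
    """Share of simulated games won by the team that goes first."""
    rng = random.Random(seed)
    return sum(first_team_wins(first, second, rng) for _ in range(games)) / games


G, A, B = (0.33, 0.10), (0.25, 0.08), (0.10, 0.03)
print(first_team_win_rate([G, G], [G, G]))
```

Swapping the two team arguments is the same as having the other team go first, so each matchup needs two runs. Since the details differ from the spreadsheet, don't expect the numbers to match mine exactly.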

Without further ado, here are my findings:

Good team vs. good team (GG vs. GG)

The original objective of this investigation was to see if two evenly matched teams could gain an advantage by going first or deferring.  As with all future tests, I ran 10,000 simulations of a beer pong game with players of the aforementioned skill levels.  For this scenario, the team that went second won 50.26% of the time.  From this, it would appear that deferring is the better strategy, but there are caveats.  First, I ran this a few other times, and occasionally got values slightly less than 50%, so this is not a hard and fast value.  Second, even if we accepted 50.26% as the true probability, that isn't really enough of an advantage to have practical application.
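To put a number on that first caveat, here is a quick Python sketch of how much a simulated win rate should bounce around from run to run. It uses the standard error of a proportion and assumes the simulated games are independent; the function names are made up.

```python
import math


def win_rate_standard_error(p, games):
    """Standard error of a win rate estimated from `games` independent games."""
    return math.sqrt(p * (1 - p) / games)


def z_score_vs_even(observed, games):
    """How many standard errors an observed win rate sits from 50%."""
    return (observed - 0.5) / win_rate_standard_error(0.5, games)


se = win_rate_standard_error(0.5, 10_000)
print(round(se, 4))                               # 0.005
print(round(z_score_vs_even(0.5026, 10_000), 2))  # 0.52
```

With 10,000 games, an even matchup has a standard error of about half a percentage point, so 50.26% is only about half a standard error away from a coin flip. That fits with the reruns occasionally landing below 50%.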

Good team vs. average team (GG vs. AA)

Next, I wanted to see if various "mismatches" change the equation.  In this case, I pitted an all-star team against a more average outfit.  Here are the superior team's chances of winning for the different choices:

Going first:  77.67%
Deferring:  77.88%

Once again, we see a slight advantage in deferring.  Of course, that advantage is too small to make a practical difference.

One stud vs. another stud (GA vs. GA)

Let's split up the good team from the previous example to see if "mixed" teams get different results.  In this case, the team that goes first wins 50.56% of the time, which suggests a slight reversal for this case.  Still, this is close enough to even to be dismissed.

Mixed bag vs. average team (GB vs. AA)

This is where it gets interesting.  Here's the familiar scenario where one player carries a team against a more balanced opponent.  The hope was to see if a more extreme situation might tip the balance one way.  The results for the "mixed bag" team (GB):

Going first:  45.45%
Deferring:  29.37%

I ran this several more times to confirm the wide spread, and every time I got very similar results.  It appears that you can add over 15% win probability just by going first if you find yourself in such a situation.

Of course, a spread this wide is very unexpected, so I had to dig deeper.  I actually had been calculating the average "cup spread" of each simulated game, but I haven't shown it yet because the metric hadn't added anything to the previous analyses, as it mirrored the win percentage results.  For this example, however, the results are quite interesting and shine a bit of light on the previous findings:

GB going first: GB wins by an average of 2.53, AA wins by an average of 3.90
GB going second: GB wins by an average of 3.43, AA wins by an average of 3.22

This shows that while GB is less likely to win in the second scenario, they are generally expected to produce larger wins.  This is likely because the second scenario gives the best player in the game more chances to hit bonus shots.  Thus, it looks like the difference between the two choices is actually pretty even.  By multiplying the two statistics together and subtracting, we can find the average cup margin between the teams:

GB going first:  GB expects a cup margin of -1.09
GB going second:  GB expects a cup margin of -1.2

Here we find a very small difference overall in how many cups the team is expected to hit regardless of the initial decision.  Getting the win is usually the main objective, so teams that find themselves in these situations are advised to concern themselves only with the first set of results.  However, if you find yourself in pool play of a tournament that only cares about cup differential, you might want to gamble on a bigger victory by going second (if you are indeed GB).  Likewise, if you need a big win (in terms of cups) in the final game of pool play to make the tournament, gambling on option 2 might be the better choice.

One stud vs. average team (GA vs. AA)

Here's another familiar scenario where everyone is competent, but one person excels.  Here are the probabilities of victory for the team with the stud (GA):

Going first:  67.53%
Deferring:  64.52%

Once again, we see the team with the best player seeing an advantage to going first.  It's not as pronounced in a more "balanced" game, but you'll still see an advantage going first in these scenarios.  Once again the cup statistics show the reason for this imbalance:

GA going first: GA wins by an average of 3.70, AA wins by an average of 3.08, GA expects a cup margin of +1.42
GA going second: GA wins by an average of 3.75, AA wins by an average of 2.87, GA expects a cup margin of +1.41

We see again that the actual difference in cup margin is very slight and that the option with the lower win probability for GA actually produces slightly larger wins when GA does win (3.75 vs. 3.70 cups).  Once again, it seems to be a matter of risk versus reward.

Conclusion

Overall, it doesn't seem to matter which option you choose for most scenarios.  There are a few things to consider when one player is better than everyone else, but in practice it may be hard to determine the true talent of a player, so these findings may not apply as often as you think.