Monday, March 14, 2016

NCAA Tourney Predictor 2.0

If you're a regular reader of the blog or a sentient web crawler, you may have thought my crowning analytical achievement in this space was the college football playoff predictor.  You would be wrong.  Four months in advance of creating a different kind of life, I have birthed a different entity with one very specific purpose: Picking the NCAA Men's Basketball Tournament. 

Why does this exist?

The simple answer is that winter is boring.  The complicated answer can be found in my post about last year's beta version.  Basically, I wanted to do something different (and thus worthwhile) by accounting for the importance that "matchups" supposedly play in March.  While I think that word is often thrown around to explain simple random variation, I still think it's possible that certain teams' construction might be more adept at beating certain other teams.  By breaking down teams to their key components and simulating games against the teams they actually face, I hope to glean some more subtle insights that other models may miss.

How does it work?

The basic explanation for this model is very simple: I simulate the tournament 10,000 times, and base the win percentages for each level on the results of those simulations.  The backbone that makes this actually representative of the teams involved is far more complicated.  Let's break the main concepts I wrestled with into separate sections.
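As a rough illustration of that simulate-and-count idea, here is a minimal Python sketch.  The four team names, their ratings, and the `win_prob` function are all placeholders I made up for the example; the actual model decides each game by simulating it play by play rather than with a single probability.

```python
import random

# Hypothetical ratings for a four-team bracket (assumption for illustration only).
RATINGS = {"Team A": 4.0, "Team B": 1.0, "Team C": 2.0, "Team D": 2.0}

def win_prob(a, b):
    """Placeholder chance that team a beats team b (not the post's game model)."""
    return RATINGS[a] / (RATINGS[a] + RATINGS[b])

def simulate_bracket(teams, rng):
    """Play one single-elimination bracket and return {team: games won}."""
    wins = {team: 0 for team in teams}
    alive = list(teams)
    while len(alive) > 1:
        survivors = []
        # Adjacent teams in the list meet each other in each round.
        for a, b in zip(alive[0::2], alive[1::2]):
            winner = a if rng.random() < win_prob(a, b) else b
            wins[winner] += 1
            survivors.append(winner)
        alive = survivors
    return wins

def advancement_odds(teams, n_sims=10_000, seed=2016):
    """Share of simulations in which each team wins round 1, round 2, ..."""
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1  # assumes a power-of-two field
    counts = {team: [0] * n_rounds for team in teams}
    for _ in range(n_sims):
        for team, won in simulate_bracket(teams, rng).items():
            for rnd in range(won):
                counts[team][rnd] += 1
    return {team: [c / n_sims for c in row] for team, row in counts.items()}

odds = advancement_odds(list(RATINGS))
for team, row in odds.items():
    print(team, row)
```

Each row holds the share of simulations in which that team won its first game, second game, and so on, which is the same shape as the output table at the end of this post.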

1. Team Statistics

To simulate the outcomes of games for the teams involved, I need to pull in several metrics to describe those teams' performance.  To avoid overly complicating things, I focus on the four factors, and add in a couple of wrinkles.  The full list of stats I use to simulate a game (for both offense and defense) is:

Two-point shooting percentage
Three-point shooting percentage
Percentage of shots taken from three
Free throw shooting percentage
Offensive rebounding percentage
Turnover percentage
Foul percentage
Adjusted Tempo

For every statistic except for the foul rate, I get my metrics from kenpom.com.  While Pomeroy does report on foul rate, he uses the metric FT/FGA.  There are reasons that this is a better measure of a team's true talent in this area, but it's kind of useless in a simulation.  Thus, I pull foul rate information from Team Rankings.  All data is up to date through the end of Championship Fortnight.

2. Team Statistic Adjustments

Most of the statistics above can be used out of the box.  If a team shoots 40 percent on three-pointers, I can plug that directly into the simulation (there are, of course, opponent adjustments...I'll get to those in a bit).  That said, two metrics (turnover percentage and adjusted tempo) need a little work done to be able to integrate with everything else.  The reason for this is a seemingly semantic difference that is actually quite important: the difference between possessions and plays.

The concept of the possession, which describes one team's trip to their end of the court, is the main underlying concept of mainstream basketball analytics.  Understanding the possession, and a team's performance relative to the number of possessions, helps analysts to tease out non-obvious truths.  Slow-paced teams such as Virginia and Wisconsin may not put up huge point-per-game numbers, but when we look at scoring per possession, we see a different story.  In the metrics I pull from Pomeroy's site, the adjusted tempo and turnover rate metrics both share a denominator of possessions.  From an analytical standpoint, there is no problem with this.  But when I turn to my simulation, I need to view these in a slightly different light - that is, through the concept of the "play".  Each individual team possession ends with a made shot, defensive rebound, or turnover.  Within each of these possessions can be one or more plays.  A team can shoot, miss, grab an offensive rebound, and shoot again.  This will count as one possession, but two plays.  From the standpoint of my simulation, I am looking to simulate one play at a time.  Thus, I need to know the likelihood that a team turns the ball over on each given play, as opposed to the whole possession.

So how do we do this?  Well, from a high level, it's pretty easy to grasp.  The ratio of turnovers per possession to turnovers per play is as such:

TORATE(POSS)/TORATE(PLAY) = 1/(1-EP)

Simple enough, right?  We simply take the ratio of turnovers per possession to turnovers per play, and set it equal to a multiplier based on the number of extra plays (EP) we would expect from that team.  Where does the term on the right come from?  Two explanations:  One, we can simply look at the right hand of the equation as Plays/Possessions.  The ratio of plays to possessions will always be plays/(plays-extra plays), which is roughly the form of that equation.  The second explanation uses infinite sums (fun!)  Let's suppose a team gets an "extra" play 10% of the time.  This means that the ratio of plays to possessions will be 1*(0.1)^0 + 1*(0.1)^1 + 1*(0.1)^2 + ....  If you're inclined towards math, you may notice this resembles the infinite sum (for all k from 0 to infinity) of the term 1/x^k, for x greater than 1.  This sum works out to 1 + (1/(x-1)).  In our case though, EP is decidedly less than 1, so we instead have to use 1/EP as our x term.  If we plug that in, and then reduce everything to its simplest form, we will eventually get 1/(1-EP) which is the term on the right side of the above equation.
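For anyone who would rather check the infinite-sum argument numerically than algebraically, here is a small Python sketch.  The 10% extra-play rate is the same illustrative figure used above; the function name and the 200-term cutoff are my own choices.

```python
def plays_per_possession(ep, terms=200):
    """Partial sum of 1 + ep + ep^2 + ..., the expected plays per possession."""
    return sum(ep ** k for k in range(terms))

ep = 0.10                    # a team gets an "extra" play 10% of the time
series = plays_per_possession(ep)
closed_form = 1 / (1 - ep)   # the right-hand side of the equation above

print(series, closed_form)   # both are about 1.1111
```

The two numbers agree to floating-point precision, so a team with a 10% extra-play rate runs roughly 1.11 plays per possession.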

The next step in this journey is to calculate the value of our EP metric.  To do this, we need to identify every time that a play is added to a possession.  These times are:

1. An offensive rebound after a missed field goal
2. A non-shooting defensive foul
3. An offensive rebound after a missed free throw

If we combine all of these situations, we get the following equation (the TO% you see has to be the per-play number, not the per-possession number we start out with.  Remember this in a couple paragraphs):

Extra Plays = (((1-FG%)*OR%)*(1-FOUL%-TO%)) + (FOUL%*NONSHOOT%) + (FOUL%*SHOOTFOUL%*(1-FT%)*OR%)

That's a lot of stuff.  Let's re-write that in words so it hopefully makes a little more sense:

Extra Plays = (Odds of rebounding a missed shot*Odds of taking a shot) + (Odds of drawing a non-shooting foul) + (Odds of rebounding a missed free throw)

There is one final wrinkle.  We need the extra plays number in order to calculate the per-play turnover rate.  But we need the per-play turnover rate to be able to calculate the extra play number.  This initially seemed like a breaking point for me, such that I wouldn't be able to move forward with the metrics at hand.  Luckily, some math happened.  Specifically, I noticed that my original equation could be re-stated as such:

F(TO1)/G(TO2) = 1/(1-H(TO2))

Where F, G, and H are functions representing turnover rate per possession, turnover rate per play, and extra plays, respectively (TO1 is the per-possession metric, while TO2 is the per-play metric).  If we plug in the values for each of these functions, we can actually simplify everything to find a function X such that TO2 = X(TO1).  This function is:

TORATE(PLAY) = TORATE(POSS) * (1 - x*y - z) / (1 - (TORATE(POSS) * x))

In this equation, x is the odds of rebounding a missed shot, y is the odds of not drawing a foul, and z is the odds of rebounding a missed free throw (for the algebra to line up with the Extra Plays equation, z also has to absorb the non-shooting foul term).  I could hardly believe this actually worked, so I had to demonstrate it through an example:


So yeah, all that work just to convert a metric from per-possession to per-play.  But it works, so that's all that really matters.
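The conversion can also be sanity-checked in a few lines of Python.  This is a sketch under assumptions: the team inputs (45% shooting, 30% offensive rebounding, 20% foul rate, 18% turnovers per possession) are invented for the example, the .387/.613 foul split and 69.1% free throw figure are the league-wide values used elsewhere in this post, and z is taken to be the sum of both foul-related terms from the Extra Plays equation.

```python
def extra_plays(fg, orb, foul, to_play, nonshoot, shootfoul, ft):
    """Extra Plays equation; to_play must be the per-play turnover rate."""
    rebounded_miss = (1 - fg) * orb * (1 - foul - to_play)
    nonshooting_foul = foul * nonshoot
    rebounded_ft = foul * shootfoul * (1 - ft) * orb
    return rebounded_miss + nonshooting_foul + rebounded_ft

def to_rate_per_play(to_poss, fg, orb, foul, nonshoot, shootfoul, ft):
    """Closed-form conversion from per-possession to per-play turnover rate."""
    x = (1 - fg) * orb                                        # rebound a missed shot
    y = 1 - foul                                              # no foul drawn
    z = foul * nonshoot + foul * shootfoul * (1 - ft) * orb   # foul-related extra plays
    return to_poss * (1 - x * y - z) / (1 - to_poss * x)

# Hypothetical team inputs plus the league-wide foul figures from this post.
team = dict(fg=0.45, orb=0.30, foul=0.20, nonshoot=0.387, shootfoul=0.613, ft=0.691)
to_poss = 0.18

to_play = to_rate_per_play(to_poss, **team)
ep = extra_plays(to_play=to_play, **team)
recovered = to_play / (1 - ep)   # should land back on the per-possession rate

print(to_play, ep, recovered)
```

Plugging the per-play rate back into the original ratio recovers the per-possession rate we started with, which confirms the circular dependency really does resolve.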

3. Foul Odds and Ends

In the previous section, you may have noticed the NONSHOOT% and SHOOTFOUL% metrics, and wondered what those were.  Well, hopefully the names are relatively straightforward: NONSHOOT% represents the share of fouls that are called without incurring free throws.  SHOOTFOUL% is simply the complement.  In my previous screenshot, you'll see I used values of .387 and .613 for these respectively.  Why?  Well, it involves more fun math.

To start my journey, I needed to find a couple of basic stats and work from there.  First, I determined that the average number of fouls a team commits per half is 9.2 (This was based on 2014-2015 data.  I may need to revisit this now that more fouls are called).  As I don't have game logs for every game, I then approximated a distribution of the specific occurrences of each distinct number of fouls, ranging from 1-15.  This gave me a rough idea of how many times teams get into the bonus and the double bonus.  I then compared this to the number of free throws taken per team per half (roughly 10), and determined that the average foul creates the opportunity to shoot 1.1 free throws.  Finally, the pre-tournament free throw rate for 2014-2015 was 69.1%.  Now, with a data model that approximates the typical reality for a college team, I needed to find the following metrics:

% of fouls resulting in shooting a free throw (mentioned above)
% of fouls coming during the act of shooting
% of those shooting fouls where the field goal is made (aka and-ones)

The first metric is used only in my possession-to-play adjustment from the last section.  My simulation uses team-specific foul rates, and then counts up the number of fouls as you would in a regular game, which means the simulation itself has no use for a general number like that.  What it does need, in order to function properly, is an idea of the second and third figures.  Once again, I did not have individual game logs that would help me determine these two figures, so I needed to use all of the knowledge gained from the previous paragraph to estimate these figures for college basketball at large.  I did so using the following set-up in Excel:


The "Free Throws" and "Free Throws Per Foul" rows were based on the frequencies in the "Percentage of Time" row and the two goal values.  I then used Solver to set those two goal values, such that my "Total FT Per Foul" metric equaled 1.1 (that metric was calculated by taking a SUMPRODUCT on the "Percentage of Time" row and the "Free Throws Per Foul" row).  Running Solver gave me the values you see in the screenshot.  I felt good about the results, as the ~20% mark for making the field goals matches up pretty well with what I've read on the subject.  Once I calculated those values, I was able to calculate the % of fouls resulting in a free throw quite simply, by taking another SUMPRODUCT, this time on the "Percentage of Time" row and "Percentage of Shooting Fouls" row.

The one caveat to all of this is that applying one value for every team probably ignores some inherent differences between the teams.  Change any one of those three inputs for an individual team, and you're likely to see those numbers move a bit.  That said, I don't feel too bad about keeping it simple for now, for a few reasons.  One, I doubt there is a wide range of true talent when it comes to drawing and-ones, and even if there is, it's not going to have a huge impact on the outcome of a game (and-ones are pretty rare).  Two, calculating these figures for each individual team would take a fair amount of computational time, which might slow down my simulation.  Sure, I might be able to find a way to approximate these numbers by running a regression on the inputs or something like that, but that would take more research.  I do hope to improve this in the long-term, but for now, my model does incorporate teams' foul rates on both offense and defense, and that should be a representative enough input to reasonably model reality.

4. Opponent Adjustments (out of game)

As mentioned earlier, the inputs to this model consist of several statistics that measure each team's talent in a number of areas.  However, I cannot use these measures as they are, because the numbers are not opponent-adjusted.  It's much harder to perform well against a tougher schedule, and that must be factored in if I am to recreate a realistic simulation.   

Luckily, opponent strength metrics are easy to come by.  The front page of Pomeroy's site has adjusted ratings for both opponents' offense and defense.  Of course, these numbers only apply to offenses and defenses at a high level.  What's the best way to apply these high-level metrics to each of the skill-specific metrics I'll be adjusting?

As it turns out, this is not a question with an easy answer, nor one with readily available research that I could use.  Long term, this is probably the number one opportunity for improvement in my model.  But for now, I feel I was able to come up with a decent approximation.

Using the data for all 351 teams, I took the five metrics I am adjusting for opponent strength (2FG%, OR%, TO%, Foul Rate, and 3FG%), and found the means and standard deviations for the population.  I did the same for the adjusted opponents' offensive and defensive ratings.  My first thought was to simply add these standard deviations together and be done.  So, if the Arizona defense was three standard deviations above average at preventing offensive rebounds, and if their opponents' offenses were one standard deviation above average, then the opponent-adjusted metric would show Arizona to be four standard deviations above average.  However, when I ran simulations based on this (using 2015 data), Kansas became the title favorite.  2015 Kansas was a good team, but was well below the other top-line teams in overall efficiency.  I expect my model to spit out some different results (otherwise what's the point?), but this was a bit too weird.  I quickly discerned that the reason for this was that this aggressive application of opponent adjustment was over-rewarding Kansas' supremely difficult schedule.

What I soon realized was that I had made a math mistake.  If a common set of opponents is one standard deviation above average on offense, we should not expect them to be one standard deviation above average in each individual component of that offense.  Rather, we would expect (on average) that these offenses are somewhere between 0 and 1 standard deviation above average in each individual component.  How then should I weight each component without any information as to opponents' proclivities in each area?  Well, recall that when we talk about the four factors, we generally consider their contribution towards making a good offense in the following proportions:

Shooting: 40%
Preventing Turnovers: 25%
Offensive Rebounds: 20%
Drawing Fouls: 15%

Given this, I did the following:  I created a spreadsheet with four random number columns, corresponding to each of the four factors (so the first column had random numbers from 1 to 40, and so on).  I then created a fifth column that summed the first four columns.  I then found the standard deviations of each column, and divided the standard deviation of the individual columns by the standard deviation of the summation column.  This gave me the following figures:

Shooting: .619
Turnover: .428
Offensive Rebounds: .351
Fouls: .235

After approximating the proportional differences in standard deviations by metric, I did one final thing before applying this to my opponent adjustments.  As it is unlikely that all of these skills are independent of each other (as they were in my crude spreadsheet goofery), I regressed all of these figures 50% towards one (so I added 1 to each, and divided by two).  Teams with good athletes are likely to be able to apply those skills across multiple areas.  Yes, some teams have wildly different skillsets and some teams openly choose not to excel in all areas, but I felt regressing 50% of the way was a good starting point.

To conclude this section, let's apply this modified adjustment to the earlier example.  Arizona is still credited with being three standard deviations above average at preventing offensive boards.  But now, we assume that the offenses they faced were only 1*(.675) = .675 standard deviations above average.  This means that Arizona's true talent metric now says they are 3.675 standard deviations above average instead of the original 4.  This led to better results in the simulations I performed afterward, so I will keep this logic for now.
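The Arizona arithmetic can be written out as a short Python sketch.  The weights are the four figures from my spreadsheet above; the function names and dictionary keys are just labels chosen for the example.

```python
# Component-to-total standard deviation ratios from the spreadsheet exercise.
RAW_WEIGHTS = {
    "shooting": 0.619,
    "turnovers": 0.428,
    "offensive_rebounds": 0.351,
    "fouls": 0.235,
}

def regress_toward_one(weight, amount=0.5):
    """Move a weight part of the way toward 1; 0.5 is the same as (1 + weight) / 2."""
    return weight + amount * (1 - weight)

def opponent_adjusted_z(team_z, opponent_z, factor):
    """Add the dampened opponent strength to the team's own z-score."""
    return team_z + opponent_z * regress_toward_one(RAW_WEIGHTS[factor])

# Arizona: +3 SD at preventing offensive boards, opposing offenses +1 SD overall.
arizona = opponent_adjusted_z(3.0, 1.0, "offensive_rebounds")
print(arizona)  # about 3.6755, which the text above rounds to 3.675
```

Setting `amount=0` would reproduce the fully independent weights, and `amount=1` would reproduce the original add-everything approach that made Kansas the favorite.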

5. Opponent Adjustments (in game)

The final item of note concerns what happens when I actually match up two teams against each other for a game.  Let's say Team A is a 30% true talent offensive rebounding team, and that their opponent (Team B) only allows 20% of opponents' misses to be rebounded.  How many offensive rebounds should we expect Team A to grab in this matchup?  My original model last year simply took the average of the two, but I knew this wasn't sufficient.  We know full well that things like 3FG% defense are tenuous at best, and so I wasn't sure how much credit we should give defenses for other things as well.  Luckily, Mr. Pomeroy spent the offseason tracking down exactly what I needed for my model.  This allowed me to apply the following weights to offense and defense:

Two-point shooting:  50% offense
Three-point shooting:  83% offense
Foul rate:  36% offense
Offensive rebounding:  73% offense
Turnover rate:  49% offense

Thus, for our earlier example, we would weight Team A's performance by 73% and Team B's by 27%.  This would give us an opponent-adjustment number of 27.3% for team A, which is what I would use in the actual simulation of a game between these two teams.
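In code form, the in-game blend is just a weighted average.  Here is a minimal Python sketch using the offense weights listed above; the function and key names are my own.

```python
# Share of each outcome attributed to the offense (the defense gets the rest).
OFFENSE_WEIGHT = {
    "two_point": 0.50,
    "three_point": 0.83,
    "foul_rate": 0.36,
    "offensive_rebounding": 0.73,
    "turnover_rate": 0.49,
}

def matchup_rate(offense_rate, defense_allowed, stat):
    """Blend the offense's rate with the rate the defense typically allows."""
    w = OFFENSE_WEIGHT[stat]
    return w * offense_rate + (1 - w) * defense_allowed

# Team A rebounds 30% of its misses; Team B allows 20%.
expected_orb = matchup_rate(0.30, 0.20, "offensive_rebounding")
print(round(expected_orb, 3))  # 0.273
```

For comparison, last year's straight average would have given 25% for the same matchup.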

You will notice in the linked article that defenses have very little control over opponents' possession lengths and free throw shooting.  Thus, I left those un-adjusted.  I will likely delve into possession length a bit more in the future, but as it's unlikely to move the needle much, it's not a top priority.

What I fixed from last year

If you clicked through to last year's post and looked at the first simulation, you probably noticed that something was off.  Namely, the highest-seeded teams weren't anywhere near as dominant as they should have been.  Given last year's top-heavy nature, the best teams should have been at least 95% to win their first game, instead of 80-90% as they were in my first run of the model.  After poring over the model, I noticed one key mistake: my opponent adjustment for turnovers was wrong.  The reason: Excel made me stupid.  I had copied over the opponent adjustment field from two-point field goals through all of the other metrics.  The problem with this is that while all of the other metrics improve in a positive manner (the higher your offensive rebounding rate, the better you are, etc...), committing turnovers is the opposite.  Thus, I needed to simply change a sign in a formula in my master spreadsheet and everything started to line up with what I would expect.

Things that are still outstanding

I have already mentioned many of the things I would like to fix in future versions of this simulation, but I thought I would summarize them, for transparency's sake.  Here are the items, in a rough order of importance:

1. Opponent adjustments (before game)
2. Individual team foul metrics
3. Possession length adjustments
4. Additional statistics (e.g., block rate)

While I got to a place that I am reasonably happy with for the opponent adjustments, it's still mostly guesswork.  My main project for next season will be conducting a little bit of research into determining the best way to adjust team statistics for their level of competition.  I also think it's worth investigating the foul issues I spoke about earlier, even if I don't think the changes will move the needle all that much.  On the smaller end, my algorithm to determine play length (I used a gamma distribution, so I wouldn't get a bunch of 1 and 2 second plays) is good but not perfect.  Really slow teams get sped up a bit, while faster teams get slightly slowed down.  One could argue that it's a good thing that I'm building a little incidental regression into my model, but still, I can probably improve this.  Finally, there are a few scenarios (jump balls, blocks, end-of-game weirdness) that my model completely disregards at this point.  I don't consider these minor aspects of gameplay super important, but they might be worth a little future investigation.
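To show why a gamma distribution helps with play length, here is a small Python sketch.  The shape of 4 and the 17-second average are assumed values picked for illustration, not the parameters from my model; `random.gammavariate` is the standard-library sampler.

```python
import random

def sample_play_lengths(mean_seconds, shape=4.0, n=10_000, seed=1):
    """Draw play lengths from a gamma distribution with the requested mean.

    shape=4.0 is an assumed value; a higher shape pulls mass away from zero,
    which is what keeps 1 and 2 second plays rare.
    """
    rng = random.Random(seed)
    scale = mean_seconds / shape          # gamma mean = shape * scale
    return [rng.gammavariate(shape, scale) for _ in range(n)]

lengths = sample_play_lengths(mean_seconds=17.0)
average = sum(lengths) / len(lengths)
share_under_two = sum(t < 2.0 for t in lengths) / len(lengths)

print(average, share_under_two)
```

With these assumed parameters the sample average sits near 17 seconds while only a tiny share of plays last under two seconds, whereas an exponential distribution with the same mean would produce many more very short plays.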

The Tourney, Predicted

Finally, here's what you came for: The output.  Each value in the table represents the number of times a team (the row) reached a certain round (the column).  For example, this means that in 10,000 simulations, Kansas made the Elite Eight a smidge over 56% of the time.  I have ordered this by championship probability.



The first thing that probably sticks out to you is the four Big 12 teams in the top six.  This made me worry that I still hadn't calibrated my opponent adjustments correctly (remember that the Big 12 was the strongest conference this year). That said, there are other Big 12 teams whose fates line up with their seed/ratings as well as other strong-scheduled teams (Virginia) whose championship odds don't seem exaggerated.  I am guessing that things are skewed slightly towards the power-conference teams (one example - I have the average 2-seed with about a 92% chance of winning in the first round, while KenPom averages around 88%), but this is starting to get closer to the truth.

I wanted to take a minute to point out a few odd results, and offer potential explanations.  The one that sticks out the most to me is Cincinnati receiving a 73% chance of beating Saint Joseph's in the first round.  These teams feature fairly equal efficiency ratings and strengths of schedule, so it's surprising that the result would differ so much from 50-50.  As it turns out, the main differentiator between the teams is three-point defense.  Saint Joseph's efficiency rating benefits greatly from being top 15 in the nation in preventing three-point makes, while Cincinnati has been quite poor at this.  But as we know, there is little predictive value in this, and my model compensates accordingly.  As a result, Saint Joseph's main strength is effectively neutralized, which gives the advantage to the Bearcats.

The second odd result is West Virginia, who has an 89% chance of beating a really good Stephen F. Austin team, and the second best chance of winning the whole tournament.  The key here seems to be West Virginia's excellence in certain areas (1st in raw offensive rebounding and 2nd in raw defensive turnovers) combined with their amazing strength of schedule (5th).  This combination of traits serves to make the Mountaineers number one by a distance in the opponent-adjusted versions of those measures (4 percentage points above the next best team in turnovers!)  As I said earlier, I think that I may be over adjusting in some instances, but the performances of teams like West Virginia will be a clear barometer of just how much I need to tip the scales in the future.

Sunday, March 13, 2016

Bracket Thoughts 2016

AHAHAHAHA CBS, THAT'S WHAT YOU GET FOR DRAGGING ON THE SELECTION SHOW...THE FIRST EVER LEAKED BRACKET.  TEAMS STUDYING THEIR OPPONENTS BEFORE THEY'RE OFFICIALLY ANNOUNCED.  I GUESS THAT'S WHAT HAPPENS WHEN YOU DEVOTE AIRTIME TO CHARLES BARKLEY, WHO DOESN'T EVEN WATCH NBA GAMES, LET ALONE COLLEGE.


*Ahem* Alrighty then, onto some more civilized discourse:

How did the Committee Do?



Ha...never mind the civilized discourse.  Last year's bracket was possibly the best ever, with few major seeding issues and no legitimate exclusion gripes.  This year's bracket is garbage.  I could go on for a while about why, but let's try our best to summarize it into three points:

1. The gosh-darned RPI

Last year, I spent a few paragraphs commending the committee on finally discovering advanced metrics like a decade after the rest of us.  Unfortunately, I wasn't even able to make it through the listing of the one-seeds before the shrieking terror that is the RPI showed its face:


The RPI wouldn't end there.  Before the bracket was revealed, we saw roughly a dozen seedings/inclusions signaling its presence (The Pac-12 getting a bump almost everywhere is the most egregious instance of RPI influence, but top ten caliber teams Indiana and Kentucky facing off in the second round is a good poster child).  I've written a lot about why the RPI is bad before, so let me sum it up by saying this: Teams (and even conferences) with the ability to do so are able to manipulate the rating by using their scheduling power to pick and choose the teams that will provide the magic combination of boosting one's schedule without suffering too many losses.  That such a metric is king of the selection room (in even years, at least) is abhorrent.

2. Mid-Majors need not apply


Of course, raw RPI ratings are not quite the end to every discussion in the selection room.  There's also lengthy discussion about victories against groups like the Top 50 RPI teams.  The exclusion of non-power teams like St. Mary's, Monmouth, and Valparaiso (as well as the criminal under-seeding of Wichita State) shows that the committee's criteria of "good wins" and "bad losses" creates a near-insurmountable wall for such teams.*  Over the past decade, we've had nearly one mid-major final four participant per year, but that still hasn't been enough to convince the committee that these teams might actually be occasionally, you know, good.  Even the RPI says they are.

*To spell this out: Big conference teams have more games against good teams and fewer against bad teams.  By design, this means they'll have more chances to beat good teams and fewer chances to lose to bad teams.  Furthermore, it's easier for the big-conference teams to get some of these games at home, which makes them easier to win.  So if you're simply adding up big wins or bad losses and calling it a day, you're showing your ass.

3. Inconsistency of criteria


My mention of Monmouth leads to my final point.  The committee may not be as consistent year-to-year as we would like, but they've always hammered home the importance of non-conference scheduling.  And sure, it's clear that some bubble victims (hi there South Carolina!) fell off the bracket because of that.  But what makes no sense is Monmouth, who had a non-conference strength of schedule rank of 20th and won five road/neutral games in the execution of that schedule.  I fully understand that Monmouth's computer numbers are poor, but given how much the committee ignored that in other instances,* it's surprising that the Hawks didn't make it in over non-conference scheduling misers such as Michigan (235th), Colorado (247th), and Pitt (332nd).

*Temple is 76th in BPI, 86th in KenPom, 98th in Sagarin, and 99th in LRMC.  How on earth is that an at-large team, let alone one that avoids Dayton?  The Owls might be the worst at-large selection the committee has ever made.

Mid-Major Thoughts

I generally like to reserve a little space in this post for a discussion of the mid-majors' conference tournament results.  This is more relevant than ever this year, with a seemingly high number of upsets making people question the value of determining automatic qualifiers in this manner.  Furthermore, every conference will decide their champion this way starting next year, as the Ivy ends its time as the final holdout.  I understand the frustration that some excellent seasons will go for naught, but here's the thing: Deciding conference champions in this manner is roughly just as fair as deciding it based on regular season records.

Of course, I am not the first one to have this thought, but I wanted to expand on the premise a bit.  From a purely theoretical standpoint, there are several legitimate defenses of the conference tournament.  First of all, the regular season generally only provides a clear champion a few times in any given season.  This season, conferences like the Big West and Ohio Valley saw a handful of teams finish within one or two games of the regular-season crown.  Because regular season crowns often come down to single games, the cry of the unfairness of tournament randomness can be just as easily applied to odd regular season results.  Secondly, ever-expanding leagues effectively mandate un-balanced schedules, which means that not everyone faces an equal path towards a regular-season crown.  Third, many conferences understand the randomness of conference tourneys and work to hedge their bets to a degree, granting the best regular season teams significant advantages such as home court or double-byes into the semi-finals.  The Sun Belt is a prime example of this, which last year set in motion a chain of events that led to the best bobblehead of all time.  Finally, conference tournaments are incredibly EXCITING.  Let's not forget that the whole point of this endeavor is to entertain.

From a practical standpoint, it might make even more sense.  Mr. Pomeroy shares his findings from the 2010 season in the link above, but it's easy to find a bunch of examples of the righteousness of conference tourneys from more recent years.  Setting aside the conferences where a bunch of teams are all roughly the same quality but one wins a game or two more, there are a number of times when the clearly best team wouldn't make the tournament if we awarded the crown to the regular season champ.  Take the 2014 WAC:

 
Or perhaps the 2012 Summit League:
 
 

In both of these cases, the better team (per Pomeroy) didn't win the regular season crown.  But luckily, they both got a shot to redeem their bad luck/poor performance and win the conference tournament.  And then we got Nate Wolters and Sim Bhullar into the big dance, so that's cool.

In the end, I'm not saying that conference tournaments are perfect.  There's going to be a lot of ridiculousness and a lot of heartbreak.  But as long as the small conferences only get to send one team to the dance, there won't ever be a perfectly fair way to make sure the "best" team makes it.  So we might as well go with the most exciting possible way to do it.

All that said, this year was a bit of a bloodbath.  As usual, I have singled out the conferences that had either a clear frontrunner (or two) or those with multiple top-100 teams:



While we'll still see a good number of the best that some small conferences have to offer, the victims of championship fortnight will cause the peripheries of the Cinderella class to suffer.  Valparaiso, Saint Mary's, and San Diego State would have made formidable 12 seeds; without them, we're a little less likely to see the patented 12 over 5 upset.  On the other end of things, some of the most promising potential 16-seeds (Texas Southern, Wagner) failed to advance, which means we'll probably have to wait another year for the upset to end all upsets.  Still, not all was lost.  There was a lot of stress over Monmouth's loss earlier in the week and their subsequent snub, but some models show that Iona is the more dangerous team, which doesn't even account for the fact that they have a potential NBA player in AJ English (many such upsets feature a stud like that).  Furthermore, other defeated one-seeds such as Hofstra, North Florida, and UAB would have been nice stories, but in reality those teams weren't any better than the teams that won in their place.  In all, the carnage was real, but won't have that major of an impact on the potential for chaos.

A Moment with My Teams

Per usual, let's spend a minute discussing my teams.  The two provide an interesting contrast as they both came off of extreme seasons (ND's first Elite Eight trip of the millennium, Creighton's worst season under McDermott), and ended up regressing to the mean, so to speak.  In other words, they both had fine seasons.

Thanks to the aforementioned tournament run last season, the baseline expectations for the Irish program were probably higher than ever at the start of the year.  A team that had lost its two best players was still ranked, even though (per usual) they added no instant-impact freshmen.  A core of Demetrius Jackson, Steve Vasturia, Bonzie Colson, and Zach Auguste was/is still quite good, but it's clear that Mike Brey deservingly receives a level of respect that he didn't have previously.  While the final results may have been slightly disappointing, there were enough highs* to justify my ever-growing trust in Brey.  I have no expectations whatsoever for the tournament, as it generally requires defense, but maybe that's for the best.

*Beating five top-25 caliber teams, carrying the #1 offensive efficiency rating for a few weeks, getting some nice contributions from the freshmen

As for Creighton, I certainly expected improvement over a dismal 2015, but I did not expect the team to feature easily the best defense of the McDermott era (not that that's a high bar).  By necessity, this wasn't a defense based on raw athletic dominance, or even particularly great individual performances (Groselle's block rate is the only impressive individual defensive stat on the team).  Rather, the team focused on closing out on shooters and providing smart help defense.  That the team could see modest success with such a simple approach gives me confidence for the future.  This newfound defensive competency helped contribute to a statistical resume as good as or better than many of the at-large teams selected today (the computers all liked the Jays roughly the same, as they ranked in the 40s in Pomeroy, BPI, Sagarin, and LRMC).

So why didn't they even get close to being selected?  Because of the same thing that haunted the team last year: poor late-game play.  After placing 332nd (out of 351 teams) in the Pomeroy "luck" metric* in 2015, the Jays were even worse this year, finishing 336th largely thanks to a 1-4 record in one-score games.  The standard analytical line is that there isn't such a thing as repeatable clutch performance, and that we should expect Creighton to be roughly as fortunate as everyone else going forward.  Still, when the luck dragons attack your team in back-to-back years, it can be a little unnerving.  Creighton's performed well in the clutch before, and they return enough talent to be able to do so again.  Here's hoping the Jays return to the dance next year.

*This simply measures the difference between expected results based on efficiency and actual win-loss results
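
For the curious, here is a rough sketch of how a "luck" number along those lines could be computed.  To be clear, this is not Pomeroy's actual formula: the function names, the Pythagorean-style expectation, and the exponent of 10 are all my own assumptions for illustration, and the example inputs are made up rather than Creighton's real numbers.

```python
def pythagorean_win_pct(points_for, points_against, exponent=10.0):
    """Expected win percentage from scoring efficiency.

    Assumption: a Pythagorean-style expectation with a hypothetical
    exponent of 10. This stands in for "expected results based on
    efficiency" and is not the real Pomeroy calculation.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def luck(wins, losses, points_for, points_against, exponent=10.0):
    """Actual win percentage minus the efficiency-based expectation.

    Negative values mean a team won less often than its efficiency
    numbers suggest it should have.
    """
    actual = wins / (wins + losses)
    expected = pythagorean_win_pct(points_for, points_against, exponent)
    return actual - expected


# Made-up team: 18-14 record, scoring 110 and allowing 100 points
# per 100 possessions.
example = luck(wins=18, losses=14, points_for=110.0, points_against=100.0)
print(f"luck = {example:+.3f}")
```

A team that outscores its opponents comfortably on a per-possession basis but drops most of its close games ends up with a negative number here, which is the general shape of what happened to the Jays.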

Let's finally get to the regional breakdowns.  In all fairness, I got to this quicker than CBS.

EAST

Even though I spent some time yelling at the committee, at least they had the good sense to make the regions relatively even in terms of overall strength.  Still, the East is probably the closest thing we have to a "group of death," with West Virginia, Kentucky, and Indiana all under-seeded to varying degrees.  Add in some great point guards (Demetrius Jackson, Kris Dunn, Yogi Ferrell, and Tyler Ulis), a vulnerable but still dangerous 2-seed in Xavier, and some great mid-majors, and you have yourself the best region of 2016.* 

* Just ignore Tulsa, a team that literally no one thought would make the tournament (and they're still not even the worst inclusion)

Best First Round Matchup: Kentucky vs. Stony Brook

It's beyond tempting to pick the turnover-fest in the 3-14 game ("Press Virginia" and SFA are the top two teams in the nation in defensive turnover rate), but I have to go with my very favorite matchup of the first round.  Normally, I would be perturbed that my favorite mid-major has to face off against a badly under-seeded team.  That said, Stony Brook has a few matchup advantages against a surprisingly porous Kentucky defense, which doesn't generate as many steals or rebounds as you would think.  It's probably a bit much to ask Jameel Warney to topple the Wildcats all by himself, but I feel distant echoes of Morehead State over Louisville in this one, so you never know.

Best Potential Matchup:  North Carolina vs. Indiana

This potential Sweet Sixteen battle would give us two great offenses with slightly different tendencies.  Sure, both teams love to crash the offensive boards with great frequency.  But whereas the Tar Heels' success is predicated on not turning the ball over and taking a lot of shots close to the basket, Indiana simply wants to make buckets from everywhere (they're top ten in both 2-point and 3-point shooting percentage).  We might not get a better clash of offenses all March.

The Pick for Houston:  North Carolina

Whoever the Heels face in the Sweet Sixteen is going to give them trouble, but if they can pass that test, I really like them against whoever emerges from the bottom of this bracket.  West Virginia's pressing ways probably won't work against a team with size, veteran leadership, and the proven ability to hold onto the ball.  When in doubt, take the team that was #1 in the preseason.


MIDWEST

The Midwest features several of the components that make up a great bracket.  Seniors such as Georges Niang and Roosevelt Jones get their final chances to shine on the biggest stage in the sport.  Strong mid-majors like Little Rock and Iona appear primed for a Cinderella run.  And Big East champ Seton Hall not only features an exciting young team, but also the first openly gay player in NCAA tournament history in Derrick Gordon.  That said, let's not kid ourselves about what this bracket is all about:  The third entry in the now-annual Michigan State-Virginia grudge match.

Best First Round Matchup: Iowa State vs. Iona

I've already mentioned that Iona features future NBA player AJ English.  What I haven't mentioned is that he gets to face off against a pretty bad defense whose only high-level skill is avoiding fouls.  Iona doesn't feature a front line that can dominate the offensive boards like last year's UAB squad (who beat the Cyclones in the first round), but they still have a shot at scoring a bunch early and forcing Iowa State to play catch-up.

Best Potential Matchup: Virginia vs. Michigan State

I love the idea of Denzel Valentine vs. Malcolm Brogdon deciding a Final Four berth, and I think it's as likely an Elite Eight pairing as possible.*  In terms of talent, depth, and execution, there isn't a team I like better in 2016 than the Spartans.  But their one Achilles heel (an inability to force turnovers) matches up reallllly poorly with Tony Bennett's veteran squad.  Michigan State might have to go outside their comfort zone a bit to stop a surprisingly great Cavalier offense, but there's no one I trust more than Tom Izzo to pull it off.

*Utah has come on strong as of late, but there just isn't a team that matches up great with either of these teams.  Both Michigan State and Virginia are in the top five of all the total team metrics I regularly cite.

The Pick for Houston: Michigan State

It's them or the Hoos.  Virginia doesn't have an X-factor like Deyonta Davis, so I'll go with Sparty for the moment.  Real scientific, I know, but it's possible that picking between these two teams is actually a Schrodingerian dilemma, so I'm fine with it.


SOUTH

The South is obviously dominated by the team at the top of the bracket, but there's a lot to like outside of Kansas.  Whoever comes out of the Cal-Maryland quartet is likely to have just as much talent as the Jayhawks (if not more).  Miami and Villanova aren't the strongest high seeds in the bottom half of the bracket, but they're still great senior-laden teams that are unlikely to beat themselves.  Arizona is the rare under-seeded Pac-12 team, and has a size advantage over Kansas.  And then of course, the one team in the bracket with significant Final Four experience lurks quietly in the play-in game.  This isn't the most challenging bracket ever for an overall #1 seed (that would be the one 2010 Kansas unsuccessfully faced), but it's not weak.

Best First Round Matchup: Wichita State vs. Vanderbilt

Sure, it's hard to pass up the second time in three years that an 8-seeded Colorado will be a heavy underdog in the first round (it's actually not hard).  But I'll take the game that features teams that are way too good to actually be playing in it.  Wichita's efficiency ranking of 12th is an outlier to be sure, but it's not that big of one, and it doesn't account for some of their poor performance coming during point guard Fred VanVleet's absence.  They're clearly a top-25 caliber team.  On the other side, Vanderbilt is a talent-laden team whose only major flaw (bad luck in close games - 346th in KenPom's luck metric) held them back from being great.  I don't actually love the matchup that much for the Dores - their lack of offensive rebounding or forcing turnovers plays right into Wichita's hands - but it's hard to count out a team with that much NBA talent.  I like the Shockers by a hair, but this should be one of the best games of the First Four era.

Best Potential Matchup: Kansas vs. California

It's long been known that Bill Self is wary of the three-pointer on offense, but things have been changing as of late.  That's good for the Jayhawks, because California leads the nation in two-point field goal defense.  Perry Ellis is going to find a way to score points, but I doubt Kansas can beat the Bears on two-pointers alone.  Add in Kansas' weird lack of depth (their player with the 5th most minutes played has played fewer than 40% of available minutes...how does that happen?) and Cal's proclivity to draw fouls, and this feels like a potential land mine for the overall #1 seed.

The Pick for Houston: Kansas

That said, I'm still picking Kansas.  Everyone in this bracket has the raw talent (Maryland, Cal) or the overall quality (Miami, Nova) to take down Kansas, but there's no one threat that scares me enough not to pick the favorite.


WEST

If you like odd brackets, this one's for you.  Other than a weakened Duke team, there isn't much in the way of perennial powers lurking in the West, which will likely allow for at least one school to have a program-best March.  Sure, the Oregon schools are both over-seeded, but I'm so excited to see Dillon Brooks and Gary Payton II in the tournament that I don't really care.  This feels like the bracket with the most potential for weirdness, which probably means it will be a straightforward affair.  Still, that would give us some amazing matchups in Anaheim, so I'm all for that.

Best First Round Matchup: Duke vs. UNC Wilmington

I'm skipping over the steal-fest between Oregon State and VCU so that I can point something out about Duke.  Let's take a quick look at what Duke has accomplished over the past decade:


Over the past seven seasons, Duke has had a top ten offense every year, and has not been seeded lower than their current 4 seed.  You may notice that there is a touch more variance when it comes to their defense.  Indeed, in the previous two years in which Duke displayed a lack of defensive skill (2012 and 2014), it ultimately resulted in a first-round exit to a far lesser opponent.  Obviously, two odd results are not deterministic, but when you combine this with Duke's depth problems, you have yourself a very vulnerable situation for Coach K.  UNC Wilmington isn't the most terrifying matchup Duke could have encountered, but the Seahawks do have nice balance, both among teammates (no one has a usage rate above 22%) and in their team metrics (nothing stands out, but nothing is bad either, except for a penchant for committing fouls).  I'm guessing Grayson Allen hits like 18 free throws and the Blue Devils prevail, but it won't be easy.

Best Potential Matchup: Oklahoma vs. Texas A&M

College basketball fans sometimes fawn a little too much over senior-laden teams that have played together for years.  But in this instance, you would have my permission to fawn as much as is possible.  Buddy Hield is the obvious draw here, but everyone on each starting five is going to contribute significantly if their team is to win.  Alex Caruso in particular has always been one of my favorites.  A friend who graduated from A&M once compared him to a poor man's Manu Ginobili, which is ridiculous, but also weirdly appropriate.

The Pick for Houston: Baylor

A bizarre bracket deserves a bizarre pick.  This isn't to say that Oregon and/or Oklahoma can't make the West a chalky affair, but without an obvious favorite, this feels like a good time to pick a non-obvious team.  The Bears have the opportunity to crash the boards against a few undersized teams, so why not?


FINAL FOUR

To be honest, I don't know who to go with at this point.  The three teams I liked most for the title (Sparty, UNC, and UVA, in that order) are all bunched up on one side of the bracket.  I'm wary of Kansas' lack of depth, but they might be the safest pick, regardless.  I'm not seeing an early release of who people are picking on ESPN yet, so I can't determine what the best chances for arbitrage are either.  I guess that means I'll stick with Michigan State for now, but that will almost certainly change as the week progresses.