Data is king. When we’re making any decision of consequence about the future — whether in business, sports or politics — our tech-filled world demands that we collect data to make a meaningful prediction.
Despite the criticism that gets leveled at the NCAA tournament selection committee, the group does take a data-driven approach to choosing the competitors for March Madness. The RPI, or Ratings Percentage Index, helps the committee select and seed the field of 68 teams. The RPI takes into account all college basketball games and measures not only the strength of each team’s opponents but also the strength of its opponents’ opponents.
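For readers who want to see how those pieces fit together, here is a minimal sketch of the basic RPI calculation. The 25/50/25 weighting is the commonly published form of the formula, not something stated in this article, and the sketch leaves out the game-location adjustment the NCAA applied to wins and losses. The opponent percentages in the example are made up for illustration.

```python
def win_pct(wins, losses):
    """Fraction of games won; 0.0 if no games have been played."""
    games = wins + losses
    return wins / games if games else 0.0


def rpi(wp, owp, oowp):
    """Basic RPI under the commonly published weights (an assumption here):
    25% own winning percentage, 50% opponents' winning percentage,
    25% opponents' opponents' winning percentage.
    """
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Hypothetical inputs: a 25-9 team whose opponents won 58% of their games
# and whose opponents' opponents won 54% of theirs.
example = rpi(win_pct(25, 9), owp=0.58, oowp=0.54)
print(f"RPI = {example:.4f}")
```

Note that half the rating depends on opponents' records alone, and none of it depends on the score of any game.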
In contrast, the preseason AP and Coaches polls don’t use data from the current season. They can’t, as members must submit their ballots before seeing a single game. It’s the ultimate in gut-level, subjective analysis.
So surely the RPI rankings from the end of the season are a better predictor of tournament success than the preseason polls, right? Right?
Nope. The RPI doesn’t beat out the early polls — well, at least not by one measure. The margin between them is admittedly narrow — they all predict the correct result around 70 percent of the time — but the fact that it’s so close shows how remarkable the gut-instinct ratings are and how flawed the RPI is.
To find this result, I looked at how often the team that was ranked higher by each system won a tournament game. This study includes the past 16 years of men’s NCAA tournaments, or 1,045 games, including play-in rounds.
The RPI ranks every Division I team, so it makes a prediction in every tournament game. The polls, by contrast, rank only the top 25 teams, leaving most of the eventual March Madness field unranked. To partially correct for this, I included all teams that got votes in the polls and ranked them according to the same points system that determines the top 25. For game outcomes, any ranked team is predicted to beat any unranked team, a higher-ranked team is predicted to beat a lower-ranked one, and whenever two unranked teams meet, no prediction is made.
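The prediction rule described above can be sketched in a few lines. This is an illustration of the rule as stated, not the code used for the study. The function names and the sample point totals are hypothetical, and treating two teams with identical ranks as "no prediction" is an assumption, since the article does not say how ties were handled.

```python
from typing import Dict, Optional


def ranks_from_points(points: Dict[str, int]) -> Dict[str, int]:
    """Rank every team that received votes, most poll points first.

    Teams tied on points keep their input order (an arbitrary choice).
    """
    ordered = sorted(points, key=lambda team: points[team], reverse=True)
    return {team: position + 1 for position, team in enumerate(ordered)}


def predict(rank_a: Optional[int], rank_b: Optional[int]) -> Optional[str]:
    """Return "a" or "b" for the predicted winner, or None for no prediction.

    A rank of None means the team received no votes in the poll.
    """
    if rank_a is None and rank_b is None:
        return None  # two unranked teams: no prediction
    if rank_b is None:
        return "a"  # any ranked team is picked over an unranked one
    if rank_a is None:
        return "b"
    if rank_a == rank_b:
        return None  # assumption: identical ranks yield no prediction
    return "a" if rank_a < rank_b else "b"


# Hypothetical poll: three teams received votes, "Team D" received none.
ranks = ranks_from_points({"Team A": 1500, "Team B": 900, "Team C": 3})
print(predict(ranks.get("Team A"), ranks.get("Team D")))  # ranked vs. unranked
print(predict(ranks.get("Team C"), ranks.get("Team B")))  # both ranked
```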
Over the past 16 tournaments, the RPI’s higher-ranked team won 69.2 percent of tournament games (723 correct, 322 incorrect). Surprisingly, the preseason AP poll did slightly better, as the higher-ranked team won 71.8 percent of tournament games (674-265, with no prediction in 106 games).
The preseason Coaches poll picked games with the same 71.8 percent accuracy (671-263, with no prediction in 111 games).
[Table: Game outcomes picked]
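The percentages quoted above can be reproduced from the win-loss counts in the text. The short sketch below does that arithmetic. It also makes one detail explicit: each poll's accuracy is computed only over the games in which it made a prediction, so the three systems are not graded on exactly the same set of games.

```python
TOTAL_GAMES = 1045  # tournament games in the 16-year sample

# (correct, incorrect, no prediction), as reported in the article
results = {
    "RPI": (723, 322, 0),
    "Preseason AP poll": (674, 265, 106),
    "Preseason Coaches poll": (671, 263, 111),
}


def accuracy(correct, incorrect):
    """Share of predicted games in which the higher-ranked team won."""
    return correct / (correct + incorrect)


for name, (correct, incorrect, skipped) in results.items():
    # Every game is either a hit, a miss, or a non-prediction.
    assert correct + incorrect + skipped == TOTAL_GAMES
    pct = 100 * accuracy(correct, incorrect)
    print(f"{name}: {pct:.1f}% over {correct + incorrect} predicted games")
```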
To see the predictive power of the preseason poll, consider Wisconsin in last year’s tournament. The Badgers ranked ninth in the preseason AP poll, as they brought back Bronson Koenig and Nigel Hayes, two starters on the 2015 team that lost to Duke in the title game.
Wisconsin didn’t live up to preseason expectations during the 2017 regular season. The team was 25-9 heading into the tournament and ranked 35th in the RPI, which resulted in a No. 8 seed. However, the Badgers beat No. 1-seeded Villanova in the Round of 32 and then came within one point of beating Florida to make the Elite Eight.
What can we learn from this study? Is data no longer king when it comes to college basketball?
For one thing, the preseason polls harness the wisdom of crowds, a surprisingly powerful predictor. No one ballot is perfect, as each will make some bad calls and reflect a person’s biases. However, putting many ballots together helps cancel out these small errors and leaves a powerful predictor of team strength. FiveThirtyEight has known this for years and incorporated these polls in its NCAA tournament predictions.
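A toy simulation shows why pooling ballots helps. Everything in it is hypothetical: the "true" team strength, the number of voters and the size of each voter's error are invented for illustration. It also assumes voters err independently and without a shared bias, which real pollsters do not fully satisfy, so treat it as a sketch of the idea rather than a model of the actual polls.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

TRUE_STRENGTH = 10.0  # hypothetical "true" rating of one team
N_VOTERS = 65         # hypothetical panel size
VOTER_NOISE = 3.0     # hypothetical spread of each voter's error

# Each ballot is the truth plus that voter's own independent error.
ballots = [TRUE_STRENGTH + rng.gauss(0, VOTER_NOISE) for _ in range(N_VOTERS)]

# How far off is a typical single voter?
typical_individual_error = statistics.mean(
    abs(ballot - TRUE_STRENGTH) for ballot in ballots
)

# How far off is the consensus (the average of all ballots)?
consensus_error = abs(statistics.mean(ballots) - TRUE_STRENGTH)

print(f"typical single-ballot error: {typical_individual_error:.2f}")
print(f"consensus error:             {consensus_error:.2f}")
```

The consensus can never be further from the truth than the average individual ballot, and when errors are independent it is usually much closer. Averaging does nothing, however, about a bias that every voter shares.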
In addition, the RPI is a poor predictor because it restricts itself to wins and losses. More accurate methods use teams’ margin of victory or points per possession to make rankings and predictions. These approaches do a better job of stripping away the noise built into wins and losses, as a team’s record can look very different depending on the outcomes of a few fluke buzzer-beaters or blown calls.
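To make the point about wins and losses concrete, consider two invented teams with identical records. The scores below are hypothetical and chosen only to show how much information a record throws away. Average scoring margin is used here as the simplest stand-in for the margin-based methods mentioned above, not as any particular rating system.

```python
def record_and_margin(games):
    """Summarize a list of (points_for, points_against) results.

    Returns (wins, losses, average scoring margin).
    """
    wins = sum(1 for scored, allowed in games if scored > allowed)
    avg_margin = sum(scored - allowed for scored, allowed in games) / len(games)
    return wins, len(games) - wins, avg_margin


# Hypothetical results: both teams finish 3-1.
team_x = [(70, 69), (65, 64), (80, 78), (60, 75)]  # three narrow wins, one blowout loss
team_y = [(85, 65), (78, 60), (90, 72), (70, 71)]  # three easy wins, one 1-point loss

for name, games in (("Team X", team_x), ("Team Y", team_y)):
    wins, losses, margin = record_and_margin(games)
    print(f"{name}: {wins}-{losses}, average margin {margin:+.2f}")
```

A system that sees only wins and losses treats these two teams as equals, while the margins suggest Team Y is considerably stronger.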
This year’s tournament might be a particularly good one to demonstrate the predictive power of the preseason polls. Before the season started, Duke was ranked first and Michigan State was ranked second in both the AP and Coaches polls. However, these teams underachieved and ended up as the No. 2 and No. 3 seeds, respectively, in the Midwest region (and sixth and 15th in the RPI).
The preseason polls should remind us why these two teams are capable of a deep tournament run: They have superior talent. According to the latest ESPN NBA mock draft, Duke has three projected first-round picks (big men Marvin Bagley III and Wendell Carter Jr., and shooting guard Grayson Allen), with point guard Trevon Duval listed as a second-round pick. Michigan State features two projected lottery picks, Jaren Jackson Jr. and Miles Bridges, along its front line. Both teams have the next-level talent needed to win six straight games and cut down the nets on April 2.
Which team might be overrated according to the preseason polls? Virginia, the No. 1 seed in the South region, tops this list, as it didn’t crack the top 25 in either preseason poll. Virginia always plays incredible defense, but this team lacks NBA talent.
Data is king, but only if used properly. The RPI is a poor method for evaluating college basketball teams. To get an edge in your tournament pool, you might be better off ignoring the metric the committee uses and looking to the preseason polls, a surprisingly powerful predictor of the tournament.
Ed Feng has a Ph.D. from Stanford and developed the predictive algorithms for his sports analytics site The Power Rank.