FiveThirtyEight’s NBA Predictions: RAPTOR vs. ELO

Dane Van Domelen
Jan 26, 2020

Some evidence for independent prognostic value

FiveThirtyEight’s NBA Predictions

FiveThirtyEight publishes predictions for every NBA game. In fact, it publishes two sets of predictions: “RAPTOR” and “ELO”. For each game, each algorithm produces a point spread and a win probability.

Here’s the situation today:

The two algorithms disagree quite a lot on today’s slate: they pick opposite winners in 3 of the 6 games.

We’ve got two sets of predictions here, generated by perhaps the most well-known statistician in the world. For obvious reasons, it’s worthwhile to evaluate these algorithms, specifically to address the following questions:

  1. Is one better than the other?
  2. Do they provide complementary prognostic value?
  3. Do they provide prognostic value beyond the Vegas spread?

The third question is by far the most important. In order to make money in sports betting, you need a betting signal that remains prognostic after conditioning on the casino’s prediction.

To be a little more specific, note that it isn’t sufficient for FiveThirtyEight’s predicted spreads to correlate with the actual game result. I could achieve that myself, say by picking the team with the better record to win by 5 points in every game. Even just picking the home team to win by 1 point every time would be right more often than not. Either way, my predictions would carry some information about the eventual result, but they wouldn’t make me rich, because they wouldn’t be additionally prognostic beyond the casino’s spread.
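To make the distinction concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are synthetic, the casino is assumed to know the true strength gap exactly, and the names (`vegas_spread`, `naive_signal`) are hypothetical. It shows a signal that is clearly correlated with the result on its own, yet gets a coefficient near zero once the casino's spread is in the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_games = 5000

# Hypothetical setup: the casino's spread equals the true strength gap,
# while the naive signal is a noisier view of the same information.
true_gap = rng.normal(0.0, 6.0, n_games)
vegas_spread = true_gap
naive_signal = true_gap + rng.normal(0.0, 6.0, n_games)
result = true_gap + rng.normal(0.0, 12.0, n_games)

# On its own, the naive signal is correlated with the result...
marginal_corr = np.corrcoef(naive_signal, result)[0, 1]

# ...but it adds almost nothing once the casino's spread is included.
X = np.column_stack([np.ones(n_games), vegas_spread, naive_signal])
coef, *_ = np.linalg.lstsq(X, result, rcond=None)
intercept, vegas_coef, naive_coef = coef

print(f"corr(naive signal, result)  = {marginal_corr:.3f}")
print(f"coefficient on Vegas spread = {vegas_coef:.3f}")
print(f"coefficient on naive signal = {naive_coef:.3f}")
```

The quantity that matters for betting is the last coefficient, not the first correlation.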

I’ve started collecting data to answer (3), but I only have ~3 weeks of data so far. On the other hand, FiveThirtyEight publishes their predictions for the entire 2019–2020 season to date, which gives me a little over 3 months of data for addressing questions (1) and (2).

Dataset and variables

My dataset consists of 679 NBA games played between Oct. 22, 2019, and Jan. 24, 2020. Here’s what it looks like:

There are four candidate betting signals:

  • RAPTOR spread
  • RAPTOR P(Away Win)
  • ELO spread
  • ELO P(Away Win)

We don’t have to pick just one. If all four hold prognostic information, our betting signal could use them all.

As for our response variable, it depends on what type of bet we ultimately want to make. We’d use ‘Away Win’ if we’re ultimately interested in moneyline bets, and ‘Result’ if we’re interested in betting on the spread. Either could work, but I’m partial to the latter, as I think having all potential bets close to 50/50 is preferable.

Analysis

Basics

  • Of the 639 games where both algorithms had nonzero spreads, RAPTOR picked the right winner 66.4% of the time, and ELO 67.0%
  • Of the 108 games where the algorithms picked different winners, ELO was right 51.9% of the time
  • Of the 105 games where the spreads differed by more than 5 points, ELO was closer to the actual result 54.3% of the time
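The three comparisons above can be sketched roughly as follows. This is not the code behind the numbers in this post; the function name, the argument names, and the toy data are hypothetical. It assumes each spread and the actual result are signed margins from the same team's perspective, so that the sign identifies the picked (or actual) winner.

```python
import numpy as np

def compare_picks(raptor, elo, result, gap=5.0):
    """Summarize winner picks and closeness for two sets of signed spreads."""
    raptor, elo, result = (np.asarray(a, dtype=float) for a in (raptor, elo, result))

    # Winner picks only make sense where both algorithms pick a side.
    picked = (raptor != 0) & (elo != 0)
    r, e, y = raptor[picked], elo[picked], result[picked]
    raptor_right = np.sign(r) == np.sign(y)
    elo_right = np.sign(e) == np.sign(y)
    disagree = np.sign(r) != np.sign(e)

    # Closeness is judged on games where the two spreads are far apart.
    far_apart = np.abs(raptor - elo) > gap
    elo_closer = np.abs(result - elo) < np.abs(result - raptor)

    return {
        "raptor_accuracy": raptor_right.mean(),
        "elo_accuracy": elo_right.mean(),
        "n_disagree": int(disagree.sum()),
        "elo_right_when_disagree": elo_right[disagree].mean(),
        "n_far_apart": int(far_apart.sum()),
        "elo_closer_when_far_apart": elo_closer[far_apart].mean(),
    }

# Toy data, made up purely to exercise the function.
summary = compare_picks(
    raptor=[3, -2, 4, -6, 1, 8],
    elo=[5, 1, -2, -1, 2, 1],
    result=[10, 4, 3, -12, -5, 2],
)
for key, value in summary.items():
    print(key, value)
```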

Distribution of residuals

The residual for each game is simply the actual point spread minus the predicted point spread. Ideally, the distribution of residuals should be centered at 0 (accurate) and as narrow as possible (precise).

The two distributions look very similar to me. The mean residual was -1.51 for RAPTOR and -1.31 for ELO, so both tend to exaggerate the importance of home court advantage. The standard deviations were 12.7 for RAPTOR and 12.8 for ELO, so precision is similar as well.
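For reference, the residual summary is just a mean and a standard deviation of actual minus predicted. A minimal sketch, with a hypothetical function name and made-up toy numbers:

```python
import numpy as np

def residual_summary(actual, predicted):
    """Return the mean (accuracy) and sample SD (precision) of the residuals."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return residuals.mean(), residuals.std(ddof=1)

# Toy data: residuals are -3, -3, 2, -6.
mean_resid, sd_resid = residual_summary(actual=[2, -6, 4, 1], predicted=[5, -3, 2, 7])
print(f"mean residual = {mean_resid:.2f}, SD = {sd_resid:.2f}")
```

A mean far from 0 points to a systematic bias, while the standard deviation measures how widely individual predictions miss.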

The home court issue is interesting. It turns out that RAPTOR and ELO picked the home team to win in 69.1% and 70.5% of games, respectively. The home team actually won only 54.8% of the time.

Correlation matrix

Let’s look at the correlations among our four candidate predictors and response variable:

A few things to note here:

  • From the rightmost column, we see that RAPTOR spread and win probability are slightly more prognostic than ELO spread and win probability
  • The correlation between RAPTOR and ELO isn’t extremely high: 0.790 for point spread and 0.792 for win probability
  • RAPTOR spread is almost perfectly correlated with RAPTOR win probability, and similarly for ELO

On the last point, it seems that FiveThirtyEight uses a very simple method to map point spreads to win probabilities:

Based on this, I’m comfortable completely dropping the win probability variables from subsequent analyses. They’re redundant with the point spreads, and I much prefer using the spreads, as they’re on the same scale as the response variable.
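The redundancy check behind this decision can be sketched with a plain correlation matrix. The data below are synthetic, and the logistic curve is only a placeholder for the spread-to-probability mapping, not FiveThirtyEight's actual function. The point is simply that any smooth monotone mapping makes a probability column nearly collinear with its spread column.

```python
import numpy as np

rng = np.random.default_rng(1)
n_games = 679  # matches the sample size here; every value below is synthetic

raptor_spread = rng.normal(0.0, 6.0, n_games)
elo_spread = 0.8 * raptor_spread + rng.normal(0.0, 3.6, n_games)
result = 0.5 * (raptor_spread + elo_spread) + rng.normal(0.0, 12.0, n_games)

def placeholder_win_prob(spread, scale=6.0):
    # Arbitrary logistic curve standing in for the unknown mapping.
    return 1.0 / (1.0 + np.exp(-spread / scale))

columns = {
    "raptor_spread": raptor_spread,
    "raptor_prob": placeholder_win_prob(raptor_spread),
    "elo_spread": elo_spread,
    "elo_prob": placeholder_win_prob(elo_spread),
    "result": result,
}

# Rows of the stacked array are variables, which is what np.corrcoef expects.
corr = np.corrcoef(np.vstack(list(columns.values())))
for name, row in zip(columns, corr):
    print(f"{name:>14}", " ".join(f"{value:6.3f}" for value in row))
```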

So, we’re down to two candidate variables: the RAPTOR spread and ELO spread.

Linear regression analysis

Let’s look at the prognostic value of each algorithm separately:

The tendency for FiveThirtyEight to overweight home court advantage is clear here, as the y-intercept is negative for both algorithms. The slope is about right for RAPTOR (1.005) and a little too steep for ELO (1.064); a slope above 1 means actual margins run somewhat larger than the ELO spread implies. The R-squared is slightly higher for RAPTOR (0.193 vs. 0.182), suggesting RAPTOR is slightly more prognostic.
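A single-predictor fit of this kind takes only a few lines. The sketch below uses a hypothetical helper and toy data constructed to have an intercept of -1.5 and a slope of exactly 1. A perfectly calibrated spread would have an intercept of 0 and a slope of 1.

```python
import numpy as np

def simple_fit(predicted_spread, result):
    """Regress the actual result on one predicted spread."""
    x = np.asarray(predicted_spread, dtype=float)
    y = np.asarray(result, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    r_squared = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return intercept, slope, r_squared

# Toy data lying exactly on the line y = x - 1.5.
intercept, slope, r_squared = simple_fit(
    predicted_spread=[-4, -2, 0, 2, 4],
    result=[-5.5, -3.5, -1.5, 0.5, 2.5],
)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, R^2 = {r_squared:.3f}")
```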

Finally, we can use multivariable analysis to see whether the two algorithms provide complementary prognostic information. We saw earlier that the correlation between RAPTOR and ELO spread was 0.79. With 679 data points, I think it’s perfectly reasonable to include both in the same linear regression model.
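One way to put a number on "reasonable" is the variance inflation factor (VIF). That is not a diagnostic computed in this post, just a common rule of thumb: with two predictors, the VIF is 1 / (1 - r²), and values below roughly 5 are usually treated as unproblematic.

```python
# Correlation between the RAPTOR and ELO spreads, as reported above.
r = 0.79

# With exactly two predictors, each one's VIF depends only on their correlation.
vif = 1.0 / (1.0 - r ** 2)
print(f"VIF = {vif:.2f}")  # about 2.66
```

So the variance of each coefficient is inflated by a factor of about 2.7 relative to uncorrelated predictors, which is noticeable but tolerable with several hundred games.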

Here’s the result:

Both spreads are highly significant predictors, meaning they offer complementary (non-redundant) prognostic information. The R-squared is 0.210, which is modestly higher than the single-algorithm models (0.193 and 0.182).
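For readers who want to reproduce this kind of comparison, here is a minimal ordinary least squares sketch using only NumPy. The data are synthetic (two spreads correlated at about 0.8, both contributing to the result), and the helper `ols` is hypothetical; it is not the code that produced the table above.

```python
import numpy as np

def ols(predictors, y):
    """Fit y on an intercept plus predictors; return coefficients, t-stats, R^2."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r_squared = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, coef / se, r_squared

rng = np.random.default_rng(2)
n_games = 679  # same sample size as above; the values are synthetic

# Two spreads sharing a common component, so they correlate at about 0.8.
common = rng.normal(0.0, 5.0, n_games)
raptor = common + rng.normal(0.0, 2.5, n_games)
elo = common + rng.normal(0.0, 2.5, n_games)
result = -1.5 + 0.5 * raptor + 0.5 * elo + rng.normal(0.0, 12.0, n_games)

_, _, r2_raptor = ols(raptor, result)
_, _, r2_elo = ols(elo, result)
coef, t_stats, r2_both = ols(np.column_stack([raptor, elo]), result)

print(f"R^2: RAPTOR only = {r2_raptor:.3f}, ELO only = {r2_elo:.3f}, both = {r2_both:.3f}")
print("t-statistics (intercept, RAPTOR, ELO):", np.round(t_stats, 2))
```

The two-predictor R-squared can never be lower than either single-predictor value, so the t-statistics are what indicate whether each spread adds information beyond the other.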

Next steps

We learned a few things about FiveThirtyEight’s NBA algorithms here: they use a very simple function to map point spreads to win probabilities; both algorithms exaggerate home court advantage; and the two algorithms seem to complement each other in terms of prediction.

The next step is to test whether either algorithm is prognostic beyond the casino’s point spread. If neither is, they’re essentially worthless as betting signals. We’ll see!
