September 22, 2006

Pre-Season

The Battle of Alberta has an interesting article on the significance of the pre-season and its ability to predict the regular season. Using a nifty tool, Spearman’s rank correlation test, Colby Cosh concludes “that success in exhibition games does predict success in the regular season”. I will argue that points, and thus winning percentage, satisfy the normality, homoscedasticity and linearity assumptions, so I can use a standard regression. This isn’t too much of a stretch.
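For readers who haven't met the rank test, here is a minimal sketch of Spearman's rank correlation in plain Python: rank both series (ties share their average rank), then take the ordinary Pearson correlation of the ranks. The function names and the sample percentages are made up for illustration; they are not the data from either post.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical winning percentages, NOT real NHL data.
pre_season = [0.70, 0.55, 0.40, 0.60, 0.35]
regular_season = [0.58, 0.52, 0.49, 0.47, 0.44]
print(round(spearman(pre_season, regular_season), 3))
```

Because only the ordering matters, the rank test needs none of the normality or linearity assumptions that the regression below relies on.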

Colby Cosh used Yahoo! sports to compare the pre-season data of 2005-2006. I decided to include 2003-2004, so my data set has 60 data points.

Regression

The easiest tool is a regression of regular-season winning percentage on pre-season winning percentage, as one can see in the graph below. If you remove the constant (removing the constant creates a “bad” regression) you get a near “perfect” coefficient of 0.939 ± 0.04 (note: this doesn’t reject 1) [graph 1]. In fact this regression has better error properties than the one with a variable intercept (reg = 0.445 + 0.182 * pre [graph 2]). I don’t have the patience to compare these two regressions formally, but I will say the relationship “looks good”. Interestingly, goals for and goals against have similar regressions: Goals For Reg-Season = 0.945 * Goals For Pre-Season, and Goals Against Reg-Season = 0.962 * Goals Against Pre-Season.
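To see why dropping the constant can push the coefficient toward 1, here is a small sketch of both fits in plain Python. The five data points are invented so that they lie exactly on a line with intercept 0.4 and slope 0.2; they are not the 60 team-seasons used above, and the function names are my own.

```python
def fit_through_origin(x, y):
    """Least-squares slope for y = b * x (no constant term)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_with_intercept(x, y):
    """Least-squares (intercept, slope) for y = a + b * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data built as reg = 0.4 + 0.2 * pre, NOT real NHL data.
pre = [0.3, 0.4, 0.5, 0.6, 0.7]
reg = [0.46, 0.48, 0.50, 0.52, 0.54]

print("through origin:", fit_through_origin(pre, reg))
print("with intercept:", fit_with_intercept(pre, reg))
```

On this toy data the intercept fit recovers the true slope of 0.2, while the through-origin slope comes out at about 0.94. When both percentages average around 50%, forcing the line through zero mostly measures the ratio of the two averages, which is one reason to be cautious about reading a coefficient near 1 as a strong relationship.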

There’s not much to say here except to look at the actual graph of this regression. It’s not pretty, but you can see how the regression was fit and how it “makes sense”. You can also see it’s not chasing the outliers like a standard regression would. Another way of looking at these problems can be seen with this example: a team that performs at 70% in the pre-season has a confidence interval of 60% to 70% (this is the confidence interval of the “mean”), but you can only predict an individual team to within ±30% (worthless in the NHL), and prediction is the real question here. With the regression that includes the intercept the prediction intervals are cut by 50%, but they still cover most teams’ general success level for a season (not much better). So while there’s an obvious relationship between pre-season and regular season (as one would expect), it is not useful for prediction: the fit is an average, and there are a lot of below-average and above-average teams that make up that average.
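The gap between the interval for the “mean” and the interval for a single team can be sketched as follows. This is a rough illustration for a simple regression with an intercept, using a normal quantile in place of the exact t quantile (an approximation on my part) and invented data rather than the real 60 points.

```python
from statistics import NormalDist


def interval_half_widths(x, y, x0, level=0.95):
    """Return (fitted value, mean-CI half-width, prediction half-width) at x0.

    Uses a normal quantile instead of the t quantile, so the widths are
    slightly too narrow for small samples (simplifying assumption).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s = (sum(r * r for r in residuals) / (n - 2)) ** 0.5  # residual std. error
    z = NormalDist().inv_cdf(0.5 + level / 2)
    leverage = 1 / n + (x0 - xbar) ** 2 / sxx
    mean_hw = z * s * leverage ** 0.5        # uncertainty in the average
    pred_hw = z * s * (1 + leverage) ** 0.5  # adds one team's own scatter
    return intercept + slope * x0, mean_hw, pred_hw


# Hypothetical winning percentages, NOT real NHL data.
PRE = [0.30, 0.40, 0.45, 0.50, 0.55, 0.60, 0.70, 0.80]
REG = [0.42, 0.55, 0.40, 0.52, 0.61, 0.47, 0.58, 0.50]

fitted, mean_hw, pred_hw = interval_half_widths(PRE, REG, 0.70)
print(f"fitted={fitted:.3f} mean +/-{mean_hw:.3f} prediction +/-{pred_hw:.3f}")
```

The prediction half-width is always the larger of the two, because it includes the residual scatter of one team on top of the uncertainty in the fitted line. Adding more seasons shrinks the first interval but does little for the second.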



Conclusion

So while there’s a general tendency for teams to do well in the regular season if they’ve done well in the pre-season, it’s no guarantee, or even something to predict the regular season with. As a Vancouver fan I don’t exactly like these results, but I understand the pre-season is a poor predictor, so I’ll wait for the real thing before drawing any conclusions.

As an aside: Vancouver has to stay out of the box. So far, by my count, they’re at -8 in power play opportunities. Also, I hope Luongo improves his 0.839 save percentage.

Added Note:

I decided to include another variable in the model: the winning percentage of last season (it's a good control variable). If you do a regression with two variables, pre-season and last season, you get a regression that looks like: winning% = -0.011 + 0.565 * pre + 0.941 * last - 0.861 * pre * last. However, the only term that is significant in this model is last season's performance [the others should be removed, but I won't]. So the pre-season is a worse "predictor" than the previous season. For example, a team that is 100% in pre-season play and was 55% last season [playoff team] is predicted to perform anywhere from 42% to 78%, whereas a team that does 55% in the pre-season and 55% last season is expected to perform anywhere from 40% to 73%, so the pre-season doesn't make a huge difference.
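As a quick check on the example, the fitted equation can be evaluated directly. The coefficients are the ones quoted above; the function name is just a label I picked.

```python
def predicted_win_pct(pre, last):
    """Point prediction from the two-variable model with interaction term."""
    return -0.011 + 0.565 * pre + 0.941 * last - 0.861 * pre * last


# Perfect pre-season vs. average pre-season, both 55% last season.
print(round(predicted_win_pct(1.00, 0.55), 3))
print(round(predicted_win_pct(0.55, 0.55), 3))
```

The two point predictions come out at roughly 60% and 56%, which sit near the middle of the two ranges quoted above. Going from an average pre-season to a perfect one moves the prediction by only about four points.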

In other words, if you want a prediction, look at last season; it's much more informative than the pre-season...

2 comments:

Earl Sleek said...

Heh. I had a feeling you'd chime in on this one. Keep that Cosh honest over there.

Kent W. said...

Thanks for the post. It confirms my non-mathematically derived assumptions about preseason success.