Hockey analysis often relies on highly variable statistics to produce averages that aren’t very accurate. What we care about is a player’s true level, but it is often masked by one great week or one bad week. A great start to the season can give an average player a great plus-minus for the entire season; whether or not it reflects his true skill, it shows up in the final number. One way of fixing this is to look at all the data between the start of the season and the current time and see what the trend is doing, rather than looking only at the final result. This removes some of the luck from the results and makes them more representative. For this example I’m going to use plus-minus statistics. Take Heatley: with 4.1 +/hr and 2.2 -/hr he is measured at +1.9/hr (+23 in 12.1 hours). However, a regression on his game-by-game data tells a different story.
The spike at the end, which took him from +12 to +23 in just 4 games, appears to be an anomaly. The regression spits out a slope of +1.18/hr, which seems a little more reasonable but is significantly different from the official average of +1.9/hr. You might also notice that the fitted line doesn’t start at the origin (it has him at +2 before the season starts). This is deliberate: letting the intercept float is an attempt to scale out luck at the beginning of the season, and it keeps the regression from chasing the results at the end. The goal is to find the slope that best fits the data. If you fix the starting point, the line can only swivel around that point, so the regression chases the final values, because a small change in slope has the most effect on the values furthest from the origin.
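To make the swivel argument concrete, here is a minimal Python sketch that fits the same cumulative plus-minus series three ways: a regression with a free intercept, a regression forced through the origin, and the simple average (final total divided by hours played). The series is made up for illustration (a steady pace with a hot first game and a jump at the very end); it is not Heatley’s actual game log, and the function names are my own.

```python
def fit_line(x, y):
    """Ordinary least squares with a free intercept: y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def fit_through_origin(x, y):
    """Least squares with the line pinned to (0, 0): y = slope * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data: cumulative ice time in hours and cumulative plus-minus.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
plus_minus = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 23]  # late spike at the end

intercept, free_slope = fit_line(hours, plus_minus)
origin_slope = fit_through_origin(hours, plus_minus)
simple_average = plus_minus[-1] / hours[-1]

print(f"free intercept : {free_slope:.2f}/hr (starts at {intercept:+.2f})")
print(f"through origin : {origin_slope:.2f}/hr")
print(f"simple average : {simple_average:.2f}/hr")
```

On this made-up series the free-intercept slope comes out lowest, the through-origin slope sits a little higher, and the simple average, which depends only on the final total, is the highest of the three.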
The benefit of this method shows up in the error. In general the standard deviation of the plus-minus statistic is sqrt(plusses + minuses), so Heatley, with about 50 plusses and 27 minuses, works out to roughly ±9. Of course it’s not quite that simple, because plusses aren’t purely independent of minuses, but the best the error could be is sqrt(plusses - minuses), which for Heatley is sqrt(23), or 4.8. Using the regression method we get a standard error for the fit of 0.7, which is much better than the simple average. You can see the same effect with Crosby, who is at +4.2/hr and -2.5/hr (difference = +1.7/hr): the regression says he’s playing at +2.3/hr, or 35% better than his simple average suggests, because he’s in a mini even-strength slump.
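The error arithmetic can be sketched in a few lines of Python. The counting error uses Heatley’s rates and ice time as quoted in this post; the regression part uses a made-up series, not real game logs, and the function names are my own. The slope standard error here is the textbook least-squares formula, which assumes independent residuals. Cumulative totals don’t really satisfy that, so treat the figure as optimistic and the whole thing as a sketch.

```python
import math


def counting_error(plus_rate, minus_rate, hours):
    """sqrt(plusses + minuses), with counts rebuilt from per-hour rates."""
    plusses = plus_rate * hours
    minuses = minus_rate * hours
    return math.sqrt(plusses + minuses)


def slope_with_standard_error(x, y):
    """Free-intercept least-squares slope and its textbook standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)  # two fitted parameters
    return slope, math.sqrt(residual_variance / sxx)


# Season-total errors, in goals (not per hour).
heatley_error = counting_error(4.1, 2.2, 12.1)
best_case_error = math.sqrt(23)

# Hypothetical cumulative series: hours played and plus-minus so far.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
plus_minus = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 23]
slope, slope_se = slope_with_standard_error(hours, plus_minus)

print(f"counting error  : +/-{heatley_error:.1f} goals")
print(f"best-case error : +/-{best_case_error:.1f} goals")
print(f"regression slope: {slope:.2f} +/- {slope_se:.2f} per hour")
```

Note the units: the two counting errors are in goals over the whole season, while the slope and its standard error are per hour of ice time.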
It’s a rather simple adjustment to how you look at the data, and it can produce some striking results. Most of the time it should give the same answer as a simple average, because a variable is more likely to sit close to its true mean than anywhere else. But in cases where a divergent short-term trend is significantly affecting the final number, the regression should produce better results.
The rest of the NHL results can be found on my website.