Hockey Numbers: Early Predictions

I’ve been having some fun predicting the outcome of the season with very little data. As many readers realized, these aren’t really predictions as such: they carry far too much error and forecast things everyone knows won’t happen, like Phoenix finishing with fewer than 30 points. This is largely because a single game’s goals against can significantly shift the average (a 9-goal game adds a full goal to a 9-game average, which is a lot in terms of winning percentage). So I needed a formula that gives a more accurate average for teams with anomalies in their data set (results that should only happen 1% of the time occurring in 10% of their games). The goal is to shrink the weight of any game whose score is too many standard deviations away from the team’s average, so that it counts for a smaller part of that average.

The nice thing about these predictions is that in general I know the standard deviation for goals against (the problem data). If a team has a game that’s more than 2 standard deviations away from its average, then that kind of game should only happen about 4 times in a season, and a game 3 standard deviations away about once every 5 seasons. In hockey, goals against per game have an average of 2.85 and a standard deviation of 1.7. Thus an average team should have about 8 games in a season with 5 or more goals against, and a bad team probably 16. So I take any game that’s beyond 1 standard deviation and give it a smaller weight. For standard equal weights you multiply every term by 1/n; for my weights I multiply by c_i/(Σc_i), where c_i = 1 if the game is within 1 standard deviation of the average and c_i = √(1.7/|μ − ga_i|) otherwise, with μ the team’s average and ga_i the goals against in game i. Compared with using the ratio 1.7/|μ − ga_i| directly, the square root keeps more weight on outliers; I used it because it produces slightly better results.
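To make the weighting concrete, here is a minimal Python sketch of it. It rests on a few assumptions of mine rather than anything stated exactly in the post: μ is taken to be the team's plain (unweighted) average, the deviation is taken as an absolute value so the square root is always defined, and the function name `damped_mean` and the sample scores are made up for illustration.

```python
import math

LEAGUE_SD = 1.7  # standard deviation of goals against quoted in the post


def damped_mean(goals_against, sd=LEAGUE_SD):
    """Weighted average of goals against that down-weights outlier games.

    Assumption: mu is the team's plain average, and the distance from it
    is measured as an absolute value.
    """
    mu = sum(goals_against) / len(goals_against)
    weights = []
    for ga in goals_against:
        deviation = abs(mu - ga)
        if deviation <= sd:
            weights.append(1.0)  # ordinary game: full weight
        else:
            weights.append(math.sqrt(sd / deviation))  # outlier: weight below 1
    total_weight = sum(weights)
    return sum(w * ga for w, ga in zip(weights, goals_against)) / total_weight


if __name__ == "__main__":
    # Hypothetical nine-game start with one 9-goal blowout.
    games = [2, 3, 1, 4, 9, 3, 2, 3, 3]
    print("plain average: ", sum(games) / len(games))
    print("damped average:", damped_mean(games))
```

With a blowout in the sample, the damped average should come out below the plain one, because the 9-goal game no longer counts as a full game.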

The neat thing about predictions in sports is that if you can define some algorithm you can test it on past data (and hope it works in the future). So, since I have 2005-2006 data, I can test this algorithm on the first few games of that season and see how well it performs. The first check for any sort of regression is how much error simply guessing the average has. The total sum of squared errors for the average is 7927. My original model actually increased the error (one team had an error of 68, contributing almost half of the total). So I reapplied the normalizing algorithm described above and got a sum of squared errors of 5317, or 67% of the baseline, for an r² of 33%. Put another way, with 12% of the games I was able to explain 33% of the final variability. 9 teams were within 5 points of their predictions (almost a third), and 19 were within 10. The standard deviation of the error was 13 (for a ±26 confidence interval 95% of the time), which is better than the 20 I predicted before. The worst prediction was Dallas, who got 38 more points than predicted (they had a lot of goals against early on, but won games), accounting for 27% of my error. While this is a regression-style analysis, the prediction is not based on a regression, but simply on the assumption that goals for and goals against correlate with winning (which is known).

The problem with comparing this year’s predictions to the 2005-2006 ones is that the start of the 2005-2006 season appears to have been more competitive than this year’s. There the best prediction was 109 points and the lowest was 60, creating reasonable minimums and maximums. This season teams don’t seem to want to be competitive: there are a significant number of good teams to start the season (Dallas, Anaheim, Ottawa, and Atlanta), not to mention the bad ones (Philadelphia, Chicago, Columbus, and Phoenix). The question of course is whether this season will be less competitive than past seasons (possibly a result of the new CBA; I’ll look into that later). What I’m basically trying to say is that the best algorithm for 2005-2006 won’t produce the best results for 2006-2007, but its results should be usable.

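So without further ado, here are the 2005-2006 results. PTS is the actual point total, EPTS the predicted total, and ERR the absolute difference between the two.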

WEST:

Team	PTS	EPTS	ERR
Detroit Red Wings	124	102.554	21.4
Dallas Stars	112	74.1465	37.9
Calgary Flames	103	89.7303	13.3
Nashville Predators	106	89.1933	16.8
San Jose Sharks	99	98.1116	0.9
Anaheim Ducks	98	91.138	6.9
Edmonton Oilers	95	89.3213	5.7
Colorado Avalanche	95	91	4.0
Vancouver Canucks	92	90.5752	1.4
Los Angeles Kings	89	88.5126	0.5
Minnesota Wild	84	85.8979	1.9
Phoenix Coyotes	81	89.3955	8.4
Columbus Blue Jackets	74	80.4389	6.4
Chicago Blackhawks	65	83.1687	18.2
St. Louis Blues	57	77.0534	20.1

EAST:

Team	PTS	EPTS	ERR
Ottawa Senators	113	107.501	5.5
Carolina Hurricanes	112	102.183	9.8
New Jersey Devils	101	79.0032	22.0
Buffalo Sabres	110	92.838	17.2
Philadelphia Flyers	101	92.983	8.0
New York Rangers	100	98.7356	1.3
Montreal Canadiens	93	88.2429	4.8
Tampa Bay Lightning	92	98.0173	6.0
Toronto Maple Leafs	90	80.0578	9.9
Atlanta Thrashers	90	84.3788	5.6
Florida Panthers	84	94.902	10.9
New York Islanders	78	81.3242	3.3
Boston Bruins	74	89.3152	15.3
Washington Capitals	70	70.1098	0.1
Pittsburgh Penguins	58	80.2557	22.3
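
The error figures quoted earlier can be rechecked from the PTS and EPTS columns of the two tables. The following Python sketch does that; the variable names are my own, and the baseline is assumed to be "guess the league-average point total for every team".

```python
# (team, actual points, predicted points), copied from the tables above
RESULTS = [
    ("Detroit Red Wings", 124, 102.554),
    ("Dallas Stars", 112, 74.1465),
    ("Calgary Flames", 103, 89.7303),
    ("Nashville Predators", 106, 89.1933),
    ("San Jose Sharks", 99, 98.1116),
    ("Anaheim Ducks", 98, 91.138),
    ("Edmonton Oilers", 95, 89.3213),
    ("Colorado Avalanche", 95, 91.0),
    ("Vancouver Canucks", 92, 90.5752),
    ("Los Angeles Kings", 89, 88.5126),
    ("Minnesota Wild", 84, 85.8979),
    ("Phoenix Coyotes", 81, 89.3955),
    ("Columbus Blue Jackets", 74, 80.4389),
    ("Chicago Blackhawks", 65, 83.1687),
    ("St. Louis Blues", 57, 77.0534),
    ("Ottawa Senators", 113, 107.501),
    ("Carolina Hurricanes", 112, 102.183),
    ("New Jersey Devils", 101, 79.0032),
    ("Buffalo Sabres", 110, 92.838),
    ("Philadelphia Flyers", 101, 92.983),
    ("New York Rangers", 100, 98.7356),
    ("Montreal Canadiens", 93, 88.2429),
    ("Tampa Bay Lightning", 92, 98.0173),
    ("Toronto Maple Leafs", 90, 80.0578),
    ("Atlanta Thrashers", 90, 84.3788),
    ("Florida Panthers", 84, 94.902),
    ("New York Islanders", 78, 81.3242),
    ("Boston Bruins", 74, 89.3152),
    ("Washington Capitals", 70, 70.1098),
    ("Pittsburgh Penguins", 58, 80.2557),
]

actual = [pts for _, pts, _ in RESULTS]
predicted = [epts for _, _, epts in RESULTS]
errors = [abs(a - p) for a, p in zip(actual, predicted)]

# Baseline: predict the average point total for every team.
mean_pts = sum(actual) / len(actual)
sse_mean = sum((a - mean_pts) ** 2 for a in actual)

# Model: squared error of the predictions in the EPTS column.
sse_model = sum(e ** 2 for e in errors)

r_squared = 1 - sse_model / sse_mean
error_sd = (sse_model / len(actual)) ** 0.5
within_5 = sum(e < 5 for e in errors)
within_10 = sum(e < 10 for e in errors)

print(f"SSE of the average:   {sse_mean:.0f}")
print(f"SSE of the model:     {sse_model:.0f}")
print(f"r squared:            {r_squared:.2f}")
print(f"error std deviation:  {error_sd:.1f}")
print(f"within 5 / within 10: {within_5} / {within_10}")
```

The printed values should land close to the figures quoted in the text, since the tables carry the predictions to several decimal places.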

Of course I cannot predict, or for that matter know, what teams will do to solve problems early on. Boston, for example (projected for 89 points), traded their top forward, Joe Thornton, to San Jose. San Jose in turn was predicted almost exactly because two errors compensated for each other: the bad prediction for Dallas (which over-predicts San Jose wins in games against Dallas) and the arrival of Thornton (better player, better team). This may not be all that useful at this point, but it’s a start. Unlike most predictions, I’m at least testing my hypothesis!

October 26, 2006
