## October 26, 2006

### Early Predictions

I’ve been having some fun predicting the outcome of the season with very little data. As many readers realized, these predictions aren’t really predictions as such: they carry far too much error and produce results everyone knows won’t happen, like Phoenix finishing with fewer than 30 points. This is largely because one game’s goals against can significantly affect the average (9 goals against in a single game raises the average over 9 games by roughly 1 goal, which is a lot in terms of winning percentage). So I needed a formula to get a more accurate average for teams with anomalies in their data set (results that should only happen 1% of the time occurring in 10% of their games). The goal is to scale game scores that are too many standard deviations away from the team’s average so they contribute a smaller part to the team’s average.
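To see how badly one blowout skews a small sample, here’s a quick sketch; the goals-against numbers are hypothetical:

```python
# Eight games at 2 goals against, then one hypothetical 9-goal blowout.
normal_games = [2] * 8
blowout = 9

avg_before = sum(normal_games) / len(normal_games)
avg_after = (sum(normal_games) + blowout) / (len(normal_games) + 1)

print(round(avg_before, 2))  # 2.0
print(round(avg_after, 2))   # 2.78 -- one game moved the average ~0.8 goals
```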

The nice thing about these predictions is that in general I know the standard deviation for goals against (the problem data). If a team has a game that’s more than 2 standard deviations from its average, that should happen only about 4 times in a season, and a 3-standard-deviation game about once every 5 seasons. In hockey, goals against occur with an average of 2.85 and a standard deviation of 1.7. Thus an average team should have about 8 games in a season with 5 or more goals against, and a bad team probably 16. So I take any game that’s beyond 1 standard deviation and give it a smaller weight. For standard equal weights you multiply every term by 1/n; for my weights I multiply by ci/(Σci), where ci = 1 if the game is within 1 standard deviation of the average and ci = √(1.7/|μ − gai|) otherwise (the absolute value keeps the square root defined). The square root increases the amount of focus it puts on outliers; I used it because it produces slightly better results.
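The weighting scheme above can be sketched as follows. This is my reading of it, assuming the denominator is the absolute distance |μ − gai| so the square root stays defined; the sample game scores are made up:

```python
import math

def robust_mean(goals_against, sd=1.7):
    """Down-weight outlier games when averaging goals against.

    c_i = 1 within one standard deviation of the plain mean,
    c_i = sqrt(sd / |mu - ga_i|) beyond it (absolute value assumed).
    """
    mu = sum(goals_against) / len(goals_against)
    weights = [1.0 if abs(ga - mu) <= sd else math.sqrt(sd / abs(mu - ga))
               for ga in goals_against]
    return sum(w * ga for w, ga in zip(weights, goals_against)) / sum(weights)

games = [3, 2, 3, 3, 2, 3, 3, 2, 9]        # one blowout in nine games
print(round(sum(games) / len(games), 2))   # plain mean: 3.33
print(round(robust_mean(games), 2))        # weighted mean: 3.03
```

The blowout still counts, but at weight √(1.7/5.67) ≈ 0.55 instead of 1, pulling the average back toward the team’s typical game.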

The neat thing about predictions in sports is that if you can define an algorithm, you can test it on past data (and hope it works in the future). Since I have 2005–2006 data, I can test this algorithm on the first few games of that season and see how well it performs. The first test for any sort of regression is how much error a plain average has. The total sum of squared error for the average is 7927; my original model actually increased the error (one team had an error of 68, contributing almost half of the total error). So I reapplied the normalizing algorithm mentioned above and got a sum of squared errors of 5317, or 67% of the baseline, for an r² of 33%. In other words, with 12% of the games played I was able to explain 33% of the final variability. 9 teams were within 5 points of their predictions (almost a third), and 19 were within 10. The standard deviation was 13 (for a ±26 confidence interval 95% of the time), which is better than the 20 I predicted before. The worst prediction was Dallas, who got 38 more points than predicted (they had a lot of goals against early on, but won games), accounting for 27% of my error. While this is a regression-style analysis, the prediction is not based on a regression, but simply on the assumption that goals for and goals against correlate with winning (which is known).
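The error accounting here is the standard one; a minimal sketch with placeholder inputs:

```python
def r_squared(actual, predicted):
    """1 - SSE(model) / SSE(mean baseline), the measure used above."""
    mean_a = sum(actual) / len(actual)
    sse_base = sum((a - mean_a) ** 2 for a in actual)
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - sse_model / sse_base

# Predicting the mean for every team gives r^2 = 0 by construction;
# perfect predictions give r^2 = 1.
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```

With the season numbers above, 1 − 5317/7927 works out to the quoted r² of about 33%.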

One problem with comparing my predictions for this year against the 2005–2006 ones is that the 2005–2006 season appears to have had a more competitive start than this year. Back then the best prediction was 109 points and the lowest was 60, creating reasonable minimums and maximums. This season, teams don’t seem to want to be competitive: there are a significant number of good teams to start the season (Dallas, Anaheim, Ottawa, and Atlanta), not to mention the bad teams (Philadelphia, Chicago, Columbus, and Phoenix). The question, of course, is whether this season will be less competitive than past seasons (possibly a consequence of the new CBA; I’ll look into that later). What I’m basically trying to say is that the best algorithm for 2005–2006 won’t produce the best results for 2006–2007, but it should be usable.

So without further ado, here are the 2005-2006 results.

WEST:

| Team | PTS | EPTS | ERR |
| --- | --- | --- | --- |
| Detroit Red Wings | 124 | 102.554 | 21.4 |
| Dallas Stars | 112 | 74.1465 | 37.9 |
| Calgary Flames | 103 | 89.7303 | 13.3 |
| Nashville Predators | 106 | 89.1933 | 16.8 |
| San Jose Sharks | 99 | 98.1116 | 0.9 |
| Anaheim Ducks | 98 | 91.138 | 6.9 |
| Edmonton Oilers | 95 | 89.3213 | 5.7 |
| Colorado Avalanche | 95 | 91 | 4.0 |
| Vancouver Canucks | 92 | 90.5752 | 1.4 |
| Los Angeles Kings | 89 | 88.5126 | 0.5 |
| Minnesota Wild | 84 | 85.8979 | 1.9 |
| Phoenix Coyotes | 81 | 89.3955 | 8.4 |
| Columbus Blue Jackets | 74 | 80.4389 | 6.4 |
| Chicago Blackhawks | 65 | 83.1687 | 18.2 |
| St. Louis Blues | 57 | 77.0534 | 20.1 |

EAST:

| Team | PTS | EPTS | ERR |
| --- | --- | --- | --- |
| Ottawa Senators | 113 | 107.501 | 5.5 |
| Carolina Hurricanes | 112 | 102.183 | 9.8 |
| New Jersey Devils | 101 | 79.0032 | 22.0 |
| Buffalo Sabres | 110 | 92.838 | 17.2 |
| Philadelphia Flyers | 101 | 92.983 | 8.0 |
| New York Rangers | 100 | 98.7356 | 1.3 |
| Montreal Canadiens | 93 | 88.2429 | 4.8 |
| Tampa Bay Lightning | 92 | 98.0173 | 6.0 |
| Toronto Maple Leafs | 90 | 80.0578 | 9.9 |
| Atlanta Thrashers | 90 | 84.3788 | 5.6 |
| Florida Panthers | 84 | 94.902 | 10.9 |
| New York Islanders | 78 | 81.3242 | 3.3 |
| Boston Bruins | 74 | 89.3152 | 15.3 |
| Washington Capitals | 70 | 70.1098 | 0.1 |
| Pittsburgh Penguins | 58 | 80.2557 | 22.3 |

Of course I cannot predict, or for that matter know, what teams will do to solve problems early on. Boston, for example (projected for 89 points), traded their top forward, Joe Thornton, to San Jose. San Jose was predicted exactly, but only because two errors compensated for each other: the bad Dallas prediction (which over-predicts San Jose winning games vs. Dallas) and the Thornton trade (a better player on a better team). This may not be all that useful at this point, but it’s a start. Unlike most predictions, I’m at least testing my hypothesis!