I’ve been having some fun predicting the outcome of the season with very little data. As many readers realized, these aren’t really predictions as such: they carry far too much error and forecast things everyone knows won’t happen, like
The nice thing about these predictions is that I know the standard deviation of goals against (the problem data). A game more than 2 standard deviations away from a team’s average should happen only about 4 times in a season, and one more than 3 standard deviations away about once every 5 seasons. In hockey, goals against per game average 2.85 with a standard deviation of 1.7. Thus an average team should have about 8 games in a season with 5 or more goals against, and a bad team probably 16. So I take any game that’s beyond 1 standard deviation and give it a smaller weight. With standard equal weights you multiply every term by 1/n; with my weights I multiply by c_i/(Σc_i), where c_i = 1 if the game is within 1 standard deviation of the average and c_i = √(1.7/|μ − ga_i|) otherwise, with ga_i the goals against in game i. The square root keeps more of the weight on outliers than the plain ratio would; I used it because it produces slightly better results.
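Here is a minimal sketch of that weighting in Python. It is an illustration rather than the exact code behind these numbers. It assumes the distance from the average is taken as an absolute value, and it defaults to the league-wide mean of 2.85; pass a team's own average as `mean` if that is the intended μ. The function names are made up for this example.

```python
import math

LEAGUE_MEAN = 2.85  # average goals against per game (from the post)
LEAGUE_SD = 1.7     # standard deviation of goals against (from the post)


def game_weight(goals_against, mean=LEAGUE_MEAN, sd=LEAGUE_SD):
    """Return c_i: 1 within one standard deviation, sqrt(sd / distance) beyond it."""
    distance = abs(goals_against - mean)  # assumption: absolute distance from the mean
    if distance <= sd:
        return 1.0
    return math.sqrt(sd / distance)


def weighted_goals_against(games, mean=LEAGUE_MEAN, sd=LEAGUE_SD):
    """Average goals against, weighting each game by c_i / sum(c_i)."""
    weights = [game_weight(g, mean, sd) for g in games]
    total = sum(weights)
    return sum(w * g for w, g in zip(weights, games)) / total


# Hypothetical four-game start with one blowout loss.
games = [2, 3, 3, 8]
print(sum(games) / len(games))        # plain average
print(weighted_goals_against(games))  # blowout counts for less, so this is lower
```

When every game is within one standard deviation, all weights are 1 and the result is the ordinary average.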
The neat thing about predictions in sports is that if you can define some algorithm, you can test it on past data (and hope it works in the future). So, since I have 2005-2006 data, I can test this algorithm on the first few games of that season and see how well it performs. The first check for any sort of regression is how much error a plain average has. The total sum of squared errors for the average is 7927, and my original model actually increased the error (one team had an error of 68, contributing almost half of the total). So I reapplied the normalizing algorithm mentioned above and got a sum of squared errors of 5317, or 67% of the baseline, for an r² of 33%. Put another way, with 12% of the games I was able to explain 33% of the final variability. 9 teams were within 5 points of their predictions (almost a third), and 19 were within 10. The standard deviation was 13 (for a ±26-point interval 95% of the time), which is better than the 20 I predicted before. The worst prediction was Dallas, who got 38 more points than predicted (they had a lot of goals against early on, but won games), accounting for 27% of my error. While this is a regression-style analysis, the prediction is not based on a regression, but simply on the assumption that goals for and goals against correlate with winning (which is known).
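The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: it uses the two sums of squared errors quoted above, assumes 30 teams (the size of the NHL in 2005-2006), and the function names are invented for the example.

```python
import math

SSE_BASELINE = 7927  # squared error when every team is predicted at the average
SSE_MODEL = 5317     # squared error after reweighting outlier games
N_TEAMS = 30         # assumption: 30 NHL teams in 2005-2006


def r_squared(sse_model, sse_baseline):
    """Share of the baseline squared error that the model removes."""
    return 1.0 - sse_model / sse_baseline


def error_sd(sse, n):
    """Root mean squared error, used here as the standard deviation of the misses."""
    return math.sqrt(sse / n)


print(round(SSE_MODEL / SSE_BASELINE, 2))             # fraction of baseline error left
print(round(r_squared(SSE_MODEL, SSE_BASELINE), 2))   # about a third explained
print(round(error_sd(SSE_MODEL, N_TEAMS), 1))         # roughly 13 points
print(round(38 ** 2 / SSE_MODEL, 2))                  # Dallas's share of the error
```

A single 38-point miss contributes 38² = 1444 to the total, which is why one team can account for over a quarter of the squared error.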
The problem with comparing this year’s predictions to the 2005-2006 predictions is that the 2005-2006 season appears to have had a more competitive start than this year. There, the best prediction was 109 points and the lowest was 60, creating reasonable minimums and maximums. This season teams don’t seem to want to be competitive; there are a significant number of good teams to start this season:
So without further ado, here are the 2005-2006 results.
EAST: [results table not preserved]
Of course, I cannot predict, or for that matter know, what teams will do to solve their problems early on.