that they match the expected error of ±15 (95% confidence interval: [-30, 30]). This doesn't mean that expected goals for have no error of their own, but their error is smaller than the error in actual goals for, and that's what makes them a useful measure. The expected error for expected goals for per game is: √(3600*(SFPG/3600)*(1-SFPG/3600))*0.092 ≈ 0.5 goals per game.
Defense is much more complicated to measure accurately. As mentioned above, if a 9-1 game occurs early in the season, a team's defensive statistics are heavily affected by those 9 goals against, and the team will likely look worse than it really is. One cannot simply look at expected goals against, because those ignore the quality of the goaltending behind the defense, so I'm stuck using standard goals against. As above, you can estimate the error as √(3600*(GAPG/3600)*(1-GAPG/3600)) ≈ 1.58 goals per game (remember that there are around 2.5 goals against per game, so that error is about 63% of the goals-against rate). Note that expected goals for are roughly three times more accurate than actual goals.
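If you want to check those two error estimates yourself, here is a small sketch in Python. The shots-for rate of 30 per game and the 0.092 shooting percentage are assumptions I've plugged in to reproduce the numbers quoted above.

```python
import math

def binomial_per_game_error(rate_per_game, trials=3600):
    """Standard deviation of a per-game count, modeled as a binomial over `trials` one-second intervals."""
    p = rate_per_game / trials
    return math.sqrt(trials * p * (1 - p))

shots_for_per_game = 30.0        # assumed SFPG, roughly a league-average shot rate
goals_against_per_game = 2.5     # roughly the league goals-against rate

xgf_error = binomial_per_game_error(shots_for_per_game) * 0.092  # 9.2% average shooting
ga_error = binomial_per_game_error(goals_against_per_game)

print(f"expected goals for error per game: {xgf_error:.2f}")  # ~0.50
print(f"goals against error per game:      {ga_error:.2f}")   # ~1.58
```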
You can use the error-propagation rules for addition and subtraction, and for multiplication and division, of values with standard deviations to get the error for an average team (50%):
GF²: √(2*(0.5/2.5)²)*2.5² = 1.76
GA²: √(2*(1.58/2.5)²)*2.5² = 5.58
GF² + GA²: √(1.76² + 5.58²) = 6.12 (or 12 ± 6)
GF²/(GF² + GA²): √([6.12/(2*2.5²)]² + [1.76/2.5²]²)*0.5 = 0.36
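For anyone following along, here is a rough sketch of those propagation steps in code. Correlations between the numerator and denominator are ignored, as above, and depending on how much you round the intermediate values the outputs will come out slightly different from the figures quoted.

```python
import math

gf = ga = 2.5              # league-average goals for/against per game
gf_err, ga_err = 0.5, 1.58 # per-game errors from the previous step

# Squaring a value: sigma(x^2) = sqrt(2*(sigma_x/x)^2) * x^2
gf2_err = math.sqrt(2 * (gf_err / gf) ** 2) * gf ** 2
ga2_err = math.sqrt(2 * (ga_err / ga) ** 2) * ga ** 2

# Addition: absolute errors add in quadrature
denom = gf ** 2 + ga ** 2
denom_err = math.sqrt(gf2_err ** 2 + ga2_err ** 2)

# Division: relative errors add in quadrature
win_pct = gf ** 2 / denom
win_pct_err = math.sqrt((denom_err / denom) ** 2 + (gf2_err / gf ** 2) ** 2) * win_pct

print(gf2_err, ga2_err, denom_err, win_pct_err)
```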
Now, in order to apply this error estimate to teams with higher or lower winning percentages (the error estimate itself is linear), I need a method that never predicts a team winning more than 100% of the time or less than 0% of the time while still carrying the error above. This can be seen in the little graph.
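One way to get that behaviour is sketched below; this is not necessarily the exact curve in the graph, just an illustration of the idea. Applying the shift in logit space keeps the result strictly between 0 and 1 while staying nearly linear for mid-range teams.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def perturb_win_pct(win_pct, shift):
    """Apply a (possibly negative) shift to a winning percentage without leaving (0, 1)."""
    # 0.25 is the logistic's slope at p = 0.5, so dividing the shift by it makes
    # the adjustment roughly 1:1 with a plain linear shift for mid-range teams.
    return 1 / (1 + math.exp(-(logit(win_pct) + shift / 0.25)))

print(perturb_win_pct(0.50, 0.05))  # ~0.55: essentially linear in the middle
print(perturb_win_pct(0.90, 0.05))  # ~0.92: the same shift is compressed near the top
```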
Now, I'm not going to derive this here, but binomial error decreases as √(n)/n (that is, 1/√(n)), where n is the number of games played. The same approximations as above can be made, which gives a winning-percentage error that shrinks as teams play more games. So I now have an error estimate to go with the winning percentages. Using the Poisson toolbox I can get a prediction for every game based on goals for and against, and then apply the error rules above to distribute the errors appropriately.
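As a rough illustration of the Poisson step: the exact way the two teams' rates are blended here, through an assumed league average of 2.5 goals per team per game, is a simplification of my toolbox, and tied regulation scores are just split 50/50.

```python
from math import exp, factorial

LEAGUE_AVG = 2.5  # assumed league goals per team per game

def poisson_pmf(k, lam):
    return lam ** k * exp(-lam) / factorial(k)

def win_probability(gf_a, ga_a, gf_b, ga_b, max_goals=10):
    """P(team A beats team B); tied regulation scores are split 50/50 for simplicity."""
    lam_a = gf_a * ga_b / LEAGUE_AVG   # team A's expected goals against team B
    lam_b = gf_b * ga_a / LEAGUE_AVG   # team B's expected goals against team A
    p_win = p_tie = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, lam_a) * poisson_pmf(b, lam_b)
            if a > b:
                p_win += p
            elif a == b:
                p_tie += p
    return p_win + 0.5 * p_tie

print(win_probability(3.0, 2.3, 2.4, 2.8))  # hypothetical team rates
```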
Note that all these calculations assume the teams do not change, which is of course not true because of injuries and trades, but it's the best estimate available.
Now that you know how I determine who wins and loses each game, I can explain how I came up with my predictions. In order to get an average I cycle through the calculations below 10,000 times (this takes about 45 minutes). I could do more, but I calculated the error to be around 0.5% (the standard error of a 50/50 probability over 10,000 runs is √(0.5*0.5/10,000) ≈ 0.5%), or probably around 1 point, so 10,000 is a good balance between time and accuracy. I want to know 3 things: expected points, probability of making the playoffs, and probability of winning the division.
The first thing I do is draw a random value for how much better or worse a team could be, using the normal distribution and using the number of games played as a scale factor to shrink the error to the appropriate size. To get the error for an individual game I map the teams' linear errors onto a non-linear win-probability error (minimum 0, maximum 1), following the idea of the little graph above: games with a high probability of being won are more likely to be shifted down, and games with a low probability of being won are shifted up a little. Most games (win probabilities between roughly 30% and 70%) are barely affected by this adjustment, since the mapping is very close to linear in that range. It should be clear that I look at every game individually, so each team's schedule difficulty is included in this estimate. A sketch of one simulation pass is below.
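Here is that sketch. All of the names (schedule, base_win_prob, error, games_played) are placeholders rather than anything from my actual code, loser points for overtime/shootout losses are left out to keep it short, and perturb_win_pct is the bounded adjustment sketched earlier.

```python
import math
import random

def simulate_season(teams, schedule, base_win_prob, error, games_played, perturb_win_pct):
    """One pass over the schedule; returns each team's point total for this run."""
    points = {t: 0 for t in teams}
    # One "how much better or worse could this team really be" draw per team per run,
    # scaled down by 1/sqrt(games played) as described above.
    shift = {t: random.gauss(0.0, error[t] / math.sqrt(games_played[t])) for t in teams}
    for home, away in schedule:
        p = base_win_prob(home, away)                      # Poisson prediction, home team's view
        p = perturb_win_pct(p, shift[home] - shift[away])  # bounded, near-linear adjustment
        winner = home if random.random() < p else away
        points[winner] += 2
    return points
```

Repeating that 10,000 times and averaging the point totals, playoff flags and division-winner flags gives the three quantities listed above.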
No one really cares how I get there, but I now have the kind of predictions for making the playoffs and winning the division that many people remember from the end of last season. Here are the results for the West and for the East. You can see there are still a number of teams that are over- or under-estimated, and that some teams have performed so badly that this system never predicts them to make the playoffs.