November 17, 2006

Predicting Which Games Go to Overtime

Problems with current predictions

So I’m going over 2005-2006 data to enhance my standings predictions. I was a little shocked that, for example, the Northwest division, when sorted by point projections, was negatively correlated with actual points. In other words, Minnesota, currently at 22 points, is projected to finish worst (75 expected points), while the Avalanche, with only 16 points, are projected to get 109. So I tediously went through the 2005-2006 data testing alternative algorithms. Of course one season of testing isn’t going to be conclusive, but it’s the only data set complete enough to test with (I need a lot of shot data, which I don’t have for 2003-2004, plus the NHL has changed since then).

My current algorithm simply compares expected goals to actual goals against (with a special averaging function that scales down blowouts); this was the best predictor of the last half of the season based on the first half’s data. Certainly Minnesota is falling because they miss Gaborik; I’m not sure my algorithm can pick up a change that quickly, but obviously he’s an important part of the team. The Canucks, who have a negative goal differential, are predicted to get 104 points, which doesn’t make sense either. Then again, their shooting percentage is at a low 5.6% when I was expecting 6.6%, so assuming expected goals accurately predict scoring, the Canucks should shoot at 6.6% the rest of the season rather than 5.6% (they’ve simply faced tougher goaltending so far). It’s hard to argue with a prediction that does better than a standard Pythagorean prediction, so despite its problems I decided to keep it.
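
For reference, the standard Pythagorean prediction I’m benchmarking against can be sketched like this. This is a generic illustration, not my actual algorithm: the exponent of 2 is the classic assumed value, the function name and the sample numbers are mine, and it ignores overtime-loser points for simplicity.

```python
# Pythagorean expectation: project win% from goals for/against,
# then convert remaining games into standings points.

def pythagorean_points(goals_for, goals_against,
                       games_left, points_so_far, exponent=2.0):
    """Project end-of-season points from goal totals (illustrative only)."""
    gf, ga = goals_for ** exponent, goals_against ** exponent
    win_pct = gf / (gf + ga)
    # 2 points per projected win; overtime-loser points are ignored here.
    return points_so_far + 2 * win_pct * games_left

# e.g. a team with 60 GF, 70 GA and 16 points through 20 games:
print(round(pythagorean_points(60, 70, 62, 16), 1))  # → 68.5
```

A team outscoring its opponents projects above .500, one being outscored projects below; the expected-goals approach described above replaces the raw goal totals with shot-based estimates.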

About Overtime

But in the process I discovered something interesting about overtime. Intuitively, reaching overtime should be a function of skill: two evenly matched teams should be more likely to go to overtime than, say, Phoenix and Detroit. However, a quick binary regression shows there’s little significance to this assumption on a game-by-game basis. 22% of all games go to overtime, with little of that explained by skill; if anything, the regression shows a slight tendency for good teams to go to overtime less. Put another way, Ottawa and Detroit went to overtime noticeably fewer times than average teams did. Maybe more interesting is that there is virtually no correlation between winning percentage outside overtime and teams’ records in overtime. It shouldn’t be too much of a surprise that the overtime system in the NHL is essentially random. I’m saying all this because I have good reason to conclude that overtime occurs randomly given any two teams, and that the results once in overtime are completely random. Basically, every game has about a 1-in-4 (22%) chance of going to overtime, and then each team has a 50% chance of winning.
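
A quick way to see why "no correlation" is plausible: even if every team’s overtime result really is a coin flip, the observed overtime records will still spread out enough to look like "good" and "bad" OT teams. This sketch simulates one season under exactly the model above (22% OT rate, 50/50 result, both assumptions from the text):

```python
import random

random.seed(1)

# Model from the post: every game has a 22% chance of reaching
# overtime, and once there each team wins with probability 0.5.
OT_PROB, GAMES, TEAMS = 0.22, 82, 30

ot_win_pct = []
for _ in range(TEAMS):
    ot_games = sum(random.random() < OT_PROB for _ in range(GAMES))
    ot_wins = sum(random.random() < 0.5 for _ in range(ot_games))
    ot_win_pct.append(ot_wins / ot_games)

# Every team is identical by construction, yet the observed OT
# records range widely -- "clutch" OT teams appear by chance alone.
print(round(min(ot_win_pct), 2), round(max(ot_win_pct), 2))
```

With only ~18 overtime games per team, a coin-flip model routinely produces teams near .700 and teams near .300 in overtime, which is why OT records tell you almost nothing.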

Random Overtime

The consequences may not be obvious immediately, but the first thing that comes to mind is that this guarantees 22% of the NHL standings are the result of pure randomness (above the normal randomness you would observe anyway). What I’m saying is that teams will get about 28 free points (95% confidence interval of (21, 35)). As you can imagine, a team who only gets 20 of those points will have to be 7 points (about 10%) better than average just to make the playoffs, while a team who gets 34 points from overtime can be 10% worse. Minnesota, for example, got 20 points in 14 overtimes last year, finishing with 84 points over the season. Had they been average and gotten 28 points, they would’ve had 92 points, probably wouldn’t have traded away Mitchell and Roloson, and could possibly have done very well in the playoffs.
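
The "about 28 free points" figure can be checked with a back-of-envelope Monte Carlo under the same assumptions (22% OT rate over an 82-game season, a guaranteed point plus a coin flip for the extra one). The simulation is mine, not how the quoted interval was computed:

```python
import random

random.seed(0)

# Free points from overtime under the stated model: each of 82 games
# reaches OT with probability 0.22; an OT game is worth 1 point plus
# a coin flip for the second point.
def season_ot_points():
    points = 0
    for _ in range(82):
        if random.random() < 0.22:
            points += 1 + (random.random() < 0.5)
    return points

samples = [season_ot_points() for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(round(mean, 1))  # close to the ~28 quoted above (82 * 0.22 * 1.5 = 27.1)
```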

Obviously overtime is fun to watch, and it’s always exciting to see shootouts, there’s no question there. But if you think about what I said, you’d realize that determining who gets the extra point via a coin toss would produce the same results. Personally I find this frustrating; if it’s so important to have a winner, this works, but it hurts the overall ranking of teams, and of course this is on top of the scheduling problems. Interestingly, this should make the NHL appear more competitively balanced, since it pushes the results closer to random.

Fixing my Algorithm

What does this mean for me? Well, I realized my current algorithm, which assumed the better team wins in overtime more often than the worse team, was incorrect, and I was also wrong that overtime only occurs between teams of similar skill. So I will now predict overtime randomly for all games played. This means that if a team is predicted to go to overtime against Detroit, for example, they get free points they probably wouldn’t have gotten otherwise. This is of course a lot easier to program than the alternative of finding correct cut-off percentages (e.g. a game predicted to be won 53% of the time goes to overtime). Of course this will make bad teams look good, because they get a number of free points. For example, St. Louis last year got 29 of their 57 points, over half, from the random overtime.
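
The revised scheme can be sketched as a per-game expected-points calculation. This is my illustration of the idea, not the actual prediction code; `p_home` stands for whatever head-to-head win probability the model produces, and it only matters in the 78% of games that stay out of overtime:

```python
# Expected (home, away) standings points for one game under the
# revised model: flat 22% overtime rate, coin flip for the extra point.
def predict_points(p_home):
    """p_home: model's probability the home team wins in regulation play."""
    ot = 0.22
    # Regulation games: winner gets 2 points, loser 0.
    home = (1 - ot) * p_home * 2
    away = (1 - ot) * (1 - p_home) * 2
    # Overtime games: both teams get 1 point, plus a 50/50 extra point.
    home += ot * 1.5
    away += ot * 1.5
    return home, away

print(predict_points(0.5))  # evenly matched teams
print(predict_points(0.8))  # a heavy favourite still leaks OT points
```

Note the underdog’s expected points are floored at 0.33 per game no matter how bad they are, which is exactly the "free points for bad teams" effect described above.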


JavaGeek said...

A great example happened today:
Anaheim lost in a shootout to Chicago.

Obviously over 15 years of data you might find something (but the new OT rules started only recently).

It shouldn't be a shock that a 5-minute OT or a shootout is essentially random. You can show this with the Poisson data or some simple probabilities for the shootout.
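
The Poisson argument can be made concrete. Assuming goals arrive as a Poisson process at a combined rate of roughly 6 goals per 60 minutes (an assumed league-average figure, not from the post):

```python
import math

# Expected goals in a 5-minute overtime at ~6 combined goals/60 min.
lam = (6 / 60) * 5                # = 0.5 expected OT goals
p_no_goal = math.exp(-lam)        # P(scoreless OT -> shootout)
print(round(p_no_goal, 2))        # → 0.61

# Whichever team scores first wins; with near-equal scoring rates each
# team takes about half the decided overtimes -- effectively a coin flip.
```

So most overtimes go to the shootout on rate grounds alone, and the handful of OT goals that do occur are too few for skill differences to show through.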

Now, why bad teams get to overtime just as often as average teams is a little confusing to me. Obviously I'll wait and see if the data changes, but I shouldn't assume an effect exists if I can't prove it, even if its absence doesn't make sense.

Obviously the conclusion is overstated; the correlation is never exactly zero. But based on this data, assuming it is all random is the better alternative.

Feel free to prove me wrong; the data I use is freely available from the NHL.

JavaGeek said...

Binary Logistic Regression:
Concordant: 61.6
Discordant: 37.6

This is a prediction algorithm, so it's using past data to predict future games.

JavaGeek said...

I created a part II for all those interested.