August 12, 2006

Penalties – The bigger picture

Earl Sleek asked a number of good questions about the new season. The penalty questions can be summarized as "what changed?" and "how has it changed?". The first question for 2005-2006 is what changed. It should be no surprise that interference was higher; hooking was three times higher and accounts for most of the difference; high sticking was statistically lower (I suspect that either referees were too busy looking for hooking to see high sticks, or they didn’t call them because they had already called too many penalties). Tripping and holding were up 50%, and roughing and cross checking were down around 50%. Goaltender interference was up slightly.

So how did it change?

Power-plays accounted for one third of the offense in 2005-2006, so some might ask what percentage of power-plays are the result of referees’ conscious choices. Referees have a number of things on their minds. Arguably they want to appear fair and unbiased (the best way is to give penalties when you see them and not think about what time it is). Referees all know that if they appeared to favor a certain team they would likely lose their job (and a six-figure salary with it). Management likely has some set rules for measuring bias that may not actually measure bias and may contribute to the results of this study. Power-plays are interesting mostly because the rules of hockey are, by design, extremely subjective. This subjectivity allows referees to mask almost anything, from unjustified penalties to cheating and favoritism.

In hockey there are two types of scores that matter for penalties: the actual score of the game (goals scored by each team), and the penalty score (the number of penalties called on each team). In general, if the score is not tied, the losing team might be more willing to take risks to get a goal; similarly, a winning team risks less when taking a penalty (at worst the game will be tied).

Score Differential

The simplest way to account for these differences is to do a regression on each of the mentioned factors. This is reasonably straightforward, and one can answer all the questions at once with one equation. There is one significant problem with the data: some situations have a lot of results (regular season tie games), while others have very few (playoffs, up or down by two goals). I don’t want a regression that chases the parts with very few results, so I approximated the error with the binomial distribution (not quite accurate, because two penalties can be called at the same time) and used one over the standard deviation as a scale factor. Situations with a very small standard deviation therefore count for more. I considered score differentials of 2 and -2 as well as 1, -1 and 0, as I felt this would increase the amount of data without changing the results (at least from what I’ve seen of hockey and penalties).

The variables I considered were score differential, year, division, home or away, west or east, and period, all used to predict the number of penalties per hour while up or down a goal. There are two things I’m concerned about for each factor, constant and slope, as the model is essentially one variable (score differential). You can cross any two variables to produce (a lot of) new variables; I considered a few cross products, including every variable individually crossed with the original score differential. Because this is not a completely scientific process, what I present here is the most useful model I found. With more time and effort one could likely find more variables that correlate, but this is a reasonable model:

penalties/hour =
+ 0.297 score differential
+ 0.443 away
- 0.306 east
- 1.71 period 3
+ 1.40 2005-2006 season
- 0.949 playoffs
- 0.158 score differential x away
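
The weighting described before the equation can be sketched as follows. This is a minimal illustration, not the code used for this post: it fits only a single constant and slope, it reads "one over the standard deviation as a scale factor" as scaling each residual by 1/std (so squared residuals are weighted by 1/variance), it treats each observed minute as one binomial trial, and the cell counts are invented purely so the example runs.

```python
import math

def binomial_std(penalties, trials):
    """Std. dev. of the observed rate penalties/trials under a binomial model."""
    p = penalties / trials
    return math.sqrt(p * (1.0 - p) / trials)

def weighted_line_fit(xs, ys, stds):
    """Fit y = constant + slope * x, scaling each residual by 1/std."""
    ws = [1.0 / (s * s) for s in stds]          # weight on squared residuals
    total = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

# Hypothetical cells (NOT real data): (score differential, penalties, minutes observed).
cells = [(-2, 30, 400), (-1, 160, 2000), (0, 450, 5000), (1, 200, 2000), (2, 45, 400)]
xs = [d for d, _, _ in cells]
rates = [k / n for _, k, n in cells]
stds = [binomial_std(k, n) for _, k, n in cells]

constant, slope = weighted_line_fit(xs, rates, stds)
print(f"constant={constant:.4f} per minute, slope={slope:.4f} per minute per goal")
```

The sparsely observed cells (up or down two goals) get a larger standard deviation and so pull on the fitted line less than the tie-game cell does.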

Virtually none of the cross products were statistically significant, except for the away cross product (refs are less likely to help the away team than the home team). The only difference in the 2005-2006 season is that you see 1.4 more penalties per game; the referees apply the same techniques to balance the score (this shouldn’t be a shock). The away team gets about half a penalty more per game, or 0.44*0.17*1230 = 92 goals per season, which is 3 goals per team or approximately 18 extra home wins distributed among the 30 teams. The interesting thing is that there does not appear to be any real home ice advantage in the NHL (7 games above .500 in 3 seasons). I found it interesting that the east is more responsible than the west; I don’t have an explanation at this time (although that term has the most error). What shocked me was how many fewer penalties there are in period 3: the referees call 1.7 fewer penalties per hour in the third period (that holds for the playoffs and under the new rules as well).
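
For anyone who wants to plug situations into the equation or retrace the away-team arithmetic, here is a small sketch. The coefficients are copied from the equation above; the function name is made up for illustration, the sign convention of the score differential is whatever the fitted model used, and reading 0.17 as a goals-per-power-play conversion rate is my assumption about an unlabeled number.

```python
# Coefficients copied from the score-differential equation.
COEFFS = {
    "score_diff": 0.297,
    "away": 0.443,
    "east": -0.306,
    "period3": -1.71,
    "season_2005_06": 1.40,
    "playoffs": -0.949,
    "score_diff_x_away": -0.158,
}

def penalty_rate_terms(score_diff, away=False, east=False, period3=False,
                       season_2005_06=False, playoffs=False):
    """Sum of the listed terms, in penalties per hour.

    No constant term is listed in the equation, so this is an offset
    from an unstated baseline rather than an absolute rate.
    """
    c = COEFFS
    return (c["score_diff"] * score_diff
            + c["away"] * away
            + c["east"] * east
            + c["period3"] * period3
            + c["season_2005_06"] * season_2005_06
            + c["playoffs"] * playoffs
            + c["score_diff_x_away"] * score_diff * away)

# Extra penalties per game for the away team in a tie game.
away_gap = penalty_rate_terms(0, away=True) - penalty_rate_terms(0, away=False)

GAMES_PER_SEASON = 1230        # 30 teams * 82 games / 2
GOALS_PER_PENALTY = 0.17       # assumption: the 0.17 is a power-play conversion rate
TEAMS = 30

extra_goals = round(away_gap, 2) * GOALS_PER_PENALTY * GAMES_PER_SEASON
extra_goals_per_team = extra_goals / TEAMS
print(f"away gap: {away_gap:.3f} penalties/hour")
print(f"extra home goals per season: {extra_goals:.0f} ({extra_goals_per_team:.1f} per team)")
```

Note that the cross term only matters when the score is not tied, which is why the tie-game gap is exactly the away coefficient.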

Penalty Differential

However, as mentioned in the comments to my last piece, the scoreboard isn’t the only score that matters. Most fans know by now that referees keep track of the balance of power-plays and try to even them up as well (in order to appear fair). Using the same variables as before, I came up with this equation:

penalties/hour =
- 0.600 penalty differential
+ 0.786 away
- 0.350 east
+ 1.37 2005-2006 season
- 0.733 playoffs

Note that there are no cross products; in other words, no cross products were statistically significant. Referees have been using the same rules in the game and just giving out more penalties in absolute terms for each given situation: 2005-2006 is the same as 2003-2004 in regard to penalty balance. However, one should take notice of the significantly larger coefficient on the main term (0.600 on penalty differential here, versus 0.297 on score differential above): the power-play score plays a much larger role in determining who gets the next power-play than does the scoreboard, which should not be a surprise.
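
To put the size of that main term in perspective, here is a tiny sketch using only the two coefficients quoted in the equations. The helper name is hypothetical, the gap of two is just an example input, and the sign convention of each differential is whatever the fitted models used.

```python
PENALTY_DIFF_COEFF = -0.600   # from the penalty-differential equation
SCORE_DIFF_COEFF = 0.297      # from the score-differential equation

def rate_shift(coefficient, differential):
    """Change in penalties per hour attributed to a given differential."""
    return coefficient * differential

ratio = abs(PENALTY_DIFF_COEFF) / abs(SCORE_DIFF_COEFF)
print(f"penalty-differential term is {ratio:.1f}x the score-differential term")
print(f"two-penalty gap: {rate_shift(PENALTY_DIFF_COEFF, 2):+.2f} penalties/hour")
print(f"two-goal gap:    {rate_shift(SCORE_DIFF_COEFF, 2):+.2f} penalties/hour")
```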


The way the game is refereed has an effect on the outcomes of games, as it makes it easier to come back, but this has not changed over the course of the last five seasons. Interestingly enough, if one compares against 2003-2004 one might be able to claim a small change in the ability to come back in games. However, the comeback rate has remained relatively stable over three years, so comebacks have not changed (it’s a figment of your imagination). The shootouts may be affecting comeback rates (I did not include overtime in this statistic, as that has changed too much). But higher scoring games and more penalties have not affected the rate at which teams come back: it has remained steady at around 28% (the team that scores first loses, in a non-overtime game, 28% of the time). If you include games where the trailing team wins or ties to get into overtime (not considering what happens in overtime), you get the same result: a steady rate (around 46%). In other words, while the NHL would like you to believe they’ve made the game less predictable by increasing the number of comebacks, the rate has in fact remained steady.

If you take a look at the Poisson toolbox you can learn about comeback rates. I approximated the comeback rate from the first goal to be around 40% to tie and 25% to win, so comebacks are inflated by about 2.5%, or 30 extra comebacks per season (one per team), compared to Poisson predictions. One should consider the fact that the better team is more likely to score first and that it’s harder to come back against better opposition, but this shouldn’t affect the results substantially. I should note: I predicted 18 extra wins as a result of referees favoring the home team, which is pretty close to the 30 extra comeback wins (maybe the other 12 comebacks occur away from home).
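
For readers who want to see what a Poisson baseline for comebacks looks like, here is a rough sketch. It is not the Poisson toolbox mentioned above: it assumes two evenly matched teams that each score as an independent Poisson process at a constant rate, the rate of 3 goals per team per game is an illustrative input rather than a figure from this post, and scoreless games are excluded. Its output need not match the 40% and 25% quoted above exactly.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson variable with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def comeback_probabilities(goals_per_team=3.0, steps=200, max_goals=20):
    """Return (P(trailing team wins), P(trailing team at least ties)) in regulation.

    Time is measured in games: 0 is the opening faceoff, 1 the end of regulation.
    """
    rate = goals_per_team
    # Normalizer for the first-goal time density, conditioned on a goal being scored.
    norm = 1.0 - math.exp(-2.0 * rate)
    p_win = 0.0
    p_tie_or_win = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps                        # midpoint rule over first-goal time
        weight = 2.0 * rate * math.exp(-2.0 * rate * t) / norm / steps
        mean = rate * (1.0 - t)                      # expected goals left for each team
        pmf = [poisson_pmf(k, mean) for k in range(max_goals + 1)]
        for trailing in range(max_goals + 1):
            for leading in range(max_goals + 1):
                margin = trailing - leading          # net goals after the first goal
                p = pmf[trailing] * pmf[leading] * weight
                if margin >= 2:                      # erases the deficit and takes the lead
                    p_win += p
                if margin >= 1:                      # at least erases the deficit
                    p_tie_or_win += p
    return p_win, p_tie_or_win

win, tie_or_win = comeback_probabilities()
print(f"comeback win: {win:.1%}, comeback to at least a tie: {tie_or_win:.1%}")
```

Comparing numbers like these with the observed rates is what gives an "extra comebacks" figure; the caveat about the better team scoring first applies here too, since the sketch treats both teams as equal.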

In summary, it would appear that all the NHL has done differently in 2005-2006 is call more penalties, resulting in more goals; they have not changed the structure of the calls. And it has not helped even strength production significantly.

1 comment:

Tangotiger said...


Just found your site, and I'm already a fan. On your other site you say:

" but I stuck to my older code use penalties to determine when the 5 on 3 situations occur"

Would it be possible to see an algorithm of this? I'm especially interested to see how you handle multiple penalties, seconds apart, more than two minors for a team, subsequent goals, etc.

The strength indicator in the new shift data files is not reliable in cases where a guy stays on the ice following a penalty. It will show up as one shift, with a strength indicator of PP or SH.

Thanks, Tom