Power-plays accounted for one third of the offense in 2005-2006, so some might ask what percentage of power-plays result from referees’ conscious choices. Referees have a number of things on their minds. Arguably they want to appear fair and unbiased (the best way is to call penalties when you see them and not think about what time it is). Referees all know that if they appeared to favor a certain team they would likely lose their job (and a six-figure salary with it). Management likely has some set rules for measuring bias that may not actually measure bias, and those rules may contribute to the results of this study. Power-plays are interesting mostly because the rules of hockey are extremely subjective by design. This subjectivity allows referees to mask almost anything, from unjustified penalties to cheating and favoritism.
In hockey there are two types of score that matter for penalties: the actual score of the game (goals scored by each team) and the penalty score (the number of penalties called on each team). In general, if the score is not tied, the losing team might be more willing to take risks to get a goal; similarly, a winning team risks less when it takes a penalty (at worst the game will be tied).
The simplest way to account for these differences is to run a regression on each of the factors mentioned; this is reasonably straightforward, and one equation can answer all the questions at once. There is one significant problem with the data: some situations have a lot of results (regular season, tie game) and others have very few (playoffs, up or down by two goals). I don’t want a regression that chases the parts with very few results, so I approximated the error with the binomial distribution (not quite accurate, because two penalties can be called at the same time) and used one over the standard deviation as a scale factor. Situations with a very small standard deviation therefore count for more. I included score differentials of 2 and -2 along with 1, -1 and 0, as I felt this would increase the amount of data without changing the results (at least from what I’ve seen of hockey and penalties).
I considered a number of variables to predict the number of penalties per hour while up or down a goal: score differential, year, division, home or away, west or east, and period. There are two things I’m concerned about for each factor, the constant and the slope, as the model is essentially one variable (score differential). You can cross any two variables to produce (a lot of) new variables; I considered a few cross products, including every variable crossed individually with the original score differential. This is not a completely scientific process, so what I present here is the most useful model I found. With more time and effort one could likely find more variables that correlate, but this is a reasonable model:
penalties/hour =
5.17
+ 0.297 score differential
+ 0.443 away
- 0.306 east
- 1.71 period 3
+ 1.40 2005-2006 season
- 0.949 playoffs
- 0.158 score differential x away
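The weighting described before the equation can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code behind the model above: it fits a single predictor (score differential), treats each minute of play as one binomial trial, reads "one over the standard deviation as a scale factor" as dividing each residual by its standard deviation, and uses made-up cell counts. The function names are hypothetical.

```python
import math

def binomial_rate_sd(penalties, minutes):
    """Approximate sd of the per-minute penalty rate (assumes one trial per minute)."""
    p = penalties / minutes
    return math.sqrt(p * (1.0 - p) / minutes)

def weighted_line_fit(xs, ys, sds):
    """Fit y = intercept + slope * x by minimizing sum(((y - fit) / sd) ** 2)."""
    ws = [1.0 / (sd * sd) for sd in sds]  # scaling residuals by 1/sd
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up cells for illustration: (score differential, penalties, minutes observed)
cells = [(-2, 40, 500), (-1, 170, 2000), (0, 520, 6000), (1, 180, 2000), (2, 45, 500)]
xs = [c[0] for c in cells]
ys = [60.0 * c[1] / c[2] for c in cells]                    # penalties per hour
sds = [60.0 * binomial_rate_sd(c[1], c[2]) for c in cells]  # sd in the same units
intercept, slope = weighted_line_fit(xs, ys, sds)
print(intercept, slope)
```

The tied-game cell has the most minutes and so the smallest standard deviation, which means it pulls on the fitted line far harder than the two-goal cells do.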
Virtually none of the cross products were statistically significant; the one exception is the away cross product (refs are less likely to help the away team than the home team). The only difference in the 2005-2006 season is that you see 1.4 more penalties per game; the referees apply the same techniques to balance the score (this shouldn’t be a shock). The away team gets about half a penalty more, or 0.44*0.17*1230 = 92 goals per season, which is 3 goals per team or approximately 18 extra home wins distributed among the 30 teams. The interesting thing is that there does not appear to be any real home ice advantage in the NHL (7 above .500 games in 3 seasons). I found it interesting that the east is more disciplined than the west; I don’t have an explanation at this time (although this coefficient has the most error). What shocked me was how many fewer penalties there are in period 3: the referees call 1.7 fewer penalties in the third period (and that holds for the playoffs and under the new rules as well).
However, as mentioned in the comments to my last piece, the scoreboard score isn’t the only score that matters. Most fans know by now that referees keep track of the balance of power-plays and try to even them up as well (in order to appear fair). Using the same variables as before, I came up with this equation:
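To make the arithmetic in this paragraph easier to check, here is a small sketch that evaluates the fitted equation and reproduces the goals-per-season estimate. Several things are assumptions rather than statements from the model: the 0/1 coding of the indicator variables, the reading of 0.17 as goals scored per power-play, and 1230 as the number of regular-season games (30 teams x 82 games / 2). The function name is hypothetical.

```python
def penalties_per_hour(score_diff, away=0, east=0, period3=0,
                       season_2005_06=0, playoffs=0):
    """Evaluate the score-differential model; indicators are assumed to be 0 or 1."""
    return (5.17
            + 0.297 * score_diff
            + 0.443 * away
            - 0.306 * east
            - 1.71 * period3
            + 1.40 * season_2005_06
            - 0.949 * playoffs
            - 0.158 * score_diff * away)

# Gap between away and home teams in a tied game (the "away" coefficient)
away_gap = penalties_per_hour(0, away=1) - penalties_per_hour(0, away=0)

AWAY_EXTRA_PENALTIES = 0.44   # rounded value used in the text
GOALS_PER_POWER_PLAY = 0.17   # assumed meaning of the 0.17 factor
GAMES_PER_SEASON = 1230       # assumed: 30 teams * 82 games / 2
TEAMS = 30

extra_goals = AWAY_EXTRA_PENALTIES * GOALS_PER_POWER_PLAY * GAMES_PER_SEASON
print(round(away_gap, 3), round(extra_goals), round(extra_goals / TEAMS, 1))
```

With the cross product included, an away team's predicted rate rises by only 0.297 - 0.158 = 0.139 per unit of score differential, compared with 0.297 for a home team.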
penalties/hour =
5.74
- 0.600 penalty differential
+ 0.786 away
- 0.350 east
+ 1.37 2005-2006 season
- 0.733 playoffs
You should note there are no cross products; in other words, none of the cross products were statistically significant. So referees have been using the same rules in the game and just giving out more penalties in absolute terms for each given situation: 2005-2006 is the same as 2003-2004 in regards to penalty balance. However, one should take notice of the significantly larger coefficient on the main term, penalty differential (0.600 in magnitude, against 0.297 for score differential in the first model). The power-play score plays a much larger role in determining who gets the next power-play than does the scoreboard, which should not be a surprise.
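As a rough illustration of how much larger the penalty-differential effect is, the sketch below compares the change in predicted penalties per hour across a differential range of -2 to +2 for both slopes. The range for penalty differential is an assumption (the text only states it for score differential), and the helper name is hypothetical.

```python
SCORE_SLOPE = 0.297      # coefficient on score differential (first model)
PENALTY_SLOPE = -0.600   # coefficient on penalty differential (second model)

def swing(slope, low=-2, high=2):
    """Change in predicted penalties/hour from differential `low` to `high`."""
    return slope * (high - low)

score_swing = swing(SCORE_SLOPE)      # home team, no cross product applied
penalty_swing = swing(PENALTY_SLOPE)
ratio = abs(penalty_swing / score_swing)
print(score_swing, penalty_swing, ratio)
```

The ratio comes out at roughly two, so a one-penalty imbalance moves the predicted rate about twice as much as a one-goal lead does.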
1 comment:
Chris,
Just found your site, and I'm already a fan. On your other site you say:
" but I stuck to my older code use penalties to determine when the 5 on 3 situations occur"
Would it be possible to see an algorithm of this? I'm especially interested to see how you handle multiple penalties, seconds apart, more than two minors for a team, subsequent goals, etc.
The strength indicator on the new shift data files are not reliable, in the cases where a guy stays on the ice following a penalty. It will show up as one shift, with the strength indicator of PP or SH.
Thanks, Tom