July 30, 2006

Shots against: The whole story.

Shots are the bread and butter of hockey games. Hits, takeaways, giveaways, faceoffs, etc. all don’t matter if you don’t get shots or prevent them. “The role of the defender is to minimize both the quantity and the quality of shots on goal” (Shot Quality). Strictly speaking this is not quite true: some goaltenders let out bigger rebounds, some defenders screen a shot and others won’t, and current shot statistics don’t tell you how wide open the shooter is or how much time and space he has. I will demonstrate that this is still a reasonable assumption to make given individual player shot data.

As recently as last week, I believe no one knew how many shots each individual player allowed (shots against while the player was on the ice), largely because the data is complex and the NHL publishes poor data. Only recently did the NHL provide enough data to work out which shots went to which goaltender. Now I can tell you how many shots each player faced, in each situation as well. This study begins with 5-on-5 play, which is of course the more complicated case, because each player is concerned with both offense and defense. An offensive player who spends most of his time in the opponents’ zone will appear effective defensively; his offense is what prevents the chances (which isn’t a bad thing).

Once I acquired this data I first calculated a few useful statistics: shot quality against average [1] and shot quality neutral save percentage (SQN%) [2]. I also calculated the difference between each player's SQN% and his team's SQN%. Assuming a player only prevents goals by preventing shots and reducing their quality, this difference should not vary significantly from 0. What this means is that once you account for the number of shots, the quality of those shots and the goaltender, the number of goals the shots predict should be the number of goals against, plus or minus some "small" error. It’s easier to demonstrate these differences with a graph of shots against vs. expected goals against minus goals against.


What one can see is data whose error grows significantly; the real question is how large the error is and what part of it comes from players affecting shots beyond quality and quantity. If you consider a shot a binary event (in or not in), you can model goals against with a binomial distribution and use its standard deviation, sqrt(n*p*q), to size the errors. One standard deviation should contain about 68% of the data; a quick look at my data shows 65% (plus or minus 2%) within this region. Another standard cutoff is a z-score of 1.96, which should contain 95% of the data; I have 93% (plus or minus 1%). For further comparison, a z-score of 2.5 should contain about 99% of the data, and this is where my data begins to fail, with only 97% (plus or minus 1%). There are a number of places this extra error could be coming from:
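The check described above can be sketched roughly as follows. This is only an illustration, not the code used for the study: the function names are made up, the sample rows are invented numbers rather than real player data, and each player's goal probability per shot is assumed to be expected goals divided by shots.

```python
import math

def binomial_sd(shots, goal_prob):
    """Standard deviation of goals from `shots` independent shots: sqrt(n*p*q)."""
    return math.sqrt(shots * goal_prob * (1.0 - goal_prob))

def z_score(expected_goals, actual_goals, shots):
    """Standard deviations between expected and actual goals against."""
    p = expected_goals / shots  # assumed per-shot goal probability
    return (expected_goals - actual_goals) / binomial_sd(shots, p)

def normal_coverage(z):
    """Share of a normal distribution within +/- z standard deviations."""
    return math.erf(z / math.sqrt(2.0))

def share_within(players, z):
    """Fraction of (expected goals, actual goals, shots) rows with |z-score| <= z."""
    inside = sum(1 for xg, g, n in players if abs(z_score(xg, g, n)) <= z)
    return inside / len(players)

# Invented rows for illustration only: (expected goals, actual goals, shots)
sample = [(40.0, 38, 450), (25.0, 31, 300), (52.0, 50, 600), (18.0, 27, 200)]

for z in (1.0, 1.96, 2.5):
    # Compare the theoretical share with the observed share at each cutoff
    print(z, round(normal_coverage(z), 3), share_within(sample, z))
```

With real data, an observed share that falls short of the theoretical one at the larger cutoffs points to error beyond what the binomial model predicts.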

  • Players affecting shots beyond quality and quantity
  • Shot quality missing some key factors due to NHL reporting or a poor model
  • Errors in my data collection (players not getting the correct number of shots)
  • Players playing with a bad goaltender more frequently than others (different SQN%)

Given the above sources of error, I will go on to assume that players, as stated by the Hockey Analytics website, only affect defense by controlling shot quantity and quality. I agree this is a simplistic model that will miss some important aspects, but it represents a good starting point. It also removes a huge amount of error and allows you to see players in a completely different light. If you’re interested in data beyond this you can study how players' minus rates compare; however, I would contend the minus rating carries significant error, on the order of ±10 for most players, meaning a -10 is potentially a 0 and a +10 could actually be a +20 or a 0 (big difference).


There are two things here that a player controls: quality and quantity. The question is how to measure these two variables together. The best way appears to be expected goals against average, the number of goals a player is expected to have scored against him if he played for one hour. This is basically a goals against average for skaters. I compiled two lists [3]: top defensive defensemen and top defensive forwards. Some of the players at the top and bottom are to be expected; others are significant surprises. You can see Pronger at the top, which is a good sign, along with Malik, who is a consistent plus player. What is interesting is that Naslund ranks 62nd (tied with the Sedins) defensively, although his record this year (-19) would indicate otherwise. In other words, Naslund got unlucky this year and compiled a huge minus rating (this is likely too simplistic an explanation, but it is likely Naslund's minus rating is exaggerated by error).
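As a rough sketch of how such a ranking could be built, the snippet below scales expected goals against to a 60-minute rate and sorts by it. The function name, the player labels and all the numbers are hypothetical, and ice time is assumed to be available in minutes.

```python
def expected_goals_against_average(expected_goals, minutes_on_ice):
    """Expected goals against per 60 minutes of ice time."""
    return expected_goals * 60.0 / minutes_on_ice

# Hypothetical rows: (label, expected goals against, minutes on ice)
players = [
    ("Player A", 38.5, 1100.0),
    ("Player B", 41.0, 950.0),
    ("Player C", 30.2, 1000.0),
]

# Lower is better defensively, so sort ascending
ranked = sorted(
    players,
    key=lambda row: expected_goals_against_average(row[1], row[2]),
)

for rank, (label, xga, minutes) in enumerate(ranked, start=1):
    rate = expected_goals_against_average(xga, minutes)
    print(f"{rank}. {label} ({rate:.2f})")
```

Because expected goals already combines shot quantity and quality, a single per-hour rate is enough to compare players with different ice time.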

I should make a special note that these statistics are based on two things, shots and the quality of those shots; they make no statement about actual goals. This might seem strange, and on the surface I would agree it doesn't make sense, but what we are trying to find is who is better, not who is lucky, and hockey is full of great examples of lucky individuals and teams. These statistics are also indifferent to the goaltender in net: it would make no difference statistically whether Noronen or Hasek were in net. That being said, a bad goaltender can be a motivator to improve one's play.

This can be further extended to short-handed situations; however, I won't comment on these (I'll skip the statistical error analysis for this). The top penalty killers: defensemen, forwards. There would be very limited benefit to publishing power-play data. I hope to look at offense and the plus statistic in the next article, and summaries of both the Canucks' and Edmonton's team changes are coming...


[1] or SQA is calculated by: expected goals / ((1 - league save percentage) * shots)
[2] calculated by: 1 - (1 - save percentage)/SQA
[3] for those confused by the charts: #. Lastname, First initial (expected goals against average) [salary]
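A minimal sketch of footnotes [1] and [2] in code. It reads SQA as expected goals divided by the goals a league-average goaltender would allow on the same number of shots, that is (1 - league save percentage) * shots. That reading is an assumption, chosen because it keeps [1] consistent with [2]. The function names and the numbers are made up for illustration.

```python
def shot_quality_against(expected_goals, shots, league_save_pct):
    """SQA: expected goals relative to league-average goals on the same shots."""
    # Assumption: the denominator is league-average goals, (1 - save pct) * shots
    league_goals = (1.0 - league_save_pct) * shots
    return expected_goals / league_goals

def sqn_save_pct(save_pct, sqa):
    """Shot quality neutral save percentage: 1 - (1 - save percentage) / SQA."""
    return 1.0 - (1.0 - save_pct) / sqa

# Invented example: 500 shots worth 55 expected goals, league save pct of .900
sqa = shot_quality_against(expected_goals=55.0, shots=500, league_save_pct=0.900)
sqn = sqn_save_pct(save_pct=0.895, sqa=sqa)
print(round(sqa, 3), round(sqn, 4))
```

An SQA above 1 means the shots faced were harder than league average, so the neutral save percentage comes out higher than the raw one.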
