July 28, 2006

Introduction: Shift Analysis.

Hockey is a team sport, and even within the team the outcome is shaped by lines and defensive pairings, which makes it very difficult to evaluate the individual. In sports like baseball, each player has a specific role and can be measured against specific standards for that position: you either hit the ball or you don’t, you throw a strike or a ball, and so on. In hockey, every assist needs a goal scorer (without the scorer there’s no assist), and every goal has an assist (OK, there are a number of exceptions). Every goal also has a supporting cast and a defense (I will get to this in the future). When considering an individual performance, linemates are key. For example, was Carter successful because of the Sedin twins, or were the Sedin twins successful because of Carter?

To discover how players perform in particular situations, one needs a record of every second each player is on the ice; the problem is that this is a lot of data. The NHL provided two sources for this analysis: shift charts (optical recognition, anyone?) and time sheets. However, the time sheets were only provided from mid-January until the end of the season. The problem with the shift charts is that they are images 540 (or 520) pixels wide, so every pixel represents 6 to 8 seconds (depending on the length of the game), introducing errors of unknown size. Also, on the score sheets, big black bars mark goals and can hide 12 to 16 seconds of data, creating additional problems. Thankfully the time sheets do not have these problems, which will make collecting the 2006-2007 data a lot more pleasant.
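To put a number on the shift-chart resolution problem, here is a minimal sketch that assumes the chart maps the whole game linearly onto its pixel width. The function names are hypothetical, and the two-pixel goal bar width is a guess, chosen only because it lines up with the 12 to 16 second figure above.

```python
# Sketch (assumption): a shift-chart image maps the whole game linearly
# onto its pixel width, so each pixel column covers an equal slice of time.


def seconds_per_pixel(game_seconds: float, chart_width_px: int) -> float:
    """Seconds of game time represented by one pixel column of the chart."""
    if chart_width_px <= 0:
        raise ValueError("chart width must be a positive number of pixels")
    return game_seconds / chart_width_px


def hidden_seconds(bar_width_px: int, game_seconds: float, chart_width_px: int) -> float:
    """Game time obscured by a solid bar of the given pixel width (hypothetical)."""
    return bar_width_px * seconds_per_pixel(game_seconds, chart_width_px)


if __name__ == "__main__":
    REGULATION = 60 * 60  # 3600 s
    WITH_OT = 65 * 60     # 3900 s, assuming a five-minute overtime
    for width in (540, 520):
        for length in (REGULATION, WITH_OT):
            print(f"{width}px, {length}s game: "
                  f"{seconds_per_pixel(length, width):.2f} s/pixel, "
                  f"2-pixel bar hides {hidden_seconds(2, length, width):.1f} s")
```

Under these assumptions the resolution ranges from about 6.7 seconds per pixel (regulation game, 540 pixels) to 7.5 (overtime game, 520 pixels), consistent with the 6 to 8 second range.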

I could use this data to count 5 on 3 and power-play situations, but I stuck with my older code, which uses penalties to determine when 5 on 3 situations occur. I will likely switch to using the shift data to detect these events, but for now the penalty-based approach works better. So now I have every second each player is on the ice and the number of players on the ice for each team. I can easily combine these in any way, for example to determine how much 5 on 3 time an individual received, or how often a line played together, either as a percentage of each individual's time or as an aggregate. In this study, I chose to look only at pairs of players (displaying the information for three players would be complicated).
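Combining the per-second data can be sketched as follows. This is not the original code: it assumes a hypothetical layout with one entry per second holding the set of a team's skaters on the ice and the manpower state (labels like "5v4" are made up for illustration). It counts each player's seconds in a state and each pair's shared seconds, then expresses the pair's time as a share of each individual's time.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data layout (not the original code): one entry per second
# of the game, holding the set of one team's skaters on the ice and the
# manpower state for that second, e.g. "5v4", "5v3" or "5v5".


def time_on_ice(log, state):
    """Seconds on ice per player, and per pair of players, in `state`."""
    solo = Counter()
    pairs = Counter()
    for skaters, situation in log:
        if situation != state:
            continue
        solo.update(skaters)
        # Sort so each pair is counted under one canonical key.
        for a, b in combinations(sorted(skaters), 2):
            pairs[(a, b)] += 1
    return solo, pairs


def share_with(solo, pairs, player, mate):
    """Fraction of `player`'s seconds in the state spent alongside `mate`."""
    if solo[player] == 0:
        return 0.0
    return pairs[tuple(sorted((player, mate)))] / solo[player]


# Toy example with hypothetical players A, B and C.
EXAMPLE_LOG = (
    [({"A", "B", "C"}, "5v4")] * 3
    + [({"A", "C"}, "5v4")] * 2
    + [({"A", "B"}, "5v5")] * 4
)
solo, pairs = time_on_ice(EXAMPLE_LOG, "5v4")
print(share_with(solo, pairs, "A", "B"))  # A spent 3 of his 5 seconds with B
print(share_with(solo, pairs, "B", "A"))  # B spent all 3 of his with A
```

The same pair of seconds gives different percentages for each player, which is why the tables report the numbers from each individual's point of view. Passing "5v3" as the state would give each player's 5 on 3 time directly from `solo`.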


Example*: Jovanovski and Naslund. Naslund was one of the Canucks’ leaders in power-play minutes, but the top power-play unit struggled in the second half of the season. Was this random, or was it a symptom of a larger problem? Naslund spent 46% of his power-play time with Jovanovski (who missed half the season), yet 61% of the power-play goals scored while Naslund was on the ice came with Jovanovski. In other words, in the remaining 54% of his time he got only 39% of the goals. In the other direction, Jovanovski, who played 243 minutes on the power-play, spent 84% of that time with Naslund and got 96% of his goals with Naslund. That is quite a tandem!
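The arithmetic behind the example can be checked with a few lines. The sketch below (function name hypothetical) divides goal share by time share, the same ratio as the +R column defined in the data section, both with and without the linemate, using only the percentages quoted above.

```python
def with_without(time_with_pct: float, goals_with_pct: float):
    """Goal share divided by time share, with and without a given linemate.

    A value above 1 means the player got a larger share of his goals than
    of his ice time in that situation.
    """
    with_ratio = goals_with_pct / time_with_pct
    without_ratio = (100 - goals_with_pct) / (100 - time_with_pct)
    return with_ratio, without_ratio


# Percentages taken from the 5 on 4 example above.
naslund = with_without(46, 61)  # Naslund, with vs. without Jovanovski
jovo = with_without(84, 96)     # Jovanovski, with vs. without Naslund

for name, (w, wo) in (("Naslund", naslund), ("Jovanovski", jovo)):
    print(f"{name}: {w:.2f} with, {wo:.2f} without")
```

Naslund comes out at about 1.33 with Jovanovski and 0.72 without him; Jovanovski at about 1.14 with Naslund and 0.25 without. Keep in mind that Jovanovski's "without" figure covers only 16% of his 243 minutes, so it rests on a small sample.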

What the above example shows is two players who both do better when they are together. There are many examples where the benefit only goes one way, or where both players hurt each other. What makes this example useful is that these two spent over 200 minutes together, so the statistics are reasonably reliable. As an aside, statistics like this make me nervous about the new season as a Canucks fan.

The same calculations can be done for every player, just as in the example above. Some players will make everyone they play with do better, while others will make everyone do worse. These players will likely lead or “drag down” the team statistically as well, although not necessarily: for example, a player who spends a lot of time with bad players won’t be able to make a big enough difference to improve his individual statistics. Below is the data for both the Canucks and the Oilers; I will explain the consequences regarding the players each team lost in future posts.

The Data

Edmonton: PP SH EV
Canucks: PP SH EV

  • Time% - the percentage of the left-hand player’s ice time that he spent with this player.
  • +% - the percentage of the left-hand player’s goals for that were scored with this player on the ice.
  • -% - the percentage of the left-hand player’s goals against that were scored with this player on the ice.
  • +R - +% / Time%
  • -R - -% / Time%
  • Time is measured in seconds.
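The column definitions above translate directly into code. This is a minimal sketch rather than the code used to build the sheets: the names `pct`, `PairRow` and `pair_row` are hypothetical, and the inputs are raw seconds and goal counts for the left-hand (row) player.

```python
from dataclasses import dataclass


def pct(part: float, whole: float) -> float:
    """part as a percentage of whole (0 when whole is 0)."""
    return 100.0 * part / whole if whole else 0.0


@dataclass
class PairRow:
    """One cell of the sheets: the row player's numbers with one teammate."""
    time_pct: float   # Time%: share of row player's seconds with this mate
    plus_pct: float   # +%: share of his goals for scored with this mate on ice
    minus_pct: float  # -%: share of his goals against with this mate on ice

    @property
    def plus_r(self) -> float:
        """+R = +% / Time%; above 1 means more goals for than time share."""
        return self.plus_pct / self.time_pct if self.time_pct else float("nan")

    @property
    def minus_r(self) -> float:
        """-R = -% / Time%; below 1 means fewer goals against than time share."""
        return self.minus_pct / self.time_pct if self.time_pct else float("nan")


def pair_row(sec_together, sec_player, gf_together, gf_player,
             ga_together, ga_player) -> PairRow:
    """Build a cell from raw seconds and goal counts for the row player."""
    return PairRow(
        time_pct=pct(sec_together, sec_player),
        plus_pct=pct(gf_together, gf_player),
        minus_pct=pct(ga_together, ga_player),
    )


# Hypothetical numbers: 600 of 1200 seconds together, 3 of 4 goals for
# and 1 of 4 goals against with the teammate on the ice.
cell = pair_row(600, 1200, 3, 4, 1, 4)
print(cell.time_pct, cell.plus_r, cell.minus_r)  # 50.0 1.5 0.5
```

Note the opposite readings: a high +R is good for the pair, while for -R lower is better.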

So basically, each player plays with certain teammates a certain amount (these percentages are displayed across the table) and also gets a certain share of his goals with those teammates (also displayed as percentages across the table). Across the table it’s a zero-sum game: if you do better with one player, you have to do worse with someone else, although that someone could be missing from the table because of too few minutes played. The really useful information comes from looking down the columns. While a player has to do better with some teammates and worse with others along his row, the player heading a column can in fact make every one of his teammates do better. Pay more attention to higher Time% values, as they carry more weight: 20 power-play minutes together isn’t very statistically significant, so doing better over 60% of the time means a lot more than doing better over 15% of it. The tables are colour coded for easy reading.
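The zero-sum claim can be made precise. Since +R times Time% is just +%, the Time%-weighted average of +R across a row equals the sum of the +% column divided by the sum of the Time% column. When every second and every goal involve the same number of listed teammates, those two sums are equal, so the weighted average is exactly 1: any ratio above 1 must be offset by one below 1. The sketch below checks this on hypothetical toy data (shares are kept as fractions rather than percentages, which does not change the ratios); a teammate left off the table would make the identity only approximate.

```python
# Toy log for one hypothetical row player: the teammates on the ice with
# him in each second, and the teammates on the ice for each goal for.
# Every entry lists exactly two teammates, so no one is "off the table".
SECONDS = ([frozenset("BC")] * 60 + [frozenset("BD")] * 30
           + [frozenset("CD")] * 10)
GOALS_FOR = [frozenset("BC")] * 2 + [frozenset("CD")]


def row(seconds, goals_for):
    """Time share, goal share and +R for each teammate (as fractions)."""
    mates = sorted(set().union(*seconds))
    out = {}
    for m in mates:
        time_share = sum(m in s for s in seconds) / len(seconds)
        goal_share = sum(m in g for g in goals_for) / len(goals_for)
        out[m] = (time_share, goal_share, goal_share / time_share)
    return out


def weighted_plus_r(cells):
    """Time-weighted average of +R across one row of the table."""
    total = sum(t for t, _, _ in cells.values())
    return sum(t * r for t, _, r in cells.values()) / total


cells = row(SECONDS, GOALS_FOR)
for mate, (t, g, r) in cells.items():
    print(mate, round(t, 2), round(g, 2), round(r, 2))
print(round(weighted_plus_r(cells), 6))
```

Here C's +R is above 1 and B's and D's are below it, yet the weighted average is 1, which is why individual rows cannot show a player doing better with everyone.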

A good example is Smyth on the power-play: it appears everyone he played with benefited from him (except Reasoner and Dvorak, who weren’t good, so just ignore them). Another is Moreau on the penalty kill, who made everyone better except Conklin and Ulanov, and was not statistically significantly worse with those two. Most impressive of all, Moreau spent over 50% of his time with Peca (who wasn’t that great killing penalties), yet Moreau cut Peca’s goals-against rate by over 20%. There was an identical, even more extreme effect with Bergeron. You can look through the data yourself; I will have a team analysis for both teams (Edmonton and Vancouver) shortly. The sheets above take about four hours to create on my 2 GHz processor, so I can’t just make them instantly.
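Reading down a column, as with Smyth and Moreau, amounts to checking each row player's +R with the column player while setting aside pairs with too little shared time. The sketch below assumes a hypothetical nested-dictionary layout for the sheets, and its 20% cutoff is an arbitrary illustration, not a tested significance threshold.

```python
# Hypothetical table layout: table[row_player][mate] = (time_pct, plus_r),
# i.e. one row of the sheets per player, one column per teammate.


def column_summary(table, mate, min_time_pct=20.0):
    """Sort the rows of `mate`'s column by whether he raised their +R.

    Rows where the pair shared less than `min_time_pct` of the row player's
    time are set aside as too small a sample; the 20% default is an
    arbitrary illustration, not a tested significance threshold.
    """
    helped, not_helped, too_small = [], [], []
    for player, mates in table.items():
        if player == mate or mate not in mates:
            continue
        time_pct, plus_r = mates[mate]
        if time_pct < min_time_pct:
            too_small.append(player)
        elif plus_r > 1:
            helped.append(player)
        else:
            not_helped.append(player)
    return helped, not_helped, too_small


# Toy table with hypothetical players; X is the column being examined.
TABLE = {
    "A": {"X": (60.0, 1.30)},
    "B": {"X": (40.0, 0.80)},
    "C": {"X": (10.0, 2.00)},
    "X": {"A": (50.0, 1.10)},
}
print(column_summary(TABLE, "X"))  # (['A'], ['B'], ['C'])
```

For the penalty-kill sheets the same scan would use -R instead, counting a teammate as helped when his -R with the column player is below 1.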

Future Considerations

So what else can be done with such information? This really is the tip of the iceberg. Once you figure out who is good, bad and average, you can see how players fare against good, bad and average opposition. You can also discover which lines shoot the most and, even better, how many shots against each individual is on the ice for. This will of course be a future analysis. I’d be curious whether some players have better save percentages on the ice than others, and which players allow the most difficult shots. Maybe this will give some insight into minus statistics.

*I should note: this example only looks at 5 on 4 power-plays, and the “goal scored” statistic counts any goal scored by any player while the individual is on the ice (like the plus statistic in even-strength situations).

1 comment:

Earl Sleek said...

OMG, this is great stuff. Ooh, ooh, do the Ducks!

I didn't know about the time sheets coming out. I always thought the barrier was those 'pictures' of shift charts.

I really like your metrics, also. Pct. of goals scored vs. Pct. of minutes really brings out the 'contributors' and the 'borrowers'.

Aw, I got more to say, but I'm probably posting on the next post also.