November 26, 2006

Site Update

Most sites provide some interesting power play and penalty kill statistics, the most basic being sites such as ESPN. However, there are a number of stats that these sites do not use, such as my preferred ones: expected goals and shot quality neutral save percentage, a measure of shot quality. All of these can indicate, before it becomes apparent, that a team has problems in a certain situation. For example, I have Edmonton ranked as the worst power play team in the NHL (slightly below Phoenix), as they have taken very few shots, and horrible shots at that. Somehow they’ve managed to score at 6.6 goals per hour when I would predict 4.3 goals per hour (a pace I doubt will continue over the course of the season).

Andy Grabia stated this better one week ago: “The Oilers powerplay got a goal tonight. Thank God. But it wasn't on the 5-on-3. That was wasted by MAB faking the shot about 17,000 times, and the continual passing back and forth along the point. Nor was the goal deserved, although the shot was a beauty. Has this team heard of anything other than a one-timer? Sweet Caroline, our powerplay stinks. And it is not personnel. It's coaching.”

You’ll note they got a goal they didn’t deserve, and the statistics agree: Edmonton has gotten a lot of power play goals it didn’t deserve. I’ll partly disagree, though. It’s not all coaching, as I think the defensemen in Edmonton are not conducive to a productive power play, but that’s another issue.

In order to somehow “summarize” this data I created the definitive situational table. I decided to include all data in one massive table because I want to be able to see what happens to even strength data when I sort by power play data. To make the tables a little simpler I also split them up (as it’s easy to do so), so there are separate even strength, power play and penalty kill tables. All columns are sortable, and in general one can spend hours sorting by all 27 columns. I included a goal differential in the separate tables (there was no room in the big one), but I put little weight on it as there’s a lot of random error in the results. This is more data than most people can stomach in one day, let alone a few minutes, but I’m sure some people will like it as much as I do. If there are any errors, let me know.

Also, why on earth do the Canadiens have such an amazing penalty kill?


JavaGeek said...

Assuming a "non-screened 12' shot by Brookbank" is equal to a "12' screened shot by Ovechkin passed to him by Zubrus for a one time shot" is a lot better than assuming a "60' shot by Brookbank" is equal to a "12' screened shot by Ovechkin passed to him by Zubrus for a one time shot".

It doesn't just look at distance, but also at type of shot, situation, rebounds and score differential.

Interestingly last season:
Brookbank had 0.6 expected goals and got 1
Ovechkin had 56.7 expected goals and got 58.
That's pretty good if you ask me...
McCabe had 16.4 expected goals and got 19

In terms of statistics the above results are quite good...

I've done quite a bit of analysis and these expected goals are really quite impressive, but of course they can't tell the whole story. No number can.

Anonymous said...

"Brookbank had 0.6 expected goals and got 1
Ovechkin had 56.7 expected goals and got 58.
That's pretty good if you ask me...
McCabe had 16.4 expected goals and got 19"

Funny, if I said something like this you would ask me to add error bars and do a confidence test on it.

JavaGeek said...

Didn't think you'd be interested:
Standard deviation for each player:
Ovechkin: 7.2 Goals (Z=0.18)
- A top player can see +/- 14 goals per season and this is perfectly normal... So a 45 goal scorer will get anywhere from 60 to 30 per season...
McCabe: 4.4 Goals (Z=0.59)
Brookbank: 1 Goal [all error] (Z=0.4)

3 players isn't a huge sample though, so I've actually done it with all players and it was good (don't remember how good).
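The Z values quoted above look like (actual goals - expected goals) / standard deviation. A minimal sketch, assuming that definition and using only the numbers already given in this thread, reproduces them:

```python
# Figures quoted in this thread: (expected goals, actual goals, standard deviation).
players = {
    "Ovechkin": (56.7, 58, 7.2),
    "McCabe": (16.4, 19, 4.4),
    "Brookbank": (0.6, 1, 1.0),
}

def z_score(expected, actual, sd):
    """Number of standard deviations the actual total sits from the expected total."""
    return (actual - expected) / sd

# Assumed definition of Z; the thread does not spell it out.
z_scores = {name: z_score(*values) for name, values in players.items()}

for name, z in z_scores.items():
    print(f"{name}: Z = {z:.2f}")
```

All three come out well under 1, meaning each player finished within one standard deviation of the expected total.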

Anonymous said...

"A top player can see +/- 14 goals per season and this is perfectly normal... So a 45 goal scorer will get anywhere from 60 to 30 per season..."

You call that good? It doesn't sound very impressive that you can tie someone down to X goals +/- 30%. Marian Hossa has scored between 31 and 45 goals (38 goals +/- 19%) for the past 5 years. It shouldn't make you all warm and fuzzy inside that you developed an algorithm that explains his goal production +/- 30%.

Approximately 10% of all shots end up in the net. McCabe had 209 shots so one would expect him to score 20.9 goals. He scored 19. That is a difference of 10%. Your estimation had him scoring 16.4 goals for an error of 13.7%. For McCabe my algorithm works better.

Wade Brookbank had 10 shots on goal last year. He scored 1 goal. Look, almost perfect. Can simply assuming that 10% of all shots end up in the goal do just as well at estimating one's production?

Now, my guess at the answer to that question is probably not, but the point is that it would be interesting to see how much better your process is than just assuming 10% of all shots end up as goals.
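The comparison in this comment can be written out as a short sketch. It uses only the shot and goal totals quoted in the thread, measures error as a percentage of actual goals (which is what the 10% and 13.7% figures imply), and treats the flat 10% rate as the commenter's assumption rather than a measured league figure:

```python
FLAT_RATE = 0.10  # the "about 10% of all shots go in" assumption from the comment

# (shots, actual goals, shot-quality expected goals) as quoted in this thread.
players = {
    "McCabe": (209, 19, 16.4),
    "Brookbank": (10, 1, 0.6),
}

def pct_error(predicted, actual):
    """Absolute error as a percentage of the actual goal total."""
    return abs(predicted - actual) / actual * 100

results = {}
for name, (shots, goals, expected) in players.items():
    baseline = shots * FLAT_RATE  # goals predicted by the flat-rate rule
    results[name] = (pct_error(baseline, goals), pct_error(expected, goals))
    flat_err, model_err = results[name]
    print(f"{name}: flat 10% off by {flat_err:.1f}%, shot quality off by {model_err:.1f}%")
```

Two players cannot settle which approach is better; the same loop would need to run over every player in the league to answer the question posed here.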

JavaGeek said...

First off for those who tried to access my site, we've had some power issues in Surrey, so my server was down.

Hossa scored 29 goals in 1999-2000 and 45 in 2002-2003 for a difference of 16 (higher than expected - although the distribution includes 1 again!). He's more like a 40 goal scorer [not 45]. So far he's doing very well this year. But it's interesting to note that his expected goals are at 0.48 per game and his actual goals are at 0.64 per game, while his historical average is 0.45. Maybe he's outscored his expected goals because he's gotten some positive random error helping him out. Based on that I expect Hossa to get 43 goals this season, whereas most people would argue he should get 52. Either way, my guess will be right more often than the alternative.

Random Error:
The 14 goals is an expected random error range. That is to say, it exists no matter what anyone does or says. A model that perfectly predicts scoring will still have this error (you cannot predict randomness). I'm not about to create a huge discussion on how random scoring is, but I'd like to see you prove it is not random (read some book to find out how). If goals were not random, players could "choose" when they scored and when they didn't score. Presumably I would choose to always score, but that isn't the case.
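One way to see where a spread of roughly 14 goals could come from is sketched below. It assumes goals arrive like a Poisson process, which is an assumption made here for illustration and not necessarily the calculation behind the figures quoted earlier. Under that assumption the standard deviation of a season total is the square root of the expected total:

```python
import math

def poisson_sd(expected_goals):
    """Standard deviation of a season goal total under a Poisson assumption."""
    return math.sqrt(expected_goals)

def two_sd_range(expected_goals):
    """Range covering roughly 95% of seasons: expected total +/- 2 standard deviations."""
    sd = poisson_sd(expected_goals)
    return expected_goals - 2 * sd, expected_goals + 2 * sd

# Hypothetical 45 goal scorer, as in the example earlier in this thread.
low, high = two_sd_range(45)
print(f"sd = {poisson_sd(45):.1f} goals, range = {low:.1f} to {high:.1f}")
```

For an expected 56.7 goals the same formula gives about 7.5, close to but not the same as the 7.2 quoted for Ovechkin, so the actual calculation presumably differs in its details.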

Shot Quality:
Shot Quality Theory [PDF]
No one is forcing you to agree to analytical solutions to hockey problems, you can continue using subjective analysis and that is just fine. There is nothing "wrong" with trying to explain some of the variability in any model if you can.

As to 10% vs Shot Quality:
Here's a graph: black is expected goals using 10% and red is expected goals using shot quality. You can see the black data underpredicts the players who get more goals. The regression for 10% has r² = 80% (coefficient of 1.33 - chasing the errors) and for shot quality r² = 95% (coefficient of 1.04 - should be 1). So both underpredict the best players, but the shot quality model does a much better job.
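For anyone who wants to run this kind of check, here is a minimal sketch of a regression of actual goals on expected goals. It assumes ordinary least squares with an intercept, which may not be exactly the fit used for the graph, and it uses only the three players quoted earlier in the thread, far too few for a meaningful result. The point is only to show how the coefficient and r² are computed:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Brookbank, McCabe, Ovechkin: shot-quality expected goals vs. actual goals.
expected = [0.6, 16.4, 56.7]
actual = [1, 19, 58]

slope, intercept, r_squared = fit_line(expected, actual)
print(f"coefficient = {slope:.2f}, r^2 = {r_squared:.3f}")
```

A coefficient near 1 means the model is neither systematically over- nor underpredicting as totals grow; a coefficient well above 1, like the 1.33 for the flat 10% rule, means the best scorers are being underpredicted.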

Jeff J said...

"Also, why on earth do the Canadiens have such an amazing penalty kill?"

1) They have four great defensive players in Bonk, Johnson, Markov, and Komisarek heading up the PK.

2) Leading the league in SHGs tends to make opponents play a more conservative style on the PP.

3) Guy Carbonneau, Doug Jarvis and Bob Gainey showing them how it's done.

4) Luck.

dj: Any attempt at a predictor model has to include uncertainty to be taken seriously. It is the uncertainty that accounts for the inherent variability. You simply can't take every single bit of info (what the player had for breakfast, what the other team's checking line had for breakfast, details about each player's personal life, etc.) into account. You chalk it up to randomness and assume it all balances out.

In three straight seasons with the same team, coach, and usual linemates, Hossa scored 31, 46 and 36 goals. It's silly to suggest that his overall effectiveness as a goal scorer improved by precisely 48% then dropped by 22% in consecutive seasons. He was basically playing the same way all along (well, he probably improved a bit - the 31 goal season he was only 22). A good model captures this variability with error.

RiversQ said...

Java: According to my interpretation of your tables, you've got the Oilers at almost exactly league average PP shot quality. How does that jibe with your description of "horrible shots?"

No sane person would argue that they're generating enough shots thus far, but it seems to me that you're contradicting your own data there.

JavaGeek said...

Rivers Q:
Times change. In the games since writing this, Edmonton's shots have been at a shot quality level of 1.5, which helps get you back to average, but they're still only getting 34 shots/hr...