September 14, 2007

Adjusted Shot Quality Neutral Save Percentage

Alan Ryder has found a systemic bias in the shot quality data, which calls the results built on that data into question. It is best summarized by Ryder himself:
I have been worried that there is a systemic bias in the data. Random errors don’t concern me. They even out over large volumes of data. I seriously doubt that the RTSS scorers bias the shot data in favour of the home team. But I do think that it is a serious possibility that the scoring in certain rinks has a bias towards longer or shorter shots, the most dominant factor in a shot quality model. And I set out to investigate that possibility [Shot Quality Product Recall].

I took a rather simple approach to fixing the problem. I ran a regression on the SQ results for all games using two factors: team shot quality and stadium shot quality [RTSS shot quality]. This simply estimates how far the RTSS scoring in each rink is from the standard [how the team normally performs]. Ideally the RTSS scorers would have no effect, so all of the stadium variables should be 0. I found a rather long list of biases, most of them small, including: Calgary, St. Louis, Columbus, Chicago, Phoenix, New Jersey, New York Rangers, Philadelphia, Buffalo, Carolina, Washington. I deliberately over-selected, so that list likely includes teams that are simply randomly different rather than actually biased, but that doesn't matter. Ideally, I would incorporate these issues into the model directly, but the shot quality model is time-consuming to build, and once you have the variables you have to go through the hassle of calculating the percentages for all 7000 shots.
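As a rough illustration of that kind of regression, here is a minimal sketch, not the code actually used for this post. It assumes each observation is one team's shot quality in one game, modelled as a team effect plus a rink effect, and fits both with ordinary least squares. The function name `estimate_rink_bias`, the team and rink labels, and all of the numbers are hypothetical.

```python
import numpy as np

def estimate_rink_bias(observations):
    """Fit shot_quality = team effect + rink effect by least squares.

    observations: iterable of (team, rink, shot_quality) tuples.
    Returns (team_effects, rink_effects) as dicts, with the rink
    effects centred so that they average to zero.
    """
    observations = list(observations)
    teams = sorted({t for t, _, _ in observations})
    rinks = sorted({r for _, r, _ in observations})
    t_idx = {t: i for i, t in enumerate(teams)}
    r_idx = {r: i for i, r in enumerate(rinks)}

    # One indicator column per team, followed by one per rink.
    X = np.zeros((len(observations), len(teams) + len(rinks)))
    y = np.zeros(len(observations))
    for row, (team, rink, sq) in enumerate(observations):
        X[row, t_idx[team]] = 1.0
        X[row, len(teams) + r_idx[rink]] = 1.0
        y[row] = sq

    # The design is rank deficient (a constant can move between the team
    # and rink columns), so lstsq returns the minimum-norm solution.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    team_eff = coef[:len(teams)]
    rink_eff = coef[len(teams):]

    # Pin down the free constant: rink effects average zero, so a rink
    # effect of 0 means "no scorer bias relative to the league".
    shift = rink_eff.mean()
    return (dict(zip(teams, team_eff + shift)),
            dict(zip(rinks, rink_eff - shift)))

# Hypothetical, noise-free data: every team appears once in every rink.
true_team = {"CGY": 1.00, "NYR": 1.05, "BUF": 0.95}
true_rink = {"Calgary": 0.03, "New York": -0.04, "Buffalo": 0.01}
obs = [(t, r, true_team[t] + true_rink[r])
       for t in true_team for r in true_rink]

team_quality, rink_bias = estimate_rink_bias(obs)
for rink, bias in rink_bias.items():
    print(f"{rink:<10} {bias:+.3f}")
```

With real data the rink effects would be noisy, so small values should be read as possibly random rather than as evidence of scorer bias.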

A simple adjustment to shot quality and its effect on goaltending:
 N  lastname    newSQN  oldSQN  Shots
 1  BACKSTROM   0.925   0.926   1027
 2  VOKOUN      0.920   0.921   1299
 3  DIPIETRO    0.920   0.922   1917
 4  THIBAULT    0.920   0.921    570
 5  MASON       0.919   0.921   1244
 6  HUET        0.919   0.919   1280
 7  GIGUERE     0.918   0.918   1490
 8  LUONGO      0.918   0.919   2169
 9  LEHTONEN    0.915   0.915   2075
10  ROLOSON     0.914   0.914   1979
11  BRODEUR     0.914   0.913   2182
12  KIPRUSOFF   0.914   0.912   2190
13  NABOKOV     0.914   0.915   1227
14  KOLZIG      0.913   0.919   1771
15  KHABIBULIN  0.913   0.914   1668
16  TURCO       0.912   0.912   1554
17  BURKE       0.912   0.913    687
18  FERNANDEZ   0.912   0.913   1154
19  AEBISCHER   0.911   0.910    929
20  LEGACE      0.911   0.919   1177
21  HASEK       0.911   0.912   1309
22  LUNDQVIST   0.910   0.921   1927
23  EMERY       0.910   0.911   1691
24  DUNHAM      0.910   0.910    540
25  NIITTYMAKI  0.908   0.913   1562
26  GRAHAME     0.908   0.912    702
27  MILLER      0.907   0.899   1886
28  SMITH       0.906   0.909    510
29  FLEURY      0.904   0.905   1955
30  BELFOUR     0.903   0.904   1550
31  NORRENA     0.903   0.908   1420
32  BUDAJ       0.903   0.903   1499
33  BRYZGALOV   0.902   0.903    668
34  THOMAS      0.902   0.904   1985
35  GERBER      0.900   0.901    784
36  AULD        0.899   0.899    729
37  GARON       0.899   0.901    849
38  JOSEPH      0.899   0.904   1481
39  TOIVONEN    0.899   0.897    502
40  THEODORE    0.898   0.898    870
41  WARD        0.898   0.906   1625
42  TOSKALA     0.897   0.899    915
43  LECLAIRE    0.894   0.900    629
44  JOHNSON     0.894   0.900    894
45  BIRON       0.894   0.899    509
46  SANFORD     0.893   0.901    707
47  RAYCROFT    0.891   0.892   1939
48  BIRON       0.887   0.886    533
49  TELLQVIST   0.887   0.892    780
50  HOLMQVIST   0.886   0.890   1134
51  DENIS       0.884   0.884   1068
52  CLOUTIER    0.878   0.880    608
newSQN = shot quality neutral save percentage, adjusted for RTSS bias.
oldSQN = shot quality neutral save percentage, not adjusted for RTSS bias.
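To see which goalies the adjustment moves the most, one can subtract oldSQN from newSQN. The short sketch below does that for a handful of rows copied from the table. The subset is chosen only for illustration, and the variable names are made up.

```python
# A few rows copied from the table: (lastname, newSQN, oldSQN).
rows = [
    ("LUNDQVIST", 0.910, 0.921),
    ("MILLER",    0.907, 0.899),
    ("LEGACE",    0.911, 0.919),
    ("KOLZIG",    0.913, 0.919),
    ("WARD",      0.898, 0.906),
    ("HUET",      0.919, 0.919),
]

# Change caused by the RTSS adjustment, rounded to the table's three decimals.
shift = {name: round(new - old, 3) for name, new, old in rows}

# Largest absolute changes first.
for name, delta in sorted(shift.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:<10} {delta:+.3f}")
```

A negative value means the goalie's number dropped once the rink bias was removed; a positive value means it rose.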

I'm curious what the turnover among RTSS scorers is. That is to say, I wonder whether last year's bias will be the same this year.

2 comments:

David Johnson said...

To make matters worse, one wonders if there is within-season turnover. There is no guarantee that a game observer will remain on the job for the entire season or will never get sick and have to take a night off. That could mess things up even more.

MOgen_david said...

I still think that a mixed effects model would work best for this sort of analysis: use the broad inference space for overall shot quality. You should even be able to project stadium-specific quality based on the narrow inference space.