I have been worried that there is a systemic bias in the data. Random errors don’t concern me. They even out over large volumes of data. I seriously doubt that the RTSS scorers bias the shot data in favour of the home team. But I do think that it is a serious possibility that the scoring in certain rinks has a bias towards longer or shorter shots, the most dominant factor in a shot quality model. And I set out to investigate that possibility [Shot Quality Product Recall].
I took a rather simple approach to the problem. I ran a regression of SQ results for all games on two factors: team shot quality and stadium shot quality [RTSS shot quality]. This simply calculates how far the RTSS scores are from the standard [how the team normally performs]. Ideally the RTSS scorers would have no effect, so all of those variables should be 0. I found a rather long list of biases, most of them small, including: Calgary, St. Louis, Columbus, Chicago, Phoenix, New Jersey, New York Rangers, Philadelphia, Buffalo, Carolina and Washington. I have deliberately over-selected, so that list likely includes teams that are simply randomly different rather than actually biased, but it doesn't matter. Ideally, I would want to incorporate these issues into the model directly, but the shot quality model is time-consuming to build, and once you get the variables you have to go through the hassle of calculating the percentages for all 7000 shots.
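For anyone who wants to try something similar, here is a rough sketch of what such a regression could look like. It is not the exact model used for the numbers in this post: it assumes a plain additive form (recorded SQ = team effect + rink effect), the function name `fit_rink_bias` is hypothetical, and the figures are made up for illustration rather than taken from RTSS data.

```python
import numpy as np

def fit_rink_bias(teams, rinks, sq):
    """Least-squares fit of sq ~ team effect + rink effect (assumed additive).

    Returns (team_effects, rink_effects) as dicts. Rink effects are centred
    to average zero, so each one is a deviation from a typical rink.
    """
    team_names = sorted(set(teams))
    rink_names = sorted(set(rinks))
    t_idx = {t: i for i, t in enumerate(team_names)}
    r_idx = {r: i for i, r in enumerate(rink_names)}

    # One indicator column per team, followed by one per rink.
    X = np.zeros((len(sq), len(team_names) + len(rink_names)))
    for row, (t, r) in enumerate(zip(teams, rinks)):
        X[row, t_idx[t]] = 1.0
        X[row, len(team_names) + r_idx[r]] = 1.0

    # Team and rink columns are collinear, so lstsq returns the minimum-norm
    # solution; centring the rink effects afterwards pins them down.
    coef, *_ = np.linalg.lstsq(X, np.asarray(sq, dtype=float), rcond=None)
    team_coef = coef[:len(team_names)]
    rink_coef = coef[len(team_names):]
    shift = rink_coef.mean()
    team_effects = {t: float(team_coef[i] + shift) for t, i in t_idx.items()}
    rink_effects = {r: float(rink_coef[i] - shift) for r, i in r_idx.items()}
    return team_effects, rink_effects

# Made-up, noise-free numbers for illustration only (not real RTSS data).
true_team = {"CGY": 1.02, "NJD": 0.95, "BUF": 1.00}
true_rink = {"CGY": 0.03, "NJD": -0.04, "BUF": 0.01}
teams, rinks, sq = [], [], []
for t in true_team:
    for r in true_rink:
        teams.append(t)
        rinks.append(r)
        sq.append(true_team[t] + true_rink[r])

team_effects, rink_effects = fit_rink_bias(teams, rinks, sq)

# One possible adjustment: subtract the estimated rink effect from each record.
adjusted_sq = [s - rink_effects[r] for s, r in zip(sq, rinks)]
```

With unbiased scorers every rink effect would come out near 0. On real data the estimates are noisy, so small non-zero values do not by themselves prove bias.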
Simple adjustment on shot quality and its effect on goaltending:
newSQN = adjusted for RTSS bias
oldSQN = no adjustment for RTSS bias.
I'm curious what the RTSS turnover is. That is to say, I wonder if the bias last year will be the same this year.
2 comments:
To make matters worse, one wonders if there is within-season turnover. There is no guarantee that a game observer will remain on the job for the entire season or will never get sick and have to take a night off. That could mess things up even more.
I still think that a mixed effects model would work best for this sort of analysis: use the broad inference space for overall shot quality. You should even be able to project stadium-specific quality based on the narrow inference space.