December 3, 2009

Shots - Home vs. Away

Back in March, Jilken's posted an article on "shot recording bias" for the home team. It's an interesting concept. Jilken's most convincing evidence was an Excel table that showed bias patterns highlighted in yellow. The table didn't give me any perspective, as there was no indication of what one should expect the table to look like in a perfectly random environment. I ended up ignoring it for the time being. Sunny Mheta's recent post went into more detail (along with 25 comments, which are quite detailed as well). Sunny started with a mostly subjective technique, actually counting the shots himself. He also used some less subjective techniques, such as showing the viewer a shot that wasn't recorded.

This made me wonder: is there really a significant "home bias" effect in counting shots? Who would have thought such a simple task would be subject to such variation?

So let's start with the last 4 years of shot data. The graph below shows a "Matrix Plot" of the ratio Shots@home / Shots@road (first 3 periods only). All four seasons show some sort of positive correlation with one another (I didn't check the significance of each, though). The reader should notice quite quickly that most of the values are greater than 1; in other words, shots at home are generally greater than shots on the road. This should not be a huge surprise, since home teams also score more goals.
Clicking on the image will bring up a much bigger copy with the names of the teams.
Before I get too far ahead of myself, I should explain the above plot. A Matrix Plot is used to compare multiple variables at the same time to see if any of them are related. In the above plot, each row represents one season's ratio and each column another season's. So, for example, the graph in the first row and last column represents "Ratio 2005 vs. Ratio 2008": a scatter plot, along with a regression line, for the 2005 season and the 2008 season. Similarly, if you move over one column you'd have "Ratio 2005 vs. Ratio 2007". The diagonal doesn't contain any graphs because "Ratio 2005 vs. Ratio 2005" is not a very interesting graph (just a bunch of points along the 45 degree line).
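For readers who prefer numbers to pictures, the same season-against-season comparison can be sketched in a few lines of Python. This is only an illustration, not the code behind the plot: the team names and shot totals in SHOTS are invented placeholders, and the helper names (home_road_ratio, pearson, season_ratios, pairwise_correlations) are my own.

```python
from math import sqrt

# Hypothetical shot totals per team and season: (shots at home, shots on the road).
# These values are invented for illustration; they are NOT the data from the post.
SHOTS = {
    "Team A": {2005: (1300, 1200), 2006: (1280, 1190), 2007: (1310, 1220)},
    "Team B": {2005: (1250, 1240), 2006: (1230, 1250), 2007: (1260, 1230)},
    "Team C": {2005: (1400, 1210), 2006: (1380, 1230), 2007: (1390, 1200)},
    "Team D": {2005: (1200, 1180), 2006: (1220, 1170), 2007: (1210, 1190)},
}


def home_road_ratio(home_shots, road_shots):
    """Shots@home / Shots@road for one team in one season."""
    return home_shots / road_shots


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def season_ratios(shots, season):
    """Ratios for every team in one season, in a fixed (sorted) team order."""
    return [home_road_ratio(*shots[team][season]) for team in sorted(shots)]


def pairwise_correlations(shots, seasons):
    """One correlation per off-diagonal cell of the matrix plot (upper triangle)."""
    result = {}
    for i, first in enumerate(seasons):
        for second in seasons[i + 1:]:
            result[(first, second)] = pearson(
                season_ratios(shots, first), season_ratios(shots, second)
            )
    return result


if __name__ == "__main__":
    for pair, r in pairwise_correlations(SHOTS, [2005, 2006, 2007]).items():
        print(pair, round(r, 2))
```

Each entry corresponds to one off-diagonal panel of the matrix plot. If home scorers really do count differently from building to building, a team's ratio should carry over from one season to the next and these correlations should come out positive.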

On a separate note, I did do a regression based on the average of the first 3 seasons:
Ratio in 2008 = 0.42 + 0.61 x Ratio Average (2005,2006,2007) [p-value=0.001]
or
Ratio in 2008 = 1.07 + 0.61 x (Ratio Average - 1.07)
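The two forms of the regression above describe the same line up to rounding, which is easy to check. The sketch below uses only the three numbers quoted above (0.42, 0.61 and 1.07); the function names are mine, and the sample input of 1.20 is an arbitrary illustrative value, not a real team's ratio.

```python
# Coefficients as quoted in the post (rounded to two decimals there).
INTERCEPT = 0.42
SLOPE = 0.61
CENTER = 1.07  # the value the second form of the equation is centered on


def predict_intercept_form(avg_ratio):
    """Ratio in 2008 = 0.42 + 0.61 x Ratio Average."""
    return INTERCEPT + SLOPE * avg_ratio


def predict_centered_form(avg_ratio):
    """Ratio in 2008 = 1.07 + 0.61 x (Ratio Average - 1.07)."""
    return CENTER + SLOPE * (avg_ratio - CENTER)


def implied_intercept(center, slope):
    """Intercept obtained by expanding the centered form: center * (1 - slope)."""
    return center * (1 - slope)


if __name__ == "__main__":
    # Expanding the centered form gives an intercept that rounds to 0.42.
    print(round(implied_intercept(CENTER, SLOPE), 4))
    # Arbitrary example input: a three-season average ratio of 1.20.
    print(round(predict_intercept_form(1.20), 3))
    print(round(predict_centered_form(1.20), 3))
```

The centered form is the easier one to read: a team whose three-season average sits some distance above 1.07 is predicted to keep about 61% of that gap in 2008. The small difference between the two predictions comes purely from rounding the intercept to 0.42.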

I've also included a data summary for all the ratio observations (4 seasons x 30 teams = 120 observations) to give the reader an idea of how the data is distributed.


Conclusion:
There is obviously something going on here (sorry, I'm trying to keep this short). It's worth noting that Colorado's elevation provides them with a natural home advantage. However, even when you remove Colorado from the data you get similar results.