December 3, 2009

Shots - Home vs. Away

Back in March, Jilken posted an article on "shot recording bias" for the home team. It's an interesting concept. Jilken's most convincing evidence was an Excel table that showed bias patterns highlighted in yellow. The table didn't give me any perspective, as there was no indication of what one should expect the table to look like in a perfectly random environment. I ended up ignoring it for the time being. Sunny Mehta's recent post went into more detail (along with 25 comments, which are quite detailed as well). Sunny started mostly with a subjective technique, actually counting the shots himself. He also used some not-so-subjective techniques, such as showing the viewer a shot that wasn't recorded.

This made me wonder: is there really a significant "home bias" effect in counting shots? Who would have thought such a simple task would be subject to such variation?

So let's start with the last 4 seasons of shot data. The graph below shows a "Matrix Plot" of the ratio Shots@home / Shots@road (first 3 periods only). All four seasons show some sort of positive correlation with one another (though I didn't check each for significance). The reader should notice quite quickly that most of the values are greater than 1; that is, teams generally take more shots at home than on the road. This should not be a huge surprise - home teams also score more goals.
(Clicking on the image will bring up a much bigger copy with the names of teams.)
Before I get too far ahead of myself I should explain the above plot. A Matrix Plot is used to compare multiple variables at the same time to see if any of them are related. In the above plot the "rows" represent one season variable and the "columns" another season variable. So, for example, the graph in the first row and last column represents "Ratio 2005 vs. Ratio 2008": a scatter plot, along with a regression line, of the 2005 season against the 2008 season. Similarly, if you move over one column to the left you have "Ratio 2005 vs. Ratio 2007". The diagonal doesn't contain any graphs because "Ratio 2005 vs. Ratio 2005" is not a very interesting graph (just a bunch of points along the 45 degree line).
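To make the layout concrete, here is a minimal sketch of the bookkeeping behind a matrix plot: one panel for every ordered pair of different seasons, with the diagonal skipped. The ratios are made-up numbers for four hypothetical teams (not the real data), the function names are my own for illustration, and a plain Pearson correlation stands in for the regression line drawn in each panel.

```python
from itertools import permutations


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def matrix_panels(ratios_by_season):
    """Map (row season, column season) to a correlation for each off-diagonal panel."""
    return {
        (row, col): pearson(ratios_by_season[row], ratios_by_season[col])
        for row, col in permutations(ratios_by_season, 2)
    }


# Made-up home/road shot ratios, one value per hypothetical team, same team order each season.
example = {
    2005: [0.95, 1.05, 1.10, 1.20],
    2006: [0.97, 1.02, 1.12, 1.15],
    2007: [1.00, 1.08, 1.05, 1.18],
    2008: [0.93, 1.04, 1.11, 1.22],
}

panels = matrix_panels(example)
for (row, col), r in sorted(panels.items()):
    print(f"Ratio {row} vs. Ratio {col}: r = {r:.2f}")
```

With four seasons this gives 4 x 3 = 12 panels, and each pair shows up twice (once above the diagonal, once below, with the axes swapped).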

On a separate note, I did do a regression based on the average of the first 3 seasons:
Ratio in 2008 = 0.42 + 0.61 x Ratio Average (2005,2006,2007) [p-value=0.001]
Ratio in 2008 = 1.07 + 0.61 x (Ratio Average - 1.07)
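The two lines above are the same regression written two ways; the second is centered on 1.07, which I am assuming here is roughly the average ratio. A quick sketch to check that they agree up to the rounding of the published coefficients (the function and constant names are just for illustration):

```python
SLOPE = 0.61       # coefficient on the 2005-2007 average ratio
INTERCEPT = 0.42   # rounded intercept from the first form
CENTER = 1.07      # centering value from the second form (assumed to be about the average ratio)


def predict_raw(avg_ratio):
    """First form: intercept + slope * average ratio."""
    return INTERCEPT + SLOPE * avg_ratio


def predict_centered(avg_ratio):
    """Second form: center + slope * (average ratio - center)."""
    return CENTER + SLOPE * (avg_ratio - CENTER)


for avg in (0.90, 1.07, 1.20):
    print(avg, round(predict_raw(avg), 3), round(predict_centered(avg), 3))
```

The centered form makes the interpretation easier to see: with a slope below 1, a team's predicted 2008 ratio keeps only about 61% of its distance from 1.07, so the prediction always lands between the team's 3-year average and 1.07.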

I've also included a data summary for all the ratio observations (4 seasons x 30 teams = 120 observations) to give the reader an idea of how the data is distributed.

There is obviously something going on here (sorry, I'm trying to keep this short). It's worth noting that Colorado's elevation provides them with a natural home advantage. However, even when you remove Colorado from the data you get similar results.


Sunny Mehta said...

Cool, Chris. Do the three matrix plots for each season represent each period?

If I'm reading the plots correctly, look at NJ on all 12 of those graphs!

JavaGeek said...

Oh, sorry I should have explained it better.

Each graph represents one whole season vs. another whole season. So at row "Ratio 2005" and column "Ratio 2007" you get a dot plot of the "Ratio 2005 points vs Ratio 2007 points".

What the Matrix plot does is run a regression on every possible plot option (R5 = Ratio 2005 to simplify things here).
First row:
R5 vs. R6; R5 vs. R7; R5 vs. R8
So the only regression that doesn't look meaningful is R5 vs. R7.

And yes, New Jersey is in the lower left quadrant every single season. Anaheim is in the upper right.
