Hockey Numbers: Relative Face-off Scores.

Face-offs occur about as frequently as shots in a game. Just because there are many face-offs doesn't mean they are extremely important; however, face-offs are a simple win/loss game and as such allow a certain amount of simplified analysis. I'm sure there are many algorithms out there that would do all this for me, but I'm going to build this model from the ground up and keep it simple. Of course, a face-off is a complicated dynamic between forwards, and at times even defensemen, all trying to get the puck after the referee drops it, but for now we are only going to look at one forward: the center.

Joe Thornton won at 52.3% in Boston and 50.1% in San Jose. He's the same player, so why should the score differ so much? There are many reasons, but the most basic explanation is that Joe Thornton got easier face-offs in Boston than in San Jose. The idea is simple: a player who gets easier opposition should post a better face-off percentage than he otherwise would, and when he plays against good opposition he will perform worse. For face-offs, opposition averages out quite nicely, so most players end up within a percentage point of their actual average. The method isn't perfect, though: Sillinger scored 52.6% in St. Louis but 55.0% in Nashville.

Theory

So how does one go about doing the calculations? Each team has around 4 regular face-off men, so there are around 120 guys (I actually used 121) that I need to compare simultaneously in order to know who the best is. Before we can compare players, we need some idea of who should win given a score for each player. What I really care about is the player's "real" winning percentage, which should be approximately the win percentage vs. 50% opposition, and no player can win more than 100% of the time or less than 0% of the time. I'll hand-wave here and say that the best two-variable function for predicting the odds of winning between two "players" is a Pythagorean prediction, often used with runs scored in baseball or goals for and against in hockey.

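p_a - probability of player a winning against average opposition
p_b - probability of player b winning against average opposition
p_ab = p_a²/(p_a²+p_b²)* (probability that player a beats player b)
p_ba = p_b²/(p_b²+p_a²)* (probability that player b beats player a)
FO_ab - face-offs between player a and player b
FW_ab - face-off wins by player a vs. player b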

There are a few important relationships here that I should mention. They should be obvious, but I want to make them very clear from the start:

FO_ab= FO_ba
FW_ab= FO_ab * p_a²/(p_a²+p_b²)
FW_ba= FO_ab * p_b²/(p_b²+p_a²)
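As a rough illustration (this is not the code used for the post, and the function names are made up for the example), the Pythagorean prediction and the expected win count can be sketched in a few lines of Python:

```python
def pythagorean_win_prob(p_a, p_b):
    """Chance that player a beats player b, given each player's rating."""
    return p_a ** 2 / (p_a ** 2 + p_b ** 2)


def expected_wins(face_offs, p_a, p_b):
    """Wins player a is expected to take from `face_offs` draws against b."""
    return face_offs * pythagorean_win_prob(p_a, p_b)


# A 55% player against a 50% player: the prediction is close to, but not
# exactly, the player's own rating.
vs_average = pythagorean_win_prob(0.55, 0.50)
print(round(vs_average, 3))
```

The 55% player comes out at about 54.8% against average opposition, which is the "approximately" in the requirement above. The two probabilities for a pair always add up to 1, so FW_ab + FW_ba = FO_ab.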

The model requires the assumption above: that actual face-off wins equal the prediction. This isn't quite true, because the error on the head-to-head counts is extremely high for many pairs of players (only around 10 face-offs against each other), but in general these problems "should" average out. What I'm trying to do is solve for all the p's (there are up to n of them), and I know how well each player did against every other player (n² - n head-to-head records; the -n is because players can't take face-offs against themselves). Since we have FO_ab and FW_ab, I can easily calculate p_ab: FW_ab/FO_ab = p_ab = p_a²/(p_a²+p_b²). This can be rewritten as FO_ab*p_a² = FW_ab*(p_a²+p_b²), and there are n² equations just like it (where FO_aa = 0). If you sum up n of them (fix a, change b), you get (where i goes from 1 to N):

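Σ FO_ai*p_a² = Σ FW_ai*p_a² + Σ FW_ai*p_i²
Or
Σ FO_ai*p_a² - Σ FW_ai*p_a² - Σ FW_ai*p_i² = 0
Since FO_ai - FW_ai = FL_ai (player a's losses to player i) and Σ FL_ai = FL_a (player a's total losses), the nice thing is that this is equivalent to:
-FL_a*p_a² + Σ FW_ai*p_i² = 0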
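To make the bookkeeping concrete, here is a small sketch of how the matrix of coefficients could be assembled from a table of head-to-head wins. The function name and the list-of-lists layout are assumptions for the example; the win counts are the three-player example that appears later in the post.

```python
def build_system(wins):
    """Build A so that row a reads: -FL_a*c_a + sum_i FW_ai*c_i = 0.

    wins[a][b] is the number of face-offs player a won against player b.
    """
    n = len(wins)
    A = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                A[a][b] = wins[a][b]
        # Player a's losses are the wins every other player took off him.
        A[a][a] = -sum(wins[b][a] for b in range(n) if b != a)
    return A


wins = [
    [0, 26, 29],
    [24, 0, 25],
    [21, 25, 0],
]
A = build_system(wins)
for row in A:
    print(row)
```

Because each player's losses are exactly the wins his opponents took from him, every column of A adds up to zero.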

There are N equations like this with N unknowns (the p_i²'s). If you treat each p_i² as a plain variable without the power (e.g. p_i² = c_i), the problem is a linear system of N equations in N unknowns. The solution is not unique, however, because the system is homogeneous: there are infinitely many solutions, one of which is the trivial one where every value is zero. How one solves this system is reasonably irrelevant; it's reasonably large (121 x 121), but the answer will be the same no matter how you do it. I solved it by fixing one of the unknowns at an arbitrary value, which gives a b vector and an equation in Ax = b format, and then solved that with the "trivial" LU factorization, LUx = b. All you need to know is that the system is solvable. Because it is homogeneous (the full matrix is singular, even though the reduced Ax = b system is not), what you actually get is a set of solutions: one vector multiplied by any constant (call this constant s). To pin down a single solution I need a constraint on that constant. So I use the fact that there were only so many face-offs between these players, and their wins summed together must equal the total number of face-offs (each face-off counted once): Σ FO_i*p_i = Total Face-offs, where we know p_i² = c_i = b_i*s, so p_i = sqrt(b_i*s).

Σ FO_i*sqrt(s*b_i) = Total Face-offs (everything is known except s).
sqrt(s) = Total Face-offs / Σ FO_i*sqrt(b_i)
Once I have s:
p_i = sqrt(b_i*s), and I'm done.
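The whole procedure can be sketched in plain Python. This is a minimal illustration under a few assumptions, not the code behind the results: it uses simple Gaussian elimination rather than an LU library routine, it fixes the last unknown at 1, and the function name `solve_ratings` is made up. The matrix is the three-player example from later in the post.

```python
import math


def solve_ratings(A, face_offs):
    """Solve A*c = 0 up to scale, then scale so the wins add up.

    A         -- N x N matrix: wins off the diagonal, -losses on it
    face_offs -- face-offs taken by each player (FO_i)
    """
    n = len(A)
    m = n - 1
    # Fix the last unknown at 1 and move its column to the right-hand side.
    # The last row is dropped: the columns of A sum to zero, so one row is
    # always a combination of the others.
    M = [[float(A[r][c]) for c in range(m)] for r in range(m)]
    rhs = [-float(A[r][m]) for r in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m):
                M[r][c] -= factor * M[col][c]
            rhs[r] -= factor * rhs[col]

    # Back substitution gives the unscaled values b_i.
    b = [0.0] * m
    for r in reversed(range(m)):
        known = sum(M[r][c] * b[c] for c in range(r + 1, m))
        b[r] = (rhs[r] - known) / M[r][r]
    b.append(1.0)

    # Each face-off involves two players, so the number of face-offs
    # (and of wins) is half the sum of the per-player counts.
    total_face_offs = sum(face_offs) / 2
    root_s = total_face_offs / sum(
        fo * math.sqrt(bi) for fo, bi in zip(face_offs, b)
    )
    return [root_s * math.sqrt(bi) for bi in b]


A = [[-45, 26, 29], [24, -51, 25], [21, 25, -54]]
ratings = solve_ratings(A, [100, 100, 100])
print([round(p, 3) for p in ratings])
```

For the three-player example this gives ratings of roughly 0.534, 0.494 and 0.473. For the real 121 x 121 problem a library solver would be the more sensible choice.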

What I have just explained is how to simultaneously compare N players in the face-off circle. It's hard to really understand what's going on if you don't deal with this sort of math on a regular basis; it took me a while to even come up with how to do it. You can easily make up trivial examples. In the one below, each of three players takes 100 face-offs, 50 against each of the other two; the p_i² are the unknowns and A is the matrix in the equation Ax = 0.

-FL_a*p_a² + FW_ab*p_b² + FW_ac*p_c² = 0
FW_ba*p_a² - FL_b*p_b² + FW_bc*p_c² = 0
FW_ca*p_a² + FW_cb*p_b² - FL_c*p_c² = 0

A =
-45	26	29
24	-51	25
21	25	-54

What you should notice is that, for example, 29 + 21 = 50 (FW_ac + FW_ca = FO_ac), and that each column sums to 0. The positive numbers in a row are that player's wins and the negative number is that player's losses. You'll find s ≈ 0.2235 for the above example, with b_a = 2129/1671, b_b = 607/557 and b_c = 1 (my choice, in order to solve the system), so

p_a = sqrt(0.2235 * 2129/1671) = 0.5336 = 53.4%
p_b = sqrt(0.2235 * 607/557) = 0.4935 = 49.4%
p_c = sqrt(0.2235 * 1) = 0.4728 = 47.3%

These numbers don't vary significantly from the raw percentages (55%, 49% and 46%), but they're different, and on a bigger problem this can produce interesting results.
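One way to see what the ratings mean is to push them back through the Pythagorean formula and compare with the head-to-head counts from the example. The snippet below is only a sketch with hypothetical names; the ratings and win counts are copied from the three-player example above.

```python
# Ratings and head-to-head wins copied from the three-player example.
ratings = {"a": 0.5336, "b": 0.4935, "c": 0.4728}
actual_wins = {("a", "b"): 26, ("a", "c"): 29, ("b", "c"): 25}
FACE_OFFS_PER_PAIR = 50


def predicted_wins(first, second):
    """Wins the Pythagorean formula expects `first` to take off `second`."""
    p1 = ratings[first] ** 2
    p2 = ratings[second] ** 2
    return FACE_OFFS_PER_PAIR * p1 / (p1 + p2)


predictions = {pair: predicted_wins(*pair) for pair in actual_wins}
for pair, predicted in predictions.items():
    print(pair, "actual:", actual_wins[pair], "predicted:", round(predicted, 2))
```

The predictions do not reproduce the head-to-head counts exactly; in this example each lands within roughly one face-off of the actual number, which is the kind of error that is expected to average out.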

The Actual Results

Rank	Player (Ax=0 system)	Adjusted %	Player (FW/(FW+FL))	Raw %
1	Perreault, Y	63.75%	Perreault, Y	62.18%
2	Vermette, A	58.78%	Nieuwendyk, J	59.4%
3	Draper, K	57.98%	Brind'amour, R	59.06%
4	Nieuwendyk, J	57.45%	Vermette, A	57.9%
5	Brind'amour, R	57.04%	Draper, K	57.73%
6	Malhotra, M	56.9%	Stoll, J	56.75%
7	Johnson, R	56.41%	Sillinger, M	56.64%
8	Yelle, S	56.04%	Malhotra, M	56.35%
9	Mcdonald, A	55.38%	Wellwood, K	56.32%
10	Sillinger, M	55.03%	Mcdonald, A	56.23%
11	Johnson, G	54.62%	Johnson, R	55.89%
12	Drury, C	54.57%	Holik, B	55.67%
13	Halpern, J	54.44%	Drury, C	55.51%
14	Peca, M	53.82%	Yelle, S	55.41%
15	Holik, B	53.81%	Sillinger, M	55.33%
16	Wellwood, K	53.58%	Halpern, J	55.23%
17	Sundin, M	53.48%	Johnson, G	54.92%
18	Stoll, J	53.4%	Peca, M	54.87%
19	Green, T	53.38%	Bergeron, P	54.66%
20	Smithson, J	53.33%	Green, T	54.41%
21	Bergeron, P	52.94%	Smithson, J	54.32%
22	Horcoff, S	52.67%	Iginla, J	54.16%
23	Sillinger, M	52.67%	Sundin, M	54.01%
24	Fedorov, S	52.61%	Koivu, S	53.75%
25	Koivu, S	52.47%	Cammalleri, M	53.46%
26	Wilm, C	52.45%	Betts, B	53.33%
27	Marchant, T	52.44%	Scatchard, D	53.2%
28	Handzus, M	52.22%	Handzus, M	53.19%
29	Comrie, M	52.2%	Datsyuk, P	53.07%
30	Cammalleri, M	52.1%	Pahlsson, S	52.98%
31	Iginla, J	52.03%	Comrie, M	52.75%
32	Hrdina, J	52.02%	Horcoff, S	52.71%
33	Chouinard, M	51.96%	Chouinard, M	52.71%
34	Sakic, J	51.88%	Gomez, S	52.58%
35	Pahlsson, S	51.87%	Spezza, J	52.55%
36	Scatchard, D	51.79%	Sakic, J	52.49%
37	Betts, B	51.78%	Reasoner, M	52.48%
38	Gratton, C	51.42%	Wilm, C	52.31%
39	Taylor, T	51.28%	Cullen, M	52.26%
40	Gomez, S	51.2%	Thornton, J	52.25%
41	Spezza, J	51.03%	Gaustad, P	52.23%
42	Bates, S	50.98%	Taylor, T	52.12%
43	Armstrong, D	50.82%	Smith, M	51.95%
44	Datsyuk, P	50.63%	Hrdina, J	51.93%
45	Barnes, S	50.59%	Fedorov, S	51.84%
46	Weight, D	50.57%	Marchant, T	51.68%
47	Thornton, J	50.53%	Dowd, J	51.66%
48	Smith, M	50.53%	Savard, M	51.6%
49	Lang, R	50.39%	Madden, J	51.52%
50	Madden, J	50.36%	Bates, S	51.27%
51	Thornton, J	50.32%	Gratton, C	51.25%
52	Roenick, J	50.23%	Lecavalier, V	51.24%
53	Brown, C	50.22%	Conroy, C	51.22%
54	Sedin, H	50.08%	Arnott, J	51.15%
55	Primeau, W	50.03%	Modano, M	51.09%
56	Conroy, C	50.01%	Thornton, J	50.89%
57	Arnott, J	49.98%	Armstrong, D	50.73%
58	Savard, M	49.97%	Briere, D	50.68%
59	Lecavalier, V	49.91%	Mclean, B	50.65%
60	Gaustad, P	49.9%	Brown, C	50.62%
61	Modano, M	49.88%	Forsberg, P	50.58%
62	Morrison, B	49.69%	Sedin, H	50.52%
63	Briere, D	49.63%	Morrison, B	50.41%
64	Begin, S	49.58%	Lang, R	50.29%
65	Cullen, M	49.53%	Fisher, M	50.28%
66	Yashin, A	49.5%	Plekanec, T	50.28%
67	Zubrus, D	49.23%	Zubrus, D	50.27%
68	Belanger, E	48.97%	Zetterberg, H	50.26%
69	Cajanek, P	48.97%	Richards, B	50.23%
70	White, T	48.96%	Yashin, A	50.18%
71	Reasoner, M	48.93%	Allison, J	50.11%
72	Mccauley, A	48.81%	Begin, S	50.09%
73	Mclean, B	48.71%	Barnes, S	50.08%
74	Fisher, M	48.62%	Kapanen, N	49.87%
75	Zetterberg, H	48.51%	Weight, D	49.84%
76	Allison, J	48.49%	Primeau, W	49.8%
77	Linden, T	48.27%	Smolinski, B	49.76%
78	Dowd, J	48.2%	Laich, B	49.7%
79	Bell, M	48.17%	Cajanek, P	49.66%
80	Plekanec, T	48.15%	Linden, T	49.63%
81	Koivu, M	47.96%	Roenick, J	49.44%
82	Richards, B	47.87%	Ricci, M	49.21%
83	Ott, S	47.78%	Stumpel, J	49.17%
84	Forsberg, P	47.76%	Ott, S	49.16%
85	Reinprecht, S	47.74%	Mccauley, A	49.13%
86	Sharp, P	47.71%	White, T	49.1%
87	Smolinski, B	47.7%	Belanger, E	49.02%
88	Stumpel, J	47.63%	Adams, K	48.8%
89	Ricci, M	47.52%	Brylin, S	48.7%
90	Langkow, D	47.41%	Sutherby, B	48.67%
91	Mcclement, J	47.29%	Turgeon, P	48.62%
92	Sutherby, B	47.16%	Bell, M	48.45%
93	Brylin, S	46.88%	Carter, J	48.17%
94	Carter, J	46.76%	Sharp, P	48.04%
95	Goc, M	46.74%	Roy, D	47.96%
96	Kesler, R	46.69%	Goc, M	47.9%
97	Laich, B	46.63%	Payer, S	47.57%
98	Jokinen, O	46.53%	Koivu, M	47.38%
99	Adams, K	46.51%	Rucchin, S	47.35%
100	Turgeon, P	46.33%	Bonk, R	47.35%
101	Marleau, P	46.18%	Langkow, D	47.35%
102	Kapanen, N	46.06%	Reinprecht, S	47.34%
103	Payer, S	45.82%	Jokinen, O	47.21%
104	Nylander, M	45.64%	Walz, W	46.92%
105	Walz, W	45.33%	Mcclement, J	46.89%
106	Legwand, D	45.28%	Marleau, P	46.79%
107	Roy, D	45.21%	Nylander, M	46.63%
108	Laperriere, I	45.05%	Kesler, R	46.62%
109	York, M	44.64%	Moore, D	46.49%
110	Kelly, C	44.57%	Stefan, P	46.34%
111	Rucchin, S	44.53%	York, M	46.08%
112	Stefan, P	44.5%	Laperriere, I	45.9%
113	Moore, D	44.34%	Kelly, C	45.75%
114	Ribeiro, M	44.11%	Richards, M	45.73%
115	Bonk, R	43.9%	Crosby, S	45.49%
116	Crosby, S	43.82%	Ribeiro, M	44.72%
117	Connolly, T	43.81%	Legwand, D	44.66%
118	Richards, M	43.78%	Getzlaf, R	43.99%
119	Staal, E	42.21%	Staal, E	42.89%
120	Getzlaf, R	41.76%	Connolly, T	42.54%
121	Malone, R	39.22%	Malone, R	39.56%

The whole point of this was to build the skills I need to do real analysis on shots for and against, a.k.a. real ice-time analysis: looking at shot quality for and against for each player vs. every other player (up to 1200 comparisons). So that's next!

* I used Pythagorean win percentage because it matched the properties I needed. I don't think it's perfect, but linear win prediction doesn't work, so you need something different.

3 comments:

Dirk Hoag said...: Here's a stupid question - did you use Blogger to format those columns, or did you stich in the HTML yourself? I need to do the same thing over at my blog...; 12:53 PM
JavaGeek said...: I can't imagine a way to do those with Blogger. I've coded them myself for my webpage , just a nested table, you can easily view the HTML code.

You can copy them if you like.; 2:19 PM
Tangotiger said...: You may find this useful:
http://www.insidethebook.com/ee/index.php/site/article/the_odds_ratio_method; 9:09 AM

October 13, 2006
