October 13, 2006

Relative Face-off Scores.

Face-offs occur about as frequently as shots in a game. That doesn’t necessarily make them extremely important, but a face-off is a simple win/loss contest, and that allows a certain amount of simplified analysis. I’m sure there are many algorithms out there that could do all this for me, but I’m going to build this model from the ground up and keep it simple. Of course, a face-off is a complicated interaction between forwards, and at times even defensemen, fighting for a puck dropped somewhat randomly by the referee, but for now we are only going to look at one forward: the center.

Joe Thornton won 52.3% of his face-offs in Boston and 50.1% in San Jose. He’s the same player, so why should his percentage differ so much? There are many possible reasons, but the most basic explanation is that Joe Thornton faced easier face-off opponents in Boston than in San Jose. The idea is simple: a player who faces easier opposition should post a better face-off percentage than average, while against good opposition he will perform worse. In practice, for face-offs the opposition averages out quite nicely, so most players are within a percentage point of the actual average. It isn’t perfect, though: Sillinger performed at 52.6% in St. Louis, but at 55.0% in Nashville.

Theory

So how does one go about doing the calculations? Well, each team has around 4 regular face-off men, so there are around 120 players (actually I used 121) that I need to compare simultaneously in order to know who the best is. Before we can compare players, we need an idea of who should win given some rating for each player. What I really care about is each player’s “real” winning percentage, which should be approximately his win percentage against 50% opposition, and no player can win more than 100% or less than 0% of the time. I’ll hand-wave here and say that the best two-variable function for predicting the odds of winning between two “players” is the Pythagorean prediction often used with runs scored in baseball or goals for and against in hockey.

$p_a$: probability of player a winning against an average opponent
$p_b$: probability of player b winning against an average opponent
$p_{ab} = \frac{p_a^2}{p_a^2 + p_b^2}$ *
$p_{ba} = \frac{p_b^2}{p_b^2 + p_a^2}$ *
$FO_{ab}$: face-offs between player a and player b
$FW_{ab}$: face-off wins by player a against player b

There are a few important relationships here that I should mention. They should be obvious, but I want to make them very clear from the start:

$FO_{ab} = FO_{ba}$
$FW_{ab} = FO_{ab} \cdot \frac{p_a^2}{p_a^2 + p_b^2}$
$FW_{ba} = FO_{ab} \cdot \frac{p_b^2}{p_b^2 + p_a^2}$
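A minimal Python sketch of the Pythagorean prediction and the relationships above. The names `head_to_head` and `expected_wins` are hypothetical, chosen only for illustration; the point is that the two predictions for a pair always split the face-offs between them and stay within 0% to 100%.

```python
def head_to_head(pa: float, pb: float) -> float:
    """Pythagorean probability that player a beats player b.

    pa, pb: each player's win probability against an average (50%) opponent.
    """
    return pa**2 / (pa**2 + pb**2)


def expected_wins(fo_ab: int, pa: float, pb: float) -> tuple[float, float]:
    """Predicted (FW_ab, FW_ba) for fo_ab face-offs between players a and b."""
    fw_ab = fo_ab * head_to_head(pa, pb)
    fw_ba = fo_ab * head_to_head(pb, pa)
    return fw_ab, fw_ba


# An average player against another average player is a coin flip.
print(head_to_head(0.5, 0.5))  # 0.5

# The two predictions always add up to the face-offs taken: FW_ab + FW_ba = FO_ab.
fw_ab, fw_ba = expected_wins(50, 0.55, 0.45)
print(round(fw_ab, 2), round(fw_ba, 2), round(fw_ab + fw_ba, 2))  # 29.95 20.05 50.0
```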

The assumption above, that actual face-off wins equal the predicted wins, is required for this model. It isn’t quite true, because the error on the head-to-head face-offs is extremely high for many pairs of players (often only around 10 face-offs between them), but in general these errors “should” average out. So what I’m trying to do is solve for all the p values (there are N of them, one per player), and I know how well each player did against every other player (N² − N head-to-head records; the −N is because players can’t take face-offs against themselves). Since we have $FO_{ab}$ and $FW_{ab}$, I can easily calculate $p_{ab}$: $FW_{ab}/FO_{ab} = p_{ab} = \frac{p_a^2}{p_a^2 + p_b^2}$. This can be rewritten as $FO_{ab}\,p_a^2 = FW_{ab}\,(p_a^2 + p_b^2)$, and there are N² equations just like it (where $FO_{aa} = 0$). If you sum N of them (fix a, vary b) you get (where i goes from 1 to N):

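$$\sum_i FO_{ai}\,p_a^2 = \sum_i FW_{ai}\,p_a^2 + \sum_i FW_{ai}\,p_i^2$$
or
$$\sum_i FO_{ai}\,p_a^2 - \sum_i FW_{ai}\,p_a^2 - \sum_i FW_{ai}\,p_i^2 = 0$$
The nice thing is that the first two sums combine into player a’s total losses, $FL_a = \sum_i (FO_{ai} - FW_{ai})$, so this is equivalent to:
$$-FL_a\,p_a^2 + \sum_i FW_{ai}\,p_i^2 = 0$$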
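The step to the row equation is easiest to see for a single pair before summing. Writing $FL_{ab} = FO_{ab} - FW_{ab}$ (which equals $FW_{ba}$) for the face-offs player a lost to player b:

$$FO_{ab}\,p_a^2 = FW_{ab}\,p_a^2 + FW_{ab}\,p_b^2 \;\Longrightarrow\; FL_{ab}\,p_a^2 = FW_{ab}\,p_b^2 \;\Longleftrightarrow\; \frac{p_a^2}{p_b^2} = \frac{FW_{ab}}{FL_{ab}} \quad (FL_{ab} > 0)$$

Summing the middle form over b, with $FL_a = \sum_b FL_{ab}$, gives $-FL_a\,p_a^2 + \sum_i FW_{ai}\,p_i^2 = 0$, one row of the system. The last form shows what each pair contributes: the ratio of the two players’ squared ratings matches their head-to-head win/loss ratio.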

There are N equations like this with N unknowns (the $p_i^2$). If you treat each $p_i^2$ as a variable without the power (e.g. $p_i^2 = c_i$), since both are technically just constants, then this problem is a linear system of N equations in N unknowns. However, the solution is not unique: the system is homogeneous, and because every column of the matrix sums to zero the matrix is singular, so besides the trivial solution (all values zero) there are infinitely many others. How one solves this matrix is fairly irrelevant; it’s reasonably large (121 × 121), but the solution will be the same no matter how you do it. I solved it by fixing the scale (setting one of the unknowns to a chosen value, which amounts to assuming a value for s) to get a right-hand-side vector, so that I have an equation of the form Ax = b, which I then solved using the standard LU factorization, LUx = b. But all you really need to know is that this matrix is solvable: since it is singular with only one redundant equation, the solutions form a single vector multiplied by any constant (call this constant s). To pin down one solution I need a constraint for this constant. So I use the fact that there were only so many face-offs between these players, and their wins summed together must equal the total number of face-offs: $\sum_i FO_i\,p_i = \text{Total Face-offs}$, where $FO_i$ is player i’s total face-offs and $p_i^2 = c_i = b_i\,s$.

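Substituting $p_i = \sqrt{s\,b_i}$:
$$\sum_i FO_i\,\sqrt{s\,b_i} = \text{Total Face-offs}$$ (we know everything except s)
$$s = \left(\frac{\text{Total Face-offs}}{\sum_i FO_i\,\sqrt{b_i}}\right)^2$$
Once I have s:
$p_i = \sqrt{b_i\,s}$, and I’m done.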
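The two-step recipe above (solve the homogeneous system up to a constant, then fix the constant with the total-face-offs constraint) can be sketched in Python as follows. This is my own illustration, not the code behind the results below: `solve_scaled` and `normalize` are hypothetical names, I use plain Gaussian elimination on exact fractions (the same elimination an LU factorization performs) instead of a library LU routine, and the input is the three-player example worked through below.

```python
import math
from fractions import Fraction


def solve_scaled(A):
    """Return a nonzero solution b of A b = 0, scaled so that b[-1] == 1.

    Fixes the last unknown to 1, moves that column to the right-hand side,
    drops the last equation (redundant, because the columns of A sum to zero),
    and solves the remaining (N-1) x (N-1) system by Gaussian elimination.
    """
    m = len(A) - 1
    # Augmented rows [A[r][0..m-1] | -A[r][m]] for the first m equations.
    M = [[Fraction(A[r][c]) for c in range(m)] + [Fraction(-A[r][m])]
         for r in range(m)]
    for col in range(m):
        # Partial pivoting: use the row with the largest entry in this column.
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0:
            raise ValueError("reduced system is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [Fraction(0)] * m
    for r in range(m - 1, -1, -1):
        rhs = M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))
        x[r] = rhs / M[r][r]
    return x + [Fraction(1)]


def normalize(b, fo, total_faceoffs):
    """Pick s so that sum_i FO_i * p_i == total_faceoffs, with p_i = sqrt(s * b_i)."""
    denom = sum(f * math.sqrt(bi) for f, bi in zip(fo, b))
    s = (total_faceoffs / denom) ** 2
    return s, [math.sqrt(s * bi) for bi in b]


# Three-player example worked through below: 100 face-offs each, 150 in total.
A = [[-45, 26, 29],
     [24, -51, 25],
     [21, 25, -54]]
b = solve_scaled(A)
print(b)  # [Fraction(2129, 1671), Fraction(607, 557), Fraction(1, 1)]
s, p = normalize([float(bi) for bi in b], fo=[100, 100, 100], total_faceoffs=150)
print(round(s, 4), [round(pi, 3) for pi in p])  # 0.2235 [0.534, 0.494, 0.473]
```

Exact fractions are only practical for small examples like this one; for the 121 × 121 case floating point with a library solver is the more sensible choice.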

What I have just explained is how to simultaneously compare N players in the face-off circle. It’s hard to really understand what’s going on if you don’t deal with this sort of math on a regular basis; it took me a while to even come up with how to do it. You can easily make up trivial examples. In the one below, each of three players takes 100 face-offs, 50 against each of the other two; the $p_i^2$ are the unknowns, and A is the matrix in the equation Ax = 0:

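$$\begin{pmatrix} -FL_a & FW_{ab} & FW_{ac} \\ FW_{ba} & -FL_b & FW_{bc} \\ FW_{ca} & FW_{cb} & -FL_c \end{pmatrix} \begin{pmatrix} p_a^2 \\ p_b^2 \\ p_c^2 \end{pmatrix} = 0$$

$$A = \begin{pmatrix} -45 & 26 & 29 \\ 24 & -51 & 25 \\ 21 & 25 & -54 \end{pmatrix}$$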

Notice that the columns sum to 0 (column a: −45 + 24 + 21 = 0) and that head-to-head wins add up to the face-offs between the two players (for a vs. c, 29 + 21 = 50). The positive numbers in each row are that player’s wins, and the negative number is that player’s total losses. For the above example you’ll find $b_a = 2129/1671$, $b_b = 607/557$ and $b_c = 1$ (my choice, in order to solve the system), and with 150 face-offs in total, $s \approx 0.2235$, so

$p_a = \sqrt{0.2235 \cdot 2129/1671} \approx 0.5336$, or 53.4%
$p_b = \sqrt{0.2235 \cdot 607/557} \approx 0.4935$, or 49.4%
$p_c = \sqrt{0.2235 \cdot 1} \approx 0.4728$, or 47.3%

These numbers don’t differ much from the raw win percentages (55%, 49% and 46%), but they are different, and on a bigger problem this can produce interesting results.
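The example matrix can be built mechanically from a table of head-to-head wins. In this sketch, `build_matrix` and `raw_percentages` are hypothetical helper names; `wins[a][b]` holds $FW_{ab}$, and each diagonal entry of A is minus that player’s total losses. It also recomputes the raw FW/(FW+FL) percentages quoted above and checks that every column sums to zero.

```python
def build_matrix(wins):
    """Build the N x N system matrix A from a head-to-head wins table.

    wins[a][b] is FW_ab, the face-offs player a won against player b
    (diagonal entries are 0). Off-diagonal A[a][b] = FW_ab; the diagonal
    A[a][a] = -FL_a, player a's total losses, i.e. minus column a's sum.
    """
    n = len(wins)
    A = [row[:] for row in wins]
    for a in range(n):
        A[a][a] = -sum(wins[i][a] for i in range(n))
    return A


def raw_percentages(wins):
    """Plain FW / (FW + FL) for each player."""
    n = len(wins)
    out = []
    for a in range(n):
        fw = sum(wins[a])                       # wins by player a
        fl = sum(wins[i][a] for i in range(n))  # wins against player a
        out.append(fw / (fw + fl))
    return out


# Three-player example: 100 face-offs each, 50 against each opponent.
wins = [[0, 26, 29],
        [24, 0, 25],
        [21, 25, 0]]
A = build_matrix(wins)
print(A)                      # [[-45, 26, 29], [24, -51, 25], [21, 25, -54]]
print(raw_percentages(wins))  # [0.55, 0.49, 0.46]
# Every column sums to zero, which is why A is singular.
print([sum(A[r][c] for r in range(3)) for c in range(3)])  # [0, 0, 0]
```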

The Actual Results


Rank | Player | Adjusted (Ax=0 system) | Player | Raw FW/(FW+FL)
1 | Perreault, Y | 63.75% | Perreault, Y | 62.18%
2 | Vermette, A | 58.78% | Nieuwendyk, J | 59.4%
3 | Draper, K | 57.98% | Brind'amour, R | 59.06%
4 | Nieuwendyk, J | 57.45% | Vermette, A | 57.9%
5 | Brind'amour, R | 57.04% | Draper, K | 57.73%
6 | Malhotra, M | 56.9% | Stoll, J | 56.75%
7 | Johnson, R | 56.41% | Sillinger, M | 56.64%
8 | Yelle, S | 56.04% | Malhotra, M | 56.35%
9 | Mcdonald, A | 55.38% | Wellwood, K | 56.32%
10 | Sillinger, M | 55.03% | Mcdonald, A | 56.23%
11 | Johnson, G | 54.62% | Johnson, R | 55.89%
12 | Drury, C | 54.57% | Holik, B | 55.67%
13 | Halpern, J | 54.44% | Drury, C | 55.51%
14 | Peca, M | 53.82% | Yelle, S | 55.41%
15 | Holik, B | 53.81% | Sillinger, M | 55.33%
16 | Wellwood, K | 53.58% | Halpern, J | 55.23%
17 | Sundin, M | 53.48% | Johnson, G | 54.92%
18 | Stoll, J | 53.4% | Peca, M | 54.87%
19 | Green, T | 53.38% | Bergeron, P | 54.66%
20 | Smithson, J | 53.33% | Green, T | 54.41%
21 | Bergeron, P | 52.94% | Smithson, J | 54.32%
22 | Horcoff, S | 52.67% | Iginla, J | 54.16%
23 | Sillinger, M | 52.67% | Sundin, M | 54.01%
24 | Fedorov, S | 52.61% | Koivu, S | 53.75%
25 | Koivu, S | 52.47% | Cammalleri, M | 53.46%
26 | Wilm, C | 52.45% | Betts, B | 53.33%
27 | Marchant, T | 52.44% | Scatchard, D | 53.2%
28 | Handzus, M | 52.22% | Handzus, M | 53.19%
29 | Comrie, M | 52.2% | Datsyuk, P | 53.07%
30 | Cammalleri, M | 52.1% | Pahlsson, S | 52.98%
31 | Iginla, J | 52.03% | Comrie, M | 52.75%
32 | Hrdina, J | 52.02% | Horcoff, S | 52.71%
33 | Chouinard, M | 51.96% | Chouinard, M | 52.71%
34 | Sakic, J | 51.88% | Gomez, S | 52.58%
35 | Pahlsson, S | 51.87% | Spezza, J | 52.55%
36 | Scatchard, D | 51.79% | Sakic, J | 52.49%
37 | Betts, B | 51.78% | Reasoner, M | 52.48%
38 | Gratton, C | 51.42% | Wilm, C | 52.31%
39 | Taylor, T | 51.28% | Cullen, M | 52.26%
40 | Gomez, S | 51.2% | Thornton, J | 52.25%
41 | Spezza, J | 51.03% | Gaustad, P | 52.23%
42 | Bates, S | 50.98% | Taylor, T | 52.12%
43 | Armstrong, D | 50.82% | Smith, M | 51.95%
44 | Datsyuk, P | 50.63% | Hrdina, J | 51.93%
45 | Barnes, S | 50.59% | Fedorov, S | 51.84%
46 | Weight, D | 50.57% | Marchant, T | 51.68%
47 | Thornton, J | 50.53% | Dowd, J | 51.66%
48 | Smith, M | 50.53% | Savard, M | 51.6%
49 | Lang, R | 50.39% | Madden, J | 51.52%
50 | Madden, J | 50.36% | Bates, S | 51.27%
51 | Thornton, J | 50.32% | Gratton, C | 51.25%
52 | Roenick, J | 50.23% | Lecavalier, V | 51.24%
53 | Brown, C | 50.22% | Conroy, C | 51.22%
54 | Sedin, H | 50.08% | Arnott, J | 51.15%
55 | Primeau, W | 50.03% | Modano, M | 51.09%
56 | Conroy, C | 50.01% | Thornton, J | 50.89%
57 | Arnott, J | 49.98% | Armstrong, D | 50.73%
58 | Savard, M | 49.97% | Briere, D | 50.68%
59 | Lecavalier, V | 49.91% | Mclean, B | 50.65%
60 | Gaustad, P | 49.9% | Brown, C | 50.62%
61 | Modano, M | 49.88% | Forsberg, P | 50.58%
62 | Morrison, B | 49.69% | Sedin, H | 50.52%
63 | Briere, D | 49.63% | Morrison, B | 50.41%
64 | Begin, S | 49.58% | Lang, R | 50.29%
65 | Cullen, M | 49.53% | Fisher, M | 50.28%
66 | Yashin, A | 49.5% | Plekanec, T | 50.28%
67 | Zubrus, D | 49.23% | Zubrus, D | 50.27%
68 | Belanger, E | 48.97% | Zetterberg, H | 50.26%
69 | Cajanek, P | 48.97% | Richards, B | 50.23%
70 | White, T | 48.96% | Yashin, A | 50.18%
71 | Reasoner, M | 48.93% | Allison, J | 50.11%
72 | Mccauley, A | 48.81% | Begin, S | 50.09%
73 | Mclean, B | 48.71% | Barnes, S | 50.08%
74 | Fisher, M | 48.62% | Kapanen, N | 49.87%
75 | Zetterberg, H | 48.51% | Weight, D | 49.84%
76 | Allison, J | 48.49% | Primeau, W | 49.8%
77 | Linden, T | 48.27% | Smolinski, B | 49.76%
78 | Dowd, J | 48.2% | Laich, B | 49.7%
79 | Bell, M | 48.17% | Cajanek, P | 49.66%
80 | Plekanec, T | 48.15% | Linden, T | 49.63%
81 | Koivu, M | 47.96% | Roenick, J | 49.44%
82 | Richards, B | 47.87% | Ricci, M | 49.21%
83 | Ott, S | 47.78% | Stumpel, J | 49.17%
84 | Forsberg, P | 47.76% | Ott, S | 49.16%
85 | Reinprecht, S | 47.74% | Mccauley, A | 49.13%
86 | Sharp, P | 47.71% | White, T | 49.1%
87 | Smolinski, B | 47.7% | Belanger, E | 49.02%
88 | Stumpel, J | 47.63% | Adams, K | 48.8%
89 | Ricci, M | 47.52% | Brylin, S | 48.7%
90 | Langkow, D | 47.41% | Sutherby, B | 48.67%
91 | Mcclement, J | 47.29% | Turgeon, P | 48.62%
92 | Sutherby, B | 47.16% | Bell, M | 48.45%
93 | Brylin, S | 46.88% | Carter, J | 48.17%
94 | Carter, J | 46.76% | Sharp, P | 48.04%
95 | Goc, M | 46.74% | Roy, D | 47.96%
96 | Kesler, R | 46.69% | Goc, M | 47.9%
97 | Laich, B | 46.63% | Payer, S | 47.57%
98 | Jokinen, O | 46.53% | Koivu, M | 47.38%
99 | Adams, K | 46.51% | Rucchin, S | 47.35%
100 | Turgeon, P | 46.33% | Bonk, R | 47.35%
101 | Marleau, P | 46.18% | Langkow, D | 47.35%
102 | Kapanen, N | 46.06% | Reinprecht, S | 47.34%
103 | Payer, S | 45.82% | Jokinen, O | 47.21%
104 | Nylander, M | 45.64% | Walz, W | 46.92%
105 | Walz, W | 45.33% | Mcclement, J | 46.89%
106 | Legwand, D | 45.28% | Marleau, P | 46.79%
107 | Roy, D | 45.21% | Nylander, M | 46.63%
108 | Laperriere, I | 45.05% | Kesler, R | 46.62%
109 | York, M | 44.64% | Moore, D | 46.49%
110 | Kelly, C | 44.57% | Stefan, P | 46.34%
111 | Rucchin, S | 44.53% | York, M | 46.08%
112 | Stefan, P | 44.5% | Laperriere, I | 45.9%
113 | Moore, D | 44.34% | Kelly, C | 45.75%
114 | Ribeiro, M | 44.11% | Richards, M | 45.73%
115 | Bonk, R | 43.9% | Crosby, S | 45.49%
116 | Crosby, S | 43.82% | Ribeiro, M | 44.72%
117 | Connolly, T | 43.81% | Legwand, D | 44.66%
118 | Richards, M | 43.78% | Getzlaf, R | 43.99%
119 | Staal, E | 42.21% | Staal, E | 42.89%
120 | Getzlaf, R | 41.76% | Connolly, T | 42.54%
121 | Malone, R | 39.22% | Malone, R | 39.56%


The whole point of this was to build the skills I needed to do real analysis of shots for and against, i.e. real ice-time analysis: looking at shot quality for and against for each player vs. every other player (up to 1200 comparisons). So that’s next!


* I used the Pythagorean win percentage because it has the properties I needed. I don’t think it’s perfect, but linear win prediction doesn’t work (it can predict win percentages above 100% or below 0%), so you need something different.
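To illustrate the footnote, here is a comparison against one hypothetical linear rule, $p_{ab} = 0.5 + (p_a - p_b)$. The post doesn’t say which linear model was tried, so this rule (and the function names) are assumptions for illustration only.

```python
def linear_prediction(pa, pb):
    """Hypothetical linear rule, for illustration only: 0.5 + (pa - pb)."""
    return 0.5 + (pa - pb)


def pythagorean_prediction(pa, pb):
    """Pythagorean rule used in this post: pa^2 / (pa^2 + pb^2)."""
    return pa**2 / (pa**2 + pb**2)


for pa, pb in [(0.55, 0.45), (0.9, 0.2), (0.1, 0.8)]:
    lin = linear_prediction(pa, pb)
    pyth = pythagorean_prediction(pa, pb)
    print(f"{pa:.2f} vs {pb:.2f}: linear {lin:.3f}, Pythagorean {pyth:.3f}")
```

Near 50% the two rules give almost the same answer (0.600 vs. 0.599 for 55% vs. 45%), but for lopsided matchups the linear rule leaves the 0% to 100% range (1.200 and −0.200 here), while the Pythagorean rule cannot.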

3 comments:

Dirk Hoag said...

Here's a stupid question - did you use Blogger to format those columns, or did you stitch in the HTML yourself? I need to do the same thing over at my blog...

JavaGeek said...

I can't imagine a way to do those with Blogger. I coded them myself for my webpage; it's just a nested table, and you can easily view the HTML code.

You can copy them if you like.

Tangotiger said...

You may find this useful:
http://www.insidethebook.com/ee/index.php/site/article/the_odds_ratio_method