Faceoffs occur about as frequently as shots in a game. That there are many faceoffs doesn’t mean they are extremely important; however, a faceoff is a simple win/loss contest, and as such allows a certain amount of simplified analysis. I’m sure there are many algorithms out there to do all this stuff for me, but I’m going to build this model from the ground up and keep it simple. Of course, a faceoff is a complicated dynamic between forwards, and at times even defensemen, to win the puck after the referee drops it, but for now we are only going to look at one forward: the center.
Joe Thornton won at 52.3% in
Theory
So how does one go about doing the calculations? Well, each team has around 4 regular faceoff men, so there are around 120 guys (actually I did 121) I need to compare simultaneously in order to know who the best is. Before we can compare players, we need an idea of who should win a faceoff, given some score for each player. What I really care about is each player’s “real” winning percentage, which should be approximately his win percentage against 50% opposition; no player can win more than 100% of the time or less than 0% of the time. I’ll hand-wave here and say that the best two-variable function to predict the odds of winning between two “players” is a Pythagorean prediction, often used with runs scored and allowed in baseball or goals for and against in hockey.
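p_{a} – probability of player a winning against an average opponent
p_{b} – probability of player b winning against an average opponent
p_{ab} = p_{a}^{2}/(p_{a}^{2}+p_{b}^{2}) – predicted probability of player a beating player b*
p_{ba} = p_{b}^{2}/(p_{b}^{2}+p_{a}^{2}) – predicted probability of player b beating player a*
FO_{ab} – faceoffs between player a and player b
FW_{ab} – faceoff wins by player a vs. player b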
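To make the prediction concrete, here is a minimal Python sketch of the formula above. The function name `pythagorean_win_prob` and the input checks are my own assumptions for illustration; only the formula itself comes from the definitions above.

```python
def pythagorean_win_prob(p_a: float, p_b: float) -> float:
    """Predicted probability that player a beats player b.

    p_a and p_b are each player's win rate against an average opponent.
    """
    if not (0.0 <= p_a <= 1.0 and 0.0 <= p_b <= 1.0):
        raise ValueError("ratings must lie in [0, 1]")
    if p_a == 0.0 and p_b == 0.0:
        raise ValueError("prediction is undefined when both ratings are zero")
    return p_a ** 2 / (p_a ** 2 + p_b ** 2)


if __name__ == "__main__":
    # Two equally rated players should split their faceoffs evenly.
    print(pythagorean_win_prob(0.55, 0.55))
    # The two head-to-head probabilities p_ab and p_ba sum to 1.
    print(pythagorean_win_prob(0.55, 0.46) + pythagorean_win_prob(0.46, 0.55))
```

Note that against a 50% opponent the function returns something close to, but not exactly, the player's own rating, which is why the "real" winning percentage is only approximately the win percentage vs. 50% opposition.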
There are a few important relationships here that I should mention. They should be obvious, but I want to make them very clear from the start:
FO_{ab} = FO_{ba}
FW_{ab}= FO_{ab} * p_{a}^{2}/(p_{a}^{2}+p_{b}^{2})
FW_{ba}= FO_{ab} * p_{b}^{2}/(p_{b}^{2}+p_{a}^{2})
The assumption above, that actual faceoff wins equal the predicted wins, is required for this model. It isn’t quite true, because the error associated with the cross faceoffs is extremely high for many pairs of players (only around 10 cross faceoffs), but in general these problems “should” average out. So what I’m trying to do is solve for all the p_{a} and p_{b} values (there are up to n p’s), and I know how well each player did against every other player (n^{2} – n sets of cross faceoffs; minus n because players can’t take faceoffs against themselves). Since we have FO_{ab} and FW_{ab}, I can easily calculate p_{ab}: FW_{ab}/FO_{ab} = p_{ab} = p_{a}^{2}/(p_{a}^{2}+p_{b}^{2}). This can be rewritten as FO_{ab}*p_{a}^{2} = FW_{ab}*(p_{a}^{2}+p_{b}^{2}), and there are n^{2} equations just like it (where FO_{aa} = 0). If you sum up n of them (fix a, change b), you get (where i goes from 1 to N):
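Σ FO_{ai}*p_{a}^{2} = Σ FW_{ai}*p_{a}^{2} + Σ FW_{ai}*p_{i}^{2}
Or
Σ FO_{ai}*p_{a}^{2} – Σ FW_{ai}*p_{a}^{2} – Σ FW_{ai}*p_{i}^{2} = 0
The nice thing is that FO_{ai} – FW_{ai} is the number of faceoffs player a lost to player i, so with FL_{a} = Σ (FO_{ai} – FW_{ai}) as player a’s total losses, this is equivalent to (after multiplying through by –1):
– FL_{a}*p_{a}^{2} + Σ FW_{ai}*p_{i}^{2} = 0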
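As a rough sketch of how these N equations turn into a matrix, the Python below builds the coefficient matrix from a table of head-to-head wins. The function name `build_system` and the nested-list layout `fw[a][i]` are assumptions made for this illustration, and the numbers are the three-player example used further down in this post.

```python
def build_system(fw):
    """Return the N x N matrix A of the homogeneous system A c = 0.

    fw[a][i] is the number of faceoffs player a won against player i
    (fw[a][a] must be 0). The unknowns are c_i = p_i^2.
    """
    n = len(fw)
    A = [[0] * n for _ in range(n)]
    for a in range(n):
        # Player a's losses are everyone else's wins against player a.
        losses = sum(fw[i][a] for i in range(n) if i != a)
        for i in range(n):
            # Diagonal: -FL_a. Off-diagonal: FW_ai.
            A[a][i] = -losses if i == a else fw[a][i]
    return A


if __name__ == "__main__":
    wins = [
        [0, 26, 29],   # player a's wins vs. a, b, c
        [24, 0, 25],   # player b's wins
        [21, 25, 0],   # player c's wins
    ]
    for row in build_system(wins):
        print(row)
```

Each column of the resulting matrix sums to zero, because a player's total losses equal the wins everyone else collected against him.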
There are N equations like this with N unknowns (the p_{i}^{2}’s). If you treat each p_{i}^{2} as a variable without the power (e.g. p_{i}^{2} = c_{i}), then this problem is a linear system of N equations and N unknowns. However, the solution is not unique, since the system is homogeneous (there are infinitely many solutions, one of which is the trivial one where all values are zero). How one solves this matrix is reasonably irrelevant; it’s reasonably large (121 x 121), but the solution will be the same no matter how you do it. I solved it by assuming a value for one of the unknowns, which gives a b vector and an equation in Ax = b format, which I then solved using the “trivial” LU factorization, LUx = b. But all you need to know is that this matrix is solvable.
Since the system is homogeneous and the full matrix is singular (its columns sum to zero), you actually get a set of solutions: a vector (call it b) multiplied by any constant (call this constant s). In order to fix a solution I need a constraint for this constant. So I use the fact that there were only so many faceoffs between these players, and their expected wins summed together must equal the total number of faceoffs. Or Σ FO_{i}*p_{i} = Total Faceoffs, where FO_{i} is the number of faceoffs player i took and we know p_{i}^{2} = c_{i} = b_{i}*s, so p_{i} = sqrt(b_{i}*s):
Σ FO_{i}*sqrt(s*b_{i}) = Total Faceoffs. (Know everything except s.)
(Total Faceoffs / Σ FO_{i}*sqrt(b_{i}))^{2} = s
Once I have s:
p_{i} = sqrt(b_{i}*s), and I’m done.
What I have just explained is how to simultaneously compare N players in the faceoff circle. It’s hard to really understand what’s going on if you don’t deal with this sort of math on a regular basis; it took me a while to even come up with how to do it. You can easily make up trivial examples. Here is one with three players, where each player takes 100 faceoffs, 50 against each of the other two (the p_{i}^{2} are the unknowns, and A is the matrix in the equation Ax = 0):
– FL_{a}*p_{a}^{2} + FW_{ab}*p_{b}^{2} + FW_{ac}*p_{c}^{2} = 0
FW_{ba}*p_{a}^{2} – FL_{b}*p_{b}^{2} + FW_{bc}*p_{c}^{2} = 0
FW_{ca}*p_{a}^{2} + FW_{cb}*p_{b}^{2} – FL_{c}*p_{c}^{2} = 0
A =
| –45   26   29 |
|  24  –51   25 |
|  21   25  –54 |
What you should notice is that, for example, 29 + 21 = 50 (the faceoffs between players a and c), and that the columns sum to 0. The positive numbers in each row are that player’s wins, and the negative number is that player’s losses. You’ll find s = 0.2235 for the above example, with b_{a} = 2129/1671, b_{b} = 607/557, and the last value b_{c} = 1 (by my choice, in order to solve the system), so
p_{a} = sqrt(0.2235 * 2129/1671) = 0.5336 = 53.4%
p_{b} = sqrt(0.2235 * 607/557) = 0.4935 = 49.4%
p_{c} = sqrt(0.2235 * 1) = 0.4728 = 47.3%
These numbers don’t vary significantly from their original numbers (55%, 49% and 46%), but they’re different and on a bigger problem this can produce interesting results.
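For anyone who wants to check the three-player example, here is a small pure-Python sketch of the whole procedure. It is only an illustration under my own assumptions: the names `solve_linear` and `fit_ratings` are hypothetical, the last player's b value is fixed at 1, and plain Gaussian elimination with partial pivoting stands in for the LU factorization mentioned above (the solution is the same either way).

```python
import math


def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; players may not be connected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ratings(fw):
    """Return (ratings p_i, scale s) from head-to-head wins fw[a][i]."""
    n = len(fw)
    # Row a: -(losses of a) on the diagonal, a's wins against i elsewhere.
    A = [[-sum(fw[j][a] for j in range(n)) if i == a else fw[a][i]
          for i in range(n)] for a in range(n)]
    # Fix the last unknown at 1 and drop the last (redundant) equation.
    M = [row[:n - 1] for row in A[:n - 1]]
    rhs = [-row[n - 1] for row in A[:n - 1]]
    b = solve_linear(M, rhs) + [1.0]
    # FO_i: faceoffs taken by player i; every faceoff involves two players.
    fo = [sum(fw[a][i] + fw[i][a] for i in range(n)) for a in range(n)]
    total = sum(fo) / 2
    # Solve sum(FO_i * sqrt(b_i * s)) = total for sqrt(s).
    root_s = total / sum(f * math.sqrt(bi) for f, bi in zip(fo, b))
    return [root_s * math.sqrt(bi) for bi in b], root_s ** 2


if __name__ == "__main__":
    wins = [[0, 26, 29], [24, 0, 25], [21, 25, 0]]
    ratings, s = fit_ratings(wins)
    print([round(p, 4) for p in ratings], round(s, 4))
```

Running it on the win table above should land on roughly the same three ratings as the hand calculation, within rounding.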
The Actual Results

The whole point of this was to build the skills I need to do real analysis on shots for and against, aka real ice-time analysis: looking at shot quality for and against for each player vs. every other player (up to 1200 comparisons). So that's next!
* I used Pythagorean win percentage because it matched the properties I needed. I don’t think it’s perfect, but linear win prediction doesn’t work, so you need something different.
3 comments:
Here's a stupid question: did you use Blogger to format those columns, or did you stitch in the HTML yourself? I need to do the same thing over at my blog...
I can't imagine a way to do those with Blogger. I coded them myself for my webpage; it's just a nested table, and you can easily view the HTML code.
You can copy them if you like.
You may find this useful:
http://www.insidethebook.com/ee/index.php/site/article/the_odds_ratio_method