January 19, 2007

Attendance Demand.

Attendance is about as complicated as it gets. I’m not sure I can tackle it with the limited data available and only partial information about the market forces. Attendance primarily depends on the demand for “going to hockey games”, the price of tickets, and population. I should quickly mention that there was a change in how attendance was reported in 2000-2001; as a result I had to add $10 to the stated value and 200 to the stated attendance. [Many of the factors involved are complicated and I lack the information or time to look at them all in detail.] Normalizing ticket prices is trivial, as there are nice, simple CPI calculators (~2.5%/year) that do a great job, and it appears that before the lockout the optimal average was about $55. To look at demand I also had to normalize attendance. There is no CPI for attendance, but by looking at the years where prices were stable I estimated attendance growth at approximately 2.5% per year. The American population grows at around 1%, so the natural rate is somewhere in that range. I found a 2% growth rate for New York, which probably reflects most cities, so I used a 2% natural growth rate for attendance; changing this number even slightly has large effects on the results. I’m not sure how important ticket sales are to the hockey business, but 16000*$55*41*30 = $1.1B, or half of league revenue. If you use all the above data you get a graph that looks like:

Now the regression on price for the pre-lockout years looks like:

\[ \text{Attendance} = 22574 - 142 \cdot \text{Price} \]

In other words, each dollar increase in price costs the organization about 142 fans per game (and each dollar decrease gains about 142). If you maximize Attendance*Price you get a price of about $80; that certainly wouldn’t be popular and would only increase ticket revenue by about $100M (10%). Put another way, a price increase of roughly 50% would increase revenue by only 10%, and of course less revenue would be received from consumables and merchandise as well. So I can’t really say whether it would be beneficial without understanding all aspects of their revenue. That being said, raising ticket prices by a dollar would increase ticket revenue by about $7,000 per game * 1230 games = $9M.
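The revenue-maximizing price follows directly from the linear demand curve: revenue is Price × (22574 − 142 × Price), which peaks at 22574 / (2 × 142). Here is a minimal Python sketch of that arithmetic, assuming the pre-lockout regression above and 1230 games a season; the function and constant names are my own, not from any library.

```python
# Linear demand from the pre-lockout regression: Attendance = 22574 - 142 * Price.
INTERCEPT = 22574
SLOPE = 142
GAMES_PER_SEASON = 1230  # 30 teams x 41 home games


def attendance(price: float) -> float:
    """Predicted attendance per game at a given (real) ticket price."""
    return INTERCEPT - SLOPE * price


def revenue_per_game(price: float) -> float:
    """Ticket revenue per game: price times predicted attendance."""
    return price * attendance(price)


def revenue_maximizing_price() -> float:
    # d/dP [P * (a - bP)] = a - 2bP = 0  =>  P* = a / (2b)
    return INTERCEPT / (2 * SLOPE)


if __name__ == "__main__":
    current = 55.0
    best = revenue_maximizing_price()
    gain = (revenue_per_game(best) - revenue_per_game(current)) * GAMES_PER_SEASON
    plus_one = (revenue_per_game(current + 1) - revenue_per_game(current)) * GAMES_PER_SEASON
    print(f"revenue-maximizing price: ${best:.2f}")
    print(f"season gain at that price: ${gain / 1e6:.0f}M")
    print(f"season gain from +$1: ${plus_one / 1e6:.1f}M")
```

The exact +$1 step comes to about $6,800 per game ($8.4M a season), a little under the rounded $7,000 used above because it includes the curvature of the revenue curve.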

Some may note that ticket prices appear to increase every year; this is to be expected as population increases:

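\[ \text{Price}^{*} = \frac{22574\,(1+g)^{\text{year}}}{284} \]

where g is the population growth rate. The 284 is 2 × 142: revenue Price × (22574 − 142 × Price) peaks at 22574 / (2 × 142), and the demand intercept grows with population.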

In other words prices should grow as fast as population. If not, there’s something else going on.
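The normalization step from the start of this post (deflating prices by CPI and attendance by an assumed natural growth rate) can be sketched in Python as below. It assumes constant rates of 2.5% and 2% per year, as described in the text; the function names are hypothetical and the 2000-2001 reporting adjustment is not modelled.

```python
# Assumed constant rates (from the post): ~2.5%/year CPI, 2%/year natural attendance growth.
CPI_RATE = 0.025
ATTENDANCE_GROWTH = 0.02


def real_price(nominal_price: float, years_after_base: float) -> float:
    """Deflate a nominal ticket price into base-year dollars."""
    return nominal_price / (1 + CPI_RATE) ** years_after_base


def normalized_attendance(attendance: float, years_after_base: float) -> float:
    """Strip out the assumed natural growth so seasons are comparable."""
    return attendance / (1 + ATTENDANCE_GROWTH) ** years_after_base


def league_ticket_revenue(avg_attendance: float, avg_price: float,
                          home_games: int = 41, teams: int = 30) -> float:
    """Rough league-wide ticket revenue for one regular season."""
    return avg_attendance * avg_price * home_games * teams


if __name__ == "__main__":
    # The back-of-the-envelope figure from the post: about $1.1B.
    print(f"${league_ticket_revenue(16000, 55) / 1e9:.2f}B")
```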


However, the real question that everyone probably cares about is: what did the lockout cost in terms of attendance? So I did a quick regression on price and on whether or not the data came from after the lockout (Lockout = 1 for post-lockout seasons, 0 otherwise):

\[ \text{Attendance} = 22622 - 143 \cdot \text{Price} - 1486 \cdot \text{Lockout} \]

Basically, the lockout cost the NHL about 1500 fans per game; of course it doesn’t look that way, as average attendance has increased. The attendance increases of the past two years are the result of lower ticket prices and population growth. At about $55 a ticket over 1230 games, this costs the NHL about $100M per year until it regains those 1500 fans (if it never does, you can discount $100M a year forever at 2% for a net cost of $5B). The GMs probably believe that the benefits of the new CBA exceed the costs, such as the immediate saving of $400M in salaries (a guess) and possible future savings from the cap.
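The $100M-a-year and $5B figures are a simple perpetuity calculation. A short Python sketch, assuming the 1486-fan lockout coefficient, a $55 average ticket and a 2% discount rate as above; the optional `years` parameter is my own addition to show what a recovery after a fixed number of seasons would look like.

```python
LOST_FANS_PER_GAME = 1486   # Lockout coefficient from the regression above
PRICE = 55                  # Assumed average real ticket price
GAMES_PER_SEASON = 1230
DISCOUNT_RATE = 0.02


def annual_cost(fans=LOST_FANS_PER_GAME, price=PRICE, games=GAMES_PER_SEASON):
    """Lost ticket revenue per season."""
    return fans * price * games


def present_value(annual, rate=DISCOUNT_RATE, years=None):
    """Present value of a yearly loss, starting one year from now.

    years=None means the fans never come back (a perpetuity, annual / rate);
    otherwise the loss stops after `years` seasons (an ordinary annuity).
    """
    if years is None:
        return annual / rate
    return annual * (1 - (1 + rate) ** -years) / rate


if __name__ == "__main__":
    cost = annual_cost()
    print(f"${cost / 1e6:.1f}M per year, ${present_value(cost) / 1e9:.2f}B if never recovered")
```

Passing `years=5`, for instance, prices in a recovery after five seasons instead of never.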

In conclusion, I did all this to show that while the NHL is very proud of its attendance figures, they’re actually nothing to be proud of. The league has lost a significant number of fans, and this can be seen in the lower ticket prices. I’m not sure whether the NHL will recover these lost fans (attendance would have to grow faster than population until it reaches its old trajectory), but there aren’t really any good signs of that happening at this point. Maybe the 2006-2007 figures will improve as we get closer to the playoffs. But don’t panic; Bettman isn’t, even as Colorado sees 12 of 23 home games fail to sell out.

January 17, 2007

Diving Update.

I’ve said a bit about diving in the past, and some readers said they wanted to see how it played out through the season. In the first 500 games or so this season the NHL wanted to make a statement that it was cracking down on diving. Interestingly, those statements have faded since November. The media appears to have gone silent, and there no longer appears to be any upper-management pressure on referees to call these penalties. Avery got his third diving call (the next requires a suspension), costing him 0.2% of his current income (I guess it’s the cost of being in his business); however, there appears to be no news coverage of the fine he should’ve received. I’m curious whether the NHL will give Avery the fourth call to make a point or whether it’s just bluffing about the suspension.

The point of this post however is just to show the interesting trend in diving calls over the season:

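| Games | 2005-2006 diving calls | 2006-2007 diving calls |
|---|---|---|
| 0-99 | 11 | 14 |
| 100-199 | 10 | 14 |
| 200-299 | 5 | 12 |
| 300-399 | 4 | 10 |
| 400-499 | 9 | 13 |
| 500-599 | 8 | 5 |
| 600-699 | 8 | 6 |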

It’s interesting that 2005-2006 had the same sort of diving-call numbers in games 200-399 as 2006-2007 is having in games 500-699. My database is missing 6 games from the 600-699 set; however, the 4 of those played today had no diving calls, and it’s unlikely that the two remaining games, played tomorrow, will have one.
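To put the table on a per-game footing, here is a small Python sketch. The counts are copied from the table; the assumption that every block holds exactly 100 games, except the 2006-2007 600-699 block with 94, is mine, based on the 6 missing games mentioned above.

```python
# Diving calls per 100-game block, copied from the table.
BLOCKS = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599", "600-699"]
CALLS_0506 = [11, 10, 5, 4, 9, 8, 8]
CALLS_0607 = [14, 14, 12, 10, 13, 5, 6]
# Assumption: 100 games per block, except 2006-2007's last block (6 games missing).
GAMES_0506 = [100] * 7
GAMES_0607 = [100] * 6 + [94]


def rates(calls, games):
    """Diving calls per game for each block."""
    return [c / g for c, g in zip(calls, games)]


def overall_rate(calls, games):
    """Diving calls per game across all blocks."""
    return sum(calls) / sum(games)


if __name__ == "__main__":
    for block, r1, r2 in zip(BLOCKS, rates(CALLS_0506, GAMES_0506), rates(CALLS_0607, GAMES_0607)):
        print(f"{block:>8}: {r1:.2f} vs {r2:.2f} calls per game")
    print(f"2006-2007 overall: {overall_rate(CALLS_0607, GAMES_0607):.3f} calls per game")
```

Under those assumptions 2006-2007 has 74 calls in 694 games, about one call every 9.4 games.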

In my opinion there is probably one significant dive per game and probably twice that number of additional minor dives; the NHL, however, seems to believe there is around one dive every ten games, and even fewer when it’s not marketing the diving rules.

January 15, 2007

Plus Minus Graphs II

David Johnson provided a neat suggestion: that the plus minus graphs should include the team's data along with the player's. One problem with doing this is that the X axis is time played and the Y axis is plus minus. Any given team has more even strength time than any player, so graphing the numbers together would make the individual curve a small part at the beginning next to a very long team curve. So instead I scaled the team data down, game by game, by the same percentage that the player plays. Because time and plus minus are scaled by the same factor, the slopes are unchanged, the two curves have equal length, and each segment represents one game played.

Instead of graphing the team data as is, I removed the given player's ice time and plus minus from it, so that the two curves are relatively independent of each other. This closely mimics Behind the Net's On-Ice/Off-Ice statistics, except that it only looks at the games the player was involved in (whereas Behind the Net uses the entire season average).
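Here is a rough Python sketch of the on-ice/off-ice comparison. Assumptions: the per-game fields in `Game` are hypothetical names for even-strength data; each game's off-ice segment is scaled so its length matches the player's ice time that game; and slopes are least-squares fits (with a free intercept) to the cumulative curves, as in the earlier plus minus post. It is an illustration, not the exact procedure behind the graphs.

```python
from dataclasses import dataclass


@dataclass
class Game:
    player_toi: float  # player's even-strength ice time (hours), hypothetical field
    player_pm: float   # player's even-strength plus-minus
    team_toi: float    # team's total even-strength time (hours)
    team_pm: float     # team's even-strength goal differential


def cumulative(xs, ys):
    """Running totals, i.e. the points of a cumulative plus-minus curve."""
    cx, cy, tx, ty = [], [], 0.0, 0.0
    for x, y in zip(xs, ys):
        tx += x
        ty += y
        cx.append(tx)
        cy.append(ty)
    return cx, cy


def ls_slope(xs, ys):
    """Least-squares slope with a free intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def on_off_slopes(games):
    """Return (individual slope, rest-of-team slope, difference) in +/- per hour."""
    off_t, off_pm = [], []
    for g in games:
        rest_t = g.team_toi - g.player_toi   # team time with the player removed
        rest_pm = g.team_pm - g.player_pm    # team +/- with the player removed
        scale = g.player_toi / rest_t        # shrink the segment to the player's length
        off_t.append(rest_t * scale)
        off_pm.append(rest_pm * scale)       # same factor, so the slope is unchanged
    individual = ls_slope(*cumulative([g.player_toi for g in games],
                                      [g.player_pm for g in games]))
    team = ls_slope(*cumulative(off_t, off_pm))
    return individual, team, individual - team
```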

To get started I'll use Jovanovski as an example: a top defenseman on a terrible team. As you can see below, he has managed reasonable numbers and done quite well in the circumstances.
  • Individual (blue) Slope (red): 0.54 ±/HR
  • Team (grey-red) Slope (light pink): -0.76 ±/HR
  • Difference: 1.3 ±/HR
Possibly the most interesting one I found is Thomas Vanek. The rest of his team has significantly upward-sloping curves, yet Vanek is the one player who makes all the other players look average. A difference of 2.09 isn't the best in the NHL, since he's on a good team, but it's quite good.

  • Individual (blue) Slope (red): 2.47 ±/HR
  • Team (grey-red) Slope (light pink): 0.38 ±/HR
  • Difference: 2.09 ±/HR
As far as I'm concerned, the most drastic difference in the NHL belongs to Jagr. A difference of 3.43 is quite remarkable, although you can see that Jagr hasn't been doing as well in recent games as he did earlier, while the team is getting better.
  • Individual (blue) Slope (red): 1.63 ±/HR
  • Team (grey-red) Slope (light pink): -1.81 ±/HR
  • Difference: 3.43 ±/HR
A few things to remember: the difference scores are not pure measures of skill, but they probably remove some effects and are therefore better measures of skill than the alternatives. For example, Shanahan in New York suffers from the fact that when he's not on the ice, Jagr is; he is certainly worse than Jagr, but that doesn't make him a bad player.

I'm not sure who's responsible for making Phoenix so bad. Morris and Scatchard certainly aren't the two best players in the NHL, but on a bad team both of them are making the team look average.
Morris: ($3.9M)

Scatchard: ($2.1M)
And if anyone is wondering who is responsible for Philadelphia's -1.71 ±/HR you need not look any further than Calder ($3M):


The rest of these graphs can be found on my website by clicking on a player's name.

January 14, 2007

Plus Minus Graphs

Hockey often relies on highly variable statistics to compute averages that aren’t very accurate. What we care about is the true figure, but it is often masked by a week of great play or a bad week. A great start to the season could give an average player a great plus minus for the entire season; whether or not it measures his true skill, it will appear in the final results. One way of fixing this is to look at all the data between the start of the season and the current date and see what the trend is doing, rather than just looking at the final result. For this example I’m going to use plus minus statistics. This removes some of the luck and makes the results more representative of true ability. Take for example Heatley, with 4.1 +/hr and 2.2 -/hr: he is measured at ±1.9/hr (+23 in 12.1 hours); however, if you do a regression on his game-by-game data you’ll find that it looks like:



The spike at the end, which took him from +12 to +23 in just 4 games, appears to be an anomaly. The regression spits out a slope of +1.18/hr, which seems a little more reasonable but is significantly different from the official average of +1.9/hr. You might also notice the fitted line doesn’t start at the origin (it has him at +2 before the season starts); this is an attempt to scale out luck at the beginning and to keep the regression from chasing the results at the end. The goal is to find the slope that best fits the data. If one fixes the starting point, the line can only swivel around that point, so the regression will chase the final values, because a small change in slope has the most effect on the values furthest from the origin.
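The difference between a free intercept and a line pinned at the origin is easy to see in code. A minimal Python sketch using ordinary least squares on a cumulative curve (x = cumulative hours, y = cumulative plus minus); the function names are mine.

```python
def fit_with_intercept(xs, ys):
    """Ordinary least squares y = intercept + slope * x; the line may start off the origin."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_through_origin(xs, ys):
    """Least squares with the line pinned at (0, 0): it can only swivel."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

On the toy line y = 2 + x sampled at x = 1, 2, 3, 4, the free fit recovers slope 1 and intercept 2, while the pinned fit reports 5/3, because it has to swivel upward to reach the later points.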

The benefit of such a method is in terms of error. In general the standard deviation of the plus minus statistic is sqrt(plusses + minuses), so Heatley works out to about ±8. Of course it’s not quite that simple, because plusses aren’t purely independent of minuses, although the best the error could be is sqrt(plus - minus), which for Heatley is sqrt(23), or 4.8. Using the regression method we get a standard error for the fit of 0.7, which is much better than the simple average. You can see the same effect with Crosby, who is +4.2 and -2.5 per hour (difference = 1.7): the regression says he’s playing at +2.3/hr, or 35% better than his simple average suggests, because he’s in a mini even strength slump.
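Both error estimates can be written down in a few lines of Python. This is a sketch of the textbook formulas, not necessarily how the 0.7 above was produced: the counting error treats each goal as an independent event, and the slope's standard error is the usual OLS one. Points on a cumulative curve are not independent of each other, so treat the OLS figure as a rough guide.

```python
import math


def naive_pm_error(plusses, minuses):
    """Counting-error estimate for a plus-minus total (goals treated as independent)."""
    return math.sqrt(plusses + minuses)


def slope_standard_error(xs, ys):
    """Textbook OLS standard error of the fitted slope (free intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ssr / (n - 2) / sxx)
```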



It’s a rather simple change in how we look at the data, and it can produce some striking results. Most of the time it should give the same results as a simple average, because a variable is more likely to sit close to its true mean than anywhere else, but in cases where a divergent short-term trend significantly affects the final results this method should do better.

The rest of the NHL results can be found on my website.

Every Division in West better than East

I was going over my usual processing this morning and discovered that the worst division in the West is better than the best division in the East. The numbers below are winning percentages assuming overtimes are a coin toss, so the only thing that matters is regulation wins. I use a Colley Matrix to rank the teams.
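For anyone unfamiliar with the Colley Matrix, here is a minimal Python sketch. It solves C r = b with C[i][i] = 2 + games played, C[i][j] = -(games between i and j) and b[i] = 1 + (wins - losses) / 2. Two assumptions on my part: only regulation results are fed in, and a division's figure is the plain average of its teams' ratings (the six divisions above do average exactly 50%, which fits, but the post doesn't say so).

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def colley_ratings(teams, results):
    """Colley ratings from (winner, loser) pairs of regulation results."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    c = [[0.0] * n for _ in range(n)]
    b = [1.0] * n
    for i in range(n):
        c[i][i] = 2.0
    for winner, loser in results:
        i, j = idx[winner], idx[loser]
        c[i][i] += 1
        c[j][j] += 1
        c[i][j] -= 1
        c[j][i] -= 1
        b[i] += 0.5
        b[j] -= 0.5
    return dict(zip(teams, solve(c, b)))


def division_average(ratings, members):
    """Assumed division figure: mean Colley rating of its teams."""
    return sum(ratings[t] for t in members) / len(members)
```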

Northwest: 53.1%
The division's worst goal differential is -6, for the Oilers (Vancouver is -4).
Central: 51.8%
Worst division in the West; Nashville and Detroit have taken over and the rest of the teams are giving up.
Pacific: 56.3%
The Stars appear to be falling, but other than that, the bottom teams in this division would do a lot better if they weren't in it.
Atlantic: 42.3%
Only one team has a significant positive goal differential (New Jersey).
Northeast: 51.3%
It's really too bad this division will likely only send 3 teams to the playoffs.
Southeast: 45.2%
It's strange to see the Southeast Division ranked above the Atlantic, but 45% isn't great. Best goal differential is +4 (Carolina).

The West vs. East record works out to:
42 W - 13 OTW - 7 OTL - 30 L (regulation decisions only: 42/72 = 58%)

January 13, 2007

Attendance

So all the talk about attendance figures made me wonder how 2006-2007 differs from 2005-2006. Tom Benjamin has been on a tirade, with articles such as Attendance Again, Records, Attendance and Bure, and Money Talks.

After reading all this I wondered how attendance is determined, that is to say, what is attendance a function of? There are a few obvious candidates, such as winning percentage, the opposition’s winning percentage or skilled players (Crosby). My main question, however, is whether 2006-2007 is any different from 2005-2006. To account for many of the confounding factors, I’m building a model with 58 team variables (29 home teams and 29 away teams, with one team as the reference), winning percentage for the home and away teams, day of the week, and year. For this study I looked at three years’ worth of attendance figures (2003-2004, 2005-2006, 2006-2007), a total of 3117 games. One game in 2003-2004, the Heritage Classic, was rejected because it wasn’t played in an official arena and had astronomical attendance.
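The model described above is an ordinary regression on dummy variables. Here is a Python sketch of how one game could be turned into a row of the design matrix. The dictionary keys are hypothetical, Washington and Saturday are used as reference categories (matching the comparisons later in this post), and the first season listed is assumed to be the baseline year. With 30 teams this yields the 58 team columns described above; the rows would then go to any OLS routine.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def design_row(game, teams, seasons, ref_team="Washington", ref_day="Sat"):
    """One regression row for a game.

    `game` is a dict with hypothetical keys: home, away, home_wins, away_wins, day, season.
    The first entry of `seasons` is treated as the (assumed) baseline year.
    """
    row = {"intercept": 1.0}
    for t in teams:  # home-team dummies, reference team omitted
        if t != ref_team:
            row[f"home={t}"] = 1.0 if game["home"] == t else 0.0
    for t in teams:  # away-team dummies, reference team omitted
        if t != ref_team:
            row[f"away={t}"] = 1.0 if game["away"] == t else 0.0
    row["home_wins"] = float(game["home_wins"])
    row["away_wins"] = float(game["away_wins"])
    for d in DAYS:  # day-of-week dummies, Saturday as the reference
        if d != ref_day:
            row[f"day={d}"] = 1.0 if game["day"] == d else 0.0
    for s in seasons[1:]:  # season dummies relative to the baseline year
        row[f"season={s}"] = 1.0 if game["season"] == s else 0.0
    return row
```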

Day of the Week

It appears that the NHL knows Saturday is the best day for attendance, so the league made a special effort to schedule as many games as possible on Saturday. It managed to raise Saturday’s share of games from 23% to 26.4% by using Sunday and Monday as rest days (14% of games occur on Sunday and Monday). The rest of the week is a wash: the NHL loses about 1000 fans per game on weekdays compared to Saturday games, with the exception of Friday, where attendance is the same as Saturday. Sunday performance is similar to weekdays but sits at -700, which isn’t as bad.

Winning

Because individual teams are included in the regression, the winning variable becomes less useful; however, including three seasons made winning slightly useful. If you win more games you get better attendance. The winning variable (number of wins per 82-game season) on its own works out to about 120 more fans per game per win; in the full model, however, it is reduced to 60 more fans per game per win, as much of the variability falls onto the individual teams. In other words, a team that wins 50 games over the course of a season will attract about 1200 more people per game than a team that wins 30 games (2400 using the single-variable estimate). Each additional win for the visiting team is worth an additional 10 fans, so a visiting team with 20 wins during the season will draw 300 fewer fans than one with 50 wins. In other words, fans prefer to see decent teams.
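The win effects are simple multiplications of the per-win coefficients. A tiny Python sketch, using the coefficients quoted above (the constant and function names are mine):

```python
HOME_FANS_PER_WIN_FULL = 60     # home wins, full model
HOME_FANS_PER_WIN_ALONE = 120   # home wins, winning variable on its own
AWAY_FANS_PER_WIN = 10          # visiting team's wins, full model


def attendance_gap(wins_a, wins_b, fans_per_win):
    """Extra fans per game for a team with wins_a wins versus one with wins_b."""
    return (wins_a - wins_b) * fans_per_win
```

For example, `attendance_gap(50, 30, HOME_FANS_PER_WIN_FULL)` is 1200 fans per game, and `attendance_gap(50, 20, AWAY_FANS_PER_WIN)` is 300.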

Individual Teams

The worst 5 teams when it comes to attendance (using Washington as a reference) are: Nashville (-939), Chicago (-463), New York Islanders (-1781), New Jersey Devils (-592) and Carolina (-170). The best teams generally have large arenas, such as Montreal (+6260), Detroit (+5056), Philadelphia (+4710), Toronto (+4670) and Tampa Bay (+4510). I’m wondering why Tampa does so well, but the others are mostly self-explanatory.

Popular Teams

The first group of popular teams is the obvious one: the Original Six all attract statistically significantly more fans than other teams. Philadelphia and Pittsburgh also draw bigger crowds. I’m not sure how Philadelphia is drawing crowds, but it shouldn’t be a shock to see Crosby, I mean Pittsburgh, on the list. In fact Crosby draws about 950 extra people to the arenas he visits. Interestingly, Nashville is the worst team at attracting audiences in other arenas, at -600 per game. It would be in the best interest of the NHL as a whole to move this team, as Nashville’s road trips alone cost the league about 25,000 fans (600 per game over 41 road games), or about $1 million in ticket sales.

Ticket Prices.

Ticket prices could be a useful variable over long periods or with decent market information, but in this model an increase in ticket prices is associated with an increase in attendance, so the variable isn’t useful at present.

Year

Of course this is the variable that matters most; all the above work is to factor out the other important variables so we can see how poorly the NHL is doing. It works out to -300 people per game, or $16 million in ticket sales (I’m not sure how this correlates with booze purchases at games and merchandising outside of games). That is a 1.2% drop in attendance; however, there was a 1.6% real increase in ticket prices. So the 98.8% of people who are still coming are paying 1.6% more, which works out to an increase of about 0.4% in revenue. However, actually doing the math on ticket sales (attendance * average real ticket price) gives a 3% increase in ticket revenue ($712,152.05 per game vs. $732,636.94), which suggests that the loss of attendance is occurring in places where it affects revenue less. It's not all bad news for attendance, though: it's still higher than in 2003-2004, by about 150 fans per game.
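The two revenue numbers can be checked with a few lines of Python. This sketch assumes the $712,152.05 figure is 2005-2006 and $732,636.94 is 2006-2007, the order in which the post lists them.

```python
ATTENDANCE_CHANGE = -0.012      # regression estimate, 2006-2007 vs 2005-2006
REAL_PRICE_CHANGE = 0.016       # real ticket price change
REVENUE_PER_GAME_BEFORE = 712_152.05   # assumed 2005-2006
REVENUE_PER_GAME_AFTER = 732_636.94    # assumed 2006-2007


def combined_change(quantity_change, price_change):
    """Revenue change implied by separate quantity and price changes."""
    return (1 + quantity_change) * (1 + price_change) - 1


if __name__ == "__main__":
    implied = combined_change(ATTENDANCE_CHANGE, REAL_PRICE_CHANGE)
    actual = REVENUE_PER_GAME_AFTER / REVENUE_PER_GAME_BEFORE - 1
    print(f"implied by the regression: {implied:+.2%}")
    print(f"from the raw figures:      {actual:+.2%}")
```

The gap between the roughly 0.4% implied by the regression and the roughly 2.9% from the raw figures is what the post attributes to attendance being lost where it affects revenue less.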

January 11, 2007

2005 - 2006 Frequency Standings


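| N | Team | W% | L% | T% | ePTS | PTS | SW | TL | OL | LW | CB | RW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Detroit Red Wings | 69 | 23 | 8 | 123 | 124 | 3.11 | 55% | 41% | 56% | 71% | 0 |
| 2 | Ottawa Senators | 65 | 25 | 10 | 119 | 113 | 2.54 | 53% | 41% | 62% | 62% | 1 |
| 3 | Dallas Stars | 57 | 28 | 16 | 112 | 112 | 2.95 | 49% | 43% | 49% | 62% | 0 |
| 4 | Carolina Hurricanes | 58 | 30 | 12 | 111 | 112 | 3.32 | 49% | 45% | 48% | 67% | 4 |
| 5 | Buffalo Sabres | 62 | 26 | 12 | 116 | 110 | 3.35 | 54% | 40% | 47% | 67% | 2 |
| 6 | Nashville Predators | 57 | 32 | 11 | 107 | 106 | 3.30 | 52% | 43% | 46% | 62% | 0 |
| 7 | Calgary Flames | 59 | 30 | 11 | 110 | 103 | 2.96 | 57% | 38% | 46% | 54% | 0 |
| 8 | New Jersey Devils | 53 | 32 | 16 | 106 | 101 | 2.75 | 50% | 42% | 46% | 52% | 1 |
| 9 | Philadelphia Flyers | 55 | 33 | 12 | 105 | 101 | 3.30 | 52% | 42% | 43% | 60% | 0 |
| 10 | New York Rangers | 51 | 36 | 13 | 100 | 100 | 3.05 | 48% | 45% | 45% | 56% | 0 |
| 11 | San Jose Sharks | 57 | 34 | 10 | 105 | 99 | 3.37 | 49% | 47% | 48% | 64% | 1 |
| 12 | Anaheim Ducks | 54 | 34 | 12 | 103 | 98 | 3.24 | 49% | 45% | 45% | 60% | 2 |
| 13 | Edmonton Oilers | 48 | 32 | 20 | 103 | 95 | 3.71 | 46% | 47% | 37% | 66% | 3 |
| 14 | Colorado Avalanche | 54 | 36 | 11 | 101 | 95 | 3.23 | 50% | 45% | 44% | 59% | 1 |
| 15 | Montreal Canadiens | 52 | 42 | 6 | 92 | 93 | 3.29 | 50% | 47% | 45% | 56% | 0 |
| 16 | Vancouver Canucks | 50 | 40 | 10 | 94 | 92 | 3.47 | 50% | 46% | 41% | 58% | |
| 17 | Tampa Bay Lightning | 50 | 38 | 12 | 97 | 92 | 3.29 | 47% | 48% | 44% | 59% | 0 |
| 18 | Toronto Maple Leafs | 51 | 36 | 12 | 99 | 90 | 3.32 | 50% | 45% | 42% | 59% | |
| 19 | Atlanta Thrashers | 49 | 39 | 12 | 96 | 90 | 3.25 | 45% | 49% | 44% | 60% | |
| 20 | Los Angeles Kings | 48 | 44 | 9 | 89 | 89 | 3.15 | 47% | 49% | 44% | 55% | |
| 21 | Florida Panthers | 45 | 44 | 11 | 87 | 85 | 3.39 | 48% | 47% | 37% | 53% | |
| 22 | Minnesota Wild | 44 | 46 | 10 | 84 | 84 | 3.15 | 41% | 54% | 46% | 57% | |
| 23 | Phoenix Coyotes | 45 | 46 | 9 | 85 | 81 | 2.90 | 44% | 52% | 48% | 52% | |
| 24 | New York Islanders | 39 | 46 | 15 | 82 | 78 | 3.20 | 43% | 50% | 36% | 51% | |
| 25 | Columbus Blue Jackets | 38 | 49 | 14 | 79 | 74 | 3.43 | 39% | 56% | 37% | 58% | |
| 26 | Boston Bruins | 38 | 50 | 12 | 77 | 74 | 3.53 | 45% | 49% | 32% | 52% | |
| 27 | Washington Capitals | 32 | 51 | 17 | 73 | 70 | 3.53 | 42% | 51% | 28% | 50% | |
| 28 | Chicago Blackhawks | 33 | 57 | 10 | 67 | 65 | 3.09 | 38% | 57% | 37% | 48% | |
| 29 | Pittsburgh Penguins | 29 | 62 | 9 | 59 | 58 | 2.89 | 33% | 63% | 40% | 47% | |
| 30 | St. Louis Blues | 25 | 59 | 15 | 60 | 57 | 3.40 | 42% | 51% | 23% | 41% | |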

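I've added RW (playoff rounds won). See the previous post for the definitions of TL, OL, etc. Here W, L and T are percentages of simulated games, ePTS is the resulting expected points over 82 games, and PTS is the actual point total.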

If one assumes CB is the most important playoff stat, then it should be no surprise that Edmonton did well (66%) and Carolina won (67%). It's a bit of a shock that Edmonton beat Detroit (71% vs. 66%), but these are odds, not guarantees. There are also other factors to consider, such as trades before the post-season and injuries. And of course this is data from one playoff dominated by two teams (Edmonton and Carolina are involved in 7 of the 15 series).

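| N | Team 1 | TL1 | OL1 | LW1 | CB1 | Team 2 | TL2 | OL2 | LW2 | CB2 | W1 | W2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | DET | 0.55 | 0.41 | 0.56 | 0.71 | EDM | 0.46 | 0.47 | 0.37 | 0.66 | 2 | 4 |
| 2 | S.J | 0.49 | 0.47 | 0.48 | 0.64 | EDM | 0.46 | 0.47 | 0.37 | 0.66 | 2 | 4 |
| 3 | ANA | 0.49 | 0.45 | 0.45 | 0.60 | EDM | 0.46 | 0.47 | 0.37 | 0.66 | 1 | 4 |
| 4 | CAR | 0.49 | 0.45 | 0.48 | 0.67 | EDM | 0.46 | 0.47 | 0.37 | 0.66 | 4 | 3 |
| 5 | DAL | 0.49 | 0.43 | 0.49 | 0.62 | COL | 0.50 | 0.45 | 0.44 | 0.59 | 1 | 4 |
| 6 | ANA | 0.49 | 0.45 | 0.45 | 0.60 | COL | 0.50 | 0.45 | 0.44 | 0.59 | 4 | 0 |
| 7 | NSH | 0.52 | 0.43 | 0.46 | 0.62 | S.J | 0.49 | 0.47 | 0.48 | 0.64 | 1 | 4 |
| 8 | CGY | 0.57 | 0.38 | 0.46 | 0.54 | ANA | 0.49 | 0.45 | 0.45 | 0.60 | 3 | 4 |
| 9 | CAR | 0.49 | 0.45 | 0.48 | 0.67 | N.J | 0.50 | 0.42 | 0.46 | 0.52 | 4 | 1 |
| 10 | N.J | 0.50 | 0.42 | 0.46 | 0.52 | NYR | 0.48 | 0.45 | 0.45 | 0.56 | 4 | 0 |
| 11 | BUF | 0.54 | 0.40 | 0.47 | 0.67 | PHI | 0.52 | 0.42 | 0.43 | 0.60 | 4 | 2 |
| 12 | OTT | 0.53 | 0.41 | 0.62 | 0.62 | BUF | 0.54 | 0.40 | 0.47 | 0.67 | 1 | 4 |
| 13 | CAR | 0.49 | 0.45 | 0.48 | 0.67 | MTL | 0.50 | 0.47 | 0.45 | 0.56 | 4 | 2 |
| 14 | BUF | 0.54 | 0.40 | 0.47 | 0.67 | CAR | 0.49 | 0.45 | 0.48 | 0.67 | 3 | 4 |
| 15 | OTT | 0.53 | 0.41 | 0.62 | 0.62 | T.B | 0.47 | 0.48 | 0.44 | 0.59 | 4 | 1 |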

So I fit a binary logistic regression, \( p = 1/(1+e^{-(a+bx)}) \), on the difference in each variable between the two teams in a series. One did come out significant; however, there simply is too little data to be sure of any of the results. I'll have to go back a few years to fix that, but for now I'll work with what I've got. The percentage of concordant results is below, with the p-value in brackets (it's probably a bad model).
Difference in TL: 64% (20.2%) Negative (fewer goals for when tied result in more wins)
Difference in OL: 59% (37.4%) Positive (more goals against when tied result in more wins)
The above two statements are nonsense. What they could be saying, though, is that scoring goals when the game is tied is easier in the regular season than in the playoffs, so being good at it in the regular season may not help you much in the playoffs. TL is almost perfectly (negatively) correlated with OL (OL + TL ≈ 1), so a regression on one or the other should give roughly the same results.
Difference in LW: 52% (78.9%) Negative (being better at winning when you have the lead doesn't result in more wins)
With a p-value of 78.9%, anything here is noise, and as you can see the result is nonsense. It's been said that "offense wins games, defense wins championships"; this result doesn't exactly support that claim, as LW could be seen as a defensive statistic, although if you score a lot of goals when you're in the lead you won't need much defense.
Difference in CB: 80% (2.6%) Positive (a better ability to come back results in more wins).
Even with this tiny model the p-value falls below the "magical" 5%. The estimates (a, b) are not significant, but work out to about a 6% increase in winning percentage for every 1% improvement in comeback frequency, so a 5% difference gives you a 30% better chance of winning the series (that's good!). Of course, since the estimate is imprecise, the true effect could be anywhere from 0% to 12%. So my intuition about the last playoffs is correct: teams that can come back did well. Whether this is a trend or just a feature of last year's playoffs is still up for analysis.
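Here is a Python sketch of the CB model, fit by Newton-Raphson rather than whatever statistics package produced the numbers above (an assumption on my part). It uses only the CB column of the series table, with x = team 1's CB minus team 2's CB in percentage points and y = 1 when team 1 won the series. Concordance is measured the way many packages report it: over all pairs of (series team 1 won, series team 1 lost), the share where the won series got the higher predicted probability.

```python
import math

# (team 1 CB, team 2 CB, team 1 wins, team 2 wins), copied from the series table.
SERIES = [
    (71, 66, 2, 4), (64, 66, 2, 4), (60, 66, 1, 4), (67, 66, 4, 3),
    (62, 59, 1, 4), (60, 59, 4, 0), (62, 64, 1, 4), (54, 60, 3, 4),
    (67, 52, 4, 1), (52, 56, 4, 0), (67, 60, 4, 2), (62, 67, 1, 4),
    (67, 56, 4, 2), (67, 67, 3, 4), (62, 59, 4, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=100, tol=1e-10):
    """Maximum-likelihood fit of p = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        ps = [sigmoid(a + b * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the information matrix (negative Hessian).
        ws = [p * (1 - p) for p in ps]
        s0 = sum(ws)
        s1 = sum(w * x for w, x in zip(ws, xs))
        s2 = sum(w * x * x for w, x in zip(ws, xs))
        det = s0 * s2 - s1 * s1
        da = (s2 * g0 - s1 * g1) / det
        db = (s0 * g1 - s1 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def concordance(xs, ys, a, b):
    """Share of (y=1, y=0) pairs where the y=1 case has the higher predicted probability."""
    ps = [sigmoid(a + b * x) for x in xs]
    won = [p for p, y in zip(ps, ys) if y == 1]
    lost = [p for p, y in zip(ps, ys) if y == 0]
    pairs = [(w, l) for w in won for l in lost]
    return sum(1 for w, l in pairs if w > l) / len(pairs)


xs = [cb1 - cb2 for cb1, cb2, _, _ in SERIES]
ys = [1 if w1 > w2 else 0 for _, _, w1, w2 in SERIES]
a, b = fit_logistic(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, concordance = {concordance(xs, ys, a, b):.1%}")
```

On these 15 series the fit gives 45 concordant pairs out of 56 (about 80%), in line with the CB figure above. The p-values would need standard errors from the inverse of the final information matrix, which this sketch leaves out.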

This shouldn't be too much of a shock, as teams that can come back demonstrate resiliency, determination and grit, much of what is needed for a successful playoff run. I'll run 2003-2004 some other time, but it's an interesting start. Also, you have to win that elimination game to actually win the series, which isn't easy against a team that doesn't give up. Of course, it's likely that an extremely dominant team could have a bad CB and still do well. Although it might be counterintuitive, CB depends on goaltending as well as scoring, as the goaltender needs to keep the game close in order to give his scorers a chance to come back.

January 9, 2007

2006-2007 Frequency Standings

A little over a month ago Earl Sleek presented a different way of looking at winning. The basic premise is that all that matters is when the "winning team" changes, so you only count the goal that ties the game or breaks a tie, not the one that makes it go from 5-2 to 6-2. The list below presents most of the relevant results. As Earl Sleek stated: "Frequency counts are simplistic in that they do not consider how long situations last, but rather just counts how often situations arise." Using the frequency counts, one can simulate thousands of games to get expected percentages of wins, losses and ties. I'm not giving points for OTL (you can add 4 points to each team as a reasonable estimate), and I'm giving teams that go to a SO 1.5 points (1 point for a loss, 2 points for a win, assuming a 50-50 split). The results look reasonable as a season prediction of expected points. In fact I really like this style of logic, as it doesn't chase blow-out games the way a Pythagorean prediction would.
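The post doesn't spell out how the games were simulated, so here is a Python sketch under an assumed model: each game is a chain of lead changes. From a tie the team takes the lead with probability TL, the opponent with probability OL, otherwise regulation ends tied; with the lead the team wins with probability LW, else the game is tied again; when trailing it re-ties with probability CB, else it loses. Points use the 2 / 1.5 / 0 scheme described above. The function names and the `max_changes` guard are mine.

```python
import random


def simulate_game(tl, ol, lw, cb, rng, max_changes=1000):
    """Play one game as a chain of lead changes under the assumed model."""
    state = "tied"
    for _ in range(max_changes):
        if state == "tied":
            r = rng.random()
            if r < tl:
                state = "lead"
            elif r < tl + ol:
                state = "trail"
            else:
                return "T"  # regulation ends tied
        elif state == "lead":
            if rng.random() < lw:
                return "W"
            state = "tied"
        else:  # trailing
            if rng.random() < cb:
                state = "tied"
            else:
                return "L"
    return "T"  # Safety valve for degenerate inputs such as lw=0, cb=1.


def simulate_season(tl, ol, lw, cb, games=82, n_sims=20000, seed=1):
    """Expected wins, losses and ties over a season of `games` games."""
    rng = random.Random(seed)
    counts = {"W": 0, "L": 0, "T": 0}
    for _ in range(n_sims):
        counts[simulate_game(tl, ol, lw, cb, rng)] += 1
    return {k: v * games / n_sims for k, v in counts.items()}


def expected_points(wins, ties):
    """2 points per win, 1.5 per tie (shootout as a coin flip), none for an OTL."""
    return 2 * wins + 1.5 * ties
```

For example, `simulate_season(0.61, 0.32, 0.48, 0.66)` uses Anaheim's TL, OL, LW and CB; since the model is an assumption, the output need not match the table exactly.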

You will notice that the bottom teams struggle to hold onto a lead, giving it up over 70% of the time. The best teams often have a great ability to come back: with the exception of Detroit, the top 10 teams on this list come back at a rate above 60%. Detroit wins as a result of its impressive 58% win rate when it takes the lead.

Personally I think these are more useful for playoff series predictions, as I believe playoff hockey follows slightly different patterns than the regular season. One hypothesis I have, for example, is that CB (comeback %) dominates other factors in the playoffs, so Detroit will do poorly again, Buffalo/Anaheim/Montreal will be the dominant teams, and Pittsburgh, if they make it, could be a dark horse (8th seed goes to Western?). Of course that's just a theory (I'll test it when I get more time).

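| N | Team | W | L | T | PTS | SW | TL | OL | LW | CB |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Anaheim Ducks | 55 | 14 | 13 | 130 | 2.94 | 61% | 32% | 48% | 66% |
| 2 | Buffalo Sabres | 52 | 16 | 13 | 125 | 2.81 | 53% | 38% | 52% | 67% |
| 3 | New Jersey Devils | 48 | 19 | 15 | 119 | 2.52 | 49% | 41% | 54% | 61% |
| 4 | Nashville Predators | 51 | 21 | 10 | 117 | 3.08 | 57% | 37% | 47% | 61% |
| 5 | Detroit Red Wings | 52 | 24 | 6 | 112 | 2.57 | 56% | 40% | 58% | 55% |
| 6 | Montreal Canadiens | 43 | 23 | 15 | 110 | 3.05 | 44% | 47% | 47% | 65% |
| 7 | San Jose Sharks | 55 | 27 | 0 | 109 | 2.62 | 53% | 47% | 70% | 61% |
| 8 | Dallas Stars | 46 | 26 | 9 | 107 | 2.66 | 45% | 49% | 59% | 61% |
| 9 | Atlanta Thrashers | 42 | 29 | 11 | 101 | 3.19 | 46% | 48% | 45% | 61% |
| 10 | Minnesota Wild | 35 | 27 | 20 | 99 | 3.09 | 38% | 51% | 40% | 61% |
| 11 | Vancouver Canucks | 44 | 30 | 8 | 99 | 2.78 | 53% | 42% | 48% | 49% |
| 12 | Calgary Flames | 45 | 33 | 4 | 96 | 2.70 | 53% | 44% | 52% | 48% |
| 13 | Boston Bruins | 40 | 31 | 11 | 96 | 3.39 | 53% | 41% | 37% | 53% |
| 14 | Washington Capitals | 40 | 32 | 10 | 95 | 2.63 | 44% | 49% | 51% | 51% |
| 15 | Pittsburgh Penguins | 39 | 33 | 10 | 94 | 3.43 | 46% | 48% | 40% | 59% |
| 16 | New York Rangers | 38 | 33 | 12 | 93 | 3.00 | 47% | 47% | 41% | 51% |
| 17 | Toronto Maple Leafs | 38 | 33 | 11 | 92 | 2.99 | 47% | 47% | 41% | 51% |
| 18 | Carolina Hurricanes | 43 | 35 | 4 | 91 | 2.71 | 47% | 51% | 56% | 52% |
| 19 | New York Islanders | 37 | 34 | 10 | 90 | 2.38 | 45% | 48% | 50% | 41% |
| 20 | Edmonton Oilers | 39 | 37 | 6 | 87 | 2.60 | 48% | 48% | 50% | 44% |
| 21 | Ottawa Senators | 42 | 38 | 2 | 87 | 2.44 | 51% | 47% | 56% | 42% |
| 22 | Chicago Blackhawks | 36 | 36 | 10 | 86 | 2.94 | 46% | 48% | 41% | 48% |
| 23 | Colorado Avalanche | 39 | 37 | 6 | 86 | 3.44 | 50% | 47% | 39% | 53% |
| 24 | Phoenix Coyotes | 36 | 38 | 8 | 84 | 2.67 | 42% | 53% | 50% | 49% |
| 25 | Tampa Bay Lightning | 36 | 38 | 8 | 83 | 3.13 | 47% | 49% | 40% | 50% |
| 26 | Florida Panthers | 33 | 38 | 11 | 82 | 3.14 | 37% | 56% | 43% | 57% |
| 27 | Los Angeles Kings | 31 | 42 | 10 | 76 | 3.10 | 38% | 57% | 41% | 53% |
| 28 | Columbus Blue Jackets | 32 | 44 | 6 | 73 | 2.66 | 40% | 56% | 48% | 45% |
| 29 | St. Louis Blues | 24 | 45 | 12 | 68 | 3.18 | 41% | 52% | 29% | 43% |
| 30 | Philadelphia Flyers | 22 | 54 | 6 | 53 | 2.98 | 41% | 55% | 29% | 37% |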
The expected random error for these figures is around ±10% (95% confidence intervals for TL, OL, LW and CB), so take the results as you like.
W - wins, L - losses, T - ties (for 82 game season) - estimated using simulation.
PTS - expected points using this method.
SW - average score switching - high numbers likely indicate difficulty maintaining a lead (bad defense/goaltending)
TL - Percentage the team takes lead.
OL - Percentage the opposition takes lead.
LW - Percentage the lead results in win. (otherwise it is tied again).
CB - Percentage of comebacks - # of times the team re-ties the game (otherwise the opposition wins).
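The ±10% figure above isn't derived in the post. Assuming a normal approximation to a binomial proportion, the half-width of a 95% interval is 1.96 × sqrt(p(1 − p)/n), so at p = 0.5 a ±10% interval corresponds to roughly 97 observed situations. A small Python sketch of that formula (function names are mine):

```python
import math


def ci_half_width(p, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(half_width, p=0.5, z=1.96):
    """Smallest n whose interval half-width is at most `half_width`."""
    return math.ceil((z / half_width) ** 2 * p * (1 - p))
```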