Archive for the ‘Stat Workshop’ Category

Fancy stats for tonight’s OT thriller against the Sharks…

Fenwick chart for 2014-01-25 Wild 2 at Sharks 3 (OT)

Minnesota Wild individual Corsi

Player | Pos | ES TOI | Corsi | Net Zone Starts
Keith Ballard | D | 13:27 | -4 | 0
Charlie Coyle | C | 19:52 | -9 | +7
Clayton Stoner | D | 10:06 | -5 | -1
Marco Scandella | D | 22:34 | -8 | +2
Zach Parise | L | 19:51 | -9 | +7
Justin Fontaine | R | 13:31 | -4 | +1
Dany Heatley | L | 16:06 | -4 | 0
Torrey Mitchell | C | 7:49 | -9 | 0
Ryan Suter | D | 28:12 | -17 | +5
Kyle Brodziak | C | 13:42 | -8 | 0
Nino Niederreiter | R | 17:03 | -4 | +6
Matt Cooke | L | 14:15 | -12 | 0
Jonas Brodin | D | 24:40 | -12 | +6
Mike Rupp | C | 6:40 | -1 | 0
Jason Pominville | R | 18:27 | -5 | -1
Nate Prosser | D | 19:57 | -8 | 0
Erik Haula | L | 7:15 | -6 | 0
Mikael Granlund | C | 17:57 | -3 | -2

San Jose Sharks individual Corsi

Player | Pos | ES TOI | Corsi | Net Zone Starts
Jason Demers | D | 17:52 | +7 | -1
Brad Stuart | D | 20:38 | +2 | +2
Joe Pavelski | C | 20:24 | +12 | -5
Andrew Desjardins | C | 16:18 | +4 | -3
Patrick Marleau | C | 19:53 | -2 | +1
Mike Brown | R | 3:57 | +3 | 0
Joe Thornton | C | 20:23 | +18 | -7
Dan Boyle | D | 19:43 | +15 | -7
Bracken Kearns | C | 10:58 | +10 | 0
John McCarthy | L | 6:26 | +4 | -1
Marc-Edouard Vlasic | D | 18:27 | +7 | -1
Matt Irwin | D | 20:45 | +20 | -7
Tommy Wingels | C | 15:50 | +2 | +1
Justin Braun | D | 21:32 | +2 | +2
Eriah Hayes | R | 8:18 | -1 | -2
Tyler Kennedy | C | 14:06 | +8 | -1
Matt Nieto | L | 16:21 | +10 | +1
Brent Burns | R | 19:37 | +5 | -3

Recently, an interesting article was published on the NHLNumbers website examining skaters’ shooting tendencies, or how much they could be considered a “Puck Hog.” The article, written by Ben Wendorf, proposed an elegant way of measuring how frequently a player opts to shoot the puck.

Simply, the metric looks at the proportion of On-ice Fenwick (A’) that was represented by a specific player’s Fenwick shots (A). This number can be thought of as the Attempted Shot Ratio.

Ben found that this number did not correlate with things like shooting percentage or quality of teammates, but that players showed the same tendency year after year. Thus, he concluded, “With high repeatability, and little connection to shooting talent, player development, or teammates, this seems much more like a behavioral activity.”

The article got me thinking about some of my own research that I did way back in the first month or so of writing this blog. I was messing around with players’ shots on goal versus missed shots, and determined what proportion of Fenwick shots were shots on goal (SOG/FEN), which I called Shooting Ratio. I had similar findings—that players could be productive even if they missed the net a lot, and the behavior was stable year after year. Something that stuck out to me during that research was the almost perfectly normal distribution of skaters’ ability to hit the net, so it was not much of a surprise to me when I checked with Ben and found that Attempted Shot Ratios also showed that perfect bell curve that we stat heads swoon over.
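Both ratios are simple proportions. Here is a minimal Python sketch of the two definitions; the function and argument names are hypothetical labels chosen for illustration and do not correspond to any particular stats site's fields.

```python
def attempted_shot_ratio(individual_fenwick: int, on_ice_fenwick: int) -> float:
    """Ben Wendorf's "Puck Hog" metric: the share of on-ice Fenwick (A')
    that came off this player's own stick (A)."""
    if on_ice_fenwick <= 0:
        raise ValueError("on-ice Fenwick must be positive")
    return individual_fenwick / on_ice_fenwick


def shooting_ratio(shots_on_goal: int, missed_shots: int) -> float:
    """Share of a player's Fenwick shots (SOG + missed shots) that were on goal."""
    fenwick = shots_on_goal + missed_shots
    if fenwick <= 0:
        raise ValueError("player has no Fenwick shots")
    return shots_on_goal / fenwick


# Hypothetical counts, for illustration only.
asr = attempted_shot_ratio(individual_fenwick=60, on_ice_fenwick=240)
shr = shooting_ratio(shots_on_goal=45, missed_shots=15)
print(f"ASR = {asr:.3f}, ShR = {shr:.3f}")
```

Note that blocked shots never enter either ratio, since both are built on Fenwick (unblocked attempts).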

After mulling things over, I thought it would be insightful to combine Ben’s “Puck Hog” metric with my missed shot data to provide more of a two-dimensional look at player behavior. Or, simply,

How often do players choose to shoot, and how often do they hit the net?

Method:

I decided to look at just Minnesota Wild forward skaters from 2011-12 for this post, although I am working on a bigger dataset to get a larger sample. I crunched all the numbers to determine players’ even-strength Attempted Shot Ratios (how often the player takes the shot when he’s on the ice: his individual Fenwick divided by on-ice Fenwick) and Shooting Ratios (the proportion of those Fenwick shots that are shots on goal, SOG/FEN) using the definitions above. All computations use 5v5 data.

Population Parameters

After initially just comparing the 11-12 Wild players to each other, I realized that I would be better served using ‘population’ data that came out of Ben’s and my research. Specifically,

Attempted Shot Ratio: Mean = 0.244, SD = 0.05 (N = 2177 NHL forwards, 2007-2012)

Shooting Ratio: Mean = 0.737, SD = 0.074 (N = 492 NHL forwards > 10 GP, 2011-12)

Results:

The table below lists Minnesota forwards with their attempted shot ratio and shooting ratio. The z-scores associated with each are calculated using the population parameters listed above. Because several players wore the same number at different times, some numbers have been replaced with dummy numbers (Wellman, Taffe, Veilleux.)
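Each z-score in the table is the ratio minus the population mean, divided by the population SD. A minimal Python sketch using the parameters quoted above; recomputing from the three-decimal ratios gives values that differ slightly from the listed ones (Almond's ASR-Z comes out at -0.78 rather than -.789), presumably because the listed z-scores were computed from unrounded ratios.

```python
# Population parameters quoted in the post.
ASR_MEAN, ASR_SD = 0.244, 0.05    # N = 2177 NHL forwards, 2007-2012
SHR_MEAN, SHR_SD = 0.737, 0.074   # N = 492 NHL forwards > 10 GP, 2011-12


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations `value` sits above (+) or below (-) `mean`."""
    return (value - mean) / sd


# Examples using rounded ratios from the table.
almond_asr_z = z_score(0.205, ASR_MEAN, ASR_SD)   # about -0.78
almond_shr_z = z_score(0.778, SHR_MEAN, SHR_SD)   # about 0.55
cullen_shr_z = z_score(0.872, SHR_MEAN, SHR_SD)   # about 1.82
print(round(almond_asr_z, 3), round(almond_shr_z, 3), round(cullen_shr_z, 3))
```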

Name | Attempted Shot Ratio | ASR-Z | Shooting Ratio | ShR-Z
Almond, Cody (27) | .205 | -.789 | .778 | .551
Bouchard, Pierre-Marc (96) | .265 | .414 | .803 | .885
Brodziak, Kyle (21) | .266 | .430 | .712 | -.338
Bulmer, Brett (19) | .233 | -.229 | .700 | -.500
Christensen, Erik (26) | .205 | -.787 | .727 | -.131
Clutterbuck, Cal (22) | .317 | 1.462 | .686 | -.684
Cullen, Matt (7) | .229 | -.302 | .872 | 1.824
Heatley, Dany (15) | .255 | .221 | .716 | -.280
Johnson, Nick (25) | .281 | .747 | .817 | 1.083
Koivu, Mikko (9) | .218 | -.513 | .748 | .144
Latendresse, Guillaume (48) | .314 | 1.391 | .703 | -.464
McIntyre, David (34) | .214 | -.594 | .667 | -.951
McMillan, Carson (45) | .215 | -.572 | .571 | -2.238
Ortmeyer, Jed (41) | .383 | 2.774 | .645 | -1.241
Palmer, Jarod (79) | .265 | .426 | .846 | 1.475
Palmieri, Nick (17) | .323 | 1.575 | .747 | .128
Peters, Warren (43) | .273 | .581 | .662 | -1.010
Powe, Darroll (14) | .279 | .700 | .697 | -.538
Rau, Chad (36) | .256 | .248 | .700 | -.500
Setoguchi, Devin (10) | .284 | .796 | .780 | .578
Taffe, Jeff (97) | .286 | .834 | .750 | .176
Veilleux, Stephane (98) | .228 | -.326 | .696 | -.559
Wellman, Casey (95) | .309 | 1.302 | .706 | -.421
Zucker, Jason (16) | .345 | 2.017 | .800 | .851

When charted, the z-scores can be informative because they fall into one of four quadrants on the chart. Attempted Shot Ratio is on the X-axis, and Shooting Ratio is on the Y-axis. For lack of better terminology, the quadrants are: I – shooting inclined + accurate shot; II – shooting inclined + inaccurate shot; III – shooting averse + inaccurate shot; IV – shooting averse + accurate shot. (I really dislike using “accurate/inaccurate” here but I just don’t have a better pair of descriptors right now. If you have suggestions please share!)
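The quadrant assignment is just a pair of sign checks on the two z-scores. A minimal Python sketch; the numbering follows this post's labels (I and II are both shooting inclined), not the usual counter-clockwise numbering of Cartesian quadrants, and treating a z-score of exactly zero as positive is an assumption.

```python
def quadrant(asr_z: float, shr_z: float) -> str:
    """Classify a player using the post's four quadrant labels.

    Assumption: a z-score of exactly 0 is treated as the positive side.
    """
    shoots = asr_z >= 0   # shooting inclined vs. shooting averse
    on_net = shr_z >= 0   # "accurate" vs. "inaccurate"
    if shoots and on_net:
        return "I"        # shooting inclined + accurate shot
    if shoots:
        return "II"       # shooting inclined + inaccurate shot
    if not on_net:
        return "III"      # shooting averse + inaccurate shot
    return "IV"           # shooting averse + accurate shot


# A few rows from the table (ASR-Z, ShR-Z).
players = {
    "Zucker": (2.017, 0.851),
    "Clutterbuck": (1.462, -0.684),
    "McIntyre": (-0.594, -0.951),
    "Koivu": (-0.513, 0.144),
}
for name, (a, s) in players.items():
    print(name, quadrant(a, s))
```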

*Click the chart to view the full-size version.

Discussion:

The first thing that jumps out to me is that the majority of players fall into quadrants I and II…so it’s a team full of puck hogs? Apparently so. Recall that the ASR data here are compared to the five-year population parameters, so the Wild have a lot of guys who choose to shoot the puck. I am trying to avoid value judgements here since a higher value ASR doesn’t necessarily represent “better hockey,” just a different playstyle. On the other hand, Sh Ratio is less subjective–making the goalie make a save is almost always better than missing the net. I know sometimes a player will shoot wide on purpose to try to make a play, but for now I am using the assumption that SOG > missed shot.

So the Wild had guys who were looking for shots but they had the fewest shots per game and goals per game. What to make of this? I wouldn’t necessarily expect a team-wide pattern to emerge when looking at this kind of plot, but it would seem to make sense that having more of a balance between shooters and set-up guys would produce more quality scoring chances. I would also think that a team distribution like this would indicate that the Wild players were predictable–the opponents probably figured out quickly that guys would be looking to shoot instead of making the extra pass, so perhaps that predictability suppressed their shot totals by making them easier to read. I’m just spitballing here, but if the Wild ranked high in a stat that measured blocked shots against, there might be something to this theory. But honestly at this stage in the research I’m still just digesting the numbers so again, if you have thoughts let’s hear them!

At the individual level, it is important to keep in mind that there isn’t really a “good play vs bad play” distinction here, it’s all just a measurement of behavior. Because we know that both ASR and ShR are normally distributed, it makes the individual results more interpretable, but a player who has a high attempted shot ratio isn’t necessarily doing his team any favors if he consistently misses the net more than his teammates. I do like the quadrant setup because it helps classify players. Matt Cullen doesn’t shoot much anymore, but when he does pull the trigger he hits the net more than any of his teammates. Mikko Koivu, known as a playmaker, prefers to make an extra pass rather than shoot, and is about average at hitting the net. Pierre-Marc Bouchard, another playmaker, actually shoots the puck more than I would have guessed, and is an accurate shot. Nick Johnson, despite having a strong tendency for the uncontrolled zone entry, pulls the trigger often and makes the goalie make a save. Dany Heatley is as close to zero on both scales as any player, which is probably a surprise to most of us. Cal Clutterbuck and Guillaume Latendresse have a strong tendency to shoot, but are somewhat wild.

In the end, it is my hope that this kind of multidimensional shooting analysis can be helpful in our understanding of players’ tendencies. This kind of stuff can’t be used in isolation; rather, it all works together to paint a picture and gives us a peek inside the heads of the players we enjoy watching. The game of hockey is just a series of decisions, nearly all of which must be made too quickly for conscious thought and so are instinctual. I am excited to expand this analysis to different teams and different years. If you have questions or comments I would love to hear them. You can comment here on this page or through Twitter, @Hashtag_Hockey. Thanks for reading!

Starting with the Philadelphia Flyers and recently adding the Minnesota Wild, we have learned quite a bit about the nature of zone entries as they relate to offense generated in the NHL. We now know that for both top- and bottom-scoring teams, uncontrolled entries (dump-in and tip-in) generate on average 0.3 Fenwick shots per entry, while controlled entries (carry-in and pass-in) generate 0.6 shots per entry. We have also observed that there is almost no difference between third-line grinders and top-line snipers in the shots generated by each entry type.

However, while cracking into this area of analysis has produced some very remarkable findings, there are many more questions to be answered about zone entries and how they translate to offense. Here are some specific questions for the Wild that I hope to answer as I continue to research this part of the game, and preliminary ideas of how to tackle them:

Team Zone Entry Distribution Over Time

We know that the dump-and-chase game is less productive than the controlled zone entry, and we have seen that the Flyers use the dump-in less than the Wild. But to what extent does a different coaching staff dictate the proportion of zone entries? With this past season being the first with Mike Yeo behind the bench, I would want to look at the 2010-11 season under Todd Richards to see if a mostly similar roster showed a similar controlled/uncontrolled ratio, and if the shot output was the same. For that matter, it would be interesting to see how closely Dany Heatley’s and Devin Setoguchi’s entry ratios compare from year to year, and whether and how they differed playing under different coaches.

Team Zone Entry Variance Within Season

To this point, we have looked at season-aggregated data and average shots per entry. Research using Flyers data has revealed that some things will take more than a season’s worth of data. But I want to know how much within-season variance there is in the number of zone entries and in the controlled/uncontrolled entry ratio, which could give us a glimpse at strategy and game planning.

Zone Entry Ratio Between Opponents

With just one season of data, I am thinking about looking at just division opponents to see if there are patterns in how the Wild play teams like the Canucks or Flames, and if they differ significantly from the overall season data. I do worry about small sample size, with just three home and three road games per division opponent per year.

Big Lead/Big Trail Statistics

One of the main dismissals of advanced stats in hockey goes something like, “Corsi is stupid because when teams are winning the game they sit back and let the other team take more shots.” Researchers have answered this by examining close-game situations, big leads and big trails, and I want to dive into zone entries the same way. When the Wild get up by two goals, to what extent do they shift their strategy to a dump-and-chase? This will be great to look at across teams, because it will give us a better idea of each team’s coaching philosophy.

Within-Game Zone Entries

This is still a half-formed idea, but it seems to me from the eyeball test while watching the games that in the early minutes of a period or a game, both teams are much more likely to execute a dump-in. Score effects will likely trump time effects, but this may be something to look into.

Bottom Line: Not Enough Hours in the Day

Ultimately, the answers to these questions will come with more data: going forward we can continue to track games (if and when we ever get them back), and going back to previous seasons would give us a lot more context. We could see how teams change their strategy over time, and how players develop their skills as they progress through their careers. We could see if patterns emerge in how teams play their division opponents, and whether they adapt their strategy (entry ratio) within a season. However, for now the only way to collect this data is for a human being to put two eyeballs on a screen, and at 90 minutes a pop the data will be slow to come in. I am wondering how many games from a previous season it would take to get a representative sample. Right now there is pretty much no way to get a computer to code zone entries; in the future it could be done with a series of cameras in the arena, but that’s not going to happen any time soon, though if such a system succeeds in the NBA it may transfer over. When I get my genius grant, maybe I’ll work on training rhesus monkeys to code zone entries, but for now, I’ll just have to keep using that elbow grease.

If you have additional ideas or research questions, please feel free to shoot them at me on Twitter, @Hashtag_Hockey or in the comments section below. Thanks for reading!


Abstract (tl;dr version)

The present research examines the first sixty games of the 2011-12 Minnesota Wild hockey season, and looks at zone entries (controlled vs. uncontrolled) in terms of shots generated. It was found that controlled entries like carrying or passing the puck into the offensive zone generate twice as many shots as uncontrolled entries such as a dump-in or tip-in. Controlled entries generate an average of 0.6 shots per entry, while uncontrolled entries generate an average of 0.3 shots. Individual Wild players are examined, and implications for coaching and strategy are discussed. If you are familiar with previous research on zone entries, you may want to skip ahead to the Results section.

Acknowledgements

I would like to thank Eric T and Geoff Detweiler from broadstreethockey.com for getting me started on this project, sharing their spreadsheets, and patiently answering all manner of “What the heck am I doing?” e-mails. Thanks guys!

Intro/Background

In the world of sports analytics, the game of hockey is a different beast than others like baseball or football. In those sports, the game is segmented into discrete events that can be easily studied: the pitch and the at-bat in baseball and the play or the series or the drive in football—these are easily identifiable events that have a distinct start and end. Hockey is a fluid game, with players changing lines on the fly and the action moving up and down the ice uninterrupted, sometimes for several minutes at a time. For hockey researchers, this presents a problem because it limits the amount of data we can collect—until now we have pretty much been limited to recording simple events like shots on goal. But recently some of the brightest minds in the field have developed new technology to calculate a wealth of new statistics and exponentially enhance our understanding of the game. I have been fortunate enough to come along for the ride.

Hockey is fundamentally a game of possession: it is almost impossible to score a goal if you don’t control the puck. But there isn’t a good way to measure or quantify possession right now. In football they track time of possession, but in that sport one team or the other always controls the ball. In hockey, there are times when neither team controls the puck, and the game can stall in the neutral zone or along the boards where players battle for possession (perhaps Schrodinger would suggest that in these instances, both teams have possession, but that’s a different discussion.)

If Schrodinger was a hockey fan

The present research uses zone entry as a surrogate for possession. Thanks to the play-by-play data that the NHL makes available for each game and some very nifty spreadsheet wizardry, we can match shots and goals to each zone entry based on entry type and timestamp.

Definitions

I want to make sure anyone can understand this article, even those who may not be familiar with the rules of hockey, so here are some definitions if you need them.

Defensive Zone/Neutral Zone/Offensive Zone: The two blue lines divide the ice into three zones. A team’s defensive zone is where their own goalie is (what they’re trying to protect), the offensive zone is where the opponent’s goalie is (where they’re trying to score), and the neutral zone is the middle of the ice between the two blue lines (see diagram.)

Controlled Zone Entry: A player skates with the puck from the neutral zone into his team’s offensive zone, or a player passes the puck across the blue line to his teammate as he is skating into the offensive zone.

Uncontrolled Zone Entry: A player “dumps the puck in” to the offensive zone by flipping it through the air or sending a hard shot along the ice, where his teammates try to regain possession of the puck before the opponents do. Or, a player who is positioned at the blue line receives a pass from his teammate, and he redirects it or “tips it in” to the offensive zone, where again his teammates battle for possession.

Fenwick: Shots + Missed Shots = Fenwick. Hockey researchers prefer to look at more than just shots on goal to provide a more robust analysis of the game. Fenwick totals are understood to be the sum of shots on goal and missed shots by any player on the ice.


Dependent (Outcome) Variables

In addition to goals, the present research uses shots and missed shots (Fenwick, see Definitions above) as a broader representation of a team or player’s offensive ability. The same way baseball researchers are interested in hits as well as runs, and football researchers care about yards as well as touchdowns, hockey nerds use shots to measure offense. Just as hits lead to runs and gained yards lead to touchdowns, more shots mean more opportunities for goals. To quote the great philosopher Wayne Gretzky, “You miss 100% of the shots you don’t take.” The key reason to study shots (and more broadly missed shots and sometimes blocked shots) instead of just goals is that…well, goals are hard to come by! In hockey, there is a man whose job it is to stand in front of the goal, and his entire paycheck depends on how many times he can place his body between the net and the puck so that little bit of rubber crashes into him instead of going into the goal. And there are a lot of very talented goalies in the NHL who can make it very frustrating for guys like Boston’s Milan Lucic. So, because goals are a generally rare occurrence and shots are more plentiful, we get better measurements when we use Fenwick.

Research Questions

Is there a difference between controlled entries and uncontrolled entries in terms of the number of Fenwick shots generated and goals scored as a result of each? Or…is it better to carry the puck into the offensive zone, or dump it in?

Methodology

To collect this data, I used the NHL’s Game Center Vault feature, which allows me to go back and watch all the games from the 2011-12 season.* While watching each game, I made an entry into an Excel spreadsheet recording the following information: time, entry type (see codes below), player number if it was the Wild, or just “Opp” if it was the opponent, and team strength/opponent strength. That is, 5-5 for even strength, 5-4 if the Wild were on a power play, or 4-5 if the Wild were killing a penalty. Only 5-vs-5 data were included for data analysis. For this research, I did not record a “dump and change” (a player flipping the puck in and immediately going for a line change) as an uncontrolled entry.

The entry type codes are as follows:

Controlled Entries: C = Carry in, P = Pass in

Uncontrolled Entries: D = Dump in, T = Tip in, X = Other
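A minimal Python sketch of how rows like these could be tallied into controlled/uncontrolled counts at 5-vs-5. The `Entry` layout (game-clock time, code, player number or “Opp”, strength string) mirrors the fields listed above, but the field names and the sample rows are hypothetical.

```python
from typing import NamedTuple

CONTROLLED_CODES = {"C", "P"}         # carry in, pass in
UNCONTROLLED_CODES = {"D", "T", "X"}  # dump in, tip in, other


class Entry(NamedTuple):
    time: str      # game clock as recorded, e.g. "12:34"
    code: str      # entry type: C, P, D, T or X
    player: str    # Wild sweater number, or "Opp" for the opponent
    strength: str  # Wild skaters-opponent skaters, e.g. "5-5" or "5-4"


def is_controlled(code: str) -> bool:
    """True for a controlled entry (C, P), False for uncontrolled (D, T, X)."""
    code = code.strip().upper()
    if code in CONTROLLED_CODES:
        return True
    if code in UNCONTROLLED_CODES:
        return False
    raise ValueError(f"unknown entry code: {code!r}")


def even_strength_split(rows, team_is_opponent: bool):
    """Return (controlled, uncontrolled) 5-vs-5 entry counts for one side."""
    controlled = uncontrolled = 0
    for row in rows:
        # Skip non-5v5 rows and rows belonging to the other side.
        if row.strength != "5-5" or (row.player == "Opp") != team_is_opponent:
            continue
        if is_controlled(row.code):
            controlled += 1
        else:
            uncontrolled += 1
    return controlled, uncontrolled


# Hypothetical rows, for illustration only.
sheet = [
    Entry("19:40", "C", "9", "5-5"),
    Entry("18:55", "D", "Opp", "5-5"),
    Entry("17:10", "D", "21", "5-5"),
    Entry("16:02", "P", "15", "5-4"),  # power play: excluded from analysis
    Entry("15:30", "P", "Opp", "5-5"),
    Entry("14:00", "T", "Opp", "5-5"),
]
print("Wild:", even_strength_split(sheet, team_is_opponent=False))
print("Opp:", even_strength_split(sheet, team_is_opponent=True))
```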

Here is a sample of data entry from one of the games:

Once the data were entered, I imported the official NHL play-by-play data, which the NHL website makes available for each game, into a separate worksheet. The original researchers wrote macros to do the rest of the work: based on the time stamps, the macros figure out how many shots and goals occurred during each zone entry. Finally, the outcome data for each game are copied and pasted into another file that aggregates season totals. “I love technology/ but not as much as you, you see…”
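The macros themselves are not reproduced here, so the following is only a hypothetical Python sketch of the timestamp-matching idea, under a simplifying assumption that is not necessarily the macros' actual rule: each Fenwick event is credited to the most recent zone entry by either team, and only if that entry belongs to the shooting team. Times are assumed to be seconds elapsed and already sorted; zone exits, whistles, faceoffs, and line changes are ignored, so real counts would differ.

```python
from bisect import bisect_right
from collections import Counter


def credit_fenwick_to_entries(entries, fenwick_events):
    """Count Fenwick events (shots + missed shots) per zone entry.

    entries:        list of (seconds_elapsed, team) tuples, sorted by time
    fenwick_events: list of (seconds_elapsed, team) tuples
    Returns a Counter mapping entry index -> Fenwick events credited to it.

    Hypothetical rule: an event belongs to the latest entry at or before it,
    but only if that entry was made by the shooting team.
    """
    entry_times = [t for t, _team in entries]
    credited = Counter()
    for t_event, team in fenwick_events:
        pos = bisect_right(entry_times, t_event)
        if pos == 0:
            continue  # event precedes every recorded entry
        latest = pos - 1
        if entries[latest][1] == team:
            credited[latest] += 1
    return credited


# Hypothetical sequence: the MIN shot at 50 s follows an Opp entry, so it is not credited.
entries = [(10, "MIN"), (40, "Opp"), (70, "MIN")]
events = [(15, "MIN"), (22, "MIN"), (45, "Opp"), (50, "MIN"), (80, "MIN")]
print(credit_fenwick_to_entries(entries, events))
```

Summing these credited counts by entry type and dividing by the number of entries of each type gives shots per entry.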

*Actually, not all the games: one contest is missing from the Vault. The November 3rd matchup between the Wild and the Vancouver Canucks is not available. I have notified the good folks over at NHL.com about this several times, but they haven’t fixed it yet.

Results

This write-up analyzes the first 60 games of the Minnesota Wild 2011-12 season. To this point, 11,105 entries have been recorded for the Wild and their opponents.

Team-Level Results

On average, the Wild enter the offensive zone 46.4 times per 60 minutes at even strength, while their opponents gain the zone 52.7 times. While it is always better to spend more time in your offensive zone than in your defensive zone, the Wild are known to play a defensive style, so this discrepancy does not seem too alarming. However, over the course of an 82-game season, as that difference adds up, the Wild are getting far fewer chances to score than their opponents.

The Wild get 0.42 shots per entry (uncontrolled and controlled combined,) while their opponents get 0.45 shots/entry. Minnesota’s entry ratio is 44% controlled, while their opponents are split 50-50 between controlled and uncontrolled. The Wild get 0.58 shots per controlled entry and their opponents get 0.60, while uncontrolled entries lead to 0.29 and 0.30 shots respectively. The number of shots generated for the different categories of zone entry are remarkably similar, and these findings are consistent with research on other teams (links to other studies will be provided at the end of this article.)
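Two quick checks on the arithmetic: a rate per 60 minutes is a count scaled by 60 over minutes played, and the overall shots-per-entry figure should be the weighted average of the controlled and uncontrolled figures. A minimal Python sketch using the numbers reported above; the function names and the 464-entries-in-600-minutes example are illustrative only.

```python
def per_60(count: float, minutes: float) -> float:
    """Convert a raw count into a rate per 60 minutes of even-strength play."""
    return count * 60 / minutes


def blended_shots_per_entry(controlled_share: float,
                            shots_per_controlled: float,
                            shots_per_uncontrolled: float) -> float:
    """Overall shots per entry as a weighted average of the two entry types."""
    return (controlled_share * shots_per_controlled
            + (1 - controlled_share) * shots_per_uncontrolled)


print(per_60(464, 600))  # hypothetical counts
wild = blended_shots_per_entry(0.44, 0.58, 0.29)
opponents = blended_shots_per_entry(0.50, 0.60, 0.30)
print(f"Wild {wild:.4f}, opponents {opponents:.4f}")
```

With the reported splits this gives 0.4176 for the Wild, which rounds to the 0.42 above, and 0.45 for their opponents, so the team-level figures are internally consistent.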

The Wild get 0.34 shots per offensive zone faceoff (both won and lost) and 0.29 shots per defensive zone entry. I have not checked these specific figures with other researchers but 0.34 per OZF seems low.

Player-Level Results

Due to the volume of injuries the Wild sustained last year, sheer numbers of zone entries do not accurately reflect individual player performance. When I have completed the entire season, I will provide a more thorough analysis of players’ contributions. For now, I will just give some simple stats.

Discussion/Future Research

Based on this research and that of other teams, we can definitively say that controlled entries lead to more shots, by a ratio of two-to-one: teams get 0.6 Fenwick shots per carry-in or pass-in, and 0.3 per dump-in or tip-in. It is interesting to note that these figures hold true for top liners like Dany Heatley and Mikko Koivu, as well as grinders or players with a defense-first reputation like Darroll Powe. Often, such players are coached to focus on playing a dump-and-chase game and an aggressive forecheck, but in terms of pure shot production, they are just as effective at creating shots from controlled entries.

Hockey is like poker, in that if you do the same thing all the time, your opponent will catch on and adjust to negate your tactics. Dump-and-chase and strong forechecking are necessary to a well-rounded strategy for any team, but based on the data that we have available, it should be understood that such tactics lead to vastly decreased opportunities for shots, and therefore goals. But anyone who has watched a game of hockey knows there are times when a controlled entry is just not possible–sometimes the only option for a player is to whack the puck into the zone. Obviously, an uncontrolled zone entry is preferable to no zone entry, but the data suggest that coaching players who are stereotyped as “grinders” or “energy guys” to dump the puck in reflexively may be hindering the team’s offensive chances. While watching these games, it seemed to me like Nick Johnson couldn’t wait to flip the puck deep into the opponent’s zone, and even when he had a chance to carry it in he would opt for a dump in.

The data on Marek Zidlicky present a good case for using shot-based metrics instead of goal-based ones. Zidlicky had exactly ZERO goals this season in a Wild sweater, but his controlled entry ratio was better than any other defenseman’s, and the team got more shots from his controlled entries than from those of other blueliners. His departure from the team became inevitable, but the numbers here suggest that though he didn’t get the point production that was expected from him, he was still effective at creating shots.

I am very excited to be nearing the completion of this dataset, as I will be able to do quite a bit more with the numbers when I have a complete season. The information presented here is just the tip of the iceberg, and there are lots more opportunities for analysis, including: examining differences in home/away games, tracking zone entries when the team is leading, trailing, leading or trailing big (2+ goals), looking at division opponents, who are usually more familiar with team strategies…the list goes on and on. Moving forward, if and when there is a 2012-13 season, I will continue to track entries, but I am thinking about expanding the “Other” category to specify things like defensive zone turnovers, in an effort to quantify forechecking. If you have your own questions or thoughts, please do not hesitate to post a comment here, or you can get in touch with me via Twitter (@Hashtag_Hockey) or e-mail (hashtaghockey@gmail.com).

Thanks very much for reading, and until next time, remember to hit that blue line and keep going…always and forever…always and forever.

Links to Broadstreet Hockey Flyers Zone Entry Articles

Flyers Zone Entries 1: Opening Statement

Flyers Zone Entries 2: Individual Puck Handling

Flyers Zone Entries 3: Off-puck Contributions

Flyers Zone Entries 4: Team-level Results

I have been thinking a lot about missed shots lately, wondering what (if anything) they can tell us and how we could incorporate them into statistical hockey analysis. Of course, at the team level, Fenwick comparisons are informative to look at, and things like score effects can tell us how players perform in different game situations. Those kinds of numbers tell us how specific players impact the game when they are on the ice vs when they are off. But I have been wondering how we can look at missed shots for a player compared to his shots on goal (and also blocked shots, but more on that in a minute.) I decided to do some exploratory research–meaning I don’t have a particular hypothesis in mind, I just want to get my hands dirty and dig through some data to see if anything interesting comes out.

As seen in the above clip, missed shots are not all created equal. Ovechkin’s first slap shot sails wide and hits the boards, while his second attempt is driven hard and ricochets off the post. Both of these are counted the same on the stat sheet, meaning it’s difficult to say that missed shots indicate a particular skill or measure *one* thing. That said, not all shots on goal are created the same either. Some are laser shots that require a great save by the netminder, some are soft shots that are no trouble at all to handle, and sometimes a goalie will reach out and glove a shot that was otherwise going to go wide, essentially turning a missed shot into a shot on goal (I noticed Braden Holtby has a fondness for doing this.) Additionally, as in the clip above for Ovi’s second shot, hockey is a dynamic game where a skater in motion attempts to shoot a puck that is also in motion toward a fixed goal, with constantly changing angles and obstacles.

What we can say is that a skater cannot record a missed shot without having the puck and at least a reasonably good look at the net, so MS could function as an indicator of puck possession and of a player’s inclination to pull the trigger. Additionally, because the difference between a shot on goal and a missed shot can be a split-second timing difference or a half-inch change in shot trajectory, I tend to think that looking at shots on goal and missed shots together is more representative of a player’s behavior on the ice than SOG alone. What about blocked shots? It would seem logical to look at SOG, MS, and BS together because those are the only three possible outcomes when a player launches the puck off his stick, and that is exactly what Corsi does. But BS are murkier, because a BS can be due to another player’s skill in getting in front of pucks, or to the shooter’s bad decision to shoot into the legs of the other players. From my perspective, SOG + MS indicate puck possession and opportunity, whereas BS bring in more confounding variables. So for now, let’s just look at S + MS, which I will refer to as SMS.

Literature Review: I did find a couple of other articles on this topic, including a couple from Hawerchuk himself over at Arctic Ice Hockey:

http://www.arcticicehockey.com/2011/6/1/2198386/missing-the-net-redux

http://www.arcticicehockey.com/2012/2/15/2792515/stu-hackel-week-on-arctic-ice-hockey-improved-shooting-accuracy-stats

My approach is admittedly simpler than his, as I am not splitting my data into home/road or limiting it to 5v5 situations. Nevertheless, I believe that there is something to be said for just looking at aggregate numbers from a season (though it necessitates looking at historical data to add context. Again, more on this later.)

Methodology: For this study, I pulled data from the NHL.com stats portal for the top 600 players ranked on goals scored for the 2011-12 season. I have mentioned that I am primarily interested in fantasy hockey, so 600 players is certainly more than the pool of fantasy-relevant players. I added their shots and missed shots together, and then looked at the proportion of a player’s SMS that were in fact SOG. So for example, a player who had 200 S and 100 MS (300 SMS) would have a shot ratio of .67 (200/300). Knowing that the nature of shots taken by forwards and defensemen is different (e.g. shots from the blue line are more likely to miss the net), I split the data by Forward and Defense. On the one hand, I wanted to take a large sample, but on the other hand, this naturally included some outliers; see Akim Aliu (RW-CGY) with just 3 GP, 3 Sh, 2 G, and zero MS. So I filtered out players who had fewer than 50 shots. This is a completely arbitrary cutoff, but I would be hard-pressed to find a fantasy-relevant player with so few shots. The final samples included 352 Forwards and 142 Defensemen. When I charted the shot ratios of the samples, they fell rather neatly into some nice normal distributions:
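The filtering and summary steps can be sketched in Python. The `Skater` record and the sample rows are hypothetical placeholders (only Aliu's line comes from the text), and since the post does not say whether a sample or population SD was used, this sketch assumes the sample SD from `statistics.stdev`.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Skater:
    name: str
    position: str  # "F" or "D"
    shots: int     # shots on goal (S)
    missed: int    # missed shots (MS)


def shot_ratio(p: Skater) -> float:
    """Share of S + MS (SMS) that were shots on goal."""
    return p.shots / (p.shots + p.missed)


def summarize(players, position, min_shots=50):
    """(N, mean, SD) of shot ratio for one position, dropping low-volume shooters."""
    ratios = [shot_ratio(p) for p in players
              if p.position == position and p.shots >= min_shots]
    return len(ratios), mean(ratios), stdev(ratios)


players = [
    Skater("Player A", "F", 200, 100),  # hypothetical
    Skater("Player B", "F", 150, 50),   # hypothetical
    Skater("Player C", "F", 120, 40),   # hypothetical
    Skater("Akim Aliu", "F", 3, 0),     # ratio of 1.0, removed by the cutoff
    Skater("Player D", "D", 100, 45),   # hypothetical
]
n, m, sd = summarize(players, "F")
print(f"Forwards: N = {n}, M = {m:.3f}, SD = {sd:.3f}")
```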

Fig. 1: 2011-12 Shot Rate Distribution, Forwards

I’m still working on making these charts readable (I’m afraid it’s my design template), but in any case: N = 352, M = 0.74, SD = 0.04.

Fig. 2: 2011-12 Shot Rate Distribution, Defense

N = 142, M = 0.69, SD = 0.04
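The shot-ratio calculation and the 50-shot filter can be sketched in a few lines of Python. This is a minimal illustration, not the exact workflow used for the study: the `Skater` class, the `group_summary` helper, and the choice to apply the cutoff to shots on goal (S) rather than to SMS are assumptions. The handful of rows (including the hypothetical 200 S / 100 MS player from the worked example) stand in for the full NHL.com export, so they will not reproduce the N, M, and SD values reported for the full samples.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Arbitrary cutoff from the post. Assumption: it applies to shots on goal (S).
MIN_SHOTS = 50


@dataclass
class Skater:
    name: str
    pos: str     # "F" (forward) or "D" (defense)
    shots: int   # shots on goal (S)
    missed: int  # missed shots (MS)

    @property
    def s_rate(self) -> float:
        """Share of shots plus missed shots (SMS) that were on goal."""
        return self.shots / (self.shots + self.missed)


def group_summary(skaters, pos):
    """Return (N, mean, SD) of s-rate for one position after the shot filter."""
    rates = [p.s_rate for p in skaters if p.pos == pos and p.shots >= MIN_SHOTS]
    return len(rates), mean(rates), stdev(rates)


sample = [
    Skater("Hypothetical forward", "F", 200, 100),  # worked example: 200/300
    Skater("Akim Aliu", "F", 3, 0),                 # dropped: fewer than 50 S
    Skater("Jonathan Toews", "F", 185, 35),
    Skater("Zach Parise", "F", 293, 68),
    Skater("Zdeno Chara", "D", 224, 86),
    Skater("Shea Weber", "D", 230, 105),
]

for pos in ("F", "D"):
    n, m, sd = group_summary(sample, pos)
    print(f"{pos}: N = {n}, M = {m:.2f}, SD = {sd:.2f}")
```

Splitting by position inside `group_summary` keeps forwards and defensemen on separate scales, which is what later lets each group get its own mean and SD.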

The bell curve in the forward distribution is very evident, and while the defense distribution is a little choppy, statistical tests of normality back up both. After removing the low-end outliers, both distributions passed a K-S test (p = .200 for both forwards and defense) and a Shapiro-Wilk test (p = .687 for forwards; p = .634 for defense). Note that these tests read opposite to most significance tests: their null hypothesis is that the data are normal, so a small p value indicates a significant departure from a normal distribution, while a large p value means no statistically detectable departure from normal.
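To make the K-S idea concrete, here is a standard-library Python sketch that computes the K-S statistic D, the largest gap between a sample’s empirical CDF and a normal CDF fitted to it. It is only an illustration: it returns D but no p value, and because the mean and SD are estimated from the same data, the usual K-S critical values are too lenient; the Lilliefors correction handles that case (statsmodels provides one as `statsmodels.stats.diagnostic.lilliefors`, and `scipy.stats.shapiro` runs the Shapiro-Wilk test). The two samples below are synthetic, made up for demonstration.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(values):
    """K-S D statistic: the largest gap between the empirical CDF of
    `values` and a normal CDF with the sample's own mean and SD."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


n = 352  # same size as the forward sample; the values themselves are synthetic
# Evenly spaced quantiles of N(0.74, 0.04): as close to "perfectly normal" as n points get.
normal_like = [NormalDist(0.74, 0.04).inv_cdf((i + 0.5) / n) for i in range(n)]
# A strongly right-skewed sample for contrast.
skewed = [0.60 + 0.30 * ((i + 0.5) / n) ** 3 for i in range(n)]

print(f"D (normal-like): {ks_statistic(normal_like):.3f}")
print(f"D (skewed):      {ks_statistic(skewed):.3f}")
```

A small D is what “passing” looks like; the skewed sample produces a much larger gap between its empirical and fitted CDFs.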

So the data are consistent with a normal distribution. Great, but what does this mean for us? One relevant approach is to look at players’ Z-scores, meaning how far each player sits from the mean of the sample. Let’s quickly brush up on the properties of a normal distribution:

In a normal distribution, the mean sits at the highest point of the curve, while the standard deviation (SD) measures distance from the mean in either direction. As the image above shows, 68.2% of all data fall within ±1 SD of the mean, 95.4% within ±2 SD, and 99.7% within ±3 SD. A Z-score is simply the number of SDs a data point lies from the mean, with the benefit that it is directional: a Z-score of -1.2 means the data point is more than 1 SD below the mean, while +1.9 means it is almost 2 SD above the mean.
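The Z-score formula is Z = (x - M) / SD, where x is a player’s s-rate and M and SD are his position group’s mean and standard deviation. The short Python sketch below (standard library only) applies it and checks the ±1, 2, and 3 SD coverage figures against the standard normal CDF. The 0.79 s-rate is a made-up example, not a real player.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def z_score(x: float, mean: float, sd: float) -> float:
    """Signed number of SDs that x lies from the mean."""
    return (x - mean) / sd


def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within +/- k SD of its mean."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/-{k} SD: {share_within(k):.2%}")

# Hypothetical forward with a 0.79 s-rate, against the forward sample (M = 0.74, SD = 0.04).
z = z_score(0.79, 0.74, 0.04)
print(f"Z = {z:+.2f}; about {STD_NORMAL.cdf(z):.0%} of a normal population falls below that")
```

Converting Z back through the CDF turns it into a percentile, which can be easier to read than a raw Z when comparing players.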

Finally, let’s look at some specific player data! I’ll have to think of a better way to present a larger list of players; perhaps as the start of next season gets closer, I could give a full list in my draft prep kit. For now, if you are wondering about a specific player, just tweet me @Hashtag_Hockey or post a comment here.

Forwards

Jonathan Toews: 185 S, 35 MS (220 SMS), 0.84 s-rate, Z = 2.50

Zach Parise: 293 S, 68 MS (361 SMS), 0.81 s-rate, Z = 1.75

Olli Jokinen: 223 S, 54 MS (277 SMS), 0.81 s-rate, Z = 1.75

John Tavares: 286 S, 80 MS (366 SMS), 0.78 s-rate, Z = 1.00

Rick Nash: 306 S, 87 MS (393 SMS), 0.78 s-rate, Z = 1.00

Steven Stamkos: 303 S, 109 MS (412 SMS), 0.74 s-rate, Z = 0.00

Evgeni Malkin: 339 S, 117 MS (456 SMS), 0.74 s-rate, Z = 0.00

Ryan Kesler: 222 S, 95 MS (317 SMS), 0.70 s-rate, Z = -1.00

Alex Ovechkin: 303 S, 135 MS (438 SMS), 0.69 s-rate, Z = -1.25

Anze Kopitar: 230 S, 103 MS (333 SMS), 0.69 s-rate, Z = -1.25

Mike Richards: 171 S, 103 MS (274 SMS), 0.62 s-rate, Z = -3.00

Defensemen

Kyle Quincey: 168 S, 53 MS (221 SMS), 0.76 s-rate, Z = 1.75

Mark Streit: 149 S, 49 MS (198 SMS), 0.75 s-rate, Z = 1.50

Dennis Seidenberg: 174 S, 64 MS (238 SMS), 0.73 s-rate, Z = 1.00

Dan Boyle: 252 S, 98 MS (350 SMS), 0.72 s-rate, Z = 0.75

Zdeno Chara: 224 S, 86 MS (310 SMS), 0.72 s-rate, Z = 0.75

Marek Zidlicky: 70 S, 27 MS (97 SMS), 0.72 s-rate, Z = 0.75

Shea Weber: 230 S, 105 MS (335 SMS), 0.69 s-rate, Z = 0.00

Dion Phaneuf: 202 S, 91 MS (293 SMS), 0.69 s-rate, Z = 0.00

Dustin Byfuglien: 223 S, 128 MS (351 SMS), 0.64 s-rate, Z = -1.25
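Each line above can be regenerated from the raw S and MS counts. The Python sketch below assumes, as the listed figures suggest, that the s-rate is rounded to two decimals before the Z-score is taken, using the position-specific sample parameters (forwards M = 0.74, defense M = 0.69, SD = 0.04 for both). The helper names `s_rate` and `player_z` are illustrative only.

```python
# Position-specific sample parameters from the post (2011-12, after the
# 50-shot filter): (mean, SD) of s-rate.
PARAMS = {"F": (0.74, 0.04), "D": (0.69, 0.04)}


def s_rate(shots: int, missed: int) -> float:
    """Share of shots plus missed shots (SMS) that were on goal."""
    return shots / (shots + missed)


def player_z(shots: int, missed: int, pos: str) -> float:
    """Z-score of a player's s-rate within his position group.

    The s-rate is rounded to two decimals first, matching the lists above.
    """
    mean, sd = PARAMS[pos]
    return (round(s_rate(shots, missed), 2) - mean) / sd


players = [
    ("Jonathan Toews", "F", 185, 35),
    ("Mike Richards", "F", 171, 103),
    ("Dan Boyle", "D", 252, 98),
    ("Dustin Byfuglien", "D", 223, 128),
]

for name, pos, s, ms in players:
    print(f"{name}: {s} S, {ms} MS ({s + ms} SMS), "
          f"{s_rate(s, ms):.2f} s-rate, Z = {player_z(s, ms, pos):.2f}")
```

Deriving SMS from S and MS, rather than entering it as a separate column, keeps the three numbers on each line consistent with one another.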

S-Rate Over Time

Of course, because the Z-score is calculated relative to the sample, the next step is to try to approximate population parameters. I only pulled one season’s worth of data for this study, so going forward I’ll have to see how many years I can put together. Encouragingly, s-rate seems to be fairly stable from year to year:

Ovechkin–2011-12: 0.69; 2010-11: 0.70; 2009-10: 0.68; 2008-09: 0.71; 2007-08: 0.69

Toews–2011-12: 0.84; 2010-11: 0.80; 2009-10: 0.79; 2008-09: 0.81; 2007-08: 0.82

Mike Richards–2011-12: 0.62; 2010-11: 0.66; 2009-10: 0.71; 2008-09: 0.70; 2007-08: 0.70

Other Thoughts

I have not controlled whatsoever for score effects, QualComp, TOI, PP time, or anything else. Part of the reason is that I am limited to what is available from the NHL.com portal; if I can find more data to merge in (I do have some TOI data, but I’m struggling with a data-cleaning issue), I will continue to expand the scope. That said, I think this approach has a certain parsimony, which I like. As the line often attributed to Einstein goes, “Make things as simple as possible, but no simpler.”

Separating blocked shots (BS) from SOG and MS has a certain logic, but using only two of the three does not seem to tell the whole story. It would be possible to include all three in a single metric, using weights to adjust for MS and BS. Defensemen in particular can have BS totals that dwarf their MS and SOG (Mark Stuart: 60 S, 40 MS, 182 BS). Maybe this kind of metric is only applicable to forwards, given where they play and where they take their shots from.
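A weighted version might look like the Python sketch below: shots on goal divided by a weighted sum of all attempts. Both weights are hypothetical placeholders, not fitted values, and the sketch assumes BS counts the player’s own attempts that were blocked; if the BS column instead records shots the player blocked, it would not belong in this ratio. With `w_blocked=0` and `w_missed=1`, the formula reduces to the plain s-rate.

```python
def weighted_s_rate(shots: int, missed: int, blocked: int,
                    w_missed: float = 1.0, w_blocked: float = 0.5) -> float:
    """Hypothetical weighted shot ratio: S / (S + w_missed*MS + w_blocked*BS).

    Weights are illustrative placeholders, not fitted values. Assumes
    `blocked` counts the player's own attempts that were blocked.
    """
    attempts = shots + w_missed * missed + w_blocked * blocked
    return shots / attempts


# Mark Stuart's line from the text: 60 S, 40 MS, 182 BS.
plain = weighted_s_rate(60, 40, 182, w_blocked=0.0)  # same as the plain s-rate
weighted = weighted_s_rate(60, 40, 182)
print(f"plain s-rate {plain:.2f}, weighted {weighted:.2f}")
```

Keeping the weights as parameters makes it easy to test how sensitive a player’s ranking is to how heavily blocked attempts count.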

Last, I am curious whether there is anything to be learned from looking at different “types” or “styles” of player (sniper, playmaker, grinder, etc.), at players we typically classify as pass-first or shoot-first, or even at combinations such as the ill-fated Nash/Carter pairing in Columbus this year.

Thoughts?

I am very interested to hear what anyone thinks about these data and this methodology. As I said at the start, this is exploratory research and I was just messing around a bit, but I am optimistic about the tests it passed and the year-to-year reliability it has started to show. Could it be an interesting way to look at offensive production…?

Alright, enough maths…I’m going to kill some more demons in Sanctuary…