I have been thinking a lot about missed shots lately, wondering what (if anything) they can tell us and how we could incorporate them into statistical hockey analysis. Of course, at the team level, Fenwick comparisons are informative, and things like score effects can tell us how players perform in different game situations. Those kinds of numbers tell us how specific players impact the game when they are on the ice versus when they are off. But I have been wondering how we can look at a player’s missed shots compared to his shots on goal (and also blocked shots, but more on that in a minute). I decided to do some exploratory research, meaning I don’t have a particular hypothesis in mind; I just want to get my hands dirty and dig through some data to see if anything interesting comes out.
As seen in the above clip, missed shots are not all created equal. Ovechkin’s first slap shot sails wide and hits the boards, while his second attempt is driven hard and ricochets off the post. Both of these count the same on the stat sheet, which makes it difficult to say that missed shots indicate a particular skill or measure *one* thing. That said, not all shots on goal are created equal either. Some are laser shots that require a great save by the netminder, some are soft shots that are no trouble at all to handle, and sometimes a goalie will reach out and glove a shot that was otherwise going to go wide, essentially turning a missed shot into a shot on goal (I noticed Braden Holtby has a fondness for doing this). Additionally, as Ovi’s second shot shows, hockey is a dynamic game in which a skater in motion shoots a puck that is also in motion toward a fixed goal, with constantly changing angles and obstacles.
What we can say is that a skater cannot record a missed shot without having the puck and at least a reasonably good look at the net, so MS could function as an indicator of puck possession and of a player’s inclination to pull the trigger. Additionally, because the difference between a shot on goal and a missed shot can be a split-second timing difference or a half-inch change in shot trajectory, I tend to think that looking at shots on goal and missed shots together is more representative of a player’s behavior on the ice than SOG alone. What about blocked shots? It would seem logical to look at SOG, MS, and BS together, because those are the only three possible outcomes when a player launches the puck off his stick, and that is exactly what Corsi does. BS are murkier, though, because a BS can be due either to another player’s skill in getting in front of pucks or to the shooter’s bad decision to shoot into the legs of the opposing players. From my perspective, SOG + MS indicate puck possession and opportunity, while BS bring in more confounding variables. So for now, let’s just look at SOG + MS, which I will refer to as SMS.
Literature Review: I did find a couple other articles on this topic, including a couple from Hawerchuk himself over at Arctic Ice Hockey:
My approach is admittedly simpler than his, as I am not splitting my data into home/road or isolating 5v5 situations. Nevertheless, I believe there is something to be said for just looking at aggregate numbers from a season (though it means looking at historical data to add context; again, more on this later).
Methodology: For this study, I pulled data from the NHL.com stats portal for the top 600 players ranked by goals scored in the 2011-12 season. As I have mentioned, I am primarily interested in fantasy hockey, and 600 players is certainly more than the pool of fantasy-relevant players. I added each player’s shots and missed shots together and then looked at the proportion of his SMS that were in fact SOG, which I call his shot ratio or s-rate. For example, a player with 200 S and 100 MS (300 SMS) would have an s-rate of .67 (200/300). Knowing that the nature of shots taken by forwards and defensemen is different (e.g., shots from the blue line are more likely to miss the net), I split the data into Forwards and Defense. On the one hand, I wanted a large sample; on the other hand, a large sample naturally includes some outliers, such as Akim Aliu (RW-CGY) with just 3 GP, 3 S, 2 G, and zero MS (a perfect 1.00 s-rate). So I filtered out players with fewer than 50 shots. While that is a completely arbitrary cutoff, I would be hard-pressed to find a fantasy-relevant player with so few shots. The final samples included 352 Forwards and 142 Defensemen. When I charted the s-rates of the samples, they fell rather neatly into some nice normal distributions:
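To make the calculation concrete, here is a minimal Python sketch of the s-rate and the position split, assuming the NHL.com export has already been loaded into a list of dicts. The keys (`pos`, `s`, `ms`), the function names, and applying the 50-shot cutoff to SOG rather than to SMS are all assumptions for illustration, not the actual portal format.

```python
from statistics import mean, stdev

MIN_SHOTS = 50  # the arbitrary cutoff from the text, applied here to SOG (assumption)


def s_rate(s: int, ms: int) -> float:
    """Share of a player's SMS (shots on goal + missed shots) that were on goal."""
    return s / (s + ms)


def summarize(players, position):
    """Filter one position group and return (N, mean, SD) of s-rate.

    `players` is a list of dicts with hypothetical keys:
    'pos' ('F' or 'D'), 's' (shots on goal) and 'ms' (missed shots).
    """
    rates = [
        s_rate(p["s"], p["ms"])
        for p in players
        # the filter runs before s_rate(), so tiny samples like Aliu's never count
        if p["pos"] == position and p["s"] >= MIN_SHOTS
    ]
    # stdev() is the sample standard deviation (n - 1 denominator)
    return len(rates), mean(rates), stdev(rates)


# The worked example from the text: 200 S and 100 MS
print(round(s_rate(200, 100), 2))  # 0.67
```

With a few hundred players per group, swapping `stdev()` for the population formula `pstdev()` would give practically the same SD.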
Fig. 1: 2011-12 Shot Rate Distribution, Forwards
I’m still working on making these charts readable (I’m afraid it’s my design template), but in any case: N = 352, M = 0.74, SD = 0.04.
Fig. 2: 2011-12 Shot Rate Distribution, Defense
N = 142, M = 0.69, SD = 0.04
The bell curve in the forward distribution is very evident, and while the defense distribution is a little choppy, statistical tests back up normality for both. After removing the low-end outliers, both distributions passed a K-S test (p = .200 for both forwards and defense) and a Shapiro-Wilk test (p = .687 for forwards; p = .634 for defense). Note that these tests work the opposite way from most significance tests: the null hypothesis is that the data are normal, so a small p value indicates a significant departure from a normal distribution, while a large p value means there is no statistically significant difference from normal.
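For a feel for what the K-S test is actually checking, here is a standard-library Python sketch of the Kolmogorov-Smirnov D statistic: the largest vertical gap between the empirical cumulative distribution of the s-rates and a normal curve with the same mean and SD. The function name is hypothetical, and it deliberately stops short of a p value: when M and SD are estimated from the same sample, turning D into a p value needs the Lilliefors correction or a statistics package, so this is not a reproduction of the p values reported above.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(values):
    """K-S D statistic comparing `values` to a normal fitted to them."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d
```

A smaller D means the sample hugs the fitted bell curve more closely.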
So the data are normally distributed, great. What does this mean for us? One useful application is to look at players’ Z-scores, meaning how far each player sits from the sample mean. Let’s quickly brush up on the properties of a normal distribution:
In a normal distribution, the mean sits at the highest point of the curve, while the standard deviation measures how spread out the data are around the mean (in either direction). As the image above shows, 68.2% of all data fall within +/- 1 SD of the mean, 95.4% within +/- 2 SD, and 99.7% within +/- 3 SD. A Z-score is simply how many SDs a data point is from the mean, with the benefit that it is directional: a Z-score of -1.2 means the data point is more than 1 SD below the mean, while +1.9 means it is almost 2 SD above the mean. In other words, Z = (s-rate - M) / SD.
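The Z-score formula and the percentages above can be checked with Python’s built-in `statistics.NormalDist` (Python 3.8 or later). This is a small sketch; `z_score` is a name made up for illustration. The exact share within +/- 1 SD is about 68.3%; charts often show 68.2% because they add two rounded halves of 34.1% each.

```python
from statistics import NormalDist


def z_score(x: float, m: float, sd: float) -> float:
    """How many SDs x sits above (+) or below (-) the mean m."""
    return (x - m) / sd


# Share of a normal distribution within +/- k SD of the mean
std_normal = NormalDist()  # mean 0, SD 1
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"+/- {k} SD: {share:.1%}")
# +/- 1 SD: 68.3%
# +/- 2 SD: 95.4%
# +/- 3 SD: 99.7%
```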
Finally, let’s look at some specific player data! I’ll have to think of a better way to present a larger list of players; perhaps as the start of next season gets closer, I could give a full list in my draft prep kit. For now, if you are wondering about a specific player, just tweet me @Hashtag_Hockey or post a comment here.
Forwards (M = 0.74, SD = 0.04):
Jonathan Toews: 185 S, 35 MS (220 SMS), 0.84 s-rate, Z = 2.50
Zach Parise: 293 S, 68 MS (361 SMS), 0.81 s-rate, Z = 1.75
Olli Jokinen: 223 S, 54 MS (277 SMS), 0.81 s-rate, Z = 1.75
John Tavares: 286 S, 80 MS (366 SMS), 0.78 s-rate, Z = 1.00
Rick Nash: 306 S, 87 MS (393 SMS), 0.78 s-rate, Z = 1.00
Steven Stamkos: 303 S, 109 MS (412 SMS), 0.74 s-rate, Z = 0.00
Evgeni Malkin: 339 S, 117 MS (456 SMS), 0.74 s-rate, Z = 0.00
Ryan Kesler: 222 S, 95 MS (317 SMS), 0.70 s-rate, Z = -1.00
Alex Ovechkin: 303 S, 135 MS (438 SMS), 0.69 s-rate, Z = -1.25
Anze Kopitar: 230 S, 103 MS (333 SMS), 0.69 s-rate, Z = -1.25
Mike Richards: 171 S, 103 MS (274 SMS), 0.62 s-rate, Z = -3.00
Defense (M = 0.69, SD = 0.04):
Kyle Quincey: 168 S, 53 MS (221 SMS), 0.76 s-rate, Z = 1.75
Mark Streit: 149 S, 49 MS (198 SMS), 0.75 s-rate, Z = 1.50
Dennis Seidenberg: 174 S, 64 MS (238 SMS), 0.73 s-rate, Z = 1.00
Dan Boyle: 252 S, 98 MS (350 SMS), 0.72 s-rate, Z = 0.75
Zdeno Chara: 224 S, 86 MS (310 SMS), 0.72 s-rate, Z = 0.75
Marek Zidlicky: 70 S, 27 MS (97 SMS), 0.72 s-rate, Z = 0.75
Shea Weber: 230 S, 105 MS (335 SMS), 0.69 s-rate, Z = 0.00
Dion Phaneuf: 202 S, 91 MS (293 SMS), 0.69 s-rate, Z = 0.00
Dustin Byfuglien: 223 S, 128 MS (351 SMS), 0.64 s-rate, Z = -1.25
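As a sanity check on the list, here is a Python sketch that rebuilds a line from raw S and MS using each position group’s M and SD. One assumption worth flagging: the Z values above appear to be computed from the s-rate after rounding to two decimals (Toews’ unrounded 185/220 = 0.8409 would give a Z of about 2.52 rather than 2.50), so the sketch rounds first. `GROUPS` and `player_row` are hypothetical names.

```python
# Position-group parameters (M, SD) from the 2011-12 distributions
GROUPS = {"F": (0.74, 0.04), "D": (0.69, 0.04)}


def player_row(name: str, pos: str, s: int, ms: int) -> str:
    """Format one player the same way as the list above."""
    sms = s + ms
    rate = round(s / sms, 2)  # two-decimal s-rate
    m, sd = GROUPS[pos]
    # Assumption: Z is computed from the rounded s-rate, which matches the list
    z = (rate - m) / sd
    return f"{name}: {s} S, {ms} MS ({sms} SMS), {rate:.2f} s-rate, Z = {z:.2f}"


print(player_row("Jonathan Toews", "F", 185, 35))
print(player_row("Dan Boyle", "D", 252, 98))
```

Running it reproduces the Toews line, and Boyle’s SMS comes out as 252 + 98 = 350.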
S-Rate Over Time
Of course, because the Z-score is calculated relative to the sample, the next step is to try to approximate population parameters. I only pulled one season’s worth of data for this study, so going forward I’ll have to see how many years I can put together. That said, s-rate seems to be fairly consistent from year to year:
Ovechkin–2011-12: 0.69; 2010-11: 0.70; 2009-10: 0.68; 2008-09: 0.71; 2007-08: 0.69
Toews–2011-12: 0.84; 2010-11: 0.80; 2009-10: 0.79; 2008-09: 0.81; 2007-08: 0.82
Mike Richards–2011-12: 0.62; 2010-11: 0.66; 2009-10: 0.71; 2008-09: 0.70; 2007-08: 0.70
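One way to put a number on that year-to-year consistency is to compare each player’s season-to-season SD with the 0.04 SD between forwards in a single season: a spread well under 0.04 suggests s-rate behaves like a repeatable trait for that player. Here is a minimal Python sketch using only the three players listed, in chronological order; it is illustrative, not a formal reliability test.

```python
from statistics import mean, stdev

# Season-by-season s-rates from the text, 2007-08 through 2011-12
HISTORY = {
    "Ovechkin": [0.69, 0.71, 0.68, 0.70, 0.69],
    "Toews": [0.82, 0.81, 0.79, 0.80, 0.84],
    "Mike Richards": [0.70, 0.70, 0.71, 0.66, 0.62],
}

for player, rates in HISTORY.items():
    spread = stdev(rates)  # sample SD across the five seasons
    print(
        f"{player}: mean {mean(rates):.3f}, SD {spread:.3f}, "
        f"range {min(rates):.2f} to {max(rates):.2f}"
    )
```

On these numbers, Richards’ season-to-season SD comes out close to 0.04, roughly three times Ovechkin’s, which matches the drop visible in his last two seasons.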
I have not controlled at all for score effects, QualComp, TOI, PP time, or anything else. Part of the reason is that I am working from what is available on the NHL.com portal, so if I can find more data to merge in (I do have some TOI data, but I’m struggling with a data cleaning issue), I will continue to expand the scope. However, I think this approach has a certain parsimony, which I like. As the line usually attributed to Einstein goes, “Make things as simple as possible, but no simpler.”
Separating blocked shots from SOG and MS has a certain logic, but using only two of the three outcomes also seems like it doesn’t tell the whole story. It would be possible to include all three in some kind of metric, using weights to adjust for MS and BS. Defensemen in particular can have BS totals that dwarf their MS and SOG (Mark Stuart: 60 S, 40 MS, 182 BS), although it is worth double-checking whether that BS column counts his own attempts that were blocked or opponents’ shots that he blocked, which would be a different stat entirely. Maybe this kind of metric is only applicable to forwards due to the nature of where they play and where they take their shots from.
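A weighted version of that idea might look like the Python sketch below. The weights `w_ms` and `w_bs` are placeholders, not values from any study; with `w_bs = 0` it reduces to the s-rate used in this post, and with both weights at 1 it becomes SOG over all three outcomes, the same attempt total Corsi counts. It assumes BS here means the player’s own attempts that were blocked.

```python
def weighted_s_rate(s: int, ms: int, bs: int,
                    w_ms: float = 1.0, w_bs: float = 0.0) -> float:
    """Shots on goal as a share of a weighted attempt total.

    s, ms, bs: the player's shots on goal, missed shots, and his own
    attempts that were blocked (assumption about what BS means here).
    w_ms, w_bs: hypothetical weights. The defaults give the plain
    s-rate (SOG / SMS); w_ms=1, w_bs=1 gives SOG over all attempts.
    """
    denominator = s + w_ms * ms + w_bs * bs
    if denominator == 0:
        raise ValueError("player has no recorded attempts")
    return s / denominator
```

Picking the weights is the hard part; one option would be to choose them so the metric is as stable as possible from season to season.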
Last, I am curious whether anything can be learned by looking at different “types” or “styles” of player (sniper, playmaker, grinder, etc.), at players we typically classify as pass-first or shoot-first, or even at combinations such as the ill-fated Nash/Carter pairing in Columbus this year.
I am very interested to hear what anyone thinks about these data and this methodology. As I said at the start, it’s exploratory research so I was just messing around a bit, but I am optimistic about some of the tests it passed and reliability it has started to show. Could be an interesting way to look at offensive production…?
Alright, enough maths…I’m going to kill some more demons in Sanctuary…