Archive for November, 2012

Recently, an interesting article was published on the NHLNumbers website examining skaters’ shooting tendencies, or how much a given player could be considered a “Puck Hog.” The article, written by Ben Wendorf, proposed an elegant way of measuring how frequently a player opts to shoot the puck.

Simply, the metric looks at the proportion of on-ice Fenwick (A’) that was represented by a specific player’s own Fenwick shots (A), where Fenwick counts unblocked shot attempts: shots on goal plus missed shots. This number, A / A’, can be thought of as the Attempted Shot Ratio.

Ben found that this number did not correlate with things like shooting percentage or quality of teammates, but that players showed the same tendency year after year. Thus, he concluded, “With high repeatability, and little connection to shooting talent, player development, or teammates, this seems much more like a behavioral activity.”

The article got me thinking about some of my own research that I did way back in the first month or so of writing this blog. I was messing around with players’ shots on goal versus missed shots, and determined what proportion of Fenwick shots were shots on goal (SOG/FEN), which I called Shooting Ratio. I had similar findings—that players could be productive even if they missed the net a lot, and the behavior was stable year after year. Something that stuck out to me during that research was the almost perfectly normal distribution of skaters’ ability to hit the net, so it was not much of a surprise to me when I checked with Ben and found that Attempted Shot Ratios also showed that perfect bell curve that we stat heads swoon over.

After mulling things over, I thought it would be insightful to combine Ben’s “Puck Hog” metric with my missed shot data to provide more of a two-dimensional look at player behavior. Or, simply,

How often do players choose to shoot, and how often do they hit the net?

Method:

I decided to look at just Minnesota Wild forwards from 2011-12 for this post, although I am working on a bigger dataset to look at a larger sample. I crunched all the numbers to determine players’ even-strength Attempted Shot Ratios (how often does the player take the shot when he’s on the ice) and Shooting Ratios (what proportion of those Fenwick shots are shots on goal) using the definitions given above. All computations use 5v5 data.
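As a minimal sketch of the two ratios described above, the Python below computes them from raw counts. The function names and the sample counts are my own illustrations (not real player data), and the code assumes that a player's Fenwick total is simply his shots on goal plus his missed shots.

```python
def attempted_shot_ratio(player_fenwick: int, on_ice_fenwick: int) -> float:
    """Share of the team's on-ice Fenwick attempts (A') taken by the player (A)."""
    if on_ice_fenwick <= 0:
        raise ValueError("on-ice Fenwick must be positive")
    return player_fenwick / on_ice_fenwick


def shooting_ratio(shots_on_goal: int, missed_shots: int) -> float:
    """Share of the player's own Fenwick attempts that were shots on goal (SOG/FEN)."""
    fenwick = shots_on_goal + missed_shots  # assumption: Fenwick = SOG + misses
    if fenwick <= 0:
        raise ValueError("player must have at least one unblocked attempt")
    return shots_on_goal / fenwick


# Hypothetical 5v5 counts for one forward (made up for illustration).
sog, missed, on_ice = 60, 20, 320

asr = attempted_shot_ratio(sog + missed, on_ice)
shr = shooting_ratio(sog, missed)
print(f"Attempted Shot Ratio: {asr:.3f}")
print(f"Shooting Ratio:       {shr:.3f}")
```

With these made-up counts the player takes a quarter of the unblocked attempts while he is on the ice and puts three quarters of them on net.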

Population Parameters

After initially just comparing the 11-12 Wild players to each other, I realized that I would be better served using ‘population’ data that came out of Ben’s and my research. Specifically,

Attempted Shot Ratio: Mean = 0.244, SD = 0.05 (N = 2177 NHL forwards, 2007-2012)

Shooting Ratio: Mean = 0.737, SD = 0.074 (N = 492 NHL forwards > 10 GP, 2011-12)

Results:

The table below lists Minnesota forwards with their Attempted Shot Ratio and Shooting Ratio. The z-scores associated with each are calculated using the population parameters listed above. Because several players wore the same number at different times, some jersey numbers have been replaced with dummy numbers (Wellman, Taffe, Veilleux).

Name (No.) AttShRatio ASR-Z ShRatio ShR-Z
Almond, Cody (27) .205 -.789 .778 .551
Bouchard, Pierre-Marc (96) .265 .414 .803 .885
Brodziak, Kyle (21) .266 .430 .712 -.338
Bulmer, Brett (19) .233 -.229 .700 -.500
Christensen, Erik (26) .205 -.787 .727 -.131
Clutterbuck, Cal (22) .317 1.462 .686 -.684
Cullen, Matt (7) .229 -.302 .872 1.824
Heatley, Dany (15) .255 .221 .716 -.280
Johnson, Nick (25) .281 .747 .817 1.083
Koivu, Mikko (9) .218 -.513 .748 .144
Latendresse, Guillaume (48) .314 1.391 .703 -.464
McIntyre, David (34) .214 -.594 .667 -.951
McMillan, Carson (45) .215 -.572 .571 -2.238
Ortmeyer, Jed (41) .383 2.774 .645 -1.241
Palmer, Jarod (79) .265 .426 .846 1.475
Palmieri, Nick (17) .323 1.575 .747 .128
Peters, Warren (43) .273 .581 .662 -1.010
Powe, Darroll (14) .279 .700 .697 -.538
Rau, Chad (36) .256 .248 .700 -.500
Setoguchi, Devin (10) .284 .796 .780 .578
Taffe, Jeff (97) .286 .834 .750 .176
Veilleux, Stephane (98) .228 -.326 .696 -.559
Wellman, Casey (95) .309 1.302 .706 -.421
Zucker, Jason (16) .345 2.017 .800 .851
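For anyone who wants to check the z-score columns, here is a short Python sketch that applies the population parameters listed above to one row of the table (Jason Zucker). The constant and function names are my own. Because it starts from the rounded ratios printed in the table, the result can differ from the published z-score in the second or third decimal place.

```python
# Population parameters quoted in this post.
ASR_MEAN, ASR_SD = 0.244, 0.05    # Attempted Shot Ratio
SHR_MEAN, SHR_SD = 0.737, 0.074   # Shooting Ratio


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value sits above or below the mean."""
    return (value - mean) / sd


# Zucker's rounded ratios from the table: .345 and .800.
zucker_asr_z = z_score(0.345, ASR_MEAN, ASR_SD)
zucker_shr_z = z_score(0.800, SHR_MEAN, SHR_SD)

print(f"ASR-Z: {zucker_asr_z:.3f}")  # table lists 2.017
print(f"ShR-Z: {zucker_shr_z:.3f}")  # table lists .851
```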

When charted, the z-scores are informative because each player falls into one of four quadrants. Attempted Shot Ratio is on the X-axis, and Shooting Ratio is on the Y-axis. For lack of better terminology, the quadrants are I: shooting inclined + accurate shots; II: shooting inclined + inaccurate shots; III: shooting averse + inaccurate shots; IV: shooting averse + accurate shots. (I really dislike using “accurate/inaccurate” here but I just don’t have a better pair of descriptors right now. If you have suggestions please share!)
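The quadrant labels can be sketched as a small Python function. It follows the numbering used in this post rather than the usual counterclockwise math convention. The function name is hypothetical, and treating a z-score of exactly zero as "inclined" or "accurate" is my own assumption, since the post does not say how ties are handled.

```python
def quadrant(asr_z: float, shr_z: float) -> str:
    """Classify a player by the sign of his two z-scores (this post's numbering)."""
    inclined = asr_z >= 0   # assumption: zero counts as shooting inclined
    accurate = shr_z >= 0   # assumption: zero counts as accurate
    if inclined and accurate:
        return "I"    # shooting inclined + accurate
    if inclined:
        return "II"   # shooting inclined + inaccurate
    if not accurate:
        return "III"  # shooting averse + inaccurate
    return "IV"       # shooting averse + accurate


# A few (ASR-Z, ShR-Z) pairs taken from the table above.
players = {
    "Zucker": (2.017, 0.851),
    "Clutterbuck": (1.462, -0.684),
    "McIntyre": (-0.594, -0.951),
    "Koivu": (-0.513, 0.144),
}

for name, (asr_z, shr_z) in players.items():
    print(f"{name}: quadrant {quadrant(asr_z, shr_z)}")
```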


Discussion:

The first thing that jumps out to me is that the majority of players fall into quadrants I and II…so it’s a team full of puck hogs? Apparently so. Recall that the ASR data here are compared to the five-year population parameters, so the Wild have a lot of guys who choose to shoot the puck. I am trying to avoid value judgements here since a higher ASR doesn’t necessarily represent “better hockey,” just a different playstyle. On the other hand, Shooting Ratio is less subjective: making the goalie make a save is almost always better than missing the net. I know sometimes a player will shoot wide on purpose to try to make a play, but for now I am assuming that SOG > missed shot.

So the Wild had guys who were looking for shots, yet they had the fewest shots per game and goals per game in the league. What to make of this? I wouldn’t necessarily expect a team-wide pattern to emerge when looking at this kind of plot, but it would seem to make sense that having more of a balance between shooters and set-up guys would produce more quality scoring chances. I would also think that a team distribution like this would indicate that the Wild players were predictable: the opponents probably figured out quickly that guys would be looking to shoot instead of making the extra pass, so perhaps that predictability suppressed their shot totals by making them easier to read. I’m just spitballing here, but if the Wild ranked high in a stat that measured blocked shots against, there might be something to this theory. But honestly at this stage in the research I’m still just digesting the numbers so again, if you have thoughts let’s hear them!

At the individual level, it is important to keep in mind that there isn’t really a “good play vs bad play” distinction here; it’s all just a measurement of behavior. Because we know that both ASR and ShR are normally distributed, the individual results are more interpretable, but a player who has a high attempted shot ratio isn’t necessarily doing his team any favors if he consistently misses the net more than his teammates. I do like the quadrant setup because it helps classify players. Matt Cullen doesn’t shoot much anymore, but when he does pull the trigger he hits the net more than any of his teammates. Mikko Koivu, known as a playmaker, prefers to make an extra pass rather than shoot, and is about average at hitting the net. Pierre-Marc Bouchard, another playmaker, actually shoots the puck more than I would have guessed, and is an accurate shot. Nick Johnson, despite having a strong tendency for the uncontrolled zone entry, pulls the trigger often and makes the goalie make a save. Dany Heatley is as close to zero on both scales as any player, which is probably a surprise to most of us. Cal Clutterbuck and Guillaume Latendresse have a strong tendency to shoot, but are somewhat wild.

In the end, it is my hope that this kind of multidimensional shooting analysis can be helpful in our understanding of players’ tendencies. This kind of stuff can’t be used in isolation, but rather it all works together to paint a picture and gives us a peek inside the heads of the players we enjoy watching. The game of hockey is just a series of decisions, nearly all of which must be made at a speed where there is no time for conscious thought, so they are instinctual instead. I will be excited to continue this analysis and expand it to different teams and different years. If you have questions or comments I would love to hear them. You can comment here on this page or through Twitter, @Hashtag_Hockey. Thanks for reading!

After correctly predicting 49(!) states in the 2008 presidential election and 50(!!) in 2012, Nate Silver is the King of Stat Nerds. He sits on a throne made entirely of TI-83 graphing calculators and he wears a gold-plated Casio calculator wristwatch. His fivethirtyeight blog was recently pulling in a staggering 20% of the New York Times web traffic, and while the rest of the pundits were reporting a dead heat in the polls, Silver’s model ended up at more than 90% for Obama leading up to election night.

Silver cut his teeth in baseball sabermetrics before moving to politics, and the backlash he faced in recent months strongly parallels the anti-stats sentiments that are still going strong in the sports world. But now Silver has been completely legitimized, and his recently released book (currently number 17 on Amazon and rising) is going to propel this whole “math” thing into the mainstream conversation, the likes of which we haven’t seen since Freakonomics and Moneyball.

Furthermore, the concept of “big data” will gain steam as the stories are written in the coming weeks about how the Obama administration campaigned more efficiently with less money and mopped the floor with the Romney camp. The zeitgeist is changing, and the legitimacy of data mining, reliance on sophisticated computer models, and quantitative analysis in general is going to become much more accepted in all parts of society. Fancy stats are certainly nothing new in sports, but I genuinely think that with Silver’s domination of the political prediction game, sports stats can hang on for the ride. I wanted to share some of my thoughts on the recent goings on because I really think we are at a crossroads with the way big data and stats are going to be received.

OBJECTIVITY: The Numbers Don’t Care Who Wins

Try to imagine for a minute what would have happened if the election had been flipped: if Romney had been leading in Silver’s model going into the election and then won. The reception in the media surely would have been different, but I wonder if the classic “I don’t like your numbers, therefore they’re untrue” argument would have been as obvious, or if the ad hominem attacks would have flown so freely.

Silver faced so much backlash because his forecast flew directly in the face of the pundits, and not just the Republicans. The narrative coming from talking heads on both sides of the aisle was that the election was “Razor Tight” (their words, not mine) and right up to November 6th, they were saying it was anybody’s election. Of course, it was in their interest to push that narrative, as they all have airtime to fill and quotas to meet for their magazines or newspapers or blogs. The fact that Silver was making an objective forecast was upsetting because it ran so contrary to the status quo. The fivethirtyeight model simply took in poll results, weighted them appropriately, and output a prediction of what the electoral votes would be. Silver didn’t set out to create a model that would show Obama was winning…he set out to create a model that would reflect the truth.

Everything in the above paragraph is directly applicable to sports writers and sports researchers. The Old School is mad at the New School because they have made their living on their experience and their “gut calls” and now those things are being invalidated by numerical models. It has been said that the difference between researchers and pundits is that one starts with a question and constructs an argument based around the answers to that question, and one starts with an argument and searches for facts that support the argument. The sports and political worlds both have become so reliant on pundits who until recently have been using just the most basic of stats that this whole “objectivity” thing is still very new, and therefore, scary.

But over time, both political stats and advanced sports stats will gain legitimacy as they persist in the mainstream consciousness. And as long as they are good stats, they will persist (more on this in a bit). And obviously Silver is CRUSHING IT for his part, so there’s no reason to think he’s going anywhere.

TRANSPARENCY: Hey, come over here and look at what’s behind this curtain!

With the Old School way of doing things, the pundits don’t have to be concerned with transparency…they say what they think, they explain it, and boom, there’s your transparency right there. Here’s my opinion, and by definition it’s unfalsifiable so…we’ll just keep going round and round because there is no right or wrong, it’s just all conjecture. With the New School data-driven approach, there is now a need for transparency that didn’t exist before. Without transparency, it’s all too easy to try to discredit stats by calling them biased (see the section on Objectivity above).

Sports researchers (and Nate Silver) walk a fine line when it comes to the transparency of data and formulas (formulae?). They must reveal enough about their methodology so that others can understand what they did, other researchers can help develop the measures, and laypeople can get what the numbers mean. But let’s be really real right now: the way to make money off these kinds of things is to withhold enough of the nuts and bolts so that nobody else can figure them out. Silver was incorrectly slammed for tinkering too much with his machine, oversampling Democrats for example, or generally just using his “Magical Formula.” But he does in fact go into great detail about his methodology. I guess what it boils down to is that research should be as transparent as is necessary, but each person will have a different level of comfort when it comes to divulging all the secrets. For Silver’s part, I have been impressed with how open he has been and how much of the nitty gritty he gets into.

It Helps (A Lot) to be Right

It’s hard to imagine Silver’s model being wrong, given that his forecast reached over 90% for Obama. If the election had in fact been closer and the forecast had been, say, 55/45, then there would have been a much different discussion surrounding it. But over the last two presidential elections, he is batting .990 (99 correct state calls out of 100), which is obviously a stellar showing. The nature of Silver’s forecast is such that he puts himself out on quite a limb, predicting each state and the overall winner. Yes, he does in fact give probabilities for each state, so if he were wrong he would be able to defend the forecast. But he’s NOT wrong, which is very good for him.

In hockey research I keep coming back to the whole Minnesota Wild thing from the 2011-12 season. Long story short, when the Wild were number one in the NHL, the advanced stats community was vehement that they would fall back to earth, based mostly on the PDO statistic. And they were RIGHT, the Wild did regress in a big way. The stats crowd will always have that one in their pocket, and if we ever get a season again, that stat will have a very established place in the stats conversation. With the legitimacy gained from last year, I think it will creep beyond the blogs and Twitter into the mainstream conversation. Just as Silver’s model has been shown to be profoundly correct, the PDO analysis has proven correct and useful.

Final Thoughts

The quote from Moneyball that stuck with me the most (I can’t remember if it’s in the book too) is: “The first guy through the wall always gets bloody.” Nate Silver took a lot of shots in the media this year, but he has come through the other side squeaky clean. His forecast has been amazingly accurate, and I think it’s going to go a long way toward the legitimization of Big Data and statistical analysis not just in politics, but in sports and all other areas of society. I have been very impressed with the way Silver handles his work and conducts himself in public and in the media. I hope these ramblings have made some kind of sense; I am still working out what I think are the lessons to be learned from his unprecedented success. Let me know what you think about whether and how the fivethirtyeight model’s success will help usher in the Big Data movement and what lessons the sports stats community can learn. Don’t forget to follow me on Twitter @Hashtag_Hockey, and thanks so much for reading!