The 2014 NHL offseason will go down as the Summer of Analytics after a number of high-profile bloggers and stats guys got hired by NHL teams. It’s clear that fancy stats are not going away, and I am seeing a lot more people who are interested in learning about them, so I decided to write up this handy dandy introduction to Fancy Stats for those of you who are getting into analytics for the first time, or who have dabbled and want to beef up your chops. I’m not going to get too deep into the maths here, but if you would like to really dive into the underlying research, I’ll provide some resources at the end of this post. I will start with two stats that have funny names, ones that I think Christopher Walken would enjoy talking about: Corsi and WOWY!
Right now Corsi is the granddaddy of all fancy stats. It has been found to be one of the best stats for predicting future performance, indeed much better than goals or simply shots on goal. Despite being known as an “advanced stat,” Corsi is actually very simple when you get right down to the nuts and bolts. It’s a measurement of all of the (shots on goal + missed shots + blocked shots) that occur during a given time frame, whether that be a game, a month, a season, or a career. Corsi looks at what are thought of as the “three outcomes” of what can happen when the puck leaves a shooter’s stick. Either it can be put on goal, shot wide, or blocked, and those three categories encompass all the shooting-related “events” in a game. It’s called an “on-ice” statistic, meaning that whenever there is a Corsi event in a game, all ten skaters are credited with either a positive or negative for that event depending on which side they are on. If you want to think about it as shot plus/minus, that’s one way to wrap your head around it. Some writers prefer other terms such as “shot differential” instead of Corsi in an effort to try to be more descriptive. There is also a stat called Fenwick which is exactly the same as Corsi but removes blocked shots, which are seen as more or less random events.
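If it helps to see the definition as arithmetic, here is a minimal Python sketch. The function names and the single-game counts are made up for illustration; they do not come from any stats site.

```python
def corsi(shots_on_goal, missed_shots, blocked_shots):
    """Corsi: every shot attempt, whether on goal, wide, or blocked."""
    return shots_on_goal + missed_shots + blocked_shots


def fenwick(shots_on_goal, missed_shots):
    """Fenwick: the same tally as Corsi with blocked shots left out."""
    return shots_on_goal + missed_shots


# Hypothetical single-game totals for one team (invented numbers)
print(corsi(30, 12, 18))   # 60
print(fenwick(30, 12))     # 42
```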
We use Corsi as a proxy for puck possession because whichever team gets more Corsi events can be said to be driving play. The idea is that whichever team possesses the puck the most gets the most Corsi events, along with the most quality shots, the most scoring chances…you get the idea. There are arguments against this, but to be frank they have all been debunked. Shot quality is still a big debate in the community, but our measurements for it are not the greatest, so until we get better ways to measure it, we rely on Corsi.
The stat is expressed in a few different ways, but it’s important to remember that we only look at even-strength (5v5) events because we want to compare apples to apples. When we start to look at special teams and things like that, we introduce a lot more noise, and it makes it harder to find the signal. The easiest way to talk about Corsi is as a percentage, called Corsi For percentage, or simply CF%. It’s just the proportion of all Corsi events that belonged to one team. Example: imagine a game where the Wild tally 60 Corsi events and the Canucks tally 40. There were 100 total events, and the Wild got 60, so their CF% is 60 / 100 = 0.60, or 60%. When interpreting a CF%, anything over 50% is good, with the upper limit over the long run being about 55-58%…and maybe 60% over the short term. Minnesota was a 60% CF% team for about the first three weeks of the 2013-14 season, but they came crashing back down to earth after that. The best possession teams in the league (Hawks and Kings) are generally in the upper fifties over the course of a season.
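The Wild vs. Canucks example works out like this in Python. This is just a sketch of the arithmetic; the function name `corsi_for_pct` is my own invention.

```python
def corsi_for_pct(corsi_for, corsi_against):
    """Share of all Corsi events that belonged to our team, as a percentage."""
    total = corsi_for + corsi_against
    if total == 0:
        raise ValueError("no Corsi events recorded")
    return 100.0 * corsi_for / total


# The example from the text: Wild 60 events, Canucks 40
wild_cf_pct = corsi_for_pct(60, 40)
print(f"{wild_cf_pct:.1f}%")   # 60.0%
```

Anything that prints above 50.0% means the team had the better of the shot attempts.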
Corsi is also expressed as a rate statistic, such as CF/60 or CF/20, and it gets a little dicey here so stick with me. We tally up the difference between Corsi events for and against, and express that number as a rate per minute of ice time, usually scaled to 60 or 20 minutes. This is usually done at the player level, because it allows us to compare players who receive different amounts of ice time. Our team-level example from before doesn’t really work here, so let’s say we are looking at a single player (how about Kyle Brodziak) and over the course of a few games, he was on the ice for 40 Corsi events for and 60 Corsi events against. That comes out to -20, and hypothetically if he played 60 minutes of 5v5 time, his per-minute Corsi rate would be -20 / 60 = -0.333, meaning that on average, for every minute of 5v5 time Brodzy is on the ice, his team is being outshot by one-third of a Corsi event. Scaled to the usual 60-minute basis, that is -20 Corsi per 60 minutes. We’re getting a little abstract here, and it’s difficult to really write out a detailed explanation without getting technical. If you are having trouble wrapping your head around this, don’t worry. When I was first getting into these stats I struggled for weeks just to understand how to read the numbers. You’ll get there with practice.
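Here is the hypothetical Brodziak example as a small Python sketch. The function name and the `per` parameter are my own; the idea is simply that the same differential can be scaled to one minute, 20 minutes, or 60 minutes.

```python
def corsi_rate(corsi_for, corsi_against, minutes, per=60):
    """Corsi differential (for minus against) scaled to `per` minutes of ice time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (corsi_for - corsi_against) * per / minutes


# Hypothetical numbers from the text: 40 for, 60 against, 60 minutes of 5v5
per_minute = corsi_rate(40, 60, 60, per=1)
per_60 = corsi_rate(40, 60, 60, per=60)

print(round(per_minute, 3))   # -0.333
print(per_60)                 # -20.0
```

Because the rate divides by ice time, a fourth-liner with 8 minutes a night and a top-liner with 20 can be compared on the same scale.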
Where to find Corsi stats
There used to be a site called ExtraSkater.com that was essentially a one-stop shop for all advanced stats. It was so good, in fact, that the Toronto Maple Leafs hired the site’s creator, Darryl Metcalf, and since he’s working with them now he had to shut down the site. However, there are a number of sites that existed before ES that have the same data. The main ones are Behind the Net, Hockey Reference, and Hockey Analysis. For Behind the Net, mouse over “Statistics” and go to “Player Breakdown” to bring up the main database page. Select from the team, position, and games played dropdown menus (10 GP is a good one) to narrow your search. Click the “Update Results” button and you’re off to the races! For Hockey Analysis, simply select from the dropdown menus to begin browsing the stats. For Hockey Ref, search for a player and then click on the “Additional Stats” tab next to the player’s regular old boxcar stats. Let’s apply what we have learned and look at some actual, factual data from the MN Wild 2013-14 season. Each player is listed with his Corsi per 60 minutes, followed by his CF%.
- Mikko Koivu (+11.15; 51.6%)
- Zach Parise (+10.04; 55.0%)
- Jason Pominville (+5.04; 52.7%)
- Nino Niederreiter (+0.59; 50.4%)
- Charlie Coyle (-2.66; 48.8%)
- Dany Heatley (-11.26; 44.2%)
- Kyle Brodziak (-12.84; 43.9%)
- Jason Zucker (-15.23; 43.4%)
- Stephane Veilleux (-18.10; 40.5%)
- Cody McCormick (-23.43; 37.7%)
A couple of thoughts here…when you only have four players that have positive Corsi numbers, that means the team as a whole was not great in the possession department. This is something that the Wild need to improve if they ever want to make a deep Stanley Cup run, as we can see when we look at the most successful teams. Check out the numbers for Chicago and Los Angeles and you see quite a difference.
From the list above, we see that Koivu and Parise were the top dogs last year. When we look at usage charts in a future post, we’ll see that they are heavily deployed in the offensive zone, which enables them to put up some pretty gaudy Corsi stats. It’s very encouraging to see young players like Niederreiter and Coyle on the list as well, but the negative numbers show up in a hurry, and they get downright ugly when we get down to the third- and fourth-liners like Brodziak, Veilleux, and Heatley. Can we talk about Dany Heatley for a second? Setting aside the $7.5M AAV contract, he played 76 games at -11 and change Corsi per 60 minutes. Talk about addition by subtraction…the team will hopefully take a step forward with younger, faster players in his stead. Heatley is the Ducks’ problem now, and knowing his underlying numbers, I cringe when I read articles like this one that wonder if Heater will get a spot next to Getzlaf and Perry on their top line. He was relegated to fourth-line duty at times last year, and I think he might have even been scratched a time or two down the stretch. He just doesn’t have it anymore, and to be frank, he pulls down the possession numbers of his teammates. I don’t want to spill many more pixels on the guy, but suffice it to say I think the Wild will be a lot better this year without him. Younger, faster, and hopefully a better possession team.
I talked earlier about how looking at a single player’s Corsi is a bit abstract because it omits all the other players that were on the ice. The next thing we’re going to look at is another stat that Walken would love to talk about: WOWY!
With Or Without You (WOWY)
One way to add some context is to look at WOWY stats, or the Corsi values when two players are on the ice together compared to when they are apart. Everything is measured exactly the same way as described above, but the catch is that we can isolate the time that two guys skated together and drill into the data a little further.
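To show the idea of splitting ice time into "together" and "apart", here is a rough Python sketch. Everything in it is an assumption for illustration: the shift records, the generic player labels "A", "B", and "C", and the function names are invented, and real sites work from play-by-play data rather than a tidy list like this.

```python
def cf_pct(corsi_for, corsi_against):
    """CF% for a bucket of events, or None if nothing happened in it."""
    total = corsi_for + corsi_against
    return 100.0 * corsi_for / total if total else None


def wowy(shifts, player_a, player_b):
    """Split Corsi events by whether two players were on the ice together.

    `shifts` is a list of (players_on_ice, corsi_for, corsi_against) tuples,
    all from the point of view of the players' own team.
    """
    buckets = {"together": [0, 0], "a_without_b": [0, 0], "b_without_a": [0, 0]}
    for on_ice, cf, ca in shifts:
        a_on, b_on = player_a in on_ice, player_b in on_ice
        if a_on and b_on:
            key = "together"
        elif a_on:
            key = "a_without_b"
        elif b_on:
            key = "b_without_a"
        else:
            continue  # neither player on the ice, so the shift is ignored
        buckets[key][0] += cf
        buckets[key][1] += ca
    return {key: cf_pct(cf, ca) for key, (cf, ca) in buckets.items()}


# Invented shift data for two hypothetical linemates, "A" and "B"
shifts = [
    ({"A", "B"}, 6, 4),
    ({"A", "B"}, 5, 5),
    ({"A"}, 2, 6),
    ({"B"}, 7, 5),
    ({"C"}, 3, 3),
]
result = wowy(shifts, "A", "B")
print(result["together"])      # 55.0
print(result["a_without_b"])   # 25.0
```

In this made-up data, A looks fine next to B but gets buried without him, which is exactly the kind of pattern WOWY is meant to surface.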
Here is a nice little post I did last season that looked at Mikael Granlund’s WOWY with his most common linemates. What I found is that at the time (November 2013) Mickey was a bit of a drag on possession, meaning other guys were consistently getting higher Corsi numbers when not playing with him than when they were. Granlund is certainly developing his skills to compete in the NHL, and I will re-examine that post this year to see whether and how his numbers have changed.
Where to find WOWY stats
To find WOWY stats, go to stats.hockeyanalysis.com and use the dropdown menus to pick a team. Then, when the list of players comes up, click on a player to drill down to his numbers. You can select a single year or multiple years and look at zone-start adjusted (ZS adj) numbers if you want, but I prefer to look at just 5v5. After you select a year or a range of years, a huge table shows up for that player and all other players he skated with. It’s a pretty gnarly table but once you get a sense of what you are looking for, it becomes much more manageable. Let’s look at Granlund again, specifically his WOWY with Jason Pominville. Wild fans will fondly recall the Niederreiter-Granlund-Pominville (aka GraNinoVille) line that scored in bunches for a few weeks last year. See if you can bring up Granlund’s WOWY page yourself, and click here if you get stuck.
Granlund and Pominville are easy to find on the table because they shared a lot of minutes, so Pommer is the second player on the list. In just over 660 even-strength minutes last year, the two put up some very respectable numbers: 0.817 goals scored per 20 minutes compared to 0.454 goals against per 20, and 17.49 Corsi events for per 20 minutes versus 16.13 against (for a cool 52.0% CF%). Now, keep following the line to the right to see that when they played apart, Pominville performed very well (CF% = 53.5, GF% = 55.2) while Granlund stunk up the joint (CF% = 33.1, GF% = 20.0, both eye-poppingly bad). WOWY stats add another dimension to possession metrics in that they allow us to look at two players together instead of just one. I usually use WOWY stats to analyze linemates, though it would be really cool to be able to look at three-player combinations instead of just two. However, the data output is pretty crazy with just two, so you can imagine how adding a third would make interpreting the data that much more difficult.
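If you want to check the 52.0% figure yourself, the per-20 rates are enough. A quick Python sketch, assuming the for and against rates cover the same ice time (so their ratio matches the ratio of raw counts); the function name is my own.

```python
def pct_from_rates(rate_for, rate_against):
    """Percentage share of events 'for', given for/against rates over the same ice time."""
    return 100.0 * rate_for / (rate_for + rate_against)


# Granlund and Pominville together, per-20 rates quoted in the text
cf_together = pct_from_rates(17.49, 16.13)   # Corsi for vs. against
gf_together = pct_from_rates(0.817, 0.454)   # goals for vs. against

print(f"CF% together: {cf_together:.1f}")    # CF% together: 52.0
print(f"GF% together: {gf_together:.1f}")    # GF% together: 64.3
```

The same function works for goals, which is where the GF% columns on the table come from.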
The death of Corsi
I’m going to let you in on a little secret…Corsi is not here to stay. It will be replaced by a better, stronger statistic some day. IT’S THE CIRRRRRRRCLE OF LIIIIIIIFE, er, STAAAAATS. We use Corsi as a proxy for possession, which inherently means that when we get a better measurement, Corsi will become obsolete. You might be familiar with the NBA’s SportVU camera technology. If you haven’t heard of it, basically they took cameras that were programmed to track intercontinental ballistic missiles and re-jiggered them to recognize basketball players. This immensely powerful tool allows for some seriously cool analysis and visualizations, like this. The NHL doesn’t have the capability to do SportVU right now but they say it’s coming down the pike in five or so years. In the meantime, it’s very possible that someone who is much smarter than you or me will come along and discover something that’s even more predictive than Corsi, and then we’ll move to that. While it’s important to understand why advanced stats are more powerful than the traditional stats we’ve been using, it’s also important to understand that no one metric is the Alpha and Omega…these are the best we have right now, and we will move on to the next one when it presents itself. We stats people are fickle like that.
Wrapping it all up
Okay, so what did we learn? In a nutshell:
- Corsi is a statistic that examines the “three outcomes” of a shot.
- It is considered a possession proxy because naturally whichever team possesses the puck more, gets more Corsi events.
- Corsi is expressed either as a percent or as a rate statistic over time.
- Mikko Koivu, Zach Parise, and Jason Pominville are the guys who drive possession for the Wild, while Dany Heatley, Kyle Brodziak, and Stephane Veilleux are the ones who have the ice tilted way against them.
- We can look at the Corsi stats of two players together vs. apart, called WOWY.
- Mikael Granlund was a lousy possession player last year when he was apart from Jason Pominville, whereas Pommer performed at a high level with other linemates.
I hope this introduction to Corsi has been useful to you. Again, don’t be discouraged if you don’t understand it all right away. If you have any questions, please feel free to comment on this post, or contact me at hashtaghockey [at] gmail.com or on Twitter, @Hashtag_Hockey.
In future posts, I will look at deployment statistics like zone starts and quality of competition, in addition to other ridiculously-named stats like PDO and Zip Zop Zippity-Bop (alright, I made up that last one but honestly, based on some other names for fancy stats, you wouldn’t be surprised if that was one.) Thanks for reading!