Utilizing Corsi to Better Analyze PDO by @BrianK_PI - PensInitiative | Pittsburgh Penguins Blog | Rumors | News

Thursday, October 30, 2014

Utilizing Corsi to Better Analyze PDO by @BrianK_PI

All data used in this post was obtained from Hockey Analysis and Puckalytics.


PDO is a statistic that has been used as a measure of luck. It's a team's shooting percentage plus its save percentage, and because every shot on goal across the league ends as either a goal or a save, the league-wide total equals 100. What makes this an indicator of luck is the theory that PDO should regress to 100 over a sufficiently large sample size, so that any team above 100 is enjoying good luck that will not last, while any team below 100 is suffering bad luck and should see a bounce back.
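As a quick illustration of that definition, here is a minimal sketch of the PDO arithmetic. The function name `pdo` and all of the goal and shot totals are made up for the example and are not taken from the data set used in this post.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return PDO on the 100-point scale: shooting % plus save %."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1.0 - goals_against / shots_against)
    return shooting_pct + save_pct

# Hypothetical totals for one team (illustrative only, not real data).
team = pdo(goals_for=160, shots_for=1800, goals_against=150, shots_against=1850)

# A toy two-team "league": each team's shots for are the other's shots against,
# so league-wide goals for equal goals against and the total comes out to 100.
league = pdo(160 + 150, 1800 + 1850, 150 + 160, 1850 + 1800)

print(round(team, 2), round(league, 2))
```

The toy league shows why 100 is the league-wide total: every goal one team scores is a goal its opponent allows.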

However, PDO as a stat suffers from a lack of sample size. If more data were available, the stat would be more accurate, but data collection is limited to the events occurring during NHL gameplay. Still, it is possible to get more data for use in PDO analysis without having teams play any additional games. Corsi is a measure of all shot attempts throughout the course of a game, not just those directed on net, and as such there are more Corsi events throughout a game than there are shots on goal. Over the course of a season, the additional events add up and give a larger data set to work with.




Before substituting Corsi attempts for shots on goal, it's important to demonstrate a correlation between the two. The first step is a comparison of a team's shooting percentage (goals scored / shots on goal) against its Corsi shooting percentage (goals scored / Corsi events). Over the previous 7 seasons, the two were strongly correlated, with an R-squared value of 0.7922. The correlation between PDO and Corsi PDO is even stronger, with an R-squared value of 0.8775.
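For anyone who wants to run this kind of comparison themselves, the sketch below computes both shooting percentages and the R-squared between them. The five team-seasons are invented numbers used only to show the mechanics, and `r_squared` is a hand-rolled helper (the squared Pearson correlation), so the output will not match the values quoted above.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical (goals, shots on goal, Corsi events) per team-season.
team_seasons = [
    (200, 2400, 4500),
    (215, 2450, 4300),
    (190, 2300, 4400),
    (230, 2500, 4700),
    (205, 2350, 4100),
]

shooting_pct = [100.0 * g / sog for g, sog, _ in team_seasons]
corsi_shooting_pct = [100.0 * g / corsi for g, _, corsi in team_seasons]

r2 = r_squared(shooting_pct, corsi_shooting_pct)
print(round(r2, 4))
```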


After establishing a strong correlation between shots on goal and Corsi events, it's possible to analyze PDO with the insight of a larger sample size. Over the 7-year period analyzed, teams averaged 1829 shots on goal per 82 games; they averaged 3379 Corsi events. The much larger sample size better allows for variance to fade into the background and for true trends to be revealed. In addition to shifting the focus from PDO to Corsi PDO, another simple transformation can help with analysis. Corsi PDO, like regular PDO, is the percentage of a team's own Corsi events that result in goals plus the percentage of opposing Corsi events that do not. For those to sum to 100, a team's Corsi shooting percentage must equal its opponents' Corsi shooting percentage, meaning that Corsi shooting percentage for divided by Corsi shooting percentage against must equal 1. And by plotting the two against each other, it's clear that they're equivalent measures of the same data.
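The relationship between the two forms can be sketched in a few lines. The function names `corsi_pdo` and `corsi_sh_ratio` and the totals passed to them are hypothetical, chosen only to show that a Corsi PDO of 100 and a for/against ratio of 1 describe the same situation.

```python
def corsi_pdo(gf, cf, ga, ca):
    """Corsi PDO: % of own Corsi events scored plus % of opposing events not scored."""
    return 100.0 * gf / cf + 100.0 * (1.0 - ga / ca)

def corsi_sh_ratio(gf, cf, ga, ca):
    """Corsi shooting percentage for divided by Corsi shooting percentage against."""
    return (gf / cf) / (ga / ca)

# Hypothetical team that converts at the same rate as its opponents (5% each).
even_pdo = corsi_pdo(200, 4000, 150, 3000)
even_ratio = corsi_sh_ratio(200, 4000, 150, 3000)

# Hypothetical team that converts at a higher rate than its opponents.
better_pdo = corsi_pdo(220, 4000, 150, 3000)
better_ratio = corsi_sh_ratio(220, 4000, 150, 3000)

print(even_pdo, even_ratio, better_pdo, better_ratio)
```

Whenever one measure sits above its break-even point (100 or 1), so does the other, which is why the ratio can stand in for Corsi PDO.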


However, Corsi shooting percentage for and against are not equivalent numbers. The idea that PDO is a luck-based number that should mysteriously revert to 100 for every team over a sufficient sample size is nothing more than a myth. The values do regress towards 100 over a large sample size, but they don't regress *to* 100. The average is 100, but the good teams find themselves slightly higher, while the bad teams find themselves slightly lower. Constructing confidence intervals around the Corsi shooting percentages for and against is an easy way to see this. By calculating a 95% confidence interval for each stat separately and pairing the opposite extremes (the low end of one with the high end of the other), it's possible to construct a range that the true ratio should fall within.
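Here is one way the range described above could be built. This is a sketch under assumptions: it uses the normal-approximation (Wald) interval for a proportion, which may not be the interval used for the charts in this post, and the goal and Corsi totals are invented.

```python
import math

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 for roughly 95%)."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1.0 - p) / trials)
    return p - half_width, p + half_width

def ratio_range(gf, cf, ga, ca, z=1.96):
    """Range for the for/against Corsi shooting % ratio from opposite extremes."""
    for_lo, for_hi = wald_interval(gf, cf, z)
    against_lo, against_hi = wald_interval(ga, ca, z)
    # Lowest plausible ratio: worst "for" over best "against", and vice versa.
    return for_lo / against_hi, for_hi / against_lo

# Hypothetical single-season totals, then the same rates sustained over 7 seasons.
one_year = ratio_range(gf=180, cf=3400, ga=170, ca=3300)
seven_years = ratio_range(gf=7 * 180, cf=7 * 3400, ga=7 * 170, ca=7 * 3300)

print(one_year, seven_years)
```

With the same underlying rates, the seven-season range is much narrower than the one-season range, which is the effect that lets some teams separate from the break-even point as the sample grows.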


By year 7, as the Corsi events grow in number and the confidence intervals narrow, five teams have the 95% CI for their true for/against Corsi shooting percentage ratio fall either entirely above 1 (100%) or entirely below it. For these five teams, over this time frame, it's possible to reject the hypothesis that their true for/against Corsi shooting percentage ratio is 1. If the data were extended beyond 7 years, the ranges would continue to narrow and more teams would likely fall on either side of 1. It's also worth noting here that the Anaheim Ducks were so much more efficient in the offensive zone than their opponents during the 2013-14 season that the difference was statistically significant. Even with just one year's sample size, they were far enough removed from 1 to claim that their true Corsi shooting percentage was better than their opponents'.


Now, having shown that the for/against ratio of Corsi shooting percentage, and by extension Corsi PDO and regular PDO, has some actual meaning, it's important to check how well it correlates with team success. Plotting the for/against Corsi shooting percentage ratio against team points returns a weak correlation. However, this shouldn't be unexpected, because efficiency is only one aspect of being a successful team. A team can score on a higher percentage of its attempted shots, but if it's badly outshot it's still going to find itself with fewer goals in the long run. Therefore, it's important to incorporate possession into the analysis, and this can be done by multiplying the for/against Corsi shooting percentage ratio by the for/against Corsi events ratio to form a statistic we'll call Corsi Driven Analytic (CDA).
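A minimal sketch of the CDA calculation as defined above follows. The function name `cda` and the totals are hypothetical; the example team is deliberately efficient but badly out-attempted, to show how possession can outweigh an efficiency edge.

```python
def cda(gf, cf, ga, ca):
    """Corsi Driven Analytic: efficiency ratio times possession ratio.

    gf, ga: goals for / against; cf, ca: Corsi events for / against.
    """
    efficiency = (gf / cf) / (ga / ca)  # for/against Corsi shooting % ratio
    possession = cf / ca                # for/against Corsi events ratio
    return efficiency * possession

# Hypothetical team: scores on 6% of its attempts versus 5% for opponents,
# but generates only 2500 attempts against 4000.
efficiency_only = (150 / 2500) / (200 / 4000)
team_cda = cda(gf=150, cf=2500, ga=200, ca=4000)

print(round(efficiency_only, 3), round(team_cda, 3))
```

Even with an efficiency ratio above 1, this made-up team ends up with a CDA below 1 because it is so heavily outshot.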


As can be seen, a team's Corsi Driven Analytic has a strong correlation to its total points. This correlation can be refined even further by incorporating special teams play and weighting by time. After combining power play and penalty kill time and results into a single variable, Time-Weighted CDA has a 91.87% correlation with team points. This could be refined even further by taking 4-on-4 play into account (which Puckalytics and Hockey Analysis don't appear to track), as well as removing the third-point effects caused by OT losses and shootouts.


By breaking things down even further, we can see what exactly CDA is really showing us. Breaking Corsi shooting percentage into its two components (goals and Corsi events) and cancelling the Corsi terms, all that's left of the stat is a simple goals for/against ratio. Winning games, and thereby accruing points, is about scoring more goals than your opponent, and over the long run the larger your goal differential, the more successful your team should be. As can be seen, there are two key components to outscoring your opponents and winning games: efficiency and possession. Setting aside the common PDO myth, teams can be more efficient than their opponents over significant sample sizes, and producing goals more efficiently will give them a leg up on their competition. However, efficiency alone is not enough to put them over the top; they also need to possess the puck more and attempt more shots than the other team to maximize their efficiency advantage.
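The cancellation can be checked with exact arithmetic. In the sketch below, GF and GA are goals for and against and CF and CA are Corsi events for and against; the function name and the totals are hypothetical and serve only to confirm the algebra written in the comment.

```python
from fractions import Fraction

# Algebra being checked:
#   (GF/CF) / (GA/CA) * (CF/CA) = (GF * CA) / (CF * GA) * (CF / CA) = GF / GA

def cda_exact(gf, cf, ga, ca):
    """CDA built from its components with exact fractions (no rounding error)."""
    sh_pct_for = Fraction(gf, cf)
    sh_pct_against = Fraction(ga, ca)
    possession = Fraction(cf, ca)
    return (sh_pct_for / sh_pct_against) * possession

# Hypothetical totals; any positive values give the same result.
built_up = cda_exact(gf=221, cf=3950, ga=198, ca=3610)
goal_ratio = Fraction(221, 198)

print(built_up == goal_ratio)
```

The Corsi totals drop out entirely, so two teams with the same goals for and against have the same CDA no matter how they split it between efficiency and possession.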


So while PDO as it's currently constructed can still function very well as a luck predictor, in reality it's an efficiency metric suffering from a serious lack of sample size, and when combined with possession it correlates strongly with team success. And as shown in this article, Corsi is more than some fad statistic that says little of importance; it's a key component of goal differential and essential to icing a successful hockey team.
