Friday, 9 December 2016

Does Save Percentage Tell You Anything About A Keeper?

Back in the late 90's, on Usenet and a floppy-disk-enabled IBM Thinkpad, beyond the reach of even the Wayback Machine, football stats were taking a giant lurch forward.

Where there had once been only goals, shots and saves had arrived.

Now, not only could we build multiple regression models where many of the predictor variables were highly correlated, we could also look at conversion rates for strikers and save rates for keepers.

In an era when the short term ruled, a keeper such as David Seaman was world class in terms of save percentage until he conceded more goals in a game at Derby than he had conceded in the previous five Premier League games. And then, as now, MotD's expecterts went to town on a singular performance.

Sample size and random variation weren't high on the list of football related topics in 1997, but it was apparent to some that what you saw in terms of save percentage might not be what you'd get in the future.

You needed a bigger sample size of shots faced by a keeper and you also needed to regress that rate towards the league average.

This didn't turn save percentage into a killer stat, but it did make the curse of the streaky 90% save percentage more understandable when it inevitably tanked to more mundane levels.

 Spot the interloper from the future.

Fast forward and now model based keeper analysis can extend to shot type, location and even include those devilish deflections that confound the best.

However, for some, save percentage remains the most accessible way to convey information about a particular keeper.

This week was a good example.

It may not be a cutting edge approach to evaluating a keeper, but for many, if not most, it is as deep as they wish to delve.

So what 1990's rules of thumb can be applied to the basic save currency of the current crop of keepers?

We know that the save percentages of this season's keepers are not the product of equally talented keepers facing equally difficult shots, because the spread of those save percentages is wider than it would be if that were the case.

Equally, random variation is one component of the observed save percentages and small sample sizes are prone to producing extremes simply by chance.
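As a rough sketch of that spread check, you can compare the observed variance of the keepers' save rates with the variance you'd expect from chance alone if every keeper shared the same true rate. The function name `spread_check` and the keeper figures below are invented for illustration, and this is only one simple way to make the comparison, not necessarily how it was done for the post.

```python
from statistics import pvariance


def spread_check(records):
    """records: list of (saves, shots_faced) pairs, one per keeper."""
    total_saves = sum(s for s, _ in records)
    total_shots = sum(n for _, n in records)
    p = total_saves / total_shots  # pooled save rate across all keepers
    rates = [s / n for s, n in records]
    observed = pvariance(rates)
    # Average binomial variance if every keeper truly saved at rate p.
    expected = sum(p * (1 - p) / n for _, n in records) / len(records)
    return observed, expected


# Made-up figures for illustration only, not real keeper data.
keepers = [(26, 31), (40, 62), (35, 58), (50, 70), (28, 50), (45, 60)]
observed, expected = spread_check(keepers)
print(f"observed {observed:.4f}, chance alone {expected:.4f}")
```

If the observed figure is clearly the larger of the two, something other than luck (talent, shot difficulty, or both) is contributing to the spread.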

If you want a keeper's raw save percentage to better reflect what may occur in the future, regress his actual percentage in line with the following table.

Stoke's (or rather Derby's) Lee Grant has faced 31 shots on goal and saved 26 and his raw efficiency is 0.839. League average is running at 0.66.

Regress his actual rate by 70% as he's faced around 30 goal attempts, so (0.839 * (1 - 0.7)) + (0.7 * 0.66) = 0.714

Based on his single season to date, a better guess at Grant's immediate future is that he saves around 71% of the shots he faces, rather than the 0.839 he has posted from 31 shots, and that is without any age related decline factored in.
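For anyone who wants to run the same sum for other keepers, here is a minimal sketch. The function name `regress_save_pct` is my own label, and the regression weight still has to be chosen for the keeper's sample size (0.7 for roughly 30 shots, as above).

```python
def regress_save_pct(saves, shots, league_avg, regression_weight):
    """Shrink a raw save percentage towards the league average.

    regression_weight is the share of the estimate taken from the league
    average; it must be chosen to suit the number of shots faced.
    """
    raw = saves / shots
    return raw * (1 - regression_weight) + league_avg * regression_weight


# Lee Grant: 26 saves from 31 shots, league average 0.66, regressed by 70%.
grant = regress_save_pct(26, 31, 0.66, 0.7)
print(round(grant, 3))
```

The more shots a keeper has faced, the smaller the weight on the league average and the closer the regressed figure stays to his raw rate.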

He's still ranked first this season, but he's really close to a scrum of other keepers with similarly regressed rates. Ranking players instead of discussing the actual numbers is often strawman territory, anyway.

There's nothing wrong with using simple data, but you owe it to your article and audience to do the best with that data.

Raw save rates from one season are the better predictor of actual save rates in the following season in just 30% of cases. The other 70% of the time you get closer to future events, and so a more accurate validation of your conclusion, if you go the extra yard and regress the raw data.

At least Party Like it's 2016, not 1999.

Wednesday, 30 November 2016

Was Aguero Quite So Lucky in 2015/16?

By now, expected goals needs very little introduction.

It attempts to quantify the importance of pre-shot variables in determining the likelihood that a goal will be scored. In essence it is a measure of chance quality and is largely determined by such things as shot type and location.

The majority of models output the likelihood that an average Premier League player would score from a given position and shot type. By aggregating the individual expected goals for each attempt and comparing this to a player's actual output we can broadly suggest the level of under or over performance.

Here's how the two 2015/16 leading non penalty scorers fared compared to the aggregated total of their expected goals,

Both over-performed,

Aguero more so than Kane, but we can better visualise this disconnect by simulating each of the 111 non penalty attempts taken by Aguero to see the range of season long goal totals predicted by the model.

There's around an 8% chance that the average player model would equal or better Aguero's 20 non penalty goals from his 111 chances in 2015/16.
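The simulation itself is simple to sketch: treat each attempt as a coin flip weighted by its expected goals value, add up the goals, and repeat for thousands of seasons. The xG values below are placeholders I've invented to make 111 attempts, not Aguero's real shot data, so the printed figure will not match the 8% quoted above.

```python
import random


def prob_at_least(xg_values, goals, n_sims=10_000, seed=1):
    """Share of simulated seasons with at least `goals` goals scored."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Each attempt is scored with probability equal to its xG.
        scored = sum(rng.random() < xg for xg in xg_values)
        if scored >= goals:
            hits += 1
    return hits / n_sims


# Placeholder chance qualities, NOT real data: 111 invented attempts.
attempts = [0.05] * 60 + [0.15] * 35 + [0.35] * 16
print(prob_at_least(attempts, 20))
```

Swap in the per-attempt values from whichever expected goals model you prefer and the same function gives the chance of matching or beating any goal total.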

Thereafter the interpretation becomes more subjective.

We may presumptuously assume that the model is perfect and Aguero was merely lucky.

281 individual players tried to score in 2015/16, so that's a lot of individual trials and someone is likely to over-perform to the level that Aguero did.

This suggests that he may subsequently enjoy more normal levels of luck and his performance may be less extreme in the future.

Or we might prefer to believe that Aguero's 20 goals were partly driven by luck, but also contain an element of finishing skill that exceeds that granted to the average player whose out of sample data went into producing the model.

As suggested by the title of the above graph, we can produce a second expected goals model that while not explicitly tailored to Aguero's (potential) finishing prowess, does contain elements that may act as a proxy for elusive finishing ability.


If we now simulate Aguero's 111 chances, but using a model that incorporates statistically significant variables that "may" relate to finishing skill, he becomes less "lucky". His 20 goals are now much less unlikely. The new model predicts he would score 20 or more in nearly 40% of seasons.

Overall, this new set of variables (I can't be more specific, sorry) inflates the individual expected goals values of players such as Aguero and Kane, who possess the new variable, and reduces the figures for those who don't.

Overall, a model that allows for a differential in finishing abilities across all players that attempt to score in a typical season reduces such indicators as the RMSE in out of sample data.

Under a model that includes a proxy term for finishing skill, Aguero only scores 1 more goal than predicted in out of sample data from 2015/16 and Kane scores exactly the number predicted by the model.

Perhaps more importantly, the second model is a substantially better fit to Aguero's 2015/16 at the individual attempt level than the first.

Tuesday, 22 November 2016

Burnley's Unsustainable Survival Technique.

Monday night's live game pitted two of the Premier League's more dour sides against each other.

WBA is the magnificent Tony Pulis' current port of call, where they are the recipients of his exclusive brand of pundit-flummoxing survival techniques.

Meanwhile, Burnley are getting by on a meagre 0.8 expected goals per game. They are conceding an average of 2.1 expected goals per game and through the grace of the probabilistic gods, actually allowing just 1.4 real goals.

That's not a Pulis approved survival approach, at least in the long term, but it has given Sean Dyche's side a few notable results.

Top of the tree of upsets was Burnley's 2-0 early season win at home to Liverpool, where Dyche tired out his opponents, not by engaging them in a pressing foot race, but by nicking an early lead and then handing them dozens of goal attempts.

All of which they missed.

The blueprint of being overwhelmed, but showcasing the England credentials of your defence, was wheeled out again at Old Trafford for the approval of Jose. And while Burnley didn't quite manage to nick a goal here, they did keep their goal intact for a welcome point.

Sandwiched in between was another expected goals beating at the hands of a top six contender where the reality better reflected the distribution of the quality and quantity of chances created in the game.

Chelsea's invite left Burnley nursing a 3-0 loss.

On the surface Burnley had made a comfortable start to their renewed acquaintance with the Premier League. "They look far better equipped for survival this time around, sitting comfortably in 9th place" might have been something that was written about the Clarets prior to Monday's game.

But scratch beneath the media soundbites and Burnley's well being is supported by a large helping of unsustainable variance.

Hats off to the 14 Burnley players who withstood the battering from an 11 and then ten man Manchester United in late October, but simulate the exercise thousands of times and a United win is by far the most likely outcome of the three possible results.
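A bare bones version of that exercise might look like the sketch below, which replays a match shot by shot and counts up the three results. The shot lists are invented for illustration and are not the real chances from Old Trafford, and the function name `simulate_match` is simply my label for it.

```python
import random


def simulate_match(home_xg, away_xg, n_sims=10_000, seed=1):
    """Return (home win, draw, away win) shares from per-shot simulation."""
    rng = random.Random(seed)
    home = draw = away = 0
    for _ in range(n_sims):
        # Each shot becomes a goal with probability equal to its xG.
        h = sum(rng.random() < xg for xg in home_xg)
        a = sum(rng.random() < xg for xg in away_xg)
        if h > a:
            home += 1
        elif h == a:
            draw += 1
        else:
            away += 1
    return home / n_sims, draw / n_sims, away / n_sims


# Invented shot lists for illustration, not the real chances from the game.
united = [0.08] * 25 + [0.3] * 5
burnley = [0.05] * 6 + [0.2]
print(simulate_match(united, burnley))
```

Run the same thing over every fixture played so far, award points for each simulated result, and you have the raw material for a simulated table.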

Simulate all 120 matches, along with the multitude of possible tables, thousands of times and Burnley's most likely current position is.....bottom, rather than the more comfortable 9th they occupied prior to match week 12.

Of course, points already won are kept, no matter how ill-gotten or deserved, and should Burnley continue their idiosyncratic survival process, coupled with their recent showing in the Championship, they probably won't finish in their current expected position of bottom in May.

They'll most probably finish 19th.

If you want to check out all of Burnley's shot maps, along with all Premier League games for the last three seasons, download the free Infogol app