Thursday, 15 June 2017

Early Season Strength of Schedule

With the major European leagues currently enjoying their summer holidays, it is left to a handful of competitions to provide club-based action until early August.

One such league is Brazil's Serie A, a fascinating mix of player and managerial churn, exciting, skilful youngsters paired with former internationals slowly winding down their illustrious careers, and lots of shooting from distance.

Tonight sees the completion of week seven of the twenty team league, so while we have accumulated some new information about the 2017/18 versions of teams such as Santos, Sao Paulo and Corinthians, and lesser known sides such as Gremio and Bahia, that information comes courtesy of an unbalanced schedule.

Prior to week seven, Flamengo had played three of the current bottom four and no side from the top half of the table, whereas Vasco da Gama had faced the current top two and only two sides outside the top ten.

The challenges faced by these two sides were likely to vary in their degree of difficulty.

Delving deeper into each side's most recent games, including matches from 2016/17, may be a more reliable indicator of their respective future prospects, but it is understandable that a six game season to date also invites comment in isolation.

Predicting the future arc of a team's season is always welcome, but celebrating achievement over a shorter time frame, even if some of it has come from a sprinkling of unsustainable randomness, also deserves attention.

How can advanced stats and strength of schedule adjustments assist?

It's natural to look firstly at the record of the side in question, but it is their opponents that possess the richest seam of data from 2017/18's fledgling season.

Vasco had played Palmeiras, Bahia, Sport, Fluminense, Corinthians and Gremio prior to last night, and in turn each of those opponents has played five other sides in addition to Vasco.

Combined, Vasco's opponents have played 36 games, nearly a full season, and have faced every side in Serie A at least once, bar Corinthians.

We have a ton of accumulated data, from goals to expected goals, for Vasco's opponents, but only six games of data for Vasco themselves, and the same is true for the remaining 19 teams.

It's natural to expect that even this limited, if recent, achievement contains some signal relating to future performance. Ben Cronin over at Pinnacle has written this article about the correlations between Premier League position after six games and final position, and the FT's John Burn-Murdoch also tweeted this excellent visualisation correlating league position during the 2013/14 season with finishing position in May.

To adjust for strength of schedule, we might take expected goal differential, rather than league position, as the performance related output for each team and utilise the interrelated collateral form lines that are created after a few weeks of the season.

Team A may not have played team B yet, but they may have played team C, who have played team B.

We are left with 20 simultaneous equations, with a side's opponents on one side and their actual expected goal differential output on the other. Solve these and we have new expected goal differentials that more fully represent the difficulty of each team's schedule.

In short, it is the basis for so-called power ratings.
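
As a rough illustration of how such a system can be set up, here is a minimal sketch. It assumes one common formulation (a side's adjusted rating equals its average differential plus the average adjusted rating of the opponents it has faced) and solves the equations by simple damped iteration. The four teams, the differentials and the function name are invented for the example and are not the Serie A figures or the exact method used for the rankings in this post.

```python
from collections import defaultdict

def schedule_adjusted_ratings(matches, sweeps=2000, damping=0.5):
    """matches: list of (team, opponent, differential_for_team), one per game.

    Assumed model: rating[t] = raw[t] + mean(rating of t's opponents),
    with the ratings centred on zero.
    """
    margins = defaultdict(list)    # per-game differentials for each team
    opponents = defaultdict(list)  # opponents faced by each team
    for team, opp, diff in matches:
        margins[team].append(diff)
        opponents[team].append(opp)
        margins[opp].append(-diff)
        opponents[opp].append(team)

    raw = {t: sum(v) / len(v) for t, v in margins.items()}
    rating = dict(raw)
    for _ in range(sweeps):
        new = {}
        for t in raw:
            opp_avg = sum(rating[o] for o in opponents[t]) / len(opponents[t])
            # Damping stops the ratings oscillating between sweeps.
            new[t] = (1 - damping) * rating[t] + damping * (raw[t] + opp_avg)
        mean = sum(new.values()) / len(new)
        rating = {t: r - mean for t, r in new.items()}
    return raw, rating

# Invented expected goal differentials for four hypothetical teams.
matches = [
    ("A", "B", -0.4), ("A", "C", -0.2),
    ("D", "C", 0.9), ("D", "B", 0.1),
    ("B", "C", 0.8),
]
raw, adjusted = schedule_adjusted_ratings(matches)
for team in sorted(raw, key=adjusted.get, reverse=True):
    print(team, round(raw[team], 2), round(adjusted[team], 2))
```

Team A has not met team D here, but both have met B and C, which is the collateral form line that lets the two be compared.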

Here's how Serie A teams were ranked by expected goals differential prior to week seven and how that ranking changed when we allowed for the sometimes heavily unbalanced schedules played.

Vasco were ranked 13th on expected goal differential, but jumped into the top 10 to 9th when their harsh early schedule was applied.

Ponte Preta dropped four places to 15th in view of an apparently benign group of initial opponents.

In theory this seems fine, but does schedule strength add anything to our knowledge of a side going forward if we choose to limit ourselves to data from just this single season?

As Ben and John have admirably demonstrated, there is a correlation between league position at various stages of the season and finishing position.

Here's a limited (due to workload) example from a previous Premier League season using simply goal differential rather than expected goals.

13 games into the 2013/14 season, Spurs were ranked 13th by goal difference, 10th when strength of previous schedule was applied and 9th in the actual table. They finished 6th.

Their position in the table after 13 games best predicted their finishing spot, followed by strength of schedule adjusted goal difference and lastly actual goal difference.

Across the league as a whole, though, strength of schedule adjusted goal difference from week 13 did best of the three. The rank correlations with finishing position were 0.77 for league position and for actual goal difference after 13 games, rising to 0.80 when strength of schedule corrections were applied and the teams re-ranked after 13 matches each.
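
For anyone wanting to repeat this kind of comparison, a rank correlation can be computed in a few lines. The sketch below assumes Spearman's coefficient with no tied ranks, which may not be exactly the measure used for the figures above, and the two lists of ranks are invented.

```python
def ranks(values):
    """Rank from 1 upwards by ascending value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman(a, b):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented example: rank after 13 games against final position for five sides.
early_rank = [1, 2, 3, 4, 5]
final_rank = [2, 1, 3, 5, 4]
print(round(spearman(early_rank, final_rank), 2))  # 0.8
```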

In short, there is signal in limited early season data and as a means of predicting final finishing position there may be some improvement if we rank by a schedule adjusted performance indicator.

All Brazilian data from InfoGolApp

Sunday, 11 June 2017

Take On Me

A quick data viz spin through some of the less readily available attacking stats from the 2016/17 Premier League.

Aside from a penalty kick, the take on is the contest in a football game that most directly pits together the attacking and defensive attributes of individuals.

The ability to break apart a defensive structure by beating an opponent in a one on one contest is a hugely valuable asset, particularly if it takes place deep in opposition territory, as demonstrated by England's opening goal against Scotland.

Similarly, conceding possession from an attacking move can also leave a side vulnerable to counters.

So who's perpetually trying to be creative in the opposition box, and who might leave his side vulnerable to a costly turnover in less advanced areas of the field?

Here are the plots for the Top Six. The left hand side of the plot is closest to the opponent's goal and players who have played few minutes have been omitted.

Data from InfoGolApp

Friday, 9 June 2017

Visualising Premier League Defence

A quick follow up to the last post on the defensive actions of players in the 2016/17 Premier League.

Numerical values are, of course, the mainstay of any attempt at a deeper analysis of the defensive side of football, but it is also useful to have a visualisation of the data from which to derive a quick overview and comparison of different players.

The previous post looked to quantify the number of defensive actions particular positions were responsible for and where on the pitch they took place.

This post looks at individual players: both the number of defensive actions they take part in, corrected to per 90 minutes, and whether these occur closer to their own goal or higher up the field.

Here are the plots for the three main challengers to Chelsea from 2016/17.

The pitch has been split into ten equal portions, sorted by distance to the centre of the defending team's goal line, and the volume of defensive actions has been counted in each of the ten sectors.

The right hand end of the spark line plot is the nearest sector to the team's own goal and the vertical line denotes half way.
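
For the curious, the counting behind a spark line of this type might look something like the sketch below. It assumes a 105 by 68 metre pitch, event coordinates measured from the player's own goal line, and ten distance bands of equal width; those choices and the function name are my assumptions for illustration rather than a description of the InfoGol data format.

```python
import math

PITCH_LENGTH = 105.0  # metres (assumed)
PITCH_WIDTH = 68.0    # metres (assumed)
N_SECTORS = 10
# Furthest possible point from the centre of the goal line is a far corner.
MAX_DIST = math.hypot(PITCH_LENGTH, PITCH_WIDTH / 2)

def sector_counts_per90(actions, minutes):
    """actions: (x, y) pairs, x from own goal line, y across the pitch.

    Returns defensive actions per 90 minutes in each of ten distance bands,
    index 0 being the band nearest the player's own goal.
    """
    counts = [0] * N_SECTORS
    for x, y in actions:
        dist = math.hypot(x, y - PITCH_WIDTH / 2)
        idx = min(int(dist / MAX_DIST * N_SECTORS), N_SECTORS - 1)
        counts[idx] += 1
    return [c * 90.0 / minutes for c in counts]

# Invented events for a hypothetical player with 180 minutes played.
events = [(5.0, 34.0), (5.0, 34.0), (100.0, 34.0)]
per90 = sector_counts_per90(events, minutes=180)
# Reverse so the right hand end is nearest the player's own goal.
spark_line = list(reversed(per90))
```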

The plot shows where and how often, either through instruction or necessity, a player is involved in the defensive efforts of his side and who is given free rein to concentrate on other aspects of team play.

All data from @InfoGolApp

Tuesday, 6 June 2017

All For One.....Defensive Lines in the Premier League.

While the attacking side of football was always going to be the focus of advanced analytics, it is perhaps surprising that defensive metrics have received so little attention.

Aside from team wide expected goals allowed, more granular defensive metrics have barely progressed beyond mere counting of defensive actions such as tackles and challenges (player on player) and interceptions and clearances (player on ball).

There are exceptions, such as the universally excellent Colin Trainor here, and there are excuses, particularly the scant availability of data relating to defensive actions.

Defence is also more overtly a team responsibility and whereas heroic last ditch tackles do occur and prevent a chance from turning into a shot, it is the overall structure and ability to create pressure on the team in possession that also exerts a great deal of influence.

So off the ball events are likely more important in defining an excellent defence than say decoy runs are to adding information to the attacking process, where shots, headers and key passes are more intuitively useful as an indication of repeatable process.

However, it can still be useful to add descriptive context to the defensive actions that are beginning to become available, such as interceptions, tackles and ball recoveries.

A simple division of how these defensive actions are shared out amongst the different playing positions, and where on average on the field these actions are happening, may add flesh to what has previously been dry bones.

There are problems, especially the diversity of team formations. Seventeen different ones were employed in the 2016/17 Premier League, with 4231 proving most popular and 3142 the least, and the definitive classification of positions also becomes less certain.

We can begin to look at the share of defensive duties undertaken by a designated position, both on average across the league and particularly within a team, along with the average area of the field where these actions occur.

These may then be a useful guide as to where, either by choice or force, a side defends its goal.

Firstly, here's a summary of the average distance from the centre of the goal where a defensive action occurred for designated positions during the 2016/17 Premier League season.

As you'd expect, strikers and attacking players carry out their defensive duties the furthest away from their own goal. Defensive midfielders creep closer to their own goal and defenders more so.

Now here's the share of defensive duties undertaken by the most commonly defined playing positions. Again there are no surprises: defensive positions are responsible for the lion's share of the recorded defensive events, but they do set baselines from which we can compare different teams to begin to tease out deviations from the norm.

Here's the average position from a side's own goal where the designated playing positions are taking part in a defensive action.

Usually strikers are involved in the defensive actions that take place highest up the field and central defenders are the group of playing positions who are mixing it nearest to their own goal.

The final column simply subtracts the first distance from the second to hopefully quantify the area within which most of a side's defensive actions are occurring.

Burnley were the most compressed defensively in 2016/17, requiring their designated strikers to help out in their own half, on average 42 yards from their own goal, while holding one of the deepest defensive lines in the league, just 27 yards from goal on average.

The majority of Burnley's defensive actions took place in a 15 yard perpendicular distance between these two lines of defensive action.
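
The arithmetic is simple enough to sketch. The individual events below are invented so that they average out to the 42 and 27 yards quoted for Burnley, and the position labels and function names are placeholders rather than anything taken from the data provider.

```python
from collections import defaultdict

def mean_distance_by_position(actions):
    """actions: iterable of (position, yards_from_own_goal) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for position, yards in actions:
        totals[position] += yards
        counts[position] += 1
    return {pos: totals[pos] / counts[pos] for pos in totals}

def defensive_band(mean_distance, front="striker", back="central defender"):
    """Depth in yards between the highest and deepest average lines."""
    return mean_distance[front] - mean_distance[back]

# Invented events chosen to reproduce the averages quoted for Burnley.
sample = [
    ("striker", 40.0), ("striker", 44.0),
    ("central defender", 25.0), ("central defender", 29.0),
]
means = mean_distance_by_position(sample)
band = defensive_band(means)
print(means["striker"], means["central defender"], band)  # 42.0 27.0 15.0
```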

Leicester's defensive efforts, in contrast, were the most spread out, with their strikers' contribution spilling out into the opponents' half of the pitch and their defence holding the deepest line of the 20 sides.

They perhaps needed a midfielder who could do the work of two.

Liverpool's high press is evident, with the average position for defensive actions from their strikers falling just inside their opponents' half of the field. Those strikers also contribute the highest proportion of defensive actions in comparison to the attackers from other teams.

Part of this inflated striking defensive contribution will be down to the Reds utilising above average numbers of strikers, but it does seem that being part of such an attacking set up requires a spirited contribution towards the defensive cause as well.

All data is taken from the InfogolApp

Saturday, 3 June 2017

Francesco Totti's Ageing Curve

40 year old Francesco Totti ended his 25 year association with AS Roma when he appeared for the final half hour of last weekend's game with Genoa.

Totti has played over 600 Serie A matches, clocking up over 47,000 minutes of playing time and scoring 250 league goals, although 71 of those have come from 12 yards. Over that period, Roma has enjoyed consistent success, rarely dropping out of the top four positions.

As league careers go, Totti's has therefore been played at a very similar level, where Roma has been regularly amongst the best club sides in Italy and he has largely avoided injury.

Between 1994-95 and 2014-15 he played at least 1,000 minutes in each and every season, peaking in 2006-07 when he managed 3,034 on-field minutes.

As such he is an ideal subject to see where his performance levels stopped improving and began that inevitable, age related decline, albeit from a very high level.

Quantifying the performance achieved by players over the course of their careers is problematical. Playing time can often be used as a proxy, but goal output is perhaps the most easily accessible benchmark for an attacking player's current and previous level of play.

Here's the inevitably noisy plot of how Totti's non penalty goals per 90 have changed from one season to the next over his long career.

The trend line indicates that improvement is replaced by decline when the horizontal axis is breached by the trend, and this occurred when Totti was just over 28 and a half.
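
A minimal sketch of the idea follows, assuming a straight line fitted by ordinary least squares to the season-on-season changes (the plotted trend may well be a different type of fit). The ages and changes are invented for illustration and are not Totti's actual record.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def peak_age(ages, changes):
    """Age at which the fitted trend in year-on-year change crosses zero."""
    slope, intercept = fit_line(ages, changes)
    return -intercept / slope

# Invented data: age at the start of each season and the change in
# non penalty goals per 90 compared with the previous season.
ages = [21, 23, 25, 27, 29, 31, 33, 35]
changes = [0.10, 0.02, 0.07, -0.01, 0.03, -0.06, -0.02, -0.08]
print(round(peak_age(ages, changes), 1))
```

A positive change means the player is still improving, so the peak sits where the downward sloping trend passes through zero.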

This doesn't of course mean that he suddenly became a poor player, merely that his best years, on average and from a scoring perspective, were most likely behind him, although, as he subsequently demonstrated, he was still capable of contributing to Roma, perhaps in a slightly different role.

So footballers are all prey to ageing, although some have such high levels of innate talent that they can, like Totti, prolong their time spent at the highest level because their aged talents are still above the peak years of less talented contemporaries.

Which brings us to tonight's Champions League final, featuring Ronaldo, a player who has had a more varied league career, spanning Portugal, England and Spain, but who, judged against his own highest standards, has himself been in decline since just prior to his 28th birthday.

Thursday, 1 June 2017

Charting Liverpool's Expected Goal Surge Under Jurgen Klopp

Everyone with a passing interest in the developing football analytics movement will by now have heard of expected goals.

While far from perfect, in common with most models, it does do an excellent job of examining the process behind the creation and attempted execution of goal scoring opportunities in a sport, such as football, which has relatively few actual scoring events.

Much of the progress in recent years has revolved around improving both the descriptive and predictive qualities of the metric by incorporating firstly the shot type as well as location and also other pre-shot information, such as how the attack developed, often used as a proxy for defensive pressure.

Less attention has been paid to how the values of expected goals are presented for individual sides or players, with often a simple cumulative addition of the expected goals created and conceded being deemed sufficient for individual matches or seasons.

Simulating each individual attempt using the expected goal value associated with that shot or header is an easy alternative, but this also converts the raw granular data into the different currency of win probability when used on a single game, or expected position or league points won if applied over a larger number of matches.
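
A bare bones version of such a simulation is sketched below. It assumes every attempt is an independent event that is scored with probability equal to its expected goal value, and the shot values, trial count and function name are invented for the example.

```python
import random

def simulate_match(home_xg, away_xg, trials=20000, seed=1):
    """Monte Carlo estimate of (home win, draw, away win) probabilities.

    home_xg / away_xg: lists of expected goal values, one per attempt.
    Each attempt is treated as an independent coin flip (an assumption).
    """
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(trials):
        home_goals = sum(rng.random() < p for p in home_xg)
        away_goals = sum(rng.random() < p for p in away_xg)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / trials, draws / trials, away_wins / trials

# Invented shot lists for a single hypothetical game.
home_shots = [0.35, 0.10, 0.05, 0.05]
away_shots = [0.60, 0.03]
print(simulate_match(home_shots, away_shots))
```

Summing the results over a run of fixtures, with three points for a win and one for a draw, gives the expected points version of the same idea.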

Retaining information about the distribution of the quality of the chances created, rather than simply taking a summation of the individual elements, is useful because of the way such distributions contribute towards the final range of possible outcomes.

Spreading your cumulative expected goals over a few shots compared to many has a different potential payoff.

In the former, you are forgoing the potential for an occasional bumper score line for the increased likelihood that you may be lucky and good enough to score at least one, which often yields some kind of return in a low score environment.

I first wrote about this here in 2014.

Here's an extreme example.

Would you rather have a penalty kick, with an ExpG value of 0.8, or eight shots, each with an ExpG value of 0.1?

The cumulative ExpG is 0.8 in both cases, but if the range of outcomes were combined in a match scenario, the lone penalty would win 35% of such games and the more frequent, but less likely attempts would win just 28% of the contests despite also summing to 0.8 ExpG.
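
The comparison can also be worked through exactly rather than by simulation. The sketch below assumes each attempt is independent and builds the full distribution of goals for each side. Under that assumption the lone penalty wins about 34% of the time and the eight shots about 26%, in the same region as the figures above, with the remaining games drawn.

```python
def goal_distribution(shot_xg):
    """Probability of scoring 0, 1, 2, ... goals from independent attempts."""
    dist = [1.0]
    for p in shot_xg:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1 - p)   # attempt missed
            nxt[goals + 1] += prob * p     # attempt scored
        dist = nxt
    return dist

def outcome_probs(a_xg, b_xg):
    """Exact (A wins, draw, B wins) probabilities."""
    a, b = goal_distribution(a_xg), goal_distribution(b_xg)
    win = sum(pa * pb for i, pa in enumerate(a)
              for j, pb in enumerate(b) if i > j)
    draw = sum(pa * pb for i, pa in enumerate(a)
               for j, pb in enumerate(b) if i == j)
    return win, draw, 1 - win - draw

penalty = [0.8]
eight_shots = [0.1] * 8
pen_win, draw, shots_win = outcome_probs(penalty, eight_shots)
print(round(pen_win, 3), round(draw, 3), round(shots_win, 3))
```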

Therefore, ExpG distribution matters.

Here's the distribution of the ExpG chances created by Brendan Rodgers' and Jurgen Klopp's Liverpool over their most recent 48 game span.

The opportunities have been grouped and counted by increasing ExpG per attempt and compared to the league average for quality and quantity, adjusted to a 48 game sequence.

The majority of chances created by a side have a relatively low expectation of scoring, ranging from near zero up to around a 15% chance.

Attempts with higher ExpG values are much less numerous, ranging up to so-called big chances, where historically a team has been more likely to score than not.
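
The grouping itself can be sketched as follows, assuming bands 0.05 ExpG wide (the width is my assumption, as are the function name and the handful of invented shot values) and a simple pro rata scaling to 48 games.

```python
from collections import Counter

BIN_WIDTH = 0.05  # assumed width of each ExpG band

def chances_per_48_games(shot_xg, games_played, span=48):
    """Count attempts in each ExpG band, scaled to a span of 48 games."""
    counts = Counter(int(xg / BIN_WIDTH) for xg in shot_xg)
    scale = span / games_played
    return {
        (round(b * BIN_WIDTH, 2), round((b + 1) * BIN_WIDTH, 2)): n * scale
        for b, n in sorted(counts.items())
    }

# Invented attempts from a hypothetical 24 game sample.
sample_xg = [0.02, 0.03, 0.07, 0.22, 0.41]
for band, volume in chances_per_48_games(sample_xg, games_played=24).items():
    print(band, volume)
```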

Therefore, a secondary axis has been used to produce definition on these much rarer groups of bigger chances.

There's not much between the current Klopp managed Liverpool and that of the man he replaced, Rodgers, in the lowest expectation region of chances created.

Klopp's side is above the average, volume-wise, for attempts in the three initial groups that are quantified by the left hand axis, ranging from 0-0.15 ExpG.

Rodgers edges ahead in the volume of chances created with a grouped ExpG of between 0.2-0.25, the counts for which are shown on the right hand axis.

Once we encounter chances with a historical likelihood of 35% or greater, the present Liverpool set up dominates both the league standard and Rodgers' Reds.

No penalty kicks have been included.

Data from @Infogol