What should the Premier League standings really look like?

Almost a third of the way into the season, the Premier League table looks like a gambler’s nightmare. Many teams are nowhere near where they were supposed to be, and every week has brought new surprises.

For instance, most of us expected Sheffield United to be relegated without much of a fight, and not a single living soul would have put money on them to finish in the Top 10. Yet here they are, in 5th. Leicester have also surprised quite a few people and are making us feel like it’s 2016 all over again.

Meanwhile, Arsenal, Manchester United and Tottenham have all looked so bad it makes you wonder how long their respective coaches will last.

So what to make of all this? Since points in the standings don’t always tell the real story of how a team is playing (especially this early in the season), I tried to build a model to predict where each team should really be. The idea is not far from Expected Goals (xG), except that I looked not only at the shots a team generates, but also at the way it plays.

To do so, I used data from the last three full Premier League seasons, going back to 2016-17. For every team, I looked at a variety of advanced statistics (16 metrics in total), such as the number of shots taken from inside the box, the number of counter-attacks allowed per game and the number of long passes (all stats come from a Kaggle dataset, linked at the end). The goal was to predict where a team should finish in the standings based on how it plays. By looking at past data, I built a profile of what a team in a given position in the table should look like: the better teams, for instance, tend to have more of the ball, create more quality shots and allow fewer quality chances.

The model was built using a simple linear regression. Here is what the standings should look like after matchweek 12, according to the model.

A predicted position of 3.5 means that a team is playing slightly better than a 4th-placed team historically has, but slightly worse than a 3rd-placed team.
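
The post doesn’t include its code, so here is only a minimal sketch of how such a model could be fitted, not the author’s implementation. It assumes two hypothetical per-game features (box shots taken and box shots allowed) instead of the 16 real metrics, uses made-up training rows rather than the Kaggle data, and solves least squares in plain Python; in practice `numpy.linalg.lstsq` or scikit-learn’s `LinearRegression` would do the fitting. The last line shows how a fractional prediction like 3.5 falls out of the fit.

```python
# Minimal sketch (not the author's code): ordinary least squares predicting
# final league position from per-game stats. The two features and all rows
# are HYPOTHETICAL; the real model used 16 metrics from the Kaggle dataset.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return weights [w0, w1, ...] solving the normal equations (X^T X) w = X^T y."""
    X = [[1.0, *r] for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


def predict(w, row):
    """Predicted (possibly fractional) league position for one team."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


# Hypothetical history: (box shots taken per game, box shots allowed per game)
rows = [(6.0, 0.5), (4.0, 1.0), (3.5, 2.0), (2.0, 1.5), (2.5, 2.5), (1.0, 3.0)]
positions = [1, 6, 9, 11, 12, 16]

w = fit_ols(rows, positions)
print([round(v, 6) for v in w])            # [12.0, -2.0, 2.0]
print(round(predict(w, (5.0, 0.75)), 6))   # 3.5: between 3rd and 4th
```

The made-up rows lie exactly on a plane, so the fit recovers the coefficients exactly; real season data would leave residuals, and a prediction of 3.5 is read as “between a typical 3rd and 4th-placed side.”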

The first thing that really catches the eye is that both Manchester City and Liverpool have negative predicted positions. Because a linear regression isn’t capped at 1st place, that simply means they have been so dominant that they are playing even better than a first-place team typically has! They truly are in a league of their own. Chelsea are the only other side playing like a legitimate Top 4 team, with Leicester not far behind.
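
Negative positions are a natural side effect of extrapolating a straight line. The sketch below uses a single hypothetical feature (per-game difference between box shots taken and allowed) and invented numbers: it fits a line to past finishes, then scores a team whose numbers exceed every champion in the sample, and the line simply keeps going past 1.

```python
# Minimal sketch: a one-feature least-squares line showing how a linear model
# can predict a league position below 1. The feature and all numbers are
# HYPOTHETICAL, chosen only to illustrate extrapolation.


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for y ≈ intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    return y_mean - slope * x_mean, slope


# Hypothetical history: per-game box-shot difference vs. final position.
box_shot_diff = [3.0, 2.0, 0.0, -2.0, -3.0]
final_position = [1, 4, 10, 16, 19]

intercept, slope = fit_line(box_shot_diff, final_position)

# A team better than every champion in the sample (3.5 > 3.0) lies outside
# the historical range, so the prediction drops below 1st place.
dominant_team = intercept + slope * 3.5
print(intercept, slope, dominant_team)  # 10.0 -3.0 -0.5
```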

Another shocking finding is that Arsenal and Manchester United should actually consider themselves lucky: according to the model, they have been playing like mid-table teams, and nothing better. Meanwhile, Tottenham’s struggles seem very real. They are 14th, and the numbers don’t suggest they should be much higher. A closer look at their statistics shows that they are allowing more shots from inside the penalty area than any other team in the league, at 2.6 per game! They have also struggled to create high-quality chances, with only 0.7 shots from inside the box per game (17th in the Premier League). Tough times for Spurs!

Unsurprisingly, Sheffield United should consider themselves lucky to be 5th (yes, 5th!!!). The numbers suggest they will come back down to earth soon, as any rational fan would have predicted.

The season is still young, and a lot can change. It will be interesting to see who can get back to where they belong, who can right the ship and who can avoid the seemingly inevitable downfall!

Kaggle dataset used to build the model: https://www.kaggle.com/englader/epldataset

2 thoughts on “What should the Premier League standings really look like?”


  1. Nice effort with all the data but don’t forget that Leicester won the Premier League with what… 29% possession?

    I’m a cricket fan so love stats but do think that they can be misleading. In football they might celebrate that someone has covered the most grass but maybe they did so because they were out of position or making misguided runs.


    1. You are right that stats don’t always show the whole picture! That must always be taken into account! The idea is always to use them to find patterns/trends, not the ultimate truth!

