XG is a terrible stat

When I saw Antony was on the end of that chance, my xG was 0.05%. I'd say my xOG was higher, at 0.06%.
 
It’s a good stat that can be (and regularly is) used to assess individual and collective performance. And it’s usually more telling than actual goals scored, because football is such a low-scoring game.

Just don’t think of it as a faultless model: no statistical model (currently, at least) is complex enough to give you an objective and complete picture of what’s happening. It works pretty well, though, especially over longer periods.

In fact, the Antony chance shows that xG models are perhaps more realistic in their assessment than you are… It doesn’t make his attempt any less criminal, but surely it’s counterintuitive to accuse the model that actually predicted the miss of being unrealistic?
If I have a model that is supposed to get the value of 2+2, and that model randomly picks a number between 2 and 6, then over a long enough sample the average will come out approximately right. Right? However, the model is still sh*t. You may claim the model accurately predicted that Antony would miss, but it also predicted that Amad would miss all three of his goals, including the last one, which was into an open net with no pressure.

You are not the first to claim that the model works over a longer period, but do you have any proof of that? How long do the periods need to be for it to be reliable?
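The analogy describes an estimator that is right on average but off on almost every individual guess. A quick simulation sketch (the uniform 2-to-6 "model" is the post's own hypothetical, nothing to do with any real xG provider) shows the mean settling near 4 while the average miss per guess stays around 1 however many draws you take. Whether xG behaves like this is what the thread argues about; note that xG outputs a probability rather than a point guess, so it is usually judged by whether its probabilities match observed conversion rates rather than by single outcomes.

```python
import random


def summarise(n: int, seed: int = 0) -> tuple[float, float]:
    """Simulate the hypothetical '2+2 model': each guess is a uniform draw
    between 2 and 6. Return the mean guess and the mean absolute error per guess."""
    rng = random.Random(seed)
    guesses = [rng.uniform(2, 6) for _ in range(n)]
    mean_guess = sum(guesses) / n
    mean_abs_error = sum(abs(g - 4) for g in guesses) / n
    return mean_guess, mean_abs_error


for n in (10, 1_000, 100_000):
    mean_guess, mae = summarise(n)
    print(f"n={n:>7}: mean guess {mean_guess:.3f}, average miss per guess {mae:.3f}")
```

More draws tighten the mean, but the per-guess error does not shrink, which is the distinction the post is drawing.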
 
My issue with it is that more teams should be outperforming it. If they aren’t, it’s not a real average. If you look at the stats here:

https://understat.com/league/EPL

Only three teams have outperformed it this season, Wolves, Brentford and Aston Villa. Surely the teams with the best players should be performing above average? It’s bollocks.
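Under a perfectly calibrated model, beating xG is close to a coin flip for each team, so in a typical season roughly half a league lands above it through luck alone. A simulation sketch with a made-up league (the shot counts and per-shot xG range are assumptions, not Understat data) shows what chance produces; a season with only three teams above xG would be unusual under these idealised conditions, so something other than luck would be needed to explain it.

```python
import random


def count_outperformers(league, rng):
    """Number of teams whose goals exceed their xG in one simulated season,
    assuming a perfect model: each shot goes in with probability equal to its xG."""
    count = 0
    for shots in league:
        xg = sum(shots)
        goals = sum(1 for p in shots if rng.random() < p)
        if goals > xg:
            count += 1
    return count


rng = random.Random(42)
# Hypothetical league: 20 teams, 400-600 shots each, per-shot xG between 0.02 and 0.2.
league = [[rng.uniform(0.02, 0.2) for _ in range(rng.randint(400, 600))] for _ in range(20)]
seasons = [count_outperformers(league, rng) for _ in range(300)]
print(f"average outperformers per season: {sum(seasons) / len(seasons):.1f} of 20")
print(f"seasons with 3 or fewer: {sum(1 for s in seasons if s <= 3)} of {len(seasons)}")
```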
 
The only thing it's useful for is when you lose but have the bigger xG: you can then pretend you didn't actually lose the game, to make yourself feel better. Like the Liverpool game where we lost 7-0. "But their xG is very low though".
 
If I have a model that is supposed to get the value of 2+2, and that model randomly picks a number between 2 and 6, then over a long enough sample the average will come out approximately right. Right? However, the model is still sh*t. You may claim the model accurately predicted that Antony would miss, but it also predicted that Amad would miss all three of his goals, including the last one, which was into an open net with no pressure.

You are not the first to claim that the model works over a longer period, but do you have any proof of that? How long do the periods need to be for it to be reliable?
I mean, what proof do you require? There’s tons of it, and there are many scientific articles on the matter that demonstrate the usefulness of xG as a predictive metric. 10 seconds on Google and you’re sorted.

It’s not a coincidence that clubs themselves and professional analysts use it in their work. The important thing, as with any stat, is not to forget that it’s just one metric and doesn’t show you everything there is to know about a football game.
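One common way to check the kind of proof being asked for is a calibration table: bin shots by their xG and compare the average xG in each bin with the share that actually went in. The sketch below runs on simulated shots from a model that is calibrated by construction (the xG distribution is made up), so it only shows the mechanics; on real data you would substitute a provider's shot-level xG and outcomes.

```python
import random


def calibration_table(preds, outcomes, n_bins=5):
    """Group shots into equal-width xG bins and compare the mean predicted xG with
    the observed conversion rate in each bin. For a calibrated model they match."""
    bins = [[] for _ in range(n_bins)]
    for p, scored in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, scored))
    rows = []
    for i, shots in enumerate(bins):
        if not shots:
            continue
        mean_xg = sum(p for p, _ in shots) / len(shots)
        conversion = sum(s for _, s in shots) / len(shots)
        rows.append((i, len(shots), mean_xg, conversion))
    return rows


rng = random.Random(1)
# Simulated shots: xG values from a made-up distribution (mean 0.125), and each
# outcome drawn with exactly that probability, so the "model" is calibrated by
# construction.
preds = [rng.betavariate(1, 7) for _ in range(50_000)]
outcomes = [1 if rng.random() < p else 0 for p in preds]
for i, n, mean_xg, conversion in calibration_table(preds, outcomes):
    print(f"bin {i}: {n:6d} shots, mean xG {mean_xg:.3f}, scored {conversion:.3f}")
```

Sparse bins (here the high-xG ones) are noisy, which is the same sample-size point made elsewhere in the thread.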
 
If I have a model that is supposed to get the value of 2+2, and that model randomly picks a number between 2 and 6, then over a long enough sample the average will come out approximately right. Right? However, the model is still sh*t. You may claim the model accurately predicted that Antony would miss, but it also predicted that Amad would miss all three of his goals, including the last one, which was into an open net with no pressure.

You are not the first to claim that the model works over a longer period, but do you have any proof of that? How long do the periods need to be for it to be reliable?

It's been proven countless times over by countless people, because the data to do so is readily available. For one example, this one is based on approx. 17k games from Europe's big 5 leagues and MLS:

[Image: graph]
 
My issue with it is that more teams should be outperforming it. If they aren’t, it’s not a real average. If you look at the stats here:

https://understat.com/league/EPL

Only three teams have outperformed it this season, Wolves, Brentford and Aston Villa. Surely the teams with the best players should be performing above average? It’s bollocks.
The other side of this coin is that the best teams create more chances, so they also have more opportunities to miss.
 
My issue with it is that more teams should be outperforming it. If they aren’t, it’s not a real average. If you look at the stats here:

https://understat.com/league/EPL

Only three teams have outperformed it this season, Wolves, Brentford and Aston Villa. Surely the teams with the best players should be performing above average? It’s bollocks.
It’s a bit more complicated than that.

https://arxiv.org/pdf/2401.09940

There’s an overrepresentation of shots by good finishers in the data, which skews the metric. Obviously that’s a gross oversimplification of a 23-page scientific paper, but hey ho.
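A toy version of that idea, as summarised in the post (this is not the paper's method, and the shares and conversion rates are made up): if good finishers take a large share of a given type of chance, a single pooled model learns a conversion rate above what an average finisher achieves. Average finishers then sit below xG, and the good finishers' real edge shows up smaller than it is.

```python
def pooled_rate(groups):
    """Conversion rate a single model would learn for shots of identical quality
    taken by several groups of players.
    groups: list of (number_of_shots, true_conversion_rate) pairs."""
    total_shots = sum(n for n, _ in groups)
    total_goals = sum(n * rate for n, rate in groups)
    return total_goals / total_shots


# Hypothetical split of the same type of chance: elite finishers take 400 of the
# 1,000 shots and convert 15%, everyone else takes 600 and converts 10%.
groups = [(400, 0.15), (600, 0.10)]
baseline = pooled_rate(groups)  # the "xG" the pooled model assigns to this chance
for n, rate in groups:
    print(f"shots converted at {rate:.0%}: G - xG per 100 shots = {100 * (rate - baseline):+.1f}")
```

The elite group's true edge over everyone else is 5 goals per 100 shots, but against the pooled baseline it appears as only 3, while the majority look like underperformers.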
 
The other side of this coin is that the best teams create more chances, so they also have more opportunities to miss.
But if your forwards are better, they should be more likely to finish a chance than a Wolves player. It’s nowhere near an average, and the best teams/players should be regularly outperforming it.
 
Individual chances = xG is an almost useless metric

Chances accumulated in a match = xG is OK, but you still get bullshit results on a weekly basis so it shouldn't really be used in discussions

Chances accumulated over a whole season = xG is very good metric
 
First let me say, I love stats, and I love to measure as much as possible with stats; however, finishing stats, and xG in particular, are currently terrible and show very little.

Why am I saying this? Well, Antony's chance yesterday was rated at 0.42. It was an open goal from 4 yards.

Diallo's goals were rated 0.33, 0.33 and 0.39. The first two were actually quite difficult finishes: the first from a tight angle on his weaker foot, the second a first-time finish from a lobbed ball. I mean, they were good chances, but certainly not trivial. However, the last goal was rated at 0.39???? Are you telling me there is a 60% chance that a professional footballer will miss an open goal under no pressure and with no goalkeeper in sight?
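Taking the quoted values at face value, here is how they combine. Treating the three shots as independent is my simplifying assumption; under it, the model rates Amad scoring all three as roughly a 1-in-24 event.

```python
from math import prod

# xG values quoted in the post for Amad's three goals (source: Understat match page).
amad_xg = [0.33, 0.33, 0.39]

expected_goals = sum(amad_xg)      # total xG for his three shots
p_all_three = prod(amad_xg)        # chance all three go in, assuming independent shots
p_miss_last = 1 - amad_xg[-1]      # the "60%" miss chance on the last one

print(f"expected goals: {expected_goals:.2f}")
print(f"P(all three go in): {p_all_three:.3f}")
print(f"P(last chance missed): {p_miss_last:.2f}")
```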

Are you seriously telling me that Matheus Fernandes's chance was more likely to go in than Amad's last one or Antony's? And that chance, despite him being surrounded by two United players with the goalkeeper straight ahead of him, was apparently the best chance of the game. Really? Really?

Source of XG values:
https://understat.com/match/26811

What this shows is that the model (at least Understat's) is flawed (probably too simple) and therefore cannot be relied upon to provide actual information about the quality of chances over a match. That also implies it is unreliable over the course of a season.

Yeah that's ridiculous. Absolutely no way that Antony's miss only gets scored 42% of the time.

This chance goes down as the type of 6-yard cross that players often slide in for and just miss. I doubt the algorithm is advanced enough to understand that Antony is bloody useless and had no need to slide at all.

xG is not perfect but it's still a useful snapshot

The big issue is that there are different xG numbers nowadays: Understat and Opta can often be significantly different.
 
Individual chances = xG is an almost useless metric

Chances accumulated in a match = xG is OK, but you still get bullshit results on a weekly basis so it shouldn't really be used in discussions

Chances accumulated over a whole season = xG is very good metric

Given how long it's been around, I fail to understand how some people still don't get this. If you want to evaluate an individual chance then use your fecking eyes, because that's not what xG provides.

People seem to think xG should be higher for most shots, but they forget it depends on the pass in, the defenders' positions and the keeper's position. Lots of variables can make a shot easy or difficult, and xG covers the range, not the specifics.
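Those variables are what an xG model is fitted on. A common approach is a classifier such as logistic regression over shot features; the sketch below is a toy version with only shot location and a header flag, and its coefficients are invented, so the numbers it prints are illustrative only. Real provider models use more features, and which ones they include differs between providers. Anything the model has no feature for (the keeper's position, defenders in the way) cannot move its output.

```python
import math

GOAL_HALF_WIDTH_M = 3.66  # a goal mouth is 7.32 m wide


def shot_geometry(x_m: float, y_m: float) -> tuple[float, float]:
    """Distance to the centre of goal and the visible goal angle in degrees for a
    shot taken x_m metres out from the goal line and y_m metres off-centre."""
    distance = math.hypot(x_m, y_m)
    angle = math.atan2(GOAL_HALF_WIDTH_M - y_m, x_m) - math.atan2(-GOAL_HALF_WIDTH_M - y_m, x_m)
    return distance, math.degrees(angle)


def toy_xg(x_m: float, y_m: float, header: bool = False) -> float:
    """Hypothetical logistic xG model. The coefficients are invented for this
    example and do not come from Understat, Opta or any other provider."""
    distance, angle = shot_geometry(x_m, y_m)
    z = -1.0 - 0.10 * distance + 0.04 * angle - (0.8 if header else 0.0)
    return 1 / (1 + math.exp(-z))


shots = [
    ("6-yard box, central, foot", 5.5, 0.0, False),
    ("6-yard box, central, header", 5.5, 0.0, True),
    ("edge of the box, central", 16.5, 0.0, False),
    ("tight angle, close to the byline", 3.0, 10.0, False),
]
for label, x, y, header in shots:
    print(f"{label:34s} xG = {toy_xg(x, y, header):.2f}")
```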
 
Doesn’t really change my mind. Salah has underperformed his xG three seasons running. Is he a below average finisher?

Salah tends to score similarly to his xG, which makes sense. He scores a lot of goals because he's very good at getting into goal scoring opportunities, the hardest thing to do for attackers. I've never thought of him as someone especially good at finishing the chances he gets compared to other players.

Finishing is a pretty basic skill compared to other things footballers do, so when top level athletes practice it a lot you wouldn't expect much variance.
 
I don't find it all that useful for judging a player. But for an entire team, combined with other stats to get the full picture of a performance, I think it's quite useful.
 
It's been proven countless times over by countless people, because the data to do so is readily available. For one example, this one is based on approx. 17k games from Europe's big 5 leagues and MLS:

[Image: graph]
It's a general problem with statistics. People see statistics that are based on averages and compare them to outliers and proclaim that statistics are useless.

And your graphs are too woke anyways, no one is going to believe your gay agenda rainbow graphs. /s
 
xG works both in the sense that teams and players over time tend to score a similar amount of goals to xG created, and in the sense that xG created is the best predictor we have of how many goals players and teams will score in the future.
I remember playing around with these numbers once for some PL seasons and iirc the extra predictive value was not that large.
It's a general problem with statistics. People see statistics that are based on averages and compare them to outliers and proclaim that statistics are useless.
Well the thing is that in the case of xG most people don't care about broad averages, but about particulars, because we want to use the data to analyze specific teams in specific seasons.
 
I have been Googling whether clubs use xG as a metric in analysing their own players' performance or transfer targets and can't find anything. That, to me, suggests it's not considered much.
 
I have been Googling whether clubs use xG as a metric in analysing their own players' performance or transfer targets and can't find anything. That, to me, suggests it's not considered much.

In his book, Ian Graham (the Liverpool data guy) noted that the rule of thumb they used for predictive modelling was 70% xG and 30% actual goals. He also talks about the different versions of the xG model they used.

Given they were one of the leaders in football analytics at the time, it's safe to say it is still very much used by football clubs.
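As described in the post, that rule of thumb amounts to a weighted average of past xG and past goals. How the weight was chosen and whether it was applied per game, per 90 or per season are not stated here, so this is only a sketch of the arithmetic; the function name, the `xg_weight` parameter and the striker's numbers are hypothetical.

```python
def blended_forecast(past_xg: float, past_goals: float, xg_weight: float = 0.7) -> float:
    """Weighted average of past xG and past goals, following the 70/30 rule of
    thumb attributed to Ian Graham's book. Names here are illustrative only."""
    if not 0.0 <= xg_weight <= 1.0:
        raise ValueError("xg_weight must be between 0 and 1")
    return xg_weight * past_xg + (1 - xg_weight) * past_goals


# Hypothetical striker: 20 xG and 26 goals last season.
print(f"{blended_forecast(20, 26):.1f}")
```

Because the weight leans towards xG, a hot finishing streak only partly carries into the forecast.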
 
It's a useful but simplistic stat, and the idea of it is to be a general statistic. People seem to take it as some set-in-stone commentary on a chance rather than a stat that gives a rough indication of chances taken (or not) during a game or over many games.

If it was as shit as many people like to suggest, it would be wildly inaccurate, and yet over the course of a season it's shockingly accurate.

It's not massively accurate for single instances because it can't be. How many defenders were there in front of the player? Was it an open goal? Was the player on their favoured foot? Was the ball in the air or on the ground? Shin height, waist height? Was the ball rolled to them gently, were they under pressure, etc. etc.

It's a statistical model. Overall it's very good with a large data set. It's like averages. We all accept that the chance of flipping a coin and getting heads is 50:50. You could flip a coin 10 times and get 8 heads and 2 tails. That doesn't mean the odds are wrong; it just means you're trying to apply statistics based on a large data set to a small, isolated one, which obviously won't adhere to them that well until you have a lot more data points.
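The coin example in numbers, as a plain binomial calculation: eight or more heads in 10 fair flips happens about 5.5% of the time, so it is not rare, while the same 80% heads rate over 100 flips is vanishingly unlikely.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(at least k successes in n independent trials, each with probability p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(f"P(8+ heads in 10 flips):   {prob_at_least(8, 10):.4f}")
print(f"P(80+ heads in 100 flips): {prob_at_least(80, 100):.1e}")
```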
 
One of the issues with xG is that if you are under the cosh for most of the game and get a dodgy penalty, it gives your xG a massive boost and won't necessarily be an accurate reflection of the chances you created.

I could be wrong but that's my understanding of it.
 
It's always struggled with those empty-net/goalkeeper-out-of-position chances. As far as I know it doesn't take that into account; instead it looks more at the average shot taken from that position, and most of the time the goalie and/or defenders are going to be there.

There was a goal I remember Jamie Vardy scoring a few seasons back that was rated lower than a penalty, even though it was into an empty net from closer range. 1 minute in on here.


By and large I find it alright.
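The explanation above (the model looks at the average shot from that position) can be made concrete with made-up numbers: if the keeper is usually set, the rare empty-net chances barely move the average for that spot. Whether a given provider includes goalkeeper or defender positions is not something this sketch can confirm, and all three inputs are hypothetical.

```python
def feature_blind_xg(p_keeper_set: float, p_keeper_out: float, share_keeper_out: float) -> float:
    """What a model with no goalkeeper-position feature would learn for one spot:
    the conversion rate averaged over all shots taken from there."""
    return (1 - share_keeper_out) * p_keeper_set + share_keeper_out * p_keeper_out


# Hypothetical numbers for one shooting position: 30% conversion with the keeper
# set, 90% with the keeper out of position, which happens on 5% of these shots.
estimate = feature_blind_xg(p_keeper_set=0.30, p_keeper_out=0.90, share_keeper_out=0.05)
print(f"{estimate:.2f}")
```

Every shot from that spot gets the same value, so an empty-net chance is rated close to a normal one.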
I wonder whether clubs actually use xG to drive decisions; does anyone have any information on this? Personally, I don't pay much attention to it. I feel like it never reflects the result of a game correctly. Missing great chances and scoring improbable ones has always been part of the game.

100% they do. It's a big part of why teams have taken fewer and fewer long-range shots over the past decade or so. Clubs have come to believe that you're better off holding on to the ball and keeping possession than taking a shot with a 3% chance of going in or whatever, hoping to fashion a better chance if they just keep the ball. They might lose it too, but if you can fashion one 20% chance of scoring in exchange for the five 3% shooting chances you turn down, you're doing better(ish). It's a bit more complicated than that, but that's the theory.
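The arithmetic behind that example, assuming the five long shots are independent and ignoring the risk of losing the ball while waiting for the better chance (which the post flags): the long shots add up to 0.15 xG and about a 14% chance of at least one goal, both below the single 20% chance.

```python
# The post's example: five speculative 3% shots versus one 20% chance.
# Assumes independent shots and ignores the risk of losing the ball.
LONG_SHOT_XG = 0.03
N_LONG_SHOTS = 5
GOOD_CHANCE_XG = 0.20

xg_long = LONG_SHOT_XG * N_LONG_SHOTS                      # expected goals from the long shots
p_at_least_one = 1 - (1 - LONG_SHOT_XG) ** N_LONG_SHOTS    # chance that any of them goes in

print(f"five long shots: {xg_long:.2f} xG, P(at least one goal) = {p_at_least_one:.3f}")
print(f"one good chance: {GOOD_CHANCE_XG:.2f} xG, P(goal) = {GOOD_CHANCE_XG:.3f}")
```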

Clubs also use it as part of their scouting tools. Not as the be-all and end-all, but lower down as a way to produce shortlists if you can't afford a huge scouting network, or, if you do have a nice scouting network, as the deciding factor between two forwards they like the look of.

Brentford and Brighton were leaders with it, not just xG but a load of other data. Boiling it down to just xG would be too simple, but it certainly played its part, and it's helped them uncover lots of bargain players who have done well for them, some of whom they've been able to sell for huge profits. Their owners come from gambling backgrounds, where they first used it to win money and get rich, then used it on the other side of the fence as bookmakers, and are now putting it into practice as owners.

Salah was famously signed because Liverpool's data team loved him, even though Klopp preferred someone else.

Someone needs to post what the xG was over the entire PL season vs how many goals were scored (I cba)

If those figures match up, then it'll be fair to say that xG is a pretty reliable model. Just not for isolated incidents; it's more about averages.

Well, they'll never be perfect. Some seasons they're over, some seasons they're under. Some teams will perform under or over one year, then the opposite the next; same for players.

It's a long run thing. One season from one league is short term even if we don't think of it as that.
 
I think the issue with xG in online discussion is that people tend to ignore variance, because they underestimate the number of samples you need to reduce the xG-G difference to near zero.

I've posted this one before:
Assuming a perfect model without bias and measurement error, cumulative goals will lie within a +/- 20% range around cumulative xG after one season (with 95% confidence). Variation within this range is therefore not necessarily out- or underperformance of xG, but can simply be driven by the natural variation of Bernoulli random variables. Similar rules are +/- 50% for 10 games and +/- 33% for half a season.
That's why I said earlier that xG is useful but not as useful as people want it to be: you need the gap between xG and goals to be quite large to definitively rule out 'luck.' And if it's that large you can probably see it in other ways.
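The quoted ±20% / ±33% / ±50% rule can be checked with the same assumptions it states: each shot is an independent Bernoulli trial with probability equal to its xG, so goals have variance equal to the sum of p(1 - p), and a normal approximation gives the 95% range. The shot profile below (13 shots per game at 0.12 xG each) is a hypothetical choice; with it the ranges come out at roughly ±47%, ±34% and ±24%, in the same ballpark. For a fixed total, spreading xG over fewer, bigger chances lowers the variance and narrows the range, so the exact percentages depend on a team's shot profile.

```python
import math


def relative_95_range(shot_xgs: list[float]) -> float:
    """Half-width of an approximate 95% interval for goals, as a fraction of xG,
    assuming a perfect model: each shot is an independent Bernoulli trial with
    success probability equal to its xG (normal approximation)."""
    xg = sum(shot_xgs)
    variance = sum(p * (1 - p) for p in shot_xgs)
    return 1.96 * math.sqrt(variance) / xg


# Hypothetical team: 13 shots per game at 0.12 xG per shot.
for label, games in [("10 games", 10), ("half season", 19), ("full season", 38)]:
    shots = [0.12] * (13 * games)
    print(f"{label:12s} +/- {relative_95_range(shots):.0%}")
```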
 
The other issue I have with xG is that I see two separate claims pop up (tbf, not always made by the same people):

Claim 1: xG is highly predictive of the goals a team scores.
Claim 2: There is such a thing as "good finishing" and the best finishers should overperform their xG and the worst ones should underperform it.

The two claims are fine on their own. The issue is that if good finishing exists, then that is a hidden variable that is not accounted for in the xG model, so the stronger it is, the weaker the predictive value of the model should be. And you would need to find a separate way to quantify this hidden variable and see if it matches the gap between xG and G that you are detecting.
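One way to look for that hidden variable is to check whether over- or underperformance repeats: if finishing skill exists, a player's G - xG in one season should correlate with the next. The simulation below is a sketch with a made-up skill spread and shot profile (and the same shot profile reused in both seasons); it shows the logic of the test, not a real-world result.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def season_residual(rng, shots, skill):
    """G - xG for one simulated season. `skill` scales each shot's true scoring
    probability relative to its xG (1.0 = a finisher the model describes exactly)."""
    goals = sum(1 for p in shots if rng.random() < min(1.0, p * skill))
    return goals - sum(shots)


def repeatability(skill_sd, n_players=300, shots_per_season=100, seed=7):
    """Correlation of G - xG between two simulated seasons for the same players.
    Finishing skill is a hidden multiplier drawn around 1.0 with spread skill_sd."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_players):
        skill = max(0.0, rng.gauss(1.0, skill_sd))
        shots = [rng.uniform(0.03, 0.3) for _ in range(shots_per_season)]
        first.append(season_residual(rng, shots, skill))
        second.append(season_residual(rng, shots, skill))
    return pearson(first, second)


print(f"no finishing skill:   r = {repeatability(0.0):.2f}")
print(f"with finishing skill: r = {repeatability(0.2):.2f}")
```

With no hidden skill the year-to-year correlation hovers around zero; with it, the gap between G and xG starts to repeat.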
 
Salah tends to score similarly to his xG, which makes sense. He scores a lot of goals because he's very good at getting into goal scoring opportunities, the hardest thing to do for attackers. I've never thought of him as someone especially good at finishing the chances he gets compared to other players.

Finishing is a pretty basic skill compared to other things footballers do, so when top level athletes practice it a lot you wouldn't expect much variance.
It doesn’t make sense if you’re a good finisher and it was a real average.

Finishing is absolutely not a basic skill.
 
The good old days, when some stats were displayed while the ball was out of play or at half time. Among those you normally wanted to look at two things: shots and possession. And while they didn't paint the whole picture, you could look at them at a glance and already tell how the match was going. They're very good stats for representing how well a team is doing. None of this predictive bullshit that needs to be averaged out over the course of a season to be quite accurate. If that's the case, why do people (including football analysts) keep bringing it up to justify bad performances or to judge players' finishing? If that's the purpose, then talk about it when the season ends; but by then it'd be quite useless, because teams usually change a lot after a season ends, and to evaluate it you'd have to wait another whole season! I hope one day journalists will actually ask seasoned managers about xG rather than always trying to stir up controversy. It would be interesting to hear how the actual pros view it: not the maths pros, but the actual managers.
 
The deal-breaker for me when it comes to xG is that 99% of players tend to hover around 0 for their G - xG metric. So if I accept xG as a statistic, then I have to accept that everybody is an average finisher, which I don't believe.

Also those graphs that supposedly prove something are very unconvincing. Basically they tell me that xG is a marginally better predictor of future points per game, than shots on target. From that I can't conclude that xG "works", only that it is marginally better than a very crappy stat like shots on target.
It's like saying that my broom is a good weapon of war because it outperforms my toothbrush. Not really, no.
 
I agree, it's very flawed. As far as I know, every offside chance is also automatically 0.00 xG, so you could create 10 chances that are all whistled back by VAR on the tightest decisions and, according to xG, you didn't create a single chance?
 
I agree, it's very flawed. As far as I know, every offside chance is also automatically 0.00 xG, so you could create 10 chances that are all whistled back by VAR on the tightest decisions and, according to xG, you didn't create a single chance?
Not only according to xG, you didn't create a single chance in that example
 
Also those graphs that supposedly prove something are very unconvincing. Basically they tell me that xG is a marginally better predictor of future points per game, than shots on target. From that I can't conclude that xG "works", only that it is marginally better than a very crappy stat like shots on target.
It's like saying that my broom is a good weapon of war because it outperforms my toothbrush. Not really, no.
:lol: I was just thinking that. Developing a complicated 'big data' model that evaluates hundreds of thousands of goalscoring opportunities in order to get an improvement in R^2 of 0.03 over simply counting shots on target!
 
Over thousands of games it's pretty accurate. Obviously, shorter runs can have massive variance.
 
As others have alluded to, the issue isn't xG it's people misusing it and thinking that it predicts isolated occurrences. It doesn't.
 
Let's be honest, it's being hyped because there's massive money behind it, even though it means absolutely nothing in the long term. Ultimately it's a flawed detail that more often than not simply states the obvious.
 
It is a stat. It is not the end of the discussion type stat, but it generally gives a decent indication of the game. There are instances that fall through the cracks and are not properly represented.
 
If it's rubbish for one match, it's rubbish for the season. And it is absolutely awful.

It's like refereeing decisions evening out throughout the season, it's nonsense