It’s a good stat that can be (and regularly is) used in assessing individual and collective performance. And it’s usually more telling than actual goals scored because football is such a low-scoring game.
Just don’t think of it as a faultless model; no statistical model (currently, at least) is complicated enough to give you an objective and full picture of what’s happening. It works pretty well though, especially over longer periods.
In fact, the Antony chance shows that the model is perhaps more realistic in its assessment than you are… it doesn’t make his attempt any less criminal, but surely it’s counterintuitive to blame the model that actually predicted the miss for being unrealistic?
If I have a model that is supposed to get the value of 2+2 and that model randomly picks a number between 2 and 6, eventually over a longer sample I will get approximately the correct number on average. Right? However the model is still sh*t. You may claim that the model accurately predicted Antony would miss, but it also predicted that Amad would miss all three of his goals, including the last one that was an open net, with no pressure.
You are not the first to claim that the model works over a longer period, but do you have any proof of that? How long do the periods need to be for it to be reliable?
I mean, what proof do you require? There’s tons of it and there are many scientific articles on the matter that prove the usefulness of xG as a prediction metric. 10 seconds on Google and you’re sorted.
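For what it's worth, the 2+2 thought experiment above is easy to simulate. This is only a rough sketch: it assumes "between 2 and 6" means a uniformly random whole number from 2 to 6 inclusive, and the function name `simulate_random_adder` is made up for illustration. It separates the two things being argued about: how close the guesses are on average, and how far off a typical single guess is.

```python
import random


def simulate_random_adder(n_guesses, seed=0):
    """Guess the value of 2+2 by picking a random integer from 2 to 6.

    Returns the average guess and the average absolute miss per guess.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    guesses = [rng.randint(2, 6) for _ in range(n_guesses)]
    mean_guess = sum(guesses) / n_guesses
    mean_abs_error = sum(abs(g - 4) for g in guesses) / n_guesses
    return mean_guess, mean_abs_error


mean_guess, mean_abs_error = simulate_random_adder(100_000)
print(f"average guess: {mean_guess:.3f}")
print(f"average miss per guess: {mean_abs_error:.3f}")
```

The average guess settles close to 4 while a typical single guess is still off by more than 1, which is the difference between being right in aggregate and being right about an individual event.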
My issue with it is that more teams should be outperforming it. If they aren’t, it’s not a real average. If you look at the stats here:
https://understat.com/league/EPL
Only three teams have outperformed it this season: Wolves, Brentford and Aston Villa. Surely the teams with the best players should be performing above average? It’s bollocks.
The other side of this coin is that the best teams will create more chances and have a higher chance of missing goal opportunities.
It’s a bit more complicated than that.
But if your forwards are better they should be more likely to finish a chance than a Wolves player. It’s nowhere near an average and the best teams/players should be regularly outperforming it.
Could you summarise that in less than a million words?
First let me say, I love stats, and I love to measure as much as possible with stats; however, finishing, and particularly xG, is currently a terrible stat and shows very little.
Why am I saying this? Well, Antony's chance yesterday was evaluated at 0.42. It was an open goal from 4 yards.
Diallo's goals are evaluated at 0.33, 0.33 and 0.39. Actually, the first two goals are quite difficult finishes: the first one from a tight angle on his weaker foot, the second one a first-time finish from a lobbed ball. I mean, they are good chances, but certainly not trivial. However, the last goal is evaluated at 0.39? Are you telling me there is a 60% chance that a professional footballer will miss an open goal under no pressure and with no goalkeeper in sight?
Are you seriously telling me that Matheus Fernandes' chance was more likely to be scored than Amad's last or Antony's? And that chance, despite him being surrounded by two United players with a goalkeeper straight ahead of him, is apparently the best chance of the game. Really? Really?
Source of xG values:
https://understat.com/match/26811
What this shows is that the model (at least understat's) is flawed (probably too simple) and therefore cannot be relied upon to provide actual information about the quality of chances over a match. That also implies it is unreliable over the course of a season.
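As a sanity check on what those numbers actually claim, here is a small sketch using only the xG values quoted above (0.33, 0.33 and 0.39 for Amad's three goals). It treats the three shots as independent, which is my assumption and not something understat states, and the variable names are just for illustration.

```python
from math import prod

# xG values quoted in the post above for Amad's three goals.
amad_xg = [0.33, 0.33, 0.39]

# Assumption: each shot is an independent event with probability = its xG.
expected_goals = sum(amad_xg)                 # goals expected from the three chances
p_scores_all = prod(amad_xg)                  # chance that all three go in
p_misses_all = prod(1 - p for p in amad_xg)   # chance that none go in
p_scores_at_least_one = 1 - p_misses_all

print(f"expected goals from the three chances: {expected_goals:.2f}")
print(f"P(scores all three): {p_scores_all:.3f}")
print(f"P(misses all three): {p_misses_all:.3f}")
print(f"P(scores at least one): {p_scores_at_least_one:.3f}")
```

Read this way, a value like 0.39 is a claim about how often such chances go in on average, so a hat-trick from these three chances is an unusual outcome under the model and not something it ruled out.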
Yeah that's ridiculous. Absolutely no way that Antony's miss only gets scored 42% of the time.
Individual chances = xG is an almost useless metric
Chances accumulated in a match = xG is OK, but you still get bullshit results on a weekly basis so it shouldn't really be used in discussions
Chances accumulated over a whole season = xG is a very good metric
Both the abstract at the start and the discussion part at the end do that. Less than a page.
Doesn’t really change my mind. Salah has underperformed his xG three seasons running. Is he a below average finisher?
It's been proven countless times over by countless people, because the data to do so is readily available. For one example, this is based on approx. 17k games from Europe's big 5 leagues and MLS:
xG works both in the sense that teams and players over time tend to score a similar amount of goals to xG created, and in the sense that xG created is the best predictor we have of how many goals players and teams will score in the future.
It's a general problem with statistics. People see statistics that are based on averages, compare them to outliers and proclaim that statistics are useless.
I remember playing around with these numbers once for some PL seasons and iirc the extra predictive value was not that large.
Well, the thing is that in the case of xG most people don't care about broad averages but about particulars, because we want to use the data to analyze specific teams in specific seasons.
I have been Googling whether clubs use xG as a metric in analysing their own players' performance or transfer targets and can't find anything. That to me suggests it's not considered much.
I wonder whether clubs actually use xG to drive decisions. Does anyone have any information on this? For me personally, I don't pay much attention to it. I feel like it never reflects the result of a game correctly. Missing great chances and scoring improbable chances has always been a part of the game.
Someone needs to post what the xG was over the entire PL season vs how many goals were scored (I cba)
If those figures match up, then it'll be fair to say that xG is a pretty reliable model. Just not in isolated incidents; it's more about averages.
Assuming a perfect model without bias and measurement error, cumulative goals will lie within a +/- 20% range around cumulative xG after one season (with 95% confidence). Variation within this range is therefore not necessarily out- or underperformance of xG, but can simply be driven by the natural variation of Bernoulli random variables. Similar rules are +/- 50% for 10 games and +/- 33% for half a season.
That's why I said earlier that xG is useful but not as useful as people want it to be: you need the gap between xG and goals to be quite large to definitively rule out 'luck.' And if it's that large you can probably see it in other ways.
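The rule of thumb above can be roughly reproduced with a normal approximation to a sum of Bernoulli shots. A minimal sketch, assuming every shot has the same xG and that shots are independent; the inputs of 13 shots per game and 0.11 xG per shot are placeholder values I picked, not figures from any dataset, and `relative_band` is a hypothetical helper name.

```python
from math import sqrt

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def relative_band(games, shots_per_game=13, xg_per_shot=0.11):
    """Half-width of the ~95% range for goals, as a fraction of total xG.

    Assumes a perfect model: each shot is an independent Bernoulli trial
    that goes in with probability xg_per_shot (placeholder inputs).
    """
    n_shots = games * shots_per_game
    total_xg = n_shots * xg_per_shot
    # Variance of a sum of independent Bernoulli trials: n * p * (1 - p).
    variance = n_shots * xg_per_shot * (1 - xg_per_shot)
    return Z_95 * sqrt(variance) / total_xg


for games in (10, 19, 38):
    print(f"{games:2d} games: +/- {relative_band(games):.0%}")
```

With these placeholder inputs the band comes out at roughly 49%, 35% and 25%, the same ballpark as the figures quoted. It shrinks with the square root of the number of games, so quadrupling the sample only halves it.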
Salah tends to score similarly to his xG, which makes sense. He scores a lot of goals because he's very good at getting into goal scoring opportunities, the hardest thing to do for attackers. I've never thought of him as someone especially good at finishing the chances he gets compared to other players.
Finishing is a pretty basic skill compared to other things footballers do, so when top level athletes practice it a lot you wouldn't expect much variance.
It doesn’t make sense if you’re a good finisher and it was a real average.
I agree, it's very flawed. As far as I know, every offside decision is also automatically 0.00 xG, so you can create 10 chances that are all whistled back by VAR on the closest of decisions, and according to xG you didn't create a single chance?
Not only according to xG, you didn't create a single chance in that example.
Also, those graphs that supposedly prove something are very unconvincing. Basically they tell me that xG is a marginally better predictor of future points per game than shots on target. From that I can't conclude that xG "works", only that it is marginally better than a very crappy stat like shots on target.
It's like saying that my broom is a good weapon of war because it outperforms my toothbrush. Not really, no.
I was just thinking that. Developing a complicated 'big data' model that evaluates hundreds of thousands of goalscoring opportunities in order to get an improvement in R^2 of 0.03 over simply counting shots on target!
who let the wokes in