Let’s talk about uncertainty…

The divide in modern football is no longer so much between the ‘traditionals’ and the ‘geeks’ as between general competence and incompetence. Alongside this, when you are unsure how to use information correctly, uncertainty can very easily creep in. Take, for example, a quote I recently came across (though I cannot recall where):

“We do not typically see uncertainty alongside analytics. This is perhaps understandable, but it does hide some potentially crucial information from the decision-maker. Without accurate information, a decision-maker could put too much weight on your analysis and it could be quite possible that uncertain analytics may lead to worse decisions than no analytics at all. Ultimately, over a long enough time period, this could lead to distrust of analytics. That’s a bold statement, so we need to do some analysis to get a flavour of the uncertainty that is out there”.

Let’s talk about an industry-standard metric known as Expected Goals (xG). This metric is now in common use across the wider football community: media, fans, and clubs alike. Yet major questions are still asked of it all the time. For example, why is it based on the average of all players rather than on each player individually? There are many reasons why, which I will not cover today. This brings me to a tweet from Sam Gregory (Director of Analytics at Inter Miami in MLS) that I bookmarked over two years ago.
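
To make the ‘average of all players’ point concrete, here is a minimal sketch of how a basic xG model is typically built: a single classifier fitted on every shot in a dataset, regardless of who took it. The file name, column names, and the two features are assumptions for illustration, not any specific provider’s model.

```python
# Minimal xG sketch: a logistic regression fitted on ALL shots, so the
# probability reflects the league-average player, not any individual.
# File name and columns are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

shots = pd.read_csv("shots.csv")          # one row per shot
X = shots[["distance_m", "angle_deg"]]    # simple geometric features
y = shots["is_goal"]                      # 1 if the shot was scored

model = LogisticRegression()
model.fit(X, y)                           # pooled across every player

# xG for each shot: P(goal) for the *average* player from that position
shots["xg"] = model.predict_proba(X)[:, 1]
```

Because the model pools every player’s shots, an elite finisher and a centre-back receive the same xG from the same position, which is precisely the question raised above.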

Personally, I fully agree with Sam’s approach, and I concur with the common practice of avoiding single-game xG values in favour of looking at a team’s general xG trends over a period of time (a sketch of this follows below). Disagreement here between coach, analyst, director, and chairman can very easily create ‘uncertainty’, which is best resolved by speaking each other’s languages: business impact, tactical, technical, or analytical, for instance.
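
As a hedged illustration of why trends beat single games, the sketch below smooths a team’s match-by-match xG with a ten-match rolling average. The DataFrame and its values are invented for illustration.

```python
# Smoothing single-game xG into a trend. 'team_xg' is a hypothetical
# DataFrame with one row per match, in chronological order.
import pandas as pd

team_xg = pd.DataFrame({
    "match": range(1, 21),
    "xg_for": [1.3, 0.7, 2.4, 0.9, 1.8, 1.1, 2.0, 0.6, 1.5, 1.9,
               0.8, 1.4, 2.2, 1.0, 1.7, 1.2, 2.1, 0.9, 1.6, 1.8],
})

# A 10-match rolling mean damps single-game noise and exposes the trend
team_xg["xg_trend"] = team_xg["xg_for"].rolling(window=10).mean()
print(team_xg.tail())
```

A single match can swing from 0.6 to 2.4 xG for the same team; the rolling mean is what a trend-based conversation should be anchored to.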

Detail vs Noise

Let’s imagine we are concerned with understanding a player’s contribution to his team’s efforts. That contribution can have an overall positive or negative impact depending on the models and metrics used to evaluate it. You can of course go deeper and assess defensive and attacking impacts separately, or deeper still and evaluate passing, shooting, tackling, and so on. This could go on and on, and all these splits help a manager identify a player’s strengths and weaknesses in detail. But there is a pitfall to overdoing this: uncertainty.
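
The sketch below shows what that drilling-down looks like in practice: each extra split shrinks the sample of events behind the metric. The `events.csv` file and its columns are hypothetical.

```python
# Drilling down: every extra split shrinks the sample behind the metric.
# 'events.csv' and its columns are hypothetical.
import pandas as pd

events = pd.read_csv("events.csv")   # one row per on-ball action

# Level 1: overall involvement
print(len(events))

# Level 2: attacking vs defending actions
print(events.groupby("phase").size())

# Level 3: action type within each phase
print(events.groupby(["phase", "action"]).size())
# By level 3, some cells may hold only a handful of events,
# which is where the uncertainty pitfall begins.
```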

You don’t need to be a statistician or data analyst to understand that the more information you gather on something, the more confident you can be about the results of any analysis applied to it. It may be desirable to rank a player in as many categories as possible, but the observed data may be so limited that some rarely observed categories need to be grouped together in order to contain enough data for meaningful metrics. In other words, there is a trade-off between the precision of the analytics and their interpretability.
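
To put numbers on that trade-off, the sketch below computes a simple normal-approximation 95% confidence interval for a pass-completion rate at two sample sizes. The counts are invented for illustration.

```python
# How uncertainty grows as you slice the data thinner.
# Counts below are invented for illustration.
import math

def completion_interval(completed: int, attempted: int, z: float = 1.96):
    """95% normal-approximation CI for a completion rate."""
    p = completed / attempted
    half_width = z * math.sqrt(p * (1 - p) / attempted)
    return p - half_width, p + half_width

# All passes vs. a rare sub-category (e.g. crosses under pressure)
print(completion_interval(850, 1000))   # ~(0.83, 0.87): tight
print(completion_interval(17, 20))      # ~(0.69, 1.01): near useless
```

Both samples show the same 85% completion rate, but only the first one tells you anything you could act on.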

A great example of this trade-off between precision and interpretability can be found in the quote below from former Opta data scientist Devin Pleuler, speaking in 2014 about the ‘game state’ effect:

“The effects of game state don’t just skew top-level metrics such as shot volume: they persist through every level of statistical granularity. For example, since 2010 in the Premier League, losing teams have had an overall shooting conversion rate of 9%. That’s one goal every 11.1 shots. Conversely, winning teams have had an inflated conversion rate of 11.8%, or one goal every 8.5 shots. Given the generous sample size, we are certain that the difference between these two goal-scoring rates cannot be attributed solely to pure luck. Game state is part of the underlying mechanics of our game”.
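
We can sanity-check the quote’s claim with a standard two-proportion z-test. The conversion rates (9% and 11.8%) come from the quote itself; the shot counts are hypothetical round numbers of roughly Premier League scale, since the original sample sizes are not given.

```python
# Two-proportion z-test on the game-state conversion rates from the
# quote. Rates are from the quote; shot counts are assumed, since the
# original sample sizes are not given.
import math

n_losing, n_winning = 20000, 20000      # hypothetical shot counts
p_losing, p_winning = 0.090, 0.118      # conversion rates quoted

goals_losing = p_losing * n_losing
goals_winning = p_winning * n_winning
p_pooled = (goals_losing + goals_winning) / (n_losing + n_winning)

se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_losing + 1 / n_winning))
z = (p_winning - p_losing) / se
print(f"z = {z:.1f}")   # ~9.2, far beyond 1.96: not just luck at this scale
```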

Closing Remarks

It is up to the analyst who designs the metrics to make the best compromise between uncertainty and interpretability. When trying to strike that balance, the following advice may help:

  1. Ask yourself what a manager might want to see and understand, but be prepared to go down the biggest rabbit hole you have ever encountered. The answer may lie in the depths of the information, but it will take some seeking out.
  2. Give the manager 90% of the information they asked for, plus the 10% ‘they did not know they needed’, as the latter will help them understand the information presented to them.

We have seen examples within the analytics industry of performance metrics with very fine splits. While it is understandably tempting to do this, we do not believe the metrics are necessarily meaningful at that level. Precision matters, but as the adage says: ‘just because you can doesn’t mean you should’. The question to consider when presented with a metric, or when you drill into your data yourself, is: have I ended up with too little data to draw reliable conclusions? Or, to put it another way, am I simply tossing a coin without realising it?
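
One way to answer that last question is to simulate it. The sketch below draws samples from a known 10% conversion rate and shows how wildly the observed rate swings at small sample sizes: the ‘coin toss’ in action. The sample sizes are illustrative.

```python
# Simulating the 'coin toss': observed conversion rates drawn from a
# known true rate of 10%, at increasing sample sizes (illustrative).
import numpy as np

rng = np.random.default_rng(42)
true_rate = 0.10

for shots in (10, 50, 1000):
    # 1,000 simulated players/teams each taking `shots` shots
    observed = rng.binomial(shots, true_rate, size=1000) / shots
    print(f"{shots:>5} shots: observed rate ranges "
          f"{observed.min():.2f} to {observed.max():.2f}")
```

At ten shots, a ‘conversion rate’ can land anywhere from 0% to 40% or more with no change in the underlying ability; at a thousand, it settles close to the truth. If your split leaves you at the small end of that range, you are reading noise.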