Whilst this blog will focus on cricket, a lot of the concepts will be inspired by work from other sports. For example, my Expected Runs model describes the average number of runs that would be scored from a delivery with particular attributes such as its line and length. This is equivalent to the well-established Expected Goals (xG) metric from football, which measures the quality of a chance. A simple search will turn up many models that take into account factors such as the distance and angle from goal, which body part was used, the build-up play and so on. Using some data that I had lying around, I wanted to see if I could replicate the results of some of these models.
Almost all xG models are fundamentally based on the location of the shot. In fact, this blog suggests that you can construct an xG model that is 95% reliable just by considering the location of shots.
The image below shows the locations of 234,018 shots from 9,133 matches I have in my database from various leagues and competitions around the world (attacking left to right).
The cluster of shots in the defending penalty area is not the work of opportunistic goalkeepers but own goals. This is illustrated by the next image, which shows all 24,347 goals in the dataset.
Just from these images we can immediately start forming some hypotheses, such as the closer and more central you are to the goal, the more likely you are to score. Compare the density of the cluster of shots outside the penalty area with that of the goals from this area. Also, consider that the cluster of shots inside the penalty area extends to the edges of the box, while far fewer goals come from the wide areas.
For the purposes of my simple xG model, I excluded own goals and penalties and only considered shots in the boxed area shown below.
This leaves us with 21,424 goals from 225,372 shots. I calculated the distance of each shot from the middle of the goal line, giving us distances that ranged from 0 to 40 metres. I then placed all these shots into forty 1-metre bins. Within each bin, the ratio of goals to shots is the expected goals figure, or xG, for each of those shots. For example, a shot from 10.5 metres out has an xG of 0.129 because there have been a total of 9,267 shots between 10 and 11 metres out, resulting in 1,194 goals.
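As a rough sketch of the binning step described above, something like the following Python would do the job. The function names and the (distance, is_goal) layout of a shot are assumptions made for illustration, not the code actually used for this post; only the counts for the 10 to 11 metre bin come from the text.

```python
import math
from collections import defaultdict


def distance_to_goal(x, y, goal_x, goal_y):
    """Straight-line distance from a shot at (x, y) to the middle of the goal line."""
    return math.hypot(goal_x - x, goal_y - y)


def binned_xg(shots, bin_width=1.0, max_distance=40.0):
    """Group (distance, is_goal) pairs into bins and return
    {bin_start: (n_shots, n_goals, xg)} where xg = n_goals / n_shots."""
    counts = defaultdict(lambda: [0, 0])  # bin index -> [shots, goals]
    for distance, is_goal in shots:
        if not 0 <= distance < max_distance:
            continue  # outside the area being modelled
        idx = int(distance // bin_width)
        counts[idx][0] += 1
        counts[idx][1] += int(is_goal)
    return {
        idx * bin_width: (n, g, g / n)
        for idx, (n, g) in sorted(counts.items())
    }


# Rebuild the 10-11 metre bin from the counts quoted in the text:
# 9,267 shots, of which 1,194 were goals.
shots_10_11 = [(10.5, True)] * 1194 + [(10.5, False)] * (9267 - 1194)
n_shots, n_goals, xg = binned_xg(shots_10_11)[10.0]
print(n_shots, n_goals, round(xg, 3))  # 9267 1194 0.129
```

Every shot in a bin gets the same xG, so the 1 metre bin width is a trade-off: narrower bins follow the curve more closely but leave fewer shots in each one.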
If we do this for every shot, we get the following graph:
We can see that a shot from less than 1 metre out is more or less certain to be a goal. Beyond this, there is a rapid drop in xG up to 10 metres before slowly converging to zero at large distances. This sort of graph can typically be fitted by an exponential decay curve.
By eye, the curve seems like a decent fit for our data. We can use the equation to estimate the xG of a shot from just its distance from the goal. For example, d = 10.5 metres gives us an xG of 0.157. This is slightly higher than the empirical figure of 0.129 from earlier.
The equation, however, breaks down at very small distances. Specifically, any distance below 0.7 metres will return a value greater than 1 for xG. Nevertheless, the r-squared value is 0.987, similar to what is obtained here.
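To make the fitting step and its breakdown at short range concrete, here is a minimal sketch. It assumes a plain two-parameter decay, xG = a * exp(-b * d), fitted by least squares on ln(xG); the fitted equation for this post may have a different form or coefficients, and the data below is synthetic, generated from made-up values of a and b purely to check the code. Clipping the prediction at 1 is one possible workaround of my own choosing, not something the fitted curve does by itself.

```python
import math


def fit_exponential(distances, xg_values):
    """Least-squares fit of ln(xG) = ln(a) - b * d. Returns (a, b).
    Bins with xG of zero are skipped because ln(0) is undefined."""
    pts = [(d, math.log(v)) for d, v in zip(distances, xg_values) if v > 0]
    n = len(pts)
    mean_d = sum(d for d, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_d
    return math.exp(intercept), -slope


def predict(a, b, d, clip=True):
    """xG from distance d; optionally capped at 1 so it stays a valid probability."""
    value = a * math.exp(-b * d)
    return min(value, 1.0) if clip else value


def distance_where_fit_reaches_one(a, b):
    """Distance below which the unclipped curve returns more than 1."""
    return math.log(a) / b if a > 1 else 0.0


# Synthetic bin midpoints from a known curve (a=1.2, b=0.2), not real shot data.
distances = [i + 0.5 for i in range(40)]
xg_values = [1.2 * math.exp(-0.2 * d) for d in distances]
a, b = fit_exponential(distances, xg_values)
print(round(a, 3), round(b, 3))  # 1.2 0.2
print(round(distance_where_fit_reaches_one(a, b), 2))  # 0.91
```

Fitting in log space weights the bins differently from fitting the curve to the raw xG values, so the coefficients from the two approaches will generally not match exactly on real data.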
Here is a visualisation of the xG model.
Another limitation of this simple distance model is that it gives two shots from the same distance the same xG, even if one is from directly in front of the goal and the other from a very tight angle. Our intuition tells us that xG should decrease as we get less and less of the goal to aim for. However, an xG model based just on angle to goal will suffer from a similar problem. The xG of a shot from the penalty spot will be the same as that of a shot from the centre mark. Both have the same angle from goal, i.e. zero, but are at vastly different distances from goal.
To counter this, we can consider the angle to the middle of the crossbar. The figure below illustrates this angle.
Let’s say position A is the location of the shot and AB is the distance to goal as before. BC is the height of the goal i.e. 2.44 metres. From these two lengths we can calculate the angle to the crossbar, θ, using basic trigonometry. This angle has the advantage that it decreases with distance assuming the angle to the goal line stays the same.
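The trigonometry here can be sketched in a few lines of Python, assuming θ is the angle at A in the right-angled triangle ABC, so that tan θ = BC / AB. The function name is made up for illustration.

```python
import math

GOAL_HEIGHT_M = 2.44  # BC: height of the crossbar above the goal line


def crossbar_angle_degrees(distance_m):
    """Angle theta at the shot location A between the ground line AB
    and the line AC up to the middle of the crossbar."""
    return math.degrees(math.atan2(GOAL_HEIGHT_M, distance_m))


# The angle shrinks as the shot moves further from goal.
for d in (2.44, 10.0, 20.0, 40.0):
    print(d, crossbar_angle_degrees(d))
```

Under this definition the angle is 45 degrees when AB equals the goal height, and a shot has to be within roughly 4.2 metres of the middle of the goal line for the angle to exceed 30 degrees.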
As before, we can calculate this angle for every shot in the dataset and place these into bins to obtain the xG for a particular angle.
This time we get a linear relationship between the angle to the crossbar and xG. As you might expect, the more of the goal you have to aim for and the closer you are, the higher the chance of scoring. There is a bit more uncertainty in the xG of shots above 30 degrees, as the vast majority of shots (99%) have an angle smaller than this.
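A sketch of the angle binning and a straight-line fit might look like the following. The 1 degree bin width, the function names and the handful of shots are all assumptions for illustration; the post does not say which bin width or fitting method was used. Keeping the shot count alongside each bin makes it easy to spot the sparsely populated bins at large angles.

```python
import math
from collections import Counter

GOAL_HEIGHT_M = 2.44


def xg_by_angle(shots, bin_width_deg=1.0):
    """shots: iterable of (distance_m, is_goal).
    Returns a sorted list of (bin_start_deg, n_shots, xg)."""
    n_shots, n_goals = Counter(), Counter()
    for distance_m, is_goal in shots:
        theta = math.degrees(math.atan2(GOAL_HEIGHT_M, distance_m))
        idx = int(theta // bin_width_deg)
        n_shots[idx] += 1
        n_goals[idx] += int(is_goal)
    return [
        (idx * bin_width_deg, n_shots[idx], n_goals[idx] / n_shots[idx])
        for idx in sorted(n_shots)
    ]


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up shots: four from 20 metres (one goal), three from 2 metres (two goals).
shots = [(20.0, True)] + [(20.0, False)] * 3 + [(2.0, True)] * 2 + [(2.0, False)]
rows = xg_by_angle(shots)
slope, intercept = fit_line([r[0] for r in rows], [r[2] for r in rows])
print(rows, slope, intercept)
```

With real data it may be worth weighting each bin by its shot count, or dropping bins with very few shots, so that the noisy high-angle bins do not dominate the fit.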
This blog has described two simple implementations of an expected goals model based purely on the location of the shot. In the next blog, I will attempt to build a model that incorporates both the distance and the angle to the goal as well as investigating how teams and players perform under this metric.