How can you tell how good a batsman really is? How can we measure their true skill? Maybe they’ve been getting lucky or have been facing some pretty dross bowling? Perhaps a batsman appears to be out of form because they’ve recently been on the end of a couple of unplayable deliveries early on in their innings.
Averages and strike rates are good summary statistics but reveal little about the current match situation or the quality of the opposition bowling. In this blog, I describe my first attempt at a metric that aims to predict the number of runs a batsman should score based on the type of deliveries they have faced.
In football analytics, the metric expected goals, abbreviated to xG, measures the probability of a particular shot ending up as a goal based on a variety of factors such as the distance and angle from goal, the body part used to make the shot and the type of assist. An open goal from 10 yards usually leads to a goal, while a shot from 40 yards out rarely does. Adding up all these expected goals gives the number of goals that a team or player would score on average.
Similarly in cricket, the same type of delivery usually ends with a similar outcome: half-volleys tend to go for 4, top-of-off deliveries tend to be defended and ripping leg-breaks are often played and missed. This can be quantified using data to calculate the number of runs an average batsman would be expected to score from a delivery of a particular line and length, speed and movement off the pitch, among other factors. For example, exactly how many runs would you expect to be scored from a back-of-a-length delivery outside off, with no movement off the pitch, at 85 mph? If we collect all the deliveries that have these attributes and add up the total runs that have accrued, we can divide this by the number of balls to get an Expected Runs figure, or (predictably) xR for short. The next time a similar ball is bowled, we can say that it has that xR.
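As a minimal sketch of that calculation, the snippet below groups deliveries by type and divides total runs by balls bowled. The delivery labels and run values are made-up toy data, not figures from my dataset, and the function name is just for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (delivery type, runs scored off that ball).
deliveries = [
    ("back_of_length_outside_off", 0),
    ("back_of_length_outside_off", 1),
    ("back_of_length_outside_off", 4),
    ("back_of_length_outside_off", 0),
    ("half_volley", 4),
    ("half_volley", 1),
]

def expected_runs(deliveries):
    """Return xR per delivery type: total runs divided by number of balls."""
    totals = defaultdict(lambda: [0, 0])  # type -> [total runs, balls]
    for kind, runs in deliveries:
        totals[kind][0] += runs
        totals[kind][1] += 1
    return {kind: runs / balls for kind, (runs, balls) in totals.items()}

xr = expected_runs(deliveries)
print(xr)
```

With this toy data, the four back-of-a-length balls went for 5 runs in total, so their xR is 1.25.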
This concept is also used by CricViz to measure current batting conditions in Test matches, as described here.
You may immediately see how Expected Runs can be used to measure the quality of a batsman. If the xR of a particular ball is 1.5 and the batsman is able to consistently hit this ball to the fence, it gives an indication of how good this batsman is compared to the average batsman.
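One simple way to turn this into a number, sketched below, is to add up the runs a batsman actually scored and subtract the total xR of the balls they faced. This "runs above expected" figure is my own illustrative choice rather than an established metric, and the values are invented.

```python
def runs_above_expected(balls):
    """balls: iterable of (actual_runs, xr) pairs for the deliveries faced."""
    actual = sum(runs for runs, _ in balls)
    expected = sum(xr for _, xr in balls)
    return actual - expected

# Hypothetical innings: two boundaries off balls worth 1.5 xR each,
# a single off a 1.0 xR ball and a dot off a 1.2 xR ball.
faced = [(4, 1.5), (4, 1.5), (1, 1.0), (0, 1.2)]
diff = runs_above_expected(faced)
print(round(diff, 2))
```

A positive value suggests the batsman scored more than the average batsman would have from the same deliveries; a negative value suggests the opposite.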
In my first version of an Expected Runs model, I only consider the line and bounce of the ball, i.e. the position of the ball when it is level with the stumps. My dataset consists of 51,775 balls from 226 T20I matches. The data contain details of the over, batsman, bowler, runs, any extras, wickets, ball speed, coordinates of where the ball pitched and coordinates of where the ball ends up at stump level. I stripped the data of any wides and any null or erroneous coordinate values to end up with 43,541 deliveries producing 54,192 runs and 2,420 wickets in total. I then split this data into right and left-handed batsmen to give 30,757 and 12,784 deliveries respectively. Every ball in the dataset is shown below as a beehive plot:
The batting crease runs from -1.5 m to 1.5 m with middle stump at 0 m. I split the coordinate space into square bins of 0.1 m side length, giving us 750 bins in total as shown in the figure below. However, it is evident from the figure above that the sample size for each bin will vary wildly.
Computing xR for each bin in this way gave values ranging from 0 to 6; the extreme figures were due to very small sample sizes. The table below illustrates how runs compare against xR. As expected, they both have virtually the same mean, but xR has a significantly smaller standard deviation. It may or may not be surprising that most balls in T20 matches are either dots or singles.
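The binning step can be sketched roughly as follows. The 0.1 m bin size and the -1.5 m to 1.5 m width come from the text; the 0 m to 2.5 m height range is my assumption (it is what makes 30 x 25 = 750 bins), and the minimum sample size used to drop sparse bins is a purely hypothetical threshold. The sample balls are invented.

```python
import math
from collections import defaultdict

BIN = 0.1                  # bin side length in metres (from the text)
X_MIN, X_MAX = -1.5, 1.5   # crease width in metres (from the text)
Y_MIN, Y_MAX = 0.0, 2.5    # assumed height range, giving 750 bins in total
N_COLS = round((X_MAX - X_MIN) / BIN)
N_ROWS = round((Y_MAX - Y_MIN) / BIN)
MIN_BALLS = 2              # hypothetical cut-off for dropping sparse bins

def bin_index(x, y):
    """Map a stump-level position to a (column, row) bin, or None if outside."""
    if not (X_MIN <= x < X_MAX and Y_MIN <= y < Y_MAX):
        return None
    col = min(math.floor((x - X_MIN) / BIN), N_COLS - 1)
    row = min(math.floor((y - Y_MIN) / BIN), N_ROWS - 1)
    return col, row

def bin_expected_runs(balls, min_balls=MIN_BALLS):
    """balls: iterable of (x, y, runs). Returns {bin: xR} for well-sampled bins."""
    totals = defaultdict(lambda: [0, 0])  # bin -> [total runs, balls]
    for x, y, runs in balls:
        idx = bin_index(x, y)
        if idx is None:
            continue
        totals[idx][0] += runs
        totals[idx][1] += 1
    return {idx: r / n for idx, (r, n) in totals.items() if n >= min_balls}

# Invented sample: three balls in one bin, one isolated ball in another.
sample = [(0.05, 0.75, 0), (0.07, 0.72, 1), (0.02, 0.78, 4), (-0.85, 0.35, 6)]
binned_xr = bin_expected_runs(sample)
print(binned_xr)
```

The isolated ball that went for 6 is exactly the kind of single-sample bin that produces an extreme xR, which is why a minimum count per bin is worth considering.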
The continuous nature of the xR metric means we can differentiate between good and bad balls more accurately. We can determine whether a dot ball was a genuinely good ball or whether the bowler just got away with one. We can say whether a bowler deserved to be hit for a boundary or just got plain unlucky.
The figures below show the results of the binning for both right and left-handers. Remember, the average runs per ball is about 1.24. It can be seen that the relatively high scoring areas are anything wide of off-stump and on the batsman’s legs. Here the xR value is about 1.4 to 1.5, or up to 9 runs per over. To restrict the batsman to about a run a ball, the data show that you should bowl a good length to back of a length on off-stump or just outside. The blank bins had extreme xR values due to small sample sizes, so they were left out.
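The headline numbers above follow from simple arithmetic on the dataset totals quoted earlier, as this short check shows. The only assumption is the standard six balls per over; the helper name is just for illustration.

```python
total_runs = 54_192    # runs in the cleaned dataset
total_balls = 43_541   # deliveries in the cleaned dataset
right_handers, left_handers = 30_757, 12_784

# Average runs per ball across the whole dataset.
avg_runs_per_ball = total_runs / total_balls

def runs_per_over(xr, balls_per_over=6):
    """Convert a per-ball xR into an equivalent runs-per-over rate."""
    return xr * balls_per_over

print(round(avg_runs_per_ball, 2))
print(runs_per_over(1.5))
```

The first value rounds to 1.24 runs per ball, and an xR of 1.5 corresponds to 9 runs per over.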
Even with this simple model some cricketing truths are apparent, namely not to give batsmen room to free their arms and not to bowl on their legs in T20. There is certainly a lot of scope to improve this model. I am yet to incorporate the positional coordinates of where the ball pitched or any movement off the pitch. I could construct separate models for spinners and seam bowlers. I could also consider game state, i.e. where is the best place to bowl in the death overs or at particular batsmen early on in their innings.
In the next blog I investigate which batsmen fare the best under this metric and whether xR correlates with other measures.
If you have any questions/suggestions please feel free to tweet me: @cricketsavant