The aim of cricket is to win matches. Australia have won
more matches than anyone else in the history of test cricket (370). But one
could say the aim is to win as many matches as possible, without losing. Which team has won more than they have lost
by the largest margin? Again, Australia, whose wins are offset by only 208
losses, for a positive balance of +162.
But teams that play more matches are going to win more, and good teams
that have played a lot will have a larger win-loss difference, even if they win
(and lose) at the same frequency. So
perhaps one should award one point to any team that wins a game, half a point
to each side if the match is tied or drawn, and then measure the average points
won per game played. Which team has the
best average on this metric? Also Australia, who average 0.63 points per game,
as opposed to the 0.54 points per game of England, their nearest rivals (to get
these numbers, I used the indispensable Statsguru feature of espncricinfo.com,
which makes these sorts of analyses possible).
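For anyone who wants to reproduce the two measures described above, here is a minimal sketch in Python. The function names are my own invention for illustration, and the second example record is made up (it is not any real team's figures); only the 370 wins and 208 losses come from the numbers quoted earlier.

```python
def win_loss_margin(wins: int, losses: int) -> int:
    """Wins minus losses: the 'positive balance' measure."""
    return wins - losses


def points_per_game(wins: int, losses: int, shared: int) -> float:
    """Average points per match: 1 for a win, 0.5 for a tie or draw, 0 for a loss.

    `shared` is the number of tied or drawn matches.
    """
    played = wins + losses + shared
    if played == 0:
        raise ValueError("no matches played")
    return (wins + 0.5 * shared) / played


# Australia's wins and losses as quoted in the text.
print(win_loss_margin(370, 208))  # 162

# A made-up record, purely for illustration: 6 wins, 2 losses, 2 draws.
print(points_per_game(wins=6, losses=2, shared=2))  # 0.7
```

Note that a team winning and losing equally often scores exactly 0.5 on the second measure, however many draws it plays, which is what makes 0.5 the natural benchmark.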
So fairly indisputably, Australia have been the most successful
team in test cricket. But even then they
have not won every match. And of course,
the outcomes of individual matches are unpredictable (if they weren’t, there’d
be no point in playing them). Players
get lucky, find form and fitness, pitches suit one side or the other, there’s a
critical umpiring mistake, and so on. We can say that the outcome of each
individual match is influenced by random factors. But random factors are not sufficient to
explain the observed patterns. When I
ran five simulations, assigning results to matches at random (but sticking to the overall pattern actually observed whereby approximately one third of games are drawn and the
remainder are won by either side), no team averaged more than 0.58 points per
game. Moreover, random fluctuations tend to cancel each other out with larger
samples. For England and Australia (who’ve played the most matches), the five
simulations never once generated an average total of higher than 0.52 points per game,
or lower than 0.48.
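A simulation of this kind might look something like the sketch below. It is not the code I actually ran: the fixture list is hypothetical (two placeholder teams meeting 1000 times), and the function name and parameters are assumptions made for the example. The only thing taken from the text is the rule itself: roughly one third of matches are drawn and the rest are won by either side with equal probability.

```python
import random
from collections import defaultdict


def simulate_points_per_game(fixtures, draw_prob=1 / 3, rng=None):
    """Assign a random result to every fixture and return points per game by team.

    Each match is drawn with probability `draw_prob`; otherwise either side
    wins with equal probability. Team strength plays no part at all.
    """
    if rng is None:
        rng = random.Random()
    points = defaultdict(float)
    played = defaultdict(int)
    for team_a, team_b in fixtures:
        played[team_a] += 1
        played[team_b] += 1
        roll = rng.random()
        if roll < draw_prob:
            # Draw: half a point each.
            points[team_a] += 0.5
            points[team_b] += 0.5
        elif roll < draw_prob + (1 - draw_prob) / 2:
            points[team_a] += 1.0
        else:
            points[team_b] += 1.0
    return {team: points[team] / played[team] for team in played}


# Hypothetical fixture list, for illustration only.
fixtures = [("Team A", "Team B")] * 1000
averages = simulate_points_per_game(fixtures, rng=random.Random(0))
for team, average in sorted(averages.items()):
    print(f"{team}: {average:.3f}")
```

With a real fixture list substituted in, running this several times with different seeds shows how far pure chance can push a team away from 0.5, and how that spread narrows as the number of matches grows.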
One explanation could be that Australia mostly play weak opponents,
increasing their win-loss ratio above average.
In fact, I suspect that the opposite is true, but for the moment, let’s
just assume that this isn’t the explanation for Australia’s very high score.
Instead, let’s build a model, where the match outcome (O) is determined by A – B + R, where
A is a measure of the underlying strength of team A, B is the strength of their
opponents (obviously, it’s the difference between the two that determines a
side’s chance of winning), and R represents randomly fluctuating factors. What could be the factors determining A and
B? Population size, the strength of the
cricketing culture of a country (i.e. how dominant it is compared with other
sports), national wealth and so on might all play a part. And this model is
good as far as it goes. Except that, besides
fundamentally static qualities, and random fluctuations, there are many factors
which do vary, but only slowly over time. For example, great players
enter, and retire from, the national side; coaches are appointed and are
sacked. In fact, even the supposedly static qualities I’ve previously highlighted
will themselves change, just at a slower rate (the strength of cricket in the national culture is not guaranteed in the same way as a physical constant). And when we say “Australia
is a great side”, we don’t actually mean that Australia have been, on average,
more successful over the course of history, even though this is true. What we actually tend to mean is that particular
Australian teams have been very strong, and have won a lot of matches. But there have also been low ebbs in
Australian cricket, in which for lengthy periods the team has averaged less
than 0.5 points a match. We don’t have
to change the structure of our model to accommodate this; but the fact that A and B are no longer
assumed to be static creates some problems.
Specifically, we can’t assume that Australia are now a great team merely
because they won a lot of matches in the 19th century, or even the 20th. A better guide is to ask
how many matches Australia have won recently. Obviously, because a team’s ability level is
always changing, such a backwards-looking assessment will not tell us exactly
how good a team is right at this moment in time. But because we
assume that A and B change slowly and continuously (and indeed, we have
factored out match-to-match fluctuations into a separate variable to make this so), we can
assume that recent performances are a reasonable guide to the present day. But how many to
include? This will be the subject of the
next post.
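To make the model a little more concrete, here is a rough sketch of O = A – B + R in which A and B drift slowly while R fluctuates from match to match. Almost everything in it is an assumption made for illustration: the normal distributions, the drift and noise sizes, treating B as a single opponent, leaving O as a plain number rather than a win, draw or loss, and the window of 20 matches (which is exactly the quantity still to be decided).

```python
import random


def simulate_outcomes(n_matches, drift_sd=0.05, noise_sd=1.0, seed=0):
    """Generate match outcomes O = A - B + R.

    A and B take a small random step before every match (slow change);
    R is drawn afresh each time (match-to-match luck).
    All distributions and sizes here are illustrative assumptions.
    """
    rng = random.Random(seed)
    strength_a = 0.0
    strength_b = 0.0
    outcomes = []
    for _ in range(n_matches):
        strength_a += rng.gauss(0.0, drift_sd)
        strength_b += rng.gauss(0.0, drift_sd)
        luck = rng.gauss(0.0, noise_sd)
        outcomes.append(strength_a - strength_b + luck)
    return outcomes


def trailing_mean(values, window):
    """Mean of the most recent `window` values (or of all of them, if fewer)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = values[-window:]
    return sum(recent) / len(recent)


outcomes = simulate_outcomes(200)
print("all matches:    ", round(trailing_mean(outcomes, len(outcomes)), 3))
print("last 20 matches:", round(trailing_mean(outcomes, 20), 3))
```

A short window tracks the current gap between A and B more closely but is swamped by R; a long window averages R away but mixes in strengths that no longer apply.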
One final thought.
The only true facts are match results. To demonstrate that a team’s
victory was 10% due to the eternal greatness of its cricketing culture, 60% due
to the semi-persistent talent of its current eleven, and 30% due to random luck, is
quite possible in an appropriately constructed model; but it’s not actually a
fact about the real world, just a quirk of the way we have chosen to conceive
it. In reality, there are 4000 or so
cricketing elevens that have started a test match, and each one has obtained a
particular result. As humans, we want to
fit this into a narrative, to assert “X was a great side in the 1980s”, even though
dozens of different players may have played for team X in this decade and the
many great victories achieved might have been interspersed with the odd truly shocking
defeat. When we say “X is the best team
playing now” that implies we would back X at even odds in a game against any
other side; but if it was easy to analyse the numbers and find out which way to
bet for certain, someone would have made their millions doing so by now. And if a system doesn’t
provide an answer that matches our subjective prejudice, who’s to say if the
system is measuring the wrong things, or if we are simply refusing to believe that our
instincts could be wrong? Nonetheless, next time, we’ll
begin to look at some ways a system could be constructed, and some basic
problems in statistics that we'll encounter as we do so.