In the previous post, we asked two questions. First, how do we
compensate for the difficulty of a team’s schedule when we assess the team’s
results? It’s not such an easy thing to
do: if you beat a supposedly strong team easily, doesn’t that just prove it
wasn’t actually so strong? And secondly, how do we weight recent results
against older ones? When a team records a new result, to what extent should we change
our previous assessment of it?
The Elo system provides us with a very clear model for
thinking about these two points, starting with the second one; and it is
grounded in how we make assessments in life generally.
If something happens which is very unlikely according to our previous
preconceptions, we adjust our preconceptions more radically than if something
happens which is very close to what we expected. If a cricket team beats a side we thought was
slightly weaker than it, we’re not so very surprised, and our assessment of the
two teams doesn’t change very much. But
if a supposedly weak team wallops a supposedly mighty one, we may consider this
a freak result; but we are also more likely to question our prior assumptions about the two sides’ relative strength.
And the second key idea of the Elo system is this: when we
say that one team is stronger than another, what we mean is that we expect
it to do well, should the two teams play each other.
Thus we have a basic formula for adjusting a team's rating after a
match: the new rating of a team A,
R(A)1, equals R(A)0, its previous rating, plus
some function of the difference between the result achieved and the result expected,
where the expected result is itself some function of the difference between R(A)0
and R(B)0, the previous rating of team A's opponents.
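In code, that update rule might be sketched as follows. The specific functions are my assumptions, not the post's: the logistic form of the expectation and the constant k (which scales the correction) are the conventional Elo choices, while the post deliberately defers the exact functions to the next installment.

```python
def expected_score(r_a, r_b, scale=400):
    # One conventional choice for the expectation function -- an assumption
    # here, since the post leaves its exact form for the next installment.
    # It depends only on the difference between the two prior ratings.
    return 1 / (1 + 10 ** ((r_b - r_a) / scale))

def new_rating(r_a, r_b, score_a, k=32):
    # score_a is the result achieved: 1 for a win, 0.5 for a draw, 0 for a loss.
    # The rating moves by k times the gap between achieved and expected.
    return r_a + k * (score_a - expected_score(r_a, r_b))
```

With these (assumed) numbers, two equally rated teams each have an expected score of 0.5, so a win moves the winner's rating up by k/2.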
We’ll look into what that function might look like in the
next post, but for now, I just want to explore the difference between a rating
of this sort and an average. After every
game, R will change. Even if a team were
of absolutely fixed underlying ability, it would win some matches and lose
some; and its rating would fluctuate.
We can think of the rating not as a summary of performance (a
measurable thing) over a fixed time interval, but as an estimate of the
unmeasurable quality of ability, derived from the performance. After each game, we make a new guess of the correct rating,
informed by the extent to which the team’s actual performance has deviated from
what we would have observed if the ratings were already perfect predictors of outcome.
Such a rating is thus very different from a statistic that
tells us (as a fact) that a team has won 40% of its last 10 matches. But here’s
an interesting point. How do we start operating such a system? Before the first match we consider, how do we
know what to expect? An obvious starting point is to assume all teams are
equal, before we have evidence to the contrary.
But provided we are happy to give the system a little time to
equilibrate, it doesn’t actually matter.
If our initially allocated ratings are poor, results will differ more from
expectations; and so our originally assigned ratings will self-correct quickly.
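A toy simulation illustrates this self-correction. Everything specific here is an illustrative assumption: I use the standard logistic expectation, a simple win/loss outcome model driven by hidden "true" strengths, and deliberately wrong starting ratings (the stronger team rated lower).

```python
import random

def expected_score(r_a, r_b, scale=400):
    # Standard logistic Elo expectation (an assumed form, as before).
    return 1 / (1 + 10 ** ((r_b - r_a) / scale))

def simulate(true_a, true_b, rating_a, rating_b, n_games, k=32, seed=0):
    # Hidden "true" strengths generate the results; the visible ratings
    # only ever see wins and losses, yet they drift toward the truth.
    rng = random.Random(seed)
    for _ in range(n_games):
        p_a = expected_score(true_a, true_b)          # A's true win chance
        score_a = 1.0 if rng.random() < p_a else 0.0  # simulated result
        delta = k * (score_a - expected_score(rating_a, rating_b))
        rating_a += delta
        rating_b -= delta
    return rating_a, rating_b

# Start with the ratings the wrong way round: the stronger team rated lower.
a, b = simulate(true_a=1700, true_b=1300, rating_a=1400, rating_b=1600,
                n_games=200)
```

Because the early results differ sharply from what the wrong ratings predict, the corrections are large at first, and the ordering rights itself within a modest number of games.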
And thus we also solve the two problems we had
previously. A system which uses rolling
and/or weighted averages is in danger of either over-privileging old results
(so, for example, a team in decline keeps a good rating even once it is no
longer performing well), or suffering from absurd instability (whereby a couple
of good matches could lead to a side being top rated). But under the Elo system, a team's rating
will move more decisively in response to its recent results when those
results differ dramatically from expectations.
Thus the more a team's results deviate from expectation, the greater the correction to its
rating (and the greater the effective weight of recent results). In fact, we will still need to pick a general
stability parameter; but as we will see, the system gives us a framework for at
least trying to set this in a non-arbitrary way.
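In Elo's formulation that stability parameter is conventionally called K. Holding the same surprising result fixed, a larger K simply means a larger correction; the numbers below are illustrative only.

```python
def update(rating, expected, achieved, k):
    # One Elo-style step: move the rating by k times the surprise.
    return rating + k * (achieved - expected)

# The same surprising win: expected score only 0.25, actual score 1.0.
cautious = update(1500, 0.25, 1.0, k=16)    # small K: small correction
responsive = update(1500, 0.25, 1.0, k=64)  # large K: big correction
```

A small K makes the ratings stable but slow to recognise genuine change; a large K makes them responsive but jumpy. Choosing K well is the question the post promises a framework for.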
And our first problem is also solved. A team with weaker opponents will be expected to achieve better
results, and so will need to win more games to beat its expectations. Playing
five matches against weak opponents is logically no more likely to improve your
rating than playing five games against a strong team. The
principle of the system is thus very clean.
The mathematics will be explored next time.