In the last post, we established that you might revise an
estimate of a cricket team’s ability by comparing an actual result to the
result that would have been expected had that estimate been correct.
But what do we mean by “expected” in this context?
If a strong team plays a weak one, we expect it to win, in
the sense that a win is the most likely result. But even the strongest team doesn’t win every
match it plays. Perhaps we expect a
strong team to win a six-match series 4-1.
So if we give a point for a win and split the point in the event of a
drawn game, the team would win 75% of the points, or 0.75 points per game. Obviously a team can’t actually get 0.75
points from a single game, but thinking in this way allows us to put expected
values on a continuous scale. And if one
puts this in the context of betting, the frequency of wins becomes very important: if you judge the
expected outcome correctly, and make enough bets at favourable odds, you’ll
come out a winner. If odds of 4-1 are offered for a win (with the bet annulled
in the event of a draw), consistently betting on a team with only a 0.25
expected value will see you ahead: you’ll suffer three times as many losses as
wins, but each win will yield four times what each loss costs.
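As a quick check of that arithmetic, here’s a minimal sketch in Python (the stakes and odds are the ones from the example above; the variable names are mine, not part of any rating system):

```python
# Betting at 4-1 on a team with a 0.25 chance of winning (bets on
# drawn games are annulled, so only wins and losses settle).
p_win, p_loss = 0.25, 0.75
stake, odds = 1, 4          # risk 1 unit to win 4

profit_per_bet = p_win * (odds * stake) - p_loss * stake
print(profit_per_bet)       # 0.25 units per settled bet: ahead in the long run
```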
And the other idea was that our ratings would be used to
define our expected value. Let’s assume
equal ratings mean equal chances, and an expected value of 0.5. And let us state that as the gap between two
teams grows very large, the expected value tends towards 1. It can’t exceed 1; no possible team can
average more than 1 point per game, as we have defined it. But the greater the gap between two teams, the
more one-sided their records. And we can define a formula that meets these
criteria; in fact, we can define a whole family of formulae. Our formula is
as follows: the expected result (E) for the stronger team is equal to
1 / (B^(-D) + 1), where D is the difference between the two teams’ ratings, and
B is any number greater than 1. Any
number raised to the power 0 is 1; any number greater than 1, raised to a large
negative power, comes out at little more than 0 (and smaller still as the
exponent becomes more negative). Thus when
D is 0, the expected value is 1 / (1 + 1), i.e. 0.5; and when D is large, the
expected value becomes 1 / (~0 + 1), i.e. ~1.
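To make that concrete, here’s a small sketch of the formula’s behaviour at the extremes (B = 2 is just a placeholder; any B greater than 1 shows the same pattern):

```python
# E = 1 / (B^(-D) + 1): 0.5 when D = 0, approaching 1 as D grows.
B = 2
for D in [0, 1, 5, 50]:
    E = 1 / (B ** -D + 1)
    print(D, round(E, 4))   # 0 -> 0.5, 1 -> 0.6667, 5 -> 0.9697, 50 -> ~1.0
```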
So now we need to set B, but first I’m going to add
another term to the formula, A, as follows:
E(stronger team) = 1 / (B^(-D/A) + 1). The A
term here is used to calibrate the ratings; we can consider our first formula
to be identical to the second, only A has been set to 1. But for any chosen B, there will be an A such
that a fixed gap in the ratings will be equivalent to a certain expected
value. In my system, I’ve chosen an A so
that E(stronger team) = 2/3 when D = 100. For the weaker team, the expected
value is very easy to calculate: it’s whatever’s left that the stronger team
hasn’t claimed. So in the case of my
system, a team that rates 100 points lower than its opponent has an expected
value of just 1/3.
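The calibration itself is a one-line calculation: requiring E = 2/3 at D = 100 forces B^(-100/A) = 1/2, and so A = 100 × ln(B) / ln(2). Here’s a sketch (the helper names are mine, and using B = e anticipates the choice made at the end of this post):

```python
import math

def calibrate_A(B, D_ref=100, E_ref=2/3):
    """Solve B^(-D_ref/A) = (1 - E_ref)/E_ref for A."""
    return -D_ref * math.log(B) / math.log((1 - E_ref) / E_ref)

def expected(D, A, B):
    """Expected value for the stronger team at rating gap D."""
    return 1 / (B ** (-D / A) + 1)

B = math.e                 # anticipating the choice of B below
A = calibrate_A(B)         # ~144.27 when B = e
print(round(expected(100, A, B), 4))       # 0.6667: the stronger team's 2/3
print(round(1 - expected(100, A, B), 4))   # 0.3333: whatever's left
```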
Thus, if two teams have a rating difference of 100, one
would expect the stronger team to win a three-Test series 1-0 (with two draws)
or 2-1: either way, two points from three games. This is arbitrary: by picking
a different A, one could make a one-point difference hugely significant, or a
1000-point difference trivial; in this sense, it’s purely an aesthetic
preference. Nonetheless, whatever A we pick, a given D has
a meaning: it may not be an average of past performance, but it corresponds to
an actual prediction about the chances of the two sides. Equally, in
absolute terms, a rating means nothing.
The difference between two ratings is a measure of relative strength. If one were to add (the same number of)
millions to every team’s rating, the system would be unaffected. A team rated one
million and one has exactly the same predicted advantage over a team rated one
million as a team rated one has over a team rated zero.
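That invariance is easy to demonstrate with the same sketch functions as above: only the difference between the two ratings ever enters the formula.

```python
import math

def expected_from_ratings(r1, r2, A=144.27, B=math.e):
    """Expected value for team 1; only the gap r1 - r2 matters."""
    return 1 / (B ** (-(r1 - r2) / A) + 1)

print(expected_from_ratings(1_000_001, 1_000_000))  # one point apart...
print(expected_from_ratings(1, 0))                  # ...exactly the same value
```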
So now we have to pick a B. And this is not arbitrary,
because it determines the shape of the curve.
It’s clearly not a linear relationship: we’ve already asserted that as D
tends towards infinity, the expected value tends towards 1. But we can give a very specific example to
make things clearer. If team X has an expected value of 2/3 against
team Y, and team Y has an expected value of 2/3
against team Z, what is the expected value when team X plays team Z?
Suppose B is 2.
Setting A so that a difference of 100 gives an expected value of 2/3, A needs
to be ~63; and then, if D is 200 rather than
100, the expected value of the stronger team is ~0.9. But if B is 4, A needs
to be ~115, and the expected value of the stronger team is ~0.92. Under both
formulae, then, there’s a clear prediction that X is expected to get the better
of Z quite often, but nonetheless, the values are different. And this is not just a calibration
problem. A 100-point difference has been
defined as a real thing (namely, the difference that corresponds to a 2/3
expected value), but our system must produce a prediction for any points
difference. And if I were to pick a B so
small that a 200-point difference (the sum, after all, of two 100-point
differences) corresponded to an expected value of only 0.68, or so large that it
corresponded to an expected value of 0.99, it would be self-evidently
wrong. That is to say, even though the
model is not a linear one, the significance of the 200-point gap (or indeed, a
gap of any other margin) must be a function of the significance of the defining
100-point gap, and the difference between 200 and 100. If we can’t find such a
relationship, the system will work when the gap is exactly 100 points, and fail
thereafter. In fact, we’ve just defined
an infinite family of such relationships; but objectively, one should be
right. And one can think of two ways we might try to find the right B: one is empirical, seeing which B best fits the actual data; the
other is theoretical, defining it from first principles.
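The empirical route might look something like the sketch below: take historical results alongside the rating gaps at the time, and pick the value whose predictions assign those results the highest likelihood. The match records here are invented purely for illustration, and one wrinkle is worth flagging: within this family only the ratio ln(B)/A affects the curve, so the search has to hold A fixed while varying B.

```python
import math

def expected(D, A, B):
    return 1 / (B ** (-D / A) + 1)

# Invented (rating gap, points won) records for the stronger side:
# a win is 1 point, a draw half a point, a loss 0.
matches = [(100, 1), (100, 1), (100, 0.5), (40, 0), (40, 1),
           (200, 1), (200, 1), (150, 0.5), (60, 1), (120, 0)]

def neg_log_likelihood(A, B):
    """Treat each half-point (a draw) as half a win and half a loss."""
    total = 0.0
    for D, pts in matches:
        p = expected(D, A, B)
        total -= pts * math.log(p) + (1 - pts) * math.log(1 - p)
    return total

A = 144.27   # held fixed; only ln(B)/A matters to the curve
candidates = [1 + i / 100 for i in range(1, 400)]   # B from 1.01 to 4.99
best_B = min(candidates, key=lambda B: neg_log_likelihood(A, B))
print(round(best_B, 2))   # whichever candidate fits the invented records best
```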
To cut to the chase: our B is going to be approximately
2.7. In fact, it’s going to be an
irrational number, a number that can be shown to exist even though we can’t
exactly write it down (a bit like the ratio of the circumference of a circle to
its diameter). Because we can’t write such
numbers down, mathematicians represent them by letters; and our number is known
by the letter e. But what exactly is e, and why do we choose it?
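Though we can’t write e down exactly, it can be pinned to any precision you like; one classical route, by way of illustration, is the limit of (1 + 1/n)^n as n grows:

```python
import math

for n in [1, 10, 1_000, 1_000_000]:
    print(n, (1 + 1 / n) ** n)   # 2.0, 2.5937..., 2.7169..., 2.71828...
print(math.e)                    # 2.718281828459045
```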
I’m not going to give you a formal proof, but to give even a cursory
explanation I do need to introduce the concept of the normal
distribution. And that’s the next post.