Monday, 18 January 2016

Expectations and Exponentials



In the last post, we established that you might revise an estimate of a cricket team’s ability by comparing an actual result to the result you would have expected had that estimate been correct.  But what do we mean by “expected” in this context?

If a strong team plays a weak one, we expect it to win, in the sense that a win is the most likely result.  But even the strongest team doesn’t win every match it plays.  Perhaps we expect a strong team to win a six match series 4-1.  So if we give a point for a win and split the point in the event of a drawn game, the team would take 75% of the points on offer, or 0.75 points per game.  Obviously a team can’t actually score 0.75 points in a single game, but thinking in this way allows us to put expected values on a continuous scale.  And if one puts this in the context of betting, the frequency of wins becomes very important: if you judge the expected outcome correctly, and make enough bets at favourable odds, you’ll come out a winner.  If odds of 4-1 are offered for a win (with the bet annulled in the event of a draw), consistently backing a team with an expected value of only 0.25 will see you ahead: you’ll lose three times as often as you win, but each win pays four times what each loss costs.
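The betting arithmetic can be sketched in a few lines of Python (the function name and parameters are mine, for illustration):

```python
def expected_profit(p_win, odds, stake=1.0):
    """Expected profit per resolved bet: a win pays odds * stake, a loss
    costs the stake.  Draws are annulled, so they don't enter the sum."""
    return p_win * odds * stake - (1 - p_win) * stake

# A team that wins only a quarter of decided games is still a
# profitable bet at odds of 4-1:
print(expected_profit(0.25, 4))  # 0.25 units per bet, on average
```

At fair odds of 3-1 the expectation would be exactly zero; anything longer than that is value.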

And the other idea was that our ratings would be used to define our expected value.  Let’s assume equal ratings mean equal chances, and an expected value of 0.5.  And let us state that as the gap between two teams grows very large, the expected value tends towards 1.  It can’t exceed 1; no possible team can average more than 1 point per game, as we have defined it.  But the greater the gap between two teams, the more one-sided their records.  And in fact we can define a formula that meets these criteria: in fact, we can define a whole family of formulae.  Our formula is as follows: the expected result (E) for the stronger team is equal to 1 / (B^(-D) + 1), where D is the difference between the two teams’ ratings, and B is any number greater than 1.  Any number raised to the power 0 is 1; any number greater than 1, raised to a large negative power, is little more than 0 (and shrinks towards 0 as the power grows more negative).  Thus when D is 0, the expected value is 1 / (1 + 1), i.e. 0.5; and when D is large, the expected value becomes 1 / (~0 + 1), i.e. ~1.
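The formula is short enough to check directly (a sketch; the function name is my own):

```python
def expected_score(d, b):
    """Expected value for the stronger team, rated d points higher (b > 1)."""
    return 1.0 / (b ** (-d) + 1.0)

print(expected_score(0, 2))   # equal ratings: exactly 0.5
print(expected_score(50, 2))  # a huge gap: vanishingly close to 1
```

Note also that the two teams’ expected values always sum to 1, as they must if one point is shared out per game.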

So now we need to set B, but first I’m going to add another term to the formula, A, as follows:  E(stronger team) = 1 / (B^(-D/A) + 1).   The A term here is used to calibrate the ratings; we can consider our first formula to be identical to the second, only with A set to 1.  But for any chosen B, there will be an A such that a fixed gap in the ratings is equivalent to a certain expected value.  In my system, I’ve chosen A so that E(stronger team) = 2/3 when D = 100. For the weaker team, the expected value is very easy to calculate: it’s whatever’s left that the stronger team hasn’t claimed.  So in the case of my system, a team that rates 100 points less than its opponent has an expected value of just 1/3.
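The calibration itself can be solved in closed form.  A sketch (the function names are mine): requiring B^(-100/A) + 1 = 3/2 gives B^(-100/A) = 1/2, i.e. A = 100 ln(B) / ln(2), where 2 is the odds ratio corresponding to an expected value of 2/3.

```python
import math

def calibrated_a(b, gap=100.0, e_at_gap=2/3):
    """A such that a rating gap of `gap` gives expected value `e_at_gap`.
    Solving b**(-gap/A) = (1 - e)/e for A."""
    odds = e_at_gap / (1 - e_at_gap)       # 2/3 corresponds to odds of 2:1
    return gap * math.log(b) / math.log(odds)

def expected_score(d, b, a):
    """Expected value for the team rated d points higher."""
    return 1.0 / (b ** (-d / a) + 1.0)

a = calibrated_a(2)                  # with base 2, A works out to 100
print(expected_score(100, 2, a))     # 2/3 for the stronger team
print(expected_score(-100, 2, a))    # 1/3: whatever's left over
```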

Thus, if two teams have a rating difference of 100, one would expect the stronger team to take two points from a three test series: a 1-0 win with two draws, say, or a 2-1 victory. This is arbitrary: by picking a different A, one could make a one point difference hugely significant, or a 1000 point difference trivial; in this sense, it’s purely an aesthetic preference.  Nonetheless, whatever A we pick, a given D has a meaning: it may not be an average of past performance, but it corresponds to a predicted outcome for the chances of the two sides. Equally, in absolute terms, a rating means nothing.  The difference between two ratings is a measure of relative strength.  If one were to add the same number of millions to every team’s rating, the system would be unaffected: a team rated one million and one has exactly the same predicted advantage over a team rated one million as a team rated one has over a team rated zero.
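That shift-invariance is easy to demonstrate, since the formula only ever sees the difference between the two ratings (a sketch; the function name and the base-2, A = 100 defaults are illustrative):

```python
def expected_score(r_strong, r_weak, b=2.0, a=100.0):
    """Expected value from two absolute ratings; only their difference matters."""
    d = r_strong - r_weak
    return 1.0 / (b ** (-d / a) + 1.0)

# Adding the same constant to both ratings changes nothing:
print(expected_score(1_000_001, 1_000_000))  # identical to...
print(expected_score(1, 0))                  # ...a one-point gap from zero
```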

So now we have to pick a B, which at first sight looks like the weightier choice: it seems to determine the shape of the curve.  That shape is clearly not a linear one: we’ve already asserted that as D tends towards infinity, the expected value tends towards 1.  But we can pose a very specific question to make things clearer.   If team A has an expected value of 2/3 against team B, and team B has an expected value of 2/3 against team C, what is the expected value when team A plays team C?

Suppose B is 2.  Setting A so a difference of 100 gives an expected value of 2/3 means requiring 2^(-100/A) = 1/2, so A needs to be 100.  And then, if D is 200 not 100, the expected value of the stronger team is 1 / (2^(-2) + 1) = 0.8. If B is 4, the same calibration makes A 200, and a 200 point gap again yields 1 / (4^(-1) + 1) = 0.8. In fact, once A has been recalibrated, the choice of B makes no difference at all: B^(-D/A) = e^(-D ln(B) / A), so changing the base merely rescales A.  What is genuinely at stake is the shape of the curve as a whole.  A 100 point difference has been defined as a real thing (namely, the difference that corresponds to a 2/3 expected value), but our system must produce a prediction for any points difference.  And if I were to pick a curve under which a 200 point difference (the sum, after all, of two 100 point differences) corresponded to an expected value of only 0.68, or one under which it corresponded to 0.99, it would be self-evidently wrong.  That is to say, even though the model is not a linear one, the significance of a 200 point gap (or indeed, a gap of any other margin) must follow consistently from the significance of the defining 100 point gap, and the difference between 200 and 100. If we can’t find such a relationship, the system will work when the gap is exactly 100 points, and fail thereafter.  Our exponential family has a consistency of this kind built in: equal rating gaps always multiply the odds by the same factor, so two 100 point gaps (odds of 2:1, applied twice) compound to odds of 4:1, i.e. an expected value of 0.8.  And one can think of two ways we might try to check that this is the right shape: one is empirical, to see which curve best fits the actual data, but the other approach is theoretical, and to derive it from first principles.
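A quick numerical check (a sketch; `expected_score` and `calibrated_a` are my names for the formulas above) confirms that once A is recalibrated, every base gives the same answer to the A-versus-C question:

```python
import math

def expected_score(d, b, a):
    """Expected value for the team rated d points higher."""
    return 1.0 / (b ** (-d / a) + 1.0)

def calibrated_a(b):
    """A such that a 100 point gap gives 2/3 (odds of 2:1) for base b."""
    return 100.0 * math.log(b) / math.log(2.0)

# Two 100 point gaps compound to odds of 4:1, whatever the base:
for b in (2.0, 4.0, math.e, 10.0):
    a = calibrated_a(b)
    print(b, expected_score(200, b, a))  # ~0.8 in every case
```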

To cut to the chase: our B is going to be approximately 2.7.  In fact, it’s going to be an irrational number, a number that can be shown to exist even though we can’t write it down exactly (a bit like the ratio of a circle’s circumference to its diameter).  Because we can’t write such numbers down, mathematicians represent them by letters; and our number is known by the letter e.  But what exactly is e, and why do we choose it?  I’m not going to give you a formal proof, but to give even a cursory explanation I do need to give an introduction to the concept of the normal distribution.  And that’s the next post.
