Thursday, 21 January 2016

The k-factor



So far, we’ve established a way of rating a cricket team (or indeed, any other participant in a two-headed competition).  We use a rating to predict the outcome (and we’ve seen how this is done); we observe the difference between the expected and actual outcomes; and we adjust the ratings according to the result – the more unexpected the outcome, the more we move the ratings.  But in what proportion to the difference between the expected and actual results should we adjust the ratings? That constant of proportionality is known as the k-factor.
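The update rule described above can be sketched in a few lines of Python (the function name and rating scale are illustrative, not taken from the post):

```python
# Minimal sketch of the Elo-style update described above: ratings move in
# proportion to (actual - expected), scaled by k. Names are hypothetical.

def update_ratings(rating_a, rating_b, expected_a, actual_a, k):
    """Return the new (rating_a, rating_b) after one game.

    actual_a is 1 for a win by team A, 0.5 for a draw, 0 for a loss;
    expected_a is team A's predicted score for the game. The game is
    zero-sum, so team B moves by the same amount in the other direction.
    """
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta
```

So, for instance, with k = 100 and two equal sides (expected value 0.5), a win moves each side 50 points, while a draw leaves both ratings untouched.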

As we’ve defined things, both expected and actual results vary between 0 and 1; and we’ve also defined a 100-point difference as implying an expected result of 2/3 for the stronger team, and 1/3 for the weaker.  So let’s suppose two teams start a series, that they’re equally fancied, and that k is set to 100.  With a ratings difference of zero, 0.5 points each is predicted, so if the game is drawn, the prediction is met exactly and the ratings of both teams stay the same.  But if team A wins, it registers 1 point, and so the difference from the expected value is 0.5. With a k of 100, team A’s rating will therefore rise by 50 points, and its opponent’s rating will fall by the same.  So there’s now a difference of 100 points.  That means, on the strength of this one game alone, one has switched from predicting a drawn result in a single game, or the same number of wins each in a multi-match series, to a prediction, for example, that a three-match series would end in a score of either 2-1 or 1-0.
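The arithmetic in that paragraph can be replayed directly (the starting rating of 1000 is an arbitrary choice for illustration):

```python
# Replaying the k = 100 example above: two equal sides, team A wins.

k = 100
rating_a = rating_b = 1000          # equal starting ratings (arbitrary scale)
expected_a = 0.5                    # equal sides -> half a point each expected

# Team A wins: actual result 1, so the surprise is 1 - 0.5 = 0.5.
delta = k * (1 - expected_a)        # 100 * 0.5 = 50
rating_a += delta
rating_b -= delta

gap = rating_a - rating_b           # 50 up plus 50 down: a 100-point gap
# A 100-point gap was defined to mean an expectation of 2/3 per game for
# the stronger side, i.e. about 2 points from a fresh three-match series.
expected_series_points = 3 * (2 / 3)
```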

Now, obviously, given that team A is now ahead in this particular series, it has a great chance of winning it, particularly if the series only consists of three games.  But the outcome of this particular series isn’t what the new expected value is predicting.  Instead, it’s predicting the outcome of another three-match series, starting from scratch, not including the game that has just been played.  And instinctively, 100 seems too large a value for k.  If two teams are supposed to be equal, it’s no real surprise that one of them might win a game between them.  This might cause one to revise slightly upwards one’s belief in that team’s ability relative to the other – but the shift in expected value feels rather too large.  With too large a value of k, ratings will be highly unstable: a team that registers a few wins in succession will see its rating soar, only for that rating to fall just as quickly whenever the team loses a match.  But a small k is also a problem.  Many wins might do little to boost a team’s rating; if a team wins a series 3-0 and this does not substantially alter expectations for the next series between the two sides, something would appear to be wrong. It’s an equivalent problem to the one we had when considering how to build a weighted average: the best estimate of a team’s ability reflects both past and more recent form. The latter is clearly the better pointer, but how much relative weight should each carry?

I started with a k of 40, which means that after one match between two previously equal-rated sides ends in a win for one of them, each side’s rating would change by 20, and if they played again, the expected value of the stronger side would increase to approximately 0.57.  A second consecutive win for the same side would increase the ratings gap from 40 to 74 and the expectation for the third match to 0.63.  There’s a law of diminishing returns here: the greater the difference we initially believe there to be between two teams, the less a win for the team we already thought was stronger will further increase our regard for it.  And ultimately, no matter how awesome a run of wins a team puts together, we still can’t logically expect it to get more than 1 point per game (which is why we used an exponential function to define our expectations).
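Those figures can be checked numerically. The post doesn’t restate its expectation function here, so the curve below is an assumption: a logistic function in base 2 with a 100-point scale, which is one exponential form consistent with a 100-point gap implying an expectation of 2/3, and which reproduces the 0.57, 74 and 0.63 figures quoted above:

```python
# Assumed expectation curve: a 100-point lead -> 2/3 expected score.
# This is an inference from the stated 100-point/2-thirds convention,
# not necessarily the author's exact function.

def expected_score(diff):
    """Expected score for the side leading by `diff` rating points."""
    return 1 / (1 + 2 ** (-diff / 100))

k = 40
gap = 2 * k * (1 - 0.5)             # one win between equals: 20 each way, gap 40
e1 = expected_score(gap)            # ~0.57 for the next game

# A second win widens the gap by 2 * k * (1 - e1) -- less than before,
# because the win was already partly expected: diminishing returns.
gap += 2 * k * (1 - e1)             # ~74
e2 = expected_score(gap)            # ~0.63 for the third game
```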

Can we do better than a k of 40?  In fact, because a lot of test cricket has already been played, it’s possible to explore the use of different values of k by testing them empirically – by asking, which value of k would have produced the ratings at each point in time that cumulatively turned out to have the best predictive power of subsequent results? And so we can conclude that the best value for k, based on past evidence, is not in fact 40 (though that wasn't such a bad guess) but actually 34.  I’ll talk a bit more about this calculation later, but first of all, I need to step away from general Elo theory and go into some cricket-specific details of my implementation.
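The empirical search described above might be sketched as follows: replay the historical results in order under each candidate k, and keep whichever k gives the running ratings the smallest cumulative prediction error. The error measure, starting rating and expectation curve here are all illustrative choices, not details from the post:

```python
# Sketch of choosing k empirically: replay results chronologically under
# each candidate k and score the running ratings' predictions.

def expected_score(diff):
    # Assumed curve: 100-point lead -> 2/3 expectation (see caveat above).
    return 1 / (1 + 2 ** (-diff / 100))

def total_error(results, k):
    """Sum of squared prediction errors over a chronological result list.

    Each result is (team_a, team_b, actual_a), with actual_a 1, 0.5 or 0.
    """
    ratings = {}
    error = 0.0
    for team_a, team_b, actual_a in results:
        ra = ratings.get(team_a, 1000)     # unseen teams start at 1000
        rb = ratings.get(team_b, 1000)
        ea = expected_score(ra - rb)
        error += (actual_a - ea) ** 2      # score first, then update
        ratings[team_a] = ra + k * (actual_a - ea)
        ratings[team_b] = rb - k * (actual_a - ea)
    return error

def best_k(results, candidates):
    """The candidate k whose replayed ratings predicted the results best."""
    return min(candidates, key=lambda k: total_error(results, k))
```

Run over the full history of test cricket with, say, `best_k(history, range(10, 101))`, a search of this kind is what yields the figure of 34 mentioned above.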
