Let us consider a square, and let us define a unit of measurement such that the area of the square is 4 square units. The length of each side of the square is thus 2 units. But suppose we define our unit of measurement differently, so that the area is 2 square units. What is the length of the side? We haven’t changed the square, only the unit of measurement; yet the length of the side is hard to pin down. It’s approximately (but not exactly) 1.41, and no matter how many decimal places we add to our estimate, we can never write the answer down exactly. In fact, since the time of the ancient Greeks, we’ve known that the exact answer can never be expressed as a fraction, or ratio, of two integers – you can find some fairly easy-to-follow proofs of this on Wikipedia. Hence, we call it an irrational number.
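For anyone who likes to see this happen, here is a minimal sketch in Python. It uses exact fractions and a standard averaging trick (replace the estimate x with the average of x and 2/x) to get ever closer to the side length. The starting value of 3/2 and the variable names are my own choices for illustration. It demonstrates the point rather than proving it: each fraction gets closer, but its square is never exactly 2.

```python
from fractions import Fraction

# Start from a rough rational guess for the side length (an arbitrary choice).
estimate = Fraction(3, 2)
estimates = []

for _ in range(5):
    # Average the estimate with 2/estimate; this homes in on the square root of 2.
    estimate = (estimate + 2 / estimate) / 2
    estimates.append(estimate)
    # Print the fraction, its decimal value, and whether its square is exactly 2.
    print(estimate, float(estimate), estimate * estimate == 2)
```

The last column stays False on every line, however good the decimal value looks.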
Now, consider the family of curves defined by the formula y = B^x (that is, B raised to the power x), where B is a constant. Can we choose a value for B so that the gradient of the curve is exactly 1 when x = 0? We can, but once again the answer (in this case, approximately 2.718) is irrational. Mathematicians call this number e. Now, e seems to be a pretty esoteric concept; but in fact, it crops up in many places in the real world. We’ll come back to e in a moment, but first I want to change the subject.
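Taking the curve to be y = B^x, one can hunt for this value numerically. The sketch below estimates the gradient at x = 0 from two points very close to zero, then repeatedly halves the interval between 2 and 3 until the gradient is as close to 1 as we can get. The function names, the step size and the starting interval are my own assumptions for illustration.

```python
import math

def gradient_at_zero(base, h=1e-6):
    """Estimate the slope of y = base**x at x = 0 from two nearby points."""
    return (base ** h - base ** (-h)) / (2 * h)

def find_base_with_unit_gradient(lo=2.0, hi=3.0, tol=1e-9):
    """Bisect between lo and hi for the base whose slope at x = 0 is 1."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gradient_at_zero(mid) < 1:
            lo = mid  # slope too shallow: the base must be larger
        else:
            hi = mid  # slope too steep: the base must be smaller
    return (lo + hi) / 2

base = find_base_with_unit_gradient()
print(base, math.e)  # our estimate next to Python's built-in value of e
```

The two printed numbers should agree to several decimal places, though of course neither is the exact value.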
Consider rolling two dice. There are 6 possible outcomes for each die, and thus 36 possible combinations. Six of these will result in a sum total of 7 (1 and 6, 2 and 5, 3 and 4, 4 and 3, 5 and 2, and 6 and 1). Just one combination of the 36 (1 and 1) will result in a sum total of 2. Random factors will determine the outcome of each roll. But random variations tend to balance each other out over time. Thus, we can predict from theory that if we were to roll two dice on a very large number of occasions, approximately 1/6 of all results will be 7, but only 1/36 will be 2. And one can plot the predicted relative frequency of the different outcomes on a bar chart. I’ve included one for reference below:
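The same counting can be done by brute force. This is a minimal sketch, assuming two fair six-sided dice, that lists all 36 combinations, tallies the totals, and prints a crude text version of the bar chart. The variable names are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die

# Tally the total for each of the 36 ordered combinations of two dice.
counts = Counter(a + b for a, b in product(faces, repeat=2))
total = sum(counts.values())

# Exact predicted relative frequency of each total.
probabilities = {s: Fraction(c, total) for s, c in sorted(counts.items())}

for s, p in probabilities.items():
    print(f"{s:2d}  {str(p):5s}  {'#' * counts[s]}")
```

The row for 7 is the longest, with six marks, and the rows for 2 and 12 have one mark each.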
Now suppose we do the same thing with four dice. The most common outcome will now be 14, but we get a slightly different shape to our chart. The new chart (see below) has more of a bell-shape: the central portion is more highly represented with respect to the flanks. And, as there are more outcomes to depict, but we’ve plotted it on a chart of the same size, the picture is a smoother one, although we're still plotting the frequencies of discrete events (the probability that the sum of four dice is 14.5 is of course zero).
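Listing every combination by hand gets tedious with four dice (there are 1,296 of them), so here is a sketch that builds the tally up one die at a time. It assumes fair six-sided dice; the function name sum_counts is my own.

```python
from collections import Counter

def sum_counts(num_dice, sides=6):
    """Count the number of ways to reach each total with num_dice fair dice."""
    counts = Counter({0: 1})  # with no dice, the only total is 0
    for _ in range(num_dice):
        new_counts = Counter()
        for total, ways in counts.items():
            for face in range(1, sides + 1):
                # Every way of reaching `total` extends to `total + face`.
                new_counts[total + face] += ways
        counts = new_counts
    return counts

four = sum_counts(4)
most_common_total, ways = four.most_common(1)[0]
print(most_common_total, ways, sum(four.values()))

# A crude text bar chart, one mark per five combinations.
for total in sorted(four):
    print(f"{total:2d}  {'#' * (four[total] // 5)}")
```

Asking for four[14.5] gives 0, since no combination of whole-numbered faces can add up to it.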
Now, suppose we keep adding dice. Gradually, as the number of dice per roll increases, the chart showing the relative frequency of outcomes looks more and more like a completely smooth curve, with a well-defined bell-shape.
And this curve - in effect, the distribution of outcomes we would get if we were to roll an infinite number of dice on an infinite number of occasions - is known as the normal distribution. You can see exactly what one of those looks like here. Typically, the data is “normalised”, so that the central point of the distribution has a value of 0, and a measure of the spread (called the standard deviation) is set to 1. The mathematical formula describing the shape of this distribution is complex, and I’m not going to type it out (you can find it near the top of the relevant Wikipedia page). For now I’ll note just one thing: the formula contains that magic number e we saw earlier.
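For readers who would rather check the claim than take it on trust, here is a rough sketch. It normalises the dice totals as described (centre at 0, standard deviation of 1) and measures the largest gap between the dice chart and the standard normal curve, which has height e^(-z²/2) divided by the square root of 2π at position z. The helper names and the particular numbers of dice are my own choices, and fair six-sided dice are assumed.

```python
import math
from collections import Counter

def sum_counts(num_dice, sides=6):
    """Count the number of ways to reach each total with num_dice fair dice."""
    counts = Counter({0: 1})
    for _ in range(num_dice):
        new_counts = Counter()
        for total, ways in counts.items():
            for face in range(1, sides + 1):
                new_counts[total + face] += ways
        counts = new_counts
    return counts

def standard_normal_density(z):
    """Height of the normalised bell curve at z; note the appearance of e."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def max_gap(num_dice, sides=6):
    """Largest difference between the normalised dice chart and the bell curve."""
    counts = sum_counts(num_dice, sides)
    total_ways = sides ** num_dice
    mean = num_dice * (sides + 1) / 2
    sd = math.sqrt(num_dice * (sides ** 2 - 1) / 12)
    gap = 0.0
    for total, ways in counts.items():
        z = (total - mean) / sd
        # Neighbouring totals sit 1/sd apart on the z scale, so multiplying
        # the probability by sd gives a height comparable to the curve.
        height = ways / total_ways * sd
        gap = max(gap, abs(height - standard_normal_density(z)))
    return gap

gaps = {n: max_gap(n) for n in (2, 4, 16, 64)}
for n, gap in gaps.items():
    print(n, gap)
```

The gap shrinks as dice are added, which is the sense in which the chart approaches the smooth curve.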
This might still seem pretty obscure: what does the distribution of an infinite number of rolls of an infinite number of dice have to do with cricket? In fact, the answer is rather more than we might expect. And that will be our next subject.