*Before you read this, I suggest you read post* 16.24.

If you repeat an experiment several times, you will get slightly different results because of random effects beyond your control. For example, if you measure the temperature at which a liquid boils, you may get slightly different results because of changes in atmospheric pressure – if your measurements are sufficiently sensitive.

Suppose you perform an experiment and the mean result is 80.0 with a standard deviation of 0.1 (see post 16.24). According to a mathematical theorem called the *central limit theorem*, the probability of getting results of 79.5, 79.6, 79.7, 79.8, 79.9, 80.0, 80.1, 80.2, 80.3, 80.4 and 80.5 is given by the graph below.

If we plot a smooth curve through these results, we get the graph below.

A curve with this shape is called the *normal distribution* or the *Gaussian distribution*. The name “normal distribution” is misleading because many people think that all sorts of numbers are distributed in this way. For example, when I first started to teach in a university, a senior colleague told me that I was not marking exam papers properly because the results were not normally distributed. There is no reason why they should be! According to the central limit theorem, we expect a normal distribution as a result of random fluctuations. Exam marks are not awarded at random!

The box below shows how we can calculate the normal distribution curve – but you don’t need to read it.
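As a rough illustration of that calculation, here is a minimal Python sketch of the standard formula for the normal curve. The function name `normal_curve` is my own choice, and the mean of 80.0 and standard deviation of 0.1 are taken from the example above. Strictly speaking, the height of the curve is a probability density rather than a probability, but the shape is what matters here.

```python
import math

def normal_curve(x, mean, sd):
    """Height of the normal distribution curve at the value x."""
    # How many standard deviations x lies from the mean
    z = (x - mean) / sd
    # Standard formula: exp(-z^2 / 2) / (sd * sqrt(2 * pi))
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Heights of the curve near the mean for the example in this post
for x in (79.8, 79.9, 80.0, 80.1, 80.2):
    print(x, normal_curve(x, 80.0, 0.1))
```

The curve is highest at the mean and falls away symmetrically on either side, which gives the bell shape.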

When we have a set of results that can be fitted by the bell-shaped curve of the normal distribution, we can calculate some useful probabilities. Suppose that a series of ten repeat measurements yields the results: 80.0, 79.9, 79.8, 80.2, 80.2, 80.1, 79.8, 80.0, 80.1, 80.0. These results have a mean value of 80.0 and a standard deviation of 0.1, both rounded to one decimal place. Their standard error, *S*, is defined to be

*S* = 0.1/√10 = 0.1/3.162 = 0.0316.

The number 10 appears because that is the number of observations; √10 ≈ 3.162 because 3.162 × 3.162 ≈ 10. There is then a probability of 0.95 (95%) that the true value lies between

a lower value of 80.0 – (1.96 × 0.0316) = 79.94 and

an upper value of 80.0 + (1.96 × 0.0316) = 80.06.

The number 80.0 appears here because it is the mean value.
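The arithmetic above can be checked with a short Python sketch. The function names `standard_error` and `confidence_limits` are my own, and the inputs are the rounded values used in this post (mean 80.0, standard deviation 0.1, ten observations).

```python
import math

def standard_error(sd, n):
    """Standard deviation divided by the square root of the number of observations."""
    return sd / math.sqrt(n)

def confidence_limits(mean, sd, n, multiplier=1.96):
    """Lower and upper limits: mean minus/plus multiplier times the standard error."""
    half_width = multiplier * standard_error(sd, n)
    return mean - half_width, mean + half_width

lower, upper = confidence_limits(80.0, 0.1, 10)
print(round(standard_error(0.1, 10), 4))  # 0.0316
print(round(lower, 2), round(upper, 2))   # 79.94 80.06
```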

These upper and lower values are called the 95% confidence limits. If we want to calculate the 99% confidence limits we replace the number 1.96 with 2.58.
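If you are curious where the numbers 1.96 and 2.58 come from, they can be recovered from the normal distribution itself. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `multiplier` is my own.

```python
from statistics import NormalDist

def multiplier(confidence):
    """Number of standard errors either side of the mean for a given confidence level."""
    # Split the leftover probability equally between the two tails of the curve
    tail = (1.0 - confidence) / 2.0
    # Find the point on the standard normal curve with that much area above it
    return NormalDist().inv_cdf(1.0 - tail)

print(round(multiplier(0.95), 2))  # 1.96
print(round(multiplier(0.99), 2))  # 2.58
```

A higher confidence level needs a larger multiplier, so the 99% limits are wider than the 95% limits.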

According to a saying often attributed to Benjamin Disraeli, a former British Prime Minister, “there are three kinds of lies: lies, damned lies and statistics”. Of course, a lot of people use statistics to tell lies, but statistics can also be very useful for gaining understanding.

*Related posts*

16.24 Accuracy and precision

16.10 Expensive cars and health

16.8 Predictions

16.7 Writing numbers

*Follow-up posts*