16.26 Normal distribution

Before you read this, I suggest you read post 16.24.

If you repeat an experiment several times, you will get slightly different results because of random effects beyond your control. For example, if you measure the temperature at which a liquid boils, you may get slightly different results because of changes in atmospheric pressure – if your measurements are sufficiently sensitive.

Suppose you perform an experiment and the mean result is 80.0 with a standard deviation of 0.1 (see post 16.24). According to a mathematical theorem called the central limit theorem, the probability of getting results of 79.5, 79.6, 79.7, 79.8, 79.9, 80.0, 80.1, 80.2, 80.3, 80.4 and 80.5 is given by the graph below.

[Graph 1: probability of each result from 79.5 to 80.5]

If we plot a smooth curve through these results, we get the graph below.

[Graph 2: smooth curve through these results]

A curve with this shape is called the normal distribution or the Gaussian distribution. The name “normal distribution” is misleading because it leads many people to think that all sorts of numbers are distributed in this way. For example, when I first started to teach in a university, a senior colleague told me that I was not marking exam papers properly because the results were not normally distributed. There is no reason why they should be! According to the central limit theorem, we expect a normal distribution when results vary because of many small, independent random fluctuations. Exam marks are not awarded at random!
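The link between random fluctuations and the bell-shaped curve can be checked with a short simulation. This is only a sketch under stated assumptions: the function name, the choice of 12 disturbances per measurement and the size of each disturbance are all invented for illustration (they are chosen so that the simulated standard deviation comes out close to 0.1). For a normal distribution, about 68% of results lie within one standard deviation of the mean and about 95% lie within two.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so that the run is repeatable


def one_measurement(true_value=80.0, n_effects=12, size=0.05):
    """One simulated result: the true value plus many small random disturbances.

    All parameter values are assumptions made for this illustration.
    """
    return true_value + sum(random.uniform(-size, size) for _ in range(n_effects))


measurements = [one_measurement() for _ in range(20000)]
m = mean(measurements)
s = stdev(measurements)

# Fraction of simulated results within one and two standard deviations of the mean
within_1sd = sum(abs(x - m) <= s for x in measurements) / len(measurements)
within_2sd = sum(abs(x - m) <= 2 * s for x in measurements) / len(measurements)

print(f"mean = {m:.2f}, standard deviation = {s:.2f}")
print(f"fraction within 1 standard deviation: {within_1sd:.3f}")
print(f"fraction within 2 standard deviations: {within_2sd:.3f}")
```

Each individual disturbance here is spread evenly over a small range, which is nothing like a bell shape, yet the sum of many of them gives fractions close to 68% and 95%.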

The box below shows how we can calculate the normal distribution curve – but you don’t need to read it.

[Box: equation of the normal distribution curve]
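For readers who do want the detail, the normal distribution curve is usually written in the standard form below. The symbols are assumptions and may differ from those used in the box: x is a possible result, μ is the mean (80.0 in the example above), σ is the standard deviation (0.1 in the example) and p(x) is the height of the curve at x.

```latex
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right)
```

The curve is highest at x = μ and is symmetrical about that value; a larger σ gives a lower, wider bell.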

When we have a set of results that can be fitted by the bell-shaped curve of the normal distribution, we can calculate some useful probabilities. Suppose that a series of ten repeat measurements yields the results: 80.0, 79.9, 79.8, 80.2, 80.2, 80.1, 79.8, 80.0, 80.1, 80.0. Rounded to one decimal place, these results have a mean value of 80.0 and a standard deviation of 0.1. Their standard error, S, is defined to be

S = 0.1/√10 = 0.1/3.162 = 0.0316.

The number 10 appears because that is the number of observations; √10 = 3.162 because 3.162 × 3.162 = 10. There is then a probability of 0.95 (95%) that the true value of the quantity being measured lies between

a lower value of 80.0 – (1.96 × 0.0316) = 79.94 and

an upper value of 80.0 + (1.96 × 0.0316) = 80.06.

The number 80.0 appears here because it is the mean value.

These upper and lower values are called the 95% confidence limits. If we want to calculate the 99% confidence limits we replace the number 1.96 with 2.58.
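The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch: the function name confidence_limits is invented for illustration, and the multipliers (about 1.96 and 2.58) are taken from the standard normal distribution, as in the text. The calculation uses the rounded mean and standard deviation quoted above; the first two lines of output only check that the ten results round to those values.

```python
import math
from statistics import NormalDist, mean, stdev

results = [80.0, 79.9, 79.8, 80.2, 80.2, 80.1, 79.8, 80.0, 80.1, 80.0]


def confidence_limits(mean_value, sd, n, level=0.95):
    """Return (lower, upper) confidence limits (hypothetical helper).

    Uses the normal-distribution multiplier, as in the text.
    """
    standard_error = sd / math.sqrt(n)
    # Two-sided multiplier: about 1.96 for 95%, about 2.58 for 99%
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean_value - z * standard_error, mean_value + z * standard_error


# Check that the results round to the values quoted in the text
print(f"mean rounded to one decimal place: {round(mean(results), 1)}")
print(f"standard deviation rounded to one decimal place: {round(stdev(results), 1)}")

# Rounded values used in the text: mean 80.0, standard deviation 0.1, n = 10
low95, high95 = confidence_limits(80.0, 0.1, len(results), level=0.95)
low99, high99 = confidence_limits(80.0, 0.1, len(results), level=0.99)

print(f"95% confidence limits: {low95:.2f} to {high95:.2f}")  # 79.94 to 80.06
print(f"99% confidence limits: {low99:.2f} to {high99:.2f}")  # 79.92 to 80.08
```

The 99% limits are wider than the 95% limits because a bigger range is needed to be more confident that it contains the true value.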

According to a saying often attributed to Benjamin Disraeli, a former British Prime Minister, “there are three kinds of lies: lies, damned lies and statistics”. Of course, a lot of people use statistics to tell lies, but statistics can also be very useful for gaining understanding.

 

Related posts

16.24 Accuracy and precision
16.10 Expensive cars and health
16.8 Predictions
16.7 Writing numbers

 

Follow-up posts

16.28 Significant differences

 

 
