# Correlation

## Correlation

The **product moment correlation coefficient, r,** is a measure of the degree of scatter: it tells us how closely the points on a scatter diagram lie to a straight line.

The value of r will lie between -1 and 1. If the correlation is positive and the points lie exactly on a straight line, then both regression lines coincide and r = 1. Likewise, if the correlation is negative and the points lie exactly on a straight line, r = -1. Values of r close to 0 indicate little or no linear correlation.

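The formula for r is:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

**Where:**

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \qquad S_{xx} = \sum x^{2} - \frac{\left(\sum x\right)^{2}}{n}, \qquad S_{yy} = \sum y^{2} - \frac{\left(\sum y\right)^{2}}{n}$$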

This may look like a pretty mean formula, but you are usually given

- n
- ∑x
- ∑y
- ∑x^{2}
- ∑y^{2}
- ∑xy

as summarised data, so we only need to use these in calculating S_{xy}, S_{xx} and S_{yy}.

(If you are not given summarised data, you should be able to obtain these values by entering the raw data into your calculator.)

**Example:**

We will use the data seen earlier: the results of the first 2 tests on S1 (probability and discrete random variables) for 12 sixth form students. We will then calculate the value of the product moment correlation coefficient, r.

Student: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Prob (x): | 65 | 88 | 83 | 92 | 50 | 67 | 100 | 100 | 73 | 90 | 83 | 94 |
D.R.V (y): | 52 | 57 | 78 | 76 | 30 | 67 | 96 | 74 | 65 | 87 | 78 | 89 |

The raw data above has been summarised into the following:

n = 12 | ∑x = 985 | ∑y = 849 |
---|---|---|
∑x^{2} = 83465 | ∑y^{2} = 63693 | ∑xy = 72266 |
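If you want to check the summarised values without a calculator, the following is a minimal Python sketch that rebuilds them from the raw marks in the table. The function name `summarise` and the dictionary keys are my own choices for illustration, not anything defined in these notes.

```python
# Raw marks copied from the table of 12 students.
prob = [65, 88, 83, 92, 50, 67, 100, 100, 73, 90, 83, 94]  # x
drv = [52, 57, 78, 76, 30, 67, 96, 74, 65, 87, 78, 89]     # y


def summarise(xs, ys):
    """Return n and the five sums needed for S_xy, S_xx and S_yy.

    `summarise` is a hypothetical helper written for this sketch.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


summary = summarise(prob, drv)
print(summary)
```

The printed values should agree with the summarised data given for this example.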

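Substituting the summarised values:

$$S_{xy} = 72266 - \frac{985 \times 849}{12} = 2577.25$$

$$S_{xx} = 83465 - \frac{985^{2}}{12} \approx 2612.92$$

$$S_{yy} = 63693 - \frac{849^{2}}{12} = 3626.25$$

$$r = \frac{2577.25}{\sqrt{2612.92 \times 3626.25}} \approx 0.837$$

Hence the product moment correlation coefficient **r = 0.837**, indicating a **high positive correlation**.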

In terms of our example, it appears that the better the student did in the first test the better they did in the second.
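As a check on r = 0.837, here is a short Python sketch that works from the summarised data only. It assumes the standard definitions of S_{xy}, S_{xx} and S_{yy} in terms of the six summary values; the function name `pmcc_from_summary` is hypothetical.

```python
from math import sqrt


def pmcc_from_summary(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product moment correlation coefficient from summarised data.

    Hypothetical helper for illustration; assumes S_xx and S_yy are non-zero.
    """
    s_xy = sum_xy - sum_x * sum_y / n   # S_xy
    s_xx = sum_x2 - sum_x ** 2 / n      # S_xx
    s_yy = sum_y2 - sum_y ** 2 / n      # S_yy
    return s_xy / sqrt(s_xx * s_yy)


# Summarised data for the 12 students.
r = pmcc_from_summary(n=12, sum_x=985, sum_y=849,
                      sum_x2=83465, sum_y2=63693, sum_xy=72266)
print(round(r, 3))  # 0.837
```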

**Spearman's Coefficient of rank correlation, r_{s},** is another value that measures the spread of our scatter. Like the product moment correlation coefficient, r, the value of r_{s} lies between -1 and 1, and the sort of correlation obtained for various values of r_{s} is the same as for r.

Spearman's Coefficient of rank correlation, r_{s}, is an approximation to the product moment correlation coefficient and is calculated by first ranking the data in order of size.

**The formula used to calculate Spearman's Coefficient of rank correlation, r_{s}, is:**

$$r_{s} = 1 - \frac{6 \sum d^{2}}{n(n^{2} - 1)}$$

**Where:**

n = number of items to be ranked

d = rank difference

The rank difference (d) needs a little more explaining but this is best done by way of an example.

**Example:**

Two judges independently rank the exhibits of 8 contestants at a vegetable show. Their placings are given in the table...

Contestant: | A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|---|
Judge 1 (x): | 4 | 3 | 1 | 2 | 8 | 7 | 6 | 5 |
Judge 2 (y): | 4 | 1 | 2 | 3 | 8 | 5 | 7 | 6 |

As the data is already ranked we can look straight away at the rank difference.

**Contestant A** was ranked 4^{th} and 4^{th}

4 − 4 = 0 hence the rank difference, **d = 0**

**Contestant B** was ranked 3^{rd} and 1^{st}

3 − 1 = 2 hence **d = 2.**

We can add these results, along with d^{2}, to our table. The sign of d does not matter, because only d^{2} is used in the formula. We obtain...

Contestant: | A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|---|
Judge 1 (x): | 4 | 3 | 1 | 2 | 8 | 7 | 6 | 5 |
Judge 2 (y): | 4 | 1 | 2 | 3 | 8 | 5 | 7 | 6 |
Difference d: | 0 | 2 | 1 | 1 | 0 | 2 | 1 | 1 |
d^{2}: | 0 | 4 | 1 | 1 | 0 | 4 | 1 | 1 |

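This gives us the following data: ∑d^{2} = 12 and n = 8, so

$$r_{s} = 1 - \frac{6 \times 12}{8(8^{2} - 1)} = 1 - \frac{72}{504} \approx 0.857$$

This indicates a very **high positive correlation** and we can conclude that the judges appeared to agree very closely on their rankings.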
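The rank-difference process for the two judges can be sketched in a few lines of Python. This is an illustration only: the function name `spearman_from_ranks` is hypothetical, and it assumes the usual formula r_{s} = 1 − 6∑d^{2} / (n(n^{2} − 1)) with no tied ranks.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient from two lists of ranks (no ties assumed).

    Hypothetical helper for illustration.
    """
    if len(rank_x) != len(rank_y):
        raise ValueError("both rankings must contain the same number of items")
    n = len(rank_x)
    # d is the rank difference for each item; only d squared is needed.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Placings for contestants A to H.
judge_1 = [4, 3, 1, 2, 8, 7, 6, 5]
judge_2 = [4, 1, 2, 3, 8, 5, 7, 6]

r_s = spearman_from_ranks(judge_1, judge_2)
print(round(r_s, 3))  # 0.857
```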

In our example above the data was already ranked for us. If this had not been the case, we would need to rank the data ourselves. We will take the results of 9 of our sixth form students for their first 2 tests and rank them.

Student: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Prob (x): | 65 | 88 | 83 | 92 | 50 | 67 | 100 | 73 | 90 |
D.R.V (y): | 52 | 57 | 78 | 76 | 30 | 67 | 96 | 65 | 87 |

The order of ranking does not matter, but must be the same for both tests. I will choose to rank highest to lowest.

Test 1 (probability, x): | Test 2 (d.r.v., y): |
---|---|
Student 7 is ranked 1 (100) | Student 7 is ranked 1 (96) |
Student 4 is ranked 2 (92) | Student 9 is ranked 2 (87) |
... | ... |
Student 5 is ranked 9 (50) | Student 5 is ranked 9 (30) |
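The ranking step itself can be sketched in Python as follows. The function name `rank_descending` is hypothetical, and the sketch assumes there are no tied marks (as is the case for these 9 students); tied values would need a separate rule that is not covered here.

```python
def rank_descending(values):
    """Rank values from highest (rank 1) to lowest; assumes no ties.

    Hypothetical helper for illustration.
    """
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values, reverse=True)
    # Map each value to its position in the sorted list, starting at 1.
    position = {value: i + 1 for i, value in enumerate(ordered)}
    return [position[value] for value in values]


# Marks for students 1 to 9.
prob = [65, 88, 83, 92, 50, 67, 100, 73, 90]  # x
drv = [52, 57, 78, 76, 30, 67, 96, 65, 87]    # y

rank_x = rank_descending(prob)
rank_y = rank_descending(drv)
print(rank_x)  # [8, 4, 5, 2, 9, 7, 1, 6, 3]
print(rank_y)  # [8, 7, 3, 4, 9, 5, 1, 6, 2]
```

Ranking lowest to highest instead would work equally well, as long as the same order is used for both tests.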

We can add these values to the table and carry out the difference process...

Student: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Prob (x): | 65 | 88 | 83 | 92 | 50 | 67 | 100 | 73 | 90 |
D.R.V (y): | 52 | 57 | 78 | 76 | 30 | 67 | 96 | 65 | 87 |
Rank x: | 8 | 4 | 5 | 2 | 9 | 7 | 1 | 6 | 3 |
Rank y: | 8 | 7 | 3 | 4 | 9 | 5 | 1 | 6 | 2 |
d: | 0 | 3 | 2 | 2 | 0 | 2 | 0 | 0 | 1 |
d^{2}: | 0 | 9 | 4 | 4 | 0 | 4 | 0 | 0 | 1 |

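Here ∑d^{2} = 22 and n = 9, so

$$r_{s} = 1 - \frac{6 \times 22}{9(9^{2} - 1)} = 1 - \frac{132}{720} \approx 0.817$$

Again, this indicates a **high positive correlation** between the students' marks.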
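To see how r_{s} relates to r, the sketch below computes the coefficient for the 9 students in two ways: from the rank differences, and by applying the product moment formula directly to the ranks. When there are no tied ranks, as here, the two calculations agree. The variable names are my own and the sketch assumes the standard formulas for both coefficients.

```python
from math import sqrt

# Ranks for students 1 to 9, taken from the table.
rank_x = [8, 4, 5, 2, 9, 7, 1, 6, 3]
rank_y = [8, 7, 3, 4, 9, 5, 1, 6, 2]
n = len(rank_x)

# Method 1: rank-difference formula.
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Method 2: product moment correlation coefficient applied to the ranks.
s_xy = sum(a * b for a, b in zip(rank_x, rank_y)) - sum(rank_x) * sum(rank_y) / n
s_xx = sum(a * a for a in rank_x) - sum(rank_x) ** 2 / n
s_yy = sum(b * b for b in rank_y) - sum(rank_y) ** 2 / n
r_on_ranks = s_xy / sqrt(s_xx * s_yy)

print(sum_d2)                # 22
print(round(r_s, 3))         # 0.817
print(round(r_on_ranks, 3))  # 0.817
```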