Correlation

The product moment correlation coefficient

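The product moment correlation coefficient, r, measures how closely the points on a scatter diagram lie to a straight line, that is, the strength of the linear relationship between the two variables.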

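The value of r always lies between -1 and 1 inclusive. If the correlation is positive and the points lie exactly on a straight line, then both regression lines coincide and r = 1. If the points lie exactly on a straight line with negative gradient, r = -1, and a value of r close to 0 indicates little or no linear correlation. The following diagrams show the sort of correlation obtained for various values of r...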

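$$ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} $$

Where:

$$ S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \qquad S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} $$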

This may look like a pretty mean formula, but considering you are mostly given

• n
• ∑ x
• ∑ y
• ∑ x²
• ∑ y²
• ∑ xy

as summarised data, we only need to use these in calculating Sxy, Sxx and Syy.

(If you are not given summarised data, you should be able to obtain these values by entering the raw data into your calculator.)

Example:

We will use the data seen earlier of the test results for the first 2 tests on S1, probability and discrete random variables, for 12 sixth form students. We will then calculate the value of the product moment correlation coefficient, r.

 Student:     1   2   3   4   5   6   7    8    9   10  11  12
 Prob (x):    65  88  83  92  50  67  100  100  73  90  83  94
 D.R.V (y):   52  57  78  76  30  67  96   74   65  87  78  89

The raw data above has been summarised into the following:

 n = 12, ∑x = 985, ∑y = 849, ∑x² = 83465, ∑y² = 63693, ∑xy = 72266
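The working from these totals to r can be set out roughly as follows. This is a sketch that assumes the usual definitions S_xy = ∑xy − (∑x ∑y)/n, S_xx = ∑x² − (∑x)²/n, S_yy = ∑y² − (∑y)²/n and r = S_xy / √(S_xx S_yy); intermediate figures are rounded for display.

$$
\begin{aligned}
S_{xy} &= 72266 - \frac{985 \times 849}{12} = 2577.25 \\
S_{xx} &= 83465 - \frac{985^2}{12} \approx 2612.92 \\
S_{yy} &= 63693 - \frac{849^2}{12} = 3626.25 \\
r &= \frac{2577.25}{\sqrt{2612.92 \times 3626.25}} \approx 0.837
\end{aligned}
$$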

Hence the product moment correlation coefficient r = 0.837 indicating a high positive correlation.

In terms of our example, it appears that the better the student did in the first test the better they did in the second.
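If you want to check a calculation like this away from your calculator, a short Python sketch along the following lines may help. The function name pmcc_from_sums and the variable names are illustrative choices, not part of the course material; the code simply builds the six summary totals from the raw marks and applies r = S_xy / √(S_xx S_yy).

```python
from math import sqrt


def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product moment correlation coefficient from summarised data."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


# Raw marks for the 12 students in the example above.
prob = [65, 88, 83, 92, 50, 67, 100, 100, 73, 90, 83, 94]
drv = [52, 57, 78, 76, 30, 67, 96, 74, 65, 87, 78, 89]

# Build the summarised data: n, sum x, sum y, sum x^2, sum y^2, sum xy.
sums = (
    len(prob),
    sum(prob),
    sum(drv),
    sum(x * x for x in prob),
    sum(y * y for y in drv),
    sum(x * y for x, y in zip(prob, drv)),
)

r = pmcc_from_sums(*sums)
print(sums)         # (12, 985, 849, 83465, 63693, 72266)
print(round(r, 3))  # 0.837
```

Working from the totals rather than the raw lists mirrors the exam situation, where the summarised data is usually what you are given.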

Spearman's Coefficient of rank correlation

Spearman's coefficient of rank correlation, rs, is another measure of how strongly two variables are correlated. Like the product moment correlation coefficient, r, the value of rs lies between -1 and 1, and values of rs are interpreted in the same way as values of r.

rs is calculated by ranking the data in order of size and working with the ranks instead of the raw values. It can be used as an approximation to the product moment correlation coefficient.

The formula used to calculate Spearman's coefficient of rank correlation, rs, is:

$$ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$

where

n = number of items to be ranked

d = rank difference

The rank difference (d) needs a little more explaining but this is best done by way of an example.

Example:

Two judges independently judge the exhibits of 8 contestants at a vegetable show. Their placings are given in the table...

 Contestant:     A  B  C  D  E  F  G  H
 Judge 1 (x):    4  3  1  2  8  7  6  5
 Judge 2 (y):    4  1  2  3  8  5  7  6

As the data is already ranked we can look straight away at the rank difference.

Contestant A was ranked 4th and 4th

4 − 4 = 0 hence the rank difference, d = 0

Contestant B was ranked 3rd and 1st

3 − 1 = 2 hence d = 2.

We can add these results, along with d², to our table. Only the size of each difference is recorded, since the sign disappears when d is squared.

 Contestant:     A  B  C  D  E  F  G  H
 Judge 1 (x):    4  3  1  2  8  7  6  5
 Judge 2 (y):    4  1  2  3  8  5  7  6
 Difference d:   0  2  1  1  0  2  1  1
 d²:             0  4  1  1  0  4  1  1

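This gives us the following data: ∑d² = 12 and n = 8, so

$$ r_s = 1 - \frac{6 \times 12}{8(8^2 - 1)} = 1 - \frac{72}{504} \approx 0.857 $$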

This indicates a very high positive correlation and we can conclude that the judges appeared to agree very closely on their rankings.
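The rank-difference process can also be sketched in a few lines of Python. The function name spearman_from_ranks is a hypothetical choice for illustration, and the sketch assumes the standard formula rs = 1 − 6∑d² / (n(n² − 1)) with no tied ranks.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation coefficient from two lists of ranks.

    Assumes both lists have the same length and contain no tied ranks.
    """
    n = len(rank_x)
    # Squaring removes the sign, so the order of subtraction does not matter.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Placings for contestants A to H from the two judges.
judge_1 = [4, 3, 1, 2, 8, 7, 6, 5]
judge_2 = [4, 1, 2, 3, 8, 5, 7, 6]

rs = spearman_from_ranks(judge_1, judge_2)
print(round(rs, 3))  # 0.857
```

Identical rankings give ∑d² = 0 and so rs = 1, while completely reversed rankings give rs = -1.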

In our example above the data was already ranked for us. If this had not been the case, we would have needed to rank the data ourselves. We will take the data for 9 of our sixth form students' first 2 tests and rank them.

 Student:     1   2   3   4   5   6   7    8   9
 Prob (x):    65  88  83  92  50  67  100  73  90
 D.R.V (y):   52  57  78  76  30  67  96   65  87

The order of ranking does not matter, but must be the same for both tests. I will choose to rank highest to lowest.

Test 1 (probability, x):
Student 7 is ranked 1 (100)
Student 4 is ranked 2 (92)
...
Student 5 is ranked 9 (50)

Test 2 (d.r.v., y):
Student 7 is ranked 1 (96)
Student 9 is ranked 2 (87)
...
Student 5 is ranked 9 (30)

We can add these values to the table and carry out the difference process...

 Student:     1   2   3   4   5   6   7    8   9
 Prob (x):    65  88  83  92  50  67  100  73  90
 D.R.V (y):   52  57  78  76  30  67  96   65  87
 Rank x:      8   4   5   2   9   7   1    6   3
 Rank y:      8   7   3   4   9   5   1    6   2
 d:           0   3   2   2   0   2   0    0   1
 d²:          0   9   4   4   0   4   0    0   1

Here ∑d² = 22 and n = 9, so rs = 1 − (6 × 22)/(9 × 80) = 1 − 132/720 ≈ 0.817.

Again, this indicates a high positive correlation between the students' marks.
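When the data is not already ranked, the ranking step can be automated as well. The sketch below ranks from highest to lowest, as chosen above, and then applies rs = 1 − 6∑d² / (n(n² − 1)). The helper names rank_descending and spearman_rs are made up for this illustration, and the code assumes there are no tied marks (ties need special treatment that is not covered here).

```python
def rank_descending(values):
    """Rank each value from highest (rank 1) to lowest.

    Assumes all values are distinct, i.e. there are no ties.
    """
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman's rank correlation coefficient from unranked data."""
    rank_x = rank_descending(xs)
    rank_y = rank_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Marks for the 9 students in the example above.
prob = [65, 88, 83, 92, 50, 67, 100, 73, 90]
drv = [52, 57, 78, 76, 30, 67, 96, 65, 87]

print(rank_descending(prob))             # [8, 4, 5, 2, 9, 7, 1, 6, 3]
print(rank_descending(drv))              # [8, 7, 3, 4, 9, 5, 1, 6, 2]
print(round(spearman_rs(prob, drv), 3))  # 0.817
```

Ranking both tests in the same direction is what matters; ranking both from lowest to highest instead would leave every difference d, and therefore rs, unchanged.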