WEBVTT
00:00:01.200 --> 00:00:10.600
The following table shows the number of units produced, 𝑥, and the production cost per unit in Egyptian pounds, 𝑦, at seven different factories.
00:00:11.280 --> 00:00:16.840
Calculate the value of Spearman’s rank correlation coefficient between 𝑥 and 𝑦.
00:00:16.840 --> 00:00:18.040
Determine the type of correlation.
00:00:19.800 --> 00:00:27.080
Spearman’s rank correlation coefficient is a way of quantifying the degree of correlation between the ranks of two variables.
00:00:27.080 --> 00:00:33.320
It measures the tendency for one variable to increase as the other does, but not necessarily in a linear way.
00:00:34.720 --> 00:00:46.000
The formula for calculating Spearman’s rank correlation coefficient is one minus a fraction: six multiplied by the sum of 𝑑𝑖 squared, all over 𝑛 multiplied by the quantity 𝑛 squared minus one.
00:00:47.640 --> 00:00:50.720
Here, 𝑛 represents the number of pairs of data.
00:00:50.720 --> 00:00:55.560
So in this question, 𝑛 will be seven, as there are seven pairs of data given in the table.
00:00:56.960 --> 00:01:03.640
𝑑𝑖 means the difference in the ranks of the 𝑖th pair of data, that is, the pair of data 𝑥𝑖, 𝑦𝑖.
00:01:04.000 --> 00:01:06.600
For example, the first pair of data is 𝑥 one, 𝑦 one.
00:01:06.600 --> 00:01:09.120
And the second is 𝑥 two, 𝑦 two, and so on.
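Written out in symbols, the formula just described looks like this. It is only a transcription of the spoken formula: the notation R(·) for "rank of" is our own assumption, and the direction of the subtraction in 𝑑𝑖 is a free choice because each difference is squared.

```latex
r = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad d_i = R(x_i) - R(y_i)
```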
00:01:10.640 --> 00:01:16.280
Before we can apply the formula for Spearman’s rank correlation coefficient, we must first rank the data ourselves.
00:01:16.680 --> 00:01:25.520
It doesn’t matter whether we award rank one to the smallest or the largest data value, as long as we’re consistent about what we do for the two variables.
00:01:27.040 --> 00:01:33.440
We can add two more rows to our table to fill in the 𝑥 rank and the 𝑦 rank.
00:01:33.440 --> 00:01:35.760
And let’s choose to assign rank one to the smallest piece of data in each case.
00:01:37.520 --> 00:01:47.880
For 𝑥 then, the smallest piece of data is 600, so this gets rank one; then 700, which gets rank two; then 1400, which gets rank three.
00:01:49.160 --> 00:01:51.720
There are then two equal pieces of data.
00:01:51.720 --> 00:01:56.760
The second and seventh values in the 𝑥 row are both 1500.
00:01:56.760 --> 00:01:59.720
Now, we have to decide how to assign the ranks in this instance.
00:02:00.400 --> 00:02:06.400
In an ordered list of the data values, these would take up the fourth and fifth places in the list.
00:02:07.640 --> 00:02:12.160
Therefore, we choose to assign both pieces of data the same rank of 4.5.
00:02:12.160 --> 00:02:14.240
That’s the average of four and five.
00:02:15.880 --> 00:02:16.880
We then continue.
00:02:16.880 --> 00:02:21.640
The next piece of data is 2000 which would be the sixth value in our ordered list.
00:02:21.640 --> 00:02:23.080
So it gets rank six.
00:02:23.080 --> 00:02:27.400
And finally, 2500 is the greatest value so it gets the rank seven.
00:02:28.600 --> 00:02:31.760
We then assign the ranks for the 𝑦 variable in the same way.
00:02:32.040 --> 00:02:37.600
But we notice straight away that there are two pieces of data which are both equal to 20, the smallest value.
00:02:39.040 --> 00:02:43.160
These would be the first and second values in an ordered list of the 𝑦 data.
00:02:43.360 --> 00:02:45.960
So we assign them both the rank of 1.5.
00:02:45.960 --> 00:02:47.760
That’s the average of one and two.
00:02:49.200 --> 00:02:55.320
The next smallest piece of data is 23 which gets the rank three because it would be the third value in an ordered list.
00:02:56.560 --> 00:03:04.280
And then we notice that there are two values both equal to 24, which would be the fourth and fifth values in an ordered list of the 𝑦 variable.
00:03:05.360 --> 00:03:07.800
So they both get rank 4.5.
00:03:07.800 --> 00:03:09.480
That’s the average of four and five.
00:03:11.040 --> 00:03:13.240
25 then gets rank six.
00:03:13.240 --> 00:03:15.600
And finally, 30 gets rank seven.
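The ranking we have just done by hand can be sketched in Python. This is a minimal sketch, not a definitive implementation: the helper name average_ranks is hypothetical, and because the table itself does not appear in this transcript, the order of the pairs below is reconstructed from the ranks and differences read out in the narration, so treat it as an assumption.

```python
def average_ranks(values):
    """Rank values with 1 = smallest; tied values share the mean of the places they occupy."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first_place = ordered.index(v) + 1   # first place v takes in the ordered list
        tie_count = ordered.count(v)         # how many values are tied with v (including v)
        # mean of the places first_place, ..., first_place + tie_count - 1
        ranks.append(first_place + (tie_count - 1) / 2)
    return ranks

# Assumed table order, reconstructed from the narration
x = [600, 1500, 1400, 700, 2000, 2500, 1500]   # units produced
y = [30, 24, 24, 25, 20, 20, 23]               # cost per unit

x_ranks = average_ranks(x)
y_ranks = average_ranks(y)
print(x_ranks)  # [1.0, 4.5, 3.0, 2.0, 6.0, 7.0, 4.5]
print(y_ranks)  # [7.0, 4.5, 4.5, 6.0, 1.5, 1.5, 3.0]
```

The two values of 1500 share rank 4.5, and the tied 20s and 24s in 𝑦 share ranks 1.5 and 4.5, matching the ranks found above.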
00:03:17.040 --> 00:03:26.000
This method of dealing with the tied ranks by awarding an average rank to both pieces of data is appropriate in this case as there are only a small number of tied ranks.
00:03:26.000 --> 00:03:29.200
There are other methods that we could consider such as Kendall’s tau.
00:03:29.520 --> 00:03:32.240
But the method we’ve used is fine in this instance.
00:03:33.560 --> 00:03:37.400
Next, we need to work out the difference in the ranks awarded to each pair of data.
00:03:37.800 --> 00:03:44.920
It doesn’t matter whether we subtract the rank awarded to 𝑥 from the rank awarded to 𝑦 or vice versa, as long as we’re consistent about what we do for every pair.
00:03:44.920 --> 00:03:49.320
Let’s choose to subtract the ranks of 𝑦 from the ranks of 𝑥.
00:03:50.720 --> 00:04:00.040
First, we have one minus seven which is equal to negative six, then 4.5 minus 4.5 which gives zero.
00:04:00.400 --> 00:04:09.480
We now work out all the other differences in the same way, giving negative 1.5, negative four, 4.5, 5.5, and 1.5.
00:04:10.880 --> 00:04:18.280
At this point, we can perform a quick check of our work so far, because it should always be the case that the sum of these differences is equal to zero.
00:04:18.880 --> 00:04:27.640
If we add up negative six, zero, negative 1.5, negative four, 4.5, 5.5, and 1.5, we do indeed get zero.
00:04:28.000 --> 00:04:31.440
So this helps us be confident that the work we’ve done so far is correct.
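The same check can be sketched in a few lines of Python. The two rank lists are typed in by hand, in the pair order implied by the differences read out above, so that order is an assumption. The check works because both sets of ranks add up to the same total, here 28, so their differences must cancel.

```python
# Ranks per pair (assumed order, taken from the narration)
x_ranks = [1, 4.5, 3, 2, 6, 7, 4.5]
y_ranks = [7, 4.5, 4.5, 6, 1.5, 1.5, 3]

# Subtract the rank of y from the rank of x for every pair
differences = [rx - ry for rx, ry in zip(x_ranks, y_ranks)]
print(differences)       # [-6, 0.0, -1.5, -4, 4.5, 5.5, 1.5]

# Both rank lists total 28, so the differences sum to zero
print(sum(differences))  # 0.0
```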
00:04:33.080 --> 00:04:36.040
Finally, we need to work out the squares of these differences.
00:04:36.280 --> 00:04:42.600
And this is why it doesn’t actually matter which way around we subtract the ranks, because we’re going to end up squaring the differences anyway.
00:04:43.840 --> 00:04:46.200
Negative six squared gives 36.
00:04:47.360 --> 00:04:48.920
Zero squared gives zero.
00:04:49.440 --> 00:04:59.680
We square all the remaining differences in the same way, giving 2.25, 16, 20.25, 30.25, and 2.25.
00:05:01.160 --> 00:05:05.040
Now, we’re nearly ready to apply our Spearman’s rank correlation coefficient formula.
00:05:05.400 --> 00:05:08.560
But first, we need to work out the sum of the squared differences.
00:05:09.040 --> 00:05:12.480
That’s the sum of the seven values in the final row of our table.
00:05:13.960 --> 00:05:16.440
Adding all these values up gives 107.
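As a quick sketch of this step, we can square the seven differences and total them in Python. The list of differences is copied from the values stated earlier; the variable names are our own.

```python
# Rank differences for the seven pairs, as stated in the narration
differences = [-6, 0, -1.5, -4, 4.5, 5.5, 1.5]

# Squaring removes the sign, so the direction of subtraction no longer matters
squared = [d ** 2 for d in differences]
print(squared)       # [36, 0, 2.25, 16, 20.25, 30.25, 2.25]

sum_d_squared = sum(squared)
print(sum_d_squared) # 107.0
```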
00:05:18.080 --> 00:05:25.720
Now we substitute the relevant values into our formula for Spearman’s rank correlation coefficient. The sum of 𝑑𝑖 squared is 107.
00:05:26.080 --> 00:05:27.640
And the value of 𝑛 is seven.
00:05:27.640 --> 00:05:34.160
So we have one minus six multiplied by 107 over seven multiplied by seven squared minus one.
00:05:35.560 --> 00:05:37.120
Seven squared is 49.
00:05:37.120 --> 00:05:39.040
And subtracting one gives 48.
00:05:40.120 --> 00:05:44.360
In the numerator, six multiplied by 107 is 642.
00:05:45.520 --> 00:05:49.840
And in the denominator, seven multiplied by 48 is 336.
00:05:49.840 --> 00:05:54.720
So we have one minus 642 over 336.
00:05:55.800 --> 00:06:04.200
Evaluating this on a calculator and converting our answer to a decimal gives negative 0.9107 and then the decimal continues.
00:06:05.400 --> 00:06:08.680
We haven’t been asked to give our answer to a particular degree of accuracy.
00:06:08.680 --> 00:06:10.880
So let’s use three significant figures.
00:06:11.360 --> 00:06:14.280
In this case, the fourth significant figure is the seven.
00:06:14.680 --> 00:06:17.760
And as this is greater than five, it tells us that we’re rounding up.
00:06:18.200 --> 00:06:22.200
So the zero in the third decimal place will round up to become a one.
00:06:23.960 --> 00:06:30.240
We’ve calculated the value of 𝑟 then to be negative 0.911 correct to three significant figures.
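The substitution and rounding can be sketched in Python as follows. The function name spearman_coefficient is hypothetical, and exact fractions are used here only so the intermediate value can be inspected; this is one way to do the arithmetic, not the only one.

```python
from fractions import Fraction

def spearman_coefficient(sum_d_squared, n):
    """Evaluate 1 - 6 * (sum of d_i squared) / (n * (n^2 - 1)) exactly."""
    return 1 - 6 * Fraction(sum_d_squared) / (n * (n ** 2 - 1))

r = spearman_coefficient(107, 7)   # 1 - 642/336
print(r)                           # -51/56
print(float(r))                    # -0.9107142857...

# For this value, three decimal places and three significant figures coincide
print(round(float(r), 3))          # -0.911
```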
00:06:32.120 --> 00:06:44.040
Now, the question also asked us to determine the type of correlation, which means we need to interpret what this value of 𝑟 tells us about 𝑥 and 𝑦.
00:06:44.040 --> 00:06:48.160
To do so, we recall that this correlation coefficient always takes a value between negative one and one inclusive.
00:06:48.760 --> 00:06:58.080
A value of positive one means that there is perfect positive rank agreement between 𝑥 and 𝑦, which means that the smallest value of 𝑥 is paired with the smallest value of 𝑦.
00:06:58.480 --> 00:07:07.320
The second smallest value of 𝑥 is paired with the second smallest value of 𝑦 and so on, all the way up to the largest value of 𝑥 being paired with the largest value of 𝑦.
00:07:08.920 --> 00:07:15.320
A value of negative one means there is perfect negative rank correlation between 𝑥 and 𝑦, which means the opposite.
00:07:15.560 --> 00:07:19.960
The smallest value of 𝑥 is paired with the largest value of 𝑦 and vice versa.
00:07:21.360 --> 00:07:31.480
In this case, our value of negative 0.911 is pretty close to negative one, which means that there is strong negative rank correlation between 𝑥 and 𝑦.
00:07:32.880 --> 00:07:35.640
This makes sense if we consider the context of this problem.
00:07:36.160 --> 00:07:39.520
The larger the number of units you produce, the more efficient production will be.
00:07:39.520 --> 00:07:42.080
And so the production cost per unit will be lower.
00:07:42.760 --> 00:07:44.240
We have our answer to the problem then.
00:07:44.720 --> 00:07:51.080
The value of Spearman’s rank correlation coefficient to three significant figures is negative 0.911.
00:07:51.560 --> 00:07:56.240
And we conclude that there is a strong negative rank correlation between 𝑥 and 𝑦.