WEBVTT
00:00:00.990 --> 00:00:04.830
Find Spearman’s correlation coefficient between 𝑥 and 𝑦.
00:00:05.100 --> 00:00:07.440
Round your answer to three decimal places.
00:00:07.700 --> 00:00:13.000
Looking at this set of data, we see 𝑥-values are in the first row and 𝑦-values are in the second.
00:00:13.260 --> 00:00:17.800
These points then are bivariate, meaning they’re described by two variables.
00:00:18.060 --> 00:00:26.680
While our question asks us to find Spearman’s correlation coefficient between 𝑥 and 𝑦, we don’t actually use these data values directly to do that.
00:00:26.940 --> 00:00:30.130
Instead, we use what are called the ranks of these values.
00:00:30.320 --> 00:00:37.110
That’s because Spearman’s correlation coefficient describes the level of agreement between the relative ranks of bivariate data.
00:00:37.460 --> 00:00:40.810
To see what this means, let’s create two new rows in our data table.
00:00:41.200 --> 00:00:48.500
The first new row can represent the rank of the 𝑥-values in our data, and the second new row will represent the rank of the 𝑦-values.
00:00:48.820 --> 00:00:56.970
Considering first the 𝑥-values in our table, we can rank these values from smallest to largest using the numbering one, two, three, and so on.
00:00:57.220 --> 00:01:00.430
This means that our smallest 𝑥-value will get a rank of one.
00:01:00.850 --> 00:01:02.800
We see that smallest value is four.
00:01:02.930 --> 00:01:05.740
So that means 𝑅 sub 𝑥 for this value is one.
00:01:06.060 --> 00:01:10.540
The next lowest 𝑥-value is five, which means that this has a rank of two.
00:01:10.960 --> 00:01:13.860
Then comes seven, which must then have a rank of three.
00:01:14.180 --> 00:01:17.290
And next we see that we have two 𝑥-values of eight.
00:01:17.700 --> 00:01:21.030
We can say that these are the fourth and fifth lowest 𝑥-values.
00:01:21.230 --> 00:01:31.420
But since they’re the same number, we take these two rankings, four and five, find their average, which is 4.5, and assign that as the ranking of each of these numbers.
00:01:31.730 --> 00:01:34.840
Lastly then, the highest 𝑥-value on our table is 12.
00:01:35.140 --> 00:01:38.800
So this has a ranking of six, the sixth smallest value.
00:01:39.200 --> 00:01:41.080
So that’s what it means to rank our data.
00:01:41.290 --> 00:01:43.690
And now we’ll do the same thing for our 𝑦-values.
00:01:43.880 --> 00:01:48.120
The smallest 𝑦-value in the table is four, so that gets a rank of one.
00:01:48.520 --> 00:01:51.210
And then comes six, which we have three of.
00:01:51.640 --> 00:01:56.930
These values occupy then the second, third, and fourth places among our 𝑦-values.
00:01:57.220 --> 00:02:02.890
And since they’re all the same, we assign each of them the same ranking: the average of those three numbers, which is three.
00:02:03.210 --> 00:02:05.230
The next lowest 𝑦-value is seven.
00:02:05.510 --> 00:02:07.640
That is the fifth lowest 𝑦-value.
00:02:07.760 --> 00:02:09.650
So 𝑅 sub 𝑦 for this is five.
00:02:10.040 --> 00:02:14.470
And lastly, the highest 𝑦-value is 10, and so the ranking for this is six.
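The ranking procedure just described can be sketched in a few lines of Python. This is only an illustration: the data table itself is not part of this transcript, so the lists x and y below are reconstructed from the values and ranks read out in the narration, and the function name average_ranks is made up for this sketch.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean of their positions."""
    # Indices of the values, ordered from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` across a run of equal values (a tie).
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end (0-based) hold ranks pos+1..end+1; take their mean.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Assumed data, reconstructed from the narration (not shown in the transcript).
x = [4, 7, 8, 5, 8, 12]
y = [7, 6, 6, 4, 6, 10]

rank_x = average_ranks(x)
rank_y = average_ranks(y)
print(rank_x)  # [1.0, 3.0, 4.5, 2.0, 4.5, 6.0]
print(rank_y)  # [5.0, 3.0, 3.0, 1.0, 3.0, 6.0]
```

The two eights share the rank 4.5 and the three sixes share the rank three, matching the rows built in the table.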
00:02:14.930 --> 00:02:21.040
The results in these two rows of our table are what Spearman’s correlation coefficient is actually going to describe.
00:02:21.250 --> 00:02:27.830
This coefficient gives a quantitative indication of the level of agreement between the relative ranks of these data.
00:02:28.280 --> 00:02:37.190
Essentially, the closer 𝑅 𝑥 and 𝑅 𝑦 are for each point in the data set, the closer this correlation coefficient comes to positive one.
00:02:37.610 --> 00:02:44.690
As our next step then, let’s create a row in our table that indicates the difference between respective 𝑅 𝑥 and 𝑅 𝑦 values.
00:02:44.900 --> 00:02:50.060
We’ll say that this value 𝑑 sub 𝑖 equals 𝑅 𝑥 minus 𝑅 𝑦 for each data point.
00:02:50.480 --> 00:02:53.590
For our first data point then, we have one minus five.
00:02:53.890 --> 00:02:54.910
That’s negative four.
00:02:55.270 --> 00:03:10.340
Then we have three minus three, which is zero; next, 4.5 minus three, or 1.5; two minus one, or one; 4.5 minus three again, which is 1.5; and finally six minus six, which is zero.
00:03:10.880 --> 00:03:16.770
As a next step, let’s make a final row in our table where we square these difference values.
00:03:17.040 --> 00:03:19.610
That way, none of these relative differences will be negative.
00:03:19.970 --> 00:03:22.880
Negative four times negative four is positive 16.
00:03:23.160 --> 00:03:24.560
Zero squared is zero.
00:03:24.870 --> 00:03:27.680
1.5 squared is 2.25.
00:03:28.100 --> 00:03:29.470
One squared is one.
00:03:29.740 --> 00:03:32.150
1.5 squared again is 2.25.
00:03:32.320 --> 00:03:33.840
And zero squared is zero.
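As a quick check of these two rows, here is a short Python sketch that recomputes the differences and their squares from the ranks read out above. The variable names rank_x, rank_y, d, and d_squared are chosen just for this illustration.

```python
# Ranks as stated in the narration, one entry per data point.
rank_x = [1, 3, 4.5, 2, 4.5, 6]
rank_y = [5, 3, 3, 1, 3, 6]

# Difference in ranks for each data point, then its square.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
d_squared = [di ** 2 for di in d]

print(d)          # [-4, 0, 1.5, 1, 1.5, 0]
print(d_squared)  # [16, 0, 2.25, 1, 2.25, 0]
```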
00:03:34.260 --> 00:03:39.190
At this point, let’s recall the mathematical relationship for Spearman’s correlation coefficient.
00:03:39.510 --> 00:03:42.510
We often represent this coefficient using an 𝑟 sub 𝑠.
00:03:42.860 --> 00:03:54.300
It’s equal to one minus a fraction: six times the sum of the squared 𝑑 sub 𝑖 values, divided by 𝑛 times the quantity 𝑛 squared minus one, where 𝑛 is the number of data points in our set.
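Written out in symbols, with 𝑑 sub 𝑖 being the rank difference for the 𝑖th data point and 𝑛 the number of data points, the relationship just stated reads:

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\left(n^{2} - 1\right)},
\qquad d_i = R_{x_i} - R_{y_i}
```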
00:03:54.720 --> 00:04:04.680
To calculate Spearman’s correlation coefficient for a set of data then, the two things we need to know are the sum of all the 𝑑 sub 𝑖 squared values and also the total number of points in the set.
00:04:05.100 --> 00:04:11.810
Considering the sum of 𝑑 sub 𝑖 squared, we can solve for that by adding together all the results in the last row of our table.
00:04:12.180 --> 00:04:19.300
16 plus zero plus 2.25 plus one plus 2.25 plus zero adds up to 21.5.
00:04:19.780 --> 00:04:26.630
And then, regarding the number of data points in our set, we see that we have one, two, three, four, five, six such points.
00:04:27.040 --> 00:04:29.600
This means that, in our case, 𝑛 equals six.
00:04:29.960 --> 00:04:39.980
And now that we’ve gotten these two bits of information from the data in our set, we can clear away all the rows we created and move ahead to calculate 𝑟 sub 𝑠, Spearman’s correlation coefficient.
00:04:40.410 --> 00:04:44.570
We sub in our values for the sum of 𝑑 sub 𝑖 squared and 𝑛.
00:04:45.140 --> 00:04:50.340
And note that because 𝑛 equals six, one factor of six cancels from numerator and denominator.
00:04:50.660 --> 00:04:53.090
Moreover, six squared is 36.
00:04:53.230 --> 00:04:59.680
So we can express 𝑟 sub 𝑠 as one minus 21.5 over the quantity 36 minus one, which is 35.
00:05:00.230 --> 00:05:04.870
Calculating this out, we get 0.38571 and so on.
00:05:05.120 --> 00:05:08.870
But note that we want to give our final answer rounded to three decimal places.
00:05:09.170 --> 00:05:12.540
Doing this, we get a result of 0.386.
00:05:12.940 --> 00:05:17.820
To three decimal places, this is the Spearman’s correlation coefficient between 𝑥 and 𝑦.
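The final arithmetic can be reproduced with a minimal Python sketch. It starts from the squared differences read out earlier and applies the formula exactly as narrated; the variable names are chosen just for this illustration.

```python
# Squared rank differences, as stated in the narration.
d_squared = [16, 0, 2.25, 1, 2.25, 0]

n = len(d_squared)       # number of data points: 6
sum_d2 = sum(d_squared)  # 21.5

# r_s = 1 - 6 * (sum of d_i squared) / (n * (n squared - 1))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2)         # 21.5
print(round(r_s, 3))  # 0.386
```

This follows the formula used in the video. When the data contain tied ranks, as they do here, software that instead computes Pearson’s correlation on the ranks can return a slightly different value.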