WEBVTT
00:00:00.950 --> 00:00:06.970
What is the most likely value of the product-moment correlation coefficient for the data shown in the diagram?
00:00:07.420 --> 00:00:17.980
(A) Zero, (B) negative 0.94, (C) negative 0.58, (D) 0.37, (E) 0.78.
00:00:18.430 --> 00:00:24.450
Looking at our graph, we see that it consists of data where each data point has an 𝑥- as well as a 𝑦-value.
00:00:24.830 --> 00:00:34.860
Data sets consisting of two variables are called bivariate, and such sets can be described quantitatively by what’s called a product-moment correlation coefficient.
00:00:35.160 --> 00:00:38.470
Another name for this is the Pearson correlation coefficient.
00:00:38.720 --> 00:00:47.020
And the whole idea is to use a single number, this coefficient, to describe how well one of the variables in the data set correlates with the other.
00:00:47.350 --> 00:00:52.350
The correlation coefficient can take on values anywhere between negative one and one.
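For reference, the standard textbook definition of this coefficient can be written out as below. The symbols are assumptions, since the lesson does not name them: 𝑥ᵢ and 𝑦ᵢ are the coordinates of the 𝑖th data point, and x̄ and ȳ are the means of the 𝑥- and 𝑦-values.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

The numerator is positive when 𝑥 and 𝑦 tend to move together and negative when they move in opposite directions, and the denominator scales the result so that it stays between negative one and one.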
00:00:52.530 --> 00:00:58.410
And actually, in both of these extreme cases, that coefficient value describes perfect linear correlation.
00:00:58.660 --> 00:01:05.820
A coefficient value of negative one would describe a downward-trending data set that perfectly follows the line of best fit.
00:01:06.070 --> 00:01:09.430
That is, all the points in the data set lie along the same line.
00:01:09.860 --> 00:01:17.390
A correlation coefficient of positive one means the same thing, but for a data set that follows a positively sloping best fit line.
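As a quick check of these two extreme cases, here is a minimal Python sketch. The function name `pearson_r` and the two small data sets are made up for illustration and are not the data from the diagram.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only: every point lies exactly on a straight line.
xs = [0, 1, 2, 3, 4]
falling = [3 - 2 * x for x in xs]   # downward-sloping line
rising = [1 + 0.5 * x for x in xs]  # upward-sloping line

r_falling = pearson_r(xs, falling)
r_rising = pearson_r(xs, rising)
print(round(r_falling, 6), round(r_rising, 6))
```

Because every point sits exactly on its line, the two results should come out at negative one and positive one, up to floating-point rounding. Note that the steepness of the line does not matter, only its direction.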
00:01:17.830 --> 00:01:28.980
Midway between these extremes, a correlation coefficient of zero suggests that there is no linear correlation between the two variables, and the coefficient can also take any value between the ones named so far.
00:01:29.380 --> 00:01:37.160
Looking at the set of data in our diagram, if we were to draw a best fit line for this set of data, we might draw it in by hand like this.
00:01:37.440 --> 00:01:45.430
Clearly, there is an inverse or a negative correlation between the values of 𝑥 and the values of 𝑦; that is, as 𝑥 gets larger, 𝑦 gets smaller.
00:01:45.840 --> 00:01:51.260
This tells us that the correlation coefficient for this set of data lies somewhere below zero.
00:01:51.570 --> 00:01:55.310
If the line of best fit had a positive slope to it, the opposite would be true.
00:01:55.440 --> 00:02:00.140
But here we see there is indeed a negative or inverse correlation between 𝑥 and 𝑦.
00:02:00.410 --> 00:02:05.620
Looking then at our five answer options, we can see that what we’ve learned so far eliminates several of them.
00:02:05.830 --> 00:02:09.030
Any positive correlation coefficients are out of consideration.
00:02:09.200 --> 00:02:12.570
That means options (D) and (E) can’t be our final choice.
00:02:12.750 --> 00:02:20.450
And we also know that option (A), which suggests that there is no correlation between the 𝑥- and 𝑦-variables in our data set, isn’t a valid answer either.
00:02:20.690 --> 00:02:23.310
This leaves us with answer choices (B) and (C).
00:02:23.560 --> 00:02:29.180
These are both negative values, and we see that one is closer to the extreme value of negative one than the other.
00:02:29.570 --> 00:02:40.120
Across this range of Pearson correlation coefficients, the difference between values comes down to how tightly the data points in a data set cluster about the best fit line.
00:02:40.310 --> 00:02:49.900
For example, if we looked at data sets represented by correlation coefficients of negative 0.9 and negative 0.2, then, respectively, they might look like this.
00:02:50.060 --> 00:03:00.570
The data points in the data set represented by a correlation coefficient of negative 0.9 are much more tightly clustered about the line of best fit compared with those in the other data set.
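To make this comparison concrete, here is a rough Python sketch that simulates two data sets, one aiming for a coefficient near negative 0.9 and one near negative 0.2. Everything in it is an assumption for illustration: the helper names `pearson_r` and `simulate`, the sample size, the random seed, and the choice of normally distributed values. It is not how the diagrams in the lesson were produced.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(rho, n=5000, seed=0):
    """Hypothetical helper: draw n points whose correlation is roughly rho."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    # Mix the linear trend with independent noise; a larger |rho| leaves
    # less room for noise, so points sit closer to the line.
    ys = [rho * x + math.sqrt(1 - rho ** 2) * e for x, e in zip(xs, noise)]
    return xs, ys

r_tight = pearson_r(*simulate(-0.9))
r_loose = pearson_r(*simulate(-0.2))
print(f"tight cluster: r = {r_tight:.2f}")
print(f"loose cluster: r = {r_loose:.2f}")
```

The sample coefficients will not match the targets exactly, since the points are random, but with a few thousand points they should land close to them. The only thing that differs between the two runs is how much scatter is mixed in around the line.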
00:03:00.890 --> 00:03:08.860
As we look at the data shown in our given diagram, we can see that the points are neither extremely far away from the best fit line nor extremely close to it.
00:03:09.000 --> 00:03:15.980
This suggests that the correlation coefficient is neither particularly close to negative one nor particularly close to zero.
00:03:16.330 --> 00:03:24.600
For this reason, of our two answer options (B) and (C), we’ll choose the one that is closer to a Pearson correlation coefficient of negative 0.5.
00:03:24.770 --> 00:03:26.100
That’s option (C).
00:03:26.480 --> 00:03:35.220
So, of these answer choices, we would say that negative 0.58 is the most likely value of the product-moment correlation coefficient for the given data set.
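The elimination steps in this solution can be sketched in a few lines of Python. The option values come from the question, while the variable names and the visual estimate of negative 0.5 are assumptions standing in for a judgment made by eye.

```python
# Answer options from the question, keyed by letter.
options = {"A": 0.0, "B": -0.94, "C": -0.58, "D": 0.37, "E": 0.78}

# Step 1: the trend is downward, so keep only strictly negative coefficients.
negative = {label: r for label, r in options.items() if r < 0}

# Step 2: the scatter is moderate, so pick the value nearest a rough
# by-eye estimate (an assumption, not a measured quantity).
visual_estimate = -0.5
best = min(negative, key=lambda label: abs(negative[label] - visual_estimate))

print(sorted(negative), best)
```

Step 1 leaves options (B) and (C), and step 2 selects (C), matching the reasoning above.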