WEBVTT
00:00:00.760 --> 00:00:08.290
In this video, we will learn how to decide between taking a sample and using the whole population.
00:00:09.610 --> 00:00:15.340
We will begin by defining what we mean by these terms when dealing with statistics.
00:00:16.560 --> 00:00:21.100
Statistics revolves around the study of data sets.
00:00:21.790 --> 00:00:28.790
In this video, we will discuss two important types of data sets, populations and samples.
00:00:30.150 --> 00:00:34.680
A population includes all of the elements from a set of data.
00:00:35.910 --> 00:00:42.230
A sample, on the other hand, consists of one or more observations drawn from the population.
00:00:43.390 --> 00:00:55.340
Whilst we will not focus on them in this video, there are many different ways of obtaining a sample: for example, random sampling, systematic sampling, and stratified sampling.
00:00:56.460 --> 00:01:03.490
In this video, we will only be looking at whether we should choose the whole population or a sample of the population.
00:01:04.510 --> 00:01:08.980
A sample usually has fewer observations than the population.
00:01:09.950 --> 00:01:15.970
We use a sample when constraints make it impractical or impossible to study the whole population.
00:01:17.140 --> 00:01:20.300
The most common constraints are time and money.
00:01:20.690 --> 00:01:26.700
However, there are other constraints that could also impact our ability to study the whole population.
00:01:28.400 --> 00:01:32.460
We will now look at some specific questions in context.
00:01:34.160 --> 00:01:41.780
Which of the following data sets would be suitable to check the education level in the poor villages in Africa?
00:01:42.580 --> 00:01:47.180
Is it (A) mass population or (B) samples?
00:01:48.450 --> 00:01:54.240
When deciding which data set to use, we need to factor in any constraints.
00:01:55.390 --> 00:02:00.350
Two of the biggest constraints when collecting data are time and money.
00:02:01.550 --> 00:02:11.610
In this particular question, we need to ask ourselves whether it is possible to check the education level of every child in the poor villages in Africa.
00:02:12.900 --> 00:02:17.290
If this were a sensible method, we could use the mass population.
00:02:18.280 --> 00:02:25.470
However, as it is not realistic to visit every village in Africa, we need to choose samples.
00:02:26.570 --> 00:02:34.270
We could choose a sample of different villages and then a sample of children from each of the villages chosen.
00:02:35.370 --> 00:02:41.500
This would be the most suitable way to check the education level in the villages in Africa, so the correct answer is option (B), samples.
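The two-step choice described here, villages first and then children within each chosen village, can be sketched in Python. This is only an illustration under assumed data: the village names, the child labels, the counts, and the helper `two_stage_sample` are all hypothetical and do not come from the video.

```python
import random

def two_stage_sample(villages, n_villages, n_children, seed=0):
    """Pick n_villages at random, then up to n_children from each chosen village."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(villages), n_villages)
    return {
        name: rng.sample(villages[name], min(n_children, len(villages[name])))
        for name in chosen
    }

# Hypothetical data: 20 villages, each with 50 children identified by a label.
villages = {
    f"village_{v}": [f"child_{v}_{c}" for c in range(50)]
    for v in range(20)
}

sample = two_stage_sample(villages, n_villages=4, n_children=5)
surveyed = sum(len(children) for children in sample.values())
print(f"Surveyed {surveyed} of {20 * 50} children")
```

With these assumed numbers, only a small fraction of the children needs to be visited, which is the practical reason for sampling given here.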
00:02:43.390 --> 00:02:50.100
Which of the following data sets is suitable to calculate how many hospitals there are in a city?
00:02:50.640 --> 00:02:55.170
Is it (A) mass population or (B) samples?
00:02:56.420 --> 00:03:02.460
When deciding which type of data set to choose, we need to consider any constraints.
00:03:03.400 --> 00:03:10.250
These include time and money, but they also include what we are trying to find out from our question.
00:03:11.640 --> 00:03:16.760
In this question, we need to calculate the number of hospitals in a city.
00:03:17.850 --> 00:03:20.420
This means that we want an exact answer.
00:03:21.830 --> 00:03:30.240
As a result, taking a sample would not help, because there could be more hospitals in some areas of the city than in others.
00:03:31.530 --> 00:03:38.010
In order to calculate how many hospitals are in a city, we would need to count each individual hospital.
00:03:38.500 --> 00:03:41.550
This means that we need to use the whole population.
00:03:42.910 --> 00:03:45.840
The correct answer is therefore option (A).
00:03:46.010 --> 00:03:50.460
The data set that is most suitable is mass population.
00:03:52.710 --> 00:04:01.890
In the next two questions, we need to identify whether the data collected is a population characteristic or a sample statistic.
00:04:03.390 --> 00:04:07.770
Olivia knows all the families living in her area quite well.
00:04:08.200 --> 00:04:14.280
She says that she has found out that the average number of children per family is 2.3.
00:04:15.180 --> 00:04:20.770
Is this figure a sample statistic or a population characteristic?
00:04:22.200 --> 00:04:27.200
We recall that a population includes all the elements from a data set.
00:04:28.110 --> 00:04:34.410
A sample, on the other hand, consists of one or more observations drawn from the population.
00:04:35.530 --> 00:04:42.480
The keyword in this question is “all,” as it states that Olivia knows all the families in her area.
00:04:43.410 --> 00:04:50.720
She has found out the average number of children per family using the entire population of her area.
00:04:52.040 --> 00:04:56.630
The correct answer is therefore a population characteristic.
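As a rough sketch of the distinction, the Python snippet below computes an average from every family in a small area and then from only some of them. The family counts are invented for illustration, chosen so that the full average happens to be 2.3; they are not Olivia's actual data.

```python
import random
from statistics import mean

# Hypothetical data: number of children in each of the 10 families in an area.
children_per_family = [2, 3, 1, 4, 2, 0, 3, 2, 5, 1]

# Using every family gives a population characteristic.
population_mean = mean(children_per_family)

# Using only some of the families gives a sample statistic,
# which can change depending on which families are picked.
rng = random.Random(1)
sample = rng.sample(children_per_family, 4)
sample_mean = mean(sample)

print(f"Population characteristic: {population_mean}")
print(f"Sample statistic: {sample_mean}")
```

The first value is fixed once the area is fixed, while the second depends on the families drawn.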
00:04:59.120 --> 00:05:07.800
A study claims that 96 percent of people aged 16 to 24 in a certain country own a smartphone.
00:05:08.480 --> 00:05:13.380
Is this a sample statistic or a population characteristic?
00:05:14.860 --> 00:05:19.870
We recall that a population includes all the elements from a data set.
00:05:20.910 --> 00:05:26.720
In this question, this would be all the people aged 16 to 24 in a country.
00:05:27.560 --> 00:05:33.620
A sample, on the other hand, consists of one or more observations drawn from the population.
00:05:34.700 --> 00:05:42.410
Due to the constraints of time and money, it would be very difficult to ask every 16- to 24-year-old in a country.
00:05:43.010 --> 00:05:47.090
Typically, this would only happen when conducting a census.
00:05:48.070 --> 00:05:55.650
This means that the 96 percent figure claimed by the study is almost certainly based on a sample of the population.
00:05:56.730 --> 00:06:01.050
The correct answer is therefore a sample statistic.
00:06:02.290 --> 00:06:09.750
Any study of this type will not be able to ask the entire population but instead will focus on a sample.
00:06:10.710 --> 00:06:14.980
This sample could have been obtained using a variety of methods.
00:06:15.260 --> 00:06:22.220
Random sampling, systematic sampling, and stratified sampling are examples of such methods.
00:06:23.270 --> 00:06:29.100
In our final example, we will identify some keywords involved in sampling.
00:06:30.400 --> 00:06:33.850
Which of these makes an inference in statistics?
00:06:34.170 --> 00:06:37.640
Is it (A) computing a statistic from the sample?
00:06:38.170 --> 00:06:42.790
(B) Generating a random sample from a given population.
00:06:43.520 --> 00:06:49.040
(C) Applying conclusions drawn from a sample to a whole population.
00:06:49.630 --> 00:06:56.450
Or (D) working out the percentage of the population that exhibits a certain characteristic.
00:06:57.440 --> 00:07:05.060
Statistical inference is the process of using data analysis to deduce properties of a population.
00:07:06.250 --> 00:07:12.600
This means that we’re looking to draw conclusions from a sample that could apply to the whole population.
00:07:13.520 --> 00:07:16.410
The correct answer is therefore option (C).
00:07:16.720 --> 00:07:22.380
An inference applies conclusions drawn from a sample to a whole population.
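A minimal Python sketch of this idea follows, assuming a made-up population in which the true ownership rate is set to 96 percent so that the estimate can be compared with it. In a real study the true value would be unknown, and the population size and sample size used here are arbitrary assumptions.

```python
import random

rng = random.Random(42)

# Hypothetical population of 100,000 people; True means "owns a smartphone".
# The 96 percent rate is built in so the estimate can be checked against it.
population = [i < 96_000 for i in range(100_000)]

# Survey a simple random sample of 1,000 people instead of everyone.
sample = rng.sample(population, 1_000)
sample_proportion = sum(sample) / len(sample)

# Inference: treat the sample result as an estimate for the whole population.
population_proportion = sum(population) / len(population)
print(f"Sample estimate: {sample_proportion:.1%}")
print(f"True population value: {population_proportion:.1%}")
```

The sample estimate will usually be close to, but not exactly equal to, the population value, which is why an inference is a conclusion about the population rather than a direct measurement of it.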
00:07:23.870 --> 00:07:27.520
We will now summarize the key points from this video.
00:07:28.710 --> 00:07:34.670
We found out in this video that a population contains all elements of a data set.
00:07:35.660 --> 00:07:43.240
As a sample consists of one or more observations from the population, it is a subset of the population.
00:07:43.950 --> 00:07:51.620
This can be shown in the given diagram where the sample is a selection from the larger group or population.
00:07:52.760 --> 00:07:57.630
All elements of the sample must be contained within the population.
00:07:58.750 --> 00:08:06.200
We also found out that we can analyze a sample to infer properties of an entire population.
00:08:07.310 --> 00:08:14.940
This allows us to form hypotheses or draw conclusions without asking the entire population.