WEBVTT
00:00:00.000 --> 00:00:04.880
Imagine if you had a lie detector that was able to detect truth or lies nine times out of 10.
00:00:05.080 --> 00:00:08.880
You’d nearly always be able to tell if someone was telling the truth or telling a lie!
00:00:09.200 --> 00:00:13.280
How brilliant would that be for solving crimes or finding out who ate the last cookie without telling anyone!
00:00:13.840 --> 00:00:25.680
Let’s put aside the question of whether or not people can learn to beat lie detectors by pinching themselves or learning to imagine relaxing on a beach while answering your questions and assume that the lie detector is gonna be equally accurate with everyone who takes a test.
00:00:25.880 --> 00:00:29.080
That means, all you have to do is wire someone up to the lie detector.
00:00:29.080 --> 00:00:32.480
And if the big red lie light flashes, then you’re 90 percent sure they’re lying.
00:00:32.720 --> 00:00:33.760
Fantastic!
00:00:34.000 --> 00:00:34.840
But, wait a minute!
00:00:34.840 --> 00:00:37.720
We’ve forgotten some basic conditional probability here.
00:00:37.720 --> 00:00:43.680
A test being 90 percent accurate is not the same thing as being 90 percent certain that someone is lying if the machine says they are.
00:00:43.960 --> 00:00:53.160
If the lie detector is 90 percent accurate, then it correctly identifies nine out of 10 lies as lies and incorrectly labels one out of 10 lies as true.
00:00:53.160 --> 00:00:59.840
It’ll also correctly identify that someone is telling the truth nine times out of 10 and incorrectly accuse them of lying one in 10 times.
00:01:00.040 --> 00:01:04.160
This is all well and good, if everyone tells the truth or lies in equal measure.
00:01:04.160 --> 00:01:07.720
But in reality, people tell the truth far more than they tell lies!
00:01:07.960 --> 00:01:12.280
For the sake of argument, let’s say 90 percent of statements are true and only 10 percent are lies.
00:01:12.280 --> 00:01:17.960
Now, let’s imagine an experiment in which we do 100000 tests and see what happens when we sum up the results in a table.
00:01:18.440 --> 00:01:21.800
In 100000 tests, 90 percent of people tell the truth.
00:01:21.800 --> 00:01:25.840
That’s 90000 actual true statements and 10000 actual lies.
00:01:26.120 --> 00:01:31.360
The lie detector says 90 percent of the actually true statements are true, that’s 81000.
00:01:31.360 --> 00:01:37.840
But it wrongly identifies 10 percent of those true statements as lies, that’s 9000 false accusations of lying.
00:01:38.240 --> 00:01:48.840
It correctly identifies 90 percent of the 10000 lies as lies, that’s 9000, and wrongly says 10 percent of the lies are true, 1000 lies that slip through the net and are believed to be true.
00:01:49.080 --> 00:01:54.120
So the lie detector thinks there are 82000 true statements and 18000 lies.
00:01:54.120 --> 00:01:58.280
It’s underestimated the number of true statements and overestimated the number of lies.
00:01:58.520 --> 00:02:04.120
Overall, it correctly identified 90000 of the statements, which makes it 90 percent reliable.
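If you'd like to check that table yourself, here's a minimal sketch of the tally in Python. The variable names are made up for illustration, and it assumes, as we did, that the detector is 90 percent accurate on lies and on true statements alike.

```python
# Tally of the imagined experiment: 100000 tests, 10 percent lies,
# detector right 90 percent of the time on both lies and truths.
total = 100_000
lies = total * 10 // 100                           # 10000 actual lies
truths = total - lies                              # 90000 actual true statements

truths_called_true = truths * 90 // 100            # 81000 correctly believed
truths_called_lie = truths - truths_called_true    # 9000 false accusations
lies_called_lie = lies * 90 // 100                 # 9000 lies caught
lies_called_true = lies - lies_called_lie          # 1000 lies that slip through

said_true = truths_called_true + lies_called_true  # what the detector calls true
said_lie = truths_called_lie + lies_called_lie     # what the detector calls a lie
correct = truths_called_true + lies_called_lie     # verdicts that match reality

print(said_true, said_lie, correct / total)  # 82000 18000 0.9
```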
00:02:04.400 --> 00:02:06.320
But, here’s the all-important thing.
00:02:06.320 --> 00:02:13.160
The lie detector accused 18000 statements of being lies, even though only 9000 of them actually were lies.
00:02:13.160 --> 00:02:19.000
If it accuses you of lying, there is only a 9000 out of 18000 probability that you actually are lying.
00:02:19.000 --> 00:02:21.040
That’s a half, 50 percent!
00:02:21.040 --> 00:02:28.120
Even though it correctly labels 90 percent of statements truth or lie, when it says you’re lying, it’s no better than flipping a coin at predicting whether you actually are.
00:02:28.400 --> 00:02:37.240
The problem is, with so many people actually telling the truth, incorrectly allocating 10 percent of a large number of true statements as lies completely messes up the results.
00:02:37.240 --> 00:02:40.160
We need to be quite subtle about how we use the test.
00:02:40.160 --> 00:02:44.440
If it says you’re lying, then the 90 percent reliability rating is not the number that tells you how likely it is that you actually are.
00:02:44.440 --> 00:02:51.520
We need to consider the whole story including the prevalence of lying in the whole population and the number of true statements that the test has mistaken for lies.
00:02:51.760 --> 00:02:57.040
In this case, that leads to the conclusion that it’s only 50 percent likely you’re lying if the test says you are.
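To see how much the prevalence of lying matters, here's a rough sketch that works out the chance you're really lying when the light flashes. The function name is hypothetical, and it assumes the detector is equally accurate on lies and on true statements.

```python
from fractions import Fraction

def chance_of_lie_given_flag(prevalence, accuracy):
    """Chance a flagged statement is really a lie.

    Assumes the same accuracy on lies and on true statements.
    """
    flagged_lies = accuracy * prevalence                # lies correctly flagged
    flagged_truths = (1 - accuracy) * (1 - prevalence)  # truths wrongly flagged
    return flagged_lies / (flagged_lies + flagged_truths)

accuracy = Fraction(9, 10)
for prevalence in (Fraction(1, 10), Fraction(1, 2)):
    print(prevalence, chance_of_lie_given_flag(prevalence, accuracy))
# 1/10 1/2
# 1/2 9/10
```

When only one statement in 10 is a lie, a flashing light means a one-in-two chance of a lie. Only when lies and true statements are equally common does it match the 90 percent accuracy figure.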
00:02:57.040 --> 00:03:05.080
Back in the 1700s, the Reverend Thomas Bayes proposed a formula, which we now call Bayes’ theorem, that helps us to work out conditional probabilities.
00:03:05.080 --> 00:03:15.680
It applies in all sorts of situations like lie detector tests, medical screening tests, criminal investigations, email spam filtering, crash wreckage location, and many more.
00:03:16.080 --> 00:03:23.080
The probability of A given B equals the probability of B given A times the probability of A, all over the probability of B.
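Written out for the lie detector, taking A to be "you're lying" and B to be "the detector says you're lying" (labels chosen here for illustration), and using the 90 percent accuracy and 10 percent lying rate from earlier, the formula works out like this:

```latex
\[
P(\text{lying} \mid \text{flagged})
  = \frac{P(\text{flagged} \mid \text{lying})\, P(\text{lying})}{P(\text{flagged})}
  = \frac{0.9 \times 0.1}{0.9 \times 0.1 + 0.1 \times 0.9}
  = \frac{0.09}{0.18}
  = 0.5
\]
```

The bottom line of the fraction adds together the two ways the light can flash: a lie correctly caught, or a true statement wrongly accused.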
00:03:23.360 --> 00:03:34.720
It’s well worth learning more about so you can make more informed decisions the next time you want to know who ate the last cookie, or your test results seem to say it’s 90 percent likely you’ve got lurgy disease, or you want to know why your spam filter is making bad decisions about your emails.