The morning after posting my latest blog post, I woke up still thinking about how to explain the concept.
More importantly, I realized that I had failed miserably at my goal of writing math for humans.
So here is a second go at it.
First we're told we're in a world where 85% of cabs are Green and the rest are Blue. Humans love tables (and they are easy to understand). So we start off with a representative sample of 100 cabs:
Category | Green | Blue | Total |
---|---|---|---|
Cabs | 85 | 15 | 100 |
After this, we're told that a bystander correctly identifies a cab 80% of the time, or 4 out of every 5. Applying this to the 85 Green cabs (85 is 17 times 5), this bystander will mis-identify 17 as Blue (1 out of 5) and the other 68 will correctly be identified as Green:
Category | Green | Blue | Total |
---|---|---|---|
Cabs | 85 | 15 | 100 |
ID'd Green | 68 | | |
ID'd Blue | 17 | | |
Similarly, of the 15 Blue cabs (15 is 3 times 5), this bystander will mis-identify 3 as Green (1 out of 5) and the other 12 will correctly be identified as Blue:
Category | Green | Blue | Total |
---|---|---|---|
Cabs | 85 | 15 | 100 |
ID'd Green | 68 | 3 | |
ID'd Blue | 17 | 12 | |
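
If you would rather check the arithmetic in code than by hand, here is a minimal Python sketch that rebuilds the table above. The names `ACCURACY` and `split_by_identification` are made up for illustration, and the sketch assumes every count divides evenly by 5, as 85 and 15 do.

```python
from fractions import Fraction

ACCURACY = Fraction(4, 5)  # the bystander is right 4 times out of 5


def split_by_identification(count, accuracy=ACCURACY):
    """Return (correctly identified, misidentified) counts for one cab color.

    Assumes count * accuracy is a whole number, which holds for 85 and 15.
    """
    correct = count * accuracy
    wrong = count - correct
    return int(correct), int(wrong)


green_right, green_wrong = split_by_identification(85)
blue_right, blue_wrong = split_by_identification(15)

# A misidentified Blue cab lands in the "ID'd Green" row, and vice versa.
rows = [
    ("Cabs", 85, 15),
    ("ID'd Green", green_right, blue_wrong),
    ("ID'd Blue", green_wrong, blue_right),
]
for label, green, blue in rows:
    print(f"{label:<12}| {green:>5} | {blue:>4} |")
```

Using `Fraction` instead of `0.8` keeps the counts exact, so there is no floating-point rounding to explain away.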
Now Kahneman wants us to use the data at hand to determine the probability that a cab is actually Blue, given that the bystander identified it as Blue. To determine this probability, we simply need to consider the final row of the table:
Category | Green | Blue | Total |
---|---|---|---|
ID'd Blue | 17 | 12 | 29 |
This row tells us that only 29 cabs will be identified as Blue, and among those, 12 will actually be Blue. Hence the probability is
$\frac{12}{29} \approx 0.413793103$
What this really shows is that even though the bystander has a large chance (80%) of getting the color right, there are so many more Green cabs that the 17 Green cabs mistaken for Blue outnumber the 12 Blue cabs identified correctly.
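One way to see this overwhelming effect is to rerun the same count with a different mix of cabs. The sketch below is only an illustration: the function name `chance_actually_blue` is hypothetical, and the 50/50 world is a comparison case I made up, not part of Kahneman's problem.

```python
from fractions import Fraction


def chance_actually_blue(green, blue, accuracy=Fraction(4, 5)):
    """Chance that a cab identified as Blue really is Blue."""
    wrongly_blue = green * (1 - accuracy)  # Green cabs mistaken for Blue
    rightly_blue = blue * accuracy         # Blue cabs identified correctly
    return rightly_blue / (wrongly_blue + rightly_blue)


# The 85/15 world from the problem, then a made-up 50/50 world.
for green, blue in [(85, 15), (50, 50)]:
    result = chance_actually_blue(green, blue)
    print(f"{green} Green, {blue} Blue: {result} (about {float(result):.3f})")
```

With 85 Green cabs the answer is 12 out of 29, but with an even split it rises to 4 out of 5, which matches the bystander's accuracy. The bystander has not changed at all; only the number of Green cabs available to be mistaken for Blue has.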
What I Overlooked
- Dense text is always bad
- Using colors and breaking up text makes reading easier (more modular)
- Introducing mathematical notation is almost always overkill
- Tables and samples are a good way to discuss probabilities