The morning after posting my latest blog post, I woke up still thinking about how to explain the concept.
More importantly, I realized that my attempt at writing math for humans had failed miserably.
So here is a second go at it.
First we're told we're in a world where 85% of cabs are Green and the rest are Blue. Humans love tables (and they are easy to understand). So we start off with a representative sample of 100 cabs:
Category | Green | Blue | Total |
---|---|---|---|
Cabs | 85 | 15 | 100 |
After this, we're told that a bystander correctly identifies a cab 80% of the time, or 4 out of every 5. Applying this to the 85 Green cabs (85 is 17 times 5), this bystander will mis-identify 17 as Blue (1 out of 5) and the other 68 will correctly be identified as Green:
Category | Green | Blue | Total |
---|---|---|---|
Cabs | 85 | 15 | 100 |
ID'd Green | 68 | | |
ID'd Blue | 17 | | |
Similarly, of the 15 Blue cabs (15 is 3 times 5), this bystander will mis-identify 3 as Green (1 out of 5) and the other 12 will correctly be identified as Blue:
Category | Green | Blue | Total |
---|---|---|---|
Cabs | 85 | 15 | 100 |
ID'd Green | 68 | 3 | |
ID'd Blue | 17 | 12 | |
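If you would rather check the table by running something, here is a minimal Python sketch that rebuilds it from the two facts we were given: 85 Green and 15 Blue cabs in a sample of 100, and a bystander who gets 1 out of every 5 wrong. The variable names are purely illustrative, and the whole-number division only works out this neatly because 85 and 15 are both multiples of 5.

```python
# Representative sample of 100 cabs (figures from the text).
green_cabs, blue_cabs = 85, 15

# The bystander is right 4 times out of 5, so 1 in 5 is mis-identified.
green_called_blue = green_cabs // 5                    # wrong about 1 in 5 Green cabs
green_called_green = green_cabs - green_called_blue    # right about the rest
blue_called_green = blue_cabs // 5                     # wrong about 1 in 5 Blue cabs
blue_called_blue = blue_cabs - blue_called_green       # right about the rest

# Each row holds (actually Green, actually Blue).
table = {
    "Cabs": (green_cabs, blue_cabs),
    "ID'd Green": (green_called_green, blue_called_green),
    "ID'd Blue": (green_called_blue, blue_called_blue),
}

# Print the rows in the same layout as the tables above, with totals.
for label, (green, blue) in table.items():
    print(f"{label} | {green} | {blue} | {green + blue} |")
```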
Now Kahneman wants us to use the data at hand to determine what the probability is that a cab is actually Blue given the bystander identified the cab as Blue. To determine this probability, we simply need to consider the final row of the table:
Category | Green | Blue | Total |
---|---|---|---|
ID'd Blue | 17 | 12 | 29 |
This row tells us that only 29 cabs will be identified as Blue, and among those, 12 will actually be Blue. Hence the probability will be 12 out of 29, or about 41%.
What this really shows is that even though the bystander has a large chance (80%) of getting the color right, there are so many more Green cabs that the incorrectly identified Green ones (17) outnumber the correctly identified Blue ones (12).
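To see how much the mix of cabs matters, here is a small Python sketch that repeats the same counting argument for other shares of Blue cabs. The function name `chance_actually_blue` and the extra shares of 30% and 50% are assumptions of mine for illustration; only the 15% share and the 80% accuracy come from the problem itself. Exact fractions are used so that no rounding sneaks in.

```python
from fractions import Fraction


def chance_actually_blue(blue_share, accuracy):
    """Chance a cab is Blue, given that the bystander called it Blue."""
    green_share = 1 - blue_share
    correctly_called_blue = blue_share * accuracy       # Blue cabs seen as Blue
    wrongly_called_blue = green_share * (1 - accuracy)  # Green cabs seen as Blue
    return correctly_called_blue / (correctly_called_blue + wrongly_called_blue)


accuracy = Fraction(4, 5)  # right 4 times out of 5

# The case from the text: 15 Blue cabs out of 100.
result = chance_actually_blue(Fraction(15, 100), accuracy)
print(result)  # 12/29

# Hypothetical worlds with more Blue cabs, for comparison.
for blue_percent in (15, 30, 50):
    share = Fraction(blue_percent, 100)
    chance = chance_actually_blue(share, accuracy)
    print(f"{blue_percent}% Blue cabs: {float(chance):.0%} chance it really is Blue")
```

Only when Blue and Green cabs are equally common does the answer match the bystander's 80% accuracy; the rarer Blue cabs are, the further it drops.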
What I Overlooked
- Dense text is always bad
- Using colors and breaking up text makes reading easier (more modular)
- Introducing mathematical notation is almost always overkill
- Tables and samples are a good way to discuss probabilities