Bossy Lobster

A blog by Danny Hermes; musing on tech, mathematics, etc.

Conditional Probabilities in "Thinking Fast and Slow"

I'm currently reading Thinking, Fast and Slow by Daniel Kahneman. (Thanks to Elianna for letting me borrow it.) I'm not finished yet, but 60% of the way through, I definitely recommend it.

While reading the "Causes Trump Statistics" chapter (number 16), I came across a description of a study about cabs and hit-and-run accidents. It describes a scenario in which participants are told that 85% of cabs are Green, 15% are Blue, and a given observer has an 80% chance of correctly identifying the color of a given cab. Given this data, the chapter presents a scenario where a bystander identifies a cab in an accident as Blue, and Kahneman goes on to explain how we fail to take the data into consideration. I really enjoyed this chapter, but I won't spoil it for you.

Instead, I want to do some math (big surprise, I know). However, I want to make it accessible to non-mathematicians (atypical for my posts).

Given the data, Kahneman tells us that the true probability that the cab was Blue is 41%, though we are likely to bias our thinking toward the 80% probability of the identification being correct. I was on the bus and it kept bothering me, to the point that I couldn't continue reading. Eventually I figured it out (when I got to the train) and I wanted to explain how this is computed using Bayes' Law. As a primer, I wrote a post using layman's terms explaining how we use Bayes' Law. (There is some notation introduced, but I hope it isn't too confusing.)
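Before doing any algebra, one way to convince yourself that 41% (and not 80%) is the right answer is to simulate the setup many times and count. This is a minimal sketch, assuming each cab is independently Blue with probability 0.15 (Green otherwise) and that the witness names the true color with probability 0.8 no matter which color the cab is. The function name `simulate_blue_given_id_blue`, the number of trials, and the seed are hypothetical choices made only for this sketch.

```python
import random


def simulate_blue_given_id_blue(trials: int = 200_000, seed: int = 0) -> float:
    """Estimate Pr(B | IDB) by simulating many cab sightings.

    Assumptions (hypothetical model of the book's scenario): a cab is Blue
    with probability 0.15, and the witness names the correct color with
    probability 0.8 regardless of the cab's actual color.
    """
    rng = random.Random(seed)
    identified_blue = 0   # sightings where the witness said "Blue"
    blue_and_id_blue = 0  # ...and the cab really was Blue
    for _ in range(trials):
        is_blue = rng.random() < 0.15
        correct = rng.random() < 0.8
        # The witness says "Blue" when a Blue cab is identified correctly
        # or when a Green cab is misidentified.
        says_blue = is_blue if correct else not is_blue
        if says_blue:
            identified_blue += 1
            if is_blue:
                blue_and_id_blue += 1
    # Fraction of "Blue" reports that were actually Blue cabs.
    return blue_and_id_blue / identified_blue


print(simulate_blue_given_id_blue())
```

The printed estimate should land close to 0.41, well below the 80% that our intuition reaches for.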

Putting Bayes' Law to Use

We need to understand what 41% even corresponds to before we can compute it. What has actually happened is that we know the event $IDB$ has occurred: the cab has been identified ($ID$) as Blue ($B$). What we want is the probability that the cab is Blue given that we know it has been identified as Blue; that is, we want

$$\text{Pr}(B \, | \, IDB).$$

Using Bayes' Law, we can write

$$\text{Pr}(B \, | \, IDB) = \frac{\text{Pr}(B \text{ and } IDB \text{ both occur})}{\text{Pr}(IDB)}$$

and

$$\text{Pr}(IDB \, | \, B) = \frac{\text{Pr}(B \text{ and } IDB \text{ both occur})}{\text{Pr}(B)}.$$

We're told that a cab's color is correctly identified 80% of the time, hence

$$\text{Pr}(IDB \, | \, B) = 0.8$$

(i.e., the probability of correctly identifying the cab as Blue given that it actually is Blue). We're also told that 15% of the cabs are Blue, hence

$$\text{Pr}(B) = 0.15.$$

We can combine these with the second application of Bayes' Law above to show that

$$\text{Pr}(B \text{ and } IDB \text{ both occur}) = \text{Pr}(IDB \, | \, B) \cdot \text{Pr}(B) = 0.8 \cdot 0.15 = 0.12.$$
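To double-check this step without any floating-point rounding, here is a small sketch using Python's standard `fractions` module. The variable names are just hypothetical shorthand for the quantities above.

```python
from fractions import Fraction

# Probabilities given in the problem, as exact fractions.
pr_blue = Fraction(15, 100)                # Pr(B): 15% of cabs are Blue
pr_id_blue_given_blue = Fraction(80, 100)  # Pr(IDB | B): correct ID 80% of the time

# Rearranging Pr(IDB | B) = Pr(B and IDB) / Pr(B) gives the joint probability.
pr_blue_and_id_blue = pr_id_blue_given_blue * pr_blue
print(pr_blue_and_id_blue, float(pr_blue_and_id_blue))  # prints: 3/25 0.12
```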

The only piece of data missing now to finish our computation is $\text{Pr}(IDB)$.

Using the law of total probability (which supplies the denominator in the extended form of Bayes' Law), since we know that the events $B$ and $G$ (the cab is Blue or Green) are mutually exclusive and cover all possibilities for the cab, we can say that

$$\text{Pr}(IDB) = \text{Pr}(IDB \, | \, B) \cdot \text{Pr}(B) + \text{Pr}(IDB \, | \, G) \cdot \text{Pr}(G).$$

Since there is only an 80% chance of correct identification, we know that $\text{Pr}(IDB \, | \, G) = 0.2$ (the probability of misidentifying a Green cab as Blue). We also know that 85% of the cabs are Green, hence we can plug these in (along with the numbers already computed) to get

$$\text{Pr}(IDB) = 0.8 \cdot 0.15 + 0.2 \cdot 0.85 = 0.12 + 0.17 = 0.29.$$
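The same sum can be checked exactly. This sketch assumes, as the problem does, that the witness's 80% accuracy applies to both colors, so $\text{Pr}(IDB \, | \, G) = 1 - 0.8$; the variable names are hypothetical.

```python
from fractions import Fraction

pr_blue = Fraction(15, 100)   # Pr(B)
pr_green = 1 - pr_blue        # Pr(G): Blue and Green are the only options
pr_id_blue_given_blue = Fraction(80, 100)           # Pr(IDB | B): correct ID
pr_id_blue_given_green = 1 - pr_id_blue_given_blue  # Pr(IDB | G): Green misread as Blue

# Law of total probability over the two mutually exclusive cases B and G.
pr_id_blue = (pr_id_blue_given_blue * pr_blue
              + pr_id_blue_given_green * pr_green)
print(pr_id_blue)  # prints: 29/100
```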

Putting it all together we get our answer

$$\text{Pr}(B \, | \, IDB) = \frac{\text{Pr}(B \text{ and } IDB \text{ both occur})}{\text{Pr}(IDB)} = \frac{0.12}{0.29} \approx 0.413793103.$$
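All of the steps above fit in one small function, which makes it easy to see how much the answer depends on the 15% base rate. This is a sketch, assuming (as in the problem) that the witness is equally reliable for Blue and Green cabs; the function name `pr_blue_given_id_blue` and its parameters are hypothetical.

```python
from fractions import Fraction


def pr_blue_given_id_blue(pr_blue: Fraction, accuracy: Fraction) -> Fraction:
    """Pr(B | IDB) via Bayes' Law.

    Assumes the witness identifies either color correctly with the same
    probability `accuracy`, and that every cab is either Blue or Green.
    """
    pr_green = 1 - pr_blue
    joint = accuracy * pr_blue                      # Pr(B and IDB)
    pr_id_blue = joint + (1 - accuracy) * pr_green  # Pr(IDB), total probability
    return joint / pr_id_blue


answer = pr_blue_given_id_blue(Fraction(15, 100), Fraction(80, 100))
print(answer, float(answer))  # prints 12/29 followed by its decimal value (about 0.4138)
```

As a sanity check, with a 50/50 base rate, `pr_blue_given_id_blue(Fraction(1, 2), Fraction(4, 5))` returns exactly `4/5`: the 80% figure is only the right answer when Blue and Green cabs are equally common.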

Fantastic! Now we can get back to reading...