I'm currently reading Thinking, Fast and Slow by Daniel Kahneman. (Thanks to Elianna for letting me borrow it.) I'm not finished yet, but 60% of the way through I definitely recommend it.

The "Causes Trump Statistics" chapter (number 16) describes a study about cabs and hit-and-run accidents. Participants are told that 85% of cabs are Green, 15% are Blue, and that a given observer has an 80% chance of correctly identifying the color of a given cab. The chapter then presents a scenario where a bystander identifies a cab in an accident as Blue, and Kahneman goes on to explain how we fail to take the data into consideration. I really enjoyed this chapter, but won't wreck the book for you.

Instead, I want to do some math (big surprise, I know). However, I want to make it accessible to non-mathematicians (atypical for my posts).

Given the data, Kahneman tells us that the true probability that the cab was Blue is 41%, though we likely bias our thinking towards the 80% probability of the identification being correct. I was on the bus and the 41% figure kept bothering me, to the point that I couldn't continue reading. Eventually I figured it out (when I got to the train) and I wanted to explain how it is computed using Bayes' Law. As a primer, I wrote a post using layman's terms explaining how we use Bayes' Law. (There is some notation introduced but I hope it isn't too confusing.)

## Putting Bayes' Law to Use

We need to understand what 41% even corresponds to before we can compute
it. What's actually happened is that we know the event $IDB$ has occurred:
the cab has been identified $(ID)$ as Blue $(B)$.
What we want is the probability that the cab **is Blue** given that we know
it has been identified as Blue. That is, we want:

$\text{Pr}(B \, | \, IDB).$

Using Bayes' Law, we can write

$\text{Pr}(B \, | \, IDB) = \frac{\text{Pr}(B \text{ and } IDB \text{ both occur})}{\text{Pr}(IDB)}$

and

$\text{Pr}(IDB \, | \, B) = \frac{\text{Pr}(B \text{ and } IDB \text{ both occur})}{\text{Pr}(B)}.$

We're told that a cab will be correctly identified 80% of the time hence

$\text{Pr}(IDB \, | \, B) = 0.8$

(i.e. the probability of correct ID as Blue given it is actually Blue). We're also told that 15% of the cabs are Blue hence

$\text{Pr}(B) = 0.15.$

We can combine these with the second application of Bayes' Law above to show that

$\text{Pr}(B \text{ and } IDB \text{ both occur}) = \text{Pr}(IDB \, | \, B) \cdot \text{Pr}(B) = 0.12.$

The only piece of data missing now to finish our computation is $\text{Pr}(IDB)$.

Using the extended form of Bayes' Law, since we know that the events $B$ and $G$ (the cab is Blue or Green) are exclusive and cover all possibilities for the cab, we can say that

$\text{Pr}(IDB) = \text{Pr}(IDB \, | \, B) \cdot \text{Pr}(B) + \text{Pr}(IDB \, | \, G) \cdot \text{Pr}(G).$

Since there is only an 80% chance of correct identification, we know that $\text{Pr}(IDB \, | \, G) = 0.2$ (the probability of misidentifying a Green cab as Blue). We also know that 85% of the cabs are Green hence we can plug these in (along with numbers already computed) to get

$\text{Pr}(IDB) = 0.8 \cdot 0.15 + 0.2 \cdot 0.85 = 0.12 + 0.17 = 0.29.$
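
If you'd rather let a computer do the arithmetic, here is a minimal sketch of the two computations so far, using exact fractions to avoid rounding. The variable names are my own invention for illustration; the numbers are just the ones from the study.

```python
from fractions import Fraction

# Numbers given in the study (variable names are illustrative only).
pr_blue = Fraction(15, 100)             # Pr(B)
pr_green = Fraction(85, 100)            # Pr(G)
pr_idb_given_blue = Fraction(80, 100)   # Pr(IDB | B), a correct ID
pr_idb_given_green = Fraction(20, 100)  # Pr(IDB | G), a mistaken ID

# Pr(B and IDB both occur) = Pr(IDB | B) * Pr(B)
pr_blue_and_idb = pr_idb_given_blue * pr_blue

# Pr(IDB) = Pr(IDB | B) * Pr(B) + Pr(IDB | G) * Pr(G)
pr_idb = pr_blue_and_idb + pr_idb_given_green * pr_green

print(pr_blue_and_idb)  # 3/25, i.e. 0.12
print(pr_idb)           # 29/100, i.e. 0.29
```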

Putting it all together we get our answer

$\text{Pr}(B \, | \, IDB) = \frac{\text{Pr}(B \text{ and } IDB \text{ both occur})}{\text{Pr}(IDB)} = \frac{0.12}{0.29} \approx 0.413793103.$
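
For anyone who still doesn't trust the 41%, one way to sanity-check it is to simulate a lot of cabs and witnesses and just count. This is only a rough sketch: the function name, the number of trials and the seed are arbitrary choices of mine, and I'm assuming the witness is right 80% of the time regardless of the cab's actual color (which is how the problem is stated).

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate Pr(B | IDB) by counting over simulated cabs."""
    rng = random.Random(seed)
    identified_blue = 0  # times the witness says "Blue"
    actually_blue = 0    # of those, times the cab really is Blue
    for _ in range(trials):
        is_blue = rng.random() < 0.15  # 15% of cabs are Blue
        correct = rng.random() < 0.80  # witness is right 80% of the time
        says_blue = is_blue if correct else not is_blue
        if says_blue:
            identified_blue += 1
            if is_blue:
                actually_blue += 1
    return actually_blue / identified_blue

estimate = simulate()
exact = 0.12 / 0.29
print(round(exact, 4))     # 0.4138
print(round(estimate, 2))  # should land close to the exact value
```

The estimate wobbles a bit from run to run (change the seed to see), but it stays near 0.41 and nowhere near the 0.80 our intuition reaches for.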

Fantastic! Now we can get back to reading...