I'm currently reading Thinking, Fast and Slow by Daniel Kahneman. (Thanks to Elianna for letting me borrow it.) I'm not finished yet, but 60% of the way through I definitely recommend it.
The "Causes Trump Statistics" chapter (number 16) describes a study about cabs and hit-and-run accidents. Participants are told that 85% of cabs are Green, 15% are Blue, and a given observer has an 80% chance of correctly identifying the color of a given cab. Given this data, the chapter presents a scenario where a bystander identifies a cab in an accident as Blue, and Kahneman goes on to explain how we fail to take the data into consideration. I really enjoyed this chapter, but won't wreck the book for you.
Instead, I want to do some math (big surprise, I know). However, I want to make it accessible to non-mathematicians (atypical for my posts).
Given the data, Kahneman tells us that the true probability that the cab was Blue is 41% though we likely bias our thinking towards the 80% probability of the identification being correct. I was on the bus and it kept bothering me, to the point that I couldn't continue reading. Eventually I figured it out (when I got to the train) and I wanted to explain how this is computed using Bayes' Law. As a primer, I wrote a post using layman's terms explaining how we use Bayes' Law. (There is some notation introduced but I hope it isn't too confusing.)
Putting Bayes' Law to Use
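We need to understand what 41% even corresponds to before we can compute it. What's actually happened is that we know the event $ID_B$ has occurred: the cab has been identified ($ID$) as Blue ($B$). What we want is the probability that the cab is Blue given we know it has been identified as Blue. That is, we want:
$$\Pr(B \mid ID_B)$$
Using Bayes' Law, we can write
$$\Pr(B \mid ID_B) \cdot \Pr(ID_B) = \Pr(B \text{ and } ID_B)$$
and
$$\Pr(ID_B \mid B) \cdot \Pr(B) = \Pr(ID_B \text{ and } B).$$
We're told that a cab is correctly identified 80% of the time, hence
$$\Pr(ID_B \mid B) = 0.8$$
(i.e. the probability of correct ID as Blue given it is actually Blue). We're also told that 15% of the cabs are Blue, hence
$$\Pr(B) = 0.15.$$
We can combine these with the second application of Bayes' Law above to show that
$$\Pr(ID_B \text{ and } B) = 0.8 \cdot 0.15 = 0.12.$$
The only piece of data missing now to finish our computation is $\Pr(ID_B)$.
Using the extended form of Bayes' Law, since we know that the events $B$ and $G$ (the cab is Blue or Green) are exclusive and cover all possibilities for the cab, we can say that
$$\Pr(ID_B) = \Pr(ID_B \mid B) \cdot \Pr(B) + \Pr(ID_B \mid G) \cdot \Pr(G).$$
Since there is only an 80% chance of correct identification, we know that $\Pr(ID_B \mid G) = 0.2$ (the probability of misidentifying a Green cab as Blue). We also know that 85% of the cabs are Green, hence $\Pr(G) = 0.85$, and we can plug these in (along with numbers already computed) to get
$$\Pr(ID_B) = 0.8 \cdot 0.15 + 0.2 \cdot 0.85 = 0.12 + 0.17 = 0.29.$$
Putting it all together we get our answer
$$\Pr(B \mid ID_B) = \frac{\Pr(B \text{ and } ID_B)}{\Pr(ID_B)} = \frac{0.12}{0.29} \approx 0.41.$$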
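If you'd rather check the arithmetic than trust it, here is a small Python sketch of the same computation. The function and variable names are my own invention for illustration (they aren't from the book), and I use exact fractions so rounding doesn't muddy the result.

```python
from fractions import Fraction


def posterior_blue(prior_blue, accuracy):
    """Probability the cab is Blue given that the witness said Blue.

    prior_blue: share of cabs that are Blue (0.15 in the study).
    accuracy:   chance the witness names the correct color (0.80).
    Both names are illustrative, not taken from the book.
    """
    prior_green = 1 - prior_blue
    # Witness says Blue and the cab really is Blue.
    said_blue_and_blue = accuracy * prior_blue
    # Witness says Blue but the cab is actually Green (a misidentification).
    said_blue_and_green = (1 - accuracy) * prior_green
    # Total probability that the witness says Blue, over both cab colors.
    said_blue = said_blue_and_blue + said_blue_and_green
    return said_blue_and_blue / said_blue


answer = posterior_blue(Fraction(15, 100), Fraction(80, 100))
print(answer, round(float(answer), 4))
```

With the numbers from the study this works out to exactly 12/29, or roughly 41%. Changing `prior_blue` shows how much the answer depends on how many Blue cabs there are to begin with, which is exactly the data we tend to ignore.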
Fantastic! Now we can get back to reading...