Wednesday, April 18, 2007

Prediction and postdiction

I share in the horror over the massacre at Virginia Tech. Part of the shock for all of us in academia is that we tend to think of school as the diving board of life: a place where lives begin, not end. I see my own students and colleagues in the images of both the victims and the perpetrator. And that both saddens and terrifies me.

However, I find it upsetting that people are criticizing Virginia Tech for ignoring warning signs. That response reflects a severe and all-too-common error in thinking about direct versus inverse probability.

The basic premise rests on the fact that the following statements are not identical:

1. Given person A wears a trenchcoat, is quiet, and writes plays involving murder, there is a high probability that he will go on a shooting spree.
2. Given person A went on a shooting spree, there is a high probability that he wore a trenchcoat, was quiet, and wrote plays involving murder.

The first statement is a matter of direct probability. If it were true, you could take the proportion of people who wear trenchcoats, are quiet, and write plays about murder, and get a number that approximates the proportion of the population that is going to go on shooting sprees. There is a little problem, though: a lot of people wear trenchcoats, are quiet, and write plays about murder. Round them up, and you might find a group that includes a significant proportion of the undergraduate population. Yet almost none of those undergraduates will ever go on a shooting spree. Only the second statement is plausible, and treating it as if it were the first is the error inherent in confusing direct and inverse probability.

An 18th-century minister named Thomas Bayes noted that the probability of event A conditional on B is generally different from the probability of B conditional on A. Bayes came up with a solution that links them mathematically by a simple equation called Bayes' theorem. In short, the theorem explains that the relationship between direct and inverse probability is a function of the base rate, or frequency, of the two events. Psychologists have long known that people ignore base rates, resulting in serious errors in reasoning that can be fatal.
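
For readers who want to see it written out, the theorem is usually stated roughly as follows. The notation is my own shorthand rather than anything from Bayes' original essay: P(A | B) means "the probability of A given B", and ¬A means "not A". The second line simply expands the base rate of B into the cases where A is true and where it is not.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A)
```

When A is rare, P(A) is tiny, so P(A | B) can be small even when P(B | A) is close to 1.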

For example, there are many cases of heterosexual, non-IV drug users who killed themselves because they received news of a positive HIV test. Their doctors probably informed them that the HIV test has an accuracy of 99.99 percent. However, that does not mean that they can be 99.99 percent sure that they have HIV. Why? Because of the base rate: tens of thousands of people take the HIV test, and HIV prevalence is extremely low (about 1 in 10,000) among that particular population (heterosexual non-IV drug users). Every once in a while, given the fact that tens of thousands of tests are run, a negative sample will test positive (the 0.01 percent inaccuracy part). So the probability of being HIV positive given a positive HIV test (again, if you are a heterosexual non-IV drug user) is actually about 50 percent (I calculated this using Bayes' theorem). But most doctors I've interviewed had never even heard of Thomas Bayes or his theorem, and make the same mistake their patients do in assuming that direct and inverse probability are the same.
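
Here is a minimal sketch of that calculation in Python, for anyone who wants to check the arithmetic. It rests on two assumptions of mine: that "99.99 percent accuracy" means the test catches 99.99 percent of true infections and also wrongly flags 0.01 percent of uninfected people, and that the prevalence is 1 in 10,000. The function and parameter names are made up for illustration.

```python
def probability_infected_given_positive(prevalence, sensitivity, false_positive_rate):
    """Bayes' theorem: P(infected | positive test)."""
    # Infected people who correctly test positive.
    true_positives = prevalence * sensitivity
    # Uninfected people who wrongly test positive.
    false_positives = (1 - prevalence) * false_positive_rate
    # Share of all positive tests that come from infected people.
    return true_positives / (true_positives + false_positives)


# Assumed numbers (see the paragraph above): prevalence of 1 in 10,000,
# 99.99% sensitivity, and a 0.01% false positive rate.
p = probability_infected_given_positive(
    prevalence=1 / 10_000,
    sensitivity=0.9999,
    false_positive_rate=0.0001,
)
print(f"P(infected | positive test) = {p:.3f}")
```

Under these assumptions the two groups, true positives and false positives, are the same size, which is why the answer lands at about one half rather than 99.99 percent.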

In a similar way, it is easy to make the postdiction, after the fact, that someone should have known, that someone should have done something, that the university was irresponsible for not having seen the warning signs. But the problem with these statements is that they ignore the base rate of these warning signs. A lot of college students show warning signs, and many students show warning signs that are considerably worse than those exhibited by Mr. Cho. And unless one were to come up with a better diagnostic tool than trenchcoats or murderous plays - like a crystal ball - I don't believe it is possible to predict who is just an awkward student (of which there are many) and who is going to be a killer. I hope we someday can tell them apart, but knowing a bit about human nature - and the way we make errors in our reasoning about probability - makes me doubt it.