Monday, March 4, 2013
What is Statistical Significance?
We often hear the term "statistically significant" in reference to scientific studies. What does it mean, and why does it matter?
In life, in nature, in the world, almost everything varies: the heights of trees outside your window, your temperature and blood pressure, your heating bill, the shoe sizes of all your friends, the rate at which the plants in your garden grow. Even in finely tooled machine parts, precision is a matter of how closely you care to measure.
Variation is everywhere, but we take it for granted. It’s only when things start repeating that we begin to question. If trees are exactly the same height, we suspect that we are looking at a cartoon or drawing, not a true picture of nature. If you meet someone with the same birthday as you, it’s unexpected. If your electric usage is exactly the same month after month, it seems suspicious. I’m sure you can think of many other examples. Variation is natural and expected.
When scientists develop a new treatment, whether it be a drug or anything else, they want to know if it works, that is, if it produces the effect they are looking for. They treat one group and compare the results to a second, untreated group, called the control group. If the treated group does better, they want to believe that the treatment works, but they know that everything varies. There are subjects (patients, crops, or white mice) in the treated group that improve a lot, some that improve a little, and some that may not change at all. There are subjects in the control group that improve, despite the fact that they were not treated. Comparing the two groups, the scientists may find quite a bit of overlap, where a few untreated individuals do better than a few treated ones.
The question to answer is: Has there been enough improvement overall in the treated group to attribute it to the treatment, or could it be the result of natural variation? Has the variation among individuals tricked them into seeing a favorable but false effect? In fact, there is no way to know for sure. They can only judge whether natural variation alone would be very unlikely to produce a difference as large as the one they observed, unlikely enough to make them confident that the treatment is working. They use statistics to calculate that probability, and if it’s low enough (by convention, often below 5 percent), they call the result statistically significant. That means they are confident, but never absolutely certain.
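For readers who like to see the idea in action, here is a minimal sketch of one way such a probability can be estimated, a so-called permutation test. It is only an illustration, not the method any particular study used. The improvement scores are invented, and the function name permutation_p_value and its parameters are made up for this example. The idea: if the treatment did nothing, the group labels would be meaningless, so we shuffle the labels many times and count how often chance alone produces a difference at least as large as the one observed.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    """Estimate the chance that random relabeling alone gives a difference
    in group means at least as large as the observed one (one-sided).

    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = mean(treated) - mean(control)

    pooled = list(treated) + list(control)
    n_treated = len(treated)

    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Pretend the labels are arbitrary: reshuffle who counts as "treated".
        rng.shuffle(pooled)
        shuffled_diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if shuffled_diff >= observed:
            at_least_as_large += 1

    return at_least_as_large / n_shuffles


# Invented improvement scores, for illustration only (not real data).
treated = [12, 9, 15, 7, 11, 14, 10, 13]
control = [8, 6, 11, 5, 9, 7, 10, 4]

p_value = permutation_p_value(treated, control)
print(f"Observed difference in means: {mean(treated) - mean(control):.3f}")
print(f"Estimated probability from chance alone: {p_value:.4f}")
```

Notice that the two invented groups overlap: some untreated scores beat some treated ones. Even so, random shuffling rarely reproduces a gap this large, so the estimated probability comes out small. A small number like that is what researchers mean by "statistically significant." It still isn't proof, only a measure of how surprising the result would be if the treatment did nothing.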
This explains how two studies of the same drug, herb, food or fertilizer can produce different results. It explains such things as the recent reversal on the benefits of calcium and vitamin D supplements for post-menopausal women and the finding that cranberry juice "does not appear to have significant benefits" in preventing urinary tract infections. Try as researchers might to eliminate or account for all the causes of variation, the result is never perfect. Variation is everywhere. Greater confidence requires multiple tests, which may confirm or contradict prior results. Variation makes certainty unachievable.