Monday, August 7, 2017
Linked To Does NOT Mean Causes
Recently I came across this Washington Post headline: “59,000 farmer suicides in India over 30 years may be linked to climate change, study says.” The article explains that a researcher looked back almost 50 years comparing data and climate information and “concluded that temperature may have ‘a strong influence’ on suicide rates during the growing season.” Note the words “may have a strong [but unspecified] influence.” The researcher goes on to project an increase in the number of “lives lost to self-harm” in India.
What are we to make of this? The number of suicides correlates with the temperature. Is Global Warming to blame?
Correlation is a mathematical expression of how closely any two measurements move relative to each other. If the two rise and fall together in perfect lockstep, so that plotting one against the other gives a straight line sloping upward, the correlation equals 1, a perfect positive correlation. If one falls in perfect lockstep as the other rises, the correlation equals -1, a perfect negative correlation.
In our imperfect world this rarely happens, so a correlation of 0.9 (often quoted as 90%) indicates that the two measurements are moving very closely together. A correlation close to zero shows no straight-line relationship. The math is not difficult. A laptop can do it easily and show the graph, with points closely tracking a line for a strong correlation or looking like a random scattering for a weak (near zero) correlation.
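For readers who want to see the arithmetic, here is a minimal sketch of the usual (Pearson) correlation calculation in Python. The function name pearson_r and the party numbers are my own illustrations, not data from any study; treat it as a sketch, not a statistics package.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two measurements move together around their averages.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each measurement moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a list never changes")
    return co_movement / math.sqrt(spread_x * spread_y)

# Made-up numbers for illustration only.
guests = [2, 4, 6, 8, 10]
pizzas = [1, 2, 3, 4, 5]        # one pizza for every two guests
leftovers = [5, 4, 3, 2, 1]     # fewer leftovers as the crowd grows

print(pearson_r(guests, pizzas))     # perfect positive correlation
print(pearson_r(guests, leftovers))  # perfect negative correlation
```

Real measurements are messier than these, so the result usually lands somewhere between -1 and 1.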
The most important thing to understand about correlation is that correlation is not causation! This is a major point of emphasis in every first-year statistics class. Just because two things vary together does not mean they are related in any way.
Lots of things get bigger together and are related, like the size of a tree and the amount of lumber available from the tree or the number of pizzas (or amount of beer) needed to feed guests at a party – more people = more pizza. Gas mileage (MPG) may be related to the size of a truck, the power of the engine or the weight of the load. These correlations are easy to explain.
When two measures are correlated, sometimes one influences the other, or perhaps they are both affected by some unseen common factor. Often though, two measures are mathematically related but have no logical link. Some websites specialize in finding odd examples of these supposed relationships.
Here is one I’ve used before, which tells about a study correlating dog ownership with eating eggrolls, eating cabbage with having an innie bellybutton, and many others. Another site, with lovely graphs, correlates the divorce rate in Maine with consumption of margarine, consumption of cheese with the number of people who died by becoming tangled in their bed sheets, and several more entertaining examples. There are the joking relationships between the stock market and skirt lengths, or between the stock market and the conference of the team that won the last Super Bowl. I wrote last year about the journalist who compared many measurements and found a surprising correlation between eating dark chocolate and weight loss in one data set. He later admitted that it was bad science and should not be taken seriously. If you look hard enough, you can find all kinds of weird examples of data that correlate just by coincidence. That’s what makes it tricky.
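The "look hard enough" effect is easy to reproduce. The sketch below makes up 100 lists of pure random noise, ten numbers each, compares every list with every other one, and reports the pair with the strongest correlation. Everything here (the function names, the list sizes, the seed) is a hypothetical setup of my own, not the method used by any of the sites or studies mentioned above.

```python
import itertools
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

def strongest_chance_pair(num_series=100, length=10, seed=0):
    """Search unrelated random lists for the most strongly correlated pair."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each list is independent noise; none has anything to do with the others.
    series = [[rng.gauss(0, 1) for _ in range(length)]
              for _ in range(num_series)]
    # Compare every list with every other list (4,950 pairs for 100 lists).
    i, j = max(
        itertools.combinations(range(num_series), 2),
        key=lambda pair: abs(pearson_r(series[pair[0]], series[pair[1]])),
    )
    return i, j, pearson_r(series[i], series[j])

i, j, r = strongest_chance_pair()
print(f"Strongest 'link' found: list {i} and list {j}, correlation {r:.2f}")
```

With thousands of pairs to choose from, the winning pair typically looks impressively correlated even though every number was generated at random. Short lists make this worse: the fewer data points per measurement, the easier it is for coincidence to look like a pattern.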
The use of correlation is quite common in studies about health and other areas. Studies find a mathematical relationship and use the magic word linked. An action or habit is linked to a healthy or unhealthy outcome. Eating A is linked to longer or shorter life. The relationship a researcher finds may be of the direct, pizza-supply-to-number-of-guests kind, or it may be of the cheese-to-death-by-bed-sheets kind. But at that point it's only math, not reality.
This is what came to mind when I saw that headline. To be fair, the researcher tries to explain how there can be an other-than-mathematical relationship. “High temperatures in the growing season reduce crop yields, putting economic pressure on India's farmers.” I think any farmer can tell you that weather often increases economic pressure, but how often does that lead to suicide? Don’t we need more than just two sets of numbers to compare on a graph? Is this reality or just math?
Whenever critical thinkers hear the word linked, they know someone has found a mathematical relationship. Researchers hope to discover a real relationship so they can use the word causes, as in action A causes outcome B, but that is much more difficult. Understanding this is very important to keep from panicking over the latest headline, insisting that someone do something or trying to persuade friends and family to change eating habits to avoid catastrophe. You don’t have to be a math or statistical wizard to know that linked is not necessarily the same as causes.