
Gray Watson Personal Thoughts 2004.05.11
Correlation Versus Causality

I came across this great list of poor science and poor science reporting. I've been meaning for some time to write a rant on this topic which is a pet peeve of mine, so here we go.

A number of years ago, I picked up a magazine in a doctor's office and read how researchers had determined that cars painted yellow got into the fewest accidents. They theorized that this was because other drivers could see yellow best and so were able to avoid those cars. The question that immediately came to me was "what kind of person drives a yellow car?" Cars with sportier colors (red and black) tend to be purchased by younger drivers, who drive faster and get into more accidents. Yellow cars may appeal to older drivers, who tend to get into fewer accidents.

This is a good example of confusion between correlation and cause. Correlation is defined by dict.org as:

a statistical relation between two or more variables such that systematic changes in the value of one variable are accompanied by systematic changes in the other

In other words, if you are studying two things A and B (car color and accident rate) and they are correlated, then a change in A tends to go along with a change in B. The correlation can be negative, where B decreases as A increases, or positive, where B increases as A increases. In the car study, the color yellow is negatively correlated with accidents -- yellow cars have fewer accidents than cars of other colors.
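To make "positive" and "negative" correlation concrete, here is a small sketch that computes the usual Pearson correlation coefficient by hand. The numbers are invented purely for illustration and the function name pearson_r is my own; nothing here comes from the car study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up measurements of A and two different B's.
a = [1, 2, 3, 4, 5]
b_rises_with_a = [2, 3, 5, 4, 6]
b_falls_with_a = [6, 4, 5, 3, 2]

print(pearson_r(a, b_rises_with_a))  # close to +0.9: positive correlation
print(pearson_r(a, b_falls_with_a))  # close to -0.9: negative correlation
```

A value near +1 or -1 only says that the two lists move together or in opposite directions. It says nothing about why.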

But correlations do not imply that A caused B. You can't go out and repaint all of the cars in Boston yellow and expect to get fewer accidents. It could be that B caused A, or that C (some other variable that we don't know about) caused both A and B. In the car study, C could be the age of the person who buys the car.
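The "C causes both A and B" case can be sketched with a small simulation. Everything in it is an assumption made up for illustration: the probabilities, the two age groups, and the function names. In this toy model, car color has no effect on accidents at all, yet yellow cars still come out looking safer overall.

```python
import random

def simulate_drivers(n=100_000, seed=1):
    """Toy model: age (C) influences both car color (A) and accidents (B).

    By construction, color has NO direct effect on accidents.
    All probabilities are invented for illustration.
    """
    rng = random.Random(seed)
    drivers = []
    for _ in range(n):
        older = rng.random() < 0.5                 # C: age group
        p_yellow = 0.30 if older else 0.10         # assumed: older drivers pick yellow more often
        p_accident = 0.05 if older else 0.20       # assumed: older drivers crash less often
        yellow = rng.random() < p_yellow           # A depends only on C
        accident = rng.random() < p_accident       # B depends only on C
        drivers.append((older, yellow, accident))
    return drivers

def accident_rate(drivers, yellow, older=None):
    """Accident rate for one car color, optionally restricted to one age group."""
    group = [d for d in drivers
             if d[1] == yellow and (older is None or d[0] == older)]
    return sum(d[2] for d in group) / len(group)

drivers = simulate_drivers()
print("all drivers:  yellow", accident_rate(drivers, True),
      " other", accident_rate(drivers, False))
for older in (True, False):
    print("older" if older else "younger",
          " yellow", accident_rate(drivers, True, older),
          " other", accident_rate(drivers, False, older))
```

Across all drivers the yellow cars show a clearly lower accident rate, but within each age group the two rates are about the same. The whole "effect" of yellow comes from who chooses to buy it.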

Here are some other examples of bad science and/or bad science reporting.

People who sleep 6 hours a night live longer than those who sleep eight or more.
So the research says that if you decrease the number of hours you sleep then you will live longer. Number of hours slept (A) is negatively correlated with a person's age when they die (B). But maybe people who are more active sleep less and live longer? It could be the person's lifestyle (C) that causes both A and B.
Passive smoking dents children's IQ
The research says that the more secondhand smoke a child gets, the lower their IQ: secondhand smoke (A) is negatively correlated with IQ (B). But it could be that parents with lower IQs are more likely to be smokers and are more likely to have children with lower IQs.
Breast-fed babies may grow up to be smarter adults
The study says: the longer babies are breast-fed (A), the smarter they will become (B). But maybe women who can afford to stay at home to breast-feed their children also have time to read to them and to better prepare them for the critical first couple of years of school.
TV, lots of fast food triple obesity risk
Maybe it is not the fast food and the TV (A) that are causing obesity (B) but that obesity causes TV watching and fast-food eating. Maybe obese people are less mobile and so rely on TV more for entertainment. Maybe obese people earn less money and so are more likely to rely on fast food, which is also some of the cheapest food. It could be B causing A.
Children who diet may actually gain weight in the long run
Maybe a child's body gets "fixed" at a certain fat level at a young age. Children who have to go on diets early may still have a problem resetting their fat level when they are older. It may not be the diet that caused the weight gain.
Relation between Parental Restrictions on Movies and Adolescent Use of Tobacco and Alcohol
The study says that the more R-rated movies you let your kids watch, the more smoking and drinking they will do. Maybe parents who do not take the time to limit their children's access to R-rated movies also tend to be lax about limiting their access to cigarettes and alcohol.
Preschoolers more likely to become bullies if they watch lots of TV
I would guess that it has a lot more to do with parental supervision than TV. Kids who are left alone tend to watch more TV, yes, but they are also not getting parental role models for proper behavior.

Now, you will notice that I use the words "maybe", "might", and "could". I am putting forth other hypotheses as to the causal relationships, not saying that my answers are right. It is very difficult to "prove" something -- especially with a survey. To find out whether car color really affects accident rates, you would have to find a random set of people (all ages, backgrounds, races, etc.) and talk them into painting their cars yellow for a year, and then compare their accident rates to those of a control group who did not get a new car color. But to do this study better, the car owners shouldn't even know what the new color is. Maybe a teenager in the study is less likely to volunteer to drive his young friends around because of the new "dorky" color of his car, thereby lowering the chance that he gets in an accident and changing the test results.
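Here is a rough sketch of why the random assignment in that experiment matters. It is a toy simulation, not a real study design: the accident probabilities, the 50/50 coin flip, and the true_effect parameter are all assumptions of mine. Because a coin flip decides who gets yellow paint, age can no longer influence car color, so the difference in accident rates reflects only whatever effect the paint itself has.

```python
import random

def randomized_trial(n=100_000, true_effect=0.0, seed=2):
    """Toy randomized experiment: a coin flip decides who gets yellow paint.

    true_effect is the assumed drop in accident probability caused by
    yellow paint itself (0.0 means color does nothing). Returns the
    estimated drop: control accident rate minus yellow accident rate.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}   # painted_yellow -> [accidents, drivers]
    for _ in range(n):
        older = rng.random() < 0.5
        painted_yellow = rng.random() < 0.5  # random assignment, independent of age
        base_risk = 0.05 if older else 0.20  # invented risks per age group
        risk = base_risk - (true_effect if painted_yellow else 0.0)
        accident = rng.random() < risk
        counts[painted_yellow][0] += accident
        counts[painted_yellow][1] += 1
    rate = {k: accidents / total for k, (accidents, total) in counts.items()}
    return rate[False] - rate[True]

print(randomized_trial(true_effect=0.0))   # near 0: no real effect, none found
print(randomized_trial(true_effect=0.03))  # near 0.03: a real effect is recovered
```

This sketch ignores the blinding problem described above. If drivers change their behavior because they know their car is yellow, the estimate picks up that change as well.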

To go further, true scientific method says that you cannot "prove" anything. All you can do is disprove alternative hypotheses -- all you can do is show that alternative answers are incorrect.

One thing to be very wary of is studies that set out to prove something -- whose purpose is to support some hypothesis. Unless they are undertaken under strict controls, they will most likely suffer from the old adage that says that if you look hard enough, you can find the data necessary to support any agenda. You can imagine the pressures on a research organization to deliver the proper findings when a corporation is paying them to do so. What company would hire a testing company back if they determined that the company's product really was dangerous? Much like a reporter who has already written the title of an article before s/he looks for the supporting evidence, many studies suffer from the same warped point of view.

Many of the studies that I read about these days seem so contradictory that I can only guess that the general public distrusts scientific results, and possibly scientific thought, to a large degree. Many of the problems happen when reporters or scientists try to summarize the data -- when they go for the quick sound bite or title.

I hope that there are some reporters who read this and take it to heart -- help spread the word that improper reporting of science is a serious offense. The public is getting mixed signals at best and improper signals more often than not. When you report on correlations, explain that they do not imply a cause. Push your scientific sources to give multiple probable causes and push them to elaborate on the alternative hypotheses that were tested and disproved. Reporters are the window to the scientific world for the large majority of our population and it's their responsibility to make sure the window is fully transparent.

For the rest of us normal people, I encourage you to keep a critical eye when reading anything coming from the media, but especially science reporting. Be sure to send your local reporter or paper comments if they seem to be oversimplifying a finding or misrepresenting results.
