March 03, 2003

Correlation vs. Causality

Just wanted to do a quick rant on the difference between correlation and causality. The difference matters in many fields, including studies of people. The Age published an article titled "Obese Men Eat Up Their IQ Points". It was written in London for an Australian audience, but it describes a study performed at Boston University. Before we look at the article, let's cover some basics.

Correlation is a mathematical measure of how closely two measured quantities vary together. The usual (Pearson) correlation coefficient runs from -1 to 1: the sign gives the direction of the relationship, and the magnitude gives its strength. A correlation of 0 means there is no linear relationship; a straight-line rule based on the first value predicts the second no better than simply guessing its average. A correlation of 1 (or -1) means the two are perfectly linearly related, so the first value always predicts the second. As an example, say you measure the heights and weights of a group of people. These have a high correlation, somewhere around .8; height is a good predictor of weight, and vice versa. Now say you also record the eye color of the same group. Eye color and height have a correlation pretty close to 0. They are basically independent; knowing one tells you essentially nothing about the other.
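For readers who want to see the number itself, here is a minimal sketch of Pearson's r in plain Python. The function name `pearson_r` and the toy data are illustrative, not from any study. Note that r runs from -1 to 1, and that it only captures straight-line relationships: the third example has y completely determined by x, yet r comes out as 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # 1.0: perfect increasing line
print(pearson_r(xs, [-3 * x for x in xs]))     # -1.0: perfect decreasing line
print(pearson_r(xs, [x * x for x in xs]))      # 0.0: y depends on x, but not linearly
```

Because the formula treats x and y identically, swapping the arguments never changes the result, which is the symmetry that causality lacks.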

Causality is also a relationship between two things, but it is physical (or philosophical) rather than mathematical. One thing causes another if there is a chain of events leading from the first to the second, each link causing the next to happen. Causality implies timing: the first thing happens, and later the second thing happens as a result. We call the first thing the cause and the second the effect. Note that, unlike correlation, the relationship is asymmetric.

People get these two relationships mixed up, and it causes incorrect conclusions to be drawn. For example, we noted a high correlation between height and weight. Does this mean height causes weight? Of course not. On the other hand, there is a high correlation between smoking and getting lung cancer. Does this mean smoking causes lung cancer? NO! It turns out smoking does cause lung cancer, but you couldn't draw that conclusion purely from the fact that they have a high correlation.

Now let's take a look at this article. Consider the final paragraph:

"When given cognitive function tests involving logic, verbal fluency and recall, obese men achieved scores as much as 23 per cent below those of non-obese men, even after taking into account factors such as educational level, occupation and blood pressure."

So there is a strong association between obesity and lower cognitive scores. Does this allow us to conclude that obesity causes low cognitive function? NO! How about the other direction: can we conclude that low cognitive function causes obesity? NO again! Either could be true, but neither follows logically from this data. The headline's causal claim simply isn't supported.
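The quoted paragraph says the gap held "even after taking into account" education, occupation and blood pressure. Adjustment of that kind only removes the influence of variables that were actually measured; a left-out variable that is correlated with the included ones still distorts the estimate (omitted-variable bias). The toy regression below assumes, purely for illustration, that an outcome `y` truly depends on `x` with coefficient 2 and on an unmeasured `a` with coefficient 3, and that `a` is correlated with `x`. The helper names and numbers are hypothetical.

```python
import random


def centered(v):
    """Subtract the mean, so the regressions below need no explicit intercept."""
    m = sum(v) / len(v)
    return [t - m for t in v]


def dot(p, q):
    return sum(s * t for s, t in zip(p, q))


def ols_one(x, y):
    """Slope from regressing y on x alone (with intercept)."""
    x, y = centered(x), centered(y)
    return dot(x, y) / dot(x, x)


def ols_two(x1, x2, y):
    """Slopes from regressing y on x1 and x2 (with intercept), via Cramer's rule."""
    x1, x2, y = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(2)
n = 100_000
a = [rng.gauss(0, 1) for _ in range(n)]        # the unmeasured variable
x = [ai + rng.gauss(0, 1) for ai in a]         # measured, correlated with a
y = [2 * xi + 3 * ai + rng.gauss(0, 1) for xi, ai in zip(x, a)]  # true effect of x is 2

print(round(ols_one(x, y), 2))                    # about 3.5: x absorbs part of a's effect
print([round(b, 2) for b in ols_two(x, a, y)])    # about [2.0, 3.0] once a is included
```

The size of the bias follows the textbook formula 3 × cov(x, a) / var(x) = 3 × 1 / 2 = 1.5, which is why the short regression lands near 3.5 instead of 2.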

There is another problem in the article. It uses words like "decline" and "reduce", which imply that cognitive function changed within the same individuals. But the study doesn't appear to have measured the same people over time; it appears to have measured different people at one point in time. That is a really serious logical error. In fact, most studies have found that people's IQ scores are fairly stable from about age 5 onward. Given that, you could just as plausibly argue for reverse causality; perhaps the headline should have read "Less Intelligent Men at Risk of Obesity".
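The difference between a one-time snapshot and a real decline is easy to demonstrate with a toy model. The sketch assumes, purely for illustration, that each man's IQ is fixed for life and that a below-average IQ doubles his risk of obesity, the reverse of the headline's direction. A single survey still finds obese men scoring lower, even though nobody's score ever fell. The function name `one_time_survey` and the risk figures are invented.

```python
import random
import statistics


def one_time_survey(n, seed=1):
    """Hypothetical population measured once. Each man's IQ is fixed for life and,
    by assumption, a below-average IQ doubles his chance of being obese.
    Obesity has no effect on IQ anywhere in this model."""
    rng = random.Random(seed)
    obese_iqs, non_obese_iqs = [], []
    for _ in range(n):
        iq = rng.gauss(100, 15)               # set once, never changes
        p_obese = 0.30 if iq < 100 else 0.15  # made-up risk figures
        if rng.random() < p_obese:
            obese_iqs.append(iq)
        else:
            non_obese_iqs.append(iq)
    return obese_iqs, non_obese_iqs


obese, non_obese = one_time_survey(50_000)
gap = statistics.mean(non_obese) - statistics.mean(obese)
print(f"obese men score {gap:.1f} points lower on average")  # roughly 5 points
```

Only by measuring the same men repeatedly could a study tell "IQ falls as men get fat" apart from "lower-IQ men are more likely to get fat".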

Hello to all GNXP visitors! I'm Ole, and I've been graciously invited to make occasional posts. I'm interested in the two most important human characteristics: gender and intelligence. (I stay away from race...) I'm writing a book called Unnatural Selection; you can find out more about it (and me) here.

Posted by ole at 10:16 PM

Thomas Sowell used to make a similar point when examining the effects of discrimination. Blacks have lower education *and*, for given levels of education, tend to have lower grades and take easier majors. Thus when you 'control for' education, what appears as discrimination is just a misspecified model: the omitted variable (say 'g', or a poor family work ethic) could explain the lower group averages. You get the same thing with 'experience', where the usual measure tends to understate differences in the human capital that workers of different groups bring to the job (e.g., counting 'years in labor force' as experience neglects significant differences in the nature of that experience). Beware of omitted variables that are correlated with your explanatory variables.

Thus, perhaps the headline could be "Lazy Men Less Intelligent, Fat."

Posted by: eric at March 4, 2003 10:04 AM

Hey Ole, your post on correlation and causation was on the mark. In case you didn't know, there's already a book by Lois Wingerson called Unnatural Selection: The Promise and the Power of Human Gene Research.

Posted by: the alpha male at March 4, 2003 11:59 AM

Also, Dysgenics by Richard Lynn.

Posted by: razib at March 4, 2003 12:11 PM

Where's your butt photo?

Posted by: duende at March 5, 2003 06:41 AM