Sunday, December 17, 2006

Variability and determinism   posted by p-ter @ 12/17/2006 03:22:00 PM

This whole Malcolm Gladwell hoopla (which may very well continue-- see his comment here for evidence he still doesn't understand what he got wrong) is proof that statistics is a tough thing for even intelligent people to wrap their heads around. I was reminded of this the other day when trying to explain the concept of heritability. Godless, back in the day, wrote an excellent post on the topic, and I'm only going to rehash a couple of the points in that post.

First, let's consider the study Gladwell cites on car dealers (how this will relate to heritability will become apparent later). The sample consists of a number of individuals (about whom we have collected several pieces of information--their race, age, gender, etc.) along with the price they were quoted on a car. These prices are all different; you could describe the distribution of the prices by its mean and the variance around that mean. The goal of the statistical analysis is to find variables that predict where a given individual's price will land in that distribution. If we look at the population as a whole, we might find that prices for a given car range all over the place, from, say, $500 to $5000-- a whole order of magnitude. Finding the effect of a single variable in all that data might be difficult without looking at a lot of people.

But now let's consider the prices given to men in their late twenties, who all dress well and have similar occupations. In this sample, maybe the quoted prices range from $1000 to $1500. There is much less variation in this sample. Assume for the sake of argument that your race perfectly predicts what price you get in this sample, i.e. that white people get price X and black people get price Y. Then all the variance in the sample can be attributed to a single factor-- race. You could say that, in this sample, race accounts for 100% of the variance in price.

But can this finding then be generalized to the population as a whole? That is, could you say that, in the general population, race perfectly determines price? No, of course not-- in the general population there are older people, women, people from different educational backgrounds, etc., and all these things contribute noise (i.e. more variance) to the data. So race will obviously account for much less than 100% of the variance (the exact amount would have to be tested).
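To make the contrast concrete, here is a minimal sketch with made-up numbers (none of them come from the study). It computes the share of price variance accounted for by a grouping variable as between-group variance over total variance. The function name `variance_share`, the group labels `g1`/`g2`, and every price in the lists are assumptions for illustration only.

```python
from statistics import mean, pvariance


def variance_share(values, labels):
    """Fraction of the variance in `values` accounted for by `labels`
    (between-group variance divided by total variance)."""
    overall = mean(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    # Between-group variance: spread of the group means around the overall mean,
    # weighted by group size.
    between = sum(
        len(vs) * (mean(vs) - overall) ** 2 for vs in groups.values()
    ) / len(values)
    return between / pvariance(values)


# Hypothetical homogeneous subsample: the group alone fixes the quoted price.
sub_prices = [1000, 1000, 1500, 1500]
sub_groups = ["g1", "g1", "g2", "g2"]

# Hypothetical wider population: same people plus others whose quotes
# vary for reasons unrelated to the grouping variable.
pop_prices = sub_prices + [500, 3000, 2000, 5000]
pop_groups = sub_groups + ["g1", "g1", "g2", "g2"]

sub_share = variance_share(sub_prices, sub_groups)
pop_share = variance_share(pop_prices, pop_groups)
print(f"subsample: {sub_share:.2f}, population: {pop_share:.2f}")
```

In the subsample the grouping variable accounts for all of the variance; in the wider list it accounts for only a small fraction (roughly 0.17 with these invented numbers), even though the group difference is still there.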

Now on to heritability. Let's consider a trait like IQ, whose distribution again has a mean and a variance. Traditionally we want to break the variance of this distribution into two components-- the component due to genetics and the component due to "environment" (i.e. not genetics). The (broad-sense) heritability of the trait is then the genetic variance over the total variance, or the percentage of variance attributable to genetics.
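Written out, the definition above looks like this. The symbols are my own shorthand rather than anything from the original discussion: V_G for genetic variance, V_E for environmental variance, V_P for total (phenotypic) variance. The second equality assumes the simplest case, where the two components just add (no gene-environment covariance or interaction).

```latex
H^2 = \frac{V_G}{V_P} = \frac{V_G}{V_G + V_E}
```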

The crucial point is this: heritability is a property of a trait and a population, not of a trait alone. Just as race perfectly determined price in the narrow sample of well-dressed men in their late twenties but not in the population as a whole, the heritability of IQ depends on who is in the sample. If we measure IQ in a population that includes extremely poor individuals, extremely rich individuals, victims of child abuse, and drug addicts, all these things are going to inject additional noise into the data, decreasing the heritability. But if we limit ourselves to upper class individuals who all treat their children similarly, the heritability will be much higher.

This has a consequence that most people aren't aware of--the more we reduce differences in environment between people, the more heritable IQ will become. Presumably, if everyone had exactly the same environment (which is not likely, of course), the proportion of variance due to genetics in the population would approach 100%, while the absolute variance itself would decrease. Heritability is not fixed; it's an attribute of a trait and a population.
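A quick sketch of that last point, under the simplifying assumption that total variance is just genetic variance plus environmental variance (no covariance or interaction between the two). The variance values are arbitrary illustrative numbers, not estimates for IQ or any real trait, and `broad_sense_heritability` is a hypothetical helper name.

```python
def broad_sense_heritability(genetic_var, environmental_var):
    """Genetic variance as a fraction of total variance,
    assuming total = genetic + environmental."""
    total_var = genetic_var + environmental_var
    return genetic_var / total_var


GENETIC_VAR = 100.0  # held fixed; arbitrary illustrative value

# Shrink the environmental variance step by step toward zero.
results = []
for env_var in (300.0, 100.0, 25.0, 0.0):
    total = GENETIC_VAR + env_var
    h2 = broad_sense_heritability(GENETIC_VAR, env_var)
    results.append((env_var, total, h2))
    print(f"env var {env_var:6.1f}  total var {total:6.1f}  heritability {h2:.2f}")
```

The genetic variance never changes here. Only the environmental variance shrinks, so the total variance goes down while the heritability climbs to 1.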