Thursday, June 28, 2007

Reanalysing gene expression differences between populations   posted by p-ter @ 6/28/2007 05:59:00 PM

Early this year, I commented on a paper showing large differences in gene expression between Europeans and Asians. A letter to the editor in this week's Nature Genetics points out a major flaw in part of their analyses.

Expression arrays are tricky tools-- they don't provide a measure of absolute mRNA levels, but rather an output that corresponds to the binding affinities of the mRNA, the ambient conditions, the way the mRNA was handled, and absolute mRNA levels (and a billion and one other things). Study design is extremely important in isolating the effect of the variable you're really interested in (mRNA levels), and it's very difficult, if not impossible, to really compare the raw data from one array experiment with that from another.

The error the authors made is an unfortunate (and pretty elementary) one-- they did the array experiments on the European population in 2003-2004, and the array experiments on the Asian population in 2005-2006 (in the paper they actually erroneously claimed the samples were randomized with regard to year, which would explain why it got past peer review). This means that any variation between the European and Asian populations is perfectly confounded with variation between those two batches. There's no way to correct for this; any difference in mean expression between the two populations is due to a mixture of the "real" effects and the bias from the batch effect. That's a bitch.
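To make "perfectly confounded" concrete, here is a minimal sketch in Python with NumPy. Everything in it is hypothetical: the eight samples, the 0/1 coding, and the names `design_matrix`, `confounded_batch` and `balanced_batch` are mine, not anything from the paper or the letter. The idea is that if every sample from one population is also in one batch, the population column and the batch column of a linear model's design matrix are identical, so the matrix loses a rank and the two effects can't be estimated separately.

```python
import numpy as np

def design_matrix(population, batch):
    """Columns: intercept, population indicator, batch indicator."""
    population = np.asarray(population, dtype=float)
    batch = np.asarray(batch, dtype=float)
    return np.column_stack([np.ones_like(population), population, batch])

# Hypothetical coding: 0/1 for the two populations, 0/1 for the two batches.
population = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Confounded design: every sample's batch is determined by its population.
confounded_batch = population.copy()

# Balanced design: each population is split across both batches.
balanced_batch = np.array([0, 1, 0, 1, 0, 1, 0, 1])

confounded_rank = np.linalg.matrix_rank(design_matrix(population, confounded_batch))
balanced_rank = np.linalg.matrix_rank(design_matrix(population, balanced_batch))

print("rank with confounded batches:", confounded_rank)
print("rank with balanced batches:  ", balanced_rank)
```

With the confounded design the matrix has three columns but only rank two, so no amount of clever modeling will pull the population effect apart from the batch effect. Splitting each population across batches restores full rank.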

Luckily, the authors also did additional analyses (as they point out in their reply)-- they looked at the correlation of expression levels with genotypes. In the figure, you see the population distributions of expression for a given gene on the left, and the within-genotype levels on the right. There doesn't seem to be much of a difference between the two populations within each genotype class; instead, the population difference is explained almost entirely by the difference in allele frequency between the two populations.
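A toy simulation shows how this pattern can arise. This is only a sketch under made-up assumptions, not a reanalysis of their data: the allele frequencies (0.2 and 0.8), the additive effect of one expression unit per allele, the noise level and the sample size are all invented, as is the `simulate` function. Expression here depends only on genotype, and the two populations differ only in allele frequency.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(allele_freq, n=20000, effect=1.0, noise_sd=0.5):
    """Simulate genotypes (0, 1 or 2 copies of an allele) and expression.

    Expression is a purely additive function of genotype plus noise;
    population membership has no direct effect on it.
    """
    genotype = rng.binomial(2, allele_freq, size=n)
    expression = effect * genotype + rng.normal(0.0, noise_sd, size=n)
    return genotype, expression

# Two hypothetical populations that differ only in allele frequency.
geno_a, expr_a = simulate(allele_freq=0.2)
geno_b, expr_b = simulate(allele_freq=0.8)

# Difference in mean expression between the populations as a whole.
population_gap = expr_b.mean() - expr_a.mean()

# Difference in mean expression between populations within each genotype class.
within_genotype_gaps = [
    expr_b[geno_b == g].mean() - expr_a[geno_a == g].mean() for g in (0, 1, 2)
]

print("gap between population means:", round(population_gap, 3))
for g, gap in zip((0, 1, 2), within_genotype_gaps):
    print(f"gap within genotype {g}:", round(gap, 3))
```

The population means come out far apart while the within-genotype gaps hover near zero, which is the same qualitative picture as the figure: a population difference driven by allele frequency rather than by anything that differs within a genotype class.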

So was their claim of finding nearly 25% of all genes differentially expressed between the two populations likely wrong? Yes. But their conclusion that allele frequency differences play a role in expression differences between populations stands-- it will just take a better-designed study to quantify the effect.