Monday, February 01, 2010

"Synthetic associations" and sickle cell anemia   posted by p-ter @ 2/01/2010 07:40:00 PM

Last week, I made a silly error in describing a problem in the sickle cell anemia example given by Dickson et al. (2010) as an empirical example of the phenomenon they call "synthetic association". So allow me to take a mulligan, and re-try this:

The authors performed an association study in African-Americans, using ~200 individuals with sickle cell anemia as cases, and >7,000 controls. From their description, they simply performed a logistic regression of disease status on common polymorphisms genome-wide. This turned up a large (~2.5Mb) region surrounding HBB (known to harbour the rare disease-causing mutation) as highly associated with the phenotype. This large region of association stands in contrast, they argue, to the known patterns of linkage disequilibrium in the region, which extend over a few kilobases at most.

This observation, they argue, is an empirical example of how associations due to rare variants can lead to large blocks of association at common variants. This effect is due to the fact that haplotypes surrounding rare variants are longer and have had little time to be broken up by recombination. Under certain genetic models, this effect of "synthetic associations" is plausible. However, this example is a poor one for making their case.

The reason is that individuals with sickle cell anemia have two chromosomes of African ancestry in the region of HBB, while individuals without sickle cell anemia have approximately the background distribution of European and African chromosomes at the locus--~20% European and ~80% African. To put it another way, let X_d be the number of chromosomes of African ancestry an individual carries at some distance d from HBB (X_d can be 0, 1, or 2), and let Y be the number of chromosomes of African ancestry that individual carries at HBB itself. In the cases, they've conditioned on the fact that Y = 2, while in the controls they have not. P(X_d) != P(X_d | Y = 2), so much of their association is likely due simply to differences in ancestry between the cases and controls in the HBB region (recall that admixture linkage disequilibrium in African-Americans extends for megabases).
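To get a rough sense of how far this conditioning reaches, here is a minimal sketch in Python. It is not from Dickson et al. It assumes a single pulse of admixture some number of generations ago, a uniform recombination rate, and that a chromosome which has recombined between HBB and the test position since admixture carries ancestry drawn from the ~80% African background. The function name and the default values (7 generations, 1 cM/Mb) are illustrative assumptions, not estimates from their data.

```python
import math


def p_african_given_african_at_hbb(distance_mb, generations=7.0,
                                   cm_per_mb=1.0, background=0.8):
    """P(chromosome is African at distance d | it is African at HBB).

    Toy model (all defaults are assumptions): with probability
    exp(-generations * d) no recombination has separated the two
    positions since admixture, so ancestry is shared with HBB;
    otherwise ancestry is a fresh draw from the background.
    """
    d_morgans = distance_mb * cm_per_mb / 100.0
    unbroken = math.exp(-generations * d_morgans)
    return unbroken + (1.0 - unbroken) * background


if __name__ == "__main__":
    background = 0.8
    for distance_mb in (0.0, 1.0, 2.5, 10.0, 50.0):
        # Cases have Y = 2, so both chromosomes are conditioned on HBB ancestry.
        cases = 2 * p_african_given_african_at_hbb(distance_mb)
        # Controls are (approximately) unconditioned.
        controls = 2 * background
        print(f"d = {distance_mb:5.1f} Mb  "
              f"E[X_d | Y=2] = {cases:.3f}  E[X_d] = {controls:.3f}")
```

Under these assumptions, the expected African ancestry in cases starts at 2 chromosomes at HBB and only relaxes toward the background value of 1.6 over tens of megabases, so a ~2.5Mb window sits well inside the region where cases and controls differ in ancestry.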

More concretely, any SNP near the HBB locus that happened to be fixed for opposite alleles in Europe and Africa would have a whopping 20% allele frequency difference between cases and controls in their analysis (the "European" allele would be absent in cases but at ~20% frequency in controls), attributable simply to differences in local ancestry. That's the extreme (and unlikely) situation, but alleles with more modest allele frequency differences between populations will show the same effect, scaled down accordingly.
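The arithmetic can be spelled out in a few lines of Python. This is a sketch under the simplest possible assumptions: the SNP sits close enough to HBB that cases are 100% African at the SNP, controls are 80% African, and the allele frequency in each group is just the ancestry-weighted average of the two population frequencies. The function names and the example frequencies are hypothetical.

```python
def expected_allele_freq(p_african, p_european, african_ancestry):
    """Allele frequency in a group with the given local African ancestry."""
    return african_ancestry * p_african + (1.0 - african_ancestry) * p_european


def case_control_difference(p_african, p_european,
                            case_ancestry=1.0, control_ancestry=0.8):
    """Case minus control allele frequency due to local ancestry alone."""
    cases = expected_allele_freq(p_african, p_european, case_ancestry)
    controls = expected_allele_freq(p_african, p_european, control_ancestry)
    return cases - controls


if __name__ == "__main__":
    # Extreme case from the text: fixed for opposite alleles.
    print(case_control_difference(p_african=1.0, p_european=0.0))
    # A more modest (made-up) frequency difference between populations.
    print(case_control_difference(p_african=0.6, p_european=0.3))
```

The difference works out to 0.2 times the African-minus-European frequency difference, so even a SNP with a 30% frequency difference between populations would show a 6% difference between cases and controls with no causal role at all. With >7,000 controls, that is not a subtle signal.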

To some extent, this is their point--the haplotype carrying the causal mutation is long. But the effect in this case is massively exaggerated by admixture, and the presentation of this exaggerated effect is misleading.
