Gene Expression: A note on the Common Disease-Common Variant debate

Front page

Tuesday, February 27, 2007

A note on the Common Disease-Common Variant debate posted by p-ter @ 2/27/2007 05:19:00 PM

One of the more heated debates in human medical genetics in the last decade or so has been centered around the Common Disease-Common Variant (CDCV) hypothesis. As the name implies, the hypothesis posits that genetic susceptibility to common diseases like hypertension and diabetes is largely due to alleles which have moderate frequency in the population. The competing hypothesis, also cleverly named, is the Common Disease-Rare Variant (CDRV) hypothesis, which suggests that multiple rare variants underlie susceptibility to such diseases. As different techniques must be used to find common versus rare alleles, this debate would seem to have major implications for the field. Indeed, the major proponents of the CDCV hypothesis were the movers and shakers beind the HapMap, a resource for the design of large-scale association studies (which are effective at finding common variants, much less so for rare variants).

However, CDCV versus CDRV is an utterly false dichotomy, as I'll explain below. This point has slipped past many of the human geneticists who actually do the work of mapping disease genes, and I feel the problem is this: essentially, geneticists are looking for a gene or the gene, so they naturally want to know whether to take an approach that will be the best for finding common variants or one for finding rare variants. However, common diseases do not follow simple Mendelian patterns-- there are multiple genes that influence these traits, and the frequencies of these alleles has a distribution. A decent null hypothesis, then, is to assume that the the frequencies of alleles underlying a complex phenotype is essentially the same as the overall distribution of allele frequencies in the population-- that is, many rare variants and some common variants.

This argument would seem to favor the CDRV hypothesis. Not so. The key concept for explaining why is one borrowed from epidemiology called the population attributable risk--essentially, the number of cases in a population that can be attributed to a given risk factor. An example: imgaine smoking cigarettes gives you a 5% chance of developing lung cancer, while working in an asbestos factory gives you a 70% chance. You might argue that working in an asbestos factory is a more important risk factor than cigarette smoking, and you would be correct--on an individual level. On a population level, though, you have to take into account the fact that millions more people smoke than work in asbestos factories. If everyone stopped smoking tomorrow, the number of lung cancer cases would drop precipitously. But if all asbestos factory workers quit tomorrow, the effect on the population level of lung cancer would be minimal. So you can see where I'm going with this: common susceptibility alleles contribute disproportinately to the population attributable risk for a disease. In type II diabetes, for example, a single variant with a rather small effect but a moderate frequency accounts for 21% of all cases[cite].

So am I then arguing in favor of the CDCV hypotheis? Of course not-- rare variants, aside from being predictive for disease in some individuals, also give important insight into the biology of the disease. But it is possible right now, using genome-wide SNP arrays and databases like the HapMap, to search the entire genome for common variants that contribute to disease. This is an essential step--finding the alleles that contribute disproportionately to the population-level risk for a disease. Eventually, the cost of sequencing will drop to a point where rare variants can also be assayed on a genome-wide, high-throughput scale, but that's not the case yet. Once it is, expect the CDRV hypothesis to be trumpted as right all along.

Labels: Association, disease, Genetics, Statistics