Rare variants versus common variants in complex disease is a political, not a scientific, debate


There’s been a recent uptick in interest in the genetic architecture of complex traits (by which I mean the allele frequencies and effect sizes of the relevant loci), some of which has been driven by a much-hyped recent paper from David Goldstein’s group that used simulations to point out that, as one commenter put it, “LD exists”. The main point of that paper, that “[associations caused by rare variants] are likely to account for or contribute to many of the recently identified signals reported in genome-wide association studies”, is almost certainly wrong (depending on what you mean by “likely” or “contribute to” or “many”). What is true is that there are alleles at low frequency in the population that contribute to disease risk for just about any disease. It is also true that there are alleles that are common in the population and contribute slightly to disease risk, as shown recently in schizophrenia.

The way to go about identifying these loci is straightforward, and I’m pretty sure all geneticists would agree on this: with an infinite budget, you would sequence the genomes of every individual with the disease and every individual without the disease, and do a truly genome-wide association study. That is, you would identify all the polymorphisms that differ in frequency between the people who have the disease and the people who don’t.

The problem, and this is where tempers start to flare, is that obviously the budget isn’t infinite. So, there’s the choice between collecting a sample of N individuals and typing them on SNP chips (which are currently very skewed towards assaying common variation, though the next generation of chips is somewhat reducing that skew), or collecting a sample of N/10 (or probably fewer, but let’s go with an order of magnitude for argument’s sake) and performing full-genome sequencing. Which do you choose? If you think the rarer variation is more “important”, you choose the latter, while if you think the common variation is more “important”, you choose the former. If we define importance as the proportion of variance in a trait explained, this choice is based on your prior beliefs about what the relevant parameters are for the genetic architecture of your trait of interest. Once the price of sequencing drops sufficiently, this question becomes moot, but for the moment there’s a choice, and we find ourselves in this situation: people have heated, vehement arguments about prior beliefs that seem to outsiders like real, heady scientific debate, but are really about getting funding for their preferred study design. In 20 years this debate will be of interest only to historians (and maybe the people who had to suffer through it); there’s no real contentious scientific question [1].
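To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from any of the papers mentioned; it assumes a quantitative trait (rather than case/control status), an additive biallelic locus, a normal approximation to a 1-df association test, and a made-up LD value `r2` between the causal variant and the best marker on the chip. The function names, sample sizes, frequencies and effect sizes are all hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def variance_explained(freq, beta):
    """Fraction of trait variance explained by an additive biallelic locus.

    Assumes beta is the per-allele effect in phenotypic standard deviations.
    """
    return 2.0 * freq * (1.0 - freq) * beta ** 2


def power(n, var_explained, r2=1.0, alpha=5e-8):
    """Approximate power of a two-sided 1-df association test.

    Normal approximation with noncentrality n * r2 * var_explained, where r2
    is the LD between the causal variant and the best assayed marker
    (r2 = 1 means the causal variant itself is typed, as with sequencing).
    """
    z = NormalDist()
    shift = (n * r2 * var_explained) ** 0.5
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical budget: N = 100,000 on a chip, or N/10 fully sequenced.
N = 100_000

# Hypothetical common variant, well tagged by the chip (r2 = 0.9).
common = variance_explained(freq=0.3, beta=0.05)
common_chip = power(N, common, r2=0.9)
common_seq = power(N // 10, common)

# Hypothetical rare variant, poorly tagged by the chip (r2 = 0.05).
rare = variance_explained(freq=0.005, beta=0.5)
rare_chip = power(N, rare, r2=0.05)
rare_seq = power(N // 10, rare)

print(f"common variant: chip {common_chip:.3f}, sequencing {common_seq:.3f}")
print(f"rare variant:   chip {rare_chip:.3f}, sequencing {rare_seq:.3f}")
```

Under these simplifying assumptions the comparison reduces to N × r² against N/10, so the chip design has more power for any variant it tags with r² above 0.1 and the sequencing design has more power below that. Which case dominates depends entirely on the allele frequencies and effect sizes you plug in, which is the prior belief the argument is really about.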

Personally, I lean slightly towards the common variation crowd, though not because I have a particularly strong feeling about it: bigger sample sizes are always better (or at least nice to have), and chips are covering rarer and rarer variation at a more-or-less fixed cost. But will cool things be found in the initial sequencing studies in smaller samples? Of course. It’s also important to note that sequencing studies are not a radical re-thinking of how to do disease genetics; they’re simply a more comprehensive way to do the exact same genome-wide association studies that people are doing now.

[1] There are some interesting scientific questions that could be answered by simply describing the genetic architecture of a trait (or perhaps more interestingly, comparing this across traits), but the volume of debate is probably not due to them.

One Comment

  1. Do these studies typically save some specimens in a freezer? If so, the whole issue disappears, dynamically, as the cost of sequencing goes down. So, collect N, use the chip now and save the specimens for later when the cost of whole-genome sequencing is lower. Right?
