I think it is probably (or should be) an uncontroversial statement to say that recent genome-wide association studies have revolutionized our understanding of the molecular basis of variation in disease risk in humans. From a handful of polymorphisms reliably associated with a few diseases, there are now hundreds of such associations for a wide spectrum of disease and non-disease traits. That said, these studies have been disappointing to some–even now, the genetic loci identified are generally a poor predictor of whether a person will get a disease or not. This has led to something of a backlash against these sorts of studies. Some of this backlash is fair enough, but some of the arguments presented are problematic. One bizarre argument that seems to be gaining some traction is that, since genome-wide association studies are finding many non-genic regions associated with disease risk, they’re not identifying anything functionally relevant. See, for example, this article in the New York Times, and a recent commentary by McClellan and King. Here are McClellan and King:
A major limitation of genome-wide association studies is the lack of any functional link between the vast majority of risk variants and the disorders they putatively influence…Very few published risk variants lie in coding regions, in UTRs, in promoters, or even in predicted intronic or intergenic regulatory regions. Far fewer have been shown to alter the function of any of these sequences. How did genome-wide association studies come to be populated by risk variants with no known function?
Their answer to this rhetorical question is that common SNPs (used on current genotyping platforms) are generally nonfunctional. The alternative, the evidence for which I’ll present here, is that our ability to predict functional SNPs is poor. In the phrase “no known function”, the emphasis should be on the word “known”.
So how could all these non-genic polymorphisms of unknown function influence disease risk? The obvious answer is that they influence gene regulation–the expression levels and/or timing of expression of relevant genes. Is there evidence that this is the case? Here are three points from the recent literature:
1. I’ll start with a recently published mouse model of cancer [1]. In this paper, the authors generated a mutant mouse which expressed a particular gene at 80% of its normal levels (this is in contrast to many studies of this type, which remove a gene completely). This is a rather subtle alteration of the physiology of a mouse. That said, these slightly modified mice developed a range of cancers at higher rates than controls. So the first point is: relatively slight changes in the expression of a gene can predispose to disease.
2. From the above, you might guess that polymorphisms in humans which lead to subtle changes in gene expression might also be likely to have shown up in genome-wide association studies (even if we don’t know the precise mechanism). This would be a correct guess. In a recent paper [2], a group showed that polymorphisms found to influence gene expression in human lymphoblastoid cell lines were more likely than control polymorphisms to also influence different traits. In a particular example, another group [3] asked whether polymorphisms associated with celiac disease (most of which were non-genic) were also influencing gene expression in blood. Of the 38 associated regions they found, 20 of them influenced gene expression. So the second point is: common polymorphisms with relatively subtle influences on gene expression can and do influence disease risk.
3. The last point is that there’s been one heavily-studied example of a polymorphism influencing disease risk despite being far from any known gene. This is a region on chromosome 8 associated with a number of cancers. In the last year, multiple groups have shown that this region contains a long-range enhancer element, with a common polymorphism in a binding site for a relevant transcription factor (for example, [4]). It’s unclear exactly how this polymorphism influences cancer risk, but the point remains: even loci extremely far from known genes can influence gene regulation.
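To put the celiac numbers from point 2 in perspective, here is a rough sketch of the back-of-the-envelope calculation one might do. The 20 and 38 come from the text above; the background rate is a purely hypothetical placeholder (it is not taken from any of the cited papers), and the calculation treats regions as independent, which real analyses would not assume. It only illustrates the logic of asking whether overlap with expression effects is more than chance.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts quoted in point 2 (celiac-associated regions, ref. [3]).
associated_regions = 38
with_expression_effect = 20

# ASSUMPTION: hypothetical fraction of randomly chosen regions that would
# show an effect on gene expression. This is an illustrative placeholder,
# not a number from the cited studies.
assumed_background_rate = 0.10

observed_fraction = with_expression_effect / associated_regions
p_value = binomial_upper_tail(
    with_expression_effect, associated_regions, assumed_background_rate
)

print(f"observed fraction: {observed_fraction:.3f}")
print(f"upper-tail probability under the assumed rate: {p_value:.3g}")
```

Changing the assumed background rate changes the answer, so the output is only as meaningful as that assumption. The cited papers use their own matched controls rather than a single fixed rate.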
In sum, the weight of evidence suggests that our lack of functional knowledge about the majority of signals coming from genome-wide association studies can be attributed not to some issue with how the studies are designed, but rather to a lack of understanding of the relevant biology. This will hopefully soon change.
[1] Alimonti et al. (2010) Subtle variations in Pten dose determine cancer susceptibility. Nature Genetics. doi:10.1038/ng.556
[2] Nicolae et al. (2010) Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS. PLoS Genetics. doi:10.1371/journal.pgen.1000888
[3] Dubois et al. (2010) Multiple common variants for celiac disease influencing immune gene expression. Nature Genetics. doi:10.1038/ng.543
[4] Jia et al. (2009) Functional Enhancers at the Gene-Poor 8q24 Cancer-Linked Locus. PLoS Genetics. doi:10.1371/journal.pgen.1000597