How do non-genic polymorphisms influence disease risk?


I think it is probably (or should be) an uncontroversial statement to say that recent genome-wide association studies have revolutionized our understanding of the molecular basis of variation in disease risk in humans. From a handful of polymorphisms reliably associated with a few diseases, there are now hundreds of such associations for a wide spectrum of disease and non-disease traits. That said, these studies have been disappointing to some–even now, the genetic loci identified are generally a poor predictor of whether a person will get a disease or not. This has led to something of a backlash against these sorts of studies. Some of this backlash is fair enough, but some of the arguments presented are problematic. One bizarre argument that seems to be gaining some traction is that, since genome-wide association studies are finding many non-genic regions associated with disease risk, they’re not identifying anything functionally relevant. See, for example, this article in the New York Times, and a recent commentary by McClellan and King. Here are McClellan and King:

A major limitation of genome-wide association studies is the lack of any functional link between the vast majority of risk variants and the disorders they putatively influence…Very few published risk variants lie in coding regions, in UTRs, in promoters, or even in predicted intronic or intergenic regulatory regions. Far fewer have been shown to alter the function of any of these sequences. How did genome-wide association studies come to be populated by risk variants with no known function?

Their answer to this rhetorical question is that common SNPs (used on current genotyping platforms) are generally nonfunctional. The alternative, the evidence for which I’ll present here, is that our ability to predict functional SNPs is poor. In the phrase “no known function”, the emphasis should be on the word “known”.

So how could all these non-genic polymorphisms of unknown function influence disease risk? The obvious answer is that they influence gene regulation–the expression levels and/or timing of expression of relevant genes. Is there evidence that this is the case? Here are three points from the recent literature:

1. I’ll start with a recently published mouse model of cancer [1]. In this paper, the authors generated a mutant mouse which expressed a particular gene at 80% of its normal levels (this is in contrast to many studies of this type, which remove a gene completely). This is a rather subtle alteration of the physiology of a mouse. That said, these slightly modified mice developed a range of cancers at higher rates than controls. So the first point is: relatively slight changes in the expression of a gene can predispose to disease.

2. From the above, you might guess that polymorphisms in humans which lead to subtle changes in gene expression might be likely to also have shown up in genome-wide association studies (even if we don’t know the precise mechanism). This would be a correct guess. In a recent paper [2], a group showed that polymorphisms found to influence gene expression in human lymphoblastoid cell lines were more likely than control polymorphisms to also influence different traits. In a particular example, another group [3] asked whether polymorphisms associated with celiac disease (most of which were non-genic) were also influencing gene expression in blood. Of the 38 associated regions they found, 20 of them influenced gene expression. So the second point is: common polymorphisms with relatively subtle influences on gene expression can and do influence disease risk.

3. The last point is that there’s been one heavily-studied example of a polymorphism influencing disease risk despite being far from any known gene. This is a region on chromosome 8 associated with a number of cancers. In the last year, multiple groups have shown that this region contains a long-range enhancer element, with a common polymorphism in a binding site for a relevant transcription factor (for example, [4]). It’s unclear exactly how this polymorphism influences cancer risk, but the point remains: even loci extremely far from known genes can influence gene regulation.

In sum, the weight of evidence suggests that our lack of functional knowledge about the majority of signals coming from genome-wide association studies can be attributed, not to some issue with how the studies are designed, but rather to a lack of understanding of the relevant biology. This will hopefully soon change.

[1] Alimonti et al. (2010) Subtle variations in Pten dose determine cancer susceptibility. Nature Genetics. doi:10.1038/ng.556

[2] Nicolae et al. (2010) Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS. PLoS Genetics. doi:10.1371/journal.pgen.1000888

[3] Dubois et al. (2010) Multiple common variants for celiac disease influencing immune gene expression. Nature Genetics. doi:10.1038/ng.543

[4] Jia et al. (2009) Functional Enhancers at the Gene-Poor 8q24 Cancer-Linked Locus. PLoS Genetics. doi:10.1371/journal.pgen.1000597

12 Comments

  1. Fair enough. How do the proportions look to you? That is, are there enough associations to map a meaningful distribution for the locations of disease-risk-altering polymorphisms across genomic features (“intragenic”, promoter, UTR, exon, intron, etc.) such that you can make a call on how unusual such polymorphisms look relative to all polymorphisms?

  2. Good question. I think the data is there, but I don’t think anyone has done a really careful analysis of this sort. maybe something like this is the closest that’s been published:
    http://www.pnas.org/content/106/23/9362.long

    this question would definitely be worth revisiting.

  3. I couldn’t agree more about the difficulty of predicting functional SNPs. What comes to your point three, I’d be more optimistic and say the two papers below pretty much nailed the colorectal cancer risk function for the SNP on 8q24. They also show the difficulty of studying (and publishing) the functional non-genic SNPs: you have computational prediction, altered in-vitro binding, altered chromatin IP and transgenic mice all pointing to the obvious model, but you still need an independent group with 3C data and exactly the same model before you can publish.

    Tuupanen et al. (2009) The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nat Genet. 2009 Aug;41(8):885-90. http://dx.doi.org/10.1038/ng.406

    Pomerantz et al. (2009) The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat Genet. 2009 Aug;41(8):882-4.

  4. My take on the paper is that the authors are fully aware of the role of regulatory elements, but are pointing out that these mutations are not occurring in any of the known regulatory regions (in addition to not occurring in the protein coding region of the gene itself).

    Sure there are cryptic enhancers, and probably a lot more could be known about the less obvious regulatory elements, but I can’t help thinking that if we are seeing the majority of these mutations in regions with no obvious biological significance, then the big question is why there and not as frequently in the regions that do have biological significance? Are there lots of cryptic regulators we don’t know about and don’t yet understand? Are these mutations so weak in their effects that if they aren’t selectively neutral, they are at least “nearly neutral” and so can hang around in the genome (in which case, are they really that important as to be worth the effort of finding them?) Or is it as the authors suggest – that because the effects tend to be on the order of an odds ratio of 1.5 or less, what we are seeing could as easily be due to cryptic population stratification causing spurious associations?

  5. What comes to your point three, I’d be more optimistic and say the two papers below pretty much nailed the colorectal cancer risk function for the SNP on 8q24

    good point, i’d forgotten the exact results from those two papers

  6. but I can’t help thinking that if we are seeing the majority of these mutations in regions with no obvious biological significance, then the big question is why there and not as frequently in the regions that do have biological significance?

    so take the example of celiac disease in my point 2. I don’t know how many of the SNPs they identified would have “obvious” biological significance if you were to look at them; my guess is few of them. But that’s beside the point–when they went and assayed gene expression in a relevant tissue, they found that the SNPs influenced it (or half of them did). Now it doesn’t matter if the SNPs have any “obvious” function a priori–they clearly have some function we’re not good at identifying yet from genome sequence alone.

    Or is it as the authors suggest – that because the effects tend to be on the order of an odds ratio of 1.5 or less, what we are seeing could as easily be due to cryptic population stratification causing spurious associations?

    this is the most bogus of the authors’ claims, since it’s readily testable. In fact, people have looked to see whether SNPs associated with disease show more population structure than random SNPs (in the context of wondering if the structure is due to selection, but the point remains). In general, the answer is no: SNPs associated with disease look about the same as your average SNP, with a few interesting exceptions. see here:

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440747/
    http://genome.cshlp.org/content/19/5/826

    the authors identified a single SNP (one of the signals from a GWAS that hasn’t replicated) that they think might be due to population structure, but this is clearly a cherry-picked example (and not even a conclusive one at that)

  7. For readers of this post, I posted a few general comments on the McClellan et al paper in http://scienceblogs.com/geneticfuture/2010/04/why_disease_associations_outsi.php.

    With respect to the issue of non-coding SNPs detected in GWAS, the following section is especially important, so I reproduce it here for the convenience of readers:

    “(1) McClellan et al use the fact that most detected SNPs in GWAS are from intergenic regions to question the utility and the reliability of GWAS, and raised a serious question “How did genome-wide association studies come to be populated by risk variants with no known function?”. In fact, GWAS do not attempt to identify functional SNPs, but rather identify approximate location of loci that harbor disease variants. This is possible due to the extensive linkage disequilibrium (LD) between segregating sites in a given human population. Most SNPs in SNP arrays have unknown biological function, only because most SNPs in HapMap are outside of coding regions and because manufacturers of SNP arrays usually do not select SNPs by known function. Unfortunately, this fact may not be well known outside of the GWAS community, such as most readers of the journal Cell. McClellan et al did mention LD but they did not recognize that GWAS do not attempt to interrogate causal variants in the first place. More interestingly, they discussed the SCA GWAS and hearing loss GWAS that I published; the hits in both GWAS are actually outside but close to the causal gene (HBB and GJB2), yet they tag exonic variants in the causal gene, representing two particularly vivid and classic examples on how GWAS work through LD. It is unclear how McClellan et al can discuss these two examples extensively by ignoring the basic facts that both non-coding hits indeed faithfully tag the causal variants in causal genes through the magic of LD. For readers not familiar with GWAS, I need to also emphasize that GWAS variants were typically referred to as “risk variants” only because of convention of published literature, not because they are the actual functional variants that confer risk. 
    Unlike what some readers may think based on McClellan et al, the fact that 100% of Africans carry a risk allele does not suggest that all subjects of African descent are predisposed to risk; it merely suggests that LD patterns in European and African populations at a locus are different. One cannot interpret GWAS results without acknowledging these basic facts.”

  8. Re: Kai Wang
    Yeah, that was my first thought. Surely some of these SNPs in non-coding regions are just linked rather than having any functional significance? Good point also that the degree of linkage may vary between different populations.

  9. Oh, wait. Just read the article more thoroughly, and you were suggesting alternatives to the non-functional hypothesis. I still think it’s dangerous trying to find function in every SNP; there are going to be hitch-hikers in there too.

  10. interesting. I gave McClellan and King the benefit of the doubt and assumed they were using the standard shorthand when describing an association (i.e., I thought when they said “associated SNPs have no known function” they meant “neither the associated SNPs, nor any correlated SNP in the region, have a known function”).

    On re-reading, I think you might be right that they’re actually wondering why all associated SNPs aren’t functional. On the other hand, they’re certainly familiar with linkage studies, which also use random polymorphisms as markers to track the inheritance of functional ones, so they must see the analogy, right?

  11. “neither the associated SNPs, nor any correlated SNP in the region, have a known function”

    It wouldn’t even have to be a correlated SNP. You could find an association with a SNP that’s in LD with, say, a deletion. This whole topic seems a bit odd, frankly. If you look at two adjacent SNPs in HapMap, do you tend to find linkage disequilibrium? Yes. So if you could look at your causal variant (whatever it might be) and an adjacent SNP, will you expect to find linkage disequilibrium? Yes. So will that adjacent (quite possibly non-functional) SNP show significant association given a powerful enough GWAS? Yes.

    So do we expect GWAS to find associations with non-functional SNPs? Yes.

    What’s the mystery?

  12. I suppose I should have just said, “What Kai Wang said.”
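
The linkage disequilibrium argument made in comments 7 and 11 can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not an analysis from the post or from any of the cited papers: the function name, the two-site haplotype model, and every parameter value (allele frequency, risk per allele, the `ld_copy_prob` knob that controls how tightly the marker tracks the causal variant) are hypothetical and chosen purely for illustration. Only the causal variant changes disease risk; the marker has no function at all.

```python
import random


def simulate_marker_association(n_people=20000, causal_freq=0.3,
                                ld_copy_prob=0.9, base_risk=0.05,
                                risk_per_allele=0.05, seed=1):
    """Return (marker allele frequency in cases, in controls).

    All parameters are illustrative assumptions. Disease risk depends only
    on the number of causal alleles a person carries; the marker is merely
    correlated with the causal variant on the same haplotype.
    """
    rng = random.Random(seed)
    case_marker = case_alleles = 0
    control_marker = control_alleles = 0

    for _ in range(n_people):
        causal_count = 0
        marker_count = 0
        for _ in range(2):  # two haplotypes per person
            causal = rng.random() < causal_freq
            if rng.random() < ld_copy_prob:
                # Marker allele travels with the causal allele (LD).
                marker = causal
            else:
                # Otherwise the marker is drawn independently.
                marker = rng.random() < causal_freq
            causal_count += int(causal)
            marker_count += int(marker)

        # Risk is a function of the causal variant only.
        diseased = rng.random() < base_risk + risk_per_allele * causal_count
        if diseased:
            case_marker += marker_count
            case_alleles += 2
        else:
            control_marker += marker_count
            control_alleles += 2

    return case_marker / case_alleles, control_marker / control_alleles


if __name__ == "__main__":
    for ld in (0.9, 0.0):
        cases, controls = simulate_marker_association(ld_copy_prob=ld)
        print(f"ld_copy_prob={ld}: marker freq in cases={cases:.3f}, "
              f"controls={controls:.3f}")
```

Under these assumptions, the non-functional marker is noticeably more common in cases than in controls when `ld_copy_prob` is high, and the difference shrinks toward zero when it is set to 0. That is the sense in which a GWAS hit marks a region rather than a causal variant.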
