Genome-wide association studies work


A few months ago, I mentioned an article in Cell arguing that many results of genome-wide association studies are false positives. This is obviously wrong, and this week, a pair of letters to the editor (including one by Kai Wang summarizing arguments he made in various comment threads here and elsewhere) take the authors to task. The response from McClellan and King is laden with non sequiturs and simple factual errors, but they conclude:

We understand that many believe that most GWAS findings are valid….Currently, GWAS results fail to explain the vast majority of genetic influence on any human illness. Further, most risk variants implicated by GWAS have no demonstrated biological, functional, or clinical relevance for disease.

I’ve bolded the last sentence there, because it’s somewhat ironic that this was published the day after an epic genome-wide association study titled “Biological, clinical and population relevance of 95 loci for blood lipids”. This paper is worth a read not just if you’re interested in lipids, but because of the massive effort put into functional characterization of the identified loci. For a few genes, the authors are able to show that altering the expression level of the gene in mice leads to the exact phenotype they’d expect, highlighting a number of potential therapeutic targets. At one gene, a companion paper describes a heroic series of experiments to identify the precise mechanism by which a non-coding SNP exerts its effect.

I allow myself to hope that these sorts of experiments will lead to pushback against the bizarre notion that associations between non-coding polymorphisms and disease are somehow suspect.

7 Comments

  1. “non sequiturs and simple factual errors”

    so how does this happen? they have to pass their letters to some other people for second opinions, right? or are some people just that insulated?

    it’s like turning medical genetics into macroeconomics or something.

  2. The paper by McClellan and King makes a clear distinction in what our expectations should be for the spectrum of variants associated with risk for diseases which strike early in life and reduce reproductive fitness versus those that strike after reproductive age. In the first case, we should not expect large numbers of common alleles to be contributing to risk, due to the power of purifying selection. (Speculation about widespread balancing selection is just that – speculation). In the second case, however, there is every expectation that common alleles may affect risk. This is indeed borne out by the study on lipid levels by Teslovich et al. which suggests that common variants may contribute 25-30% of the genetic variance in this trait (but a much lower percentage of the variance in risk of associated diseases).

    So, this is exactly the type of trait one expects to be affected, in part, by aggregate effects of common variants in many loci. This is precisely the opposite of the expectation for serious diseases that reduce evolutionary fitness due to high mortality or reduced fecundity. For schizophrenia, for example, there are no good reasons to expect that it is a polygenic trait and very good reasons (and now a wealth of empirical evidence) that it is an umbrella term for a genetically heterogeneous disorder, caused by single rare alleles in any of a very large number of different loci. It is this type of disease that McClellan and King are most concerned with and one where GWAS are unlikely to be very successful. (The GWAS for autism have certainly not revealed much – certainly hardly anything compared with the rich haul of rare, single mutations now identified through other means).

    Finally, in reading the back-and-forth correspondence in Cell, it is clear that the various authors dispute some factual points and not just interpretations. Indeed, it is hard to know whom to believe on some of the points. If p-ter also thinks there are “non sequiturs and simple factual errors” in the reply by McClellan and King, it would be nice to know what these are.

  3. you know, it’s possible to argue about the relative importance of rare variants versus common variants in different diseases. fair enough. McClellan and King, however, are arguing that many (most?) associations between common polymorphisms and diseases are false positives. this is a very different point, and one that is (luckily) wrong.

    a few examples of mistakes:

    Many SNPs, insertions/deletions (indels), and short tandem repeats vary widely in allele frequency among populations. This is especially true for variants that are not in coding or regulatory regions because these alleles vary with population clusters in patterns more consistent with neutral drift and migration rather than with selection (Coop et al., 2009). The colonization of the world by modern humans was carried out by a series of founder populations with subsequent rapid expansion of population size. Neutral alleles emerging at the forefront of these expansions “surfed” waves of population growth. Variations in allele frequencies across populations stem from differences in the timing of the variant’s emergence in the expansion.

    I have no idea what to make of this paragraph. First, it’s not clear what this has to do with GWAS. Yes, sometimes allele frequencies are different between Africa and Europe, but this should not affect a GWAS in Europeans, for example. Second, Coop et al. (2009) does not claim that most large allele frequency differences between populations are neutral (though again, why does it matter for GWAS whether they’re neutral or not?). Third, most genetic variation in humans was present in Africa before humans expanded out, so the variation in allele frequencies has absolutely nothing to do with the “timing” of the emergence of the allele.

    We suggest that associations based on such highly variable SNPs are often artifacts of cryptic population stratification. Wang et al. argue that standard GWAS strategies have been adopted to control for population stratification. However, these methods control by person, not by SNP. Because populations from large geographic areas (e.g., Europe) are genetically heterogeneous, outlier SNPs that vary widely among subgroups of such populations are not excluded by these methods and often drive positive associations.

    My emphasis. Where is the evidence of this? How can you make an off-hand comment like, “Oh, by the way, i’ve found the fundamental flaw in modern genetics”, when no one agrees with you and you have no evidence? For the case of the autism locus discussed, Wang et al. make a very clear case that 1) it’s not extremely variable between populations, and 2) that it wouldn’t really matter if it were. For example, are all the 95 loci identified in the lipids study false positives due to population structure? Of course not; probably none of them are.

    Both Klein et al. and Wang et al. suggest that the vast majority of GWAS risk alleles are in LD with causal mutations, and that intergenic and intronic risk variants represent regulatory elements. In principle, either or both of these hypotheses could be true. However, thus far, virtually no such mutations or elements have been found by following up on GWAS findings.

    Finding a functional non-coding variant is more difficult than finding coding variants. That said, it’s simply not true that there are “virtually no” examples. See the example in this post, and in the previous post. If a causal variant has not yet been found, it does not follow that the signal is a false positive.

  4. so how does this happen? they have to pass their letters to some other people for second opinions, right?

    in this case, I think they must just be insulated. and since it’s a response to letters to the editor, i don’t think anyone else has to see it before it’s published.

  5. http://www.sciencedaily.com/releases/2010/08/100805172951.htm

    this is somewhat relevant and it came out today. CNVs making up the missing heritability, etc. one deletion gave a 17x increase in schizophrenia risk.

  6. My reading is that neither rebuttal letter addressed McClellan and King’s point about lack of reproducibility between studies — and this new study of blood lipid loci may be as false-positive-prone as others cited in the original editorial and the counter-rebuttal.

    Two other points, possibly more important: first, from an outside perspective (I’m a mouse geneticist, not human) I feel like there is a lot of goalpost-moving going on among the GWAS people. This is typified by the “not a bug, but a feature” response that one hears regarding the prevalence of non-coding SNPs among even the well-validated GWAS hits — I’m guessing that most GWAS investigators had their fingers crossed for coding region mutations (like any other forward geneticists), and a lot of commentary like Klein et al’s on the rs6983267 MYC SNP comes across as post-hoc whistling past the graveyard.

    Second, as an experimental biologist, I share McClellan and King’s skepticism that effects with odds ratios in the 1.33 range will ever be understandable in a mechanistic sense. I’m happy to entertain arguments to the contrary, of course — I thoroughly enjoy this blog and this discussion.

  7. My reading is that neither rebuttal letter addressed McClellan and King’s point about lack of reproducibility between studies — and this new study of blood lipid loci may be as false-positive-prone as others cited in the original editorial and the counter-rebuttal.

    the current standard in GWAS is to reproduce an association in multiple cohorts before publication (in the lipid study, the authors go even beyond this and replicate the associations in several non-European populations). McClellan and King are simply misinformed about how reproducible these results are.

    I’m guessing that most GWAS investigators had their fingers crossed for coding region mutations (like any other forward geneticists)

    yes, i imagine they were. that said, biology is complicated; such is life. Instead of being disappointed, I think this is an exciting time for human biology.

    Second, as an experimental biologist, I share McClellan and King’s skepticism that effects with odds ratios in the 1.33 range will ever be understandable in a mechanistic sense.

    fair enough. time will tell; i’m an optimist. look at the paper i linked to above, for example. it’s a great example of the cool biology that can come out of these things:
    http://www.nature.com/nature/journal/v466/n7307/abs/nature09266.html
