Is the “missing heritability” right under our noses?


One of the major criticisms leveled against genome-wide association studies for complex diseases is that they have identified loci which account for a relatively small proportion of the variance in most traits. The difference between this small proportion of variance explained by known loci and the (generally large) total amount of variance known to be due to genetic factors has been called the “missing heritability”. Much ink has been spilled speculating about where this missing heritability lies.

Two papers published this week suggest that much of the heritability may not actually be missing at all. The argument is simple: when performing a genome-wide association study, people use very stringent thresholds for calling a SNP associated with a trait. This is reasonable; people generally want to follow up only on true positives. However, there are probably many loci which don’t reach these highly stringent cutoffs but which truly influence the trait in question. Using methods to determine how much of the variance can be explained by these loci of smaller effect, one group suggests that about half of the heritability of height can be explained by common SNPs, and possibly close to all of it if other factors are taken into account. The authors’ discussion is one of the most reasonable, non-hyperbolic treatments I have seen of where the “missing heritability” lies and of how whole-genome sequencing will affect genome-wide association studies. It’s worth reading the whole thing, but here’s their conclusion:

If other complex traits in humans, including common diseases, have genetic architecture similar to that of height, then our results imply that larger GWASs will be needed to find individual SNPs that are significantly associated with these traits, because the variance typically explained by each SNP is so small. Even then, some of the genetic variance of a trait will be undetected because the genotyped SNPs are not in perfect LD with the causal variants. Deep resequencing studies are likely to uncover more polymorphisms, including causal variants that will be represented on future genotyping arrays. Our data provide strong evidence that the variation contributed by many of these causal variants is likely to be small and that very large sample sizes will be required to show that their individual effects are statistically significant. A similar conclusion was drawn recently for schizophrenia. In some cases the small variance will be due to a large effect for a rare allele, but this will still require a large sample size to reach significance. Genome-wide approaches like those used in our study can advance understanding of the nature of complex-trait variation and can be exploited for selection programs in agriculture and individual risk prediction in humans.
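
To make the threshold argument concrete, here is a minimal toy simulation. It is only a sketch under stated assumptions, not the method used in either paper: every SNP is causal and independent of the others, effects are drawn from a normal distribution, and the function name `simulate_gwas` and all of its default parameters are made up for illustration. It counts how many SNPs pass a genome-wide significance cutoff and compares the variance contributed by those SNPs with the variance contributed by all SNPs together.

```python
from statistics import NormalDist

import numpy as np


def simulate_gwas(n_people=5000, n_snps=500, h2=0.8, alpha=5e-8, seed=0):
    """Toy polygenic trait: hypothetical parameters, all SNPs causal and independent."""
    rng = np.random.default_rng(seed)

    # Genotypes: 0/1/2 copies of an allele, one allele frequency per SNP.
    freqs = rng.uniform(0.05, 0.5, size=n_snps)
    geno = rng.binomial(2, freqs, size=(n_people, n_snps)).astype(float)
    geno = (geno - geno.mean(axis=0)) / geno.std(axis=0)  # standardize columns

    # Many small additive effects that sum to roughly h2 of the variance.
    beta = rng.normal(0.0, np.sqrt(h2 / n_snps), size=n_snps)
    trait = geno @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n_people)

    # Single-SNP association test: correlation times sqrt(n) is roughly a z-score.
    trait_std = (trait - trait.mean()) / trait.std()
    z = (geno.T @ trait_std) / np.sqrt(n_people)
    z_cut = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided cutoff
    hits = np.abs(z) > z_cut

    # With standardized, (nearly) uncorrelated SNPs, SNP j contributes about
    # beta_j**2 to the trait variance. True effects are used here, so the
    # comparison is not inflated by overestimated effects at the hits.
    total_var = trait.var()
    frac_hits = float((beta[hits] ** 2).sum() / total_var)
    frac_all = float((beta ** 2).sum() / total_var)
    return int(hits.sum()), frac_hits, frac_all
```

Calling `n_hits, frac_hits, frac_all = simulate_gwas()` and printing the three values shows the gap between the variance tagged by "significant" SNPs and the variance due to all SNPs. Raising `n_people` in this toy setup lets more SNPs cross the cutoff, which is the same point the authors make about sample size.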


Park et al. (2010) Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nature Genetics. doi:10.1038/ng.610.

Yang et al. (2010) Common SNPs explain a large proportion of the heritability for human height. Nature Genetics. doi:10.1038/ng.608.


  1. The whole business of correlation analysis depends on the assumption of a linear model. The correct model may be wildly nonlinear, which means we’ll never get far with simple superposition.

  2. As far as I know, few or no studies treat disease phenotypes quantitatively. So a variant simply correlates with a disease trait, or doesn’t. It cannot correlate linearly or non-linearly.

    As for height, which is obviously a quantitative trait, I don’t know how they do the math.

  3. Only additive genetic variance contributes to the (narrow-sense) heritability. So non-linear effects are not directly relevant here, though they may be important in other situations.

  4. The problem with trying to identify single SNPs for complex things like height and other heritable conditions is that many factors control these conditions. It is a complex web of variables, like the infrastructure of a building. Just looking at the outside gives no indication of how it is constructed internally. The human body is many orders of magnitude more complex than any man-made structure, so the potential internal variability is far greater. Just because a building is more than 200 feet tall does not mean that it is made of a particular type of steel. It could be stainless, wrought, carbon, or another alloy. In the same way, there are many genes which can yield the same results.

  5. One of the problems with current population genetics is the assumption of a bean bag model, in which traits have no effect on each other. This makes sense for a lot of reasons, and in a simple organism it could be true. But in complex organisms like human beings we have so many genes that they’re bound to create some very complex and convoluted feedback loops. Now if we throw in the environment too, we get some really insane issues with causality in an already complex system, which could be made even more complicated. Factor in epigenetic change and things could get very dicey vis-à-vis predictability.

  6. @Alex, good point about complexity. The thing with metabolism is that most things are chains. There are many precursors and steps in the metabolic processes, and many genes/variants affecting each step of the way.

  7. Eric, while things like schizophrenia may be looked at as qualitative variables (cases vs. controls), a lot of the studies are actually done with quantitative variables (height, BMI, blood glucose, blood pressure, fasting insulin and so on). Even diseases like bipolar disorder may be investigated not as cases vs. controls but as quantitative variables (scores on psychological tests), though I am not familiar with that literature.

  8. The Yang paper is very interesting, but the methodology relies on some assumptions. In particular, they found that the SNPs on the chip accounted for about 40% of the variability, which leaves half missing, since the narrow-sense heritability of height is 0.8.

    But they argue that the true value is 80% if you “correct” for the fact that the analysed SNPs may not be the causal ones, and may only imperfectly tag the true causal variants, especially if the causal variants are rare.

    However I’m not sure how valid that is… the peer reviewers approved it, I guess.
