I’ve got your missing heritability right here…

A debate is raging in human genetics these days as to why the massive genome-wide association studies (GWAS) that have been carried out for every trait and disorder imaginable over the last several years have not explained more of the underlying heritability. This is especially true for many of the so-called complex disorders that have been investigated, where results have been far less than hoped for. A good deal of effort has gone into quantifying exactly how much of the genetic variance has been “explained” and how much remains “missing”.

The problem with this question is that it limits the search space for the solution. It forces our thinking further and further along a certain path, when what we really need is to draw back and question the assumptions on which the whole approach is founded. Rather than asking what is the right answer to this question, we should be asking: what is the right question?

The idea of performing genome-wide association studies for complex disorders rests on a number of very fundamental and very big assumptions. These are explored in a recent article I wrote for Genome Biology (referenced below; reprints available on request). They are:

1) That what we call complex disorders are unitary conditions. That is, clinical categories like schizophrenia or diabetes or asthma are each a single disease and it is appropriate to investigate them by lumping together everyone in the population who has such a diagnosis – allowing us to calculate things like heritability and relative risks. Such population-based figures are only informative if all patients with these symptoms really have a common etiology.

2) That the underlying genetic architecture is polygenic – i.e., the disease arises in each individual due to toxic combinations of many genetic variants that are individually segregating at high frequency in the population (i.e., “common variants”).

3) That, despite the observed dramatic discontinuities in actual risk for the disease across the population, there is some underlying quantitative trait called “liability” that is normally distributed in the population. If a person’s load of risk variants exceeds some threshold of liability, then disease arises.

All of these assumptions typically go unquestioned – often unmentioned, in fact – yet there is no evidence that any of them is valid. In fact, the more you step back and look at them with an objective eye, the more outlandish they seem, even from first principles.

First, what reason is there to think that there is only one route to the symptoms observed in any particular complex disorder? We know there are lots of ways, genetically speaking, to cause mental retardation or blindness or deafness – why should this not also be the case for psychosis or seizures or poor blood sugar regulation? If the clinical diagnosis of a specific disorder is based on superficial criteria, as is especially the case for psychiatric disorders, then this assumption is unlikely to hold.

Second, the idea that common variants could contribute significantly to disease runs up against the effects of natural selection pretty quickly – variants that cause disease get selected against and are therefore rare. You can propose models of balancing selection (where a specific variant is beneficial in some genomic contexts and harmful in others), but there is no evidence that this mechanism is widespread. In general, the more arcane your model has to become to accommodate contradictory evidence, the more inclined you should be to question the initial premise.
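
As a rough illustration of how quickly selection pushes a harmful variant down in frequency, here is a minimal Python sketch. It assumes the simplest possible model: a haploid-style recursion in which carriers have relative fitness 1 - s, with no new mutation, no drift and no balancing selection. The function names, the starting frequency and the selection coefficients are hypothetical, chosen only for illustration, and are not estimates for any real disorder.

```python
def next_frequency(q: float, s: float) -> float:
    """Allele frequency after one generation of selection.

    Carriers have relative fitness (1 - s), non-carriers have fitness 1,
    so mean fitness is q * (1 - s) + (1 - q) = 1 - s * q.
    """
    return q * (1.0 - s) / (1.0 - s * q)


def generations_until(q0: float, s: float, target: float) -> int:
    """Count generations until the frequency falls to `target` or below."""
    if not (0.0 < s <= 1.0) or not (0.0 < target < 1.0) or not (0.0 < q0 < 1.0):
        raise ValueError("need 0 < s <= 1, 0 < target < 1 and 0 < q0 < 1")
    q, generations = q0, 0
    while q > target:
        q = next_frequency(q, s)
        generations += 1
    return generations


if __name__ == "__main__":
    # Hypothetical: a variant starting at 20% frequency, tracked until it is "rare" (1%).
    for s in (0.01, 0.1, 0.5):
        print(f"s = {s}: {generations_until(0.2, s, 0.01)} generations")
```

Under these assumptions, the stronger the fitness cost, the fewer generations a variant can remain common, which is the intuition behind the argument above.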

Third, the idea that common disorders (where people either are or are not affected) really can be treated as quantitative traits (with a smooth distribution in the population, as with height) is really, truly bizarre. The history of this idea can be traced back to early geneticists, but it was popularised by Douglas Falconer, the godfather of quantitative genetics (he literally wrote the book).

In an attempt to demonstrate the relevance of quantitative genetics to the study of human disease, Falconer came up with a nifty solution. Even though disease states are typically all-or-nothing, and even though the actual risk of disease is clearly very discontinuously distributed in the population (dramatically higher in relatives of affecteds, for example), he claimed that it was reasonable to assume that there was something called the underlying liability to the disorder that was actually continuously distributed. This could be converted to a discontinuous distribution by further assuming that only individuals whose burden of genetic variants passed an imagined threshold actually got the disease. To transform discontinuous incidence data (mean rates of disease in various groups, such as people with different levels of genetic relatedness to affected individuals) into mean liability on a continuous scale, it was necessary to further assume that this liability was normally distributed in the population. The corollary is that liability is affected by many genetic variants, each of small effect. Q.E.D.
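
To make that transformation concrete, here is a minimal sketch of the liability-threshold arithmetic in Python. It assumes, as the model itself does, that liability follows a standard normal distribution. The function names and the example incidence figures (1% in the general population, 9% in siblings) are hypothetical, chosen only for illustration. The heritability estimate uses Falconer's approximation, in which the shift in threshold between the general population and relatives is divided by the mean liability of affected individuals and by the coefficient of relatedness.

```python
from statistics import NormalDist

# Assumption built into the model: liability ~ N(0, 1) in the population.
STD_NORMAL = NormalDist()


def threshold(incidence: float) -> float:
    """Liability threshold (in SD units) exceeded by a fraction `incidence`."""
    return STD_NORMAL.inv_cdf(1.0 - incidence)


def mean_liability_of_affected(incidence: float) -> float:
    """Mean liability of individuals above the threshold (density / incidence)."""
    return STD_NORMAL.pdf(threshold(incidence)) / incidence


def falconer_heritability(pop_incidence: float,
                          relative_incidence: float,
                          relatedness: float) -> float:
    """Falconer-style estimate of heritability of liability.

    pop_incidence:      disease rate in the general population
    relative_incidence: disease rate among relatives of affected people
    relatedness:        coefficient of relatedness (0.5 for siblings)
    """
    x_pop = threshold(pop_incidence)
    x_rel = threshold(relative_incidence)
    a = mean_liability_of_affected(pop_incidence)
    return (x_pop - x_rel) / (a * relatedness)


if __name__ == "__main__":
    # Hypothetical incidences: 1% in the population, 9% in siblings of affecteds.
    print(f"population threshold: {threshold(0.01):.2f} SD")
    print(f"sibling threshold:    {threshold(0.09):.2f} SD")
    print(f"estimated h^2:        {falconer_heritability(0.01, 0.09, 0.5):.2f}")
```

Note that the only data going in are the two incidence figures. Everything else, including the resulting heritability, follows from the assumed normal distribution of liability.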

This model – simply declared by fiat – forms the mathematical basis for most GWAS analyses and for simulations regarding proportions of heritability explained by combinations of genetic variants (e.g., the recent paper from Eric Lander’s group). To me, it is an extraordinary claim, which you would think would require extraordinary evidence to be accepted. Despite the fact that it has no evidence to support it and fundamentally makes no biological sense (see Genome Biology article for more on that), it goes largely unquestioned and unchallenged.

In the cold light of day, the most fundamental assumptions underlying population-based approaches to investigate the genetics of “complex disorders” can be seen to be flawed, unsupported and, in my opinion, clearly invalid. More importantly, there is now lots of direct evidence that complex disorders like schizophrenia or autism or epilepsy are really umbrella terms, reflecting common symptoms associated with large numbers of distinct genetic conditions. More and more mutations causing such conditions are being identified all the time, thanks to genomic array and next generation sequencing approaches.

Different individuals and families will have very rare, sometimes even unique mutations. In some cases, it will be possible to identify specific single mutations as clearly causal; in others, it may require a combination of two or three. There is clear evidence for a very wide range of genetic etiologies leading to the same symptoms. It is time for the field to assimilate this paradigm shift and stop analysing the data in population-based terms. Rather than asking how much of the genetic variance across the population can be currently explained (a question that is nonsensical if the disorder is not a unitary condition), we should be asking about causes of disease in individuals:

- How many cases can currently be explained (by the mutations so far identified)?

- Why are the mutations not completely penetrant?

- What factors contribute to the variable phenotypic expression in different individuals carrying the same mutation?

- What are the biological functions of the genes involved and what are the consequences of their disruption?

- Why do so many different mutations give rise to the same phenotypes?

- Why are specific symptoms like psychosis or seizures or social withdrawal such common outcomes?

These are the questions that will get us to the underlying biology.

Mitchell, K. (2012). What is complex about complex disorders? Genome Biology, 13 (1). DOI: 10.1186/gb-2012-13-1-237

Manolio, T., Collins, F., Cox, N., Goldstein, D., Hindorff, L., Hunter, D., McCarthy, M., Ramos, E., Cardon, L., Chakravarti, A., Cho, J., Guttmacher, A., Kong, A., Kruglyak, L., Mardis, E., Rotimi, C., Slatkin, M., Valle, D., Whittemore, A., Boehnke, M., Clark, A., Eichler, E., Gibson, G., Haines, J., Mackay, T., McCarroll, S., & Visscher, P. (2009). Finding the missing heritability of complex diseases. Nature, 461 (7265), 747-753. DOI: 10.1038/nature08494

Zuk, O., Hechter, E., Sunyaev, S., & Lander, E. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences, 109 (4), 1193-1198. DOI: 10.1073/pnas.1119675109


  1. The problem is that the whole field is infested by people who simply don’t care about mathematical rigour. They even share and use computer programs without looking at the code, believing with blind faith that the calculations are going to be ok.

    Most people probably take loads of phenotype and genotype data, press three buttons, and voilà, there you have the new ‘scientific’ breakthrough.

  2. I’m not sure normally distributed liability is so important for GWAS (though it may be for computational modeling of disease). Imagine a disease caused by a single recessive allele at 50% frequency. Disease risk is not normally distributed at all, but the GWAS approach would be fantastically successful. Ditto for perfectly defined phenotypes: imagine a “disease” that is actually two diseases, and that there are common SNPs that influence either one (but not both). Now you just need twice the sample size (more or less) to identify the relevant genes. But it’s true knowing the right disease subtypes a priori would be helpful.

    If diseases are really caused by different rare, partially penetrant mutations in different genes in every individual, you’re right that GWAS isn’t the best way to look for those mutations. That’s a very special model though!

  3. I don’t think it’s that outlandish to suppose that disease states are extremes of rather continuous variables, with some “threshold” leading to disease.

    Although, for instance, type 2 diabetes may be labeled an “all or nothing” trait, there surely are healthy people with different efficiency of blood sugar regulation, i.e. if you feed them all the same amount of sugar, the immediate rise in blood glucose level may be damped to a different extent in different individuals. Doesn’t that seem reasonable?

    For psychological traits even more so: for instance autism is clearly a spectrum disorder, ranging from mere impairment of perceiving social nuances to the complete absence of speech and awareness of others–and sociability shows great variance even in supposedly “normal” people. Depression is even “worse”, in that tolerance of stressors greatly differs between people, with some bouncing back like springs and some staying gloomy for weeks. In fact, I’d be apt to postulate a combined internal-external feedback model for depression, where the “sum” of the ability of the neural network and one’s life circumstances to recover from loss must fall below some threshold before setbacks become self-perpetuating enough to cause clinical depression.

    Furthermore, many disease phenotypes involve trade-offs. For instance, obesity involves the trade-off between famine tolerance and present bodily health, autism involves the trade-off between cooperation/integration with a group and the pursuit of novel strategies the group may have overlooked, ADHD involves the balance of rejecting extraneous information and remaining aware of shifting environments, etc. Thus, I’d say that many disease states involve selection of the wrong strategy at the wrong time, or the pursuit of one strategy to the total exclusion of others.

  4. Since you’ve got the missing heritability, why don’t you show it for something as simple as height?

  5. Regarding the idea that many diseases represent the extreme end of a normal distribution, it is clear that where the line is drawn, in terms of who gets a clinical diagnosis, can be blurry.
    What I am arguing against is the assumption that the genetics of a trait like sociability, for example, is the same as the genetics of a symptom like social withdrawal. There can be a normal distribution of a trait and deviations from that distribution that are caused by different mechanisms. I argue that you need some major insult to get a serious phenotype – not just an accumulation of very small variants, because the system has to deal with that kind of variation all the time. For example, at the ends of the normal distribution of height you don’t get dwarfism or giantism suddenly appearing (say in breedings of quite short or quite tall people) – these major effects on phenotype are caused by single mutations. Same for mental retardation or intellectual disability – the genetics of these conditions are not the same as the genetics of IQ.

    The point is that the genetics of how a system varies is not necessarily the same as the genetics of how the system fails. (And, in my view, should not be expected to be the same).

    Re height itself, the normal distribution could be caused all by common variation – there is no evolutionary reason why that should not be the case. But it could also involve a lot more rare and unique mutations that have larger effects in individuals than all their SNPs combined.

  6. Joe, why is the idea that complex diseases are caused by different rare, partially penetrant mutations in different genes in every individual a “very special model”? (Obviously not different genes in every individual but possibly mutations in any of hundreds or maybe even thousands of different genes). That is exactly the architecture of mental retardation, inherited blindness, inherited deafness, epilepsy and many other conditions defined by surface symptoms. We already know that mental retardation can be caused by mutations in any of a large number of different genes and so nobody would think of doing a population-based study lumping all cases together. We are now learning that the same is true for autism and schizophrenia. Treating them as unitary disorders is an assumption that I argue is not supported.

  7. Well, it’s not exactly the same model as mental retardation, deafness, etc, right? In that the latter involve mutations of very large effect (such that they’re approximately Mendelian), while common diseases don’t show that inheritance pattern. And one of the most common causes of deafness is an allele at 1-2% frequency in a gene that accounts (with other mutations) for about 50% of the cases. That’s definitely a “rare” allele, but not so rare that it’s individual or family specific. (In fact, if you were to do a population-based GWAS of deafness, you might well get that gene).

    I think people probably think individual/family-specific alleles contribute to disease risk. But to argue that they’re *the* source of disease risk seems like quite the leap. Not common variation at all (what is this schizophrenia GWAS picking up, if not common variants)? Not rare variants at 1% frequency?

  8. I would argue that common diseases are caused by effectively Mendelian alleles in many cases – we now know of many examples, especially for autism and schizophrenia. We still have to explain incomplete penetrance, but that is dependent on phenotypic definition – are we looking for penetrance for one specific diagnostic category, for psychiatric illness more generally, for some neurobiological or behavioural endophenotype? We should expect there to be genetic interactions with other mutations in the background – ones that either have an effect alone or that act only as modifiers. What I argue against is the model that very very large numbers of common variants could cause disease without some major mutation being present – one that you could say is “causal”. (Like an FMR1 or a MeCP2 mutation in someone with autistic symptoms – even though those mutations don’t always result in autism, if you find one in a patient so affected, most clinicians would be happy to say that if the person didn’t have that mutation, they most probably would not have autism).

    Re the SZ GWAS, it’s not so easy to say what they are picking up. Clearly they find a number of SNPs that are significantly associated (statistically) but with almost negligible effect on risk. Those signals could be caused by common or linked rare variants – it is not possible to distinguish. Claims that they prove a large polygenic effect in SZ caused by thousands of common variants are not justified. (For more on this, see the article referred to in the post).

  9. “What I am arguing against is the assumption that the genetics of a trait like sociability, for example, is the same as the genetics of a symptom like social withdrawal. There can be a normal distribution of a trait and deviations from that distribution that are caused by different mechanisms. I argue that you need some major insult to get a serious phenotype – not just an accumulation of very small variants, because the system has to deal with that kind of variation all the time.”

    This seems to imply that in each pathway, there is complete segregation of the components into those that produce qualitative vs. quantitative behavior.

    Imagine an enzyme where a particular mutation causes a complete loss of function–i.e. the enzyme fails to catalyze the reaction at all, or so little that the improvement over the uncatalyzed reaction has no physiological relevance. Then it seems logical that other mutations of the same enzyme, even possibly at the same amino acid position, may have milder effects on function, giving less extreme phenotypes that still seem “normal”, just perturbed.

    And this view doesn’t even take into account that qualitative changes in behavior need not necessarily arise from complete absence of any one component or interaction, even if they often do. Often, a combination of parameters will determine whether, e.g., a negative feedback is strong enough to keep a system stable.

    Besides, haven’t you, or others, argued on this blog that rare variants also contribute substantially to “normal” phenotypes, i.e. things like personality types that are not considered diseases?

  10. Reading this I thought, when kjmtchl writes something it is really good.

  11. I question the assumption that many common diseases are caused by heredity in the first place.

    How Your Cat Is Making You Crazy

    “I’d say 75 percent of cases of schizophrenia are associated with infectious agents, and Toxo would be involved in a significant subset of those.”

  12. DR01D, there is a link between maternal infection and schizophrenia in the offspring (a statistical link, that is – not a huge effect at the population level, but enough to warrant understanding how maternal infection can affect neural development in utero). As far as I know there is no evidence for a link with infection in patients themselves. I would be interested to know what you base that claim on, especially as the evidence for substantial heritability is very very solid and consistent.

  13. kjmtchl, from the article,

    “Many schizophrenia patients show shrinkage in parts of their cerebral cortex, and Flegr thinks the protozoan may be to blame for that. He hands me a recently published paper on the topic that he co-authored with colleagues at Charles University, including a psychiatrist named Jiri Horacek. Twelve of 44 schizophrenia patients who underwent MRI scans, the team found, had reduced gray matter in the brain—and the decrease occurred almost exclusively in those who tested positive for T. gondii. After reading the abstract, I must look stunned, because Flegr smiles and says, “Jiri had the same response. I don’t think he believed it could be true.” When I later speak with Horacek, he admits to having been skeptical about Flegr’s theory at the outset. When they merged the MRI results with the infection data, however, he went from being a doubter to being a believer. “I was amazed at how pronounced the effect was,” he says. “To me that suggests the parasite may trigger schizophrenia in genetically susceptible people.”

  14. kjmtchl, here is another Toxo/Schiz story from early 2008.

    Toxoplasma Infection Increases Risk Of Schizophrenia, Study Suggests

    “Researchers found that of the 180 study subjects diagnosed with schizophrenia, 7 percent had been infected with toxoplasma prior to their diagnosis, compared to 5 percent among the 532 healthy recruits. Thus, people exposed to toxoplasma had a 24 percent higher risk of developing schizophrenia.”

    “Our findings reveal the strongest association we’ve seen yet between infection with this very common parasite and the subsequent development of schizophrenia,” says Robert Yolken, M. D., a neurovirologist at Hopkins Children’s who was among those conducting the analysis.

    Previous studies have reported on the link between schizophrenia and the presence of toxoplasma antibodies, which are evidence of past infection, but this is the first study to show that infection with the parasite can precede the initial onset of symptoms and subsequent diagnosis with schizophrenia, Yolken says.

    Because the U.S. military routinely tests its active personnel for toxoplasma, among other infectious agents, and stores blood samples in a central repository, researchers were able to determine the time line between infection and a diagnosis of schizophrenia.

  15. There do seem to be some interesting links in the literature between toxoplasma infection and risk of schizophrenia. To be honest, I have not read enough of that literature to be able to evaluate it. My initial response is that the abstracts I read are inferring causation from correlation. It seems quite plausible that having schizophrenia would make you more likely to engage in behaviour that could lead to toxoplasma infection. In fact, at least one study suggests that is the explanation for the statistical link: http://www.ncbi.nlm.nih.gov/pubmed/20608474 (simply greater contact with cats in people with SZ). I am not discounting the possibility of a real causal link, but it doesn’t seem to have been demonstrated in any conclusive way. I certainly would not accept, from the kind of study you cite above, your claim that this kind of infection is the major cause of SZ – the evidence for genetic mechanisms playing a major role is overwhelming.

  16. kjmtchl

    “My initial response is that the abstracts I read are inferring causation from correlation.”

    That sounds like 99.999% of the abstracts I’ve read concerning heredity and disease.

    Anyway, according to the study on US military personnel, symptoms occurred after infection with Toxoplasma.

  17. Sorry this was broken into two posts.

    “the evidence for genetic mechanisms playing a major role is overwhelming.”

    That’s probably true for ALL disease including infectious disease.

    Scientists pinpoint flu gene

    “An unlucky combination of “vulnerable” genes could explain why some people recover from the flu overnight and others struggle to shake off the virus for weeks.”

  18. and what about “all of the above”

    the problem may be the following
    1. many diseases are essentially a constellation of signs and symptoms. they may well be a final common destination of a number of paths

    2. we in medical science and biology are using mathematical / statistical techniques from the linear ( 2 dimensional ) space to explore complex phenomena. i.e. our mathematics is not capable (as yet?) of making any sense out of phenomena.

    3. I think kjmtchl might be right that SOME syndromes may be the result of rare variants producing similar phenotypes in a Mendelian fashion. But this does not address the fact that some disorders may be related to gene regulation and not to a mendelian form of gene expression. And some may work in the Falconer type model. why cannot all of the above be the correct response?

    4. stable systems generally will have negative feedback loops tending to bring the system from temporary instability to stability. the more complex the system the more complex the feedback loops and the inter-relationships and interplay between them, and the more difficult it is to predict the end result.

  20. Here’s the problem with assuming that all (or most) diseases are caused exclusively by rare variants of moderate to large effect. While it’s true that these variants would be unlikely to be identified efficiently in a GWAS study, they would be highly likely to be pinpointed in a well-designed multi-level linkage study with medical resequencing. These types of studies have been around for ages. One of the major reasons why GWAS became so popular was because it was a new way to tackle some of the diseases that had resisted the linkage approach in the first place, which argues against a single moderate-to-large effect mutation causing the disease. This does, of course, assume that you have a decent sized family in which to study the disease, but many of these diseases have sufficient family cohorts to study, particularly when you add in medical resequencing as is being done now.

    I find it very strange that people are so married to the idea that genetic architecture has to be either “common SNP” or “rare variant”. Why can’t we have some diseases of each type, and some diseases with a combination of these architectures for the very same disease? I expect that when everything shakes out, that is exactly what we’re going to have…common SNPs creating phenotypes that predispose individuals to have particular phenotypes (in particular environments), with the occasional strong effect from a rare variant rising to the forefront.

    And to address your argument about selection, that was a non-starter…selection isn’t something that works in just one direction or at just one strength. It’s a force that has variable directionality and variable force. If you haven’t become familiar with Wright’s shifting balance theory, you should.

  21. Ria, thanks for your comments. Your argument that linkage studies would have found rare variants of major effect, if they indeed exist, is a popular one, but flawed for several reasons, especially for psychiatric disorders. For linkage studies to work (to actually get down to identifying specific genes) you need big families with multiple affected individuals and you need to map the phenotype correctly. It is rare to find such families for psychiatric disorders (partly because they are so deleterious), especially under the mistaken notion that they should “breed true” – i.e., that there should be distinct mutations causing schizophrenia versus bipolar disorder versus autism, for example. We now know that the etiology is overlapping. Also, overall penetrance (for some psychological or neurobiological phenotype) tends to be higher than that for mental illness (generally), which is higher than that for any specific diagnostic category. Performing linkage based only on the latter phenotype is therefore not likely to work.

    If you try to get around this by lumping together many families and performing linkage analyses across all of them (which has been done many times) then you dilute real signals if heterogeneity is high.

    As it happens, linkage analyses for schizophrenia have identified a large number of loci, but not had the power to get down to specific genes. The inference that all these signals are false positives is an assumption (many probably are but many may be real).

    Re the common versus rare dichotomy, the question is which mutations are most important and most likely to get us to the underlying biology? I favour focusing on the ones of large effect, because these can be more accurately said to be “causal” (in the sense that if the person did not have that mutation they would most likely not have the condition). As detailed in the paper referred to above, I absolutely expect an important role for modifiers of these mutations, common or rare, and for oligogenic interactions.

    Finally, re selection, you are remaking the balancing selection argument – this requires evidence. Yes, some traits may change in advantageousness in different contexts or over time – this has not been demonstrated for deleterious diseases, especially early-onset ones with demonstrable, negative (current) effects on fitness. I find those claims inherently implausible – that is just an opinion, but the point is that without some evidence to bolster such claims, they should not simply be accepted.

  22. Much has been made of native people who have high levels of type II diabetes while eating a Western diet but who become healthier when they eat something approximating a ‘traditional’ diet.

    I think that would provide a clear example of diversity that was once advantageous but has become deleterious in recent times.

    Haven’t also claims been made about genetic morphs related to Alzheimer’s that they protect the brain from the consequences of starvation early in development?

    And of course there’s good old Sickle Cell Anemia, which exists as a side effect of selection pressures for a trait which is quite advantageous… if there are endemic problems with malaria in your area, which thankfully is no longer the case for most of us.

  23. There are certainly a few well-known examples of balancing selection. That does not mean it is a widespread mechanism – there is no evidence that it is. My point is that simply invoking it does not get you off the hook if you are proposing that common variants predispose to highly deleterious disease – you have to provide some evidence that it actually pertains. The negative effects on fitness are so large for disorders like autism and schizophrenia that the balancing selection would have to be correspondingly (and implausibly) large in the opposite direction.

  24. I have skimmed the comments (I may have missed something), but what if a trait only manifests when given certain environmental triggers (we know that is true for PTSD – it is in the name)? If a certain population has a gene and yet only 1/2 get the trigger then right away heritability is down to 50%.

    Is that really the right way to look at all this?

    Some have noted that all the complexities of genetic interactions have not yet been teased out. Throw in the environment and there are a LOT more threads to untangle.

  25. It is certainly quite plausible that some mutations may increase risk to environmental triggers or that their effects may be modified by environmental factors (or vice versa). Many disorders, especially neurodevelopmental ones, may even be directly phenocopied by environmental insults. However, in terms of accounting for the missing heritability, these are not likely to be important, as they should not contribute to the measure of heritability – they should either be controlled for in twin studies or contribute to the non-genetic sources of variance.

  26. kjmtchl,

    From what I have seen re: PTSD studies and twin data, heritability is given as 50%. But that is not in fact the case. We know from general population genetic studies that about 20% of a population is susceptible to PTSD and yet only 1/2 those are affected.

    OTOH in high stress war zones reports of the % of troops affected runs in the 20 to 25% range. About what you would expect if 100% of the susceptible get enough stress.

    Twin studies do not in fact control for environment. As far as I can tell.

    I’m sure it can be done (or done better), but that is currently not the case. IMO. I only know this because the study of PTSD is a hobby of mine. So I’m not deeply conversant with the general field of genetics and heritability.

  27. This is quite interesting. I see that the Zuk et al paper chose not to model fitness as additive. That is fascinating! I teach Labs for a graduate course on population genetics. We are very careful to mention repeatedly in class when going over the various models that models assuming additivity are but one type of model, and there may yet be others that make more sense that don’t assume additivity. Jon Seger, working in whale lice, has found that weakly deleterious mutations might account for a vast amount of the so-called “missing” heritability. It’s just so small we won’t find it very easy. Lately he’s come up with some interesting toy models to figure out how we could see signals of these purported common alleles that should have an (almost!) immeasurably small effect on fitness.

  28. S.V. Nuzhdin and his collaborators have come up with an interesting way of making genotype-to-phenotype maps using structural equation models. It seems like a reasonable way to go beyond GWAS and actually start to pinpoint the exact genes and regulatory regions that work with our environment to determine phenotype.
