Searching for a needle in a needlestack


Whole-genome sequencing is a game-changer for human genetics. It is now possible to deduce every base of an individual’s genome (all 6 billion of them – two copies of 3 billion each) for a couple of thousand euros, and dropping. (Yes, euros). Even Ozzy Osbourne just got his genome sequenced! For researchers searching for the causes of genetic disease (or resistance to vast quantities of drugs and alcohol), this means they no longer have to infer where a mutation is by tracking a sampling of “markers” spaced across the genome – they can directly see all of the genetic information.

The problem is, they directly see all of the genetic information. If each of us carries thousands of mutations – changes that are very rare or may never have been seen before in any other person – then telling which one of those changes is actually causing the condition is a tough task. Researchers in psychiatric genetics are currently grappling with how to handle this glut of information.

The problem is particularly acute in this field, where there is a (very slowly) growing realisation that many so-called common disorders, such as schizophrenia and autism, are really umbrella terms for collections of very rare disorders. Each of these conditions can be caused by mutations in single genes. The reason they are so common is that there are so many genes required to wire the brain properly – mutations in any of probably hundreds of genes can lead to the kinds of neurodevelopmental defects that ultimately result in psychopathology. (At least, that is the working hypothesis – see review below for a discussion of the evidence supporting it).

Very large studies are now underway to sequence the genomes of thousands of people with schizophrenia, autism or other psychiatric disorders, along with “control” individuals from different populations. The hope is that by comparing the spectrum of mutations in patients with those in controls, it will be possible to deduce which mutations are pathogenic. The most obvious ones will be those which recur in multiple individuals with a psychiatric disorder, are not present in the control population and are predicted to affect the biochemical function of the encoded protein. Those parameters can be used to prioritise candidate mutations for further study.
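The three criteria above can be read as a simple filter. The sketch below is only an illustration: the `Variant` record, its field names, the `min_cases` threshold and all of the counts are made up for the example, and a real pipeline would take them from sequencing calls and a functional-effect predictor.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                 # hypothetical gene label
    case_carriers: int        # patients carrying the variant
    control_carriers: int     # controls carrying the variant
    predicted_damaging: bool  # predicted to alter protein function


def prioritise(variants, min_cases=2):
    """Keep variants that recur in patients, are absent from controls and
    are predicted to be damaging, ranked by how many patients carry them."""
    kept = [
        v for v in variants
        if v.case_carriers >= min_cases
        and v.control_carriers == 0
        and v.predicted_damaging
    ]
    return sorted(kept, key=lambda v: v.case_carriers, reverse=True)


# Invented example data, not real findings.
variants = [
    Variant("GENE_A", 3, 0, True),
    Variant("GENE_B", 2, 1, True),   # dropped: seen in a control
    Variant("GENE_C", 5, 0, False),  # dropped: not predicted damaging
    Variant("GENE_D", 2, 0, True),
]

ranked = prioritise(variants)
print([v.gene for v in ranked])  # ['GENE_A', 'GENE_D']
```

A hard filter like this only produces a shortlist for follow-up. It says nothing about how confident one should be in any entry on it.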

So far, however, it has been far more difficult to generate the type of statistical evidence that psychiatric geneticists have been used to from genome-wide association or linkage studies. One major problem is that, while it is true that mental illness can be caused by single mutations, it is also true that the situation is likely more complicated than that in many cases. Most such mutations that have been identified to date are only partially “penetrant” – that means that not all of the people who carry the mutation have the disorder in question. A closely related observation is that the mutations have “variable expressivity” – that means the phenotypes they result in vary widely across mutation-carriers. This makes it crucially important for genetic studies to very carefully define the phenotype being mapped – in many cases a particular clinical diagnosis will not be the best phenotype to choose.

One reason for such variable phenotypes due to a mutation in any single gene is that its effects may be modified by other mutations that each person carries. That situation is not unique to psychiatric disease – it’s actually true of all so-called Mendelian disorders. Even in classical examples like cystic fibrosis, which is caused by mutations in a single gene, the effects of such mutations are quite variable and are strongly affected by genetic background.

But it does pose a major problem – if you find a mutation in two or three people with disease and one person without disease, how can you assign a p-value to the likelihood of that mutation being causative? And how do you distinguish mutations in that gene from those that happen to occur in all the other genes in the genome? Hopefully, this problem will partly solve itself as larger samples of patients and control individuals are sequenced. A move back to family-based studies will also be hugely helpful as it will provide evidence based on which mutations segregate with illness (or, even better, with some more fundamental neurobiological “endophenotype”).
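To see how little a handful of carriers can tell you, here is a minimal sketch of a one-sided Fisher exact test written with the standard library only. The cohort sizes (1,000 patients and 1,000 controls) are assumptions chosen for illustration, as is the function name. This is one conventional way to test enrichment, not necessarily what any particular study uses.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact p-value: the chance that at least this many of
    the carriers fall among the cases if carrying the mutation were
    unrelated to disease status."""
    total_carriers = case_carriers + control_carriers
    total_people = n_cases + n_controls
    upper = min(total_carriers, n_cases)
    # Count the ways of choosing the case group that put k carriers in it.
    favourable = sum(
        comb(total_carriers, k)
        * comb(total_people - total_carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return favourable / comb(total_people, n_cases)


# Three carriers among 1,000 patients, one among 1,000 controls (invented).
p = fisher_one_sided(3, 1000, 1, 1000)
print(f"p = {p:.2f}")
```

With these made-up numbers the p-value comes out at roughly 0.31, nowhere near conventional significance, even before correcting for all the other genes in the genome.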

However, we will still likely be left with a situation where the statistical evidence we can get from considering the spectrum of mutations in single genes will run into mathematical limits. At some point it will be necessary to look for other types of evidence from outside the system. One type of evidence will come from analysing the biochemical pathways of the implicated genes – it is already becoming apparent that many such genes encode proteins that interact with each other (see review below for examples).

For example, mutations in the gene Contactin-associated protein 2 (CNTNAP2) have been found in patients with autism, schizophrenia, epilepsy, Tourette’s syndrome, ADHD and other disorders. The evidence for this gene by itself is extremely strong. Recently, mutations in genes encoding the related proteins CNTNAP4 and CNTNAP5 have also been found in patients with epilepsy and autism, respectively. Taken alone, the evidence for each of these genes is not at all convincing – in fact it is not possible to even generate a p-value for how likely it is that they are causative. But taken together, the findings of mutations in each of these genes greatly strengthen the implication of the pathway in general. Findings of mutations in the genes encoding the interacting proteins Contactin-3, -4 and -5 similarly add to the weight of evidence.
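One way to make "taken together" concrete is to pool carriers across a set of genes chosen in advance and test the set as a unit. The sketch below does this with a simple burden-style pooling and a one-sided Fisher exact test. The gene labels, carrier counts and cohort sizes are all invented, and it assumes no individual carries mutations in more than one gene of the set.

```python
from math import comb


def enrichment_p(case_carriers, control_carriers, n_cases, n_controls):
    """One-sided Fisher exact p-value for carriers being enriched in cases."""
    carriers = case_carriers + control_carriers
    people = n_cases + n_controls
    ways = sum(
        comb(carriers, k) * comb(people - carriers, n_cases - k)
        for k in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return ways / comb(people, n_cases)


N_CASES = N_CONTROLS = 1000  # assumed cohort sizes

# (carriers among cases, carriers among controls) per gene; invented counts.
counts = {"gene_1": (2, 0), "gene_2": (2, 0), "gene_3": (2, 0)}

per_gene = {
    gene: enrichment_p(cases, controls, N_CASES, N_CONTROLS)
    for gene, (cases, controls) in counts.items()
}

# Pool the carriers across the whole set and test it as one unit.
pooled_cases = sum(cases for cases, _ in counts.values())
pooled_controls = sum(controls for _, controls in counts.values())
pooled = enrichment_p(pooled_cases, pooled_controls, N_CASES, N_CONTROLS)

for gene, p in per_gene.items():
    print(f"{gene}: p = {p:.3f}")
print(f"pooled: p = {pooled:.3f}")
```

In this toy case each gene alone gives a p-value of about 0.25, while the pooled set comes out near 0.016. The gain is only meaningful if the gene set is fixed before looking at the data. Picking the genes afterwards would inflate the result.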

These proteins are all involved in forming synaptic contacts between neurons, as are the products of many other genes identified in patients, further implicating defects in this process as one route to mental illness.

The effects of mutations in particular genes can also be investigated in genetically modified mice. If a mutation in Gene A causes neurodevelopmental defects and physiological or behavioural phenotypes that are similar to those seen in mice with mutations in a gene known to cause psychiatric illness, then that is strong evidence that Gene A may be the culprit in individuals carrying a mutation that disrupts it.

The next few years will be tremendously exciting as the data from sequencing projects become available. To fully interpret these it will be necessary to look beyond statistical measures from the human data themselves and include evidence of biological plausibility, converging biochemical pathways and neurobiological phenotypes in both humans and animal models.

Mitchell KJ (2010). The genetics of neurodevelopmental disease. Current Opinion in Neurobiology. PMID: 20832285


  1. Once the price gets down to a couple of hundred debased dollars, sequencing will become a routine medical lab procedure, and virtually everyone will be sequenced.

    Is that a good thing or a bad thing?

    How could one possibly use, analyze or act on that much data (6 billion x 6 billion)?

    Does the NSA have enough computers? Or programmers?

  2. It’s not really 6 billion bases if most people have the same DNA as the reference DNA most of the time.
