Genomic noise and individual variation


In classic heritability studies, the variance of some phenotype Y is decomposed (in the simplest model) into the variance attributable to genetic effects, G, and the variance attributable to environment, E, such that Var(Y) = G+E. As the majority of heritability studies are done by geneticists, who are in general more interested in G than in E, the environmental variance is, to them, largely an error term. When thought of this way, it is clear that “environmental variance” can contain effects that, though not genetic, are certainly not “environmental” in any traditional sense.
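The decomposition above can be sketched in a toy simulation. Everything here is illustrative: the variances of the genetic and environmental components (1.0 and 0.25) are made-up numbers chosen so the "heritability" works out to a round 0.8.

```python
import random

# Toy simulation of the simplest variance decomposition, Var(Y) = G + E.
# Component variances are illustrative, not drawn from any real study.
random.seed(0)

n = 100_000
g = [random.gauss(0, 1.0) for _ in range(n)]  # genetic contribution, Var(G) = 1.0
e = [random.gauss(0, 0.5) for _ in range(n)]  # "environmental" residual, Var(E) = 0.25
y = [gi + ei for gi, ei in zip(g, e)]         # phenotype

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Because G and E are independent here, Var(Y) ~= Var(G) + Var(E) = 1.25,
# so heritability h2 = Var(G) / Var(Y) ~= 0.8.
h2 = var(g) / var(y)
```

The point of the post, of course, is that the `e` term in a real study is not a single well-behaved "environment" draw but a grab-bag that includes the stochastic noise discussed below.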

In particular, the error term must include simple stochastic noise on any part of the complex mapping from genotype to phenotype. Even at the early points in this map–the genome sequence and gene expression–there is considerable opportunity for random events to greatly affect phenotype. For lack of a better term, I’m going to call noise introduced at this level “genomic noise”; some examples follow:

1. While the genome is sometimes thought of as a constant in all cells from a given individual, that is not the case. Besides mutations, the genomes in some cell types undergo extensive remodeling during development. For example, consider the T and B cells of the immune system. During development, the genes in the immunoglobulin cluster are recombined to create the receptors presented by the cell. This recombination is stochastic–even from an identical starting point, the precise combination of genes obtained in independent recombinations can vary greatly. It stands to reason that this genomic noise could, in turn, propagate up to phenotypic variation, and indeed, that is the case–if you look at identical twins who are discordant for multiple sclerosis (an autoimmune disease), you find that those early recombination events have made them less than identical.

2. Genomic noise is introduced in brain cells, as well, by the random movement of transposable elements and their effects on gene expression. The important studies (or perhaps study, singular; I can’t seem to find anything other than the linked paper) here have been done in the mouse, and any phenotypic effect is highly speculative, but as the costs of sequencing drop, it will be possible to study these sorts of somatic changes on a large scale.

3. Moving up a level from genomes to gene expression, it’s clear that some variation in levels of gene expression is simply stochastic. But interestingly, recent work has suggested that, though most everyone has two copies of all autosomal genes, a rather large fraction of genes (excluding imprinted ones) are only expressed from one copy, and the choice of copy to express varies from cell to cell. This opens up the possibility of cells or even entire tissues ending up effectively haploid for a given gene. So if you were to have two individuals heterozygous for some phenotypically relevant variant, they could end up with quite different phenotypes depending on the random choice of allele to express (see also G’s post on the topic here).
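The scenario in point 3 can be made concrete with a toy simulation. Assume (purely for illustration) that each cell in a tissue independently flips a coin to decide which of its two alleles to express; the tissue-level phenotype is then the fraction of cells expressing the variant allele. The cell counts and effect size below are made-up numbers.

```python
import random

# Sketch of random monoallelic expression: a heterozygous individual whose
# cells each independently silence one copy of a gene. Illustrative only.
random.seed(1)

def tissue_phenotype(n_cells, effect=1.0):
    # Each cell picks which allele to express with a fair coin flip; only
    # the variant allele contributes `effect` to that cell's output.
    expressing_variant = sum(random.random() < 0.5 for _ in range(n_cells))
    return effect * expressing_variant / n_cells

# With many cells, tissues average out near 0.5; with few founder cells
# (or clonal expansion of an early choice), two heterozygotes with the
# same genotype can end up with noticeably different phenotypes.
few = [tissue_phenotype(n_cells=10) for _ in range(5)]
many = [tissue_phenotype(n_cells=10_000) for _ in range(5)]
```

The spread among the `few`-cell "individuals" is much larger than among the `many`-cell ones, which is the sense in which a random early choice of allele can become a source of phenotypic variance between genotypically identical individuals.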

I find these sorts of speculations entertaining, and I imagine some of these postulated effects will soon be tested. Until then, just something to keep in mind.



  1. This research found that 0.9% of cells in schizophrenic human brains are aneuploid for chromosome 1, whereas only 0.3% were in normal controls. (Aneuploid == gain or loss of a chromosome.)

  2. the error term must include simple stochastic noise on any part of the complex mapping from genotype to phenotype
    I see it a little differently.  
    Let’s talk about IQ. Its variation could be regressed against the sum of parental IQ and SES (socioeconomic status). What better environmental model do you have besides SES? Maybe none, and that one is not measured very precisely. You end up with a big residual even after doing your best to model both the genes and the environment. As I’m a seismologist I can only guess at the numbers, so I’ll stop there. The point is, there are lots of sources of error besides the gene→phenotype transform.

  3. I am very sympathetic to the noise argument. In discussions of the proportion of variance explained in [important variable X], the statement that Y explains 40% of the variance is, in my experience, frequently greeted with the automatic assumption that some more-important causal factor must explain the remaining 60%. Instead, the first question should be what proportion of the variance is explainable at all. If 50% is due to noise, then 40% actually covers 80% of the explainable variance. 
    Note that estimating the proportion of explainable variance is difficult, often (seemingly) impossible. Reliability coefficients are useless, because variance also arises from non-identity between the measurement and the construct being measured. If one can account for 100% of the variance, that’s the end of the story, but then one is not working in population genetics. A more typical situation is to have factors that cover a minority of the variance. Then…what? I wonder whether a principled way to decide between noise and as-yet-undetected causal factors exists, even in some useful subset of situations.
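    The arithmetic behind the commenter's 40%-is-really-80% point is worth spelling out; the 50% noise share below is the commenter's hypothetical, not an estimate.

    ```python
    # If noise accounts for 50% of total variance, then a factor explaining
    # 40% of the *total* variance explains 80% of the *explainable* variance.
    total_variance = 1.0
    noise_fraction = 0.5    # hypothetical unexplainable share
    explained_by_Y = 0.4    # share of total variance explained by Y

    explainable = total_variance * (1 - noise_fraction)   # 0.5
    share_of_explainable = explained_by_Y / explainable   # 0.4 / 0.5 = 0.8
    ```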

  4. So Charles Murray’s point (in the Bell Curve I think) that if we remove all environmental differences, then all the remaining differences would be genetic, is false?