
The post-neutral human genome (the Kern-Hahn era)

If you have any background in evolutionary biology, you are probably aware of the controversy around the neutral theory of molecular evolution. Fundamentally a theoretical framework, and instrumentally a null hypothesis, it came to the foreground in the 1970s, just as empirical molecular data in evolutionary biology were becoming available.

At the same time that Motoo Kimura and colleagues were developing the formal mathematical framework for the neutral theory, empirical evolutionary geneticists were leveraging molecular biology to assay natural allelic variation more directly. In 1966 Richard Lewontin and John Hubby presented results suggesting far more variation than they had expected. Lewontin argued in the early 1970s that the neutral model, taken together with their data, was actually a natural extension of the "classical" model of expected polymorphism as outlined by R. A. Fisher, as opposed to the "balance school" of Sewall Wright. In short, Lewontin proposed that the extent of polymorphism was too great to explain within the dynamics of the balance school (e.g., segregation load and its impact on fitness), in which numerous selective forces maintain variation. The classical school, by contrast, emphasized both strong selective sweeps on favored alleles and strong constraint against most new mutations.
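To make the load argument concrete, here is a standard back-of-the-envelope calculation. It assumes each polymorphism is maintained by heterozygote advantage (overdominance) with genotype fitnesses 1 − s, 1, and 1 − t, and that fitness multiplies across independent loci; the values s = t = 0.1 and n = 1000 loci are illustrative, not estimates from Lewontin and Hubby.

```latex
% Overdominant locus: w_{AA} = 1 - s, \; w_{Aa} = 1, \; w_{aa} = 1 - t
\hat{p}_A = \frac{t}{s + t}, \qquad \hat{q}_A = \frac{s}{s + t}

% Segregation load at equilibrium (mean fitness is 1 - L):
L = s\,\hat{p}_A^{2} + t\,\hat{q}_A^{2} = \frac{st^{2} + ts^{2}}{(s + t)^{2}} = \frac{st}{s + t}

% Illustration: s = t = 0.1 gives L = 0.05 per locus.
% Across n = 1000 independent loci with multiplicative fitness:
\bar{w}_{\text{total}} = (1 - L)^{n} = 0.95^{1000} \approx 5 \times 10^{-23}
```

Under these assumptions, mean fitness relative to an individual heterozygous at every locus collapses as the number of balanced polymorphisms grows, which is the intuition behind treating extensive polymorphism as a problem for the balance school.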

And yet one might expect low levels of polymorphism from the classical school. The neutral framework was a more natural extension of this model in the following sense: even if most interspecific variation (most substitutions across species) is due to selectively neutral variants, most new variants could nevertheless be deleterious and so constrained. Alleles that increase in frequency may have done so through positive selection or simply through random drift, rather than through balancing forces like diversifying selection and overdominance.
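The standard neutral-theory result behind this point fits in a couple of lines. Assume a diploid population of size N, a per-generation mutation rate μ, and a fraction f_0 of new mutations that are neutral, with the rest deleterious enough that they essentially never fix (an idealization that also ignores advantageous mutations).

```latex
% New mutations entering the population per generation: 2N\mu
% Of these, neutral: 2N\mu f_0
% Fixation probability of a single new neutral allele: 1/(2N)
k = 2N\mu f_0 \cdot \frac{1}{2N} = f_0\,\mu
```

So the substitution rate k depends only on the neutral fraction and the mutation rate, not on population size. Even if f_0 is well below 1, meaning most new mutations are deleterious and purged, nearly all substitutions between species can still be neutral.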

The general argument around neutral theory generated much acrimony and spilled out from the borders of population genetics and molecular evolution into evolutionary biology writ large. Stephen Jay Gould, Simon Conway Morris, and Richard Dawkins were all working in the shadow of neutral theory during their meta-scientific spats about adaptation and contingency.

That was then, this is now. I've stated before that people sometimes overplay how much genomics has transformed our understanding of evolutionary biology. But in the arguments around neutral theory, I do think it has had a salubrious impact on the tone and quality of the discourse. Neutral theory and the great controversies around it flourished in an age when there was some empirical data to support everyone's position, but never enough data to resolve the debates.

From where I stand, I think we're moving beyond that phase in our intellectual history. To be frank, some of the older researchers who came up in the trenches when Kimura and his bête noire John Gillespie were engaged in a scientific dispute that went beyond conventional collegiality seem to retain the scars of that era. But younger scientists, whatever their current position, are more sanguine, because they anticipate that the data will ultimately adjudicate; there is so much of it.

With that historical context, consider a new paper, Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences:

Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.
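The filtering rule at the end of the abstract is simple enough to sketch. The Python below is a minimal illustration, not the authors' pipeline: the `Snp` record, the `is_gbgc_neutral` and `filter_snps` names, and the assumption that each SNP already carries its local recombination rate in cM/Mb (looked up from some genetic map) are all hypothetical. Because gBGC favors weak-to-strong (A/T to C/G) changes, keeping only C↔G and A↔T mutations leaves changes that gBGC does not push in either direction, while the 1.5 cM/Mb cutoff drops low-recombination regions, where the abstract says these effects are strongest.

```python
from dataclasses import dataclass

STRONG = frozenset("CG")  # "S" (strong) bases
WEAK = frozenset("AT")    # "W" (weak) bases


@dataclass(frozen=True)
class Snp:
    """Hypothetical minimal SNP record; real data would come from a VCF plus a genetic map."""
    chrom: str
    pos: int
    ref: str
    alt: str
    rec_rate_cm_per_mb: float  # local recombination rate, assumed precomputed


def is_gbgc_neutral(ref: str, alt: str) -> bool:
    """True for single-base S<->S (C<->G) or W<->W (A<->T) changes, which gBGC does not favor."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        return False
    return (ref in STRONG and alt in STRONG) or (ref in WEAK and alt in WEAK)


def filter_snps(snps, min_rec_rate=1.5):
    """Keep SNPs in regions above min_rec_rate cM/Mb whose mutation type is C<->G or A<->T."""
    return [
        s for s in snps
        if s.rec_rate_cm_per_mb > min_rec_rate and is_gbgc_neutral(s.ref, s.alt)
    ]


example = [
    Snp("1", 100, "C", "G", 2.0),  # S<->S, high recombination: kept
    Snp("1", 200, "A", "G", 2.0),  # W->S, favored by gBGC: dropped
    Snp("1", 300, "A", "T", 0.8),  # W<->W but low recombination: dropped
    Snp("1", 400, "T", "A", 3.1),  # W<->W, high recombination: kept
]
kept = filter_snps(example)
print([s.pos for s in kept])  # [100, 400]
```

Since C↔G and A↔T changes are symmetric with respect to base composition, the sketch does not need to know which allele is ancestral; a real analysis would still need polarized alleles for other purposes, such as building a site frequency spectrum.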

This is not an entirely surprising result. Some researchers in human genetics have been arguing for nearly a decade for the pervasiveness of background selection, that is, selection against deleterious alleles that also reduces diversity at nearby linked sites. Others argue that selective sweeps driven by positive selection are important in determining variation. Unlike in the 1970s and 1980s, these researchers don't evince much acrimony, in part because the data keep coming, and ultimately they'll probably converge on the same position. And the results may differ by species or taxon.
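To see why recombination rate matters so much for background selection, here is a minimal Python sketch of a commonly used approximation of the Hudson-Kaplan type, B = exp(−Σ u·t / (t + r(1 − t))²). Here B is neutral diversity relative to its value without linked selection, u is the deleterious mutation rate per selected site, t = s·h is the heterozygous selection coefficient, and r is the recombination fraction to each selected site. The parameter values, the 100 bp spacing of selected sites, and the linear conversion of 1 cM/Mb to 10⁻⁸ per bp per generation are illustrative assumptions, not estimates from the paper; papers also differ on whether u is a haploid or diploid rate.

```python
import math


def recomb_fraction(distance_bp, rate_cm_per_mb):
    """Linear approximation: 1 cM/Mb is about 1e-8 crossovers per bp per generation.

    Capped at 0.5 (free recombination); an assumption, not a proper map function.
    """
    return min(0.5, rate_cm_per_mb * 1e-8 * distance_bp)


def background_selection_b(u, t, recomb_fractions):
    """B = exp(-sum_i u*t / (t + r_i*(1 - t))**2) over selected sites at recombination fractions r_i."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t = s*h must be in (0, 1]")
    total = sum(u * t / (t + r * (1.0 - t)) ** 2 for r in recomb_fractions)
    return math.exp(-total)


# Hypothetical setup: 10,000 selected sites spaced 100 bp apart on one side of a focal neutral site.
distances = range(100, 1_000_001, 100)
u, t = 1e-8, 1e-3

b_low = background_selection_b(u, t, [recomb_fraction(d, 0.5) for d in distances])
b_high = background_selection_b(u, t, [recomb_fraction(d, 2.0) for d in distances])
print(f"B at 0.5 cM/Mb: {b_low:.4f}")
print(f"B at 2.0 cM/Mb: {b_high:.4f}")
```

Each term shrinks as r grows, so the same density of selected sites depresses diversity less in high-recombination regions. That is the logic behind restricting demographic inference to regions above a recombination-rate threshold, as the paper does with its 1.5 cM/Mb cutoff.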

If you want a less technical overview than the paper, Kelley Harris has an excellent comment accompanying it. If you want to know what I mean by the Kern-Hahn era, it's a joke prompted by the publication of Andrew Kern and Matthew Hahn's The Neutral Theory in Light of Natural Selection.

Finally, some of you might wonder about the implications for demographic inference, which preoccupies me so much on this weblog. In the big picture, this probably won't change a lot, but it will matter for the details. So this is a step forward. That being said, mutation and recombination rates that vary across time and between lineages are probably also quite important.