The continuing tangling of the human tree

Last summer I made a thoughtless and silly error about a model of human population history when a reader asked: “which population is most distantly related to Africans?” I contended that all non-African populations are equally distant. This is obviously wrong on the face of it if you look at any genetic distance measure. West Eurasians, even those without recent Sub-Saharan African admixture (e.g., North Europeans), are closer than East Eurasians, who are often closer than Oceanians and Amerindians. One explanation I offered is that these latter groups were subject to greater genetic drift through a series of population bottlenecks. In this framework the number of generations until the last common ancestor with Sub-Saharan Africans for all groups outside of Africa should be about the same, but due to evolutionary factors such as more extreme genetic drift or different selective pressures some non-African groups have diverged more from Africans than others in terms of their genetic state. In other words, the most genetically divergent groups in relation to Africans did not diverge any earlier, but simply diverged more rapidly.

Dienekes Pontikos disagreed with such a simple explanation. He argued that admixture or gene flow between Africans and non-African groups since the last common ancestor could explain the differences. I am now of the opinion that Dienekes may have been right. My own confidence in the “serial bottleneck” hypothesis as the primary explanation for the nature of relationships of the phylogenetic tree of human populations is shaky at best. Why my errors of inference?

There were two major issues at work in my misjudgments of the arc of the past and the topology of the present. In the latter instance I saw plenty of phylogenetic trees which clearly illustrated the variation in genetic distance from Africans for various non-African groups. Why didn’t I internalize those visual representations? It was, I think, the power of the “Out of Africa” (OoA) with replacement paradigm. Even by the summer of 2010 I had come to reject it in its strong form, due to the evidence of admixture with Neanderthals, and rumors of other events which were borne out with the publication of the Denisovan results. But to a first approximation the clean and simple OoA was still looming so large in my mind that I made the incorrect inference, whereby all non-Africans are viewed simply as a branch of Africans without any particular differentiation in relation to their ancestral population. Secondarily, I was also still influenced by the idea that most of the genetic variation you see in the world around us has its roots tens of thousands of years ago. By this, I mean that the phylogeographic patterns of 25,000 years in the past would map well onto the phylogeographic patterns of the present. This assumption is what drove a lot of phylogeography in the early aughts, because the chain of causation could be reversed, and inferences about the past were made from patterns of the present. My own confidence in this model had already been perturbed when I made my errors, but I believe it still held some implicit sway in my head. It is one thing to move on from old models explicitly, but another thing to remove the furniture from your cognitive basement and attic.

I have moved further from my preconceptions between then and now. It took a while to sink in, but I’m getting there. A cognitive “paradigm shift” if you will. In particular I am more open to the idea of substantive back migration to Africa, as well as secondary migrations out of Africa. A new paper in Genome Research is out which adds some interesting details to this bigger discussion, and seems to weigh in further against my tentative hypothesis that serial bottlenecks and genetic drift can explain variation in distance to Africans of various non-African groups. Human population dispersal “Out of Africa” estimated from linkage disequilibrium and allele frequencies of SNPs:

Genetic and fossil evidence supports a single, recent (<200,000 yr) origin of modern Homo sapiens in Africa, followed by later population divergence and dispersal across the globe (the “Out of Africa” model). However, there is less agreement on the exact nature of this migration event and dispersal of populations relative to one another. We use the empirically observed genetic correlation structure (or linkage disequilibrium) between 242,000 genome-wide single nucleotide polymorphisms (SNPs) in 17 global populations to reconstruct two key parameters of human evolution: effective population size (N_e) and population divergence times (T). A linkage disequilibrium (LD)–based approach allows changes in human population size to be traced over time and reveals a substantial reduction in N_e accompanying the “Out of Africa” exodus as well as the dramatic re-expansion of non-Africans as they spread across the globe. Secondly, two parallel estimates of population divergence times provide clear evidence of population dispersal patterns “Out of Africa” and subsequent dispersal of proto-European and proto-East Asian populations. Estimates of divergence times between European–African and East Asian–African populations are inconsistent with its simplest manifestation: a single dispersal from the continent followed by a split into Western and Eastern Eurasian branches. Rather, population divergence times are consistent with substantial ancient gene flow to the proto-European population after its divergence with proto-East Asians, suggesting distinct, early dispersals of modern H. sapiens from Africa. We use simulated genetic polymorphism data to demonstrate the validity of our conclusions against alternative population demographic scenarios.

Here are the details. The authors use patterns of linkage disequilibrium (LD) to gauge divergence, time since divergence, and the effective population sizes of various groups. LD measures the correlation of genetic variants across loci. Because of the shuffling properties of recombination the correlation of markers across the genome should be relatively low. That is, they should be independent. But not in all cases. You could, for example, have two markers at two genes which are physically close together. Now imagine a selective sweep event which increases the frequency of one of the variants through positive selection. Then the other marker on the second gene will also rise in frequency by “hitchhiking” along on the other’s good fortune. Over time recombination will break apart these associations, but that decay of LD takes time. Importantly, it is not just natural selection which can generate these patterns within the genome. Population bottlenecks can drive fragments of the genome up (and down) wildly in frequency, because they inject extra “noise” into the generation-to-generation transmission of allele frequencies within a population. So LD can reflect demographic events as well as bouts of adaptation.
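To make the idea of LD and its decay concrete, here is a minimal sketch using the standard two-locus definitions (D as the excess of a haplotype over its expected frequency, r² as the squared correlation). The function names and the example frequencies are my own illustrations, not anything taken from the paper.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A and allele B together
    p_a, p_b: frequencies of allele A and allele B
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # r2 is undefined when either locus is fixed for one allele
        raise ValueError("r2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b  # excess of the AB haplotype over independence
    return d, d * d / denom


def decayed_d(d0, c, generations):
    """Expected D after some generations, with recombination fraction c
    per generation and no drift or selection: D_t = D_0 * (1 - c)^t."""
    return d0 * (1 - c) ** generations


# Hypothetical example: A and B always travel together (complete LD).
d0, r2 = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)
print(d0, r2)
# Tightly linked markers (c = 0.001) keep most of their LD after 100
# generations; loosely linked ones (c = 0.1) lose nearly all of it.
print(decayed_d(d0, 0.001, 100), decayed_d(d0, 0.1, 100))
```

The point of the second function is simply that closely linked markers remember old events for longer, which is what lets LD at different genetic distances speak to different depths of time.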

Another measure of genetic variation that the authors rely on is the fixation index (Fst). This ignores patterns of correlation across genes, and is a comparison of the variation of a given specific marker from population to population. High Fst values signal a lot of between-population differentiation. An Fst value of ~0 indicates almost no between-population differentiation. An extreme example would be a marker, 1, which is at frequency 0.5 in both population A and population B, and a marker, 2, which is at frequency 0.0 in population A and 1.0 in population B. Fst = 0.0 for marker 1, and 1.0 for marker 2. The Fst values in this paper are averaged across the genome, so obviously you’ll get values on the interval between 0 and 1, though it will usually be closer to 0 for any given marker (the average intercontinental human Fst value at a given marker is famously ~15%; ergo, the chestnut of wisdom that 85% of variation is within races, and 15% between).
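The two extreme markers above can be checked with a few lines of code. This is a minimal sketch that assumes two equally weighted populations and the simple heterozygosity-based definition Fst = (H_T - H_S) / H_T; the paper's own estimator may well differ in its details, and the function name is mine.

```python
def fst_two_pops(p_a, p_b):
    """Fst for one biallelic marker in two equally weighted populations.

    p_a, p_b: frequency of the same allele in population A and population B.
    """
    p_bar = (p_a + p_b) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # heterozygosity if A and B were pooled
    h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
    if h_total == 0:
        # The marker does not vary at all, so Fst is undefined
        raise ValueError("Fst is undefined for a marker fixed in both populations")
    return (h_total - h_within) / h_total


print(fst_two_pops(0.5, 0.5))  # marker 1: identical frequencies
print(fst_two_pops(0.0, 1.0))  # marker 2: fixed for different alleles
print(fst_two_pops(0.3, 0.7))  # a hypothetical intermediate marker
```

Most real markers look like the third case or milder, which is why genome-wide averages sit much nearer 0 than 1.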

The chart at the top of the post shows the divergence times inferred from an Fst based statistic and an LD based statistic, above and below respectively. Two notable things to observe. First, the basic structure of both statistics is similar. Second, LD tends to give smaller values. The authors contend that LD is clearly an underestimate because it doesn’t take into account migration and fixation of allele frequencies, where one variant reaches 100% and so LD cannot be calculated.

An aspect of LD which is useful for the authors is that they could calculate effective population sizes over time for their disparate samples. Below is a plot which shows the variation over time. I’ve added some clarifying labels (you should recognize many of the abbreviations from the HapMap populations):
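How does LD translate into population size over time? A rough sketch, assuming the textbook approximation usually attributed to Sved, E[r²] ≈ 1 / (alpha + 4·Ne·c) with alpha = 1, together with the common rule of thumb that markers separated by recombination fraction c reflect Ne roughly 1/(2c) generations ago. Both are assumptions I am using for illustration; the paper's exact estimator, its corrections for sample size, and its choice of alpha may differ, and the function names and numbers are hypothetical.

```python
def ne_from_r2(mean_r2, c, alpha=1.0):
    """Invert E[r2] = 1 / (alpha + 4 * Ne * c) to get Ne.

    mean_r2: average r2 between marker pairs separated by c
    c: recombination fraction between the markers (in Morgans, for small c)
    alpha: 1 in the simplest model; 2 is sometimes used to allow for mutation
    """
    return (1.0 / mean_r2 - alpha) / (4.0 * c)


def generations_ago(c):
    """Rule-of-thumb time depth that LD at distance c is informative about."""
    return 1.0 / (2.0 * c)


# Hypothetical mean r2 values in bins of increasing genetic distance.
for c, mean_r2 in [(0.0001, 0.40), (0.001, 0.10), (0.01, 0.005)]:
    print(round(generations_ago(c)), round(ne_from_r2(mean_r2, c)))
```

Tightly linked pairs speak to the deep past and loosely linked pairs to recent generations, so a single genome-wide LD scan yields a whole trajectory of effective sizes rather than one number.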

Some observations:

1) Africans have a relatively large breeding population from before to after the putative OoA event.

2) Non-Africans show the small ancestral population during the Pleistocene that you’d expect, rising very slowly if at all from the exit event from Africa across the Ice Ages.

3) Then ~10,000 years ago you start to see divergences. The Chinese crest to very large effective populations. The Tuscans are next in order. Then there’s a cluster of Northwest European groups. The Japanese are between the Tuscans and Northwest Europeans. Finally at the bottom you see Finns and Mexicans. This is not too surprising in terms of rank order. But here’s the paper’s interpretation of the European patterns:

…likely the consequence of bottlenecks associated with the depopulation and recolonization of Northern Europe before and after the last glacial maximum…growth accelerates moving forward in time, with the average rate about threefold higher in the period 8–5 KYA than 20–8 KYA, presumably representing the impact of agricultural innovations on population density.

Remember my point that it is problematic to back project contemporary variation to the past? I think this needs to be emphasized here. My own hunch is that the difference between the Finns and other Northwest Europeans has to do with the relatively late adoption of agriculture by the former, and the possibility that much of the genome of the latter is due to relatively late intrusions from southern and eastern Europe of explosively expanding agricultural groups. In other words, I’m not sure that aside from the Finns the recolonization after the LGM matters much at all.

Also, there’s one point I want to make sure to get to: the authors contend that the time until last divergence can’t be explained by a model of serial bottlenecks, as I had posited last summer. In other words, there has to be more complex dynamics at work here. They ran a bunch of simulations with constructed genomic sequences. Varying effective population size so that you have a bunch of serial bottlenecks was not enough to explain the difference between East Asians and West Eurasians when it came to time until last divergence to Sub-Saharan Africans. There has to be something more complex going on.
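For what it is worth, here is a sketch of the serial-bottleneck intuition itself: two lineages that split from the same ancestor at the same time accumulate different amounts of drift if one of them passes through a bottleneck, so the bottlenecked one looks "older" when drift is naively converted to time. It uses the standard expectation F = 1 - Π(1 - 1/(2N)) over generations, with invented population sizes. It does not reproduce the authors' simulations, which are what show that this mechanism alone is not sufficient.

```python
import math


def drift_f(pop_sizes):
    """Expected drift (F relative to the ancestor) after one generation at
    each effective size in pop_sizes: F = 1 - prod(1 - 1/(2N))."""
    keep = 1.0
    for n in pop_sizes:
        keep *= 1.0 - 1.0 / (2.0 * n)
    return 1.0 - keep


def apparent_generations(f, assumed_n):
    """Generations needed to reach F if size had been constant at assumed_n."""
    return math.log(1.0 - f) / math.log(1.0 - 1.0 / (2.0 * assumed_n))


# Hypothetical histories, both exactly 2000 generations long.
steady = [5000] * 2000
bottlenecked = [5000] * 900 + [500] * 200 + [5000] * 900

for label, history in [("steady", steady), ("bottlenecked", bottlenecked)]:
    f = drift_f(history)
    print(label, round(f, 3), round(apparent_generations(f, 5000)))
```

The true split time is identical in both histories; only the apparent one differs. That is exactly the effect I was leaning on last summer.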

Speaking of complexity, I would also like to add that this paper reinforces the likelihood of a “pause” of the ur-non-African population after they left Africa. There’s a ~20,000 year gap between the time until the last common ancestor, and then the separation of West and East Eurasians. Several genomic analyses have pointed in this direction. I think the exact span of this interval is going to be debated, but I suspect that it is real. Additionally, the authors contend that the genetic closeness of West Eurasians to Sub-Saharan Africans may point to an ancient second migration out of Africa.

First, let’s walk back to where we started. Here was the rough “cartoon” model of the origin of modern H. sapiens sapiens circa 2009:

1) 50-100 thousand years ago you have a huge number of hominin groups across Africa and Eurasia.

2) At some point within this interval of time a small population of East Africans began to rapidly expand in population. They replaced in totality all other hominins, within, and outside of, Africa.

3) Therefore the inference can be made that all human beings alive today are descended from one tribe of East Africans.

At this point we can probably reject this model as being the full story. There is now suggestive evidence that the population fluctuations of Africans have been far more modest than those of non-Africans over the past 100,000 years. We also have to confront the likelihood of multiple admixture events with those “Other” hominins outside of, and possibly within, Africa. Finally, we can’t reject back migration events as well as multiple Out of Africa pulses.

I believe that the pattern of genetic variation across the whole world, including within Africa, has re-ordered itself radically over the past 10,000 years. We need to stop, and take a breath. If we know so little about the past 10,000 years, how much can we confidently infer about the past 100,000 years? Only a few points I suspect. For now.

Related: See Dienekes’ comment as well.

Citation: McEvoy BP, Powell JE, Goddard ME, & Visscher PM (2011). Human population dispersal “Out of Africa” estimated from linkage disequilibrium and allele frequencies of SNPs. Genome Research. PMID: 21518737
