The lost 50,000 years of non-African humanity

The figure above is from Efficiently inferring the demographic history of many populations with allele count data. This preprint came out a few months ago, but I was prompted to revisit it after reading Spectrum of Neandertal introgression across modern-day humans indicates multiple episodes of human-Neandertal interbreeding.

The latter paper indicates that there were multiple waves of Neanderthal admixture into both Europeans and East Asians. The motivation for the analysis is that East Asians are ~12 percent more Neanderthal than Europeans. The authors don’t reject the idea that there was ‘dilution’ of Neanderthal ancestry through selection and especially admixture with a “Basal Eurasian” group which didn’t have Neanderthal ancestry. I don’t want to get into the details of the results except for one thing: the preprint confirms a consistent finding over the past eight years that the Neanderthal contribution to the modern human genome is from a single population.

Perhaps it was a small population. Or perhaps it was a large population that had gone through a bottleneck and was genetically not very differentiated. But unlike Denisovans it seems that it was a particular Neanderthal lineage that interacted with modern humans.

Moving back to the “Basal Eurasians,” notice some details of the schematic above. The divergence of Basal Eurasians from other non-Africans was ~80,000 years ago, across an interval of 70 to 100 thousand years ago. The admixture of Basal Eurasians into the proto-LBK population occurred ~30,000 years ago, across an interval of 11 to 41 thousand years ago. Ancient DNA from North Africa indicates that Basal Eurasians were already thoroughly admixed well before 11 thousand years ago.

The other dates make sense. 50,000 years for Europeans-Han Chinese, 96,000 years for Mbuti-Eurasians, and 696,000 years for Neanderthal-modern humans.

Ancient modern humans were highly structured. We know this from within Africa. But it seems clear that modern humans who had crossed over to the other side of the Sahara also exhibited the same tendency. Basal Eurasians did not mix with Neanderthal populations. I suspect that this might be due to the fact that they were in Northeast Africa. At some point in the Pleistocene a mixing event occurred. This may have been precipitated by drier conditions and human retreat into only a few habitable areas, and the original Basal Eurasian populations may have mixed into other Near Eastern groups, which were part of the broader Neanderthal-mixed populations.

The great bottleneck after the post-Eemian separation

I’ve been thinking about effective population size. Basically it’s the inferred breeding population you estimate in the present, or in many cases the past, based on the genetic variation you see within the population. Another way to say it is that it’s the population size that can explain the genetic drift that you see in the data.

To give a concrete example, the population of the New England states of America was ~1,000,000 during the 1790 Census. The vast majority of this was due to natural increase from a settler population of ~50,000 in 1650 (the total fertility rate of women in New England was seven children in the years between 1650 and 1700). Of these, ~23,000 were Puritans or the offspring of Puritans who migrated between 1630 and 1643 (due to religious differences with the English government of the period). One might think that a population of ~1,000,000 would be genetically diverse, but the ~50,000 in 1650 matter a lot more than the ~1,000,000 in 1790. The rate of mutation accumulation is pretty slow, so a population bottleneck or subsample has a huge long-term effect.
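One way to see why the small early population dominates: a standard textbook approximation (not something stated in this post) is that long-run effective size tracks the harmonic mean of per-generation sizes, which is pulled toward the smallest values. The sketch below uses the two census figures above as endpoints; the intermediate sizes, the smooth growth curve, and the 28-year generation time are all assumptions, and census size is only standing in for breeding size.

```python
# Sketch: harmonic vs arithmetic mean of per-generation population sizes.
# Endpoints (50,000 and 1,000,000) come from the post; everything in between
# is an assumed smooth geometric growth curve.

def harmonic_mean(sizes):
    """Harmonic mean, which is dominated by the smallest values."""
    return len(sizes) / sum(1.0 / n for n in sizes)

start, end = 50_000, 1_000_000
steps = 5  # roughly 140 years / 28 years per generation (assumed)
growth = (end / start) ** (1 / steps)
sizes = [start * growth ** g for g in range(steps + 1)]

arithmetic = sum(sizes) / len(sizes)
harmonic = harmonic_mean(sizes)

print(f"arithmetic mean: {arithmetic:,.0f}")
print(f"harmonic mean:   {harmonic:,.0f}")
```

The harmonic mean lands far closer to the 1650 figure than to the 1790 one, which is the intuition behind the founders mattering more than the later census.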

In fact, as you probably know one of the biggest determinants of genetic variation in New England whites of 1790 is the bottleneck that they share with all other non-Africans that dates to 50,000 years or more before 1790!

And these are just the coarse demographic considerations on the broader population/historical scale. In any normal random-mating human population, there’s some reproductive variance by chance (usually it is modeled as a Poisson distribution, with the mean and variance being the same, though from what I have read the variance in mammals is usually greater than the mean).

Some people have more children, and some people have fewer children. That means that there is a census population, and a breeding population, and the breeding population is invariably smaller than the census population. Some individuals don’t reproduce to the next generation, obviously. But there are also cases where some individuals have large numbers of surviving offspring, while others have only a few.

To make it concrete I plotted the distribution of the number of children of women older than 50 years of age from the year 2000 and later in the General Social Survey (GSS). You can see that the most common number is two, but there are a fair number with three. Only about 10% of women 50 years and older have no children in the GSS.

But the curious thing is that if you weight the number by the proportion, you notice that women who have three children may not be as common as women who have two children, but they are contributing more children to the next generation than women who have the more typical two children. And, though the number of women who have five or more children is only 11% of the sample, as opposed to 14% who have one child, they contribute nearly five times as many children as those with one child to the next generation (women with six children alone contribute more than women with one child).
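The weighting described above can be sketched in a few lines. The distribution below is hypothetical and is not the GSS data: only three marginals (about 10% childless, 14% with one child, 11% with five or more) are taken from the text, and the remaining shares are placeholders chosen to be consistent with the pattern described.

```python
# Hypothetical distribution of completed family size (NOT the actual GSS values).
# Only three marginals come from the post: ~10% childless, 14% with one child,
# and 11% with five or more. Everything else is a placeholder.
share_of_women = {0: 0.10, 1: 0.14, 2: 0.28, 3: 0.24, 4: 0.13,
                  5: 0.05, 6: 0.03, 7: 0.02, 8: 0.01}

# Children contributed per woman in the sample, broken down by family size.
children = {k: k * p for k, p in share_of_women.items()}
mean = sum(children.values())
variance = sum(p * (k - mean) ** 2 for k, p in share_of_women.items())

five_plus = sum(c for k, c in children.items() if k >= 5)
for k, p in share_of_women.items():
    print(f"{k} children: {p:.0%} of women, {children[k] / mean:.0%} of children")
print(f"five-plus vs one child: {five_plus / children[1]:.1f}x")
print(f"variance / mean: {variance / mean:.2f}  (1.00 under a Poisson model)")
```

With these placeholder shares the variance comes out a little above the mean, in line with the remark that real reproductive variance tends to exceed the Poisson expectation.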

Basically, not all the genetic variation in a given generation is created equally. Some people will contribute more to the next generation, and that has a homogenizing effect (there are models of mutation/selection/drift which establish equilibrium values of variation in a stationary state).

I’m revisiting all of this for two reasons. First, in Who We Are And How We Got Here David Reich talks about a long period of a shared population bottleneck for “Out of Africa” (all non-Africans) groups before the primary expansion ~60,000 years ago. Second, in my conversation with Matt Hahn, he was very skeptical of drawing any correspondence between effective population and some inferred census size. In hindsight I think part of it is that in most organisms census quotes are more an art than science. Not so with humans.

This made me look more into the literature for humans again. Recently Browning et al. published Ancestry-specific recent effective population size in the Americas. It’s a great paper. Basically, it uses identity by descent tracts of different ancestry to tease apart the distinctive pre-admixture effective population sizes. If you take an admixed population and assume that it was a single population random-mating indefinitely, and then work backward in time, you’re probably going to produce rather strange effective population sizes (if the two groups are about the same genetic diversity beforehand, they’ll probably show an inflated effective population, because you are assuming the two groups were a big random-mating population long before they were randomly mating!).

There are many ways to infer effective population, and the identity by descent method seems reasonable for recent time periods. And one thing about recent population size estimates for humans is that you have reasonable census estimates (you don’t just check with simulations):

Our simulations showed that biased sampling of a structured population results in underestimation of most recent effective population size. When we compare the estimated current effective sizes of HCHS/SOL country-of-origin populations to World Bank population sizes (accessed via Google Public Data Explorer) from 1995 (when the average age of the sampled individuals was around 25), we find that the ratio of current estimated effective size to 1995 population size ranges from approximately 1/60 (Ecuador) to approximately 1/4 (Cuba), with typical values around 1/10. Although estimates of effective size in the most recent generations are affected by these issues, our simulations also showed that less recent generations are not affected. Thus our estimates are useful for learning about the effective population sizes at and before admixture.

The structured part is important. For example, the paper On the importance of being structured: instantaneous coalescence rates and human evolution—lessons for ancestral population size inference? explores how structured models of gene-flow might be confused when genomic inferences assume a panmictic population. Last year a paper in PNAS, Early history of Neanderthals and Denisovans, suggested that Neanderthals were characterized by a highly structured meta-population, and that the low effective populations inferred from sampled genomes in this group of humans reflect this, rather than a genuinely low census size.

Browning et al. focused on recent population size inferences. I was curious about these inferences because we can compare them to real census sizes. From this I think I can tune my intuition at least to the possibility that the census size of a random-mating population is not likely to be two orders of magnitude above the inferred effective population size. Conversely, the rough mammalian value of an effective population size of ~1/3 the census size seems to be a ceiling. Population structure and bottlenecks aside, humans seem to have enough basal reproductive skew that effective population size is less than half of the census size.

To focus on ancient population growth (or lack thereof), I reread Inferring human population size and separation history from multiple genome sequences (Schiffels et al. 2014), Exploring Population Size Changes Using SNP Frequency Spectra (Liu et al. 2015) and Neutral genomic regions refine models of recent rapid human population growth (Gazave et al. 2014). The first two papers seem to suggest an “Out of Africa” population bottleneck that’s pretty long, with an effective population that’s somewhat lower than 5,000 individuals. In contrast, the last paper seems to have a sharp bottleneck of 200 individuals.

Remember, different models can produce the same empirical patterns in the genome. You can reduce genetic diversity by a modest, but long, bottleneck. Or, through a very sharp short bottleneck.
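To make that equivalence concrete, here is a minimal sketch using the standard drift result that heterozygosity decays by a factor of (1 - 1/(2N)) per generation. The sizes 200 and 5,000 echo the papers above; the 20-generation duration of the sharp bottleneck is an assumption for illustration, and mutation, migration, and structure are ignored.

```python
import math

def retained_heterozygosity(n_e, generations):
    """Fraction of heterozygosity left after drift alone: (1 - 1/(2N))^t."""
    return (1 - 1 / (2 * n_e)) ** generations

def generations_for_same_loss(n_e, target_fraction):
    """Generations at size n_e needed to fall to target_fraction."""
    return math.log(target_fraction) / math.log(1 - 1 / (2 * n_e))

# Sharp bottleneck: Ne = 200 for an assumed 20 generations.
sharp = retained_heterozygosity(n_e=200, generations=20)
# How long would a modest bottleneck at Ne = 5,000 take to do the same damage?
long_equiv = generations_for_same_loss(n_e=5_000, target_fraction=sharp)

print(f"sharp bottleneck retains {sharp:.3f} of heterozygosity")
print(f"same loss at Ne=5,000 takes about {long_equiv:.0f} generations")
```

Because the loss scales roughly with t/(2N), a bottleneck 25 times larger needs about 25 times as many generations to leave the same footprint in diversity, which is why diversity alone cannot separate the two scenarios.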

In Who We Are and How We Got Here David Reich definitely leans toward a long, but more modest, bottleneck. For anthropological and archaeological reasons this seems more plausible now than it did ten years ago.

But perhaps it makes more sense now that we have more ancient DNA and a more elaborated model of human history seen through the lens of population genetics. In Schlebusch and Jakobsson’s Tales of Human Migration, Admixture, and Selection in Africa the authors come out and say “For our species’ deep history in Africa, both paleoanthropological and genetic evidence increasingly point to a multiregional origin of AMHs [anatomically modern humans] in Africa.”

They’re only saying what I hear other people talking about.

Instead of the “Out of Africa bottleneck” being defining for our species, it’s only a phenomenon which is important for peoples outside of Sub-Saharan Africa. Arguably for the majority of the existence of our species something closer to multi-regionalism was operative within modern humans.

In fact, isn’t that what the new ancient DNA shows? Pulses of admixture and gene-flow between distinct groups? Arguably multiregionalism might be the answer to our origins, but also characterize many of the dynamics after the “Out of Africa” event.

In any case, the best evidence now points to the likelihood that modern human lineages began to diversify and diverge before 200,000 years ago. Conversely, most of the ancestry of modern humans outside of Africa dates to an expansion around ~60,000 years before the present (ancient DNA and archaeology seem to agree here).

This is probably right before the Neanderthal admixture event with non-African humans, at least the modern lineages we have around today. But, it turns out it does not define the point when non-African humans diverged from the ancestral African population. Another group, “Basal Eurasians” (who may not have been Eurasian at all), diverged before the expansion of all eastern non-Africans, Oceanians, as well as the ancestors of Pleistocene Europeans and Siberians. It does not seem that Basal Eurasians had any Neanderthal admixture. Basal Eurasian ancestry is substantial in the Middle East today (although lower than 50%), and non-trivial across broad swaths of Europe and South Asia, due to the expansion of farming. They seem to have been well mixed in places like North Africa with other Eurasian groups ~15,000 years ago. Presumably that was a “back to Africa” migration, since these people had Neanderthal ancestry.

All of this leads to the conclusion that the ancestors of Basal Eurasians/non-Africans must have gone through their shared bottleneck well before ~60,000 years before the present. And, it may have happened on the African continent. So with that, I’ll quote Schiffels et al.:

This comparison reveals that no clean split can explain the inferred progressive decline of relative cross coalescence rate. In particular, the early beginning of the drop would be consistent with an initial formation of distinct populations prior to 150kya, while the late end of the decline would be consistent with a final split around 50kya. This suggests a long period of partial divergence with ongoing genetic exchange between Yoruban and Non-African ancestors that began beyond 150kya, with population structure within Africa, and lasted for over 100,000 years, with a median point around 60-80kya at which time there was still substantial genetic exchange, with half the coalescences between populations and half within (see Discussion). We also observe that the rate of genetic divergence is not uniform but can be roughly divided into two phases. First, up until about 100kya, the two populations separated more slowly, while after 100kya genetic exchange dropped faster.

David Reich’s group, and others, now posit the existence of a “Basal Human” population that mixed into West Africans, who can be modeled as primarily proto-East African (without Eurasian admixture), as well as this ancient outgroup. This means that estimates of divergences with non-Africans from something like MSMC may generate a composite if proto-East Africans are closer to the ancestors of non-Africans, which seems likely. One likely model is that the “Out of Africa” population emerged out of the northern edge of this proto-East African distribution of modern humans over 100,000 years ago (but after groups like the Khoisan and Basal Humans had already diverged).

Looking at Schiffels et al., they seem to posit lower divergence times than seems likely to me. Is that perhaps due to unaccounted-for admixture in lineages, which fuses together groups that were earlier distinct?

In any case, with details about the divergence dates set aside, the MSMC results are actually in line with a new congealing consensus. Deep structure within Africa, but gene-flow between distinct populations, for at least ~100,000 years (possibly more). This is the period when population structure was quite fluid and indistinct along the East African continuum out of which non-Africans emerged.

Also, the archaeological evidence is now strongly suggestive of modern humans in places like Southeast Asia over 10,000 years before the wave which led to the ancestry of most extant populations. In fact, we know that this sort of early migration with no descendants isn’t abnormal. The first modern humans in Europe left no descendants (at least in any appreciable quantity). And the Altai Neanderthal seems to have modern-like admixture that dates to ~100,000 years before the present.

With all the evidence that modern humans were present in Africa, and expansively so, for hundreds of thousands of years, it seems unlikely that they never mixed with “archaic” Eurasian lineages (and vice versa). In fact, as we obtain more and more Neanderthal and Denisovan genomes perhaps we’ll find that a rapid expansion like the one that occurred ~60,000 years ago across Eurasia and Oceania happened before, out of and/or into Africa.

Looping back to the effective population issue, the effective population of modern non-Africans seems to have been below ~5,000 for a while. There was minimal gene-flow with other populations for many generations. Reich has a schematic of 40,000 years between 90,000 and 50,000 BP in Who We Are and How We Got Here. But that’s obviously just a ballpark figure. I have a hard time believing that the census size was around 500,000. The world population 10,000 years ago is usually estimated to be 1 to 10 million. Human populations were probably much larger at the end of the Pleistocene than 100,000 years ago. But a figure of 10% effective would give 50,000, which seems a reasonable number, especially with the likelihood that we’re talking about many tribes over a wide ecological zone. Meta-population dynamics of extinction and resettlement in inclement periods probably drove down the effective population.
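The arithmetic behind those census guesses is just the effective size divided by an assumed ratio of effective to census size. A minimal sketch, where the 1% ratio is what the 500,000 figure implies, 10% is the guess above, and one third is the rough mammalian ceiling mentioned earlier in this post:

```python
from fractions import Fraction

effective_size = 5_000  # rough upper bound on non-African Ne quoted in the post

# Ne / census ratios under consideration (labels are only for printing).
ratios = {
    "1%": Fraction(1, 100),
    "10%": Fraction(1, 10),
    "one third": Fraction(1, 3),
}

# Exact rational arithmetic avoids floating-point noise in the division.
implied_census = {label: int(effective_size / r) for label, r in ratios.items()}
for label, census in implied_census.items():
    print(f"Ne/N = {label}: census of about {census:,}")
```

Fraction is used only so the division comes out exact; the inputs themselves are ballpark figures.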

The separation seems to be distinct from the older multiregional phase. What could explain it? The existence of the Sahara, and periods of extreme desertification, seem the most likely candidates. I can’t say much with any credibility because I don’t know the archaeology and paleoclimate literature, but before domesticated animals, it was probably difficult for hunter-gatherers to make a go of it in the deep Sahara during the driest phases.

If I had to bet, the Eemian interglacial, 130 to 115 thousand years ago, is when I would assume there was:

  1. Lots of gene flow across the Sahara, perhaps in both directions
  2. A major population expansion of humans, of all sorts

This gives plenty of time for a wave of modern humans to push east, probably going through milder climates, rather than expanding north into Neanderthal or Denisovan territory. Eventually, some group must have mixed with the ancestors of the Altai Neanderthals. It seems likely that a cold and dry spell after the Eemian would have favored the well-adapted Eurasian groups, and modern populations would have withdrawn into refugia. The brutally expanding Sahara would have divided the majority of modern humans, who existed in the meta-populations to the south that dated back hundreds of thousands of years, from the groups on the northern fringe.

One can imagine that large numbers of modern humans were either absorbed or went extinct with the expansion of Neanderthals and other archaics. Though Neanderthals and Denisovans were interfertile with moderns, the lineages were still distinct enough that it looks like there was some hybrid breakdown. Just as modern humans seem to have purged many Neanderthal alleles from our genome, the opposite dynamic was probably at work.

There was clearly some structure in the relict modern human group that was separated from the African populations. Basal Eurasians did not mix with Neanderthals, but the ancestors of all other non-African humans did. Though one has to be careful about such geographical inferences, that suggests to me that the range of modern humans in the period between 60,000 and 80,000 years ago extended further back into pockets of northeast Africa, where no contact with Neanderthals would have occurred. Perhaps, in the end, we’ll end up thinking that the Basal Eurasians in some ways were a lot more like Africans south of the Sahara, as they didn’t undergo the massive range expansion of other populations during the Upper Paleolithic.

I’ll end with some predictions.

  • Ancient DNA of proto-moderns and archaics in eastern Eurasia dated to between 50,000 and 100,000 years BP will be analyzed at some point and will exhibit a fair amount of admixture. That is, the Altai Neanderthal was not exceptional, and probably relatively attenuated. I’m moderately confident of this.
  • The pre-60,000 year eastern Eurasians will be found to have left some of their genes in modern eastern Eurasians. Especially in Southeast Asia and Oceania. Probably in the 1-10% range. I’m moderately confident of this.
  • The Denisovan ancestry in Oceanians is mediated by a “first wave” group “Out of Africa.” I have low confidence in this, but I really wouldn’t be surprised either way. My confidence in my confidence is low!
  • At some point we’ll obtain sequence from a 1 million year old hominin somewhere in the colder/drier climes of Eurasia (we have a 900,000 year old horse genome). This will predate Neanderthal/Denisovans. We will see from this that some of these super-archaic populations left their heritage in later archaics, and therefore our own lineage. I’m rather confident of this.
  • By hook or crook we’ll get more ancient genomes out of African samples, and confirm a lot of ancient population structure, as well as some gene-flow from archaic non-modern lineages. Probably around the same range you see in non-Africans (though some of the gene-flow may also apply to non-Africans, since they didn’t separate from eastern Africans until 100,000 to 150,000 years ago). I’m rather confident of this.
  • H. naledi will return sequence at some point. I’m very confident of this. I don’t have inside knowledge, but I know they’re going to keep trying. They are getting more samples.
  • H. naledi will be found to have contributed ancestry to modern southern African populations. I’m moderately confident of this.
  • At some point ancient genomes from the Americas will confirm the existence of an earlier group which was only distantly related to modern New World populations descended mostly from Siberians. There is indirect evidence of this group from South American populations, but we’ll get individuals who are much more distinct at some point in the future. I’m moderately confident of this.
  • Basal Eurasians will be found to have inhabited the Southern Arabia/Persian Gulf region. But the “pure” population will be found to have disappeared around the Last Glacial Maximum ~20,000 years ago, as the human populations to the north moved south, and the Near East’s southern fringe became drier. I’m moderately confident of this.

Selection is going on with SLC24A5….

The ancestral allele for rs1426654 at SLC24A5

On this week’s episode of The Insight, I talked to Matt Hahn about why he wrote his new book, his opinions on “Neutral Theory”, and what he thought about David Reich’s op-ed. Without Spencer’s supervision, I have to admit that I think I lost control and just went “full nerd”. Next week we’re dropping Carl Zimmer’s podcast, so rest assured that the world will come back into balance, and The Insight will be more welcoming to civilians!

At a certain point, Matt and I were discussing allele frequency differences between populations and he came close to saying all such differences between human populations were of modest frequency in relation to pairwise comparisons (e.g., 40% vs. 49%). Obviously, this is not true, because there is always the huge difference in SLC24A5 at SNP rs1426654 (as there is at Duffy and a few other loci). A substitution of an A for the ancestral G converts the codon from alanine to threonine.

You have heard of this locus because of a paper in 2005, SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. This paper came out in December of 2005, a few years after Armand Leroi wrote in Mutants that geneticists still hadn’t come to grips with normal variation in pigmentation in humans. The above publication was the first step in solving this question in the years between 2005 and 2010, at least to a good first approximation.

In the sample in the paper they explain 25-40% of the variation in melanin index between Africans and Europeans with this single genetic change (for various technical reasons it’s probably not that big an effect, though it is still big, and probably the largest effect quantitative trait locus for pigmentation in the human genome).

It turns out that this mutation, the derived variant, is almost disjoint in frequency between Europeans and Africans. That is, ~100% of Africans carry the ancestral G base while ~0% of Europeans carry the G base (as opposed to the A base). Interestingly, East Asians carry the G base at ~100% frequency as well. If you genotype an anonymous individual and their genotype is AG or GG at rs1426654 then it is highly likely that that individual is not a European.

To give an example of how this works, in 2013 I stumbled onto a paper which genotyped 101 Europeans from Cape Town in South Africa. That means there are 202 alleles (two per person) at rs1426654. Of these, 5 of the alleles were ancestral (G). From this, I immediately concluded that it was highly likely that the Afrikaner people of South Africa have non-European ancestry. I came to this conclusion because 5 copies of the ancestral allele, ~2.5%, is shockingly high for a European population, and it was long surmised that the Afrikaner people had some non-European (Khoisan, Bantu, South and Southeast Asian) ancestry. The majority of the whites sampled in Cape Town could have been Afrikaners (I’ve confirmed this with genome-wide data).
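How surprising is 5 out of 202? A rough sketch is a binomial tail probability, using as a baseline the gnomAD non-Finnish European counts quoted elsewhere in this post (486 ancestral out of 126,548 alleles). Treating that sample as a stand-in for an unadmixed European source population, and treating the 202 alleles as independent draws, are both assumptions made for illustration.

```python
from math import comb

def binomial_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k_min))
    return 1 - below

# Baseline ancestral-allele frequency from the gnomAD counts quoted in the post
# (assumed representative of Europeans without recent non-European ancestry).
baseline = 486 / 126_548
tail = binomial_tail(n=202, k_min=5, p=baseline)

print(f"baseline ancestral frequency: {baseline:.2%}")
print(f"P(5 or more ancestral alleles among 202): {tail:.4f}")
```

Under those assumptions the chance of seeing five or more ancestral alleles is on the order of one in a thousand, which is why a count that small can still point toward admixture.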

To get a sense of where my intuitions come from you need to look at allele counts within populations. Using 1000 Genomes, Yale’s ALFRED, and gnomAD I assembled a representative list to give you a sense of what’s going on. Of the 126,548 alleles counted in gnomAD for individuals of European (non-Finnish) descent, 486, or 0.38% of the total, are ancestral.

Population | Ancestral alleles | Total alleles | Ancestral frequency (rounded)
Samaritan | 0 | 74 | 0%
Basque | 0 | 216 | 0%
Greeks (Thrace, Athens) | 0 | 184 | 0%
Burusho | 0 | 50 | 0%
Pandit Brahmin, Kashmir | 0 | 40 | 0%
European (Non-Finnish) | 486 | 126548 | 0%
Ashkenazi Jewish | 47 | 10148 | 0%
European (Finnish) | 329 | 25790 | 1%
Iraq Kurds | 1 | 68 | 2%
Yemenite Jews | 2 | 78 | 3%
Havyaka Brahmin, Karnataka | 2 | 62 | 3%
Palestinian | 4 | 122 | 3%
Gujarati | 10 | 206 | 5%
Tunisian Berber | 6 | 110 | 5%
Andalusian | 14 | 252 | 6%
Iranian | 6 | 84 | 7%
Pashtun | 21 | 190 | 11%
Uttar Pradesh Brahmin | 4 | 34 | 12%
Pandit Brahmin, Haryana | 13 | 78 | 17%
Punjabi | 42 | 192 | 22%
South Asian | 6921 | 30774 | 22%
Kalash | 14 | 48 | 29%
Telugu | 71 | 204 | 35%
Bangladeshi | 80 | 172 | 47%
Sri Lanka Tamil | 105 | 204 | 51%
Adi-Dravida, Karnataka | 21 | 34 | 62%
Masai Kenya | 192 | 286 | 67%
Austro-Asiatic tribe, Odisha | 43 | 56 | 77%
Luhya Kenya | 155 | 188 | 82%
Hausa | 68 | 76 | 90%
Mende Sierra Leone | 155 | 170 | 91%
Gambian | 209 | 226 | 92%
Ibo | 90 | 94 | 96%
Austro-Asiatic tribe, Odisha | 92 | 96 | 96%
Esan Nigeria | 193 | 198 | 97%
Yoruba Nigeria | 213 | 216 | 99%
Biaka | 135 | 136 | 99%
East Asian | 18728 | 18856 | 99%
Ghana | 140 | 140 | 100%
Mbuti | 74 | 74 | 100%

Last fall Crawford et al. reported that rs1426654 is embedded in a haplotype that’s ~30,000 years old. Additionally, they contend that its presence within Africa is probably no earlier than the Holocene, the last ~12,000 years. Martin et al. report that KhoeSan exhibit higher frequencies of the derived allele because of Eurasian back-migration and then in situ natural selection. Of course, not all Eurasians carry it. Most East Asians have the ancestral variant of rs1426654.

This leaves us with West Eurasians, North Africans, and South Asians. I’ve put a few South Asian populations in the list to show you that there is a wide range of variation in allele frequencies. The South Asians in Gnomad, probably mostly Diaspora, have the ancestral variant at only 22%. In contrast, Austro-Asiatic speaking South Asian groups from northeast India have very high frequencies of the ancestral variant. There has clearly been in situ selection in some South Asian populations for the derived variant at rs1426654. Ancestral North Indian groups (ANI) probably brought the derived allele, and Ancient Ancestral South Indians (AASI) probably tended to carry the ancestral allele, like East Eurasians and Oceanians. Additionally, South Asian populations often have high drift. Some of the differences in the Alfred data seem to be impacted by this.

The situation in the Middle East, North Africa, and Europe is different. In the Middle East and North Africa, the ancestral variant is present at frequencies around 1-10%. Some of this can probably be attributed to admixture from Africa and in some cases South and East Asian populations. Ancient DNA from the Middle East and North Africa presents a mixed picture. The farmers who brought the Neolithic to Europe carried the derived variant at rs1426654, and some of the ancient Middle Eastern samples carry it. But not all. The recent Iberomaurusian samples which date to ~15,000 years ago don’t seem to have had the derived variant.

Though the hunter-gatherers of Western Europe only seem to have carried the ancestral variant at rs1426654, the hunter-gatherers of Scandinavia and Eastern Europe did exhibit the derived variant in some frequency, though lower than modern Europeans.

My own hunch is that the original genetic background against which the A mutation at rs1426654 emerged will be found increasing in frequency first somewhere in the Near East after the Last Glacial Maximum. But no ancient population shows the frequencies of the derived variant we see in modern Europeans. In isolated populations subject to drift it wouldn’t be surprising if the ancestral variant decreased to ~0%. But in European populations today in the vast majority of cases the ancestral variant is far lower than 1%, even though we know that within the last 10,000 years the ancestral population streams had several groups with very high frequencies of that ancestral variant. The low frequency is not due to a freakish bottleneck all across Europe. It has to be selection.

One thing I have pointed out is that this very low frequency of the ancestral variant indicates that the advantage at rs1426654 for the A allele in Europe is additive. In Northern Europe, the frequency of the derived variant that confers lactase persistence tops out at around ~90 percent. We know this region of the genome has been targeted by natural selection, but lactase persistence also happens to be genetically dominant. That is, one copy of the mutant allele confers the phenotype. Once you hit ~90 percent of the derived variant only ~1 percent of the population would be lactose intolerant homozygotes (two copies of the ancestral variant). In the gnomAD sample of 60,000+ Europeans, they count three homozygous ancestral genotypes at rs1426654. That’s 0.005%.
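The homozygote figures above follow from Hardy-Weinberg proportions, where the expected fraction of homozygotes is the square of the allele frequency. A minimal sketch, assuming random mating and assuming that the three homozygotes come from the same gnomAD non-Finnish European sample whose allele counts (486 of 126,548) are quoted earlier:

```python
def homozygote_fraction(allele_freq):
    """Hardy-Weinberg expected fraction carrying two copies of an allele."""
    return allele_freq ** 2

# Lactase persistence: derived allele at ~90%, so the ancestral allele is at ~10%.
lactose_intolerant = homozygote_fraction(0.10)

# rs1426654: gnomAD non-Finnish European allele counts quoted in the post.
ancestral_freq = 486 / 126_548
individuals = 126_548 // 2  # two alleles per person
expected_homozygotes = homozygote_fraction(ancestral_freq) * individuals

print(f"ancestral lactase homozygotes: {lactose_intolerant:.1%}")
print(f"expected ancestral rs1426654 homozygotes: {expected_homozygotes:.2f}")
print(f"observed: 3 of {individuals:,} = {3 / individuals:.4%}")
```

The expected count is about one individual, so an observed three is modestly above the random-mating expectation; a small excess like that could easily reflect a handful of recently admixed individuals rather than anything about selection.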

Something is happening at rs1426654. Selection. But why? No one really has any explanation beyond the obvious.

There were possibly late archaic introgression events in Eurasia

A few weeks ago I posted on the strong likelihood that there were at least two Denisovan admixture events in Eurasia into modern humans. That’s probably the floor, not the ceiling. We have an Altai Denisovan genome, but the proportion is so low in most of South and Southeast Asia I don’t think we have a good grasp of how that component differs from the Oceanian fraction, which is much higher.

At the AAPA meeting last week I noticed something strange in one of the presentations: introgressed Denisovan variants which were present among East Asian populations, but lacking elsewhere. The fractions were not >50%, but they were >10%. The Denisovan variants were nearly absent outside of this core zone of East Asians.

There are two possible reasons for this distribution. One reason is that Denisovan variants were segregating in East Asians for thousands of years, and a common bottleneck, or, more likely, selection, drove them up in frequency. Another, not exclusive, explanation is that admixture occurred in East Asia relatively late. The Denisovan signature is totally absent in the New World. Either that’s selection or drift eliminating variation, or it’s the fact that this admixture event happened in East Asia less than about 30,000 years ago, when Native American populations’ East Asian-like source population began to diverge from that of East Asians.

One thing that we know from paleontology is that species exist before the remains we find, and persist after the remains we find. It’s quite possible that small relic populations of Denisovans persisted for thousands of years after modern humans came to dominate the East Asian landscape.

We’re descended from Lilith and Eve

From the comments:

Something that confused me very early on in the book- the San are shown branching off from the rest of humanity prior to Mitochondrial Eve. How can Eve be a common ancestor in this case? Admixture?

The commenter is talking about an early portion of Who We Are and How We Got Here. Someone who reads a book like that is “in the know,” and this is a reasonable question. But it points to a bigger issue that’s going to crop up with the complexification of the origin of anatomically modern humanity over the last few years, and going forward.

An upside of the very-recent-out-of-Africa model, where all modern humans descended exclusively from a group of East Africans who lived ~50,000 years ago, is that it was very simple. So simple that you could write the model out on a postcard.

The new model benefits from being correct and making humans less sui generis (though perhaps that is a bug rather than a feature to some?), but it also forces more thought and complexity on the lay audience.

Calibration on the coalescence of the last common ancestor of all mitochondrial DNA lineages for humans has changed several times. The latest estimates put the time to the last common ancestor of all mtDNA lineages at around 100 to 200 thousand years ago. This is curious in light of the fact that both fossils and genomics are starting to suggest that anatomically modern humans emerged in their current form 200 to 400 thousand years ago.

The shallower coalescence isn’t that surprising. Y and mtDNA both have lower effective population sizes and so higher turnover rates. These high turnover rates mean the extinction of other lineages. As most of you know, the extinction of these mtDNA lineages does not mean that the genetic material of other women alive at the same time as “mtDNA Eve” is not present in modern humans (though who knows what it means to say there’s distinctive genetic material left after all these generations with recombination). Eve was always simply a personification of the coalescence of the mtDNA genealogy. Both the Y and mtDNA phylogenies and coalescence were useful in their time. They pointed to the likely important role of Africa in the origin of modern humans, and the relatively recent time depth of our species. But their coalescence at a specific time was somewhat random around a certain expected value. This is why it was not surprising at all that “Y chromosomal Adam” and “mtDNA Eve” lived at different times (there is some evidence that the Y chromosome has had a lower long-term effective population size).
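The point that coalescence is “somewhat random around a certain expected value” can be made concrete with a toy coalescent simulation. This is a sketch under strong assumptions (constant population size, neutrality, an even sex ratio). The population size and sample size are hypothetical numbers chosen for illustration, not estimates for any real population.

```python
import random

def simulate_tmrca(n_lineages, n_copies, rng):
    """Generations until n_lineages coalesce into a single ancestor.

    Standard coalescent approximation: while k lineages remain, the wait
    for the next coalescence is exponential with rate k*(k-1)/2 / n_copies.
    """
    t = 0.0
    for k in range(n_lineages, 1, -1):
        rate = k * (k - 1) / 2 / n_copies
        t += rng.expovariate(rate)
    return t

def expected_tmrca(n_lineages, n_copies):
    """Analytic expectation: 2 * n_copies * (1 - 1/n_lineages)."""
    return 2 * n_copies * (1 - 1 / n_lineages)

rng = random.Random(1)
N_BREEDING = 10_000   # hypothetical number of breeding individuals
SAMPLE = 50           # hypothetical number of sampled lineages

# Transmitted gene copies: 2N for an autosomal locus, N/2 for mtDNA
# (one copy per female, assuming half the breeders are female).
for label, copies in [("autosomal", 2 * N_BREEDING), ("mtDNA", N_BREEDING // 2)]:
    draws = sorted(simulate_tmrca(SAMPLE, copies, rng) for _ in range(2000))
    # Print the expectation, then roughly the 2.5th and 97.5th percentiles.
    print(label, round(expected_tmrca(SAMPLE, copies)),
          round(draws[50]), round(draws[1949]))
```

In this toy model the mtDNA genealogy is expected to coalesce four times faster than an autosomal one, and the spread between the low and high percentiles shows how far any single realization can land from the expectation. It does not model the possibly lower long-term effective size of the Y chromosome mentioned above.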

The above question is inspired by the fact that San Bushmen seem to diverge earlier in their total genome than in their mtDNA. There’s always been a distinction in the literature between demographic divergence between two populations, and the divergence of their genetic genealogies. Oftentimes daughter populations share genetic variation that dates back to before their separation. But sometimes, you have this situation where it seems that the starting point of genetic variation post-dates the divergence between populations.

What’s the explanation? I think the simplest one is admixture and reciprocal gene flow, as implied by the commenter. In fact, Pontus Skoglund’s latest African ancient DNA paper implies that there was some sort of isolation-by-distance cline in the eastern part of the continent, from modern Ethiopia far to the south.

And, it may also turn out that the San Bushmen themselves are an admixture between two very different populations, one more like other eastern Africans, and one basal to this clade. If so, then it may be that their divergence estimate is a compound, and the most divergent mtDNA lineages come from the eastern African population that mixed with the more basal population.

The bigger answer is that we really need to move beyond the “mitochondrial Eve” story as being central. It had its time and played its role, but we can move beyond it. Otherwise, the public will be in for a big surprise as ancient DNA starts to uncover the story of a whole antediluvian world within Africa of anatomically modern humans that flourished for hundreds of thousands of years before a small branch left to populate the rest of the world ~50,000 years ago.

On the eons of salutary neglect

New preprint, Something old, something borrowed: Admixture and adaptation in human evolution. This part jumped out at me:

…Indeed, for most traits, the contribution of archaic human alleles to present-day human phenotypic variation is not significantly larger than those of randomly drawn non-introgressed alleles occurring at the same frequency in modern humans. Interestingly, in both studies, neurological and behavioral phenotypes are an exception, with Neanderthal alleles contributing more to variation in these traits than frequency-matched modern human alleles.

I joked that perhaps we can talk about people “acting like a Neanderthal” again?

But seriously, I was thinking today about one particular stage of human evolutionary history, the long sojourn outside of Africa for the ancestors of non-Africans (including “Basal Eurasians”) which produced a sustained bottleneck. In David Reich’s new book he alludes to it, and I’ve seen other mentions of it (this is an old idea).

How long was the bottleneck? What was the normal census size? What were the cultural implications of having a small isolated population?

The PSMC and MSMC diagrams I’ve seen don’t really answer my questions.

Carl Zimmer profile of the Reich group at work

The New York Times has a review up (sort of) of Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, David Reich Unearths Human History Etched in Bone. But since Carl has been covering the publications coming out of the Reich lab for many years now, it’s kind of a survey of the whole operation and how David and company got where they are.

The last few paragraphs are pretty tantalizing:

As of last month, Dr. Reich’s team has published about three-quarters of all the genome-wide data from ancient human remains in the scientific literature. But the scientists are only getting started.

They also have retrieved DNA from about 3,000 more samples. And the lab refrigerators are filled with bones from 2,000 more denizens of prehistory.

Dr. Reich’s plan is to find ancient DNA from every culture known to archaeology everywhere in the world. Ultimately, he hopes to build a genetic atlas of humanity over the past 50,000 years.

“I try not to think about it all at once, because it’s so overwhelming,” he said.

Three years ago I was having a discussion with someone from Reich’s group and mentioned offhand that in terms of getting data I give the nod to Eske Willerslev’s group of researchers, though I thought the people around David and Nick Patterson tended to perform a deeper analysis. Three years is a long time, and as the results since then have shown, the “SNP capture” methodology is very cost effective. They might not get the whole genome sequences of individuals, but they get lots of individuals. And for a lot of population genomic analysis, you want lots of individuals more than the whole genome sequence.

But not all. The more ancient individuals probably have a lot of variation “private” to them and their population, so you don’t know all the neat polymorphisms you might miss.

With that gripe submitted, it’s pretty incredible that the Reich lab has 3,000 ancient samples in the pipeline for analysis. In Who We Are and How We Got Here David Reich outlines just how he and his collaborators transformed the artisanal process of data generation from ancient DNA into a rationalized and commoditized factory process.

White modern Northern Europeans are genetically more like brown South Asians than brown(ish) ancient Northern Europeans were

The Guardian has a piece by Arathi Prasad, Thanks to Cheddar Man, I feel more comfortable as a brown Briton. Dr. Prasad is a geneticist, so the science is pretty decent (she’s probably seen the documentary ahead of time too).

But there is a curious quirk here and it reveals something about human psychology: modern Britons are genetically much closer to South Asians, like Arathi Prasad, than to these ancient darker-skinned Britons. The plot to the left illustrates this (it’s using the Dystruct package). The far right of the top panels represents South Asians. You can see Europeans pretty clearly. Let’s note two things:

1) Modern Europeans (except for Sardinians) share an orange “steppe” component with most South Asians (these are no doubt Indo-European migrations of the Bronze Age)

2) The brown element represents European hunter-gatherers. This element is found at varying quantities across Europe, with the lowest fractions in Sardinians. Though present in South Asians (this may or may not be an artifact to be honest), it’s not present at very high frequencies.

One always has to be careful about taking these proportions as literal representations of ancestral populations. They are not. But what they show is that modern Northern Europeans and South Asians have been touched by the same population movements over the past 5,000 years, and so are genetically much closer than the people who lived in Northern Europe and South Asia 5,000 years ago.

Humans are a visual species. In a pre-modern environment, physical cues were important for group identity, though I suspect just as much due to scarification and tattooing as phenotypic differences due to biology. The fact that Cheddar Man, and Paleolithic hunter-gatherers in Western Europe more generally, probably resembled modern South Asians more than they do modern Northern Europeans (I think they were more likely to be olive-brown than dark-brown, but I’m not confident), is more salient to human folk biology than the fact that modern Northern Europeans are much closer genetically to South Asians than the more “brown” ancient Northern Europeans.

Stuff like this always reminds me of the deep wisdom in Arthur C. Clarke’s Childhood’s End. The ultimately benevolent alien species which mentored humanity shielded us from their physical appearance because they knew we’d find it horrifying. The substance of what they did for us, who they were, was going to be less important to immature humans than the fact of what they looked like.

Note: Fst between Sindhi from Pakistan and WHG (Cheddar Man was one) is 0.087. Between Sindhi from Pakistan and English it is 0.023. Between English and WHG it is 0.058 (source). Fst cannot be naively interpreted as “genetic distance.” But this gets at the fact that Mesolithic European hunter-gatherers were very distant from modern South Asians. And widespread gene flow and admixture over the past 5,000 years have compressed a lot of genetic differences which were starker across geography in the past.
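For anyone who wants the comparison laid out, here is a small sketch that ranks the three pairwise values from the note. The numbers are the ones quoted above; the dictionary layout and labels are just my choice for illustration, and the usual caveat stands that Fst is not a literal distance.

```python
# Pairwise Fst values quoted in the note above.
fst = {
    ("Sindhi", "WHG"): 0.087,
    ("Sindhi", "English"): 0.023,
    ("English", "WHG"): 0.058,
}

# Rank pairs from least to most differentiated (lower Fst = more similar).
ranked = sorted(fst.items(), key=lambda item: item[1])
for (pop_a, pop_b), value in ranked:
    print(f"{pop_a:8s} {pop_b:8s} {value:.3f}")

# How much more differentiated the modern English are from WHG than from Sindhi.
ratio = fst[("English", "WHG")] / fst[("Sindhi", "English")]
print(f"English-WHG / English-Sindhi: {ratio:.1f}x")
```

The least differentiated pair is the two modern populations, and the English come out about 2.5 times more differentiated from WHG than from Sindhi on this measure.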

Ancient DNA and Dystruct

There’s a new preprint, Inference of population structure from ancient DNA, which uses explicit demographic models to make inferences about ancestry. I haven’t dug into the guts of the math, but, the outputs are quite interesting.

What seems to be obvious is that Western Eurasia has a much richer set of models to choose from than elsewhere. European, Middle Eastern and South Asian populations exhibit the greatest difference between Dystruct and Admixture.

Five things paleogenetics tells us about the human past

Since I’m flogging Enlightenment Now, I thought perhaps I should remind readers that Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past by David Reich is out in 1.5 months. For years people have asked me about a book to read to understand what genetics has to say about human history. This is that book.

And yet before you get there, what do you need to know?

Here are five things you should know. Five things that we know with a very high degree of certitude.

  1. Many (most?) modern population clusters we perceive as clear and distinct date to the last 5,000 years. To give a concrete example, the genetics that we find to be typical of Northern Europeans only comes into being ~5,000 years ago, with the Corded Ware populations. To my knowledge none of the prior populations along the North European plain exhibit the mix of characteristics and ancestries typical of modern Northern Europeans in any way, shape, or form.
  2. Concomitantly, many of the physical characteristics we find typical of modern populations are probably relatively recent configurations due to natural selection.
  3. Non-African populations, whether European, Middle Eastern, South Asian, (South)East Asian, Amerindian or Oceanian, derive from a population expansion that dates to ~50,000 years BP. These populations experienced a bottleneck on the order of 1,000 to 10,000 breeding individuals.
  4. Modern humans are old. Population structure within Africa of modern humans dates to at least 200,000 years before the present, and perhaps even earlier.
  5. Population turnover was ubiquitous. Change was the only constant.
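To give a feel for what the bottleneck in point 3 implies, here is a rough sketch of how much heterozygosity drift would remove at the two ends of the quoted range. It uses the textbook Wright-Fisher result and ignores new mutation and gene flow. The bottleneck durations and the 25-year generation time are hypothetical values picked for illustration, since the real length of the bottleneck is not stated here.

```python
def heterozygosity_retained(n_effective, generations):
    """Fraction of heterozygosity left after drift at constant size.

    Textbook Wright-Fisher result: H_t / H_0 = (1 - 1/(2N))**t,
    ignoring new mutation and gene flow.
    """
    return (1 - 1 / (2 * n_effective)) ** generations

GENERATION_YEARS = 25                       # assumed generation time
for n_effective in (1_000, 10_000):         # range quoted in point 3
    for years in (5_000, 20_000, 50_000):   # hypothetical bottleneck durations
        t = years // GENERATION_YEARS
        kept = heterozygosity_retained(n_effective, t)
        print(f"N={n_effective:>6}  {years:>6} yr  {kept:.2f} retained")
```

The takeaway is only qualitative: at the low end of the range a long bottleneck erodes a large share of diversity, while at the high end the loss is modest, so size and duration trade off against each other.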