There were possibly late archaic introgression events in Eurasia

A few weeks ago I posted on the strong likelihood that there were at least two Denisovan admixture events in Eurasia into modern humans. That’s probably the floor, not the ceiling. We have an Altai Denisovan genome, but the proportion is so low in most of South and Southeast Asia I don’t think we have a good grasp of how that component differs from the Oceanian fraction, which is much higher.

At the AAPA meeting last week I noticed something strange in one of the presentations: introgressed Denisovan variants which were present among East Asian populations, but lacking elsewhere. The fractions were not >50%, but they were >10%. The Denisovan variants were nearly absent outside of this core zone of East Asians.

There are two possible reasons for this distribution. One is that Denisovan variants were segregating in East Asians for thousands of years, and a common bottleneck, or, more likely, selection, drove them up in frequency. Another, not exclusive, explanation is that admixture occurred in East Asia relatively late. The Denisovan signature is totally absent in the New World. Either that’s selection or drift eliminating variation, or it’s because this admixture event happened in East Asia less than about 30,000 years ago, the point at which the East Asian-like source population of Native Americans began to diverge from that of East Asians.

One thing that we know from paleontology is that species exist before the remains we find, and persist after the remains we find. It’s quite possible that small relic populations of Denisovans persisted for thousands of years after modern humans came to dominate the East Asian landscape.

So merfolk are a real thing now: adaptation to diving

When Rasmus Nielsen presented preliminary work on diving adaptations a few years ago at ASHG I really didn’t know what to think. To be honest it seemed kind of crazy. Everyone was freaking out over it…and I guess I should have. But it just seemed so strange I couldn’t process it. High altitude adaptations, I understood. But underwater adaptations?

The paper is out now, and open access, Physiological and Genetic Adaptations to Diving in Sea Nomads. There are a lot of moving parts in it, so I really recommend Carl Zimmer’s piece, Bodies Remodeled for a Life at Sea:

On Thursday in the journal Cell, a team of researchers reported a new kind of adaptation — not to air or to food, but to the ocean. A group of sea-dwelling people in Southeast Asia have evolved into better divers.

When Dr. Ilardo compared scans from the two villages, she found a stark difference. The Bajau had spleens about 50 percent bigger on average than those of the Saluan.

Only some Bajau are full-time divers. Others, such as teachers and shopkeepers, have never dived. But they, too, had large spleens, Dr. Ilardo found. It was likely the Bajau are born that way, thanks to their genes.

A number of genetic variants have become unusually common in the Bajau, she found. The only plausible way for this to happen is natural selection: the Bajau with those variants had more descendants than those who lacked them.

As some of you might know “sea nomads” are common across much of Southeast Asia. The Bajau are just one major group. The anthropology here is not surprising…but the biology most definitely is. For various technical reasons, the authors didn’t have extremely fine-grained genome data (high coverage sequence data, or very high-density chips). So they didn’t do some haplotype-based tests (e.g., iHS), though that might not matter anyhow (see below for why). But, by looking at the genome-wide relatedness and comparing that to markers which deviated from that expectation, both of which they could do robustly, the authors narrowed in on candidates for targets of selection. From the paper: “Remarkably, the top hit of our selection scan (Table 1) is SNP rs7158863, located just upstream of BDKRB2, the only gene thus far suggested to be associated with the diving response in humans.”

There are many cases where researchers find selection signals in an ORF of unknown function. In this case, the top hit happens to be exactly in line with the biological characteristic you’re already curious about. The alignment is so good it’s hard to believe.

But wait, there’s more! Spleen size variation is not due to variation on just one locus. It’s polygenic, albeit probably dominated by larger effect quantitative trait loci (QTLs) than something like height (so more like skin color). They compared the Bajau to a nearby population, the Saluan, as well as Han Chinese as an outgroup. On the whole the distribution of allele frequency differences should reflect the phylogeny (Han(Bajau, Saluan)). The key is to look for cases where the Bajau are the outgroup. From the paper:

While some of the selection signals uniquely present in the Bajau may be related to other environmental factors, such as the pathogens, several of the other top hits also fall in candidate genes associated with traits of possible importance for diving. Examples include FAM178B, which encodes a protein that forms a stable complex with carbonic anhydrase, the primary enzyme responsible for maintaining carbon dioxide/bicarbonate balance, thereby helping maintain the pH of the blood….
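One common way to formalize the "Bajau as the outgroup" logic is the population branch statistic (PBS), which turns pairwise FST values into branch lengths and asks how long the focal population's branch is at a given SNP. Below is a minimal sketch of that idea, not the authors' pipeline. The function names and the per-SNP FST values are made up for illustration.

```python
import math

def fst_to_branch_length(fst):
    """Log-transform FST into an approximately additive divergence: T = -ln(1 - FST)."""
    return -math.log(1.0 - fst)

def population_branch_statistic(fst_focal_sister, fst_focal_outgroup, fst_sister_outgroup):
    """Length of the branch leading to the focal population in a three-population tree."""
    t_fs = fst_to_branch_length(fst_focal_sister)
    t_fo = fst_to_branch_length(fst_focal_outgroup)
    t_so = fst_to_branch_length(fst_sister_outgroup)
    return (t_fs + t_fo - t_so) / 2.0

# Hypothetical SNP that follows the genome-wide tree (Han(Bajau, Saluan)):
# the focal and sister populations barely differ from each other.
neutral_pbs = population_branch_statistic(0.02, 0.10, 0.10)

# Hypothetical SNP where the focal population differs sharply from BOTH others,
# i.e. it behaves like the outgroup at this site.
candidate_pbs = population_branch_statistic(0.30, 0.35, 0.10)

print(f"neutral-looking SNP PBS: {neutral_pbs:.3f}")
print(f"candidate SNP PBS:       {candidate_pbs:.3f}")
```

A SNP only stands out if its focal branch is long relative to the genome-wide distribution, so in practice one would rank all SNPs by this value rather than apply a fixed cutoff.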

FAM178B shows up again later:

We identified one region overlapping chr2:97627143, which falls in the gene FAM178B, that falls in the 99% quantile of the genome-wide distribution for the fD statistic (Martin et al., 2015). Of the populations considered, this region exclusively stands out in the Bajau, and the signal appears strongest when using Denisova as source. Notably, this region was also proposed as a candidate for Denisovan introgression in Oceanic populations by….

What they’re saying here is that the allele at this locus adapted to diving may have come originally from the Denisovans! Remember, we already know that one of the Tibetan high altitude adaptations comes from the Denisovans. So this isn’t surprising, but it is pretty cool. But most of the other hits don’t seem to be introgressed. That is, they come from modern humans (or have been segregating in our species for a long, long, time).
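For readers curious about the fD statistic cited in the quoted passage, here is a rough sketch of how it is computed from derived-allele frequencies, following the definition in Martin et al. (2015) as I understand it. The assignment of populations to P1, P2 and P3, the function names, and the frequencies are my own assumptions for illustration, not values from the paper.

```python
def abba_minus_baba(p1, p2, p3, p_out):
    """Sum over sites of the ABBA minus BABA pattern expectations,
    given derived-allele frequencies in P1, P2, P3 and the outgroup."""
    return sum(
        (1 - a) * b * c * (1 - o) - a * (1 - b) * c * (1 - o)
        for a, b, c, o in zip(p1, p2, p3, p_out)
    )

def f_d(p1, p2, p3, p_out):
    """fD: observed excess sharing between P2 and P3, scaled by the excess
    expected if P2 and P3 were fully homogenized at each site."""
    numerator = abba_minus_baba(p1, p2, p3, p_out)
    # The "donor" frequency at each site is whichever of P2/P3 is higher.
    p_donor = [max(b, c) for b, c in zip(p2, p3)]
    denominator = abba_minus_baba(p1, p_donor, p_donor, p_out)
    if denominator == 0:
        return float("nan")
    return numerator / denominator

# Hypothetical three-site window (assumed roles: P1 = comparison population,
# P2 = candidate recipient, P3 = archaic source, outgroup fixed ancestral).
p1 = [0.0, 0.1, 0.0]
p2 = [0.5, 0.1, 0.4]
p3 = [1.0, 0.0, 1.0]
p_out = [0.0, 0.0, 0.0]

window_fd = f_d(p1, p2, p3, p_out)
print(f"fD for this window: {window_fd:.2f}")
```

The quoted passage describes a window falling in the 99% quantile of the genome-wide fD distribution, so what matters is a window's rank among all windows rather than the raw value.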

Many of the alleles found at high frequencies in the Bajau are found in other populations, just at very low frequencies. This implies that selection is operating on standing variation. Another suggestion that this is so is that the widths of the regions of the genome impacted by selection seem rather narrow. In contrast, the Eurasian adaptation to lactose digestion is from a de novo mutation, something that wasn’t at high frequency at all in the ancestral human populations. The sweep is strong and powerful around that single mutation, and huge swaths of the genome around it “hitchhiked” along so that on a population-wide level the area around the mutational target was homogenized (basically, a lot of one single original mutant human is found around that causal variant for lactase persistence).

Anyone who has learned basic quantitative genetics knows that one way to change a mean trait value is just to change the allele frequencies at a lot of different loci…over time you’ll have many formerly low-frequency alleles present together in a single individual, a combination which would otherwise never have occurred. Eventually, you can have a median value which is outside of the range of the original distribution. The mechanism here in a dynamic sense seems totally comprehensible, though as Carl Zimmer notes, and the rather short shrift given in the Cell paper suggests, they’re not sure in a proximate sense how the selection is working (i.e., obviously there is a fitness implication but how does it manifest? Do people die? Are they unable to support a family?).
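To make the quantitative genetics point concrete, here is a toy additive model. It is a sketch under strong assumptions: 100 loci with equal effects, Hardy-Weinberg proportions, no linkage and no environmental variance. The numbers are invented and have nothing to do with the real architecture of spleen size.

```python
import math

def additive_trait_moments(freqs, effects):
    """Mean and standard deviation of an additive trait, where each locus
    contributes `effect` per copy of the trait-increasing allele."""
    mean = sum(2 * p * a for p, a in zip(freqs, effects))
    variance = sum(2 * p * (1 - p) * a * a for p, a in zip(freqs, effects))
    return mean, math.sqrt(variance)

n_loci = 100                      # hypothetical number of loci
effects = [1.0] * n_loci          # hypothetical equal effect sizes

# Standing variation: every trait-increasing allele starts out rare.
before_mean, before_sd = additive_trait_moments([0.1] * n_loci, effects)

# Selection nudges every allele up in frequency; no new mutations needed.
after_mean, after_sd = additive_trait_moments([0.3] * n_loci, effects)

shift_in_sd = (after_mean - before_mean) / before_sd
print(f"mean before: {before_mean:.1f} (sd {before_sd:.2f})")
print(f"mean after:  {after_mean:.1f} (sd {after_sd:.2f})")
print(f"shift: {shift_in_sd:.1f} original standard deviations")
```

In this toy setup a modest frequency change at each locus moves the mean by more than nine of the original standard deviations, which is well outside anything you would expect to sample from the starting population.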

One key issue is to consider the demographic history of these people. The authors tried to model it genetically:

We found a model compatible with the data that has a divergence time of ∼16 kya, with subsequent high migration from Bajau to Saluan and low migration from Saluan to Bajau (for details see STAR Methods). We note that the estimate of 16 kya may reflect the divergence of old admixture components shared in different proportions by the Saluan and the Bajau, similarly to, for example, European populations being closely related to each other but differing in the proportion of ancient admixture components….

The authors cite papers which outline the real story about what happened, so they know that the model is somewhat unrealistic. For example, Ancient genomes document multiple waves of migration in Southeast Asian prehistory:

Southeast Asia is home to rich human genetic and linguistic diversity, but the details of past population movements in the region are not well known. Here, we report genome-wide ancient DNA data from thirteen Southeast Asian individuals spanning from the Neolithic period through the Iron Age (4100-1700 years ago). Early agriculturalists from Man Bac in Vietnam possessed a mixture of East Asian (southern Chinese farmer) and deeply diverged eastern Eurasian (hunter-gatherer) ancestry characteristic of Austroasiatic speakers, with similar ancestry as far south as Indonesia providing evidence for an expansive initial spread of Austroasiatic languages. In a striking parallel with Europe, later sites from across the region show closer connections to present-day majority groups, reflecting a second major influx of migrants by the time of the Bronze Age.

The upshot is that the predominant genetic character of Southeast Asia dates to the Neolithic, and to a great extent even more recently. The deep divergence between two Austronesian groups may be an artifact of drift in one group (probably the Bajau), or different proportions of admixture from the primary ancestral components in maritime Southeast Asia: Austronesian, Austro-Asiatic, and indigenous hunter-gatherer. As per Lipson 2014 the Bajau are probably mostly Austronesian but may have Negrito ancestry from the Philippines, as well as indigenous hunter-gatherer ancestry more closely related to Malaysian Negritos. There probably isn’t so much Austro-Asiatic in Sulawesi, but I’d bet the farmers have more of that.

Ultimately the question here is are the adaptations to diving old or new? Anthropologists and historians have all sorts of theories, as reported in the Carl Zimmer article and hinted at in the paper. My own bet is that they are both old and new. By this, I mean that some sort of maritime lifestyle was surely practiced by indigenous people between the end of the last Ice Age and the arrival of farmers. But if the variation was present in humans more generally, the Austronesians would probably also have the capacity for the diving adaptations. Mixing with hunter-gatherers and another bout of selection could have done the trick in concert. So the adaptations and lifestyle are old, but the Bajau people may date to the last 2,000 years, and selection within this population may be that recent.

A lot of the answer might be found in looking at the other sea nomad groups….

The maturation of the South Asian genetic landscape


The above is a stylized map from the preprint, The Genomic Formation of South and Central Asia. In broad strokes, it says some things that are very expected, and some things that are not so expected.

The abstract is long, but I’ll reproduce it in full:

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gathers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan. 
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia — consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC — and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identifies the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.

First Turk Empire

Though the abstract is focused on South Asia, the preprint actually has quite a bit about Inner Asia, because of the provenance of the samples. We often view the typical person in the past as a peasant in an agricultural society, and therefore relatively immobile over their lifetime. The story we like to tell ourselves is that non-elites in premodern societies, on the whole, had narrow horizons, delimited by their home village, or the neighboring network of villages.

But results from this work and others show that mobile populations, in which individuals ranged across vast areas of Eurasia over their lifetimes, were not that uncommon among pastoralists. We know this historically, as empires such as those of the Turks and Mongols were defined by a ruling elite whose writ extended from eastern to western Eurasia. The genetic heterogeneity of the Sintashta samples, with some individuals very different from the norm in their settlement, is exactly what you’d expect from a social and political culture which was united in some fashion over huge distances.

As the sample sizes for ancient DNA have increased it seems rather clear that the demographic dynamics we see in later historical expansions of Inner Asian polities extend back to the Bronze Age. With expanding populations across the ecologically friendly landscape, the ancient proto-Indo-Europeans seem to have mixed with the local substrate wherever they went, just as Turks did later. As they moved west, they mixed with late Neolithic Europeans, as they went east, they mixed with Siberian populations, and as they conquered south they mixed with descendants of West Asian farmers.

One of the primary things that I think one needs to keep in mind is that one can’t just imagine that this was defined by simple diffusion dynamics. Historically the boundary between pastoralists and peasants could be fluid, but when political resistance collapses pastoralists have been able to use their military prowess to swarm across the lands of agriculturalists. In other words, centuries of gradual inter-demic gene flow might be interrupted by a rapid “pulse” admixture. There’s no reason that pre-literate polities couldn’t exist. The Inca were one such example, and the homogeneity of the Uruk civilization in the 4th millennium BC is strongly suggestive of an imperial hegemony or paramountcy.

Another dynamic is that pastoralists are highly mobile, and so may leapfrog over territory which is unsuitable. Or, they may move so rapidly that there isn’t much mixing with populations in between point A and point B.

This is apparently the case with the Bactria–Margiana Archaeological Complex. These people were mostly descended from people related to the eastern farmers of West Asia, those in modern day Iran. Some of their ancestry had affinities with Anatolian farmers, and there is some evidence even of Siberian admixture in this region. But there are three important takehomes of this preprint in relation to this area: 1) the BMAC did not contribute much genetically to South Asia at all, 2) steppe ancestry, related to that of the Yamna culture of the Pontic region, only shows up in the BMAC around 2000 BC, and 3) there is actually evidence of South Asian (Indus valley?) migration into the BMAC.

The fact that Yamna-like ancestry shows up in the BMAC region so late is a strong reason to suspect that Indo-Iranian peoples did not move to Iran and India until after 2000 BC. In earlier comments on this issue, I was rather vague about timing, because the Corded-Ware people show up in Europe before 2500 BC, and I was going along with the parsimonious idea that this was part of one single cultural and social revolution.

I was wrong. Going back to the Turkic analogy, there were multiple waves of migration and folk wandering by Turkic pastoralists. By different Turkic groups. One of the major ones occurred due to the rise of the Mongols, and the Mongols were not even Turks. The same seems to be true of Inner Eurasian Indo-European groups.

Moving on to South Asia, there are two primary constructs which come out of this preprint: “Indus Periphery” and “Ancient Ancestral South Indians.” I’ll call the former InPe, and the latter is termed AASI. To some extent these complement and replace the earlier terms “Ancestral North Indian” and “Ancestral South Indian” (ANI and ASI). The AASI are the ancient hunter-gatherers of the Indian subcontinent. The authors suggest that the divergence of this group from other eastern Eurasians occurred very early, and that the split between the ancestors of the Papuans, Onge, and AASI was effectively a polytomy (that is, they separated very quickly without discernible structure).

The InPe samples are from eastern Iran and the BMAC. They’re unique in having AASI ancestry, at variable fractions (indicating contemporaneous admixture). They also resemble samples from Swat Valley which date to 1200 BC and later, with one major difference: the Swat Valley samples have steppe ancestry.

There are no samples from the Indus Valley proper, so the authors suggest that the InPe are reasonable proxies. Additionally, they assert that ASI can best be modeled as a mixture between InPe and AASI. In other words, there were two admixture events. Their Pulliyar samples are actually pretty good proxies for the resultant ASI, while the Kalash of Pakistan are good proxies for the ANI, who are presumably now modeled as a mixture of steppe populations with the InPe.
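The phrase "modeled as a mixture" can be illustrated with a toy calculation. If a target population's allele frequencies are a weighted average of two sources, the weight can be recovered by least squares. This is only a sketch of the general idea and not the authors' method. The function name and all frequencies below are invented, and the "InPe-like" and "AASI-like" labels are purely illustrative.

```python
def estimate_mixture_proportion(target, source_a, source_b):
    """Least-squares estimate of alpha in: target = alpha * A + (1 - alpha) * B,
    where each argument is a list of allele frequencies at the same SNPs."""
    numerator = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    denominator = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if denominator == 0:
        raise ValueError("sources have identical frequencies; alpha is not identifiable")
    return numerator / denominator

# Hypothetical allele frequencies at five SNPs in two source populations.
inpe_like = [0.10, 0.80, 0.45, 0.30, 0.95]
aasi_like = [0.70, 0.20, 0.55, 0.90, 0.15]

# Build a target that is exactly 25% source A and 75% source B.
true_alpha = 0.25
target = [true_alpha * a + (1 - true_alpha) * b for a, b in zip(inpe_like, aasi_like)]

alpha_hat = estimate_mixture_proportion(target, inpe_like, aasi_like)
print(f"estimated proportion from source A: {alpha_hat:.2f}")
```

Real data add drift since the mixture, sampling noise, and sources that are only proxies for the true ancestors, which is why actual analyses lean on more robust statistics than raw frequency averaging.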

This resolves the enigmatic result that Priya Moorjani reported to me last year: less than 4,000 years ago “pure” ANI and ASI people existed. She was presumably going off admixture timing estimates. These results suggest that in some form ANI and ASI still exist, and the first admixture occurred with the creation of InPe.

Using a new method the authors contend that InPe emerged 4700-3000 BC. If this is true then the Indus Valley Civilization (IVC) was a compound of AASI and Iranian agriculturalists (sampled from the eastern end of the cline of admixture with Anatolians, that is, they had none of that ancestry). These dates also postdate the first arrival of agriculture at Mehrgarh by 2,000 years at the least. I suspect that it will turn out there were earlier admixtures, which are not being detected. For various ecological reasons the West Asian cultural complex was portable only to the northwest fringe of South Asia, and there it persisted for ~4,000 years. This served as a natural eastern limit for cultures which were migrating out of the West Asian zone, and a point where AASI hunter-gatherers constantly mixed into the local population.

As the IVC sites begin to get sampled in the future I predict that instead of a homogeneous transect of admixture over time and space we’ll see a lot of heterogeneity.

In the Swat samples, the authors see two correlated trends, an increase in steppe ancestry, and an increase in AASI ancestry. No doubt this dates to the “great admixture” which occurred between 2000 BC, and some time before 1000 AD (the Bengali admixture with East Asians dates to between 0 and 1000 AD, as does that of Brahmins who left the North Indian plain and mixed with local populations elsewhere).

Finally, the authors detect a skew toward steppe ancestry among some populations, in particular, Brahmins. The skew is in relation to Iranian farmer ancestry, the two being the primary constituents of ANI ancestry. In Who We Are and How We Got Here David Reich says some of the ANI admixture is much more recent than the rest, judging by tract length. And also going by the BMAC and Swat samples it seems that the time period for when Indo-Aryans arrived in South Asia has to be in the interval between 2000 BC and 1200 BC.

There’s another aspect of the preprint which allows for dating. The arrival of Austro-Asiatic people in South Asia probably has to postdate the expansion of the same group in Vietnam about 4,000 years ago (though not necessarily obviously). But the Munda Austro-Asiatic people of northeast India exhibit curious genetic patterns. They clearly have East Asian ancestry related to other Austro-Asiatic populations in Southeast Asia, but they have a lot less “West Eurasian” in their ANI/ASI mix. The authors resolve this by suggesting that the Munda arrived in South Asia when there was still heterogeneity among the ASI, and unadmixed AASI.

After 2000 BC the IVC went into decline. Various groups of Indo-Aryans were expanding and admixing. From the other end of the subcontinent arrived rice cultivators from Southeast Asia. At some point, they ran into an ASI population that had some Iranian admixture, but not as much as typical. All of this probably occurred in the period between 2000 BC and 1000 BC. I know that some researchers have argued that the Gangetic plain was inhabited by Munda speaking peoples before it was inhabited by Indo-Aryans. The main issue I’ve had with this is that modern Munda peoples are very genetically distinctive, and there’s no evidence of East Asian ancestry in most populations of the Gangetic plain (the main exceptions are those which have experienced Tibetan influence/contact).

So here is my interpretation of the genetic and historical evidence:

1) IVC emerges out of a matrix that was a synthesis of West Asian farmers and indigenous hunter-gatherers. I would not be surprised if later genetic work recapitulates the findings in Europe of an initial period of separation, and then a “resurgence” of indigenous ancestry as the barriers between the two groups break.

2) The period between 2000 BC and 1000 BC is the beginning of the transformation of the South Asian genetic and ethnolinguistic landscape, with the intrusion of two different groups from different directions, Indo-Aryans from the west and Austro-Asiatics from the east. Austro-Asiatic rice culture was superior to western wheat culture because rice is more delicious than wheat, but the Indo-Aryans ultimately established cultural supremacy across South Asia by the Iron Age.

3) The situation in South India is more complicated and confused. The admixture of groups like Pulliyar from InPe and AASI into the classic ASI configuration seems to be more recent than 2000 BC (their low bound dates go as late as 400 BC). The admixture may have occurred in various places, not just in South India. The evidence from this paper suggests that the Andronovo/Sintashta cultural zone was characterized by some genetic heterogeneity due to variation in admixture with neighboring peoples, and the same could be said for the IVC then. I would not be surprised if northern IVC locations had more AASI than southern IVC, as the latter were more insulated from the east due to the Thar desert (the results are consistent with earlier work that suggests modern populations in the lower Indus basin have less Indo-Aryan and more Iranian, with less AASI).

4) We need to be careful about assuming that everything here is a linear combination of distinct and separable atomic units of cultural integrity and wholeness. What I mean is that though Brahmins and some other North Indian groups are enriched for steppe ancestry, it is not only their purview. Rather, it may be that these upper caste groups simply mixed less with the other populations with Iranian and AASI ancestry. The statistics in this paper do not detect enrichment of steppe ancestry in South Indian Brahmins. I believe this is simply an artifact of the reality that South Indian Brahmins mixed with Iranian-enriched elites, like Reddys, when they emigrated to the south.

Though the model outlined in the preprint is much more complicated than a simple ANI/ASI mix, it still simplifies the demographic histories of many populations. For example, my own survey of the data suggests that Brahmins who left the Indo-Gangetic plain mixed with local elites wherever they went (Bengali Brahmins have East Asian ancestry, just as South Indian Brahmins have more Iranian-like ancestry).

5) Language is important but is not determinative. R1a1a-Z93 arrived in South Asia relatively late with groups from the steppe. Its frequency is highest in the northwest, and among upper castes. That is, it is correlated in a coarse manner to steppe ancestry. But R1a1a-Z93 is pervasive throughout South Asia irrespective of caste and region. Even in Dravidian speaking southern populations, some groups have quite a bit of R1a1a-Z93.

The analogy that presents itself here is Southern Europe, where some groups with high frequencies of R1b, such as the Basques and Sardinians, are clearly descended in the main from pre-steppe populations. What this suggests is that a broad socio-cultural prestige network mediated by males extended itself into regions where its cultural hegemony was not assured. Additionally, the autosomal genetic impact was modest, even if privileges given to particular male lineages allowed them to sweep other groups out of the gene pool.

Tamil history precipitates out only a little later than that of North Indian Indo-Aryan civilization. I suspect that this is not a coincidence, and that South Asia after the collapse of the IVC and the arrival of the Indo-Aryans and Mundas could be thought of as a broad mixing cauldron, genetically and culturally. In many regions, Dravidian languages persisted in the face of the expansive Indo-Aryan, but there was a cultural influence, likely reciprocal. This is why, once Indian civilization reemerged, its coherent unity set against peoples to the west and east was not strange despite the linguistic gap between the north and the south.

The only exception here might be the Munda. As I have said, R1a1a-Z93 is pervasive. But it is nearly absent among the Munda, who tend to carry relatively exotic Southeast Asian Y lineages such as O. I believe that the Munda were in some way losers in a cultural conflict, but they maintained themselves in the hills above the Gangetic plain.

Finally, two reflections, one navel-gazing, one big picture. Genome bloggers in the years around 2010 actually anticipated many of these results. There’s some hindsight bias here because you remember the times you are right and not the times you were wrong. We were right that there was more than one ANI pulse. Additionally, we were looking at the ratio between “Eastern European” and “West Asian” ancestry years ago and noticing the skewed patterns, with North Indian Brahmins biased toward the former and South Indian elite non-Brahmins skewed toward the latter. Chaubey 2010 suggested to us that something was different about the Munda not only in their East Asian ancestry but in their ANI/ASI ancestry. They just didn’t seem to have any Indo-European ancestry (steppe), and a lot of ASI. Over the past few years I’ve been suggesting that Dravidian languages were not primal to South India, but the product of a recent expansion (though part of this is due to scientific publications).

The truth was out there. It just took ancient DNA and the analytic chops of the Reich group and their collaborators to prune the tree of possibilities so that we could zero in on a few precise and likely models.

In general, I wonder about the role of clines, diffusions, and pulses. The models that the foremost practitioners of the science of ancient DNA utilize tend to assume pulse admixtures, rather than isolation-by-distance gene flow. This isn’t always a crazy assumption. But there was a discussion in the paper of a west-east admixture cline between Anatolian farmers and Iranian farmers. Is this cline due to admixture, or was it always there? A paper from a few years ago implied that early farmers were highly structured, structure that broke down later.

Also, the polytomy at the base of the eastern Eurasian human family tree, where all the major lineages diverge rapidly from each other, makes me wonder about gene flow vs. admixture. It seems possible that the polytomy may mask a phylogenetic tree topology which had gradually bifurcating nodes, if periodically a single daughter population replaced all its sister lineages in a local geographic zone. Much of history in human meta-populations may be characterized by isolation-by-distance and gene flow, erased by the extinction of most lineages and expansion of a favored lineage.

We’re descended from Lilith and Eve

From the comments:

Something that confused me very early on in the book- the San are shown branching off from the rest of humanity prior to Mitochondrial Eve. How can Eve be a common ancestor in this case? Admixture?

The commenter is talking about an early portion of Who We Are and How We Got Here. Someone who reads a book like that is “in the know,” and this is a reasonable question. But it points to a bigger issue that’s going to crop up with the complexification of the origin of anatomically modern humanity over the last few years, and proceeding forward.

An upside of the very-recent-out-of-Africa model, where all modern humans descended exclusively from a group of East Africans who lived ~50,000 years ago, is that it was very simple. So simple that you could write the model out on a postcard.

The new model benefits from being correct and making humans less sui generis (though perhaps that is a bug rather than a feature to some?), but it also forces more thought and complexity on the lay audience.

Calibration of the coalescence of the last common ancestor of all human mitochondrial DNA lineages has changed several times; the latest estimates put the time to the last common ancestor of all mtDNA lineages at around 100 to 200 thousand years ago. This is curious in light of the fact that both fossils and genomics are starting to suggest that anatomically modern humans emerged in their current form 200 to 400 thousand years ago.

The shallower coalescence isn’t that surprising. Y and mtDNA both have lower effective population sizes and so higher turnover rates. These high turnover rates mean the extinction of other lineages. As most of you know, the extinction of these mtDNA lineages does not mean that the genetic material of other women alive at the same time as “mtDNA Eve” is not present in modern humans (though who knows what it means to say there’s distinctive genetic material left after all these generations with recombination). Eve was always simply a personification of the coalescence of the mtDNA genealogy. Both the Y and mtDNA phylogenies and coalescence were useful in their time. They pointed to the likely important role of Africa in the origin of modern humans, and the relatively recent time depth of our species. But their coalescence at a specific time was somewhat random around a certain expected value. This is why it was not surprising at all that “Y chromosomal Adam” and “mtDNA Eve” lived at different times (there is some evidence that the Y chromosome has had a lower long-term effective population size).
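The link between effective population size and coalescence depth can be made explicit with the standard neutral coalescent expectation: for a sample of n gene copies drawn from a population of N gene copies, the expected time to the most recent common ancestor is 2N(1 - 1/n) generations. The sketch below assumes a constant-size, randomly mating population with an equal sex ratio. The population size is an arbitrary placeholder and not an estimate for humans.

```python
def expected_tmrca_generations(n_gene_copies, sample_size):
    """Expected time to the most recent common ancestor, in generations,
    under the neutral coalescent with constant population size."""
    return 2.0 * n_gene_copies * (1.0 - 1.0 / sample_size)

diploid_ne = 10_000   # hypothetical effective number of breeding individuals
sample_size = 100     # hypothetical number of sampled lineages

# Autosomes: every individual carries two copies, so 2 * Ne gene copies.
autosomal_tmrca = expected_tmrca_generations(2 * diploid_ne, sample_size)

# mtDNA: one copy, transmitted only by females, so Ne / 2 gene copies.
mtdna_tmrca = expected_tmrca_generations(diploid_ne / 2, sample_size)

ratio = autosomal_tmrca / mtdna_tmrca
print(f"autosomal locus: {autosomal_tmrca:.0f} generations")
print(f"mtDNA:           {mtdna_tmrca:.0f} generations")
print(f"ratio:           {ratio:.1f}x")
```

These are expectations only. Any single genealogy, including the one real mtDNA tree, scatters widely around its expected value, which is the "somewhat random" part.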

The above question is inspired by the fact that San Bushmen seem to diverge earlier in their total genome than in their mtDNA. There’s always been a distinction in the literature between the demographic divergence of two populations and the divergence of their genetic genealogies. Oftentimes daughter populations share genetic variation that dates back to before their separation. But sometimes you have this situation where it seems that the starting point of genetic variation post-dates the divergence between populations.

What’s the explanation? I think the simplest one is admixture and reciprocal gene flow, as implied by the commenter. In fact, Pontus Skoglund’s latest African ancient DNA paper implies that there was some sort of isolation-by-distance cline in the eastern part of the continent, from modern Ethiopia far to the south.

And, it may also turn out that the San Bushmen themselves are an admixture between two very different populations, one more like other eastern Africans, and one basal to this clade. If so, then it may be that their divergence estimate is a compound, and the most divergent mtDNA lineages come from the eastern African population that mixed with the more basal population.

The bigger answer is that we really need to move beyond the “mitochondrial Eve” story as being central. It had its time and played its role, but we can move beyond it. Otherwise, the public will be in for a big surprise as ancient DNA starts to uncover the story of a whole antediluvian world within Africa of anatomically modern humans that flourished for hundreds of thousands of years before a small branch left to populate the rest of the world ~50,000 years ago.

On the eons of salutary neglect


New preprint, Something old, something borrowed: Admixture and adaptation in human evolution. This part jumped out at me:

…Indeed, for most traits, the contribution of archaic human alleles to present-day human phenotypic variation is not significantly larger than those of randomly drawn non-introgressed alleles occurring at the same frequency in modern humans. Interestingly, in both studies, neurological and behavioral phenotypes are an exception, with Neanderthal alleles contributing more to variation in these traits than frequency-matched modern human alleles.

I joked that perhaps we can talk about people “acting like a Neanderthal” again?

But seriously, I was thinking today about one particular stage of human evolutionary history, the long sojourn outside of Africa for the ancestors of non-Africans (including “Basal Eurasians”) which produced a sustained bottleneck. In David Reich’s new book he alludes to it, and I’ve seen other mentions of it (this is an old idea).

How long was the bottleneck? What was the normal census size? What were the cultural implications of having a small isolated population?

The PSMC and MSMC diagrams I’ve seen don’t really answer my questions.

Carl Zimmer profile of the Reich group at work

The New York Times has a review up (sort of) of Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, David Reich Unearths Human History Etched in Bone. But since Carl has been covering the publications coming out of the Reich lab for many years now, it’s kind of a survey of the whole operation and how David and company got where they are.

The last few paragraphs are pretty tantalizing:

As of last month, Dr. Reich’s team has published about three-quarters of all the genome-wide data from ancient human remains in the scientific literature. But the scientists are only getting started.

They also have retrieved DNA from about 3,000 more samples. And the lab refrigerators are filled with bones from 2,000 more denizens of prehistory.

Dr. Reich’s plan is to find ancient DNA from every culture known to archaeology everywhere in the world. Ultimately, he hopes to build a genetic atlas of humanity over the past 50,000 years.

“I try not to think about it all at once, because it’s so overwhelming,” he said.

Three years ago I was having a discussion with someone from Reich’s group and mentioned offhand that in terms of getting data I give the nod to Eske Willerslev’s group of researchers, though I thought the people around David and Nick Patterson tended to perform a deeper analysis. Three years is a long time, and as the results since then have shown, the “SNP capture” methodology is very cost effective. They might not get the whole genome sequences of individuals, but they get lots of individuals. And for a lot of population genomic analysis, you want lots of individuals more than the whole genome sequence.

But not all. The more ancient individuals probably have a lot of variation “private” to them and their population, so you don’t know all the neat polymorphisms you might miss.

With that gripe submitted, it’s pretty incredible that the Reich lab has 3,000 ancient samples in the pipeline for analysis. In Who We Are and How We Got Here David Reich outlines just how he and his collaborators transformed the artisanal process of data generation from ancient DNA into a rationalized and commoditized factory process.

Turks are Anatolian under the hood, somewhat more Greek than Armenian

My post, Are Turks Armenians Under The Hood?, attracted a little bit of controversy. The main criticism, which was a valid one, is that I did not sample Anatolian Greeks. A reader passed on three Anatolian Greek samples. I also added a Cypriot data set. To my mild surprise, the Anatolian Greeks and Cypriots cluster together, at the end of the Greece cline toward West Asians. Therefore, for further analysis, I pooled the three Greeks with the Cypriots.

Additionally, there are two Balkan Turk samples. Even on the PCA it’s pretty clear that they’re genetically very different from the other Turks (one of them is from what has become Bulgaria), though the shift toward East Asians indicates that Turkification is very rarely a matter purely of religious conversion to Islam and assimilation of the Turkish language (obviously it initially is for many people, but these people then intermarry with those with some East Asian ancestry).

Demographic replacement in Southeast Asia during the Holocene

Well sometimes you feel silly, and it’s not your fault. Yesterday our podcast on Sundaland went live (we talked about Doggerland and Beringia too!). Though I expressed a fair amount of skepticism, I took the argument that Stephen Oppenheimer presented in Eden of the East, that modern Austronesians are long-term residents of Southeast Asia, seriously.

The alternative view, most forcefully put by Peter Bellwood in books such as First Farmers, is that Austro-Asiatic and Austronesian people were agriculturalists issuing out of southern China that transformed the region over the past 4,000 years (the Austronesians from Taiwan specifically, though during the Pleistocene Taiwan was connected to the mainland).

I lean toward Bellwood’s view, and today a preprint came out which basically confirms it in totality, Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia. The abstract:

Two distinct population models have been put forward to explain present-day human diversity in Southeast Asia. The first model proposes long-term continuity (Regional Continuity model) while the other suggests two waves of dispersal (Two Layer model). Here, we use whole-genome capture in combination with shotgun sequencing to generate 25 ancient human genome sequences from mainland and island Southeast Asia, and directly test the two competing hypotheses. We find that early genomes from Hoabinhian hunter-gatherer contexts in Laos and Malaysia have genetic affinities with the Onge hunter-gatherers from the Andaman Islands, while Southeast Asian Neolithic farmers have a distinct East Asian genomic ancestry related to present-day Austroasiatic-speaking populations. We also identify two further migratory events, consistent with the expansion of speakers of Austronesian languages into Island Southeast Asia ca. 4 kya, and the expansion by East Asians into northern Vietnam ca. 2 kya. These findings support the Two Layer model for the early peopling of Southeast Asia and highlight the complexities of dispersal patterns from East Asia.

The transition to full-fledged rice agriculture occurred in Vietnam ~4,000 years ago. In First Farmers Bellwood reports on an archaeological site dating to that period where skeletal evidence has been adduced to record the presence of both Northeast Asian and Australo-Melanesian types. These results make clear though that these hunter-gatherers in Southeast Asia are more similar to the Onge of the Andaman Islands, as well as the Negritos of the interior of the Malay peninsula. They’re totally in alignment with the earlier morphological results (also, readers might be curious to know that one site of the Hoabinhian culture is in Yunnan, China). This shouldn’t be surprising, as the Andaman Islands were a peninsula which extended from southern Burma during the Pleistocene.

Already the most accepted model for the introduction of intensive agriculture into Southeast Asia is that it was brought by Austro-Asiatic peoples. These results confirm that. Additionally, it seems clear that Austro-Asiatic ancestry made it to island Southeast Asia, whether directly or through Austronesian admixture before arriving in island Southeast Asia. Java and Bali have some of the higher fractions of the ancestry most closely associated with Austro-Asiatic groups on the mainland.

Deeper digging into the admixture distributions has long made it pretty evident that some areas in Indonesia had much higher Austronesian fractions than others, and it wasn’t just a function of distance from the Philippines. Why? My own hunch is that Austronesians brought social and cultural systems which were better adapted to island Southeast Asia, and were more fully able to exploit the local ecology. Meanwhile, aside from a few fringe areas such as the Malay peninsula and coastal Vietnam, they were not successful on the mainland.

The authors also detect migrations into Southeast Asia besides that of the Austro-Asiatics and Austronesians. One element seems correlated with the Tai migrations, and another with Sino-Tibetan peoples, most clearly represented in Southeast Asia by the Burmans. The excellent book, Strange Parallels: Volume 1, Integration on the Mainland: Southeast Asia in Global Context, c.800–1830, recounts the importance of the great migrations of the Tai people into Southeast Asia ~1000 A.D. Modern-day Thailand was once a flourishing center of Mon civilization, an Austro-Asiatic people related to the Khmers of Cambodia. The migrations out of the Tai highlands of southern China reshaped the ethnography of the central regions of mainland Southeast Asia. The Tai also attempted to take over the kingdoms of the Burmans. Though they failed in this, the Shan states of the highlands are the remnants of these attempts (tendrils of the Tai migrations made it to India, the Ahom people of Assam were Tai). Vietnam, shielded by the Annamese Cordillera, came through this period relatively intact. It is also well known that Cambodia’s persistence down to the present has much to do with the shielding it received from France in the 19th century in the wake of Thai expansion.

There are two bigger issues that this paper sheds light on. One is spatial, and the other is temporal.

They detect shared drift between Austro-Asiatic people and tribal populations in northeast India. This is not surprising. A 2011 paper found that Munda speaking peoples, whose variant of Austro-Asiatic is very different from that of Southeast Asia, are predominant carriers of Y chromosome O2a. This is very rare in Indo-European speaking populations, and nearly absent in Dravidian speaking groups. Additionally, their genome-wide patterns indicate some East Asian admixture, albeit a minority, while they carry the derived variant of EDAR, which peaks in Northeast Asia.

One debate in relation to the Munda people is whether they are primal and indigenous, or whether they are intrusive. The genetic data strongly point to the likelihood that they are intrusive. An earlier estimate of coalescence for O2a in South Asia suggested a deep history, but these dates have always been sensitive to assumptions, and more recent analysis of O2a diversity suggests that the locus is mainland Southeast Asia.

Now that archaeology and ancient DNA confirm Austro-Asiatic intrusion into northern Vietnam ~4,000 years ago, I think it also sheds light on when these peoples arrived in India. That is, they arrived < 4,000 years ago. As widespread intensive agriculture came to Burma ~3,500 years ago, I think that makes it likely that Munda peoples arrived in South Asia around this period.

I now believe it is likely that the presence of Austro-Asiatic, Dravidian, and Indo-Aryan languages in India proper was a feature of the period after ~4,000 years ago. None of the languages of the hunter-gatherer populations of the subcontinent remain, with the possible exception of isolates such as Nihali and Kusunda.

The temporal issue has to do with the affinities of these peoples, and how they relate to the settling of Eastern Eurasia. All the Southeast Asian groups after the original Australo-Melanesians share more of an affinity with the Tianyuan individual than Papuans do. The implication here is that Tianyuan is closer to the ancestors of various agriculturalists in Southeast Asia than just some random basal Eastern Eurasian. But since Tianyuan dates to 40,000 years ago, and is from the Beijing region, it is hard to make strong inferences from comparisons with it alone. The heartland of ancient Chinese culture in Henan was to the south of Tianyuan, after all. More samples are needed before one can truly tease out the pattern of isolation-by-distance vs. admixture that led to the emergence of the proto-farmer populations which settled Southeast Asia.

In the podcast above one thing that came up is that a lot of genetic data indicate decreased diversity as one moves from the south to the north in East Asia. This has long been taken to mean that humans migrated north, and so were subject to bottleneck effects. I pointed out that this may simply be a consequence of admixture between two very different groups of people in Southeast Asia, elevating diversity statistics.
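The admixture argument can be made concrete with a toy calculation. This is a minimal sketch, assuming a single biallelic locus, random mating, and made-up allele frequencies and admixture fraction (none of these numbers come from the preprint). Expected heterozygosity is 2p(1-p), and a population formed by mixing two diverged sources can be more diverse than either source.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def admixed_frequency(p1, p2, alpha):
    """Allele frequency in a population drawing a fraction alpha of its
    ancestry from source 1 (frequency p1) and the rest from source 2 (p2)."""
    return alpha * p1 + (1.0 - alpha) * p2


def admixed_heterozygosity(p1, p2, alpha):
    """Expected heterozygosity of the admixed population at this locus."""
    return heterozygosity(admixed_frequency(p1, p2, alpha))


if __name__ == "__main__":
    # Hypothetical frequencies for two strongly diverged source populations.
    p_source_a, p_source_b, alpha = 0.1, 0.9, 0.5
    print("source A:", heterozygosity(p_source_a))
    print("source B:", heterozygosity(p_source_b))
    print("admixed: ", admixed_heterozygosity(p_source_a, p_source_b, alpha))
```

With these illustrative values each source has heterozygosity of about 0.18 while the mixed population reaches 0.5, so a south-to-north decline in diversity does not by itself require serial bottlenecks.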

And yet as the map at the end of the preprint suggests it is highly plausible that Pleistocene Asia was marked by a south to north dynamic of migration. The Austro-Asiatic peoples who migrated south during the Holocene may simply have been backtracking the migration of their ancestors. What these results, and ancient DNA more generally, tell us is that humans were often on the move. The Pleistocene world of climate change probably meant that humans had to be on the move.

White modern Northern Europeans are genetically more like brown South Asians than brown(ish) ancient Northern Europeans were

The Guardian has a piece by Aarathi Prasad, Thanks to Cheddar Man, I feel more comfortable as a brown Briton. Dr. Prasad is a geneticist, so the science is pretty decent (she’s probably seen the documentary ahead of time too).

But there is a curious quirk here and it reveals something about human psychology: modern Britons are genetically much closer to South Asians, like Aarathi Prasad, than to these ancient darker-skinned Britons. The plot to the left illustrates this (it’s using the Dystruct package). The far right of the top panels represents South Asians. You can see Europeans pretty clearly. Let’s note two things:

1) Modern Europeans (except for Sardinians) share an orange “steppe” component with most South Asians (these are no doubt Indo-European migrations of the Bronze Age)

2) The brown element represents European hunter-gatherers. This element is found at varying quantities across Europe, with the lowest fractions in Sardinians. Though present in South Asians (this may or may not be an artifact to be honest), it’s not present at very high frequencies.

One always has to be careful about taking these proportions as literal representations of ancestral populations. They are not. But what they show is that modern Northern Europeans and South Asians have been touched by the same population movements over the past 5,000 years, and so are genetically much closer than the people who lived in Northern Europe and South Asia 5,000 years ago.

Humans are a visual species. In a pre-modern environment, physical cues were important for group identity, though I suspect just as much due to scarification and tattooing as phenotypic differences due to biology. The fact that Cheddar Man, and Paleolithic hunter-gatherers in Western Europe more generally, probably resembled modern South Asians more than they do modern Northern Europeans (I think they were more likely to be olive-brown than dark-brown, but I’m not confident), is more salient to human folk biology than the fact that modern Northern Europeans are much closer genetically to South Asians than the more “brown” ancient Northern Europeans.

Stuff like this always reminds me of the deep wisdom in Arthur C. Clarke’s Childhood’s End. The ultimately benevolent alien species which mentored humanity shielded us from their physical appearance because they knew we’d find it horrifying. The substance of what they did for us, who they were, was going to be less important to immature humans than the fact of what they looked like.

Note: Fst between Sindhi from Pakistan and WHG (Cheddar Man was one) is 0.087. Sindhi from Pakistan and English is 0.023. English to WHG is 0.058 (source). Fst cannot be naively interpreted as “genetic distance.” But this gets at the fact that Mesolithic European hunter-gatherers were very distant from modern South Asians. And widespread gene flow and admixture over the past 5,000 years has compressed a lot of genetic differences which were starker across geography in the past.
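The three values in the note can be laid side by side in a few lines. This is only a bookkeeping sketch using the numbers quoted above. The helper names are made up for illustration, and, as the note says, ranking by Fst is a rough guide rather than a literal genetic distance.

```python
# Pairwise Fst values exactly as quoted in the note above.
FST = {
    ("Sindhi", "WHG"): 0.087,
    ("Sindhi", "English"): 0.023,
    ("English", "WHG"): 0.058,
}


def fst(a, b):
    """Look up Fst for an unordered pair of population labels."""
    return FST.get((a, b), FST.get((b, a)))


def ranked_pairs():
    """Return (pair, Fst) tuples sorted from smallest to largest Fst."""
    return sorted(FST.items(), key=lambda item: item[1])


def closer_to(target, first, second):
    """Return whichever of two populations has the lower Fst to target."""
    return first if fst(target, first) < fst(target, second) else second


if __name__ == "__main__":
    for pair, value in ranked_pairs():
        print(pair, value)
    print("English are closer to:", closer_to("English", "Sindhi", "WHG"))
```

On these numbers the modern English sit closer to the Sindhi (0.023) than to the Western hunter-gatherers who lived in Britain (0.058), which is the point of the post.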

Ancient DNA and Dystruct


There’s a new preprint, Inference of population structure from ancient DNA, which uses explicit demographic models to make inferences about ancestry. I haven’t dug into the guts of the math, but, the outputs are quite interesting.

What seems to be obvious is that Western Eurasia has a much richer set of models to choose from than elsewhere. European, Middle Eastern and South Asian populations exhibit the greatest difference between Dystruct and Admixture.