The phylogenetic trees falling on the tundra


A massive new ancient DNA preprint just dropped, The population history of northeastern Siberia since the Pleistocene:

…Here, we report 34 ancient genome sequences, including two from fragmented milk teeth found at the ~31.6 thousand-year-old (kya) Yana RHS site, the earliest and northernmost Pleistocene human remains found. These genomes reveal complex patterns of past population admixture and replacement events throughout northeastern Siberia, with evidence for at least three large-scale human migrations into the region. The first inhabitants, a previously unknown population of “Ancient North Siberians” (ANS), represented by Yana RHS, diverged ~38 kya from Western Eurasians, soon after the latter split from East Asians. Between 20 and 11 kya, the ANS population was largely replaced by peoples with ancestry from East Asia, giving rise to ancestral Native Americans and “Ancient Paleosiberians” (AP), represented by a 9.8 kya skeleton from Kolyma River. AP are closely related to the Siberian ancestors of Native Americans, and ancestral to contemporary communities such as Koryaks and Itelmen. Paleoclimatic modelling shows evidence for a refuge during the last glacial maximum (LGM) in southeastern Beringia, suggesting Beringia as a possible location for the admixture forming both ancestral Native Americans and AP. Between 11 and 4 kya, AP were in turn largely replaced by another group of peoples with ancestry from East Asia, the “Neosiberians” from which many contemporary Siberians derive. We detect additional gene flow events in both directions across the Bering Strait during this time, influencing the genetic composition of Inuit, as well as Na Dene-speaking Northern Native Americans, whose Siberian-related ancestry components is closely related to AP. Our analyses reveal that the population history of northeastern Siberia was highly dynamic, starting in the Late Pleistocene and continuing well into the Late Holocene. 
The pattern observed in northeastern Siberia, with earlier, once widespread populations being replaced by distinct peoples, seems to have taken place across northern Eurasia, as far west as Scandinavia.

The preprint is very interesting and thorough, and the supplements are well over 100 pages. I read the genetics and linguistics portions. They make for some deep reading, and I really regret making fun of Iosif Lazaridis’ fondness for acronyms now.

I will make some cursory and general observations. First, the authors got really high coverage (so high quality) genomes from the Yana RHS site. Notice that they’re doing more data-intense analytic methods. Second, they did not find any population with the affinities to Australo-Melanesians that several research groups have found among some Amazonians. Likely they are hiding somewhere…but the ancient DNA sampling is getting pretty good. We’re missing something. Third, I am not sure what to think about the very rapid bifurcation of lineages we’re seeing around ~40,000 years ago.

The ANS population, ancestral by and large to ANE, seems to be about ~75% West Eurasian (without much Basal Eurasian) and ~25% East Eurasian. Or at least that’s one model. Did they then absorb other peoples? Or, was there an ancient population structure in the primal ur-human horde pushing out of the Near East? That is, are the “West Eurasians” and “East Eurasians” simply the descendants of original human tribes venturing out of Africa ~50,000 years ago? Also, rather than discrete West Eurasian and East Eurasian components, perhaps there was a genetic cline where the proto-ANS occupied a position closer to the former, as opposed to some later pulse admixture?

Without more ancient DNA we probably won’t be able to resolve the various alternative models.

Nomads, cosmopolitan predators, and peasants, xenophobic producers

Ten years ago when I read Peter Heather’s Empires and Barbarians, its thesis that the migrations and conquests of the post-Roman period were at least in part folk wanderings, where men, women, and children swarmed into the collapsing Empire en masse, was somewhat edgy. Today Heather’s model has to a large extent been validated. The recent paper on the Lombard migration, the discovery that the Lombards were indeed by and large genetically coherent as a transplanted German tribe in Pannonia and later northern Italy, confirms the older views which Heather attempted to resurrect. Additionally, the Lombards also seem to have been defined by a dominant group of elite male lineages.

Why is this even surprising? Because to a great extent, the ethnic and tribal character of the post-Roman power transfer between Late Antique elites and the newcomers was diminished and dismissed for decades. I can still remember the moment in 2010 when I was browsing books on Late Antiquity at Foyles in London and opened a page on a monograph devoted to the society of the Vandal kingdom in North Africa. The author explained that though the Vandals were defined by a particular set of cultural codes and mores, they were to a great extent an ad hoc group of mercenaries and refugees, whose ethnic identity emerged de novo on the post-Roman landscape.

In the next few years, we will probably get Vandal DNA from North Africa. I predict that they will be notably German (though with admixture, especially as time progresses). Additionally, I predict most of the males will be haplogroup R1b or I1. But the Vandal kingdom was actually one where there was a secondary group of barbarians: the Alans. It was Regnum Vandalorum et Alanorum. I predict that Alan males will be R1a. In particular, R1a1a-Z93.

But this post is not about the post-Roman world. Rather, it’s about the Inner Asian forest steppe. The sea of grass, stretching from the Altai to the Carpathians. A new paper in Science adds more samples to the story of the Srubna, Cimmerians, Scythians, and Sarmatians. Ancient genomes suggest the eastern Pontic-Caspian steppe as the source of western Iron Age nomads. The abstract is weirdly nonspecific, though accurate:

For millennia, the Pontic-Caspian steppe was a connector between the Eurasian steppe and Europe. In this scene, multidirectional and sequential movements of different populations may have occurred, including those of the Eurasian steppe nomads. We sequenced 35 genomes (low to medium coverage) of Bronze Age individuals (Srubnaya-Alakulskaya) and Iron Age nomads (Cimmerians, Scythians, and Sarmatians) that represent four distinct cultural entities corresponding to the chronological sequence of cultural complexes in the region. Our results suggest that, despite genetic links among these peoples, no group can be considered a direct ancestor of the subsequent group. The nomadic populations were heterogeneous and carried genetic affinities with populations from several other regions including the Far East and the southern Urals. We found evidence of a stable shared genetic signature, making the eastern Pontic-Caspian steppe a likely source of western nomadic groups.

The German groups which invaded the Western Roman Empire were agropastoralists. That is, they were slash and burn farmers who raised livestock. Though they were mobile, they were not nomads of the open steppe. Man for man the Germans of Late Antiquity had more skills applicable to the military life than the Roman peasant. This explains in part their representation in the Roman armed forces in large numbers starting in the 3rd century. But the people of the steppe, pure nomads, were even more fearsome. Ask the Goths about the Huns.

Whole German tribes, like the Cimbri, might coordinate for a singular migration for new territory, but for the exclusive pastoralist, their whole existence was migration. Groups such as the Goths and Vandals might settle down, and become primary producers again, but pure pastoralists probably required some natural level of predation and extortion upon settled peoples to obtain a lifestyle beyond marginal subsistence. Which is to say that some of the characterizations of Late Antique barbarians as ad hoc configurations might apply more to steppe hordes.

There has been enough work on these populations over the past few years to admit that various groups have different genetic characteristics, indicative of a somewhat delimited breeding population. But, invariably there are outliers here and there, and indications of periodic reversals of migration and interactions with populations from other parts of Eurasia.

Earlier I noted that Heather seems to have been correct that the barbarian invasions of the Roman Empire were events that involved the migration of women and children, as well as men. The steppe was probably a bit different. Here are the Y and mtDNA results for males from these data that are new to this paper:


Do the northern Chinese have Scythian ancestors?

There was some question regarding possible Scythian admixture into the early Zhou below. This is possible because the Zhou dynasty, arguably the foundational one of Chinese imperial culture (the Shang would have been alien to Han dynasty Chinese, but the Zhou far less so), may have had interactions with Indo-European peoples to their north and west. This has historical precedent as the Tang dynasty emerged from the same milieu 1,500 years later, albeit the Tang were descended from a Turkic tribe, not Indo-Europeans.

I looked at some of my samples and divided the Han into a northern and southern cluster based on their position on a cline (removing the majority in between). I also added Lithuanians, Sardinians, Uyghurs, Mongols, and Yakut. As you can see on the PCA the Mongols are two clusters, so I divided them between Mongol and Mongol2.


Vietnamese are not that much like the Cambodians

A comment below suggested another book on Vietnamese history, which I am endeavoring to read in the near future. The comment also brought up issues relating to the ethnogenesis of the Vietnamese people, their relationship to the Yue (or lack thereof) and the Khmer, and also the Han Chinese.

Obviously, I can’t speak to the details of linguistics and area studies history. But I can say a bit about genetics because over the years I’ve assembled a reasonable data set of Asians, both public and private. The 1000 Genomes collected Vietnamese from Ho Chi Minh City in the south. I compared them to a variety of populations using ADMIXTURE with 5 populations.


You can click to enlarge, but I can tell you that the Vietnamese samples vary less than the Cambodian ones, and resemble Dai more than the other populations. The Dai were sampled from southern Yunnan, in China, and historically were much more common in southern China, before their assimilation into the Han (as well as the migration of others to Southeast Asia).

Curiously, I have four non-Chinese samples from Thailand, and they look to be more like the Cambodians. This aligns well with historical and other genetic evidence that the Thai identity emerged from the assimilation of Tai migrants into the Austro-Asiatic (Mon and Khmer) substrate.

Aside from a few Vietnamese who seem Chinese, or a few who are likely Khmer or of related peoples, the Vietnamese do seem to have some Khmer ancestry. Or something like that.


Avars across a sea of grass

That sound you hear is the rumbling of the earth caused by the rippling tsunami that’s coming. The swell of ancient DNA papers focused on historical, rather than prehistorical, time periods. Some historians are cheering. Some are fearful. Others know not what to think. It will be. The illiterate barbarians of yore shall come out of the shadows.

If they had arrived on the edge of Europe two centuries earlier, the Avars would have a reputation as fearsome as that of the Huns, with whom they are often confused, and rightly so. But the Avars emerged as a force on the European landscape after the end of the West Roman Empire. The post-Roman polities did not have their own Ammianus Marcellinus (sorry Bede, you lived in the middle of nowhere).

And yet for centuries the Avars dominated east-central Europe and held the numerous Slavic tribes in thrall. They smashed past the borders of Byzantium during the reign of the heir of Justinian, and by 600 AD, on the eve of the great battle with Persia, Constantinople had lost control of most of its Balkan hinterlands to these barbarians. A Byzantium which still controlled North Africa, much of Italy, southern Spain, Egypt, Anatolia, and the Levant, had been reduced to strongpoints all around the Balkan littoral. During the wars with the Sassanids, the Avars took advantage of the opportunity offered, and even raided the suburbs of Constantinople itself!

So who were these people? The most plausible conjecture is that they were part of the great mass mobilization of Turkic peoples which began in the early centuries of the first millennium after Christ. As Rome and Han China fell, nomadic barbarians rose. A new preprint seems to all but confirm this, Inner Asian maternal genetic origin of the Avar period nomadic elite in the 7th century AD Carpathian Basin:

After 568 AD the nomadic Avars settled in the Carpathian Basin and founded their empire, which was an important force in Central Europe until the beginning of the 9th century AD. The Avar elite was probably of Inner Asian origin; its identification with the Rourans (who ruled the region of today’s Mongolia and North China in the 4th-6th centuries AD) is widely accepted in the historical research. Here, we study the whole mitochondrial genomes of twenty-three 7th century and two 8th century AD individuals from a well-characterised Avar elite group of burials excavated in Hungary. Most of them were buried with high value prestige artefacts and their skulls showed Mongoloid morphological traits. The majority (64%) of the studied samples’ mitochondrial DNA variability belongs to Asian haplogroups (C, D, F, M, R, Y and Z). This Avar elite group shows affinities to several ancient and modern Inner Asian populations. The genetic results verify the historical thesis on the Inner Asian origin of the Avar elite, as not only a military retinue consisting of armed men, but an endogamous group of families migrated. This correlates well with records on historical nomadic societies where maternal lineages were as important as paternal descent.

The samples were from a period about a century after the arrival of the Avars. It is not unreasonable to think that the Avar conquest meant that a continuous stream of Inner Asian pastoralists kept entering into the territory which they occupied for the opportunity, but this sort of genetic distinctiveness indicates that the Avars remained very separate from the people from whom they extracted tribute. Most, though not all, of these people, were or became Slavs.

Around 800 AD the Avars were finally defeated decisively by the Franks, and their elite converted to Christianity. I suspect this was the final step which would result in their assimilation over the next few centuries into the local population until they diminished and disappeared.

The results above support the proposition that the Pannonian Avars of the second half of the 6th century were the descendants of the Rouran Khaganate of the early half of the 6th century. The kicker is that the Rouran flourished in Mongolia! So like the Mongols six hundred years later, the Avars seem to have swept across the entire length of Eurasia that was accessible to their horses in a generation. To some extent, this is a recapitulation of the pattern we see nearly 3,000 years before the Avars, when the Afanasievo culture established itself in the Altai region, far from its clear point of origin in the forest-steppe of Eastern Europe.

Perhaps the period between 500 BC and 300 AD can be seen as an ephemeral transient between the vast periods before and after when pastoralists had free rein across most of temperate Eurasia?

The genetics of Afrikaners (again)


I personally get asked about the genetics of Afrikaners, because I’ve written about/analyzed the issue before. The main outlines seem to be established, but I thought I might go and revisit it again. The main reason is that we have ancient South African DNA, and I’ve been adding it to my personal analyses for a while. It might be worthwhile to reanalyze the South Africa samples I do have with some of these added in.

The plot at the top shows the core populations I started with. I did some outlier pruning. I only kept the South African samples that were overwhelmingly white. I picked Malays and a South Indian population because of Cape Coloureds, a mixed-race Afrikaans-speaking group which has Asian ancestry that can be attributed to both South and Southeast Asian populations (the Dutch imported many slaves from India and had outposts in Java). I also used Bantu samples from South Africa and Kenya, as well as a Nigerian population. Finally, I also had some Hadza as a different hunter-gatherer population than the San Bushmen. For Europeans, I used white Dutch.

The final marker density was 200,000 SNPs, so not too bad.

As you can see if you click on the image all of the South African whites were shifted away from the Dutch. There were two outlier individuals, one of which was closer to the Dutch cluster, and one further. All the other individuals form a neat cluster. None of these individuals were close relatives.
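For readers who want a feel for what a plot like this is built from, here is a minimal sketch of PCA on genotype data. Everything in it is an assumption for illustration: the genotypes are simulated, the two "populations" are hypothetical, and the per-SNP scaling is just one common convention, not necessarily what produced the plot above.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA on an individuals x SNPs matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                     # per-SNP allele frequency
    keep = (p > 0) & (p < 1)                     # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))   # center and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated data only: two hypothetical populations with drifted frequencies.
rng = np.random.default_rng(1)
n_snps = 2000
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(20, n_snps))
pop_b = rng.binomial(2, freq_b, size=(20, n_snps))

pcs = genotype_pca(np.vstack([pop_a, pop_b]))
print("PC1, population A:", pcs[:20, 0].round(1))
print("PC1, population B:", pcs[20:, 0].round(1))
```

In a sketch like this the two simulated groups fall on opposite sides of the first component. An admixed individual would sit somewhere on the line between its source clusters, which is the sort of shift described above.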


I ran Treemix on the data with multiple migrations until the migrations stopped making sense to me. The African populations exhibit migration flows to each other. Much of it is entirely comprehensible. The Esan receive no migration, highlighting that this population did not receive gene flow from any groups in these data. The Kenya Bantus receive gene flow from the direction of Eurasians. This is almost certainly Nilotic-mediated. The gene flow they receive from the base of the ancient San is more enigmatic, but probably reflects uptake of local ancestry as the Bantus expanded. The southern Bantus receive gene flow from modern San.

The South African whites receive gene flow from a position on the graph between the modern San and other non-San African groups.


Next, I ran Admixture in the unsupervised mode with K = 6. The two populations mostly light-blue are South African whites and the Dutch, from the top to the bottom. You can see though that the South African whites clearly have other ancestral components. Most of these individuals have the components modal in the San, Esan Nigerians, Indians, and Malays. The two outlier individuals are also clear. The individual very close to the Dutch, but shifted toward the Asians, in the PCA does not have any African admixture. The individual shifted more toward the non-Europeans in the PCA also has more non-European fractions of ancestral components (that is, those components modal in non-European populations).

Next, I decided to confirm things by running a three population test. If you read this blog you’ve seen this before. Basically this is measuring shared ancestry by looking at deviations from a particular phylogenetic model: (test population,(pop1, pop2)). Evidence that the test population is a mix of groups related to pop1 and pop2 comes from a negative f3 statistic, and I focused on results with z-scores below -2.

Here they are:

Test                Pop1         Pop2                f3       z
Bantu_NE            EsanNigeria  Dutch               -0.0009  -6.54
Bantu_NE            EsanNigeria  South_Africa_White  -0.0010  -6.54
Bantu_NE            EsanNigeria  Malay               -0.0009  -6.33
Bantu_NE            EsanNigeria  Telegu              -0.0008  -6.00
Bantu_NE            Bantu_S      South_Africa_White  -0.0008  -4.84
Bantu_NE            Bantu_S      Dutch               -0.0008  -4.77
Bantu_NE            Bantu_S      Malay               -0.0007  -4.21
Bantu_NE            Bantu_S      Telegu              -0.0007  -4.05
Bantu_NE            Dutch        San_Ancient         -0.0009  -3.02
Bantu_NE            Hadza        EsanNigeria         -0.0004  -2.97
Bantu_NE            Telegu       San_Ancient         -0.0007  -2.32
Bantu_NE            Malay        San_Ancient         -0.0007  -2.04
Bantu_S             EsanNigeria  San_Modern          -0.0028  -21.62
Bantu_S             EsanNigeria  San_Ancient         -0.0039  -20.78
Bantu_S             San_Ancient  Bantu_NE            -0.0030  -12.91
Bantu_S             San_Modern   Bantu_NE            -0.0019  -12.45
Bantu_S             Dutch        San_Ancient         -0.0031  -10.63
Bantu_S             Telegu       San_Ancient         -0.0030  -10.33
Bantu_S             San_Ancient  South_Africa_White  -0.0027  -9.17
Bantu_S             Malay        San_Ancient         -0.0029  -8.97
San_Modern          Dutch        San_Ancient         -0.0091  -34.96
San_Modern          Telegu       San_Ancient         -0.0087  -33.86
San_Modern          San_Ancient  South_Africa_White  -0.0089  -33.54
San_Modern          San_Ancient  Bantu_NE            -0.0063  -31.93
San_Modern          Malay        San_Ancient         -0.0085  -30.98
San_Modern          Bantu_S      San_Ancient         -0.0052  -28.91
San_Modern          Hadza        San_Ancient         -0.0051  -27.58
South_Africa_White  Dutch        Bantu_NE            -0.0017  -12.96
South_Africa_White  EsanNigeria  Dutch               -0.0017  -12.68
South_Africa_White  San_Modern   Dutch               -0.0018  -12.41
South_Africa_White  Bantu_S      Dutch               -0.0017  -12.36
South_Africa_White  Dutch        San_Ancient         -0.0021  -12.14
South_Africa_White  Hadza        Dutch               -0.0014  -10.41
South_Africa_White  Malay        Dutch               -0.0007  -5.97
South_Africa_White  Telegu       Dutch               -0.0003  -3.64
Telegu              Malay        Dutch               -0.0004  -2.79
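To make the statistic in the table a little more concrete, here is a minimal sketch of the idea behind f3(C; A, B), where C is the test population (the first column) and A and B are the two candidate sources. It uses simulated allele frequencies, a naive estimator, and equal-sized SNP blocks for the jackknife. All of those are simplifying assumptions; the real software additionally corrects for finite sample size and defines blocks along the genome.

```python
import numpy as np

def f3_with_z(c, a, b, n_blocks=50):
    """Naive f3(C; A, B) with a delete-one block jackknife z-score.

    c, a, b are per-SNP allele frequencies. Blocks are equal-sized runs of
    SNPs, which is an assumption made purely for illustration.
    """
    c, a, b = (np.asarray(x, dtype=float) for x in (c, a, b))
    per_snp = (c - a) * (c - b)
    total, n = per_snp.sum(), per_snp.size
    blocks = np.array_split(per_snp, n_blocks)
    # Re-estimate f3 with each block left out in turn.
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    est = total / n
    return est, est / se

# Simulated frequencies only: a hypothetical population that is a
# 90/10 mix of two unrelated sources, with no drift after mixing.
rng = np.random.default_rng(0)
n_snps = 5000
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = rng.uniform(0.05, 0.95, n_snps)
freq_mix = 0.9 * freq_a + 0.1 * freq_b

mix_f3, mix_z = f3_with_z(freq_mix, freq_a, freq_b)   # admixed as the test
src_f3, src_z = f3_with_z(freq_a, freq_mix, freq_b)   # a source as the test
print(f"mixed population as test: f3={mix_f3:.4f}, z={mix_z:.1f}")
print(f"source population as test: f3={src_f3:.4f}, z={src_z:.1f}")
```

The mixed population comes out negative because its frequencies sit between those of its sources, while a source used as the test comes out positive. Drift in the test population after admixture pushes the statistic back up, so a non-negative value does not rule out admixture.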

 

No surprises so far. One thing that did surprise me though was the extent of the admixture even after PCA outlier removal. So I took the output you saw above and removed individuals that were very mixed, except for the case of the white South Africans. Then, I ran Admixture in supervised mode, where the “pure” populations were fixed as references (I merged the modern San without much admixture with the ancient San). You can see the results below:


Re-running the three population test with these “pure” populations I only got significant results for the below cases:

Test                Pop1         Pop2                f3       z
South_Africa_White  Dutch        EsanNigeria         -0.0017  -13.1937
South_Africa_White  San          Dutch               -0.0020  -12.6910
South_Africa_White  Hadza        Dutch               -0.0014  -9.7246
South_Africa_White  Malay        Dutch               -0.0009  -6.6481
South_Africa_White  Telegu       Dutch               -0.0004  -4.6167

No big surprise.

The average European ancestry I got in my South African white samples, N = 12, is 93.5%. Treating them as a composite individual, note that if someone had one great-great-grandparent who was not European, they would be expected to have 6.25% non-European ancestry. That’s 4 generations back. So about 100 years. These individuals are presumably adults. Let’s say they are 25 years old. That goes back 125 years. In a single-person admixture model it’s probably reasonable to suggest it was sometime in the mid to late 19th century.

This seems unlikely. The evenness of admixture and balance between different groups indicates that it is older than that, and they are obtaining it from different lineages. Traditional genealogical estimates suggested a range of 5-7.5% non-European ancestry in Afrikaners, and one study of 185 individuals showed 18% non-European mtDNA.
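The back-of-the-envelope arithmetic for the single-ancestor scenario can be written out as a small sketch. The 25-year generation time and the 25-year-old sampled adult are the round numbers used above. The function names are mine, and the halving per generation is the expected share rather than what any one person actually inherits.

```python
def expected_fraction(generations_back):
    """Expected autosomal share from one ancestor this many generations back."""
    return 0.5 ** generations_back

def years_before_sampling(generations_back, generation_years=25, age=25):
    """Rough time depth: generations times generation length, plus the person's age."""
    return generations_back * generation_years + age

# 1 = parent, 2 = grandparent, 3 = great-grandparent, 4 = great-great-grandparent
for g in range(1, 7):
    print(g, f"{expected_fraction(g):.2%}", years_before_sampling(g))
```

The same total can just as easily come from many small contributions further back in the pedigree, which is why the evenness across individuals matters more than the average for dating the admixture.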

I will probably do some ancestry deconvolution and see if I can get a figure for the time of admixture (though the fractions here are very small, as is the sample size of the admixed population). But the non-European ancestry of Afrikaners is uncannily similar to the non-European ancestry of the Cape Coloureds. That to me leads us to the conclusion that in the early European settler community a fair number of mixed-race women married in. Those mixed-race women who married mixed-race men helped found the Cape Coloureds.

Live not by the haplogroup alone

In The population genomics of archaeological transition in west Iberia the authors note that “the population of Euskera speakers shows one of the maximal frequencies (87.1%) for the Y-chromosome variant, R1b-M269…” In the early 2000s the high frequency of R1b-M269 among the Basques, a non-Indo-European linguistic isolate, was taken to be suggestive of the possibility that R1b-M269 reflected ancestry from European hunter-gatherers present when farmers and pastoralists pushed into the continent.

The paper above shows that the reality is that the Basque people have higher fractions of Neolithic farmer ancestry than any other Iberian people. Additionally, they have lower fractions of the steppe pastoralist ancestry than other Iberian groups. This, despite the fact that we also know from ancient DNA that R1b-M269 does seem to have spread with steppe pastoralists, likely Indo-Europeans.

Obviously the relationship between Y chromosomes and genome-wide ancestry is complex. The pattern here indicates that Indo-European male lineages were assimilated into the Basques. Perhaps the Basque were matrilineal? One can’t know. But, these men did not impose their culture. Instead, they were assimilated into the Basque. This is entirely not shocking. The history of contact between different peoples in the recent past shows plenty of cases where individuals have “gone native.” In some cases, many individuals.

I was thinking this when looking at South Asian Y chromosome frequencies. Though R1a1a is correlated with higher castes and Indo-European speakers, its frequency is quite high in some ASI-enriched groups. I suspect that the period after 2000 BC down to the Common Era witnessed a dynamic where particular patrilineal societies were quite successful in maintaining their status over generations. Additionally, the ethnogenesis of “Indo-Aryan” and “Dravidian” India was occurring over this period, in some cases through a process of expansion, integration, and conflict. It seems some pre-Aryan paternal lineages were assimilated into Brahmin communities. For example, Y haplogroup R2, whose origin is almost certainly in the Indus Valley Civilization society.

Some population genetic models are stylized and elegant. They have to be to be tractable. But we always need to remember that real history and prehistory were complex, and exhibited a richer and more chaotic texture.

Why the Y chromosome is coming back


Last week Spencer and I talked about chromosomes and their sociological import on The Insight. It was a pretty popular episode, but then again, my post on the genetics of Genghis Khan is literally my most popular piece of writing of all time which wasn’t distributed in a non-blog channel (hundreds of thousands of people have read it). Thanks to everyone who left a review on iTunes and Stitcher (well, a good review). We’re getting close to my goal of 100 reviews on iTunes and 10 on Stitcher so that I won’t pester you about it.

Of course the reality is that the heyday of Y chromosomal population genetic studies was arguably about 15 years ago, when Spencer wrote The Journey of Man. I have personally constructed Y phylogenies before…but as you know from reading this weblog, I tend to look at genome-wide autosomal studies. There is a reason why Who We Are and How We Got Here focuses on autosomal data.

All that being said, Y (and mtDNA) still have an important role to play in understanding the past: sociological dynamics. The podcast was mostly focused on star phylogenies, whether it be the Genghis Khan haplotype, or the dominant lineages of R1a and R1b. Strong reproductive skew does have genome-wide effects, but unless it’s polygyny as extreme as an elephant seal’s those effects are going to be more subtle than what you see in the Y and mtDNA.

Submitted for your approval, two recent preprints on bioRxiv: The role of matrilineality in shaping patterns of Y chromosome and mtDNA sequence variation in southwestern Angola and Cultural Innovations influence patterns of genetic diversity in Northwestern Amazonia. The future is going to be in understanding sexual dynamics and culture.

Migration at the roof of West Asia


The figure to the left is from The genetic prehistory of the Greater Caucasus. If you are a regular reader of this weblog, or Eurogenes, you can figure out what’s going on, and keep track of the terminology. But in 2018 I think we’re getting to the end of the line in making sense of “admixture graphs” in relation to West Eurasian population structure. The models are just getting too complicated to keep everything straight, and the distinct-populations-subject-to-pulse-admixture seems to be an assumption that may not necessarily hold.

To get a sense of what I’m talking about, the above preprint focuses on populations in and around the Caucasus region. One of the major reasons that this is important is that the Caucasus was and is to some extent a continental hinge, connecting Eastern Europe and the Pontic steppe, to the Near East. The Arab Muslims pushed north of the Caucasus, and came into conflict with the Khazars, while Cimmerians and Scythians moved south from the Pontic steppe.

The elephant in the room is the relevance to the “Indo-European controversy.” Colin Renfrew long ago posited that the Indo-European languages derive from West Asian farmers who expanded into Europe as early as ~9,000 years ago. A rival theory is that Indo-Europeans spread out of the Pontic steppe ~4,000 years ago. In 2015 two major papers suggested that the steppe was a major source of Indo-European expansion. Case closed? This preprint suggests perhaps not.

But we’ll get to that later. What do the results here show? The prose is a little hard to tease apart, but the major issues seem to be that in antiquity, or at least the period they’re focusing on, much of the gene flow seems to have been south (Near East) to the north (through the Caucasus, and out to the north slope). To some extent, we already knew this: the Yamna people of the Pontic steppe have “southern” ancestry from the Near East that earlier East European/Pontic people do not. In this preprint, the authors show that groups such as the Maykop of the north slope of the Caucasus carry Y haplogroups such as G2, and not the R1 lineages commonly found in the steppe. David W. suggests that this confirms that Near Eastern gene flow into the steppe was female-mediated.  This is plausible, but I would caution that Y chromosomes alone can be deceptive, due to the power of particular patrilineages. We’ll probably rely on the X chromosome to make a final judgment.

The plot below shows many of the relationships as a function of location and time. The green component is modal among “Iranian farmers,” the orange among “Anatolian farmers,” and the blue among “Western hunter-gatherers.”

A major aspect of this preprint is that it has to work hard to differentiate two Anatolian farmer-like signals: the first, from Anatolian farmers proper, and the second from the descendants of European farmers, who themselves are a mix of Anatolian farmers with a minority ancestry among the hunter-gatherers. The answers would probably be totally unintelligible if not for archaeology. It’s clear that the steppe people had contact with both European and Near Eastern farmers and that later East European groups that succeeded the Yamna were subject to reflux from Central Europe, and received European farmer ancestry.

Another curious nugget in their results is that there was early detection of both Ancestral North Eurasian (ANE) ancestry and some East Eurasian gene flow (related to Han Chinese). One of their individuals carries the East Eurasian variant of EDAR, which today is only found in Finns, though it was found in reasonable frequencies among the Motala hunter-gatherers of Scandinavia. Additionally, Fu et al. 2016 found that the ancestors of Mesolithic hunter-gatherers received some gene flow from Eastern Eurasians as well (also in the supplements of Lazaridis et al. 2016).

The authors admit that there is probably population structure among ANE and undiscovered groups of East Eurasians who were traversing the Inner Asian landscape. I think this is all suggestive of some long-distance contacts, though the intensity and magnitude increased a lot with high-density societies and the mobility of pastoralism.

Much of the genetic mixing in the Near East, and to some extent in the trans-Caucasian region, seems to date to the 4th millennium BC. This is technically prehistory, but it is also the Uruk period. This was a phase of Mesopotamian cultural expansion between 4000 and 3100 BC which resulted in replicas of Uruk-style settlements as far away as Syria and southeastern Anatolia. There is even evidence of Uruk-related migration to the North Caucasus.

The Uruk expansion experienced an abrupt collapse. Uruk settlements outside of the core zone of Mesopotamia disappear.

It’s the final paragraph that warrants discussion:

The insight that the Caucasus mountains served not only as a corridor for the spread of CHG/Neolithic Iranian ancestry but also for later gene-flow from the south also has a bearing on the postulated homelands of Proto-Indo-European (PIE) languages and documented gene-flows that could have carried a consecutive spread of both across West Eurasia…Perceiving the Caucasus as an occasional bridge rather than a strict border during the Eneolithic and Bronze Age opens up the possibility of a homeland of PIE south of the Caucasus, which itself provides a parsimonious explanation for an early branching off of Anatolian languages. Geographically this would also work for Armenian and Greek, for which genetic data also supports an eastern influence from Anatolia or the southern Caucasus. A potential offshoot of the Indo-Iranian branch to the east is possible, but the latest ancient DNA results from South Asia also lend weight to an LMBA spread via the steppe belt…The spread of some or all of the proto-Indo-European branches would have been possible via the North Caucasus and Pontic region and from there, along with pastoralist expansions, to the heart of Europe. This scenario finds support from the well attested and now widely documented ‘steppe ancestry’ in European populations, the postulate of increasingly patrilinear societies in the wake of these expansions (exemplified by R1a/R1b), as attested in the latest study on the Bell Beaker phenomenon….

But instead of tackling this, let’s focus on the paper that came out of the Willerslev group, The first horse herders and the impact of early Bronze Age steppe expansions into Asia. This is a final manuscript in Science. That means it was probably written before The Genomic Formation of South and Central Asia. When it comes to South Asia, the results from the two publications are consonant. There is no conflict.*

More interesting are the results in West Asia, and the linguistic supplement. In it, the authors note first that tablets now indicate an Indo-Aryan presence in Syria ~1750 BC. Second, Assyrian merchants record Indo-European Hittite, or Nesili (the language of the people of Nesa), as early as ~2500 BC.

As suggested in earlier work, Hittite remains don’t show steppe influence. David W. says:

The apparent lack of steppe ancestry in five Hittite-era, perhaps Indo-European-speaking, Anatolians was interpreted in Damgaard et al. 2018 as a major discovery with profound implications for the origin of the Anatolian branch of Indo-European languages.

But I disagree with this assessment, simply because none of these Hittite-era individuals are from royal Hittite, or Nes, burials. Hence, there’s a very good chance that they were Hattians, who were not of Indo-European origin, even if they spoke the Indo-European Hittite language because it was imposed on them.

The main point I’d bring up here is that in other areas steppe ancestry has spread deeply and widely into the population, including non-Indo-European ones. It is certainly possible that the sample is not representative enough to pick up the genuinely Hittite elite, but I lean toward the likelihood that the steppe signal won’t be found. It seems that the Anatolian languages were already diversified by ~2000 BC, and perhaps earlier. Linguists have long suggested that they are the outgroup to other Indo-European languages, though this could just be a function of their isolation among highly settled and socially complex populations.

Two alternative models present themselves for these results. First, the Anatolian Indo-European languages expanded through elite diffusion, part of the same general migrations that emerged out of the Yamna culture ~3000 BC. The lack of a steppe signal may be due to sampling bias, as David W. suggested, or, more likely in my opinion, simple dilution of the signal. Second, the steppe migrations were one part of a broader palette of population movements and cultural diffusions, and the Anatolian Indo-Europeans are basal to the efflorescence of the steppe-derived branches.

The evidence of the explosion of Indo-Aryans in the years after 2000 BC in West and South Asia, as well as the expansion of Iranians across vast swaths of Inner Asia during the same period, suggests to me that Indo-Iranians are most definitely part of the steppe pulse. The connection to the Sintashta charioteers presents itself, and connections to the Uralic languages indicate incubation in the trans-Volga region.

In West Asia, the Indo-Aryans crashed against the most advanced civilizations of their time. Like the Bulgars, and unlike the Hittites, the Indo-Aryan Mitanni were totally absorbed by their non-Indo-European Hurrian substrate. Indo-Aryan linguistic influence was preserved in their names, their gods, and in particular words relating to chariots. And yet in 2017’s Continuity and Admixture in the Last Five Millennia of Levantine History from Ancient Canaanite and Present-Day Lebanese Genome Sequences, the authors observe:

We next tested a model of the present-day Lebanese as a mixture of Sidon_BA and any other ancient Eurasian population using qpAdm. We found that the Lebanese can be best modeled as Sidon_BA 93% ± 1.6% and a Steppe Bronze Age population 7% ± 1.6% (Figure 3C; Table S6). To estimate the time when the Steppe ancestry penetrated the Levant, we used, as above, LD-based inference and set the Lebanese as admixed test population with Natufians, Levant_N, Sidon_BA, Steppe_EMBA, and Steppe_MLBA as reference populations. We found support (p = 0.00017) for a mixture between Sidon_BA and Steppe_EMBA which has occurred around 2,950 ± 790 ya (Figure S13B).

This needs to be explored further. The admixture could have come from many sources. I am curious about the frequency of R1a1a-Z93 among modern-day Syrians and Lebanese.
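To get a feel for how loosely that admixture date is pinned down, here is a minimal sketch that turns the quoted 2,950 ± 790 ya estimate into rough calendar years. Everything in it beyond the two quoted numbers is my assumption: that "ya" counts back from about 2000 AD, that the ± is one standard error, and the helper names themselves, which are purely illustrative.

```python
# Assumption: "ya" (years ago) is counted back from roughly 2000 AD.
REFERENCE_YEAR_AD = 2000

ESTIMATE_YA = 2950   # point estimate quoted from the paper
STD_ERROR_YA = 790   # quoted uncertainty, treated here as one standard error (assumption)


def years_ago_to_bc(years_ago, reference_year_ad=REFERENCE_YEAR_AD):
    """Approximate BC year for a years-ago value (ignores the missing year zero).

    A negative result means the date falls in the AD era.
    """
    return years_ago - reference_year_ad


def interval_bc(n_se):
    """Return (oldest, youngest) bounds in BC years for +/- n_se standard errors."""
    oldest = years_ago_to_bc(ESTIMATE_YA + n_se * STD_ERROR_YA)
    youngest = years_ago_to_bc(ESTIMATE_YA - n_se * STD_ERROR_YA)
    return oldest, youngest


def label(bc_year):
    """Format a BC year (negative means AD) as a rough, human-readable label."""
    return f"~{bc_year} BC" if bc_year > 0 else f"~{abs(bc_year)} AD"


if __name__ == "__main__":
    print("point estimate:", label(years_ago_to_bc(ESTIMATE_YA)))
    for n_se in (1, 2):
        oldest, youngest = interval_bc(n_se)
        print(f"+/- {n_se} SE: {label(oldest)} to {label(youngest)}")
```

Under those assumptions the central estimate lands around 950 BC, and one standard error spans roughly 1740 BC to 160 BC. The ~1750 BC Indo-Aryan presence in Syria mentioned above sits right at the old edge of that window, which is one more reason not to read a single source into this signal.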

For me these arguments can only be resolved with a deeper understanding of linguistic evolution. The close relationship of Indo-Aryan and Iranian languages is obvious to any speaker of either of these languages (I can speak some Bengali). A divergence in the range of 4 to 5 thousand years before the present seems most likely to me. But the relationship of the other Indo-European languages is much less clear.

One of the arguments in Peter Bellwood’s First Farmers is that the Indo-European languages exhibit a “rake-like” topology with the exception of Indo-Iranian, which forms a clear clade. To him and others in his camp, this argues for deep divergences very early in time.

It is hard to deny that the steppe migrations between 4 and 5 thousand years ago had something to do with the distribution of modern Indo-European languages. But it is harder to falsify the model that there were earlier Indo-European migrations, perhaps out of the Near East, that preceded these. Only a deeper understanding of linguistic evolution, and multidisciplinary analysis of regional substrates, will generate the clarity we need.

* I’m going to skip the Botai angle in this post.

Is American genetic diversity enough?


In the nearly 20 years since the draft of the human genome was completed,* we’ve moved on to bigger and better things. In particular, researchers are looking to diversify their panels of human genetic diversity, because differences between groups matter. You can’t just substitute them for each other genetically.

There have been efforts to diversify the population panels recently, but that prompts the question of whether American population coverage is sufficient. My first thought is that the genetic diversity in the USA is probably getting us 90% of the way there. Consider Spencer’s comment about Queens: it’s the most ethnically diverse large conurbation in the country.

There are some gaps though. In Who We Are David Reich points out the distinctiveness of Indian population genetics. The subcontinent has lots of groups with large census populations in which deleterious alleles have drifted upward in frequency due to long-term endogamy. And many of these populations don’t have a strong representation in the Diaspora.

In contrast, much of the rest of the world is panmictic enough that an American panel can pick up most of the variation. American Chinese are skewed toward Guangdong and Fujian, but a substantial number of people from other parts of China have arrived in the last generation. Regional structure is not so strong that you’ll miss out on too much, aside from very rare variants, which vary at the scale of extended pedigrees rather than populations.

There are small populations such as Hadza, Khoikhoi, and Pygmies in Africa which are probably going to be missed by American population panels, but the total census size of these groups is pretty low (for comparison, there are 1 million Pulayar Dalits in the state of Kerala alone). Much of the rest of Africa is West African variation well represented in African Americans, and Bantu and Nilotic variation is probably captured by immigrant communities.

I’d propose supplementing American genetic diversity with sampling Cape Coloureds in South Africa.

* No discussions about how the genome isn’t totally complete. I know that.