Computational linguistic phylogenetics and Indo-Europeans

A new paper, Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages, has made a splash by inferring a far older date of diversification of these languages than has been assumed by other linguists, archaeologists and geneticists. As you can see above, the splits start a bit earlier than 5000 BC in this model, 1,500-2,000 years before the “classic” Pontic-steppe hypothesis. There are divergences in the topology from what some have assumed, for example, the deep split of Indo-Iranian from other groups. Nemets in Proto-Indo-European Urheimat Debate has given some skeptical thoughts, while Iosif Lazaridis of “Southern Arc” fame has also offered his two cents. Others on social media have pointed out what seem to be important errors in the paper.

I can’t speak to the linguistics. I will say that I was taught about Bayesian phylogenetic inference in graduate school, so I know some of the models and parameters they’re using, and I’ve even used BEAST 2 myself. It’s not my specialty at all, so I have weak intuition, but these are serious methods that allow for understanding the past and reconstruction of evolutionary relationships. I will pass on Asya Pereltsvaig’s criticism in The Indo-European Controversy: Facts and Fallacies in Historical Linguistics that the reliance on lexicon as input data might be a major problem in these inferences: misleading data in, misleading result out. Beyond that, I’ll comment on a few issues, informed by ancient DNA, that jumped out at me.

First, the position of Tocharian is not surprising…it often comes out as diverging early from the other languages. Tocharian languages were found in the northern and northeastern regions of the Tarim basin. Historically, the southern rim of the basin was dominated by Iranian languages. The most likely candidate for the people that gave rise to the Tocharian languages seems to be the Afanasievo culture. We now know that the Afanasievo were basically an eastern branch of the Yamnaya that shows up in the Altai around 3300 BC. This is roughly 5,300 years BP. In the paper, Tocharian splits from the other Indo-European languages 5,400 to 8,600 years BP, over a 95% confidence interval. The only way this makes sense to me is if there was deep linguistic structure within the Yamnaya despite overall genetic homogeneity maintained through mate exchange. In the text the authors seem to imply that the Tocharians are an early eastward migration, perhaps from the south Caucasus region. This does not align very well with the ancient DNA. The early Afanasievo are genetic replicas of the Yamnaya. Were the Tocharians already there? Did the Afanasievo just adopt their language?
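The date comparison in the Tocharian paragraph is simple arithmetic, and it can be made explicit. What follows is a minimal Python sketch, not anything from the paper: the function names are my own, BP is taken in the conventional sense of years before AD 1950, and the only inputs are the figures quoted in the text (Afanasievo at about 3300 BC, and a Tocharian split interval of 5,400 to 8,600 years BP).

```python
def bc_to_bp(year_bc, present=1950):
    """Convert a year BC to years before present (BP).

    Assumes the usual convention that "present" is AD 1950 and that
    there is no year 0 (1 BC is followed directly by AD 1).
    """
    return year_bc + present - 1


def in_interval(value, low, high):
    """True if value lies inside the closed interval [low, high]."""
    return low <= value <= high


# Figures quoted in the text.
afanasievo_bp = bc_to_bp(3300)      # the text rounds this to 5,300 BP
tocharian_split_bp = (5400, 8600)   # interval reported for the Tocharian split

inside = in_interval(afanasievo_bp, *tocharian_split_bp)
# Years between the young end of the interval and the Afanasievo date.
gap = tocharian_split_bp[0] - afanasievo_bp

print(f"Afanasievo appears about {afanasievo_bp} BP")
print(f"Inside the split interval: {inside}")
print(f"Young end of the interval precedes Afanasievo by about {gap} years")
```

Under these assumptions the Afanasievo date falls just outside the young end of the interval, by a century or so. That is why the result only seems to work if the linguistic split had already happened within the Yamnaya before the Afanasievo moved east.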

The second issue I have, more broadly, is with the Indo-Iranians. The authors propose that the Indic and Iranian branches separated around 3500 BC. While earlier work indicates that the Indo-Iranian languages descend from the language of the Sintashta and the cultures of the Andronovo horizon, these authors emphasize the role of populations from the south Caucasus traversing Iran south of the Caspian Sea.

Below is a map that gets to the crux of my confusion about this ancient date and longstanding indigeneity of Indo-Iranians on the Iranian plateau:

We have some histories of the Middle Eastern Bronze Age. We know that the area of southwest Iran was dominated by the non-Indo-European Elamites as early as 3000 BC, and these people persisted down into the Common Era. Modern Armenia was dominated by non-Indo-European-speaking Urartians after 1000 BC. Their language is related to Hurrian, documented 4,000 years ago. Before the Indo-European Hittites ruled Hatti, the Hattians ruled Hatti. And the Hattians were not Indo-European. Judging by the obscure Eteocretan language that persisted into antiquity, the Minoans were almost certainly not Indo-European speaking. It is true that around 1500 BC the ruling elite of the Hurrians, the Mitanni, seem to have had an Indo-Aryan connection, but they were also likely intrusive, and their emergence as mobile warriors suspiciously post-dates the development of the light war chariot and the domestic horse by the Sintashta, thousands of miles to the north and several centuries earlier. The Assyrian royal annals date the arrival of Persians to the 9th century BC, but the results in the paper imply that the Iranians were already present in the Zagros for thousands of years before this (the south Caucasus ultimately being the Indo-European Urheimat, with the Indo-Iranians moving south and east very early on from that region).

I’m focusing on the Middle East because there is a rich history of textual evidence starting in the third millennium BC. These results imply that Indo-European languages are in fact native to the northern Middle East, in the southern Caucasus. And yet assorted obscure languages like Gutian, Kassite and Kaska are found where you might expect a stray Indo-European language here and there. To me this is curious and weird. Further to the west, these results seem to imply that Greek was brought with Caucasus ancestry, but Minoan was likely not Indo-European. There are all these non-Indo-European languages attested in the textual record…and only a few Indo-European ones (Hittite being the first).

One of the major points of this paper that contradicts some theories in historical linguistics is a rejection of the tentative connection between Balto-Slavic and Indo-Iranian. Genetically, the curious aspect of the two language families is that Y chromosomal haplogroup R1a is very frequent in both, but differentiated into two lineages that seem to have diverged 5,500-6,000 years ago. But there is more than just Y chromosomes here; over the past decade autosomal genome analyses have shown that many South Asians, in particular those in the northwest and in upper caste populations, are enriched for a minority ancestral component that resembles Eastern Europeans. We now know what happened due to ancient DNA: Genetic ancestry changes in Stone to Bronze Age transition in the East European plain. A branch of the Corded Ware Culture (CWC) migrated eastward, becoming the Fatyanovo Culture, then the Balanovo Culture, then the Abashevo Culture, and finally the Sintashta Culture. The Sintashta seem to have given rise to a group of societies known as Andronovo that are hypothesized to have evolved into Iranians and Indo-Aryans.

The result here does away with all this. Rather than Indo-Aryan speech being brought by steppe pastoralists between 3,500 and 4,000 years ago, as genetics would imply, Indo-Aryan speech was likely present during the Indus Valley Civilization (IVC). These results imply that Indo-Aryan arrived in India thousands of years before the intrusion of steppe pastoralists, and that it was carried eastward by farmers from the Caucasus. The Vedas and Sanskrit then come down from the IVC. And yet, strangely, the Vedas do not depict a very complex society like the IVC, but a simpler agro-pastoralist one. And Sanskrit, presumably the sacred language of the IVC people, was maintained in particular by a Brahmin priestly caste that is notable for having a very high fraction of steppe ancestry, which arrived much later.

A massive issue with this paper is that it makes a hash of a major phenomenon that we know occurred between 3500 and 2500 BC: the spread of steppe people in all directions, especially out of the Corded Ware complex. The CWC are notable for having a major admixture of Globular Amphora Culture (GAC) Neolithic ancestry, about 25-35% of their genetics, and then spreading in all directions. As noted by the authors and other observers, ancient DNA suggests that Anatolian, Armenian, and perhaps Greek and Illyrian (Albanian), are exceptions to this, deriving directly from Yamnaya or pre-Yamnaya (in the case of Hittites) Indo-European people (remember, CWC is a mix of Yamnaya and GAC). The genetics is very clear that a major wave of post-CWC people went into Asia, and south into the Indian subcontinent and Iran. The Y chromosomes imply this was male mediated, and post-CWC Y chromosomes are found in appreciable quantities as far south as Sri Lanka. But the paper’s dates place this demographic migration far too late to have been the origin of Sanskrit, which is associated with Arya culture.

As Lazaridis points out on social media, the divergence of European language groups like Germanic, Celtic, Italic and Balto-Slavic also predates the CWC expansion westward. For example, the Italic languages split off around 3500 BC, 500 years earlier than the expansion of CWC into Eastern Europe, with a 95% lower bound of 2200 BC, about when steppe ancestry shows up in the Italian peninsula according to ancient DNA. If the dates are true, then it seems that the various Indo-European language groups were already differentiated very early on within the Yamnaya, and did not differentiate later through their expansion across Europe. In other words, this is a model of “ancient linguistic substructure.”
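The Italic example can be laid out the same way. This is again only a sketch with names of my own choosing. Years BC are stored as negative numbers, the CWC date is simply the 3500 BC estimate plus the 500 years mentioned in the text, and the arrival of steppe ancestry in Italy is set to the 2200 BC that the text gives as approximate.

```python
# Years BC as negative integers, so later dates are larger numbers.
italic_split_estimate = -3500   # split date quoted in the text
italic_split_youngest = -2200   # young end of the 95% interval quoted in the text
cwc_expansion = italic_split_estimate + 500   # "500 years earlier than the expansion of CWC"
steppe_in_italy = -2200         # approximate, per the text


def years_before(earlier, later):
    """Number of years by which `earlier` precedes `later` (negative if it does not)."""
    return later - earlier


lead_over_cwc = years_before(italic_split_estimate, cwc_expansion)
lead_over_italy = years_before(italic_split_estimate, steppe_in_italy)
# True when the ancient DNA date sits exactly on the young edge of the interval.
arrival_at_edge = steppe_in_italy == italic_split_youngest

print(f"Split estimate precedes CWC expansion by {lead_over_cwc} years")
print(f"Split estimate precedes steppe ancestry in Italy by {lead_over_italy} years")
print(f"Arrival in Italy sits at the edge of the interval: {arrival_at_edge}")
```

On these numbers the point estimate for the Italic split leads the arrival of steppe ancestry in Italy by about 1,300 years, and the two only meet at the very edge of the interval.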

To be entirely candid, it’s very hard for me to reconcile the ancient DNA with this topology and time-depth in a parsimonious manner that holds together in my head. This doesn’t mean the other models don’t have holes; the “Southern Arc” theory is pretty complicated too, and everything would have been “easier” if the Hittites had steppe ancestry, which they do not seem to have. But there are too many things that are hard for me to understand with this new model. For example, the vast numbers of steppe Iranian people seem to be mostly descended from the CWC societies that gave rise to Europeans, but their languages diverged extremely early from those of their western neighbors, almost 2,000 years before the diversification of the CWC as an archaeological, demographic and genetic unit.

Indo-Europeans!


For some pieces on my Substack I’ve been re-reading a lot of the stuff on the ancient genetics and archaeology of Eurasia as they relate to Indo-Europeans. This means I get a different view from usual…as it’s more synoptic. I’m not entirely clear on the dates or archaeology, but here is what I’ve concluded: the Indo-European expansions can be partitioned into “waves.” That is, they weren’t a simple “demic diffusion” where disease (against their rivals) and reproductive excess generated a continuous expansion across their range.

So here’s what I get:

1 – An “early phase” where Yamnaya people push west (Kurgan) and become Corded Ware, and east (far) and become Afanasievo. Date this to right before 3000 BC, but it pretty much “completes” in Europe by 2900-2800 BC, as the broad zone of Central and Northeast Europe is dominated by these people (there are still debates on whether Afanasievo became the “Tocharians”; I think they did).

2 – ~2500 BC, 400-500 years after the initial push west, Indo-European populations push beyond their limits on the Rhine, and break through past the mountains ringing the Southern European peninsulas. The dates are often vague in the south, but it looks to be around 2500 to 2000 BC. For example, the Neolithic farmer descended Remedello Culture in northern Italy ends about 2400 BC. The Bell Beaker Indo-Europeans seem to have arrived in Ireland and England at just about this time, perhaps a century after they came to dominate France.

Though there were obviously islands of exception (often quite literally as in Sardinia and Crete), Europe by 2000 BC was Indo-European.

3 – The third wave dates to after 2000 BC, and it is the “Asia reflux.” Populations used the forest-steppe zone as a stepping stone out to the east. Derived from the same synthesis between Yamnaya and European farmers as Corded Ware, these populations seem ancestral to the Indo-Iranians. Slavic-speaking people (or the ancestors of those people) occupied the western fringe of this expansion zone, and by the Iron Age had begun to move east, marginalizing Indo-Iranians across much of their core European territory.

It seems that Indo-Iranians had pushed into the margins of northeast Iran, Khorasan, by ~2000 BC. In the period between 2000-1500 BC they clearly began to occupy their historical core zones in Iran and India. Obviously, Indo-European Iranians are present in western Iran by 1000 BC in the historical record, though Indo-European Mitanni are present by 1540 BC at the latest in Syria and northern Iraq.

The Iranians also moved into the Tarim basin, so the cities of the west and southern edge were Iranian-speaking (the cities of the north and east were Tocharian).

What explains these pulses? I don’t know totally, but we know a few things:

– There are Y chromosomal star phylogenies associated with these migrations: R1b, R1a, and I1. I think the last is due to the assimilation of non-Indo-European men in Europe, but the first two are clearly primal. The Indo-Europeans were clearly very patrilineal.

– The last, Asian, migration clearly has something to do with chariots and horses. The coincidence in timing seems too much. The earlier migrations were before chariots (I believe), but the horse does seem to have come with Indo-Europeans, so there was a level of mobility involved.

– The “Bell Beaker” motif seems to have emerged among non-Indo-Europeans in Iberia, and spread to Indo-Europeans, who expanded outward. I think we’re seeing something related to religion.

Unfortunately, I suspect that the Indo-European advantage was “social technology”, not material technology. Social technology is hard to infer in a preliterate society.

Question for readers: Can you nail down the chronology better? Those who know archaeology?

The brotherhoods of the plains

One of my favorite concepts is “evoked culture.” This is basically pointing to the fact that some human cultural forms and practices aren’t contingent and arbitrary, but naturally emerge due to the canalization imposed by our cognitive biases and the physical and social world around us. An example Spencer Wells likes to use to illustrate this is that indigenous hunters on the Andaman Islands immediately took to dogs as helpmates when they were introduced to them. The coadaptation between domestic dogs and humans is clearly intense and natural.

Horses are another example. The Plains Indians of the New World became some of the most fearsome mounted warriors in history after the Spaniards introduced domestic horses. By the 18th century some tribes, such as the Comanche, were fully mounted and mobile. This occurred over 200 years.

As documented in 2010’s Empire of the Summer Moon, the Comanche recapitulated some of the patterns of steppe pastoralists of the past: they integrated women of other peoples into their nation while committing acts of brutal violence against those other peoples. The last great chief of the Comanche, Quanah Parker, was the son of a white Texan woman.

I think of the Comanche when I wonder what the early steppe peoples were like. The Scythians, Sarmatians, the Sintashtas and Turks. The impact of the horse on their lifestyles, and the centrality of the horse, still echoes down to the present. The Central Asian Turks still drink mare’s milk. Indian grooms still ride on a white mare at their wedding.

But one of the major themes in pastoralist communities seems to be patrilineality and integration of local women. This seems to be illustrated in a new paper, Corded Ware cultural complexity uncovered using genomic and isotopic analysis from south-eastern Poland:

During the Final Eneolithic the Corded Ware Complex (CWC) emerges, chiefly identified by its specific burial rites. This complex spanned most of central Europe and exhibits demographic and cultural associations to the Yamnaya culture. To study the genetic structure and kin relations in CWC communities, we sequenced the genomes of 19 individuals located in the heartland of the CWC complex region, south-eastern Poland. Whole genome sequence and strontium isotope data allowed us to investigate genetic ancestry, admixture, kinship and mobility. The analysis showed a unique pattern, not detected in other parts of Poland; maternally the individuals are linked to earlier Neolithic lineages, whereas on the paternal side a Steppe ancestry is clearly visible. We identified three cases of kinship. Of these two were between individuals buried in double graves. Interestingly, we identified kinship between a local and a non-local individual thus discovering a novel, previously unknown burial custom.

This seems a consistent pattern: “steppe” ancestry seems to be mediated through male migrations. There are likely differences between agro-pastoralists (like the early Germans who moved into the Roman Empire), and full-blown nomads like the Huns, Turks, and Mongols. But overall the trend seems to be the rise of a particular patriarchal culture with the horse people, along with the spread of gods of the sky and a lack of attachment to any particular place.

Near Prehistory in Northern Europe was an Indo-European world

The Picts were the topic of discussion this week on In Our Time. They are a mysterious yet intriguing people because we don’t know much about them in their own words, but they are one of the roots of modern Scottish identity. When I first encountered the Picts decades ago there was some debate as to whether they were a pre-Indo-European people or not. Today that does not seem to be a hypothesis people entertain. Rather, the Picts were simply the least Romanized of the Brythonic Celtic people of Britain.

Today because of the genetic data I think we can be rather confident that by the time of the Roman Empire there were no non-Indo-Europeans left in Northern Europe. The Beaker people in Britain and Ireland seem to have overwhelmingly replaced the native population of farmers, whose ancestors had predominantly arrived from the eastern Mediterranean thousands of years ago (via the Atlantic littoral or Central Europe). Across Northern Europe, in general, the replacement of the previous populations was substantial, though not total.

In Southern Europe, the arrival of Indo-Europeans was more fitful, and the persistence of Basque attests to the fact that non-Indo-European languages were spoken down to historical times (if Etruscan is considered native to the Italian peninsula, that’s another example, though this is hotly debated and I lean toward the exogenous model). The pre-Latin language of Sardinia was almost certainly not Indo-European, while Greek has a high proportion of non-Indo-European words in its lexicon.