A new paper, Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages, has made a splash by inferring a far older date for the diversification of these languages than has been assumed by other linguists, archaeologists and geneticists. As you can see above, the splits start a bit earlier than 5000 BC in this model, 1,500-2,000 years before the “classic” Pontic-steppe hypothesis. The tree's topology also diverges from what some have assumed, for example in the deep split of Indo-Iranian from other groups. Nemets in Proto-Indo-European Urheimat Debate has given some skeptical thoughts, while Iosif Lazaridis of “Southern Arc” fame has also offered his two cents. Others on social media have pointed out what seem to be important errors in the paper:
1- Heggarty et al. 2023 claims that no southward migration of Steppe_MLBA pastoralists is attested by aDNA from BMAC sites in 2300-1700 BCE by citing Narasimhan et al. 2019.
2- Narasimhan et al. 2019 concludes that Steppe_MLBA pastoralists migrated southward during 2100-1700 BCE pic.twitter.com/DY6f91uTgC
— (@vicayana) August 1, 2023
I can’t speak to the linguistics. I will say that I was taught about Bayesian phylogenetic inference in graduate school, so I know some of the models and parameters they’re using, and I’ve even used BEAST 2 myself. It’s not my specialty at all, so I have weak intuition, but these are serious methods that allow for understanding the past and reconstructing evolutionary relationships. That said, I will pass on Asya Pereltsvaig’s criticism in The Indo-European Controversy: Facts and Fallacies in Historical Linguistics that the reliance on the lexicon as input data might be a major problem in these inferences: misleading data in, misleading results out. I’ll comment on a few issues that jumped out at me, informed by ancient DNA.
First, the position of Tocharian is not surprising; it often comes out as diverging early from the other languages. The Tocharian languages were found in the northern and northeastern regions of the Tarim basin. Historically, the southern rim of the basin was dominated by Iranian languages. The most likely candidate for the people who gave rise to the Tocharian languages seems to be the Afanasievo culture. The Afanasievo, we now know, were basically an eastern branch of the Yamnaya who show up in the Altai around 3300 BC. This is 5,300 years BP. In the paper, Tocharian splits from the other Indo-European languages 5,400 to 8,600 years BP over a 95% confidence interval. The only way this makes sense to me is if there was deep linguistic structure within the Yamnaya despite overall genetic homogeneity maintained through mate exchange. In the text the authors seem to imply that the Tocharians are an early eastward migration, perhaps from the south Caucasus region. This does not align very well with the ancient DNA. The early Afanasievo are genetic replicas of the Yamnaya. Were the Tocharians already there? Did the Afanasievo just adopt their language?
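The date comparison above is simple arithmetic, and a minimal sketch makes it explicit. It assumes "BP" is counted from roughly the present day (2023, the year the paper appeared) rather than from the radiocarbon convention of 1950. The function names are hypothetical and mine, and the figures are just the ones quoted above, not values pulled from the paper's data files.

```python
# Assumption: "present" is 2023, not the radiocarbon convention of 1950.
PRESENT_YEAR = 2023


def bc_to_bp(year_bc, present=PRESENT_YEAR):
    """Convert a BC year to years before `present` (there is no year 0)."""
    return year_bc + present - 1


def gap_to_interval(value, lower, upper):
    """Return 0 if value lies in [lower, upper], else the distance to the nearest bound."""
    if value < lower:
        return lower - value
    if value > upper:
        return value - upper
    return 0


# Afanasievo appears in the Altai around 3300 BC (figure quoted in the text).
afanasievo_bp = bc_to_bp(3300)

# 95% interval for the Tocharian split, in years BP (as quoted in the text).
tocharian_split_bp = (5400, 8600)

gap = gap_to_interval(afanasievo_bp, *tocharian_split_bp)
print(f"Afanasievo: about {afanasievo_bp} years BP")
print(f"Years between that date and the interval: {gap}")
```

Under either convention for "present," the Afanasievo date falls outside the interval on the young side, so even the youngest inferred split predates the appearance of Afanasievo in the Altai.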
The second issue I have, broadly, is with the Indo-Iranians. The authors propose that the Indic and Iranian branches separated around 3500 BC. While earlier work indicates that the Indo-Iranian languages descend from the Sintashta language and the cultures of the Andronovo horizon, these authors emphasize the role of populations from the south Caucasus traversing Iran south of the Caspian Sea.
Below is a map that gets to the crux of my confusion about this ancient date and longstanding indigeneity of Indo-Iranians on the Iranian plateau:
We have some histories of the Middle Eastern Bronze Age. We know that the area of southwest Iran was dominated by the non-Indo-European Elamites as early as 3000 BC, and these people persisted down into the Common Era. Modern Armenia was dominated by non-Indo-European-speaking Urartians after 1000 BC. Their language is related to Hurrian, documented 4,000 years ago. Before the Indo-European Hittites ruled Hatti, the Hattians ruled Hatti. And the Hattians were not Indo-European. Judging by the obscure Eteocretan language that persisted into antiquity, the Minoans were almost certainly not Indo-European speaking. It is true that around 1500 BC the ruling elite of the Hurrians, the Mittani, seem to have had an Indo-Aryan connection, but they were also likely intrusive, and their emergence as mobile warriors suspiciously post-dates the development of the light war chariot and the domestic horse thousands of miles to the north several centuries earlier by the Sintashta. The Assyrian royal annals date the arrival of Persians to the 9th century BC, but the results in the paper imply that the Iranians were already present in the Zagros for thousands of years before this (the south Caucasus ultimately being the Indo-European Urheimat, with the Indo-Iranians moving south and east from that region very early on).
I’m focusing on the Middle East because there is a rich history of textual evidence starting in the third millennium BC. These results imply that Indo-European languages are in fact native to the northern Middle East, in the southern Caucasus. And yet assorted obscure languages like Gutian, Kassite and Kaska are found where you might expect a stray Indo-European language here and there. To me this is curious and weird. Further to the west, these results seem to imply that Greek was brought with Caucasus ancestry, but Minoan was likely not Indo-European. There are all these non-Indo-European languages attested in the textual record, and only a few Indo-European ones (Hittite being the first).
One of the major points of this paper that contradicts some theories in historical linguistics is a rejection of the tentative connection between Balto-Slavic and Indo-Iranian. Genetically, the curious aspect of the two language families is that Y chromosomal haplogroup R1a is very frequent in both, but differentiated into two lineages that seem to have diverged 5,500-6,000 years ago. But there is more than just Y chromosomes here; over the past decade autosomal genome analyses have shown that many South Asians, in particular those in the northwest and in upper-caste populations, are enriched for a minority ancestral component that resembles Eastern Europeans. We now know what happened due to ancient DNA: Genetic ancestry changes in Stone to Bronze Age transition in the East European plain. A branch of the Corded Ware Culture (CWC) migrated eastward, becoming the Fatyanovo Culture, then the Balanovo Culture, then the Abashevo Culture, and finally the Sintashta Culture. The Sintashta seem to have given rise to a group of societies known as Andronovo that are hypothesized to have evolved into Iranians and Indo-Aryans.
The result here does away with all this. Rather than Indo-Aryan speech being brought by steppe pastoralists between 3,500 and 4,000 years ago, as genetics would imply, Indo-Aryan speech would already have been present during the Indus Valley Civilization (IVC). These results imply that Indo-Aryan arrived in India thousands of years before the intrusion of steppe pastoralists, and that it was carried eastward by farmers from the Caucasus. The Vedas and Sanskrit then come down from the IVC. And yet, strangely, the Vedas do not depict a very complex society like the IVC, but a simpler agro-pastoralist one. And Sanskrit, presumably the sacred language of the IVC people on this model, was maintained in particular by a Brahmin priestly caste that is notable for having a very high fraction of steppe ancestry, which arrived much later.
A massive issue with this paper is that it makes a hash of a major phenomenon that we know occurred between 3500 and 2500 BC: the spread of steppe people in all directions, especially out of the Corded Ware complex. The CWC are notable for having a major admixture of Globular Amphora Culture (GAC) Neolithic ancestry, about 25-35% of their genetics, and then spreading in all directions. As noted by the authors and other observers, ancient DNA suggests that Anatolian, Armenian, and perhaps Greek and Illyrian (Albanian) are exceptions to this, deriving directly from Yamnaya or pre-Yamnaya (in the case of Hittites) Indo-European people (remember, CWC is a mix of Yamnaya and GAC). The genetics is very clear that a major wave of post-CWC people went into Asia, and south into the Indian subcontinent and Iran. The Y chromosomes imply this was male mediated, and post-CWC Y chromosomes are found in appreciable quantities as far south as Sri Lanka. But these data place this demographic migration far too late to have been the origin of Sanskrit, which is associated with Arya culture.
As Lazaridis points out on social media, the divergence of European language groups like Germanic, Celtic, Italic and Balto-Slavic also predates the CWC expansion westward. For example, the Italic languages split off around 3500 BC, 500 years earlier than the expansion of CWC into Eastern Europe, with a 95% lower bound of 2200 BC, about when steppe ancestry shows up in the Italian peninsula according to ancient DNA. If the dates are true, then it seems that the various Indo-European language groups were already differentiated very early on within the Yamnaya, and not later on through their expansion across Europe. In other words, this is a model of “ancient linguistic substructure.”
To be entirely candid, it’s very hard for me to reconcile the ancient DNA with this topology and time-depth in a parsimonious manner that holds together in my head. This doesn’t mean the other models don’t have holes; the “Southern Arc” theory is pretty complicated too, and everything would have been “easier” if the Hittites had steppe ancestry, which they do not seem to have. But there are too many things that are hard for me to understand with this new model. For example, the vast numbers of steppe Iranian people seem to be mostly descended from the CWC societies that gave rise to Europeans, but their languages diverged extremely early from those of their western neighbors, almost 2,000 years before the diversification of the CWC as an archaeological, demographic and genetic unit.