Unless you have been sleeping today you may have noticed that two important papers on South Asian historical population genetics have been published. The simple and short paper is An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers. The longer paper, which is basically a book if you read the supplements, is The Formation of Human Populations in South and Central Asia (an update on a preprint which came out over a year ago).
So the “Rakhigarhi genome” is finally out. She turns out to be an interesting individual: she has some, but not much, Andamanese-related hunter-gatherer ancestry, a lot of Iranian-farmer-related ancestry, and no steppe ancestry. She is very similar to the dozen or so “Indus Periphery” samples found outside of South Asia, in the region’s near-abroad (Khorasan and into Turan). Her mtDNA is U2b2. My mtDNA is U2b. So my mother’s maternal lineage dates back to the IVC period. Not a surprise, but still cool.
The major finding of great interest is that the “Iranian-farmer” ancestry of the Indus Valley Civilization population was possibly not “Iranian” at all. That is, it seems unlikely that the West Asian-related ancestry in the IVC people was due to a migration out of the Zagros agricultural hearth. The reasoning here is simple. There was ancient population structure in the Near East at the beginning of the Holocene. There were, roughly, three major groups which expanded: Anatolian farmers, related Levantine farmers, and more distantly related Iranian (Zagros) farmers. These groups intermixed copiously during the Holocene. All the farmers of the Holocene in western Iran, and even the hunter-gatherers, had some ancestry from the Anatolian lineage.
Anatolian heritage is not present in the IVC people. Because Anatolian ancestry is found in Iranian hunter-gatherers at the beginning of the Holocene, the West Asian-related ancestors of the IVC people must have diverged earlier. One option is that there was a set of hunter-gatherer populations in the territory of modern Iran, Afghanistan, and Pakistan (and possibly northwest India) who were related to each other but differentiated due to distance and separation. Modern Iran is bifurcated by some rather harsh deserts between the west and the east. There is no reason the same could not have applied in the Pleistocene, in particular during the Last Glacial Maximum.
Related to this, Iosif Lazaridis has a preprint out which argues that the difference between the “Anatolian” and “Iran” clusters lay in differential admixture from “Ancient North Eurasians” (ANE) into the latter. The non-Rakhigarhi paper above highlights the role of Turan in mediating interaction and gene flow between northern Eurasia and the Iran-Afghanistan-Central Asia region. The difference between the quasi-Iranian ancestors of the IVC people and those of the Zagros, the Iranians proper, may simply be that the ANE-related admixture was stronger further east. Or not. In some ways, the paper opens up a lot of possibilities as to the landscape of late Pleistocene western Asia. It is a reasonable interpretation in the paper that agriculture was spread to northwest South Asia not through mass migration (e.g., Bantu expansion, farming in Neolithic Europe, etc.), but through cultural diffusion. But pinning down the distribution and origin of the quasi-Iranian population will need a lot more ancient DNA.
The origin and distribution of Andamanese-related hunter-gatherers (AHG), earlier described as “Ancient Ancestral South Indians” (AASI), also need more elucidation. It has long been known that the various East Eurasian groups seem to have separated very soon after 40,000 years ago. The AHG clade is only distantly related to the Andamanese themselves, who have more of an affinity with the Hoabinhian people of Southeast Asia. Though the diversity of mtDNA macro-haplogroup M is suggestive of long-term habitation of South Asia by some of the AHG, we cannot reject the possibility that they were intrusive from the east during the Pleistocene or Holocene, at least in part.
The awkward construct proposed by Indian researchers to David Reich to term the ancestral populations “ANI” and “ASI” (Ancestral North Indian and Ancestral South Indian) was to some extent a political move. It left open the possibility of deep geographical indigeneity of most of the ancestry of modern South Asians. I was moderately skeptical because I suspected the ANI was intrusive from West Asia (the Iranian-farmer and steppe migration models). These results do not support that, and it may, in fact, be the case that ANI-like quasi-Iranians occupied northwest South Asia for a long time, and AHG populations hugged the southern and eastern fringes, during the height of the Pleistocene.
What a lot of these questions need are people with detailed paleoclimate knowledge. The human geography would be much easier to infer if we had a sense of the primary carrying capacity. Hunter-gatherers tend to be very thin on the ground in desert areas, so those would serve as natural gene flow barriers. The divergence between western and eastern Eurasian populations is rather stark, so one might suppose that the Thar desert region was particularly difficult to traverse during the Pleistocene.
At some point, I have to come back to the “Aryan question.” These papers strongly point to the likelihood that the Aryans were intrusive to the Indian subcontinent.
From the Cell paper:
Since language spreads in pre-state societies are often accompanied by large-scale movements of people (Bellwood, 2013), these results argue against the model (Heggarty, 2019) of a trans-Iranian-plateau route for Indo-European language spread into South Asia. However, a natural route for Indo-European languages to have spread into South Asia is from Eastern Europe via Central Asia in the first half of the 2nd millennium BCE, a chain of transmission now documented in detail with ancient DNA. The fact that the Steppe pastoralist ancestry in South Asia matches that in Bronze Age Eastern Europe (but not Western Europe [de Barros Damgaard et al., 2018; Narasimhan et al., 2019]) provides additional evidence for this theory, as it elegantly explains the shared distinctive features of Balto-Slavic and Indo-Iranian languages (Ringe et al., 2002).
From the Science paper:
Our results not only provide negative evidence against an Iranian plateau origin for Indo-European languages in South Asia, but also positive evidence for the theory that these languages spread from the Steppe. While ancient DNA has documented westward movements of Steppe pastoralist ancestry providing a likely conduit for the spread of many Indo-European languages to Europe (7, 8), the chain-of-transmission into South Asia has been unclear because of a lack of relevant ancient DNA. Our observation of the spread of Central_Steppe_MLBA ancestry into South Asia in the first half of the 2nd millennium BCE provides this evidence, and is particularly striking as it provides a plausible genetic explanation for the linguistic similarities between the Balto-Slavic and Indo-Iranian sub-families of Indo-European, which despite their vast geographic separation, share the Satem innovation and Ruki sound laws (63). If the spread of people from the Steppe in this period was a conduit for the spread of South Asian Indo-European languages, then it is striking that there are so few material culture similarities between the central Steppe and South Asia in the Middle to Late Bronze Age (i.e. after the middle of the 2nd millennium BCE). Indeed, the material culture differences are so substantial that some archaeologists recognize no evidence of a connection. However, lack of material culture connections does not provide evidence against spread of genes, as has been demonstrated in the case of the Beaker Complex, which originated largely in western Europe, but in Central Europe was associated with skeletons that harbored ~50% ancestry related to Yamnaya Steppe pastoralists (18).
If you look deeper in the paper you see that the authors zeroed in on the period between 2000 and 1000 BCE for a reason. The people of the Eurasian steppe are diverse, and always in flux, and the earlier and later agro-pastoralists were genetically distinct. The Yamnaya culture lacked a “European” element that arrived on the forest-steppe through demographic reflux. The later Indo-European agro-pastoralists, such as the Scythians and Kushans, tended to have East Asian ancestry which is lacking in northwest South Asia. The particular profile found in groups such as North Indian Brahmins fits best with the steppe people who were ascendant in the period between 2000 and 1500 BCE.
There is, of course, the assertion by some Indians that Indo-European languages are indigenous to South Asia. If that is the case, then they would have had to expand elsewhere from there. I won’t address archaeological or linguistic issues. Rather, the problem is that the spread of “steppe” ancestry in the period between 3000 and 1000 BCE across the whole zone of Indo-European languages is so clear that it is the most likely candidate, and the steppe ancestry has origins in the…forest-steppe. Indian counter-arguments are not impossible but tend to be highly complicated.
To me, the more interesting aspect of the story is not the origin of the Indo-Aryans, but how they came to be what they were as depicted in the Vedas, and later in epics such as the Mahabharata and Ramayana. Let me quote from the Science paper:
Taken together, the poor fits at both extremes of the Indian Cline imply that the Indian Cline does not represent a simple mix of two homogeneous ancestral populations, ANI and ASI. Instead, in the Middle to Late Bronze Age both of these groups were themselves part of metapopulations—relatively well represented by the Steppe Cline and the Indus Periphery Cline—that were not completely homogenized at the time they met and mixed. Most groups in India today can be represented as mixtures of average points along the Steppe Cline (we show below that the ANI fit along the Steppe Cline) and the Indus Periphery Cline (the ASI) but there are deviations from this simple model that contribute to the observed patterns.
Between 1500 and 500 BCE South Asia saw the development of Indian genetics and culture in a way that we understand it today, from the north to the south. One of the striking aspects of the Swat valley samples in the Science paper is that AHG ancestry increases over time (along with steppe ancestry). The Swat people seem to have started out with a much higher fraction of IVC sorts, very high on Iranian-related ancestry. But after 1000 BCE they integrated more and more with people to their south and east. Meanwhile, in South India, groups like Nadars from the Tamil country are still about 5% steppe in their heritage, and non-trivial fractions of R1a1a are found among these groups.
There is now a good amount of evidence that the Austro-Asiatic Munda expanded into a landscape where unmixed AHG/AASI populations existed. Though the Science paper puts this in the 3rd millennium, I think the period between 2000 and 1000 BCE is more likely, since Austro-Asiatic rice farmers are found in northern Vietnam in 1900 BCE. The existence of unmixed AHG/AASI suggests to me that the expansion and dominance of Dravidian-speaking agricultural societies in much of South India in the form we recognize them today does not predate the arrival of Indo-Aryans by much if at all. Rather than thinking of Indian culture as the application of Indo-Aryan elements atop a Dravidian base, it is more accurate I think to consider them a synthesis that developed simultaneously. Though it is quite likely that the IVC language was related to that of the Dravidians, the impact of the Indo-Aryans shapes most Dravidian-speaking societies both culturally and genetically.
In fact, the Indo-Aryans themselves had changed genetically and culturally by the time they occupied territory within South Asia. They had mixed with people in eastern Iran and Afghanistan, reducing their steppe fraction, and then mixed again with local South Asian populations. The Indo-Iranian soma/homa cult may have been picked up from the culture of Bactria-Margiana.
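To make the dilution arithmetic concrete, here is a toy sketch of how a steppe fraction shrinks across successive admixture events. The function names and every number in it are hypothetical, chosen only for illustration; they are not estimates from either paper. The only assumption is that each incoming population carries none of the ancestry being tracked.

```python
def dilute(fraction, incoming_share):
    """Ancestry fraction left after a population takes `incoming_share`
    of its ancestry from a source that carries none of that ancestry."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if not 0.0 <= incoming_share <= 1.0:
        raise ValueError("incoming_share must be between 0 and 1")
    return fraction * (1.0 - incoming_share)


def fraction_after_pulses(start, incoming_shares):
    """Apply several admixture pulses in order and return the final fraction."""
    fraction = start
    for share in incoming_shares:
        fraction = dilute(fraction, share)
    return fraction


# Hypothetical illustration only: a fully steppe-derived group takes 30% of
# its ancestry from one steppe-free population, then 60% from another.
hypothetical_shares = [0.3, 0.6]
final_fraction = fraction_after_pulses(1.0, hypothetical_shares)
print(f"steppe fraction after two pulses: {final_fraction:.2f}")
```

The point of the sketch is only that the dilutions multiply: two moderate pulses are enough to leave a minority steppe fraction, without any single event having to be overwhelming.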
A major takeaway from these sorts of papers is the uniqueness of humans and the integrative and panmictic power of culture. From a population genetic perspective parameters such as distance and topography matter a lot. Major ecological barriers such as deserts also have an impact. But the spread of Indo-European languages and genes is more than just a matter of diffusion. A powerful cultural organism expanded, assimilated, and in some cases integrated and synthesized, huge swaths of Eurasia. The IVC society was successful for several thousand years. But it is clear that there were plenty of AHG peoples in the Indian subcontinent while it flourished in the northwest. It was the arrival of Indo-Aryans which revolutionized things so that no “pure” AHG community exists in South Asia today.
Ironically, the sons of Indra spread the seed of the Dasa far and wide, from the Himalaya to Kanyakumari.