The day of the Dasa

Unless you have been sleeping today, you may have noticed that two important papers on South Asian historical population genetics have been published. The short and simple paper is An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers. The longer paper, which is basically a book if you read the supplements, is The Formation of Human Populations in South and Central Asia (an update on a preprint that came out over a year ago).

So the “Rakhigarhi genome” is finally out. She turns out to be an interesting individual: she has some, but not much, Andamanese-related hunter-gatherer ancestry, a lot of Iranian-farmer-related ancestry, and no steppe ancestry. She is very similar to the dozen or so “Indus Periphery” samples found outside of South Asia, in the region’s near-abroad (Khorasan and into Turan). Her mtDNA is U2b2. My mtDNA is U2b. So my mother’s maternal lineage dates back to the IVC period. Not a surprise, but still cool.

The finding of greatest interest is that the “Iranian-farmer” ancestry of the Indus Valley Civilization population was possibly not “Iranian” at all. That is, it seems unlikely that the West Asian-related ancestry in the IVC people was due to a migration out of the Zagros agricultural hearth. The reasoning here is simple. There was ancient population structure in the Near East at the beginning of the Holocene. There were, roughly, three major groups that expanded: Anatolian farmers, related Levantine farmers, and more distantly related Iranian (Zagros) farmers. These groups intermixed copiously during the Holocene. All the Holocene farmers of western Iran, and even the hunter-gatherers, had some ancestry from the Anatolian lineage.

Anatolian heritage is not present in the IVC people. Because Anatolian ancestry is found in Iranian hunter-gatherers at the beginning of the Holocene, the West Asian-related ancestors of the IVC people must have diverged earlier. One option is that there was a set of hunter-gatherer populations in the territory of modern Iran, Afghanistan, and Pakistan (and possibly northwest India) who were related to each other but differentiated due to distance and separation. Modern Iran is bifurcated by some rather harsh deserts between the west and the east. There is no reason the same could not have applied during the Pleistocene, particularly during the Last Glacial Maximum.

Related to this, Iosif Lazaridis has a preprint out which argues that the difference between the “Anatolian” and “Iran” clusters lay in differential admixture with “Ancient North Eurasians” (ANE) into the latter. The non-Rakhigarhi paper above highlights the role of Turan in mediating interaction and gene flow between northern Eurasia and the Iran-Afghanistan-Central Asia region. The difference between the quasi-Iranian ancestors of the IVC people and those of the Zagros, the Iranians proper, may simply be that the ANE-related admixture was stronger further east. Or not. In some ways, the paper opens up a lot of possibilities as to the landscape of late Pleistocene western Asia. The paper reasonably interprets agriculture as having spread to northwest South Asia not through mass migration (as with the Bantu expansion or farming in Neolithic Europe) but through cultural diffusion. But the distribution and origin of the quasi-Iranian population need a lot more ancient DNA.

The origin and distribution of Andamanese-related hunter-gatherers (AHG), earlier described as “Ancient Ancestral South Indians” (AASI), also need more elucidation. It has long been known that the various East Eurasian groups seem to have separated very soon after 40,000 years ago. The AHG clade is only distantly related to the Andamanese themselves, who have more of an affinity with the Hoabinhian people of Southeast Asia. Though the diversity of mtDNA macro-haplogroup M is suggestive of long-term habitation of South Asia by some of the AHG, we cannot reject the possibility that they were intrusive from the east during the Pleistocene or Holocene, at least in part.

The awkward labels for the ancestral populations, “ANI” and “ASI” (Ancestral North Indian and Ancestral South Indian), which Indian researchers proposed to David Reich, were to some extent a political move. They left open the possibility of deep geographical indigeneity for most of the ancestry of modern South Asians. I was moderately skeptical because I suspected the ANI was intrusive from West Asia (the Iranian-farmer and steppe migration models). These results do not support that, and it may, in fact, be the case that ANI-like quasi-Iranians occupied northwest South Asia for a long time, while AHG populations hugged the southern and eastern fringes, during the height of the Pleistocene.

What a lot of these questions need are people with detailed paleoclimate knowledge. The human geography would be much easier to infer if we had a sense of the primary carrying capacity of each region. Hunter-gatherers tend to be very thin on the ground in desert areas, so deserts would serve as natural barriers to gene flow. The divergence between western and eastern Eurasian populations is rather stark, so one might suppose that the Thar desert region was particularly difficult to traverse during the Pleistocene.

At some point, I have to come back to the “Aryan question.” These papers strongly point to the likelihood that the Aryans were intrusive to the Indian subcontinent.

From the Cell paper:

Since language spreads in pre-state societies are often accompanied by large-scale movements of people (Bellwood, 2013), these results argue against the model (Heggarty, 2019) of a trans-Iranian-plateau route for Indo-European language spread into South Asia. However, a natural route for Indo-European languages to have spread into South Asia is from Eastern Europe via Central Asia in the first half of the 2nd millennium BCE, a chain of transmission now documented in detail with ancient DNA. The fact that the Steppe pastoralist ancestry in South Asia matches that in Bronze Age Eastern Europe (but not Western Europe [de Barros Damgaard et al., 2018; Narasimhan et al., 2019]) provides additional evidence for this theory, as it elegantly explains the shared distinctive features of Balto-Slavic and Indo-Iranian languages (Ringe et al., 2002).

From the Science paper:

Our results not only provide negative evidence against an Iranian plateau origin for Indo-European languages in South Asia, but also positive evidence for the theory that these languages spread from the Steppe. While ancient DNA has documented westward movements of Steppe pastoralist ancestry providing a likely conduit for the spread of many Indo-European languages to Europe (7, 8), the chain-of-transmission into South Asia has been unclear because of a lack of relevant ancient DNA. Our observation of the spread of Central_Steppe_MLBA ancestry into South Asia in the first half of the 2nd millennium BCE provides this evidence, and is particularly striking as it provides a plausible genetic explanation for the linguistic similarities between the Balto-Slavic and Indo-Iranian sub-families of Indo-European, which despite their vast geographic separation, share the Satem innovation and Ruki sound laws (63). If the spread of people from the Steppe in this period was a conduit for the spread of South Asian Indo-European languages, then it is striking that there are so few material culture similarities between the central Steppe and South Asia in the Middle to Late Bronze Age (i.e. after the middle of the 2nd millennium BCE). Indeed, the material culture differences are so substantial that some archaeologists recognize no evidence of a connection. However, lack of material culture connections does not provide evidence against spread of genes, as has been demonstrated in the case of the Beaker Complex, which originated largely in western Europe, but in Central Europe was associated with skeletons that harbored ~50% ancestry related to Yamnaya Steppe pastoralists (18).

If you look deeper into the paper, you see that the authors zeroed in on the period between 2000 and 1000 BCE for a reason. The people of the Eurasian steppe are diverse and always in flux, and the earlier and later agro-pastoralists were genetically distinct. The Yamnaya culture lacked a “European” element that arrived on the forest-steppe through demographic reflux. The later Indo-European agro-pastoralists, such as the Scythians and Kushans, tended to have East Asian ancestry, which is lacking in northwest South Asia. The particular profile found in groups such as North Indian Brahmins fits best with the steppe people who were ascendant in the period between 2000 and 1500 BCE.

There is, of course, the assertion by some Indians that Indo-European languages are indigenous to South Asia. If that is the case, then they would have had to expand out of South Asia to everywhere else. I won’t address archaeological or linguistic issues. Rather, the problem is that the spread of “steppe” ancestry between 3000 and 1000 BCE across the whole zone where Indo-European languages are spoken is so clear that it is the most likely vector, and that steppe ancestry has its origins in the…forest-steppe. Indian counter-arguments are not impossible, but they tend to be highly complicated.

To me, the more interesting aspect of the story is not the origin of the Indo-Aryans, but how they became the people depicted in the Vedas, and later in epics such as the Mahabharata and Ramayana. Let me quote from the Science paper:

Taken together, the poor fits at both extremes of the Indian Cline imply that the Indian Cline does not represent a simple mix of two homogeneous ancestral populations, ANI and ASI. Instead, in the Middle to Late Bronze Age both of these groups were themselves part of metapopulations—relatively well represented by the Steppe Cline and the Indus Periphery Cline—that were not completely homogenized at the time they met and mixed. Most groups in India today can be represented as mixtures of average points along the Steppe Cline (we show below that the ANI fit along the Steppe Cline) and the Indus Periphery Cline (the ASI) but there are deviations from this simple model that contribute to the observed patterns.

Between 1500 and 500 BCE, South Asia saw the development of Indian genetics and culture as we understand them today, from the north to the south. One of the striking aspects of the Swat valley samples in the Science paper is that AHG ancestry increases over time (along with steppe ancestry). The Swat people seem to have started out with a much higher fraction of IVC-like ancestry, very high in Iranian-related ancestry. But after 1000 BCE they integrated more and more with people to their south and east. Meanwhile, in South India, groups like the Nadars of the Tamil country are still about 5% steppe in their heritage, and non-trivial fractions of R1a1a are found among these groups.

There is now a good amount of evidence that the Austro-Asiatic Munda expanded into a landscape where unmixed AHG/AASI populations existed. Though the Science paper puts this in the 3rd millennium, I think the period between 2000 and 1000 BCE is more likely, since Austro-Asiatic rice farmers are found in northern Vietnam in 1900 BCE. The existence of unmixed AHG/AASI suggests to me that the expansion and dominance of Dravidian-speaking agricultural societies in much of South India, in the form we recognize them today, does not predate the arrival of Indo-Aryans by much, if at all. Rather than thinking of Indian culture as the application of Indo-Aryan elements atop a Dravidian base, it is more accurate, I think, to consider it a synthesis that developed simultaneously. Though it is quite likely that the IVC language was related to that of the Dravidians, the impact of the Indo-Aryans shaped most Dravidian-speaking societies both culturally and genetically.

In fact, the Indo-Aryans themselves had changed genetically and culturally by the time they occupied territory within South Asia. They had mixed with people in eastern Iran and Afghanistan, reducing their steppe fraction, and then mixed again with local South Asian populations. The Indo-Iranian soma/homa cult may have been picked up from the culture of Bactria-Margiana.

A major takeaway from these sorts of papers is the uniqueness of humans and the integrative and panmictic power of culture. From a population genetic perspective, parameters such as distance and topography matter a lot. Major ecological barriers such as deserts also have an impact. But the spread of Indo-European languages and genes is more than just a matter of diffusion. A powerful cultural organism expanded across, assimilated, and in some cases integrated and synthesized huge swaths of Eurasia. The IVC society was successful for several thousand years. But it is clear that there were plenty of AHG peoples in the Indian subcontinent while the IVC flourished in the northwest. It was the arrival of the Indo-Aryans that revolutionized things, so that no “pure” AHG community exists in South Asia today.

Ironically, the sons of Indra spread the seed of the Dasa far and wide, from the Himalaya to Kanyakumari.

India vs. China, genetically diverse vs. homogeneous

About 36% of the world’s population are citizens of the People’s Republic of China and the Republic of India. Including the other nations of South Asia (Pakistan, Bangladesh, etc.), 43% of the population lives in China or South Asia.

But, as David Reich mentions in Who We Are and How We Got Here, China is dominated by one ethnicity, the Han, while India is a constellation of ethnicities. And this is reflected in the genetics. The relative diversity of India stands in contrast to the homogeneity of China.

At the current time, the best research on population genetic variation within China is probably the preprint A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese. The authors used low-coverage sequencing of over 10,000 women to get a huge sample of variation all across China. The PCA recapitulated earlier work: genetic relatedness among the Han of China is geographically structured. The largest component of variance is north-south, but a smaller component is east-west. The north-south component explains more than 4.5 times as much variance as the east-west one.


No steppe ancestry in the Rakhigarhi samples = non sequitur

Harappan site of Rakhigarhi: DNA study finds no Central Asian trace, junks Aryan invasion theory:

The much-awaited DNA study of the skeletal remains found at the Harappan site of Rakhigarhi, Haryana, shows no Central Asian trace, indicating the Aryan invasion theory was flawed and Vedic evolution was through indigenous people.

“The Rakhigarhi human DNA clearly shows a predominant local element — the mitochondrial DNA is very strong in it. There is some minor foreign element which shows some mixing up with a foreign population, but the DNA is clearly local,” Shinde told ET. He went on to add: “This indicates quite clearly, through archeological data, that the Vedic era that followed was a fully indigenous period with some external contact.”

I haven’t heard anything definitive, but this is what I have heard: the genetics they could analyze indicate continuity, but none of the steppe element ubiquitous in modern North India (and there was contamination in the Korean lab). The Rakhigarhi samples date to between 2500 and 2250 BC, last I checked. That means they shouldn’t have any steppe ancestry if the model of a relatively late demographic impact of the Indo-Aryans, after 2000 BC, is correct.

Basically, the whole article is kind of a non sequitur. I do understand that many archaeologists think there was cultural continuity. And there could have been. But taking into account the genetics of the modern region of India where Rakhigarhi is located, there was a major demographic perturbation after 2250 BC.

Rakhigarhi sample doesn’t have steppe ancestry (probably “Indus Periphery”)

We’ve been waiting for two years now, and it looks like they’re about to pull the trigger. From Indus Valley People Did Not Have Genetic Contribution From The Steppes: Head Of Ancient DNA Lab Testing Rakhigarhi Samples:

Niraj Rai, the head of the Ancient DNA Laboratory at Lucknow’s Birbal Sahni Institute of Palaeosciences (BSIP), where the DNA samples from the Harappan site of Rakhigarhi in Haryana are being analysed, has revealed that a forthcoming paper on the work will show that there is no steppe contribution to the DNA of the Harappan people….

“It will show that there is no steppe contribution to the Indus Valley DNA,” Rai said. “The Indus Valley people were indigenous, but in the sense that their DNA had contributions from near eastern Iranian farmers mixed with the Indian hunter-gatherer DNA, that is still reflected in the DNA of the people of the Andaman islands.” He added that the paper based on the examination of the Rakhigarhi samples would soon be published on bioRxiv (pronounced “bio-archive”), a preprint repository of papers in the life sciences.

At this point none of this is surprising. I also wonder if this preprint was hastened by the release of The Genomic Formation of South and Central Asia. The results here seem totally consonant with what came before. My expectation is that the lone sample they got genetic material out of will be similar to the “Indus Periphery” (InPe) individuals in the earlier preprint: a mix of West Asian ancestry, strongly shifted toward eastern Iran, and indigenous South Asian “hunter-gatherer” ancestry. That’s pretty much what Niraj Rai states in the piece. I think genetically the individual won’t be that different from the Chamars of modern-day Punjab.

In fact, Rai, the lead researcher, ends by twisting the knife:

In other words, the preprint observes that the migration from the steppes to South Asia was the source of the Indo-European languages in the subcontinent. Commenting on this, Rai said, “any model of migration of Indo-Europeans from South Asia simply cannot fit the data that is now available.”

A major caveat here is that we’re talking about one sample from the eastern edge of the Indus Valley Civilization (IVC). I’m not sure that this should adjust our probabilities that much. From all the other things we know, as well as copious ancient DNA from Central Asia, our probability for the model which the Rakhigarhi result aligns with should already be quite high.

Again, since it’s one sample, we need to be cautious…but I bet that once we have more samples from the IVC, the Rakhigarhi individual will turn out to be enriched for AASI relative to other IVC samples. The InPe samples in The Genomic Formation of South and Central Asia exhibited some variation, and it’s likely that the IVC region was genetically heterogeneous.

But this is going to be a DNA sample from an individual who lived 4,600 years ago within the orbit of the IVC during its mature phase. That’s still a big deal. As most of you know, the IVC is prehistoric because we haven’t deciphered the seals associated with this civilization. But the IVC clearly had relationships with West Asia and Central Asia, with parts of eastern Iran and the BMAC culture both influenced by it and interacting with it. Traders who were likely from the IVC seem to be mentioned in Mesopotamian records.

Additionally, the genetics of one individual can be highly informative if it’s high-quality whole-genome data (I’m skeptical of that in this case). One could possibly even identify the time period that admixture between West Asian and AASI components occurred from a single genome, by looking at ancestry tract lengths.
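To make the tract-length idea concrete, here is a rough sketch under the simplest possible assumption: a single pulse of admixture T generations ago. In that idealized model, tracts of an ancestry that makes up a fraction m of the genome are approximately exponentially distributed with mean 1/((1 - m)T) Morgans, so T can be estimated as 1/((1 - m) × mean tract length). The function names, the 28-year generation time, and the toy tract lengths below are hypothetical illustrations, not values or methods from either paper. Real analyses use more elaborate models and have to contend with phasing and tract-calling errors, which is part of why a single degraded ancient genome makes this hard.

```python
from statistics import mean


def generations_since_admixture(tract_lengths_morgans, ancestry_fraction):
    """Estimate generations since a single pulse of admixture (hypothetical helper).

    Assumes an idealized single-pulse model in which tracts of an ancestry
    making up `ancestry_fraction` (m) of the genome are exponentially
    distributed with mean 1 / ((1 - m) * T) Morgans, so T = 1 / ((1 - m) * mean).
    """
    if not 0.0 < ancestry_fraction < 1.0:
        raise ValueError("ancestry_fraction must be strictly between 0 and 1")
    if not tract_lengths_morgans:
        raise ValueError("need at least one tract length")
    mean_length = mean(tract_lengths_morgans)
    return 1.0 / ((1.0 - ancestry_fraction) * mean_length)


def years_since_admixture(generations, years_per_generation=28.0):
    """Convert generations to years; 28 years per generation is an assumed value."""
    return generations * years_per_generation


# Toy example with made-up tract lengths (in Morgans), not real data.
toy_tracts = [0.05, 0.10, 0.15]
t = generations_since_admixture(toy_tracts, ancestry_fraction=0.5)
print(round(t), round(years_since_admixture(t)))  # prints: 20 560
```

The key design point is that older admixture means more generations of recombination and therefore shorter tracts, so the estimate is inversely proportional to the mean tract length.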

A single sample isn’t going to falsify the idea held by some that steppe peoples were long present within the IVC. Perhaps they’ll show up in other samples? That’s possible, and it’s what I would argue if I held their position, but I think the constellation of evidence on the balance now does suggest that a relatively late incursion into South Asia is likely. The steppe ancestry with Northern European affinities shows up in BMAC only around 4,000 years ago. It is hard to imagine it was in South Asia before it was in Central Asia.

As I’ve been saying for a while, it seems that though there will be more genetic work on India in the near future, the real analysis is going to have to come out of archaeology and mythology.

It’s pretty clear that in Northern Europe the arrival of the Corded Ware peoples from the steppe zone resulted in great tumult. A linguistic analysis suggests that the languages of Northern Europe share agricultural vocabulary of non-Indo-European origin and common provenance. But we don’t have much in the way of mythos about the arrival of the Corded Ware.

In contrast, India has a rich mythos which seems to date to the early period of the arrival of the Indo-Aryans. One interpretation has been that since these myths seem to take as a given that Indo-Aryans were autochthonous to India, they were. But the genetic data strongly suggest that the arrival of pastoralists in South Asia was concomitant with their arrival in West Asia, and somewhat after their expansion westward into Europe. Indian tradition and mythos could actually be a window into the general process of how these pastoralists dealt with native peoples, and an illustration of the sort of cultural synthesis that often occurred.

The population genomics of South Asia is complicated, and politics doesn’t make it easier


Many people have been sending me links to this article, By rewriting history, Hindu nationalists aim to assert their dominance over India. Here’s a key section:

The RSS asserts that ancestors of all people of Indian origin – including 172 million Muslims – were Hindu and that they must accept their common ancestry as part of Bharat Mata, or Mother India. Modi has been a member of the RSS since childhood. An official biography of Culture Minister Sharma says he too has been a “dedicated follower” of the RSS for many years.

Sharma told Reuters he expects the conclusions of the committee to find their way into school textbooks and academic research. The panel is referred to in government documents as the committee for “holistic study of origin and evolution of Indian culture since 12,000 years before present and its interface with other cultures of the world.”

Sharma said this “Hindu first” version of Indian history will be added to a school curriculum which has long taught that people from central Asia arrived in India much more recently, some 3,000 to 4,000 years ago, and transformed the population

There are several threads here. First, it is a fact that the ancestors of South Asia’s non-Hindus were Hindu. There are minor exceptions, such as the Parsis, who are ~75% Iranian. One can quibble as to whether many tribal and peasant populations were truly Hindu in a formal and explicit sense. But I think this is a semantic dodge. Muslims would recognize these beliefs and practices as Hindu, whether one was a Brahmin monk or a member of a tribe that still sacrificed animals.

I’ve looked at the genotypes of a fair amount of South Asians of Muslim background. The overwhelming (usually exclusive) proportion of their ancestry is South Asian. It’s a fact that the ancestors of non-Hindu South Asians were Hindu.

But the article shows that a dominant theme in Hindu nationalism today is that distinctive, historically important groups like the Indo-Aryans are indigenous to South Asia. This is set against a narrative of invasions and migrations from the outside, which is presumed more friendly to a multicultural paradigm (I have a hard time keeping track of the political valence of all these things). To some extent, the reality of invasions and migrations cannot be denied, whether it be Alexander, the Kushans, or the various Muslim groups. But these historical invasions left little genetic imprint.

When Reconstructing Indian Population History was published in 2009, our understanding of the impact of earlier migrations changed. By the time the ancient Greeks were recording observations of India in Classical Antiquity, it was already noted as the most populous nation in the world, so any large-scale admixture in its prehistory would have required a very large influx of people. I was initially skeptical about the result in Reconstructing Indian Population History, that there was massive admixture between West Eurasians (ANI) and indigenous South Asians (ASI), because that would imply massive migration. Additionally, the pigmentation genes didn’t seem to fit if the source population was European-like.

Nearly 10 years on we have a lot more clarity. Ancient DNA has changed our understanding of the past. Massive migrations were common. And, the pigmentation and genetic profile of modern Europeans is recent, within the last 4,000 years. The source population(s) for “Ancestral North Indians” (ANI) may not have been Europeans in the way we’d understand them. In fact, a follow-up paper, Genetic Evidence for Recent Population Mixture in India, hinted at two admixtures. There’s a fair amount of circumstantial evidence now that one component of “Ancestral North Indian” relates to West Asian populations and another component to the more classical steppe Indo-Aryans. The former is more widespread across the subcontinent than the latter, which is concentrated in the northwest and among upper castes.

I do understand Indians who want to interpret their own history through the lens of their own cultural priors. The problem is that genetic science has proceeded so fast in the last few years that many propositions which were speculative in the 20th century are testable in the 21st. Some Hindu nationalist friends and acquaintances express embarrassment and worry about the track that Indian nationalism is taking. I don’t know what to say, but Americans have their own delusions and blithe acceptance of propaganda, so I’m not going to be the one pointing fingers. Other Indians have told me via Facebook that they “believe in the results from the 2000s” (when those results were more congenial to their viewpoints?). I guess that’s one strategy: keep up with the science only until it starts refuting your model.


The Indo-Aryan migration to the Indian subcontinent

The piece is up at India Today. The headline and title are of course optimized for clicks. I would, for example, say that the Indo-Aryans came from the west, not the West.

In the course of writing this, it has become clear that many people have very specific commitments on this issue. I think it is clear I do not. Genetic inference methods produce wide confidence intervals around particular dates, so I’ll leave it to those with more archaeological knowledge to argue over specific dates. But it strikes me that the dates point to a likelihood that much of the expansion and diversification of the Indo-Aryans may precede their expansion into the Gangetic plain ~1500 BCE, the date preferred by many scholars.

Apparently we shouldn’t have to wait too long for ancient DNA from Rakhigarhi (months, not years). But I doubt it will settle anything; more likely it will be preliminary and set off new debates.