India vs. China, genetically diverse vs. homogeneous

About 36% of the world’s population are citizens of the People’s Republic of China and the Republic of India. Including the other nations of South Asia (Pakistan, Bangladesh, etc.), 43% of the population lives in China or South Asia.

But, as David Reich mentions in Who We Are and How We Got Here, China is dominated by one ethnicity, the Han, while India is a constellation of ethnicities. And this is reflected in the genetics. The relative diversity of India stands in contrast to the homogeneity of China.

At the current time, the best research on population genetic variation within China is probably the preprint A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese. The authors used low-coverage sequencing of over 10,000 women to get a huge sample of variation from all across China. The PCA recapitulated earlier work: genetic relatedness among the Han of China is geographically structured. The largest component of variance is north-south, with a smaller east-west component. The north-south element explains more than 4.5 times as much variance as the east-west.
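To make the "variance explained" comparison concrete, here is a minimal sketch of how a north-south versus east-west ratio falls out of a PCA. It is not the preprint's pipeline (which works from low-coverage sequence data); it assumes hard genotype calls coded 0/1/2 and uses toy data with a strong "latitude" cline and a weaker "longitude" cline. The function names, effect sizes, and sample sizes are all made up for illustration.

```python
import numpy as np

def simulate_genotypes(n_ind=300, n_snps=2000, ns_effect=0.12, ew_effect=0.06, seed=0):
    """Toy data: allele frequencies shift along two hypothetical geographic axes."""
    rng = np.random.default_rng(seed)
    lat = rng.uniform(-1, 1, n_ind)            # stand-in for north-south position
    lon = rng.uniform(-1, 1, n_ind)            # stand-in for east-west position
    base = rng.uniform(0.2, 0.8, n_snps)       # baseline allele frequency per SNP
    ns_slope = rng.normal(0, ns_effect, n_snps)
    ew_slope = rng.normal(0, ew_effect, n_snps)
    freq = base + np.outer(lat, ns_slope) + np.outer(lon, ew_slope)
    freq = np.clip(freq, 0.01, 0.99)
    geno = rng.binomial(2, freq)               # diploid genotype counts: 0, 1, or 2
    return geno.astype(float), lat, lon

def pca_variance_explained(genotypes, n_components=2):
    """Return PC scores and the fraction of total variance on each top PC."""
    centered = genotypes - genotypes.mean(axis=0)   # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    frac = s**2 / np.sum(s**2)                      # squared singular values ~ PC variances
    scores = u[:, :n_components] * s[:n_components]
    return scores, frac[:n_components]

if __name__ == "__main__":
    geno, lat, lon = simulate_genotypes()
    scores, frac = pca_variance_explained(geno)
    print("PC1/PC2 variance ratio:", frac[0] / frac[1])
    print("PC1 vs north-south axis:", abs(np.corrcoef(scores[:, 0], lat)[0, 1]))
    print("PC2 vs east-west axis:", abs(np.corrcoef(scores[:, 1], lon)[0, 1]))
```

In the toy data, PC1 tracks the stronger cline and PC2 the weaker one, and the ratio of their variance fractions is the same kind of number as the "4.5 times" figure quoted above. Real analyses usually also standardize SNPs by allele frequency and prune for linkage, which this sketch skips.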

No steppe ancestry in the Rakhigarhi samples = non sequitur

Harappan site of Rakhigarhi: DNA study finds no Central Asian trace, junks Aryan invasion theory:

The much-awaited DNA study of the skeletal remains found at the Harappan site of Rakhigarhi, Haryana, shows no Central Asian trace, indicating the Aryan invasion theory was flawed and Vedic evolution was through indigenous people.

“The Rakhigarhi human DNA clearly shows a predominant local element — the mitochondrial DNA is very strong in it. There is some minor foreign element which shows some mixing up with a foreign population, but the DNA is clearly local,” Shinde told ET. He went on to add: “This indicates quite clearly, through archeological data, that the Vedic era that followed was a fully indigenous period with some external contact.”

I haven’t heard anything definitive, but this is what I have heard: that the genetics they could analyze indicates continuity, but with none of the steppe element ubiquitous in modern North India (and that there was contamination in the Korean lab). The Rakhigarhi samples date to 2500 to 2250 BC, last I checked. That means they shouldn’t have any steppe ancestry if the model of the relatively late demographic impact of Indo-Aryans after 2000 BC is correct.

Basically, the whole article is kind of a non sequitur. I do understand that many archaeologists think there was continuity culturally. And there could have been. But taking into account the genetics of the modern region of India where Rakhigarhi is located, there was a major demographic perturbation after 2250 BC.

Rakhigarhi sample doesn’t have steppe ancestry (probably “Indus Periphery”)

We’ve been waiting for two years now, and it looks like they’re about to pull the trigger: Indus Valley People Did Not Have Genetic Contribution From The Steppes: Head Of Ancient DNA Lab Testing Rakhigarhi Samples:

Niraj Rai, the head of the Ancient DNA Laboratory at Lucknow’s Birbal Sahni Institute of Palaeosciences (BSIP), where the DNA samples from the Harappan site of Rakhigarhi in Haryana are being analysed, has revealed that a forthcoming paper on the work will show that there is no steppe contribution to the DNA of the Harappan people….

“It will show that there is no steppe contribution to the Indus Valley DNA,” Rai said. “The Indus Valley people were indigenous, but in the sense that their DNA had contributions from near eastern Iranian farmers mixed with the Indian hunter-gatherer DNA, that is still reflected in the DNA of the people of the Andaman islands.” He added that the paper based on the examination of the Rakhigarhi samples would soon be published on bioRxiv (pronounced “bio-archive”), a preprint repository of papers in the life sciences.

At this point none of this is surprising. I also wonder if this preprint was hastened by the release of The Genomic Formation of South and Central Asia. It seems that the results here are totally consonant with what came before. My expectation is that the lone sample that they got genetic material out of will be similar to the “Indus Periphery” (InPe) individuals in the earlier preprint: a mix of West Asian ancestry strongly shifted toward eastern Iran, and indigenous South Asian “hunter-gatherer.” That’s pretty much what Niraj Rai states in the piece. I think genetically the individual won’t be that different from the Chamars of modern day Punjab.

In fact, Rai, the lead researcher, ends by twisting the knife:

In other words, the preprint observes that the migration from the steppes to South Asia was the source of the Indo-European languages in the subcontinent. Commenting on this, Rai said, “any model of migration of Indo-Europeans from South Asia simply cannot fit the data that is now available.”

A major caveat here is that we’re talking about one sample from the eastern edge of the Indus Valley Civilization (IVC). I’m not sure that this should adjust our probabilities that much. From all the other things we know, as well as copious ancient DNA from Central Asia, our probability for the model which the Rakhigarhi result aligns with should already be quite high.

Again, since it’s one sample, we need to be cautious…but I bet once we have more samples from the IVC the Rakhigarhi individual will probably be enriched for AASI relative to other samples from the IVC. The InPe samples in The Genomic Formation of South and Central Asia exhibited some variation, and it’s likely that the IVC region was genetically heterogeneous.

But, this is going to be a DNA sample from an individual who lived 4,600 years ago within the orbit of the IVC when it was in its mature phase. That’s still a big deal. As most of you know, the IVC is prehistory because we haven’t deciphered the seals which are associated with this civilization. But the IVC clearly had relationships with West Asia and Central Asia, with parts of eastern Iran and the BMAC culture both being influenced by it and interacting with it. Traders who were likely from the IVC seem to be mentioned in Mesopotamian records.

Additionally, the genetics of one individual can be highly informative if it’s high-quality whole-genome data (I’m skeptical of that in this case). One could possibly even identify when admixture between the West Asian and AASI components occurred from a single genome, by looking at ancestry tract lengths.
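The logic of tract-length dating is that recombination chops ancestry segments into shorter pieces every generation, so short tracts imply old admixture. A minimal sketch, assuming a single pulse of admixture and the common approximation that tracts from one source are exponentially distributed with rate (generations × fraction of the other ancestry) per Morgan. The function names and the numbers in the example are hypothetical, and the sketch ignores real-world problems like undetectable short tracts and chromosome ends.

```python
import numpy as np

def estimate_generations(tract_lengths_morgans, other_ancestry_fraction):
    """Point estimate of generations since a single admixture pulse.

    Under the assumed model the mean tract length is
    1 / (generations * other_ancestry_fraction) Morgans, so we invert that.
    """
    mean_len = float(np.mean(tract_lengths_morgans))
    return 1.0 / (mean_len * other_ancestry_fraction)

def simulate_tracts(generations, other_ancestry_fraction, n_tracts, seed=0):
    """Draw toy tract lengths (in Morgans) from the assumed exponential model."""
    rng = np.random.default_rng(seed)
    rate = generations * other_ancestry_fraction   # breakpoints per Morgan
    return rng.exponential(scale=1.0 / rate, size=n_tracts)

if __name__ == "__main__":
    # Hypothetical scenario: admixture 60 generations before the individual lived,
    # with the other ancestry making up 70% of the genome.
    tracts = simulate_tracts(generations=60, other_ancestry_fraction=0.7, n_tracts=5000)
    print("estimated generations:", estimate_generations(tracts, 0.7))
```

With thousands of tracts from a single good genome the estimate is fairly tight in this idealized setting, which is why one high-coverage individual can be so informative. With patchy low-coverage data the tracts themselves are hard to call, which is part of my skepticism here.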

A single sample isn’t going to falsify the idea held by some that steppe peoples were long present within the IVC. Perhaps they’ll show up in other samples? That’s possible, and it’s what I would argue if I held their position, but I think the constellation of evidence on balance now does suggest that a relatively late incursion into South Asia is likely. The steppe ancestry with Northern European affinities shows up in BMAC only around 4,000 years ago. It is hard to imagine it was in South Asia before it was in Central Asia.

As I’ve been saying for a while it seems that though there will be more genetic work written on India in the near future, the real analysis is going to have to come out of archaeology and mythology.

It’s pretty clear that in Northern Europe the arrival of the Corded Ware peoples from the steppe zone resulted in great tumult. A linguistic analysis suggests that the languages of Northern Europe have words related to agriculture with a non-Indo-European origin, of common provenance. But we don’t have much in the way of mythos about the arrival of the Corded Ware.

In contrast, India has a rich mythos which seems to date to the early period of the arrival of the Indo-Aryans. One interpretation has been that since these myths seem to take as a given that Indo-Aryans were autochthonous to India, they were. But the genetic data seem to be strongly suggesting that the arrival of pastoralists occurred in South Asia concomitant with their arrival in West Asia, and somewhat after their expansion westward into Europe. Indian tradition and mythos could actually be a window into the general process of how these pastoralists dealt with native peoples and an illustration of the sort of cultural synthesis that often occurred.

The population genomics of South Asia is complicated, and politics doesn’t make it easier

Many people have been sending me links to this article, By rewriting history, Hindu nationalists aim to assert their dominance over India. Here’s a key section:

The RSS asserts that ancestors of all people of Indian origin – including 172 million Muslims – were Hindu and that they must accept their common ancestry as part of Bharat Mata, or Mother India. Modi has been a member of the RSS since childhood. An official biography of Culture Minister Sharma says he too has been a “dedicated follower” of the RSS for many years.

Sharma told Reuters he expects the conclusions of the committee to find their way into school textbooks and academic research. The panel is referred to in government documents as the committee for “holistic study of origin and evolution of Indian culture since 12,000 years before present and its interface with other cultures of the world.”

Sharma said this “Hindu first” version of Indian history will be added to a school curriculum which has long taught that people from central Asia arrived in India much more recently, some 3,000 to 4,000 years ago, and transformed the population.

There are several threads here. First, it is a fact that the ancestors of South Asia’s non-Hindus were Hindu. There are minor exceptions, such as the Parsis, who are ~75% Iranian. One can quibble as to whether many tribal and peasant populations were truly Hindu in a formal and explicit sense. But I think this is a semantic dodge. Muslims would recognize these beliefs and practices as Hindu, no matter if one was a Brahmin monk or a member of a tribe which still sacrificed animals.

I’ve looked at the genotypes of a fair number of South Asians of Muslim background. The overwhelming (usually exclusive) proportion of their ancestry is South Asian. It’s a fact that the ancestors of non-Hindu South Asians were Hindu.

But the article, and a dominant theme in Hindu nationalism today, hold that distinctive, historically important groups like Indo-Aryans are indigenous to South Asia. This is set against a narrative of invasions and migrations from the outside, which is presumed more friendly to a multicultural paradigm (I have a hard time keeping track of the political valence of all these things). To some extent, the reality of invasions and migrations cannot be denied, whether it be Alexander, the Kushans, or the various Muslim groups. But these historical invasions left little genetic imprint.

When Reconstructing Indian Population History was published in 2009, the picture changed as to the impact of the earlier migrations. By the time the ancient Greeks were recording observations of India in Classical Antiquity, it was already noted as the most populous nation in the world. I was initially skeptical about the result in Reconstructing Indian Population History, that there was massive admixture between West Eurasians (ANI) and indigenous South Asians (ASI), because that would imply massive migration. Additionally, phenotypically the pigmentation genes didn’t seem to work out if the source population was European-like.

Nearly 10 years on we have a lot more clarity. Ancient DNA has changed our understanding of the past. Massive migrations were common. And, the pigmentation and genetic profile of modern Europeans is recent, within the last 4,000 years. The source population(s) for “Ancestral North Indians” (ANI) may not have been Europeans in the way we’d understand them. In fact, a follow-up paper, Genetic Evidence for Recent Population Mixture in India, hinted at two admixtures. There’s a fair amount of circumstantial evidence now that one component of “Ancestral North Indian” relates to West Asian populations and another component to the more classical steppe Indo-Aryans. The former is more widespread across the subcontinent than the latter, which is concentrated in the northwest and among upper castes.

I do understand Indians who want to interpret their own history through the lens of their own cultural priors. The problem is that genetic science has proceeded so fast in the last few years that many propositions which were speculative in the 20th century are testable in the 21st century. Some Hindu nationalist friends and acquaintances express embarrassment and worry about the track that Indian nationalists are on. I don’t know what to say, but Americans have their own delusions and blithe acceptance of propaganda, so I’m not going to be the one pointing fingers. Other Indians have told me via Facebook that they “believe in the results from the 2000s” (when they were more congenial to their viewpoints?). I guess that’s one strategy; just keep up with the science until it starts refuting your model.

The Indo-Aryan migration to the Indian subcontinent

The piece is up at India Today. The headline and title are of course optimized for clicks. I would, for example, say that the Indo-Aryans came from the west, not the West.

In the course of writing this it has become clear that many people have very specific commitments on this issue. I think it is clear I do not. Genetic inference methods have wide shoulders of confidence around particular dates. So I’ll leave it to those with more archaeological knowledge to argue over specific dates. But it strikes me that the dates point to a likelihood that much of the expansion and diversification of Indo-Aryans may precede their expansion into the Gangetic plain ~1500 BCE, the date preferred by many scholars.

Apparently we shouldn’t have to wait too long for ancient DNA from Rakhigarhi (months, not years). But I doubt that will settle anything, as opposed to being preliminary and setting off new debates.