DNA results from Rakhigarhi are now being reported (really!)

It looks like Outlook India is the first out of the gates to start reporting on the results from Rakhigarhi in northwest India, in a piece titled We Are All Harappans. This is a “mature phase” Harappan site that dates to about 2250 BC or so. Media reports have always been garbled on this topic, so anything that does not come out of a paper needs to be treated cautiously. But I’ve heard some of the same things from independent sources from a while back, so I believe that this reporting is broadly on the mark.

Basically, the individual(s) they got DNA out of did not have any Eurasian steppe ancestry. This seems to confirm again that Eurasian steppe ancestry, which is found in fractions as high as ~30% in twice-born varna in Northern India (e.g., Rajputs, Tiwari Brahmins), arrived after 2000 BC. That is, after the peak period of the Indus Valley Civilization.

Again, one has to be wary of anything from the media because I’ve heard so many confusing things, including claims of garbled quotes, but here’s one of the authors of the forthcoming paper being quoted:

We did some analysis to figure out the exact date of the admixture. We have prepared a model in which all these stats fit together very tightly and that model suggests the Central Asian admixture happened about 1500-1000 BC…. Significant mixing happened around 1000 BC, also at 800 BC and 600 BC.

This is totally in line with the results from the March preprint discussed in the piece. That is, the Swat Valley samples show admixture and genetic change after 1200 BC. And the semi-historical understanding that we have of India during the period between 1000 BC and the rise of the Mauryas is that it was a society in flux. But the only way the dating could have been changed by the Rakhigarhi results is if the genome is of high enough quality to let them narrow down the parameters of some of the admixture estimates.

One thing to keep in mind is that it is unlikely that the “Harappan people” were one single people genetically. There was probably a lot of variation in admixture with the indigenous South Asian substrate. And, I believe that the inflated steppe & AASI (“Ancient Ancestral South Indian”) ancestry you see in some North Indian Brahmin groups compared to Sindhis (who are more “Iranian”) is evidence that the Indo-Aryan intrusion resulted in an expansion of people with West Eurasian ancestry much deeper into South Asia than was the case with the Harappans.

And of the Harappans, some of the Indian scholars have asserted that their descendants are still present in the region. I think this is right, insofar as some of the jati groups, often scheduled caste, in the northwestern region of South Asia share a lot more affinity with populations to the south and east.

Related: Michael Witzel has commentary from a more linguistic perspective. If the “Para-Munda” hypothesis is right, I think what Witzel is seeing is the substrate language on which Munda was overlain, because Munda people are clearly intrusive from Southeast Asia in the period between 2000 and 1000 BC.

Addendum: If a relatively late intrusion (after 1500 BC) of Indo-Aryans to South Asia is supported by the evidence, it would be interesting in light of the high likelihood that Indo-Aryans were present in the region of upper Mesopotamia before 1500 BC. I believe that these “Indo”-Aryans actually probably never had any contact with South Asia, but descended from the horizon of cultures of which Sintashta and Andronovo were constituents. The Indo-Aryans who arrived in South Asia were probably from a different branch, and likely had interactions with other peoples in what is today eastern Iran and Afghanistan.

No steppe ancestry in the Rakhigarhi samples = non sequitur

Harappan site of Rakhigarhi: DNA study finds no Central Asian trace, junks Aryan invasion theory:

The much-awaited DNA study of the skeletal remains found at the Harappan site of Rakhigarhi, Haryana, shows no Central Asian trace, indicating the Aryan invasion theory was flawed and Vedic evolution was through indigenous people.

“The Rakhigarhi human DNA clearly shows a predominant local element — the mitochondrial DNA is very strong in it. There is some minor foreign element which shows some mixing up with a foreign population, but the DNA is clearly local,” Shinde told ET. He went on to add: “This indicates quite clearly, through archeological data, that the Vedic era that followed was a fully indigenous period with some external contact.”

I haven’t heard anything definitive, but this is what I have heard: that the genetics they could analyze indicates continuity, but none of the steppe element ubiquitous in modern North India (and that there was contamination in the Korean lab). The Rakhigarhi samples date to 2500 to 2250 BC last I checked. That means they shouldn’t have any steppe ancestry if the model of the relatively late demographic impact of Indo-Aryans after 2000 BC is correct.

Basically, the whole article is kind of a non sequitur. I do understand that many archaeologists think there was continuity culturally. And there could have been. But taking into account the genetics of the modern region of India where Rakhigarhi is located, there was a major demographic perturbation after 2250 BC.

The peopling of the Indian subcontinent at the dawn of knowing

A few people have been pointing me to a new paper, A Bayesian phylogenetic study of the Dravidian language family, which implies that the Dravidian language family diversified ~4,500 years ago. I don’t have much to say about the paper itself since it aligns with my own conclusions, but it’s well outside of any field that I can judge (though it does use standard phylogenetic packages I’ve used).

Recently I’ve been going back to old posts of mine on South Asian population genetics because no matter how much some people drag their feet on this question, we’re pretty close to knowing how South Asians came to be. Here’s what I said in December of 2010:

Who were the Indo-Iranians? I lean toward the proposition that they do derive from the Andronovo culture of the Eurasian steppe. This would date the entrance and expansion of Indo-Aryans in northern India 3-4,000 years ago. I also contend that the dominant element of ancestry among modern South Asians is not Indo-Aryan. Rather, it is an ancient stabilized hybrid of pre-agricultural societies in the Indus valley and Neolithic farmers who originated from what is today western Iran and eastern Anatolia. Therefore, I posit that the “Aryanization” of the Indian subcontinent is properly modeled as the same processes which led to the emergence of an Anatolian and Rumelian Turkish identity; a small elite population which forces an identity shift among the majority.

Where was I wrong? Where was I right?

Even looking at ADMIXTURE plots, which don’t always give an accurate sense of population history, it seemed likely that “Ancestral North Indian” (ANI) was not one thing. Some South Asian populations seemed to have much stronger affinities to West Asian populations, and in particular to those from highland West Asia, toward the Caucasus. These include groups in southern Pakistan, but also to some extent in South India. In contrast, other groups had affinities with Eastern European populations: in particular, high caste North Indians, and to a lesser extent Indo-European peoples more generally.

I think I got the dynamic correct. Subsequent analyses comparing ancient DNA from the Caucasus and Iran suggest that all South Asians have a lot of shared drift (ergo, ancestry) with highland West Asians, while a smaller subset has high shared drift (ergo, ancestry) with pastoralists from the Eurasian steppe. The groups match up with what the ADMIXTURE plots were suggesting.

There was more than one pulse of ANI-like ancestry: one of them was more like West Asians and one more like Europeans. Remember, this is before we knew the acronyms ANE, WHG, and EEF. Or CHG and Eastern Middle Eastern Farmers and Western Middle Eastern Farmers.

But, I think I was wrong about the magnitude of the admixture. This was before ancient DNA had revolutionized our understanding of population movement and turnover. I was still resisting the mass migration of a whole folk across huge distances. I’m more open to that now. I am not sure I still believe the very high steppe fractions implied in some of the recent analyses, but it’s certainly higher than I would have believed back then.

Finally, the recent diversification of the Dravidian languages supports the model that their current distribution is not primordial. Rather, they probably expanded relatively recently from the northwest of the subcontinent. Probably earlier than the Indo-Aryan expansion into the Gangetic plain, but not that much earlier.

Additionally, because the Dravidians were not primordial, but expanding only somewhat ahead of Indo-Aryans, they were part of an interactive social-cultural sphere with the Indo-Aryans. I think the very high frequency of R1a1a-Z93 in some non-Brahmin South Indian groups, even tribal ones, suggests the expansiveness of some paternal Indo-Aryan kin networks across the whole subcontinent.

Addendum: Much of the attention goes to the ANI dynamics. But though recent work attests to the overwhelming diversity, and basal character, of South Asian mtDNA lineages, we can’t be entirely sure that they are indigenous without ancient DNA. If a migration from the east at the Pleistocene-Holocene boundary was characterized by gradual diffusion of groups with reasonable effective population sizes, they could have brought over their diversity.

The genetics of the St. Thomas Christians

First, I have to say I appreciate everyone who keeps sending data to the South Asian Genotype Project. Basically, I’m automating the pipeline, finding ways to merge data from a host of sources, but also figuring out how to refine the analysis.

But until then, today I decided to do some more manual analysis of three St. Thomas Christian samples I have (also called Nasranis). The reason is that there were some questions on Twitter in relation to the genetics of this group, and though three is not a great sample size, it’s better than nothing.

The St. Thomas Christians are a diverse group of people in the southern state of Kerala who have diverse origin stories. Today the St. Thomas Christians have a range of denominational and sectarian affinities, but their origins probably have something to do with the Church of the East.

These Christians claim roots among the local Brahmin community, Jews, and West Asian settlers. To be honest, whenever people tell me about the Brahmin ancestors, unless they were recent converts, I discount this because there are about ten times as many St. Thomas Christians in Kerala as there are Brahmins. There is a small Jewish community in the area, and this region of India was long part of the Indian Ocean trade network of the Arabs.

I merged the three Nasrani samples with a lot of other populations. Zooming in on the South Asians, if you look at the PCA plot, you’ll see that they are not in the same cluster as the South Indian Brahmins (Brahmins from the four South Indian states are very similar to each other). But, in comparison to non-Brahmin South Indians, they do seem Brahmin shifted.

As I have observed before, these South Indian Brahmins can be thought of as more than 50% North Indian Brahmin, with the remainder being South Indian non-Brahmin. Aside from exotic exceptions (Parsis, Bengalis), most South Asians exist on an ANI-ASI “cline,” with lower caste South Indians being at one end of the cline (more ASI), and populations in the far northwest, such as the Kalash, being at the other end (more ANI). The PCA would suggest that the Nasrani are more ANI-shifted than a generic South Indian group, but less so than South Indian Brahmins.

Using Treemix to detect gene flow events, what I found is that the Nasranis look like a generic South Indian group. There’s no evidence of gene flow from Middle Eastern populations (Jews, Persians).

I did some f-3 tests and there isn’t anything conclusive I see to suggest Middle Eastern gene flow into the Nasranis.
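For anyone who hasn’t run into the statistic, here is a minimal sketch of what an f-3 test measures. This is a naive toy version under my own assumptions, not the software used for the analysis above: real f-3 tests add a finite-sample bias correction and get a Z-score from a block jackknife, neither of which is done here. The function name `f3` and all of the allele frequencies are hypothetical, made up for illustration. The idea is that f3(target; A, B) averages (c - a)(c - b) over SNPs, and a clearly negative value is the signature of the target being a mix of populations related to A and B.

```python
from statistics import mean


def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies.

    No bias correction and no standard error: illustration only.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must have the same length")
    return mean((c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b))


# Invented allele frequencies at three SNPs for two source populations.
pop_a = [0.1, 0.9, 0.2]
pop_b = [0.9, 0.1, 0.8]

# A target that is a 50/50 mix of the two sources sits between them
# at every SNP, so each product is negative and so is the average.
mixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]

# A target that has merely drifted away from pop_a, without mixing,
# falls outside the a-b interval and gives a positive value.
drifted = [0.0, 1.0, 0.1]

print("mixed:  ", f3(mixed, pop_a, pop_b))
print("drifted:", f3(drifted, pop_a, pop_b))
```

A non-negative result does not rule admixture out, since drift in the target after mixing can mask the signal, which is one reason an inconclusive f-3 is only weak evidence by itself.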

Finally, I ran ADMIXTURE in supervised mode. Here are the average results for a set of South Asian populations (mean values):

Group                 Druze  Georgian  Han  Iranian  Telugu  Yemenite Jew
Bangladeshi           1%     2%        12%  1%       83%     1%
Chamar                0%     0%        3%   0%       97%     0%
Gujurati_Patel        0%     1%        0%   10%      89%     0%
UP Kshatriya          0%     3%        1%   21%      76%     0%
Nasrani               0%     4%        1%   12%      83%     0%
Pathan                0%     4%        1%   55%      40%     0%
Piramalai_Kallar      0%     0%        2%   0%       97%     0%
SI_Brahmin            0%     4%        1%   16%      78%     0%
Telugu_Reddy          0%     3%        0%   0%       94%     3%
UP_Brahmin            0%     4%        1%   26%      69%     0%
UP_Kayastha           0%     0%        1%   20%      79%     0%
Velama                1%     1%        0%   2%       96%     0%
West_Bengal_Kayastha  0%     0%        7%   8%       85%     0%

In these results, the Nasrani do look shifted in the same direction as South Indian Brahmins, though less so. Observe that there is no clear Middle Eastern signal in the Nasrani above and beyond what you see in South Asians. This, despite the fact that Indian Jews show a very strong signal of admixture from the Middle East. At this point, I am confident in rejecting Nasrani St. Thomas Christian origins in a converted Jewish community, or one with a large degree of West Asian admixture.
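The comparison in the paragraph above can be checked with a few lines of arithmetic on the table. This is just a sketch: the percentages are copied from the table, but the two groupings of reference populations (`LEVANT_ARABIA` and `WEST_ASIAN_HIGHLAND`) and the helper `share` are my own hypothetical labels for illustration, not part of the ADMIXTURE run.

```python
COMPONENTS = ("Druze", "Georgian", "Han", "Iranian", "Telugu", "Yemenite Jew")

# Mean supervised ADMIXTURE percentages, copied from the table above.
RESULTS = {
    "Bangladeshi":          (1, 2, 12, 1, 83, 1),
    "Chamar":               (0, 0, 3, 0, 97, 0),
    "Gujurati_Patel":       (0, 1, 0, 10, 89, 0),
    "UP Kshatriya":         (0, 3, 1, 21, 76, 0),
    "Nasrani":              (0, 4, 1, 12, 83, 0),
    "Pathan":               (0, 4, 1, 55, 40, 0),
    "Piramalai_Kallar":     (0, 0, 2, 0, 97, 0),
    "SI_Brahmin":           (0, 4, 1, 16, 78, 0),
    "Telugu_Reddy":         (0, 3, 0, 0, 94, 3),
    "UP_Brahmin":           (0, 4, 1, 26, 69, 0),
    "UP_Kayastha":          (0, 0, 1, 20, 79, 0),
    "Velama":               (1, 1, 0, 2, 96, 0),
    "West_Bengal_Kayastha": (0, 0, 7, 8, 85, 0),
}

# Assumed groupings, chosen only for this illustration.
LEVANT_ARABIA = ("Druze", "Yemenite Jew")
WEST_ASIAN_HIGHLAND = ("Georgian", "Iranian")


def share(group, components):
    """Sum the percentages of the named components for one group."""
    row = dict(zip(COMPONENTS, RESULTS[group]))
    return sum(row[name] for name in components)


for group in ("Piramalai_Kallar", "Telugu_Reddy", "Nasrani",
              "SI_Brahmin", "UP_Brahmin"):
    print(f"{group:18s} "
          f"Levant/Arabia={share(group, LEVANT_ARABIA):3d}%  "
          f"Georgian+Iranian={share(group, WEST_ASIAN_HIGHLAND):3d}%")
```

On the table’s numbers the Nasrani come out at 0% for the Druze plus Yemenite Jew grouping and 16% for Georgian plus Iranian, against 20% for SI_Brahmin and 0% to 3% for the non-Brahmin South Indian groups listed.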

Though the genetic profile of these three individuals does not support clear descent from South Indian Brahmins, I cannot reject the model of Brahmin admixture into this community. On the contrary, a plausible model would seem to be that various South Indian groups, including Brahmins, contributed to the Nasrani community over the centuries.

To be continued….

“Rakhigarhi paper” out in January 2018? (maybe?)

Tony Joseph has an interesting piece up, Who built the Indus Valley civilisation?, which people are asking me about via email. First, I don’t have any inside information. Last I heard in September was that the Rakhigarhi results were “one or two months away,” like they have been for a year or so. So I put it out of mind.

In any case, here are the important points:

All this could now change thanks to the science of genetics and four ancient skeletons excavated from a village called Rakhigarhi in Haryana. The four people to whom these bones once belonged — a couple, a boy and a man — lived roughly 4,600 years ago when the Indus Valley civilisation was in full bloom.

In the three-and-a-half years since its excavation, Shinde has brought together scientists from Indian and international institutions like the Centre for Cellular and Molecular Biology, Hyderabad (CCMB), Harvard Medical School, Seoul National University, and the University of Cambridge to work on different parts of the project, including extracting and analysing DNA from these ancient people, reconstructing their faces, and studying the remains of their habitation to understand their daily habits and ways of life.

The DNA analysis will also help figure out their height, body features, and even the colour of their eyes….

Joseph also asserts that the publication will happen in a “leading international journal” in a month or so. If I had to bet, I’d say Nature.

Harvard Medical School suggests to me they finally got David Reich’s group involved. As for Cambridge University, Eske Willerslev now has an appointment there. He’s apparently assembling a paleogenetics group.

The piece specifically highlights Y and mtDNA. But if they are talking about height, body features, and color of eyes, they must have gotten genome-wide data. If Eske Willerslev is involved they may have sequenced the whole genome at some coverage of at least one of the samples.

If I had to bet, I think the Rakhigarhi samples will be Y haplogroups J2 or the Indian branch of L, and the mtDNA will be an Indian branch of M. In terms of genome-wide patterns they will exhibit a mixture between West Eurasian ancestry, with strong affinities to Near Eastern farmers from the Zagros, and what we now term “Ancestral South Indians” (ASI), who descend from the aboriginal peoples of the subcontinent, and are genetically somewhat closer to East Eurasians than West Eurasians (to be fair, I think it is not implausible that much of ASI heritage is the product of westward migration out of Southeast Asia during the Pleistocene and early Holocene).

Overall, genetically these samples may look the most like South Indian non-Brahmin middle-to-upper castes. Think the Reddy people of Andhra Pradesh. Additionally, going back to R1a1a-Z93, I do think it was intrusive with the Indo-Aryans. Its highest frequencies do tend to be among upper castes, and there is an increasing cline toward the northwest of the subcontinent.

But R1a1a-Z93’s presence at appreciable frequencies in South India among non-Brahmins, including tribal populations, indicates a more complex ethnogenesis of Dravidian speaking groups than we might have realized. Priya Moorjani told me specifically that 4,000 years ago there were “unmixed ANI and ASI groups” in the subcontinent. I think for the former she’s picking up the signal of intrusive Indo-Aryans. But what about the latter? I doubt there were unmixed ASI in the Indus Valley. But they probably still persisted to the south and east when the Indus Valley people were in decline and the Indo-Aryans arrived. The South Indian Neolithic dates from 3000 to 1400 BC.

Here is my moderate-confidence sketch. The collapse of the Indus Valley civilization was probably ultimately due to the fact that these early antique societies were not very robust to exogenous shocks and endogenous decay of asabiya. Once these societies, which have accumulated some level of surplus wealth by squeezing it out of the Malthusian margin, start to totter, social collapse and dissolution can happen fast, and barbarian groups outside of the gates with more social cohesion can engage in a takeover.

In the case of the collapse of the Sumerian-Akkadian civilization, the barbarian Amorites actually took over and maintained cultural continuity. In post-Roman Britain, the Roman civilization collapsed in totality, and “Roman Christianity” had to be reintroduced from the European continent and from the Celts into Anglo-Saxon England. The barbarian takeover resulted in the total cultural obliteration of the Britons. Finally, you have instances such as post-Roman Gaul, which transformed into Francia. Unlike the case of the transition from the rule of the Third Dynasty of Ur to that of the Amorites, the Frankish rulers oversaw a wholesale reimagining of the identity of the people of Gaul. Even as late as 800, a ruler such as Charlemagne still spoke a dialect of German as his first language. And yet the Franks of Neustria were ultimately transformed and became one with the “Romans” whom they ruled.

In the post-Harappan world of northwest India I suspect something close to the Anglo-Saxon precedent is likely. Though the majority of the ancestry of the Upper Gangetic plain is not Indo-Aryan, a substantial proportion is. And this ancestry is detectable at lower fractions even among non-Brahmin Bengalis. In Central and South India the situation was probably more like Mesopotamia around 2000 BC or Gaul post-500 AD. There were various sorts of interactions between Indo-Aryans and local populations, as well as the final assimilation of aboriginal peoples into Indo-Aryan and Dravidian speaking peoples.*

* The Munda people clearly have some East Asian ancestry. And, they are mostly a mix of ANI and ASI. But whenever I look at their genome-wide results it strikes me they may not have any Indo-Aryan ancestry. This may ultimately be totally comprehensible in light of the chronology of migration and segregation.

Update: One of the researchers involved indicates Eske Willerslev is not involved.

The Indo-Aryan migration to the Indian subcontinent

The piece is up at India Today. The headline and title are of course optimized for clicks. I would, for example, say that the Indo-Aryans came from the west, not the West.

In the course of writing this it has become clear that many people have very specific commitments on this issue. I think it is clear I do not. Genetic inference methods have wide shoulders of confidence around particular dates. So I’ll leave it to those with more archaeological knowledge to argue over specific dates. But it strikes me that the dates point to a likelihood that much of the expansion and diversification of Indo-Aryans may precede their expansion into the Gangetic plain ~1500 BCE, the date preferred by many scholars.

Apparently we shouldn’t have to wait too long for ancient DNA from Rakhigarhi (months, not years). But I doubt that will settle anything, as opposed to being preliminary and setting off new debates.