The Munda arrived in India 4,000 years ago (probably)

I didn’t plan to talk about the Munda any time soon, in part because I recently wrote a post, The Munda as upland rice cultivators, which outlined my views. But there is a new preprint with new samples which attempts to estimate admixture times using genome-wide data. You can see the results above; also note that they found similar estimates using Y-chromosome SNP variation around haplogroup O2a1.

The preprint is, The genetic legacy of continental scale admixture in Indian Austroasiatic speakers:

Surrounded by speakers of Indo-European, Dravidian and Tibeto-Burman languages, around 11 million Munda (a branch of Austroasiatic language family) speakers live in the densely populated and genetically diverse South Asia. Their genetic makeup holds components characteristic of South Asians as well as Southeast Asians. The admixture time between these components has been previously estimated on the basis of archaeology, linguistics and uniparental markers. Using genome-wide genotype data of 102 Munda speakers and contextual data from South and Southeast Asia, we retrieved admixture dates between 2000 – 3800 years ago for different populations of Munda. The best modern proxies for the source populations for the admixture with proportions 0.78/0.22 are Lao people from Laos and Dravidian speakers from Kerala in India, while the South Asian population(s), with whom the incoming Southeast Asians intermixed, had a smaller proportion of West Eurasian component than contemporary proxies. Somewhat surprisingly Malaysian Peninsular tribes rather than the geographically closer Austroasiatic languages speakers like Vietnamese and Cambodians show highest sharing of IBD segments with the Munda. In addition, we affirmed that the grouping of the Munda speakers into North and South Munda based on linguistics is in concordance with genome-wide data.

There is a weird pattern of affinities in the f3 statistics and the IBD analysis in this preprint. I think their explanation, that the Vietnamese and Cambodians have been subject to later admixture, is probably right. In the case of the Vietnamese, it’s southern Chinese ancestry. In the case of the Cambodians…it might be Indian ancestry! This might strike you as strange, but the Indian ancestry in the Cambodians may be more enriched for the West Asian component that’s not found in the Munda specifically: the element brought in by the Indo-Aryans.
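For readers unfamiliar with the statistic: f3(C; A, B) averages (c − a)(c − b) over SNPs, where lowercase letters are allele frequencies in each population. A significantly negative value suggests C is a mix of populations related to A and B; with an outgroup in the C position, larger values indicate more drift shared between A and B. The sketch below is a simplified version that omits the finite-sample-size correction real tools like ADMIXTOOLS apply, and the allele frequencies are made up purely for illustration.

```python
import numpy as np

def f3(c, a, b):
    """f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b are allele-frequency arrays for populations C, A, B at the
    same SNPs. This omits the finite-sample-size bias correction used
    by real tools, so it is only a simplified illustration.
    """
    c, a, b = (np.asarray(x, dtype=float) for x in (c, a, b))
    return float(np.mean((c - a) * (c - b)))

# Hypothetical allele frequencies at two SNPs (illustrative only).
a = np.array([0.2, 0.8])
b = np.array([0.6, 0.2])
c = 0.5 * a + 0.5 * b  # C is an even, drift-free mix of A and B

print(f3(c, a, b))  # negative: C looks admixed between A and B
```

For a drift-free 50/50 mix, each SNP contributes −(a − b)²/4, so the statistic is guaranteed to be negative; real populations add drift in C, which pushes the value up and can mask admixture.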

The peninsular Malay groups are “proto-Malays,” and these groups tend to be somewhat higher in AASI-like ancestry as well as lower in Austronesian ancestry. The high shared drift with the Lao and with groups in more isolated areas of Malaysia may reflect the fact that these are less cosmopolitan populations, with less Indian and Chinese ancestry, than other mainland Southeast Asians and Malays proper.


These results are broadly in line with the Narasimhan et al. preprint, which is cited within it. In that preprint the Reich group outlines its general model, where modern South Asians can be thought of as a compound of several ancestral populations of different affinities. The Munda in particular are enriched for “Ancient Ancestral South Indian” (AASI) ancestry relative to any other group, and the hypothesis given is that the Southeast Asians mixed first with an AASI group that lacked admixture with West Asians, and then mixed again with “Ancestral South Indians,” who had some West Asian (“Iranian Farmer”) ancestry.

Since ALDER-based methods, last I checked, tend to pick up the last admixture event, the more recent dates for the northern Munda groups make sense. Looking at the Y chromosomes, it is pretty clear to me that some of the East Asian ancestry in Bengali-speaking agriculturalists in the lower Gangetic plain comes from Munda groups. Conversely, some of the Munda probably admixed with populations coming in from the west that practiced intensive rice agriculture, which apparently did not become a feature of the landscape until after 1000 BC.
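Methods like ALDER (and the earlier ROLLOFF) date admixture from linkage disequilibrium: after n generations, admixture-induced LD between markers d Morgans apart decays roughly as exp(−n·d), so the decay rate gives the number of generations. The sketch below is a toy version under loud assumptions: noise-free curves, no background LD term, a simple log-linear fit instead of ALDER's actual weighted-LD and jackknife machinery, and a hypothetical 28-year generation time. It shows that a single pulse is recovered exactly, while two pulses yield one fitted date lying between the two true dates, since the slower-decaying recent pulse dominates the tail of the curve.

```python
import numpy as np

GENERATION_YEARS = 28  # assumed generation time, not taken from the preprint

def ld_curve(d, pulses):
    """Noise-free LD decay curve: sum of amp * exp(-gens * d) over pulses.

    d is genetic distance in Morgans; each pulse is (amplitude, generations_ago).
    """
    d = np.asarray(d, dtype=float)
    return sum(amp * np.exp(-gens * d) for amp, gens in pulses)

def fit_generations(d, ld):
    """Fit one exponential by regressing log(LD) on distance.

    Assumes LD > 0 and no constant background term (a simplification).
    The slope of log(LD) against d is -n, so return -slope.
    """
    slope, _intercept = np.polyfit(d, np.log(ld), 1)
    return -slope

d = np.linspace(0.005, 0.3, 60)  # 0.5 cM to 30 cM

one_pulse = ld_curve(d, [(1e-4, 100)])
n_one = fit_generations(d, one_pulse)  # recovers ~100 generations
print(f"single pulse: {n_one:.1f} generations, ~{n_one * GENERATION_YEARS:.0f} years")

# Hypothetical older pulse (150 generations) plus a more recent one (70).
two_pulses = ld_curve(d, [(1e-4, 150), (1e-4, 70)])
n_two = fit_generations(d, two_pulses)  # falls between 70 and 150
print(f"two pulses: {n_two:.1f} generations, ~{n_two * GENERATION_YEARS:.0f} years")
```

The single-exponential fit cannot separate the two pulses; it returns one compromise date, which is one reason a later wave of admixture into some Munda groups could pull their estimated dates toward the present.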

One of my points in the post on the Munda mentioned above is that the vocabulary shared across Austroasiatic languages indicates that their speakers were upland rice farmers. This matches the modern distribution of the Munda. One hypothesis, of which I am now skeptical, is that the Munda once occupied the bottomlands and were driven into the hills by people from the west and south. Rather, the Munda may always have preferred the uplands, and so traversed the flat lands between the Khasi hills and the Chota Nagpur plateau. This preference for uplands may strike us as strange, but it’s not that rare. Yankee farmers in Ohio preferred upland zones, even though these were less agriculturally rich (farmers moving up from the South didn’t have this aversion).

A point observed and implied in the preprint is that the expansions of the Indo-Aryans, Dravidians, and Munda seem to have happened rather close in time. Though the northwest of the subcontinent had a long-standing settled agricultural society by 3000 BC, its expansion was limited by climatic restrictions on its crop toolkit. But by 2500 BC it seems pastoralists were already pushing into the Deccan from the Punjab via the dry zone on the eastern edge of the Thar. The Toda people of the far south of India are probably representative of the lifestyle of these peoples, who were Dravidian-speaking.

A few centuries after this period is probably when the proto-Munda began pushing out of Southeast Asia. The DNA evidence strongly suggests this was a hugely male-skewed event once it got beyond the Khasi hills. Why? My hypothesis is that these were not quite small-scale peoples. Perhaps the male mediation of a lot of gene flow in South Asia is due to the emergence of militarized confederacies where elite lineages engaged in conquest of territory from native groups. The Munda have very low frequencies of R1a, and very high frequencies of O2a. The admixture with Dravidian- and Indo-Aryan-speaking peoples that occurred between 2000 BC and the beginning of the Common Era was probably overwhelmingly female-mediated.

The narrative above suggests that most of the genetic changes that produced the present-day landscape of South Asia occurred between 2500 BC and 500 BC. About 2,000 years. And yet agriculture of some form arrived in Mehrgarh in western Pakistan 9,000 to 7,500 years ago, depending on which dates you trust. What took so long? Similarly, millet and rice agriculture in China is 7,000 years old, but only around 4,000 years ago did rice farmers start pushing south (and probably west in the case of the Munda).

I’ll present the hypothesis here that this coincidence wasn’t a coincidence, and that certain things in relation to social complexity have a particular rate of change. In general I agree with economic historians who say that our need to posit an “Industrial Revolution,” or a “Neolithic Revolution,” is somewhat of an imposition because humans don’t want to think quantitatively. It probably takes small-scale societies a certain amount of time to move from hunting and gathering to full-blown agriculture, and then more time to reach the social complexity that enables migration driven by more than simple natural increase and Malthusian expansion. Mainland India beyond what is today Pakistan and much of Southeast Asia were “filled up” by agricultural peoples around the same time, after a long incubation to the west and north, because similar social forces were at play.

DNA results from Rakhigarhi are now being reported (really!)

It looks like Outlook India is the first out of the gate to report on the results from Rakhigarhi in northwest India: We Are All Harrapans. This is a “mature phase” Harappan site that dates to about 2250 BC or so. Media reports have always been garbled on this topic, so anything that is not coming out of a paper needs to be treated cautiously. But I’ve heard some of the same things from independent sources a while back, so I believe that this reporting is broadly on the mark.

Basically, the individual(s) they got DNA out of did not have any Eurasian steppe ancestry. This seems to confirm again that Eurasian steppe ancestry, which is found in fractions as high as ~30% in twice-born varna in Northern India (e.g., Rajputs, Tiwari Brahmins), arrived after 2000 BC. That is, after the peak period of the Indus Valley Civilization.

Again, one has to be wary of anything from the media because I’ve heard so many confusing things, including claims of garbled quotes, but here’s one of the authors of the forthcoming paper being quoted:

We did some analysis to figure out the exact date of the admixture. We have prepared a model in which all these stats fit together very tightly and that model suggests the Central Asian admixture happened about 1500-1000 BC…. Significant mixing happened around 1000 BC, also at 800 BC and 600 BC.

This is totally in line with the results from the March preprint discussed in the piece. That is, the Swat Valley samples show admixture and genetic change after 1200 BC. And the semi-historical understanding that we have of India between 1000 BC and the rise of the Mauryas is that it was a society in flux. But the Rakhigarhi results could only have changed the dating if the genome was of high enough quality to let them narrow down the parameters on some of the admixture estimates.

One thing to keep in mind is that it is unlikely that the “Harappan people” were one single people genetically. There was probably a lot of variation in admixture with the indigenous South Asian substrate. And I believe that the inflated steppe and AASI (“Ancient Ancestral South Indian”) ancestry you see in some North Indian Brahmin groups compared to Sindhis (who are more “Iranian”) is evidence that the Indo-Aryan intrusion resulted in an expansion of people with West Eurasian ancestry much deeper into South Asia than was the case with the Harappans.

As for the Harappans, some Indian scholars have asserted that their descendants are still present in the region. I think this is right, insofar as some of the jati groups, often scheduled castes, in the northwestern region of South Asia share a lot more affinity with populations to the south and east.

Related: Michael Witzel has commentary from a more linguistic perspective. If the “Para-Munda” hypothesis is right, I think what Witzel is seeing is the substrate language on which Munda was overlain, because Munda people are clearly intrusive from Southeast Asia in the period between 2000 and 1000 BC.

Addendum: If a relatively late intrusion (after 1500 BC) of Indo-Aryans into South Asia is supported by the evidence, it would be interesting in light of the high likelihood that Indo-Aryans were present in upper Mesopotamia before 1500 BC. I believe that these “Indo”-Aryans probably never had any contact with South Asia, but descended from the horizon of cultures of which Sintashta and Andronovo were constituents. The Indo-Aryans who arrived in South Asia were probably from a different branch, and likely had interactions with other peoples in what is today eastern Iran and Afghanistan.

The peopling of the Indian subcontinent at the dawn of knowing

A few people have been pointing me to a new paper, A Bayesian phylogenetic study of the Dravidian language family, which implies that the Dravidian language family diversified ~4,500 years ago. I don’t have much to say about the paper itself beyond the fact that it aligns with my own conclusions; it’s well outside any field that I can judge (though it does use standard phylogenetic packages I’ve used).

Recently I’ve been going back to old posts of mine on South Asian population genetics because no matter how much some people drag their feet on this question, we’re pretty close to knowing how South Asians came to be. Here’s what I said in December of 2010:

Who were the Indo-Iranians? I lean toward the proposition that they do derive from the Andronovo culture of the Eurasian steppe. This would date the entrance and expansion of Indo-Aryans in northern India 3-4,000 years ago. I also contend that the dominant element of ancestry among modern South Asians is not Indo-Aryan. Rather, it is an ancient stabilized hybrid of pre-agricultural societies in the Indus valley and Neolithic farmers who originated from what is today western Iran and eastern Anatolia. Therefore, I posit that the “Aryanization” of the Indian subcontinent is properly modeled as the same processes which led to the emergence of an Anatolian and Rumelian Turkish identity; a small elite population which forces an identity shift among the majority.

Where was I wrong? Where was I right?

Even from ADMIXTURE plots, which don’t always give an accurate sense of population history, it seemed likely that “Ancestral North Indian” (ANI) was not one thing. Some South Asian populations seemed to have much stronger affinities to West Asian populations, in particular those from highland West Asia, toward the Caucasus. These include groups in southern Pakistan, but also to some extent in South India. In contrast, other groups had affinities with Eastern European populations, in particular high-caste North Indians, and to a lesser extent Indo-European peoples more generally.

I think I got the dynamic correct. Subsequent analyses comparing ancient DNA from the Caucasus and Iran suggest that all South Asians have a lot of shared drift (ergo, ancestry) with highland West Asians, while a smaller subset has high shared drift (ergo, ancestry) with pastoralists from the Eurasian steppe. The groups match up with what the ADMIXTURE plots were suggesting.

There was more than one pulse of ANI-like ancestry: one more like West Asians and one more like Europeans. Remember, this is before we knew the acronyms ANE, WHG, and EEF. Or CHG, Eastern Middle Eastern Farmers, and Western Middle Eastern Farmers.

But, I think I was wrong about the magnitude of the admixture. This was before ancient DNA had revolutionized our understanding of population movement and turnover. I was still resisting the mass migration of a whole folk across huge distances. I’m more open to that now. I am not sure I still believe the very high steppe fractions implied in some of the recent analyses, but it’s certainly higher than I would have believed back then.

Finally, the recent diversification of the Dravidian languages supports the model that their current distribution is not primordial. Rather, they probably expanded relatively recently from the northwest of the subcontinent. Probably earlier than the Indo-Aryan expansion into the Gangetic plain, but not that much earlier.

Additionally, because the Dravidians were not primordial, but expanding only somewhat ahead of the Indo-Aryans, they were part of an interactive sociocultural sphere with the Indo-Aryans. I think the very high frequency of R1a1a-Z93 in some non-Brahmin South Indian groups, even tribal ones, points to the expansiveness of some paternal Indo-Aryan kin networks across the whole subcontinent.

Addendum: Much of the attention goes to the ANI dynamics. But though recent work attests to the overwhelming diversity and basal character of South Asian mtDNA lineages, we can’t be entirely sure that they are indigenous without ancient DNA. If a migration from the east at the Pleistocene-Holocene boundary was characterized by gradual diffusion of groups with reasonable effective population sizes, those groups could have brought their diversity with them.