The Munda languages of the northeastern quadrant of the Indian subcontinent are quite interesting because they are more closely related to the Austro-Asiatic languages of Southeast Asia than to the Indo-Aryan or Dravidian languages which are spoken by their neighbors. The Munda are usually classified as adivasi, which has connotations of being an ‘original inhabitant’ of the Indian subcontinent.
More concretely, the Munda have traditionally operated outside of the bounds of Sanskrit-influenced Hindu civilizations, occupying upland zones and governing themselves as tribal units, rather than being a caste population.
What the field of genetics tells us is that there are really no true aboriginal inhabitants of the Indian subcontinent in an unmixed form. That is, the vast majority of people in the Indian subcontinent have a substantial contribution of ancestry from the wave of migration out of Africa that occupied the southeast fringe of Eurasia beginning ~50-60,000 years ago. The modern adivasi generally are defined more by their social-cultural position within the landscape of Indian culture, as opposed to their long-term residence in the subcontinent.*
The term is a particular misnomer for the Munda because of the evidence that they are intrusive to the subcontinent from Southeast Asia. We have ancient DNA and archaeology which indicates that upland rice farmers, likely Austro-Asiatic, arrived in northern Vietnam ~4,000 years ago. This makes it unlikely to me that they were in India much earlier. The Y chromosomal data indicate that the paternal ancestry of the Munda derives from Southeast Asians, not the other way around.
A new genome-wide analysis of the Southeast Asian fraction of Munda ancestry suggests that it can be as high as ~30%. The paper is The genetic legacy of continental scale admixture in Indian Austroasiatic speakers:
Surrounded by speakers of Indo-European, Dravidian and Tibeto-Burman languages, around 11 million Munda (a branch of Austroasiatic language family) speakers live in the densely populated and genetically diverse South Asia. Their genetic makeup holds components characteristic of South Asians as well as Southeast Asians. The admixture time between these components has been previously estimated on the basis of archaeology, linguistics and uniparental markers. Using genome-wide genotype data of 102 Munda speakers and contextual data from South and Southeast Asia, we retrieved admixture dates between 2000–3800 years ago for different populations of Munda. The best modern proxies for the source populations for the admixture with proportions 0.29/0.71 are Lao people from Laos and Dravidian speakers from Kerala in India. The South Asian population(s), with whom the incoming Southeast Asians intermixed, had a smaller proportion of West Eurasian genetic component than contemporary proxies. Somewhat surprisingly Malaysian Peninsular tribes rather than the geographically closer Austroasiatic languages speakers like Vietnamese and Cambodians show highest sharing of IBD segments with the Munda. In addition, we affirmed that the grouping of the Munda speakers into North and South Munda based on linguistics is in concordance with genome-wide data.
The paper already came out as a preprint many months back, so I’ve already mentioned it. The big finding, to me, is that it uses genome-wide methods to estimate an admixture date in the range of ~4,000 years ago between the Southeast Asian and South Asian ancestral components of the southern Munda. It also confirms something that has been pretty evident for nearly ten years of genome-wide analysis of South Asian population genetics: even after you account for the Southeast Asian admixture, the Munda have less West Eurasian ancestry than any mainland Indian population outside of the Tibeto-Burman fringe.
In Narasimhan et al. the authors present a model that fits the data where:
- The proto-Munda mix, in India’s northeast, with an “Ancient Ancestral South Indian” (AASI) population that has no West Eurasian admixture
- Then, mix more with an “Ancestral South Indian” (ASI) population that has some West Eurasian admixture
The authors of this paper are skeptical of this model because they have a data set of northern and southern Munda groups, who differ in their Southeast Asian ancestry, but not in the ratio of West Eurasian to South Asian ancestry. The implication here is that Southeast Asian ancestral groups were mixing into a substrate with a relatively uniform mix of West Eurasian and South Asian ancestry, and not one of population structure and clinal variation. And yet the authors are very tentative, and I think they know that resolution is going to come with more data. Their own work indicates, for example, that the northern Munda have been subject to more gene flow from their neighbors than the southern Munda, so the demographic history could be quite complex.
But a fact I want to highlight is that in a sample of ~900 Munda the fraction of R1a1a, which is presumed by many to be a marker associated with Indo-Aryans, is quite low, at 5%. Mind you, the Southeast Asian Y haplogroup is found at more than 50% representation in the Munda, while their mtDNA is almost all deeply South Asian (so ~30% Southeast Asian is reasonable genome-wide). I have long believed that this is indicative of the fact that the modern Munda descend from a patrilocal cultural group which managed to maintain its integrity down to the present, as evidenced by their unique languages and mythologies, as well as folkways which have traditionally set them outside of the Indian caste system.
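The parenthetical arithmetic above can be made explicit with a rough sketch. It assumes a single pulse of sex-biased admixture in which the Y chromosome fraction stands in for the share of founding fathers from the incoming group and the mtDNA fraction for the share of founding mothers; under that simplification the autosomal share is just the average of the two. The function name and the input values are illustrative assumptions, not figures from the paper, and drift on the uniparental markers or later gene flow would shift the real numbers.

```python
def autosomal_fraction(male_fraction, female_fraction):
    """Expected autosomal ancestry from a source population after one pulse
    of admixture: autosomes come half from fathers and half from mothers."""
    return (male_fraction + female_fraction) / 2.0

# Illustrative inputs (assumptions): "more than 50%" of Y lineages and
# "almost all deeply South Asian" mtDNA, taken here as 0.5 and 0.0.
y_southeast_asian = 0.5
mt_southeast_asian = 0.0

estimate = autosomal_fraction(y_southeast_asian, mt_southeast_asian)
print(f"Expected Southeast Asian autosomal ancestry: {estimate:.2f}")
```

A Y chromosome fraction somewhat above 50% pushes the expectation from 25% toward 30%, which is why a genome-wide estimate near 0.29 sits comfortably with the uniparental data.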
The early Indian legends speak of forest-dwelling tribes, which operated somewhat outside of the domain of agro-pastoralist civilization. Some of these were almost certainly the swidden rice-cultivating ancestors of the Munda.
The ALDER admixture dates in the paper above probably indicate a slightly later date than the true overall one (more precisely, the method will pick up the last admixture). Additionally, there is no reason the admixture has to have occurred in India proper. The ancestors of the Andamanese probably arrived from what is today southern Burma, and so AASI-like people certainly existed in mainland Southeast Asia as part of the Australo-Melanesian continuum which extended out to Oceania.
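To give a sense of where such dates come from: ALDER-style methods fit an exponential decay of admixture linkage disequilibrium against genetic distance, and the decay rate is the number of generations since admixture. The sketch below is not ALDER itself. It is a toy log-linear fit on a synthetic, noise-free curve, and the function names, the 100-generation input, and the 29-year generation time are all assumptions chosen for illustration.

```python
import math

def fit_decay_rate(distances_morgans, weighted_ld):
    """Least-squares slope of log(LD) against genetic distance.
    For LD(d) = A * exp(-n * d), the negated slope is n, in generations."""
    xs = distances_morgans
    ys = [math.log(v) for v in weighted_ld]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

def generations_to_years(generations, years_per_generation=29.0):
    """Convert generations to years; the generation time is an assumption."""
    return generations * years_per_generation

# Synthetic curve for a hypothetical admixture 100 generations ago.
true_generations = 100
distances = [0.005 * i for i in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [0.01 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_decay_rate(distances, curve)
estimated_years = generations_to_years(estimated_generations)
print(f"{estimated_generations:.1f} generations, about {estimated_years:.0f} years")
```

With real data the curve is noisy, and when there were several pulses of gene flow a single exponential tends to be pulled toward the most recent one, which is the caveat raised above.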
More details will be forthcoming with more ancient DNA. In the meantime, I want to raise the critical issue of how we relate genes to culture. Or, more poetically, genes to memes. Both the Indo-Aryans and Munda seem to have been male-mediated migrations which brought a distinctive memetic package to the Indian subcontinent, while at the same time not contributing the preponderance of genes to their cultural heirs. Though more than four out of five people in the Indian subcontinent speak Indo-Aryan languages, which likely derive from the speech of agro-pastoralists with roots on the Eurasian steppe, probably closer to one out of ten of the ancestors of modern South Asians 4,000 years ago were residents of that steppe.**
And yet the complexity here defies simple attempts to model how ancestry and culture interact and refract down the generations. I am quite convinced that much of the original Austro-Asiatic rice farmer ancestry of the Vietnamese has been diluted by gene flow from southern China during the period when the region was under Chinese imperial rule. And yet the Austro-Asiatic language persisted. Both the Vietnamese and Munda maintain linguistic connections to ancestors who may not contribute the dominant proportion of the ancestry of the people who continue their linguistic tradition. In the case of the Munda, this was due to demic diffusion into a landscape where they took the daughters of local peoples as wives, while in Vietnam it was probably continuous gene flow from southern China.
It’s hard to put together a hard and fast list of heuristics to infer past history from extant data. To some extent, there needs to be model building, in addition to making inferences. That being said, it is rather clear now that the period between 5000 BC and 1000 BC seems to be one where we are witnessing several instances of distinct admixture between very different ancestral streams. The latest work from the Reich group confirms the intuition by many of us looking impressionistically at clustering results that there are several layers of West Eurasian admixture within South Asia, and, that those admixtures occurred at different times and different places. Additionally, the genetic data are now in alignment with archaeological and linguistic evidence of the expansion of swidden rice farming people out of the fringe of what is today southern China into South and Southeast Asia.
It is still possible to suppose that a distinct West Eurasian component was present in northwest South Asia, in the valley of the Indus, as far back as the Pleistocene, and that this element was mixed in the later Holocene with an ancestral component dominant beyond the Thar desert, with affinities to the south and east. This would preserve the possibility of the “Out of India” model that some are still holding to, or at least some sort of broader proto-Indo-European network that spanned more than half of Eurasia. But I think a more parsimonious explanation is that, rather than there being deep local structure within South Asia, agricultural populations migrated from both the west and east, and assimilated the deeply entrenched local substrate.
More thorough ancient DNA temporal transects will tell the tale. As with many empirical questions I’m rather patient about waiting on data to make final conclusions, in part because I have high confidence that we’ve been swimming in the right direction. But you could always be wrong!
* This is in contrast to aboriginal groups in European settler nations, who have precedence as inhabitants of the land, and also organize themselves politically somewhat independently and maintain cultural distinctiveness.
** The methods in the latest preprints give somewhat higher figures, but I heard that they were going to go lower. Additionally, it doesn’t change my qualitative point that most of the ancestry is not from Bronze Age Indo-Aryans.