Indian ancestry in Southeast Asia is older than statistical genetic tests suggest

The panels above are from a new preprint, Reconstructing the human genetic history of mainland Southeast Asia: insights from genome-wide data from Thailand and Laos. It’s an OK preprint, marked mostly by the inclusion of a lot of samples from Thailand. The “southern Thai” samples are from peninsular Thailand, and there are Malays in there. The “central Thai” samples are from in and around Bangkok. The Mon seem to have been sampled from Thailand as well.

Most of the papers on mainland Southeast Asian genetics are hard to follow because there isn’t a clear relationship in many cases between language and genetics, and linguistic classification can be dodgy. E.g., is Vietnamese Austro-Asiatic? The biggest difference is the old “Australo-Melanesian” substrate, and the ancestry brought by the farmers from the north. But these farmers themselves come out of a southern Chinese milieu where there isn’t a distinction. The biggest difference between a lot of the “Austro-Asiatic” and “Tai-Kadai” groups is how much Australo-Melanesian (Hoabinhian) ancestry they carry (the former carry more since they arrived earlier).

But the question of “Indian ancestry” is more interesting and a bit clearer. It seems obvious that a lot of Southeast Asian groups have South Asian ancestry. For twenty years it’s been clear that the HGDP Cambodian has a West Eurasian affinity, and many of us assumed it was simple “Ancestral South Indian” (ASI) shared lineage. Basically, the people from India to the South China Sea were part of a genetic continuum before the intrusion of West Asians into South Asia and Northeast Asians into Southeast Asia. But this is wrong. The Indian ancestry clearly exhibits “Ancestral North Indian” heritage. In Cambodia itself on the order of 5% of the men seem to carry Y haplogroup R1a1a. This is steppe-associated.

So the question is: when did this come into the region? The preprint’s figure is a little misleading, though in the text it’s clearer: the statistics indicate a major admixture ~750 years ago. The Mon in particular have lots of Indian ancestry. 20% is probably a lower-bound figure for this group. When I ran ALDER I got about 750 years for Cambodia. There is zero chance that there was a large-scale migration of Indians into Cambodia at that date. Unlike proto-Burma, Cambodia is also pretty far from mainland India.

The most plausible explanation is that these admixture dates are picking up the mixing between a Southeast Asian set of populations without much Indian ancestry, and a group of Austro-Asiatic people who had a lot of Indian ancestry from an earlier admixture.
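
To see why a single fitted date can come out misleadingly young, here is a minimal sketch. It is not ALDER's or the preprint's actual procedure. It assumes admixture LD decays as exp(-n·d), with n generations since mixing and d the genetic distance in Morgans, and that the observed curve is the sum of an old pulse and a more recent mixing event. The generation time of 29 years, the two pulse dates, and the equal amplitudes are all hypothetical values chosen for illustration.

```python
import math

GEN_YEARS = 29  # assumed generation time in years (not taken from the preprint)

def ld_curve(d, pulses):
    """Weighted LD at genetic distance d (Morgans) for (amplitude, generations) pulses."""
    return sum(amp * math.exp(-gens * d) for amp, gens in pulses)

def fit_single_date(distances, values):
    """Fit one exponential by least squares on log(LD); return generations."""
    logs = [math.log(v) for v in values]
    mean_d = sum(distances) / len(distances)
    mean_y = sum(logs) / len(logs)
    cov = sum((d - mean_d) * (y - mean_y) for d, y in zip(distances, logs))
    var = sum((d - mean_d) ** 2 for d in distances)
    return -cov / var  # decay rate, read as generations since admixture

# Hypothetical scenario: an old pulse 70 generations ago (about 2,000 years)
# plus recent mixing between admixed and unadmixed groups 15 generations ago.
pulses = [(1.0, 70.0), (1.0, 15.0)]
distances = [0.005 + 0.001 * i for i in range(196)]  # 0.5 cM to 20 cM
values = [ld_curve(d, pulses) for d in distances]

fitted_gens = fit_single_date(distances, values)
fitted_years = fitted_gens * GEN_YEARS
print(f"single-pulse estimate: {fitted_gens:.1f} generations, ~{fitted_years:.0f} years")
```

The single-pulse estimate lands between the two true dates, so a date like ~750 years need not mark when Indian ancestry first arrived.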


Indian ancestry in maritime Southeast Asia

In the comments, people keep asking about Indonesia, and Java in particular. The reason is pretty simple: before wholesale conversion to Islam, maritime Southeast Asia was dominated at the elite level by Indic social and religious forms. I say “Indic” because, unlike in mainland Southeast Asia, Theravada Buddhism did not supplant other Indian religions, and in fact, while the indigenous Buddhism that led to the Borobudur temple complex in the 9th century went extinct, Hinduism persisted for quite a bit longer and persists to this day. Not only are there long-standing Hindu traditions in Bali, but far eastern Java remained a Hindu kingdom until 1770, and there remain Javanese Hindus (some of them are recent converts).

As several mainland Southeast Asian groups seem to have Indian admixture, what is the evidence for Indonesia? (the Singapore genome data offers up some Malays, and though some show recent Indian admixture, all of them have some Indian admixture). Luckily, there is a paper and data, Complex Patterns of Admixture across the Indonesian Archipelago. It uses the GLOBETROTTER framework, so I decided to reanalyze the data in a simpler manner, adding the Cambodians as a check (since from my previous posts you know a fair amount about that as a baseline).

Three points.

1) Definitely gene flow. But on the whole less than mainland Southeast Asia?

2) Lots of heterogeneity. Not surprising. The Sumatra samples seem to be taken from Aceh. This may matter a great deal.

3) In mainland Southeast Asia east of Burma there hasn’t been lots of colonial migration of Indians, nor a great deal of trade. The opportunities within maritime Southeast Asia for contact with outsiders are far greater. The inspection of results from Malaysia indicates continuous gene flow over a long period of time. In contrast, the results from Thailand and Cambodia indicate an early pulse.


The Indian admixture into Southeast Asia is not just a function of distance

In the comments to the post below about Indian ancestry in Thailand, some observed that this should not be surprising due to reciprocal gene flow and proximity. Implicitly, I think what is being suggested here is that there is isolation by distance and continuous gene flow. Obviously some of this is true, but there are details here which suggest that it is not simply geography at work.

The reason I was curious about the Dusun people in coastal Borneo is that while Malays all seem to have Indian ancestry, many tribal Austronesian groups in maritime Southeast Asia do not. The Indian admixture into the Malays is not just recent. Some of it seems quite a bit older than the colonial period.

In the context of Southeast Asia, it seems that some of the more ancient Austro-Asiatic people, in particular, the Mon and Khmer, have Indian ancestry, and groups which mixed with Austro-Asiatic substrates, such as Burmans and Thai, also have this.

Additionally, some groups in the northeastern states of India have less “Indian” admixture than the Thai and Khmer. To show this, see this PCA:

Indian Y chromosomes in Thailand

The region of modern Thailand has gone through a major cultural shift over the last 1,000 years. Today the zone of Austro-Asiatic speech in mainland Southeast Asia is fragmented. To the east, there are the Khmer people of Cambodia, as well as various “hill-tribes” in Thailand and Laos who also speak Austro-Asiatic dialects. To the west, there are the Mon people of Burma.

But around 1000 A.D. the whole zone from the Indian Ocean out toward the Mekong was dominated by Austro-Asiatic peoples. Modern-day Thailand was dominated by the Dvaravati polity, of which little is known, but possible Mon associations are assumed.

I have posted several times about how the whole zone between Burma and Cambodia seems to be impacted by a non-trivial proportion of South Asian (Indian) ancestry. A new preprint has a lot of Y chromosomes from various groups in Thailand. Below are frequencies I pulled out of two ethnic groups with large sample sizes (from table 3 in the supplements):

Group          R1a+R   L     J2    H     Sample Size
Central Thai   13%     0%    3%    5%    129

These lineages are clearly more evidence of Indian males settling in this region.
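
With a sample of 129, the frequencies above carry real sampling uncertainty. The sketch below puts approximate 95% Wilson score intervals around the Central Thai figures. The counts are an assumption: I reconstruct them by rounding percentage times sample size, and the column labels are as I read them from the table, so treat the output as a rough guide only.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

n = 129  # Central Thai sample size from the table
# Counts are reconstructed by rounding percentage * n; this is an assumption.
for label, pct in [("R1a+R", 0.13), ("J2", 0.03), ("H", 0.05)]:
    count = round(pct * n)
    low, high = wilson_interval(count, n)
    print(f"{label}: {count}/{n}, 95% CI {low:.1%} to {high:.1%}")
```

The Wilson interval is used here rather than the simple normal approximation because it behaves better for small proportions like 3%.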


Indian Ancestry In Thailand During the Iron Age

As a follow-up to my previous post: one of the “Iron Age” samples from Thailand seems a definite outlier in comparison to the other Iron Age and Bronze Age samples. There is suggestive evidence again of Indian ancestry, as one sees in the plot above. One of the samples from Thailand overlaps with the Cambodians and Burmese, who do seem to have a South Asian shift, while the other samples from Thailand do not. Today most Thais seem to show some Indian ancestry as well, at low levels.

Unfortunately, much of Southeast Asian history before 1000 A.D. is pretty much a cipher. Perhaps the best survey I’ve seen is Strange Parallels: Volume 1, Integration on the Mainland: Southeast Asia in Global Context, c.800–1830, though even there it’s rather thin before the arrival of the Tai and the shocks that entailed for the earlier Indic societies of Southeast Asia.


Indian ancestry in Cambodia was present ~2,000 years ago

When STRUCTURE-style bar plots first emerged using the HGDP Cambodian samples, there were often strange residual components with affinities to South Asians. When Treemix was developed there were strange edges between South Asians and Cambodians. In discussions with Joe Pickrell, the author of Treemix, we both concluded this must be due to deep affinities to “Ancestral South Indians” (ASI). Though Cambodia had “Indic” cultural affinities, the standard model is that this was due to cultural diffusion, not gene flow. Then Spencer Wells told me that The Genographic Project had detected that many Cambodian males seem to carry the R1a1a lineage. Looking at the literature, several Southeast Asian groups carry West/South Eurasian haplogroups which are likely Indian-mediated (R1a, R2, and J2, to name three). The enrichment is notable in groups like the Thai and Khmer which are located at some distance from South Asia.

Out of curiosity, I decided to look at the “Cambodian Iron Age” sample from a recent ancient DNA paper. This sample dates to 100 to 300 A.D., the period of ancient Funan, which we know mostly though not exclusively through Chinese sources:

According to modern scholars drawing primarily on Chinese literary sources, a foreigner named “Huntian” [pinyin: Hùntián] established the Kingdom of Funan around the 1st century CE in the Mekong Delta of southern Vietnam. Archeological evidence shows that extensive human settlement in the region may go back as far as the 4th century BCE. Though treated by Chinese historians as a single unified empire, according to some modern scholars Funan may have been a collection of city-states that sometimes warred with one another and at other times constituted a political unity.

Looking at the Iron Age sample, it does seem to be notably “Indian-shifted” even compared to modern Cambodians. This could just be an artifact of ancient DNA, but when I looked at a few dozen ancient Vietnamese samples, only one exhibited this same pattern of being Indian-shifted. Reducing the dataset to the 55,000 SNPs that came back on this ancient sample, you see the result above (many of the modern samples don’t have the full complement of these SNPs).

Something on the order of ~5-10% of the ancestry of many Southeast Asian groups seems to be of Indian origin. Looking at the Malays in the Singapore Genome Project, some of them have clear recent Indian ancestry, but even removing all of those you see a notable Indian shift, just as you see with the Cambodians. In contrast, Vietnamese and Dayaks from Borneo don’t show any evidence of such admixture. Neither do samples from the Philippines.

The question, then, is when this admixture occurred. A large number of Indians migrated to Malaysia and Burma during the colonial period. But some preliminary analysis suggests to me that this doesn’t account for all of the Indian ancestry there. And it can’t account for Cambodia and Thailand at all (though there aren’t too many genome-wide samples from Thais, the Y chromosomes show the same pattern as the Khmer).

Over time the genetic data is going to coalesce and converge on the details, though I think we see where it’s pointing. At that point, it’s up to archaeologists and historians to make sense of it. This includes scholars of South as well as Southeast Asia. The genetic imprint of South Asians in Iran and Central Asia is rather modest compared to what one sees in Southeast Asia, so it’s an interesting contrast as to why.


Why Indian forms dominated Chinese forms in mainland Southeast Asia

On Twitter Peter Turchin had a question in response to me tweeting a new preprint on bioRxiv:

This was my impression too until a few years ago, but the genetic evidence does point to gene-flow. Here are two recent posts from me, Likely Male-Mediated Indianization In Southeast Asia and Indic Civilization Came To Southeast Asia Because Indian People Came To Southeast Asia. Lots Of Them.

A clash of civilizations along the lower Mekong

The lower Mekong region is a fascinating zone from the perspective of human geography and ethnography. Divided between Cambodia and Vietnam, until the past few centuries it was, in fact, part of the broader Khmer world, and historically part of successive Cambodian polities. Vietnam, as we know it, emerged in the Red River valley far to the north 1,000 years ago as an independent, usually subordinate, state distinct from Imperial China. Heavily Sinicized culturally, the Vietnamese nevertheless retained their ethnic identity.

Vietnamese, like the language of the Cambodians, is Austro-Asiatic. In fact, the whole zone between South Asia and modern-day Vietnam, and south to maritime Southeast Asia, may have been Austro-Asiatic speaking ~4,000 years ago, as upland rice farmers migrated from the hills of southern China and assimilated indigenous hunter-gatherers.

But the proto-Vietnamese language was eventually strongly shaped by Chinese influence. This includes tonogenesis, the emergence of tones. Genetically, the Vietnamese are also quite distinct, being more shifted toward southern Han Chinese and ethnic Chinese minorities such as Dai. My personal assumption is that this is due to the repeated waves of migration out of southern China over the past few thousand years, first by Yue ethnic minorities, and later by Han Chinese proper. Many of these individuals were culturally assimilated as Vietnamese, but they clearly left both a biological and a cultural imprint on what was originally an Austro-Asiatic population likely quite similar to the Khmer.

As I have posted elsewhere, it is also clear to me that Cambodians have Indian ancestry. Because, unlike Malaysia, Cambodia has not had any recent migration of South Asians due to colonialism, the most parsimonious explanation is that the legends and myths of Indian migration during the Funan period are broadly correct. There is no other reason for fractions of R1a1a among Cambodian males north of 5%. Depending on how you estimate it, probably ~10% of the ancestry of modern Cambodians is South Asian (the Indian fraction is easier to calculate because it is so different from the East Asian base).

Demographic replacement in Southeast Asia during the Holocene

Well sometimes you feel silly, and it’s not your fault. Yesterday our podcast on Sundaland went live (we talked about Doggerland and Beringia too!). Though I expressed a fair amount of skepticism, I took the argument that Stephen Oppenheimer presented in Eden of the East, that modern Austronesians are long-term residents of Southeast Asia, seriously.

The alternative view, most forcefully put by Peter Bellwood in books such as First Farmers, is that Austro-Asiatic and Austronesian people were agriculturalists issuing out of southern China that transformed the region over the past 4,000 years (the Austronesians from Taiwan specifically, though during the Pleistocene Taiwan was connected to the mainland).

I lean toward Bellwood’s view, and today a preprint came out which basically confirms it in totality, Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia. The abstract:

Two distinct population models have been put forward to explain present-day human diversity in Southeast Asia. The first model proposes long-term continuity (Regional Continuity model) while the other suggests two waves of dispersal (Two Layer model). Here, we use whole-genome capture in combination with shotgun sequencing to generate 25 ancient human genome sequences from mainland and island Southeast Asia, and directly test the two competing hypotheses. We find that early genomes from Hoabinhian hunter-gatherer contexts in Laos and Malaysia have genetic affinities with the Onge hunter-gatherers from the Andaman Islands, while Southeast Asian Neolithic farmers have a distinct East Asian genomic ancestry related to present-day Austroasiatic-speaking populations. We also identify two further migratory events, consistent with the expansion of speakers of Austronesian languages into Island Southeast Asia ca. 4 kya, and the expansion by East Asians into northern Vietnam ca. 2 kya. These findings support the Two Layer model for the early peopling of Southeast Asia and highlight the complexities of dispersal patterns from East Asia.

The transition to full-fledged rice agriculture occurred in Vietnam ~4,000 years ago. In First Farmers Bellwood reports on an archaeological site dating to that period where skeletal evidence has been adduced to record the presence of both Northeast Asian and Australo-Melanesian types. These results make clear though that these hunter-gatherers in Southeast Asia are more similar to the Onge of the Andaman Islands, as well as the Negritos of the interior of the Malay peninsula. They’re totally in alignment with the earlier morphological results (also, readers might be curious to know that one site of the Hoabinhian culture is in Yunnan, China). This shouldn’t be surprising, as the Andaman Islands were a peninsula which extended from southern Burma during the Pleistocene.

Already the most accepted model for the introduction of intensive agriculture into Southeast Asia is that it was brought by Austro-Asiatic peoples. These results confirm that. Additionally, it seems clear that Austro-Asiatic ancestry made it to island Southeast Asia, whether directly or through Austronesian admixture before arriving in island Southeast Asia. Java and Bali have some of the higher fractions of the ancestry most closely associated with Austro-Asiatic groups on the mainland.

Deeper digging into the admixture distributions has long made it pretty evident that some areas had much higher Austronesian fractions in Indonesia than others, and it wasn’t just a function of distance from the Philippines. Why? My own hunch is that Austronesians brought social and cultural systems which were better adapted to island Southeast Asia, and were more fully able to exploit the local ecology. Meanwhile, aside from a few fringe areas such as the Malay peninsula and coastal Vietnam, they were not successful on the mainland.

The authors also detect migrations into Southeast Asia besides that of the Austro-Asiatics and Austronesians. One element seems correlated with the Tai migrations, and another with Sino-Tibetan peoples, most clearly represented in Southeast Asia by the Burmans. The excellent book, Strange Parallels: Volume 1, Integration on the Mainland: Southeast Asia in Global Context, c.800–1830, recounts the importance of the great migrations of the Tai people into Southeast Asia ~1000 A.D. Modern-day Thailand was once a flourishing center of Mon civilization, an Austro-Asiatic people related to the Khmers of Cambodia. The migrations out of the Tai highlands of southern China reshaped the ethnography of the central regions of mainland Southeast Asia. The Tai also attempted to take over the kingdoms of the Burmans. Though they failed in this, the Shan states of the highlands are the remnants of these attempts (tendrils of the Tai migrations made it to India, the Ahom people of Assam were Tai). Vietnam, shielded by the Annamese Cordillera, came through this period relatively intact. It is also well known that Cambodia’s persistence down to the present has much to do with the shielding it received from France in the 19th century in the wake of Thai expansion.

There are two bigger issues that this paper sheds light on. One is spatial, and the other is temporal.

They detect shared drift between Austro-Asiatic people and tribal populations in northeast India. This is not surprising. A 2011 paper found that Munda speaking peoples, whose variant of Austro-Asiatic is very different from that of Southeast Asia, are predominant carriers of Y chromosome O2a. This is very rare in Indo-European speaking populations, and nearly absent in Dravidian speaking groups. Additionally, their genome-wide patterns indicate some East Asian admixture, albeit a minority, while they carry the derived variant of EDAR, which peaks in Northeast Asia.

One debate in relation to the Munda people is whether they are primal and indigenous, or whether they are intrusive. The genetic data strongly point to the likelihood that they are intrusive. An earlier estimate of coalescence for O2a in South Asia suggested a deep history, but these dates have always been sensitive to assumptions, and more recent analysis of O2a diversity suggests that the locus is mainland Southeast Asia.

Now that archaeology and ancient DNA confirm Austro-Asiatic intrusion into northern Vietnam ~4,000 years ago, I think it also sheds light on when these peoples arrived in India. That is, they arrived < 4,000 years ago. As widespread intensive agriculture came to Burma ~3,500 years ago, I think that makes it likely that Munda peoples arrived in South Asia around this period.

I now believe it is likely that the presence of Austro-Asiatic, Dravidian, and Indo-Aryan languages in India proper was a feature of the period after ~4,000 years ago. None of the languages of the hunter-gatherer populations of the subcontinent remain, with the possible exception of isolates such as Nihali and Kusunda.

The temporal issue has to do with the affinities of these peoples, and how they relate to the settling of Eastern Eurasia. All the Southeast Asian groups after the original Australo-Melanesians share more of an affinity with the Tianyuan individual than Papuans. The implication here is that Tianyuan is closer to the ancestors of various agriculturalists in Southeast Asia than just some random basal Eastern Eurasian. But, since Tianyuan dates to 40,000 years ago, and, is from the Beijing region, it is hard to make strong inferences from comparisons with only it. The heartland of ancient Chinese culture in Henan was to the south of the Tianyuan, after all. More samples are needed before one can truly tease out the pattern of isolation-by-distance vs. admixture that led to the emergence of the proto-farmer populations which settled Southeast Asia.

In the podcast above one thing that came up is that a lot of genetic data indicate decreased diversity as one moves from the south to the north in East Asia. This has long been taken to mean that humans migrated north, and so were subject to bottleneck effects. I pointed out that this may simply be a consequence of admixture between two very different groups of people in Southeast Asia, elevating diversity statistics.
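
The point about admixture elevating diversity can be made concrete with a one-locus sketch. It assumes a biallelic site, expected heterozygosity 2p(1-p), and a mixed population whose allele frequency is the weighted mean of two sources. The allele frequencies and the mixing proportion below are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)

def admixed_heterozygosity(p1, p2, a):
    """Heterozygosity after mixing sources with frequencies p1, p2 in proportions a, 1-a."""
    p_mix = a * p1 + (1 - a) * p2
    return heterozygosity(p_mix)

# Hypothetical allele frequencies for two strongly diverged source populations.
p1, p2, a = 0.9, 0.2, 0.3
h_mix = admixed_heterozygosity(p1, p2, a)
h_avg = a * heterozygosity(p1) + (1 - a) * heterozygosity(p2)
excess = 2 * a * (1 - a) * (p1 - p2) ** 2  # closed-form gain from admixture
print(h_mix, h_avg, excess)
```

The gain over the weighted average of the sources is 2a(1-a)(p1-p2)^2, which grows with how different the two sources are. Mixing two very different groups can therefore raise diversity statistics without any bottleneck in the north.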

And yet as the map at the end of the preprint suggests it is highly plausible that Pleistocene Asia was marked by a south to north dynamic of migration. The Austro-Asiatic peoples who migrated south during the Holocene may simply have been backtracking the migration of their ancestors. What these results, and ancient DNA more generally, tell us is that humans were often on the move. The Pleistocene world of climate change probably meant that humans had to be on the move.


South Asian gene flow into Burmese and Malays?

I happen to have a data set merged from the 1000 Genomes and Estonian Biocentre which has Malays, Burmans, and other assorted Southeast Asians, East Asians, and South Asians. In light of recent posts I thought I would throw out something in relation to this data set (you can download the data here). Above you can see the populations in the data. You see Bangladeshis consistently are shifted toward Southeast Asians in comparison to other South Asians. But both Burmans and Malays exhibit some shift toward South Asians.

I ran ADMIXTURE at K = 4. Click the image for the larger file which shows the populations, but I will tell you what’s going on.

The yellow to green represent a north-south axis in East Asia. The Han sample is mostly yellow, but there is a green component in varying degrees. This almost certainly represents heterogeneity in the Han sample of north to south Chinese. The green component is nearly ~100% in some individuals from indigenous tribes in Borneo, and balanced with the yellow among peninsular Malays. It is at a higher frequency in Cambodia than in Vietnam or Burma, indicating the older roots of Khmers and their relative insulation from later migrations of Sino-Tibetan and Tai peoples.

The red South Asian component is found in many Southeast Asians, but curiously, in the Burmans and Malays there is a lot of variation within the population. That indicates admixture over time that has not homogenized throughout the population.
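
Why does variation among individuals point to recent or ongoing admixture? A toy simulation illustrates the logic. It assumes a single pulse, random mating, and that a child's ancestry is simply the mean of its two parents, ignoring recombination and segregation noise. The admixture fraction, population size, and number of generations are hypothetical.

```python
import random
import statistics

def simulate_ancestry_variance(frac, pop_size, generations, seed=0):
    """Track variance of individual ancestry after a single admixture pulse.

    Simplification: a child's ancestry is the mean of two randomly chosen
    parents, ignoring recombination and segregation noise.
    """
    rng = random.Random(seed)
    n_source = int(frac * pop_size)
    pop = [1.0] * n_source + [0.0] * (pop_size - n_source)
    variances = [statistics.pvariance(pop)]
    for _ in range(generations):
        pop = [(rng.choice(pop) + rng.choice(pop)) / 2 for _ in range(pop_size)]
        variances.append(statistics.pvariance(pop))
    return variances

variances = simulate_ancestry_variance(frac=0.1, pop_size=2000, generations=10)
for gen, var in enumerate(variances):
    print(gen, round(var, 5))
```

Under this toy model the variance roughly halves each generation, so an old single pulse should look nearly uniform across individuals. Wide variation suggests gene flow that is recent or still continuing.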

I ran Treemix with 5 migration edges and French rooted (1000 SNP blocks out of 225,000 SNPs) and they all looked like this. Commentary I will leave to readers….
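
A run like this can be expressed roughly as follows. This is a sketch, not the exact command used: the input and output file names are hypothetical placeholders, and the only settings taken from the text are the 5 migration edges, the French root, and the 1000-SNP blocks.

```python
import shlex

def treemix_command(input_path, output_stem, migrations=5, root="French", block_size=1000):
    """Build a Treemix argument list; file paths are hypothetical placeholders."""
    return [
        "treemix",
        "-i", input_path,        # gzipped allele-count input
        "-o", output_stem,       # prefix for output files
        "-m", str(migrations),   # number of migration edges
        "-root", root,           # population used to root the tree
        "-k", str(block_size),   # SNPs per block for the covariance estimate
    ]

total_snps = 225_000
cmd = treemix_command("merged.treemix.frq.gz", "sea_m5")
n_blocks = total_snps // 1000
print(shlex.join(cmd))
print(f"{n_blocks} blocks of 1000 SNPs")
```

Grouping SNPs into blocks is meant to account for linkage between nearby markers when standard errors are estimated.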