Malaysia – Gene Expression

Thanks to the Singapore Genome Variation Project, and some data from Lynn Jorde’s lab, I added some Tibetans and Malays to a pooled data set of East, Southeast, and some South Asians. The merged data set had 70,000 markers. I was curious to explore the various contributions of ancestry from eastern Eurasians into northeast South Asia.

The Tibetans, in particular, seem to be a common “source” population in a lot of places. There are fewer than 10 million Tibetans proper today, but the historical impact of Tibetan or quasi-Tibetan peoples has been much greater than their current numbers might suggest. Additionally, Tibetans occupy a very wide geographical range, far beyond the Tibetan Autonomous Region of modern China; the majority of ethnic Tibetans live outside its political boundaries. The terms “Sino-Tibetan” and “Tibeto-Burman” are both ethno-linguistic terms that point to the affinities of Tibetans in a broader East and Inner Asian context. Not only was the Tibetan Empire of the 7th to 9th centuries a major geopolitical power, but the Tangut state, which dominated much of modern Gansu for centuries, had Tibetan affinities.

Meanwhile, in the northeastern quadrant of South Asia, Indo-Aryan languages are dominant today. But Tibeto-Burman, Tai, and Austro-Asiatic languages are all important as well (or at least present). As noted in Strange Parallels, the Tai are the most recent arrivals in Southeast Asia proper; this is known from the historical record.

For Southeast Asia, various archaeological, philological, and now genetic data suggest that Austro-Asiatic languages arrived with the first farmers, who emigrated from what is today southern China around 4,000 years ago. The arrival of Tibeto-Burman languages and peoples in Southeast Asia surely precedes that of the Tai, which dates to 1,000 years ago, but likely postdates the arrival of Austro-Asiatic groups.*

The situation in northeastern South Asia is somewhat confused in terms of when the various groups arrived. A few years ago a paper on cholera genetics in Bangladesh reported an analysis indicating that the ancestors of eastern Bengalis received a pulse of East Asian admixture about 1,500 years ago, and that a single-pulse model was sufficient to explain the data. An immediate explanation that came to mind is that these Bengalis mixed with Munda people, who have substantial East Asian ancestry and speak an Austro-Asiatic language.
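For readers wondering how a single admixture “pulse” gets a date: a common approach (ROLLOFF/ALDER-style methods, for example; which method the cholera paper used is not stated here) fits the decay of admixture linkage disequilibrium with genetic distance. Under one pulse n generations ago, the signal at distance d Morgans falls off roughly as exp(-n·d), and calendar years follow from an assumed generation time. The sketch below assumes 28 years per generation, and its function names are hypothetical.

```python
import math

# Assumption: 28 years per generation (a hypothetical choice; other values are also used).
YEARS_PER_GENERATION = 28.0


def generations_since(years: float, years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Convert a calendar age into generations under an assumed generation time."""
    return years / years_per_generation


def pulse_ld_signal(distance_morgans: float, generations: float) -> float:
    """Relative admixture LD expected at a genetic distance under a single pulse.

    Under a one-pulse model the signal decays as exp(-n * d), where n is the number
    of generations since admixture and d is the distance in Morgans (constants omitted).
    """
    return math.exp(-generations * distance_morgans)


if __name__ == "__main__":
    n = generations_since(1500)
    for cm in (0.5, 1, 2, 5):
        print(f"{cm} cM: relative signal {pulse_ld_signal(cm / 100, n):.3f}")
```

If one exponential fits the observed decay, which is what “a pulse model would suffice” amounts to, the fitted rate per Morgan is itself the estimate of n.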

In The Genomic Formation of South and Central Asia one model for the emergence of Munda people that fits the data is:

1) An admixture of East Asian people (presumably Austro-Asiatic farmers) with “Ancient Ancestral South Indians” (AASI), the indigenous South Asians who lack any West Eurasian ancestry.

2) A mix of this component with “Ancestral South Indians” (ASI), which consist of AASI and a minority of West Eurasian ancestry from Iranian farmers.
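The two-step model reduces to simple mixing arithmetic. If a fraction a of the first-step population is East Asian (the rest AASI), the Munda take a fraction w from that population and 1 − w from ASI, and ASI carries a fraction i of Iranian-farmer ancestry, then the net proportions follow directly. The parameter values in this sketch are purely hypothetical and are not taken from The Genomic Formation of South and Central Asia. Note that no steppe (Turan/East European) term appears anywhere in the model.

```python
from dataclasses import dataclass


@dataclass
class NetAncestry:
    east_asian: float
    aasi: float
    iranian_farmer: float


def munda_two_step(a: float, w: float, i: float) -> NetAncestry:
    """Net ancestry under the two-step model (hypothetical parameterization).

    a: East Asian share of the step-1 (East Asian + AASI) population
    w: share of the step-1 population in the final mix (the rest is ASI)
    i: Iranian-farmer share within ASI (the rest of ASI is AASI)
    """
    for name, value in (("a", a), ("w", w), ("i", i)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return NetAncestry(
        east_asian=w * a,
        aasi=w * (1 - a) + (1 - w) * (1 - i),
        iranian_farmer=(1 - w) * i,
    )


# Hypothetical parameters, for illustration only.
print(munda_two_step(a=0.5, w=0.6, i=0.3))
```

With any choice of parameters the AASI term collects contributions from both steps, which is one way to see how the Munda can combine a large East Asian fraction with a high fraction of modal South Asian ancestry.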

Presumably there are other models that fit the data as well, but even with naive admixture analysis it was long evident to anyone who looked that the Munda were atypical. The Turan/East European ancestry that one can find at various levels in various South Asian populations with classical model-based admixture analysis is always absent in the Munda. Not only that, but they have very high fractions of modal South Asian ancestry combined with the East Asian component.

So can the East Asian ancestry in Bengalis be explained by the Munda? I’ve posted on this topic before, and every time I come to the same conclusion: probably not. Now that I have Tibetan and Malay samples to define a northwest-southeast transect, I can say that again, more definitively.

At K = 10 in the admixture plot above, notice that the cluster modal in Malays and Cambodians accounts for almost all the East Asian ancestry in the Austro-Asiatic Munda sample. In Bengalis that component is found, but so are the components modal in Tibetans and in Han Chinese. The same pattern is found in the Burmese, but at much higher fractions. In fact, let’s compare average fractions between Bengalis and Burmese.

Population	Han	Tibetan	Austro-Asiatic
Bengali	3%	4%	4%
Burmese	16%	34%	28%
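One way to ask whether the two populations carry the same kind of East Asian ancestry, just at different levels, is to rescale each row so its three components sum to one. A short sketch using the rounded values from the table (so differences of a few percentage points are within rounding noise); the names `FRACTIONS` and `normalize` are illustrative.

```python
# Average East Asian cluster fractions at K = 10, copied from the table above.
FRACTIONS = {
    "Bengali": {"Han": 0.03, "Tibetan": 0.04, "Austro-Asiatic": 0.04},
    "Burmese": {"Han": 0.16, "Tibetan": 0.34, "Austro-Asiatic": 0.28},
}


def normalize(components: dict[str, float]) -> dict[str, float]:
    """Rescale components to sum to 1, giving the make-up of the East Asian ancestry alone."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("components must have a positive total")
    return {name: value / total for name, value in components.items()}


for population, components in FRACTIONS.items():
    shares = normalize(components)
    total = sum(components.values())
    formatted = ", ".join(f"{name} {share:.0%}" for name, share in shares.items())
    print(f"{population}: East Asian total {total:.0%}; within it {formatted}")
```

With these inputs the Bengali East Asian fraction (11% in total) splits roughly 27/36/36 across Han/Tibetan/Austro-Asiatic, and the Burmese fraction (78% in total) roughly 21/44/36.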

The Han-modal component is fairly generic. From these proportions I don’t think we can reject the possibility that the East Asian ancestry in Bengalis is exactly the same as that in Burmese…though based on Y-chromosomal data I do think there is some Munda ancestry in Bengalis. Additionally, Munda people (the Santhals) are found in some numbers even today in Bengal, extending into Bangladesh.

Looking at results from a three-population test, a Tibetan-like contribution to Bengalis seems likely:

target	pop1	pop2	f3	std. error	z
AustroAsiatic	Dai	Telegu	-0.00187497	0.00012171	-15.4052
AustroAsiatic	Telegu	Lahu	-0.00182418	0.000124765	-14.6209
AustroAsiatic	Malay	Telegu	-0.00135077	0.0001035	-13.0508
AustroAsiatic	Han	Telegu	-0.00138827	0.000112488	-12.3415
AustroAsiatic	Telegu	Kinh	-0.00158805	0.000130918	-12.1301
AustroAsiatic	Miaozu	Telegu	-0.00142974	0.00011907	-12.0076
AustroAsiatic	Telegu	She	-0.0015296	0.000127609	-11.9866
AustroAsiatic	Cambodians	Telegu	-0.00135761	0.000119312	-11.3786
AustroAsiatic	Telegu	Tujia	-0.00137567	0.000125184	-10.9892
AustroAsiatic	Telegu	Naxi	-0.00106498	0.000123643	-8.61336
AustroAsiatic	Telegu	Japanese	-0.000991272	0.000116507	-8.50824
AustroAsiatic	Yizu	Telegu	-0.00111008	0.000133985	-8.28514
AustroAsiatic	Telegu	Han_N	-0.000844919	0.000122055	-6.92243
AustroAsiatic	Telegu	Tibetan	-0.000414633	0.000109922	-3.77207
AustroAsiatic	Telegu	Hezhen	-0.000428829	0.000116739	-3.67339
AustroAsiatic	Xibo	Telegu	-0.000504054	0.000138498	-3.63943
AustroAsiatic	Telegu	Burmese	-0.000335967	0.000108207	-3.10485
Bengali	AustroAsiatic	Iranian	-0.00331738	7.5938E-05	-43.6853
Bengali	Miaozu	Telegu	-0.00250784	6.12097E-05	-40.9712
Bengali	Han	Telegu	-0.00250669	6.11899E-05	-40.9658
Bengali	Telegu	Tibetan	-0.0022997	5.672E-05	-40.5448
Bengali	Telegu	Japanese	-0.00240064	6.02193E-05	-39.865
Bengali	Dai	Telegu	-0.00253233	6.53283E-05	-38.7632
Bengali	Malay	Telegu	-0.00212941	5.51377E-05	-38.6199
Bengali	Xibo	Telegu	-0.0023685	6.24874E-05	-37.9036
Bengali	Telegu	Han_N	-0.00241445	6.40346E-05	-37.7054
Bengali	Telegu	Burmese	-0.00205009	5.43997E-05	-37.6857
Bengali	Telegu	Naxi	-0.00249315	6.66967E-05	-37.3804
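In an f3 test the z column is the f3 estimate divided by its standard error, and a significantly negative f3 indicates that the population in the first column is best modeled as a mixture of populations related to pop1 and pop2. The check below recomputes z for a few rows copied from the table. The |z| > 3 cutoff is a common rule of thumb assumed here, not a threshold given in the text.

```python
# A few rows copied from the three-population test table:
# (first column, pop1, pop2, f3, standard error, reported z)
ROWS = [
    ("AustroAsiatic", "Dai", "Telegu", -0.00187497, 0.00012171, -15.4052),
    ("AustroAsiatic", "Telegu", "Tibetan", -0.000414633, 0.000109922, -3.77207),
    ("AustroAsiatic", "Telegu", "Burmese", -0.000335967, 0.000108207, -3.10485),
    ("Bengali", "AustroAsiatic", "Iranian", -0.00331738, 7.5938e-05, -43.6853),
    ("Bengali", "Han", "Telegu", -0.00250669, 6.11899e-05, -40.9658),
    ("Bengali", "Telegu", "Tibetan", -0.0022997, 5.672e-05, -40.5448),
    ("Bengali", "Telegu", "Burmese", -0.00205009, 5.43997e-05, -37.6857),
]

# Assumed rule of thumb: z below -3 counts as significant evidence of admixture.
Z_THRESHOLD = -3.0


def z_score(f3: float, std_error: float) -> float:
    """z is the f3 estimate divided by its standard error."""
    return f3 / std_error


def is_significant_admixture(f3: float, std_error: float, threshold: float = Z_THRESHOLD) -> bool:
    """A significantly negative f3 suggests the first population is admixed."""
    return z_score(f3, std_error) < threshold


for target, p1, p2, f3, se, z_reported in ROWS:
    z = z_score(f3, se)
    flag = is_significant_admixture(f3, se)
    print(f"f3({target}; {p1}, {p2}): z = {z:.2f} (reported {z_reported}), admixed: {flag}")
```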

OK, so what do we do with this, and how does it make sense? If you read a book like Land of Two Rivers, you won’t get any sense that admixture between a Tibeto-Burman people and Indo-Aryan speakers occurred in eastern South Asia 1,500 years ago. To a great extent this is “prehistoric,” hidden from us, even though by that period Classical Indian sources mention the fringes of modern Bengal. It is clear that many of the people who lived in Bengal were not part of Aryan society. The later Vedic sources assert this explicitly, mentioning non-Aryan tribes beyond the march.

I currently believe that southern and eastern South Asia were touched by the expansion of Indo-Aryan/Dravidian-speaking people after 4,000 years ago. This would make sense in light of the Vedic memory. I also suspect that Austro-Asiatic Munda people arrived after 4,000 years ago into a landscape where the population was AASI, without any West Eurasian influence. By 500 BC it seems that Indo-Aryan culture had at least arrived on the edge of Bengal. At that date I suspect most of the tribes living in Bengal were already Munda. If the argument in The Rise of Islam and the Bengal Frontier is correct, much of eastern Bengal was not intensively cultivated until after 1000 AD. The period between 500 AD and 1000 AD was also the only one in ancient or medieval India when Bengal was home to the paramount hegemonic power in South Asia, a state ruled by the Pala dynasty.

Meanwhile, to the east, Tibeto-Burman people seem to have arrived in the Irrawaddy basin around 200 BC. Rice cultivation in this region dates to 1500 BC, 500 years after rice cultivation arrived in northern Vietnam. Presumably, then, from around 200 BC onward there was a transition from Austro-Asiatic languages to Tibeto-Burman languages (the Mon may be intrusive from Thailand). I suspect that between 0 and 500 AD a group of Tibeto-Burmans moved up the coast and arrived in eastern Bengal. After mixing with the native Munda, they were probably absorbed by the eastward expansion of Indo-Aryans triggered by the political dominance of the Pala dynasty.

But was the gene flow in one direction? This seems unlikely. All the Burmese samples have South Asian admixture, which can be explained by proximity. But there are signatures of this in Cambodia too, and the Malay samples I selected, which formed a tight cluster, also seem to have substantial South Asian admixture. The Indian Ocean economy and diasporas between 0 AD and 1000 AD, after which Muslims and later Europeans became dominant, are a lacuna in our understanding. The presence of the Malagasy and clear Austronesian influence in East Africa indicate an east-to-west migration. But Indian genetic signatures are found throughout Southeast Asia as well. Some of this can be chalked up to proximity (Burma) and colonial-era contact (Malaysia), but Cambodia is too far for either to be plausible. Curiously, this influence is mostly lacking in Vietnam and the interior of Southeast Asia. This strongly suggests maritime trade contact: the regions where Indic culture was strong are the regions where there is a genetic signature of South Asians.

At this point I think I’ve established enough about South Asia and Bengal to move on from that. In the future I’m more curious about exploring contacts between South Asia and Southeast Asia, and how it left a cultural and biological impact.

* I arrived at the 4,000-year date from the genetic sample and the culture that emerges in northern Vietnam’s Red River Valley, which marks the transition from hunter-gathering to agriculture.

I happen to have a data set merged from the 1000 Genomes Project and the Estonian Biocentre that includes Malays, Burmans, and assorted other Southeast Asians, East Asians, and South Asians. In light of recent posts I thought I would throw out something in relation to this data set (you can download the data here). Above you can see the populations in the data. Bangladeshis are consistently shifted toward Southeast Asians in comparison with other South Asians, but both Burmans and Malays exhibit some shift toward South Asians.

I ran ADMIXTURE at K = 4. The full-size image shows the population labels, but I will tell you what’s going on.

The yellow and green components represent a north-south axis in East Asia. The Han sample is mostly yellow, but there is a green component in varying degrees, which almost certainly reflects heterogeneity in the Han sample between northern and southern Chinese. The green component is nearly 100% in some individuals from indigenous tribes in Borneo, and balanced with the yellow among peninsular Malays. It is at higher frequency in Cambodia than in Vietnam or Burma, indicating the older roots of the Khmers and their relative insulation from later migrations of Sino-Tibetan and Tai peoples.

The red South Asian component is found in many Southeast Asians, but curiously, in the Burmans and Malays it varies a lot within the population. That indicates admixture over time that has not yet homogenized throughout the population.
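Why does within-population variation point to admixture that has not homogenized? Under random mating, a child's ancestry fraction is roughly the average of its parents', so after a single pulse the variance of individual ancestry across the population roughly halves each generation. The toy simulation below illustrates the principle. It ignores Mendelian segregation noise and uses made-up parameters (population size, pulse fraction, number of generations), so it is a sketch, not a model of Burmans or Malays.

```python
import random
from statistics import fmean, pvariance


def simulate_pulse(n_individuals=10_000, pulse_fraction=0.2, generations=6, seed=1):
    """Return the final mean ancestry and the ancestry variance after each generation.

    Hypothetical toy model: one admixture pulse, then random mating with no further
    gene flow. A child's ancestry is the mean of two randomly drawn parents; the rare
    case of drawing the same parent twice is ignored as negligible.
    """
    rng = random.Random(seed)
    # Generation 0: individuals are either fully from source A (1.0) or source B (0.0).
    n_a = int(n_individuals * pulse_fraction)
    ancestry = [1.0] * n_a + [0.0] * (n_individuals - n_a)
    variances = [pvariance(ancestry)]
    for _ in range(generations):
        ancestry = [
            (rng.choice(ancestry) + rng.choice(ancestry)) / 2 for _ in range(n_individuals)
        ]
        variances.append(pvariance(ancestry))
    return fmean(ancestry), variances


if __name__ == "__main__":
    mean, variances = simulate_pulse()
    for gen, var in enumerate(variances):
        print(f"generation {gen}: variance {var:.4f}")
```

High variance therefore fits recent or continuing gene flow better than a single old pulse, though population structure within the sample could produce a similar pattern.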

I ran TreeMix with 5 migration edges, rooted on the French, using blocks of 1,000 SNPs (out of 225,000 SNPs), and the runs all looked like this. I will leave the commentary to readers…
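The 1,000-SNP blocks group adjacent SNPs so that standard errors account for linkage between nearby markers. In TreeMix this is the `-k` option, alongside `-m` for migration edges and `-root` for the root population. Below is a minimal sketch of a delete-one-block jackknife for a statistic that is a mean over SNPs, such as an f3-style estimate. It assumes equal-sized blocks in genomic order and is not TreeMix's own code; the function name is hypothetical.

```python
import math
import random
from statistics import fmean


def block_jackknife_mean(values, block_size):
    """Estimate a per-SNP mean and its delete-one-block jackknife standard error.

    SNPs are assumed to be in genomic order; trailing SNPs that do not fill a whole
    block are dropped.
    """
    n_blocks = len(values) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    n = n_blocks * block_size
    block_sums = [sum(values[b * block_size:(b + 1) * block_size]) for b in range(n_blocks)]
    total = sum(block_sums)
    estimate = total / n
    # The estimate recomputed with each block left out in turn.
    leave_one_out = [(total - s) / (n - block_size) for s in block_sums]
    center = fmean(leave_one_out)
    variance = (n_blocks - 1) / n_blocks * sum((x - center) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)


if __name__ == "__main__":
    # Synthetic per-SNP values for illustration only, not real data.
    rng = random.Random(42)
    snps = [rng.gauss(-0.002, 0.05) for _ in range(225_000)]
    est, se = block_jackknife_mean(snps, block_size=1000)
    print(f"estimate {est:.5f}, SE {se:.5f}, z {est / se:.2f}")
```

For equal-sized blocks this reduces to the standard error of the block means, which is why larger blocks give more conservative errors when nearby SNPs are correlated.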


Tibeto-Burmans in Bengal, and Indians in ancient Malaya

South Asian gene flow into Burmese and Malays?