

The mtDNA was a surprise. It was G1a2. This was curious to me, since Bangladesh has some of the highest frequencies in the world of haplogroup M, the subhaplogroups in question being mostly restricted to South Asia. I wasn’t surprised that the Y lineage was R1a1a, but I was even more confident that my father’s maternal lineage was going to be an M (my own mtDNA is U2b: not common, but not so surprising). As you can see from the map, 23andMe places my father’s maternal lineage somewhere in Northeast Asia. The only information I could find about the geography was for G1a: “G1a has been found in samples from China (Daur, Hui, Kazakh, Korean, Manchu, and a sample of the general population of the city of Shenyang), Japan, Korea, Vietnam, and Siberia (Yakut).”

Looking at the Y chromosome haplogroups in the 1000 Genomes Bangladeshi sample, there are two each of O2 and O3, and one of C3, which are clearly of Southeast Asian origin. That is N = 5 out of 44 samples, or ~10%. O2 is interesting because it is found at very high frequencies among the Austro-Asiatic populations of South Asia, whether the Khasi or the Munda groups (generally O2a). O3 seems associated with Tibeto-Burman populations, and C3 with East Asia more generally.
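The frequency is simple arithmetic; as a trivial Python check using the counts quoted above (the variable names are just illustrative), 5/44 comes to a little over 11%, which rounds to the ~10% figure:

```python
# Y-chromosome haplogroup counts of East Asian affinity quoted above.
east_asian_y = {"O2": 2, "O3": 2, "C3": 1}
n_samples = 44

n_east = sum(east_asian_y.values())  # 2 + 2 + 1 = 5
fraction = n_east / n_samples        # 5 / 44

print(f"{n_east}/{n_samples} = {fraction:.1%}")  # prints "5/44 = 11.4%"
```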

But there are numerous other Austro-Asiatic languages in Southeast and South Asia. The indigenous peoples of the deep forests of the Malay peninsula, including the Negritos, speak Austro-Asiatic languages. As one moves west there are Austro-Asiatic languages in Burma, such as Mon, which used to be far more widespread. And in India there are two groups: the language of the Khasi in the northeast, which seems to share some affinity with the Palaungic dialects of interior Burma and southern China, and the Munda languages farthest west, which seem very distinct from all the other branches.
The genetics seems to suggest that the Munda tribes do have East Asian ancestry, but that it is almost entirely male-mediated. Their Y chromosomal lineages are quite distinctive, with high proportions of O2a, but their mtDNA lineages are overwhelmingly of the South Asian macro-haplogroup M. The Khasi of the hills north of Bangladesh occupy a different position, with both maternal and paternal East Asian heritage, as well as a much higher fraction of genome-wide ancestry that is not South Asian. At this point, I am convinced that the Austro-Asiatic language groups entered South Asia from the east and spread westward.

Finally, there are historically attested Tai peoples who migrated into South Asia. The most famous of these are the Ahoms of Assam. They were part of the same migrations ~1,000 years ago that shifted Thailand from a zone dominated by Mon and Khmer Austro-Asiatic peoples to one dominated by Tai peoples. In Burma, the Tai migrations resulted in the Shan states of the uplands, though the Burman and Mon polities were able to fight off attempts at takeover.

All of this ultimately goes back to the question: how did my father get his mtDNA? If you read my post from a few years back, “How did Bengalis get East Asian?”, you will know that Bengali East Asian ancestry is probably a mix of Austro-Asiatic and Tibeto-Burman ancestry. Can we say any more at this stage?
Some Austronesian data sets have come online, so I thought I’d give it another shot. Additionally, I spent several hours removing outliers and combining populations to generate a full data set. The final data set had 195,000 SNPs.
| Label | N | Notes |
|---|---|---|
| AA | 17 | Munda (outliers removed) |
| BD | 74 | Bangladesh, 1K BEB (outliers removed) |
| Borneo | 31 | Orang Asli tribes (outliers removed) |
| Burmese | 20 | Bamar ethnicity |
| Cambodians | 39 | Outliers removed |
| Dai | 40 | |
| Han_C | 47 | Pooled Han from HGDP and 1K |
| Han_N | 28 | Pooled Han from HGDP and 1K |
| Han_S | 29 | Pooled Han from HGDP and 1K |
| Japanese | 28 | |
| Malay | 21 | |
| Miao | 10 | |
| Phil | 16 | Luzon and Visayas |
| Phil_Highland | 15 | Igorot tribesmen, Luzon (outliers removed) |
| Telugu | 34 | 1K STU (outliers removed) |
| Viet | 18 | |
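How the outliers were identified isn’t spelled out here, so the following is only a hypothetical sketch of one common approach, not necessarily the procedure used: place each individual on a few principal components, then drop anyone who sits much farther from their population’s median position than is typical for that population. The function name `keep_mask` and the cutoff `k=3.0` are illustrative assumptions, and the coordinates are made up.

```python
import numpy as np

def keep_mask(pcs, labels, k=3.0):
    """Return a boolean mask: True for individuals to keep.

    pcs    -- (n_individuals, n_pcs) array of principal component scores
    labels -- population label for each individual
    k      -- hypothetical cutoff, in multiples of the population's median distance
    """
    pcs = np.asarray(pcs, dtype=float)
    labels = np.asarray(labels)
    keep = np.ones(len(labels), dtype=bool)
    for pop in np.unique(labels):
        idx = np.flatnonzero(labels == pop)
        centre = np.median(pcs[idx], axis=0)              # robust population centre
        dist = np.linalg.norm(pcs[idx] - centre, axis=1)  # each individual's distance from it
        typical = np.median(dist)
        if typical > 0:                                   # avoid a zero cutoff
            keep[idx] = dist <= k * typical
    return keep

# Made-up PC coordinates: the sixth "BD" individual sits far from its cluster.
pcs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1], [2.0, 2.0],
       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = ["BD"] * 6 + ["Telugu"] * 3
print(keep_mask(pcs, labels).tolist())  # only the far-off BD individual is False
```

Using the median rather than the mean and standard deviation matters for small populations: a single extreme individual inflates the standard deviation enough that it may never exceed a fixed z-score cutoff.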

yellow = South Asian (modal in Telugu)
green = Northeast Asian (modal in Japan and northern Han)
navy = Southeast Asian/Austro-Asiatic (modal in Cambodians)
red = Austronesian (modal in Igorot tribesmen from the highlands of the Philippines)
The two bottom population groups are Bangladeshis and Munda. You can see that both are mostly yellow; that is, they’re mostly South Asian. But the Munda have a much lower South Asian proportion than the Bangladeshis. This is not surprising. The Munda languages and mythology are very distinct from those of other South Asians. Clearly, they have ancient East Asian connections, and this shows in their genome-wide ancestry.
But notice a difference between Bangladeshis and Munda: most of the Bangladeshis have a green component, which is common among Northeast Asians, while none of the Munda do. The average fractions are 38% navy (Austro-Asiatic) for the Munda, and 7% each of navy and green (Northeast Asian) for the Bangladeshis.

This seems to be a pretty clear rejection of the model where Bangladeshis are a two-population mix of Munda tribesmen and a more conventional South Asian group.
Here are the average percentages by population:
| Group | Austro-Asiatic | Austronesian | South Asian | Northeast Asian |
|---|---|---|---|---|
| AA | 38% | 0% | 62% | 0% |
| BD | 7% | 2% | 84% | 7% |
| Borneo | 61% | 38% | 0% | 0% |
| Burmese | 29% | 0% | 23% | 48% |
| Cambodians | 73% | 1% | 15% | 11% |
| Dai | 49% | 7% | 0% | 44% |
| Han_C | 16% | 5% | 0% | 79% |
| Han_N | 1% | 1% | 2% | 96% |
| Han_S | 27% | 7% | 0% | 66% |
| Japanese | 0% | 1% | 2% | 97% |
| Malay | 64% | 16% | 13% | 7% |
| Miao | 24% | 3% | 0% | 73% |
| Phil | 34% | 37% | 6% | 22% |
| Phil_Highland | 0% | 100% | 0% | 0% |
| Telugu | 0% | 3% | 96% | 0% |
| Viet | 45% | 7% | 0% | 48% |
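The per-population table is just the mean of the per-individual ancestry fractions. A minimal sketch, assuming the individual fractions have already been read from ADMIXTURE’s `.Q` output (one row per individual, in the order of the input file) and that the columns have been matched to the four components named above; `population_means` and `modal_population` are hypothetical helper names. Run on the averages in the table, the “modal” lookup recovers the populations named in the colour legend: Cambodians, Phil_Highland, Telugu and Japanese.

```python
from collections import defaultdict

# Column order is an assumption; ADMIXTURE itself only numbers its K columns.
COMPONENTS = ("Austro-Asiatic", "Austronesian", "South Asian", "Northeast Asian")

def population_means(q_rows, labels):
    """Average per-individual ancestry fractions (rows of a .Q file) by population."""
    sums = defaultdict(lambda: [0.0] * len(COMPONENTS))
    counts = defaultdict(int)
    for q, pop in zip(q_rows, labels):
        counts[pop] += 1
        for k, value in enumerate(q):
            sums[pop][k] += value
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}

def modal_population(means):
    """For each component, the population with the highest average fraction."""
    return {comp: max(means, key=lambda pop: means[pop][k])
            for k, comp in enumerate(COMPONENTS)}

# Two hypothetical individuals, just to show the averaging.
toy = population_means([[0.10, 0.0, 0.90, 0.0], [0.30, 0.0, 0.70, 0.0]], ["AA", "AA"])

# Population averages (%) copied from the table above.
AVERAGES = {
    "AA": [38, 0, 62, 0], "BD": [7, 2, 84, 7], "Borneo": [61, 38, 0, 0],
    "Burmese": [29, 0, 23, 48], "Cambodians": [73, 1, 15, 11], "Dai": [49, 7, 0, 44],
    "Han_C": [16, 5, 0, 79], "Han_N": [1, 1, 2, 96], "Han_S": [27, 7, 0, 66],
    "Japanese": [0, 1, 2, 97], "Malay": [64, 16, 13, 7], "Miao": [24, 3, 0, 73],
    "Phil": [34, 37, 6, 22], "Phil_Highland": [0, 100, 0, 0], "Telugu": [0, 3, 96, 0],
    "Viet": [45, 7, 0, 48],
}

print(toy)
print(modal_population(AVERAGES))
```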
I’m 99% sure that “South Asian” is in some of these cases a proxy for anything that’s not East Asian. But for the Malays and Cambodians the component probably does reflect South Asian ancestry, and for the Burmese it certainly does.

In the PCA, both the Malays and the Burmese exhibit a “South Asia cline,” which is due to admixture. But the Burmese project toward the position of the central Han, while the Malays are shifted toward a Southeast Asian population.
Both the Bangladeshi and Munda samples are shifted toward East Asia, but the Munda sample clearly skews toward the Southeast Asian populations. The Bangladeshi samples do not exhibit such a clear pattern.
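A “cline” in PCA is what you expect when individuals carry varying amounts of ancestry from two differentiated sources. The sketch below simulates that with made-up allele frequencies (nothing here is real data) and runs a plain SVD-based PCA with each SNP centred and scaled by its binomial standard deviation; it is an illustration of the idea, not the software used for the plots. The admixed individuals fall on a line between the two source clusters, ordered by their admixture fraction.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """PC scores for an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # allele frequency per SNP
    poly = (p > 0) & (p < 1)                    # monomorphic SNPs have zero variance
    g, p = g[:, poly], p[poly]
    x = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # centre and scale each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(1)
n_snps, n_per_pop = 2000, 30
p_south = rng.uniform(0.05, 0.95, n_snps)       # made-up "South Asian" frequencies
p_east = rng.uniform(0.05, 0.95, n_snps)        # made-up "East Asian" frequencies

south = rng.binomial(2, p_south, size=(n_per_pop, n_snps))
east = rng.binomial(2, p_east, size=(n_per_pop, n_snps))
alphas = np.linspace(0.0, 0.5, n_per_pop)       # East Asian fraction per admixed individual
admixed = np.array([rng.binomial(2, (1 - a) * p_south + a * p_east) for a in alphas])

pcs = pca_scores(np.vstack([south, east, admixed]))
pc1_admixed = pcs[2 * n_per_pop:, 0]
r = np.corrcoef(alphas, pc1_admixed)[0, 1]      # sign of a PC is arbitrary
print(f"correlation of PC1 with admixture fraction: {r:+.3f}")
```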
Then I ran TreeMix with blocks of 1,000 SNPs, no migration edges, global rearrangements turned on, and the tree rooted with the Telugu.
The results are absolutely unsurprising. Unfortunately, adding migration edges doesn’t add much value with so many populations, as there is a great deal of complex population history in Southeast Asia.
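TreeMix takes a gzipped table of allele counts: a header line of population names, then one line per SNP with a comma-separated pair of allele counts for each population. The helper below writes that format; the input data structure, the function names and the allele counts are hypothetical. The command list mirrors the run described above (blocks of 1,000 SNPs via `-k`, global rearrangements via `-global`, the Telugu as root via `-root`, and `-m` for migration edges); it is built but not executed, and the flags are worth double-checking against the TreeMix manual for your version.

```python
import gzip
import os
import tempfile

def write_treemix_input(path, pops, snps):
    """Write a gzipped TreeMix input file.

    pops -- population names, in column order
    snps -- one dict per SNP mapping population -> (allele1_count, allele2_count)
    """
    with gzip.open(path, "wt") as fh:
        fh.write(" ".join(pops) + "\n")
        for snp in snps:
            fh.write(" ".join(f"{snp[p][0]},{snp[p][1]}" for p in pops) + "\n")

def treemix_command(infile, out_stem, root="Telugu", block_size=1000, migrations=0):
    """Build (but do not run) a TreeMix command line as a list of arguments."""
    cmd = ["treemix", "-i", infile, "-o", out_stem,
           "-k", str(block_size), "-global", "-root", root]
    if migrations > 0:
        cmd += ["-m", str(migrations)]
    return cmd

# Two made-up SNPs for three of the populations, just to show the layout.
pops = ["Telugu", "BD", "AA"]
snps = [
    {"Telugu": (40, 28), "BD": (90, 58), "AA": (20, 14)},
    {"Telugu": (10, 58), "BD": (33, 115), "AA": (30, 4)},
]
path = os.path.join(tempfile.mkdtemp(), "treemix_in.gz")
write_treemix_input(path, pops, snps)
with gzip.open(path, "rt") as fh:
    print(fh.read())
print(" ".join(treemix_command(path, "bd_tree", migrations=3)))
```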
After removing many of the populations and setting the number of migration edges to 3, you get:

At this point I ran a “three population test.” Basically, you take a target population and ask whether the data fit a simple tree relating it to two other populations, or whether it looks like a mixture of them. If there is “complex population history,” you’ll get a negative f3 statistic. Here that means the target almost certainly received gene flow from populations related to each of the other two.
Below are results where the Bangladeshis are the target, and the f3 statistics are negative (sorted from most to least negative).
| Target | Pop1 | Pop2 | f3 | f3-error | Z-score |
|---|---|---|---|---|---|
| BD | Telugu | Miao | -0.00240554 | 6.21107e-05 | -38.7298 |
| BD | Telugu | Han_S | -0.00238905 | 5.49332e-05 | -43.4901 |
| BD | Telugu | Dai | -0.00238103 | 5.73977e-05 | -41.4831 |
| BD | Telugu | Han_C | -0.00237904 | 5.74148e-05 | -41.4359 |
| BD | Telugu | Viet | -0.0023151 | 5.63663e-05 | -41.0725 |
| BD | Telugu | Han_N | -0.00229979 | 5.55838e-05 | -41.3752 |
| BD | Telugu | Japanese | -0.00225745 | 5.65642e-05 | -39.9095 |
| BD | Telugu | Phil_Highland | -0.00225153 | 6.87595e-05 | -32.745 |
| BD | Telugu | Borneo | -0.00219619 | 5.91978e-05 | -37.0992 |
| BD | Telugu | Phil | -0.00209752 | 5.97396e-05 | -35.1111 |
| BD | Telugu | Cambodians | -0.00198719 | 4.88719e-05 | -40.6613 |
| BD | Telugu | Malay | -0.00195706 | 5.32466e-05 | -36.7547 |
| BD | Telugu | Burmese | -0.00183415 | 4.79121e-05 | -38.2816 |
| BD | AA | Telugu | -0.000744786 | 4.17995e-05 | -17.818 |
The model where Bangladeshis are a combination of Austro-Asiatic (Munda) and conventional South Asian populations is not crazy: the last row is still significantly negative. But observe the jump in the f3 statistics between that row and the one above it. Bangladeshis almost certainly have East Asian ancestry that is not Austro-Asiatic, which is why the scores are more extreme for cases such as f3(BD; Telugu, Viet).
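For readers who want to see what the statistic is, here is a minimal sketch of an f3 estimate with a delete-one-block jackknife standard error, assuming per-SNP sample allele frequencies for the three populations are already in hand. It applies the usual finite-sample correction for the target’s own sampling noise; implementations such as TreeMix’s `threepop` or ADMIXTOOLS’ `qp3Pop` may differ in normalization and block handling, so this is an illustration, not a reimplementation. The Z-score is f3 divided by its standard error: for the first row of the table, -0.00240554 / 6.21107e-05 ≈ -38.73.

```python
import numpy as np

def f3_per_snp(c, a, b, n_c):
    """Per-SNP f3(C; A, B) with a correction for sampling noise in the target C.

    c, a, b -- sample allele frequencies in the target and the two other populations
    n_c     -- number of sampled chromosomes in C (must be > 1)
    """
    return (c - a) * (c - b) - c * (1 - c) / (n_c - 1)

def f3_jackknife(c, a, b, n_c, block_size=1000):
    """Return (f3, standard error, Z) using a delete-one-block jackknife."""
    vals = f3_per_snp(np.asarray(c), np.asarray(a), np.asarray(b), n_c)
    n_blocks = len(vals) // block_size
    vals = vals[: n_blocks * block_size]  # drop the incomplete last block
    estimate = vals.mean()
    block_sums = vals.reshape(n_blocks, block_size).sum(axis=1)
    loo = (block_sums.sum() - block_sums) / (len(vals) - block_size)  # leave-one-block-out means
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return estimate, se, estimate / se

# Simulated example: the target is a 50/50 mix of two made-up sources.
rng = np.random.default_rng(2)
n_snps, n_chrom = 20_000, 40
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = rng.uniform(0.05, 0.95, n_snps)
p_c = 0.5 * p_a + 0.5 * p_b

def sample_freq(p):
    """Sample allele frequency from n_chrom sampled chromosomes."""
    return rng.binomial(n_chrom, p) / n_chrom

f3, se, z = f3_jackknife(sample_freq(p_c), sample_freq(p_a), sample_freq(p_b), n_chrom)
print(f"f3 = {f3:.5f}, se = {se:.2e}, Z = {z:.1f}")
```

Because the simulated target is an exact mixture, its true f3 is -0.25 (a - b)^2 averaged over SNPs, which is negative, so the estimate comes out strongly negative.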
What I’ve established, then, is the following:
- Bangladeshi East Asian ancestry is not sufficiently explained by Munda ancestry.
- A minority of Bangladeshi Y and mtDNA lineages have East Asian connections, and this cannot be explained exclusively by Munda ancestry.
- Some of these Y and mtDNA lineages seem to be of Tibeto-Burman affinity.
- Admixture analysis genome-wide indicates ancestry from non-Munda populations of East Asian origin.
- In Bangladeshis the Austro-Asiatic fraction is balanced by a roughly equal share of more “northern” (Northeast Asian) ancestry, while in Burma the northern element is a greater proportion than in Bangladesh.
- There is a moderate negative correlation between Austro-Asiatic ancestry and Northeast Asian ancestry in the Bangladeshi sample.
- Bangladeshis seem to have moderate signatures of gene flow from a wide range of East Asian populations.
- In contrast, the Mundas seem to have a connection most strongly with Cambodians.
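Checking a claim like the negative correlation between Austro-Asiatic and Northeast Asian ancestry comes down to a correlation across individuals. A minimal sketch, assuming per-individual fractions for the Bangladeshi sample have been pulled from the ADMIXTURE output; the numbers below are invented placeholders, not the actual data, and the permutation test is just one simple way to attach a p-value, not necessarily what was done here.

```python
import numpy as np

def permutation_corr(x, y, n_perm=10_000, seed=0):
    """Pearson r between x and y, with a two-sided permutation p-value."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)])
    p = (np.sum(np.abs(null) >= abs(r)) + 1) / (n_perm + 1)  # +1 avoids p = 0
    return r, p

# Hypothetical per-individual fractions (NOT the real Bangladeshi values).
austro_asiatic = [0.10, 0.08, 0.05, 0.09, 0.03, 0.07, 0.06, 0.11]
northeast_asian = [0.04, 0.06, 0.09, 0.05, 0.10, 0.08, 0.07, 0.05]

r, p = permutation_corr(austro_asiatic, northeast_asian)
print(f"r = {r:+.2f}, permutation p = {p:.4f}")
```

One caveat: because each individual’s fractions sum to one, the components are not independent, so some negative correlation between them can arise from that constraint alone.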

I assume that there is a true signal there. But the model may still be too parsimonious.
My own predictions are as follows:
- There will be an east-west cline of Tibeto-Burman ancestry.
- There will be a more constant fraction of Austro-Asiatic ancestry.
- The ratio of Austro-Asiatic to Tibeto-Burman ancestry will therefore run opposite to the Tibeto-Burman cline.
- Two admixture events will eventually be detected: a strong, sex-balanced pulse around 500 AD or later, and an older, continuous event that will be more male-skewed, as it will involve absorption of a Munda substrate.
- The Padma river will turn out to be a major differentiator, with much more Tibeto-Burman ancestry to the east (Bengali dialects from east of the Padma show more Tibeto-Burman influence).
Note: a separate issue that I did not want to explore is that the South Asian ancestry of the Munda seems to show almost no Indo-Aryan influence. The Bengali population does have a small but consistent “Indo-Aryan” signature that you cannot find in the Telugu sample. Naturally, this will bias the statistics a touch.

Razib,
Thanks for the detailed analysis of your father’s data, and for teasing out in the process the likely population history of Bangladeshis. Even for a layperson in genetics, it was fascinating to see the analysis approach and the tools used. Knowing the general tools well enough (e.g. PCA), I appreciate how important the right slice of data is, and the hard work that goes into clean data.
Just one minor request for your future PCA plots in R: please change the default colour palette in ggplot to one with greater contrast. I have a really hard time distinguishing the pink of Telugu from the pinkish violet of Viet. Theoretically I know these two would form separate clusters, but I am unsure whether the Telugu cluster is at the bottom left of the first two components or at the extreme right. The text is unclear as to what PC2 is.
Re: PCA and Burmese, I do find that the Naxi and Yi tend to fit better as a proxy ancestor for the Burmese (as a group they have lower Fst and higher cM IBD sharing with Naxi/Yi than with Central/Northern Han), and Naxi/Yi tend to relate to outgroups more like Northern Han than like Central Han.
(Like Tibetans, these groups seem, for the vast majority of their ancestry, like an outmigration of North Han-like farmers? Possibly with very low (single-digit) levels of ancient high-altitude-adapted ancestry, peaking in Tibet.)
So I wonder if for the Burmese there are a couple of phenomena going on. First, essentially Naxi/Yi, which sits where Northern Han does, mixes with a Cambodian-like population, producing a point around PC1: -0.04, PC2: 0.04 (ratio 6:3 Naxi:Cambodian), around where the Burmese sample with the highest position on PC2 sits. Then this population admixes, forming the admixture cline that stretches it toward that point between AA and BGD.
That might ultimately fit a little better with both the Naxi affinity and the history of contribution of preexisting Mon groups (Cambodian-like?) to the groups that formed as proto-Burmans migrated into Burma.
Re: ADMIXTURE, I have to say (though of course you already know) that the orange probably isn’t the % of *real* Austronesian ancestry; rather, it also reflects the high drift in the Igorot and Kankanaey highland PHL samples. Including them is good for triggering an Austronesian component, but taking orange literally as Austronesian would probably underrate Austronesian ancestry in PHL and Malay?
I have a lot of trouble with the colours. But then, I live with two females who tell me my colour perception is whack.
Hi Razib, can you include me in your analysis? I am very keen to know my results, being Assamese of mixed heritage. Can I send you my AncestryDNA, FTDNA, or Living DNA raw data?