Vietnamese are not that much like the Cambodians

A comment below suggested another book on Vietnamese history, which I am endeavoring to read in the near future. The comment also brought up issues relating to the ethnogenesis of the Vietnamese people, their relationship to the Yue (or lack thereof) and the Khmer, and also the Han Chinese.

Obviously, I can’t speak to the details of linguistics and area studies history. But I can say a bit about genetics, because over the years I’ve assembled a reasonable data set of Asians, both public and private. The 1000 Genomes Project collected Vietnamese samples from Ho Chi Minh City in the south. I compared them to a variety of populations using ADMIXTURE with five ancestral populations (K = 5).


In the ADMIXTURE plot, the Vietnamese samples vary less than the Cambodian ones, and resemble the Dai more than the other populations. The Dai were sampled from southern Yunnan, in China, and historically were much more common in southern China, before their assimilation into the Han (as well as the migration of others to Southeast Asia).

Curiously, I have four non-Chinese samples from Thailand, and they look to be more like the Cambodians. This aligns well with historical and other genetic evidence that the Thai identity emerged from the assimilation of Tai migrants into the Austro-Asiatic (Mon and Khmer) substrate.

Aside from a few Vietnamese who seem Chinese, or a few who are likely Khmer or of related peoples, the Vietnamese do seem to have some Khmer ancestry. Or something like that.

Narrowing the populations, and using Indians as an outgroup, I wanted to test the Vietnamese against a few select populations. In the graph to the right you see that they are on the same branch as the Dai, and there is gene flow from the Dai into the Cambodians, and from the Cambodians into the Vietnamese. These results actually suggest that the Cambodians have received more gene flow than the Vietnamese.

If you check the ADMIXTURE plot, though, you will notice that the Cambodians show a huge range of variation in their ancestry. The Mon kingdoms to the west of Cambodia fell to the Tai, but Cambodia itself did not. It probably absorbed a fair amount of Tai ancestry though, even if it retained its cultural distinctiveness and character.

A PCA shows that the Vietnamese form a distinct cluster, different from both the Dai and the South Chinese. Some of the 1000 Genomes samples are shifted toward the Cambodians and others toward the Chinese.

Finally, I ran a three-population (f3) test. Here are some results of interest:

o3 pop1 pop2 f3 z
Cambodia Dai Indian -0.00175342 -25.8023
Cambodia French_Basque Dai -0.00192501 -22.1918
Cambodia Vietnamese Indian -0.00122671 -20.5523
Cambodia French_Basque Vietnamese -0.00136869 -17.6703
Cambodia Dai Papuan -0.0013018 -12.7299
Cambodia Han_S Indian -0.000790546 -10.365
Cambodia Vietnamese Papuan -0.000929681 -9.57058
Cambodia French_Basque Han_S -0.00087403 -9.24743
Cambodia Han_S Papuan -0.000476145 -4.05509
Dai Han_S Cambodia -0.000106184 -4.15877
Dai Cambodia She -0.000123515 -3.04445
Han_N French_Basque Han_S -0.000690947 -6.04291
Han_N Han_S Indian -0.000379328 -3.60634
Han_S Dai Han_N -0.000562373 -20.0654
Han_S Vietnamese Han_N -0.000425554 -15.6301
Han_S Filipino Han_N -0.000560061 -14.4192
Han_S Filipino Naxi -0.000529454 -10.9605
Han_S Malay Han_N -0.00038395 -10.3834
Han_S Dai Naxi -0.000316766 -9.36127
Han_S Filipino Yizu -0.000377863 -7.59642
Han_S Dai Yizu -0.000271844 -7.57112
Han_S Cambodia Han_N -0.000272892 -6.90769
Han_S Vietnamese Naxi -0.000211726 -6.09433
Han_S Vietnamese Yizu -0.000178654 -5.79285
Han_S Filipino Tujia -0.000175578 -4.66665
Han_S Thailand Han_N -0.000270477 -4.17533
Han_S Vietnamese Tujia -9.7422E-05 -3.79926
Han_S Tujia Dai -8.98028E-05 -3.0287
Han_S Tujia Malay -6.18931E-05 -1.67189
Han_S She Han_N -7.74747E-05 -1.41452
Han_S Filipino She -3.55034E-05 -0.888484
Vietnamese Han_S Cambodia -0.000646757 -34.4357
Vietnamese Han_S Malay -0.000420205 -22.545
Vietnamese Cambodia She -0.000615643 -17.2252
Vietnamese Tujia Cambodia -0.000553747 -15.6249
Vietnamese Malay She -0.000460983 -13.9445
Vietnamese Tujia Malay -0.000384676 -12.4208
Vietnamese Dai Indian -0.000494414 -12.4142
Vietnamese Cambodia Han_N -0.000494095 -12.2197
Vietnamese Miaozu Cambodia -0.000421982 -11.4913
Vietnamese Malay Han_N -0.000378602 -10.154
Vietnamese French_Basque Dai -0.000524036 -9.99871
Vietnamese Miaozu Malay -0.000280205 -8.27434
Vietnamese Dai Papuan -0.000339828 -5.83617
Vietnamese Han_S Indian -0.000210588 -4.70338
Vietnamese Dai Han_N -0.000122813 -4.42234
Vietnamese Malay Naxi -0.000152052 -3.8678
Vietnamese Han_S Thailand -0.000147552 -3.73211
Vietnamese Cambodia Yizu -0.000145687 -3.71074
Vietnamese Cambodia Naxi -0.000133426 -3.20226
Vietnamese Burm Dai -5.79109E-05 -3.12906
Vietnamese Dai Yizu -7.91838E-05 -3.00809
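
For readers unfamiliar with the statistic, here is a minimal sketch of how an f3 value and its z-score can be computed. It assumes the first column of the table is the target population and the next two are the candidate sources. The data are simulated, the function names (`f3`, `f3_with_z`) are hypothetical, and the estimator is the simple plug-in form on allele frequencies. Real tools also correct for finite sample size in the target and define jackknife blocks by genomic position, so this is an illustration rather than the pipeline used for the table above.

```python
import random
from math import sqrt


def f3(c, a, b):
    """Plug-in f3(C; A, B): the mean of (c - a) * (c - b) across SNPs."""
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


def f3_with_z(c, a, b, n_blocks=20):
    """Return (f3, z), with the standard error from a delete-one-block jackknife."""
    n = len(c)
    estimate = f3(c, a, b)
    size = n // n_blocks
    leave_one_out = []
    for k in range(n_blocks):
        lo = k * size
        hi = n if k == n_blocks - 1 else lo + size
        # Recompute f3 with SNPs lo..hi-1 removed.
        leave_one_out.append(
            f3(c[:lo] + c[hi:], a[:lo] + a[hi:], b[:lo] + b[hi:])
        )
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    return estimate, estimate / sqrt(variance)


# Toy data (hypothetical): two unrelated source populations and a 60/40 mix.
rng = random.Random(0)
n_snps = 2000
pop_a = [rng.random() for _ in range(n_snps)]
pop_b = [rng.random() for _ in range(n_snps)]
mixed = [0.6 * x + 0.4 * y for x, y in zip(pop_a, pop_b)]

f3_mixed, z_mixed = f3_with_z(mixed, pop_a, pop_b)
f3_source, z_source = f3_with_z(pop_a, mixed, pop_b)
print(f"target = mixed: f3 = {f3_mixed:.5f}, z = {z_mixed:.1f}")
print(f"target = pop_a: f3 = {f3_source:.5f}, z = {z_source:.1f}")
```

A significantly negative f3 is evidence that the target is admixed between populations related to the two sources. A positive value does not rule admixture out, since strong drift in the target can mask the signal.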

7 thoughts on “Vietnamese are not that much like the Cambodians”

  1. “Do we have any genetic data on the Cham?”

    Extant Chamic-speaking populations are genetically inhomogeneous.

    Tsat (Sanya, Hainan, China) Y-DNA/Li Hui et al. 2008
    4/31 = 12.9% C
    5/31 = 16.1% O*(xO1a-M119, O2a-M95, O3-M122)
    18/31 = 58.1% O1a*(xO1a2)
    1/31 = 3.2% O2a-M95(xO2a1-M88/M111)
    2/31 = 6.5% O3a5-M134(xO3a5a-M117)
    1/31 = 3.2% O3a5a-M117

    Utsat (Sanya, Hainan, China) Y-DNA/Li Dong-Na et al. 2013
    6/72 = 8.3% C-M130
    10/72 = 13.9% F*-M89
    44/72 = 61.1% O1a*-M119
    1/72 = 1.4% O2*-P31
    3/72 = 4.2% O2a1*-M95
    3/72 = 4.2% O3*-M122
    2/72 = 2.8% O3a2a-M159
    3/72 = 4.2% O3a2c1a-M117

    Utsat (Sanya, Hainan, China) mtDNA/Li Dong-Na et al. 2013
    2/102 = 2.0% B*
    4/102 = 3.9% B4a
    1/102 = 1.0% B5*
    8/102 = 7.8% B5a
    3/102 = 2.9% F*
    3/102 = 2.9% F1*
    9/102 = 8.8% F1a1
    9/102 = 8.8% F1b
    16/102 = 15.7% F2a
    17/102 = 16.7% D4
    4/102 = 3.9% D5
    5/102 = 4.9% M*
    3/102 = 2.9% M7*
    1/102 = 1.0% M7a
    3/102 = 2.9% M7b1
    7/102 = 6.9% M8a
    2/102 = 2.0% N9a
    3/102 = 2.9% R9b
    2/102 = 2.0% Y1

    Cham (Bình Định Province, Vietnam) Y-DNA/Li Hui et al. 2008
    1/11 = 9.1% O*(xO1a-M119, O2a-M95, O3-M122)
    10/11 = 90.9% O1a*(xO1a2)

    Cham (Bình Thuận Province, Vietnam) Y-DNA/He et al. 2012
    1/59 = 1.7% C-M216(xM217)
    5/59 = 8.5% C2-M217
    1/59 = 1.7% F-M213(xH-M69, K-P131)
    1/59 = 1.7% H-M69
    6/59 = 10.2% K-P131(xN-M231, O-P191, Q-P36, R-M207)
    1/59 = 1.7% O-P191(xO1a-M119, O-M95, O2-M122)
    1/59 = 1.7% O1a-M119(xP203, M50)
    2/59 = 3.4% O-P203(xM101)
    18/59 = 30.5% O-M95(xM88)
    5/59 = 8.5% O-M88
    4/59 = 6.8% O-P200(xM121, M164, P201, JST002611)
    3/59 = 5.1% O-M7
    1/59 = 1.7% O-M134
    8/59 = 13.6% R1a-M17
    2/59 = 3.4% R2a-M124

    Cham mtDNA/He et al. 2012
    32/168 = 19.0% B4
    30/168 = 17.9% B5
    27/168 = 16.1% M*(xD, E, G, M7, M8, M9a’b, M10, M12)
    18/168 = 10.7% F
    13/168 = 7.7% M7
    12/168 = 7.1% N9a
    9/168 = 5.4% R*(xB, F, R9, R11)
    8/168 = 4.8% R9
    4/168 = 2.4% M8
    3/168 = 1.8% D
    3/168 = 1.8% N*(xR, N9a, W4)
    2/168 = 1.2% E
    2/168 = 1.2% M12
    2/168 = 1.2% M9a’b
    2/168 = 1.2% R11
    1/168 = 0.6% G

    Jarai (Northeastern Cambodia) Y-DNA/Zhang et al. 2014
    31/45 = 68.9% O-F789/F4181
    3/45 = 6.7% O-M88(xF761, F2346, F2758)

    Abstract of Li Dong-Na et al. 2013, “Substitution of Hainan indigenous genetic lineage in the Utsat people, exiles of the Champa kingdom”:

    “The Utsat people do not belong to one of the recognized ethnic groups in Hainan, China. Some historical literature and linguistic classification confirm a close cultural relationship between the Utsat and Cham people; however, the genetic relationship between these two populations is not known. In the present study, we typed paternal Y chromosome and maternal mitochondrial (mt) DNA markers in 102 Utsat people to gain a better understanding of the genetic history of this population. High frequencies of the Y chromosome haplogroup O1a*‐M119 and mtDNA lineages D4, F2a, F1b, F1a1, B5a, M8a, M*, D5, and B4a exhibit a pattern similar to that seen in neighboring indigenous populations. Cluster analyses (principal component analyses and networks) of the Utsat, Cham, and other ethnic groups in East Asia indicate that the Utsat are much closer to the Hainan indigenous ethnic groups than to the Cham and other mainland southeast Asian populations. These findings suggest that the origins of the Utsat likely involved massive assimilation of indigenous ethnic groups. During the assimilation process, the language of Utsat has been structurally changed to a tonal language; however, their Islamic beliefs may have helped to keep their culture and

    The demographically minuscule Utsat/Tsat/Utsul minority of Sanya, Hainan appears to be genetically similar to “neighboring indigenous populations” in Hainan, who should be mainly Hlai/Li people, an isolated Daic group. However, the frequency of O-M119 appears to be higher among the Utsat than among the Hlai, whereas most Hlai belong to O-M95(xM88), which is the predominant Y-DNA haplogroup among the Chamic Jarai in northeastern Cambodia and among the Cham in Bình Thuận Province, Vietnam (as well as among the Munda and Nicobarese peoples in India).

    The Cham sample from Bình Thuận Province of Vietnam (He et al. 2012) clearly shows some Y-DNA influence from South Asia. Between 11/59 = 18.6% (the sum of R1a-M17, R2a-M124, and H-M69) and 19/59 = 32.2% (the former sum plus the members of K-P131(xN-M231, O-P191, Q-P36, R-M207), F-M213(xH-M69, K-P131), and C-M216(xM217)) of these Chams should be of South Asian patrilineal ancestry. I guess their genomes probably should be between 10% and 15% South Asian overall (or possibly somewhat more, depending on the origins of their M*(xD, E, G, M7, M8, M9a’b, M10, M12) mtDNAs).
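
The two bounds quoted above can be reproduced from the He et al. 2012 Y-DNA counts listed earlier in this comment. The sketch below is only a tally of those counts. The grouping of haplogroups into "clearly" and "possibly" South Asian follows the comment's own reasoning, and the variable names are hypothetical.

```python
# Y-DNA counts for the Cham of Bình Thuận Province (He et al. 2012), as listed above.
counts = {
    "C-M216(xM217)": 1,
    "C2-M217": 5,
    "F-M213(xH-M69, K-P131)": 1,
    "H-M69": 1,
    "K-P131(xN-M231, O-P191, Q-P36, R-M207)": 6,
    "O-P191(xO1a-M119, O-M95, O2-M122)": 1,
    "O1a-M119(xP203, M50)": 1,
    "O-P203(xM101)": 2,
    "O-M95(xM88)": 18,
    "O-M88": 5,
    "O-P200(xM121, M164, P201, JST002611)": 4,
    "O-M7": 3,
    "O-M134": 1,
    "R1a-M17": 8,
    "R2a-M124": 2,
}

# Grouping taken from the comment: lineages treated as clearly South Asian,
# and further lineages that might also be South Asian.
clearly_south_asian = ["R1a-M17", "R2a-M124", "H-M69"]
possibly_south_asian = [
    "K-P131(xN-M231, O-P191, Q-P36, R-M207)",
    "F-M213(xH-M69, K-P131)",
    "C-M216(xM217)",
]

total = sum(counts.values())
lower = sum(counts[h] for h in clearly_south_asian)
upper = lower + sum(counts[h] for h in possibly_south_asian)

print(f"lower bound: {lower}/{total} = {100 * lower / total:.1f}%")
print(f"upper bound: {upper}/{total} = {100 * upper / total:.1f}%")
```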

  2. malays have indian and austro-asiatic ancestry.

    filipinos have a bit of spanish ancestry.

    the filipinos also have more negrito ancestry than malays. and that ancestry is from a different group of negritos that are very differentiated from the ones in western southeast asia.

  3. The mtDNA of the sample of Cham that I have mentioned in my previous comment in this thread has been described in greater detail in a paper by Min-Sheng Peng et al. (2010). This sample has been collected in Binh Thuan Province of Vietnam, and the Y-DNA of the male subset of this sample has been described in Jun-Dong He, Min-Sheng Peng, Huy Ho Quang et al. (2012), “Patrilineal Perspective on the Austronesian Diffusion in Mainland Southeast Asia,” PLoS ONE 7(5): e36437.

    Cham/Binh Thuan Province, Vietnam (Min-Sheng Peng, Huy Ho Quang, Khoa Pham Dang, et al. (2010), “Tracing the Austronesian Footprint in Mainland Southeast Asia: A Perspective from Mitochondrial DNA.” Mol. Biol. Evol. 27(10):2417–2430. doi:10.1093/molbev/msq131):

    1/168 B4a
    2/168 B4a1
    1/168 B4a1a
    1/168 B4b1
    1/168 B4c1b
    3/168 B4c1b2
    18/168 B4c2
    2/168 B4g
    3/168 B4h
    27/168 B5a
    1/168 B5b1
    2/168 B5b2a
    1/168 C
    3/168 D4
    1/168 E1a1a
    1/168 E2a
    3/168 F1
    3/168 F1a
    2/168 F1a1
    10/168 F1a1a
    1/168 G
    2/168 M12

    7/168 M17 (also found in Thai, Laotians, Indonesia, Urak Lawoi, Philippines (incl. Luzon), Cambodia, Maniq, Mon, Blang, Lawa)
    1/168 M20 (also found in China, Malaysia, Thai, Thailand, Thailand-Laos, Myanmar, Saudi Arabia, Madagascar)
    4/168 M21d (also found in Thailand; M21 in general has also been found among Semang, Semelai, Temuan, Jehai, Maniq, Mon, and Karen and in Bangladesh)
    1/168 M22 (also found in Malaysia (including Semang), Thailand-Laos, Kinh, and China, with the Cham lineage being apparently more closely related to those from Malaysia and Thailand-Laos than to those from China and Vietnam)
    3/168 M50 (also found in Indonesia, Myanmar, Thailand, Moken, Urak Lawoi, Philippines)
    2/168 M51 (also found in Cambodia, Myanmar, Thailand, Thailand-Laos, Sumatra, Kinh)
    1/168 M71 (also found in Myanmar, Cambodia, Thailand, Thailand-Laos, China, Philippines, Indonesia, Moken, Urak Lawoi)
    4/168 M73
    4/168 M77

    4/168 M7b
    5/168 M7b1
    2/168 M7c
    2/168 M7c3c
    3/168 M8a2
    2/168 M9b
    3/168 N21
    1/168 N9a1
    6/168 N9a4
    5/168 N9a6
    2/168 R11
    7/168 R22
    2/168 R23
    5/168 R9b
    3/168 R9c

    I cannot find any information regarding M73 or M77. However, most of the M mtDNAs among the Chams in Binh Thuan Province appear to be indigenous to Southeast Asia. N21 and R22 also appear to be ancient Southeast Asian lineages. R23 may also be indigenous to Southeast Asia; it seems to have been found in Bali and Sumba in central Indonesia besides these Chams. R11 has been found more often among East Asians than among Southeast Asians, and it also has been found in Rajasthan, but I suppose it is more likely that the matrilineal ancestors of these two Chams are either indigenous or have migrated southward from China or Japan than that they have migrated eastward from India.

    As far as I can see, this sample of Chams from Binh Thuan Province, Vietnam does not exhibit any clear South Asian influence in its mtDNA. This contrasts starkly with the significant (18.6% to 32.2%) South Asian influence that is apparent in the Y-DNA of the male subset of the same sample.

    The autosomal contribution from South Asia to Southeast Asia appears modest in comparison to linguistic, cultural, and Y-DNA contribution from South Asia to Southeast Asia. The Indianization of Southeast Asian societies in ancient times may have been effected by male migrants with very little or no participation of Indian females.


