Variation within the 1000 Genomes data set in China


I have mentioned before that the 1000 Genomes Chinese are heterogeneous. Many of those sampled in Beijing are North Chinese. But there is also structure within the South Chinese samples, as the PCA above shows. I've pruned some of the data for clarity. The variation is probably really a cline, with the apparent cut-offs and breaks arising from variation in population density.
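For readers who want to see what goes into a plot like this, here is a minimal PCA sketch in NumPy. It is not the pipeline behind the figure above. It assumes a genotype matrix coded as 0/1/2 copies of an allele, with individuals in rows and SNPs in columns. Each SNP is centered and scaled by its binomial standard deviation, a normalization similar in spirit to EIGENSOFT's smartpca, and the scores come from a singular value decomposition. The function name `genotype_pca` and the synthetic "groups" are hypothetical.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Return PC scores (individuals x n_components) for a 0/1/2 genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0  # pooled allele frequency per SNP
    keep = (p > 0) & (p < 1)  # drop monomorphic SNPs, which would divide by zero
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its expected binomial standard deviation
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Usage with synthetic data: two hypothetical groups, 20 individuals each,
# differing in allele frequency (0.2 vs 0.8) at 500 SNPs.
rng = np.random.default_rng(0)
group_1 = rng.binomial(2, 0.2, size=(20, 500))
group_2 = rng.binomial(2, 0.8, size=(20, 500))
pcs = genotype_pca(np.vstack([group_1, group_2]))
```

The sign of each PC is arbitrary, so only the separation between groups along PC1 is meaningful. A real cline would appear as a smear along the axis rather than two tight clusters.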

Nothing surprising in the Fst matrix. The two South Chinese groups are close to each other, while the North Chinese are shifted toward the Koreans, who are shifted toward the Japanese.
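A pairwise Fst matrix of this kind can be computed from population allele frequencies. The sketch below uses Hudson's estimator, summing numerators and denominators across SNPs (a "ratio of averages"). It assumes the frequencies refer to the same SNPs in every population and that sample sizes are given in chromosomes (2 × individuals). The post does not say which estimator produced its matrix, so treat this as one reasonable choice. The names `hudson_fst` and `fst_matrix` are hypothetical.

```python
import numpy as np


def hudson_fst(p1, p2, n1, n2):
    """Hudson Fst between two populations, combined across SNPs as a ratio of averages.

    p1, p2: allele frequencies at the same SNPs; n1, n2: sampled chromosomes.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Squared frequency difference, corrected for finite-sample noise
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Between-population heterozygosity
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()


def fst_matrix(freqs, sizes):
    """Symmetric pairwise Fst matrix from dicts of {population: frequencies} and {population: n}."""
    names = list(freqs)
    m = np.zeros((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                m[i, j] = m[j, i] = hudson_fst(freqs[a], freqs[b], sizes[a], sizes[b])
    return names, m
```

With this estimator, two samples drawn from the same population give values scattered around zero (occasionally slightly negative), which is why small Fst differences between closely related groups should not be over-read.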

Admixture analysis shows that the two South Chinese groups can be modeled as a mix of North Chinese and the Dai people of southern China, who are ancestral to the Tai people of Southeast Asia. The “South China 2” cluster is somewhat more Dai than the “South China” cluster proper.
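The idea of modeling a population as a two-way mix can be illustrated with allele frequencies: if a target population is a blend of sources A and B in proportions α and 1 − α, then at each SNP p_target ≈ α·p_A + (1 − α)·p_B. The sketch below estimates α by least squares. This is a swapped-in, simplified technique. ADMIXTURE itself fits a likelihood model to individual genotypes with unlabeled ancestral clusters, and this version ignores drift in each population since the mixture, so its estimates would be biased on real data. The function name `two_way_mixture` and the toy frequencies are hypothetical.

```python
import numpy as np


def two_way_mixture(target, source_a, source_b):
    """Least-squares alpha in target ≈ alpha * source_a + (1 - alpha) * source_b."""
    t, a, b = (np.asarray(x, dtype=float) for x in (target, source_a, source_b))
    x = a - b  # how the two sources differ at each SNP
    y = t - b  # how the target differs from source B
    alpha = float(np.dot(x, y) / np.dot(x, x))
    return min(max(alpha, 0.0), 1.0)  # clip to a valid proportion


# Toy example: a target built as 70% source A ("North") and 30% source B ("Dai")
north = np.array([0.1, 0.5, 0.9, 0.3])
dai = np.array([0.6, 0.2, 0.4, 0.8])
target = 0.7 * north + 0.3 * dai
alpha_north = two_way_mixture(target, north, dai)
```

In this framing, a "more Dai" cluster is simply one whose estimated α for the North Chinese source is lower.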

The Miao/Hmong samples from the HGDP are very similar to the South China cluster in admixture analysis (and less Dai than the South China 2 cluster). This is not surprising, as the Miao/Hmong are relatively recent migrants into Southeast Asia from China.

What does TreeMix say? Basically, the two South Chinese clusters seem to differ mainly in their Dai proportions, as the admixture analysis implies. They could lie on the same cline, and the perception of discrete structure might be an artifact.