Variation with the 1000 Genomes data set in China


I have mentioned before that the 1000 Genomes Chinese are heterogeneous. Many of the ones sampled in Beijing are North Chinese. But there is structure within the South Chinese samples as well. The PCA above shows it. I’ve pruned some of the data for clarity (it’s probably really a cline, with cut-offs and breaks happening because of variation in population density).

Nothing surprising in the Fst matrix. The two South Chinese groups are close to each other, while the North Chinese are shifted toward the Koreans, who are shifted toward the Japanese.
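For readers who want to see what goes into a pairwise Fst number, here is a minimal sketch. It assumes Hudson's estimator computed as a ratio of averages over SNPs, which is a common choice but not necessarily the one used for the matrix in this post. The function name, the toy allele frequencies, and the sample sizes are all made up for illustration.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst between two populations, as a ratio of averages.

    p1, p2: per-SNP allele frequencies in each population (same SNP order).
    n1, n2: number of sampled chromosomes (2 x diploid individuals).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must cover the same SNPs")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, minus a correction for sampling noise.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Hypothetical frequencies at three SNPs (not real 1000 Genomes data).
south_a = [0.30, 0.55, 0.80]
south_b = [0.32, 0.50, 0.78]
north = [0.45, 0.40, 0.65]

fst_south_pair = hudson_fst(south_a, south_b, 200, 200)
fst_south_north = hudson_fst(south_a, north, 200, 200)
```

With so few SNPs the estimate for very similar populations can come out slightly negative because of the sampling correction; with genome-wide data the ratio of averages settles down.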

Admixture analysis shows that the two South Chinese groups can be modeled as a mix of North Chinese and the Dai people of southern China, who are ancestral to the Tai people of Southeast Asia. The “South China 2” cluster is somewhat more Dai than the “South China” cluster proper.
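To make "modeled as a mix" concrete, here is a rough sketch of a two-source mixture fit. It is not the algorithm admixture software runs (that fits individual genotypes by maximum likelihood); it just finds the single proportion that best reproduces a target population's allele frequencies as a blend of two source populations, by least squares. The function name and all frequencies below are hypothetical.

```python
def mixture_proportion(target, source1, source2):
    """Least-squares fraction of source1 in target, with the rest from source2.

    Models target[i] ~ alpha * source1[i] + (1 - alpha) * source2[i]
    and returns alpha clipped to the interval [0, 1].
    """
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("all frequency lists must cover the same SNPs")
    num = 0.0
    den = 0.0
    for t, s1, s2 in zip(target, source1, source2):
        num += (t - s2) * (s1 - s2)
        den += (s1 - s2) ** 2
    if den == 0.0:
        raise ValueError("sources are identical; proportion is undefined")
    return min(1.0, max(0.0, num / den))


# Hypothetical allele frequencies at four SNPs (not real data).
north_freqs = [0.10, 0.50, 0.90, 0.30]
dai_freqs = [0.40, 0.20, 0.60, 0.80]

# Build a toy "South China" population that is 70% North Chinese, 30% Dai.
south_freqs = [0.7 * n + 0.3 * d for n, d in zip(north_freqs, dai_freqs)]

alpha = mixture_proportion(south_freqs, north_freqs, dai_freqs)
```

On the toy data the fit recovers the 70% North Chinese fraction that was built in; real data adds drift and sampling noise, so the recovered fraction would only be approximate.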

The Miao/Hmong samples from the HGDP are very similar to the South China cluster in admixture analysis (and less Dai than the South China 2 cluster). This is not surprising, as the Miao/Hmong are relatively recent migrants into Southeast Asia from China.

What does Treemix say? Basically, the two South Chinese clusters seem to differ mainly in their Dai proportions (as admixture would imply). They could be on the same cline, and the perception of structure might be an artifact.

6 thoughts on “Variation with the 1000 Genomes data set in China”

  1. How does the Asian intra-regional variation compare to European intra-regional variation? In particular, given recent history, how similar are the Han Chinese/Korean/Japanese genomes vs. say UK/Germany/France/Italy?

  2. Do SouthChina and SouthChina2 correspond to the samples from Hunan and Fujian provinces respectively? I would assume more of the Hunan samples cluster with the Hmong/Miao, while more of the Fujian samples cluster with the Dai.

  3. Do SouthChina and SouthChina2 correspond to the samples from Hunan and Fujian provinces respectively? I would assume more of the Hunan samples cluster with the Hmong/Miao, while more of the Fujian samples cluster with the Dai.

    i don’t know.

  4. @James, roughly in terms of Fst differentiation (in very Anglocentric and Hanocentric terms), if you want (very approximate) analogies it looks like

    Han_N:Han_S2 = Great_Britain:Spanish; Han_N:Han_S1 = Great_Britain:French; Han_N:Mongol = Great_Britain:Irish; Han_N:Korea = Great_Britain:Hungarian; Han_N:Vietnamese = Great_Britain:Sicily; Han_N:Japanese = Great_Britain:Finnish/Russian; Han_N:Cambodian = Great_Britain:Pathan.

    These are approximate, so some will depend on where you sample in a country, e.g. South and East Germans or North Swedes will probably show more differentiation from the UK than populations from closer parts of their countries, and so on.

    The actual combinations of genetic drift and admixture that make up the population differentiation are different, as is how this relates to shared drift and similarity (populations can have the same Fst at a wide range of shared drift depending on intra-population diversity). But the basic Fst figures are approximately analogous.

  5. Can you provide the full Fst distance matrix above? I downloaded the Fst matrix you posted, but it is the table describing the genetic distance between Europeans and Yakut.
    Thanks so much.
