Eurasia became a melting pot during the Holocene


One of the things you notice when you look at genome-wide data is that some peculiar populations seem to be shifted on PCA and other metrics in relation to exotic genetic affinities. For example, Sardinians, Japanese, and Taiwanese aborigines exhibit this pattern. When looking at Han Chinese data, many of the southern samples seem a bit further shifted away from West Eurasians than the northern Chinese. That is, almost all northern Chinese seem to have low levels of West Eurasian affinity. Some of the southern Chinese do not.

When you look at West Eurasian data, you see evidence of East Eurasian gene flow into parts of Eastern Europe. Among Lithuanians, it seems to be there. It’s old and well-mixed, so it doesn’t jump out at you. But it’s there. Even more striking is that many of the Muslim populations in the Near East seem to have some proportion of East Asian ancestry because of the Turkic expansions.

We know the reason for this ancestry in West Asia. The rise of the Turks in the Islamic world is historically attested (thank you al-Mu’tasim!). Similarly, the arrival of Tatars and Magyars in Eastern Europe is also recorded. In China, various Turkic and West Asian populations arrived after the fall of the Han dynasty in the northern half of the country. I’ve documented on this weblog strong evidence of Indian ancestry across Southeast Asia.

As more ancient DNA comes to light I think one phenomenon that will become more clear is that the cultural toolkit of humans over the last 10,000 years has allowed for more continuous, constant, and frequent long-distance gene flow. Pairwise Fst values crashed with the rise of agriculture and larger-scale polities. But the adoption of the horse and the emergence of agro-pastoralism also served as a reciprocal conveyor belt of genes across the two antipodes of Eurasia.

West Eurasians and East Eurasians still remain genetically distinct. But evidence from Japanese and Sardinians gives a clear indication that within the last few thousand years there have been substantial reciprocal gene flows.*

* I am aware that in some of the work in David Reich’s lab there is evidence of East Eurasian gene flow into Mesolithic hunter-gatherers in Europe.

The Belgians did not invent the Hutu and Tutsi ethnic groups, who have different origins

Since I resurrected the analysis of Tutsi genotypes last year I’ve been getting a fair number of emails and messages from people. The issue is that periodically someone, usually, but not always, a white male, will explain that “actually, Tutsis and Hutus aren’t real ethnic groups, and were invented by the Belgian colonialists….” Many people from this region of the world are privately very skeptical of this viewpoint (they tell me so, but don’t want to get into a huge public spat with all-knowing-white-gods). After all, they are from this region, and Hutus and Tutsis are often physically quite distinct. They simply do not buy the social constructionist narrative as explaining everything that they see with their own eyes.

But we’ve seen this before, haven’t we? “Well actually, the Lombards weren’t ethnically different from the Romans, they were a Germanized group of mercenaries who created an identity de novo.” Also, “well actually, ‘caste’ is an ancient Indian concept but modern caste-jati groups were reified by the British in the 19th-century….” (genetics tells us both assertions were wrong).

Historiography of the early 21st century will observe that many white semi-intellectuals took on the metaphorical role of Hamlet in world history: tortured and self-hating souls who put themselves at the center of every dramatic event. All roads lead back to Hamlet.

As it happens, I now have a single Hutu to compare the dozen or so Tutsis to.

On the PCA plot above you see the Hutu is near the Luhya and Bantu agriculturalists from Kenya. The Tutsis are shifted toward various Near Eastern populations. Nothing surprising.

PCA remains the swiss-army-knife to explore population structure


I put up a poll without context yesterday to gauge which methods people prefer when it comes to population genetic structure.* PCA came out on top with a plurality. More explicitly model-based methods, such as Structure/Admixture, come in right behind it. Curiously, the oldest method, pairwise Fst comparisons (greater Fst means more variance partitioned between the groups), and Treemix, the newest method, have lower proportions of adherence.
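To make the parenthetical about Fst concrete, here is a rough Python sketch of a pairwise Fst calculation. It is only an illustration: the choice of Hudson's estimator (combined across SNPs as a ratio of sums), the function name `hudson_fst`, and the input layout (rows are individuals, columns are SNPs, entries are 0/1/2 alternate-allele counts) are all my assumptions, not something the poll or any particular package specifies.

```python
import numpy as np

def hudson_fst(geno_a, geno_b):
    """Pairwise Fst between two pre-labeled groups (hypothetical helper).

    geno_a, geno_b: arrays of shape (individuals, snps) holding 0/1/2
    alternate-allele counts. Uses Hudson's estimator and combines SNPs
    as a ratio of sums. Assumes at least one SNP varies across the two
    groups, otherwise the denominator is zero.
    """
    geno_a = np.asarray(geno_a, dtype=float)
    geno_b = np.asarray(geno_b, dtype=float)
    n_a = 2 * geno_a.shape[0]  # chromosomes sampled in group A
    n_b = 2 * geno_b.shape[0]  # chromosomes sampled in group B
    p_a = geno_a.sum(axis=0) / n_a  # allele frequency per SNP in A
    p_b = geno_b.sum(axis=0) / n_b  # allele frequency per SNP in B
    # Squared frequency difference, corrected for finite sample size.
    num = (p_a - p_b) ** 2 - p_a * (1 - p_a) / (n_a - 1) - p_b * (1 - p_b) / (n_b - 1)
    # Probability that one allele drawn from A differs from one drawn from B.
    den = p_a * (1 - p_b) + p_b * (1 - p_a)
    return num.sum() / den.sum()

def simulate_group(freqs, n_individuals, rng):
    """Draw diploid genotypes given per-SNP allele frequencies (toy data)."""
    return rng.binomial(2, freqs, size=(n_individuals, len(freqs)))

rng = np.random.default_rng(1)
base = rng.uniform(0.1, 0.9, 500)
drifted = np.clip(base + rng.normal(0, 0.15, 500), 0.05, 0.95)

same = hudson_fst(simulate_group(base, 50, rng), simulate_group(base, 50, rng))
diff = hudson_fst(simulate_group(base, 50, rng), simulate_group(drifted, 50, rng))
print(f"same frequencies: {same:.4f}, drifted frequencies: {diff:.4f}")
```

Notice that the function takes two arrays that are already split by group. The labels have to come from somewhere before you can compute anything.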

Why is PCA so popular? Unlike Treemix or pairwise Fst you don’t have to label populations ahead of time. You just put the variation in there, and the individuals shake out by themselves. Pairwise Fst and Treemix both require you to stipulate which population individuals belong to a priori. This means you often end up using PCA or some other method to do a pre-analysis stage. Structure/Admixture model-based methods make you select the number of distinct populations you want to explore, and often assume an underlying model of pulse admixture between populations (Treemix does this too when you have an admixture edge).
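Here is a minimal sketch of what "put the variation in there" can look like in practice, using plain NumPy on toy data. The normalization (center each SNP at twice its allele frequency, scale by the binomial standard deviation) is one common convention that I am assuming here, and the names `genotype_pca` and `simulate_two_groups` are hypothetical helpers for illustration, not functions from any package.

```python
import numpy as np

def genotype_pca(geno, n_components=2):
    """PCA of a genotype matrix with no population labels (hypothetical helper).

    geno: array of shape (individuals, snps) with 0/1/2 allele counts.
    Returns per-individual coordinates on the top principal components.
    """
    geno = np.asarray(geno, dtype=float)
    p = geno.mean(axis=0) / 2.0   # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)      # drop SNPs with no variation in the sample
    g, p = geno[:, keep], p[keep]
    # Center each SNP and scale it by its expected binomial standard deviation.
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    # SVD of the normalized matrix; left singular vectors times singular
    # values give the principal component scores of the individuals.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def simulate_two_groups(n_per_group=30, n_snps=2000, seed=0):
    """Toy data: two groups whose allele frequencies drifted from a shared base."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.1, 0.9, n_snps)
    p1 = np.clip(base + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
    p2 = np.clip(base + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
    g1 = rng.binomial(2, p1, size=(n_per_group, n_snps))
    g2 = rng.binomial(2, p2, size=(n_per_group, n_snps))
    return np.vstack([g1, g2])    # first half is group 1, second half group 2

pcs = genotype_pca(simulate_two_groups())
print(pcs[:3])    # a few individuals from the first simulated group
print(pcs[-3:])   # a few individuals from the second simulated group
```

The function never sees which group an individual came from. In the toy data the two groups still end up on opposite sides of the first component, which is the "shake out by themselves" behavior.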

PCA is also better at smoking out structure than Structure/Admixture for the same number of markers, and it’s pretty fast as well. This is why the first thing I do when I get population genetic data where I want to explore structure is run a PCA and look for clusters and outliers. After this pre-analysis stage, I can move on to other methods.

* I stipulated “genotyped-based” methods to set aside some of the new-fangled techniques, which often assume phasing and analysis of haplotypes, such as Chromopainter or explicit local ancestry deconvolution (some local ancestry deconvolution does not require phased haplotypes, but the most popular do).