
ADMIXTURE vs. MDS, visualization is just visualization

Dienekes did another run of his data with K = 64. He posted a huge plot of the two largest dimensions of variation, along with a spreadsheet giving the coordinates of the Dodecad samples, so I found my own position pretty quickly. Before going to that, I thought I’d repost a comparison between myself, the HapMap Gujaratis, the North Kannadi sample, and the HGDP Uygurs. This is at K = 10 in ADMIXTURE from Dodecad.

OK, with that in mind, here’s the full MDS with the two largest components of genetic variation. I’ve added large labels. Also, click the image for a larger file so you can read the small labels.


One thing that jumps out at me is the tight clustering of very populous groups such as Europeans. The East Asian and Yoruba samples aren’t as representative of their macro-regions, so that makes some sense. But the Dodecad Ancestry Project has a lot of West Eurasian groups, so the affinity there is still striking. I am basically a touch off the “North Kannadi” cluster, a little toward the Uygurs. In the clustering that is the main focus of Dienekes’ post I also fall into a North Kannadi cluster. Interestingly, in Zack’s preliminary run with the South Asian data set I’m 71% with Nepalis, and 29% with part of the Singapore Indians (most of whom I assume are Tamil).

Note the close position of the Uygurs to the North Kannadi, despite the fact that geographically the Uygurs are much closer to Pakistani populations. It just goes to show you what happens when you throw a whole lot of genetic variation into the pot, and then focus on the two largest components of variance. The axis between Europe and East Asia is spanned by South Asians. But some South Asian groups, such as the North Kannadi sample, have an ancestry component somewhat more like East Asians than West Eurasians, so they get placed closer to East Asians on the two-dimensional plot. This is what Dienekes terms the “South Eurasian” element, which has been submerged almost everywhere by West and East Eurasian elements.

Here’s a close-up of the South Asian region of the plot. You can see how close the Uygurs are to the North Kannadi sample, and how close I am to the North Kannadi. But two of the North Kannadi samples are out of the cluster in the MDS. I assume they’re the individuals with a lot of the purple ancestral component, what Dienekes termed “West Asian.” The individual between the Gujarati and North Kannadi clusters is probably the one with the slight orange “East Asian” component. And that gives you insight into what’s going on with me. If you removed the orange component from my ancestry I’d probably be in the Gujarati cluster. I’m “pulled” toward the North Kannadi cluster in direct proportion to my East Asian ancestral component. The MDS plot isn’t “wrong”; it is visualizing the data correctly within the constraints imposed by our own abilities to process information intuitively. But without the ADMIXTURE plot you’d probably make the wrong inference about my population assignment. With that information the likely hypothesis would be that I’m from a liminal population which has interactions with East Asian groups (e.g., Nepali, Assamese, or Bengali).
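To make the “pulled in proportion” point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the three ancestral components, their allele frequencies, the admixture proportions, and the labels are hypothetical, not Dodecad values, and I am using plain classical (Torgerson) MDS on Euclidean distances between expected allele frequencies, which may well differ from the distance measure Dienekes actually used. Under those assumptions, an individual who is a Gujarati-like base plus a fraction of an East Asian reference lands exactly that fraction of the way along the line from the base to the East Asian reference.

```python
import numpy as np


def classical_mds(points, n_dims=2):
    """Classical (Torgerson) MDS on Euclidean distances between rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]  # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


rng = np.random.default_rng(0)
n_snps = 500

# Hypothetical allele frequencies for three invented ancestral components.
# Rows: "west", "south", "east". These are random numbers, not real data.
freqs = rng.uniform(0.05, 0.95, size=(3, n_snps))

# Invented admixture proportions (west, south, east) for five toy samples.
east_fraction = 0.15
base_q = np.array([0.6, 0.4, 0.0])  # a Gujarati-like base, purely illustrative
east_q = np.array([0.0, 0.0, 1.0])
labels = ["west_ref", "east_ref", "south_ref", "base", "admixed"]
q = np.array([
    [1.0, 0.0, 0.0],
    east_q,
    [0.3, 0.7, 0.0],
    base_q,
    (1 - east_fraction) * base_q + east_fraction * east_q,
])

# Expected allele frequencies per sample, then the 2D MDS coordinates.
expected = q @ freqs
coords = classical_mds(expected, n_dims=2)

for name, xy in zip(labels, coords):
    print(f"{name:10s} {xy[0]: .3f} {xy[1]: .3f}")

# Where the admixed sample should sit if the pull is proportional.
predicted = (1 - east_fraction) * coords[3] + east_fraction * coords[1]
print("admixed matches proportional pull:", np.allclose(coords[4], predicted))
```

The proportionality is exact here only because classical MDS on Euclidean distances is a linear projection and the toy has no sampling noise. With real genotypes and other distance measures I would expect the relationship to hold only approximately.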

Note: Removing the Africans from the sample, or visualizing different combinations of dimensions, would also certainly clear up the confusion in this case. But again, these sorts of steps require a human understanding of what the techniques are presenting to you.
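Here is a toy sketch in Python of why both of those steps can help. The coordinates are invented: one big axis separating an African outgroup from everyone else, a medium West-to-East Eurasian axis, and a small axis that is the only thing distinguishing two hypothetical South Asian groups. The group names and numbers are assumptions for illustration, not real data, and classical MDS stands in for whatever was actually run.

```python
import numpy as np


def classical_mds(points, n_dims):
    """Classical (Torgerson) MDS on Euclidean distances between rows of `points`."""
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    n = len(points)
    centering = np.eye(n) - np.ones((n, n)) / n
    eigvals, eigvecs = np.linalg.eigh(-0.5 * centering @ sq_dist @ centering)
    order = np.argsort(eigvals)[::-1][:n_dims]  # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))


def gap(coords, a, b):
    """Distance between samples a and b in the plotted coordinates."""
    return float(np.linalg.norm(coords[a] - coords[b]))


# Invented positions in a three-axis "genetic space" (not real data).
# Axes: outgroup vs. rest, West vs. East Eurasian, a small South Asian axis.
names = ["african", "european", "east_asian", "south_a", "south_b"]
points = np.array([
    [10.0, 0.0, 0.0],
    [0.0, -3.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, -1.0],
])

# Full sample: keep three dimensions so different pairs can be compared.
with_outgroup = classical_mds(points, n_dims=3)
gap_top2 = gap(with_outgroup[:, :2], 3, 4)     # dimensions 1 and 2
gap_dims23 = gap(with_outgroup[:, 1:3], 3, 4)  # dimensions 2 and 3

# Drop the outgroup (row 0) and redo the two-dimensional MDS.
without_outgroup = classical_mds(points[1:], n_dims=2)
gap_no_outgroup = gap(without_outgroup, 2, 3)  # south_a and south_b shift by one

print("south_a vs south_b, dims 1-2 with outgroup:", round(gap_top2, 6))
print("south_a vs south_b, dims 2-3 with outgroup:", round(gap_dims23, 6))
print("south_a vs south_b, dims 1-2 without outgroup:", round(gap_no_outgroup, 6))
```

In this toy the two South Asian groups sit on top of each other in the first two dimensions as long as the outgroup is present, and separate either when you plot dimensions two and three or when you drop the outgroup. Real data is messier, but the logic is the same: the two largest dimensions show whatever contrast dominates the sample you fed in.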
