
Genetic distances across Eurasia

I feel that, for whatever reason, over the past few years many readers of this weblog have started to exhibit weak intuitions about the magnitude of between-population differences. Two suggestions for why this might occur:

* First, the proliferation of PCA plots with individuals can make it hard to discern averages

* Second, model-based admixture plots don’t explicitly quantify the differences between the different clusters

To get a better sense of between-group differences, I decided to take a step back and look at Fst. Fst basically looks at all the genetic variance across the groups and quantifies the proportion that can be attributed to differences between groups.
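As a rough illustration of that definition, here is a minimal Python sketch of Wright's Fst for two populations, computed from per-SNP allele frequencies as (H_T - H_S) / H_T. The function name and the toy frequencies are made up for illustration. The two populations are weighted equally and there is no sample-size correction, so this is not the estimator behind the numbers in this post and will not reproduce them exactly.

```python
def wright_fst(freqs_a, freqs_b):
    """Fst for two equally weighted populations from per-SNP allele frequencies.

    Uses the ratio of sums across SNPs: sum(H_T - H_S) / sum(H_T).
    """
    between = 0.0  # accumulated H_T - H_S (variance attributable to group differences)
    total = 0.0    # accumulated H_T (total expected heterozygosity)
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        between += h_total - h_within
        total += h_total
    return between / total if total > 0 else 0.0


# Toy allele frequencies (illustrative only, not real data).
similar = wright_fst([0.40, 0.55, 0.20], [0.42, 0.50, 0.23])
divergent = wright_fst([0.40, 0.55, 0.20], [0.80, 0.10, 0.60])
print(f"similar pair:   {similar:.4f}")
print(f"divergent pair: {divergent:.4f}")
```

Two populations with identical frequencies give 0, and two populations fixed for different alleles at every SNP give 1; real human groups fall near the low end of that range.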

The plot at the top of this post is from an Fst matrix I generated with Plink (I wrote a script to do the pairwise comparisons). To be clear, I did some PCA pruning of the populations (e.g., with both Cambodians and Filipinos I made them more distinct than they would otherwise be). The goal was to give people a sense of genetic distances within regions and between them.
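The script itself is not shown here, so the following is only a sketch of what a pairwise loop could look like. `compute_fst` is a hypothetical callable standing in for whatever does the per-pair calculation (for example, a wrapper around a Plink run). In this toy version it is replaced by a lookup of two values quoted in this post, with NaN for any pair that is not quoted.

```python
from itertools import combinations
import math


def pairwise_matrix(populations, compute_fst):
    """Build a symmetric {pop: {pop: value}} table with one call per unordered pair."""
    matrix = {a: {b: 0.0 for b in populations} for a in populations}
    for a, b in combinations(populations, 2):
        value = compute_fst(a, b)
        matrix[a][b] = value
        matrix[b][a] = value
    return matrix


# Stand-in for the real per-pair computation: values quoted in this post.
QUOTED = {
    frozenset(("Southern Chinese", "Northern Chinese")): 0.0033,
    frozenset(("Southern Chinese", "Vietnamese")): 0.0034,
}


def lookup_fst(a, b):
    # Pairs not quoted in the post are left as NaN rather than guessed.
    return QUOTED.get(frozenset((a, b)), math.nan)


pops = ["Southern Chinese", "Northern Chinese", "Vietnamese"]
matrix = pairwise_matrix(pops, lookup_fst)
for pop in pops:
    print(pop, [matrix[pop][other] for other in pops])
```

Computing each unordered pair once and mirroring it keeps the matrix symmetric, which is what downstream tools that take a distance matrix generally expect.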

I also generated a PCA plot and a Treemix plot, for the sake of comparison.

It’s also useful to look at a few group comparisons and judge them in a global context.

| Population 1 | Population 2 | Fst |
|---|---|---|
| Tamil | Tamil Scheduled Caste | 0.0016 |
| Tamil | South Indian Brahmin | 0.0031 |
| Tamil | Uttar Pradesh Brahmin | 0.0041 |
| Southern Chinese | Northern Chinese | 0.0033 |
| Southern Chinese | Vietnamese | 0.0034 |
| Southern Chinese | Korea | 0.0045 |
| Southern Chinese | Japanese | 0.0087 |
| Southern Chinese | Tamil | 0.0711 |
| Southern Chinese | Polish | 0.1141 |
| Gujurati_Patel | Uttar Pradesh Brahmin | 0.0065 |
| GreatBritain | Uttar Pradesh Brahmin | 0.0264 |
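To put these numbers in a global context, it can help to express each comparison as a multiple of a small within-region baseline. The sketch below does that arithmetic on values copied from the list above, using Southern Chinese vs. Northern Chinese as an arbitrarily chosen baseline. The variable names are illustrative.

```python
# Fst values copied from the comparisons listed in this post.
fst = {
    ("Tamil", "Tamil Scheduled Caste"): 0.0016,
    ("Southern Chinese", "Northern Chinese"): 0.0033,
    ("Southern Chinese", "Japanese"): 0.0087,
    ("GreatBritain", "Uttar Pradesh Brahmin"): 0.0264,
    ("Southern Chinese", "Tamil"): 0.0711,
    ("Southern Chinese", "Polish"): 0.1141,
}

baseline_pair = ("Southern Chinese", "Northern Chinese")
baseline = fst[baseline_pair]

# Express every comparison as a multiple of the baseline, smallest first.
ratios = {pair: value / baseline for pair, value in fst.items()}
for (a, b), ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{a} vs. {b}: {ratio:.1f}x baseline")
```

On this scale the within-region comparisons stay in the low single digits, while the comparisons that span Eurasia are more than an order of magnitude larger.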

The non-Brahmin and non-Dalit samples in the 1000 Genomes are not partitioned much by geography. The Tamil vs. Telugu difference is smaller than that between the British and Irish. Within Tamil Nadu, though, Brahmins are nearly as different from typical Tamils as Poles are from the English (most of the British sample is English). The biggest differences in Europe are between Sicilians and Northern European groups, which are similar in degree to those between South Indians and Pakistanis. The South Chinese sample is nearly as close to Vietnamese as it is to a North Chinese group, while the difference between Koreans and Chinese is relatively small when compared to the variance you see in South Asia and Europe.

Note: Drift tends to inflate Fst.
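One way to see why is a toy Wright-Fisher simulation, sketched below under simplifying assumptions: a single population drifts away from an unchanged source, with no migration, mutation, or selection, and Fst is the simple uncorrected (H_T - H_S) / H_T ratio. The population sizes, number of loci, and seed are arbitrary choices for illustration.

```python
import random


def drift(freq, pop_size, generations, rng):
    """Wright-Fisher drift: resample 2N allele copies each generation."""
    copies = 2 * pop_size
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
    return freq


def fst_from_freqs(freqs_a, freqs_b):
    """Ratio-of-sums Fst, (H_T - H_S) / H_T, for two equally weighted groups."""
    between = total = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = p_a * (1 - p_a) + p_b * (1 - p_b)
        between += h_total - h_within
        total += h_total
    return between / total if total > 0 else 0.0


def simulated_fst(pop_size, generations=20, n_loci=100, seed=1):
    """Fst between a drifting population and its unchanged source."""
    rng = random.Random(seed)
    source = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    drifted = [drift(p, pop_size, generations, rng) for p in source]
    return fst_from_freqs(source, drifted)


small = simulated_fst(pop_size=25)
large = simulated_fst(pop_size=500)
print(f"N=25:  Fst = {small:.4f}")
print(f"N=500: Fst = {large:.4f}")
```

The small population ends up with a noticeably higher Fst against the same source after the same number of generations, even though nothing but random sampling separates the two runs. A small or bottlenecked group can therefore look more distant than its history of separation alone would suggest.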

11 thoughts on “Genetic distances across Eurasia”

  1. I was wondering, why are some of these Punjabis closer to South Indians than to other Pakistanis? Are they recent imports or are they of low caste stock?

    Re-post from earlier, still relevant here:


    How come these Punjabi samples have a great deal of Onge DNA, and reduced Iran Neolithic and Steppe ancestry? I thought Punjabis would be genetically similar to Sindhis. I myself am a Punjabi Gujjar and in almost all of my results I got no more than 20% Onge+Han ancestry. My Steppe proportions were between 36 and 40% and my Iran Neolithic proportions ranged from 40 to 46%. David ran these for me.

    Thanks Razib

  2. “I was wondering, why are some of these Punjabis closer to South Indians than to other Pakistanis? Are they recent imports or are they of low caste stock?”

    yeah, i don’t know. it’s the 1000 genomes data. lots of structure in punjab.

  3. “I was wondering, why are some of these Punjabis closer to South Indians than to other Pakistanis? Are they recent imports or are they of low caste stock?”

    No offence, but I think labeling a certain group as “lower caste” or an outlier just by looking at scores seems unfair, especially for South Asian groups.
    A so-called lower caste Punjabi can’t be so different from other Punjabis.

  4. There are some slight differences in the trees when using the “User Supplied Distance” method to build a neighbour joining tree in PAST3 – That just takes the matrix as ground truth and builds the tree around it. Stuff at the margins like Korea clading with Japan and Han_S with Vietnamese. Nothing too major, may be worth a look.

    (Treating the input matrix as the distances as opposed to treating each column of the matrix as a separate euclidean distance variable, which would be and mirrors the first tree in the post.

    My experience is that has a slightly varying outcome where populations that have more similar levels of global drift and relatedness to outgroups will join together more, as opposed to placing the best fit of populations that are directly close as measured by their pairwise Fst).

    Principal coordinates analysis of this matrix in PAST3 nicely recapitulates your PCA’s first two dimensions: (Although of course you get a less solid sense of the within population variation.).

  5. @Razib Thanks.

    @Kev They are though, they’re very different. For instance, they have twice as much Onge ancestry as I do, a decent amount less Steppe, and a great deal less Neolithic Iranian. You would be surprised, but they are as far away from me as Tamils are. Not that it matters or anything.

  6. @Kev

    There actually happens to be a massive difference between “low caste” Punjabis and “upper caste” or biraderi /tribal Punjabis. It’s actually so large that a Jatt or Gujjar Punjabi shows a smaller population distance to a UP or even Tamil Brahmin than to that of a “low caste” Punjabi. On Razib’s Genotype project, some Punjabis (both Indian and Pakistani) are scoring between 47-56% Tamil, 30%+ Iranian, 10%+ Lithuanian while some academic samples from PJL (Punjabi Lahore) on the representative population tab are scoring upwards of 80-85%+ Tamil and not much else.

  7. @Sapporo Thanks for clarifying this point for him. Great to see you on here. It’s been a while since we last conversed. I think a distinction must be made between us biraderi/tribal/upper caste (whatever you want to call it) Punjabis and lower caste/dalit/chammar/massali Punjabis. I do not think that many people here know this, but Punjabis are not all the same; there are a lot of genetic differences involved in the equation.


  8. @Xehanort @Sapporo
    Yeah, I’m aware of that, but I still don’t think those scores should be taken seriously. I mean, they help us learn about our ancestral components and their possible percentages.
    A lower caste Punjabi might be low ANI, but his ANI and ASI definitely did not come from Tamil Nadu; rather, it is the same local ANI that high caste Jatts or Gujjars have. Similarly, a Tamil Brahmin might cluster with North Indians, but their male ancestors had taken local wives iirc, so half of their ANI/ASI is local. That’s why the majority of Tam Brahms have similar facial features to mid caste Tamils, and low caste Punjabis look the same as other Punjabis (slightly darker and wider nosed maybe?) and not Tamil.
    Looking at those scores, even Indo-Aryan groups like mid caste Gujaratis and Bengalis score very high Tamil, possibly because their ANI/ASI ratio is similar. Also, a lot of the Iranian (Neolithic) component must be captured by the Tamil component, plus we don’t even know the exact ASI/Onge percentage these components have. For example, both a mid caste Telugu and a Dalit Telugu would score 90% or more Tamil; the mid caste Telugu could be 50-60% ASI, but the Dalit Telugu must be even more than that. However, ancient DNA will resolve the puzzle, if we are lucky enough to find it; at the moment I’m hoping for the Rakhigarhi results to provide us with some useful information. Btw, I’m very new to the subject; all I said is just based on my very limited knowledge of genetics, and I apologize in advance if something is wrong.

  9. @Kev You did not say anything wrong per se, and I somewhat agree with you, though keep in mind that I never claimed that they were foreign imports and I highly doubt they were. I asked Razib, but knew that they’re just natives. Anyway, I have seen most of them and they do look surprisingly Tamil and not North Indian/Pakistani. There is not really a difference between lower caste North Indians and South Indians, with the exception of South Indian Brahmins. Also, South Indian Brahmins do look fairly distinct from South Indian lower castes. There is a huge gap between us and them. It’s not a racial issue or anything, I am just making a point here.

    We have the ancient data, it just needs to be released, and I hope that it will be sooner rather than later.

  10. The thing is that these differences aren’t limited to Razib’s work. They are consistent with any admixture calculator. So, while I disagree about taking the scores seriously, I do agree that we need to look at them within context. Firstly, the ANI and ASI are composites. ANI is made up of Iran N/CHG, Steppe related admixture, other more recent “West Eurasian like admixture,” and possibly ANE, while we can’t really break down ASI yet without ancient South Asian genomes. So, in essence, I agree it’s likely that the ANI or ASI a Punjabi “dalit” or “low caste” scores is similar to that of a Punjabi Jatt, Gujjar, Rajput, Arain, Khatri, Kamboj, Tarkhan, etc. They just score them in very different proportions, and dalits essentially lack any Steppe related admixture.

    Finally, I also agree the Tamil component here is very mixed. The HGDP Pathan are scoring mostly upwards of 35% and even the Brahui are scoring around 30%.

    On a separate note, you don’t need to apologize for anything. You’ve made some solid points. I just wanted to clarify that the data does indicate that in many cases there are actually fairly distinct admixture profiles between a so-called “upper caste” and “lower caste.”
