
Punjabi genetic variation in 1000 Genomes: Hindu caste in the Land of the Pure?


In the 1000 Genomes, there is a Punjabi dataset. Here is the description:

These cell lines and DNA samples were prepared from blood samples collected in Lahore, Pakistan. The samples are from a mix of parent-adult child trios and unrelated individuals who identified themselves and their parents as Punjabi.

A few years ago I did an analysis of the population structure in the 1000 Genomes dataset. In the Chinese data, there seemed to be some curious structure (there were two clusters of South Chinese). But the biggest issues predictably were in the South Asians. To give concrete examples, there were a few Brahmins in the Telugu data. A subset of Tamils and Telugus were highly ASI shifted. The Gujarati were highly heterogeneous, and one subcluster was almost certainly Patels (the samples were collected in Houston). The ASI shifted groups were almost certainly Scheduled Castes (Dalits) because I could see that they clustered with those samples from the Estonian Biocentre dataset.

There was something curious about the samples from Pakistan and Bangladesh. Aside from a small number of individuals who cluster with Scheduled Castes (and whose samples were collected at the same time, judging by their IDs), the Bangladeshi sample didn’t have much South Asian style structure. That is, there wasn’t a cline or lots of substructure within the ethnicity.

As noted by some commenters, the Punjabi samples were very different. Like the Gujarati samples, there was a huge variance along the ANI-ASI cline. To me, this was somewhat surprising. To make the 1000 Genomes more useful I used PCA and divided both Gujaratis and Punjabis into groups based on their position on the ANI-ASI cline, so that ANI_1 is the subpopulation with the most ANI and ANI_4 the one with the least.
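The grouping step can be sketched roughly as below. This is a minimal illustration, not the method actually used for the post: it assumes PCA has already been run, that each sample has a single score along the ANI-ASI axis oriented so that higher means more ANI, and that the groups are equal-sized rank bins. The post does not say how the cutoffs were chosen, and the function name `split_by_cline` and the sample IDs are hypothetical.

```python
def split_by_cline(scores, n_groups=4, prefix="ANI"):
    """Map each sample ID to a label such as ANI_1 .. ANI_4.

    scores: dict of sample ID -> position on the cline (assumed oriented so
            that a higher value means more ANI; a PC sign is arbitrary, so
            flip it first if needed).
    Group 1 holds the highest-scoring samples, group n_groups the lowest.
    Equal-sized rank bins are an assumption made for this sketch.
    """
    # Rank samples from most ANI-shifted to least.
    ranked = sorted(scores, key=scores.get, reverse=True)
    labels = {}
    for rank, sample in enumerate(ranked):
        # Integer arithmetic spreads the ranks evenly over the groups.
        group = rank * n_groups // len(ranked) + 1
        labels[sample] = f"{prefix}_{group}"
    return labels


# Hypothetical cline positions for eight samples.
example_scores = {
    "S1": 0.9, "S2": 0.1, "S3": 0.5, "S4": 0.3,
    "S5": 0.7, "S6": 0.2, "S7": 0.8, "S8": 0.4,
}
example_labels = split_by_cline(example_scores)
```

Rank bins guarantee that no group ends up empty, but they will cut through a tight cluster if one straddles a boundary. Cutoffs placed at visible gaps on the PCA plot would be an equally reasonable choice.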

Using Treemix produced some weird results. As you can see above, Punjabi_ANI_1 looks like an Iranian population with gene flow from Punjabi_ANI_3. Punjabi_ANI_2 looks like a North Indian population with Iranian gene flow (so it is more ASI). Punjabi_ANI_3 is less ANI shifted than Uttar Pradesh Brahmins, but more than Uttar Pradesh Kshatriya. Finally, Punjabi_ANI_4 is actually very similar to Punjabi_ANI_2, except it has gene flow from a Dalit-like population.

With the South Asian Genotype Project I have a few Punjabi samples. All of them are within Punjabi_ANI_1.

I don’t know what’s going on here. Is this really caste-like structure in Punjab? Or are we seeing lots of admixture among people who are called “Punjabi” today? For example, the gene flow edges suggest lots of mixing between quite South Asian types of groups and an Iranian sort. Perhaps this is the absorption of Pathans into South Asian groups? Could it be Muhajir people who mixed with local Punjabis and identified as such?

I was curious to see if I could find something similar in relation to the three Jatts. As you can see with Treemix, no. Jatts are just very ANI-shifted. I added Lithuanians and Georgians, and you can see that Uttar Pradesh Brahmins get gene flow from a Lithuanian shifted group, while South Indian Brahmins have a more Georgian gene flow. I suspect this is just an artifact of the fact that South Indian Brahmins have a lot of admixture from non-Brahmin South Indians, who are more Georgian than Lithuanian (Iran_N as opposed to Yamnaya).

Finally, going back to the Bengali (Bangladeshi) vs. Punjabi contrast, it is really interesting. If Punjab has such deep caste-like structures it really goes to show how within South Asia caste is a very very powerful institution, and ~1,000 years of Muslim rule and in western Punjab a majority Muslim population did not break down the institution. In contrast, in Bangladesh, there doesn’t seem to be much caste structure. I am routinely the most East Asian shifted Bengali in datasets, but my family is also from the eastern edge of eastern Bengal. Why the difference?

In The Rise of Islam and the Bengal Frontier the author posits that the Islamicization of eastern Bengal was to a great extent a function of the opening up of lands for cultivation under the supervision of Muslim elites, under the rule of Afghans and later Mughals. This would explain the lack of caste structure because presumably, caste structure would be difficult to maintain in a frontier landscape, where the cultural elite does not promote or accept caste (though the elite West Asian Muslims were racially exclusive, they were also a very small minority).

In contrast, the Punjab has long been settled by Indo-Aryan peoples, and despite its long history of Islam, it was not recently a frontier society.

Anyway, that’s all I got to say for that. I’m sure readers will have more insight on this pattern than I do….

12 thoughts on “Punjabi genetic variation in 1000 Genomes: Hindu caste in the Land of the Pure?”

  1. Lol! You actually went to the trouble of writing an entire blog post to answer my question? Why thanks, mate. I appreciate it! This was a great read for me. Can you add me to the project and run my data? What’s your email? Thanks bro!

  2. Back in 2009 a visiting Sikh Guru was killed at a Sikh temple in Austria by a couple of other Sikhs, as the Guru was giving a sermon. It was caste related violence.

    Shortly after the killings, riots erupted in Punjab, with Dalits and low caste Sikhs protesting the killing of their leader. What I noticed was that a very large (majority) of the protesters looked more like South Indians and tribal people than like the Sikhs who brawled at the Golden Temple in 2014. I noticed many of the police looked different from the protesters, looking more like Pathans.

    I wonder if the reason for the huge differences in Punjabi populations mentioned here is because of caste differences that have not been identified in the 1000 genome samples?

    Here are some articles on the above mentioned incidents.

    http://blogs.reuters.com/india/2009/05/26/is-caste-behind-the-killing-in-vienna-and-riots-in-punjab/

    http://www.sandiegouniontribune.com/sdut-eu-austria-temple-shooting-052509-2009may25-story.html
    http://content.time.com/time/world/article/0,8599,1900882,00.html

  3. @Nathan
    Interesting, Those lower caste Punjabi sikhs in your links look very dalit. I didnt know there exist people with such traits in the NW.

    @Xehanort
    Now i understand why you said lower caste Punjabis look like Tamils.Though not all Tamils look like that.

    @Razib
    Thanks for the article. It’s really fascinating that almost all south Asian groups have deep caste structures, including Muslim Punjabis, but Bangladeshis didn’t had any caste structure. Initially I thought there would be some west Asian mix on them, I know few Bangladeshis who looked very similar to Pathans, but you already said they don’t have any west Asian ancestry and also they don’t have any caste-like structure. They scored little to none Iranian, but very high Tamil, are they formed by any lower caste groups? Though they also have some Lithuanian and East Asian.

  4. Thanks for the article. It’s really fascinating that almost all south Asian groups have deep caste structures, including Muslim Punjabis, but Bangladeshis didn’t had any caste structure. Initially I thought there would be some west Asian mix on them, I know few Bangladeshis who looked very similar to Pathans, but you already said they don’t have any west Asian ancestry and also they don’t have any caste-like structure. They scored little to none Iranian, but very high Tamil, are they formed by any lower caste groups? Though they also have some Lithuanian and East Asian.

    re: bengalis. the methods bake-into-the-cake the west asian and indo-aryan. there is some there.

    1) to a first approx east bengalis look like a dominant non-brahmin non-dalit south indian base (think reddy) + 10-20% east asian + 5-10% “indo-aryan”? i’m confident about the east asian part, because it seems to be distinct and have come in a pulse admixture.

    2) in the 1K sample there is someone who is probably half bengali brahmin. i removed them. second, there are 4 or 5 individuals who cluster with dalits. i don’t know if they are bengali dalits, or descendants of migrants from elsewhere, or what. but

    a) they aren’t east asian shifted
    b) there isn’t a cline, these individuals cluster together well away from other bangladesh samples, and they were collected together

    one hypothesis that i think is likely is that they are hindu, because i don’t see how they would be genetically so distinct with no intermarriage.

    the bangladesh cluster looks like a european one, except for the scheduled castes, in topology. the orb-like scatter = not that much internal structure. if you remove dalits and brahmins from south indians they aren’t too different, aside from higher inbreeding/endogamy rates, but those two groups are like 30% of the population, so there’s a lot of structure.

  5. @Razib
    Thanks for the explaination. 70-80% midcaste S.Indian/Reddy + 5-10% Indo Aryan would shift them somewhere in the north-central India(geographically they are in the Northern part of the subcontinent), also most of high caste Brahmin or NW Indians dont score more than 12% Lithuanian. I still dont understand how some Bangladeshis look like Pathans, maybe due to Iran_Neolithic present in tamil component and ASI % lower than S.Indians? I guess The Iranian component in your calculator is the same Iranian component captured by Tamil component.
    I agree, The Bangladeshi_SC could be Bengali Dalit hindu, The Bengali brahmin sample in your project seems slightly different to Bangladeshis, it also got very high tamil, the difference is due to Iranian-east Asian %, but the Bengali Kayastha sample look almost similar as it has similar east Asian( i’ve noticed even Bengali Vaidya samples in Harappadna project have similar east Asian % as east Bengalis, are they from West Bengal?).

  6. kev, looks like there is a east asian admix cline for non-brahmins. west bengal ppl have less than east bengal. i have more than most east bengal, and chittagong ppl seem to have the most.

    don’t take the % estimates too literally. i think UP brahmins are much more than 12% indo-aryan for sure. lithuanians imperfect proxy (more EEF probably?).

    it looks to me that as in south india bengal had three groups

    brahmins
    non-brahmins & non-dalits/tribals
    dalits/tribals

    kayastha seem to be like other bengalis, not brahmins. (my maternal grandfather’s family was originally kayastha fwiw)

  7. There are lot of hints at how Brahmins came to be in connection with Indo-Aryan expansion. Is there much well developed theory on how, when and where Dalits came to be a distinct population in South Asia?

  8. Looking at the Fst trees and the treemix, it looks like all the Punjabi_ANI subgroups are pretty flat to the tree, other than ANI_4, so looks like little drift unique to any of the other 3.

    The Gujarati_Patel group also seemed to show the same kind of drift effect (relatively long, unique branch) as ANI_4, absent in ANI_1-3. Not much of anything like that in Tamil or Telugu (for what I presume are other fairly large / general populations?).

    If they are Caste-like groups, then I guess they are also quite large and general subpopulations? (Rather than small, specific endogamous “jatis”? I may not be using the right terms here).

    (As well, could ask if they show much RoH / IBD?)

  9. (As well, could ask if they show much RoH / IBD?)

    they are inbred as you’d expect from a pop with lots of cousin marriage. though not sure about IBD within these clusters. i could run finestructure on these data i guess….

    There are lot of hints at how Brahmins came to be in connection with Indo-Aryan expansion. Is there much well developed theory on how, when and where Dalits came to be a distinct population in South Asia?

    no theory. i’m frankly surprised that chamars from UP are so like dalits from south.

  10. Ah, like in terms of the patterns from (“Runs of homozygosity” – Ceballos), Class 1 populations, large population consanguinous patterns then (fairly low NROH and SROH, relatively high ratio of SROH to NROH, high SD on SROH)? Not much drift from restricted population size over time.

  11. Fascinating discussion on Bengalis.

    I expected the East Asian cline to be more pronounced in the peripheries of Bangladesh ie Sylhet as well as Chittagong, Rangpur etc, but I’ve noticed that even samples from around Dhaka are fairly similar in the amount of East Asian, including the BEB samples. Is this what you’re referring to as the pulse admixture? Do you have many West Bengali samples to compare with?

    I’ve always assumed the founding population of Bengali Muslims was a combination of Hindu (or Buddhist at the time) castes including Brahmins, Kayasthas, mid castes as well as low castes. Certainly there is a good mix of R1a1a as well as H and some austroasiatic markers. And as you say, without maintaining a rigid caste or even class system as elsewhere in the subcontinent, the population has become more homogenised.

    My one gripe has been that most academic plots and analyses involving Bengalis only seem to use the BEB samples as the reference point. With the extra Bengali samples you’ve received, how do they compare on a PCA with the BEB samples?

    There appears to be more variance on the admixture results that you’ve run from external results, and whilst I appreciate there does not appear to be much structure given the orb-like scatter on PCA plotting, but are these BEB truly representative of a population like Bangladesh?

    Are we just not capturing the (limited) variance due to sample bias?

  12. Is there a way to check how much “chopped up” the Iran_N and Yamnaya in North vs South Brahmins? Do any of the tools look at this metric?
    Don’t know if I am making sense… even if two genomes both have 25% total affinity to Iran_N, if one of them is in contiguous big blocks while the other in small bits all over different chromosomes. I am assuming the time scale of admixture in the latter is much more ancient.
    I am assuming the deeper the admixture is in time, the more chopped up it will be.
    My guess is that the admixture in North Indian brahmins from a steppe like population is more recent than in the south.

Comments are closed.