South Asian Genotype Project, Summer 2018 Update

I’ve put another update on the South Asian Genotype Project. Make sure to go to the “projectmembers v2” sheet. If you’ve contributed since March, check it out.

Again, if you are interested, send me a 23andMe, Ancestry, MyHeritage, or Family Tree DNA raw genotype file to contactgnxp -at-

In the subject please put:

  1. “South Asian Genotype Project”
  2. The state/province your family is from
  3. Ethnolinguistic group
  4. If applicable, caste

I decided to do some poking around with some of the higher-quality samples people have given me: 180,000 SNPs with a genotyping missing rate of almost zero. I also removed “relatives.” That means that a lot of Muslim groups from Pakistan had individuals dropping out. In the PCA above you can see 4 Burushos left! Not too many Pathans either.
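
The post doesn't name the software behind these filters or the PCA (PLINK and EIGENSOFT are common choices, but that is an assumption). As a minimal sketch of the two steps described, assuming genotypes coded as 0/1/2 allele counts with NaN for missing calls, the snippet below drops SNPs above a missingness threshold and then computes principal-component coordinates with NumPy. The function names, the thresholds and the simulated data are hypothetical, and the removal of relatives (kinship filtering) is not shown.

```python
import numpy as np


def filter_snps_by_missingness(G, max_missing=0.01):
    """Keep SNP columns whose fraction of missing calls (NaN) is at most max_missing."""
    missing_rate = np.isnan(G).mean(axis=0)
    return G[:, missing_rate <= max_missing]


def genotype_pca(G, n_components=2):
    """Project individuals onto the top principal components of a genotype matrix.

    G is an individuals x SNPs array of allele counts (0, 1, 2),
    with NaN marking missing calls.
    """
    # Frequency of the counted allele at each SNP, ignoring missing calls
    p = np.nanmean(G, axis=0) / 2.0
    # Monomorphic SNPs carry no information and would divide by zero below
    keep = (p > 0) & (p < 1)
    G, p = G[:, keep], p[keep]
    # Center each SNP and scale it by its expected binomial standard deviation
    X = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # A missing call is imputed as the SNP mean, which is 0 after centering
    X = np.nan_to_num(X, nan=0.0)
    # Rows of U * S are the individuals' coordinates on the principal components
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Toy example: two simulated populations with different allele frequencies
rng = np.random.default_rng(0)
freqs = rng.uniform(0.05, 0.95, size=(2, 300))
G = np.vstack([
    rng.binomial(2, freqs[0], size=(15, 300)),
    rng.binomial(2, freqs[1], size=(15, 300)),
]).astype(float)
G[0, :5] = np.nan  # a few missing calls, spread over five SNPs
coords = genotype_pca(filter_snps_by_missingness(G, max_missing=0.05), n_components=2)
```

Plotting `coords[:, 0]` against `coords[:, 1]` gives the kind of scatter shown in the post's figures, though the actual analysis may have used a different normalization.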

First, I decided to look at the Brahmin samples I had.

– Uttar Pradesh, Bihar, and the Gujarati Brahmin(s) I had are one cluster
– South Indian Brahmins (mostly Iyer) are another

To my surprise, the two Maharashtra Brahmins that I have are firmly in the South Indian cluster. The Bengali Brahmin is more like the North Indians. But there is a subtle skew toward the distant Bangladesh cluster. This individual seems less East Asian than even the typical Bengali Brahmin, but I think Bengali Brahmins can be modeled as North Indian Brahmin with non-Brahmin (and therefore East Asian) ancestry.

Next, I wanted to look at Gujaratis. The 1000 Genomes Project has a large sample of this population, but there’s no group identity label. Years ago Zack Ajmal of Harappa DNA concluded that a large cluster of relatively closely related individuals in these data were “Patels.” Someone who is a Bohra Muslim of presumably Patel background sent me their data. They did not fall in the Patel cluster. Rather, they were in the “Gujurati_ANI_1” group, which is more like Pakistanis than other Gujaratis. In fact, the Gujarati Brahmin is not in this cluster. An individual who is Solanki seems to be more ASI-shifted, like the Patels and Gujurati_ANI_4.

Overall, Gujarat has a lot of population structure in a rather small state (yes, I can’t spell Gujarat as you can see in my population labels).

From Maharashtra, just to the south of Gujarat in western India, I have two Brahmins and one Kayastha. For non-South Asian readers: my understanding is that Kayasthas are literate non-Brahmin castes. In Bengal, they take the place of the Kshatriya in the caste hierarchy, and together with Brahmins they formed the traditional Hindu educated classes. I have seen Bengali Kayastha genotypes, and they look rather like other Bengalis (judging from their customary surname, my mother’s father’s family was Kayastha before its conversion to Islam).

There are Kayasthas in other parts of South Asia, and I have a Kayastha sample from Maharashtra. Curiously, on the PCA this individual sits in the same position as the two Brahmins from the region and the South Indian Brahmins. I don’t know what this means.

Next, some odds and ends from the northwest of the subcontinent. I have a few unrelated Jatts. This group from Punjab is quite ANI-shifted. Someone who claims to be a Rajput from Rajasthan is where they should be on account of geography. The 1000 Genomes Punjabi group is quite diverse. I have a Ramgarhia individual who seems to be somewhere between Punjabi_ANI_1 and Punjabi_ANI_2. The Jatts are on the (ANI-shifted) edge of Punjabi_ANI_1.

I have two individuals who claim to be Kashmiri. A Butt and a Syed. I have no idea what that means. Both are in Punjabi_ANI_2, but they look somewhat East Asian-shifted. This is not surprising; trans-Himalayan populations tend to be. The curious thing about Kashmiris is that they are culturally and geographically quite distinct from the Indians to their south, but genetically they are not so different. In fact, they are “more South Asian” (ASI) than the Jatts, and considerably more than Iranian-speaking groups like the Pathans.

Finally, there is a Marwari individual. This community is from Rajasthan, though they occupy a mercantile role across the subcontinent. Strangely (or not?), they are very close to the Patels, and much more ASI-enriched than the Rajput.

Shifting to South Indian samples, I plotted the Chamars with them; I believe these samples were collected in Uttar Pradesh, in the north. These Dalits actually seem to cluster with a subset of the 1000 Genomes Tamil and Telugu samples that I believe are also Scheduled Caste (Dalit). The Chamar are somewhat distinct. They are more ANI-shifted. But notice that the bulk of Tamils and Telugus are still more ANI-shifted than the Chamars are! This surprised me.

I have some Velama individuals, as well as a Reddy from Andhra Pradesh, and a Padmashali. All of these individuals fall within the main distribution of South Indians. I do have a Mudaliar Tamil sample, and this individual is placed among the Chamars, though not really in the Tamil Scheduled Caste group.

Finally, some odds and ends. The Nasrani samples from Kerala are between the South Indian Brahmins and middle-caste South Indians. I suspect this is due to the origin of the Nasranis in the Nair community, who have mixed some with Brahmins. The Vania sample from Gujarat clusters with the South Indian Brahmins. The Dusadhs, an agricultural group from Uttar Pradesh and Bihar that is depressed in some manner relative to the dominant groups (Google says so), are not quite like the Chamars, but they are ASI-shifted.

Some of you will be asking about admixture. I ran an unsupervised admixture analysis at K = 4 on the data set. You can find it here.
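
The post does not say which program produced the K = 4 run; ADMIXTURE is the usual tool for this, but that is an assumption. To show what an unsupervised run means, here is a minimal sketch of the underlying model: each individual is a mix of K unlabeled ancestral components, and both the mixing proportions (Q) and each component's allele frequencies (P) are estimated from the genotypes alone, with no population labels supplied. It uses the simple EM updates of the frappe method rather than ADMIXTURE's faster optimizer; the function name `admixture_em`, its parameters and the toy data are hypothetical, and it assumes no missing genotype calls.

```python
import numpy as np


def admixture_em(G, K, n_iter=200, seed=0, eps=1e-6):
    """Estimate admixture proportions with frappe-style EM updates.

    G: individuals x SNPs array of allele counts (0, 1, 2), no missing calls.
    Returns Q (individuals x K proportions, each row sums to 1) and
    P (K x SNPs allele frequencies of each ancestral component).
    """
    rng = np.random.default_rng(seed)
    n, m = G.shape
    Q = rng.dirichlet(np.ones(K), size=n)  # random start breaks the symmetry
    P = rng.uniform(0.2, 0.8, size=(K, m))
    for _ in range(n_iter):
        # Expected frequency of the counted allele for each individual and SNP
        F = np.clip(Q @ P, eps, 1 - eps)
        A = G / F  # counted-allele copies, per unit of expected frequency
        B = (2 - G) / (1 - F)  # the same for the other allele
        # Share of each individual's 2m allele copies attributed to each component
        Q_new = Q * (A @ P.T + B @ (1 - P).T) / (2 * m)
        Q_new /= Q_new.sum(axis=1, keepdims=True)
        # Counted-allele copies assigned to each component over all copies it received
        num = P * (Q.T @ A)
        den = num + (1 - P) * (Q.T @ B)
        P = np.clip(num / den, eps, 1 - eps)
        Q = Q_new
    return Q, P


# Toy example: 20 individuals from each of two simulated source populations
rng = np.random.default_rng(1)
freqs = rng.uniform(0.05, 0.95, size=(2, 500))
G = np.vstack([
    rng.binomial(2, freqs[0], size=(20, 500)),
    rng.binomial(2, freqs[1], size=(20, 500)),
]).astype(float)
Q, P = admixture_em(G, K=2, n_iter=500)
```

On real data you would pass `K=4` and compare runs from several seeds, since EM can settle in different local optima.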

11 thoughts on “South Asian Genotype Project, Summer 2018 Update”

  1. I think the case of Jats (and similar groups like Rors) of Haryana & UP is very interesting. Haryana & West UP is the absolute ground zero in terms of locating the Rigvedic homeland.

    Recently we have heard of some news reports on the Rakhigarhi aDNA.

    These reports suggest that the modern inhabitants of that region, i.e. Haryana, are very closely related to the Rakhigarhi samples.

    The modern inhabitants of Haryana are groups like Jats and Rors. This is also the Vedic homeland.

    And interestingly enough, Haryana Jats show the maximum steppe related ancestry in South Asians, higher than the Brahmins and even the Pashtuns. Rors, who are quite similar to Jats, have the highest percentage of Lactase Persistence among Indians (if not among South Asians).

    Incidentally, the LP gene correlates directly with milk consumption in South Asia. As per the report below,

    Haryana has the highest per capita milk consumption in India, and its people also apparently have the highest percentage of LP. What is also interesting is that milk consumption (and therefore perhaps LP) in India is highest across the Northwest, and its distribution actually mimics the geography of the Indus civilization.

    These things suggest that the northwestern groups of South Asia, particularly the Jats, Rors and related groups, need to be studied intensively, as their genes may hold the secret of Indo-European and Harappan history in India.

    The upcoming Rakhigarhi study, if done properly, may go a long way toward solving this riddle.

  2. I’m curious about the Nasrani placement. I don’t disagree with the likely Nair origins, but the Nair/Brahmin mix is fairly recent, and I’m not sure it precedes the Nasrani conversions. I’m also unsure about the magnitude of Nair/Brahmin mixture.

    I think there was something more to the story on the western side of South India that might show up in Nasranis and Nairs. I’m curious to see any data from other groups like Tulus (who share some mythological origins with Nairs), Kodavas, Kotas, and Todas.

  3. “A Butt and a Syed. I have no idea what that means. ”

    The Butts, at least, were Pandits who converted. I believe it’s the same name as Bhat, but we picked a … very unfortunate spelling. A good deal of Butts living in Punjab aren’t ‘real’ Butts. It’s a very stereotypical Kashmiri name, so many people without surnames took it as a surname (as my mother’s family did) even without Pandit ancestry.

    The Kashmiri community in Punjab has been historically endogamous, but you don’t need a high rate of outmarriage to assimilate a population, since endogamy will still spread the out-group DNA all around. So I don’t know how representative I (the Kashmiri Butt in the data) am of a guy still living in Srinagar.

  4. Maharashtrian Brahmins are divided. Chitpavans are closer to North Indian Brahmins, but the Deshasthas belong to the Pancha Dravida grouping.

  5. “Are Dravidian middle castes Caucasoid or Australoid?”

    @Rahim I don’t think the anthropological terms ‘Caucasoid or Australoid’ have any direct connection with genetics. Dravidian middle-caste or the Tamil sample in this project is a hybrid between West Asian agriculturalist and AASI. As for physical appearance, only a few genes are responsible as far as I know, and people’s appearance generally varies across every region of South Asia. The further south you go, the more melanin people have; other facial features may also be shaped by the environment, and most Dravidians have a distinctive look. There are distinctions even within the same region: for example, Telugus or Kannadigas are a bit lighter than Tamils, and similarly Sri Lankan Tamils are a bit darker than Sinhalese, maybe because the Sinhalese have some East Indian origin.

  6. “They are more ANI-shifted. But notice that the bulk of Tamils and Telugus are still more ANI-shifted than the Chamars are! This surprised me.”

    It would be interesting (and relevant to the debate about the IVC and Dravidians in another comment section) to check whether their West Eurasian component is more Iranian farmer-like or steppe-like compared with Indo-Aryan-speaking groups of the North, including low-caste and Dalit samples.

  7. “Dravidian middle-caste or the Tamil sample in this project is a hybrid between West Asian agriculturalist and AASI”

    I think you missed R1a1.
    Hello @kevin, I want to know which haplogroup corresponds to the West Asian agriculturists.

    I suppose J2-M172, L-M20 and R2. I don’t know about H1, whether it is related to AASI or to the agriculturists, since the Onge tribe (pure AASI) only have hg D, not hg H1.

    Assuming H1 is AASI, the average Dravidian/Tamil middle caste has:
    47%> = West Asian Agriculturist.
    16% =Steppe.

    I think that, as per many anthropologists, Dravidian middle castes belong to the ‘Mediterranean’ Caucasoid type (Sjoberb; Malhotra et al.) based on facial features, morphology, etc.
    Skin colour may be attributed to the proximity of the Equator to Tamil Nadu and Sri Lanka.

  8. “I suppose J2-M172,L-M20 and R2. I don’t know about H1, whether it is related to AASI or agriculturist since Onge tribe(Pure AASI) only have hg D not hg H1.”

    @Rahim I’m also not sure about H1, whether it came with West Asian farmers or not. There is a sub-branch “H2” present in Europe and also in West Asia. Surprisingly, H1 is absent in the Paniya tribes, so I assume H1 has a West Asian origin or arose somewhere in the Indo-Gangetic plains.

    “47%> = West Asian Agriculturist.
    16% =Steppe.”
    ^That’s more appropriate for mid-caste East Indians or Bengalis, IMO, if you make room for ~10% East Asian.
    The steppe component is probably very low in mid-caste Dravidians; they didn’t score any Lithuanian in this project.

    “I think as per many anthropologist, Dravidian middle caste are belonging to ‘Mediterranean’ Caucasoid (Sjoberb)(Malhotra et al.) based on facial feature,morphology etc.
    Skin colour may be attributed to presence of Equator in region of Tamil Nadu and Sri Lanka.”
    Yeah, Mediterranean morphology is present (maybe not in pure form) all over, from the British Isles to South Asia, but that is not necessarily reflected in genetic scores, although it’s very possible that West Asian farmers brought those traits.

    @Razib In “Projectmembers v3”, please add a “Chamar” component and also a few tribal components; Paniya or Munda would be good. Mid-caste South Indians have more ANI than Chamars, after all.

  9. @kevin Yes, on average the Dravidian middle caste has only 11% R1a1 (W.S. Watkins et al.), lower than the lower castes (SC/ST). However, it differs from caste to caste within the middle castes. For example, the Tamil Yadhava [cow herders] have around 15.8%, while the Kallar [agriculturists] have very little R1a1 (2-3%) and J2-M172. The Kallar possess L-M20 at around 48% and the Tamil Yadhava at around 20.5%. Similarly, the Tamil Vellalar [agriculturists] have J2-M172 at around 38.7%. That is how diverse they are.

    So a West Eurasian origin of Dravidians [not sure about the language] seems possible. Nevertheless, there is also a Sub-Saharan hypothesis, which seems impossible given the data.
    We don’t know where the Dravidian language originated or what the language of Harappa was (para-Munda?).
    It is entirely plausible to assert that proto-Dravidian originated within Indian sub-continent among AASI.

  10. Hi Jaydeepsinh,

    You are right in saying that Rors have the highest incidence of the lactase persistence allele in the subcontinent. Unpublished research also shows the steppe component peaking in this group, higher than in most Jats except those Jats who only recently broke from the Rors and assimilated into the larger Jat identity.

    I have a hypothesis that Iranian Lors/Lurs and Indian Rors are related: both are known historically as warriors, and a group of Lors in Iran is known to have fought alongside their women, which is also true of Ror accounts of the last battle of Tarain, where Prithviraj Chauhan and the 52-fort Rors lost to the Turkic forces under Ghori. Interestingly, Lors are high in R1b, whereas Indian Rors have R1a and L1c at almost similar levels, with R1a slightly more prevalent. This Lor+Ror complex would have both R1a and R1b, much like the Yamnaya complex. There is thus an argument to be made that Yamnaya is but a forward post of this Indo-Iranian pastoral warrior complex of the Indic Rors and the Lors/Lurs.

    Cheers Anurag
