Again, if you are interested, send me a 23andMe, Ancestry, MyHeritage, or Family Tree DNA raw genotype file at contactgnxp -at- gmail.com.
In the subject line, please include:
- “South Asian Genotype Project”
- The state/province your family is from
- Ethnolinguistic group
- If applicable, caste
I decided to do some poking around with some of the higher-quality samples people have given me: about 180,000 SNPs with almost no missing genotypes. I also removed "relatives." As a result, many of the Muslim groups from Pakistan lost individuals. In the PCA above you can see that only 4 Burushos are left! Not too many Pathans either.
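The post does not say which tools were used for filtering or for the PCA (PLINK and EIGENSOFT are common choices for this kind of work). As a rough illustration of the two steps mentioned above, here is a minimal NumPy sketch, assuming genotypes are stored as an individuals × SNPs matrix coded 0/1/2 with `NaN` for missing calls. The function names and the 1% missingness threshold are hypothetical, and relative removal is not shown.

```python
import numpy as np


def filter_snps(genotypes: np.ndarray, max_missing: float = 0.01) -> np.ndarray:
    """Keep SNP columns whose fraction of missing calls (NaN) is <= max_missing.

    The 1% default is a hypothetical threshold, not the one used in the post.
    """
    missing_rate = np.isnan(genotypes).mean(axis=0)
    return genotypes[:, missing_rate <= max_missing]


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return top principal-component scores, one row per individual.

    genotypes: individuals x SNPs, coded 0/1/2 (allele counts), NaN = missing.
    """
    g = genotypes.astype(float).copy()
    # Mean-impute any remaining missing calls, SNP by SNP.
    col_means = np.nanmean(g, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(g))
    g[nan_rows, nan_cols] = col_means[nan_cols]
    # Drop monomorphic SNPs: they carry no information and would divide by zero.
    p = col_means / 2.0
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its expected binomial standard deviation.
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    # Rows of U * S are the coordinates of each individual on the PCs.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Plotting the first two columns of the result against each other gives the usual PC1 vs. PC2 scatter.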
First, I decided to look at the Brahmin samples I had.
– The Uttar Pradesh, Bihar, and Gujarati Brahmin(s) I had form one cluster
– South Indian Brahmins (mostly Iyers) form another
To my surprise, the two Maharashtra Brahmins that I have are firmly in the South Indian cluster. The Bengali Brahmin is closer to the North Indians, but with a subtle skew toward the distant Bangladesh cluster. This individual seems less East Asian than even the typical Bengali Brahmin, but I think Bengali Brahmins in general can be modeled as North Indian Brahmins with some non-Brahmin (and therefore East Asian) ancestry.
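To make the idea of "modeled as North Indian Brahmin plus non-Brahmin ancestry" concrete, here is a crude geometric sketch, assuming PC coordinates are roughly linear in ancestry: treat the target as a point on the line between two source-population centroids and solve for the mixing fraction by least squares. This is not how the post tested anything; the function name and coordinates are hypothetical, and formal tests of admixture use tools like ADMIXTURE or f-statistics.

```python
import numpy as np


def two_source_fraction(target, source_a, source_b) -> float:
    """Least-squares fraction of source_a in target, assuming
    target ~= alpha * source_a + (1 - alpha) * source_b,
    with all three given as PC coordinates (e.g. population centroids).
    """
    t, a, b = (np.asarray(x, dtype=float) for x in (target, source_a, source_b))
    d = a - b
    alpha = float(np.dot(t - b, d) / np.dot(d, d))
    # Values outside [0, 1] mean the two-source model fits poorly; clip them.
    return min(max(alpha, 0.0), 1.0)


# Hypothetical coordinates: the target sits a quarter of the way from A to B.
print(two_source_fraction([0.5, 0.0], [0.0, 0.0], [2.0, 0.0]))  # → 0.75
```

Projecting onto the line between the two sources ignores any displacement perpendicular to it, which is exactly where a third ancestry source would show up.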
Next, I wanted to look at Gujaratis. The 1000 Genomes Project has a large sample of this population, but without any group identity labels. Years ago, Zack Ajmal of Harappa DNA concluded that a large, relatively closely related cluster in these data were “Patels.” Someone who is a Bohra Muslim of presumably Patel background sent me their data. They did not fall in the Patel cluster. Rather, they were in the “Gujurati_ANI_1” group, which is more like Pakistanis than other Gujaratis. Notably, even the Gujarati Brahmin is not in this cluster. An individual who is Solanki seems to be more ASI-shifted, like the Patels and Gujurati_ANI_4.
Overall, Gujarat has a lot of population structure for a rather small state (yes, I can’t spell Gujarat, as you can see from my population labels).
From Maharashtra, directly south of Gujarat in western India, I have two Brahmins and one Kayastha. For non-South Asians: my understanding is that Kayasthas are literate non-Brahmin castes. In Bengal, they take the place of the Kshatriyas in the caste hierarchy, and together with the Brahmins they formed the traditional Hindu educated classes. I have seen Bengali Kayastha genotypes, and they look rather like those of other Bengalis (judging from their customary surname, my mother’s father’s family was Kayastha before its conversion to Islam).
There are Kayasthas in other parts of South Asia too, and my Maharashtra sample is one of them. Curiously, on the PCA this individual sits in the same position as the two Brahmins from the region and the South Indian Brahmins. I don’t know what this means.
Next, some odds and ends from the northwest of the subcontinent. I have a few Jatts who are not related. This group from Punjab is quite ANI-shifted. Someone who claims to be a Rajput from Rajasthan falls where you would expect given geography. The 1000 Genomes Punjabi group is quite diverse. I have a Ramgarhia individual who seems to be somewhere between Punjabi_ANI_1 and Punjabi_ANI_2. The Jatts sit on the ANI-shifted edge of Punjabi_ANI_1.
I have two individuals who claim to be Kashmiri: a Butt and a Syed. I have no idea what that means. Both fall within Punjabi_ANI_2, but they look somewhat East Asian-shifted. This is not surprising; trans-Himalayan populations tend to be. The curious thing about Kashmiris is that they are culturally and geographically quite distinct from the Indians to their south, yet genetically they are not so different. In fact, they are “more South Asian” (ASI) than the Jatts, and considerably more so than Iranian-speaking groups like the Pathans.
Finally, there is a Marwari individual. This community is from Rajasthan, though it occupies a mercantile role across the subcontinent. Strangely (or not?), this individual is very close to the Patels, and much more ASI-enriched than the Rajput.
Shifting to the South Indian samples, I plotted the Chamars with them; I believe these were collected in Uttar Pradesh, in the north. These Dalits actually seem to cluster with a subset of the 1000 Genomes Tamil and Telugu samples that I believe are also Scheduled Caste (Dalit). The Chamars are somewhat distinct: they are more ANI-shifted than those Tamil and Telugu Scheduled Caste samples. But notice that the bulk of Tamils and Telugus are still more ANI-shifted than the Chamars are! This surprised me.
I have some Velama individuals, as well as a Reddy from Andhra Pradesh and a Padmashali. All of these individuals are within the main distribution of South Indians. I also have a Mudaliar Tamil sample; this individual is placed among the Chamars, though not really within the Tamil Scheduled Caste group.
Finally, some more odds and ends. The Nasrani samples from Kerala fall between the South Indian Brahmins and middle-caste South Indians. I suspect this reflects the Nasranis’ origin in the Nair community, which has mixed somewhat with Brahmins. The Vania sample from Gujarat clusters with the South Indian Brahmins. The Dusadhs, an agricultural group from Uttar Pradesh and Bihar that is depressed in some manner relative to the dominant groups (Google says so), are not quite where the Chamars are, but they are ASI-shifted.
Some of you will be asking about admixture. I ran an unsupervised admixture analysis with K = 4 on the data set. You can find it here.
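The post does not name the program used for the K = 4 run. If it was ADMIXTURE (an assumption), the per-individual results land in a `.Q` file with one row per individual, in the same order as the input `.fam` file, and K whitespace-separated ancestry fractions per row. Here is a minimal sketch for attaching sample IDs to those rows and picking each person's largest component; the function names and sample IDs are hypothetical.

```python
def parse_admixture_q(text: str, sample_ids: list[str]) -> dict[str, list[float]]:
    """Map each sample ID to its K ancestry fractions from a .Q file's contents.

    Assumes one whitespace-separated row per individual, in the same order
    as sample_ids (i.e. the order of the input .fam file).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != len(sample_ids):
        raise ValueError(f"{len(rows)} rows but {len(sample_ids)} sample IDs")
    result: dict[str, list[float]] = {}
    for sample_id, row in zip(sample_ids, rows):
        fractions = [float(x) for x in row]
        # Each row should sum to ~1; a large deviation suggests a parsing problem.
        if abs(sum(fractions) - 1.0) > 1e-3:
            raise ValueError(f"fractions for {sample_id} sum to {sum(fractions)}")
        result[sample_id] = fractions
    return result


def main_component(fractions: list[float]) -> int:
    """0-based index of the largest ancestry component."""
    return max(range(len(fractions)), key=lambda k: fractions[k])


# Hypothetical K = 4 output for two individuals.
q_text = "0.700000 0.100000 0.100000 0.100000\n0.200000 0.500000 0.200000 0.100000\n"
q = parse_admixture_q(q_text, ["sample_1", "sample_2"])
print(main_component(q["sample_2"]))  # → 1
```

Keeping the row-order assumption explicit matters, because the `.Q` file itself carries no sample IDs.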