It’s been a while since I updated the South Asian Genotype Project. Well, I have now updated almost everyone (the ‘projectmembers v2’ tab is the one you want). A few people had strangely formatted text files, so I’ll add them tomorrow. Thanks to everyone who has submitted so far!
One of the main things that I’ve been curious about is undersampled groups. I finally got an Uttar Pradesh Kayastha in the data set (well, technically my second…but the first is a friend). I also got a submission of a Bengali Brahmin with origins in the west, and another in the east (in fact, from Comilla, which is where my own family is from). And, I got the submission of another West Bengali Kayastha.
Finally, I got another Maharashtra Kayastha.
If you click the image above you see some obvious things:
Bengali Brahmins don’t seem to be geographically structured. The eastern and western individuals are near each other on the PCA. Additionally, they are very close to Uttar Pradesh Brahmins, not to the main Bangladesh cluster.
In contrast, the West Bengal Kayastha is positioned close to the Bangladeshis, though outside of that particular cluster.
In other words, to some extent Bengal’s landscape reflects both aspects of South Asian genetic variation: it is strongly structured by caste, and geography also plays a role. People from western Bengal have less East Asian ancestry and more affinity with peoples to the west on the Gangetic plain. But Bengali Brahmins are genetically quite distinct from other Bengalis.
The dissimilar position of Kayastha groups across South Asia is in contrast to Brahmins. Though Brahmin groups in Bengal and South India seem to have mixed with local groups (they are always somewhat shifted to the regional substrate), overall their genetic character indicates shared common ancestry. In contrast, the different Kayastha groups seem much more likely to be a case of local populations who arose to fill a particular occupational niche that emerged with polities which required a bureaucratic class.
I’ve put another update on the South Asian Genotype Project. Make sure to go to the ‘projectmembers v2’ sheet. If you’ve contributed since March, check it out.
Again, if you are interested: send me a 23andMe, Ancestry, MyHeritage, or Family Tree DNA raw genotype file to contactgnxp -at- gmail.com.
In the subject please put:
“South Asian Genotype Project”
The state/province your family is from
Ethnolinguistic group
If applicable, caste
I decided to do some poking around with some of the higher quality samples people have given me: 180,000 SNPs with almost no missing genotypes. I also removed “relatives.” That means that a lot of Muslim groups from Pakistan had individuals dropping out. In the PCA above you can see only 4 Burushos left! Not too many Pathans either.
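For readers who want to try this kind of filtering themselves, here is a minimal sketch of a missingness filter followed by a PCA. It is not the pipeline used for the plots here. It assumes genotypes are coded as 0/1/2 allele counts with NaN for missing calls, the function names and the 25% toy threshold are made up for illustration, and it does not cover the removal of relatives.

```python
import numpy as np

def filter_snps_by_missingness(genotypes, max_missing_rate=0.01):
    """Keep SNP columns whose fraction of missing calls is <= max_missing_rate.

    genotypes: (n_individuals, n_snps) float array of 0/1/2 allele counts,
    with NaN marking a missing call (an assumed encoding).
    """
    missing_rate = np.isnan(genotypes).mean(axis=0)
    keep = missing_rate <= max_missing_rate
    return genotypes[:, keep], keep

def pca_coordinates(genotypes, n_components=2):
    """Project individuals onto the top principal components."""
    col_means = np.nanmean(genotypes, axis=0)
    # Fill any remaining missing calls with the per-SNP mean before centering
    filled = np.where(np.isnan(genotypes), col_means, genotypes)
    centered = filled - col_means
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy matrix: 4 individuals x 4 SNPs (values are invented)
toy = np.array([
    [0, 1, 2, np.nan],
    [1, 1, np.nan, np.nan],
    [2, 0, 0, 1.0],
    [2, 2, 1, 0],
])
filtered, keep = filter_snps_by_missingness(toy, max_missing_rate=0.25)
coords = pca_coordinates(filtered)
```

In the toy matrix the last SNP is missing in half the individuals, so it is dropped. Real data sets usually go through dedicated tools for this step, and the thresholds are a judgment call.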
First, I decided to look at the Brahmin samples I had.
– Uttar Pradesh, Bihar, and the Gujarati Brahmin(s) I had are one cluster
– South Indian Brahmins (mostly Iyer) are another
To my surprise, the two Maharashtra Brahmins that I have are firmly in the South Indian cluster. The Bengali Brahmin is more like the North Indians, but there is a subtle skew toward the distant Bangladesh cluster. This individual seems less East Asian than even the typical Bengali Brahmin, but I think Bengali Brahmins can be modeled as North Indian Brahmin with some non-Brahmin Bengali (and therefore East Asian) ancestry.
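The two-way model in the last sentence can be written as target = (1 - alpha) * source_a + alpha * source_b and solved for alpha on a single ancestry component. Here is a minimal sketch of that arithmetic. The three fractions are invented purely for illustration and are not results from this project, and the function name is hypothetical.

```python
def mixture_fraction(target, source_a, source_b):
    """Solve target = (1 - alpha) * source_a + alpha * source_b for alpha."""
    if source_a == source_b:
        raise ValueError("the two sources must differ on this component")
    return (target - source_a) / (source_b - source_a)

# Invented East Asian-like ancestry fractions, NOT project results
north_indian_brahmin = 0.02
non_brahmin_bengali = 0.12
bengali_brahmin = 0.06

alpha = mixture_fraction(bengali_brahmin, north_indian_brahmin, non_brahmin_bengali)
print(f"implied non-Brahmin Bengali share: {alpha:.2f}")
```

A single component is the crudest possible version of this. Formal methods use many markers at once and report uncertainty, so treat a number like this as a back-of-the-envelope check only.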
Next, I wanted to look at Gujaratis. The 1000 Genomes has a large number of samples from this population…but there’s no group identity label. Years ago Zack Ajmal of Harappa DNA concluded that a large and relatively related cluster in these data were “Patels.” Someone who is a Bohra Muslim of presumably Patel background sent me their data. They did not fall in the Patel cluster. Rather, they were in the “Gujurati_ANI_1” group, which is more like Pakistanis than other Gujaratis. In fact, the Gujarati Brahmin is not in this cluster. An individual who is Solanki seems to be more ASI-shifted, like the Patels and Gujurati_ANI_4.
Overall, Gujarat has a lot of population structure in a rather small state (yes, I can’t spell Gujarat as you can see in my population labels).
From Maharashtra, right to the south of Gujarat in western India, I have two Brahmins and one Kayastha. For non-South Asians, my understanding is that Kayasthas are literate non-Brahmin castes. In Bengal, they take the place of the Kshatriya in the caste hierarchy, and with Brahmins formed the traditional Hindu educated classes. I have seen Bengali Kayastha genotypes, and they look rather like other Bengalis (judging from their customary surname, my mother’s father’s family was Kayastha before their conversion to Islam).
There are Kayasthas in other parts of South Asia. I have a Kayastha sample from Maharashtra. Curiously, on the PCA this individual is in the same position as the two Brahmins from the region and the South Indian Brahmins. I don’t know what this means.
Next, some odds and ends from the northwest of the subcontinent. I have a few Jatts who are not related. This group from Punjab is quite ANI-shifted. Someone who claims to be a Rajput from Rajasthan is where they should be on account of geography. The 1000 Genomes Punjabi group is quite diverse. I have a Ramgarhia individual who seems to be somewhere between Punjabi_ANI_1 and Punjabi_ANI_2. The Jatts are on the edge (ANI-shifted) of Punjabi_ANI_1.
I have two individuals who claim to be Kashmiri. A Butt and a Syed. I have no idea what that means. But both are Punjabi_ANI_2…but they look somewhat East Asian shifted. This is not surprising. Trans-Himalayan populations tend to be. The curious thing about Kashmiris is that they are culturally and geographically quite distinct from Indians to their south. But genetically they are not so different. In fact, they are “more South Asian” (ASI) than Jatts, and considerably more than Iranian-speaking groups like Pathans.
Finally, there is a Marwari individual. This community is from Rajasthan, though they occupy a mercantile role across the subcontinent. Strangely (or not?) they are very close to the Patels. Much more ASI-enriched than the Rajput.
Shifting to South Indian samples, I plotted the Chamar with them, who I believe were collected from Uttar Pradesh in the north. These Dalits actually seem to cluster with a subset of the 1000 Genomes Tamil and Telugu samples that I believe are Scheduled Caste (Dalit) as well. The Chamar are somewhat distinct. They are more ANI-shifted. But notice that the bulk of Tamils and Telugus are still more ANI-shifted than the Chamars are! This surprised me.
I have some Velama individuals, as well as a Reddy from Andhra Pradesh, and a Padmashali. All these individuals are in the main distribution of South Indians. I do have a Mudaliar Tamil sample, and this individual is placed among the Chamars. Though not really in the Tamil Scheduled Caste group.
Finally some odds & ends. The Nasrani samples from Kerala are between the South Indian Brahmins and middle caste South Indians. I suspect this is due to the origin of the Nasranis in the Nair community, who have mixed some with Brahmins. The Vania sample from Gujarat is clustered with South Indian Brahmins. The Dusadhs, an agricultural group from Uttar Pradesh and Bihar, that is depressed in some manner in relation to the dominant groups (Google says so), are not quite Chamars, but they are ASI-shifted.
Some of you will be asking about admixture. I ran K = 4 unsupervised on the data set. You can find it here.
I’ve been working on the South Asian Genotype Project. Again, if you are interested: send me a 23andMe, Ancestry, or Family Tree DNA raw genotype file to contactgnxp -at- gmail.com.
In the subject please put:
“South Asian Genotype Project”
The state/province your family is from
Ethnolinguistic group
If applicable, caste
I changed the reference populations because the earlier ones were too complicated. You can see the population averages from public data sets for some groups. The results for project members are here. I re-ran everyone who has sent data in so far. I’ll leave commentary for later.
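For anyone wondering how population averages like these are put together, here is a minimal sketch. It assumes each individual comes with a population label and one ancestry fraction per cluster (four clusters here, matching a K = 4 run). The labels, the numbers, and the function name are all made up for illustration.

```python
from collections import defaultdict

def population_averages(rows):
    """Average per-individual ancestry fractions within each population label.

    rows: iterable of (population_label, [fraction_1, ..., fraction_K]).
    """
    sums = {}
    counts = defaultdict(int)
    for label, fractions in rows:
        if label not in sums:
            sums[label] = [0.0] * len(fractions)
        for i, value in enumerate(fractions):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }

# Invented K = 4 fractions for two hypothetical populations
toy_rows = [
    ("PopA", [0.5, 0.25, 0.25, 0.0]),
    ("PopA", [0.25, 0.5, 0.25, 0.0]),
    ("PopB", [0.0, 0.0, 0.5, 0.5]),
]
averages = population_averages(toy_rows)
```

Averages hide within-group spread, so it is worth looking at the individual rows as well when a group is structured.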
At this point, I think the easiest way to update project members is to create a mailing list. If you have submitted genotypes, please join:
Just a quick update. I know I haven’t been responsive, but I’ve been traveling and spending time with the family and working a lot for the past few weeks. I’m going to make some revisions to my pipeline as well. I will get back to generating results soon (as in a week or so). So please keep sending data to contactgnxp@gmail.com.
It’s been a few years since I’ve done any serious “Genome Blogging.” Mostly I’ve been very busy and there isn’t much low-hanging fruit left as it is. But today I want to announce that I’ll be running the generically titled “South Asian Genotype Project.”
The way it works is simple: send me a 23andMe, Ancestry, or Family Tree DNA raw genotype file to contactgnxp -at- gmail.com (though 23andMe’s new chip has far less overlap with other platforms than the earlier ones did, so it’s probably best if you were typed before August 2017).
In the subject please put:
“South Asian Genotype Project”
The state/province your family is from
Ethnolinguistic group
If applicable, caste
In the body of the email you can put Y and mtDNA and any other information you want. Obviously your data is confidential and I won’t identify you by name, just ethnolinguistic group and such.
Since the last time I did this I have written some scripts that make this a lot easier, so hopefully I’ll be adding individuals to this spreadsheet every few days. I’ll give project members an ID and try to email them when the results are up.
The main motivator for this project on my part is that people still ask me questions about Sinhalese, Nasrani Christians, and other assorted groups which we don’t have answers to because current research projects haven’t focused on them.
Since Zack worked on the Harappa Ancestry Project we know a lot more about South Asian ancestry. Basically, there is an ANI-ASI cline, and some South Asians have exogenous ancestry off this cline. Indian Jews have Middle Eastern ancestry, while Bengalis have East Asian ancestry, and some groups in Pakistan have African ancestry. With that in mind I’ll be testing a smaller number of populations. The marker set is 240,000 SNPs by the way.
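A marker set like this is typically limited to SNPs typed on every chip being combined, which is why overlap between platforms matters. Here is a minimal sketch of how one might count shared markers. It assumes each raw file is tab-separated with the rsID in the first column and comment lines starting with “#”, and the rsIDs and function names below are invented for illustration.

```python
def read_rsids(lines):
    """Collect rsIDs from the first column of a tab-separated raw genotype export."""
    rsids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        first = line.split("\t")[0]
        # Keep only IDs of the form rs<digits>; this also skips header rows
        if first.startswith("rs") and first[2:].isdigit():
            rsids.add(first)
    return rsids

def shared_markers(*files):
    """Return the rsIDs present in every file."""
    sets = [read_rsids(f) for f in files]
    return set.intersection(*sets) if sets else set()

# Toy stand-ins for two platforms' exports (contents are made up)
chip_a = [
    "# example comment line",
    "rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t100\tAA",
    "rs2\t1\t200\tAG",
    "i300\t1\t300\tCC",
]
chip_b = [
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs2\t1\t200\tA\tG",
    "rs3\t1\t250\tT\tT",
]
shared = shared_markers(chip_a, chip_b)
```

With real files you would pass open file handles instead of lists. Matching on rsID alone ignores strand and genome build differences, which need separate handling before merging.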
Below are some representative results. You can see that my results from three DTC services are basically the same. Also, some South Indian groups (see Pulliyar) show “Dai” ancestry, when I’m pretty sure it’s just that I didn’t sample as much on the extreme portion of the ASI-cline.