Indic civilization came to Southeast Asia because Indian people came to Southeast Asia. Lots of them

I am reading Indonesia: Peoples and Histories. I selected it because, unlike many books, it isn’t incredibly skewed toward the early modern and postcolonial periods. The author makes the interesting point that the Islamicization of western Indonesia and the rise of the great Javanese Hindu kingdom of Majapahit occurred around the same time. This is in contrast to the skein of Indic civilization, which had been layered over maritime Southeast Asia for hundreds of years before the medieval period, starting around 500 AD with polities such as Kalingga.

As is usual in these sorts of books, it is emphasized that Indian civilization spread through cultural diffusion (in contrast to the fact that though Chinese trade was evident and present early on, the cultural impact was minimal). Any migrations are dismissed as legends, with the possible exception of a few elite religious functionaries.

I now believe this is wrong. I’ve discussed this extensively in the past, but the Singapore Genome Variation Project (SGVP) data set, along with data from more Southeast Asians, allows me to illustrate the issues rather clearly. The short of it is that it is highly likely that substantial South Asian ancestry exists within Southeast Asia, and that this ancestry is not just a function of colonial contact (e.g., as certainly occurred in Malaysia).


The main interesting thing about Bangladeshi genetics is how East Asian Bangladeshis are


I got a question about endogamy and Bangladeshis on one of my other weblogs, as well as their relatedness to western (e.g., Iranian) and eastern (e.g., Southeast Asian) populations. Instead of talking, what do the data say? Most of you have probably seen me write about this before, but I think it might be useful to post again for Google (or Quora, as Quora seems to like my blog posts as references).

The 1000 Genomes project collected samples from a whole lot of Bangladeshis in Dhaka. The figure at the top shows that the Bangladeshis overwhelmingly form a relatively tight cluster that is strongly shifted toward East Asians. There is one exception: about five individuals, several of whom were collected right after each other (their sample IDs are sequential), who show almost no East Asian shift.
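One way to make "tight cluster with a handful of exceptions" concrete is to flag individuals whose position on the relevant principal component sits far from the bulk of the sample. The sketch below is not the procedure behind the figure; it uses a median-based (modified) z-score as a stand-in, and the function name, the toy coordinates, and the 3.5 cutoff are all assumptions for illustration.

```python
import numpy as np

def flag_cluster_outliers(pc, threshold=3.5):
    """Flag samples far from the main cluster along one PC.

    pc: 1-D sequence of per-sample coordinates on a principal component.
    Uses the modified z-score (median and MAD), so a few extreme
    individuals do not distort the definition of the cluster itself.
    The 3.5 threshold is a common convention, not a value from the post.
    """
    pc = np.asarray(pc, dtype=float)
    med = np.median(pc)
    mad = np.median(np.abs(pc - med))  # median absolute deviation
    if mad == 0:
        # Degenerate case: more than half the samples are identical.
        return np.zeros(pc.shape, dtype=bool)
    robust_z = 0.6745 * (pc - med) / mad
    return np.abs(robust_z) > threshold

# Toy coordinates (made up): eight clustered samples and one far from them.
toy_pc = [0.10, 0.11, 0.09, 0.10, 0.12, 0.08, 0.11, 0.10, 0.50]
flags = flag_cluster_outliers(toy_pc)
```

With real data one would then look at the sample IDs of the flagged individuals to see whether they are sequential, as they are here.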


Punjabi genetic variation in 1000 Genomes: Hindu caste in the Land of the Pure?


In the 1000 Genomes, there is a Punjabi dataset. Here is the description:

These cell lines and DNA samples were prepared from blood samples collected in Lahore, Pakistan. The samples are from a mix of parent-adult child trios and unrelated individuals who identified themselves and their parents as Punjabi.

A few years ago I did an analysis of the population structure in the 1000 Genomes dataset. In the Chinese data, there seemed to be some curious structure (there were two clusters of South Chinese). But the biggest issues predictably were in the South Asians. To give concrete examples, there were a few Brahmins in the Telugu data. A subset of Tamils and Telugus were highly ASI shifted. The Gujarati were highly heterogeneous, and one subcluster were almost certainly Patels (the samples were collected in Houston). The ASI shifted groups were almost certainly Scheduled Castes (Dalits) because I could see that they clustered with those samples from the Estonian Biocentre dataset.

There was something curious about the samples from Pakistan and Bangladesh. Aside from a small number of individuals, whose samples were collected at the same time judging by their IDs (these individuals cluster with Scheduled Castes), the Bangladeshi sample didn’t have much South Asian style structure. That is, there wasn’t a cline or lots of substructure within the ethnicity.

As noted by some commenters, the Punjabi samples were very different. Like the Gujarati samples, there was a huge variance along the ANI-ASI cline. To me, this was somewhat surprising. To make the 1000 Genomes more useful I used PCA and divided both Gujaratis and Punjabis into groups based on their position on the ANI-ASI cline, so that ANI_1 is the subpopulation with the most ANI and ANI_4 the least.
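For readers who want to try something similar, here is a minimal sketch of the general idea: run PCA on a genotype matrix, then bin individuals by their position on the PC that tracks the cline. The exact cut points used for the groups above are not stated, so the equal-sized quantile bins here are an assumption, as are the function names and the `high_is_ani` flag. The sign of a principal component is arbitrary, so in practice the direction has to be anchored with reference populations.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    Plain SVD on the SNP-centered matrix; no LD pruning or
    per-SNP variance scaling, which a real pipeline would likely add.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def split_along_cline(pc, n_groups=4, high_is_ani=True):
    """Label samples 1..n_groups by quantile of position along one PC.

    Label 1 is the most ANI-shifted group, n_groups the least.
    high_is_ani says which end of the PC is the ANI end (an assumption
    the caller must check against reference populations).
    """
    pc = np.asarray(pc, dtype=float)
    if not high_is_ani:
        pc = -pc
    # Interior quantile cut points, e.g. 25%, 50%, 75% for four groups.
    cuts = np.quantile(pc, np.linspace(0, 1, n_groups + 1)[1:-1])
    bins = np.digitize(pc, cuts)  # 0 = lowest bin, n_groups - 1 = highest
    return n_groups - bins

# Toy coordinates along the cline (made up for illustration).
toy_pc = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
labels = split_along_cline(toy_pc)
```

The labels can then be attached to the population names (e.g. Punjabi_ANI_1 through Punjabi_ANI_4) before running downstream tools on the subgroups.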

Using Treemix produced some weird results. As you can see above, Punjabi_ANI_1 looks like an Iranian population with gene flow from Punjabi_ANI_3. Punjabi_ANI_2 looks like a North Indian population with Iranian gene flow (so it is more ASI). Punjabi_ANI_3 are less ANI shifted than Uttar Pradesh Brahmins, but more than Uttar Pradesh Kshatriya. Finally, Punjabi_ANI_4 is actually very similar to Punjabi_ANI_2, except it has gene flow from a Dalit-like population.

With the South Asian Genotype Project I have a few Punjabi samples. All of them are within Punjabi_ANI_1.

I don’t know what’s going on here. Is this really caste-like structure in Punjab? Or are we seeing lots of admixture among people who are called “Punjabi” today? For example, the gene flow edges suggest lots of mixing between quite South Asian types of groups and an Iranian sort. Perhaps this is the absorption of Pathans into South Asian groups? Could it be Muhajir people who mixed with local Punjabis and identified as such?

I was curious to see if I could find something similar in relation to the three Jatts. As you can see with Treemix, no. Jatts are just very ANI-shifted. I added Lithuanians and Georgians, and you can see that Uttar Pradesh Brahmins get gene flow from a Lithuanian-shifted group, while South Indian Brahmins have a more Georgian gene flow. I suspect this is just an artifact of the fact that South Indian Brahmins have a lot of admixture from non-Brahmin South Indians, who are more Georgian than Lithuanian (Iran_N as opposed to Yamnaya).

Finally, going back to the Bengali (Bangladeshi) vs. Punjabi contrast, it is really interesting. If Punjab has such deep caste-like structures it really goes to show how within South Asia caste is a very very powerful institution, and ~1,000 years of Muslim rule and in western Punjab a majority Muslim population did not break down the institution. In contrast, in Bangladesh, there doesn’t seem to be much caste structure. I am routinely the most East Asian shifted Bengali in datasets, but my family is also from the eastern edge of eastern Bengal. Why the difference?

In The Rise of Islam and the Bengal Frontier the author posits that the Islamicization of eastern Bengal was to a great extent a function of the opening up of lands for cultivation under the supervision of Muslim elites under the rule of Afghans and later Mughals. This would explain the lack of caste structure because presumably, caste structure would be difficult to maintain in a frontier landscape, where the cultural elite does not promote or accept caste (though the elite West Asian Muslims were racially exclusive, they were also a very small minority).

In contrast, the Punjab has long been settled by Indo-Aryan peoples, and despite its long history of Islam, it was not recently a frontier society.

Anyway, that’s all I got to say for that. I’m sure readers will have more insight on this pattern than I do….

South Asian Genotype Project, update


I’ve been working on the South Asian Genotype Project. Again, if you are interested: send me a 23andMe, Ancestry, or Family Tree DNA raw genotype file to contactgnxp -at- gmail.com.

In the subject please put:

  1. “South Asian Genotype Project”
  2. The state/province your family is from
  3. Ethnolinguistic group
  4. If applicable, caste

I changed the reference populations because the earlier ones were too complicated. You can see the population averages from public data sets for some groups. The results for project members are here. I re-ran everyone who has sent data in so far. I’ll leave commentary for later.

At this point, I think the easiest way to update project members is to create a mailing list. If you have submitted genotypes, please join:

Subscribe to the South Asian Genotype Project





South Asian Genotype Project update

Just a quick update. I know I haven’t been responsive, but I’ve been traveling and spending time with the family and working a lot for the past few weeks. I’m going to make some revisions to my pipeline as well. I will get back to generating results soon (as in a week or so). So please keep sending data to contactgnxp@gmail.com.