Indra is absolved: The “caste system” predates the Indo-Aryans


In the near future the ancient DNA group led by David Reich will publish a bunch of stuff, and one paper will note that the variation in steppe ancestry in the Bronze Age Greeks did not have class implications. In other words, the ancient Greeks did not have a caste system as you might find in India, despite the way the Spartans treated the helots or Messenians. Of course, the Indo-Europeans did have a ‘tripartite’ caste system of rulers and warriors, priests and commoners. In the Indian varna system, this was translated into Kshatriyas, Brahmins and Vaishyas. But it is found elsewhere, including among the German Saxons. But to my knowledge nothing like the Indian caste system has been found genetically in these ancient populations; some individuals have more “farmer” ancestry in the initial generations, but this is all smoothed away by admixture.

India is different. It has jati-varna, with varna being caste as you would understand it, and jati being one of thousands of endogamous “communities” in the subcontinent. At first, like many, I assumed this had something to do with the Indo-Aryan migration into the subcontinent, but I noticed a few things early on. First, there are apparent genetic differences between non-Brahmin groups in non-Aryan South India, for example, between Nadars and Dalits. These differences correlate with differences in global biogeographic ancestry fractions: Dalits have more AASI (Ancient Ancestral South Indian). Mind you, I believe South Indian “Dravidians” were strongly shaped by the Indo-Aryans, so that’s not dispositive. Nevertheless, non-Aryan cultural regions have this institution, and jati is peculiarly Indian, even if varna is not.

But what has shifted my view is looking at admixture variation in “Indus Valley Periphery” samples dated to 3000-2000 BC. There’s a wide range of AASI. Why? Well, admixture in structured populations takes time. But there’s something suspicious to me about this variation combined with India’s later endogamy and the mystery of how the Indus Valley Civilization organized itself. There seems to have been very little stratification of the kind we would recognize from Egypt or Mesopotamia. Were they anarchists? I doubt it. The early emergence of jati may explain the IVC sociopolitical system. The Indo-Aryans, when they arrived, were simply integrated into the framework.

Ancient DNA will prove me right or wrong. But I’m putting my cards on the table.

Men’s language and women’s language

Indian English is a Prakrit, not a creole, says linguist Peggy Mohan. This part jumped out at me:

You mention that women were likely barred from writing Vedic poetry. But weren’t there some Vedic poetesses?

[Philologist] Michael Witzel thinks no. It’s possible, but we do know that women were not allowed to speak Sanskrit after some time. Witzel says they were men speaking as women. It fits the pattern of women being different, speaking an earlier language.

In Vedic ceremonies, their role is extremely limited. It’s almost like the women were a separate ethnic group. And they had to be tolerated because there would be no children without them, but they were the people who, on another day, you were fighting on the battlefield.

The latest work I’ve seen shows even in northern Pakistani groups steppe-derived mtDNA is present at around 10% of the total. You can look at the raw results for the Sintashta in Narasimhan et al., and the discontinuity in the Y vs. mtDNA is striking.

One thing I want to moot is that the sex bias was even stronger as Indo-Aryans moved out of the Indus Valley. My hypothesis is that sons of mixed background would be more likely to migrate east and south than “pure-blood” Indo-Aryans. The imbalance between Y chromosomes and mtDNA is much higher east and south than in Pakistan.
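Here is a minimal sketch that makes the Y-versus-mtDNA contrast concrete. It assumes a single pulse of admixture followed by random mating. Suppose a fraction s_male of the fathers and a fraction s_female of the mothers in the founding generation come from the steppe-derived source. Then:

- Y lineages track s_male.
- mtDNA tracks s_female.
- Autosomes sit at the average of the two.
- The X chromosome converges to (s_male + 2 × s_female) / 3, because women carry two of every three X copies.

The class name and the input values are hypothetical. They were picked only to illustrate a strongly male-biased event and are not estimates from Narasimhan et al.

```python
from dataclasses import dataclass


@dataclass
class SexBiasedPulse:
    """Hypothetical single-pulse admixture model with sex bias.

    s_male: fraction of founding fathers from the incoming source
    s_female: fraction of founding mothers from the incoming source
    """
    s_male: float
    s_female: float

    def y_fraction(self) -> float:
        # Y chromosomes pass only from fathers to sons.
        return self.s_male

    def mtdna_fraction(self) -> float:
        # mtDNA passes only from mothers.
        return self.s_female

    def autosomal_fraction(self) -> float:
        # Each parent contributes half of the autosomes.
        return (self.s_male + self.s_female) / 2

    def x_fraction(self) -> float:
        # Long-run X expectation: women hold two of every three X copies.
        return (self.s_male + 2 * self.s_female) / 3

    def male_to_female_ratio(self) -> float:
        # Incoming men per incoming woman implied by the pulse.
        return self.s_male / self.s_female


# Illustrative values only (assumed, not estimated from any dataset).
pulse = SexBiasedPulse(s_male=0.5, s_female=0.1)
print(f"Y: {pulse.y_fraction():.2f}")
print(f"mtDNA: {pulse.mtdna_fraction():.2f}")
print(f"autosomes: {pulse.autosomal_fraction():.2f}")
print(f"X: {pulse.x_fraction():.2f}")
print(f"male:female ratio: {pulse.male_to_female_ratio():.1f}")
```

Under this model the autosomes alone cannot distinguish a sex-biased pulse from a balanced one with the same average. That is why comparing Y with mtDNA, or autosomes with X, carries the signal.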

The Munda arrived in India 4,000 years ago (probably)

I didn’t plan to talk about the Munda any time soon, in part because I recently wrote a post, The Munda as upland rice cultivators, which outlined my views. But there is a new preprint with new samples which attempts to estimate admixture times using genome-wide data. You can see the results above, and, also note that they found similar estimates using Y chromosome SNP variation around haplogroup O2a1.

The preprint is, The genetic legacy of continental scale admixture in Indian Austroasiatic speakers:

Surrounded by speakers of Indo-European, Dravidian and Tibeto-Burman languages, around 11 million Munda (a branch of Austroasiatic language family) speakers live in the densely populated and genetically diverse South Asia. Their genetic makeup holds components characteristic of South Asians as well as Southeast Asians. The admixture time between these components has been previously estimated on the basis of archaeology, linguistics and uniparental markers. Using genome-wide genotype data of 102 Munda speakers and contextual data from South and Southeast Asia, we retrieved admixture dates between 2000 – 3800 years ago for different populations of Munda. The best modern proxies for the source populations for the admixture with proportions 0.78/0.22 are Lao people from Laos and Dravidian speakers from Kerala in India, while the South Asian population(s), with whom the incoming Southeast Asians intermixed, had a smaller proportion of West Eurasian component than contemporary proxies. Somewhat surprisingly Malaysian Peninsular tribes rather than the geographically closer Austroasiatic languages speakers like Vietnamese and Cambodians show highest sharing of IBD segments with the Munda. In addition, we affirmed that the grouping of the Munda speakers into North and South Munda based on linguistics is in concordance with genome-wide data.

There is a weird pattern of affinities in the f3 statistics and IBD sharing in this preprint. I think the explanation they give, that Vietnamese and Cambodians have been subject to later admixture, probably accounts for it. In the case of the Vietnamese, it’s southern Chinese ancestry. In the case of the Cambodians…it might be Indian ancestry! This might strike you as strange, but the Indian ancestry in the Cambodians may be more enriched for the West Asian component that’s not found in the Munda specifically: the element brought in by the Indo-Aryans.
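For readers unfamiliar with f3 statistics, the admixture form f3(C; A, B) averages (c - a)(c - b) over SNPs. Here a, b and c are allele frequencies in two reference populations and a target. A clearly negative value indicates that C looks like a mixture of populations related to A and B.

This sketch uses made-up allele frequencies and should be read as an illustration of the sign logic only. It omits the small-sample bias correction and block-jackknife standard errors that real implementations such as ADMIXTOOLS apply.

```python
import random


def clip(p):
    """Keep an allele frequency inside [0, 1]."""
    return min(max(p, 0.0), 1.0)


def f3(target, source_a, source_b):
    """Uncorrected f3(C; A, B): mean over SNPs of (c - a) * (c - b)."""
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("all populations need the same SNPs")
    total = sum((c - a) * (c - b) for c, a, b in zip(target, source_a, source_b))
    return total / len(target)


rng = random.Random(1)
n_snps = 5000
# Hypothetical allele frequencies for two diverged reference populations.
pop_a = [rng.random() for _ in range(n_snps)]
pop_b = [clip(p + rng.gauss(0, 0.2)) for p in pop_a]

# An idealized 80/20 mixture of A and B with no drift after admixture.
admixed = [0.8 * a + 0.2 * b for a, b in zip(pop_a, pop_b)]
# An unadmixed target: A plus a little independent drift of its own.
unadmixed = [clip(a + rng.gauss(0, 0.05)) for a in pop_a]

f3_admixed = f3(admixed, pop_a, pop_b)
f3_unadmixed = f3(unadmixed, pop_a, pop_b)
print(f"f3(admixed; A, B)   = {f3_admixed:+.4f}")    # negative
print(f"f3(unadmixed; A, B) = {f3_unadmixed:+.4f}")  # positive
```

For the idealized mixture, each SNP contributes -0.16 × (a - b)², so the statistic is guaranteed to be negative. Drift in the target after admixture pushes it back toward zero, which is why real tests need standard errors.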

The peninsular Malay groups are “proto-Malays,” and these groups tend to be somewhat higher in AASI-like ancestry as well as lower in Austronesian ancestry. High shared drift tendencies with Lao and groups in more isolated areas of Malaysia may be a function of the fact that these are less cosmopolitan populations, with less Indian and Chinese ancestry, than other mainland Southeast Asians and Malays proper.

Click to enlarge

These results are broadly in line with the Narasimhan et al. preprint, which is cited within it. In that preprint the Reich group outlines its general model, where modern South Asians can be thought of as a compound of several ancestral populations of different affinities. The Munda in particular are enriched for “Ancient Ancestral South Indian” (AASI) ancestry relative to any other group. The hypothesis given is that the Southeast Asians mixed first with an AASI group which lacked admixture with West Asians, and then mixed again with “Ancestral South Indians,” who had some West Asian (“Iranian Farmer”) ancestry.

Since ALDER-based methods, last I checked, tended to pick up the most recent admixture event, the more recent dates for northern Munda groups make sense. Looking at the Y chromosomes, it is pretty clear to me that some of the East Asian ancestry in Bengali-speaking agriculturalists in the lower Gangetic plain is from Munda groups. Conversely, some of the Munda probably admixed with populations from the west practicing intensive rice agriculture, which apparently did not become a feature of the landscape until after 1000 BC.
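The logic behind ALDER-style dating can be shown with a toy. Admixture creates linkage disequilibrium between ancestry-informative SNPs. That LD decays roughly as A × exp(-n × d), where d is genetic distance in Morgans and n is the number of generations since admixture.

ALDER itself computes a weighted LD statistic across many SNP pairs and fits it with jackknife standard errors. This sketch is much simpler:

- It generates a noiseless curve for a hypothetical n.
- It recovers n with a log-linear least-squares fit, after subtracting a background assumed to be known.
- It converts generations to years, assuming 28 years per generation.

Older pulses decay faster with distance, so at the distances usually fitted a single exponential tends to be dominated by the most recent event.

```python
import math


def ld_curve(distances, amplitude, generations, background=0.0):
    """Idealized admixture LD at each distance d (Morgans): A * exp(-n * d) + c."""
    return [amplitude * math.exp(-generations * d) + background for d in distances]


def fit_generations(distances, ld_values, background=0.0):
    """Estimate n as minus the least-squares slope of log(LD - c) on distance.

    Assumes the background c is known; real methods have to estimate it.
    """
    ys = [math.log(v - background) for v in ld_values]
    mean_x = sum(distances) / len(distances)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    var = sum((x - mean_x) ** 2 for x in distances)
    return -cov / var


# Distances from 0.5 cM to 20 cM, expressed in Morgans.
distances = [(0.5 + 0.5 * i) / 100 for i in range(40)]
true_n = 100             # hypothetical generations since admixture
background = 1e-6        # hypothetical known baseline LD
curve = ld_curve(distances, amplitude=1e-4, generations=true_n, background=background)
n_hat = fit_generations(distances, curve, background=background)
years_per_generation = 28  # assumed generation interval
print(f"{n_hat:.1f} generations, roughly {n_hat * years_per_generation:.0f} years")
```

With noisy real data and an unknown background, a direct nonlinear fit of all three parameters is the usual choice. The log-linear version is used here only because it is short and exact on clean input.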

One of my points in the post above on the Munda is that the shared vocabulary of the Austroasiatic languages indicates that their speakers were upland rice farmers. This is exactly the modern distribution of the Munda. One hypothesis is that the Munda once occupied the bottomlands and were driven into the hills by people from the west and south. I no longer believe this. Rather, the Munda may always have preferred the uplands, and so traversed the flat lands between the Khasi hills and the Chota Nagpur plateau. This preference for uplands may strike us as strange, but it’s not that rare. Yankee farmers in Ohio preferred upland zones, even though these were less agriculturally rich (farmers moving up from the South didn’t have this aversion).

A point observed and implied in the preprint is that the expansions of the Indo-Aryans, Dravidians, and Munda all seem to have happened rather close in time. Though the northwest of the subcontinent seems to have had a long-standing settled agricultural society by 3000 BC, its expansion was limited by climatic restrictions on its crop toolkit. But by 2500 BC it seems pastoralists were already pushing down from the Punjab into the Deccan via the dry zone on the eastern edge of the Thar. The Toda people of the far south of India are probably representative of the lifestyle of these peoples, who were Dravidian-speaking.

A few centuries after this period is probably when the proto-Munda began pushing out of Southeast Asia. The DNA evidence is pretty strong this was a hugely male-skewed event once it got beyond the Khasi hills. Why? My hypothesis is that these were not quite small-scale peoples. Perhaps the male-mediation of a lot of gene flow in South Asia is due to the emergence of militarized confederacies where elite lineages engaged in conquest of territory from native groups. The Munda have very low frequencies of R1a, and very high frequencies of O2a. The admixture with Dravidian and Indo-Aryan speaking peoples that occurred between 2000 BC and 0 AD was probably overwhelmingly female-mediated.

The narrative above suggests that most of the genetic changes that produced the present-day landscape of South Asia occurred between 2500 BC and 500 BC. About 2,000 years. And yet agriculture of some form arrived in Mehrgarh in western Pakistan 9,000 to 7,500 years ago, depending on which dates you trust. What took so long? Similarly, millet and rice agriculture in China is 7,000 years old, but only around 4,000 years ago did rice farmers start pushing south (and probably west in the case of the Munda).

I’ll present the hypothesis here that this coincidence wasn’t a coincidence, and that certain things related to social complexity have a characteristic rate of change. In general I agree with economic historians who say that our need to posit an “Industrial Revolution” or a “Neolithic Revolution” is something of an imposition, because humans don’t want to think quantitatively. It probably takes small-scale societies a certain amount of time to move from hunting and gathering to full-blown agriculture. It then takes more time to reach the social complexity that enables migration driven by more than simple natural increase and Malthusian expansion. Mainland India beyond what is today Pakistan, and much of Southeast Asia, were “filled up” by agricultural peoples around the same time, after a long incubation to the west and north, because similar social forces were at play.

South Asian Genotype Project, Summer 2018 Update


I’ve put another update on the South Asian Genotype Project. Make sure to go to the “projectmembers v2” sheet. If you’ve contributed since March, check it out.

Again, if you are interested, send me a 23andMe, Ancestry, MyHeritage, or Family Tree DNA raw genotype file to contactgnxp -at- gmail.com.

In the subject please put:

  1. “South Asian Genotype Project”
  2. The state/province your family is from
  3. Ethnolinguistic group
  4. If applicable, caste

I decided to do some poking around with some of the higher-quality samples people have given me: 180,000 SNPs with a near-zero genotype missing rate. I also removed “relatives.” That means that a lot of Muslim groups from Pakistan had individuals dropping out. In the PCA above you can see 4 Burushos left! Not too many Pathans either.
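The missingness filtering and PCA steps can be sketched in NumPy. This is a minimal illustration, not necessarily how these particular plots were made. It rests on these assumptions:

- Genotypes are coded as 0/1/2 copies of one allele, with -1 for missing calls.
- The missingness thresholds are hypothetical.
- Relatedness filtering is left out entirely.
- Each SNP is standardized by its allele frequency before the SVD, similar in spirit to EIGENSOFT's smartpca normalization.

The two toy populations are simulated.

```python
import numpy as np


def qc_filter(genotypes, max_snp_missing=0.02, max_sample_missing=0.02):
    """Drop SNPs, then samples, whose missing-call rate exceeds a threshold.

    genotypes: (n_samples, n_snps) int array coded 0/1/2, -1 = missing.
    Thresholds are illustrative assumptions, not recommended values.
    """
    keep_snps = (genotypes < 0).mean(axis=0) <= max_snp_missing
    g = genotypes[:, keep_snps]
    keep_samples = (g < 0).mean(axis=1) <= max_sample_missing
    return g[keep_samples], keep_samples


def genotype_pca(genotypes, n_components=2):
    """PCA on genotypes standardized by per-SNP allele frequency."""
    g = genotypes.astype(float)
    g[g < 0] = np.nan
    freq = np.nanmean(g, axis=0) / 2.0
    # Monomorphic SNPs carry no information and would divide by zero.
    poly = (freq > 0) & (freq < 1)
    g, freq = g[:, poly], freq[poly]
    centered = g - 2.0 * freq
    centered = np.where(np.isnan(centered), 0.0, centered)  # mean-impute missing
    standardized = centered / np.sqrt(freq * (1.0 - freq))
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: two hypothetical populations drifted apart from a shared base.
rng = np.random.default_rng(0)
n_snps = 2000
base = rng.uniform(0.1, 0.9, n_snps)
freq_1 = np.clip(base + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
freq_2 = np.clip(base + rng.normal(0, 0.1, n_snps), 0.01, 0.99)
genotypes = np.vstack([
    rng.binomial(2, freq_1, size=(30, n_snps)),
    rng.binomial(2, freq_2, size=(30, n_snps)),
])
genotypes[rng.random(genotypes.shape) < 0.001] = -1  # sprinkle missing calls

filtered, kept = qc_filter(genotypes)
pcs = genotype_pca(filtered)
labels = np.array([1] * 30 + [2] * 30)[kept]
pc1_pop1 = pcs[labels == 1, 0]
pc1_pop2 = pcs[labels == 2, 0]
print(pc1_pop1.mean(), pc1_pop2.mean())
```

Filtering SNPs before samples matters. If samples were filtered first, a handful of bad SNPs could push many samples over the missingness threshold. The sign of PC1 is arbitrary: only the separation between groups is meaningful.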

Click to enlarge!

First, I decided to look at the Brahmin samples I had.

– Uttar Pradesh, Bihar, and the Gujarati Brahmin(s) I had are one cluster
– South Indian Brahmins (mostly Iyer) are another

To my surprise, the two Maharashtra Brahmins that I have are firmly in the South Indian cluster. The Bengali Brahmin is more like the North Indians. But there is a subtle skew toward the distant Bangladesh cluster. This individual seems less East Asian than even the typical Bengali Brahmin, but I think Bengali Brahmins can be modeled as North Indian Brahmin with non-Brahmin (and therefore East Asian) ancestry.

Click to enlarge!

Next, I wanted to look at Gujaratis. The 1000 Genomes has a large number of samples from this population…but there’s no group identity label. Years ago Zack Ajmal of Harappa DNA concluded that a large and relatively related cluster in these data were “Patels.” Someone who is a Bohra Muslim of presumably Patel background sent me their data. They did not fall in the Patel cluster. Rather, they were in the “Gujurati_ANI_1” group, which is more like Pakistanis than other Gujaratis. In fact, the Gujarati Brahmin is not in this cluster. An individual who is Solanki seems to be more ASI-shifted, like the Patels and Gujurati_ANI_4.

Overall, Gujarat has a lot of population structure in a rather small state (yes, I can’t spell Gujarat as you can see in my population labels).

Click to enlarge!

From Maharashtra, directly south of Gujarat in western India, I have two Brahmins and one Kayastha. For non-South Asians, my understanding is that Kayasthas are literate non-Brahmin castes. In Bengal, they take the place of the Kshatriya in the caste hierarchy, and with Brahmins formed the traditional Hindu educated classes. I have seen Bengali Kayastha genotypes, and they look rather like other Bengalis. (My mother’s father’s family was Kayastha before their conversion to Islam, judging from their customary surname.)

There are Kayasthas in other parts of South Asia. I have a Kayastha sample from Maharashtra. Curiously on the PCA this individual is in the same position as the two Brahmins from the region, and South Indian Brahmins. I don’t know what this means.

Click to enlarge!

Next, some odds and ends from the northwest of the subcontinent. I have a few Jatts who are not related. This group from Punjab is quite ANI-shifted. Someone who claims to be a Rajput from Rajasthan is where they should be on account of geography. The Punjabi 1000 Genomes group is quite diverse. I have a Ramgarhia individual who seems to be somewhere between Punjabi_ANI_1 and Punjabi_ANI_2. The Jatts are on the edge (ANI-shifted) of Punjabi_ANI_1.

I have two individuals who claim to be Kashmiri. A Butt and a Syed. I have no idea what that means. But both are Punjabi_ANI_2…but they look somewhat East Asian shifted. This is not surprising. Trans-Himalayan populations tend to be. The curious thing about Kashmiris is that they are culturally and geographically quite distinct from Indians to their south. But genetically they are not so different. In fact, they are “more South Asian” (ASI) than Jatt, and considerably more than Iranian speaking groups like Pathans.

Finally, there is a Marwari individual. This community is from Rajasthan, though they occupy a mercantile role across the subcontinent. Strangely (or not?) they are very close to the Patels. Much more ASI-enriched than the Rajput.

Click to enlarge!

Shifting to South Indian samples, I plotted the Chamar with them; I believe these samples were collected from Uttar Pradesh in the north. These Dalits actually seem to cluster with a subset of the 1000 Genomes Tamil and Telugu samples that I believe are Scheduled Caste (Dalit) as well. The Chamar are somewhat distinct. They are more ANI-shifted. But notice that the bulk of Tamils and Telugus are still more ANI-shifted than the Chamars are! This surprised me.

I have some Velama individuals, as well as a Reddy from Andhra Pradesh, and a Padmashali. All these individuals are in the main distribution of South Indians. I do have a Mudaliar Tamil sample, and this individual is placed among the Chamars. Though not really in the Tamil Scheduled Caste group.

Click to enlarge!

Finally some odds & ends. The Nasrani samples from Kerala are between the South Indian Brahmins and middle caste South Indians. I suspect this is due to the origin of the Nasranis in the Nair community, who have mixed some with Brahmins. The Vania sample from Gujarat is clustered with South Indian Brahmins. The Dusadhs, an agricultural group from Uttar Pradesh and Bihar, that is depressed in some manner in relation to the dominant groups (Google says so), are not quite Chamars, but they are ASI-shifted.

Some of you will be asking about admixture. I ran K = 4 unsupervised on the data set. You can find it here.

Rakhigarhi sample doesn’t have steppe ancestry (probably “Indus Periphery”)

We’ve been waiting for two years now, and it looks like they’re about to pull the trigger, Indus Valley People Did Not Have Genetic Contribution From The Steppes: Head Of Ancient DNA Lab Testing Rakhigarhi Samples:

Niraj Rai, the head of the Ancient DNA Laboratory at Lucknow’s Birbal Sahni Institute of Palaeosciences (BSIP), where the DNA samples from the Harappan site of Rakhigarhi in Haryana are being analysed, has revealed that a forthcoming paper on the work will show that there is no steppe contribution to the DNA of the Harappan people….

“It will show that there is no steppe contribution to the Indus Valley DNA,” Rai said. “The Indus Valley people were indigenous, but in the sense that their DNA had contributions from near eastern Iranian farmers mixed with the Indian hunter-gatherer DNA, that is still reflected in the DNA of the people of the Andaman islands.” He added that the paper based on the examination of the Rakhigarhi samples would soon be published on bioRxiv (pronounced “bio-archive”), a preprint repository of papers in the life sciences.

At this point none of this is surprising. I also wonder if this preprint was hastened by the release of The Genomic Formation of South and Central Asia. It seems that the results here are totally consonant with what came before. My expectation is that the lone sample they got genetic material out of will be similar to the “Indus Periphery” (InPe) individuals in the earlier preprint: a mix of West Asian ancestry, strongly shifted toward eastern Iran, and indigenous South Asian “hunter-gatherer” ancestry. That’s pretty much what Niraj Rai states in the piece. I think genetically the individual won’t be that different from the Chamars of modern-day Punjab.

In fact, Rai, the lead researcher, ends by twisting the knife:

In other words, the preprint observes that the migration from the steppes to South Asia was the source of the Indo-European languages in the subcontinent. Commenting on this, Rai said, “any model of migration of Indo-Europeans from South Asia simply cannot fit the data that is now available.”

A major caveat here is that we’re talking about one sample from the eastern edge of the Indus Valley Civilization (IVC). I’m not sure that this should adjust our probabilities that much. From all the other things we know, as well as copious ancient DNA from Central Asia, our probability for the model which the Rakhigarhi result aligns with should already be quite high.

Again, since it’s one sample, we need to be cautious…but I bet once we have more samples from the IVC the Rakhigarhi individual will probably be enriched for AASI relative to other samples from the IVC. The InPe samples in The Genomic Formation of South and Central Asia exhibited some variation, and it’s likely that the IVC region was genetically heterogeneous.

But this is going to be a DNA sample from an individual who lived 4,600 years ago within the orbit of the IVC in its mature phase. That’s still a big deal. As most of you know, the IVC belongs to prehistory because we haven’t deciphered the seals associated with this civilization. But the IVC clearly had relationships with West Asia and Central Asia, with parts of eastern Iran and the BMAC culture both influenced by and interacting with it. Traders who were likely from the IVC seem to be mentioned in Mesopotamian records.

Additionally, the genetics of one individual can be highly informative if it’s high-quality whole-genome data (I’m skeptical of that in this case). One could possibly even identify the time period that admixture between West Asian and AASI components occurred from a single genome, by looking at ancestry tract lengths.
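Here is a toy version of the tract-length idea. It assumes:

- A single admixture pulse T generations before the individual lived.
- Perfectly phased and perfectly inferred local ancestry.
- A genome treated as one long stretch of about 70 Morgans, roughly two haploid copies of a human genome of about 35 Morgans.

Under that model, tracts from a source contributing a fraction m of ancestry are approximately exponential in length, with mean 1/((1 - m) × T) Morgans. T can therefore be read off the mean tract length. The pulse age, ancestry fraction, and 28-year generation interval are all hypothetical.

```python
import random


def simulate_tracts(total_morgans, generations, fraction_a, rng):
    """Toy single-pulse model of local ancestry along one long haplotype.

    Potential ancestry switch points fall as a Poisson process with rate
    `generations` per Morgan; each segment between them is independently
    labeled "A" with probability `fraction_a`, else "B". Adjacent segments
    with the same label merge into one tract. Returns (label, length) pairs.
    """
    tracts = []
    position = 0.0
    while True:
        remaining = total_morgans - position
        length = rng.expovariate(generations)
        last = length >= remaining
        if last:
            length = remaining
        label = "A" if rng.random() < fraction_a else "B"
        if tracts and tracts[-1][0] == label:
            tracts[-1] = (label, tracts[-1][1] + length)
        else:
            tracts.append((label, length))
        if last:
            return tracts
        position += length


def estimate_generations(tracts, label, fraction):
    """Invert the pulse-model expectation: mean length = 1 / ((1 - m) * T)."""
    lengths = [length for lab, length in tracts if lab == label]
    mean_length = sum(lengths) / len(lengths)
    return 1.0 / ((1.0 - fraction) * mean_length)


rng = random.Random(42)
true_generations = 150      # hypothetical pulse age in generations
fraction_a = 0.6            # hypothetical share of ancestry from source A
genome_morgans = 70.0       # roughly two haploid copies of a ~35 Morgan genome
tracts = simulate_tracts(genome_morgans, true_generations, fraction_a, rng)
t_hat = estimate_generations(tracts, "A", fraction_a)
years_per_generation = 28   # assumed generation interval
print(f"estimated generations: {t_hat:.0f}")
print(f"admixture roughly {t_hat * years_per_generation:.0f} years before the individual lived")
```

Real ancient genomes are often low coverage and unphased, so this idealization is a best case rather than a realistic expectation.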

A single sample isn’t going to falsify the idea held by some that steppe peoples were long present within the IVC. Perhaps they’ll show up in other samples? That’s possible, and it’s what I would argue if I held their position, but I think the constellation of evidence on the balance now does suggest that a relatively late incursion into South Asia is likely. The steppe ancestry with Northern European affinities shows up in BMAC only around 4,000 years ago. It is hard to imagine it was in South Asia before it was in Central Asia.

As I’ve been saying for a while it seems that though there will be more genetic work written on India in the near future, the real analysis is going to have to come out of archaeology and mythology.

It’s pretty clear that in Northern Europe the arrival of the Corded Ware peoples from the steppe zone resulted in great tumult. A linguistic analysis suggests that the languages of Northern Europe have words related to agriculture with a non-Indo-European origin, of common provenance.  But we don’t have much in the way of mythos about the arrival of the Corded Ware.

In contrast, India has a rich mythos which seems to date to the early period of the arrival of the Indo-Aryans. One interpretation has been that since these myths seem to take as a given that Indo-Aryans were autochthonous to India, they were. But the genetic data seem to strongly suggest that the arrival of pastoralists in South Asia was concomitant with their arrival in West Asia, and somewhat after their expansion westward into Europe. Indian tradition and mythos could actually be a window into the general process of how these pastoralists dealt with native peoples, and an illustration of the sort of cultural synthesis that often occurred.

The population genomics of South Asia is complicated, and politics doesn’t make it easier


Many people have been sending me links to this article, By rewriting history, Hindu nationalists aim to assert their dominance over India. Here’s a key section:

The RSS asserts that ancestors of all people of Indian origin – including 172 million Muslims – were Hindu and that they must accept their common ancestry as part of Bharat Mata, or Mother India. Modi has been a member of the RSS since childhood. An official biography of Culture Minister Sharma says he too has been a “dedicated follower” of the RSS for many years.

Sharma told Reuters he expects the conclusions of the committee to find their way into school textbooks and academic research. The panel is referred to in government documents as the committee for “holistic study of origin and evolution of Indian culture since 12,000 years before present and its interface with other cultures of the world.”

Sharma said this “Hindu first” version of Indian history will be added to a school curriculum which has long taught that people from central Asia arrived in India much more recently, some 3,000 to 4,000 years ago, and transformed the population

There are several threads here. First, it is a fact that the ancestors of South Asia’s non-Hindus were Hindu. There are minor exceptions, such as the Parsis, who are ~75% Iranian. One can quibble as to whether many tribal and peasant populations were truly Hindu in a formal and explicit sense. But I think this is a semantic dodge. Muslims would recognize these beliefs and practices as Hindu, no matter if one was a Brahmin monk or a member of a tribe which still sacrificed animals.

I’ve looked at the genotypes of a fair amount of South Asians of Muslim background. The overwhelming (usually exclusive) proportion of their ancestry is South Asian. It’s a fact that the ancestors of non-Hindu South Asians were Hindu.

But, the article and a dominant theme in Hindu nationalism today are that distinctive historically important groups like Indo-Aryans are indigenous to South Asia. This is set against a narrative of invasions and migrations from the outside, which is presumed more friendly to a multicultural paradigm (I have a hard time keeping track of the political valence of all these things). To some extent, the reality of invasions and migrations cannot be denied, whether it be Alexander, the Kushans, or the various Muslim groups. But these historical invasions left little genetic imprint.

When 2009’s Reconstructing Indian Population History was published, our understanding of the impact of the earlier migrations changed. By the time the ancient Greeks were recording observations of India in Classical Antiquity, it was already noted as the most populous nation in the world, so later invaders were a small addition to a huge population. I was initially skeptical about the result in Reconstructing Indian Population History that there was massive admixture between West Eurasians (ANI) and indigenous South Asians (ASI), because that would imply massive migration. Additionally, the pigmentation genes didn’t seem to work out phenotypically if the source population was European-like.

Nearly 10 years on we have a lot more clarity. Ancient DNA has changed our understanding of the past. Massive migrations were common. And, the pigmentation and genetic profile of modern Europeans is recent, within the last 4,000 years. The source population(s) for “Ancestral North Indians” (ANI) may not have been Europeans in the way we’d understand them. In fact, a follow-up paper, Genetic Evidence for Recent Population Mixture in India, hinted at two admixtures. There’s a fair amount of circumstantial evidence now that one component of “Ancestral North Indian” relates to West Asian populations and another component to the more classical steppe Indo-Aryans. The former is more widespread across the subcontinent than the latter, which is concentrated in the northwest and among upper castes.

I do understand Indians who want to interpret their own history through the lens of their own cultural priors. The problem is that genetic science has proceeded so fast in the last few years that many propositions which were speculative in the 20th century are testable in the 21st century. Some Hindu nationalist friends and acquaintances express embarrassment and worry about the path Indian nationalists are taking. I don’t know what to say, but Americans have their own delusions and blithe acceptance of propaganda, so I’m not going to be the one pointing fingers. Other Indians have told me via Facebook that they “believe in the results from the 2000s” (when they were more congenial to their viewpoints?). I guess that’s one strategy; just keep up with the science until it starts refuting your model.

Read More

The Dravidianization of India

On this week’s The Insight Spencer Wells and I talk about the Indo-Aryan arrival to South Asia. This was recorded very early last summer, and I’m rather unguarded (it’s well before I had the piece published in India Today).

I think 2018 will finally be the year that a lot of South Asia will be “solved.” There has been some foot-dragging on papers and results, but that can only go so long.

All that being said, I suppose I should make some suppositions I have arrived at on this topic more explicit. In a discussion, an Indian friend who reads this weblog admitted he had no idea about some of my views, even though I had expressed them here. That’s because they are speculative and my confidence in them is weak, though you can infer my opinions if you look very closely.

The figure to the left is from Genomic insights into the origin of farming in the ancient Near East, a paper published about a year and a half ago. You see various South Asian populations being modeled as a mixture of four different source populations:

- The Onge are an Andaman Islander population (and the closest we can get to the aboriginal peoples of South Asia).
- Iran_N represents Neolithic Iranians, the canonical “eastern farmer” population.
- Steppe_EMBA represents Yamnaya pastoralists, who are themselves modeled as a mixture of Eastern European Hunter-Gatherers (EHG) and a southern population with affinities to the Iran_N cluster.

EHG in turn seem to exhibit ancestry from Western European Hunter-Gatherers (WHG), whose heritage dates to the late Pleistocene. They also carry ancestry from Ancient North Eurasians (ANE), who flourished in Siberia and contributed ancestry to populations to the west and east (including the ancestors of Native Americans).

When I first saw this specific figure I was incredulous. I had long thought that “Ancestral North Indians” (ANI) were a compound of two elements, one related to the farmers of West Asia (Iran_N), and the other steppe Indo-European (Steppe_EMBA/Yamnaya). But the fraction of Yamnaya/Indo-European/Indo-Aryan ancestry seemed far too high.

Since then I have become less certain about my skepticism. The exact fractions are debatable. Within the text of the paper, the authors acknowledge that the true ancestral populations are probably not represented in the model. But they are close. In most cases, the “Han” ancestry is probably indicative of the fact that the non-ANI component of South Asian ancestry is most closely related to the Onge, but is significantly different nonetheless.

The ratio of Iran_N to Steppe_EMBA is the key. Here is a selection from the paper:

Group           Iran_N  Steppe_EMBA  Ratio (Iran_N/Steppe_EMBA)
Jew_Cochin      0.53    0.23         2.27
Brahui          0.60    0.30         1.98
Kharia          0.13    0.07         1.97
Balochi         0.57    0.32         1.75
Mala            0.23    0.18         1.25
Vishwabrahmin   0.25    0.20         1.21
GujaratiD       0.29    0.28         1.04
Sindhi          0.38    0.38         1.00
Bengali         0.22    0.25         0.91
Pathan          0.36    0.45         0.81
Punjabi         0.24    0.33         0.72
GujaratiB       0.27    0.38         0.72
Lodhi           0.21    0.29         0.72
Burusho         0.27    0.43         0.64
GujaratiC       0.23    0.37         0.61
Kalash          0.29    0.50         0.58
GujaratiA       0.26    0.46         0.57
Brahmin_Tiwari  0.23    0.44         0.51

Any way you slice it, a group like the Tiwari Brahmins of Northern India has more Onge-like ancestry than most of the groups in Pakistan. But also observe that their ratio is more skewed toward Steppe_EMBA than even that of the Pathans or Kalash. The Lodhi, a non-upper caste population from Uttar Pradesh in north-central South Asia, are more skewed toward Steppe_EMBA than Pathans.

It is important for me to reiterate that the key is to focus on ratios and not exact percentages. Though the Steppe_EMBA fraction did strike me as high, glimmers of these sorts of results were evident in model-based clustering approaches as early as 2010. The population in the list above most skewed toward Iran_N is the Cochin Jews. This group has known Middle Eastern ancestry. But next on the list are the Brahui, a Dravidian-speaking group in Pakistan. There is a north-south cline within Pakistan, with northern populations (Burusho) skewed toward Steppe_EMBA and southern ones (Sindhi) skewed toward Iran_N. Additionally, Iranian groups such as Pathans and Baloch likely have had some continuous gene flow with Middle Eastern groups, probably inflating their Iran_N.
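Since the argument rests on ratios, here is a small sketch that encodes the table. It ranks groups by the listed Iran_N/Steppe_EMBA ratio and checks a few of the comparisons made above.

It also recomputes each ratio from the rounded fractions. These differ slightly from the listed values: Kharia, for example, comes out near 1.86 rather than 1.97. The listed ratios were presumably computed from unrounded estimates, which is one more reason not to over-read small differences. The helper names are hypothetical.

```python
# Rows from the table above: (group, Iran_N, Steppe_EMBA, listed ratio).
TABLE = [
    ("Jew_Cochin", 0.53, 0.23, 2.27),
    ("Brahui", 0.60, 0.30, 1.98),
    ("Kharia", 0.13, 0.07, 1.97),
    ("Balochi", 0.57, 0.32, 1.75),
    ("Mala", 0.23, 0.18, 1.25),
    ("Vishwabrahmin", 0.25, 0.20, 1.21),
    ("GujaratiD", 0.29, 0.28, 1.04),
    ("Sindhi", 0.38, 0.38, 1.00),
    ("Bengali", 0.22, 0.25, 0.91),
    ("Pathan", 0.36, 0.45, 0.81),
    ("Punjabi", 0.24, 0.33, 0.72),
    ("GujaratiB", 0.27, 0.38, 0.72),
    ("Lodhi", 0.21, 0.29, 0.72),
    ("Burusho", 0.27, 0.43, 0.64),
    ("GujaratiC", 0.23, 0.37, 0.61),
    ("Kalash", 0.29, 0.50, 0.58),
    ("GujaratiA", 0.26, 0.46, 0.57),
    ("Brahmin_Tiwari", 0.23, 0.44, 0.51),
]

listed_ratio = {group: ratio for group, _, _, ratio in TABLE}


def more_steppe_skewed(group_a, group_b):
    """True if group_a has a lower listed Iran_N/Steppe_EMBA ratio than group_b."""
    return listed_ratio[group_a] < listed_ratio[group_b]


# Most Steppe_EMBA-skewed first; the recomputed column uses the rounded
# two-decimal fractions, so it can differ a little from the listed ratio.
for group, iran_n, steppe, listed in sorted(TABLE, key=lambda row: row[3]):
    print(f"{group:<15} listed {listed:.2f}  recomputed {iran_n / steppe:.2f}")

print(more_steppe_skewed("Brahmin_Tiwari", "Pathan"))  # True
print(more_steppe_skewed("Lodhi", "Pathan"))           # True
print(more_steppe_skewed("Burusho", "Sindhi"))         # True
```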

Trends I see in the data:

  1. There is a north-south cline within Pakistan with Steppe_EMBA vs. Iran_N
  2. There is a north-south cline within South Asia with Steppe_EMBA vs. Iran_N
  3. There is caste stratification within regions between Steppe_EMBA vs. Iran_N
  4. Though not clear in this table, there are strong suggestions that Indo-European speaking groups tend to be enriched in Steppe_EMBA, all things equal (e.g., the Bengalis in the 1000 Genomes look a lot like the middle-caste Telugus in the 1000 Genomes when you remove the East Asian ancestry…except for a noticeable small fraction of a component which I think points to Indo-European ancestry)

What does this mean in terms of a model of the settlement of South Asia over the past 4,000 years? One conclusion I have come to is that Dravidian-speaking groups are not the aboriginal peoples of the subcontinent. Rather, their settlement across much of South Asia is very recent, almost as recent as Indo-Aryan habitation. In First Farmers the archaeologist Peter Bellwood proposed this model, whereby Indo-Aryans and Dravidians both expanded across South Asia concurrently. Though I think some elements of Bellwood’s model are incorrect, in my opinion it is far more correct than I believed when I first encountered it.

Why do I believe this?

  1. The Neolithic begins in South India around 3000 BC
  2. Sri Lanka is predominantly Indo-European speaking
  3. The Dravidian languages of South India don’t seem particularly diverged from each other
  4. There is ancestry/caste stratification in South India even excluding Brahmins (e.g., Reddys and Naidus in Andhra Pradesh look somewhat different from Dalits and tribals)
  5. Some scholars claim that there isn’t a Dravidian substrate in the Gangetic plain
  6. R1a1a-Z93, almost certainly associated with Indo-Aryans, is found in South Indian tribal populations
  7. Using LD-based methods, researchers are fairly confident that the last admixture events between ANI and ASI (“Ancestral South Indians”) populations occurred ~4,000 years ago
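The LD-based dating in the last point rests on a simple idea: after a single pulse of admixture, the linkage disequilibrium it creates between markers decays roughly as exp(-n·d), where n is the number of generations since mixing and d is the genetic distance in Morgans, because every generation of recombination breaks ancestry tracts into smaller pieces. The sketch below fits that decay to a noise-free synthetic curve and converts generations to years. The 28-year generation time, the simulated pulse 140 generations ago, and the plain log-linear fit are all assumptions for illustration. Published tools such as ALDER work with weighted LD statistics on real, noisy data and fit a more elaborate model, so this is a toy version of the principle, not their method.

```python
import math

# Assumption for illustration: an average human generation of ~28 years.
GENERATION_YEARS = 28


def expected_admixture_ld(d_morgans, generations, amplitude=1.0):
    """Single-pulse model: admixture LD at genetic distance d decays as A * exp(-n * d)."""
    return amplitude * math.exp(-generations * d_morgans)


def estimate_generations(distances, ld_values):
    """Fit log(LD) = log(A) - n * d by ordinary least squares and return n."""
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(distances) / len(distances)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    var = sum((x - mean_x) ** 2 for x in distances)
    return -cov / var  # the slope of log(LD) against d is -n


# Synthetic, noise-free curve: distances from 0.5 cM to 10 cM (0.005 to 0.10 Morgans).
distances = [0.005 * i for i in range(1, 21)]
curve = [expected_admixture_ld(d, generations=140, amplitude=0.01) for d in distances]

n_hat = estimate_generations(distances, curve)
print(f"generations ~ {n_hat:.0f}, years ~ {n_hat * GENERATION_YEARS:.0f}")
```

This recovers 140 generations, or about 3,900 years at 28 years per generation. With real data the curve has noise and a background LD level, which is why a straight log-linear fit like this one is not enough in practice.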

Here is my revised model, as succinctly as I can outline it. The northwest fringes of South Asia, today Pakistan and later the home of the Indus Valley Civilization (IVC), were populated by a mix of indigenous populations, a form of ASI, when West Asian agriculturalists arrived ~9,000 years ago from what is today Iran. These were the Iran_N or “eastern farmer” groups. The West Asian agricultural toolkit was serviceable in northwestern South Asia for reasons of climate and ecology, but it could not expand further east and south for thousands of years.

This is where the first admixture occurred, producing a population that was mixed between ANI and ASI. These people lacked Steppe_EMBA. They were pre-Indo-European. They were almost certainly not all Dravidian speaking. The Burusho people of northern Pakistan, for example, speak a language isolate (elsewhere in the subcontinent you have Nihali in India and Kusunda in Nepal).

By ~3000 BC this proto-South Asian (in a modern sense) population began to expand, while the IVC matured and waxed. Eventually, the IVC waned, fragmented, and disappeared.

Around ~2000 BC, or perhaps somewhat later, Indo-Aryans arrive in South Asia. The situation at this stage is not one of a primordial and static Dravidian India on top of which Indo-Aryans place themselves. Rather, it’s a dynamic one, as the collapse of the IVC has opened up a disordered power vacuum and a reconfiguration of cultural and sociopolitical alliances.

In the paper above the author alludes to the pervasiveness of both Iran_N and Steppe_EMBA ancestry in South Asia, including in South India. “Indo-European” Y chromosomal lineages are also found among many South Indian groups, albeit at attenuated proportions region-wide. In Peter Turchin’s formulation, I believe that “Indo-Aryan” and “Dravidian” identities became meta-ethnic coalitions in the post-IVC world. Genetically the two groups are different, on average. But some Dravidian populations assimilated and integrated Indo-Aryan tribes and bands, while Indo-Aryans as newcomers assimilated many Dravidian populations.

The reason that the ratio of Iran_N to Steppe_EMBA does not decline monotonically as one goes from west to east along the North Indian plain is that Indo-Aryans were not expanding into a Dravidian India. Dravidian India was expanding only somewhat ahead of Indo-Aryan India, and in some places not at all. In the northwest fringe of South Asia there had long been a settled population of peasants with West Asian ancestry with Iran_N affinities. In contrast, to the east the landscape was populated by nomadic tribal populations with ASI affinities. North Indian Brahmins may have more Steppe_EMBA than some populations in Pakistan, and more ASI, because they descend from Indo-Aryan groups who absorbed indigenous ASI populations as they expanded across the landscape.
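The argument here turns on a simple property of mixture proportions: absorbing a population with only ASI ancestry raises the ASI fraction but leaves the Iran_N to Steppe_EMBA ratio untouched, whereas absorbing Iran_N-rich peasants with no steppe ancestry pushes that ratio up. The toy calculation below illustrates this. Every source composition in it is a made-up round number, not an estimate from the table, and treating each new group as a linear blend of its sources is a simplifying assumption (it is, however, how admixture proportions combine).

```python
# Toy ancestry compositions; every number here is hypothetical, chosen for round arithmetic.
INDO_ARYAN_BAND = {"Iran_N": 0.40, "Steppe_EMBA": 0.50, "ASI": 0.10}
ASI_FORAGERS = {"Iran_N": 0.0, "Steppe_EMBA": 0.0, "ASI": 1.0}
NW_PEASANTS = {"Iran_N": 0.60, "Steppe_EMBA": 0.0, "ASI": 0.40}  # pre-Indo-European, no steppe


def mix(a, b, frac_b):
    """Ancestry of a population formed from (1 - frac_b) of source a and frac_b of source b."""
    keys = set(a) | set(b)
    return {k: (1 - frac_b) * a.get(k, 0.0) + frac_b * b.get(k, 0.0) for k in keys}


def iran_to_steppe(pop):
    """Ratio of Iran_N to Steppe_EMBA ancestry; lower means more steppe-skewed."""
    return pop["Iran_N"] / pop["Steppe_EMBA"]


# The same newcomers absorb 40% local ancestry in two different settings.
east = mix(INDO_ARYAN_BAND, ASI_FORAGERS, 0.4)
west = mix(INDO_ARYAN_BAND, NW_PEASANTS, 0.4)

for name, pop in [("source", INDO_ARYAN_BAND), ("east", east), ("west", west)]:
    print(f"{name:6s} ASI={pop['ASI']:.2f} Iran_N/Steppe_EMBA={iran_to_steppe(pop):.2f}")
```

The eastern group ends up with both more ASI (0.46 vs. 0.22) and a lower Iran_N to Steppe_EMBA ratio (0.80 vs. 1.60) than the western one, which is the combination the paragraph attributes to North Indian Brahmins relative to some populations in Pakistan.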

Dravidian groups also assimilated indigenous populations as they expanded. This explains why some groups have very high fractions of ASI: their ASI ancestry is a compound of an old admixture in Northwest India and later assimilation in South India. The presence of R1a1a-Z93 in these populations reflects the integration of some originally Indo-Aryan groups into the expanding Dravidian wavefront.

Where does this leave us?

  1. The Indo-Aryan vs. Dravidian dichotomy is not one of newcomers vs. aboriginals. It is of two different sociocultural configurations which came into their current shape in the waning days of the IVC. That is, it is less than 4,000 years old
  2. The two populations were clearly interacting closely around the time of the collapse and disintegration of the IVC and post-IVC societies. There has been gene flow between the two
  3. ~4000 years ago ANI and ASI populations existed in their “pure” form, but that is because ASI aboriginals still existed to the south and east of the IVC, while Indo-Aryans were a new intrusive presence in the Indian subcontinent