


The following table illustrates what I’m talking about:

The cultural-historical debate is whether the Austro-Asiatic languages are indigenous to South Asia or not. The balance of the evidence now seems to be that they are not. What likely occurred is that the Austro-Asiatic languages waxed with the rise of an agricultural diaspora, whose locus of origin was in what is today the southern regions of China proper. More precisely, the Austro-Asiatic languages may have spread with rice farming across Southeast Asia and eastern South Asia. Likely they were the first on the scene in Southeast Asia, since Bellwood reports in First Farmers and First Migrants that archaeology and anthropometrics can detect admixture between the farmers arriving from the north and native hunter-gatherers in places like the Red River valley in northern Vietnam ~4,000 years ago. The frequency of O2a1-M95 for regions and populations is subdivided very precisely in the above paper, and it is clear that in island Southeast Asia its proportions match those in an earlier paper on autosomal inferences of Austro-Asiatic ancestry. Populations in eastern Indonesia and in the Philippines have minimal numbers of males carrying lineages of O2a1-M95, while the densely populated island of Java has frequencies of ~50%.
The clincher for why O2a1-M95, and therefore Austro-Asiatic populations, are likely exogenous to India genetically would be the genetic diversity of the lineages. In short, there is tentative evidence from microsatellite variation that the coalescence of the diverse lineages in Laos is the deepest by a few thousand years. But another paper from a few years back makes my confidence in these results higher: Population Genetic Structure in Indian Austroasiatic speakers: The Role of Landscape Barriers and Sex-specific Admixture, which presented autosomal data that I found very persuasive. In particular, the derived variant of EDAR, which is present at very high frequencies among Northeast Asian and Amerindian populations, is present at ~5% frequency among Munda groups. Among Dravidian populations in South India, according to the 1000 Genomes Browser, the frequency is less than 1%, while it is absent among populations in Northwest India, aside from those with clear East Asian admixture.
Next we address the issue of the Dravidian languages. A new paper in Human Genetics, West Eurasian mtDNA lineages in India: an insight into the spread of the Dravidian language and the origins of the caste system, points to an association between particular mtDNA lineages in South India and southern Iran, in particular the region which was once inhabited by the Elamites, who have been posited to have an association with the Dravidian languages. I don’t put particular stock in the philological association between Dravidian languages today and Elamite; I can’t judge it with any degree of certainty or competency. But the genetic data is certainly suggestive. Here’s the portion which is relevant:
The autochthonous subhaplogroups—HV14a1 and U1a1a4 uniquely found in contemporary Dravidian speakers share their ancestry primarily with the Near East-Iran populations (Derenko et al. 2013). The coalescence times of HV14a1 and U1a1a4 were estimated to be ~10.5–17.9 kya. The shared ancestry of the Dravidian of South India and Iranian of Near East populations has been shown in the HV14 and U1a1 phylogeny (Fig. 1a) and their time estimates are consistent with the proto-Elamo-Dravidian language diffusion hypothesis which emphasized that the proto-Dravidian language evolved over 15 kya, specifically in western Asia before the beginning of agricultural development ~11 kya. This language was introduced by Neolithic pastoralists, and was thought to be associated with the spread of these west Eurasian-specific mtDNAs to peninsular India (Pagel et al. 2013). The Y-chromosome haplogroup L1a has added a further dimension to this hypothesis. The subclades of haplogroup L such as L1a, L1b, and L1c were found predominantly in Iranian populations of western Asia (Grugni et al. 2012). In India, only the L1a lineage was observed and was largely restricted to the Dravidian-speaking populations of south India (Sahoo et al. 2006; Sengupta et al. 2006). The coalescence time (~9.1 kya) (Sengupta et al. 2006) and the virtual absence in Indo-Aryan speakers in north indicate that the L1a lineage arrived from western Asia during the Neolithic period and perhaps was associated with the spread of the Dravidian language to India.
It has long been presumed that the Dravidian languages are primal to South Asia. But that was before modern genomics revolutionized our understanding of Indian genetic history. More or less all South Asian populations are a fusion between a deeply indigenous strain which has distant affinities to the peoples of eastern Eurasia (ASI), and a group very close to the ones typically found in Western Eurasia (ANI). There are no pure indigenes. South Indian tribal populations, who are presumed to be the closest to indigenous groups, are at least ~25% ANI, if not more. To presume that the Dravidian languages are indigenous to South Asia one would have to assume that this exogenous element was absorbed by the cultural substrate, something I find implausible on cross-cultural grounds (more dominant South Asian social elites, even ones of pure Dravidian extraction, such as the Reddy group, have higher fractions of ANI). Additionally, Dravidian languages themselves are not particularly variegated, as one might expect if there were deep local structure, as is the case in inland Papua and pre-Columbian America.
Of course the title of this post has to do with males, so with that, let’s look back to a paper which was first posted on the web last year (though finally “published” this March), The phylogenetic and geographic structure of Y-chromosome haplogroup R1a. Here’s the important part:
…Using the 8 R1a lineages, with an average length of 48 SNPs accumulated since the common ancestor, we estimate the splintering of R1a-M417 to have occurred rather recently, ~5800 years ago (95% CI: 4800–6800). The slowest mutation rate estimate would inflate these time estimates by one-third, and the fastest would deflate them by 17%.
With reference to Figure 1, all fully sequenced R1a individuals share SNPs from M420 to M417. Below branch 23 in Figure 5, we see a split between Europeans, defined by Z282 (branch 22), and Asians, defined by Z93 and M746 (branch 19; Z95, which was used in the population survey, would also map to branch 19, but it falls just outside an inclusion boundary for the sequencing data). Star-like branching near the root of the Asian subtree suggests rapid growth and dispersal. The four subhaplogroups of Z93 (branches 9-M582, 10-M560, 12-Z2125, and 17-M780, L657) constitute a multifurcation unresolved by 10 Mb of sequencing; it is likely that no further resolution of this part of the tree will be possible with current technology. Similarly, the shared European branch has just three SNPs.
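To make the mutation-rate caveat in the quoted passage concrete, here is a minimal sketch of the arithmetic in Python. It only rescales the published point estimate by the two factors the quote states (one-third longer for the slowest rate, 17% shorter for the fastest); the function and constant names are my own illustrative choices, not anything from the paper, and it does not attempt to reproduce the authors' actual dating method.

```python
# Point estimate for the splintering of R1a-M417, from the quoted passage
# (years before present).
POINT_ESTIMATE = 5800

# Scaling factors as stated in the quote (the names are illustrative):
# the slowest mutation rate inflates the estimate by one-third,
# the fastest deflates it by 17%.
SLOW_RATE_FACTOR = 4 / 3
FAST_RATE_FACTOR = 1 - 0.17


def rescale(years, factor):
    """Scale a time estimate by a mutation-rate factor, rounded to whole years."""
    return round(years * factor)


def tmrca_range():
    """Return the point estimate under the slowest, published, and fastest rates."""
    return {
        "slowest_rate": rescale(POINT_ESTIMATE, SLOW_RATE_FACTOR),
        "published": POINT_ESTIMATE,
        "fastest_rate": rescale(POINT_ESTIMATE, FAST_RATE_FACTOR),
    }


if __name__ == "__main__":
    for label, years in tmrca_range().items():
        print(f"{label}: ~{years} years ago")
```

Under these stated factors the point estimate moves between roughly 4,800 and 7,700 years ago, so even the slowest rate keeps the split well within the Holocene.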
The authors emphasize that the TMRCA has a wide confidence interval. I don’t think so. There’s now a fair amount of work on sequencing R1b and R1a lineages which are very common across Eurasia, and one thing is clear: they’re star-shaped phylogenies which likely reflect massive and relatively recent population expansions (see A recent bottleneck of Y chromosome diversity coincides with a global change in culture). Additionally, they note that the “Asian” (which includes South, Central, and Southwest Asia) and the European branches of R1a1a are relatively well separated, and that the greatest diversity of R1a1a can be found in Iran.
I doubt that R1a1a was associated with one ethno-linguistic group at the end of the last Ice Age. It is present at relatively high frequencies in low caste and tribal populations in South India, so I am skeptical of an exclusive association with Indo-Europeans, though in Europe it may actually be that it arrived only with Indo-Europeans. But the fact that R1a1a is so common all across Eurasia points to a genetic-cultural revolution. Just as Haplogroup O2a1 is almost certainly rooted in populations outside of South Asia before the Holocene, so is the case with R1a1a. They came with groups of men who brought a new dominant lifestyle. From the west came wheat and cattle. From the east, rice.
The latest research suggests about half the ancestry of modern South Asians dates to the Pleistocene. That is, it predates 10,000 BC. The majority of the mtDNA lineages are from this ancestral element. But culturally this group likely had minimal influence. One question which comes to mind is whether the ASI ancestry is from many groups, or from only a few which were assimilated into an expanding group of agriculturalists. If the former, then one expects the ASI ancestral segments to exhibit a tendency toward regional structure. I suspect, though, that this is not the case, and that the genetic landscape of modern India is characterized by overlapping populations which are all hybrids of different regional groups which only recently expanded. The pattern of Munda groups in South Asia, surrounded by Dravidian and Indo-European speaking groups, points to the possibility that these groups were pioneers of some sort, but eventually lost out.
* Language isolates like Kusunda and Nihali may date to the era before the Holocene, but without relatives we can’t really make a good guess. Possible relationships of Kusunda to Andaman or Papuan languages strike me as implausible due to the time depth of separation.


