Computational linguistic phylogenetics and Indo-Europeans

A new paper, Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages, has made a splash by inferring a far older date of diversification of these languages than has been assumed by other linguists, archaeologists and geneticists. As you can see above, the splits start a bit earlier than 5000 BC in this model, 1,500-2,000 years before the “classic” Pontic-steppe hypothesis. There are divergences in the topology from what some have assumed, for example, the deep split of Indo-Iranian from other groups. Nemets in Proto-Indo-European Urheimat Debate has given some skeptical thoughts, while Iosif Lazaridis of the “Southern Arc” fame has also offered his two cents. Others on social media have pointed out what seem to be important errors in the paper:

I can’t speak to the linguistics. I will say that I was taught about Bayesian phylogenetic inference in graduate school, so I know some of the models and parameters they’re using, and I’ve even used BEAST 2 myself. It’s not my specialty at all, so I have weak intuition, but these are serious methods that allow for understanding the past and reconstruction of evolutionary relationships. But I will pass on Asya Pereltsvaig’s criticism in The Indo-European Controversy: Facts and Fallacies in Historical Linguistics that the reliance on lexicon as input data might be a major problem in these inferences; misleading data in, misleading result out. But I’ll comment on a few issues that jumped out at me, informed by ancient DNA.

First, the position of Tocharian is not surprising…it often comes out as diverging early from the other languages. Tocharian languages were found in the northern and northeastern regions of the Tarim basin. Historically, the southern rim of the basin was dominated by Iranian languages. It seems the most likely candidate for the people that gave rise to the Tocharian languages is the Afanasievo culture. We now know the Afanasievo were basically an eastern branch of the Yamnaya that shows up in the Altai around 3300 BC. This is 5,300 years BP. In the paper, the Tocharians split from other Indo-Europeans 5,400 to 8,600 years BP over a 95% confidence interval. The only way this makes sense to me is if there was deep linguistic structure within the Yamnaya despite overall genetic homogeneity maintained through mate exchange. In the text the authors seem to imply that the Tocharians are an early eastward migration, perhaps from the south Caucasus region. This does not align very well with the ancient DNA. The Afanasievo early on are replicas of Yamnaya. Were the Tocharians already there? Did the Afanasievo just adopt their language?
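The date comparison above is simple arithmetic, and a minimal sketch makes it explicit. The helper name is hypothetical, and it assumes “BP” is counted from roughly AD 2000, which is how 3300 BC becomes 5,300 BP here (the radiocarbon convention of AD 1950 would give 5,250). The interval bounds are the ones quoted from the paper.

```python
def bc_to_bp(year_bc, present=2000):
    """Convert a BC year to years before present (round figures, no year-0 correction)."""
    return year_bc + present

# Afanasievo appearance in the Altai, as dated in the text
afanasievo_bp = bc_to_bp(3300)

# 95% interval for the Tocharian split quoted from the paper, in years BP
tocharian_split_bp = (5400, 8600)

# How far the archaeological date sits from the youngest edge of the interval
gap_to_youngest_edge = tocharian_split_bp[0] - afanasievo_bp
inside_interval = tocharian_split_bp[0] <= afanasievo_bp <= tocharian_split_bp[1]

print(afanasievo_bp, gap_to_youngest_edge, inside_interval)
```

On these assumptions the Afanasievo date falls about a century after the youngest end of the inferred split interval, which is why the split would have to predate the eastward migration.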

The second issue I have broadly is with the Indo-Iranians. The authors propose that the Indic and Iranian branches separated in 3,500 BC. While earlier work indicates that the Indo-Iranian languages descend from the Sintashta language and the cultures of the Andronovo horizon, these authors emphasize the role of populations from the south Caucasus traversing Iran south of the Caspian Sea.

Below is a map that gets to the crux of my confusion about this ancient date and longstanding indigeneity of Indo-Iranians on the Iranian plateau:

We have some histories of the Middle Eastern Bronze Age. We know that the area of southwest Iran was dominated by the non-Indo-European Elamites as early as 3000 BC, and these people persisted down into the Common Era. Modern Armenia was dominated by non-Indo-European speaking Urartians after 1000 BC. This language is related to Hurrian, documented 4,000 years ago. Before the Indo-European Hittites ruled Hatti, the Hattians ruled Hatti. And the Hattians were not Indo-European. Judging by the obscure Eteocretan language that persisted into antiquity, the Minoans were almost certainly not Indo-European speaking. Around 1500 BC it is true the ruling elite of the Hurrians, the Mitanni, seem to have had an Indo-Aryan connection, but they were also likely intrusive, and their emergence as mobile warriors suspiciously post-dates the development of the light war chariot and the domestic horse thousands of miles to the north several centuries earlier by the Sintashta. The Assyrian royal annals date the arrival of Persians to the 9th century BC, but the results in the paper imply that the Iranians were already present in the Zagros for thousands of years before this (the south Caucasus being the Indo-European ur-heimat ultimately, the Indo-Iranians moving south and east very early on from that region).

I’m focusing on the Middle East because there is a rich history of textual evidence starting in the third millennium BC. These results imply that Indo-European languages are in fact native to the northern Middle East, in the southern Caucasus. And yet assorted obscure languages like Gutian, Kassite and Kaska are found where you might expect a stray Indo-European here and there. To me this is curious and weird. Further to the west, these results seem to imply that Greek was brought with Caucasus ancestry, but Minoan was likely not Indo-European. There are all these non-Indo-European languages attested in the textual record…and only a few Indo-European ones (Hittite being the first).

One of the major points of this paper that contradicts some theories in historical linguistics is a rejection of the tentative connection between Balto-Slavic and Indo-Iranian. Genetically, the curious aspect of the two language families is that Y chromosomal haplogroup R1a is very frequent in both, but differentiated into two lineages that seem to have diverged 5,500-6,000 years ago. But there is more than just Y chromosomes here; over the past decade autosomal genome analyses have shown that many South Asians, in particular those in the northwest and upper caste populations, are enriched for a minority ancestral component that resembles Eastern Europeans. We now know what happened due to ancient DNA: Genetic ancestry changes in Stone to Bronze Age transition in the East European plain. A branch of the Corded Ware Culture (CWC) migrated eastward, becoming the Fatyanovo Culture, then the Balanovo Culture, then the Abashevo Culture, and finally the Sintashta Culture. The Sintashta seem to have given rise to a group of societies known as Andronovo that are hypothesized to have evolved into Iranians and Indo-Aryans.

The result here does away with all this. Rather than Indo-Aryan speech being brought by steppe pastoralists between 3,500 and 4,000 years ago, as genetics would imply, Indo-Aryan speech was likely present during the Indus Valley Civilization. These results imply that Indo-Aryan arrived in India thousands of years before the intrusion of steppe pastoralists, and it was carried eastward by farmers from the Caucasus. The Vedas and Sanskrit then come down from the IVC. And yet strangely the Vedas do not depict a very complex society like the IVC, but a simpler agro-pastoralist one. And Sanskrit, presumably the sacred language of the IVC people, was maintained in particular by a Brahmin priestly caste that is notable for having a very high fraction of steppe ancestry, much of which arrived later.

A massive issue with this paper is that it makes a hash of a major phenomenon that we know occurred between 3500 and 2500 BC, and that’s the spread of steppe people in all directions, especially out of the Corded Ware complex. The CWC are notable for having a major admixture of Globular Amphora Culture (GAC) Neolithic ancestry, about 25-35% of their genetics, and then spreading in all directions. As noted by the authors and other observers, ancient DNA suggests that Anatolian, Armenian, and perhaps Greek and Illyrian (Albanian), are exceptions to this, deriving directly from Yamnaya or pre-Yamnaya (in the case of Hittites) Indo-European people (remember, CWC is a mix of Yamnaya and GAC). The genetics is very clear that a major wave of post-CWC people went into Asia, and south into the Indian subcontinent and Iran. The Y chromosomes imply this was male mediated, and post-CWC Y chromosomes are found in appreciable quantities as far south as Sri Lanka. But these data place this demographic migration far too late to have been the origin of Sanskrit, which is associated with Arya culture.

As Lazaridis points out on social media, the divergence of European language groups like Germanic, Celtic, Italic and Balto-Slavic also predates the CWC expansion westward. For example, Italic language split off in 3500 BC, 500 years earlier than the expansion of CWC into Eastern Europe, with a 95% lower-bound of 2200 BC, about when steppe ancestry shows up in the Italian peninsula according to ancient DNA. If the dates are true then it seems that the various Indo-European language groups were differentiated already very early on in the Yamnaya, and not later on through their expansion across Europe. In other words, this is a model of “ancient linguistic substructure.”

To be entirely candid, it’s very hard for me to reconcile the ancient DNA with this topology and time-depth in a parsimonious manner that holds together in my head. This doesn’t mean the other models don’t have holes; the “Southern Arc” theory is pretty complicated too, and everything would have been “easier” if the Hittites had steppe ancestry, and they do not seem to. But there are too many things that are hard for me to understand with this new model. For example, the vast numbers of steppe Iranian people seem to be mostly descended from the CWC societies that gave rise to Europeans, but their languages diverged extremely early from their western neighbors, almost 2,000 years before the diversification of the CWC as an archaeological, demographic and genetic unit.

Dan Davis prehistory YouTuber par excellence

I’ve been on the record as being skeptical of a lot of content being generated on YouTube, but I think the author Dan Davis does a really great job. The topics overlap with a lot of my interests. For example, he has videos on the Corded Ware, the Sintashta, and the koryos. Since he relies on a lot of the primary scholarship I do, I can evaluate his representation, and it’s all true, and all accurate. Of course, it’s video, so I recommend you read the primary sources at some point. But Davis’ 15-30 minute videos are a reasonable length and relatively well-produced, so they are an excellent introduction.

Davis is the author of two novels about the early Indo-Europeans, Godborn and Thunderer. You can also get the prequel novel free on his website.


For some pieces on my Substack I’ve been re-reading a lot of the stuff on the ancient genetics and archaeology of Eurasia as they relate to Indo-Europeans. This means I get a different view from usual…as it’s more synoptic. I’m not entirely clear on the dates or archaeology, but here is what I’ve concluded: the Indo-European expansions can be partitioned into “waves.” That is, they weren’t a simple “demic diffusion” where disease (against their rivals) and reproductive excess generated a continuous expansion across their range.

So here’s what I get:

1 – An “early phase” where Yamna people push west (Kurgan) and become Corded Ware, and east (far) and become Afanasievo. Date this to right before 3,000 BC, but pretty much “completes” in Europe by 2900-2800 BC, as the broad zone of Central and Northeast Europe is dominated by these people (there are still debates on whether Afanasievo became the “Tocharians”; I think they did)

2 – ~2500 BC, 400-500 years after the initial push west, Indo-European populations push beyond their limits on the Rhine, and break through past the mountains ringing the Southern European peninsulas. The dates are often vague in the south, but it looks to be around 2500 to 2000 BC. For example, the Neolithic farmer descended Remedello Culture in northern Italy ends about 2400 BC. The Bell Beaker Indo-Europeans seem to have arrived in Ireland and England at just about this time, perhaps a century after they came to dominate France.

Though there were obviously islands of exception (often quite literally as in Sardinia and Crete), Europe by 2000 BC was Indo-European.

3 – The third wave dates to after 2000 BC, and it is the “Asia reflux.” Populations used the forest-steppe zone as a stepping stone out to the east. Derived from the same synthesis between Yamna and European farmer as Corded Ware, these populations seem ancestral to the Indo-Iranians. Slavic-speaking people (or the ancestors of those people) occupied the western fringe of this expansion zone, and by the Iron Age had begun to move east, marginalizing Indo-Iranians across much of their core European territory.

It seems that Indo-Iranians had pushed into the margins of northeast Iran, Khorasan, by ~2000 BC. In the period between 2000-1500 BC they clearly began to occupy their historical core zones in Iran and India. Obviously, Indo-European Iranians are present in western Iran by 1000 BC in the historical record, though Indo-European Mitanni are present by 1540 BC at the latest in Syria and northern Iraq.

The Iranians also moved into the Tarim basin, so the cities of the west and southern edge were Iranian-speaking (the cities of the north and east were Tocharian).

What explains these pulses? I don’t know totally, but we know a few things:

– There are star phylogenies on the Y chromosome associated with these migrations: R1b, R1a, and I1. I think the last is due to the assimilation of non-Indo-European men in Europe, but the first two are clearly primal. The Indo-Europeans were clearly very patrilineal.

– The last, Asian, migration clearly has something to do with chariots and horses. The coincidence in timing seems too much. But the earlier migrations were before chariots (I believe). But, the horse does seem to have come with Indo-Europeans, so there was a level of mobility involved.

– The “Bell Beaker” motif seems to have emerged among non-Indo-Europeans in Iberia, and spread to Indo-Europeans, who expanded outward. I think we’re seeing something related to religion.

Unfortunately, I suspect that the Indo-European advantage was “social technology”, not material technology. Social technology is hard to infer in a preliterate society.

Question for readers: Can you nail down the chronology better? Those who know archaeology?

The enormous demographic impact of the Indo-Europeans

When I was a kid I remember seeing a map of the distribution of Indo-European languages, and being perplexed by their spread and distribution, from the North Sea to the Bay of Bengal. Later, I learned and understood that language families can spread by diffusion and cultural assimilation. In The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World David Anthony outlines an elite emulation model of Indo-Europeanization, whereby groups of warriors associated with the Kurgan cultures took over and reshaped a broad range of societies.

The samples Anthony provided were instrumental in recalibrating his own model. It turns out that steppe migrants were extremely genetically impactful. Rather than a small minority, many archaeological cultures seem to have been predominantly steppe in genetic origin (total number of ancestors). The best estimates seem to be that ancestry from the steppe is somewhat more than half the total in northern and eastern Europe, and somewhat less than half in southern and western Europe (i.e., northeast to southwest cline).

More recently, it also seems that a substantial, though smaller, proportion of the ancestry in southern Asia also derives from the steppe peoples. Within India itself, the range seems to be from 25-30% among some groups, such as North Indian Brahmins and Jatts, to a more typical range between 5 and 15% (peasant castes in South India are closer to 5%, peasant castes in the Gangetic plain are closer to 15%).

Using the proportions in various ethnic groups in the Indian subcontinent, as well as across European nations, I have come to conclude that ~10% of the ancestry in the world derives from people who were members of the “Yamna Horizon” ~3000 BC.* I don’t know the archaeology well enough to be highly informed, but I’m willing to bet that closer to 1% of the world’s population lived in and around the Yamna Horizon, so over the last 5,000 years, you’ve seen a 10-fold increase in representation of this ancestral component. More concretely, I think that the vast majority of the increase occurred between 2500 BC (when expansions into Britain and Southern Europe seem to have occurred) and 1000 BC (when the core area of the Indian subcontinent was Aryanized).

* I did stuff like weighted caste groups in Uttar Pradesh, looked at the populations of India states, added Pakistan and Bangladesh, as well as assigning estimates to European countries. I did some back-of-the-envelope for North and South America (e.g., assume that 50% of the ancestry is Iberian, and assume that 25% of that 50% is steppe).
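The back-of-the-envelope method in the footnote amounts to a population-weighted average plus a couple of products and ratios. Here is a minimal sketch of the mechanics. The function name and the example population figures are hypothetical placeholders chosen only to show the calculation; they are not the numbers behind the ~10% estimate.

```python
def weighted_fraction(groups):
    """Population-weighted mean of an ancestry fraction.

    groups: list of (population, steppe_fraction) pairs.
    """
    total_population = sum(pop for pop, _ in groups)
    return sum(pop * frac for pop, frac in groups) / total_population

# Nested proportions from the footnote: 50% Iberian ancestry, 25% of that steppe.
americas_steppe = 0.50 * 0.25

# Fold increase implied by the text: ~10% of ancestry now vs ~1% of people then.
share_now_pct, share_then_pct = 10, 1
fold_increase = share_now_pct / share_then_pct

# HYPOTHETICAL inputs (population in millions, steppe fraction), not real estimates.
example_groups = [(100, 0.30), (300, 0.10)]

print(americas_steppe, fold_increase, weighted_fraction(example_groups))
```

The same function could be applied per country or per caste group and then again across regions, which is the nested weighting the footnote describes.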

Migration at the roof of West Asia


The figure to the left is from The genetic prehistory of the Greater Caucasus. If you are a regular reader of this weblog, or Eurogenes, you can figure out what’s going on, and keep track of the terminology. But in 2018 I think we’re getting to the end of the line in making sense of “admixture graphs” in relation to West Eurasian population structure. The models are just getting too complicated to keep everything straight, and the distinct-populations-subject-to-pulse-admixture seems to be an assumption that may not necessarily hold.

To get a sense of what I’m talking about, the above preprint focuses on populations in and around the Caucasus region. One of the major reasons that this is important is that the Caucasus was and is to some extent a continental hinge, connecting Eastern Europe and the Pontic steppe, to the Near East. The Arab Muslims pushed north of the Caucasus, and came into conflict with the Khazars, while Cimmerians and Scythians moved south from the Pontic steppe.

The elephant in the room is the relevance to the “Indo-European controversy.” Colin Renfrew long ago posited that the Indo-European languages derive from West Asian farmers who expanded into Europe as early as ~9,000 years ago. A rival theory is that Indo-Europeans spread out of the Pontic steppe ~4,000 years ago. In 2015 two major papers suggested that the steppe was a major source of Indo-European expansion. Case closed? This preprint suggests perhaps not.

But we’ll get to that later. What do the results here show? The prose is a little hard to tease apart, but the major finding seems to be that in antiquity, or at least the period they’re focusing on, much of the gene flow ran from the south (Near East) to the north (through the Caucasus, and out to the north slope). To some extent, we already knew this: the Yamna people of the Pontic steppe have “southern” ancestry from the Near East that earlier East European/Pontic people do not. In this preprint, the authors show that groups such as the Maykop of the north slope of the Caucasus carry Y haplogroups such as G2, and not the R1 lineages commonly found in the steppe. David W. suggests that this confirms that Near Eastern gene flow into the steppe was female-mediated. This is plausible, but I would caution that Y chromosomes alone can be deceptive, due to the power of particular patrilineages. We’ll probably rely on the X chromosome to make a final judgment.

The plot below shows many of the relationships as a function of location and time. The green component is modal among “Iranian farmers,” the orange among “Anatolian farmers,” and the blue among “Western hunter-gatherers.”

A major aspect of this preprint is that it has to work hard to differentiate two Anatolian farmer-like signals: the first, from Anatolian farmers proper, and the second from the descendants of European farmers, who themselves are a mix of Anatolian farmers with a minority ancestry among the hunter-gatherers. The answers would probably be totally unintelligible if not for archaeology. It’s clear that the steppe people had contact with both European and Near Eastern farmers and that later East European groups that succeeded the Yamna were subject to reflux from Central Europe, and received European farmer ancestry.

Another curious nugget in their results is the early detection of both Ancestral North Eurasian (ANE) ancestry and some East Eurasian gene flow (related to Han Chinese). One of their individuals carries the East Eurasian variant of EDAR, which in Europe today is found only in Finns, though it was found in reasonable frequencies among the Motala hunter-gatherers of Scandinavia. Additionally, Fu et al. 2016 found that the ancestors of Mesolithic hunter-gatherers received some gene flow from Eastern Eurasians as well (also in the supplements of Lazaridis et al. 2016).

The authors admit that there is probably population structure among ANE and undiscovered groups of East Eurasians who were traversing the Inner Asian landscape. I think this is all suggestive of some long-distance contacts, though the intensity and magnitude increased a lot with high-density societies and the mobility of pastoralism.

Much of the genetic mixing in the Near East, and to some extent in the trans-Caucasian region, seems to date to the 4th millennium. This is technically prehistory, but it is also the Uruk period. This was a phase of Mesopotamian culture expansion between 4000 and 3100 BC which resulted in replicas of Uruk style settlements as far away as Syria and southeastern Anatolia. There is even evidence of Uruk-related migration to the North Caucasus.

The Uruk experienced an abrupt collapse. Uruk settlements outside of the core zone of Mesopotamia disappear.

It’s the final paragraph that warrants discussion:

The insight that the Caucasus mountains served not only as a corridor for the spread of CHG/Neolithic Iranian ancestry but also for later gene-flow from the south also has a bearing on the postulated homelands of Proto-Indo-European (PIE) languages and documented gene-flows that could have carried a consecutive spread of both across West Eurasia…Perceiving the Caucasus as an occasional bridge rather than a strict border during the Eneolithic and Bronze Age opens up the possibility of a homeland of PIE south of the Caucasus, which itself provides a parsimonious explanation for an early branching off of Anatolian languages. Geographically this would also work for Armenian and Greek, for which genetic data also supports an eastern influence from Anatolia or the southern Caucasus. A potential offshoot of the Indo-Iranian branch to the east is possible, but the latest ancient DNA results from South Asia also lend weight to an LMBA spread via the steppe belt…The spread of some or all of the proto-Indo-European branches would have been possible via the North Caucasus and Pontic region and from there, along with pastoralist expansions, to the heart of Europe. This scenario finds support from the well attested and now widely documented ‘steppe ancestry’ in European populations, the postulate of increasingly patrilinear societies in the wake of these expansions (exemplified by R1a/R1b), as attested in the latest study on the Bell Beaker phenomenon….

But instead of tackling this let’s focus on the paper that came out of the Willerslev group, The first horse herders and the impact of early Bronze Age steppe expansions into Asia. This is a final manuscript in Science. That means it was probably written before The Genomic Formation of South and Central Asia. When it comes to South Asia, the results from the two publications are consonant. There is no conflict.*

More interesting are the results in West Asia, and the linguistic supplement. In the supplement, the authors note first that tablets now indicate an Indo-Aryan presence in Syria ~1750 BC. Second, Assyrian merchants record Indo-European Hittite, or Nesili (the people of Nesa), as early as ~2500 BC.

As suggested in earlier work Hittite remains don’t suggest steppe influence. David W. says:

The apparent lack of steppe ancestry in five Hittite-era, perhaps Indo-European-speaking, Anatolians was interpreted in Damagaard et al. 2018 as a major discovery with profound implications for the origin of the Anatolian branch of Indo-European languages.

But I disagree with this assessment, simply because none of these Hittite-era individuals are from royal Hittite, or Nes, burials. Hence, there’s a very good chance that they were Hattians, who were not of Indo-European origin, even if they spoke the Indo-European Hittite language because it was imposed on them.

The main point I’d bring up with this is that in other areas steppe ancestry has spread deeply and widely into the population, including non-Indo-European ones. It is certainly possible that the sampling is not broad enough to pick up the genuinely Hittite elite, but I probably lean to the likelihood that the steppe signal won’t be found. It seems that the Anatolian languages were already diversified by ~2000 BC, and perhaps earlier. Linguists have long suggested that they are the outgroup to other Indo-European languages, though this could just be a function of their isolation among highly settled and socially complex populations.

Two alternative models present themselves for these results. First, the Anatolian Indo-European languages expanded through elite diffusion, part of the same general migrations that emerged out of the Yamna culture ~3000 BC. The lack of a steppe signal may be due to sampling bias, as David W. suggested, or, more likely in my opinion, simple dilution of the signal. Second, the steppe migrations were one part of a broader palette of population movements and cultural diffusions, and the Anatolian Indo-Europeans are basal to the efflorescence of the steppe derived branches.

The evidence of the explosion of Indo-Aryans in the years after 2000 BC in West and South Asia, as well as the expansion of Iranians across vast swaths of Inner Asia during the same period, suggests to me that Indo-Iranians are most definitely part of the steppe pulse. The connection to the Sintashta charioteers presents itself, and connections to the Uralic languages indicate incubation in the trans-Volga region.

In West Asia, the Indo-Aryans crashed themselves against the most advanced civilizations of their time. Like the Bulgars, and unlike the Hittites, Indo-Aryan Mitanni was totally absorbed by their non-Indo-European Hurrian substrate. Indo-Aryan linguistic influence was preserved in their names, their gods, and in particular words relating to chariots. And yet in 2017’s Continuity and Admixture in the Last Five Millennia of Levantine History from Ancient Canaanite and Present-Day Lebanese Genome Sequences, the authors observe:

We next tested a model of the present-day Lebanese as a mixture of Sidon_BA and any other ancient Eurasian population using qpAdm. We found that the Lebanese can be best modeled as Sidon_BA 93% ± 1.6% and a Steppe Bronze Age population 7% ± 1.6% (Figure 3C; Table S6). To estimate the time when the Steppe ancestry penetrated the Levant, we used, as above, LD-based inference and set the Lebanese as admixed test population with Natufians, Levant_N, Sidon_BA, Steppe_EMBA, and Steppe_MLBA as reference populations. We found support (p = 0.00017) for a mixture between Sidon_BA and Steppe_EMBA which has occurred around 2,950 ± 790 ya (Figure S13B).

This needs to be explored further. The admixture could have come from many sources. I am curious about the frequency of R1a1a-Z93 among modern-day Syrians and Lebanese.
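To see what calendar window the quoted admixture date covers, here is a minimal sketch. It rests on two assumptions of mine, not the paper’s: that “ya” is counted from the 2017 publication year, and that the ± can be read as a simple plus-or-minus range around the estimate. The helper name is hypothetical.

```python
def years_ago_to_bc(years_ago, reference_year=2017):
    """Convert 'years ago' to an approximate BC year (ignores the missing year 0)."""
    return years_ago - reference_year

# Quoted LD-based estimate: 2,950 ± 790 ya
estimate, error = 2950, 790

central_bc = years_ago_to_bc(estimate)
oldest_bc = years_ago_to_bc(estimate + error)
youngest_bc = years_ago_to_bc(estimate - error)

print(central_bc, oldest_bc, youngest_bc)
```

On these assumptions the window runs from roughly 1700 BC to roughly 150 BC, centered near 930 BC, which is wide enough to be compatible with many possible sources.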

For me these arguments can only be resolved with a deeper understanding of linguistic evolution. The close relationship of Indo-Aryan and Iranian languages is obvious to any speaker of either of these languages (I can speak some Bengali). A divergence in the range of 4 to 5 thousand years before the present seems most likely to me. But the relationship of the other Indo-European languages is much less clear.

One of the arguments in Peter Bellwood’s First Farmers is that the Indo-European languages exhibit a “rake-like” topology with the exception of Indo-Iranian, which forms a clear clade. To him and others in his camp, this argues for deep divergences very early in time.

It is hard to deny that the steppe migrations between 4 and 5 thousand years ago had something to do with the distribution of modern Indo-European languages. But, it is harder to falsify the model that there were earlier Indo-European migrations, perhaps out of the Near East, that preceded these. Only a deeper understanding of linguistic evolution, and multidisciplinary analysis of regional substrates will generate the clarity we need.

* I’m going to skip the Botai angle in this post.

Near Prehistory in Northern Europe was an Indo-European world

The Picts were the topic of discussion this week on In Our Time. They are a mysterious yet intriguing people because we don’t know much about them in their own words, but they are one of the roots of modern Scottish identity. When I first encountered the Picts decades ago there was some debate as to whether they were a pre-Indo-European people or not. Today that seems to not be a hypothesis people entertain. Rather, the Picts were simply the least Romanized of the Brythonic Celtic people of Britain.

Today because of the genetic data I think we can be rather confident that by the time of the Roman Empire there were no non-Indo-Europeans left in Northern Europe. The Beaker people in Britain and Ireland seem to have overwhelmingly replaced the native population of farmers, whose ancestors had predominantly arrived from the eastern Mediterranean thousands of years ago (via the Atlantic littoral or Central Europe). Across Northern Europe, in general, the replacement of the previous populations was substantial, though not total.

In Southern Europe, the arrival of Indo-Europeans was more fitful, and the persistence of Basque attests to the fact that non-Indo-European languages were spoken down to historical times (if Etruscan is considered native to the Italian peninsula, that’s another example, though this is hotly debated and I lean toward the exogenous model). The pre-Latin language of Sardinia was almost certainly not Indo-European, while Greek has a high proportion of non-Indo-European words in its lexicon.


How a Eurasian “band of brothers” shaped the world

When I was eight years old I saw a map which genuinely confused me. I had opened up a deluxe dictionary at my elementary school and saw a map of the world’s language families, and noticed that there was a group of dialects which spanned the Bay of Bengal to the North Sea. In fact, according to this map the language I had first learned to speak, Bengali, was in the same language family as English.

This was hard to wrap my mind around, but there it was in front of me. Further research at the public library confirmed this fact. And, upon further reflection it was obvious to me there were similarities…I had been learning French at school, and English, Bengali, and French, all exhibited similarities in the first ten numbers. English and French I understood in terms of a natural relationship, but Bengali?

My personal and professional interests have never been in domains where I would explore the topic first hand, but the origins of Indo-European languages have always been a hobby. I read books such as The Horse, the Wheel, and Language and In Search of the Indo-Europeans when I could. When taking in excellent works such as Empires of the Silk Road the Indo-European thread was always something I kept in mind.

But the above works take a more old-fashioned Eurasian heartland “marauders from the steppe” viewpoint. Starting about 15 years ago I began to look into a different framework: Indo-Europeans as farmers. For me this begins with the 2002 paper, Mapping the Origins and Expansion of the Indo-European Language Family, which finds that “the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago” (this is the last paper I can remember reading in paper format). The model is elaborated by Peter Bellwood in works such as First Farmers, though he applies it to most language families.

But its origins go back decades, with the archaeologist Colin Renfrew. Rather than dramatic explosions from the steppe, Renfrew and colleagues suggest that the demographic expansion enabled by agriculture as a mode of production allowed for groups like Indo-Europeans to rapidly swamp their neighbors and enter into a process known as a wave of advance. There wasn’t an organized movement. Rather, farming enabled the growth of population to such an extent that it was almost an undirected thermodynamic law that the original farmers would radiate outward, away from zones at the Malthusian carrying capacity and out toward virgin land.

It was a parsimonious theory, and phylogenetic techniques seem to have supported it. But then came ancient DNA to overturn the apple-cart. I won’t rehash what you probably already know, but will point to the two most relevant papers, Massive migration from the steppe was a source for Indo-European languages in Europe and Population genomics of Bronze Age Eurasia. Basically, there was massive population turnover during the early Bronze Age. The genetic data aligned well with predictions you’d make from the old “marauders from the steppe” model, not the demic diffusion of farmers who were subject to high endogenous population growth over time.

Of course the Anatolian model proponents have an answer. There is a thesis whereby the steppe pastoralists derive from Anatolians, and so the European population turnover was of one Indo-European group by another. This is possible, but to my knowledge this model was never foregrounded by Anatolianists before. Rather, it strikes me as a way to “save” their framework.

So far much of the battle has been between archaeologists, who tend to favor gradualism and often even cultural diffusion as opposed to migration, and historical linguists and arriviste geneticists, who tend toward a more classical migration-from-the-steppe perspective.

A new paper in Antiquity, Re-theorising mobility and the formation of culture and language among the Corded Ware Culture in Europe, takes the sledgehammer to the Anatolian hypothesis with an archaeology-first tack. They don’t pull punches:

…the Anatolian hypothesis must be considered largely falsified. Those Indo-European languages that later came to dominate in western Eurasia were those originating in the migrations from the Russian steppe during the third millennium BC.

Why would they say this? There is a major paper coming out:

These local processes of social integration between intruding Yamnaya/Corded Ware populations and remnant Neolithic populations can be applied to language dispersal. We should expect that the transformation from Proto-Indo-European to Pre-Proto Germanic would reveal the same kind of hybridisation between an earlier Neolithic language of the Funnel Beaker Culture, and the incoming Proto-Indo-European language. This is precisely what recent linguistic research has been able to demonstrate (Kroonen & Iversen in press). In their study on the formation of Proto-Germanic in Northern Europe, Kroonen and Iversen document a bundle of linguistic terms of non-Indo-European origin linked to agriculture that were adopted by Indo-European-speaking groups who were not fully fledged farmers.

They also contend that the Neolithic language was roughly the same throughout the zone of Indo-European expansion. From what those who would know about these sorts of things have told me this is plausible, because the Neolithic farmers spread so rapidly from a small founder culture, and exhibited broad Europe-wide similarities for a thousand years. Curiously, the chart shows that Germanic languages may have been influenced by a hunter-gatherer language, which the others were not. I suspect this may have to do with the relatively late persistence of hunter-gatherers in some maritime environments facing the Baltic and North Sea.

The paper, which is open access, needs to be read in full. Here are some important points:

  • Burial type seems to be a more robust indicator of dominant cultural identity
  • Corded Ware males practiced exogamy
  • Corded Ware males traveled long distances
  • Corded Ware culture was initially exclusively pastoralist
  • There is a great deal of circumstantial, and some genetic, evidence that Corded Ware communities were characterized by having women who were clearly from the Neolithic farming population
  • There was intergroup violence as a function of culture
  • The Corded Ware and Neolithic populations persisted near each other geographically, though the Neolithic groups seem to have retreated to uplands
  • The Corded Ware engaged in a wholesale pattern of landscape sculpting, burning down forests to produce pasture

Neolithic Y lineages, such as G2, are far rarer in Northern Europe today than R1a and R1b (in contrast, the hunter-gatherer I seems to have gone through an expansion just like R1a and R1b). We already have a model for what went on here: the Iberian settlement of the New World. Among mestizo populations there are huge skews of mtDNA and Y, with the former almost all Amerindian (with some African) and the latter almost all European (with some African).

The Corded Ware are the ancestors of the German peoples who we see emerge into the light of history during antiquity. What these data are telling us is that the Germans are the product of a massive period of biological and cultural amalgamation and synthesis between indigenous groups and intrusive populations from the steppe. The archaeological data indicate that the intrusion was male mediated. The “battle axe” culture probably lived up to its name. And they weren’t likely exceptional….