
Computational linguistic phylogenetics and Indo-Europeans

A new paper, Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages, has made a splash by inferring a far older date of diversification of these languages than has been assumed by other linguists, archaeologists and geneticists. As you can see above, the splits start a bit earlier than 5000 BC in this model, 1,500-2,000 years before the “classic” Pontic-steppe hypothesis. There are divergences in the topology from what some have assumed, for example the deep split of Indo-Iranian from the other groups. Nemets, in Proto-Indo-European Urheimat Debate, has offered some skeptical thoughts, while Iosif Lazaridis of “Southern Arc” fame has also offered his two cents. Others on social media have pointed out what seem to be important errors in the paper:

I can’t speak to the linguistics. I will say that I was taught about Bayesian phylogenetic inference in graduate school, so I know some of the models and parameters they’re using, and I’ve even used BEAST 2 myself. It’s not my specialty at all, so my intuitions are weak, but these are serious methods that allow for understanding the past and reconstructing evolutionary relationships. I will pass on Asya Pereltsvaig’s criticism in The Indo-European Controversy: Facts and Fallacies in Historical Linguistics that the reliance on lexicon as input data might be a major problem in these inferences: misleading data in, misleading result out. But I’ll comment on a few issues that jumped out at me, informed by ancient DNA.
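To make “lexicon as input data” concrete, here is a minimal sketch, assuming a toy coding scheme rather than the paper’s actual pipeline: a language is a mapping from meanings to sets of cognate-set labels, binary coding turns every cognate set into an independent presence/absence character, and multistate coding keeps a single label per meaning. The meanings, the labels A, B and C, and the tie-breaking rule are all hypothetical. The sketch reproduces the counting argument made in the comments on this post: a loan replacing an inherited word costs two binary changes but only one multistate change.

```python
# Toy sketch: binary vs. multistate coding of cognate data.
# All meanings ("water", "hand") and cognate-set labels ("A", "B", "C")
# are hypothetical; this is not the paper's data or pipeline.

# A lexicon maps each meaning to the set of cognate-set labels attested for it.
ANCESTOR = {"water": {"A"}, "hand": {"C"}}
DAUGHTER_BORROWED = {"water": {"B"}, "hand": {"C"}}      # inherited A replaced by loan B
DAUGHTER_SYNONYM = {"water": {"A", "B"}, "hand": {"C"}}  # B added alongside A


def collect_sets(*lexicons):
    """Union of cognate-set labels per meaning across all lexicons."""
    all_sets = {}
    for lex in lexicons:
        for meaning, labels in lex.items():
            all_sets.setdefault(meaning, set()).update(labels)
    return all_sets


ALL_SETS = collect_sets(ANCESTOR, DAUGHTER_BORROWED, DAUGHTER_SYNONYM)


def binary_encode(lexicon, all_sets):
    """One independent 0/1 character per (meaning, cognate set) pair."""
    return {
        (meaning, label): int(label in lexicon.get(meaning, set()))
        for meaning, labels in all_sets.items()
        for label in sorted(labels)
    }


def count_binary_changes(parent, child, all_sets):
    """Number of 0 <-> 1 flips between two binary-coded lexicons."""
    p = binary_encode(parent, all_sets)
    c = binary_encode(child, all_sets)
    return sum(p[key] != c[key] for key in p)


def multistate_encode(lexicon):
    """One character per meaning holding a single cognate-set label.

    Assumption: if several synonyms are present, keep the alphabetically
    first label, mimicking 'one lexeme per meaning' coding.
    """
    return {meaning: min(labels) for meaning, labels in lexicon.items()}


def count_multistate_changes(parent, child):
    """Number of meanings whose single coded label differs."""
    p, c = multistate_encode(parent), multistate_encode(child)
    return sum(p[meaning] != c[meaning] for meaning in p)


if __name__ == "__main__":
    for name, child in [("borrowing", DAUGHTER_BORROWED),
                        ("synonym gain", DAUGHTER_SYNONYM)]:
        print(name,
              count_binary_changes(ANCESTOR, child, ALL_SETS),
              count_multistate_changes(ANCESTOR, child))
```

Run as a script, it prints "borrowing 2 1" and "synonym gain 1 0": under this toy scheme a loan that replaces an inherited word counts double in the binary coding, while the single-label multistate coding does not register the added synonym at all.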

First, the position of Tocharian is not surprising…it often comes out as diverging early from the other languages. Tocharian languages were found in the northern and northeastern regions of the Tarim basin. Historically, the southern rim of the basin was dominated by Iranian languages. The most likely candidate for the people that gave rise to the Tocharian languages seems to be the Afanasievo culture. The Afanasievo, we now know, were basically an eastern branch of the Yamnaya that shows up in the Altai around 3300 BC, about 5,300 years BP. In the paper, the Tocharian split from other Indo-Europeans is dated to 5,400 to 8,600 years BP (95% credible interval). The only way this makes sense to me is if there was deep linguistic structure within the Yamnaya despite overall genetic homogeneity maintained through mate exchange. In the text the authors seem to imply that the Tocharians were an early eastward migration, perhaps from the south Caucasus region. This does not align very well with the ancient DNA. The early Afanasievo are near-replicas of the Yamnaya. Were the Tocharians already there? Did the Afanasievo just adopt their language?

The second issue I have broadly is with the Indo-Iranians. The authors propose that the Indic and Iranian branches separated around 3500 BC. While earlier work indicates that the Indo-Iranian languages descend from the language of the Sintashta and the cultures of the Andronovo horizon, these authors emphasize the role of populations from the south Caucasus traversing Iran south of the Caspian Sea.

Below is a map that gets to the crux of my confusion about this ancient date and the long-standing indigeneity of Indo-Iranians on the Iranian plateau:

We have some histories of the Middle Eastern Bronze Age. We know that southwest Iran was dominated by the non-Indo-European Elamites as early as 3000 BC, and these people persisted down into the Common Era. Modern Armenia was dominated by non-Indo-European-speaking Urartians after 1000 BC; their language is related to Hurrian, documented 4,000 years ago. Before the Indo-European Hittites ruled Hatti, the Hattians ruled Hatti. And the Hattians were not Indo-European. Judging by the obscure Eteocretan language that persisted into antiquity, the Minoans were almost certainly not Indo-European speaking. It is true that around 1500 BC the ruling elite of the Hurrians, the Mitanni, seem to have had an Indo-Aryan connection, but they were also likely intrusive, and their emergence as mobile warriors suspiciously post-dates, by several centuries, the development of the light war chariot and the domestic horse by the Sintashta thousands of miles to the north. The Assyrian royal annals date the arrival of the Persians to the 9th century BC, but the results in the paper imply that Iranians were already present in the Zagros for thousands of years before this (the south Caucasus ultimately being the Indo-European Urheimat, with the Indo-Iranians moving south and east very early on from that region).

I’m focusing on the Middle East because there is a rich history of textual evidence starting in the third millennium BC. These results imply that Indo-European languages are in fact native to the northern Middle East, in the southern Caucasus. And yet, where you might expect a stray Indo-European language here and there, we find assorted obscure non-Indo-European languages like Gutian, Kassite and Kaska. To me this is curious and weird. Further to the west, these results seem to imply that Greek was brought with Caucasus ancestry, but Minoan was likely not Indo-European. There are all these non-Indo-European languages attested in the textual record…and only a few Indo-European ones (Hittite being the first).

One of the major points of this paper that contradicts some theories in historical linguistics is a rejection of the tentative connection between Balto-Slavic and Indo-Iranian. Genetically, the curious aspect of the two language families is that Y chromosomal haplogroup R1a is very frequent in both, but differentiated into two lineages that seem to have diverged 5,500-6,000 years ago. But there is more than just Y chromosomes here; over the past decade, autosomal genome analyses have shown that many South Asians, in particular those in the northwest and in upper caste populations, are enriched for a minority ancestral component that resembles Eastern Europeans. Thanks to ancient DNA, we now know what happened (see Genetic ancestry changes in Stone to Bronze Age transition in the East European plain). A branch of the Corded Ware Culture (CWC) migrated eastward, becoming the Fatyanovo Culture, then the Balanovo Culture, then the Abashevo Culture, and finally the Sintashta Culture. The Sintashta seem to have given rise to a group of societies known as Andronovo that are hypothesized to have evolved into Iranians and Indo-Aryans.

The result here does away with all this. Rather than Indo-Aryan speech being brought by steppe pastoralists between 3,500 and 4,000 years ago, as genetics would imply, Indo-Aryan speech was likely present during the Indus Valley Civilization (IVC). These results imply that Indo-Aryan arrived in India thousands of years before the intrusion of steppe pastoralists, carried eastward by farmers from the Caucasus. The Vedas and Sanskrit then come down from the IVC. And yet, strangely, the Vedas do not depict a very complex society like the IVC, but a simpler agro-pastoralist one. And the presumed sacred language of the IVC people, Sanskrit, was maintained in particular by a Brahmin priestly caste that is notable for having a very high fraction of steppe ancestry, ancestry that arrived much later.

A massive issue with this paper is that it makes a hash of a major phenomenon that we know occurred between 3500 and 2500 BC: the spread of steppe peoples in all directions, especially out of the Corded Ware complex. The CWC are notable for having a major admixture of Globular Amphora Culture (GAC) Neolithic ancestry, about 25-35% of their genetics, and then spreading in all directions. As noted by the authors and other observers, ancient DNA suggests that Anatolian, Armenian, and perhaps Greek and Illyrian (Albanian) are exceptions to this, deriving directly from Yamnaya or pre-Yamnaya (in the case of the Hittites) Indo-European people (remember, CWC is a mix of Yamnaya and GAC). The genetics is very clear that a major wave of post-CWC people went into Asia, and south into the Indian subcontinent and Iran. The Y chromosomes imply this was male-mediated, and post-CWC Y chromosomes are found in appreciable quantities as far south as Sri Lanka. But in the paper’s chronology, this demographic migration comes far too late to have been the origin of Sanskrit, which is associated with Arya culture.

As Lazaridis points out on social media, the divergence of European language groups like Germanic, Celtic, Italic and Balto-Slavic also predates the CWC expansion westward. For example, Italic splits off around 3500 BC, 500 years before the expansion of CWC into Eastern Europe, with a 95% lower bound of 2200 BC, about when steppe ancestry shows up on the Italian peninsula according to ancient DNA. If the dates are true, then it seems that the various Indo-European language groups were already differentiated very early on in the Yamnaya, and not later through their expansion across Europe. In other words, this is a model of “ancient linguistic substructure.”

To be entirely candid, it’s very hard for me to reconcile the ancient DNA with this topology and time-depth in a parsimonious manner that holds together in my head. This doesn’t mean the other models don’t have holes; the “Southern Arc” theory is pretty complicated too, and everything would have been “easier” if the Hittites had steppe ancestry, which they do not seem to have. But there are too many things that are hard for me to understand with this new model. For example, the vast numbers of steppe Iranian people seem to be mostly descended from the CWC societies that gave rise to Europeans, but their languages diverged extremely early from those of their western neighbors, almost 2,000 years before the diversification of the CWC as an archaeological, demographic and genetic unit.

18 thoughts on “Computational linguistic phylogenetics and Indo-Europeans”

  1. Razib, presumably out of politeness, lays the foundation but doesn’t reach the final punchline, which is that the linguistic model is seriously flawed, and that the narrative that flows from trying to shoehorn the linguistic model’s conclusions into the hard evidence from archaeology, historical accounts, and ancient DNA is likewise just plain wrong. We should be using the hard, precisely dated and placed evidence to calibrate the linguistic model instead of the other way around, because the parameters and assumptions of the linguistic model are profoundly less certain in date and in place.

    The single biggest driver of the problem with the linguistic model is the assumption that the Anatolian languages, because they are more diverged from the other Indo-European languages, are also the oldest. All other things being equal, that isn’t an unreasonable assumption, but all other things are not equal.

    The Neolithic societies of Europe and South Asia in which Indo-European languages replaced pre-existing Neolithic languages were all in a state of abject collapse when the Indo-European-speaking steppe people swept in, so the pre-existing substrate languages had much less of an impact on the Indo-European languages in those places than in Anatolia. Further, in Europe, all of the substrate Neolithic first farmer languages were part of a single macro-linguistic family derived from the languages of the Western Anatolian source of the first farmers, possibly with one major division between the LBK wave along the Danube and other inland river systems, and a Cardial Pottery wave skirting the northern Mediterranean coast. Some of what we attribute to Proto-Indo-European, or to a very basal split on the European side of the Indo-European languages, may actually be shared substrate influence from similar languages in this European Neolithic language family (systematically understating the impact of language contact with these languages).

    Likewise, in the East, the Indo-Iranian languages probably shared a common Harappan language family substrate.

    In contrast, the Anatolian Indo-European languages saw their speakers, especially the Hittites, conquering a much more sophisticated Eneolithic/Early Bronze Age Hattic society whose linguistic predispositions were not so easily swept aside. Even after Hittite became the dominant secular language of the Hittite empire, the non-Indo-European Hattic language survived as a liturgical language akin to church Latin, post-Sumer Sumerian, and ancient Hebrew, for another thousand years, something that happened nowhere else in the Indo-European linguistic region. And by the metal ages, the languages of Anatolia and the Caucasian and Iranian highlands were very different from the languages of the Western Anatolian Neolithic ancestors of Europe’s first farmers.

    The Anatolian languages are more diverged from other Indo-European languages not because they are older, but because there was a stronger substrate influence and the substrate that was the source of the influence was much different. You can read and hear the extent of the influence by comparing Hittite names and words and sentences to their Hattic counterparts (preserved well in large volumes of royal record keeping) and their Minoan counterparts (preserved in phonetic transcriptions in Egyptian texts and what can be guessed at from Linear B writing).

    Tocharian seems older than it really is for the opposite reason. Unlike every single other known Indo-European language, it had little or no substrate influence as it expanded into thinly populated regions en route to and in the Tarim Basin. It is probably the most linguistically conservative, kept pure by reduced contact on the frontier, in much the same way that Icelandic on the frontier is the closest Germanic language to Old Norse, the same way that the Appalachian dialect of English is the closest to the English of Shakespeare because it was isolated on the frontier, and the same way that the Spanish dialects spoken by multigenerational natives of Southern Colorado and New Mexico are the only living dialects of Spanish that retain some Spanish colonial era archaic words and grammatical constructions. The shared substrate influences of the Western Anatolian Neolithic and Harappan language families on languages that were not Anatolian or Tocharian are absent in Tocharian, and that is why it seems more diverged.

    New Zealand academic Russell Gray in this paper is repeating the sins of his fellow New Zealander Quentin Atkinson in his 2012 paper in Science. Later work by Atkinson recognized that increasing the amount of language evolution attributed to language contact and decreasing the amount attributed to random mutation produced more reasonable estimates of the time depths of the various Indo-European languages. But these lessons were lost on the authors of the current paper.

    The authors of the current paper instead forge an utterly unconvincing “Southern Arc” narrative that has to be riddled with exceptions to principles and inferences about the ancient DNA markers of Indo-European languages, and that offers no reason for archaeologically and genetically homogeneous, geographically compact societies to have deep linguistic divides. It also discards the climate and other motives for expansion.

  2. The model has a systematic branch-scaling error that elongates the branches of languages with extensive borrowing (which is especially typical of Indic languages) or with limited knowledge of the synonym pairs representing meanings in the dataset (which is common for many ancient languages). Both problems stem from the same computational simplification: each cognate set expressing a given meaning is treated as an independent binary value (present or absent), while in reality the presence or absence of synonyms for a given meaning is negatively correlated.
    Basically, in languages where coevolving synonyms are well attested, a gain or a loss of a synonym will generally have a change value of 1 (1,1 -> 1,0 or vice versa). But in languages with external borrowing or with unknown synonym pairs, any such change would count as 2 (loss of the original cognate plus gain of a new one).
    This scaling problem would have produced even older split dates had it not been artificially limited by setting the upper bound for the age at 10,000 years. In one of the sensitivity analyses, they removed this upper bound and ended up with estimates as old as 11 kya.

    There is also an important linguistic consideration for the northern route and against the South Caucasus Urheimat: borrowings from IE into neighboring languages. The oldest layer of IE-derived words in the Finno-Ugric languages is thought to be related to Proto-Iranian and dated to roughly the Sintashta epoch in the Ural Mountains. Conversely, Gamkrelidze and Ivanov assembled a great collection of potentially IE-derived words in Kartvelian and Semitic languages, but nothing there is convincingly older than the Mitanni age.

  3. @Dx

    “Both problems stem from the same computational simplification. Namely, they treat each cognate responsible for the given meaning as an independent binary value (present or absent) while in reality, presence or absence of synonyms for a given meaning are negatively correlated.”

    They do address this with a multi-state model, which actually rather closely reproduces the chronology we would expect under the steppe hypothesis (though it places Tocharian in a weird position).

  4. “They do address this with a multi-state model”

    Yes, the multi-state model is an interesting sensitivity analysis, and it results in archaeologically sensible dates in Asia, but they reject it for underestimating many split dates in Europe. As it turns out, although their multistate model correctly addresses the concept of multiple words conveying the same meaning, it STILL can’t handle synonyms properly. The model’s limitation is that only one of the multiple possible states (synonyms) can be present at any given time. If, during language development, synonyms oscillate (sometimes one is more prominent and at other times another), then the model counts it as change after change after change. Synonym oscillation isn’t a super common phenomenon, but it is common in some recent branches, which help define the magnitude of change rates. As a result, their recent change rates are overestimated in a number of branches, and the split dates become unrealistically young.

    Needless to say, neither model (binary vs. “multistate but lone value”) is up to the task for such a giant tree with deep systematic differences between levels of linguistic knowledge.

  5. The underestimation of split dates in Europe is something of a red herring, though. The languages being “underestimated” should be expected to have had shared drift while in close contact in a dialect continuum, and a method that correctly estimates the timing of migratory splits (such as the splits caused by Indo-European migrations traveling thousands of miles) should underestimate the split dates of languages that originated as dialects in a dialect continuum. Dialects in close contact (either spatial, as with Germanic languages in Europe, or literary / cultural, as with Brazilian and European Portuguese) will have more similar vocabulary than languages that evolved entirely separately.

    As to synonym oscillation, that will apply equally to inferred ancestral languages and to attested modern and non-modern languages. The fact that attested languages are coded with one lexeme/cognate-set per meaning is actually a strong reason to use multistate coding. If your calibration is based on languages that are intentionally coded as one lexeme per meaning, your methodology needs to generate ancestral languages that are also inferred as one lexeme per meaning.

    This is amply demonstrated by the fact that the binary coding model doesn’t correctly infer ancestral languages as actually being ancestral (e.g. Old English and English, or even more egregious examples for other languages). The multistate coding model does a much better job with most of those.

  6. I wonder if the split dates in multi-state models are indeed underestimated. The languages in computational models are lists of words. The estimated split dates are the dates of the earliest changes in these lists. The word changes cannot occur before the language splits, but they can occur later, sometimes much later. If you check the IE-CoR database, you will notice that the cognate sets of some languages (e.g. Romanian) are mostly composed of loanwords. Many changes setting this language apart from its siblings are due to linguistic contact in medieval and modern times. The question remains when the rest of the words changed, but I doubt that they all changed that early, and in particular I wonder if any of these words changed early enough to reject the multi-state models for underestimating dates.

  7. Re: while I have doubts about the depth of the tree (for multiple reasons, including those mentioned here), for Balto-Slavic and Indo-Iranian, certainly with a shallower date and operating under a steppe hypothesis for LPIE, the tree structure of the languages doesn’t seem irreconcilable with the R1a tree structure in any important way.

    We know that the early CWC in Czech Republic (https://www.science.org/doi/10.1126/sciadv.abi6941) circa 3000-2800 BCE had more diverse (though Steppe biased) y-dna, and then we know that later cultures were dominated by single clades more frequently.

    While they share a more recent common ancestor than the form found in the earliest CWC (R1a-Z645, found earliest in Czechia circa the Early Bronze Age), I don’t see anything that necessarily implies that the predominant Balto-Slavic R1a (R1a1a1b1a Z282) and Indo-Iranian R1a (R1a1a1b2 Z93) must have shared this common ancestor, dated to a period hundreds to a thousand years after the forms identified in early CWC.

    So, even if we take these clades as indicators of language communities, there need not be any extensive further time depth or shared linguistic evolution, not enough to build up some extensive body of linguistic (lexical / phonological / morphological) features that clearly match BS and II.

    The nature of rapid/star expansions is that you get relatively flat trees with independent mutations / changes rather than a clear and heavily supported chain of bifurcations. Any communities living in Northeastern Europe which were linguistic ancestors to proto-Balto-Slavic and Indo-Iranian communities may have “gone their own way” after the CW expansion 3000-2800 BCE, with little further contact prior to a putative II expansion to the south (perhaps enough contact to explain the satem and other shared features, but very little compared to the degree to which they were internally connected).

    YMMV.

  8. Alternatively, if there was a common Balto-Slavic+Indo-Iranian stage of, say, 500 years (about 18 male generations), then there may not have been any significant common linguistic innovations in lexicon, morphology or phonology that would allow us to distinguish this stage from their common ancestor with Celtic-Germanic-Italic. And particularly if BS then interacts with a sister of Celtic-Germanic-Italic later, it may then look like a descendant from a common clade with them.

    One of the criticisms of these trees is that these elements don’t evolve at fixed rates, and while this is probably moderated by extremely well-sampled recent languages, it may be more of a problem for estimating older splits.
    Shared linguistic innovations need not evolve at the same rates as Y-DNA mutations, and so a Y-DNA history (even where the same Y-DNA clade is shared) need not match the linguistic history 1:1.

  9. @DX, re: the comment on the effects of borrowing, this is why papers using phylogenetic inference for language history need to be challenged with simulations that can demonstrate these effects. Otherwise these comments will simply sit as conjectures in comment sections.

  10. 1. There are easier-to-read graphics in the supplement than the one reproduced above. I looked at the supplement, not because I understood a word of it (I didn’t), but because I couldn’t download the main article. Even my wife’s university ID couldn’t crack the Science paywall.

    https://www.science.org/doi/suppl/10.1126/science.abg0818/suppl_file/science.abg0818_sm.pdf

    Fig. S6.1 (previous page) Maximum Clade Credibility (MCC) tree for the tree distribution from Model M3. This MCC tree corresponds to the DensiTree in Fig. 2 in the main text …
    Sup. p. 57-58

    Fig. 7.10.2 (next page) MCC tree from the multistate analysis of the IE-CoR database
    Sup. p. 77-78

    Back Later

  11. “deep linguistic structure within the Yamnaya”

    Perhaps. Even without external shocks, or admixture from pre-existing languages, languages evolve and differentiate over time and space. This is true even when there is good communication throughout an area. Take for instance the Western Mediterranean littoral arcing from Iberia over France to Italy. There is excellent communication throughout the area; under pre-modern conditions water-borne transport is always fastest and cheapest. But there was a dialect continuum from one end to the other. Catalan and Occitan are different languages.

    One thing to remember is the reverse-telescope effect of dealing in historical time. We whip through millennia faster than we go through last month’s expense vouchers. But a few hundred years can produce enormous changes in a language. Beowulf, from the 8th century, is unintelligible to modern English speakers. Shakespeare, a mere eight centuries later, is understandable, at least to the educated.

    “light war chariot and the domestic horse thousands of miles to the north several centuries earlier by the Sintashta.”

    I think that is a key point.

    “but Minoan was likely not Indo-European”

    Further consider the status of Linear A and Linear B.

    A key player in the deciphering of Linear B was the American classicist Alice Kober (1906-1950), who died before the work was completed. https://en.wikipedia.org/wiki/Alice_Kober

    Early Indo European languages are inflected languages, in which the grammatical relationships between words are signaled through inflectional morphemes (usually endings*). Earlier I-E languages (Latin, Sanskrit) are more heavily inflected than more recent evolutions such as English.
    https://en.wikipedia.org/wiki/Indo-European_languages

    * e.g. English forms gerunds by adding the suffix -ing to verbs: singing. German forms past participles with the prefix ge- (plus an ending): gesungen, “sung”.

    One of Kober’s key findings about Linear B was that the script showed the patterns of an inflected language. In that respect it differed from Linear A.

    A terrific book about the deciphering of Linear B focused on Alice Kober is “The Riddle of the Labyrinth: The Quest to Crack an Ancient Code” by Margalit Fox, 2013
    https://www.amazon.com/Riddle-Labyrinth-Quest-Ancient-Margalit/dp/B00SQCEWKS/geneexpressio-20

  12. Finally, their model doesn’t make much geometric sense to me. Language groups don’t go through each other. My mental model of the evolution of IE languages is a lava lamp where blobs split off the main group and wander off.

    If IE starts in Anatolia, the Greeks would have to be the first split. If Hittite were first to split from the main blob, there is no way for the Greeks to be isolated on the Hittites’ western flank and split from the proto-Yamnaya later. I just can’t see them going through the Hittites, who were established in Central Anatolia.

    It is possible for proto-IE to begin in Anatolia or the Southern Caucasus and for the Yamnaya to go north along the west coast of the Black Sea. That would be the first split. But then the Greeks would have to be at the western end of the blob and travel along the west coast of the Black Sea, into Thrace, and then down the Aegean into the Peloponnese, where they emerge into Bronze Age history.

    I agree that the idea the Aryans went east through the Zagros just doesn’t work.

  13. I’ve seen the map for the study. The interesting thing is that there are no arrows pointing southwards towards Mesopotamia, Egypt, and the Levant. We know that historically these areas were controlled by IE peoples, so why wouldn’t this be the case with regard to the PIEs? Anyway, if this new hybrid model is true, I wonder what PIE society looked like socio-economically. For instance, would those who went north into the steppe be the poorer PIEs? What about those that went west to the Aegean or east to India? Probably a mix of wealthy and poor PIEs? Maybe the ones that went into India were poorer and saw the new land as one of opportunity, likewise in the case of those that went north into the steppe. I guess they would have been prehistoric equivalents of Spanish conquistadors and Anglo-American settlers, commoners or minor nobles in “new lands”.

    With regard to those who went west into the Aegean and central-western Anatolia, and maybe even south into the Levant/Mesopotamia/Egypt (BIG MAYBE), these regions already had somewhat complex societies, proto-cities, and relatively big populations. So perhaps these populations were just too big (and lucrative) for the conquerors to eliminate completely or mostly. In Greece and Anatolia, then, the PIEs assimilated the natives to a large extent but not completely. This is probably why non-IE languages were present up until the Bronze Age in those regions (Minoan in the Aegean and Hattian in Central Anatolia). But if a non-IE linguistic presence in a region during the Bronze Age is evidence of that non-IE language being native to that region since the Neolithic, this would also mean that Hurrian was native to Eastern Anatolia/Armenia, and consequently that PIE (or Proto-Indo-Anatolian) is not native to that region. It’s a very complicated issue. Personally, I still think the steppe is the most likely place of origin for all IE peoples/languages. The Southern Arc hypothesis has to explain how Hurrian got to Eastern Anatolia/Armenia.

  14. I don’t feel I can add much more to what has already been well said in this post and comment section, but I find the whole thing a bit suspicious. Call me a skeptic on this one, but between the two papers released in the last year, one on the DNA (lack of EHG in ancient Anatolia) and this one on linguistics, I find it a bit too convenient, sort of a crowd pleaser, if you will. Obviously it’s healthy, well and good to continue the research, question and challenge, but this just throws out and glosses over too much previous evidence that was on its way to establishing itself in more widespread mainstream arenas, or one would hope. Most people I talk to in my larger circle here in San Jose, CA get pretty intrigued and excited about such things as CWC, comparative linguistics and religion between, say, Germanic and Hinduism.
    Thanks for the article Razib.

  15. “I’ve seen the map to the study. The interesting thing is that there are no arrows pointing southwards towards Mesopotamia, Egypt, Levant. We know that historically these areas were controlled by IE peoples”

    Only on the Northern fringe in the Hittite era, and then during the Roman Empire, and very briefly during the Crusades.

  16. Just wanted to say the first comment by ohwilleke is really informative and could be a blog post by itself.

  17. Related – https://www.science.org/doi/10.1126/sciadv.adf7704 – Gray and co-authors find, in a dataset spanning many language families, that there doesn’t appear to be a relationship between more widely spoken languages with more L2 speakers and any simplification of grammatical marking (i.e. fewer specific grammatical morphemes, with grammar indicated instead by word order or more content words, although this may not be quite how a linguist would explain it).

    It seems that under many contact circumstances, languages are transmitted just fine without grammatical simplification.
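The exchange in the comments above about synonym oscillation, and the call for simulations, can be made concrete with a minimal sketch, assuming one meaning observed at five hypothetical stages in which two synonyms, A and B, coexist throughout and only the dominant one alternates. Coding each stage with a single lexeme, as the “multistate but lone value” critique describes, counts every alternation as a change; coding the full synonym set counts none. The stages and labels are invented for illustration and are not drawn from IE-CoR or the paper.

```python
# Toy sketch: synonym oscillation under single-value (multistate) coding.
# The five stages and the labels "A"/"B" are hypothetical.

# Each stage records every synonym present for one meaning, plus which
# synonym a one-lexeme-per-meaning coder would pick as dominant.
STAGES = [
    {"present": {"A", "B"}, "dominant": "A"},
    {"present": {"A", "B"}, "dominant": "B"},
    {"present": {"A", "B"}, "dominant": "A"},
    {"present": {"A", "B"}, "dominant": "B"},
    {"present": {"A", "B"}, "dominant": "A"},
]


def single_value_changes(stages):
    """Changes counted when each stage is coded by its dominant lexeme only."""
    return sum(a["dominant"] != b["dominant"] for a, b in zip(stages, stages[1:]))


def synonym_set_changes(stages):
    """Gains plus losses counted when each stage is coded as its full synonym set."""
    return sum(len(a["present"] ^ b["present"]) for a, b in zip(stages, stages[1:]))


if __name__ == "__main__":
    print(single_value_changes(STAGES), synonym_set_changes(STAGES))
```

Run as a script, it prints "4 0". Under a clock model, more counted changes translate into more inferred time or a higher inferred rate, so the toy shows only the direction of the bias a commenter describes, not its size on a real tree.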
