
The enormous demographic impact of the Indo-Europeans


When I was a kid I remember seeing a map of the distribution of Indo-European languages, and being perplexed by their spread, from the North Sea to the Bay of Bengal. Later, I learned and understood that language families can spread by diffusion and cultural assimilation. In The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World, David Anthony outlines an elite emulation model of Indo-Europeanization, whereby groups of warriors associated with the Kurgan cultures took over and reshaped a broad range of societies.

The samples Anthony provided were instrumental in recalibrating his own model. It turns out that steppe migrants were extremely genetically impactful. Rather than a small minority, many archaeological cultures seem to have been predominantly steppe in genetic origin (total number of ancestors). The best estimates seem to be that ancestry from the steppe is somewhat more than half the total in northern and eastern Europe, and somewhat less than half in southern and western Europe (i.e., northeast to southwest cline).

More recently, it also seems that a substantial, though a smaller, proportion of the ancestry in southern Asia also derives from the steppe peoples. Within India itself, the range seems to be from 25-30% among some groups, such as North Indian Brahmins and Jatts, to a more typical range between 5 and 15% (peasant castes in South India are closer to the former, peasant castes in the Gangetic plain are closer to the latter).

Using the proportions in various ethnic groups in the Indian subcontinent, as well as across European nations, I have come to conclude that ~10% of the ancestry in the world derives from people who were members of the “Yamna Horizon” ~3000 BC.* I don’t know the archaeology well enough to be highly informed, but I’m willing to bet that closer to 1% of the world’s population lived in and around the Yamna Horizon, so over the last 5,000 years, you’ve seen a 10-fold increase in representation of this ancestral component. More concretely, I think that the vast majority of the increase occurred between 2500 BC (when expansions into Britain and Southern Europe seem to have occurred) and 1000 BC (when the core area of the Indian subcontinent was Aryanized).

* I did things like weighting caste groups in Uttar Pradesh, looking at the populations of Indian states, and adding Pakistan and Bangladesh, as well as assigning estimates to European countries. I did some back-of-the-envelope calculations for North and South America (e.g., assume that 50% of the ancestry is Iberian, and assume that 25% of that 50% is steppe).
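For anyone who wants to redo this kind of back-of-the-envelope estimate, here is a minimal Python sketch of the arithmetic, not the actual spreadsheet behind the numbers above. The function name `weighted_steppe_share` and the three-group example are hypothetical placeholders. Only the 50% Iberian and 25% steppe figures for the Americas, and the ~10% versus ~1% comparison, come from the post.

```python
def weighted_steppe_share(groups):
    """Population-weighted average of steppe ancestry.

    groups: list of (population, steppe_fraction) pairs.
    The name and input layout are assumptions made for illustration.
    """
    total_pop = sum(pop for pop, _ in groups)
    steppe_pop = sum(pop * frac for pop, frac in groups)
    return steppe_pop / total_pop


# From the footnote: 50% of ancestry is Iberian, and 25% of that is steppe.
americas_steppe_fraction = 0.50 * 0.25  # 12.5% steppe

# Placeholder groups (NOT real data): population units and steppe fractions.
placeholder_groups = [
    (100, 0.50),  # hypothetical high-steppe population
    (300, 0.10),  # hypothetical low-steppe population
    (600, 0.00),  # hypothetical population with no steppe ancestry
]
placeholder_share = weighted_steppe_share(placeholder_groups)

# From the post: ~10% of ancestry today versus ~1% of people ~3000 BC.
fold_increase = 0.10 / 0.01

print(americas_steppe_fraction, placeholder_share, fold_increase)
```

Swapping in real population counts and per-group ancestry estimates is where all the judgment calls live. The weighting itself is just this one division.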

20 thoughts on “The enormous demographic impact of the Indo-Europeans”

  1. Thus, most Indo Aryan folks from Sindh to Bengal have 15% of sky father’s blessing, and their Dravidian brethren 5%.

    “Jai Shri Sky”

  2. When I was a kid I remember seeing a map of the distribution of Indo-European languages

    I had a similar experience. I have an aspy-fascination with maps and one of my prized teenage birthday presents was the “The New Cambridge Modern History Atlas”.

    One of the maps was a linguistic map of the Subcontinent. I still remember seeing a list of languages under the heading: “Indo-European”. Wait, I thought, “Indo” AND “European”??? Needless to say that started a life-long fascination with languages, which transitioned nicely into genetics…

    I am proud to say that I still have that atlas and that my teenage boys are as fascinated with it as I am… 🙂

  3. The situation is likely even more extreme than Razib is saying, because the people of the Yamna Horizon are extremely homogeneous genetically (although dispersed geographically across a huge area). It is likely that Yamna themselves originated culturally and genetically from a tiny group, perhaps no more than a few thousand adults.

  4. Do we have some approximate numbers of the percentage of Yamna ancestry in modern Iran?

  5. “peasant castes in South India are closer to the former”

    Do they speak Indo-European languages?

  6. “I’m willing to bet that closer to 1% of the world’s population lived in and around the Yamna Horizon”

    Like Nick Patterson, I think that 1% is too high. One of the leading estimates puts the population of the world in 3000 BC at 14 million. https://www.census.gov/data/tables/time-series/demo/international-programs/historical-est-worldpop.html And, my sense of the situation is that 140,000 people is a considerable overestimate of the population of the Yamna Horizon at that time, probably by a factor of ten or so.

  7. Are there any studies on Yamnaya genetic influence on Balkan population?

    I think that I read somewhere that Sintashta DNA is partially from Balkan farmers. Can you tell me if that’s true?

    Thank you!

  8. I suspect that’s a little overweighted for South Asia?

    The range under David’s Global 25 data (using the Vahaduo calculator) for Sintashta contribution seems to be about 25% (Kalash) to 20% (Brahmin) to 4-5% in many of the lowest groups (the rest allowing contributions from Paniya with minimal Steppe_MLBA, Iran_Neolithic, Turan Eneolithic, and Indus_Periphery).

    Sintashta is about 75% Yamnaya/steppe_EBA, so probably more like 19%, 15%, 2-3% for respective comparisons.

    If you wanted to capture pre-Yamnaya, Steppe_EBA is about 10% EEF/WHG, so reduction by 10% to 17%, 13.5%, etc. would do that.

    Quick puzzle. Greater increase in demographic legacy in the last 5000 years in “steppe_EBA” or “ASI”? Assuming that Indus_Valley had as little ASI as Narasimhan’s paper thinks.

  9. @Matt
    The Narasimhan paper looked at the Periphery but it also had a PCA with the Rakhigarhi sample. It was the other September 2019 paper which claimed a low AASI in Rakhigarhi, but someone pointed out that it was too far up the Iran-to-AASI cline to have as little AASI as the September paper claimed. The September paper claimed AASI in the low 20s%, but on the cline the sample was located close to the 40%+ AASI Shahr BA3 samples.

  10. @DaThang, yeah, I noticed the same sort of apparent inconsistency, but thought not worth getting into that when framing my question. When the paper came out it looked from a composite of their ADMIXTURE and PCA more like 33% or something I think? Can’t quite remember. I don’t know if it’s possible to be so robust with the testing.

  11. That said, I think you’re right on the money that 10% (or about 700 million) is about the guesstimate number for the “virtual population” that is Steppe_EMBA today though (in the sense that there is a virtual population of 2% Neanderthals or about 200 million today). Even with a slightly lower Steppe_EMBA estimate in South Asia.

    Quick estimate using the Top 101 countries in population (Wikipedia) and estimates: https://i.imgur.com/IXMo1Zt.png

    (Higher counting estimates some of these countries might shift up to as much as 13-15%, but I don’t think would change much more than that).

    I would very roughly guesstimate EEF (outside Steppe_EMBA but counting EEF in Steppe_MLBA) about the same virtual population, maybe slightly greater than the steppe: https://i.imgur.com/3HsTO0Z.png

    (You could reduce a little if you wanted just European EEF, discounting that which expanded directly from Anatolia into the Near East and Central Asia, or if you removed European HG from the European EEF side. But wouldn’t change much as the demographic weight of EEF is in Europe and neo-Europes).

    (Comparing EEF to Steppe_EMBA, the greater size of South Asian population looks to be matched by probable greater contribution of EEF to present day Latin America, despite lower total Latin Am population).

    It shows how huge the relative size of Asia (inc. the predominantly Iran_N+AncientNorthEurasian+ASI basis of South Asia) and Africa is today that populations of EEF and Steppe_EMBA combined probably only still roughly break 20-27% of the world pop, probably at the maximal estimate, despite huge geographic expansions.

    In terms of timing, not sure if the main increase as a % of population for Steppe_EMBA isn’t relatively recent in last 2500 years (with demographic expansion in Northern and Eastern Europe, South Asia, European colonization), even if the main geographic expansions that set the antecedents probably happen in range of 2500-1000 BC like you say (e.g. Steppe_EMBA moved into lots of populations in this time, but only became a large % of the world population when those populations stopped being as much marginal pastoralists and began big agricultural expansions, then spread ancestry around with states and empires).

  12. @ohwilleke: One of the leading estimates puts the population of the world in 3000 BC at 14 million. And, my sense of the situation is that 140,000 people is a considerable overestimate of the population of the Yamna Horizon at that time, probably by a factor of ten or so.

    That’s probably, most likely, all correct or close to it.

    Though that said, I would say that historical population estimates tend to be kind of wimpy and I think would have wide “confidence intervals” – in an ideal world we would just be able to use techniques that model direct from the genome changes in effective population size over time (PSMC, momi, etc) either directly on high coverage ancient genomes, or from huge numbers of present day people and then regress onto their estimated Steppe_EMBA ancestry.

    We might find that late and sophisticated mesolithic pottery using foraging populations, and early herding populations, had higher populations than we thought, relative to neolithic farmers.

    I’m looking at this through a lens where last decade, people often thought that early farming yielded such huge populations relative to herders and foragers that demographic replacement from the steppe was impossible (and so the search for the spread of IE looked at elite recruitment ideas), hence a bias to Renfrew’s IE theory. That seems somewhat wrong, so I don’t think it’s totally wrong to suspect that the historical population estimates that informed these ideas are a bit open to question.

  13. hey, so why is global25 better than doing the model in qpadmin? i am looking at david’s website but am not clear. it seems optimized for david’s genotype…

  14. I’m not sure if it’s better tbh, it was just really the only quick method I have to hand and is generally fairly congruent with qpadm. (I would say that I prefer being able to jointly test for more populations? I know academically using these PCA position inference methods is not thought well of due to elements which can affect PCA position, but it is very close.)

    That said, the admixture model estimates from Narasimhan’s supplement extended model are still about 28-30% Kalash and 23-25% Brahmin for Steppe_MLBA, which would go to 22% and 18% Steppe_EMBA assuming 75:25 Steppe_EMBA:MN_Europe in Steppe_MLBA as seems to be the case. (Other examples picked out at random, Patel and Punjabi about 12.5% and 15% Steppe_MLBA, go to about 10% to 11.25% Steppe_EMBA).

    It seems like David’s PCA method produces slightly lower estimates of Steppe_MLBA than Narasimhan’s paper but very close, and not really a big deal. However, the main difference between your estimate and mine (I could be wrong?) seems to be due to taking numbers in about Narasimhan’s range and adjusting for Steppe_MLBA vs Steppe_EMBA?

  15. i reran the qpadmin with narasimhan’s ref pops to replicate his results. but yeah, the difference seems minimal. i was just curious why everyone used global25 and now i get it. it’s accessible (yes, admixtools takes a fair amount of initial investment to get working…i have written custom scripts to do some things)

  16. But Armenians don’t have Yamnaya ancestry – near zero.

    What’s wrong with them???
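The Steppe_MLBA to Steppe_EMBA adjustment discussed in comments 8 and 14 is a single multiplication, sketched below in Python. The 75% Steppe_EMBA share of Steppe_MLBA and the input percentages are the commenters' figures. The function name `mlba_to_emba` is a hypothetical label for illustration, and this is not a substitute for an actual qpAdm or Global 25 model.

```python
# Share of Steppe_MLBA (Sintashta-like) ancestry that is Steppe_EMBA
# (Yamnaya-like), per the roughly 75:25 split cited in the comments.
EMBA_SHARE_OF_MLBA = 0.75


def mlba_to_emba(mlba_pct, emba_share=EMBA_SHARE_OF_MLBA):
    """Convert a Steppe_MLBA percentage to an implied Steppe_EMBA percentage."""
    return mlba_pct * emba_share


# Steppe_MLBA percentages quoted in the thread (Global 25 figures and
# Narasimhan-range figures), used here only as inputs to the conversion.
quoted_mlba = {
    "Kalash (Global 25)": 25.0,
    "Brahmin (Global 25)": 20.0,
    "Patel": 12.5,
    "Punjabi": 15.0,
}

for label, pct in quoted_mlba.items():
    print(f"{label}: {pct}% Steppe_MLBA -> {mlba_to_emba(pct)}% Steppe_EMBA")
```

Changing `emba_share` is the easy way to see how sensitive the final world-wide figure is to the assumed 75:25 split.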
