The last glacial maximum bottlenecks and human phylogeny

I’ve mentioned The genomic origins of the world’s first farmers a few times. It’s an intense model-based paper that revises some expectations and models of the origins of diverse human groups on the cusp of the Holocene:

The precise genetic origins of the first Neolithic farming populations in Europe and Southwest Asia, as well as the processes and the timing of their differentiation, remain largely unknown. Demogenomic modeling of high-quality ancient genomes reveals that the early farmers of Anatolia and Europe emerged from a multiphase mixing of a Southwest Asian population with a strongly bottlenecked western hunter-gatherer population after the last glacial maximum. Moreover, the ancestors of the first farmers of Europe and Anatolia went through a period of extreme genetic drift during their westward range expansion, contributing highly to their genetic distinctiveness. This modeling elucidates the demographic processes at the root of the Neolithic transition and leads to a spatial interpretation of the population history of Southwest Asia and Europe during the late Pleistocene and early Holocene.

A few things to note about this paper. First, no mention of Basal Eurasians. This research group doesn’t believe they’re necessary. As you may know, Basal Eurasians were hypothesized because Mesolithic Europeans seem genetically closer to eastern non-Africans than to incoming Early European Farmers (EEF) from Anatolia. One model that can explain this is that there was a population somewhere in N. Africa and W. Asia that split off first from other non-Africans, perhaps more than 60,000 years ago and that eventually merged back with West Eurasians at some point. Lazaridis et al. also believe this might explain why some W. Asia groups have less Neanderthal ancestry; the Basal Eurasians did not admix with them.

The problem, so far, is that nearly a decade after they were hypothesized we haven’t found a mostly Basal Eurasian sample. And, Basal ancestry is found in West Eurasia pretty early. Perhaps they’ll always remain a statistical construct?

Why doesn’t everyone think Basal Eurasians are necessary? If you read the above paper, the key issue is the distortionary impact that bottlenecks can have on the inferred branch lengths of a given phylogeny. They argue that a very strong bottleneck during the LGM 20,000 years ago inflated the divergence of European foragers from other populations and that subsequently, the populations bounced back very well so that their census sizes were likely large. And, they also argue that some of the distinctiveness of EEF from Anatolia is a function of their own bottleneck far more recently, around the beginning of the Holocene. Combined with these bottlenecks there are also various migrations between the branches in the topology, branches differentially impacted by these bottlenecks.
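To make the bottleneck intuition concrete, here is a minimal sketch. It is not the paper's demogenomic model, just the textbook drift recursion F = 1 - prod(1 - 1/(2N)), and every population size and duration below is a made-up number chosen for illustration. It asks how many generations of drift at a constant size would be needed to produce the same F that a short bottleneck generates.

```python
import math

def drift_f(pop_sizes):
    """Expected F accumulated by drift, given one diploid size per generation.

    Textbook recursion: F = 1 - prod(1 - 1/(2 * N_i)).
    """
    retained = 1.0
    for n in pop_sizes:
        retained *= 1.0 - 1.0 / (2.0 * n)
    return 1.0 - retained

def apparent_generations(f, reference_size):
    """Generations of drift at a constant size that would produce the same F."""
    return math.log(1.0 - f) / math.log(1.0 - 1.0 / (2.0 * reference_size))

# Hypothetical histories of equal length (400 generations).
steady = [10_000] * 400                      # constant N = 10,000
bottleneck = [10_000] * 300 + [500] * 100    # 100 generations squeezed to N = 500

f_steady = drift_f(steady)
f_bottleneck = drift_f(bottleneck)

# How old the bottlenecked branch "looks" if you assume constant N = 10,000.
looks_like = apparent_generations(f_bottleneck, 10_000)
```

With these toy inputs the bottlenecked lineage accumulates several times more drift over the same 400 generations, so a method that assumes constant size would read it as a much deeper split.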

I don’t know how this aligns with earlier models, but I think it’s a serious contender. The key question for me is how this fits in with earlier ancient DNA and archaeology.

New David Reich talk

Eurogenes points me to a new talk by David Reich, which has a nice, long new abstract online. I’ll just insert my comments within the blockquote…

We present an integrative genetic history of the Southern Arc, an area divided geographically between West Asia and Europe, but which we define as spanning the culturally entangled regions of Anatolia and its neighbors, in both Europe (Aegean and the Balkans), and in West Asia (Cyprus, Armenia, the Levant, Iraq and Iran). We employ a new analytical framework to analyze genome-wide data at the individual level from a total of 1,320 ancient individuals, 731 of which are newly reported and address major gaps in the archaeogenetic record. We report the first ancient DNA from the world’s earliest farming cultures of southeastern Anatolia and northern Mesopotamia, as well as the first Neolithic period data from Cyprus and Armenia, and discover that it was admixture of Natufian-related ancestry from the Levant—mediated by Mesopotamian and Levantine farmers, and marked by at least two expansions associated with dispersal of pre-pottery and pottery cultures—that generated a pan-West Asian Neolithic continuum [Does “it was” refer to Cyprus and Armenia? How are Mesopotamian farmers related to the Zagros-Levant-Anatolian trichotomy?]. Our comprehensive sampling shows that Anatolia received hardly any genetic input from Europe or the Eurasian steppe from the Chalcolithic to the Iron Age; this contrasts with Southeastern Europe and Armenia that were impacted by major gene flow from Yamnaya steppe pastoralists [I believe Southeastern Europe had both patchy early Yamnaya and later Indo-Europeans? Armenia, on the other hand, seems unique].

In the Balkans, we reveal a patchwork of Bronze Age populations with diverse proportions of steppe ancestry in the aftermath of the ~3000 BCE Yamnaya migrations, paralleling the linguistic diversity of Paleo-Balkan speakers. We provide insights into the Mycenaean period of the Aegean by documenting variation in the proportion of steppe ancestry (including some individuals who lack it altogether), and finding no evidence for systematic differences in steppe ancestry among social strata, such as those of the elite buried at the Palace of Nestor in Pylos [Mycenaean Greece starts at 1750 BC, so probably at least 500 years from the major penetration of Indo-Europeans, so that’s 20 generations or so. That seems enough time for status-gene correlations to break down if there’s no endogamous caste-like structure].

A striking signal of steppe migration into the Southern Arc is evident in Armenia and northwest Iran where admixture with Yamnaya patrilineal descendants occurred, coinciding with their 3rd millennium BCE displacement from the steppe itself. This ancestry, pervasive across numerous sites of Armenia of ~2000-600 BCE, was diluted during the ensuing centuries to only a third of its peak value [Looking online, there’s a 2012 paper that indicates that modern Armenians carry the specifically Yamnaya R1b lineage. If this is true, it might explain why Armenian is so hard to place within an Indo-European tree, as Celtic, Germanic, Balto-Slavic and Indo-Iranian seem to come out of a broader Corded Ware cultural complex], making no further western inroads from there into any part of Anatolia, including the geographically adjacent Lake Van center of the Iron Age Kingdom of Urartu. The impermeability of Anatolia to exogenous migration contrasts with our finding that the Yamnaya had two distinct gene flows [David of Eurogenes does not like this, but this could mean Anatolian and CHG/Iranian pulses?], both from West Asia, suggesting that the Indo-Anatolian language family originated in the eastern wing of the Southern Arc and that the steppe served only as a secondary staging area of Indo-European language dispersal. The demographic significance of Anatolia on a Mediterranean-wide scale is further documented by our finding that following the Roman conquest, the Anatolian population remained stable and became the geographic source for much of the ancestry of Imperial Rome itself.

Eurasia, the Stone Age and revenge of the Danes!

In the last week, I put up a big two-part series of posts on Substack, The wolf at history’s door and Casting out the wolf in our midst, about the spread of Indo-European (men) 5,000 years ago. By coincidence, a massive preprint on ancient DNA just came out of the Willerslev coalition of researchers, Population Genomics of Stone Age Eurasia. It really is massive, and is hard to summarize, but here’s the abstract:

The transitions from foraging to farming and later to pastoralism in Stone Age Eurasia (c. 11-3 thousand years before present, BP) represent some of the most dramatic lifestyle changes in human evolution. We sequenced 317 genomes of primarily Mesolithic and Neolithic individuals from across Eurasia combined with radiocarbon dates, stable isotope data, and pollen records. Genome imputation and co-analysis with previously published shotgun sequencing data resulted in >1600 complete ancient genome sequences offering fine-grained resolution into the Stone Age populations. We observe that: 1) Hunter-gatherer groups were more genetically diverse than previously known, and deeply divergent between western and eastern Eurasia. 2) We identify hitherto genetically undescribed hunter-gatherers from the Middle Don region that contributed ancestry to the later Yamnaya steppe pastoralists; 3) The genetic impact of the Neolithic transition was highly distinct, east and west of a boundary zone extending from the Black Sea to the Baltic. Large-scale shifts in genetic ancestry occurred to the west of this “Great Divide”, including an almost complete replacement of hunter-gatherers in Denmark, while no substantial ancestry shifts took place during the same period to the east. This difference is also reflected in genetic relatedness within the populations, decreasing substantially in the west but not in the east where it remained high until c. 4,000 BP; 4) The second major genetic transformation around 5,000 BP happened at a much faster pace with Steppe-related ancestry reaching most parts of Europe within 1,000-years. Local Neolithic farmers admixed with incoming pastoralists in eastern, western, and southern Europe whereas Scandinavia experienced another near-complete population replacement. 
Similar dramatic turnover-patterns are evident in western Siberia; 5) Extensive regional differences in the ancestry components involved in these early events remain visible to this day, even within countries. Neolithic farmer ancestry is highest in southern and eastern England while Steppe-related ancestry is highest in the Celtic populations of Scotland, Wales, and Cornwall (this research has been conducted using the UK Biobank resource); 6) Shifts in diet, lifestyle and environment introduced new selection pressures involving at least 21 genomic regions. Most such variants were not universally selected across populations but were only advantageous in particular ancestral backgrounds. Contrary to previous claims, we find that selection on the FADS regions, associated with fatty acid metabolism, began before the Neolithisation of Europe. Similarly, the lactase persistence allele started increasing in frequency before the expansion of Steppe-related groups into Europe and has continued to increase up to the present. Along the genetic cline separating Mesolithic hunter-gatherers from Neolithic farmers, we find significant correlations with trait associations related to skin disorders, diet and lifestyle and mental health status, suggesting marked phenotypic differences between these groups with very different lifestyles. This work provides new insights into major transformations in recent human evolution, elucidating the complex interplay between selection and admixture that shaped patterns of genetic variation in modern populations.

There’s so much that I can’t really reduce it. Here are some highlights:

1 – New hunter-gatherer cluster with a focus in the eastern Ukraine/Russian border region, between the Dnieper and Don. Because I can barely read the admixture graph in extended figure 4, I’m not totally clear where this group is positioned in the graph, though it has some Caucasus hunter-gatherer ancestry.

2 – Neolithicization was pretty slow (demic) in most of Europe, except Scandinavia. We knew this. Steppe arrival was faster everywhere, but mixed with local Neolithic substrate…except in Scandinavia, where there was straight up replacement. But Scandinavians do have Neolithic ancestry…so where’s that from?

3 – The paper claims that the Corded Ware people mixed with Globular Amphora culture. I’m pretty sure if they looked closely all the South Asians with steppe ancestry will show this too, and not any other type of European Neolithic.

4 – Scandinavia seems to have had several replacements even after the arrival of the early Battle Axe people. This is clear in Y chromosome turnover, from R1a to R1b and finally to mostly I1, the dominant lineage now. They claim that later Viking and Norse ancestry is mostly from the last pulse during the Nordic Bronze Age.

5 – They claim it’s clear that Neolithic ancestry in North/Central/Eastern Europe was from Southeast Europe, while that in Western Europe was from Southwest Europe. This is expected.

6 – They confirm that in terms of polygenic prediction Yamnaya people were taller. They claim that it looks like N vs. S European differences in height aren’t selection, but stratification (Yamnaya predicts tallness).

7 – They find that dark hair and skin in Europeans seems correlated with WHG ancestry. This seems to confirm that the WHG were indeed dark of hair and skin. They find that lighter skin/hair really seems to come with Anatolian farmers and Yamnaya, not the hunter-gatherers, though selection does start earlier. They assert this has something to do with UV/Vitamin D, but if so, why were the HG groups dark (if blue-eyed in the case of WHG)? I think the explanation is some interaction with the agro-pastoralist lifestyle.

They also confirm that pigmentation selection went on until 3,000 years ago. This is obvious, and to me, it explains easily the heterogeneity in some CWC and post-CWC populations. Some of the early Bell Beakers in Britain look totally modern in pigmentation, but other populations are darker than they should be.

8 – Lots of selection in diet and immune system. What you’d expect. Basically a lot of illnesses might be due to the mixture of the various populations. For example, diabetes risk comes from WHG.

9 – Neolithic Anatolians seem associated with some psychiatric issues. Could this be due to early dense-living? No idea. Also, they find EDU was selected for (one locus). Might be pleiotropy though.

10 – They find the African R1b around Lake Chad in some Ukrainian samples. Seems to confirm that somehow it’s from Eastern Europe? Weird.

Anyway, read it and tell me what you think.

Population Pairwise Fst on 250,000 SNPs

People routinely ask me about a place to find pairwise Fst values. I have a dataset with 250,000 SNPs and 200 populations, and a script using plink that generates pairwise differences across populations. Here are two files with the results:

A file with the Fst values between populations in rows

A file with the Fst values between populations as a matrix
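If you want to see what a pairwise Fst table involves, here is a minimal pure-Python sketch. To be clear, this is not the plink script mentioned above: it assumes you already have per-SNP allele frequencies and the number of sampled chromosomes for each population, and it uses Hudson's estimator computed as a ratio of sums across SNPs. The population names and frequencies are invented toy values.

```python
from itertools import combinations

def hudson_fst(p1, n1, p2, n2):
    """Hudson's Fst estimator, computed as a ratio of sums across SNPs.

    p1, p2: per-SNP allele frequencies in each population.
    n1, n2: number of sampled chromosomes in each population.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else float("nan")

def pairwise_fst(freqs, sizes):
    """Return {(pop_a, pop_b): Fst} for every unordered pair of populations."""
    return {
        (a, b): hudson_fst(freqs[a], sizes[a], freqs[b], sizes[b])
        for a, b in combinations(sorted(freqs), 2)
    }

# Toy allele frequencies for three hypothetical populations at four SNPs.
freqs = {
    "PopA": [0.10, 0.50, 0.80, 0.30],
    "PopB": [0.12, 0.48, 0.79, 0.33],
    "PopC": [0.60, 0.10, 0.20, 0.90],
}
sizes = {"PopA": 40, "PopB": 40, "PopC": 40}

result = pairwise_fst(freqs, sizes)
```

The dictionary keyed by population pairs corresponds to the "rows" layout; turning it into a square matrix is just a matter of filling both (a, b) and (b, a). Estimates for very similar populations can come out slightly negative because of the sample-size correction.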

The humans of Wallacea


A new open-access paper, Genome of a middle Holocene hunter-gatherer from Wallacea:

Much remains unknown about the population history of early modern humans in southeast Asia, where the archaeological record is sparse and the tropical climate is inimical to the preservation of ancient human DNA. So far, only two low-coverage pre-Neolithic human genomes have been sequenced from this region. Both are from mainland Hòabìnhian hunter-gatherer sites: Pha Faen in Laos, dated to 7939–7751 calibrated years before present (yr cal BP; present taken as AD 1950), and Gua Cha in Malaysia (4.4–4.2 kyr cal BP). Here we report, to our knowledge, the first ancient human genome from Wallacea, the oceanic island zone between the Sunda Shelf (comprising mainland southeast Asia and the continental islands of western Indonesia) and Pleistocene Sahul (Australia–New Guinea). We extracted DNA from the petrous bone of a young female hunter-gatherer buried 7.3–7.2 kyr cal BP at the limestone cave of Leang Panninge in South Sulawesi, Indonesia. Genetic analyses show that this pre-Neolithic forager, who is associated with the ‘Toalean’ technocomplex shares most genetic drift and morphological similarities with present-day Papuan and Indigenous Australian groups, yet represents a previously unknown divergent human lineage that branched off around the time of the split between these populations approximately 37,000 years ago. We also describe Denisovan and deep Asian-related ancestries in the Leang Panninge genome, and infer their large-scale displacement from the region today.

The best model seems to be the one to the right: the new Wallacean hunter-gatherer has some ancestry deeply related to Australo-Melanesians, and, another proportion of its ancestry is deeply related to East Asians. In particular, the East Asian-related ancestry seems to be basal or deeply diverged from the paleo-Southern East Asian ancestry. There’s a lot in the structure of ancient East Asian populations that I think we’re pretty unclear about, and need more DNA to really understand what’s going on.

But, I do want to mention that in about 24 hours I’ll be posting a discussion I had with Max Larena about the Denisovan admixture in the Philippines on my Substack. It’ll be ungated in a few weeks.

Max, and this paper, convince me that Peter Bellwood’s simple model of the spread of farming into Southeast Asia ~4,000 years ago is probably wrong on some level. Too bad, it was a nice simple story. Basically, Northeast Asian populations may have had a presence further south far earlier, and they may have been hunter-gatherers initially.

Complex history of archaic ancestry

On the Apportionment of Archaic Human Diversity:

The apportionment of human genetic diversity within and between populations has been measured to understand human relatedness and demographic history. Likewise, the distribution of archaic ancestry in modern populations can be leveraged to better understand the interaction between our species and its archaic relatives, and the impact of natural selection on archaic segments of the human genome. Resolving these interactions can be difficult, as archaic variants in modern populations have also been shaped by genetic drift, bottlenecks, and gene flow. Here, we investigate the apportionment of archaic variation in Eurasian populations. We find that archaic genome coverage at the individual- and population-level present unique patterns in modern human population: South Asians have an elevated count of population-unique archaic SNPs, and Europeans and East Asians have a higher degree of archaic SNP sharing, indicating that population demography and archaic admixture events had distinct effects in these populations. We confirm previous observations that East Asians have more Neanderthal ancestry than Europeans at an individual level, but surprisingly Europeans have more Neandertal ancestry at a population level. In comparing these results to our simulated models, we conclude that these patterns likely reflect a complex series of interactions between modern humans and archaic populations.

The method is pretty neat. Read this closely. Here are some takeaways:

– European Neanderthal ancestry is lower than East Asian, but more diverse

– South Asians clearly have different Denisovan ancestry than East Asians

– Population structure matters…South Asian rare allele frequency is due to admixture between divergent groups

Basically, Neanderthal and Denisovan admixture is more complex than our simple stylized models.
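The individual-level versus population-level distinction is easy to lose, so here is a toy sketch of it. This is my own illustration, not the paper's method: each person is represented as a set of genomic window IDs carrying archaic ancestry, and all of the data are made up.

```python
def archaic_summary(individual_segments):
    """Summarise archaic coverage for one population.

    individual_segments: list of sets, one per person, each holding the IDs
    of genomic windows that carry archaic ancestry in that person.
    Returns (mean windows per individual, windows covered by anyone).
    """
    per_person = [len(s) for s in individual_segments]
    mean_individual = sum(per_person) / len(per_person)
    population_union = set().union(*individual_segments)
    return mean_individual, len(population_union)

# Made-up data: in pop_x people carry more archaic windows each, but mostly
# the same ones; in pop_y people carry fewer windows each, but different ones.
pop_x = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}]
pop_y = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]

x_individual, x_population = archaic_summary(pop_x)
y_individual, y_population = archaic_summary(pop_y)
```

Here pop_x wins on the per-person average while pop_y covers more of the genome as a population, which is the shape of the East Asian versus European result in the abstract.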

Natural selection caught in the act

Analysis of genomic DNA from medieval plague victims suggests long-term effect of Yersinia pestis on human immunity genes:

Pathogens and associated outbreaks of infectious disease exert selective pressure on human populations, and any changes in allele frequencies that result may be especially evident for genes involved in immunity. In this regard, the 1346-1353 Yersinia pestis-caused Black Death pandemic, with continued plague outbreaks spanning several hundred years, is one of the most devastating recorded in human history. To investigate the potential impact of Y. pestis on human immunity genes we extracted DNA from 36 plague victims buried in a mass grave in Ellwangen, Germany in the 16th century. We targeted 488 immune-related genes, including HLA, using a novel in-solution hybridization capture approach. In comparison with 50 modern native inhabitants of Ellwangen, we find differences in allele frequencies for variants of the innate immunity proteins Ficolin-2 and NLRP14 at sites involved in determining specificity. We also observed that HLA-DRB1*13 is more than twice as frequent in the modern population, whereas HLA-B alleles encoding an isoleucine at position 80 (I-80+), HLA C*06:02 and HLA-DPB1 alleles encoding histidine at position 9 are half as frequent in the modern population. Simulations show that natural selection has likely driven these allele frequency changes. Thus, our data suggests that allele frequencies of HLA genes involved in innate and adaptive immunity responsible for extracellular and intracellular responses to pathogenic bacteria, such as Y. pestis, could have been affected by the historical epidemics that occurred in Europe.

This isn’t surprising. But now that old DNA studies are getting cheap and mass-produced, I think people will be looking at changes in allele frequencies in the last 2,000 years a lot. More sophisticated methods for detecting natural selection either conclude or imply that sweeps are happening now, but this sort of study will confirm it (there’s evidence of natural selection in American Indians for obvious and unfortunate reasons).
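For a sense of scale, here is a rough back-of-the-envelope sketch of the selection strength implied by an allele doubling in frequency over a few centuries. It assumes a deterministic genic-selection model in which the odds p/(1-p) are multiplied by (1+s) each generation, and it ignores drift and sampling error, which the paper handles with simulations. The starting and ending frequencies and the generation count are hypothetical, not values from the study.

```python
def implied_selection_coefficient(p_start, p_end, generations):
    """Per-generation s implied by a frequency change under genic selection.

    Assumes the odds p / (1 - p) are multiplied by (1 + s) every generation;
    drift and sampling noise are ignored.
    """
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    return (odds_end / odds_start) ** (1.0 / generations) - 1.0

def project_frequency(p_start, s, generations):
    """Forward projection of the allele frequency under the same model."""
    odds = p_start / (1.0 - p_start) * (1.0 + s) ** generations
    return odds / (1.0 + odds)

# Hypothetical inputs: an allele going from 5% to 10% over 18 generations
# (about 450 years at an assumed 25 years per generation).
s = implied_selection_coefficient(0.05, 0.10, 18)
check = project_frequency(0.05, s, 18)
```

Under these assumptions a doubling over that span needs a selection coefficient of a few percent per generation, which is strong by population genetic standards.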

Lewontin’s Paradox in the 21st century

Why do species get a thin slice of π? Revisiting Lewontin’s Paradox of Variation:

Under neutral theory, the level of polymorphism in an equilibrium population is expected to increase with population size. However, observed levels of diversity across metazoans vary only two orders of magnitude, while census population sizes (Nc) are expected to vary over several. This unexpectedly narrow range of diversity is a longstanding enigma in evolutionary genetics known as Lewontin’s Paradox of Variation (1974). Since Lewontin’s observation, it has been argued that selection constrains diversity across species, yet tests of this hypothesis seem to fall short of explaining the orders-of-magnitude reduction in diversity observed in nature. In this work, I revisit Lewontin’s Paradox and assess whether current models of linked selection are likely to constrain diversity to this extent. To quantify the discrepancy between pairwise diversity and census population sizes across species, I combine genetic data from 172 metazoan taxa with estimates of census sizes from geographic occurrence data and population densities estimated from body mass. Next, I fit the relationship between previously-published estimates of genomic diversity and these approximate census sizes to quantify Lewontin’s Paradox. While previous across-taxa population genetic studies have avoided accounting for phylogenetic non-independence, I use phylogenetic comparative methods to investigate the diversity census size relationship, estimate phylogenetic signal, and explore how diversity changes along the phylogeny. I consider whether the reduction in diversity predicted by models of recurrent hitchhiking and background selection could explain the observed pattern of diversity across species. Since the impact of linked selection is mediated by recombination map length, I also investigate how map lengths vary with census sizes. I find species with large census sizes have shorter map lengths, leading these species to experience greater reductions in diversity due to linked selection. 
Even after using high estimates of the strength of sweeps and background selection, I find linked selection likely cannot explain the shortfall between predicted and observed diversity levels across metazoan species. Furthermore, the predicted diversity under linked selection does not fit the observed diversity–census-size relationship, implying that processes other than background selection and recurrent hitchhiking must be limiting diversity.
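The paradox is easy to see with the neutral expectation for a diploid population, pi = 4 * Ne * mu. The sketch below is only an illustration of the mismatch, not the paper's analysis: the mutation rate, census sizes and diversity values are assumed round numbers.

```python
import math

def expected_pi(n_e, mu):
    """Neutral expectation for pairwise diversity in a diploid: 4 * Ne * mu."""
    return 4.0 * n_e * mu

def implied_ne(pi, mu):
    """Effective size that would produce the observed diversity."""
    return pi / (4.0 * mu)

MU = 1e-8  # assumed per-site, per-generation mutation rate (illustrative)

# Hypothetical census sizes spanning ten orders of magnitude.
census_sizes = [1e4, 1e8, 1e14]
# Naive prediction if Ne equalled census size. The largest value exceeds 1,
# which is impossible for a per-site diversity and shows the naive model fails.
naive_pi = [expected_pi(n, MU) for n in census_sizes]

# Observed diversity spanning two orders of magnitude (illustrative values).
observed_pi = [0.0005, 0.005, 0.05]
ne_from_pi = [implied_ne(pi, MU) for pi in observed_pi]

census_span = math.log10(census_sizes[-1] / census_sizes[0])
ne_span = math.log10(ne_from_pi[-1] / ne_from_pi[0])
```

With these inputs census size spans ten orders of magnitude while the effective size implied by diversity spans only two. That gap is what linked selection is asked to explain and, per the abstract, apparently cannot.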

Assessing the utility of models in ancient DNA admixture analyses

Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture:

qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model. Although qpAdm is widely used in studies of population history of human (and non-human) groups, relatively little has been done to assess its performance. We performed a simulation study to assess the behavior of qpAdm under various scenarios in order to identify areas of potential weakness and establish recommended best practices for use. We find that qpAdm is a robust tool that yields accurate results in many cases, including when data coverage is low, there are high rates of missing data or ancient DNA damage, or when diploid calls cannot be made. However, we caution against co-analyzing ancient and present-day data, the inclusion of an extremely large number of reference populations in a single model, and analyzing population histories involving extended periods of gene flow. We provide a user guide suggesting best practices for the use of qpAdm.

The Reich lab provides its software and data. It’s really not that hard to replicate and tweak some of the analyses they do in their papers (check the supplements for the detailed specifications of the parameters). I’ve done this many times when I got curious about a detail they hadn’t explored.

The preprint above is a valuable addition to the intuitions one can develop through using the packages.
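For readers who haven't used these tools, the core idea can be sketched with a single f4 ratio. This is emphatically not qpAdm itself, which works with many reference populations at once, tests model fit, and reports standard errors. It is just the algebra behind it: if the target's allele frequencies are a mix of two sources, then f4(target, source2; ref1, ref2) = alpha * f4(source1, source2; ref1, ref2). All frequencies below are invented.

```python
def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d) on allele frequencies."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)

def mixture_proportion(target, source1, source2, ref1, ref2):
    """Estimate the fraction of `target` ancestry that comes from `source1`.

    Relies on target = alpha * source1 + (1 - alpha) * source2, which gives
    f4(target, source2; ref1, ref2) = alpha * f4(source1, source2; ref1, ref2).
    """
    return f4(target, source2, ref1, ref2) / f4(source1, source2, ref1, ref2)

# Made-up allele frequencies at five SNPs.
source1 = [0.10, 0.80, 0.30, 0.60, 0.20]
source2 = [0.70, 0.20, 0.50, 0.10, 0.90]
ref1 = [0.15, 0.75, 0.35, 0.55, 0.25]  # reference that resembles source1
ref2 = [0.50, 0.50, 0.50, 0.50, 0.50]  # reference equally distant from both

true_alpha = 0.3
target = [true_alpha * s1 + (1 - true_alpha) * s2
          for s1, s2 in zip(source1, source2)]

alpha_hat = mixture_proportion(target, source1, source2, ref1, ref2)
```

In this noise-free toy case the estimate recovers the mixing proportion used to build the target. The references only help if they are differentially related to the two sources; otherwise the denominator goes to zero.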

If marrying cousins is so bad why does everyone want to marry their cousins?

The above figure illustrates the geographic distribution of the prevalence of people marrying people closely related to them. Mostly this involves cousin marriage. Most people know the urban legends around the debilities that occur due to cousin marriage, but traditionally the focus has been on rare recessive diseases (e.g., albinism). Now, a massive new study has been published (more than 400 authors, with sample sizes of 1 million or more for some characteristics) looking at a variety of traits, Associations of autozygosity with a broad range of human phenotypes:

In many species, the offspring of related parents suffer reduced reproductive success, a phenomenon known as inbreeding depression. In humans, the importance of this effect has remained unclear, partly because reproduction between close relatives is both rare and frequently associated with confounding social factors. Here, using genomic inbreeding coefficients (FROH) for >1.4 million individuals, we show that FROH is significantly associated (p < 0.0005) with apparently deleterious changes in 32 out of 100 traits analysed. These changes are associated with runs of homozygosity (ROH), but not with common variant homozygosity, suggesting that genetic variants associated with inbreeding depression are predominantly rare. The effect on fertility is striking: FROH equivalent to the offspring of first cousins is associated with a 55% decrease [95% CI 44–66%] in the odds of having children. Finally, the effects of FROH are confirmed within full-sibling pairs, where the variation in FROH is independent of all environmental confounding.

The offspring of first cousins have on average 0.10 fewer children. On an individual level, this is not that great of an effect. But in an evolutionary population genetics sense this is a serious selection coefficient.
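Here is the arithmetic behind that claim as a small sketch. The inbreeding coefficient uses Wright's standard path formula. The conversion to a selection coefficient requires a mean family size, which is not given here, so the value of 2 children below is purely my assumption.

```python
from fractions import Fraction

def inbreeding_coefficient(paths):
    """Wright's path formula: F = sum over common ancestors of (1/2)^n.

    Each entry in `paths` is the number of individuals in the loop running
    from one parent up to a common ancestor and back down to the other
    parent. The common ancestors are assumed not to be inbred themselves.
    """
    return sum(Fraction(1, 2) ** n for n in paths)

def selection_coefficient(children_lost, mean_children):
    """Fitness cost expressed relative to the mean number of children."""
    return children_lost / mean_children

# Offspring of first cousins: two shared grandparents, and each loop
# (cousin, parent, grandparent, parent, cousin) contains 5 individuals.
f_first_cousins = inbreeding_coefficient([5, 5])

# 0.10 fewer children is the figure quoted above; the mean of 2.0 children
# is an assumed value for illustration only.
s = selection_coefficient(0.10, 2.0)
```

The path formula gives F = 1/16 for the offspring of first cousins. With the assumed family size the fitness cost works out to 5%, far larger than the selection coefficients usually invoked for individual variants.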

On the whole, the paper is impressive in its scope. There are even sibling analyses to confirm the impact of runs of homozygosity causing problems due to rare alleles (since this paper involved r.o.h, of course, Jim Wilson is involved!).

But I want to ask: if inbreeding is so bad genetically and biologically, why is it so common? One of the consequences of the Protestant Reformation is that the Roman Catholic Church’s strict enforcement of consanguinity rules was dropped, and cousin marriage became much more common among elites (such as the Darwin-Wedgwood family). The material rationale for cousin marriage is actually rather straightforward, in that it keeps accumulated property and power within the extended lineage. Marriages between children of brothers may cement alliances, while matrilocality and marriages between cross-cousins in South India have been associated with lower domestic abuse rates (in contrast, in North India strongly enforced exogamy has been associated with the idea that women marry into an alien household).

I would suggest perhaps that though marriages between relatives are biologically disfavored, there are many cases where they are culturally beneficial. In societies where collective family units engage in inter-group competition, some level of consanguinity may benefit cohesion. Other societies where individualism is more operative may exhibit no such incentives.

Note: I don’t see great evidence of purging genetic load in populations with more inbreeding. The rare variants are probably replenished constantly through mutation?