But why is the lactase persistence allele not in HWE?

Dairying, diseases and the evolution of lactase persistence in Europe:

In European and many African, Middle Eastern and southern Asian populations, lactase persistence (LP) is the most strongly selected monogenic trait to have evolved over the past 10,000 years1. Although the selection of LP and the consumption of prehistoric milk must be linked, considerable uncertainty remains concerning their spatiotemporal configuration and specific interactions2,3. Here we provide detailed distributions of milk exploitation across Europe over the past 9,000 years using around 7,000 pottery fat residues from more than 550 archaeological sites. European milk use was widespread from the Neolithic period onwards but varied spatially and temporally in intensity. Notably, LP selection varying with levels of prehistoric milk exploitation is no better at explaining LP allele frequency trajectories than uniform selection since the Neolithic period. In the UK Biobank4,5 cohort of 500,000 contemporary Europeans, LP genotype was only weakly associated with milk consumption and did not show consistent associations with improved fitness or health indicators. This suggests that other reasons for the beneficial effects of LP should be considered for its rapid frequency increase. We propose that lactase non-persistent individuals consumed milk when it became available but, under conditions of famine and/or increased pathogen exposure, this was disadvantageous, driving LP selection in prehistoric Europe. Comparison of model likelihoods indicates that population fluctuations, settlement density and wild animal exploitation—proxies for these drivers—provide better explanations of LP selection than the extent of milk exploitation. These findings offer new perspectives on prehistoric milk exploitation and LP evolution.

Two issues

1) Doesn’t seem to explain why LP started becoming common in Britain before the continent

2) Why are the alleles not in HWE? There’s not really any assortative mating.
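For anyone who wants to check the HWE question on their own data, here is a minimal sketch of the standard one-degree-of-freedom chi-square test. The function name and the genotype counts are made up for illustration; they are not from the paper or the UK Biobank.

```python
from math import erfc, sqrt


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test (1 d.f.) of genotype counts against Hardy-Weinberg expectations.

    Returns (allele frequency of A, chi-square statistic, p-value).
    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("locus is monomorphic; HWE test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return p, chi2, p_value


# Hypothetical counts with a deficit of heterozygotes.
freq, chi2, p_value = hwe_chi_square(40, 20, 40)
print(f"p(A) = {freq:.2f}, chi2 = {chi2:.2f}, p-value = {p_value:.3g}")
```

With small counts an exact test is the safer choice; the chi-square version is just the quickest way to see whether heterozygotes are over- or under-represented.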

Back migration into Africa by Eurasians

Two preprints/papers.

Identifying and Interpreting Apparent Neanderthal Ancestry in African Individuals:

Admixture has played a prominent role in shaping patterns of human genomic variation, including gene flow with now-extinct hominins like Neanderthals and Denisovans. Here, we describe a novel probabilistic method called IBDmix to identify introgressed hominin sequences, which, unlike existing approaches, does not use a modern reference population. We applied IBDmix to 2,504 individuals from geographically diverse populations to identify and analyze Neanderthal sequences segregating in modern humans. Strikingly, we find that African individuals carry a stronger signal of Neanderthal ancestry than previously thought. We show that this can be explained by genuine Neanderthal ancestry due to migrations back to Africa, predominately from ancestral Europeans, and gene flow into Neanderthals from an early dispersing group of humans out of Africa. Our results refine our understanding of Neanderthal ancestry in African and non-African populations and demonstrate that remnants of Neanderthal genomes survive in every modern human population studied to date.

Basically, this paper concludes that Eurasian back-migration, from a source related to Europeans/West Asians, accounts for around 30% of Sub-Saharan African ancestry. Sub-Saharan Africans carry about 30% as much Neanderthal ancestry as Eurasians.
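The logic tying those two 30% figures together is simple dilution. Here is a rough sketch, assuming Neanderthal ancestry reached Africans only through the back-migrating population. The 2% Neanderthal fraction for Eurasians is a round number I am assuming for illustration, not a figure from the paper.

```python
def diluted_archaic_fraction(source_archaic_fraction, admixture_proportion):
    """Archaic ancestry in a recipient population that got it only via admixture."""
    return source_archaic_fraction * admixture_proportion


eurasian_neanderthal = 0.02   # assumed round figure for Eurasians
back_migration_share = 0.30   # the roughly 30% back-migration discussed above

african_neanderthal = diluted_archaic_fraction(eurasian_neanderthal, back_migration_share)
ratio = african_neanderthal / eurasian_neanderthal
print(f"expected Neanderthal fraction: {african_neanderthal:.4f}")
print(f"relative to Eurasians: {ratio:.2f}")
```

Under this simple model the ratio of African to Eurasian Neanderthal ancestry is just the admixture proportion, whatever the Eurasian starting value is.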

Then, a preprint that uses a pretty sophisticated method, Ancient Admixture into Africa from the ancestors of non-Africans:

Genetic diversity across human populations has been shaped by demographic history, making it possible to infer past demographic events from extant genomes. However, demographic inference in the ancient past is difficult, particularly around the out-of-Africa event in the Late Middle Paleolithic, a period of profound importance to our species’ history. Here we present SMCSMC, a Bayesian method for inference of time-varying population sizes and directional migration rates under the coalescent-with-recombination model, to study ancient demographic events. We find evidence for substantial migration from the ancestors of present-day Eurasians into African groups between 40 and 70 thousand years ago, predating the divergence of Eastern and Western Eurasian lineages. This event accounts for previously unexplained genetic diversity in African populations and supports the existence of novel population substructure in the Late Middle Paleolithic. Our results indicate that our species’ demographic history around the out-of-Africa event is more complex than previously appreciated.

This paper estimates 35-40% back-migration from the ancestral proto-Eurasian population, with less (~20%) in African hunter-gatherers. This paper didn’t detect Neanderthal ancestry and argues that the back-migration predates the West vs East Eurasian split. It plausibly argues African effective population sizes are inflated by the admixture event.

The two results clearly contradict each other in the details.

The genetics of Southeast Asia gets more complex…

Ancient genomes from the last three millennia support multiple human dispersals into Wallacea:

Previous research indicates that the human genetic diversity found in Wallacea – islands in present-day Eastern Indonesia and Timor-Leste that were never part of the Sunda or Sahul continental shelves – has been shaped by complex interactions between migrating Austronesian farmers and indigenous hunter-gatherer communities. Here, we provide new insights into this region’s demographic history based on genome-wide data from 16 ancient individuals (2600-250 yrs BP) from islands of the North Moluccas, Sulawesi, and East Nusa Tenggara. While the ancestry of individuals from the northern islands fit earlier views of contact between groups related to the Austronesian expansion and the first colonization of Sahul, the ancestry of individuals from the southern islands revealed additional contributions from Mainland Southeast Asia, which seems to predate the Austronesian admixture in the region. Admixture time estimates for the oldest individuals of Wallacea are closer to archaeological estimates for the Austronesian arrival into the region than are admixture time estimates for present-day groups. The decreasing trend in admixture times exhibited by younger individuals supports a scenario of multiple or continuous admixture involving Papuan- and Asian-related groups. Our results clarify previously debated times of admixture and suggest that the Neolithic dispersals into Island Southeast Asia are associated with the spread of multiple genetic ancestries.

This paper is hard to parse, but here are my takeaways:

– the samples in this study do not seem particularly closely related to the 7,000-year-old sample from Sulawesi

– there was likely an earlier mainland migration into Wallacea of Austro-Asiatic speaking people

– gene flow seems to have been recurrent from the east, in western Melanesia, as well as from Austronesians to the northeast

Yemen and the Yemeni Jews


In my Substack post Under pressure: the paradox of the diamond I said this:

The implication of these DNA results is that Yemeni Jews are by and large descended from natives of this region of Arabia. They are converts, and their genetic uniqueness is a function of their isolation from demographic currents that swept across Arabia with the rise of Islam. The Yemenis of the highlands, isolated by geography, show the same genetic signature of isolation, as they descend solely from the original inhabitants of the region. This is the nth demonstration that culture and geography are both powerful factors driving genetic distinctiveness.

Some people objected, or inquired further, as to why I said this. From High-resolution inference of genetic relationships among Jewish populations:

Four Jewish populations included in the study—Ethiopian Jews, Indian Jews from Cochin, Indian Jews from Mumbai, and Yemenite Jews—are considered to be culturally distinct and not part of the Ashkenazi, Mizrahi, North African, or Sephardi groups; they are therefore not analyzed in sets…

…Figure 1b reveals a distinctive position for the Yemenite Jewish samples in relation to other Jewish populations…

…The resulting MDS plot (Fig. 1c) places the Yemenite Jews near Bedouin, Saudi Arabian, and Yemenite non-Jewish populations…

…Jewish populations have mixed membership in the two clusters, with the exception of the Yemenite Jews, who are placed primarily in the main cluster among Middle Eastern populations. For K = 3, the third cluster (dark blue) separates the Mozabite and Moroccan populations. Non-Jewish populations from the Levant generally have substantial membership in this cluster, as do North African and Yemenite Jews.

For K = 6, Yemenite Jews have relatively high membership in the new cluster, which also has substantial membership from Middle Eastern populations such as Bedouins and Saudi Arabians (pink)…

We further reduced the population set, exploring structure among Jewish populations, continuing to exclude Ethiopian and Indian Jews, and also excluding the relatively dissimilar Yemenite Jews (population set 4)…

You can look at the plots above. I also added some of my own after adding the Vyas et al. Yemen samples (warning, only 7,000 SNP intersection!). Using my own Fst, PCA, and TreeMix, I think it’s possible that the modern Yemenis aren’t related to ancient Yemenis, but Yemeni Jews clearly cluster with modern Arabian populations.

What does the three-population test say? You can look here, but Yemeni Jews don’t show a significant deviation from a three-population phylogeny when they’re an outgroup with the populations I have. That means with my particular model they’re probably best thought of as an ancient Arabian population without much gene flow from external sources (they don’t have much African admixture, unlike other Yemenis).
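For readers who haven’t run one, here is a bare-bones sketch of what a three-population (f3) test computes. It takes per-SNP allele frequencies for a target C and two references A and B, averages (c - a)(c - b), and gets a Z-score from a block jackknife. This is my own simplified version with made-up function names. It skips the finite-sample correction that real software applies, and it treats equal-sized runs of consecutive SNPs as the jackknife blocks.

```python
from math import sqrt


def f3(c, a, b):
    """Mean of (c - a) * (c - b) across SNPs; inputs are lists of allele frequencies."""
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


def f3_with_z(c, a, b, n_blocks=10):
    """Return (f3 estimate, Z-score) using a delete-one-block jackknife.

    SNPs are assumed to be in genomic order; trailing SNPs that do not
    fill a whole block are dropped.
    """
    size = len(c) // n_blocks
    if size == 0 or n_blocks < 2:
        raise ValueError("need at least one SNP per block and two blocks")
    n_used = size * n_blocks
    c, a, b = list(c[:n_used]), list(a[:n_used]), list(b[:n_used])
    full = f3(c, a, b)
    estimates = []
    for k in range(n_blocks):
        lo, hi = k * size, (k + 1) * size
        estimates.append(f3(c[:lo] + c[hi:], a[:lo] + a[hi:], b[:lo] + b[hi:]))
    mean = sum(estimates) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    se = sqrt(variance)
    z = full / se if se > 0 else float("nan")
    return full, z


# Toy data: the target is an exact 50/50 mix of the two references.
ref_a = [(i % 7) / 7 for i in range(200)]
ref_b = [(i % 11) / 11 for i in range(200)]
target = [0.5 * x + 0.5 * y for x, y in zip(ref_a, ref_b)]
estimate, z_score = f3_with_z(target, ref_a, ref_b)
print(f"f3 = {estimate:.4f}, Z = {z_score:.1f}")
```

A significantly negative f3 is evidence that the target is a mixture of populations related to A and B. A non-negative value, like the one I get for Yemeni Jews, does not prove there was no admixture; it only means this test doesn’t detect any with the references at hand.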

If you want to see the alternative, please read Mitochondrial DNA reveals distinct evolutionary histories for Jewish populations in Yemen and Ethiopia. I’m not spending any more time on this.

So many assumptions about Africa


I have been staring at this figure and rereading Ancient West African foragers in the context of African population history. The Shum Laka samples from this paper, dating to four to eight thousand years ago, have drawn my attention, and I’m just looking at them a lot.

It seems ridiculous I’ve been using Nigerians as my “African reference” for decades. Most African populations, including Pygmies and Khoisan, have Eurasian admixture from the last 10,000 years. And what about deeper back-to-Africa ancestry? That seems likely and is hinted at in the above paper.

Modern human lineages have a deep history in Africa and the Near East. I think we’re going to have a transformation of our understanding of what happened in these regions in the near future.

Got milk long before genes for milk


The story of lactase persistence (“lactose tolerance”) evolving is one of the best gene-culture coevolution stories we had. Arguably it was the canonical example. The story was simple: multiple times humans took up dairy-culture, and multiple times humans changed so that they could digest lactose, milk sugar, into adulthood. Lactose accounts for about 30% of the calories in raw milk (the rest being fat and protein). For some people, gut flora react badly to the sugar bath if it’s not digested, leading to discomfort in addition to wasted calories.
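The 30% figure is easy to sanity-check with back-of-the-envelope arithmetic. The composition numbers below are approximate round values for whole cow’s milk that I am assuming for illustration, combined with the standard Atwater factors of 4, 9 and 4 kcal per gram.

```python
# Approximate grams per 100 g of whole cow's milk (assumed round values).
GRAMS_PER_100G = {"lactose": 4.8, "fat": 3.3, "protein": 3.2}
# Atwater general factors, kcal per gram.
KCAL_PER_GRAM = {"lactose": 4.0, "fat": 9.0, "protein": 4.0}


def calorie_shares(grams, kcal_per_gram):
    """Fraction of total calories contributed by each macronutrient."""
    kcal = {name: grams[name] * kcal_per_gram[name] for name in grams}
    total = sum(kcal.values())
    return {name: value / total for name, value in kcal.items()}


shares = calorie_shares(GRAMS_PER_100G, KCAL_PER_GRAM)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

With these inputs lactose comes out at a bit under a third of the calories, so someone who can’t digest it is leaving a meaningful chunk of the milk’s energy on the table.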

In the 2000s several mutations were discovered around LCT, the gene responsible for producing lactase, which breaks down lactose. One mutation was found across Europe and Central Asia. Another among the Arabs. And another in East Africa. The “mutational target” was big. The mutation in the European and Central Asian variant breaks a regulatory element that represses the expression of LCT in adults. There are lots of ways to break something. Lactase persistence isn’t really a gain of function, it’s just never shutting off the function, which itself is a feature, not a bug.

The haplotype around LCT is long and indicative of a really strong sweep in Europeans. It was in some ways a positive control for tests of selection.

The trouble is that there are now major problems with this narrative. In short, dairy-culture predates the increase in frequency for lactase persistence alleles by thousands of years. The ancient DNA transects in Europe are so good that it seems pretty clear that the frequency was way lower during the Iron Age, and didn’t reach “modern” levels until the historical period.

The same is now known to be true in Africa: Humans were drinking milk before they could digest it.

This doesn’t mean that these mutations have nothing to do with milk. But there needs to be a rethink of the selection story. Perhaps there was a genetic modifier that spread recently which isn’t a big mutational target, and that’s why the lactose digestion alleles rose in the last 3,000 years? I don’t know. No one really does.
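To get a feel for how strong selection has to be for a rise that recent, here is a sketch under the simplest possible model. It assumes genic (haploid-style) selection, where the odds p/(1-p) get multiplied by (1+s) every generation. Lactase persistence is dominant, so treat this as a rough guide only. The start and end frequencies and the generation time are placeholders I made up, not estimates from any paper.

```python
from math import exp, log


def logit(p):
    """Log-odds of an allele frequency."""
    return log(p / (1 - p))


def implied_selection_coefficient(p_start, p_end, generations):
    """Selection coefficient s that moves p_start to p_end under genic selection.

    Uses logit(p_t) = logit(p_0) + t * ln(1 + s).
    """
    return exp((logit(p_end) - logit(p_start)) / generations) - 1


def simulate(p, s, generations):
    """Iterate the deterministic recursion p' = p(1 + s) / (1 + s * p)."""
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
    return p


# Placeholder scenario: 5% to 70% over 3,000 years at 28 years per generation.
generations = round(3000 / 28)
s = implied_selection_coefficient(0.05, 0.70, generations)
print(f"{generations} generations, implied s = {s:.3f}")
print(f"check by simulation: {simulate(0.05, s, generations):.3f}")
```

The point of the exercise is that compressing the rise into a few thousand years pushes the implied selection coefficient up, which makes the question of what was actually being selected even more pressing.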

What was the population of the Americas in 1492?

Several people have asked me about the new study on ancient DNA in the Caribbean, A genetic history of the pre-contact Caribbean. There is a lot to this paper, some of which is outside of my purview (e.g., I don’t know anything about the archaeology of this region so can’t interpret the genetic results well). One of the major things they did was establish patterns of relatedness. This seems like a major step forward in terms of future applicability to ancient DNA.

But the biggest thing that jumped out at me had to do with effective population size. Carl Zimmer’s write-up highlights this issue:

The genetic variations also allowed Dr. Reich and his colleague to estimate the size of the Caribbean society before European contact. Christopher Columbus’s brother Bartholomew sent letters back to Spain putting the figure in the millions. The DNA suggests that was an exaggeration: the genetic variations imply that the total population was as low as the tens of thousands.

This matters because it starts to change our sense of revisionism (now orthodox?) in books such as 1491: New Revelations of the Americas Before Columbus. To explain the small numbers of indigenous people in the Caribbean by the 16th century, the hypothesis has been that there were mass die-offs due to disease, or that the Spanish were inordinately cruel (“The Black Legend”). These results suggest that the scale of the pandemic shock was less of an issue, since the baseline number of native peoples in the area was lower.

What does this imply for the rest of the New World? I don’t know. But perhaps the huge census sizes argued for by some scholars won’t hold? It probably depends on the region. But with enough ancient DNA, the same sort of analyses could be replicated.

The Greeks in the mountains

The New Yorker has a long feature that explores the strange results from the paper last year, Ancient DNA from the skeletons of Roopkund Lake reveals Mediterranean migrants in India. Basically, they found a bunch of Indians who died 1,000 years ago, and a bunch of Greeks who died a few centuries ago. They were buried naturally in a very isolated lake high in the Himalayas. There are all sorts of hypotheses regarding the Greeks, whose bones indicate a Mediterranean diet and whose closest genetic match is to individuals from Crete. My personal experience is that “mainland Greeks” tend to be a bit Northern European shifted, so these individuals may have been Anatolian or Aegean Greeks.

Stuart Fiedel, who sometimes comments on this weblog, suggests these were Armenian traders. But David Reich correctly points out Armenians are very distinct genetically from Greeks (though the two are not entirely different obviously!). Another hypothesis is a bone mix-up, but the issue here is there are a lot of individuals who are of the same population and seem to have lived in the same region. How could bone mix-ups produce so many systematic errors?

Ultimately there’s no final answer in the piece, though hopefully, someone will present a reasonable conjecture.

Because the piece has Reich and his lab spotlighted, they allude to the controversy around him. This is ultimately going to be the legacy of the hit-piece from a few years back. He’s now a “controversial figure,” which is, to be frank, not a bad thing in the eyes of some of the Reich lab’s scientific rivals. Most media treatments that aren’t purely about his research (i.e., Carl Zimmer’s column in The New York Times covering the Reich lab publications) will mention this now.

Here’s why he’s a mensch:

Still, some anthropologists, social scientists, and even geneticists are deeply uncomfortable with any research that explores the hereditary differences among populations. Reich is insistent that race is an artificial category rather than a biological one, but maintains that “substantial differences across populations” exist. He thinks that it’s not unreasonable to investigate those differences scientifically, although he doesn’t undertake such research himself. “Whether we like it or not, people are measuring average differences among groups,” he said. “We need to be able to talk about these differences clearly, whatever they may be. Denying the possibility of substantial differences is not for us to do, given the scientific reality we live in.”

This, in 2020, is an old-fashioned view. There are now young American researchers who frankly express disquiet and discomfort at the idea of studying human population genetic variation, period. This includes people who themselves have studied topics such as polygenic adaptation in humans. This would be a very strange view for older researchers, but it’s not totally out of the norm today, so expect someone like Reich to be viewed as quite the dinosaur in a decade. It seems ridiculous to say, but I do wonder if we’re seeing the end of the “humans as a model organism” era. Lots of people are not happy with the new atmosphere, but lots of people just keep quiet and go along.

Whole genomes of ancient farmers and hunter-gatherers

A new preprint uses about a dozen ancient genomes to create a model of the origins of Europeans and European farmers more precisely. The big deal here is that they aren’t relying on the same old SNP-array, but using the whole genome. This allows for some more explicit model-building and testing. I do think explicit model creation is something that needs to be done. A lot of the work today is data-first, and there needs to be more “theory”.

The mixed genetic origin of the first farmers of Europe:

While the Neolithic expansion in Europe is well described archaeologically, the genetic origins of European first farmers and their affinities with local hunter-gatherers (HGs) remain unclear. To infer the demographic history of these populations, the genomes of 15 ancient individuals located between Western Anatolia and Southern Germany were sequenced to high quality, allowing us to perform population genomics analyses formerly restricted to modern genomes. We find that all European and Anatolian early farmers descend from the merging of a European and a Near Eastern group of HGs, possibly in the Near East, shortly after the Last Glacial Maximum (LGM). Western and Southeastern European HG are shown to split during the LGM, and share signals of a very strong LGM bottleneck that drastically reduced their genetic diversity. Early Neolithic Central Anatolians seem only indirectly related to ancestors of European farmers, who probably originated in the Near East and dispersed later on from the Aegean along the Danubian corridor following a stepwise demic process with only limited (2-6%) but additive input from local HGs. Our analyses provide a time frame and resolve the genetic origins of early European farmers. They highlight the impact of Late Pleistocene climatic fluctuations that caused the fragmentation, merging and reexpansion of human populations in SW Asia and Europe, and eventually led to the world’s first agricultural populations.

The supplements are worth reading too. It’s all there.

No mention of Basal Eurasians. The last author told me on Twitter that they weren’t needed, but Iosif Lazaridis (also on Twitter) disagrees, naturally.

The great southern displacement in East Asia

The new preprint, Genomic Insights into the Demographic History of Southern Chinese, is somewhat inaccurately titled. It’s really more about the progenitors of the various Southeast Asian language families, whose origins are in South China. Yes, modern southern Han Chinese absorbed local substrate, but that’s been known for a while.

The story here is successive incidents of ‘collapsing structure’ out of the Last Glacial Maximum. The various East Asian populations admixed after diversification 20-40,000 years ago, and there was a later stage of admixture driven by the expansion of the Han out of the north.

An admixture graph is the best way to get at the major features of their model:


The major finding is that the Austro-Asiatic, Hmong-Mien and Austronesian language families emerge from groups distributed west-east in the Yangzi basin, with the Kra-Dai being more of a synthesis. The Tibeto-Burmans were a later push that synthesized mostly with Austro-Asiatic populations. The details are less important than the reality that some sort of separation and then admixture explains a lot of the local differences. Additionally, their genetic results confirm what is obvious with the Kinh: genetically they are very different from the Austro-Asiatic groups with which they are often linguistically bracketed.

The most interesting finding is an Andaman-like “ghost population” that contributed to the Jomon, and less to other groups. You know where I’m going here: this is clearly the basal East Eurasian group called “Australo-Melanesian” that contributed genes to some Amazonian groups. This group is the one that contributed haplogroup D to Tibetans and Japanese.

With East Asian population structure I feel we have the broad features, but a lot of the details are rickety. We’ll see.