So many assumptions about Africa

I have been staring at this figure and rereading Ancient West African foragers in the context of African population history. The Shum Laka samples from this paper, dating to between four and eight thousand years ago, have drawn my attention, and I keep coming back to them.

It seems ridiculous I’ve been using Nigerians as my “African reference” for decades. Most African populations, including Pygmies and Khoisan, have Eurasian admixture from the last 10,000 years. And what about deeper back-to-Africa ancestry? That seems likely and is hinted at in the above paper.

Modern human lineages have a deep history in Africa and the Near East. I think we’re going to have a transformation of our understanding of what happened in these regions in the near future.


Got milk long before genes for milk

The story of how lactase persistence (“lactose tolerance”) evolved is one of the best gene-culture coevolution stories we had. Arguably it was the canonical example. The story was simple: multiple times humans took up dairy culture, and multiple times humans changed so that they could digest lactose, milk sugar, into adulthood. Lactose makes up about 30% of the calories in raw milk (the rest come from fat and protein). In some people, gut flora react negatively to the sugar bath if the lactose isn’t digested, leading to discomfort in addition to wasted calories.
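As a back-of-envelope check on the ~30% figure, the sketch below converts an assumed composition of whole cow’s milk (about 4.8 g lactose, 3.3 g fat and 3.3 g protein per 100 g; typical textbook values, not numbers from this post) into calorie shares using the general Atwater factors. Milk from other species, or other breeds, will differ.

```python
# Assumed typical composition of whole cow's milk, grams per 100 g
MILK_COMPOSITION_G = {"lactose": 4.8, "fat": 3.3, "protein": 3.3}

# General Atwater factors, kcal per gram
ATWATER_KCAL_PER_G = {"lactose": 4.0, "fat": 9.0, "protein": 4.0}


def caloric_shares(composition, factors):
    """Return each macronutrient's fraction of total calories."""
    kcal = {name: grams * factors[name] for name, grams in composition.items()}
    total = sum(kcal.values())
    return {name: value / total for name, value in kcal.items()}


shares = caloric_shares(MILK_COMPOSITION_G, ATWATER_KCAL_PER_G)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

With these inputs lactose comes out at roughly 31% of calories, fat near 48% and protein near 21%.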

In the 2000s several mutations were discovered around LCT, the gene that encodes lactase, the enzyme that breaks down lactose. One mutation was found across Europe and Central Asia, another among the Arabs, and another in East Africa. The “mutational target” was big. The European and Central Asian mutation breaks a regulatory element that represses the expression of LCT in adults, and there are lots of ways to break something. Lactase persistence isn’t really a gain of function; it’s just never shutting off the function, which is itself a feature, not a bug.

The haplotype around LCT is long and indicative of a really strong sweep in Europeans. It was in some ways a positive control for tests of selection.
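The long-haplotype signal is usually quantified with something like extended haplotype homozygosity (EHH, Sabeti et al. 2002): among chromosomes carrying a given allele at a core SNP, the probability that two drawn at random are identical across the whole stretch from the core out to some marker. After a recent strong sweep, EHH around the selected allele stays high far from the core. A minimal sketch, with toy haplotypes invented for illustration:

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, stop, allele):
    """Extended haplotype homozygosity between positions `core` and `stop`
    (inclusive), computed over haplotypes carrying `allele` at `core`."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")
    lo, hi = sorted((core, stop))
    # Group carriers by their sequence over the whole interval
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Probability that two distinct carriers drawn at random are identical
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data: core SNP at index 5; allele "1" sits on one long shared background,
# allele "0" sits on diverse backgrounds
haplotypes = [
    "01101101101", "01101101101", "01101101101", "01101101101",
    "10010010010", "01011000111", "11000011000", "00111010100",
]

for stop in range(5, 11):
    print(stop, ehh(haplotypes, 5, stop, "1"), round(ehh(haplotypes, 5, stop, "0"), 2))
```

In the toy data EHH for the “1” background stays at 1.0 all the way out, while EHH for “0” falls to 0 by the end of the window.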

The trouble is that this narrative now faces major problems. In short, dairy culture predates the rise in frequency of lactase persistence alleles by thousands of years. The ancient DNA transects in Europe are so good that it seems pretty clear that the frequency was much lower during the Iron Age and didn’t reach “modern” levels until the historical period.

The same is now known to be true in Africa: Humans were drinking milk before they could digest it.

This doesn’t mean that these mutations have nothing to do with milk. But there needs to be a rethink of the selection story. Perhaps there was a genetic modifier that spread recently which isn’t a big mutational target, and that’s why the lactose digestion alleles rose in the last 3,000 years? I don’t know. No one really does.
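One way to see why a rise concentrated in the last few thousand years forces a rethink is to back out the selection coefficient it implies. The sketch below uses a simple haploid (genic) model in which each generation multiplies the allele’s odds by (1 + s). The starting and ending frequencies (5% and 70%), the 3,000-year window and the 28-year generation time are illustrative assumptions, not estimates from the ancient DNA, and the real allele is dominant, which changes the shape of the trajectory.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def implied_s(p0, p_t, generations):
    """Selection coefficient under a haploid (genic) model in which each
    generation multiplies the allele's odds p / (1 - p) by (1 + s)."""
    return math.exp((logit(p_t) - logit(p0)) / generations) - 1


def trajectory(p0, s, generations):
    """Deterministic frequency after `generations` rounds of genic selection."""
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
    return p


GENERATION_YEARS = 28                          # assumed generation time
generations = round(3000 / GENERATION_YEARS)   # ~3,000 years
s = implied_s(0.05, 0.70, generations)         # illustrative start and end frequencies
print(f"{generations} generations, s ~ {s:.3f}")
```

With these inputs s comes out around 0.036, so carriers leave about 3.6% more offspring per generation, which is strong selection by human standards; trajectory() confirms that this s carries the allele from 5% to 70% in 107 generations.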


What was the population of the Americas in 1492?

Several people have asked me about the new study on ancient DNA in the Caribbean, A genetic history of the pre-contact Caribbean. There is a lot to this paper, some of which is outside my purview (e.g., I don’t know anything about the archaeology of this region, so I can’t interpret the genetic results well). One of the major things they did was establish patterns of relatedness among individuals, which seems like a major step forward for future applications of ancient DNA.

But the biggest thing that jumped out at me had to do with effective population size. Carl Zimmer’s write-up highlights this issue:

The genetic variations also allowed Dr. Reich and his colleague to estimate the size of the Caribbean society before European contact. Christopher Columbus’s brother Bartholomew sent letters back to Spain putting the figure in the millions. The DNA suggests that was an exaggeration: the genetic variations imply that the total population was as low as the tens of thousands.

This matters because it starts to challenge the revisionism (now orthodoxy?) of books such as 1491: New Revelations of the Americas Before Columbus. To reconcile large pre-contact populations with the small numbers of indigenous people in the Caribbean by the 16th century, scholars have invoked mass die-offs due to disease, or inordinate Spanish cruelty (“The Black Legend”). These results suggest that the pandemic shock was smaller than supposed, since the baseline number of native peoples in the area was lower.

What does this imply for the rest of the New World? I don’t know. But perhaps the huge census sizes argued for by some scholars won’t hold? It probably depends on the region. But with enough ancient DNA, the same sort of analyses could be replicated.
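For a sense of how genetic variation constrains population size, the textbook neutral relationship is that nucleotide diversity π ≈ 4Neμ, so Ne ≈ π / (4μ). This is a generic sketch, not the Caribbean paper’s method: it gives a long-term average effective size, whereas estimates of recent size rely on signals that track the last few dozen generations, such as runs of homozygosity or shared IBD segments. The mutation rate and diversity value below are assumed round numbers, and effective size is generally smaller than census size.

```python
def effective_size_from_diversity(pi, mu):
    """Long-term effective population size from per-site nucleotide diversity,
    using the neutral expectation pi = 4 * Ne * mu (diploid, constant size)."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4 * mu)


MU = 1.25e-8  # assumed mutation rate per site per generation
PI = 1.0e-3   # hypothetical per-site diversity, for illustration only

ne = effective_size_from_diversity(PI, MU)
print(f"Ne ~ {ne:,.0f}")
```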


The Greeks in the mountains

The New Yorker has a long feature that explores the strange results from last year’s paper, Ancient DNA from the skeletons of Roopkund Lake reveals Mediterranean migrants in India. Basically, they found a bunch of Indians who died about 1,000 years ago and a bunch of Greeks who died a few centuries ago. Their remains lie, deposited naturally, in a very isolated lake high in the Himalayas. There are all sorts of hypotheses regarding the Greeks, whose bones indicate a Mediterranean diet and whose genomes most closely match individuals from Crete. My personal experience is that “mainland Greeks” tend to be shifted a bit toward Northern Europeans, so these individuals may have been Anatolian or Aegean Greeks.

Stuart Fiedel, who sometimes comments on this weblog, suggests these were Armenian traders. But David Reich correctly points out that Armenians are genetically very distinct from Greeks (though obviously the two are not entirely different!). Another hypothesis is a bone mix-up, but the problem is that many individuals belong to the same population and seem to have lived in the same region. How could bone mix-ups produce so many systematic errors?

Ultimately there’s no final answer in the piece, though hopefully someone will present a reasonable conjecture.

Because the piece spotlights Reich and his lab, it alludes to the controversy around him. This is ultimately going to be the legacy of the hit piece from a few years back. He’s now a “controversial figure,” which is, to be frank, not a bad thing in the eyes of some of the Reich lab’s scientific rivals. Most media treatments of him, other than coverage purely about his research (e.g., Carl Zimmer’s column in The New York Times covering Reich lab publications), will now mention this.

Here’s why he’s a mensch:

Still, some anthropologists, social scientists, and even geneticists are deeply uncomfortable with any research that explores the hereditary differences among populations. Reich is insistent that race is an artificial category rather than a biological one, but maintains that “substantial differences across populations” exist. He thinks that it’s not unreasonable to investigate those differences scientifically, although he doesn’t undertake such research himself. “Whether we like it or not, people are measuring average differences among groups,” he said. “We need to be able to talk about these differences clearly, whatever they may be. Denying the possibility of substantial differences is not for us to do, given the scientific reality we live in.”

This is, in 2020, an old-fashioned view. There are now young American researchers, including some who have themselves studied topics such as polygenic adaptation in humans, who frankly express disquiet and discomfort at the idea of studying human population genetic variation, period. This would be a very strange view for older researchers, but it’s not totally out of the norm today, so expect someone like Reich to be viewed as quite the dinosaur in a decade. It seems ridiculous to say, but I do wonder if we’re seeing the end of the “humans as a model organism” era. Lots of people are not happy with the new atmosphere, but lots of people just keep quiet and go along.


Whole genomes of ancient farmers and hunter-gatherers

A new preprint uses 15 ancient genomes to model the origins of Europeans, and European farmers in particular, more precisely. The big deal here is that they aren’t relying on the same old SNP arrays but are using whole genomes. This allows for more explicit model-building and testing. I do think explicit model creation is something that needs to be done. A lot of the work today is data-first, and there needs to be more “theory”.

The mixed genetic origin of the first farmers of Europe:

While the Neolithic expansion in Europe is well described archaeologically, the genetic origins of European first farmers and their affinities with local hunter-gatherers (HGs) remain unclear. To infer the demographic history of these populations, the genomes of 15 ancient individuals located between Western Anatolia and Southern Germany were sequenced to high quality, allowing us to perform population genomics analyses formerly restricted to modern genomes. We find that all European and Anatolian early farmers descend from the merging of a European and a Near Eastern group of HGs, possibly in the Near East, shortly after the Last Glacial Maximum (LGM). Western and Southeastern European HG are shown to split during the LGM, and share signals of a very strong LGM bottleneck that drastically reduced their genetic diversity. Early Neolithic Central Anatolians seem only indirectly related to ancestors of European farmers, who probably originated in the Near East and dispersed later on from the Aegean along the Danubian corridor following a stepwise demic process with only limited (2-6%) but additive input from local HGs. Our analyses provide a time frame and resolve the genetic origins of early European farmers. They highlight the impact of Late Pleistocene climatic fluctuations that caused the fragmentation, merging and reexpansion of human populations in SW Asia and Europe, and eventually led to the world’s first agricultural populations.

The supplements are worth reading too. It’s all there.

No mention of Basal Eurasians. The last author told me on Twitter that they weren’t needed, but Iosif Lazaridis (also on Twitter) disagrees, naturally.
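Explicit models like these are typically tested with f-statistics, and admixture proportions such as the 2-6% hunter-gatherer input in the abstract are the kind of quantity such estimators return (the preprint’s own whole-genome modeling is not necessarily done this way). Below is a minimal sketch of f4 and the f4-ratio estimator (Patterson et al. 2012) on synthetic allele frequencies; the population names are placeholders, and real analyses add block-jackknife standard errors and finite-sample corrections omitted here.

```python
import random


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), from allele frequencies."""
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def f4_ratio(a, o, x, b, c):
    """Estimate alpha in X = alpha * B + (1 - alpha) * C, where A shares drift
    with B but not with C, and O is an outgroup."""
    return f4(a, o, x, c) / f4(a, o, b, c)


def clip(p):
    return min(max(p, 0.0), 1.0)


rng = random.Random(1)
n_snps = 10_000
outgroup = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
# A and B descend from a shared ancestor that drifted away from the outgroup
shared = [clip(p + rng.gauss(0, 0.1)) for p in outgroup]
pop_a = [clip(p + rng.gauss(0, 0.05)) for p in shared]
pop_b = [clip(p + rng.gauss(0, 0.05)) for p in shared]
pop_c = [clip(p + rng.gauss(0, 0.1)) for p in outgroup]

alpha = 0.04  # true admixture proportion from B into X
pop_x = [alpha * b + (1 - alpha) * c for b, c in zip(pop_b, pop_c)]

alpha_hat = f4_ratio(pop_a, outgroup, pop_x, pop_b, pop_c)
print(f"estimated alpha: {alpha_hat:.4f}")
```

Because X here is built from B and C themselves, the estimator recovers 0.04 up to rounding. With real data the sources are only relatives of the true donors, which is where modeling choices, such as whether a Basal Eurasian component is needed, start to matter.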


The great southern displacement in East Asia

The new preprint, Genomic Insights into the Demographic History of Southern Chinese, is somewhat inaccurately titled. It’s really more about the progenitors of the various Southeast Asian language families, whose origins are in South China. Yes, modern southern Han Chinese absorbed a local substrate, but that’s been known for a while.

The story here is successive episodes of ‘collapsing structure’ coming out of the Last Glacial Maximum. East Asian populations that had diversified 20-40,000 years ago later admixed, and there was a further stage of admixture driven by the expansion of the Han out of the north.

An admixture graph is the best way to get at the major features of their model:

The major finding is that the Austro-Asiatic, Hmong-Mien, and Austronesian language families emerge from groups distributed west to east in the Yangzi basin, with the Kra-Dai being more of a synthesis. The Tibeto-Burmans were a later push that synthesized mostly with Austro-Asiatic populations. The details are less important than the reality that some sort of separation and then admixture explains a lot of the local differences. Additionally, their genetic results confirm what is obvious with the Kinh: genetically they are very different from the Austro-Asiatic groups with which they are linguistically bracketed.

The most interesting finding is an Andaman-like “ghost population” that contributed to the Jomon, and less to other groups. You know where I’m going here: this is clearly the basal East Eurasian group called “Australo-Melanesian” that contributed genes to some Amazonian groups. This group is the one that contributed haplogroup D to Tibetans and Japanese.

With East Asian population structure I feel we have the broad features, but a lot of the details are rickety. We’ll see.


The Genetic History of the Middle East: into Arabia

A massive new preprint on the Middle East is out. I’ve edited the first figure to give people a general sense of the broad results and populations sampled. First, you have to know that these are high-quality modern samples: 137 individuals at 30x whole-genome coverage. In other words, basically the best genomic data you can get, with no need to futz around with subsets of the data. This is important and needful because the 1000 Genomes Project doesn’t have a Middle Eastern population, so catalogs of genetic variants had a gap in this region. Even the whole-genome sequences of the HGDP were not sufficient, since its Middle Eastern populations were not Arabian.
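To see why 30x counts as about the best you can get, here is an idealized calculation assuming read depth at each site is Poisson-distributed with mean 30 (a Lander-Waterman style simplification; real coverage is overdispersed by GC content and mappability, so real tails are fatter). It reports the expected fraction of sites with at least 1, 10 or 15 reads, thresholds chosen arbitrarily for illustration.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


MEAN_DEPTH = 30  # 30x average coverage, assuming Poisson-distributed depth

for min_reads in (1, 10, 15):
    covered = 1 - poisson_cdf(min_reads - 1, MEAN_DEPTH)
    print(f"sites with >= {min_reads} reads: {covered:.6f}")
```

Even under this idealization only about one site in a thousand falls below 15 reads. At 1x, by the same formula, about 37% of sites would have no reads at all.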

The populations here are sampled from both the classical “Fertile Crescent” and various points within the Arabian peninsula. At the end of the preprint, they do some analysis of selection, which I won’t dwell on. The most interesting result is that they confirm that Arabians have a unique lactase persistence allele that seems to have been selected very recently, just like in Europeans. Much of the selection analysis seems either to replicate what you would find elsewhere or to lack the power to detect polygenic selection (though they did detect selection on EDU).

The big finding to me is that this work confirms a north-south cline in the Near East defined by deep population structure. The admixture graph to the right captures the main features using Lebanese and Emiratis as the two extreme populations, but as you can see in the admixture plot above, the cline really runs from the Caucasus to southern Arabia. If you analyze these populations, one thing you will see is that Fertile Crescent populations, such as the Druze, often seem more like Armenians and Georgians than South Arabians. Why is this? After all, both South Arabians and Fertile Crescent populations speak Semitic languages.

I think the issues here are multiple. First, there is recent admixture that obscures some of the deeper relationships. This is clear insofar as most Arab Muslim populations have Sub-Saharan African admixture. This is historically attested, and physically visible. The variation and range are quite high, in part due to spatial heterogeneity of slavery (e.g., more African slaves in lowlands than highlands), and the recency of the admixture producing variation due to incomplete mixing (the dates are usually 1000 A.D. and later).

But this is not the only admixture. All of the Fertile Crescent populations, along with groups to the north, have much more steppe drift than those to the south in Arabia. The exact fractions don’t matter; it’s not much, but it’s not trivial, and it’s always higher than among the Arabians. Additionally, this element is new to the region, in relative terms: you can see the contribution in modern Lebanese in comparison to the Bronze Age Sidon samples, which date to 1800 BC. The source could be continuous gene flow during the Roman and Byzantine periods, or even later. Or it could be Indo-European migrations.

We know that Indo-Iranian peoples were present in Upper Mesopotamia. The Mitanni Kingdom, which had Indo-Aryan affinities, shows up after 1750 BC. The Hittites, who named their language after the city of Nesa, show up to the north in Anatolia a bit earlier. Interestingly, the Hittites spoke an Indo-European language that is often considered basal (the outgroup) to most of the others. Armenian, which emerges later in eastern Anatolia, is also quite distinct, as is Greek to the west. In contrast, there is a lot of suggestive evidence of either genealogical or geographical connectedness between the ancestors of the Indo-Iranian and Slavic language families.

The presence of these two very distinct ancestral components, steppe and Sub-Saharan African, on top of the ancient Near Eastern base, produces distinctions in the modern populations that obscure some of the deeper strands. In the late 2000s, when researchers and bloggers began running admixture analyses on Ethiopians, it was clear that this population was a mix between “West Eurasian” and an African component that wasn’t Bantu. The West Eurasian donor population was often Yemeni, in particular Yemeni Jews. Later on, some models using more sophisticated methods suggested greater affinity of Ethiopian genomes to Levantine populations than to Yemenis. What was going on?

We now know. It is quite clear that Ethiopian populations lack steppe ancestry. In the earlier Bronze Age, and definitely in the Neolithic, Levantines lacked steppe ancestry. In fact, the Neolithic Levantines usually lacked “Iranian” ancestry too. The West Eurasian ancestry in Northeast Africans, on the whole, is enriched for a Levantine ancestry quite similar to that of the Natufians. Modern-day South Arabians are the closest to this population mix, even if they are not descended from ancient Levantines. They lack steppe ancestry.

Modern-day South Arabians in fact descend in part from indigenous hunter-gatherers, who were a sister clade to the ancestors of Natufians. The admixture graph makes that clear: the Emiratis with the least African ancestry have half their ancestry from this group. In the book Arabs, the author discusses at length various Yemeni legends of a fusion between distinct peoples at the edge of history. These could be recollections of the merger of indigenous Neolithic Arabians with peoples who expanded from the north.

The analyses of these samples confirm and reiterate what has been found with ancient DNA: at some point late in the Neolithic and early in the Bronze Age a massive admixture event occurred in the Fertile Crescent which brought a considerable amount of “Iranian” ancestry into the region (these ancient people are not like modern Iranians; in particular, they lacked steppe ancestry which is copious in much of Iran, particularly the east). This ancestry pushed south and westward so that ~50% of the ancestry of Arabians seems to be Iranian. That being said, I have some qualms here:

We explored whether this ancestry penetrated both the Levant and Arabia at the same time, and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,900 and 5,600 ya (Table S3), followed by admixture in Egypt (2,900-4,700 ya), East Africa (2,200-3,300) and Arabia (2,000-3,800). These times overlap with the dates for the Bronze Age origin and spread of Semitic languages in the Middle East and East Africa estimated from lexical data (Kitchen et al., 2009; Figure S8). This population potentially introduced the Y-chromosome haplogroup J1 into the region (Chiaroni et al., 2010; Lazaridis et al., 2016). The majority of the J1 haplogroup chromosomes in our dataset coalesce around ~5.6 [95% CI, 4.8-6.5] kya, agreeing with a potential Bronze Age expansion; however, we do find rarer earlier diverged lineages coalescing ~17 kya (Figure S9). The haplogroup common in Natufians, E1b1b, is also frequent in our dataset, with most lineages coalescing ~8.3 [7-9.7] kya, though we also find a rare deeply divergent Y-chromosome which coalesces 39 kya (Figure S9).

Some of these dates are hard to credit. For example, I obtain a midpoint estimate of Iranian admixture into Egypt around 1836 BC!

The fraction of Iranian ancestry is substantial. The admixture model in the supplements gives this for Egyptians: 45% Levant_N, 32% Iran_N, 8% EHG (Eastern European Hunter-Gatherer), and 15% Mota (African). The older end of the date range is 2700 BC. The oldest Egyptian writing dates to 2700 BC, but proto-hieroglyphs are 500 years older. The authors talk about Semitic languages, and ancient Egyptian is not Semitic. So it could be a minority population that mixed into the Egyptians, but this would be a massive event that we don’t have records of. In fact, the authors claim that it went into much of Northeast Africa at a relatively late date.

Additionally, the values for the Levant seem too recent as well. There was a pre-Sumerian civilization, the Uruk Civilization, which spread broadly from Mesopotamia between 4000 and 3000 BC, that is, 6000 to 5000 years ago. The midpoint of this is 5500 years ago, while the midpoint of the admixture date for the Syrians, who were on the edge of the Uruk Civilization, is 3800 years ago. Basically, I think the evidence points to various statistical artifacts in genomic dating pulling the estimates more recent than when the admixture truly occurred (this has long been a problem in this field).
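Admixture dates like these usually come from the decay of admixture-induced linkage disequilibrium with genetic distance: after a single pulse n generations ago, the signal falls off roughly as exp(-n·d), with d in Morgans. The sketch below, on noise-free synthetic curves, fits that single-pulse model by log-linear least squares (a simplification of what tools such as ALDER do, which fit weighted LD curves with an additional affine term), then fits the same model to two equal pulses 200 and 60 generations ago. The 28-year generation time is an assumption.

```python
import math


def fit_pulse_generations(distances_morgans, ld_values):
    """Fit log(LD) = log(A) - n * d by ordinary least squares and return n,
    the generations since a single admixture pulse."""
    xs = distances_morgans
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


GENERATION_YEARS = 28  # assumed generation time
distances = [0.005 * i for i in range(1, 41)]  # 0.5 to 20 cM, in Morgans

# Noise-free synthetic curve for one pulse 130 generations ago
ld_one = [0.02 * math.exp(-130 * d) for d in distances]
n_hat = fit_pulse_generations(distances, ld_one)
print(f"one pulse: {n_hat:.1f} generations, ~{n_hat * GENERATION_YEARS:,.0f} years")

# Two equal pulses, 200 and 60 generations ago, fitted as if they were one
ld_two = [0.01 * math.exp(-200 * d) + 0.01 * math.exp(-60 * d) for d in distances]
n_two = fit_pulse_generations(distances, ld_two)
print(f"two pulses fitted as one: {n_two:.1f} generations")
```

The single pulse comes back at 130 generations (about 3,640 years). The two-pulse curve comes back only slightly above 60 generations, nowhere near the older pulse: when gene flow is spread over time, a single-pulse date gets pulled toward the recent end, which is one way the true onset of an admixture can be understated.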

I honestly have no idea how to relate the expansion of Semitic languages to the expansion of Iranian ancestry. My friend Patrick Wyman believes that Anatolian farmers spoke Afro-Asiatic. These were very different people from the Iranians, who arrived from the east later. Additionally, history teaches us that Mesopotamia during the Bronze Age was very linguistically diverse. The Sumerians were not Semitic, and neither were their Elamite neighbors in Khuzistan. The Akkadians, who were more prevalent in the north of Mesopotamia but were present from the beginning of Sumerian history, were Semitic.

There is still a mystery around the great admixture between Neolithic Near Easterners of the west and the east. I don’t think we’ve closed that chapter of the book.

That being said, there is a lot that is “solved” in this paper. For example, these authors seem to confirm that there is no evidence in Arabian populations of “first wave” modern humans predating the non-African radiation. Arabians, like other non-Africans, underwent a population expansion 50-70,000 years ago. Their separation from Mbuti Pygmies was gradual up until 120,000 years ago, after which there seems to have been a more definitive separation. What this is telling us, I believe, is that the ancestors of non-Africans were part of the African meta-population until 120,000 years ago. This is suspiciously close to the Eemian Interglacial, which dates to between 130,000 and 115,000 years ago. The Eemian was characterized by a “Green Sahara”, so it seems that this is when early modern humans ventured in substantial numbers out of the continent and to its peripheries. One notable feature of the data is that proto-non-Africans seem to have gone through a period of isolation and small population size. Perhaps

But 50-70,000 years ago a massive expansion of one of these daughter populations occurred. These data confirm that Arabians seem to have the same Neanderthal admixture as everyone else but, even accounting for Sub-Saharan African ancestry, somewhat less of it. In line with earlier research, the authors argue that this is due to admixture with “Basal Eurasian” populations that did not mix with Neanderthals ~55,000 years ago. Or, more precisely, that did not carry as much Neanderthal ancestry (it seems plausible that the Basal Eurasian populations are themselves a compound of conventional non-Africans from the base of the broader splits and a deeper basal group that lacks Neanderthal ancestry).

Going back to the admixture graph, you notice that both western and eastern farmer populations are a compound of Basal Eurasian and various lineages that are broadly “West Eurasian.”  Natufians and Anatolian farmers are descended about half from groups related to European hunter-gatherers, while ancient Neolithic Iranians had ancestry related to these people, but even more to populations distantly related to Ancient North Eurasians (Paleo-Siberians). The events here are distant, but the sample proportion of Basal Eurasian ancestry indicates to me a rapidly expanding population at some point which mixed with a well-structured set of groups in the Near East.

The major takeaways

  • Near Easterners are part of the same broad diversification as all other non-Africans
  • The expansion of these non-Africans dates to 50-70,000 years ago
  • Archaeological evidence points to a very intense expansion in the period around ~50,000 years ago, and admixture with Neanderthals somewhat before then
  • At the beginning of the Holocene Near Easterners were deeply structured regionally, and had threaded together disparate ancestral components (Basal Eurasian, related to European hunter-gatherer and Paleo-Siberian)
  • Late in the Neolithic and early Bronze Age much of this structure collapsed, and there was a massive admixture of Iranian ancestry to the south and west (conversely, there is evidence in other work of admixture of western farmer ancestry to the east)
  • Finally, there is evidence for later incursions of steppe people into the northern Arabian fringe and Fertile Crescent
  • On top of this, there is historical admixture from Africans and in the north Turks and other groups

The genomic landscape of Brazil in 1950

A new whole-genome analysis out of Brazil has some interesting ancestry information. The preprint, Whole-genome sequencing of 1,171 elderly admixed individuals from the largest Latin American metropolis (São Paulo, Brazil):

As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases….

Admixed populations are useful for a lot of reasons. But let’s observe some things about this Brazilian population.

First, it’s old. The average age is 72, so these are people born around 1950. In many ways these are the genetic characteristics of Brazil in 1950, not today. This is why you see so many individuals who self-identify as Asian who are nearly 100% Asian: they are the children of Japanese immigrants, and in 1950 the endogamy of the community was high. Today the youngest generation of Japanese Brazilians is 60% mixed.

Second, the ancestry of self-identified white Brazilians in this sample is mostly European. Like the Japanese Brazilians, a large number of these individuals are probably the children of European immigrants. I suspect this accounts for many of the 20% of the “white” sample that has no trace of non-European ancestry. But observe that around another 20% has trace proportions (~1%) of non-European ancestry, mostly African. My supposition, in this case, is that these are “old stock” white Brazilians. That is, one or both of their parents descend from Portuguese Brazilians who settled in overwhelmingly European areas and retain some non-European admixture due to long-term residence in Brazil. The remainder are white Brazilians who have substantial non-European ancestry, with a small minority whose proportions are quite high from a North American perspective.

A point of comparison is probably useful. About 95% of non-Hispanic whites in the United States seem to have almost no detectable non-European ancestry using this sort of model-based clustering. This illustrates the massive demographic difference between the USA and Latin American nations. The vast majority of white Latin Americans look quite Iberian, but the majority also have far more non-European ancestry than 95% of North American whites. This partly reflects the smaller population sizes of native peoples in North America, and partly the rule of hypodescent for people with any African ancestry in the United States, under which mixed individuals were integrated into the African American population.
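Model-based clustering of this kind estimates each person’s ancestry proportions as a mixture of source populations. Below is a minimal sketch of the expectation-maximization update for one individual when the sources’ allele frequencies are treated as known (a supervised, ADMIXTURE-like setup; the preprint’s actual pipeline may differ). The three sources and all frequencies are synthetic placeholders, and the simulation draws each allele copy’s ancestry independently, ignoring linkage.

```python
import random


def estimate_ancestry(genotypes, ref_freqs, iters=200):
    """EM estimate of one individual's ancestry proportions q, holding the
    source populations' allele frequencies fixed.

    genotypes: alt-allele counts (0, 1 or 2), one per SNP
    ref_freqs: one list of alt-allele frequencies per source population
    """
    k = len(ref_freqs)
    per_snp = list(zip(*ref_freqs))  # per-SNP tuples of source frequencies
    q = [1.0 / k] * k
    for _ in range(iters):
        expected = [0.0] * k
        for g, f in zip(genotypes, per_snp):
            p_alt = sum(qk * fk for qk, fk in zip(q, f))
            for pop in range(k):
                # E-step: expected number of this SNP's alt and ref copies from source `pop`
                expected[pop] += g * q[pop] * f[pop] / p_alt
                expected[pop] += (2 - g) * q[pop] * (1 - f[pop]) / (1 - p_alt)
        # M-step: each SNP carries two allele copies
        q = [e / (2 * len(genotypes)) for e in expected]
    return q


def simulate_genotypes(q, ref_freqs, rng):
    """Draw each allele copy's source from q, then its state from that source."""
    genotypes = []
    for j in range(len(ref_freqs[0])):
        g = 0
        for _ in range(2):
            pop = rng.choices(range(len(q)), weights=q)[0]
            g += rng.random() < ref_freqs[pop][j]
        genotypes.append(g)
    return genotypes


rng = random.Random(42)
n_snps = 3000
# Hypothetical sources, e.g. European, African, Native American
ref_freqs = [[rng.uniform(0.05, 0.95) for _ in range(n_snps)] for _ in range(3)]
true_q = [0.85, 0.10, 0.05]
genotypes = simulate_genotypes(true_q, ref_freqs, rng)

q_hat = estimate_ancestry(genotypes, ref_freqs)
print([round(x, 3) for x in q_hat])
```

With a few thousand strongly differentiated SNPs this lands near the simulated 85/10/5 split, give or take a couple of percentage points; distinguishing the ~1% trace proportions discussed above from zero takes many more markers and reference panels that represent the real sources well.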

Third, the people who are “mixed” and black in Brazil are more European than you might expect. All the estimates I’ve seen for self-identified black Brazilians (a somewhat protean category due to social changes over the past few generations) indicate a higher European ancestry fraction than among African Americans (~20% median in the latter). Self-identified “mixed” Brazilians have more European ancestry than any other component.

The native category is interesting because most of these people have only a minor component of that ancestry. Additionally, a huge number of white, mixed, and black Brazilians have native ancestry. This is not surprising from previous work. Ancestry deconvolution indicates this is an old admixture, and mtDNA lineages are more native than Y chromosomes. There was a sex asymmetry in the early settlement, and native women married into the settler population. Both black and white Brazilians (and mixed) have lots of native ancestry.

Finally, though there is some overlap between these groups (despite their average differences), I assume that the overlap in genomic ancestry is much greater in contemporary cohorts. Once we get temporal transects in Brazil, it will be interesting to see how assortative mating does, or doesn’t, work.

Looking forward to more of this from Latin America. So many opportunities for admixture mapping!


Solute carrier family genes are important…but how?

Over the last ten years David Reich and other researchers have been constructing what is basically an atlas of human demographic history: taking the genealogies written in our DNA, mapping them onto population bifurcations and admixtures, and synthesizing that with what we know from history and archaeology.

To a great extent, this is a project of human phylogenomics: taking genome-wide data and constructing phylogenies out of it (or, perhaps more precisely, graphs, as this is mostly on an intraspecific time scale and characterized by lots of gene flow across the “tips” of the tree). But there’s another thing you can do with modern human genomics and evolution: look at patterns of selection within the genome.

The Reich group has already started doing this. For example, they have argued that the CCR5 delta 32 mutation seems to have emerged out of the Yamnaya horizon.

Last fall a paper came out in MBE, Ancestry-Specific Analyses Reveal Differential Demographic Histories and Opposite Selective Pressures in Modern South Asian Populations, which I gave a cursory read at the time and have since looked at more closely. It takes a “natural experiment,” the emergence of Indian subcontinental populations from a massive admixture between lineages that diverged 40,000 years ago, and looks for genetic regions that deviate from what you would expect based on genome-wide ancestry.

The method is simple: imagine that “Ancestral North Indians” are fixed for one allele at a gene and “Ancestral South Indians” are fixed for the other. Indian populations are about 50:50 between the two (with a range). If the frequency in Indian populations today of the allele that came from the “Ancestral North Indians” is 95%, one might be suspicious as to what’s going on. Or vice versa.
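The logic of that test can be written down directly: under pure admixture the expected present-day frequency is a weighted average of the two ancestral frequencies, and a locus is suspicious if it sits far outside the spread that drift alone would produce. This is a simplification, not the MBE paper’s ancestry-specific method; it models drift with a single hypothetical F-like parameter (variance F * p * (1 - p) around the expectation), and the value 0.02 is invented for illustration.

```python
import math


def admixture_outlier_z(obs_freq, p_ani, p_asi, ani_fraction, drift_f):
    """Compare an observed allele frequency with its expectation under admixture.

    expected = ani_fraction * p_ani + (1 - ani_fraction) * p_asi
    drift_f: hypothetical drift parameter; variance is drift_f * p * (1 - p)
    Returns (expected frequency, z-score).
    """
    expected = ani_fraction * p_ani + (1 - ani_fraction) * p_asi
    variance = drift_f * expected * (1 - expected)
    if variance <= 0:
        raise ValueError("expected frequency of 0 or 1 leaves no room for drift")
    return expected, (obs_freq - expected) / math.sqrt(variance)


# The scenario in the text: ANI fixed for the allele, ASI fixed for the other one,
# a 50:50 mixture, and an observed frequency of 95%
expected, z = admixture_outlier_z(0.95, 1.0, 0.0, 0.5, drift_f=0.02)
print(f"expected {expected:.2f}, z = {z:.1f}")
```

Here the observed 95% sits about 6.4 standard deviations from the 50% expectation. In practice the drift variance would be calibrated against the genome-wide distribution rather than assumed, since a genome-wide scan produces many large z-scores by chance.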

In the paper, they used whole genomes to reconstruct the ancestral steppe/Iranian population without any residual “Ancient Ancestral South Indian” (AASI) ancestry, the latter being a lineage with no West Eurasian ancestry. They did the same for the AASI. These reconstructions are always dicey, but they made a good-faith effort to check their work. On the whole, that section was impressive, and the authors seem roughly aligned with the results in Narasimhan et al. 2019. First, the reconstructed AASI seem homogeneous, except when modeled from Munda or Burusho donors, both groups with deep East Asian admixture (illustrating the problem with deconvolution). Second, they show that the AASI do not cluster with the Andamanese, which makes sense since these groups diverged closer to 40,000 years ago. Finally, the steppe/Iranian group looks most like middle-to-late Bronze Age Armenians, a synthesis of steppe and some Iranian-like ancestry.

But this isn’t the most interesting part of the paper. It’s the selection. Here are the top, top, candidates:



Correlated response is a big story of selection

Adaptation is clearly one of the most important processes in understanding how evolution occurs. In a classical sense, it’s easy to understand. Convergent adaptation in body plans gives dolphins and swordfish the same shape. It’s physics.

But with the emergence of DNA sequencing, a lot of the focus on adaptation has shifted to the signatures of natural selection at the molecular level. Phenotypes are controlled by variation in genotypes, and instead of describing and hypothesizing, researchers can actually infer the history and arc of adaptation from genetic patterns.

At least that’s the theory.

The initial tests for signatures of natural selection focused on long-term adaptation, often between species. Tajima's D, which compares two estimates of diversity within a single sample, was an early example; between-species tests such as McDonald-Kreitman usually took the form of comparing variation within and between two lineages of Drosophila. In the 2000s, with genome-wide data, new methods predicated on looking at 'haplotype structure' (patterns of linked variation along a stretch of chromosome) emerged. Instead of between species, these methods focused on selection within species (e.g., why are some humans adapted to malaria?). These methods were good at picking up strong signals at a few genes where the selective sweeps were recent.
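For readers who have never seen one of these statistics computed, here is a minimal Python sketch of Tajima's D using the standard constants from Tajima (1989). It assumes aligned haplotypes encoded as equal-length strings of 0/1 characters (an input format chosen purely for illustration). D contrasts mean pairwise differences (pi) with the number of segregating sites scaled by a1; a negative value means an excess of rare variants (as after a sweep or population expansion), a positive value an excess of intermediate-frequency variants.

```python
import itertools
import math


def tajimas_d(seqs):
    """Tajima's D for a list of aligned haplotype strings (e.g. '0101').

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (variable) sites in the sample.
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if S == 0:
        return None
    # pi: average number of pairwise differences between haplotypes.
    pairs = list(itertools.combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard normalizing constants.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Toy samples: every variant a singleton vs. two equally common haplotypes.
singletons = ["0000", "1000", "0100", "0010", "0001"]
balanced = ["000", "000", "111", "111"]
print(tajimas_d(singletons), tajimas_d(balanced))
```

The first sample should come out negative and the second positive, which is the whole intuition in two lines.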

But as datasets and genomics got bigger and better, researchers focused on more fundamental patterns and analyses, such as looking at 'site frequency spectra' (how many variable sites carry an allele at each possible frequency in the sample). Ultimately the goal was to go beyond selection at a single locus (e.g., lactase persistence), and understand polygenic characteristics (e.g., height). Obviously, this is much harder because polygenic characters are distributed across many genetic loci, and issues of statistical power are always going to loom large (and there is the soft vs hard sweep issue too!).
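A site frequency spectrum is simple to build, which is part of its appeal. The Python sketch below assumes we already know, for each site, how many of n sampled chromosomes carry the derived allele (the counts are hypothetical). It tallies the unfolded SFS and compares it to the textbook neutral, constant-population-size expectation, where the expected number of sites in class i is proportional to 1/i. Sweeps and demographic change show up as departures from that shape.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Entry i-1 = number of sites with exactly i derived copies, i = 1..n-1.

    Sites with 0 or n derived copies are monomorphic in the sample and dropped.
    """
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]


def neutral_expected_sfs(num_segregating, n):
    """Neutral constant-size expectation: E[xi_i] = theta / i, with theta
    estimated from the number of segregating sites (Watterson's estimator)."""
    a1 = sum(1 / i for i in range(1, n))
    return [num_segregating / (a1 * i) for i in range(1, n)]


# Hypothetical derived-allele counts at nine sites in a sample of n = 4.
derived_counts = [1, 1, 1, 2, 3, 1, 2, 0, 4]
sfs = unfolded_sfs(derived_counts, n=4)
expected = neutral_expected_sfs(sum(sfs), 4)
print(sfs, [round(e, 2) for e in expected])
```

Here the observed spectrum is [4, 2, 1] against a neutral expectation of roughly [3.82, 1.91, 1.27].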

A new preprint is an excellent introduction to this wild world, Disentangling selection on genetically correlated polygenic traits using whole-genome genealogies:

We present a full-likelihood method to estimate and quantify polygenic adaptation from contemporary DNA sequence data. The method combines population genetic DNA sequence data and GWAS summary statistics from up to thousands of nucleotide sites in a joint likelihood function to estimate the strength of transient directional selection acting on a polygenic trait. Through population genetic simulations of polygenic trait architectures and GWAS, we show that the method substantially improves power over current methods. We examine the robustness of the method under uncorrected GWAS stratification, uncertainty and ascertainment bias in the GWAS estimates of SNP effects, uncertainty in the identification of causal SNPs, allelic heterogeneity, negative selection, and low GWAS sample size. The method can quantify selection acting on correlated traits, fully controlling for pleiotropy even among traits with strong genetic correlation (|rg| = 80%; c.f. schizophrenia and bipolar disorder) while retaining high power to attribute selection to the causal trait. We apply the method to study 56 human polygenic traits for signs of recent adaptation. We find signals of directional selection on pigmentation (tanning, sunburn, hair, P=5.5e-15, 1.1e-11, 2.2e-6, respectively), life history traits (age at first birth, EduYears, P=2.5e-4, 2.6e-4, respectively), glycated hemoglobin (HbA1c, P=1.2e-3), bone mineral density (P=1.1e-3), and neuroticism (P=5.5e-3). We also conduct joint testing of 137 pairs of genetically correlated traits. We find evidence of widespread correlated response acting on these traits (2.6-fold enrichment over the null expectation, P=1.5e-7). We find that for several traits previously reported as adaptive, such as educational attainment and hair color, a significant proportion of the signal of selection on these traits can be attributed to correlated response, vs direct selection (P=2.9e-6, 1.7e-4, respectively). 
Lastly, our joint test uncovers antagonistic selection that has acted to increase type 2 diabetes (T2D) risk and decrease HbA1c (P=1.5e-5).

There’s a lot going on here. This is my favorite passage:

To address these issues, we recently developed a full-likelihood method, CLUES, to test for selection and estimate allele frequency trajectories.[21] The method works by stochastically integrating over both the latent ARG using Markov Chain Monte Carlo, and the latent allele frequency trajectory using a dynamic programming algorithm, and then using importance sampling to estimate the likelihood function of a focal SNP's selection coefficient, correcting for biases in the ARG due to sampling under a neutral model.

Alrighty then! Someone’s a major-league nerd.
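Stripped of the ARG machinery, the importance-sampling trick in that passage can be shown with a toy. This Python sketch is not CLUES, which integrates over genealogies with MCMC; it assumes instead that we observe only a starting and an ending allele count in a small Wright-Fisher population (all numbers hypothetical), with the hypothetical helper `post_selection_freq` standing in for a simple genic selection model. We simulate latent frequency trajectories under neutrality, then reweight each trajectory by how much more or less probable it would be under selection coefficient s. Averaging those weights over the trajectories that hit the observed endpoint estimates the likelihood of the data for each s.

```python
import math
import random


def post_selection_freq(p, s):
    """Allele frequency after selection, assuming fitness 1 + s vs 1."""
    return p * (1 + s) / (1 + p * s)


def simulate_neutral(two_n, start, generations, n_sims, rng):
    """Neutral Wright-Fisher trajectories of allele counts (the proposal)."""
    trajs = []
    for _ in range(n_sims):
        counts = [start]
        for _ in range(generations):
            p = counts[-1] / two_n
            # Binomial draw of two_n gene copies as a sum of Bernoulli trials.
            counts.append(sum(rng.random() < p for _ in range(two_n)))
        trajs.append(counts)
    return trajs


def log_weight(traj, two_n, s):
    """log P_s(trajectory) - log P_0(trajectory); binomial coefficients cancel."""
    lw = 0.0
    for c, k in zip(traj, traj[1:]):
        p = c / two_n
        if 0 < p < 1:  # once lost or fixed, both models agree exactly
            ps = post_selection_freq(p, s)
            lw += k * math.log(ps / p) + (two_n - k) * math.log((1 - ps) / (1 - p))
    return lw


def likelihood(trajs, two_n, observed, s):
    """Importance-sampling estimate of P_s(final count == observed)."""
    total = 0.0
    for traj in trajs:
        if traj[-1] == observed:
            total += math.exp(log_weight(traj, two_n, s))
    return total / len(trajs)


rng = random.Random(42)
TWO_N, START, GENS, OBSERVED = 40, 4, 10, 12  # allele went from 10% to 30%
trajs = simulate_neutral(TWO_N, START, GENS, 5000, rng)
lik = {s: likelihood(trajs, TWO_N, OBSERVED, s) for s in [0.0, 0.05, 0.1, 0.2]}
for s, value in lik.items():
    print(f"s = {s:.2f}   L(s) ~ {value:.4f}")
```

The design choice mirrors the spirit of the quote: simulate once under the neutral model, then correct for having sampled under neutrality by reweighting, rather than re-simulating for every candidate s.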

The preprint is fine, but ultimately this is something you get a “feel” for by working with models, data, and general analyses in the field. And I don’t have a strong feel since I don’t work with these sorts of data and questions myself. So what do I know? That being said, I like the preprint because it satisfies an intuition I’ve long had: correlated response is a big part of the story of polygenic selection.

Basically, you have to remember that complex traits are subject to variation at a host of genetic positions. And genetic variants rarely have singular effects: one locus usually exhibits pleiotropy, shaping a lot of characteristics. Therefore, if there is strong selection on a gene, more traits than simply the target of selection will be impacted. In animal breeding, selecting for huge, meaty, fast-growing lineages can render them infertile if selection is taken too far. That's a bad correlated response.
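The classic quantitative-genetics formalization of correlated response is Lande's multivariate breeder's equation, where the change in mean phenotypes equals G times beta: G is the additive genetic variance-covariance matrix and beta the vector of direct selection gradients. A minimal Python sketch, assuming a made-up two-trait G with unit genetic variances and a genetic correlation of 0.8 (echoing the |rg| = 80% in the abstract): selection acts only on trait 1, yet trait 2 moves too.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical additive genetic (co)variance matrix: rg = 0.8.
G = [[1.0, 0.8],
     [0.8, 1.0]]

# Direct selection on trait 1 only; none on trait 2.
beta = [0.5, 0.0]

response = matvec(G, beta)
print(response)  # [0.5, 0.4]: trait 2 responds through the correlation alone
```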

After correcting for the genetic correlation, the authors note that for some traits, such as educational attainment and hair color, a significant share of the apparent selection is not direct at all but a correlated response. This is like EDAR: we know it is associated with hair thickness and has been a strong target of selection, but we have no idea which trait was the actual target. But it's a pretty big deal. All these quantitative traits controlled by variation across the genome are being reshaped by adaptation on other traits. What are those traits? This preprint doesn't really answer that.
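Running the breeder's-equation logic in reverse gives the intuition behind attributing selection to the causal trait: if you know G and the observed responses, the direct selection gradients are recovered as beta = G^-1 times the response. This Python toy uses hypothetical numbers and is far simpler than the preprint's genealogy-based joint likelihood; it only shows how a trait can respond substantially while being under no direct selection.

```python
def solve_2x2(m, v):
    """Solve m @ x = v for a 2x2 matrix m via the explicit inverse."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("G is singular; selection cannot be attributed")
    return [(d * v[0] - b * v[1]) / det, (-c * v[0] + a * v[1]) / det]


# Hypothetical genetic (co)variance matrix with rg = 0.8.
G = [[1.0, 0.8],
     [0.8, 1.0]]

# Observed per-generation change in both trait means (made-up values).
observed_response = [0.5, 0.4]

beta_hat = solve_2x2(G, observed_response)
print(beta_hat)  # direct selection on trait 1 only; ~0 on trait 2
```

Looked at alone, trait 2 shifted by 0.4 and could easily be called adaptive; the joint solve attributes all of that shift to its genetic correlation with trait 1.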

Hopefully, we'll make some headway in the 2020s, because right now we're definitely seeing through a glass, darkly.