A week ago a very cool new preprint came out, Identifying loci under positive selection in complex population histories. It’s something that you couldn’t even have imagined just ten years ago. The authors basically figure out ways to identify deviations of markers from the allele frequencies expected under a null neutral evolutionary model. The method is put first, which I really like, before getting to results or discussion. Additionally, they did a lot of simulation ahead of time, the sort of simulation that was really not possible before we had the sort of computational resources we have now.
Here’s the abstract:
Detailed modeling of a species’ history is of prime importance for understanding how natural selection operates over time. Most methods designed to detect positive selection along sequenced genomes, however, use simplified representations of past histories as null models of genetic drift. Here, we present the first method that can detect signatures of strong local adaptation across the genome using arbitrarily complex admixture graphs, which are typically used to describe the history of past divergence and admixture events among any number of populations. The method – called Graph-aware Retrieval of Selective Sweeps (GRoSS) – has good power to detect loci in the genome with strong evidence for past selective sweeps and can also identify which branch of the graph was most affected by the sweep. As evidence of its utility, we apply the method to bovine, codfish and human population genomic data containing multiple population panels related in complex ways. We find new candidate genes for important adaptive functions, including immunity and metabolism in under-studied human populations, as well as muscle mass, milk production and tameness in particular bovine breeds. We are also able to pinpoint the emergence of large regions of differentiation due to inversions in the history of Atlantic codfish.
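To make the "deviation from a neutral null" idea concrete, here is a minimal sketch of the simplest possible version: one population, one branch, pure drift. This is not the GRoSS statistic, which works on whole admixture graphs. The function names, the example frequencies, and the choice of 100 generations and an effective size of 5,000 are all my own assumptions for illustration.

```python
import math


def drift_z_score(p_ancestral, p_observed, generations, effective_size):
    """Standardized allele-frequency change under a pure-drift null.

    Uses the diffusion approximation Var(p_t) ~= p0 * (1 - p0) * t / (2 * Ne),
    which is only reasonable when t is small relative to 2 * Ne.
    """
    if not 0.0 < p_ancestral < 1.0:
        raise ValueError("ancestral frequency must be strictly between 0 and 1")
    drift_variance = (
        p_ancestral * (1.0 - p_ancestral) * generations / (2.0 * effective_size)
    )
    return (p_observed - p_ancestral) / math.sqrt(drift_variance)


def two_sided_p_value(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical loci: (ancestral frequency, frequency observed today).
loci = {"unremarkable": (0.30, 0.33), "outlier": (0.30, 0.75)}
for name, (p0, p_now) in loci.items():
    z = drift_z_score(p0, p_now, generations=100, effective_size=5000)
    print(name, round(z, 2), two_sided_p_value(z))
```

A locus whose frequency moved far more than drift alone would predict gets a large score and a tiny tail probability. The hard part, and the contribution of the preprint, is getting the null right when the population history is a graph rather than a single branch.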
A generation ago these sorts of debates would be a sequence of “you’re wrong!” vs. “no, you’re wrong!” Today the disputes involve a lot of data, and so have a reasonable chance of resolution.
The first preprint identifies the usual candidates in humans that you normally see, and expected targets in cattle and cod. Sure, that will give biologists more interested in mechanisms and pathways things to chew upon, but imagine once researchers have large numbers of genomes for thousands and thousands of species. Then they’ll be testing deviations from neutral allele frequencies across many trees, and getting a more general and abstract sense of the parameter space that selection explores, conditional on particularities of evolutionary history.
This is why I’m excited about plans to sequence lots and lots of species.
We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.
In The New York Times write-up there is an interesting detail, “This study served as proof-of-concept, he added. His team is moving forward on evaluating prenatal testing data from more than 3.5 million Chinese people.” So what he’s saying is that this study with >100,000 individuals is a “pilot study.” Let that sink in.
The above admixture graph is from a new preprint, Paleolithic DNA from the Caucasus reveals core of West Eurasian ancestry. To be honest, if you read the supplementary text there’s almost no point in reading the main preprint, as it is far more in depth when it comes to the methodology as well as spotlighting a variety of particular results. It’s hard to know where to begin with such a preprint so I want to highlight the “this is a simplified model” portion in the figure above. That’s actually the truth. Remember, no admixture graph is the Truth, it is an attempt by humans to capture concisely and informatively the major features of our species’ population history dynamics. The reality was never as clear and distinct as stylized graphical representations would have you think, and the researchers are aware of this.
In any case, if you want to really get at how they arrived at the conclusions they did, really read the supplementary section SI 2, “An admixture graph model of Upper Paleolithic West Eurasians.” The authors have so many potential combinations of ancestral populations that they can’t simply manually and intuitively posit admixtures. Rather, they have to explore a huge number of combinations (trees/graphs)…at which point they run into computational limits. This section explicitly lays out computationally efficient ways to automatically traverse the possibility space, and arrive at the best fitting set of models, within reason.
The title of the preprint says it all, but let me quote the abstract in full:
The earliest ancient DNA data of modern humans from Europe dates to ~40 thousand years ago, but that from the Caucasus and the Near East to only ~14 thousand years ago, from populations who lived long after the Last Glacial Maximum (LGM) ~26.5-19 thousand years ago. To address this imbalance and to better understand the relationship of Europeans and Near Easterners, we report genome-wide data from two ~26 thousand year old individuals from Dzudzuana Cave in Georgia in the Caucasus from around the beginning of the LGM. Surprisingly, the Dzudzuana population was more closely related to early agriculturalists from western Anatolia ~8 thousand years ago than to the hunter-gatherers of the Caucasus from the same region of western Georgia of ~13-10 thousand years ago. Most of the Dzudzuana population’s ancestry was deeply related to the post-glacial western European hunter-gatherers of the ‘Villabruna cluster’, but it also had ancestry from a lineage that had separated from the great majority of non-African populations before they separated from each other, proving that such ‘Basal Eurasians’ were present in West Eurasia twice as early as previously recorded. We document major population turnover in the Near East after the time of Dzudzuana, showing that the highly differentiated Holocene populations of the region were formed by ‘Ancient North Eurasian’ admixture into the Caucasus and Iran and North African admixture into the Natufians of the Levant. We finally show that the Dzudzuana population contributed the majority of the ancestry of post-Ice Age people in the Near East, North Africa, and even parts of Europe, thereby becoming the largest single contributor of ancestry of all present-day West Eurasians.
Longtime readers know that I hate the American racial term “Caucasians.” It’s pretentious when you could just say “white European,” because that’s what people really mean, judging by the fact that the real people from the Caucasus are marginally Caucasian in the eyes of many Americans. The genealogical origin of the term goes back to Johann Friedrich Blumenbach. And yet this paper takes these two samples, and finds that a lot of the ancestry of modern groups can be attributed to them! (also, a religious interpretation of the results is in the title of the post)
To be fair, they caution that these ancient Caucasian samples are representative of a particular thread of human heritage, not that the center of this thread was necessarily in the Caucasus. This does make me wonder about ascertainment bias in the Near East toward samples from mountainous areas which were colder. But, at the granularity they are attempting to understand human population history, it’s probably not that big of a deal. Ultimately, they conclude that this Paleo-Caucasian population contributes “~46-88% of the ancestry” of modern Europeans, Near Easterners, and North Africans. That’s kind of a big deal.
There are so many results in this preprint, so I think we need to go back to the “beginning” of the non-African branch. The Paleo-Caucasian sample is of note in part because it is from before the Last Glacial Maximum, and, about halfway back to the massive diversification of most non-African populations around 55,000 years ago. Using the Paleo-Caucasian samples’ affinities this preprint reinterprets results from last spring on ancient DNA from Northwest Africa. In that paper, the authors conclude that Paleolithic North Africans were a mix between an unspecific Sub-Saharan population and Natufians. Here though the authors suggest that the Natufians and Yoruba both received gene flow from Paleolithic North Africans. And, these Paleolithic North Africans were themselves mixed between something similar to the Paleo-Caucasians (a mix between an ancient West Eurasian ancestry and “Basal Eurasian”), and a “Deep” ancestry which diverged from other non-Sub-Saharan Africans before the Basal Eurasians did.
The reason that the Paleo-Caucasian sample is so important is that it allowed the researchers to see that the early Holocene Near East, where Anatolian and Iranian farmers, as well as Natufians in the Levant, were ancestral to many later groups, was subject to many genetic changes from before the Last Glacial Maximum. The Natufians seem to be well modeled as having ancestry from the Paleolithic North Africans as one of the major ways they are distinctive from the Paleo-Caucasians. This presents us with a reasonable model for the west to east movement of haplogroup E, and, the Afro-Asiatic languages. The gene flow of Paleolithic North African also explains the non-trivial level of Neanderthal admixture which is found in the Yoruba population. This is mediated through the presumed back migration of Paleo-Caucasians from the Near East at some point in the Pleistocene, contributing some Neanderthal ancestry to the genetic background of Paleolithic North Africans.
Additionally, the distinction between western (Anatolian/Levant) and eastern (Iran) farmers during the early Holocene can now be understood as a product of later admixture into eastern proto-farmers of basic Paleo-Caucasian stock. The relative closeness of Anatolian farmers to the Paleo-Caucasian samples is indicative of the fact that there was an “Ancient North Eurasian” (ANE) admixture cline into the Near East during the Pleistocene, which meant that some populations to the east became rather different from the pre-LGM samples. Probably after the Last Glacial Maximum proto-Siberian ancestry became prominent in the zone between the Caucasus and Iran (additionally, some of the models imply there was eastern Eurasian ancestry). This is in keeping with the fact that ANE ancestry does seem to have been found in places like Khorasan before the expansion south of steppe populations after 2,000 BC.
As noted in the abstract, Paleo-Caucasians had Basal Eurasian ancestry ~30,000 years ago. This increases the likelihood that Basal Eurasians weren’t recent migrants from deep inside Africa. Additionally, for various reasons, the authors are now positing a Deep ancestry which diverged even further into the past. Both Basal Eurasians and Deep populations seem to lack Neanderthal admixture. The authors also repeatedly suggest that Basal Eurasians were part of the Out of Africa bottleneck event. In Who We Are and How We Got Here David Reich presents the model that this bottleneck population had a low effective population size for a long time. This seems plausible because the genetic homogeneity that you see in non-Africans is pretty striking vis-a-vis Sub-Saharan Africans. On the other hand, this work confirms earlier results that imply that Basal Eurasians did not admix with Neanderthals, and also indicates that the divergence has to be greater than 60,000 years before the present from other non-Africans, who diversified more recently.
In contrast, the Deep ancestry group, which nevertheless forms a clade with the new Eurasian lineages (Basal and non-Basal), does not clearly seem to have undergone the bottleneck event according to this preprint. It’s more a matter of what they don’t say, rather than what they say in this case.
The big picture needs to be integrated, I think, with the new model in which “modern humans emerged through a multi-regional process” within Africa. If you think of modern humans as emerging across an African range which shifted into the Near East based on oscillating climatic conditions, the ancestors of the “non-African” lineages can be thought of as one of the main deeply rooted lineages, probably in the northeast of the continent. During the Pleistocene, the Sahara was even more brutal than today during many periods, so it is not implausible that some of these marginal populations on the edge of Africa were subject to long periods of very small effective population sizes. Most of them presumably went extinct. But one population was probably far enough north and east that it had a little more margin to play with. This population was probably connected along the Mediterranean littoral at some point with the Deep component in North Africa, which had higher effective population sizes because the mountainous terrain of the Atlas region was always going to remain more clement through dry phases.
At some point a group from the bottlenecked population mixed with some Neanderthals, and began to break out of containment in southwest Asia. If I had to bet money, I suspect there were already other related groups, probably somewhat admixed with local hominin lineages, further east. That is, I believe the archaeological results in Southeast Asia, and think that those in Australia are credible. But these groups were probably small in number, and totally absorbed by the later migration wave.
Also, the timing of the separation of Africans and “non-Africans” is such that I wouldn’t be surprised if the Qafzeh-Skhul people were somehow ancestral to, or closely related to, the ancestors of non-Africans.
Finally, let’s remember that the authors were focusing on North Africa and Western Eurasia in this preprint. Things will get more complicated as East Asia and Africa come “online” in terms of these analyses. Of course, we are going to be helped by the reality that human genetic variation is not arbitrarily and randomly distributed, but reflects real constraints in our evolutionary history and the forces of geography as well as contingency. The non-African story is made simpler in part because of the great bottleneck, and especially the common descent of most peoples from the population that mixed with Neanderthals. The modeling of effective population size changes over time in Sub-Saharan groups does not lead us to believe that it will be so simple in that continent.
The figure above is from a new paper, Estimating mobility using sparse data: Application to human genetic variation, which uses genomic data from late Pleistocene to the Iron Age in western Eurasia, and then infers migration rate considering both spatial distribution and the variable of time (remember that samples apart in time should also be genetically different, just as those apart in space often are).
The empirical results are shown above, but they validated their method first by running some simulations. Interestingly they modeled the migration as a Gaussian random walk. Which is fine. But I wonder how true this is for a lot of the Eurasian migrations of the last 10,000 years. Perhaps the distribution of distances from the place of birth would turn out to be multi-modal, with a minority of individuals tending to make “long jumps”?
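Here is a quick sketch of what I mean by the difference between a purely Gaussian model and one with a minority of long jumps. This is not the authors' simulation. The function names and every parameter value (a local scale of 50 km, 5% of individuals jumping around 500 km) are assumptions I made up for illustration.

```python
import random


def sample_distances(n, local_sd, long_jump_prob=0.0,
                     long_jump_mean=0.0, long_jump_sd=1.0, seed=0):
    """Draw birthplace-to-reproduction distances for n individuals.

    Most individuals take a Gaussian step with scale local_sd. With
    probability long_jump_prob an individual instead moves a distance
    centered on long_jump_mean, which adds a second mode far from home.
    """
    rng = random.Random(seed)
    distances = []
    for _ in range(n):
        if rng.random() < long_jump_prob:
            step = rng.gauss(long_jump_mean, long_jump_sd)
        else:
            step = rng.gauss(0.0, local_sd)
        distances.append(abs(step))
    return distances


def tail_fraction(distances, threshold):
    """Fraction of individuals who ended up farther away than threshold."""
    return sum(d > threshold for d in distances) / len(distances)


# Hypothetical units: kilometers per generation.
gaussian_only = sample_distances(10_000, local_sd=50.0, seed=1)
with_long_jumps = sample_distances(10_000, local_sd=50.0, long_jump_prob=0.05,
                                   long_jump_mean=500.0, long_jump_sd=100.0,
                                   seed=1)

# Four local standard deviations is essentially unreachable for the pure
# Gaussian model, but not for the mixture.
print(tail_fraction(gaussian_only, 200.0))
print(tail_fraction(with_long_jumps, 200.0))
```

If real migrations looked like the second case, a single Gaussian fitted to the data would have to inflate its scale to absorb the long jumps, and the inferred "mobility" would blend two rather different processes.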
With that out of the way, it’s fascinating that migration peaks around the Neolithic transition, the Bronze Age, and then the Iron Age. If you read a book like 1177 BC, you know that there was a major regression in the 13th century BC across the Near East, and for several centuries the region was in a “Dark Age.” In The Human Web William H. McNeill argues that one of the reasons for the length and depth of this Dark Age is that the network of complex societies exhibited less density and so less redundancy to failure.
The authors conclude:
We find that mobility among European Holocene farmers was significantly higher than among European hunter–gatherers both pre- and postdating the Last Glacial Maximum. We also infer that this Holocene rise in mobility occurred in at least three distinct stages: the first centering on the well-known population expansion at the beginning of the Neolithic, and the second and third centering on the beginning of the Bronze Age and the late Iron Age, respectively. These findings suggest a strong link between technological change and human mobility in Holocene Western Eurasia and demonstrate the utility of this framework for exploring changes in mobility through space and time.
Earlier they say:
We find strong support for a rise in mobility during the Neolithic transition in western Eurasia, likely corresponding to a well-established demic expansion of farmers, originating in the Middle East and resulting in the spread of farming technologies throughout most of Western Eurasia
The “demic diffusion” model is an easy one because it relies on the mass-action of individuals and family-groups as they expand in space through high fertility rates. And yet one thing that I think it misses is the socio-political context of that demic diffusion. For prehistoric periods we don’t have writing, and so no socio-political context. This is why in War Before Civilization the author focused on ethnographies of historical societies which came into contact with literate cultures which recorded their organization and folkways. The short summation is that these societies were often very aggressive and well organized for war. Additionally, hunter-gatherers themselves were keen on expanding farmers, and it seems clear they too could mobilize for violence.
The upshot is we need to think of the rise and expansion of strong states and expansionist polities as the context for an increase in the rate of migration. The reality of low migration rates in Pleistocene Europe was pretty evident even before this formal analysis. The pairwise genetic difference due to drift, and therefore low migration rates, for some nearby populations in the Pleistocene and early Holocene indicates that small-scale societies tend to be quite insulated from each other. In contrast, the Iron Age has witnessed a great deal of admixture, as large states and polities, as well as meta-ethnic identities, have broken down genetic barriers.
A regression around 1000 BC correlates neatly with reduced migration. This was almost certainly due to the fact that without larger states much of West Eurasian society, such as in Greece, had disintegrated into smaller tribal units.
Future historians and geneticists will notice that in the period between 1500 and 2000 the distribution of the Y chromosome lineage R1b1a1a2 expanded far beyond Western Europe. They will also understand the political context for this expansion of the lineage…
It looks as if the vast majority (95% or more depending on the population) of the ancestry of non-African humans derives from a population expansion which began around ~60,000 years ago. Before this period some researchers argue there was a non-trivial period of isolation, the “long bottleneck” (David Reich alludes to this in Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past). For the vast majority of humans then the last 60,000 years is characterized by a branching process, some reticulation (e.g., South Asians merge West and East Eurasian lineages) between these branches from a common ancestor, as well as introgression from archaic lineages like Neanderthals and Denisovans.
Though I do accept that it seems that modern humans probably migrated out of Africa before 60,000 years ago, mostly due to the results from archaeology, I think the genetic evidence is strong that these groups contributed very little genetically to contemporary populations.
The situation within Africa is very different. Being conservative it seems likely that the Khoisan ancestral lineage diverged from some other Africans ~200,000 years ago. I say conservative because there are researchers who want to push the divergence much further back. Additionally, several different research groups are now converging on a result that West Africans are a mixture between eastern Sub-Saharan Africans (think the population ancestral to Mota in Ethiopia) and a lineage basal to all other humans. That means that the Khoisan are not the most basal, so even assuming the conservative 200,000 year divergence point for Khoisan, modern humans share a common ancestor earlier than 200,000 years ago.
The upshot here is that around 75 percent of the history of modern humans is within (greater)* Africa. The distinctive “Out of Africa” bottleneck and expansion defines most humans only in the last 25 percent of the history of our species. And, within Africa, the dynamics were very different. The biggest difference is that African populations are not defined by a large number of lineages emerging and diverging around the same period, because there wasn’t a massive and singular expansion within Africa analogous to what occurred outside of Africa (at least until the recent past, with the Bantu expansion). That’s why there’s deep structure within Africa today between groups as divergent as the Bantu, Mbuti, Hadza, and Khoisan.
The term “Basal Eurasian” kind of makes sense in the non-African context because of the singular importance of divergence between lineages in the first 10,000 years or so after the “Out of Africa” event. I’m not sure “Basal human” makes as much sense because there wasn’t a singular event within Africa that allowed for the emergence of modern humans. Rather, it was a process, and probably quite resembles something like multiregionalism.
* Some wiggle room here for the likelihood that modern humans were long present in the liminal Near East.
Recently I had a discussion with a friend in which I said that I suspect the “tropical pygmy” phenotype you see in Central Africa and Southeast Asia is a pretty recent development. So this sort of assertion, “The Sentinelese tribe have remained on their North Sentinel Island, almost completely uncontacted for nearly 60,000 years…” is probably wrong. First, the Sentinelese probably arrived with other Andaman peoples during the Pleistocene from mainland Southeast Asia when the archipelago may have been connected to the mainland due to low sea levels.
Second, the small size of many tropical hunter-gatherer populations may simply be due to the difficulty of surviving in this environment. Though rainforests are lush, humans can’t access a lot of it, and small animals tend to require more energy to catch than is justified by how much meat they provide.
Different human populations facing similar environmental challenges have sometimes evolved convergent biological adaptations, for example hypoxia resistance at high altitudes and depigmented skin in northern latitudes on separate continents. The pygmy phenotype (small adult body size), a characteristic of hunter-gatherer populations inhabiting both African and Asian tropical rainforests, is often highlighted as another case of convergent adaptation in humans. However, the degree to which phenotypic convergence in this polygenic trait is due to convergent vs. population-specific genetic changes is unknown. To address this question, we analyzed high-coverage sequence data from the protein-coding portion of the genomes (exomes) of two pairs of populations, Batwa rainforest hunter-gatherers and neighboring Bakiga agriculturalists from Uganda, and Andamanese rainforest hunter-gatherers (Jarawa and Onge) and Brahmin agriculturalists from India. We observed signatures of convergent positive selection between the Batwa and Andamanese rainforest hunter-gatherers across the set of genes with annotated ‘growth factor binding’ functions (p<0.001). Unexpectedly, for the rainforest groups we also observed convergent and population-specific signatures of positive selection in pathways related to cardiac development (e.g. 'cardiac muscle tissue development'; p=0.003). We hypothesize that the growth hormone sub-responsiveness likely underlying the pygmy phenotype may have led to compensatory changes in cardiac pathways, in which this hormone also plays an essential role. Importantly, we did not observe similar patterns of positive selection on sets of genes associated with either growth or cardiac development in the agriculturalist populations, indicating that our results most likely reflect a history of convergent adaptation to the similar ecology of rainforest hunter-gatherers rather than a more common or general evolutionary pattern for human populations.
A minor note: there is some ethnographic data that the isolated Sentinelese are not as small as the other Andaman Islanders. Some of their small size may simply be due to exposure to diseases and the stress of settlers from the mainland.
If you are American you have probably heard about “Cheddar Man” in Bryan Sykes’ Seven Daughters of Eve. If you don’t know, Cheddar Man is a Mesolithic individual from prehistoric Britain, dating to 9,150 years before the present. Sykes’ DNA analysis concluded that he was mtDNA haplogroup U5, which is found in ~10% of modern Europeans, and which ancient DNA has found to be overwhelmingly dominant among European hunter-gatherers. But for years there has been controversy as to whether this result was contamination (after all, if it’s found in ~10% of modern Europeans it wouldn’t be surprising if the DNA was contaminated).
Today that is a moot point. On February 18th Channel 4 in the UK will premiere a documentary that seems to indicate genomic analysis of Cheddar Man’s remains has been performed, and he turns out to be exactly what we would have expected. That is, he’s a “Western Hunter-Gatherer” (WHG) with affinities to the remains from Belgium, Spain, and Central Europe. These WHG populations were themselves relatively recent arrivals in Pleistocene Europe, with connections to some populations in the Near East, and with unexplored minor genetic admixture from an East Asian population. Their total contribution to the ancestry of modern Europeans varies, with lower fractions in the south of the continent, and the highest in the northeast.
Overall, the consensus seems to be that in Western Europe the genuine descent from indigenous hunter-gatherers passed down through admixture with Neolithic farmers, and then the Corded Ware and Bell Beaker groups, is around ~10%. This is the number that shows up in the press write-ups. But, there are some researchers who contend it is far less than 10%, and that that fraction is misattribution due to early admixture with relatives of these hunter-gatherers as steppe and farmer peoples were expanding.
Phylogenetics aside, one of the major headline aspects of the Cheddar Man is that reconstructions are now of a very dark-skinned and blue-eyed individual. Some of the more sensationalist press is declaring that the “first Britons were black!” As far as the depiction goes, this is literally true. The reconstruction is of a black-skinned individual in the sense we’d describe black-skinned.
But on one level it is entirely expected that this is what Cheddar Man would look like. The hunter-gatherers of Mesolithic Western Europe were genetically homogeneous. They seem to derive from a small founder population. And, on the pigmentation loci which make modern Europeans very distinctive vis-a-vis other populations, SLC24A5, SLC45A2 and HERC2-OCA2, they were quite different from anything we’ve encountered before. First, these peoples seem to have had a frequency for the genetic variants strongly implicated in blue eyes in modern Europeans close to what you find in the Baltic region. The overwhelming majority carried the derived variant, perhaps even in regions such as Spain, which today are mostly brown-eyed because of the frequency of the ancestral variant. Second, these European hunter-gatherers tended to lack the genetic variants at SLC24A5 and SLC45A2 correlated with lighter skin, which today in Europeans are found at frequencies of ~100% and 80% to 95%, respectively.
The reason that one of the scientists being interviewed stated that there was a “76 percent probability that Cheddar Man had blue eyes” is that they used something like IrisPlex. They put in the genetic variants and out popped a probability. The problem is that the training set here is modern groups, which may have a very different genetic architecture than ancient populations. Recent work on Africans and East Asians indicates that the focus on European populations when it comes to pigmentation genetics has left huge lacunae in our understanding of common variants which affect variation in outcome.
East Asians, for example, lack both the derived variants of SLC24A5 and SLC45A2 common in Europeans but are often quite light-skinned. A deeper analysis of the pigmentation architecture of WHG might lead us to conclude that they were an olive or light brown-skinned people. This is my suspicion because modern Arctic peoples are neither pale white nor dark brown, but of various shades of olive.
As far as blue eyes go, it is reasonable that these individuals had that eye color because that trait seems somewhat less polygenic than skin color. There are darker complected people with light eyes, from the famous “Afghan girl” to the first black American Miss America, Vanessa Williams. The homozygote of the derived HERC2-OCA2 variant seems relatively penetrant. From what I recall the literature indicates many people with blue eyes are not homozygotes on this locus for the derived haplotypes, but those who are homozygotes for the derived haplotypes invariably have blue eyes.
Addendum: It isn’t clear in the press pieces, but it looks like they got a high coverage genome sequence out of Cheddar Man. They refer to sequencing, and, they seem to have hit all the major pigmentation loci. This indicates reasonable coverage of the genome.
If you are the product of a first cousin marriage, you have lots of runs of homozygosity. That’s because you will have large sections of the genome where both of the homologous chromosomes come from the same individual and are identical. In small populations, this occurs not so much through recent inbreeding as through reduced genetic diversity cranking up the frequency of some haplotypes over and above others.
The review covers all the bases, from distributions of runs of homozygosity in modern populations to ancient ones, as well as their functional consequences.
To the left, the plot shows that some populations, such as the Makrani of Pakistan, have fewer runs of homozygosity, but long ones when they have them. The populations on this part of the diagram are part of the “inbreeding belt.” In contrast, there are other populations with lots of runs of homozygosity, but they’re shorter. These are usually part of the “bottleneck belt,” where bottlenecks and small long-term effective population sizes have produced greater levels of homozygosity even on the genotype scale.
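The "few but long" versus "many but short" contrast is easy to see in a toy example. Below is a minimal sketch that scans one individual's genotypes, coded as the count of the alternate allele (0, 1, or 2) at consecutive markers, and reports unbroken stretches of homozygous calls. Real tools are more careful: they work in physical or genetic distance and tolerate the odd heterozygous call from genotyping error. The function names, the marker-count threshold, and both toy genomes are my own assumptions.

```python
def runs_of_homozygosity(genotypes, min_markers):
    """Return (start, end) index pairs, end exclusive, for every unbroken
    stretch of homozygous calls (0 or 2) at least min_markers long."""
    runs = []
    start = None
    for i, call in enumerate(genotypes):
        if call != 1:                      # homozygous marker
            if start is None:
                start = i
        else:                              # heterozygous marker ends any run
            if start is not None and i - start >= min_markers:
                runs.append((start, i))
            start = None
    if start is not None and len(genotypes) - start >= min_markers:
        runs.append((start, len(genotypes)))
    return runs


def summarize(runs):
    """Number of runs and their mean length in markers."""
    if not runs:
        return 0, 0.0
    lengths = [end - start for start, end in runs]
    return len(lengths), sum(lengths) / len(lengths)


# Toy genomes: recent inbreeding leaves one long run, while a bottleneck
# deep in the past leaves several short ones.
inbreeding_like = [1, 0, 1, 2, 1, 0, 1] + [0] * 12 + [1, 2, 1]
bottleneck_like = [0, 2, 0, 0, 1] * 4

print(summarize(runs_of_homozygosity(inbreeding_like, min_markers=3)))
print(summarize(runs_of_homozygosity(bottleneck_like, min_markers=3)))
```

The first toy genome yields a single run of 12 markers, the second yields four runs of 4 markers each, which is the same qualitative pattern that separates the two "belts" in the plot.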
Perhaps the most interesting point though is that runs of homozygosity strongly correlate with changes in the values of a complex trait. In general, inbreeding is not too good, because recessively expressing deleterious alleles get exposed, and runs of homozygosity are a proxy for that.* This is why more exogamy in the Middle East and India may be such a social good.
* There may be confounds here. More educated and smarter people may marry those more distant from them geographically due to mobility.
Modern molecular genetic datasets, primarily collected to study the biology of human health and disease, can be used to directly measure the action of natural selection and reveal important features of contemporary human evolution. Here we leverage the UK Biobank data to test for the presence of linear and nonlinear natural selection in a contemporary population of the United Kingdom. We obtain phenotypic and genetic evidence consistent with the action of linear/directional selection. Phenotypic evidence suggests that stabilizing selection, which acts to reduce variance in the population without necessarily modifying the population mean, is widespread and relatively weak in comparison with estimates from other species.
The stabilizing selection part is probably the most interesting part for me. But let’s hold up for a moment, and review some of the major findings. The authors focused on ~375,000 samples which matched their criteria (white British individuals old enough that they are well past their reproductive peak), and the genotyping platforms had 500,000 markers. The dependent variable they’re looking at is reproductive fitness. In this case specifically, “rLRS”, or relative lifetime reproductive success.
With these huge data sets and the large number of measured phenotypes, they first used the classical Lande and Arnold method to detect selection gradients, which leverages regression to measure directional and stabilizing dynamics. Basically, how does change in the phenotype impact reproductive fitness? So, it is notable that shorter women (those below the median height) have higher reproductive fitness than taller women. This seems like a robust result. We’ve seen it before with much smaller sample sizes.
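For readers who want to see what a Lande and Arnold style regression looks like mechanically, here is a minimal single-trait sketch in plain Python. It is not the authors' pipeline, which handles many traits and covariates. The function name and toy numbers are mine. The assumptions are the textbook ones: fitness is made relative by dividing by the population mean, the trait is standardized, the directional gradient β is the slope of relative fitness on the trait, and the quadratic gradient γ is twice the coefficient on the squared trait (negative γ is the signature of stabilizing selection).

```python
from statistics import mean, pstdev


def selection_gradients(trait, offspring):
    """Return (beta, gamma) for one trait from raw trait values and offspring counts."""
    # Relative fitness: each individual's offspring count over the population mean.
    w_bar = mean(offspring)
    w = [o / w_bar for o in offspring]

    # Standardize the trait to mean 0 and standard deviation 1.
    mu, sd = mean(trait), pstdev(trait)
    z = [(t - mu) / sd for t in trait]
    q = [zi * zi for zi in z]  # squared trait, used for the quadratic term

    def s(a, b):
        # Centered sum of cross products between two equal-length lists.
        ma, mb = mean(a), mean(b)
        return sum((x - ma) * (y - mb) for x, y in zip(a, b))

    # Directional gradient: slope from regressing w on z alone.
    beta = s(z, w) / s(z, z)

    # Quadratic gradient: fit w ~ z + z^2 via the two-predictor normal
    # equations, then double the z^2 coefficient.
    det = s(z, z) * s(q, q) - s(z, q) ** 2
    b2 = (s(z, z) * s(q, w) - s(z, q) * s(z, w)) / det
    gamma = 2 * b2
    return beta, gamma


if __name__ == "__main__":
    heights = [-2, -1, 0, 1, 2]  # toy trait values
    print(selection_gradients(heights, [4, 3, 2, 1, 0]))  # shorter do better
    print(selection_gradients(heights, [0, 2, 4, 2, 0]))  # intermediates do better
```

In the first toy case β comes out negative and γ is essentially zero, which is pure directional selection. In the second, β is essentially zero and γ is negative, which is pure stabilizing selection.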
The results using phenotypic correlations for directional (β) and stabilizing (γ) selection are shown below separated by sex. The abbreviations are the same as above.
There are many cases where directional selection seems to operate in females, but not in males. But they note that this is often due to near-zero, non-significant results in males, not because there were opposing directions in selection. Height was the exception, with regression coefficients in opposite directions. For stabilizing selection there was no antagonistic trait.
A major finding was that compared to other organisms stabilizing selection was very weak in humans. There’s just not that much pressure against extreme phenotypes. This isn’t entirely surprising. First, you have the issue of the weirdness of a lot of studies in animal models, with inbred lines, or wild populations selected for their salience. Second, prior theory suggests that a trait with lots of heritable quantitative variation, like height, shouldn’t be subject to that much selection. If it were, the genetic variation which was the raw material of the trait’s distribution wouldn’t be there.
Using more complex regression methods that take into account confounds, they pruned the list of significant hits. But, it is important to note that even at ~375,000, this sample size might be underpowered to detect really subtle dynamics. Additionally, the beauty of this study is that it added modern genomic analysis to the mix. Detecting selection through phenotypic analysis goes back decades, but interrogating the genetic basis of complex traits and their evolutionary dynamics is new.
To a first approximation, the results were broadly consonant across the two methods. But, there are interesting details where they differ. There is selection on height in females, but not in males. This implies that though empirically you see taller males with higher rLRS, the genetic variance that is affecting height isn’t correlated with rLRS, so selection isn’t occurring in this sex.
~375,000 may seem like a lot, but from talking to people who work in polygenic selection there is still statistical power to be gained by going into the millions (perhaps tens of millions?). These sorts of results are very preliminary but show the power of synthesizing classical quantitative genetic models and ways of thinking with modern genomics. And, it does have me wondering about how these methods will align with the sort of stuff I wrote about last year which detects recent selection on time depths of a few thousand years. The SDS method, for example, seems to be detecting selection for increasing height the world over…which I wonder is some artifact, because there’s a robust pattern of shorter women having higher fertility in studies going back decades.
In L. L. Cavalli-Sforza’s The History and Geography of Human Genes he used between population group genetic distances, as measured in FST values, to generate a series of visualizations, which then allowed him to infer historical processes. Basically the way it works is that you look at genetic variation, and see how much of it can be allocated to between groups. If none of it can be allocated to between groups, then in a population genetic sense it doesn’t make much sense to speak of distinctive groups, they’re basically one breeding population. The higher the FST statistic is, the more of the variation is partitioned between the groups.
Roughly, this statistic is taken to track genetic distance as well as evolutionary divergence. The longer two populations have been separated, the more genetic differences they’ll accumulate, inflating the FST value. There are a lot of subtleties that I’m eliding here (see Estimating and interpreting FST: the impact of rare variants for a survey of the recent literature on the topic and pathways forward), but for a long time, FST was the go-to statistic for making phylogenetic inferences on a within-species scale.
Today we have other techniques: Structure, Treemix, fineStructure, and various local ancestry packages.
But FST is still useful to give one a Gestalt sense of population genetic differences. Cavalli-Sforza admits in The History and Geography of Human Genes that European populations had very low pairwise FST, but because of the importance of Europe for sociocultural reasons a detailed analysis of the region was still provided in the text. Additionally, they had lots of European samples (non-European Caucasoids were thrown into one category for macro-group comparisons because there weren’t that many samples).
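The "how much variation sits between groups" idea can be written down in a few lines. This is a minimal sketch of the simplest textbook form, FST = (H_T − H_S) / H_T, for one biallelic marker in two equally weighted populations. The function name is mine, and it takes allele frequencies as given, so it ignores the sample-size corrections that real estimators (Weir and Cockerham's, Hudson's) apply. It is not what Cavalli-Sforza computed; it just illustrates the partitioning.

```python
def fst_biallelic(p1, p2):
    """Simple FST at one biallelic locus from two population allele frequencies."""
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were one pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    # Average expected heterozygosity within each population.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_t == 0:
        return 0.0  # monomorphic locus: there is no variation to partition
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_biallelic(0.5, 0.5))  # identical frequencies
    print(fst_biallelic(0.4, 0.6))  # modest difference
    print(fst_biallelic(0.0, 1.0))  # fixed for opposite alleles
```

Identical frequencies give 0 (one breeding population, in effect), populations fixed for opposite alleles give 1, and a modest frequency difference like 0.4 versus 0.6 gives a value around 0.04.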
Ancient populations were very distinct in Europe from modern ones.
Many modern groups are clustered close together.
The bulk of the population genetic structure in modern Europe seems to have been established in the period between 3000 BCE and 2000 BCE. This is not that much time for a lot of distinctiveness to develop, especially on the geographically open North European plain. I suspect with more and more Mesolithic and early to middle Neolithic DNA we’ll see that some of the modern population structure is a ghost of ancient substrate absorption.
Many of the ethno-national categories that are very significant in recent history, and impact the cultural memories of modern people and their genealogies, have very shallow roots. This does not mean they are not “real” (I don’t know what that’s supposed to mean at all), just that many of the identities which seem so salient to us today may be relatively recent in terms of their significance to large groups of humans….