It looks as if the vast majority (95% or more depending on the population) of the ancestry of non-African humans derives from a population expansion which began around 60,000 years ago. Before this period some researchers argue there was a non-trivial period of isolation, the “long bottleneck” (David Reich alludes to this in Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past). For the vast majority of humans, then, the last 60,000 years is characterized by a branching process from a common ancestor, some reticulation between these branches (e.g., South Asians merge West and East Eurasian lineages), as well as introgression from archaic lineages like Neanderthals and Denisovans.
Though I do accept that it seems that modern humans probably migrated out of Africa before 60,000 years ago, mostly due to the results from archaeology, I think the genetic evidence is strong that these groups contributed very little genetically to contemporary populations.
The situation within Africa is very different. Being conservative it seems likely that the Khoisan ancestral lineage diverged from some other Africans ~200,000 years ago. I say conservative because there are researchers who want to push the divergence much further back. Additionally, several different research groups are now converging on a result that West Africans are a mixture between eastern Sub-Saharan Africans (think the population ancestral to Mota in Ethiopia) and a lineage basal to all other humans. That means that the Khoisan are not the most basal, so even assuming the conservative 200,000 year divergence point for Khoisan, modern humans share a common ancestor earlier than 200,000 years ago.
The upshot here is that around 75 percent of the history of modern humans is within (greater)* Africa. The distinctive “Out of Africa” bottleneck and expansion defines most humans only in the last 25 percent of the history of our species. And, within Africa, the dynamics were very different. The biggest difference is that African populations are not defined by a large number of lineages emerging and diverging around the same period, because there wasn’t a massive and singular expansion within Africa analogous to what occurred outside of Africa (at least until the recent past, with the Bantu expansion). That’s why there’s deep structure within Africa today between groups as divergent as the Bantu, Mbuti, Hadza, and Khoisan.
The term “Basal Eurasian” kind of makes sense in the non-African context because of the singular importance of divergence between lineages in the first 10,000 years or so after the “Out of Africa” event. I’m not sure “Basal human” makes as much sense because there wasn’t a singular event within Africa that allowed for the emergence of modern humans. Rather, it was a process, and probably quite resembles something like multiregionalism.
* Some wiggle room here for the likelihood that modern humans were long present in the liminal Near East.
Recently I had a discussion with a friend in which I said that I suspect the “tropical pygmy” phenotype you see in Central Africa and Southeast Asia is a pretty recent development. So this sort of assertion, “The Sentinelese tribe have remained on their North Sentinel Island, almost completely uncontacted for nearly 60,000 years…” is probably wrong. First, the Sentinelese probably arrived with other Andaman peoples during the Pleistocene from mainland Southeast Asia when the archipelago may have been connected to the mainland due to low sea levels.
Second, the small size of many tropical hunter-gatherer populations may simply be due to the difficulty of surviving in this environment. Though rainforests are lush, humans can’t access a lot of it, and small animals tend to require more energy to catch than is justified by how much meat they provide.
Different human populations facing similar environmental challenges have sometimes evolved convergent biological adaptations, for example hypoxia resistance at high altitudes and depigmented skin in northern latitudes on separate continents. The pygmy phenotype (small adult body size), a characteristic of hunter-gatherer populations inhabiting both African and Asian tropical rainforests, is often highlighted as another case of convergent adaptation in humans. However, the degree to which phenotypic convergence in this polygenic trait is due to convergent vs. population-specific genetic changes is unknown. To address this question, we analyzed high-coverage sequence data from the protein-coding portion of the genomes (exomes) of two pairs of populations, Batwa rainforest hunter-gatherers and neighboring Bakiga agriculturalists from Uganda, and Andamanese rainforest hunter-gatherers (Jarawa and Onge) and Brahmin agriculturalists from India. We observed signatures of convergent positive selection between the Batwa and Andamanese rainforest hunter-gatherers across the set of genes with annotated ‘growth factor binding’ functions (p<0.001). Unexpectedly, for the rainforest groups we also observed convergent and population-specific signatures of positive selection in pathways related to cardiac development (e.g. 'cardiac muscle tissue development'; p=0.003). We hypothesize that the growth hormone sub-responsiveness likely underlying the pygmy phenotype may have led to compensatory changes in cardiac pathways, in which this hormone also plays an essential role. Importantly, we did not observe similar patterns of positive selection on sets of genes associated with either growth or cardiac development in the agriculturalist populations, indicating that our results most likely reflect a history of convergent adaptation to the similar ecology of rainforest hunter-gatherers rather than a more common or general evolutionary pattern for human populations.
A minor note: there is some ethnographic data that the isolated Sentinelese are not as small as the other Andaman Islanders. Some of their small size may simply be due to exposure to diseases and the stress of settlers from the mainland.
If you are American you have probably heard about “Cheddar Man” in Bryan Sykes’ Seven Daughters of Eve. If you don’t know, Cheddar Man is a Mesolithic individual from prehistoric Britain, dating to 9,150 years before the present. Sykes’ DNA analysis concluded that he was mtDNA haplogroup U5, which is found in ~10% of modern Europeans, and which ancient DNA has found to be overwhelmingly dominant among European hunter-gatherers. But for years there has been controversy as to whether this result was contamination (after all, if it’s found in ~10% of modern Europeans it wouldn’t be surprising if the DNA was contaminated).
Today that is a moot point. On February 18th Channel 4 in the UK will premiere a documentary that seems to indicate genomic analysis of Cheddar Man’s remains has been performed, and he turns out to be exactly what we would have expected. That is, he’s a “Western Hunter-Gatherer” (WHG) with affinities to the remains from Belgium, Spain, and Central Europe. These WHG populations were themselves relatively recent arrivals in Pleistocene Europe, with connections to some populations in the Near East, and with unexplored minor genetic admixture from an East Asian population. Their total contribution to the ancestry of modern Europeans varies, with lower fractions in the south of the continent, and the highest in the northeast.
Overall, the consensus seems to be that in Western Europe the genuine descent from indigenous hunter-gatherers, passed down through admixture with Neolithic farmers and then the Corded Ware and Bell Beaker groups, is around 10%. This is the number that shows up in the press write-ups. But there are some researchers who contend it is far less than 10%, and that part of that fraction is misattribution due to early admixture with relatives of these hunter-gatherers as steppe and farmer peoples were expanding.
Phylogenetics aside, one of the major headline aspects of the Cheddar Man is that reconstructions are now of a very dark-skinned and blue-eyed individual. Some of the more sensationalist press is declaring that the “first Britons were black!” As far as the depiction goes, this is literally true. The reconstruction is of a black-skinned individual in the sense we’d describe black-skinned.
But on one level it is entirely expected that this is what Cheddar Man would look like. The hunter-gatherers of Mesolithic Western Europe were genetically homogeneous. They seem to derive from a small founder population. And, on the pigmentation loci which make modern Europeans very distinctive vis-a-vis other populations, SLC24A5, SLC45A2 and HERC2-OCA2, they were quite different from anything we’ve encountered before. First, these peoples seem to have had a frequency for the genetic variants strongly implicated in blue eyes in modern Europeans close to what you find in the Baltic region. The overwhelming majority carried the derived variant, perhaps even in regions such as Spain, which today are mostly brown-eyed because of the frequency of the ancestral variant. Second, these European hunter-gatherers tended to lack the genetic variants at SLC24A5 and SLC45A2 correlated with lighter skin, which today in Europeans are found at frequencies of ~100% and 95% to 80% respectively.
The reason that one of the scientists being interviewed stated that there was a “76 percent probability that Cheddar Man had blue eyes” is that they used something like IrisPlex. They put in the genetic variants and out popped a probability. The problem is that the training set here is modern groups, which may have a very different genetic architecture than ancient populations. Recent work on Africans and East Asians indicates that the focus on European populations when it comes to pigmentation genetics has left huge lacunae in our understanding of common variants which affect variation in outcome.
East Asians, for example, lack both the derived variants of SLC24A5 and SLC45A2 common in Europeans but are often quite light-skinned. A deeper analysis of the pigmentation architecture of WHG might lead us to conclude that they were an olive or light brown-skinned people. This is my suspicion because modern Arctic peoples are neither pale white nor dark brown, but of various shades of olive.
As far as blue eyes go, it is reasonable that these individuals had that eye color because that trait seems somewhat less polygenic than skin color. There are darker complected people with light eyes, from the famous “Afghan girl” to the first black Miss America, Vanessa Williams. The homozygote of the derived HERC2-OCA2 variant seems relatively penetrant. From what I recall the literature indicates many people with blue eyes are not homozygotes on this locus for the derived haplotypes, but those who are homozygotes for the derived haplotypes invariably have blue eyes.
Addendum: It isn’t clear in the press pieces, but it looks like they got a high coverage genome sequence out of Cheddar Man. They refer to sequencing, and, they seem to have hit all the major pigmentation loci. This indicates reasonable coverage of the genome.
If you are the product of a first cousin marriage, you have lots of runs of homozygosity. That’s because you will have large sections of the genome where both of the homologous chromosomes come from the same individual and are identical. In small populations, this occurs not so much through recent inbreeding as through reduced genetic diversity cranking up the frequency of some haplotypes over and above others.
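To make the idea concrete, here is a minimal sketch of how one might scan a single individual's genotypes for runs of homozygosity. The 0/1/2 genotype encoding, the function name find_roh, and the min_len threshold are all assumptions made for illustration. Real tools typically work on physical or genetic distance and tolerate the occasional heterozygous call from genotyping error, which this toy version does not.

```python
def find_roh(genotypes, min_len=5):
    """Return (start, end) index pairs of homozygous runs of at least min_len markers.

    genotypes: alternate-allele counts per marker (0, 1 or 2); 1 is heterozygous.
    The end index is exclusive. Encoding and threshold are illustrative assumptions.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            # A homozygous marker opens a run or extends the current one.
            if start is None:
                start = i
        else:
            # A heterozygous marker closes the current run, if any.
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that reaches the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes)))
    return runs


# Hypothetical individual: one long homozygous stretch flanked by heterozygous calls.
toy = [0, 2, 1, 0, 0, 2, 2, 0, 2, 1, 1, 2, 0, 0]
roh = find_roh(toy, min_len=5)

# Fraction of markers inside a run, a crude stand-in for the F_ROH inbreeding measure.
froh = sum(end - start for start, end in roh) / len(toy)
```

A child of first cousins would show a few very long runs under a scan like this, while someone from a long-bottlenecked population would show many short ones.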
The review covers all the bases, from distributions of runs of homozygosity in modern populations to ancient ones, as well as their functional consequences.
To the left, the plot shows that some populations, such as the Makrani of Pakistan, have fewer runs of homozygosity, but long ones when they have them. The populations on this part of the diagram are part of the “inbreeding belt.” In contrast, there are other populations with lots of runs of homozygosity, but they’re shorter. These are usually part of the “bottleneck belt,” where bottlenecks and small long-term effective populations have produced greater levels of homozygosity even on the genotype scale.
Perhaps the most interesting point though is that runs of homozygosity strongly correlate with changes in the values of a complex trait. In general, inbreeding is not too good, because recessively expressing deleterious alleles get exposed, and runs of homozygosity are a proxy for that.* This is why more exogamy in the Middle East and India may be such a social good.
* There may be confounds here. More educated and smarter people may marry those more distant from them geographically due to mobility.
Modern molecular genetic datasets, primarily collected to study the biology of human health and disease, can be used to directly measure the action of natural selection and reveal important features of contemporary human evolution. Here we leverage the UK Biobank data to test for the presence of linear and nonlinear natural selection in a contemporary population of the United Kingdom. We obtain phenotypic and genetic evidence consistent with the action of linear/directional selection. Phenotypic evidence suggests that stabilizing selection, which acts to reduce variance in the population without necessarily modifying the population mean, is widespread and relatively weak in comparison with estimates from other species.
The stabilizing selection part is probably the most interesting part for me. But let’s hold up for a moment, and review some of the major findings. The authors focused on ~375,000 samples which matched their criteria (white British individuals old enough that they are well past their reproductive peak), and the genotyping platforms had 500,000 markers. The dependent variable they’re looking at is reproductive fitness. In this case specifically, “rLRS”, or relative lifetime reproductive success.
With these huge data sets and the large number of measured phenotypes they first used the classical Lande and Arnold method to detect selection gradients, which leverages regression to measure directional and stabilizing dynamics. Basically, how does change in the phenotype impact reproductive fitness? So, it is notable that women shorter than the median have higher reproductive fitness than taller women. This seems like a robust result. We’ve seen it before with much smaller sample sizes.
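For readers who want to see the mechanics, here is a minimal sketch of the Lande and Arnold regression idea in plain Python. It is not the authors' pipeline: the function names, the toy heights, and the toy offspring counts are all hypothetical. Relative fitness is regressed on a standardized trait to get the directional gradient (β), and a quadratic term is added to get the stabilizing or disruptive gradient (γ), which by the usual convention is twice the quadratic regression coefficient.

```python
from statistics import mean, pstdev


def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def least_squares(columns, y):
    """Ordinary least squares via the normal equations (X'X) coef = X'y."""
    XtX = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    Xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(XtX, Xty)


def selection_gradients(trait, offspring):
    """Return (beta, gamma) for one trait, in the style of Lande and Arnold."""
    w_bar = mean(offspring)
    w = [o / w_bar for o in offspring]        # relative fitness, mean 1
    mu, sd = mean(trait), pstdev(trait)
    z = [(x - mu) / sd for x in trait]        # standardized trait
    ones = [1.0] * len(z)
    # Directional gradient: slope from the linear-only model.
    _, beta = least_squares([ones, z], w)
    # Quadratic gradient: twice the coefficient on z squared.
    _, _, quad = least_squares([ones, z, [v * v for v in z]], w)
    return beta, 2 * quad


# Hypothetical data: heights in cm and number of children per woman.
heights = [150, 155, 160, 165, 170, 175]
children = [2, 3, 3, 2, 1, 0]
beta, gamma = selection_gradients(heights, children)
```

In this made-up example β comes out negative (shorter women have more children) and γ comes out negative as well (fitness is a concave function of height, the pattern expected under stabilizing selection). With real data you would also want standard errors and controls for confounds, which this sketch leaves out.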
The results using phenotypic correlations for directional (β) and stabilizing (γ) selection are shown below separated by sex. The abbreviations are the same as above.
There are many cases where directional selection seems to operate in females, but not in males. But they note that that is often due to near zero non-significant results in males, not because there were opposing directions in selection. Height was the exception, with regression coefficients in opposite directions. For stabilizing selection there was no antagonistic trait.
A major finding was that compared to other organisms stabilizing selection was very weak in humans. There’s just not that much pressure against extreme phenotypes. This isn’t entirely surprising. First, you have the issue of the weirdness of a lot of studies in animal models, with inbred lines, or wild populations selected for their salience. Second, prior theory suggests that a trait with lots of heritable quantitative variation, like height, shouldn’t be subject to that much selection. If it were, the genetic variation which was the raw material of the trait’s distribution wouldn’t be there.
Using more complex regression methods that take into account confounds, they pruned the list of significant hits. But, it is important to note that even at ~375,000, this sample size might be underpowered to detect really subtle dynamics. Additionally, the beauty of this study is that it added modern genomic analysis to the mix. Detecting selection through phenotypic analysis goes back decades, but interrogating the genetic basis of complex traits and their evolutionary dynamics is new.
To a first approximation, the results were broadly consonant across the two methods. But, there are interesting details where they differ. There is selection on height in females, but not in males. This implies that though empirically you see taller males with higher rLRS, the genetic variance that is affecting height isn’t correlated with rLRS, so selection isn’t occurring in this sex.
~375,000 may seem like a lot, but from talking to people who work in polygenic selection there is still statistical power to be gained by going into the millions (perhaps tens of millions?). These sorts of results are very preliminary but show the power of synthesizing classical quantitative genetic models and ways of thinking with modern genomics. And, it does have me wondering about how these methods will align with the sort of stuff I wrote about last year which detects recent selection on time depths of a few thousand years. The SDS method, for example, seems to be detecting selection for increasing height the world over…which I wonder is some artifact, because there’s a robust pattern of shorter women having higher fertility in studies going back decades.
In L. L. Cavalli-Sforza’s The History and Geography of Human Genes he used between population group genetic distances, as measured in FST values, to generate a series of visualizations, which then allowed him to infer historical processes. Basically the way it works is that you look at genetic variation, and see how much of it can be allocated to between groups. If none of it can be allocated to between groups, then in a population genetic sense it doesn’t make much sense to speak of distinctive groups, they’re basically one breeding population. The higher the FST statistic is, the more of the variation is partitioned between the groups.
Roughly this is used to correlate with genetic distance as well as evolutionary divergence. The longer two populations have been separated, the more and more genetic differences they’ll accumulate, inflating the FST value. There are a lot of subtleties that I’m eliding here (see Estimating and interpreting FST: the impact of rare variants for a survey of the recent literature on the topic and pathways forward), but for a long time, FST was the go-to statistic for making phylogenetic inferences on a within-species scale.
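As a rough illustration of what “partitioning variation between groups” means, here is a minimal single-locus sketch. It assumes one biallelic marker, two populations weighted equally, and allele frequencies known without sampling error. The function name fst and the example frequencies are hypothetical. Estimators used in practice correct for sample size and average over many loci, so treat this as the textbook definition rather than what any particular package computes.

```python
def fst(p1, p2):
    """Single-locus FST for two equally weighted populations.

    p1, p2: frequency of the same allele in each population.
    FST = (HT - HS) / HT, where HS is the mean within-population expected
    heterozygosity and HT is the expected heterozygosity of the pooled population.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        # The locus is monomorphic overall, so there is no variation to partition.
        return 0.0
    return (ht - hs) / ht


same = fst(0.5, 0.5)      # identical frequencies: nothing is between groups
fixed = fst(0.0, 1.0)     # fixed differences: everything is between groups
modest = fst(0.4, 0.6)    # a small frequency difference
```

The three cases give 0, 1, and 0.04 respectively. The last is the instructive one: a 20 percentage point difference in allele frequency still leaves 96% of the variation at that locus within populations.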
Today we have other techniques, Structure, Treemix, fineStructure, and various local ancestry packages.
But FST is still useful to give one a Gestalt sense of population genetic differences. Cavalli-Sforza admits in The History and Geography of Human Genes that European populations had very low pairwise FST, but because of the importance of Europe for sociocultural reasons a detailed analysis of the region was still provided in the text. Additionally, they had lots of European samples (non-European Caucasoids were thrown into one category for macro-group comparisons because there weren’t that many samples).
Ancient populations were very distinct in Europe from modern ones.
Many modern groups are clustered close together.
The bulk of the population genetic structure in modern Europe seems to have been established in the period between 3000 BCE and 2000 BCE. This is not that much time for a lot of distinctiveness to develop, especially on the geographically open North European plain. I suspect with more and more Mesolithic and early to middle Neolithic DNA we’ll see that some of the modern population structure is a ghost of ancient substrate absorption.
Many of the ethno-national categories that are very significant in recent history, and impact the cultural memories of modern people and their genealogies, have very shallow roots. This does not mean they are not “real” (I don’t know what that’s supposed to mean at all), just that many of the identities which seem so salient to us today may be relatively recent in terms of their significance to large groups of humans….
When I was a kid “killer bees” were a major pop culture thing. There were movies about the bees, and we would get updates about their march northward in the news. They were a cautionary tale of our species’ hubris.
Today we have a little bit more perspective. These bees were actually just African honeybees, the ancestral population to European honeybees, which were introduced to the New World with Europeans centuries earlier than the African honeybees. African honeybees were not that different from European honeybees, but they were more aggressive and tended to outcompete European honeybee colonies. They are a major problem for the beekeeping industry, but not a major threat to human life.
Today the African and European populations in the United States seem to have stabilized in their ranges, with a hybrid zone between them. African bees’ migratory behavior makes them less competitive with European bees in colder climates.
Highly aggressive Africanized honeybees (AHB) invaded Puerto Rico (PR) in 1994, displacing gentle European honeybees (EHB) in many locations. Gentle AHB (gAHB), unknown anywhere else in the world, subsequently evolved on the island within a few generations. Here we sequence whole genomes from gAHB and EHB populations, as well as a North American AHB population, a likely source of the founder AHB on PR. We show that gAHB retains high levels of genetic diversity after evolution of gentle behaviour, despite selection on standing variation. We observe multiple genomic loci with significant signatures of selection. Rapid evolution during colonization of novel habitats can generate major changes to characteristics such as morphological or colouration traits, usually controlled by one or more major genetic loci. Here we describe a soft selective sweep, acting at multiple loci across the genome, that occurred during, and may have mediated, the rapid evolution of a behavioural trait.
Come for the bees, but stay for the soft selection! If you talk to anyone in evolutionary and population genomics you know that the future is in understanding patterns of soft selection and polygenic selection from standing variation. Though these are related phenomena which are associated with each other, they are all distinct.
Standing variation just refers to the diversity which is segregating in the population at any given time. At any given moment many loci exhibit polymorphism. This polymorphism can be a target of natural selection if it is correlated with heritable variation and differentials in fitness. Though soft selection can be quite woolly, its inverse, hard selection, is clear: in genetic terms hard selection can be seen in allele frequency changes at a single variant in a locus, going from the point where it is a novel mutation to nearly fixed in the population. In Haldane’s original conception hard selection involved excess deaths, and imposed a limit on the rate of evolution as well as the amount of variation you could expect within a given population. This model was convenient in the pre-genomic and early genomic era because empirical selection tests had to focus on large allele frequency changes around singular loci. Researchers didn’t have large numbers of whole-genome samples available (nor the computational ability to analyze them).
Today this is not a limitation. In the analysis above the authors had 30 individuals of the 3 populations sequenced at high quality (20x). They ended up with millions of genetic variants they could analyze.
The plot to the left shows that “gentle African honeybees” (gAHB) tend to be closer to the African honeybee populations (AHB) overall (though with some hybridization with European honeybees, EHB). This is not surprising.
But the key observation was that over 12 generations the African honeybees of Puerto Rico became progressively less aggressive, despite maintaining overall morphological similarities to the mainland Mexican African bees from which they likely derive. Though buried in the discussion, there is a rationale for why this behavioral change may have occurred: the Puerto Rican bees are subject to a lot of negative selection against aggression because of how densely settled the island is, as well as the reality that aside from humans there aren’t many other species against which their aggressive tendencies are beneficial. Basically, if you are an aggressive colony, it’s harder to make a go in densely settled areas (the implication here then is that there are probably “gentle” African honeybee populations across Latin America, they just are never disaggregated from the broader meta-population).
It’s the genomics where the real evolutionary insight comes in: they found that there were multiple soft sweep events around genetic regions implicated in behavior. In their overall genome the gAHB of Puerto Rico resembled mainland AHB, but in this subset of genetic loci they resembled EHB. Many of these loci had also been known to be targets of selection when the original European bee population diverged from the ancestral African population. Basically this is a genomic illustration of convergent evolution.
Regular readers of this blog will recognize the ways they detected selection. They used a modified form of EHH, which is reasonable since the selection event was recent enough to have been associated with distinct haplotype blocks. Also, standard Fst analysis showed that these were outliers in relation to the broader genetic pattern of relatedness (these loci were more like EHB than AHB, while most loci were more like AHB than EHB).
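For anyone who hasn’t seen it spelled out, here is a minimal sketch of plain extended haplotype homozygosity (EHH). The paper used a modified form, so this is not their statistic. The function name ehh, the string encoding of haplotypes, and the toy haplotypes are all assumptions for illustration. EHH asks: among chromosomes carrying a given core allele, what fraction of pairs are still identical out to some distance from the core? After a recent sweep that fraction stays high over a long distance, because recombination hasn’t had time to break up the haplotype.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, upto):
    """Extended haplotype homozygosity between marker indices core and upto.

    haplotypes: equal-length strings of '0'/'1', all carrying the same allele
    at the core marker. Returns the fraction of haplotype pairs that are
    identical across every marker from core to upto (inclusive).
    """
    lo, hi = sorted((core, upto))
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    n = len(haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Hypothetical haplotypes carrying the derived allele ('1') at core index 0.
swept = ["1010", "1010", "1010", "1011"]      # recent sweep: nearly identical
neutral = ["1010", "1100", "1001", "1111"]    # old allele: broken up by recombination

swept_decay = [ehh(swept, 0, k) for k in range(4)]
neutral_decay = [ehh(neutral, 0, k) for k in range(4)]
```

In the toy data the swept haplotypes keep an EHH of 1.0 for three markers and only drop to 0.5 at the fourth, while the neutral set falls to one third by the second marker and to zero by the third. That slow decay around a locus is the signature these tests look for.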
So this is a form of polygenic selection. Remember, natural selection only knows genes through the phenotype (with intra-genomic selection being an exception). A behavior like aggression is probably subject to the fourth law of behavior genetics. That is, variation won’t be defined around a single genetic locus. Rather, variation across the genome will be correlated with variation in the phenotype. As selection favors a particular value of the phenotype across the distribution the allele frequencies across many genetic loci will shift, but they will not necessarily fix. Polygenic selection operates on the dispersed standing genetic variation which explains much of the variation of the phenotype in question. Instead of total sweeps to fixation due to large fitness differences between a given allele and its alternative form, the selection impact is distributed and diffused across the genome.
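A back-of-the-envelope sketch shows why small shifts at many loci are enough. Everything here is hypothetical: a purely additive trait, 200 loci with identical effect sizes, all starting at a frequency of 0.5, and no linkage or environmental variance. Under those assumptions the mean genetic value is the sum of 2·a·p across loci and the additive genetic variance is the sum of 2·a²·p·(1−p).

```python
def mean_and_variance(freqs, effects):
    """Mean genetic value and additive variance of a purely additive trait.

    freqs: frequency of the trait-increasing allele at each locus.
    effects: per-allele effect at each locus (hypothetical units).
    Assumes Hardy-Weinberg proportions and independent loci.
    """
    mean_value = sum(2 * a * p for p, a in zip(freqs, effects))
    variance = sum(2 * a * a * p * (1 - p) for p, a in zip(freqs, effects))
    return mean_value, variance


n_loci = 200
effects = [0.1] * n_loci

before = [0.50] * n_loci      # standing variation before selection
after = [0.55] * n_loci       # every locus nudged by 5 percentage points

m0, v0 = mean_and_variance(before, effects)
m1, v1 = mean_and_variance(after, effects)

# Shift in the population mean, in units of the starting genetic standard deviation.
shift_in_sd = (m1 - m0) / v0 ** 0.5
```

With these made-up numbers the population mean moves by two genetic standard deviations, yet no allele has gone anywhere near fixation and the additive variance has barely changed (1.0 before, 0.99 after). No single locus would stand out as a classic sweep.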
Though most of the genetic variants seem to recapitulate the evolution of the less aggressive phenotype that occurred with the original migration north of African honeybees, some of the selection signatures were novel. This points to the reality that when you have soft selection on standing variation you may have similar phenotypes which evolve via different means. Additionally, the authors noted that these results were in contrast to controlled breeding experiments in mammals where selection for gentility (“domestication”) often targeted a few loci and exhibited strong pleiotropic effects (due to the genetic correlation). These results point to the limitations of inferences made from human-directed selection.
Soft selection is probably ubiquitous. Consider the evolution of skin color in humans. There are lots of variants and lots of variation, and most of the variation seems to be ancestral. Only at the locus SLC24A5 do you have a perfect illustration of a hard selective sweep, probably from a de novo mutation that emerged around the Last Glacial Maximum.
From a geneticist’s perspective evolution is basically conceived of as changes in allele frequencies over time. Much of this is due to natural selection. Now that the world of soft selection is opening up, I suspect that we’ll understand a lot more of what we see around us, at least in the generality.