The Human Genome Diversity Project at high-coverage!

After a few years of presentations and preprints, the new high-quality whole-genome analysis of the HGDP dataset is finally published in Science, Insights into human genetic variation and population history from 929 diverse genomes. The HGDP dates back 30 years, so this is the culmination of a long line of research. The authors sequenced nearly 1,000 HGDP individuals at high coverage, meaning that they had extremely good confidence in their calls of the state of each base across all 3 billion base pairs.

This is in contrast to the ~600,000 markers in the original HGDP analyses from the 2000s, which came from results of a “SNP-array.” A SNP-array of this form focuses on variation by looking at polymorphic sites (sites which vary in the population). How did they originally determine what was polymorphic? Unfortunately, they had to rely on European populations, so the original analyses were using a quite skewed measuring stick. Whole-genome analyses bypass these problems because you get the totality of sequence information, and the high coverage means you can confidently call very rare variants in some of these individuals (they’re not false positives).
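To make the "skewed measuring stick" concrete, here is a minimal simulation sketch of ascertainment bias. Everything in it is an assumption for illustration (the populations "A" and "B", the frequency spectrum, the panel and sample sizes); it is not the procedure used to design the original HGDP array. An array is built from variants seen in a small discovery panel from population A, and we then ask what share of the variants found by sequencing each population would be on that array.

```python
import random

def simulate_sites(n_sites, rng):
    """Return (freq_in_A, freq_in_B) pairs; most variants are rare, many are private."""
    sites = []
    for _ in range(n_sites):
        freq = rng.random() ** 4  # skew the spectrum toward rare variants
        origin = rng.choice(["shared", "only_A", "only_B"])
        freq_a = 0.0 if origin == "only_B" else freq
        freq_b = 0.0 if origin == "only_A" else freq
        sites.append((freq_a, freq_b))
    return sites

def seen_in_sample(freq, n_chromosomes, rng):
    """Randomly decide whether a sample of n chromosomes contains the variant."""
    return rng.random() < 1.0 - (1.0 - freq) ** n_chromosomes

def capture_rates(sites, panel_size, sample_size, rng):
    """For each population, the share of variants found by sequencing a sample
    that would also sit on an array designed from a discovery panel drawn from A."""
    found = {"A": 0, "B": 0}
    captured = {"A": 0, "B": 0}
    for freq_a, freq_b in sites:
        on_array = seen_in_sample(freq_a, panel_size, rng)
        for pop, freq in (("A", freq_a), ("B", freq_b)):
            if seen_in_sample(freq, sample_size, rng):
                found[pop] += 1
                captured[pop] += int(on_array)
    return {pop: captured[pop] / found[pop] for pop in found}

rng = random.Random(1)
rates = capture_rates(simulate_sites(5000, rng), panel_size=20, sample_size=200, rng=rng)
print(rates)
```

Under these toy assumptions the array should capture a clearly smaller share of population B's variation than of A's, because variants private to B can never be discovered in a panel drawn from A. Sequencing every sample sidesteps the problem.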

The HGDP was assembled by L. L. Cavalli-Sforza and curated from ethnographically interesting populations. Therefore, it is useful to compare it to the 1000 Genomes, which tends to focus on more conventional populations. The 1000 Genomes has 2,500 individuals, sequenced at somewhat lower coverage on average. While this project yielded 70 million polymorphisms, the 1000 Genomes Project had 85 million. Most of these are rare. The power to detect rare polymorphisms is useful in elucidating population structure because rare polymorphisms tend to be evolutionarily new, and so reflect more recent differentiation.

For example, they compared Yoruba, Mbuti, and non-Africans. Looking at common polymorphisms the Yoruba are closer to non-Africans while looking at rare ones they are closer to Mbuti. Why? The rarer polymorphisms reflect recent differentiation, and there has been recent gene flow between Mbuti and Yoruba.

On the whole, they recapitulated earlier findings, but by using more sophisticated methods that leveraged their whole-genome data they added some wrinkles. For example, some populations diverged in a very sharp and distinct fashion, such as Han and Yakuts, or Druze and Sardinians. But for the populations that diverged between 150,000 and 50,000 years ago, mostly within Africa, the separations were more gradual and probably characterized by repeated gene flow between the descendant groups (e.g., Non-Africans, Yoruba, Mbuti, San, etc.).

This reiterates that there isn’t a one-size-fits-all narrative we can use to talk about the emergence of modern populations and the way those populations are patterned. There are debates about whether we are a “clinal” species or not. I don’t think that’s a good question, because as implied in this paper a great deal of the past diversity has been collapsed through recent admixture events. The authors also detect deep and complex structure and differentiation. They’re clearly just scratching the surface.

Finally, there is more reiteration of the nature of Neanderthal and Denisovan admixture. The Neanderthals who mixed into early humans were quite homogeneous, or there were not many of them. The haplotypes are not too numerous, and they don’t exhibit the patterns you’d expect from different admixtures and source populations. The diversity is too great to have come from a single individual, but it could have come from a small number. The main caution I would suggest here is that Neanderthals seem to often be quite homogeneous on the local scale.

The Denisovans are a different story. They detect the difference between Oceanian and non-Oceanian Denisovan ancestry (the Oceanian source Denisovans were quite distinct from the Altai Denisovans). But they also detect a different Denisovan contribution to the genomes of the Cambodians. The indigenous people of the Philippines also harbor different Denisovan ancestry (not in this paper). The “Denisovans” seem to have been a cluster of different lineages that persisted in parallel for a long time.

Where is there to go next with the HGDP? At some point, better technologies will allow for a more thorough exploration of structural variation. I’ve emphasized this is an analysis of the sequence because that’s what it is. There is more information in non-sequence variation that they’ll get to one day (there was some structural analysis in this paper, but I believe that we are currently technology-limited).


The genomics of the Viking Age

A huge new preprint on Vikings (as well as the Bronze Age, Iron Age, and comparisons to moderns), Population genomics of the Viking world:

…we sequenced the genomes of 442 ancient humans from across Europe and Greenland ranging from the Bronze Age (c. 2400 BC) to the early modern period (c. 1600 CE), with particular emphasis on the Viking Age. We find that the period preceding the Viking Age was accompanied by foreign gene flow into Scandinavia from the south and east: spreading from Denmark and eastern Sweden to the rest of Scandinavia. Despite the close linguistic similarities of modern Scandinavian languages, we observe genetic structure within Scandinavia, suggesting that regional population differences were already present 1,000 years ago. We find evidence for a majority of Danish Viking presence in England, Swedish Viking presence in the Baltic, and Norwegian Viking presence in Ireland, Iceland, and Greenland. Additionally, we see substantial foreign European ancestry entering Scandinavia during the Viking Age. We also find that several of the members of the only archaeologically well-attested Viking expedition were close family members. By comparing Viking Scandinavian genomes with present-day Scandinavian genomes, we find that pigmentation-associated loci have undergone strong population differentiation during the last millennia. Finally, we are able to trace the allele frequency dynamics of positively selected loci with unprecedented detail, including the lactase persistence allele and various alleles associated with the immune response. We conclude that the Viking diaspora was characterized by substantial foreign engagement: distinct Viking populations influenced the genomic makeup of different regions of Europe, while Scandinavia also experienced increased contact with the rest of the continent.

A few notes:

– Though the broad patterns seem to have been established with the expansion between 3,000 and 2,500 BC from the Yamnaya steppe (at least in Northern Europe), some subtle details in genome-wide ancestry shifted in subsequent periods. This data set seems to show a decline in “Neolithic Farmer” and increase in hunter-gatherer and steppe ancestry after the Bronze Age, with some increase in the former by the Viking Age. This suggests that there is some sort of skew in sampling which misses populations enriched for hunter-gatherer ancestry (I suspect these groups live in the most marginal land and are the most mobile).

– There is structure by the Viking Age, which is not surprising. But the authors also report a few regions of southern Sweden where samples are enriched for Neolithic farmer ancestry down to the Viking age, suggesting that even ancient structure wasn’t well mixed (yet).

– Most of the selection for the phenotype which characterizes modern-day Northern European populations seems to have been completed over the 2,000 years between the Bronze Age and the Viking Age.


The vines around the tree trunks

A lot of the understanding of scientific theories and models in the public domain is communicated by evocative metaphors and turns of phrase. For example, Charles Darwin famously wrote:

It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us….

When it comes to understanding the origin of our own species and the broader human lineage over the past two million years, I’ve started to come to a mental model of a weighted-graph with edges. Some of the edges traverse time and have strong weights. These are analogous to the normal phylogenetic tree model, representing phyletic gradualism and anagenesis along each branch before some bifurcation event. But, some of the edges move horizontally between others. These represent migration and/or gene flow between the primary lineages.

I’m not sure though that a graph theory derived mental model helps many people, so I’ll use another one: imagine large trunks defining the primary lineages, and vines tying them together representing gene flow events. The above figure is from a new preprint, Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. This is a methods-heavy preprint. It utilizes an “ancestral recombination graph” (so a model of the genealogy of genes in the genome) and MCMC to generate Bayesian probabilities of particular events (e.g., introgression of a lineage that diverged x years ago at fraction y).
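For readers who have not met MCMC before, the following is a deliberately tiny sketch of the general idea, not of the preprint's algorithm. It uses a Metropolis sampler to get a posterior distribution for a single quantity, the fraction y of genomic segments that are introgressed, from hypothetical counts (30 of 1,000 segments, numbers invented for illustration) with a flat prior. The real method samples whole ancestral recombination graphs, which is vastly more involved.

```python
import math
import random

def log_posterior(fraction, n_introgressed, n_total):
    """Binomial log-likelihood plus a flat prior on the fraction in (0, 1)."""
    if not 0.0 < fraction < 1.0:
        return -math.inf
    return (n_introgressed * math.log(fraction)
            + (n_total - n_introgressed) * math.log(1.0 - fraction))

def metropolis(n_introgressed, n_total, steps=20000, step_size=0.01, seed=0):
    """Random-walk Metropolis sampler; returns samples after discarding burn-in."""
    rng = random.Random(seed)
    current = 0.5  # arbitrary starting guess
    current_lp = log_posterior(current, n_introgressed, n_total)
    samples = []
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, n_introgressed, n_total)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[steps // 4:]  # drop the first quarter as burn-in

samples = metropolis(n_introgressed=30, n_total=1000)
print("posterior mean fraction:", sum(samples) / len(samples))
```

In this toy case the posterior is known in closed form (a Beta distribution with mean 31/1002), so the sampler can be checked against it. That kind of check against a known answer is what is hard to do with a method as rich as the one in the preprint.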

The abstract presents some specific findings:

…While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present an extended version of the ARGweaver algorithm, ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topology and branch lengths along the genome, but also indicate migrant lineages…We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples…We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200-300kya. We find no convincing evidence that negative selection acted against these regions. We also identify 1% of the Denisovan genome which was likely introgressed from an unsequenced hominin ancestor, and note that 15% of these regions have been passed on to modern humans through subsequent gene flow.

ARGweaver-D is gnarly. Not in a bad way. But you should never really trust computational wizardry of this sort unless you’ve taken it for a test drive, or it’s been around for decades and people have validated it. A “play with the parameters” phase is necessary for these packages to become more than magic.

That being said, for about half a decade people have been detecting evidence of a “super-archaic” lineage within Denisovans. This is just another confirmation with another method. The super-archaic hypothesis seems plausible as an explanation of the patterns in the data (there may be other explanations). Second, there’s a lot of circumstantial evidence for gene flow into Neanderthals from moderns, e.g., mtDNA replacement in Neanderthals. Though not in the abstract, the preprint mentions the likelihood of “super-archaic” introgression into Neanderthals as well. From a recent ancient DNA paper, Nuclear DNA from two early Neandertals reveals 80,000 years of genetic continuity in Europe:

We find that population split times between HST and other Neandertals of less than 150 ka ago make the occurrence of a mitochondrial time to the most recent common ancestor (TMRCA) of 270 ka ago unlikely (1.2% of all simulated loci have such a deep TMRCA; note S11). We note that this result is robust to uncertainties in the estimates of the Neandertal population size and of the mitochondrial TMRCA (note S11). The presence of this deeply divergent mtDNA in HST thus suggests a more complex scenario in which HST carries some ancestry from a genetically distant population.

It seems entirely likely that we’re going to see “shadows of forgotten ancestors” in our genomes. But wait, there’s more!

…ARGweaver-D only detected a small amount of Sup→Afr introgression, which was somewhat lower than our estimated false positive rate. One aspect to note here is that the power to identify introgression from an unsequenced population is highly dependent on the population size of the recipient population. The larger the population, the deeper the coalescences are within that population, making it more difficult to discern which long branches might be explained by super-archaic introgression…If we had used a smaller population size, ARGweaver-D would have produced more Sup→Afr predictions, but most of these would be false positives unless that smaller population size is closer to the truth. Overall, we caution that the problem of detecting super-archaic introgression into a large and structured population such as Africas is very difficult and that claims of such introgression need to be robust to the demographic model used in analysis. It may not be possible to address the question of ancient introgression into Africans without directly sequencing fossils from the introgressing population.

In northern Eurasia, in particular, one might imagine a scenario with large fluctuations in population size, and patchy landscapes. This would reduce gene flow between populations, and also foster drift to produce distinct lineages. Simple stylized models of gene flow at particular times across disparate lineages make a great deal of sense in this context. But if Africa had larger populations of humans, with more interconnected networks with continuous, if variable, levels of gene flow, then the stylized models will mislead in important features.

This preprint is likely reporting some true robust results that will hold up. But I think the bigger picture is that it will lead us toward moving beyond the extremely simple models in vogue a generation ago, to a more subtle understanding of complex emergence and collapse of human population structure over the last two million years.


It’s raining selective sweeps

A week ago a very cool new preprint came out, Identifying loci under positive selection in complex population histories. It’s something that you couldn’t even have imagined just ten years ago. The authors basically figure out ways to identify deviations of markers from expected allele frequency given a null neutral evolutionary model. The method is put first, which I really like, before getting to results or discussion. Additionally, they did a lot of simulation ahead of time. This is the sort of simulation that was really not possible before we had the computational resources we have now.

Here’s the abstract:

Detailed modeling of a species’ history is of prime importance for understanding how natural selection operates over time. Most methods designed to detect positive selection along sequenced genomes, however, use simplified representations of past histories as null models of genetic drift. Here, we present the first method that can detect signatures of strong local adaptation across the genome using arbitrarily complex admixture graphs, which are typically used to describe the history of past divergence and admixture events among any number of populations. The method – called Graph-aware Retrieval of Selective Sweeps (GRoSS) – has good power to detect loci in the genome with strong evidence for past selective sweeps and can also identify which branch of the graph was most affected by the sweep. As evidence of its utility, we apply the method to bovine, codfish and human population genomic data containing multiple population panels related in complex ways. We find new candidate genes for important adaptive functions, including immunity and metabolism in under-studied human populations, as well as muscle mass, milk production and tameness in particular bovine breeds. We are also able to pinpoint the emergence of large regions of differentiation due to inversions in the history of Atlantic codfish.
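The core idea of flagging markers whose allele frequencies deviate from a neutral expectation can be sketched with a much simpler stand-in than GRoSS. The version below assumes just two populations rather than an admixture graph, and uses the genome-wide average of a scaled frequency difference as the "neutral" baseline. The function names, the threshold, and the simulated data are all hypothetical choices for illustration.

```python
import random

def scaled_difference(freq_a, freq_b):
    """Squared frequency difference, scaled by the variance expected at that frequency."""
    mean = (freq_a + freq_b) / 2.0
    if mean <= 0.0 or mean >= 1.0:
        return 0.0  # site is fixed in both populations; nothing to compare
    return (freq_a - freq_b) ** 2 / (mean * (1.0 - mean))

def scan(freqs_a, freqs_b, threshold=10.0):
    """Indices of sites whose scaled difference is far above the genome-wide average."""
    stats = [scaled_difference(a, b) for a, b in zip(freqs_a, freqs_b)]
    baseline = sum(stats) / len(stats)  # genome-wide mean stands in for neutral drift
    return [i for i, s in enumerate(stats) if s > threshold * baseline]

def simulate_neutral(n_sites, drift_sd, rng):
    """Hypothetical toy data: two populations drifting apart from a shared frequency."""
    freqs_a, freqs_b = [], []
    for _ in range(n_sites):
        p = rng.uniform(0.1, 0.9)
        sd = drift_sd * (p * (1.0 - p)) ** 0.5
        freqs_a.append(min(1.0, max(0.0, rng.gauss(p, sd))))
        freqs_b.append(min(1.0, max(0.0, rng.gauss(p, sd))))
    return freqs_a, freqs_b

rng = random.Random(0)
freqs_a, freqs_b = simulate_neutral(2000, drift_sd=0.05, rng=rng)
freqs_a[500], freqs_b[500] = 0.95, 0.10  # plant one strongly differentiated site
print(scan(freqs_a, freqs_b))
```

The planted site should stand out against the drift-only background, possibly alongside a stray neutral site or two. What this toy cannot do, and what the abstract says GRoSS adds, is to account for a complex history of splits and admixture and to say which branch of the graph the sweep happened on.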

On a related note in regard to selection, see On the well-founded enthusiasm for soft sweeps in humans: a reply to Harris, Sackman, and Jensen. The authors are responding to a recent preprint criticizing their earlier work. The reason that it’s fascinating to me is that these sorts of arguments today are really concrete and not so theoretical. There’s a lot of data for analytic techniques to chew through, and computation has really transformed the possibilities.

A generation ago these sorts of debates would be a sequence of “you’re wrong!” vs. “no, you’re wrong!” Today the disputes involve a lot of data, and so have a reasonable chance of resolution.

The first preprint identifies the usual candidates in humans that you normally see, and expected targets in cattle and cod. Sure, that will give biologists more interested in mechanisms and pathways things to chew upon, but imagine once researchers have large numbers of genomes for thousands and thousands of species. Then they’ll be testing deviations from neutral allele frequencies across many trees, and getting a more general and abstract sense of the parameter space that selection explores, conditional on particularities of evolutionary history.

This is why I’m excited about plans to sequence lots and lots of species.


The population genetic structure of China (through noninvasive prenatal testing)

This week a big whole genome analysis of China was published in Cell, Genomic Analyses from Non-invasive Prenatal Testing Reveal Genetic Associations, Patterns of Viral Infections, and Chinese Population History. The abstract:

We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.

In The New York Times write-up there is an interesting detail, “This study served as proof-of-concept, he added. His team is moving forward on evaluating prenatal testing data from more than 3.5 million Chinese people.” So what he’s saying is that this study with >100,000 individuals is a “pilot study.” Let that sink in.


Tracing the paths of Noah’s sons

The above admixture graph is from a new preprint, Paleolithic DNA from the Caucasus reveals core of West Eurasian ancestry. To be honest, if you read the supplementary text there’s almost no point in reading the main preprint, as it is far more in depth when it comes to the methodology as well as spotlighting a variety of particular results. It’s hard to know where to begin with such a preprint so I want to highlight the “this is a simplified model” portion in the figure above. That’s actually the truth. Remember, no admixture graph is the Truth, it is an attempt by humans to capture concisely and informatively the major features of our species’ population history dynamics. The reality was never as clear and distinct as stylized graphical representations would have you think, and the researchers are aware of this.

In any case, if you want to really get at how they arrived at the conclusions they did, really read the supplementary section SI 2, “An admixture graph model of Upper Paleolithic West Eurasians.” The authors have so many potential combinations of ancestral populations that they can’t simply manually and intuitively posit admixtures. Rather, they have to explore a huge number of combinations (trees/graphs)…at which point they run into computational limits. This section explicitly lays out computationally efficient ways to automatically traverse the possibility space, and arrive at the best fitting set of models, within reason.

The title of the preprint says it all, but let me quote the abstract in full:

The earliest ancient DNA data of modern humans from Europe dates to ~40 thousand years ago, but that from the Caucasus and the Near East to only ~14 thousand years ago, from populations who lived long after the Last Glacial Maximum (LGM) ~26.5-19 thousand years ago. To address this imbalance and to better understand the relationship of Europeans and Near Easterners, we report genome-wide data from two ~26 thousand year old individuals from Dzudzuana Cave in Georgia in the Caucasus from around the beginning of the LGM. Surprisingly, the Dzudzuana population was more closely related to early agriculturalists from western Anatolia ~8 thousand years ago than to the hunter-gatherers of the Caucasus from the same region of western Georgia of ~13-10 thousand years ago. Most of the Dzudzuana population’s ancestry was deeply related to the post-glacial western European hunter-gatherers of the ‘Villabruna cluster’, but it also had ancestry from a lineage that had separated from the great majority of non-African populations before they separated from each other, proving that such ‘Basal Eurasians’ were present in West Eurasia twice as early as previously recorded. We document major population turnover in the Near East after the time of Dzudzuana, showing that the highly differentiated Holocene populations of the region were formed by ‘Ancient North Eurasian’ admixture into the Caucasus and Iran and North African admixture into the Natufians of the Levant. We finally show that the Dzudzuana population contributed the majority of the ancestry of post-Ice Age people in the Near East, North Africa, and even parts of Europe, thereby becoming the largest single contributor of ancestry of all present-day West Eurasians.

Ancestry from Dzudzuana

Longtime readers know that I hate the American racial term “Caucasians.” It’s pretentious when you could just say “white European,” because that’s what people really mean, judging by the fact that the real people from the Caucasus are marginally Caucasian in the eyes of many Americans. The genealogical origin of the term goes back to Johann Friedrich Blumenbach. And yet this paper takes these two samples, and finds that a lot of the ancestry of modern groups can be attributed to them! (Also, a religious interpretation of the results is in the title of the post.)

To be fair, they caution that these ancient Caucasian samples are representative of a particular thread of human heritage, not that the center of this thread was necessarily in the Caucasus. This does make me wonder about ascertainment bias in the Near East toward samples from mountainous areas which were colder. But, at the granularity they are attempting to understand human population history, it’s probably not that big of a deal. Ultimately, they conclude that this Paleo-Caucasian population contributes “~46-88% of the ancestry” of modern Europeans, Near Easterners, and North Africans. That’s kind of a big deal.

There are so many results in this preprint, so I think we need to go back to the “beginning” of the non-African branch. The Paleo-Caucasian sample is of note in part because it is from before the Last Glacial Maximum, and about halfway back to the massive diversification of most non-African populations around 55,000 years ago. Using the Paleo-Caucasian samples’ affinities this preprint reinterprets results from last spring on ancient DNA from Northwest Africa. In that paper, the authors conclude that Paleolithic North Africans were a mix between an unspecific Sub-Saharan population and Natufians. Here though the authors suggest that the Natufians and Yoruba both received gene flow from Paleolithic North Africans. And, these Paleolithic North Africans were themselves mixed between something similar to the Paleo-Caucasians (a mix between an ancient West Eurasian ancestry and “Basal Eurasian”), and a “Deep” ancestry which diverged from other non-Sub-Saharan Africans before the Basal Eurasians did.

The reason that the Paleo-Caucasian sample is so important is that it allowed the researchers to see that the early Holocene Near East, where Anatolian and Iranian farmers, as well as Natufians in the Levant, were ancestral to many later groups, was subject to many genetic changes from before the Last Glacial Maximum. The Natufians seem to be well modeled as having ancestry from the Paleolithic North Africans as one of the major ways they are distinctive from the Paleo-Caucasians. This presents us with a reasonable model for the west to east movement of haplogroup E, and the Afro-Asiatic languages. The gene flow from Paleolithic North Africans also explains the non-trivial level of Neanderthal admixture which is found in the Yoruba population. This is mediated through the presumed back migration of Paleo-Caucasians from the Near East at some point in the Pleistocene, contributing some Neanderthal ancestry to the genetic background of Paleolithic North Africans.

Additionally, the distinction between western (Anatolian/Levant) and eastern (Iran) farmers during the early Holocene can now be understood as a product of later admixture into eastern proto-farmers of basic Paleo-Caucasian stock. The relative closeness of Anatolian farmers to the Paleo-Caucasian samples is indicative of the fact that there was an “Ancient North Eurasian” (ANE) admixture cline into the Near East during the Pleistocene, which meant that some populations to the east became rather different from the pre-LGM samples. Probably after the Last Glacial Maximum proto-Siberian ancestry became prominent in the zone between the Caucasus and Iran (additionally, some of the models imply there was eastern Eurasian ancestry). This is in keeping with the fact that ANE ancestry does seem to have been found in places like Khorasan before the expansion south of steppe populations after 2,000 BC.

As noted in the abstract, Paleo-Caucasians had Basal Eurasian ancestry ~26,000 years ago. This increases the likelihood that Basal Eurasians weren’t recent migrants from deep inside Africa. Additionally, for various reasons, the authors are now positing a Deep ancestry which diverged even further into the past. Both Basal Eurasians and Deep populations seem to lack Neanderthal admixture. The authors also repeatedly suggest that Basal Eurasians were part of the Out of Africa bottleneck event. In Who We Are and How We Got Here David Reich presents the model that this bottleneck population had a low effective population size for a long time. This seems plausible because the genetic homogeneity that you see in non-Africans is pretty striking vis-a-vis Sub-Saharan Africans. On the other hand, this work confirms earlier results that imply that Basal Eurasians did not admix with Neanderthals, and also indicates that the divergence has to be greater than 60,000 years before the present from other non-Africans, who diversified more recently.

In contrast, the Deep ancestry group, which nevertheless forms a clade with the new Eurasian lineages (Basal and non-Basal), does not clearly seem to have undergone the bottleneck event according to this preprint. It’s more a matter of what they don’t say, rather than what they say in this case.

The big picture needs to be integrated, I think, with the new model in which “modern humans emerged through a multi-regional process” within Africa. If you think of modern humans as emerging across an African range which shifted into the Near East based on oscillating climatic conditions, the ancestors of the “non-African” lineages can be thought of as one of the main deeply rooted lineages, probably in the northeast of the continent. During the Pleistocene, the Sahara was even more brutal than today during many periods, so it is not implausible that some of these marginal populations on the edge of Africa were subject to long periods of very small effective population sizes. Most of them presumably went extinct. But one population was probably far enough north and east that it had a little more margin to play with. This population was probably connected along the Mediterranean littoral at some point with the Deep component in North Africa, which had higher effective population sizes because the mountainous terrain of the Atlas region was always going to remain more clement through dry phases.

At some point a group of the bottlenecked population mixed with some Neanderthals, and began to break out of containment in southwest Asia. If I had to bet money, I suspect there were already other related groups, probably somewhat admixed with local hominin lineages, further east. That is, I believe the archaeological results in Southeast Asia, and think that those in Australia are credible. But these groups were probably small in number, and totally absorbed by the later migration wave.

Also, the timing of the separation of Africans and “non-Africans” is such that I wouldn’t be surprised if the Qafzeh-Skhul people were somehow ancestral to, or closely related to, the ancestors of non-Africans.

Finally, let’s remember that the authors were focusing on North Africa and Western Eurasia in this preprint. Things will get more complicated as East Asia and Africa come “online” in terms of these analyses. Of course, we are going to be helped by the reality that human genetic variation is not arbitrarily and randomly distributed, but reflects real constraints in our evolutionary history and the forces of geography as well as contingency. The non-African story is made simpler in part because of the great bottleneck, and especially the common descent of most peoples from the population that mixed with Neanderthals. The modeling of effective population size changes over time in Sub-Saharan groups does not lead us to believe that it will be so simple in that continent.

Related papers: The genetic history of Ice Age Europe, Tales of Human Migration, Admixture, and Selection in Africa, and Genomic insights into the origin of farming in the ancient Near East.


The great human migrations (coming in waves)

The figure above is from a new paper, Estimating mobility using sparse data: Application to human genetic variation, which uses genomic data from late Pleistocene to the Iron Age in western Eurasia, and then infers migration rate considering both spatial distribution and the variable of time (remember that samples apart in time should also be genetically different, just as those apart in space often are).

The empirical results are shown above, but they validated their method first by running some simulations. Interestingly they modeled the migration as a Gaussian random walk. Which is fine. But I wonder how true this is for a lot of the Eurasian migrations of the last 10,000 years. Perhaps the distribution of distances from the place of birth would turn out to be multi-modal, with a minority of individuals tending to make “long jumps”?
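A quick way to see why the distinction might matter is to compare the two kinds of movement directly. The sketch below is my own toy comparison, not the paper's model: one function draws a move from a two-dimensional Gaussian, the other lets a small minority (an assumed 5%) jump on a scale an assumed 20 times larger. It then counts how many people end up more than five "typical" step lengths from where they were born.

```python
import math
import random

def gaussian_move(rng, sigma=1.0):
    """One move under the Gaussian random-walk assumption."""
    return rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)

def mixed_move(rng, sigma=1.0, jump_prob=0.05, jump_scale=20.0):
    """Mostly local moves, with a minority of much longer jumps (assumed values)."""
    scale = sigma * jump_scale if rng.random() < jump_prob else sigma
    return rng.gauss(0.0, scale), rng.gauss(0.0, scale)

def long_move_fraction(move, n_people, cutoff, rng):
    """Share of people who end up farther than `cutoff` from their birthplace."""
    far = 0
    for _ in range(n_people):
        dx, dy = move(rng)
        if math.hypot(dx, dy) > cutoff:
            far += 1
    return far / n_people

rng = random.Random(42)
print("gaussian:", long_move_fraction(gaussian_move, 20000, cutoff=5.0, rng=rng))
print("mixed:   ", long_move_fraction(mixed_move, 20000, cutoff=5.0, rng=rng))
```

Under the pure Gaussian, moves beyond five step lengths are essentially absent, while under the mixture a few percent of people make them. If real migrations looked more like the second case, a Gaussian model fitted to the same data would have to absorb those jumps into a single inflated mobility parameter.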

With that out of the way, it’s fascinating that migration peaks around the Neolithic transition, the Bronze Age, and then the Iron Age. If you read a book like 1177 BC, you know that there was a major regression in the 12th century BC across the Near East, and for several centuries the region was in a “Dark Age.” In The Human Web William H. McNeill argues that one of the reasons for the length and depth of this Dark Age is that the network of complex societies exhibited less density and so less redundancy against failure.

The authors conclude:

We find that mobility among European Holocene farmers was significantly higher than among European hunter–gatherers both pre- and postdating the Last Glacial Maximum. We also infer that this Holocene rise in mobility occurred in at least three distinct stages: the first centering on the well-known population expansion at the beginning of the Neolithic, and the second and third centering on the beginning of the Bronze Age and the late Iron Age, respectively. These findings suggest a strong link between technological change and human mobility in Holocene Western Eurasia and demonstrate the utility of this framework for exploring changes in mobility through space and time.

Earlier they say:

We find strong support for a rise in mobility during the Neolithic transition in western Eurasia, likely corresponding to a well-established demic expansion of farmers, originating in the Middle East and resulting in the spread of farming technologies throughout most of Western Eurasia

One of the main findings of Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past is that oftentimes change is not gradual. Consider the transition to the Corded-Ware society in Northern Europe.

The “demic diffusion” model is an easy one because it relies on the mass-action of individuals and family-groups as they expand in space through high fertility rates. And yet one thing that I think it misses is the socio-political context of that demic diffusion. For prehistoric periods we don’t have writing, and so no socio-political context. This is why in War Before Civilization the author focused on ethnographies of historical societies which came into contact with literate cultures which recorded their organization and folkways. The short summation is that these societies were often very aggressive and well organized for war. Additionally, hunter-gatherers themselves were keen on expanding farmers, and it seems clear they too could mobilize for violence.

The upshot is we need to think of the rise and expansion of strong states and expansionist polities as the context for an increase in the rate of migration. The reality of low migration rates in Pleistocene Europe was pretty evident even before this formal analysis. The large pairwise genetic differences between some nearby populations in the Pleistocene and early Holocene, which reflect drift and therefore low migration rates, indicate that small-scale societies tend to be quite insulated from each other. In contrast, the Iron Age witnessed a great deal of admixture, as large states and polities, as well as meta-ethnic identities, broke down genetic barriers.
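The link between drift-driven differentiation and low migration can be illustrated with Wright's textbook island-model approximation, FST ≈ 1 / (1 + 4Nm), where N is the effective population size and m is the fraction of each population replaced by migrants per generation. This is not the method used in the paper, and it assumes an idealized equilibrium among many equally sized populations. The function names and the numbers plugged in below are mine, chosen only to show the shape of the relationship.

```python
def expected_fst(effective_size, migration_rate):
    """Wright's island-model approximation: FST ~ 1 / (1 + 4 * N * m)."""
    return 1.0 / (1.0 + 4.0 * effective_size * migration_rate)


def migrants_per_generation(fst):
    """Invert the same approximation to get N * m for a given FST."""
    return (1.0 / fst - 1.0) / 4.0


# Illustrative values only: the same small population under falling migration.
for m in (0.01, 0.001, 0.0001):
    print("m =", m, "expected FST =", round(expected_fst(1000, m), 3))
```

Under this idealization a single migrant per generation (Nm = 1) already holds FST down to 0.2, so strongly differentiated neighbors imply that even that trickle was not happening.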

A regression around 1000 BC correlates neatly with reduced migration. This was almost certainly because, without larger states, much of West Eurasian society, such as in Greece, had disintegrated into smaller tribal units.

Future historians and geneticists will notice that in the period between 1500 and 2000 the distribution of the Y chromosome lineage R1b1a1a2 expanded far beyond Western Europe. They will also understand the political context for this expansion of the lineage…


Beyond “Out of Africa” within Africa

It looks as if the vast majority (95% or more depending on the population) of the ancestry of non-African humans derives from a population expansion which began ~60,000 years ago. Some researchers argue that before this there was a non-trivial period of isolation, the “long bottleneck” (David Reich alludes to this in Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past). For the vast majority of humans, then, the last 60,000 years is characterized by a branching process, some reticulation (e.g., South Asians merge West and East Eurasian lineages) between these branches from a common ancestor, as well as introgression from archaic lineages like Neanderthals and Denisovans.

Though I do accept that it seems that modern humans probably migrated out of Africa before 60,000 years ago, mostly due to the results from archaeology, I think the genetic evidence is strong that these groups contributed very little genetically to contemporary populations.

The situation within Africa is very different. Being conservative, it seems likely that the Khoisan ancestral lineage diverged from some other Africans ~200,000 years ago. I say conservative because there are researchers who want to push the divergence much further back. Additionally, several different research groups are now converging on a result that West Africans are a mixture between eastern Sub-Saharan Africans (think the population ancestral to Mota in Ethiopia) and a lineage basal to all other humans. That means that the Khoisan are not the most basal, so even assuming the conservative 200,000 year divergence point for Khoisan, modern humans share a common ancestor earlier than 200,000 years ago.

The upshot here is that around 75 percent of the history of modern humans is within (greater)* Africa. The distinctive “Out of Africa” bottleneck and expansion defines most humans only in the last 25 percent of the history of our species. And, within Africa, the dynamics were very different. The biggest difference is that African populations are not defined by a large number of lineages emerging and diverging around the same period, because there wasn’t a massive and singular expansion within Africa analogous to what occurred outside of Africa (at least until the recent past, with the Bantu expansion). That’s why there’s deep structure within Africa today between groups as divergent as the Bantu, Mbuti, Hadza, and Khoisan.

The term “Basal Eurasian” kind of makes sense in the non-African context because of the singular importance of divergence between lineages in the first 10,000 years or so after the “Out of Africa” event. I’m not sure “Basal human” makes as much sense because there wasn’t a singular event within Africa that allowed for the emergence of modern humans. Rather, it was a process, and probably quite resembles something like multiregionalism.

* Some wiggle room here for the likelihood that modern humans were long present in the liminal Near East.


Rainforest hunter-gatherers are not primitive or primal

Recently I told a friend that I suspect the “tropical pygmy” phenotype you see in Central Africa and Southeast Asia is a pretty recent development. So this sort of assertion, “The Sentinelese tribe have remained on their North Sentinel Island, almost completely uncontacted for nearly 60,000 years…” is probably wrong. First, the Sentinelese probably arrived with other Andaman peoples during the Pleistocene from mainland Southeast Asia when the archipelago may have been connected to the mainland due to low sea levels.

Second, the small size of many tropical hunter-gatherer populations may simply be due to the difficulty of surviving in this environment. Though rainforests are lush, humans can’t access a lot of it, and small animals tend to require more energy to catch than is justified by how much meat they provide.

Genomics is now on the case: Polygenic adaptation and convergent evolution across both growth and cardiac genetic pathways in African and Asian rainforest hunter-gatherers:

Different human populations facing similar environmental challenges have sometimes evolved convergent biological adaptations, for example hypoxia resistance at high altitudes and depigmented skin in northern latitudes on separate continents. The pygmy phenotype (small adult body size), a characteristic of hunter-gatherer populations inhabiting both African and Asian tropical rainforests, is often highlighted as another case of convergent adaptation in humans. However, the degree to which phenotypic convergence in this polygenic trait is due to convergent vs. population-specific genetic changes is unknown. To address this question, we analyzed high-coverage sequence data from the protein-coding portion of the genomes (exomes) of two pairs of populations, Batwa rainforest hunter-gatherers and neighboring Bakiga agriculturalists from Uganda, and Andamanese rainforest hunter-gatherers (Jarawa and Onge) and Brahmin agriculturalists from India. We observed signatures of convergent positive selection between the Batwa and Andamanese rainforest hunter-gatherers across the set of genes with annotated ‘growth factor binding’ functions (p<0.001). Unexpectedly, for the rainforest groups we also observed convergent and population-specific signatures of positive selection in pathways related to cardiac development (e.g. 'cardiac muscle tissue development'; p=0.003). We hypothesize that the growth hormone sub-responsiveness likely underlying the pygmy phenotype may have led to compensatory changes in cardiac pathways, in which this hormone also plays an essential role. Importantly, we did not observe similar patterns of positive selection on sets of genes associated with either growth or cardiac development in the agriculturalist populations, indicating that our results most likely reflect a history of convergent adaptation to the similar ecology of rainforest hunter-gatherers rather than a more common or general evolutionary pattern for human populations.

A minor note: there is some ethnographic data that the isolated Sentinelese are not as small as the other Andaman Islanders. Some of the small size of those other islanders may simply be due to exposure to diseases and the stress brought by settlers from the mainland.


The genome of “Cheddar Man” is about to be published

If you are American you have probably heard about “Cheddar Man” in Bryan Sykes’ Seven Daughters of Eve. If you don’t know, Cheddar Man is a Mesolithic individual from prehistoric Britain, dating to 9,150 years before the present. Sykes’ DNA analysis concluded that he was mtDNA haplogroup U5, which is found in ~10% of modern Europeans, and which ancient DNA has found to be overwhelmingly dominant among European hunter-gatherers. But for years there has been controversy as to whether this result was contamination (after all, if it’s found in ~10% of modern Europeans it wouldn’t be surprising if the DNA was contaminated).

Today that is a moot point. On February 18th Channel 4 in the UK will premiere a documentary that seems to indicate genomic analysis of Cheddar Man’s remains has been performed, and he turns out to be exactly what we would have expected. That is, he’s a “Western Hunter-Gatherer” (WHG) with affinities to the remains from Belgium, Spain, and Central Europe. These WHG populations were themselves relatively recent arrivals in Pleistocene Europe, with connections to some populations in the Near East, and with unexplored minor genetic admixture from an East Asian population. Their total contribution to the ancestry of modern Europeans varies, with lower fractions in the south of the continent, and the highest in the northeast.

Overall, the consensus seems to be that in Western Europe the genuine descent from indigenous hunter-gatherers, passed down through admixture with Neolithic farmers and then the Corded Ware and Bell Beaker groups, is ~10%. This is the number that shows up in the press write-ups. But there are some researchers who contend it is far less than 10%, and that the fraction is a misattribution due to early admixture with relatives of these hunter-gatherers as steppe and farmer peoples were expanding.

Phylogenetics aside, one of the major headline aspects of the Cheddar Man is that reconstructions are now of a very dark-skinned and blue-eyed individual. Some of the more sensationalist press is declaring that the “first Britons were black!” As far as the depiction goes, this is literally true. The reconstruction is of a black-skinned individual in the sense we’d describe black-skinned.

But on one level it is entirely expected that this is what Cheddar Man would look like. The hunter-gatherers of Mesolithic Western Europe were genetically homogeneous. They seem to derive from a small founder population. And, on the pigmentation loci which make modern Europeans very distinctive vis-a-vis other populations, SLC24A5, SLC45A2 and HERC2-OCA2, they were quite different from anything we’ve encountered before. First, these peoples seem to have had a frequency for the genetic variants strongly implicated in blue eyes in modern Europeans close to what you find in the Baltic region. The overwhelming majority carried the derived variant, perhaps even in regions such as Spain, which today are mostly brown-eyed because of the frequency of the ancestral variant. Second, these European hunter-gatherers tended to lack the genetic variants at SLC24A5 and SLC45A2 correlated with lighter skin, which today in Europeans are found at frequencies of ~100% and 80% to 95% respectively.

The reason that one of the scientists being interviewed stated that there was a “76 percent probability that Cheddar Man had blue eyes” is that they used something like IrisPlex. They put in the genetic variants and out popped a probability. The problem is that the training set here is modern groups, which may have a very different genetic architecture than ancient populations. Recent work on Africans and East Asians indicates that the focus on European populations when it comes to pigmentation genetics has left huge lacunae in our understanding of common variants which affect variation in outcome.

East Asians, for example, lack both the derived variants of SLC24A5 and SLC45A2 common in Europeans but are often quite light-skinned. A deeper analysis of the pigmentation architecture of WHG might lead us to conclude that they were an olive or light brown-skinned people. This is my suspicion because modern Arctic peoples are neither pale white nor dark brown, but of various shades of olive.

As far as blue eyes go, it is reasonable that these individuals had that eye color because that trait seems somewhat less polygenic than skin color. There are darker complected people with light eyes, from the famous “Afghan girl” to the first black Miss America, Vanessa Williams. The homozygote of the derived HERC2-OCA2 variant seems relatively penetrant. From what I recall the literature indicates many people with blue eyes are not homozygotes on this locus for the derived haplotypes, but those who are homozygotes for the derived haplotypes invariably have blue eyes.

Addendum: It isn’t clear in the press pieces, but it looks like they got a high coverage genome sequence out of Cheddar Man. They refer to sequencing, and, they seem to have hit all the major pigmentation loci. This indicates reasonable coverage of the genome.