Since people asking me about this, and I’m running the South Asian Genotype Project, I thought I would post two non-PCA visualizations of how various South Asian groups relate to each other (along with a few outgroups).
The radial plot above is a neighbor-joining tree visualized from pairwise Fst statistics (basically a proxy for genetic distance).
I also used Treemix to generate a plot. You see the similar patterns as the one above, though the two methods are different. Treemix tests a bunch of models and sees how the data fit those models. The visualization of Fst is just a way of representing the summary statistic.
I added 5 migration edges to the plot to the right. Not sure if they add anything, but you can see that some of the nodes move around because they are so mixed.
The key issue here is that they found 11,500 year old remains from Alaska, one of which they sequenced at 17x coverage, which is rather good (not medical grade good, but really sufficient for a lot of population genomic work). It’s clear that the lineage represented by these remains is “basal” to that of other Native American peoples, whom David Reich’s group labeled “First Americans.” Later, the First Americans diverged into different populations, with the two in modern focus being a northern cluster, including the Aboriginal peoples of Canada and parts of the United States, and a southern one including everyone else. This does not mean that the Beringians were isolated outliers. There may have been many other peoples related to the Beringians who diversified, who went extinct as well. The settlement of Alaska by other peoples suggests to me that extreme conditions in the Arctic made it likely that there would be population turnover there. Also, the fact that these samples were located close to the source of settlement in the New World by modern humans makes their distant relation to all other New World populations unsurprising.
The big thing that the press is highlighting is the confirmation of the Beringian Standstill model, where modern humans percolated into the area between Siberia and Alaska, Beringia, and did not move east for thousands of years. Basically, the conditions were inclement toward human habitation on both sides of Beringia, while a relict modern human group likely occupied a pocket of more moderate climes for thousands of years, with minimal gene flow from the west, and blocked from migration to the east. Genetically the Beringia Standstill made sense for a long time…the divergence between Amerindian lineages and those of eastern Eurasia seemed too old to be accounted for by recent migration a bit more than 10,000 years ago (the old “Clovis first” hypothesis).
How old? This paper suggests that the portion of Native American ancestry which indicates an affinity to East Asians stopped exhibiting gene flow from that source around ~25,000 years ago, after diverging around ~36,000 years ago. This points to the fact that after modern humans came to dominate eastern Eurasia they began to diversify rapidly after 40,000 years ago, but gene flow between different populations did not always allow them to drift apart…at least initially. The ancestors of Native Americans and East Asians may have been in extremely separate locations by ~25,000 years ago, whether it be on the fringes of eastern Siberia, or somewhere in southern China (there is no reason that the modern Chinese have to have had ancestors resident on the North China plain before the Last Glacial Maximum).*
One aspect here I want to emphasize is that our image of a world thickly populated with humans may mislead us in our intuition about how patchy occupation was ~25,000 years ago. Yes, humans may have left artifacts all over the world, but that doesn’t mean that there weren’t centuries or millennia of no occupation, or, that meta-population dynamics were such that extinctions were common. For decades in population genetics there has been talk of “clines vs. clusters,” but if human population densities were far lower, or occupation patchier, then clines may have become much more important recently with high density than in the past.
Finally, back to the Australo-Melanesian issue. Either there is a lot of population structure in ancient Beringia to be explored, with diverse quasi-Asiatic groups, or there was an Australo-Melanesian group already in South America.
* Ancient North Eurasian ancestry came into Beringians ~20,000 years ago. Two groups which merged during the middle of the Last Glacial Maximum.
Modern molecular genetic datasets, primarily collected to study the biology of human health and disease, can be used to directly measure the action of natural selection and reveal important features of contemporary human evolution. Here we leverage the UK Biobank data to test for the presence of linear and nonlinear natural selection in a contemporary population of the United Kingdom. We obtain phenotypic and genetic evidence consistent with the action of linear/directional selection. Phenotypic evidence suggests that stabilizing selection, which acts to reduce variance in the population without necessarily modifying the population mean, is widespread and relatively weak in comparison with estimates from other species.
The stabilizing selection part is probably the most interesting part for me. But let’s hold up for a moment, and review some of the major findings. The authors focused on ~375,000 samples which matched their criteria (white British individuals old enough that they are well past their reproductive peak), and the genotyping platforms had 500,000 markers. The dependent variable they’re looking at is reproductive fitness. In this case specifically, “rRLS”, or relative reproductive lifetime success.
With these huge data sets and the large number of measured phenotypes they first used the classical Lande and Arnold method to detect selection gradients, which leveraged regression to measure directional and stabilizing dynamics. Basically, how does change in the phenotype impact reproductive fitness? So, it is notable that shorter women have higher reproductive fitness than taller women (shorter than the median). This seems like a robust result. We’ve seen it before on much smaller sample sizes.
The results using phenotypic correlations for direction (β) and stabilizing (γ) selection are shown below separated by sex. The abbreviations are the same as above.
There are many cases where directional selection seems to operate in females, but not in males. But they note that that is often due to near zero non-significant results in males, not because there were opposing directions in selection. Height was the exception, with regression coefficients in opposite directions. For stabilizing selection there was no antagonistic trait.
A major finding was that compared to other organisms stabilizing selection was very weak in humans. There’s just not that that much pressure against extreme phenotypes. This isn’t entirely surprising. First, you have the issue of the weirdness of a lot of studies in animal models, with inbred lines, or wild populations selected for their salience. Second, prior theory suggests that a trait with lots of heritable quantitative variation, like height, shouldn’t be subject to that much selection. If it had, the genetic variation which was the raw material of the trait’s distribution wouldn’t be there.
Using more complex regression methods that take into account confounds, they pruned the list of significant hits. But, it is important to note that even at ~375,000, this sample size might be underpowered to detect really subtle dynamics. Additionally, the beauty of this study is that it added modern genomic analysis to the mix. Detecting selection through phenotypic analysis goes back decades, but interrogating the genetic basis of complex traits and their evolutionary dynamics is new.
To a first approximation, the results were broadly consonant across the two methods. But, there are interesting details where they differ. There is selection on height in females, but not in males. This implies that though empirically you see taller males with higher rLSR, the genetic variance that is affecting height isn’t correlated with rLSR, so selection isn’t occurring in this sex.
~375,000 may seem like a lot, but from talking to people who work in polygenic selection there is still statistical power to be gained by going into the millions (perhaps tens of millions?). These sorts of results are very preliminary but show the power of synthesizing classical quantitative genetic models and ways of thinking with modern genomics. And, it does have me wondering about how these methods will align with the sort of stuff I wrote about last year which detects recent selection on time depths of a few thousand years. The SDS method, for example, seems to be detecting selection for increasing height the world over…which I wonder is some artifact, because there’s a robust pattern of shorter women having higher fertility in studies going back decades.
The above map is from a new preprint on the patterns of genetic variation as a function of geography for humans, Genetic landscapes reveal how human genetic diversity aligns with geography. The authors assemble an incredibly large dataset to generate these figures. The orange zones are “troughs” of gene flow. Basically barriers to gene flow. It is no great surprise that so many of the barriers correlate with rivers, mountains, and deserts. But the aim of this sort of work seems to be to make precise and quantitative intuitions which are normally expressed verbally.
To me, it is curious how the borders of the Peoples’ Republic of China is evident on this map (an artifact of sampling?). Additionally, one can see Weber’s line in Indonesia. There are the usual important caveats of sampling, and caution about interpreting present variation and dynamics back to the past. But I believe that these sorts of models and visualizations are important nulls against which we can judge perturbations.
As I said, these methods can confirm rigorously what is already clear intuitively. For example:
Several large-scale corridors are inferred that represent long-range genetic similarity, for example: India is connected by two corridors to Europe (a southern one through Anatolia and Persia ‘SC’, and
a northern one through the Eurasian Steppe ‘NC’)
We still don’t have enough ancient DNA to be totally sure, but it’s hard to ignore the likelihood that “Ancestral North Indians” (AN) actually represent two different migrations.
India also illustrates contingency of these barriers. Before the ANI migration, driven by the rise in agricultural lifestyles, there would likely have been a major trough of gene flow on India’s western border. In fact a deeper one than the one on the eastern border. And if the high genetic structure statistics from ancient DNA are further confirmed then the rate of gene flow was possibly much lower between demes in the past. Perhaps that would simply re-standardize equally so that the map itself would not be changed, but I suspect that we’d see many more “troughs” during the Pleistocene and early Holocene.
Because there are so many geographically distributed samples for humans, and frankly some of the best methods developers work with human data (thank you NIH), it is no surprise that our species would be mapped first. But I think some of the biggest insights may be with understanding the dynamics of gene flow of non-human species, and perhaps the nature and origin of speciation as it relates to isolation (or lack thereof).
In L. L. Cavalli-Sforza’s The History and Geography of Human Genes he used between population group genetic distances, as measured in FST values, to generate a series of visualizations, which then allowed him to infer historical processes. Basically the way it works is that you look at genetic variation, and see how much of it can be allocated to between groups. If none of it can be allocated to between groups, then in a population genetic sense it doesn’t make much sense to speak of distinctive groups, they’re basically one breeding population. The higher the FST statistic is, the more of the variation is partitioned between the groups.
Roughly this is used to correlate with genetic distance as well as evolutionary divergence. The longer two populations have been separated, the more and more genetic differences they’ll accumulate, inflating the FST value. There are a lot of subtleties that I’m eliding here (see Estimating and interpreting FST: the impact of rare variants for a survey of the recent literature on the topic and pathways forward), but for a long time, FST was the go-to statistic for making phylogenetic inferences on a within-species scale.
Today we have other techniques, Structure, Treemix, fineStructure, and various local ancestry packages.
But FST is still useful to give one a Gestalt sense of population genetic differences. Cavalli-Sforza admits in The History and Geography of Human Genes that European populations had very low pairwise FST, but because of the importance of Europe for sociocultural reasons a detailed analysis of the region was still provided in the text. Additionally, they had lots of European samples (non-European Caucasoids were thrown into one category for macro-group comparisons because there wasn’t that many samples).
Ancient populations were very distinct in Europe from modern ones.
Many modern groups are clustered close together.
The bulk of the population genetic structure in modern Europe seems to have been established in the period between 3000 BCE and 2000 BCE. This is not that much time for a lot of distinctiveness to develop, especially on the geographically open North European plain. I suspect with more and more Mesolithic and early to middle Neolithic DNA we’ll see that some of the modern population structure is a ghost of ancient substrate absorption.
Many of the ethno-national categories that are very significant in recent history, and impact the cultural memories of modern people and their genealogies, have very shallow roots. This does not mean they are not “real” (I don’t know what that’s supposed to mean at all), just that many of the identities which seem so salient to us today may be relatively recent in terms of their significance to large groups of humans….
One reason I quite like Norman Davies’ book The Isles is that it is a history of Britain and Ireland which explicitly aims to not privilege the story of the English inordinately. As the most powerful and numerous people of the British Isles the English loom large, but in the period between Gildas and Bede things were very different. In the early 600s the Welsh king Cadwallon ap Cadfan conquered and held Northumbria for a period, northern England from the Irish Sea to the North Sea. But this was the last time that a Celtic monarch held land in eastern England, unless you count the Tudors.
In The Isles, written at the turn of the century, Davies promotes the view dominant among historians at that time that the transition from British Celtic to Anglo-Saxon occurred through diffusion of elite culture. He alludes to the fact that in the year 700 the law code of Wessex alludes explicitly to the fact the weregild paid for the death of a Saxon was many-fold greater than that paid for a Briton (of the same class status). This suggests that many Britons were still resident in the Anglo-Saxon kingdoms. The contrasting view, which was dominant in the early 20th century, was that the English replaced the Celts in toto. The Irish, Welsh, and to some extent the Scots, were viewed as racially distinct from the Germanic English.
2015’s The fine scale genetic structure of the British population answered many of these questions. It turns out the maximal positions were incorrect. The authors estimate that 10-40% of the ancestry in eastern and southern England (the red positions on the map) derive from Germanic peoples which we might term Saxon, Angles, and Jutes. Even if the fraction is as low as 10% that is not trivial. If we take a value closer to ~25%, unless there were massive reproductive advantages for elites, it could not have just been diffusion from the elite. Archaeologists also see wholesale changes in agricultural patterns in eastern England, indicative of a transfer of a whole folkway.
All that being said it is likely that the majority of the ancestry of the population of England proper descends from Britons. In fact, once the Anglo-Saxon cultural hegemony was established it seems that some elite Britons may also have changed their identity. It is always a curious fact that the names of the first kings in the genealogy of the House of Wessex are distinctively Celtic. Just as Romano-Gallic aristocrats began aping the styles and mores of the Frankish elite in the 6th century, so perhaps some British warlords became Saxons.
Using similar methods many of the same authors have now put out a preprint on Ireland, Insular Celtic population structure and genomic footprints of migration. Unlike the earlier work on Britain, they’ve acknowledged the ancient DNA results which have reshaped our understanding of population turnover in Ireland. That being said, they are focused on more recent events, as well as spatial structure in the modern era.
Though they don’t have access to as detailed a regional data set as in the earlier work on Britain, in this case, the authors managed to detect a lot of regional population structure within Ireland. Why? Though the Irish are relatively homogeneous, as all Northern Europeans are, looking at long tracts of the genome and the patterns therein can squeeze out more information.
The figure at the top of this post shows how well they can cluster individuals geographically: they’ve basically recapitulated the “map of the British Isles.” There aren’t too many surprises. Western Ireland seems to exhibit greater genetic differences as a function of distance. Probably because it’s less developed, and perhaps because it has been less impacted by outsiders. Ulster and southern Scotland are strongly connected genetically. There are two issues going on here. First, the famous migration of Protestants into this region of Ireland from Scotland and northern England that occurred after the conquest of the 16th century. And second, the earlier migration of Irish to Scotland, which resulted in the creation of the Dal Riata kingdom.
Additionally, the authors detect more admixture in several parts of Ireland from Norse than they had anticipated. The mixing of Scandinavians and Irish created a hybrid culture, the Norse-Gaels, which was highly influential around the Irish Sea. So it would not be exactly surprising if there was a greater Scandinavian contribution to Irish ancestry than had been anticipated.
Of greater interest to me is the impact of social-political institutions on the genetic structure or lack thereof. Both Britain and Ireland have homogenized modal clusters. In Britain, this is associated with the expanding cultural zone of Anglo-Saxon rule, and later became the core of England. In Ireland, it seems to be the Pale, where Anglo-Norman rule was dominant for many centuries. Rapid cultural change seems to induce a state of panmixia. Genetic distinctiveness in the British Isles seems to have persisted in populations which were geographically isolated, or politically insulated, from expansive, assimilative, and integrative cultures. The modal cluster in Ireland is far smaller than in England, which nicely correlates with the much more limited impact of the Anglo-Norman ascendency of the medieval period.
When I was a kid “killer bees” were a major pop culture thing. There were movies about the bees, and we would get updates about their march northward in the news. They were a cautionary tale of our species’ hubris.
Today we have a little bit more perspective. These bees were actually just African honeybees, the ancestral population to European honeybees, which were introduced to the New World with Europeans centuries earlier than the African honeybees. African honeybees were not that different from European honeybees, but they were more aggressive and tended to outcompete European honeybee colonies. They are a major problem for the beekeeping industry, but not a major threat to human life.
Today the African and European populations in the United States seem to have stabilized in their ranges, with a hybrid zone between them. African bee’s migratory behavior makes them less competitive with European bees in colder climates.
Highly aggressive Africanized honeybees (AHB) invaded Puerto Rico (PR) in 1994, displacing gentle European honeybees (EHB) in many locations. Gentle AHB (gAHB), unknown anywhere else in the world, subsequently evolved on the island within a few generations. Here we sequence whole genomes from gAHB and EHB populations, as well as a North American AHB population, a likely source of the founder AHB on PR. We show that gAHB retains high levels of genetic diversity after evolution of gentle behaviour, despite selection on standing variation. We observe multiple genomic loci with significant signatures of selection. Rapid evolution during colonization of novel habitats can generate major changes to characteristics such as morphological or colouration traits, usually controlled by one or more major genetic loci. Here we describe a soft selective sweep, acting at multiple loci across the genome, that occurred during, and may have mediated, the rapid evolution of a behavioural trait.
Come for the bees, but stay for the soft selection! If you talk to anyone in evolutionary and population genomics you know that the future is in understanding patterns of soft selection and polygenic selection from standing variation. Though these are related phenomena which are associated with each other, all are all distinct.
Standing variation just refers to the diversity which is segregating in the population at any given time. At any given moment many loci exhibit polymorphism. This polymorphism can be a target of natural selection if it is correlated with heritable variation and differentials in fitness. Though soft selection can be quite wooly it’s inverse, hard selection, is clear: in genetic terms hard selection can be seen in allele frequency changes at a single variant in a locus, going from the point where it is a novel mutation to nearly fixed in the population. In Haldane’s original conception hard selection involved excess deaths, and imposed a limit on the rate of evolution as well as the amount variation you could expect within a given population. This model was convenient in the pre-genomic and early genomic era because empirical selection tests had to focus on large allele frequency changes around singular loci. Researchers didn’t have large numbers of whole-genome samples available (nor the computational ability to analyze them).
Today this is not a limitation. In the analysis above the authors had 30 individuals of the 3 populations sequenced at high quality (20x). They ended up with millions of genetic variants they could analyze.
The plot to the left shows that “gentle African honeybees” (gAHB) tend to be closer to the African honeybee populations (AHB) overall (though with some hybridization with European honeybees, EHB). This is not surprising.
But the key observation was that over 12 generations the African honeybees of Puerto Rico became progressively less aggressive, despite maintaining overall morphological similarities to the mainland Mexican African bees from which they likely derive. Though buried in the discussion, there is a rationale for why this morphological change may have occurred: the Puerto Rican bees are subject to a lot of negative selection against aggression because of the density of the island, as well as the reality that aside from humans there aren’t other many species where their aggressive tendencies are beneficial. Basically, if you are an aggressive colony, it’s harder to make a go in densely settled areas (the implication here then is that there are probably “gentle” African honeybee populations across Latin America, they just are never disaggregated from the broader meta-population).
It’s the genomics where the real evolutionary insight comes in: they found that there were multiple soft sweep events around genetic regions implicated in behavior. In their overall genome the gAHB of Puerto Rico resembled mainland AHB, but in this subset of genetic loci they resembled EHB. Many of these loci had also been known to be targets of selection when the original European bee population diverged from the ancestral African population. Basically this is a genomic illustration of convergent evolution.
Regular readers of this blog will recognize the ways they detected selection. They used a modified form of EHH, which is reasonable since the selection event was recent enough to have been associated with distinct haplotype blocks. Also, standard Fst analysis showed that these were outliers in relation to the broader genetic pattern of relatedness (these loci were more like EHB than AHB, while most loci were more like AHB than EHB).
So this a form of polygenic selection. Remember, natural selection only knows genes through the phenotype (with intra-genomic selection being an exception). A behavior like aggression is probably subject to the fourth law of behavior genetics. That is, variation won’t be defined around a single genetic locus. Rather, variation across the genome will be correlated with variation in the phenotype. As selection favors a particular value of the phenotype across the distribution the allele frequencies across many genetic loci will shift, but they will not necessarily fix. Polygenic selection operates on the dispersed standing genetic variation which explains much of the variation of the phenotype in question. Instead of total sweeps to fixation due to large fitness differences between a given allele and its alternative form, the selection impact is distributed and diffused across the genome.
Though most of the genetic variants seem to recapitulate the evolution of the less aggressive phenotype that occurred with the original migration north of African honeybees, some of the selection signatures were novel. This points to the reality that when you have soft selection on standing variation you may have similar phenotypes which evolve via different means. Additionally, the authors noted that these results were in contrast to controlled breeding experiments in mammals where selection for gentility (“domestication”) often targeted a few loci and exhibited strong pleiotropic effects (due to the genetic correlation). These results point to the limitations of inferences made from human-directed selection.
Soft selection is probably ubiquitous. Consider the evolution of skin color in humans. There are lots of variants and lots of variation, and most of the variation seems to be ancestral. Only at the locus SLC24A5 do you have a perfect illustration of a hard selective sweep, probably from a de novo mutation that emerged around the Last Glacial Maximum.
From a geneticists’ perspective evolution is basically conceived of as changes in allele frequencies over time. Much of this is due to natural selection. Now that the world of soft selection is opening up, I suspect that we’ll understand a lot more of what we see around us, at least in the generality.
I would recommend this preprint to two large groups of my readers. There are those with strong computational skills who are curious about biology. It makes it clear why population genomics benefits from machine learning methods. Second, those who are interested or trained in genetics with less of a computational and pop gen background.
Yes, all models are wrong. But some give insight, and some are just not salvageable. In population genomics some of the model-building is obviously starting to yield really fragile results.
If you are working on phylogenetic questions on a coarse evolutionary scale (that is, “macroevolutionary,” though I know some evolutionary geneticists will shoot me the evil eye for using that word) generating a tree of relationships is quite informative and relatively straightforward, since it has a comprehensible mapping onto to what really occurred in nature. When your samples are different enough that the biological species concept works well and gene flow doesn’t occur between node, then a tree is a tree (one reason Y and mtDNA results are so easy to communicate to the general public in personal genomics).
Everything becomes more problematic when you are working on a finer phylogenetic scale (or in taxa where inter-species gene flow is common, as is often the case with plants). And I’m using problematic here in the way that denotes a genuine substantive analytic issue, as opposed to connoting something that one has moral or ethical objections to.
It is intuitively clear that there is often genetic population structure within species, but how to summarize and represent that variant is not a straightforward task.
In 2000 the paper Inference of Population Structure Using Multilocus Genotype Data in Genetics introduced the sort of model-based clustering most famously implemented with Structure. The paper illustrates limitations with the neighbor-joining tree methods which were in vogue at the time, and contrasts them with a method which defines a finite set of populations and assigns proportions of each putative group to various individuals.
The model-based methods were implemented in numerous packages over the 2000s, and today they’re pretty standard parts of the phylogenetic and population genetic toolkits. The reason for their popularity is obvious: they are quite often clear and unambiguous in their results. This may be one reason that they emerged to complement more visualization methods like PCA and MDS with fewer a priori assumptions.
But of course, crisp clarity is not always reality. Sometimes nature is fuzzy and messy. The model-based methods take inputs and will produce crisp results, even if those results are not biologically realistic. They can’t be utilized in a robotic manner without attention to the assumptions and limitations (see A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots).
A classic problem in population genetics is the characterization of discrete population structure in the presence of continuous patterns of genetic differentiation. Especially when sampling is discontinuous, the use of clustering or assignment methods may incorrectly ascribe differentiation due to continuous processes (e.g., geographic isolation by distance) to discrete processes, such as geographic, ecological, or reproductive barriers between populations. This reflects a shortcoming of current methods for inferring and visualizing population structure when applied to genetic data deriving from geographically distributed populations. Here, we present a statistical framework for the simultaneous inference of continuous and discrete patterns of population structure….
The whole preprint should be read for anyone interested in phylogenomic inference, as there is extensive discussion and attention to many problems and missteps that occur when researchers attempt to analyze variation and relationships across a species’ range. Basically, the sort of thing that might be mentioned in peer review feedback, but isn’t likely to be included in any final write-ups.
As noted in the abstract the major issue being addressed here is the problem that many clustering methods do not include within their model the reality that genetic variation within a species may be present due to continuous gene flow defined by isolation by distance dynamics. This goes back to the old “clines vs. clusters” debates. Many of the model-based methods assume pulse admixtures between population clusters which are random mating. This is not a terrible assumption when you consider perhaps what occurred in the New World when Europeans came in contact with the native populations and introduced Africans. But it is not so realistic when it comes to the North European plain, which seems to have become genetically differentiated only within the last ~5,000 years, and likely seen extensive gene flow.
The figure below shows the results from the conStruct method (left), and the more traditional fastStructure (right):
There are limitations to the spatial model they use (e.g., ring species), but that’s true of any model. The key is that it’s a good first step to account for continuous gene flow, and not shoehorning all variation into pulse admixtures.
Though in beta, the R package is already available on github (easy enough to download and install). I’ll probably have more comment when I test drive it myself….
* I am friendly with the authors of this paper, so I am also aware of their long-held concerns about the limitations and/or abuses of some phylogenetic methods. These concerns are broadly shared within the field.
What does “strong genetic structure” mean? Basically Fst is showing the proportion of genetic variation which is partitioned between groups. Intuitively it is easy to understand, in that if ~1% of the genetic variation is partitioned between groups in one case, and ~10% in another, then it is reasonable to suppose that the genetic distance between groups in the second case is larger than in the first case. On a continental scale Fst between populations is often on the order of ~0.10. That is the value for example when you pool the variation amongst Northern Europeans and Chinese, and assess how much of it can be apportioned in a manner which differentiates populations (so it’s about ~10% of the variation).
This is why ancient DNA results which reported that Mesolithic hunter-gatherers and Neolithic farmers in Central Europe who coexisted in rough proximity for thousands of years exhibited differences on the order of ~0.10 elicited surprise. These are values we are now expecting from continental-scale comparisons. Perhaps an appropriate analogy might be the coexistence of Pygmy groups and Bantu agriculturalists? Though there is some gene flow, the two populations exist in symbiosis and exhibit local ecological segregation.
In PNG continental scale Fst values are also seen among indigenous people. The differences between the peoples who live in the highlands and lowlands of PNG are equivalent to those between huge regions of Eurasia. This is not entirely surprising because there has been non-trivial gene flow into lowland populations from Austronesian groups, such as the Lapita culture. Many lowland groups even speak Austronesian languages today.
Using standard ADMIXTURE analysis the paper shows that many lowland groups have significant East Asian ancestry (red), while none of the highland groups do (some individuals with East Asian admixture seem to be due to very recent gene flow). But even within the highlands the genetic differences are striking. The Fst values between Finns and Southern European groups such as Spaniards are very high in a European context (due to Finnish Siberian ancestry as well as drift through a bottleneck), but most comparisons within the highland groups in PNG still exceeds this.
The paper also argues that genetic differences between Papuans and the natives of Australia pre-date the rising sea levels at the beginning of the Holocene, when Sahul divided between its various constituents. This is not entirely surprising considering that the ecology of the highlands during the Pleistocene would have been considerably different from Australia to the south, resulting in sharp differences in the hunter-gatherer lifestyles. Additionally, there does not seem to have been a genetic cline. Papuans are symmetrically related to all Australian groups they had samples from.
Using coalescence-based genomic methods they inferred that separation between highlands and some lowland groups occurred ~10-20,000 years ago. That is, after the Last Glacial Maximum. For the highlands, the differences seem to date to within the last 10,000 years. The Holocene. Additionally, they see population increases in the highlands, correlating with the shift to agriculture (cultivation of taro).
None of the above is entirely surprising, though I would take the date inferences with a grain of salt. The key is to observe that large genetic differences, as well as cultural differences, accrued in the highlands of PNG during the Holocene. In the paper they have a social and cultural explanation for what’s going on:
Fst values in PNG fall between those of hunter-gatherers and present-day populations of west Eurasia, suggesting that a transition to cultivation alone does not necessarily lead to genetic homogenization.
A key difference might be that PNG had no Bronze Age, which in west Eurasia was driven by an expansion of herders and led to massive population replacement, admixture, and cultural and linguistic change (7, 8), or Iron Age such as that linked to the expansion of Bantu-speaking
farmers in Africa (24). Such cultural events have resulted in rapid Y-chromosome lineage expansions due to increased male reproductive variance (25), but we consistently find no evidence for this in PNG (fig. S13). Thus, in PNG, wemay be seeing the genetic, linguistic, and cultural diversity that sedentary human societies can achieve in the absence of massive technology-driven expansions.
Peter Turchin in books like Ultrasociety has aruged that one of the theses in Steven Pinker’s The Better Angels of Our Nature is incorrect: that violence has not decreased monotonically, but peaked in less complex agricultural societies. PNG is clearly a case of this, as endemic warfare was a feature of highland societies when they encountered Europeans. Lawrence Keeley’s War Before Civilization: The Myth of the Peaceful Savage gives so much attention to highland PNG because it is a contemporary illustration of a Neolithic society which until recently had not developed state-level institutions.
What papers like these are showing is that cultural and anthropological dynamics strongly shape the nature of genetic variation among humans. Simple models which assume as a null hypothesis that gene flow occurs through diffusion processes across a landscape where only geographic obstacles are relevant simply do not capture enough of the dynamic. Human cultures strongly shape the nature of interactions, and therefore the genetic variation we see around us.