So merfolk are a real thing now: adaptation to diving

When Rasmus Nielsen presented preliminary work on diving adaptations a few years ago at ASHG I really didn’t know what to think. To be honest it seemed kind of crazy. Everyone was freaking out over it…and I guess I should have too. But it just seemed so strange I couldn’t process it. High altitude adaptations, I understood. But underwater adaptations?

The paper is out now, and open access, Physiological and Genetic Adaptations to Diving in Sea Nomads. There are a lot of moving parts in it, so I really recommend Carl Zimmer’s piece, Bodies Remodeled for a Life at Sea:

On Thursday in the journal Cell, a team of researchers reported a new kind of adaptation — not to air or to food, but to the ocean. A group of sea-dwelling people in Southeast Asia have evolved into better divers.

When Dr. Ilardo compared scans from the two villages, she found a stark difference. The Bajau had spleens about 50 percent bigger on average than those of the Saluan.

Only some Bajau are full-time divers. Others, such as teachers and shopkeepers, have never dived. But they, too, had large spleens, Dr. Ilardo found. It was likely the Bajau are born that way, thanks to their genes.

A number of genetic variants have become unusually common in the Bajau, she found. The only plausible way for this to happen is natural selection: the Bajau with those variants had more descendants than those who lacked them.

As some of you might know, “sea nomads” are common across much of Southeast Asia. The Bajau are just one major group. The anthropology here is not surprising…but the biology most definitely is. For various technical reasons, the authors didn’t have extremely fine-grained genome data (high coverage sequence data, or very high-density chips). So they didn’t do some haplotype-based tests (e.g., iHS), though that might not matter anyhow (see below for why). But, by looking at genome-wide relatedness and comparing that to markers which deviated from that expectation, both of which they could do robustly, the authors narrowed in on candidate targets of selection. From the paper: “Remarkably, the top hit of our selection scan (Table 1) is SNP rs7158863, located just upstream of BDKRB2, the only gene thus far suggested to be associated with the diving response in humans.”

There are many cases where researchers find selection signals in an ORF of unknown function. In this case, the top hit happens to be exactly in line with the biological characteristic you’re already curious about. The alignment is so good it’s hard to believe.

But wait, there’s more! Spleen size variation is not due to variation at just one locus. It’s polygenic, albeit probably dominated by larger effect quantitative trait loci (QTLs) than something like height (so more like skin color). They compared the Bajau to a nearby population, the Saluan, as well as Han Chinese as an outgroup. On the whole the distribution of allele frequency differences should reflect the phylogeny (Han(Bajau, Saluan)). The key is to look for cases where the Bajau are the outgroup. From the paper:

While some of the selection signals uniquely present in the Bajau may be related to other environmental factors, such as the pathogens, several of the other top hits also fall in candidate genes associated with traits of possible importance for diving. Examples include FAM178B, which encodes a protein that forms a stable complex with carbonic anhydrase, the primary enzyme responsible for maintaining carbon dioxide/bicarbonate balance, thereby helping maintain the pH of the blood….
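To make the “Bajau as the outgroup” logic concrete, here is a minimal Python sketch of one standard statistic for this kind of three-population comparison, the population branch statistic (PBS). I’m using it for illustration and not claiming it reproduces the paper’s exact pipeline. The per-SNP F_ST estimator is the simplest textbook one, and the allele frequencies are invented.

```python
import math

def fst(p1: float, p2: float) -> float:
    """Simple per-SNP F_ST from two population allele frequencies (equal weights)."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                       # pooled heterozygosity
    if h_total == 0:
        return 0.0                                          # monomorphic site
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pop heterozygosity
    return (h_total - h_within) / h_total

def branch_length(f: float) -> float:
    """Convert F_ST to an additive, drift-scaled distance."""
    return -math.log(1 - min(f, 0.999999))                  # cap avoids log(0) at fixed differences

def pbs(p_focal: float, p_sister: float, p_outgroup: float) -> float:
    """Length of the branch leading to the focal population at one SNP."""
    t_fs = branch_length(fst(p_focal, p_sister))
    t_fo = branch_length(fst(p_focal, p_outgroup))
    t_so = branch_length(fst(p_sister, p_outgroup))
    return (t_fs + t_fo - t_so) / 2

# Hypothetical frequencies in (Bajau, Saluan, Han); not taken from the paper.
background_snp = pbs(0.30, 0.32, 0.45)   # frequencies follow the (Han(Bajau, Saluan)) tree
candidate_snp = pbs(0.80, 0.30, 0.35)    # Bajau diverge from both other groups
print(background_snp, candidate_snp)
```

A SNP that tracks the expected tree gets a PBS near zero for the Bajau, while a SNP where the Bajau alone have shifted gets a long Bajau branch. In a real scan you would rank SNPs or windows by this value against the genome-wide distribution.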

FAM178B shows up again later:

We identified one region overlapping chr2:97627143, which falls in the gene FAM178B, that falls in the 99% quantile of the genome-wide distribution for the fD statistic (Martin et al., 2015). Of the populations considered, this region exclusively stands out in the Bajau, and the signal appears strongest when using Denisova as source. Notably, this region was also proposed as a candidate for Denisovan introgression in Oceanic populations by….

What they’re saying here is that the allele at this locus adapted to diving may have come originally from the Denisovans! Remember, we already know that one of the Tibetan high altitude adaptations comes from the Denisovans. So this isn’t surprising, but it is pretty cool. But most of the other hits don’t seem to be introgressed. That is, they come from modern humans (or have been segregating in our species for a long, long time).

Many of the alleles found at high frequencies in the Bajau are found in other populations, just at very low frequencies. This implies that selection is operating on standing variation. Another suggestion that this is so is that the widths of the regions of the genome impacted by selection seem rather narrow. In contrast, the Eurasian adaptation to lactose digestion is from a de novo mutation, something that wasn’t at high frequency at all in the ancestral human populations. The sweep is strong and powerful around that single mutation, and huge swaths of the genome around it “hitchhiked” along so that on a population-wide level the area around the mutational target was homogenized (basically, a lot of one single original mutant human is found around that causal variant for lactase persistence).

Anyone who has learned basic quantitative genetics knows that one way to change a mean trait value is just to change the allele frequencies at a lot of different loci…over time you’ll have combinations of formerly low-frequency alleles present in a single individual which would otherwise never have occurred. Eventually, you can have a median value which is outside of the range of the original distribution. The mechanism here in a dynamic sense seems totally comprehensible, though as Carl Zimmer notes, and the rather short shrift given in the Cell paper suggests, they’re not sure in a proximate sense how the selection is working (i.e., obviously there is a fitness implication but how does it manifest? Do people die? Are they unable to support a family?).
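The quantitative genetic point about shifting many loci at once is easy to see in a toy simulation. This is a minimal sketch, assuming a purely additive trait, equal effects at every locus, unlinked loci, and made-up numbers (200 loci, “plus” allele frequency moving from 0.10 to 0.25). None of it is fit to the Bajau data.

```python
import random
import statistics

def simulate_trait(freqs, n_individuals, rng):
    """Additive trait: number of 'plus' alleles a diploid individual carries across loci."""
    values = []
    for _ in range(n_individuals):
        score = 0
        for p in freqs:
            # Two independent draws per locus, one for each chromosome copy.
            score += (rng.random() < p) + (rng.random() < p)
        values.append(score)
    return values

rng = random.Random(1)
n_loci = 200                                           # hypothetical number of causal loci
before = simulate_trait([0.10] * n_loci, 2000, rng)    # 'plus' alleles rare everywhere
after = simulate_trait([0.25] * n_loci, 2000, rng)     # modest frequency shift at every locus

print("median before:", statistics.median(before), "max before:", max(before))
print("median after: ", statistics.median(after))
```

No new mutation is involved. Every “plus” allele already existed at low frequency, and a moderate shift at each of many loci is enough to push the typical individual past the most extreme individual in the starting population.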

One key issue is to consider the demographic history of these people. The authors tried to model it genetically:

We found a model compatible with the data that has a divergence time of ∼16 kya, with subsequent high migration from Bajau to Saluan and low migration from Saluan to Bajau (for details see STAR Methods). We note that the estimate of 16 kya may reflect the divergence of old admixture components shared in different proportions by the Saluan and the Bajau, similarly to, for example, European populations being closely related to each other but differing in the proportion of ancient admixture components….

The authors cite papers which outline the real story about what happened, so they know that the model is somewhat unrealistic. For example, Ancient genomes document multiple waves of migration in Southeast Asian prehistory:

Southeast Asia is home to rich human genetic and linguistic diversity, but the details of past population movements in the region are not well known. Here, we report genome-wide ancient DNA data from thirteen Southeast Asian individuals spanning from the Neolithic period through the Iron Age (4100-1700 years ago). Early agriculturalists from Man Bac in Vietnam possessed a mixture of East Asian (southern Chinese farmer) and deeply diverged eastern Eurasian (hunter-gatherer) ancestry characteristic of Austroasiatic speakers, with similar ancestry as far south as Indonesia providing evidence for an expansive initial spread of Austroasiatic languages. In a striking parallel with Europe, later sites from across the region show closer connections to present-day majority groups, reflecting a second major influx of migrants by the time of the Bronze Age.

The upshot is that the predominant genetic character of Southeast Asia dates to the Neolithic, and to a great extent even more recently. The deep divergence between two Austronesian groups may be an artifact of drift in one group (probably the Bajau), or different proportions of admixture from the primary ancestral components in maritime Southeast Asia: Austronesian, Austro-Asiatic, and indigenous hunter-gatherer. As per Lipson 2014 the Bajau are probably mostly Austronesian but may have Negrito ancestry from the Philippines, as well as indigenous hunter-gatherer ancestry more closely related to Malaysian Negritos. There probably isn’t so much Austro-Asiatic in Sulawesi, but I’d bet the farmers have more of that.

Ultimately the question here is are the adaptations to diving old or new? Anthropologists and historians have all sorts of theories, as reported in the Carl Zimmer article and hinted at in the paper. My own bet is that they are both old and new. By this, I mean that some sort of maritime lifestyle was surely practiced by indigenous people between the end of the last Ice Age and the arrival of farmers. But if the variation was present in humans more generally, the Austronesians would probably also have the capacity for the diving adaptations. Mixing with hunter-gatherers and another bout of selection could have done the trick in concert. So the adaptations and lifestyle are old, but the Bajau people may date to the last 2,000 years, and selection within this population may be that recent.

A lot of the answer might be found in looking at the other sea nomad groups….

Natural selection in humans (OK, 375,000 British people)


The above figure is from Evidence of directional and stabilizing selection in contemporary humans. I’ll be entirely honest with you: I don’t read every UK Biobank paper, but I do read those where Peter Visscher is a co-author. It’s in PNAS, and a draft which is not open access. But it’s a pretty interesting read. Nothing too revolutionary, but confirms some intuitions one might have.

The abstract:

Modern molecular genetic datasets, primarily collected to study the biology of human health and disease, can be used to directly measure the action of natural selection and reveal important features of contemporary human evolution. Here we leverage the UK Biobank data to test for the presence of linear and nonlinear natural selection in a contemporary population of the United Kingdom. We obtain phenotypic and genetic evidence consistent with the action of linear/directional selection. Phenotypic evidence suggests that stabilizing selection, which acts to reduce variance in the population without necessarily modifying the population mean, is widespread and relatively weak in comparison with estimates from other species.

The stabilizing selection part is probably the most interesting part for me. But let’s hold up for a moment, and review some of the major findings. The authors focused on ~375,000 samples which matched their criteria (white British individuals old enough that they are well past their reproductive peak), and the genotyping platforms had 500,000 markers. The dependent variable they’re looking at is reproductive fitness. In this case specifically, “rLRS”, or relative lifetime reproductive success.

With these huge data sets and the large number of measured phenotypes they first used the classical Lande and Arnold method to detect selection gradients, which uses regression to measure directional and stabilizing dynamics. Basically, how does change in the phenotype impact reproductive fitness? So, it is notable that shorter women have higher reproductive fitness than taller women (shorter than the median). This seems like a robust result. We’ve seen it before on much smaller sample sizes.
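For readers who haven’t seen it, the Lande and Arnold approach boils down to regressing relative fitness on a standardized trait: the linear coefficient is the directional gradient β, and twice the quadratic coefficient is the nonlinear gradient γ (negative γ means stabilizing selection). Below is a minimal Python sketch on simulated data. The function names, the toy fitness function, and all the numbers are my own assumptions, and it ignores the covariates and multiple traits a real analysis would include.

```python
import random
import statistics

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda idx: abs(a[idx][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * xc for x, xc in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def selection_gradients(trait, fitness):
    """Return (beta, gamma) for a single trait."""
    mean_w = statistics.fmean(fitness)
    w = [x / mean_w for x in fitness]                  # relative fitness, mean 1
    mu, sd = statistics.fmean(trait), statistics.pstdev(trait)
    z = [(x - mu) / sd for x in trait]                 # standardized trait
    beta = ols([z], w)[1]                              # directional: linear-only model
    z2 = [v * v for v in z]
    gamma = 2 * ols([z, z2], w)[2]                     # nonlinear: doubled quadratic coefficient
    return beta, gamma

# Toy data: fitness declines slightly with the trait and with distance from the mean.
rng = random.Random(0)
trait = [rng.gauss(0, 1) for _ in range(20000)]
fitness = [1 - 0.05 * z - 0.02 * z * z + rng.gauss(0, 0.2) for z in trait]
beta, gamma = selection_gradients(trait, fitness)
print("beta:", beta, "gamma:", gamma)
```

Both estimates should come out small and negative here, which is the pattern described for female height: mild directional selection toward smaller values plus weak stabilizing selection.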

The results using phenotypic correlations for directional (β) and stabilizing (γ) selection are shown below separated by sex. The abbreviations are the same as above.


There are many cases where directional selection seems to operate in females, but not in males. But they note that that is often due to near zero non-significant results in males, not because there were opposing directions in selection. Height was the exception, with regression coefficients in opposite directions. For stabilizing selection there was no antagonistic trait.

A major finding was that compared to other organisms stabilizing selection was very weak in humans. There’s just not that much pressure against extreme phenotypes. This isn’t entirely surprising. First, you have the issue of the weirdness of a lot of studies in animal models, with inbred lines, or wild populations selected for their salience. Second, prior theory suggests that a trait with lots of heritable quantitative variation, like height, shouldn’t be subject to that much selection. If it were, the genetic variation which is the raw material of the trait’s distribution wouldn’t be there.

Using more complex regression methods that take into account confounds, they pruned the list of significant hits. But, it is important to note that even at ~375,000, this sample size might be underpowered to detect really subtle dynamics. Additionally, the beauty of this study is that it added modern genomic analysis to the mix. Detecting selection through phenotypic analysis goes back decades, but interrogating the genetic basis of complex traits and their evolutionary dynamics is new.

To a first approximation, the results were broadly consonant across the two methods. But, there are interesting details where they differ. There is selection on height in females, but not in males. This implies that though empirically you see taller males with higher rLRS, the genetic variance that is affecting height isn’t correlated with rLRS, so selection isn’t occurring in this sex.

~375,000 may seem like a lot, but from talking to people who work in polygenic selection there is still statistical power to be gained by going into the millions (perhaps tens of millions?). These sorts of results are very preliminary but show the power of synthesizing classical quantitative genetic models and ways of thinking with modern genomics. And, it does have me wondering about how these methods will align with the sort of stuff I wrote about last year which detects recent selection on time depths of a few thousand years. The SDS method, for example, seems to be detecting selection for increasing height the world over…which makes me wonder whether it is some artifact, because there’s a robust pattern of shorter women having higher fertility in studies going back decades.

Selection for pigmentation in Khoisan?

In the recent paper, Reconstructing Prehistoric African Population Structure, there was a section on natural selection. Since my post on the paper was already very long I didn’t address this dynamic.

But now I want to highlight this section:

The functional category that displays the most extreme allele frequency differentiation between present day San and ancient southern Africans is ‘‘response to radiation’’ (Z = 3.3 compared to the genome-wide average). To control for the possibility that genes in this category show an inflated allele frequency differentiation in general, we computed the same statistic for the Mbuti central African rainforest hunter-gatherer group but found no evidence for selection affecting the response to radiation category.

We speculate that the signal for selection in the response to radiation category in the San could be due to exposure to sunlight associated with the life of the Khomani and Ju|’hoan North people in the Kalahari Basin, which has become a refuge for hunter-gatherer populations in the last millennia due to encroachment by pastoralist and agriculturalist groups.

I’m a bit puzzled here, because the implication seems to be that the San populations are darker than they were in the past. And yet earlier this summer I saw a talk which strongly suggested that there was selection in modern Bushman populations for the derived variant of SLC24A5, presumably introduced through admixture from East African populations with Eurasian admixture.

In comparison to their neighbors the San are quite light-skinned, so it’s a reasonable supposition that they have been subject to natural selection recently. The Hadza, in contrast, seem to have the same complexion as their Bantu neighbors.

So what’s the point of demographic models which leave you scratching your head?

There’s a new paper on Tibetan adaptation to high altitudes, Evolutionary history of Tibetans inferred from whole-genome sequencing. The focus of the paper is on the fact that more genes than have previously been analyzed seem to be the targets of natural selection. And I buy most of their analyses (not sure about the estimate of Denisovan ancestry being 0.4%…these sorts of things can be tricky).

But they fancy it up with a ∂a∂i model of population history, as well as using MSMC to account for gene flow. I don’t understand why they didn’t use something simpler like TreeMix, which can also handle more complex models. I guess because they wanted to focus on only a few populations?

Years ago I asked the developer of MSMC, Stephan Schiffels, if assuming an admixed population is not admixed might cause weird inferences. Why yes, it would. For example, admixed populations might show higher effective population size since they’re pooling the histories of two separate populations. As for ∂a∂i, the model above leaves me literally scratching my head.

…predicted that the initial divergence between Han and Tibetan was much earlier, at 54kya (bootstrap 95% C.I 44 kya to 58 kya). However, for the first 45ky, the two populations maintained substantial gene flow (6.8×10⁻⁴ and 9.0×10⁻⁴ per generation per chromosome). After 9.4 kya (bootstrap 95% C.I 8.6 kya to 11.2 kya), the gene flow rate dramatically dropped (1.3×10⁻¹¹ and 4×10⁻⁷ per generation per chromosome), which is consistent with the estimate from MSMC.

Mystifying. The separation between Chinese and Tibetans is pretty much immediately after modern humans arrive in East Asia. Then there’s a lot of reciprocal gene flow…which ends during the Holocene.

We’re being told here that there are two populations which persisted in some form for ~45,000 years. Is this believable? That these two populations maintained some sort of continuity, and remained in close enough proximity to engage in gene flow? And then ~10,000 years ago the ancestors of the Tibetans separated from the ancestors of the modern Han Chinese.

The latter scenario I can imagine. It’s this ~45,000 year dance I’m confused by. If there is substantial gene flow between the two groups why did they keep enough distinctive drift to be separate populations?
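A back-of-the-envelope way to see why this is puzzling is Wright’s island-model approximation, F_ST ≈ 1 / (1 + 4·Nₑ·m). The sketch below plugs the quoted migration rates into that formula. Be warned that this swaps in a much cruder model than ∂a∂i: it assumes symmetric migration at equilibrium, and the effective size of 10,000 is a number I picked for illustration, not one from the paper.

```python
def island_fst(n_e: float, m: float) -> float:
    """Equilibrium F_ST under Wright's island model: 1 / (1 + 4 * N_e * m)."""
    return 1.0 / (1.0 + 4.0 * n_e * m)

ASSUMED_NE = 10_000   # hypothetical effective population size, not from the paper

# Migration rates as quoted from the paper, one per direction.
quoted_rates = {
    "first ~45 ky, direction 1": 6.8e-4,
    "first ~45 ky, direction 2": 9.0e-4,
    "after ~9.4 kya, direction 1": 1.3e-11,
    "after ~9.4 kya, direction 2": 4e-7,
}

for label, m in quoted_rates.items():
    migrants = ASSUMED_NE * m   # N_e * m, the product that enters the formula
    print(f"{label}: N_e*m = {migrants:.3g}, equilibrium F_ST = {island_fst(ASSUMED_NE, m):.3f}")
```

Under these assumptions the early rates imply several migrants per generation and an equilibrium F_ST of only a few percent, which is why it’s hard to think of the two groups as distinct populations over that stretch. The post-9.4 kya values are equilibrium expectations that would take far longer than 9,400 years to reach, so read them only as “effectively isolated.”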

With what we know about ancient DNA from Europe, if we posited such a model for that continent we’d be way off. There have been too many population turnovers. Is East Asia different? I’m moderately skeptical of that. I think perhaps researchers should be very aware of the limitations of ∂a∂i when it comes to fine-grained population genomic analyses.

Note: This is a cool paper, and this small section is not entirely relevant. Which is why I’m confused about it since it seems the weakest part of the analysis in terms of originality, and the least believable.