
It’s raining selective sweeps

A week ago a very cool new preprint came out, Identifying loci under positive selection in complex population histories. It’s something that you couldn’t even have imagined just ten years ago. The authors basically figure out ways to identify markers whose allele frequencies deviate from what a neutral null model of evolution would predict. The method is put first, which I really like, before getting to results or discussion. Additionally, they did a lot of simulation ahead of time, the sort of simulation that was really not possible before we had the computational resources we have now.

Here’s the abstract:

Detailed modeling of a species’ history is of prime importance for understanding how natural selection operates over time. Most methods designed to detect positive selection along sequenced genomes, however, use simplified representations of past histories as null models of genetic drift. Here, we present the first method that can detect signatures of strong local adaptation across the genome using arbitrarily complex admixture graphs, which are typically used to describe the history of past divergence and admixture events among any number of populations. The method – called Graph-aware Retrieval of Selective Sweeps (GRoSS) – has good power to detect loci in the genome with strong evidence for past selective sweeps and can also identify which branch of the graph was most affected by the sweep. As evidence of its utility, we apply the method to bovine, codfish and human population genomic data containing multiple population panels related in complex ways. We find new candidate genes for important adaptive functions, including immunity and metabolism in under-studied human populations, as well as muscle mass, milk production and tameness in particular bovine breeds. We are also able to pinpoint the emergence of large regions of differentiation due to inversions in the history of Atlantic codfish.
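To make the idea of "deviation from a neutral null along one branch" concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that under drift alone the population allele frequencies at a SNP are roughly normal around an ancestral frequency, with a covariance set by the admixture graph, and it asks whether the frequency shift projected onto one branch is bigger than drift should allow. The function name, the toy covariance matrix, the example frequencies and the use of the plain mean as the ancestral frequency are all assumptions made for illustration.

```python
import math

def branch_score(freqs, cov, branch):
    """Score one SNP for excess allele frequency change along one branch.

    freqs  : allele frequency of the SNP in each population
    cov    : neutral drift covariance between populations (hypothetical
             input; in practice it would come from a fitted admixture graph)
    branch : for each population, the fraction of its ancestry that passes
             through the branch being tested
    Returns (score, p_value), treating the score as chi-squared with 1 d.o.f.
    """
    n = len(freqs)
    e = sum(freqs) / n                    # crude ancestral frequency (assumption)
    mean_b = sum(branch) / n
    b = [x - mean_b for x in branch]      # centre so the shared mean cancels
    var = sum(b[i] * cov[i][j] * b[j] for i in range(n) for j in range(n))
    if e <= 0.0 or e >= 1.0 or var <= 0.0:
        return 0.0, 1.0                   # monomorphic site or uninformative branch
    proj = sum(bi * pi for bi, pi in zip(b, freqs))
    score = proj * proj / (e * (1.0 - e) * var)
    p_value = math.erfc(math.sqrt(score / 2.0))  # upper tail of chi-squared(1)
    return score, p_value

# Toy star tree: three populations, each with 0.05 units of independent drift.
toy_cov = [[0.05, 0.0, 0.0],
           [0.0, 0.05, 0.0],
           [0.0, 0.0, 0.05]]
to_pop_a = [1.0, 0.0, 0.0]                # terminal branch leading to population A

neutral_score, neutral_p = branch_score([0.32, 0.30, 0.28], toy_cov, to_pop_a)
swept_score, swept_p = branch_score([0.90, 0.30, 0.28], toy_cov, to_pop_a)
print(neutral_score, neutral_p)
print(swept_score, swept_p)
```

In this toy setup the SNP that jumped to 90% in population A gets a far larger score than the one that barely moved. A real analysis would compute the covariance and the branch vectors from the fitted graph, including admixture proportions, and would scan every SNP and every branch.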

On a related note regarding selection, there’s On the well-founded enthusiasm for soft sweeps in humans: a reply to Harris, Sackman, and Jensen. The authors are responding to a recent preprint criticizing their earlier work. The reason it’s fascinating to me is that these sorts of arguments today are really concrete and not so theoretical. There’s a lot of data for analytic techniques to chew through, and computation has really transformed the possibilities.

A generation ago these sorts of debates would have been a sequence of “you’re wrong!” vs. “no, you’re wrong!” Today the disputes involve a lot of data, and so have a reasonable chance of resolution.

The first preprint identifies the usual candidates in humans that you normally see, and expected targets in cattle and cod. Sure, that will give biologists more interested in mechanisms and pathways things to chew upon, but imagine once researchers have large numbers of genomes for thousands and thousands of species. Then they’ll be testing deviations from neutral allele frequencies across many trees, and getting a more general and abstract sense of the parameter space that selection explores, conditional on the particularities of evolutionary history.

This is why I’m excited about plans to sequence lots and lots of species.

3 thoughts on “It’s raining selective sweeps”

  1. I guess I have a slight grumble about “Identifying loci…” with their application of the method to humans.

    In Fig 3B they use a model where Europeans are an admixture of a lineage splitting from Sardinian that is basal to (Oceanian+East Asian+Native American) and a lineage splitting from Native American – but that’s totally incorrect; it’s not what happened at all…

    It seems like it would make more sense to use the models that are simplified but closer to truth; with Basal Eurasian split and then ENA split from ANE+WHG, and Europeans as recombinations of Basal+WHG+ANE and Native Americans as ANE+East Asian.

    I’m guessing the only reason they wouldn’t is because they need *some* unadmixed reference for their method to work, hence have to model Sardinian as unadmixed, though that doesn’t explain why they model Native American as unadmixed. That seems like a pretty major drawback, since we know that all populations are admixed on ancient references for which there are tight limits on the maximum data quality we’ll ever have.

    Makes me a skeptic that the European signals and Native American signals are correct. Seems like the signals European-w / w-x / Native-American-x (e.g. European on SLC45A2 or TLR) are probably not real, since the population history is not real and there never was an x or w in the sense that their model suggests. Perhaps this is not really a concern but I can’t see how it would not be.

    I might guess similar things might be going on with the models for cod and cows, but I’m going to have to nod on as if they’re correct (ala Gell-Mann), as I don’t know much about the population history.

  2. I mean, specifically the model Fig3B suggests that there was selection where mainland Europeans, who are admixed between Sardinians and Native Americans, had an episode of selection of SLC45A2, following admixture of Sardinians and Native American lineages, while SLC45A2 attained high frequencies in Sardinians through purely neutral means. I would guess this is to cope with Sardinians and mainland Europeans having high derived frequencies on SLC45A2 and Native Americans having 100% ancestral.

    But we actually know from aDNA that’s not what happened and that both mainland Europeans and Sardinians were subject to recent, strong selection on SLC45A2.

  3. Gregory Cochran saw things like this coming, as one can see from The 10,000 Year Explosion. As he put it, “Selection is the reason that horseshoe crabs can outlast mountain ranges.”
