A week ago a very cool new preprint came out, Identifying loci under positive selection in complex population histories. It’s something that you can’t even imagine just ten years ago. The authors basically figure out ways to identify deviations of markers from expected allele frequency given a null neutral evolutionary model. The method is put first, which I really like, before getting to results or discussion. Additionally, they did a lot of simulation ahead of time. The sort of simulation that is really not possible before the sort of computational resources we have now.
Here’s the abstract:
Detailed modeling of a species’ history is of prime importance for understanding how natural selection operates over time. Most methods designed to detect positive selection along sequenced genomes, however, use simplified representations of past histories as null models of genetic drift. Here, we present the first method that can detect signatures of strong local adaptation across the genome using arbitrarily complex admixture graphs, which are typically used to describe the history of past divergence and admixture events among any number of populations. The method – called Graph-aware Retrieval of Selective Sweeps (GRoSS) – has good power to detect loci in the genome with strong evidence for past selective sweeps and can also identify which branch of the graph was most affected by the sweep. As evidence of its utility, we apply the method to bovine, codfish and human population genomic data containing multiple population panels related in complex ways. We find new candidate genes for important adaptive functions, including immunity and metabolism in under-studied human populations, as well as muscle mass, milk production and tameness in particular bovine breeds. We are also able to pinpoint the emergence of large regions of differentiation due to inversions in the history of Atlantic codfish.
On a related note in regards to selection, On the well-founded enthusiasm for soft sweeps in humans: a reply to Harris, Sackman, and Jensen. The authors are responding to a recent preprint criticizing their earlier work. The reason that it’s fascinating to me is that these sorts of arguments today are really concrete and not so theoretical. There’s a lot of data for analytic techinques to chew through, and computation has really transformed the possibilities.
A generation ago these sorts of debates would be a sequence of “you’re wrong!” vs. “no, you’re wrong!” Today the disputes involve a lot of data, and so have a reasonable chance of resolution.
The first preprint identifies the usual candidates in humans that you normally see, and expected targets in cattle and cod. Sure, that will given biologists more interested in mechanisms and pathways things to chew upon, but imagine once researchers have large numbers of genomes for thousands and thousands of species. Then they’ll be testing deviations from neutral allele frequencies across many trees, and getting a more general and abstract sense of the parameter that selection explores, conditional on particularities o evolutionary history.
This is why I’m excited about plans to sequence lots and lots of species.