Thursday, May 11, 2006

How reliable are empirical genomic scans for selective sweeps?

posted by the @ 5/11/2006 01:06:00 AM

Kosuke M. Teshima, Graham Coop and Molly Przeworski, of the Department of Human Genetics at the University of Chicago (home to both the Pritchard and Lahn labs), present an analysis of how effective empirical scans for selective sweeps, such as those from the Moyzis and Pritchard labs, really are.


The beneficial substitution of an allele shapes patterns of genetic variation at linked sites. Thus, in principle, adaptations can be mapped by looking for the signature of directional selection in polymorphism data. In practice, such efforts are hampered by the need for an accurate characterization of the demographic history of the species and of the effects of positive selection. In an attempt to circumvent these difficulties, researchers are increasingly taking a purely empirical approach, in which a large number of genomic regions are ordered by summaries of the polymorphism data, and loci with extreme values are considered to be likely targets of positive selection. We evaluated the reliability of the "empirical" approach, focusing on applications to human data and to maize. To do so, we considered a coalescent model of directional selection in a sensible demographic setting, allowing for selection on standing variation as well as on a new mutation. Our simulations suggest that while empirical approaches will identify several interesting candidates, they will also miss many—in some cases, most—loci of interest. The extent of the trade-off depends on the mode of positive selection and the demographic history of the population. Specifically, the false-negative rate is higher when directional selection involves a recessive rather than a co-dominant allele, when it acts on a previously neutral rather than a new allele, and when the population has experienced a population bottleneck rather than maintained a constant size. One implication of these results is that, insofar as attributes of the beneficial mutation (e.g., the dominance coefficient) affect the power to detect targets of selection, genomic scans will yield an unrepresentative subset of loci that contribute to adaptations.
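The "empirical" approach described in the abstract boils down to ranking many regions by a summary statistic and flagging the extreme tail. Below is a minimal Python sketch of that idea, assuming a hypothetical per-region score where lower values look more sweep-like, and made-up score distributions for neutral regions, "strong" sweeps and "weak" sweeps. It is not the authors' coalescent model, only a toy illustration of why a fixed tail cutoff tends to catch strong signals and miss weak ones.

```python
import random


def empirical_scan(scores, tail=0.01):
    """Flag the regions whose scores fall in the extreme lower tail.

    Regions are ranked by a per-region summary statistic and the lowest
    `tail` fraction is returned as the candidate set. No demographic model
    is fitted: "extreme" is defined purely relative to the other regions.
    """
    n_flag = max(1, int(len(scores) * tail))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(ranked[:n_flag])


def simulate_scan(n_neutral=10_000, n_strong=50, n_weak=50, tail=0.01, seed=1):
    """Toy genome; every distribution here is hypothetical, for illustration only.

    Neutral regions: score ~ N(0, 1).
    "Strong" sweeps (think: a new, co-dominant allele): score ~ N(-4, 0.5).
    "Weak" sweeps (think: a recessive allele or standing variation): score ~ N(-0.5, 1).
    Lower scores look more sweep-like.
    """
    rng = random.Random(seed)
    labels = ["neutral"] * n_neutral + ["strong"] * n_strong + ["weak"] * n_weak
    params = {"neutral": (0.0, 1.0), "strong": (-4.0, 0.5), "weak": (-0.5, 1.0)}
    scores = [rng.gauss(*params[lab]) for lab in labels]
    flagged = empirical_scan(scores, tail)

    def detected_fraction(kind):
        # Share of truly selected regions of this kind that land in the tail.
        idx = [i for i, lab in enumerate(labels) if lab == kind]
        return sum(i in flagged for i in idx) / len(idx)

    return {
        "n_flagged": len(flagged),
        "false_positives": sum(labels[i] == "neutral" for i in flagged),
        "power_strong": detected_fraction("strong"),
        "power_weak": detected_fraction("weak"),
    }


if __name__ == "__main__":
    print(simulate_scan())
```

Because the size of the tail is fixed in advance, every slot not taken by a strong sweep goes to either a neutral outlier or a weak sweep, and in this toy setup the weak class is mostly missed.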

They spell out the implications of their findings pretty clearly at the end of the paper:

The ability to detect recent directional selection from polymorphism data depends on the recombination environment of the selected site (Supplemental Fig. S5), the dominance coefficient of the favorable allele (Fig. 5), the selection coefficient of the favorable allele (Supplemental Fig. S4), and whether the allele was favored from introduction or not (Figs. 3 and 5). Thus, if a candidate region does not stand out in empirical comparisons, it may be that there is little power to detect the mode of selection acting on it. This possibility has important implications for the interpretation of genomic scans of polymorphism data. Ultimately, we would like to use the results of genome scans to make inferences about which phenotypes were recently selected, and how selective pressures differ between populations. But we know that phenotypes differ in their genetic architectures, and thus the power to detect selection on different phenotypes may vary considerably. This raises the possibility that biological processes (e.g., GO categories) picked up in genomic scans are not those on which there was the most selection but those on which selection tended to act on new, co-dominant mutations in regions of low recombination.
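The concern in the last sentence is a matter of simple arithmetic: if detection power differs between classes of selected loci, the set of detected loci over-represents the high-power class. Here is a sketch with hypothetical counts and detection probabilities (the numbers are invented for illustration and are not taken from the paper):

```python
def detected_composition(true_counts, power):
    """Expected share of each class among the loci a scan detects.

    true_counts: number of truly selected loci per class (hypothetical).
    power: probability that a locus of that class is flagged (hypothetical).
    """
    expected = {k: true_counts[k] * power[k] for k in true_counts}
    total = sum(expected.values())
    return {k: v / total for k, v in expected.items()}


# Hypothetical example: two equally common classes of true targets.
true_counts = {"new, co-dominant": 50, "standing or recessive": 50}
power = {"new, co-dominant": 0.8, "standing or recessive": 0.1}
print(detected_composition(true_counts, power))
```

With these made-up values the two classes are equally common among the true targets, yet about 89% of the expected detections (40 of 45) come from the new, co-dominant class.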