Thursday, April 19, 2007
We've followed much of the story of the gene ASPM here, largely because of the evidence that it has been under selection both along the lineage leading to humans and up to the current day. Evidence for the former has been provided by a number of studies, while evidence for the latter comes largely from a paper out of Bruce Lahn's lab. The conclusion that the locus is currently under selection was challenged (poorly) in a technical comment in Science last year, and now another technical comment challenges it again.
This challenge, unlike the last one, cannot be lightly dismissed. A group led by David Reich, prior to the publication of the Lahn paper, had sequenced select parts of the ASPM gene and begun their own analyses of the gene. As they reported last year at a conference, they find no evidence for selection at the locus.
How could these two groups come to such different conclusions? It's perhaps worthwhile to highlight the major difference between the two methodologies: Lahn's group determined the statistical significance of their data using coalescent simulations (that is, they simulated data under a number of different models and asked whether their actual data are an "outlier" when compared to the simulations), while Reich et al. prefer to compare their data to an empirical distribution (that is, they ask whether the region in question is an "outlier" compared to other genomic regions). Both of these methods have their problems, something worth keeping in mind the next time you see a small p-value on a statistic. In particular, if one is unable to simulate the precise demographic history of a population, simulation will give biased p-values. Tests based on the empirical distribution generally don't have this issue, but instead assume that only a small proportion of the genome is under selection and that selected loci will be outliers. These are not just questionable assumptions; they have indeed been questioned, and the conclusions are not heartening.
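To make the contrast concrete, here is a minimal sketch in Python. Everything in it is illustrative: the function name tail_p_value, the Gaussian "statistics", and the observed value are assumptions made up for the example, not anything taken from either group's analysis. The point is only that the same tail calculation gives different answers depending on whether the reference set comes from a simulated neutral model or from the rest of the genome.

```python
import random

def tail_p_value(observed, reference):
    """Upper-tail p-value: share of reference values >= observed.

    The +1 terms count the observed value itself, so the result is never 0.
    """
    at_least = sum(1 for value in reference if value >= observed)
    return (at_least + 1) / (len(reference) + 1)

rng = random.Random(0)

# Assumption: the statistic as it would look under a simulated neutral model.
simulated_null = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Assumption: the same statistic measured at other genomic regions, which is
# more spread out here because real demography and some selected loci are
# folded into it.
genome_wide = [rng.gauss(0.0, 1.5) for _ in range(10_000)]

observed = 2.5  # hypothetical value of the statistic at the locus of interest

p_simulation = tail_p_value(observed, simulated_null)
p_empirical = tail_p_value(observed, genome_wide)
print(f"simulation-based p = {p_simulation:.4f}")
print(f"empirical p        = {p_empirical:.4f}")
```

In this toy setup the locus looks far more extreme against the simulated null than against the genome-wide background, which is the general shape of the disagreement: neither number is "the" p-value, since each inherits the assumptions of its reference distribution.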
The figure on the right is from a paper published in Genome Research that asks, appropriately enough, "How reliable are empirical genomic scans for selective sweeps?". It shows a color-coded landscape for the false negative rate (the percentage of selected loci that will be missed in an empirical genome scan) for a population that has experienced a bottleneck (like Europeans) for different statistics and different assumptions about significance thresholds and the percentage of the genome under selection. Red areas are where the false negative rate is near 100%, fading to green at about 50% and finally blue (of which there isn't much in this figure) at 0%. As the authors conclude, "Our simulations suggest that while empirical approaches will identify several interesting candidates, they will also miss many--in some cases, most--loci of interest".
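The logic behind that false negative rate can be sketched with a toy calculation. This is not the paper's method (they used coalescent simulations); it is a simplified stand-in in which the test statistic is just a Gaussian draw, and the function name, the shift of 1.0 for selected loci, and the proportions are all assumptions chosen for illustration.

```python
import random

def false_negative_rate(neutral, selected, tail_fraction):
    """Share of selected loci that fall outside the top `tail_fraction`
    of the pooled (genome-wide) distribution of the statistic."""
    scores = sorted(neutral + selected, reverse=True)
    n_called = max(1, int(len(scores) * tail_fraction))
    threshold = scores[n_called - 1]  # lowest score still called an outlier
    missed = sum(1 for score in selected if score < threshold)
    return missed / len(selected)

rng = random.Random(1)

# Assumption: 10% of loci are under selection, and selection shifts the
# statistic upward by one standard deviation.
neutral_scores = [rng.gauss(0.0, 1.0) for _ in range(9_000)]
selected_scores = [rng.gauss(1.0, 1.0) for _ in range(1_000)]

# Call the top 1% of the genome "outliers", as an empirical scan might.
fnr = false_negative_rate(neutral_scores, selected_scores, 0.01)
print(f"false negative rate = {fnr:.2f}")
```

Note that when more loci are under selection than fit in the chosen tail (here 1,000 selected loci and only 100 outlier slots), most of them must be missed no matter how good the statistic is.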
So it's certainly possible that this is a case study in the lack of power of empirical distributions to detect selected loci. Of course, maybe not. The Reich et al. analysis certainly does not lend support to the contention that ASPM is under selection, but neither does it rule it out. It's clear that better tests for selection are needed. Lahn et al. have apparently decided not to respond to Reich et al.; they're likely thinking that this debate will not be resolved until those better tests are devised.