Saturday, December 15, 2007
In a previous post, I made the case that the evidence in Hawks et al. (2007) should not convince you that human adaptive evolution is accelerating. This is a follow-up (again, fairly technical) to that post. I'll reiterate that I find the theory largely convincing; if that's all you want to hear, you can stop reading here. Otherwise, below the fold I have some additional comments and respond to John Hawks's answers to my critiques.
1. It has been pointed out that the test for selection in Hawks et al. appears to have been applied to all the individuals in the HapMap. People familiar with the HapMap will know that the European and African samples consist of 30 trios, i.e., two parents and one child. This design provides excellent accuracy for phasing the parents; however, there are only 60 independent individuals per population, because the children's genotypes are simply reshufflings of their parents'. Both Wang et al. and Hawks et al. refer to "90" individuals in both the European and African populations. If all 90 individuals were in fact used in the analysis (as appears to be the case, though I'm sure I'll be corrected if it's not), this is potentially a major problem. Think about it this way: the test for selection is based on linkage disequilibrium structure, that is, the correlation between alleles at nearby loci. If you include related individuals, you introduce correlation simply because one-third of the individuals are recombined copies of the other two-thirds. Allele frequencies are distorted for similar reasons, since every allele a child carries is counted a second time. I'm not sure exactly how this would affect the results, but it's a highly non-standard analysis, and the burden of proof is on the authors to show that it's legitimate. I have my doubts, and I find it quite plausible that many (most?) of the "selection events" detected in this type of analysis are not selection at all, but rather something to do with the family structure of the sample.
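To make the allele-frequency part of this concrete, here is a minimal Monte Carlo sketch (Python, standard library only). It assumes a single biallelic locus in Hardy-Weinberg equilibrium with an arbitrary population frequency p = 0.3 and simple Mendelian transmission; it does not model LD at all, and the function names are hypothetical. A short calculation gives a sampling variance of p(1 - p)/120 for the 60 parents alone, about p(1 - p)/108 for all 90 individuals (every transmitted parental allele is counted twice), and p(1 - p)/180 for 90 truly unrelated individuals. So, under these assumptions, adding the children makes the estimate slightly noisier than using the parents alone, and nothing like a sample of 90.

```python
import random
import statistics


def trio_sample_frequencies(p, n_trios, rng):
    """Allele frequency at one biallelic locus from n_trios parent-offspring
    trios. Parents are unrelated and in Hardy-Weinberg proportions; each child
    inherits one randomly chosen allele from each parent.
    Returns (frequency from parents only, frequency from all individuals)."""
    parent_alleles, child_alleles = [], []
    for _ in range(n_trios):
        mother = [int(rng.random() < p) for _ in range(2)]
        father = [int(rng.random() < p) for _ in range(2)]
        parent_alleles += mother + father
        # The child's alleles are copies of parental alleles, not new draws.
        child_alleles += [rng.choice(mother), rng.choice(father)]
    n_parent, n_child = len(parent_alleles), len(child_alleles)
    f_parents = sum(parent_alleles) / n_parent
    f_all = (sum(parent_alleles) + sum(child_alleles)) / (n_parent + n_child)
    return f_parents, f_all


def unrelated_sample_frequency(p, n_individuals, rng):
    """Allele frequency from n_individuals unrelated diploids."""
    alleles = [int(rng.random() < p) for _ in range(2 * n_individuals)]
    return sum(alleles) / len(alleles)


def compare_variances(p=0.3, n_trios=30, reps=5_000, seed=1):
    """Monte Carlo variance of the allele-frequency estimate under three designs."""
    rng = random.Random(seed)
    parents, trios, unrelated = [], [], []
    for _ in range(reps):
        f_parents, f_all = trio_sample_frequencies(p, n_trios, rng)
        parents.append(f_parents)
        trios.append(f_all)
        unrelated.append(unrelated_sample_frequency(p, 3 * n_trios, rng))
    return {
        "60 unrelated parents": statistics.pvariance(parents),
        "90 individuals in 30 trios": statistics.pvariance(trios),
        "90 unrelated individuals": statistics.pvariance(unrelated),
    }


for design, var in compare_variances().items():
    print(f"{design}: {var:.5f}")
```

The printed variances should land near the three analytic values (about 0.00175, 0.00194, and 0.00117 for p = 0.3); the exact digits depend on the seed.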
2. Popgen Ramblings has a nice post explaining how one could support the acceleration hypothesis with simulations. I agree. For example, look at Figure 3 of Hawks et al. This figure purports to show the expected age distribution of selected variants under the null hypothesis of a constant rate of adaptive evolution and under the alternative of acceleration. Clearly, the "true" age distribution looks much more like the distribution expected under acceleration. But how realistic is that null distribution? One could simulate, under given demographic parameters, a fixed number of selected alleles arising 80,000 years ago, 70,000 years ago, and so on up to the present day, conditioning on the present allele frequency falling within the frequency range in which the LDD statistic is applied. Plotting the fraction of those selective events that are detected as a function of age would give an approximation to a realistic null distribution. And what would that distribution look like? No one can know until it's actually done, but I'm a betting man, and I'd wager large sums of money that it would look a lot like the "alternative" hypothesis (the "demographic model") shown in that figure.
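A crude sketch of that kind of null, under heavily simplified assumptions that are mine rather than Hawks et al.'s: deterministic logistic allele trajectories (no drift, constant N = 10,000), a 25-year generation time, a log-uniform grid of selection coefficients from 0.001 to 0.1, a hypothetical frequency window of 0.2 to 0.8 standing in for the LDD range, and one new selected allele per (age, s) pair, so the rate of adaptive evolution is constant by construction. For each age it reports the fraction of those alleles now inside the window and a rough "footprint" (the chance that a marker at recombination fraction 0.0001 is still on the haplotype the allele arose on). The footprint is only a stand-in for detection power, not the LDD statistic; a real version would use stochastic forward simulation under a realistic demographic model.

```python
import math

# Every parameter value here is an illustrative assumption, not a value from Hawks et al.
TWO_N = 20_000            # number of gene copies (assumed constant N = 10,000)
GEN_YEARS = 25            # assumed generation time, in years
FREQ_WINDOW = (0.2, 0.8)  # hypothetical stand-in for the LDD frequency range
R = 1e-4                  # recombination fraction to a flanking marker (assumed)
P0 = 1 / TWO_N            # a new mutation starts as a single copy


def present_frequency(s, t):
    """Frequency after t generations under deterministic logistic growth,
    dp/dt = s p (1 - p), starting from P0. Drift is ignored."""
    k = (1 - P0) / P0
    return 1 / (1 + k * math.exp(-s * t))


def footprint(s, p_t, r=R):
    """Approximate probability that a marker at recombination fraction r is
    still on the ancestral haplotype, using the deterministic result that the
    time-integral of (1 - p) from P0 up to p_t equals ln(p_t / P0) / s."""
    return math.exp(-r * math.log(p_t / P0) / s)


def null_age_table(ages_years=range(10_000, 90_000, 10_000), n_s=400):
    """Constant-rate null: one new selected allele per (age, s) pair, with s on
    a log-uniform grid from 0.001 to 0.1. Returns, for each age, a tuple of
    (age in years, fraction of alleles now inside FREQ_WINDOW,
    mean footprint of the alleles inside the window)."""
    s_grid = [10 ** (-3 + 2 * i / (n_s - 1)) for i in range(n_s)]
    rows = []
    for age in ages_years:
        t = age / GEN_YEARS
        hits = []
        for s in s_grid:
            p_t = present_frequency(s, t)
            if FREQ_WINDOW[0] <= p_t <= FREQ_WINDOW[1]:
                hits.append(footprint(s, p_t))
        mean_fp = sum(hits) / len(hits) if hits else float("nan")
        rows.append((age, len(hits) / n_s, mean_fp))
    return rows


for age, frac, fp in null_age_table():
    print(f"{age:>6} years: {frac:.3f} in window, mean footprint {fp:.2f}")
```

Under these particular assumptions, the fraction of alleles landing in the window comes out roughly flat across ages, while the mean footprint shrinks steadily with age, so older selected alleles would be harder to detect even though they arose at the same rate. Whether that survives drift and realistic demography is exactly what a proper simulation would have to show.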
3. Below are some excerpts from John Hawks's response to my previous comments, each followed by my thoughts:
we won't detect just any recent things -- in fact, we will not be able to detect recent things that are weakly selected. By contrast, we should detect older things that are weakly selected, but we will never detect older things that were strongly selected -- they're the ones that are fixed now.
The claim that we should detect older, weakly selected variants has not been demonstrated, and I find it unlikely to be true. Remember, LD decays with time, so there should be little signal around old selected variants. Again, simulations could address this.
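The standard result behind the remark that LD decays with time: in a very large randomly mating population, the expected disequilibrium D between two loci with recombination fraction r falls as D_t = D_0 (1 - r)^t. The sketch below just evaluates that formula; r = 0.001 (roughly 100 kb at an assumed 1 cM/Mb) and a 25-year generation time are illustrative assumptions. It ignores drift and the frequency change of a selected allele, so treat it as intuition about time scales rather than a model of the LDD test.

```python
import math

GEN_YEARS = 25  # assumed generation time, in years


def expected_ld(d0, r, generations):
    """Expected D after `generations` of random mating in a very large
    population, starting from D = d0, with recombination fraction r."""
    return d0 * (1 - r) ** generations


def ld_half_life(r):
    """Generations until the expected D falls to half its starting value."""
    return math.log(0.5) / math.log(1 - r)


for years in (10_000, 80_000):
    remaining = expected_ld(1.0, 1e-3, years / GEN_YEARS)
    print(f"{years} years: {remaining:.3f} of the initial D remains at r = 0.001")
print(f"half-life at r = 0.001: {ld_half_life(1e-3):.0f} generations")
```

With these assumptions it prints about 0.670 and 0.041 of the initial D remaining, and a half-life of about 693 generations (roughly 17,000 years).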
In theory, strongly selected mutations ought to be vanishingly rare. In fact, they ought to be exponentially rarer than weakly selected mutations. That doesn't mean the theory has to be right, but it does mean we need some kind of explanation if we find that weakly selected things are rare, and strongly selected ones are common -- I mean, R. A. Fisher was wrong sometimes, but I'm not going out on a limb on this one.
Acceleration can explain this reversal
A more parsimonious explanation for this "reversal" is, again, statistical power. Power obviously varies with the selection coefficient: this test will detect alleles that have been strongly selected (if it detects selection at all; see above), not those that have been weakly selected. So even if the age distribution of selected alleles isn't a statistical artefact, this "reversal" clearly is (though I suppose I could be proved wrong, again with simulations).
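One simple way to see why power should depend so strongly on s, under a textbook deterministic model that is my assumption and not the test Hawks et al. actually used: suppose the selected allele's frequency follows dp/dt = s p(1 - p) starting from p_0 = 1/(2N), and recombination at rate r with non-carrier chromosomes erodes the haplotype the allele arose on. Then the time to reach frequency p, the accumulated exposure to recombination, and the expected fraction F of the ancestral haplotype still attached at distance r are approximately:

```latex
% Deterministic sweep from p_0 = 1/(2N) under dp/dt = s p (1 - p)
T(p) = \frac{1}{s}\,\ln\frac{p\,(1-p_0)}{p_0\,(1-p)}

% Recombination only breaks the haplotype when the partner chromosome is a non-carrier
\int_0^{T(p)} \bigl(1 - p(t)\bigr)\,dt
  = \int_{p_0}^{p} \frac{1-x}{s\,x\,(1-x)}\,dx
  = \frac{1}{s}\,\ln\frac{p}{p_0}

% Expected fraction of the ancestral haplotype still linked at recombination fraction r
F(r, s) \approx \exp\!\Bigl(-\frac{r}{s}\,\ln\frac{p}{p_0}\Bigr) = \Bigl(\frac{p}{p_0}\Bigr)^{-r/s}
```

With p = 0.5, p_0 = 1/20,000 (an assumed N of 10,000), and r = 0.001, this gives F of about 0.91 for s = 0.1, about 0.40 for s = 0.01, and about 10^-4 for s = 0.001: at the same present frequency, a weakly selected allele carries essentially no haplotype signal.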
Strikingly, we found that increasing the SNP density in the new HapMap made very little difference to the number of selected variants estimated for the CEU sample -- we believe this is because we are finding basically everything there for the method to find. This leaves significant limits -- for instance, the limited frequency window we used. But we don't think we are missing lots of selection in high-recombination regions.
The reason SNP density made little difference in the CEU population is that LD is extensive in that population, and the Phase I data were sufficient to characterize it. The test takes LD as a parameter, so if you already have a good estimate of that parameter, more data don't help much. The inference that the test is therefore not missing selection in high-recombination regions simply does not follow; that is a property of the test statistic that has not been demonstrated. One could check it by simulating fully ascertained data in regions of varying recombination rate. To my knowledge, this has not been done.
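For readers less familiar with how LD is quantified, here is a minimal sketch of the usual pairwise measure, r^2, estimated from phased haplotypes coded as 0/1 alleles. This is not the LDD statistic itself; the function name and toy data are hypothetical. It illustrates the point about SNP density: a newly typed SNP in near-perfect LD with one already genotyped (r^2 close to 1) adds almost no new information about the local LD structure.

```python
def pairwise_r2(haplotypes, i, j):
    """Estimate r^2 between biallelic SNPs i and j from phased haplotypes,
    each a sequence of 0/1 alleles (1 = the allele being counted)."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ij - p_i * p_j  # disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical toy data: SNP 1 perfectly tags SNP 0; SNP 2 is independent of SNP 0.
haps = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(pairwise_r2(haps, 0, 1))  # 1.0
print(pairwise_r2(haps, 0, 2))  # 0.0
```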
Recent genetic drift including founder effects would affect all genomic regions equally, but the candidate selected genes occur predominantly in genic regions, and preferentially include genes in functional classes that are plausible targets for recent adaptive changes. Selection is the only explanation consistent with all these features.
It is well known that different functional classes of genes (and different parts of the genome) vary systematically in recombination rate, LD structure, gene length, and many other metrics. A change in power along any of these axes could equally produce this observation without invoking natural selection. On its own, then, this is weak evidence that the test is detecting selection (though I grant that it is some evidence).
In other words, our tests of acceleration do not depend very finely on the ascertainment of these alleles
As I said, I find the theory solid. However, the test for acceleration does depend on the ascertainment of these alleles to some extent. Neither John Hawks nor I has any idea whether 5%, 50%, or 0% of these "selective events" are real. This is a problem.