More notes on acceleration


In a previous post, I made the case that the evidence in Hawks et al. (2007)[pdf] should not convince you that human adaptive evolution is accelerating. This is a follow-up (again fairly technical) to that post. Again, I’ll reiterate that I find the theory largely convincing. If that’s all you want to hear, don’t keep reading. Otherwise, below the fold I have some additional comments and respond to John Hawks’s answers to my critiques.

1. It has been pointed out that the test for selection used in Hawks et al. appears to have been applied to all the individuals in the HapMap. People familiar with the HapMap will know that the European and African samples are in 30 trios, i.e., two parents and one child. This provides excellent accuracy for phasing the parents; however, there are only 60 independent individuals per population. The genotypes of the children are simply reshufflings of the parents. Both Wang et al. and Hawks et al. refer to “90” individuals in both the European and African populations. If it is true (as it appears to be, though I’m sure I’ll be corrected if it’s not) that all 90 individuals were used in the analysis, this is potentially a major problem. Think about it this way: the test for selection is based on linkage disequilibrium structure, which is the correlation between alleles at nearby loci. Now if you include related individuals, you introduce correlation simply due to the fact that one-third of the individuals are rearrangements of the other two-thirds. Allele frequencies, for similar reasons, are also obviously biased. I’m not sure exactly how this would affect the results, but it’s a highly non-standard analysis, and the burden of proof is on the authors to test whether it’s legitimate. I have my doubts, and find it quite plausible that many (most?) of the “selection events” detected in this type of analysis are not selection at all, but rather something having to do with the structured nature of the sample.
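To see why the family structure matters, here is a toy simulation — pure illustration, with hypothetical parameters; this is not the LDD test — showing that counting the 30 children as extra samples does not make an allele frequency estimate any more precise than using the 60 parents alone, because the children’s alleles are resamples of parental alleles.

```python
import random

random.seed(1)

def allele_freq_sd(include_children, p=0.3, n_trios=30, reps=2000):
    """SD of the allele frequency estimate across simulated cohorts."""
    estimates = []
    for _ in range(reps):
        # 60 unrelated parents, each carrying two independent haplotypes
        parents = [[int(random.random() < p), int(random.random() < p)]
                   for _ in range(2 * n_trios)]
        alleles = [a for g in parents for a in g]
        if include_children:
            # each child receives one transmitted allele from each parent
            for i in range(n_trios):
                mom, dad = parents[2 * i], parents[2 * i + 1]
                alleles += [random.choice(mom), random.choice(dad)]
        estimates.append(sum(alleles) / len(alleles))
    mean = sum(estimates) / reps
    return (sum((e - mean) ** 2 for e in estimates) / reps) ** 0.5

sd_parents = allele_freq_sd(False)    # 60 independent individuals
sd_with_kids = allele_freq_sd(True)   # nominal "90", one-third redundant
```

In this toy model the “90-individual” estimate is actually slightly noisier than the parents-only one (transmitted alleles are double-counted), and far noisier than 90 truly independent individuals would give — which is the sense in which treating trios as 90 samples is non-standard.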

2. Popgen Ramblings has a nice post explaining how one could provide support for the acceleration hypothesis through simulations. I agree. For example, let’s look at Figure 3 from Hawks et al. This figure purports to show the expected age distribution of selected variants under the null hypothesis of a constant rate of adaptive evolution and under the alternative of an acceleration. Clearly, the “true” age distribution looks much more like the distribution expected under acceleration. But how realistic is that null distribution? To find out, one could simulate, under certain demographic parameters, a fixed number of selected alleles arising 80,000 years ago, 70,000 years ago, etc., up to the present day, conditioning on the present allele frequency being in the frequency range of the LDD statistic. If one were to plot the fraction of those selective events that are detected as a function of the age, that would be something of an approximation to a real null distribution. And what would that distribution look like? Well, no one can know until it’s actually done, but I’m a betting man, and I’d wager large sums of money that it would look a lot like the “alternative” hypothesis (the “demographic model”) shown in this figure.
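Just to sketch the shape such a null could take, here is a deliberately crude toy model — logistic sweep trajectories, deterministic frequencies, and a made-up exponential distribution of selection coefficients; none of this is the actual LDD machinery. Sweeps arise at a perfectly constant rate over 80,000 years, yet the ages of the ones that land in the ascertainment window pile up toward recent times:

```python
import math
import random

random.seed(2)

GEN = 25                # years per generation (assumed)
P0 = 1e-4               # starting frequency of a new selected allele
WINDOW = (0.22, 0.78)   # LDD ascertainment frequency window

def freq_now(age_years, s):
    """Deterministic logistic trajectory of a selected allele."""
    x = s * age_years / GEN
    if x > 700:                 # effectively fixed; avoid math.exp overflow
        return 1.0
    odds = (P0 / (1 - P0)) * math.exp(x)
    return odds / (1 + odds)

# constant input: sweep origin times uniform over the last 80,000 years,
# selection coefficients exponential with (assumed) mean 0.02
detected_ages = []
for _ in range(200_000):
    age = random.uniform(0, 80_000)
    s = random.expovariate(1 / 0.02)
    if WINDOW[0] <= freq_now(age, s) <= WINDOW[1]:
        detected_ages.append(age)

detected_ages.sort()
median_age = detected_ages[len(detected_ages) // 2]
```

Even with a constant rate of adaptive substitution, the median age of the detected sweeps comes out far below the 40,000-year median of the input, simply because the frequency window is a strong filter on age given s. That is the sense in which the “null” age distribution needs to be simulated rather than assumed.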

3. Some excerpts from John Hawks’s response to my previous comments are in italics, followed by my thoughts:

we won’t detect just any recent things — in fact, we will not be able to detect recent things that are weakly selected. By contrast, we should detect older things that are weakly selected, but we will never detect older things that were strongly selected — they’re the ones that are fixed now.

The part I’ve bolded has not been demonstrated, and I find it unlikely to be true. Remember, LD decays with time, so there should be little signal around old selected variants. Again, simulations could address this.

In theory, strongly selected mutations ought to be vanishingly rare. In fact, they ought to be exponentially rarer than weakly selected mutations. That doesn’t mean the theory has to be right, but it does mean we need some kind of explanation if we find that weakly selected things are rare, and strongly selected ones are common — I mean, R. A. Fisher was wrong sometimes, but I’m not going out on a limb on this one.

Acceleration can explain this reversal

A more parsimonious explanation for this “reversal” is again statistical power. Statistical power absolutely, obviously varies with selection coefficient– this test is going to detect things that have been strongly selected (if it detects selection at all; see above), and not things that have been weakly selected. So even if the age distribution of selected alleles isn’t a statistical artefact, this “reversal” clearly is (though I suppose I could be proved wrong, again with simulations).

Strikingly, we found that increasing the SNP density in the new HapMap made very little difference to the number of selected variants estimated for the CEU sample — we believe this is because we are finding basically everything there for the method to find. This leaves significant limits — for instance, the limited frequency window we used. But we don’t think we are missing lots of selection in high-recombination regions.

The reason that SNP density made little difference in the CEU population is that there is extensive LD in that population, and the phase I data were sufficient to characterize that LD. The test takes LD as a parameter, so if you already had a good estimation of this parameter, increased information doesn’t help. The inference that this thus means the test isn’t missing selection in high-recombination regions simply does not follow–that is a property of the test statistic that has not been demonstrated. One could simulate fully ascertained data in regions of varying recombination rate and test this. To my knowledge, this has not been done.

Recent genetic drift including founder effects would affect all genomic regions equally, but the candidate selected genes occur predominantly in genic regions, and preferentially include genes in functional classes that are plausible targets for recent adaptive changes. Selection is the only explanation consistent with all these features.

It is well-known that different functional classes of genes (and different parts of the genome) vary systematically in recombination rate, LD structure, gene length, and many other metrics. A change in power along one of these axes could equally lead to this observation, without the need to invoke natural selection. This alone is not evidence that the test is detecting anything (though it’s true it provides some evidence).

In other words, our tests of acceleration do not depend very finely on the ascertainment of these alleles

Like I said, I find the theory solid. However, the test for acceleration does depend on the ascertainment of these alleles to a certain extent. Neither I nor John Hawks has any idea if 5%, 50% or 0% of these “selective events” are real. This is a problem.

29 Comments

  1. Recent genetic drift including founder effects would affect all genomic regions equally, but the candidate selected genes occur predominantly in genic regions, and preferentially include genes in functional classes that are plausible targets for recent adaptive changes. Selection is the only explanation consistent with all these features. 
     
    I’m not an expert on the HapMap data, but aren’t they enriched for SNPs in protein coding genes? Would this bias affect their results and conclusions?

  2. Like I said, I find the theory solid. However, the test for acceleration does depend on the ascertainment of these alleles to a certain extent. Neither I nor John Hawks has any idea if 5%, 50% or 0% of these “selective events” are real. This is a problem. 
     
    Now wait a minute. I can take honest questions and I’ve answered them quickly. 
     
    But this is just nonsense and you should know it. If you are going to make a claim like this, you need to provide some plausible neutral mechanism that would explain how an allele in any recombining region of the genome gets to a frequency of 30 percent while strongly linked to a 200 kilobase haplotype.  
     
    Because you will need such a mechanism to explain the recent peaks in those data.  
     
    I can believe that an occasional neutral haplotype makes it up past the threshold — at or above 22 percent in the sample, with strong linkage over many kilobases. But none of these will be in those recent peaks. If 95 percent of the data were like this, then they would all look like they were 40,000 or 60,000 years old. 
     
    I’m not going to argue about whether we have no false positives, or whether we might have 5 percent false positives. If you had some mechanism that you thought would make 20 percent false positives I would listen to you, but you’d have to work pretty hard to convince me.  
     
    I will say that we were concerned about false positives in the low-recombination regions near the centromeres. So we threw them out.  
     
    In contrast, I know many mechanisms that will result in false negatives — real selection we have missed. We aren’t finding selection for regions where more than one allele has been selected. We’re not finding it outside our ascertainment range. We’re likely to miss some at the oldest extreme of the time range.  
     
    These false negatives have one thing in common: if we could count them, they would make the evidence for acceleration stronger.

  3. Hawks: we won’t detect just any recent things — in fact, we will not be able to detect recent things that are weakly selected. By contrast, we should detect older things that are weakly selected, but we will never detect older things that were strongly selected — they’re the ones that are fixed now. 
     
    p-ter: The part I’ve bolded has not been demonstrated, and I find it unlikely to be true. Remember, LD decays with time, so there should be little signal around old selected variants. Again, simulations could address this.  
     
    The bolded part is obviously true. We can’t detect old, strongly selected events because they are fixed. That only leaves old, weakly selected events. In theory, these should be much more frequent than strongly selected alleles, unless there was an acceleration.  
     
    p-ter’s argument here is that we have missed such a large fraction of the old, weakly selected events that it has skewed the overall distribution.  
     
    1. This is directly contrary to his other argument, that we are counting so many false positives that it has skewed the distribution. The false positives would be old-looking. The contradiction doesn’t make the false negative argument wrong, but they can’t both bias the distribution.  
     
    2. If we’re missing lots of old events due to LD decay, then we should be missing more of them in the population with less LD. But we are finding more old events in Africans, by a factor of 2 or more. That’s consistent with demography, it’s inconsistent with ascertainment bias.  
     
    It also tends to show why this doesn’t follow:  
     
    The reason that SNP density made little difference in the CEU population is that there is extensive LD in that population, and the phase I data were sufficient to characterize that LD. The test takes LD as a parameter, so if you already had a good estimation of this parameter, increased information doesn’t help. The inference that this thus means the test isn’t missing selection in high-recombination regions simply does not follow–that is a property of the test statistic that has not been demonstrated. 
     
    We only know that the phase 1 data were sufficient because of our results using the phase 2 data. This certainly did not have to be true genome-wide, and it was a plausible hypothesis that the phase 1 results would miss selection in regions with faster LD decay. The current results test that hypothesis by applying a higher SNP density.  
     
    There’s nothing impossible about us missing events in high recombination regions. But I’m still unclear about why *missing* selection is going to *weaken* our results.

  4. It is well-known that different functional classes of genes (and different parts of the genome) vary systematically in recombination rate, LD structure, gene length, and many other metrics. A change in power along one of these axes could equally lead to this observation, without the need to invoke natural selection. This alone is not evidence that the test is detecting anything (though it’s true it provides some evidence).  
     
    I’ve bolded this part because it reflects a very pervasive bias, which I suspect p-ter doesn’t hold, but that most human geneticists do. What exactly is the purpose of the word “invoke” if not to reinforce the assumption that selection is unusual and strange?  
     
    I reject that formulation as sloppy wording. Selection happens. We have tested the hypothesis that it happened in humans at a constant rate. Disproof of that hypothesis is not “invoking” selection. Selection unquestionably has happened, it has unquestionably been very common, and our hypothesis concerns its rate over time.  
     
    Aside from that, you present a very interesting hypothesis for the concentration of recent selection in functional classes related to the adaptations for the agricultural revolution, without actually needing agriculture to have been invented.  
     
    It seems to me that this hypothesis is readily tested by looking for enrichment in functional categories in a non-agriculture-using population. Let’s take maize, for example:  
     
    Candidate selected genes with putative function in plant growth are clustered near quantitative trait loci that contribute to phenotypic differences between maize and teosinte. If we assume that our sample of genes is representative, 1200 genes throughout the maize genome have been affected by artificial selection. 
     
    (10.1126/science.1107891) 
     
    Looks a lot like people, except that the enriched genes are the ones related to the ecological change. Just like in humans.

  5. some plausible neutral mechanism that would explain how an allele in any recombining region of the genome gets to a frequency of 30 percent while strongly linked to a 200 kilobase haplotype. 
     
    what if it’s really a rare haplotype (which could plausibly be on a long background) and you’ve biased your allele frequencies and haplotype lengths upwards by including the children from the hapmap? or what if it’s a region of low recombination that passes through a population bottleneck? if you don’t think it can happen, show me the simulations.  
     
    p-ter’s argument here is that we have missed such a large fraction of the old, weakly selected events that it has skewed the overall distribution. 
     
    1. This is directly contrary to his other argument, that we are counting so many false positives that it has skewed the distribution. The false positives would be old-looking. The contradiction doesn’t make the false negative argument wrong, but they can’t both bias the distribution.
     
     
    assume, for the sake of argument, a constant low rate. now assume, again for the sake of argument, you have both a large false positive rate and no power to detect old sweeps. your false positives will all be recent, and you’ll have the exact age distribution you see.  
     
    If you had some mechanism that you thought would make 20 percent false positives I would listen to you, but you’d have to work pretty hard to convince me. 
     
    run the simulations. you might be surprised at the range of things neutral evolution can do with reasonable demographic models. the burden of proof is on you, not on anyone else.  
     
    The bolded part is obviously true. We can’t detect old, strongly selected events because they are fixed. That only leaves old, weakly selected events. 
     
    that’s my point. I doubt you can detect old, weakly selected events, because they’re weakly selected. nowhere have you provided evidence to support this.

  6. There’s nothing impossible about us missing events in high recombination regions. But I’m still unclear about why *missing* selection is going to *weaken* our results. 
     
    fair enough. If you assume N alleles are headed to fixation in humans (which assumes as well that selection remains constant on them in the future), then I do think you can argue for some N this is inconsistent with a constant rate of adaptation. I don’t believe your N, though, and I don’t think you should either.

  7. What exactly is the purpose of the word “invoke” if not to reinforce the assumption that selection is unusual and strange? 
     
    you have a set of genes that are enriched in some functional categories. you claim the enrichment is because they’re selected. it is also possible that the enrichment is because you’re pulling out classes of genes that have low recombination rates, for example.  
     
    claiming that some GO categories are related to the agricultural revolution is analogous to reading tea leaves. for most of these variants, the actual function is unknown. in the maize paper, there’s clustering near quantitative trait loci. QTLs, which are known to affect a certain trait, are very different than GO categories. if you saw clustering near association or linkage signals for important traits, I would absolutely believe you.

  8. Neither I nor John Hawks has any idea if 5%, 50% or 0% of these “selective events” are real. 
     
    ok, this was something of an exaggeration. the data probably isn’t 100% false positives :) 
     
    but you should want a ballpark estimate of what that false positive rate is–it’s important in your downstream analyses–and you don’t have it. I do think this is a problem, and I don’t understand why you guys didn’t try really hard to get a defensible figure.  
     
    unless you’re thinking along the lines I mentioned above–if there are N things going to fixation, there’s some N that is inconsistent with a constant rate, and you don’t really care about the precise number. But you have to understand, you actually make some pretty bold claims about N–if it’s not that important, why not just write that?

  9. but you should want a ballpark estimate of what that false positive rate is–it’s important in your downstream analyses–and you don’t have it. I do think this is a problem, and I don’t understand why you guys didn’t try really hard to get a defensible figure.  
     
    Of course we have an estimate of the false positive rate: Wang et al. (2006) did simulations to establish the threshold. You have expressed dissatisfaction with those simulations, but that doesn’t mean they don’t exist! 
     
    I understand that you don’t believe them. I don’t particularly expect you to change your mind; I’m sure you have good reasons I don’t know about. But you can’t say we don’t have a “defensible” figure from simulations!

  10. what if it’s really a rare haplotype (which could plausibly be on a long background) and you’ve biased your allele frequencies and haplotype lengths upwards by including the children from the hapmap? 
     
    The LDD test on these samples is only considering alleles over 22 percent. None of these are rare.  
     
    You are completely correct that if we were considering really rare haplotypes we would see long haplotypes once in a while. We are not.  
     
    Also, you understand that the LDD test is looking for differences between haplotypes in homozygotes. Including the kids doesn’t bias this analysis, unless they are clones.  
     
    You are quite correct that if we were using phased data this would be a bias. We are not.

  11. One other thing: we have a control on the total number of old strong events, because we can look for these in other ways. These fixed events should have strong depressions in heterozygosity around them. If they were common, they would be reducing heterozygosity genome-wide. Also, we can look for them by Fst between populations — fixed in one place, but not in others. We don’t find many that way, either.  
     
    There just aren’t that many old events.

  12. thanks for the plug p-ter. I’ve enjoyed reading your posts. I’ve posted again on the topic. I agree with your comments, especially that the burden of evidence lies with Hawks et al. The literature has plenty of models of human demography, which would provide a suitable starting place for simulations (though all of them have their critics, and rightly so).  
     
    John, you argue from the starting position that there is a lot of selection. This is a perfectly valid world view (and one that I’m tempted to share), but you can not start from that position if you want to convince people that there is a lot of selection (or that there is more than you would expect given historical population sizes).

  13. Of course we have an estimate of the false positive rate: Wang et al. (2006) did simulations to establish the threshold. 
     
    so what’s the false positive rate?  
     
    I know you’ve criticized people for using ridiculous demographic models to argue against selection, and I agree with you. But wang et al. didn’t simulate under *any* model at all–they did entirely model-free permutations of the data, which should largely remove both the effects of selection and demography from the data. why not use a legit model and simulate with that as the null? it’s not bulletproof, but if you vary the parameters within what is reasonable, it’s pretty solid.  
     
    this is especially true if you simulated with different recombination rates or allele ages. Your false positive rate varies with these things (you removed centromeric regions just for this reason), but you can correct for this if you have some estimate of how it varies.  
     
    Also, you understand that the LDD test is looking for differences between haplotypes in homozygotes. Including the kids doesn’t bias this analysis, unless they are clones. 
     
    so if you don’t include the kids, you get the same results? I’m genuinely curious–if parent A and parent B are homozygous for a SNP, and you then include the kid, you add another homozygote to the analysis, but that addition isn’t independent (it’s obviously one haplotype from the mother and one from the father, so in some sense each of the transmitted haplotypes is in the data twice). this makes me uncomfortable–am I completely off-base?
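On the permutation point, here is a toy illustration — a two-state Markov chain standing in for LD; this is not the LDD statistic — of how shuffling SNP positions destroys the correlation structure, so that a haplotype-run statistic looks far more extreme in the real data than in the permuted “null”, even with no selection anywhere:

```python
import random

random.seed(3)

def markov_haplotype(n=500, flip=0.05):
    """Haplotype with correlated adjacent alleles (a stand-in for LD)."""
    h = [random.randint(0, 1)]
    for _ in range(n - 1):
        h.append(h[-1] ^ int(random.random() < flip))
    return h

def longest_shared_run(h1, h2):
    """Length of the longest stretch where two haplotypes agree."""
    best = cur = 0
    for a, b in zip(h1, h2):
        cur = cur + 1 if a == b else 0
        best = max(best, cur)
    return best

haps = [markov_haplotype() for _ in range(40)]
real = sum(longest_shared_run(haps[i], haps[i + 1])
           for i in range(0, 40, 2)) / 20

# permute SNP positions identically in every haplotype (a column shuffle)
order = list(range(500))
random.shuffle(order)
perm = [[h[j] for j in order] for h in haps]
shuffled = sum(longest_shared_run(perm[i], perm[i + 1])
               for i in range(0, 40, 2)) / 20
```

Neither selection nor demography appears in this toy, yet the statistic on the correlated data dwarfs the one on permuted data — so a threshold calibrated on a permutation null can be far too lenient for an LD-based test.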

  14. The danger of including the children in the test is not only bias (in the statistical sense) but also an increased variance of the test statistic, which can inflate the false positive rate. Including data from the parents again via the child (which has been done whether or not you use haplotypes) could well inflate the test, as you have less independent data than you think you do. Clearly children are not clones, but they do share one haplotype with each parent. 
     
    Now hopefully in the simulations of Wang et al., they formed trios from their simulated individuals before performing their test. Did they? If they did not, then the cutoff is not even a suitable cutoff from their simulations, as the potential inflation of false positives (due to including the children) is not accounted for.

  15. so if you don’t include the kids, you get the same results? I’m genuinely curious–if parent A and parent B are homozygous for a SNP, and you then include the kid, you add another homozygote to the analysis, but that addition isn’t independent (it’s obviously one haplotype from the mother and one from the father, so in some sense each of the transmitted haplotypes is in the data twice). this makes me uncomfortable–am I completely off-base? 
     
    The test has less power because there is a smaller sample. Some of the haplotypes are in twice, but each contrast is in only once. The test uses the contrasts. The contrasts are independent if the parents are random mating.  
     
    G is correct, this increases the variance of the statistic. The simulations were randomizations; if the parents were random mating this is appropriate. If they weren’t random mating (e.g., ten pairs from the same village), then the HapMap has more problems than ours!

  16. I have to apologize for being short — Greg says I’m playing the bad cop today. I didn’t realize you guys hadn’t read the earlier paper. Now it is all clear to me.  
     
    From Wang et al 2006: 
     
    An ALnLH of >0.71 (>2.6 SD, Fig. 3) was never observed in this extreme simulation model of population bottlenecks/admixture. Therefore, bottlenecks/admixture of a less extreme (and more likely) size for human populations will also never produce an ALnLH >2.6 SD from the average.  
     
    The estimate: zero false positives, even with extreme admixture and bottleneck models. It’s a conservative threshold. Like I said, I can imagine there are some number hiding in there, but false negatives are the more likely concern, for which we have stacked the deck conservatively.

  17. One nice thing about this issue and the discussion of it here, from a layman’s point of view, is that you know that if you work hard enough you can follow the argument and make some independent judgment. In contrast to one of Lubos Motl’s posts, for example, where ten lifetimes wouldn’t suffice. That’s one of the nice things about the frontiers of biological, or at least, genetic science.

  18. The contrasts are independent if the parents are random mating.  
     
    Sorry for butting in here, since this interesting discussion is way over my head, but that doesn’t sound right. The parents aren’t “random mating” because they are mating with other parents in the sample – any analysis of the parents shouldn’t include the children, and any analysis of the children shouldn’t include the parents. 
     
    In fact, the parents are probably mating with people from the same or nearby villages, so it isn’t random mating at all.

  19. Another possibility is to include one parent and one child. This doesn’t solve the problem that actual mating isn’t really random, but it does solve the problem that, at least theoretically, the parent can mate with anyone. 
     
    I hope I’m making myself clear – the problem is that you’re missing all the cases where matings introduce genes that didn’t happen to make it into your original sample of the parents, as would occur in a genuine random sample.

  20. The test has less power because there is a smaller sample 
     
    so the results are different if you only include the independent individuals? how different?

  21. Thanks John. What I meant with my question about the children was actually: are pseudo children formed by combining haplotypes from the simulated parents, before the analysis is run on the simulated data. If not then this ignores the structure present in the data due to the replication of haplotypes in the children. This could deflate the statistic in the simulations.

  22. It is reassuring that extreme bottlenecks do not give you false positives. But that does not mean that intermediate strength bottlenecks (or bottlenecks simulated under standard population genetic models) will not. There is no automatic guarantee of a monotonic relationship between the average (or its tails) of any population genetic summary statistic and the severity of a demographic event.  
     
    The LDD test of Wang et al. relies on contrasts of two allelic haplotype backgrounds. Overly severe bottleneck simulations might reduce haplotype diversity beyond the point at which high LDD can be generated. 
     
    The simulations shown in Figure 6 of the supplementary material of Wang et al. do not look much like the real data. Now obviously this is in part because the real data have selective sweeps in them, and so there are potentially more extreme values. However, worryingly, the rest of the simulated distribution (below the line, around zero) also looks unlike the real data, which suggests that the simulations are potentially not appropriate. 
     
    Now obviously you are convinced by the simulations and so should think that there is a lot of selection. However, I am skeptical of their relevance (though I’m sure that the LDD method detects some interesting regions), and so I am not convinced that Hawks et al. have shown that there is a lot of/too much recent selection.

  23. The estimate: zero false positives, even with extreme admixture and bottleneck models. 
     
    this is a very, very strong claim. certainly you realize this? every data analysis in genomics has false positives–you’re testing 3.2 million hypotheses; sometimes you reject the null when you shouldn’t.  
     
    now, I think I’ve pointed to a couple issues in the simulations that might have misled you: 1. do you agree that randomization of SNPs along chromosomes could possibly remove the effects of both selection and demography from the data? 2. do you agree with G’s point that it probably would have been best to simulate data in trios for the CEU and YRI populations, since this is the data you have (even if you don’t think it would have an effect)? do you see any validity to these comments? 
     
    It would probably take a few days to do simulations, using standard population genetics tools, to address these issues. why not just do them?
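For the first point, some back-of-the-envelope arithmetic shows why “zero false positives” is such a strong claim at genome scale. Purely for illustration — the real null distribution of the LDD statistic is certainly not Gaussian — suppose the statistic were normal under the null; a 2.6 SD cutoff then leaves a small but non-zero per-test tail probability, which gets multiplied across millions of SNPs:

```python
import math

# one-sided tail probability of a standard normal beyond 2.6 SD
tail = 0.5 * math.erfc(2.6 / math.sqrt(2))   # about 0.0047

n_tests = 3_200_000                          # roughly one test per HapMap SNP
expected_false_positives = tail * n_tests
```

Under this illustrative normal null, a 2.6 SD threshold would still yield on the order of 15,000 false positives genome-wide. The actual number depends entirely on how heavy the true null’s tail is — which is exactly why the choice of null model matters so much.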

  24. P-ter, 
     
    You’re throwing up a number of methodological criticisms that (a) are at the limits of credulity (b) tend to cancel each other out, and (c) don’t fundamentally alter the conclusions of the Hawks et al. study.  
     
    You state that the false positive rate could be as high as 20% and that these false positives would be concentrated in more recent times. Fine. But wouldn’t the same be true for the false negatives? And even if the false positive rate were as high as 20%, are you arguing that this 20% ‘tail’ could be wagging the 80% ‘dog’? In other words, if there were little or no acceleration among the 80% of true positives, you would have to assume a very high rate of acceleration among the minority of false positives. You would also have to assume a highly skewed bipolar distribution in the rate of genetic change over the human genome.
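That question has an easy quantitative sketch (numbers purely illustrative). Suppose a fraction of the hits are false positives and, worst case, all of them look recent, while the true positives have the uniform age distribution expected under a constant rate:

```python
FP = 0.20                        # assumed false-positive rate
true_recent = 10_000 / 80_000    # uniform ages: share younger than 10,000 y

# observed share of "recent" hits if every false positive looks recent
apparent_recent = FP * 1.0 + (1 - FP) * true_recent
```

Here a 20% false-positive rate raises the apparent share of very recent events from 12.5% to 30% — more than doubling it. Whether that counts as the tail wagging the dog depends on how large the claimed recent excess actually is, which is why pinning down the false-positive rate matters.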

  25. meh. read wade’s article in the times about this paper, and note the reservations expressed by some population geneticists. if hawks et al. are wondering why there’s skepticism, now they know.  
     
    if I come across as being unfairly harsh (I’ve tried to avoid this, but I’m human), I apologize, but I do see a lot of holes. maybe some of them cancel each other out, maybe not. maybe some of them alter the conclusions of the paper, maybe not. but there are things the authors could have done to make their case a lot stronger.

  26. p-ter, have they at least shown, in your view, that the proposition that there’s been practically no human evolution these last n millennia is false?

  27. have they at least shown, in your view, that the proposition that there’s been practically no human evolution these last n millennia is false? 
     
    that had been demonstrated well before this paper.
