DARC and HIV: a false positive due to population structure?


The recent report that the Duffy null allele is associated with increased risk of HIV infection received a lot of press (see Razib’s comments on it here), mostly positive. In Nick Wade’s New York Times article on the paper, however, some smart people publicly express some doubts. It’s a tribute to Wade that he actually tries to summarize those doubts in the limited space allotted to him:

Dr. Goldstein said that in parts of the United States, African-Americans have a higher infection rate than European-Americans, and that patients with a higher proportion of African genes may be more vulnerable to H.I.V. for reasons unconnected to the SNP. Nonetheless, the SNP would show up in a greater proportion of infected people simply because of their African heritage. If so, the gene’s apparent association with H.I.V. infection could be just coincidental, not causal.

In somewhat more technical terms, the issue referred to here is the potential for false positives in an association study due to population structure[1]. The issues involved in accounting for structure in an admixture mapping study are somewhat more subtle than in a classic case-control study, but are generally similar. In particular, it’s important to take individual levels of admixture into account[2]; this is generally done by including an estimate of individual admixture as a covariate in any regression model.
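Here is a minimal simulation sketch of that adjustment in Python (standard library only). Every ingredient is an assumption chosen for illustration, not a value from the paper: ancestry proportions drawn from a Beta(8, 2) distribution (mean 80% African ancestry), hypothetical Duffy-null frequencies of 0.98 and 0.01 in the African and European ancestral populations, and a risk model in which infection status depends on ancestry alone. A logistic regression of status on genotype alone finds a strong spurious genotype effect; adding the ancestry proportion as a covariate removes it. Note that the covariate here is the *true* ancestry, which a real study never has; it only has an estimate.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=10):
    """Logistic regression by Newton-Raphson; returns (coefficients, standard errors)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X' W X, the information matrix
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for r in range(k):
                grad[r] += (yi - p) * xi[r]
                for c in range(k):
                    info[r][c] += w * xi[r] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Standard errors: sqrt of the diagonal of the inverse information (last iteration).
    se = [math.sqrt(solve(info, [float(i == j) for i in range(k)])[j]) for j in range(k)]
    return beta, se


def simulate(n=10_000, seed=1):
    """Hypothetical admixed sample: returns (ancestry, genotype, status) per person."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        q = rng.betavariate(8, 2)  # assumed individual African ancestry proportion
        g = 0  # number of Duffy-null alleles (0, 1, or 2)
        for _copy in range(2):
            african_copy = rng.random() < q
            g += int(rng.random() < (0.98 if african_copy else 0.01))
        # Status depends on ancestry only; the genotype has no causal effect.
        risk = 1.0 / (1.0 + math.exp(-(-1.0 + 4.0 * (q - 0.8))))
        rows.append((q, g, int(rng.random() < risk)))
    return rows


rows = simulate()
y = [s for _, _, s in rows]
b0, se0 = fit_logistic([[1.0, g] for _, g, _ in rows], y)
b1, se1 = fit_logistic([[1.0, g, q] for q, g, _ in rows], y)
z_unadjusted = b0[1] / se0[1]
z_adjusted = b1[1] / se1[1]
print(f"genotype z, no covariate:       {z_unadjusted:.1f}")
print(f"genotype z, ancestry covariate: {z_adjusted:.1f}")
```

With these settings the unadjusted z-score should land far above conventional significance thresholds and the adjusted one near zero; the exact values depend on the seed.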

The authors are aware of this potential confounder, and develop a measure of admixture based on 11 SNPs to include as a covariate in their regression. However, this measure is kind of weak, which I imagine is the sticking point for the skeptics in the Times article. If you have access to the supplemental information, take a look at it–several of these 11 SNPs are in the same gene, which means they’re not independent, and several don’t even have big frequency differences between African and European samples (if you’re trying to judge via SNPs whether someone is more African or European, those SNPs better have a big frequency difference between Africa and Europe). This is probably not a precise measure of ancestry. In fact, the Duffy null allele they claim as associated is a better predictor of ancestry than any of these SNPs.
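Why do big frequency differences matter so much? A back-of-the-envelope way to see it, under assumptions made up for illustration (unlinked markers, known ancestral allele frequencies, a true African ancestry proportion of 0.8): a genotype g ~ Binomial(2, pi) with pi = q*p_Afr + (1 - q)*p_Eur carries Fisher information I(q) = 2(p_Afr - p_Eur)^2 / (pi(1 - pi)) about the ancestry proportion q, and the information from independent markers adds. The allele frequencies below are hypothetical, not the paper’s 11 SNPs.

```python
import math


def info_about_ancestry(p_afr, p_eur, q=0.8):
    """Fisher information about ancestry proportion q from one biallelic genotype."""
    pi = q * p_afr + (1 - q) * p_eur  # expected allele frequency for this individual
    return 2 * (p_afr - p_eur) ** 2 / (pi * (1 - pi))


def approx_se(panel, q=0.8):
    """Approximate SE of a maximum-likelihood ancestry estimate; assumes unlinked markers."""
    return 1 / math.sqrt(sum(info_about_ancestry(pa, pe, q) for pa, pe in panel))


weak_panel = [(0.6, 0.3)] * 11    # hypothetical: 0.3 frequency difference each
strong_panel = [(0.9, 0.1)] * 11  # hypothetical: 0.8 frequency difference each
duffy_like = [(0.98, 0.01)]       # a single near-diagnostic marker

for name, panel in [("weak", weak_panel), ("strong", strong_panel), ("duffy", duffy_like)]:
    print(f"{name:>6}: SE of ancestry estimate ~ {approx_se(panel):.2f}")
```

This prints standard errors of about 0.35, 0.12 and 0.30: under these invented numbers, eleven weak markers pin down ancestry less precisely than one Duffy-like marker. SNPs in the same gene are correlated, so their real information is less than this sum assumes.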

So it’s quite possible that the authors have simply shown a correlation between level of African ancestry and susceptibility to HIV (which could be due to any number of sociological, demographic, or genetic factors), rather than an association between Duffy null and susceptibility to HIV. Here’s a relatively simple test of this possibility: genotype rs1426654 (the nonsynonymous SNP in SLC24A5) in their sample and perform exactly the same test as performed with Duffy. The motivation for this is that this SNP shares the property of Duffy null of being highly informative about ancestry, while being in a gene that presumably plays no role in HIV infection. If you get an association there, it seriously calls the Duffy result into question; if not, you feel a bit more comfortable.
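The proposed negative control can also be tried in simulation. This sketch uses a made-up admixed sample (Beta(8, 2) ancestry, risk depending on ancestry only) and adds a second ancestry-informative marker standing in for rs1426654, with hypothetical derived-allele frequencies of 0.02 (African) and 0.99 (European). To stay short it swaps in a linear probability model, fit by Frisch-Waugh-Lovell residualization, rather than the logistic regression a real analysis would use. With the exact ancestry proportion as covariate neither marker should be associated; with a noisy ancestry estimate (Gaussian noise with SD 0.35, an arbitrary stand-in for a weak marker panel) both markers stay "associated", in opposite directions, which is the warning sign the test is meant to catch.

```python
import math
import random


def residualize(v, c):
    """Residuals of v after least-squares regression on an intercept and covariate c."""
    n = len(v)
    mv, mc = sum(v) / n, sum(c) / n
    slope = sum((ci - mc) * (vi - mv) for vi, ci in zip(v, c)) / sum((ci - mc) ** 2 for ci in c)
    return [vi - mv - slope * (ci - mc) for vi, ci in zip(v, c)]


def t_stat(y, g, c):
    """OLS t-statistic for g in y ~ 1 + c + g, via Frisch-Waugh-Lovell."""
    ry, rg = residualize(y, c), residualize(g, c)
    sgg = sum(x * x for x in rg)
    b = sum(a * x for a, x in zip(ry, rg)) / sgg
    s2 = sum((a - b * x) ** 2 for a, x in zip(ry, rg)) / (len(y) - 3)
    return b / math.sqrt(s2 / sgg)


def genotype(rng, q, p_afr, p_eur):
    """Allele count at one locus; each copy is African-derived with probability q."""
    return sum(int(rng.random() < (p_afr if rng.random() < q else p_eur)) for _ in range(2))


rng = random.Random(7)
q = [rng.betavariate(8, 2) for _ in range(10_000)]
duffy = [genotype(rng, a, 0.98, 0.01) for a in q]
slc = [genotype(rng, a, 0.02, 0.99) for a in q]  # hypothetical rs1426654 frequencies
status = [int(rng.random() < 1 / (1 + math.exp(1 - 4 * (a - 0.8)))) for a in q]
q_noisy = [a + rng.gauss(0, 0.35) for a in q]

results = {}
for label, cov in [("exact ancestry", q), ("noisy ancestry", q_noisy)]:
    results[label] = (t_stat(status, duffy, cov), t_stat(status, slc, cov))
    print(f"{label}: t(Duffy-like) = {results[label][0]:.1f}, t(SLC24A5-like) = {results[label][1]:.1f}")
```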

[1] For the classic extreme example of how population structure leads to false positive associations, consider a case-control association study on, say, diabetes, where the cases are all from Nigeria and the controls from France. Clearly, the cases are all going to have a high frequency of the Duffy null allele, and the controls are all going to have a low frequency (as Duffy null is essentially fixed in Africa and absent elsewhere), and one might naively conclude that Duffy null causes diabetes. But of course, the Duffy blood group has absolutely nothing to do with diabetes (I don’t think!), and the researchers have simply been confused by not matching their cases and controls. Obviously, this example is extreme, but more subtle population structure can also confound an association study (and methods for correcting for it are an active area of research; see here, for example).
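The thought experiment is easy to put numbers on. This sketch uses hypothetical allele counts (500 Nigerian cases and 500 French controls, Duffy-null frequencies of 0.98 and 0.01, chosen only to match "essentially fixed" and "essentially absent") and the standard 1-df Pearson chi-square test on a 2×2 table of allele counts.

```python
def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# 500 cases and 500 controls -> 1000 alleles per group (hypothetical counts).
chi2 = allelic_chi2(case_alt=980, case_ref=20, ctrl_alt=10, ctrl_ref=990)
print(round(chi2))  # -> 1882
```

For comparison, p < 5e-8 at 1 df needs a chi-square of only about 29.7; the entire signal here comes from sampling cases and controls from different populations.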

[2] It’s well-known that African-Americans are an admixed population, with about 15-20% European ancestry on average. But there’s great variability in this–a single sample of self-defined “African-Americans” can contain individuals with essentially no European ancestry and individuals who look genetically to be completely European. And on a larger scale, within the United States there’s heterogeneity in admixture proportions as well (see Parra et al.). How could this create false positives? Essentially, if risk for a disease is correlated with ancestry for any reason, there’s the potential for getting false positives. In this particular example, if HIV rates are higher in metropolitan areas where there’s been more admixture, or if there are other genetic factors that make Europeans more resistant to HIV, etc., any “African allele” (like Duffy null) will show up as associated with HIV despite playing absolutely no role in the disease.
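The regional version of the problem (places with different admixture proportions and different infection rates) can be shown with nothing more than pooled versus stratified odds ratios. The counts below are hypothetical allele counts for two made-up cities; within each city the allele has no association at all with status. Pooling the cities produces an odds ratio well above 1, while a Mantel-Haenszel odds ratio stratified by city recovers 1.

```python
# Each stratum: (case_alt, case_ref, ctrl_alt, ctrl_ref) allele counts (hypothetical).
cities = {
    "city A": (170, 30, 1530, 270),  # null-allele frequency 0.85; 10% of people are cases
    "city B": (24, 16, 1176, 784),   # null-allele frequency 0.60; 2% of people are cases
}


def odds_ratio(a, b, c, d):
    """Allelic odds ratio for one 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across 2x2 strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


pooled = tuple(map(sum, zip(*cities.values())))
print("pooled OR:", round(odds_ratio(*pooled), 2))  # -> 1.64
print("within-city ORs:", [round(odds_ratio(*t), 2) for t in cities.values()])  # -> [1.0, 1.0]
print("Mantel-Haenszel OR:", round(mantel_haenszel_or(cities.values()), 2))  # -> 1.0
```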



  1. This issue came up as Marc and I were researching the forthcoming popgen of IQ post. Apparently what you (and the ppl in Wade’s article) raise has been found to generally not be much of an issue in other cases. See (de Bakker et al. 2006; Service et al. 2007). The title of Service et al says it well: “Tag SNPs chosen from HapMap perform well in several population isolates” 
    Also, if I’m reading the summaries correctly, this gene is expressed in the blood, not the brain, so no one can make the argument that the causal chain involves behavior.

  2. well, no, this is a different issue. you’re talking about how well SNPs ascertained in one population can tag variation in others (what people refer to as the “portability” of tagSNPs). In this case, the SNP responsible for Duffy null is known, there’s no issue with tagging.  
    The point here is that population structure in an association study can lead to false positives. for some of the issues in case-control studies, see here, and for the particular case of admixture mapping, see here.

  3. Ben g — no doubt that *sometimes*, cryptic substructure is a boogieman invoked to explain why an association study failed.  
    However, in this case, I mean…the Duffy null?? That’s the canonical population marker. Now, it might in fact be that it was selected for some other purpose (I haven’t seen arguments re: its functionality, though I’m not up to date on the area) and the population faces increased vulnerability because of selection.  
    But in this case we also have an alternate (though not mutually exclusive) explanation which *is* a function of ancestry. As sexual behavior *does* vary between groups (and things like gestation times do vary in this fashion), perhaps Duffy’s significance is only due to it being strongly correlated with ancestry…and people with that ancestry have a package of predisposing alleles, including some with behavioral effects. 
    Jury’s out on this one, and genotyping of several more ancestrally informative markers spread across the genome is definitely the way to go.

  4. ok, let’s see if i’m wrong (again) in summarizing it this time: what you (and wade’s cited researchers) raise is a population stratification issue? 
    if so, i’ve read that that can be properly controlled for using two methods: 1) the method you seem to be describing (correct me if i’m wrong)– “genotyping of unlinked markers among population-based samples (Pritchard & Rosenberg 1999).”, and 2) “a more sophisticated approach, using family-based samples. (Abecasis et al. 2000)” 
    *Pritchard, J.K. & Rosenberg, N.A. (1999) Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 65 220-228. 
    *Abecasis, G.R., Cardon, L.R. & Cookson, W.O. (2000) A general test of association for quantitative traits in nuclear families. Am J Hum Genet 66 279-292.

  5. fuck elsevier… i can’t get access to the original article, but the abstract mentions evidence that HIV binds to DARC. if that’s true, it’s fairly good evidence of causal involvement. also, i can’t tell whether they did any controls for admixture based on just the abstract; maybe they did. it’s a perfectly obvious problem, like height association with LCT in euros. 
    ben g — those are the two standard methods

  6. ben g–yes, that’s right, it’s a stratification issue.  
    the abstract mentions evidence that HIV binds to DARC. if that’s true, it’s fairly good evidence of causal involvement 
    but the direction of the association is counterintuitive–HIV binds Duffy, but Duffy null (ie. a complete lack of the receptor) is associated with increased HIV risk. one might expect the opposite (like in malaria– p. vivax binds duffy, and duffy null is protective). this might be more consistent with the association being due to stratification.  
    they do a little control for admixture, but it’s not very convincing (the ~9 independent SNPs they use for estimating admixture aren’t even good AIMs).

  7. p-ter — solid criticism re: AIMs. i’d want to see how good the binding data is and how they explain the null susceptibility. i’m assuming duffy isn’t a receptor for HIV. but yeah, i’d say this association is as likely stratification as not. i guess the editors of cell host and microbe were in a hurry to get new, sexy content out the door.

  8. yeah, a lot of this paper rests on those pretty flimsy AIMs (looking back at the supplement, 4 of them have a frequency difference over 0.5 between CEU and FYI –I assume they mean YRI?– and none have a frequency difference over 0.7). it’s easy to find 10 or 20 good AIMs (I imagine there are panels already published) that could give you much more confidence.  
    I’m not sure how expensive genotyping 10 SNPs in 800 individuals is, but it seems like they did a lot of work; it would be a shame if it was all chasing a false positive.

  9. fuck elsevier 

  10. Nice post, this seems like quite a serious problem.

  11. The Duffy gene is the single best one for guessing African ancestry, right? So, it would likely also correlate well with eating at chicken ‘n’ waffles restaurants, drinking grape soda pop, and voting Democratic.

  12. p-ter wrote: 
    but the direction of the association is counterintuitive–HIV binds Duffy, but Duffy null (ie. a complete lack of the receptor) is associated with increased HIV risk. one might expect the opposite (like in malaria– p. vivax binds duffy, and duffy null is protective). 
    Nothing counterintuitive here.  
    1. Malaria: a lack of receptor on target cells prevents the parasite from entering target cells and propagating ==> decreased susceptibility to the disease.  
    2. HIV: a lack of receptor on non-target cells means there is no competition for productive binding of the virus to the target cells ==> increased susceptibility to the disease.  
    The slow disease progression might have something to do with the intricacies of immune response down the road.

  13. with eating at chicken ‘n’ waffles restaurants  
    Deuce Bigalow: T.J., I’m so glad you are here. 
    T.J. Hicks: How did you find me? 
    Deuce Bigalow: Well, this seemed like the only chicken and waffles place in all of Holland. 
    T.J. Hicks: Ohhh, so the black guy has to go to a chicken and waffles place, that’s Racist! 
    Deuce Bigalow: But you’re here. 
    T.J. Hicks: Yeah, but figuring it out was racist. 
    Deuce Bigalow: [noticing all the black people] This is a nice place.

  14. I agree, P-ter. This study, as it is, tells us almost nothing we didn’t already know.  
    A similarly designed study would show that the gene for blondness is a risk factor for having the ability to digest milk into adulthood.
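The first method from comment 4, checking for stratification with unlinked markers, can be sketched in the spirit of Pritchard & Rosenberg (1999): compute an allelic chi-square at each of L unlinked, non-candidate markers and compare the sum to a chi-square distribution with L degrees of freedom. This is a simplified reading rather than their exact procedure; the allele counts are hypothetical, and the closed-form tail probability used here is valid only for an even number of degrees of freedom.

```python
import math


def allelic_chi2(a, b, c, d):
    """1-df Pearson chi-square for case (a alt, b ref) vs control (c alt, d ref) alleles."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def stratification_test(tables):
    """Sum per-marker chi-squares over unlinked markers; return (statistic, p-value)."""
    stat = sum(allelic_chi2(*t) for t in tables)
    return stat, chi2_sf_even_df(stat, len(tables))


# Ten hypothetical unlinked markers, 1000 alleles per group.
matched = [(500, 500, 500, 500)] * 10     # identical case and control frequencies
stratified = [(550, 450, 500, 500)] * 10  # cases 5 percentage points higher at every marker
print(stratification_test(matched))      # statistic 0, p = 1.0
print(stratification_test(stratified))   # statistic ~50.1, p well below 1e-5
```

A small excess at each marker is unremarkable alone (about 5.0 per marker) but adds up across markers, which is why a consistent shift at unlinked loci flags stratification.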