When David Reich’s op-ed came out, some discussion ensued about his focus on prostate cancer risk in African Americans. This is the research which put Reich on my personal radar (if you care, start with this 2006 paper, Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men). I had a back-and-forth with Debbie Kennett about whether this was a robust result. To be honest I hadn’t followed the research closely because 1) my own risk of dying of prostate cancer is probably pretty low, knowing what people in my extended pedigree tend to die from, and 2) I’m not terribly interested in disease genetics unless it has a strong evolutionary genomic implication.
Some cursory literature searches suggested that Reich was right to include that example in the book and the op-ed, because there had been follow-up work that verified the initial result. I had told myself that perhaps I’d follow up on this at a later point. After reading Laura Hercher’s rather patronizing take on David’s op-ed I decided that now is as good a time as any.
Looking around I found a very recent paper which hits the spot: Genetic hitchhiking and population bottlenecks contribute to prostate cancer disparities in men of African descent (it’s in Cancer Research). It came out in February 2018, so it should be up to date on the literature, and there is an evolutionary angle here (I am friendly with the first author and respect his work overall).
The paper is open access so I recommend you read it. But here’s the high level:
- They had access to Sarah Tishkoff’s huge data set of African populations, as well as 1000 Genomes, to produce a combined panel with 1 million markers and 64 populations (38 African).
- Then, they focused on the hits in the literature for prostate cancer SNPs, which they called CaP susceptibility loci. 68 SNPs with high confidence (they looked for p-values of 10⁻⁵ or less).
So they have the data set with populations and allele frequencies, and a subset of markers that they want to interrogate (no imputation here, they had all the SNPs). They developed a statistic, Genetic Disparity Contribution (GDC), to evaluate the impact of SNP differences across populations in terms of CaP risk (that is, prostate cancer risk).
First, they need to look at a SNP in a particular population:
Here i = SNP, j = individual, and k = population. The allele tracked at each SNP is the “risk allele” (remember, SNPs come in two forms). The 2p term reflects the frequency of the risk allele in that population. OR_i is basically the odds ratio for developing prostate cancer associated with a given SNP.
Now, the GDC:
A = African and N = non-African. You are just using the frequencies within the populations of interest for the given SNP. You can compare different populations presumably.
Finally, the individual Genetic Risk Score (GRS):
The score for an individual j in population k is the sum of the per-SNP risk terms across all 68 markers. If the individual has no “risk alleles” (those that increase odds of developing prostate cancer), then their GRS = 0.
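To make the three definitions concrete, here is a minimal Python sketch. It assumes the usual log-additive form, where each copy of a risk allele adds ln(OR) to the score. That is my reading of the definitions above, not the paper's verbatim formulas, and the function names, allele frequencies, and odds ratios are all made up for illustration.

```python
import math

def snp_risk(freq, odds_ratio):
    """Expected contribution of one SNP in one population.

    Assumed form: 2 * p * ln(OR), where p is the risk allele frequency.
    """
    return 2.0 * freq * math.log(odds_ratio)

def gdc(freq_african, freq_non_african, odds_ratio):
    """Genetic Disparity Contribution of one SNP (assumed: African minus non-African)."""
    return snp_risk(freq_african, odds_ratio) - snp_risk(freq_non_african, odds_ratio)

def grs(risk_allele_counts, odds_ratios):
    """Genetic Risk Score of one individual.

    risk_allele_counts holds 0, 1, or 2 copies of the risk allele per SNP.
    """
    return sum(count * math.log(o) for count, o in zip(risk_allele_counts, odds_ratios))

# Toy values: three hypothetical SNPs, not the 68 from the paper.
toy_odds_ratios = [1.2, 1.1, 1.5]
toy_person = [2, 0, 1]

print("GRS, toy person:", grs(toy_person, toy_odds_ratios))
print("GRS, no risk alleles:", grs([0, 0, 0], toy_odds_ratios))
print("GDC, first SNP:", gdc(0.6, 0.2, toy_odds_ratios[0]))
```

Under this form, someone carrying no risk alleles scores exactly zero, which matches the description above, and a SNP with identical frequencies in both groups has a GDC of zero no matter how large its odds ratio is.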
As I stated above I don’t know much about prostate cancer. Honestly, I should take more of an interest, since it seems to run on my sons’ maternal side, so they are at risk (I know I am at risk, but people in my family tend to die of heart issues rather than cancer). The heritability for this cancer is 0.42-0.58. This is not trivial. The authors state that “CaP has the highest familial risks of any major cancer.” I certainly did not know that.
Combining their population-wide data set and the knowledge of risks from GWAS on CaP risk SNPs, they generated the plot to the left which shows you each population’s mean GRS. They confirm earlier work which suggests that African populations are at more risk than non-African populations and that West African populations are at more risk than East African populations. The authors observe that some African populations do have low risks even on the global scale. But on the whole the rank here is:
West African > East African > South Asian > European > East Asian.
They used ADMIXTURE to confirm the obvious correlations: the more West African ancestry in an individual, the higher the GRS. The non-African population with the highest GRS is Puerto Ricans, who have substantial West African admixture.
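As a rough illustration of that kind of check, here is a sketch that correlates per-individual ancestry fractions with GRS. It takes the ancestry fractions as given (in the paper they come from ADMIXTURE, which is not run here), and the numbers are invented toy values, not data from the paper.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical individuals: fraction of West African ancestry and their GRS.
toy_ancestry = [0.0, 0.1, 0.25, 0.5, 0.8, 1.0]
toy_scores = [1.9, 2.0, 2.3, 2.6, 3.1, 3.2]

r = pearson_r(toy_ancestry, toy_scores)
print("correlation between ancestry fraction and GRS:", round(r, 3))
```

A strongly positive r on real data would be the "obvious correlation" described above. A regression of GRS on several ancestry components at once would be the fuller version of the same idea.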
But one thing to remember here is that some of these African populations are quite distinct. For example, though West African populations have the highest risks, the Hadza and the Baka have high risks as well, and these hunter-gatherers are very diverged from other Africans. In fact, we know from ancient DNA that modern African populations are fusions of extremely distinct groups whose divergence may go well north of 200,000 years ago.
The pattern of risk seemed a bit strange to me outside of Africa. On the genome-wide scale, South Asians are between Europeans and East Asians, with a slight bias if any toward Europeans. This is because half the ancestry of South Asians is closely related to the ancestry that contributed to Europeans, and half is distantly related to the ancestry of East Asians. This can easily explain why their archaic admixture fractions are between these two groups. And yet the average GRS makes it clear that they score higher than both of these populations.
Lachance et al. do the standard genetic calculations of risk, and perform some exploratory analysis of the population structure in their data (since they curated this from well-known sources, this wasn’t necessary for outlier removal so much as for the regression that they ran of GRS on ancestry fractions). But they didn’t delve deeply into the demographic history that I allude to above. Rather, what they did focus on were signals of selection in the regions of the genome that these risk markers were embedded in.
They seem to come to two general conclusions:
- Selection through the side-effect of hitch-hiking does seem to drive some of the African vs. non-African divergences.
- Much of the difference can probably be due to specifics of drift in non-African populations in the “out of Africa” event, and there isn’t evidence of polygenic selection across the 68 loci in the aggregate.
The latter seems unsurprising because prostate cancer hits late in life. As a trait, it is not what you are going to be selecting against in a pre-modern world (anyway, grandmothers, not grandfathers, seem to increase descendant fitness the most in ethnographic work). Additionally, the authors say that “risk allele frequencies tend to be higher in Africa when risk alleles are ancestral, and risk allele frequencies tend to be higher in non-African populations when risk alleles are derived.” Derived alleles here are the ones that arose as new mutations, while ancestral alleles are the older state. We know that the “out of Africa” bottleneck resulted in the extinction of some ancestral variation, presumably including ancestral risk alleles.
The former, in regards to linked selection, is also not surprising. As non-Africans spread across the world they developed new local adaptations, and some allele frequencies shifted from the African ancestors. But not all. And that I think explains why South Asians have a higher risk than Europeans and East Asians. The authors observe several protective (lower risk) alleles rose in frequency due to being in a region where there was selection for lighter pigmentation. Pigmentation is one trait which is highly heritable where some non-Africans (South Asians, Oceanians) are often more like African populations than other Eurasian groups. If high-risk CaP alleles were somehow associated with ancestral pigmentation alleles, then it makes sense that South Asians have a higher risk, since they are more ancestral on these loci than other Eurasians.
Finally, there is the question of how applicable these GWAS are to diverse populations. These markers were discovered in mostly European panels, so there is the standard ascertainment bias. Though the authors do say that “The International Agency for Research on Cancer GLOBOCAN program estimates that CaP has the highest incidence of any tumor site in African-American, Caribbean, and African men.” That is, African men, just like men of the Diaspora, are at higher risk. And remember, the association with African ancestry emerged in African American men, with those with elevated African ancestry in a particular region of the genome being at higher risk. It wasn’t a naive observation of higher rates of CaP in African Americans.
Because the OR can vary between populations, the authors ran their analysis by equalizing the OR and also by using the literature value of OR at a marker population by population. They found the broad disparity held. Subsampling the markers also maintained the rank order in broad geographic terms. Finally, the authors observe that because of the bias in the discovery of European risk variants, there are probably African risk variants that are not in their marker set which result in an underestimate of the GRS.
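Two of these robustness checks (equalizing the OR and subsampling the markers) can be sketched in a few lines. This assumes the same log-additive scoring as the GRS definition and uses hypothetical populations, frequencies, and odds ratios; the paper's third check, using population-specific literature ORs, is not reproduced here.

```python
import math
import random

def mean_grs(freqs, odds_ratios):
    """Population mean score under the assumed form: sum of 2 * p * ln(OR)."""
    return sum(2.0 * p * math.log(o) for p, o in zip(freqs, odds_ratios))

def rank_order(pop_freqs, odds_ratios, keep=None):
    """Population names sorted from highest to lowest mean score.

    keep optionally restricts the calculation to a subset of SNP indices.
    """
    idx = list(range(len(odds_ratios))) if keep is None else list(keep)
    scores = {
        name: mean_grs([freqs[i] for i in idx], [odds_ratios[i] for i in idx])
        for name, freqs in pop_freqs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical risk allele frequencies at four SNPs in three populations.
toy_pop_freqs = {
    "pop_A": [0.7, 0.6, 0.5, 0.8],
    "pop_B": [0.5, 0.4, 0.4, 0.6],
    "pop_C": [0.3, 0.2, 0.3, 0.4],
}
toy_literature_or = [1.3, 1.1, 1.5, 1.2]
toy_equal_or = [1.2] * 4  # every marker given the same effect size

baseline = rank_order(toy_pop_freqs, toy_literature_or)
equalized = rank_order(toy_pop_freqs, toy_equal_or)

rng = random.Random(0)  # fixed seed so the subsample is reproducible
subset = sorted(rng.sample(range(4), 2))
subsampled = rank_order(toy_pop_freqs, toy_literature_or, keep=subset)

print(baseline, equalized, subsampled)
```

If the ordering survives both perturbations, the disparity is not an artifact of a few large effect sizes or a handful of markers. In this toy data it survives trivially because pop_A has the highest frequency at every SNP; real data is messier, which is why the check is worth running.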
What is the upshot of all of this? The less important one is that David Reich used the example of prostate cancer to open his discussion about population structure because it’s probably a robust result (and also, in the book he makes clear a lot of sociologists and anthropologists did not appreciate the correlation between disease and ancestry that seemed due to biology). The balance of the evidence points to the likelihood that men with African ancestry, in particular, but not exclusively, of West African ancestry, have somewhat higher risks, all things equal, of developing prostate cancer. As the authors note, the risks overlap quite a bit between populations. A substantial number of men of European ancestry have a higher GRS for CaP than those of African ancestry. There are two classes of alleles driving this risk. One class has high-frequency differences between populations, and another class has a large impact on odds ratios (so small differences still matter).
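A quick bit of arithmetic shows why both classes of alleles matter. The sketch below assumes the per-SNP gap between two populations is 2 × (difference in risk allele frequency) × ln(OR), in line with the log-additive scoring assumed for the GRS; the frequencies and odds ratios are invented for illustration.

```python
import math

def risk_gap(freq_a, freq_b, odds_ratio):
    """Difference in expected per-SNP score between two populations (assumed form)."""
    return 2.0 * (freq_a - freq_b) * math.log(odds_ratio)

# Class 1: big frequency difference, modest effect (hypothetical numbers).
big_diff_small_effect = risk_gap(0.60, 0.20, 1.10)

# Class 2: small frequency difference, large effect (hypothetical numbers).
small_diff_big_effect = risk_gap(0.10, 0.05, 2.00)

print("large frequency gap, OR 1.1:", round(big_diff_small_effect, 3))
print("small frequency gap, OR 2.0:", round(small_diff_big_effect, 3))
```

Both come out around 0.07, so a marker with a 40-point frequency gap and a weak effect can contribute about as much to the disparity as one with a 5-point gap and a doubling of the odds.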
The figure to the right shows that there is a strong correlation between predicted genetic risk score and the real death rate from prostate cancer. I’m a little confused though here about the relationship between the training set and the population one is predicting on. Presumably, the GWAS come from these populations based on medical research, which is the same body of literature collecting the death rates. But the interesting thing here is that East Asians, Europeans & Latin Americans, and Diaspora Africans, are all distinct clusters in both mortality and GRS.
Since the heritability is not high, but only moderate, and even this correlation is imperfect, one can still argue that the disparity is attributable to environment. But to be honest the South Asian prediction, along with the relationship to pigmentation regions, indicates to me that the GRS is capturing something real in population differences due to a combination of demographic history and natural selection.
Moving on from CaP, these academic debates about whether disparities are driven by genes, environment, or both (or an interaction) miss the bigger picture: due to the contingencies of history, different populations probably have different risks for late-in-life diseases. The South Asian risk for cardiac and metabolic illnesses is so extreme that I think most people won’t deny that it is a real thing (in particular since there is variation within South Asia for this, judging by British medical data).