There is long-standing tension regarding whether and how to use race or geographic ancestry in biomedical research. We examined multiple self-reported measures of race and ancestry from a cohort of over 100,000 U.S. residents alongside genetic data. We found that these measures are often non-overlapping, and that no single self-reported measure alone provides a better fit to genetic ancestry than a combination including both race and geographic ancestry. We also found that patterns of reporting for race and ancestry appear to be influenced by participation in direct-to-consumer genetic ancestry testing. Our results demonstrate that there is a place for the language of both race and geographic ancestry as we seek to empower individuals to fully describe their family history in research and medicine.
Two salient points. First, a combination of official racial categories and personal ancestry description seems to be the best predictor of genomic ancestry at the HLA loci. Second, HLA loci variation is highly sensitive to genetic background. It’s optimal to have more information, not less. This needs reiteration since I’m told some geneticists have been dismissing the need for self-reports of ethnicity.
In some ways, we haven’t moved that far beyond Risch 2003.
A minor gripe is that the preprint keeps mentioning supplements in the text body, but they’re not available for download (that I can see). Makes it harder to evaluate…
Also, the authors report that those who utilized direct-to-consumer personal genomics changed some opinions. For example, a smaller proportion of white individuals reported indigenous American ancestry, while a larger proportion of Hispanics reported indigenous American ancestry. Some of the authors work at Ancestry, so it’s not entirely surprising that the paper would present this stuff in a positive light, but would anyone complain about this?
When David Reich’s op-ed came out some discussion ensued about his focus on prostate cancer risk in African Americans. This is the research which put Reich on my personal radar (if you care, start with this 2006 paper, Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men). I had a back-and-forth with Debbie Kennett about whether this was a robust result. To be honest I hadn’t followed the research closely because 1) my own risk of dying of prostate cancer is probably pretty low knowing what people in my extended pedigree tend to die from 2) I’m not terribly interested in disease genetics unless they have a strong evolutionary genomic implication.
Doing some cursory literature searches suggested that Reich was right to include that example in the book and the op-ed because there had been follow-up work that verified the initial result. I had told myself that perhaps I’d follow up on this at a later point. After reading Laura Hercher’s rather patronizing take on David’s op-ed I decided that now is as good a time as any.
The paper is open access so I recommend you read it. But here’s the high level:
They had access to Sarah Tishkoff’s huge data set of African populations, as well as 1000 Genomes, to produce a combined panel with 1 million markers and 64 populations (38 African).
Then, they focused on the hits in the literature for prostate cancer SNPs, which they called CaP susceptibility loci. 68 SNPs with high confidence (they looked for p-values of 10⁻⁵ or less).
So they have the data set with populations and allele frequencies, and a subset of markers that they want to interrogate (no imputation here, they had all the SNPs). They developed a statistic, Genetic Disparity Contribution (GDC), to evaluate the impact of SNP differences across populations in terms of CaP risk (that is, prostate cancer risk).
First, they need to look at a SNP in a particular population:
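i = SNP, j = individual, and k = population. The allele being counted here is the “risk allele” (remember, SNPs come in two forms). The frequency term reflects the frequency of the risk allele in the population, with the factor of 2 presumably there because each individual carries two copies of the locus. ORi is basically the odds ratio of developing prostate cancer for a given SNP.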
Now, the GDC:
A = African and N = non-African. You are just using the frequencies within the populations of interest for the given SNP. You can compare different populations presumably.
Finally, the individual Genetic Risk Score (GRS):
The score for an individual j in population k is the sum of the per-SNP risk terms across all 68 markers. If the individual has no “risk alleles” (those that increase odds of developing prostate cancer), then their GRS = 0.
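To make the three quantities concrete, here is a minimal Python sketch. It assumes a log-additive model, where each risk allele contributes ln(OR) to a score and a population's expected contribution at a SNP is 2 × frequency × ln(OR). That form is my reading of the definitions above, not a transcription of the paper's equations, and the function names are hypothetical.

```python
import math


def mean_snp_risk(risk_allele_freq, odds_ratio):
    """Expected per-SNP contribution in one population (assumed form: 2 * p * ln(OR))."""
    return 2.0 * risk_allele_freq * math.log(odds_ratio)


def gdc(freq_african, freq_non_african, odds_ratio):
    """Assumed form of the Genetic Disparity Contribution for one SNP:
    the difference in expected contribution between the two groups."""
    return (mean_snp_risk(freq_african, odds_ratio)
            - mean_snp_risk(freq_non_african, odds_ratio))


def grs(risk_allele_counts, odds_ratios):
    """Individual score: sum over SNPs of (risk alleles carried: 0, 1 or 2) * ln(OR)."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP")
    return sum(count * math.log(odds_ratio)
               for count, odds_ratio in zip(risk_allele_counts, odds_ratios))
```

Under these assumptions, an individual carrying no risk alleles gets a score of exactly zero, which matches the property stated above, and a SNP with identical frequencies in both groups contributes nothing to the disparity.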
As I stated above I don’t know much about prostate cancer. Honestly, I should take more of an interest, since it seems to run on my sons’ maternal side, so they are at risk (I know I am at risk, but people in my family tend to die of heart issues rather than cancer). The heritability for this cancer is 0.42-0.58. This is not trivial. The authors state that “CaP has the highest familial risks of any major cancer.” I certainly did not know that.
Combining their population-wide data set and the knowledge of risks from GWAS on CaP risk SNPs, they generated the plot to the left which shows you each population’s mean GRS. They confirm earlier work which suggests that African populations are at more risk than non-African populations and that West African populations are at more risk than East African populations. The authors observe that some African populations do have low risks even on the global scale. But on the whole the rank here is:
West African > East African > South Asian > European > East Asian.
They used ADMIXTURE to confirm the obvious correlations; the more West African ancestry in an individual, the higher the GRS. The non-African population with the highest score is Puerto Ricans, who have substantial West African admixture.
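The relationship between ancestry fraction and GRS is the kind of thing a simple regression captures. The sketch below fits an ordinary least-squares line of GRS on West African ancestry fraction. The helper name and all of the numbers are hypothetical toy values chosen for illustration, not data from the paper, and the paper's actual regression may differ in form.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy individuals: ancestry fraction (e.g. from ADMIXTURE) and GRS.
west_african_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
individual_grs = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = ols_slope_intercept(west_african_fraction, individual_grs)
# A positive slope means the score rises with the ancestry fraction.
```

In this toy setup the intercept is the expected score at zero West African ancestry and the slope is the expected change in score going from none to all.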
But one thing to remember here is that some of these African populations are quite distinct. For example, though West African populations have the highest risks, the Hadza and the Baka have high risks as well, and these hunter-gatherers are very diverged from other Africans. In fact, we know from ancient DNA that modern African populations are fusions of extremely distinct groups whose divergence may go well north of 200,000 years ago.
The pattern of risk seemed a bit strange to me outside of Africa. On the genome-wide scale, South Asians are between Europeans and East Asians, with a slight bias if any toward Europeans. This is because half the ancestry of South Asians is closely related to that which contributed to Europeans, and half is distantly related to the ancestry of East Asians. This can easily explain why their archaic admixture fractions are between these two groups. And yet the average GRS makes it clear that they score higher than both of these populations.
Lachance et al. do the standard genetic calculations of risk, and perform some exploratory analysis of the population structure in their data (since they curated this from well-known sources, this was needed less for outlier removal than for the regression that they ran of GRS on ancestry fractions). But they didn’t delve deeply into the demographic history that I allude to above. Rather, what they did focus on were signals of selection in regions of the genome that these risk markers were embedded in.
They seem to come to two general conclusions:
Selection through the side-effect of hitch-hiking does seem to drive some of the African vs. non-African divergences.
Much of the difference can probably be due to specifics of drift in non-African populations in the “out of Africa” event, and there isn’t evidence of polygenic selection across the 68 loci in the aggregate.
The latter seems unsurprising because prostate cancer hits late in life. As a trait, it is not what you are going to be selecting against in a pre-modern world (anyway, grandmothers, not grandfathers, seem to increase descendant fitness the most in ethnographic work). Additionally, the authors say that “risk allele frequencies tend to be higher in Africa when risk alleles are ancestral, and risk allele frequencies tend to be higher in non-African populations when risk alleles are derived.” Ancestral/derived here relates to new mutations (the latter). We know that the “out of Africa” bottleneck resulted in the extinction of some ancestral variation, presumably including ancestral risk alleles.
The former, in regards to linked selection, is also not surprising. As non-Africans spread across the world they developed new local adaptations, and some allele frequencies shifted from the African ancestors. But not all. And that I think explains why South Asians have a higher risk than Europeans and East Asians. The authors observe several protective (lower risk) alleles rose in frequency due to being in a region where there was selection for lighter pigmentation. Pigmentation is one trait which is highly heritable where some non-Africans (South Asians, Oceanians) are often more like African populations than other Eurasian groups. If high-risk CaP alleles were somehow associated with ancestral pigmentation alleles, then it makes sense that South Asians have a higher risk, since they are more ancestral on these loci than other Eurasians.
Finally, there is the question of how applicable these GWAS are to diverse populations. These markers were discovered in mostly European panels, so there is the standard ascertainment bias. Though the authors do say that “The International Agency for Research on Cancer GLOBOCAN program estimates that CaP has the highest incidence of any tumor site in African-American, Caribbean, and African men.” That is, African men, just like men of the Diaspora, are at higher risk. And remember, the association with African ancestry emerged in African American men, with those with elevated African ancestry in a particular region of the genome being at higher risk. It wasn’t a naive observation of higher rates of CaP in African Americans.
Because the OR can vary between populations, the authors ran their analysis by equalizing the OR and also by using the literature value of OR at a marker population by population. They found the broad disparity held. Subsampling the markers also maintained the rank order in broad geographic terms. Finally, the authors observe that because of the bias in the discovery of European risk variants, there are probably African risk variants that are not in their marker set which result in an underestimate of the GRS.
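The robustness checks described here can be sketched in a few lines of Python. The example compares the rank order of population mean scores under a shared odds ratio per marker, under population-specific odds ratios, and on a random subsample of markers. It assumes the same 2 × frequency × ln(OR) form for the population mean, and every population name, frequency and odds ratio below is a hypothetical toy value.

```python
import math
import random


def population_mean_grs(freqs, odds_ratios, markers):
    """Assumed population mean score over the chosen markers: sum of 2 * p * ln(OR)."""
    return sum(2.0 * freqs[i] * math.log(odds_ratios[i]) for i in markers)


def rank_populations(pop_freqs, pop_ors, markers):
    """Population names sorted from highest to lowest mean score."""
    scores = {pop: population_mean_grs(pop_freqs[pop], pop_ors[pop], markers)
              for pop in pop_freqs}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical risk-allele frequencies at four markers in three populations.
pop_freqs = {
    "pop_A": [0.6, 0.5, 0.7, 0.4],
    "pop_B": [0.4, 0.3, 0.5, 0.2],
    "pop_C": [0.2, 0.1, 0.3, 0.1],
}
shared_or = [1.2, 1.1, 1.3, 1.15]
equalized_ors = {pop: shared_or for pop in pop_freqs}  # same OR in every population
pop_specific_ors = {                                   # OR allowed to vary by population
    "pop_A": [1.25, 1.1, 1.3, 1.2],
    "pop_B": [1.2, 1.1, 1.3, 1.15],
    "pop_C": [1.15, 1.05, 1.3, 1.1],
}

all_markers = list(range(len(shared_or)))
rank_equalized = rank_populations(pop_freqs, equalized_ors, all_markers)
rank_specific = rank_populations(pop_freqs, pop_specific_ors, all_markers)

rng = random.Random(0)                                 # seeded for repeatability
marker_subset = rng.sample(all_markers, 2)
rank_subsampled = rank_populations(pop_freqs, equalized_ors, marker_subset)
```

If the three rankings agree, the ordering does not hinge on one odds ratio convention or on a handful of markers. The toy numbers are built so that they do agree; real data would need many subsamples rather than one.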
What is the upshot of all of this? The less important one is that David Reich used the example of prostate cancer to open his discussion about population structure because it’s probably a robust result (and also, in the book he makes clear a lot of sociologists and anthropologists did not appreciate the correlation between disease and ancestry that seemed due to biology). The balance of the evidence points to the likelihood that men with African ancestry, in particular, but not exclusively, of West African ancestry, have somewhat higher risks all things equal of developing prostate cancer. As the authors note the risks overlap quite a bit between populations. A substantial number of men of European ancestry have a higher GRS for CaP than those of African ancestry. There are two classes of alleles driving this risk. One class has high-frequency differences between populations, and another class has a large impact on odds ratios (so small differences still matter).
The figure to the right shows that there is a strong correlation between predicted genetic risk score and the real death rate from prostate cancer. I’m a little confused though here about the relationship between the training set and the population one is predicting on. Presumably, the GWAS come from these populations based on medical research, which is the same body of literature collecting the death rates. But the interesting thing here is that East Asians, Europeans & Latin Americans, and Diaspora Africans, are all distinct clusters in both mortality and GRS.
Since the heritability is not high, but only moderate, and even this correlation is imperfect, one can still argue that the disparity is attributed to environment. But to be honest the South Asian prediction along with the relationship to pigmentation regions indicates to me that the GRS is capturing something real in population differences due to a combination of demographic history and natural selection.
Moving on from CaP, these academic debates about whether disparities are driven by genes, environment or both (or an interaction), miss the bigger picture that due to the contingencies of history different populations probably have different risks in late-in-life diseases. The South Asian risk for cardiac and metabolic illnesses is so extreme that I think most people won’t deny that that is a real thing (in particular since there is variation within South Asia for this judging by British medical data).
David Hume stated that “reason is, and ought only to be the slave of the passions.” I don’t know about the ought part, that’s up for debate. But the is part seems empirically true. The reasons people give for this or that are often just post hoc rationalizations. To give a different twist to this contention, others have argued that reason exists to win arguments, not converge upon truth. Or more precisely in my opinion to give the patina of erudition or abstraction to sentiments which are fundamentally derived from emotion or manners enforced through group norms (ergo, the common practice of ‘educated’ people citing scholars whose work we can’t evaluate to buttress our own preconceptions; we all do it).
One of the reasons I recommend In Gods We Trust, and cognitive anthropology more generally, to atheists and religious skeptics is that it gives a better empirical window into the mental processes that are really at work, as opposed to those which people say are at work (or, more unfortunately, those they think are at work). In In Gods We Trust the author reports on research conducted where religious believers are given a set of factual assertions purportedly from scholarship (e.g., the Dead Sea Scrolls). These assertions on the face of it flatly contradict their religious beliefs in some deep fundamental way. But when confronted with facts which seem to logically refute the coherency of their beliefs, they often still accept the validity of the scholarship before them. When asked about the impact on their beliefs? Respondents generally asserted that the new facts strengthened their beliefs.
This is one reason that cognitive anthropologists term religious ‘reasoning’ quasi-propositional. It takes the general form of analysis from axioms, but ultimately the rationality is beside the point; it is simply one arrow in the quiver of a broader and deeper cognitive phenomenon.
To give a personal example which illustrates this: many, many years ago I passingly knew a Jewish girl of Modern Orthodox background. She once asserted to me that the event of the Holocaust strengthened her belief in her God. I didn’t follow through on this discussion, as it was too disturbing to me. But it brought home to me that in some way the “reasoning” of many religious people leaves me totally befuddled (and no doubt vice versa).
As it happens, in the course of writing this post, I found out that Hugo Mercier and Dan Sperber, the authors of the above argument in relation to reason and argumentation, published a book last month, The Enigma of Reason. I encourage readers to get it. I just bought a Kindle copy. Dan Sperber, whom I interviewed 12 years ago, is a very deep thinker on the level of Daniel Kahneman. He’s French, and his prose can be somewhat difficult, so I wonder if that’s one reason he’s not nearly as well known.
Ultimately the point of this post actually goes back to genomics and history. Anne Gibbons has an excellent piece in Science, There’s no such thing as a ‘pure’ European—or anyone else. In it she draws on the most recent research in human population genomics to refute antiquated ideas about the purity of any given population. If you have read this blog for the past few years you already know most human populations are complex admixtures; that is, it isn’t a human family tree, but a human family graph.
Gibbons’ piece attacks directly some standard racialist talking points which have been refuted on a factual basis by genetic science:
When the first busloads of migrants from Syria and Iraq rolled into Germany 2 years ago, some small towns were overwhelmed. The village of Sumte, population 102, had to take in 750 asylum seekers. Most villagers swung into action, in keeping with Germany’s strong Willkommenskultur, or “welcome culture.” But one self-described neo-Nazi on the district council told The New York Times that by allowing the influx, the German people faced “the destruction of our genetic heritage” and risked becoming “a gray mishmash.”
In fact, the German people have no unique genetic heritage to protect. They—and all other Europeans—are already a mishmash, the children of repeated ancient migrations, according to scientists who study ancient human origins. New studies show that almost all indigenous Europeans descend from at least three major migrations in the past 15,000 years, including two from the Middle East. Those migrants swept across Europe, mingled with previous immigrants, and then remixed to create the peoples of today.
First, let’s set aside the political question of welcoming on the order of one million refugees to Germany. I will not post comments discussing that.
As a point of fact the truth genetically in relation to Germans is even more complex than what Gibbons asserts. When I worked with FamilyTree DNA I had access to their database and presented at their yearly conference some interesting results from people whose four grandparents were from Germany. In short, Germans tended to fall into three main clusters, one that was strongly skewed toward people from some parts of France, another which was shifted toward Scandinavians, and a third which was very similar to Slavs.
The historical and cultural reasons for this are easy to guess at. The takeaway here is that unlike Finns, or Irish, and to a great extent Scandinavians and Britons, Germany exhibits a lot of population substructure within it because of assimilation or migration in the last ~1,000 years. This is why genetically saying someone is “German” is very difficult when compared to saying someone is Polish or Swedish. By dint of their cultural expansiveness Germans are everyone and no one set next to other Northern Europeans* (with the exception perhaps of the French…I’m sure Germans will appreciate this comparison!).
The conceit of these sorts of pieces is that racists will confront refutations which will shatter their racist axioms. But since most of the people who write these pieces and read Science are not racists, they won’t have a good intuition on the cognitive processes at work for genuine racists.
This causes problems. As a comparison, many atheists seem to think that refutation of the Athanasian creed will blow Christians away and make them forsake their God (or showing them contradictions in the Bible, admit that you’ve gone through that phase!). Though the Church Father Tertullian’s assertion that he “believed because it is absurd” is more subtle than I often make it out to be, on the face of it it does reflect how outsiders view a normative social group like Christianity.
The emphasis here is on normative. Social or religious movements and sentiments are often about norms, which emerge at the intersection of history, intuition, instinct, and facts. I place facts last in the list, because I think it is a defensible stance to take that facts are the least important variable!
The field of cultural evolution has shown that group cohesion and communal norms have been major drivers of human evolution. Likely there has been gene-cultural coevolution so that group conformity has been selected for as a way to make social units operate more smoothly. Social cognition is a thing; people believe what they believe because other people in their social groups believe something, not because they’ve reasoned to it themselves. Original reasoning is hard. Letting others derive for you, and plugging and chugging, is easy. As Muhammad stated, the Ummah will not agree upon error! The smarter people are, the better they are at reasoning…but also the better they are at motivated reasoning, ignorance, and rationalization.
When faced with disconfirming evidence some people can dig in and deny the plain facts. Creationists are a straightforward case of this. Then there are evaders. From what I have seen on the political Left in the United States at least over the last 15 years (when I’ve been engaging actively with people on the internet) there has been a consistent pattern of obfuscation and dodging the likely reality of sex differences in many quarters. When pinned down on the fundamentals few deny the principle or the possibility, but they almost always impose an extremely high level of skepticism that is not found in other domains, where their epistemology is far less stringent.
But then there is a third case, where facts that at first blush seem to you to refute a belief only strengthen it for someone with whom you already disagree. I am generally of the view that the rise of naturalistic science has probably undermined the case for classical supernaturalist theism, which emerged in the pre-modern era. Reasonable people can disagree, as I have smart religious friends who are also scientists. Some of these people, like Francis Collins, will even assert that modern findings which boggle the mind and shock our intuitions confirm and strengthen their belief in pre-modern religious systems!
My point is not to take a strong stance on science and religion. Rather, it is to say that when you present evidence and declare “I refute you thus!”, they may simply respond “Aha! You have proven my point!”
…Implicit in this research is the assumption that the geographic locations of people today are informative about the geographic locations of their ancestors in the distant past. However, it is now clear that long-range migration, admixture and population replacement have been the rule rather than the exception in human history. In light of this, we argue that it is time to critically re-evaluate current views of the peopling of the globe and the importance of natural selection in determining the geographic distribution of phenotypes. We specifically highlight the transformative potential of ancient DNA. By accessing the genetic make-up of populations living at archaeologically-known times and places, ancient DNA makes it possible to directly track migrations and responses to natural selection.
From this, are we to conclude that white nationalism would decline from marginal to non-existent in the past three years? A review of the empirical data does not seem to support that proposition. Therefore, a naive model that white nationalism is predicated on facts about racial purity may be wrong.
The responses that I have seen (often in the form of comments I don’t publish on this weblog) are denial/rejection, confusion, reinterpretation and vindication (along with standard issue racial insults directed toward me, their colored cognitive inferior). As with the religious case I have a difficult time “putting myself” in the shoes of a racialist of any sort, so I don’t totally understand how they’re getting from A to B, but in their own minds they are.
Let’s reaffirm what’s going on here: white racial consciousness in the United States has exploded on the public scene over the past three years, just as scientists have come to the very strong conclusion that the “white European race” as we understand it is an artifact of the last ~5,000 years or so.**
We need to go back to Hume, and the anthropological understanding of what reason is. Reason is a tool to confirm what you already hold to be true and good. If reason falsifies in some way what you hold to be true and good, that does not mean for most people that reason is where they will stand. Likely there will be some subtle reinterpretation, but magically reason will support their presuppositions. Ask the descendants of the followers of William Miller about falsification.
The fact is that very few people in the world know about David Reich and his research. I know this personally because I’m a voluble evangelist, and many geneticists, even human geneticists, are not aware of the revolution in historical population genetics that ancient DNA has wrought. I do not know any Nazis personally, but I suspect that perhaps their knowledge of human phylogenomics is not at the same level as a typical geneticist.
Of course this sort of logic about logic cuts both ways. Before 2010 I actually assumed, as did most human geneticists who took an interest in these topics, that human populations had long been resident in their region of current occupation for tens of thousands of years. When I read Reconstructing Indian Population History by David Reich I was shocked out of my prior model, because the inferences were so ingenious and plausible, and the updated story of how South Asians came to be actually made a lot of anomalies make a lot more sense. When Lazaridis et al. posted Ancient human genomes suggest three ancestral populations for present-day Europeans on biorxiv in December of 2013 I was far more surprised, because I had always assumed that the thesis that most European ancestry dated to the Pleistocene in any given region was a robust one. Both the phylogeography from mtDNA and Y pointed to a Pleistocene origin.
But the data were compelling. It’s one thing to make inferences on present day genetic distribution, it’s another to actually genotype ancient individuals (remember, I can reanalyze the data myself, and have done so numerous times). Lazaridis et al. and Priya Moorjani’s Genetic Evidence for Recent Population Mixture in India totally changed my personal life. All of a sudden my wife and I were far closer emotionally and spiritually because we understood that the TMRCA of many segments in our autosomal genome was about 5-fold closer than I had assumed!!!***
Actually, the last sentence is a total fiction. The history which changed how I understood my wife and I to be related in a historical population genetic sense had zero impact on our relationship. That’s because we’re not racists, and race doesn’t really impact our relationship too much (the fact that my parents are Muslim, well, that’s a different issue….). Sorry Everyday Feminism. This is not an uncommon view, though perhaps not as common as we’d assumed of late (actually, as someone who has looked at the fascinating interracial dating research, I pretty much understood that what people say is quite different than what they do; anti-racism is the conformist thing to do, so people will play that tune for a while longer).
Just because the state of the world is one particular way, it does not naturally follow that it should be that way, or that it always will be that way. Most ethical religions saw in slavery an aspect of injustice; rational arguments aside, on some level extension of empathy and sympathy makes its injustice self-evident. But they accepted that it was an aspect of the world that was naturally baked into the structure of reality. The de jure abolition of slavery today does not mean it has truly gone away, but its practice has certainly been curtailed, and much of the cruelty diminished. Theories of human nature or necessities of economic production at the end of the day gave way to changing mores and values. Facts about the world became less persuasive when we decided to let them no longer dictate tolerance of slavery.
All that I say above in relation to how humans use reason does not leave scientists or journalists untouched. All humans have their own goals, and even though they see through the glass darkly, they see in the visions beyond what they want to see. The cultural and theoretical structure of modern science is such that some of these impulses are dampened and human intuitions are channeled in a manner so that theories and models of the world seem to correspond to reality. But I believe this is deeply unnatural, and also deeply fragile. When moving outside of their domain of specialty scientists can be quite blind and irrational. Even when one steps away at a mild remove in terms of domain knowledge this becomes clear, such as when Linus Pauling promoted Vitamin C. And motivated reasoning can creep into the actions of even the greatest of scientists, such as when R. A. Fisher rejected the causal connection between tobacco and cancer.****
I will end on a frank and depressing note: I believe that the era of public reason and fealty to empirical standards in at least official capacities is fading. Social cognition, tribal logic, is on the rise. But we have to remember that in the historical perspective social cognition and tribal logic ruled the day. They are the norm. This age, when we abide by public reason, is the peculiarity in the sea of polemic. Ultimately it may be the fool who fixates on being right or wrong, as opposed to being on the winning team. I hope I’m wrong on this.
Addendum: I have written a form of this post many times.
* The current chancellor of Germany has a Polish paternal grandfather.
** If Middle Easterners are included as white we can extend the time horizon much further back, but that seems to defeat the purpose of white nationalism in the United States….
*** I had assumed that the western affinity in South Asians had diverged from Europeans during the Last Glacial Maximum. It turns out some of it may be as recent as ~4,500 years ago or so.
**** This may have been unconscious as opposed to malicious, as Fisher was keen on tobacco personally.