Ambiguity in self-classification of ancestry and its problem with disease risk

Genetic ancestry analysis on >93,000 individuals undergoing expanded carrier screening reveals limitations of ethnicity-based medical guidelines:

Self-reported ethnicity was an imperfect indicator of genetic ancestry, with 9% of individuals having >50% genetic ancestry from a lineage inconsistent with self-reported ethnicity. Limitations of self-reported ethnicity led to missed carriers in at-risk populations: for 10 ECS conditions, patients with intermediate genetic ancestry backgrounds—who did not self-report the associated ethnicity—had significantly elevated carrier risk. Finally, for 7 of the 16 conditions included in current screening guidelines, most carriers were not from the population the guideline aimed to serve.

It’s long been known that a certain percentage of people, for various reasons, give an ethnic background that doesn’t match their total genome. This is a problem because the frequency of carriers varies by population. I first began to be curious about this issue 15 years ago. Today there is no excuse, in my opinion, for not genotyping if you are a hospital.

The wholesale costs of SNP-arrays are $25, and there are pretty simple turn-key ancestry inference algorithms. This doesn’t need to be an issue. This is 2020.
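
To make the comparison in the abstract concrete, here is a minimal sketch of how concordance between self-reported ethnicity and majority genetic ancestry could be scored. It assumes you already have per-person ancestry proportions from some inference tool. The function names, the group labels, and the toy records are all hypothetical and are not taken from the paper; only the >50% cutoff comes from the abstract quoted above.

```python
from typing import Dict, List, Optional, Tuple

# One record = (self-reported label, inferred ancestry proportions).
Record = Tuple[str, Dict[str, float]]


def majority_ancestry(proportions: Dict[str, float],
                      threshold: float = 0.5) -> Optional[str]:
    """Return the label whose share exceeds `threshold`, or None if none does."""
    label, share = max(proportions.items(), key=lambda kv: kv[1])
    return label if share > threshold else None


def concordance(records: List[Record]) -> float:
    """Fraction of people whose majority ancestry equals their self-report."""
    matches = sum(
        1 for self_reported, props in records
        if majority_ancestry(props) == self_reported
    )
    return matches / len(records)


# Toy data, invented purely for illustration (hypothetical groups).
records: List[Record] = [
    ("group_a", {"group_a": 0.92, "group_b": 0.08}),
    ("group_c", {"group_b": 0.60, "group_c": 0.40}),  # majority differs
    ("group_d", {"group_d": 0.55, "group_a": 0.45}),
    ("group_b", {"group_b": 0.50, "group_a": 0.50}),  # no majority at all
]

print(concordance(records))
```

Note that someone sitting at exactly 50/50 has no majority ancestry under this rule, so they count as discordant whatever they self-report. That is one reason a single concordance number hides a lot of the distribution.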


10 thoughts on “Ambiguity in self-classification of ancestry and its problem with disease risk”

  1. The gist of the article is quite different IMVHO. It’s not advocating testing for “biological ancestry” for making more precise medical decisions, but rather for giving all the patients exactly the same panel of tests, regardless of their self-reported ancestry. And with regard to carrier testing for recessive conditions, it sort of makes sense, doesn’t it? Adding a few dozen more loci to the screen isn’t gonna cost a whole lot, and little harm is done to the patients by reporting carrier status for more diseases.

    The push “not to assume that someone’s risks are lower just because of their self-classified ancestry” goes much broader. A very recent NEJM piece insists that we should treat patients as high-risk for heart, kidney etc. conditions if the main factor behind their low risk estimate is their ancestry. Their Table 1 lists more conditions with ancestry-dependent reduced-risk estimates questioned (but not discussed) by the authors, such as breast cancer. It is well known that preventive interventions for high cancer risk patients carry especially significant risks of comorbidities, but the current momentum is clear: it’s better to overtreat than to deny care based on the ancestry factors.

    Of course, unlike the Nature paper you cite, the NEJM piece is heavy on the denial that there is anything biological about ancestry at all. We may cringe about such a blanket denial, but it seems that most regular people (who aren’t interested in ancestry DNA testing in the first place) shrug the DNA results off as some kind of non-scientific nonsense. Take a look, for example, at this study, published earlier this year. You should ignore the inflammatory title and the shamefully biased questionnaire design … but one basic conclusion stands. Folks who don’t want their DNA ancestry tested, tend to disbelieve and disparage their test results.

    In short, testing the patients’ DNA ancestry instead of trusting their self-id just seems to be a recipe for great patient dissatisfaction (to put it mildly). Of course many precise clinical decisions (especially DNA-based clinical decisions such as those using notoriously ancestry-dependent polygenic risk scores) are impossible without adequately measuring ancestral origins of the genome, but how to reconcile it all is a $64,000 question.

    Apropos the denialism, I can’t help noticing that this just days-old manifesto parades Lewontin 1972 on its banner. (As well as a 2002 paper which scored tiny pieces of the genome for “diversity”, studding their African panel with pygmies and Khoe-San tribes to achieve the desired result). I remember that there was a cool discussion of these issues here, after the “Fallacy” paper came out, but most of the old links are dead. I wonder if anyone tried reapplying these old methodologies to the G1K sets (especially Lewontin’s, since he was so biased towards high-frequency variation identified in the Europeans). I understand that the methodologies themselves were kind of lame and kind of meaningless, but since the old results occupy such an outsize spot in the denialist conscience … then perhaps re-running the analyses with the better data might help change some minds?

  2. I agree this is the sort of information you want to know, but can you realistically control who else knows it? What are your thoughts on the privacy implications?

  3. Genotyped ancestry determinations could also be assigned precisely defined identifier labels that do not overlap directly with census definitions or other definitions of race, ethnicity or ancestry used in other contexts, in order to avoid the minefield of inconsistent use of plain English terminology, whose acceptability shifts over time due to changing linguistic and political norms in this highly sensitive area.

  4. Details:

    “Concordance between SRE and GAmajority was highest in those who self-reported as Northern European (96.9%), Hispanic (96.4%), South Asian (96.3%), or Southeast Asian (95.8%).

    Concordance was lowest (<90%) in those self-reporting as Middle Eastern (59.2%), Ashkenazi Jewish (80.2%), or Southern European (84.0%).

    We also observed wide distributions of the genetic ancestry proportions within SRE groups both above and below the 50% threshold, suggesting that genetic ancestry level can vary among individuals in the same SRE and that substantial genetic ancestry may still be present even if GAmajority and SRE do not match. Nearly one-third of patients (31%) had a SRE of Mixed or Other Caucasian (either by directly selecting this option on the requisition form or by selecting more than one ethnicity), with most having GAmajority of European.”

  5. btw I left a comment with 3 links and it seems to have disappeared in a moderation purgatory? Can you check? Is 3 links too much?

  6. Wouldn’t it be better to test for the actual allele rather than ancestry alone, if testing at all? I mean I’m all fine with ancestry testing, but quite obviously testing for the real cause, namely a specific genetic variant, is more solid than estimating ancestry. That will be the future, if there is a positive techno-civilisational development, for sure. The only problem is data protection and abuse.

  7. “Wouldn’t it be better to test for the actual allele rather than ancestry alone, if testing at all?”


    The notion is that it can make sense to test, e.g., for sickle cell risk in people with African ancestry within the last 600 years, but not for everyone else, because your false positive to false negative ratio is a function of allele frequency in the population as a whole.

    But, cheap genetic testing could easily be done once with everybody as a kid to determine alleles that are then known and common enough to scan for, and for ancestry, but your mass produced testing chip might not have some allele discovered to have functional importance thirty years after your childhood DNA test is done. Cheap testing is cheap, in part, because they don’t test at loci where 99.9% of all humans in all known populations are known to be the same (which is a huge proportion of the genome).

    Indeed, even if you have privacy concerns, you could do a one-time test on a kid, assign an ancestry mix or cluster, point out the identified risks, throw the source data out, and get the same benefits.

    Your genetically determined ancestry cluster will be stable and could be used to build studies of allele frequency on more solid ground than SRE (self-reported ethnicity), and your DNA-test-based ancestry could be used to determine if someone should be tested, for example, for a newly discovered functional allele not included in older tests that is found at high frequency in South Asian Brahmins but is rare in everyone else.

    This is a somewhat different construct than Razib proposes in the OP, focusing on childhood establishment of genetic ancestry clusters and on use of genetic ancestry clusters in lieu of SRE in doing allele frequency studies and phenotype risk studies.

    But it addresses the concern you identify (if you’re going to test, why not just test for what you’re looking for) in a sleeker way. Also, if you do regressions of allele frequency on ancestry with mixed-ancestry individuals, you can mathematically compute individualized risk v. reward and false positive to false negative odds for people who have mixed ancestry. That takes nothing more complex than a cell phone app, added on to data on allele frequencies in idealized single-ancestry individuals (who don’t really exist, and who have inbreeding coefficient issues in some of the purest cases).

    This is an especially big deal in the case, for example, of African Americans and Native Americans, where actual ancestry percentages vary greatly within the population, so the expected allele frequencies, and with them the need for preventative testing, vary by individual. A test that might make sense for one person might not make sense for their cousin.
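
The individualized-risk idea in comment 7 can be sketched in a few lines. This is only a rough illustration under loud assumptions: the expected allele frequency is taken as a simple ancestry-weighted average of per-population frequencies, and carrier probability is then approximated with Hardy-Weinberg proportions, 2q(1 - q). The population names and frequencies below are invented, and the function names are hypothetical. It is not the regression approach itself, just the arithmetic a "cell phone app" would be doing once such frequencies are known.

```python
from typing import Dict


def expected_allele_frequency(ancestry: Dict[str, float],
                              pop_freq: Dict[str, float]) -> float:
    """Ancestry-weighted average of per-population allele frequencies."""
    total = sum(ancestry.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(share * pop_freq[pop] for pop, share in ancestry.items())


def carrier_probability(ancestry: Dict[str, float],
                        pop_freq: Dict[str, float]) -> float:
    """Approximate heterozygous-carrier probability as 2q(1 - q).

    Assumption: Hardy-Weinberg proportions on the blended frequency q,
    which ignores local ancestry and population structure.
    """
    q = expected_allele_frequency(ancestry, pop_freq)
    return 2.0 * q * (1.0 - q)


# Invented frequencies of a recessive risk allele in two reference groups.
pop_freq = {"pop_a": 0.05, "pop_b": 0.001}

# Two hypothetical cousins with the same self-report, different ancestry mix.
cousin_1 = {"pop_a": 0.8, "pop_b": 0.2}
cousin_2 = {"pop_a": 0.2, "pop_b": 0.8}

for name, mix in (("cousin_1", cousin_1), ("cousin_2", cousin_2)):
    print(name, round(carrier_probability(mix, pop_freq), 4))
```

Under these made-up numbers the first cousin's carrier probability is several times the second's, which is the "might not make sense for their cousin" point in numeric form.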

  8. Genetic testing is pretty rare at my hospital. I generally only see it done for sickle-cell patients, and a few others.

  9. It’s rare anywhere, and in some European countries it’s even difficult to get anything but chromosomal defects tested prenatally. We are far from testing for genetic predispositions later in life.
    Actually, even meaningful and recommended medical tests are rarely done if they are not absolutely necessary with a clear indication. On the other hand, many routine tests are done over and over again, whether they make sense for the patient in question or not. It’s all about what the routine is and what the insurance company pays for without asking.

