Ambiguity in self-classification of ancestry and its problem with disease risk

Genetic ancestry analysis on >93,000 individuals undergoing expanded carrier screening reveals limitations of ethnicity-based medical guidelines:

Self-reported ethnicity was an imperfect indicator of genetic ancestry, with 9% of individuals having >50% genetic ancestry from a lineage inconsistent with self-reported ethnicity. Limitations of self-reported ethnicity led to missed carriers in at-risk populations: for 10 ECS conditions, patients with intermediate genetic ancestry backgrounds—who did not self-report the associated ethnicity—had significantly elevated carrier risk. Finally, for 7 of the 16 conditions included in current screening guidelines, most carriers were not from the population the guideline aimed to serve.
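The abstract turns on two quantities. An individual's majority lineage is whichever lineage contributes more than 50% of their genetic ancestry, if any does. They count as discordant when that lineage differs from their self-reported ethnicity. The following is a minimal sketch of both calculations, not the paper's pipeline. The function names and toy records are hypothetical, and the sketch assumes that self-reported groups and ancestry lineages share the same labels. It counts people with no majority lineage as non-concordant but not discordant, which may differ from how the paper handled them.

```python
from collections import defaultdict

def majority_lineage(proportions, threshold=0.5):
    """Return the lineage contributing more than `threshold` of ancestry, or None."""
    for lineage, fraction in proportions.items():
        if fraction > threshold:
            return lineage
    return None

def concordance_by_group(records):
    """Per self-reported group: share whose majority lineage matches the self-report."""
    totals, matches = defaultdict(int), defaultdict(int)
    for self_reported, proportions in records:
        totals[self_reported] += 1
        if majority_lineage(proportions) == self_reported:
            matches[self_reported] += 1
    return {group: matches[group] / totals[group] for group in totals}

def discordant_share(records):
    """Share of everyone whose >50% lineage exists and differs from their self-report."""
    discordant = sum(
        1
        for self_reported, proportions in records
        if majority_lineage(proportions) not in (None, self_reported)
    )
    return discordant / len(records)

# Toy records (invented for illustration): (self-reported ethnicity, ancestry proportions).
records = [
    ("Northern European", {"Northern European": 0.92, "Southern European": 0.08}),
    ("Middle Eastern", {"Middle Eastern": 0.40, "Southern European": 0.60}),
    ("Ashkenazi Jewish", {"Ashkenazi Jewish": 0.55, "Northern European": 0.45}),
    ("Ashkenazi Jewish", {"Ashkenazi Jewish": 0.45, "Northern European": 0.45,
                          "Southern European": 0.10}),
]
print(concordance_by_group(records))
print(discordant_share(records))
```

With real data, the arithmetic is the easy part. The hard part is mapping requisition-form categories onto ancestry lineages.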

It’s long been known that a certain percentage of people, for various reasons, give an ethnic background that doesn’t match what their genome shows. This is a problem because the frequency of carriers varies by population. I first began to be curious about this issue 15 years ago. Today, in my opinion, there is no excuse for a hospital not to genotype.

The wholesale cost of a SNP array is $25, and there are pretty simple turn-key ancestry inference algorithms. This doesn’t need to be an issue. This is 2020.
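The core of ancestry inference can be quite simple. The following is a minimal sketch of supervised admixture estimation. Given allele frequencies for a set of reference populations, it estimates what fraction of one person's genome comes from each, using a plain expectation-maximization loop. It fits a simplified version of the likelihood model behind tools such as ADMIXTURE, but it is not their code or their optimizer. The assumptions are deliberately simplifying:

* SNPs are unlinked.
* Reference frequencies are treated as known and lie strictly between 0 and 1.
* Each allele copy is drawn independently from the ancestry mix.
* The data are simulated rather than real.

The function names and `n_iter=100` are illustrative choices.

```python
import random

def estimate_admixture(genotypes, ref_freqs, n_iter=100):
    """EM estimate of ancestry proportions with reference allele frequencies held fixed.

    genotypes: alt-allele count (0, 1 or 2) at each SNP.
    ref_freqs: ref_freqs[l][k] = alt-allele frequency of SNP l in reference population k,
               assumed strictly between 0 and 1.
    """
    n_pops = len(ref_freqs[0])
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        expected = [0.0] * n_pops
        for g, freqs in zip(genotypes, ref_freqs):
            # E-step: posterior probability that an alt (or ref) allele copy came from population k.
            alt_w = [qk * fk for qk, fk in zip(q, freqs)]
            ref_w = [qk * (1.0 - fk) for qk, fk in zip(q, freqs)]
            alt_total, ref_total = sum(alt_w), sum(ref_w)
            for k in range(n_pops):
                expected[k] += g * alt_w[k] / alt_total + (2 - g) * ref_w[k] / ref_total
        # M-step: new proportions = expected share of the 2 * L allele copies.
        q = [e / (2 * len(genotypes)) for e in expected]
    return q

def simulate_genotypes(true_q, ref_freqs, rng):
    """Simulate one admixed person: each allele copy picks an ancestral population, then an allele."""
    pops = range(len(true_q))
    genotypes = []
    for freqs in ref_freqs:
        g = 0
        for _ in range(2):
            k = rng.choices(pops, weights=true_q)[0]
            g += rng.random() < freqs[k]
        genotypes.append(g)
    return genotypes

rng = random.Random(1)
ref_freqs = [[rng.uniform(0.05, 0.95) for _ in range(2)] for _ in range(3000)]
genotypes = simulate_genotypes([0.7, 0.3], ref_freqs, rng)
q_hat = estimate_admixture(genotypes, ref_freqs)
print([round(x, 2) for x in q_hat])  # should land near the simulated [0.7, 0.3]
```

Real pipelines typically add linkage-disequilibrium pruning, reference frequencies estimated from labeled samples, and uncertainty estimates. This sketch shows only the core likelihood step.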


16 thoughts on “Ambiguity in self-classification of ancestry and its problem with disease risk”

  1. The gist of the article is quite different IMVHO. It’s not advocating testing for “biological ancestry” to make more precise medical decisions, but rather giving all patients exactly the same panel of tests, regardless of their self-reported ancestry. And with regard to carrier testing for recessive conditions, that sort of makes sense, doesn’t it? Adding a few dozen more loci to the screen isn’t gonna cost a whole lot, and little harm is done to patients by reporting carrier status for more diseases.

    The push “not to assume that someone’s risks are lower just because of their self-classified ancestry” is much broader. A very recent NEJM piece insists that we should treat patients as high-risk for heart, kidney and other conditions if the main factor behind their low risk estimate is their ancestry. Its Table 1 lists more conditions with ancestry-dependent reduced-risk estimates questioned (but not discussed) by the authors, such as breast cancer. It is well known that preventive interventions for high-cancer-risk patients carry especially significant risks of comorbidities, but the current momentum is clear: it’s better to overtreat than to deny care based on ancestry factors.

    Of course, unlike the Nature paper you cite, the NEJM piece is heavy on denial that there is anything biological about ancestry at all. We may cringe at such a blanket denial, but it seems that most regular people (who aren’t interested in ancestry DNA testing in the first place) shrug off DNA results as some kind of non-scientific nonsense. Take a look, for example, at this study, published earlier this year. You should ignore the inflammatory title and the shamefully biased questionnaire design … but one basic conclusion stands: folks who don’t want their DNA ancestry tested tend to disbelieve and disparage their test results.

    In short, testing the patients’ DNA ancestry instead of trusting their self-id just seems to be a recipe for great patient dissatisfaction (to put it mildly). Of course many precise clinical decisions (especially DNA-based clinical decisions such as those using notoriously ancestry-dependent polygenic risk scores) are impossible without adequately measuring ancestral origins of the genome, but how to reconcile it all is a $64,000 question.

    Apropos the denialism, I can’t help noticing that this manifesto, just days old, parades Lewontin 1972 on its banner (as well as a 2002 paper which scored tiny pieces of the genome for “diversity”, studding its African panel with pygmies and Khoe-San tribes to achieve the desired result). I remember that there was a cool discussion of these issues here after the “Fallacy” paper came out, but most of the old links are dead. I wonder if anyone has tried reapplying these old methodologies to the G1K sets (especially Lewontin’s, since he was so biased towards high-frequency variation identified in Europeans). I understand that the methodologies themselves were kind of lame and kind of meaningless, but since the old results occupy such an outsize spot in the denialist conscience, perhaps re-running the analyses with better data might help change some minds?

  2. I agree this is the sort of information you want to know, but can you realistically control who else knows it? What are your thoughts on the privacy implications?

  3. Genotyped ancestry determinations could also be assigned precisely defined identifier labels that do not overlap directly with census definitions or other definitions of race, ethnicity or ancestry used in other contexts. That would avoid the minefield of inconsistent plain-English terminology, whose acceptability shifts over time with changing linguistic and political norms, in this highly sensitive area.

  4. Details:

    “Concordance between SRE and GAmajority was highest in those who self-reported as Northern European (96.9%), Hispanic (96.4%), South Asian (96.3%), or Southeast Asian (95.8%).

    Concordance was lowest (<90%) in those self-reporting as Middle Eastern (59.2%), Ashkenazi Jewish (80.2%), or Southern European (84.0%).

    We also observed wide distributions of the genetic ancestry proportions within SRE groups both above and below the 50% threshold, suggesting that genetic ancestry level can vary among individuals in the same SRE and that substantial genetic ancestry may still be present even if GAmajority and SRE do not match. Nearly one-third of patients (31%) had a SRE of Mixed or Other Caucasian (either by directly selecting this option on the requisition form or by selecting more than one ethnicity), with most having GAmajority of European.”

  5. btw I left a comment with 3 links and it seems to have disappeared into moderation purgatory? Can you check? Are 3 links too many?

  6. Wouldn’t it be better to test for the actual allele rather than ancestry alone, if testing at all? I mean, I’m fine with ancestry testing, but quite obviously, testing for the real cause, namely a specific genetic variant, is more solid than estimating ancestry. That will surely be the future, if there is positive techno-civilisational development. The only problems are data protection and abuse.

  7. “Wouldn’t it be better to test for the actual allele rather than ancestry alone, if testing at all?”

    The notion is that it can make sense to test, e.g., for sickle cell risk in people with African ancestry within the last 600 years, but not for everyone else. For a test of a given accuracy, the ratio of false positives to false negatives is a function of the allele frequency in the population being tested.

    But cheap genetic testing could easily be done once for everybody in childhood. It would determine ancestry and the alleles that are then known and common enough to scan for. The catch is that your mass-produced testing chip might not include some allele discovered to have functional importance thirty years after your childhood DNA test. Cheap testing is cheap, in part, because it doesn’t test loci where 99.9% of all humans in all known populations are known to be the same (which is a huge proportion of the genome).

    Indeed, even if you have privacy concerns, you could do a one-time test on a kid, assign an ancestry mix or cluster, point out the identified risks, throw the source data out, and still get the same benefits.

    Your genetically determined ancestry cluster will be stable and could be used to put studies of allele frequency on more solid ground than SRE (self-reported ethnicity). Your DNA-test-based ancestry could also be used to decide whether someone should be tested for a newly discovered functional allele that was not included in old testing. An example would be an allele found at high frequency in South Asian Brahmins and rare in everyone else.

    This is a somewhat different construct than Razib proposes in the OP. It focuses on establishing genetic ancestry clusters in childhood and on using those clusters in lieu of SRE in allele frequency studies and phenotype risk studies.

    But it addresses the concern you identify (if you’re going to test, why not just test for what you’re looking for) in a sleeker way. Also, if you regress allele frequency on ancestry using mixed-ancestry individuals, you can mathematically compute individualized risk vs. reward and false positive to false negative odds for people of mixed ancestry. That needs nothing more complex than a cell phone app added on top of allele frequency data from idealized single-ancestry individuals. Such individuals don’t actually exist, and in some of the purest cases they have inbreeding coefficient issues.

    This is an especially big deal in the case of, for example, African Americans and Native Americans. In those populations actual ancestry percentages vary greatly, so the expected allele frequencies, and the need for preventive testing, vary from individual to individual. A test that might make sense for one person might not make sense for their cousin.

  8. Genetic testing is pretty rare at my hospital. I generally only see it done for sickle-cell patients, and a few others.

  9. It’s rare everywhere, and in some European countries it’s even difficult to get anything other than chromosomal defects tested prenatally. We are far from testing for genetic predispositions later in life.
    Even meaningful, recommended medical tests are rarely done unless they are absolutely necessary, with a clear indication. On the other hand, many routine tests are done over and over again, whether or not they make sense for the patient in question. It’s all about what the routine is and what the insurance company pays for without asking.

  10. Not sure how many readers are medical. As a practitioner dealing with increased disease incidence and severity in different racial groups, I don’t really pay that much attention to these types of studies. They lack clinical usefulness.
    Years ago I was taught that a complete family history going back 4 generations gives you all the risk you need to know without making a racial assumption. A genetic test is an incomplete correlation.
    You would be better off knowing what your parents, grandparents and great-grandparents died from, as well as all their siblings, and particularly any infant deaths.
    Great examples of this are bleeding disorders in Polynesian families, etc. Bad examples are attempts to apply small-group prevalence data to larger populations, such as BRCA breast cancer risk. Yes, if you’re Eastern European Jewish it contributes risk, but it is a little more uncertain among Anglo immigrants to Australia.

  11. complete family history going back 4 generations gives you all the risk

    I am sorry, Doctor, have you read the article, or have you ever heard about recessive conditions or carrier status?

    Being a carrier of a recessive mutation doesn’t give you, or your ancestors, any risk. The risk is to your future children if the other parent happens to carry a similar recessive mutation.

    The paper lists nearly 100 such recessive conditions, most of which are, unfortunately, only tested in the ethnic groups in which the carrier frequencies are higher.

    As a practitioner, you probably never counseled couples before they conceive a child, and it’s fine. It is someone else’s medical specialty. The rhetorical question is, should you dish out medical advice in the specialties you don’t seem to know much about, and don’t even want to read?

  12. Dx “I am sorry, Doctor, have you read the article, or have you ever heard about recessive conditions or carrier status?”

    Surprisingly, I did read the article, and I do happen to know about recessive carrier status. In addition, male infertility, with its attendant inherited issues, is part of my specialty. Your comment gives me pause as to whether you read the paper, or my comment, in full. If you had, you would have seen the following in the middle of the introduction.

    “For instance, 39.6% of individuals cannot identify the ancestry of all four grandparents, limiting the ability of SRE to reflect genetic ancestry”

    My comment was that a detailed family history of 4 generations, including siblings and infant causes of mortality, reveals the vast majority of recessive conditions. Taking that history is time-consuming, and most of the time it’s skipped. This paper argues: don’t trust the patient, they are poor self-reporters, and our special gene test can help you out. The authors don’t do a great job of letting the reader know which of these are clinically relevant or prevalent diseases.

    The first recessive example I mentioned was bleeding disorders among Polynesian families: recessive factor VII/VIII deficiencies with variable penetrance and epigenetic expression. Knowing associated SNP frequencies doesn’t predict which family members bleed. Within the same family, you can have a positive genetic test with a minimal clinical phenotype.

    The second example I mentioned was discussed in the article: BRCA mutations in individuals of Ashkenazi Jewish descent. Yes, that group is at risk for ovarian/breast cancer (and others). But clinically, if they stay in surveillance there is no impact on life expectancy. The question of how applicable that is to, say, the Anglo population in Australia with BRCA then becomes legitimate. There doesn’t seem to be the same prevalence of breast cancer among BRCA carriers in that population as among AJ carriers. It’s increased, but not to the same magnitude (nowhere near). Why? Beats me. But clearly there has to be some as-yet-unidentified LOF/GOF variant that’s protective. I wouldn’t want to go around scaring people into making poorly informed decisions on the basis of a positive genetic test. Consider, for example, Angelina Jolie, who underwent a bilateral mastectomy and ovary removal on some questionable advice.

    Lastly, check the authorship. The senior author is a Myriad Labs employee, and the first 3 authors also worked for Myriad. I’ve used Myriad’s services on and off when indicated. Keep in mind that Myriad patented the BRCA1/2 genes in 1995, before losing a famous Supreme Court case in 2013. Its business model is a monopoly on genetic testing. I would take this paper with a teaspoon of salt.

  13. Ah, I must apologize: you did read the paper, but seeing it through the prism of ignorance and conspiracy theories about genetics must have clouded your understanding.

    Factor VIII deficiency isn’t autosomal recessive; it’s X-linked. Each son of a carrier mother has a 50% chance of being affected, regardless of the father’s carrier status. This is why it can be diagnosed on the basis of family history.

    This isn’t the case with autosomal recessive traits. If the mutation frequency isn’t extremely high, then about the only way one might see its footprint in family history is through consanguinity in the family, which is too rare in most cultures to help.

    Even autosomal dominant conditions like the BRCA disorders are often hidden in family history. Males and even younger females aren’t affected severely enough to make the family history strong enough to qualify for automatic inclusion in a high-risk group.

    Variations in penetrance which you mention are another reason why family history assessment may miss the warning signs. By the way, it’s usually argued that the Ashkenazi Jewish founder mutations have similar or possibly slightly lower penetrance than average, although it’s hard to remove biases of ascertainment in today’s clinical experience which uses different selection thresholds for different ethnicities. But it isn’t higher by any means. And variations in penetrance between sky high and just high should never be construed as a license to ignore the gene!

  14. Right Dx,

    Not sure how much medicine or genetic experience you have with real world patients and families.

    Your summary of inherited bleeding disorders is a little incomplete. Classical hemophilia is typically taught as X-linked Factor VIII/IX deficiency, with females rarely affected. I’ll point you to the 4th paragraph of the Wikipedia article on hemophilia: “some females with a nonfunctional gene ... may be mildly symptomatic”. Clinically it’s more than some, and it can be severe. The Polynesian disorder I was referring to was Von Wilibran’s. VWD comes in both AD and AR patterns. VWD is more common than hemophilia, it’s autosomal and can be recessive. On top of this, it’s estimated that 20% of all bleeding disorders are de novo, so testing a parent makes little difference, unless you want to get into circulating fetal DNA testing.

    BRCA certainly affects men. About 5 years ago it was listed by the American Urological Association as the only identified inherited risk for prostate cancer, and all men with high-grade prostate cancer should be checked for BRCA. There is guidance to consider testing the sons. Funny thing: many of these studies were funded by … Myriad Labs!

    If you check their list in this paper, the vast majority of diseases are metabolic: things like maple syrup urine disease, etc. You might think consanguineous unions are uncommon … unless you’ve worked in a major pediatric facility where the 4th question on the intake form came right after name, birthdate and parents’ names: is this child the product of a consanguineous union? (I’m not kidding, and it’s quite amazing what answers come back when you ask.)

    Family histories really matter. Genetic panel testing ignores non-functional genes, etc.

    Everybody needs to do a better job of knowing who they are. And I don’t mean psychologically. I mean genetically. Know where your great grandparents were born. Know what they died of, know what diseases your grandparents and hopefully their siblings had, know what your parents and their siblings died of.

  15. I don’t want to be a bore, but believe me, there are important differences between autosomal dominant and recessive diseases. One of the most important practical differences is that it is nearly impossible to foresee a recessive condition in a newborn child from the family history of the disease in the ancestors.

    Why so? Because the child needs to inherit two copies of the defective gene to be affected. The parents have one copy each (ever heard of carrier status?), and so did some of their parents, and so on. One. NOT TWO. The chances that some of the more distant ancestors married other carriers and produced affected children are so slim … ugh, I don’t know if I need to waste more time trying to explain the obvious.

    The Polynesian disorder I was referring to was Von Wilibran’s. VWD comes in both AD and AR patterns. VWD is more common than hemophilia, it’s autosomal and can be recessive.

    Von Willebrand. Which of the many forms of the disease is present in Polynesians? Give me a reference, please. The very rare Type 3 VWD is recessive, but it accounts for hardly more than 5% of Von Willebrand’s cases (typically 2-3 cases per million population). There are about 2000 VWD diagnoses in Australia, implying that there may be 100 Type 3 cases at most. I bet you got it wrong. If there was a Polynesian family with a VWD bleeding issue going back generations, then it’s got to be the dominant, and more common, Type 1.
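Comment 7 makes two claims. First, whether a screening test makes sense depends on the allele frequency of the people being tested. Second, for mixed-ancestry individuals that frequency can be computed from their ancestry fractions. The following is a minimal sketch of that arithmetic under several assumptions:

* The population labels and allele frequencies are made up.
* Each allele copy is treated as an independent draw from the person's ancestry mix (Hardy-Weinberg style).
* The test's sensitivity and specificity are hypothetical inputs.

The function names are illustrative and do not come from any library.

```python
# Hypothetical risk-allele frequencies in two made-up reference populations.
RISK_ALLELE_FREQ = {"pop_A": 0.05, "pop_B": 0.002}

def admixed_allele_frequency(ancestry, freqs):
    """Ancestry-weighted risk-allele frequency for one person.

    ancestry maps population -> fraction of genome (fractions sum to 1).
    """
    return sum(fraction * freqs[pop] for pop, fraction in ancestry.items())

def carrier_probability(q):
    """P(exactly one risk allele), treating both allele copies as independent draws."""
    return 2 * q * (1 - q)

def couple_affected_child_risk(q_mother, q_father):
    """Per-pregnancy risk of an affected child: both parents carriers, then 1/4."""
    return carrier_probability(q_mother) * carrier_probability(q_father) / 4

def expected_errors(n_tested, carrier_rate, sensitivity, specificity):
    """Expected (false positives, false negatives) when screening n_tested people."""
    false_pos = n_tested * (1 - carrier_rate) * (1 - specificity)
    false_neg = n_tested * carrier_rate * (1 - sensitivity)
    return false_pos, false_neg

cousins = {
    "cousin_1": {"pop_A": 0.8, "pop_B": 0.2},
    "cousin_2": {"pop_A": 0.1, "pop_B": 0.9},
}
for name, ancestry in cousins.items():
    q = admixed_allele_frequency(ancestry, RISK_ALLELE_FREQ)
    carrier = carrier_probability(q)
    # Screening 10,000 people who share this cousin's carrier rate, with a hypothetical test.
    fp, fn = expected_errors(10_000, carrier, sensitivity=0.99, specificity=0.999)
    print(f"{name}: q={q:.4f} carrier={carrier:.4f} FP={fp:.1f} FN={fn:.1f}")
```

With these made-up numbers, the two cousins' carrier probabilities differ more than fivefold. In a group with the lower carrier rate, false positives outnumber false negatives several times over. That is the trade-off the comment describes.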
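The disagreement in comments 13 to 15 turns on Mendelian segregation. An X-linked recessive allele shows up in the sons of carrier mothers whatever the father carries. An autosomal recessive condition needs two carrier parents. The following is a minimal sketch that enumerates the possibilities. It assumes a single locus, full penetrance and no new mutations, all of which are simplifications; the variable penetrance raised in the thread is not modeled. The allele labels and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def autosomal_offspring(mother, father):
    """Offspring genotype probabilities at one autosomal locus.

    Each parent is a two-character genotype; "a" marks the recessive risk allele.
    Every pairing of one maternal and one paternal allele has probability 1/4.
    """
    probs = {}
    for m, f in product(mother, father):
        child = "".join(sorted(m + f))  # "aA" and "Aa" are the same genotype
        probs[child] = probs.get(child, 0) + Fraction(1, 4)
    return probs

def x_linked_risk(mother, father_x):
    """P(affected son), P(affected daughter) for an X-linked recessive allele "h".

    Sons get one maternal X and the father's Y, so the father's X does not matter.
    Daughters are affected only if both the maternal and paternal X carry "h".
    """
    son = sum((Fraction(1, 2) for m in mother if m == "h"), Fraction(0))
    daughter = son if father_x == "h" else Fraction(0)
    return son, daughter

def carrier_child_risk(partner_carrier_freq):
    """Per-child risk for a known autosomal recessive carrier with a random partner."""
    affected_if_both_carriers = autosomal_offspring("Aa", "Aa").get("aa", 0)
    return partner_carrier_freq * affected_if_both_carriers

print(autosomal_offspring("Aa", "Aa"))   # carrier x carrier
print(x_linked_risk("Hh", "H"))          # carrier mother, unaffected father
print(carrier_child_risk(Fraction(1, 25)))
```

Take a hypothetical 1-in-25 carrier frequency in the partner's population. A known carrier's per-child risk then works out to 1 in 100. An affected relative in an earlier generation would have required the same coincidence back then. That is why comment 15 argues that family history rarely reveals autosomal recessive conditions.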
