The Facts About Elizabeth Warren’s DNA test

With Warren dropping out of the race for the Democratic nomination, a lot of people on podcasts I listen to are making fun of her DNA test. Unfortunately, some falsehoods are being promoted. This is kind of scary for me because this is a field I know well, and it's disturbing to watch falsehoods become accepted truths simply because people repeat them over and over again.

– First, the DNA test was not done through 23andMe, etc., or any standard commercial service. Rather, it was done by the Bustamante group at Stanford. This group has a lot of experience with the genetics of indigenous peoples of the Americas, so that is presumably why they were approached.

– Second, Warren surely has more than the expected amount of ancestry (for a white American) derived from people who were resident in the New World prior to 1492. The Bustamante group used relatively stringent criteria that are not comparable in an apples-to-apples manner with the inferences of 23andMe.

I am not here addressing the issue of whether she is or isn’t a Cherokee, or descended from Cherokees. The tests can’t answer those questions for both scientific and socio-political reasons. I’m also not addressing whether she used her identification with that tribe in furthering her career.

My only point in putting this post up is that it's really disturbing to see "pundits" repeating, without any malice, "facts" you know are totally wrong, because the information ecosystem is such that false facts rapidly transmute into conventional wisdom. Basically, when you see this happening you start to disbelieve everything…

Here is an old post, Elizabeth Warren Carries Native American DNA – She’s Running!.

Note: I have a piece about personal genomics that should be in the print edition of National Review in early April. Update: it's up.

A genetic history of the human race is not controversial science, nor is it fraught

Recently I was talking to a journalist about genetic genealogy, and we both agreed that Christine Kenneally's The Invisible History of the Human Race: How DNA and History Shape Our Identities and Our Futures will soon need an update. Though the book was published in 2015, much of the research in it dates to well before then. In the last few years, personal genomics has gone from a sector of millions of customers to tens of millions. In the years after 2020 it will go to hundreds of millions.

And yet I’m not sure the educated public is ready to understand what a genomic future is going to look like.

This is why I think that the Elizabeth Warren DNA story is important to get right. The reality is that this isn't really about Elizabeth Warren's ancestry; it's a story at the intersection of high politics and culture wars, and genetics is getting caught in the undertow. Recently I heard Ben Shapiro comment that Warren likely had "maybe 1/1024th Native American." Actually, I think it's very likely she has 0.5 to 1% Native American ancestry (read this Elizabeth Warren DNA post for why I say that). Not to be trite, but facts don't care about Ben Shapiro's feelings. I know he's not a fan of Warren, but he shouldn't be laundering misrepresentations.
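A back-of-the-envelope way to see the gap between "1/1024th" and 0.5 to 1%: under the usual simplifying assumption that each generation halves the expected contribution of a single ancestor, one ancestor n generations back contributes 2^-n of your genome on average. This is a minimal sketch with illustrative function names; actual inherited amounts scatter around this expectation because of recombination, and real family trees can have more than one such ancestor.

```python
import math

def expected_fraction(generations: int) -> float:
    """Expected share of the genome from ONE ancestor this many generations back."""
    return 0.5 ** generations

def generations_for_fraction(fraction: float) -> float:
    """Invert the halving rule: generations back that give this expected share."""
    return -math.log2(fraction)

print(expected_fraction(10))  # 0.0009765625, i.e. 1/1024
for f in (0.01, 0.005):
    # prints "0.01 6.64" then "0.005 7.64"
    print(f, round(generations_for_fraction(f), 2))
```

So 1/1024 corresponds to a single ancestor ten generations back, while 0.5 to 1% corresponds, under the same idealized rule, to roughly six to eight generations.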

Even in the comments of this website, motivated reasoning cropped up when the original Warren story became a national sensation. Many on the Right side of the spectrum laughed at the results and interpreted them in the least generous terms. The falsehoods and misunderstandings promoted by the media, often inadvertently because most journalists don't have the skills to navigate the science, were injected into the conservative memesphere. Shapiro has admitted, to his own chagrin, his lack of a science background, and I suspect that if I explained it to him he wouldn't use the "maybe 1/1024th Native American" line. He doesn't need to. If you are a conservative, there are many reasons to be critical of Elizabeth Warren.

But I can’t blame Shapiro too much. He was reacting to this story in The New York Times, Elizabeth Warren Stands by DNA Test. But Around Her, Worries Abound. In this piece, the attacks on Warren are coming from the Left and Native American activists. There is a real story here. The Boston Globe has published an editorial warning her not to run. The air has changed around her.

From the piece in The Times:

Warren’s presidential ambitions, she has yet to allay criticism from grass-roots progressive groups, liberal political operatives and other potential 2020 allies who complain that she put too much emphasis on the controversial field of racial science — and, in doing so, played into Mr. Trump’s hands.

Ms. Warren’s allies also say she unintentionally made a bigger mistake in treading too far into the fraught area of racial science — a field that has, at times, been used to justify the subjugation of racial minorities and Native Americans.

There is "racial science" only in the sense that there is "evolution science" or "Creation science." The term is not used by any scientist I know of; it comes from critics and polemicists. The New York Times, whether consciously or not, is going to convince a lot of scientifically illiterate people who don't read its science pages that there is a field of "racial science." (Using the term "race science" liberally is a thing on the Left; it reminds me of social conservatives who used to call everyone who was not an evangelical Protestant a "non-Christian.")

What went on in the Warren case is:

  1. Not scientifically controversial
  2. But scientifically new

Here is a review, A comprehensive survey of models for dissecting local ancestry deconvolution in human genome, which looks at "20 methods or tools to deconvolve local ancestry." There may be disagreement about the best method for various reasons, but there is no disagreement that local ancestry deconvolution is possible. It is not controversial. In fact, it is rather important in areas such as admixture mapping for diseases.
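To make "local ancestry deconvolution" less abstract, here is a deliberately minimal sketch. It is not one of the 20 methods in that review and not the pipeline behind Warren's report: it slides non-overlapping windows along one haplotype, scores each window's log-likelihood under allele frequencies from two reference panels, and labels the window with the better-scoring panel. The panel labels "EUR" and "NAT", the window size, and all frequencies are hypothetical; real tools model linkage, smooth with hidden Markov models, and report probabilities rather than hard calls.

```python
import math
from typing import Dict, List

def window_log_likelihood(alleles: List[int], freqs: List[float]) -> float:
    """Log-likelihood of observed 0/1 alleles given per-SNP alt-allele frequencies."""
    eps = 1e-3  # keep frequencies away from 0 and 1 so log() is always defined
    total = 0.0
    for a, p in zip(alleles, freqs):
        p = min(max(p, eps), 1 - eps)
        total += math.log(p if a == 1 else 1 - p)
    return total

def assign_windows(haplotype: List[int], freqs_by_pop: Dict[str, List[float]],
                   window: int = 4) -> List[str]:
    """Label each non-overlapping window with the best-scoring reference panel."""
    labels = []
    for start in range(0, len(haplotype), window):
        end = start + window
        scores = {
            pop: window_log_likelihood(haplotype[start:end], f[start:end])
            for pop, f in freqs_by_pop.items()
        }
        labels.append(max(scores, key=scores.get))
    return labels

# Hypothetical panel frequencies at 8 SNPs where the two panels differ strongly.
panels = {
    "EUR": [0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1],
    "NAT": [0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9],
}
hap = [1, 0, 1, 0, 0, 1, 0, 1]  # first window looks EUR-like, second NAT-like
print(assign_windows(hap, panels))  # ['EUR', 'NAT']
```

Because these made-up panels differ sharply at every SNP, each window is easy to call; with two genetically similar panels the scores would be close and the calls much noisier.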

The science isn't that hard to explain at a high level. The figure to the left is from a new paper on the genetics of the New World (using ancient DNA). What you see is that some human populations are isolated from other human populations. For example, the last substantial common ancestry of pre-1492 Native American populations and Northern Europeans dates to between 20,000 and 40,000 years ago.

Tens of thousands of years of genetic separation result in genetic distinctiveness. This is a standard, long-established population genetic model. When populations come back together and mix, the daughter population is clearly going to be a genetic mix of the two parent populations. But the human genome is a sequence of about three billion base pairs, and the mixing exhibits discrete patterns within the genome.
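The logic of "separation produces distinctiveness, and mixing produces an intermediate" can be sketched with a toy Wright-Fisher model: two populations start with identical allele frequencies, drift apart independently, and a population formed by mixing them has an expected frequency equal to the weighted average of its sources. The population size, number of generations, number of loci, and the 75/25 mix are arbitrary illustrative choices, not estimates for any real population.

```python
import random

def drift(freq: float, pop_size: int, generations: int, rng: random.Random) -> float:
    """Toy Wright-Fisher drift: each generation, resample 2N gene copies at the current frequency."""
    copies = 2 * pop_size
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
    return freq

def admixed_freq(p1: float, p2: float, share_from_1: float) -> float:
    """Expected allele frequency in a population founded by mixing two sources."""
    return share_from_1 * p1 + (1 - share_from_1) * p2

rng = random.Random(42)
n_loci = 30
# Both populations start at frequency 0.5 at every locus, then drift in isolation.
pop_a = [drift(0.5, 50, 100, rng) for _ in range(n_loci)]
pop_b = [drift(0.5, 50, 100, rng) for _ in range(n_loci)]
# A hypothetical 75%/25% mix of the two diverged populations.
mixed = [admixed_freq(a, b, 0.75) for a, b in zip(pop_a, pop_b)]

mean_diff = sum(abs(a - b) for a, b in zip(pop_a, pop_b)) / n_loci
print(f"mean |pA - pB| after isolation: {mean_diff:.2f}")
```

Every locus in the mixed population sits between its two sources, which is the population-level picture; the genome-level picture of discrete segments comes from recombination.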

Humans are diploid, which means we have two copies of each gene. These genes are aligned along homologous chromosomes. You inherit one homolog from your father and one from your mother. These two homologs are the basis for Mendel's Law of Segregation.

When sex cells (sperm and eggs) are formed, they carry only one of the homologs. They are haploid, with single gene copies. If they weren't, you'd end up tetraploid instead of diploid. As it is, you get one gene copy from the mother and one gene copy from the father.

But during meiosis, before the sex cells are formed, the homologs undergo recombination. In humans, that means stretches of homologous chromosomes are swapped. A typical human meiosis involves roughly 20 to 40 recombination events across the genome. A concrete way to think about it is that the individual producing sperm or eggs is taking the chromosomes they inherited from their parents and mixing them together, so the final set of chromosomes is a synthetic combination of the chromosomes of the grandparents.
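A minimal sketch of that mixing for a single chromosome: label the two homologs a parent carries by which grandparent they came from, pick a few crossover points at random, and switch which homolog is copied at each point. The chromosome length in arbitrary "sites" and the number of crossovers are illustrative; real crossovers follow a genetic map and show interference, which this toy ignores.

```python
import random
from typing import List

def make_gamete(homolog_a: List[str], homolog_b: List[str],
                n_crossovers: int, rng: random.Random) -> List[str]:
    """Copy one homolog, switching to the other at each randomly chosen crossover point."""
    length = len(homolog_a)
    points = sorted(rng.sample(range(1, length), n_crossovers))
    source = rng.choice([0, 1])  # which homolog the gamete starts copying from
    gamete: List[str] = []
    prev = 0
    for point in points + [length]:
        template = homolog_a if source == 0 else homolog_b
        gamete.extend(template[prev:point])
        source, prev = 1 - source, point  # a crossover switches the template
    return gamete

rng = random.Random(7)
from_grandpa = ["GP"] * 100  # homolog the parent inherited from their father
from_grandma = ["GM"] * 100  # homolog the parent inherited from their mother
gamete = make_gamete(from_grandpa, from_grandma, n_crossovers=2, rng=rng)
print(gamete.count("GP") / len(gamete))  # share of this chromosome from grandpa
```

With zero crossovers the gamete carries one grandparent's homolog intact; each crossover adds one more switch between grandparental stretches.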

Purple segments half-identical to paternal grandfather

To make this concrete, to the left is a partial depiction of one of my children's chromosomes and their relatedness to my father. The purple regions are genomic stretches where the child is half-identical to the paternal grandfather. The light gray sections have no genetic descent from my father. The reason is that one of the child's homologs is from the maternal side. The other homolog is from me, and any given stretch of it could be from either my father or my mother. Where the purple alternates with light gray, you see clearly where recombination events happened, as my paternal and maternal homologs broke and rejoined to produce sperm with novel chromosomes (e.g., my contributed chromosome 11 is 90% my father and 10% my mother, while chromosome 19 is more balanced).
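Reading a plot like that is essentially run-length encoding: walk along the chromosome, note where the label flips between "shared with grandfather" and "not shared," and add up the lengths. The labels below are made up, loosely mimicking a chromosome that is 90% from the grandfather; they are not data from the figure.

```python
from itertools import groupby
from typing import List, Tuple

def runs(labels: List[str]) -> List[Tuple[str, int]]:
    """Collapse per-site labels into (label, run_length) segments."""
    return [(label, len(list(group))) for label, group in groupby(labels)]

def switch_points(labels: List[str]) -> List[int]:
    """Indices where the label changes: candidate recombination breakpoints."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

def fraction(labels: List[str], target: str) -> float:
    """Share of sites carrying the target label."""
    return labels.count(target) / len(labels)

# Hypothetical painting: 'P' = half-identical to paternal grandfather, '-' = not.
chrom = list("PPPPPPPPP-")
print(runs(chrom))           # [('P', 9), ('-', 1)]
print(switch_points(chrom))  # [9]
print(fraction(chrom, "P"))  # 0.9
```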

But that's not the only way to look at recombination. To the right is a 23andMe ancestry painting of a friend of mine who is ~25% East Asian and ~75% Northern European. On the chromosome shown you see two homologs. The blue segments are Northern European. The dark brown segments are East Asian. Notice the alternation between European and East Asian on one of the homologs: this chromosome is almost certainly from the parent who is 50% East Asian and 50% European. There was a recombination event in which an "East Asian" homolog, inherited from that parent's East Asian parent, recombined with a "European" homolog, inherited from that parent's European parent.
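The ~25% figure falls straight out of a painting like this: add up the painted length of each ancestry across both homologs and divide by the total. The segment coordinates are hypothetical, chosen so that one homolog is a European/East Asian mosaic and the other is entirely European, mirroring the friend's pattern on a made-up 100 Mb chromosome.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start, end, ancestry label)

def ancestry_proportions(homologs: List[List[Segment]]) -> Dict[str, float]:
    """Length-weighted share of each ancestry label across all homologs."""
    totals: Dict[str, float] = defaultdict(float)
    for segments in homologs:
        for start, end, label in segments:
            totals[label] += end - start
    grand_total = sum(totals.values())
    return {label: length / grand_total for label, length in totals.items()}

from_mixed_parent = [(0, 30, "EastAsian"), (30, 80, "European"), (80, 100, "EastAsian")]
from_european_parent = [(0, 100, "European")]
print(ancestry_proportions([from_mixed_parent, from_european_parent]))
# {'EastAsian': 0.25, 'European': 0.75}
```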

The resulting chromosome is a physically new sequence, with alternating segments of East Asian and European ancestry. Just as the whole genome bears the imprint of a population's genetic history, segments of the genome also exhibit distinctiveness due to their origins. Because each generation introduces new recombination events, the lengths of these distinct ancestry blocks can tell you how many generations ago the admixture may have happened.
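A rough rule of thumb behind "block length tells you the timing": each meiosis adds on the order of one crossover per Morgan (100 centimorgans), so a segment passed down unbroken from one ancestor through g meioses is, on average, about 100/g cM long. This sketch assumes that simple exponential model with no crossover interference and no minimum-length cutoff for detecting segments; real dating methods are considerably more careful, and the numbers here are illustrative.

```python
def mean_segment_cm(meioses: int) -> float:
    """Expected length (cM) of a segment inherited from one ancestor after this many meioses."""
    return 100.0 / meioses

def meioses_from_mean_cm(mean_cm: float) -> float:
    """Invert the rule of thumb: rough number of meioses implied by a mean segment length."""
    return 100.0 / mean_cm

print(mean_segment_cm(8))        # 12.5
print(meioses_from_mean_cm(20))  # 5.0
```

The direction is the useful part: older admixture means more meioses, which means shorter blocks.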

That's the theory. What's new is that genomic technology now allows scientists to assess patterns of local ancestry far more extensively than was possible even 15 years ago. With hundreds of thousands of variant genomic positions, scientists can now map regions of the genome at an incredible level of granularity, deploying a theoretical understanding of Mendelian and population genetics that dates back to the early 20th century.

To look at Elizabeth Warren's genome and discover that a small segment of a particular length derives from a Native American population is not a "controversial field of racial science." This sort of analysis is becoming de rigueur in much of medical genetics, in large part because population history has a major impact on disease susceptibility. To be fair, local ancestry deconvolution between populations that are much, much closer genetically because of recent shared history is difficult. But Warren's is not one of those cases!

Honestly, I don't know what the outcome of The New York Times calling this "racial science" is going to be, seeing as it seems likely that in the next few years >100 million Americans will have done ancestry tests. Many scientists, fairly, do criticize the interpretations of these tests and how the public perceives them. But the underlying models and methods are workaday.

It is the interpretation, and how it interacts with social and political values, that is fraught. The link in the phrase "controversial field of racial science" actually goes to an article where social and political commentators and activists react to Warren's decision to take the DNA test. There is no discussion of the science at all. It's controversial because of what they believe the implications are, not because the science is faulty or unsound.

For example, many (though not all) Native Americans object to the idea of using genetic science to shed light on the status of particular individuals as Native American or not. The decision to take this DNA test, in an environment where many already privately grumbled about Warren's claims, was clearly a political and public relations misstep. But that says nothing about whether the science itself is sound or unsound.

Conservatives will be highly skeptical of Warren because of her policy positions. And, if the above article is correct, some on the Left are now against her because of her impolitic foray into Native American identity. That is all fine, and not much of a concern of mine. But when non-science journalists get their hands on a science story, they tend to mess it up, and that is a problem in the long term. The sands of politics and society are protean and always shifting. Science is something more solid, and we should not muddy the waters.

Elizabeth Warren carries Native American DNA – she’s running!

Since I’ve talked about this issue before, Warren releases results of DNA test:

There were five parts of Warren’s DNA that signaled she had a Native American ancestor, according to the report. The largest piece of Native American DNA was found on her 10th chromosome, according to the report. Each human has 23 pairs of chromosomes.

“It really stood out,” said Bustamante in an interview. “We found five segments, and that long segment was pretty significant. It tells us about one ancestor, and we can’t rule out more ancestors.”

He added: “We are confident it is not an error.”

The proportion of ancestry is not large. But it is clearly there. They compared her to the Utah white (CEU) and British (GBR) 1000 Genomes populations, which are a good standard for Old Stock Anglo-Americans. She's clearly an outlier, with about an order of magnitude more "Native American" ancestry. So it's unlikely to be some artifact.
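The "she's an outlier relative to the reference individuals" argument can be phrased as an empirical comparison: run the same estimate on every reference individual, then ask what fraction of them reach the test individual's value, and how many times larger her value is than the reference mean. The reference numbers below are invented placeholders, not 1000 Genomes results, and the test value simply echoes the ~0.4% figure discussed later in this post.

```python
from statistics import mean
from typing import List

def empirical_tail(reference: List[float], observed: float) -> float:
    """Fraction of reference individuals whose estimate is >= the observed value."""
    return sum(1 for r in reference if r >= observed) / len(reference)

def fold_over_reference(reference: List[float], observed: float) -> float:
    """How many times larger the observed value is than the reference mean."""
    return observed / mean(reference)

# Placeholder per-individual "Native American" fractions for a hypothetical panel.
reference_panel = [0.0002, 0.0005, 0.0, 0.0003, 0.0004, 0.0001, 0.0006, 0.0003]
test_value = 0.004
print(empirical_tail(reference_panel, test_value))  # 0.0
print(round(fold_over_reference(reference_panel, test_value), 1))
```

With a real panel you would want many more individuals, and the tail fraction only means something if the test individual and the reference individuals went through the identical pipeline.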

There is some talk in the article about the lack of reference populations. But remember, the key is to identify Native American ancestry, and Native American lineages should all coalesce back 10,000 to 15,000 years ago. Because the divergence from Northern Europeans is much deeper than that, Native American segments are going to jump out against the genetic background.

So does Elizabeth Warren have Native American ancestry? 99% sure that that is a yes. Is she going to run? Well, I wouldn’t say 99%, but that seems likely too….

(I doubt she’ll do it, but it would be neat if she released her raw results)

Update: Here’s the technical report.

Update II: Some quick responses to comments. I’m going to address the genetic aspects. I’ll leave the cultural and political angles to others.

  • The analyst, whom I know personally as well as by reputation, did exactly what I'd have expected him to do with this data. So there's nothing atypical in terms of method or analytic pipeline. You can download and use the tools yourself!
  • The number of markers used in the analysis, 660,000, is a good number. It is most definitely sufficient for the local ancestry analysis done here (and probably, on some level, necessary to gain a high level of confidence).
  • Some people wonder about the sample size of the reference populations. Is the number sufficient? Yes, for the purposes of this analysis and the scope of the questions asked. You aren't looking for recent relatives; you are looking for a good representation of the genealogical networks from a given geography/ethnicity. The Utah whites are a well-known, industry-standard sample set. The British data set in the 1000 Genomes is also pretty well known. Both seem representative of people of Northwest European heritage, a set of populations that are genetically very similar to each other.
  • People are asking about the robustness of this result. One thing to remember when comparing reference sets against an individual is that the genetic distance between the reference sets is important. Applying local ancestry to an individual of Dutch ancestry with training sets of German and English heritage is going to produce results, but the training sets themselves are going to overlap in some ways. Now, if you take someone with Dutch ancestry and do local ancestry for English vs. Javanese ancestry, you're going to get really clear results in comparison.
  • Some serious individuals are questioning the representativeness of the European panel and the Native American panel, as well as the lack of Siberian groups, who are closely related to Native Americans. But we know that Warren's family background is such that a shift toward a Northeast Asian group is likely to be Native American, not Chukchi. Further analysis could confirm this, but the most likely hypothesis is that this is a woman of Northwest European ancestry with some Native American ancestry. Other models could fit these results, but they are not likely models in the first place (also, the PCA on Native American groups makes it likely that she is not Siberian, and she is not shifted toward the northern groups).
  • A huge issue is that people are worried about the representativeness of the Native American groups. First, if you are looking for someone with indigenous North American ancestry, Mexican groups are sufficient. If anything, this will reduce your power to detect such ancestry, not produce false positives. Second, look at the plot: Warren's haplotype is positioned between Canadian and Mexican natives.
  • People are comparing this local ancestry method, which assigns segments of the genome to particular populations with a probability, to the point estimates provided in most consumer genomics results. From what I can see, they assigned 0.4% of Warren's genome as Native American, but 8% was not assigned. This unassigned portion is almost certainly mostly European, but some of it may be Native as well. Basically, the method here was less about assigning a specific proportion and more about testing whether it was likely she had detectable indigenous American ancestry (she did), and the range of periods in which that ancestry could have admixed into the Northern European genetic background. This is not comparable to the estimates you are getting from personal genomics tests.
  • One way you can try to assess whether these are artifacts is to compare an individual to populations of known ancestry and see the distribution of empirical results. Warren's results are very atypical compared to Northern European reference sets. If this were a "false positive" due to the training sets, then you would expect the same type of problem to crop up when you test sample individuals from those reference sets.
  • Some are asking whether Warren is just a typical white American. You would need to do apples-to-apples comparisons, but my intuition is that she's not. Most Old Stock white Americans probably have a genealogical relationship to Native Americans, but they may carry no detectable segments of Native American DNA because the connection is too far back. Warren is part of the minority of white Americans who have detectable Native American ancestry.
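The gap between "0.4% assigned as Native American" and "8% not assigned" comes from calling segments only when the method is confident. A minimal sketch of that bookkeeping, assuming each segment comes with a length and per-population posterior probabilities (a made-up input format, not the report's actual output) and a hypothetical 0.9 calling threshold:

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, Dict[str, float]]  # (length in cM, {population: posterior})

def summarize_calls(segments: List[Segment], threshold: float = 0.9) -> Dict[str, float]:
    """Share of total length confidently assigned to each population, plus 'unassigned'."""
    totals: Dict[str, float] = {"unassigned": 0.0}
    for length, posteriors in segments:
        best_pop = max(posteriors, key=posteriors.get)
        label = best_pop if posteriors[best_pop] >= threshold else "unassigned"
        totals[label] = totals.get(label, 0.0) + length
    grand_total = sum(totals.values())
    return {label: length / grand_total for label, length in totals.items()}

# Hypothetical segments with posteriors for two panels.
segments = [
    (60.0, {"EUR": 0.99, "NAT": 0.01}),
    (30.0, {"EUR": 0.70, "NAT": 0.30}),  # ambiguous: left unassigned at 0.9
    (10.0, {"EUR": 0.02, "NAT": 0.98}),
]
print(summarize_calls(segments))
# {'unassigned': 0.3, 'EUR': 0.6, 'NAT': 0.1}
```

Lowering the threshold moves the ambiguous segment into "EUR," which is why a thresholded local ancestry summary and a consumer point estimate answer different questions.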

Basically, I think it is very likely that Warren has Native American ancestry. Follow-up analysis would probably just increase our confidence.