When journalists get out of their depth on genetic genealogy

For some reason The New York Times tasked Gina Kolata to cover genetic genealogy and its societal ramifications in With a Simple DNA Test, Family Histories Are Rewritten. The problem here is that, to my knowledge, Kolata doesn’t cover this as part of her beat, and so isn’t well equipped to write an accurate and in-depth piece on the topic as it relates to the science.

This is a general problem in journalism. I notice it most often when it comes to genetics (a topic I know a lot about for professional reasons) and the Middle East and Islam (topics I know a lot about because I’m interested in them). It’s unfortunate, but it has also made me a lot more skeptical of journalists whose track record I’m unfamiliar with.* To give a contrasting example, Christine Kenneally is a journalist without a background in genetics who nevertheless is immersed in genetic genealogy, so that she could have written this sort of piece without objection from the likes of me (she did write a book on the topic, The Invisible History of the Human Race: How DNA and History Shape Our Identities and Our Futures, which I had a small role in fact-checking).

What are the problems with the Kolata piece? I think the biggest issue is that she didn’t go in to test any particular proposition, and leaned on the wrong person for the science. She quotes Joe Pickrell, who knows this stuff like the back of his hand. But more space is given to Jonathan Marks, an anthropologist who is quite opinionated and voluble, and so probably a “good source” for any journalist.

Marks seems well respected in anthropology from what I can tell, but he’s also the person who put up a picture of L. L. Cavalli-Sforza juxtaposed with a photo of Josef Mengele in the late 1990s during a presentation at Stanford. Perhaps this is why anthropologists respect him, I don’t know, but I do not like him because of his nasty tactics (I wouldn’t be surprised if Marks, had he the power, would make sure people like me were put in political prison camps; his rhetoric is often so unhinged).

Marks’ quotes wouldn’t be much of an issue if Kolata could figure out when he’s making sense, and when he’s just bullshitting. But she can’t. For example:

…“tells me I’m 95 percent Ashkenazi Jewish and 5 percent Korean, is that really different from 100 percent Ashkenazi Jewish and zero percent Korean?”

The precise numbers offered by some testing services raise eyebrows among genetics researchers. “It’s all privatized science, and the algorithms are not generally available for peer review,” Dr. Marks said.

The part about precise numbers is an issue, though a lot less of an issue with high density SNP-chips (the real issue is sensitivity to reference population and other such parameters). But if a modern test says you are 95 percent Ashkenazi Jewish and 5 percent Korean it really is different from 100% Ashkenazi. Someone who comes up as 5% Korean against an Ashkenazi Jewish background is most definitely of some East Asian heritage. In the early 2000s with ancestrally informative markers and microsatellite based tests you’d get somewhat weird results like this, but with the methods used by the major DTC companies (and in academia) today these sorts of proportions are just not reported as false positives. Marks may not know because this isn’t his area, but Pickrell would have. Kolata probably did not think to double-check with him, but that’s because she isn’t able to smell out tendentious assertions. She has no feel for the science, and is flying blind.

Second, Marks notes that the science is privatized, and it isn’t totally open. But it’s just false that the algorithms are not generally available for peer review. Not all the details of the pipeline are downloadable on GitHub, but the core ancestry estimation methods are well known. Eric Durand, who wrote the original 23andMe Ancestry Composition methodology, presented on it at ASHG 2013. I know because I was there during his session.

You can find a white paper for 23andMe’s method and Ancestry’s. Not everything is as transparent as open science would dictate (though there are scientific papers and publications which also mask or hide elements which make reproducibility difficult), but most geneticists with domain experience can figure out what’s going on and if it is legitimate. It is. The people who work at the major DTC companies often come out of academia, and are known to academic scientists. This isn’t blackbox voodoo science like “soccer genomics.”

Then Marks says this really weird thing:

“That’s why their ads always specify that this is for recreational purposes only: lawyer-speak for, ‘These results have no scientific standing.’”

Actually, it’s lawyer-speak for “do not sue us, as we aren’t providing you actionable information.” Perhaps I’m ignorant, but lawyers don’t get to define “scientific standing”.

The problem, which is real, is that the public is sometimes not entirely clear on what the science is saying. This is a problem of communication from the companies to the public. I’ve even been in scientific sessions where geneticists who don’t work in population genomics have weak intuition on what the results mean!

Earlier Kolata states:

Scientists simply do not have good data on the genetic characteristics of particular countries in, say, East Africa or East Asia. Even in more developed regions, distinguishing between Polish and, for instance, Russian heritage is inexact at best.

This is not totally true. We have good data now on China and Japan. Korea also has some data. Using haplotype-based methods you can do a lot of interesting things, including distinguishing someone who is Polish from someone who is Russian. But these methods are computationally expensive and require lots of information on the reference samples (Living DNA does this for British people). The point is that the science is there. Reading this sort of article is just going to confuse people.

On the other hand a lot of Kolata’s piece is more human interest. The standard stuff about finding long lost relatives, or discovering your father isn’t your father. These are fine and not objectionable factually, though they’ve been done extensively before and elsewhere. I actually enjoyed the material in the second half of the piece, which had only a tenuous connection to scientific detail. I just wish these sorts of articles represented the science correctly.

Addendum: Just so you know, three journalists who regularly cover topics I can make strong judgments on, and are always pretty accurate: Carl Zimmer, Antonio Regalado, and Ewen Callaway.

* I don’t follow Kolata very closely, but to be frank I’ve heard from scientist friends long ago that she parachutes into topics, and gets a lot of things wrong. Though I can only speak on this particular piece.

17 thoughts on “When journalists get out of their depth on genetic genealogy”

  1. The NYTimes formerly employed Nicholas Wade as its beat writer on genetics and evolution. Wade is 75. His swan song was “A Troublesome Inheritance: Genes, Race and Human History” in which he confessed to crimethink.

    Wade “argued that human evolution has been “recent, copious, and regional” and that genes may have influenced a variety of behaviours that underpin differing forms of human society. The book was criticised in the New York Times Book Review of Sunday 13 July; David Dobbs wrote that it was “a deeply flawed, deceptive, and dangerous book” with “pernicious conceits”. However, Edward O. Wilson of Harvard University said of the book in a book-cover comment: “Nicholas Wade combines the virtues of truth without fear and the celebration of genetic diversity as a strength of humanity, thereby creating a forum appropriate to the twenty-first century.”

    https://en.wikipedia.org/wiki/Nicholas_Wade

    I assume that the social constructionists have taken over at the NYTimes, and that they would not dream of hiring someone knowledgeable about genetics.

    Wait a sec, didn’t they deep six your columns, Razib?

  2. Would you recommend Living DNA over, say, 23andMe? My maternal grandmother’s line came to the US before the Revolutionary War, so there is some British in there (family lore says they came from the Lake District, but who really knows).

  3. Would you recommend Living DNA over, say, 23andMe? My maternal grandmother’s line came to the US before the Revolutionary War, so there is some British in there (family lore says they came from the Lake District, but who really knows).

    from what i’ve heard recombination in american british ppl after many generations results in weird outcomes. so mixed reviews on that…

  4. Apologies for asking this basic question, but are there resources that you or anyone would particularly recommend for non-scientists who want to dig a bit deeper into their personal genome as reported by e.g. 23andme?

    I just got back my results from them (I sent away for it after seeing the post here about the $50 Prime Day special, so thanks) and was surprised that seemingly Google doesn’t return all that many credible-looking hits if you search for a particular term in your ancestry.

    For example, I wanted to learn more about one particular part of the Y chromosome DNA, which 23andme reported as “E-L29,” and the most detailed Google hit appeared to be this Reddit post quoted below by someone who is almost as confused as I am. Is it really the case that the same thing can have three different names depending on who is writing about it? Thanks in advance.

    https://www.reddit.com/r/slatestarcodex/comments/6n8qnp/friday_fun_thread_for_jul_14th_2017_how_the_west/

    >>> My patrilineal haplogroup, however, was really weird: E-L29. I’ll break up some insights into a few chunks. The first confusing thing I had to sort out was the name “E-L29” was hard to sort out, especially since 23&me gave me the history of a different haplogroup called “E-M123,” and I didn’t know how they connected. Further, when I researched online, there was a THIRD name that came up: E1b1b1c1a. Oh, and sometimes “E-L29” was also called E-M84? As it turns out, there is some lack of consistency in naming conventions, so one group calls a haplogroup one thing and another something else.

  5. To the person who posted this, did you reach out to the reporter and offer your expertise in the field after reading her article, or did you just post the blog piece pointing out where they got it wrong? I’m asking because I think that we should be building bridges with the media, not burning them. Yes, it is important to set the record straight. But this post isn’t a bridge builder – this is a clear burn. You have a great deal to offer writers on this topic, but this isn’t the way to sell your expertise and accuracy. Those of us who desire and expect accuracy need to do some outreach so that reporters – whether they are experts in the field or not – have a ready pool of authorities who know the science, that they know are approachable, accessible and available. This post doesn’t do that. It’s important that we get it right with them now, so we can help them get it right in the future.

  6. I’m asking because I think that we should be building bridges with the media, not burning them.

    kolata talked to someone who knows everything there is to know, joe pickrell. but she barely quoted him, instead going with someone who doesn’t know anything.

    as for “reaching out”, i am pretty well known to the media on this topic. they call/contact me regularly, and i provide fact-checking/review of copy at no cost if they ask. (i am friends with carl zimmer at THE NEW YORK TIMES who writes on this sort of topic now and then) e.g., https://medium.com/matter/23-and-you-66e87553d22c and i’ve written on the topic myself:

    http://www.slate.com/articles/health_and_science/human_genome/2013/10/analyze_your_child_s_dna_which_grandparents_are_most_genetically_related.html

    kolata probably didn’t know to ask me and she shouldn’t be assigned to this story.

    also, i have had plenty of experience with the media where they have already come to some conclusion, and are just looking for quotes to fit their story. if i don’t provide what they want, they just don’t put the quotes in. i don’t think kolata is that instance, but it’s common enough in my personal experience that i have zero hesitation burning the media.

    my experience is playing nice doesn’t work.

  7. As for me when I consider how many hands a report goes through before it’s finally published, I quake at what an opinion can do to truth. The burden of truth is on the scientists AND the news network regardless of how “hard it is to get it right.”

  8. Speaking of East Asian DNA in “full” Ashkenazic Jews, which Jonathan Marks failed to understand or explain the sources of (not Korean) or even to present in its correct percentage (5% is too high), I got into “Nicholas Wade mode” when I wrote (for free) my popularized (but not dumbed-down) article “The Chinese Lady Who Joined the Ashkenazic People” in Jewish Times Asia’s March 2015 issue @ https://issuu.com/jewishtimesasia/docs/mar2015/19

    I didn’t notice other journalists writing about the main scientific paper I discussed in my article. Wade had retired from the New York Times 3 years earlier.

    This month, I had the opportunity to look at the LivingDNA estimates for a “full” Ashkenazi. This person scores 1.4% in their North China category. LivingDNA doesn’t have a separate Ashkenazic reference population so it breaks Ashkenazim down into lots of specific regional affinities mostly in Europe and the Middle East.

  9. ashkenazim have something besides euro+middle east (some have claimed SS-african too). but the ‘normal’ thing to do is just define them as a reference population, which masks this component. ergo, marks’ misunderstanding.

  10. The New York Times seems to make a habit of telling us that nearly every person in the USA with some sub-Saharan African ancestry identifies as “black.” Apparently we are supposed to ignore the obvious African ancestry in the Hispanic and Arab populations. Hell, why isn’t Sonia Sotomayor called the first “black” woman on the U.S. Supreme Court? Does she lack any sub-Saharan African DNA? Come on! American communities have often practiced a “Don’t Ask, Don’t Tell” philosophy in dealing with suspicious ancestry in white families or individuals.

  11. Razib,

    Thanks for this article because I hadn’t so far been able to find papers on the methods by 23andme and Ancestry. (My wife and I just used Ancestry for DNA testing in the last month, she is interested in her ancestry, I am interested in genetics).

    I will read these two papers.

    I’m still working through the excellent “Principles of Population Genetics” by Hartl & Clark (recommended by you, thanks for that one as well). Fascinating how important genetic drift is. I thought it was something boring to be ignored because selection was way more important..

    In the meantime if you have a “multi-paragraph technical bite” (as opposed to a soundbite) on the meaning of what ancestry calls “Ethnicity Estimate – Thousands of years ago” vs what they call “Genetic Communities – Hundreds of years ago” I will be very interested.

  12. “The Chinese Lady Who Joined the Ashkenazic People” from Kevin Brook of the much-discredited Khazar theory fame, no less. And this Dr. Marks seems to be a supporter too, explaining that 5% East Asian admixture, per Ancestry, in an Ashkenazi genome is a regular deal. In fact whenever these major companies “discover” an Asian component in Ashkenazi genomes, it’s hardly more than 1% and it may be “broadly Asian” but doesn’t ever map to China / Korea / Japan / Central Asia. Perhaps it’s a Siberian affinity of sorts mediated by the Slavs, or perhaps just an error of measurement. But quite revealingly, Dr. Marks does not mention Middle Eastern ancestry of the Ashkenazi Jews, and the paper doesn’t mention MyHeritage, the only major lab with strong Middle Eastern and Mediterranean datasets. It looks like he, too, is a Khazar revisionist…

  13. @AD Powell

    Supplementary Figure 9 of Bryc et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4289685/) suggests that someone with 30% African ancestry is more likely than not to self-identify as ‘African-American’, while the reverse is not true (someone with 30% European ancestry is not likely to self-identify as ‘white’).

    I suspect that numbers like this (the ancestry percentages at which someone’s self-identification ‘flips’) and the sociological labels that they switch between (i.e. ‘white’, ‘African-American’, etc.) vary substantially across cultures and over time. You’re right that it’s probably also true that different ancestries matter (i.e. that someone in the modern USA with 30% African ancestry and 70% Native American ancestry probably sees themselves differently than someone with 30% African ancestry and 70% European ancestry).

    But I think it’s fair to say that in the US the proportion of African ancestry at which someone self-identifies as ‘black’ is much lower than the proportion of European ancestry at which someone self-identifies as ‘white’.

  14. The appropriate selection of reference groups is important but it isn’t “the whole thing” about accuracy. Phasing imprecision, and penalizing short ethnic segments, are very important as well, especially when an algorithm is deciding between two genetically related ethnic groups, and/or if an ancestry component is many generations old (and thus its segments are short)

    Phasing is already based on making implicit guesses about ethnicity, as the putative haplotypes which are more common in the dataset are given preference (therefore, “common haplotypes of common ethnicities” win the easiest). Then the resulting patchwork of ethnic segments is smoothed out (and sometimes re-phased) through a penalty imposed on extremely short ethnic blocks and/or a bonus on longer ethnic segments. So if a segment of ancestry A is sandwiched between segments of ancestry B, and is reasonably likely to be B rather than A itself, then it’s painted as B rather than A.
    Extremely divergent short segments (like African in European background) will survive smoothing unscathed, but in a patchwork of Mediterranean segments, genetic similarities are such that shorter segments may be completely gobbled up by nearby longer segments of a different ethnicity, with somewhat irreproducible results.

    The best example is probably the MyHeritage algorithm, which assembled superb ethnic reference sets (all 4 grandparents of their subjects being of the same ethnicity). MyHeritage can tell Sardinians from Peninsula Italians, Ashkenazi from Sephardi, or Greek from Balkan Slavs. But shorter segment assignment likelihoods fluctuate in this sea of statistically similar possibilities, and they can no longer tell if one’s great grandfather was a Sardinian or an Italian. In fact their composition estimates are subject to inaccuracies of as much as 10% (with a similar ethnicity substituting for the “correct” one).

    When reference populations have been admixed from genetically similar sources, or continued to admix after the original pulse, relatively recently, or exhibit geographic or social class admixture clines, then the problem of accuracy becomes even more peculiar.

  15. Dap, we have known for several years now that the East Asian DNA traces in Ashkenazic people are real (and vary by the individual – some do have more than 1% while others have 0%) but that they do not come from Khazars or any other Turkic or Mongolian people, and my views from the last century have changed as new evidence has come in. I am not a partisan believer in any historical theory.

    You clearly did not read my Chinese DNA article carefully which explains the limited geographic distribution of that haplogroup M33c (and it is not found among Slavs), nor my message above where I referred to a company (LivingDNA) that did report a specific Chinese affinity rather than a broad Asian affinity, nor did you read the 2nd edition of my book which does not support a major Khazar component in Ashkenazic ancestry and mentioned plenty of evidence for Middle Eastern DNA. Also, the 3rd edition will explain how we now know that there is no Khazar component at all, neither uniparental nor autosomal.

    You mentioned MyHeritageDNA. A known cousin of mine whose recent ancestors were Ashkenazic Jews scores 0.8% in MyHeritageDNA’s “Chinese and Vietnamese” category.

  16. re: phasing. ftdna uses genotype not haplotype data. last i checked it seems ancestry doesn’t use the phasing for ethnicity; just relative matching.

    also, phasing is REALLY good obv when u have family members. though in 23andMe the use of local ancestry means generic admixed white americans get non specific results due to lots of recombination.

Comments are closed.