Is American genetic diversity enough?


In the nearly 20 years since the draft of the human genome was completed,* we’ve moved on to bigger and better things. In particular, researchers are looking to diversify their panels of human genetic diversity, because differences between groups matter. You can’t just substitute one group for another genetically.

There have been efforts to diversify the population panels recently, but that prompts the question of whether American population coverage is sufficient. My first thought is that the genetic diversity in the USA is probably getting us 90% of the way there. Consider Spencer’s comment about Queens: it’s the most ethnically diverse large conurbation in the country.

There are some gaps, though. In Who We Are, David Reich points out the distinctiveness of Indian population genetics. The subcontinent has lots of populations with large census sizes in which deleterious alleles have drifted upward in frequency due to long-term endogamy. And many of these populations don’t have strong representation in the Diaspora.

In contrast, much of the rest of the world is panmictic enough that an American panel can pick up most of the variation. American Chinese are skewed toward Guangdong and Fujian, but a substantial number of people from other parts of China have arrived in the last generation. Regional structure is not so strong that you’ll miss out on too much, aside from very rare variants, which are more at the scale of extended pedigrees than of populations.

There are small populations such as Hadza, Khoikhoi, and Pygmies in Africa which are probably going to be missed by American population panels, but the total census size of these groups is pretty low (for comparison, there are 1 million Pulayar Dalits in the state of Kerala alone). As for much of the rest of Africa, West African variation is well represented in African Americans, and Bantu and Nilotic variation is probably captured by immigrant communities.

I’d propose supplementing American genetic diversity with sampling Cape Coloureds in South Africa.

* No discussions about how the genome isn’t totally complete. I know that.

Ancestry does not always match up with appearance

A few years ago I watched a bunch of Megan Bowen’s YouTube videos about living in Korea as an expat. In one episode she explained that the reason she has a black American accent (she’s from Georgia, I think) is that she is a black American. Just a very light-skinned one.

In other videos, you can see that her skin is a little darker without typical Korean makeup, though she is still very light-skinned. And her natural hair is quite curly. But it would not be implausible to assume that she is one of the 10% or so of African Americans who are more than 50% white.

I didn’t think much about this until today. As part of my job, I watch ancestry-related YouTube videos to get a sense of how people interpret their results, and Megan Bowen showed up!

So I watched her video. There are some photos of her parents, and both look darker in complexion and more typically African American in their appearance. She also admitted that she was so light at birth that her father took a paternity test, and she was his.

The results for her ancestry came back…and she’s 65% Sub-Saharan African! This is curious because arguably Megan Bowen looks more “white” than the actress Megalyn Echikunwoke, who is 50% European (American) and 50% Nigerian (or half-Shona half-English Thandie Newton, the list could go on).

We have the genome-wide data. Megan is 65% Sub-Saharan African. And ~32% European.

Ultimately, this is a pretty clear illustration of the fact that only a subset of genes is responsible for the features we naively deem ancestrally informative: skin color, hair form, and facial features.

To the right is a plot from a paper which looked for variants affecting skin color in a Cape Verde sample. They used ~900,000 SNPs to assess ancestry, so you know the ancestry estimates are right. They also used a melanin index generated with a spectrophotometer. You see that 44% of the variation in skin color can be predicted by ancestry in this admixed population.

There’s a clear correlation between ancestry and complexion, but because the number of loci affecting variation in complexion in humans is relatively small for a polygenic trait, the relationship can get decoupled rather easily (a few large-effect genetic loci explain a lot of the rest of the variation).
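To make the decoupling concrete, here is a toy simulation. It is a sketch under loud assumptions, not the paper’s method: every trait locus is treated as a fixed difference between the two source populations, all loci have equal effect, allele copies are drawn independently, and genome-wide African ancestry is uniform between 20% and 80%. The function names and parameters are made up for illustration.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_r2(n_loci, n_people=2000, seed=1):
    """R^2 between genome-wide ancestry and a trait controlled by n_loci loci.

    ASSUMPTION: each of the 2 * n_loci allele copies is African-derived
    with probability equal to the person's genome-wide African ancestry,
    and the trait is simply the fraction of African-derived copies.
    """
    rng = random.Random(seed)
    ancestry, trait = [], []
    for _ in range(n_people):
        a = rng.uniform(0.2, 0.8)  # hypothetical ancestry distribution
        copies = 2 * n_loci
        african = sum(rng.random() < a for _ in range(copies))
        ancestry.append(a)
        trait.append(african / copies)
    return pearson_r(ancestry, trait) ** 2

r2_few = simulate_r2(5)     # trait driven by a handful of loci
r2_many = simulate_r2(200)  # trait spread over many loci
print(f"R^2 with 5 loci:   {r2_few:.2f}")
print(f"R^2 with 200 loci: {r2_many:.2f}")
```

With only a handful of loci, the luck of the draw at each one leaves a lot of trait variation that genome-wide ancestry can’t predict. Spread the trait over hundreds of loci and ancestry predicts it almost perfectly.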

If you looked at pigmentation loci in Megan Bowen and did local ancestry analysis, you’d see a strong enrichment for European segments. Far greater than the genome-wide 32%. It happens. It’s probability, not magic.
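A back-of-the-envelope version of that probability, under stated assumptions: suppose complexion were driven by 6 loci (a hypothetical number chosen for illustration, not taken from the paper), each allele copy is independently European with probability 0.32, and “strong enrichment” means at least 8 of the 12 copies are European. That is a binomial tail.

```python
from math import comb

def prob_at_least(m, n_copies, p):
    """P(X >= m) for X ~ Binomial(n_copies, p)."""
    return sum(
        comb(n_copies, k) * p**k * (1 - p) ** (n_copies - k)
        for k in range(m, n_copies + 1)
    )

p_european = 0.32      # genome-wide European fraction from the post
n_loci = 6             # ASSUMPTION: hypothetical count of major pigmentation loci
n_copies = 2 * n_loci  # two allele copies per locus
threshold = 8          # ASSUMPTION: arbitrary cutoff for "strongly enriched"

p_enriched = prob_at_least(threshold, n_copies, p_european)
print(f"P(at least {threshold} of {n_copies} copies European) = {p_enriched:.4f}")
```

Under these toy numbers the probability comes out on the order of 1%. That’s rare for any one person, but across a large admixed population you should expect to see plenty of such individuals.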