When we sampled 200 people in Queens for the Genographic ‘Human Family Tree’ Project in 2008 we even found a Khoisan lineage – part of what was effectively a microcosm of global genetic diversity in a single urban US population.
— Spencer Wells (@spwells) May 1, 2018
In the nearly 20 years since the draft of the human genome was complete,* we’ve moved on to bigger and better things. In particular, researchers are looking to diversify their panels of human genetic diversity, because of differences between groups matter. You can’t just substitute them for each other genetically.
There have been efforts to diversify the population panels recently, but that prompts the question whether American population coverage is sufficient. My first thought is that the genetic diversity in the USA is probably getting us 90% of the way there. Consider Spencer’s comment about Queens, it’s the most ethnically diverse large conurbation in the country.
There are some gaps though. In Who We Are David Reich points out the distinctiveness of Indian population genetics. The subcontinent has lots of large census populations which have drifted upward deleterious alleles due to long-term endogamy. And, many of these populations don’t have a strong representation in the Diaspora.
In contrast, much of the rest of the world is panmictic enough that an American panel can pick up most of the variation. American Chinese are skewed toward Guandong and Fujian, but a substantial number of people from other parts of China have arrived in the last generation. Regional structure is not so strong that you’ll miss out on too much, aside from very rare variants which are more extended pedigree scale rather than population scale.
There are small populations such as Hadza, Khoikhoi, and Pygmies in Africa which are probably going to be missed by American population panels, but the total census size of these groups is pretty low (for comparison, there are 1 million Pulayar Dalits in the state of Kerala alone). Much of the rest of Africa is West African variation well represented in African Americans, and Bantu and Nilotic variation probably captured my immigrant communities.
I’d propose supplementing American genetic diversity with sampling Cape Coloureds in South Africa.
* No discussions about how the genome isn’t totally complete. I know that.