After a few years of presentations and preprints, the new high-quality whole-genome analysis of the HGDP dataset is finally published in Science: “Insights into human genetic variation and population history from 929 diverse genomes.” The HGDP dates back 30 years, so this is the culmination of a long line of research. The authors sequenced nearly 1,000 HGDP individuals at high coverage, meaning that they had extremely good confidence in their calls of the state of each base across all 3 billion base pairs.
This is in contrast to the ~600,000 markers in the original HGDP analyses from the 2000s, which came from a “SNP-array.” A SNP-array of this form captures variation by looking only at sites already known to be polymorphic (sites which vary in the population). How did they originally determine what was polymorphic? Unfortunately, they had to rely on European populations, so the original analyses were using a quite skewed measuring stick. Whole-genome analyses bypass these problems because you get the totality of sequence information, and the high coverage means you can confidently call very rare variants in some of these individuals (they’re not false positives).
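To make the “skewed measuring stick” concrete, here is a minimal Python sketch of what happens when the sites on an array are chosen using only one discovery panel. The allele counts are invented for illustration, and `polymorphic_sites` is a hypothetical helper, not anything from the paper or from a real array design.

```python
def polymorphic_sites(allele_counts, n_chromosomes):
    """Return indices of sites where the sample carries both alleles."""
    return {i for i, c in enumerate(allele_counts) if 0 < c < n_chromosomes}

# Toy data (made up): alternate-allele counts at 8 sites,
# with 10 chromosomes sampled from each group.
N = 10
panel_counts = [3, 0, 5, 0, 10, 1, 0, 4]   # discovery panel used to design the array
other_counts = [2, 1, 0, 4, 10, 0, 2, 6]   # a population not used for discovery

# The array only types sites that vary in the discovery panel.
array_sites = polymorphic_sites(panel_counts, N)

# Whole-genome sequencing sees every site that varies in the other population.
variable_in_other = polymorphic_sites(other_counts, N)

seen_by_array = variable_in_other & array_sites
missed_by_array = variable_in_other - array_sites

print("typed on array:", sorted(array_sites))
print("variable in other population:", sorted(variable_in_other))
print("missed by array:", sorted(missed_by_array))
```

In this toy case the array captures only two of the five sites that actually vary in the second population. Sequencing everyone avoids the problem because no site list has to be chosen in advance.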
The HGDP was assembled by L. L. Cavalli-Sforza and curated from ethnographically interesting populations. Therefore, it is useful to compare it to the 1000 Genomes Project, which tends to focus on more conventional populations. The 1000 Genomes has 2,500 individuals, sequenced at somewhat lower coverage on average. While this project yielded 70 million polymorphisms, the 1000 Genomes Project had 85 million. Most of these are rare. The power to detect rare polymorphisms is useful in elucidating population structure because rare polymorphisms tend to be evolutionarily new, and so reflect more recent differentiation.
For example, they compared Yoruba, Mbuti, and non-Africans. Looking at common polymorphisms, the Yoruba are closer to non-Africans, while looking at rare ones, they are closer to Mbuti. Why? The rarer polymorphisms reflect recent differentiation, and there has been recent gene flow between Mbuti and Yoruba.
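The logic of that comparison can be sketched in a few lines of Python: split variants into rare and common, then count how often a variant carried by the Yoruba is also carried by each other group. This is only an illustration of the idea, not the authors’ method. The variant records, the 5% cutoff, and the `sharing_by_class` function are all assumptions made up for the example.

```python
from collections import Counter

def sharing_by_class(variants, focal, others, rare_threshold=0.05):
    """For variants carried by `focal`, count co-carriers per frequency class."""
    counts = {"rare": Counter(), "common": Counter()}
    for v in variants:
        if focal not in v["carriers"]:
            continue
        cls = "rare" if v["freq"] < rare_threshold else "common"
        for group in others:
            if group in v["carriers"]:
                counts[cls][group] += 1
    return counts

# Toy variants (invented): global frequency and the groups carrying each one.
variants = [
    {"freq": 0.01, "carriers": {"Yoruba", "Mbuti"}},
    {"freq": 0.02, "carriers": {"Yoruba", "Mbuti"}},
    {"freq": 0.01, "carriers": {"Yoruba", "NonAfrican"}},
    {"freq": 0.30, "carriers": {"Yoruba", "NonAfrican"}},
    {"freq": 0.40, "carriers": {"Yoruba", "NonAfrican", "Mbuti"}},
    {"freq": 0.25, "carriers": {"Yoruba", "NonAfrican"}},
    {"freq": 0.03, "carriers": {"Mbuti"}},
]

result = sharing_by_class(variants, "Yoruba", ["Mbuti", "NonAfrican"])
for cls in ("rare", "common"):
    print(cls, dict(result[cls]))
```

With these made-up records, the rare class shows more sharing with Mbuti and the common class shows more sharing with non-Africans, which is the qualitative pattern described above.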
On the whole, they recapitulated earlier findings, but by using more sophisticated methods that leveraged their whole-genome data, they added some wrinkles. For example, some populations diverged in a very sharp and distinct fashion, such as Han and Yakuts, or Druze and Sardinians. But for the populations that diverged between 150,000 and 50,000 years ago, mostly within Africa, the separations were more gradual and probably characterized by repeated gene flow between the descendant groups (e.g., non-Africans, Yoruba, Mbuti, San, etc.).
This reiterates that there isn’t a one-size-fits-all narrative we can use to talk about the emergence of modern populations and the way those populations are patterned. There are debates about whether we are a “clinal” species or not. I don’t think that’s a good question, because, as implied in this paper, a great deal of the past diversity has been collapsed through recent admixture events. The authors also detect deep and complex structure and differentiation. They’re clearly just scratching the surface.
Finally, there is more reiteration of the nature of Neanderthal and Denisovan admixture. The Neanderthals who mixed into early humans were quite homogeneous, or there were not many of them. The haplotypes are not too numerous, and they don’t exhibit the patterns you’d expect from different admixtures and source populations. The diversity is too great to have come from a single individual, but it could have come from a small number. The main caution I would suggest here is that Neanderthals seem to often be quite homogeneous on the local scale.
The Denisovans are a different story. They detect the difference between Oceanian and non-Oceanian Denisovan ancestry (the Oceanian source Denisovans were quite distinct from the Altai Denisovans). But they also detect a different Denisovan contribution to the genomes of the Cambodians. The indigenous people of the Philippines also harbor different Denisovan ancestry (not in this paper). The “Denisovans” seem to have been a cluster of different lineages that persisted in parallel for a long time.
Where is there to go next with the HGDP? At some point, better technologies will allow for a more thorough exploration of structural variation. I’ve emphasized this is an analysis of the sequence because that’s what it is. There is more information in non-sequence variation that they’ll get to one day (there was some structural analysis in this paper, but I believe that we are currently technology-limited).