In Praise of the Human Genetic Diversity Project

**Credit:** Luca Giarelli, L. L. Cavalli-Sforza 2010

One of the things I (and probably almost anyone) do when reading a paper on population genetics which disaggregates the sample set into discrete elements is look at the number of individuals within each group. In a genetic variation sense there need not be any deep technicalities about power analysis here (though those surely are there). If you have a sample size of ~30 Han Chinese I know enough about the variation present in Han Chinese to be less worried about this N than a sample size of ~30 Xhosa, or to make it even more explicit, ~30 Brazilians. Not only is sample size important, but so is provenance. Brazilians sampled from Rio Grande do Sul are going to be different from those sampled from Bahia. The same worry applies to Han Chinese (e.g., Guangdong vs. Hunan), but to a far lesser extent in terms of magnitude.

This came to mind when reading A Genetic Atlas of Human Admixture History, a paper by Hellenthal et al. which showcases the power of modern statistical genetic inference in outlining the dynamics of historical demography. It’s a masterful work, and I’ll try and grapple with the results in a later post, time permitting. But poring over the real paper, the supplements, I came upon this table:

table

What I want to emphasize here are the rather small sample sizes for the English and Germans. Since genetic distance in Northwest Europe is low the small N may not be a big deal (i.e., you can swap in French or Norwegian). And the reality is that of course there’s plenty of genotypic data on English and German individuals. But much of this is locked up in biomedical studies where the data can’t be released for more widespread usage (I assume that the PopRes data set had insufficient overlap of marker sets?). In contrast you have decent sample sizes for obscure Pakistani groups like the Kalash and Burusho. Why? Because of the Human Genome Diversity Project, spearheaded by L. L. Cavalli-Sforza, author of The History and Geography of Human Genes. The HGDP data set is an awesome resource, and because of its anthropological focus it preserves the genetic variation of specific isolated groups. The Kalash of Pakistan for example look like they’re going to be forcibly converted to Islam and genetically assimilated within the generation. Late in the last decade the HGDP was released even to the public, so “citizen scientists” can perform their own analyses. Until recently I’d have said The History and Geography of Human Genes was L. L. Cavalli-Sforza’s greatest achievement (of many!). But now I’m starting to think that the HGDP may be greater; its easy availability is so taken for granted that we don’t even think about it.

But it did not come without cost to the principals involved. As recounted in A Genetic and Cultural Odyssey: The Life and Work of L. Luca Cavalli-Sforza, during the 1990s the usual suspects assailed Cavalli-Sforza and his colleagues, making invidious accusations of bad faith and worse. Spencer Wells told me that at one point Jonathan Marks came to Stanford, where Cavalli-Sforza was based, and gave a presentation where he juxtaposed an image of Cavalli-Sforza with one of the notorious Nazi doctor Josef Mengele. Those days are done. The cult of outrage has moved on to other useful scapegoats to persecute, Cavalli-Sforza is in retirement in Venice, and the HGDP data set is out in the world, furthering our understanding of the present and past variation of the human race. The controversies of the 1990s were only useful for the standard cultural Marxist types who were attempting to gain some measure of fame, and a sinecure at a university. Of course they failed in their project to obscure and manipulate our understanding of reality, because it is what it is, and it always will be. But human lives, and careers, were surely tarnished and affected by the unfounded and opportunistic behavior of the propagandists and sophists who have burrowed into American academia with the tenacity of a tick buried in your flesh.

As someone who says what they think and has a bias toward speaking plainly I am aware that I am open to broadsides from the armies of obscurantism (OK, frankly, I’ve been subject to many attacks over the years despite my relative obscurity; the armies of darkness see all deviationists). They lack shame and restraint, as is the norm among true believers. They would burn books if they could, that I know. So why do I persist? Because over the long run the arc of history runs toward truth, and if not now, then in the future. Reality is, and I want to see it, understand it, grasp it with my hands, and comprehend it in my bones. One day in the future I’ll be proud to tell my daughter and soon-to-be-born son that I wasn’t craven. Today, in the present, I open my mind, and take in more data and results in genetics from the past 10 years than were published over the previous 100 years! If such behavior closes off some avenues of career advancement, so be it. My eyes are open. Man does not live by sinecure alone.

Cavalli-Sforza and his colleagues, who persisted in the face of a concerted campaign of academic mau-mauing, have given the future greatness and possibility. Today there are many populations which were outside of the HGDP’s purview that remain under-analyzed in broad pooled surveys because the data are closed: groups like the English and Germans, which researchers in the 1990s surely assumed would have been sampled thoroughly by now. The priority was placed upon getting data on obscure ethnic groups which might nevertheless maintain distillations of human genetic diversity in a relatively purer form (due to less admixture). In 2014, though, the situation is such that central repositories which make data available are the exception rather than the norm. We are still extracting so many dividends from Cavalli-Sforza’s foresight and persistence, and that is what being an academic and doing scholarship is always about: making a difference, but in a way which sheds light, rather than obscures for the sake of the orthodoxies of the age. Such greatness is difficult to comprehend, but let’s take a moment to reflect.

Addendum: If you are interested, The Human Genome Diversity Project: An Ethnography of Scientific Practice was a fair-minded treatment from what I recall, but I read this book nearly 10 years ago...
