The HGDP made less racist!

Back in the 1990s there was a lot of controversy around the Human Genome Diversity Project (HGDP). In fact, there were whole books devoted to the sociology of the project. Though critics of the project may have had a point on some of the details, their overall aim of stalling scientific inquiry in this area failed completely. A few years ago a team out of the University of Chicago even produced a web-based browser so you can explore the data yourself. To my knowledge this hasn’t resulted in massive genocidal action against indigenous peoples; the human race doesn’t seem to need any scientific backing for that, alas.

But if I were a Lefty the-man-is-racist type, I think I might assert that the chips which were used to generate the 600,000 markers for the HGDP public data set are racist! I’m not one of those types, so what I am really concerned about is ascertainment bias. From what I have heard, many of the SNP chips floating around today are weighted toward variants most often found in Europeans. That’s because so many study populations in medical genetics are of European descent. This is not a total deal breaker, since a lot of European variation is useful in understanding worldwide patterns of variation. But ultimately it’s not optimal.
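To make the ascertainment problem concrete, here is a toy simulation in Python. It is only a sketch under made-up assumptions, not a model of the HGDP chips: the allele frequency distribution, the `drift_sd` parameter, and the function name `simulate_ascertainment` are all hypothetical. SNPs are "discovered" only if a single individual from population A happens to be heterozygous, and then mean heterozygosity is compared in A and in a second population B.

```python
import random

def simulate_ascertainment(n_snps=20000, drift_sd=0.15, seed=1):
    """Toy model: SNPs are discovered in population A, then typed in A and B."""
    rng = random.Random(seed)
    all_a, all_b, asc_a, asc_b = [], [], [], []
    for _ in range(n_snps):
        # Hypothetical ancestral frequency; U-shaped, so many variants are rare.
        p0 = rng.betavariate(0.5, 0.5)
        # Each population drifts independently away from the ancestral frequency.
        p_a = min(1.0, max(0.0, p0 + rng.gauss(0, drift_sd)))
        p_b = min(1.0, max(0.0, p0 + rng.gauss(0, drift_sd)))
        h_a = 2 * p_a * (1 - p_a)  # expected heterozygosity in A
        h_b = 2 * p_b * (1 - p_b)  # expected heterozygosity in B
        all_a.append(h_a)
        all_b.append(h_b)
        # Discovery panel: one diploid individual from A.
        # The SNP makes it onto the "chip" only if that individual is heterozygous.
        chrom1 = rng.random() < p_a
        chrom2 = rng.random() < p_a
        if chrom1 != chrom2:
            asc_a.append(h_a)
            asc_b.append(h_b)

    def mean(xs):
        return sum(xs) / len(xs)

    return {
        "all_A": mean(all_a), "all_B": mean(all_b),
        "asc_A": mean(asc_a), "asc_B": mean(asc_b),
        "n_ascertained": len(asc_a),
    }

if __name__ == "__main__":
    for key, value in simulate_ascertainment().items():
        print(key, round(value, 4))
```

Across all simulated SNPs the two populations are about equally diverse, by construction. On the ascertained subset, population A looks more diverse than B simply because it was the one used for discovery. That is the kind of artifact to worry about when a chip built mostly from European variation is used on everyone else.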


Today we take a major step in changing this. Nick Patterson sent me a nice heads up on a project out of David Reich’s lab. Using the full genomes of disparate human populations, as well as other primates, and archaic humans, the group has collaborated with Affymetrix to produce a panel which is much more finely tuned toward the concerns of those interested in the demographic and adaptive history of human populations.

You can find the files here, at ftp://ftp.cephb.fr/hgdp_supp10/. In particular see the technical document. When I get some time I’ll be playing with this, rest assured.

Finally, Nick adds an important caution:

We hope that this array, and the HGDP data we have produced will be a major resource for population genetic studies. The data are undoubtedly complicated (13 different ascertainment schemes (!)), and users should read the technical documentation, and especially the short readme file. In particular note that the ancient DNA alleles are not high quality (especially the Neandertal) and there are numerous potential traps in analysis.
