Saturday, June 03, 2006

Race: a useful concept? -- revisited   posted by JP @ 6/03/2006 10:30:00 AM

The use of information on race and ethnicity in biomedical studies is of course controversial, but this controversy is often minimized with an appeal to the utility of the information. I recently made this case here. But how useful a concept is race, really?

I assume most geneticists would agree that, given enough genetic markers from a person and the worldwide distribution of those markers, one could give a good guess as to that person's ancestry. And from that, one could predict the label that person would apply to themselves in any given culture. A paper from Neil Risch's group showed essentially the United States version of this, finding that clusters of individuals created blindly using genetic information corresponded extraordinarily well to how people self-identified by race.

However, the utility of the concept of race does not come from how well it can be predicted using genetic information, but the inverse. That is, the utility of the concept of race comes from how well genetic information can be predicted from it. And that is much more complicated a question to address.
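To make the asymmetry concrete, here's a toy calculation. Everything in it is made up for illustration: two hypothetical groups, A and B, and a set of independent markers where one allele sits at 40% frequency in A and 60% in B. It's a sketch under those assumptions, not a model of any real population.

```python
from math import comb

# Hypothetical setup: two groups, A and B, and n independent markers.
# At every marker the "1" allele has frequency 0.4 in A and 0.6 in B.
FREQ_A = 0.4
FREQ_B = 0.6


def label_from_markers_accuracy(n_markers):
    """Chance of correctly calling a person from group B by majority vote
    over n_markers alleles (n_markers must be odd, so there are no ties).
    By symmetry the accuracy for group A is the same."""
    threshold = n_markers // 2 + 1
    return sum(
        comb(n_markers, k) * FREQ_B**k * (1 - FREQ_B) ** (n_markers - k)
        for k in range(threshold, n_markers + 1)
    )


def marker_from_label_accuracy():
    """Chance of correctly guessing one allele knowing only the label:
    the best guess is simply the group's more common allele."""
    return max(FREQ_B, 1 - FREQ_B)


for n in (1, 11, 101):
    print(n, "markers -> label:", round(label_from_markers_accuracy(n), 3))
print("label -> one marker:", marker_from_label_accuracy())
```

With these invented frequencies, piling up markers pushes the accuracy of guessing the label steadily upward, while knowing the label never gets you past 60% on any single marker. The first direction improves with more data; the second is capped by how different the groups actually are at the marker you care about.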

A quick bit of context, so people know how this problem is set up in my head: imagine you're a researcher and you've got this idea that a couple of mutations play a role in a disease you study. So you want to do a case-control study-- genotype the mutations in some people with the disease, some people without, and see if there's a difference between the two groups. Of course, if you're going to type a large number of people, you know you need to control for possible confounding due to population structure (don't know what I'm talking about? Read part I of my old post). There are a number of ways to do this, but they all involve genotyping a bunch of extra markers, and you don't want to pony up the cash for that-- you want to type your two candidate mutations and be done with it. You've heard that controlling for race reduces error due to population stratification, so you limit your study to African-Americans. You do it, analyze the data, and sure enough there's a significant association. What's the interpretation?

Clearly, saying that this association is real depends on race acting as a proxy for the rest of the genome you don't genotype. There's strong evidence that this is not generally a good assumption:

A couple of studies have essentially repeated the Tang et al. study I linked to before with fewer markers, and their results are conflicting. One found that, with only 15 markers, the genetic clusters corresponded well to self-identified racial clusters. But another, using the same marker set, found significant population structure within each cluster. That is, genetic information--at least at some markers--is not consistently predicted by self-identified race. The interesting thing about comparing these two studies is that they take place in different cities (San Francisco and Detroit), so the conflicting results could be due to differences in the populations involved. This would seem to square with other reports from the Risch group that the level of admixture in African-American populations varies geographically. This is not particularly surprising-- deCODE Genetics made news a couple years back when they documented the level of population structure present in Iceland, considered a fairly genetically homogeneous place.

The authors of one of the papers above write that "when the true underlying genetic structure and the self-defined racial/ethnic groups were roughly in agreement with each other, the self-defined race/ethnicity information was useful in the control of population structure". Well, yes. But that's the whole point-- it's impossible to know if the two are in agreement until you test it. So it seems there's no getting around paying the extra money to type more markers to test for population structure in a study-- controlling for race will get rid of the obvious structure, but not the less obvious, yet still important, differences within the population being studied.
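For anyone who wants to see how hidden structure can manufacture an association, here's a back-of-the-envelope sketch. The numbers are entirely hypothetical: a study population that looks like one group but is really an even mix of two subgroups, which differ in both the frequency of the candidate allele and the baseline risk of disease. Within each subgroup the allele has no effect on disease at all. The calculation uses expected counts rather than a random sample, so it shows the direction and rough size of the bias, not what any one study would report.

```python
def pooled_allele_freq(weights, freqs):
    """Allele frequency in a pool drawn from subgroups in the given proportions."""
    return sum(w * f for w, f in zip(weights, freqs)) / sum(weights)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical hidden subgroups inside one self-identified group.
share = [0.5, 0.5]            # fraction of the population in each subgroup
allele_freq = [0.2, 0.4]      # candidate allele frequency in each subgroup
disease_risk = [0.05, 0.15]   # disease risk in each subgroup, independent of the allele

# Cases over-sample the high-risk subgroup; controls slightly under-sample it.
case_weights = [s * r for s, r in zip(share, disease_risk)]
control_weights = [s * (1 - r) for s, r in zip(share, disease_risk)]

freq_cases = pooled_allele_freq(case_weights, allele_freq)
freq_controls = pooled_allele_freq(control_weights, allele_freq)

# Expected allele counts for 1000 cases and 1000 controls (two alleles each).
n_alleles = 2000
chi2 = chi_square_2x2(
    freq_cases * n_alleles, (1 - freq_cases) * n_alleles,
    freq_controls * n_alleles, (1 - freq_controls) * n_alleles,
)

print("allele frequency in cases:   ", round(freq_cases, 3))
print("allele frequency in controls:", round(freq_controls, 3))
print("chi-square on expected counts:", round(chi2, 2))
```

Under these assumptions the allele comes out more common in cases than in controls, and the statistic clears the usual 3.84 cutoff for p < 0.05, even though the allele does nothing. Restricting the study to a single self-identified group wouldn't help here, because the two subgroups sit inside it; only extra markers would reveal them.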