
When sociology meets statistical genetics

In Dr. Daniel MacArthur’s post on Roots into the Future, Blaine Bettinger left an interesting comment:

It will be interesting to see how 23andMe deals with the pool of people that respond to the 10,000 free kits. Doesn’t seem like they can pre-screen applicants, since African American heritage is sometimes more sociological than genetic (based on previous genetic studies, anyway). In other words, who’s to say who is an African American and who isn’t?

And how will they deal with the unscrupulous people who apply with the full knowledge that they have no recent African ancestry? Certainly they won’t be [able to] screen those people out, even with surveys or other methods.

My concerns probably won’t apply to the genetic association studies, since they can look for test-takers that have, for example, a certain % of African American ancestry, or can look for African American ancestry in the region of the genome where the association is believed to reside (after it’s predicted to exist).

However, my concerns will certainly apply to any conclusions they might make about African American genetic ancestry. For example, a conclusion such as “XX% of African Americans have less than XX% of African American DNA,” or “XX% of African Americans have European Y-DNA signatures.” These calculations will unfortunately be biased by the “unscrupulous”, even if they ask for surveys or other methods to deter bias. The best they might be able to do is “XX% of African Americans with 5% or more of African American DNA have European Y-DNA,” and conclusions that take the “unscrupulous” bias into account.


Naive nerd that I am, I hadn’t even considered the possibility of fraud! In any case, after running the African Ancestry Project I have to admit that, weird as it is, I have started to “profile” genotypes automatically. I guess that’s a fraught term to use with black Americans, but the honest truth is that I don’t pay much attention to the ancestry that people report to me in their emails. I just assign them IDs, do the format conversions, and run the algorithms. I then push the results online and let people interpret them however they want. With all that said, the genetic profile of African Americans is pretty straightforward. My sample of ~130 individuals includes around 100 African Americans, and they’re distinctive in having a mix of European and African ancestry. The individuals who are from Africa stick out like sore thumbs, and I immediately know that a given ID has to be African. After 200+ years African Americans almost always have some European ancestry, even if it’s only a small fraction.

One aspect of Blaine’s comment is the idea that those who attempt to sneak into this project might distort the distribution of ancestral components reported for the African American population. I don’t think this is an issue. This is one group which has been studied a fair bit, and the consensus is rather clear that its ancestry is, on average, about 20% European. Let’s look at some of the papers which report results that give us a sense of what’s going on.

First, Admixture Mapping of 15,280 African Americans Identifies Obesity Susceptibility Loci on Chromosomes 5 and X. The title says it all. ~15,000 African Americans in their total pool. Here’s the table with the statistics by population set (the ranges are standard deviations):

Let’s go graphical. Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans has a PCA which shows the two largest dimensions of variation in their combined data set, which includes East Asians, Europeans, and Yoruba from Nigeria, in addition to African Americans. After removing 11 individuals (outliers and related individuals), they found that of the remaining 89 the ancestral percentages were 21 percent European and 79 percent African. The range in European ancestry in these individuals was 1-62 percent with a standard deviation of 14 percent.

Finally, let’s look at a bar plot from Genome-wide patterns of population structure and admixture in West Africans and African Americans. Their sample included 365 African Americans. The median proportion of European ancestry was 18.5%, with the 25th–75th percentiles being 11.6–27.7%. The plot shows the range of admixture among African Americans. A very few people are overwhelmingly European, but most individuals are much closer to 20% European.

My point in reviewing all this is straightforward: even without screening, Roots into the Future will be able to gauge the likelihood of fraud and deception. The distribution of ancestry among African Americans as a whole is pretty well characterized. There’s some inter-regional variation, but if the project observes a secondary mode at ~100% non-African ancestry, I think they can assume that those individuals should be discarded from the project.
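To make that last idea concrete, here is a minimal sketch of how such a filter might look. It is not anything 23andMe has described; the function name, the 0.95 cutoff, and the sample values are all hypothetical, made up purely for illustration. It assumes each participant already has an estimated non-African ancestry fraction from some admixture analysis.

```python
from statistics import median

def flag_suspect_samples(non_african_fraction, threshold=0.95):
    """Return sorted IDs whose estimated non-African ancestry fraction
    is at or above the threshold (the cutoff is an assumed value)."""
    return sorted(
        sample_id
        for sample_id, fraction in non_african_fraction.items()
        if fraction >= threshold
    )

# Made-up fractions for illustration only, not real data.
samples = {
    "ID1": 0.12, "ID2": 0.19, "ID3": 0.27, "ID4": 0.21,
    "ID5": 0.62, "ID6": 0.99, "ID7": 1.00, "ID8": 0.02,
}

suspects = flag_suspect_samples(samples)

# Summarize the remaining distribution once the secondary mode is set aside.
kept = {sid: frac for sid, frac in samples.items() if sid not in suspects}
kept_median = median(kept.values())

print("Flagged:", suspects)
print("Median non-African fraction among the rest:", round(kept_median, 3))
```

Note that someone at 62% non-African ancestry is kept here, which matches the papers above: a few African Americans really are majority European. Only the cluster piled up near 100% gets set aside, and where exactly to draw that line is a judgment call for the project.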
