The current bias in genealogical databases

As a follow-up to my post below on the thick coverage of European information in genealogical and genomic databases, here are the “Ancestry Finder” matches from 23andMe for my daughter using the default settings:

If I increase sensitivity, India does come up, at 0.1%, second to last in a very long list of European nations. I’m pointing this peculiarity out because my daughter is 50 percent South Asian, but this element of her ancestry doesn’t find many matches because there aren’t many South Asians in the database to match. In contrast, because she is 1/8th Norwegian (her great-great grandparents were immigrants from the Oslo area; thanks Ancestry.com!) this “block” jumps out, and lines up with many people in their database.

This isn’t just an exceptional case. Here’s the result for a friend who is 50 percent East Asian (Chinese) and 50 percent American white:

The old warning rears its ugly head: the tool is just a tool, and must be used with an understanding of what it can and can’t do. If you decrease sensitivity, many South Asians actually match people from European nations before they match people from India. Why? Part of it is probably that many South Asian groups are highly endogamous, which dampens intra-South Asian segment sharing. And the other part is that the sample size of Europeans is so large that random matches with this population are just as likely as, or more likely than, genuine matches with the smaller number of South Asians.
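To make the sample-size point concrete, here is a back-of-the-envelope sketch in Python. Every number in it is invented for illustration (the database counts and the per-person match probabilities are assumptions, not 23andMe figures), and it treats each person in the database as an independent chance at a match, which is a simplification.

```python
def expected_matches(n_people, p_match):
    """Expected number of matching people, assuming each person in the
    database matches independently with probability p_match."""
    return n_people * p_match


# Hypothetical database composition and match rates (made up for illustration).
# The per-person match rate is assumed to be 20x higher for South Asians
# (genuine shared ancestry) than for Europeans (mostly chance matches),
# but there are assumed to be 200x more Europeans in the database.
database = {
    "European": {"n_people": 100_000, "p_match": 0.001},
    "South Asian": {"n_people": 500, "p_match": 0.02},
}

expected = {
    group: expected_matches(params["n_people"], params["p_match"])
    for group, params in database.items()
}

total = sum(expected.values())
for group, count in expected.items():
    print(f"{group}: about {count:.0f} expected matches "
          f"({count / total:.0%} of all matches)")
```

Under these made-up inputs the European matches outnumber the South Asian ones roughly ten to one, even though any single South Asian in the database is assumed to be far more likely to be a real match. The per-person rate favors the genuine ancestry; the raw counts favor whoever is oversampled.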

