
Admixture analysis isn’t wrong, it misleads

[Screenshot of Ancestry results, 2016-09-18]

The above results are from Ancestry. You can see here 4% Melanesian. This is common in South Asians. And it’s not an error in the method. Rather, it is a natural outcome of the methods used to generate admixture profiles.

Basically what’s going on is this:

1) You have data. In this case, the data are your own genotypes, as well as those of a set of individuals who represent world genetic variation and are categorized into discrete populations.

2) You have a model or set of models. These models have different parameters.

3) You look at the data you have, and pick the parameters which best explain the data given the model.
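The three steps above can be sketched in a few lines of Python. This is a toy illustration, not the pipeline any company actually runs: it assumes the reference allele frequencies are already known and fixed, that markers are independent, and that genotypes follow Hardy-Weinberg proportions. The names `fit_ancestry`, `simulate_genotypes`, `pop_A` and `pop_B` are made up for the example, and the fitting procedure is a standard EM update for mixture fractions.

```python
import random

def fit_ancestry(genotypes, ref_freqs, iterations=200):
    """Estimate ancestry fractions q by EM, holding reference frequencies fixed.

    genotypes: list of 0/1/2 allele counts, one per marker.
    ref_freqs: dict mapping population name -> list of allele frequencies
               (each strictly between 0 and 1).
    """
    pops = list(ref_freqs)
    q = {k: 1.0 / len(pops) for k in pops}  # start from equal fractions
    n_alleles = 2 * len(genotypes)
    for _ in range(iterations):
        new_q = {k: 0.0 for k in pops}
        for j, g in enumerate(genotypes):
            # Model: the individual's allele frequency at marker j is a
            # q-weighted blend of the reference frequencies.
            f = sum(q[k] * ref_freqs[k][j] for k in pops)
            for k in pops:
                p = ref_freqs[k][j]
                # Expected number of the two allele copies that came from k.
                new_q[k] += g * q[k] * p / f + (2 - g) * q[k] * (1 - p) / (1 - f)
        q = {k: new_q[k] / n_alleles for k in pops}
    return q

def simulate_genotypes(true_q, ref_freqs, rng):
    """Draw one individual's genotypes from the same stylized model."""
    n_markers = len(next(iter(ref_freqs.values())))
    genotypes = []
    for j in range(n_markers):
        f = sum(true_q[k] * ref_freqs[k][j] for k in true_q)
        genotypes.append(sum(rng.random() < f for _ in range(2)))
    return genotypes

# 1) Data: hypothetical reference populations plus one test individual.
rng = random.Random(0)
n_markers = 5000
ref_freqs = {
    "pop_A": [rng.uniform(0.05, 0.95) for _ in range(n_markers)],
    "pop_B": [rng.uniform(0.05, 0.95) for _ in range(n_markers)],
}
true_q = {"pop_A": 0.7, "pop_B": 0.3}
genotypes = simulate_genotypes(true_q, ref_freqs, rng)

# 2) + 3) Model and fit: pick the fractions that best explain the genotypes.
estimate = fit_ancestry(genotypes, ref_freqs)
print({k: round(v, 3) for k, v in estimate.items()})
```

When the individual really is a mix of the populations in the panel, as here, the fitted fractions land close to the true ones. The fractions always sum to 100%, which matters later: the method has no way to say "none of the above."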

If you have 100,000 or more markers, that’s more than enough genotype data for an individual. The models themselves are quite stylized (e.g., they assume sets of randomly mating populations in Hardy-Weinberg equilibrium, or HWE), but close enough to reality to give good results in many cases. For example, Ashkenazi Jews are often assigned to be ~100% Ashkenazi Jewish through these methods.

Then again, Ashkenazi Jews are a good test case. This is a population which went through a bottleneck about 500 to 1,000 years ago, and has been reasonably endogamous for most of the time since. Additionally, it’s not extremely structured due to inbreeding in different clan lineages. Though cousin marriage and uncle-niece marriage have been practiced by Ashkenazi Jews, the runs of homozygosity you see in Jewish genomes are not of the kind that indicates a highly inbred population, as is common in the Middle East or South Asia. Rather, there are lots of medium-length segments identical by descent across individuals.

The Ashkenazi Jewish population is rather simple, and it is actually a rather clear and distinct population cluster. It stands to reason that when you create an Ashkenazi Jewish reference panel in your training data set, it’s a pretty good match to the individuals you are testing.

The problems occur when you try to generate clusters and ancestry assignments for populations which are not so clear and distinct. Why do South Asians routinely come out as part Melanesian or Polynesian? This post was prompted by a Facebook thread where a South Asian customer of Ancestry was interested to see she had Polynesian ancestry. The reality is she almost certainly does not have Polynesian ancestry.

What’s going on is that the reference panel for South Asians used by many of the direct-to-consumer (DTC) genomics companies is not diverse enough to capture South Asian genetic diversity. There is an element of South Asian ancestry, “Ancestral South Indian” or ASI, which has deep shared ancestry with populations across Southern Eurasia and out toward Oceania. The admixture analysis method is searching through the reference panels for combinations of genotypes which can explain individual genetic variation. Since the South Asian training set is insufficient to explain all the South Asian variation, the algorithms fill in the balance of the variation with the closest available proxies to the “ghost clusters.”
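A toy simulation makes the proxy effect concrete. Everything here is invented for illustration: the allele frequencies are random numbers, not real data, and the setup assumes a hidden "ghost" source that shares drift with the Melanesian reference, a South Asian panel that is a fixed 50/50 blend of a western source and the ghost, and a test individual who carries more ghost ancestry (70%) than the panel does. The fitting routine `fit_ancestry` is a standard EM update for mixture fractions, not any company's actual algorithm.

```python
import random

def clip(x, lo=0.02, hi=0.98):
    """Keep a simulated allele frequency strictly inside (0, 1)."""
    return max(lo, min(hi, x))

def fit_ancestry(genotypes, ref_freqs, iterations=300):
    """EM estimate of ancestry fractions with fixed reference frequencies."""
    pops = list(ref_freqs)
    q = {k: 1.0 / len(pops) for k in pops}
    n_alleles = 2 * len(genotypes)
    for _ in range(iterations):
        new_q = {k: 0.0 for k in pops}
        for j, g in enumerate(genotypes):
            f = sum(q[k] * ref_freqs[k][j] for k in pops)
            for k in pops:
                p = ref_freqs[k][j]
                new_q[k] += g * q[k] * p / f + (2 - g) * q[k] * (1 - p) / (1 - f)
        q = {k: new_q[k] / n_alleles for k in pops}
    return q

rng = random.Random(1)
panel = {"European": [], "South Asian": [], "Melanesian": []}
genotypes = []
for _ in range(3000):
    ancestral = rng.uniform(0.3, 0.7)
    # Assumption: ghost and Melanesian share a branch of drift (deep shared
    # ancestry), then each drifts a little further on its own.
    shared = ancestral + rng.gauss(0, 0.18)
    west = clip(ancestral + rng.gauss(0, 0.18))
    ghost = clip(shared + rng.gauss(0, 0.05))  # never appears in the panel
    panel["European"].append(clip(west + rng.gauss(0, 0.05)))
    panel["Melanesian"].append(clip(shared + rng.gauss(0, 0.10)))
    # Assumption: the South Asian panel captures only a 50/50 west/ghost blend.
    panel["South Asian"].append(0.5 * west + 0.5 * ghost)
    # The test individual has more ghost ancestry than the panel represents.
    f = 0.3 * west + 0.7 * ghost
    genotypes.append(sum(rng.random() < f for _ in range(2)))

estimate = fit_ancestry(genotypes, panel)
print({k: round(v, 3) for k, v in estimate.items()})
```

The individual has no Melanesian ancestry by construction, yet the fit should hand a visible share to the Melanesian reference, because that is the panel population closest to the missing ghost component. The size of that share depends entirely on the invented drift parameters, so the exact number means nothing. The direction of the effect is the point.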

The method is constrained and conditioned on two things:

1) The data being put in, which is often insufficient.

2) The set of populations that it is forced to work with to generate the combinations in individuals (the parameter values in the model to explain the data), which is often insufficient or artificial.

What I mean by the last is that many of the genetic clusters are not taxonomically equivalent. “South Asian” ancestry is much more diverse and diffuse than “Melanesian” ancestry. This is why Melanesian ancestry can explain South Asian ancestry, but generally not the reverse.
