Mind the drift lest your inference go off path


The bar plot above shows the Kalash people, in yellow, as a very distinctive group among a panoply of Eurasian populations. The figure is from a Rosenberg lab paper. There’s nothing aberrant about this result; you can generate this plot pretty easily using any motley set of markers. The Kalash are distinctive. But it is important to keep that distinctiveness in perspective. They’re not a relic population, remnants of an ancient race lost to time and memory. Rather, they happen to be a highly diverged northwest South Asian group. Their divergence is due to a small, isolated breeding population which has been highly endogamous.

What this means is that the Kalash have a low long-term effective population size, and their allele frequency spectrum has been more strongly impacted by drift. Small populations are subject to great allele frequency volatility from generation to generation; they tend to lose a lot of their genetic diversity and to fix many alleles. One consequence of this is inbreeding and a higher recessive disease load. In populations with a lot of drift, selection is less effective at removing deleterious alleles, and if a variant that causes a recessive disease is fixed, then that’s that.
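To make the volatility concrete, here is a minimal sketch of drift under a textbook Wright-Fisher model, using only the Python standard library. The population sizes (20 and 500 diploid individuals), the 30 generations, the 100 loci, and the starting frequency of 0.5 are all illustrative assumptions, not estimates for the Kalash or anyone else, and the function names are made up for this example.

```python
import random


def wright_fisher(n_diploid, generations, start_freq, rng):
    """Return one allele's frequency after `generations` of pure drift.

    No selection, mutation, or migration: the only force is random
    sampling of 2N gene copies each generation.
    """
    copies = 2 * n_diploid
    freq = start_freq
    for _ in range(generations):
        # Each gene copy in the next generation is drawn independently
        # from the current allele pool (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
        if freq in (0.0, 1.0):
            break  # allele lost or fixed; drift cannot bring it back
    return freq


def summarize(n_diploid, generations=30, loci=100, seed=1):
    """Simulate independent loci; return (mean heterozygosity, loci lost or fixed)."""
    rng = random.Random(seed)
    freqs = [wright_fisher(n_diploid, generations, 0.5, rng) for _ in range(loci)]
    mean_het = sum(2 * p * (1 - p) for p in freqs) / loci
    lost_or_fixed = sum(1 for p in freqs if p in (0.0, 1.0))
    return mean_het, lost_or_fixed


if __name__ == "__main__":
    for n in (20, 500):  # hypothetical small vs. large breeding population
        het, done = summarize(n)
        print(f"N={n}: mean heterozygosity {het:.3f}, loci lost or fixed {done}")
```

Every locus starts at a heterozygosity of 0.5. In the small population the average should fall well below that within a few dozen generations, with some loci lost or fixed outright, while the large population barely moves.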

But another major consequence of strong drift, where everyone in the population is quasi-related for all practical purposes, is that when you attempt some sort of clustering they fall out as a very natural grouping. They’re low-hanging fruit. When you plot populations on a PCA you normally remove closely related individuals, because they will form a tight cluster of their own and overwhelm the between-population variation you’re looking for, hogging the top principal components, which end up separating them from non-relatives. Inbred groups like the Kalash do the same thing, if less boldly so. If you can keep this in mind it will allow for proper inferences about the natural history of a population. If you can’t, then you will be confused.

This is a preface to a nice paper in PLOS Genetics, Evidence for a Common Origin of Blacksmiths and Cultivators in the Ethiopian Ari within the Last 4500 Years: Lessons for Clustering-Based Inference, which reports that an earlier publication, Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool, did not control for the effect of drift due to endogamy and so came to the wrong conclusion.* I won’t repeat the methods they used, as the paper is open access. But they account for drift much better, and show that the divergence of a presumably genetically distinct caste had much more to do with increased drift due to endogamy than it did with a separation of the two lineages at some time in the distant past. Remember, drift builds up along both lineages of any pair after they separate. But if the population size in one of the daughter lineages is very low, then drift will shift it away from the ancestral frequency spectrum much faster, producing an artificially “long branch.”
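The “long branch” effect can be sketched with the standard idealized result that, under pure drift, a lineage of constant effective size N accumulates drift F = 1 - (1 - 1/(2N))^t over t generations. This is a back-of-the-envelope illustration, not the method used in the Ari paper. The effective sizes (300 and 10,000) and the 150 generations are hypothetical numbers chosen only to show the shape of the problem, and the function names are invented for this example.

```python
import math


def accumulated_drift(n_e, generations):
    """Drift (F) built up on one branch of constant effective size n_e."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** generations


def apparent_generations(drift, assumed_n_e):
    """Generations needed to reach `drift` if the branch had size assumed_n_e."""
    return math.log(1.0 - drift) / math.log(1.0 - 1.0 / (2.0 * assumed_n_e))


if __name__ == "__main__":
    split = 150       # hypothetical: generations since the two lineages separated
    n_large = 10_000  # hypothetical effective size of the larger daughter lineage
    n_small = 300     # hypothetical effective size of the endogamous lineage

    f_large = accumulated_drift(n_large, split)
    f_small = accumulated_drift(n_small, split)
    print(f"drift on large branch: {f_large:.4f}")
    print(f"drift on small branch: {f_small:.4f}")

    # If you ignore the small population size and read the branch length as
    # elapsed time, the split looks far older than it really is.
    naive = apparent_generations(f_small, n_large)
    print(f"naive split estimate: {naive:.0f} generations (true value: {split})")
```

Both branches are exactly the same age here, yet the small one carries far more drift. Interpreting that extra drift as time, rather than as a consequence of endogamy, is the mistake described above.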

The Kalash and the Ari are extreme cases of this. But they illustrate the general principle that we should be cautious about making inferences when we don’t control for the vicissitudes of demographic history, which may skew the power of our methods to see in a fair and balanced manner.

* There’s an overlap of authors across the two publications, showing that scientists can and do overturn their own conclusions if new data or analysis persuades them.
