Living in the age of structure

[Figure from the Novembre and Peter preprint: population structure in Europe resolved with increasing numbers of SNPs]

Jonathan Novembre and Benjamin Peter have posted a preprint of a review, “Recent advances in the study of fine-scale population structure in humans,” which readers will find useful. In particular, the citations are a gold mine for anyone attempting to navigate this literature.

The figure above from their preprint illustrates the number of markers needed to differentiate populations in Europe. Recall that genetic variation within Europe, especially Northern Europe, is rather low. It’s pretty clear that if you sample 100 SNPs from the human genome you can’t differentiate much. At 1,000 SNPs structure begins to appear, and it is starting to be well resolved by 10,000 SNPs. By 100,000 SNPs you are pretty much going to hit diminishing returns for regional diversity at the scale of Europe. The pattern differs by method. In my experience PCA, for example, does much better with 10,000 SNPs in Europe than model-based clustering (e.g., ADMIXTURE), but the two are comparable as you near 100,000 SNPs. Beyond 100,000 SNPs there is not much increase in resolution for genome-wide, genotype-based methods at this level of genetic diversity.
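To make the marker-count intuition concrete, here is a minimal simulation sketch. It is not the analysis from the preprint. It assumes two populations drawn from a Balding-Nichols model with a low Fst of 0.005 (my choice, meant only to stand in for "rather low" differentiation), 100 individuals per population, and independent SNPs. The function names and the "fraction correctly split by the sign of PC1" score are hypothetical conveniences for illustration.

```python
import numpy as np


def simulate_genotypes(n_per_pop, n_snps, fst, rng):
    """Simulate two populations under a Balding-Nichols model (an assumption)."""
    ancestral = rng.uniform(0.1, 0.9, size=n_snps)
    scale = (1.0 - fst) / fst
    # Each population's allele frequencies drift independently from the ancestral ones.
    freqs = rng.beta(ancestral * scale, (1.0 - ancestral) * scale, size=(2, n_snps))
    genotypes = np.vstack(
        [rng.binomial(2, freqs[k], size=(n_per_pop, n_snps)) for k in range(2)]
    )
    labels = np.repeat([0, 1], n_per_pop)
    return genotypes, labels


def pc1_separation(genotypes, labels):
    """Fraction of individuals placed on the 'right' side of zero along PC1."""
    g = genotypes.astype(float)
    g = g[:, g.std(axis=0) > 0]  # drop monomorphic SNPs
    z = (g - g.mean(axis=0)) / g.std(axis=0)  # standardize each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    guess = (pc1 > 0).astype(int)
    accuracy = float((guess == labels).mean())
    # The sign of a principal component is arbitrary, so take the better orientation.
    return max(accuracy, 1.0 - accuracy)


rng = np.random.default_rng(0)
separation = {}
for n_snps in (100, 1_000, 10_000):
    genotypes, labels = simulate_genotypes(100, n_snps, fst=0.005, rng=rng)
    separation[n_snps] = pc1_separation(genotypes, labels)
    print(f"{n_snps:>6} SNPs: PC1 splits {separation[n_snps]:.0%} of individuals correctly")
```

A score near 50% means PC1 is carrying essentially no population signal, while a score near 100% means the two groups fall cleanly on either side of it. Real data have linkage disequilibrium, more than two populations, and clinal rather than discrete structure, so treat the exact numbers as illustrative only.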

Of course, if you want really fine-scale differences, between villages for example, more markers, and perhaps whole-genome sequencing that can pick up rare variants, are useful. In other words, there are cases one can imagine where more data than is normally available on SNP-chips is useful. But these are definite boundary conditions. Once you get to the point of distinguishing branches of extended families you really can’t collapse the genealogies any further.

Another instance where more marker density, or the power of high coverage whole-genome sequencing, might be useful is for local ancestry deconvolution. If you’re assigning ancestry to windows of the genome then your marker density is going to be a limiting factor, as you might be slicing the 100,000 SNPs into 1,000 subunits.
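The arithmetic behind that last point is worth spelling out. The sketch below assumes equal-sized windows with evenly spread markers, which real genomes do not give you, and the helper name is hypothetical. The 1,000,000-marker row is just an illustrative stand-in for a denser data set, not a figure from the preprint.

```python
def markers_per_window(n_markers: int, n_windows: int) -> float:
    """Average number of markers landing in each window, assuming even spacing."""
    if n_windows <= 0:
        raise ValueError("n_windows must be positive")
    return n_markers / n_windows


# 100,000 SNPs sliced into 1,000 windows, as in the text above.
chip_density = markers_per_window(100_000, 1_000)

# A hypothetical denser marker set, for comparison only.
dense_density = markers_per_window(1_000_000, 1_000)

print(f"100,000 markers / 1,000 windows   = {chip_density:.0f} per window")
print(f"1,000,000 markers / 1,000 windows = {dense_density:.0f} per window")
```

So each window is working with on the order of 100 SNPs, which is the same regime where genome-wide structure within Europe was barely visible. That is why marker density becomes the limiting factor once you move from genome-wide summaries to per-window ancestry calls.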

Finally, there’s the issue of the models being tested. Novembre and Peter allude to the fact that many of these models posit stylized, discrete pulse admixtures. As it turns out, in some cases ancient DNA seems to have confirmed that something like this went on: long periods of local stability and panmixia, followed by genetic turnover and admixture. But they note that there isn’t a good simulation framework where demographic scenarios are allowed to generate in silico data for testing new models. In other words, biologists are currently having to rely on “natural experiments.”
