Lumpy genetic variation

Over at Violent Metaphors Jennifer Raff has another review of A Troublesome Inheritance, this time focusing on Nicholas Wade’s interpretation of population genomics. I don’t want to be cliché, but if there’s one thing to like about Wade’s book it’s that lots of people are talking about human population genetics. Then again, if there’s one thing to not like about Wade’s book it’s that lots of people are talking about things that they don’t grasp in much detail, and confusing the issues. Jennifer obviously knows quite a bit about population genomics, but I’ll be frank in suspecting that some of her fans who are praising her to the heavens find her conclusions congenial, and can’t really follow the technical details she’s fleshing out. They trust her, and that’s an acceptable position. More concretely, she implies that model-based clustering (e.g., Structure, Admixture, and Frappe) will naturally produce a set of individuals composed as a combination of K ancestral populations. There’s nothing privileged about a particular K. But there are ways to more formally establish which K is the best “fit” to the data. Rather than talking, I’ve set Admixture to run some HapMap data and will check the cross-validation results to get a sense of which K values are most reasonable.
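As a rough illustration of what checking the cross-validation results can look like, here is a minimal Python sketch. It assumes the log output contains lines of the form `CV error (K=3): 0.55`, which is what Admixture reports when cross-validation is switched on. The function names and the numbers in the example log are invented for illustration and are not results from any real run.

```python
import re

# Matches lines such as "CV error (K=3): 0.55"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

# Made-up log lines for illustration only; not output from a real run.
example_log = """
CV error (K=2): 0.60
CV error (K=3): 0.55
CV error (K=4): 0.57
"""

errors = parse_cv_errors(example_log)
chosen_k = best_k(errors)
print(errors, chosen_k)
```

The lowest error is only a guide. When several K values have nearly identical errors, the data are not strongly preferring one over the others, which is consistent with the point that no particular K is privileged.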

I could say much more, but I’m getting bored with this interminable debate. I’ll focus on one aspect of Jennifer’s exposition: that human genetic variation is clinal. This is a defensible position, and may even be a majority position among population geneticists. But I no longer believe one can take this at face value as the null model that can be used to dismiss ideas such as discrete racial categories. The clinal expectation is predicated on an isolation by distance dynamic. As a rough stylized model you can see the schematic above, which shows three lineages separated by geography and exchanging genes continuously over time. This certainly applies across broad swaths of the world, but there are in fact sharp discontinuities in regions just where traditional racial boundaries are often asserted. For example, genetic distance per unit of geographic distance increases sharply across the Himalayan mountain range. Obviously there is admixture across East and South Asia (10-15% of my own genome is East Asian, and I’m South Asian), but the people of South Asia on the whole exhibit greater affinities to Europeans than they do to East Asians. This does not mean isolation by distance is useless; in fact it shows the importance of geography, and how geographic barriers can modulate isolation by distance.
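The difference between a smooth cline and a discontinuity at a barrier can be made concrete with a small sketch. The idea, under simplifying assumptions, is to fit a straight line of genetic distance against geographic distance using only pairs of populations on the same side of a barrier, then ask how far pairs that cross the barrier sit above that line. Everything here is hypothetical: the function names, the slope, and the size of the jump are invented toy values, not estimates from any dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def barrier_jump(pairs):
    """pairs: list of (geo_distance_km, genetic_distance, crosses_barrier).

    Fits the cline on pairs that do not cross the barrier, then returns
    (slope, mean excess genetic distance of barrier-crossing pairs).
    """
    within = [(d, g) for d, g, crosses in pairs if not crosses]
    across = [(d, g) for d, g, crosses in pairs if crosses]
    intercept, slope = fit_line([d for d, _ in within], [g for _, g in within])
    excess = [g - (intercept + slope * d) for d, g in across]
    return slope, sum(excess) / len(excess)

# Invented toy numbers: a pure cline plus a fixed jump for crossing pairs.
SLOPE, JUMP = 1e-5, 0.02
toy_pairs = [(d, SLOPE * d, False) for d in (1000, 2000, 3000, 4000)]
toy_pairs += [(d, SLOPE * d + JUMP, True) for d in (2000, 3000, 4000)]

slope, jump = barrier_jump(toy_pairs)
print(slope, jump)
```

Under pure isolation by distance the estimated jump would be close to zero. A consistently positive value for pairs straddling a mountain range or an ocean is the kind of discontinuity described above.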

But I have a bigger qualm with the clinal model: it leads people to assume that the extant human populations descend from a diversification of lineages out of Africa, which have been somewhat isolated since the initial settlements. The genetic distance is then simply a function of time since divergence, as well as the magnitude of gene flow (which is inversely proportional to geographic distance). But this model is probably wrong. Going back to South Asians, putting them on a genetic-geographic map and attempting to adduce deep demographic history is total folly, because evidence is building that they are a compound synthetic population, whose origins in time are relatively recent. The position of South Asians on a PCA plot conveniently between West and East Asians is not due to an ancient divergence stabilized by equidistant geographic position. It is due to the fact that South Asians are admixed between (a) West Eurasian population(s), and an ancient indigenous group which has somewhat closer affinities to East Asians. ~10,000 years ago the correspondence between genes and geography would have been very different.
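To see why an intermediate position on a plot says little about deep history, consider a toy sketch of admixture. If a population's allele frequencies are a simple weighted average of two source populations, it lands on the line between them, at a point set by the mixing fraction. The populations, allele frequencies, and the 0.6 mixing fraction below are all invented, and the projection is a simplified stand-in for a principal component axis, not PCA itself.

```python
def mix(freqs_a, freqs_b, alpha):
    """Allele frequencies of a population with fraction alpha from A, rest from B."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(freqs_a, freqs_b)]

def position_between(target, freqs_a, freqs_b):
    """Project target onto the A-B axis: 0.0 means at B, 1.0 means at A."""
    axis = [a - b for a, b in zip(freqs_a, freqs_b)]
    offset = [t - b for t, b in zip(target, freqs_b)]
    return sum(o * x for o, x in zip(offset, axis)) / sum(x * x for x in axis)

# Invented allele frequencies at four hypothetical loci.
source_a = [0.9, 0.2, 0.7, 0.1]
source_b = [0.1, 0.8, 0.3, 0.6]

admixed = mix(source_a, source_b, alpha=0.6)
where = position_between(admixed, source_a, source_b)
print(where)
```

The position recovers the mixing fraction, but the same coordinate could in principle be produced by a population that diverged long ago and simply sits in between. The plot alone does not distinguish the two histories, which is why other lines of evidence are needed.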

A great deal of Jennifer’s post about Wade’s book is rooted in interpretations of Noah Rosenberg’s work. I’m sure Rosenberg appreciates being dragged into this argument, but I couldn’t help but notice one section of his 2005 paper, Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure:

For population pairs from the same cluster, as geographic distance increases, genetic distance increases in a linear manner, consistent with a clinal population structure. However, for pairs from different clusters, genetic distance is generally larger than that between intracluster pairs that have the same geographic distance. For example, genetic distances for population pairs with one population in Eurasia and the other in East Asia are greater than those for pairs at equivalent geographic distance within Eurasia or within East Asia. Loosely speaking, it is these small discontinuous jumps in genetic distance—across oceans, the Himalayas, and the Sahara—that provide the basis for the ability of STRUCTURE to identify clusters that correspond to geographic regions.

Two exceptions to the pattern include the Hazara and Uygur populations, from Pakistan and western China, respectively, whose genetic distances scale continuously with geographic distance both for populations in Eurasia and for those in East Asia. These populations were evenly split across the clusters corresponding to Eurasia and East Asia, and thus, unlike most other populations, they do not reflect a discontinuous jump in genetic distance with geographic distance….

As can be inferred from the title, he is attempting to resolve the clusters/clines debate and to explain why geographic clusters emerge from model-based clustering (and PCA as well). But his two examples of ethnic groups which are positioned exactly where their geography would predict are also instances of very recent admixture events between West and East Eurasian populations. They don’t reflect the deep time history of the human race, but are products of recent folk wanderings. In the case of the Uygurs: the likely migration of West Eurasians into the Tarim Basin three to four thousand years ago, and then a subsequent influx of East Eurasians within the last two thousand years. The Hazara are a compound of Mongol refugees from medieval Persia and the local Dari-speaking substrate of the Afghan highlands.

And the Uygurs and Hazara may actually be far more typical in terms of their genetic ethnogenesis than we previously had thought. It seems likely that Native Americans, South Asians, Southeast Asians, and Europeans are all the products of just this sort of fusion between very distinct branches of humans. Detecting many of these admixture events is not trivial for various technical reasons, and it seems to me that these new facts should update our prior expectations of other groups. East Asians may not seem admixed only because of a lack of reference populations.

An alternative to isolation by distance and the clinal model would be that in the Pleistocene human populations were sparse and highly differentiated because of low gene flow due to low population densities. There were occasional admixture events, as occurred with the ancestors of the Native Americans. But it was during the Holocene that a revolution occurred, as a few populations picked up cultural characteristics which allowed them to blossom demographically. These explosions swamped out much of the previous variation, and also admixed across one another. What we see today is a palimpsest accumulated from these discrete pulse events.

I’d bet reality is probably somewhere between these two models, with variations from region to region. But they need to be kept in mind.
