The islands of genetic uniqueness in the swell

I recall years ago reading Spencer Wells discuss how important it was to sample “indigenous people”* before they were swallowed up by the cresting panmixia. Of course panmixia has to be conditioned on the fact that the vast majority of Han Chinese are still reproducing with other Han Chinese, and so forth. But it seems plausible to argue that the great agricultural diasporas are only today swallowing up the residual of marginalized groups outside of the farming frontier. These populations which expanded from agricultural hearths over the Holocene may carry only a shadow of the genetic variation which was extant after the last Ice Age, when the thinly populated landscape was fractionated into endogamous networks as a matter of necessity rather than preference.

First, let’s recall that over the long term “effective population size” is defined by the harmonic mean of the population sizes across generations, which is dominated by the smallest values. Concretely, a population of 1 billion can be far more genetically homogeneous than a population of 1,000 if those 1 billion only recently expanded from a far smaller population. Imagine a toy example of two populations, A and B. Both begin in generation 1 with a population size of 1,000. In generation 3 both experience a population drop, A to 750 and B to 85. Now, assume that A bounces back to 1,000 and maintains that size for the next 20+ generations. In contrast, B begins to double in population size each generation. Here’s a log-transformed chart illustrating the different population sizes:


By generation 25 population B has a census size of roughly 350 million. What’s the long-term effective population size of these two groups?

– Population A = 987

– Population B = 979
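The two figures can be checked with a few lines of Python. This is only a sketch under my reading of the toy example: A dips to 750 for a single generation, and B sits at 85 in generation 3 and doubles every generation from generation 4 through 25. The names `harmonic_mean`, `pop_a` and `pop_b` are illustrative.

```python
def harmonic_mean(sizes):
    """Harmonic mean of a list of per-generation census sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


GENERATIONS = 25

# Population A: 1,000 throughout, apart from a dip to 750 in generation 3.
pop_a = [1000, 1000, 750] + [1000] * (GENERATIONS - 3)

# Population B: 1,000 in generations 1 and 2, a crash to 85 in generation 3,
# then a doubling every generation up to generation 25 (assumed reading).
pop_b = [1000, 1000] + [85 * 2 ** k for k in range(GENERATIONS - 2)]

for name, sizes in [("A", pop_a), ("B", pop_b)]:
    print(f"Population {name}: census in generation 25 = {sizes[-1]:,}, "
          f"long-term effective size = {harmonic_mean(sizes):.0f}")
```

Under those assumptions B ends at a census size of 356,515,840, and the harmonic means round to 987 for A and 979 for B, matching the values above. The single generation at 85 contributes more to the sum of reciprocals than all of B's later, enormous generations combined.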

As you can see, a population bottleneck can have long-term consequences for effective population size. Think about it in terms of evolutionary genetics. Any given population begins with a certain amount of standing genetic variation, but if it crashes in size then a lot of that variation is lost through the sampling process of random genetic drift. A small population can be a good representation of the variation of a larger population, but as a practical matter some information is usually lost in the sampling, with more lost the smaller the N. If the population rebounds then migration and mutation can eventually replenish the lost variation, but that can take a great deal of time. Even after a minimum of ~10,000 years, the populations of the New World still seem to exhibit evidence of a population bottleneck.
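To put rough numbers on "more lost the smaller the N", here is a sketch using the textbook expectation for an idealized diploid Wright-Fisher population, where each generation of size N retains a fraction 1 - 1/(2N) of the existing heterozygosity. That model, the function name and the three population sizes are my assumptions for illustration, not anything estimated from real data, and mutation and migration are ignored.

```python
def retained_heterozygosity(sizes):
    """Expected fraction of starting heterozygosity left after drift.

    Assumes an idealized diploid Wright-Fisher population: a generation
    of size N keeps a fraction 1 - 1/(2N) of the heterozygosity it
    inherited. Mutation and migration are ignored.
    """
    retained = 1.0
    for n in sizes:
        retained *= 1.0 - 1.0 / (2.0 * n)
    return retained


GENERATIONS = 25

# Hypothetical constant population sizes, chosen only for illustration.
for n in (50, 85, 1000):
    lost = 1.0 - retained_heterozygosity([n] * GENERATIONS)
    print(f"N = {n:>4} for {GENERATIONS} generations: "
          f"expected heterozygosity lost = {lost:.1%}")
```

The loss compounds every generation the population stays small, which is why the length of a bottleneck matters as much as its depth.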

Coming back to the real world, these are the sorts of dynamics which make me interested in events such as the Bantu expansion. If the model outlined in First Farmers is correct then the past 10,000 years have witnessed a massive reordering and diminution of genetic variation around the world, as small core populations of agriculturalists radiated and replaced hunter-gatherers. This makes Spencer Wells’ argument more persuasive, insofar as the remaining twigs of non-agriculturalists may be reservoirs for the shadow of variation past.

As an amateur prehistorian then my interest in populations such as the Mbuti, Bushmen, and Andaman Islanders has been piqued. Yesterday I ran a set of populations in ADMIXTURE with ~170,000 markers. At K = 11, unsupervised, they partitioned relatively cleanly (and the cross-validation error crept back up at K = 12).

As you can see most of the populations are dominated by their own unique element here (the Druze and Mandenka are different shades of green). Here are the pairwise genetic distance values:

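| | San | Mbuti | Han | W African | Druze | Hadza | N European | Masai | Papuan | Maya | Sandawe |
|---|---|---|---|---|---|---|---|---|---|---|---|
| San | 0.00 | 0.17 | 0.35 | 0.17 | 0.30 | 0.25 | 0.32 | 0.22 | 0.44 | 0.41 | 0.16 |
| Mbuti | 0.17 | 0.00 | 0.31 | 0.12 | 0.27 | 0.22 | 0.29 | 0.18 | 0.40 | 0.37 | 0.12 |
| Han | 0.35 | 0.31 | 0.00 | 0.24 | 0.16 | 0.34 | 0.17 | 0.25 | 0.23 | 0.15 | 0.21 |
| W African | 0.17 | 0.12 | 0.24 | 0.00 | 0.20 | 0.18 | 0.22 | 0.12 | 0.33 | 0.30 | 0.08 |
| Druze | 0.30 | 0.27 | 0.16 | 0.20 | 0.00 | 0.29 | 0.09 | 0.19 | 0.25 | 0.19 | 0.16 |
| Hadza | 0.25 | 0.22 | 0.34 | 0.18 | 0.29 | 0.00 | 0.31 | 0.21 | 0.44 | 0.41 | 0.16 |
| N European | 0.32 | 0.29 | 0.17 | 0.22 | 0.09 | 0.31 | 0.00 | 0.21 | 0.26 | 0.20 | 0.18 |
| Masai | 0.22 | 0.18 | 0.25 | 0.12 | 0.19 | 0.21 | 0.21 | 0.00 | 0.33 | 0.30 | 0.12 |
| Papuan | 0.44 | 0.40 | 0.23 | 0.33 | 0.25 | 0.44 | 0.26 | 0.33 | 0.00 | 0.30 | 0.29 |
| Maya | 0.41 | 0.37 | 0.15 | 0.30 | 0.19 | 0.41 | 0.20 | 0.30 | 0.30 | 0.00 | 0.26 |
| Sandawe | 0.16 | 0.12 | 0.21 | 0.08 | 0.16 | 0.16 | 0.18 | 0.12 | 0.29 | 0.26 | 0.00 |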
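A matrix like this is easier to read one population at a time. The sketch below copies the values from the table and ranks every other population by its distance from a chosen one. The helper name `neighbours` is my own, and ties are simply broken alphabetically.

```python
POPS = ["San", "Mbuti", "Han", "W African", "Druze", "Hadza",
        "N European", "Masai", "Papuan", "Maya", "Sandawe"]

# Pairwise genetic distances, rows and columns in the order of POPS.
DIST = [
    [0.00, 0.17, 0.35, 0.17, 0.30, 0.25, 0.32, 0.22, 0.44, 0.41, 0.16],
    [0.17, 0.00, 0.31, 0.12, 0.27, 0.22, 0.29, 0.18, 0.40, 0.37, 0.12],
    [0.35, 0.31, 0.00, 0.24, 0.16, 0.34, 0.17, 0.25, 0.23, 0.15, 0.21],
    [0.17, 0.12, 0.24, 0.00, 0.20, 0.18, 0.22, 0.12, 0.33, 0.30, 0.08],
    [0.30, 0.27, 0.16, 0.20, 0.00, 0.29, 0.09, 0.19, 0.25, 0.19, 0.16],
    [0.25, 0.22, 0.34, 0.18, 0.29, 0.00, 0.31, 0.21, 0.44, 0.41, 0.16],
    [0.32, 0.29, 0.17, 0.22, 0.09, 0.31, 0.00, 0.21, 0.26, 0.20, 0.18],
    [0.22, 0.18, 0.25, 0.12, 0.19, 0.21, 0.21, 0.00, 0.33, 0.30, 0.12],
    [0.44, 0.40, 0.23, 0.33, 0.25, 0.44, 0.26, 0.33, 0.00, 0.30, 0.29],
    [0.41, 0.37, 0.15, 0.30, 0.19, 0.41, 0.20, 0.30, 0.30, 0.00, 0.26],
    [0.16, 0.12, 0.21, 0.08, 0.16, 0.16, 0.18, 0.12, 0.29, 0.26, 0.00],
]


def neighbours(pop):
    """Return (distance, population) pairs for every other population,
    sorted from nearest to farthest."""
    i = POPS.index(pop)
    return sorted((DIST[i][j], other)
                  for j, other in enumerate(POPS) if j != i)


for distance, other in neighbours("Sandawe"):
    print(f"Sandawe - {other}: {distance:.2f}")
```

For the Sandawe the nearest entry is W African at 0.08 and the farthest is Papuan at 0.29. The Hadza's smallest distance to anyone is 0.16, which is in line with their looking like an isolate.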

My main interest right now is the Sandawe. Who are they? What are their relationships? The Hadza seem a genetic isolate of some sort. The Khoisan and Pygmy seem to be part of a broad hunter-gatherer substrate which was overlain by the Bantu expansion. The Sandawe presumably speak a language related to that of the Khoisan, and most of the stuff in the academic literature is linguistic. But they also are genetically distant from the Hadza, and there is some dispute as to their linguistic affinities. I am currently reading The ecological basis for subsistence change among the Sandawe of Tanzania (free on Google Books, so I pulled it to my Kindle). For now, plots and charts using the genetic distance values above. The two dimensional charts are representations of the genetic distances….

* The quotes because the term “indigenous” seems to be politically loaded and fraught. There’s a fair amount of evidence that many indigenous people replaced other indigenous people, relatively recently in time. The exceptions may be amongst groups who were first settlers on islands, like the Maori, relatively late in history.
