The HGDP in the post-ascertainment era

In the 1990s there was a huge debate around the “Human Genome Diversity Project” (HGDP). By the HGDP I don’t mean what you probably know as the HGDP panel, but a more ambitious attempt to genotype tens of thousands of individuals across the world. In the end activists “won”, and the grand plans came to naught. If you want to read about it, The Human Genome Diversity Project: An Ethnography of Scientific Practice has a scholarly viewpoint, though you can also just ask someone who was involved with the human population genetics community in the 1990s (this is not a large set of scholars).

Ultimately the HGDP became the samples from L. L. Cavalli-Sforza’s dataset which you read about in The History and Geography of Human Genes. This is what drives the HGDP Browser. It’s also the data set at the heart of papers like Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. Here is the abstract:

Human genetic diversity is shaped by both demographic and biological factors and has fundamental implications for understanding the genetic basis of diseases. We studied 938 unrelated individuals from 51 populations of the Human Genome Diversity Panel at 650,000 common single-nucleotide polymorphism loci. Individual ancestry and population substructure were detectable with very high resolution. The relationship between haplotype heterozygosity and geography was consistent with the hypothesis of a serial founder effect with a single origin in sub-Saharan Africa. In addition, we observed a pattern of ancestral allele frequency distributions that reflects variation in population dynamics among geographic regions. This data set allows the most comprehensive characterization to date of human genetic variation.

These SNPs though were ascertained on European populations. That is, the genetic variation tended to be genetic variation found in Europe. This is a problem, and one reason that the Human Origins Array was developed. The ascertainment problem was really obvious when researchers were looking at Khoisan genomes, and noticed how much variation they had that wasn’t being captured on SNP-arrays.

Today, we’re finally moving beyond the era where ascertainment is so much of an issue. At the SMBE meeting earlier this month Anders Bergstrom presented results from the HGDP using whole-genome analysis. When you look at the whole genome, you obviate the problem with selecting a biased subset of the variation. You can look at all the variation, or vary the variation you want to look at.

Bergstrom & company will have a paper on the whole-genome analysis of the HGDP in the near future. I assume it will be somewhat like the 1000 Genomes paper, but I bet you the SNP count will be higher, because they have Khoisan in their samples (along with Mbuti, etc.). Anders shared with me some of the preliminary data that the Sanger Institute has generated.

Below the fold I plotted a PCA of the HGDP data. First, the classic SNP-chip data. Second, SNPs pulled out of the WGS which are very high quality calls (though they may still have wrong calls), and have a minor allele frequency of at least 1% (~1.5 million). You immediately notice the Eurasian compression along PC 1. Finally, using ~15 million SNPs that had no missingness in the data, you see PC 2 being defined by San Bushmen vs. non-San-Bushmen, while Mbuti Pygmies along with Biaka clearly are the furthest along PC 1 excepting the San. There are 6 San Bushmen in the data. If there are SNPs which are very distinct to this group, and not polymorphic in other populations, then my 1% cut-off would actually remove that variation.
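To make that last point concrete, here is a rough back-of-the-envelope check in Python. It assumes the panel size of 938 individuals quoted in the abstract above (the whole-genome set may well contain a different number), and the function name is mine, purely for illustration.

```python
def max_private_maf(n_group: int, n_total: int) -> float:
    """Highest possible panel-wide frequency of an allele found in only one group.

    Assumes diploid individuals, and that the allele is fixed inside the group
    (every chromosome carries it) and absent from everyone else.
    """
    if not 0 < n_group <= n_total:
        raise ValueError("n_group must be between 1 and n_total")
    return (2 * n_group) / (2 * n_total)


# 6 San individuals; 938 is the SNP-chip panel size from the abstract (an assumption here).
san_maf = max_private_maf(6, 938)
passes_filter = san_maf >= 0.01

print(f"max frequency of a San-private allele: {san_maf:.4f}")
print(f"survives a 1% MAF filter: {passes_filter}")
```

Under those assumptions even an allele carried on every San chromosome stays below the threshold, so a panel-wide frequency filter will tend to discard exactly the variation that distinguishes a small sample.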

It’s an interesting world we live in, thanks to research groups like the Sanger Institute, Estonian Biocentre, and the 1000 Genomes Project, as well as tools such as PLINK. Analysis that took decades in the 20th century can now be whipped out in a matter of hours. Better analyses in fact.


Complex evolution of pigmentation in modern humans

Last fall Crawford et al., Loci associated with skin pigmentation identified in African populations, was published in Science and made a huge splash. As I’ve been saying recently, and most people agree, much of the remaining “low hanging fruit” in human evolutionary genomics, and to some extent, human medical genetics, is going to be in Africa on Africans. From an evolutionary perspective, that’s probably because from a gene-centric viewpoint most of our recent evolutionary history was within Africa. As a friend once told me, “most of the last 200,000 years is about the collapse of ancient population structure.” This goes too far, but at least it gets at something we’ve not been too conscious of.

Top left clockwise: Luo Kenya, Khoisan, South Asian, Arrernte Australia

Crawford et al. was important because it was a deep dive into a topic which has been understudied, the variation of pigmentation genetics within Africa (also see Martin et al.). The fact that there is variation in pigmentation within Africa should not be surprising, though some people are surprised that there is variation in pigmentation within Sub-Saharan Africa. But anyone who has seen photos of San Bushmen knows they are very distinct from South Sudanese, who are very distinct from West Africans. As documented by both Crawford et al. and Martin et al., some of this variation is likely novel.

By this, I mean there has been backflow of the derived Eurasian variant of a mutation on SLC24A5. Arguably the first major human pigmentation locus of the “post-genomic era”, its discovery was enabled by its huge effect in explaining variation among Eurasian populations and their differences from African groups. In Crawford et al. the authors observe that within Africans ~30% of the trait variance was due to four loci, with ~13% due to SLC24A5. In earlier work comparing just people of European and African descent, SLC24A5 explains closer to 30% of the pigmentation difference. It seems that the genetic effects on pigmentation exhibit an exponential distribution: a small number of loci have a large effect, and a large number of loci have small effects.

Distribution of rs1426654 at SLC24A5

The results from Crawford et al. and Martin et al., a naive inspection of the modern distribution of the derived rs1426654 allele, and ancient DNA, seem to indicate a mutation associated with lighter skin emerged after 40,000 years ago, that is, after the expansion of non-African humans and the divergence between eastern and non-eastern branches of non-Africans. A common haplotype around this mutation suggests that it wasn’t part of the ancestral “standing variation” of the human lineage. Ancient samples from Scandinavia, the Caucasus, and modern samples from Eurasia and from Africa, all exhibit the same pattern, suggesting recent common descent.

And though a mutation on rs1426654 is associated with lighter skin, it does not produce white skin. I have the homozygote derived genotype on rs1426654, as does my whole nearby pedigree. All of us have brown skin, to varying degrees. And interestingly, the locus around rs1426654 seems to be under strong selection in both South Asia and Africa, including East Africa. This makes me somewhat skeptical that there is a simple story to tell on this locus in relation to skin pigmentation being the driver here.

Let me quote from Crawford et al.:

Most alleles associated with light and dark pigmentation in our dataset are estimated to have originated prior to the origin of modern humans ~300 ky ago (26). In contrast to the lack of variation at MC1R, which is under purifying selection in Africa (61), our results indicate that both light and dark alleles at MFSD12, DDB1, OCA2, and HERC2 have been segregating in the hominin lineage for hundreds of thousands of years (Fig. 4). Further, the ancestral allele is associated with light pigmentation in approximately half of the predicted causal SNPs…These observations are consistent with the hypothesis that darker pigmentation is a derived trait that originated in the genus Homo within the past ~2 million years after human ancestors lost most of their protective body hair, though these ancestral hominins may have been moderately, rather than darkly, pigmented (63, 64). Moreover, it appears that both light and dark pigmentation has continued to evolve over hominid history….

For over ten years it has been clear that very light skin in eastern and western Eurasia is due to different mutational events. Crawford et al. give us results that indicate this pattern of evolutionary complexity is primal and ancient.

But there is often a tacit understanding that the selection process is the same over time and space: something to do with protection from UV light and also synthesis of vitamin D at higher latitudes. So this paper that just came out definitely piqued my interest, Darwinian Positive Selection on the Pleiotropic Effects of KITLG Explain Skin Pigmentation and Winter Temperature Adaptation in Eurasians. The authors looked at a lot of variants in KITLG with a focus on East Asians. They confirmed that there were at least two selection events, one just around the “Out of Africa” period, and possibly another one later, during a period when West and East Eurasians were genetically distinct.

This section is very intriguing: “Besides pigmentation, KITLG is also involved in mitochondrial function and energy expenditure in brown adipose tissue under cold condition (Nishio et al. 2012; Huang et al. 2014). We demonstrated that winter temperature showed a much stronger correlation than UV for rs4073022.” Earlier the authors review work which suggests that large melanocytes are much more susceptible to damage due to cold than smaller ones. Dark-skinned individuals tend to have large melanocytes (and more of them!). The KITLG locus does a lot of things; some of you may know its relationship to testicular cancer.

What Crawford et al. tells us is that there seems to have been recurrent and sometimes balancing selection around loci implicated in pigmentation for hundreds of thousands of years. What ancient DNA is telling us is that the genetic architectures we take for granted as typical across much of Eurasia are relatively novel. But, I think people are perhaps taking the implications of modern genetic architecture too far in predicting the variation of characteristics in the past. Even the best genomic predictors seem to account for only around half the variance in pigmentation. “Ancestry” accounts for the rest, which basically means there are many other loci which are not accounted for. It is not unreasonable to suppose that ancient northern Eurasian populations may have been light-skinned due to genetic variants which we are not aware of.

Of course, there are people at high latitudes who retain darker complexions. From what we know the Aboriginal people of Tasmania were isolated for about 10,000 years at the same latitude as Beijing and Barcelona, and yet their skin color remained dark brown. In contrast, Martin et al. report that Khoisan people who lived 10 degrees further north, in a much sunnier climate, were selected at loci that strongly correlate with lighter skin.

I think it is safe to say that in the near future we will close in on much of the remaining genetic factors accounting for variation in pigmentation in modern populations. It is polygenic, but almost certainly far less polygenic and more tractable than height or intelligence. But the story of why humans have varied so much over time, and why loci implicated in pigmentation are so often targets of selection in so many contexts, remains to be told.

India vs. China, genetically diverse vs. homogeneous

About 36% of the world’s population are citizens of the People’s Republic of China and the Republic of India. Including the other nations of South Asia (Pakistan, Bangladesh, etc.), 43% of the population lives in China and/or South Asia.

But, as David Reich mentions in Who We Are and How We Got Here, China is dominated by one ethnicity, the Han, while India is a constellation of ethnicities. And this is reflected in the genetics. The relative diversity of India stands in contrast to the homogeneity of China.

At the current time, the best research on population genetic variation within China is probably the preprint A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese. The authors used low-coverage sequencing of over 10,000 women to get a huge sample size of variation all across China. The PCA analysis recapitulated earlier work. Genetic relatedness among the Han of China is geographically structured. The largest component of variance is north-south, but a smaller component is also east-west. The north-south element explains more than 4.5 times as much variance as the east-west.


What Neanderthals tell us about modern humans

In Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past David Reich spends a fair amount of time on Neanderthal admixture into modern human lineages. Reich details exactly how his team came to analyze the data that Svante Paabo’s group had produced, and how they replicated some peculiar patterns. In short, eventually, they concluded that modern humans outside of Africa have Neanderthal ancestry, because the Neanderthal genome that Paabo’s group had recovered happened to be subtly, but distinctively, closer to all non-Africans than to Africans. At the time, the group reported that Neanderthal ancestry was relatively evenly spread across non-African populations, which led them to suggest that it was likely a singular admixture event early on during the expansion phase of modern humans.

Nearly a decade later, things have changed. There is a consistent pattern of West Eurasians having less Neanderthal ancestry than East Eurasians. That is, Europeans have lower Neanderthal ancestry fractions than Chinese (South Asians are in between, in direct proportion to their West Eurasian ancestral quantum). There have been a variety of arguments and explanations for why this might be, which fall into two classes:

  1. Neanderthal ancestry was purged more efficiently from West Eurasians due to larger effective population sizes (selection is stronger in large populations).
  2. There may have been multiple admixture events into modern humans, or, gene-flow into West Eurasians diluting their Neanderthal ancestry.
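The intuition behind the first class of explanation can be sketched with Kimura's diffusion approximation for the fixation probability of a new mutation. This is only an illustration of why population size matters for purging weakly deleterious alleles, not a model of Neanderthal introgression. The selection coefficient, the population sizes, and the simplification that effective size equals census size are all assumptions of mine.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for a new mutation in a diploid population.

    s is the additive selection coefficient (negative means deleterious), n is
    the population size, and the effective size is assumed to equal n.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral expectation: the initial frequency
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * n * s))


def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability divided by the neutral value 1/(2n)."""
    return fixation_probability(s, n) * 2 * n


# Same weakly deleterious allele (hypothetical s = -0.0001) in two population sizes.
for n in (1_000, 10_000):
    print(n, round(relative_to_neutral(-1e-4, n), 3))
```

The same allele behaves almost neutrally in the smaller population and is removed far more effectively in the larger one, which is the logic behind appealing to effective population size.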

But what if all these arguments are mostly wrong? That’s what a new preprint seems to suggest: The limits of long-term selection against Neandertal introgression:

Several studies have suggested that introgressed Neandertal DNA was subjected to negative selection in modern humans due to deleterious alleles that had accumulated in the Neandertals after they split from the modern human lineage. A striking observation in support of this is an apparent monotonic decline in Neandertal ancestry observed in modern humans in Europe over the past 45 thousand years. Here we show that this apparent decline is an artifact caused by gene flow between West Eurasians and Africans, which is not taken into account by statistics previously used to estimate Neandertal ancestry. When applying a more robust statistic that takes advantage of two high-coverage Neandertal genomes, we find no evidence for a change in Neandertal ancestry in Western Europe over the past 45 thousand years. We use whole-genome simulations of selection and introgression to investigate a wide range of model parameters, and find that negative selection is not expected to cause a significant long- term decline in genome-wide Neandertal ancestry. Nevertheless, these models recapitulate previously observed signals of selection against Neandertal alleles, in particular a depletion of Neandertal ancestry in conserved genomic regions that are likely to be of functional importance. Thus, we find that negative selection against Neandertal ancestry has not played as strong a role in recent human evolution as had previously been assumed.

The basic argument in the preprint is that the model assumed for the ancestry of West Eurasians and Africans was wrong. Wrong assumptions can lead to wrong inferences. Using two Neanderthal genomes which are from different populations, one of whom directly contributed to the Neanderthal ancestry in modern humans, a new statistic which was insensitive to model assumptions about modern human phylogeny was computed.

The older statistic held that West Eurasians and Africans were distinct clades which had not had gene flow in ~50,000 years. Using simulations the authors argue that the best fit to the statistics that they do see, the earlier flawed one, and the current more robust one, is a situation where a population of West Eurasian origin mixed with Africans starting ~20,000 years ago.

This explains why there was a consistent decline in Neanderthal ancestry: the earlier statistic’s model assumption got worse and worse over time, and so began to underestimate Neanderthal ancestry more and more. There was continuous gene flow into Africa over the past 20,000 years.

Not everything that came before is wrong. It could still be that there are multiple admixtures. And, the authors do agree that some selection against Neanderthal alleles has occurred. It’s just that it’s not the primary reason for the decline of Neanderthal ancestry in West Eurasians.

As for the other explanation, that Neanderthal-less Basal Eurasian ancestry diluted the European hunter-gatherer fractions, the authors seem very skeptical of that. One point the authors make is that though an early European farmer was estimated to have ~40% Basal Eurasian, its Neanderthal estimate is still quite high. Iosif Lazaridis points out that this is an old estimate, and the Reich group now puts it closer to ~25%. Additionally, another recent preprint put the fraction closer to ~10%. With such low values, it is possible that Basal Eurasians may have had low Neanderthal fractions, but that that was a marginal effect on the aggregate West Eurasian ancestry quantum from Neanderthals.

I think the bigger thing to consider is that our understanding of the relationships of modern humans is roughly right, but there are lots of nuanced details we’re missing or misunderstanding. Ancient DNA from South Africa, for example, shows that modern Bushmen all seem to have exotic ancestry compared to samples from 2,000 years ago. But what about samples from 20,000 years ago?

We have the best temporal transect from Ice Age Europe, and in this region, there are many population turnovers and admixtures. It seems implausible that Europe is entirely exceptional. The West Eurasian gene flow event dated to ~20,000 years ago is curiously coincidental with the beginning of the recession of the Last Glacial Maximum. To get a better understanding of the relationships of Pleistocene people, looking at paleoclimate data is probably useful. The ancient DNA will come online at some point…and unless we think ahead, we’re going to be surprised.

Human genomics will uncover a lot of treasure in Southeast Asia

On this week’s podcast on “Isolated Populations” I mentioned offhand to Spencer that I believe it is a bit ridiculous to bracket a host of Southeast Asian populations as “Negritos,” as if they were an amorphous and homogeneous substratum over which the diversity of modern South and Southeast Asian agriculturalists were overlain. There was almost certainly a great deal of population structure which accrued over the Pleistocene. Another issue, which I didn’t mention, is that Southeast Asia is also very geographically expansive. Modern Indonesia alone spans the length of North America.

Of course, you could say the same for Europe, from the Urals to the Atlantic. And yet we know that European hunter-gatherers were relatively homogeneous (albeit, with some structure!) at the beginning of the Holocene. I think the difference though is that Europe was a landscape into which hunter-gatherers expanded during the Last Glacial Maximum, while Southeast Asia, like Africa, has long been a refuge for human populations even during the coldest and driest periods of the Pleistocene.

There are three major classes of “Negrito” peoples in South and Southeast Asia. To the west are the indigenous peoples of the Andaman Islands. These tribes probably arrived from what is today Myanmar during the Pleistocene, when sea levels were lower. In peninsular Malaysia you have groups such as the Semang. Though physically very different from their neighbors, these people speak the Aslian form of Austro-Asiatic languages. They are not linguistic isolates like the Andaman tribes.

This speaks to the reality that unlike the Andaman Islanders the Negritos of mainland Southeast Asia have long been interacting with local populations. The languages they speak reflect interactions with Austro-Asiatic rice farmers. Curiously though, the dominant people amongst whom they live no longer speak Austro-Asiatic languages. Rather, they speak Austronesian or Tai dialects. These two groups are later arrivals on the Southeast Asian scene, and both seem to have assimilated Austro-Asiatic groups culturally and genetically, except in Cambodia and Vietnam (and to a lesser extent in pockets of Thailand and Myanmar).

If you are curious about the relationship between the various modern Southeast Asian groups, then two ancient DNA papers, Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia and Ancient genomes document multiple waves of migration in Southeast Asian prehistory, should do the trick. Some of the migrations are historically or semi-historically attested. In particular, the intrusion of the Tai, the long occupation of what became Vietnam by the Chinese, and the settlement of Han officials amongst the local people, and the migrations of the ancestors of the Hmong into Laos.

Other processes are vaguer and poorly understood. It has long been clear that the Austronesians probably assimilated Austro-Asiatic rice farmers in much of maritime Southeast Asia. And yet unlike mainland Southeast Asia, to my knowledge, there are no Austro-Asiatic populations in Indonesia. Additionally, it has been brought to my attention that the ~3,000-year-old sample from Myanmar has no clear Austro-Asiatic signature, despite the common sense suggestion that Austro-Asiatic languages must have entered India via that region (it has affinities to modern Tibeto-Burman individuals). And, importantly, the Austro-Asiatic populations themselves seem to have been deeply mixed between a dominant element strongly related to the Han Chinese, and a minority component which was basal Southeast Asian, for lack of a better term. This means that the Munda populations within India have several distinct components of ancient South and Southeast Asian substratum.

Aeta family

But speaking of this substratum, probably the best paper recently focusing on these groups is from last year, Discerning the Origins of the Negritos, First Sundaland People: Deep Divergence and Archaic Admixture. In many ways, it just reinforced the results of Reich et al. 2011. All the Negrito groups are only distantly related to each other. The Negritos of the Andaman Islands and those of peninsular Malaysia seem to be somewhat closer to each other than either is to those of the Philippines. And, the groups in the Philippines seem to be somewhat closer to the peoples of Melanesia. To some extent, this is just geographically expected, but there are also interesting details.

The Negritos of the Philippines, in particular, those from the northern island of Luzon, have some of the highest fractions of Denisovan ancestry of any human populations outside of Melanesia. No one is clear whether the admixture is from the same event as the one that leads to the high fractions in Melanesians, or whether there were separate mixing events (not implausible). The western Negrito groups have far lower fractions of Denisovan.

Another surprising result is that the Negritos of the southern Philippines seem very distinct from those of the northern Philippines. This may be an artifact of particular admixture history, but I wouldn’t be surprised if these islands preserved a lot of diversity which has been homogenized elsewhere.

Like many people, I believe that human evolutionary genomics will have a lot to say about Africa in the next 10 years. But, outside of Africa Southeast Asia may be one of the most fertile regions in terms of exposing deep history. This was an area that was always amenable to habitation by modern-like Africans. It seems very likely now that the predominant modern human ancestry found in the Negrito substratum, and shared with all other non-Africans, is actually not the signal of the oldest modern humans to be present in Southeast Asia. Second, there seem to be many archaic human species which made their homes in Southeast Asia.

Humans arrived in Southeast Asia a long time ago. Our speciosity and census sizes were high. With more ancient DNA and better deep whole genome sequence analysis, we’ll uncover some surprising things. I guarantee.

Height differences across Europe could be less affected by selection than we had thought

Like an Old Testament prophet of yore Graham Coop has been prophesying that cryptic population stratification may be a major confounder in analyses for as long as I’ve known him with any degree of familiarity. So it’s no surprise he’s an author on one of two preprints which have rocked the genomics world:

Reduced signal for polygenic adaptation of height in UK Biobank:

There is considerable variation in average height across European populations, with individuals in the northwest being taller, on average, than those in the southeast. During the past six years, a series of papers reported that polygenic scores for height also show a north to south gradient, and that this cline results from natural selection. These polygenic analyses relied on external estimates of SNP effects on height, taken from the GIANT consortium and from smaller replication studies. Here, we describe a new analysis based on SNP effect estimates from a large independent data set, the UK Biobank (UKB). We find that the signals of selection using UKB effect-size estimates for height are strongly attenuated, though not entirely absent. Because multiple prior lines of evidence provided independent support for directional selection on height, there is no single simple explanation for all the discrepancies. Nonetheless, our current view is that previous analyses were likely confounded by population stratification and so the conclusion of strong polygenic adaptation in Europe now lacks clear support. Moreover, these discrepancies highlight (1) that current methods for correcting for population structure in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of polygenic differences between populations should be treated with caution until these issues are better understood.

And…Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies:

Genetic predictions of height differ significantly among human populations and these differences are too large to be explained by random genetic drift. This observation has been interpreted as evidence of polygenic adaptation, natural selection acting on many positions in the genome simultaneously. Selected differences across populations were detected using single nucleotide polymorphisms (SNPs) that were genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of sub-significant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection for diverse traits, the introduction of methods to do this, and claims of polygenic adaptation for multiple traits. All of the claims of polygenic adaptation for height to date have been based on SNP ascertainment or effect size measurement in the GIANT Consortium meta-analysis of studies in people of European ancestry. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. While we replicate most previous findings when restricting to genome-wide significant SNPs, when we extend the analyses to large fractions of SNPs in the genome, the differences across groups attenuate and some change ordering. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure, a more severe problem in GIANT and possibly other meta-analyses than in the more homogeneous UK Biobank. Therefore, claims of polygenic adaptation for height and other traits, particularly those that rely on SNPs below genome-wide significance, should be viewed with caution.

I haven’t read both preprints through and through, but my first thought (along with others), is the same as Casey Brown’s:

Note that no one has responded to his question.

Finally, recall that population structure within Europe is relatively weak and the distances between the groups low. It reminds you of how difficult polygenic traits are to analyze due to the small and subtle effects, and how they might be overwhelmed even by subtle population structure. And recall, even the British population has some of that… (albeit, an order of magnitude or so less than what you can find across Europe).

The lost 50,000 years of non-African humanity

The figure above is from Efficiently inferring the demographic history of many populations with allele count data. This preprint came out a few months ago, but I was prompted to revisit it after reading Spectrum of Neandertal introgression across modern-day humans indicates multiple episodes of human-Neandertal interbreeding.

The latter paper indicates that there were multiple waves of Neanderthal admixture into both Europeans and East Asians. The motivation to do the analysis is that East Asians are ~12 percent more Neanderthal than Europeans. The authors don’t reject the idea that there was ‘dilution’ of Neanderthal through selection and especially admixture with a “Basal Eurasian” group which didn’t have Neanderthal ancestry. I don’t want to get into the details of the results except for one thing: the preprint confirms a consistent finding over the past eight years that the Neanderthal contribution to the modern human genome is from a single population.

Perhaps it was a small population. Or perhaps it was a large population that had gone through a bottleneck and was genetically not very differentiated. But unlike Denisovans it seems that it was a particular Neanderthal lineage that interacted with modern humans.

Moving back to the “Basal Eurasians,” notice some details of the schematic above. The divergence of Basal Eurasians from other non-Africans was ~80,000 years ago, across an interval of 70 to 100 thousand years ago. The admixture of Basal Eurasians into the proto-LBK population occurred ~30,000 years ago, across an interval of 11 to 41 thousand years ago. Ancient DNA from North Africa indicates that Basal Eurasians were already admixed well before 11 thousand years ago.

The other dates make sense. 50,000 years for Europeans-Han Chinese, 96,000 years for Mbuti-Eurasians, and 696,000 years for Neanderthal-modern humans.

Ancient modern humans were highly structured. We know this from within Africa. But it seems clear that modern humans who had crossed over the other side of the Sahara also exhibited the same tendency. Basal Eurasians did not mix with Neanderthal populations. I suspect that that might be due to the fact that they were in Northeast Africa. At some point in the Pleistocene a mixing event occurred. This may have been precipitated by drier conditions and human retreat into only a few habitable areas, and the original Basal Eurasian populations may have mixed into other Near Eastern groups, which were part of the broader Neanderthal-mixed populations.

The great bottleneck after the post-Eemian separation

I’ve been thinking about effective population size. Basically it’s the inferred breeding population you estimate in the present, or in many cases the past, based on the genetic variation you see within the population. Another way to say it is that it’s the population size that can explain the genetic drift that you see in the data.

To give a concrete example, the population of the New England states of America was ~1,000,000 during the 1790 Census. The vast majority of this was due to natural increase from a settler population of ~50,000 in 1650 (total fertility rate of women in New England was seven children in the years between 1650 and 1700). Of these, ~23,000 were Puritans or the offspring of Puritans who migrated between 1630 and 1643 (due to religious differences with the English government of the period). One might think that a population of ~1,000,000 would be genetically diverse, but the ~50,000 in 1650 matter a lot more than the ~1,000,000 in 1790. The rate of mutation accumulation is pretty slow, so a population bottleneck or subsample has a huge long-term effect.
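
One way to see why the founders dominate: over several generations, drift tracks something close to the harmonic mean of the per-generation sizes, and a harmonic mean is pulled toward its smallest terms. The sketch below is only an illustration. It treats the census figures above as if they were breeding sizes, and it assumes smooth exponential growth and a 28-year generation time, none of which comes from the historical record.

```python
# Illustrative only: census counts from the text stand in for breeding sizes,
# and the 28-year generation time and smooth growth are assumptions.
founders, census_1790 = 50_000, 1_000_000
years, generation_years = 1790 - 1650, 28
generations = years // generation_years  # 5 generations under this assumption

# Per-generation sizes under constant exponential growth.
growth = (census_1790 / founders) ** (1 / generations)
sizes = [founders * growth ** g for g in range(generations + 1)]

arithmetic_mean = sum(sizes) / len(sizes)
harmonic_mean = len(sizes) / sum(1 / n for n in sizes)

print(f"arithmetic mean size: {arithmetic_mean:,.0f}")
print(f"harmonic mean size:   {harmonic_mean:,.0f}")
```

Under these assumptions the harmonic mean comes out at well under half the arithmetic mean, far closer to the 1650 figure than to the 1790 one.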

In fact, as you probably know one of the biggest determinants of genetic variation in New England whites of 1790 is the bottleneck that they share with all other non-Africans that dates to 50,000 years or more before 1790!

And these are just the coarse demographic considerations on the broader population/historical scale. In any normal random-mating human population, there’s some reproductive variance by chance (usually it is modeled as a Poisson distribution, with mean and variance being the same, though from what I have read the variance in mammals is usually greater than the mean).

Some people have more children, and some people have fewer children. That means that there is a census population, and a breeding population, and the breeding population is invariably smaller than the census population. Some individuals don’t reproduce to the next generation, obviously. But there are also cases where some individuals have large numbers of surviving offspring, while others have only a few.
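
A standard textbook approximation makes the link between reproductive variance and breeding size explicit: for a constant-size population in which the mean number of offspring per parent is two, Ne is roughly (4N - 2) / (Vk + 2), where Vk is the variance in offspring number. The sketch below applies it to a hypothetical population of 10,000 adults. The variance values are placeholders for illustration, not estimates for any real group.

```python
def effective_size(census_n: int, offspring_variance: float) -> float:
    """Approximate Ne = (4N - 2) / (Vk + 2).

    Assumes a constant-size population where parents average two offspring.
    """
    return (4 * census_n - 2) / (offspring_variance + 2)


n = 10_000  # hypothetical number of breeding adults

# Poisson reproduction: variance equals the mean of two, so Ne is about N.
poisson_ne = effective_size(n, 2.0)

# Hypothetical over-dispersed reproduction: variance three times the mean.
skewed_ne = effective_size(n, 6.0)

print(f"Poisson variance: Ne ~ {poisson_ne:,.0f}")
print(f"Variance of 6:    Ne ~ {skewed_ne:,.0f}")
```

With Poisson variance the effective size is essentially the census of breeders. Tripling the variance halves it, before any bottleneck or population structure enters the picture.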

To make it concrete I plotted the distribution of the number of children of women older than 50 years of age from the year 2000 and later in the General Social Survey (GSS). You can see that the most common number is two, but there are a fair number with three. Only about 10% of women 50 years and older have no children in the GSS.

But the curious thing is that if you weight the number by the proportion, you notice that women who have three children may not be as common as women who have two children, but they are contributing more children to the next generation than women who have the more typical two children. And, though the number of women who have five or more children is only 11% of the sample, as opposed to 14% who have one child, they contribute nearly five times as many children as those with one child to the next generation (women with six children alone contribute more than women with one child).
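
The weighting is simple enough to sketch. The percentages below are made up for illustration and are not the GSS numbers. They were chosen only so that they match the summary above (10% childless, 14% with one child, 11% with five or more, two the most common count).

```python
# Made-up percentages of women by number of children; NOT the GSS figures.
# Chosen only to match the summary in the text: 10% childless, 14% with one
# child, 11% with five or more, two the most common, three a close second.
percent_of_women = {0: 10, 1: 14, 2: 30, 3: 22, 4: 13, 5: 5, 6: 3, 7: 2, 8: 1}
assert sum(percent_of_women.values()) == 100

# Children contributed per 100 women, by family size.
children = {k: k * pct for k, pct in percent_of_women.items()}
total = sum(children.values())

for k, c in children.items():
    share = c / total
    print(f"{k} children: {percent_of_women[k]:>2}% of women, {share:.1%} of children")

five_plus = sum(c for k, c in children.items() if k >= 5)
print(f"five-plus vs one-child contribution: {five_plus / children[1]:.1f}x")
```

Even in this toy distribution, the three-child families out-contribute the more common two-child families, and the small tail of large families outweighs the one-child families several times over.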

Basically, not all the genetic variation in a given generation is created equal. Some people will contribute more to the next generation, and that has a homogenizing effect (there are models of mutation/selection/drift which establish equilibrium values of variation in a stationary state).

I’m revisiting all of this for two reasons. First, in Who We Are And How We Got Here David Reich talks about a long period of a shared population bottleneck for “Out of Africa” (all non-Africans) groups before the primary expansion ~60,000 years ago. Second, in my conversation with Matt Hahn, he was very skeptical of drawing any correspondence between effective population and some inferred census size. In hindsight I think part of it is that in most organisms census quotes are more an art than science. Not so with humans.

This made me look more into the literature for humans again. Recently Browning et al. published Ancestry-specific recent effective population size in the Americas. It’s a great paper. Basically, it uses identity by descent tracts of different ancestry to tease apart the distinctive pre-admixture effective population sizes. If you take an admixed population and assume that it was a single population random-mating indefinitely, and then work backward in time, you’re probably going to produce rather strange effective population sizes (if the two groups are about the same genetic diversity beforehand, they’ll probably show an inflated effective population, because you are assuming the two groups were a big random-mating population long before they were randomly mating!).

There are many ways to infer effective population, and the identity by descent method seems reasonable for recent time periods. And one thing about recent population size estimates for humans is that you have reasonable census estimates (you don’t just check with simulations):

Our simulations showed that biased sampling of a structured population results in underestimation of most recent effective population size. When we compare the estimated current effective sizes of HCHS/SOL country-of-origin populations to World Bank population sizes (accessed via Google Public Data Explorer) from 1995 (when the average age of the sampled individuals was around 25), we find that the ratio of current estimated effective size to 1995 population size ranges from approximately 1/60 (Ecuador) to approximately 1/4 (Cuba), with typical values around 1/10. Although estimates of effective size in the most recent generations are affected by these issues, our simulations also showed that less recent generations are not affected. Thus our estimates are useful for learning about the effective population sizes at and before admixture.

The structured part is important. For example, the paper On the importance of being structured: instantaneous coalescence rates and human evolution—lessons for ancestral population size inference? explores how structured models of gene-flow might be confused when genomic inferences assume a panmictic population. Last year a paper in PNAS, Early history of Neanderthals and Denisovans, suggested that Neanderthals were characterized by a highly structured meta-population, and that the low effective populations inferred from sampled genomes in this group of humans reflect this, rather than a genuinely low census size.

Browning et al. focused on recent population size inferences. I was curious about these inferences because we can compare them to real census sizes. From this I think I can tune my intuition at least to the possibility that census size of a random mating population is not likely to be two orders of magnitude above the inferred effective population size. Conversely, the rough mammalian value of an effective population size of ~1/3 the census size seems to be a ceiling. Population structure and bottleneck aside, humans seem to have enough basal reproductive skew that effective population size is less than half of the census size.

To focus on ancient population growth (or lack thereof), I reread Inferring human population size and separation history from multiple genome sequences (Schiffels et al. 2014), Exploring Population Size Changes Using SNP Frequency Spectra (Liu et al. 2015) and Neutral genomic regions refine models of recent rapid human population growth (Gazave et al. 2014). The first two papers seem to suggest an “Out of Africa” population bottleneck that’s pretty long, with an effective population that’s somewhat lower than 5,000 individuals. In contrast, the last paper seems to have a sharp bottleneck of 200 individuals.

Remember, different models can produce the same empirical patterns in the genome. You can reduce genetic diversity by a modest, but long, bottleneck. Or, through a very sharp short bottleneck.
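
To make the equivalence concrete: under pure drift the expected heterozygosity decays by a factor of (1 - 1/(2Ne)) each generation, so what matters is roughly the ratio of duration to size. The sketch below compares a bottleneck of 5,000 with one of 200, the two sizes mentioned above. The durations (2,000 and 80 generations) are hypothetical and were picked by me so that the ratios match. They do not come from any of the cited papers.

```python
def diversity_retained(ne: int, generations: int) -> float:
    """Expected fraction of heterozygosity left after drift: (1 - 1/(2Ne))^t."""
    return (1 - 1 / (2 * ne)) ** generations


# Hypothetical durations, chosen so that t / (2 * Ne) is 0.2 in both cases.
long_modest = diversity_retained(ne=5_000, generations=2_000)
short_sharp = diversity_retained(ne=200, generations=80)

print(f"Ne = 5,000 for 2,000 generations: {long_modest:.3f} of diversity retained")
print(f"Ne =   200 for    80 generations: {short_sharp:.3f} of diversity retained")
```

Both scenarios retain a little over four-fifths of the starting heterozygosity, which is why a summary of diversity alone cannot tell them apart.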

In Who We Are and How We Got Here David Reich definitely leans toward a long, but more modest, bottleneck. For anthropological and archaeological reasons this seems more plausible now than it did ten years ago.

But perhaps it makes more sense now that we have more ancient DNA and a more elaborated model of human history seen through the lens of population genetics. In Schlebusch and Jakobsson’s Tales of Human Migration, Admixture, and Selection in Africa the authors come out and say “For our species’ deep history in Africa, both paleoanthropological and genetic evidence increasingly point to a multiregional origin of AMHs [anatomically modern humans] in Africa.”

They’re only saying what I hear other people talking about.

Instead of the “Out of Africa bottleneck” being defining for our species, it’s only a phenomenon which is important for peoples outside of Sub-Saharan Africa. Arguably for the majority of the existence of our species something closer to multi-regionalism was operative within modern humans.

In fact, isn’t that what the new ancient DNA shows? Pulses of admixture and gene-flow between distinct groups? Arguably multiregionalism might be the answer to our origins, but also characterize many of the dynamics after the “Out of Africa” event.

In any case, the best evidence now points to the likelihood that modern human lineages began to diversify and diverge before 200,000 years ago. Conversely, most of the ancestry of modern humans outside of Africa dates to an expansion around ~60,000 years before the present (ancient DNA and archaeology seem to agree here).

This is probably right before the Neanderthal admixture event with non-African humans, at least the modern lineages we have around today. But, it turns out it does not define the point when non-African humans diverged from the ancestral African population. Another group, “Basal Eurasians” (who may not have been Eurasian at all), diverged before the expansion of all eastern non-Africans, Oceanians, as well as the ancestors of Pleistocene Europeans and Siberians. It does not seem that Basal Eurasians had any Neanderthal admixture. Basal Eurasian ancestry is substantial in the Middle East today (although lower than 50%), and non-trivial across broad swaths of Europe and South Asia, due to the expansion of farming. They seem to have been well mixed in places like North Africa with other Eurasian groups ~15,000 years ago. Presumably that was a “back to Africa” migration, since these people had Neanderthal ancestry.

All of this leads to the conclusion that the ancestors of Basal Eurasians/non-Africans must have gone through their shared bottleneck well before ~60,000 years before the present. And, it may have happened on the African continent. So with that, I’ll quote Schiffels et al.:

This comparison reveals that no clean split can explain the inferred progressive decline of relative cross coalescence rate. In particular, the early beginning of the drop would be consistent with an initial formation of distinct populations prior to 150kya, while the late end of the decline would be consistent with a final split around 50kya. This suggests a long period of partial divergence with ongoing genetic exchange between Yoruban and Non-African ancestors that began beyond 150kya, with population structure within Africa, and lasted for over 100,000 years, with a median point around 60-80kya at which time there was still substantial genetic exchange, with half the coalescences between populations and half within (see Discussion). We also observe that the rate of genetic divergence is not uniform but can be roughly divided into two phases. First, up until about 100kya, the two populations separated more slowly, while after 100kya genetic exchange dropped faster.

David Reich’s group, and others, now posit the existence of a “Basal Human” population that mixed into West Africans, who can be modeled as primarily proto-East African (without Eurasian admixture), as well as this ancient outgroup. This means that estimates of divergences with non-Africans from something like MSMC may generate a composite if proto-East Africans are closer to the ancestors of non-Africans, which seems likely. One likely model is that the “Out of Africa” population emerged out of the northern edge of this proto-East African distribution of modern humans over 100,000 years ago (but after groups like the Khoisan and Basal Humans had already diverged).

Looking at Schiffels et al., they seem to posit lower divergence times than seems likely to me. Is that perhaps due to unaccounted-for admixture in lineages which fuses together groups which were earlier distinct?

In any case, with details about the divergence dates set aside, the MSMC results are actually in line with a new congealing consensus. Deep structure within Africa, but gene-flow between distinct populations, for at least ~100,000 years (possibly more). This is the period when population structure was quite fluid and indistinct along the East Africa continuum out of which non-Africans emerged.

Also, the archaeological evidence is now strongly suggestive of modern humans in places like Southeast Asia over 10,000 years before the wave which led to the ancestry of most extant populations. In fact, we know that this sort of early migration with no descendants isn’t abnormal. The first modern humans in Europe left no descendants (at least in any appreciable quantity). And the Altai Neanderthal seems to have modern-like admixture that dates to ~100,000 years before the present.

With all the evidence that modern humans were present in Africa, and expansively so, for hundreds of thousands of years, it seems unlikely that they never mixed with “archaic” Eurasian lineages (and vice versa). In fact, as we obtain more and more Neanderthal and Denisovan genomes perhaps we’ll find that a rapid expansion like the one that occurred ~60,000 years ago across Eurasia and Oceania happened before, out of and/or into Africa.

Looping back to the effective population issue, the effective population of modern non-Africans seems to have been below ~5,000 for a while. There was minimal gene-flow with other populations for many generations. Reich has a schematic of 40,000 years between 90,000 and 50,000 BP in Who We Are and How We Got Here. But that’s obviously just a ballpark figure. I have a hard time believing that the census size was around 500,000. The world population 10,000 years ago is usually estimated to be 1 to 10 million. Human populations were probably much larger at the end of the Pleistocene than 100,000 years ago. But a figure of 10% effective would give 50,000, which seems a reasonable number, especially with the likelihood that we’re talking about many tribes over a wide ecological zone. Meta-population dynamics of extinction and resettlement in inclement periods probably drove down the effective population.
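
The arithmetic behind those census guesses can be laid out directly. The sketch below takes the rough ceiling of ~5,000 for the effective size and divides by a range of effective-to-census ratios: 1/4, 1/10 and 1/60 from the Browning et al. passage quoted earlier, plus 1/100 as the "two orders of magnitude" case. Applying ratios from recent populations of the Americas to Pleistocene foragers is my assumption, not something the paper claims.

```python
from fractions import Fraction


def implied_census(effective_size: int, ne_to_census: Fraction) -> Fraction:
    """Census size implied by an effective size and an Ne/N ratio."""
    return effective_size / ne_to_census


ne = 5_000  # rough upper figure for the non-African bottleneck, from the text

for ratio in (Fraction(1, 4), Fraction(1, 10), Fraction(1, 60), Fraction(1, 100)):
    census = int(implied_census(ne, ratio))
    print(f"Ne/N = {ratio}: census ~ {census:,}")
```

A 10% ratio gives the 50,000 figure, while 500,000 would need a ratio of 1/100, lower than any of the ratios in the quoted passage.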

The separation seems to be distinct from the older multiregional phase. What could explain it? The existence of the Sahara, and periods of extreme desertification seems the most likely candidate. I can’t say much with any credibility because I don’t know the archaeology and paleoclimate literature, but before domesticated animals, it was probably difficult for hunter-gatherers to make a go of it in the deep Sahara during the driest phases.

If I had to bet, the Eemian interglacial, 130 to 115 thousand years ago, is when I would assume there was:

  1. Lots of gene flow across the Sahara, perhaps in both directions
  2. A major population expansion of humans, of all sorts

This gives plenty of time for a wave of modern humans to push east, probably going through milder climates, rather than expanding north into Neanderthal or Denisovan territory. Eventually, some group must have mixed with the ancestors of the Altai Neanderthals. It seems likely that a cold and dry spell after the Eemian would have favored the well-adapted Eurasian groups, and modern populations would have withdrawn into refugia. The brutally expanding Sahara would have divided the majority of modern humans, who existed in the meta-populations to the south that dated back hundreds of thousands of years, from the groups on the northern fringe.

One can imagine that large numbers of modern humans were either absorbed or went extinct with the expansion of Neanderthals and other archaics. Though Neanderthals and Denisovans were interfertile with moderns, the lineages were still distinct enough that it looks like there was some hybrid breakdown. Just as modern humans seem to have purged many Neanderthal alleles from our genome, the opposite dynamic was probably at work.

There was clearly some structure in the relict modern human group that was separated from the African populations. Basal Eurasians did not mix with Neanderthals, but the ancestors of all other non-African humans did. Though one has to be careful about such geographical inferences, that suggests to me that the range of modern humans in the period between 60,000 and 80,000 years ago extended further back into pockets of northeast Africa, where no contact with Neanderthals would have occurred. Perhaps, in the end, we’ll end up thinking that the Basal Eurasians in some ways were a lot more like Africans south of the Sahara, as they didn’t undergo the massive range expansion of other populations during the Upper Paleolithic.

I’ll end with some predictions.

  • Ancient DNA of proto-moderns and archaics in eastern Eurasia dated to between 50,000 and 100,000 years BP will be analyzed at some point and will exhibit a fair amount of admixture. That is, the Altai Neanderthal was not exceptional, and probably relatively attenuated. I’m moderately confident of this.
  • The pre-60,000 year eastern Eurasians will be found to have left some of their genes in modern eastern Eurasians, especially in Southeast Asia and Oceania. Probably in the 1-10% range. I’m moderately confident of this.
  • The Denisovan ancestry in Oceanians is mediated by a “first wave” group “Out of Africa.” I have low confidence in this, but I really wouldn’t be surprised either way. My confidence in my confidence is low!
  • At some point we’ll obtain sequence from a 1 million year old hominin somewhere in the colder/drier climes of Eurasia (we have a 900,000 year old horse genome). This will predate Neanderthal/Denisovans. We will see from this that some of these super-archaic populations left their heritage in later archaics, and therefore our own lineage. I’m rather confident of this.
  • By hook or crook we’ll get more ancient genomes out of African samples, and confirm a lot of ancient population structure, as well as some gene-flow from archaic non-modern lineages. Probably around the same range you see in non-Africans (though some of the gene-flow may also apply to non-Africans, since they didn’t separate from eastern Africans until 100,000 to 150,000 years ago). I’m rather confident of this.
  • H. naledi will return sequence at some point. I’m very confident of this. I don’t have inside knowledge, but I know they’re going to keep trying. They are getting more samples.
  • H. naledi will be found to have contributed ancestry to modern southern African populations. I’m moderately confident of this.
  • At some point ancient genomes from the Americas will confirm the existence of an earlier group which was only distantly related to modern New World populations descended mostly from Siberians. There is indirect evidence of this group from South American populations, but we’ll get individuals who are much more distinct at some point in the future. I’m moderately confident of this.
  • Basal Eurasians will be found to have inhabited the Southern Arabia/Persian Gulf region. But the “pure” population will be found to have disappeared around the Last Glacial Maximum ~20,000 years ago, as the human populations to the north moved south, and the Near East’s southern fringe became drier. I’m moderately confident of this.

The 4,000 year explosion

The figure above shows a most interesting result from a new preprint, FADS1 and the timing of human adaptation to agriculture. It shows the allele frequency change using ancient Eurasian genomes for the derived allele at FADS1.

In case you don’t know why FADS1 is important, it’s been implicated in variation in long-chain polyunsaturated fatty acid (LC-PUFA) metabolism. The derived allele, embedded in haplotype D in the above preprint, seems more optimized for plant-based diets, because of the higher activity of synthesis of LC-PUFAs (which one might otherwise obtain from marine resources, as is likely among Inuit).

So the standard model is that the Neolithic changed things, as humans began to adapt to cereal-based diets. This preprint suggests maybe not:

Our analysis shows that selection at the FADS locus was not tightly linked to the development of agriculture. Further, it suggests that the strongest signals of recent human adaptation may not have been driven by the agricultural transition but by more recent changes in environment or by increased efficiency of selection due to increases in effective population size.

The authors are explicit that the derived allele at FADS1, which is at ~60% in modern Europeans, was under strong selection during the Bronze Age. In fact, this allele, which is common in Africans, may have been absent in most Paleolithic Eurasians. Using various methods they infer in fact that the ancestors of non-Africans may have been subject to selection for the ancestral variant. Their timing estimates indicate that this predates the standard expansion period starting ~60,000 BP (there was also an older selection event for the derived variant within Africa). Additionally, the authors posit that the derived variant was introduced into Europeans due to the Basal Eurasian ancestry in farmers.

They posit two dynamics that might drive the Bronze Age selection events. First, they suggest that the change in environment was actually more dramatic than that during the Paleolithic-Neolithic transition. Second, they suggest that effective populations were much smaller before the Bronze Age, so selection was not as efficacious (or, more precisely, drift effects were dominant in shaping variation).
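
The second point can be illustrated with Kimura's diffusion approximation for the fixation probability of an allele at starting frequency p0: (1 - exp(-4·Ne·s·p0)) / (1 - exp(-4·Ne·s)), with fitnesses 1, 1+s and 1+2s for the three genotypes. The sketch below compares a new beneficial mutation against a neutral one at several population sizes. The selection coefficient and the sizes are hypothetical values I chose, and the census size is assumed equal to Ne for simplicity. None of this is taken from the preprint.

```python
import math


def fixation_probability(ne: float, s: float, p0: float) -> float:
    """Kimura's diffusion approximation, with fitnesses 1, 1+s, 1+2s."""
    if s == 0:
        return p0  # a neutral allele fixes with probability equal to its frequency
    # expm1 keeps precision when the exponents are small.
    return math.expm1(-4 * ne * s * p0) / math.expm1(-4 * ne * s)


s = 0.001  # hypothetical selective advantage
advantage = {}
for ne in (100, 1_000, 10_000):
    p0 = 1 / (2 * ne)  # one new copy; assumes census size equals Ne
    advantage[ne] = fixation_probability(ne, s, p0) / p0
    print(f"Ne = {ne:>6,}: {advantage[ne]:.1f}x the neutral fixation probability")
```

At the smallest size the favored allele does only slightly better than a neutral one, so drift dominates. At the largest size it is tens of times more likely to fix, which is the sense in which selection becomes more efficacious as populations grow.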

This idea that the Neolithic isn’t quite as important, or singular, is somewhat of a surprise. But we may need to consider it. First, another line of research, using high-quality modern day sequences rather than ancient genotypes, implies that there has been a lot of recent selection, and that it is likely still going on today.

Second, one of the major takeaways from The Fate of Rome is that pandemics probably weren’t a feature of Neolithic small-scale societies. Rather, pandemics relied on long-distance trade and movement, as well as concentrations such as urban centers. Though certain endemic diseases probably arose in the Neolithic, the periodic sweep of pandemics required greater social and cultural complexity and overall human density.

The analogy then is rather straightforward. Just as microbes can move faster and more efficiently in an interconnected world, so such a world is much closer to a panmictic one. Earlier work suggested that effective population size of Neolithic farmers was not particularly small, but perhaps there are dynamics being missed by that simple summary value when it comes to the interconnectedness of the Eurasian landscape triggered by the emergence of pastoralism, and the necessary reaction of larger-scale polities.

A simple test of this would be to compare selection signals in a place like Papua New Guinea, which did not seem to undergo the same sort of pressures as Bronze Age Eurasian societies in relation to reduced diversity. I presume that New World societies as well would be an interesting test.

Y chromosomal star-phylogenies as inter-group competition between paternal lineages

The figure to the left should be familiar to readers of this weblog. It is taken from A recent bottleneck of Y chromosome diversity coincides with a global change in culture (Karmin et al.). Over the past few years a peculiar fact long suspected or inferred has come into sharp focus: some of the Y chromosome haplogroups very common today were not so common in the past, and their frequency changed very rapidly over a short time period.

What Karmin et al. did was look at sequence data across the Y chromosome to make deeper inferences. The issue is that the Y chromosome is not genetically very diverse. Earlier generations of researchers focused on highly mutable microsatellite regions for identification. While microsatellites are good for identification and classification because of their genetic diversity, they are not as good when it comes to making evolutionary inferences about parameters such as time since last common ancestor. They have very high and variable mutation rates.

Single nucleotide polymorphisms (SNPs) are probably better for a lot of evolutionary inference, but the Y chromosome doesn’t have too many of these. SNP-chip era technology which focuses on a select subset of polymorphisms at specific locations didn’t have much to choose from and likely missed rare variants.

This is where whole-genome sequence of the Y comes in. It retrieves maximal information, and with that, the authors of Karmin et al. could definitively confirm that some Y chromosomal lineages underwent explosive expansion ~4,000 years ago after a bottleneck.

By and large ancient DNA studies take a different angle, focusing on genome-wide autosomal ancestry, and lacking in high-coverage whole-genome sequences. But they have confirmed the inferences from whole-genomes that some of these lineages exhibit explosive growth in the last ~4,000 years. One moment they were rare, and the next moment ubiquitous.

But geneticists are geneticists. They’re interested in genetical questions, methods, and dynamics. To be frank cultural models for how those genetic patterns might have come about are either exceedingly simple and probably true (e.g., gene-culture coevolution with lactase persistence), or vague and handwavy. With the surfeit of genomic data to analyze it isn’t surprising that this happens.

This is why researchers in the field of cultural evolution need to get involved. They’re model-builders and should see which models predict the copious empirical results we have now when it comes to genetic change over time.

For several years now I have been asserting that inter-group competition of paternal lineages best explains the pattern of Y chromosome expansions ~4,000 years ago. A new paper brings forth a formal model which explores this hypothesis, Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck:

In human populations, changes in genetic variation are driven not only by genetic processes, but can also arise from cultural or social changes. An abrupt population bottleneck specific to human males has been inferred across several Old World (Africa, Europe, Asia) populations 5000–7000 BP. Here, bringing together anthropological theory, recent population genomic studies and mathematical models, we propose a sociocultural hypothesis, involving the formation of patrilineal kin groups and intergroup competition among these groups. Our analysis shows that this sociocultural hypothesis can explain the inference of a population bottleneck. We also show that our hypothesis is consistent with current findings from the archaeogenetics of Old World Eurasia, and is important for conceptions of cultural and social evolution in prehistory.

Their model is interesting because inter-group competition between paternal lineages can result in a loss of haplogroup diversity without huge reproductive skew. That is, instead of a highly polygynous society, one can simply posit that group dynamics of expansion and extinction produce expansions of Y chromosomal lineages.
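
The intuition can be reproduced with a toy simulation. To be clear, this is not the model in the paper, just a minimal sketch under assumptions of my own. Groups of equal size repeatedly go extinct and are replaced by a copy of another randomly chosen group, and no man within a group out-reproduces any other. The only difference between the two runs is whether each group starts out patrilineal (one Y haplogroup per group) or mixed (haplogroups shuffled across groups). All the numbers are arbitrary.

```python
import random


def make_groups(n_groups, men_per_group, patrilineal, rng):
    """Start with n_groups haplogroups, each carried by men_per_group men."""
    if patrilineal:
        # Each group is a single paternal lineage.
        return [[g] * men_per_group for g in range(n_groups)]
    # Same men, but scattered across groups at random.
    men = [g for g in range(n_groups) for _ in range(men_per_group)]
    rng.shuffle(men)
    return [men[i * men_per_group:(i + 1) * men_per_group] for i in range(n_groups)]


def surviving_haplogroups(groups, rounds, rng):
    """Each round one group goes extinct and another group's copy replaces it."""
    groups = [list(g) for g in groups]
    for _ in range(rounds):
        loser, winner = rng.sample(range(len(groups)), 2)
        groups[loser] = list(groups[winner])
    return len({haplogroup for group in groups for haplogroup in group})


rng = random.Random(0)  # fixed seed so the run is repeatable
patri = surviving_haplogroups(make_groups(50, 20, True, rng), 500, rng)
mixed = surviving_haplogroups(make_groups(50, 20, False, rng), 500, rng)

print(f"haplogroups left, patrilineal groups: {patri} of 50")
print(f"haplogroups left, mixed groups:       {mixed} of 50")
```

The same amount of group turnover strips out far more Y diversity when groups are paternal lineages, because every extinction removes a whole haplogroup rather than a random sample of men.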

A formal model synthesized with genomic results is a major step forward, though I haven’t dug into the methods (computational or analytic). Presumably, this is a first step.

But the discussion does review a lot of anthropological literature about the nature of human conflict and social interaction. Basically, it seems that in the stage after nomadic hunter-gatherers and before chiefdoms, biologically defined paternal clans were often the organizing principle of society. To some extent this makes total sense since the meta-ethnic religious and social identities explicitly appeal to fictive relationships of blood even after blood was no longer paramount. Ancient Near Eastern kings addressed each other in familial terms (e.g., “brother” and “son”), while universal religions deploy the construct of brotherhood.

In Empires of the Silk Road the author makes the case that these bands of brothers were more influential in shaping history than we realize today. Not surprisingly, the authors of the above paper suggest that the Inner Asian nomad zone is where star-phylogenies have been most pervasive and persist down to historical time. As in Steven Pinker’s The Better Angels of Our Nature it seems that the rise of the state suppressed the viciousness of the paternal kin group. How do we know this? Because the period of the maximal explosion of star-phylogenies seems to be a transient phase between the early Neolithic and the historical age.

The Y chromosomal literature is just the low hanging fruit. I suspect in the next decade cultural evolutionary models will be brought to bear on the huge mountain of genomic data….

Citation: Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck Tian Chen Zeng, Alan J. Aw & Marcus W. Feldman.