The Genetic History of the Middle East: into Arabia

A new massive preprint on the Middle East is out. I’ve edited the first figure to give people a general sense of the broad results and populations sampled. First, you have to know that these are high-quality modern samples. 137 individuals at 30x whole genome coverage.  In other words, basically the best genomic data you can get on sequences. No need to futz around with subsets of the data. This is important and needful because the 1000 Genomes doesn’t have a Middle Eastern population. So when looking to assemble variants there was a deficit in this domain. Even the WGS of the HGDP was not totally sufficient, since the Middle Eastern populations were not Arabian.

The populations here are sampled from both the classical “Fertile Crescent” and various points within the Arabian peninsula. At the end of the preprint, they do some analysis on selection, which I won’t talk about much. The most interesting thing is that they confirm that Arabian people have a unique lactase persistence allele that seems to have been selected very recently, just like in Europeans. A lot of the selection analysis seems either to replicate what you would find elsewhere, or to lack the power to detect polygenic selection (though they did detect selection on EDU).

The big finding to me is that this work confirms that there is a north-south cline in the Near East defined by a deep population structure. The admixture graph to the right captures the main features using Lebanese and Emiratis as the two extreme populations, but as you can see in the admixture plot above the cline really runs from the Caucasus to southern Arabia. If you analyze these populations one thing you will see is that Fertile Crescent populations, such as Druze, often seem more like Armenians and Georgians, than South Arabians. Why is this? After all, South Arabians and Fertile Crescent populations speak Semitic languages.

I think the issues here are multiple. First, there is recent admixture that obscures some of the deeper relationships. This is clear insofar as most Arab Muslim populations have Sub-Saharan African admixture. This is historically attested, and physically visible. The variation and range are quite high, in part due to spatial heterogeneity of slavery (e.g., more African slaves in lowlands than highlands), and the recency of the admixture producing variation due to incomplete mixing (the dates are usually 1000 A.D. and later).

But this is not the only admixture. All of the Fertile Crescent populations, along with groups to the north, have much more steppe drift than those to the south in Arabia. The details of the fractions don’t matter, it’s not much, but it’s not trivial, and it’s always higher than among the Arabians. Additionally, this element is new to the region, in relative terms. You can see the contribution in modern Lebanese in comparison to the Bronze Age Sidon samples, which date to 1800 BC. The source could be continuous gene flow during the Roman and Byzantine period, or even later. Or, it could also be Indo-European migrations.

We know that Indo-Iranian peoples were present in Upper Mesopotamia. The Mitanni Kingdom, which had Indo-Aryan affinities, shows up after 1750 BC. The Hittites, the Nesa, show up to the north in Anatolia a bit earlier. Interestingly, the Hittites spoke an Indo-European language that is often considered basal (the outgroup) to most of the others. Armenian, which emerges later in eastern Anatolia, is also quite distinct, just as Greek to the west is. In contrast, there is a lot of suggestive evidence of either genealogical or geographical connectedness between the ancestors of the Indo-Iranian and Slavic language families.

The presence of these two very distinct ancestral components, steppe and Sub-Saharan African, on top of the ancient Near Eastern base, produces distinctions in the modern populations which obscure some of the deeper strands. In the late 2000s when researchers and bloggers began running admixture analyses on Ethiopians it was clear that this population was a mix between “West Eurasian” and African which wasn’t Bantu. The West Eurasian donor population was often Yemeni, in particular Yemeni Jews. Later on, using more sophisticated methods some models suggested greater affinity in Ethiopian genomes to Levantine populations than Yemenis. What was going on?

We now know. It is quite clear Ethiopian populations lack steppe ancestry. In the earlier Bronze Age, and definitely in the Neolithic, Levantines lacked steppe ancestry. In fact, the Neolithic Levantines usually lacked “Iranian” ancestry. The West Eurasian ancestry in Northeast Africans, on the whole, is enriched for a Levantine ancestry quite similar to Natufian. Modern-day South Arabians are the closest to this population mix, even if they are not descended from ancient Levantines. They lack steppe.

Modern-day South Arabians in fact descend in part from indigenous hunter-gatherers, who were a sister clade to the ancestors of Natufians. The admixture graph makes that clear: the Emiratis with the least African ancestry have half their ancestry from this group. In the book Arabs, the author discusses at length various Yemeni legends of a fusion between distinct peoples on the edge of history. This could be recollections of the merging of indigenous Neolithic Arabians and peoples who expanded from the north.

The analyses of these samples confirm and reiterate what has been found with ancient DNA: at some point late in the Neolithic and early in the Bronze Age a massive admixture event occurred in the Fertile Crescent which brought a considerable amount of “Iranian” ancestry into the region (these ancient people are not like modern Iranians; in particular, they lacked steppe ancestry which is copious in much of Iran, particularly the east). This ancestry pushed south and westward so that ~50% of the ancestry of Arabians seems to be Iranian. That being said, I have some qualms here:

We explored whether this ancestry penetrated both the Levant and Arabia at the same time, and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,900 and 5,600 ya (Table S3), followed by admixture in Egypt (2,900-4,700 ya), East Africa (2,200-3,300) and Arabia (2,000-3,800). These times overlap with the dates for the Bronze Age origin and spread of Semitic languages in the Middle East and East Africa estimated from lexical data (Kitchen et al., 2009; Figure S8). This population potentially introduced the Y-chromosome haplogroup J1 into the region (Chiaroni et al., 2010; Lazaridis et al., 2016). The majority of the J1 haplogroup chromosomes in our dataset coalesce around ~5.6 [95% CI, 4.8-6.5] kya, agreeing with a potential Bronze Age expansion; however, we do find rarer earlier diverged lineages coalescing ~17 kya (Figure S9). The haplogroup common in Natufians, E1b1b, is also frequent in our dataset, with most lineages coalescing ~8.3 [7-9.7] kya, though we also find a rare deeply divergent Y-chromosome which coalesces 39 kya (Figure S9).

Some of these dates are hard to credit. For example, I obtain a midpoint estimate of Iranian admixture into Egypt of around 1800 BC!

The fraction of Iranian ancestry is substantial. The admixture model in the supplements gives this for Egyptians: 45% Levant_N, 32% Iran_N, 8% EHG (Eastern European Hunter-Gatherer), and 15% Mota (African). The older date is 2700 BC. The oldest Egyptian writing dates to 2700 BC, but proto-hieroglyphs are 500 years older. The authors talk about Semitic languages, and ancient Egyptian is not Semitic. So it could be a minority population mixed into the Egyptians, but this is a massive event that we don’t have records of. In fact, the authors claim that it went into much of Northeast Africa at a relatively late date.

Additionally, the values for the Levant seem recent as well. That being said there was a pre-Sumerian civilization, the Uruk Civilization, which spread broadly from Mesopotamia between 4000 and 3000 BC. This is 6000 to 5000 years ago. The midpoint of this is 5500 years ago, while the midpoint of the admixture into the Syrians, who were on the edge of the Uruk Civilization, is 3800 years ago. Basically, I think the evidence points to various statistical genomic artifacts reducing the age from when the admixture truly occurred (this has long been a problem in this field).
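The date arithmetic in the last few paragraphs is easy to redo by hand, but a small sketch makes the assumptions explicit. The ranges below are the ones quoted from the preprint above. The reference year of 2020 for "years ago" is my assumption (the post does not state one), and the function names are hypothetical helpers, not anything from the preprint.

```python
# Assumption: "years ago" (ya) is counted back from 2020. The post does not
# state a reference year, so REFERENCE_YEAR is illustrative only.
REFERENCE_YEAR = 2020

# Admixture date ranges in years ago, as quoted from the preprint above.
ADMIXTURE_RANGES_YA = {
    "Levant": (3900, 5600),
    "Egypt": (2900, 4700),
    "East Africa": (2200, 3300),
    "Arabia": (2000, 3800),
}


def midpoint_ya(younger, older):
    """Midpoint of a date range expressed in years ago."""
    return (younger + older) / 2


def ya_to_bc(years_ago, reference_year=REFERENCE_YEAR):
    """Convert years ago to an approximate BC year (ignores the missing year 0)."""
    return years_ago - reference_year


def summarize(ranges=ADMIXTURE_RANGES_YA):
    """Return {region: (midpoint in ya, approximate midpoint in BC)}."""
    return {
        region: (midpoint_ya(lo, hi), ya_to_bc(midpoint_ya(lo, hi)))
        for region, (lo, hi) in ranges.items()
    }


if __name__ == "__main__":
    for region, (mid_ya, mid_bc) in summarize().items():
        print(f"{region}: midpoint {mid_ya:.0f} ya, about {mid_bc:.0f} BC")
```

Changing REFERENCE_YEAR by a few decades shifts every BC value by the same amount, which is small next to the width of the ranges themselves.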

I honestly have no idea how to relate the expansion of Semitic languages to the expansion of Iranian ancestry. My friend Patrick Wyman believes that Anatolian farmers spoke Afro-Asiatic. These were very different people from the Iranians, who arrived from the east later. Additionally, history teaches us that Mesopotamia during the Bronze Age was very linguistically diverse. The Sumerians were not Semitic, and neither were their Elamite neighbors in Khuzistan. The Akkadians, who were more prevalent in the north of Mesopotamia, but were present from the beginning of Sumerian history, were Semitic.

There is still a mystery around the great admixture between Neolithic Near Easterners of the west and the east. I don’t think we’ve closed that chapter of the book.

That being said, there is a lot that is “solved” in this paper. For example, these authors seem to confirm that there is no evidence of “first wave” modern humans in Arabian populations earlier than the non-African radiation. Arabians, like other non-Africans, underwent a population expansion 50-70,000 years ago. Their separation from Mbuti Pygmies was gradual up until 120,000 years ago. Then there seems to have been a separation. What this is telling us, I believe, is that the ancestors of non-Africans were part of the African meta-population until 120,000 years ago. This is suspiciously close to the Eemian Interglacial, which dates to between 115,000 and 130,000 years ago. The Eemian was characterized by a “Green Sahara”, so it seems that this is when early modern humans ventured in substantial numbers out of the continent and to its peripheries. One issue that seems notable in the data is that proto-non-Africans seem to have been characterized by a period of isolation and small population size. Perhaps

But 50-70,000 years ago a massive expansion of one of these daughter populations occurred. These data confirm that Arabians seem to have the same Neanderthal admixture as everyone else, but, even accounting for Sub-Saharan African ancestry they also have somewhat less. In alignment with earlier research, they argue that this is due to admixture with “Basal Eurasian” populations which did not mix with Neanderthals ~55,000 years ago.  Or, more precisely, did not carry as much Neanderthal ancestry (it seems plausible that the Basal Eurasian populations are themselves a compound of conventional non-African at the base of the broader splits, and a deeper basal group which lacks Neanderthal ancestry).
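The dilution argument here is simple mixing arithmetic, which I sketch below. The function name and the example numbers in the usage note are hypothetical and chosen only for illustration; they are not estimates from the preprint.

```python
def diluted_neanderthal_fraction(source_fraction, basal_fraction, basal_neanderthal=0.0):
    """Expected Neanderthal fraction after mixing with a Basal Eurasian-like source.

    source_fraction: Neanderthal fraction in the non-basal ancestors.
    basal_fraction: share of ancestry drawn from the basal source (0 to 1).
    basal_neanderthal: Neanderthal fraction carried by the basal source itself
        (0.0 for the simple case; above zero for the "compound" case in which
        the basal source carries some, but less, Neanderthal ancestry).
    """
    if not 0.0 <= basal_fraction <= 1.0:
        raise ValueError("basal_fraction must be between 0 and 1")
    return (1.0 - basal_fraction) * source_fraction + basal_fraction * basal_neanderthal
```

As a purely illustrative case, a population with a quarter of its ancestry from a Neanderthal-free source and the rest from ancestors carrying 2% Neanderthal ancestry would be expected to carry 1.5%. If the basal source itself carried 1%, the expectation rises to 1.75%.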

Going back to the admixture graph, you notice that both western and eastern farmer populations are a compound of Basal Eurasian and various lineages that are broadly “West Eurasian.” Natufians and Anatolian farmers are descended about half from groups related to European hunter-gatherers, while ancient Neolithic Iranians had ancestry related to these people, but even more to populations distantly related to Ancient North Eurasians (Paleo-Siberians). The events here are distant, but the same proportion of Basal Eurasian ancestry indicates to me a rapidly expanding population at some point which mixed with a well-structured set of groups in the Near East.

The major takeaways

  • Near Easterners are part of the same broad diversification as all other non-Africans
  • The expansion of these non-Africans dates to 50-70,000 years ago
  • Archaeological evidence points to a very intense expansion in the period around 50,000 years ago, and admixture with Neanderthals somewhat before then
  • At the beginning of the Holocene Near Easterners were deeply structured regionally, and had threaded together disparate ancestral components (Basal Eurasian, related to European hunter-gatherer and Paleo-Siberian)
  • Late in the Neolithic and early Bronze Age much of this structure collapsed, and there was a massive admixture of Iranian ancestry to the south and west (conversely, there is evidence in other work of admixture of western farmer ancestry to the east)
  • Finally, there is evidence for later incursions of steppe people into the northern Arabian fringe and Fertile Crescent
  • On top of this, there is historical admixture from Africans and in the north Turks and other groups

The genomics of the Viking Age

A huge new preprint on Vikings (as well as the Bronze Age, Iron Age, and comparisons to moderns), Population genomics of the Viking world:

…we sequenced the genomes of 442 ancient humans from across Europe and Greenland ranging from the Bronze Age (c. 2400 BC) to the early modern period (c. 1600 CE), with particular emphasis on the Viking Age. We find that the period preceding the Viking Age was accompanied by foreign gene flow into Scandinavia from the south and east: spreading from Denmark and eastern Sweden to the rest of Scandinavia. Despite the close linguistic similarities of modern Scandinavian languages, we observe genetic structure within Scandinavia, suggesting that regional population differences were already present 1,000 years ago. We find evidence for a majority of Danish Viking presence in England, Swedish Viking presence in the Baltic, and Norwegian Viking presence in Ireland, Iceland, and Greenland. Additionally, we see substantial foreign European ancestry entering Scandinavia during the Viking Age. We also find that several of the members of the only archaeologically well-attested Viking expedition were close family members. By comparing Viking Scandinavian genomes with present-day Scandinavian genomes, we find that pigmentation-associated loci have undergone strong population differentiation during the last millennia. Finally, we are able to trace the allele frequency dynamics of positively selected loci with unprecedented detail, including the lactase persistence allele and various alleles associated with the immune response. We conclude that the Viking diaspora was characterized by substantial foreign engagement: distinct Viking populations influenced the genomic makeup of different regions of Europe, while Scandinavia also experienced increased contact with the rest of the continent.

A few notes:

– Though the broad patterns seem to have been established with the expansion between 3,000 and 2,500 BC from the Yamnaya steppe (at least in Northern Europe), some subtle details in genome-wide ancestry shifted in subsequent periods. This data set seems to show a decline in “Neolithic Farmer” and increase in hunter-gatherer and steppe ancestry after the Bronze Age, with some increase in the former by the Viking Age. This suggests that there is some sort of skew in sampling which misses populations enriched for hunter-gatherer ancestry (I suspect these groups live in the most marginal land and are the most mobile).

– There is structure by the Viking Age, which is not surprising. But the authors also report a few regions of southern Sweden where samples are enriched for Neolithic farmer ancestry down to the Viking Age, suggesting that even ancient structure wasn’t well mixed (yet).

– Most of the selection for the phenotype which characterizes modern-day Northern European populations seems to have been completed over the 2,000 years between the Bronze Age and the Viking Age.

The vines around the tree trunks


A lot of the understanding of scientific theories and models in the public domain is communicated by evocative metaphors and turns of phrase. For example, Charles Darwin famously wrote:

It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us….

When it comes to understanding the origin of our own species and the broader human lineage over the past two million years, I’ve started to come to a mental model of a weighted-graph with edges. Some of the edges traverse time and have strong weights. These are analogous to the normal phylogenetic tree model, representing phyletic gradualism and anagenesis along each branch before some bifurcation event. But, some of the edges move horizontally between others. These represent migration and/or gene flow between the primary lineages.
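For readers who do think in graphs, here is a minimal sketch of that mental model as a data structure. Everything in it is hypothetical: the class names, the placeholder node labels, and the 0.97/0.03 weights are my own illustration and do not come from any preprint.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    weight: float  # share of the target's ancestry carried by this edge
    kind: str      # "descent" (through time) or "gene_flow" (horizontal)


@dataclass
class LineageGraph:
    edges: list = field(default_factory=list)

    def add_descent(self, ancestor, descendant, weight):
        """Strongly weighted edge along a primary lineage."""
        self.edges.append(Edge(ancestor, descendant, weight, "descent"))

    def add_gene_flow(self, donor, recipient, weight):
        """Horizontal edge between primary lineages."""
        self.edges.append(Edge(donor, recipient, weight, "gene_flow"))

    def incoming(self, node):
        return [e for e in self.edges if e.target == node]

    def ancestry_sums_to_one(self, node, tol=1e-9):
        """True if the node has incoming edges whose weights total 1."""
        incoming = self.incoming(node)
        return bool(incoming) and abs(sum(e.weight for e in incoming) - 1.0) < tol


def build_example():
    """Two placeholder lineages with one small gene flow event between them."""
    graph = LineageGraph()
    graph.add_descent("trunk_A_early", "trunk_A_late", 0.97)
    graph.add_gene_flow("trunk_B", "trunk_A_late", 0.03)
    return graph
```

The only design choice worth noting is that descent and gene flow are the same kind of object and differ only in weight and label, which is the point of the mental model.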

I’m not sure though that a graph theory derived mental model helps many people, so I’ll use another one: imagine large trunks defining the primary lineages, and vines tying them together representing gene flow events. The above figure is from a new preprint, Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. This is a methods-heavy preprint. It utilizes an “ancestral recombination graph” (so a model of the genealogy of genes in the genome) and MCMC to generate Bayesian probabilities of particular events (e.g., introgression of a lineage that diverged x years ago at fraction y).

The abstract presents some specific findings:

…While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present an extended version of the ARGweaver algorithm, ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topology and branch lengths along the genome, but also indicate migrant lineages…We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples…We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200-300kya. We find no convincing evidence that negative selection acted against these regions. We also identify 1% of the Denisovan genome which was likely introgressed from an unsequenced hominin ancestor, and note that 15% of these regions have been passed on to modern humans through subsequent gene flow.

ARGweaver-D is gnarly. Not in a bad way. But you should never really trust computational wizardry of this sort unless you’ve taken it for a test drive, or it’s been around for decades and people have validated it. A “play with the parameters” phase is necessary for these packages to become more than magic.

That being said, for about half a decade people have been detecting evidence of a “super-archaic” lineage within Denisovans. This is just another confirmation with another method. The super-archaic hypothesis seems plausible as an explanation of the patterns in the data (there may be other explanations). Second, there’s a lot of circumstantial evidence for gene flow into Neanderthals from moderns. E.g., mtDNA replacement in Neanderthals. Though not in the abstract, the preprint mentions the likelihood of “super-archaic” introgression into Neanderthals as well. From a recent ancient DNA paper, Nuclear DNA from two early Neandertals reveals 80,000 years of genetic continuity in Europe:

We find that population split times between HST and other Neandertals of less than 150 ka ago make the occurrence of a mitochondrial time to the most recent common ancestor (TMRCA) of 270 ka ago unlikely (1.2% of all simulated loci have such a deep TMRCA; note S11). We note that this result is robust to uncertainties in the estimates of the Neandertal population size and of the mitochondrial TMRCA (note S11). The presence of this deeply divergent mtDNA in HST thus suggests a more complex scenario in which HST carries some ancestry from a genetically distant population.

It seems entirely likely that we’re going to see “shadows of forgotten ancestors” in our genomes. But wait, there’s more!

…ARGweaver-D only detected a small amount of Sup→Afr introgression, which was somewhat lower than our estimated false positive rate. One aspect to note here is that the power to identify introgression from an unsequenced population is highly dependent on the population size of the recipient population. The larger the population, the deeper the coalescences are within that population, making it more difficult to discern which long branches might be explained by super-archaic introgression…If we had used a smaller population size, ARGweaver-D would have produced more Sup→Afr predictions, but most of these would be false positives unless that smaller population size is closer to the truth. Overall, we caution that the problem of detecting super-archaic introgression into a large and structured population such as Africans is very difficult and that claims of such introgression need to be robust to the demographic model used in analysis. It may not be possible to address the question of ancient introgression into Africans without directly sequencing fossils from the introgressing population.
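The point about population size and deep coalescences can be illustrated with the textbook constant-size neutral coalescent, in which the coalescence time of two lineages in a diploid population of effective size N is exponentially distributed with mean 2N generations. This sketch is not ARGweaver-D's model, and the function names and the numbers in the usage note are my own illustrative assumptions.

```python
import math


def mean_pairwise_tmrca_generations(effective_size):
    """Expected pairwise coalescence time, in generations (2N for diploids)."""
    return 2.0 * effective_size


def prob_tmrca_deeper_than(generations, effective_size):
    """P(T > t) for an exponential coalescence time with mean 2N generations."""
    if generations < 0 or effective_size <= 0:
        raise ValueError("generations must be >= 0 and effective_size > 0")
    return math.exp(-generations / (2.0 * effective_size))
```

With a hypothetical threshold of 60,000 generations, the chance that a pair of lineages coalesces deeper than that is exp(-3), about 5%, when N is 10,000, but exp(-1), about 37%, when N is 30,000. Long branches that would look anomalous in the small population are routine in the large one, with no introgression required.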

In northern Eurasia, in particular, one might imagine a scenario with large fluctuations in population size, and patchy landscapes. This would reduce gene flow between populations, and also foster drift to produce distinct lineages. Simple stylized models of gene flow at particular times across disparate lineages make a great deal of sense in this context. But if Africa had larger populations of humans, with more interconnected networks with continuous, if variable, levels of gene flow then the stylized models will mislead in important features.

This preprint is likely reporting some true robust results that will hold up. But I think the bigger picture is that it will lead us toward moving beyond the extremely simple models in vogue a generation ago, to a more subtle understanding of complex emergence and collapse of human population structure over the last two million years.

Tracing the paths of Noah’s sons

 
The above admixture graph is from a new preprint, Paleolithic DNA from the Caucasus reveals core of West Eurasian ancestry. To be honest, if you read the supplementary text there’s almost no point in reading the main preprint, as it is far more in depth when it comes to the methodology as well as spotlighting a variety of particular results. It’s hard to know where to begin with such a preprint so I want to highlight the “this is a simplified model” portion in the figure above. That’s actually the truth. Remember, no admixture graph is the Truth, it is an attempt by humans to capture concisely and informatively the major features of our species’ population history dynamics. The reality was never as clear and distinct as stylized graphical representations would have you think, and the researchers are aware of this.

In any case, if you want to really get at how they arrived at the conclusions they did, really read the supplementary section SI 2, “An admixture graph model of Upper Paleolithic West Eurasians.” The authors have so many potential combinations of ancestral populations that they can’t simply manually and intuitively posit admixtures. Rather, they have to explore a huge number of combinations (trees/graphs)…at which point they run into computational limits. This section explicitly lays out computationally efficient ways to automatically traverse the possibility space, and arrive at the best fitting set of models, within reason.
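To get a feel for why manual exploration is hopeless, consider just the number of rooted, leaf-labeled binary trees on n populations, which is the double factorial (2n - 3)!!. This is a standard combinatorial count, not the authors' search procedure, and it understates the real space because admixture graphs add mixture edges on top of the tree. The function name is my own.

```python
def rooted_binary_tree_count(n_leaves):
    """Number of distinct rooted, leaf-labeled binary trees: (2n - 3)!!."""
    if n_leaves < 1:
        raise ValueError("n_leaves must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, rooted_binary_tree_count(n))
```

Three populations give 3 trees and four give 15, but ten already give 34,459,425, before a single admixture event is allowed.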

The title of the preprint says it all, but let me quote the abstract in full:

The earliest ancient DNA data of modern humans from Europe dates to ~40 thousand years ago, but that from the Caucasus and the Near East to only ~14 thousand years ago, from populations who lived long after the Last Glacial Maximum (LGM) ~26.5-19 thousand years ago. To address this imbalance and to better understand the relationship of Europeans and Near Easterners, we report genome-wide data from two ~26 thousand year old individuals from Dzudzuana Cave in Georgia in the Caucasus from around the beginning of the LGM. Surprisingly, the Dzudzuana population was more closely related to early agriculturalists from western Anatolia ~8 thousand years ago than to the hunter-gatherers of the Caucasus from the same region of western Georgia of ~13-10 thousand years ago. Most of the Dzudzuana population’s ancestry was deeply related to the post-glacial western European hunter-gatherers of the ‘Villabruna cluster’, but it also had ancestry from a lineage that had separated from the great majority of non-African populations before they separated from each other, proving that such ‘Basal Eurasians’ were present in West Eurasia twice as early as previously recorded. We document major population turnover in the Near East after the time of Dzudzuana, showing that the highly differentiated Holocene populations of the region were formed by ‘Ancient North Eurasian’ admixture into the Caucasus and Iran and North African admixture into the Natufians of the Levant. We finally show that the Dzudzuana population contributed the majority of the ancestry of post-Ice Age people in the Near East, North Africa, and even parts of Europe, thereby becoming the largest single contributor of ancestry of all present-day West Eurasians.

Ancestry from Dzudzuana

Longtime readers know that I hate the American racial term “Caucasians.” It’s pretentious when you could just say “white European,” because that’s what people really mean, judging by the fact that the real people from the Caucasus are marginally Caucasian in the eyes of many Americans. The genealogical origin of the term goes back to Johann Friedrich Blumenbach. And yet this paper takes these two samples, and finds that a lot of the ancestry of modern groups can be attributed to them! (also, a religious interpretation of the results is in the title of the post)

To be fair, they caution that these ancient Caucasian samples are representative of a particular thread of human heritage, not that the center of this thread was necessarily in the Caucasus. This does make me wonder about ascertainment bias in the Near East toward samples from mountainous areas which were colder. But, at the granularity they are attempting to understand human population history, it’s probably not that big of a deal. Ultimately, they conclude that this Paleo-Caucasian population contributes “~46-88% of the ancestry” of modern Europeans, Near Easterners, and North Africans. That’s kind of a big deal.

There are so many results in this preprint, so I think we need to go back to the “beginning” of the non-African branch. The Paleo-Caucasian sample is of note in part because it is from before the Last Glacial Maximum, and about halfway back to the massive diversification of most non-African populations around 55,000 years ago. Using the Paleo-Caucasian samples’ affinities this preprint reinterprets results from last spring on ancient DNA from Northwest Africa. In that paper, the authors conclude that Paleolithic North Africans were a mix between an unspecified Sub-Saharan population and Natufians. Here though the authors suggest that the Natufians and Yoruba both received gene flow from Paleolithic North Africans. And, these Paleolithic North Africans were themselves mixed between something similar to the Paleo-Caucasians (a mix between an ancient West Eurasian ancestry and “Basal Eurasian”), and a “Deep” ancestry which diverged from other non-Sub-Saharan Africans before the Basal Eurasians did.

The reason that the Paleo-Caucasian sample is so important is that it allowed the researchers to see that the early Holocene Near East, where Anatolian and Iranian farmers, as well as Natufians in the Levant, were ancestral to many later groups, was subject to many genetic changes from before the Last Glacial Maximum. The Natufians seem to be well modeled as having ancestry from the Paleolithic North Africans as one of the major ways they are distinctive from the Paleo-Caucasians. This presents us with a reasonable model for the west to east movement of haplogroup E, and, the Afro-Asiatic languages. The gene flow of Paleolithic North African also explains the non-trivial level of Neanderthal admixture which is found in the Yoruba population. This is mediated through the presumed back migration of Paleo-Caucasians from the Near East at some point in the Pleistocene, contributing some Neanderthal ancestry to the genetic background of Paleolithic North Africans.

Additionally, the distinction between western (Anatolian/Levant) and eastern (Iran) farmers during the early Holocene can now be understood as a product of later admixture into eastern proto-farmers of basic Paleo-Caucasian stock. The relative closeness of Anatolian farmers to the Paleo-Caucasian samples is indicative of the fact that there was an “Ancient North Eurasian” (ANE) admixture cline into the Near East during the Pleistocene, which meant that some populations to the east became rather different from the pre-LGM samples. Probably after the Last Glacial Maximum proto-Siberian ancestry became prominent in the zone between the Caucasus and Iran (additionally, some of the models imply there was eastern Eurasian ancestry). This is in keeping with the fact that ANE ancestry does seem to have been found in places like Khorasan before the expansion south of steppe populations after 2,000 BC.

As noted in the abstract, Paleo-Caucasians had Basal Eurasian ancestry ~26,000 years ago. This increases the likelihood that Basal Eurasians weren’t recent migrants from deep inside Africa. Additionally, for various reasons, the authors are now positing a Deep ancestry which diverged even further into the past. Both Basal Eurasians and Deep populations seem to lack Neanderthal admixture. The authors also repeatedly suggest that Basal Eurasians were part of the Out of Africa bottleneck event. In Who We Are and How We Got Here David Reich presents the model that this bottleneck population had a low effective population size for a long time. This seems plausible because the genetic homogeneity that you see in non-Africans is pretty striking vis-a-vis Sub-Saharan Africans. On the other hand, this work confirms earlier results that imply that Basal Eurasians did not admix with Neanderthals, and also indicates that the divergence has to be greater than 60,000 years before the present from other non-Africans, who diversified more recently.

In contrast, the Deep ancestry group, which nevertheless forms a clade with the new Eurasian lineages (Basal and non-Basal), does not clearly seem to have undergone the bottleneck event according to this preprint. It’s more a matter of what they don’t say, rather than what they say in this case.

The big picture needs to be integrated, I think, with the new “modern humans emerged through a multi-regional process” model within Africa. If you think of modern humans as emerging across an African range which shifted into the Near East based on oscillating climatic conditions, the ancestors of the “non-African” lineages can be thought of as one of the main deeply rooted lineages, probably in the northeast of the continent. During the Pleistocene, the Sahara was even more brutal than today during many periods, so it is not implausible that some of these marginal populations on the edge of Africa were subject to long periods of very small effective population sizes. Most of them presumably went extinct. But one population was probably far enough north and east that it had a little more margin to play with. This population was probably connected along the Mediterranean littoral at some point with the Deep component in North Africa, which had higher effective population sizes because the mountainous terrain of the Atlas region was always going to remain more clement through dry phases.

At some point a group of the bottlenecked population mixed with some Neanderthals, and began to break out of containment in southwest Asia. If I had to bet money, I suspect there were already other related groups, probably somewhat admixed with local hominin lineages, further east. That is, I believe the archaeological results in Southeast Asia, and think that those in Australia are credible. But these groups were probably small in number, and totally absorbed by the later migration wave.

Also, the timing of the separation of Africans and “non-Africans” is such that I wouldn’t be surprised if the Qafzeh-Skhul people were somehow ancestral to, or closely related to, the ancestors of non-Africans.

Finally, let’s remember that the authors were focusing on North Africa and Western Eurasia in this preprint. Things will get more complicated as East Asia and Africa come “online” in terms of these analyses. Of course, we are going to be helped by the reality that human genetic variation is not arbitrarily and randomly distributed, but reflects real constraints in our evolutionary history and the forces of geography as well as contingency. The non-African story is made simpler in part because of the great bottleneck, and especially the common descent of most peoples from the population that mixed with Neanderthals. The modeling of effective population size changes over time in Sub-Saharan groups does not lead us to believe that it will be so simple in that continent.

Related papers: The genetic history of Ice Age Europe, Tales of Human Migration, Admixture, and Selection in Africa, and Genomic insights into the origin of farming in the ancient Near East.

Rainforest hunter-gatherers are not primitive or primal

Recently I told a friend that I suspect the “tropical pygmy” phenotype you see in Central Africa and Southeast Asia is a pretty recent development. So this sort of assertion, “The Sentinelese tribe have remained on their North Sentinel Island, almost completely uncontacted for nearly 60,000 years…” is probably wrong. First, the Sentinelese probably arrived with other Andaman peoples during the Pleistocene from mainland Southeast Asia when the archipelago may have been connected to the mainland due to low sea levels.

Second, the small size of many tropical hunter-gatherer populations may simply be due to the difficulty of surviving in this environment. Though rainforests are lush, humans can’t access a lot of it, and small animals tend to require more energy to catch than is justified by how much meat they provide.

Genomics is now on the case: Polygenic adaptation and convergent evolution across both growth and cardiac genetic pathways in African and Asian rainforest hunter-gatherers:

Different human populations facing similar environmental challenges have sometimes evolved convergent biological adaptations, for example hypoxia resistance at high altitudes and depigmented skin in northern latitudes on separate continents. The pygmy phenotype (small adult body size), a characteristic of hunter-gatherer populations inhabiting both African and Asian tropical rainforests, is often highlighted as another case of convergent adaptation in humans. However, the degree to which phenotypic convergence in this polygenic trait is due to convergent vs. population-specific genetic changes is unknown. To address this question, we analyzed high-coverage sequence data from the protein-coding portion of the genomes (exomes) of two pairs of populations, Batwa rainforest hunter-gatherers and neighboring Bakiga agriculturalists from Uganda, and Andamanese rainforest hunter-gatherers (Jarawa and Onge) and Brahmin agriculturalists from India. We observed signatures of convergent positive selection between the Batwa and Andamanese rainforest hunter-gatherers across the set of genes with annotated ‘growth factor binding’ functions (p<0.001). Unexpectedly, for the rainforest groups we also observed convergent and population-specific signatures of positive selection in pathways related to cardiac development (e.g. 'cardiac muscle tissue development'; p=0.003). We hypothesize that the growth hormone sub-responsiveness likely underlying the pygmy phenotype may have led to compensatory changes in cardiac pathways, in which this hormone also plays an essential role. Importantly, we did not observe similar patterns of positive selection on sets of genes associated with either growth or cardiac development in the agriculturalist populations, indicating that our results most likely reflect a history of convergent adaptation to the similar ecology of rainforest hunter-gatherers rather than a more common or general evolutionary pattern for human populations.

A minor note: there is some ethnographic data that the isolated Sentinelese are not as small as the other Andaman Islanders. Some of their small size may simply be due to exposure to diseases and the stress of settlers from the mainland.

The genome of “Cheddar Man” is about to be published

If you are American you have probably heard about “Cheddar Man” in Bryan Sykes’ Seven Daughters of Eve. If you don’t know, Cheddar Man is a Mesolithic individual from prehistoric Britain, dating to 9,150 years before the present. Sykes’ DNA analysis concluded that he was mtDNA haplogroup U5, which is found in ~10% of modern Europeans, and which ancient DNA has found to be overwhelmingly dominant among European hunter-gatherers. But for years there has been controversy as to whether this result was contamination (after all, if it’s found in ~10% of modern Europeans it wouldn’t be surprising if the DNA was contaminated).

Today that is a moot point. On February 18th Channel 4 in the UK will premiere a documentary that seems to indicate genomic analysis of Cheddar Man’s remains has been performed, and he turns out to be exactly what we would have expected. That is, he’s a “Western Hunter-Gatherer” (WHG) with affinities to the remains from Belgium, Spain, and Central Europe. These WHG populations were themselves relatively recent arrivals in Pleistocene Europe, with connections to some populations in the Near East, and with unexplored minor genetic admixture from an East Asian population. Their total contribution to the ancestry of modern Europeans varies, with lower fractions in the south of the continent, and the highest in the northeast.

Overall, the consensus seems to be that in Western Europe the genuine descent from indigenous hunter-gatherers, passed down through admixture with Neolithic farmers and then the Corded Ware and Bell Beaker groups, is around 10%. This is the number that shows up in the press write-ups. But there are some researchers who contend it is far less than 10%, and that the fraction is a misattribution due to early admixture with relatives of these hunter-gatherers as steppe and farmer peoples were expanding.

Phylogenetics aside, one of the major headline aspects of the Cheddar Man is that reconstructions are now of a very dark-skinned and blue-eyed individual. Some of the more sensationalist press is declaring that the “first Britons were black!” As far as the depiction goes, this is literally true. The reconstruction is of a black-skinned individual in the sense we’d describe black-skinned.

But on one level it is entirely expected that this is what Cheddar Man would look like. The hunter-gatherers of Mesolithic Western Europe were genetically homogeneous. They seem to derive from a small founder population. And, on the pigmentation loci which make modern Europeans very distinctive vis-a-vis other populations, SLC24A5, SLC45A2 and HERC2-OCA2, they were quite different from anything we’ve encountered before. First, these peoples seem to have had a frequency for the genetic variants strongly implicated in blue eyes in modern Europeans close to what you find in the Baltic region. The overwhelming majority carried the derived variant, perhaps even in regions such as Spain, which today are mostly brown-eyed because of the frequency of the ancestral variant. Second, these European hunter-gatherers tended to lack the genetic variants at SLC24A5 and SLC45A2 correlated with lighter skin, which today in Europeans are found at frequencies of ~100% and 80% to 95% respectively.

The reason that one of the scientists being interviewed stated that there was a “76 percent probability that Cheddar Man had blue eyes” is that they used something like IrisPlex. They put in the genetic variants and popped out a probability. The problem is that the training set here is modern groups, which may have a very different genetic architecture than ancient populations. Recent work on Africans and East Asians indicates that the focus on European populations when it comes to pigmentation genetics has left huge lacunae in our understanding of common variants which affect variation in outcome.
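To make the “put in the variants, pop out a probability” step concrete, here is a minimal sketch of the kind of multinomial logistic predictor that tools in this family are typically built on. Everything numeric in it is an assumption for illustration: the two SNP names, the intercepts, and the weights are invented and are not the published IrisPlex parameters. The point is only that the output is a function of coefficients fit on a training population, so a different genetic architecture in the real population would make the number misleading.

```python
import math

# Hypothetical parameters, invented for illustration only.
# Genotypes are coded as the count (0, 1, or 2) of a chosen allele at each SNP.
# "brown" is the reference class, so its score is fixed at 0.
INTERCEPTS = {"blue": -3.0, "intermediate": -2.0}
WEIGHTS = {
    "blue": {"snp_a": 2.5, "snp_b": 0.4},
    "intermediate": {"snp_a": 1.0, "snp_b": 0.3},
}


def eye_color_probabilities(genotype):
    """Return a dict of class probabilities from a multinomial logistic model."""
    scores = {"brown": 0.0}
    for color, weights in WEIGHTS.items():
        scores[color] = INTERCEPTS[color] + sum(
            weight * genotype[snp] for snp, weight in weights.items()
        )
    # Softmax: exponentiate each score and normalize so the results sum to 1.
    total = sum(math.exp(score) for score in scores.values())
    return {color: math.exp(score) / total for color, score in scores.items()}


if __name__ == "__main__":
    # A made-up individual homozygous for the counted allele at both SNPs.
    print(eye_color_probabilities({"snp_a": 2, "snp_b": 2}))
```

Swapping in a different set of weights changes the reported probability for the same genotype, which is why the choice of training population matters so much here.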

East Asians, for example, lack both the derived variants of SLC24A5 and SLC45A2 common in Europeans but are often quite light-skinned. A deeper analysis of the pigmentation architecture of WHG might lead us to conclude that they were an olive or light brown-skinned people. This is my suspicion because modern Arctic peoples are neither pale white nor dark brown, but of various shades of olive.

As far as blue eyes go, it is reasonable that these individuals had that eye color because that trait seems somewhat less polygenic than skin color. There are darker complected people with light eyes, from the famous “Afghan girl” to the first black American Miss America, Vanessa Williams. The homozygote of the derived HERC2-OCA2 variant seems relatively penetrant. From what I recall the literature indicates many people with blue eyes are not homozygotes on this locus for the derived haplotypes, but those who are homozygotes for the derived haplotypes invariably have blue eyes.

Addendum: It isn’t clear in the press pieces, but it looks like they got a high coverage genome sequence out of Cheddar Man. They refer to sequencing, and, they seem to have hit all the major pigmentation loci. This indicates reasonable coverage of the genome.

Understanding prehistory through genetic inference and ancient DNA

Before David Reich’s book, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, I highly recommend a new preprint from Pontus Skoglund and Iain Mathieson*, Ancient genomics: a new view into human prehistory and evolution.

It’s basically at the sweet spot for a lot of readers: it doesn’t overemphasize methods or archaeological minutiae that are hard to follow. That being said, I do think you would benefit if you read two things which would complement it in those directions: First Farmers: The Origins of Agricultural Societies, and Ancient Admixture in Human History.

* I have to say, I consider Iain a friend, but am I the only one a bit perplexed by how a British person can have such a difficult to spell version of his name? I always have to look it up!

Africa, the churning continent

Martin Meredith’s The Fortunes of Africa glosses very quickly over one of the major reasons that the “great scramble” for the continent occurred in the late 19th century, the discovery of the usefulness of quinine as an anti-malarial agent. Perhaps because I’ve read Plagues and Peoples and The Retreat of the Elephants: An Environmental History of China, I have always been conscious of the role of disease in discouraging conquest and migration (malaria in Italy was also a way to limit the extent of long-term occupation).

The coastal regions of Africa had been subject to the trade and depredations of European actors for nearly 400 years when the Berlin Conference partitioned the continent amongst European powers. Despite the fact that much of the interior was not charted, there had long been a colonial presence. Accra, the modern capital of Ghana, was originally a 16th-century Portuguese fort, but for several centuries between the 17th and 19th centuries, it was actually a possession of Scandinavian powers, Sweden and Denmark! (It then passed on to the British.)

For all these centuries the heart of Africa was unknown to Europeans, in part because there were native powers blocking their way, but also because the mortality rates were so high for outsiders, as indicated above. It is no surprise that the main European settlement in Africa which was more than a simple trading fort was at the southern tip of the continent, where the climate was Mediterranean and so the disease burden low.

But once quinine, and machine guns, came into the equation the interior was accessible. It all happened rather quickly in a few decades, though in some cases European ‘colonialism’ involved little more than nominal allegiance of tribal chieftains.

Now, a new paper in Cell may herald the beginning of a great genomic scramble to understand the history of Africa. Carl Zimmer in The New York Times has a piece up, Clues to Africa’s Mysterious Past Found in Ancient Skeletons. It begins:

It was only two years ago that researchers found the first ancient human genome in Africa: a skeleton in a cave in Ethiopia yielded DNA that turned out to be 4,500 years old.

On Thursday, an international team of scientists reported that they had recovered far older genes from bone fragments in Malawi dating back 8,100 years. The researchers also retrieved DNA from 15 other ancient people in eastern and southern Africa, and compared the genes to those of living Africans.

The general results of the paper, Skoglund et al.’s Reconstructing Prehistoric African Population Structure, were presented at the SMBE meeting this summer. So in broad sketches I was not surprised, though the details require some digging into.

The Bantu Expansion repatterned the population structure of Africa

Between 1000 BC and 500 AD the expansion of iron-wielding agriculturalists from the environs of modern-day southern Cameroon reshaped the cultural and genetic landscape of Sub-Saharan Africa. The relatively late date of this expansion should give us a general sense of how careful we need to be about making assertions about “prehistoric Africa.” When Egypt’s New Kingdom was expanding southward along the Nile and into the Levant, Sub-Saharan Africa was qualitatively very different from what we see today in both culture and genetic structure. The continent’s contemporary human geography does not have a deep time depth.

In any case, anyone who has worked with genetic data from Africa is struck by how similar Bantu-speaking populations are genetically. So these results are not world-shaking. South African Zulus occupy positions far closer to Kenyans and Congolese than they do to Khoisan peoples to the west of them facing the Kalahari. The Xhosa people on the cultural frontier of the Bantus in South Africa exhibit substantial admixture from Khoisan (to the point where they have even integrated clicks into their language!), but even they are preponderantly non-Khoisan.

By sampling ancient genomes across a geographical transect which runs up the Rift Valley to Ethiopia, Skoglund et al. show that before the Bantu Expansion there was a north-south genetic relatedness cline. When this result was presented at SMBE a few friends were quite excited that they were being presented a cline, as some researchers have felt that this particular lab group has a tendency to model everything as pulse admixtures between distinct ancestral populations. But the reasonably deep time transect in Malawi exhibited no variance in admixture fractions, which is indicative of the likelihood that its “mixed” status at a particular K cluster is simply an artifact (see this post for what’s going on).
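The logic behind reading “no variance in admixture fractions” as evidence against a recent pulse can be sketched with a toy simulation. It assumes a single pulse into a randomly mating population and treats each child’s ancestry fraction as the simple average of two randomly drawn parents. That deliberately ignores the extra variance that comes from inheriting ancestry in finite chromosomal segments, so real decay is slower than shown here. The population size, pulse fraction, and function name are all hypothetical choices.

```python
import random
import statistics


def simulate_pulse(n=2000, pulse_fraction=0.3, generations=10, seed=1):
    """Track the variance of individual ancestry fractions after one pulse.

    Generation 0 is unmixed: migrants carry ancestry 1.0, residents 0.0.
    Each later generation is built by averaging two randomly chosen parents.
    """
    rng = random.Random(seed)
    n_migrants = int(n * pulse_fraction)
    population = [1.0] * n_migrants + [0.0] * (n - n_migrants)
    variances = [statistics.pvariance(population)]
    for _ in range(generations):
        population = [
            (rng.choice(population) + rng.choice(population)) / 2
            for _ in range(n)
        ]
        variances.append(statistics.pvariance(population))
    return variances


if __name__ == "__main__":
    for generation, variance in enumerate(simulate_pulse()):
        print(generation, round(variance, 5))
```

Under these assumptions the among-individual variance drops by roughly half each generation, so a population sampled soon after a pulse should show a wide spread of admixture fractions, while a long-standing cline or old mixture should show almost none.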

One particular aspect of the results from Malawi is that they found no continuity between contemporary populations, Bantu agriculturalists, and these ancient hunter-gatherers. That is, hunter-gatherers were replaced in toto. This is not entirely surprising, as many researchers who have worked with European ancient DNA believe that hunter-gatherers in many areas left no descendants at all as well (the “hunter-gatherer” fractions in modern groups in a particular region are believed to be due to migration of mixed populations who obtained “hunter-gatherer” ancestry at another locale).

But the Bantus were not the first “intrusive” population

These results also have some moderate surprises. A Tanzanian sample from 1100 BC from a pastoralist context exhibits an ancestral mix which is Sub-Saharan African and West Eurasian/North African. More precisely, about 38 percent of this individual’s ancestry resembles that of the Pre-Pottery Neolithic culture of the Levant, and the rest of the genome most resembles a 4500 year old sample from Ethiopia.
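A statement like “about 38 percent of this individual’s ancestry resembles population A and the rest resembles population B” can be read as a two-source mixture model. The sketch below is a simplified stand-in, not the method used in the paper (published analyses of this kind generally rely on f-statistics against a set of outgroups). It assumes the target’s allele frequencies are a weighted average of two source populations and solves for the weight by least squares. The frequencies are made up, and the function name is hypothetical.

```python
def estimate_mixture_fraction(target, source_a, source_b):
    """Least-squares estimate of alpha in: target = alpha*A + (1 - alpha)*B.

    Each argument is a list of allele frequencies at the same SNPs.
    """
    numerator = sum(
        (t - b) * (a - b) for t, a, b in zip(target, source_a, source_b)
    )
    denominator = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if denominator == 0:
        raise ValueError("sources are identical at every SNP; alpha is undefined")
    return numerator / denominator


if __name__ == "__main__":
    # Invented allele frequencies for two hypothetical source populations.
    source_a = [0.1, 0.8, 0.5, 0.3]
    source_b = [0.6, 0.2, 0.9, 0.4]
    # Build a synthetic target that is 38% source A by construction.
    target = [0.38 * a + 0.62 * b for a, b in zip(source_a, source_b)]
    print(estimate_mixture_fraction(target, source_a, source_b))
```

With real data the target is a single individual with genotypes rather than frequencies, and the sources are themselves only proxies for the true ancestral populations, which is why the text says the ancestry “resembles” the Levantine and Ethiopian samples.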

This date is before the initiation of the Bantu Expansion. The genetic results in this work, and earlier publications, strongly point to the likelihood that this population (or populations) mediated the spread of pastoralism to the south and west. In particular, all Khoisan groups of southern Africa seem to have admixture from this group, more (Khoi) or less (San).

But a curious aspect of this result is that these early pastoralists do not carry any evidence of admixture from ancient eastern farmers from the Zagros region. That is, the West Eurasian gene flow into the Tanzanian pastoralists predates the great exchange/admixture in the Middle East between western and eastern lineages. Since that reciprocal gene flow seems to have occurred at least 2,000 years before the Tanzanian pastoralist’s time, it suggests that this West Eurasian element was in Africa for thousands of years.

The second important point to emphasize is that the Iranian-like component is found among Cushitic speaking Somali and Afar samples, at 15-20% clips. Looking at the supporting tables a wide range of East African populations have the Tanzanian pastoralist ancestry but do not show evidence of the Iranian-like ancestry, which is now ubiquitous in the Middle East, and presumably in the highlands of Ethiopia as well (which usually show somewhat higher levels of Eurasian ancestry than is the case on the coast, especially among Semitic language speakers).

This fact is important because many of the Nilotic peoples are reputed to have absorbed Cushitic groups relatively recently in the past. This is also true for Bantu speaking groups according to these and other data. Finally, the Sandawe, who speak a language with clicks, and so may have some affinity to Khoisan, are often stated to have Cushitic affinities (looking at the data they clearly have West Eurasian ancestry). But their Eurasian ancestry seems to lack the Iranian-like component as well.

None of the populations with putative Cushitic ancestry, but who lack Iranian-like ancestry, speak a Cushitic language (most speak Nilotic languages, but East African Bantus have mixed with these Nilotic groups, so they have the same ancestry). Therefore I wonder if these pastoralists spoke an Afro-Asiatic language in the first place.

A patchy landscape

The phylogenetic tree illustrates the relationships of various African populations without much recent Eurasian ancestry. In The New York Times article David Reich indicates that the Hadza people of Tanzania are the closest Sub-Saharan Africans to the lineage ancestral to non-Africans. This is actually a simplification of what you see in the paper, and is illustrated in the tree to the left. The 4500 year old Ethiopian sample, which does not have Eurasian ancestry, nevertheless is the closest of all Sub-Saharan groups to Eurasians. The Hadza have the highest fraction of this ancestral component of all Sub-Saharan Africans in their data set, but many other populations also carry this ancestry (the Tanzanian pastoralist combined the PPN ancestry with this element).

This was a patchy landscape of inhabitation, because though the Tanzanian pastoralist ancestry, a combination of PPN and proto-Ethiopian, spread all the way to the Cape, there were populations, such as the Hadza and a 400 year old individual sampled from the island of Pemba, off the Tanzanian coast, which lacked this genetic variation. Indeed, they are also not on the north-south (proto-Ethiopian to Khoisan) cline that featured so prominently above.

The sampling of ancient individuals is not very dense yet, so we can’t say much. But I think it does indicate we need to be cautious about assuming that gene flow dynamics run as-the-crow-flies, simply a function of distance. Ecological suitability no doubt plays a strong role in how populations expand. The Bantus, for example, were stopped in South Africa by the fact that their agricultural toolkit was not suitable for the western half of the country. So when Europeans arrived in the 16th century the residents of the Cape were Khoi pastoralists.

The presence of the Hadza in Tanzania, or an individual of unmixed proto-Ethiopian ancestry on Pemba 400 years ago, indicates that the ethnic geography of East Africa has long been fluid and dynamic. There is no reason to suppose that the Hadza are not themselves migrants from further north, perhaps easily explaining why they are not on the north-south cline so evident from the ancient DNA.

The rise of Basal Humans

Several years ago researchers discovered that the first farmers of Europe, who descended from an Anatolian population, were in part derived from a group which split off very early from other Eurasian populations. This group was termed “Basal Eurasian” (BEu) because it was an outgroup to all other Eurasians, including European hunter-gatherers, East Asians, Oceanians, and the natives of the New World. Subsequent work has shown that the early Neolithic farmers of the Near East, whether they’re from the Levant or the Zagros, had about half their ancestry from this population.

No ancient genomes which are predominantly BEu have been discovered yet. The fact that populations on the cusp of the Holocene seem to have Basal Eurasian ancestry across the Middle East suggests that the admixture with hunter-gatherers related to those of Europe must have occurred during the Pleistocene. But Basal Eurasian is arguably the most parsimonious explanation of the shared drift patterns that we see.

Skoglund et al. suggest that there may be the necessity of a similar construct in Africa. They are not the first: Schlebusch et al. also suggested the necessity of this lineage in the supplements of their preprint on ancient South Africans. Within Skoglund et al. the authors see variation between the far West African Mende and the eastern West African Yoruba, where the latter exhibits closer affinity to East African populations than the former (this includes those such as the proto-Ethiopian with no Eurasian admixture). Additionally, the authors found that Khoisan groups share more alleles with populations in East Africa than they do with those in West Africa even when you account for admixture.
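Claims of the form “X shares more alleles with Y than with Z” are usually tested with an f4-statistic, which averages the product of two allele frequency differences across SNPs. The sketch below shows the bare calculation under stated assumptions: the four populations and their frequencies are invented toy values, and it omits the block jackknife that real analyses use to attach a standard error to the estimate.

```python
def f4(pop_a, pop_b, pop_c, pop_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies at the same SNPs.
    A positive value means B shares excess drift with D (or A with C);
    a value near zero is what a clean tree ((A, B), (C, D)) predicts.
    """
    products = [
        (a - b) * (c - d) for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)
    ]
    return sum(products) / len(products)


if __name__ == "__main__":
    # Toy frequencies, made up so that the "southern" population deviates
    # from the outgroup in the same direction as the "eastern" one.
    outgroup = [0.5, 0.5, 0.5, 0.5]
    southern = [0.7, 0.3, 0.6, 0.4]
    western = [0.5, 0.5, 0.5, 0.5]
    eastern = [0.6, 0.4, 0.55, 0.45]
    print(f4(outgroup, southern, western, eastern))
```

A consistently positive value in a test shaped like this is the kind of signal that then needs an explanation, whether that is gene flow scaled by distance or an early-splitting lineage contributing to one side.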

One model that can explain this variation is long range gene flow, so that there would be connections between various regions as a function of their distance. Another explanation is that West African populations are the product of a Basal Human (BHu) population which separated first, before the bifurcation of Khoisan from other human populations. This would reorder our understanding of who the most basal humans are. Additionally, it would align with long-standing work on deep lineages within Africa contributing a minor component of the continent’s ancestry.

As should be clear due to the tree above, BHu postdates the separation of African humans from Neanderthals. One does wonder about the relevance of the Moroccan “modern” human to these models.

Understanding culture from genetics and genetics from culture

The spread of the Bantus over 1500 years from one end of the continent to the other is perhaps one of the most important dynamics we can use to understand the spread of farming more generally. The linguistic unity of the Bantus, or at least their affinity, suggests to us that the first farmers of Europe, who spread across much of the continent in 2500 years, probably exhibited the same pattern. The low levels of gene flow between hunter-gatherers and farmers, despite living in the same regions for thousands of years, can be illustrated with African examples (e.g., the Hadza vs. their Bantu neighbors).

We are rather in the early phase of understanding these dynamics. There are more remains to be found, perhaps in the dry fastness of the Sahara or Sahel (though unfortunately political considerations may prevent excavation due to danger to archaeologists). The genetics will give us a general idea about the nature of genetic variation and how it arose, but robust cultural models also need to be developed which illustrate how these genetic patterns arose.

Citation: Reconstructing Prehistoric African Population Structure, Skoglund, Pontus et al. Cell , Volume 171 , Issue 1 , 59 – 71.e21