The Genetic History of the Middle East: into Arabia

A massive new preprint on the Middle East is out. I’ve edited the first figure to give people a general sense of the broad results and populations sampled. First, you have to know that these are high-quality modern samples: 137 individuals at 30x whole-genome coverage. In other words, basically the best sequence data you can get. No need to futz around with subsets of the data. This matters because the 1000 Genomes Project doesn’t have a Middle Eastern population, so when looking to assemble variants there was a deficit in this domain. Even the whole-genome sequencing (WGS) of the HGDP was not totally sufficient, since its Middle Eastern populations were not Arabian.

The populations here are sampled from both the classical “Fertile Crescent” and various points within the Arabian peninsula. At the end of the preprint, they do some analysis on selection, which I won’t dwell on. The most interesting thing is that they confirm that Arabian people have a unique lactase persistence allele that seems to have been selected very recently, just like in Europeans. A lot of the selection analysis either replicates what you would find elsewhere or lacks the power to detect polygenic selection (though they did detect selection on EDU).

The big finding to me is that this work confirms that there is a north-south cline in the Near East defined by a deep population structure. The admixture graph to the right captures the main features using Lebanese and Emiratis as the two extreme populations, but as you can see in the admixture plot above the cline really runs from the Caucasus to southern Arabia. If you analyze these populations, one thing you will see is that Fertile Crescent populations, such as Druze, often seem more like Armenians and Georgians than like South Arabians. Why is this? After all, South Arabians and Fertile Crescent populations both speak Semitic languages.

I think the issues here are multiple. First, there is recent admixture that obscures some of the deeper relationships. This is clear insofar as most Arab Muslim populations have Sub-Saharan African admixture. This is historically attested, and physically visible. The variation and range are quite high, in part due to spatial heterogeneity of slavery (e.g., more African slaves in lowlands than highlands), and the recency of the admixture producing variation due to incomplete mixing (the dates are usually 1000 A.D. and later).

But this is not the only admixture. All of the Fertile Crescent populations, along with groups to the north, have much more steppe drift than those to the south in Arabia. The exact fractions don’t matter: it’s not much, but it’s not trivial, and it’s always higher than among the Arabians. Additionally, this element is new to the region, in relative terms. You can see the contribution in modern Lebanese in comparison to the Bronze Age Sidon samples, which date to 1800 BC. The source could be continuous gene flow during the Roman and Byzantine period, or even later. Or, it could be Indo-European migrations.

We know that Indo-Iranian peoples were present in Upper Mesopotamia. The Mitanni Kingdom, which had Indo-Aryan affinities, shows up after 1750 BC. The Hittites, the Nesa, show up to the north in Anatolia a bit earlier. Interestingly, the Hittites spoke an Indo-European language that is often considered basal (the outgroup) to most of the others. Armenian, which emerges later in eastern Anatolia, is also quite distinct, just as Greek to the west is. In contrast, there is a lot of suggestive evidence of either genealogical or geographical connectedness between the ancestors of the Indo-Iranian and Slavic language families.

The presence of these two very distinct ancestral components, steppe and Sub-Saharan African, on top of the ancient Near Eastern base produces distinctions in the modern populations which obscure some of the deeper strands. In the late 2000s, when researchers and bloggers began running admixture analyses on Ethiopians, it was clear that this population was a mix between “West Eurasian” and an African component which wasn’t Bantu. The West Eurasian donor population was often Yemeni, in particular Yemeni Jews. Later on, using more sophisticated methods, some models suggested greater affinity in Ethiopian genomes to Levantine populations than to Yemenis. What was going on?

We now know. It is quite clear Ethiopian populations lack steppe ancestry. In the earlier Bronze Age, and definitely in the Neolithic, Levantines lacked steppe ancestry. In fact, the Neolithic Levantines usually lacked “Iranian” ancestry. The West Eurasian ancestry in Northeast Africans, on the whole, is enriched for a Levantine ancestry quite similar to Natufian. Modern-day South Arabians are the closest to this population mix, even if they are not descended from ancient Levantines. They lack steppe.

Modern-day South Arabians in fact descend in part from indigenous hunter-gatherers, who were a sister clade to the ancestors of Natufians. The admixture graph makes that clear: the Emiratis with the least African ancestry have half their ancestry from this group. In the book Arabs, the author discusses at length various Yemeni legends of a fusion between distinct peoples on the edge of history. These could be recollections of the merger of indigenous Neolithic Arabians and peoples who expanded from the north.

The analyses of these samples confirm and reiterate what has been found with ancient DNA: at some point late in the Neolithic and early in the Bronze Age a massive admixture event occurred in the Fertile Crescent which brought a considerable amount of “Iranian” ancestry into the region (these ancient people are not like modern Iranians; in particular, they lacked steppe ancestry which is copious in much of Iran, particularly the east). This ancestry pushed south and westward so that ~50% of the ancestry of Arabians seems to be Iranian. That being said, I have some qualms here:

We explored whether this ancestry penetrated both the Levant and Arabia at the same time, and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,900 and 5,600 ya (Table S3), followed by admixture in Egypt (2,900-4,700 ya), East Africa (2,200-3,300) and Arabia (2,000-3,800). These times overlap with the dates for the Bronze Age origin and spread of Semitic languages in the Middle East and East Africa estimated from lexical data (Kitchen et al., 2009; Figure S8). This population potentially introduced the Y-chromosome haplogroup J1 into the region (Chiaroni et al., 2010; Lazaridis et al., 2016). The majority of the J1 haplogroup chromosomes in our dataset coalesce around ~5.6 [95% CI, 4.8-6.5] kya, agreeing with a potential Bronze Age expansion; however, we do find rarer earlier diverged lineages coalescing ~17 kya (Figure S9). The haplogroup common in Natufians, E1b1b, is also frequent in our dataset, with most lineages coalescing ~8.3 [7-9.7] kya, though we also find a rare deeply divergent Y-chromosome which coalesces 39 kya (Figure S9).

Some of these dates are hard to credit. For example, I obtain a midpoint estimate of Iranian admixture into Egypt around 1836 BC!
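To make the arithmetic behind this kind of sanity check explicit, here is a minimal Python sketch that takes the "years ago" intervals quoted from the preprint above and converts each midpoint into an approximate calendar year. The reference year of 2020 and the helper names (`midpoint`, `ya_to_calendar`) are my own assumptions for illustration, not anything the preprint specifies. The exact year depends on that reference year and on whether you use an interval midpoint or a point estimate, so expect the output to land in the same general range as the figure above rather than match it digit for digit.

```python
# Admixture date intervals in years ago (ya), as quoted from the preprint above.
INTERVALS_YA = {
    "Levant": (3900, 5600),
    "Egypt": (2900, 4700),
    "East Africa": (2200, 3300),
    "Arabia": (2000, 3800),
}

# Assumption: "years ago" is counted back from roughly 2020.
REFERENCE_YEAR = 2020


def midpoint(interval):
    """Return the midpoint of a (low, high) interval."""
    low, high = interval
    return (low + high) / 2


def ya_to_calendar(years_ago, reference_year=REFERENCE_YEAR):
    """Convert years-ago to a rough 'N BC' / 'N AD' label.

    This ignores the missing year 0, which is far below the
    precision of any of these estimates.
    """
    year = reference_year - years_ago
    if year > 0:
        return f"{year:.0f} AD"
    return f"{-year:.0f} BC"


for region, interval in INTERVALS_YA.items():
    mid = midpoint(interval)
    print(f"{region}: midpoint {mid:.0f} ya, about {ya_to_calendar(mid)}")
```

The point is simply that every one of these midpoints falls well inside recorded or near-recorded history, which is what makes the dates hard to credit.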

The fraction of Iranian ancestry is substantial. The admixture model in the supplements gives this for Egyptians: 45% Levant_N, 32% Iran_N, 8% EHG (Eastern European Hunter-Gatherer), and 15% Mota (African). The older date is 2700 BC. The oldest Egyptian writing dates to 2700 BC, but proto-hieroglyphs are 500 years older. The authors talk about Semitic languages, and ancient Egyptian is not Semitic. So it could be a minority population mixed into the Egyptians, but this is a massive event that we don’t have records of. In fact, the authors claim that it went into much of Northeast Africa at a relatively late date.
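As a quick check on how substantial that fraction is, the sketch below takes the four proportions quoted above, confirms they sum to 100%, and renormalizes them after setting aside the Mota (African) component. The renormalization is my own illustrative step, not something the preprint does, and the function name `renormalize_without` is hypothetical.

```python
# Ancestry proportions (percent) for Egyptians, as quoted above.
EGYPT_MODEL = {"Levant_N": 45, "Iran_N": 32, "EHG": 8, "Mota": 15}


def renormalize_without(model, excluded):
    """Drop one component and rescale the rest to sum to 100 percent."""
    kept = {name: value for name, value in model.items() if name != excluded}
    total = sum(kept.values())
    return {name: 100 * value / total for name, value in kept.items()}


# The quoted proportions should account for all of the ancestry.
assert sum(EGYPT_MODEL.values()) == 100

# Illustrative assumption: look only at the non-African part of the model.
non_african = renormalize_without(EGYPT_MODEL, "Mota")
for name, share in non_african.items():
    print(f"{name}: {share:.1f}% of non-African ancestry")
```

On these numbers, Iran_N makes up more than a third of the non-African ancestry in the model, which is not the signature of a small elite minority.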

Additionally, the values for the Levant seem too recent as well. That being said, there was a pre-Sumerian civilization, the Uruk Civilization, which spread broadly from Mesopotamia between 4000 and 3000 BC. This is 6000 to 5000 years ago. The midpoint of this is 5500 years ago, while the midpoint of the admixture into the Syrians, who were on the edge of the Uruk Civilization, is 3800 years ago. Basically, I think the evidence points to various statistical genomic artifacts making the admixture look younger than it truly was (this has long been a problem in this field).

I honestly have no idea how to relate the expansion of Semitic languages to the expansion of Iranian ancestry. My friend Patrick Wyman believes that Anatolian farmers spoke Afro-Asiatic. These were very different people from the Iranians, who arrived from the east later. Additionally, history teaches us that Mesopotamia during the Bronze Age was very linguistically diverse. The Sumerians were not Semitic, and neither were their Elamite neighbors in Khuzistan. The Akkadians, who were more prevalent in the north of Mesopotamia but were present from the beginning of Sumerian history, were Semitic.

There is still a mystery around the great admixture between Neolithic Near Easterners of the west and the east. I don’t think we’ve closed that chapter of the book.

That being said, there is a lot that is “solved” in this paper. For example, these authors seem to confirm that there is no evidence of “first wave” modern humans in Arabian populations earlier than the non-African radiation. Arabians, like other non-Africans, underwent a population expansion 50-70,000 years ago. Their separation from Mbuti Pygmies was gradual up until 120,000 years ago. Then there seems to have been a clean split. What this is telling us, I believe, is that the ancestors of non-Africans were part of the African meta-population until 120,000 years ago. This is suspiciously close to the Eemian Interglacial, which dates to between 115,000 and 130,000 years ago. The Eemian was characterized by a “Green Sahara”, so it seems that this is when early modern humans ventured in substantial numbers out of the continent and to its peripheries. One issue that seems notable in the data is that proto-non-Africans seem to have been characterized by a period of isolation and small population size. Perhaps

But 50-70,000 years ago a massive expansion of one of these daughter populations occurred. These data confirm that Arabians seem to carry the same Neanderthal admixture as everyone else but, even after accounting for Sub-Saharan African ancestry, somewhat less of it. In alignment with earlier research, they argue that this is due to admixture with “Basal Eurasian” populations which did not mix with Neanderthals ~55,000 years ago. Or, more precisely, did not carry as much Neanderthal ancestry (it seems plausible that the Basal Eurasian populations are themselves a compound of conventional non-African at the base of the broader splits, and a deeper basal group which lacks Neanderthal ancestry).

Going back to the admixture graph, you notice that both western and eastern farmer populations are a compound of Basal Eurasian and various lineages that are broadly “West Eurasian.” Natufians and Anatolian farmers are descended about half from groups related to European hunter-gatherers, while ancient Neolithic Iranians had ancestry related to these people, but even more from populations distantly related to Ancient North Eurasians (Paleo-Siberians). The events here are distant, but the similar proportion of Basal Eurasian ancestry indicates to me a rapidly expanding population at some point which mixed with a well-structured set of groups in the Near East.

The major takeaways

  • Near Easterners are part of the same broad diversification as all other non-Africans
  • The expansion of these non-Africans dates to 50-70,000 years ago
  • Archaeological evidence points to a very intense expansion in the period around ~50,000 years ago, and admixture with Neanderthals somewhat before then
  • At the beginning of the Holocene Near Easterners were deeply structured regionally, and had threaded together disparate ancestral components (Basal Eurasian, related to European hunter-gatherer and Paleo-Siberian)
  • Late in the Neolithic and early Bronze Age much of this structure collapsed, and there was a massive admixture of Iranian ancestry to the south and west (conversely, there is evidence in other work of admixture of western farmer ancestry to the east)
  • Finally, there is evidence for later incursions of steppe people into the northern Arabian fringe and Fertile Crescent
  • On top of this, there is historical admixture from Africans and, in the north, from Turks and other groups