
The Genetic History of the Middle East: into Arabia

A new massive preprint on the Middle East is out. I’ve edited the first figure to give people a general sense of the broad results and populations sampled. First, you have to know that these are high-quality modern samples. 137 individuals at 30x whole genome coverage.  In other words, basically the best genomic data you can get on sequences. No need to futz around with subsets of the data. This is important and needful because the 1000 Genomes doesn’t have a Middle Eastern population. So when looking to assemble variants there was a deficit in this domain. Even the WGS of the HGDP was not totally sufficient, since the Middle Eastern populations were not Arabian.

The populations here are sampled from both the classical “Fertile Crescent” and various points within the Arabian peninsula. At the end of the preprint, they do some analysis on selection, which I won’t talk about. The most interesting thing is that they confirm that Arabian people have a unique lactase persistence allele that seems to have been selected very recently, just like in Europeans. A lot of the selection analysis seems either to replicate what you would find elsewhere or to lack the power to detect polygenic selection (though they did detect selection on EDU).

The big finding to me is that this work confirms that there is a north-south cline in the Near East defined by a deep population structure. The admixture graph to the right captures the main features using Lebanese and Emiratis as the two extreme populations, but as you can see in the admixture plot above the cline really runs from the Caucasus to southern Arabia. If you analyze these populations one thing you will see is that Fertile Crescent populations, such as Druze, often seem more like Armenians and Georgians, than South Arabians. Why is this? After all, South Arabians and Fertile Crescent populations speak Semitic languages.

I think the issues here are multiple. First, there is recent admixture that obscures some of the deeper relationships. This is clear insofar as most Arab Muslim populations have Sub-Saharan African admixture. This is historically attested, and physically visible. The variation and range are quite high, in part due to spatial heterogeneity of slavery (e.g., more African slaves in lowlands than highlands), and the recency of the admixture producing variation due to incomplete mixing (the dates are usually 1000 A.D. and later).

But this is not the only admixture. All of the Fertile Crescent populations, along with groups to the north, have much more steppe drift than those to the south in Arabia. The details of the fractions don’t matter: it’s not much, but it’s not trivial, and it’s always higher than among the Arabians. Additionally, this element is new to the region, in relative terms. You can see the contribution in modern Lebanese in comparison to the Bronze Age Sidon samples, which date to 1800 BC. The source could be continuous gene flow during the Roman and Byzantine period, or even later. Or, it could also be Indo-European migrations.

We know that Indo-Iranian peoples were present in Upper Mesopotamia. The Mitanni Kingdom, which had Indo-Aryan affinities, shows up after 1750 BC. The Hittites, the Nesa, show up to the north in Anatolia a bit earlier. Interestingly, the Hittites spoke an Indo-European language that is often considered basal (the outgroup) to most of the others. Armenian, which emerges later in eastern Anatolia, is also quite distinct, just as Greek to the west is. In contrast, there is a lot of suggestive evidence of either genealogical or geographical connectedness between the ancestors of Indo-Iranian and Slavic language families.

The presence of these two very distinct ancestral components, steppe and Sub-Saharan African, on top of the ancient Near Eastern base produces distinctions in the modern populations which obscure some of the deeper strands. In the late 2000s when researchers and bloggers began running admixture analyses on Ethiopians it was clear that this population was a mix between “West Eurasian” and African which wasn’t Bantu. The West Eurasian donor population was often Yemeni, in particular Yemeni Jews. Later on, using more sophisticated methods some models suggested greater affinity in Ethiopian genomes to Levantine populations than Yemenis. What was going on?

We now know. It is quite clear Ethiopian populations lack steppe ancestry. In the earlier Bronze Age, and definitely in the Neolithic, Levantines lacked steppe ancestry. In fact, the Neolithic Levantines usually lacked “Iranian” ancestry. The West Eurasian ancestry in Northeast Africans, on the whole, is enriched for a Levantine ancestry quite similar to Natufian. Modern-day South Arabians are the closest to this population mix, even if they are not descended from ancient Levantines. They lack steppe.

Modern-day South Arabians in fact descend in part from indigenous hunter-gatherers, who were a sister clade to the ancestors of Natufians. The admixture graph makes that clear: the Emiratis with the least African ancestry have half their ancestry from this group. In the book Arabs, the author discusses at length various Yemeni legends of a fusion between distinct peoples on the edge of history. This could be recollections of the merger of indigenous Neolithic Arabians and peoples who expanded from the north.

The analyses of these samples confirm and reiterate what has been found with ancient DNA: at some point late in the Neolithic and early in the Bronze Age a massive admixture event occurred in the Fertile Crescent which brought a considerable amount of “Iranian” ancestry into the region (these ancient people are not like modern Iranians; in particular, they lacked steppe ancestry which is copious in much of Iran, particularly the east). This ancestry pushed south and westward so that ~50% of the ancestry of Arabians seems to be Iranian. That being said, I have some qualms here:

We explored whether this ancestry penetrated both the Levant and Arabia at the same time, and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,900 and 5,600 ya (Table S3), followed by admixture in Egypt (2,900-4,700 ya), East Africa (2,200-3,300) and Arabia (2,000-3,800). These times overlap with the dates for the Bronze Age origin and spread of Semitic languages in the Middle East and East Africa estimated from lexical data (Kitchen et al., 2009; Figure S8). This population potentially introduced the Y-chromosome haplogroup J1 into the region (Chiaroni et al., 2010; Lazaridis et al., 2016). The majority of the J1 haplogroup chromosomes in our dataset coalesce around ~5.6 [95% CI, 4.8-6.5] kya, agreeing with a potential Bronze Age expansion; however, we do find rarer earlier diverged lineages coalescing ~17 kya (Figure S9). The haplogroup common in Natufians, E1b1b, is also frequent in our dataset, with most lineages coalescing ~8.3 [7-9.7] kya, though we also find a rare deeply divergent Y-chromosome which coalesces 39 kya (Figure S9).

Some of these dates are hard to credit. For example, I obtain a midpoint estimate of Iranian admixture into Egypt around 1836 BC!

The fraction of Iranian ancestry is substantial. The admixture model in the supplements gives this for Egyptians: 45% Levant_N, 32% Iran_N, 8% EHG (Eastern European Hunter-Gatherer), and 15% Mota (African). The older date is 2700 BC. The oldest Egyptian writing dates to 2700 BC, but proto-hieroglyphs are 500 years older. The authors talk about Semitic languages, and ancient Egyptian is not Semitic. So it could be a minority population mixed into the Egyptians, but this is a massive event that we don’t have records of. In fact, the authors claim that it went into much of Northeast Africa at a relatively late date.

Additionally, the values for the Levant seem recent as well. That being said there was a pre-Sumerian civilization, the Uruk Civilization, which spread broadly from Mesopotamia between 4000 and 3000 BC. This is 6000 to 5000 years ago. The midpoint of this is 5500 years, while the midpoint of the admixture into the Syrians, who were on the edge of the Uruk Civilization, is 3800 years ago. Basically, I think the evidence points to various statistical genomic artifacts reducing the age from when the admixture truly occurred (this has long been a problem in this field).
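For what it’s worth, the arithmetic behind that comparison can be sketched in a few lines of Python. This is only a minimal sketch: the interval bounds come from the preprint passage quoted above, the Uruk range and the 3800-year Syrian midpoint are the rounded figures used in this post, and the helper names (`midpoint`, `gap_from_uruk`) are made up for illustration. Interval midpoints are not the same thing as the point estimates in Table S3, so they need not match other figures quoted here exactly.

```python
# Admixture-date intervals in years ago, copied from the quoted preprint passage.
INTERVALS_YA = {
    "Levant": (3900, 5600),
    "Egypt": (2900, 4700),
    "East Africa": (2200, 3300),
    "Arabia": (2000, 3800),
}

# Uruk expansion, using this post's rounded figures (5000 to 6000 years ago).
URUK_YA = (5000, 6000)

# Midpoint of admixture into the Syrians, as stated in this post (an input, not derived here).
SYRIAN_ADMIXTURE_YA = 3800


def midpoint(low, high):
    """Return the midpoint of an interval (hypothetical helper)."""
    return (low + high) / 2


def gap_from_uruk(admixture_ya):
    """Years between the Uruk midpoint and a given admixture date (hypothetical helper)."""
    return midpoint(*URUK_YA) - admixture_ya


midpoints = {region: midpoint(*bounds) for region, bounds in INTERVALS_YA.items()}

for region, mid in midpoints.items():
    print(f"{region}: ~{mid:.0f} years ago")

print(f"Uruk midpoint: ~{midpoint(*URUK_YA):.0f} years ago")
print(f"Gap between Uruk midpoint and Syrian admixture: {gap_from_uruk(SYRIAN_ADMIXTURE_YA):.0f} years")
```

The point of laying it out this way is just to make the size of the mismatch explicit: under these rounded inputs, the inferred Syrian admixture falls well over a millennium after the middle of the Uruk expansion.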

I honestly have no idea how to relate the expansion of Semitic languages to the expansion of Iranian ancestry. My friend Patrick Wyman believes that Anatolian farmers spoke Afro-Asiatic. These were very different people from the Iranians, who arrived from the east later. Additionally, history teaches us that Mesopotamia during the Bronze Age was very linguistically diverse. The Sumerians were not Semitic, and neither were their Elamite neighbors in Khuzistan. The Akkadians, who were more prevalent in the north of Mesopotamia, but were present from the beginning of Sumerian history, were Semitic.

There is still a mystery around the great admixture between Neolithic Near Easterners of the west and the east. I don’t think we’ve closed that chapter of the book.

That being said, there is a lot that is “solved” in this paper. For example, these authors seem to confirm that there is no evidence of “first wave” modern humans in Arabian populations earlier than the non-African radiation. Arabians, like other non-Africans, underwent a population expansion 50-70,000 years ago. Their separation from Mbuti Pygmies was gradual up until 120,000 years ago. Then there seems to have been a separation. What this is telling us, I believe, is that the ancestors of non-Africans were part of the African meta-population until 120,000 years ago. This is suspiciously close to the Eemian Interglacial, which dates to between 115,000 and 130,000 years ago. The Eemian was characterized by a “Green Sahara”, so it seems that this is when early modern humans ventured in substantial numbers out of the continent and to its peripheries. One issue that seems notable in the data is that proto-non-Africans seem to have been characterized by a period of isolation and small population size. Perhaps

But 50-70,000 years ago a massive expansion of one of these daughter populations occurred. These data confirm that Arabians seem to have the same Neanderthal admixture as everyone else, but, even accounting for Sub-Saharan African ancestry they also have somewhat less. In alignment with earlier research, they argue that this is due to admixture with “Basal Eurasian” populations which did not mix with Neanderthals ~55,000 years ago.  Or, more precisely, did not carry as much Neanderthal ancestry (it seems plausible that the Basal Eurasian populations are themselves a compound of conventional non-African at the base of the broader splits, and a deeper basal group which lacks Neanderthal ancestry).

Going back to the admixture graph, you notice that both western and eastern farmer populations are a compound of Basal Eurasian and various lineages that are broadly “West Eurasian.” Natufians and Anatolian farmers are descended about half from groups related to European hunter-gatherers, while ancient Neolithic Iranians had ancestry related to these people, but even more to populations distantly related to Ancient North Eurasians (Paleo-Siberians). The events here are distant, but the same proportion of Basal Eurasian ancestry indicates to me a rapidly expanding population at some point which mixed with a well-structured set of groups in the Near East.

The major takeaways

  • Near Easterners are part of the same broad diversification as all other non-Africans
  • The expansion of these non-Africans dates to 50-70,000 years ago
  • Archaeological evidence points to a very intense expansion in the period around ~50,000 years ago, and admixture with Neanderthals somewhat before then
  • At the beginning of the Holocene Near Easterners were deeply structured regionally, and had threaded together disparate ancestral components (Basal Eurasian, related to European hunter-gatherer and Paleo-Siberian)
  • Late in the Neolithic and early Bronze Age much of this structure collapsed, and there was a massive admixture of Iranian ancestry to the south and west (conversely, there is evidence in other work of admixture of western farmer ancestry to the east)
  • Finally, there is evidence for later incursions of steppe people into the northern Arabian fringe and Fertile Crescent
  • On top of this, there is historical admixture from Africans and, in the north, from Turks and other groups

17 thoughts on “The Genetic History of the Middle East: into Arabia”

  1. My friend Patrick Wyman believes that Anatolian farmers spoke Afro-Asiatic.

    Well it’s hardly too north for Afro-Asiatic. I think their homeland lies on the border of Egypt with the Levant. But maybe North Africa. What do you think?

  2. Looks messed up without dzudzuana and ane separately which probably forced them to use popovo as a near 50% source for Iran_n. And once again not acknowledging CHG? Is there like a copyright that they are afraid of violating?

  3. @DaThang, Dzudzuana sample probably a must for Basal Eurasian questions; the preprint showed that the sample produced overlapping BEu estimates to AfontovaGora3 and probably to the EHG. So questions about that. (Is BEu really a thing or just reflects composite of separate phenomena of geneflow with Northern African populations and some very splits within Upper Palaeolithic Western Eurasian class + admixture with early East Eurasian splits?)

    Whole genomes probably useful to explore this without confounds from arrays.

  4. It’s a minor quibble, but I really wish they sampled the non-Arabic South Arabian populations, including (non-SSA-admixed) Soqotri people. The latter group in particular is pretty phenotypically distinct (lots of non-SSA looking people with skin that would be dark even in South Asia). My presumption is they would be even more Natufian-shifted than Arabs, and possibly provide a decent model for the ancient South Arabian hunter-gatherers.

    Or, it could just be they’re so dark because they are considerably further south than Arabs, which meant that the non-derived pigmentation alleles were selected for, instead of selected against.

  5. Using western Iran neolithic + ANE + EHG as sources for eastern Iran mesolithic as a target shows eastern Iran mesolithic overwhelmingly prefers ANE over EHG. The reason why I did this is because the ANE in Iran (east and west) should come from a common source and the difference between east and west is mostly in the amount of the components. So if the difference prefers ANE over EHG, that should also be true regarding ANE vs EHG input in western Iran and by extension of this and the east Iran test: to all of Iran post upper paleolithic populations in general.

    So they really should have used AG3 instead of EHG as an input to Iran. Did I also mention that east Iran (and by extrapolation, ANE in Iran in general) prefers AG3 over MA1? Yeah, so why not just use AG3?

    This along with not differentiating Iran from CHG is strange at this point.

  6. Mitanni qua Mitanni shows up after the Hittite sack of Hammurabid Babylon. This is variously dated, likely early 1500s BC.
    The Hittites in that adventure were aided by a Hurrian king whom they do not identify as “Mitanni” nor with any of those IndoAryan names. After the sack, the Hittite “old” kingdom faltered; the Indic names start showing up in Hurrian tablets dating later.
    Egypt also notes a “Mitanni” later in the 1500s BC, during their New Kingdom. Maybe as late as Amarna.
    I think you were right earlier, in your Brown Pundits post about the indica bovines, that Mitanni is post 1600 BC.
    (PS. sorry if I sound negative, it’s just that when I agree I don’t like to clutter up your comments with #metoo)

  7. You and Spencer did a podcast a couple of years ago with another guy talking about South Arabia as home for early out of Africans. How does this study sync with that information?

  8. The linguistic analysis, as exemplified by Supplementary Materials map S8 is really bad.

    I’ve never seen any source other than this one put a proposed place of origin of the Semitic languages in the Zagros mountains. See, e.g. https://en.wikipedia.org/wiki/Proto-Semitic_language Genetics can inform linguistic analysis, but use some common sense people. If you are going to do any linguistic analysis at all, at least do some very basic background research.

  9. Funny enough, Genesis makes Abraham, patriarch of the Israelites, an easterner related to Elam, an ancient Iranian civilization preceding the Indo-Europeans. Not saying that its authors remembered an ancient Iranian migration- they’re probably making a case for shared descent with the Persians who allowed the post-exile scribes to return to Judea.

  10. @Razib The ancient pastoralists from Kenya/Tanzania (see Kenya_Pastoralists_IA) and groups like the Somali appear to lack Iranian ancestry, but groups from the Ethiopian highlands like the Amhara and Tigrinya who speak Semitic languages tend to have some in correlation to their partial Arabian ancestry. Please correct me if I am wrong. If we take that into account, these admixture dates don’t seem too far fetched. As for Egypt, it’s plausible that the genetic landscape of the country was shaped by the influx of people from the Levant who likely spoke Semitic languages, at least originally. There’s plenty of evidence from the historical and archeological record that Semitic speakers were present in the Delta and thereabouts. If these admixture dates are corroborated, it would suggest that these settlers left a demographic impact on the genetic landscape of Egypt. What do you think?

    Re: Iranian ancestry in the Middle East “We explored whether this ancestry penetrated both the Levant and Arabia at the same time, and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,900 and 5,600 ya (Table S3), followed by admixture in Egypt (2,900-4,700 ya), East Africa (2,200-3,300) and Arabia (2,000-3,800).“

    On a side note, I wish they had included Dzudzuana and Taforalt in this study, since they’re important to understanding the genesis of populations in the Middle East.

  11. Based on the supplementary documents, it actually looks like the Somali have some Iranian admixture within their Eurasian ancestry, but not nearly as much as their Ethio-Semitic neighbors, which doesn’t come as a surprise.

  12. yeah. pontus skoglund told me about the iranian in horn ppl coming later. the earliest pastoralists are PPN levant style in his telling.

    i think some of the african stuff could be late (last pulse).

    skeptical of egypt though. i know about levantines in egypt, but these are small numbers. nothing like the turnover we read about elsewhere (amorites), and even there it doesn’t look like a big change in the ancient DNA.

  13. Razib,

    The paper dates the divergence between Mbuti and Eurasians at 120,000 ybp using the MSMC2 program, and in Figure 2 the dotted line along the y-axis, the cross-coalescence rate, is at 0.50, which intersects with the 120k date of the Eurasian-Mbuti divergence on the X-axis. Does this mean essentially that by 120,000 years ago, 50% of the unique drift that defines Eurasians relative to Mbuti had already accumulated in a proto-Eurasian type population?

  14. There was also a massive admixture of Iranian ancestry to the steppe that formed and caused the steppe admixture to expand.
