Yemen and the Yemeni Jews


In my Substack post “Under pressure: the paradox of the diamond” I said this:

The implication of these DNA results is that Yemeni Jews are by and large descended from natives of this region of Arabia. They are converts, and their genetic uniqueness is a function of their isolation from demographic currents that swept across Arabia with the rise of Islam. The Yemenis of the highlands, isolated by geography, show the same genetic signature of isolation, as they descend solely from the original inhabitants of the region. This is the nth demonstration that culture and geography are both powerful factors driving genetic distinctiveness.

Some people objected, or inquired further, as to why I said this. From High-resolution inference of genetic relationships among Jewish populations:

Four Jewish populations included in the study—Ethiopian Jews, Indian Jews from Cochin, Indian Jews from Mumbai, and Yemenite Jews—are considered to be culturally distinct and not part of the Ashkenazi, Mizrahi, North African, or Sephardi groups; they are therefore not analyzed in sets…

…Figure 1b reveals a distinctive position for the Yemenite Jewish samples in relation to other Jewish populations…

…The resulting MDS plot (Fig. 1c) places the Yemenite Jews near Bedouin, Saudi Arabian, and Yemenite non-Jewish populations…

…Jewish populations have mixed membership in the two clusters, with the exception of the Yemenite Jews, who are placed primarily in the main cluster among Middle Eastern populations. For K = 3, the third cluster (dark blue) separates the Mozabite and Moroccan populations. Non-Jewish populations from the Levant generally have substantial membership in this cluster, as do North African and Yemenite Jews.

For K = 6, Yemenite Jews have relatively high membership in the new cluster, which also has substantial membership from Middle Eastern populations such as Bedouins and Saudi Arabians (pink)…

We further reduced the population set, exploring structure among Jewish populations, continuing to exclude Ethiopian and Indian Jews, and also excluding the relatively dissimilar Yemenite Jews (population set 4)…

You can look at the plots above. I also generated some of my own after adding the Vyas et al. Yemen samples (warning: the SNP intersection is only 7,000!). Judging from my own Fst, PCA, and TreeMix results, I think it’s possible that the modern Yemenis aren’t related to ancient Yemenis, but Yemeni Jews clearly cluster with modern Arabian populations.
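For readers who want to try the PCA step themselves, here is a minimal Python sketch. It assumes the genotypes are already merged into a samples-by-SNPs matrix of 0/1/2 allele counts (for example, the ~7,000 SNPs shared with the Vyas et al. samples). The function name `genotype_pca` is hypothetical, the normalization only loosely follows smartpca (center each SNP, scale by sqrt(p(1 - p))), and this is not the pipeline behind the plots.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components of a genotype matrix.

    genotypes: array of shape (n_samples, n_snps) holding 0/1/2 allele counts.
    Each SNP is centred on its mean and scaled by sqrt(p * (1 - p)), loosely
    following smartpca; monomorphic SNPs are dropped to avoid dividing by zero.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                 # drop SNPs with no variation
    g, p = g[:, keep], p[keep]
    x = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    # Left singular vectors scaled by singular values are the PC scores
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: two populations with independent allele frequencies, 20 samples each
rng = np.random.default_rng(42)
n_snps = 7000
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
geno = np.vstack([rng.binomial(2, freq_a, size=(20, n_snps)),
                  rng.binomial(2, freq_b, size=(20, n_snps))])
scores = genotype_pca(geno)
print(scores[:20, 0].mean(), scores[20:, 0].mean())
```

With only a few thousand SNPs, strongly differentiated groups still separate cleanly on PC1, but finer structure among closely related Arabian populations will be noisier than with a denser panel.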

What does the three-population test say? You can look here, but Yemeni Jews don’t show a significant deviation from a three-population phylogeny when they’re an outgroup with the populations I have. That means with my particular model they’re probably best thought of as an ancient Arabian population without much gene flow from external sources (they don’t have much African admixture, unlike other Yemenis).
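As a rough illustration of what the three-population test measures, here is a minimal Python sketch. It assumes you already have per-SNP allele frequencies for the tested population C (here, Yemeni Jews) and two reference populations A and B, with SNPs in genomic order. It computes f3(C; A, B) as the mean of (c - a)(c - b). A significantly negative value (Z below about -3) indicates that C is admixed between lineages related to A and B, while a value that is not significantly negative is consistent with a simple tree. The function name `f3_stat` is hypothetical, the estimator omits the usual sample-size bias correction, and the equal-weight block jackknife is a simplification of the weighted block jackknife used in ADMIXTOOLS.

```python
import numpy as np


def f3_stat(c, a, b, n_blocks=20):
    """Naive f3(C; A, B) with a delete-one block-jackknife Z-score.

    c, a, b: per-SNP allele frequencies (same SNP order, sorted by position).
    Returns (estimate, z). No sample-size bias correction is applied, and
    blocks are near-equal chunks of SNPs rather than fixed genetic distances.
    """
    c, a, b = np.asarray(c), np.asarray(a), np.asarray(b)
    per_snp = (c - a) * (c - b)
    estimate = per_snp.mean()
    blocks = np.array_split(per_snp, n_blocks)
    total, n = per_snp.sum(), per_snp.size
    # Mean over all SNPs with one block left out, once per block
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    g = len(blocks)
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    return estimate, estimate / se


# Toy example: C is an exact 50/50 mixture of A and B, so f3 should be negative
rng = np.random.default_rng(0)
a = rng.uniform(0.05, 0.95, 5000)
b = rng.uniform(0.05, 0.95, 5000)
est, z = f3_stat(0.5 * a + 0.5 * b, a, b)
print(f"f3 = {est:.4f}, Z = {z:.1f}")
```

A population that has drifted on its own branch without admixture gives a positive f3, which is the “no significant deviation” case described above.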

If you want to see the alternative, please read Mitochondrial DNA reveals distinct evolutionary histories for Jewish populations in Yemen and Ethiopia. I’m not spending any more time on this.

Population Pairwise Fst on 250,000 SNPs

People routinely ask me about a place to find pairwise Fst values. I have a dataset with 250,000 SNPs and 200 populations, and a script using plink that generates pairwise differences across populations. Here are two files with the results:

A file with the Fst values between populations in rows

A file with the Fst values between populations as a matrix
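The values in those files came from a plink script. For anyone who wants to compute something comparable from allele frequencies, here is a minimal Python sketch. It swaps in Hudson’s Fst estimator in the form given by Bhatia et al. (2013), combined as a ratio of averages across SNPs. plink’s --fst implementation may use a different estimator, so the numbers will not match the files exactly. The function names and the input layout (a dict mapping population name to an allele-frequency array and a haplotype count) are hypothetical. The sketch returns both layouts offered above: one row per population pair, and a symmetric matrix.

```python
import itertools

import numpy as np


def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst between two populations, as a ratio of per-SNP averages.

    p1, p2: allele frequencies over the same SNPs.
    n1, n2: number of sampled haplotypes (2 x individuals) in each population.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()


def pairwise_fst(pops):
    """pops: dict of name -> (allele_freq_array, n_haplotypes).

    Returns (rows, names, matrix): rows is a list of (pop1, pop2, fst) tuples,
    and matrix is a symmetric array indexed in the order of names.
    """
    names = sorted(pops)
    index = {name: i for i, name in enumerate(names)}
    matrix = np.zeros((len(names), len(names)))
    rows = []
    for x, y in itertools.combinations(names, 2):
        (px, nx), (py, ny) = pops[x], pops[y]
        fst = hudson_fst(px, py, nx, ny)
        rows.append((x, y, fst))
        matrix[index[x], index[y]] = matrix[index[y], index[x]] = fst
    return rows, names, matrix


# Toy example: B is a slightly perturbed copy of A, C is unrelated to both
rng = np.random.default_rng(1)
base = rng.uniform(0.05, 0.95, 2000)
pops = {
    "A": (base, 100),
    "B": (np.clip(base + rng.normal(0, 0.02, 2000), 0, 1), 100),
    "C": (rng.uniform(0.05, 0.95, 2000), 100),
}
rows, names, matrix = pairwise_fst(pops)
for x, y, fst in rows:
    print(x, y, round(fst, 4))
```

Small negative values can appear for barely differentiated pairs; they are an expected property of an unbiased estimator rather than a bug.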

The Tarim Mummies were the last of the Paleo-Siberians


The paper that reported on the data from the famous “Tarim Mummies” is out. If you don’t have a good grasp of the alphabet soup of ancient early Holocene populations, the results are going to be hard to parse. So I’ll make it simple for you: it now looks like the Tarim populations from 4,000 years ago are among the last people who were mostly “Ancestral North Eurasian” (ANE), and they had no connection to populations in Europe. The second part is important because Victor Mair and others who have examined the mummies are wont to proclaim that the “Loulan Beauty” and her peers were “Caucasoid” due to their physical features. This may still be technically true, but the inference that this has to do with migration from the west of a European-origin population turns out to be false.

Being wrong is not a big deal. My own suspicion and assumption that these were part of the early movement of Iranian people eastward turned out to be wrong too. This is clear because the Tarim population from 4,000 years ago didn’t have any gene flow from the eastern Yamnaya, the Afanasievo, let alone from the larger Corded Ware reflux to the steppe (Iranian people did move into the Tarim zone later; the languages of the southern rim of the basin in historic times were East Iranian, and Iranians seem to have arrived in the north, in Mongolia, late in the Bronze Age).

I’ve added some stuff to the plot below to make it clear:

The PCA above is consistent with the Tarim mummies being mostly descended from ANE, though they have a minority of northern East Asian ancestry.
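PCA positions are only suggestive; ancestry fractions such as “mostly ANE with a minority of northern East Asian ancestry” are usually estimated with f-statistics. I am not claiming this is how the paper estimated its proportions, but the basic logic can be sketched with Patterson’s f4-ratio estimator, alpha = f4(A, O; X, C) / f4(A, O; B, C). Here X is the admixed population, B and C stand in for the two source lineages, A is a population that shares drift with B’s side, and O is an outgroup. The Python below is a minimal sketch on allele frequencies. Which real populations would play A, B, C and O for the Tarim case is left open, and the toy data are synthetic.

```python
import numpy as np


def f4(w, x, y, z):
    """f4(W, X; Y, Z): mean over SNPs of (w - x)(y - z) on allele frequencies."""
    return np.mean((w - x) * (y - z))


def f4_ratio(a, o, x, b, c):
    """Estimated fraction of X's ancestry from the lineage related to B.

    alpha = f4(A, O; X, C) / f4(A, O; B, C), assuming A is a sister of the
    B-side source, O is an outgroup, and X = alpha * B' + (1 - alpha) * C'.
    """
    return f4(a, o, x, c) / f4(a, o, b, c)


# Toy example with synthetic frequencies: X is 70% B-like and 30% C-like
rng = np.random.default_rng(7)
n = 10000
b = rng.uniform(0.05, 0.95, n)
a = np.clip(b + rng.normal(0, 0.05, n), 0, 1)   # A shares drift with B
c = rng.uniform(0.05, 0.95, n)
o = rng.uniform(0.05, 0.95, n)
x = 0.7 * b + 0.3 * c
print(round(f4_ratio(a, o, x, b, c), 3))
```

Real analyses use sampled genotypes, bias corrections and jackknife standard errors, and tools such as qpAdm test whole sets of reference populations at once rather than a single ratio.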

Interestingly, the earlier remains from Dzungaria are mostly descended from Afanasievo populations with a minority of ANE ancestry. The authors conclude, correctly I think, that this points to the likely origins of the Tocharian languages among the Afanasievo, and to the possibility (I bet) that the ancient Yamnaya language was similar to that of the Tocharians. The fact that the Tarim men seem to have carried mostly very distant branches of R1b points to the origin of the R lineage deep in Siberia during the Pleistocene. R and Q are clearly from the Paleo-Siberians.

Finally, let’s talk about the famous “European” or “white” Buddhist monks depicted at Turfan:

Tajik Girl

The Chinese describe some of these people as having light hair and eyes. In other words, they looked like Europeans. Previous work had argued that this was due to the Tocharians being descended from European-like people. But we now have enough evidence from the Yamnaya to know that very few were pale-eyed or light-haired. Rather, they were a dark-haired and dark-eyed population with olive skin. Unless there was later natural selection for these characteristics, these traits aren’t due to the Afanasievo ancestry of the Tocharians (who by the period 500-1000 AD were mostly localized to the northeast of the modern Tarim basin, around Turfan). Instead, the Tocharians were themselves a mix of people, and I believe those with a Europoid physical appearance had it because they carried a great deal of Iranian ancestry.

Today most people associate “Iranian” with Iran and Persia, but Persians emerged on the southwestern frontier of the Iranian world, as heirs of Anshan and Elam (these were non-Indo-European societies). For much of history, Iranian-speaking people spanned the zone between Hungary and Mongolia and were much more physically and culturally diverse. To this day many Tajiks could pass for European, and these people have some of the highest fractions of “steppe herder” ancestry in the world. Judging by their genomes, a minority of the Sintashta likely had blue eyes, so I think these “Europoid” people must trace their origin to interactions with the expanding Andronovo horizon in the later Bronze Age.

The results in this paper show that 4,000 years ago the part of the Tarim that was later home to the Tocharians was inhabited by an ANE/Paleo-Siberian population, with a minority component of ancestry derived from northern East Asians. This ancestry dates to the early Holocene, 10,000 years ago. The later Tocharians probably absorbed these people, but I believe they were a mix of post-Afanasievo populations and Iranians. The former gave the Tocharians their unique and very basal Indo-European language, and the latter were responsible for the “European” physical features noted by the Han Chinese chroniclers in the 1st millennium A.D.

Note: the ANE are closer to West Eurasians than East Eurasians, but they are very distantly related to the former. Their ancestors seem to have diverged from European and West Asian hunter-gatherers 35,000 to 40,000 years ago.

Razib Khan 30x whole-genome sequence data

About four years ago I posted my genotype data for anyone who wanted it. This included the raw export files from consumer genomics firms + my VCF file generated by Dante Labs.

Today I am making all of my raw data from Dante Labs public. This means you can access:

– raw reads
– .bam file
– .vcf files, as well as files with CNVs and SVs

This is all 30x coverage, so be warned: these aren’t the smallest files. Here is the link to my data.
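To put “these aren’t the smallest files” in numbers, here is back-of-the-envelope arithmetic in Python using the Lander-Waterman relation, coverage = (number of reads × read length) / genome length. The genome length (3.1 Gb) and the 150 bp read length are assumptions for illustration, not details of this particular run. The byte count is only a lower bound for an uncompressed FASTQ (one base call plus one quality character per base, ignoring read names); compressed files will be considerably smaller.

```python
import math

GENOME_LENGTH_BP = 3_100_000_000  # assumed haploid human genome length
READ_LENGTH_BP = 150              # assumed read length; not taken from the data
MEAN_COVERAGE = 30                # "30x"

# Lander-Waterman: coverage = n_reads * read_length / genome_length
bases_sequenced = MEAN_COVERAGE * GENOME_LENGTH_BP
n_reads = bases_sequenced / READ_LENGTH_BP

# Uncompressed FASTQ needs at least 2 bytes per base (call + quality score)
fastq_bytes_lower_bound = 2 * bases_sequenced

# Idealized Poisson estimate of the fraction of the genome with zero reads
fraction_uncovered = math.exp(-MEAN_COVERAGE)

print(f"{bases_sequenced / 1e9:.0f} Gb sequenced")
print(f"{n_reads / 1e6:.0f} million reads of {READ_LENGTH_BP} bp")
print(f">= {fastq_bytes_lower_bound / 1e9:.0f} GB of uncompressed FASTQ")
print(f"expected fraction uncovered: {fraction_uncovered:.1e}")
```

In practice repeats, GC bias, and unmappable regions leave far more of the genome poorly covered than the idealized Poisson figure suggests.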

If you find something noteworthy, reach out to me! For those who want geographic provenance, seven of my eight great-grandparents were born in the Comilla region of modern Bangladesh. The eighth was born in the Noakhali region, just to the south of Comilla.

Horses were domesticated before the mammoth went extinct

The origins and spread of domestic horses from the Western Eurasian steppes:

Domestication of horses fundamentally transformed long-range mobility and warfare [1]. However, modern domesticated breeds do not descend from the earliest domestic horse lineage associated with archaeological evidence of bridling, milking and corralling at Botai, Central Asia around 3500 BC [3]. Other longstanding candidate regions for horse domestication, such as Iberia [5] and Anatolia [6], have also recently been challenged. Thus, the genetic, geographic and temporal origins of modern domestic horses have remained unknown. Here we pinpoint the Western Eurasian steppes, especially the lower Volga-Don region, as the homeland of modern domestic horses. Furthermore, we map the population changes accompanying domestication from 273 ancient horse genomes. This reveals that modern domestic horses ultimately replaced almost all other local populations as they expanded rapidly across Eurasia from about 2000 BC, synchronously with equestrian material culture, including Sintashta spoke-wheeled chariots. We find that equestrianism involved strong selection for critical locomotor and behavioural adaptations at the GSDMC and ZFPM1 genes. Our results reject the commonly held association [7] between horseback riding and the massive expansion of Yamnaya steppe pastoralists into Europe around 3000 BC [8,9] driving the spread of Indo-European languages [10]. This contrasts with the scenario in Asia where Indo-Iranian languages, chariots and horses spread together, following the early second millennium BC Sintashta culture.

And, Late Quaternary dynamics of Arctic biota from ancient environmental genomics:

During the last glacial–interglacial cycle, Arctic biotas experienced substantial climatic changes, yet the nature, extent and rate of their responses are not fully understood [1–8]. Here we report a large-scale environmental DNA metagenomic study of ancient plant and mammal communities, analysing 535 permafrost and lake sediment samples from across the Arctic spanning the past 50,000 years. Furthermore, we present 1,541 contemporary plant genome assemblies that were generated as reference sequences. Our study provides several insights into the long-term dynamics of the Arctic biota at the circumpolar and regional scales. Our key findings include: (1) a relatively homogeneous steppe–tundra flora dominated the Arctic during the Last Glacial Maximum, followed by regional divergence of vegetation during the Holocene epoch; (2) certain grazing animals consistently co-occurred in space and time; (3) humans appear to have been a minor factor in driving animal distributions; (4) higher effective precipitation, as well as an increase in the proportion of wetland plants, show negative effects on animal diversity; (5) the persistence of the steppe–tundra vegetation in northern Siberia enabled the late survival of several now-extinct megafauna species, including the woolly mammoth until 3.9 ± 0.2 thousand years ago (ka) and the woolly rhinoceros until 9.8 ± 0.2 ka; and (6) phylogenetic analysis of mammoth environmental DNA reveals a previously unsampled mitochondrial lineage. Our findings highlight the power of ancient environmental metagenomics analyses to advance understanding of population histories and long-term ecological dynamics.

Related to the first paper, The horse bit and bridle kicked off ancient empires – a new giant dataset tracks the societal factors that drove military technology:

Starting around 3,000 years ago, a wave of innovation began to sweep through human societies around the globe. For the next millennium the continued emergence of new technologies had a dramatic effect on the course of human history.

This era saw the advancement of the ability to control horses with bit and bridle, the spread of iron-working techniques through Eurasia that led to hardier and cheaper weapons and armor and new ways of killing from a distance, such as with crossbows and catapults. On the whole, warfare became much more deadly.

I still think this is a plausible model:

– Horses were used opportunistically on the Eurasian steppe between 3500 BC and 2000 BC

– The horse was fully domesticated by 2000 BC with the emergence of the light war-chariot

– The true power of the horse as a military instrument became apparent after 1000 BC, with the rise of mounted cavalry

I can be convinced that the horse wasn’t used opportunistically, but opportunistic use makes the militaristic expansion of Corded Ware so much more plausible to me. We’ll see what David Anthony comes up with in the near future, as he’s on the horse beat too…

Open Thread – 10/17/2021 – Gene Expression

Reading Julia Galef’s The Scout Mindset: Why Some People See Things Clearly and Others Don’t. I’ve known Julia for a decade now, and I really respect her. She’s a good-faith actor.

Less time for this blog, but I’ve been posting a lot on the Substack. If you don’t like newsletters (but you should!), I’m still pushing stuff to razib.com, with a total-content-feed RSS.

I’m rebooting the South Asian Genotype Project. Email me at razib.com.

There are 2,438 articles on bioRxiv when you look for China, genetics, and genomics. For India, the figure is 923. Pretty sad.

The Aristocracy of Talent: How Meritocracy Made the Modern World. Another book on my “to-read.”

The promise of disease gene discovery in South Asia.

Untapped opportunities for rare disease gene discovery in India.

Genomic insights into the population history and biological adaptation of Southwestern Chinese Hmong-Mien people.

Admixture dynamics in colonial Mexico and the genetic legacy of the Manila Galleon.

A Genomic Perspective on the Evolutionary Diversification of Turtles.

Indo-European podcasts

I’ve been doing my steppe series since the spring of 2021 and will be continuing it into 2022 (chronologically). But I thought some blog readers might have missed the podcasts:

Thomas Olander: the origin and spread of Indo-European languages
David Anthony: the origin of Indo-Europeans
Kristian Kristiansen: the birth of Northern Europe
James P. Mallory: finding the Indo-Europeans

(all ungated)

The Pleistocene was more interesting than we think

Late Pleistocene/Early Holocene sites in the montane forests of New Guinea yield early record of cassowary hunting and egg harvesting:

Eggshell is an understudied archaeological material with potential to clarify past interactions between humans and birds. We apply an analytical method to legacy collections of Late Pleistocene to mid-Holocene cassowary eggshell and demonstrate that early foragers in the montane rainforests of New Guinea preferentially collected eggs in late stages of embryonic growth. This finding suggests that foragers regulated the exploitation of cassowaries and may have hatched eggs to rear chicks. The montane rainforests of New Guinea may thus present the earliest known evidence of human management of avian breeding.

Related, Why Civilization Is Older Than We Thought.

Open Thread – 10/01/2021 – Gene Expression

Over at my Substack I’ve posted my interview with Steven Pinker on Rationality. You can read my review at UnHerd.

I would appreciate it if readers of this weblog left positive reviews on my podcast for Apple or Stitcher.

Also, I posted a long piece on the arrival of humans to the New World, and the finds that suggest humans were here before the LGM.