The Hui Muslims, two pulses of “western” ancestry 1,000 and 500 years ago, mostly male mediated

There are 20 million Hui people in China. These are traditionally Chinese-speaking Muslims. Though they are found in every region of China (and in the Chinese Diaspora), they are concentrated in the northwest, in Gansu and Ningxia in particular. Their origins have always been curious. They speak the local Chinese dialect, and look mostly Chinese, but they are traditional Muslims. Are they purely the descendants of local converts?  The Hui do not believe so, and some of them physically do look more West Eurasian.

Thanks to genetics we know the answer. From Genetic Origins and Sex-Biased Admixture of the Huis:

To investigate whether there was a sex-biased admixture in the history of NXH, we compared the admixture results obtained from autosomes, X chromosome, mtDNA, and Y chromosome. We estimated the admixture proportion assuming two major ancestral components, that is, western and eastern (fig. 3 and supplementary tables S2 and S3, Supplementary Material online). The estimated genetic contribution of the western ancestry into NXH was 8.6% for autosomes, 5.9% for X chromosome, 3.6% for mtDNA, and 39.3% for Y chromosome, respectively. The results of Y chromosome and mtDNA were consistent with the previous studies (Yao et al. 2004; Wang et al. 2019; Xie et al. 2019). Additionally, though the difference in genetic contribution was small, there was a significant difference in admixture proportions between autosomes and X chromosome (Student’s t-test, P < 10^-7). This pattern was consistent across different regions in Ningxia (fig. 3C). These results indicated that the admixture of NXH was sex biased to the combination of Eastern females and Western males.

The 39% figure for western Y chromosomes is key. This is probably a floor for the fraction of originally Muslim lineages. The Hui likely had Iranian and Turkic precursors, and the latter would have had eastern Y chromosomes. But the point is that cultural continuity was maintained in paternal lineage systems, and over the generations intermarriage with local Han women resulted in roughly 90% of the genome being replaced.

This is a common pattern in large parts of the world. Paternal cultural transmission is a thing.
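The gap between the autosomal and X chromosome estimates can be turned into rough sex-specific contributions. What follows is a minimal sketch, not the method used in the paper: it assumes a single pulse of admixture and the textbook expectation that autosomes take half their ancestry from each sex, while the X takes one third from males and two thirds from females. The function name and the toy model are illustrative assumptions.

```python
def sex_specific_western_fraction(autosomal, x_chromosome):
    """Solve a two-equation toy model for male (m) and female (f) western fractions.

    Assumed model (single admixture pulse, not taken from the paper):
        autosomal    = (m + f) / 2
        x_chromosome = (m + 2 * f) / 3
    """
    f = 3 * x_chromosome - 2 * autosomal
    m = 2 * autosomal - f
    return m, f


# Western ancestry estimates quoted above for the Ningxia Hui (NXH).
male, female = sex_specific_western_fraction(autosomal=0.086, x_chromosome=0.059)
print(f"western fraction among male ancestors:   {male:.3f}")
print(f"western fraction among female ancestors: {female:.3f}")
```

With the quoted numbers this toy model puts nearly all of the western input on the male side (about 17% versus under 1%). That is the same direction as the Y chromosome and mtDNA results, though the exact values should not be expected to match them, since uniparental markers drift and the real history involved more than one pulse.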

Nick Patterson responds to Feldman and Riskin’s NYRB piece

Nick Patterson has responded on his Substack to the NYRB piece Why Biology is not Destiny, which itself is an attack on Kathryn Paige Harden’s book The Genetic Lottery. Patterson does not say anything you can’t find in Stuart Ritchie’s defense, but he is at this stage in his career a very eminent scholar operating from a perch in one of the world’s most prominent population genetics laboratories. So his speaking publicly is worthwhile in my opinion, since a lot of this is going to shake out in credential thumping and status signaling (Marcus Feldman in particular is a massive deal in population genetics, so his piece needs to be responded to by those in senior positions)…

Happy DNA Day (and whole genome sequencing yourself)

Today is “DNA Day,” so I checked the Nebula Genomics website to see if there was a deal. There was, so I got the 30x whole genome sequencing for $199 plus a $24.99/month subscription. The deal is you have to get the subscription, but you can cancel at any time. What I plan to do is just download my data when the results come back and the subscription starts ticking, and cancel after that. The other options are more expensive. But they won’t let you add more than 1 item unless you get the most expensive upfront deal, so what I did was start separate carts and send the orders to separate email addresses.

And yes, I have my own DNA sequenced. This is for friends and family.

If you want to download my whole genome sequence, from raw reads to bam files to vcf’s, go here.

Update: It was pointed out to me that the subscription is billed quarterly, so the charge would be $75 ($25 per month for 3 months).

Why my Substack posts are better and worse than ancestry calculators


Of all my Substack posts, Ashkenazi Jewish genetics: a match made in the Mediterranean has been the most popular of the paid posts. It prompted this response from a reader:

The issue here is that my Substack is doing something different from what personal genomics companies are trying to do. My Substack post gives a survey of a whole population and its history; a personal genomics test is trying to give an individual estimate that is intelligible. When 23andMe or the other companies tell you that you are 99% “Ashkenazi Jewish,” they are simply giving you confirmation that you’re within the range of variation typical for Ashkenazi Jews (there are some suspicions from genealogy enthusiasts that 23andMe smooths out differences between Galicianers and Litvaks, for example).

Imagine that 23andMe told its Jewish customers that they were 45.3% Northern Levantine, 40% Southwest European, and 9.7% Northern European. How would they interpret it? Sophisticated users would understand this points to a deep history of admixture, but most users are not sophisticated. They want to know that they’re Ashkenazi Jewish, and how Jewish they are (most will be nearly 100%, but some people may have non-Ashkenazi cryptic ancestry).

When I worked for Embark Vet, one of the issues the canine DNA test had was that we were looking for wolf ancestry in dogs, as some customers with F1s or backcrosses wanted to test their pooch. But it turned out that you had to be careful, because some northern Arctic dog breeds kept coming back with “wolf” ancestry at low fractions because they did have ancient wolf ancestry. That’s not what the wolf test was designed to pick up.

Tibetans as the compound of two populations

A new paper looks at some ancient Tibetan genomes:

Present-day Tibetans have adapted both genetically and culturally to the high altitude environment of the Tibetan Plateau, but fundamental questions about their origins remain unanswered. Recent archaeological and genetic research suggests the presence of an early population on the Plateau within the past 40 thousand years, followed by the arrival of subsequent groups within the past 10 thousand years. Here, we obtain new genome-wide data for 33 ancient individuals from high elevation sites on the southern fringe of the Tibetan Plateau in Nepal, who we show are most closely related to present-day Tibetans. They derive most of their ancestry from groups related to Late Neolithic populations at the northeastern edge of the Tibetan Plateau but also harbor a minor genetic component from a distinct and deep Paleolithic Eurasian ancestry. In contrast to their Tibetan neighbors, present-day non-Tibetan Tibeto-Burman speakers living at mid-elevations along the southern and eastern margins of the Plateau form a genetic cline that reflects a distinct genetic history. Finally, a comparison between ancient and present-day highlanders confirms ongoing positive selection of high altitude adaptive alleles.

Y haplogroup D is found at high frequencies in Japan, Tibet, and the Andaman Islands. It strikes me this is evidence of a Paleolithic substrate, though the graph above shows that it diverged really deep in Eurasia, and in the text they say it’s only in Tibet.

The Tibetan ancestry seems to have been found in the Himalayan zone by 1450 BC, so rather early.

Pausing research on autism (for now)

High-profile autism genetics project paused amid backlash:

But soon after the study’s high-profile launch on 24 August, autistic people and some ASD researchers expressed concern that it had gone ahead without meaningfully consulting the autism community. Fears about the sharing of genetic data and an alleged failure to properly explain the benefits of the research have been raised by a group called Boycott Spectrum 10K, which is led by autistic people. The group plans to protest outside the ARC premises in Cambridge in October. A separate petition against the study gathered more than 5,000 signatures.

Damian Milton, a researcher in intellectual and developmental disabilities at the University of Kent in Canterbury, UK, is one of those who signed the Boycott Spectrum 10K petition. Milton has been diagnosed with Asperger’s syndrome, a form of ASD. He says it is not clear how the study will improve participants’ well-being, and its “aim seems to be more about collecting DNA samples and data sharing”.

As a result of the backlash, the Spectrum 10K team paused the study on 10 September, apologized for causing distress, and promised a deeper consultation with autistic people and their families.

I assume they’ll restart, but this sort of research will happen somewhere. Autism is a reasonably heritable trait, and many of the people with autism are not “high functioning.”

G allele at Rs10774671 protects against severe COVID-19

A new paper digs into OAS1, A prenylated dsRNA sensor protects against severe COVID-19:

Inherited genetic factors can influence the severity of COVID-19, but the molecular explanation underpinning a genetic association is often unclear. Intracellular antiviral defenses can inhibit the replication of viruses and reduce disease severity. To better understand the antiviral defenses relevant to COVID-19, we used interferon-stimulated gene (ISG) expression screening to reveal that OAS1, through RNase L, potently inhibits SARS-CoV-2. We show that a common splice-acceptor SNP (Rs10774671) governs whether people express prenylated OAS1 isoforms that are membrane-associated and sense specific regions of SARS-CoV-2 RNAs, or only express cytosolic, nonprenylated OAS1 that does not efficiently detect SARS-CoV-2. Importantly, in hospitalized patients, expression of prenylated OAS1 was associated with protection from severe COVID-19, suggesting this antiviral defense is a major component of a protective antiviral response.

You can find the SNP in your 23andMe raw data (unless you are on the recent chip; I looked for a tag variant but found none). If I’m reading the paper correctly, having the AA genotype increases your risk of severe COVID-19 with an odds ratio of 1.58, all things equal. Not crazy bad, but not great either. The haplotype that carries the G allele in non-Africans seems to come from Neanderthals. In Africa, the ancestral G is the majority allele, though a minority of individuals carry A, and it was A that was passed on to Eurasians.
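An odds ratio is not the same thing as a relative risk, so it helps to see what 1.58 does to an absolute probability. The sketch below converts a baseline risk into the risk implied by an odds ratio. The baseline values are made up for illustration and are not from the paper.

```python
def risk_after_odds_ratio(baseline_risk, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline risks of severe disease (illustrative only).
for baseline in (0.01, 0.05, 0.20):
    shifted = risk_after_odds_ratio(baseline, odds_ratio=1.58)
    print(f"baseline {baseline:.2f} -> with OR 1.58: {shifted:.3f}")
```

When the baseline risk is small the odds ratio behaves almost like a relative risk; at higher baselines the increase in probability is smaller than a factor of 1.58.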

Here is a plot for the 1000 Genomes populations.

One thing I immediately noticed is that Peruvians have the highest frequency of the A allele in the dataset. Peru has had the highest COVID-19 death rate in the world, and its frequency of A means that a great number of people will be AA (the frequency of A squared).

I looked in Anders Bergstrom’s HGDP whole-genome data and found an interesting pattern in the frequencies of the G allele:

Population   Freq     Count   2N
Karitiana    0        0       22
Pima         0        0       26
Surui        0        0       16
Yakut        0        0       50
Maya         0.0476   2       42
Oroqen       0.1111   2       18
Tujia        0.1111   2       18
Peruvian     0.1118   19      170
She          0.15     3       20
Cambodian    0.1667   3       18

Three of the four populations with no copies of the protective G allele are indigenous to the Americas. The Maya, who are known to have European admixture, also have a very low frequency of the G allele. Now, it is true that East Asians also have low frequencies of the G allele (the Yakuts also lack it, so perhaps this was ancestral to Siberians?), but they may have other protective variants (or have suffered through an earlier coronavirus epidemic). I think OAS1 may turn out to be one of the loci associated with a higher risk of severe COVID-19 in the New World.
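The “frequency of A squared” point above is just Hardy-Weinberg, and it can be run over the table. A minimal sketch, assuming random mating within each population (which small, isolated groups may well violate) and using the G allele counts from the table; the variable and function names are illustrative.

```python
# (population, G allele count, chromosomes sampled) copied from the table above.
G_COUNTS = [
    ("Karitiana", 0, 22),
    ("Pima", 0, 26),
    ("Surui", 0, 16),
    ("Yakut", 0, 50),
    ("Maya", 2, 42),
    ("Oroqen", 2, 18),
    ("Tujia", 2, 18),
    ("Peruvian", 19, 170),
    ("She", 3, 20),
    ("Cambodian", 3, 18),
]


def expected_aa_fraction(g_count, n_chromosomes):
    """Expected share of AA homozygotes under Hardy-Weinberg: freq(A) squared."""
    freq_a = 1.0 - g_count / n_chromosomes
    return freq_a ** 2


for name, g_count, n_chrom in G_COUNTS:
    freq_g = g_count / n_chrom
    aa = expected_aa_fraction(g_count, n_chrom)
    print(f"{name:<10} freq(G)={freq_g:.4f}  expected AA={aa:.3f}")
```

For the Peruvian row this gives roughly 79% AA. The sample sizes for most of these populations are tiny, so the frequencies themselves carry wide uncertainty.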

The Japanese as a creation of the Christian Era

The traditional model, which I’ve alluded to on this weblog before, is that Japan is a synthesis of Jomon and Yayoi, with the latter dominant, and bringing rice-agriculture to the islands. A new paper in Science indicates it may be more complicated than that.

Ancient genomics reveals tripartite origins of Japanese populations:

Prehistoric Japan underwent rapid transformations in the past 3000 years, first from foraging to wet rice farming and then to state formation. A long-standing hypothesis posits that mainland Japanese populations derive dual ancestry from indigenous Jomon hunter-gatherer-fishers and succeeding Yayoi farmers. However, the genomic impact of agricultural migration and subsequent sociocultural changes remains unclear. We report 12 ancient Japanese genomes from pre- and postfarming periods. Our analysis finds that the Jomon maintained a small effective population size of ~1000 over several millennia, with a deep divergence from continental populations dated to 20,000 to 15,000 years ago, a period that saw the insularization of Japan through rising sea levels. Rice cultivation was introduced by people with Northeast Asian ancestry. Unexpectedly, we identify a later influx of East Asian ancestry during the imperial Kofun period. These three ancestral components continue to characterize present-day populations, supporting a tripartite model of Japanese genomic origins.

The Kofun period begins around 300 AD. The implication here is that there was a mass migration from the Asian continent less than 2,000 years ago, likely from Korea. The first agriculturalists, the Yayoi, seem to be a mix of native Jomon and individuals with strong affinities to populations in Manchuria.

Here’s a stylized representation that captures the turnover:

The Jomon are interesting because these results indicate a low effective population size and deep connections with ANE (Ancient North Eurasians). They also seem to be a clade deep within Northeast Asians, dating to the Pleistocene.

In any case, the authors admit that their sampling of the Yayoi is weak, so there needs to be follow-up here. If it does turn out that the Japanese are mostly Kofun-period, then I think that recalibrates our sense of its history a great deal. The Japan of the 7th century which enters into history was a very young nation.

Not all causes are treated equal

Over on Twitter the eminent population geneticist Molly Przeworski has an important and lauded thread up:

The thread has been widely re-tweeted and quote-tweeted by biologists. This prompted a response by a prominent sociologist, who quoted this from Kathryn Paige Harden’s discussion with Sam Harris:

What Harden is alluding to is that heritability within populations is not necessarily portable between populations. In less sophisticated hands, this is used almost as an incantation. In my review of Harden’s book I said the following:

The biological reason that this extrapolation founders is that human populations differ, and those differences matter. The genetic architecture of intelligence may vary between populations so that predictions from the markers in one population are poorly predictive of variation in another, in line with the general concerns for GWAS portability…Harden points out correctly that population structure exhibits different layers of granularity and continuity. Perhaps a prediction trained on British samples is poorly predictive in Pakistanis. But what about Iranians? If it is poorly predictive in Iranians, what about in Bulgarians? The ability to infer within and between-group heritability is conditional on what you mean by “group,” and that is to some extent a subjective choice guided more by heuristics and instrumental utility than idealistic differences between races.

To be entirely frank, I think Harden was on solid ground as a behavior geneticist with psychological training who relied on what population geneticists say publicly all the time about heritability and group differences. The issue is that I do not believe population geneticists were entirely candid about the deep texture of their assumptions, beliefs, and expectations. They wanted to be left alone to do their research, and so relied on a mantra to make people leave them alone, and now that mantra, taken so literally, is coming back to haunt them. One reason Przeworski’s thread got a lot of attention is that privately this is the sort of intuition that’s widely understood, but the issues are subtle, so with outsiders people just leave it at the quick quips about portability. A friend told me “Molly doing this is like a goddess descending to Earth to speak to mere mortals so it will get a lot of attention.”

The real issue though is that some are now rather perturbed that Harden and behavior geneticists are trying to shield their study of psychological trait heritability from charges of racism by separating the discussion of between- and within-group differences, implicitly reifying “population.” Additionally, some geneticists are quite unhappy at the discussion of heritability when it comes to psychological characteristics, so what was a convenient mantra to have people leave them alone is now coming back to haunt them, as it’s opening up avenues for research that they’re not comfortable with, are not interested in, and believe are possibly dangerous. To be candid, if I were Harden I’d be a bit peeved, since all she’s doing is repeating what a lot of authorities in the field have been writing and saying for decades.

Nevertheless, if you take a look at the people who re-tweeted and commented on Przeworski’s thread it’s pretty much everyone. The high and mighty, all the way to the low. It was positively re-tweeted by people who are very skeptical of the study of heritability in psychological characteristics in humans (to be charitable). And it was positively re-tweeted by me. Since so many people liked it and re-tweeted it, I can tell you, without divulging confidences, that it was re-tweeted by people who are actually quite open to and interested in the study of psychological characteristics in humans, within and between groups (I checked who commented, re-tweeted, and liked).

So what’s going on? Przeworski’s group has published several papers in this area (for example, The evolution of group differences in changing environments), and one of the upshots for many is that there’s a lot less certainty about the heritability of many traits and its utility for polygenic risk scores even within groups because of uncorrected confounds. Some people took from this that polygenic risk scores are useless (not necessarily Przeworski and her group!). But when I talked about these findings with Amit Khera, who works on polygenic risk scores relating to cardiovascular disease, he was actually happy about these results. Why? Because he wanted to correct any confounds there were. He viewed these results not as a death knell for polygenic risk scores, but as a way to make them better, more accurate, more precise. He’s a medical doctor who is trying to help people in their health decisions. All he cares about is greater effectiveness. He’s not invested in a particular result, he’s invested in outcomes (OK, at least ideally, but I talked to him and his enthusiasm seemed genuine).

This is almost certainly why people who think that polygenic risk scores are useful, and that heritability of psychological characteristics is real and varies widely in human populations, re-tweeted the Przeworski explainer. I myself did for this reason. My own current belief is there’s good evidence for heritability for a lot of behavioral traits, and that polygenic risk scores can be useful, at least on the margin. But we need to get better, and to do that, we need to explore all the subtle distinctions and details in relation to environmental and genetic variation. This is no guarantee. Perhaps the skeptics of polygenic risk scores will be correct (I doubt it, but who knows). But we’re not at the point where we can settle that question right now. More science needs to be done.

Finally, we need to address the magic of genes. People put a lot of stock in genes for various ideological reasons. But the reality is a lot of environmental factors taken for granted by many (e.g., shared home environment) are a lot less clear and well understood than genes are. And yet the skeptical takes don’t rain down on social science inferences and correlations. Mostly because they’re not seen as insidious because they’re environmental. But causes are causes. When there is a great deal of environmental variation in an outcome that doesn’t mean that you can control it, or you even know what it is. A lot of what is in the “E” in the ACE model is mysterious. Many focus on genes because they’re clear and distinct.
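For readers who have not run into the ACE model: it splits trait variance into additive genetic (A), shared environment (C), and non-shared environment (E) components, and the classic back-of-the-envelope version uses twin correlations (Falconer’s formulas). A minimal sketch with made-up correlations; modern studies fit these models with more careful methods.

```python
def falconer_ace(r_mz, r_dz):
    """Rough ACE variance shares from identical (MZ) and fraternal (DZ) twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share
    c2 = 2.0 * r_dz - r_mz     # shared environment share
    e2 = 1.0 - r_mz            # non-shared environment share, plus measurement error
    return a2, c2, e2


# Hypothetical twin correlations, chosen only for illustration.
a2, c2, e2 = falconer_ace(r_mz=0.8, r_dz=0.5)
print(f"A={a2:.2f}  C={c2:.2f}  E={e2:.2f}")
```

Note that E here is simply whatever is left over after MZ twins fail to correlate perfectly, which is part of why it is so hard to say what it actually consists of.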

Armenia, Azerbaijan, Turkey, and genetics

Recently a few people have been asking me about Armenians, Turks, and genetics, mostly because I’ve written about this topic before. Unless you’ve been asleep you know that there is a war going on in the Caucasus. Armenia and Azerbaijan are renewing their decades-long conflict and drawing in other nations as the local powers choose sides. Unpleasant all around.

Who are the Armenians? The Turks? Azeris?

Azeris in particular are not well known in the West, but they’re kind of a big deal. The Iranian province of Azerbaijan has nearly as many people as the Republic of Azerbaijan, and there are more Azeris in Iran than in independent Azerbaijan itself. The leader of Iran has an ethnic Azeri father. Azeris are Turkic and have traditionally been dominant in Iran’s military. Before the Turkification of the region over the last 500 years, Azerbaijan was called Albania. The native language was Iranian and related to Persian.

Living in close proximity to each other, it is no surprise that the peoples of the region are genetically rather similar. That being said, there are notable differences.

While a few Armenians in my datasets have Russian admixture (they are likely F1 individuals who identify as Armenian), what is notable about Azeris is that, like Turks, they exhibit a small but notable shift toward East Asians. This is almost certainly the consequence of Turkic ancestry. Though most of the ancestry of Azeris is pre-Turkic, Turkification occurred through the assimilation of nomads with some East Asian ancestry.

The same applies to Turks to the west of Armenia. In previous posts, I’ve had discussions about the nature of Turkish ancestry, but in general, I am convinced by those who argue that the non-Turkic component (the Turkic component would include East Asian and Turanian ancestry) reflects the earlier pattern of variation: Greek in the west, Armenian and Kurdish in the east.

Conflicts like the one we are seeing today illustrate the power of ideas over relatedness. Armenians are Christians, albeit peculiar Oriental Orthodox Christians. Additionally, they continue to speak an ancient Indo-European language. This sets them against Turkic-speaking Muslims to the west, and Turkic-speaking Muslims to the east, though all the groups share a deep common ancestry.