The Genetic History of the Middle East: into Arabia

A new massive preprint on the Middle East is out. I’ve edited the first figure to give people a general sense of the broad results and populations sampled. First, you have to know that these are high-quality modern samples. 137 individuals at 30x whole genome coverage.  In other words, basically the best genomic data you can get on sequences. No need to futz around with subsets of the data. This is important and needful because the 1000 Genomes doesn’t have a Middle Eastern population. So when looking to assemble variants there was a deficit in this domain. Even the WGS of the HGDP was not totally sufficient, since the Middle Eastern populations were not Arabian.

The populations here are sampled from both the classical “Fertile Crescent” and various points within the Arabian peninsula. At the end of the preprint, they do some analysis on selection, which I won’t say much about. The most interesting thing is that they confirm that Arabian people have a unique lactase persistence allele that seems to have been selected very recently, just like in Europeans. A lot of the selection analysis seems either to replicate what you would find elsewhere or to lack the power to detect polygenic selection (though they did detect selection on EDU).

The big finding to me is that this work confirms that there is a north-south cline in the Near East defined by a deep population structure. The admixture graph to the right captures the main features using Lebanese and Emiratis as the two extreme populations, but as you can see in the admixture plot above the cline really runs from the Caucasus to southern Arabia. If you analyze these populations one thing you will see is that Fertile Crescent populations, such as Druze, often seem more like Armenians and Georgians, than South Arabians. Why is this? After all, South Arabians and Fertile Crescent populations speak Semitic languages.

I think the issues here are multiple. First, there is recent admixture that obscures some of the deeper relationships. This is clear insofar as most Arab Muslim populations have Sub-Saharan African admixture. This is historically attested, and physically visible. The variation and range are quite high, in part due to spatial heterogeneity of slavery (e.g., more African slaves in lowlands than highlands), and the recency of the admixture producing variation due to incomplete mixing (the dates are usually 1000 A.D. and later).

But this is not the only admixture. All of the Fertile Crescent populations, along with groups to the north, have much more steppe drift than those to the south in Arabia. The details of the fractions don’t matter, it’s not much, but it’s not trivial, and it’s always higher than among the Arabians. Additionally, this element is new to the region, in relative terms. You can see the contribution in modern Lebanese in comparison to the Bronze Age Sidon samples, which date to 1800 BC. The source could be continuous gene flow during the Roman and Byzantine period, or even later. Or, it could also be Indo-European migrations.

We know that Indo-Iranian peoples were present in Upper Mesopotamia. The Mitanni Kingdom, which had Indo-Aryan affinities, shows up after 1750 BC. The Hittites, the Nesa, show up to the north in Anatolia a bit earlier. Interestingly, the Hittites spoke an Indo-European language that is often considered basal (the outgroup) to most of the others. Armenian, which emerges later in eastern Anatolia, is also quite distinct, just as Greek to the west is. In contrast, there is a lot of suggestive evidence of either genealogical or geographical connectedness between the ancestors of the Indo-Iranian and Slavic language families.

The presence of these two very distinct ancestral components, steppe and Sub-Saharan African, on top of the ancient Near Eastern base, produces distinctions in the modern populations which obscure some of the deeper strands. In the late 2000s when researchers and bloggers began running admixture analyses on Ethiopians it was clear that this population was a mix between “West Eurasian” and African which wasn’t Bantu. The West Eurasian donor population was often Yemeni, in particular Yemeni Jews. Later on, using more sophisticated methods some models suggested greater affinity in Ethiopian genomes to Levantine populations than Yemenis. What was going on?

We now know. It is quite clear Ethiopian populations lack steppe ancestry. In the earlier Bronze Age, and definitely in the Neolithic, Levantines lacked steppe ancestry. In fact, the Neolithic Levantines usually lacked “Iranian” ancestry. The West Eurasian ancestry in Northeast Africans, on the whole, is enriched for a Levantine ancestry quite similar to Natufian. Modern-day South Arabians are the closest to this population mix, even if they are not descended from ancient Levantines. They lack steppe.

Modern-day South Arabians in fact descend in part from indigenous hunter-gatherers, who were a sister clade to the ancestors of Natufians. The admixture graph makes that clear: the Emiratis with the least African ancestry have half their ancestry from this group. In the book Arabs, the author discusses at length various Yemeni legends of a fusion between distinct peoples on the edge of history. These could be recollections of the merging of indigenous Neolithic Arabians and peoples who expanded from the north.

The analyses of these samples confirm and reiterate what has been found with ancient DNA: at some point late in the Neolithic and early in the Bronze Age a massive admixture event occurred in the Fertile Crescent which brought a considerable amount of “Iranian” ancestry into the region (these ancient people are not like modern Iranians; in particular, they lacked steppe ancestry which is copious in much of Iran, particularly the east). This ancestry pushed south and westward so that ~50% of the ancestry of Arabians seems to be Iranian. That being said, I have some qualms here:

We explored whether this ancestry penetrated both the Levant and Arabia at the same time, and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,900 and 5,600 ya (Table S3), followed by admixture in Egypt (2,900-4,700 ya), East Africa (2,200-3,300) and Arabia (2,000-3,800). These times overlap with the dates for the Bronze Age origin and spread of Semitic languages in the Middle East and East Africa estimated from lexical data (Kitchen et al., 2009; Figure S8). This population potentially introduced the Y-chromosome haplogroup J1 into the region (Chiaroni et al., 2010; Lazaridis et al., 2016). The majority of the J1 haplogroup chromosomes in our dataset coalesce around ~5.6 [95% CI, 4.8-6.5] kya, agreeing with a potential Bronze Age expansion; however, we do find rarer earlier diverged lineages coalescing ~17 kya (Figure S9). The haplogroup common in Natufians, E1b1b, is also frequent in our dataset, with most lineages coalescing ~8.3 [7-9.7] kya, though we also find a rare deeply divergent Y-chromosome which coalesces 39 kya (Figure S9).

Some of these dates are hard to credit. For example, I obtain a midpoint estimate of Iranian admixture into Egypt around 1836 BC!

The fraction of Iranian ancestry is substantial. The admixture model in the supplements gives this for Egyptians: 45% Levant_N, 32% Iran_N, 8% EHG (Eastern European Hunter-Gatherer), and 15% Mota (African). The older date is 2700 BC. The oldest Egyptian writing dates to 2700 BC, but proto-hieroglyphs are 500 years older. The authors talk about Semitic languages, and ancient Egyptian is not Semitic. So it could be a minority population mixed into the Egyptians, but this is a massive event that we don’t have records of. In fact, the authors claim that it went into much of Northeast Africa at a relatively late date.

Additionally, the values for the Levant seem recent as well. That being said there was a pre-Sumerian civilization, the Uruk Civilization, which spread broadly from Mesopotamia between 4000 and 3000 BC. This is 6000 to 5000 years ago. The midpoint of this is 5500 years, while the midpoint of the admixture into the Syrians, who were on the edge of the Uruk Civilization is 3800 years ago. Basically, I think the evidence points to various statistical genomic artifacts reducing the age from when the admixture truly occurred (this has long been a problem in this field).
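The date arithmetic in the last few paragraphs is easy to check by hand, but a small sketch makes the conversions explicit. It takes the "years ago" ranges quoted from the preprint and turns their midpoints into rough calendar years. The reference year of 2020 and the helper names are assumptions made for illustration, and the conversion ignores the absence of a year zero. The result depends on the reference year and on whether one uses an interval midpoint or a table's point estimate, so it will not necessarily reproduce any single figure given above.

```python
# Assumption: "years ago" is counted from roughly when the preprint appeared.
REFERENCE_YEAR = 2020

# Admixture date ranges (younger bound, older bound) in years ago,
# copied from the passage quoted from the preprint.
ADMIXTURE_RANGES_YA = {
    "Levant": (3900, 5600),
    "Egypt": (2900, 4700),
    "East Africa": (2200, 3300),
    "Arabia": (2000, 3800),
}


def midpoint_ya(younger, older):
    """Midpoint of a date range expressed in years ago."""
    return (younger + older) / 2


def to_calendar_label(years_ago, reference_year=REFERENCE_YEAR):
    """Rough calendar label for a date given in years ago (no year-zero correction)."""
    year = reference_year - years_ago
    if year <= 0:
        return f"{round(-year)} BC"
    return f"{round(year)} AD"


for region, (younger, older) in ADMIXTURE_RANGES_YA.items():
    mid = midpoint_ya(younger, older)
    print(f"{region}: midpoint {mid:.0f} ya, about {to_calendar_label(mid)}")

# Gap between the Uruk midpoint (5500 ya) and the Syrian admixture midpoint
# (3800 ya), both taken as stated in the text.
uruk_gap_years = 5500 - 3800
print(f"Uruk midpoint precedes the Syrian admixture midpoint by {uruk_gap_years} years")
```

Changing `REFERENCE_YEAR` by a few decades moves every calendar label by the same amount, which is small next to the width of the intervals themselves.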

I honestly have no idea how to relate the expansion of Semitic languages to the expansion of Iranian languages. My friend Patrick Wyman believes that Anatolian farmers spoke Afro-Asiatic. These were very different people from the Iranians, who arrived from the east later. Additionally, history teaches us that Mesopotamia during the Bronze Age was very linguistically diverse. The Sumerians were not Semitic, and neither were their Elamite neighbors in Khuzistan. The Akkadians, who were more prevalent in the north of Mesopotamia, but were present from the beginning of Sumerian history, were Semitic.

There is still a mystery around the great admixture between Neolithic Near Easterners of the west and the east. I don’t think we’ve closed that chapter of the book.

That being said, there is a lot that is “solved” in this paper. For example, these authors seem to confirm that there is no evidence of “first wave” modern humans in Arabian populations earlier than the non-African radiation. Arabians, like other non-Africans, underwent a population expansion 50-70,000 years ago. Their separation from Mbuti Pygmies was gradual up until 120,000 years ago. Then there seems to have been a separation. What this is telling us, I believe, is that the ancestors of non-Africans were part of the African meta-population until 120,000 years ago. This is suspiciously close to the Eemian Interglacial, which dates to between 115,000 and 130,000 years ago. The Eemian was characterized by a “Green Sahara”, so it seems that this is when early modern humans ventured in substantial numbers out of the continent and to its peripheries. One issue that seems notable in the data is that proto-non-Africans seem to have been characterized by a period of isolation and small population size. Perhaps

But 50-70,000 years ago a massive expansion of one of these daughter populations occurred. These data confirm that Arabians seem to have the same Neanderthal admixture as everyone else, but, even accounting for Sub-Saharan African ancestry they also have somewhat less. In alignment with earlier research, they argue that this is due to admixture with “Basal Eurasian” populations which did not mix with Neanderthals ~55,000 years ago.  Or, more precisely, did not carry as much Neanderthal ancestry (it seems plausible that the Basal Eurasian populations are themselves a compound of conventional non-African at the base of the broader splits, and a deeper basal group which lacks Neanderthal ancestry).

Going back to the admixture graph, you notice that both western and eastern farmer populations are a compound of Basal Eurasian and various lineages that are broadly “West Eurasian.” Natufians and Anatolian farmers are descended about half from groups related to European hunter-gatherers, while ancient Neolithic Iranians had ancestry related to these people, but even more to populations distantly related to Ancient North Eurasians (Paleo-Siberians). The events here are distant, but the same proportion of Basal Eurasian ancestry indicates to me a rapidly expanding population at some point which mixed with a well-structured set of groups in the Near East.

The major takeaways

  • Near Easterners are part of the same broad diversification as all other non-Africans
  • The expansion of these non-Africans dates to 50-70,000 years ago
  • Archaeological evidence points to a very intense expansion in the period around ~50,000 years ago, and admixture with Neanderthals somewhat before then
  • At the beginning of the Holocene Near Easterners were deeply structured regionally, and had threaded together disparate ancestral components (Basal Eurasian, related to European hunter-gatherer and Paleo-Siberian)
  • Late in the Neolithic and early Bronze Age much of this structure collapsed, and there was a massive admixture of Iranian ancestry to the south and west (conversely, there is evidence in other work of admixture of western farmer ancestry to the east)
  • Finally, there is evidence for later incursions of steppe people into the northern Arabian fringe and Fertile Crescent
  • On top of this, there is historical admixture from Africans and, in the north, from Turks and other groups

The genomic landscape of Brazil in 1950


A new whole-genome analysis out of Brazil has some interesting ancestry information. The preprint, Whole-genome sequencing of 1,171 elderly admixed individuals from the largest Latin American metropolis (São Paulo, Brazil):

As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases….

Admixed populations are useful for a lot of reasons. But let’s observe some things about this Brazilian population.

First, it’s old. The average age is 72, so these are people born around 1950. In many ways these are the genetic characteristics of Brazil in 1950, not today. This is why you see so many individuals who self-identify as Asian who are nearly 100% Asian. These individuals are the children of Japanese immigrants. In 1950 the endogamy of the community was high. Today the youngest generation of Japanese Brazilians is 60% mixed.

Second, the ancestry of self-identified Brazilian whites in this sample is mostly European. Like the Japanese, a large number of these individuals are probably the children of European immigrants. I suspect this accounts for many of the 20% of the “white” sample that has no trace of non-European ancestry. But observe that around another 20% has trace proportions (~1%) of non-European ancestry, mostly African. My supposition, in this case, is that these are “old stock” white Brazilians. That is, one or both of their parents descend from Portuguese Brazilians who settled in overwhelmingly European areas and retain some non-European admixture due to long-term residence in Brazil. The remainder are white Brazilians who have substantial non-European ancestry, with a small minority whose proportions are quite high from a North American perspective.

A point of comparison is probably useful. About 95% of non-Hispanic whites in the United States seem to have almost no detectable non-European ancestry using this sort of model-based clustering. This illustrates the massive demographic difference between the USA and Latin American nations. The vast majority of white Latin Americans look quite Iberian, but the majority also have far more non-European ancestry than 95% of North American whites. This is partly a reflection of the smaller population sizes of native peoples in North America, and, the nature of hypodescent for people of any African ancestry in the United States, so that mixed individuals were integrated into African Americans.

Third, the people who are “mixed” and black in Brazil are more European than you might expect. All the estimates of European ancestry I’ve seen for self-identified black Brazilians (a somewhat protean category due to social changes over the past few generations) indicate a higher European ancestry fraction than among African Americans (~20% median in the latter). Self-identified “mixed” Brazilians have more European ancestry than anything else.

The native category is interesting because most of these people have only a minor component of that ancestry. Additionally, a huge number of white, mixed, and black Brazilians have native ancestry. This is not surprising from previous work. Ancestry deconvolution indicates this is an old admixture, and mtDNA lineages are more native than Y chromosomes. There was a sex asymmetry in the early settlement, and native women married into the settler population. Both black and white Brazilians (and mixed) have lots of native ancestry.

Finally, though there is some overlap between these groups (despite their average differences), I assume that the overlap is much greater in contemporary cohorts in terms of genomic ancestry. It will be interesting to see when we get temporal transects in Brazil to see how assortative mating does, or doesn’t, work.

Looking forward to more of this from Latin America. So many opportunities for admixture mapping!

Solute carrier family genes are important…but how?

Over the last ten years David Reich and other researchers have been constructing what is basically an atlas of human demographic history. Taking the genealogies written in our DNA, mapping them onto population bifurcations and admixtures, and synthesizing that back together with what we know from history and archaeology.

To a great extent, this is a project of human phylogenomics. Taking genome-wide data and constructing phylogenies out of it (or, perhaps more precisely, graphs, as this is on an intra-species time scale mostly and characterized by lots of gene flow across the “tips” of the tree). But there’s another thing you can do with modern human genomics and evolution: look at patterns of selection within the genome.

The Reich group has already started doing this. For example, they have adduced that the CCR5 delta 32 mutation seems to have emerged out of the Yamnaya horizon.

Last fall, a paper came out in MBE, Ancestry-Specific Analyses Reveal Differential Demographic Histories and Opposite Selective Pressures in Modern South Asian Populations, which I gave a cursory read at the time, but which I’ve now looked at more closely. It takes a “natural experiment,” the emergence of Indian subcontinental populations from a massive admixture between lineages which diverged 40,000 years ago, and looks to see which genetic regions deviate from what you would expect based on the overall genome.

The method is simple: imagine that “Ancestral North Indians” are fixed for an allele at a gene in one state and “Ancestral South Indians” are fixed in the other state. Indian populations are about 50:50 (with a range). If the frequency today in Indian populations is 95% for the allele that is from the “Ancestral North Indians”, one might be suspicious as to what’s going on. Or, vice versa.
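The logic of that thought experiment fits in a few lines. The sketch below computes the allele frequency expected from ancestry proportions alone and compares it with an observed frequency. The function names and the toy numbers are illustrative assumptions, not the paper's actual pipeline, which works from inferred local ancestry and has to account for drift and uncertainty in the reconstructed source populations.

```python
def expected_frequency(alpha_ani, p_ani, p_asi):
    """Allele frequency expected under neutral admixture.

    alpha_ani: genome-wide ancestry fraction from the "Ancestral North Indian" source
    p_ani, p_asi: frequency of the allele in each (reconstructed) source population
    """
    return alpha_ani * p_ani + (1.0 - alpha_ani) * p_asi


def excess_over_expectation(observed, alpha_ani, p_ani, p_asi):
    """Observed frequency minus the admixture expectation (positive = enriched)."""
    return observed - expected_frequency(alpha_ani, p_ani, p_asi)


# Toy values from the paragraph above: the allele is fixed in one source (1.0),
# absent in the other (0.0), and the admixture is 50:50.
expected = expected_frequency(0.5, 1.0, 0.0)
excess = excess_over_expectation(0.95, 0.5, 1.0, 0.0)

print(f"expected under admixture alone: {expected:.2f}")
print(f"observed 0.95 exceeds expectation by {excess:.2f}")
```

A large excess at one locus is only suspicious relative to the spread of excesses across the rest of the genome, so a real scan would rank loci against that empirical distribution rather than against a fixed threshold.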

In the paper, they used whole genomes to reconstruct the ancestral steppe/Iranian population without any residual “Ancient Ancestral South Indian” (AASI), the latter of which has no West Eurasian ancestry. They did the same for the AASI. These reconstructions are always dicey, but they made a good faith effort to check their work. On the whole, that section was impressive. The authors seem to be roughly aligned with the results in Narasimhan et al. 2019. The AASI seems to be homogeneous, with the exception of attempting to model them from donors which were Munda or Burusho, both groups with deep East Asian admixture (illustrating the problem with deconvolution). Second, they show that the AASI are not clustering with the Andamanese, which makes sense since these groups diverged closer to 40,000 years ago. Finally, the steppe/Iranian group looks most like Armenian middle-to-late Bronze Age people. A synthesis of steppe and some Iranian-like ancestry.

But this isn’t the most interesting part of the paper. It’s the selection. Here are the top, top, candidates:

Correlated response is a big story of selection

Adaptation is clearly one of the most important processes in understanding how evolution occurs. In a classical sense, it’s easy to understand. Parallel adaptations in body plans make dolphins and swordfish shaped the same. It’s physics.

But with the emergence of DNA, a lot of the focus on adaptation has been displaced to the signatures of natural selection on the molecular level. Phenotypes are controlled by variation in genotypes, and instead of description and hypothesizing, researchers can actually infer from the genetic patterns the history and arc of adaptation. 

At least that’s the theory.

The initial tests for signatures of natural selection focused on adaptation between species. For example, Tajima’s D. Usually this took the form of comparing variation across two lineages of Drosophila. In the 2000s with genome-wide data new methods predicated on looking at ‘haplotype structure’ (variation across sequences of genes) emerged. Instead of between species, these methods focused on the selection within species (e.g., why are some humans adapted to malaria?). These methods were good at picking up strong signals at a few genes where the selective sweeps were recent.

But as datasets and genomics got bigger and better researchers focused on more fundamental patterns and analyses, such as looking at ‘site frequency spectra.’ Ultimately the goal was to go beyond selection at a single locus (e.g., lactase persistence), and understand polygenic characteristics (e.g., height). Obviously, this is much harder because polygenic characters are distributed across many genetic loci, and issues of statistical power are always going to loom large (and there is the soft vs hard sweep issue too!).

A new preprint is an excellent introduction to this wild world, Disentangling selection on genetically correlated polygenic traits using whole-genome genealogies:

We present a full-likelihood method to estimate and quantify polygenic adaptation from contemporary DNA sequence data. The method combines population genetic DNA sequence data and GWAS summary statistics from up to thousands of nucleotide sites in a joint likelihood function to estimate the strength of transient directional selection acting on a polygenic trait. Through population genetic simulations of polygenic trait architectures and GWAS, we show that the method substantially improves power over current methods. We examine the robustness of the method under uncorrected GWAS stratification, uncertainty and ascertainment bias in the GWAS estimates of SNP effects, uncertainty in the identification of causal SNPs, allelic heterogeneity, negative selection, and low GWAS sample size. The method can quantify selection acting on correlated traits, fully controlling for pleiotropy even among traits with strong genetic correlation (|rg| = 80%; c.f. schizophrenia and bipolar disorder) while retaining high power to attribute selection to the causal trait. We apply the method to study 56 human polygenic traits for signs of recent adaptation. We find signals of directional selection on pigmentation (tanning, sunburn, hair, P=5.5e-15, 1.1e-11, 2.2e-6, respectively), life history traits (age at first birth, EduYears, P=2.5e-4, 2.6e-4, respectively), glycated hemoglobin (HbA1c, P=1.2e-3), bone mineral density (P=1.1e-3), and neuroticism (P=5.5e-3). We also conduct joint testing of 137 pairs of genetically correlated traits. We find evidence of widespread correlated response acting on these traits (2.6-fold enrichment over the null expectation, P=1.5e-7). We find that for several traits previously reported as adaptive, such as educational attainment and hair color, a significant proportion of the signal of selection on these traits can be attributed to correlated response, vs direct selection (P=2.9e-6, 1.7e-4, respectively). 
Lastly, our joint test uncovers antagonistic selection that has acted to increase type 2 diabetes (T2D) risk and decrease HbA1c (P=1.5e-5).

There’s a lot going on here. This is my favorite passage:

To address these issues, we recently developed a full-likelihood method, CLUES, to test for selection and estimate allele frequency trajectories. The method works by stochastically integrating over both the latent ARG using Markov Chain Monte Carlo, and the latent allele frequency trajectory using a dynamic programming algorithm, and then using importance sampling to estimate the likelihood function of a focal SNP’s selection coefficient, correcting for biases in the ARG due to sampling under a neutral model.

Alrighty then! Someone’s a major-league nerd.

The preprint is fine, but ultimately this is something you get a “feel” for by working with models, data, and general analyses in the field. And I don’t have a strong feel since I don’t work with these sorts of data and questions myself. So what do I know? That being said, I like the preprint because it satisfies an intuition I’ve long had: correlated response is a big part of the story of polygenic selection.

Basically, you have to remember that complex traits are subject to variation at a host of genetic positions. And genetic variants rarely have singular effects. That is, one locus usually exhibits pleiotropy. The genetic effect shapes a lot of characteristics. Therefore, if there is a strong selection on a gene, more traits than simply the target of selection will be impacted. In animal breeding making huge, meaty, fast-growing lineages can render them infertile if selection is taken too far. That’s a bad correlated response.
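One standard way to make correlated response concrete is the multivariate breeder's equation from quantitative genetics (the Lande equation), in which the per-generation change in trait means is the genetic variance-covariance matrix G multiplied by the vector of direct selection gradients. This is a textbook model used here for illustration, not the method of the preprint, and the numbers below are hypothetical.

```python
def response_to_selection(G, beta):
    """Multivariate breeder's equation: delta_z = G * beta.

    G: additive genetic variance-covariance matrix (list of rows)
    beta: direct selection gradient on each trait
    Returns the expected change in each trait mean per generation.
    """
    return [sum(g_ij * b_j for g_ij, b_j in zip(row, beta)) for row in G]


# Hypothetical two-trait example in standardized units.
# The genetic correlation is 0.4 / sqrt(0.5 * 0.5) = 0.8.
G = [
    [0.5, 0.4],  # genetic variance of trait 0, covariance with trait 1
    [0.4, 0.5],  # covariance with trait 0, genetic variance of trait 1
]
beta = [0.1, 0.0]  # selection acts directly on trait 0 only

delta_z = response_to_selection(G, beta)
print(f"trait 0 (directly selected): {delta_z[0]:+.3f}")
print(f"trait 1 (no direct selection): {delta_z[1]:+.3f}")
```

Trait 1 shifts by nearly as much as trait 0 even though nothing selects on it directly, which is why a signal of "selection on" a trait cannot by itself say which trait was the target.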

After correcting for the genetic correlation the authors note that some traits, such as EDU and hair color, are not really selected directly at all. This is like the fact that we know EDAR is associated with hair thickness and is a strong target of selection. We have no idea what the trait of interest is. But it’s a pretty big deal. All these quantitative traits controlled by variation across the genome are being reshaped by adaptation on other traits. What are those traits? This preprint doesn’t answer that really.

Hopefully, we’ll make some headway in the 2020s because we’re definitely looking through the mirror darkly.

Knanaya & Kerala: perhaps there is something different down south?


Over the past few months I have been getting together some samples from people from Kerala, with a focus on Knanaya Christians. The Knanaya are a subset of the broader St. Thomas Christian community, and two things have jumped out in my analyses:

– they are quite endogamous

– they are shifted off the ‘India-cline’

More precisely, like Cochin and Mumbai Jews, they are often shifted toward Middle Eastern populations. This is relevant because the Knanaya believe themselves, like most St. Thomas Christians, descended in part from Jews or Christians from the Middle East.

All that being said, looking more deeply into the data I’m not quite as sure. One of the reasons is that Kerala may not be as “structured” as other parts of India. Some of this is well known. The Nair samples I have are shifted toward South Indian Brahmins, which is plausible in light of connections between Nairs and Brahmins. The Brahmin-adjacent Ambalavasi seem quite similar to Brahmins. These are not surprising. But, Kerala samples I have as a whole seem notably shifted on the India cline more toward the “north” than I would have expected. This could be due to gene flow from without and within Kerala, in a way that is not typical in other parts of the subcontinent.

I say this because even the Ezhava, who were basically what we’d call a Dalit community (no longer today), show a shift.

Hard sweeps and natural selection obscured by Bronze Age admixture

The above is the map from the Online Ancient Genome Repository. You can see the variation by region. There’s a lot of ancient DNA in Europe. Very little in Asia. And only moderate amounts elsewhere.

The map is from a new preprint, Ancient human genomes reveal a hidden history of strong selection in Eurasia:

The role of selection in shaping genetic diversity in natural populations is an area of intense interest in modern biology, especially the characterization of adaptive loci. Within humans, the rapid increase in genomic information has produced surprisingly few well-defined adaptive loci, promoting the view that recent human adaptation involved numerous loci with small fitness benefits. To examine this we searched for signatures of hard sweeps – the selective fixation of a new or initially rare beneficial variant – in 1,162 ancient western Eurasian genomes and identified 57 sweeps with high confidence. This unexpectedly extensive signal was concentrated on proteins acting at the cell surface, and potential selection pressures include cold adaptation in early Eurasian populations, and oxidative stress from carbohydrate-rich diets in farming populations. Critically, these sweep signals have been obscured in modern European genomes by subsequent population admixture, especially during the Bronze Age (5-3kya) and empires of classical antiquity.

So the “big thing” that they found here is that admixture obscures signals of selection. More precisely, it obscures signals of hard selective sweeps, the classical variant where a single position in a single haplotype rises up in frequency rapidly due to positive selection.

If you read further into the paper you note that they believe admixture, due to the mixing of backgrounds, attenuates the signal of hard sweeps, and may even imply that these hard sweeps are soft sweeps through the mixing of distinct genetic backgrounds. I honestly didn’t follow that too closely, but I guess it depends on the selection coefficient and rate of mixing. They are reporting lots of selection events of >1%, and I wonder about how credible this is (Haldane’s dilemma?).

That being said, the functional significance of these selection events is important. Basically, they look like adaptations to climate and changes in diet. What the authors seem to be suggesting here is that the shift in lifestyle and expansion of farmers in the early Holocene was a pretty big deal, and the mixing between various divergent streams during the Bronze Age muddled the signals.

If the authors are right, that means that ancient DNA is going to be very big for understanding the trajectory of selection, because it’s not just going to be subtle polygenic changes.

Blood group A at greater risk from COVID-19 (maybe)

To a great extent much of the population genetics of humans in the 20th century that doesn’t involve external traits is the population genetics of blood groups. A, B, and O, along with Rhesus factor. Read L. L. Cavalli-Sforza and William Bodmer’s The Genetics of Human Populations, the first edition of which was written in the 1960s. The emergence of more genetic markers, and Y, mtDNA, and genome-wide analysis has marginalized the exploration of population genetic variation of ABO. But it’s still useful. And it’s still functionally important (there’s a reason that A and B groups evolved!).

Many years ago while reading Alan Templeton’s Population Genetics and Microevolutionary Theory I stumbled upon the fact that spontaneous abortion (miscarriage) is associated with blood group differences between mother and fetus on the ABO blood groups. Basically, women who are O (and so genotype OO) have issues with fetuses that express A or B antigen. This isn’t deterministic, just a change in probabilities (I’m A, my wife is O, and our children are a mix, as my genotype is AO).
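The family example follows from ordinary Mendelian inheritance at the ABO locus, where A and B are codominant and O is recessive. A minimal sketch that enumerates the cross (the function names are just for illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def phenotype(genotype):
    """Blood group expressed by a two-allele ABO genotype such as 'AO' or 'AB'."""
    alleles = set(genotype)
    if alleles == {"O"}:
        return "O"
    if {"A", "B"} <= alleles:
        return "AB"
    return "A" if "A" in alleles else "B"


def offspring_phenotypes(parent1, parent2):
    """Expected phenotype proportions among children of two ABO genotypes."""
    # Each parent passes on one of their two alleles with equal probability.
    counts = Counter(phenotype(a + b) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {ph: Fraction(n, total) for ph, n in counts.items()}


# An AO father (blood group A) and an OO mother (blood group O).
cross = offspring_phenotypes("AO", "OO")
for group, share in cross.items():
    print(f"{group}: {share}")
```

Half the children of this cross are expected to express the A antigen and half to be O, which is the "mix" described above.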

ABO has also been associated with different risks to different diseases (e.g., it is well known that those who express blood group B are more at risk for Hepatitis B).

So with that, a new preprint, ABO blood group and susceptibility to severe acute respiratory syndrome:

…The ABO group in 3694 normal people in Wuhan showed a distribution of 32.16%, 24.90%, 9.10% and 33.84% for A, B, AB and O, respectively, versus the distribution of 37.75%, 26.42%, 10.03% and 25.80% for A, B, AB and O, respectively, in 1775 COVID-19 patients from Wuhan Jinyintan Hospital. The proportion of blood group A and O in COVID-19 patients were significantly higher and lower, respectively, than that in normal people (both P < 0.001). Similar ABO distribution pattern was observed in 398 patients from another two hospitals in Wuhan and Shenzhen. Meta-analyses on the pooled data showed that blood group A had a significantly higher risk for COVID-19 (odds ratio-OR, 1.20; 95% confidence interval-CI 1.02~1.43, P = 0.02) compared with non-A blood groups, whereas blood group O had a significantly lower risk for the infectious disease (OR, 0.67; 95% CI 0.60~0.75, P < 0.001) compared with non-O blood groups. In addition, the influence of age and gender on the ABO blood group distribution in patients with COVID-19 from two Wuhan hospitals (1,888 patients) were analyzed and found that age and gender do not have much effect on the distribution…

From their data, it looks like A individuals were:

1) more likely to get infected
2) more likely to have severe responses

The individual difference is modest. You aren’t invulnerable if you are O. But, this might impact the course and severity of COVID-19 as it runs through populations…
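To get a feel for how modest, here is a rough sketch that turns the percentages quoted above into crude odds ratios (each group versus everyone else). It uses only the Jinyintan Hospital patients and the Wuhan controls, so it is not the paper's pooled meta-analysis and the numbers will not match the quoted ORs exactly; the helper name `odds_ratio` is my own.

```python
def odds_ratio(p_cases, p_controls):
    """Crude odds ratio for a blood group versus all other groups."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls


# Proportions as quoted from the preprint abstract.
patients = {"A": 0.3775, "B": 0.2642, "AB": 0.1003, "O": 0.2580}  # n = 1775
controls = {"A": 0.3216, "B": 0.2490, "AB": 0.0910, "O": 0.3384}  # n = 3694

for group in patients:
    ratio = odds_ratio(patients[group], controls[group])
    print(f"{group}: crude OR = {ratio:.2f}")
```

For this single hospital the crude odds ratio comes out at about 1.28 for A and about 0.68 for O, in the same direction as the pooled estimates quoted above.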

Here is the table:

South China Morning Post has a good write-up. Here are blood group distributions if you don’t know them offhand (they are pre-Columbian):


The complex origins of our species in Africa

The figure to the right illustrates a model put forward in a new paper, Recovering signals of ghost archaic introgression in African populations. This was originally a preprint under the same title, so we’ve discussed the implications extensively. Carl Zimmer has covered the story in The New York Times, while George Busby did so in The Conversation.

Broadly, the results are getting at something which plenty of people have been noticing for many years: when it comes to Sub-Saharan Africans, there is something deeply diverged in West Africans vis-a-vis non-West Africans. These results seem to suggest that the divergence between this outgroup lineage and our own is a bit earlier than the modern-Neanderthal/Denisovan split. There are many abstruse statistical inferences and simulations, and it looks like the reviewers made them do a lot of analyses. But the general result is something other groups have seen as well, so I believe it. Additionally, the admixture of this lineage into West Africans seems to have occurred about 50,000 years ago, suspiciously close to the general expansion of modern humans out of Africa (or the most recent expansion).

From the discussion:

The signals of introgression in the West African populations that we have analyzed raise questions regarding the identity of the archaic hominin and its interactions with the modern human populations in Africa. Analysis of the CSFS in the Luhya from Webuye, Kenya (LWK) also reveals signals of archaic introgression, although our interpretation is complicated by recent admixture in the LWK that involves populations related to western Africans and eastern African hunter-gatherers (section S8) (20). Non-African populations (Han Chinese in Beijing and Utah residents with northern and western European ancestry) also show analogous patterns in the CSFS, suggesting that a component of archaic ancestry was shared before the split of African and non-African populations. A detailed understanding of archaic introgression and its role in adapting to diverse environmental conditions will require analysis of genomes from extant and ancient genomes across the geographic range of Africa.

This work seems more a question than an answer.

Indian ancestry in maritime Southeast Asia

In the comments, people keep asking about Indonesia, and Java in particular. The reason is pretty simple: before wholesale conversion to Islam, maritime Southeast Asia was dominated at the elite level by Indic social and religious forms. I say “Indic” because, unlike in mainland Southeast Asia, Theravada Buddhism did not supplant other Indian religions. In fact, while the indigenous Buddhism that led to the Borobudur temple complex in the 9th century went extinct, Hinduism persisted for quite a bit longer and persists to this day. Not only are there long-standing Hindu traditions in Bali, but far eastern Java remained a Hindu kingdom until 1770, and there remain Javanese Hindus (some of them are recent converts).

As several mainland Southeast Asian groups seem to have Indian admixture, what is the evidence for Indonesia? (The Singapore genome data offers up some Malays; though only some show recent Indian admixture, all of them have some Indian admixture.) Luckily, there is a paper and data, Complex Patterns of Admixture across the Indonesian Archipelago. It uses the GLOBETROTTER framework, so I decided to reanalyze the data in a simpler manner, adding the Cambodians as a check (since from my previous posts you know a fair amount about that as a baseline).

Three points.

1) Definitely gene flow. But on the whole less than mainland Southeast Asia?

2) Lots of heterogeneity. Not surprising. The Sumatra samples seem to be taken from Aceh. This may matter a great deal.

3) In mainland Southeast Asia east of Burma there hasn’t been lots of colonial migration of Indians, nor a great deal of trade. The opportunities within maritime Southeast Asia for contact with outsiders are far greater. The inspection of results from Malaysia indicates continuous gene flow over a long period of time. In contrast, the results from Thailand and Cambodia indicate an early pulse.

The Indian admixture into Southeast Asia is not just a function of distance

In the comments to the post below about Indian ancestry in Thailand, some observed that this should not be surprising due to reciprocal gene flow and proximity. Implicitly, I think what is being suggested here is that there is isolation by distance and continuous gene flow. Obviously some of this is true, but there are details here which suggest that it is not simply geography at work.

The reason I was curious about the Dusun people in coastal Borneo is that while Malays all seem to have Indian ancestry, many tribal Austronesian groups in maritime Southeast Asia do not. The Indian admixture into the Malays is not just recent. Some of it seems quite a bit older than the colonial period.

In the context of Southeast Asia, it seems that some of the more ancient Austro-Asiatic people, in particular, the Mon and Khmer, have Indian ancestry, and groups which mixed with Austro-Asiatic substrates, such as Burmans and Thai, also have this.

Additionally, some groups in the northeastern states of India have less “Indian” admixture than the Thai and Khmer. To show this, see this PCA:
