Back to Africa migration & Deep Learning


Revisiting the Out of Africa event with a novel Deep Learning approach:

Anatomically modern humans evolved around 300 thousand years ago in Africa. Modern humans started to appear in the fossil record outside of Africa about 100 thousand years ago though other hominins existed throughout Eurasia much earlier. Recently, several researchers argued in favour of a single out of Africa event for modern humans based on whole-genome sequences analyses. However, the single out of Africa model is in contrast with some of the findings from fossil records, which supports two out of Africa, and uniparental data, which proposes back to Africa movement. Here, we used a novel deep learning approach coupled with Approximate Bayesian Computation and Sequential Monte Carlo to revisit these hypotheses from the whole genome sequence perspective. Our results support the back to Africa model over other alternatives. We estimated that there are two successive splits between Africa and out of African populations happening around 60-80 thousand years ago and separated by 12-13 thousand years. One of the populations resulting from the more recent split has to a large extent replaced the older West African population while the other one has founded the out of Africa populations.

This is basically Dienekes Pontikos’ old Afrasian/Paleo-African Hypothesis. These models are not crazy. We have so much data that there is room for them. The problem with “Deep Learning” is that most of us are not very familiar with these methods, so it is hard to get a sense of whether the results are just crazy artifacts. Even the authors say “it is still challenging to measure the significance of a prediction performed by NN, given that it is a black-box approach.”

Is there going to be a future where we just throw all the data into machine-learning approaches and let them converge to the best/most likely models? Perhaps. The problem seems to be that human population history is a bit more complicated than we’d thought twenty years ago, requiring more complicated methods to figure out the details.

Neanderthal introgression at COVID-19 severity locus at high frequency in South Asians

To review: as most of you know, ~2% of the ancestry outside of Africa is attributable to Neanderthals. The fraction is a bit higher in East Asia, a bit lower in Europe, and lower still in the Near East. That being said, a disproportionate fraction of the Neanderthal ancestry is found in non-genic regions, which implies there’s been “purification,” that is, negative selection. The implication is that Neanderthal variation doesn’t “work well” with the genetic background of modern humans, the main lineage of Neo-Africans.

But there are exceptions to this, whether through natural selection (the Neanderthal variant was beneficial) or drift.

Svante Paabo’s group has discovered a new candidate for introgression with a newsy twist, The major genetic risk factor for severe COVID-19 is inherited from Neandertals:

A recent genetic association study (Ellinghaus et al. 2020) identified a gene cluster on chromosome 3 as a risk locus for respiratory failure in SARS-CoV-2. Recent data comprising 3,199 hospitalized COVID-19 patients and controls reproduce this and find that it is the major genetic risk factor for severe SARS-CoV-2 infection and hospitalization (COVID-19 Host Genetics Initiative). Here, we show that the risk is conferred by a genomic segment of ~50 kb that is inherited from Neandertals and occurs at a frequency of ~30% in south Asia and ~8% in Europe.

First, they need to change the title, from “The” to “A”. The study they’re piggybacking off of, Genomewide Association Study of Severe Covid-19 with Respiratory Failure, looked at several thousand Spanish and Italian individuals and found that a particular SNP was a risk variant. Paabo’s group found that this SNP is embedded in a haplotype of Neanderthal origin. There are other genetic risk loci, probably in Africans and other non-Europeans.

Within European populations, the odds ratio of severe respiratory failure, all else being equal, is ~1.75 if you are a heterozygote for the risk allele, and ~3.0 if you are a homozygote. This is not nothing, though presumably hypertension and all the other covariates are still a massive deal.
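To get a feel for what those odds ratios mean, here is a minimal sketch that converts an odds ratio into an absolute risk. The 5% baseline risk for non-carriers is a made-up number for illustration only, not a figure from either study, and the function name is mine.

```python
def risk_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return a probability."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    odds = baseline_odds * odds_ratio
    return odds / (1.0 + odds)


# ASSUMPTION: hypothetical baseline risk of severe respiratory failure
# among non-carriers. Chosen only to make the arithmetic concrete.
BASELINE_RISK = 0.05

# Odds ratios quoted in the text above.
OR_HETEROZYGOTE = 1.75
OR_HOMOZYGOTE = 3.0

het_risk = risk_from_odds_ratio(BASELINE_RISK, OR_HETEROZYGOTE)
hom_risk = risk_from_odds_ratio(BASELINE_RISK, OR_HOMOZYGOTE)

print(f"non-carrier:  {BASELINE_RISK:.3f}")
print(f"heterozygote: {het_risk:.3f}")
print(f"homozygote:   {hom_risk:.3f}")
```

Note that 1.75 squared is about 3.06, so the homozygote figure is close to what you would expect if each copy of the risk allele multiplied the odds by the same factor.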

The authors highlight the South Asian angle, but they focused on the 1000 Genomes. One of the SNPs in perfect LD (1.0) with the causal variant is on the major SNP arrays. I computed its frequency in a range of modern and ancient (page down for the ancient) populations. You can see the pattern below.

  1. It is a huge deal in South Asians
  2. It is found in high frequency in Papuans too…
  3. Non-trivial fractions in Near Eastern groups as well as Europeans
  4. Found in some ancient samples
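For anyone who wants to do the same kind of tally, a risk-allele frequency is just allele counting over genotypes. The sketch below is not my actual pipeline; the population labels and genotype counts are invented for illustration, and the carrier estimate assumes Hardy-Weinberg proportions.

```python
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts at one biallelic SNP in one population."""
    hom_ref: int   # individuals with zero copies of the risk allele
    het: int       # individuals with one copy
    hom_risk: int  # individuals with two copies


def risk_allele_frequency(counts: GenotypeCounts) -> float:
    """Fraction of chromosomes carrying the risk allele."""
    n_individuals = counts.hom_ref + counts.het + counts.hom_risk
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    return (counts.het + 2 * counts.hom_risk) / (2 * n_individuals)


def carrier_fraction_hwe(allele_freq: float) -> float:
    """Expected fraction with at least one copy, ASSUMING Hardy-Weinberg."""
    return 1.0 - (1.0 - allele_freq) ** 2


# ASSUMPTION: made-up counts for two hypothetical populations.
populations = {
    "pop_A": GenotypeCounts(hom_ref=49, het=42, hom_risk=9),
    "pop_B": GenotypeCounts(hom_ref=85, het=14, hom_risk=1),
}

frequencies = {name: risk_allele_frequency(c) for name, c in populations.items()}

for name, freq in frequencies.items():
    print(f"{name}: allele freq {freq:.3f}, "
          f"expected carriers {carrier_fraction_hwe(freq):.3f}")
```

With small ancient samples the counts are tiny, so the resulting frequencies are noisy and should be read loosely.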

My assumption is that there is some selective benefit from this. Notice how in Northeast Asians it’s almost totally expunged. Perhaps ancient coronavirus sweeps? I don’t know.

(rs10490770 by the way)

Update: In SE Asia a lot of the frequency seems likely attributable to South Asian or Negrito ancestry. Borneo Austronesians have low frequencies of the risk allele. The Burmese figure of 0.075% could be due just to South Asian ancestry. It looks like there is some correlation between ASI/AASI ancestry and the risk allele.
