On this week’s episode of The Insight, I talked to Matt Hahn about why he wrote his new book, his opinions on “Neutral Theory”, and what he thought about David Reich’s op-ed. Without Spencer’s supervision, I have to admit that I think I lost control and just went “full nerd”. Next week we’re dropping Carl Zimmer’s podcast, so rest assured that the world will come back into balance, and The Insight will be more welcoming to civilians!
At a certain point, Matt and I were discussing allele frequency differences between populations, and he came close to saying that all such differences between human populations are modest in pairwise comparisons (e.g., 40% vs. 49%). Obviously, this is not true, because there is always the huge difference in SLC24A5 at SNP rs1426654 (and at Duffy and a few other loci). A substitution of an A for the ancestral G converts the codon from alanine to threonine.
You have heard of this locus because of a 2005 paper, “SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans.” This paper came out in December of 2005, a few years after Armand Leroi wrote in Mutants that geneticists still hadn’t come to grips with normal variation in pigmentation in humans. The above publication was the first step in solving this question between 2005 and 2010, at least to a good first approximation.
In the sample in the paper, this single genetic change explains 25-40% of the variation in melanin index between Africans and Europeans (for various technical reasons it’s probably not that big an effect, though it is still big, and probably the largest-effect quantitative trait locus for pigmentation in the human genome).
It turns out that this mutation, the derived variant, is almost disjoint in frequency between Europeans and Africans. That is, ~100% of Africans carry the ancestral G base while ~0% of Europeans carry the G base (as opposed to the A base). Interestingly, East Asians carry the G base at ~100% frequency as well. If you genotype an anonymous individual and their genotype is AG or GG at rs1426654, then it is highly likely that that individual is not a European.
To give an example of how this works, in 2013 I stumbled onto a paper which genotyped 101 Europeans from Cape Town in South Africa. That means there are 202 alleles (two per person) at rs1426654. Of these, 5 of the alleles were ancestral (G). From this, I immediately concluded that it was highly likely that the Afrikaner people of South Africa have non-European ancestry. I came to this conclusion because 5 copies of the ancestral allele, ~2.5%, is shockingly high for a European population, and it was long surmised that the Afrikaner people had some non-European (Khoisan, Bantu, South and Southeast Asian) ancestry. The majority of the whites sampled in Cape Town could have been Afrikaners (I’ve confirmed this with genome-wide data).
To get a sense of where my intuitions come from, you need to look at allele counts within populations. Using 1000 Genomes, Yale’s Alfred, and Gnomad I assembled a representative list to give you a sense of what’s going on. Of the 126,548 alleles counted in Gnomad for individuals of European (non-Finnish) descent, 486, or 0.38% of the total, are ancestral.
|Population|Ancestral alleles|Total alleles|Ancestral allele frequency|
|---|---|---|---|
|Greeks (Thrace, Athens)|0|184|0%|
|Pandit Brahmin, Kashmir|0|40|0%|
|Havyaka Brahmin, Karnataka|2|62|3%|
|Uttar Pradesh Brahmin|4|34|12%|
|Pandit Brahmin, Haryana|13|78|17%|
|Sri Lanka Tamil|105|204|51%|
|Austro-Asiatic tribe, Odisha|43|56|77%|
|Mende Sierra Leone|155|170|91%|
|Austro-Asiatic tribe, Odisha|92|96|96%|
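To make the comparison between the Cape Town sample and the reference counts concrete, here is a minimal Python sketch. It recomputes frequencies from the counts quoted above and asks how often a sample of 202 alleles would contain 5 or more ancestral copies if the true frequency matched the Gnomad European figure (486 of 126,548). The function names are illustrative, and the binomial model is an assumption: it treats every allele as an independent draw, which ignores relatedness and population structure, so the result is a rough guide rather than a formal test.

```python
from math import comb


def allele_frequency(ancestral, total):
    """Fraction of counted alleles that are ancestral."""
    return ancestral / total


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# (population, ancestral alleles, total alleles) as listed in the post
counts = [
    ("Greeks (Thrace, Athens)", 0, 184),
    ("Havyaka Brahmin, Karnataka", 2, 62),
    ("Pandit Brahmin, Haryana", 13, 78),
    ("Sri Lanka Tamil", 105, 204),
    ("Mende Sierra Leone", 155, 170),
]

for name, ancestral, total in counts:
    print(f"{name}: {allele_frequency(ancestral, total):.0%}")

# Gnomad non-Finnish European baseline quoted in the post
baseline = allele_frequency(486, 126_548)

# Cape Town sample: 5 ancestral alleles out of 202
cape_town = allele_frequency(5, 202)
tail = binomial_tail(5, 202, baseline)

print(f"baseline frequency:  {baseline:.2%}")
print(f"Cape Town frequency: {cape_town:.1%}")
print(f"P(>= 5 ancestral of 202 at baseline): {tail:.4f}")
```

Under these assumptions the tail probability comes out well under 1%, which is the quantitative version of "shockingly high for a European population."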
Last fall Crawford et al. reported that rs1426654 is embedded in a haplotype that’s ~30,000 years old. Additionally, they contend that its presence within Africa probably dates back no earlier than the Holocene, the last ~12,000 years. Martin et al. report that KhoeSan exhibit higher frequencies of the derived allele because of Eurasian back-migration and then in situ natural selection. Of course, not all Eurasians carry the derived allele. Most East Asians have the ancestral variant of rs1426654.
This leaves us with West Eurasians, North Africans, and South Asians. I’ve put a few South Asian populations in the list to show you that there is a wide range of variation in allele frequencies. The South Asians in Gnomad, probably mostly from the diaspora, carry the ancestral variant at a frequency of only 22%. In contrast, Austro-Asiatic-speaking South Asian groups from northeast India have very high frequencies of the ancestral variant. There has clearly been in situ selection in some South Asian populations for the derived variant at rs1426654. Ancestral North Indian groups (ANI) probably brought the derived allele, and Ancient Ancestral South Indians (AASI) probably tended to carry the ancestral allele, like East Eurasians and Oceanians. Additionally, South Asian populations often have high drift. Some of the differences in the Alfred data seem to be impacted by this.
The situation in the Middle East, North Africa, and Europe is different. In the Middle East and North Africa, the ancestral variant is present at frequencies around 1-10%. Some of this can probably be attributed to admixture from Africa and, in some cases, from South and East Asian populations. Ancient DNA from the Middle East and North Africa presents a mixed picture. The farmers who brought the Neolithic to Europe carried the derived variant at rs1426654, and some of the ancient Middle Eastern samples carry it. But not all. The recent Iberomaurusian samples, which date to ~15,000 years ago, don’t seem to have had the derived variant.
Though the hunter-gatherers of Western Europe seem to have carried only the ancestral variant at rs1426654, the hunter-gatherers of Scandinavia and Eastern Europe did exhibit the derived variant at some frequency, though lower than in modern Europeans.
My own hunch is that the original genetic background against which the A mutation at rs1426654 emerged will be found increasing in frequency first somewhere in the Near East after the Last Glacial Maximum. But no ancient population shows the frequencies of the derived variant we see in modern Europeans. In isolated populations subject to drift it wouldn’t be surprising if the ancestral variant decreased to ~0%. But in European populations today, in the vast majority of cases, the frequency of the ancestral variant is far lower than 1%, even though we know that within the last 10,000 years the ancestral population streams included several groups with very high frequencies of that ancestral variant. The low frequency is not due to a freakish bottleneck all across Europe. It has to be selection.
One thing I have pointed out is that this very low frequency of the ancestral variant indicates that the advantage at rs1426654 for the A allele in Europe is additive. In Northern Europe, the frequency of the derived variant that confers lactase persistence tops out at ~90 percent. We know this region of the genome has been targeted by natural selection, but lactase persistence also happens to be genetically dominant. That is, one copy of the mutant allele confers the phenotype. Once the derived variant hits ~90 percent, only ~1 percent of the population would be lactose-intolerant homozygotes (two copies of the ancestral variant). In the Gnomad sample of 60,000+ Europeans, they count three ancestral homozygote genotypes at rs1426654. That’s 0.005%.
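The arithmetic and the dominance argument here can be sketched in a few lines of Python. The first part reproduces the two figures quoted above under Hardy-Weinberg proportions. The second part is a textbook one-locus selection recursion, not anything taken from the cited papers, comparing a fully dominant advantage (h = 0) with an additive one (h = 0.5). The selection coefficient, starting frequency, and generation count are hypothetical values chosen only for illustration (400 generations is roughly 10,000 years at an assumed 25 years per generation), and all function names are made up for this sketch.

```python
def homozygote_fraction(q):
    """Expected share of ancestral homozygotes under Hardy-Weinberg proportions."""
    return q * q


def next_ancestral_freq(q, s, h):
    """One generation of selection against the ancestral allele.

    Relative fitnesses: derived homozygote 1, heterozygote 1 - h*s,
    ancestral homozygote 1 - s. With h = 0 the derived allele is fully
    dominant; with h = 0.5 its advantage is additive.
    """
    p = 1.0 - q
    w_het = 1.0 - h * s
    w_anc = 1.0 - s
    mean_fitness = p * p + 2 * p * q * w_het + q * q * w_anc
    return (q * q * w_anc + p * q * w_het) / mean_fitness


def ancestral_freq_after(generations, q0, s, h):
    """Iterate the recursion from starting frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_ancestral_freq(q, s, h)
    return q


# Figures quoted in the post
lactase_homozygotes = homozygote_fraction(0.10)  # ancestral allele at ~10%
gnomad_individuals = 126_548 // 2                # two alleles per person
observed_homozygote_share = 3 / gnomad_individuals

# Hypothetical parameters, for illustration only
S = 0.05           # assumed selection coefficient
GENERATIONS = 400  # ~10,000 years at an assumed 25 years per generation
START_Q = 0.5      # assumed starting ancestral frequency

dominant_q = ancestral_freq_after(GENERATIONS, START_Q, S, h=0.0)
additive_q = ancestral_freq_after(GENERATIONS, START_Q, S, h=0.5)

print(f"lactase: expected ancestral homozygotes {lactase_homozygotes:.1%}")
print(f"rs1426654: observed ancestral homozygotes {observed_homozygote_share:.3%}")
print(f"ancestral frequency, dominant advantage: {dominant_q:.4f}")
print(f"ancestral frequency, additive advantage: {additive_q:.6f}")
```

With these illustrative values, the dominant case leaves the ancestral allele lingering at the percent level, because selection only "sees" it in rare homozygotes, while the additive case keeps removing it in heterozygotes and drives it orders of magnitude lower. The exact numbers depend entirely on the assumed parameters; only the contrast between the two cases matters for the argument.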
Something is happening at rs1426654. Selection. But why? No one really has any explanation beyond the obvious.