
Selection is going on with SLC24A5….

The ancestral allele for rs1426654 at SLC24A5

 
On this week’s episode of The Insight, I talked to Matt Hahn about why he wrote his new book, his opinions on “Neutral Theory”, and what he thought about David Reich’s op-ed. Without Spencer’s supervision, I have to admit that I think I lost control and just went “full nerd”. Next week we’re dropping Carl Zimmer’s podcast, so rest assured that the world will come back into balance, and The Insight will be more welcoming to civilians!

At a certain point, Matt and I were discussing allele frequency differences between populations and he came close to saying that all such differences between human populations were modest in pairwise comparisons (e.g., 40% vs. 49%). Obviously, this is not true, because there is always the huge difference in SLC24A5 at SNP rs1426654 (as well as at Duffy and a few other loci). At rs1426654, substituting an A for the ancestral G converts the codon from alanine to threonine.

You have heard of this locus because of a paper in 2005, “SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans.” This paper came out in December of 2005, a few years after Armand Leroi wrote in Mutants that geneticists still hadn’t come to grips with normal variation in pigmentation in humans. The above publication was the first step in solving this question in the years between 2005 and 2010, at least to a good first approximation.

In the paper’s sample, this single genetic change explains 25-40% of the variation in melanin index between Africans and Europeans (for various technical reasons it’s probably not that big an effect, though it is still big, and probably the largest-effect quantitative trait locus for pigmentation in the human genome).

It turns out that this mutation, the derived variant, is almost disjoint in frequency between Europeans and Africans. That is, ~100% of Africans carry the ancestral G base while ~0% of Europeans carry the G base (as opposed to the A base). Interestingly, East Asians carry the G base at ~100% frequency as well. If you genotype an anonymous individual and their genotype is AG or GG at rs1426654 then it is highly likely that that individual is not a European.

To give an example of how this works, in 2013 I stumbled onto a paper which genotyped 101 Europeans from Cape Town in South Africa. That means there are 202 alleles (two per person) at rs1426654. Of these, 5 of the alleles were ancestral (G). From this, I immediately concluded that it was highly likely that the Afrikaner people of South Africa have non-European ancestry. I came to this conclusion because 5 copies of the ancestral allele, ~2.5%, is shockingly high for a European population, and it was long surmised that the Afrikaner people had some non-European (Khoisan, Bantu, South and Southeast Asian) ancestry. The majority of the whites sampled in Cape Town could have been Afrikaners (I’ve confirmed this with genome-wide data).

To get a sense of where my intuitions come from you need to look at allele counts within populations. Using 1000 Genomes, Yale’s Alfred, and Gnomad I assembled a representative list to give you a sense of what’s going on. Of the 126,548 alleles counted in Gnomad for individuals of European (non-Finnish) descent, 486, or 0.38% of the total, are ancestral.

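| Population | Ancestral alleles | Total alleles | Freq |
|---|---|---|---|
| Samaritan | 0 | 74 | 0% |
| Basque | 0 | 216 | 0% |
| Greeks (Thrace, Athens) | 0 | 184 | 0% |
| Burusho | 0 | 50 | 0% |
| Pandit Brahmin, Kashmir | 0 | 40 | 0% |
| European (Non-Finnish) | 486 | 126548 | 0% |
| Ashkenazi Jewish | 47 | 10148 | 0% |
| European (Finnish) | 329 | 25790 | 1% |
| Iraq Kurds | 1 | 68 | 2% |
| Yemenite Jews | 2 | 78 | 3% |
| Havyaka Brahmin, Karnataka | 2 | 62 | 3% |
| Palestinian | 4 | 122 | 3% |
| Gujarati | 10 | 206 | 5% |
| Tunisian Berber | 6 | 110 | 5% |
| Andalusian | 14 | 252 | 6% |
| Iranian | 6 | 84 | 7% |
| Pashtun | 21 | 190 | 11% |
| Uttar Pradesh Brahmin | 4 | 34 | 12% |
| Pandit Brahmin, Haryana | 13 | 78 | 17% |
| Punjabi | 42 | 192 | 22% |
| South Asian | 6921 | 30774 | 22% |
| Kalash | 14 | 48 | 29% |
| Telugu | 71 | 204 | 35% |
| Bangladeshi | 80 | 172 | 47% |
| Sri Lanka Tamil | 105 | 204 | 51% |
| Adi-Dravida, Karnataka | 21 | 34 | 62% |
| Masai Kenya | 192 | 286 | 67% |
| Austro-Asiatic tribe, Odisha | 43 | 56 | 77% |
| Luhya Kenya | 155 | 188 | 82% |
| Hausa | 68 | 76 | 90% |
| Mende Sierra Leone | 155 | 170 | 91% |
| Gambian | 209 | 226 | 92% |
| Ibo | 90 | 94 | 96% |
| Austro-Asiatic tribe, Odisha | 92 | 96 | 96% |
| Esan Nigeria | 193 | 198 | 97% |
| Yoruba Nigeria | 213 | 216 | 99% |
| Biaka | 135 | 136 | 99% |
| East Asian | 18728 | 18856 | 99% |
| Ghana | 140 | 140 | 100% |
| Mbuti | 74 | 74 | 100% |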
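
The Cape Town inference can be roughed out numerically. This is a minimal sketch, not a proper test: it assumes every allele is an independent draw (a simplification, since alleles within a person and within families are correlated) and takes the Gnomad non-Finnish European count as the expected rate. The function name is made up for illustration.

```python
from math import comb

def binomial_tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    below = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min))
    return 1.0 - below

# Baseline: 486 ancestral alleles out of 126,548 (Gnomad, non-Finnish European)
baseline = 486 / 126548

# Cape Town sample: 101 people -> 202 alleles, 5 of them ancestral
n_alleles = 2 * 101
observed = 5

expected = n_alleles * baseline
p_tail = binomial_tail(observed, n_alleles, baseline)

print(f"baseline frequency: {baseline:.2%}")
print(f"expected ancestral alleles in {n_alleles}: {expected:.2f}")
print(f"P(at least {observed}) under the baseline: {p_tail:.4f}")
```

Under these assumptions fewer than one ancestral allele is expected in a sample of that size, so five copies are hard to square with purely European ancestry.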

Last fall Crawford et al. reported that rs1426654 is embedded in a haplotype that’s ~30,000 years old. Additionally, they contend that its presence within Africa probably dates to no earlier than the Holocene, the last ~12,000 years. Martin et al. report that KhoeSan exhibit higher frequencies of the derived allele because of Eurasian back-migration followed by in situ natural selection. Of course, not all Eurasians carry the derived allele: most East Asians have the ancestral variant of rs1426654.

This leaves us with West Eurasians, North Africans, and South Asians. I’ve put a few South Asian populations in the list to show you that there is a wide range of variation in allele frequencies. The South Asians in Gnomad, probably mostly Diaspora, have the ancestral variant at only 22%. In contrast, Austro-Asiatic speaking South Asian groups from northeast India have very high frequencies of the ancestral variant. There has clearly been in situ selection in some South Asian populations for the derived variant at rs1426654. Ancestral North Indian groups (ANI) probably brought the derived allele, and Ancient Ancestral South Indians (AASI) probably tended to carry the ancestral allele, like East Eurasians and Oceanians. Additionally, South Asian populations often have high drift. Some of the differences in the Alfred data seem to be impacted by this.

The situation in the Middle East, North Africa, and Europe is different. In the Middle East and North Africa, the ancestral variant is present at frequencies around 1-10%. Some of this can probably be attributed to admixture from Africa and in some cases South and East Asian populations. Ancient DNA from the Middle East and North Africa presents a mixed picture. The farmers who brought the Neolithic to Europe carried the derived variant at rs1426654, and some of the ancient Middle Eastern samples carry it. But not all. The recent Iberomaurusian samples, which date to ~15,000 years ago, don’t seem to have had the derived variant.

Though the hunter-gatherers of Western Europe only seem to have carried the ancestral variant at rs1426654, the hunter-gatherers of Scandinavia and Eastern Europe did exhibit the derived variant in some frequency, though lower than modern Europeans.

My own hunch is that the original genetic background against which the A mutation at rs1426654 emerged will be found increasing in frequency first somewhere in the Near East after the Last Glacial Maximum. But no ancient population shows the frequencies of the derived variant we see in modern Europeans. In isolated populations subject to drift it wouldn’t be surprising if the ancestral variant decreased to ~0%. But in European populations today in the vast majority of cases the ancestral variant is far lower than 1%, even though we know that within the last 10,000 years the ancestral population streams included several groups with very high frequencies of that ancestral variant. The low frequency is not due to a freakish bottleneck all across Europe. It has to be selection.

One thing I have pointed out is that this very low frequency of the ancestral variant indicates that the advantage at rs1426654 for the A allele in Europe is additive. In Northern Europe, the frequency of the derived variant that confers lactase persistence tops out at around ~90 percent. We know this region of the genome has been targeted by natural selection, but lactase persistence also happens to be genetically dominant. That is, one copy of the mutant allele confers the phenotype. Once the derived variant hits ~90 percent, only ~1 percent of the population would be lactose intolerant homozygotes (two copies of the ancestral variant). In the Gnomad sample of 60,000+ Europeans, they count three ancestral homozygote genotypes at rs1426654. That’s 0.005%.
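
The homozygote arithmetic here is easy to check. This is a rough sketch, assuming Hardy-Weinberg proportions (random mating) and assuming the three homozygotes come from the same 126,548-allele Gnomad sample of non-Finnish Europeans quoted earlier. The helper name is illustrative.

```python
def homozygote_fraction(allele_freq: float) -> float:
    """Expected fraction of homozygotes for an allele under Hardy-Weinberg."""
    return allele_freq ** 2

# Lactase persistence: derived allele at ~90%, so the ancestral allele is at ~10%
lactose_intolerant = homozygote_fraction(1 - 0.90)
print(f"ancestral homozygotes at the lactase locus: {lactose_intolerant:.1%}")

# rs1426654 in Gnomad non-Finnish Europeans
ancestral_alleles = 486
total_alleles = 126548
individuals = total_alleles // 2          # two alleles per person
q = ancestral_alleles / total_alleles     # ancestral (G) allele frequency

expected_homozygotes = homozygote_fraction(q) * individuals
observed_homozygotes = 3
observed_pct = 100 * observed_homozygotes / individuals

print(f"individuals: {individuals}")
print(f"expected GG under Hardy-Weinberg: {expected_homozygotes:.2f}")
print(f"observed GG: {observed_homozygotes} ({observed_pct:.3f}%)")
```

Both figures in the paragraph above fall out of this: roughly 1% for lactase and roughly 0.005% for rs1426654.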

Something is happening at rs1426654. Selection. But why? No one really has any explanation beyond the obvious.

16 thoughts on “Selection is going on with SLC24A5….”

  1. “Something is happening at rs1426654. Selection. But why? No one really has any explanation beyond the obvious.”

    Sorry to have to ask, but what is “the obvious”?

  2. From a first glance, the Indian derived percentages seem higher than ANI ancestry, too – does this mean selection happened there as well? And does the hunch about it developing post glacial maximum come from native American alleles being ancestral?

  3. Kalash – Burusho comparison is illuminating here, the former has higher ancestral than South Asian average despite being “peak ANI”, the latter has no ancestral.

  4. what is “the obvious”?

    I think that the notion of the obvious is phenotypic advantage of the skin color at higher latitudes and/or cultural preference leading to sexual selection.

    No one denies that lighter skin color might confer intrinsic selective fitness advantages, such as less Vitamin D deficiency, in places where skin cancer and sunburn aren’t as serious survival threats, and camouflage relative to dark skin in snowy environments when hunting. In low latitudes even dark skin doesn’t lead to Vitamin D deficiency if you spend lots of time outside, and skin cancer and sunburn are much greater selective fitness threats than in high latitudes. (Vitamin D can be obtained through direct consumption of dairy products or through exposure to sun which there is less of at high latitudes because the Earth’s axis is tilted relative to its plane of rotation around the Sun.)

    The concern is that the selective fitness advantages of having lighter skin color as a result of this gene naively would be expected to be too modest to be subject to such extreme selective pressure in Europe, particularly when this gene does not appear to be under similar selective pressure, for example, in places like East Asia, much of which has a climate similar to that of Europe. Dark skinned people in high latitudes today don’t die at particularly elevated levels or have fewer children in a manner causally related to having dark skin.

    A similar concern applies in the case of lactase persistence (which is less strongly selected for because it is a dominant gene, as mentioned). LP surely confers some intrinsic selective fitness advantage because more food flexibility is good. But, it isn’t at all obvious why it is the gene most intensely under selective pressure in the entire genome of Northern Europeans, dramatically increasing its frequency in about 20-30 generations.

    Interestingly, both skin color and LP have known functional relationships to Vitamin D levels, since Vitamin D, which in the early 1980s was thought of as mostly a bone strength and tooth formation supporting nutrient, is now recognized as an immune system supporting nutrient that is a rival in potency to Vitamin C. One plausible explanation that could explain both rounds of intense selection would be that Vitamin D was extremely protective against diseases that otherwise had extremely high mortality and reproduction costs in Bronze Age Europe (possibly diseases that arrived from the steppe with steppe people), such as smallpox, cholera, the Black Plague, and infections leading to infant and maternal mortality, which may be non-obvious today because Vitamin D was less protective against later strains of those diseases. Vitamin D deficiency could also have had some other negative health or fertility effect that is now less severe because a third unknown gene was also selected for at the same time and mitigated the consequences of Vitamin D deficiency.

    Another possibility is that SLC24A5 results in gene expression of some other unknown phenotype that was wildly fitness enhancing then, but isn’t now. But, again, figuring out what that would be is not an easy question. What was a threat to survival then that isn’t a threat now and wasn’t a threat in Africa and South Asia and Southeast Asia and East Asia? Was there some European specific Neolithic package crop it aided in getting nutrients from, for example?

    The cultural angle would be that if people were able to notice that lighter skin in high latitudes (and dark skin in lower latitudes) was good for you, and that this gave rise to a general strong cultural preference for light skin, the result could have been selection at levels far exceeding and disproportionate to the true intrinsic phenotypic benefit of the gene. That is particularly easy to have happen in the case of SLC24A5 because its effect is so obviously visible that it is easy to sexually discriminate based upon it.

  5. Oops I missed that carelessly (you mention quite clearly in your post).

  6. (I understand if you ignore a second naive question, asking this sort of question also helps me to learn) I looked at your link to ALFRED: https://alfred.med.yale.edu/alfred/SiteTable1A_working.asp?siteuid=SI007419V

    I think I got my answer to the native American question (assuming the Muscogee etc. reflect admixture).

    Some austroasiatic tribes like Juang, Gond, Khorku seem outliers to the pattern of Indian austroasiatics?

    While the Uyghur seem far closer to Europeans as one would expect, the Hui and Hmong seem to deviate a bit from most other East Asians too. For Hui perhaps this may be “proportional” to their west Asian ancestry (you had written about their Y-chromosomes recently)??

    (I am assuming that each group like SA001818S refers to a collection of people, as many as the “N”, so that 2N is the number of alleles, which means the picture doesn’t just describe exceptions). Thanks again.

  7. “In the Gnomad sample of 60,000+ Europeans, they count three homozygote genotypes rs1426654. That’s 0.005%.”

    Is there any way to obtain the phenotype of these 3 homozygote Europeans? It might go some way towards understanding how “dark-skinned” the ancient WHG population with this ancestral variant might really have been. Your stats above suggest there would be a few hundred homozygotes in Finland to find.

    In terms of selection, I do wonder about the Masai at 33% derived. That portion seems higher than the reported Eurasian ancestry from back migration to East Africa. But it’s hard to imagine they lack Vitamin D. Moreover, they appear very dark.

  8. “Kalash – Burusho comparison is illuminating here, the former has higher ancestral than South Asian average despite being “peak ANI”, the latter has no ancestral.”

    “While the Uyghur seem far closer to Europeans as one would expect, the Hui and Hmong seem to deviate a bit from most other East Asians too. For Hui perhaps this may be “proportional” to their west Asian ancestry (you had written about their Y-chromosomes recently)??”

    Something odd is going on with the ALFRED numbers.

  9. Something weird about the ALFRED data.

    Gujarati, and Karnataka_Brahmin, less ancestral alleles than Andalusians and Berbers?

    Kalash and Bangladeshi are similar?

    I think someone noted the oddness of the ALFRED numbers at Eurogenes. Take those counts with a pinch of salt.

  10. Razib: My own hunch is that the original genetic background against which the A mutation at rs1426654 emerged will be found increasing in frequency first somewhere in the Near East after the Last Glacial Maximum.

    That’s a pretty good bet given the current state of play where Satsurblia Cave sample at 13kybp shows variant, and Villabruna samples, presumably from SE European-Anatolian refugium, at 14kybp don’t.

    (Though there’s a rumour of this variant at UP samples from Caucasus at roughly 20kybp. So if that turns out to be true, variant will have to be at some frequency in populations older than post-LGM.)

    If it arose in the post-LGM Near East, that does mean the variant spread despite fairly low rates of geneflow (given very low/absent Basal Eurasian in European post-LGM HG samples with higher derived variant frequency). That makes me more dubious about its absence in WHG until the late Spanish Mesolithic Canes_Meso sample, because it seems more likely there should have been some continued geneflow with other HG groups in Europe carrying the variant at reasonable frequencies, at levels at least matching what European HG would have had with the Near East.

    It seems pretty obvious now that agriculture can’t have much to do with rs1426654, not as a sole/parsimonious explanation; there are just too many HG subsistence cultures post-LGM, pre-agriculture with variant at high frequency / fixation (e.g. SHG 0.9, EHG 1, Iron Gates HG 0.5).

  11. KEVIN,

    “At this point its obvious the SNP rs1426654 is not highly responsible for lighter skin, at least not in south Asia.”

    Not at all true.

    Rather, as noted above it just seems that the ALFRED data is likely to be “wrong” when it comes to many populations.

    If I’m not mistaken (hopefully, my memory is serving me right), the Kalasha and the HGDP Pashtuns are actually at 0% for the ancestral alleles, Gujarati are at around 20%, and many South Indian groups are at around 50%-60% for the ancestral alleles.

  12. “The cultural angle would be that if people were able to notice that lighter skin in high latitudes (and dark skin in lower latitudes) was good for you, and that this gave rise to a general strong cultural preference for light skin…”

    That’s some pretty astute “noticing” on the part of those populations. Do we have any evidence of cultures selecting sexual partners mostly on the basis of perceived disease resistance? It also ignores the fact that pale skin is culturally valued in widely scattered places. East Asia, for instance. The hadiths and the Sira stress the whiteness of Mohammed’s skin. There are other examples. Sexual selection would be the answer, IMO.

  13. i’m fixating on the copy count and # of homozygotes for a reason: it suggests that two copies of the derived allele are more fit than the single by a sig amt. and yet phenotypically (see 2005 paper) the derived substitution is semi-dominant. this is why sexual selection on skin color doesn’t make sense to explain the frequency.

    if the above confuses you, perhaps you are too stupid or ignorant to leave evolutionary speculations on this blog. thanks.

  14. Extremely strong selection with strong regional differences that don’t seem to correlate with obvious environmental differences suggests selection for resistance to a specific pathogen to me, but that might just be my microbiologist bias speaking. There are known links between melanin production and immunity against fungal pathogens. I wonder if anyone has looked for phenotypic differences in antifungal immunity at this locus?
