Selection for pigmentation loci…but not pigmentation?

About a year and a half ago at ASHG, I had a discussion with Dan Ju and Iain Mathieson about their work on ancient pigmentation. Or, more precisely, ancient pigmentation related genes. Now it’s out in a preprint, The evolution of skin pigmentation associated variation in West Eurasia:

…It is unclear whether selection has operated on all the genetic variation associated with skin pigmentation as opposed to just a small number of large-effect variants. Here, we address this question using ancient DNA from 1158 individuals from West Eurasia covering a period of 40,000 years combined with genome-wide association summary statistics from the UK Biobank. We find a robust signal of directional selection in ancient West Eurasians on skin pigmentation variants ascertained in the UK Biobank, but find this signal is driven mostly by a limited number of large-effect variants. Consistent with this observation, we find that a polygenic selection test in present-day populations fails to detect selection with the full set of variants; rather, only the top five show strong evidence of selection. Our data allow us to disentangle the effects of admixture and selection. Most notably, a large-effect variant at SLC24A5 was introduced to Europe by migrations of Neolithic farming populations but continued to be under selection post-admixture. This study shows that the response to selection for light skin pigmentation in West Eurasia was driven by a relatively small proportion of the variants that are associated with present-day phenotypic variation.

There are a lot of moving parts in this preprint. Look closely, and you will notice that the authors are careful to stipulate that they can’t really infer the pigmentation of ancient peoples, only track the alleles ascertained in modern populations. This matters because naive deployments of polygenic risk score models trained on modern populations and projected onto ancient ones seem highly suspect. I’m thinking here mostly of the “Cheddar Man is black” meme. It is true that, using modern SNP batteries, Mesolithic Europeans are predicted to be rather dark-skinned, but higher latitude humans tend to be paler, on average, than lower latitude humans (albeit not as pale as the typical Northern European!). But we can be sure about the alleles we do know about, and their likely effects (the functional understanding of these pathways is pretty good).

The best modern genetic analyses of pigmentation suggest that variation is dominated by some large-effect loci, but that there is a large residual of smaller-effect loci segregating within the population (I’ve seen 50% accounted for with SNPs, and 50% as “ancestry”, which really masks small-effect QTLs). This is in contrast with the architecture in height, where there are few large-effect loci, and almost all of the variance is small-effect loci. What Ju et al. confirm is that selection “for pigmentation” is due to the large-effect loci; there’s no polygenic selection detectable on the smaller-effect loci for the ancient populations. Importantly, the change in allele frequency isn’t just due to admixture. It’s also due to selection after admixture.

I use quotes above because honestly, I think these sorts of results make it unclear what the selection was for. The general prior is conditioned on the fact that even after a few decades we still think of EDAR as a hair-thickness gene, but it’s one of the strongest signals of selection in the human genome. The “light” allele in SLC24A5 is at an incredibly high frequency in Europe today, and has increased in the last 4,000 years. Though this SNP is impactful for the complexion, it’s hard to imagine how strong selection must be to drive it from 95% to 99.5% (as per the 2005 paper on this SNP, the “light” allele exhibits some phenotypic dominance).
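To put a rough number on “how strong,” here is a minimal sketch using the textbook deterministic recursion for selection at a single diploid locus. Only the 95% and 99.5% endpoints come from the text above. The assumption that the change takes the full 4,000 years, the 28-year generation time, the fitness scheme, and the dominance values h are all illustrative choices, and treating phenotypic dominance as fitness dominance is itself an assumption.

```python
def next_freq(p, s, h):
    """One generation of viability selection at a diploid locus.

    p is the frequency of the favored ("light") allele A.
    Assumed fitnesses: AA = 1, Aa = 1 - (1 - h) * s, aa = 1 - s,
    so h = 1 means A is fully dominant and h = 0.5 means additive.
    """
    q = 1.0 - p
    w_aa_het = 1.0 - (1.0 - h) * s
    w_bar = p * p + 2 * p * q * w_aa_het + q * q * (1.0 - s)
    return (p * p + p * q * w_aa_het) / w_bar


def final_freq(p0, s, h, generations):
    """Iterate the recursion for a fixed number of generations."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, h)
    return p


def required_s(p0, p1, h, generations, tol=1e-9):
    """Bisection for the s in [0, 1] that moves A from p0 to p1.

    Returns None when even s = 1 (lethal aa) is not enough.
    """
    if final_freq(p0, 1.0, h, generations) < p1:
        return None
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if final_freq(p0, mid, h, generations) < p1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# ASSUMPTION: 4,000 years at 28 years per generation, about 143 generations.
GENERATIONS = round(4000 / 28)

for h in (0.5, 0.75, 1.0):
    s = required_s(0.95, 0.995, h, GENERATIONS)
    label = "not reachable with s <= 1" if s is None else f"s ~ {s:.3f}"
    print(f"h = {h}: {label}")
```

Under these assumptions an additive allele needs a selection coefficient of roughly 3% per generation. A fully dominant “light” allele cannot make the move in 143 generations at any strength: even if the recessive homozygote were lethal, 1/q grows by only 1 per generation, so going from q = 5% to q = 0.5% takes 180 generations.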

As noted in the preprint, there’s not enough data on other regions of the world. It’s hard to assess what’s going on in Europe without assessing other regions. The authors do present an intriguing suggestion: that lighter pigmentation in East Asia is driven by smaller-effect genes shifted through polygenic selection.

I’ll present a strange hypothesis: selection for lighter skin at high latitudes through polygenic selection on standing variation naturally takes populations to the coloring of Northeast Asians. But very light complexion, as you see in Northern Europe, could be due to strong selection on the large-effect pigmentation genes, and pigmentation itself may simply be a side effect due to a genetic correlation with the true target of selection.


Assessing the utility of models in ancient DNA admixture analyses

Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture:

qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model. Although qpAdm is widely used in studies of population history of human (and non-human) groups, relatively little has been done to assess its performance. We performed a simulation study to assess the behavior of qpAdm under various scenarios in order to identify areas of potential weakness and establish recommended best practices for use. We find that qpAdm is a robust tool that yields accurate results in many cases, including when data coverage is low, there are high rates of missing data or ancient DNA damage, or when diploid calls cannot be made. However, we caution against co-analyzing ancient and present-day data, the inclusion of an extremely large number of reference populations in a single model, and analyzing population histories involving extended periods of gene flow. We provide a user guide suggesting best practices for the use of qpAdm.

The Reich lab provides its software and data. It’s really not that hard to replicate and tweak some of the analyses they do in their papers (check the supplements for the detailed specifications of the parameters). I’ve done this many times when I got curious about a detail they hadn’t explored.

The preprint above is a valuable addition to the intuitions one can develop through using the packages.


The Knanaya of Kerala do seem a bit more Near Eastern than other St. Thomas Christians

Last year I was approached by someone from the Knanaya community of South India as to their genetics. The Knanaya believe themselves to be descendants of later Near Eastern migrants than the other Nasrani St. Thomas Christians (both communities seem to believe in some connection to Near Eastern Jews). The history of these communities is complex, but they are rooted in the Oriental Orthodox Christianity of Iraq and the Levant. You might be curious to note that the largest number of individuals associated with the Syrian Orthodox Church are South Indians.

For some context, I’d recommend The Lost History of Christianity by Philip Jenkins.

With some preliminary analysis, it did seem like the Knanaya community was enriched for Near Eastern ancestry, even compared to the other Nasrani samples I had. Recently I’ve been given a total of 11 samples of Knanaya, so I decided to do some further analysis (2 of the individuals seem somewhat related, so they are not independent data points).

If you look at the plot above, you can see that the y-axis is PC 3. This separates Northern European samples (Belorussians and Lithuanians) at one end and Yemeni Jews at the other. Groups such as Armenians are in the middle. You can see that some groups, such as the Mumbai Jews (Bene Israel), Cochin Jews, and the Knanaya, do seem shifted toward the Yemeni Jews. Groups in the Levant and minorities like Assyrians are usually about 2/3 “northern” and 1/3 “southern” in ancestry.

To get a better sense of that, take a look at the Admixture barplot below.

This is a supervised run with several reference populations. The light blue are Yemeni Jews, and you can see quite clearly that the Knanaya show evidence of this ancestry, while most other Indian populations do not.

To get a sense of the ratio of northern Middle Eastern vs. southern Middle Eastern, here are the results for the Druze:

TreeMix is a little more ambivalent:

The flow to the Cochin and Mumbai Jewish groups is clearer (or from them, in the latter case). I think the history of the Kerala Christians, and the Knanaya in particular, is more complex.

I’ll probably run some more stats tomorrow to see what the best donor population is…


The details of Eurasian back-migration into Africa

Carl Zimmer has an interesting write-up on the new method to detect Neanderthal ancestry in Africa, Neanderthal Genes Hint at Much Earlier Human Migration From Africa. There are two quotes from researchers that are of note.

First, from David Reich:

Despite his hesitation over the analysis of African DNA, Dr. Reich said the new findings do make a strong case that modern humans departed Africa much earlier than thought.

“I was on the fence about that, but this paper makes me think it’s right,” he said.

It’s possible that humans and Neanderthals interbred at other times, and not just 200,000 years ago and again 60,000 years ago. But Dr. Akey said that these two migrations accounted for the vast majority of mixed DNA in the genomes of living humans and Neanderthal fossils.

Over the years I have had several discussions with members of the Reich lab about whether there was a major migration of the antecedent lineage of modern humans before the one that we detect 60,000 years ago. Many were quite skeptical because of the lack of clear genetic signal of anything before 60,000 years ago, as well as its correlation with a strong archaeological record. But, it seems now that David Reich at least is convinced that the evidence of admixture into Neanderthals means that there were descendants of the same lineage that led to the major “Out of Africa” expansion 60,000 years ago who had spread earlier (though the footprint was small, and their impact on later humans difficult to detect).

Second, Sarah Tishkoff says something that I forgot to mention in my earlier post:

Sarah Tishkoff, a geneticist at the University of Pennsylvania, is doing just that, using the new methods to look for Neanderthal DNA in more Africans to test Dr. Akey’s hypothesis.

Still, she wonders how Neanderthal DNA could have spread between populations scattered across the entire continent.

The second part isn’t that inexplicable. In the paper, they mention that they don’t have the power to analyze small sample numbers, so they focused on the 1000 Genomes samples, which are from West and East Africa and come from agriculturalist and agro-pastoralist populations. If you listen to this week’s episode of The Insight, Spencer and I talk extensively about the recent agriculturally mediated expansions within Africa. Much of the genetic landscape of the continent is novel and of short historical time-depth. The Africa of Old Kingdom Egypt, 4,500 years ago, was very different.

As hinted by Tishkoff the key is going to be when we get samples from hunter-gatherers. Some of these have much lower Eurasian affinities, and likely they’ll carry less Neanderthal ancestry.

On a final note, this paper and its senior author, Joshua Akey, hint at some resolution in the interminable disagreement about continuous gene flow vs. pulse admixture. Some of the methods to infer and detect admixture assume pulse admixture, and so our conception of the past has been skewed. On the other hand, I think it is plausible that in a patchy low population density Paleolithic landscape continuous gene flow may have been quite attenuated over long distances. Admixture then would occur when there were cultural revolutions and long-distance contact for short periods of time, before an equilibration. Basically, it’s some of both.


If you meet a model, kill it!

If you are awake in the year 2019 you have heard of “machine learning.” And, if you listened to my podcast The Insight, you know that Andy Kern’s lab at the University of Oregon is leveraging machine learning (and “deep learning” and “neural networks”) for population genetics.

Now, obviously in population genetics, you know that models are a big deal. The Hardy-Weinberg model. The coalescent. Various models of selection against which you can test data. This is not a coincidence. In the 20th century, population genetics was a data-poor field and a lot of work was done in the theoretical space since that’s where the work could be done (here’s to you, two-locus models of selection from the 1970s!).

In the 2000s genomics transformed the landscape. All of a sudden there was a surfeit of data. On the one hand, this meant that there was a lot of material for models to work on. On the other hand, it turns out that some models aren’t too scalable to big data, nor do they turn out to be very robust (one reason for the persistence of single-locus phylogenetic models around mtDNA and Y is their elegant tractability).

This is where a “bottom-up” machine learning approach comes into the picture. Kern’s group just came out with a new preprint I’ve been hearing about for a while, Predicting Geographic Location from Genetic Variation with Deep Neural Networks:

Most organisms are more closely related to nearby than distant members of their species, creating spatial autocorrelations in genetic data. This allows us to predict the location of origin of a genetic sample by comparing it to a set of samples of known geographic origin. Here we describe a deep learning method, which we call Locator, to accomplish this task faster and more accurately than existing approaches. In simulations, Locator infers sample location to within 4.1 generations of dispersal and runs at least an order of magnitude faster than a recent model-based approach. We leverage Locator’s computational efficiency to predict locations separately in windows across the genome, which allows us to both quantify uncertainty and describe the mosaic ancestry and patterns of geographic mixing that characterize many populations. Applied to whole-genome sequence data from Plasmodium parasites, Anopheles mosquitoes, and global human populations, this approach yields median test errors of 16.9km, 5.7km, and 85km, respectively.

Readers of this weblog can jump to the empirical examples of the HGDP. They make sense, and I especially liked the local ancestry deconvolution analysis and variation in predictive power conditional on recombination.

Sometimes quantity has a quality all its own, and the eye-opening aspect of Locator is how it can test a lot of propositions quickly (this is more important in the era of WGS datasets). It’s no joke that dispensing with a model can speed things up.

One minor element I’ll note is that getting Locator installed is not trivial from what I have seen, especially the TensorFlow dependency. So I’ll probably have more updates once I get it up and running myself.


If marrying cousins is so bad why does everyone want to marry their cousins?

The above figure illustrates the geographic distribution of the prevalence of people marrying people closely related to them. Mostly this involves cousin marriage. Most people know the urban legends around the debilities that occur due to cousin marriage, but traditionally the focus has been on rare recessive diseases (e.g., albinism). Now, a massive new study has been published (more than 400 authors, with sample sizes of 1 million or more for some characteristics) looking at a variety of traits, Associations of autozygosity with a broad range of human phenotypes:

In many species, the offspring of related parents suffer reduced reproductive success, a phenomenon known as inbreeding depression. In humans, the importance of this effect has remained unclear, partly because reproduction between close relatives is both rare and frequently associated with confounding social factors. Here, using genomic inbreeding coefficients (FROH) for >1.4 million individuals, we show that FROH is significantly associated (p < 0.0005) with apparently deleterious changes in 32 out of 100 traits analysed. These changes are associated with runs of homozygosity (ROH), but not with common variant homozygosity, suggesting that genetic variants associated with inbreeding depression are predominantly rare. The effect on fertility is striking: FROH equivalent to the offspring of first cousins is associated with a 55% decrease [95% CI 44–66%] in the odds of having children. Finally, the effects of FROH are confirmed within full-sibling pairs, where the variation in FROH is independent of all environmental confounding.

The offspring of first cousins have on average 0.10 fewer children. On an individual level, this is not that great of an effect. But in an evolutionary population genetics sense this is a serious selection coefficient.
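As a back-of-the-envelope illustration of why 0.10 children counts as “serious,” the sketch below converts the deficit into a relative fitness cost and compares it with the strength of drift. The baseline family size and the effective population size are assumptions chosen for illustration, not figures from the paper, and the comparison treats the cost as if it attached to a single heritable variant, which is a simplification.

```python
FEWER_CHILDREN = 0.10     # deficit for offspring of first cousins (from the text)
MEAN_CHILDREN = 2.0       # ASSUMPTION: baseline mean number of children
EFFECTIVE_SIZE = 10_000   # ASSUMPTION: a conventional human effective size


def selection_coefficient(deficit, baseline):
    """Relative reduction in fitness implied by a deficit in offspring."""
    return deficit / baseline


s = selection_coefficient(FEWER_CHILDREN, MEAN_CHILDREN)

# Selection dominates drift when s is much larger than 1 / (2 * Ne).
drift_scale = 1 / (2 * EFFECTIVE_SIZE)

print(f"s = {s:.3f}")
print(f"s relative to 1/(2Ne): {s / drift_scale:.0f}x")
```

With these assumed inputs the cost is about 5% per generation, three orders of magnitude above the threshold at which drift would swamp selection.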

On the whole, the paper is impressive in its scope. There are even sibling analyses to confirm the impact of runs of homozygosity causing problems due to rare alleles (since this paper involved r.o.h, of course, Jim Wilson is involved!).

Rather, I want to ask: if inbreeding is so bad genetically and biologically, why is it so common? One of the consequences of the Protestant Reformation is that the Roman Catholic Church’s strict enforcement of consanguinity rules was dropped, and cousin marriage became much more common among elites (such as the Darwin-Wedgwood family). The material rationale for cousin marriage is actually rather straightforward, in that it keeps accumulated property and power within the extended lineage. Marriages between children of brothers may cement alliances, while matrilocality and marriages between cross-cousins in South India have been associated with lower domestic abuse rates (in contrast, in North India strongly enforced exogamy has been associated with the idea that women marry into an alien household).

I would suggest perhaps that though marriages between relatives are biologically disfavored, there are many cases where they are culturally beneficial. In societies where collective family units engage in inter-group competition, some level of consanguinity may benefit cohesion. Other societies where individualism is more operative may exhibit no such incentives.

Note: I don’t see great evidence of purging genetic load in populations with more inbreeding. The rare variants are probably replenished constantly through mutation?


Phenotype does not imply ancestry (always)

One of the questions I often get relates to whether “trait X comes from population Y and does that mean if one has trait X that one has more ancestry from population Y.” To give an illustration, I have had people ask “I have blue eyes, does that mean I am more ‘Western Hunter-Gatherer’ than other people?”

One issue is that though the WHG tended toward a high frequency of the derived OCA2-HERC2 haplotype, other populations clearly carried it. The other is that the admixture is so far in the past that having blue or brown eyes is not informative about ancestry to any degree. There were probably relict populations of WHG less than 4,000 years ago (David has mentioned a sample less than 3,000 years old in Scandinavia), but the admixture of WHG into other groups was very long ago. More than 1,500 generations ago. To a great extent, it seems plausible that even within populations variation in ancestral fractions should be marginal to non-existent.

But this is a verbal model. A new preprint on bioRxiv has posted a formal model that outlines the different parameters that shape the trajectory of this decoupling between phenotype and ancestry. Assortative mating and the dynamical decoupling of genetic admixture levels from phenotypes that differ between source populations:

Source populations for an admixed population can possess distinct patterns of genotype and phenotype at the beginning of the admixture process. Such differences are sometimes taken to serve as markers of ancestry—that is, phenotypes that are initially associated with the ancestral background in one source population are taken to reflect ancestry in that population. Examples exist, however, in which genotypes or phenotypes initially associated with ancestry in one source population have decoupled from overall admixture levels, so that they no longer serve as proxies for genetic ancestry. We develop a mechanistic model for describing the joint dynamics of admixture levels and phenotype distributions in an admixed population. The approach includes a quantitative-genetic model that relates a phenotype to underlying loci that affect its trait value. We consider three forms of mating. First, individuals might assort in a manner that is independent of the overall genetic admixture level. Second, individuals might assort by a quantitative phenotype that is initially correlated with the genetic admixture level. Third, individuals might assort by the genetic admixture level itself. Under the model, we explore the relationship between genetic admixture level and phenotype over time, studying the effect on this relationship of the genetic architecture of the phenotype. We find that the decoupling of genetic ancestry and phenotype can occur surprisingly quickly, especially if the phenotype is driven by a small number of loci. We also find that positive assortative mating attenuates the process of dissociation in relation to a scenario in which mating is random with respect to genetic admixture and with respect to phenotype. The mechanistic framework suggests that in an admixed population, a trait that initially differed between source populations might be a reliable proxy for ancestry for only a short time, especially if the trait is determined by relatively few loci. 
The results are potentially relevant in admixed human populations, in which phenotypes that have a perceived correlation with ancestry might have social significance as ancestry markers, despite declining correlations with ancestry over time.

There are a lot of words and math. It’s quite gnarly. But the figure at the top of the post shows the major effect.


– more loci in a trait (e.g., height) means that the association between ancestry and trait decays more slowly
– stronger assortative mating on phenotype means that the association between ancestry and trait decays more slowly
– stronger assortative mating on ancestry means that the association between ancestry and trait decays more slowly

Since historically people did not have individualized genome-wide ancestry results “assortative mating on ancestry” means by physical appearance in the generality. To me panel E above is really what you should focus on. About 10 genes impact the phenotype, and assortative mating is at 0.5 (between 0 and 1.0). You see the correlation is already only ~0.50 between genome-wide ancestry and the trait in about 10 generations.
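The effect of the number of loci can be seen in a toy simulation. This is not the preprint’s model: it is a minimal sketch that assumes a single pulse of 50/50 admixture, random mating afterwards (no assortative mating at all), unlinked loci, and fixed differences between the two sources at every locus. The function names and parameter values are hypothetical. Genome-wide ancestry is measured on a set of background loci that do not affect the trait.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gamete(individual, rng):
    """Free recombination: each locus is drawn from either haplotype."""
    hap1, hap2 = individual
    return [a if rng.random() < 0.5 else b for a, b in zip(hap1, hap2)]


def simulate(n_trait_loci, generations, n_ind=500, n_background=100, seed=1):
    """Correlation between trait and ancestry, one value per generation.

    Index 0 is the founder generation (unadmixed individuals only).
    Alleles are coded 1 (source A) or 0 (source B); the first
    n_trait_loci positions are trait loci, the rest are background loci.
    """
    rng = random.Random(seed)
    n_loci = n_trait_loci + n_background
    pop = [([src] * n_loci, [src] * n_loci)
           for src in (0, 1) for _ in range(n_ind // 2)]
    correlations = []
    for _ in range(generations + 1):
        trait = [sum(h1[:n_trait_loci]) + sum(h2[:n_trait_loci])
                 for h1, h2 in pop]
        ancestry = [(sum(h1[n_trait_loci:]) + sum(h2[n_trait_loci:]))
                    / (2 * n_background) for h1, h2 in pop]
        correlations.append(pearson(trait, ancestry))
        # Random mating; the rare chance of selfing is ignored in this toy.
        pop = [(gamete(rng.choice(pop), rng), gamete(rng.choice(pop), rng))
               for _ in range(n_ind)]
    return correlations


few = simulate(n_trait_loci=2, generations=5)
many = simulate(n_trait_loci=50, generations=5)
print("generation 5, 2 trait loci: ", round(few[5], 2))
print("generation 5, 50 trait loci:", round(many[5], 2))
```

Because mating here is random, the correlation falls faster than in panel E, where assortative mating slows the decay. The contrast between the two runs is the point: the trait with few loci loses its link to ancestry much sooner.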

Anyway, dig into the math. I read the whole thing but didn’t go over the math in detail. The model and simulations make intuitive sense. I’d be curious how they fit empirical results (which are cited in the paper).


Extreme inbreeding is bad

If you read a book like Principles of Population Genetics, or know a little animal breeding, you know inbreeding has some serious consequences. The UK Biobank turns out to have about 100 individuals who are the products of extreme inbreeding (EI). That is, they are the offspring of parent-child pairings or full-sibling pairings, as inferred from the runs of homozygosity in their genomes (there are lots).

Intuition, theory, and a few results tell us that these individuals will have issues. Genomics confirms. Extreme inbreeding in a European ancestry sample from the contemporary UK population:

In most human societies, there are taboos and laws banning mating between first- and second-degree relatives, but actual prevalence and effects on health and fitness are poorly quantified. Here, we leverage a large observational study of ~450,000 participants of European ancestry from the UK Biobank (UKB) to quantify extreme inbreeding (EI) and its consequences. We use genotyped SNPs to detect large runs of homozygosity (ROH) and call EI when >10% of an individual’s genome comprise ROHs. We estimate a prevalence of EI of ~0.03%, i.e., ~1/3652. EI cases have phenotypic means between 0.3 and 0.7 standard deviation below the population mean for 7 traits, including stature and cognitive ability, consistent with inbreeding depression estimated from individuals with low levels of inbreeding. Our study provides DNA-based quantification of the prevalence of EI in a European ancestry sample from the UK and measures its effects on health and fitness traits.

The two major caveats I’d put out there are that the UK Biobank sample is a bit healthier and better educated than the average British person, and that the rate of adoption is considerably higher among people who are products of EI than is the norm. In other words, these people are from an atypical sample, and they are themselves somewhat atypical (since they were given up for adoption they likely had no idea they were the products of EI).
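For a sense of where the 10% ROH threshold sits, the sketch below lists the textbook expected inbreeding coefficients for the offspring of various pairings and checks the prevalence arithmetic from the abstract. The expected values are standard pedigree results rather than numbers from this paper, realized ROH fractions scatter around them, and the function name is hypothetical.

```python
from fractions import Fraction

# Expected inbreeding coefficient F of the offspring, by parental relationship.
EXPECTED_F = {
    "parent-child": Fraction(1, 4),
    "full siblings": Fraction(1, 4),
    "half siblings": Fraction(1, 8),
    "uncle-niece": Fraction(1, 8),
    "first cousins": Fraction(1, 16),
}

EI_THRESHOLD = Fraction(1, 10)  # the abstract calls EI when ROH exceed 10%


def expected_to_exceed_threshold(relationship):
    """True when the expected F for this pairing is above the EI cutoff."""
    return EXPECTED_F[relationship] > EI_THRESHOLD


for relationship, f in EXPECTED_F.items():
    flag = "EI" if expected_to_exceed_threshold(relationship) else "below cutoff"
    print(f"{relationship:>14}: F = {float(f):.4f} ({flag})")

# Prevalence check using the abstract's figures.
PARTICIPANTS = 450_000
PREVALENCE = Fraction(1, 3652)
print("expected EI cases:", round(float(PARTICIPANTS * PREVALENCE)))
```

The cutoff falls between first cousins (1/16) and second-degree pairings (1/8), and the stated prevalence works out to a bit over 120 people in a sample of about 450,000, the same order as the roughly 100 individuals mentioned above.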


The genetic discovery of France

Finally, a deep dive into the population genetic structure of France, The Genetic History of France:

…These clusters match extremely well the geography and overlap with historical and linguistic divisions of France. By modeling the relationship between genetics and geography using EEMS software, we were able to detect gene flow barriers that are similar in the two cohorts and corresponds to major French rivers or mountains…A marked bottleneck is also consistently seen in the two datasets starting in the fourteenth century when the Black Death raged in Europe.

Nothing too surprising. In a nation of France’s size without strong socio-cultural dynamics that might encourage endogamy, it makes sense that geographic barriers are very important in structure. That being said, there does seem to be a correspondence between the genetic clusters and deep linguistic differences which date back to antiquity. Additionally, the people of Brittany turn out to be more “British” than not. This is not entirely surprising since the Breton dialect descends from the Brythonic language brought by Celtic Britons (its closest relative is quasi-extinct Cornish).

I do wonder though how much France being a “target” nation for immigration over the centuries has shaped some of these patterns. I’m not talking here about recent non-European immigration, but the migration of Spaniards, Italians, and Poles in the 19th century and earlier. Until the rise of Britain in the 18th century, France had been the largest, most powerful, and in the aggregate wealthiest Western European nation in the post-Roman world. I suspect that this results in long-term trends toward cosmopolitanism genetically that might be absent in a few populations, such as the French Basque (who are distinct in these data).


Uyghur genetics and Kenneth Kidd – going beneath the surface

The latest episode of NPR’s “Planet Money” was interesting to me and touched upon issues I’ve been thinking on a lot. Stuck In China’s Panopticon has a genetic angle. The Chinese government seems to be identifying and tracking Uyghurs with genetics. Or at least has the capability to do so. That is, in part, thanks to the work of Kenneth Kidd.

If you have read this weblog for a long time, or are a geneticist, you know who Kenneth Kidd is. You may have used his Alfred database. Though Wikipedia states that Kidd has been doing science in China since 1981, the podcast suggested that Kidd’s work under scrutiny dates to 2010.

That’s important. Because the reality is that the Chinese government did not need this late sampling to genetically identify Uyghurs. The HGDP data set has 10 Uyghurs already. People had been publishing on the pop genetics of the Uyghurs for more than 10 years by the time Kidd did his sampling. Alfred has 94 Uyghurs. This is better than 10, but for forensic purposes of ethnic identification, it’s probably superfluous.

In 2008 two Chinese researchers had already published a population genetic analysis with a bigger sample size than the HGDP. Kidd is not on the author list, so I don’t think he was involved.

Basically, Uyghurs are a group that will show admixture between various East and West Eurasian ancestry components many generations ago. This was already known before 2010. Only a few groups within China, such as Kazakhs, are even close to similar in their profile.

There is one area where I think Kidd’s work may have been pushing the frontier a bit: doing genealogical matching on diverse Uyghurs. Though I can’t imagine you could get more close relatives, the greater geographic diversity would probably implicate many more pedigrees.

Ultimately I don’t think the big picture is about Kenneth Kidd. Yes, forensics, genetics, and the Chinese government give many Americans nightmares. But thousands and thousands of scientists in America do work in China, with China, or are themselves of Chinese origin. American researchers develop technology that is later used in China to clamp down on various dissenters from the regime in an authoritarian manner. American consumers purchase goods and services that power the Chinese economy. American researchers collaborate with Chinese researchers and have indirectly furthered Chinese institutions such as the Beijing Genomics Institute.

I think we need to be honest that this implicates all of us in a globalized “just-in-time” world economy. Do the reporters interviewing Kidd use iPhones made in China?

And, it even goes well beyond China. In general, I think the United States is a force for good. But, as the world’s current superpower we have done some nasty things. Our democratically elected presidents, all of the recent ones, have sent people to their deaths for the good of the world (so they thought). We have intervened in nations and caused massive destruction and death, even though we meant well. Many non-Americans have a deep suspicion of our nation because of the dark shadow that it casts in certain circumstances.

There are bigger questions about power, morality, and individual responsibility and culpability that I wish we’d address, rather than focusing on a single researcher. Especially when I don’t think Kidd’s work was nearly as necessary and essential as the media portrays it.