Assessing the utility of models in ancient DNA admixture analyses

Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture:

qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model. Although qpAdm is widely used in studies of population history of human (and non-human) groups, relatively little has been done to assess its performance. We performed a simulation study to assess the behavior of qpAdm under various scenarios in order to identify areas of potential weakness and establish recommended best practices for use. We find that qpAdm is a robust tool that yields accurate results in many cases, including when data coverage is low, there are high rates of missing data or ancient DNA damage, or when diploid calls cannot be made. However, we caution against co-analyzing ancient and present-day data, the inclusion of an extremely large number of reference populations in a single model, and analyzing population histories involving extended periods of gene flow. We provide a user guide suggesting best practices for the use of qpAdm.

The Reich lab provides its software and data. It’s really not that hard to replicate and tweak some of the analyses they do in their papers (check the supplements for the detailed specifications of the parameters). I’ve done this many times when I got curious about a detail they hadn’t explored.

The preprint above is a valuable addition to the intuitions one can develop through using the packages.
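For readers who have never touched the packages, here is a toy sketch in Python of the simplest relative of what qpAdm does: an f4-ratio estimate of an admixture proportion. To be clear, this is not qpAdm or any Reich lab code. The allele frequencies are made up, the population names (source_1, source_2, outgroup_1, outgroup_2) are hypothetical, and there is no sampling noise or drift after admixture. It only shows why f4 statistics are linear in the mixture proportion when the target's allele frequencies are a weighted average of the sources.

```python
import random


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), from allele frequencies."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def clip(p):
    """Keep a frequency inside [0, 1]."""
    return min(1.0, max(0.0, p))


rng = random.Random(42)
n_snps = 5000
true_alpha = 0.3  # hypothetical share of the target's ancestry from source_1

# Made-up allele frequencies. outgroup_1 is built to resemble source_1 so that
# f4(source_1, source_2; outgroup_1, outgroup_2) is clearly non-zero.
source_1 = [rng.random() for _ in range(n_snps)]
source_2 = [rng.random() for _ in range(n_snps)]
outgroup_1 = [clip(p + rng.gauss(0, 0.05)) for p in source_1]
outgroup_2 = [rng.random() for _ in range(n_snps)]

# The target is an exact mixture of the two sources at every SNP.
target = [true_alpha * p1 + (1 - true_alpha) * p2
          for p1, p2 in zip(source_1, source_2)]

# Since target - source_2 = alpha * (source_1 - source_2), the ratio recovers alpha.
alpha_hat = (f4(target, source_2, outgroup_1, outgroup_2)
             / f4(source_1, source_2, outgroup_1, outgroup_2))
print(round(alpha_hat, 6))
```

With real data the frequencies are estimated from a handful of individuals, the sources are only proxies for the true ancestors, and many outgroups are used at once, which is where the questions raised in the preprint come in.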


If marrying cousins is so bad why does everyone want to marry their cousins?

The above figure illustrates the geographic distribution of the prevalence of people marrying people closely related to them. Mostly this involves cousin marriage. Most people know the urban legends around the debilities that occur due to cousin marriage, but traditionally the focus has been on rare recessive diseases (e.g., albinism). Now, a massive new study has been published (more than 400 authors, with sample sizes of 1 million or more for some characteristics) looking at a variety of traits, Associations of autozygosity with a broad range of human phenotypes:

In many species, the offspring of related parents suffer reduced reproductive success, a phenomenon known as inbreeding depression. In humans, the importance of this effect has remained unclear, partly because reproduction between close relatives is both rare and frequently associated with confounding social factors. Here, using genomic inbreeding coefficients (FROH) for >1.4 million individuals, we show that FROH is significantly associated (p < 0.0005) with apparently deleterious changes in 32 out of 100 traits analysed. These changes are associated with runs of homozygosity (ROH), but not with common variant homozygosity, suggesting that genetic variants associated with inbreeding depression are predominantly rare. The effect on fertility is striking: FROH equivalent to the offspring of first cousins is associated with a 55% decrease [95% CI 44–66%] in the odds of having children. Finally, the effects of FROH are confirmed within full-sibling pairs, where the variation in FROH is independent of all environmental confounding.

The offspring of first cousins have on average 0.10 fewer children. On an individual level, this is not that great of an effect. But in an evolutionary population genetics sense this is a serious selection coefficient.

On the whole, the paper is impressive in its scope. There are even sibling analyses to confirm the impact of runs of homozygosity causing problems due to rare alleles (since this paper involved r.o.h, of course, Jim Wilson is involved!).

But I want to ask: if inbreeding is so bad genetically and biologically, why is it so common? One of the consequences of the Protestant Reformation is that the Roman Catholic Church’s strict enforcement of consanguinity rules was dropped, and cousin marriage became much more common among elites (such as the Darwin-Wedgwood family). The material rationale for cousin marriage is actually rather straightforward, in that it keeps accumulated property and power within the extended lineage. Marriages between children of brothers may cement alliances, while matrilocality and marriages between cross-cousins in South India have been associated with lower domestic abuse rates (in contrast, in North India strongly enforced exogamy has been associated with the idea that women marry into an alien household).

I would suggest perhaps that though marriages between relatives are biologically disfavored, there are many cases where they are culturally beneficial. In societies where collective family units engage in inter-group competition, some level of consanguinity may benefit cohesion. Other societies where individualism is more operative may exhibit no such incentives.

Note: I don’t see great evidence of purging genetic load in populations with more inbreeding. The rare variants are probably replenished constantly through mutation?


Phenotype does not imply ancestry (always)

One of the questions I often get relates to whether “trait X comes from population Y and does that mean if one has trait X that one has more ancestry from population Y.” To give an illustration, I have had people ask “I have blue eyes, does that mean I am more ‘Western Hunter-Gatherer’ than other people?”

One issue is that though the WHG tended toward a high frequency of the derived OCA2-HERC2 haplotype, other populations clearly carried it. The other is that the admixture is so far in the past that having blue or brown eyes is not informative about ancestry to any degree. There were probably relict populations of WHG less than 4,000 years ago (David has mentioned a sample less than 3,000 years old in Scandinavia), but the admixture of WHG into other groups was very long ago. More than 1,500 generations ago. To a great extent, it seems plausible that even within populations variation in ancestral fractions should be marginal to non-existent.

But this is a verbal model. A new preprint on bioRxiv has posted a formal model that outlines the different parameters that shape the trajectory of this decoupling between phenotype and ancestry. Assortative mating and the dynamical decoupling of genetic admixture levels from phenotypes that differ between source populations:

Source populations for an admixed population can possess distinct patterns of genotype and phenotype at the beginning of the admixture process. Such differences are sometimes taken to serve as markers of ancestry—that is, phenotypes that are initially associated with the ancestral background in one source population are taken to reflect ancestry in that population. Examples exist, however, in which genotypes or phenotypes initially associated with ancestry in one source population have decoupled from overall admixture levels, so that they no longer serve as proxies for genetic ancestry. We develop a mechanistic model for describing the joint dynamics of admixture levels and phenotype distributions in an admixed population. The approach includes a quantitative-genetic model that relates a phenotype to underlying loci that affect its trait value. We consider three forms of mating. First, individuals might assort in a manner that is independent of the overall genetic admixture level. Second, individuals might assort by a quantitative phenotype that is initially correlated with the genetic admixture level. Third, individuals might assort by the genetic admixture level itself. Under the model, we explore the relationship between genetic admixture level and phenotype over time, studying the effect on this relationship of the genetic architecture of the phenotype. We find that the decoupling of genetic ancestry and phenotype can occur surprisingly quickly, especially if the phenotype is driven by a small number of loci. We also find that positive assortative mating attenuates the process of dissociation in relation to a scenario in which mating is random with respect to genetic admixture and with respect to phenotype. The mechanistic framework suggests that in an admixed population, a trait that initially differed between source populations might be a reliable proxy for ancestry for only a short time, especially if the trait is determined by relatively few loci. 
The results are potentially relevant in admixed human populations, in which phenotypes that have a perceived correlation with ancestry might have social significance as ancestry markers, despite declining correlations with ancestry over time.

There are a lot of words and math. It’s quite gnarly. But the figure at the top of the post shows the major effect.


– more loci in a trait (e.g., height) means that the association between ancestry and trait decays more slowly
– stronger assortative mating of phenotype means that the association between ancestry and trait decays more slowly
– stronger assortative mating on ancestry means that the association between ancestry and trait decays more slowly

Since historically people did not have individualized genome-wide ancestry results, “assortative mating on ancestry” generally means assortative mating by physical appearance. To me panel E above is really what you should focus on. About 10 genes impact the phenotype, and assortative mating is at 0.5 (on a scale from 0 to 1.0). You can see that the correlation between genome-wide ancestry and the trait is already only ~0.50 after about 10 generations.
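If you want to play with the intuition, here is a minimal simulation sketch in Python. It is not the model from the preprint. It assumes random mating only (no assortative mating), a single admixture pulse, unlinked trait loci that are fixed for different alleles in the two sources, and it tracks ancestry as the simple average of the parents' ancestry. The function name and all parameter values are my own choices for illustration. It only shows the first bullet above: with more loci, the trait tracks ancestry for longer.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def ancestry_trait_correlation(n_loci, n_generations, pop_size=1000, seed=1):
    """Correlation between ancestry and an additive trait, one value per generation."""
    rng = random.Random(seed)
    # Founders: half from each source. Source A carries allele 1 at every
    # trait locus (ancestry 1.0), source B carries allele 0 (ancestry 0.0).
    population = []
    for i in range(pop_size):
        allele = i % 2
        population.append((float(allele), [allele] * n_loci, [allele] * n_loci))

    correlations = []
    for _ in range(n_generations):
        offspring = []
        for _ in range(pop_size):
            mother, father = rng.sample(population, 2)  # random mating
            # Each parent passes on one of its two alleles per locus (unlinked loci).
            egg = [rng.choice(pair) for pair in zip(mother[1], mother[2])]
            sperm = [rng.choice(pair) for pair in zip(father[1], father[2])]
            offspring.append(((mother[0] + father[0]) / 2, egg, sperm))
        population = offspring
        ancestry = [ind[0] for ind in population]
        trait = [sum(ind[1]) + sum(ind[2]) for ind in population]
        correlations.append(pearson(ancestry, trait))
    return correlations


few_loci = ancestry_trait_correlation(n_loci=1, n_generations=6)
many_loci = ancestry_trait_correlation(n_loci=50, n_generations=6)
for generation, (few, many) in enumerate(zip(few_loci, many_loci), start=1):
    print(generation, round(few, 2), round(many, 2))
```

In the first generation after admixture the trait is a perfect readout of ancestry in both runs. After that the one-locus trait loses its link to ancestry much faster than the 50-locus trait. Adding assortative mating, as the preprint does, would require an explicit mate-choice rule that this sketch leaves out.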

Anyway, dig into the math. I read the whole thing but didn’t go over the math in detail. The model and simulations make intuitive sense. I’d be curious how they fit empirical results (which are cited in the paper).


Extreme inbreeding is bad

If you read a book like Principles of Population Genetics, or know a little animal breeding, you know inbreeding has some serious consequences. The UK Biobank turns out to have about 100 individuals who are the products of extreme inbreeding (EI). That is, they are the offspring of parent-child pairings or full-sibling pairings, as inferred from the runs of homozygosity in their genomes (there are lots).

Intuition, theory, and a few results tell us that these individuals will have issues. Genomics confirms. Extreme inbreeding in a European ancestry sample from the contemporary UK population:

In most human societies, there are taboos and laws banning mating between first- and second-degree relatives, but actual prevalence and effects on health and fitness are poorly quantified. Here, we leverage a large observational study of ~450,000 participants of European ancestry from the UK Biobank (UKB) to quantify extreme inbreeding (EI) and its consequences. We use genotyped SNPs to detect large runs of homozygosity (ROH) and call EI when >10% of an individual’s genome comprise ROHs. We estimate a prevalence of EI of ~0.03%, i.e., ~1/3652. EI cases have phenotypic means between 0.3 and 0.7 standard deviation below the population mean for 7 traits, including stature and cognitive ability, consistent with inbreeding depression estimated from individuals with low levels of inbreeding. Our study provides DNA-based quantification of the prevalence of EI in a European ancestry sample from the UK and measures its effects on health and fitness traits.

The two major caveats I’d put out there are that the UK Biobank sample is a bit healthier and better educated than the average British person, and that the rate of adoption is considerably higher among people who are products of EI than is the norm. In other words, these people are from an atypical sample, and they are themselves somewhat atypical (since they were given up for adoption they likely had no idea they were the products of EI).
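To see where the 10% ROH threshold in the abstract sits relative to pedigree expectations, here is a short worked calculation in Python using the standard path-counting formula for the inbreeding coefficient: each shared ancestor of the two parents contributes (1/2)^(n1 + n2 + 1), where n1 and n2 are the number of generations from each parent up to that ancestor. The function name is mine, and the sketch assumes the shared ancestors are not themselves inbred. These are expected values; the fraction of a real genome in ROH varies around them.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Expected inbreeding coefficient F of a child.

    paths: one (n1, n2) pair per shared ancestor of the child's parents,
    giving the generations from each parent up to that ancestor.
    Assumes the shared ancestors are not themselves inbred.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# Parents are first cousins: two shared grandparents, two steps up on each side.
first_cousins = inbreeding_coefficient([(2, 2), (2, 2)])
# Parents are half siblings: one shared parent, one step up on each side.
half_siblings = inbreeding_coefficient([(1, 1)])
# Parents are full siblings: two shared parents, one step up on each side.
full_siblings = inbreeding_coefficient([(1, 1), (1, 1)])
# Parent-child pairing: the parent is the shared ancestor (0 steps from itself,
# 1 step from its child).
parent_child = inbreeding_coefficient([(0, 1)])

for label, f in [("first cousins", first_cousins),
                 ("half siblings", half_siblings),
                 ("full siblings", full_siblings),
                 ("parent-child", parent_child)]:
    print(f"{label}: F = {f} = {float(f):.4f}")
```

The offspring of first cousins come out at 1/16 (6.25%), below the 10% cutoff, while second-degree pairings such as half siblings give 1/8 (12.5%) and the first-degree pairings give 1/4 (25%), both above it.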


The genetic discovery of France

Finally, a deep dive into the population genetic structure of France, The Genetic History of France:

…These clusters match extremely well the geography and overlap with historical and linguistic divisions of France. By modeling the relationship between genetics and geography using EEMS software, we were able to detect gene flow barriers that are similar in the two cohorts and corresponds to major French rivers or mountains…A marked bottleneck is also consistently seen in the two datasets starting in the fourteenth century when the Black Death raged in Europe.

Nothing too surprising. In a nation of France’s size without strong socio-cultural dynamics that might encourage endogamy, it makes sense that geographic barriers are very important in structure. That being said, there does seem to be a correspondence between the genetic clusters and deep linguistic differences which date back to antiquity. Additionally, the people of Brittany turn out to be more “British” than not. This is not entirely surprising since the Breton language descends from the Brythonic language brought by Celtic Britons (its closest relative is quasi-extinct Cornish).

I do wonder though how much France being a “target” nation for immigration over the centuries has shaped some of these patterns. I’m not talking here about recent non-European immigration, but the migration of Spaniards, Italians, and Poles in the 19th century and earlier. Until the rise of Britain in the 18th century, France had been the largest, most powerful, and in the aggregate wealthiest Western European nation in the post-Roman world. I suspect that this results in long-term trends toward cosmopolitanism genetically that might be absent in a few populations, such as the French Basque (who are distinct in these data).


Uyghur genetics and Kenneth Kidd – going beneath the surface

The latest episode of NPR’s “Planet Money” was interesting to me and touched upon issues I’ve been thinking on a lot. Stuck In China’s Panopticon has a genetic angle. The Chinese government seems to be identifying and tracking Uyghurs with genetics. Or at least has the capability to do so. That is, in part, thanks to the work of Kenneth Kidd.

If you have read this weblog for a long time, or are a geneticist, you know who Kenneth Kidd is. You may have used his Alfred database. Though Wikipedia states that Kidd has been doing science in China since 1981, the podcast suggested that Kidd’s work under scrutiny dates to 2010.

That’s important. Because the reality is that the Chinese government did not need this late sampling to genetically identify Uyghurs. The HGDP data set has 10 Uyghurs already. People had been publishing on the pop genetics of the Uyghurs for more than 10 years by the time Kidd did his sampling. Alfred has 94 Uyghurs. This is better than 10, but for forensic purposes of ethnic identification, it’s probably superfluous.

In 2008 two Chinese researchers had already published a population genetic analysis with a bigger sample size than the HGDP. Kidd is not on the author list, so I don’t think he was involved.

Basically, Uyghurs are a group that will show admixture between various East and West Eurasian ancestry components many generations ago. This was already known before 2010. Only a few groups within China, such as Kazakhs, are even close to similar in their profile.

There is one area where I think Kidd’s work may have been pushing the frontier a bit: doing genealogical matching on diverse Uyghurs. Though I can’t imagine you could get more close relatives, the greater geographic diversity would probably implicate many more pedigrees.

Ultimately I don’t think the big picture is about Kenneth Kidd. Yes, forensics, genetics, and the Chinese government give many Americans nightmares. But thousands and thousands of scientists in America do work in China, with China, or are themselves of Chinese origin. American researchers develop technology that is later used in China to clamp down on various dissenters from the regime in an authoritarian manner. American consumers purchase goods and services that power the Chinese economy. American researchers collaborate with Chinese researchers and have indirectly furthered Chinese institutions such as the Beijing Genomics Institute.

I think we need to be honest that this implicates all of us in a globalized “just-in-time” world economy. Do the reporters interviewing Kidd use iPhones made in China?

And, it even goes well beyond China. In general, I think the United States is a force for good. But, as the world’s current superpower we have done some nasty things. Our democratically elected presidents, all of the recent ones, have sent people to their deaths for the good of the world (so they thought). We have intervened in nations and caused massive destruction and death, even though we meant well. Many non-Americans have a deep suspicion of our nation because of the dark shadow that it casts in certain circumstances.

There are bigger questions about power, morality, and individual responsibility and culpability that I wish we’d address, rather than focusing on a single researcher. Especially when I don’t think Kidd’s work was nearly as necessary and essential as the media portrays it.


The phylogenetic trees falling on the tundra

A massive new ancient DNA preprint just dropped, The population history of northeastern Siberia since the Pleistocene:

…Here, we report 34 ancient genome sequences, including two from fragmented milk teeth found at the ~31.6 thousand-year-old (kya) Yana RHS site, the earliest and northernmost Pleistocene human remains found. These genomes reveal complex patterns of past population admixture and replacement events throughout northeastern Siberia, with evidence for at least three large-scale human migrations into the region. The first inhabitants, a previously unknown population of “Ancient North Siberians” (ANS), represented by Yana RHS, diverged ~38 kya from Western Eurasians, soon after the latter split from East Asians. Between 20 and 11 kya, the ANS population was largely replaced by peoples with ancestry from East Asia, giving rise to ancestral Native Americans and “Ancient Paleosiberians” (AP), represented by a 9.8 kya skeleton from Kolyma River. AP are closely related to the Siberian ancestors of Native Americans, and ancestral to contemporary communities such as Koryaks and Itelmen. Paleoclimatic modelling shows evidence for a refuge during the last glacial maximum (LGM) in southeastern Beringia, suggesting Beringia as a possible location for the admixture forming both ancestral Native Americans and AP. Between 11 and 4 kya, AP were in turn largely replaced by another group of peoples with ancestry from East Asia, the “Neosiberians” from which many contemporary Siberians derive. We detect additional gene flow events in both directions across the Bering Strait during this time, influencing the genetic composition of Inuit, as well as Na Dene-speaking Northern Native Americans, whose Siberian-related ancestry components is closely related to AP. Our analyses reveal that the population history of northeastern Siberia was highly dynamic, starting in the Late Pleistocene and continuing well into the Late Holocene. 
The pattern observed in northeastern Siberia, with earlier, once widespread populations being replaced by distinct peoples, seems to have taken place across northern Eurasia, as far west as Scandinavia.

The preprint is very interesting and thorough, and the supplements are well over 100 pages. I read the genetics and linguistics portions. They make for some deep reading, and I really regret making fun of Iosif Lazaridis’ fondness for acronyms now.

I will make some cursory and general observations. First, the authors got really high coverage (and so high-quality) genomes from the Yana RHS site. Notice that they’re doing more data-intense analytic methods. Second, they did not find any population with the affinities to Australo-Melanesian that several research groups have found among some Amazonians. Likely they are hiding somewhere…but the ancient DNA sampling is getting pretty good. We’re missing something. Third, I am not sure what to think about the very rapid bifurcation of lineages we’re seeing around ~40,000 years ago.

The ANS population, ancestral by and large to ANE, seems to be about ~75% West Eurasian (without much Basal Eurasian) and ~25% East Eurasian. Or at least that’s one model. Did they then absorb other peoples? Or, was there an ancient population structure in the primal ur-human horde pushing out of the Near East? That is, are the “West Eurasians” and “East Eurasians” simply the descendants of original human tribes venturing out of Africa ~50,000 years ago? Also, rather than discrete West Eurasian and East Eurasian components, perhaps there was a genetic cline where the proto-ANS occupied a position closer to the former, as opposed to some later pulse admixture?

Without more ancient DNA we probably won’t be able to resolve the various alternative models.


Chinese and Indian American population genetic structure

In Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past David Reich makes the observation that India is a nation of many different ethnicities, while China is dominated by a single ethnicity, the Han. This is obviously true, more or less. Even today the vast majority of Indians seem to be marrying within their own communities, jati.

Over the years I’ve collected many different genotypes of Americans of various origins who have purchased personal genomics kits, and given me their raw results. I decided to go through my collection and strip detailed ethnic labels and simply group together all those individuals from India, and China, who have had their genotypes done from one of the major services.

I suspect that these individuals are representative of “Indian Americans” and “Chinese Americans.” So what’s their genetic structure?



Nomads, cosmopolitan predators, and peasants, xenophobic producers

Ten years ago when I read Peter Heather’s Empires and Barbarians, its thesis that the migrations and conquests of the post-Roman period were at least in part folk wanderings, where men, women, and children swarmed into the collapsing Empire en masse, was somewhat edgy. Today Heather’s model has to a large extent been validated. The recent paper on the Lombard migration, the discovery that the Lombards were indeed by and large genetically coherent as a transplanted German tribe in Pannonia and later northern Italy, confirms the older views which Heather attempted to resurrect. Additionally, the Lombards also seem to have been defined by a dominant group of elite male lineages.

Why is this even surprising? Because to a great extent, the ethnic and tribal character of the post-Roman power transfer between Late Antique elites and the newcomers was diminished and dismissed for decades. I can still remember the moment in 2010 when I was browsing books on Late Antiquity at Foyles in London and opened a page on a monograph devoted to the society of the Vandal kingdom in North Africa. The author explained that though the Vandals were defined by a particular set of cultural codes and mores, they were to a great extent an ad hoc group of mercenaries and refugees, whose ethnic identity emerged de novo on the post-Roman landscape.

In the next few years, we will probably get Vandal DNA from North Africa. I predict that they will be notably German (though with admixture, especially as time progresses). Additionally, I predict most of the males will be haplogroup R1b or I1. But the Vandal kingdom was actually one where there was a secondary group of barbarians: the Alans. It was Regnum Vandalorum et Alanorum. I predict that Alan males will be R1a. In particular, R1a1a-Z93.

But this post is not about the post-Roman world. Rather, it’s about the Inner Asian forest steppe. The sea of grass, stretching from the Altai to the Carpathians. A new paper in Science adds more samples to the story of the Srubnaya, Cimmerians, Scythians, and Sarmatians. Ancient genomes suggest the eastern Pontic-Caspian steppe as the source of western Iron Age nomads. The abstract is weirdly nonspecific, though accurate:

For millennia, the Pontic-Caspian steppe was a connector between the Eurasian steppe and Europe. In this scene, multidirectional and sequential movements of different populations may have occurred, including those of the Eurasian steppe nomads. We sequenced 35 genomes (low to medium coverage) of Bronze Age individuals (Srubnaya-Alakulskaya) and Iron Age nomads (Cimmerians, Scythians, and Sarmatians) that represent four distinct cultural entities corresponding to the chronological sequence of cultural complexes in the region. Our results suggest that, despite genetic links among these peoples, no group can be considered a direct ancestor of the subsequent group. The nomadic populations were heterogeneous and carried genetic affinities with populations from several other regions including the Far East and the southern Urals. We found evidence of a stable shared genetic signature, making the eastern Pontic-Caspian steppe a likely source of western nomadic groups.

The German groups which invaded the Western Roman Empire were agropastoralists. That is, they were slash and burn farmers who raised livestock. Though they were mobile, they were not nomads of the open steppe. Man for man the Germans of Late Antiquity had more skills applicable to the military life than the Roman peasant. This explains in part their representation in the Roman armed forces in large numbers starting in the 3rd century. But the people of the steppe, pure nomads, were even more fearsome. Ask the Goths about the Huns.

Whole German tribes, like the Cimbri, might coordinate for a singular migration for new territory, but for the exclusive pastoralist, their whole existence was migration. Groups such as the Goths and Vandals might settle down, and become primary producers again, but pure pastoralists probably required some natural level of predation and extortion upon settled peoples to obtain a lifestyle beyond marginal subsistence. Which is to say that some of the characterizations of Late Antique barbarians as ad hoc configurations might apply more to steppe hordes.

There has been enough work on these populations over the past few years to admit that various groups have different genetic characteristics, indicative of a somewhat delimited breeding population. But, invariably there are outliers here and there, and indications of periodic reversals of migration and interactions with populations from other parts of Eurasia.

Earlier I noted that Heather seems to have been correct that the barbarian invasions of the Roman Empire were events that involved the migration of women and children, as well as men. The steppe was probably a bit different. Here are the Y and mtDNA results for males from these data that are new to this paper:



How related should you expect relatives to be?

Like many Americans in the year 2018 I’ve got a whole pedigree plugged into personal genomic services. I’m talking from grandchild to grandparent to great-aunt/uncles. A non-trivial pedigree. So we as a family look closely at these patterns, and we’re not surprised at this point to see really high correlations in some cases compared to what you’d expect (or low).

This means that you can see empirically the variation between relatives of the same nominal degree of separation from a person of interest. For example, each of my children’s grandparents is expected to contribute 25% of their autosomal genome in the absence of any other information. But I actually know the variation of contribution empirically. For example, my father is enriched in my daughter. My mother is enriched in my sons.

The same principle applies to siblings. Though they should be 50% related on their autosomal genome, it turns out there is variation. I’ve seen some papers with large data sets (e.g., 20,000 sibling pairs) which give a standard deviation of 3.7% in relatedness. But what about other degrees of relation?
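You can get a feel for where a spread of a few percent comes from with a small simulation. The Python sketch below is a rough illustration under assumptions I am choosing for convenience: 22 autosomes with round-number lengths totalling about 35 Morgans (not a published genetic map), one sex-averaged map for both parents, and crossovers placed with no interference. For each parent it follows, along each chromosome, whether the two siblings inherited the same grandparental copy. Realized relatedness is the average of the paternal and maternal sharing.

```python
import random
import statistics

# Illustrative autosome lengths in Morgans (assumed round numbers, total 35 M).
CHROM_LENGTHS = [2.8, 2.6, 2.2, 2.1, 2.0, 1.9, 1.9, 1.7, 1.7, 1.7, 1.6,
                 1.7, 1.3, 1.2, 1.3, 1.3, 1.3, 1.2, 1.1, 1.1, 0.6, 0.7]


def shared_length(length, rng):
    """Length of one chromosome where two sibs got the same copy from one parent."""
    # At the chromosome start the two transmissions match half the time.
    same = rng.random() < 0.5
    position, shared = 0.0, 0.0
    while True:
        # A crossover in either sib's meiosis flips the match state, so the
        # combined rate is 2 per Morgan (1 per Morgan per meiosis).
        gap = rng.expovariate(2.0)
        end = min(position + gap, length)
        if same:
            shared += end - position
        if end >= length:
            return shared
        position = end
        same = not same


def sibling_relatedness(rng):
    """Realized autosomal relatedness of one simulated full-sibling pair."""
    shared = 0.0
    for length in CHROM_LENGTHS:
        shared += shared_length(length, rng)  # sharing through the father
        shared += shared_length(length, rng)  # sharing through the mother
    return shared / (2 * sum(CHROM_LENGTHS))


rng = random.Random(2018)
pairs = [sibling_relatedness(rng) for _ in range(2000)]
mean_r = statistics.mean(pairs)
sd_r = statistics.pstdev(pairs)
print(f"mean relatedness: {mean_r:.3f}")
print(f"standard deviation: {sd_r:.3f}")
```

The mean sits at one half, as it must, and the standard deviation comes out at a few percent. The exact value depends on the assumed map lengths and on ignoring interference, so treat it as a ballpark rather than a match to the empirical figure. The same bookkeeping extends to other relationships by changing which meioses are compared.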
