Pop gen; more robust than you’d think

One of the interesting things about genetics, and population genetics even more specifically, is how the theory and analysis outran the biophysical mechanism of the phenomenon. By this, I mean that the Mendelian laws inferred from transmission of physical characteristics predate any understanding about how genes were embedded within chromosomes, let alone the structural nature of DNA.

Population genetics, which fused the quantitative evolutionary thinking of the biometrical school with Mendelism, arguably outran the data by decades. Until the molecular evolution revolution of the 1960s, controversies such as the role of selection and drift in shaping variation were rhetoric rich and data poor. Though the allozyme era was clarifying, I do think people who were shaped by that era get a bit fixated on being in a particular camp. In contrast, with the genomics revolution many researchers seem to be more willing to let the data speak, because the data is so copious. A model that is relevant in one part of the tree of life may not be as predictive in another portion of it.

The rise of data makes old questions live again. With that, I present a paper in PNAS where the first author is John Wakeley, a pioneer of coalescent theory, Effects of the population pedigree on genetic signatures of historical demographic events:

Genetic variation among loci in the genomes of diploid biparental organisms is the result of mutation and genetic transmission through the genealogy, or population pedigree, of the species. We explore the consequences of this for patterns of variation at unlinked loci for two kinds of demographic events: the occurrence of a very large family or a strong selective sweep that occurred in the recent past. The results indicate that only rather extreme versions of such events can be expected to structure population pedigrees in such a way that unlinked loci will show deviations from the standard predictions of population genetics, which average over population pedigrees. The results also suggest that large samples of individuals and loci increase the chance of picking up signatures of these events, and that very large families may have a unique signature in terms of sample distributions of mutant alleles.

The paper is open access, so read the whole thing. The major math is tucked away in the extended material. Many of the formalisms in the text are those you’d regularly encounter in population genetics. The issue they’re addressing here is the fact that real populations exhibit pedigree structure, and even unlinked loci, which we treat as independent evolutionary histories, share a pedigree history.

If you read the text though it is notable how robust standard population genetic inferences are to the fact that in a literal sense they’re based on false assumptions. Massive demographic expansion (e.g., Genghis Khan haplotype) and unrealistic selection coefficients don’t seem to disturb the lineages enough so that the assumption of independent assortment starts to become misleading.

This shouldn’t be entirely surprising. I would argue that genomics has not really revolutionized evolution or population biology. The big frameworks are vindicated because nature is one, and the glimmers of reality you see in sparse data nevertheless sample from a comprehensible underlying distribution. As we get more data we’re getting more clarity, but the overall picture is not shocking or surprising.

Citation: John Wakeley, Léandra King, and Peter R. Wilton, Effects of the population pedigree on genetic signatures of historical demographic events

Selection on recessive traits in inbred populations

Reading The Essential Talmud about ten years ago I vaguely recall the author stating that it was common for working class males to devote each day to one page of a tractate from the commentaries on the oral law of the Jewish religion. As I am not religious, and look dimly on excessive orthopraxy, it struck me as a depressing thought.

But I am not entirely different. I often will relax at some point in the day and open up a random page of a population genetics textbook. Just as those Jewish men attempted to gain insight into the divine intent for how they should live their life, so with population genetics I am attempting to refine the theory which allows me to interpret the world around me.

It would probably help anyone who reads many of my posts as well, as it develops particular habits of mind. Though I often recommend Principles of Population Genetics, Elements of Evolutionary Genetics is also excellent. So in the future I’ll try to write up short insights which are pretty banal to most population geneticists, but which might be interesting to a motivated public, if my modest readership can be considered the “public.”

Page 100 has a section, “Selection in inbreeding populations.” The most important formal relationship on this page is:

Δq ≈ qs[h(1 – f) + f]

q = minor allele frequency at a biallelic locus, that is, q = 1 – p

h = dominance coefficient, so that h = 0 means the allele at frequency q is totally recessive and h = 0.5 means that the locus is additive in regards to allelic effect.

f = inbreeding coefficient, a basic measure of two alleles at the same locus sharing recent common ancestry (and therefore, rendering the genotype likely homozygous). From 0 to 1, with 1 meaning totally inbred and homozygous.

s = selection coefficient against the population mean fitness. Usually the value is near zero, though not exactly zero. A positive selection coefficient of 0.01 is considered very favorable for a new mutant.

What you see here is that in an instance where the allele is entirely recessive, inbreeding increases the selection on the locus. In a normal population with lots of random mating, homozygous recessive genotypes are rare. When f ≈ 0 the change in q is just a function of the selection coefficient and the dominance. As inbreeding increases, the importance of alleles (or lack thereof) in heterozygote genotypes decreases. For recessive traits inbreeding is another way to expose the novel alleles to selection.
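To make the relationship concrete, here is a minimal Python sketch that simply evaluates the approximation above for a few combinations of h and f. The function name delta_q and the example values q = 0.01 and s = 0.01 are my own choices for illustration, not anything taken from the textbook. Note that with h = 0 and f = 0 the approximation returns exactly zero; as far as I can tell this is because it is a rare-allele approximation that drops terms of order q², not because a recessive allele is literally invisible to selection under random mating.

```python
def delta_q(q, s, h, f):
    """Approximate per-generation change in allele frequency.

    Evaluates the relationship quoted above:
        delta_q ~= q * s * [h * (1 - f) + f]
    q: allele frequency, s: selection coefficient,
    h: dominance coefficient, f: inbreeding coefficient.
    """
    return q * s * (h * (1.0 - f) + f)


# Hypothetical values, chosen only to show the effect of inbreeding.
q, s = 0.01, 0.01
for h in (0.0, 0.5):
    for f in (0.0, 0.25, 1.0):
        print(f"h={h:.1f}  f={f:.2f}  delta_q={delta_q(q, s, h, f):.2e}")
```

For the fully recessive case (h = 0) the response scales directly with f, while for the additive case (h = 0.5) going from f = 0 to f = 1 only doubles it. At f = 1 the dominance term drops out entirely, since there are no heterozygotes left for dominance to act on.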

This is one reason that unscrupulous breeders of animals sometimes utilize very close relatives in programs to change traits. The problem is that inbreeding has an effect across the whole genome, even if you are interested in particular loci. And that effect on the whole genome is often very bad, as lots of deleterious alleles with recessive expression are present in populations which are normally outbred. Of course in plants this also results in purging of genetic load, as alleles get flushed out of the system. Unfortunately for mammals, and complex metazoans in general, this doesn’t seem to work too well for our lineage. If it did work well zoological veterinarians, who I’ve talked to, would be a lot more hopeful about what they’re trying to do by mating near relations in the hopes that they can get a large enough population to maintain a viable breeding program.

Mutation, a fundamental evolutionary genetic parameter

The mutation rate in human evolution and demographic inference:

The germline mutation rate has long been a major source of uncertainty in human evolutionary and demographic analyses based on genetic data, but estimates have improved substantially in recent years. I discuss our current knowledge of the mutation rate in humans and the underlying biological factors affecting it, which include generation time, parental age and other developmental and reproductive timescales. There is good evidence for a slowdown in mean mutation rate during great ape evolution, but not for a more recent change within the timescale of human genetic diversity. Hence, pending evidence to the contrary, it is reasonable to use a present-day rate of approximately 0.5 × 10⁻⁹ bp⁻¹ yr⁻¹ in all human or hominin demographic analyses.

Even since this review came out there has been new work. Fast changing.
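For a sense of what a per-year rate of this size means in practice, here is a small Python sketch of the arithmetic. The yearly rate is the one quoted in the abstract; the 29-year generation time and the 500,000-year separation are assumptions I picked purely for illustration, and the 2μt divergence formula is the textbook simplification that ignores variation already present in the ancestral population.

```python
MU_PER_BP_PER_YEAR = 0.5e-9  # rate quoted in the abstract above


def per_generation_rate(mu_per_year, generation_years):
    """Convert a per-year mutation rate into a per-generation rate."""
    return mu_per_year * generation_years


def expected_divergence(mu_per_year, years_separated):
    """Expected per-bp differences between two lineages: 2 * mu * t.

    The factor of 2 is there because mutations accumulate on both lineages.
    """
    return 2.0 * mu_per_year * years_separated


# Assumed values for illustration only.
GENERATION_YEARS = 29.0
YEARS_SEPARATED = 500_000

print(f"per generation: {per_generation_rate(MU_PER_BP_PER_YEAR, GENERATION_YEARS):.2e} per bp")
print(f"divergence:     {expected_divergence(MU_PER_BP_PER_YEAR, YEARS_SEPARATED):.2e} per bp")
```

Because every date inferred from genetic divergence scales inversely with the assumed rate, halving the rate doubles the inferred age, which is why this one parameter matters so much.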

The origins of Ashkenazi Jews near resolution

The Time and Place of European Admixture in the Ashkenazi Jewish History:

The Ashkenazi Jewish (AJ) population is important in medical genetics due to its high rate of Mendelian disorders and other unique genetic characteristics. Ashkenazi Jews have appeared in Europe in the 10th century, and their ancestry is thought to involve an admixture of European (EU) and Middle-Eastern (ME) groups. However, both the time and place of admixture in Europe are obscure and subject to intense debate. Here, we attempt to characterize the Ashkenazi admixture history using a large Ashkenazi sample and careful application of new and existing methods. Our main approach is based on local ancestry inference, assigning each Ashkenazi genomic segment as EU or ME, and comparing allele frequencies across EU segments to those of different EU populations. The contribution of each EU source was also evaluated using GLOBETROTTER and analysis of IBD sharing. The time of admixture was inferred using multiple tools, relying on statistics such as the distributions of segment lengths and the total EU ancestry per chromosome and the correlation of ancestries along the chromosome. Our simulations demonstrated that distinguishing EU vs ME ancestry is subject to considerable noise at the single segment level, but nevertheless, conclusions could be drawn based on chromosome-wide statistics. The predominant source of EU ancestry in AJ was found to be Southern European (≈60-80%), with the rest being likely Eastern European. The inferred admixture time was ≈35 generations ago, but multiple lines of evidence suggests that it represents an average over two or more admixture events, pre- and post-dating the founder event experienced by AJ in late medieval times. The time of the pre-bottleneck admixture event was bounded to 25-55 generations ago.
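The generation counts in the abstract can be put on a rough calendar scale with a few lines of Python. The generation times of 25 and 30 years are my assumption, bracketing values commonly used for humans; the abstract itself reports generations only.

```python
def generations_to_years(generations, generation_years):
    """Convert a number of generations into years before present."""
    return generations * generation_years


# Generation counts taken from the abstract; generation times are assumed.
estimates = (
    ("average admixture time", 35),
    ("pre-bottleneck event, lower bound", 25),
    ("pre-bottleneck event, upper bound", 55),
)
for label, gens in estimates:
    low = generations_to_years(gens, 25)
    high = generations_to_years(gens, 30)
    print(f"{label}: {gens} generations = {low} to {high} years ago")
```

Under these assumptions the 35-generation average lands roughly 900 to 1,000 years ago, and the pre-bottleneck bounds span from the high Middle Ages back to late antiquity.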

I think this preprint is coming close to the answer. Why does a small ethno-religious minority in Europe matter? Well, that’s a matter of historical contingency.

In any case, there were some good papers on Ashkenazi Jewish genetics which came out in the spring of 2010. They really moved the ball forward from the uniparental work. But they suffered from two major problems. First, the putative “parent” populations of Ashkenazi Jews are not that genetically distinct. Second, the hypothesized parental populations were often implausible; e.g., Northern Europeans and modern Levantines.

The likely parental populations of Ashkenazi Jews are Roman period peoples of the eastern Mediterranean, particularly the swath of territory from Alexandria up to Anatolia, and the peoples of the western Mediterranean. That is, Levantines and Iberians & Italians. These two groups are distinct, but they’re not that distinct.

Additionally, the more we learn about the Middle East, the more likely it seems that Muslim populations, who are often modeled as a parental group, are highly cosmopolitan compared to ancient groups. Recall that Neolithic farmers from the Levant resemble Sardinians more than they do locals, because of later migration from further east in Eurasia, as well as later African gene flow. Using imperfect reference populations will probably skew the results accordingly.

The major change in the past few years is the usage of more genetic information than common genotypes. This paper, for example, looks at haplotype information, that is, sequences of variants across the genome. This preserves more recent genetic variation. In other cases you can look at whole genome sequences, and focus on low frequency variants which are extremely informative of recent population differentiation.

Ultimately the only reason I’d suggest that this paper is lacking is the imperfection of Middle Eastern source populations. That’s probably increasing the European and decreasing the Middle Eastern fraction somewhat on the margins. The contemporary populations of the Near East have changed a fair amount over the past 2,000 years, though there is still some continuity.

Open Thread, 7/18/2016


Been busy with work. Lots of data coming in. Will be good to turn around some science.

But I’m eating OK. Location matters….

Here’s a FB post from a researcher on Eran Elhaik’s weird results which regularly make press. I’ve started ignoring Elhaik’s stuff because it’s always so crazy.

I’ll try to monitor the open thread better this week and respond to questions.

The world after the great mixing

In my free moments I have been reading R. Scott Bakker’s The Great Ordeal, as I needed to take a break from Congo: The Epic History of a People (I stopped before the Great War). As you might guess the latter is not a ‘feel-good’ work. And to be frank, The Great Ordeal is probably not the best choice to lighten the mood as a change of pace. It is one of the darkest and philosophically textured examples of the fantasy genre I’ve ever encountered, but that’s not surprising given Bakker’s previous works, and his background as an academic philosopher. Though the series does not indulge in as much graphic and visually rich descriptions of death and gore as George R. R. Martin’s A Song of Ice and Fire, it’s more deeply haunting and horrible. If Martin deals in shades of gray, from the honorable lightness of Jon Snow to the black depravity of Ramsay Bolton, Bakker’s characters seem to be swallowed by a blankness of color. Amorality rather than immorality.

Martin is a master of creating vivid characters with deep color who operate in a world of frenetic and engaging activity (at least up until the third book, when the plot was relatively fast). In contrast, Bakker’s plotting and characterization are both inferior, but that is in part because he gives more space over to a broader philosophical and moral framework, which hangs heavily over the whole narrative. Golgotterath and the Inchoroi are more memorable to me, alive in my imagination, than assorted protagonists swept up along the tides of history over the course of Bakker’s five books so far.

Where R. Scott Bakker excels, and where he rivals Tolkien in my opinion, is world building on a cosmic scale, complete with a well thought out mythos for humanity in his Secondary World. Bakker’s vision exhibits a great deal of verisimilitude, traversing humanity’s Bronze Age to the medieval period in ~4,000 years. The main actors within the narrative action are people from three of the races of men, of whom there are five total, and whose history goes back to an event termed the Breaking of the Gates, as humanity streamed into the western portion of the continent on which they reside, and engaged in a campaign of genocide against the Nonmen and their human servile caste, the Emwama.

Why am I regaling you with the narrative of a fantasy book series? Because the recent results out of ancient DNA and historical genetic inference of human prehistory suggest that the ‘make-believe’ narratives of epic fantasy may actually be an appropriate model of the formation of human populations in the wake of the Holocene. A friend of mine half-seriously quipped that the last 200,000 years of human history are a matter of collapsing ancient population structure. In fantasy novels often main characters themselves are exemplars of such broken population structure; the ‘half-blood’ trope as it were.

As a primal and backward looking genre fantasy dispenses with the need for a liberal individualist ethical framework, as historical relativism allows us to “put ourselves in the place” of protagonists whose motives and concerns are profoundly alien to moderns, albeit often with a sympathetic and contemporary twist. Jon Snow’s life to a great extent is motivated by his need to prove himself despite his bastardy. The specific motivation here would be hard to understand today, as legitimacy is not legally or normatively privileged as it has been historically, but the general need to find a place for yourself is one we can empathize with. Snow’s situation within a world of great noble houses and warring polities divided by region and language is one which most moderns are not comfortable with, but he is no revolutionary who yearns to overthrow the old regime. On the contrary, he is likely to play a large role in its maintenance and perpetuation.

The meteoric rise of individuals from a humble station in the context of a static and hierarchical world is not an aberration on a world-historical scale. Sargon of Akkad, the first recorded emperor, whose dominion spanned multiple polities, was from a humble background. Gilgamesh, the scion of a noble family, may be semi-mythical, but Sargon was a real person. On the edge of history, but a real person. In a world of corporate entities, defined by group identity, affinity, and affiliation, his success occurred through co-option of a system of city-states with roots over 1,000 years old at that point.

Sargon’s world is one whose outlines we are only vaguely aware of. There are many lacunae, not least of which is the origin of the Sumerian people, who served as cultural progenitors to Sargon’s Akkadians. Their language was an isolate, and the origin of the Sumerians is an unresolved mystery to this day. The end of the Sumerian cultural hegemony occurred in part due to the depredations of the Gutians, people from the hills of what is today Kurdistan, and rivalry with the people of Elam, from modern day Khuzistan. The linguistic affinities of the Gutians are unknown, while the Elamites, like the Sumerians, seem to have spoken a language isolate.

Much of this ignorance has to do with the importance of literacy in history. What we know about Elam is often through a Mesopotamian lens. The people of Sumer and Akkad, and later Babylonia and Assyria, saw Elam as the great enemy, the Persia to their Rome. The Gutians were a coalition of tribes from the mountainous areas to the east of Mesopotamia, and so had no real indigenous literate tradition. They do not even seem to have a distinctive enough archaeological tradition to trace their migrations.

Without text and material where does that leave us? Obviously we have a new method: ancient DNA. With this method one can infer demographic change by looking at patterns of genetic variation. The genetic relationship of various peoples who are “mysterious” to us today with modern populations will give us great insight. I predict that when the first results come back from Elamite Iran there will be a strong affinity to peoples in southern Pakistan, especially the Baloch and Brahui, as well as connections to India more broadly, above and beyond the expected local continuity.

Last week Science published a new paper on ancient Iranian genomes, from a period thousands of years before what I discussed above, Early Neolithic genomes from the eastern Fertile Crescent. It’s open access, so you can read it yourself, and I encourage you to do so.

What makes this paper different from what has come before? Two things. The first is minor: better sampling. In particular, they have better regional sampling. For example, Iranian Zoroastrians (the link has plink format files). Second, and more important, they have at least one sample at 10x or more coverage. This means they can use haplotype based methods and make better calls on genotypes. It’s much more extensive in the supplements, but the authors discuss the functional characteristics of these populations more than in the earlier papers because of access to higher quality whole genome data. You need to be more confident at a specific locus when inferring function from that locus, than you need to be across the whole genome.

The phylogenetic portion reinforces what the earlier work argues: there were two great tribes of founding farmers who brought agriculture to North Africa, and Western & Southern Eurasia. Though the “cradles of civilization” were often in riverine landscapes, the agricultural revolution began in the Near East in the uplands, which would later become backwaters. Only here could primitive dryland agriculture take root in the desiccated landscape. This was the “Breaking of the Gates”.

There were, it seems, two major phases. The first phase was expansionary. The western farmers pushed outward to Europe and North Africa. The eastern farmers pushed toward South Asia and Central Asia. But look at the position of Iranians in the PCA, and the affinities within Iran. Modern Iranians are much more west shifted than you might expect from perfect continuity. Additionally, the haplotype affinities of populations to western vs. eastern farmers show that Iranians today have much more affinity to western farmers than Iranian speaking people from Pakistan, especially the Baloch and Makrani in the southwest of the country. This is because there was a second phase: the great scrambling, when reflux from the west into Iran, and vice versa, erased the great division.

In the initial expansionary phase a stylized model was probably as good as any model. The world was dominated by hunter-gatherers, whose social-political ability to scale and organize was minimal. The farming populations probably began to organize chiefdoms rather early, and the spread of their lifestyle was to some extent at the tip of the spear. The hunter-gatherers fled, or were rapidly assimilated as subordinates, losing their cultural distinctiveness. But the next stage after the chiefdoms was more complex arrangements, which might transcend tribal loyalties, especially when one’s tribe spanned a continent.

A close look at the map shows that the Baloch and Sardinians have more affinity with these two ancient peoples than many of the groups which today occupy the Middle East. Why? Mostly because they are distinctive in being less subject to the reflux migrations in the wake of the Neolithic. And, if you look at Europe and South Asia, you can see that Indo-Europeans also left a stamp on these areas, by mediating gene flow from these tribes into areas where the other tradition had been dominant. Northern Europe is less biased toward western farmers than Southern Europe. Within South Asia, the group most skewed toward eastern farmers is the Baloch, who happen to co-inhabit territory with a non-Indo-European speaking population, the Brahui. These Dravidian speakers are basically indistinguishable from the Baloch. Among the other groups, the Vishwabrahmin are biased toward eastern farmers. In contrast, the Tiwari, North Indian Brahmins, are more balanced. I believe this is because the Indo-Aryans brought western farmer ancestry with them from the steppe.

Rather than talking about the phylogenetic aspects any more, I want to move to the functional considerations. It seems that the ancient eastern farmers did not have many of the adaptations that we associate with farmers. This is entirely logical. Much of our genetic character is the product of cultural changes, rather than cultural changes being the product of our genetic character. The null hypothesis should be that hunter-gatherers who had just taken to farming are basically like hunter-gatherers who adopted a new lifestyle.

But there are some intriguing elements of the pigmentation genetics, a topic I know a fair amount about. The results from this paper show that the derived variant of SLC24A5, the largest effect pigmentation allele we know of, was segregating in these farmers. This is not surprising. It was segregating in western farmers at high frequency as well, and also among Caucasus hunter-gatherers, and even among hunter-gatherers from Mesolithic Sweden. It was, though, not so much found among Western European hunter-gatherers. The derived variant is fixed in Europe today. Curiously, the authors mention that the derived allele of SLC45A2, another skin-lightening variant, which is much more concentrated in Europe, has been found segregating in Neolithic Aegeans. So it may be that the two major skin-lightening alleles were introduced by western and eastern farmers. Finally, the allele known to produce blue eyes in Europeans, found in high frequencies in Mesolithic European hunter-gatherers, was also found segregating in WC1. WC1 is the highest quality genome in their ancient data, so this seems a likely inference.

What this tells us, I think, is that skin-lightening alleles have been segregating at appreciable frequencies for a long time. They have a deep history. Periodically, a particular haplotype gets targeted for selection, and a sweep occurs. Personally, I am more and more leaning to the hypothesis that a diversity of functions and characteristics are the targets of this selection, with the phenotype often being a side effect. What is even more intriguing to me is that peoples as distinct as Sardinians and Baloch don’t actually look that different physically. The great reflux even affected them, and with it perhaps came alleles which were selected upon and produced a relatively uniform phenotype from the Atlantic to the Indus?

Much of the prior understanding of history and prehistory has been driven by a banal and workaday conception of progress and change. Proponents of demic diffusion imagined stateless villagers pushing outward. Diffusionists assumed that techniques and material would flow along trade routes. There were no great disruptions, rather, there were evolutions and continuities.

That is not what ancient DNA tells us. In another context I’ve mentioned that ISIS is appealing to some because of its “heroic” narrative. Similarly, the origins of modern humanity may be much more heroic than we’d have thought. We are the descendants of humans who crossed into Australia. The descendants of humans who finally made it to the New World. Would it be any surprise that nearer prehistory was as ground-breaking and tumultuous?

Open Thread, 7/10/2016

I’ve been in upstate New York, working this week. So busy. Should take a break to crank out some blog posts. In particular, probably “How to read an admixture estimate”, since even after so many years readers are confused….

While I’ve been holed away, Pokemon Go happened. What?

The Great Ordeal, the third book in R. Scott Bakker’s Aspect-Emperor series, is going to come out in two days. #Excited

Though who knows when I will have time to read it?

The Great Human Disruptions

One can appreciate a work of art on two levels. When one beholds the sculpted renderings of the Classical Greeks, across the distance of more than 2,000 years we can feel viscerally that they have touched something beautiful, and made it stone. To reduce this to biology, our perception maps onto deep grooves in our evolutionary landscape of aesthetic judgments. As a savanna ape the darkness of the forest haunts us with its beauty and majesty; but we are the children of the meadows and edges of the Paleolithic pastoral. Similarly, on some level we acknowledge physical beauty when we see it, before we even think it.*

Another level of appreciation is narrower, and that is one where you have awareness of the ingenuity of technique, the deep virtuosity and fluency of execution. This aspect of understanding aesthetics is naturally delimited to those with equivalent skills, or whose skills aspire toward the plane of the masters.

Reading Iosif Lazaridis’ The genetic structure of the world’s first farmers you can evaluate on both levels. The results are broadly accessible, but the depth of the analysis is clear to anyone who has ever attempted something analogous. These papers coming out of David Reich’s lab have a certain template, but they are definitely not paint-by-numbers. For those who are interested in technical details, you have to read the supplements.

Ten years ago the insights gleaned from this preprint were only glimmers in the eyes of assorted researchers and “genome bloggers.” The problem now is one of going from the raw result, back to the dynamics which produced the result. A deep problem of inference.

To get to where we are now, and the embarrassment of copious conclusions, researchers needed three things:

1) Lots of genetic data, and methods designed to leverage that data (basically, genomics, and the statistical genetics geared toward analyzing large data sets).

2) Genetic data from time points in the human past, and not just present.

3) The technological infrastructure necessary to handle the data (from computational power to the arcane arts of the ancient DNA lab).

What have we learned? Ancient DNA has revealed that genetic variation in the human past has been characterized by very strong discontinuities, both over time and space. What do I mean by this?

As a stylized fact it has been fashionable in some quarters to describe human variation as being overwhelmingly clinal. That is, a continuous change in gene frequencies as a function of space. One associated fact has been the expectation that gene frequencies will change over time in a similar steady and regular fashion.

Obviously there is some truth to the clinal variation in our species. If, for example, you walked from France to the Punjab, it would be difficult to establish a hard-and-fast line where there was a definite discontinuity in genes. But there could be candidates. In particular, in Central Asia there would be regions where you would find rather high frequencies of alleles more typical of East Asia, while in Afghanistan the genetic signatures of non-West Eurasian peoples of a different sort, typically found in South Asia, would start to crop up.

But these two points of discontinuity illustrate the general principle that discontinuity emerges from specific historical-demographic events. In the case of the rather high fraction of East Asian associated genes in Central Asia, this is almost certainly a product of the Turkic expansion, which occurred in fits and starts over the ~1,000 years between 500 and 1500. In South Asia, we now suspect that there was a relatively recent intrusion of West Eurasian populations, and likely some reciprocal gene flow between indigenous groups and the incomers.

These two instances point out that major disruptions in gene flow are likely correlated with major cultural disruptions. The Turkic expansion occurred in historical time, so we can inspect it and note that the decline of Iranian populations within Central Asia began during the late Sassanian period, but came to near completion with the major shocks of the Mongol period 700 years later. These were events of geopolitical note.

This is important to consider, because the older models which posit clinal variation assume that genetic change occurs through a ‘mass action’ process, whereby small family or village groups enter into a phase of demographic expansion, and literally outbreed others. This was to some extent the model implicit in the ‘demic diffusion’ theories of the expansion of the Neolithic lifestyle into Europe from the Near East, pioneered by Colin Renfrew, and extended by L. L. Cavalli-Sforza and colleagues.

In a classical economic framework one can simply assume that those who practice the farming lifestyle will be in a state of land surplus on the frontier. Therefore, they will have large families, and keep expanding their range. In such a fashion individual decisions of Homo economicus can drive cultural and demographic change over large regions in relatively short time periods.


The decisions of the many in an uncoordinated fashion can lead to the ordered patterns we see around us, with clines of variation, as well as signals of genetic expansion. As L. L. Cavalli-Sforza noted the argument here is not that most of the ancestry of modern Europeans is exogenous to the continent when using Pleistocene groups as the indigenous reference, but that the demographic wave of advance is responsible for agriculture, not cultural emulation. Even with this wave of advance model, which has been widely explored in population genetics, assimilation of native groups on the frontier means that most of the ancestry on the frontier by the end of the process could be “indigenous.”

Cavalli-Sforza’s assertions came in the wake of a series of results in the early 2000s which were interpreted to suggest that most of the ancestry of modern Europeans derives from populations resident during the Pleistocene. These results were taken to suggest that agriculture must have then spread by cultural diffusion, not demographic expansion. All Cavalli-Sforza was pointing out was that the model he was supporting was about a dynamic process, not some specific value of haplotype counting by region.

Ultimately this rearguard apologia was not necessary. It turns out that a majority of the ancestry of modern Europeans is likely exogenous to the continent over the last ~10,000 years. The earlier results which were used to support the converse were right in their results, but were misinterpreted. Additionally, I also think that the model outlined by Cavalli-Sforza and his colleagues is in some ways too elegant and stylized to be useful. If you read War Before Civilization there are plenty of archaeological hints that there were massive inter-group conflicts during prehistory, and the arrival of farmers to the continent probably exhibited some coordination and collective action beyond the village. The 3,200 year old battle on the Baltic is probably the continuation of a long tradition in Europe, and the world, of collective action and conflict.

This is a “problem” because inter-group conflicts on a geopolitical scale are not as tractable in terms of a general model as a “wave of advance” demographic scenario where endogenous growth parameters rule supreme. Rather, demographic patterns are not due to continuous predictable dynamics, but the intersection of such parameters and contingent events. History has no guarantees, though its wheels tend toward certain favored grooves.

Twenty years ago L. L. Cavalli-Sforza wrote a book geared toward the lay audience, Great Human Diasporas. The culmination of a lifetime’s work, it surveyed what we then knew about human genetic variation with classical markers derived from contemporary populations. The tools we have today are far more precise, with hundreds of thousands of markers rather than hundreds, and DNA samples from populations thousands or tens of thousands of years in the past. Instead of simply inferring the tree of life, researchers are now constructing a lattice of relationships derived not only from the nodes visible today, but also positions within the lattice from the deep past.

The evidence which is coming back is that pre-modern populations exhibited a great deal of genetic differentiation over even small distances, and that differentiation could persist for thousands of years. Between-group proportions of variation on the order of 10% of the total variance, what you see between Europeans and Han Chinese, were not atypical for nearby peoples, even though one migrant between them per generation would have eliminated that difference in short order. This equilibrium of difference would eventually get disrupted by radical demographic turnover, as local populations went extinct or were absorbed by newcomers, who reshaped whole landscapes through their expansions. In other words, if Cavalli-Sforza were to write a book today I believe it would be titled “Great Human Disruptions and their Diasporas.”
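To make the “10% of the total variance” figure concrete, here is a minimal sketch of Wright’s Fst at a single biallelic locus for two equally weighted populations, computed as (Ht - Hs) / Ht. The allele frequencies are invented for illustration and are not taken from any dataset; real estimates average over many loci and correct for sample size.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1, p2):
    """Wright's Fst = (Ht - Hs) / Ht for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0                                # pooled allele frequency
    h_t = expected_het(p_bar)                              # total heterozygosity
    h_s = (expected_het(p1) + expected_het(p2)) / 2.0      # mean within-population heterozygosity
    if h_t == 0.0:
        return 0.0                                         # locus is fixed everywhere: no variance to partition
    return (h_t - h_s) / h_t


# Hypothetical frequencies of one allele in two neighboring populations.
print(round(fst_two_pops(0.35, 0.65), 3))  # roughly 0.09
```

With these made-up numbers a frequency gap of 30 percentage points at one locus yields an Fst near 0.09, which gives a feel for how large the differences between neighboring groups would have to be.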

And this isn’t just about agriculture. Ancient DNA from Pleistocene Europe indicates turnover there too. There may be meta-population dynamics which are at work on the edge of the modern human range in Eurasia. As local populations go extinct, new populations expand to occupy their territory. The ancient human landscape may have been relatively sparsely populated, diminishing opportunities for gene flow.

But this is likely not the whole story. Inter-group conflict certainly played a role, and ancient DNA has uncovered evidence of long periods of genetic distinctiveness between neighboring populations. This suggests cultural practices serving as a barrier to gene flow. We do have one case where this occurs today: India. The caste system is such that continent-wide genetic distances can be found within local populations in the same region, which have coexisted for thousands of years.

So what are the results of the Lazaridis et al. paper? The figure at the top gives you a PCA-centric view. Basically, all West Eurasian populations today can be modeled to a first approximation as a mixture of four ancestral groups which flourished on the order of ~10,000 years ago. If modern genetic variation can be conceived of as an algebra, then for West Eurasia these are the four variables with differential weights you need to produce any reasonable output.

The four are:

1) Western hunter-gatherers (WHG), the indigenous populations of Europe and surrounding areas.

2) Eastern hunter-gatherers (EHG), the indigenous populations of the northeastern fringe of Europe.

3) Western farmers, the ancestors of Early European Farmers (EEF), with roots in the zone from the southern Levant north into Anatolia.

4) Eastern farmers, who are rooted in populations which flourished in the Zagros mountains of western Iran (Central Asian Farmers, CAF).
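The “algebra” of four variables with differential weights can be sketched as a simple linear mixture: the expected allele frequency in a present-day population is the weighted sum of the frequencies in the four sources. This is only the forward model, not the fitting procedure used in the paper, and every frequency and weight below is a hypothetical placeholder.

```python
SOURCES = ("WHG", "EHG", "EEF", "CAF")


def mix_frequency(weights, source_freqs):
    """Expected allele frequency of a population modeled as a mixture of the four sources."""
    if abs(sum(weights[s] for s in SOURCES) - 1.0) > 1e-9:
        raise ValueError("ancestry weights must sum to 1")
    return sum(weights[s] * source_freqs[s] for s in SOURCES)


# Hypothetical allele frequencies at a single SNP in each ancestral group.
snp_freqs = {"WHG": 0.10, "EHG": 0.20, "EEF": 0.60, "CAF": 0.80}

# Hypothetical ancestry weights for two made-up modern populations.
pop_a = {"WHG": 0.25, "EHG": 0.25, "EEF": 0.50, "CAF": 0.00}
pop_b = {"WHG": 0.00, "EHG": 0.10, "EEF": 0.30, "CAF": 0.60}

print(mix_frequency(pop_a, snp_freqs))  # about 0.375
print(mix_frequency(pop_b, snp_freqs))  # about 0.68
```

Inferring the weights runs this in reverse: given observed frequencies at many SNPs, find the weights that best reproduce them.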

These four themselves exhibit some compound ancestry. On the order of half the ancestry of EEF and CAF was basal Eurasian (BEu), a population which seems to have diverged from other non-Sub-Saharan Africans more than 50,000 years ago, before Neanderthal admixture. To be clear, BEu seems to be an outgroup to populations as diverse as Pleistocene European hunter-gatherers, Australian indigenous groups, and Andaman Islanders. The other half of EEF and CAF ancestry derives from two distinct sources, which explain their different positions on the PCA plot. The EEF have a WHG-like admixture. That is, some of their ancestors are nested within the broader clade which includes European hunter-gatherers, and far more distantly the Ancestral North Eurasians (ANE). Work on Pleistocene genomics indicates that there was a major increase in affinity between European hunter-gatherers and Near Easterners ~15,000 years before the present, suggesting that there was major gene flow uniting these two regions. The Near Eastern element of this movement probably fused with BEu.

Second, the CAF population, which is known from far fewer samples, seems to have shared a lot of ancestry with EHG, so the two must have shared common ancestry from related groups. It seems that the most likely source of this was ANE. Due to the genetic distance between ANE and WHG, the Fst between EEF and CAF was on the order of ~0.10, similar to that between Chinese and Europeans today. These two groups seem to have stumbled upon agriculture very near to each other at similar times.

Were they independent events? I suspect that they weren’t. I’m not implying cultural diffusion here. There is evidence of independent domestication of landraces in the Zagros. Rather, these two populations were part of a broader network of trade connections within a similar ecological landscape. It was not coincidental that both stumbled upon agriculture. Likely there was diffusion between the two of similar cultural precursors to agriculture. Their location in such proximity cannot be coincidence, though the details are to be worked out.

Interestingly, once these two populations stumbled onto agriculture they expanded in opposite directions. Why? Probably because they could. That is, both of them had high population densities and social complexity, and rather than impinging upon each other’s territories they expanded into “empty” landscape: regions inhabited by hunter-gatherers who were easier to eliminate or assimilate. The spread of Cardial and LBK people in Europe was so fast that it is almost certain that they were all one cultural unit initially. Something similar probably applies to the CAF groups which expanded east into South Asia, and north to the steppe.

Another intriguing result in this paper is that WHG themselves seem to have had admixture from eastern populations. More precisely, the Mesolithic hunter-gatherers used in earlier analyses as “pure” exemplars of WHG turn out not to be, but exhibit some admixture with other groups. This is probably why the ANE proportion of EHG is much higher in this paper. An older sample from Bichon in Switzerland lacks the eastern admixture, and so serves as a better reference for WHG. Though not definitive, it now looks as if ANE admixture into East Eurasians (e.g., Han Chinese) has resulted in some affinity between these populations and Europeans today, going back to, but not limited to, the WHG. This is no surprise. The emergence of agriculture is not singularly new; cultural innovation seems to trigger demographic disruptions, no matter the time or place.

Though the centerpiece of this preprint is the fact that four populations are sufficient to explain the genetic variation, and demographic history, of West Eurasian populations, I think perhaps a more interesting element is the role of ANE and BEu. Neither of these groups exists in “pure” form today. We don’t know who BEu were. We don’t know where they came from. To me it is suspicious that BEu ancestry exists in about the same fractions in both EEF (at least their precursors in the Middle East) and in CAF. It does not seem that the two BEu components were very differentiated. To me that indicates that BEu may have expanded relatively recently. I also believe that BEu may have a role mediating “back to Africa” gene flow. As BEu lacks Neanderthal admixture, that would explain the very low levels in most of the continent, and yet the presence of what now look to be Eurasian origin E Y chromosomal haplotypes.

As for the ANE, their geographic coverage is incredible, from Western Europe all the way to the New World. It seems that as an unadmixed group they persisted into the Holocene, but in numbers they were always stretched thin. Through their amalgamation into agriculturalists they’ve persisted, but likely many of their Paleo-Siberian folkways diminished. I do believe though the R1 haplogroups on the Y chromosome likely derive from them, as it is a sister to the Amerindian Q.

There’s a lot in the paper to chew on, especially the supplements. For example, the percentages of “steppe” ancestry are non-trivial throughout South Asia. What to make of this? I think I’ll hold off until ancient DNA comes in, as it will in the next 6 months.

But, I do think ancient DNA and the model of disruptions and discontinuity supports the proposition that punctuated equilibrium as a thesis has much more validity for cultural evolution than it does for biology. Cultures exhibit inertia and a tendency toward conformity. Learning new things is difficult. Very special conditions must have existed for agriculture to “take” in the Near East, as hunter-gatherers shifted from facultative cultivation to obligate modes of production of crops. Once these cultures became farming cultures, it wasn’t easy for neighbors to adopt them, as cultural packages often come as a whole, with many contingent parts. The advantage of agriculture is that it extracts more yield from the ground, and population densities go up. Higher population densities mean more resources in inter-group conflict, if it comes to that, and the need to expand to continue to race beyond the Malthusian limit. Once the space is occupied, a new equilibrium is reached.

And I want to reiterate, this model does not apply to just agriculture. A tweet from Spencer Wells:

New Guinea is a horticultural society, and the highlands are very densely populated. The high Fst is in line with what you see in early Holocene Western Eurasia, or in India today. But observe that the genetic differentiation is from the past 10,000 years, not the past 50,000. 10,000 years ago is when horticulture began. In all likelihood one population in the highlands began to practice this, and expanded demographically, eliminating or absorbing its neighbors. But the landscape of New Guinea sets tight limits to the range of possibilities, as the highlands of the island are a very isolated ecosystem. The genetic differentiation began once the expansion phase ceased, and groups began to struggle for existence at the Malthusian limit.

One of the insights of Lazaridis et al.’s paper is that this didn’t happen in Eurasia. The differences between EEF and CAF diminished, as the Near East saw reciprocal gene flow during the Bronze Age. The difference was not in agriculture, but post-agricultural social complexity, which allowed for the emergence of what Peter Turchin would term “meta-ethnic” identities, and complex institutions which transcend locality. In the new equilibrium state the Fst did not begin to go up as populations jostled for resources, as innovation began to gently push the production frontier outward, and foster connections of material (e.g., trade) and ideas (e.g., religion).

The whole story is not written in stone yet. The next few years are going to be interesting. China is the next frontier, and ancient DNA will open up its history soon enough. But it’s an exciting time to be witnessing the unveiling of prehistory before our eyes.

* There is a dimension of aesthetic judgement which is culturally conditional, and another which is not. I speak of the latter here.

Open Thread, 7/3/2016

The Great Ordeal, the third book in R. Scott Bakker’s Aspect Emperor series, is going to come out in nine days. Bakker is apparently working on revisions to the fourth book, The Unholy Consult. So this series will complete (apparently Bakker’s original vision was for three related sequential series, so this would be the second of the three). There has been a large gap in time between the second and third book, but it seems mostly due to issues relating to publishing, and not the writer.

You may be interested in his blog, which has assorted links and comments. Also, there is a history of Earwa online which you might find interesting. The artwork is very evocative.

Bangladesh Attack Is New Evidence That ISIS Has Shifted Its Focus Beyond the Mideast. Apparently the attackers were educated at an English immersion school of some sort. Spoken Bengali, like British English, is very sharply differentiated by class. The menial workers at the bakery were apparently surprised at the obviously upper middle class speech patterns of the terrorists, as well as the fact that some of them spoke casually in English.

This is not surprising. Terrorist ideologues are often from privileged backgrounds. Marc Sageman has presented the most thorough ethnographies that I know of, and they’re rather clear. Generally middle to upper class, often with a technical background. Also a disproportionate number of converts. Many people with cosmopolitan backgrounds, in terms of having exposure to other cultures and traveling.

As for why radicalism is cropping up in Bangladesh, it’s because they know that they’re in the middle of a culture war they might lose. There are atheists and gays in most countries, but in Bangladesh they are starting to be public. That’s a sign that the norms of conservative Islam are breaking down. Additionally, economic development and NGO influence are also integrating the nation into a web of international commerce.


Michael Cimino, Director of ‘The Deer Hunter’ and ‘Heaven’s Gate,’ Dies at 77.

Growing Pains for Field of Epigenetics as Some Call for Overhaul. The correction.

This robot-powered burger joint could put fast food workers out of a job.

3 reasons the American Revolution was a mistake.

Democratizing DNA Fingerprinting.

Multi-layered population structure in Island Southeast Asians.

10,000 genome at 30x & human variation

Deep Sequencing of 10,000 Human Genomes:

We report on the sequencing of 10,545 human genomes at 30-40x coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single nucleotide variants in the coding and non-coding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries in average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.

The 30x means that they’re hitting each base an average of 30 times, so they can be very confident of their call. This matters a lot for rare variants, as might be useful when it comes to idiopathic diseases. The 10,000 number is obviously to take it a step beyond the “1,000” genomes, which went well above 1,000 genomes in any case. But the coverage means that these are very confident calls for any given individual.
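To get a feel for why average depth matters, here is a rough sketch that treats the number of reads covering a base as Poisson-distributed around the mean coverage. That is an idealization I am assuming for illustration: real sequencing depth is more uneven than Poisson, and the threshold of 10 reads is an arbitrary placeholder, not a figure from the paper.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean); returns 0.0 when k < 0."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


def prob_depth_at_least(min_depth, mean_coverage):
    """Probability that a given base is covered by at least min_depth reads."""
    return 1.0 - poisson_cdf(min_depth - 1, mean_coverage)


# Compare a hypothetical low-coverage design (4x) with the 30x design.
for coverage in (4, 30):
    print(coverage, prob_depth_at_least(10, coverage))
```

Under this toy model almost every base at 30x clears a 10-read threshold, while at 4x almost none do, which is the intuition behind confident individual-level calls.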

A distribution of variants shows that their panel of unrelated individuals (~8,000) yields ~150,000,000 single nucleotide variants (out of a genome of 3,000,000,000 bases). You see that half of these 150 million are found at counts of one across their whole sample set. In contrast, you have ~5 million variants present at allele frequencies of about 5% or more, and a bit more than ~10 million variants at 1% or more, and ~20 million variants at 0.1% or more. Remember that the 1000 Genomes paper reported that each individual within their data set has ~5 million variants in comparison to the human reference genome.
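The arithmetic behind these figures is simple enough to spell out. The sketch below uses only the rounded numbers quoted above and assumes autosomal diploid sites, so that ~8,000 people carry ~16,000 copies of each position.

```python
GENOME_BASES = 3_000_000_000      # approximate genome size quoted above
TOTAL_SNVS = 150_000_000          # single nucleotide variants in the panel
UNRELATED_INDIVIDUALS = 8_000     # approximate panel size

# Fraction of positions that vary somewhere in the panel (about 1 base in 20).
variable_fraction = TOTAL_SNVS / GENOME_BASES

# "Half" of the variants are seen exactly once in the whole sample.
singleton_count = TOTAL_SNVS // 2

# Assumption: diploid autosomes, so two copies of each site per person.
chromosomes = 2 * UNRELATED_INDIVIDUALS
singleton_frequency = 1 / chromosomes

# Share of all discovered variants that are common (frequency of ~5% or more).
common_share = 5_000_000 / TOTAL_SNVS

print(variable_fraction, singleton_count, singleton_frequency, common_share)
```

So a singleton here sits at a sample frequency of roughly 1 in 16,000, and the ~5 million common variants are only about 3% of everything discovered.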

I reiterate these dull numbers to give people a sense of what it means to have 100,000 to 1 million marker SNP-chips in humans. It is true that without imputation these chips aren’t capturing a lot of functional variants (though they’re typically designed to target a lot of the most important disease markers in particular). But when it comes to capturing the shape of genetic variation they’re a very good sampling indeed. Consider, for example, the proportion and number of voters who are part of the sample for exit polls or pre-election surveys. For standard PCA or genotypic model based clustering (e.g., ADMIXTURE/STRUCTURE) anything more than 1 million markers is pretty useful from what I’ve seen, and the 100,000 to 500,000 interval is sufficient for pretty much everything. And haplotype based methods that generally use phasing, like fineSTRUCTURE, seem to do fine in the ~250,000 marker range.
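The polling analogy can be made a bit more concrete with the textbook margin of error for a sampled proportion, which depends on the size of the sample rather than the size of the population. Applying it to SNP counts is my own loose illustration: it treats markers as independent draws, which linked SNPs are not, so the effective sample is smaller than the raw count.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n independent draws."""
    return z * math.sqrt(p * (1.0 - p) / n)


# A typical poll size, followed by the SNP-chip marker counts discussed above.
for n in (1_000, 100_000, 500_000, 1_000_000):
    print(n, round(margin_of_error(n), 5))
```

The error shrinks with the square root of the sample size, so going from 100,000 to 1 million markers buys roughly a threefold improvement on a quantity that was already small.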