Is American genetic diversity enough?


In the nearly 20 years since the draft of the human genome was completed,* we’ve moved on to bigger and better things. In particular, researchers are looking to diversify their panels of human genetic diversity, because differences between groups matter. You can’t just substitute one group for another genetically.

There have been recent efforts to diversify population panels, but that prompts the question of whether coverage of American populations is sufficient. My first thought is that the genetic diversity in the USA is probably getting us 90% of the way there. Consider Spencer’s comment about Queens: it’s the most ethnically diverse large conurbation in the country.

There are some gaps, though. In Who We Are, David Reich points out the distinctiveness of Indian population genetics. The subcontinent has many populations with large census sizes in which long-term endogamy has allowed deleterious alleles to drift upward in frequency. And many of these populations aren’t strongly represented in the diaspora.

In contrast, much of the rest of the world is panmictic enough that an American panel can pick up most of the variation. American Chinese are skewed toward Guangdong and Fujian, but a substantial number of people from other parts of China have arrived in the last generation. Regional structure is not so strong that you’ll miss out on too much, aside from very rare variants, which are distributed at the scale of extended pedigrees rather than populations.
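To make the point about very rare variants concrete, here is a back-of-the-envelope sketch. Assuming a random sample of unrelated people and no population structure, the chance that an allele at frequency f shows up at least once in a panel of n diploid individuals is 1 - (1 - f)^(2n). The panel size and frequencies below are hypothetical, chosen only for illustration.

```python
def prob_variant_observed(freq: float, n_individuals: int) -> float:
    """Probability that an allele at frequency `freq` appears at least once
    among the 2 * n_individuals chromosomes of a random diploid sample.
    Assumes independent draws (no population structure, no relatedness)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chromosomes


if __name__ == "__main__":
    panel_size = 1000  # hypothetical panel of 1,000 people
    for f in (0.05, 0.01, 0.001, 0.0001):
        p = prob_variant_observed(f, panel_size)
        print(f"allele frequency {f}: P(observed at least once) = {p:.3f}")
```

With a hypothetical 1,000-person panel, alleles at 1% or higher are essentially certain to be seen, while an allele at 1 in 10,000 shows up only about 18% of the time. Real diaspora samples are structured and include relatives, which makes locally concentrated variants harder still to catch.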

There are small populations such as the Hadza, Khoikhoi, and Pygmies in Africa which are probably going to be missed by American population panels, but the total census size of these groups is pretty low (for comparison, there are 1 million Pulayar Dalits in the state of Kerala alone). Much of the rest of Africa’s variation is West African and well represented in African Americans, and Bantu and Nilotic variation is probably captured by immigrant communities.

I’d propose supplementing American genetic diversity with sampling Cape Coloureds in South Africa.

* No discussions about how the genome isn’t totally complete. I know that.

Open Thread, 04/29/2018

One of the strange things about getting old is that your friends start to become kind of a big deal. Matthew Hahn has a new book out, Molecular Population Genetics. If there is one single reason I keep blogging, it’s to spread awareness of population genetics beyond the small circle who are “in the know.” I joked on Twitter that buying this textbook is like spending money to talk to Matt about pop-gen, and that’s surely worth it.

Another one for the stack!

Speaking of worth it, Kyle Harper’s The Fate of Rome: Climate, Disease, and the End of an Empire is definitely worth a read. I’m not done, and I’m not sure it’s better than The Fall of Rome: And the End of Civilization. Perhaps my issue is that exogenous shocks are to be expected in my view of the world. Though the details in The Fate of Rome are novel, the general thesis and framework are what I’d assumed were already taken for granted.

What Happens When Geneticists Talk Sloppily About Race. I don’t think that David Reich was sloppy…though the op-ed was edited in a way that was confusing. That being said, I’ve heard through the grapevine that some prominent human population geneticists may write a response to David’s op-ed, which is something I want to see. Part of me still thinks that these vigorous public discussions are important (another part of me just thinks that when Sulla or Marius takes over, all this old-fashioned fixation on truth will be irrelevant).

One thing stated in the piece above is that regular people have a Platonic model of race. This is true. But it is also a fact that geneticists have not done a good job of explaining to the educated public what population structure is, and why it’s neither trivial nor arbitrary. I know this from personal experience over 15 years of interacting with people about genetics online (some of the funniest interactions are on Facebook, where a person of professional-class background/status “genetics-splains” to me how I don’t understand the extent [lack] of human genetic variation and how arbitrary population cluster identity is).

With The Genomic Formation of South and Central Asia out, I obviously think we have the broad outlines of the peopling of South Asia in hand. There will be lots of detailed elaborations of how and what happened, but I think the big picture is nailed down.

That being said, some of the objections remind me a lot of Creationist tendencies. Creationists often focus on weak points and hammer on them over and over.

One of the weird things about Indian genetics is that a lot of people think new research will overturn Hindu nationalism. But I know several Hindu nationalists, and privately they tell me that most Hindu nationalists don’t care about these abstruse issues, and many of the more intellectual ones don’t have a major problem with the science.

GEDmatch, Ysearch and the Golden State Killer.

Anthropogenic habitat alteration leads to rapid loss of adaptive variation and restoration potential in wild salmon populations.

Bracketing phenotypic limits of mammalian hybridization.

A few people have asked about the podcast. We skipped a week, but we’ll be back. We’re taking feedback on various aspects of the show. A common complaint seems to be that my voice is too quiet, though Spencer’s is “just right.”

Again, if you use Stitcher or iTunes, please remember to give us positive reviews and 5 stars!

If you have ideas for shows, we’re game.

Why Bronze Age steppe people replaced the farmers they conquered

One of the major revisions in my own mind about the demographic and historical processes of the Holocene in relation to humans has been the realization that large and dense agglomerations of agriculturalists could be marginalized by later peoples, to the point of having a smaller genetic footprint in the future than anyone might have imagined. If you had asked me ten years ago, I just wouldn’t have believed that the first farmers of Europe or South Asia wouldn’t account for the vast majority of the ancestry of the contemporary populations of the region. By “first farmers” I don’t even mean migrants. At that point, I had assumed a primarily Pleistocene indigenous hypothesis for the origin of Europeans and South Asians, with farming diffusing through a mixture of a few migrants along a demographic wave of advance.

That’s not what it looks like according to ancient DNA. In Northern Europe, it seems that around half or more of the ancestry is due to the incursions of a pastoralist steppe population during the Bronze Age. In Southern Europe and South Asia, the fraction is closer to 10-25%. But even in the latter case, the fraction of steppe ancestry is far higher than I had expected.

I had assumed that the steppe migrants would contribute 1-5% of the ancestry of Europeans and South Asians, and that the spread of Indo-European languages was a matter of elite transmission and emulation. Think of the Hungarians as an example of what I had assumed.

So what explains what really happened?

During the Mongol conquest of Northern China, Genghis Khan reputedly wanted to turn the land that had been the heart of the Middle Kingdom into pasture, first by exterminating the whole population. Part of the motive was to punish the Chinese for resisting his armies, and part of it was to increase his wealth. One of his advisors, Yelu Chucai, a functionary from the Khitai people, dissuaded him from this path by appealing to his selfishness. Chinese peasants taxed on their surplus would enrich Genghis Khan far more than enlarging his herds would. Rather than focus on primary production, Genghis Khan could sit atop a more complex economic system and extract rents.

At this point, most of you can see the general framework. For thousands of years, pastoralist peoples of the Inner Asian steppe and forest extracted rents out of the oikoumene by threatening it with force. One reason the East Roman Empire escaped the worst of the Hunnic onslaught during the lifetime of Attila is that it paid the horde tribute. Imperial China did the same during some periods. In other instances, civilized states found useful confederates in the barbarians of the steppe. The Tang dynasty survived the 750s thanks to the intervention of the Uyghurs, who helped suppress the rebellion of An Lushan. In 9th-century Baghdad, the rise of the Turks was enabled by their usefulness in court politics and their distance from any given faction.

The rise of the “gunpowder empires” during the 16th century and the eventual closing of the Inner Asian frontier with the crushing of the last embers of the Oirat confederacy between the Russian and Chinese Empires in the 18th century marked the end of thousands of years of interaction between the farmland and pasture.

But this makes us ask: when did this dynamic begin? I don’t think it was primordial. It was invented and developed over time through trial and error. I believe that the initial instinct of pastoralists was to turn farmland into pasture for their herds. This was Genghis Khan’s instinct. Rude barbarian that he was, he had not grown up in the extortive system to which more civilized barbarians, such as the Khitai, had been habituated.

In situations where pastoralists expropriated the land, farmers wouldn’t have had the opportunity to raise families. Barbarian warlords throughout history have aspired to grow rich by plundering civilized peoples…but without a precedent, would the earliest generations have understood the complex institutions they would have to extract rents out of?

Instead of the conventional historical dynamics of predatory elites and a static peasantry, a better way to understand what occurred with the incursion of steppe pastoralists during the Bronze Age might be a simple ecological model of intraspecific competition. In a pre-state society defined by clan and tribal ties, steppe elites may have seen the farmers already resident in the territories they were expanding into as competitors, rather than as resources from which a life of leisure might be obtained. In other words, instead of conquest, the dynamic was one of animal competition.

Of course, pre-modern societies did not have totalitarian states and deadly technology. Rapid, organized genocide of the sort we would recognize is unlikely to have happened. Rather, in a world on the Malthusian margin, a few generations of deprivation may have resulted in the rapid demographic extinction of whole cultures. You don’t need to kill people if they starve because they were driven off their land.

In fact, we have some historical precedent for this. The Spaniards were intent on extracting rents out of the native peoples of the New World and living a life of leisure, but in many areas disease and exploitation resulted in demographic collapse. Imagine a conquest elite as vicious as the Spaniards, but without thousands of years of precedent showing that conquered peoples were more useful alive than dead.

Addendum: The fraction of mtDNA haplogroup M, which probably derives from Pleistocene South Asians, is greater than 50% in places like Sindh. This indicates that the steppe migrations were strongly male-biased in the initial generations.

Rakhigarhi sample doesn’t have steppe ancestry (probably “Indus Periphery”)

We’ve been waiting for two years now, and it looks like they’re about to pull the trigger, Indus Valley People Did Not Have Genetic Contribution From The Steppes: Head Of Ancient DNA Lab Testing Rakhigarhi Samples:

Niraj Rai, the head of the Ancient DNA Laboratory at Lucknow’s Birbal Sahni Institute of Palaeosciences (BSIP), where the DNA samples from the Harappan site of Rakhigarhi in Haryana are being analysed, has revealed that a forthcoming paper on the work will show that there is no steppe contribution to the DNA of the Harappan people….

“It will show that there is no steppe contribution to the Indus Valley DNA,” Rai said. “The Indus Valley people were indigenous, but in the sense that their DNA had contributions from near eastern Iranian farmers mixed with the Indian hunter-gatherer DNA, that is still reflected in the DNA of the people of the Andaman islands.” He added that the paper based on the examination of the Rakhigarhi samples would soon be published on bioRxiv (pronounced “bio-archive”), a preprint repository of papers in the life sciences.

At this point, none of this is surprising. I also wonder if this preprint was hastened by the release of The Genomic Formation of South and Central Asia. It seems that the results here are totally consonant with what came before. My expectation is that the lone sample they got genetic material out of will be similar to the “Indus Periphery” (InPe) individuals in the earlier preprint: a mix of West Asian ancestry, strongly shifted toward eastern Iran, and indigenous South Asian “hunter-gatherer” ancestry. That’s pretty much what Niraj Rai states in the piece. I think genetically the individual won’t be that different from the Chamars of modern-day Punjab.

In fact, Rai, the lead researcher, ends by twisting the knife:

In other words, the preprint observes that the migration from the steppes to South Asia was the source of the Indo-European languages in the subcontinent. Commenting on this, Rai said, “any model of migration of Indo-Europeans from South Asia simply cannot fit the data that is now available.”

A major caveat here is that we’re talking about one sample from the eastern edge of the Indus Valley Civilization (IVC). I’m not sure that this should adjust our probabilities that much. From all the other things we know, as well as copious ancient DNA from Central Asia, our probability for the model which the Rakhigarhi result aligns with should already be quite high.

Again, since it’s one sample, we need to be cautious…but I bet that once we have more samples from the IVC, the Rakhigarhi individual will turn out to be enriched for AASI (Ancient Ancestral South Indian) ancestry relative to other samples from the IVC. The InPe samples in The Genomic Formation of South and Central Asia exhibited some variation, and it’s likely that the IVC region was genetically heterogeneous.

But this is going to be a DNA sample from an individual who lived 4,600 years ago within the orbit of the IVC during its mature phase. That’s still a big deal. As most of you know, the IVC is prehistoric because we haven’t deciphered the script on the seals associated with this civilization. But the IVC clearly had relationships with West Asia and Central Asia, with parts of eastern Iran and the BMAC culture both influenced by and interacting with it. Traders who were likely from the IVC seem to be mentioned in Mesopotamian records.

Additionally, the genetics of one individual can be highly informative if it’s high-quality whole-genome data (I’m skeptical of that in this case). One could possibly even identify when admixture between the West Asian and AASI components occurred from a single genome, by looking at ancestry tract lengths.
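The logic of dating admixture from tract lengths can be sketched under a deliberately simple model, which is an assumption here rather than any particular published method: a single pulse of admixture T generations ago, with one source contributing a fraction m of the genome. Recombination keeps chopping ancestry tracts, and to a first approximation tracts from that source are exponentially distributed with a mean of 1/(T(1 - m)) Morgans, so T can be estimated from the mean tract length. The function names, the 28-year generation time, and the example tract lengths are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_admixture_generations(
    tract_lengths_cm: Sequence[float], ancestry_fraction: float
) -> float:
    """Rough single-pulse estimate of generations since admixture.

    Assumes tracts from a source contributing `ancestry_fraction` (m) of the
    genome are roughly exponential with mean 1 / (T * (1 - m)) Morgans,
    so T ~= 1 / (mean tract length in Morgans * (1 - m)).
    """
    if not 0.0 < ancestry_fraction < 1.0:
        raise ValueError("ancestry_fraction must be strictly between 0 and 1")
    if not tract_lengths_cm:
        raise ValueError("need at least one tract length")
    mean_morgans = mean(tract_lengths_cm) / 100.0  # centimorgans -> Morgans
    return 1.0 / (mean_morgans * (1.0 - ancestry_fraction))


def generations_to_years(generations: float, years_per_generation: float = 28.0) -> float:
    """Convert generations to years; 28 years per generation is an assumption."""
    return generations * years_per_generation


if __name__ == "__main__":
    # Hypothetical tract lengths (cM) for the West Asian component, with an
    # assumed genome-wide West Asian fraction of 0.5.
    tracts = [5.0, 12.0, 8.0, 15.0]
    t = estimate_admixture_generations(tracts, 0.5)
    print(f"~{t:.0f} generations, ~{generations_to_years(t):.0f} years before this individual lived")
```

Real analyses also have to cope with short tracts that can't be detected, phasing errors, and continuous rather than pulse-like gene flow, all of which this sketch ignores.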

A single sample isn’t going to falsify the idea, held by some, that steppe peoples were long present within the IVC. Perhaps they’ll show up in other samples? That’s possible, and it’s what I would argue if I held their position, but on balance I think the constellation of evidence now suggests that a relatively late incursion into South Asia is likely. The steppe ancestry with Northern European affinities shows up in the BMAC only around 4,000 years ago. It is hard to imagine it was in South Asia before it was in Central Asia.

As I’ve been saying for a while, though there will be more genetic work on India in the near future, the real analysis is going to have to come out of archaeology and mythology.

It’s pretty clear that in Northern Europe the arrival of the Corded Ware peoples from the steppe zone resulted in great tumult. A linguistic analysis suggests that the languages of Northern Europe share agricultural vocabulary of non-Indo-European origin, drawn from a common source. But we don’t have much in the way of mythos about the arrival of the Corded Ware.

In contrast, India has a rich mythos which seems to date to the early period of the arrival of the Indo-Aryans. One interpretation has been that since these myths seem to take as a given that the Indo-Aryans were autochthonous to India, they were. But the genetic data strongly suggest that pastoralists arrived in South Asia around the same time they arrived in West Asia, and somewhat after their expansion westward into Europe. Indian tradition and mythos could actually be a window into the general process of how these pastoralists dealt with native peoples, and an illustration of the sort of cultural synthesis that often occurred.

The genetic future is here when it comes to finding relatives of suspects

You may have heard that a suspect was arrested who is alleged to be the “Golden State Killer.” DNA played an important role, Relative’s DNA from genealogy websites cracked East Area Rapist case, DA’s office says.

I think Alexander Kim’s supposition is probably right. Most likely it wasn’t a genome-wide analysis from a direct-to-consumer company you know of, but old-fashioned Y-STR matching, that allowed investigators to converge on the suspect. The public databases for this are now extensive enough that they might yield something, and law enforcement is comfortable with STR tests. This is really a preview of what’s to come. If researchers routinely extract DNA from remains that are tens of thousands of years old, it seems clear that a lot more material will come out of old rape kits.

That’s one dimension. The other dimension is that we have many more markers to work with now. Even without whole-genome analysis, you can identify relatives with reasonable precision out to 2nd cousins (it gets a little dicier beyond that).
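The reason precision falls off beyond 2nd cousins is basically arithmetic. Full nth cousins share two ancestors n + 1 generations back, and each path between them runs through 2(n + 1) meioses, so Wright’s coefficient of relationship, the figure usually quoted as expected shared DNA, is 2 × (1/2)^(2n+2) = (1/2)^(2n+1). The sketch below just tabulates that expectation; the function name is illustrative.

```python
from fractions import Fraction


def coefficient_of_relationship(cousin_degree: int) -> Fraction:
    """Wright's coefficient of relationship for full nth cousins, who share
    two (non-inbred) ancestors n + 1 generations back.

    Each path between the cousins passes through 2 * (n + 1) meioses, and
    each meiosis transmits half the genome; summing over the two shared
    ancestors gives 2 * (1/2) ** (2n + 2) = (1/2) ** (2n + 1).
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be at least 1")
    meioses_per_path = 2 * (cousin_degree + 1)
    return 2 * Fraction(1, 2) ** meioses_per_path


if __name__ == "__main__":
    for n in range(1, 5):
        r = coefficient_of_relationship(n)
        print(f"cousin degree {n}: r = {r} ({float(r):.3%})")
```

That gives 1/8 for first cousins, 1/32 for second cousins, and 1/128 for third cousins. Actual sharing scatters widely around these expectations, and a growing share of more distant cousins carry no detectable shared segment at all.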

But the most important variable is numbers. If you read Alon Keinan’s piece, Crowdsourcing big data research on human history and health: from genealogies to genomes and back again, you’ll know that nearly 20 million people have probably taken advantage of genome-wide consumer testing. Assuming 10 million of them are in the United States, a substantial number of “cold cases” could probably be closed just by looking for matches within these databases and reconstructing the pedigrees suspects come from.

Of course, the genomics companies are not just going to open their databases to law enforcement. But I’m not sure that will be necessary. There are enough genealogy enthusiasts that public forums and services that facilitate matches will probably suffice. If even a few percent of the American population is in these forums, that might get us 90% of the way there.
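The intuition that a few percent coverage could go a long way follows from how many distant relatives each of us has. As a rough sketch: if a suspect has K relatives close enough to be detected, and each is independently in a searchable database with probability p, the chance of at least one hit is 1 - (1 - p)^K. The relative counts below are hypothetical round numbers, and independence is a generous assumption, since database membership clusters by family and ancestry. Taking the text’s assumed 10 million US testers against roughly 325 million Americans gives p of about 3%, an upper bound for public forums, which hold only some of those testers.

```python
def prob_at_least_one_match(n_relatives: int, coverage: float) -> float:
    """Probability that at least one of `n_relatives` is in a database that
    holds a fraction `coverage` of the population, assuming each relative's
    membership is independent and equally likely."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if n_relatives < 0:
        raise ValueError("n_relatives cannot be negative")
    return 1.0 - (1.0 - coverage) ** n_relatives


if __name__ == "__main__":
    us_population = 325_000_000  # rough 2018 US population (assumption)
    us_testers = 10_000_000  # the text's assumed number of US testers
    coverage = us_testers / us_population  # roughly 3%
    for n_relatives in (25, 50, 100, 200):  # hypothetical relative counts
        p = prob_at_least_one_match(n_relatives, coverage)
        print(f"{n_relatives:>3} relatives at {coverage:.1%} coverage: P(match) = {p:.2f}")
```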

Addendum: There has been some work in forensic genetics “predicting” physical appearance. A lot of this is not ready for prime time, but there is one area where a lot could be done: fine-scale ancestry analysis. Using haplotype-based methods and looking for matches within public datasets, one could probably narrow down the ethnic background of a suspect pretty well from DNA. If the test tells you someone in Minnesota is Northern European, that might not help, but if it tells you that they are around half Lithuanian, that might be very useful….
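As a hedged illustration of the basic idea, here is a classic allele-frequency assignment test, which is simpler than the haplotype-based methods mentioned above: score a person’s genotypes against each reference population’s allele frequencies, assuming Hardy-Weinberg proportions and independent markers, and rank the populations by log-likelihood. The population labels and frequencies are invented, and real fine-scale inference needs far more markers (and haplotype information) than this toy.

```python
import math
from typing import Dict, List, Tuple


def genotype_log_likelihood(genotypes: List[int], alt_freqs: List[float]) -> float:
    """Log-likelihood of diploid genotypes (0, 1 or 2 copies of the
    alternate allele) under Hardy-Weinberg proportions, treating every
    locus as independent of the others."""
    if len(genotypes) != len(alt_freqs):
        raise ValueError("genotypes and alt_freqs must have the same length")
    total = 0.0
    for g, p in zip(genotypes, alt_freqs):
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp so log() never sees 0
        q = 1.0 - p
        if g == 0:
            prob = q * q
        elif g == 1:
            prob = 2.0 * p * q
        elif g == 2:
            prob = p * p
        else:
            raise ValueError(f"invalid genotype: {g}")
        total += math.log(prob)
    return total


def rank_populations(
    genotypes: List[int], reference: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score each reference population and sort from best to worst fit."""
    scores = {
        name: genotype_log_likelihood(genotypes, freqs)
        for name, freqs in reference.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented alternate-allele frequencies at three loci, for illustration only.
    reference = {
        "pop_A": [0.9, 0.1, 0.8],
        "pop_B": [0.2, 0.7, 0.3],
    }
    for name, score in rank_populations([2, 0, 2], reference):
        print(f"{name}: log-likelihood {score:.2f}")
```

A ranking like this only says which single reference fits best; estimating something like “around half Lithuanian” requires mixture models or haplotype-based methods.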

The Ancient Neanderthal Mariner

More recent stuff on Neanderthals of interest, Neandertals, Stone Age people may have voyaged the Mediterranean:

A decade ago, when excavators claimed to have found stone tools on the Greek island of Crete dating back at least 130,000 years, other archaeologists were stunned—and skeptical. But since then, at that site and others, researchers have quietly built up a convincing case for Stone Age seafarers—and for the even more remarkable possibility that they were Neandertals, the extinct cousins of modern humans.

But a growing inventory of stone tools and the occasional bone scattered across Eurasia tells a radically different story. (Wooden boats and paddles don’t typically survive the ages.) Early members of the human family such as Homo erectus are now known to have crossed several kilometers of deep water more than a million years ago in Indonesia, to islands such as Flores and Sulawesi. Modern humans braved treacherous waters to reach Australia by 65,000 years ago. But in both cases, some archaeologists say early seafarers might have embarked by accident, perhaps swept out to sea by tsunamis.

The effective population size of indigenous Australians is just too large for me to imagine that their ancestors were only a few individuals swept out to sea on driftwood. Some sort of seagoing craft must have mediated migration to Sahul from Sundaland. Just because we have only recent evidence of seagoing craft doesn’t mean that they weren’t around for tens of thousands of years before that.

I’ve been hearing about Neanderthal tools on islands like Crete, which were never connected to the European mainland, for a while now. It seems that people are finally convinced that this is the real deal, now that the stratigraphy has come together to confirm the dates. One thing that seems obvious from this, as well as from Neanderthal “art”, is that the differences between modern humans and Neanderthals were more quantitative than qualitative: differences of degree, not of kind.

It is hard to deny that the modern human expansion between 60 and 15 thousand years ago was sui generis. Hominins didn’t make it to the New World, or to Sahul (what later became Oceania), until our own kind arrived. There’s also a fair amount of evidence that our lineage pushed the northern frontier of human habitation beyond what Neanderthals ever did. But in the process of marking off our distinctiveness, it seems to me that we’ve overemphasized the differences between us and Neanderthals, and dismissed or ignored evidence of “human-like” “advanced” behaviors from them.

I’ll still go with the prediction that we’ll never find a singular gene which marks us off from other human lineages.

Whales and complex speciation


“Reader request”: what’s going on with this crazy new baleen whale paper, Whole-genome sequencing of the blue whale and other rorquals finds signatures for introgressive gene flow? First, putting “blue whale” in the title is genius, since blue whales are awesome and people will read the paper with that in the title (most people don’t know what “rorquals” are). Second, this paper was interesting because it highlighted the importance of thinking across different ecologies when attempting to understand evolutionary processes.

Since I don’t think too much about speciation, a lot of my thought is derived from the fifteen-year-old book Speciation (good book, too bad it doesn’t seem to be in print anymore!). The authors of Speciation are evolutionary geneticists and emphasize allopatric speciation and the biological species concept. They’re instrumentalists. Basically, you separate populations and they eventually diverge until they’re no longer interfertile. Then you get species.

The problem is that in the oceans allopatric speciation isn’t as straightforward; the seas are open, three-dimensional spaces, after all. This raises the likelihood that a lot of oceanic speciation is sympatric speciation (think cichlid fish). Something like this seems to apply to the large baleen whales in this study.

Though the gray whale is phenotypically very distinct from the other species in the study above, it turns out that phylogenetically it falls within the rorqual clade. The authors suggest that the gray whale’s distinctiveness is a function of adaptation to a benthic lifestyle. They’re bottom feeders.

The topology at the top of this post illustrates that there seems to have been a lot of complexity and gene flow as the rorquals diverged early on, so their history is not really a simple bifurcating phylogenetic tree. We’ve seen this story before. Remember Genetic evidence for complex speciation of humans and chimpanzees?
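One standard way to detect that a history is not a simple bifurcating tree is Patterson’s D statistic, the ABBA-BABA test. This is not necessarily the method the whale paper relied on; it is just a sketch of the textbook logic. With an outgroup O and a tree ((P1, P2), P3), incomplete lineage sorting alone should produce ABBA and BABA site patterns about equally often, so a consistent excess of one suggests gene flow between P3 and one of the two sister lineages. The alignment in the example is toy data, not whale genomes.

```python
from typing import Tuple


def abba_baba_counts(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int]:
    """Count ABBA and BABA sites in four aligned sequences.

    The outgroup allele is treated as ancestral ("A"). Only biallelic sites
    where P3 carries the derived allele ("B") and shares it with exactly one
    of P1 or P2 are informative.
    """
    if not len(p1) == len(p2) == len(p3) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        if len({a1, a2, a3, ao}) != 2:
            continue  # skip invariant and multi-allelic sites
        if a3 == ao:
            continue  # P3 must carry the derived allele
        if a1 == ao and a2 == a3:
            abba += 1  # P2 and P3 share the derived allele
        elif a2 == ao and a1 == a3:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); values far from 0 suggest gene flow."""
    abba, baba = abba_baba_counts(p1, p2, p3, outgroup)
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Toy alignment: two ABBA sites, one BABA site, one invariant site.
    print(patterson_d("AAGC", "GGAC", "GGGC", "AAAC"))
```

In real data D is computed over millions of sites and its significance is assessed with a block jackknife, which this sketch omits.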

I think the moral of the story is that large mammalian species which are the basis of the biological species concept don’t really fit under that paradigm too easily. Even this study is probably not going to be the last word on rorqual phylogenetics.

Open Thread, 4/24/2018

Finished She Has Her Mother’s Laugh: The Powers, Perversions, and Potential of Heredity. To be honest I was pleasantly surprised that the narrative wasn’t overly fixated on the ‘perversions.’ Sometimes it’s hard to move past that.

I think different people will benefit from reading the book in different ways. If you are a layperson, a serial reading from front to back is optimal. She Has Her Mother’s Laugh is a long book, so this will take a while. But you need to do this to get situated. If you are a geneticist, you may benefit from jumping around chapters, and sampling what people in other fields are doing. Additionally, some geneticists would actually benefit from reading the historical chapters.

Started reading The Fate of Rome: Climate, Disease, and the End of an Empire. Yes, it’s very good. Will see if it’s better than The Fall of Rome: And the End of Civilization after I’ve finished.

Thanks to whoever reviewed the podcast I cohost on iTunes and Stitcher. If you haven’t done so, please do!

Appreciate the feedback so far.

Found out today that India Today posted my review of Who We Are a few weeks ago! Pretty funny I didn’t see it.

Meanwhile, The Genetic History of Indians: Are We What We Think We Are? It looks like Indian scientists are bending before reality: “How do I say it? See, I am a nationalist,” Rai says over the phone. “People will be upset. But that’s how it is. All the studies are showing that people came here from elsewhere.”

A friend asked again, “how do I learn population genetics?” My opinion has not changed in the 15 years since I became interested in the field: read Principles of Population Genetics. If you need a gentle introduction, Population Genetics: A Concise Guide is probably that. But I read Principles of Population Genetics in 2004 without any formal training in the field. It’s not that difficult if you put time into it.

Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia. Gotta do it on flies first!

California, Coffee and Cancer: One of These Doesn’t Belong. The cancer warnings in California are treated as a joke by the population. Unfortunately, there are real carcinogens out there.

Genomic SEM Provides Insights into the Multivariate Genetic Architecture of Complex Traits.

There were possibly late archaic introgression events in Eurasia

A few weeks ago I posted on the strong likelihood that there were at least two Denisovan admixture events into modern humans in Eurasia. That’s probably the floor, not the ceiling. We have an Altai Denisovan genome, but the Denisovan proportion is so low in most of South and Southeast Asia that I don’t think we have a good grasp of how that component differs from the Oceanian fraction, which is much higher.

At the AAPA meeting last week I noticed something strange in one of the presentations: introgressed Denisovan variants present among East Asian populations but lacking elsewhere. Their frequencies were not above 50%, but they were above 10%. The Denisovan variants were nearly absent outside this core zone of East Asians.

There are two possible reasons for this distribution. One is that Denisovan variants were segregating in East Asians for thousands of years, and a common bottleneck or, more likely, selection drove them up in frequency. Another, not mutually exclusive, explanation is that admixture occurred in East Asia relatively late. The Denisovan signature is totally absent in the New World. Either selection or drift eliminated the variation there, or this admixture event happened in East Asia less than about 30,000 years ago, after the East Asian-like source population of Native Americans began to diverge from that of East Asians.

One thing that we know from paleontology is that species exist before the earliest remains we find, and persist after the latest ones. It’s quite possible that small relict populations of Denisovans persisted for thousands of years after modern humans came to dominate the East Asian landscape.

Good night Avicii, you lived before you died

Like many people, I didn’t know much about Avicii when he was alive, though I know much more now that he has died. His stuff played while I was on the computer in the lab, or when I was working out. Avicii for me was the anti-Kardashian, as I had no idea who “he” was (I wasn’t sure of the gender, though I assumed male) or where he was from. He was just a DJ who made music, and I enjoyed the music. He wasn’t famous to me, but his music was famous.