African genomics tells us about deep structure and history


Two interesting papers in Genome Biology that are open access, Whole-genome sequence analysis of a Pan African set of samples reveals archaic gene flow from an extinct basal population of modern humans into sub-Saharan populations and African evolutionary history inferred from whole genome sequence data of 44 indigenous African populations. Since they are open access you should just read both of them.

I believe they are the first in a series of papers over the next few years using whole-genome analysis to understand the population structure within Africa, and how it relates to the people who branched off from Africans. Eventually, this will also lead to research focused on medical and population genomics, looking at characteristics and forces beyond phylogeny.


Let the genomic die fly!


A new “polygenic risk score” (PRS) paper is making some waves, Polygenic Prediction of Weight and Obesity Trajectories from Birth to Adulthood. Since it is open access I suggest you read it.

But basically, they took ~2 million common variants (there are ~100 million common variants in the world population) in ~300,000 individuals in 4 cohorts, and used them to build a genome-wide polygenic score that predicts weight. The correlation of the score with BMI is 0.29. This is pretty modest. But it seems to me that the biggest and most important finding is that it seems to capture a lot of the people at the tails of the distribution.

I’m becoming more and more convinced that the best thing these polygenic risk scores can do in the near term is to identify people who are possibly at these tails. For complex-trait diseases, the tails are where a lot of the people who are going to have issues later in life are found. People with BMI in the range 25-30 may have a modest increase in risks, but someone who is very obese, with BMI above 35, is at much greater risk. Over 40% of the people in the top decile here were obese. Only 10% of people in the bottom decile were.
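To make the mechanics a bit more concrete, here is a minimal sketch of what a polygenic score is computationally: a weighted sum of allele dosages, after which you can ask who lands in the top decile. The weights, genotypes, and function names below are all made up for illustration. The paper's actual pipeline (how the weights are derived, which variants are kept) is far more involved and is not reproduced here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies of the effect allele)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))


def top_decile(scores):
    """Indices of the individuals in the top 10% of scores (at least one person)."""
    n_top = max(1, len(scores) // 10)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_top])


# Hypothetical per-variant effect sizes (illustrative, not from the paper).
weights = [0.5, -0.25, 1.0]

# Hypothetical genotypes: one row per individual, one column per variant.
genotypes = [
    [0, 0, 0],
    [2, 0, 2],
    [1, 2, 0],
    [0, 1, 1],
]

scores = [polygenic_score(g, weights) for g in genotypes]
high_risk = top_decile(scores)
```

With millions of variants and hundreds of thousands of people the arithmetic is the same. The hard part is estimating good weights.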

This research comes out of the context of earlier work on the heritability of BMI. It’s around 0.75 or so. That means it runs in families. Combined with the fact that in the recent past, or in other nations, there is a great variation in median size and distribution, one can intuit that genetic dispositions and environmental context both help explain the variation we see around us. The modern American environment is clearly obesogenic. When most of the American population were involved in physical jobs on farms the environmental context was very different.

Over the next few years, these risk scores for BMI will get better, and expand to other populations. One thing that some people are pointing out is that we know it’s heritable, so why not just look at your family? As many of you know, Mendelian segregation means that siblings may have quite different risk profiles on the genomic level. Polygenic risk score prediction is I think going to be extremely interesting and informative in the case of traits which are known to be found within families across generations (e.g., autism), but don’t seem to impact everyone. Perhaps we’ll find for a given characteristic expression is random, due to some life event or cofactor such as infection. Or perhaps we’ll find that differences among siblings have some genetic basis in variants inherited from parents?
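The point about Mendelian segregation can be illustrated with a toy enumeration. The sketch below assumes three unlinked loci with equal, made-up effect sizes and two parents who are heterozygous at every locus, then lists the score of every child those parents could produce. None of these numbers come from the paper.

```python
from itertools import product


def possible_child_scores(mother, father, weights):
    """Enumerate the score of every child two parents could produce.

    Each parent is a list with one (allele, allele) pair per locus,
    where 1 is the effect allele and 0 is the other allele.
    Loci are assumed to segregate independently (no linkage).
    """
    per_locus = []
    for m_pair, f_pair, w in zip(mother, father, weights):
        # A child receives one allele from each parent at each locus.
        per_locus.append([w * (m + f) for m in m_pair for f in f_pair])
    return sorted(sum(combo) for combo in product(*per_locus))


# Hypothetical example: equal effect sizes, both parents heterozygous everywhere.
weights = [1.0, 1.0, 1.0]
mother = [(0, 1), (0, 1), (0, 1)]
father = [(0, 1), (0, 1), (0, 1)]

child_scores = possible_child_scores(mother, father, weights)
lowest, highest = child_scores[0], child_scores[-1]
average = sum(child_scores) / len(child_scores)
```

Both parents here have a score of 3, and the average child does too, but individual siblings can fall anywhere from 0 to 6. Family history gives you the average. A genomic score tells you where a particular sibling landed.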

Addendum: One of the authors, Sek Kathiresan, has been answering questions on Twitter.

Who We Are and How We Got Here, a book worth reading

Yesterday I talked to a friend who has a review copy of Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. They gave me a preview (their overall assessment was positive).

I haven’t personally asked to get a copy because, to be honest, I thought there wouldn’t be anything new in it. If you “read the supplements” what more could there be in 368 pages? So I was waiting until the end of the month to buy the book and read it in my own sweet time as due diligence.

Well, this morning I asked a publicist to send me a copy. I will be getting it next week. The reason is that I’m told the latter portions of the book are quite challenging and candid as to what genetics may tell us in the 21st century. Who We Are and How We Got Here is a 21st-century revision and update of The History and Geography of Human Genes. But it’s apparently a lot more.

Also, I make a small cameo in the book, as do Eurogenes and Dienekes. I have always appreciated how David Reich and Nick Patterson and their whole lab have taken people outside of the halls of the academy seriously. They didn’t need to as a matter of professional necessity but often engage as a matter of decency and seriousness.

The architecture of skin color variation in Africa

Baby of hunter-gatherers in Southern Africa

Very interesting abstract at the ASHG meeting of a plenary presentation, Novel loci associated with skin pigmentation identified in African populations. This is clearly the work that one of the comments on this weblog alluded to last summer during SMBE. There I was talking about the likely introduction of the derived SLC24A5 variant to the Khoisan peoples and its positive selection in peoples in southern Africa.

Below is the abstract in full. Those who follow the literature on this see the usual suspects in relation to genes, but also new ones:

Despite the wide range of variation in skin pigmentation in Africans, little is known about its genetic basis. To investigate this question we performed a GWAS on pigmentation in 1,593 Africans from populations in Ethiopia, Tanzania, and Botswana. We identify significantly associated loci in or near SLC24A5, MFSD12, TMEM138 … OCA2 and HERC2. Allele frequencies at these loci in global populations are strongly correlated with UV exposure. At SLC24A5 we find that a non-synonymous mutation associated with depigmentation in non-Africans was introduced into East Africa by gene flow, and subsequently rose to high frequency. At MFSD12, we identify novel variants that are strongly correlated with dark pigmentation in populations with Nilo-Saharan ancestry. Functional assays reveal that MFSD12 codes for a lysosomal protein that influences pigmentation in cultured melanocytes, zebrafish and mice. CRISPR knockouts of murine Mfsd12 display reduced pheomelanin pigmentation similar to the grizzled mouse mutant (gr/gr). Exome sequencing of gr/gr mice identified a 9 bp in-frame deletion in exon two of Mfsd12. Thus, using human GWAS data we were able to map a classic mouse pigmentation mutant. At TMEM138 … we identify mutations in melanocyte-specific regulatory regions associated with expression of UV response genes. Variants associated with light pigmentation at this locus show evidence of a selective sweep in Eurasians. At OCA2 and HERC2 we identify novel variants associated with pigmentation and at OCA2, the oculocutaneous albinism II gene, we find evidence for balancing selection maintaining alleles associated with both light and dark skin pigmentation. We observe at all loci that variants associated with dark pigmentation in African populations are identical by descent in southern Asian and Australo-Melanesian populations and did not arise due to convergent evolution.
Further, the alleles associated with skin pigmentation at all loci but SLC24A5 are ancient, predating the origin of modern humans. The ancestral alleles at the majority of predicted causal SNPs are associated with light skin, raising the possibility that the ancestors of modern humans could have had relatively light skin color, as is observed in the San population today. This study sheds new light on the evolutionary history of pigmentation in humans.

Much of this is not surprising. Looking at patterns of variation around pigmentation loci researchers suggested years ago that Melanesians and Africans exhibited evidence of similarity and functional constraint. That is, the dark skin alleles date back to Africa and did not deviate from their state due to selection pressures. In contrast, light skin alleles in places like eastern and western Eurasia are quite different.

Nyakim Gatwech

This abstract also confirms something I said in a comment on the same thread, that Nilotic peoples are the ones likely to have been subject to selection for dark skin in the last 10,000 years. You see above that variants on MFSD12 are correlated with dark complexion, in particular in Nilo-Saharan groups. The model Nyakim Gatwech is of South Sudanese nationality and has a social media account famous for spotlighting her dark skin. Gatwech and the San Bushman child above are so different in color that I think it would be clear these two individuals come from very distinct populations.

The fascinating element of this abstract is the finding that most of the alleles which are correlated with lighter skin are very ancient and that they are the ancestral alleles more often than the derived! We’ll have to wait until the paper comes out. My assumption is that after the presentation Science will put it on their website. But until then here are some comments:

  • There is obviously a bias in the studies of pigmentation toward those which highlight European variability.
  • The theory of balancing selection makes sense to me because ancient DNA is showing OCA2 “blue eye” alleles which are not ancestral in places outside of Western Europe. And East Asia has its own variants.
  • Lots of variance in pigmentation not accounted for in mixed populations (again, lots of the early genomic studies focused on populations which were highly diverged and had nearly fixed differences). Presumably, African research will pick a lot of this up.
  • This also should make us skeptical of the idea that Western Europeans were necessarily very dark skinned, as now we know that human pigmentation architecture is complex enough that sampling modern populations expands our understanding a great deal.
  • Finally, it’s long been assumed that at some stage early on humans were light skinned on most of their body because we had fur. When we lost our fur is when we would need to have developed dark skin. This abstract is not clear on how long ago light and dark alleles coalesce to common ancestors.

Genetic variation and disease in Africa


Very readable review, Gene Discovery for Complex Traits: Lessons from Africa. It’s open access, so I recommend it. The summary:

The genetics of African populations reveals an otherwise “missing layer” of human variation that arose between 100,000 and 5 million years ago. Both the vast number of these ancient variants and the selective pressures they survived yield insights into genes responsible for complex traits in all populations.

The main issue I might have is I’m not sure that focusing on 5 million year time spans is particularly useful. Rather, looking at the last major bottleneck for modern humans before the “Out of Africa” event would be key, since that’s when a lot of the common variation would disappear, and very rare variants probably don’t have deep time depth in any case. With all that being said, the qualitative analysis is on point.

One of the major issues in the “SNP-chip” era has been that ascertainment of variation has been skewed toward Europeans. Though more recent techniques have tried to fix this…this review points out that if you by necessity constrain the SNPs of interest to those that vary outside of Africa (most of the world’s population), you are taking many alleles private to Africa off the table. This is relevant because the “Out of Africa” bottleneck ~50,000 years ago means that African populations harbor a lot more genetic variation than non-African populations do.

The move to high-quality whole genome sequencing obviates these concerns. As a matter of course African variation will be “picked up” since the marker set is not constrained ahead of time.

Importantly, the authors focus on South Africa and the Xhosa population. This group has ~20% Khoisan genetic ancestry, which is very diverse and very distinct from the remaining ~80% of its ancestry. With its large African immigrant population and highly diverse native groups, some of them quite admixed, South Africa could actually provide some hard-to-substitute value in biomedical genetics.

Khoisan may not have diverged ~300,000 years ago


A few years ago I contributed to an op-ed which defended the utility of the race concept in biology in USA Today (which by the way prompted a quite patronizing email from a famous doyen of population genetics who wished to correct my ignorance; here’s a clue: “Out of Africa again & again”).

In my initial draft, I had stated that the Khoisan diverged from other human populations ~200,000 years ago. The fact-checker came back and said that this didn’t seem to be a supportable claim. The reason I gave the ~200,000 figure is that I’d button-holed people who looked at these genomes, and they were coming to the conclusion that the divergence between Khoisan and non-Khoisan was further back than we’d presupposed. And that was the number given to me.

Ultimately I compromised and allowed them to change the divergence value to 150,000 years before the present.

Today we’re in a different landscape. The above figure is from the Science paper, Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago, which was earlier a biorxiv preprint (which I mentioned last spring). In concert with the North African find, the media is running with the idea that the origin of modern humans goes back very far indeed. This piece in ScienceNews is actually pretty good in my opinion at staying under control, though not all write-ups have been so measured.

So in a span of two years we’ve gone from me pushing and compromising on a value of ~150,000 years, to researchers suggesting that the Khoisan/non-Khoisan divergence is about two-fold older than that!

Well, I’m here to tell you that a prominent geneticist who is very conversant with these issues is simply incredulous about the likelihood of this particular value. I brought up this preprint to them over lunch and they just didn’t buy it. That is, they are skeptical that the amount of admixture would have skewed the earlier inferences to the magnitude that they seem to have in these results.

The authors in the paper used G-PhoCS and their own ingenious method to come to these inferences of split dates. The problem with these methods is that the inferences generated aren’t nearly as straightforward as an admixture estimate (which can be checked by something as simple as a PCA). I don’t want to get into the details, but I remember seeing models in the 2000s which inferred that East Asians and Europeans diverged ~25,000 years ago, or that there was no Neanderthal admixture in Europeans (to a high degree of confidence). Models can come out with a lot of values.

More importantly, look at the dates of divergence of non-Africans (Sardinians here) from their closest African relatives.

  • 115,000 years before the present (Dinka-Sardinian) for G-PhoCS
  • 76,000 years before the present for their TT-method

In light of the likelihood that the closest population to non-Africans may have been an East African population represented by the Ethiopian Mota individual (along with modern Hadza), we can probably drop that estimate down a bit. But G-PhoCS in particular just gives too old an estimate. There are ways it makes sense (lots of old structure within Africa) of course. I’m just speaking in terms of possibilities.

The diversification of extant modern populations seems to have occurred around ~50,000-60,000 years before the present. This aligns with the archaeology, and the ancient genomes which we have on hand.

Of course the methods in this paper might be right. And the fossil from North Africa does add some plausibility to that. But really the whole field is somewhat unsettled now, and we should be cautious about reports of definitive truths in the media.

Population structure in Neanderthals leads to genetic homogeneity


The above tweet is in response to an article which reports on a finding published this past month in PNAS, Early history of Neanderthals and Denisovans. It’s open access, you should read it. I don’t think I’ve reviewed it because I haven’t dug through the supplements. To be frank this is a paper where you pretty much have to read the supplements because they’re introducing a somewhat different model here than is the norm.

I talked to Alan Rogers at SMBE about this paper. Broadly, I think there might be something to it, and it’s because of what David says above. It is simply hard to imagine that Neanderthals could be extremely successful with such low genetic diversity as we see, and spread so thin. Now, the Quanta Magazine piece tries to emphasize that the effective population is not the true census population, but I wish it would have explained it more clearly. Basically, the size that is relevant for breeding is obviously not going to be the same as a head count. And, because effective populations are highly sensitive to bottlenecks you can get really small numbers even when the extant population at any given time may be large.
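One way to see why bottlenecks matter so much is the standard textbook approximation that long-term effective size under fluctuating population size is the harmonic mean of the per-generation sizes. This is a generic illustration with invented numbers, not the model used in the PNAS paper.

```python
def harmonic_mean_ne(sizes):
    """Long-term effective size approximated as the harmonic mean of per-generation sizes."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty list of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical history: nine generations at 10,000 and a single generation at 100.
sizes = [10_000] * 9 + [100]

ne = harmonic_mean_ne(sizes)            # roughly 917
census_mean = sum(sizes) / len(sizes)   # 9,010
```

A single generation at 100 drags the effective size to under a thousand even though the typical head count is around ten thousand.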

The PNAS paper makes some novel inferences, and I’ll set that to the side until I read the supplements. But I don’t think it’s crazy that population structure within Neanderthals could be leading to lower total genetic diversity.

Massive genomic sample sizes = detecting evolution in real time

The recent PLOS BIOLOGY paper, Identifying genetic variants that affect viability in large cohorts, seems to have triggered a feeding frenzy in the media. For example, Big Think has put up Researchers Find Evidence That Human Evolution Is Still Actively Happening.

I wasn’t paying close attention because of course human evolution is still happening actively. From a genetic perspective, evolution is just change in allele frequencies. Populations aren’t infinite, so even if there wasn’t any selection stochastic forces would shift allele frequencies. But of course selection is probably happening. For adaptation by natural selection to occur you need heritable variation on a trait where there are fitness differences as a function of variation within the population. It seems implausible that these conditions don’t still apply. There’s plenty of fitness variation in the population, and it’s unlikely to be random as a function of heritable variation.
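The claim that finite populations change even without selection is easy to check with a bare-bones Wright-Fisher simulation. The sketch below assumes a diploid population of constant size, random mating, and no selection or mutation. The function name and parameter values are arbitrary choices for illustration.

```python
import random


def wright_fisher(p0, n_individuals, generations, seed=0):
    """Allele-frequency trajectory under pure drift (no selection, no mutation)."""
    rng = random.Random(seed)
    n_copies = 2 * n_individuals  # diploid: two allele copies per individual
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Each copy in the next generation is drawn at random from the current pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


trajectory = wright_fisher(p0=0.5, n_individuals=50, generations=100, seed=1)
```

The frequency wanders from generation to generation purely by chance, and in a small enough population it will eventually hit 0 or 1 and stay there.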

But the devil is in the details. And last year Field et al. used the modern genomic tools available to detect selection occurring over the past 2,000 years. It is not credible that it would have magically stopped a few centuries ago.

So why is this new paper such a big deal? (note that it’s in PLOS BIOLOGY, not PLOS GENETICS) Because the method they use is ingenious and simple. Basically, they’re looking at changes in allele frequencies as a function of age in huge populations. It’s a little more complicated than that, they used a logistic regression to control for some of the other variables. But they found some biologically plausible hits with their data set of 50,000-150,000. And, they replicated their hits from a European sample to a non-European one.
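The core intuition, stripped of the regression machinery, is just tabulating allele frequency by age. The sketch below is a deliberate simplification with invented toy records. It does not control for ancestry, birth cohort, or anything else, which is exactly what the authors' logistic regression is there to handle.

```python
from collections import defaultdict


def allele_frequency_by_age_bin(records, bin_width=10):
    """Allele frequency per age bin.

    records: iterable of (age_in_years, dosage), dosage in {0, 1, 2}.
    Returns {bin_start_age: frequency}, ordered by age.
    """
    copies = defaultdict(int)
    people = defaultdict(int)
    for age, dosage in records:
        bin_start = (age // bin_width) * bin_width
        copies[bin_start] += dosage
        people[bin_start] += 1
    # Each person carries two allele copies at the locus.
    return {b: copies[b] / (2 * people[b]) for b in sorted(people)}


# Hypothetical toy cohort: (age, copies of the variant carried).
records = [(42, 1), (45, 2), (48, 1), (71, 0), (75, 1), (79, 0)]

freqs = allele_frequency_by_age_bin(records)
```

In this toy cohort the variant is at about 67% among people in their forties and about 17% among people in their seventies. In a real data set a drop like that is only a lead, since confounders can produce the same pattern.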

This does bring me back to a discussion I observed a while back. An evolutionary geneticist who works with Drosophila mentioned offhand that in his field there really wasn’t that much of a need for more data. They could spend all their time doing analysis. A prominent human geneticist whose work focused on biomedicine piped up that that wasn’t true at all for their field. There are some differences in the scientific questions, but there are also differences in terms of what you can do with humans as a model organism.

In the paper they look forward to the day of increasing sample sizes an order of magnitude beyond where it is now. At some point in the near future, large fractions of entire nations will be sequenced at medical grade level (30x coverage).

Anyway, you should read Identifying genetic variants that affect viability in large cohorts. It’s pretty straightforward.

After agriculture, before bronze

 

The above plot shows genetic distance/variation between highland and lowland populations in Papua New Guinea (PNG). It is from a paper in Science that I have been anticipating for a few months (I talked to the first author at SMBE), A Neolithic expansion, but strong genetic structure, in the independent history of New Guinea.

What does “strong genetic structure” mean? Basically Fst is showing the proportion of genetic variation which is partitioned between groups. Intuitively it is easy to understand, in that if ~1% of the genetic variation is partitioned between groups in one case, and ~10% in another, then it is reasonable to suppose that the genetic distance between groups in the second case is larger than in the first case. On a continental scale Fst between populations is often on the order of ~0.10. That is the value for example when you pool the variation amongst Northern Europeans and Chinese, and assess how much of it can be apportioned in a manner which differentiates populations (so it’s about ~10% of the variation).
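For the numerically inclined, here is a minimal single-locus version of the calculation, assuming two populations weighted equally and using expected heterozygosity: Fst = (H_total - H_within) / H_total. The allele frequencies are invented to land near the continental-scale value. Real estimators average over many loci and correct for sample size, which this sketch does not.

```python
def fst_two_populations(p1, p2):
    """Single-locus Fst for two equally weighted populations.

    p1, p2: frequency of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # heterozygosity if the two were pooled
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: no variation to partition
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
fst = fst_two_populations(0.34, 0.66)   # about 0.10
```

Identical frequencies give 0, fixed differences give 1, and a gap like 34% versus 66% puts roughly a tenth of the variation between groups.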

This is why ancient DNA results which reported that Mesolithic hunter-gatherers and Neolithic farmers in Central Europe who coexisted in rough proximity for thousands of years exhibited differences on the order of ~0.10 elicited surprise. These are values we are now expecting from continental-scale comparisons. Perhaps an appropriate analogy might be the coexistence of Pygmy groups and Bantu agriculturalists? Though there is some gene flow, the two populations exist in symbiosis and exhibit local ecological segregation.

In PNG continental scale Fst values are also seen among indigenous people. The differences between the peoples who live in the highlands and lowlands of PNG are equivalent to those between huge regions of Eurasia. This is not entirely surprising because there has been non-trivial gene flow into lowland populations from Austronesian groups, such as the Lapita culture. Many lowland groups even speak Austronesian languages today.

Using standard ADMIXTURE analysis the paper shows that many lowland groups have significant East Asian ancestry (red), while none of the highland groups do (some individuals with East Asian admixture seem to be due to very recent gene flow). But even within the highlands the genetic differences are striking. The Fst values between Finns and Southern European groups such as Spaniards are very high in a European context (due to Finnish Siberian ancestry as well as drift through a bottleneck), but most comparisons within the highland groups in PNG still exceed this.

The paper also argues that genetic differences between Papuans and the natives of Australia pre-date the rising sea levels at the beginning of the Holocene, when Sahul divided between its various constituents. This is not entirely surprising considering that the ecology of the highlands during the Pleistocene would have been considerably different from Australia to the south, resulting in sharp differences in the hunter-gatherer lifestyles. Additionally, there does not seem to have been a genetic cline. Papuans are symmetrically related to all Australian groups they had samples from.

Using coalescence-based genomic methods they inferred that separation between highlands and some lowland groups occurred ~10-20,000 years ago. That is, after the Last Glacial Maximum. For the highlands, the differences seem to date to within the last 10,000 years. The Holocene. Additionally, they see population increases in the highlands, correlating with the shift to agriculture (cultivation of taro).

None of the above is entirely surprising, though I would take the date inferences with a grain of salt. The key is to observe that large genetic differences, as well as cultural differences, accrued in the highlands of PNG during the Holocene. In the paper they have a social and cultural explanation for what’s going on:

  Fst values in PNG fall between those of hunter-gatherers and present-day populations of west Eurasia, suggesting that a transition to cultivation alone does not necessarily lead to genetic homogenization.

A key difference might be that PNG had no Bronze Age, which in west Eurasia was driven by an expansion of herders and led to massive population replacement, admixture, and cultural and linguistic change (7, 8), or Iron Age such as that linked to the expansion of Bantu-speaking farmers in Africa (24). Such cultural events have resulted in rapid Y-chromosome lineage expansions due to increased male reproductive variance (25), but we consistently find no evidence for this in PNG (fig. S13). Thus, in PNG, we may be seeing the genetic, linguistic, and cultural diversity that sedentary human societies can achieve in the absence of massive technology-driven expansions.

Peter Turchin in books like Ultrasociety has argued that one of the theses in Steven Pinker’s The Better Angels of Our Nature is incorrect: violence has not decreased monotonically, but peaked in less complex agricultural societies. PNG is clearly a case of this, as endemic warfare was a feature of highland societies when they encountered Europeans. Lawrence Keeley’s War Before Civilization: The Myth of the Peaceful Savage gives so much attention to highland PNG because it is a contemporary illustration of a Neolithic society which until recently had not developed state-level institutions.

What papers like these are showing is that cultural and anthropological dynamics strongly shape the nature of genetic variation among humans. Simple models which assume as a null hypothesis that gene flow occurs through diffusion processes across a landscape where only geographic obstacles are relevant simply do not capture enough of the dynamic. Human cultures strongly shape the nature of interactions, and therefore the genetic variation we see around us.