Well, it turns out that that is a new phrase. I confirmed it myself. If Google is perplexed, it must be a novelty.
As many of you know, ‘populations’ are constructs. Really what’s happening is that we’re tracing the genealogies of genes, and those summary genealogies give us insights into population splits and mixture.
Reticulation is a pretty general concept in evolution and genetics. I didn’t really know what other terms to give the dynamic I was alluding to.
What other term describes the process you see in the admixture graph below?
Over the last ten years David Reich and other researchers have been constructing what is basically an atlas of human demographic history. Taking the genealogies written in our DNA, mapping them onto population bifurcations and admixtures, and synthesizing that back together with what we know from history and archaeology.
To a great extent, this is a project of human phylogenomics: taking genome-wide data and constructing phylogenies out of it (or, perhaps more precisely, graphs, as this is mostly on an intra-species time scale and characterized by lots of gene flow across the “tips” of the tree). But there’s another thing you can do with modern human genomics and evolution: look at patterns of selection within the genome.
The Reich group has already started doing this. For example, they have adduced evidence that the CCR5 delta 32 mutation seems to have emerged out of the Yamnaya horizon.
The method is simple: imagine that “Ancestral North Indians” are fixed for an allele at a gene in one state and “Ancestral South Indians” are fixed in the other state. Indian populations are roughly a 50:50 mix of the two (with a range), so under admixture alone you would expect each allele at a frequency of about 50%. If the frequency today in Indian populations is 95% for the allele that is from the “Ancestral North Indians”, one might be suspicious that something other than admixture, such as selection, is going on. Or, vice versa.
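To make the logic concrete, here is a minimal sketch of the comparison described above. It is not the method from the paper, just the arithmetic of the thought experiment; the function name and the 50:50 and 95% figures are taken from the hypothetical scenario in the text, not from real data.

```python
def expected_admixed_frequency(freq_source_a, freq_source_b, ancestry_from_a):
    """Allele frequency expected in a mixed population from admixture alone.

    It is the ancestry-weighted average of the two source frequencies.
    """
    return ancestry_from_a * freq_source_a + (1.0 - ancestry_from_a) * freq_source_b


# Hypothetical scenario from the text: one source population is fixed for the
# allele (frequency 1.0), the other lacks it (0.0), and ancestry is 50:50.
expected = expected_admixed_frequency(1.0, 0.0, 0.5)

# Hypothetical observed frequency in the present-day mixed population.
observed = 0.95

# A large gap between observed and expected is what makes a locus a
# candidate for selection after admixture (drift has to be ruled out too).
excess = observed - expected
print(f"expected {expected:.2f}, observed {observed:.2f}, excess {excess:.2f}")
```

A real scan would run this across the genome, use estimated rather than fixed source frequencies, and ask whether the excess is larger than drift could plausibly produce.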
In the paper, they used whole genomes to reconstruct the ancestral steppe/Iranian population without any residual “Ancient Ancestral South Indian” (AASI), the latter of which has no West Eurasian ancestry. They did the same for the AASI. These reconstructions are always dicey, but they made a good faith effort to check their work. On the whole, that section was impressive. The authors seem to be roughly aligned with the results in Narasimhan et al. 2019. The AASI seems to be homogeneous, except when they attempted to model it from donors who were Munda or Burusho, both groups with deep East Asian admixture (illustrating the problem with deconvolution). Second, they show that the AASI are not clustering with the Andamanese, which makes sense since these groups diverged closer to 40,000 years ago. Finally, the steppe/Iranian group looks most like Armenian middle-to-late Bronze Age people: a synthesis of steppe and some Iranian-like ancestry.
But this isn’t the most interesting part of the paper. It’s the selection. Here are the top, top, candidates:
I believe they are the first in a series of papers over the next few years using whole-genome analysis to understand the population structure within Africa, and how it relates to the people who branched off from Africans. Eventually, this will also lead to research focused on medical and population genomics, looking at characteristics and forces beyond phylogeny.
But basically, they took ~2 million common variants (there are ~100 million common variants in the world population) in ~300,000 individuals in 4 cohorts, and used them to predict weight with a genome-wide polygenic score. The correlation of the score with BMI is 0.29. This is pretty modest. But it seems to me that the biggest and most important finding is that it seems to capture a lot of the people at the tails of the distribution.
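For readers who haven’t seen one, a polygenic score is at heart a weighted sum of allele counts. The sketch below shows that generic form, and what a correlation of 0.29 means in terms of variance explained. The four variants and their weights are made up for illustration; the real score uses millions of variants, and I’m not claiming this is how the authors derived their weights.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele counts (0, 1 or 2 copies) across variants."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant effect weights (illustrative values only).
weights = [0.10, -0.05, 0.02, 0.07]

# Hypothetical individual: number of effect alleles carried at each variant.
person = [2, 1, 0, 1]

score = polygenic_score(person, weights)

# A correlation r between score and trait corresponds to r**2 of the trait
# variance being captured by the score.
r = 0.29
variance_explained = r ** 2

print(f"score = {score:.2f}")
print(f"r = {r}, variance explained = {variance_explained:.4f}")
```

So a correlation of 0.29 is a bit over 8% of the variance in BMI, which is why I call it modest.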
I’m becoming more and more convinced that the best thing these polygenic risk scores (PRS) can do in the near term is to identify people who are possibly at these tails. For diseases in a complex trait context, the tails are where you find a lot of the people who are going to have issues later in life. People with BMI in the range 25-30 may have a modest increase in risks, but someone who is very obese, with BMI above 35, is at much greater risk. Over 40% of the people in the top decile here were obese. Only 10% of people in the bottom decile were.
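It may seem odd that a score with a modest correlation can still separate the tails this well. A toy simulation shows why. Everything here is an assumption for illustration: score and trait are standardized and jointly normal with correlation 0.29, and “affected” means more than one standard deviation above the trait mean. That cutoff is hypothetical and is not the obesity definition used in the paper, so the rates it prints will not match the 40% and 10% above.

```python
import random


def simulate_decile_rates(r=0.29, n=50_000, threshold_z=1.0, seed=1):
    """Return (bottom_rate, top_rate): the fraction of people above the
    trait threshold in the lowest and highest score deciles."""
    rng = random.Random(seed)
    noise_sd = (1.0 - r * r) ** 0.5  # keeps the trait variance at 1
    people = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        trait = r * score + noise_sd * rng.gauss(0.0, 1.0)
        people.append((score, trait))

    people.sort(key=lambda p: p[0])  # order everyone by score
    k = n // 10  # size of one decile

    def rate(group):
        return sum(1 for _, trait in group if trait > threshold_z) / len(group)

    return rate(people[:k]), rate(people[-k:])


bottom_rate, top_rate = simulate_decile_rates()
print(f"bottom decile: {bottom_rate:.1%} above threshold")
print(f"top decile:    {top_rate:.1%} above threshold")
```

Even in this crude setup the top decile carries several times the rate of the bottom decile, which is the qualitative pattern the paper reports.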
This research comes out of the context of earlier work on the heritability of BMI. It’s around 0.75 or so. That means it runs in families. Combined with the fact that in the recent past, or in other nations, there is a great variation in median size and distribution, one can intuit that genetic dispositions and environmental context both help explain the variation we see around us. The modern American environment is clearly obesogenic. When most of the American population were involved in physical jobs on farms the environmental context was very different.
Over the next few years, these risk scores for BMI will get better, and expand to other populations. One thing that some people are pointing out is that we know it’s heritable, so why not just look at your family? As many of you know, Mendelian segregation means that siblings may have quite different risk profiles on the genomic level. Polygenic risk score prediction is I think going to be extremely interesting and informative in the case of traits which are known to be found within families across generations (e.g., autism), but don’t seem to impact everyone. Perhaps we’ll find that for a given characteristic expression is random, due to some life event or cofactor such as infection. Or perhaps we’ll find that differences among siblings have some genetic basis in variants inherited from parents?
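The point about siblings is easy to see in a toy simulation of Mendelian segregation. This is a sketch under simplifying assumptions that are mine, not the paper’s: 200 unlinked loci, random effect weights, and two randomly generated parents. Each child gets one randomly chosen allele per locus from each parent, and the same two parents produce children with different scores.

```python
import random


def gamete(genotype, rng):
    """Pass on one of the two alleles at each locus, chosen at random."""
    return [rng.choice(pair) for pair in genotype]


def make_child(mother, father, rng):
    """Combine one gamete from each parent into a child genotype."""
    return list(zip(gamete(mother, rng), gamete(father, rng)))


def genotype_score(genotype, weights):
    """Weighted sum of effect-allele counts (alleles are coded 0 or 1)."""
    return sum(w * (a + b) for w, (a, b) in zip(weights, genotype))


rng = random.Random(42)
n_loci = 200  # hypothetical number of unlinked loci
weights = [rng.gauss(0.0, 1.0) for _ in range(n_loci)]  # made-up effects


def random_parent():
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_loci)]


mother, father = random_parent(), random_parent()
siblings = [make_child(mother, father, rng) for _ in range(5)]
sib_scores = [genotype_score(sib, weights) for sib in siblings]

for i, s in enumerate(sib_scores, start=1):
    print(f"sibling {i}: score {s:.2f}")
```

Family history gives every sibling the same expectation; a genomic score can tell them apart.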
Addendum: One of the authors, Sek Kathiresan, has been answering questions on Twitter.
I haven’t personally asked to get a copy because, to be honest, I thought there wouldn’t be anything new in it. If you “read the supplements” what more could there be in 368 pages? So I was waiting until the end of the month to buy the book and read it in my own sweet time as due diligence.
Well, this morning I asked a publicist to send me a copy. I will be getting it next week. The reason is that I’m told the latter portions of the book are quite challenging and candid as to what genetics may tell us in the 21st century. Who We Are and How We Got Here is a 21st-century revision and update of The History and Geography of Human Genes. But it’s apparently a lot more.
Also, I make a small cameo in the book, as do Eurogenes and Dienekes. I have always appreciated how David Reich, Nick Patterson, and their whole lab have taken people outside of the halls of the academy seriously. They didn’t need to as a matter of professional necessity but often engage as a matter of decency and seriousness.
Very interesting abstract at the ASHG meeting of a plenary presentation, Novel loci associated with skin pigmentation identified in African populations. This is clearly the work that one of the comments on this weblog alluded to last summer during SMBE. There I was talking about the likely introduction of the derived SLC24A5 variant to the Khoisan peoples and its positive selection in peoples in southern Africa.
Below is the abstract in full. Those who follow the literature on this see the usual suspects in relation to genes, but also new ones:
Despite the wide range of variation in skin pigmentation in Africans, little is known about its genetic basis. To investigate this question we performed a GWAS on pigmentation in 1,593 Africans from populations in Ethiopia, Tanzania, and Botswana. We identify significantly associated loci in or near SLC24A5, MFSD12, TMEM138…OCA2 and HERC2. Allele frequencies at these loci in global populations are strongly correlated with UV exposure. At SLC24A5 we find that a non-synonymous mutation associated with depigmentation in non-Africans was introduced into East Africa by gene flow, and subsequently rose to high frequency. At MFSD12, we identify novel variants that are strongly correlated with dark pigmentation in populations with Nilo-Saharan ancestry. Functional assays reveal that MFSD12 codes for a lysosomal protein that influences pigmentation in cultured melanocytes, zebrafish and mice. CRISPR knockouts of murine Mfsd12 display reduced pheomelanin pigmentation similar to the grizzled mouse mutant (gr/gr). Exome sequencing of gr/gr mice identified a 9 bp in-frame deletion in exon two of Mfsd12. Thus, using human GWAS data we were able to map a classic mouse pigmentation mutant. At TMEM138…we identify mutations in melanocyte-specific regulatory regions associated with expression of UV response genes. Variants associated with light pigmentation at this locus show evidence of a selective sweep in Eurasians. At OCA2 and HERC2 we identify novel variants associated with pigmentation and at OCA2, the oculocutaneous albinism II gene, we find evidence for balancing selection maintaining alleles associated with both light and dark skin pigmentation. We observe at all loci that variants associated with dark pigmentation in African populations are identical by descent in southern Asian and Australo-Melanesian populations and did not arise due to convergent evolution. 
Further, the alleles associated with skin pigmentation at all loci but SLC24A5 are ancient, predating the origin of modern humans. The ancestral alleles at the majority of predicted causal SNPs are associated with light skin, raising the possibility that the ancestors of modern humans could have had relatively light skin color, as is observed in the San population today. This study sheds new light on the evolutionary history of pigmentation in humans.
Much of this is not surprising. Looking at patterns of variation around pigmentation loci researchers suggested years ago that Melanesians and Africans exhibited evidence of similarity and functional constraint. That is, the dark skin alleles date back to Africa and did not deviate from their state due to selection pressures. In contrast, light skin alleles in places like eastern and western Eurasia are quite different.
Nyakim Gatwech
This abstract also confirms something I said in a comment on the same thread, that Nilotic peoples are the ones likely to have been subject to selection for dark skin in the last 10,000 years. You see above that variants at MFSD12 are correlated with dark complexion, in particular in Nilo-Saharan groups. The model Nyakim Gatwech is of South Sudanese nationality and has a social media account famous for spotlighting her dark skin. Gatwech and the San Bushman child above are so different in color that I think it would be clear these two individuals come from very distinct populations.
The fascinating element of this abstract is the finding that most of the alleles which are correlated with lighter skin are very ancient and that they are the ancestral alleles more often than the derived! We’ll have to wait until the paper comes out. My assumption is that after the presentation Science will put it on their website. But until then here are some comments:
There is obviously a bias in the studies of pigmentation toward those which highlight European variability.
The theory of balancing selection makes sense to me because ancient DNA is showing OCA2 “blue eye” alleles which are not ancestral in places outside of Western Europe. And East Asians have their own variants.
Lots of variance in pigmentation is not accounted for in mixed populations (again, lots of the early genomic studies focused on populations which were highly diverged and had nearly fixed differences). Presumably, African research will pick a lot of this up.
This also should make us skeptical of the idea that Western Europeans were necessarily very dark skinned, as now we know that human pigmentation architecture is complex enough that sampling modern populations expands our understanding a great deal.
Finally, it’s long been assumed that at some stage early on humans were light skinned on most of their body because we had fur. When we lost our fur is when we would need to have developed dark skin. This abstract is not clear on how long ago light and dark alleles coalesce to common ancestors.
The genetics of African populations reveals an otherwise “missing layer” of human variation that arose between 100,000 and 5 million years ago. Both the vast number of these ancient variants and the selective pressures they survived yield insights into genes responsible for complex traits in all populations.
The main issue I might have is I’m not sure that focusing on 5 million year time spans is particularly useful. Rather, looking at the last major bottleneck for modern humans before the “Out of Africa” event would be key, since that’s when a lot of the common variation would disappear, and very rare variants probably don’t have deep time depth in any case. With all that being said, the qualitative analysis is on point.
One of the major issues in the “SNP-chip” era has been that ascertainment of variation has been skewed toward Europeans. Though more recent techniques have tried to fix this…this review points out that if you by necessity constrain the SNPs of interest to those that vary outside of Africa (most of the world’s population), you are taking many alleles private to Africa off the table. This is relevant because the “Out of Africa” bottleneck ~50,000 years ago means that African populations harbor a lot more genetic variation than non-African populations do.
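Here is a toy illustration of that ascertainment problem. The SNP names and frequencies are invented for the example; the only point is the filtering step. If the marker panel is restricted to sites that are polymorphic outside Africa, anything that varies only within Africa never makes it onto the chip.

```python
def ascertain(variants, panel):
    """Keep only variants that are polymorphic in the ascertainment panel."""
    return [v for v in variants if 0.0 < v["freq"][panel] < 1.0]


# Hypothetical allele frequencies, for illustration only.
variants = [
    {"id": "snp_a", "freq": {"african": 0.30, "non_african": 0.20}},
    {"id": "snp_b", "freq": {"african": 0.15, "non_african": 0.00}},  # private to Africa
    {"id": "snp_c", "freq": {"african": 0.05, "non_african": 0.00}},  # private to Africa
    {"id": "snp_d", "freq": {"african": 0.60, "non_african": 0.45}},
]

kept = [v["id"] for v in ascertain(variants, "non_african")]
dropped = [v["id"] for v in variants if v["id"] not in kept]

print("on the chip:", kept)
print("never assayed:", dropped)
```

In this made-up panel, half of the variation segregating in Africans is invisible before any analysis begins.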
The move to high-quality whole genome sequencing obviates these concerns. As a matter of course African variation will be “picked up” since the marker set is not constrained ahead of time.
Importantly, the authors focus on South Africa and the Xhosa population. This group has ~20% Khoisan genetic ancestry, which is very diverse and very distinct from the remaining ~80% of its ancestry. With its large African immigrant population and highly diverse native groups, some of them quite admixed, South Africa could actually provide some hard-to-substitute value in biomedical genetics.
A few years ago I contributed to an op-ed which defended the utility of the race concept in biology in USA Today (which by the way prompted a quite patronizing email from a famous doyen of population genetics who wished to correct my ignorance; here’s a clue: “Out of Africa again & again”).
In my initial draft, I had stated that the Khoisan diverged from other human populations ~200,000 years ago. The fact-checker came back and said that this didn’t seem to be a supportable claim. The reason I gave the ~200,000 figure is that I’d button-holed people who looked at these genomes, and they were coming to the conclusion that the divergence between Khoisan and non-Khoisan was further back than we’d presupposed. And that was the number given to me.
Ultimately I compromised and allowed them to change the divergence value to 150,000 years before the present.
Today we’re in a different landscape. The above figure is from the Science paper, Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago, which was earlier a biorxiv preprint (which I mentioned last spring). In concert with the North African find, the media is running with the idea that the origin of modern humans goes back very far indeed. This piece in ScienceNews is actually pretty good in my opinion at staying under control, though not all write-ups have been so measured.
So in a span of two years we’ve gone from me pushing and compromising on a value of ~150,000 years, to researchers suggesting that the Khoisan/non-Khoisan divergence is about two-fold older than that!
Well, I’m here to tell you that a prominent geneticist who is very conversant with these issues is simply incredulous about the likelihood of this particular value. I brought up this preprint to them over lunch and they just didn’t buy it. That is, they are skeptical that the amount of admixture would have skewed the earlier inferences to the magnitude that they seem to have in these results.
The authors in the paper used G-PhoCS and their own ingenious method to come to these inferences of split dates. The problem with these methods is that the inferences generated aren’t nearly as straightforward as an admixture estimate (which can be checked by something as simple as a PCA). I don’t want to get into the details, but I remember seeing models in the 2000s which inferred that East Asians and Europeans diverged ~25,000 years ago, or that there was no Neanderthal admixture in Europeans (to a high degree of confidence). Models can come out with a lot of values.
More importantly, look at the dates of divergence of non-Africans (Sardinians here) from their closest African relatives.
115,000 years before the present (Dinka-Sardinian) for G-PhoCS
76,000 years before the present for their TT-method
In light of the likelihood that the closest population to non-Africans may have been an East African population represented by the Ethiopian Mota individual (along with modern Hadza), we can probably drop that estimate down a bit. But G-PhoCS in particular just gives too old an estimate. There are ways it makes sense (lots of old structure within Africa) of course. I’m just speaking in terms of possibilities.
The diversification of extant modern populations seems to have occurred around ~50,000-60,000 years before the present. This aligns with the archaeology, and the ancient genomes which we have on hand.
Of course the methods in this paper might be right. And the fossil from North Africa does add some plausibility to that. But really the whole field is somewhat unsettled now, and we should be cautious about media reports of definitive truths.
The above tweet is in response to an article which reports on a finding published this past month in PNAS, Early history of Neanderthals and Denisovans. It’s open access, so you should read it. I don’t think I’ve reviewed it because I haven’t dug through the supplements. To be frank this is a paper where you pretty much have to read the supplements because they’re introducing a somewhat different model here than is the norm.
I talked to Alan Rogers at SMBE about this paper. Broadly, I think there might be something to it, and it’s because of what David says above. It is simply hard to imagine that Neanderthals could be extremely successful with such low genetic diversity as we see, and spread so thin. Now, the Quanta Magazine article tries to emphasize that the effective population size is not the true census population, but I wish it had explained this more clearly. Basically, the size that is relevant for breeding is obviously not going to be the same as a head count. And, because effective population sizes are highly sensitive to bottlenecks, you can get really small numbers even when the extant population at any given time may be large.
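The sensitivity to bottlenecks falls out of the standard textbook approximation that long-term effective size is the harmonic mean of the per-generation sizes. The sketch below uses that approximation with made-up numbers; it is not the model in the PNAS paper, and it ignores the other reasons (unequal reproductive success, structure) that effective and census sizes differ.

```python
def harmonic_mean_ne(sizes):
    """Long-term effective size approximated as the harmonic mean of
    the breeding population size in each generation."""
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical history: 99 generations at 10,000 and one generation at 100.
sizes = [10_000] * 99 + [100]

arithmetic_mean = sum(sizes) / len(sizes)
effective_size = harmonic_mean_ne(sizes)

print(f"average head count: {arithmetic_mean:.0f}")
print(f"effective size:     {effective_size:.0f}")
```

A single crash generation out of a hundred is enough to cut the effective size roughly in half here (about 5,000 against an average head count of about 9,900), because small generations dominate a harmonic mean.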
The PNAS paper makes some novel inferences, and I’ll set that to the side until I read the supplements. But I don’t think it’s crazy that population structure within Neanderthals could be leading to lower total genetic diversity.