May 23, 2005

The Middle Model

Genomics refutes an exclusively African origin of humans. This is a long and complex paper, but intelligible with close reading. I have cut & pasted the introduction and discussion below. But I want to highlight one point:


...Many, even most, functional loci globally surveyed...MC1R...show deep structure both geographically and temporally, with coalescence times for non-African variation extending to much earlier time periods than the first emergence of modern humans in Africa. This strongly suggests that archaic assimilation affected these loci....

The point about assimilation of the MC1R alleles from archaic populations matches what a GNXP commenter postulated earlier today. Please note that the model presented above matches (down to semantics) what was outlined in Dragon Bone Hill, though with far greater clarity and precision.

(via Dienekes)

Introduction

Since the discovery of apparent signals of strong late Pleistocene population expansions (Rogers and Harpending, 1992 and Harpending et al., 1993) in human mitochondrial DNA (mtDNA), a number of studies have sought similar signs in other genetic polymorphisms. Among the data so analyzed have been nuclear sequences, short tandem repeat polymorphisms (STRs), and single nucleotide polymorphisms (SNPs). While mtDNA shows signals of recent expansions in almost every human population, it has by now become clear that the nuclear data do not present an unambiguous picture regarding population expansion associated with the spread of anatomically modern humans. For example, various analyses of STR data using different statistics have given contradictory signals of expansions, their timing, and the sub-populations involved (Di Rienzo et al., 1998, Reich and Goldstein, 1998, Kimmel et al., 1998 and Zhivotovsky et al., 2000). The first detailed evidence from the nuclear genome also showed no evidence at all of expansion (Harris and Hey, 1999).

To explain low interpopulation diversity in humans, it has been suggested that humans passed through a bottleneck (Haigh and Maynard Smith, 1972). It has also been proposed that there was a bottleneck associated with the emergence of modern humans in Africa and their spread throughout the world (Jones and Rouhani, 1986). For example, SNP haplotype block data show a signature of bottlenecks at vastly differing times in the prehistory of African and non-African populations (Reich et al., 2001 and Gabriel et al., 2002), even if their cause as yet remains unclear. These bottlenecks would need to have been of extraordinary severity and/or duration to explain some of the data, e.g., for Europeans, Reich et al. (2001) suggested a pre-expansion bottleneck size of 50 individuals for 20 generations (or any size and duration of the same ratio), while Marth et al. (2003) obtained their best fit with size-duration ratios between 1000 and 2500 individuals for, respectively, 240 and 550 generations. Yet, single locus studies (e.g., Harding et al., 1997, Harding et al., 2000, Zhao et al., 2000, Yu et al., 2001 and Yu et al., 2002) often find at most mild bottlenecks, or none, in non-Africans, resulting in an overall picture that is puzzling. A recent study by Marth et al. (2003) used 500,000 SNPs to conclude that the dominant population history of humans was a Pleistocene population collapse followed by a mild post-Pleistocene recovery. The importance of these varied signals of bottlenecks and expansions is the subject of this paper.
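As a rough illustration of why bottlenecks with the same size-to-duration ratio are treated as interchangeable, the sketch below uses the textbook drift approximation in which expected heterozygosity shrinks by a factor of (1 − 1/(2N)) each generation. That formula and the function name `heterozygosity_retained` are assumptions brought in for illustration; this is not the fitting procedure of Reich et al. (2001) or Marth et al. (2003).

```python
def heterozygosity_retained(size, generations):
    """Fraction of heterozygosity expected to survive a bottleneck of
    constant diploid size `size` lasting `generations` generations,
    using the textbook drift approximation (1 - 1/(2N))**t."""
    return (1.0 - 1.0 / (2.0 * size)) ** generations


# Bottleneck scenarios quoted in the text: (individuals, generations).
scenarios = {
    "Reich et al. (2001)": (50, 20),
    "Marth et al. (2003), lower bound": (1000, 240),
    "Marth et al. (2003), upper bound": (2500, 550),
}

for label, (n, t) in scenarios.items():
    ratio = t / n  # duration-to-size ratio, the quantity the data constrain
    kept = heterozygosity_retained(n, t)
    print(f"{label}: t/N = {ratio:.2f}, heterozygosity retained ~ {kept:.3f}")
```

Scaling size and duration together (say, 500 individuals for 200 generations instead of 50 for 20) leaves the retained fraction almost unchanged under this approximation, which is the sense in which only the ratio matters.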

The significance of genetic signatures of late Pleistocene population expansions is that they directly address the contrasting theories of modern human origins that have been the subject of much debate since the 1980s. The recent African origin model (Cann et al., 1987 and Stringer, 1992) proposes that anatomically modern humans arose in Africa around 130,000 years ago as a new species, which subsequently spread across the world, replacing all non-African archaic humans. In contrast, the multiregional evolution model proposes that modern humans emerged across the world from regional archaic human populations that were always linked by gene flow (Wolpoff et al., 1984).

Genetic evidence to date has been interpreted as giving far greater support to the recent African origin (RAO) model than to the multiregional evolution (MRE) one. Signs of population expansions in human mtDNA have seemed to confirm the RAO model [albeit in a modified “weak” form that suggests multiple bottlenecks and then expansions; Harpending et al. (1993)], as the African expansion seems to clearly pre-date the Asian and European ones. These signals have been seen as indicating the rise of modern humans in Africa and their subsequent expansion into the other continents.

It is now routinely assumed that such signals of expansions support the most controversial of the claims of the strict RAO model, namely that all non-African “archaics” were replaced and all living people are descended exclusively from African modern humans (Manderscheid and Rogers, 1996). For this to be true, the entire extant human genome has to be derived from recent Africans. Therefore, if the signals of expansions in mtDNA, for example, indicate a population history rather than merely the history of a single genetic locus, the same signals should be discernible at all genetic loci.

Others have proposed models intermediate between the strict RAO and MRE models (Smith, 1985, Relethford, 2001 and Templeton, 2002). Relethford called his version “mostly out of Africa” because in it there is actual movement of populations from Africa. These newly arrived Africans mostly replace the local archaics, but there is some degree of admixture. On the other hand, in our model, there is no long distance movement of populations at all; change is driven entirely by local gene exchange among demes and natural selection. This model has the advantage of parsimony and simplicity, and it will be important to disentangle the effects of selection and long range migration from the archaeological and fossil records. For example, if there were long range population movements with local hybridization, then signatures of that hybridization should persist and be discernible in the fossil record and in populations today. In contrast, our model posits that there are hybrids essentially only at the wavefront. Since this front is moving at something more than 3 km per generation, it would take about 30 generations, or 750 years, to travel 100 km. We would then expect to find hybrid-looking fossils only within this small temporal window.
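The wavefront arithmetic in the paragraph above can be checked with a few lines. This is only a sketch: the function name `wavefront_transit` is hypothetical, and the 25-year generation time is not stated in the text but is implied by its own figures (750 years / 30 generations).

```python
def wavefront_transit(distance_km, speed_km_per_gen, years_per_gen=25.0):
    """Return (generations, years) for the wavefront to cover distance_km.

    years_per_gen defaults to 25, the value implied by the text's
    "30 generations, or 750 years".
    """
    generations = distance_km / speed_km_per_gen
    return generations, generations * years_per_gen


# "Something more than 3 km per generation": 100 km in about 30 generations
# corresponds to roughly 3.3 km per generation.
for speed in (3.0, 100.0 / 30.0):
    gens, years = wavefront_transit(100.0, speed)
    print(f"{speed:.2f} km/gen -> {gens:.1f} generations, {years:.0f} years per 100 km")
```

At exactly 3 km per generation the transit takes slightly longer than the quoted 750 years, so the quoted figure corresponds to a speed a little above 3 km per generation, as the text says.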

Templeton's (2002) paper was an ambitious attempt to trace ancient gene flow from molecular markers. He examined the geographic distribution of subclades of several markers with an intuitively appealing logic that allowed him to date major movements to and from Africa. There are, however, several problems. His algorithm is regarded with skepticism by population geneticists (Felsenstein, 2003). Moreover, even if his algorithm were to identify real movements between ancestral populations, there is no information about where those populations were at the time. For example, a signature of movement between African and Asian ancestors several hundred thousand years ago might have been a movement between Africa and Asia under MRE, but a movement between adjacent river valleys, for example, under RAO since those ancestral populations would have been in Africa at the time. Templeton's findings provided almost no evidence for distinguishing among models of modern human origins.

A strongly negative value (e.g., ≤ −1.5) of the Tajima D statistic (TD; Tajima, 1989) for a single locus likely indicates a selective sweep at that locus, while many such values obtained at independent loci would suggest a population expansion. Przeworski et al. (2000), analyzing data from 16 independent loci, found nearly evenly distributed positive and negative TD-values, thus offering no support for putative population expansions. Stephens et al. (2001), analyzing data from 313 genes, found that 90% had negative values, but only a fraction of these were statistically significant.
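For readers who want to see what goes into the statistic, here is a minimal sketch of Tajima's D computed from three summary numbers: the sample size, the number of segregating sites, and the mean number of pairwise differences. The constants follow the standard definitions in Tajima (1989); the function name and the input values are illustrative assumptions, not data from any of the studies cited.

```python
import math


def tajimas_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D from sample size n, number of segregating sites S,
    and mean pairwise differences (pi), following Tajima (1989)."""
    if n < 4:
        # The variance term vanishes for very small samples.
        raise ValueError("need at least 4 sequences")
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pi minus Watterson's estimate S/a1; both estimate theta
    # under neutrality and constant population size.
    numerator = mean_pairwise_diff - seg_sites / a1
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return numerator / math.sqrt(variance)


# Hypothetical sample: 10 sequences, 16 segregating sites, pi = 35/9.
d = tajimas_d(n=10, seg_sites=16, mean_pairwise_diff=35 / 9)
print(f"Tajima's D = {d:.3f}; at or below the -1.5 cutoff: {d <= -1.5}")
```

An excess of rare variants, as expected after an expansion or a sweep, lowers the mean pairwise difference relative to the number of segregating sites and so pushes D negative.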

In the RAO model, all loci should have strongly negative TD-values, comparable to that shown in non-African mtDNA [TD = −2.28; Ingman et al. (2000)]. Thus, the nuclear data do not consistently signal expansion, and when they do, the signal is of a mild expansion, perhaps reflecting only post-Pleistocene population growth associated with the spread of agriculture.

Alternative explanations for the puzzling features of the genetic data discussed above may be found in a recently proposed theory of modern human origins. Arguing along the lines of Sewall Wright's (1932) “shifting balance” theory, Eswaran (2002) suggested that the African transition to anatomical modernity may not have been a speciation event, but was rather a “character change” involving alleles at multiple loci that cooperated to confer a co-adapted genetic advantage to modern humans. Given small random movements of hunter-gatherer groups (demic diffusion), and under the condition of a low rate of interbreeding between modern and archaic humans, such an advantageous gene combination could spread as a wave of advance, or a “diffusion wave,” of anatomical modernity (Eswaran, 2002).

One can visualize this process as that of the region of modern humans expanding at a steady rate into the region of archaic humans, the two regions being separated by a moving “wavefront” where the modern and archaic populations overlap. Only at the wavefront would both human types coexist; therefore, all hybridization and all selection favoring the moderns against the archaics—and thus all expansions in the modern population—would occur there.
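To make the picture of a steadily advancing front concrete, the sketch below iterates a deterministic one-dimensional stepping-stone model with a single favoured type. This is a deliberately simplified stand-in (a classic Fisher-style wave of advance), not the multi-locus coadapted model of Eswaran (2002); the selection coefficient `s`, migration rate `m`, deme count, and function names are all arbitrary illustrative assumptions.

```python
def front_position(freq, threshold=0.5):
    """Index of the first deme where the modern type is below threshold."""
    for i, p in enumerate(freq):
        if p < threshold:
            return i
    return len(freq)


def simulate_wave(n_demes=200, generations=400, s=0.1, m=0.2):
    """Track the front of a favoured type spreading along a line of demes.

    Each generation applies selection within demes, then nearest-neighbour
    migration with reflecting boundaries. Returns the front position after
    every generation.
    """
    # The modern type starts fixed in the first five demes only.
    freq = [1.0 if i < 5 else 0.0 for i in range(n_demes)]
    positions = []
    for _ in range(generations):
        # Selection: logistic increase of the favoured type.
        selected = [p + s * p * (1.0 - p) for p in freq]
        # Migration: a fraction m/2 is exchanged with each neighbour.
        migrated = []
        for i, p in enumerate(selected):
            left = selected[i - 1] if i > 0 else p
            right = selected[i + 1] if i < n_demes - 1 else p
            migrated.append(p + 0.5 * m * (left - 2.0 * p + right))
        freq = migrated
        positions.append(front_position(freq))
    return positions


positions = simulate_wave()
for gen in (100, 200, 300, 400):
    print(f"generation {gen}: front at deme {positions[gen - 1]}")
```

After an initial transient the front moves by a roughly constant number of demes per generation, and intermediate frequencies occur only in a narrow band around it, which is the sense in which the two types coexist only at the wavefront.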

According to this theory, the progress of the wave could be accompanied by considerable hybridization at the wavefront. Even so, the assimilation of archaic human genes into the modern populations would be low if the advantageous modern gene combination were complex enough that hybrids, with no selective advantage from their incomplete complement of modern genes, rarely became fully modern. Under such circumstances, the wave would essentially be an expansion of the modern humans at the wavefront. Further, as the small wavefront modern population would at any time be principally derived from previous wavefront moderns, the wavefront modern population would become severely bottlenecked over the thousands of generations that the wave took to travel from Africa to the far corners of Asia and Europe.

However, as all new modern populations would be created principally by the small wavefront modern population, the bottleneck would be followed by a continuous “rolling” expansion in the wake of the wave. As the signs of the wavefront bottleneck and the subsequent expansion would be passed on to the emergent modern populations, this theory offers an explanation for the bottleneck-and-expansion signature seen in so much human genetic data. It also explains why the expansions in Asia and Europe could have occurred tens of thousands of years after the African one (Harpending et al., 1993), for the wave would have traveled at about 3 km per generation, given the empirical evidence of the spread of modern humans.

Under conditions of a limited rate of archaic assimilation, only a few polymorphisms would survive at each locus in the bottlenecked wavefront modern populations. So, it is possible that African alleles often spread with the wavefront across the world. Such loci would then show signs of an expansion of a previously small set of African polymorphisms into a worldwide population. However, given a non-zero rate of assimilation from archaic populations, it is also possible that the wavefront moderns would, at some point along their spread, assimilate archaic human alleles (at loci unassociated with the functional advantages of modernity), which would then “surf” the wavefront and spread along with anatomical modernity. Thus, at these loci, the final modern world populations would have African/“modern” alleles, as well as alleles assimilated from archaic populations. The latter loci would show a considerable time depth and a corresponding lack of signs of expansions. This seems the most plausible explanation for the inconsistent signals of expansions obtained from various STR statistics (Eswaran, 2003), as well as the weak and variable signs of expansion in human nuclear SNPs and the correlation among loci between Tajima's D and nucleotide diversity (Stephens et al., 2001).

Missing signs of expansions at many genetic loci would be correlated with assimilation at those loci from non-African archaic populations. Such assimilation would obviously also be compatible with evidence of great time depth in present-day non-Africans and of ancient and uniquely non-African polymorphisms (Harding et al., 1997, Harding et al., 2000, Zhao et al., 2000, Yu et al., 2001 and Yu et al., 2002). It would also explain why significant geographical structuring (presumably ancient, and with partly archaic roots) is often seen at such loci, but not in others like mtDNA. The theory thus suggests that present day modern humans are not exclusively derived from early modern Africans, but have a significant genetic inheritance from non-African archaics as well.

In this paper, we explore this proposed scenario through simulations of a modified version of the numerical model of Eswaran (2002). We use the model to compute population statistics of the emergent modern populations for two cases simulating (a) the spread of modern humans from a regional source across a one-dimensional world through the replacement of archaic types, and (b) the analogous spread of a modern human type defined by an advantageous combination of some C unlinked genes. We show that while the replacement case reproduces some of the gross features of the genetic data, the model replicates its subtler details only in the assimilation case—thereby arguing that significant assimilation from non-African archaics accompanied the modern human transition. Our model is the simplest implementation of the idea of a coadapted gene complex, a phenotype, and the consequences for the neutral genome of a selective phenotype sweep.

...

Discussion

How pervasive was assimilation?

The question immediately arises as to how widespread archaic assimilation was in the human genome. Even a rough estimate suggests that assimilation was surprisingly high, surprising because the debate until now has been whether there was any assimilation at all. The simulations conducted here have shown that significant TD-values are strongly correlated with low assimilation, and conversely, that loci with high assimilation usually yield non-significant (but often negative) TD-values.

The Tajima D statistic is ideally suited for detecting assimilation from archaic populations. Given the long history of humans living outside of Africa, there would have been significant geographical structure (which increases TD) in the global human population at the time of the wave initiation, ca. 100,000 years ago. Thus, any assimilation from non-African archaic humans would inevitably increase TD, reducing the possibility of a significant signal of expansion. Therefore, TD is less likely to show expansions than other statistics if assimilation occurs. [See Yu et al. (2002) for a possible case in point.]
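The claim that deep structure raises TD can be seen in a toy comparison. The sketch below computes the two quantities whose difference forms the numerator of Tajima's D, the mean pairwise difference and the number of segregating sites divided by the harmonic number a1, for two made-up samples with the same number of segregating sites. The haplotypes and the function name are hypothetical and chosen only to show the direction of the effect.

```python
from itertools import combinations


def pi_and_watterson(haplotypes):
    """Return (mean pairwise differences, S / a1) for aligned haplotypes."""
    n = len(haplotypes)
    # A site is segregating if more than one state appears in its column.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    return pi, seg_sites / a1


# Hypothetical toy samples, each with six haplotypes and six segregating sites.
# Star-like: every variant is a singleton, as after a recent expansion.
star_like = ["100000", "010000", "001000", "000100", "000010", "000001"]
# Two deep clades: an old lineage retained alongside the common one.
two_clades = ["000000"] * 3 + ["111111"] * 3

for label, sample in (("star-like", star_like), ("two deep clades", two_clades)):
    pi, theta_w = pi_and_watterson(sample)
    print(f"{label}: pi = {pi:.2f}, S/a1 = {theta_w:.2f}, "
          f"numerator of D = {pi - theta_w:+.2f}")
```

With the number of segregating sites held fixed, the star-like sample gives a negative numerator and the two-clade sample a positive one, so retaining an old, divergent lineage moves D upward even if the rest of the sample carries an expansion signal.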

We can make a stronger assertion based on a comparison of nuclear loci with mtDNA. The deep differences between Neandertal and modern mtDNA, and the extremely low variability and geographical structure in modern mtDNA leave little doubt that archaic mtDNA was largely, if not completely, replaced. The empirical TD for mtDNA in non-African populations is −2.28 (Ingman et al., 2000). While such a low TD-value is not obtained in the simulations of 10,000 individuals that we described above, it is well within the range of TD-values obtained for simulated world populations of 30,000 or 50,000.

Recovery from the wavefront bottlenecks in these larger simulated populations leads to stronger signals of expansions, i.e., more negative values of TD, even while the effective populations indicated by the mean pairwise differences remain below 10,000 at the simulated present day (as the pairwise differences have not fully recovered their equilibrium values after the wave). These latter simulations are, we believe, closer to the putative modern human diffusion wave, as they deliver highly negative TD-values in the range empirically seen in mtDNA. However, the TD-values for the replacement cases in these larger simulated populations nearly always fall within the 90% significance range (TD ≤ −1.5). This suggests that any loci with empirical TD-values that are less than 90% significant are likely to have been affected by assimilation. For example, the “geography-based” sample of 437 loci presented by Ptak and Przeworski (2002) shows fewer than 25% of the values are below −1.55, implying, by the above argument, that around 75% of the loci had significant assimilation. These are astoundingly high figures.

Other data, too, suggest pervasive assimilation. Scans of 624 STR loci by Storz et al. (2004) revealed that 13 of these had significantly reduced variability by their very strict criterion, which the authors attributed to selection, but which we think signals “replacement” loci. Even among the 13 loci, reduced variation occurred either in Europe or Asia, but rarely in both—which seems peculiar if selection were involved, but is entirely likely if random fixation of certain alleles carried by the wavefront independently occurred in the separate Asian and European waves. Only one of the 13 apparent sweeps was seen in Africa, where the diffusion wave model predicts some sub-Saharan populations would not have been subjected to the wave. Therefore, we believe the STR pattern better fits the diffusion wave hypothesis than the selectionist one. By looser criteria, approximately 25% of their loci showed this pattern of reduced variability outside of Africa—in either Europe or Asia, but not both. While Storz et al. (2004) proposed that the patterns are due to selection, it is important to note that they studied STRs with no known functional significance.

The immediate objection to the possibility of such pervasive assimilation in the nuclear genome may come from physical anthropologists. If such a degree of assimilation occurred during the modern human transition, they may ask, why did the modern human morphology remain essentially the same across the world, rather than showing physical signs of admixture with archaics from Asia and Europe? One possible answer to this question has already been given by Eswaran (2002), who argued that the modern morphology itself may have given the coadapted modern advantage (possibly due to reduced childbirth mortality) that propagated modernity. Thus, the modern morphology, and the alleles that “coded” for it, could be fixed in all modern populations even while the rest of the human genome carried a considerable number of assimilated archaic human alleles. Other aspects of modern human morphology could also have been selectively advantageous, but it is not immediately apparent what they could have been. Some correlated aspect of energy requirement and usage might be involved.

Assimilation at functional loci

With the exception of the C loci associated with modernity, the simulations presented here consider only neutral loci. Thus, the interpretation of the contrasting signals of expansions may be thought to hold only for such loci. To take an extreme view: is it possible that assimilation from archaics affected only neutral parts of the genome, while the functional parts were entirely derived from early modern Africans?

The evidence weighs heavily against this possibility. Many, even most, functional loci globally surveyed (e.g., β-globin, MC1R, PDHA1, Dys44, Y-chromosome) show deep structure both geographically and temporally, with coalescence times for non-African variation extending to much earlier time periods than the first emergence of modern humans in Africa. This strongly suggests that archaic assimilation affected these loci. The TD values, too, are usually non-significant and even positive, not even remotely suggesting population expansions. Moreover, in functional loci, an expansion would cause strongly negative TD-values in a random mating population, except in cases where deep phylogenetic structure was preserved by balancing selection before the expansion. Two of us (HCH, ARR) proposed balancing selection as an explanation for the lack of signals of expansions at functional loci (Harpending and Rogers, 2000), but such loci do not show some other signs of such selection (Wall and Przeworski, 2000). On the other hand, assimilation from archaic humans is a sufficient explanation for the missing signs of expansions, especially since loci with local selective advantage are more likely to be assimilated (Eswaran, 2002), and loci under balancing selection are also more likely to be assimilated.

Conclusions

The simulations presented here suggest resolutions for a number of crucial puzzles in the genetic data on modern human origins.

A diffusion wave of a complex genotype can explain why mismatch distributions of high-mutation-rate loci (such as mtDNA) show late Pleistocene expansions, while those of lower-mutation-rate SNPs show a contraction (Marth et al., 2003). It also explains why the more rapidly responsive site frequency spectra of SNPs show a bottleneck-and-expansion history (Marth et al., 2004). These explanations follow directly as consequences of a low assimilation-rate diffusion wave of moderns spreading out of Africa.

The same mechanism also explains why the expansions in Europe and Asia followed so late after the expansions in Africa (Harpending et al., 1993, Reich et al., 2001 and Gabriel et al., 2002), while certain populations in sub-Saharan Africa show no signs of expansions (Excoffier and Schneider, 1999). The model suggests that these populations are directly descended from the first modern populations in the “core region,” which would not be swept by the wave (Eswaran, 2002). But perhaps the most interesting explanation offered by the model concerns why, even among non-Africans, certain other loci do not show the characteristic bottleneck-and-expansion pattern and why, while most show mildly negative Tajima D values, there is so much variation in these values. These empirical findings directly suggest that assimilation from archaic human populations accompanied the modern human transition across the world. The bottleneck at the wavefront, while greatly restricting the genetic diversity in the non-African (and north African) modern populations, randomly allowed, at least at neutral loci, either African polymorphisms to spread worldwide, or else allowed non-African polymorphisms to be assimilated and spread along with the wavefront. In the first case, we see strong signals of expansions, as in mtDNA; in the second case, we see less clear-cut signals of expansions, often accompanied by signs of deep population subdivision, significant numbers of unique non-African polymorphisms, and great time depths in non-African populations. The latter signs have been found in numerous nuclear loci studied in the last few years. We conjecture that as much as 80% of the nuclear genome is significantly affected by assimilation from archaic humans (i.e., 80% of loci may have some archaic admixture, not that the human genome is 80% archaic).

While each locus has its own history, the above reasoning suggests that African-dominated loci would all roughly tell the same story, while the others would each have its own—for assimilation would have varied in time and place in each case. Thus in the late 1990s, after a decade when most geneticists became convinced of the strict replacement recent African origin model, there was confusion when many nuclear loci—each in its own way—contradicted the patterns first seen in mtDNA. Yet, the following of that particular model remained strong, as there was no other theory that could explain the contrasting patterns. Now there is such a theory, and it tells us that while modern humans first emerged in Africa, living human populations carry within them a substantial genetic inheritance that had its origins in non-African archaics.

Posted by razib at 03:26 PM