Life has been busy. Very busy. The company I’m working for is ramping up on releasing product…as in on the order of weeks, not months. We’ve already released results to a few early beta testers, and are taking reservations for orders (basically you are in the front of the line for notification when the orders are being taken, and I’m 99% sure that the turnaround is going to be faster than later on when the analysis pipeline will be crowded).
The details of the test are pretty straightforward, at least for a reader of this weblog. 224,000 markers on the SNP-chip, a reference panel of thousands, with 150 populations. Yes, we have wolves and coyotes, so we should be able to pick up admixture/introgression. It’s mostly a breed focused product at this point when it comes to ancestry, but we’ve got a large number of village dogs in there too. In terms of functional characteristics, the current focus is on diseases, but we’ll be expanding into traits soon. Really there are many directions we could go with this.
I’m making one public plea to Chris Chang of plink to allow us to be able to use his tool without always having to specify that we’re not using a human data set.
As some of you recall I’m following Game of Thrones now via the internet. The whole idea of a “fork” is somewhat liberating, and, the reality is that I’m not patient enough to wait until Martin completes the series when my daughter is in high school. Speaking of series, I left T Greer of Scholar’s Stage totally dispirited when I informed him that Brandon Sanderson’s The Stormlight Archive is projected to run to 10 books. At his current pace that means Sanderson will complete the series in 21 years assuming that he hits his mark…
Speaking of a long time, this August A Song of Ice & Fire will be 20 years old! It seems highly implausible to me that George R. R. Martin can wind up all his plot threads in two final books, even if both are 1,500 page monsters. I think we’ll be lucky if he manages to accelerate publication rate again and tie things up in the mid-2020s, when he’s in his late 70s.
A note on comments. If you begin a comment by pointing out some presumed detail of my ethnicity you’re probably getting deleted immediately (I state presumed because 3/4 of the time people are wrong). I don’t prevent assholes from reading what I have to say, but I can make sure assholes don’t leave comments.
One of the most curious things to people is that siblings can vary a great deal in their traits. Sometimes, this is not simply due to environment. Height is a predominantly genetic characteristic in terms of its heritability within the population, but the correlation between siblings is only 0.50 in terms of the trait value. The standard deviation in IQ among siblings is only a bit less than the standard deviation among the general population.
And yet with a quantitative trait where most of the variation is due to genes of small effect this seems peculiar. Though genetics is not “blending,” it seems that inheritance should be closer to blending when it comes to thousands of genes combining to account for the variation on one trait.
Because if there are really 5,000-10,000 loci, the law of averages is going to kick in with a vengeance and similarly regression to the mean should be huge, but while IQ breeds fairly true, IQ variation between fairly closely related individuals is often quite significant. If children inherit randomly from both parents and there are really 5,000-10,000 loci that matter, all of which have very small effects, IQ differences between siblings ought to be really, really slight and rare, because the 5,000-10,000 random trials for the large number of low effect SNPs should average out between full siblings almost completely. But, while full siblings are definitely correlated, there are routinely meaningful magnitude sibling IQ differences. When no one inherited factor has an impact of more than say 0.02%, that shouldn’t happen.
The short answer is that segregation maintains variance. I allude to this in my Slate piece on grandparents. Siblings may be expected to be 50% identical in terms of their genetic state due to parental contribution, but in reality there is variation around this value. I have two siblings who are 41% identical. The standard deviation around 50% is about 3%.
Feldman and Cavalli-Sforza (Theoret. Pop. Biol. (1979), 15, 276-307; (1981), 19, 370-377) have emphasized the role of the segregation variance in models of assortative mating for continuous characters. This note examines its behavior in the context of a general additive model. Using known results concerning the effects of assortative mating and selection on genic variance and correlations among uniting gametes it is shown that the effects of these processes on segregation variance will be small if the effective number of loci is large. Thus models in which the segregation variance remains constant are approximate descriptions of the behavior of characters determined by many loci.
Basically he’s saying that contra what some had modeled, segregation variance is rather constant across generations if the genetic variance is on a highly polygenic trait. Naturally this means polygenic traits exhibit segregation variance.
Rogers shows through some algebra that the segregation variance is a function of the additive genetic variance (the first term after 1/2), and, (1 – f). Therefore if f ~0, the segregation variance is about half the additive genetic variance, so of the same order of magnitude, which to me aligns intuitively with why there is roughly the same standard deviation across groups of siblings and the general population in IQ (though the former is smaller than the latter).
Rogers shows that f is a function of variance at the locus (weighted across all the loci) and fi, which is the correlation between uniting gametes. If the correlation between gametes is very high (in the context of this paper he is focused on phenotypes and assortative mating), then variance will naturally be low, as there is not going to be genetic variation at that locus in that individual. Basically, f measures deviation from Hardy-Weinberg equilibrium. In a random mating population then f is small, so the segregation variance and additive genetic variance will be of similar magnitude.
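A minimal simulation sketch of why thousands of small-effect loci do not wash out sibling differences. The assumptions here are mine, not Rogers’: equal-effect biallelic loci, allele frequency 0.5, unlinked loci, and random mating (so f is about 0). The function names are hypothetical. It compares the variance between full siblings within a family to the variance across the parental population, for few versus many loci.

```python
import random
from statistics import pvariance


def make_parent(n_loci, rng):
    # Diploid genotype: one (allele, allele) pair per locus, each allele 0 or 1
    # with frequency 0.5 (an assumption of this sketch).
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_loci)]


def gamete_value(parent, rng):
    # Mendelian segregation at unlinked loci: transmit one of the two alleles
    # at each locus at random, and add up the transmitted alleles.
    return sum(locus[rng.randint(0, 1)] for locus in parent)


def sib_variance_ratio(n_loci, n_families=500, seed=1):
    """Within-family (segregation) variance divided by population variance."""
    rng = random.Random(seed)
    parent_values = []
    half_squared_diffs = []
    for _ in range(n_families):
        mom = make_parent(n_loci, rng)
        dad = make_parent(n_loci, rng)
        parent_values.append(sum(a + b for a, b in mom))
        parent_values.append(sum(a + b for a, b in dad))
        # Two full siblings drawn from the same pair of parents.
        sib1 = gamete_value(mom, rng) + gamete_value(dad, rng)
        sib2 = gamete_value(mom, rng) + gamete_value(dad, rng)
        # Half the squared sibling difference estimates within-family variance.
        half_squared_diffs.append((sib1 - sib2) ** 2 / 2)
    within_family = sum(half_squared_diffs) / n_families
    return within_family / pvariance(parent_values)


if __name__ == "__main__":
    for n_loci in (20, 200):
        print(n_loci, round(sib_variance_ratio(n_loci), 2))
```

Under these assumptions the ratio should hover around one half whether there are 20 loci or 200 (roughly 0.7 in standard deviation terms). More loci add small contributions to the population variance and to the within-family variance alike, so the ratio does not shrink as the trait becomes more polygenic.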
An alternative view is presented by revisionist scholars, who in the process of revising Islamic history tear apart its basic foundations, at least from a Muslim perspective. Their views can be found in works such as The Hidden Origins of Islam. This school of scholars contends that much of Islam’s early history, basically before 700 A.D., is myth-making that dates from the Abbasid period (>750 AD). An analogy here might be made to Republican Rome. The city emerges prominently in history only in the 3rd century B.C., so much of centuries of Roman history which are referred to by later writers are difficult to corroborate. Presumably many of the figures of these earlier periods, such as Cincinnatus, may have been historical, but more often than not it is likely that details of their life served as moral exemplars for republican political leaders.
Similarly, a basic thrust of the revisionists in relation to Islam is that the idea of Muhammad is far more important than the details of who he really might have been. Even the milieu of Muhammad, a desert merchant, may have been manufactured to give him a particular aura. To reduce one line of scholarship to its essence: Islam emerged, decades after the time of Muhammad, as a national religion of Christian Arabs who had long been on the margins of the Roman and Persian worlds. The construction of the Muhammad myth, and the relocation of sacred sites to an area far outside Roman control and influence (Mecca & Medina), may have been motivated by considerations of distancing from the Greco-Roman and Persian cultural traditions which they were attempting to absorb and supersede.
One aspect of the mythos of Muhammad is that he grew up as a primal monotheist in a pagan land. The revisionists reject this, and suggest that Muhammad was a Christian, in an Arabia where Christianity and Judaism were the dominant elite religions. No doubt there were other religious sects, and the influence of Zoroastrianism was also likely, but organized paganism as depicted in Mecca may have been a propaganda device. There are precedents for this line of thought; some scholars have argued the same for the late survival of paganism in Sweden (in comparison to Denmark and Norway), suggesting that in fact it was a scurrilous attempt by Western Christians to besmirch Eastern Orthodox believers, who were much more numerous in this region of Scandinavia.
I don’t personally take a strong position here. It seems likely that the revisionists go too far, but I do think that a quasi-state paganism in Arabia in the year 600 A.D. is implausible in light of what we know about other regions of the world on the Roman frontier. The dominant form of religion in Muhammad’s world was probably Christianity, with roles for Judaism, Zoroastrianism, and various gnostic cults. Pagans still remained, but they were likely a marginal residual, not a threatening elite force as depicted in Islamic tradition.
So, with all this historical context in place, it has come to my attention that there are some peculiarities in the paternal lineage of descendants of the clade L859+, the dominant haplotype among the Quraysh, Muhammad’s tribe. This lineage, L859+, is a clade within haplogroup J1, which includes the famous Cohen modal haplogroup. On the L859+ tree above you see that the Qurayshis are a brother clade to ZS22012. This is traditionally a Jewish lineage. None of this “proves” anything, but it’s interesting and suggestive. If the revisionists are right, and Muhammad grew up in a world dominated by Jews and Christians, it would not be implausible if he himself was of Jewish background in some fashion. Or, that Arab Jews and Arab Christians had a fluid and permeable cultural relationship, and both interacted with the large Jewish community of the Middle East of the period, where some Arab Christians descended from Jews.
Facts are important. But they can be inconvenient. Despite the stream of “think” pieces about “hookup culture” over the past decade there is no evidence that young people today are more promiscuous than in the past. In fact, on the contrary. Young people today are by most measures less promiscuous than past post-WW2 generations, in particular, Baby Boomers. Those articles ultimately are not about the behavior of young people, but the fears, dreams, and nightmares, of a declining Baby Boomer cohort which refuses to go into the sunset quietly. I’m not a Boomer, so I won’t psychoanalyze their motives, but like literature the facts proffered in these essays are a means toward probing deeper issues and questions about the human condition, their generation’s condition and preoccupations, as opposed to being literally true (some of the more recent articles will even admit that the statistical evidence falsifies their premise, but then proceed to suggest there are anecdotal data that lend credence to their premise!).
This applies to other things. Today Quartz put up a piece, If Asian Americans saw white Americans the way white Americans see black Americans, which is not really about Asian Americans at all, but simply uses them as a prop, often in a mendacious manner. First, it gives a nod to the Asian American “Model Minority Myth,” stating that there is “perception that they are high achievers relative to other American ethnic groups.” Get it? There’s a perception. There’s a myth in some scholarly and political quarters that the model minority idea is a myth, founded mostly on assertion (e.g., just stating that it’s a false myth) and slicing and dicing the statistics to emphasize ways in which Asian Americans are disadvantaged in relation to non-Hispanic whites. For example, there is often a focus on the diversity among Asian Americans, ranging from affluent Indian Americans, to groups with more conventional socioeconomic profiles like Filipinos, and finally, those which are somewhat disadvantaged such as the Hmong. This is to show that Asian Americans are not a model minority…some of them are struggling. But the logic is not applied to whites! Those who purport to debunk the myth of the model minority would not accede to debunking the idea of white privilege by pointing to the state of Appalachia, and rural white America more generally. Group averages for we, but not thee?
And yet the Quartz piece engages in some interesting jujitsu by actually reporting the statistics of Asian American advantage vis-a-vis white Americans in the service of a broader agenda of putting whites in their place in relation to their critiques of black Americans. In particular it quotes Anil Dash as saying “If Asian Americans talked about white Americans the way whites talk about black folks, they’d bring back the Exclusion Act.”
This to me is really bizarre, and why I term the piece mendacious: Asian Americans do talk about white Americans the way whites talk about black folks. This sort of thing was a clear subtext of Amy Chua’s Battle Hymn of the Tiger Mother. Many (most?) Asian American kids who grew up with immigrant parents were barraged with assertions about the disreputable character of their “American” (white) friends, and how it was important to keep on the straight & narrow. Immigrants from Asia often perceive white Americans to be sexually obsessed, lazy, and prone to a general amorality and fixation on short term hedonic interests. These are polite ways to condense the sort of attitude many Asian immigrants have toward the white American mainstream, which they worry will absorb and corrupt their children. Dash must know this, as he probably had immigrant parents, or was friends with people from immigrant backgrounds. Most white Americans don’t know this, partly because most white Americans don’t have non-white friends. But anyone from an Asian American background would be aware of the stereotypes and perceptions.
The tacit misrepresentation of Asian Americans here, not acknowledging that they do engage in the exact sort of behavior you are hypothetically positing they might engage in and so alienate white people, is not surprising. Asian Americans are often simply bit characters in a drama involving broader social and political streams which dominate the political landscape. For many decades conservatives asserted that Asian Americans were “natural Republicans,” and expressed confusion as to why more were not voting for their party. But this was an empty talking point; over the past generation the Republican party has become the de facto white Christian party, and many Asian Americans are not Christian, and all are not white. Some conservative Christian Asian Americans can identify with Republicans because of their religious ties, but socially conservative Indian Americans, to give one example, naturally have a difficult time identifying with a party which wears evangelical Protestantism on its sleeve as modern Republicans often do. This isn’t rocket science.
On the flip side of this, many liberals erase Asian Americans from the landscape of our culture if it does not serve their framework of white privilege uber alles. When it came out many people pointed me to The New York Times infographic, Money, Race and Success: How Your School District Compares. The only mention of Asians is this: “Reliable estimates were not available for Asian-Americans.” But my wife pointed out to me that within the chart itself you still had Asian Americans tabulated! If you check the bubble plots at the top right, you see schools like Cupertino Union. It’s 73% Asian American. If you read this blog you know that it irritates me that Asian Americans are routinely elided out of these stories. It’s too regular to be due to a lack of data. It’s because it doesn’t fit the narrative of white privilege and domination. So Asian Americans are skipped over to make the picture neat and tidy.
Instead of taking reality as it is, in all its complexity and nuance, people attempt to fit the data into a narrative straitjacket. Complexity is a talking point only when confronted with a hypothesis you disagree with. When the data does not cooperate in a simple fashion with your own model, the data conveniently goes unmentioned. In a putatively multicultural America the dominant narrative on the Left side of the cultural and political spectrum is that of a dichotomy between whites, who have privilege, and non-whites, who are oppressed.
The black American template, unique, and rooted deeply in the soil of this country, is injected into strange and inappropriate contexts when it comes to people whose ancestors are from Latin America and Asia. White liberals and minorities are assumed to naturally form an alliance against the majority white rump; white liberals because of their moral virtue, and minorities because of their interests. The injustices experienced by someone with a name like Raheem Washington, who grew up in the inner city, are rather easy to enumerate. Raheem Washington begins life with some disadvantages. But there is a particular mainstream narrative where someone with a name like Deepa Iyer (Update: When I wrote this post I actually didn’t know who “Deepa Iyer” was, I just thought up a plausible name! Turns out there really is a Deepa Iyer of some prominence!!!), who might have an elite education, affluent parents, and a good secure career, has more in common with Raheem Washington than with their white colleagues at the university that they might work at. And of course, there is the further aspect that often goes unmentioned that someone with the name Iyer is from the top echelons of South Asia’s caste system, and so benefits from thousands of years of privilege! And yet it is common among Indian Americans for literal Brahmins to style themselves PoC tribunes of the plebs, oppressed by white America.
A genuine multiculturalism would actually acknowledge the real empirical texture of this nation’s changing demographics. And, a genuine multiculturalism rooted in fact, rather than vacuous critical theory, would dig deep into the richness of human history, rather than outlining broad sketches where white privilege reigns supreme from Sumer to America. As it is, often liberal multiculturalism is simply an inversion of white supremacist theory. That’s unfortunate, because there are real political debates and values divergences which we can grapple with and debate as a society, but the water is immediately muddied when the facts are subordinate to an ideological narrative. No side really wishes to live in the reality we are given, instead of their imagining.
* Many of the things I said above can be generalized to the American Right as well, though the particularities will differ.
** I shouldn’t have to say this, but any racist comment isn’t going to be published. That’s not going to stop some of you, but I thought I’d give you fair warning.
I’ve joked on Twitter that one aim of conservatives should be to defund disciplines whose avowed goals are to espouse a particular ideological viewpoint. Of course “scholars” in those disciplines might dispute the characterization of their chosen fields in such a manner, but the reality is that that’s how they roll. Conservative or moderate viewpoints are considered illegitimate and not worthy of consideration in many of these departments and disciplines. The political spectrum goes from mainstream liberals on the Right to Marxists on the Left. There is no reason that the “master” should be paying for someone to burn down his house.
Of course these viewpoints are concentrated in the “studies,” which is ironic as many of the scholars in this field don’t study much, as opposed to being activists and ideologues espousing their views at length. Traditional humanities and philosophy are relatively sane compared to Women’s or Ethnic Studies, but I see where Rod Dreher’s reader, a professor in STEM, is coming from when he suggests that “Why Not Close Humanities Departments?”
The findings have proved divisive. Some researchers hope that the work will aid studies of biology, medicine and social policy, but others say that the emphasis on genetics obscures factors that have a much larger impact on individual attainment, such as health, parenting and quality of schooling.
“Policymakers and funders should pull the plug on this sort of work,” said anthropologist Anne Buchanan and genetic anthropologist Kenneth Weiss at Pennsylvania State University in University Park in a statement to Nature. “We gain little that is useful in our understanding of this sort of trait by a massively large genetic approach in normal individuals.”
Buchanan and Weiss are smart. Money is what fuels research, and without that oxygen further studies may not be possible. At least in the short term. Whole genome sequencing will become ubiquitous soon, so understanding these patterns is going to be a matter of joining a few tables somewhere. Imagine a future where Facebook has your genome as part of your profile; they could glean a lot about human behavior genomics simply by combining genetic states with online browsing and engagement patterns.
No time to comment. Yes, the hits with SNPs are cool. But look at all the functional associations and analysis in this paper! Some serious biology in this. The figure from the paper to the left shows how the genes associated with these SNP hits are expressed in different tissue types and organs. These are the biggest effect SNPs for years of education in the genome, so it makes sense that they’d be way over-expressed in the brain. It is definitely more convincing to those who might be skeptical a priori than some statistically robust associations (well, it should be more convincing at least).
Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals [1]. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample [1, 2] of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.
The purpose of this two-week workshop is to introduce graduate students and beginning faculty in economics, sociology, psychology, statistics, genetics, and other disciplines to the methods of social-science genomics—the analysis of genomic data in social science research. The program will include interpretation and estimation of different concepts of heritability; the biology of genetic inheritance, gene expression, and epigenetics; design and analysis of genetic-association studies; analysis of gene-gene and gene-environment interactions; estimation and use of polygenic scores; as well as applications of genomic data in the social sciences.
Went to see Captain America: Civil War yesterday at the Alamo Drafthouse. I don’t watch many movies, and I’m not into comic books, but the Marvel film series is one I watch partly for cultural literacy (years ago I got tired of references to The Dark Knight, so I watched them just to get caught up). Also, Alamo Drafthouse really knows how to make a profit on films and distinguish themselves from Netflix in terms of what they offer. Tentpole films are still something you want to go to the movie theater for, but most of the time someone like me is not profitable for the establishment, because I avoid purchasing giant $4.00 Cokes at the concessions.
As far as the movie, it is hard from where I stand to side with anyone except Captain America, even though if you think about it there are many merits to the position of Tony Stark. I’d probably have been more persuaded by Stark’s consequentialism if it had been motivated by cold calculations, as opposed to an emotionally fraught interaction with someone negatively impacted by the Avengers.
One of the things that kind of annoys me about the Avengers films is how much hand to hand combat there is, and the lack of acknowledgement of how fragile the human body can be. Recently I got into a bike crash. I’m fine, but I had a lot of bruising which is just healing. I can understand that Captain America can take the hits, but without the suit Tony Stark is a man just like us, while Black Widow at 5’3” putting the smackdown on so many people is kind of ridiculous. But then again, it’s just a movie, and one which had Ant-Man in it, so I’m not taking it too seriously.
Struck by the importance of the ancient Near East in The Shape of Ancient Thought. The Axial Age in the middle of the first millennium B.C. resulted in an efflorescence of ideas which persist down to the modern age. These ideas were distributed across the length of the Old World oikoumene, from Greece to China. The Shape of Ancient Thought is focused on two particular loci, Greece and India. But there is a repeated reference to the primacy of motifs and patterns which seem to have their ultimate roots in Egypt and Mesopotamia. The underlying idea here seems to be that Indian and Greek civilizations did not emerge de novo, but rather hoisted themselves up on the shoulders of the Bronze Age (the fact that both Indian and Greek writing systems derive from Near Eastern Semitic scripts illustrates this).
What Makes Texas Texas. Speaking of Texas, Uber and Lyft Are Leaving Austin After Losing Background Check Vote. Uber and Lyft are trying to muscle out and marginalize city government, but is regulating ride-sharing really going to be the issue on which defenders of government prerogative are going to stand? Was this the “progressive” thing to do? Apparently some people were frightened by the lack of background checks on Uber and Lyft drivers, and the possibility of sexual assault. Perhaps anyone who uses a public restroom should also get fingerprinted. There’s the principle that corporations shouldn’t dictate to the polity, but in the case of taxi services and local governments, there’s been decades of cozy collusion.
If you leave a comment which tries to hijack a thread into one of your pet issues, I won’t publish it.
I have long had contempt for the television show Game of Thrones. My contempt was couched in the language of sophistication. Television shows are not as rich in texture and narrative depth as books. In hindsight this seems to have been mostly snobbery. I don’t watch the show as I don’t have HBO, and I’m not invested in the serial in any deep way, but I am now paying attention to what’s going on since HBO has gone further than George R. R. Martin. And to be honest I am in the camp which believes there is a modest probability that HBO is the only way many of us will get satisfaction in relation to the finale of the series.
Also, it is interesting to see clips of flashbacks, such as a young Ned Stark confronting Ser Arthur Dayne. And of course it confirms and foreshadows the final working out of R + L = J. I remember back before the show having discussions on message boards around 2000, and it was pretty clear to everyone that R + L = J is the most parsimonious model. Not necessarily the right one, but it was always the one you were going to have to bet on. If Martin and the HBO show go in separate directions it would almost be cool. It isn’t as if fantasy and science fiction series haven’t done a bait-and-switch before; Brandon Sanderson did so in Mistborn and that’s how the Dune series ended (the books co-written by Kevin Anderson and Brian Herbert).
This compiled version of ADMIXTOOLS runs straight-out-of-the-box on Ubuntu.
It would be nice if those of you who speculate on genetics constantly in the comments actually bothered to know something about genetics. That way it would be easier to understand what you’re trying to say, instead of having to always decrypt your inchoate ramblings. For the purposes of this blog John Gillespie’s Population Genetics would probably suffice. If you are a little more ambitious there are used copies of Nielsen and Slatkin’s An Introduction to Population Genetics to be had for $40. That’s a lot better than $90 used copies of Principles of Population Genetics, and, it is focused on population genomics.
Selection is one of the major parameters which population geneticists investigate. The easiest way to investigate selection is to have omniscience as to the change in allele frequencies over time. If you are a Drosophila geneticist this is feasible, as you control the reproduction of your model organism in the lab. It is obviously much more difficult in natural populations (one reason that I think ecological genetics went into decline for a while is that it is just very hard). And in long-lived species like humans it is really not feasible to “track” change in allele frequencies in real time, as that would take centuries at the least.
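To put rough numbers on that time scale, here is a small sketch that counts how many generations a favored allele needs to sweep. It assumes the textbook deterministic haploid selection recursion with no drift; the function name, the selection coefficients, and the 1% to 99% endpoints are illustrative choices of mine, not values from any dataset.

```python
def generations_to_sweep(s, start=0.01, end=0.99):
    """Count generations for an allele with advantage s to go from start to end.

    Uses the standard deterministic haploid recursion p' = p(1 + s) / (1 + p s),
    ignoring drift, dominance, and changing environments.
    """
    if s <= 0:
        raise ValueError("s must be positive for the allele to increase")
    p = start
    generations = 0
    while p < end:
        p = p * (1 + s) / (1 + p * s)
        generations += 1
    return generations


if __name__ == "__main__":
    for s in (0.01, 0.05, 0.10):
        print(f"s={s}: {generations_to_sweep(s)} generations")
```

In this toy model a 1% advantage needs on the order of 900 generations to go from rare to nearly fixed. That is a manageable experiment for flies in a lab, but for humans it is far longer than anyone could watch directly.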
So for species like humans researchers have to resort to inferences from patterns of variation in the genome, which allow us to look back into the deep past. The inheritance pattern of Mendelian genetics is such that transmission of variants across the generations can be modeled, and processes such as rapid population growth or positive selection leave footprints in the genome long after they’ve done their job. So you can test for selection, or population expansion, or bottlenecks, just by looking at the patterns that you’d expect them to leave in their wake. The PSMC method famously infers demographic history of populations by examining variation within a single whole genome!
In regards to selection, which population geneticists are interested in because it is one of the preconditions for the evolutionary process of adaptation, there are many methods of inference from genetic and genomic data. Tajima’s D is an older method which compares different types of diversity across the genome, and is popular for those looking more at inter-specific differences. More recently haplotype based tests look for long segments of variants within the genome. EHH and iHS are probably two of the more popular versions of this. Haplotype based methods really didn’t become popular until the middle 2000s because they require a certain density of data which is really “post-genomic” era. Then you have the methods which look for frequency differences between populations, and compare them to the expectation based on patterns across the whole genome (e.g., PBS). Again, these require genome-wide data. More generally the popularity of site frequency based techniques relies on having enough data to actually produce a site frequency spectrum.
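For readers who want to see what “comparing different types of diversity” means in practice, here is a minimal sketch of Tajima’s D using the standard published constants. The input format (equal-length strings of 0 for ancestral and 1 for derived, one per sampled chromosome) and the function name are assumptions made for illustration; real pipelines work from VCFs and handle missing data.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length 0/1 haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined there.
        raise ValueError("need at least 4 haplotypes")
    n_sites = len(haplotypes[0])
    # Count derived alleles at each site and keep only segregating sites.
    counts = [sum(h[i] == "1" for h in haplotypes) for i in range(n_sites)]
    counts = [k for k in counts if 0 < k < n]
    S = len(counts)
    if S == 0:
        # D is undefined without variation; this sketch returns 0.0.
        return 0.0
    # pi: average number of pairwise differences between haplotypes.
    pi = sum(2 * k * (n - k) for k in counts) / (n * (n - 1))
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Compare pi with Watterson's estimate S / a1, scaled by the expected spread.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    print(tajimas_d(["1000", "0100", "0010", "0001"]))  # only rare variants
    print(tajimas_d(["1100", "1100", "0011", "0011"]))  # only common variants
```

An excess of rare variants pushes D negative, which is the pattern expected after a sweep or a population expansion, while an excess of intermediate-frequency variants pushes it positive.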
And just as these methods have needs in terms of the raw data necessary to produce viable statistics, they also exhibit different strengths and weaknesses. The haplotype based methods are good at detecting “hard sweeps,” that is, strong positive selection on a novel mutation emerging against the ancestral background. EHH picks up completed sweeps across populations. In contrast, iHS is better at obtaining traction at incomplete sweeps. Though they have good power to detect events on a human microevolutionary scale, think on the order of 10,000 years, they get fuzzy as one approaches the present. Specifically, when iHS detects older incomplete sweeps it may not tell you if the sweep is still occurring, but it probably is. Additionally, they’re not particularly good at picking up “soft sweeps,” where alleles long segregating within the population are driven up in frequency by selection, or polygenic selection where the impact of the coefficient is distributed across the genome.
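The core quantity behind these haplotype tests can be sketched in a few lines. This is a simplified, one-directional version of extended haplotype homozygosity, assuming phased haplotypes given as equal-length strings; the function name and input format are hypothetical, and it is not iHS itself, which goes on to compare the decay for ancestral versus derived core alleles and standardize the result.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_site, core_allele, distance):
    """Extended haplotype homozygosity to the right of a core site.

    Among haplotypes carrying core_allele at core_site, return the probability
    that two chosen at random are identical from the core out to
    core_site + distance.
    """
    carriers = [h for h in haplotypes if h[core_site] == core_allele]
    if len(carriers) < 2:
        raise ValueError("need at least two carriers of the core allele")
    end = core_site + distance
    if end >= len(carriers[0]):
        raise ValueError("distance runs past the end of the haplotypes")
    # Group carriers by their extended haplotype, then count identical pairs.
    groups = Counter(h[core_site:end + 1] for h in carriers)
    identical_pairs = sum(comb(size, 2) for size in groups.values())
    return identical_pairs / comb(len(carriers), 2)


if __name__ == "__main__":
    haps = ["11111", "11111", "11110", "10101", "00000", "01010"]
    for d in range(5):
        print(d, ehh(haps, 0, "1", d))
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break up the carriers’ shared background. An allele driven up in frequency recently keeps high EHH over a longer stretch than a neutral allele of the same frequency, which is why hard sweeps stand out and soft or polygenic sweeps do not.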
Finally, there have been attempts to detect selection using ancient DNA. This is a technique which takes a step toward omniscience; rather than inferring from extant variation one can track allele frequency change in “real time” through the record of the DNA. The problem of course is that sample sizes are finite and data quality is often hit and miss.
This is why the preprint Detection of human adaptation during the past 2,000 years, out of Jonathan Pritchard’s lab, has me so excited. Using the whole genome sequence data that has come online over the past few years at large sample sizes they manage to infer selection events over the past 2,000 years among the British! Here’s the abstract:
Detection of recent natural selection is a challenging problem in population genetics, as standard methods generally integrate over long timescales. Here we introduce the Singleton Density Score (SDS), a powerful measure to infer very recent changes in allele frequencies from contemporary genome sequences. When applied to data from the UK10K Project, SDS reflects allele frequency changes in the ancestors of modern Britons during the past 2,000 years. We see strong signals of selection at lactase and HLA, and in favor of blond hair and blue eyes. Turning to signals of polygenic adaptation we find, remarkably, that recent selection for increased height has driven allele frequency shifts across most of the genome. Moreover, we report suggestive new evidence for polygenic shifts affecting many other complex traits. Our results suggest that polygenic adaptation has played a pervasive role in shaping genotypic and phenotypic variation in modern humans.
The basic logic is not difficult to grasp. Derived alleles (the novel ones which mutated recently) subject to selection tend to alter their local genomic region in predictable ways. In particular, derived alleles subject to positive selection will exhibit shallower genealogies than ancestral neutral variants. Conventional neutral processes result in the birth of mutations and extinction of ancestral variants at regular intervals as modeled by the coalescent process. Some alleles will increase in frequency rapidly, and some more slowly, but it will be a random affair. In the figure above the dark branches are ancestral and red derived. The right panel shows that the coalescences of ancestral and derived lineages are regular and approximately the same in a neutral context (i.e., selection is not targeting the derived variant). In contrast, in the left panel you see that the derived variants have a much shallower coalescence, presumably because of rapid expansion in the population of alleles in the recent past back to a common ancestor.
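The link to singletons is that a shallow genealogy has short terminal branches, and short branches have had less time to pick up private mutations, so singletons should be sparser around a recently favored allele. The toy sketch below illustrates only that intuition. It is not the estimator from the preprint, which, as I understand it, fits a likelihood model to the singleton distances and standardizes the resulting score across the genome. The function names, the input format (one record per individual, giving the derived-allele count at the test SNP and the positions of that individual's singletons), and the use of a plain log ratio of mean distances are all assumptions made for illustration.

```python
import math
from bisect import bisect_left


def singleton_span(singletons, pos):
    """Distance between the nearest singletons on either side of pos, or None if one side is empty."""
    s = sorted(singletons)
    i = bisect_left(s, pos)
    if i == 0 or i == len(s):
        return None
    return s[i] - s[i - 1]


def toy_singleton_score(individuals, pos):
    """Log ratio of mean singleton spans, derived homozygotes over ancestral homozygotes.

    `individuals` is a list of (derived_allele_count, singleton_positions) pairs.
    Heterozygotes are ignored in this simplified version.
    """
    spans = {0: [], 2: []}
    for genotype, singletons in individuals:
        d = singleton_span(singletons, pos)
        if genotype in spans and d is not None:
            spans[genotype].append(d)
    if not spans[0] or not spans[2]:
        raise ValueError("need both homozygote classes with flanking singletons")

    def mean(xs):
        return sum(xs) / len(xs)

    # Wider spans mean sparser singletons, hence shorter terminal branches.
    return math.log(mean(spans[2]) / mean(spans[0]))


# Toy data around a test SNP at position 1000: derived homozygotes have sparser singletons.
sample = [
    (2, [0, 2000]),
    (2, [100, 2100]),
    (0, [800, 1300]),
    (0, [750, 1250]),
]
print(toy_singleton_score(sample, 1000))
```

A positive score in this toy version means singletons are sparser on the derived background, which is the direction expected if the derived allele rose in frequency recently.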
The SDS needs genome-wide data, as well as large sample sizes. In 2016 you have both, at least for some regions of the world and populations. Comparing SDS to haplotype-based methods they find that the biggest differences in selection in the latter are continental-scale; that is, between Europe and Africa. In contrast, SDS tends to zoom in on intra-European variation, because a ~2,000 year time scale is likely to be localized.
They found lots and lots of selection. The signals around LCT and MHC were not entirely surprising. LCT is almost a positive control for a test of selection. It’s pervasive in Europe, but it was only recently selected, and so there are still ancestral variants around (unlike SLC24A5 which went nearly to fixation in a literal sense). MHC has to do with immune response, and that’s always evolving.
Perhaps more interesting is that the authors detect continuous selection on height and pigmentation in their sample. Why height? I’ve been skeptical of some of the genetic arguments in Greg Clark’s A Farewell to Alms (and have told Greg so), but, recent selection for height does seem to align with his idea that the English were particularly wealthy and healthy over the past ~2,000 years. And, it also seems to support the suggestion of elite over-production, as presumably tall men would be more well represented among elites for both nutritional and genetic reasons.
The results for pigmentation are intriguing. Some of the older signals don’t show up (e.g., SLC24A5 and SLC45A2). They’re either fixed, or near fixed, so where are the old haplotypes going to be to compare to? But intriguingly the selection around KITLG and OCA2-HERC2 still seems to be occurring! Though the authors associate them with hair and eye color, the extreme tissue specific expression does not mean they have no effect on skin color. In the supplements they note that “In all 14 cases the derived allele is associated with either lighter pigmentation (i.e., lighter hair, skin, or eyes) or increased freckling.” Additionally, they state in the main text that “We speculate that recent selection in favor of blond hair and blue eyes may reflect sexual selection for these phenotypes in the ancestors of the British, as opposed to the longer-term trend toward lighter skin pigmentation in non-Africans, generally thought to have been driven by the need for Vitamin D production.”
At this point reader Sean will probably have a meltdown, and have to go to his natural reflex to core-dump everything on sexual selection he has taken in from Peter Frost for the 1000th time. If he doesn’t control his overwhelming sexually selected urge to repeat himself like a robot I’m going to ban him, as I don’t really want to re-read the same comment again. That being said, I don’t really know how seriously the authors take the idea that pigmentation is sexually selected….
I find Geoffrey Miller’s The Mating Mind interesting, but I’m mildly skeptical of the importance of sexual selection in recent human history (as opposed to earlier periods when broad human behaviors became fixed in our lineage). Often sexual selection crops up as a deus ex machina in these sorts of papers (I also don’t see enough variation in reproductive skew to make sexual selection plausible). The reason is simple. Geneticists are good at detecting selection occurring, but far less clear how and why selection is occurring. In this way LCT is an exception.
With all that said, this is an incredible paper. Because of the large genomic data sets in the United Kingdom the preprint focused on the British. But this is the sort of analysis that is going to expand to all populations in the near future. Genomics will be ubiquitous, as will the tools to make inferences about population history and dynamics.
In my last post I drilled down on just a few of the results in the paper The genetic history of Ice Age Europe (ungated). There are many results which I didn’t really explore, in particular, the finding that there seems to be a gradual decline in Neanderthal ancestry within European populations over time. That’s for a follow up.
In any case, it’s an interesting time to be alive and be interested in these topics.
The epoch in which we are situated lies between an age of ignorance, and one in which we will be overwhelmed by interpretations based on a surfeit of data. The whole genome of the Neanderthal was published in 2010. Today we have many more whole genomes, and probably on the order of 1,000 ancient genomes of varying quality in the pipeline (i.e., some of it is unpublished). Reconstructing the history of humanity from genetic data has transformed from inference from the tips of the phylogenetic tree, to the examining of points deep in the nodes.
This reminds me of an argument that was highlighted in The Monkey’s Voyage between cladists with a background in systematics and paleontologists. The way paleontologists understood evolutionary relationships was to examine the fossil record, and reconstruct trees with putative ancestral forms and their descendants. The cladists asserted that this method relied upon the incomplete and unreliable fossil record, and so was not nearly as powerful as simply looking at extant variation in a more rigorous manner. Though the added rigor of the cladists arguably transformed the field of phylogenetics, as I have suggested before the extremism of the cladists in dismissing whole domains of knowledge and alternative methods has not swept the field.
Personally, I think that is a good thing. But, some of the warnings of the cladists probably need to be considered when taking into account the new results from paleogenetics. The reality is that in many ways there is little difference in terms of the raw data which paleogenetics and paleontology based on fossils provides. For various technical reasons phylogenetic inference from whole genomes of DNA sequence can be much more powerful than analyzing, for example, the teeth of an ancient hominin. Those teeth would give you phylogenetic and functional information. But reconstruction is more robust when you have tens of thousands (or perhaps millions) of variations, which is what DNA gives you. Second, those markers can tell you a great deal more about a variety of functions than simply teeth (I am not denigrating teeth here, as they are very informative!).
One thing which is more and more clear as more data comes in is that the genetic architecture of pigmentation in modern Europeans is a product of the Holocene, and perhaps even the last 4,000 years. A more sensational way to state this is that the Nordic phenotype may not have been present in appreciable amounts in any population when the pyramids of Giza were being constructed! Of course, there is a major caveat here in that we know that light skin emerges with different genetic architectures, so ancient Pleistocene Europeans may simply have had a different arrangement of functional SNPs. The main caution on this caveat is that pigmentation is a trait that is very well characterized across mammals as a whole, so prediction is much less dodgy than in other traits. If we eventually get enough high quality genomes from Gravettian period Europeans, and they lack derived SNPs across ten major pigmentation genes, then we can be pretty certain that they were in the ancestral state.
Researchers are then literally putting flesh on ancient bones. And yet we still see what we see. Paleogenetics suffers from the same issue as paleontology: skewed sampling. Especially when sample sizes for certain periods and regions are small, our illumination can’t give us a sense of what we don’t know. The genetic history of Ice Age Europe gives us a picture of genetic turnover, but one in which the Goyet sample representative of early Aurignacians turns out to be the ancestor of a population which pops back up into prehistory (the Magdalenians) after a long 10,000 year Gravettian interlude. Unless they traveled through a wormhole, it seems clear that this lacuna is a consequence of patchy and biased sampling. As the number of DNA samples increases we’ll get a better sense of how patchy and biased the methods turn out to be, but we’ll never totally abolish this problem. I believe that the same fact explains why many papers see a resurgence of Mesolithic hunter-gatherer ancestry among Neolithic farmers; the former were always there, but they are not being sampled across much of the temporal transect due to spatial patchiness.
As we traverse this period between ignorance and the potential for knowledge, we can start forming conjectures as to the shape of the future. In the comments below Andrew Oh-Wilke suggests that ancient people traveled much further than we might have guessed. I think this is right. In The Monkey’s Voyage the author suggests that in biogeography there has been a move away from vicariance to a somewhat stochastic long distance dispersal model to explain variation. The vicariance model emphasized the emergence of geographical barriers due to geological processes, and the subsequent divergence between two populations due to reduced gene flow. The idea behind stochastic long distance dispersal is basically that a lot of the patterns are due to random freak events, such as a small group of Old World monkeys somehow making it across the Atlantic from Africa, and becoming the ancestors of the whole family of New World monkeys.
The vicariance model has less relevance for human prehistory, because in most cases we’re not talking about geological time scales. There are exceptions, after a fashion. Beringia and Sahul both harbored populations, and after the sea levels rose groups were isolated on opposite sides of the water barrier. But there was always some contact even after this, because humans can traverse water barriers. There is an analog to the vicariance model in historical population genetics, and that is the isolation-by-distance model of human genetic variation and diversity. The major example is in the 2005 paper Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. This model’s logic is sound. One imagines that humans start at a point in space, and expand outward through a demographic wave of advance, as groups disperse into territory inhabited by archaic hominins, or not inhabited by hominins at all (e.g., the Americas and Oceania). Because this results in a serial founder effect, you see a pattern where populations further away from Africa exhibit less genetic diversity. Additionally, genetic structure in humans can be conceived as dominated by geographic distance and exhibiting clinal variation.
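The arithmetic behind the serial founder effect is simple enough to sketch. Under the textbook result that one generation of random mating among N diploid individuals retains a fraction 1 - 1/(2N) of expected heterozygosity, a chain of founding events produces a geometric decline with each step away from the origin. The function below is an illustration only. It assumes every founding event is a single-generation bottleneck of the same size and ignores mutation, migration, and population recovery between steps; the function name and parameter values are hypothetical and are not estimates from the 2005 paper.

```python
def serial_founder_heterozygosity(h0, founder_size, n_steps):
    """Expected heterozygosity after 0..n_steps successive founder events.

    h0           -- heterozygosity in the source population
    founder_size -- diploid individuals in each founding group (assumed constant)
    n_steps      -- number of founding events along the expansion route
    """
    if founder_size < 1:
        raise ValueError("founder_size must be at least 1")
    retained = 1 - 1 / (2 * founder_size)  # fraction kept per founding event
    return [h0 * retained ** k for k in range(n_steps + 1)]


# Illustrative values only: diversity falls steadily with each step from the origin.
for step, h in enumerate(serial_founder_heterozygosity(0.30, 50, 10)):
    print(step, round(h, 4))
```

The smooth decline this produces is part of what makes the model attractive as a baseline: it needs only a starting diversity, a founder size, and a count of steps.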
What the paper Toward a new history and geography of human genes informed by ancient DNA did was show that the genetic data used to support the isolation-by-distance model of decay of genetic diversity did not have the power to truly show this was the correct model. What David Reich and Eske Willerslev (and others) have shown with ancient DNA (as well as novel methods in Reich’s case) is that 1) population turnover has been relatively common, and 2) most (all?) modern populations are best thought of as admixtures between ancient lineages, in many cases pulse admixtures that occurred rapidly. Like the vicariance model the isolation-by-distance model was boring and general. It was easy to model, and didn’t engage in special pleadings to historical contingency. In other words, it’s a perfect model to use as a null hypothesis. But that doesn’t mean that it’s correct, or, more accurately, that it captures most of the dynamics.
An example may suffice. Europe is the most well elucidated of the major regions of the world in terms of prehistory. The “standard model”, utilizing simple and generic population genetic demographic processes produces a nice and simple model to fit the data. ~50,000 years ago humans leave Africa, they settle the Middle East/Central Asia. ~40,000 years ago they arrive in Europe. ~10,000 years ago farmers arrive from the Middle East, and expand into Europe from the southeast, with their genetic signal diluting over time to the northwest.
Here is the model, sketchily, informed by ancient DNA. ~50,000 years ago humans leave Africa, and mix with a number of Neanderthals. ~40,000 years ago, they arrive in Europe. ~35-40,000 years ago the first modern Europeans are replaced by another population. This second population is culturally similar to the first, and contributes some (though small proportionally) ancestry to modern Europeans. It is replaced by another population, which does not contribute much to modern Europeans (Gravettians), though populations related to it do. It is replaced by a population related to the first Europeans with descendants (Magdalenians, who are descended in part from Aurignacians, and do not share much drift with Gravettians). Then, the Magdalenians are replaced by Villabruna populations, the very late Paleolithic populations at the tail end of the Ice Age. The Villabruna have mixture from both the Near East, and to a lesser extent East Asia. Or, Villabruna populations were intrusive to the Near East, and possibly East Asia, or there were mediating populations between. It is all somewhat unclear. Then the Villabruna populations, which become Mesolithic hunter-gatherers, are overwhelmed by Near Eastern groups, which have very exotic ancestry unrelated to all other non-Africans (Basal Eurasian). Finally, the Neolithic groups are overwhelmed by populations from the steppe, who are themselves compounds of very distinct elements.
This is a difficult and historically contingent story. It is not neat, tidy, and is a dog of a model. It is not easy to generalize. But, it is probably a model which captures many more of the salient dynamics than the earlier one.
Going forward what generalizations can we take from this? Europe has been well elucidated for historically contingent and biogeographic reasons. But the rest of the world will come into the light of understanding in a similar fashion over the next ten years. One prediction I will make is that inter-group barriers were more powerful earlier in the human past than today, at least in terms of how they were relevant genetically. The emergence of meta-ethnic religions and fictive kinship may have paved the way for gene flow on a massive scale over the past 4,000 years. Additionally, human population density is such that the landscape of habitation is less patchy, and conventional continuous gene flow between adjacent populations is just more feasible. In prehistory human groups thin on the ground may have had to organize proactively to exchange mates, perhaps during gatherings which were culturally focused. This might imply that mate exchange was less a function of proximity than cultural affinity.
A pattern of turnovers that we see in Pleistocene Europeans aligns with the idea that socio-cultural boundaries were major fault-lines which were inimical to gene flow. Admixture between two groups in the recent past can occur when one collapses culturally, as occurred in the New World. But it also occurs as a matter of course through proximity, as is the case with the Hui in China. The balance of forces in the hunter-gatherer world may have been toward the former. Patchy sampling means we don’t know where the pre-Magdalenian and post-Aurignacian peoples were persisting over the 10,000 years of Gravettian domination, but they were there, biding their time. Any modern understanding of 10,000 years would lead us to expect massive mixing and gene flow, but that did not seem to occur (some did, but look at the admixture graphs and the Magdalenians are >50% Aurignacian, while the Gravettians are ~0%).
Second, the turnovers probably were partly due to ecological forces. At this stage in history humans were animals whose existence was strongly conditioned on natural vicissitudes. Small numbers of people may easily have gone extinct because of diminished opportunities, and drifted below sustainable levels. Particularly if they weren’t part of a broader network of redundant support, which seems unlikely to have been the case. Agricultural populations still retain a reservoir numerically even after famine. Hunter-gatherers may not have.
Finally, Europe may be a special case because it is on the frontier of habitation during a phase of glaciation, but it is unlikely to be totally sui generis. The branches of the human phylogenetic tree seem to be pruned rather regularly. The genetic history of other parts of the world is likely to exhibit the same pattern of turnover, and relatively recent roots for the demographically dominant group.