I haven’t talked about Warriors of the Cloisters: The Central Asian Origins of Science in the Medieval World much since I read it, and haven’t had time to blog it. I don’t really accept the thesis in the subtitle, but it’s a really good work which illustrates the importance of what some would term “cultural appropriation.” The “recursive argument method” seems common sense today, in that it is distilled through the blood of scientific discourse, but it does in hindsight seem like a very important invention. Anyway, I would recommend the book.
I’ve been doing work with a firm, Embark, recently. Two of the co-founders are giving talks at South by Southwest on Tuesday morning (also, I guess this is my ‘disclosure’ of interests, which is feasible now that it’s coming out of ‘stealth mode’).
SJWs Will Elect Trump. It strikes me that Donald Trump is facing major structural headwinds in the general election. But aggressive behavior at his rallies aimed at shutting them down is going to help him, because the majority of people in this country (including me) don’t have much sympathy for the antics of campus activists and Left radicals.
In South Sudan, City of Hope Is Now City of Fear. I remember some friends on Facebook being very happy about South Sudan’s independence. Yes, the South Sudanese should have gained their independence, rather than remaining shackled to Sudan. But it seemed pollyannish to assume that this was not a nation that would have a long and rocky road ahead.
One out of five people in the world today is of the Han ethnicity, colloquially known as Chinese. Like the West, China has a long history, and its development can be traced, more or less, over the past 3,000 years. Because of the history of a system of taxation coordinated from the center we also know about aspects of its demographic expansion as a social, cultural, and biological entity from the North China plain south toward the edges of Southeast Asia (e.g., between the Tang and Song there was a shift in taxation from the northern provinces to the southern ones because of demographics). The Retreat of the Elephants: An Environmental History of China documents the movement out of the north, and eventually the shift of the center of Chinese civilization to an equipoise between the subtropical rice consuming south threading arable sections around rugged panoramas, and the old north, where a continental temperate climate characterized fields of millet and wheat and an open landscape. These environmentally contingent models of economic and agricultural production have even been used to infer broader social-cultural patterns which characterize Chinese civilization, such as a recent paper in Science, Large-Scale Psychological Differences Within China Explained by Rice Versus Wheat Agriculture.
But when you are focused on the genetic origins and distribution of Chinese populations, the answers are a bit different from the cultural history. In History and Geography of Human Genes L. L. Cavalli-Sforza reported that North and South Chinese were genetically very distinct; with the northern populations being closer to northern Northeast Asians and the southern ones closer to Southeast Asians than either were to each other. He was wrong. Genome-wide analyses make it clear that Chinese populations exhibit relatively little intra-ethnic variation, though the southern groups are closer to Southeast Asians, in particular Tai and Vietnamese, and the northern Chinese are similar to Koreans and other Northeast Asians.
To get a sense of this, I plotted some East Asian HGDP groups with 1000 Genomes Chinese on Pcaso. You can manipulate and examine the PCs yourself. What you see is that Southern Chinese are very distinct from the HGDP samples from northern China. The individuals from Beijing span the whole range of Han variation, probably because Beijing is a cosmopolitan city. Across PC 1 the South Chinese are clearly positioned between the North Chinese and Tai and Vietnamese. From this can we conclude that the South Chinese emerge from an admixture event between migrants from the north and indigenous peoples? Not necessarily. Or at least there may be more to the story than a PCA can tell us.
I ran TreeMix 10 times, and the graph to the left is pretty representative (I rooted with Cambodia and removed some of the groups you can see in the PC). You can view all the other plots in Dropbox. These graphs do seem to suggest that the South Chinese population has received substantial admixture from an indigenous Southeast Asian population. What I’m curious about though is the relationship of central Chinese ethnic minorities like the She people to the Han majority. On the PC plots the She and Southern Chinese are basically in the same position. But not so in TreeMix, where the long branch out toward the She tip indicates some sort of bottleneck or lower effective population size. In addition, the Southern Chinese are near the She, but the gene flow is moving from a Tai or Vietnamese group on TreeMix. Why?
One model which we can’t necessarily reject at this point without further investigation is that, like the Hui, the ethnic minorities across China resemble nearby Han because of gene flow from the Han. Another model is that the Han absorbed in totality indigenous groups very different from the ones which were, and are, resident in the rugged hinterlands, and are today national minorities. Finally, there is the possibility that the North Chinese themselves are complex mixes due to intrusion of Turkic groups between the Han and Sui-Tang, and later back-migration from Central China as the empire expanded in comparison to barbarian groups.
Finally, the genetic homogeneity of Han and many of their national minorities (the Fst values are invariably small) suggests to me that all underwent agricultural expansion during the Holocene, but there was a second stage where the proto-Han marginalized the other groups to become so numerically preponderant. This explains the recent coalescence of ancestries across many of these populations, and the weak genetic differentiation between the Han and minorities.
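For readers who have not run into Fst before, here is a minimal sketch of what a "small" value means. It computes Wright's Fst for a single biallelic SNP in two equally weighted populations as (H_T - H_S) / H_T. The function name and the example allele frequencies are my own illustrative assumptions, not values from the Han or minority samples discussed above, and real analyses use estimators that average over many SNPs and correct for sample size.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst for one biallelic SNP, two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Average expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_total - h_within) / h_total


# Hypothetical frequencies: a modest frequency difference gives a small Fst.
print(fst_two_pops(0.45, 0.55))
# Fixed differences between populations give the maximum value.
print(fst_two_pops(0.0, 1.0))
```

The point of the toy numbers is only that populations can differ noticeably in allele frequency at a locus and still show very little differentiation by this measure.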
The figure above popped up on Twitter to show that even within a socialized medical system, in this case in the United Kingdom and its NHS, ethnic differences in infant mortality remain. But what jumped out at me immediately was the high rate for infants whose mothers were born in Pakistan, as opposed to India and Bangladesh. While the Indians are a relatively middle class community (and a diverse one at that, with a large Punjabi Sikh minority and a secondary migrant population of East African Indian origin), the Bangladeshis are even poorer than the Pakistanis, in part because they are a predominantly immigrant population (the majority of the Pakistanis in Britain today are not immigrants). In light of other data I’ve seen my immediate thought is that the elevated infant mortality rates among the Pakistani Briton children had to be due to inbreeding because of the more common practice of cousin marriage in this community.
A little searching resulted in finding the original source of the figure, Towards an understanding of variations in infant mortality rates between different ethnic groups in England and Wales. In it there is another figure, and it is clear that the elevated infant mortality among Pakistanis can be attributed to congenital defects, which are almost certainly generally of the recessive variety which get exposed due to cousin marriage. Interestingly, and unfortunately, the Pakistani infant mortality rate is also about double the Bangladeshi rate (though the base rate is much higher as these are developing nations). I assume many would superficially attribute this to greater penetration of NGOs and efficacy of development aid in Bangladesh, but it may just be a function of the difference in inbreeding (India’s rate is somewhat higher than Bangladesh’s, but much lower than Pakistan’s).
Concern about the risks to children from first-cousin marriage has been described as the last great taboo.
Former environment minister Phil Woolas was rebuked by Downing Street in 2008 for saying British Pakistanis are fuelling rates of birth defects by marrying their cousins, with the spokesman for then prime minister Gordon Brown saying the issue was not one for ministers to comment on.
Mohammed Saleem Khan, chief executive of the Bradford Council for Mosques, said: ‘It is important to discuss these issues, but I just do not know of any firm evidence backing up Professor Jones’s claims. I think we need more conclusive studies so we can know for certain if there is any genuine risk.
‘Marriages between cousins is certainly common within south Asia, but it is becoming less so in Britain and also in Bradford. Islam allows you to marry anyone you want, so in many ways Islam promotes diversity.’
I suspect that Mohammed Saleem Khan is ignorant, but saying that there’s no evidence that inbreeding leads to elevated disease risk is classic “denialism.” There’s a whole section on inbreeding in Principles of Population Genetics, the canonical textbook in that field (actually, any text on population genetics has to tackle inbreeding since it is a deviation away from HWE random mating). There’s even a classical equation which predicts the proportion of a recessive disease that is likely to be accounted for by the offspring of first cousin marriages within the population (the rarer the condition, the more likely inbreeding is the culprit, since rare alleles are more likely to be brought together by marriage between relations than non-relations). So we actually know the outcomes of inbreeding scientifically to a first approximation. Whether we choose to do anything in terms of public health is a separate question (there is a ~5 IQ decrease from expectation for the products of first cousin marriage, or about 1/3 of a standard deviation).
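To make the logic of that classical equation concrete, here is a small sketch of the textbook relationship as I understand it. It assumes a single recessive disease allele at frequency q, Hardy-Weinberg proportions among the children of unrelated parents (q^2 affected), an inbreeding coefficient F = 1/16 for the children of first cousins (q^2 + F·p·q affected), and a fraction c of all marriages being between first cousins. The function name and the numbers plugged in are illustrative assumptions, not estimates for any real population.

```python
def cousin_share_of_recessive_cases(c, q, F=1.0 / 16.0):
    """Fraction of affected individuals whose parents are first cousins.

    c: fraction of marriages that are between first cousins (assumed)
    q: frequency of the recessive disease allele (assumed)
    F: inbreeding coefficient of the offspring (1/16 for first cousins)
    """
    p = 1.0 - q
    risk_cousin = q * q + F * p * q   # genotype aa frequency under inbreeding
    risk_random = q * q               # Hardy-Weinberg expectation
    cases_cousin = c * risk_cousin
    cases_random = (1.0 - c) * risk_random
    return cases_cousin / (cases_cousin + cases_random)


# Hypothetical scenario: 1% of marriages are between first cousins.
for q in (0.1, 0.01, 0.001):
    share = cousin_share_of_recessive_cases(c=0.01, q=q)
    print(f"q={q}: share of cases from cousin marriages = {share:.3f}")
```

Algebraically the ratio simplifies to c(1 + 15q) / (c + 16q - cq), which is why the share contributed by cousin marriages climbs as the allele gets rarer.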
At heart the issue is ultimately of collective social responsibility on a national level vs. individual choice & subcultural norms. Even with aggressive screening for deleterious alleles it seems unlikely that all of the fitness drag can viably be accounted for without massive preimplantation genetic diagnosis projects. A small number of first cousin marriages is something that society can easily handle in the developed world, but when inbreeding is ubiquitous, it can become the focus of public health, as it has in the Gulf countries, which combine high rates of consanguinity with extensive free health care. In other words, subcultural norms rather than individual choice are really the major dynamic to be worried about, since all things equal the preference for marrying your cousin is not that strong for individuals (to my knowledge Tinder does not have a “match cousins” option).
Of course it is easy to point fingers when something is not your cultural norm. In the developed West it is normative for educated middle class individuals to delay childbearing, often into one’s 30s (as I did). But, delaying childbearing does have some negative consequences, as we all know anecdotally and statistically. Submitted for your approval, Older fathers’ children have lower evolutionary fitness across four centuries and in four populations:
Higher paternal age at offspring conception increases de novo genetic mutations (Kong et al., 2012). Based on evolutionary genetic theory we predicted that the offspring of older fathers would be less likely to survive and reproduce, i.e. have lower fitness. In a sibling control study, we find clear support for negative paternal age effects on offspring survival, mating and reproductive success across four large populations with an aggregate N > 1.3 million in main analyses. Compared to a sibling born when the father was 10 years younger, individuals had 4-13% fewer surviving children in the four populations. Three populations were pre-industrial (1670-1850) Western populations and showed a pattern of paternal age effects across the offspring’s lifespan. In 20th-century Sweden, we found no negative paternal age effects on child survival or marriage odds. Effects survived tests for competing explanations, including maternal age and parental loss. To the extent that we succeeded in isolating a mutation-driven effect of paternal age, our results can be understood to show that de novo mutations reduce offspring fitness across populations and time. We can use this understanding to predict the effect of increasingly delayed reproduction on offspring genetic load, mortality and fertility.
The above is what I have flippantly referred to as “standard PSMC plot you always see” from the original paper which debuted the method. Basically PSMC uses the pattern of variation over a good quality whole genome to infer the population history of that individual’s genealogy. All of us unique snowflakes do after all reflect the combined histories of our ancestors.
But in the over 4 years since that paper was published science has moved on. Heng Li, the first author, even has a post up, Alternatives to PSMC, which states that:
PSMC is okay, but now there are better models and implementations at least in theory. MSMC, which has recently been published in Nature Genetics, not only extends PSMC to multiple haplotypes, but also improves PSMC for a diploid genome. It precalculates transition matrices over long runs of homozygosity and becomes fast enough to perform whole-genome inference without binning the input like PSMC. More importantly, for a diploid genome, MSMC implements the PSMC’ model. It is a better approximation to the coalescent-with-recombination model by allowing non-effective recombinations. It is able to give a much better estimate of the recombination rate. I was lazy when I was working on PSMC. I knew PSMC’ is better, but I skipped that because its derivation is more complicated and because PSMC worked well to infer other parameters.
Another important tool is dical by Kelly Harris et al. It also uses better model and has a time complexity linear in the number of states. This is a significant advantage over the PSMC implementation whose time complexity is quadratic in the number of states. Dical runs much faster.
In an ideal world you’d have awesome data and infinite computational power. But these Markovian coalescent models take a long time to run, and Stephan Schiffels has told me that you probably need 20× coverage to get MSMC to work for you. This is just too high a standard for many population genomic projects, though it is probably necessary for really good phasing using sequence data (conditional on genetic diversity, etc.).
Molecular data sampled from extant individuals contains considerable information about their demographic history. In particular, one classical question in population genetics is to reconstruct past population size changes from such data. Relating these changes to various climatic, geological or anthropogenic events allows characterizing the main factors driving genetic diversity and can have major outcomes for conservation. Until recently, mostly very simple histories, including one or two population size changes, could be estimated from genetic data. This has changed with the sequencing of entire genomes in many species, and several methods allow now inferring complex histories consisting of several tens of population size changes. However, analyzing entire genomes, while accounting for recombination, remains a statistical and numerical challenge. These methods, therefore, can only be applied to small samples with a few diploid genomes. We overcome this limitation by using an approximate estimation approach, where observed genomes are summarized using a small number of statistics related to allele frequencies and linkage disequilibrium. In contrast to previous approaches, we show that our method allows us to reconstruct also the most recent part (the last 100 generations) of the population size history. As an illustration, we apply it to large samples of whole-genome sequences in four cattle breeds.
So first, they focus on AFS and LD. Second, because the sample size is increased they catch more recent coalescent events, which is critical to obtain power to detect recent demographic events. If you are interested in population genomics, that is pretty essential. Within the paper they admit MSMC’s precision and utility in very specific ranges of time and instances, but suggest that their method’s ability to use larger sample sizes makes it practically more useful.
Finally, because they use unphased data it seems that you don’t need very high quality sequences. The cattle in their empirical data set were 13×, which is a reasonable coverage for whole genome results.
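If AFS and LD are just acronyms to you, here is a rough sketch of what those two kinds of summary statistics look like when computed from unphased genotypes. This is a generic illustration and not the paper's implementation: I assume a genotype matrix with individuals in rows, SNPs in columns, and entries 0/1/2 counting copies of the derived allele. The function names and the random toy data are mine.

```python
import numpy as np


def allele_frequency_spectrum(genotypes):
    """Unfolded AFS: entry k is the number of SNPs with k derived copies.

    genotypes: integer array of shape (individuals, SNPs) with values 0/1/2.
    """
    n_chrom = 2 * genotypes.shape[0]
    derived_counts = genotypes.sum(axis=0)
    return np.bincount(derived_counts, minlength=n_chrom + 1)


def genotype_r2(g1, g2):
    """Squared genotype correlation between two SNPs (needs no phasing)."""
    if g1.std() == 0 or g2.std() == 0:
        return float("nan")  # undefined when either SNP is monomorphic
    r = np.corrcoef(g1, g2)[0, 1]
    return r * r


# Toy data only: 20 diploid individuals, 50 unlinked random SNPs.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(20, 50))
print(allele_frequency_spectrum(geno))
print(genotype_r2(geno[:, 0], geno[:, 1]))
```

Neither statistic requires knowing which allele sits on which chromosome, which is the sense in which unphased data are sufficient for this kind of summary.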
You may know that there is a reproducibility crisis in psychology. Or is there? Wired and Slate both have pieces up reviewing the current debates on whether there is, or isn’t, a crisis. Perhaps the media is biased, but the behavior and explanations of those who assert there isn’t a crisis seem informative to me. From the Wired piece:
Emotions are running high. Two groups of very smart people are looking at the exact same data and coming to wildly different conclusions. Science hates that. This is how beleaguered Gilbert feels: When I asked if he thought his defensiveness might have colored his interpretation of this data, he hung up on me.
And now, from Slate:
In his lab, Baumeister told me, the letter e task would have been handled differently. First, he’d train his subjects to pick out all the words containing e, until that became an ingrained habit. Only then would he add the second rule, about ignoring words with e’s and nearby vowels. That version of the task requires much more self-control, he says.
Second, he’d have his subjects do the task with pen and paper, instead of on a computer. It might take more self-control, he suggested, to withhold a gross movement of the arm than to stifle a tap of the finger on a keyboard.
If the replication showed us anything, Baumeister says, it’s that the field has gotten hung up on computer-based investigations. “In the olden days there was a craft to running an experiment. You worked with people, and got them into the right psychological state and then measured the consequences. There’s a wish now to have everything be automated so it can be done quickly and easily online.” These days, he continues, there’s less and less actual behavior in the science of behavior. “It’s just sitting at a computer and doing readings.”
Of course even those who accept and promote a replication crisis often feel their own unreplicated work is an exception to the rule. Basically what this is telling us is that psychologists are subject to the sorts of cognitive biases they themselves study. Some researchers though seem to be facing the problems head-on, Reckoning with the past:
To be fair, this is not social psychology’s problem alone. Many other allied areas in psychology might be similarly fraught and I look forward to these other areas scrutinizing their own work—areas like developmental, clinical, industrial/organizational, consumer behavior, organizational behavior, and so on, need an RPP project or Many Labs of their own. Other areas of science face similar problems too.
During my dark moments, I feel like social psychology needs a redo, a fresh start. Where to begin, though? What am I mostly certain about and where can my skepticism end? I feel like there are legitimate things we have learned, but how do we separate wheat from chaff? Do we need to go back and meticulously replicate everything in the past? Or do we use those bias tests Joe Hilgard is so sick and tired of to point us in the right direction? What should I stop teaching to my undergraduates? I don’t have answers to any of these questions.
This blogpost is not going to end on a sunny note. Our problems are real and they run deep. Okay, I do have some hope: I legitimately think our problems are solvable. I think the calls for more statistical power, greater transparency surrounding null results, and more confirmatory studies can save us. What is not helping is the lack of acknowledgement about the severity of our problems. What is not helping is a reluctance to dig into our past and ask what needs revisiting.
From what I have heard psychological experiments are relatively cheap. Replication is feasible, even if it’s not glamorous. In contrast in biomedicine replication is more expensive. There might therefore be bigger problems lurking out there….
“The root of this paradigm comes from the era of Victorian Imperialism in which manly vigor and scientific discovery provided the dominant way of both understanding and dominating foreign spaces,” Rushing said. “This results in a total lack of consideration of alternative ways of understanding glacial ice, which is especially troubling in the current age of rapid melt.”
…
“We do a lot of modeling and study satellite images, but what if we look at literature, at art, at drawings and recordings of glaciers?” Carey said. “We need to be looking at the cultural lenses on how people describe and talk about their landscape.”
I know that 1984 is a commentary on Stalin’s purges. And it also prefigures what happened in China later on. But its general commentary on human psychology was prescient. I was recently talking to a friend who is a pretty conventional liberal American (Sanders supporter, but OK with supporting Clinton in the general). We were talking about Donald Trump’s appeal to many people. On the specific issue about banning Muslims I think he has tapped into a broad vein of American opinion which observes rightly that Muslim majority nations are very illiberal. In response to this, my friend looked behind her (we were in my living room, and I don’t have a roommate), and whispered under her breath “Well, Muhammed was a pretty bad guy, so what do you expect?” My point is that it’s as if there are Telescreens monitoring people, even though they aren’t.
Many people have “taboo” thoughts all the time, but are aware that we live in a social environment where we can’t express them (to be fair, this is a pretty universal human norm). There is a laundry list of things that are “not OK” for American liberals to believe, but they believe them often anyway. Similarly, there was a laundry list of positions which were supposed to be held by all conservatives by the conservative elites (e.g., extreme pro-Israel support without any qualification or moderation). Donald Trump has violated these norms, and lived to tell the tale. I wouldn’t be surprised if in the near future a Bernie Sanders type figure emerges who is able to overturn the power of the Democratic establishment. Social norms are strong and hard to change…until you change them. Things can happen fast.
The most state-of-the-art evolutionary genetics suggests that depigmentation in Europeans is a very recent feature of human biogeographic variation (though, the same techniques tell us that Europeans as we understand them as a genetic cluster are also a very recent feature of human biogeographic variation!). A new paper in Nature Communications, A genome-wide association scan in admixed Latin Americans identifies loci influencing facial and scalp hair features, suggests that the same is true of straight hair. That is, straight hair is a derived characteristic which emerged independently several times outside of Africa, with curly hair being the ancestral state. This is an open access paper, so I invite you to peruse the list of SNPs. I checked the markers in my own pedigree and we’re surprisingly monomorphic, as in no variation (also, many of them are not in the SNP-chips, but this is where DNA Land can help by imputing a VCF).
This gets at what strikes me as one of the major findings in this paper. Researchers have long known that there are large effect variants that impact hair color; it segregates within families in a quasi-Mendelian manner, albeit not as clearly as eye color. Though it is polygenic, big explainers of the variation like EDAR above lurk in the genome. But, as highlighted in the paper there are more genes under heaven than EDAR, and even within EDAR there may be different SNPs under selection in different regions. This resembles another complex human trait subject to normal variation: skin color.
And yet there are complexities within complexities. True, the selective sweep we see in EDAR in East Asians and Amerindians is relatively recent. But it is almost certainly older than the Holocene, because most of the ancestors (>90%) of Amerindians diverged from the ancestors of modern East Asians ~15,000 years ago. Probably after the LGM ~18,000 years ago, but definitely before the end of the Ice Age. To my knowledge it is the same haplotype across the two regions, that is, a similar sequence of markers which are hallmarks of common genealogical descent from an original copy. Second, Mathieson et al. discovered several copies of this same EDAR haplotype in ancient Swedish hunter-gatherers who flourished ~8,000 years ago. If you check in modern Europeans this variant is totally absent from all Europeans except Finns, and the proportion of admixture is very easily explained by relatively recent Siberian ancestry equivalent to the fraction of the derived EDAR haplotype (very much the same can be said of South Asians, as the fraction in the Bangladeshi sample can be explained by the allele frequency of derived EDAR segregating in Southeast Asian populations). What I’m trying to get at is that the emergence of straight hair, and definitely very straight hair, is just not a good fit with the model of recent changes during the Holocene due to shifts in economic modes of production.
Additionally, if you look at the frequency distribution of the derived EDAR variation you note that it often does not fix. To me that is suspiciously reminiscent of some sort of balancing selection going on. Possibly frequency dependent, or, more likely in my opinion, the putative target of selection being dominantly expressed (since the phenotype which is selected against then has the frequency of a recessive trait, and as the q allele > 0.10 selection against it almost ceases). That means that it is highly unlikely to be hair form itself which is under selection, as opposed to that trait just being swept along for the ride because of pleiotropy. EDAR has many effects. No one knows which one, or ones, may have been the target of selection.
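The general reason a dominantly expressed target of selection can stall short of fixation is that the disfavored phenotype only shows up in homozygotes for the other allele, and those become very scarce as that allele gets rare. Here is a minimal sketch of the standard one-locus model, assuming fitnesses of 1, 1 and 1 - s for the three genotypes, with q the frequency of the allele whose homozygote is selected against. The selection coefficient s = 0.05 is an arbitrary illustrative value, not an estimate for EDAR.

```python
def next_q(q, s):
    """Allele frequency after one generation of selection against aa.

    Genotype fitnesses are assumed to be AA = 1, Aa = 1, aa = 1 - s,
    with q the frequency of allele a and random mating.
    """
    p = 1.0 - q
    mean_fitness = 1.0 - s * q * q
    # Allele a is carried by Aa (fitness 1) and aa (fitness 1 - s).
    return (p * q + (1.0 - s) * q * q) / mean_fitness


def delta_q(q, s):
    """Per-generation change in q; equals -s*p*q^2 / (1 - s*q^2)."""
    return next_q(q, s) - q


# Hypothetical selection coefficient; compare the push at several frequencies.
s = 0.05
for q in (0.9, 0.5, 0.1, 0.01):
    print(f"q={q}: exposed homozygotes={q * q:.4f}, delta_q={delta_q(q, s):.6f}")
```

Because the change per generation scales with q squared, the push against the allele shrinks much faster than the allele frequency itself, so a sweep of the dominant variant slows to a crawl well before fixation.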
Which goes back to my original title. Is straight hair recent in evolutionary history? I’ll make a prediction that it isn’t. Rather, modifications in hair form have emerged repeatedly in hominin evolutionary history, often as a side effect of altering the functions of developmental genes such as EDAR. Many of the salient morphological characteristics of humans which we have co-opted for culturally easy identification in racial groups may have a similar origin. Basically, outputs from the G-matrix have become the basis of whole industries!
Addendum: This post is not an invitation for a particular reader to engage in a “core dump” on sexual selection.
The above faux O’Reilly edition really struck a nerve with me. Obviously software engineering on big projects continues apace. But for many quick & dirty tasks instead of laboriously (or frankly, not so laboriously) assembling together a script often a precise query into Stack Overflow suffices. It reminds me of the universe of David Brin’s Uplift saga.
In that universe there is a Galactic Library, which basically allows sentient species to not need to reinvent the wheel. Humans are unique because of their need to understand the technology that they use. At the time I found the premise interesting…but after reading Joe Henrich’s The Secret Of Our Success, I have come to think that the galactic civilization which Brin depicts is simply an extrapolation of our own technological world. Much of our life is a magic “turn-key” black-box. It is probably self-evident that average humans don’t know how computers or even automobiles work. But Henrich points out that even customs such as the manner in which indigenous American peoples detoxify cassava are “encapsulated” from conscious understanding. The “galactic library” is just a metaphor for the crutch that is social cognition.
Cultural encapsulation is not a bug, it’s a feature….