But soon after the study’s high-profile launch on 24 August, autistic people and some ASD researchers expressed concern that it had gone ahead without meaningfully consulting the autism community. Fears about the sharing of genetic data and an alleged failure to properly explain the benefits of the research have been raised by a group called Boycott Spectrum 10K, which is led by autistic people. The group plans to protest outside the ARC premises in Cambridge in October. A separate petition against the study gathered more than 5,000 signatures.
Damian Milton, a researcher in intellectual and developmental disabilities at the University of Kent in Canterbury, UK, is one of those who signed the Boycott Spectrum 10K petition. Milton has been diagnosed with Asperger’s syndrome, a form of ASD. He says it is not clear how the study will improve participants’ well-being, and its “aim seems to be more about collecting DNA samples and data sharing”.
As a result of the backlash, the Spectrum 10K team paused the study on 10 September, apologized for causing distress, and promised a deeper consultation with autistic people and their families.
I assume they’ll restart, but this sort of research will happen somewhere. Autism is a reasonably heritable trait, and many of the people with autism are not “high functioning.”
Inherited genetic factors can influence the severity of COVID-19, but the molecular explanation underpinning a genetic association is often unclear. Intracellular antiviral defenses can inhibit the replication of viruses and reduce disease severity. To better understand the antiviral defenses relevant to COVID-19, we used interferon-stimulated gene (ISG) expression screening to reveal that OAS1, through RNase L, potently inhibits SARS-CoV-2. We show that a common splice-acceptor SNP (Rs10774671) governs whether people express prenylated OAS1 isoforms that are membrane-associated and sense specific regions of SARS-CoV-2 RNAs, or only express cytosolic, nonprenylated OAS1 that does not efficiently detect SARS-CoV-2. Importantly, in hospitalized patients, expression of prenylated OAS1 was associated with protection from severe COVID-19, suggesting this antiviral defense is a major component of a protective antiviral response.
You can find the SNP in your 23andMe raw data (unless you are on the recent chip; I looked for a tag variant but found none). If I’m reading the paper correctly, having the AA genotype raises your odds of severe COVID-19 by a factor of 1.58 (the odds ratio), all else equal. Not crazy bad, but not great either. The haplotype that carries the G allele in non-Africans seems to come from Neanderthals. In Africa, the ancestral G is the majority allele, though a minority of individuals carry A, and that was passed on to Eurasians.
Here is a plot for the 1000 Genomes populations.
One thing I immediately noticed is that Peruvians have the highest frequency of the A allele in the dataset. Peru has had the highest COVID-19 death rate in the world, and its frequency of A means that a great number of people will be AA (the frequency of A squared).
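The “frequency of A squared” remark is the Hardy-Weinberg expectation for homozygotes. Here is a minimal sketch of that arithmetic, assuming random mating; the allele frequencies in the loop are hypothetical placeholders, not the actual 1000 Genomes values, and the function name is mine.

```python
def genotype_frequencies(p_a: float) -> dict:
    """Expected genotype frequencies under Hardy-Weinberg for an A allele at frequency p_a."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    p_g = 1.0 - p_a  # frequency of the alternative (G) allele
    return {"AA": p_a ** 2, "AG": 2 * p_a * p_g, "GG": p_g ** 2}


# Hypothetical A allele frequencies, for illustration only.
for p_a in (0.5, 0.7, 0.9):
    freqs = genotype_frequencies(p_a)
    print(f"p(A)={p_a:.1f}: AA={freqs['AA']:.2f} AG={freqs['AG']:.2f} GG={freqs['GG']:.2f}")
```

The point is simply that the AA fraction rises faster than the allele frequency itself, so a population with a high frequency of A ends up with most people being AA.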
I looked in Anders Bergstrom’s HGDP whole-genome data and found an interesting pattern in the frequencies of the G allele:
Three of the four populations with no copies of the protective G allele are indigenous to the Americas. The Maya, who are known to have European admixture, also have very low frequencies of the G allele. Now, it is true that East Asians also have low frequencies of the G allele (the Yakuts also lack it, so perhaps this was ancestral to Siberians?), but they may have other protective variants (or, suffered through an earlier coronavirus epidemic). I think OAS1 may turn out to be one of the loci that could be associated with a higher risk of severe COVID-19 in the New World.
Over on Twitter the eminent population geneticist Molly Przeworski has an important and lauded thread up:
The thread has been widely re-tweeted and quote-tweeted by biologists. This prompted a response by a prominent sociologist, who quoted this from Kathryn Paige Harden’s discussion with Sam Harris: What Harden is alluding to is that heritability within populations is not portable necessarily to between populations. In less sophisticated hands, this is almost used as an incantation. In my review of Harden’s book I said the following:
The biological reason that this extrapolation founders is that human populations differ, and those differences matter. The genetic architecture of intelligence may vary between populations so that predictions from the markers in one population are poorly predictive of variation in another, in line with the general concerns for GWAS portability…Harden points out correctly that population structure exhibits different layers of granularity and continuity. Perhaps a prediction trained on British samples is poorly predictive in Pakistanis. But what about Iranians? If it is poorly predictive in Iranians, what about in Bulgarians? The ability to infer within and between-group heritability is conditional on what you mean by “group,” and that is to some extent a subjective choice guided more by heuristics and instrumental utility than idealistic differences between races.
To be entirely frank I think Harden was on solid ground as a behavior geneticist with psychological training who relied on what population geneticists say publicly all the time about heritability and group differences. The issue is that I do not believe population geneticists were entirely candid about the deep texture of their assumptions, beliefs, and expectations. They wanted to be left alone to do their research, and so relied on a mantra to make people leave them alone, and now that mantra taken so literally is coming back to haunt them. One reason Przeworski’s thread got a lot of attention is that privately this is the sort of intuition and sense that’s widely understood, but the issues are subtle, so to outsiders people just leave it off with the quick quips about portability. A friend told me “Molly doing this is like a goddess descending to Earth to speak to mere mortals so it will get a lot of attention.”
The real issue though is that some are now rather perturbed that Harden and behavior geneticists are trying to shield their study of psychological trait heritability from charges of racism by separating the discussion of between and within-group differences by implicitly reifying “population.” Additionally, some geneticists are quite unhappy at the discussion of heritability when it comes to psychological characteristics, so what was a convenient mantra to have people leave them alone is now coming back to haunt them, as it’s opening up avenues for research that they’re not comfortable with, are not interested in, and believe are possibly dangerous. To be candid if I were Harden I’d be a bit peeved since all she’s doing is repeating what a lot of authorities in the field have been writing and saying for decades.
Nevertheless, if you take a look at the people who re-tweeted and commented on Przeworski’s thread it’s pretty much everyone. The high and mighty, all the way to the low. It was positively re-tweeted by people who are very skeptical of the study of heritability in psychological characteristics in humans (to be charitable). And, it was positively re-tweeted by me. Since so many people liked it and re-tweeted it, I can tell you it was re-tweeted by people who are actually quite open to and interested in the study of psychological characteristics in humans, within and between groups, without divulging confidence (I checked who commented and re-tweeted and liked).
So what’s going on? Przeworski’s group has published several papers in this area (for example, The evolution of group differences in changing environments), and one of the upshots for many is that there’s a lot less certainty about the heritability of many traits and its utility for polygenic risk scores even within groups because of uncorrected confounds. Some people took from this that polygenic risk scores are useless (not necessarily Przeworski and her group!). But when I talked about these findings with Amit Khera, who works on polygenic risk scores relating to cardiovascular disease, he was actually happy about these results. Why? Because he wanted to correct any confounds there were. He viewed these results not as a death knell for polygenic risk scores, but as a way to make them better, more accurate, more precise. He’s a medical doctor who is trying to help people in their health decisions. All he cares about is greater effectiveness. He’s not invested in a particular result, he’s invested in outcomes (OK, at least ideally, but I talked to him and his enthusiasm seemed genuine).
This is almost certainly why people who think polygenic risk scores are useful, and that heritability in psychological characteristics is real and varies widely in human populations, re-tweeted the Przeworski explainer. I myself did for this reason. My own current belief is there’s good evidence for heritability for a lot of behavioral traits, and that polygenic risk scores can be useful, at least on the margin. But we need to get better, and to do that, we need to explore all the subtle distinctions and details in relation to environmental and genetic variation. This is no guarantee. Perhaps the skeptics of polygenic risk scores will be correct (I doubt it, but who knows). But we’re not at the point where we can settle that question right now. More science needs to be done.
Finally, we need to address the magic of genes. People put a lot of stock in genes for various ideological reasons. But the reality is a lot of environmental factors taken for granted by many (e.g., shared home environment) are a lot less clear and well understood than genes are. And yet the skeptical takes don’t rain down on social science inferences and correlations. Mostly because they’re not seen as insidious because they’re environmental. But causes are causes. When there is a great deal of environmental variation in an outcome that doesn’t mean that you can control it, or you even know what it is. A lot of what is in the “E” in the ACE model is mysterious. Many focus on genes because they’re clear and distinct.
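One way to see why the “E” is mysterious: in the classical twin-design approximation to the ACE model (Falconer’s formulas), E is whatever is left over after identical twins fail to correlate perfectly, so it is defined by subtraction rather than measured directly. A minimal sketch, where the twin correlations are hypothetical numbers I made up for illustration:

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Rough ACE variance components from MZ and DZ twin correlations (Falconer's formulas)."""
    a = 2 * (r_mz - r_dz)  # A: additive genetic component
    c = 2 * r_dz - r_mz    # C: shared (family) environment
    e = 1 - r_mz           # E: non-shared environment plus measurement error, a pure residual
    return {"A": a, "C": c, "E": e}


# Hypothetical twin correlations, not taken from any study.
components = falconer_ace(r_mz=0.6, r_dz=0.4)
print({name: round(value, 2) for name, value in components.items()})
```

Nothing in that last line of the function tells you what the environmental causes actually are, or whether you could change them.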
A few weeks ago I saw the Y chromosomal haplogroup distribution in Finland and Sweden. I’d known about this disjunction for a while, but it really struck me. I got the numbers above from Eupedia, but you can find them elsewhere. Most of you probably know that Finland has a high fraction of N (they keep changing the nomenclature, so I’ll leave the number off). What’s curious to me is how low the fraction of N in the rest of Scandinavia is. Much of the N we see in Sweden may even be historical era migration of Finns into Sweden when the two nations were in political union (Finland was basically a Swedish colony). Another notable fact is that N is very common among Baltic people, whether Finnic in language (Estonian) or Indo-European (Latvian and Lithuanian).
Another strange thing is that while the Indo-European lineages of R1 are both at very low frequency in Finland, I1, which is common to the west in Scandinavia, is not. The latest ancient DNA makes it clear that Finnic languages seem to have arrived in the Baltic in the period between 1000 and 500 BC. Before then Corded Ware/Battle Axe people seem to have been dominant in the East Baltic. These people usually carried Y chromosome R1a.
The fact that N is so high in the Baltic nations shows that newcomers arrived, and in the northern region language shift happened, but in the south, it did not. Meanwhile, further north in Finland almost all the R1a lineages disappeared. Not so with I1. There are all sorts of tortured explanations for this pattern, so I won’t offer one.
Genome-wide the Finns aren’t that different. The largest proportion of their ancestry is still Yamnaya/steppe:
I only post this to illustrate how strong “male-mediated” dynamics can be. The proportion of Siberian ancestry in Finns is rather low, but > 50% of their Y chromosomes are N. I think it is plausible that the massive reduction in R1 in Finland was partly due to climate change and a massive population collapse among the Battle Axe people of southern Finland, followed by the later arrival of Siberians.
Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genetic and questionnaire data from >4,000 British Pakistani individuals, mostly with roots in Azad Kashmir and Punjab. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low effective population sizes (Ne) over the last 50 generations, with some showing a decrease in Ne 15-20 generations ago that has resulted in extensive identity-by-descent sharing and increased homozygosity. Using new theory, we show that the footprint of regions of homozygosity in the two largest subgroups is about twice that expected naively based on the self-reported consanguinity rates and the inferred historical Ne trajectory. These results demonstrate the impact of the cultural practices of endogamy and consanguinity on population structure and genomic diversity in British Pakistanis, and have important implications for medical genetic studies.
None of this is entirely surprising. The media in the UK has written about recessive disease load because of cousin-marriage amongst Pakistani Britons. But there are also things in the preprint that need to be made explicit. The “biraderi” social system is apparently a paternal lineage system in the northwest of the Indian subcontinent which transcends religion (i.e., it is present across the border in Indian Punjab). These are “tribal” or “clan” societies in a way that is not present across much of the Indian subcontinent. For example, my family is from eastern Bengal. Before the partition between India and Pakistan, the far northwest and northeast of the subcontinent had the highest proportions of Muslims. But that did not mean that the two regions were culturally very similar, explaining in part the war in 1971 that resulted in Bangladesh. In Bangladesh, biraderi is not known, and the rates of cousin-marriage are much lower than in Pakistan.
One of the things I immediately noticed in the 1000 Genomes data is that Bangladeshis exhibit a lot less structure and stratification than Indians and the samples from Pakistani Punjab. In many ways, the patterns in the Bangladeshi genomes resemble the type of patterns in non-South Asian genomes: an outbreeding population without much internal structure.
This is not typical in South Asia. Rather, Indian populations tend to have lots of differences between jati/caste groups due to endogamy. To my surprise, Pakistani samples from Lahore were similar, though I attributed some of that to the migration of people from India after 1947 (a similar pattern does not hold for Bangladesh, as only a small number of people migrated from India). Additionally, the runs of homozygosity among Pakistani populations indicated lots of consanguineous marriages. While some South Indians marry cousins, the practice is very rare among North Indian Hindus. Rather, the genetic homogeneity of North Indian Hindus is due to the very high endogamy rates. They do not marry outside of their caste.
The results from the British Pakistanis are roughly in line with the 1000 Genomes Pakistanis, but in this case, the researchers had much more granular ethnic data, as well as information on whether individuals were or were not the product of cousin-marriages. In terms of worldwide population affinity, there isn’t a great surprise. The Pathans, who are Iranian-speaking, were distinct. The groups with putative Arab ancestry (Syeds) did not seem to have much of that (really, any).
The figure above shows the long-term effective population size patterns. Within the preprint the authors note that these northwest Indian populations began to diverge ~2,000 years ago. That is roughly in line with what Moorjani et al. found for their Indian samples. This tells us that these Pakistani populations were part of the same cultural milieu as Hindu populations in India itself, whose caste endogamy did not seem to crystallize until about that time. This also seems to run against the thesis presented by some Pakistani nationalists that the northwestern populations were very distinctive “non-Hindu” mlecchas. Al-Biruni and earlier observers identified caste as distinctively Indian, and the likelihood of population structure emerging at the same period in the northwest indicates that these people are broadly part of that milieu.
But I want to focus on the more recent period. Using various methods the authors estimate that the effective population sizes of many of these groups dropped 10-20 generations ago. If you assume 10 generations with generation times of 15 years, that’s 150 years. If you assume 20 generations with generation times of 25 years, that gives you 500 years. So let’s take that as our interval. What’s going on here? I think what this may illustrate is the spread of Muslim practices among Islamicized peoples of the northwest.
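The interval above is just the product of the extremes. A quick sketch of the arithmetic, using the generation counts and generation times from the text (the variable names are mine):

```python
def years_ago_interval(generations: tuple, years_per_generation: tuple) -> tuple:
    """Bracket a date range from (min, max) generations and (min, max) years per generation."""
    lower = generations[0] * years_per_generation[0]
    upper = generations[1] * years_per_generation[1]
    return lower, upper


# 10 to 20 generations, at 15 to 25 years per generation, as assumed in the text.
lower, upper = years_ago_interval((10, 20), (15, 25))
print(f"Drop in effective population size: roughly {lower} to {upper} years ago")
```

The bracket is wide because the uncertainty in the number of generations and in the generation time multiply together.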
In my podcast with Henrich he mentions that Islamic societies are peculiar in their ubiquitous practice of “parallel-cousin-marriage.” This means that brothers will marry their children off to each other (a contrast with “cross-cousin-marriage”, common in South India, where brothers and sisters marry their children to each other). The ubiquity of cousin-marriage among Pakistani Muslims is a contrast with genetically and culturally similar populations across the border in India (Indian Punjabis do not marry cousins if Sikh or Hindu).
The fact that this practice occurred among an endogamous group for many generations has consequences. The figure to the right illustrates just how homogeneous some of these groups are against a generic European reference population, and that even nominally unrelated individuals from the same biraderi group are often quite related. As you can see, even people whose parents are unrelated still exhibit excess runs of homozygosity. This is simply a function of pedigrees being narrow: just as in Indian castes, these individuals share many not-so-recent ancestors.
A positive note is that this high level of inbreeding does not apply to Pakistani Britons where both parents were born in the country. That means that biraderi dynamics are maintained due to continuous migration from Pakistan. They’re not perpetuating themselves in the UK.
I started this post with Joe Henrich for a reason: if Henrich is correct that the differences in social structure and relatedness matter for development and economics, then Pakistan and Bangladesh might have different trajectories. Bangladesh is a corrupt and familialist society, just like Pakistan. But, that familialism is not as robust and articulated as is the norm in Pakistan. A transition to a more high-trust and non-familial society is more viable and an easier lift for a non-tribal culture where clans do not extend much beyond first cousins.
There was an offhand comment on Twitter that in the 1970s genetics was barely a field because we’ve made so much progress since then. For obvious reasons, many scientists took umbrage at this. I think it’s wrong and gives the lay public the incorrect impression. But, the reality is that I do think that the way the media and some geneticists have presented the development of the field since the understanding of DNA as the substrate of inheritance in the 1950s and the explosion of genomics in the 2000s has fed into this misimpression.
What’s the truth? Genetics predates genomics by a century or more, and DNA by decades. The basics of the field were elucidated by Gregor Mendel in the 1860s. He originated the “laws of inheritance”, though unfortunately his work was ignored by contemporaries. By “laws of inheritance” I mean that Mendel formulated an analytic model that allowed for discrete inheritance and predictions of the outcome of that inheritance. Naive human understanding of heritability usually relies on an intuitive “blended theory”. It works, after a fashion, but it does not explain many patterns we see around us (e.g., recessive expression).
Charles Darwin famously relied upon blended inheritance (in part) as a basis for the heritability which was essential to his theory of natural selection. But, a major problem with blended inheritance is that blending removes variation as everyone becomes a similar “mix”. This is not an issue with Mendelian inheritance, which is discrete. Alleles do not “mix”, but reconfigure every generation. Variation is retained. The “math” of evolution “works” in this manner.
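To make the contrast concrete, here is a toy calculation. It assumes random mating and nothing else going on (no selection, drift, or mutation). Under blending, each child is the exact average of two random parents, so the trait variance halves every generation. Under Mendelian inheritance at a single locus, the allele frequency, and with it the variance, just carries over. The function names and starting values are mine, purely for illustration.

```python
def blending_variance(v0: float, generations: int) -> list:
    """Trait variance per generation when each child is the average of two random parents."""
    # Var((x1 + x2) / 2) = (V + V) / 4 = V / 2, so variance halves each generation.
    return [v0 / 2 ** t for t in range(generations + 1)]


def mendelian_variance(p: float, generations: int) -> list:
    """Variance in allele count (0, 1, or 2 copies) at one locus under random mating."""
    variances = []
    for _ in range(generations + 1):
        q = 1 - p
        variances.append(2 * p * q)
        # Allele frequency in the next generation from genotypes p^2, 2pq, q^2.
        p = p ** 2 + 0.5 * (2 * p * q)
    return variances


print("blending: ", blending_variance(1.0, 5))
print("mendelian:", mendelian_variance(0.5, 5))
```

With blending you lose half the raw material for selection every generation; with discrete alleles it is still there, which is why the “math” works.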
The utility of Mendelian genetics is why the field exploded in the first two decades of the 20th century. Read A. H. Sturtevant’s 1913 paper on the first genetic map. I think it gives you a flavor of the rate of advancement. Genetics was definitely a field. In The Origins of Theoretical Population Genetics Will Provine outlines how this particular field of genetics developed between 1920 and 1940 to become the core of evolutionary biology. Again, this suggests that even before DNA, genetics was an important field.
But, I do think it is fair to say that before 1950 genetics was very much “basic” science, and remained mostly so to the last decades of the 20th century.* DNA was interesting because it opened up the molecular biology revolution, but that had a very long fuse in terms of applications. PCR made it easier to do DNA testing, while new computing technologies made it much easier to generate and analyze data.
No one needs to be told about how genomics revolutionized genetics. But its major impact has been transforming an often theoretical field into a massively empirical one. Modern genomics is still underpinned by the logic of Mendelian genetics.
* The main exception here I’m going to make is for agricultural genetics, but much of this work doesn’t need “genes” as such.
Long-time readers of this weblog know that about fifteen years ago I dabbled in a little worm-work. At that time I read In the Beginning Was the Worm: Finding the Secrets of Life in a Tiny Hermaphrodite. As Brenner was involved in promoting C. elegans as a model he occupies a lot of this book. I recommend it. It’s short and packed with historical nuggets that make the 21st-century trajectory of science more comprehensible.
I’ve been looking at the data from the recent Munda paper. Standard stuff: admixture, treemix, and f-statistics. The northern Munda samples were collected in Bangladesh. So I thought: I can test the hypothesis that the East Asian ancestry in Bangladesh is in large part Santhal. After looking at it every which way, I think that in fact, the Munda may not have ever been very populous in much of northeast India. The Santhal is just not a good donor population to Bengalis, at least not when compared with mixes such as Dai + Tamil.
Additionally, the Santhal are really not that well modeled by mixing South Asians with any particular Southeast Asian group, though it works. I think that’s suggestive of the possibility that the Austro-Asiatic group which gave rise to the Munda doesn’t exist in its current form anywhere in Southeast Asia. Additionally, the Lao samples that are provided in the new paper I think may have Indian ancestry via admixture from Austro-Asiatic Mon or Khmer groups.
Basically, there is so much bidirectional gene flow that I think it’s really hard to get a grip on what’s going on. Additionally, the Burmese and northeast Indian populations (e.g., the Mizos) clearly have a strand of ancestry that derives from relatively recent migrants that came down from the region of eastern Tibet, and perhaps Sichuan or even further north. And this component shows up in Bengalis as well.
On top of this, there is the “Australo-Melanesian” substrate that is present all across Southeast Asia, and probably was present in modern southern China in the early Holocene, which has distant affinities with the “Ancient Ancestral South Indians” (AASI).
At this point, I keep my own counsel. But there may be an interesting story to tell related to how efficient and effective different forms of agriculture were, and how that interplayed with genes and language.
One of the reasons that I don’t post AdmixTools results too much is that the framework requires more statistical “deep thought” than just popping out a PCA or even running some model-based clustering. Read the methods supplements of one of the Reich lab’s papers, and you’ll see what I’m getting at. But a more prosaic reason is that I generally work in the plink format, and format conversion, as well as editing parameter files, is a pain. In general, I don’t do much “exploratory AdmixTools” stuff for a reason.
Martin Petr has made the second excuse a lot less of an excuse. His admixr package gives one an easy interface into AdmixTools. In particular, it allows one not to have to edit parameter files so much. It took me about 15 minutes to get it downloaded and running. I’m on a Mac and for R I use RStudio.
– remember to install wget if you are on a Mac (this will show up if you want to use online datasets)
– You need to make sure to set the path to AdmixTools. In the RStudio console, I just entered:
If you can get AdmixTools installed in the first place, admixr should be very easy.
Like many Americans in the year 2018 I’ve got a whole pedigree plugged into personal genomic services. I’m talking from grandchild to grandparent to great-aunt/uncles. A non-trivial pedigree. So we as a family look closely at these patterns, and we’re not surprised at this point to see really high correlations in some cases compared to what you’d expect (or low).
This means that you can see empirically the variation between relatives of the same nominal degree of separation from a person of interest. For example, each of my children’s grandparents contributes 25% of their autosomal genome without any prior information. But I actually know the variation of contribution empirically. For example, my father is enriched in my daughter. My mother is enriched in my sons.
The same principle applies to siblings. Though they should be 50% related on their autosomal genome, it turns out there is variation. I’ve seen some papers with large data sets (e.g., 20,000 sibling pairs) which give a standard deviation of 3.7% in relatedness. But what about other degrees of relation?
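To get a feel for what a 3.7% standard deviation means for siblings, here is a back-of-the-envelope sketch. It assumes realized relatedness is roughly normally distributed around 50%. That is my simplifying assumption, not something I’m taking from those papers.

```python
def relatedness_range(mean: float, sd: float, n_sd: float = 2.0) -> tuple:
    """Interval of mean +/- n_sd standard deviations for realized relatedness."""
    return mean - n_sd * sd, mean + n_sd * sd


# Full siblings: expected 50%, with the 3.7% standard deviation quoted above.
low, high = relatedness_range(0.50, 0.037)
print(f"About 95% of sibling pairs should share between {low:.1%} and {high:.1%}")
```

So under that assumption, two pairs of full siblings can plausibly differ by more than ten percentage points in how much of their autosomal genome they share.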