Selection in E Asians due to coronavirus epidemics

The above map shows cumulative coronavirus cases. I’m still confused by some of the geographic patterns. For example, Thailand with 70 million people has had fewer than 4,000 cases and 60 deaths attributed to COVID-19. Bolivia with 11 million people has had 8,900 deaths. Why? There are many theories out there. One thing that is hard to deny: mainland Southeast Asia and Northeast Asia seem to be handling the pandemic “well” (at least after the Wuhan outbreak).

A new preprint on bioRxiv has gotten people even more curious, An ancient coronavirus-like epidemic drove adaptation in East Asians from 25,000 to 5,000 years ago:

The current SARS-CoV-2 pandemic has emphasized the vulnerability of human populations to novel viral pressures, despite the vast array of epidemiological and biomedical tools now available. Notably, modern human genomes contain evolutionary information tracing back tens of thousands of years, which may help identify the viruses that have impacted our ancestors – pointing to which viruses have future pandemic potential. Here, we apply evolutionary analyses to human genomic datasets to recover selection events involving tens of human genes that interact with coronaviruses, including SARS-CoV-2, that started 25,000 years ago. These adaptive events were limited to ancestral East Asian populations, the geographical origin of several modern coronavirus epidemics. An arms race with an ancient corona-like virus may thus have taken place in ancestral East Asian populations. By learning more about our ancient viral foes, our study highlights the promise of evolutionary information to combat the pandemics of the future.

The evidence in the preprint is pretty persuasive. First, I need to communicate something the last author told me: there is no evidence in their results that East Asians have particular robustness or vulnerability to COVID-19. That is due to the fact that these selection sweeps can cut both ways with this particular virus. The GWAS themselves need to be done, and they haven’t been (something like the GWAS done in Europeans).

But if you set aside that possibility, it makes us ask: why are diverse East Asian societies doing relatively well? Thailand is not Confucian. Vietnam is somewhat, and South Korea is a great deal. But all these nations have been doing well (Confucian South Korea actually has about 10 times more per capita deaths than Thailand).

Second, what was going on 25,000 years ago? One of the things I learned from a book like Fate of Rome is that pandemics are a feature of civilized dense global empires. So it seems unlikely that the ancient proto-Asians were subject to pandemics. I have read that even endemic infectious diseases may have had trouble persisting at hunter-gatherer population densities. But the results from this preprint indicate a massive sweep for many generations right before the Last Glacial Maximum. Figure 1 in the preprint makes it obvious that this is restricted to East Asians. That being said, the signal in Japanese seems a bit attenuated compared to the Kinh and groups from China, so I wonder if this did not impact the Jomon (25% or so of Japanese ancestry) but was restricted to somewhere in mainland East Asia?


The Pareto principle and stochasticity in COVID-19

Most of you know the difference between parameters such as mean and standard deviation. Or, that distributions have variable dispersion or multi-modalities. Standard stuff.

In relation to COVID-19 it was clear early on that “superspreader events” were critical. In fact, these events were driving the pandemic in some deep way, with huge variance in the number of people each infected individual spreads the disease to.

Readers of this weblog will not be intimidated by a word like kurtosis. But it is different for readers of The Atlantic, notwithstanding the fact that they have the pleasure of imbibing the deep insights of America’s foremost public intellectual, Dr. Ibram X. Kendi. But rather than the august Dr. Kendi, I want to point you to Zeynep Tufekci, now at The Atlantic, but originally hired by The New York Times in March of 2015. Her piece, This Overlooked Variable Is the Key to the Pandemic, is probably what you should share with less statistically literate members of your family.

Tufekci’s piece is strewn with gems of fact. For example, ~70% of people infected with COVID-19 may not transmit it to anyone else, even if the mean number of transmissions is closer to 3 individuals. The explanation of what’s going on is that like many social science phenomena, but unlike influenza, the spreading of COVID-19 occurs through a minority of big transmitters. That is, it follows a power-law distribution and adheres to the Pareto principle.
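To make the "most people infect no one, yet the mean is near 3" point concrete, here is a minimal sketch. It assumes secondary cases follow a negative binomial distribution with mean R and dispersion k, which is a common way to model superspreading but is my assumption here, not something taken from Tufekci's piece. The function name and the k values are illustrative only. With R = 3, a dispersion of about 0.1 puts roughly 71% of infected people at zero onward transmissions.

```python
from math import exp, lgamma, log

def nb_pmf(x, mean_r, k):
    """Probability of exactly x secondary cases under a negative binomial
    with mean `mean_r` and dispersion `k` (small k = heavy superspreading).
    This parameterisation is an assumption chosen for illustration."""
    p = k / (k + mean_r)
    log_pmf = (lgamma(x + k) - lgamma(k) - lgamma(x + 1)
               + k * log(p) + x * log(1.0 - p))
    return exp(log_pmf)

if __name__ == "__main__":
    mean_r = 3.0  # mean number of transmissions quoted in the text
    for k in (0.1, 1.0, 10.0):  # hypothetical dispersion values
        share_zero = nb_pmf(0, mean_r, k)
        print(f"k={k}: share infecting nobody = {share_zero:.2f}")
```

As k grows, the distribution approaches a Poisson and the share of people who infect no one shrinks, which is the qualitative contrast with a less overdispersed disease.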

Last spring I read that Japan was focused on super-spreader events, and was somewhat skeptical of this strategy. How could they identify superspreader events? Well, it turns out that these events tend to adhere to some necessary, though not sufficient, conditions: large crowds, enclosed spaces, and poor ventilation. But Tufekci also points out that superspreader events occur stochastically. Not all, or even most, instances where the conditions are met will produce an outbreak. Some will. This is not surprising, but as she admits in the piece it’s a really difficult thing for people to accept and internalize. We can’t always ascertain a specific cause. Stuff just happens.

Anyway, pass the piece on to your relatives and friends.


A possible reason for inter-regional differences in COVID-19 prevalence?

There have been striking differences in COVID-19 severity/penetration by region. There are all sorts of reasons posited. This post from Derek Lowe at In The Pipeline, New Data on T Cells and the Coronavirus, suggests a possibility:

And turning to patients who have never been exposed to either SARS or the latest SARS CoV-2, this new work confirms that there are people who nonetheless have T cells that are reactive to protein antigens from the new virus. As in the earlier paper, these cells have a different pattern of reactivity compared to people who have recovered from the current pandemic (which also serves to confirm that they truly have not been infected this time around). Recognition of the nsp7 and nsp13 proteins is prominent, as well as the N protein. And when they looked at that nsp7 response, it turns out that the T cells are recognizing particular protein regions that have low homology to those found in the “common cold” coronaviruses – but do have very high homology to various animal coronaviruses.

Very interesting indeed! That would argue that there has been past zoonotic coronavirus transmission in humans, unknown viruses that apparently did not lead to serious disease, which have provided some people with a level of T-cell based protection to the current pandemic. This could potentially help to resolve another gap in our knowledge, as mentioned in that recent post: when antibody surveys come back saying that (say) 95% of a given population does not appear to have been exposed to the current virus, does that mean that all 95% of them are vulnerable – or not? I’ll reiterate the point of that post here: antibody profiling (while very important) is not the whole story, and we need to know what we’re missing.

It seems that later we will find out that perhaps the Vietnamese benefited from some immunity conferred by a previous asymptomatic coronavirus outbreak? If you’ve been following Spencer Wells (more specifically, here, here, and here on the general hypotheses) on Twitter you know he’s been suggesting this for several months. The pattern seems to extend to neighboring nations too.


Using 23andMe/Ancestry/Family Tree DNA to identify risk allele for respiratory failure with COVID-19

Several people have asked about the risk haplotype in the post below. If you have been genotyped on Ancestry, 23andMe, or Family Tree DNA (unless you are on 23andMe after summer of 2017) there is one SNP in high LD with the causal variant you can look up. It’s rs10490770. The risk allele is C and the non-risk allele is T. If you download the raw data from any of these services you can find rs10490770 with a search, and look for your genotype (if by some chance it is reported on the reverse strand, the risk allele will be G and the non-risk allele A).
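If you would rather not search by hand, the lookup can be sketched in a few lines of Python. This assumes the raw-data export is a text file with one SNP per line, the rsID in the first column and the genotype in the fourth column onward (either one field such as CT, or two allele fields), with comment lines starting with #. That matches the layouts I am aware of, but treat the column layout and the function names as assumptions and check your own file.

```python
import re

RSID = "rs10490770"  # the SNP named in the text

def find_genotype(lines, rsid=RSID):
    """Return the genotype string for `rsid`, or None if it is not present.
    Assumed layout: rsid, chromosome, position, then genotype field(s)."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = [f.strip('"') for f in re.split(r"[,\s]+", line.strip())]
        if fields and fields[0] == rsid:
            return "".join(fields[3:]).upper()
    return None

def count_risk_alleles(genotype):
    """Count risk alleles: C on the forward strand, G if the file reports
    the reverse strand. Returns None for no-calls or unexpected alleles."""
    if not genotype:
        return None
    alleles = set(genotype)
    if alleles <= {"C", "T"}:
        return genotype.count("C")
    if alleles <= {"G", "A"}:
        return genotype.count("G")
    return None

# Usage sketch (the file name is hypothetical):
# with open("my_raw_data.txt") as handle:
#     print(count_risk_alleles(find_genotype(handle)))
```

A result of 0, 1, or 2 is the number of copies of the risk allele; None means the SNP was missing or a no-call.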

What does risk vs. non-risk mean? You can read the original paper, Genomewide Association Study of Severe Covid-19 with Respiratory Failure. They say: “We identified a 3p21.31 gene cluster as a genetic susceptibility locus in patients with Covid-19 with respiratory failure and confirmed a potential involvement of the ABO blood-group system…”

It looks like in their sample CT = 1.75 times greater chance of severe respiratory problems, and CC = 3 times greater chance. The frequency is ~10% or less in Western Europeans, so very few people are CC (~1%). But in South Asians the risk allele is 30-40%, which means that 10-15% of people have the CC genotype!
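The genotype frequencies above follow from the allele frequencies if you assume Hardy-Weinberg proportions (random mating, no strong selection), which is my assumption for this back-of-the-envelope sketch. The function name is illustrative.

```python
def hardy_weinberg(risk_allele_freq):
    """Expected genotype frequencies for a biallelic SNP under
    Hardy-Weinberg proportions, given the C (risk) allele frequency."""
    p = risk_allele_freq
    q = 1.0 - p
    return {"CC": p * p, "CT": 2.0 * p * q, "TT": q * q}

if __name__ == "__main__":
    for freq in (0.10, 0.30, 0.40):  # allele frequencies quoted in the text
        genotypes = hardy_weinberg(freq)
        print(freq, {g: round(v, 3) for g, v in genotypes.items()})
```

An allele frequency of 10% gives 1% CC, while 30% to 40% gives 9% to 16% CC, consistent with the rough 10-15% figure for South Asians.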

Here are the results. Focus on rs11385942 at locus 3p21.31 (the top one):


Neanderthal introgression at COVID-19 severity locus at high frequency in South Asians

To review, as most of you know, ~2% of the ancestry outside of Africa is attributable to ancestry from Neanderthals. The fraction is a bit higher in East Asia, a bit lower in Europe, and lower still in the Near East. That being said, a disproportionate fraction of the Neanderthal ancestry is found in non-genic regions, which implies there’s been “purification.” Negative selection. The implication being that Neanderthal variation doesn’t “work well” with the genetic background of modern humans, the main lineage of Neo-Africans.

But there are exceptions to this, whether through natural selection (the Neanderthal variant was beneficial) or drift.

Svante Paabo’s group has discovered a new candidate for introgression with a newsy twist, The major genetic risk factor for severe COVID-19 is inherited from Neandertals:

A recent genetic association study (Ellinghaus et al. 2020) identified a gene cluster on chromosome 3 as a risk locus for respiratory failure in SARS-CoV-2. Recent data comprising 3,199 hospitalized COVID-19 patients and controls reproduce this and find that it is the major genetic risk factor for severe SARS-CoV-2 infection and hospitalization (COVID-19 Host Genetics Initiative). Here, we show that the risk is conferred by a genomic segment of ~50 kb that is inherited from Neandertals and occurs at a frequency of ~30% in south Asia and ~8% in Europe.

First, they need to change the title. From “The” to “a”. The study they’re piggybacking off of, Genomewide Association Study of Severe Covid-19 with Respiratory Failure, found in several thousand Spanish and Italian individuals that a particular SNP that Paabo’s group found to be embedded in a haplotype of Neanderthal origin was a risk variant. There are other genetic risk locations, probably in Africans, and non-Europeans.

Within European populations, the odds ratio of severe respiratory failure, all things equal, is ~1.75 if you are a heterozygote for the risk allele, and ~3.0 if you are a homozygote. This is not nothing, though presumably hypertension and all the other covariates are still a massive deal.
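An odds ratio is not the same as "times more likely," and the gap between the two depends on the baseline risk. Here is a minimal sketch of the conversion. The 5% baseline is a hypothetical number chosen only for illustration; it is not from the paper.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Convert a baseline probability plus an odds ratio into the
    probability for the exposed group."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    exposed_odds = baseline_odds * odds_ratio
    return exposed_odds / (1.0 + exposed_odds)

if __name__ == "__main__":
    baseline = 0.05  # hypothetical risk for non-carriers, for illustration
    for label, odds_ratio in (("heterozygote", 1.75), ("homozygote", 3.0)):
        risk = risk_from_odds_ratio(baseline, odds_ratio)
        print(f"{label}: OR {odds_ratio} -> risk {risk:.3f}")
```

When the baseline risk is small the odds ratio and the relative risk are close; as the baseline rises, the relative risk falls below the odds ratio.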

The authors highlight the South Asian angle. But they focused on the 1000 Genomes. One of the SNPs in perfect LD (1.0) with the causal variant is on the major SNP arrays. I computed the frequency in a range of modern and ancient (page down for the ancient) populations. You can see the pattern below.

  1. It is a huge deal in South Asians
  2. It is found in high frequency in Papuans too…
  3. Non-trivial fractions in Near Eastern groups as well as Europeans
  4. Found in some ancient samples

My assumption is that there is some selective benefit from this. Notice how in Northeast Asians it’s almost totally expunged. Perhaps ancient coronavirus sweeps? I don’t know.

(rs10490770 by the way)

Update: In SE Asia a lot of the frequency seems likely attributable to South Asian or Negrito ancestry. Borneo Austronesians have low frequencies of the risk allele. Burmese at 0.075% can be just due to South Asian ancestry. Looks like there is some correlation between ASI/AASI ancestry and the risk allele.



Late spring in the age of coronavirus

I haven’t posted on COVID-19 in a while. What’s there to say? The last month or so has been a great muddle. We soldier on, without purpose or direction. At least here in the United States of America. In regards to the pandemic we’re in, all I can say is that I feel a sense of listless ennui. But perhaps I should say something, just for the historical purpose of tracking where we’re at for this weblog?

On March 23rd, T. A. Frank mentioned me in Vanity Fair as being a COVID-hawk. You can search this weblog and note I was relatively sanguine at the end of January, but we began to stockpile in early February. By the middle of February, I was alarmed. On February 19th news broke that Covid-19 was spreading in Iran, and to be frank I flipped out.

Between February 20th and March 10th, there was a slow and gradual shift in thinking. But the real switch was flipped between March 10th and March 15th, as broad swaths of the culture moved into a high state of alarmism. It was curious seeing scientists who I followed who were fixated on Richard Dawkins in February joining the alarm about Covid-19. When they did give it a thought, many (though not as many privately!) were reassuring.

They shouldn’t have been.

Some considerations and observations:

The COVID-doves: early in the pandemic there were critics who were accusing me of alarmism. This was March, so who knew? I asked for some numbers. One individual said that at most there would be 20,000 deaths. We are around 100,000 now. Over time the initial wave of skeptics faded away because the numbers were too high.

But, the second round of skeptics emerged. The interesting thing here though is that the second wave of skeptics was more focused on the opportunity costs of the lock-down. The key problem I have with this wave of COVID-doves is that I wish they would just admit that 250,000 miserable deaths may be the price we have to pay. Perhaps. We just need to put the numbers on the table and remember that the deaths seem quite unpleasant and protracted.

I am on friendly terms with many COVID-doves. I disagree with them, but I have many friends who are liberals too, and I disagree with them as well. In fact, in an ideal world, I would be convinced by their arguments, and become a COVID-dove. I am not convinced by their arguments. Yet.

There is a broader class of COVID-skeptic which is, to be frank, unhinged, conspiratorial, and a promoter of misinformation. This is a serious problem.

The COVID-hysterics: another class of individuals are those who are hysterical about the impact of COVID. They want a two-year-long lockdown. They believe that the governor of Georgia has blood on his hands. They believe that COVID could kill anyone! Any skepticism or cost-vs.-benefit thinking is anathema to the COVID-hysteric.

The data is clear now that COVID-19 is particularly dangerous for older people. But the number of media profiles of young women who die of COVID-19 is quite high. There is, to my mind, a clear attempt by the media to make it seem like everyone is at risk. In fact, for people in their 20s and younger the seasonal flu seems to be more of a risk. The spate of stories about Kawasaki disease and children is, in my opinion, part of the issue. To convince COVID-skeptics, those who wish people to take this pandemic seriously need to not exaggerate, or they’ll lose all credibility.

The IFR: I now believe that the infection fatality rate in the United States is around 0.75%. This is, as the above comment should make clear, not unconditional. For the young, it is quite low. For the aged, it is much higher. But when estimating how many Americans may die of COVID-19, this is the number that I think is reasonable. Perhaps higher. Perhaps lower. But this is the ballpark. If 50% of Americans become infected, that’s 1.2 million or so deaths. The IFR, like the R, is not a fixed parameter. Perhaps the virus will change. Perhaps our therapeutics will get better. But we go to war with the parameters we have, not the ones we want.
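The 1.2 million figure is simple multiplication. As a sketch, assuming a US population of roughly 328 million (my assumption; the post does not state the population used):

```python
def expected_deaths(population, share_infected, ifr):
    """Deaths implied by a population size, the share eventually
    infected, and an infection fatality rate (both as fractions)."""
    return population * share_infected * ifr

if __name__ == "__main__":
    us_population = 328_000_000  # assumed figure, not from the post
    deaths = expected_deaths(us_population, share_infected=0.50, ifr=0.0075)
    print(f"{deaths:,.0f} deaths")
```

Because the result is linear in each input, halving the IFR or the share infected halves the total, which is why the estimate is best read as a ballpark.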

The uncertainty: There is still a great deal of uncertainty as we proceed forward. We know some things (e.g., no, children are not at high risk of death), but not enough. I have stopped paying attention to whether the weather impacts COVID-19. I think it does, but more in the range of 25-50% changes in the R, not an order of magnitude. There are lots of small things that are having impacts that we don’t know. And there are likely stochastic factors as well. We look through the mirror darkly.

Perhaps COVID-19 will fade away. Burn itself out. But that’s hope. A guess. We have no idea. We’re still not clear why the outbreak in New York City was so much worse than on the West Coast of the USA, or why Southeast Asia has been left relatively unscathed.

Pre-COVID-19 times

The quarantine: The major lacuna in the Western response has been quarantine-containment. The lockdown has, on the whole, not taken COVID-19 positive people and put them in some sort of quarantine. It doesn’t look like it will happen.

That means COVID-19 is endemic. For now.

Where are we? It looks like as we move into fall the number of American deaths will be in the low 100,000s. This is a victory, after a fashion. My family is still self-quarantining. We have no date for when we’ll stop doing this, at least not in the foreseeable future. My children have grandparents that they want to see. What are we supposed to do? But the day will come when we go back out into the world…


Not too many young are dying from COVID-19

When does COVID-19 get more dangerous than the flu? The CDC has some deaths listed for COVID-19. It also has deaths recorded for influenza. These are not perfect records, but, they give us a general comparative sense.

The total count in their data for the column I’m plotting is about half or so of the current death total for the USA. With that said, COVID-19 seems to be a really marginal disease in terms of mortality for those 24 years and under. For those 85 years or older COVID-19 is killing an order of magnitude more than the flu.

Of course, there is morbidity as well as mortality. COVID-19 seems to have a longer course of progression for the symptomatic, and, there is the worry that it may cause lifetime problems in many people who survive from the severe cases (and even possibly the asymptomatic).

But, the number of people who are under the age of 40 who are dying doesn’t seem that high. And yet when I see headlines and profiles in the media, a huge number of feature focuses seem to be about younger people who die of COVID-19. Why? Obviously, because the deaths of the younger are surprising. But, I also think that part of it is the same rationale for the HIV-AIDS campaign: by pretending as if everyone is vulnerable, you obtain mass social mobilization.

I happen to know lots of people will not look at the raw data to understand what’s happening. But enough will to get annoyed.

50% of the deaths in Europe are in care homes. My family is self-quarantining not because we feel at risk (we’re not), but because there are older people in our family from whom we don’t want to be exiled. Does the media think if we admit and highlight the enormous danger that older people in particular face, we’ll conclude that they’re disposable?


COVID-19 status update, mid-April

Spencer and I recorded another coronavirus episode of The Insight. It should be live in a day or so. Therefore, I thought it was good to take stock and make some comments (my Twitter autodeletes).

– A few weeks ago I had been optimistic and suggested that the USA would have 40,000 deaths. That seems unlikely. I will remain optimistic and suggest 85,000 deaths by August 31st.

– I think most of the country will “open up” between May 15th and June 15th.

– Heterogeneity in trajectory persists. Some of this is through clear policy (e.g., Taiwan). But some of it is through demographics (USA is 40% obese, Japan is 3% obese). And, some of it is probably genetics.

– Many commentators make the correct observation that “no evidence of X” is not good evidence. E.g., “we have no evidence of human-to-human transmission…”

– The term “conspiracy theory” is totally debased. Just like the word racist or squish.

– High levels of uncertainty on everything. For example, many preprints which find confusing associations between weather and COVID-19 somehow transform in the media to titles of the form “COVID-19 won’t disappear in the summer!”


The role of obesity in the COVID-19 crisis

There has been a fair amount of anecdotal and a bit of statistical evidence that obesity is somehow associated with individuals who have worse progression of COVID-19. The data out of China I saw wasn’t significant statistically speaking. The problem? There didn’t seem to be enough obese people in their samples. Then anecdotes and some data came out of Europe implicating obesity as a risk factor. And, doctors started reporting a disproportionate number of obese patients in the ICU.

Now we have really good evidence, Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City:

We conducted a cross-sectional analysis of all patients with laboratory-confirmed Covid-19 treated at a single academic health system in New York City between March 1, 2020 and April 2, 2020, with follow up through April 7, 2020. Primary outcomes were hospitalization and critical illness (intensive care, mechanical ventilation, hospice and/or death). We conducted multivariable logistic regression to identify risk factors for adverse outcomes, and maximum information gain decision tree classifications to identify key splitters….Strongest hospitalization risks were age ≥75 years (OR 66.8, 95% CI, 44.7-102.6), age 65-74 (OR 10.9, 95% CI, 8.35-14.34), BMI>40 (OR 6.2, 95% CI, 4.2-9.3), and heart failure (OR 4.3 95% CI, 1.9-11.2)…In the decision tree for admission, the most important features were age >65 and obesity; for critical illness, the most important was SpO2<88, followed by procalcitonin >0.5, troponin <0.1 (protective), age >64 and CRP>200. Conclusions: Age and comorbidities are powerful predictors of hospitalization; however, admission oxygen impairment and markers of inflammation are most strongly associated with critical illness.

I’ve reformatted table 3 of the regression below. It’s important to note here that the whole population is infected. The table is assessing the risk, out of the infected sample, that someone is going to go critical (which means a host of things, but entails hospitalization). Remember that a lot of the comorbidities associated with obesity are in the table. That means the risk from obesity is being estimated independently of those comorbidities. One can make some mechanistic arguments about the inflammatory effects of lipids, etc. That’s neither here nor there.

Something to keep in mind when assessing the risk of various nations is that 3% of Japanese are obese, while 40% of Americans are obese.


COVID-19, another panic?

Michael Fumento became prominent with his provocative book, The Myth of Heterosexual AIDS. On the whole I think Fumento’s point, that HIV-AIDS was not a major issue outside of “at-risk” groups in the United States, was the correct one.

I grew up as part of a generation that was taught about HIV-AIDS in a very apocalyptic manner. One of my health teachers even suggested that HIV-AIDS might lead to the extinction of the human race. When I saw Fumento make his case on a local public affairs television show, it was clear to me that despite everything I’d been told, he was probably correct. To counter his facts and figures the other guests appealed to anecdotes and vague predictions of the future.

So I noticed today that on March 16th, Fumento published Panic Never Helped Any Pandemic And Won’t Start Now:

COVID-19 is just the latest, albeit the most extreme, in a long series of epidemic hysterias I have covered going back to the “heterosexual AIDS explosion” (“Now No One is Safe from AIDS”) of the 1980s, avian flu, Ebola I and Ebola II, the Zika virus and others. They are known scientifically as “mass psychogenic illness,” and even more specifically as “moral panic” – the same type of hysteria that led to centuries of witch hunts.

Thus I was writing such articles as “Hysteria, Thy Name Is SARS” in 2003 while highly respected journals such as the New Scientist were screaming “SARS Could Eventually Kill Millions.” It ultimately killed only 774, and zero Americans, before simply disappearing in a hot July.

Yes, identified cases are still going up (albeit at a slower rate than before, per Farr’s Law), but that may just be an artifact. Indeed, it’s possible the epidemic is coming close to a worldwide plateau – in real terms, at least. The hint is in the category of “serious and critical cases.” It peaked in late February, with a steady decline to less than half that number. This in and of itself good news, of course. But why?

This time Fumento’s prediction was wrong:
