A possible reason for inter-regional differences in COVID-19 prevalence?

There have been striking differences in COVID-19 severity/penetration by region. There are all sorts of reasons posited. This post from Derek Lowe at In The Pipeline, New Data on T Cells and the Coronavirus, suggests a possibility:

And turning to patients who have never been exposed to either SARS or the latest SARS CoV-2, this new work confirms that there are people who nonetheless have T cells that are reactive to protein antigens from the new virus. As in the earlier paper, these cells have a different pattern of reactivity compared to people who have recovered from the current pandemic (which also serves to confirm that they truly have not been infected this time around). Recognition of the nsp7 and nsp13 proteins is prominent, as well as the N protein. And when they looked at that nsp7 response, it turns out that the T cells are recognizing particular protein regions that have low homology to those found in the “common cold” coronaviruses – but do have very high homology to various animal coronaviruses.

Very interesting indeed! That would argue that there has been past zoonotic coronavirus transmission in humans, unknown viruses that apparently did not lead to serious disease, which have provided some people with a level of T-cell based protection to the current pandemic. This could potentially help to resolve another gap in our knowledge, as mentioned in that recent post: when antibody surveys come back saying that (say) 95% of a given population does not appear to have been exposed to the current virus, does that mean that all 95% of them are vulnerable – or not? I’ll reiterate the point of that post here: antibody profiling (while very important) is not the whole story, and we need to know what we’re missing.

It seems that later we will find out that perhaps the Vietnamese benefited from some immunity conferred by a previous asymptomatic coronavirus outbreak? If you’ve been following Spencer Wells (more specifically, here, here, and here on the general hypotheses) on Twitter you know he’s been suggesting this for several months. The pattern seems to extend to neighboring nations too.


Using 23andMe/Ancestry/Family Tree DNA to identify risk allele for respiratory failure with COVID-19

Several people have asked about the risk haplotype in the post below. If you have been genotyped on Ancestry, 23andMe, or Family Tree DNA (unless you were genotyped on 23andMe after the summer of 2017) there is one SNP in high LD with the causal variant you can look up. It’s rs10490770. The risk allele is C and the non-risk allele is T. If you download the raw data from any of these services you can find rs10490770 with a search, and look for your genotype (if by some chance it is reported on the reverse strand, the risk allele will actually be G and the non-risk allele A).
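For those who would rather script the search than scroll through a text editor, here is a minimal sketch in Python. It assumes the raw-data download is a plain text file with one SNP per line, laid out as rsid, chromosome, position, and then the alleles in either one or two columns; the exact layout varies by company and chip version, so treat the parsing as an assumption and check it against your own file. The example lines are made up (the position is a placeholder, not the real coordinate).

```python
RISK_SNP = "rs10490770"

def find_genotype(lines, rsid=RISK_SNP):
    """Return the genotype string for `rsid` (e.g. "CT"), or None if absent."""
    for line in lines:
        if line.startswith("#"):  # skip comment/header lines
            continue
        # Treat tabs, commas and quotes alike, so that tab-delimited and
        # quoted-CSV exports are both split into plain fields.
        fields = line.replace('"', "").replace(",", " ").split()
        if len(fields) >= 4 and fields[0] == rsid:
            # Assumed layout: rsid, chromosome, position, then the alleles
            # in one column ("CT") or in two columns ("C", "T").
            return "".join(fields[3:])
    return None

def count_risk_alleles(genotype, risk_allele="C"):
    """Count copies of the risk allele (pass "G" for reverse-strand data)."""
    return genotype.count(risk_allele)

# Made-up example lines; the position 0 is a placeholder.
example_lines = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs10490770\t3\t0\tCT",
]
genotype = find_genotype(example_lines)
print(genotype, count_risk_alleles(genotype))
```

On a real download you would pass an open file handle instead of the example list, e.g. `with open("my_raw_data.txt") as fh: find_genotype(fh)`, where the filename is whatever your service gave you.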

What does risk vs. non-risk mean? You can read the original paper, Genomewide Association Study of Severe Covid-19 with Respiratory Failure. They say: “We identified a 3p21.31 gene cluster as a genetic susceptibility locus in patients with Covid-19 with respiratory failure and confirmed a potential involvement of the ABO blood-group system…”

It looks like in their sample CT = 1.75 times greater chance of severe respiratory problems, and CC = 3 times greater chance. The frequency is ~10% or less in Western Europeans, so very few people are CC (~1%). But in South Asians the risk allele is 30-40%, which means that 10-15% of people have the CC genotype!
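The genotype figures above follow from the allele frequencies if you assume Hardy-Weinberg proportions (random mating, which is my simplifying assumption here, not something the paper states). A quick sketch of the arithmetic:

```python
def genotype_frequencies(risk_allele_freq):
    """Expected genotype frequencies under Hardy-Weinberg proportions."""
    p = risk_allele_freq
    q = 1.0 - p
    return {"CC": p * p, "CT": 2 * p * q, "TT": q * q}

# Allele frequencies taken from the text: ~10% (Western Europe),
# 30-40% (South Asia).
for p in (0.10, 0.30, 0.40):
    freqs = genotype_frequencies(p)
    print(f"p={p:.2f}  CC={freqs['CC']:.1%}  CT={freqs['CT']:.1%}  TT={freqs['TT']:.1%}")
```

At 10% the CC homozygotes come out at 1%, and at 30-40% they come out at roughly 9-16%, which brackets the 10-15% quoted above.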

Here are the results. Focus on rs11385942 at locus 3p21.31 (the top one):


Neanderthal introgression at COVID-19 severity locus at high frequency in South Asians

To review, as most of you know, ~2% of the ancestry outside of Africa is attributable to Neanderthals. The fraction is a bit higher in East Asia, a bit lower in Europe, and lower still in the Near East. That being said, a disproportionate fraction of the Neanderthal ancestry is found in non-genic regions, which implies there’s been “purification,” that is, negative selection. The implication is that Neanderthal variation doesn’t “work well” with the genetic background of modern humans, the main lineage of Neo-Africans.

But there are exceptions to this, whether through natural selection (the Neanderthal variant was beneficial) or drift.

Svante Paabo’s group has discovered a new candidate for introgression with a newsy twist, The major genetic risk factor for severe COVID-19 is inherited from Neandertals:

A recent genetic association study (Ellinghaus et al. 2020) identified a gene cluster on chromosome 3 as a risk locus for respiratory failure in SARS-CoV-2. Recent data comprising 3,199 hospitalized COVID-19 patients and controls reproduce this and find that it is the major genetic risk factor for severe SARS-CoV-2 infection and hospitalization (COVID-19 Host Genetics Initiative). Here, we show that the risk is conferred by a genomic segment of ~50 kb that is inherited from Neandertals and occurs at a frequency of ~30% in south Asia and ~8% in Europe.

First, they need to change the title. From “The” to “a”. The study they’re piggybacking off of, Genomewide Association Study of Severe Covid-19 with Respiratory Failure, found in several thousand Spanish and Italian individuals that a particular SNP that Paabo’s group found to be embedded in a haplotype of Neanderthal origin was a risk variant. There are other genetic risk locations, probably in Africans, and non-Europeans.

Within European populations, the odds ratio of severe respiratory failure, all things equal, is ~1.75 if you are a heterozygote for the risk allele, and ~3.0 if you are a homozygote. This is not nothing, though presumably hypertension and all the other covariates are still a massive deal.

The authors highlight the South Asian angle. But they focused on the 1000 Genomes. One of the SNPs in perfect LD (1.0) with the causal variant is on the major SNP arrays. I computed its frequency in a range of modern and ancient (page down for the ancient) populations. You can see the pattern below.

  1. It is a huge deal in South Asians
  2. It is found in high frequency in Papuans too…
  3. Non-trivial fractions in Near Eastern groups as well as Europeans
  4. Found in some ancient samples

My assumption is that there is some selective benefit from this. Notice how in Northeast Asians it’s almost totally expunged. Perhaps ancient coronavirus sweeps? I don’t know.

(rs10490770 by the way)

Update: In SE Asia a lot of the frequency seems likely attributable to South Asian or Negrito ancestry. Borneo Austronesians have low frequencies of the risk allele. Burmese at 0.075% can be due just to South Asian ancestry. Looks like there is some correlation between ASI/AASI ancestry and the risk allele.



Late spring in the age of coronavirus

I haven’t posted on COVID-19 in a while. What’s there to say? The last month or so has been a great muddle. We soldier on, without purpose or direction. At least here in the United States of America. In regards to the pandemic we’re in, all I can say is that I feel a sense of listless ennui. But perhaps I should say something, just for the historical purpose of tracking where we’re at for this weblog?

On March 23rd, T. A. Frank mentioned me in Vanity Fair as being a COVID-hawk. You can search this weblog and note I was relatively sanguine at the end of January, but we began to stockpile in early February. By the middle of February, I was alarmed. On February 19th news broke that Covid-19 was spreading in Iran, and to be frank I flipped out.

Between February 20th and March 10th, there was a slow and gradual shift in thinking. But the real switch was flipped between March 10th and March 15th, as broad swaths of the culture moved into a high state of alarm. It was curious seeing scientists I followed, who were fixated on Richard Dawkins in February, joining the alarm about Covid-19. Back when they gave it a thought at all, many (though not as many privately!) were reassuring.

They shouldn’t have been.

Some considerations and observations:

The COVID-doves: early in the pandemic there were critics who were accusing me of alarmism. This was March, so who knew? I asked for some numbers. One individual said that at most there would be 20,000 deaths. We are around 100,000 now. Over time the initial wave of skeptics faded away because the numbers were too high.

But, the second round of skeptics emerged. The interesting thing here though is that the second wave of skeptics was more focused on the opportunity costs of the lock-down. The key problem I have with this wave of COVID-doves is that I wish they would just admit that 250,000 miserable deaths may be the price we have to pay. Perhaps. We just need to put the numbers on the table and remember that the deaths seem quite unpleasant and protracted.

I am on friendly terms with many COVID-doves. I disagree with them, but I also have many friends who are liberals, and I disagree with them too. In fact, in an ideal world, I would be convinced by their arguments, and become a COVID-dove. I am not convinced by their arguments. Yet.

There is a broader class of COVID-skeptic which is, to be frank, unhinged, conspiratorial, and a promoter of misinformation. This is a serious problem.

The COVID-hysterics: another class of individuals are those who are hysterical about the impact of COVID. They want a two-year-long lockdown. They believe that the governor of Georgia has blood on his hands. They believe that COVID could kill anyone! Any skepticism or cost-vs.-benefit thinking is anathema to the COVID-hysteric.

The data is clear now that COVID-19 is particularly dangerous for older people. But the number of media profiles of young women who die of COVID-19 is quite high. There is, to my mind, a clear attempt by the media to make it seem like everyone is at risk. In fact, for people in their 20s and younger the seasonal flu seems to be more of a risk. The spate of stories about Kawasaki disease and children is, in my opinion, part of the issue. To convince COVID-skeptics, those who wish people to take this pandemic seriously need to not exaggerate, or they’ll lose all credibility.

The IFR: I now believe that the infection fatality rate in the United States is around 0.75%. This is, as the above comment should make clear, not unconditional. For the young, it is quite low. For the aged, it is much higher. But when estimating how many Americans may die of COVID-19, this is the number that I think is reasonable. Perhaps higher. Perhaps lower. But this is the ballpark. If 50% of Americans become infected, that’s 1.2 million or so deaths. The IFR, like the R, is not a fixed parameter. Perhaps the virus will change. Perhaps our therapeutics will get better. But we go to war with the parameters we have, not the ones we want.
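The back-of-the-envelope number above is just population times fraction infected times IFR. A minimal sketch, assuming a US population of 330 million (a round figure, and an assumption in this calculation) and a single population-wide IFR, which glosses over the age structure mentioned above:

```python
US_POPULATION = 330_000_000  # assumed round figure

def expected_deaths(population, fraction_infected, ifr):
    """Deaths implied by a single, population-wide infection fatality rate."""
    return population * fraction_infected * ifr

deaths = expected_deaths(US_POPULATION, fraction_infected=0.50, ifr=0.0075)
print(f"{deaths:,.0f} deaths")  # roughly 1.2 million
```

Changing either the fraction infected or the IFR moves the total proportionally, which is why the uncertainty in both matters so much.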

The uncertainty: There is still a great deal of uncertainty as we proceed forward. We know some things (e.g., no, children are not at high risk of death), but not enough. I have stopped paying attention to whether the weather impacts COVID-19. I think it does, but more in the range of 25-50% changes in the R, not an order of magnitude. There are lots of small things that are having impacts that we don’t know. And there are likely stochastic factors as well. We look through the mirror darkly.

Perhaps COVID-19 will fade away. Burn itself out. But that’s hope. A guess. We have no idea. We’re still not clear why the outbreak in New York City was so much worse than on the West Coast of the USA, or why Southeast Asia has been left relatively unscathed.

Pre-COVID-19 times

The quarantine: The major lacuna in the Western response has been quarantine-containment. The lockdown has, on the whole, not taken COVID-19 positive people and put them in some sort of quarantine. It doesn’t look like it will happen.

That means COVID-19 is endemic. For now.

Where are we? It looks like as we move into fall the number of American deaths will be in the low 100,000s. This is a victory, after a fashion. My family is still self-quarantining. We have no date for when we’ll stop doing this, at least not in the foreseeable future. My children have grandparents that they want to see. What are we supposed to do? But the day will come when we go back out into the world…


Not too many young are dying from COVID-19

When does COVID-19 get more dangerous than the flu? The CDC has some deaths listed for COVID-19. It also has deaths recorded for influenza. These are not perfect records, but, they give us a general comparative sense.

The total count in their data for the column I’ve plotted is about half or so of the current death total for the USA. With that said, COVID-19 seems to be a really marginal disease in terms of mortality for those 24 years and under. For those 85 years or older, COVID-19 is killing an order of magnitude more than the flu.

Of course, there is morbidity as well as mortality. COVID-19 seems to have a longer course of progression for the symptomatic, and, there is the worry that it may cause lifetime problems in many people who survive from the severe cases (and even possibly the asymptomatic).

But, the number of people who are under the age of 40 who are dying doesn’t seem that high. And yet when I see headlines and profiles in the media, a huge number of feature focuses seem to be about younger people who die of COVID-19. Why? Obviously, because the deaths of the younger are surprising. But, I also think that part of it is the same rationale for the HIV-AIDS campaign: by pretending as if everyone is vulnerable, you obtain mass social mobilization.

I happen to know lots of people will not look at the raw data to understand what’s happening. But enough will to get annoyed.

50% of the deaths in Europe are in care homes. My family is self-quarantining not because we feel at risk (we’re not), but because there are older people in our family from whom we don’t want to be exiled. Does the media think that if we admit and highlight the enormous danger that older people in particular face, we’ll conclude that they’re disposable?


COVID-19 status update, mid-April

Spencer and I recorded another coronavirus episode of The Insight. It should be live in a day or so. Therefore, I thought it was good to take stock and make some comments (my Twitter autodeletes).

– A few weeks ago I had been optimistic and suggested that the USA would have 40,000 deaths. That seems unlikely. I will remain optimistic and suggest 85,000 deaths by August 31st.

– I think most of the country will “open up” between May 15th and June 15th.

– Heterogeneity in trajectory persists. Some of this is through clear policy (e.g., Taiwan). But some of it is through demographics (USA is 40% obese, Japan is 3% obese). And, some of it is probably genetics.

– Many commentators make the correct observation that “no evidence of X” is not good evidence. E.g., “we have no evidence of human-to-human transmission…”

– The term “conspiracy theory” is totally debased. Just like the word racist or squish.

– High levels of uncertainty on everything. For example, many preprints which find confusing associations between weather and COVID-19 somehow transform in the media to titles of the form “COVID-19 won’t disappear in the summer!”


The role of obesity in the COVID-19 crisis

There has been a fair amount of anecdotal and a bit of statistical evidence that obesity is somehow associated with individuals who have worse progression of COVID-19. The data out of China I saw wasn’t significant statistically speaking. The problem? There didn’t seem to be enough obese people in their samples. Then anecdotes and some data came out of Europe implicating obesity as a risk factor. And, doctors started reporting a disproportionate number of obese patients in the ICU.

Now we have really good evidence, Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City:

We conducted a cross-sectional analysis of all patients with laboratory-confirmed Covid-19 treated at a single academic health system in New York City between March 1, 2020 and April 2, 2020, with follow up through April 7, 2020. Primary outcomes were hospitalization and critical illness (intensive care, mechanical ventilation, hospice and/or death). We conducted multivariable logistic regression to identify risk factors for adverse outcomes, and maximum information gain decision tree classifications to identify key splitters….Strongest hospitalization risks were age ≥75 years (OR 66.8, 95% CI, 44.7-102.6), age 65-74 (OR 10.9, 95% CI, 8.35-14.34), BMI>40 (OR 6.2, 95% CI, 4.2-9.3), and heart failure (OR 4.3 95% CI, 1.9-11.2)…In the decision tree for admission, the most important features were age >65 and obesity; for critical illness, the most important was SpO2<88, followed by procalcitonin >0.5, troponin <0.1 (protective), age >64 and CRP>200. Conclusions: Age and comorbidities are powerful predictors of hospitalization; however, admission oxygen impairment and markers of inflammation are most strongly associated with critical illness.


I’ve reformatted table 3 of the regression below. It’s important to note here that the whole population is infected. The table is assessing the risk, within the infected sample, that someone is going to go critical (which means a host of things, but entails hospitalization). Remember that a lot of the comorbidities associated with obesity are in the table. That means the risk of obesity is viewed as an independent variable. One can make some mechanistic arguments about the inflammatory effects of lipids, etc. That’s neither here nor there.

Something to keep in mind when assessing the risk of various nations is that 3% of Japanese are obese, while 40% of Americans are obese.


COVID-19, another panic?

Michael Fumento became prominent with his provocative book, The myth of heterosexual AIDS. On the whole I think Fumento’s point, that HIV-AIDS was not a major issue outside of “at-risk” groups in the United States, was the correct one.

I grew up as part of a generation that was taught about HIV-AIDS in a very apocalyptic manner. One of my health teachers even suggested that HIV-AIDS might lead to the extinction of the human race. When I saw Fumento make his case on a local public affairs television show, it was clear to me that despite everything I’d been told, he was probably correct. To counter his facts and figures the other guests appealed to anecdotes and vague predictions of the future.

So I noticed today that on March 16th, Fumento published Panic Never Helped Any Pandemic And Won’t Start Now:

COVID-19 is just the latest, albeit the most extreme, in a long series of epidemic hysterias I have covered going back to the “heterosexual AIDS explosion” (“Now No One is Safe from AIDS”) of the 1980s, avian flu, Ebola I and Ebola II, the Zika virus and others. They are known scientifically as “mass psychogenic illness,” and even more specifically as “moral panic” – the same type of hysteria that led to centuries of witch hunts.

Thus I was writing such articles as “Hysteria, Thy Name Is SARS” in 2003 while highly respected journals such as the New Scientist were screaming “SARS Could Eventually Kill Millions.” It ultimately killed only 774, and zero Americans, before simply disappearing in a hot July.

Yes, identified cases are still going up (albeit at a slower rate than before, per Farr’s Law), but that may just be an artifact. Indeed, it’s possible the epidemic is coming close to a worldwide plateau – in real terms, at least. The hint is in the category of “serious and critical cases.” It peaked in late February, with a steady decline to less than half that number. This in and of itself good news, of course. But why?

This time Fumento’s prediction was wrong:



Learning from variation in Northern Italy in response to COVID-19

One of the major issues when discussing pretty much anything is the tendency to aggregate nations into a single unit and then compare to other nations that are not comparable. For example, the United States is a federal republic of 330 million people. New York state is not Washington state. And neither is Texas.

The same applies to Italy, which is a diverse nation of 60 million. The normal way to understand Italian variation is from north to south. But, during the recent COVID-19 outbreak, one aspect that is important to note is that Lombardy and Veneto in the Po river valley have taken very different tracks. Lombardy is about twice as populous as Veneto but has five times as many confirmed cases of Covid-19. And 15 times the death toll (8905 vs. 631 dead as of April 5th).
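To put the raw death counts on a per-capita footing, here is a quick sketch. The death counts are the ones quoted above; the population ratio of 2 is only the "about twice as populous" approximation, not a census figure.

```python
LOMBARDY_DEATHS = 8905
VENETO_DEATHS = 631
POPULATION_RATIO = 2.0  # Lombardy / Veneto, the rough "about twice" figure

raw_ratio = LOMBARDY_DEATHS / VENETO_DEATHS
per_capita_ratio = raw_ratio / POPULATION_RATIO

print(f"raw death ratio:        {raw_ratio:.1f}x")
print(f"per-capita death ratio: {per_capita_ratio:.1f}x")
```

Even after adjusting for population, Lombardy’s death toll comes out several times higher than Veneto’s.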

An Italian-speaking friend, who has been tracking the Italian press, notes that the big difference seems to be that Veneto is attempting to implement the test-and-trace philosophy that South Korea rolled out. And, in Veneto, they aggressively test people who are not symptomatic to catch silent spreaders who don’t exhibit Covid-19 symptoms (in contrast to Lombardy, where they tend to test once symptoms present, and not even always then).

Below is a recent interview with professor Andrea Crisanti, quickly translated from Italian, where he outlines his philosophy and the path he sees forward for getting COVID-19 under control.



Perhaps the Chinese government is not covering up the number of Covid-19 cases?

A big debate on the internet is whether China is covering up the number of cases of Covid-19 in Hubei, and more specifically Wuhan. Right now JHU says that China has 82,000 confirmed cases, as opposed to 300,000 in the USA. Both are underestimates, but there are those who believe that the Chinese death toll is not 3,000, but in the millions! I think a more sober take is that they could be underreporting by an order of magnitude. That being said, many epidemiologists believe that China’s numbers are roughly correct. And certainly, some demographic patterns seem to be robust and holding up (e.g., the proportion of the aged that die).

But there’s another way to estimate how many people were infected: look at the variation in the genome sequences of SARS-CoV-2 itself. The genetic variation patterns in viruses that underwent massive rapid demographic expansion will be different from those in viruses whose population size stayed constant.

From what I can see Trevor Bedford and his group at UW have done the best and most thorough estimate of the number infected from the SARS-CoV-2 genomes, Phylodynamic estimation of incidence and prevalence of novel coronavirus (nCoV) infections through time.

Here is a part of the abstract and methods:

Here, we use a phylodynamic approach incorporating 53 publicly available novel coronavirus (nCoV) genomes to the estimate underlying incidence and prevalence of the epidemic. This approach uses estimates of the rate of coalescence through time to infer underlying viral population size and then uses assumptions of serial interval and heterogeneity of transmission to provide estimates of incidence and prevalence. We estimate an exponential doubling time of 7.2 (95% CI 5.0-12.9) days. We arrive at a median estimate of the total cumulative number of worldwide infections as of Feb 8, 2020, of 55,800 with a 95% uncertainty interval of 17,500 to 194,400. Importantly, this approach uses genome data from local and international cases and does not rely on case reporting within China.

…. We began by running the Nextstrain nCov pipeline to align sequences and mask spurious SNPs. We took the output file masked.fasta as the starting point for this analysis. We loaded this alignment into BEAST and specified an evolutionary model to estimate:

* strict molecular clock (CTMC rate reference prior)
* exponential growth rate (Laplace prior with scale 100)
* effective population size at time of most recent sampled tip (uniform prior between 0 and 10)

We followed Andrew in using a gamma distributed HKY nucleotide substitution model. MCMC was run for 50M steps, discarding the first 10M as burnin and sampling every 30,000 steps after this to give a dataset of 1335 MCMC samples.

The file ncov.xml contains the entire BEAST model specification. To run it will require filling in sequence data; we are not allowed to reshare this data according to GISAID Terms and Conditions. The Mathematica notebook ncov-phylodynamics.nb contains code to analyze resulting BEAST output in ncov.log and plot figures.
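To get a feel for what the quoted doubling time means, it can be converted into a daily exponential growth rate via r = ln(2) / T. This is just the standard conversion, not code from the Bedford group’s notebook; the doubling times are the median and interval bounds from the abstract above.

```python
import math

def growth_rate_per_day(doubling_time_days):
    """Exponential growth rate r such that N(t) = N0 * exp(r * t)."""
    return math.log(2) / doubling_time_days

def fold_increase(doubling_time_days, elapsed_days):
    """How many times larger the epidemic is after `elapsed_days`."""
    return 2 ** (elapsed_days / doubling_time_days)

# Median and 95% CI bounds quoted in the abstract: 7.2 (5.0-12.9) days.
for t_double in (5.0, 7.2, 12.9):
    r = growth_rate_per_day(t_double)
    print(f"doubling time {t_double:>4} d -> r = {r:.3f}/day, "
          f"{fold_increase(t_double, 30):.1f}x in 30 days")
```

The spread between the interval bounds compounds quickly over a month, which is part of why the uncertainty interval on cumulative infections is so wide.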

It’s been many years since I used BEAST but it’s a complicated piece of software and has a lot of options and parameters. I’m very curious about how robust the estimate is when considering sentences such as “assume that variance of secondary cases is at most like SARS with superspreading dynamics with k=0.15.” Bedford and his colleagues know 1,000 times more about this than I do, but I am really curious about other groups looking at the data and running their models.

If all of the results are within an order of magnitude of the above, I think some of us really have to update our priors about how much misreporting the Chinese are engaging in…

Update: Lots of sequences here. I may try to brush up on my BEAST skills…