The growth of human genomics

Citation: Aylwyn Scally

The above figure is from Aylwyn Scally, or as I like to think of him, the Irish Matt Hahn. I’m not going to add any comments as the chart speaks for itself, doesn’t it?

Also, looks like my son is about the 10,000th person in the history of the human race who was whole-genome sequenced. That’s not a shabby record. First prenatal whole-genome sequence of a healthy born individual, and in the first ~0.000125% of the human race alive today to be sequenced.

The Ubiquitous Sequencing Age

Several years ago Yaniv Erlich published A Vision for Ubiquitous Sequencing. We’re inching in that direction. In The Atlantic Sarah Zhang has a piece, An Abandoned Baby’s DNA Condemns His Mother, while The New York Times just came out with, Old Rape Kits Finally Got Tested. 64 Attackers Were Convicted:

Still, even with such successes, the problem of untested rape kits persists. Advocates for rape victims estimate that about 250,000 kits remain untested across the country.

Unfortunately, until recently, the ‘forensic genetics’ employed rather primitive 1990s technology. But that’s changing, though both money and expertise need to be brought to bear. Companies such as Gencove and Othram are bringing that expertise to a broader market, with the latter company focusing specifically on the forensic market.

So ubiquitous sequencing is happening. Soon. What does that mean? We need to think about privacy. We need to think about data. We need to reflect on the broader implications of this world beyond specific targeted tasks such as forensic identification.

Laws of engineering are meant to be broken

A reader pointed out a very interesting passage in Richard Dawkins’ The Greatest Show on Earth: The Evidence for Evolution on the future possibilities of genome sequencing. Since the book was published in the middle of 2009, it is quite possible the passage was written in 2008, or even earlier.

Unfortunately for Dawkins’ prognostication track-record, but fortunately for science, he was writing at the worst time to make a prediction:

…the doubling time [data produced for a given fixed input] is a bit more than two years, where the Moore’s Law doubling time is a bit less than two years. DNA technology is intensely dependent on computers, so it’s a good guess that Hodgkin’s Law is at least partly dependent on Moore’s Law. The arrows on the right indicate the genome sizes of various creatures. If you follow the arrow towards the left until it hits the sloping line of Hodgkin’s Law, you can read off an estimate of when it will be possible to sequence a genome the same size as the creature concerned for only £1,000 (of today’s money). For the genome the size of yeast’s, we need to wait only till about 2020. For a new mammal genome…the estimated date is just this side of 2040

Obsolete plot from The Greatest Show on Earth

The cost for a sequence here is somewhat fuzzy. The first assembly of a genome sequence of an organism is much more difficult than subsequent alignments of later organisms (though more in computation than in the sequencing). But the upshot is that Dawkins was writing when “Hodgkin’s Law” was collapsing. From 2008 to 2011 the sequencing revolution pushed forward by Illumina drove costs down far faster than Moore’s Law would predict.
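To make the extrapolation concrete, here is a minimal sketch of reading a date off an exponential cost curve. The starting cost, the target, and both doubling times are illustrative assumptions of mine, not numbers taken from Dawkins’ plot or from real sequencing prices.

```python
import math


def years_until_target(current_cost, target_cost, doubling_time_years):
    """Years until cost falls to target_cost, assuming the amount of
    sequence you get per unit of money doubles every doubling_time_years
    (so cost for a fixed genome halves on the same schedule)."""
    if current_cost <= target_cost:
        return 0.0
    halvings_needed = math.log2(current_cost / target_cost)
    return halvings_needed * doubling_time_years


# Hypothetical genome that costs 1,024,000 today, with a target of 1,000.
# That is exactly ten halvings away.
slow = years_until_target(1_024_000, 1_000, doubling_time_years=2.2)
fast = years_until_target(1_024_000, 1_000, doubling_time_years=0.5)

print(f"'Hodgkin-like' pace (2.2-year doubling): about {slow:.0f} years")
print(f"Hypothetical faster pace (0.5-year doubling): about {fast:.0f} years")
```

The point of the sketch is only that the predicted date scales linearly with the doubling time, so a forecast made just before the doubling time shrinks will be off by decades.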

Though you can get a $1,000 consumer human sequence today, the reality is that this is for 30× coverage. For lower coverage, which means you aren’t as sure of the validity of any given variant, the price drops rapidly. And for the type of evolutionary questions Dawkins is interested in, the coverage needed is far lower than 30× (you probably want to get a larger number of samples than a single high-quality sample).
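The trade-off between coverage and sample count can be sketched in a few lines. This assumes, purely for illustration, that the price of a sample scales linearly with coverage from the $1,000 at 30× figure; real pricing has fixed per-sample costs (library preparation, handling) that this ignores, and the budget is a made-up number.

```python
def samples_for_budget(budget, coverage, cost_at_30x=1000.0):
    """Number of whole samples affordable at a given coverage,
    assuming cost is proportional to coverage (an assumption)."""
    cost_per_sample = cost_at_30x * coverage / 30.0
    return int(budget // cost_per_sample)


budget = 10_000  # hypothetical budget in dollars
high_cov = samples_for_budget(budget, coverage=30)
low_cov = samples_for_budget(budget, coverage=3)

print(f"30x coverage: {high_cov} samples")
print(f" 3x coverage: {low_cov} samples")
```

Under that assumption a tenfold drop in coverage buys ten times as many individuals, which is usually the better deal for population-level questions.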

On the whole genomics will not be individually transformative…for now

A new piece in The Guardian, ‘Your father’s not your father’: when DNA tests reveal more than you bargained for, is one of the two major genres in writings on personal genomics in the media right now (there are exceptions). First, there is the genre where genetics doesn’t do anything for you. It’s a waste of money! Second, there is the genre where genetics rocks our whole world, and it’s dangerous to one’s own self-identity. And so on. Basically, the two peaks in this field of journalism are the banal and the sinister.

In response to this, I stated that for most people personal genomics will probably have an impact somewhere in the middle. To be fair, someone reading the headline of the comment I co-authored in Genome Biology, Consumer genomics will change your life, whether you get tested or not, may wonder at the seeming contradiction.

But it’s not really there. On the aggregate social level genomics is going to have a non-trivial impact on health and lifestyle. This is a large proportion of our GDP. So it’s “kind of a big deal” in that sense. But, for many individuals, the outcomes will be quite modest. For a small minority of individuals, there will be real and important medical consequences. In these cases, the outcomes are a big deal. But for most people, genetic dispositions and risks are diffuse, of modest effect, and often backloaded in one’s life. Even though it will impact most of society in the near future, its touch will be gentle.

An analogy here can be made with BMI, or body mass index. As an individual predictor and statistic, it leaves a lot to be desired. But for public health scientists and officials, aggregate BMI distributions are critical to getting a sense of the landscape.

Finally, this is focusing on genomics where we read the sequence (or get back genotype results). The next stage that might really be game-changing is the write revolution. CRISPR genetic engineering. In the 2020s I assume that CRISPR applications will mostly be in critical health contexts (e.g., “fixing” Mendelian diseases), or in non-human contexts (e.g., agricultural genetics). Like genomics, the ubiquity of genetic engineering will be kind of a big deal economically in the aggregate, but it won’t be a big deal for individuals.

If you are a transhumanist or whatever they call themselves now, one can imagine a scenario where a large portion of the population starts “re-writing” themselves. That would be both a huge aggregate and individual impact. But we’re a long way from that….

Sequence them all and let God sort it out!

Researchers reboot ambitious effort to sequence all vertebrate genomes, but challenges loom:

In a bid to garner more visibility and support, researchers eager to sequence the genomes of all vertebrates today officially launched the Vertebrate Genomes Project (VGP), releasing 15 very high quality genomes of 14 species. But the group remains far short of raising the funds it will need to document the genomes of the estimated 66,000 vertebrates living on Earth.

The project, which has been underway for 3 years, is a revamp and renaming of an effort begun in 2009 called the Genome 10K Project (G10K), which aimed to decipher the genomes of 10,000 vertebrates. G10K produced about 100 genomes, but they were not very detailed, in part because of the cost of sequencing. Now, however, the cost of high-quality sequencing has dropped to less than $15,000 per billion DNA bases…

Funding remains an obstacle. To date, the VGP has raised $2.5 million of the $6 million needed to sequence a representative species from each of the 260 major branches of the vertebrate family tree. To reach the goal of all 66,000 vertebrates will require about $600 million, Jarvis says.

Though a lot of the details are different (sequencing vs. genotyping, vertebrates vs. humans), many of the general issues that David Mittelman and I brought up in our Genome Biology comment, Consumer genomics will change your life, whether you get tested or not, apply. That is, to some extent this is an area of science where technology and economics are just as important as science in driving progress.

I remember back in graduate school that people were talking about sequencing hundreds of vertebrates. But even in the few years since then, the landscape has shifted. I’m so little a biologist that I actually didn’t know there were only ~66,000 vertebrate species!

And yet this brings up a reasonable question from many scientists who came up in an era of more data scarcity: what are the questions we’re trying to answer here?

Science involves people. It’s not an abstraction. Throwing a whole lot of data out there does not mean that someone will be there to analyze it, or that we’ll get interesting insights. To be frank, the original Human Genome Project should probably tell us that, as its short-term benefits were clearly oversold.

In relation to how cheap data storage is and the declining price point of sequencing, I think my assertion that a genome, a sequence, is not a depreciating asset still holds. There is the initial cost of sequencing and assembling and the long term cost of storage, but these are small potatoes. The bigger considerations are the salaries of scientific labor and the opportunity costs. Sequencing tens of thousands of genomes may not get us anywhere, but really we’re not going to lose that much.
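For a sense of scale, the funding figures in the excerpt quoted above can be turned into rough per-genome costs. This is just division on the quoted totals; it says nothing about how the project actually budgets, and treating every species as costing the same is my simplification.

```python
def cost_per_unit(total_cost, n_units):
    """Average cost per unit, assuming cost is spread evenly."""
    return total_cost / n_units


# Figures from the quoted article: $600 million for ~66,000 vertebrates,
# and $6 million for one representative of each of 260 major branches.
per_species_all = cost_per_unit(600_000_000, 66_000)
per_branch_rep = cost_per_unit(6_000_000, 260)

print(f"All vertebrates: about ${per_species_all:,.0f} per species")
print(f"260 representatives: about ${per_branch_rep:,.0f} per species")
```

That works out to roughly $9,100 per species for the full set and roughly $23,100 per species for the first 260, which is small next to the salaries of the people who would analyze the data.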

Ultimately I side with those who believe that the existence of the data itself will change the landscape of possible questions being asked, and therefore generate novel science. But it’s pretty incredible to even be debating this issue in 2018 of sequencing all vertebrates. That’s something to reflect on.

Apes just being apes

A while back I made fun of bonobos and chimpanzees for being kind of losers for looking across at each other on either side of the Congo river for ~1.5 million years, the time elapsed since their divergence. I finally ended up reading the paper from last year, Chimpanzee genomic diversity reveals ancient admixture with bonobos, which reported complex population history between these two species. In other words, “they got it on”.

The key was a reasonable sample size of N=40 and high-coverage genomes (>20x), which gave them enough information to have the power to detect admixture. If you aren’t human and have a reasonable size genome, and all mammals do, get to the back of the line. But Pan’s turn finally arrived.

The paper’s primary result is that over the past few hundred thousand years there have been reciprocal gene flow events of small, but detectable, magnitude between chimpanzees and bonobos. Naturally, there was some geographic specificity here, in that chimpanzees from far West Africa lack much evidence of this while those from Central Africa have a great deal. The admixture is directly proportional to proximity to the bonobo range.

To obtain the result, they initially focused on high-frequency bonobo-derived alleles that were at low to moderate frequencies in chimpanzees. There was a notable excess of this class among Central African chimpanzees. And these alleles seem to have introgressed recently.
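The basic filtering idea can be sketched as follows. This is a toy illustration, not the paper’s pipeline: the `SiteFreq` record, the site names, the frequencies, and the thresholds are all hypothetical, and the real analysis involves far more than a frequency cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteFreq:
    site_id: str                # hypothetical site label
    bonobo_derived_freq: float  # derived-allele frequency in bonobos
    chimp_derived_freq: float   # derived-allele frequency in one chimp population


def candidate_introgressed(sites, bonobo_min=0.9, chimp_max=0.5):
    """Keep sites where the derived allele is near fixation in bonobos
    but present at low-to-moderate frequency in chimpanzees.
    Both thresholds are illustrative assumptions."""
    return [
        s for s in sites
        if s.bonobo_derived_freq >= bonobo_min
        and 0.0 < s.chimp_derived_freq <= chimp_max
    ]


# Made-up data for illustration only.
toy_sites = [
    SiteFreq("site_a", 0.95, 0.10),  # candidate
    SiteFreq("site_b", 0.95, 0.00),  # absent in chimps: not shared
    SiteFreq("site_c", 0.40, 0.10),  # not high-frequency in bonobos
    SiteFreq("site_d", 1.00, 0.90),  # common in both: likely old shared variation
]

hits = candidate_introgressed(toy_sites)
print([s.site_id for s in hits])
```

Comparing the count of such sites across chimpanzee populations is what would reveal an excess in one region relative to another.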

I suppose the major takeaway is that hominids do it like they do it on the Discovery Channel.

Selection swimming against the genomic tide

One of the major issues that confuses people is that the distribution of a trait or gene is often only weakly correlated with overall phylogeny and the rest of the genome.

To give a strange but classic example, the MHC loci are subject to strong balancing selection. This means that novel alleles do not substitute and replace ancestral alleles. Substitution of this sort results in “lineage sorting,” so that when you look at chimpanzees and humans you can see many polymorphic loci where all humans carry one variant and all chimpanzees the other. In contrast at the MHC loci there is frequency-dependent selection for rare variants, so the normal cycling process does not occur. Humans and chimpanzees overlap quite a bit on MHC, and any given human may have a more similar profile to a given chimpanzee than another human.
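A toy model shows why rare-variant advantage keeps alleles around instead of letting one replace the rest. This is a deterministic haploid sketch in which an allele’s fitness falls linearly with its own frequency; the fitness function, the strength `s`, and the starting frequencies are my assumptions for illustration, not a model of real MHC dynamics.

```python
def step_nfds(freqs, s=0.5):
    """One generation of negative frequency-dependent selection.
    Fitness of allele i is 1 - s * p_i, so rarer alleles are fitter."""
    fitness = [1.0 - s * p for p in freqs]
    mean_fitness = sum(p * w for p, w in zip(freqs, fitness))
    return [p * w / mean_fitness for p, w in zip(freqs, fitness)]


def simulate(freqs, generations, s=0.5):
    """Iterate the recursion and return the final allele frequencies."""
    for _ in range(generations):
        freqs = step_nfds(freqs, s)
    return freqs


# One common allele and two rare ones (hypothetical starting point).
start = [0.98, 0.01, 0.01]
final = simulate(start, generations=500)

print([round(p, 4) for p in final])
```

In this model the rare alleles rise and the common one falls until all three sit at equal frequency, so no allele is lost and no substitution occurs. Without lineage sorting, old variation can persist on both sides of a species split.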

There are 19,000 human genes. Of the 3 billion base pairs in the genome, only ~100 million are polymorphic on a worldwide scale (using some liberal definitions). There are lots of unique stories to tell here.

A new preprint, Inferring adaptive gene-flow in recent African history, illustrates how certain genes with functional significance may differ from the genome-wide background. The authors find that among the Fula (Fulani) people of West Africa there has been introgression of a Eurasian mutation that confers lactase persistence. The area of the genome around this gene is much more Eurasian than the rest of the genome. In contrast, the area around the Duffy allele is much less Eurasian. The variation in this locus is related to malaria resistance. Finally, in other African populations, they found gene flow of MHC variants.

None of this is entirely surprising, though the authors apply novel haplotype-based methods which should have wider utility.

Quantitative genomics, adaptation, and cognitive phenotypes

The human brain utilizes about 20% of the calories you take in per day. It’s a large and metabolically expensive organ. Because of this fact there are lots of evolutionary models which focus on the brain. In Catching Fire: How Cooking Made Us Human Richard Wrangham suggests that our need for calories to feed our brain is one reason we started to use fire to pre-digest our food. In The Mating Mind Geoffrey Miller seems to suggest that all the things our big complex brain does allow for a signaling of mutational load. And in Grooming, Gossip, and the Evolution of Language Robin Dunbar suggests that it’s social complexity which is driving our encephalization.

These are all theories. Interesting hypotheses and models. But how do we test them? A new preprint on bioRxiv is useful because it shows how cutting-edge methods from evolutionary genomics can be used to explore questions relating to cognitive neuroscience and psychopathology, Polygenic selection underlies evolution of human brain structure and behavioral traits:

…Leveraging publicly available data of unprecedented sample size, we studied twenty-five traits (i.e., ten neuropsychiatric disorders, three personality traits, total intracranial volume, seven subcortical brain structure volume traits, and four complex traits without neuropsychiatric associations) for evidence of several different signatures of selection over a range of evolutionary time scales. Consistent with the largely polygenic architecture of neuropsychiatric traits, we found no enrichment of trait-associated single-nucleotide polymorphisms (SNPs) in regions of the genome that underwent classical selective sweeps (i.e., events which would have driven selected alleles to near fixation). However, we discovered that SNPs associated with some, but not all, behaviors and brain structure volumes are enriched in genomic regions under selection since divergence from Neanderthals ~600,000 years ago, and show further evidence for signatures of ancient and recent polygenic adaptation. Individual subcortical brain structure volumes demonstrate genome-wide evidence in support of a mosaic theory of brain evolution while total intracranial volume and height appear to share evolutionary constraints consistent with concerted evolution…our results suggest that alleles associated with neuropsychiatric, behavioral, and brain volume phenotypes have experienced both ancient and recent polygenic adaptation in human evolution, acting through neurodevelopmental and immune-mediated pathways.

The preprint takes a kitchen-sink approach, throwing a lot of methods for detecting selection at the phenotype of interest. Also, there is always the issue of cryptic population structure generating false positive associations, but they try to address it in the preprint. I am somewhat confused by this passage though:

Paleobiological evidence indicates that the size of the human skull has expanded massively over the last 200,000 years, likely mirroring increases in brain size.

From what I know human cranial sizes leveled off in growth ~200,000 years ago, peaked ~30,000 years ago, and have declined ever since then. That being said, they find signatures of selection around genes associated with ‘intracranial volume.’

There are loads of results using different methods in the paper, but I was curious to note that schizophrenia had hits for ancient and recent adaptation. A friend who is a psychologist pointed out to me that when you look within families “unaffected” siblings of schizophrenics often exhibit deviation from the norm in various ways too; so even if they are not impacted by the disease, they are somewhere along a spectrum of ‘wild type’ to schizophrenic. In any case in this paper they found recent selection for alleles ‘protective’ of schizophrenia.

There are lots of theories one could spin out of that singular result. But I’ll just leave you with the fact that when you have a quantitative trait with lots of heritable variation it seems unlikely it’s been subject to a long period of unidirectional selection. Various forms of balancing selection seem to be at work here, and we’re only in the early stages of understanding what’s going on. Genuine comprehension will require:

– attention to population genetic theory
– large genomic data sets from a wide array of populations
– novel methods developed by population genomicists
– and functional insights which neuroscientists can bring to the table

The future will be genetically engineered


If the film Rise of the Planet of the Apes had come out a few years later I believe there would have been mention of CRISPR. Sometimes science leads to technology, and other times technology aids in science. On occasion the two are one and the same.

The plot I made above shows that in the first five years of the second decade of the 21st century CRISPR went from being an obscure aspect of bacterial genetics to ubiquitous. Friends who had been utilizing “advanced” genetic engineering methods such as TALENs and zinc fingers switched overnight to a CRISPR/Cas9 framework.

As I’ve said before, the 2010s are the decade when “reading” the genome becomes normal. We really don’t know what the CRISPR/Cas9 technology is capable of. It’s early years yet. With that, First Human Embryos Edited in U.S. Technically they’re single-celled zygotes. The science itself is not astounding. Rather, it is that the human Rubicon has been crossed in the United States. As indicated in the article there has been some jealousy about what the Chinese have been able to do because of a different cultural and regulatory framework.

There are those calling for a moratorium on this work (on humans). I’m not in favor or opposed. Rather, my question is simple: if CRISPR/Cas9 makes genetic engineering cheap, easy, and effective, how exactly are we going to enforce a world-wide moratorium? A Butlerian Jihad?

Note: I know that people are freaking out about humans + genetic engineering. But most geneticists I know are more excited about the prospects of non-human work, since human clinical trials are going to be way in the future. Over 20 years since Dolly it’s notable to me that no human has been cloned from adult somatic cells yet.