The Great Stagnation, genomics edition


Yaniv Erlich has been talking about the stability of the cost of sequencing for the last few years on Twitter. For what it’s worth, I think the stagnation is probably due to lack of competition. Illumina could surely move the price point further down through squeezing more efficiencies out of the process. In fact, they are surely squeezing more efficiencies out and profiting from that.

What is really striking, though, is how the period between the end of 2007 and the beginning of 2010 was like nothing we had seen before, and nothing we will see again. Not coincidentally, this is also when 23andMe brought direct-to-consumer genomics to a broader audience interested in health.

Veritas’ price drop to $599

For many years friends have been asking me when retail whole genome sequencing would be “cheap.” Well, as Emily Mullin reports, Veritas is now dropping their retail price to $599. You Can Now Get Your Whole Genome Sequenced for Less Than an iPhone:

Veritas Genetics is making a big bet that people want to know what’s in their genome.

The Boston-based company, which started offering whole genome sequencing in 2016 for $999 — the first company to do so below four figures — announced today that it is lowering the price to $599. For much less than the price of the latest iPhone model, consumers can get a full readout of their DNA.

Veritas’ move is a clear signal that genetic sequencing technology is getting cheaper as it becomes more automated — but whether people will want to know about the disease risks that may lurk in their genomes is yet to be seen. But Veritas thinks its new price point will be low enough to convince customers that decoding their entire genome is worth it.

There are still some “gaps” in our knowledge in terms of repetitive areas, as well as the whole area of secondary and tertiary structure and details of gene regulation and epigenetics. But, we’re fast converging on total sequence information.

The growth of human genomics

Citation: Aylwyn Scally

The above figure is from Aylwyn Scally, or as I like to think of him, the Irish Matt Hahn. I’m not going to add any comments as the chart speaks for itself, doesn’t it?

Also, looks like my son is about the 10,000th person in the history of the human race who was whole-genome sequenced. That’s not a shabby record. First prenatal whole-genome sequence of a healthy born individual, and in the first ~0.000125% of the human race alive today to be sequenced.
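As a back-of-the-envelope check on that percentage, here is a minimal sketch that works out the population size implied by the two figures above. The only inputs are the 10,000 and ~0.000125% from the paragraph; the implied population is derived from them, not a number stated anywhere in this post.

```python
# Figures taken from the paragraph above.
sequenced_people = 10_000
stated_percent = 0.000125  # "the first ~0.000125%"

# Population implied by those two figures (derived, not stated in the post).
implied_population = sequenced_people / (stated_percent / 100)


def percent_sequenced(n_sequenced, population):
    """Share of a population that has been sequenced, as a percentage."""
    return 100 * n_sequenced / population


print(f"Implied population: {implied_population:,.0f}")
print(f"Check: {percent_sequenced(sequenced_people, 8_000_000_000):.6f}%")
```

The figures line up with a round population of about eight billion, so the percentage is best read as an order-of-magnitude statement.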

The Ubiquitous Sequencing Age

Several years ago Yaniv Erlich published A Vision for Ubiquitous Sequencing. We’re inching in that direction. In The Atlantic Sarah Zhang has a piece, An Abandoned Baby’s DNA Condemns His Mother, while The New York Times just came out with, Old Rape Kits Finally Got Tested. 64 Attackers Were Convicted:

Still, even with such successes, the problem of untested rape kits persists. Advocates for rape victims estimate that about 250,000 kits remain untested across the country.

Unfortunately, until recently, forensic genetics relied on rather primitive 1990s technology. That’s changing, though both money and expertise need to be brought to bear. Companies such as Gencove and Othram are bringing that expertise to a broader market, with the latter company focusing specifically on the forensic market.

So ubiquitous sequencing is happening. Soon. What does that mean? We need to think about privacy. We need to think about data. We need to reflect on the broader implications of this world beyond specific targeted tasks such as forensic identification.

Laws of engineering are meant to be broken

A reader pointed out a very interesting passage in Richard Dawkins’ The Greatest Show on Earth: The Evidence for Evolution on the future possibilities of genome sequencing. Since the book was published in the middle of 2009, it is quite possible the passage was written in 2008, or even earlier.

Unfortunately for Dawkins’ prognostication track-record, but fortunately for science, he was writing at the worst time to make a prediction:

…the doubling time [data produced for a given fixed input] is a bit more than two years, where the Moore’s Law doubling time is a bit less than two years. DNA technology is intensely dependent on computers, so it’s a good guess that Hodgkin’s Law is at least partly dependent on Moore’s Law. The arrows on the right indicate the genome sizes of various creatures. If you follow the arrow towards the left until it hits the sloping line of Hodgkin’s Law, you can read off an estimate of when it will be possible to sequence a genome the same size as the creature concerned for only £1,000 (of today’s money). For the genome the size of yeast’s, we need to wait only till about 2020. For a new mammal genome…the estimated date is just this side of 2040

Obsolete plot from The Greatest Show on Earth

The cost for a sequence here is somewhat fuzzy. The first assembly of a genome sequence of an organism is much more difficult than subsequent alignments of later organisms (though more in computation than in the sequencing). But, the upshot is that Dawkins was writing when “Hodgkin’s Law” was collapsing. From 2008 to 2011 Moore’s Law was destroyed by the sequencing revolution pushed forward by Illumina.
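The two dates in the quoted passage can be checked against the stated doubling time. The sketch below assumes the affordable genome size grows exponentially, and it uses approximate genome sizes of about 12 million bases for yeast and 3 billion for a mammal. Those sizes are my assumptions, not figures from the book, and the function names are purely illustrative.

```python
import math

# Dates quoted in the passage above.
YEAST_YEAR = 2020
MAMMAL_YEAR = 2040

# Approximate genome sizes in base pairs (assumptions, not from the book).
YEAST_GENOME_BP = 12e6
MAMMAL_GENOME_BP = 3e9


def implied_doubling_time(size_small, year_small, size_large, year_large):
    """Years per doubling if the affordable genome size grows exponentially."""
    doublings = math.log2(size_large / size_small)
    return (year_large - year_small) / doublings


def year_affordable(size_bp, ref_size_bp, ref_year, doubling_time):
    """Year a genome of size_bp reaches the fixed price, extrapolated from a reference point."""
    return ref_year + doubling_time * math.log2(size_bp / ref_size_bp)


t = implied_doubling_time(YEAST_GENOME_BP, YEAST_YEAR, MAMMAL_GENOME_BP, MAMMAL_YEAR)
print(f"Implied doubling time: {t:.2f} years")
```

Under these assumptions the implied doubling time comes out at roughly two and a half years, in line with the "a bit more than two years" in the quote. The extrapolation is internally consistent; it was the trend itself that broke.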

Though you can get a $1,000 consumer human sequence today, the reality is that this is for 30× coverage. For lower coverage, which means you aren’t as sure of the validity of any given variant, the price drops rapidly. And for the type of evolutionary questions Dawkins is interested in, the coverage needed is far lower than 30× (you probably want to get a larger number of samples than a single high-quality sample).
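To make the coverage trade-off concrete, here is a rough sketch. Coverage is the total number of sequenced bases divided by genome size. The sketch assumes cost scales linearly with coverage, which ignores fixed per-sample costs such as library preparation. The read count, read length, and 5× target are hypothetical values chosen for illustration.

```python
def coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def samples_for_budget(budget, price_at_ref, ref_coverage, target_coverage):
    """Whole samples affordable if cost scales linearly with coverage.

    Assumption: no fixed per-sample cost (library prep, shipping, etc.).
    """
    return (budget * ref_coverage) // (price_at_ref * target_coverage)


# Hypothetical run: 600 million reads of 150 bp on a 3-billion-base genome.
print(coverage(600_000_000, 150, 3_000_000_000))

# The $1,000 that buys one 30x genome, spent on 5x genomes instead.
print(samples_for_budget(1000, 1000, 30, 5))
```

In this simplified model the same budget buys six 5× genomes in place of a single 30× genome, which is the kind of trade most population-level questions favor.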

Sequence them all and let God sort it out!

Researchers reboot ambitious effort to sequence all vertebrate genomes, but challenges loom:

In a bid to garner more visibility and support, researchers eager to sequence the genomes of all vertebrates today officially launched the Vertebrate Genomes Project (VGP), releasing 15 very high quality genomes of 14 species. But the group remains far short of raising the funds it will need to document the genomes of the estimated 66,000 vertebrates living on Earth.

The project, which has been underway for 3 years, is a revamp and renaming of an effort begun in 2009 called the Genome 10K Project (G10K), which aimed to decipher the genomes of 10,000 vertebrates. G10K produced about 100 genomes, but they were not very detailed, in part because of the cost of sequencing. Now, however, the cost of high-quality sequencing has dropped to less than $15,000 per billion DNA bases…

Funding remains an obstacle. To date, the VGP has raised $2.5 million of the $6 million needed to sequence a representative species from each of the 260 major branches of the vertebrate family tree. To reach the goal of all 66,000 vertebrates will require about $600 million, Jarvis says.

Though a lot of the details are different (sequencing vs. genotyping, vertebrates vs. humans), many of the general issues that David Mittelman and I brought up in our Genome Biology comment, Consumer genomics will change your life, whether you get tested or not, apply. That is, to some extent this is an area of science where technology and economics are just as important as science in driving progress.
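The funding figures in the quoted article imply very different per-species budgets for the two goals. Here is a quick sketch of the arithmetic. It uses only the numbers quoted above, and it reads the "$15,000 per billion DNA bases" figure as a ceiling on cost per billion bases of finished genome, which is my interpretation and not something the article spells out.

```python
def per_species(total_dollars, n_species):
    """Average budget per species."""
    return total_dollars / n_species


# Figures from the quoted article.
phase_one = per_species(6_000_000, 260)           # one species per major branch
all_vertebrates = per_species(600_000_000, 66_000)

# Assumed reading of the quoted rate: at most $15,000 per billion bases.
DOLLARS_PER_BILLION_BASES = 15_000
billion_bases_phase_one = phase_one / DOLLARS_PER_BILLION_BASES

print(f"Phase one:       ${phase_one:,.0f} per species")
print(f"All vertebrates: ${all_vertebrates:,.0f} per species")
print(f"Phase-one budget covers about {billion_bases_phase_one:.1f} billion bases at the ceiling rate")
```

The full project is budgeted at well under half the per-species cost of the first phase, so the plan appears to assume that costs keep falling.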

I remember back in graduate school that people were talking about sequencing hundreds of vertebrates. But even in the few years since then, the landscape has shifted. I’m so little a biologist that I actually didn’t know there were only ~66,000 vertebrate species!

And yet this brings up a reasonable question from many scientists who came up in an era of more data scarcity: what are the questions we’re trying to answer here?

Science involves people. It’s not an abstraction. Throwing a whole lot of data out there does not mean that someone will be there to analyze it, or that we’ll get interesting insights. To be frank, the original Human Genome Project should probably tell us that, as its short-term benefits were clearly oversold.

In relation to how cheap data storage is and the declining price point of sequencing, I think my assertion that a genome, a sequence, is not a depreciating asset still holds. There is the initial cost of sequencing and assembling and the long term cost of storage, but these are small potatoes. The bigger considerations are the salaries of scientific labor and the opportunity costs. Sequencing tens of thousands of genomes may not get us anywhere, but really we’re not going to lose that much.

Ultimately I side with those who believe that the existence of the data itself will change the landscape of possible questions being asked, and therefore generate novel science. But it’s pretty incredible to even be debating, in 2018, whether to sequence all vertebrates. That’s something to reflect on.

Apes just being apes

A while back I made fun of bonobos and chimpanzees for being kind of losers for looking across at each other on either side of the Congo river for ~1.5 million years, the time elapsed since their divergence. I finally ended up reading the paper from last year, Chimpanzee genomic diversity reveals ancient admixture with bonobos, which reported a complex population history between these two species. In other words, “they got it on”.

The key was a reasonable sample size of N=40 and high coverage genomes (>20x), to give them the amount of information necessary to have the power to detect admixture. If you aren’t human and have a reasonable size genome, and all mammals do, get to the back of the line. But Pan’s turn finally arrived.

The paper’s primary result is that over the past few hundred thousand years there have been reciprocal gene flow events of small, but detectable, magnitude between chimpanzees and bonobos. Naturally, there was some geographic specificity here, in that chimpanzees from far West Africa lack much evidence of this while those from Central Africa have a great deal. The admixture is directly proportional to proximity to the bonobo range.

To obtain the result, their initial focus was on high-frequency bonobo-derived alleles that were at low to moderate frequencies in chimpanzees. There was a notable excess of this class among Central African chimpanzees. And these alleles seem to have introgressed recently.
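The filtering idea described above can be sketched in a few lines. This is a toy illustration and not the paper’s pipeline: the `Site` record, the frequency thresholds, and the example data are all hypothetical, chosen only to show what "high frequency in bonobos, low to moderate in chimpanzees" looks like as a filter.

```python
from dataclasses import dataclass


@dataclass
class Site:
    site_id: str
    bonobo_derived_freq: float  # derived-allele frequency in bonobos
    chimp_derived_freq: float   # derived-allele frequency in one chimpanzee population


def candidate_introgressed(sites, bonobo_min=0.9, chimp_max=0.5):
    """Keep sites where the derived allele is common in bonobos but
    present at low to moderate frequency in chimpanzees.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    return [
        s for s in sites
        if s.bonobo_derived_freq >= bonobo_min
        and 0.0 < s.chimp_derived_freq <= chimp_max
    ]


# Made-up example data.
toy_sites = [
    Site("site_a", 1.00, 0.00),  # absent in chimpanzees: not a candidate
    Site("site_b", 0.95, 0.10),  # candidate
    Site("site_c", 0.40, 0.10),  # not high frequency in bonobos
    Site("site_d", 0.98, 0.90),  # common in both species
]

print([s.site_id for s in candidate_introgressed(toy_sites)])
```

Counting how many such sites each chimpanzee population carries, and comparing those counts across populations, is the kind of contrast that would show an excess in Central Africa.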

I suppose the major takeaway is that hominids do it like they do it on the Discovery Channel.

Genome sequencing for the people is near

When I first began writing on the internet genomics was an exciting field of science. Somewhat abstruse, but newly relevant and well known due to the completion of the draft of the human genome. Today it’s totally different. Genomics is ubiquitous. Instead of a novel field of science, it is transitioning into a personal technology.

But life comes at you fast. For all practical purposes the $1,000 genome is here.

And yet we haven’t seen a wholesale change in medicine. What happened? Obviously a major part of it is polygenicity of disease. Not to mention that a lot of illness will always have a random aspect. People who get back a “clean” genome and live a “healthy” life will still get cancer.

Another issue is a chicken & egg problem. When a large proportion of the population is sequenced and phenotyped we’ll probably discover actionable patterns. But until that moment the yield is not going to be very impressive.

Consider this piece in MIT Tech, DNA Testing Reveals the Chance of Bad News in Your Genes:

Out of 50 healthy adults [selected from a random 100] who had their genomes sequenced, 11—or 22 percent—discovered they had genetic variants in one of nearly 5,000 genes associated with rare inherited diseases. One surprise is that most of them had no symptoms at all. Two volunteers had genetic variants known to cause heart rhythm abnormalities, but their cardiology tests were normal.

There’s another possible consequence of people having their genome sequenced. For participants enrolled in the study, health-care costs rose an average of $350 per person compared with a control group in the six months after they received their test results. The authors don’t know whether those costs were directly related to the sequencing, but Vassy says it’s reasonable to think people might schedule follow-up appointments or get more testing on the basis of their results.

Researchers worry about this problem of increased costs. It’s not a trivial problem, and one that medicine doesn’t have a response to, as patients often find a way to follow up on likely false positives. But it seems that this is a phase we’ll have to go through. I see no chance that a substantial proportion of the American population in the 2020s will not be sequenced.

When conquered pre-Greece took captive her rude Hellene conqueror


When I was a child in the 1980s I was captivated by Michael Wood’s documentary In Search of the Trojan War (he also wrote a book with the same name). I had read a fair amount of Greek mythology, prose translations of the Iliad, as well as ancient history. The contrast between the Classical Greeks and the strangeness of their mythology was always something that was on the surface of my mind. The reality that Bronze Age Greeks were very different from Classical Greeks resolved this issue to some extent, as the mythos no doubt drew from the alien world of the former.

Though Classical Greeks were very different from us (e.g., slavery), to some extent Western civilization began with them, and they are very familiar to us for this reason. Rebecca Goldstein’s Plato at the Googleplex was predicated on the thesis that the ancient Greek philosopher had something to tell us, and that if he was alive today he would be a prominent public speaker.

I’m going to dodge the issue of Julian Jaynes’ bicameral mind, and just assert that people of the Bronze Age were fundamentally different from us in a way Plato was not. And that difference is preserved in aspects of Greek mythology. Though it is fashionable, and correct, to assert that Homer’s world was not that of Mycenaeans, but the barbarian period of the Greek Dark Age, it is not entirely true. Homer clearly preserved traditions where citadels such as Mycenae and Pylos were preeminent. Details such as the boar’s tusk helmets are also present in the Iliad. His corpus of oral history clearly preserved some ancient folkways which had fallen out of favor.

It was not aesthetic details or geopolitics that struck me about Greek mythology, but events such as the sacrifice of Iphigenia. Like Abraham’s near sacrifice of his son, this plot element seems to moderns cruel, barbaric, and unthinking. And though the Classical Greeks did not have our conception of human rights, they had on the whole turned against human sacrifice (and the Romans suppressed the practice when they conquered the Celts). But it seems to have occurred in earlier periods.

The rupture between the world of the Classical Greeks and the strange edifices of Mycenaean Greece was such that scholars were shocked that the Linear B tablets of the Bronze Age were written in Greek when they were finally deciphered. In fact many of the names and deities on these tablets would be familiar to us today; the name Alexander and the goddess Athena are both attested in Mycenaean tablets.

Preceding the Mycenaeans, who emerge in the period between 1600 and 1400 BCE, are the Minoans, who seem to have developed organically in the Aegean in the 3rd millennium. This culture had relations with Egypt and the Near East, their own system of writing, and deeply influenced the motifs of the successor Mycenaean Greek civilization. The aesthetic similarities between Mycenaeans and Minoans are one reason that many were surprised that the former were Greek, because the Minoan language was likely not.

Mycenaean civilization seems to have been a highly militarized and stratified society. There is a reason that this is sometimes referred to as the “age of citadels.” Allusions to the Greeks, or Achaeans, in the diplomatic missives of the Egyptians and Hittites suggest that the lords of the Hellenes were reaver kings. In his book 1177 B.C., Eric Cline repeats the contention that a fair portion of the “sea peoples” who ravaged Egypt in the late Bronze Age were actually Greeks.

So when did these Greeks arrive on the shores of Hellas? In The Coming of the Greeks Robert Drews argued that the Greeks were part of a broader movement of mobile charioteers who toppled antique polities and turned them into their own. The Hittites and Mitanni were two examples of Indo-European ruling elites who took over a much more advanced civilizational superstructure. While the Hittites and other Indo-Europeans, such as the Luwians and Armenians, slowly absorbed the non-Indo-European substrate of Anatolia, the Indo-Aryan Mitanni elite were linguistically absorbed by their non-Indo-European Hurrian subjects. Indo-Aryan elements persisted only in their names, their gods, and tellingly, in a treatise on training horses for charioteers.

Drews’ thesis is that the Greek language percolated down from the warlords of the citadels and their retinues over the Bronze Age, with the relics who did not speak Greek persisting into the Classical period as the Pelasgians. Set against this is the thesis of Colin Renfrew that Greek was one of the first Indo-European languages, as Indo-European languages began in Anatolia.

The most recent genetic data suggest to me that both theses are likely to be wrong. The data are presented in two preprints, The Population Genomics Of Archaeological Transition In West Iberia and The Genomic History Of Southeastern Europe. The two papers cover lots of different topics. But I want to focus on one aspect: gene flow from steppe populations into Southern Europe.

We know that in the centuries after 2900 BCE there was a massive eruption of individuals from the steppe fringe of Eastern Europe, and Northern Europe from Ireland to Poland was genetically transformed. Though there was some assimilation of indigenous elements, it looks as if the majority element in Northern Europe was descended from migrants.

For various reasons this was always less plausible for Southern Europe. The first reason is that Southern Europeans shared a lot of genetic similarities to Sardinians, who resembled Neolithic farmers. Admixture models generally suggested that in the peninsulas of Southern Europe the steppe-like ancestry was the minority component, not the majority, as was the case in Northern Europe.

These data confirm it. The Bronze Age in Portugal saw a shift toward steppe-inflected populations, but it was not a large shift. There seems to have been later gene flow too. But by and large the Iberian populations exhibit some continuity with late Neolithic populations.  This is not the case in Northern Europe.

In The Genomic History Of Southeastern Europe the authors note that steppe-like ancestry could be found sporadically during early periods, but that there was a notable increase in the Bronze Age, and later individuals in the Bronze Age had a higher fraction. Nevertheless, by and large it looks as if the steppe-like gene flow in the southerly Balkans (focusing on Bulgarian samples) was modest in comparison to the northern regions of Europe. Unfortunately I do not see any Greek Bronze Age samples, but it seems likely that steppe-like influence came into these groups after it arrived in Bulgaria, which is more northerly.

Down to the present day a non-Indo-European language, Basque, is spoken in Spain. Paleo-Sardinian survived down to the Common Era, and it too was not Indo-European. Similarly, non-Indo-European Pelasgian communities continued down to the period of city-states in Greece.

These long periods of coexistence point to the demographic equality (or even superiority) of the non-Indo-European populations. The dry climate of the Mediterranean peninsulas is not as suitable for cattle-based agro-pastoralism. This may have limited the spread and dominance of Indo-Europeans. Additionally, the Mediterranean peninsulas were likely touched by Indo-European migrations relatively late. Much of the early zeal for expansion may have already dissipated by then. The high frequency of likely Indo-European R1b lineages among the Basques is curious, and may point to the spreading of male patronage networks, and their assimilation into non-Indo-European substrates where necessary. R1b is also found in Sardinia, and in high frequencies in much of Italy.

The interaction and synthesis between native and newcomer was likely intensive in the Mediterranean. For example, of the gods of the Greek pantheon only Zeus is indubitably of Indo-European origin. Some, such as Artemis, have clear Near Eastern antecedents. But other Greek gods may come down from the pre-Greek inhabitants of what became Greece.

Ultimately these copious interactions and transformations should not be a great surprise. The sunny lands of the Mediterranean attracted Northern European tribes during Classical antiquity. The Cimbri invasion of Italy, Galatians in Thrace and Anatolia, the folk wandering of Vandals and Goths into Iberia, are all instances of population movements southward. These likely moved the needle ever so slightly toward convergence between Northern and Southern Europe in terms of genetic content.

In relation to the more general spread of Indo-Europeans, I believe there are a few areas like Northern Europe, where replacement was preponderant (e.g., the Tarim basin). But I also believe there were many more which presented a Southern European model of synthesis and accommodation.