Also, looks like my son is about the 10,000th person in the history of the human race who was whole-genome sequenced. That’s not a shabby record. First prenatal whole-genome sequence of a healthy born individual, and in the first ~0.000125% of the human race alive today to be sequenced.
Several years ago Yaniv Erlich published A Vision for Ubiquitous Sequencing. We’re inching in that direction. In The Atlantic Sarah Zhang has a piece, An Abandoned Baby’s DNA Condemns His Mother, while The New York Times just came out with, Old Rape Kits Finally Got Tested. 64 Attackers Were Convicted:
Still, even with such successes, the problem of untested rape kits persists. Advocates for rape victims estimate that about 250,000 kits remain untested across the country.
Unfortunately, until recently ‘forensic genetics’ employed rather primitive 1990s technology. But that’s changing, though both money and expertise need to be brought to bear. Companies such as Gencove and Othram are bringing that expertise to a broader market, with the latter company focusing specifically on the forensic market.
So ubiquitous sequencing is happening. Soon. What does that mean? We need to think about privacy. We need to think about data. We need to reflect on the broader implications of this world beyond specific targeted tasks such as forensic identification.
A reader pointed out a very interesting passage in Richard Dawkins’ The Greatest Show on Earth: The Evidence for Evolution on the future possibilities of genome sequencing. Since the book was published in the middle of 2009, it is quite possible the passage was written in 2008, or even earlier.
Unfortunately for Dawkins’ prognostication track-record, but fortunately for science, he was writing at the worst time to make a prediction:
…the doubling time [data produced for a given fixed input] is a bit more than two years, where the Moore’s Law doubling time is a bit less than two years. DNA technology is intensely dependent on computers, so it’s a good guess that Hodgkin’s Law is at least partly dependent on Moore’s Law. The arrows on the right indicate the genome sizes of various creatures. If you follow the arrow towards the left until it hits the sloping line of Hodgkin’s Law, you can read off an estimate of when it will be possible to sequence a genome the same size as the creature concerned for only £1,000 (of today’s money). For the genome the size of yeast’s, we need to wait only till about 2020. For a new mammal genome…the estimated date is just this side of 2040
The cost for a sequence here is somewhat fuzzy. The first assembly of a genome sequence of an organism is much more difficult than subsequent alignments of later organisms (though more in computation than in the sequencing). But, the upshot is that Dawkins was writing when “Hodgkin’s Law” was collapsing. From 2008 to 2011 Moore’s Law was destroyed by the sequencing revolution pushed forward by Illumina.
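To put a rough number on how badly “Hodgkin’s Law” broke down, here is a back-of-the-envelope comparison. The dollar figures are my own order-of-magnitude assumptions for illustration, not data from the post:

```python
# Illustrative only: compare a Moore's-Law-style cost decline (halving
# every 2 years) with the much steeper post-2008 drop in sequencing
# costs. The starting price is a rough assumption (~$1M circa 2008).

def project_cost(start_cost, years, halving_time):
    """Cost after `years` if it halves every `halving_time` years."""
    return start_cost * 0.5 ** (years / halving_time)

moore_projection = project_cost(1_000_000, 10, 2.0)  # 2008 -> 2018
actual_2018 = 1_000                                  # ~$1,000 consumer price

print(f"Moore's-Law-style projection for 2018: ${moore_projection:,.0f}")
print(f"Approximate actual 2018 price:         ${actual_2018:,.0f}")
```

A simple halving every two years would leave the price above $30,000 after a decade; the sequencing revolution blew far past that.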
Though you can get a $1,000 consumer human sequence today, the reality is that this is for 30× coverage. For lower coverage, which means you aren’t as sure of the validity of any given variant, the price drops rapidly. And for the type of evolutionary questions Dawkins is interested in, the coverage needed is far lower than 30× (you probably want to get a larger number of samples than a single high-quality sample).
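The coverage tradeoff is easy to see with a toy model. If read depth at a site is roughly Poisson-distributed around the mean coverage (a standard simplification), you can compute how often a site accumulates enough reads for a confident diploid genotype call. The depth threshold of 10 reads below is an illustrative choice, not a fixed standard:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def p_depth_at_least(k, mean_coverage):
    """P(a given site is sequenced at depth >= k), assuming read depth
    at a site is approximately Poisson with mean `mean_coverage`."""
    return 1.0 - sum(poisson_pmf(i, mean_coverage) for i in range(k))

# At 30x, essentially every site clears a 10-read threshold;
# at 4x, almost none do, so individual variant calls are less certain.
print(f"30x: P(depth >= 10) = {p_depth_at_least(10, 30):.4f}")
print(f" 4x: P(depth >= 10) = {p_depth_at_least(10, 4):.4f}")
```

This is why low coverage is fine for estimating population allele frequencies across many samples, but shaky for asserting any one variant in any one individual.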
In a bid to garner more visibility and support, researchers eager to sequence the genomes of all vertebrates today officially launched the Vertebrate Genomes Project (VGP), releasing 15 very high quality genomes of 14 species. But the group remains far short of raising the funds it will need to document the genomes of the estimated 66,000 vertebrates living on Earth.
The project, which has been underway for 3 years, is a revamp and renaming of an effort begun in 2009 called the Genome 10K Project (G10K), which aimed to decipher the genomes of 10,000 vertebrates. G10K produced about 100 genomes, but they were not very detailed, in part because of the cost of sequencing. Now, however, the cost of high-quality sequencing has dropped to less than $15,000 per billion DNA bases…
Funding remains an obstacle. To date, the VGP has raised $2.5 million of the $6 million needed to sequence a representative species from each of the 260 major branches of the vertebrate family tree. To reach the goal of all 66,000 vertebrates will require about $600 million, Jarvis says.
Though a lot of the details are different (sequencing vs. genotyping, vertebrates vs. humans), many of the general issues that David Mittelman and I brought up in our Genome Biology comment, Consumer genomics will change your life, whether you get tested or not, apply. That is, to some extent this is an area of science where technology and economics are just as important as science in driving progress.
I remember back in graduate school that people were talking about sequencing hundreds of vertebrates. But even in the few years since then, the landscape has shifted. I’m so little a biologist that I actually didn’t know there were only ~66,000 vertebrate species!
And yet this brings up a reasonable question from many scientists who came up in an era of more data scarcity: what are the questions we’re trying to answer here?
Challenges include justifying sequencing all these crappy genomes without research communities behind them https://t.co/HjpIlhXDSu
— Andrew Kern (@pastramimachine) September 14, 2018
Science involves people. It’s not an abstraction. Throwing a whole lot of data out there does not mean that someone will be there to analyze it, or, that we’ll get interesting insights. To be frank, the original Human Genome Project should probably tell us that, as its short-term benefits were clearly oversold.
Given how cheap data storage is and the declining price of sequencing, I think my assertion that a genome, a sequence, is not a depreciating asset still holds. There is the initial cost of sequencing and assembling and the long-term cost of storage, but these are small potatoes. The bigger considerations are the salaries of scientific labor and the opportunity costs. Sequencing tens of thousands of genomes may not get us anywhere, but really we’re not going to lose that much.
Ultimately I side with those who believe that the existence of the data itself will change the landscape of possible questions being asked, and therefore generate novel science. But it’s pretty incredible to even be debating the sequencing of all vertebrates in 2018. That’s something to reflect on.
A while back I made fun of bonobos and chimpanzees for being kind of losers for looking across at each other on either side of the Congo river for ~1.5 million years, the time elapsed since their divergence. I finally ended up reading the paper from last year, Chimpanzee genomic diversity reveals ancient admixture with bonobos, which reported complex population history between these two species. In other words, “they got it on”.
The key was a reasonable sample size of N=40 and high-coverage genomes (>20×), which gave them the amount of information necessary to have the power to detect admixture. If you aren’t human and have a reasonable-sized genome, and all mammals do, get to the back of the line. But Pan’s turn finally arrived.
The paper’s primary result is that over the past few hundred thousand years there have been reciprocal gene flow events of small, but detectable, magnitude between chimpanzees and bonobos. Naturally, there was some geographic specificity here, in that chimpanzees from far West Africa lack much evidence of this while those from Central Africa have a great deal. The admixture is directly proportional to proximity to bonobo range.
To obtain this result they initially focused on high-frequency bonobo-derived alleles that were at low to moderate frequencies in chimpanzees. There was a notable excess of this class among Central African chimpanzees. And these alleles seem to have introgressed recently.
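The screening logic can be sketched in a few lines. The frequencies and thresholds below are invented for illustration, not the paper’s actual cutoffs, and the real analysis of course runs on genome-wide variant calls:

```python
# Toy version of the screen: flag sites where the derived allele is at
# high frequency in bonobos but at low-to-moderate (nonzero) frequency
# in chimpanzees -- the class enriched by recent introgression.

def candidate_introgressed(sites, bonobo_min=0.9, chimp_lo=0.05, chimp_hi=0.5):
    """Return site IDs whose bonobo-derived-allele frequency is high
    and whose chimpanzee frequency is low to moderate."""
    return [
        site_id
        for site_id, (f_bonobo, f_chimp) in sites.items()
        if f_bonobo >= bonobo_min and chimp_lo <= f_chimp <= chimp_hi
    ]

toy_sites = {
    "site1": (0.95, 0.20),  # high in bonobos, moderate in chimps -> candidate
    "site2": (0.95, 0.00),  # absent in chimps -> not a candidate here
    "site3": (0.30, 0.25),  # not high-frequency in bonobos -> excluded
}
print(candidate_introgressed(toy_sites))
```

The geographic signal in the paper then comes from counting how many such candidates each chimpanzee population carries.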
I suppose the major takeaway is that hominids do it like they do it on the Discovery Channel.
When I first began writing on the internet genomics was an exciting field of science. Somewhat abstruse, but newly relevant and well known due to the completion of the draft of the human genome. Today it’s totally different. Genomics is ubiquitous. Instead of a novel field of science, it is transitioning into a personal technology.
And yet we haven’t seen a wholesale change in medicine. What happened? Obviously a major part of it is polygenicity of disease. Not to mention that a lot of illness will always have a random aspect. People who get back a “clean” genome and live a “healthy” life will still get cancer.
Another issue is a chicken & egg problem. When a large proportion of the population is sequenced and phenotyped we’ll probably discover actionable patterns. But until that moment the yield is not going to be too impressive.
Consider this piece in MIT Technology Review, DNA Testing Reveals the Chance of Bad News in Your Genes:
Out of 50 healthy adults [selected from a random 100] who had their genomes sequenced, 11—or 22 percent—discovered they had genetic variants in one of nearly 5,000 genes associated with rare inherited diseases. One surprise is that most of them had no symptoms at all. Two volunteers had genetic variants known to cause heart rhythm abnormalities, but their cardiology tests were normal.
There’s another possible consequence of people having their genome sequenced. For participants enrolled in the study, health-care costs rose an average of $350 per person compared with a control group in the six months after they received their test results. The authors don’t know whether those costs were directly related to the sequencing, but Vassy says it’s reasonable to think people might schedule follow-up appointments or get more testing on the basis of their results.
Researchers worry about this problem of increased costs. It’s not a trivial problem, and one that medicine doesn’t have a response to, as patients often find a way to follow up on likely false positives. But it seems that this is a phase we’ll have to go through. I see no chance that a substantial proportion of the American population in the 2020s will not be sequenced.
When I was a child in the 1980s I was captivated by Michael Wood’s documentary In Search of the Trojan War (he also wrote a book with the same name). I had read a fair amount of Greek mythology, prose translations of the Iliad, as well as ancient history. The contrast between the Classical Greeks and the strangeness of their mythology was always at the surface of my mind. The reality that Bronze Age Greeks were very different from Classical Greeks resolved this issue to some extent, as the mythos no doubt drew from the alien world of the former.
Though Classical Greeks were very different from us (e.g., slavery), to some extent Western civilization began with them, and they are very familiar to us for this reason. Rebecca Goldstein’s Plato at the Googleplex was predicated on the thesis that the ancient Greek philosopher had something to tell us, and that if he was alive today he would be a prominent public speaker.
I’m going to dodge the issue of Julian Jaynes’ bicameral mind, and just assert that people of the Bronze Age were fundamentally different from us in a way Plato was not. And that difference is preserved in aspects of Greek mythology. Though it is fashionable, and broadly correct, to assert that Homer’s world was not that of the Mycenaeans but of the barbarian period of the Greek Dark Age, that is not entirely true. Homer clearly preserved traditions where citadels such as Mycenae and Pylos were preeminent. Details such as the boar’s tusk helmets are also present in the Iliad. His corpus of oral history clearly preserved some ancient folkways which had fallen out of favor.
But aesthetic details or geopolitics are not what struck me about Greek mythology, but events such as the sacrifice of Iphigenia. Like Abraham’s near sacrifice of his son, this plot element seems cruel, barbaric, and unthinking to moderns. And though the Classical Greeks did not have our conception of human rights, they had turned against human sacrifice (and the Romans suppressed the practice when they conquered the Celts) on the whole. But it seems to have occurred in earlier periods.
The rupture between the world of the Classical Greeks and the strange edifices of Mycenaean Greece was such that scholars were shocked that the Linear B tablets of the Bronze Age were written in Greek when they were finally deciphered. In fact many of the names and deities on these tablets would be familiar to us today; the name Alexander and the goddess Athena are both attested to in Mycenaean tablets.
Preceding the Mycenaeans, who emerge in the period between 1600 and 1400 BCE, are the Minoans, who seem to have developed organically in the Aegean in the 3rd millennium. This culture had relations with Egypt and the Near East, their own system of writing, and deeply influenced the motifs of the successor Mycenaean Greek civilization. The aesthetic similarities between Mycenaeans and Minoans are one reason that many were surprised that the former were Greek, because the Minoan language was likely not.
Mycenaean civilization seems to have been a highly militarized and stratified society. There is a reason that this is sometimes referred to as the “age of citadels.” Allusions to the Greeks, or Achaeans, in the diplomatic missives of the Egyptians and Hittites suggests that the lords of the Hellenes were reaver kings. In 1177 B.C. Eric Cline repeats the contention that a fair portion of the “sea peoples” who ravaged Egypt in the late Bronze Age were actually Greeks.
So when did these Greeks arrive on the shores of Hellas? In The Coming of the Greeks Robert Drews argued that the Greeks were part of a broader movement of mobile charioteers who toppled antique polities and made them their own. The Hittites and Mitanni were two examples of Indo-European ruling elites who took over a much more advanced civilizational superstructure. While the Hittites and other Indo-Europeans, such as the Luwians and Armenians, slowly absorbed the non-Indo-European substrate of Anatolia, the Indo-Aryan Mitanni elite were linguistically absorbed by their non-Indo-European Hurrian subjects. Indo-Aryan elements persisted only in their names, their gods, and, tellingly, in a treatise on training horses for charioteers.
Drews’ thesis is that the Greek language percolated down from the warlords of the citadels and their retinues over the Bronze Age, with relict populations who did not speak Greek persisting into the Classical period as the Pelasgians. Set against this is the thesis of Colin Renfrew that Greek was one of the first Indo-European languages, as Indo-European languages began in Anatolia.
The most recent genetic data suggest to me that both theses are likely to be wrong. The data are presented in two preprints, The Population Genomics Of Archaeological Transition In West Iberia and The Genomic History Of Southeastern Europe. The two papers cover lots of different topics. But I want to focus on one aspect: gene flow from steppe populations into Southern Europe.
We know that in the centuries after 2900 BCE there was a massive eruption of individuals from the steppe fringe of Eastern Europe, and Northern Europe from Ireland to Poland was genetically transformed. Though there was some assimilation of indigenous elements, it looks as if the majority element in Northern Europe was descended from migrants.
For various reasons this was always less plausible for Southern Europe. The first reason is that Southern Europeans shared many genetic similarities with Sardinians, who resembled Neolithic farmers. Additionally, admixture models generally suggested that in the peninsulas of Southern Europe the steppe-like ancestry was the minority component, not the majority, as was the case in Northern Europe.
These data confirm it. The Bronze Age in Portugal saw a shift toward steppe-inflected populations, but it was not a large shift. There seems to have been later gene flow too. But by and large the Iberian populations exhibit some continuity with late Neolithic populations. This is not the case in Northern Europe.
In The Genomic History Of Southeastern Europe the authors note that steppe-like ancestry could be found sporadically during early periods, but that there was a notable increase in the Bronze Age, with later Bronze Age individuals carrying a higher fraction. Nevertheless, by and large it looks as if the steppe-like gene flow into the southerly Balkans (focusing on Bulgarian samples) was modest in comparison to the northern regions of Europe. Unfortunately I do not see any Bronze Age samples from Greece, but it seems likely that steppe-like influence came into these groups after they arrived in Bulgaria, which is more northerly.
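For intuition, the kind of mixture proportion being discussed can be estimated, in cartoon form, by least squares on allele frequencies. This is only a sketch of the idea; the preprints themselves use formal f-statistic machinery (qpAdm and relatives), and every frequency below is invented:

```python
# Toy two-source admixture model: if a target population's allele
# frequencies are a mixture p_target = a * p_src1 + (1 - a) * p_src2,
# the proportion a can be recovered by least squares.

def estimate_admixture(p_target, p_src1, p_src2):
    """Least-squares estimate of the source-1 fraction a in
    p_target ~= a * p_src1 + (1 - a) * p_src2."""
    num = sum((t - s2) * (s1 - s2)
              for t, s1, s2 in zip(p_target, p_src1, p_src2))
    den = sum((s1 - s2) ** 2 for s1, s2 in zip(p_src1, p_src2))
    return num / den

steppe    = [0.9, 0.1, 0.8, 0.2, 0.7]  # made-up source frequencies
neolithic = [0.1, 0.9, 0.2, 0.8, 0.3]
# A target population that is exactly 30% steppe / 70% Neolithic:
target = [0.3 * s + 0.7 * n for s, n in zip(steppe, neolithic)]

print(f"estimated steppe fraction: {estimate_admixture(target, steppe, neolithic):.2f}")
```

Real data add drift, additional sources, and sampling noise, which is why the formal methods are considerably more involved, but the underlying accounting is this simple.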
Down to the present day a non-Indo-European language, Basque, is spoken in Spain. Paleo-Sardinian survived down to the Common Era, and it too was not Indo-European. Similarly, non-Indo-European Pelasgian communities continued down to the period of city-states in Greece.
These long periods of coexistence point to the demographic parity (or even superiority) of the non-Indo-European populations. The dry climate of the Mediterranean peninsulas is not as suitable for cattle-based agro-pastoralism. This may have limited the spread and dominance of Indo-Europeans. Additionally, the Mediterranean peninsulas were likely touched by Indo-European migrations relatively late, and much of the early zeal for expansion may have dissipated by then. The high frequency of likely Indo-European R1b lineages among the Basques is curious, and may point to the spread of male-mediated patronage networks, and their assimilation into non-Indo-European substrates where necessary. R1b is also found in Sardinia, and at high frequencies in much of Italy.
The interaction and synthesis between native and newcomer was likely intensive in the Mediterranean. For example, of the gods of the Greek pantheon only Zeus is indubitably of Indo-European origin. Some, such as Artemis, have clear Near Eastern antecedents. But other Greek gods may come down from the pre-Greek inhabitants of what became Greece.
Ultimately these copious interactions and transformations should not be a great surprise. The sunny lands of the Mediterranean attracted Northern European tribes during Classical antiquity. The Cimbri invasion of Italy, Galatians in Thrace and Anatolia, the folk wandering of Vandals and Goths into Iberia, are all instances of population movements southward. These likely moved the needle ever so slightly toward convergence between Northern and Southern Europe in terms of genetic content.
In relation to the more general spread of Indo-Europeans, I believe there are a few areas like Northern Europe, where replacement was preponderant (e.g., the Tarim basin). But I also believe there were many more which presented a Southern European model of synthesis and accommodation.
The link is up, The Beaker Phenomenon And The Genomic Transformation Of Northwest Europe, but the paper is still processing:
I’ll update the post when I can read the paper.
Epistasis is one of those terms in biology which has multiple meanings, to the point that even biologists can get turned around (see this 2008 review, Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems, for a little background). Most generically epistasis is the interaction of genes in terms of producing an outcome. But historically its meaning is derived from the fact that early geneticists noticed that crosses between individuals segregating for a Mendelian characteristic (e.g., smooth vs. wrinkled peas) produced results conditional on the genotype of a secondary locus.
Molecular biologists tend to focus on a classical, and often mechanistic, view, whereby epistasis can be conceptualized as biophysical interactions across loci. But population geneticists utilize a statistical or evolutionary definition, where epistasis describes the extent of deviation from additivity and linearity, with the “phenotype” often being fitness. This goes back to early debates between R. A. Fisher and Sewall Wright. Fisher believed that in the long run epistasis was not particularly important. Wright eventually put epistasis at the heart of his enigmatic shifting balance theory, though according to Will Provine in Sewall Wright and Evolutionary Biology even he had a difficult time understanding the model he was proposing (e.g., Wright couldn’t remember what the different axes on his charts actually meant all the time).
These different definitions can cause problems for students. A few years ago I was a teaching assistant for a genetics course, and the professor, a molecular biologist, asked a question about epistasis. The only answer on the key was predicated on a classical/mechanistic understanding. But some of the students were obviously giving the definition from an evolutionary perspective (e.g., they were bringing up non-additivity and fitness)! Luckily I noticed this early on and the professor approved the alternative answer, so that graders would not mark down those using a non-molecular answer.
My interest in epistasis was fed to a great extent in the middle 2000s by my reading of Epistasis and the Evolutionary Process. Unfortunately not too many people read this book. I believe this because when I went to look at the Amazon page just now, it told me that “Customers who viewed this item also viewed” Robert Drews’ The End of the Bronze Age. As it happened I read that book at about the same time as Epistasis and the Evolutionary Process…and to my knowledge I’m the only person who has a very deep interest in both statistical epistasis and Mycenaean Greece (if there is someone else out there, do tell).
In any case, when I was first focused on this topic genomics was in its infancy. Papers with 50,000 SNPs in humans were all the rage, and the HapMap paper had literally just been published. A lot has changed.
So I was interested to see this come out in Science, Negative selection in humans and fruit flies involves synergistic epistasis (preprint version). Since the authors are looking at humans and Drosophila and because it’s 2017 I assumed that genomic methods would loom large, and they do.
And as always on the first read through some of the terminology got confusing (various types of statistical epistasis keep getting renamed every few years it seems to me, and it’s hard to keep track of everything). So I went to Google. And because it’s 2017 a citation of the paper and further elucidation popped up in Google Books in Crumbling Genome: The Impact of Deleterious Mutations on Humans. Weirdly, or not, the book has not been published yet. Since the author is the second to last author on the above paper it makes sense that it would be cited in any case.
So what’s happening in this paper? Basically the authors are looking for reduced variance in the count of really bad mutations, because a particular type of epistasis amplifies their deleterious impact (fitness is almost always really hard to measure, so you want to look at proxy variables).
Because de novo mutations are rare, they estimate that about 7 per person fall in functional regions of the genome (I think this may be high actually), and that their distribution should be Poisson. This distribution just tells you that the mean number of mutations and the variance of the number of mutations should be the same (e.g., if the mean is 5, the variance should be 5).
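The mean-equals-variance property of the Poisson null is easy to verify by simulation:

```python
import math
import random

random.seed(42)

def sample_poisson(lam):
    """Draw one Poisson(lam) variate via Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# 100,000 draws with mean 5: the sample variance should land near 5 too.
draws = [sample_poisson(5.0) for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(f"mean ~= {mean:.2f}, variance ~= {var:.2f}")
```

Any systematic departure of the observed variance below that equality is the signal the paper goes looking for.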
Epistasis refers (usually) to interactions across loci. That is, different genes at different locations in the genome. Synergistic epistasis means that the total cumulative fitness after each successive mutation drops faster than the sum of the negative impact of each mutation. In other words, the negative impact is greater than the sum of its parts. In contrast, antagonistic epistasis produces a situation where new mutations on the tail of the distributions cause a lower decrement in fitness than you’d expect through the sum of its parts (diminishing returns on mutational load when it comes to fitness decrements).
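One conventional way to formalize the two regimes is to write log fitness as a quadratic in the number of deleterious mutations n. The coefficients below are arbitrary illustrations, not estimates from the paper:

```python
import math

def fitness(n, a=0.05, b=0.0):
    """Fitness after n deleterious mutations:
    log w(n) = -(a*n + b*n**2). b > 0 is synergistic epistasis
    (each extra mutation hurts more than the last), b < 0 antagonistic
    (diminishing costs), b == 0 no epistasis (multiplicative effects)."""
    return math.exp(-(a * n + b * n * n))

n = 10
independent  = fitness(n, b=0.0)     # no epistasis
synergistic  = fitness(n, b=0.01)    # worse than the product of effects
antagonistic = fitness(n, b=-0.002)  # better than the product of effects

print(f"no epistasis: {independent:.3f}")
print(f"synergistic:  {synergistic:.3f}")
print(f"antagonistic: {antagonistic:.3f}")
```

With the same per-mutation cost, the synergistic genotype ends up far less fit than the independent-effects expectation, and the antagonistic one somewhat more fit.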
These two dynamics have an effect on the linkage disequilibrium (LD) statistic. This measures the association of two different alleles at two different loci. When populations are recently admixed (e.g., Brazilians) you have a lot of LD, because racial ancestry results in lots of distinctive alleles being associated with each other across genomic segments in haplotypes. It takes many generations for recombination to break apart these associations, so that the allelic state at one locus can’t be used to predict the odds of the state at what was an associated locus. What synergistic epistasis does is disassociate deleterious mutations. In contrast, antagonistic epistasis results in increased association of deleterious mutations.
Why? Because of selection. If a greater number of mutations means huge fitness hits, then there will be strong selection against individuals who randomly segregate out with higher mutational loads. This means that the variance of the mutational load is going to be lower than the mean.
How do they figure out mutational load? They focus on the distribution of loss-of-function (LoF) mutations. These are extremely deleterious mutations, the most likely to be a major problem for function and therefore a huge fitness hit. What they found was that the distribution of LoF mutations exhibited a variance that was 90–95% of the value expected under a null Poisson distribution. In other words, there was stronger selection against high mutation counts, as one would predict under synergistic epistasis.
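A cartoon simulation shows why selection against the most loaded individuals produces exactly this underdispersion. The truncation threshold here is an arbitrary illustration, not how real selection (or the paper’s analysis) works:

```python
import math
import random

random.seed(1)

def sample_poisson(lam):
    """Draw one Poisson(lam) variate via Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# Mutation counts under the neutral null (mean 7, variance 7)...
population = [sample_poisson(7.0) for _ in range(100_000)]
# ...then remove, by "selection", individuals in the high-load tail:
survivors = [n for n in population if n <= 11]

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

m, v = mean_var(survivors)
print(f"variance/mean after selection: {v / m:.2f}")
```

Trimming the upper tail pulls the variance down faster than the mean, so the survivors’ variance/mean ratio falls below 1, the qualitative pattern the authors report in the real data.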
Thus, the average human should carry at least seven de novo deleterious mutations. If natural selection acts on each mutation independently, the resulting mutation load and loss in average fitness are inconsistent with the existence of the human population (1 − e^(−7) > 0.99). To resolve this paradox, it is sufficient to assume that the fitness landscape is flat only outside the zone where all the genotypes actually present are contained, so that selection within the population proceeds as if epistasis were absent (20, 25). However, our findings suggest that synergistic epistasis affects even the part of the fitness landscape that corresponds to genotypes that are actually present in the population.
Overall this is fascinating, because evolutionary genetic questions which were still theoretical a little over ten years ago are now being explored with genomic methods. This is part of why I say genomics did not fundamentally revolutionize how we understand evolution. There were plenty of models and theories. Now we are testing them extremely robustly and thoroughly.
Addendum: Reading this paper reinforces to me how difficult it is to keep up with the literature, and how important it is to know the literature in a very narrow area to get the most out of a paper. Really the citations are essential reading for someone like me who just “drops” into a topic after a long time away….
Citation: Science, Negative selection in humans and fruit flies involves synergistic epistasis.