Hypotheses are overrated

So says the European Journal of Human Genetics, in response to the flood of data from genome-wide association studies and other genomic work in human genetics:

[O]ne might maliciously wonder if we are not (temporarily, in this field and pending subsequent functional studies) close to the ultimate consumption date of the Popperian approach of hypothesis-driven research. For was not a main goal of this to unravel the truth in the most efficient, that is, plausible way, faced with a daunting scarcity of collectible data? Well, if it becomes cheaper to just collect all data required than to run after a hundred consecutive, plausible, but wrong hypotheses, starting with a hypothesis becomes an economic futility. The hypothesis as a guiding principle is then replaced by a truism: if one does not throw away anything before thoroughly assessing its irrelevance, one will always find what one is looking for.

17 Comments

  1. I suppose this makes sense if we’re only talking about first-level phenotypic instances such as diabetes contra colon cancer. But let’s take the claim “homosexuality is genetic”. Are we supposed to think that there’s a specific allele that is the “homosexual allele”? At some level of abstraction above the genotype, we’re going to need to offer some sort of hypothetical groupings of lower-level phenotypes in order to establish a geno-pheno link to higher-level phenotypes. For example, testosterone is a phenotype, not a genotype. Now let’s say that a relative propensity toward polygamy (higher-level phenotype) in a male is due to higher-than-average testosterone, low time-benefit behavior, and an r-mating strategy relative to other human population groups.  
     
    This is all just off the top of my head because something just feels wrong about the claim. 
     
    I think I remember reading somewhere that during an infection the particular bacterial strain mutates to counter the individual’s immune defenses and that the same genetic strain infecting two different people at the same time would have different genetic mutations at the end of the infection. So, are we supposed to make up different labels for every different infection? Categorization is an inherently hypothetical activity.

  2. Oh, goodness. How moronic. If your goal is to publish replicable association studies with vanishingly small P-values, perhaps the hypothesis is dead — and was never alive to begin with. 
     
    If your goal is to cure/prevent diabetes, or understand its biology, may I humbly submit that hypotheses will be needed?

  3. If your goal is to cure/prevent diabetes, or understand its biology, may I humbly submit that hypotheses will be needed? 
     
    oh, of course (there are caveats in the quote: “temporarily, in this field and pending subsequent functional studies”). but one road to understanding the biology of a trait is hypothesis-free– simply map its genetic variation. this is exploratory data analysis at its best–instead of hypothesizing about the biology of diabetes and finding candidate genes to test, let the data find the hypothesis for you.

  4. well, a GWAS conducted by current methods starts with the hypothesis that there are common variants of moderately large effect size that contribute to common variation in a trait. it’s a rather broad hypothesis, but one which is being tested with every GWAS. 
     
    moreover, at the end of the GWAS, you hopefully end up with many more hypotheses to test. 
     
    however, i think the point being made is that in an era of high-throughput biology, the scope and scale of a reasonable, testable hypothesis needs to be reconsidered.

  5. in an era of high-throughput biology, the scope and scale of a reasonable, testable hypothesis needs to be reconsidered. 
     
    exactly.

  6. If your goal is to cure/prevent diabetes, or understand its biology, may I humbly submit that hypotheses will be needed? 
     
    I agree. It’s important for us to develop a systemic picture from -omics, sure, but hypotheses are necessary so that research can be focused into the most important areas.

  7. This is an old dispute in the history of science – see Science as a Process by David L Hull in relation to ‘phenetics’ (BTW I regard SaaP as probably the best book ever written about science).  
     
    This sort of claim, i.e. that hypotheses are no longer necessary, comes in the wake of any powerful new technique whereby scientists who use the technique can generate prestigious publishable results (?merely) on the basis of using pre-existing (often implicit, internalized) hypotheses.  
     
    Actually, they are still testing hypotheses, but may not be aware of the fact.  
     
    So long as data using the new technique is easily publishable, then people can continue the delusion that they are merely data-driven.  
     
    But when the competition in the field hots up, as it will, then the people with the best hypotheses will begin to pull ahead… 
     
    Declaration of interest – I am editor of a journal called Medical Hypotheses!

  8. If the authors are concerned about the problem of running after hundreds of hypotheses, what will they think about running after billions of seemingly independent data points? 
     
    Charlton is right that the hypothesis is implicit, as in a paradigm. When the paradigm outlives its usefulness, they will come crying to Popper.

  9. GWAS don’t dispense with hypotheses, quite the contrary. What a GWAS does is, essentially, test thousands of “micro-hypotheses” all at the same time, where each hypothesis has the form “alleles a, b, c at loci x, y, z are associated with phenotypic trait Phi”. So the old paradigm is inverted: instead of one big, heavy hypothesis tested by many small repetitive experiments, you have many little hypotheses tested by one big experiment. 
     
    Of course you’re still left with the central problem of association studies, i.e. how to be sure that your link is actually causal rather than an artifact. Nothing changed there.

  10. The “hypothesis-free” approach can be justified in part by the failure of candidate-gene approaches to predict the major loci found by replicable GWASs. And the author’s caveats are well-taken. I’m reacting to the somewhat sensationalist title, which should have instead said, “Hypotheses are overrated for genome-wide association studies.” 
     
    One might ask whether there is any generality to the claim that once you have sufficient scale, you might as well throw out insight and do brute-force attack. If we could (if we had enough people, that is), would it be better to uncover the biology of diabetes and extract prevention/cure from that knowledge, or throw the Sigma-Aldrich catalog at a sufficiently large sample and feed everyone else whatever worked? 
     
    So long as you have infinite resources, brute-force generally wins. But it’s expensive, and that, I think, is the major failing of hypothesis-free research (although compared to pursuing irrelevant hypotheses, the cost-per-useful result may be much higher!). 
     
    Roughly this large-scale, insight-free approach was tried by Big Pharma, and they are still reeling from its failure. It was the great robotic hope: combinatorial chemistry. A nice quote from a lead Bristol-Myers scientist: 
     
    You end up making things that you can make, rather than what you should make. 
     
    The same is likely true for GWAS — we end up testing what we can, rather than what we should. Given our failures of insight, that’s a good and realistic approach (“temporarily, in this field”). But rather than accepting that hypotheses are overrated, a point of view which naturally leads people to abandon hypothesis-generation, I believe the proper response is to figure out what’s wrong with present methods of hypothesis-generation so that we can fix/replace them. The editorial I would have written would bear the title, “Why are our hypotheses so crappy?”

  11. I believe the proper response is to figure out what’s wrong with present methods of hypothesis-generation so that we can fix/replace them. The editorial I would have written would bear the title, “Why are our hypotheses so crappy?” 
     
    well, in the field at hand, I wouldn’t look at it that way. what hypothesis-generating mechanism would lead you to test gene deserts for association with crohn’s disease, for example? or a gene of unknown function for association with obesity? the knowledge of biology just simply isn’t there–these would be major leaps. better to just iterate through the search space.  
     
    the point about GWAS having implicit assumptions/hypotheses is well-taken, and I agree major advances will come through people examining these assumptions more carefully and finding methods to get around them. but for the moment, in this field, the technology is completely driving the biology in a largely hypothesis-free manner, and I think it’s fun :)

  12. Genomic association studies seem formally similar to genetic screens for mutations that alter a specific biological function of interest (e.g. embryonic body plan in Drosophila). Both hinge on the idea that there are indeed genes controlling the process. If they succeed, then the genes identified are candidates for deeper, hypothesis-based study. The genetically identified genes have the advantage that we know they actually contribute to the normal process at some level. 
     
    Human diseases are more problematic as the mapping is harder and the mutations are catch-as-catch-can. For these, association with particular SNPs is a powerful way to localize candidate genes, especially with a sufficient pool of genomes to scan. Even in the case of implicated SNPs, I’m willing to bet that if there is a nearby gene with a series of well defined motifs, and one that is lacking or short of well defined domains of known function, all the action will go to the gene that looks like something we already know. 
     
    By this logic, association studies, like large screens for new mutations, are the beginning of the process, not the end. For human traits, association is a powerful way to search for candidate genes and should be used well and often. As with genetic screens, some loci will, for various reasons, seem better candidates for initial study. In the case of associations, the stronger the association with the trait (which already requires a substantial data set on the phenotypes of people in the study), the more worthy of immediate study. In any case, from that point on, this is still Popper in action. 
     
    One final comment. Those who do genomic/proteomic/computational studies often introduce their talks by touting their methods over hypothesis-testing approaches, and then proudly describe what they do as “Discovery Research.” For those in the audience who have been making discoveries for some time without calling their research “Discovery Research,” this is irritating arrogance. This feeling is compounded at the end of many of these seminars by the paucity of any meaningful conclusions or take-home messages other than, “well, we have some genes that might be involved”, or “some potential regulatory sites”, or “some genes are expressed more and some are expressed less.”

  13. In any case, from that point on, this is still Popper in action. 
     
    yes, absolutely. and I think the comparison to genetic screening is apt. when nusslein-volhard did her legendary massive screens, I wonder if she was criticized for not having any particular hypothesis in mind beforehand. I think it’s fair to say it turned out alright in this case :)

  14. what hypothesis-generating mechanism would lead you to test gene deserts for association with crohn’s disease, for example? or a gene of unknown function for association with obesity? 
    I think we may be in violent agreement. I’m saying our previous way of generating hypotheses (namely, thinking hard and extrapolating from anecdotal clinical results) has largely failed. However, does anyone doubt that there is a good biological reason for some gene-desert regions to be associated with Crohn’s disease? The value of GWASs is to provide an unbiased evaluation of the previously dominant approach, and they have found that approach wanting. Instead of abandoning the approach, why not use GWASs to fix it? Unless you believe that *there are no new predictive principles to be learned*. The goal should be to obviate expensive GWASs where the vast majority of results are “no effect”, and replace them with something more efficient in cost per positive result. 
     
    And how will GWASs fare for traits involving dose-dependent epistatic interactions between 5 loci? There aren’t enough humans in the world to sequence. Delight in brute-force while it lasts — I’m with you — but I’d be leery of letting that glee lead to devaluation of genuine insight. GWASs and their ilk should be seen as ways to build insight, not as insight’s alternative.

  15. Instead of abandoning the approach, why not use GWASs to fix it? Unless you believe that *there are no new predictive principles to be learned*. 
     
    actually, I think we’re pretty much in complete agreement! this is the exploratory data analysis thing I linked to before–generate tons of data pre-hypothesis, then play around to find the hypotheses worth testing. the idea is that this approach will lead to broader predictive principles.

  16. in fact, I’d bet that in the future, any study of a trait in humans will start from something akin to a GWA study–either mining existing datasets or conducting your own. if you want to understand the biology of a trait, a well-conducted screen of the genome will narrow down “hypothesis space” quite a bit.

  17. *nod*. damn, concurrence is so boring. ;)
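
Two points from the thread can be put in rough numbers: comment 9's picture of a GWAS as many "micro-hypotheses" tested in one experiment, and comment 14's worry about interactions between 5 loci. The sketch below is only an illustration. The marker count (500,000) and the family-wise error rate (0.05) are assumptions chosen for the example, not figures from the editorial or the comments, and the Bonferroni correction is one common convention rather than the method any particular study used.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold when n_tests hypotheses share one error budget."""
    return alpha / n_tests


def n_joint_hypotheses(n_loci: int, k: int) -> int:
    """Number of distinct k-locus combinations that could be tested for interaction."""
    return comb(n_loci, k)


# Assumed values, for illustration only.
N_SNPS = 500_000   # hypothetical number of genotyped markers
ALPHA = 0.05       # hypothetical family-wise error rate

if __name__ == "__main__":
    # One micro-hypothesis per marker: the per-test threshold shrinks accordingly.
    print("single-locus tests:", N_SNPS)
    print("per-test threshold:", bonferroni_threshold(ALPHA, N_SNPS))

    # Five-way interactions: the hypothesis space grows combinatorially.
    print("5-locus combinations:", n_joint_hypotheses(N_SNPS, 5))
```

Under these assumptions the single-locus scan is a large but manageable set of tests, while the count of 5-locus combinations is so large that exhaustive testing is out of reach for any realistic sample. That is one way to read the claim that brute force works "temporarily, in this field".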