Based on the comments on my previous post, I'm going to lay out what I find to be a reasonable argument for sequencing studies in human disease:
Let's follow Goldstein's back-of-the-envelope calculations: assume there are ~100K polymorphisms that contribute to human height (assuming Goldstein isn't making the mistake I attribute to him, these include both common and rare polymorphisms), that we've found the ones that account for the largest fractions of the variance, and that these fractions of variance follow an exponential distribution.
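To make this concrete, here's a minimal simulation of that setup. The specific numbers (100K causal loci, an exponential distribution of variance shares, a heritability of 0.8) are illustrative assumptions on my part, not Goldstein's exact figures:

```python
import numpy as np

rng = np.random.default_rng(0)

n_loci = 100_000  # assumed number of causal polymorphisms
h2 = 0.8          # assumed heritability of height (illustrative)

# Draw each locus's share of the variance from an exponential
# distribution, then rescale so the shares sum to the heritability.
fractions = rng.exponential(size=n_loci)
fractions = np.sort(fractions)[::-1]  # largest effects first
fractions *= h2 / fractions.sum()

# Under these assumptions, even the biggest hits explain little
# individually, and most of the heritability sits in the long tail.
print(f"largest single locus: {fractions[0]:.2e}")
print(f"top 50 loci together: {fractions[:50].sum():.3f}")
print(f"top 5000 loci:        {fractions[:5000].sum():.3f}")
```

The point of the exercise: with this many loci, even the top of the distribution accounts for a small slice of the total, which is roughly the landscape the GWAS-vs-sequencing choice below plays out on.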
Now, assume you have assembled a cohort of 5000 individuals and done a genome-wide association study using common SNPs. You find some interesting things, but you want more. You now have two choices: sequence those 5000 individuals to look for rarer variation, or increase your sample size to 20,000 and perform another association study using the same set of common polymorphisms.
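A rough way to see what the extra 15,000 samples buy you: under an additive model, the association test statistic for a SNP is approximately normal with mean sqrt(N times the fraction of variance the SNP explains), so power at genome-wide significance can be sketched with a normal approximation. The 0.1%-of-variance SNP here is a made-up example:

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gwas_power(n, var_frac, z_crit=5.45):
    # z_crit ~ 5.45 corresponds to the usual genome-wide
    # significance threshold of alpha = 5e-8 (two-sided).
    # The test statistic is approximately N(sqrt(n * var_frac), 1).
    ncp = math.sqrt(n * var_frac)
    return norm_cdf(ncp - z_crit) + norm_cdf(-ncp - z_crit)

# Power to detect a SNP explaining 0.1% of the trait variance:
p5k = gwas_power(5_000, 0.001)
p20k = gwas_power(20_000, 0.001)
print(f"N =  5,000: power = {p5k:.4f}")
print(f"N = 20,000: power = {p20k:.4f}")
```

Quadrupling the sample takes you from essentially no chance of finding such a SNP to a non-trivial one, which is the appeal of the scale-up option.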
As Daniel MacArthur points out, you've not yet sucked every drop of marrow out of those 5000 individuals: there are presumably some (many?) rarer SNPs with modest effect sizes (in sense 2 from this post) that thus account for measurable (though still small) fractions of the variance in your trait. Those are low-hanging fruit for you to find if you pony up the cash for some sequencing (the price of which keeps dropping). This is especially true if more rare variants than common ones influence the trait, as is likely the case (there's more rare variation than common variation overall). So instead of spending on a larger sample, spend on sequencing, and have an impact now.
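Why can a rare variant still account for measurable variance? Under an additive model, a biallelic variant with allele frequency p and per-allele effect beta (in phenotypic standard deviations) explains 2p(1-p)beta^2 of the variance, so a rare allele with a large enough effect matches a common allele with a small one. The frequencies and effect sizes below are made up for illustration:

```python
def variance_explained(p, beta):
    # Additive model: fraction of phenotypic variance explained by a
    # biallelic variant with allele frequency p and per-allele effect
    # beta, measured in phenotypic standard deviations.
    return 2.0 * p * (1.0 - p) * beta ** 2

# A common variant with a small per-allele effect...
common = variance_explained(p=0.30, beta=0.05)
# ...and a rare variant with a much larger per-allele effect
# end up explaining similar fractions of the variance.
rare = variance_explained(p=0.005, beta=0.35)
print(f"common (p=0.30,  beta=0.05): {common:.5f}")
print(f"rare   (p=0.005, beta=0.35): {rare:.5f}")
```

This is why the rare variants sequencing turns up can be "modest effect" in the per-allele sense and still contribute measurably to the trait.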
Is this along the lines of the argument Goldstein is making? I don't really think so, but I welcome comment. In any case, the choice above is somewhat arbitrary: if you want to look for very rare variation, you need a sample size larger than 5000 anyways, and if you're sequencing, you're obviously not going to look only at the rare variants, since the common ones come along for free.