Posts with Comments by Daniel MacArthur
Another candidate gene association bites the dust
Hi Nanonymous,
The 5-HTT literature was sufficiently large for negative studies to have been published; but in general, if you plot the odds ratios for each published study on one of these associations they tend to start off high and then slowly converge to 1.0 over time.
Similar results occur in genetic association studies of other traits - e.g. from a large meta-analysis of Alzheimer's disease genes: "Overall, 19 meta-analyses (all with eventually nonsignificant summary effects) had a documented excess of O over E: Typically single studies had significant effects pointing in opposite directions and early summary effects were dissipated over time." Here, "excess of O over E" means an excess of observed significant results over the expectation given the estimated effect size (calculated by combining all studies).
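This convergence is easy to reproduce in a toy simulation: generate case-control studies of a truly null variant, "publish" the early small studies only when they look striking, and watch the cumulative odds ratio drift back towards 1.0. A minimal sketch - the study sizes and the crude publication filter are illustrative assumptions, not numbers from any real literature:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_log_or(n_per_arm, allele_freq=0.3):
    """Log odds ratio from one simulated case-control study of a
    truly null variant (true OR = 1.0)."""
    a = rng.binomial(2 * n_per_arm, allele_freq)   # risk alleles in cases
    c = rng.binomial(2 * n_per_arm, allele_freq)   # risk alleles in controls
    b, d = 2 * n_per_arm - a, 2 * n_per_arm - c
    # Haldane correction (+0.5 per cell) guards against empty cells
    return np.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))

published = []
for i in range(50):
    n = 100 if i < 20 else 1000   # early studies small, later ones large
    log_or = simulated_log_or(n)
    # Toy publication filter: early on, only "striking" results see print
    if i >= 20 or abs(log_or) > 0.3:
        published.append(log_or)

# Cumulative meta-analytic OR as each newly published study is added
cumulative = np.exp(np.cumsum(published) / np.arange(1, len(published) + 1))
print(f"published: {len(published)} of 50 studies")
print(f"final cumulative OR: {cumulative[-1]:.2f}")  # drifts back towards 1.0
```

The early era contributes only inflated estimates (because the null results stay in the file drawer), so the first entries of the cumulative series sit away from 1.0; the later unconditionally published studies pull it home.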
To be fair, having had a chance to scan through the MAOA literature in a bit more detail this morning, I concede that the evidence for an association between the VNTR variant and antisocial behaviour is substantially more consistent than most of these associations. This may well be one of the rare cases of genuine associations stumbled across by researchers doing candidate gene studies; but such cases are very much the exception rather than the rule.
p-ter is spot on here. The vast, vast majority of published studies in the psychiatric genetics literature prior to the advent of GWAS (i.e. almost all of it) are certainly wrong. The wide-scale "replication" of variant-trait associations predominantly reflects publication bias rather than indicating that these associations are genuine.
In fact, simply discarding the last fifteen years of this literature would have a net positive effect on science as a whole: the effort expended on chasing up false positives far outweighs the benefits of the few genuine associations stumbled across by researchers. We'd be better off simply ignoring it all and waiting for GWAS and (eventually) whole-genome sequencing studies to give us a clear picture of the genetic architecture of these traits.
As for MAOA: I wouldn't bet my house on it failing to turn up in well-powered GWAS for violent behaviour (as p-ter predicts it will), but I'd say the odds of that outcome are much better than 50%.
An argument for searching for rare variants in human disease
Sure - chips are already being designed based on 1KG pilot data. Given Illumina's speedy production cycle I'd guess these will be available in late 2009, but bear in mind that pilot data = low coverage sequencing of 60 individuals per HapMap population, so it will still be missing a pretty large proportion of the 0.1-1% variants.
To get a better handle on those we'll have to wait for the final 1KG data (500 individuals per pop, including high coverage of all exons), which I guess would be converted into chips some time in 2010. Of course, if sequencing costs keep falling at current rates we may never end up using those chips...
we've found the ones that account for the largest fractions of the variance, and that these fractions of variance follow an exponential distribution
Not sure about the first part of that sentence - we haven't necessarily found low-frequency SNPs that account for a substantial chunk of the variance. For instance, a 1% variant would be largely invisible to current GWAS, even if it had a large effect size (let's say a per-allele increase of 1 SD, or 2 inches of height, which would mean it explained 2% of the total variance in height - very respectable compared to most common variants). No doubt we've already picked up some of these through tagging (in which case they would look like common SNPs with small effects), but there would be plenty that we've missed.
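For concreteness, the arithmetic behind that hypothetical can be written out: under a simple additive model, a biallelic variant with allele frequency p and per-allele effect a (in phenotypic SD units) explains 2p(1-p)a² of the trait variance. A quick sketch, where the contrasting common-variant numbers are my own illustrative assumptions:

```python
def variance_explained(freq, beta_sd):
    """Fraction of phenotypic variance explained by an additive biallelic
    variant: 2p(1-p) * beta^2, with beta measured in phenotypic SD units."""
    return 2 * freq * (1 - freq) * beta_sd ** 2

# The hypothetical above: a 1%-frequency allele adding 1 SD (~2 in) of height
print(f"{variance_explained(0.01, 1.0):.4f}")   # 0.0198, i.e. ~2% of the variance

# For contrast, a typical common GWAS hit (illustrative numbers):
# 30% frequency, 0.05 SD per allele
print(f"{variance_explained(0.30, 0.05):.5f}")  # 0.00105, i.e. ~0.1%
```

So the rare allele in this example carries roughly twenty times the variance of the common hit, while remaining far too rare to tag well on current genotyping chips.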
So we have a whole swathe of variants for which we don't currently know the empirical distribution of effect sizes, but for which there are fairly good theoretical reasons (i.e. selection against common risk variants) to expect larger per-allele effects. Perhaps Goldstein believes that the per-allele effect sizes of these rare variants won't follow the same exponential distribution, in which case fewer overall variants would be required and sequencing will have a dramatically better yield than GWAS. I don't know enough quant genetics to know if this is at all plausible.
As for Goldstein's overall argument - my impression is that he wants the money currently going into ever-larger GWAS (now approaching 100,000 individuals for some diseases and traits, with an overall cost probably exceeding $1000 per individual counting labour, infrastructure etc.) to be diverted instead into both the development and the application of sequencing technology.
I'm fairly skeptical about this argument; there isn't necessarily a conflict between the two approaches. Performing GWAS while sequencing technology matures (which is happening incredibly rapidly anyway) seems a good way to go; it provides some yield in terms of risk variants, and it justifies to funding bodies the collection of the large, well-phenotyped sample sets that will be required for sequencing anyway. (And related to Steve's point: most GWAS consent forms now include broad consent for other analyses including whole-genome sequencing.)
One last point:
if you want to look for very rare variation, you need a sample size larger than 5000 anyways
Not necessarily. This is definitely true if you want to obtain significance for a single variant, but assuming that rare disease-causing variants tend to cluster in disease-related genes you don't necessarily need to do this; you can just treat each protein-coding gene as a unit and then aggregate all of the low-frequency coding/regulatory variants within it and treat them as a single common variant. This is insensitive for all sorts of reasons, but it might pick up genes harbouring an excess burden of rare risk variants that no single-variant test could detect.
Notes on the Common Disease-Common Variant debate: two years later
Hey p-ter,
I'm not sure that Goldstein actually directly conflates per-allele effect size with the fraction of variance explained.
I can't say for sure, but I imagine he's thinking about variants at a frequency of perhaps 0.1% with a per-allele effect of, say, 0.5 SD (about an inch of adult height, IIRC). Each such variant would explain about 0.05% of the population variance and yet would be essentially completely undetectable using current GWAS.
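To make "essentially completely undetectable" concrete, here's a rough power sketch under a standard additive model for a quantitative trait. The sample sizes and the significance threshold are my own assumptions, and the normal approximation is crude, but the orders of magnitude are the point:

```python
from math import sqrt, erf

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def gwas_power(freq, beta_sd, n, z_threshold=5.45):
    """Approximate power of a 1-df additive association test for a
    quantitative trait. The non-centrality parameter is n times the
    variance explained; z_threshold ~ 5.45 corresponds to genome-wide
    significance (two-sided p < 5e-8)."""
    var_explained = 2 * freq * (1 - freq) * beta_sd ** 2
    mu = sqrt(n * var_explained)   # expected Z statistic under H1
    return normal_cdf(mu - z_threshold) + normal_cdf(-mu - z_threshold)

# The hypothetical above: 0.1% frequency, 0.5 SD per allele (~1 inch of height)
print(f"variance explained: {2 * 0.001 * 0.999 * 0.5**2:.5f}")  # 0.00050, i.e. ~0.05%
print(f"power at n = 10,000: {gwas_power(0.001, 0.5, 10_000):.4f}")
print(f"power at n = 100,000: {gwas_power(0.001, 0.5, 100_000):.2f}")
```

Even if such a variant were directly genotyped, a 10,000-person GWAS would have well under 1% power to reach genome-wide significance; and since 0.1% alleles are poorly tagged by chip SNPs anyway, the realistic power is lower still.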
There are various other permutations of frequency and effect size that would have the same sort of effect - basically, a lot of the variants that fall (in terms of both frequency and per-allele effect size) somewhere in the vast grey area between completely penetrant Mendelian variants and common variants with very low ORs.
Will these variants explain all of the missing heritability? I strongly doubt it, but I do think it's reasonable to expect them to explain a reasonable chunk of it - and as I said at the end of my post, because of their large effect sizes these variants will actually be much more useful than common low-OR variants at informing individual health predictions.
Inbreeding over time
Yuck... that'll teach me to post on an article without checking the supplementary data first.
Like you say, the result seems plausible (in fact I'd be shocked if there wasn't SOME decrease in autozygosity over time), but it's a good example of highly selective reporting of data. I'm a little surprised it got past the referees.