Tuesday, January 26, 2010
There's a bit of press surrounding the interesting result from David Goldstein's group that, in certain situations, a number of "rare" (defined as an allele frequency less than 5%) variants influencing a trait can lead to an association signal at "common" SNPs. The authors call this phenomenon a "synthetic association".
The authors claim this is potentially the cause of many of the associations found in genome-wide association studies (with common SNPs), as well as a potential solution to the "missing heritability problem" (this isn't mentioned in the paper itself, but rather in a Times article describing it). In other words, this could be a panacea for all the ills of the human genetics community. Unfortunately, this seems rather unlikely.
1. There is a range of parameter values for which "synthetic associations" are plausible--where the effect of the rare variants is small enough to have avoided detection by linkage studies but big enough to show up via correlation with common variants. This range of parameters is kind of small--from Figure 2, it looks like maybe a set of mutations at a gene with a genotypic relative risk greater than 2 but less than 6. Will this be the case for some loci? Sure, that sounds plausible. Is it going to explain everything? No, of course not.
2. It has been pointed out (rightly) that diseases that are selected against should have their genetic component enriched for rare variants. Goldstein himself has made this argument about diseases like schizophrenia. So if schizophrenia has all these rare variants, and rare variants cause rampant "synthetic associations" at common SNPs, why hasn't anyone picked up whopping associations using common SNPs in schizophrenia?
3. The sickle cell anemia example, as presented in the paper, is extremely misleading. It seems the authors did a simple case-control test for sickle cell in an African-American population. Recall that African-Americans are an admixed population, with each individual carrying large chunks of "European" and "African" chromosomes. Anyone with sickle cell will have at least one block of African chromosome surrounding the beta-globin locus, while those without will have two chromosomes sampled from the overall distribution of chromosomes in the population--15-20% of which, approximately, will be of European descent. So any SNP with an allele frequency difference between African and European populations in this region will show up as a highly significant association with the disease due to the way they've done the test, and these associations will extend out to the length of admixture linkage disequilibrium--well, well beyond the LD found in African populations alone. The presentation of this example in the paper--the large block of association contrasting with the small blocks of LD in the Yoruban population--is a bit silly.
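To put rough numbers on the admixture argument in point 3, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: the function names are made up, the SNP frequencies (10% in African chromosomes, 60% in European ones) and the sample sizes are invented, and the model is the crude one described above (each case carries one guaranteed African chromosome at the locus plus one chromosome drawn from the population at large, and the SNP is close enough that ancestry has not been broken up by recombination). It is not the test the authors ran.

```python
def control_allele_freq(p_afr, p_eur, m_eur):
    """Expected allele frequency in controls: both chromosomes are drawn
    from the admixed population, a fraction m_eur of which is European."""
    return (1.0 - m_eur) * p_afr + m_eur * p_eur


def case_allele_freq(p_afr, p_eur, m_eur):
    """Expected allele frequency in cases under the simplifying assumption
    that one chromosome is always African at this locus and the other is
    drawn from the admixed population."""
    return 0.5 * p_afr + 0.5 * control_allele_freq(p_afr, p_eur, m_eur)


def allelic_chi_square(f_case, f_control, n_case, n_control):
    """Pearson chi-square (1 df) on the 2x2 table of expected allele counts,
    with two alleles counted per individual."""
    a = 2 * n_case * f_case            # cases, allele present
    b = 2 * n_case * (1 - f_case)      # cases, allele absent
    c = 2 * n_control * f_control      # controls, allele present
    d = 2 * n_control * (1 - f_control)  # controls, allele absent
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Hypothetical SNP: 10% in African chromosomes, 60% in European ones,
    # with 20% European ancestry in the population (upper end of 15-20%).
    f_ctrl = control_allele_freq(0.10, 0.60, 0.20)
    f_case = case_allele_freq(0.10, 0.60, 0.20)
    chi2 = allelic_chi_square(f_case, f_ctrl, n_case=1000, n_control=1000)
    print(f"controls: {f_ctrl:.3f}  cases: {f_case:.3f}  chi-square: {chi2:.2f}")
```

The frequency gap between cases and controls works out to half the European ancestry fraction times the African-European frequency difference, so it is nonzero for any SNP that differs between the two ancestral populations, whether or not that SNP has anything to do with the disease. The chi-square grows with sample size, which is the point: the signal reflects local ancestry, not a causal variant.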
If I had to guess, and put a concrete bet on how this will play out, let's take the associations listed in their Table 1, which they call candidates for being due to synthetic associations. My bet: none of them are. Ok, maybe one.
 These sorts of thresholds are important to watch--in a year people will be calling things at 1% frequency "common" if it suits them for rhetorical purposes.
 Corrected from: "... will have two large blocks of "African" chromosomes surrounding the beta-globin locus, and everyone without will have at least one European chromosome in the same area"; see comments.