Posts with Comments by p-ter
How much of the genome is transcribed? Or, the utility of a good genome browser
yeah, i was looking at hg18. the encode pseudogene track seems to be the same thing i'm linking to, except limited to the encode pilot regions. odd.
How do scientists distinguish introns and exons?
if you sequence a processed mRNA, you can align the sequence back to the genome--the things included in your sequence are exons, and the big gaps are introns.
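a toy sketch of that logic, assuming the alignment has already been reduced to a sorted list of aligned blocks in genome coordinates. the function name and the coordinates are made up for illustration; a real aligner reports this in its own format.

```python
def introns_from_exons(exon_blocks):
    """Given aligned mRNA blocks as half-open (start, end) genome coordinates,
    return the gaps between consecutive blocks, i.e. the inferred introns."""
    blocks = sorted(exon_blocks)
    introns = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        if next_start > prev_end:  # stretch of genome the mRNA skips over
            introns.append((prev_end, next_start))
    return introns


# hypothetical coordinates for a three-exon transcript
exons = [(100, 200), (500, 650), (900, 1000)]
print(introns_from_exons(exons))
```

the aligned blocks are the exons; whatever genome sequence sits between two consecutive blocks is called an intron.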
Genome-wide association studies work
you know, it's possible to argue about the relative importance of rare variants versus common variants in different diseases. fair enough. McClellan and King, however, are arguing that many (most?) associations between common polymorphisms and diseases are false positives. this is a very different point, and one that is (luckily) wrong.
a few examples of mistakes:
Many SNPs, insertions/deletions (indels), and short tandem repeats vary widely in allele frequency among populations. This is especially true for variants that are not in coding or regulatory regions because these alleles vary with population clusters in patterns more consistent with neutral drift and migration rather than with selection (Coop et al., 2009). The colonization of the world by modern humans was carried out by a series of founder populations with subsequent rapid expansion of population size. Neutral alleles emerging at the forefront of these expansions “surfed” waves of population growth. Variations in allele frequencies across populations stem from differences in the timing of the variant's emergence in the expansion.
I have no idea what to make of this paragraph. First, it's not clear what this has to do with GWAS. Yes, sometimes allele frequencies are different between Africa and Europe, but this should not affect a GWAS in Europeans, for example. Second, Coop et al. (2009) does not claim that most large allele frequency differences between populations are neutral (though again, why does it matter for GWAS whether they're neutral or not?). Third, most genetic variation in humans was present in Africa before humans expanded out, so the variation in allele frequencies has absolutely nothing to do with the "timing" of the emergence of the allele.
We suggest that associations based on such highly variable SNPs are often artifacts of cryptic population stratification. Wang et al. argue that standard GWAS strategies have been adopted to control for population stratification. However, these methods control by person, not by SNP. Because populations from large geographic areas (e.g., Europe) are genetically heterogeneous, outlier SNPs that vary widely among subgroups of such populations are not excluded by these methods and often drive positive associations.
My emphasis. Where is the evidence of this? How can you make an off-hand comment like, "Oh, by the way, i've found the fundamental flaw in modern genetics", when no one agrees with you and you have no evidence? For the case of the autism locus discussed, Wang et al. make a very clear case that 1) it's not extremely variable between populations, and 2) it wouldn't really matter if it were. For example, are all the 96 loci identified in the lipids study false positives due to population structure? Of course not; probably none of them are.
Both Klein et al. and Wang et al. suggest that the vast majority of GWAS risk alleles are in LD with causal mutations, and that intergenic and i
so how does this happen? they have to pass their letters to some other people for second opinions, right?
in this case, I think they must just be insulated. and since it's a response to letters to the editor, i don't think anyone else has to see it before it's published.
My reading is that neither rebuttal letter addressed McClellan and King’s point about lack of reproducibility between studies — and this new study of blood lipid loci may be as false-positive-prone as others cited in the original editorial and the counter-rebuttal.
the current standard in GWAS is to reproduce an association in multiple cohorts before publication (in the lipid study, the authors go even beyond this and replicate the associations in several non-European populations). McClellan and King are simply misinformed about how reproducible these results are.
I’m guessing that most GWAS investigators had their fingers crossed for coding region mutations (like any other forward geneticists)
yes, i imagine they were. that said, biology is complicated, such is life. Instead of being disappointed, I think instead that this is an exciting time for human biology.
Second, as an experimental biologist, I share McClellan and King’s skepticism that effects with odds ratios in the 1.33 range will ever be understandable in a mechanistic sense.
fair enough. time will tell; i'm an optimist. look at the paper i linked to above, for example. it's a great example of the cool biology that can come out of these things
http://www.nature.com/nature/journal/v466/n7307/abs/nature09266.html
Is the “missing heritability” right under our noses?
Only additive genetic variance contributes to the (narrow-sense) heritability. So non-linear effects are not directly relevant here, though they may be important in other situations.
The Times on the human genome at 10
that one's a classic :)
“We started with a very strong bias against mixture”
There was no positive genetic data of admixture, even after the first analyses of nuclear loci, and you can make non-silly justifications about why there wouldn't be any (maybe hybrids had very decreased fitness, for example--even an advantageous locus has to last through the first few generations when it's linked to the rest of the introgressed genome).
Or maybe they just convinced themselves there wasn't any admixture so they could enjoy the surprise of finding it :)
eh, if you look for evidence of mixture (as they did previously, using a bunch of nuclear loci, not just mtDNA) and don't find it, you come up with reasons why not. obviously their prior belief about the probability of admixture wasn't 0, or they wouldn't have even bothered looking. so maybe your prior was 90% on admixture and theirs was 10%; fair enough.
How do non-genic polymorphisms influence disease risk?
but I can’t help thinking that if we are seeing the majority of these mutations in regions with no obvious biological significance, then the big question is why there and not as frequently in the regions that do have biological significance?
so take the example of celiac disease in my point 2. I don't know how many of the SNPs they identified would have "obvious" biological significance if you were to look at them; my guess is few of them. But that's beside the point--when they went and assayed gene expression in a relevant tissue, they found that the SNPs influenced it (or half of them did). Now it doesn't matter if the SNPs have any "obvious" function a priori--they clearly have some function we're not good at identifying yet from genome sequence alone.
Or is it as the authors suggest – that because the effects tend to be on the order of an odds ratio of 1.5 or less, what we are seeing could as easily be due to cryptic population stratification causing spurious associations?
this is the most bogus of the authors' claims, since it's readily testable. In fact, people have looked to see whether SNPs associated with disease show more population structure than random SNPs (in the context of wondering if the structure is due to selection, but the point remains). In general, the answer is no: SNPs associated with disease look about the same as your average SNP, with a few interesting exceptions. see here:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440747/
http://genome.cshlp.org/content/19/5/826
the authors identified a single SNP (one of the signals from a GWAS that hasn't replicated) that they think might be due to population structure, but this is clearly a cherry-picked example (and not even a conclusive one at that)
What comes to your point three, I’d be more optimistic and say the two papers below pretty much nailed the colorectal cancer risk function for the SNP on 8q24
good point, i'd forgotten the exact results from those two papers
Good question. I think the data is there, but I don't think anyone has done a really careful analysis of this sort. maybe something like this is the closest that's been published:
http://www.pnas.org/content/106/23/9362.long
this question would definitely be worth revisiting.
interesting. I gave McClellan and King the benefit of the doubt and assumed they were using the standard shorthand when describing an association (i.e., I thought when they said "associated SNPs have no known function" they meant "neither the associated SNPs, nor any correlated SNP in the region, have a known function").
On re-reading, I think you might be right that they're actually wondering why all associated SNPs aren't functional. On the other hand, they're certainly familiar with linkage studies, which also use random polymorphisms as markers to track the inheritance of functional ones, so they must see the analogy, right?
Common versus rare variants, again
strictly speaking, every allele has some selection coefficient. the number of alleles with selection coefficient X at some frequency Y depends on the population size, the mutation rate, blah, blah...i'm sure you're familiar with this. alleles with a small effect on risk are nearly neutral.
>As it happens, very few such associations have emerged from GWAS for psychiatric disorders, indicating a small contribution of common variants to overall phenotypic variance
again, this is a non sequitur--an alternative is that common variants have very small effects.
>On the other hand, an increasing number of very rare mutations with large effects have been and continue to be discovered
this is certainly true. but rare mutations, by definition, contribute very little to the overall phenotypic variance. let's say we do full genome resequencing of 500 schizophrenia cases and 500 controls (presumably this will happen in the next few years). A number of interesting things will be identified, without a doubt. What fraction of the phenotypic variance do you predict these things will explain?
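to put rough numbers on "very little", here is a sketch using the standard additive result that a biallelic variant at frequency p with per-allele effect a (in phenotypic standard deviations, on whatever scale you're modeling) contributes 2p(1-p)a^2 to the variance, assuming Hardy-Weinberg proportions. the function name, frequencies and effect sizes are invented for illustration, not taken from any study.

```python
def variance_explained(freq, effect_sd):
    """Additive variance contributed by one biallelic variant.

    freq      -- allele frequency p (assumes Hardy-Weinberg proportions)
    effect_sd -- per-allele effect a, in phenotypic standard deviations
    Returns 2 * p * (1 - p) * a**2, i.e. a fraction of total variance
    when the phenotype is scaled to variance 1.
    """
    return 2 * freq * (1 - freq) * effect_sd ** 2


# hypothetical variants, chosen only to illustrate the scaling
rare_large = variance_explained(freq=0.0005, effect_sd=1.0)
common_small = variance_explained(freq=0.3, effect_sd=0.05)

print(f"rare, large effect:   {rare_large:.5f}")
print(f"common, small effect: {common_small:.5f}")
```

with these made-up numbers, a mutation at 0.05% frequency with a full 1-SD effect explains about 0.1% of the variance, roughly the same as a 30%-frequency variant with an effect of 0.05 SD.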
>Common variants are common because they are almost invariably neutral.
right. and a polymorphism with a slight effect on disease risk is essentially neutral. neutral != no phenotypic effect.
Natural selection and recombination
>A distinction we might make is the a new mutation may be better or (more likely) worse than the “wild” type, in the former case we expect “positive” selection and in the latter “negative”.
Yep, that's what I mean.
>Of course they’re not finding new selected alleles; they’re seeing the effect of linkage with selected alleles which probably have not been identified.
Right. So we agree, it has nothing to do with low frequency, geographically restricted alleles :)
mean Fst between china and japan in the hapmap is ~0.005, fwiw.
>mean Fst between china and japan in the hapmap is ~0.005, fwiw.
So in light of that, let me give a different example (I admit I sort of randomly pulled those initial numbers out of the air without much thought):
Allele 1: 60% in Japan, 50% in China
Allele 2: 55% in Japan, 50% in China
The observation (again, I think) in this paper is that the former are more common in regions of low recombination compared to the latter.
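a rough check on those two alleles, assuming the simplest textbook estimator, Fst = (Ht - Hs) / Ht, with two equally weighted populations and no sample-size correction. the function name is made up, and this is not necessarily how the hapmap figure quoted above was computed.

```python
def fst_two_pops(p1, p2):
    """Fst for one biallelic locus in two equally weighted populations.

    Uses Fst = (Ht - Hs) / Ht, where Ht is the expected heterozygosity at
    the pooled allele frequency and Hs is the mean within-population
    expected heterozygosity. No correction for finite sample size.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:  # allele fixed or absent in both populations
        return 0.0
    return (h_total - h_within) / h_total


# frequencies from the example above: (Japan, China)
print(f"allele 1: {fst_two_pops(0.60, 0.50):.4f}")
print(f"allele 2: {fst_two_pops(0.55, 0.50):.4f}")
```

under those assumptions allele 1 comes out around 0.010 and allele 2 around 0.0025, i.e. roughly twice and half the ~0.005 mean.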
ok, i see the argument. yes, that's plausible. It's also consistent with background selection on new deleterious alleles, and positive selection on standing variation. probably a weighted average of all these effects, and the weights are unclear.
There are no common disorders (just extremes of quantitative traits)
>Still, there are limitations to this approach, namely: “for most disorders, we do not know what the relevant quantitative traits are”.
this is more of a fatal flaw than a limitation, no?
1 million SNPs to bind us all
hm, that figure looks familiar.
