Saturday, April 12, 2008

SNPs don't lie   posted by gcochran @ 4/12/2008 10:25:00 PM

There was an interesting paper in BMC Genetics back in February: "Analysis of genetic variation in Ashkenazi Jews by high density SNP genotyping." They ran 500K Affy chips on 100 Ashkenazi women and on 60 CEPH-derived HapMap (CEU) individuals. They hoped to find greater levels of linkage disequilibrium and lower haplotype complexity among the Ashkenazim, as a putatively bottlenecked population. This would simplify some forms of genetic mapping. Some earlier work had suggested that this might be the case - but that earlier work had looked either at a single chromosome or at small samples from a number of chromosomes.

The expected pattern is not there. Average LD is very similar in the two populations, although it varies from chromosome to chromosome. It's slightly smaller among the Ashkenazi at short distances, slightly greater at longer distances, but overall very similar, as you can see.
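
For readers who want to see what is being compared, here is a minimal sketch of one common pairwise LD measure, r^2, computed between two biallelic SNPs. It assumes phased haplotypes coded as 0/1 alleles; genotyping chips give unphased genotypes, so a real analysis would first have to infer or estimate haplotype frequencies. The function name `r_squared` and the toy data are mine, for illustration only, and this is not necessarily the statistic or pipeline the paper used.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per
    phased haplotype (an assumption; chip data arrive unphased).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty allele lists of equal length")

    p_a = sum(hap_a) / n                                  # freq of allele 1 at SNP A
    p_b = sum(hap_b) / n                                  # freq of allele 1 at SNP B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # freq of the 1-1 haplotype

    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                               # a monomorphic site: r^2 undefined
    return d * d / denom


# Toy data: two perfectly correlated SNPs, then two independent ones.
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
unlinked = r_squared([0, 1, 0, 1], [0, 0, 1, 1])
print(linked, unlinked)
```

Averaging a statistic like this over SNP pairs binned by physical distance gives the kind of LD-versus-distance curve being compared between the two samples.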

There were somewhat _more_ haplotype blocks among the Ashkenazi sample, not fewer.
You would expect a bottlenecked population to have more monomorphic sites, but the Ashkenazi sample had noticeably fewer, 9.1% versus 12.4%.
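
The expectation about monomorphic sites follows from drift alone, and a toy simulation makes it concrete. The sketch below runs a bare-bones Wright-Fisher model at many independent sites and counts how many end up fixed for one allele. Every number in it (population sizes, generations, starting frequencies, the seed) is an arbitrary illustrative assumption, not an estimate for any real population, and the function name is my own.

```python
import random


def fraction_monomorphic(n_haplotypes, n_generations, n_sites, rng):
    """Fraction of sites that drift to fixation or loss.

    Toy Wright-Fisher model: each generation, every haplotype draws its
    allele independently with probability equal to the current frequency.
    No selection, mutation, or migration.
    """
    fixed = 0
    for _ in range(n_sites):
        freq = rng.uniform(0.05, 0.5)      # assumed starting minor-allele frequency
        for _ in range(n_generations):
            copies = sum(rng.random() < freq for _ in range(n_haplotypes))
            freq = copies / n_haplotypes
            if freq in (0.0, 1.0):         # allele lost or fixed: site is monomorphic
                break
        if freq in (0.0, 1.0):
            fixed += 1
    return fixed / n_sites


rng = random.Random(2008)                  # fixed seed so the run is repeatable
bottlenecked = fraction_monomorphic(10, 25, 200, rng)
large = fraction_monomorphic(400, 25, 200, rng)
print("bottlenecked:", bottlenecked, "large:", large)
```

In a model like this the small population loses variation at most sites while the large one keeps nearly all of it, which is the pattern a serious bottleneck should leave behind.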

Altogether, the paper concludes that "These data are more consistent with the AJ as an older, larger population than CEU." Which means that there is no sign of any bottleneck in this data. The paper, obviously written by several people, _refers_ to several bottlenecks that have been discussed in earlier studies, but this measurement set contains thousands of times more data than those earlier studies. If there had been a bottleneck, they would have seen it, and if they don't see it, there must not have been one.

They see very significant gene frequency differences in a couple of fair-sized regions: LCT and HLA. Those differences were of course generated by selection. There are differences in smaller regions at a number of other positions, and long homozygous regions in the Ashkenazi sample average about 20% longer - so at least some of their long haplotypes are younger.

Fact: we find long haplotypes around the mutations causing common Ashkenazi diseases, on the order of one to ten Mb.

Bottlenecks affect the whole genome, but selection only affects a small fraction. Selection would not change genome-wide LD much, would not much increase the number of monomorphic sites, but it could generate long haplotypes around selected mutations.

The authors think that these differences "reflect the impact of both selection as well as genetic drift." - but there is, as far as I can tell, no evidence of drift in this data at all. Perhaps I'm missing something.

This SNP study (and others) also shows that Ashkenazim are genetically distinct from other Europeans, which allows fairly accurate identification of group membership. Almost perfectly distinct, if you look at Ashkenazim whose grandparents are all Ashkenazi (the violet dots). Obviously, there was low inward gene flow for a long time, but that has increased a lot in the last century. Distinct local selection pressures could have caused noticeable change when gene flow was that low.

Check out this figure, from a recent paper in PLoS Genetics (Tian et al., "Analysis and Application of European Genetic Substructure Using 300 K SNP Information"):

Henry Harpending and I came to these same conclusions several years ago, using a far smaller data set: the evidence indicated low gene flow that would allow local selection, and we found no evidence for - indeed, solid evidence against - the kind of bottleneck that would explain the observed spectrum of genetic disease among the Ashkenazim. Which leaves selection as the only explanation - but selection for what?