SNPs don’t lie


There was an interesting paper in BMC Genetics back in February: “Analysis of genetic variation in Ashkenazi Jews by high density SNP genotyping.” They ran 500K Affy chips on 100 Ashkenazi women and on 60 CEPH-derived HapMap (CEU) individuals. They hoped to find greater levels of linkage disequilibrium and lower haplotype complexity among the Ashkenazim, as a putatively bottlenecked population. This would simplify some forms of genetic mapping. Some earlier work had suggested that this might be the case – but that earlier work had looked either at a single chromosome or at small samples from a number of chromosomes.

The expected pattern is not there. Average LD is very similar in the two populations, although it varies from chromosome to chromosome. It’s slightly smaller among the Ashkenazi at short distances, slightly greater at longer distances, but overall very similar, as you can see.
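
For readers who want the statistic spelled out: pairwise LD between two biallelic SNPs is commonly summarized as r², built from the haplotype frequencies as D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The sketch below is only an illustration on made-up phased haplotypes. The function name `ld_r2` and the toy data are assumptions, not anything taken from the paper or its pipeline.

```python
def ld_r2(haplotypes):
    """Return (D, r2) for two biallelic SNPs.

    `haplotypes` is a list of 2-character strings such as "AB", "Ab",
    "aB", "ab": first character is the allele at SNP 1, second at SNP 2.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # freq of allele A at SNP 1
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # freq of allele B at SNP 2
    p_ab = sum(h == "AB" for h in haplotypes) / n    # freq of the AB haplotype
    d = p_ab - p_a * p_b                             # raw disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d, d * d / denom


# Toy data (hypothetical): perfectly associated sites vs. independent sites.
complete = ["AB"] * 50 + ["ab"] * 50
independent = ["AB"] * 25 + ["Ab"] * 25 + ["aB"] * 25 + ["ab"] * 25

print(ld_r2(complete))
print(ld_r2(independent))
```

Averaging r² over SNP pairs binned by physical distance gives the kind of LD-versus-distance curve being compared between the two samples.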

There were somewhat _more_ haplotype blocks among the Ashkenazi sample, not fewer.
You would expect a bottlenecked population to have more monomorphic sites, but the Ashkenazi sample had noticeably fewer, 9.1 % versus 12.4 %.

Altogether, the paper concludes that “These data are more consistent with the AJ as an older, larger population than CEU.” Which means that there is no sign of any bottleneck in this data. The paper, obviously written by several people, _refers_ to several bottlenecks that have been discussed in earlier studies, but this measurement set contains thousands of times more data than those earlier studies. If there had been a bottleneck, they would have seen it, and if they don’t see it, there must not have been one.

They see very significant gene frequency differences in a couple of fair-sized regions: LCT and HLA. Those differences were of course generated by selection. There are differences in smaller regions at a number of other positions, and long homozygous regions in the Ashkenazi sample average about 20% longer – so at least some of their long haplotypes are younger.

Fact: we find long haplotypes around the mutations causing common Ashkenazi diseases, on the order of one to ten Mb.

Bottlenecks affect the whole genome, but selection only affects a small fraction. Selection would not change genome-wide LD much, would not much increase the number of monomorphic sites, but it could generate long haplotypes around selected mutations.
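
The genome-wide signature of a bottleneck can be illustrated with a toy Wright-Fisher simulation: many independent sites drift for a few generations, and we count how many end up monomorphic. This is a rough sketch, not a model of Ashkenazi history. The population sizes, generation count, starting frequency, and the function name `fraction_monomorphic` are all illustrative assumptions.

```python
import random


def fraction_monomorphic(n_chromosomes, generations, n_sites=200,
                         start_freq=0.2, seed=1):
    """Fraction of independent neutral sites that become monomorphic.

    Each site starts at allele frequency `start_freq` and is resampled
    binomially every generation (pure drift, no selection, no mutation).
    """
    rng = random.Random(seed)
    fixed = 0
    for _site in range(n_sites):
        p = start_freq
        for _gen in range(generations):
            # Binomial draw: each chromosome in the next generation
            # independently inherits the allele with probability p.
            copies = sum(rng.random() < p for _c in range(n_chromosomes))
            p = copies / n_chromosomes
            if p in (0.0, 1.0):      # allele lost or fixed: site is monomorphic
                break
        if p in (0.0, 1.0):
            fixed += 1
    return fixed / n_sites


# Hypothetical sizes: a tight bottleneck vs. a larger population.
bottlenecked = fraction_monomorphic(n_chromosomes=20, generations=20)
large = fraction_monomorphic(n_chromosomes=1000, generations=20)
print(bottlenecked, large)
```

Because drift acts on every site at once, the small population loses variation across the board. Selection at a handful of loci would leave the genome-wide fraction of monomorphic sites essentially untouched.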

The authors think that these differences “reflect the impact of both selection as well as genetic drift.” – but there is, as far as I can tell, no evidence of drift in this data at all. Perhaps I’m missing something.

This SNP study (and others) also shows that Ashkenazim are genetically distinct from other Europeans, which allows fairly accurate identification of group membership. Almost perfectly distinct, if you look at Ashkenazim whose grandparents are all Ashkenazi (the violet dots). Obviously, there was low inward gene flow for a long time, but that has increased a lot in the last century. Distinct local selection pressures could have caused noticeable change when gene flow was that low.

Check out this figure, from a recent paper in PLOS Genetics (Tian et al., “Analysis and Application of European Genetic Substructure Using 300 K SNP Information”):

Henry Harpending and I came to these same conclusions several years ago, using a far smaller data set: the evidence indicated low gene flow that would allow local selection, and we found no evidence for – indeed, solid evidence against – the kind of bottleneck that would explain the observed spectrum of genetic disease among the Ashkenazim. Which leaves selection as the only explanation – but selection for what?



33 Comments

  1. but selection for what? 
     
    From the pictures, I would say: for bad hair… 
     
    Also: am I correct in interpreting this PCA plot as “Irish and Greeks are closer to Ashkenazi than to each other” ? At least that’s what the naive Euclidean distance seems to say.  
     
    Something strange: both PC1 and PC2 are strongly correlated with latitude for Europeans; they only seem to differ in that one puts Ashkenazi with Northerners, while the other puts them with (indeed, beyond) Mediterraneans.  
     
    Obvious null hypothesis: Ashkenazi are just Mediterraneans who have assimilated some Northern European genes, simply because, well, that’s where they lived, so (limited) gene flow and selection pressures encouraged the inclusion and propagation (respectively) of those Northern alleles. If that is the case, then they’re not really that different from “Europeans” – just different from any single European sub-group, and the Euclidean distance in the PCA plot is misleading. 
     
    (This comment offered to you by the Knights who say “Null!”)

  2. Selection for the ability to persist in Ashkenazi society, which includes managing a functional relationship not only with other Ashkenazi but the greater society in which you exist. 
     
    This can be selection for as well as selection against.

  3. I find no mystery in the selection process. Look at the 5 or 6 million Jews in America. Whose grandchildren will be Jews in one hundred years? Only those who want to and are ready to make important sacrifices for it in their personal lives. Those who are able to live a life of limitations (for example, not working on the Shabbath) and marry Jewish and have children. Maybe 20% of the 6 million American Jews. What is being selected for? The need to belong to a community, I assume.

  4. I had thought it obvious, but the question is what traits were favored by selection in the _past_, not in the present. Generally speaking, gene frequencies and such are determined by things that happened in the past, unless you can find matter with negative energy density.

  5. That graph showing that the two groups are essentially identical for how much LD there is at short or long distances, is from the supplementary info. 
     
    All throughout that novella of a paper, they keep hyping up the difference: “One group shows higher LD at short distances, the other at long distances.” But then you rummage through the supplementary info to see what the magnitude is — and they’re the same.

  6. It’s really too bad they didn’t include a comparable group of pure Mid-Easterners/Levantines in the plot. 
     
    It would be interesting to see if the Ashk Jews did indeed tend to fall roughly half-way between Europeans and Mid-Easterners, as their crude raw-ancestry fractions might suggest. 
     
    BTW, it’s funny that Von Neumann’s face is so much less famous than many of his comparable contemporaries. I didn’t recognize it myself until I checked the name of the BMP image.

  7. well since this is greg Cochran posting this the obvious answer is intelligence.  
     
    I have no training in this sort of thing but if you are going to say that Jews have genetic diseases because of selection for high intelligence can’t you just check to see if people with the genes associated with those diseases have higher iqs?

  8. I remember that at least one of the Jewish genetic diseases is associated with higher IQs. maybe g cochran will write back and tell us which one it is. 
     
    In a very unscientific observation of my own Ashkenazi family and others I know, I wonder if we Ashkenazim also ended up with a lot of psychiatric problems and learning disabilities as a by product of selecting for high intelligence. I know LOTS of very smart Ashkenazim who also have manic-depression or ADHD. Does any one know if Jews tend to have more of these types of disorders than average? It seems like we do! g cochran, any insight?

  9. Schizophrenic personality is correlated with creativity. Thinking outside the box is needed for creativity. But stray too far from reality and you get delusions and hallucinations, which are signs of psychosis. Without its mathematical basis, Einstein’s relativity would have looked like a psychotic idea.

  10. “I remember that at least one of the Jewish genetic diseases is associated with higher IQs. maybe g cochran will write back and tell us which one it is.” 
     
    Torsion Dystonia, if I remember correctly. 
     
    Also, I believe there was some data showing Israeli Jews with a certain affliction were substantially more likely to be physicists (or perhaps it was physicians…).  
     
    It’s been a while since I’ve read the original paper.

  11. refresh your memories: 
    http://homepage.mac.com/harpend/.Public/AshkenaziIQ.jbiosocsci.pdf

  12. Granted, the first three guys are geniuses, but haven’t you guys ever watched “Horsefeathers”? That Professor Quincy Adams Wagstaff from Huxley U. is a moron.

  13. OK, I’ll admit it — I didn’t get that the title was a play on “Hips don’t lie” until just now. Anyone else want to admit it too?

  14. ok, i didn’t realize it was a play on that either….

  15. While somewhat informative, two-dimensional plots are by themselves often misleading. The main reason being that components that are small for the whole continental or global sample can be very important and even dominant locally, and vice versa: components that are large globally can be ridiculously small locally, creating a very distorted graph (in relation with reality). 
     
    A relevant illustrative case is Bauchet et al, 2007 (http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17436249) (that also has autosomal data on Ashkenazi Jews, btw) where (fig 4) you can see how the PC1/PC2 plot happens to be meaningless for many populations when contrasted with the K=5 or K=6 Bayesian graphs.  
     
    Basically the two main components in the European sample are Eastern Mediterranean and Finnic but, when you look at the Bayesian structure, you see at least three other clusters (Central-Northern, Iberian, Basque) that are not visible at all in the 2-PC diagram.  
     
    In the 2-PC diagram, for instance Basques tend to be very spread around but when looking at the K=5 structure you see that’s just because they have virtually none of the two main components, being instead internally very homogeneous (the main admixture would be with Iberians, but that’s also not visible in the PC1/PC2 diagram).  
     
    In Bauchet’s study, Ashkenazis look virtually the same as Greeks or Armenians. But this may be due to lack of sampling and resolution in West Asia. I wonder why in Bauchet’s they fall side by side with Eastern Mediterraneans, while in this new study they are totally apart. Choice of markers? US Ashkenazis genetically different from Euro-Israeli ones?

  16. Maybe Jewish history selected for neuroticism (Big 5)? It could even be something as recent as the Holocaust. Remember the old Jewish guy in Cabaret who says ‘oh, this will pass’ as Weimar Germany starts to go bad? Optimistic, and look what happened to him. 
     
    Y’know, as soon enough time passes that people don’t go nuts over it, it’d be interesting to see the selective effects of the Holocaust. That’s one heck of a bottleneck, and we know exactly when it happened and why.

  17. Diseases listed in the article, in just one sentence: 
     
    “A sample of Gaucher disease patients show a startling occupational spectrum of high IQ jobs, and several other Ashkenazi disorders, idiopathic torsion dystonia and non-classical adrenal hyperplasia, are known to elevate IQ.”

  18. Sorry I should have been more specific in my previous comment but I was typing from a friend’s iPhone during a break playing Rock Band. (God I am such a nerd.)  
     
    Anyway, I remember reading in the original paper about the Gaucher patients being way over-represented in professions requiring high IQs. But the researchers didn’t actually give Gaucher patients IQ tests (which might not have been practical since the patients are Israelis, IIRC, and the researchers are Americans). Still, as an interested layman, wouldn’t such tests go a long way toward showing that the alleles associated with Gauchers actually are IQ-boosting?  
     
    More generally, couldn’t someone test the IQ of people with the alleles associated with the four sphingolipid mutations referred to in the article, then, based on the frequency of said alleles in the Ashkenazi population relative to their frequency in the European population, calculate a rough estimate of the former’s net IQ gain? 
     
    I know it’s probably not as easy as all that, I’m just curious. And kind of surprised the Israelis aren’t already doing this.

  19. I know LOTS of very smart Ashkenazim who also have manic-depression or ADHD. Askenazis look virtually the same as Greek or Armenians. But this may be due to lack of sampling and resolution in West Asia.

  20. Were there any differences in partner selection between Ashkenazi Jews and gentile Europeans in medieval Eastern Europe? Maybe the selection wasn’t through romantic love, but some kind of pre-arranged marriage. Maybe it was the parents who decided who was a good partner and who wasn’t.  
     
    If I were to choose a partner for my son or daughter, I’d use reason alone in partner selection. Top of my list would be wealth and good manners. Stuff like that correlates with IQ, does it not?

  21. The Holocaust was not a genetic bottleneck, any more than the Black Death was. In both cases, millions survived.  
     
    Genetic representation in the next generation is analogous to taking a poll. Suppose that exactly half of the US population are Blues and half are Greens. If we poll 100 million people, we’ll get answers that are extremely close to the true split (50-50) almost every time. In fact the chance of a 1% error is less than one in a zillion. If we poll 50 million people, that is still the case.  
     
    On the other hand, if we interview 1000 people, we will be off by more than 3% about 5% of the time. 
     
    In order for there to be a 1% probability of something like Tay-Sachs (a recessive lethal with a 2% gene frequency) you have to invoke really tight bottlenecks. Montgomery Slatkin used a historical population model with _two_ population bottlenecks. The first bottleneck in that model was the founding of the Roman Jewish population, which he modeled using a range of population sizes (150, 600, and 3,000), with a second bottleneck around 1350 (600, 3,000, and 6,000). These are the census sizes: he assumes that the effective population size was 1/3rd of census size.  
     
    You need to have populations in those size ranges – low thousands or fewer – to have any significant chance of perturbing gene frequencies enough to get the sort of mutation spectrum we see among the Ashkenazim. Even then you won’t see clustering in a couple of metabolic pathways. And those scenarios have other effects on genetic statistics, effects we do not observe.  
     
    Moreover, any really strong bottleneck would make a population somewhat dumber.
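
The polling arithmetic in comment 21 can be checked with a few lines. This is a back-of-the-envelope sketch using the normal approximation to the binomial rather than an exact calculation. The function name `prob_poll_off_by_more_than` is hypothetical, and the sample sizes are simply the ones quoted in the comment.

```python
import math


def prob_poll_off_by_more_than(n, margin, p=0.5):
    """Two-sided probability that a sample proportion from n independent
    draws differs from the true proportion p by more than `margin`
    (normal approximation to the binomial)."""
    sd = math.sqrt(p * (1 - p) / n)          # standard error of the sample share
    z = margin / sd                          # margin expressed in standard errors
    return math.erfc(z / math.sqrt(2))       # P(|Z| > z) for a standard normal


# 1000 interviews, off by more than 3 percentage points:
p_small_sample = prob_poll_off_by_more_than(1000, 0.03)

# 50 million "polled", off by more than 1 percentage point:
p_huge_sample = prob_poll_off_by_more_than(50_000_000, 0.01)

print(p_small_sample, p_huge_sample)
```

The first probability comes out in the neighborhood of 5 to 6 percent, in line with the comment. The second is so small that it underflows to zero in floating point. The same square-root-of-n logic is why drift only moves gene frequencies appreciably when the breeding population is in the low thousands or fewer.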

  22. Greg: You need to present real evidence explicitly proving how particular mutations increase IQ complete with biologically plausible mechanisms, and detail the population history of these lineages. Arguments from exclusion will only get you so far. 
     
    Moreover a profound contraction/bottleneck was already proven two years ago where it was shown that almost half of present day Ashkenazi Jews or 8 million people are descended from just four women. 
     
    The matrilineal ancestry of Ashkenazi Jewry: portrait of a recent founder event. 
     
    Am J Hum Genet. 2006 Mar;78(3):487-97. 
     
    Behar DM, Metspalu E, Kivisild T, Achilli A, Hadid Y, Tzur S, Pereira L, Amorim A, Quintana-Murci L, Majamaa K, Herrnstadt C, Howell N, Balanovsky O, Kutuev I, Pshenichnov A, Gurwitz D, Bonne-Tamir B, Torroni A, Villems R, Skorecki K. 
     
    Finally many contemporary Near Eastern populations like Ashkenazi Jews are older and larger than the CEU population from Europe because Europe WAS populated from the Near East. Just like Africa is an older, larger population than the rest of the world because Africa was not associated with such a massive bottleneck approximately 70 KYA that the rest of the world was.

  23. Moreover a profound contraction/bottleneck was already proven two years ago where it was shown that almost half of present day Ashkenazi Jews or 8 million people are descended from just four women. 
     
    so you weight a uniparental lineage more than many autosomal loci? “proven”? jesus. you know that saying it like that elides the fact that that applies only to the mtDNA lineage.

  24. >>The Holocaust was not a genetic bottleneck, any more than the Black Death was. In both cases, millions survived. 
     
    The above is a poor analogy.  
     
    I don’t know if the Black Death meets everyone’s definition of a genetic bottleneck, but it certainly was an event that had at least one long standing genetic effect. Look up CCR5, if you don’t believe this is the case.

  25. Razib, an argument of exclusion for selection based on the purported exclusion of genetic drift in an ancestral population need only be refuted by the demonstration of an established instance of drift to disprove the contention. Similarly, one counter-example in mathematics can disprove a theorem.  
     
    Obviously mtdna and NRY for that matter are uniparentally transmitted lineages and only elucidate the population history of a fraction of present day genetic diversity, but thus far they have done pretty well in describing human population history. 
     
    Detailed autosomal studies are in their infancy but thus far have depicted similar worldwide population histories as NRY and mtDNA albeit autosomal loci have much older recent common ancestors but this is an artifact of the uniparental transmission of NRY and mtDNA. It is therefore naive to assert that such a profound mtDNA contraction as demonstrated above will not be likewise verified among autosomal loci once they are more fully investigated.

  26. Razib, I can add that the first Ashkenazi mtDNA studies such as those of Thomas et al. 2002 failed to detect this profound contraction. It was only later when the world-wide frequencies and haplotypes of the relevant lineages were found that such a profound contraction was found.

  27. Obviously mtdna and NRY for that matter are uniparentally transmitted lineages and only elucidate the population history of a fraction of present day genetic diversity, but thus far they have done pretty well in describing human population history. 
     
    no. it depends on context. you know very well you can’t take mtDNA or NRY of mexican populations and assume that one lineage gives you a good picture of population history (or i hope you do). milder forms of the same distortions exist in many other populations. if you take all world populations and sum them together this is less of an issue, since you’re making catchall generalizations (i.e., population expansion out of africa). but when inferring about one particular population you need to consider all the contingent data we have on that population.

  28. Detailed autosomal studies are in their infancy but thus far have depicted similar worldwide population histories as NRY and mtDNA albeit autosomal loci have much older recent common ancestors but this is an artifact of the uniparental transmission of NRY and mtDNA.  
     
    and to clarify, the concordance between NRY, mtDNA and autosomal seems strongest in the case of very big questions such as the out-of-africa population expansion within the last 50-100 K BP. issues narrow in scope (temporal and spatial) tend to be far less concordant because of specific population genetic conditions (e.g., female-mediated gene flow, polygyny, etc.).

  29. Strange as it may seem, you learn more from looking at ~500,000 SNPs than you do from looking at one non-recombining locus. If you’re talking one locus, the effect can always be caused by selection.  
     
    Smallpox makes for a more plausible selective agent favoring CCR5-Δ32 than the Black Death does, although relatively high frequencies back in the Bronze Age, as well as the protection it is known to give against other viral infections, suggest that other selective factors may have played a role. 
    Today CCR5-Δ32 has a gene frequency of about 10%. If you assume that the Black Death killed off half the European population – a pessimistic assumption – the largest possible effect on CCR5-Δ32 would have been to double the gene frequency. You need a lot more than a single doubling to raise a fresh mutation to high frequency in a large population.  
     
    It’s not easy for a single-generation event to have a strong selective effect.
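
The doubling argument in comment 29 is simple enough to write down. If an epidemic kills a fraction of the population and, in the most extreme case, every death falls on someone without the allele, the allele's copies are untouched while the pool of chromosomes shrinks, so the frequency rises by a factor of 1 / (1 − fraction killed). The sketch below works through that bound and counts how many doublings separate a brand-new mutation from a 10% frequency. The function names and the population size of one million are hypothetical choices for illustration only.

```python
import math


def max_freq_after_cull(freq, fraction_killed):
    """Upper bound on allele frequency after one mortality event,
    assuming (as the extreme case) that no allele copies are lost."""
    return min(1.0, freq / (1.0 - fraction_killed))


def doublings_needed(target_freq, population_size):
    """Number of frequency doublings to go from a single new copy,
    1 / (2N) in a diploid population of size N, up to `target_freq`."""
    start_freq = 1.0 / (2 * population_size)
    return math.log2(target_freq / start_freq)


# Half the population dies: a 5% allele can reach at most 10%.
after_plague = max_freq_after_cull(0.05, 0.5)

# Hypothetical population of one million: doublings from one copy to 10%.
n_doublings = doublings_needed(0.10, 1_000_000)

print(after_plague, n_doublings)
```

Under these assumed numbers a fresh mutation needs well over a dozen doublings to reach 10%, which is why a single event that can supply at most one of them does little on its own.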

  30. If you look at LCT, you would conclude that ~80% of Northern European ancestry stems from one person a few thousand years ago. And you would be right, for that locus. You would conclude that this is the product of a tight bottleneck, if you were Neil Risch. Selection seems more likely.

  31. Razib: I am refuting an argument from exclusion by citing an instance of genetic drift. By citing African populations, I am merely illustrating the utility of the NRY and mtDNA loci in deciphering population history on a global scale even though these two loci are obviously limited and imperfect. 
     
    You assert that the utility of the NRY and mtDNA while useful on the global scale is somehow lessened when examining individual population histories such as the population history of Central America. But in fact the inference gained from the NRY and mtDNA studies of Central America namely of a mixed population descended largely from Spanish men and MesoAmerican Women was in fact corroborated in a recent genomewide autosomal study which showed, surprise! a mixed population largely descended from both Spanish and MesoAmerican populations (1). 
     
    Attacking the usefulness of the NRY and mtDNA in elucidating population histories is almost as ridiculous as asserting that a population nearly half of whom are recently descended from 4 individuals will not exhibit parallel instances of genetic drift in other loci which is what Greg Cochran requires for his argument from exclusion. 
     
    To the contrary, mtDNA studies have been very useful not only globally but in helping to elucidate the population histories of even smaller founding populations such as those in Central America and the Americas (2). They will likewise almost certainly be helpful in elucidating the population history of the ancestral European Jewish population together with a more detailed autosomal history once it’s assembled. Is mtDNA the final word? Of course not, but neither can one blithely avoid the evidence that it suggests. 
     
    1) Wang S, Ray N, Rojas W, Parra MV, Bedoya G, Gallo C, Poletti G, Mazzotti G, Hill K, Hurtado AM, Camrena B, Nicolini H, Klitz W, Barrantes R, Molina JA, Freimer NB, Bortolini MC, Salzano FM, Petzl-Erler ML, Tsuneto LT, Dipierri JE, Alfaro EL, Bailliet G, Bianchi NO, Llop E, Rothhammer F, Excoffier L, Ruiz-Linares A. Geographic patterns of genome admixture in Latin American Mestizos. PLoS Genet. 2008 Mar 21;4(3):e1000037. 
    http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000037 
     
    2) Fagundes NJ, Kanitz R, Eckert R, Valls AC, Bogo MR, Salzano FM, Smith DG, Silva WA Jr, Zago MA, Ribeiro-dos-Santos AK, Santos SE, Petzl-Erler ML, Bonatto SL. Mitochondrial population genomics supports a single pre-Clovis origin with a coastal route for the peopling of the Americas. Am J Hum Genet. 2008 Mar;82(3):583-92. 
     
    [you need to type slower, it's really ridiculous to use my name as your handle - Razib]

  32. But in fact the inference gained from the NRY and mtDNA studies of Central America namely of a mixed population descended largely from Spanish men and MesoAmerican Women was in fact corroborated in a recent genomewide autosomal study which showed, surprise! a mixed population largely descended from both Spanish and MesoAmerican populations (1). 
     
     
    two loci are more informative than one, right? that’s my point. the rest of your comment is pretty irrelevant if you refuse to acknowledge my point.

  33. The statement in Behar et al about 4 mtDNA ancestresses is kind of trite: of course there were 4, then 3, then 2, then 1 going backward in time, 5,6,7… going forward in time. That is just simple coalescence. 
     
    Has anyone here actually read that paper? I have stared at it for a long time and can’t for the life of me see where they come up with any bottleneck evidence at all. That is the problem with all that tree literature: one looks at the tree a long time and interprets it rather than testing any hypothesis.  
     
    I came away thinking that maybe there was a hint of diversity loss _within_ some mtDNA lineages but no loss of overall diversity at all. I couldn’t even see a hint of any real bottleneck, interpret as I might. This is of course what Mike Hammer said a long time ago. 
     
    Henry
