Posts with Comments by SusanC

The Jermyn Program

  • gcochran asks: "As for none of the skin color genes introgressing from Neanderthals, how do you know?" For example, there's this paper: "A Melanocortin 1 Receptor Allele Suggests Varying Pigmentation Among Neanderthals" http://www.sciencemag.org/cgi/content/abstract/1147417 Given that we know many of the alleles affecting pigmentation in modern humans, and we have the genome of some Neanderthals, it should be easy to look for matches. If there's a pigmentation allele present in both Neanderthals and non-African humans (but rare/absent in Africa) it would seem likely that we got it from the Neanderthals or they got it from us. Different mutations for the same phenotype (e.g. red hair) would suggest convergent evolution, rather than gene flow.
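
The matching idea could be sketched as a simple filter. This is a minimal sketch, assuming you already have, for each SNP, whether the allele appears in a sequenced Neanderthal genome and its frequency in African and non-African samples. The `Allele` record, the SNP ids and the frequency cut-offs are all hypothetical placeholders, not real data or thresholds from any paper.

```python
from dataclasses import dataclass


@dataclass
class Allele:
    snp: str                # hypothetical SNP identifier
    in_neanderthal: bool    # allele seen in a sequenced Neanderthal genome
    freq_africa: float      # allele frequency in African samples
    freq_non_africa: float  # allele frequency in non-African samples


def candidate_gene_flow(alleles, rare=0.01, common=0.05):
    """Return SNPs whose allele is shared with Neanderthals, reasonably common
    outside Africa, and rare or absent in Africa. The cut-offs are arbitrary
    placeholders. A match like this is consistent with gene flow in either
    direction; it does not say which way the allele went."""
    return [
        a.snp
        for a in alleles
        if a.in_neanderthal and a.freq_non_africa >= common and a.freq_africa <= rare
    ]


# Made-up example records, for illustration only.
example = [
    Allele("snp_A", True, 0.00, 0.20),   # shared, absent in Africa: flagged
    Allele("snp_B", True, 0.30, 0.25),   # shared, but common in Africa too
    Allele("snp_C", False, 0.00, 0.40),  # not seen in the Neanderthal genome
]
print(candidate_gene_flow(example))  # ['snp_A']
```

The filter only compares the same allele at the same SNP, so a different mutation giving a similar phenotype would not be flagged, which fits the point about convergent evolution.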
  • Is Mental Illness Good For You?

  • If I recall correctly, schizophrenia is more common in urban than rural areas, and more common in people born at certain times of the year. One hypothesis that (more or less) fits is that symptoms are due to prenatal infection by an agent that is more common in towns (where there are more people to catch it from) and more common in winter (cf. influenza). Pure mutation/selection balance doesn't quite explain the pattern. As Steve says, if the underlying cause is infectious, genes may affect resistance to infection. P.S. "Mental illness" groups together many different kinds of condition. Depression, schizophrenia, autism and drug addiction may turn out to have different causes, or kinds of causes. P.P.S. I suspect that people with preexisting psychiatric conditions are more likely to want to become psychiatrists, rather than the mechanism being psychiatrists catching an infection from their patients. Dealing with psychiatric patients on a regular basis might be depressing, of course.
  • There are no common disorders (just extremes of quantitative traits)

  • If you've got a way to measure the quantitative trait, there are plenty of statistical techniques you can use. For example, if you have IQ tests, you can see what's correlated with the numeric IQ score, rather than doing a case-control study with participants divided into "smart" and "less smart". (Here, I won't go into the usual concerns over whether IQ tests are a suitable measure of "intelligence".) But if you don't have any idea how to measure the trait quantitatively, having more data is not going to help much.
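
To illustrate the difference, here is a minimal sketch with made-up numbers (not real data): it correlates genotype, coded as the number of copies of one allele (0, 1 or 2), with a numeric test score, and then with the same score split into two groups at an arbitrary cut-off. In this toy example the split version gives a weaker correlation, because it throws away the variation within each group.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: copies of an allele and a numeric score per person.
genotypes = [0, 0, 1, 1, 2, 2]
scores = [90, 100, 100, 110, 110, 120]

# Case-control style split at an arbitrary cut-off of 105.
groups = [1 if s >= 105 else 0 for s in scores]

print(round(pearson(genotypes, scores), 3))  # 0.853
print(round(pearson(genotypes, groups), 3))  # 0.816
```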
  • Creative destruction in the personal genomics industry?

  • A hard question you can ask a start-up is, why would anyone want to buy your product? In the case of 23andme, the medical diagnostic applications aren't very convincing. For many of the diagnostic tests they do, there are too many false positives (base rate fallacy, etc.). The "family history" angle looks more promising: discover from your mtDNA that you have Ashkenazi Jewish ancestry you didn't know about, get in touch with your fifth cousin who was adopted as a baby, etc. There are some serious legal and ethical issues with both parts of the service, of course.
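
The base-rate problem can be made concrete with Bayes' theorem. This is a minimal sketch; the prevalence, sensitivity and specificity below are illustrative assumptions, not figures for any real 23andme test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that someone who tests positive really has the condition."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# A rare condition (1 in 1000) and a test that is 99% sensitive and 99% specific.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(f"{ppv:.1%}")  # 9.0%
```

So even a test that sounds accurate gives mostly false positives when the condition is rare enough.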
  • The genetic map of Europe we already knew….

  • I've been looking some more at the SNPs listed in table 2 of the
    paper by Heath et al. 
     
    I took the SNPs that didn't show statistically significant geographical 
    variation in Hapmap phase II (CEU, CHB, JPT, YRI), and looked to 
    see if there was significant variation between the Hapmap phase III 
    populations. Phase III looks at more populations, so has a better chance 
    of detecting variation that is more geographically localized, or that 
    requires a bigger sample to detect. 
     
    As I'm looking at several datasets until I find one that gives statistically 
    significant results, you need to take account of this multiple testing when 
    interpreting p values. 
     
    rs1473602. Gene IRF4. chi2=29.55 
     
    G is more common in CEU than other populations. 
     
    Nearby SNPs that have been the subject of association studies include 
    rs12203592 (black vs blond hair), rs12203592 (black vs red hair), 
    rs872071 (leukemia), rs150771 (freckles). 
     
    rs7047524. Gene DMRT1. chi2=14.7. 
     
    Retain the null hypothesis that there is no geographical variation 
    in allele frequency. 
     
    rs4803866. chi2=40.05 
     
    Frequency of C lower in GIH and MEX. Hapmap phase II doesn't contain 
    any samples from India or of Native Americans, so that may explain why 
    it didn't detect this local variation. 
     
    Conclusions 
     
    a) Apart from rs7047524, these SNPs all showed geographic variation 
    on a global scale. It might be worth checking rs7047524 against other 
    datasets to see if the within-Europe and within-UK variation reported 
    by Heath et al. is real or a chance fluctuation in their data. 
     
    b) Hapmap phase III is giving us added value here. There seems to be geographic 
    variation specific to Indian or Native American samples that the 4 original 
    Hapmap samples miss. (MEX is "Mexican Ancestry in Los Angeles", which presumably 
    includes some Native American ancestry).
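
The comment doesn't say exactly how the chi-squared values were computed, so this is only a minimal sketch of one standard approach: a test of homogeneity on a populations-by-alleles table of allele counts, with (populations - 1) x (alleles - 1) degrees of freedom, plus a Bonferroni cut-off as one simple way of allowing for the multiple testing mentioned above. Counting two alleles per person as independent observations is itself an assumption (roughly, Hardy-Weinberg equilibrium), and the counts and p values in the example are invented.

```python
def chi_squared_homogeneity(counts):
    """counts: one (allele_1_count, allele_2_count) pair per population.
    Returns the chi-squared statistic and its degrees of freedom."""
    row_totals = [a + b for a, b in counts]
    col_totals = [sum(a for a, _ in counts), sum(b for _, b in counts)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for (a, b), row_total in zip(counts, row_totals):
        for observed, col_total in ((a, col_totals[0]), (b, col_totals[1])):
            expected = row_total * col_total / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(counts) - 1) * (2 - 1)
    return chi2, dof


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p value that stays below alpha divided by the number of tests."""
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]


# Invented allele counts for two populations of 50 people (100 alleles each).
chi2, dof = chi_squared_homogeneity([(30, 70), (50, 50)])
print(round(chi2, 3), dof)                          # 8.333 1
print(bonferroni_significant([0.01, 0.02, 0.001]))  # [True, False, True]
```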
  • I'm half expecting someone to make a joke about me being a Neanderthal. These maps suggest that new alleles diffuse across Europe rather slowly. If any Homo sapiens have a few surviving Neanderthal genes, they're most likely to be found in someone from the western edge of Europe like me.
  • www.23andme.com has its own version of the genetic map. You can see it by going to "Global Similarity" and clicking on "advanced view". The map has several different views. As I understand it, if you switch to the "European" view, it doesn't just zoom in on the world plot; it recomputes the coordinates using only European reference populations. The reference populations they've used look as though they might be the ones from HGDP. 
     
    Their example individual, known by the pseudonym "Greg Mendel" is placed close to the English, French and German clusters by this algorithm (they're not well-separated on the 23andme map). He's reasonably clear of the Irish, Orcadian and Austrian clusters.
  • Why does the genetic map of Europe still work?

  • As both John Hawks and Greg Cochran say above, the number of immigrants is small relative to the population of the place they're immigrating into. So that's part of the answer. 
     
    I was also thinking about the cumulative effect of multiple generations. 
     
    To oversimplify the model a lot, if the vector $v(n)$ gives the proportions of an allele in 2 different countries after $n$ generations, and
     
    $$v(n) = A^n v(0), \quad \text{where } A = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix},$$
     
    then for large $n$ you'll get significant diffusion, even if $p$ is small. Assuming about 25 years per generation, there are around 8 generations since the Napoleonic era. (So $n$ isn't that big, either.)
     
    You could use nineteenth century UK census data to estimate the diffusion rate: look for children still living with their parents (and hence at the same address when enumerated for the census) and see how many were born in the same parish/county/country as their parents. You could do this for an urban centre like Liverpool versus some rural area.
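
Iterating the two-country model numerically shows how much mixing accumulates. This is a minimal sketch of the matrix model described above, with an illustrative migration rate p = 0.05 per generation (an assumption; census data like that suggested could be used to estimate a real value). Because A has eigenvalues 1 and 1 - 2p, the average frequency stays fixed while the difference between the two countries shrinks by a factor of (1 - 2p) every generation, so after n generations it is (1 - 2p)^n of what it was.

```python
def one_generation(v, p):
    """Apply A = ((1-p, p), (p, 1-p)) to the allele frequencies (a, b)."""
    a, b = v
    return ((1 - p) * a + p * b, p * a + (1 - p) * b)


def diffuse(v0, p, n):
    """Return A**n . v0 by applying one generation of mixing n times."""
    v = v0
    for _ in range(n):
        v = one_generation(v, p)
    return v


# Allele fixed in country 1, absent in country 2; p = 0.05; 8 generations.
a, b = diffuse((1.0, 0.0), p=0.05, n=8)
print(round(a, 3), round(b, 3))              # 0.715 0.285
print(round(a - b, 3), round(0.9 ** 8, 3))   # 0.43 0.43
```

With p = 0.05 the initial difference has fallen to about 43% after 8 generations; a smaller p slows this down, but the difference still decays geometrically as n grows.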
  • I'd also thought about the Huguenots. If you look at 19th century parish registers and census data for Wales, you see a fair number of French-derived names (possibly, though not definitively, indicative of Huguenot ancestry), even in rural areas. 
     
    But it may be that the proportion of people involved in these migrations is small enough that it doesn't disrupt the genetic map too much. Birth, marriage and death (BMD) records for the UK are reasonably complete going back to 1837, which conveniently takes us almost back to the point where I'm suggesting mass migration became more significant (around 1815). So you could use BMD records to estimate how much migration there has been since. 
     
    It's a good point that urban areas probably attract more immigrants than rural ones. So sampling rural areas only - and excluding the major cities - might give you a better picture of what the genetic map used to look like.
  • Another genetic map of Europe

  • I've taken the SNPs from table 2 (all of which had significant geographical variation within both the UK and Europe), and computed a chi-squared statistic for the same SNPs across the 4 Hapmap populations (CEU, CHB, JPT, YRI). The idea behind this is that if an allele varies in frequency on a local scale, we would expect it to vary on a wider scale too. 
     
    SNP          Chi2     p          Nearby gene
    rs6531684     97.08   6.60E-21   TLR1, etc.
    rs1473602      8.08   4.44E-02
    rs7047524      3.65   3.02E-01
    rs10741780   127.16   2.22E-27   NAV2
    rs3794060     92.31   6.97E-20
    rs11063148    37.99   2.85E-08
    rs7157080    115.76   6.31E-25   HECTD1, etc.
    rs4803866      1.80   6.16E-01
     
    rs1446585    305.30   7.08E-66   LCT
    rs2612131     10.07   1.80E-02
    rs2844513     62.34   1.86E-13   HLA
    rs6029180     67.88   1.21E-14
     
    You need to take a little care interpreting the "p" value, because of related individuals 
    in the Hapmap datasets (etc.), but it gives a rough idea. 
     
    As expected, most of these SNPs show significant variation between Hapmap populations. A few of them don't: rs4803866, rs7047524, rs1473602. 
     
    I can think of several possible reasons for this: 
     
    a) I made a mistake in calculating these 
     
    b) There is variability, but the Hapmap datasets are too small to detect it (the original paper used larger datasets) 
     
    c) These alleles vary in frequency within Europe. It is just about possible that the ancestors of the people in Hapmap CEU came from a part of Europe where the allele frequency is much the same as the rest of the world, missing the parts of Europe where frequency is higher. (e.g. they're not from far enough north, south, east or west) 
     
    d) These alleles don't really vary in frequency, and only appear to do so in the dataset used by the original paper due to chance fluctuations. (When you have a huge number of dimensions, and relatively few points, you expect a lot of noise in the coefficients). 
     
    What I'd really like right now is data for 100 or so people at two known points in Europe (rather than somewhere like Utah, where the entire European-origin population is due to immigration within the past ~500 years). 
     
    It should also be possible to do some analysis if we had the SNPs of several hundred people scattered all over Europe, and knew the approximate latitude and longitude of each of them.
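
With 4 Hapmap populations and 2 alleles, the count table has (4 - 1) x (2 - 1) = 3 degrees of freedom, and the p values in the table above are consistent with the upper tail of a chi-squared distribution with 3 degrees of freedom. For 3 degrees of freedom that tail has a closed form, so a minimal sketch needs only the Python standard library. It reproduces the table's p values closely from its chi-squared values, but it is not necessarily how they were originally calculated. (For a Bonferroni correction over the 12 SNPs, the cut-off at the 5% level would be 0.05 / 12, about 0.0042.)

```python
import math


def chi2_sf_3df(x):
    """P(X >= x) for a chi-squared variable X with 3 degrees of freedom.

    Closed form for 3 d.f.: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# A few chi-squared values taken from the table.
for chi2 in (97.08, 8.08, 3.65, 1.80):
    print(f"{chi2:7.2f}  {chi2_sf_3df(chi2):.2E}")
```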
  • I wrote: 
     
    Of these, rs1473602 surprised me the most. You don't normally think of Orkney and Melanesia as being similar. (Islands, maybe?) 
     
    The nearby SNP rs12203592 (also in IRF4) has been associated with skin colour, eye colour and tanning response. 
     
    Orkney is far north, where you'd expect fair skin to be especially advantageous. So it'd be interesting to know what effect these SNPs have in Melanesians.
  • IRF4 is a good one. In the HGDP dataset, linkage disequilibrium (measured by XP-EHH) shows a marked rise over surrounding regions in the line for "Oceania". The per-continent haplotype plots show a long uninterrupted haplotype in all continents except Africa. The per-population haplotype plots show a long haplotype in several populations, especially the Papuan, Melanesian and Orcadian (the populations that also have the highest frequency of rs1473602=G), but also in other European populations (e.g. Russian) and Native American populations (e.g. Pima, Maya).
  • I took the SNPs from table 2 in the paper, and looked at their geographical distribution in the HGDP browser. HGDP data confirms that these SNPs all show strong geographical variation: 
     
    Principal Component 1 
     
    SNP          Ancestral->derived   Nearby gene   Comments
    rs6531684    A->G                 TLR1          G more common in Africa, Middle East
    rs1473602    A->G                 IRF4          G more common in Orkney, Melanesia
    rs7047524    A->G                 DMRT1         G rarer in Native Americans
    rs10741780   T->C                 NAV2          C rarer in Africa, Melanesia
    rs3794060    C->T                 NADSYN1       T rare in Africa, Melanesia; intermediate in Far East; more common in Europe
    rs11063148   C->T                 DYRK4         T more common in Europe, Middle East
    rs7157080    G->T                 HECTD1        T more common in Far East, intermediate in Europe, rarer in Africa
    rs4803866    T->C                 DMPK          C more common in southern Europe, Middle East
     
    Principal Component 2 
     
    rs1446585    A->G                 LCT           A rarer in Africa and Far East, more common in Europe
    rs2612131    C->T                 -             T rarer in Africa
    rs2844513    A->G                 HLA           G rarer in Africa, more common from Europe to Japan
    rs6029180    A->G                 -             G more common in Far East
     
    Of these, rs1473602 surprised me the most. You don't normally think of Orkney and Melanesia as being similar. (Islands, maybe?)
  • I'm still surprised that this works as well as it does, given that there were mass movements of people during the nineteenth and twentieth centuries. 
     
    For Europe prior to 1815, I'd expect it to work. Genealogical records show that people were very often born in the same village as their parents, or the next village along. I would guess the rate of diffusion to be a few km per generation. 
     
    After the Napoleonic Wars, though, it goes nuts. Changing methods of agriculture (e.g. enclosure of land) meant that many rural agricultural labourers were put out of work, and had to move to the major industrial cities. This migration could easily be in the range of 100 km in one generation, or even transcontinental: people emigrating to North America or Australia. 
     
    Moving forward to the Second World War, many people from central Europe fled the Nazis and came to settle in Britain. 
     
    So if you take a British person today, and ask them where their grandmother was born, likely answers range from Aberystwyth to Krakow, even if they answer "white" to an ethnicity question. (Of course there's plenty of evidence of immigration from e.g. India or the Caribbean, too)
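
A toy random-walk calculation shows why the two regimes differ so much. This is a minimal sketch, assuming each generation moves a fixed distance in a uniformly random direction, independently of earlier moves; the 3 km and 100 km step sizes and the generation counts are illustrative choices, not estimates. Under that assumption the root-mean-square distance from the starting point grows only with the square root of the number of generations.

```python
import math


def rms_displacement_km(step_km, generations):
    """Root-mean-square distance travelled by a 2-D random walk whose steps
    have a fixed length and independent, uniformly random directions.
    The expected squared distance after n steps is n * step**2."""
    return step_km * math.sqrt(generations)


# About 750 years of short moves versus 8 generations of long ones.
print(round(rms_displacement_km(3, 30), 1))   # 16.4
print(round(rms_displacement_km(100, 8), 1))  # 282.8
```

Directed moves (towards the industrial cities, say) add up linearly rather than as a square root, so this toy model probably understates the later displacement.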
  • The SNPs in table 2 (SNPs with maximum correlations to PC 1 or PC 2 ... with significant allele frequency differences between different geographic regions in the United Kingdom) are all in the 23andme dataset, so 23andme users will be able to see which of those alleles they have. You'd like to know at least the sign of the correlation (which allele increases as you go north/east, and which decreases), and preferably the coefficient as well, which I don't see in the paper. 
     
    With only these 12 SNPs, the estimate of geographic position that you get will contain a lot of noise, possibly too much to be interesting. But the paper reports reasonable results with a panel of only 391 markers (see figure 5c).
  • If someone has their own SNP data from a service like 23andme, is it possible for them to work out where they are on these graphs? 
     
    That would be cool. (With the obvious disclaimer that the co-ordinate location you get out of it isn't necessarily somewhere that any of your ancestors actually lived - it might even be in the middle of the Atlantic Ocean if you had the right combination of ancestors, like one parent Native American and the other of European ancestry). 
     
    The main technical problem I can see with doing this is that the coefficients you get out of principal component analysis will depend on which subset of SNPs was used in the training set, and the SNP chips used by 23andme might be looking at different SNPs. Other than that, you'd just need the coefficients (and a trivial program to multiply and add).
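
The "trivial program to multiply and add" might look something like this sketch. It assumes the published PCA gives, for each SNP, a mean genotype from the training set and a coefficient (loading) on each of PC 1 and PC 2, and that genotypes are coded as the number of copies (0, 1 or 2) of the same counted allele as in the training data. The SNP names and all numbers are hypothetical. Some PCA software also divides each centred genotype by a per-SNP standard deviation, which this sketch leaves out, and simply skipping SNPs missing from the user's chip is a crude way of handling the training-set mismatch mentioned above.

```python
def project(genotypes, means, loadings):
    """Project one person's genotypes onto two principal components.

    genotypes: SNP -> copies of the counted allele (0, 1 or 2)
    means:     SNP -> mean count of that allele in the training set
    loadings:  SNP -> (PC 1 coefficient, PC 2 coefficient)
    SNPs with no genotype for this person are skipped.
    """
    pc1 = pc2 = 0.0
    for snp, (w1, w2) in loadings.items():
        if snp not in genotypes:
            continue
        centred = genotypes[snp] - means[snp]
        pc1 += w1 * centred
        pc2 += w2 * centred
    return pc1, pc2


# Hypothetical coefficients; real ones would come from the paper's analysis.
means = {"snp_a": 1.0, "snp_b": 1.0, "snp_c": 0.5}
loadings = {"snp_a": (0.5, -0.2), "snp_b": (0.1, 0.3), "snp_c": (1.0, 1.0)}
person = {"snp_a": 2, "snp_b": 0}  # snp_c not on this person's chip

print(project(person, means, loadings))
```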
  • Political Behavior through the Lens of Behavior Genetics

  • If you've only got twin studies, and don't know the mechanism, some caution is in order. 
     
    For example, suppose African-Americans were more likely to vote for Obama; then you'd expect that twins separated at birth (both of whom were African-American) would be more likely to vote for him. As political parties often try to appeal to particular demographic groups (including ethnic groups), you'd expect to see this kind of "heritability" without needing to postulate a gene for voting Democrat. 
     
    It's also interesting that the main results in the cited paper are about whether you vote for/join a political party, not which party you join. 
     
    Compare, for example, supporting a football[*] team. I think you've got a better chance of finding a genetic link with "being a football supporter" than for supporting one team over another. 
     
    In a way, this is encouraging for multi-ethnic democracies. If people's preferences over the actual policy issues had a strong genetic component, it'd be a bigger problem. Imagine some future in which the Scottish National Party is able to point to some biochemical pathway and a map of the geographical distribution of the corresponding alleles, and say that Scotland should have independent government because the Scots are genetically wired to think differently. 
     
    [*] I'm British[**].... for football, substitute baseball if you wish :-) 
     
    [**] Well, Welsh :-)
  • Horse genetics & color

  • In two words: horse racing 
     
    There's potentially a lot of money in a faster racehorse. A genetically modified horse wouldn't be a Thoroughbred by definition, but better understanding of the genetics could help you decide which horses to mate naturally. I know of at least one university with a professor who specialises in horse genetics.
  • The myth of sexual predators: a positive feedback model

  • As one of many possible hypotheses to explain the decline in child abuse, Finkelhor & Jones suggest the rise in prescribing rates of Prozac and other psychiatric drugs. Two possible mechanisms: (a) if depressed parents are more likely to abuse their children, reducing depression might reduce child abuse; (b) drugs with libido-reducing side-effects might reduce the number of sexual offenses.
  • It's a good point that the exponential growth might be in the use of the words "sexual predator" rather than the type of news article. 
    (You might expect that a new word will follow a logistic curve). 
     
    You could try looking at the frequencies of related words, to see if they all rise together. 
     
    It's also worth noting that "sexual predator" anxiety is often found together with Internet anxiety, and usage of the Internet has been rising rapidly. 
     
    I think I agree with the point you're trying to make, but it might be worthwhile eliminating the obvious alternative explanations.
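
The logistic point can be illustrated numerically: in its early stages a logistic curve is almost indistinguishable from exponential growth. This minimal sketch uses made-up parameters (a ceiling of 1000 mentions, growth rate 0.5 per year, midpoint at year 20), not fitted to any real word-frequency data, and prints the year-on-year growth ratio, which starts near e^0.5 (about 1.65, as for pure exponential growth) and falls towards 1 as usage saturates.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """Logistic curve: near-exponential growth that levels off at `ceiling`."""
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))


def growth_ratios(values):
    """Ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


counts = [logistic(t, ceiling=1000, rate=0.5, midpoint=20) for t in range(41)]
ratios = growth_ratios(counts)
print(round(ratios[0], 3), round(ratios[20], 3), round(ratios[-1], 3))  # 1.649 1.245 1.0
```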