Another genetic map of Europe


I pointed to the paper at my other weblog, but since ScienceBlogs has a narrow page width, I’ve put the important charts below the fold.

Table 4 – Each row of the table shows the proportions of test samples originating from a given country that were assigned to each possible target country. I made a few edits; see the paper for the original.

Populations Spain France Belgium UK Norway Sweden Romania Germany Hungary Slovakia Czech Poland Russia

Spain 0.945 0.055 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
France 0.085 0.515 0.270 0.105 0.000 0.000 0.004 0.014 0.007 0.000 0.000 0.000 0.000
Belgium 0.000 0.086 0.854 0.059 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
UK 0.000 0.009 0.027 0.947 0.000 0.000 0.000 0.017 0.000 0.000 0.000 0.000 0.000
Norway 0.000 0.000 0.000 0.000 0.991 0.010 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Sweden 0.000 0.000 0.000 0.000 0.099 0.901 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Romania 0.000 0.000 0.000 0.000 0.000 0.000 0.960 0.000 0.040 0.000 0.000 0.000 0.000
Germany 0.000 0.000 0.102 0.004 0.029 0.022 0.008 0.644 0.003 0.003 0.177 0.008 0.000
Hungary 0.000 0.000 0.000 0.000 0.000 0.000 0.022 0.051 0.546 0.292 0.090 0.000 0.000
Slovakia 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.077 0.220 0.453 0.250 0.000 0.000
Czech 0.000 0.000 0.000 0.000 0.000 0.000 0.038 0.052 0.161 0.205 0.484 0.062 0.000
Poland 0.000 0.000 0.000 0.000 0.000 0.000 0.008 0.002 0.009 0.025 0.021 0.802 0.134
Russia 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.008 0.008 0.000 0.040 0.944


17 Comments

  1. “Asthma cases”?

  2. this sort of stuff gets many samples from disease studies. and the point of it all is to smoke out genetic substructure which might result in spurious associations.

  3. If someone has their own SNP data from a service like 23andme, is it possible for them to work where they are on these graphs? 
     
    That would be cool. (With the obvious disclaimer that the co-ordinate location you get out of it isn’t necessarily somewhere that any of your ancestors actually lived – it might even be in the middle of the Atlantic Ocean if you had the right combination of ancestors, like one parent Native American and the other of European ancestry). 
     
    The main technical problem I can see with doing this is that the coefficients you get out of principal component analysis will depend on which subset of SNPs were used in the training set, and the SNP chips used by 23andme might be looking at different SNPs. Other than that, you’d just need the coefficients (and a trivial program to multiply and add).
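
    The "multiply and add" step could look something like the sketch below. It is only an illustration: the SNP names, counted alleles, training-set means and loadings in `LOADINGS` are invented placeholders, not values from the paper, and `project` is a hypothetical helper. It also assumes the simplest convention (centre the allele count, multiply by the loading, sum); a real analysis may also scale each SNP by its allele frequency.

```python
# Hypothetical PCA parameters: every name and number here is a made-up placeholder.
# For each SNP: the allele that is counted, the training-set mean count of that
# allele (0 to 2), and the loadings (coefficients) on PC1 and PC2.
LOADINGS = {
    "snp_a": {"allele": "G", "mean": 1.0, "pc1": 0.5, "pc2": -0.25},
    "snp_b": {"allele": "A", "mean": 0.5, "pc1": 1.0, "pc2": 0.5},
    "snp_c": {"allele": "T", "mean": 1.5, "pc1": -0.75, "pc2": 0.125},
}


def project(genotypes, loadings):
    """Project one person's genotypes onto PC1 and PC2.

    genotypes maps SNP name -> two-letter genotype string such as "AG".
    SNPs that are missing, or not called ("--"), are skipped, so the result
    only uses the overlap between the person's chip and the training panel.
    Returns (pc1, pc2, number_of_snps_used).
    """
    pc1 = pc2 = 0.0
    used = 0
    for snp, params in loadings.items():
        genotype = genotypes.get(snp)
        if genotype is None or "-" in genotype:
            continue  # SNP not on this chip, or no call
        count = genotype.count(params["allele"])  # 0, 1 or 2 copies
        centered = count - params["mean"]
        pc1 += centered * params["pc1"]  # multiply ...
        pc2 += centered * params["pc2"]  # ... and add
        used += 1
    return pc1, pc2, used


if __name__ == "__main__":
    me = {"snp_a": "GG", "snp_b": "AG", "snp_c": "--"}
    print(project(me, LOADINGS))
```

    The count of SNPs actually used is returned alongside the coordinates because, if the chip covers only part of the training panel, the position is estimated from fewer markers and will be noisier.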

  4. Aaaah! Thanks, Razza.

  5. The SNPs in table 2 (SNPs with maximum correlations to PC 1 or PC 2 … with significant allele frequency differences between different geographic regions in the United Kingdom) are all in the 23andme dataset, so 23andme users will be able to see which ones of those they have. You’d like to know at least the sign of the correlation (which allele increases as you go north/east, and which decreases), and preferably the coefficient as well, which I don’t see in the paper. 
     
    With only these 12 SNPs, the estimate of geographic position that you get will contain a lot of noise – possibly too much noise to be interesting. But the paper reports reasonable results with a panel of only 391 markers – see figure 5c.

  6. I’m still surprised that this works as well as it does, given that there were mass movements of people during the nineteenth and twentieth centuries. 
     
    For Europe prior to 1815, I’d expect it to work. Genealogical records show that people were very often born in the same village that their parents were, or the next village along. I would guess the rate of diffusion to be a few km per generation. 
     
    After the Napoleonic Wars, though, it goes nuts. Changing methods of agriculture (e.g. enclosure of land) meant that many rural agricultural labourers were put out of work, and had to move to the major industrial cities. This migration could easily be in the range of 100km in one generation, or even transcontinental – people emigrating to North America or Australia. 
     
    Moving forward to the Second World War, many people from central Europe fled the Nazis and came to settle in Britain. 
     
    So if you take a British person today, and ask them where their grandmother was born, likely answers range from Aberystwyth to Krakow, even if they answer “white” to an ethnicity question. (Of course there’s plenty of evidence of immigration from e.g. India or the Caribbean, too)

  7. I took the SNPs from table 2 in the paper, and looked at the geographical distribution in the HGDP browser. HGDP data confirms that these SNPs all show strong geographical variation: 
     
    Principal Component 1 
     
    (SNP) (ancestral->derived) (nearby gene) (comments) 
     
    rs6531684 A->G TLR1 G more common in Africa, middle East 
    rs1473602 A->G IRF4 G more common in Orkney, Melanesia 
    rs7047524 A->G DRMT1 G rarer in Native Americans 
    rs10741780 T->C NAV2 C rarer in Africa, Melanesia 
    rs3794060 C->T NADSYN1 T rare in Africa, Melanesia; intermediate in far East; more common in Europe 
    rs11063148 C->T DYRK4 T more common in Europe, middle East 
    rs7157080 G->T HECTD1 T more common in Far East, intermediate in Europe, rarer in Africa 
    rs4803866 T->C DMPK C more common in southern Europe, middle East 
     
    Principal Component 2 
     
    rs1446585 A->G LCT A rarer in Africa and far East, more common in Europe 
    rs2612131 C->T – T rarer in Africa 
    rs2844513 A->G HLA G rarer in Africa, more common in Europe to Japan 
    rs6029180 A->G – G more common in far East 
     
    Of these, rs1473602 surprised me the most. You don’t normally think of Orkney and Melanesia as being similar. (Islands, maybe?)

  8. “…e.g. enclosure of land .. meant that many rural agricultural labourers were put out of work”: this is widely repeated in England, but apparently there’s little evidence for it. It doesn’t make much sense – to first order, the fact that the landowners now had their acreages in contiguous lumps rather than scattered portions doesn’t alter their demand for labour. To begin with, it would increase demand for labour as they put in new hedges and so on. One possible effect is that enclosure might have led to people being much more conscious of their commoners’ rights, leading to ejection of labourers who had been squatting on land that belonged to someone else, and using common land on which they had no rights, but I’ve no idea whether there’s much evidence to show how many or few that might be.

  9. I’m surprised too at how small the share of Swedish and Germans are in the UK, given the Viking and Anglo-Saxon invasions. 
     
    It seems to me that I’ve seen other studies showing that the share of “Celtic” genes is about the same in Ireland, England and Scotland; slightly higher in Ireland and Scotland, but not much. Don’t know if that’s relevant.

  10. I’m surprised too at how small the share of Swedish and Germans are in the UK, given the Viking and Anglo-Saxon invasions. 
     
    the swedes went to the east baltic, not to england. that was more the purview of norwegians and danes (yes, i know these are anachronistic terms). so why are you surprised? and look at how different munich and dresden are. the anglo-saxons didn’t come from germany, they came from the lands between frisia and denmark. 
     
    also, some of the other studies looked at Y chromosomal lineages. that will give a different snapshot.

  11. There have been about 40 generations of interbreeding since the Vikings were in Britain, so by now there would be few people with a ‘pure’ Scandinavian genetic profile.

  12. IRF4 is a good one. In the HGDP dataset, linkage disequilibrium (measured by XP-EHH) shows a marked rise over surrounding areas in the line for “Oceania”. The per-continent haplotype plots show a long uninterrupted haplotype in all continents except Africa. The per-population haplotypes plots show a long haplotype in several populations, especially the Papuan, Melanesian and Orcadian (the populations that also have the highest frequency of rs1473602=G), but also in other European populations (e.g. Russian) and Native American (e.g. Pima, Maya).

  13. I wrote: 
     
    Of these, rs1473602 surprised me the most. You don’t normally think of Orkney and Melanesia as being similar. (Islands, maybe?) 
     
    The nearby SNP rs12203592 (also in IRF4) has been associated with skin colour, eye colour and tanning response. 
     
    Orkney is far north, where you’d expect fair skin to be especially advantageous. So it’d be interesting to know what effect these SNPs have in Melanesians.

  14. In the enclosures some land was put to a different use (sheep) which was less labor intensive, and large estates can be run more efficiently (using less labor) than small farms. Common land was often privatized.  
     
    Talking about “squatters who had no rights” begs a number of questions. Rights were redefined during this period, and some people lost rights they’d previously had under a different legal system. It’s a controversial issue, and I’m not up to date on it, but your arguments aren’t very convincing.  
     
    As a general rule, economic development involves moving labor from agriculture to industry and reducing the relative economic power of agriculture (and other primary production such as fishing and mining) compared to industry, trade, and finance. Locally things can be different, of course.

  15. Perhaps you identified them in a previous post, but what are Component 1 & Component 2, the axes of this chart? If Component 1 is latitude and Component 2 is longitude, there is nothing remarkable about this chart. Since this is “Gene Expression”, I suppose these components are some quantities related to genes, but that information is not provided on the chart or in the post.

  16. I’ve taken the SNPs from table 2 (all of which had significant geographical variation within both the UK and Europe), and computed a chi-squared statistic for each SNP across the 4 Hapmap populations (CEU, CHB, JPT, YRI). The idea behind this is that if an allele varies in frequency on a local scale, we would expect it to vary on a wider scale too. 
     
    SNP Chi2 p 
    r6531684 97.08 6.60E-21 TLR1, etc. 
    r1473602 8.08 4.44E-02 
    r7047524 3.65 3.02E-01 
    r10741780 127.16 2.22E-27 NAV2 
    r3794060 92.31 6.97E-20 
    r11063148 37.99 2.85E-08 
    r7157080 115.76 6.31E-25 HECTD1, etc. 
    r4803866 1.80 6.16E-01 
     
    r1446585 305.30 7.08E-66 LCT 
    r2612131 10.07 1.80E-02 
    r2844513 62.34 1.86E-13 HLA 
    r6029180 67.88 1.21E-14 
     
    You need to take a little care interpreting the “p” value, because of related individuals in the hapmap datasets (etc.), but it gives a rough idea. 
     
    As expected, most of these SNPs show significant variation between Hapmap populations. A few of them don’t: r4803866, r7047524, r1473602. 
     
    I can think of several possible reasons for this: 
     
    a) I made a mistake in calculating these 
     
    b) There is variability, but the Hapmap datasets are too small to detect it (the original paper used larger datasets) 
     
    c) These alleles vary in frequency within Europe. It is just about possible that the ancestors of the people in Hapmap CEU came from a part of Europe where the allele frequency is much the same as the rest of the world, missing the parts of Europe where frequency is higher. (e.g. they’re not from far enough north, south, east or west) 
     
    d) These alleles don’t really vary in frequency, and only appear to do so in the dataset used by the original paper due to chance fluctuations. (When you have a huge number of dimensions, and relatively few points, you expect a lot of noise in the coefficients). 
     
    What I’d really like right now is data for 100 or so people at two known points in Europe (rather than somewhere like Utah, where the entire European-origin population is due to immigration within the past ~500 years). 
     
    It should also be possible to do some analysis if we had the SNPs of several hundred people scattered all over Europe, and knew the approximate latitude and longitude of each of them.
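
    For anyone who wants to check reason (a), the calculation can be sketched roughly as below. This is an assumed reconstruction, not the exact procedure used for the table above: it treats each SNP as a 2 × 4 table of allele counts (two alleles, four populations) and uses a Pearson chi-squared test with 3 degrees of freedom, which is at least consistent with the chi2/p pairs listed (8.08 gives p of about 0.044). The allele counts in the example are invented placeholders, not Hapmap data.

```python
import math


def chi2_allele_test(counts):
    """Pearson chi-squared statistic for allele counts across populations.

    counts maps population name -> (count of allele 1, count of allele 2).
    Returns (chi2, degrees_of_freedom).
    """
    pops = list(counts)
    allele_totals = [sum(counts[p][i] for p in pops) for i in (0, 1)]
    grand_total = sum(allele_totals)
    if min(allele_totals) == 0:
        raise ValueError("allele absent everywhere; the test is undefined")
    chi2 = 0.0
    for p in pops:
        pop_total = sum(counts[p])
        for i in (0, 1):
            # Expected count if allele frequency were the same in every population.
            expected = pop_total * allele_totals[i] / grand_total
            chi2 += (counts[p][i] - expected) ** 2 / expected
    return chi2, len(pops) - 1


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


if __name__ == "__main__":
    # Invented allele counts, for illustration only.
    example = {"CEU": (70, 50), "CHB": (40, 50), "JPT": (38, 52), "YRI": (20, 100)}
    chi2, df = chi2_allele_test(example)
    print(chi2, df, chi2_sf_df3(chi2))
```

    The closed-form tail probability only applies to 3 degrees of freedom, i.e. exactly four populations; with a different number of populations a general chi-squared distribution function would be needed.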

  17. John Emerson: 
     
    I really don’t know much about this specific history but still suspect that you’ve got a garbled notion of what “went on” or are, at least, confusing the sequence of events. 
     
    The “general tendency” isn’t to move labor from one specific purpose to another or to favor one economic sector over another by increasing or decreasing its relative prominence. The general tendency (of all, to the extent they are able to see the alternatives clearly) is to economize (that is, to use more efficiently) whatever magnitude is available and subject to such variance in an effort to increase net returns (and to concentrate on such increase–on the part of government–as seems likely to be of benefit to the government itself). It is unremarkable that the welfare of agricultural laborers drew little consideration; they were–at least at that time–considered politically unimportant. 
     
    My “take” would be that some change in the processing of wool (vis-a-vis other agricultural output) or its conversion into items of apparel had already occurred which raised the returns that could be expected from such activity and from every increase (and, concomitantly, the “returns” that could be levied on both producers and processors by government enabling of such increase). That the labor formerly devoted to agricultural tasks was, in a short time, dislocated en masse with nowhere to turn for gainful employment except in industry engaged in the processing of wool and apparel–too small at first to absorb more than a fraction of their number–is simply commentary on the relatively small value added by the labor itself in either employment, especially before the industrial system had become more pronounced than it was. 
     
    I’m sure people suffered in these dislocations and there were, no doubt, grave injustices as well. But context would be rendered more appreciable by looking at population figures, mortality rates, etc. (standard-of-living indicators), over intervals of 10, 20, and 50 years as compared with other, especially former, periods. I don’t know what these would show (but I’d be surprised if my guesses were wrong!).
