Y chromosomes around the Baltic

A new paper on rare Y chromosomal lineages around the Baltic, Phylogenetic history of patrilineages rare in northern and eastern Europe from large-scale re-sequencing of human Y-chromosomes:

…a considerable number of men in every population carry rare paternal lineages with estimated frequencies around 5%…Here we harness the power of massive re-sequencing of human Y chromosomes to identify previously unknown population-specific clusters among rare paternal lineages in NEE. We construct dated phylogenies for haplogroups E2-M215, J2-M172, G-M201 and Q-M242 on the basis of 421 (of them 282 novel) high-coverage chrY sequences collected from large-scale databases focusing on populations of NEE. Within these otherwise rare haplogroups we disclose lineages that began to radiate ~1–3 thousand years ago in Estonia and Sweden and reveal male phylogenetic patterns testifying of comparatively recent local demographic expansions. Conversely, haplogroup Q lineages bear evidence of ancient Siberian influence lingering in the modern paternal gene pool of northern Europe…

For context, over 90% of the Y chromosomal lineages in Northeastern Europe localize to just four haplogroups: R1a, R1b, I1, and N3 (TAT-C). R1a and R1b are associated with early Indo-Europeans. I1 is local to European hunter-gatherers, but probably got integrated early on into the Corded Ware lineages (it shows recent star phylogeny). N3 is associated with the male-mediated expansion of Siberians over the last 3,000 years and the expansion of Finnic languages in the region.

Taking a step back, it’s rather shocking how high the frequency of these common lineages is here. Finland stands out: “the screened sample of 506 Finnish males we did not detect any rare NEE lineages as almost all Finnish samples belong to hgs common among neighbouring populations – a probable reflection of either differing migration history or of demographic bottleneck(s) that have affected the Finnish population.” This is partly due to the overwhelming dominance of N3 in Finland. But it is also a function of the fact that Neolithic agriculture never took root in Finland. The “Neolithic” ancestry in Finland is due to Corded Ware migration, and that varied depending on the Corded Ware population (some of the early Corded Ware in Estonia seem to have been pure Yamnaya).

G-M201 seems to be a survival from European farmers. The low frequency of this lineage shows the great winnowing of older paternal lineages with the arrival of Corded Ware. The paper is not totally clear about J2, but that too might be a survival. The authors attribute the E2 lineage to hunter-gatherers associated with the Villabruna culture, because the coalescence with Middle Eastern lineages is too ancient (E has been found in Villabruna). That seems weak. But the result on Q is fascinating. I assumed it came with the Corded Ware or the Siberian migration. But it’s not really found in the Finns, and the Estonian lineages seem to be derived from the more common Swedish ones? The authors infer from this that it’s a hunter-gatherer (Scandinavian hunter-gatherer) survival, as this lineage has been found in Mesolithic European populations. I still think it might be due to Corded Ware, as Q is found in some of the Sintashta too. But it warrants further investigation.


  1. You don’t see Y DNA studies much anymore. Thanks for sharing.

    Q1a in Europe today looks like the mysterious remains of a brother clade to R1b, R1a who didn’t join the Indo European system.

    But people forget that before Corded Ware, R1a was just as rare as Q1a. It used to be all about R1b. R1b was number one originally! R1a, Q1a were both rare.

    Well, actually, Q1a was very common in Central Asia & Siberia among relatives of ancient Eastern Europeans, while R1a was common nowhere.

    Corded Ware R1a M417 founder saved R1a from extinction. If not for that glorious R1a M417 great grandfather whoever he was, R1a today would be like Q1a. A strange mysterious brother clade to R1b that no one talks about.

    Then Corded Ware R1a went on to replace the old R1b lineages in hunter gatherers in Northeast Europe. Who we have now forgotten about.

    Of course Davidski is happy to talk about this R1b replacement. He is some type of R1a supremacist.

  2. Davidski’s a funny guy with some screws loose. I remember watching a talk of his, and I was surprised to see how cordial he acts at conferences, given that I had only known his online persona.

    Sometimes I get pretty sad about the absolute collapse of human diversity in Eurasia. Lots of interesting history, divergent culture likely lost. But I’m sure that we retain the majority of the most useful genes of history’s demographic losers.

  3. On a wider tangent about y-dna: from a genetic-historical perspective, one of the things I find annoying is that we really should be able to have a good national-scale set of frequency tables, somewhere within academia, based on really rock-solid large sample sizes for each European nation, something like at least 2-5k.

    But as far as I can tell, there isn’t anything like this, and it’s all a mix of smaller studies that get cobbled together by amateur groups who have their own biases (where those with a more southern European or northern European constitution often seem biased to represent particular haplogroups).

    For example, if I want to look up the overall constitution of y haplogroups in the Netherlands, there are a couple of good studies which are mostly talking about other things, and both have a good 1000-2000 sample size and find R: 60%, I: 26%, Others (E, G, J mainly): 15%.

    So we know in Netherlands about 1/6 men do not trace their ancestry to R or I today. But even here, the I is not broken down into I1 and I2, which have quite different stories in historical terms.

    To look at this, I can easily find a source for neighbouring Britain, with a random resource that has a sample size of 5k kids that tells us that I is about 20% of their sample, and then from that, I2 clades represent about 40% of the I total (ergo scaled up about 8% of the sample or 1/13 males). I could combine these estimates to say about 8% of Netherlands is I2 (assuming some balance between I1 being higher frequency in Netherlands and slightly higher overall I frequency).

    Then we potentially have a story where about 1/5 or 1/4 men in NLD today have a male line ancestor that is not R1 or I1. And then we can start thinking about the history of how we get there from something like the Beaker Culture which seems 100% R1b-M269. (Is this the imprint of Bronze Age complex societies from the south, Iron Age Celtic expansions, Roman expansions, south->north mercantile prosperity through medieval era?).

    But it’s taken a lot of cobbling together to get there, and as much as I complain about amateur biases, I have my own amateur biases here. And this is for Western European countries that are probably better represented in terms of y-dna. It seems really strange to me that in the era of Big Consumer Genetic Data, there just isn’t anything that easily gives us a better alternative.

    (The lack of resolution between I1 and I2 is quite a common thing in big studies; take this big Polish study from last year. It has a good size and very comprehensive regional coverage, and gets overall figures of R (71.02%), I (15.71%), N (4.29%), with the remaining 10% scattered, which is pretty much expected. But how much of that I is I1? We can expect it to be a lower proportion of the total than in the Netherlands or Britain, since I1 is well correlated with Germanic languages, but how much less? Studies typically suggest an I1:I2 ratio of 1:2 in Poland, so we would expect about 10% of Poles to have an I2 male ancestor and 5% to have an I1 male ancestor. But this is not very solid.)
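The back-of-envelope scaling in the comment above can be reproduced in a few lines. This is only a sketch of the arithmetic: the function names are hypothetical, and the inputs are the commenter's rough figures (I at about 20% of the British sample with I2 at about 40% of that; I at 15.71% in the Polish study with an assumed I1:I2 ratio of 1:2), not values checked against the studies themselves.

```python
def share_of_sample(parent_share, child_fraction):
    """Scale a sub-clade's fraction of its parent haplogroup up to the whole sample."""
    return parent_share * child_fraction


def split_by_ratio(total, a, b):
    """Split a total frequency into two parts in the ratio a:b."""
    whole = a + b
    return total * a / whole, total * b / whole


# Britain (commenter's figures): I is ~20% of the sample, I2 is ~40% of I.
british_i2 = share_of_sample(0.20, 0.40)

# Poland (commenter's figures): I is 15.71%, assumed I1:I2 ratio of 1:2.
polish_i1, polish_i2 = split_by_ratio(0.1571, 1, 2)

print(f"Britain I2: {british_i2:.1%} (about 1 in {1 / british_i2:.1f} men)")
print(f"Poland I1: {polish_i1:.1%}, I2: {polish_i2:.1%}")
```

The results are only as good as the assumed ratios, which is the commenter's point: without a direct I1/I2 breakdown in the large studies, these remain rough estimates.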

  4. @Jacob: Sometimes I get pretty sad about the absolute collapse of human diversity in Eurasia. Lots of interesting history, divergent culture likely lost. But I’m sure that we retain the majority of the most useful genes of history’s demographic losers.

    Yeah, like for example in Central Asia, I would tend to think that, even if only 1/200 of the ancestors of the MLBA steppe cultures were from these ancient Central Asian “ANE” like cultures, and essentially below the resolution of detection or close to it, that’s possibly enough to actually seed through the world any really useful reasonably common variant that could be subject to selection, if they had any which other populations didn’t. (Or maybe my assumptions about drift are wrong here!). But the sort of loss of a selected human phenotype (and culture etc) is kind of a shame. Hopefully we can reconstruct at least what some of these people were like through ancient dna analysis, even if it’s not the same as having them around.

    Re: “demographic losers”, that’s a fair term, but it might also benefit from being set into the context of where these groups started from. The Jomon of Japan, for example, are about 10-15% of Japanese today, so that’s about 12-20 million virtual people. That isn’t very much set alongside the 1600 million of Han Chinese, for example. But when I’ve computed some Fsts, the ancient dna seems to show that the Jomon went through some kind of really harsh bottleneck in their history: just huge Fsts, comparable to the most bottlenecked WHG Mesolithic population, in Italy. This was not so much the case for the mainland East Asians (even from pre-farming pottery neolithic hunter-gatherers, it seems). So the equivalent of 20 million people today is not necessarily as “bad” a “performance” as we might think if we calibrated to that starting point (even if there is still a relative difference). The case of WHG is maybe similar to some degree, comparing where WHGs started from after the LGM to the Anatolian populations.

  5. ‘I2a’ was not mentioned here at all. Some of them migrated to the Baltic (where their genetic cousins, I1, had already migrated) and the British Isles before Yamnaya came to Europe. Some linguists proved that their Vincan language later influenced the much younger Proto-Germanic language. I wrote about this in Brown Pundits. There is a good I2a map in Eupedia where we can see their presence in the Baltic.

  6. I1 isn’t a local Baltic HG. The Baltic HGs were mostly different kinds of R1a, R1b and I2. There is also one Q1a among Baltic HGs but no I1 whatsoever. Given how I1 also has a relatively young age it might or might not have originated in a hunting population. It could have, but definitely not in the Baltic region and it wasn’t introduced there by hunters. Most Baltic HG were R1.

  7. Razib – I1 is local to European hunter-gatherers, but probably got integrated early on into the Corded Ware lineages (it shows recent star phylogeny).

    This is a fair comment and hypothesis, but here is a contrary hypothesis I think is worth considering:

    The expansion time of I1 looks pretty similar to that of R1b-M269 and R1a-M417, judging by Kivisild 2017, around 5kya. So putatively an Indo-European linked starburst. (Albeit there might be some challenges to this timeline if some recent cell-line experiments showing strong differences in mutation rates between haplogroups bear out.)

    On the other hand, Sardinia shows a strong I2-M26 starburst in modern dna. Ancient dna seems to corroborate this haplogroup becoming more frequent at post 2500 kya, in the same kind of timeframe, though at the start it is not represented at anything like today's frequencies.

    But I think this is before the signals of any steppe ancestry turn up on the island and seems hard to associate with Indo-Europeans. They think there’s 0% Steppe ancestry until after this time-frame according to all their models. And the usual argument is that a non-Indo-European Paleo-Sardinian persists until around the Latin era. (So between the two of these I think an argument that “Oh, Indo-European patrilineal culture got to Sardinia via a wave so diluted without any steppe ancestry or y chromosomes” seems kind of weak). I think that may challenge an equation of “star-phylogeny” with “Indo-European speaking”.

    This makes me wonder if there is a general short and sharp shift to male-reproductive skew at this time, which is amplified in low population regions within Western Eurasia by the development of more productive pastoral economies. Just something more general than IE but which happens at the same time. “Heroic Age” of high skew to early kings and heroes (even if the culture is no more or less patriarchal, just higher variance within males) before massed groups of people and bringing in all male stakeholders to fight en-masse became important again…?

    (Shades of the argument TC Zeng and his collaborators advanced, of a short sharp period of high male-reproductive inequality that then gave way as male kinship groups transitioned to societies with better ability to incorporate cross-patrilineal kin alliances and really win battles and kill off any strict patrilineal holdouts).

    That might leave I1 open to question – was it an early Indo-European incorporated group, or some other group that, like paleo-Sardinians, experienced a starburst through a change in culture and was only later incorporated into an IE speaking group? (And some people argue that Germanic bears a particularly strong non-IE HG imprint and influence of course).

    There are some indications that some high HG populations may have persisted late in Central Europe, in that many of the samples we have from Hungary and Serbia in the Bronze Age have oddly high HG proportions. Also note this forthcoming ISBA abstract: “Gerber Dániel et al. Uncanny genetic proportions from Hungary suggest a long lasting Hunter-Gatherer ancestry in Central Europe at the Bronze Age.” There might be some possibility that I1 is not from where we think it is, and is from somewhere quite different and only boomed large in the traditional Germanic urheimat… (Caveat: I don’t think any of these high HG “uncanny” samples actually have I1 so far!).


  8. @Razib,

    I’m looking forward to your talk with Dr. Kristian Kristiansen.

    You should mention to him that…..
    Tollense battle, 1200 BC, in Germany is mostly I2a2.

    I made a YouTube video about it.

    You can reference this. They got dozens of Y DNA samples. Mostly I2a2.

    Tollense is in northern, northern Europe, but the warriors are mostly I2a2. They also have only 30% Steppe ancestry, which is less than the 50% standard.

    They’re a part of a bigger family we see in Bronze Age ancient DNA in Poland, Hungary, and Serbia. High WHG, lots of I2a, “low” Steppe ancestry.

    The oldest example is in Corded Ware Poland, ID N49, 2700 BC. He carries Y DNA I2a2. He is directly related to the Tollense warriors.

    So, there seems to have been a WHG-rich clan in Central Europe who “survived” the Corded Ware invasion well. They kept their Y DNA, remained under 50% Kurgan, and remained non-Indo-European.

  9. We have only been able to confirm that I2a was completely replaced by R1b and R1a in the Baltic area and in Western Europe.

    We have Bronze age DNA from both places.

    In Scandinavia there is I1.
    In Tollense battle we see I2a.

    Unetice culture shows a mix mainly of R1b and R1a, but also significant I2.

    It is safe to assume that Poland to Ukraine was mostly R1a in the Bronze Age. But there were probably I2a holdouts, which eventually got absorbed by the historical era.

    But…survived in the form of I2a1b in modern Southern Slavs.

