
Reanalyzing data, it does a mind good

Mindaugas

There’s been a lot of talk on Twitter and the blogs about PLOS’ new data sharing policy. I don’t have much deep to say, except that I’m for it. From what I can tell, there is a cultural element to the reaction, pro or con. People in genomics seem to be responding along the lines of “yes, of course.” Those in other fields, on the other hand, have less positive reactions.

You can go elsewhere to hear “both sides.” I am confident that this will be the future, and the naysayers will have to deal. One of the major reasons that formalized data release is good is that in a field like genomics there is more data than there are people to analyze it. By this, I mean that you can ask many different questions of data, but you may only be interested in a subset of those questions. Other people in your lab might have different questions, but ultimately you’re probably leaving avenues unexplored because you don’t have the time or inclination. To give you a funny example, a few years ago I stumbled on the fact that Dan MacArthur probably has recent (>200 years) South Asian ancestry. As an academic genomicist Dan could have dug up this fact himself, but he has grants and papers to write, not to mention a non-scientific life. So it was left to me to stumble upon the fact. On the margin it’s not that useful to Dan, but it’s something. You never know what’s going to happen when you release data, because you can’t read the minds of others. And that sort of surprise is a good thing.

One of the greatest intellectual philanthropists in recent years has been Mait Metspalu. He has plenty of publications to his name, but he has also generously assembled and released the data in a convenient form. This allows for easy reanalysis. A few days ago I noticed that he had put up a few more European populations, including understudied groups like Greeks. With the recent flare-up in Ukraine I thought I would process some of the new data. I pruned the data set down to 230,000 high quality SNPs, and focused on two data sets, a large one of 500 individuals and a small one of 340.
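For readers who want to try something similar, here is a minimal sketch of one common way to reduce a marker set to "high quality" SNPs: drop markers with a low call rate or a low minor allele frequency. The post does not say which criteria were actually used, so the function name `filter_snps` and both thresholds are assumptions for illustration only.

```python
import numpy as np

def filter_snps(genotypes, min_call_rate=0.99, min_maf=0.01):
    """Return a boolean mask of SNPs (columns) to keep.

    genotypes: individuals x SNPs array of allele counts (0, 1, 2),
    with np.nan marking a missing call. The thresholds are
    illustrative assumptions, not the values used in the post.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)

    # Fraction of individuals with a non-missing call at each SNP.
    call_rate = n_called / g.shape[0]

    # Allele frequency from non-missing calls only (two alleles each).
    allele_sum = np.nansum(g, axis=0)
    freq = np.divide(allele_sum, 2.0 * n_called,
                     out=np.zeros(g.shape[1]), where=n_called > 0)
    maf = np.minimum(freq, 1.0 - freq)

    return (call_rate >= min_call_rate) & (maf >= min_maf)
```

The mask can then be applied as `genotypes[:, keep]` before any downstream analysis.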

[Figures, in order: two MDS plots, two TreeMix graphs, one ADMIXTURE bar plot]

– As suggested by Dienekes, modern Greeks seem to have been impacted more by northern gene flow (Slavs) than the inhabitants of Magna Graecia (Southern Italy and Sicily)

– There’s not much difference between Poles, Ukrainians, and Russians (though there are Russian samples from traditionally Finnic regions which are more diverse)

– Not much difference between Romanians, Bulgarians, and Hungarians

– The Northern European clusters separate reasonably well: Slavic, Finnic, and Germanic

I’ll leave it to readers to make further comments.

Tools used: Plink 1.9, ADMIXTURE and TreeMix.

Methods: The first two plots are MDS representations of pairwise genetic differences between individuals. I used kerneling to lasso around the centroids of specific populations. The middle two are from TreeMix; I asked for 5 migrations, rooted with outgroups, and allowed global rearrangements. Finally, the last is just ADMIXTURE, run at K = 6. It shows the mean for each population.
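To make the MDS and ADMIXTURE steps a little more concrete, here is a rough Python sketch of the underlying arithmetic. It is not what Plink or ADMIXTURE run internally: it assumes a small genotype matrix with no missing calls, uses a simple allele-sharing distance (one minus the proportion of alleles shared), and applies textbook classical MDS. All function names are made up for illustration.

```python
import numpy as np

def pairwise_distance(genotypes):
    """Allele-sharing distance between individuals.

    genotypes: individuals x SNPs array of allele counts (0, 1, 2),
    assumed complete (no missing calls). Each distance is the mean
    absolute difference in allele count, divided by 2.
    """
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return diff.mean(axis=2) / 2.0

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS from a square distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    top = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top)

def population_means(q, labels):
    """Mean ancestry fractions per population.

    q: individuals x K matrix of ancestry fractions (rows sum to 1),
    labels: one population label per individual.
    """
    q = np.asarray(q, dtype=float)
    labels = np.asarray(labels)
    return {str(pop): q[labels == pop].mean(axis=0)
            for pop in np.unique(labels)}
```

With real data one would compute `classical_mds(pairwise_distance(genotypes))` and scatter-plot the two columns, and feed the per-individual ancestry fractions from an ADMIXTURE run into `population_means` to get the per-population bars.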
