
Allen Ancient DNA Resource in PLINK format

The Reich lab released a bunch of data in January 2021, and someone emailed me about the format. I had converted their earlier release to PLINK (PEDIGREE) format, and they wondered if I could do the same for this release. I did so. Remember that the “FAMILY ID” is the population as identified in their annotation files.

Here are the files:

v44.3_1240K_public.bed
v44.3_1240K_public.bim
v44.3_1240K_public.fam

v44.3_HO_public.bed
v44.3_HO_public.bim
v44.3_HO_public.fam
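
Since the “FAMILY ID” carries the population label, a quick way to see which populations are in a release is to tally the first column of the .fam file. Below is a minimal sketch, assuming the standard six-column, whitespace-separated PLINK .fam layout (family ID, individual ID, father, mother, sex, phenotype). The function name `count_populations` and the sample rows are made up for illustration and are not taken from the actual files.

```python
from collections import Counter
import os


def count_populations(lines):
    """Tally samples per family ID (first column) from .fam-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        counts[fields[0]] += 1  # family ID = population label in these files
    return counts


# Hypothetical rows for illustration only (not real entries from the release).
sample_rows = [
    "PopA ind1 0 0 1 -9",
    "PopA ind2 0 0 2 -9",
    "PopB ind3 0 0 0 -9",
]
demo_counts = count_populations(sample_rows)
print(demo_counts)

# If the real file has been downloaded next to this script, summarize it too.
fam_path = "v44.3_1240K_public.fam"
if os.path.exists(fam_path):
    with open(fam_path) as fh:
        for population, n in count_populations(fh).most_common(10):
            print(population, n)
```

The same function should work on v44.3_HO_public.fam by changing `fam_path`, provided the population labels contain no spaces.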

4 thoughts on “Allen Ancient DNA Resource in PLINK format”

  1. In the wee hours of the morning, as I read the subject line in my RSS reader, the word “allen” sure looked a lot like the word “alien”, to my initial surprise and excitement.

  2. For a moment I thought that said Alien Ancient DNA and I got all excited that you were really letting loose now.

    EDIT: Purpleslog beat me to this but in my defense that comment didn’t show up when I first loaded the page.

  3. For a quick estimate of how much this dataset will expand: David Reich’s new talk says about 10-20% of samples are good enough for this sort of treatment. Their published sample set is 3,810 and there are about 200 samples here, so this probably isn’t even as many of the published samples as can get this treatment, and I would expect (optimistically) at least a doubling of this set.

    Reich also states they have about 12k more unpublished samples (https://imgur.com/a/uqWhiXy), so a quick naive estimate is that there could potentially be up to 1.5k of these “medical quality” ancient genomes when they are able to fund it. Including high-coverage efforts from other groups, like the high-coverage La Brana, Loschbour, etc., would push the number up further.

    It’ll be interesting to see what the key characteristics are of the samples that can get this treatment. I had a quick look to see if there was any correlation between shotgun coverage in this set and the 1240k coverage in the latest Human Origins file: https://imgur.com/a/FOVTnQ0. There is some, but lots of outliers. It’s probably a factor though, as average capture coverage of the entire 1240k set is 1.4x with 244202 SNPs, vs these samples having capture coverage of 4.1x and 433463 SNPs.
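
The back-of-envelope numbers in the last comment can be reproduced with a few lines of Python. This is only a sketch of the commenter's arithmetic, assuming the 10-20% rate applies uniformly to both the published and the unpublished sample counts. The function name `usable_range` is made up for illustration.

```python
def usable_range(total_samples, low_pct=10, high_pct=20):
    """Return (low, high) sample counts if low_pct to high_pct percent qualify."""
    return (total_samples * low_pct // 100, total_samples * high_pct // 100)


published = 3810     # published sample set, per the comment
in_this_set = 200    # approximate size of the set discussed in the comment
unpublished = 12000  # "about 12k" unpublished samples, per the comment

share_so_far = in_this_set / published
published_range = usable_range(published)
unpublished_range = usable_range(unpublished)

print(f"share of published samples so far: {share_so_far:.1%}")
print("expected from published samples:", published_range)
print("expected from unpublished samples:", unpublished_range)
```

Under these assumptions, about 5% of the published samples are in the set so far, against an expected 381 to 762, which is consistent with the "at least a doubling" guess. The same rate applied to 12k unpublished samples gives 1,200 to 2,400.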
