
Razib Khan 30x whole-genome sequence data

About four years ago I posted my genotype data for anyone who wanted it. This included the raw export files from consumer genomics firms, plus my VCF file generated by Dante Labs.

Today I am making all of my raw data from Dante Labs public. This means you can access:

– raw reads
– .bam file
– .vcf files, as well as files with CNVs and SVs

This is all 30x coverage so be warned these aren’t the smallest files. Here is the link to my data.
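For anyone who wants a quick look at a .vcf file before loading it into heavier tools, here is a minimal sketch in plain Python. It assumes a standard tab-separated VCF (optionally gzip-compressed) and simply counts single-nucleotide variants versus everything else. The function names and the file name in the usage comment are hypothetical, chosen only for illustration, and do not refer to the actual files in the download.

```python
import gzip
from collections import Counter

BASES = set("ACGTN")


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_variants(lines):
    """Tally SNPs vs. other variants from an iterable of VCF lines.

    An ALT allele counts as a SNP when both REF and ALT are a single
    base; indels, symbolic alleles, and so on are counted as "other".
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines (they start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed VCF record
        ref = fields[3].upper()
        # Multi-allelic sites list several ALT alleles separated by commas.
        for alt in fields[4].upper().split(","):
            if len(ref) == 1 and len(alt) == 1 and ref in BASES and alt in BASES:
                counts["snp"] += 1
            else:
                counts["other"] += 1
    return counts


# Usage (hypothetical file name):
# with open_text("sample.vcf.gz") as handle:
#     print(count_variants(handle))
```

Because the function reads one line at a time, it should cope with a whole-genome VCF without loading the file into memory, though it will be slower than dedicated tools.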

If you find something noteworthy, reach out to me! For those who want geographic provenance, seven of my eight great-grandparents were born in the Comilla region of modern Bangladesh. The eighth was born in the Noakhali region, just to the south of Comilla.

9 thoughts on “Razib Khan 30x whole-genome sequence data”

  1. I find your initiative interesting and courageous. I did the same with my data. My FGC (which cost 1850 dollars / 1350 euro) was put at the disposal of Emory University at the request of the same Justin Low, and my Dante Labs data was sent to YFull. Of course I used my data pretty much only for the uniparental markers, apart from MyTrueAncestry. I never had the PC and the programs for doing more, so I used the data of others, above all YFull and Genetiker while he was active. If you want I may put my Dante Labs account at your disposal for doing the same, hoping that my data are still available. I also bought WinRAR, tired of using a trick for reading the compressed files.

  2. Who do you recommend for Whole-Genome Sequencing? The Dante Labs website seems to have issues. What do you think of Sequencing.com?

  3. What is the difference between the BAM format and the raw reads (assuming FASTQ)?

    I’ve been following this area for 4 years now, and to be honest have not really bothered to learn the specific differences between the files.

    Thanks

  4. I did Nebula 30x, and I am happy with my results. However, before getting Whole Genome Sequenced, I tested with 23andme, Ancestry, FTDNA, Helix Genographic 2.0, and I combined all of them with a superkit with over 1 million SNPs. My Nebula file is over 4 million SNPs, but the results are roughly the same as the Superkit in calculators. Of the consumer genomic tests, Ancestry was the closest to the WGS 30x, and Superkit.

  5. I also tested with Living DNA, it was also part of the Superkit. The Living DNA raw data file was very similar to that of 23andme V5.

  6. Hi Razib,

    Other novices may want to know the Broad Institute IGV tool comes preloaded with a human genome and the option to pull other genomes from their database.

    https://software.broadinstitute.org/software/igv/download

    Have you thought of offering the DIY DNA class again? You could charge participants a fee if that would make it worth your while. I certainly would be interested (now that I have access to a viewer).

    Thanks!

    PS: I finally did manage to get your .bam and .bai onto my thumb drive fwiw
