Forensic genetics after Golden State Killer

It’s been a year and a half since the Golden State Killer was arrested. That was a big day in the genetics community, as genealogy was leveraged for forensics in a big way. One of the people who I began to have discussions with regarding this development was my friend David Mittelman. Since then David has started his own forensic genetics company, Othram.

He moves fast!

But there’s a major issue with any project moving forward into this space: the strange ethical grayland of genomic databases. A lot of the breakthroughs are coming through GED Match, a site that feels like it stepped out of the late 1990s, with both the innocence and design sense of that period. You’ve probably read about the fire which the proprietors of GED Match have come under due to confusions about terms of use. Curtis Rogers, a co-founder of GED Match, thinks it’s a “distraction.” Certainly, it has been for him.

GED Match is great, and the founders tried to do great things with the best of aims. But the world comes at you fast.

As someone who has put their own genotype into the public domain, I’m not super worried about privacy. Yaniv Erlich of MyHeritage was one of those aggressively asserting that he would be happy for people to solve violent crime with his genotype when the Golden State Killer was caught. Many of us feel that way, though not all of us.

To get at the forensic and criminal justice aspect of genomics, and around some of the ethical hurdles of prior databases, David’s company has created a new database, DNA Solves. Since it was designed and coded this year it definitely feels 2019. I uploaded some of my raw genotype data and it was very easy and quick. The FAQ is explicit in what the aim here is. Othram is a forensic genetics firm that gains from public buy-in, but the current options are not optimal. Everyone is worried that GED Match will get shut down. There need to be alternatives out there.

This database is aimed only at helping law enforcement. There’s no public search. And, David told me they’re only going to return matches, not the whole genotype. This is basically a tool that allows people who want to get involved to remain involved.

If you are as open about your genes as I am, I’d recommend checking it out.

(in the near future they will begin providing “reports” to people who volunteer to upload to get the database bigger)

Note: Dante is telling me that my sample is being sequenced. I will be posting my whole genome online soon (I promised about a decade ago that I’d do this if I got WGS).

On the whole genomics will not be individually transformative…for now

A new piece in The Guardian, ‘Your father’s not your father’: when DNA tests reveal more than you bargained for, falls into one of the two major genres in writing on personal genomics in the media right now (there are exceptions). First, there is the genre where genetics doesn’t do anything for you. It’s a waste of money! Second, there is the genre where genetics rocks our whole world, and it’s dangerous to one’s own self-identity. And so on. Basically, the two optimum peaks in this field of journalism are the banal and the sinister.

In response to this, I stated that for most people personal genomics will probably have an impact somewhere in the middle. To be fair, someone reading the headline of the comment I co-authored in Genome Biology, Consumer genomics will change your life, whether you get tested or not, may wonder at the seeming contradiction.

But it’s not really there. On the aggregate social level genomics is going to have a non-trivial impact on health and lifestyle. This is a large proportion of our GDP. So it’s “kind of a big deal” in that sense. But, for many individuals, the outcomes will be quite modest. For a small minority of individuals, there will be real and important medical consequences. In these cases, the outcomes are a big deal. But for most people, genetic dispositions and risks are diffuse, of modest effect, and often backloaded in one’s life. Even though it will impact most of society in the near future, its touch will be gentle.

An analogy here can be made with BMI, or body mass index. As an individual predictor and statistic, it leaves a lot to be desired. But, for public health scientists and officials, aggregate BMI distributions are critical to getting a sense of the landscape.

Finally, this is focusing on genomics where we read the sequence (or get back genotype results). The next stage that might really be game-changing is the write revolution. CRISPR genetic engineering. In the 2020s I assume that CRISPR applications will mostly be in critical health contexts (e.g., “fixing” Mendelian diseases), or in non-human contexts (e.g., agricultural genetics). Like genomics, the ubiquity of genetic engineering will be kind of a big deal economically in the aggregate, but it won’t be a big deal for individuals.

If you are a transhumanist or whatever they call themselves now, one can imagine a scenario where a large portion of the population starts “re-writing” themselves. That would be both a huge aggregate and individual impact. But we’re a long way from that….

Razib Khan’s raw genotype data on 23andMe, Family Tree DNA, Geno 2.0 and Ancestry

It has been a while since I posted an update on my genotype. Since then I’ve been tested on most of the major platforms. I don’t see any harm in releasing this to the public or researchers who want to look at it (though I don’t know why anyone would).

You can download all the files here.

Having my genotypes public is pretty useful for me. If I inquire about someone’s genetics oftentimes people get weirdly defensive and ask “what about you?” I just invite them to look at my raw data and analyze it for themselves! I’m not a hypocrite about this.

Over the years I’ve had researchers inquire about my ethnicity when they stumble upon my genotype on platforms such as openSNP. So in full disclosure, most of my ancestry is pretty standard eastern Bengali. I’m more East Asian shifted than most Bangladeshi samples in the 1000 Genomes project, but then my family is from Comilla, in the far east of eastern Bengal (anyone who cares, my Y is of course R1a1a-Z93 and my mtDNA U2b).

As before I’ll put the genotype under a Creative Commons license.

Bank your exome with Helix for free ($0.00) [update, SALE ENDED!]

Update: Sale over!

I wasn’t going to do this again, but I’ve decided to promote Helix’s special discount. It ends at 2:59 AM EDT November 10th. Eight hours from when I push this post.

Obviously, there is a conflict of interest as I work for one of Helix’s partners. What does that mean?

  • Helix does an exome+ sequence and stores your data.
  • Then, you buy applications which use that data.
  • The company I work for is one of the application providers.
  • “Exome” means that Helix does a very accurate medical grade sequence of all your genes. The “+” points to the fact that they include a substantial number of positions which are not within genes (in the “junk DNA”). That totals up to 30,000,000+ markers (the exome is 1% of your whole genome). This is not trivial. Current direct-to-consumer genomics companies are looking at 500,000 to 1,000,000 markers with SNP arrays.
  • Helix keeps this data. Within a few months, you can buy the data at cost (it won’t be cheap!). But the model is that you buy a la carte apps, which will be affordable (our products are affordable).
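To put the marker counts above in perspective, here is a rough back-of-the-envelope sketch. The genome size of about 3 billion base pairs is my own round assumption, not a Helix figure, and the function names are purely illustrative. The “+” positions outside genes would push the exome total above what is computed here.

```python
# Rough marker arithmetic. GENOME_BP is an assumed round figure, not from Helix.
GENOME_BP = 3_000_000_000               # assumed: ~3 billion base pairs in a human genome
EXOME_FRACTION = 0.01                   # "the exome is 1% of your whole genome"
SNP_ARRAY_RANGE = (500_000, 1_000_000)  # marker counts quoted for SNP arrays


def exome_positions(genome_bp=GENOME_BP, fraction=EXOME_FRACTION):
    """Approximate number of positions covered by an exome."""
    return round(genome_bp * fraction)


def fold_increase(exome_markers, array_markers):
    """How many times more positions the exome covers than a SNP array."""
    return exome_markers / array_markers


exome = exome_positions()
low, high = SNP_ARRAY_RANGE
print(f"exome positions: {exome:,}")
print(f"fold increase over a {high:,}-marker array: {fold_increase(exome, high):.0f}x")
print(f"fold increase over a {low:,}-marker array: {fold_increase(exome, low):.0f}x")
```

Under these assumptions the exome covers somewhere in the tens of times more positions than a typical array, which is the sense in which the difference is "not trivial."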

I’m laying this all out very plainly because many people are asking me about these details right now as the sale winds down, and this includes people who are pretty savvy about personal genomics. Here is why I think you should get the kits now:

  1. It gets my company more customers. That’s the self-interested part, and less important for the target audience.
  2. For you, it gets you an exome that you can buy later without any upfront cost. For the next eight hours, Helix is basically waiving the kit costs by dropping the price $100.

Our Neanderthal product is now $9.99. Our Metabolism product is $19.99. These products are great, as they give you functional information in a very user-friendly manner. But a lot of my readers can analyze their own data, so what’s the incentive then? Again, the incentive is that you get an exome for free, and can later buy it if you want, or, perhaps even a savvy personal genomics consumer will find an app they’ll want to purchase. Normally the kit is $80, so buying it now means you’ll never have to pay this cost. If you are the type of person who has qualms about a private company keeping your data, this may not be for you.

Of course, there are other app developers in the Helix store, so just buy whatever you want. This is a way to get your exome sequenced for free now. I will tell you that the Insitome apps are among the cheapest.

Finally, a lot of people are buying “family-pack” quantities. I got four kits for example for my immediate family. Unfortunately, there are some issues with the Helix site and the extra purchases. You can buy more than one easily at Amazon right now. Our Neanderthal product is not in low stock. The Metabolism product has only a few left, though I don’t know what that means.

Note: The discount is client-side, so you may need to switch browsers if you are going to the Helix site to buy (or turn off ad-block). From what I can see Amazon does not have these issues.

10 million DTC dense marker genotypes by end of 2017?


Today I got an email from 23andMe that they’d hit the 2 million customer mark. Since they reached their goal of 1 million kits purchased the company seems to have taken its foot off the pedal of customer base growth to focus on other things (in particular, how to get phenotypic data from those who have been genotyped). In contrast Ancestry has been growing at a faster rate of late. After talking to Spencer Wells (who was there at the beginning of the birth of this sector) we estimated that the direct-to-consumer genotyping kit business is now north of 5 million individuals served. Probably closer to 6 or 7 million, depending on the numbers you assume for the various companies (I’m counting autosomal only).

This is pretty awesome. Each of these firms genotypes in the range of 100,000 to 1 million variant markers, or single nucleotide base pairs. 20 years ago this would have been an incredible achievement, but today we’re all excited about long-read sequencing from Oxford Nanopore. SNP-chips are almost ho-hum.

But though sequencing is the cutting edge, the final frontier and terminal technology of reading your DNA code, genotyping in humans will be around for a while because of cost. At ASHG last year a medical geneticist was claiming price points in bulk for high density SNP-chips are in the range of the low tens of dollars per unit. A good high coverage genome sequence is still many times more expensive (perhaps an order of magnitude or more depending on who you believe). It also can impose more data processing costs than a SNP-chip in my experience.

Here’s a slide from Spencer:

I suspect genotyping will go S-shaped before 2025 after explosive growth in genotyping. Some people will opt-out. A minority of the population, but a substantial proportion. At the other extreme of the preference distribution you will have those who will start getting sequenced. Researchers will begin to talk about genotyping platforms like they talk about microarrays (yes, I know at places like the Broad they already talk about genotyping like that, but we can’t all be like the Broad!).

Here’s an article from 2007 on 23andMe in Wired. They’re excited about paying $1,000 for genotyping services…the cost now of the cheapest high quality (30x) whole genome sequences. Though 23andMe has a higher price point for its medical services, many of the companies are pushing their genotyping+ancestry below $100, a value it had stabilized at for a few years. Family Tree DNA has a Father’s Day sale for $69 right now. Ancestry looks to be $79. The Israeli company MyHeritage is also pushing a $69 sale price (the CSO there is advertising that he’s hiring human geneticists, just so you know). It seems very likely that a $50 price point is within sight in the next few years as SNP-chip costs become trivial and all the expenses are on the data storage/processing and visualization costs. I think psychologically for many people paying $50 is not cheap, but it is definitely not expensive. $100 feels expensive.

Ultimately I do wonder if I was a bit too optimistic that 50% of the US population will be sequenced at 30x by 2025. But the dynamic is quite likely to change rapidly because of a technological shift as the sector goes through a productivity uptick. We’re talking about exponential growth, which humans have weak intuition about….
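To make the point about exponential growth concrete, here is a minimal sketch of how long it takes to close a large gap at a constant growth rate. It is an illustration, not a forecast: the starting count reuses the rough 6 million genotyped figure from above, while the target of 160 million (very roughly half the US population) and the annual growth factors are assumptions of mine, and genotyping and sequencing are of course different products.

```python
import math


def years_to_reach(start, target, annual_growth):
    """Years for `start` to reach `target` at a constant multiplicative growth rate."""
    return math.log(target / start) / math.log(annual_growth)


GENOTYPED_NOW = 6_000_000   # rough current count of people genotyped, from the post
HALF_OF_US = 160_000_000    # assumed: very roughly half the US population

for growth in (1.5, 2.0):   # hypothetical annual growth factors
    years = years_to_reach(GENOTYPED_NOW, HALF_OF_US, growth)
    print(f"growing {growth}x per year: about {years:.1f} years")
```

The gap looks enormous in absolute terms, but with yearly doubling it closes in only a handful of years, which is why intuition tends to undershoot.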

Addendum: Go into the archives of Genomes Unzipped and read the older posts. Those guys knew where we were heading…and we’re pretty much there.