How related should you expect relatives to be?

Like many Americans in the year 2018, I’ve got a whole pedigree plugged into personal genomic services. I’m talking from grandchild to grandparent to great-aunts and great-uncles. A non-trivial pedigree. So we as a family look closely at these patterns, and we’re not surprised at this point to see correlations that are much higher (or lower) than you’d expect in some cases.

This means that you can see empirically the variation between relatives of the same nominal degree of separation from a person of interest. For example, without any prior information, each of my children’s grandparents is expected to contribute 25% of their autosomal genome. But I actually know the variation in contribution empirically. For example, my father is enriched in my daughter. My mother is enriched in my sons.

The same principle applies to siblings. Though they should be 50% related on their autosomal genome, it turns out there is variation. I’ve seen some papers with large data sets (e.g., 20,000 sibling pairs) which give a standard deviation of 3.7% in relatedness. But what about other degrees of relation?
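To get a feel for what a 3.7% standard deviation means, here is a minimal sketch that assumes sibling relatedness is roughly normally distributed around 50%. The normality assumption is mine, not something taken from those papers, and the variable names are purely illustrative.

```python
from statistics import NormalDist

# Figures quoted in the text: expected sibling relatedness and its spread.
MEAN_RELATEDNESS = 0.50
SD_RELATEDNESS = 0.037

# ASSUMPTION: relatedness is approximately normal. This is a simplification
# used only to turn the standard deviation into an intuitive range.
sibling_relatedness = NormalDist(mu=MEAN_RELATEDNESS, sigma=SD_RELATEDNESS)

# Central 95% range of sibling relatedness under the normal approximation.
low = sibling_relatedness.inv_cdf(0.025)
high = sibling_relatedness.inv_cdf(0.975)

# Share of sibling pairs expected to be less than 45% related.
share_below_45 = sibling_relatedness.cdf(0.45)

print(f"95% of sibling pairs: {low:.1%} to {high:.1%} related")
print(f"Share of pairs below 45% related: {share_below_45:.1%}")
```

Under these assumptions, about 95% of sibling pairs would fall between roughly 43% and 57% relatedness, which is a wider spread than the nominal "50%" suggests.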

On the whole genomics will not be individually transformative…for now

A new piece in The Guardian, ‘Your father’s not your father’: when DNA tests reveal more than you bargained for, falls into one of the two major genres in writing on personal genomics in the media right now (there are exceptions). First, there is the genre where genetics doesn’t do anything for you. It’s a waste of money! Second, there is the genre where genetics rocks our whole world, and it’s dangerous to one’s own self-identity. And so on. Basically, the two peaks in this field of journalism are the banal and the sinister.

In response to this, I stated that for most people personal genomics will probably have an impact somewhere in the middle. To be fair, someone reading the headline of the comment I co-authored in Genome Biology, Consumer genomics will change your life, whether you get tested or not, may wonder at the seeming contradiction.

But it’s not really there. On the aggregate social level genomics is going to have a non-trivial impact on health and lifestyle. Health is a large proportion of our GDP. So it’s “kind of a big deal” in that sense. But, for many individuals, the outcomes will be quite modest. For a small minority of individuals, there will be real and important medical consequences. In these cases, the outcomes are a big deal. But for most people, genetic dispositions and risks are diffuse, of modest effect, and often backloaded in one’s life. Even though it will impact most of society in the near future, its touch will be gentle.

An analogy here can be made with BMI, or body mass index. As an individual predictor and statistic, it leaves a lot to be desired. But for public health scientists and officials, aggregate BMI distributions are critical to getting a sense of the landscape.

Finally, this is focusing on genomics where we read the sequence (or get back genotype results). The next stage that might really be game-changing is the write revolution. CRISPR genetic engineering. In the 2020s I assume that CRISPR applications will mostly be in critical health contexts (e.g., “fixing” Mendelian diseases), or in non-human contexts (e.g., agricultural genetics). Like genomics, the ubiquity of genetic engineering will be kind of a big deal economically in the aggregate, but it won’t be a big deal for individuals.

If you are a transhumanist or whatever they call themselves now, one can imagine a scenario where a large portion of the population starts “re-writing” themselves. That would be both a huge aggregate and individual impact. But we’re a long way from that….

There could be 100 million genotyping kits sold by January 1st 2020

The figure to the right is from the comment David Mittelman and I wrote for Genome Biology, Consumer genomics will change your life, whether you get tested or not. The original numbers are from ISOGG, which does a great job collating information from a variety of sources. When final revisions for the comment were due, we only found data up to 5/1/2018.

That being said, I thought it would be useful to generate a chart where I combined and smoothed the results from the various companies. It is clear that the period after 2016 is when you see massive takeoff and adoption, driven first by Ancestry, but later by 23andMe joining the race. The other companies have been increasing their sales as well, with new players such as MyHeritage making a big play.

All this makes me wonder: what does the future have in store? Year-to-year the total number of kits in circulation was doubling in 2013 and 2014. That rate dropped to ~1.6-fold increases in 2015 and 2016. A lot of this is due to 23andMe turning away from customer acquisition (more marketing always leads to more sales). With 23andMe competing with Ancestry again in 2017 one saw a >2.5-fold increase in the number of kits sold.

My back-of-the-envelope calculations indicate that around 1.5 million kits were being sold per month between the big players in the first 4 months of 2018. That’s about 18 million kits this year. That means 29 million kits total in circulation by January 1st of 2019. The wildcard here though is that this space is “consumer”, which means that a disproportionate number of kits are going to be sold between Halloween and Christmas. Extrapolating from the period between January 1st and May 1st, as I’m doing above, could be way too conservative.

The sales in markets outside of the USA, along with customer acquisition through marketing, need to keep increasing up until January 1st of 2020 for there to be 100 million kits sold. But I think it’s very possible. I’m on the verge of saying it’s even likely. The wholesale price of arrays (the chips) keeps decreasing, so the price point of the consumer product is also decreasing. This isn’t a situation where the market is growing linearly; it’s exponential. With a few positive shocks here and there, 100 million by January 1st of 2020 may seem conservative.
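As a rough check on what that target implies, the sketch below takes the two totals quoted above (29 million at the start of 2019, 100 million at the start of 2020) and works out the year-over-year growth and the average monthly sales they would require. The variable names are illustrative, and the historical fold increases are simply the approximate figures mentioned earlier, with 2017 treated as a lower bound.

```python
# Totals quoted in the text.
kits_start_2019 = 29_000_000
target_start_2020 = 100_000_000

# Approximate year-over-year fold increases mentioned in the text.
# The 2017 value is stated as ">2.5", so 2.5 is a lower bound.
historical_fold = {"2013-2014": 2.0, "2015-2016": 1.6, "2017": 2.5}

# Fold increase needed over 2019 to reach the target.
required_fold = target_start_2020 / kits_start_2019

# Average kits per month that would need to be sold during 2019.
required_monthly = (target_start_2020 - kits_start_2019) / 12

print(f"Required fold increase in 2019: {required_fold:.2f}x")
print(f"Required average monthly sales: {required_monthly / 1e6:.1f} million")
for period, fold in historical_fold.items():
    print(f"  {period}: ~{fold}x per year")
```

The required growth is roughly 3.4-fold in a single year, which is above any of the year-over-year increases listed, so the target depends on the acceleration continuing rather than on a simple extrapolation of past rates.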

Addendum: There has been some confusion in the media between sequencing and genotyping platforms. These are different technologies. Genotyping platforms, SNP-arrays, are targeting a genome-wide subset of polymorphisms. 23andMe’s current chip seems to probe about 630,000 markers. The whole genome consists of 3 billion bases. In the 2020s sequencing will probably replace targeted genotyping arrays in consumer products, but it will probably really come to the fore first in the medical space.
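To make the gap between the two technologies concrete, here is a small calculation using the two figures above: about 630,000 markers on the array and about 3 billion bases in the genome. Both are the rounded numbers from the text, so the result is only an order-of-magnitude illustration.

```python
# Rounded figures quoted in the text.
array_markers = 630_000
genome_bases = 3_000_000_000

# Fraction of genome positions directly probed by the array.
fraction_probed = array_markers / genome_bases

# Average spacing: roughly one marker per this many bases.
bases_per_marker = genome_bases / array_markers

print(f"Fraction of the genome probed: {fraction_probed:.3%}")
print(f"About one marker every {bases_per_marker:,.0f} bases")
```

In other words, an array of this size looks at roughly two hundredths of a percent of the genome, which is why genotyping and sequencing should not be treated as interchangeable.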

Consumer Genomics in 2018, beyond the future’s threshold

In 2013 David Mittelman and I wrote Rumors of the death of consumer genomics are greatly exaggerated. This was in the wake of the FDA controversy with 23andMe, and continuing worries about DNA and privacy. Today David and I came out with a new comment in Genome Biology, Consumer genomics will change your life, whether you get tested or not.

Really transformative technology becomes beneath comment. As long as we’re having to comment about genomics, it isn’t really mainstream. But I think in 2018 it is much clearer that the 2020s will see legitimate mainstreaming. The numbers speak for themselves. I hadn’t realized in a visceral manner how much had changed since our original comment came out. It’s pretty much an order of magnitude shift.

My hypothesis for why 23andMe plateaued for a while at ~1 million is that this was the sample size that gave them the statistical power they wanted to detect loci of particular effect sizes. In the initial years, 23andMe was not just buying customers with marketing, it was subsidizing the array costs. Today Illumina SNP arrays are well under $50 (some people say less than $25) wholesale, so I think at some point in early 2017 they realized that even though 10 million wasn’t worth much more to them than 1 million for GWAS, they were going to lose the luster of being “market leader” to Ancestry, who were acquiring customers at a massive clip through their marketing. (My understanding is that at some point Illumina was having issues processing the samples that Ancestry was returning to them because the scale was so high; higher than Ancestry had anticipated!)

At least today we can explore personal genomics

A very long piece on the “personal genomics industry.” Lots of quotes from my boss Spencer Wells, since he has been in the game so long.

The piece covers all the bases. I actually think some of the criticisms of direct-to-consumer genetics are on base. I just don’t think they’re insoluble problems, or problems so large that they should discourage the industry from growing. I think part of the problem is that many of the people journalists can talk to who can comment on the industry are based in academia, and academia has a different focus when it comes to genetics than the nascent industry. For rational reasons academics need to be very careful when it comes to ethics. Consumer products I think are somewhat different.

But I do think we need to reflect on how far we’ve come in 10 years. Back in the 2000s when I was reading stuff on Y, mtDNA and autosomal studies, I honestly didn’t imagine that I would know my own haplogroups and genome-wide ancestry decomposition. It seemed like science fiction. That all changed rather rapidly over a few years, and I purchased kits in the early years when the price was still high. Today it’s a mass industry, with a sub-$100 price point in many cases.

Yes, there are plenty of cautions and worries we need to consider. But the future is already the present, and the horse has left the stable.

Personal genomics lives!

Reflecting back on it, I think I started “exploring personal genomics” in the late 2000s. That’s when direct-to-consumer testing started to become popular, albeit very niche. The book Exploring Personal Genomics is now 5 years old, and a lot has changed since then. In the same year, 2013, David Mittelman and I cowrote Rumors of the death of consumer genomics are greatly exaggerated in Genome Biology.

Now Science has a commentary out, Crowdsourced genealogies and genomes, which reviews how large amounts of public data, genetic and classical genealogical, are being used to change the field before our very eyes. I would recommend though that you read the less edited (longer, more detailed) version on the website of the authors, Crowdsourcing big data research on human history and health: from genealogies to genomes and back again.

This fact from that piece is really illustrative of what’s happening today:

As the number of customers of whole-genome DTC genetic testing just crossed 16 million, it is worth noting that almost two-thirds of them joined since the beginning of 2017 [19]. Based on current rates, this number of customers is predicted to be close to 100 million by end of 2020.

Notes from the personal genomic inflection point

There’s a debate that periodically crops up online about the utility, viability, and morality of returning results from genetic tests to consumers. Consumers here means people like you or me. Pretty much everyone.

If you want to caricature two stylized camps, there are information maximalists who proclaim a utopia now, where people can find out so much about themselves through their genome. And then there are information elitists, who emphasize that the public can’t handle the truth. Or, more accurately, that throwing information without context and interpretation from someone who knows better is not just useless, it’s dangerous.

Of course, most people will stake out more nuanced, complex positions. That’s not the point. Here is my bottom line, which I’ve probably held since about 2010:

  1. The value for most people in actionable information in direct-to-consumer genetics is probably not there yet when set against the cost.
  2. With the reduction in the cost of genotyping and sequencing, there’s no way that we have enough trained professionals to handle the surfeit of information. And there will really be no way in 10 years when a large proportion of the American population will be sequenced.

At some point, the cost will come down enough, and the science will probably be strong enough, that direct-to-consumer genetics moves away from novelty and early adopters to the mass market. At that point, we need to be able to make the best use of that data. Genetic counselors, geneticists, and doctors all cost a fair amount of money and have a finite amount of labor to provide to the public. They need to focus on serious, complex, and consequential cases.

To some extent, we need to reduce much of the interpretation in the personal genomics space to an information technology problem. For example, if someone’s genotype pulls out a bunch of statistically significant hits of interest, the tool should automatically condition significance on that individual’s genetic background.

Yes, there are primitive forms of these sorts of tools out there already. But they’re not good enough. And that’s because there isn’t the market need. But there will be.

The 23andMe BRCA test

In case you were sleeping under a rock, 23andMe got FDA approval for DTC testing of markers related to BRCA risk. Obviously, this is a pretty big step, in principle.

But the short-term implications are not that earth-shaking.

From the FDA release:

The three BRCA1/BRCA2 hereditary mutations detected by the test are present in about 2 percent of Ashkenazi Jewish women, according to a National Cancer Institute study, but rarely occur (0 percent to 0.1 percent) in other ethnic populations. All individuals, whether they are of Ashkenazi Jewish descent or not, may have other mutations in BRCA1 or BRCA2 genes, or other cancer-related gene mutations that are not detected by this test. For this reason, a negative test result could still mean that a person has an increased risk of cancer due to gene mutations….

Apparently, women with one of these variants have a 45-85% chance of developing breast cancer by age 70. So the penetrance is high.

It seems that you’ll know if this sort of test is going to have utility for you based on family history.

The big thing is the transition to DTC. This will increase availability and drive the price down. That’s probably going to mean more work for those engaged in interpretation and education. False positives are going to start being a major thing….
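One way to see why false positives matter more as testing spreads beyond high-risk groups is to compute the positive predictive value at different carrier frequencies. The sketch below uses the frequencies from the FDA release (about 2% in Ashkenazi Jewish women, up to about 0.1% elsewhere). The sensitivity and specificity values are hypothetical numbers I picked for illustration; they are not 23andMe's or the FDA's reported figures.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# HYPOTHETICAL test characteristics, chosen only for illustration.
ASSUMED_SENSITIVITY = 0.99
ASSUMED_SPECIFICITY = 0.999

# Carrier frequencies taken from the FDA release quoted above.
ppv_ashkenazi = positive_predictive_value(
    0.02, ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY
)
ppv_other = positive_predictive_value(
    0.001, ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY
)

print(f"PPV at 2% carrier frequency:   {ppv_ashkenazi:.1%}")
print(f"PPV at 0.1% carrier frequency: {ppv_other:.1%}")
```

With these assumed error rates, a positive result is very likely real in the higher-frequency group but close to a coin flip in the lower-frequency one. The exact values depend entirely on the assumed sensitivity and specificity, but the direction of the effect does not.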

Helix kit price waived until December 26 at 2:59am EST

Happy Hanukkah! My main qualm with wishing you a happy holiday is that I’m a thorough assimilator and I don’t want to be disemboweled.

For context, listen to the Stuff You Missed in History Class episode on the Maccabean Revolt. As a Jewish friend of mine once observed, the Maccabees were kind of the Al-Qaeda of their day (today she would have said ISIS).

With that out of the way, I want to give you a heads up that Helix has a sale going until December 26 at 2:59am EST where the $80 kit cost for purchase of any app is waived if you haven’t purchased an app before. Just enter the promotion code HOLIDAY at checkout.

That means the presale price of Insitome’s Regional Ancestry is no more than $19.99, while Neanderthal is $29.99 and Metabolism is $39.99 (this applies to all of Helix’s products except embodyDNA by Lose It! and Geno 2.0 by National Geographic).

Why does it matter? Again, Helix banks a high quality exome+ (the + is for non-exonic positions) when you purchase any of their apps. If you want subsequent apps you don’t have to send another kit in, you just buy the app and get the results. Also, I do have to say that from what I’ve seen and heard Helix’s laboratory facilities are top-notch in terms of getting results turned around rapidly.

Genomic ancestry tests are not cons, part 2: the problem of ethnicity

The results to the left are from 23andMe for someone whose paternal grandparents were immigrants from southern Germany. Their mother had a father who was of English American background (his father was a Yankee American with an English surname and his mother was an immigrant from England), and grandparents who were German (Rhinelander) and French Canadian respectively on their maternal side.

Looking at the results from 23andMe one has to wonder: why is this individual only a bit under 25% French & German, when genealogical records show places of birth that indicate they should be 75% French & German (more precisely, 62.5% German and 12.5% French)? And though their ancestry is 25% English, only 13% of their ancestry is listed as such.
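The genealogical expectation can be worked out by halving the contribution at each generation. Here is a minimal sketch of that arithmetic; the nested-tuple representation of the pedigree and the function name are my own illustrative choices, and the English American great-grandparents are simply treated as "English".

```python
from collections import defaultdict
from fractions import Fraction


def expected_ancestry(node, weight=Fraction(1), totals=None):
    """Sum expected ancestry fractions over a pedigree.

    A node is either a label (str) for an ancestor of known origin,
    or a (father, mother) tuple; each parent contributes half.
    """
    if totals is None:
        totals = defaultdict(Fraction)
    if isinstance(node, str):
        totals[node] += weight
    else:
        father, mother = node
        expected_ancestry(father, weight / 2, totals)
        expected_ancestry(mother, weight / 2, totals)
    return totals


# Pedigree as described in the text.
father = ("German", "German")                # both paternal grandparents German
maternal_grandfather = "English"             # English American background
maternal_grandmother = ("German", "French")  # Rhinelander and French Canadian
mother = (maternal_grandfather, maternal_grandmother)
individual = (father, mother)

totals = expected_ancestry(individual)
for origin, share in sorted(totals.items()):
    print(f"{origin}: {float(share):.1%}")
```

This reproduces the genealogical figures of 62.5% German, 12.5% French, and 25% English. These are expectations only; the actual inherited proportions vary around them because of recombination.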

First, notice that nearly half of their ancestry is “Broadly Northwestern European.” Last I checked, 23andMe uses phased haplotypes to detect segments of ancestry. This is a very powerful method and is often quite good at zeroing in on people of European ancestry. But with Americans of predominant, but mixed, Northern European background, rather than giving back precise proportions, you often obtain results of the form “Broadly…” because, presumably, recombination has generated novel haplotypes in white Americans.

But this isn’t the whole story. Why, for example, are many of the Finnish people I know on 23andMe assigned as >90% Finnish, while a Danish friend is 40% Scandinavian?

The issue here is that to be “Finnish” and “Scandinavian” are not equivalent units in terms of population genetics. Finns are a relatively homogeneous ethnic group who seem to have undergone a recent population bottleneck. In contrast, Scandinavia encompasses several different, albeit related, ethnicities which are geographically widely distributed.

Ethnic identities are socially and historically constructed. Additionally, they are often clear and distinct. This is not always the case for population genetic classifications. On a continental scale, racial classification is trivial, and feasible with only a modest number of genetic markers. Why? Because the demographic and evolutionary histories of Melanesians and West Africans, to give two concrete examples, have been distinct for tens of thousands of years. Population genetic analyses which attempt to identify or differentiate these groups have a lot of raw material to work with.

