DIY your DTC DNA

The screenshot to the right is from my updated 23andMe results. The inference that my ancestry is from “Chittagong Division” is 100% correct. More precisely, my family is from Comilla. There are some cases where consumer genomics tells you exactly what you know, and this is one of those. 23andMe has an excellent method to infer ancestry, and the power of a massive database. If you want to see how they do it, check out their new preprint. It’s pretty fancy.

What would be more informative is if they let me see what it means to be from Chittagong Division compared to other Bengalis. Or to be Bengali compared to other South Asians. You probably already know your recent history through basic family genealogy, but what do these results tell you about your deep history and your relatedness to global populations?

Of course, realistically even the largest direct-to-consumer genomics companies can only deliver so much, because they are simultaneously serving millions of customers. A custom approach isn’t a feasible ask, even if it is what many consumers are longing for. Little surprise then that people have been reaching out to me about anomalies in their results for over a decade. People like me who got no value-added information (as in: they already know where their great-grandparents were born) reach out too. We’re driven to know more.

That is why this coming week I’m offering a first workshop through Speakeasy titled Analyze Your 23andMe and Ancestry Data (and N.B. it didn’t make the title but I’ve prepared everything to work right on results from Family Tree DNA as well). It’s Wednesday, Jan 27, 5pm PT/8 pm ET. 

Here’s how it will work. You’ll arrive in class with your (or a friend’s or relative’s) 23andMe, Ancestry or Family Tree DNA raw data. Before class you’ll get a zip folder with all the files and utilities you need for class. You’ll have downloaded R & R Studio if you don’t already have them (I include instructions, but don’t worry, this is quick and easy!). And you’ll want to decide whether to Zoom into the meeting on one device and access your data on the other (or work on a two-screen setup if you’re on a desktop); neither is essential, but I’d consider them nice to have. 

At that point you’re ready for the workshop. And we’ll get straight into digging into your data. You can come to class with a question or questions about your ancestry. Or I can help you zero in on what might be interesting given what you already know.

Over the course of the hour, I’ll guide you through the use of three tools. No lecture, just hands-on doing, with your own actual data. Two of your tools are open-source utilities written by academics for their colleagues that have been in wide use in genomics for over a decade. Even long-time readers who aren’t here for the genetics will recognize Plink and Admixture, which I’ve referenced on this blog thousands of times. 

In addition to easing you right into using these core tools of the trade (without any of the usual slow initial learning curve), I have built an automated pipeline just for participants in the class. This is your third tool, which will save you untold startup hours no matter who you are. I’ve created an automated workflow so that you can 1. input your raw data from any of those three DTC genetics companies and analyze it (including automatically generating formatted output) against your choice of reference populations (a library of which I’ve also prepared for you) and 2. automatically plot and visualize your results in a flexible, customizable format.

I have written all the scripts for you in order to create a custom, automated pipeline. This draws on my years of experience using these tools and guiding others. Then, the bespoke reference population library of human genomes I’ve curated for this purpose instantly equips you to measure your relatedness to any branch or branches of humanity (you get 5000 human genotypes culled from public datasets and selected to represent 250 distinct populations, on a quarter of a million markers, or SNPs).

And in class I will teach you and guide you through using your toolkit in real time. Getting started in Plink and Admixture can entail hours of trouble-shooting and false starts. A decade into using them, I know the quirks and idiosyncrasies of these programs all too well; and that’s why I’ve built a pipeline that allows you to leapfrog over those slow early steps and get right into your (or any) genetic data. Building and merging a reference panel from publicly available sources is time-consuming and a headache and the individuals don’t come clearly labeled by population. I’ve got everything ready and clearly identified for you. With these two headstarts, and lots of pointers about best practices along the way, you’ll be asking and answering (and outputting visualizations of) your own questions in your first hour.
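To give a flavor of what raw data looks like before any tool touches it, here is a minimal Python sketch. It is not the workshop pipeline. It assumes the tab-separated, four-column layout (rsid, chromosome, position, genotype) with `#` comment lines and `--` for no-calls that 23andMe exports use; files from other companies are laid out differently and would need their own handling. The function names and the toy rows are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Genotypes = Dict[str, Tuple[str, int, str]]


def parse_raw_genotypes(lines: Iterable[str]) -> Genotypes:
    """Parse 23andMe-style lines into {rsid: (chromosome, position, genotype)}."""
    calls: Genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        calls[rsid] = (chrom, int(pos), genotype)
    return calls


def summarize(calls: Genotypes) -> Dict[str, int]:
    """Count no-calls, heterozygous calls, and everything else."""
    counts: Counter = Counter()
    for _chrom, _pos, genotype in calls.values():
        if "-" in genotype:
            counts["no_call"] += 1
        elif len(genotype) == 2 and genotype[0] != genotype[1]:
            counts["heterozygous"] += 1
        else:
            counts["homozygous_or_haploid"] += 1
    return dict(counts)


# Toy rows with made-up marker IDs, standing in for a real export.
SAMPLE = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs0000002\t1\t2000\tAG",
    "rs0000003\t2\t1500\t--",
    "rs0000004\tX\t3000\tC",
]

calls = parse_raw_genotypes(SAMPLE)
print(summarize(calls))
```

For a real file you would pass an open file handle instead of `SAMPLE`, for example `parse_raw_genotypes(open(path))`, where `path` points at your own download.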

By the end of the workshop, you’ll have the skills and the tools to analyze genotypes against world population data. You’ll be able to use Plink, Admixture and the pipeline I’ve created for you on any DTC genomic results from those main three companies. You’ll have both the curated reference library and the pipeline I built for you… And the know-how to use them to your ends. You’ll also have reference cheat-sheets to remember how to do everything you tried in the workshop (I don’t want you having to take notes when you can be learning by doing!) 

Who is this for? You. I promise. My goal with this project is to make it accessible and easy for everyone with basic personal computing literacy. Not programming, not command line, not R. Just be comfortable on your computer. (And for this iteration, you need to be on a Mac or Ubuntu/Linux OS. I’m still working out a kink in Windows, so DM me or comment below if you’d like to trial it on Windows once I get that working.) 

I want to reach people who aren’t geneticists. I want to reach people who think they can’t do this. I want to show curious people who have never heard of any of the tools I’m naming that they can still delve into their DNA on their own terms the first time they sit down and try. (I did a trial run of the course with a crew of friends recently. Everyone did great, including the two who were anxious beforehand. And let’s be real: if you’re thinking you’re not tech-savvy enough, it probably says less about your actual tech skills than it does about your friends/family and how tech-nerdy they are!)

But enough about you. Let’s get back to me and what I did with my 23andMe results. To answer the question of what it means to be “Bengali from Chittagong” I analyzed myself against only a few populations and compared myself to the Bengalis in the 1000 Genomes. 


David Shor did nothing wrong, and is he now the most famous American Moroccan Jew?

The future and the past

Unless you have been sleeping under a rock, you know the saga of David Shor. You have read about him in New York Magazine, The Atlantic, and New York Magazine again. Shor’s firing by Civis Analytics was so craven that even social justice scolds think it was ridiculous.

But there is another aspect of David Shor that people don’t know: his parents are Moroccan Jews. About a decade ago Shor sent me his DNA when I was doing deep-dives into people’s genomes. He got some strange results since there weren’t any Sephardic Jewish reference populations in 23andMe and such at the time. I didn’t have Moroccan Jews to test against, but I do now. This is relevant for two reasons:

  1. Shor has a family legend that his father may have substantial Ashkenazi ancestry
  2. Until Shor’s rise to prominence, Emmanuelle Chriqui was arguably the most famous Moroccan Jew in America (she being Canadian in origin). As he’s mentioned on every third podcast I listen to (and this has been true for a month or so), he’s definitely more “Razib famous” than Emmanuelle Chriqui, who I had to Google to find out about (she’s starring as a secondary character in a new Superman television series next year).

What are the results? David Shor seems to be 100% Moroccan Jewish. Whatever that means.


Genetics got personal in this decade

In the spring of 2010 I began to “eat my own dog-food.” By this, I mean that I entered the world of “personal genomics.” I ordered a bunch of kits from 23andMe for myself and my family.

I didn’t have too many strong expectations of surprises. One thing though I did suspect: my parents would differ some in ancestry. My mother had family lore of someone of “Chinese” background in the 19th-century.

What did I find out? First I got my Y and mtDNA results. I was at a Japanese restaurant in Japantown in San Francisco when I got the email. My Y was R1a, and my mtDNA was U2b. I was a bit surprised by the mtDNA. Bangladesh is 80% macrohaplogroup M. The Y wasn’t as surprising. I knew a substantial minority of Bengalis were R1a from the literature. But it was cool knowing for certain.

What personal genomics in the 2010s has done is make the abstract concrete. The general personal. It’s now part of the mainstream. In 2010 personal genomics was very niche, and it’s not anymore.

Another thing that 23andMe told me is that my parents are very similar genome-wide. Depending on how you calculate it they are between 10 and 20 percent East Asian (their results are highly correlated using the same parameters). This surprised me. Whatever the family legends were, my parents are pretty generic East Bengalis.

This year, DNA from an ancient woman of the Indus Valley Civilization was analyzed from Rakhigarhi. It turns out she was U2b!

So on the paternal side my lineage extends back to the Eurasian steppe, and the Sintashta-Andronovo cultural horizon. But on the maternal side, it is deeply rooted in northwest South Asia, with the Indus Valley Civilization. That’s a pretty cool duet of facts to learn in this decade about myself.

Note: If you want to download my VCF generated from high coverage whole genome sequencing, here is the link.

Forensic genetics after Golden State Killer

It’s been a year and a half since the Golden State Killer was arrested. That was a big day in the genetics community, as genealogy was leveraged for forensics in a big way. One of the people who I began to have discussions with regarding this development was my friend David Mittelman. Since then David has started his own forensic genetics company, Othram.

He moves fast!

But there’s a major issue with any project moving forward into this space: the strange ethical grayland of genomic databases. A lot of the breakthroughs are coming through GED Match, a site that feels like it stepped out of the late 1990s, with both the innocence and design sense of that period. You’ve probably read about the fire which the proprietors of GED Match have come under due to confusions about terms of use. Curtis Rogers, a co-founder of GED Match, thinks it’s a “distraction.” Certainly, it has been for him.

GED Match is great, and the founders tried to do great things with the best of aims. But the world comes at you fast.

As someone who has put their own genotype into the public domain, I’m not super worried about privacy. Yaniv Erlich of MyHeritage was one of those aggressively asserting that he would be happy for people to solve violent crime with his genotype when the Golden State Killer was caught. Many of us feel that way, though not all of us.

To get at the forensic and criminal justice aspect of genomics, and around some of the ethical hurdles of prior databases, David’s company has created a new database, DNA Solves. Since it was designed and coded this year it definitely feels 2019. I uploaded some of my raw genotype data and it was very easy and quick. The FAQ is explicit in what the aim here is. Othram is a forensic genetics firm that gains from public buy-in, but the current options are not optimal. Everyone is worried that GED Match will get shut down. There need to be alternatives out there.

This database is aimed only at helping law enforcement. There’s no public search. And, David told me they’re only going to return matches, not the whole genotype. This is basically a tool that allows people who want to get involved to remain involved.

If you are as open about your genes as I am, I’d recommend checking it out.

(in the near future they will begin providing “reports” to people who volunteer to upload to get the database bigger)

Note: Dante is telling me that my sample is being sequenced. I will be posting my whole genome online soon (I promised about a decade ago that I’d do this if I got WGS).

The age of prenatal genetic screening is here (let’s call it that!)

In the spring of 2010, I went to the studios of KQED in San Francisco to record an interview with a radio show on the BBC about PGD. Preimplantation genetic diagnosis. I haven’t thought much about the issue in the near ten years since then. Which in a personal sense certainly reflects my luck and circumstance.

But I’m thinking about the issue after reading this story from Emily Mullin, We’re Already Designing Babies: Expanded genetic testing of embryos represents a new era of family planning. But how far should the technology go?:

Jill Pinarowicz’s life has been shaped by a mutation in her mother’s DNA. The genetic error gave her two brothers a rare disease called Wiskott-Aldrich syndrome….

Both of Pinarowicz’s brothers passed away from complications of the disease. One died as a toddler, before she was born, and her other brother died at age 18, when Pinarowicz was a teenager.

Pinarowicz thought it would be too risky to have her own children….

The technique is called preimplantation genetic testing (PGT). By using PGT together with in-vitro fertilization, Pinarowicz and her husband had a healthy son in May 2017.

An incredible “feel-good” outcome so far. And not surprising. I have become more conservative about technology since I first started writing on the internet in the early 2000s, but I will never oppose these sorts of genetic technologies that allow couples whose offspring are at high risk of developing serious debilitating conditions to avoid these scenarios. But the magnitude of how common this now is took me aback:

The U.S. Centers for Disease Control and Prevention (CDC) reported in January that PGT was used in 22 percent of IVF cases in 2016, up from just 5 percent in the previous year.

Since the last statistic Mullin could find was from 2016, it’s almost certain that the proportion is greater than 22 percent today. The numbers for 2018 seem difficult to find, but it seems likely that ~75,000 live-births per year in the USA are now due to IVF. Worldwide there are in the range of 10 million humans alive today due to IVF.

How relevant IVF is to fertility varies by social and demographic variables. I know a fair number of people who have done IVF. The average age of a mother at her first birth is 32 in San Francisco and 31 in Manhattan. As many of you probably know, many options relating to fertility and genetic testing come “online” for American insurance companies at age 35.

When you transform blue-sky exotic basic science into mass technology, it becomes far less controversial. One of the major themes of Carl Zimmer’s new book, She Has Her Mother’s Laugh, was the vocal and mainstream nature of 20th-century eugenics. A major criticism of Robert Plomin’s Blueprint is that it was resurrecting genetic determinism. Let me quote Mullin:

In Iceland, for instance, the widespread availability of prenatal genetic testing has meant that nearly 100 percent of women choose to abort a fetus with Down syndrome, which has led to a near eradication of babies being born with the condition.

What is in a word? Something in the future is worrisome. Something that professional dual-income-no-kids couples do in their attempt to attain the classic bourgeois lifestyle is not so worthy of comment? Outside of the pro-life movement the discussion of the ubiquity of screening for Down syndrome seems rather muted, even though it is widespread. While we may furrow our brows over decisions made based on polygenic risk scores, the reality is that the age of Mendelian screening is here. It is not speculative science, but applied medicine.

Call it what you want to call it.

A genetic history of the human race is not controversial science nor is it fraught

Recently I was talking to a journalist about genetic genealogy, and we both agreed that soon Christine Kenneally’s The Invisible History of the Human Race: How DNA and History Shape Our Identities and Our Futures will need an update. Though published in 2015, much of the research in The Invisible History of the Human Race dates to much earlier. In the last few years, personal genomics has gone from a sector of millions to tens of millions. In the years after 2020 it will go to hundreds of millions.

And yet I’m not sure the educated public is ready to understand what a genomic future is going to look like.

This is why I think that the Elizabeth Warren DNA story is important to get right. The reality is that this isn’t really about Elizabeth Warren’s ancestry, it’s a story at the intersection of high politics and culture wars, and genetics is getting caught in the undertow. Recently I heard Ben Shapiro comment that Warren likely had “maybe 1/1024th Native American.” Actually, I think it’s very likely she has 0.5 to 1% Native American ancestry (read this Elizabeth Warren DNA post for why I say that). Not to be trite, but facts don’t care about Ben Shapiro’s feelings. I know he’s not a fan of Warren, but he shouldn’t be laundering misrepresentations.
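The gap between those two numbers is easier to see with a little arithmetic. The sketch below assumes the simplest genealogical model: a single ancestor g generations back contributes an expected 1/2^g of your genome. Real inherited fractions scatter around that expectation because of recombination, and more than one ancestor can contribute, so treat this as a rough guide rather than the method behind any published estimate. The function names are mine.

```python
import math


def expected_fraction(generations: int) -> float:
    """Expected genome share from one ancestor this many generations back."""
    return 0.5 ** generations


def generations_for_fraction(fraction: float) -> float:
    """Invert the rule above: how many generations back gives this share?"""
    return math.log2(1.0 / fraction)


# Expected share for a single ancestor 6 to 10 generations back.
for g in range(6, 11):
    print(f"{g} generations back: 1/{2 ** g} of the genome")

# The shares discussed in the text, turned back into generations.
for share in (0.01, 0.005, 1 / 1024):
    print(f"{share:.4%} corresponds to about {generations_for_fraction(share):.1f} generations")
```

Under this model 1/1024 is exactly ten generations back, while a share of 0.5 to 1% corresponds to a single ancestor only about seven generations back.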

Even in the comments of this website motivated reasoning cropped up when the original Warren story became a national sensation. Many on the Right side of the spectrum laughed at the results and interpreted them in the least generous terms. The falsehoods and misunderstandings promoted by the media, often inadvertently because most journalists don’t have the skills to navigate the science, were injected into the conservative memesphere. Shapiro has admitted, to his own chagrin, his lack of science background, and I suspect if I explained it to him he wouldn’t use the “maybe 1/1024th Native American” line. He doesn’t need to. If you are a conservative there are many reasons to be critical of Elizabeth Warren.

But I can’t blame Shapiro too much. He was reacting to this story in The New York Times, Elizabeth Warren Stands by DNA Test. But Around Her, Worries Abound. In this piece, the attacks on Warren are coming from the Left and Native American activists. There is a real story here. The Boston Globe has published an editorial warning her not to run. The air has changed around her.

From the piece in The Times:

Warren’s presidential ambitions, she has yet to allay criticism from grass-roots progressive groups, liberal political operatives and other potential 2020 allies who complain that she put too much emphasis on the controversial field of racial science — and, in doing so, played into Mr. Trump’s hands.

Ms. Warren’s allies also say she unintentionally made a bigger mistake in treading too far into the fraught area of racial science — a field that has, at times, been used to justify the subjugation of racial minorities and Native Americans.

There is “racial science” like there is “evolution science” or “Creation science.” The term is not used by any scientist that I know of, but is used by critics and polemicists. The New York Times, whether consciously or not, is going to convince a lot of scientifically illiterate people who don’t read their science pages that there is a field of “racial science” (using the term “race science” liberally is a thing on the Left…reminds me of social conservatives who used to call everyone who was not an evangelical Protestant a “non-Christian”).

What went on in the Warren case is:

  1. Not scientifically controversial
  2. But scientifically new

Here is a review, A comprehensive survey of models for dissecting local ancestry deconvolution in human genome which looks at “20 methods or tools to deconvolve local ancestry.” There may be disagreement on the best method for various reasons, but there is no disagreement that local ancestry deconvolution is possible. It is not controversial. In fact, it is rather important in areas such as admixture mapping for diseases.

The science isn’t that hard to explain at a high level. The figure to the left is from a new paper that recently came out on the genetics of the New World (using ancient DNA). What you see is that some human populations are isolated from other human populations. For example, the last common substantial ancestry of Native American populations before 1492 and Northern Europeans dates to the period between 20,000 and 40,000 years ago.

Tens of thousands of years of genetic separation result in genetic distinctiveness. This is a standard old population genetic model. When populations come back together and mix, that daughter population is clearly going to be genetically a mix between the two parent populations. But the human genome is a sequence of three billion distinct base pairs, and the mixing exhibits discrete patterns within the genome.

Humans are diploid, which means we have two copies of each gene. These genes are aligned along homologous chromosomes. One homolog you inherit from the father and one homolog you inherit from the mother. These two homologs are the basis for Mendel’s Law of Segregation.

When sex cells, sperm and eggs, are formed they carry only one of the homologs. They are haploid, with single gene copies. If they weren’t, you’d end up tetraploid instead of diploid. You get one gene copy from the mother and one gene copy from the father.

But, before the formation of sex cells, during meiosis, the homologs undergo recombination. In humans that means that there is swapping between stretches of homologous chromosomes. The average human has between 20 and 40 recombination events across the genome. A concrete way to think about it is that the individual who is producing sperm or egg is taking the chromosomes they inherited from their parents, and mixing them together, so the final set of chromosomes is a synthetic combination of the chromosomes of grandparents.
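The mixing described above can be sketched as a toy simulation in Python. It assumes one chromosome represented as 100 positions, a fixed number of crossovers placed uniformly at random, and no interference between them; real meiosis is messier on every one of those points. The labels "GF" and "GM" (for the grandparent each homolog came from) and the function names are invented for illustration.

```python
import random
from typing import List


def make_gamete(paternal: List[str], maternal: List[str],
                n_crossovers: int, rng: random.Random) -> List[str]:
    """Build one recombinant chromosome by alternating between two homologs."""
    length = len(paternal)
    # Distinct breakpoints strictly inside the chromosome, in order.
    breakpoints = sorted(rng.sample(range(1, length), n_crossovers))
    homologs = [paternal, maternal]
    current = rng.randrange(2)  # which homolog the gamete starts copying from
    gamete: List[str] = []
    start = 0
    for end in breakpoints + [length]:
        gamete.extend(homologs[current][start:end])
        current = 1 - current  # switch homologs at each crossover
        start = end
    return gamete


def count_switches(gamete: List[str]) -> int:
    """Count positions where the grandparental origin changes."""
    return sum(1 for a, b in zip(gamete, gamete[1:]) if a != b)


rng = random.Random(42)
from_grandfather = ["GF"] * 100  # homolog the parent got from their father
from_grandmother = ["GM"] * 100  # homolog the parent got from their mother
gamete = make_gamete(from_grandfather, from_grandmother, 2, rng)
share = gamete.count("GF") / len(gamete)
print(f"switches: {count_switches(gamete)}, grandfather share: {share:.0%}")
```

Running it with different seeds shows why one transmitted chromosome can be mostly one grandparent while another is closer to an even split.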

Purple segments half-identical to paternal grandfather

To make this concrete, to the left is a partial depiction of one of my children’s chromosomes, and the relatedness to my father. The purple regions are genomic stretches where the child is half-identical to the paternal grandfather. The light gray sections have no genetic descent from my father. The reason is that one of the homologs is from the maternal side. The other homolog is from me, and could be from either my father or my mother. Where the purple alternates with light gray, you see clearly where recombination events happened, as maternal and paternal homologs broke and paired together to produce sperm with novel chromosomes (e.g., my contributed chromosome 11 is 90% my father, 10% my mother…while chromosome 19 is more balanced).

But that’s not the only way to look at recombinations. To the right is an ancestry painting from 23andMe for a friend of mine who is ~25% East Asian and ~75% Northern European. On their chromosome you see two homologs. The blue segments are Northern European. The dark brown segments are East Asian. Notice the alternation between European and East Asian on one of the homologs: this chromosome is almost certainly from the parent who is 50% East Asian and 50% European. There was a recombination event where an “East Asian” homolog, inherited from the parent of East Asian origin, recombined with the “European” homolog, inherited from the parent of European origin.

The resultant chromosome is something new in a physical sequence, with alternating segments of East Asian and European ancestry. Just as the whole genome has an imprint of the genetic history of a population, so sequences of the genome also exhibit distinctiveness due to their origins. Because each generation introduces recombination events, the lengths of these distinct ancestry blocks can tell you how many generations in the past the admixture may have happened.
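As a back-of-envelope illustration of that last point, here is a Python sketch. It leans on a common simplification: after a single pulse of admixture g generations ago, ancestry blocks average roughly 100/g centimorgans. That rule of thumb ignores the admixture proportion, repeated waves of mixing, and the difficulty of detecting short blocks, so it is an assumption for illustration and not how any particular testing company dates admixture.

```python
def expected_mean_tract_cm(generations: float) -> float:
    """Rough mean ancestry-block length (cM) after this many generations."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / generations


def generations_from_mean_tract_cm(mean_tract_cm: float) -> float:
    """Invert the rule of thumb: mean block length (cM) to generations."""
    if mean_tract_cm <= 0:
        raise ValueError("mean_tract_cm must be positive")
    return 100.0 / mean_tract_cm


# Older admixture leaves shorter blocks.
for g in (2, 5, 10, 20):
    print(f"{g} generations ago: blocks of roughly {expected_mean_tract_cm(g):.1f} cM")
```

Read in the other direction, blocks averaging about 12.5 cM would point to admixture roughly eight generations back under these assumptions.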

That’s the theory. The new aspect is that genomic technology has allowed science to assess patterns of local ancestry to a much greater extent than was possible even 15 years ago. With hundreds of thousands of genomic positions, variants, scientists are now able to map regions of the genome to an incredible level of granularity, deploying theoretical understanding of Mendelian and population genetics that dates back to the 20th century.

To look at Elizabeth Warren’s genome, and discover that a small segment of a particular length derives from a Native American population, is not a “controversial field of racial science.” This sort of analysis is now becoming de rigueur in much of medical genetics in large part because population history has a major impact on disease risk susceptibility. To be fair, doing a local ancestry deconvolution on populations which are much, much, closer genetically due to recent shared history is difficult. But Warren’s is not one of those cases!

Honestly, I don’t know what the outcome of The New York Times calling this “racial science” is going to be, seeing as how it seems likely that in the next few years >100 million Americans will have done ancestry tests. Many scientists, fairly, do criticize the interpretations of these tests, and how the public perceives them. But the underlying models and methods are workaday.

It is the interpretation, and how it interacts with social and political values, that is fraught. The link in the phrase “controversial field of racial science” actually goes to an article where social and political commentators and activists react to Warren’s decision to take the DNA test. There is no discussion of the science at all. It’s controversial because of what they believe the implications are, not because the science is faulty or unsound.

For example, many (though not all) Native Americans object to the idea of using genetic science to shed light on the status of particular individuals as Native American or not. The decision to take this DNA test, in an environment where many already privately grumbled about Warren’s claims, was clearly a political and public relations misstep. But that does not speak to whether the science itself is sound or unsound.

Conservatives will be highly skeptical of Warren because of her policy positions. And, if the above article is correct, it seems that some of the Left is now against her on the grounds of her impolitic foray into Native American identity. That is all fine, and not much of a concern of mine. But when non-science journalists get their hands on a science story, they tend to mess it up, and that is a problem in the long-term. The sands of politics and society are protean, and always shifting. Science is something more solid, and we should not try to muddy the waters.

On the whole genomics will not be individually transformative…for now

A new piece in The Guardian, ‘Your father’s not your father’: when DNA tests reveal more than you bargained for, is one of the two major genres in writings on personal genomics in the media right now (there are exceptions). First, there is the genre where genetics doesn’t do anything for you. It’s a waste of money! Second, there is the genre where genetics rocks our whole world, and it’s dangerous to one’s own self-identity. And so on. Basically, the two optimum peaks in this field of journalism are banal and sinister.

In response to this, I stated that for most people personal genomics will probably have an impact somewhere in the middle. To be fair, someone reading the headline of the comment I co-authored in Genome Biology, Consumer genomics will change your life, whether you get tested or not, may wonder at the seeming contradiction.

But it’s not really there. On the aggregate social level genomics is going to have a non-trivial impact on health and lifestyle. This is a large proportion of our GDP. So it’s “kind of a big deal” in that sense. But, for many individuals, the outcomes will be quite modest. For a small minority of individuals, there will be real and important medical consequences. In these cases, the outcomes are a big deal. But for most people, genetic dispositions and risks are diffuse, of modest effect, and often backloaded in one’s life. Even though it will impact most of society in the near future, its touch will be gentle.

An analogy here can be made with BMI or body-mass-index. As an individual predictor and statistic, it leaves a lot to be desired. But, for public health scientists and officials aggregate BMI distributions are critical to getting a sense of the landscape.

Finally, this is focusing on genomics where we read the sequence (or get back genotype results). The next stage that might really be game-changing is the write revolution. CRISPR genetic engineering. In the 2020s I assume that CRISPR applications will mostly be in critical health contexts (e.g., “fixing” Mendelian diseases), or in non-human contexts (e.g., agricultural genetics). Like genomics, the ubiquity of genetic engineering will be kind of a big deal economically in the aggregate, but it won’t be a big deal for individuals.

If you are a transhumanist or whatever they call themselves now, one can imagine a scenario where a large portion of the population starts “re-writing” themselves. That would be both a huge aggregate and individual impact. But we’re a long way from that….

Consumer Genomics in 2018, beyond the future’s threshold

In 2013 David Mittelman and I wrote Rumors of the death of consumer genomics are greatly exaggerated. This was in the wake of the FDA controversy with 23andMe, and continuing worries about DNA and privacy. Today David and I came out with a new comment in Genome Biology, Consumer genomics will change your life, whether you get tested or not.

Really transformative technology becomes beneath comment. As long as we’re having to comment about genomics, it isn’t really mainstream. But I think in 2018 it is much clearer that the 2020s will see legitimate mainstreaming. The numbers speak for themselves. I hadn’t realized in a visceral manner how much had changed since our original comment came out. It’s pretty much an order of magnitude shift.

My hypothesis for why 23andMe plateaued for a while at ~1 million is that that was the sample size which maximized the statistical power they wanted to catch loci of particular effect sizes. In the initial years, 23andMe was not just buying customers with marketing, it was subsidizing the array costs. Today Illumina SNP arrays are well under $50 (some people say less than $25) wholesale, so I think at some point in early 2017 they realized even though 10 million wasn’t worth much to them in comparison to 1 million for GWAS, they were going to lose the luster of being “market leader” to Ancestry, who were acquiring customers at a massive clip through their marketing (my understanding is that at some point Illumina was having issues processing the samples that Ancestry was returning to them it was at such high scale; higher than Ancestry had anticipated!).

At least today we can explore personal genomics

A very long piece on the “personal genomics industry.” Lots of quotes from my boss Spencer Wells, since he has been in the game so long.

The piece covers all the bases. I actually think some of the criticisms of direct-to-consumer genetics are on base. I just don’t think they’re insoluble problems, or problems so large that they should discourage the industry from growing. I think part of the problem is that many of the people journalists can talk to who can comment on the industry are based in academia, and academia has a different focus when it comes to genetics than the nascent industry. For rational reasons academics need to be very careful when it comes to ethics. Consumer products I think are somewhat different.

But I do think we need to reflect on how far we’ve come in 10 years. Back in the 2000s when I was reading stuff on Y, mtDNA and autosomal studies, I honestly didn’t imagine that I would know my own haplogroups and genome-wide ancestry decomposition. It seemed like science fiction. That all changed rather rapidly over a few years, and I purchased kits in the early years when the price was still high. Today it’s a mass industry, with a sub-$100 price point in many cases.

Yes, there are plenty of cautions and worries we need to consider. But the future is already the present, and the horse has left the stable.

Personal genomics lives!

Reflecting back on it, I think I started “exploring personal genomics” in the late 2000s. That’s when direct-to-consumer testing started to become popular, albeit very niche. The book Exploring Personal Genomics is now 5 years old, and a lot has changed since then. In the same year, 2013, David Mittelman and I cowrote Rumors of the death of consumer genomics are greatly exaggerated in Genome Biology.

Now Science has a commentary out, Crowdsourced genealogies and genomes, which reviews how large amounts of public data, genetic and classical genealogical, are being used to change the field before our very eyes. I would recommend though that you read the less edited (longer, more detailed) version on the website of the authors, Crowdsourcing big data research on human history and health: from genealogies to genomes and back again.

This fact from that piece is really illustrative of what’s happening today:

As the number of customers of whole-genome DTC genetic testing just crossed 16 million, it is worth noting that almost two-thirds of them joined since the beginning of 2017 [19]. Based on current rates, this number of customers is predicted to be close to 100 million by end of 2020.