Saturday, January 05, 2008

Rethinking the rate of gene losses and gains
posted by p-ter @ 1/05/2008 09:28:00 AM

In the comments on a previous post, I made reference to a paper by Demuth et al. on the evolution of gene families in mammals. As this was published in PLoS One, I took a look at the annotations. One of the comments, by Laurent Duret, brings up a potentially major issue--the authors use a database of gene families for their analysis, but don't try to test how exhaustive the database is. He continues:
As a control for the reliability of their analyses I looked at the 49 gene families that were considered as having been lost in the human lineage ("extinctions" in their Table 2). I retrieved in the supplementary Table S2 all the gene families that contain at least one chimp sequence and one non-primate sequence but no human sequence. These 49 gene families are all represented by a single gene in chimp...Then I extracted the corresponding chimp protein from Ensembl release 41 using BioMart...The 49 chimp genes correspond to 77 proteins (some genes encode alternative splice variants). Then I downloaded all human proteins annotated in Ensembl release 41...Finally, I BLASTed the 77 chimp proteins against the human proteome (Ensembl release 41): each of these chimp proteins has a very strong match in human : average identity (at the protein level) = 99%; minimum = 86%. Thus, none of these 49 gene families has been lost in the human lineage.

In conclusion, the rate of gene family extinction in the human lineage (Table 2) appears to be overestimated ... by a factor of 100%.[!!] It is likely that similar problems affect also the numbers given for other species.
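(Stepping outside the quote for a moment: for anyone who wants to repeat this kind of check, here is a minimal Python sketch of the last step, summarizing the best human hit for each chimp protein. It assumes the BLAST results were saved in the standard 12-column tabular format, where column 1 is the query ID, column 3 the percent identity, and column 12 the bit score. The function names and the example rows are made up for illustration; this is not Duret's script or his data.)

```python
from statistics import mean

def best_hit_identities(blast_lines):
    """Map each query ID to the percent identity of its highest-bit-score hit."""
    best = {}  # query ID -> (bit score, percent identity)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, pident, bitscore = fields[0], float(fields[2]), float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, pident)
    return {query: pident for query, (_, pident) in best.items()}

def summarize(identities, all_queries):
    """Report mean and minimum identity, plus queries with no hit at all."""
    values = list(identities.values())
    return {
        "mean": mean(values),
        "min": min(values),
        # Queries without any hit do not appear in tabular output,
        # so they have to be found by comparing against the input list.
        "no_hit": sorted(set(all_queries) - set(identities)),
    }

# Hypothetical rows, for illustration only.
example = [
    "chimpA\thumanX\t99.5\t400\t2\t0\t1\t400\t1\t400\t0.0\t800",
    "chimpA\thumanY\t60.0\t380\t150\t3\t5\t385\t9\t388\t1e-90\t300",
    "chimpB\thumanZ\t86.0\t250\t35\t0\t1\t250\t1\t250\t1e-140\t450",
]
stats = summarize(best_hit_identities(example), ["chimpA", "chimpB", "chimpC"])
```

The "no_hit" list is the part that matters for a claim of gene loss: a chimp protein with no human match would be a candidate extinction, while a strong match (as Duret found for all 77 proteins) argues against one.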
Given Duret's check, I think it's fair to say that all of the numbers on gene losses/gains between species presented in this paper should be taken with a grain of salt. This is one of the advantages of the PLoS One system--critiques can be appended directly rather than floating around unpublished or getting published in a minor journal. (Of course, the modal paper published by PLoS One probably doesn't get read closely enough to generate real critiques.)

Addendum: a reader points out that overestimating the number of gene extinctions by a factor of 100% is, in fact, not overestimating it at all (a factor of 100% is a factor of 1). Perhaps Dr. Duret should have written something like "a false positive rate of 100%", but I imagine everyone got his point.
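To make the arithmetic concrete, here is a small Python sketch using the numbers from Duret's comment (49 reported extinctions, none of which held up). The function name is mine, and what it computes is the fraction of reported losses that turned out to be false, which is what "a false positive rate of 100%" is loosely getting at.

```python
def fraction_false(reported, confirmed):
    """Fraction of reported extinctions that did not hold up on re-checking."""
    return (reported - confirmed) / reported

reported = 49   # human-lineage extinctions checked by Duret
confirmed = 0   # none survived the BLAST check

# Read literally, "a factor of 100%" means multiplying by 1.0,
# which leaves the number unchanged.
literal_reading = reported * 1.0

# The intended point: every reported extinction was a false call.
false_fraction = fraction_false(reported, confirmed)

# A true "overestimated by a factor of N" would be reported / confirmed,
# which is undefined here because the confirmed count is zero.
fold = reported / confirmed if confirmed else float("inf")
```

So the literal reading gives 49 back, the fraction of false calls is 1.0 (that is, 100%), and the fold overestimate has no finite value.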