Saturday, January 05, 2008
In the comments on a previous post, I made reference to a paper by Demuth et al. on the evolution of gene families in mammals. As this was published in PLoS One, I took a look at the annotations. One of the comments, by Laurent Duret, brings up a potentially major issue--the authors use a database of gene families for their analysis, but don't test how exhaustive that database is. He continues:
As a control for the reliability of their analyses I looked at the 49 gene families that were considered as having been lost in the human lineage ("extinctions" in their Table 2). I retrieved in the supplementary Table S2 all the gene families that contain at least one chimp sequence and one non-primate sequence but no human sequence. These 49 gene families are all represented by a single gene in chimp...Then I extracted the corresponding chimp protein from Ensembl release 41 using BioMart...The 49 chimp genes correspond to 77 proteins (some genes encode alternative splice variants). Then I downloaded all human proteins annotated in Ensembl release 41...Finally, I BLASTed the 77 chimp proteins against the human proteome (Ensembl release 41): each of these chimp proteins has a very strong match in human: average identity (at the protein level) = 99%; minimum = 86%. Thus, none of these 49 gene families has been lost in the human lineage.
I think it's fair to say that any of the numbers on gene losses/gains between species presented in this paper should be taken with a grain of salt. This is one of the advantages of the PLoS One system--critiques can be appended directly rather than floating around unpublished or getting published in a minor journal. (Of course, the modal paper published by PLoS One probably doesn't get read closely enough to generate real critiques.)
Addendum: a reader points out that overestimating the number of gene extinctions by a factor of 100% is, in fact, not overestimating it at all (a factor of 100% is a factor of 1). Perhaps Dr. Duret should have written something like "a false positive rate of 100%", but I imagine everyone got his point.
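For anyone who wants to run the same kind of sanity check on their own gene-loss calls, here is a minimal sketch of the last step in Duret's description: keep the best hit for each query protein and report the average and minimum identity. It assumes the BLAST results are in the standard 12-column tabular format (`-outfmt 6` in BLAST+); the function names, protein IDs and numbers are all made up for illustration and are not Duret's data or his actual scripts.

```python
import csv
import io
from statistics import mean


def best_hits(blast_tabular):
    """Map each query ID to (subject ID, percent identity, bitscore) of its top-bitscore hit.

    Assumes the standard 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        pident, bitscore = float(row[2]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


def summarize(queries, best):
    """Summarize identity over the queries, listing any with no hit at all."""
    identities = [best[q][1] for q in queries if q in best]
    return {
        "n_queries": len(queries),
        "n_matched": len(identities),
        "mean_identity": mean(identities) if identities else None,
        "min_identity": min(identities) if identities else None,
        "unmatched": [q for q in queries if q not in best],
    }


# Hypothetical rows, invented for this example only.
SAMPLE = "\n".join([
    "chimpP1\thumanA\t99.5\t400\t2\t0\t1\t400\t1\t400\t0.0\t800",
    "chimpP1\thumanB\t60.0\t380\t150\t4\t5\t384\t10\t390\t1e-90\t300",
    "chimpP2\thumanC\t86.0\t250\t35\t0\t1\t250\t1\t250\t1e-150\t450",
])

summary = summarize(["chimpP1", "chimpP2"], best_hits(SAMPLE))
print(summary)
```

Any query that ends up in `unmatched`, or whose best hit has low identity, is a candidate for a real loss; a query with a near-identical hit in the other proteome is more likely a hole in the gene-family database than an extinction.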