Wednesday, May 10, 2006
I recently returned from the European Human Genetics Meeting in Amsterdam, and plan on writing a few posts on some of the presentations that might be of interest to your average GNXP reader. So here goes:
With the completion of the sequencing of a human genome, much attention has turned to sorting out what the hell all of that DNA stuff actually does. Given that no one really knew how to figure that out, new technologies were needed. The ENCODE Project, which is currently ongoing, was designed to address that need. In the first stage, 1% of the genome was selected (some regions at random and some for specific reasons) as a sort of "testing ground" on which a huge number of groups are testing their computational and experimental methods for genome annotation. Once that 1% is annotated to everyone's satisfaction, the best methods will be scaled up to annotate the whole genome.
At the conference, a speaker from the ENCODE Genes and Transcripts Analysis Group gave a talk on what they're seeing so far. Here are a couple of his most attention-getting points:
1. In the ENCODE regions, about 90% of the sequence is transcribed (from one strand or the other). For comparison, only about 2 to 3% of the human genome actually codes for protein.
2. Most genes have alternative sites where transcription can start, and of these, 30% are more than 100 kb away from the actual gene. In many cases, this leads to the mixing and matching of exons from different genes.
I find those numbers absolutely mind-blowing (ok, ok, given what we see in yeast this shouldn't be that surprising. And I imagine some molecular biologists have known this for years, but this was a human genetics meeting-- we're a bit behind the curve). The "one gene - one enzyme" model is long gone, but it still provides the general framework for how your average biologist thinks.
In the coming years, however, there will be a sea change in how we understand the genome. The simple notion of discrete "genes" bounded by "regulatory regions" has been useful, but it's always been just a simplification; it's time for a more realistic null hypothesis.