Thursday, September 21, 2006

Is intergenic expression functional?
posted by JP @ 9/21/2006 06:42:00 PM

Both Coffee Mug and I have made a big deal out of the fact that a large percentage of the genome in eukaryotes is transcribed into RNA. In the comments, however, gc has been skeptical of inferring much from this fact, noting that the cell's transcription machinery is inherently "leaky" (that is, some RNA gets made from sequence that has no function at all). A new paper adds some fresh data to the mix.

Here's what they did: the researchers took four different tissues from each of five chimps and five humans and used a tiling array to measure the expression levels of overlapping 25-bp segments (probes) across the ENCODE regions. These probes could then be divided into genic (i.e. protein-coding) and intergenic regions.

These data can be examined in several ways to ask whether the expression of a given class of sequence is functional. First, for a given tissue, one can measure the overlap between the probes expressed in chimpanzees and those expressed in humans. If that overlap is about the same for genic and intergenic probes, it would suggest that intergenic regions are behaving, in an evolutionary sense, much like genic regions, and so are possibly functional.
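For the first test, here is a minimal Python sketch of what "overlap" could mean. It assumes, as a stand-in, that overlap is the Jaccard index (probes expressed in both species divided by probes expressed in either), computed separately for genic and intergenic probes; the paper's exact statistic may differ, and the probe IDs and expression calls below are invented toy data, not results.

```python
from typing import Dict, Set


def overlap_fraction(human: Set[str], chimp: Set[str]) -> float:
    """Jaccard overlap: probes expressed in both species / probes expressed in either."""
    union = human | chimp
    if not union:
        return 0.0
    return len(human & chimp) / len(union)


def overlap_by_class(human: Set[str], chimp: Set[str],
                     probe_class: Dict[str, str]) -> Dict[str, float]:
    """Compute the overlap separately for each probe class (e.g. 'genic', 'intergenic')."""
    result = {}
    for cls in sorted(set(probe_class.values())):
        members = {p for p, c in probe_class.items() if c == cls}
        result[cls] = overlap_fraction(human & members, chimp & members)
    return result


# Hypothetical toy data: IDs of probes called "expressed" in one tissue of each species.
probe_class = {"p1": "genic", "p2": "genic", "p3": "genic", "p4": "genic",
               "i1": "intergenic", "i2": "intergenic",
               "i3": "intergenic", "i4": "intergenic"}
human = {"p1", "p2", "p3", "i1", "i2", "i3"}
chimp = {"p1", "p2", "p4", "i1", "i2", "i4"}

print(overlap_by_class(human, chimp, probe_class))
```

With this toy input both classes give an overlap of 0.5, which is the "same overlap" outcome that would point toward intergenic transcription behaving like genic transcription.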

Further, previous studies have shown that gene expression in the brain is highly conserved between humans and chimps, implying negative selection, while gene expression in the testis is highly diverged, implying positive selection. If intergenic probes show the same pattern, it would suggest that the same forces are acting on intergenic transcription. And since natural selection is not expected to shape the expression of non-functional transcripts, this would in turn suggest that the intergenic transcription is functional.
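The second test boils down to asking whether genic and intergenic probes rank the tissues the same way, from most conserved to most diverged. A minimal Python sketch, assuming divergence is measured as the mean absolute human-chimp difference in log2 probe intensity (a hypothetical choice; the paper may use a different measure) and using made-up intensities for brain and testis only:

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: tissue -> probe class -> list of (human, chimp) mean log2 intensities.
Data = Dict[str, Dict[str, List[Tuple[float, float]]]]


def divergence(pairs: List[Tuple[float, float]]) -> float:
    """Mean absolute human-chimp difference in log2 expression across probes."""
    return mean(abs(h - c) for h, c in pairs)


def tissue_ranking(data: Data, probe_class: str) -> List[str]:
    """Order tissues from most conserved to most diverged for one probe class."""
    scores = {tissue: divergence(by_class[probe_class])
              for tissue, by_class in data.items()}
    return sorted(scores, key=scores.get)


def same_pattern(data: Data) -> bool:
    """True if genic and intergenic probes order the tissues identically."""
    return tissue_ranking(data, "genic") == tissue_ranking(data, "intergenic")


# Invented toy numbers, not data from the paper.
data: Data = {
    "brain": {"genic": [(8.0, 8.1), (6.0, 5.9)],
              "intergenic": [(4.0, 4.2), (3.5, 3.4)]},
    "testis": {"genic": [(8.0, 9.0), (6.0, 5.2)],
               "intergenic": [(4.0, 5.1), (3.5, 2.6)]},
}

print(tissue_ranking(data, "genic"), tissue_ranking(data, "intergenic"), same_pattern(data))
```

In this toy example both classes put brain ahead of testis, the conserved-brain, diverged-testis pattern described above.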

Running both of these tests, the authors found that intergenic transcription does indeed behave like genic transcription, giving strong support for the functionality of these transcripts. Previously, gc has raised the point that expression is a continuous, rather than a binary, variable, and indeed, genic probes are much more common among the top 5% of expressed probes. Apart from that, though, the raw numbers of genic and intergenic probes expressed at a given intensity seem pretty similar (since there are fewer genic probes overall, similar raw counts still mean genic regions are overrepresented, but still...). In my opinion, this paper gives hope to those who think that the "dark matter" of the genome plays an important role in phenotypes, and that leaky transcription isn't going to be the dominant story coming out of the ENCODE data.
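The overrepresentation point comes down to comparing the genic share of the most highly expressed probes with the genic share of all probes. A rough Python sketch, where the function name, the `top_fraction` parameter and the toy intensities are all hypothetical:

```python
from typing import List, Tuple

# Hypothetical input: (intensity, probe class) for every probe on the array.
Probe = Tuple[float, str]


def genic_enrichment(probes: List[Probe], top_fraction: float) -> float:
    """Genic share among the top `top_fraction` of probes by intensity,
    divided by the genic share among all probes."""
    ranked = sorted(probes, key=lambda p: p[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    genic_total = sum(1 for _, cls in ranked if cls == "genic")
    if genic_total == 0:
        raise ValueError("no genic probes in input")
    genic_top = sum(1 for _, cls in ranked[:n_top] if cls == "genic")
    # (genic_top / n_top) / (genic_total / len(ranked)), kept in integers until the final division
    return (genic_top * len(ranked)) / (n_top * genic_total)


# Invented toy set: 4 genic and 16 intergenic probes.
probes: List[Probe] = [(9.5, "genic"), (9.1, "genic"), (8.7, "intergenic"),
                       (8.2, "genic"), (7.9, "intergenic"), (4.0, "genic")]
probes += [(x / 10, "intergenic") for x in range(10, 24)]  # 14 low-intensity intergenic probes

print(genic_enrichment(probes, 0.25))  # 3.0
```

An enrichment of 1 would mean genic probes turn up among the top probes no more often than their overall share predicts; here genic probes are 20% of the toy set but 60% of the top quarter.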