Wednesday, July 26, 2006

What is the soundtrack of our genome?
posted by rosko @ 7/26/2006 05:19:00 PM

Okay, this is my first post, and I must admit it has no interesting science news in it. However, I think many on here will find this funny.

A few days ago I took a look at Mendel's Garden #3, and among the featured posts is a discussion of the work of Japanese biologist Susumu Ohno. He took part of the gene encoding the large subunit of RNA polymerase II and converted it into music, considering both the base sequence itself and the properties (size and charge) of the encoded amino acids. He thought that this piece sounded like a Chopin nocturne, so he took the nocturne and "reverse translated" it into a DNA sequence. He proceeded to demonstrate that this sequence contains a 160-codon open reading frame (see this site for more), and went on to make lots of philosophical speculations about how DNA sequences and music evolve in the same manner. If all 64 codons are equally likely, the probability that any given sequence of 160 base triplets starts with a start codon and contains no stop codon in the remaining 159 positions is (1/64) × (61/64)^159, a little less than 1/130,000. However, this could be artificially raised by many orders of magnitude by assigning the start and stop codons to sequences of notes that are very frequent and very rare, respectively, in the nocturne, which shouldn't be difficult to find with the right software.
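For anyone who wants to check that figure, here is a minimal Python sketch of the arithmetic. It assumes one start codon (ATG), three stop codons, and independent, equally likely codons. The function name `orf_probability` and the "skewed" frequencies are my own illustrative choices, not values taken from Ohno's nocturne.

```python
def orf_probability(n_codons, p_start=1 / 64, p_stop=3 / 64):
    """Chance that n_codons independent codons begin with a start codon
    and have no stop codon in the remaining n_codons - 1 positions."""
    return p_start * (1 - p_stop) ** (n_codons - 1)


# Uniform codon usage: 1 start codon and 3 stop codons out of 64.
uniform = orf_probability(160)
print(f"uniform: about 1 in {round(1 / uniform):,}")

# Hypothetical note-to-codon mapping in which the "start" note pattern is
# common and the "stop" patterns are rare. These numbers are made up.
skewed = orf_probability(160, p_start=0.05, p_stop=0.001)
print(f"skewed:  about 1 in {round(1 / skewed):,}")
```

With uniform codons this comes out at roughly 1 in 132,000. The made-up skewed mapping shows how much the choice of mapping alone can move the odds.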

Perhaps most interesting is the work of musician Colin Angus of the group The Shamen, who teamed up with biologist Ross King to create the piece "S2 Translation", which contains the full sequence of a serotonin receptor. The program they used for this, called ProteinMusic, is available as a free download. I got the program and tried some random gene sequences, making sure to trim off any bases before the start codon (ProteinMusic doesn't do this automatically). The program went straight through the stop codon at the end of the coding sequence, calling it "Z". I could not hear any difference in sound between the actual coding region and the 3' UTR. The poly(A) tail was easy to recognize because of its repetitiveness, but that's about it. Someone commented that

"It may be possible for somebody who has heard the pattern of a calcium-binding site or an enzyme active site to recognize its occurrence in a novel protein."

Yeah right. I doubt 1% of bioinformaticians could identify the seven transmembrane helices in that serotonin receptor by ear, something that is typically easy to do by eye using hydropathy plots. This isn't to say that the idea of turning DNA sequences into music isn't neat in a purely fun sense, just that it doesn't do anything for science except maybe increase its popularity.
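To show how little work the "by eye" approach takes, here is a rough Python sketch of a hydropathy profile. It uses the Kyte-Doolittle scale with a 19-residue sliding window and a cutoff of 1.6, which are the conventional settings for spotting membrane-spanning stretches as far as I know. The function names and the toy sequence are made up for illustration; this is not the serotonin receptor.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Average hydropathy over each window of consecutive residues."""
    scores = [KD[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_segments(profile, threshold=1.6):
    """Runs of window start positions whose average exceeds the threshold."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments


# Toy sequence: a 25-residue hydrophobic stretch between charged flanks.
toy = "D" * 30 + "L" * 25 + "K" * 30
profile = hydropathy_profile(toy)
segments = hydrophobic_segments(profile)
print(segments)
```

Run on a real seven-transmembrane receptor, a profile like this would be expected to show seven such peaks, and plotting `profile` against position gives the usual hydropathy plot.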