Thursday, February 08, 2007

When is a synonymous sequence not synonymous?   posted by rosko @ 2/08/2007 06:28:00 PM

...when it specifies a different fold! At least that's the latest word from Kimchi-Sarfaty et al., who reported in the Jan 26 issue of Science that a supposedly "silent" mutation in the multidrug resistance gene MDR1 changes the function of the encoded protein. If the authors' conclusions are correct, this result could radically expand the number of gene variants that have the potential to influence phenotypes, and that's why I chose to post this one on GNXP. To understand the meaning of this, you have to know something about the genetic code and the subject of protein folding.

As I'm sure most of you know, the amino acid sequence of a protein is specified precisely by the sequence of bases in the DNA of its gene. Each amino acid is specified by three consecutive bases, and since there are four possible bases, there are 4^3=64 possible base triplets (called codons). Considering that there are 20 amino acids found in proteins, and three codons that act as stop signs, that means there are on average three codons per amino acid. These duplicate codons often differ by a single base, which means that many point mutations even within the coding region of a gene will not alter the protein sequence. Different organisms have clear preferences for which codons they use to specify each amino acid, and while any known life form can recognize them all, the preferred ones are read a little faster (the mechanism for this involves the abundance of the tRNAs that actually do the recognition, though that doesn't matter for our discussion). It is a piece of cake to determine whether any two variations on a common gene sequence, for example, one from a healthy person and another from someone with a disease, will encode the exact same protein. Ones that do are routinely dismissed as being "silent", i.e. unable to cause a new phenotype, except in cases where the change is in a regulatory region that controls expression of the gene, splicing, etc.
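To make the "piece of cake" concrete, here is a minimal Python sketch of that check. It builds the standard genetic code table, counts the codons, and compares two sequences. The function names (translate, same_protein) and the two short DNA sequences are made up for illustration; they are not taken from the paper or from MDR1.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order
# at each of the three codon positions ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon (reading frame 0)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def same_protein(seq_a, seq_b):
    """True if both DNA sequences encode the same amino acid sequence."""
    return translate(seq_a) == translate(seq_b)


# Codon bookkeeping from the paragraph above.
n_stop = sum(1 for aa in CODON_TABLE.values() if aa == "*")
n_sense = len(CODON_TABLE) - n_stop
print(len(CODON_TABLE), "codons,", n_stop, "stops,",
      round(n_sense / 20, 2), "sense codons per amino acid on average")

# Hypothetical toy variants that differ only in the third base of one codon.
variant_1 = "ATGGGCATCTAA"
variant_2 = "ATGGGCATTTAA"
print(translate(variant_1), translate(variant_2),
      same_protein(variant_1, variant_2))
```

A check like this only tells you that the two protein sequences match. It says nothing about how fast each codon is read, which is exactly the gap this paper is about.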

That brings us to the question of protein folding. It is generally accepted that a protein's function is intimately related to the three-dimensional shape into which it folds, and that this shape is determined by the sequence of amino acids from which it is made up. Therefore, researchers have assumed (and typically rightfully so) that no matter how you make a given sequence of amino acids, under typical cellular conditions it will fold up into its proper shape and start functioning. The prevailing view also holds that the correct three-dimensional shape is formed because that one has the overall lowest (free) energy of any possible shape.

This energetic criterion cannot be the whole story, because a quick calculation (see Levinthal's Paradox) shows that a protein could never "visit" all possible shapes in any reasonable amount of time. Therefore, most people studying protein folding believe that protein sequences are optimized not only to dictate the correct fold but to specify a "roadmap" of how to get there, by setting up fast initial interactions among the atoms that "push" the folding in the right direction. However, there has been no good evidence that the final state isn't a unique energy minimum, regardless of how complex the journey to get there.
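For the curious, the quick calculation goes roughly like this. The numbers in this Python sketch (100 residues, 3 conformations per residue, 10^13 conformations sampled per second) are commonly used illustrative assumptions, not measured values, and the variable names are mine.

```python
# Back-of-the-envelope version of Levinthal's argument.
# All inputs below are illustrative assumptions, not measurements.
n_residues = 100               # a smallish protein
states_per_residue = 3         # assumed conformations available to each residue
samples_per_second = 1e13      # assumed rate of trying new conformations

total_conformations = states_per_residue ** n_residues
seconds_needed = total_conformations / samples_per_second

seconds_per_year = 3600 * 24 * 365.25
years_needed = seconds_needed / seconds_per_year

age_of_universe_years = 1.38e10  # approximate

print(f"conformations to visit: {total_conformations:.2e}")
print(f"years for an exhaustive search: {years_needed:.2e}")
print(f"ratio to the age of the universe: {years_needed / age_of_universe_years:.2e}")
```

Whatever reasonable values you plug in, an exhaustive search takes vastly longer than the age of the universe, while real proteins fold in anywhere from microseconds to minutes. So the search cannot be random.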

Kimchi-Sarfaty et al. looked at a transmembrane pump called P-glycoprotein (P-gp) that is well-known in the cancer research community for being something that cancer cells turn on to expel the kind of toxic drugs we throw at them. P-gp tends to grab large, greasy molecules, without regard to their precise structure, and use ATP to power their expulsion from the cell, thereby allowing the cell to evade their effects. Several variants of the MDR1 gene, which encodes P-gp, specify the exact same sequence of amino acids; they differ only in that one of these amino acids is encoded by a slightly different set of three bases in each variant. However, the change converts a codon that is common in mammalian cells to one that is rarer, and as mentioned above, the rarer one would be expected to create a slight pause at that point in the synthesis of the protein.

The very surprising result is that the proteins produced from the different variants show a bunch of functional differences, including the affinity with which they bind several drugs, even though they are expressed at the same level and their amino acid sequences were verified to be identical. The authors conclude that the pause induced by the rare codon causes the protein folding (which starts long before the protein is finished) to follow a different pathway. This implies that the rate at which a protein is synthesized and allowed to fold, and not just the set of possible final states, is crucial for determining the final shape. For the geneticist, the take-home message here is that a change in a coding sequence should not be dismissed as a possible cause of a phenotypic trait just because it doesn't alter the protein sequence.