Gene Expression: Positive selection in regulatory sequences


Saturday, August 18, 2007

Positive selection in regulatory sequences posted by p-ter @ 8/18/2007 07:11:00 PM

As many of our readers are aware, humans and chimpanzees are rather different, and have diverged considerably since we last shared a common ancestor a few million years ago. An interesting question in evolutionary biology is: what the hell happened? What makes us so different? Comparing the consensus genome sequences of humans and chimps shows millions and millions of base pairs that have changed; which of those are functionally relevant?

One of the hypotheses that has been "rediscovered" in the last few years is that the important changes should be involved in gene regulation, rather than protein sequence itself (this is of course from the classic King and Wilson paper). A new paper takes a look at putative regulatory regions at a number of genes, and concludes that, yes, some of them have been under positive selection since the divergence of humans and chimps. Further, they run a quick and dirty analysis on the functions of the genes whose regulation appears to be under positive selection, and see some enrichment in neural and nutrition-related categories. From this, they conclude, "the present survey...suggests that human cognitive, behavioral and dietary adaptations have arisen primarily through changes in cis-regulatory sequences." Meh.

That could, of course, be true. But frankly, the input of this paper didn't really change the weight I put on that possibility at all (that is, I'd estimate the Bayes factor of this paper at about one, which leaves the prior odds where they were). Here's why:

1. I'm starting to become somewhat ill-at-ease with tests for selection that are based on estimating evolutionary substitution rates (in this paper, they compare rates at promoters with those in introns). What exactly do tests like these detect? Ultimately, this is a population genetics question: how many new positively selected alleles have to arise and become fixed for this test to be significant? And why would you expect that many are necessary for phenotypic change (as opposed to, say, one really well-placed substitution)? For example, a recent paper showed that three regulatory substitutions completely accounted for a major morphological difference between two Drosophila species. Is that enough for a significant result in a test like this? If not, what kind of functional biases are being introduced into the results?

2. I'm always ill-at-ease with results based on enrichment in gene ontology categories. In this case, the most significant result is in "protein folding", which is neither a neural nor a nutrition-related category. The next ones: p-values of 0.01 for "other neuronal activity" and "neurogenesis", and a p-value of 0.02 for "neuronal activities". The categories are not disjoint (presumably "neuronal activities" and "other neuronal activity" share many of the same genes, for example), so are the enrichment results due to a few genes that fall in every neural-related category? And how can one conclude, based on this evidence, that "human cognitive, behavioral and dietary adaptations have arisen primarily through changes in cis-regulatory sequences"? Given what we're working with, that's still pretty much speculation.
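To put rough numbers on point 1, here is a back-of-the-envelope sketch. It is not the test the paper uses: I'm swapping in a plain one-sided Fisher's exact test on substitution counts, and every number in it (a 5,000-site promoter, a 5,000-site intron, 60 neutral substitutions in each) is a made-up assumption chosen only for illustration. The question it asks is how many extra promoter substitutions you need before a count-based comparison notices anything.

```python
from math import comb

# Hypothetical numbers, chosen only for illustration (not taken from the paper).
SITES = 5000          # aligned sites in the promoter, and again in the intron
NEUTRAL_SUBS = 60     # substitutions expected in each region with no selection


def fisher_one_sided(promoter_subs, promoter_sites, intron_subs, intron_sites):
    """P(at least promoter_subs fall in the promoter) if all substitutions
    were scattered at random over promoter + intron sites (hypergeometric tail)."""
    total_sites = promoter_sites + intron_sites
    total_subs = promoter_subs + intron_subs
    upper = min(total_subs, promoter_sites)
    favourable = sum(
        comb(total_subs, x) * comb(total_sites - total_subs, promoter_sites - x)
        for x in range(promoter_subs, upper + 1)
    )
    return favourable / comb(total_sites, promoter_sites)


def min_extra_for_significance(alpha=0.05):
    """Smallest number of extra promoter substitutions that gives p < alpha."""
    extra = 0
    while fisher_one_sided(NEUTRAL_SUBS + extra, SITES, NEUTRAL_SUBS, SITES) >= alpha:
        extra += 1
    return extra


# Three adaptive substitutions on top of the neutral background.
p_three = fisher_one_sided(NEUTRAL_SUBS + 3, SITES, NEUTRAL_SUBS, SITES)
print(f"p-value with 3 extra promoter substitutions: {p_three:.3f}")
print(f"extra substitutions needed for p < 0.05: {min_extra_for_significance()}")
```

Under these assumptions, three well-placed substitutions are invisible to a test of this kind, and it takes many more before the excess stands out from the neutral background. A likelihood-based rate test will behave differently in detail, but the same worry about power applies.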
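The overlap worry in point 2 is easy to demonstrate with a toy example. The sketch below uses a standard hypergeometric enrichment test, which I'm assuming is close in spirit to what such analyses do; the category names are the ones quoted above, but the gene names, category sizes, and the "selected" gene list are all invented. Six shared genes sit in all three neural categories, and the rest of the selected list is unrelated.

```python
from math import comb

BACKGROUND_SIZE = 1000  # hypothetical number of genes surveyed


def gene(i):
    """Placeholder gene name; these are not real gene identifiers."""
    return f"gene{i}"


# Invented, overlapping category memberships.
categories = {
    "neuronal activities": {gene(i) for i in range(40)},
    "other neuronal activity": {gene(i) for i in range(20)},
    "neurogenesis": {gene(i) for i in range(10)} | {gene(i) for i in range(40, 55)},
}

shared = {gene(i) for i in range(6)}                     # in all three categories
selected = shared | {gene(i) for i in range(500, 544)}   # 50 "selected" genes


def enrichment_p(hits, category_size, selected_size, background_size):
    """Hypergeometric P(at least `hits` selected genes fall in the category)."""
    upper = min(category_size, selected_size)
    favourable = sum(
        comb(category_size, x)
        * comb(background_size - category_size, selected_size - x)
        for x in range(hits, upper + 1)
    )
    return favourable / comb(background_size, selected_size)


def scan(selected_genes):
    """Return {category: (genes driving the hit, p-value)} for one gene list."""
    results = {}
    for name, members in categories.items():
        hits = selected_genes & members
        p = enrichment_p(len(hits), len(members), len(selected_genes), BACKGROUND_SIZE)
        results[name] = (sorted(hits), p)
    return results


with_shared = scan(selected)
without_shared = scan(selected - shared)

for name in categories:
    hits, p = with_shared[name]
    print(f"{name}: p = {p:.4f}, driven by {hits}")
    print(f"  after dropping the shared genes: p = {without_shared[name][1]:.2f}")
```

In this toy setup all three categories come out "enriched", yet each result rests on the same six genes, and removing them wipes out all three signals at once. That is one signal counted three times, which is why I'd want to see the gene lists behind each category before reading much into a run of neural-sounding hits.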

So yes, changes in regulatory sequences seem to be important in driving the divergence of humans from chimps (though not necessarily to the exclusion of other changes). But the precise changes, and the traits they involve, are certainly still up in the air.

Labels: Evolution, Genetics