Haplotype Maps and QTL
Murtaugh talks today about the upcoming effort to make a haplotype map of the human genome. I'd been meaning to talk about this as it's very close to my research interests, but I hadn't gotten around to it. What's a "haplotype map", you ask? Well, the basic idea is that stretches of DNA tend to segregate in clumps, and these clumps are often large enough to contain several genes and/or markers at a time. Here's Science's description:
The idea for the HapMap emerged from the gradual realization that the genome has a surprisingly structured architecture. Rather than being thrown together randomly, thousands of DNA bases--as well as patterns of single-base variations found among them--line up in roughly the same order in many different people. Like an interior decorator debating among four kitchen designs, a person's genome has just one of a few potential blocks of DNA to slap into a defined space on a chromosome. Each DNA block--or kitchen design--is a haplotype.
To stretch the analogy, individual genes are analogous to the sink color or the floor tiling. You can't simply mix and match sinks and floors from the different kitchen designs - you have to pick one of the haplotypes and accept all the genes that come with it. Making a HapMap means identifying the blocks of DNA that can be slapped into the various points of the chromosome.
What's the point of all this? Well, making the HapMap will require large-scale surveying of haplotype diversity. Quoting Science again:
The first will be to create haplotype maps of the genomes of three populations: those of northern and western European ancestry, Japanese and Chinese, and Yorubans. In the second stage, scientists will test whether the haplotypes they find in those very large populations also appear in about 10 others.
In other words, if you think of the human genome project as a massive effort to provide a "first order" approximation to human sequence space, the HapMap will be a massive effort to provide a "second order" approximation to human sequence space.
How is this useful? Suppose we want to describe the sequence of a randomly selected Joe. If you're limited to describing Joe's sequence with a single string, you'd give the consensus human genome sequence. If you can afford to be more accurate than that, you'll start figuring out which haplotypes are most common in Joe's population group, and give the haplotype distribution instead. For still higher accuracy you would of course sequence Joe's genome de novo, but that's not yet cost effective.
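To make the first-order versus second-order distinction concrete, here is a minimal Python sketch. The sample data, the four-site block, and the function names are all invented for illustration; nothing here comes from the HapMap project itself. It computes a single consensus string and then the frequency of each whole block.

```python
from collections import Counter

# Hypothetical toy data: each string is one sampled chromosome's alleles at
# four neighboring variable sites inside a single block (values are invented).
sampled_blocks = ["ACGT"] * 6 + ["GTCA"] * 3 + ["ATGA"] * 1


def consensus(blocks):
    """First-order summary: the most common allele at each site."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*blocks)
    )


def haplotype_frequencies(blocks):
    """Second-order summary: the share of samples carrying each whole block."""
    counts = Counter(blocks)
    total = len(blocks)
    return {haplotype: n / total for haplotype, n in counts.items()}


if __name__ == "__main__":
    print("consensus:", consensus(sampled_blocks))
    for haplotype, freq in haplotype_frequencies(sampled_blocks).items():
        print(haplotype, freq)
```

In this toy sample only three of the 256 possible four-letter combinations ever show up, which is the "pick one of the kitchen designs" point: the consensus string tells you the most likely letter at each site, while the frequency table tells you which whole blocks Joe is likely to carry.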
We can thus see that while the consensus human genome sequence is an approximation of what we have in common, the HapMap is fundamentally about finding the genetic roots of human differences. Yes, it may be useful for curing diseases, but that will only be the beginning of the applications and not a major one at that. There is much dispute over whether combinations of common mutations cause disease or whether rare mutations are more likely to do so, but such disputes miss the forest for the trees.
The main haul of the HapMap will be a flood of data that will overwhelm those who would deny that significant genetic differences exist between humans. Even more importantly, it will provide an invaluable base of information for those who would usher us into an age of reengineered humans.
Oh, and by the way, Charles - we won't need these techniques to find the genetic roots of IQ, though they will help. IQ, being a quantifiable variable, can be studied with QTL analysis, as I've detailed in the past.
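For readers who haven't seen it, the simplest flavor of QTL analysis is a single-marker regression: score each individual by how many copies of a marker allele they carry (0, 1, or 2) and regress the trait on that count. The sketch below is only that simplest case, with invented numbers and a hypothetical function name; it is not necessarily the method described in the earlier posts, and real analyses add interval mapping, covariates, and significance testing.

```python
def marker_effect(allele_counts, trait_values):
    """Least-squares slope of the trait on marker allele count (0, 1, or 2)."""
    n = len(allele_counts)
    mean_g = sum(allele_counts) / n
    mean_t = sum(trait_values) / n
    covariance = sum(
        (g - mean_g) * (t - mean_t)
        for g, t in zip(allele_counts, trait_values)
    )
    variance = sum((g - mean_g) ** 2 for g in allele_counts)
    return covariance / variance


if __name__ == "__main__":
    # Hypothetical toy data: six individuals, invented trait scores.
    allele_counts = [0, 0, 1, 1, 2, 2]
    trait_values = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
    print("estimated effect per allele copy:",
          marker_effect(allele_counts, trait_values))
```

A slope far from zero at a marker suggests that a locus affecting the trait sits somewhere nearby; repeating the test marker by marker along the chromosomes is what narrows down where.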