When looking for genes or alleles involved in a phenotype, especially “complex” phenotypes where many genetic factors are involved, the most powerful approach is often an association study: type a large number of variants in some cases and some controls (or just a bunch of people if you’re talking about a quantitative trait) and see if a certain variant is present more often in the cases than the controls.
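To make the case/control comparison concrete, here is a minimal sketch of a test for a single variant: a chi-square test on a 2x2 table of allele counts. The function name and the counts are made up for illustration, and a real study would also have to deal with multiple testing, population structure and genotyping error, none of which is handled here.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column of the table has at least one observation.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [case_alt + case_ref, control_alt + control_ref]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were equally common in both groups.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 1000 cases and 1000 controls, so 2000 alleles per group.
chi2, p = allelic_chi_square(case_alt=460, case_ref=1540,
                             control_alt=400, control_ref=1600)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

In a genome-wide scan this kind of test is repeated at every typed SNP, which is why the significance threshold has to be far stricter than the usual 0.05.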
As it’s currently impractical to sequence a large number of genomes (and thus genotype all the possible genetic variants in the genome), one has to pick and choose which sites to genotype. Nowadays, the markers that are used are Single Nucleotide Polymorphisms (SNPs). But even with chips that can now type 500,000 SNPs in parallel, there’s no way to genotype them all (there are an estimated 10 million SNPs in the genome). Plus, there are deletions, duplications, inversions, etc.; there are plenty of ways that two genomes can differ.
Luckily, some SNPs are correlated with others, so that the genotype at one gives you information about the genotype at the other. So by typing 500,000 SNPs in a person, you could theoretically have information about the genotypes at millions of sites. The HapMap Project is an attempt to identify all the common SNPs in the genome and map the correlations between them.
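The usual way to put a number on this kind of correlation is r², the squared correlation coefficient between the alleles at two sites. Below is a rough sketch of how it can be computed, assuming phased haplotypes coded as 0/1 pairs; the function name and the toy data are my own, not anything taken from the HapMap.

```python
def r_squared(haplotypes):
    """Squared correlation (r^2) between alleles at two biallelic sites.

    haplotypes: list of (a, b) pairs, one per chromosome, where a and b
    are 0 or 1 for the allele carried at the first and second site.
    Assumes both sites are polymorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Toy sample of 100 chromosomes where the two sites always travel together.
perfect = [(1, 1)] * 30 + [(0, 0)] * 70
# Same allele frequencies, but a few chromosomes break the pattern.
partial = [(1, 1)] * 25 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 65

print(round(r_squared(perfect), 3))
print(round(r_squared(partial), 3))
```

When r² is close to 1, typing one site tells you nearly everything about the other, which is what lets a 500,000-SNP chip stand in for many more untyped sites.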
But as I noted before, there are also structural polymorphisms in the genome. An important question is: are there correlations between SNPs and structural polymorphisms? If there aren’t, then doing an association study using only SNPs might miss a lot of the potential variation in the genome.
Previous studies have shown that common deletions and SNPs are correlated. This is encouraging. However, a new study takes a closer look at some areas where copy number polymorphisms (i.e. one person might have three copies of the region, another person only two) are common, and finds that the variation is not very closely correlated with SNPs. This is less encouraging, as it means that a full-genome association scan based on SNPs alone could fail to capture much of the copy number variation in the sample.
The authors present some possible reasons for this lack of correlation, and I’m inclined to be optimistic: most of the copy number polymorphisms they identify are rare, while the SNPs in the HapMap (especially the first version, which they use) are generally common. Since the correlation coefficient depends heavily on the frequencies of the variants, it’s possible that the newest version of the HapMap, or directed sequencing, could find better correlated SNPs.
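The frequency dependence is easy to see with a little arithmetic. For two biallelic sites, the allele frequencies alone cap how large r² can get, whatever the haplotypes look like. Here is a sketch of that upper bound; the function name and the example frequencies are hypothetical and are not figures from the study.

```python
def max_r_squared(p_a, p_b):
    """Largest r^2 possible between two biallelic sites with allele
    frequencies p_a and p_b (each strictly between 0 and 1)."""
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # Largest positive D: the two alleles sit on the same chromosomes as often as possible.
    d_pos = min(p_a * (1 - p_b), p_b * (1 - p_a))
    # Largest negative D (in absolute value): they avoid each other as much as possible.
    d_neg = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_max = max(d_pos, d_neg)
    return d_max ** 2 / denom

# A rare copy number variant (1%) against a common SNP (30%).
print(round(max_r_squared(0.01, 0.30), 4))
# The same rare variant against an equally rare SNP.
print(round(max_r_squared(0.01, 0.01), 4))
```

With these made-up numbers the rare variant can only be weakly correlated with the common SNP, while a SNP at a matching frequency could in principle tag it perfectly. That is the reason a denser map with more rare SNPs might do better.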
That said, if it turns out that structural polymorphisms aren’t well correlated with SNPs, that will be important to keep in mind: perhaps SNP-based studies could be supplemented with assays designed to detect structural polymorphisms.