Gene Expression

Sunday, November 08, 2009

The quest for common variants & cognition posted by Razib @ 11/08/2009 11:39:00 AM

A genome-wide study of common SNPs and CNVs in cognitive performance in the CANTAB:

Psychiatric disorders such as schizophrenia are commonly accompanied by cognitive impairments that are treatment resistant and crucial to functional outcome. There has been great interest in studying cognitive measures as endophenotypes for psychiatric disorders, with the hope that their genetic basis will be clearer. To investigate this, we performed a genome-wide association study involving 11 cognitive phenotypes from the Cambridge Neuropsychological Test Automated Battery. We showed these measures to be heritable by comparing the correlation in 100 monozygotic and 100 dizygotic twin pairs. The full battery was tested in 750 subjects, and for spatial and verbal recognition memory, we investigated a further 500 individuals to search for smaller genetic effects. We were unable to find any genome-wide significant associations with either SNPs or common copy number variants. Nor could we formally replicate any polymorphism that has been previously associated with cognition, although we found a weak signal of lower than expected P-values for variants in a set of 10 candidate genes. We additionally investigated SNPs in genomic loci that have been shown to harbor rare variants that associate with neuropsychiatric disorders, to see if they showed any suggestion of association when considered as a separate set. Only NRXN1 showed evidence of significant association with cognition. These results suggest that common genetic variation does not strongly influence cognition in healthy subjects and that cognitive measures do not represent a more tractable genetic trait than clinical endpoints such as schizophrenia. We discuss a possible role for rare variation in cognitive genomics.

David Goldstein is one of the authors. I wonder if this influenced his views on the evolution of intelligence.

Labels: Association, IQ, Population genetics

Wednesday, August 27, 2008

Why are Finns anxious? posted by Razib @ 8/27/2008 12:44:00 PM

An Association Analysis of Murine Anxiety Genes in Humans Implicates Novel Candidate Genes for Anxiety Disorders:

Specific alleles and haplotypes of six of the examined genes revealed some evidence for association (p ≤ .01). The most significant evidence for association with different anxiety disorder subtypes were: p = .0009 with ALAD (δ-aminolevulinate dehydratase) in social phobia, p = .009 with DYNLL2 (dynein light chain 2) in generalized anxiety disorder, and p = .004 with PSAP (prosaposin) in panic disorder.

ScienceDaily:

Furthermore, the team's international collaborators in Spain and the United States are trying to replicate these findings in their anxiety disorder datasets to see whether the genes identified by Finnish scientists predispose to anxiety disorders in other populations as well. Only by replicating the results firm conclusions can be drawn about the role of these genes in the predisposition to anxiety in more general.

Haplotter shows selection around ALAD for Africans. PSAP is interesting:

This gene encodes a highly conserved glycoprotein which is a precursor for 4 cleavage products: saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease, Tay-Sachs disease, and metachromatic leukodystrophy....

Labels: Association, Finn baiting, Genomics

Thursday, January 31, 2008

Recombination week at Science posted by p-ter @ 1/31/2008 09:23:00 PM

It's like shark week, only better! Whet your appetite with "High-Resolution Mapping of Crossovers Reveals Extensive Variation in Fine-Scale Recombination Patterns Among Humans", then top it off with "Sequence Variants in the RNF212 Gene Associate with Genomewide Recombination Rate". Enjoy!

Labels: Association, Genetics, Population genetics

Thursday, August 16, 2007

Common disease, single gene? posted by p-ter @ 8/16/2007 06:33:00 PM

The prevailing consensus on "common" or "complex" diseases is that they result from the interplay of multiple genetic and environmental inputs. This is assumed largely because results from linkage studies were discouraging, which rules out a single, highly-penetrant gene as a possibility. There are many disease models where linkage studies could fail, however; a model involving multiple factors just seems to make more sense than the alternatives (and has been confirmed for a growing number of diseases). But consider this paper on a certain type of glaucoma:

Glaucoma is a leading cause of irreversible blindness. A genome-wide search yielded multiple SNPs in the 15q24.1 region associated to glaucoma. Further investigation revealed that the association is confined to exfoliation glaucoma (XFG). Two non-synonymous SNPs in exon 1 of the gene LOXL1 explain the association and the data suggest that they confer risk to XFG mainly through exfoliation syndrome (XFS). Approximately 25% of the general population is homozygous for the highest risk haplotype and their risk of suffering XFG is over 100 times that of those only carrying low-risk haplotypes.

Crunching the numbers on the table in the publication suggests that, in the population over 60, people carrying the allele have a 22% chance of developing the disease, while people without it have about a 1.5% chance (the difference between my calculations and those in the abstract is that I'm considering presence or absence of the allele, rather than the genotype, and our numbers on the prevalence of the disease might be slightly different--I used 15%). That's a huge effect, and the authors claim that the population attributable risk of the risk haplotype is almost 100%-- that is, if the haplotype were to disappear from the population (which is obviously not a real possibility), the disease would as well.

Labels: Association, Genetics

Monday, July 30, 2007

GWA for multiple sclerosis posted by p-ter @ 7/30/2007 09:34:00 PM

The latest phenotype to get the scrutiny of a genome-wide association study is multiple sclerosis: three separate reports (ok, only one of them is genome-wide) point to variation in various immune system genes as predisposing to the disease. The effects of one of the variants seem to be non-additive-- one group reports that the heterozygotes for the "causal" allele seem to actually be protected, while the homozygotes have a higher risk.

There are a number of reasons why this could be the case-- linkage disequilibrium patterns and the existence of multiple predisposing alleles can lead to odd patterns of risk, even flipping the apparent effect in some cases. Another possibility, of course, is that there's some interesting biology there. More research, as they say, is needed.

Labels: Association, Genetics

Monday, July 23, 2007

The genetics of HIV infection posted by p-ter @ 7/23/2007 08:16:00 PM

AIDS is obviously not a genetic disease-- if one were to make a list of risk factors predisposing to HIV infection, genetics would be a pretty low-ranking member (though still present on the list, of course). Yet genetics is still a useful tool for understanding the disease, as evidenced by this paper:

Understanding why some people establish and maintain effective control of HIV-1 and others do not is a priority in the effort to develop new treatments for HIV/AIDS. Using a whole-genome association strategy we identified polymorphisms that explain nearly 15% of the variation among individuals in viral load during the asymptomatic set point period of infection. One of these is found within an endogenous retroviral element and is associated with major histocompatibility allele HLA-B*5701, while a second is located near the HLA-C gene. An additional analysis of the time to HIV disease progression implicated a third locus encoding a RNA polymerase subunit. These findings emphasize the importance of studying human genetic variation as a guide to combating infectious agents.

People trained to think about disease from a "public health" (read: short-term) standpoint might be a little appalled by the amount of money spent on a study like this-- those hundreds of thousands of dollars could very well have been spent on far more effective ways to reduce AIDS prevalence.

However, the goal here is more long-term-- understanding the variation in how humans interact with pathogens will lead to more effective drug targeting and greater understanding of immunity down the road. Because they are largely hypothesis-free, genome-wide association studies also provide a way to generate novel hypotheses (or confirm old ones) about disease aetiology. In this study, for example, one of the major signals lies in an endogenous retrovirus-- that is, a virus that has incorporated itself into the genome. This raises the intriguing possibility that some of our immune response is mediated by viruses that previously spliced themselves into the genome (the authors mention that antisense transcripts would be a very plausible mechanism by which that could work).

The genetics of any phenotype you can think of will eventually be mapped, and this information will be useful not only for its predictive value (though in some cases it will be predictive), but also for the basic understanding of the phenotype that it carries with it. This site sometimes sees speculation as to the causes of variation in sexual orientation, for example-- genetic studies (assuming they're carried out) will severely restrict the plausible "hypothesis space" on that question.

Labels: Association, disease, Genetics

Sunday, July 22, 2007

Reporting genome-wide association studies posted by p-ter @ 7/22/2007 08:56:00 PM

RPM points to a post from Mark Liberman at Language Log on the reporting of genome-wide association (GWA) studies. His request (for the popular press; these things are always in the actual paper): report the allele frequency of the associated allele in cases, as well as the frequency in controls. I've often argued that people will get used to the complexity of "complex" disease once they're able to say "Oh, I know several people with the 'diabetes gene' that never got diabetes", but this is a more proactive measure towards that end, and I think it's a great idea.

Liberman includes the numbers for the "restless leg syndrome"-associated allele I mentioned recently; the other disease I mentioned in that post was gallstone disease, in which the associated allele has a frequency of 10% in cases and 5% in controls. I'll try to remember to report those numbers every time I mention a GWA study from now on.
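For anyone who wants to turn a pair of case/control allele frequencies into a single effect size, here is a minimal sketch of the allelic odds ratio, using the gallstone numbers above (10% in cases, 5% in controls). The function name is my own invention for illustration, and this is the simple allele-level odds ratio, not necessarily the statistic the paper itself reports.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the allele in cases divided by the odds in controls.

    Hypothetical helper for illustration; inputs are allele frequencies in (0, 1).
    """
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Gallstone disease allele frequencies quoted in the post.
gallstone_or = allelic_odds_ratio(0.10, 0.05)
print(f"allelic odds ratio = {gallstone_or:.2f}")
```

An odds ratio of about 2 sounds dramatic, which is exactly why seeing the raw 10% versus 5% next to it is useful.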

ADDENDUM: I also like the suggestion from the comments that posterior probabilities be given, as they are sometimes more intuitive. That is, if A is the disease allele and D is disease status, P(A|D) is less interesting, in some sense, than P(D|A). Unfortunately, disease prevalences aren't always well-defined, and P(D) is necessary for the calculation. For gallstone disease, P(D) is about 0.15, so P(D|A) = P(A|D)P(D)/[P(A|D)P(D) + P(A|!D)P(!D)] = 26%. The corresponding posterior probability P(!D|A) is 74%. So someone carrying the A allele has a 26% chance of developing the disease, and a 74% chance of staying healthy.
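The arithmetic behind that 26% is short enough to write out. This is a minimal sketch of the Bayes' rule calculation, plugging in P(A|D) = 0.10, P(A|!D) = 0.05 and P(D) = 0.15 from the gallstone example; the function and variable names are mine, chosen only for illustration.

```python
def posterior_disease_given_allele(p_allele_given_case, p_allele_given_control, prevalence):
    """Bayes' rule: P(D|A) computed from P(A|D), P(A|!D) and P(D)."""
    numerator = p_allele_given_case * prevalence
    denominator = numerator + p_allele_given_control * (1.0 - prevalence)
    return numerator / denominator


# Gallstone disease: allele frequency 10% in cases, 5% in controls, prevalence ~15%.
p_d_given_a = posterior_disease_given_allele(0.10, 0.05, 0.15)
print(f"P(D|A)  = {p_d_given_a:.0%}")
print(f"P(!D|A) = {1 - p_d_given_a:.0%}")
```

Changing the assumed prevalence shifts the answer quite a bit, which is the practical reason poorly defined prevalences are a problem here.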

Labels: Association, Genetics

Thursday, July 19, 2007

The continuing success of genome-wide association studies posted by p-ter @ 7/19/2007 08:05:00 PM

The first wave of genome-wide association studies has largely been confined to "big-name" diseases-- things like diabetes, heart disease, breast cancer, etc. There's a financial reason behind this, of course-- funding agencies like the NIH are most interested in diseases of major public health import, as are companies like DeCode. But in the next few years, there's no doubt it will become clear that any phenotype is amenable to this sort of genetic dissection. Genome-wide association studies are an important new tool in the biologist's toolkit, and it's worth noting that genetic data is (or will be) much richer in humans than in any other organism.

A couple new papers add towards what could eventually be a detailed understanding of the genetics of human phenotypic variation: first, a genome-wide association study of "restless leg syndrome", an ill-defined, heterogeneous disorder. The authors describe it thusly:

Nightwalkers, as individuals with RLS call themselves, are forced to move their legs during periods of rest especially in the evening and night to relieve uncomfortable or painful sensations in the deep calf. This diurnal variation leads to impaired sleep onset, and the periodic leg movements during sleep in the majority of patients contribute to sleep disruption and a reduced quality of life as a major consequence

The association study identified three risk factors near genes about which, as tends to be the case in these studies, very little is known. However, these are leads that will be immediately followed up, and likely with great impact.

Second, see this GWA study of "gallstone disease". Like restless leg syndrome (which has a prevalence of around 2-3%), this is hardly a rare phenotype-- the authors give the prevalence as between 10 and 20% of the population in industrialized nations. Though characterized as "diseases", both of these phenotypes lie within the range of normal human variation.

As prices on this sort of technology drop, the most interesting results will not necessarily come from the big genome centers, but rather from the people that choose to study interesting phenotypes. There's plenty of low-hanging fruit to be picked...

Labels: Association, Genetics

Thursday, June 28, 2007

Genome-wide association study for breast cancer susceptibility posted by p-ter @ 6/28/2007 08:42:00 PM

Another genome-wide association study:

Breast cancer exhibits familial aggregation, consistent with variation in genetic susceptibility to the disease. Known susceptibility genes account for less than 25% of the familial risk of breast cancer, and the residual genetic variance is likely to be due to variants conferring more moderate risks. To identify further susceptibility alleles, we conducted a two-stage genome-wide association study in 4,398 breast cancer cases and 4,316 controls, followed by a third stage in which 30 single nucleotide polymorphisms (SNPs) were tested for confirmation in 21,860 cases and 22,578 controls [!!!] from 22 studies... SNPs in five novel independent loci exhibited strong and consistent evidence of association with breast cancer (P < 10⁻⁷). Four of these contain plausible causative genes (FGFR2, TNRC9, MAP3K1 and LSP1).

These alleles are of small effect and in areas of the genome about which little is known, again showing that genome-wide association studies are a powerful way of opening up research into novel biology.

Labels: Association, Genetics

Friday, June 08, 2007

Genome-wide association for heart disease posted by p-ter @ 6/08/2007 03:20:00 PM

Following on the heels of the results of the Wellcome Trust Case-Control Consortium are two papers reporting associations between risk of coronary heart disease/heart attack and an allele on chromosome 9. Like many of the signals coming out of these recent genome-wide studies, the polymorphism implicated is not in a gene, and no functional effect is immediately obvious; this will take a while to sort out.

This is another promising result, though. The nay-sayers who argued against the HapMap and whole-genome association are finding themselves proved very wrong. As is often the case in science, the Luddite position was not tenable-- technology opens all sorts of doors for biology, and a paper from 1996 (which was essentially science fiction at the time) has proved quite prescient.

Labels: Association, Genetics

Wednesday, June 06, 2007

Genome-wide association studies in the UK posted by p-ter @ 6/06/2007 05:59:00 PM

Results from the most ambitious (and expensive) set of genome-wide association studies for common diseases were published today in Nature (open access! You can read it for free!). Funded by the Wellcome Trust in the UK, a "dream team" of clinical geneticists and statisticians assembled a common set of 3000 controls to compare genetically to around 2000 cases each of Type I diabetes, Type II diabetes, arthritis, cardiovascular disease, Crohn's disease, bipolar disorder, and hypertension.

This study is being trumpeted as a major success, and to some extent, it is-- for all diseases except hypertension, at least one strong signal and many weaker signals were identified. As correlational studies are largely hypothesis-generating, some of these will lead to major discoveries about the pathology of disease. In Crohn's disease, for example, the consortium has found a couple loci involved in autophagy and the elimination of intracellular bacteria. They also confirm the association of another locus involved in autophagy. It's easier for people working on a disease to focus on pathways that are already known to be involved in the disease (for example, there's a known autoimmune component to Crohn's disease); it often takes this kind of top-down study to jolt people out of complacency.

The consortium has also made publicly available an impressive suite of software, along with new algorithms for genotype calling and multi-locus association, incorporating information from the HapMap. These tools are certainly at the cutting edge, and represent major advances in their own right.

On the other hand, one can't help but notice that the loci identified contribute only a fraction of the known genetic component of these diseases. This is a proof of principle-- the base has been laid; it's now feasible to scale these sorts of case-control studies up to tens of thousands of individuals. But is that really the most effective way of getting at the genetic basis of these diseases? Perhaps not.

A final comment-- I noted in the comments of a previous post that the big data sets used for population genetics these days are generated for medical reasons. There's a ton of population genetic information here, which the authors are likely going to make more use of in the future. They do give us a glimpse, though-- they note a number of genomic regions that show marked geographic variability within the UK (and note they limit themselves to self-identified "white Europeans"):

Thirteen genomic regions showing strong geographical variation are listed in Table 1, and Supplementary Fig. 7 shows the way in which their allele frequencies vary geographically. The predominant pattern is variation along a NW/SE axis. The most likely cause for these marked geographical differences is natural selection, most plausibly in populations ancestral to those now in the UK. Variation due to selection has previously been implicated at LCT (lactase) and major histocompatibility complex (MHC), and within-UK differentiation at 4p14 has been found independently, but others seem to be new findings. All but three of the regions contain known genes. Aside from evolutionary interest, genes showing evidence of natural selection are particularly interesting for the biology of traits such as infectious diseases; possible targets for selection include NADSYN1 (NAD synthetase 1) at 11q13, which could have a role in prevention of pellagra, as well as TLR1 (toll-like receptor 1) at 4p14, for which a role in the biology of tuberculosis and leprosy has been suggested.

Labels: Association, Genetics, Population genetics

Wednesday, May 16, 2007

Genetics of obesity posted by p-ter @ 5/16/2007 02:45:00 PM

I realize I'm sort of beating a dead horse by reporting every single high-profile genome-wide association scan (for example), but it's worth pointing out their successes, as there was serious opposition to the HapMap project that laid the groundwork for these studies. So in that spirit, I'll point out this paper, which identifies a common variant in the FTO gene as being associated with obesity:

An additive association of the variant with BMI was replicated in 13 cohorts with 38,759 participants. The 16% of adults who are homozygous for the risk allele weighed about 3 kilograms more and had 1.67-fold increased odds of obesity when compared with those not inheriting a risk allele. This association was observed from age 7 years upward and reflects a specific increase in fat mass.

One of the most important points about genome-wide association studies is that they're (more or less) unbiased-- that is, you don't have to think about which genes could be involved in the phenotype before studying it. Some people consider this a liability, some a blessing. I'm in the latter group, as a strong signal in a genome-wide association can in some cases lead to new candidate genes, new hypotheses and expose interesting biology. This is precisely one of those cases. Here's what's known about the gene identified in this study:

FTO is a gene of unknown function in an unknown pathway that was originally cloned as a result of the identification of a fused-toe (Ft) mutant mouse that results from a 1.6-Mb deletion of mouse chromosome 8. Three genes of unknown function (Fts, Ftm and Fto), along with three members of the Iroquois gene family (Irx3, Irx5, and Irx6 from the IrxB gene cluster), are deleted in Ft mice. The homozygous Ft mouse is embryonically lethal and shows abnormal development, including left/right asymmetry. Heterozygous animals survive and are characterized by fused toes on the forelimbs and thymic hyperplasia but have not been reported to have altered body weight or adiposity. The fused-toe mutant is a poor model for studying the role of altered Fto activity, because multiple genes are deleted. Neither isolated inactivation nor overexpression of Fto has been described.

So essentially, nothing is known about this gene. Thanks to this study, this is unlikely to be the case for long.

Labels: Association, Medicine

Tuesday, February 27, 2007

A note on the Common Disease-Common Variant debate posted by p-ter @ 2/27/2007 05:19:00 PM

One of the more heated debates in human medical genetics in the last decade or so has been centered around the Common Disease-Common Variant (CDCV) hypothesis. As the name implies, the hypothesis posits that genetic susceptibility to common diseases like hypertension and diabetes is largely due to alleles which have moderate frequency in the population. The competing hypothesis, also cleverly named, is the Common Disease-Rare Variant (CDRV) hypothesis, which suggests that multiple rare variants underlie susceptibility to such diseases. As different techniques must be used to find common versus rare alleles, this debate would seem to have major implications for the field. Indeed, the major proponents of the CDCV hypothesis were the movers and shakers behind the HapMap, a resource for the design of large-scale association studies (which are effective at finding common variants, much less so for rare variants).

However, CDCV versus CDRV is an utterly false dichotomy, as I'll explain below. This point has slipped past many of the human geneticists who actually do the work of mapping disease genes, and I feel the problem is this: essentially, geneticists are looking for a gene or the gene, so they naturally want to know whether to take an approach that will be the best for finding common variants or one for finding rare variants. However, common diseases do not follow simple Mendelian patterns-- there are multiple genes that influence these traits, and the frequencies of these alleles have a distribution. A decent null hypothesis, then, is to assume that the distribution of frequencies of alleles underlying a complex phenotype is essentially the same as the overall distribution of allele frequencies in the population-- that is, many rare variants and some common variants.

This argument would seem to favor the CDRV hypothesis. Not so. The key concept for explaining why is one borrowed from epidemiology called the population attributable risk--essentially, the number of cases in a population that can be attributed to a given risk factor. An example: imagine smoking cigarettes gives you a 5% chance of developing lung cancer, while working in an asbestos factory gives you a 70% chance. You might argue that working in an asbestos factory is a more important risk factor than cigarette smoking, and you would be correct--on an individual level. On a population level, though, you have to take into account the fact that millions more people smoke than work in asbestos factories. If everyone stopped smoking tomorrow, the number of lung cancer cases would drop precipitously. But if all asbestos factory workers quit tomorrow, the effect on the population level of lung cancer would be minimal. So you can see where I'm going with this: common susceptibility alleles contribute disproportionately to the population attributable risk for a disease. In type II diabetes, for example, a single variant with a rather small effect but a moderate frequency accounts for 21% of all cases[cite].
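To put rough numbers on the smoking versus asbestos example, here is a minimal sketch using Levin's formula for the population attributable fraction, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the fraction of the population exposed and RR is the relative risk. The 5% and 70% risks come from the example above; the baseline risk and the two exposure prevalences are numbers I made up purely for illustration, and each factor is treated on its own, ignoring any overlap between them.

```python
def population_attributable_fraction(exposure_prevalence, relative_risk):
    """Levin's formula: share of all cases attributable to one exposure."""
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# ASSUMPTION: lung cancer risk for someone with neither exposure (made-up value).
baseline_risk = 0.005

# Risks taken from the example in the post.
risk_smoker = 0.05
risk_asbestos_worker = 0.70

# ASSUMPTIONS: fraction of the population exposed to each factor (made-up values).
prevalence_smoking = 0.25
prevalence_asbestos = 0.0001

paf_smoking = population_attributable_fraction(
    prevalence_smoking, risk_smoker / baseline_risk)
paf_asbestos = population_attributable_fraction(
    prevalence_asbestos, risk_asbestos_worker / baseline_risk)

print(f"smoking:  {paf_smoking:.1%} of cases")
print(f"asbestos: {paf_asbestos:.1%} of cases")
```

Under these made-up inputs the common, weaker risk factor accounts for far more cases than the rare, stronger one, which is the same logic that applies to common versus rare susceptibility alleles.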

So am I then arguing in favor of the CDCV hypothesis? Of course not-- rare variants, aside from being predictive for disease in some individuals, also give important insight into the biology of the disease. But it is possible right now, using genome-wide SNP arrays and databases like the HapMap, to search the entire genome for common variants that contribute to disease. This is an essential step--finding the alleles that contribute disproportionately to the population-level risk for a disease. Eventually, the cost of sequencing will drop to a point where rare variants can also be assayed on a genome-wide, high-throughput scale, but that's not the case yet. Once it is, expect the CDRV hypothesis to be trumpeted as right all along.

Labels: Association, disease, Genetics, Statistics

Thursday, February 15, 2007

All diabetes, all the time posted by p-ter @ 2/15/2007 06:01:00 PM

Keeping with the diabetes theme, the first genome-wide association study of Type II diabetes has been published, and it's extraordinarily promising. Besides picking up the oft-replicated TCF7L2 gene mentioned before, they pick up three other loci, including a non-synonymous mutation in a zinc transporter. That's notable because 1. non-synonymous mutations clearly can have phenotypic effects (there's no wondering, could this really do something?), and 2. drug targeting of zinc transport is feasible (TCF7L2 is a transcription factor, and when you start playing with transcription factors you risk messing with a lot of pathways). The news article accompanying this study has some good perspective:

In 1918, Ronald Aylmer Fisher, an evolutionary biologist and pioneer of modern statistics, published a paper on the genetic causes of disease that brought together two rival factions. Geneticists promoted a paradigm in which diseases worked a lot like Mendel's pea plants, with just one or two genes responsible for each condition. Biometricians, however, advocated a continuous distribution of phenotypes. Fisher suggested that many mendelian traits could result in the continuous distribution of a disease. In doing so, he established the conceptual basis for the search for complex disease genes that continues today.

But Fisher's theories had a more immediate impact on animals and agriculture than on medicine — in people, it's much easier to study and measure mendelian diseases and traits. Even the much-heralded Human Genome Project in the 1990s didn't help as much as expected.
...
It has taken time for big GWA studies to be completed. "Many people didn't know how much association studies would deliver," says Peter Donnelly, a lead investigator of the Wellcome Trust Case Control Consortium, which began collecting samples for GWA studies in 2005.

Yet new results, including a study on type 2 diabetes published this week, suggest that the GWA approach will bear fruit, and lots of it....Modern biology may finally have begun to bring technological and scientific rigour to Fisher's decades-old insights.

Labels: Association, Diabetes, disease, Genetics

Thrifty genotype, again and again posted by p-ter @ 2/15/2007 05:17:00 PM

Speaking of the thrifty genotype hypothesis, a new paper from the cats at deCODE Genetics takes an in-depth look at one of the loci consistently implicated in Type II diabetes. According to the authors, the susceptibility allele is ancestral, and the other, non-ancestral allele shows signs of being under recent positive selection in all the populations studied. Even more interestingly, the protective allele is associated with decreases in levels of circulating ghrelin (a hormone that increases appetite) and increases in levels of circulating leptin (a hormone that decreases appetite). This would seem, by my reckoning, to be consistent with the thrifty genotype hypothesis. In addition,

We obtained rough age estimates for HapA [the protective allele] based on its recombination history: 11,933, 8,401 and 4,051 years for the CEU, East Asian and YRI HapMap groups, respectively. Although tentative, these ages coincide broadly with the onset of agriculture in the three geographic regions represented by the HapMap groups.

On the other hand, the susceptibility allele is associated with decreased BMI after controlling for diabetic status, though I'm not sure that has any bearing on the hypothesis.

The authors conclude, bizarrely, "we note our findings contradict a key prediction of the thrifty-genotype hypothesis, insofar as HapBT2D, a major risk factor for type 2 diabetes, is negatively associated with BMI and is not the variant that contributed to adaptive evolution in the recent past."

Huh?

I can only conclude, based on that statement, that the authors aren't really clear on what the thrifty genotype hypothesis is. The original Neel paper (which is cited in this paper, so the authors have hopefully read it) makes a few simple claims, the most important of which is that the "diabetic genotype" was favorable up until the transition from the hunter-gatherer lifestyle to agriculture. It certainly does not claim that a diabetes-causing allele should be under recent positive selection, nor am I sure how anyone could get that impression. I'm inclined to draw the exact opposite conclusion from this paper-- that is, this data seems to support, rather than contradict, a key prediction of the thrifty genotype hypothesis, insofar as the ancestral allele leads to susceptibility, and the derived allele, which arose at about the time of agriculture, may be associated with reduced appetite.

Labels: Association, Diabetes, disease, Genetics