Gene Expression: Why are most genetic associations found through candidate gene studies wrong?


Sunday, June 21, 2009

Why are most genetic associations found through candidate gene studies wrong? posted by p-ter @ 6/21/2009 03:47:00 PM

In a recent post, I made a blanket statement that the vast majority of candidate gene association studies published in psychiatric genetics (actually, in nearly all fields of genetics) are wrong. I'm not just being offhandedly dismissive--below, I outline the statistical argument behind that claim. This post is cribbed almost verbatim from a discussion of the issue by statisticians at the Wellcome Trust.

Let's assume that there are a finite number of loci in the genome, and that we test some number of them for association with some phenotype of interest (in a genome-wide association study, this number is on the order of 500K-1M; in a candidate gene study it's more likely in the tens, but the actual marker density is irrelevant for what follows). In general, the criterion used to decide whether one has discovered a true association is the p-value: the probability of seeing data at least as extreme as yours, given that there is no association. But that's not really the quantity you're interested in. The real quantity of interest is the probability that there's a true association given the data you see--the inverse of what's being reported.

By Bayes' Law, this probability depends on the prior probability of an association at that marker, the p-value threshold you've chosen to call a finding "significant", and crucially, the power you had to detect the association [1][2]. Thus, the interpretation of a given p-value depends on the power to detect an association, such that the lower your power, the lower the probability that a "significant" association is true [3].

That's where recent evidence from large genome-wide association studies comes into play. For nearly all diseases, reproducible associations have small effect sizes and are only detectable with sample sizes in the thousands or tens of thousands (for many psychiatric phenotypes, even studies with these sample sizes don't seem to find much). The vast majority of candidate gene association studies had sample sizes in the low hundreds, and thus had essentially zero power to detect the true associations. By the argument above, in this situation the probability that a "significant" association is real approaches zero. The problem with candidate gene association studies is not that they were only targeting candidate genes, per se, but rather that they tended to have small sample sizes and were woefully underpowered to detect true associations.
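To get a rough feel for how power depends on sample size, here is a minimal sketch. It is not the calculation used in any of the studies discussed; it assumes a simple two-sided z-test whose statistic has mean effect * sqrt(n) under the alternative. The function name approx_power, the standardized effect of 0.03, and the threshold of 0.05 are all hypothetical choices made for illustration.

```python
from statistics import NormalDist


def approx_power(effect, n, alpha):
    """Approximate power of a two-sided z-test (normal approximation).

    effect: hypothetical standardized effect per observation, so the test
            statistic has mean effect * sqrt(n) when the association is real
    n:      sample size
    alpha:  p-value threshold used to call a result "significant"
    """
    z = NormalDist()                       # standard normal
    crit = z.inv_cdf(1.0 - alpha / 2.0)    # two-sided critical value
    shift = effect * n ** 0.5              # mean of the statistic under the alternative
    # Probability that the statistic lands beyond either critical value
    return (1.0 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


# Assumed, illustrative inputs: a small effect at several sample sizes
for n in (200, 2000, 20000):
    print(f"n={n:<6} power={approx_power(0.03, n, 0.05):.3f}")
```

Under these assumptions, power for a small effect stays close to the threshold itself at a few hundred samples and only becomes substantial in the tens of thousands. The exact numbers depend entirely on the assumed effect size and test.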

[1] Let D be the data, T be the event that an association is true, t be the event that an association is not true, and P(T) be the prior probability that an association is true.

P(T|D) = P(D|T) P(T) / [ P(D|T) P(T) + P(D|t) (1 - P(T)) ]

If D is taken to be a "significant" result, then P(D|T) is the power, and P(D|t) is the false-positive rate, i.e. the p-value threshold. Clearly, both are relevant here.
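The formula in [1] can be evaluated directly. The sketch below is only an illustration: the function name posterior_true_association is made up, alpha stands for P(D|t), and the prior of 0.001 and threshold of 0.05 are assumed values, not estimates from any study.

```python
def posterior_true_association(prior, power, alpha):
    """P(T|D) for a "significant" result, following the formula in [1].

    prior: P(T), prior probability that the association is true
    power: P(D|T), probability of a significant result if it is true
    alpha: P(D|t), probability of a significant result if it is not true
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Assumed, illustrative inputs: same prior and threshold, decreasing power
prior = 0.001
alpha = 0.05
for power in (0.8, 0.5, 0.1, 0.01):
    p = posterior_true_association(prior, power, alpha)
    print(f"power={power:<5} P(T|D)={p:.4f}")
```

With the prior and threshold held fixed, the posterior probability falls as power falls, which is the point made in the main text.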

[2] http://jnci.oxfordjournals.org/cgi/content/full/96/6/434#FD1

[3] As the authors note,

A key point from both perspectives is that interpreting the strength of evidence in an association study depends on the likely number of true associations, and the power to detect them which, in turn, depends on effect sizes and sample size. In a less-well-powered study it would be necessary to adopt more stringent thresholds to control the false-positive rate. Thus, when comparing two studies for a particular disease, with a hit with the same MAF and P value for association, the likelihood that this is a true positive will in general be greater for the study that is better powered, typically the larger study. In practice, smaller studies often employ less stringent P-value thresholds, which is precisely the opposite of what should occur.

Labels: Genetics
