Like an Old Testament prophet of yore, Graham Coop has been prophesying that cryptic population stratification may be a major confounder in analyses for as long as I’ve known him with any degree of familiarity. So it’s no surprise he’s an author on one of two preprints which have rocked the genomics world:
Reduced signal for polygenic adaptation of height in UK Biobank:
There is considerable variation in average height across European populations, with individuals in the northwest being taller, on average, than those in the southeast. During the past six years, a series of papers reported that polygenic scores for height also show a north to south gradient, and that this cline results from natural selection. These polygenic analyses relied on external estimates of SNP effects on height, taken from the GIANT consortium and from smaller replication studies. Here, we describe a new analysis based on SNP effect estimates from a large independent data set, the UK Biobank (UKB). We find that the signals of selection using UKB effect-size estimates for height are strongly attenuated, though not entirely absent. Because multiple prior lines of evidence provided independent support for directional selection on height, there is no single simple explanation for all the discrepancies. Nonetheless, our current view is that previous analyses were likely confounded by population stratification and so the conclusion of strong polygenic adaptation in Europe now lacks clear support. Moreover, these discrepancies highlight (1) that current methods for correcting for population structure in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of polygenic differences between populations should be treated with caution until these issues are better understood.
Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies:
Genetic predictions of height differ significantly among human populations and these differences are too large to be explained by random genetic drift. This observation has been interpreted as evidence of polygenic adaptation, natural selection acting on many positions in the genome simultaneously. Selected differences across populations were detected using single nucleotide polymorphisms (SNPs) that were genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of sub-significant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection for diverse traits, the introduction of methods to do this, and claims of polygenic adaptation for multiple traits. All of the claims of polygenic adaptation for height to date have been based on SNP ascertainment or effect size measurement in the GIANT Consortium meta-analysis of studies in people of European ancestry. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. While we replicate most previous findings when restricting to genome-wide significant SNPs, when we extend the analyses to large fractions of SNPs in the genome, the differences across groups attenuate and some change ordering. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure, a more severe problem in GIANT and possibly other meta-analyses than in the more homogeneous UK Biobank. Therefore, claims of polygenic adaptation for height and other traits, particularly those that rely on SNPs below genome-wide significance, should be viewed with caution.
I haven’t read both preprints through and through, but my first thought, like that of others, is the same as Casey Brown’s:
Still trying to wrap my head around how the within family association analysis is so affected by population stratification… https://t.co/9LiPBN9qcj pic.twitter.com/jeOsuO3KPt
— Casey Brown (@casey6r0wn) June 25, 2018
Note that no one has responded to his question.
Finally, recall that population structure within Europe is relatively weak and the distances between the groups are low. It is a reminder of how difficult polygenic traits are to analyze because of their small and subtle effects, and how those effects might be overwhelmed even by subtle population structure. And recall, even the British population has some of that… (albeit an order of magnitude or so less than what you can find across Europe).
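To make the point about subtle structure concrete, here is a toy simulation. It is only a sketch under stated assumptions, not the method of either preprint. Two populations differ slightly in allele frequency (roughly the scale of drift within Europe) and differ in average phenotype for purely environmental reasons; no SNP has any true effect. The function name `simulate` and every parameter value (`n_per_pop`, `n_snps`, `env_shift`, `drift_sd`) are hypothetical choices for illustration. Effect sizes are estimated once from the pooled sample with no correction, and once after centering genotype and phenotype within each population (equivalent to including population as a covariate). Each set of estimates is then used to compute the gap in mean polygenic score between the two groups.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def center_by_pop(values, n):
    """Subtract each population's own mean (the first n values are population A)."""
    mean_a, mean_b = mean(values[:n]), mean(values[n:])
    return [v - mean_a for v in values[:n]] + [v - mean_b for v in values[n:]]


def simulate(n_per_pop=400, n_snps=500, env_shift=0.5, drift_sd=0.03, seed=1):
    """Return (naive_gap, adjusted_gap): the A-minus-B difference in mean
    polygenic score using uncorrected vs. population-adjusted effect estimates.
    All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    n = n_per_pop
    # Phenotype is environment plus noise only: no SNP has a true effect.
    y = [env_shift + rng.gauss(0, 1) for _ in range(n)]  # population A
    y += [rng.gauss(0, 1) for _ in range(n)]             # population B
    y_centered = center_by_pop(y, n)

    naive_gap = 0.0
    adjusted_gap = 0.0
    for _ in range(n_snps):
        # Small frequency divergence around a shared ancestral frequency.
        p = rng.uniform(0.2, 0.8)
        d = rng.gauss(0, drift_sd)
        freq_a, freq_b = p + d, p - d
        # Diploid genotypes: count of two independent allele draws (0, 1 or 2).
        g = [(rng.random() < freq_a) + (rng.random() < freq_a) for _ in range(n)]
        g += [(rng.random() < freq_b) + (rng.random() < freq_b) for _ in range(n)]
        geno_gap = mean(g[:n]) - mean(g[n:])
        # Uncorrected estimate from the pooled sample.
        naive_gap += slope(g, y) * geno_gap
        # Estimate after removing population means from genotype and phenotype.
        adjusted_gap += slope(center_by_pop(g, n), y_centered) * geno_gap
    return naive_gap, adjusted_gap


if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"score gap, uncorrected effects: {naive:.2f}")
    print(f"score gap, population-adjusted effects: {adjusted:.2f}")
```

In this setup the per-SNP bias is smaller than the sampling noise on any single estimate, so no individual SNP looks suspicious. The bias has the same sign as the frequency difference at every SNP, though, so it adds up roughly in proportion to the number of SNPs, while the noise grows only with its square root. That is one way to see why scores built from large numbers of sub-significant SNPs could be more sensitive to residual stratification than scores built from a few genome-wide significant hits.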
Does this explain away a lot of the hard to explain genetic correlations too?
And it should be affected, not effected.
Related – https://www.biorxiv.org/content/early/2018/06/28/357483 – “Quantification of genetic components of population differentiation in UK Biobank traits reveals signals of polygenic selection”.
This includes “For height (ΔY=0.093; individuals with ancestry from southern England are taller on average than individuals with ancestry from Northern Ireland), our estimate of %G was 74.5% (s.e.=16.7%; p=8.4×10⁻⁶), implying that differences in height along PC1 are primarily due to selection and cannot be explained by genetic drift.”
This throws a little light on the comment from the “Signals of polygenic…” paper above (on which Nick Patterson is also an author, along with “Quantification…”), that “stratification effects go in opposite directions in a UKB height GWAS of white British samples and a UKB height GWAS of all samples” (that is, more North European ancestry correlates with taller height across all European samples, but with shorter height within British samples).
In the broader context, it’s kind of interesting how the height selection story has evolved. First we had the assumption that recent selection was important in Europe for height. Then Mathieson’s findings in 2015 totally flipped the script by suggesting that tall height in the Yamnaya could probably explain the European pattern (and that the last 4000 years didn’t matter so much). Now these three papers reverse that again, in favour of the early Bronze Age steppe populations being at most only weakly taller than European Neolithics (see “Signals of polygenic…”), and this pattern is not so strong that it can’t be reversed within Britain, where the southern English are genetically less Yamnaya than the Celts, but appear to be slightly taller.
Even more tangentially, we also have the finding from the new paper by Chuan-Chao Wang that the southern steppes of Ciscaucasia, along the piedmont of the Northern Caucasus, were in the Eneolithic, 1000 years before the Yamnaya, home to people who were essentially genomically like the Yamnaya minus the European farmer admixture. These samples also look to have a fairly low frequency of the SLC45A2 derived variant, based on the paper’s “Supplementary Information 1-7” (though from only 3 samples).
So where earlier in the year we might reasonably have thought that the most probable scenario was that the Yamnaya formed from a recent admixture, just prior to their time, of a tall, fairer skinned EHG population from Ukraine-Volga (high derived SLC45A2) and a shorter, darker pigmented population ultimately from the Caucasus, it now looks more probable that the bulk of their ancestry came from a single, longstanding population from the steppe near the Ciscaucasus that was fairly similar in genomic pigmentation and height to the EEF Anatolians (though likely with a good bit less “Basal Eurasian” ancestry). In that case, subsequent natural selection, nutrition and disease within Europe are all more important in explaining present day differences and the differences seen in bioarchaeology.