Height differences across Europe could be less affected by selection than we had thought

Like an Old Testament prophet of yore, Graham Coop has been prophesying that cryptic population stratification may be a major confounder in analyses for as long as I’ve known him with any degree of familiarity. So it’s no surprise he’s an author on one of two preprints which have rocked the genomics world:

Reduced signal for polygenic adaptation of height in UK Biobank:

There is considerable variation in average height across European populations, with individuals in the northwest being taller, on average, than those in the southeast. During the past six years, a series of papers reported that polygenic scores for height also show a north to south gradient, and that this cline results from natural selection. These polygenic analyses relied on external estimates of SNP effects on height, taken from the GIANT consortium and from smaller replication studies. Here, we describe a new analysis based on SNP effect estimates from a large independent data set, the UK Biobank (UKB). We find that the signals of selection using UKB effect-size estimates for height are strongly attenuated, though not entirely absent. Because multiple prior lines of evidence provided independent support for directional selection on height, there is no single simple explanation for all the discrepancies. Nonetheless, our current view is that previous analyses were likely confounded by population stratification and so the conclusion of strong polygenic adaptation in Europe now lacks clear support. Moreover, these discrepancies highlight (1) that current methods for correcting for population structure in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of polygenic differences between populations should be treated with caution until these issues are better understood.

And…Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies:

Genetic predictions of height differ significantly among human populations and these differences are too large to be explained by random genetic drift. This observation has been interpreted as evidence of polygenic adaptation, natural selection acting on many positions in the genome simultaneously. Selected differences across populations were detected using single nucleotide polymorphisms (SNPs) that were genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of sub-significant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection for diverse traits, the introduction of methods to do this, and claims of polygenic adaptation for multiple traits. All of the claims of polygenic adaptation for height to date have been based on SNP ascertainment or effect size measurement in the GIANT Consortium meta-analysis of studies in people of European ancestry. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. While we replicate most previous findings when restricting to genome-wide significant SNPs, when we extend the analyses to large fractions of SNPs in the genome, the differences across groups attenuate and some change ordering. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure, a more severe problem in GIANT and possibly other meta-analyses than in the more homogeneous UK Biobank. Therefore, claims of polygenic adaptation for height and other traits, particularly those that rely on SNPs below genome-wide significance, should be viewed with caution.

I haven’t read both preprints through and through, but my first thought (along with others’) is the same as Casey Brown’s:

Note that no one has responded to his question.

Finally, recall that population structure within Europe is relatively weak and the genetic distances between the groups are low. It reminds you of how difficult polygenic traits are to analyze due to the small and subtle effects, and how they might be overwhelmed even by subtle population structure. And recall, even the British population has some of that… (albeit an order of magnitude or so less than what you can find across Europe).
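To make the worry concrete, here is a toy simulation of how uncorrected stratification alone can manufacture a polygenic score difference. It is a minimal sketch and not the method of either preprint. Everything in it is an assumption for illustration: two equal-sized subpopulations whose allele frequencies have drifted slightly apart, a purely environmental shift in the trait for one of them, no SNP with any true effect, and hypothetical function names and parameter values throughout.

```python
import numpy as np


def simulate_stratified_cohort(n_per_pop=2000, n_snps=500, drift_sd=0.05,
                               env_shift=0.5, seed=0):
    """Two subpopulations, no causal SNPs, environmental shift in pop B (all assumed)."""
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.2, 0.8, n_snps)
    # Each subpopulation drifts a little away from the shared ancestral frequency.
    p_a = np.clip(p_anc + rng.normal(0, drift_sd, n_snps), 0.05, 0.95)
    p_b = np.clip(p_anc + rng.normal(0, drift_sd, n_snps), 0.05, 0.95)
    geno = np.vstack([
        rng.binomial(2, p_a, size=(n_per_pop, n_snps)),
        rng.binomial(2, p_b, size=(n_per_pop, n_snps)),
    ]).astype(float)
    pop = np.repeat([0.0, 1.0], n_per_pop)
    # The trait is noise plus a non-genetic shift for subpopulation B.
    trait = rng.normal(0, 1, 2 * n_per_pop) + env_shift * pop
    return geno, pop, trait


def marginal_effects(geno, trait, covar=None):
    """Per-SNP regression slopes after projecting out an intercept and optional covariate."""
    n = len(trait)
    design = np.ones((n, 1)) if covar is None else np.column_stack([np.ones(n), covar])

    def residualize(m):
        coef, *_ = np.linalg.lstsq(design, m, rcond=None)
        return m - design @ coef

    g_res = residualize(geno)
    t_res = residualize(trait)
    return (g_res.T @ t_res) / np.einsum("ij,ij->j", g_res, g_res)


def mean_genotype_gap(geno, pop):
    """Difference in mean genotype (B minus A) at every SNP."""
    return geno[pop == 1].mean(axis=0) - geno[pop == 0].mean(axis=0)


if __name__ == "__main__":
    geno, pop, trait = simulate_stratified_cohort()
    gap = mean_genotype_gap(geno, pop)
    naive = marginal_effects(geno, trait)
    corrected = marginal_effects(geno, trait, covar=pop)
    # Polygenic score gap between subpopulations = sum of effect * frequency gap.
    print("score gap, no correction:  ", float(gap @ naive))
    print("score gap, pop as covariate:", float(gap @ corrected))
```

In this caricature the subpopulation label is known, so including it as a covariate removes the spurious signal. The real problem is harder because the structure is cryptic and has to be inferred, and any residual confounding gets summed across a very large number of SNPs.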

The future shall, and should, be sequenced

Last fall I talked about a preprint, Human demographic history impacts genetic risk prediction across diverse populations. It’s now published in AJHG, with the same informative title, Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Even though I talked about this before, I thought it would be useful to highlight it again.

To recap, GWAS is a pretty big deal, but only in the last 15 years or so. With genome-wide data researchers began to explore associations between diseases and population genetic variation. In some cases they discovered strong associations between characteristics and genetic variants, but in many cases it turned out that though a trait is highly heritable (e.g., schizophrenia) the causal variants are either not common or do not explain much of the variation in the population (or both).

But as the second decade of GWAS proceeds the sample sizes are getting larger, and researchers are moving from SNP-chips, with their various biases, to high quality whole-genome sequences. In the minds of many people, rare variants are one of the major sorts of low-hanging fruit. Basically SNP-chips are geared toward finding common variation within large populations, since they have a finite number of markers they are going to interrogate. Sequencing, though, gives a relatively comprehensive catalog of the genome. If you have high coverage (so you sample each site many times) you can easily discover rare mutations within an individual genome that make that individual distinct from almost all of the rest of the human race (these may be de novo mutations, or they could be mutations private to their extended pedigree).

But context matters. Martin et al. find that confirmed GWAS hits in Europeans tend to exhibit decreased portability as a function of genetic distance. This isn’t entirely surprising, especially if rarer variants are part of the explanation. Rare variants usually emerged later in history, after the differentiation between geographic races.
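One piece of the intuition here can be sketched in a few lines. This is a deliberately crude model and not the analysis in Martin et al.: it assumes every ascertained variant has the same effect size (so variance explained is proportional to heterozygosity, 2p(1-p)), it models divergence from a shared ancestral frequency with a Balding-Nichols beta distribution, and it ignores linkage disequilibrium entirely. The function name and all parameter values are hypothetical.

```python
import numpy as np


def relative_variance_explained(f_target, f_discovery=0.01, n_snps=20000,
                                n_top=1000, seed=1):
    """Heterozygosity in a target population at variants picked in a discovery population.

    f_target and f_discovery are assumed drift parameters (FST-like) measuring
    divergence of each population from a shared ancestral population.
    """
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.01, 0.99, n_snps)

    def drift(p, f):
        # Balding-Nichols model: mean p, variance f * p * (1 - p).
        return rng.beta(p * (1 - f) / f, (1 - p) * (1 - f) / f)

    p_disc = drift(p_anc, f_discovery)
    p_targ = drift(p_anc, f_target)
    het_disc = 2 * p_disc * (1 - p_disc)
    het_targ = 2 * p_targ * (1 - p_targ)
    # Mimic ascertainment: keep the variants most variable in the discovery population.
    top = np.argsort(het_disc)[-n_top:]
    return float(het_targ[top].sum() / het_disc[top].sum())


if __name__ == "__main__":
    for f in (0.01, 0.05, 0.15):
        print(f, relative_variance_explained(f))
```

Under these assumptions the ratio drops as the target population gets more diverged, simply because variants chosen for being informative in one population are, on average, less variable elsewhere. Rare, population-specific variants and differences in linkage disequilibrium would add to this effect, but neither is modeled here.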

A solution would be to have a diverse panel of populations in your studies. For many reasons this was not to be. Northwest Europeans are enormously enriched in current data sets. Martin et al. observe that recently this has diminished somewhat, from 95% European to less than 80%. But they observe that this is mostly due to the inclusion of “Asian” samples, as opposed to Africans and Native Americans, who remain as underrepresented as they were several years ago.

The African and Native American samples present somewhat different problems. The Native American groups are quite drifted due to bottlenecks. Likely they have their own variants due to the combined effects of mutation and selection through 15,000 to 20,000 years of isolation from other human populations. In contrast, the African groups have lots of diversity with a high time depth due to their ancestral histories, which are less subject to bottleneck effects. The predictive ability of current GWAS in Africans looks to be rather pathetic. This is reasonable because their diversity is poorly captured in Eurocentric study designs, and they are more genetically diverged from Europeans than Asians are.

Ultimately I think, and hope, this portability question will be of short-term relevance. As sequencing gets cheap, and studies become more numerous, we’ll fill in the gaps of understudied populations. Finally, ethics is above my paygrade, but I do hope those who demand a strenuous bar on consent keep in mind that this will result in slower growth of these study populations. Academics want to do a good job, but they also want to stay on the good side of the IRB.

Citation: Martin, Alicia R., et al. “Human demographic history impacts genetic risk prediction across diverse populations.” bioRxiv (2016): 070797.