
Not happening at genomic speed: diversification of GWAS panels

 
One thing that quickly becomes evident when you follow genetics and genomics is that things happen fast. Some sciences proceed at a normal and conventional pace. But because genomics is fundamentally driven by the synergy of two technologies, modern automated sequencing and computation, the field has been moving faster than the speed of light. A single whole genome sequence now costs less than $1,000, whereas the first whole genome 20 years ago cost $3,000,000,000!

People who point to a paper from 2010…well, in genomics that’s ancient history. Take a look at the initial HapMap papers from the mid-2000s if you want a laugh!

But there’s one area where “genomic speed” doesn’t seem to apply: the effort to increase the population diversity of GWAS panels, which is necessary to maximize insight. The figure to the right is from a new preprint, Current clinical use of polygenic scores will risk exacerbating health disparities. To my surprise, over the last few years the proportion of people of European ancestry (which mostly means Northwest European ancestry) in genome-wide association studies has actually increased. Still, the increases in absolute numbers are heartening, as a lot of the low-hanging fruit can probably be picked at sample sizes of thousands.

The reason this matters is illustrated in the figure at the top of this post: GWAS results from European populations are only partially informative in non-European populations. Several factors underlie this pattern. A major one for quantitative (polygenic) traits seems to be phylogenetic distance (the “American” populations are mixed European, African, and Amerindian). This distance reflects evolutionary history: as populations split, or exchange few migrants, drift and selection produce different allele frequencies and other genetic characteristics.
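To make the “phylogenetic distance” point a bit more concrete, here is a toy sketch assuming pure neutral drift under a Wright-Fisher model (no selection, no mutation, no migration). Two populations split from one ancestral pool and drift independently; their divergence is summarized with Hudson's F_ST, a standard statistic swapped in here for illustration. The locus count, effective size, effect sizes, and split times are all hypothetical illustrative values, not estimates for any real populations.

```python
import numpy as np

def drift(p, n_e, generations, rng):
    """Neutral Wright-Fisher drift: each generation, resample 2*n_e gene copies."""
    p = p.copy()
    for _ in range(generations):
        p = rng.binomial(2 * n_e, p) / (2 * n_e)
    return p

def hudson_fst(p1, p2):
    """Ratio-of-averages F_ST from true population allele frequencies
    (no finite-sample correction, since we know the frequencies exactly)."""
    num = np.sum((p1 - p2) ** 2)
    den = np.sum(p1 * (1 - p2) + p2 * (1 - p1))
    return num / den

def score_variance(p, beta):
    """Variance of an additive score under Hardy-Weinberg and linkage equilibrium."""
    return np.sum(2 * p * (1 - p) * beta ** 2)

rng = np.random.default_rng(1)
n_loci, n_e = 2000, 1000                     # hypothetical illustrative values
p_ancestral = rng.uniform(0.05, 0.95, n_loci)
beta = rng.normal(0.0, 1.0, n_loci)          # identical per-allele effects in both populations

results = {}
for t in (50, 500):                          # generations since the split
    pop_a = drift(p_ancestral, n_e, t, rng)
    pop_b = drift(p_ancestral, n_e, t, rng)
    results[t] = hudson_fst(pop_a, pop_b)
    print(f"t={t}: F_ST={results[t]:.3f}, "
          f"score variance A={score_variance(pop_a, beta):.1f}, "
          f"B={score_variance(pop_b, beta):.1f}")
```

Even with identical causal effects everywhere, the longer the populations have been apart, the more their allele frequencies diverge, so the same score behaves differently in each. Real populations add selection, bottlenecks, and admixture on top of this.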

Additionally, some alleles (mutations) are unique to specific populations, because they arose after those populations separated. Using “discovery” panels of European populations may miss these explanatory alleles. Some of these alleles may also be present at low frequencies within European populations, but below the threshold of statistical detection. In other words, expanding the population pool is probably useful for Europeans as well.
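Why would an allele present in Europeans at low frequency slip below detection? A rough power calculation shows it. The sketch below assumes a standardized quantitative trait, an additive allele with effect `beta` (in trait standard deviations per copy), a 1-degree-of-freedom association test, and the conventional genome-wide threshold of 5e-8. The sample size and effect size are hypothetical. Under these assumptions, the variance a SNP explains scales with 2p(1 − p), so the same effect is far harder to detect when the allele is rare.

```python
from statistics import NormalDist

GWAS_ALPHA = 5e-8  # conventional genome-wide significance threshold

def gwas_power(n, maf, beta, alpha=GWAS_ALPHA):
    """Approximate power of a 1-df additive association test.

    Assumes a standardized trait; the SNP explains 2*maf*(1-maf)*beta**2
    of the trait variance, which sets the chi-square noncentrality.
    """
    q2 = 2 * maf * (1 - maf) * beta ** 2      # variance explained by the SNP
    ncp = n * q2 / (1 - q2)                   # noncentrality parameter
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)       # two-sided critical z value
    shift = ncp ** 0.5
    # P(|Z| > z_crit) where Z ~ Normal(shift, 1)
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)

# Hypothetical: same allele and effect, common in one population, rare in another.
for maf in (0.2, 0.005):
    print(f"MAF={maf}: power={gwas_power(100_000, maf, 0.1):.3f}")
```

With 100,000 people, the allele at 20% frequency is found essentially every time, while at 0.5% frequency it is found only about 1% of the time, even though its biological effect is identical.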

There are also two other issues. First, in many cases the explanatory SNPs are merely tags for causal variants through linkage disequilibrium (LD) (i.e., SNP 1 in state A is correlated with SNP 2 in state B). That correlation may differ between populations. And within African populations LD as a whole is less useful, because of their high genetic diversity and the short length of many LD blocks. Second, there is the possibility of gene-gene interactions, which may also differ between populations.
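The tagging problem can be put in numbers. Assuming a single causal SNP, standardized genotypes, and Hardy-Weinberg equilibrium, the expected marginal effect of a tag SNP is r × β, where r is the LD correlation between tag and causal variant, and the tag captures r² of the causal variant's variance. The haplotype frequencies below are hypothetical, chosen only so that r differs between a “discovery” and a “target” population. The helper names `ld_r` and `tag_effect` are illustrative, not from any particular package.

```python
from math import sqrt

def ld_r(p_AB, p_Ab, p_aB, p_ab):
    """LD correlation r between a tag SNP (A/a) and a causal SNP (B/b),
    computed from the four haplotype frequencies."""
    if abs((p_AB + p_Ab + p_aB + p_ab) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab                 # frequency of tag allele A
    p_B = p_AB + p_aB                 # frequency of causal allele B
    d = p_AB - p_A * p_B              # disequilibrium coefficient D
    return d / sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))

def tag_effect(beta_causal, r):
    """Expected marginal effect of the tag SNP (standardized genotypes,
    one causal variant)."""
    return r * beta_causal

# Hypothetical haplotype frequencies (AB, Ab, aB, ab) in two populations.
discovery = (0.45, 0.05, 0.05, 0.45)
target = (0.35, 0.15, 0.15, 0.35)
beta_causal = 0.1                     # hypothetical causal effect

r_disc, r_targ = ld_r(*discovery), ld_r(*target)
print(f"r: discovery={r_disc:.2f}, target={r_targ:.2f}")
print(f"tag effect: discovery={tag_effect(beta_causal, r_disc):.3f}, "
      f"target={tag_effect(beta_causal, r_targ):.3f}")
print(f"causal variance tagged: {r_disc ** 2:.2f} vs {r_targ ** 2:.2f}")
```

Here a weight estimated in the discovery population (r = 0.8) overstates the tag's real effect in the target population (r = 0.4) by a factor of two, and the tag picks up only a quarter as much of the causal variance there. Shorter LD blocks, as in many African populations, push r down in exactly this way.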

The “good” thing about this problem is that the solution is simple: invest more money and time in non-European populations. The steps are straightforward; they just need to be taken. I’m not exactly alarmed, but a few years ago I was relatively sanguine, and here we are! There are some projects in Africa right now, and there are groups in India, the Middle East, and East Asia as well, but ultimately it’s all about execution, not proposals.

2 thoughts on “Not happening at genomic speed: diversification of GWAS panels”

  1. Not totally skeptical, but how straightforward is it really to run a biobank-scale project in China, India, or South Africa? For East Asia at least, it’s not as if this is something Japan, South Korea, or Taiwan couldn’t do on their own, and they probably should be doing it.

    On South Asia, we’ve talked a lot in the past about how high population structure should help in locating disease-associated SNPs, but it seems like it should pose challenges where you need very large numbers of individuals within a population to generate population risk scores. As in, are there even enough individuals in some Indian populations to do this? Or is this not an issue, and capturing a broadly representative slice of the South Asian cline would get most of what is needed?

  2. Importantly, the predictive power of GWAS polygenic scores in non-European populations is even more depressing than the discovery of the associations.
    The European populations are a kind of a special case owing to their unusual genetic homogeneity, but even there, socio-economic confounds of the residual population structure have a profound impact on the polygenic trait scores (among the examples are the height polygenic score debacle, and the “genetic separations” of locals vs. within-the-UK migrants in the Bristol educational attainment study).
    Once you get to more structured and more recently admixed populations, the environmental and social confounds of genetic makeup become rampant (especially in the most clinically relevant groups, Hispanic and African Americans). And because the relevant LD does not extend as deep in time as the divergences of the source populations, evaluating each SNP on the basis of its local ancestry becomes paramount (but this is completely unaccounted for in today’s methods, which at best remain limited to 2D PCA to gauge broad global ancestry).
