The Kalash in perspective

 

When Zack ran ChromoPainter/fineStructure on South Asians the results naturally yielded a blueish hue along the diagonal. This is expected because the diagonal represents the population’s own relationship with itself. The bluer the diagonal, the more inbred and isolated the population is likely to be. To the top left you see various Austro-Asiatic tribes, in the middle the “Gujarati A” population from the HapMap (probably Patels), and in the bottom right various Pakistani groups, who presumably have higher rates of consanguinity than the South Asian norm. But one group stands out among the other Pakistani groups: the Kalash. They’re highly endogamous because they’re the last pagan population in Pakistan, isolated in the fastness of the Chitral. It is likely that if their region of Pakistan had been under Afghan, and not British, rule these people would have been forcibly submitted to Islam like their Nuristani cousins across the border 100 years ago. As it is I assume that at some point I may have to update these posts with the note that Pakistani Taliban forcibly converted the few thousand remaining Kalash to Islam. Such is the march of history in the abode of peace.

Religion may be a critical explanation for this isolation, because the similarly isolated Burusho of northern Pakistan are not nearly as inbred. Though the Burusho are speakers of a linguistic isolate, they adopted Islam ~500 years ago, and so are likely to have intermarried more frequently with outsiders. The Kalash isolation means that in some ways they’re relatively useless in comparative genomic analysis. Though they are not an ‘ancestral population’ they routinely jump out as distinctive early on in PCA or ADMIXTURE/Structure/frappe analyses. In short, they are close to one large extended family. Nevertheless, they can be somewhat informative when these particularities are taken into account.

Presumably the Kalash are distant from other populations because of their isolation, but their relationship to other world populations should retain the same general conformation it had before their endogamous period. With that in mind I ran a narrow, South Asian-focused ADMIXTURE analysis. With ~400,000 markers I had: the Kalash, Burusho, Brahui, Pathan, Gujarati Patels, Hazara, Uygur, Russians, Japanese, Bantus of Kenya, Basque, Adygei, and Druze. Below are selected results for K = 7:

 

Update on the Afrikaner genotype

Since my original post on the Afrikaner genotype, I’ve gotten many responses. No genotypes yet though. At some point I need to organize how to pay for typing many individuals. Currently my intent is to pay for those who will allow their identities to be public so that people can confirm their genealogies. Other people have emailed me to say that Afrikaners with whom they have shared genotypes on 23andMe often have African or Asian ancestral segments. But all hearsay so far.

In other news, I’ve got a Dinka genotype I’ll be analyzing soon (some minor technical issues with merging datasets have delayed this somewhat).

Non-overlapping magisteria for the social and biological?

Most of you know that Stephen Jay Gould proposed ‘non-overlapping magisteria’ for science and religion. I don’t care much for the framing myself, though neither am I on the same page as Sam Harris and company. But I thought of this model when reading this comment below:

Kind of a tangent, but I think it gets slippery considering which construction is more “real.” We tend to come at it as the “reality” being genetic ancestry upon which a socially constructed (“not real”) conception of race is sloppily mapped. However, I get the social science perspective that the socially constructed race is the category that is often much more “real” — it is the lived experience of everyone in that group. If you are visibly black (or white) and part of that community, then you “really” are black (or white) in many, many ways that matters, whether you’re 90% African or 0%.

In the context of the topics we mainly discuss here — population genetics, medical genetics, etc — genetic ancestry is “real” and social race categories matter less. This is why I’m mostly in favor of letting social science have the word “race” and would really push for biologists to use better-defined terms (like ancestry).

It’s a mistake to say that human artifice is “less real” than genetics, it’s context dependent. I recognize that the social construction of race is a sort of unrefereed crowd-sourced attempt at deriving ancestry, but as we start to divide causal factors into social / biological aspects of race, this mapping is more of a hindrance than a help. A first step is to stop using shared language to discuss them.

Clusters where they "shouldn't be"….


Uyghur girls

A few people have pointed me to the paper, Implications for health and disease in the genetic signature of the Ashkenazi Jewish population. You should check it out even if you don’t have academic access to papers; it’s not gated. Here I want to focus on a methodological issue.

In the genetics reader survey only 20 percent of you agreed that you understood how to read an ADMIXTURE plot. After looking at some of the results in this paper I have a lot of sympathy. Understanding what’s going on requires more prior information than is often present in the legends of the figures.

It is known that to a first approximation Ashkenazi Jews, that is, the Jews of Europe, can be understood as an admixture between a European population and a Middle Eastern one. But Ashkenazi Jews also exhibit their own genetic distinctiveness, probably due to long term endogamy. This shows up in various genetic statistics. In this paper the authors show that Ashkenazi form their own cluster in both PCA and ADMIXTURE, two ways in which to ascertain population structure. Below I’ve reedited and highlighted some populations of note in one of their ADMIXTURE plots. It’s rather informative of the bigger problem with interpreting these sorts of results in the absence of context.

American medicine & American red-tape

I just attended a presentation where a researcher outlined how epigenomics could help patients with various grave illnesses. Normally I don’t focus on human medical genetics too much because it always depresses me. I don’t understand how medical geneticists don’t start wondering what hidden disease everyone around them has. In any case the researcher outlined how epigenomic information allowed for better treatment, so as to extend the lives of patients. All well and good. But then one individual in the audience began asking pointed questions as to the medical ethics of the enterprise, and whether the researcher had cleared some legally sanctioned hurdles. More specifically, there was a question whether exploring someone’s epigenomic profile might expose private information of their relatives! (because relatives share epigenomic and genomic profiles to some extent)

Frankly I began to get enraged at this point. People are suffering from terminal illnesses, and considerations of the genetic privacy of their near relatives are looming large? Seriously? The reality is that manifestation of a disease itself gives one information about the risks of their relatives. In any case, the researcher admitted that further progress in this area is probably going to be due to the investments of wealthy individuals (e.g., people like Steve Jobs who have illnesses), as well as to work done outside of the United States. You’re #1 America!

Working class vs. middle class white seculars

Rod Dreher at The American Conservative, White Working-Class ‘Seculars’:

What’s interesting to think about is that these working-class non-churchgoers are probably not secular in the same way white intellectual elites are secular. I bet if you polled them, 999 out of 1,000 would say they believed in God and considered themselves to be Christians. It’s just that they don’t go to church. Where I live, during deer hunting season, to be a white male is to be seasonally “secular” in this way.

One way to answer this question is to look at the GSS. I used the ATTEND variable (frequency of church attendance) to ascertain secularity. Those who never attended church or did so less than once a year (in other words, some years they did attend, in other years they did not) are “secular.” Those who attend nearly weekly, or more, are “religious.” To assess class I simply divided the non-Hispanic white population into those who had a college degree or higher (middle class), and those who did not (working class).
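The classification rule above can be sketched in a few lines of Python. This is only an illustration, not the actual query used: the numeric codes for ATTEND and DEGREE are assumptions based on the usual GSS codebook and should be checked against the survey years in question, and the function names are hypothetical. The restriction to non-Hispanic whites is assumed to have happened upstream.

```python
# Assumed GSS codings (verify against the codebook before relying on them):
# ATTEND: 0 = never, 1 = less than once a year, 2-5 = intermediate frequencies,
#         6 = nearly every week, 7 = every week, 8 = several times a week
# DEGREE: 0 = less than high school, 1 = high school, 2 = junior college,
#         3 = bachelor, 4 = graduate

def religiosity(attend):
    """Return 'secular', 'religious', or None for the excluded middle group."""
    if attend in (0, 1):
        return "secular"
    if attend in (6, 7, 8):
        return "religious"
    return None  # occasional attenders fall in neither category


def social_class(degree):
    """College degree or higher counts as middle class, otherwise working class."""
    return "middle class" if degree >= 3 else "working class"


def classify(respondent):
    """Map one respondent (a dict with 'attend' and 'degree') to both labels."""
    return religiosity(respondent["attend"]), social_class(respondent["degree"])


# Hypothetical respondents, for illustration only.
respondents = [
    {"attend": 0, "degree": 1},
    {"attend": 7, "degree": 4},
    {"attend": 3, "degree": 3},
]
for person in respondents:
    print(classify(person))
```

Respondents in the middle of the attendance scale get `None` for religiosity, which mirrors the fact that the comparison only uses the two ends of the distribution.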

Below are some responses to a selection of questions.

 

Kkkhhhaaannn!!!

My post, 1 in 200 men direct descendants of Genghis Khan, is linked up somewhere almost every week. Why is it so popular? No idea. But one thing that has come to mind: we’ve come a long way since the early 2000s in assembling databases of human scientific genealogies. Soon enough a substantial proportion of the males in the human race will have their Y chromosomes typed. What proportion of these individuals are going to be part of a “star-shaped phylogeny,” implying a radiation from a single individual relatively recently in the past? My own suspicion is that it will be a great many. The Khan lineage is simply a relatively recent one as these things go. How did R1a1 come to be so widespread?

The social and biological construction of race

Many of our categories are human constructions which map upon patterns in nature which we perceive rather darkly. The joints about which nature turns are as they are, our own names and representations are a different thing altogether. This does not mean that our categories have no utility, but we should be careful of confusing empirical distributions, our own models of those distributions, and reality as it is stripped of human interpretative artifice.

I have argued extensively on this weblog that:

1) Generating a phylogeny of human populations and individuals within those populations is trivial. You don’t need many markers, depending on the grain of your phylogeny (e.g., to differentiate West Africans vs. Northern Europeans you actually can use one marker!).

2) These phylogenies reflect evolutionary history, and the trait differences are not just superficial (i.e., “skin deep”).

The former proposition I believe is well established. A group such as “black American” has a clear distribution of ancestries in a population genetic sense. The latter proposition is more controversial and subject to contention. My own assumption is that we will know the truth of the matter within the generation.
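To make the "one marker" remark in point 1 concrete, here is a minimal sketch of assigning an individual to one of two populations from a single genotype. Everything in it is an assumption for illustration: the population labels, the allele frequencies (0.99 vs. 0.01), and the use of Hardy-Weinberg proportions. It is not drawn from any real dataset.

```python
def genotype_likelihood(copies, freq):
    """Probability of carrying 0, 1, or 2 copies of an allele with population
    frequency `freq`, assuming Hardy-Weinberg proportions."""
    other = 1.0 - freq
    return {0: other * other, 1: 2.0 * freq * other, 2: freq * freq}[copies]


def assign_population(copies, allele_freqs):
    """Pick the population under which the observed genotype is most likely."""
    return max(
        allele_freqs,
        key=lambda pop: genotype_likelihood(copies, allele_freqs[pop]),
    )


# Hypothetical frequencies of one nearly fixed-difference allele.
allele_freqs = {"population_A": 0.99, "population_B": 0.01}

print(assign_population(2, allele_freqs))  # two copies of the allele
print(assign_population(0, allele_freqs))  # no copies of the allele
```

With frequencies this lopsided, homozygotes are assigned almost without error, while a heterozygote is essentially uninformative. A finer-grained phylogeny needs more markers because real frequency differences between closely related populations are much smaller.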

I've got your missing heritability right here…

A debate is raging in human genetics these days as to why the massive genome-wide association studies (GWAS) that have been carried out for every trait and disorder imaginable over the last several years have not explained more of the underlying heritability. This is especially true for many of the so-called complex disorders that have been investigated, where results have been far less than hoped for. A good deal of effort has gone into quantifying exactly how much of the genetic variance has been “explained” and how much remains “missing”.

The problem with this question is that it limits the search space for the solution. It forces our thinking further and further along a certain path, when what we really need is to draw back and question the assumptions on which the whole approach is founded. Rather than asking what is the right answer to this question, we should be asking: what is the right question?

The idea of performing genome-wide association studies for complex disorders rests on a number of very fundamental and very big assumptions. These are explored in a recent article I wrote for Genome Biology (referenced below; reprints available on request). They are:

1) That what we call complex disorders are unitary conditions. That is, clinical categories like schizophrenia or diabetes or asthma are each a single disease and it is appropriate to investigate them by lumping together everyone in the population who has such a diagnosis – allowing us to calculate things like heritability and relative risks. Such population-based figures are only informative if all patients with these symptoms really have a common etiology.

2) That the underlying genetic architecture is polygenic – i.e., the disease arises in each individual due to toxic combinations of many genetic variants that are individually segregating at high frequency in the population (i.e., “common variants”).

3) That, despite the observed dramatic discontinuities in actual risk for the disease across the population, there is some underlying quantitative trait called “liability” that is normally distributed in the population. If a person’s load of risk variants exceeds some threshold of liability, then disease arises.

All of these assumptions typically go unquestioned – often unmentioned, in fact – yet there is no evidence that any of them is valid. In fact, the more you step back and look at them with an objective eye, the more outlandish they seem, even from first principles.

First, what reason is there to think that there is only one route to the symptoms observed in any particular complex disorder? We know there are lots of ways, genetically speaking, to cause mental retardation or blindness or deafness – why should this not also be the case for psychosis or seizures or poor blood sugar regulation? If the clinical diagnosis of a specific disorder is based on superficial criteria, as is especially the case for psychiatric disorders, then this assumption is unlikely to hold.

Second, the idea that common variants could contribute significantly to disease runs up against the effects of natural selection pretty quickly – variants that cause disease get selected against and are therefore rare. You can propose models of balancing selection (where a specific variant is beneficial in some genomic contexts and harmful in others), but there is no evidence that this mechanism is widespread. In general, the more arcane your model has to become to accommodate contradictory evidence, the more inclined you should be to question the initial premise.

Third, the idea that common disorders (where people either are or are not affected) really can be treated as quantitative traits (with a smooth distribution in the population, as with height) is really, truly bizarre. The history of this idea can be traced back to early geneticists, but it was popularised by Douglas Falconer, the godfather of quantitative genetics (he literally wrote the book).

In an attempt to demonstrate the relevance of quantitative genetics to the study of human disease, Falconer came up with a nifty solution. Even though disease states are typically all-or-nothing, and even though the actual risk of disease is clearly very discontinuously distributed in the population (dramatically higher in relatives of affecteds, for example), he claimed that it was reasonable to assume that there was something called the underlying liability to the disorder that was actually continuously distributed. This could be converted to a discontinuous distribution by further assuming that only individuals whose burden of genetic variants passed an imagined threshold actually got the disease. To transform discontinuous incidence data (mean rates of disease in various groups, such as people with different levels of genetic relatedness to affected individuals) into mean liability on a continuous scale, it was necessary to further assume that this liability was normally distributed in the population. The corollary is that liability is affected by many genetic variants, each of small effect. Q.E.D.
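The transformation described above, from incidence to a position on an assumed normal liability scale, can be sketched as follows. This is an illustration of the model's logic under its own assumptions (unit-variance normal liability, one shared threshold), not an endorsement of it. The incidence figures are hypothetical and the function names are my own.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def liability_threshold(incidence):
    """Distance (in standard deviations) from a group's mean liability to the
    threshold, given the fraction of the group that is affected."""
    return STANDARD_NORMAL.inv_cdf(1.0 - incidence)


def mean_liability_shift(incidence_population, incidence_relatives):
    """Mean liability of relatives, in SDs above the population mean, assuming
    both groups share one threshold and unit-variance normal liability."""
    return (liability_threshold(incidence_population)
            - liability_threshold(incidence_relatives))


# Hypothetical incidences: 1% in the general population, 10% in relatives.
population_incidence = 0.01
relatives_incidence = 0.10

print(round(liability_threshold(population_incidence), 2))   # 2.33
print(round(mean_liability_shift(population_incidence,
                                 relatives_incidence), 2))   # 1.04
```

Note that the normal distribution enters only through `inv_cdf`: the "mean liability" of relatives is not observed, it is back-calculated from incidence on the assumption that liability is normal. That is exactly the step the argument here calls into question.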

This model – simply declared by fiat – forms the mathematical basis for most GWAS analyses and for simulations regarding proportions of heritability explained by combinations of genetic variants (e.g., the recent paper from Eric Lander’s group). To me, it is an extraordinary claim, which you would think would require extraordinary evidence to be accepted. Despite the fact that it has no evidence to support it and fundamentally makes no biological sense (see Genome Biology article for more on that), it goes largely unquestioned and unchallenged.

In the cold light of day, the most fundamental assumptions underlying population-based approaches to investigate the genetics of “complex disorders” can be seen to be flawed, unsupported and, in my opinion, clearly invalid. More importantly, there is now lots of direct evidence that complex disorders like schizophrenia or autism or epilepsy are really umbrella terms, reflecting common symptoms associated with large numbers of distinct genetic conditions. More and more mutations causing such conditions are being identified all the time, thanks to genomic array and next generation sequencing approaches.

Different individuals and families will have very rare, sometimes even unique mutations. In some cases, it will be possible to identify specific single mutations as clearly causal; in others, it may require a combination of two or three. There is clear evidence for a very wide range of genetic etiologies leading to the same symptoms. It is time for the field to assimilate this paradigm shift and stop analysing the data in population-based terms. Rather than asking how much of the genetic variance across the population can be currently explained (a question that is nonsensical if the disorder is not a unitary condition), we should be asking about causes of disease in individuals:

– How many cases can currently be explained (by the mutations so far identified)?

– Why are the mutations not completely penetrant?

– What factors contribute to the variable phenotypic expression in different individuals carrying the same mutation?

– What are the biological functions of the genes involved and what are the consequences of their disruption?

– Why do so many different mutations give rise to the same phenotypes?

– Why are specific symptoms like psychosis or seizures or social withdrawal such common outcomes?

These are the questions that will get us to the underlying biology.

Mitchell, K. (2012). What is complex about complex disorders? Genome Biology, 13 (1). DOI: 10.1186/gb-2012-13-1-237

Manolio, T., Collins, F., Cox, N., Goldstein, D., Hindorff, L., Hunter, D., McCarthy, M., Ramos, E., Cardon, L., Chakravarti, A., Cho, J., Guttmacher, A., Kong, A., Kruglyak, L., Mardis, E., Rotimi, C., Slatkin, M., Valle, D., Whittemore, A., Boehnke, M., Clark, A., Eichler, E., Gibson, G., Haines, J., Mackay, T., McCarroll, S., & Visscher, P. (2009). Finding the missing heritability of complex diseases. Nature, 461 (7265), 747-753. DOI: 10.1038/nature08494

Zuk, O., Hechter, E., Sunyaev, S., & Lander, E. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences, 109 (4), 1193-1198. DOI: 10.1073/pnas.1119675109