Tutsis are genetically very similar to Masai

Many years ago, before I used ggplot, I did a little analysis of the genetics of the Tutsi. Actually, it was the genetics of a single Tutsi, or more precisely, someone who was 75% Tutsi ancestry (3 out of 4 grandparents).

I found that the Tutsi individual seemed quite distinct from the Bantu peoples in nearby Kenya. I suggested that it was likely that the Tutsi were then genetically distinct from the Hutu people amongst whom they lived. For many years this was part of the genetics section of the Wikipedia entry on the Tutsi, but recently the reference was removed and the page seems to have been re-edited.

That’s fine. I’m just a random blogger who had one sample. But as it happened recently about a dozen Diasporic Tutsis reached out to me. Over the last decade, the number of people who have been genotyped has increased greatly. So it wasn’t that difficult for interested parties to find these genotypes.

The mission they put before me is simple: “tell us about our genetics”. Over the next few weeks, I’ll do that. As there is no IRB, this won’t be published in a peer-reviewed journal (I am open to putting any researcher in contact with these Tutsis who reached out to me). I’m just going to put what I find out there so that Tutsis who do personalized genetic testing can make sense of what they’re finding out.

I received these genotypes today. A quick merge with samples I already have reduced the data to 50,000 markers. I will work on creating a merge with a larger number of markers, but I’ll report what I have found so far as a first pass.

As you can see on the PCA plot above, the Tutsi overlap almost perfectly with the Masai. Not with the Kenyan Bantu, or the Luo, who are more “African” shifted. But with the Masai. They are not, however, as “Eurasian shifted” as the Somali.

Treemix confirms this:


Selection for and against pigmentation alleles in South Asia

Deepika Padukone

Recently some British friends were asking about what we knew about South Asian historical genetics now. I explained that it does look like there was some migration in from the Central Asian steppe and West Asia into South Asia during the Holocene. To which one friend responded, “that’s obvious though, many Indians look like brown white people.” Setting aside the semantic paradox (if you are brown, you are literally not white), it is clear what he is getting at: due to shared ancestry the facial structure of many South Asians is not that different from West Eurasians.

The Bollywood actress Deepika Padukone is an example of someone who is rather brown-skinned (naturally), but whose facial features are such that if she went with 100% skin-bleaching she would pass as white without too much trouble. For the purposes of this post, I Googled Indian albino…and came up with this family. You can make your own judgments. I don’t know what to think of that!

The reason for this post is a newly accepted paper, Ancestry-specific analyses reveal differential demographic histories and opposite selective pressures in modern South Asian populations:

Genetic variation in contemporary South Asian populations follows a northwest to southeast decreasing cline of shared West Eurasian ancestry. A growing body of ancient DNA evidence is being used to build increasingly more realistic models of demographic changes in the last few thousand years. Through high quality modern genomes, these models can be tested for gene and genome level deviations. Using local ancestry deconvolution and masking, we reconstructed population-specific surrogates of the two main ancestral components for more than 500 samples from 25 South Asian populations, and showed our approach to be robust via coalescent simulations.

Our f3 and f4 statistics based estimates reveal that the reconstructed haplotypes are good proxies for the source populations that admixed in the area and point to complex inter-population relationships within the West Eurasian component, compatible with multiple waves of arrival, as opposed to a simpler one wave scenario. Our approach also provides reliable local haplotypes for future downstream analyses. As one such example, the local ancestry deconvolution in South Asians reveals opposite selective pressures on two pigmentation genes (SLC45A2 and SLC24A5) that are common or fixed in West Eurasians, suggesting post-admixture purifying and positive selection signals, respectively.


Genes, memes, and Mundas

The Munda languages of the northeastern quadrant of the Indian subcontinent are quite interesting because they are more closely related to the Austro-Asiatic languages of Southeast Asia than to the Indo-Aryan or Dravidian languages which are spoken by their neighbors. The Munda are usually classified as adivasi, which has connotations of being an ‘original inhabitant’ of the Indian subcontinent.

More concretely, the Munda have traditionally operated outside of the bounds of Sanskrit-influenced Hindu civilizations, occupying upland zones and governing themselves as tribal units, rather than being a caste population.

What the field of genetics tells us is that there are really no true aboriginal inhabitants of the Indian subcontinent in an unmixed form. That is, the vast majority of people in the Indian subcontinent have a substantial contribution of ancestry from the wave of migration out of Africa that occupied the southeast fringe of Eurasia beginning ~50-60,000 years ago. The modern adivasi generally are defined more by their social-cultural position within the landscape of Indian culture, as opposed to their long-term residence in the subcontinent.*

The term is a particular misnomer for the Munda because of the evidence that they are intrusive to the subcontinent from Southeast Asia. We have ancient DNA and archaeology which indicates that upland rice farmers, likely Austro-Asiatic, arrived in northern Vietnam ~4,000 years ago. This makes it unlikely to me that they were in India much earlier. The Y chromosomal data indicate that the paternal ancestry of the Munda derives from Southeast Asians, not the other way around.

A new genome-wide analysis of the Southeast Asian fraction of Munda ancestry suggests that it can be as high as ~30%. The paper is The genetic legacy of continental scale admixture in Indian Austroasiatic speakers:

Surrounded by speakers of Indo-European, Dravidian and Tibeto-Burman languages, around 11 million Munda (a branch of Austroasiatic language family) speakers live in the densely populated and genetically diverse South Asia. Their genetic makeup holds components characteristic of South Asians as well as Southeast Asians. The admixture time between these components has been previously estimated on the basis of archaeology, linguistics and uniparental markers. Using genome-wide genotype data of 102 Munda speakers and contextual data from South and Southeast Asia, we retrieved admixture dates between 2000–3800 years ago for different populations of Munda. The best modern proxies for the source populations for the admixture with proportions 0.29/0.71 are Lao people from Laos and Dravidian speakers from Kerala in India. The South Asian population(s), with whom the incoming Southeast Asians intermixed, had a smaller proportion of West Eurasian genetic component than contemporary proxies. Somewhat surprisingly Malaysian Peninsular tribes rather than the geographically closer Austroasiatic languages speakers like Vietnamese and Cambodians show highest sharing of IBD segments with the Munda. In addition, we affirmed that the grouping of the Munda speakers into North and South Munda based on linguistics is in concordance with genome-wide data.

The paper already came out as a preprint many months back, so I’ve already mentioned it. The big finding, to me, is that it uses genome-wide methods to estimate an admixture date in the range of ~4,000 years ago between the Southeast Asian and South Asian ancestral components of the southern Munda. It also confirms something that has been pretty evident for nearly ten years of genome-wide analysis of South Asian population genetics: the Munda have less West Eurasian ancestry, even after you account for the Southeast Asian admixture, than any mainland Indian population outside of the Tibeto-Burman fringe.

In Narasimhan et al. the authors present a model that fits the data where:

  1. The proto-Munda mix with an “Ancient Ancestral South Indian” (AASI) population that has no West Eurasian admixture in India’s northeast
  2. Then, mix more with an “Ancestral South Indian” (ASI) population that has some West Eurasian admixture


Very ancient ghosts in the African genome

The above figure is from a preprint (updated from last year), Recovering signals of ghost archaic introgression in African populations. But to truly get a sense of this preprint, I would highly recommend you read the supplementary material. And, to be honest, I would also recommend a publication from 2007, The Joint Allele-Frequency Spectrum in Closely Related Species, as the core of the method used in the preprint is developed in that paper.

Here is the abstract:

While introgression from Neanderthals and Denisovans has been well-documented in modern humans outside Africa, the contribution of archaic hominins to the genetic variation of present-day Africans remains poorly understood. Using 405 whole-genome sequences from four sub-Saharan African populations, we provide complementary lines of evidence for archaic introgression into these populations. Our analyses of site frequency spectra indicate that these populations derive 2-19% of their genetic ancestry from an archaic population that diverged prior to the split of Neanderthals and modern humans. Using a method that can identify segments of archaic ancestry without the need for reference archaic genomes, we built genome-wide maps of archaic ancestry in the Yoruba and the Mende populations that recover about 482 and 502 megabases of archaic sequence, respectively. Analyses of these maps reveal segments of archaic ancestry at high frequency in these populations that represent potential targets of adaptive introgression. Our results reveal the substantial contribution of archaic ancestry in shaping the gene pool of present-day African populations.

To get a sense of how much work went into this preprint, really do read the supplementary material. The step by step analysis convinced me pretty thoroughly that these results are not due to straightforward errors in the genotypes and classifications of the genotypes. Such things do happen, so it was nice to see them be very careful about that.

The key point is that the distribution of the conditional site frequency spectrum (CSFS) in West Africans does not align with theoretical expectations. The condition here is the state in the archaic outgroup, generally the Vindija Neanderthal. The authors ran a bunch of simulations and models and found a subset that could produce the CSFS they see, the U-shaped distribution. It is represented by the graph you see at the top-right. Basically, it is a scenario where an archaic lineage, which diverged from the other human lineages before the Neanderthal-Denisovan lineage left Africa, contributed to the ancestry of West Africans within the last ~100,000 years (the most likely time is ~50,000 years ago).
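To make the idea of a conditional site frequency spectrum concrete, here is a minimal sketch of how one could be tabulated. It is not the preprint's pipeline. The function name, the input layout (a derived-allele count per site plus a flag for whether the archaic genome carries the derived allele), and the toy data are all assumptions made up for illustration.

```python
from collections import Counter

def conditional_sfs(derived_counts, archaic_derived, n_chromosomes):
    """Tabulate the frequency spectrum over sites where the archaic carries the derived allele.

    derived_counts:  derived-allele count per site in the modern sample (hypothetical input)
    archaic_derived: True where the archaic outgroup has the derived allele
    n_chromosomes:   number of sampled modern chromosomes
    Returns proportions for derived counts 1..n_chromosomes-1 (segregating sites only).
    """
    counts = Counter(
        c for c, is_derived in zip(derived_counts, archaic_derived)
        if is_derived and 0 < c < n_chromosomes
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * (n_chromosomes - 1)
    return [counts[k] / total for k in range(1, n_chromosomes)]

# Toy data, invented for illustration only.
toy_counts = [1, 1, 2, 5, 5, 5, 3, 0, 6, 1]
toy_archaic = [True, True, True, True, True, False, False, True, True, True]
spectrum = conditional_sfs(toy_counts, toy_archaic, n_chromosomes=6)
print(spectrum)
```

In this toy example most of the mass sits in the lowest and highest frequency bins, which is the kind of U shape the post describes. Real data would of course need far more sites and careful polarization of ancestral versus derived alleles.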

This is not a new finding at the highest level of generality. Jeff Wall has been beating this drum for nearly 15 years. For example, Genetic evidence for archaic admixture in Africa.

What has changed is that whole-genome sequencing, including high-quality sequences of ancient hominins, has allowed for a more robust exploration of the topic. The analysis of site frequencies was really not useful 20 years ago without genome-wide data. More data has allowed for more subtle methods.


Europe had a lot of demographic turnover because there were never many humans

Now things are coming into focus. Population dynamics and socio-spatial organization of the Aurignacian: Scalable quantitative demographic data for western and central Europe:

Demographic estimates are presented for the Aurignacian techno-complex (~42,000 to 33,000 y calBP) and discussed in the context of socio-spatial organization of hunter-gatherer populations. Results of the analytical approach applied estimate a mean of 1,500 persons (upper limit: 3,300; lower limit: 800) for western and central Europe. The temporal and spatial analysis indicates an increase of the population during the Aurignacian as well as marked regional differences in population size and density. Demographic increase and patterns of socio-spatial organization continue during the subsequent early Gravettian period.

If you read The genetic history of Ice Age Europe you know the very first modern humans to arrive in Europe didn’t leave a genetic footprint in future populations. And the impact of both the later Gravettian and the Magdalenian seems to have been marginal. The primary “hunter-gatherer” contribution to modern Europeans is through a group which expanded after ~15,000 BC.

In any case, there are two things that I observe in relation to the population estimates above. First, they aren’t that unreasonable for a large mammal which isn’t much of a primary consumer of plants. Second, such a small and fragmented population indicates that extinction is always a possibility. You can take a standard conservation biological view and just assume statistically that small fragmented groups are likely to go extinct over enough generations. Or, you can point out that genetically such small breeding populations (remember that the genetic breeding effective population is almost always smaller than the census population) are likely to build up deleterious alleles, and that’s probably going to result in a decrease of long term fitness.

In other words, I think localized mutational meltdowns would be possible in this scenario.

The small populations during this period are not surprising. Many of the Neanderthal, Denisovan, and hunter-gatherer (e.g., the first WHG sample) populations had small sizes that led to homogeneity genetically and inbreeding. You see it in the homozygosity data and the runs of homozygosity. Ultimately, it was the larger population sizes due to agriculture which changed things in a fundamental sense.

This makes me wonder what was so advantageous about these marginal modern humans which allowed them to overwhelm and absorb the older Eurasian hominins?

On “big science”, ancient DNA, and David Reich

A lot has happened in the last few days in backchannel conversations and social media in relation to the piece in The New York Times Magazine which put the spotlight on ancient DNA, and David Reich, for the general audience. Unlike Carl Zimmer’s ancient DNA column in the science section of the paper, the people reading Gideon Lewis-Kraus’ 12,000-word piece are not going to be familiar with the field and will miss omissions and the context.

To “bullet” some of the issues with the piece, in order of simplicity and straightforwardness to me:


Models uncovering African population genetic history

In a deep sense, we know a lot more about the population genetic history of England at the fine-grain than we do about the whole continent of Africa. That’s going to change in the near future, as researchers now realize that the history and emergence of modern humans within the continent was a more complex, and perhaps more multi-regional, affair than had been understood.

Because of the relative dearth of ancient DNA, there has been a lot of deeply analytic work that draws from some pretty abstruse mathematical tools operating on extant empirical data. A series of preprints have come out which use different methods, and arrive at different particular details of results, but ultimately seem to be illuminating a recurring set of patterns. Dimly perceived, but sensed nonetheless.

Here’s the latest offering, Models of archaic admixture and recent history from two-locus statistics. I can’t pretend to have read the whole preprint (lots of math), but these empirical results jumped out at me:

We inferred an archaic population to have contributed measurably to Eurasian populations. This branch (putatively Eurasian Neanderthal) split from the branch leading to modern humans between ∼470–650 thousand years ago, and ∼1% of lineages in modern CEU and CHB populations were contributed by this archaic population after the out-of-Africa split. This range of divergence dates compares to previous estimates of the time of divergence between Neanderthals and human populations, estimated at ∼650 kya (Prüfer et al., 2014). The “archaic African” branch split from the modern human branch roughly 460–540 kya and contributed ∼7.5% to modern YRI in the model (Table A2).

We chose a separate population trio to validate our inference and compare levels of archaic admixture with different representative populations. This second trio consisted of the Luhya in Webuye, Kenya (LWK), Kinh in Ho Chi Minh City, Vietnam (KHV), and British in England and Scotland (GBR). We inferred the KHV and GBR populations to have experienced comparable levels of migration from the putatively Neanderthal branch. However, the LWK population exhibited lower levels of archaic admixture (∼ 6%) in comparison to YRI, suggesting population differences in archaic introgression events within the African continent (Table A3).

To be frank I’m not sure as to the utility of the term “archaic” anymore. I sometimes wish that we’d rename “modern human” to “modal human.” That is, the dominant lineage that was around ~200,000 years ago in relation to modern population ancestry.

Skull from Iwo Eleru, Nigeria. Photo credit: Katerina Harvati and colleagues CC-BY

But, these results are aligned with other work from different research groups which indicate that something basal to all other modern humans, but within a clade of modern humans in relation to Neanderthal-Denisovans, admixed with a modern human lineage expanding out of eastern Africa. The LWK sample is Bantu, and has a minority Nilotic component that has West Eurasian ancestry. This probably accounts for the dilution of the basal lineage from 7.5% to 6%.
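As a rough check on the dilution argument, here is a back-of-the-envelope sketch under a simple two-source mixing model. The model is my assumption, not something taken from the preprint: it supposes the Bantu ancestry of the LWK carries the same ~7.5% basal fraction inferred for the YRI, and that the remaining ancestry carries none. The function name is hypothetical.

```python
def implied_dilution(source_fraction, observed_fraction):
    """Share of ancestry that must come from a source lacking the component.

    Assumes observed = (1 - m) * source, i.e. simple two-way mixing where
    the second source contributes none of the component. Solves for m.
    """
    if not 0 < observed_fraction <= source_fraction:
        raise ValueError("observed fraction must be positive and no larger than the source fraction")
    return 1 - observed_fraction / source_fraction

# 7.5% (YRI) and 6% (LWK) are the model estimates quoted from the preprint.
m = implied_dilution(0.075, 0.06)
print(f"implied share of ancestry without the basal component: {m:.0%}")
```

Under these assumptions, roughly a fifth of LWK ancestry would need to come from a source without the basal component. That is only what the arithmetic implies under this toy model, not an estimate of the actual Nilotic fraction.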

I wouldn’t be surprised if the final proportions differ. And other research groups have found deep lineages with African hunter-gatherers. My own view is that it does seem likely that one of the African human populations that flourished ~200,000 years ago expanded and assimilated many of the other lineages. The “Out of Africa” stream is one branch of this ancient population. But it seems possible that the expansion was incomplete, and that other human lineages persisted elsewhere until a relatively late date.

How paternity testing is like international trade

Population                      Nonpaternity rate (%)       N
Switzerland                                      0.83    1607
USA, Michigan, white                             1.49    1417
USA, California, white                           2.1     6960
USA, Hawaii                                      2.3     2839
UK, West London                                  3.7     2596
Paternity Testing Laboratories
UK                                              16.6     1702
USA, Los Angeles, white                         24.9     1393
Sweden                                          38.7     5018
South Africa, Cape Coloured                     40       1156

The results above are from Kermyt Anderson’s How Well Does Paternity Confidence Match Actual Paternity? This is still one of the best surveys of the field, despite being 12 years old. A more recent paper, Cuckolded Fathers Rare in Human Populations, uses more powerful genetic genealogy methods to come to the same conclusion as Anderson’s survey: extrapair paternity, or nonpaternity events, are rare in Western societies. I don’t think it is limited to Western societies. I suspect that when high throughput sequencing is applied to Chinese clan lineages and Hindu gotras, you will find that nonpaternity events are similar to those in the West.*

On the other hand, in some small-scale societies, the rates are much higher.

I won’t delve into the evolutionary anthropology here. Rather, I want to point to a new paper, Growth of ancestry DNA testing risks huge increase in paternity issues. Ancestry testing is huge. Within the next year, it is almost certain that 10% of the American population will have had some sort of high-density genomic testing done.

As the author of the paper pointed out to me on Twitter, 1% of 16 million people is still a lot. Yes, in absolute terms. But we need to look at the other side of the equation.

In Anderson’s original data one of the interesting results is that in most datasets drawn from paternity testing laboratories, where there is a very high suspicion of nonpaternity events, most of the fathers nevertheless were biological fathers! In a nonpaternity testing context, nonpaternity events will be much closer to ~1%. But, I think it is reasonable to suppose that some of the 99% of the fathers who turn out to be biological fathers also have suspicions…which are unfounded.
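To put both sides of the equation in absolute numbers, here is a small arithmetic sketch. The 16 million figure and the ~1% rate are the ones quoted in this post, and the laboratory rates are the four rows from the table above. Treating every tested person as one independent test of paternity is a simplifying assumption of mine.

```python
tested = 16_000_000        # number of people quoted in the post
nonpaternity_rate = 0.01   # the ~1% rate the post assumes outside testing labs

surprised = round(tested * nonpaternity_rate)
confirmed = tested - surprised
print(f"surprised: {surprised:,}  confirmed: {confirmed:,}")

# Paternity-testing-laboratory rows from the table above (nonpaternity rate in %).
lab_rates = {
    "UK": 16.6,
    "USA, Los Angeles, white": 24.9,
    "Sweden": 38.7,
    "South Africa, Cape Coloured": 40.0,
}
# Even among men suspicious enough to pay for a test, most are the biological father.
lab_confirmed = {place: 100 - rate for place, rate in lab_rates.items()}
for place, pct in lab_confirmed.items():
    print(f"{place}: {pct:.1f}% confirmed as biological fathers")
```

So the same 1% that yields 160,000 unwelcome surprises also yields 15,840,000 confirmations, which is the less visible side of the ledger.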

Like free trade, you tend to see one side of the equation much more than the other. In free trade scenarios, a minority of workers may lose their jobs or have to work under reduced wages, but the vast majority of consumers will get cheaper or better products. The former is much more salient than the latter.

Similarly, the small minority of fathers and families who are going to be “surprised” in a negative way, is balanced out by the likely larger number who have low-grade suspicions, but in fact, are confirmed in their biological relatedness.

Addendum: Needless to say, if you are part of the “cuckold community”, you should probably not get this sort of testing.

* The necessity of good quality whole-genome sequencing is due to the fact that male relatives are excellent candidates for nonpaternity events. To get a certain estimate one would want to count unique mutations across the pedigree.

Patterns of genetic diversity within Africa

The violin-plot above is from a new preprint, Runs of Homozygosity in sub-Saharan African populations provide insights into a complex demographic and health history. Here’s the abstract:

The study of runs of homozygosity (ROH), contiguous regions in the genome where an individual is homozygous across all sites, can shed light on the demographic history and cultural practices. We present a fine-scale ROH analysis of 1679 individuals from 28 sub-Saharan African (SSA) populations along with 1384 individuals from 17 world-wide populations. Using high-density SNP coverage, we could accurately obtain ROH as low as 300Kb using PLINK software. The analyses showed a heterogeneous distribution of autozygosity across SSA, revealing a complex demographic history. They highlight differences between African groups and can differentiate between the impact of consanguineous practices (e.g. among the Somali) and endogamy (e.g. among several Khoe-San groups). The genomic distribution of ROH was analysed through the identification of ROH islands and regions of heterozygosity (RHZ). These homozygosity cold and hotspots harbour multiple protein coding genes. Studying ROH therefore not only sheds light on population history, but can also be used to study genetic variation related to the health of extant populations.

This sort of run-of-homozygosity analysis is enabled by high-density genotyping or whole-genome sequencing. After quality control, the authors had 1 to 1.5 million SNPs for all populations.

The interesting thing about this preprint is that by looking at the violin plots you can see exactly all the things that population geneticists have learned about the demography, structure, and history of humans in the past generation or so.

  • The rightmost panel shows the average total length of short ROH. Partly the pattern fits into the older serial bottleneck model of the settlement of the world: Amerindian > East Asian > European > African. But what about the lower fractions for mixed Latin Americans and Gujaratis? This is a consequence of admixture, as these populations are, in a sense, mixtures of other groups.
  • The length of the long ROH segments, the second to last panel on the right, is indicative of recent patterns of marriage. Within Africa, you see some groups have many individuals with lots of long ROH segments. This is because of consanguinity. As the authors observe, the Oromo and Somali are both Cushitic speaking groups from the Horn of Africa, but the latter are universally Muslim, while only a minority of the former are. Islamic cultures have traditionally encouraged consanguineous marriages, and you can see the difference between these groups (whose total length of short segments is similar).
  • The pattern of ROH here can be predicted by simple genetic models: the extent of random mating within populations, recombination rates across the genome, and total population size. What modern genomic technology does is provide data to test the models.
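For readers who want to see what a run of homozygosity looks like computationally, here is a deliberately naive sketch. It is not PLINK's algorithm, which uses sliding windows and tolerances for missing and heterozygous calls. The function name and the 0/1/2 genotype coding (1 meaning heterozygous) are assumptions for illustration. The 300 kb minimum length is the threshold mentioned in the abstract.

```python
def find_roh(positions, genotypes, min_length_bp=300_000):
    """Return (start_bp, end_bp) for each unbroken homozygous stretch.

    positions: SNP coordinates in base pairs, sorted ascending
    genotypes: 0 or 2 for the two homozygous states, 1 for heterozygous
    Only stretches spanning at least min_length_bp are reported.
    """
    runs = []
    start = None  # first SNP position of the current homozygous stretch
    prev = None   # most recent homozygous SNP position
    for pos, genotype in zip(positions, genotypes):
        if genotype != 1:
            if start is None:
                start = pos
            prev = pos
        else:
            # A heterozygous call ends the current stretch.
            if start is not None and prev - start >= min_length_bp:
                runs.append((start, prev))
            start = None
    # Close out a stretch that reaches the last SNP.
    if start is not None and prev - start >= min_length_bp:
        runs.append((start, prev))
    return runs

# Toy individual, invented for illustration.
positions = [100_000, 200_000, 300_000, 450_000, 500_000, 600_000, 700_000, 1_100_000]
genotypes = [0, 2, 2, 0, 1, 2, 2, 0]
roh = find_roh(positions, genotypes)
total_roh_bp = sum(end - begin for begin, end in roh)
print(roh, total_roh_bp)
```

Summing segment lengths per individual, split into short and long classes, gives the kind of quantity plotted in the violin plots: many short segments point to old, small ancestral populations, while a few very long ones point to recent consanguinity.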


The golden age of pigmentation is yet to come

Skin color is important and interesting. It is important because people think it is important. Humans often classify each other by complexion, and it has a high social importance in many cultures.

This tendency starts at a very young age. When my children were toddlers, they all misidentified photographs of black American males with a medium brown complexion as their father (for example, my son recently misidentified a photograph of the singer Pharrell as me). In terms of my background though, I’m 100% Eurasian in ancestry. On a PCA plot, I’m about halfway between Europeans & Near Easterners and East Asians (I have 15% East Asian ancestry so I’m more shifted to East Asians than the typical South Asian).

Skin is the largest human organ, and we are a visual species. It is an incredibly salient canvas. So it’s no surprise that we use complexion as a diagnostic marker for taxonomic purposes. The ancient Greeks correctly observed that the peoples of southern India have dark skin like Sub-Saharan Africans (“Ethiopians”), but that their hair is not woolly. Islamic commentators regularly referred to South Asians as “black crows”, while European observers of the 17th century noted that the ruling class of Indian Muslims tended to be white (i.e., mostly Turkic and Iranian in provenance) while the non-elites were black (descendants of Indian converts).*

Luckily, for a characteristic that we’re fascinated by, pigmentation has been reasonably tractable to genetics. As early as the 1950s human geneticists using classical methods of pedigree analysis predicted that pigmentation was polygenic, but that most of the variation was due to a small number of loci (see The Genetics of Human Populations). In particular, they focused on families of mixed European and African ancestry in British ports with known pedigrees.

When genomic methods came on the scene in the 2000s, pigmentation was one of the first traits that yielded positive GWAS hits as well as population genetic findings related to natural selection. In Mutants, written in the middle aughts, the author observed that there wasn’t much known about the basis of normal human variation in pigmentation. This all changed literally a year after the publication of this book. By the middle of 2006, a review paper came out with the title, A golden age of human pigmentation genetics. The reason this paper was written is that a host of studies on European populations had identified several loci which explained a substantial proportion of the intercontinental difference in pigmentation between Africans and Europeans.
