One of the major issues that confuses people is that the distribution of a trait or gene is often only weakly correlated with overall phylogeny and the rest of the genome.
To give a strange but classic example, the MHC loci are subject to strong balancing selection. This means that novel alleles do not sweep through and replace ancestral alleles. Substitution of that sort results in “lineage sorting,” so that when you look at chimpanzees and humans you can see many loci where all humans carry one variant and all chimpanzees the other. In contrast, at the MHC loci there is frequency-dependent selection for rare variants, so the normal cycle of substitution does not occur. Humans and chimpanzees overlap quite a bit on MHC, and any given human may have a more similar profile to a given chimpanzee than to another human.
There are 19,000 human genes. Of the genome’s 3 billion base pairs, only about 100 million are polymorphic on a worldwide scale (using some liberal definitions). There are lots of unique stories to tell here.
A new preprint, Inferring adaptive gene-flow in recent African history, illustrates how certain genes with functional significance may differ from the genome-wide background. The authors find that among the Fula (Fulani) people of West Africa there has been introgression of a Eurasian mutation that confers lactase persistence. The area of the genome around this gene is much more Eurasian than the rest of the genome. In contrast, the area around the Duffy allele is much less Eurasian. Variation at this locus is related to malaria resistance. Finally, in other African populations, they found gene flow of MHC variants.
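To make "differs from the genome-wide background" concrete, here is a minimal sketch. It is not the preprint's haplotype-based method. The window names and ancestry fractions are invented for illustration, and `flag_outlier_windows` is a hypothetical helper. It flags genomic windows whose local Eurasian ancestry fraction sits far from the genome-wide median, using a robust z-score.

```python
from statistics import median

def flag_outlier_windows(local_ancestry, threshold=3.5):
    """Return windows whose ancestry fraction is far from the genome-wide median.

    Uses a robust z-score based on the median absolute deviation (MAD),
    so a handful of extreme windows does not distort the baseline.
    """
    values = list(local_ancestry.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return {}  # no spread in the background, nothing to compare against
    flagged = {}
    for window, frac in local_ancestry.items():
        robust_z = 0.6745 * (frac - med) / mad
        if abs(robust_z) > threshold:
            flagged[window] = round(robust_z, 2)
    return flagged

# Hypothetical Eurasian ancestry fraction per genomic window (NOT real data).
toy_windows = {
    "win_01": 0.20, "win_02": 0.22, "win_03": 0.19, "win_04": 0.21,
    "win_05": 0.18, "win_06": 0.23, "win_07": 0.20, "win_08": 0.21,
    "lactase_region": 0.55,  # assumed excess of Eurasian ancestry
    "duffy_region": 0.02,    # assumed deficit of Eurasian ancestry
}

outliers = flag_outlier_windows(toy_windows)
for window, z in outliers.items():
    direction = "more" if z > 0 else "less"
    print(f"{window}: {direction} Eurasian than background (robust z = {z})")
```

With these toy numbers only the two assumed outlier windows are flagged, one in each direction. Real analyses work from inferred haplotypes and have to account for linkage and demography, which this sketch ignores.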
None of this is entirely surprising, though the authors apply novel haplotype-based methods which should have wider utility.
A very readable review: Gene Discovery for Complex Traits: Lessons from Africa. It’s open access, so I recommend it. The summary:
The genetics of African populations reveals an otherwise “missing layer” of human variation that arose between 100,000 and 5 million years ago. Both the vast number of these ancient variants and the selective pressures they survived yield insights into genes responsible for complex traits in all populations.
The main issue I might have is I’m not sure that focusing on 5 million year time spans is particularly useful. Rather, looking at the last major bottleneck for modern humans before the “Out of Africa” event would be key, since that’s when a lot of the common variation would disappear, and very rare variants probably don’t have deep time depth in any case. With all that being said, the qualitative analysis is on point.
One of the major issues in the “SNP-chip” era has been that ascertainment of variation has been skewed toward Europeans. Though more recent techniques have tried to fix this, the review points out that if you by necessity constrain the SNPs of interest to those that vary outside of Africa (where most of the world’s population lives), you are taking many alleles private to Africa off the table. This is relevant because the “Out of Africa” bottleneck ~50,000 years ago means that African populations harbor a lot more genetic variation than non-African populations do.
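The ascertainment problem can be sketched with a toy example. The variant IDs and allele frequencies below are invented, and `ascertain` is a hypothetical helper rather than any real chip-design procedure. It picks markers only if they vary in a chosen discovery panel, then checks which African-variable sites never make it onto the marker set.

```python
# Hypothetical allele frequencies per variant (NOT real data).
variants = {
    "snp_a": {"african": 0.30, "non_african": 0.25},
    "snp_b": {"african": 0.10, "non_african": 0.40},
    "snp_c": {"african": 0.15, "non_african": 0.00},  # private to Africa
    "snp_d": {"african": 0.05, "non_african": 0.00},  # private to Africa
    "snp_e": {"african": 0.00, "non_african": 0.08},  # private outside Africa
}

def is_polymorphic(freq):
    """A site is variable in a panel if the allele is neither absent nor fixed."""
    return 0.0 < freq < 1.0

def ascertain(variants, panel):
    """Return the set of variants that are polymorphic in the given panel."""
    return {name for name, freqs in variants.items() if is_polymorphic(freqs[panel])}

# Marker set chosen only from sites that vary outside of Africa.
chip_markers = ascertain(variants, "non_african")

# Sites that actually vary in the African panel.
african_variable = ascertain(variants, "african")

# African variation that the ascertained marker set can never detect.
missed = african_variable - chip_markers
print(sorted(chip_markers))
print(sorted(missed))
```

In this toy set the two alleles private to Africa drop out entirely, which is the pattern the review is describing.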
The move to high-quality whole genome sequencing obviates these concerns. As a matter of course, African variation will be “picked up” since the marker set is not constrained ahead of time.
Importantly, the authors focus on South Africa and the Xhosa population. This group has about 20% Khoisan genetic ancestry, which is very diverse and very distinct from the remaining ~80% of its ancestry. With its large African immigrant population and highly diverse native groups, some of them quite admixed, South Africa could actually provide some hard-to-substitute value in biomedical genetics.