The 1000 Genomes paper is out: A global reference for human genetic variation. It’s open access, so read the whole thing. Here’s the abstract:
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
The PSMC above is interesting to me. It shows BEB, the Bengali population from Dhaka, starting from a small base and exploding in size. There are some ascertainment issues that need to be acknowledged here though. The Indian Gujarati sample turns out to be about half Patel and half other Gujaratis. In contrast, the Bengalis are relatively homogeneous in ancestry (sampled from Dhaka), and don’t seem to exhibit much population structure. What I’m saying is that when the authors talk about “Gujaratis” they are really talking about “sort of Patels”, while when they talk about Bengalis, they are talking about Bengalis as a whole. There’s an apples-to-oranges aspect to this. It also needs to be kept in mind when they note the alleles private to the Gujarati (GIH) sample; that’s almost certainly due to the large number of endogamous Patels in the original Houston data set, who are going to share a lot more demographic history than you’d otherwise expect among Gujaratis.
Secondly, the bottleneck + genetic homogeneity in the admixture for Bengalis reinforces the model outlined in The Rise of Islam and the Bengal Frontier, 1204-1760. Basically the population size change above highlights that eastern Bengalis descend from a small group of founders relatively recently in the past, despite their >100 million modern census size. Genetically this has resulted in the ancestral homogeneity you see in the plot above, but culturally it also allowed for the degrading of the social institutions of Indian society, the same institutions that allowed Hinduism to be robust to nearly one thousand years of Islamic hegemony across the subcontinent. Additionally, the lack of structure in ancestral components reflects relatively little endogamy (I have checked the runs of homozygosity in my parents’ genotypes, and they’re lower than those of my South Asian friends from particular caste/jati backgrounds).
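For anyone who wants to try the runs-of-homozygosity comparison on their own genotype file, here is a minimal sketch of the idea in Python. It is not the method I used, and it is much cruder than dedicated tools (PLINK's `--homozyg` option, for instance, tolerates occasional heterozygous or missing calls inside a run). The function names, the 0/1/2 genotype coding (count of alternate alleles), and the default thresholds are all assumptions made for illustration.

```python
def find_roh(positions, genotypes, min_snps=50, min_length_bp=1_000_000):
    """Find runs of consecutive homozygous calls on one chromosome.

    positions: base-pair coordinates, sorted ascending (assumed input).
    genotypes: alternate-allele counts per SNP; 0 or 2 is homozygous,
               1 is heterozygous (assumed coding, no missing calls).
    Returns a list of (start_bp, end_bp, n_snps) tuples.
    """
    runs = []
    start = None
    # A trailing heterozygous sentinel closes any run that reaches the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:
            if start is None:
                start = i  # a new homozygous run begins here
            continue
        if start is not None:
            n_snps = i - start
            length = positions[i - 1] - positions[start]
            # Keep only runs that are long enough on both criteria.
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[start], positions[i - 1], n_snps))
            start = None
    return runs


def total_roh_bp(runs):
    """Total base pairs covered by the runs returned from find_roh."""
    return sum(end - begin for begin, end, _ in runs)
```

Summing `total_roh_bp` across chromosomes gives one number per person, which is the sort of figure you would compare between individuals: all else equal, people from strongly endogamous backgrounds should tend to carry more of their genome in long runs.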