
How Chinese genetics is like Chinese food

Representatives of Szechuan and Shandong cuisine


The Pith: The Han Chinese are genetically diverse, due to the geographic scale of their range, hybridization with other populations, and possibly local adaptation.

In the USA we often speak of “Chinese food.” This is rather peculiar because there isn’t any generic “Chinese cuisine.” Rather, there are regional cuisines, which share a broad family similarity. Similarly, American “Mexican food” and “Indian food” also have no true equivalent in Mexico or India (naturally the novel American culinary concoctions often exhibit biases in the regions from which they sample due to our preferences and connections; non-vegetarian Punjabi elements dominate over Udupi, while much authentic Mexican American food has a bias toward the northern states of that nation). But to a first approximation there is some sense in speaking of a general class of cuisine which exhibits a lot of internal structure and variation, so long as one understands that there is an important finer grain of categorization.

Some of the same applies to genetic categorizations. Consider two of the populations in the original HapMap, the Yoruba from Nigeria, and the Chinese from Beijing. There are ~30 million Yoruba, but over 1 billion Han Chinese! Even granting that the Yoruba seem excellent representatives of Sub-Saharan African genetic variation (not Bantu, but not far from the Bantu), there are still more Han Chinese than Sub-Saharan Africans (including the African Diaspora). So it's nice that over the past few years there's been a deep-dive into Han genetics. A new paper in the European Journal of Human Genetics, Natural positive selection and north–south genetic diversity in East Asia, focuses on the north-south difference among Han Chinese, using groups flanking them to the north and south as references.


First, let's back up for a moment. Who are the Han? Where did they come from? The details aren't simple, insofar as there wasn't a “Han Homesteading Act” which pushed the frontiers of Chinese culture and civilization to a limit demarcated by a national boundary line. But overall the shift in Chinese society over the past ~3,000 years has been outward, from a northern focus to the south. 2,000 years ago China proper, the zone where dominant Han ethnic habitation overlapped with Chinese political hegemony, consisted primarily of the Yellow River plain. Though the Han Dynasty extended its empire south toward Vietnam, the landscape beyond the Yangtze was still predominantly non-Han outside of a few locales. During the Han Dynasty even the Yangtze River basin was still somewhat liminal. This changed between the year 0 and 1000. The collapse of the Han Dynasty in the 3rd century led to what are sometimes termed the Chinese Dark Ages. During this period of political fragmentation much of northern China was dominated by barbarian dynasties, and Han political elites controlled the commanding heights only in the south. With the rise of the Tang in the 7th century, the shift to the Yangtze River which had occurred in the interregnum solidified. Economically, demographically, and to some extent culturally, what during the Han Dynasty would have been defined as a zone of barbarian habitation, or marginal Han civilization, had become the center of gravity of the Sinic world by 1000. By this period the domains of the Han had begun to push far south of the Yangtze, and some of the most preeminent intellectuals came out of relatively isolated southern provinces such as Fujian, on the coast between the Yangtze and Pearl River deltas. In the next 1,000 years the Han spread through many sections of southern China which had previously been redoubts of aboriginal peoples. Yunnan, for example, likely did not become majority Han until the past few centuries.

This poses a question: was this expansion of the Han a biological process, or a cultural one? Most likely it was some of both. There are even customs particular to some Chinese dialect groups, such as the Cantonese, which may have a pre-Han origin. This amalgamation, combined with the widespread geographic diversity of China, is a perfect laboratory for evolutionary processes. In Plagues and Peoples, William H. McNeill notes that demographic expansion by Han peasants (as opposed to military or bureaucratic outposts) into much of southern China during the early Imperial period was limited by disease. One presumes that transforming the landscape would have mitigated the power of pestilence somewhat, but admixture and selection may also have allowed the biologically inoculated Han to occupy areas which were previously no-go.

Here’s the abstract of the paper:

Recent reports have identified a north–south cline in genetic variation in East and South-East Asia, but these studies have not formally explored the basis of these clinical differences. Understanding the origins of these variations may provide valuable insights in tracking down the functional variants in genomic regions identified by genetic association studies. Here we investigate the genetic basis of these differences with genome-wide data from the HapMap, the Human Genome Diversity Project and the Singapore Genome Variation Project. We implemented four bioinformatic measures to discover genomic regions that are considerably differentiated either between two Han Chinese populations in the north and south of China, or across 22 populations in East and South-East Asia. These measures prioritized genomic stretches with: (i) regional differences in the allelic spectrum for SNPs common to the two Han Chinese populations; (ii) differential evidence of positive selection between the two populations as quantified by integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH); (iii) significant correlation between allele frequencies and geographical latitudes of the 22 populations. We also explored the extent of linkage disequilibrium variations in these regions, which is important in combining genetic association studies from North and South Chinese. Two of the regions that emerged are found in HLA class I and II, suggesting that the HLA imputation panel from the HapMap may not be directly applicable to every Chinese sample. This has important implications to autoimmune studies that plan to impute the classical HLA alleles to fine map the SNP association signals.

The authors do not focus much on phylogenetic relationships and the historical inferences one can draw from them. For example, they don't posit any complex migration scenario to explain the pattern of genetic substructure in China today. Instead the spotlight is on differences in allele frequencies which fall outside normal expectation, and so might have been targets of selection. To frame that appropriately in a phylogenetic context they pooled a wide range of data sets (HGDP, HapMap, SGVP) and generated a PCA which illustrates the relationships of East Asian populations on a two-dimensional plot. The figure is rather hard to make out because of similarities in color coding, but the basic result is shown to the left. You see a north-south axis within China, and some separation from groups to the north and south. Interestingly, some Chinese ethnic minorities fall within the range of variation of the Han. There are many reasons this could be. The minorities might already have been nested within the original Han range of variation before the Han demographic expansion. There could have been extensive gene flow between the Han and minorities, in particular into the minorities if the Han were far more numerous. And of course many Han dialect groups could simply be culturally assimilated minorities if you go back far enough. A combination of these, weighted differently in different contexts, is certainly the best approximation to what occurred. Neither pure replacement nor pure cultural diffusion seems tenable as a robust explanation. Additionally, the best check on the relationship between Han and minorities is to look at differences within the same province. So Han from Yunnan should be compared with ethnic minorities from the same locale, instead of using Han from Guangdong as proxies for “South Chinese.” I suspect that the gap between the Dai and the southern Chinese is partially an artifact of undersampling Han from those isolated regions of China where they live cheek-by-jowl with the Dai.
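For readers who have not seen a genotype PCA built up from scratch, here is a minimal Python sketch of the idea. It is not the authors' pipeline. It assumes genotypes coded as 0/1/2 copies of one allele per individual and SNP, no missing data, and no per-SNP scaling; real pipelines usually scale SNPs and handle missing calls. It finds only the first principal component, by power iteration. The individual scores are what a PCA plot displays, and the SNP weights (loadings) measure how much each SNP contributes to that axis. The function names and the toy "north" and "south" genotypes are hypothetical.

```python
import math
import random

def center_genotypes(genotypes):
    """Subtract each SNP's mean from an individuals x SNPs matrix of 0/1/2 allele counts."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]

def first_pc(genotypes, iterations=200, seed=0):
    """First principal component via power iteration on X^T X (X = centered genotypes).

    Returns (scores, loadings): scores place each individual on PC1; loadings are
    unit-length SNP weights, i.e. each SNP's contribution to PC1.
    Assumes at least one SNP is polymorphic, otherwise the norm below is zero.
    """
    x = center_genotypes(genotypes)
    n, m = len(x), len(x[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(m)]  # random starting SNP-weight vector
    for _ in range(iterations):
        xv = [sum(x[i][j] * v[j] for j in range(m)) for i in range(n)]  # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(x[i][j] * v[j] for j in range(m)) for i in range(n)]
    return scores, v

# Hypothetical toy genotypes: three "north" and three "south" individuals, three SNPs.
# SNP 2 is identical in everyone, so it cannot contribute to PC1.
north = [[2, 2, 1], [2, 1, 1], [1, 2, 1]]
south = [[0, 0, 1], [0, 1, 1], [1, 0, 1]]
scores, loadings = first_pc(north + south)
print([round(s, 2) for s in scores])
print([round(v, 2) for v in loadings])
```

The overall sign of a principal component is arbitrary. Only the separation between the two toy groups is meaningful, not which group comes out positive.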

But the rationale for this paper was to shine a light on the effects of natural selection on the Han genome and possible adaptations, not the systematics of East Asian human populations. As noted in the abstract, they used several methods to get at this issue. First, they looked at the correlation between allele frequencies and latitude. The logic is presumably that latitude is correlated with climate and other geographical parameters which serve as environmental selection pressures. All things being equal, northern climes, for example, will have fewer pathogens and parasites. Consider the value of a frost season in killing many surface soil organisms. Second, they looked at differences in Fst between Han of the north and Han of the south. Fst is a measure of between-population genetic differentiation. As it converges upon zero there's basically no difference between the populations in question, while a value of 1.0 would indicate that all the variation is partitioned between the two groups, so that you could use a marker to perfectly assign an individual to a population. The authors took the average difference between north and south Han as a baseline, and looked for genomic regions where the differences were far greater than expected. They also looked at the contribution of a given SNP to the variation you saw illustrated in the PCA. Big contributions to the inter-population variation obviously indicate differences across populations. Finally, they looked at haplotype structure as a signature of natural selection. While Fst focuses on specific points in the genome, haplotype structure elucidates patterns across genes, sequences of markers. Natural selection tends to temporarily homogenize genomic regions as a particular variant rises in frequency and drags along its neighbors, which hitchhike in a selective sweep.
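Two of the scans described above reduce to simple per-SNP arithmetic. Here is a hedged Python sketch, assuming allele frequencies have already been estimated for each population. It includes a textbook heterozygosity-based Fst for two populations (Nei's G_ST form), the Pearson correlation between one SNP's allele frequency and sampling latitude, and a helper that keeps the top fraction of a genome-wide score distribution as candidate outliers. The paper's exact estimators and thresholds may differ. The function names, the 0.5% default tail, and all numbers in the example are hypothetical.

```python
import math

def fst_two_pops(p1, p2):
    """Heterozygosity-based Fst (Nei's G_ST form) for one biallelic SNP.

    Fst = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within
    the two populations and H_T is that of the pooled, equally weighted frequency.
    0 means identical frequencies; 1 means each population is fixed for a different allele.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic in both populations: nothing to partition
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t

def latitude_correlation(freqs, latitudes):
    """Pearson correlation between one SNP's allele frequency and sampling latitude."""
    n = len(freqs)
    mean_f, mean_lat = sum(freqs) / n, sum(latitudes) / n
    cov = sum((f - mean_f) * (lat - mean_lat) for f, lat in zip(freqs, latitudes))
    var_f = sum((f - mean_f) ** 2 for f in freqs)
    var_lat = sum((lat - mean_lat) ** 2 for lat in latitudes)
    return cov / math.sqrt(var_f * var_lat)

def top_tail(values, tail=0.005):
    """Indices of the top `tail` fraction of a genome-wide score distribution.

    Ties are resolved by index order because Python's sort is stable.
    """
    k = max(1, round(len(values) * tail))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical toy data: (north, south) allele frequencies for five SNPs.
snp_freqs = [(0.50, 0.48), (0.30, 0.35), (0.90, 0.20), (0.10, 0.12), (0.60, 0.55)]
fsts = [fst_two_pops(pn, ps) for pn, ps in snp_freqs]
print(top_tail(fsts, tail=0.2))  # [2]

# Hypothetical sampling latitudes (north to south) and one SNP's frequencies there.
latitudes = [40.0, 35.0, 30.0, 25.0, 20.0]
freq_a = [0.9, 0.75, 0.6, 0.45, 0.3]
print(round(latitude_correlation(freq_a, latitudes), 3))  # 1.0
```

With thousands of SNPs instead of five, the same tail helper applied to real scores is what "far greater than expected" amounts to in practice. The cutoff is set by the genome-wide distribution rather than by an absolute Fst value.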

For the haplotype scans they used two methods, iHS and XP-EHH, which differ in their power to detect selective events. iHS is better at catching sweeps in mid-stream, where the selected allele has not yet reached fixation, while XP-EHH picks up nearly completed sweeps. The two methods complement each other and rely on similar logic. Again, as with Fst, the authors focused on regions of the genome which fell in the tails of the distribution one would expect given the genome-wide genetic distance between each pair of populations.
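To make the haplotype logic concrete, here is a minimal Python sketch of extended haplotype homozygosity (EHH) and an unstandardized iHS-style score. It illustrates the idea under simplifying assumptions and is not the software the authors used. Haplotypes are phased strings of 0/1 alleles, '0' is assumed to be ancestral and '1' derived, and EHH is integrated over physical distance by the trapezoid rule until it drops below 0.05. The standardization within allele-frequency bins that the published iHS applies is omitted. XP-EHH, which compares integrated haplotype homozygosity for the same region between two populations, is not implemented. Function names, marker positions, and haplotypes are hypothetical.

```python
import math
from collections import Counter

def ehh(haplotypes, core, allele, end):
    """EHH between marker indices `core` and `end`.

    Among haplotypes carrying `allele` at `core`, returns the probability that two
    randomly chosen ones are identical at every marker from core to end.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    counts = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return identical_pairs / (n * (n - 1) // 2)

def integrated_ehh(haplotypes, core, allele, positions, cutoff=0.05):
    """Trapezoid-integrate EHH over distance, walking out from the core in both
    directions and stopping once EHH falls below `cutoff`."""
    total = 0.0
    for step in (1, -1):
        j, prev = core, 1.0  # all carriers trivially match at the core itself
        while 0 <= j + step < len(positions):
            nxt = ehh(haplotypes, core, allele, j + step)
            if nxt < cutoff:
                break
            total += (prev + nxt) / 2 * abs(positions[j + step] - positions[j])
            j, prev = j + step, nxt
    return total

def unstandardized_ihs(haplotypes, core, positions, ancestral="0", derived="1"):
    """ln(iHH_ancestral / iHH_derived); negative values mean the derived allele sits
    on longer shared haplotypes. Assumes each allele has at least two carriers
    (otherwise its integrated EHH is 0 and the log ratio is undefined)."""
    ihh_a = integrated_ehh(haplotypes, core, ancestral, positions)
    ihh_d = integrated_ehh(haplotypes, core, derived, positions)
    return math.log(ihh_a / ihh_d)

positions = [0, 10, 20, 30, 40]  # hypothetical marker positions (kb)
# Four derived-allele chromosomes share one long haplotype; four ancestral ones are diverse.
haps = ["00100"] * 4 + ["00000", "01001", "10010", "11011"]
score = unstandardized_ihs(haps, 2, positions)
print(round(score, 3))  # -1.099
```

In the toy data every derived-allele chromosome carries the same long haplotype, while the ancestral chromosomes break apart within one marker of the core. The score therefore comes out negative, which is the signature of an incomplete sweep that iHS is designed to catch.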

What did they find? Here’s a table which shows you some genes:

MAF-latitude correlation | FST (CHB vs CHS) | XP-EHH | iHS (CHB) | iHS (CHS) | SNP loadings | Genes
2.1 × 10⁻⁵ (rs6901084) | 0.50% | 0.5% (positive) | 0.01% | 0.01% | 0.10% | HLA-DRB1, HLA-DQA1-2, HLA-DOB, PSMB9, BRD2, TAP2, PSMB8, TAP1, HLA-DMB, HLA-DMA, HLA-DOA
2.0 × 10⁻⁴ (rs4489283) | No evidence | 0.5% (positive) | 0.50% | 0.50% | 0.10% | NRG1
6.6 × 10⁻⁵ (rs2370969) | No evidence | 0.1% (negative) | 0.50% | 0.10% | 0.10% | WDR48, GORASP1, TTC21A, AXUD1, CMYA1, CX3CR1, CCR8, SLC25A38, LAMR1, MOBP
9.3 × 10⁻⁴ (rs6762261) | No evidence | No evidence | 0.10% | 0.50% | 0.50% | EPHB1
9.5 × 10⁻⁴ (rs986148) | No evidence | 0.1% (positive) | 0.10% | NA

The first thing that jumps out at me is HLA. These genes are involved in immune response, and are extremely polymorphic. If you're going to see regional differences correlated with ecology, this is where you'd look. The expansion of the Han into southern China was probably accompanied by changes in the immunological portfolio typical of the peasantry. It isn't in this table, but other genes found at the intersection of tests are LPP and ADH. The former has been implicated in celiac disease, while the latter is an alcohol dehydrogenase locus. When it comes to natural selection disease matters a lot, but so does digestion. I don't have a good explanation for the patterns here, but there are differences in cuisine within China. Rice is dominant in the center and south, while wheat and millet dominate the north. It would be interesting to know if there are also variations in alcohol production and consumption. China is in many ways comparable to Europe, and in Europe there are north-south differences both in ADH and in cultural norms around the amount and nature of alcohol consumption. Finally you have something like NRG1, which seems to be a locus of neurological function. It doesn't exhibit a difference between the two Han groups, but seems to have been the target of natural selection within the overall population. Perhaps the social norms of the culture and society of Han China reshaped the personality profiles of the population?

Going back to the analogy with cuisine: like food, the components and elements of genetic variation are shaped by different forces. Modern Italian cuisine, for example, depends upon basic elements which were common in Italy 2,000 years ago (e.g., olive oil), but it has changed a great deal since the Columbian Exchange (e.g., tomatoes). Descent shapes the possibilities of future culinary options by fixing some constraints and preferences (traditional Jewish food is light on shellfish!). But over time new variants can arise and alter the original base. Additionally, there are local adaptations. The Cajuns are descended from Acadians, from the maritime provinces of Canada. Obviously spicy crayfish concoctions were not part of their original culinary portfolio, but they had to make do with the options available in their new ecology. There's a strong correlation between warmer climes and spice, probably having to do with the antibacterial properties of many of these non-nutritious additives (from what I know, South Indian and South Chinese cuisines are both much spicier than North Indian and North Chinese fare). Within any broad family of cuisines one must acknowledge both unity and diversity. And the same applies within a cultural-genetic macro-region on the scale of China.

Image credit: Rolf Muller
