May 05, 2005

Measuring Genetic Diversity: Part 2

My post yesterday stopped just as I was getting to the most important bit: how diversity 'between populations' is measured. Here is the rest...

The concept of heterozygosity as a measure of diversity can be extended from a single population to two populations (or sub-divisions of a single population). [For what is meant by 'heterozygosity' in this context see the previous post, including Note 1.] Suppose we have two populations, A and B, with the same set of alleles at a given locus, but different frequencies. (I will assume in what follows that populations are of equal size, and that we are considering diversity at a single locus.) If we know the frequencies, it is easy to calculate the probability that two genes selected randomly at that locus, one from each population, are homozygous for a given allele. We can then total the probabilities for each allele and subtract the sum from 1 to get a figure for heterozygosity, just as in the case of a single population. We can call this HD, to indicate that it is heterozygosity between genes selected from the two different populations. (Warning!! some authors use the term HD for a different concept.)

Unless the frequencies for all alleles are the same in populations A and B, HD is bound to be higher than the average H within A and B separately. If we take the average frequency of an allele in the two populations, which we can call M, the frequency in one population will be M+d and in the other M-d. (The M's and d's may of course be different for different alleles, subject to the constraint that the M's must sum to 1 and the d's in each population must sum to 0.) The homozygosity between A and B for that allele will be (M+d)(M-d) = M^2 - d^2, so HD over all alleles will be 1 - ΣM^2 + Σd^2. The average homozygosity for an allele within the two populations will be M^2 + d^2 [see Note 4], so the average heterozygosity within the populations, over all alleles, must be 1 - ΣM^2 - Σd^2. This is 2Σd^2 less than HD; in other words, the heterozygosity between the two populations is greater by 2Σd^2 than the average heterozygosity within them. It might therefore seem natural to take 2Σd^2 as an indicator of the 'difference', 'divergence', 'diversity', or 'distance' between the two populations.
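The algebra in this paragraph is easy to check numerically. The short Python sketch below uses made-up frequencies for three alleles (the numbers and the function name are purely illustrative, not taken from any data set) and confirms that heterozygosity between the populations exceeds the average within them by 2Σd^2.

```python
def heterozygosity(p, q):
    """Probability that a gene drawn using frequencies p and one drawn using q differ."""
    return 1.0 - sum(a * b for a, b in zip(p, q))

# Hypothetical allele frequencies at one locus in populations A and B.
freq_a = [0.5, 0.3, 0.2]
freq_b = [0.2, 0.3, 0.5]

h_a = heterozygosity(freq_a, freq_a)   # H within A
h_b = heterozygosity(freq_b, freq_b)   # H within B
h_w = (h_a + h_b) / 2                  # average H within the two populations
h_d = heterozygosity(freq_a, freq_b)   # HD: one gene from each population

# d is half the difference in frequency, so the frequencies are M+d and M-d.
d = [(a - b) / 2 for a, b in zip(freq_a, freq_b)]
two_sum_d2 = 2 * sum(x * x for x in d)

print(h_w, h_d, h_d - h_w, two_sum_d2)
```

With these frequencies the average heterozygosity within is .62, HD is .71, and the gap of .09 matches 2Σd^2.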

However, this is not how diversity between two populations is usually measured. Suppose we consider the two populations as subdivisions of a single larger population. The average frequency for an allele in the combined population is M, so the heterozygosity for two genes selected at random (at the same locus) within the combined population is 1 - ΣM^2. We can call this HT (with T for 'total') to indicate heterozygosity in the total combined population. [See Note 5.] But the average heterozygosity within the two subpopulations is 1 - ΣM^2 - Σd^2 (see previous paragraph). If we call this HW (W for 'within'), it will be seen that HT = HW + Σd^2. It can also be seen that HD - HT = Σd^2, which may be interpreted as the excess of heterozygosity when two genes are selected from different subpopulations, over and above its level if they are selected at random from the total population. It is natural for a geneticist to see this as analogous to the partitioning of variance for some trait into 'within-group' and 'between-group' components. We can therefore define between-group heterozygosity not as HD, or as HD - HW, but as HD - HT. If we call this HB (B for 'between'), then HT = HW + HB. The heterozygosities within and between groups can of course also be expressed as proportions of total heterozygosity, in the form HW/HT and HB/HT. Since HB equals HT - HW, we can also express HB/HT as (HT - HW)/HT, or as 1 - HW/HT. These are all common expressions for Masatoshi Nei's GST, introduced in 1973, which is probably the most widely used measure of diversity 'between groups' in population genetics. (Warning!! Different authors use different abbreviations for the various components.) GST can be calculated as Σd^2/(1 - ΣM^2). For the case of two alleles, it is equivalent to Sewall Wright's FST [see Note 6], which for two populations is d^2/pq [see Note 7], where p and q are the mean frequencies of the two alleles.
Lewontin, in his original study of human genetic diversity, used a slightly different measure which produces similar results to GST or FST. GST can also be applied to cases with more than two populations, with unequal population sizes, or with repeated hierarchical subdivisions. (Nei introduced it in this general form, with a more complex derivation than I have given here for the special case of two equal populations.)
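As a rough illustration of the partition, here is a minimal Python sketch for the special case treated above (two populations of equal size, one locus). The function name and the example frequencies are assumptions made for the illustration; Nei's general formulation covers more cases than this.

```python
def partition_heterozygosity(freq_a, freq_b):
    """Split total heterozygosity into within and between parts (two equal populations)."""
    m = [(a + b) / 2 for a, b in zip(freq_a, freq_b)]  # mean frequency M of each allele
    d = [(a - b) / 2 for a, b in zip(freq_a, freq_b)]  # deviation d of each allele
    h_t = 1 - sum(x * x for x in m)   # HT = 1 - ΣM^2
    h_b = sum(x * x for x in d)       # HB = Σd^2
    h_w = h_t - h_b                   # HW = HT - HB
    # GST is undefined if HT is 0 (both populations fixed for the same allele).
    return {"HT": h_t, "HW": h_w, "HB": h_b, "GST": h_b / h_t}

# Hypothetical two-allele locus: frequencies .6/.4 in A and .4/.6 in B.
parts = partition_heterozygosity([0.6, 0.4], [0.4, 0.6])
print(parts)

# Two-allele check against the FST form d^2/pq, with p = q = .5 and d = .1.
fst_check = 0.1 ** 2 / (0.5 * 0.5)
print(fst_check)
```

Here HT is .5, HW is .48 and HB is .02, so GST is .04, the same value as d^2/pq.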

It is not wrong to use GST as an indicator of diversity between populations. However, it seems more natural to do so if we start with a focus of interest on a total population divided into many sub-populations, than if we are starting with just two populations, and want to quantify the extent of difference between them. For the latter purpose it seems more natural to compare heterozygosity between the populations with the heterozygosity within them, perhaps by calculating (HD - HW)/HW. If our primary interest is in (say) population A - as might be the case if we are members of it! - it would also be relevant to calculate (HD - HA)/HA, where HA is heterozygosity within population A. (Note that this will be different from the equivalent calculation for population B if one population is more internally uniform than the other.) This will give an indicator of the extent to which an individual in the other population is likely to be genetically different from oneself as compared with another member of one’s own population. Typically, (HD - HA)/HA will be about twice the size of GST calculated for the two populations. If on the other hand we are considering many populations, GST, calculated for all of them, will give an approximation to the average difference if we make pairwise comparisons between them. [Note 8]
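The relation between these alternative ratios and GST can be seen in a small Python sketch. The frequencies are hypothetical, the names are illustrative, and the calculation assumes two populations of equal size, as in the rest of this post.

```python
def het(p, q):
    """Heterozygosity for one gene drawn using frequencies p and one using q."""
    return 1.0 - sum(a * b for a, b in zip(p, q))

def compare_measures(freq_a, freq_b):
    h_a = het(freq_a, freq_a)      # HA: within population A
    h_b = het(freq_b, freq_b)      # within population B
    h_d = het(freq_a, freq_b)      # HD: between A and B
    h_w = (h_a + h_b) / 2          # HW: average within
    h_t = (h_d + h_w) / 2          # HT, since HD = HT + Σd^2 and HW = HT - Σd^2
    # The ratios are undefined if HA or HW is 0 (a population fixed for one allele).
    return {
        "(HD-HW)/HW": (h_d - h_w) / h_w,
        "(HD-HA)/HA": (h_d - h_a) / h_a,
        "GST": (h_t - h_w) / h_t,
    }

# Hypothetical two-allele locus: .6/.4 in A and .4/.6 in B.
result = compare_measures([0.6, 0.4], [0.4, 0.6])
print(result)
```

In this example GST is .04 while (HD - HA)/HA is about .083, roughly twice as large.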

It seems then that GST may be suitable for comparison of many populations, while some other measure may be more suitable if we are just comparing two. But a more fundamental problem, which applies to any measures based on a partition of heterozygosity, is that the outcome is affected by the level of diversity or uniformity within populations. To dramatise this point, suppose population A has 5 alleles at a locus, each with frequency .2, while population B has 5 completely different alleles at that locus, also with frequency .2. In the combined population there are therefore 10 alleles, each with frequency .1. GST will therefore be (10 x .01)/(1 - [10 x .01]) = .11. Thus only 11% of total heterozygosity at this locus is accounted for by the differences between the two populations. Yet in fact they are genetically completely different! In principle, they might even be separate species. To take a more realistic example, suppose there are 3 alleles with frequency 4:3:3 in one population and 8:1:1 in the other. These frequencies are markedly different, yet GST comes out at only about .11 (Σd^2 = .06 and 1 - ΣM^2 = .56). The point can even be illustrated with a 2-allele system. Suppose one population has the alleles in frequencies 7:3 and the other in frequencies 3:7. The two populations are therefore markedly different in allele frequencies, yet GST will be only .16. The underlying problem is that GST (like other measures based on comparisons of heterozygosity) is measuring not just the difference between populations but the uniformity or diversity within them. Even in a 2-allele system, it is impossible to get a high value of GST unless each population is quite uniform, with a different single allele predominant in each. When GST (or FST) in humans is compared with that of other large mammals, where GST is often higher than for humans, this does not necessarily mean that human populations are very similar to each other, so much as that other animal populations are more internally uniform. 
This could just reflect the small population size of many large mammals, and the consequent strength of genetic drift. A low level of GST can be due to a low level of difference between populations, a high level of diversity within them, or any combination of the two. Knowing the level of GST by itself therefore strictly tells us nothing about the extent of genetic difference between populations.
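The three examples in this paragraph can be recomputed directly from the formula GST = Σd^2/(1 - ΣM^2). The Python sketch below is only an arithmetic check under the same assumptions as the text (two equal-sized populations, one locus); the function name and labels are illustrative.

```python
def gst(freq_a, freq_b):
    """GST for two equal-sized populations: Σd^2 / (1 - ΣM^2)."""
    m = [(a + b) / 2 for a, b in zip(freq_a, freq_b)]
    d = [(a - b) / 2 for a, b in zip(freq_a, freq_b)]
    return sum(x * x for x in d) / (1 - sum(x * x for x in m))

examples = {
    "10 alleles, none shared": ([0.2] * 5 + [0.0] * 5, [0.0] * 5 + [0.2] * 5),
    "4:3:3 vs 8:1:1": ([0.4, 0.3, 0.3], [0.8, 0.1, 0.1]),
    "7:3 vs 3:7": ([0.7, 0.3], [0.3, 0.7]),
}

for label, (a, b) in examples.items():
    print(f"{label}: GST = {gst(a, b):.3f}")
```

On this formula the three cases come out at roughly .11, .11 and .16 respectively, all low despite the obvious differences in allele frequencies.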

These reservations about the use of GST are not entirely new. Nei himself, in his 1973 paper, noted that 'the estimate obtained in one population cannot be compared with that of another, unless the breeding system is similar for the two populations. If HS [= my HW] is small, GST may be very large even if the absolute gene differentiation is small'. However, Nei doesn't seem to have mentioned the converse problem that if HW is large, GST may be small even if absolute differentiation is large. Anyway, Nei's words of caution seem to have been generally ignored, and textbook writers and others have cheerfully compiled comparative tables of GST in different species or sub-species without considering whether the breeding systems of the populations are similar. This can lead to misunderstanding even by professional biologists. Brian Charlesworth, an eminent geneticist, drew attention to the dangers in a 1998 paper, saying: 'relative measures of between-population divergence, such as FST [or GST] are inherently dependent on the extent of within-population diversity. Indeed, for loci with very high levels of diversity such as microsatellites, FST is a poor measure of between-population divergence even in the absence of forces that affect diversity, since FST is necessarily low even if absolute divergence is high…'

Although GST and FST are the most widely used measures of between-population diversity, other formulae have sometimes been proposed. Charlesworth mentions two alternatives, which in my notation are equivalent to (HD-HW)/HD and (HD-HW)/2HT. Interestingly, Wright himself (Wright, p.413) originally proposed using the square root of FST as the measure of divergence between populations, which would tend to raise the level of divergence for low-FST populations as compared with using raw FST figures. But these measures are still sensitive to within-population diversity, since HW enters into the denominator in one way or another.

In his 1973 paper Nei suggested using the average gene diversity between populations (my HD), after subtracting the average within-population diversity (my HW), as an ’absolute measure of gene differentiation’, and claimed that it is ’independent of the gene diversity within subpopulations’. I am not sure that this is correct. It may be true if we are considering cases where internal diversity within subpopulations is low (which Nei seems to have been mainly concerned with), but not when it is high. The measure can be expressed as HD - HW = 2Σd^2, or 2HB in my notation. However, 2HB is still subject to the constraint that total heterozygosity between populations (HD) cannot be greater than 1, and 2HB is HD - HW. If HW is high, 2HB must therefore be low. For example, in the 10-allele case mentioned above, it would give the result 2HB = .2, which seems unsatisfactory as a measure of ’absolute divergence’ in a case where the two populations have no alleles in common!
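The ceiling on HD - HW can be seen by recomputing the 10-allele case. This is a small Python check of the arithmetic only, with illustrative names, assuming the same two equal-sized populations as before.

```python
def het(p, q):
    """Heterozygosity for one gene drawn using frequencies p and one using q."""
    return 1.0 - sum(a * b for a, b in zip(p, q))

# Population A has five alleles at .2 each; B has five different alleles at .2 each.
freq_a = [0.2] * 5 + [0.0] * 5
freq_b = [0.0] * 5 + [0.2] * 5

h_d = het(freq_a, freq_b)                              # HD: no shared alleles
h_w = (het(freq_a, freq_a) + het(freq_b, freq_b)) / 2  # HW: average within
absolute_diff = h_d - h_w                              # HD - HW = 2HB

print(h_d, h_w, absolute_diff)
```

HD is already at its maximum of 1, yet HW of .8 leaves only .2 for the difference.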

Nei himself had already suggested in 1972 a measure of 'genetic distance' between two populations which seems to avoid most of the problems discussed above. This measure, Nei's D, is based on homozygosity - the probability that two randomly selected genes at a locus are identical - rather than heterozygosity. If we express average homozygosity within population A as H'A, and within population B as H'B, while homozygosity for genes selected one from each population is H'AB, then Nei's D can be expressed as minus log(H'AB/√[H'A.H'B]), where log stands for the logarithm (to base e) of the expression in brackets. H'AB/√[H'A.H'B] is the homozygosity between populations A and B divided by the geometric mean of the homozygosity within the two populations. If the two populations have identical gene frequencies for all alleles, then homozygosity between the populations will be the same as within them, and H'AB/√[H'A.H'B] will be 1. Its logarithm will therefore be 0. Where the gene frequencies are not identical between the populations, H'AB will be smaller than √[H'A.H'B], so H'AB/√[H'A.H'B] will be a fraction between 0 and 1; it will be 0 if the two populations have no alleles in common, since H'AB will then be 0. The logarithm of a fraction is a negative number. For values of the fraction greater than 1/e but less than 1 the log will be a negative fraction; for the value 1/e it will be minus 1; and for values from 1/e to 0 it will be a negative number increasing (in absolute value) by 1 for each power of 1/e in the value of the fraction. As the fraction approaches close to 0, its log therefore goes to 'minus infinity'. [Note 9] Since D is minus log(H'AB/√[H'A.H'B]), the minus sign converts the negative values of the log into positive ones.
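For a single locus the calculation just described can be sketched in a few lines of Python. The function name and the example frequencies are assumptions for illustration; in practice the homozygosities would be averaged over many loci before taking the logarithm.

```python
import math

def nei_d(freq_a, freq_b):
    """Single-locus sketch of Nei's D: -ln(H'AB / sqrt(H'A * H'B))."""
    hom_a = sum(a * a for a in freq_a)                     # H'A
    hom_b = sum(b * b for b in freq_b)                     # H'B
    hom_ab = sum(a * b for a, b in zip(freq_a, freq_b))    # H'AB
    if hom_ab == 0:
        # No alleles in common: the log goes to minus infinity, so D is infinite.
        return math.inf
    return -math.log(hom_ab / math.sqrt(hom_a * hom_b))

print(nei_d([0.5, 0.5], [0.5, 0.5]))   # identical frequencies: D is 0 (up to rounding)
print(nei_d([0.7, 0.3], [0.3, 0.7]))   # the 7:3 vs 3:7 case: about 0.32
print(nei_d([1.0, 0.0], [0.0, 1.0]))   # no shared alleles: infinite
```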

While at first sight rather daunting, Nei's D has some attractive properties. Nei himself stressed its value for studies of population structure and evolution, which I am not competent to assess. But simply as a descriptive measure of genetic difference between populations, it seems preferable to GST, as it does not seem to be seriously distorted by the extent of heterozygosity within populations. However, it does have the drawback that for values of H'AB/√[H'A.H'B] approaching 0, the value of D increases disproportionately, as it 'goes to infinity'.

I do wonder whether for the basic purpose of summarising genetic difference between two populations, it would not be better simply to take the sum of the absolute differences in the frequencies of alleles between them (including zero frequencies for any alleles that are absent from one population). For example, if one population has alleles a, b, c, d, and e with frequencies .3, .2, .3, .1, and .1, and the other has alleles a, b, c, and d, with frequencies .5, .3, .1, and .1 the absolute differences would be .2, .1, .2, 0, and .1. The sum of the differences would therefore be .6. Such sums could of course be averaged over several loci. The maximum range of this indicator would be from 0 (no differences at all) to 2 (no alleles in common). If it is preferred to have only values ranging from 0 to 1, the indicator could be divided by 2. I am aware that this is a very crude measure, with no sophisticated rationale in population genetics, but it is easy to calculate, and does not seem to give intuitively absurd results for any scenario I can think of.
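The crude measure suggested here is simple enough to write down directly. The Python sketch below reproduces the worked example; the function name and the use of dictionaries keyed by allele label are choices made for the illustration.

```python
def sum_abs_differences(freq_a, freq_b):
    """Sum of absolute differences in allele frequency; absent alleles count as 0."""
    alleles = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(x, 0.0) - freq_b.get(x, 0.0)) for x in alleles)

pop_1 = {"a": 0.3, "b": 0.2, "c": 0.3, "d": 0.1, "e": 0.1}
pop_2 = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}

total = sum_abs_differences(pop_1, pop_2)
print(round(total, 3))       # on the 0-to-2 scale
print(round(total / 2, 3))   # rescaled to run from 0 to 1
```

For the example in the text this gives .6 on the 0-to-2 scale, or .3 after dividing by 2.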

I think the main lesson to draw is that heterozygosity does not capture everything we are interested in if we want to measure genetic diversity at population level. If its limitations are not understood, there is a danger of drawing unfounded or absurd inferences. To illustrate this, consider a remark made by Lewontin in his book on Human Diversity. Having summarised the evidence that on average 85% of human diversity is within populations, as measured by GST or similar measures, he remarks that ‘To put the matter crudely, if, after a great cataclysm, only Africans were left alive, the human species would have retained 93% of its total genetic variation, although the species as a whole would be darker skinned. If the cataclysm were even more extreme and only the Xhosa people of the southern tip of Africa survived, the human species would still retain 80% of its genetic variation. Considered in the context of the evolution of our species, this would be a trivial reduction’ (Lewontin, p. 123).

To see the fallacy in this, suppose we consider a 'population' made up of many different animal species, ranging from ants to zebras. If each of these species has an internal average heterozygosity of .85 (which is quite possible, for large widely-ranging species), then GST calculated for the whole 'population' will be no higher than .15. By Lewontin's logic, all but one of these species could be exterminated without greatly reducing 'genetic diversity'. Not a conclusion to please the tree-huggers! This is not to say Lewontin is necessarily wrong about the human case, but his inference cannot properly be drawn solely from measurements of GST. Before saying anything definite about the importance of genetic diversity between human populations, it would be necessary to consider other measures such as Nei's D, which more directly measure the difference in gene frequencies between them. Nei and colleagues have done this for a number of genetic markers in the major human 'races' (African, Asian, and European) and found fairly small values for D (less than 0.1 on average), which suggests that Lewontin's conclusion is not in fact unreasonable (see Nei, Livshits and Ota). But it still seems undesirable that the 85/15 figure should be so widely used with so little consideration of what it actually means.

Note 4: Homozygosity in the population with p = M+d will be (M+d)^2 = M^2 + d^2 + 2Md, and in the population with p = M-d will be (M-d)^2 = M^2 + d^2 - 2Md. The average of these is M^2 + d^2.

Note 5: The sources of this heterozygosity can be analysed into three components. There is a ½ chance of selecting one gene from each subpopulation, a ¼ chance of selecting them both from the population with allele frequency M - d, and a ¼ chance of selecting them both from the population with allele frequency M + d. The probabilities of homozygosity add up to ½(M+d)(M-d) + ¼(M-d)^2 + ¼(M+d)^2 = M^2, so HT over all alleles is 1 - ΣM^2 as expected.

Note 6: The terms FST and GST are used almost interchangeably in the literature. I won’t explore the exact relationship between the two measures. It is sometimes said that they are conceptually different but quantitatively the same. Nei’s derivation is certainly clearer than Wright’s, which is closely connected to his theories of inbreeding and genetic drift. Wright’s ’F’ is his measure of inbreeding, which he interpreted as the coefficient of correlation between uniting gametes. His explanations were notoriously obscure, and according to W. G. Hill the interpretation in terms of correlation is now unfamiliar to most geneticists. Incidentally, in the 1943 paper which is usually cited as the source of FST, Wright doesn’t actually use this term.

Note 7: More generally, for two or more populations Wright’s FST can be expressed as Vp/M’pM’q, where Vp is the variance of the frequency of one of the alleles among the populations, M’p is its mean frequency, and M’q is the mean frequency of the other allele. For two populations this reduces to d^2/pq. This assumes that we are taking the variance directly from the two populations themselves. If on the other hand we are estimating the variance among a wider ensemble of populations, using the observed populations as a sample basis for the estimate, the formula would need to be adjusted to allow for the fact that the variance of a sample is usually lower than the true population variance (technically it is a ‘biased statistic‘). For a ’sample’ of only two populations, the adjustment would double the variance, and therefore also double the value of FST. This adjustment seems inappropriate if we are only interested in measuring diversity between two populations. I mention this because Cavalli-Sforza et al, p.26-7, use the adjusted formula without explaining it, and it took me some time to work out why it was different from the formula I had seen elsewhere.
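The effect of the adjustment described in this note can be seen in a short Python sketch. The function and its `sample_correction` flag are hypothetical names introduced for the illustration; the frequencies are those of the 7:3 vs 3:7 example.

```python
def fst_two_alleles(p_values, sample_correction=False):
    """FST as Vp / (mean p * mean q) for one allele's frequency across populations."""
    n = len(p_values)
    mean_p = sum(p_values) / n
    # Divide by n for the variance of the populations themselves, or by n - 1
    # when treating them as a sample from a wider ensemble of populations.
    divisor = n - 1 if sample_correction else n
    var_p = sum((p - mean_p) ** 2 for p in p_values) / divisor
    return var_p / (mean_p * (1 - mean_p))

direct = fst_two_alleles([0.7, 0.3])
adjusted = fst_two_alleles([0.7, 0.3], sample_correction=True)
print(direct, adjusted)
```

With two populations the unadjusted figure is .16 (the same as d^2/pq) and the adjusted figure is double that, .32.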

Note 8: GST will be exactly equal to this average if we include the zero 'differences' between each population and itself.

Note 9: see e.g. Fine, p.377. Jobling et al, p.168, state incorrectly that D varies between 0 and 1. They also give the formula for D incorrectly, by omitting a necessary summation sign and putting a bracket in the wrong place. Cavalli-Sforza et al, p.27, give a correct version of the formula under the heading ’Nei’s Unbiased Genetic Distance’, and call it DN. This is different from the measure called D further up on the same page.


L. Cavalli-Sforza et al.: The History and Geography of Human Genes, 1994
*B. Charlesworth: ‘Measures of divergence between populations and the effect of forces that reduce variability’, Molecular Biology and Evolution, 15, 1998, 538-43.
H. B. Fine: College Algebra (Dover edn., 1961)
*W. G. Hill: 'Sewall Wright's "Systems of mating"', Genetics, 143, 1996, 1499-1506.
M. Jobling et al: Human Evolutionary Genetics, 2004
R. Lewontin: Human Diversity, 1982
*M. Nei: ‘Genetic distance between populations’, American Naturalist, 106, 1972, 283-92.
*M. Nei: ‘Analysis of gene diversity in subdivided populations’, Proc. Nat. Acad. Sci., 70, Dec 1973, pp.3321-3323.
*M. Nei, G. Livshits and T. Ota: ‘Genetic variation and evolution of human populations’, 1993, in Genetics of Cellular, Individual, Family and Population Variability, ed. C. Hanis.
Sewall Wright: ‘Isolation by distance’ in Sewall Wright: Evolution: Selected Papers, ed. William B. Provine, 1986.

Items marked * are available as free pdf downloads if you Google hard enough.

Related: Part I.

Posted by David B at 01:49 AM