
May 07, 2005

Dr Tatiana

I watched Dr Tatiana's Sex Advice To All Creation this week on British TV (Channel 4). It was a co-production with Discovery Channel Canada, and I think it is scheduled to appear in Canada soon. It isn't clear whether it will be shown in the US: the sexual content and language may be too strong for middle America.

I quite enjoyed the series - even the musical intervals - but I doubt that I would have learned anything from it if I knew nothing about the subject to begin with. I guess it will increase sales for Olivia Judson's book, which may have been the main object of the exercise from her point of view.

Apart from the wildly gimmicky presentation, I found Olivia Judson's voice distracting. She has the poshest voice I've heard on British TV since Hugh Laurie played Bertie Wooster. She's so posh she makes the Duchess of Devonshire sound common. Her voice sounded a bit strained, and I wondered if she is really hiding the dark secret that she is - American! Well, her father is American, her brother is at MIT, and she took her first degree at Stanford, so I think she must have spent a lot of time in America - was she even born there? Maybe she is overcompensating for it with an exaggerated English accent. Just a suggestion...

Posted by David B at 05:07 AM

May 05, 2005

South Park Epicycles

Because of the reference in Frank Rich's column1 I decided to watch the March 30th episode of South Park, Best Friends Forever. It was hilarious, as I'd assumed it would be. But, aside from the Schiavoesque plot-line, I found the references to nerd martial culture extremely amusing. I assume that the non-nerd audience could make out the obvious mimicry of the LotR films (they make it explicit by the end), but how many people would catch the Ender's Game allusion? Details were altered, but it seems pretty clear that Kenny was Ender Wiggin (was the soulless Japanese kid Bean?). Anyway, for me at least, as a mid-level nerd, the genius of South Park is that it's rather like classic children's literature: for kids like Frank Rich, the colorful and simplistic Schiavoesque plot-line catches their attention and bombards their senses with naked allusions to recent cultural touchstones, while for adult nerds there are subtle references to powerful motifs of the geek subculture, which nourish the mature mind with greater richness and depth.

1 - I have never been a fan of the whole "South Park Conservative" weirdness. It seems kind of like shitting in a public space and declaring it sanitary. So, while I rarely read political columns, this time I had to check it out. It seems to detach the term conservative from any social or historical context and transform it into a plain dictionary definition, a defense of the social status quo, from the Bush twins to Federline Yo. Michelle Malkin strikes at the South Park Conservatives from the Right, and it is an especially hard hit since she herself has been labelled as such by the creator of the concept.

Posted by razib at 09:12 PM

$5000 Genome Sequencing

Randall Parker has an interesting post on “$5000 DNA Sequencing By 2007”.

The Haplotype Map shows SNPs shared by many people. Whole-genome sequencing will show SNPs that are unique to individuals. The entire human population could be viewed as an experiment to understand how genes influence outcomes in different environments. Gene function is often tested by inducing mutations in yeast or fruit flies and then observing the effect. Naturally occurring mutations would allow the same discovery process using the human populace.

Posted by fly at 01:34 PM

Glazer on ethno-medicine

Nathan Glazer, the pro-affirmative-action, pro-multiculturalism neoconservative, has a piece in The New Republic titled New Blood, which discusses medicine and ethnicity. Of course, it is behind the subscription wall and there is no preview, so I have no idea where he comes down or what he says.

Posted by razib at 10:40 AM

Ayaan Hirsi Ali interviewed

Ayaan Hirsi Ali is being interviewed on NPR's Talk of the Nation (in about 5 minutes as I type this). The archive should be available by 6 PM EDT.

Update: I just finished listening to the interview. A.H. Ali was rather soft-spoken, and in 20 minutes she managed to stay on message and remain focused. One obviously enraged caller asserted that some individuals ("usually ladies") have been confused and distorted by their personal history and neglect the reality that true Islam is the most liberating of all religions for women. Another caller, a woman with rather moderate views, contended that the problem was not Islam but the cultures. As for the first viewpoint, I would assert there is an element of 1 + 1 = 4 in such claims: those who make them are either denying reality or redefining the character "1" to be what we would term "2." Ali responded that if true Islam is about equal rights, then Muslims and Islam should "prove it" by putting it into practice. On the second point she tried to be careful to distinguish local Muslim traditions from the universalist Islamism that is spreading across borders.

In my post below I contended that mythologies are derived from "public representations." The problem with representations is that the concepts we have in our minds are often propagated to other human beings only as resemblances, not exact replicas (aside from precise mathematical concepts, replicas are very rare indeed). Public representations are often reinforced by visual depiction as well as literary reproduction (consider the visual motifs dominant in Catholicism, which supplement the homilies of the priesthood and the text of the Bible). When people assert that A is really not the way someone else is defining A, they run into the problem that each individual has their own conception of what the other person conceives, and both perceptions are almost certainly imperfect (consider how comment threads degenerate as posters begin debating phantoms of their own perception of the other's argument rather than the argument at hand). Fixation on axiomatic definitions can help get around such problems, but often the definitions make the term useless in practical application. From a non-religious perspective, all this basically means that Ayaan Hirsi Ali's response, that Muslims need to "prove" the character of Islam through their deeds, is the only practical way to make heads or tails of the problem of Muslim integration into European society. Trying to figure out the basis of "true Islam" won't get us anywhere.

Addendum: One thing that often happens with religious mythologies is that the strong belief of their adherents in their truth also tends to distort the debate. Consider a comment on Sepia Mutiny, a browno-centric American weblog, from a few weeks ago. The commenter states, "Also another subtle fact that you fail to realise is that a lot of the "conservative values" that desi Muslims hold are actually expressed not because of Islamic doctrine but also out of desi values....There is no hadiths or Quranic sura sating this but I beleive this is found in Manusmriti." There are a few points here. The commenter, obviously a believing Muslim, is basically trying to fob off some of the behaviors of brown Muslims onto the fact that Hinduism has within it some retrograde practices and edicts, and that brown Muslims are derived from the same South Asian substrate where Hinduism is ubiquitous (the Manusmriti is a religious code of ethics and practice traditionally attributed to the Hindu sage Manu). As far as it goes, this is not incorrect, in that many South Asian Muslims do unwittingly internalize Hindu values and outlooks. The most glaring example for me is that, in my experience at mosque, the South Asians had the most particular attitude toward what was halal or not, a clear reflection, I think, of the numerous food taboos within South Asian non-Muslim culture (the higher your social status within Hinduism, usually the greater the number of your food taboos).

Nevertheless, there is a tendency for Muslims of a somewhat modernist bent simply to deny that misogyny is any part of Islam, and to claim that whatever is retrograde in Islamic "culture" was derived wholly from Byzantine, Persian or South Asian sources. It is as if Islam were nothing more than the five pillars and the essential Sunnah and Hadith, a plain set of axioms to live one's life by. This sort of conception of religion is highly sterile, and rather ahistorical. But it gives one the option of simply defining away unpleasant historical or social truths. Some evangelical Protestant Christians engage in the same sort of redefinition: they will deny that Christianity is a religion at all; it is simply belief in Christ, a sui generis state totally absent any other implications or complications.

For any discourse to continue, this sort of redefinition, and the ceding of the high ground of semantics to true believers, needs to stop. In the American context I know that many people of a pluralist and tolerant bent are inclined to give Muslims a pass when they assert that it is not Islam but the "culture" which exhibits retrograde tendencies. In contrast, the same liberties are not extended to conservative Christians (whose practices tend to be milder when viewed from a critical, socially liberal angle). Nevertheless, conservative Christians engage in the same tendency: they are clear about the difference between True Christianity and the distortions that plagued the faith after its early period (radical Protestant sects are aided here by their denial of the legitimacy of the period of Roman Catholic dominance). I have even heard evangelical Christians, much like Muslims, simply ascribe negative coercive patterns in medieval faith to the Roman or Greek pagan heritage (citing the persecutions of Christians as evidence of those cultures' intolerance toward dissent).

Culture cannot be easily diced and sliced; religion cannot be hived off into a separate cognitive cell with an assertion or two. With caution and proper research I do believe various strands of a culture can be teased apart and examined in isolation, but the pattern of the modern debate is far too flippant and tends to be subordinated to the ideologically biased truth values espoused by the parties concerned.

Posted by razib at 10:24 AM

Measuring Genetic Diversity: Part 2

My post yesterday stopped just as I was getting to the most important bit: how diversity 'between populations' is measured. Here is the rest...

The concept of heterozygosity as a measure of diversity can be extended from a single population to two populations (or sub-divisions of a single population). [For what is meant by 'heterozygosity' in this context see the previous post, including Note 1.] Suppose we have two populations, A and B, with the same set of alleles at a given locus, but different frequencies. (I will assume in what follows that populations are of equal size, and that we are considering diversity at a single locus.) If we know the frequencies, it is easy to calculate the probability that two genes selected randomly at that locus, one from each population, are both copies of a given allele. We can then total the probabilities for each allele and subtract the sum from 1 to get a figure for heterozygosity, just as in the case of a single population. We can call this HD, to indicate that it is heterozygosity between genes selected from the two different populations. (Warning!! Some authors use the term HD for a different concept.)

Unless the frequencies for all alleles are the same in populations A and B, HD is bound to be higher than the average H within A and B separately. If we take the average frequency of an allele in the two populations, which we can call M, the frequency in one population will be M+d and in the other M-d. (The M’s and d’s may of course be different for different alleles, subject to the constraint that the M’s must sum to 1 and the d’s in each population must sum to 0.) The homozygosity between A and B for that allele will be (M+d)(M-d) = M^2 - d^2, so HD over all alleles will be 1 - ΣM^2 + Σd^2. The average homozygosity for an allele within the two populations will be M^2 + d^2 [see Note 4], so the average heterozygosity within the populations, over all alleles, must be 1 - ΣM^2 - Σd^2. This is 2Σd^2 less than HD; in other words, the heterozygosity between the two populations is greater by 2Σd^2 than the average heterozygosity within them. It might therefore seem natural to take 2Σd^2 as an indicator of the ‘difference’, ‘divergence’, ‘diversity’, or ‘distance’ between the two populations.
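To make the algebra concrete, here is a minimal Python sketch. It assumes allele frequencies at a single locus are supplied as dictionaries mapping an allele name to its frequency, for two populations of equal size; the function names are hypothetical, chosen only for illustration. It computes HD and the average within-population heterozygosity, and checks that the first exceeds the second by 2Σd^2.

```python
def heterozygosity_within(freqs):
    """H for one population: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def heterozygosity_between(freqs_a, freqs_b):
    """HD: chance that two genes, one drawn from each population, differ."""
    alleles = set(freqs_a) | set(freqs_b)
    # Alleles absent from a population are treated as having frequency 0.
    same = sum(freqs_a.get(x, 0.0) * freqs_b.get(x, 0.0) for x in alleles)
    return 1.0 - same


def sum_d_squared(freqs_a, freqs_b):
    """Sum of d^2, where an allele has frequency M + d in A and M - d in B."""
    alleles = set(freqs_a) | set(freqs_b)
    return sum(((freqs_a.get(x, 0.0) - freqs_b.get(x, 0.0)) / 2) ** 2
               for x in alleles)


# Two-allele example: frequencies 7:3 in A and 3:7 in B.
pop_a = {"A1": 0.7, "A2": 0.3}
pop_b = {"A1": 0.3, "A2": 0.7}

h_d = heterozygosity_between(pop_a, pop_b)
h_w = (heterozygosity_within(pop_a) + heterozygosity_within(pop_b)) / 2
print(f"HD = {h_d:.2f}, average H within = {h_w:.2f}")
print(f"HD - H within = {h_d - h_w:.2f}, "
      f"2 x sum(d^2) = {2 * sum_d_squared(pop_a, pop_b):.2f}")
```

With 7:3 and 3:7 frequencies, HD is .58, the average within-population H is .42, and both sides of the identity come to .16.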

However, this is not how diversity between two populations is usually measured. Suppose we consider the two populations as subdivisions of a single larger population. The average frequency for an allele in the combined population is M, so the heterozygosity for two genes selected at random (at the same locus) within the combined population is 1 - ΣM^2. We can call this HT (T for ‘total’) to indicate heterozygosity in the total combined population. [See Note 5] But the average heterozygosity within the two subpopulations is 1 - ΣM^2 - Σd^2 (see previous paragraph). If we call this HW (W for ‘within’), it will be seen that HT = HW + Σd^2. It can also be seen that HD - HT = Σd^2, which may be interpreted as the excess of heterozygosity when two genes are selected from different sub-populations, over and above its level if they are selected at random from the total population. It is natural for a geneticist to see this as analogous to the partitioning of variance for some trait into ‘within-group’ and ‘between-group’ components. We can therefore define between-group heterozygosity not as HD, or as HD - HW, but as HD - HT. If we call this HB (B for ‘between’), then HT = HW + HB. The heterozygosities within and between groups can of course also be expressed as proportions of total heterozygosity, in the form HW/HT and HB/HT. Since HB equals HT - HW, we can also express HB/HT as (HT - HW)/HT, or as 1 - HW/HT. These are all common expressions for Masatoshi Nei’s GST, introduced in 1973, which is probably the most widely used measure of diversity ‘between groups’ in population genetics. (Warning!! Different authors use different abbreviations for the various components.) GST can be calculated as Σd^2/(1 - ΣM^2). For the case of two alleles, it is equivalent to Sewall Wright’s FST [Note 6], which for two populations is d^2/pq [Note 7].
Lewontin, in his original study of human genetic diversity, used a slightly different measure which produces similar results to GST or FST. GST can also be applied to cases with more than two populations, with unequal population sizes, or with repeated hierarchical subdivisions. (Nei introduced it in this general form, with a more complex derivation than I have given here for the special case of two equal populations.)
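The partition into HT, HW and HB, and the formula GST = Σd^2/(1 - ΣM^2), can be checked the same way. This is again a hedged sketch for the special case of two equal-sized populations at one locus, with hypothetical function names. HW is computed directly from the two populations rather than from the formula, so the identity HT = HW + HB is actually tested. The two-allele function uses the d^2/pq form of Wright's FST quoted above.

```python
def partition(freqs_a, freqs_b):
    """Return (HT, HW, HB, GST) for two equal-sized populations at one locus."""
    alleles = sorted(set(freqs_a) | set(freqs_b))
    pa = [freqs_a.get(x, 0.0) for x in alleles]
    pb = [freqs_b.get(x, 0.0) for x in alleles]
    means = [(x + y) / 2 for x, y in zip(pa, pb)]   # M for each allele
    h_t = 1.0 - sum(m * m for m in means)           # HT = 1 - sum(M^2)
    # HW: average of the two within-population heterozygosities
    h_w = ((1.0 - sum(x * x for x in pa)) + (1.0 - sum(y * y for y in pb))) / 2
    h_b = h_t - h_w                                 # HB, so HT = HW + HB
    return h_t, h_w, h_b, h_b / h_t                 # GST = HB / HT


def gst_from_d(freqs_a, freqs_b):
    """GST computed directly as sum(d^2) / (1 - sum(M^2))."""
    alleles = set(freqs_a) | set(freqs_b)
    sum_m2 = sum(((freqs_a.get(x, 0.0) + freqs_b.get(x, 0.0)) / 2) ** 2
                 for x in alleles)
    sum_d2 = sum(((freqs_a.get(x, 0.0) - freqs_b.get(x, 0.0)) / 2) ** 2
                 for x in alleles)
    return sum_d2 / (1.0 - sum_m2)


def wright_fst_two_alleles(p_a, p_b):
    """Wright's FST for two populations and two alleles: d^2 / pq."""
    p = (p_a + p_b) / 2
    q = 1.0 - p
    d = (p_a - p_b) / 2
    return d * d / (p * q)


pop_a = {"A1": 0.7, "A2": 0.3}
pop_b = {"A1": 0.3, "A2": 0.7}
h_t, h_w, h_b, gst = partition(pop_a, pop_b)
print(f"HT = {h_t:.2f}, HW = {h_w:.2f}, HB = {h_b:.2f}, GST = {gst:.2f}")
print(f"GST via d: {gst_from_d(pop_a, pop_b):.2f}, "
      f"FST: {wright_fst_two_alleles(0.7, 0.3):.2f}")
```

For the 7:3 versus 3:7 case, HT = .5, HW = .42 and HB = .08, and all three routes give GST = FST = .16.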

It is not wrong to use GST as an indicator of diversity between populations. However, it seems more natural to do so if we start with a focus of interest on a total population divided into many sub-populations, than if we are starting with just two populations, and want to quantify the extent of difference between them. For the latter purpose it seems more natural to compare heterozygosity between the populations with the heterozygosity within them, perhaps by calculating (HD - HW)/HW. If our primary interest is in (say) population A - as might be the case if we are members of it! - it would also be relevant to calculate (HD - HA)/HA, where HA is heterozygosity within population A. (Note that this will be different from the equivalent calculation for population B if one population is more internally uniform than the other.) This will give an indicator of the extent to which an individual in the other population is likely to be genetically different from oneself as compared with another member of one’s own population. Typically, (HD - HA)/HA will be about twice the size of GST calculated for the two populations. If on the other hand we are considering many populations, GST, calculated for all of them, will give an approximation to the average difference if we make pairwise comparisons between them. [Note 8]
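The two-population ratios suggested above can be compared with GST in a short sketch. The frequencies (7:3 in A, 4:6 in B) are invented purely to show the asymmetry between the calculation from A's side and from B's side when one population is more uniform than the other. The variable for heterozygosity within B is named `h_b_within` to avoid confusion with HB, the between-group component. All function names are hypothetical.

```python
def heterozygosity(freqs):
    """Within-population heterozygosity: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def heterozygosity_between(freqs_a, freqs_b):
    """HD: one gene drawn from each population."""
    alleles = set(freqs_a) | set(freqs_b)
    return 1.0 - sum(freqs_a.get(x, 0.0) * freqs_b.get(x, 0.0) for x in alleles)


def gst_two(freqs_a, freqs_b):
    """GST = (HT - HW) / HT for two equal-sized populations."""
    alleles = set(freqs_a) | set(freqs_b)
    h_t = 1.0 - sum(((freqs_a.get(x, 0.0) + freqs_b.get(x, 0.0)) / 2) ** 2
                    for x in alleles)
    h_w = (heterozygosity(freqs_a) + heterozygosity(freqs_b)) / 2
    return (h_t - h_w) / h_t


pop_a = {"A1": 0.7, "A2": 0.3}   # more uniform: H = 0.42
pop_b = {"A1": 0.4, "A2": 0.6}   # more diverse: H = 0.48

h_d = heterozygosity_between(pop_a, pop_b)
h_a, h_b_within = heterozygosity(pop_a), heterozygosity(pop_b)
h_w = (h_a + h_b_within) / 2

rel_from_a = (h_d - h_a) / h_a                 # (HD - HA)/HA, A's viewpoint
rel_from_b = (h_d - h_b_within) / h_b_within   # same calculation from B's side
rel_average = (h_d - h_w) / h_w                # (HD - HW)/HW
g = gst_two(pop_a, pop_b)
print(f"from A {rel_from_a:.3f}, from B {rel_from_b:.3f}, "
      f"average {rel_average:.3f}, GST {g:.3f}")
```

Here HD is .54. The comparison from A's side gives about .29 and from B's side .125, while (HD - HW)/HW is .2, a little over twice GST (about .09).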

It seems, then, that GST may be suitable for comparison of many populations, while some other measure may be more suitable if we are just comparing two. But a more fundamental problem, which applies to any measure based on a partition of heterozygosity, is that the outcome is affected by the level of diversity or uniformity within populations. To dramatise this point, suppose population A has 5 alleles at a locus, each with frequency .2, while population B has 5 completely different alleles at that locus, also with frequency .2. In the combined population there are therefore 10 alleles, each with frequency .1. GST will therefore be (10 x .01)/(1 - [10 x .01]) = .11. Thus only 11% of total heterozygosity at this locus is accounted for by the differences between the two populations. Yet in fact they are genetically completely different! In principle, they might even be separate species. To take a more realistic example, suppose there are 3 alleles with frequencies 4:3:3 in one population and 8:1:1 in the other. These frequencies are markedly different, yet GST comes out at only about .11 (.06/.56). The point can even be illustrated with a 2-allele system. Suppose one population has the alleles in frequencies 7:3 and the other in frequencies 3:7. The two populations are therefore markedly different in allele frequencies, yet GST will be only .16. The underlying problem is that GST (like other measures based on comparisons of heterozygosity) measures not just the difference between populations but the uniformity or diversity within them. Even in a 2-allele system, it is impossible to get a high value of GST unless each population is quite uniform, with a different single allele predominant in each. When GST (or FST) in humans is compared with that of other large mammals, where GST is often higher than for humans, this does not necessarily mean that human populations are very similar to each other, so much as that other animal populations are more internally uniform.
This could just reflect the small population size of many large mammals, and the consequent strength of genetic drift. A low level of GST can be due to a low level of difference between populations, a high level of diversity within them, or any combination of the two. Knowing the level of GST by itself therefore strictly tells us nothing about the extent of genetic difference between populations.
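The three examples above can be recomputed with GST = Σd^2/(1 - ΣM^2), the two-equal-population formula derived earlier. This is a minimal sketch; the allele labels are arbitrary placeholders and the function name is hypothetical.

```python
def gst(freqs_a, freqs_b):
    """GST = sum(d^2) / (1 - sum(M^2)) for two equal-sized populations."""
    alleles = set(freqs_a) | set(freqs_b)
    sum_m2 = sum(((freqs_a.get(x, 0.0) + freqs_b.get(x, 0.0)) / 2) ** 2
                 for x in alleles)
    sum_d2 = sum(((freqs_a.get(x, 0.0) - freqs_b.get(x, 0.0)) / 2) ** 2
                 for x in alleles)
    return sum_d2 / (1.0 - sum_m2)


cases = {
    # five alleles in each population, none shared
    "disjoint 5 + 5": ({f"a{i}": 0.2 for i in range(5)},
                       {f"b{i}": 0.2 for i in range(5)}),
    "4:3:3 vs 8:1:1": ({"x": 0.4, "y": 0.3, "z": 0.3},
                       {"x": 0.8, "y": 0.1, "z": 0.1}),
    "7:3 vs 3:7": ({"x": 0.7, "y": 0.3}, {"x": 0.3, "y": 0.7}),
}

for name, (pop_a, pop_b) in cases.items():
    print(f"{name}: GST = {gst(pop_a, pop_b):.3f}")
```

The disjoint case gives 1/9 (about .111), the 4:3:3 versus 8:1:1 case gives .06/.56 (about .107), and the 7:3 versus 3:7 case gives .16.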

These reservations about the use of GST are not entirely new. Nei himself, in his 1973 paper, noted that ‘the estimate obtained in one population cannot be compared with that of another, unless the breeding system is similar for the two populations. If HS [= my HW] is small, GST may be very large even if the absolute gene differentiation is small’. However, Nei doesn’t seem to have mentioned the converse problem that if HW is large, GST may be small even if absolute differentiation is large. Anyway, Nei’s words of caution seem to have been generally ignored, and textbook writers and others have cheerfully compiled comparative tables of GST in different species or sub-species without considering whether the breeding systems of the populations are similar. This can lead to misunderstanding even by professional biologists. Brian Charlesworth, an eminent geneticist, drew attention to the dangers in a 1998 paper, saying: ‘relative measures of between-population divergence, such as FST [or GST] are inherently dependent on the extent of within-population diversity. Indeed, for loci with very high levels of diversity such as microsatellites, FST is a poor measure of between-population divergence even in the absence of forces that affect diversity, since FST is necessarily low even if absolute divergence is high…’

Although GST and FST are the most widely used measures of between-population diversity, other formulae have sometimes been proposed. Charlesworth mentions two alternatives, which in my notation are equivalent to (HD-HW)/HD and (HD-HW)/2HT. Interestingly, Wright himself (Wright, p.413) originally proposed using the square root of FST as the measure of divergence between populations, which would tend to raise the level of divergence for low-FST populations as compared with using raw FST figures. But these measures are still sensitive to within-population diversity, since HW enters into the denominator in one way or another.

In his 1973 paper Nei suggested using the average gene diversity between populations (my HD), after subtracting the average within-population diversity (my HW), as an ‘absolute measure of gene differentiation’, and claimed that it is ‘independent of the gene diversity within subpopulations’. I am not sure that this is correct. It may be true if we are considering cases where internal diversity within subpopulations is low (which Nei seems to have been mainly concerned with), but not when it is high. The measure can be expressed as HD - HW = 2Σd^2, or 2HB in my notation. However, 2HB is still subject to the constraint that total heterozygosity between populations (HD) cannot be greater than 1, and 2HB is HD - HW. If HW is high, 2HB must therefore be low. For example, in the 10-allele case mentioned above, it would give the result 2HB = .2, which seems unsatisfactory as a measure of ‘absolute divergence’ in a case where the two populations have no alleles in common!
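A brief sketch of the point about Nei's ‘absolute’ measure HD - HW: two pairs of populations that share no alleles at all, one pair internally uniform and one internally diverse, get very different values. The function and allele names are illustrative only.

```python
def nei_absolute(freqs_a, freqs_b):
    """HD - HW (= 2 x sum(d^2)) for two equal-sized populations at one locus."""
    alleles = set(freqs_a) | set(freqs_b)
    h_d = 1.0 - sum(freqs_a.get(x, 0.0) * freqs_b.get(x, 0.0) for x in alleles)
    h_w = ((1.0 - sum(p * p for p in freqs_a.values()))
           + (1.0 - sum(p * p for p in freqs_b.values()))) / 2
    return h_d - h_w


# Each population fixed for a different allele: HW = 0
uniform = nei_absolute({"a": 1.0}, {"b": 1.0})
# The 10-allele case: five alleles each, none shared, HW = 0.8
diverse = nei_absolute({f"a{i}": 0.2 for i in range(5)},
                       {f"b{i}": 0.2 for i in range(5)})
print(f"uniform: {uniform:.2f}, diverse: {diverse:.2f}")
```

Both pairs are completely different genetically, yet the measure is 1 for the uniform pair and only .2 for the diverse pair.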

Nei himself had already suggested in 1972 a measure of ‘genetic distance’ between two populations which seems to avoid most of the problems discussed above. This measure, Nei’s D, is based on homozygosity (the probability that two randomly selected genes at a locus are identical) rather than heterozygosity. If we express average homozygosity within population A as H’A, and within population B as H’B, while homozygosity for genes selected one from each population is H’AB, then Nei’s D can be expressed as minus log(H’AB/√[H’A.H’B]), where log stands for the natural logarithm (to base e) of the expression in brackets. H’AB/√[H’A.H’B] is the homozygosity between populations A and B divided by the geometric mean of the homozygosity within the two populations. If the two populations have identical gene frequencies for all alleles, then homozygosity between the populations will be the same as within them, and H’AB/√[H’A.H’B] will be 1. Its logarithm will therefore be 0. Where the gene frequencies are not identical between the populations, H’AB will be smaller than √[H’A.H’B], so H’AB/√[H’A.H’B] will be a fraction between 0 and 1; it will be 0 if the two populations have no alleles in common, since H’AB will then be 0. The logarithm of a fraction is a negative number. For values of the fraction greater than 1/e but less than 1, the log will be a negative fraction; for the value 1/e it will be minus 1; and for values from 1/e down to 0 it will be a negative number increasing (in absolute value) by 1 for each power of 1/e in the value of the fraction. As the fraction approaches 0, its log therefore goes to ‘minus infinity’. [Note 9] Since D is minus log(H’AB/√[H’A.H’B]), the minus sign converts the negative values of the log into positive ones.
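Nei's D can be sketched in a few lines of Python. One assumption to flag: following Nei (1972), the homozygosities H’A, H’B and H’AB are here averaged over loci before the ratio is taken, so each population is given as a list of per-locus frequency dictionaries. The function name is hypothetical, and returning infinity when the populations share no alleles is my own choice for the limiting case.

```python
import math


def nei_d(loci_a, loci_b):
    """Nei's D = -ln(H'AB / sqrt(H'A * H'B)), homozygosities averaged over loci.

    loci_a and loci_b are lists with one {allele: frequency} dict per locus,
    in the same locus order for both populations.
    """
    n = len(loci_a)
    h_a = sum(sum(p * p for p in la.values()) for la in loci_a) / n
    h_b = sum(sum(p * p for p in lb.values()) for lb in loci_b) / n
    h_ab = sum(
        sum(la.get(x, 0.0) * lb.get(x, 0.0) for x in set(la) | set(lb))
        for la, lb in zip(loci_a, loci_b)
    ) / n
    if h_ab == 0.0:
        return math.inf  # no alleles in common at any locus: log of 0
    # ln(sqrt(H'A H'B) / H'AB) is the same quantity as -ln(H'AB / sqrt(...))
    return math.log(math.sqrt(h_a * h_b) / h_ab)


same = [{"x": 0.6, "y": 0.4}]
print(f"{nei_d(same, same):.3f}")                                   # identical
print(f"{nei_d([{'x': 0.7, 'y': 0.3}], [{'x': 0.3, 'y': 0.7}]):.3f}")
print(f"{nei_d([{'a': 0.5, 'b': 0.5}], [{'c': 0.5, 'd': 0.5}]):.3f}")  # disjoint
```

Identical populations give 0; for the 7:3 versus 3:7 locus, D = ln(.58/.42), roughly .32; populations with no alleles in common give infinity.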

While at first sight rather daunting, Nei’s D has some attractive properties. Nei himself stressed its value for studies of population structure and evolution, which I am not competent to assess. But simply as a descriptive measure of genetic difference between populations, it seems preferable to GST, as it does not seem to be seriously distorted by the extent of heterozygosity within populations. However, it does have the drawback that for values of H’AB/√[H’A.H’B] approaching 0, the value of D increases disproportionately, as it ‘goes to infinity’.

I do wonder whether for the basic purpose of summarising genetic difference between two populations, it would not be better simply to take the sum of the absolute differences in the frequencies of alleles between them (including zero frequencies for any alleles that are absent from one population). For example, if one population has alleles a, b, c, d, and e with frequencies .3, .2, .3, .1, and .1, and the other has alleles a, b, c, and d, with frequencies .5, .3, .1, and .1 the absolute differences would be .2, .1, .2, 0, and .1. The sum of the differences would therefore be .6. Such sums could of course be averaged over several loci. The maximum range of this indicator would be from 0 (no differences at all) to 2 (no alleles in common). If it is preferred to have only values ranging from 0 to 1, the indicator could be divided by 2. I am aware that this is a very crude measure, with no sophisticated rationale in population genetics, but it is easy to calculate, and does not seem to give intuitively absurd results for any scenario I can think of.
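The crude indicator proposed above is easy to code. This is a hedged sketch with hypothetical function names; alleles missing from one population count as frequency 0, and the optional halving gives the 0-to-1 version.

```python
def abs_difference(freqs_a, freqs_b, halve=False):
    """Sum of |pA - pB| over all alleles present in either population."""
    alleles = set(freqs_a) | set(freqs_b)
    total = sum(abs(freqs_a.get(x, 0.0) - freqs_b.get(x, 0.0)) for x in alleles)
    return total / 2 if halve else total


def mean_abs_difference(loci_a, loci_b, halve=False):
    """Average the per-locus indicator over several loci."""
    values = [abs_difference(a, b, halve) for a, b in zip(loci_a, loci_b)]
    return sum(values) / len(values)


pop1 = {"a": 0.3, "b": 0.2, "c": 0.3, "d": 0.1, "e": 0.1}
pop2 = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}
print(f"{abs_difference(pop1, pop2):.1f}")               # the worked example
print(f"{abs_difference({'x': 1.0}, {'y': 1.0}):.1f}")    # no alleles in common
```

The worked example gives .6, and two populations with no alleles in common give the maximum of 2 (or 1 when halved).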

I think the main lesson to draw is that heterozygosity does not capture everything we are interested in if we want to measure genetic diversity at population level. If its limitations are not understood, there is a danger of drawing unfounded or absurd inferences. To illustrate this, consider a remark made by Lewontin in his book on Human Diversity. Having summarised the evidence that on average 85% of human diversity is within populations, as measured by GST or similar measures, he remarks that ‘To put the matter crudely, if, after a great cataclysm, only Africans were left alive, the human species would have retained 93% of its total genetic variation, although the species as a whole would be darker skinned. If the cataclysm were even more extreme and only the Xhosa people of the southern tip of Africa survived, the human species would still retain 80% of its genetic variation. Considered in the context of the evolution of our species, this would be a trivial reduction’ (Lewontin, p. 123).

To see the fallacy in this, suppose we consider a ‘population’ made up of many different animal species, ranging from ants to zebras. If each of these species has an internal average heterozygosity of .85 (which is quite possible, for large widely-ranging species), then GST calculated for the whole ‘population’ will be no higher than .15, since GST = 1 - HW/HT and HT cannot exceed 1. By Lewontin’s logic, all but one of these species could be exterminated without greatly reducing ‘genetic diversity’. Not a conclusion to please the tree-huggers! This is not to say Lewontin is necessarily wrong about the human case, but his inference cannot properly be drawn solely from measurements of GST. Before saying anything definite about the importance of genetic diversity between human populations, it would be necessary to consider other measures such as Nei’s D, which more directly measure the difference in gene frequencies between them. Nei and colleagues have done this for a number of genetic markers in the major human ‘races’ (African, Asian, and European) and found fairly small values for D (less than 0.1 on average), which suggests that Lewontin’s conclusion is not in fact unreasonable. (See Nei, Livshits and Ota.) But it still seems undesirable that the 85/15 figure should be so widely used with so little consideration of what it actually means.
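The ants-to-zebras argument can be illustrated numerically. The sketch assumes GST for several equal-sized groups is 1 - HW/HT, with HW the mean within-group heterozygosity and HT computed from the mean allele frequencies. The ‘species’ are invented: ten groups, each with seven equally frequent alleles of its own and none shared, so within-group H is 6/7 (about .86). Names are hypothetical.

```python
def gst_many(populations):
    """GST = 1 - HW/HT for equal-sized populations (dicts allele -> freq)."""
    k = len(populations)
    alleles = set().union(*populations)
    # Mean frequency of each allele across all populations
    means = [sum(p.get(x, 0.0) for p in populations) / k for x in alleles]
    h_t = 1.0 - sum(m * m for m in means)
    h_w = sum(1.0 - sum(f * f for f in p.values()) for p in populations) / k
    return 1.0 - h_w / h_t


# Ten "species", each with 7 equally frequent alleles of its own
species = [{f"sp{s}_allele{i}": 1 / 7 for i in range(7)} for s in range(10)]
g = gst_many(species)
print(f"GST = {g:.3f}, ceiling 1 - HW = {1 - 6 / 7:.3f}")
```

Although no two groups share a single allele, GST comes out at 9/69, about .13, below the ceiling of 1 - HW.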

Note 4: Homozygosity in the population with p = M+d will be (M+d)^2 = M^2 + d^2 + 2Md, and in the population with p = M-d will be (M-d)^2 = M^2 + d^2 - 2Md. The average of these is M^2 + d^2.

Note 5: The sources of this heterozygosity can be analysed into three components. There is a ½ chance of selecting one gene from each subpopulation, a ¼ chance of selecting them both from the population with allele frequency M - d, and a ¼ chance of selecting them both from the population with allele frequency M + d. The probabilities of homozygosity add up to ½(M+d)(M-d) + ¼(M-d)^2 + ¼(M+d)^2 = M^2, so HT over all alleles is 1 - ΣM^2 as expected.

Note 6: The terms FST and GST are used almost interchangeably in the literature. I won’t explore the exact relationship between the two measures. It is sometimes said that they are conceptually different but quantitatively the same. Nei’s derivation is certainly clearer than Wright’s, which is closely connected to his theories of inbreeding and genetic drift. Wright’s ‘F’ is his measure of inbreeding, which he interpreted as the coefficient of correlation between uniting gametes. His explanations were notoriously obscure, and according to W. G. Hill the interpretation in terms of correlation is now unfamiliar to most geneticists. Incidentally, in the 1943 paper which is usually cited as the source of FST, Wright doesn’t actually use this term.

Note 7: More generally, for two or more populations Wright’s FST can be expressed as Vp/(M’pM’q), where Vp is the variance of the frequency of one of the alleles among the populations, M’p is its mean frequency, and M’q is the mean frequency of the other allele. For two populations this reduces to d^2/pq. This assumes that we are taking the variance directly from the two populations themselves. If on the other hand we are estimating the variance among a wider ensemble of populations, using the observed populations as a sample basis for the estimate, the formula would need to be adjusted to allow for the fact that the variance of a sample is usually lower than the true population variance (technically it is a ‘biased statistic’). For a ‘sample’ of only two populations, the adjustment would double the variance, and therefore also double the value of FST. This adjustment seems inappropriate if we are only interested in measuring diversity between two populations. I mention this because Cavalli-Sforza et al, p.26-7, use the adjusted formula without explaining it, and it took me some time to work out why it was different from the formula I had seen elsewhere.
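The effect of the sample adjustment can be seen with Python's standard `statistics` module, where `pvariance` divides by n and `variance` by n - 1. This is a sketch for a single two-allele locus; the function name and the 7:3 / 3:7 example are my own.

```python
from statistics import mean, pvariance, variance


def wright_fst(p_values, sample_adjusted=False):
    """FST = Vp / (mean p * mean q) for one two-allele locus.

    p_values: frequency of one allele in each population.
    sample_adjusted=False uses the variance of the observed populations
    themselves (divisor n); True treats them as a sample (divisor n - 1).
    """
    p_bar = mean(p_values)
    q_bar = 1.0 - p_bar
    v = variance(p_values) if sample_adjusted else pvariance(p_values)
    return v / (p_bar * q_bar)


direct = wright_fst([0.7, 0.3])
adjusted = wright_fst([0.7, 0.3], sample_adjusted=True)
print(f"direct {direct:.2f}, adjusted {adjusted:.2f}")
```

With only two populations the n/(n - 1) correction is a factor of 2, so the direct value of .16 becomes .32.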

Note 8: it will be exactly equal to this average if we include the zero ‘differences’ between each population and itself.

Note 9: see e.g. Fine, p.377. Jobling et al, p.168, state incorrectly that D varies between 0 and 1. They also give the formula for D incorrectly, by omitting a necessary summation sign and putting a bracket in the wrong place. Cavalli-Sforza et al, p.27, give a correct version of the formula under the heading ‘Nei’s Unbiased Genetic Distance’, and call it DN. This is different from the measure called D further up on the same page.


L. Cavalli-Sforza et al.: The History and Geography of Human Genes, 1994.
*B. Charlesworth: ‘Measures of divergence between populations and the effect of forces that reduce variability’, Molecular Biology and Evolution, 15, 1998, 538-43.
H. B. Fine: College Algebra (Dover edn., 1961).
*W. G. Hill: ‘Sewall Wright’s “Systems of mating”’, Genetics, 143, 1996, 1499-1506.
M. Jobling et al.: Human Evolutionary Genetics, 2004.
R. Lewontin: Human Diversity, 1982.
*M. Nei: ‘Genetic distance between populations’, American Naturalist, 106, 1972, 283-92.
*M. Nei: ‘Analysis of gene diversity in subdivided populations’, Proc. Nat. Acad. Sci., 70, Dec 1973, 3321-3.
*M. Nei, G. Livshits and T. Ota: ‘Genetic variation and evolution of human populations’, in Genetics of Cellular, Individual, Family and Population Variability, ed. C. Hanis, 1993.
S. Wright: ‘Isolation by distance’, in Sewall Wright: Evolution: Selected Papers, ed. William B. Provine, 1986.

Items marked * are available as free pdf downloads if you Google hard enough.

Related: Part I.

Posted by David B at 01:49 AM

What's your mythology?

I had a friendly chat today with an individual who mused upon whether civilizations can survive without mythologies, and wondered if American civilization had a mythology. My first thought was to ask, "What's your mythology?" My more elaborate response was that, first, all humans have their own mythology,1 and civilizations are constructed from human building blocks; ergo, they too will by their nature exhibit mythologies. Civilizational mythologies will of course differ from personal mythologies, in that the former are public representations while the latter are private. And while the former are generally (though not always) invariant over a generation, or perhaps even multiple generations, the latter may shift and morph over the course of an individual's life. Nevertheless, there is a dynamic interaction between public and private mythologies which I believe we implicitly assume but rarely explicitly state. I believe that many intellectuals stumble over the reality that the influence flows in both directions, and attempt to fix the causal arrow in one direction (inferring individuals' character from public mythology, or assuming that public mythology is simply a reflection of individual character).

Mythologies can tell you something about the self-perception of a collective body of people, because they are often expressed as depictions of the ultimate individual apotheosis of a given culture: a Gilgamesh, a Herakles or an Indra.2 Individual mythologies can likewise tell you something about a person.

So I will offer the first ten personages who populate my own personal mythology (the first ten who come to mind, so the order is only a rough approximation of the "true" rank order, and might differ if I repeated this exercise): Spinoza, Darwin, Aristotle, R.A. Fisher, J.B.S. Haldane, Marcus Aurelius, Antoine Lavoisier, John Stuart Mill, Isaac Newton and Archimedes. I will refrain from elucidating my personal mythology in any conceptual fashion (it is obvious that the names above are actors in the broad expanse of my imaginings), because as a private representation it is so self-referential that I don't think I can communicate how I perceive it with any high degree of fidelity beyond banal generalities which you can already intuit from the list above (or from my writings as a whole).

1 - I except the extremely autistic or mentally retarded. Also, note that my use of the term "mythology" is rather liberal.

2 - In modern times evangelical Christian mythology tends to focus on Jesus as the apotheosis, while, not to be blasphemous, Muslims do the same with Muhammed. But this tendency is not limited to religion: Lenin was once the apotheosis of the Soviet Man, while Mao is still to some extent revered by the Chinese.

Posted by razib at 01:44 AM | | TrackBack

The macroevolutionary future

MSNBC has an interesting, if speculative, story that profiles some thinkers who project the trajectory of macroevolution far into the future. When I was a child I was more interested in macroevolution than I am now, and I did stumble upon some beautifully illustrated works which depicted giant carnivorous post-rats stalking enormous post-rabbits. It was only a step below science fiction in many ways, but it is interesting to note that in any given future the vast majority of genera, and many classes, of animals, plants and fungi1 will likely have disappeared. The Tree of Life is populated by many more dead ends than it is by viable shoots.

Which brings me to a question: I am thinking about reading Jerry Coyne and Allen Orr's book Speciation. Is it the real deal?

I know Allen Orr mostly from his Boston Review columns, where he fights the good fight against Dembskification and less clean bouts against evolutionary psychology.2 Readers might find it amusing that Orr's lab technician from 1993 to 2003, who studied the genetics of adaptation and speciation, attended a Baptist church which rejects speciation!3 From its web site:

We believe in the Genesis account, and that it is to be accepted literally, and not allegorically or figuratively; that man was created directly in God's own image and after His own likeness; that man's creation was not a matter of evolution or evolutionary changes of species, or developments through indeterminable periods of time from lower to higher forms; that all animals and vegetable life were made directly and God's established law was that they should bring forth only "after their kind."

William Dembski is rather less direct than that....

1 - Taxonomic categories are fuzzier and less precise for many unicellular organisms that reproduce asexually, so I am much more reluctant to make any generalizations about the prokaryotes.

2 - One might wonder about the second point, since I've made clear that I don't buy the whole package of Evolutionary Psychology. I will elaborate on my own irrelevant opinions on phenotypic plasticity or the importance of cognitive biases in guiding development (my views are irrelevant because the science is being done). But Orr's review seems to paint with too broad a brush based on Pinker's specific work. Like many evolutionary biologists who grapple with nature, red in tooth and claw, Orr is, I think, overly harsh about what might be acceptable as a standard of evidence, and what is acceptable as a null hypothesis.

3 - This is not meant to be a hit against Orr or anything like that; I just found it kind of bizarre. After all, Orr is not just a biologist, he is an evolutionary biologist, and he is not just an evolutionary biologist, he is interested in speciation. Then again, I, a brown atheist, was once a member of the Korean American Christian Fellowship, so anything is possible.

Posted by razib at 12:39 AM | | TrackBack

May 04, 2005

Measuring Genetic Diversity: Lewontin’s Other Fallacy

For a while now I’ve been trying to understand how genetic diversity is measured.

For example, there is the familiar finding by Richard Lewontin, replicated by many others, that in humans about 85% of total genetic diversity is found within any single population, and only 15% between different populations.

But what do such statements actually mean? How can diversity be measured and apportioned between populations?

I guess that most laymen with an interest in genetics, like myself, content themselves with a very vague understanding of such claims. A full explanation would probably be too complicated and technical for the layman to follow.

But I wanted to dig a bit deeper. While I did this mainly for my own benefit, the results may be interesting to others.


- the good news is that the basic concepts and methods are not very technical. Most of the key points can be understood using only elementary algebra.

- the bad news is that there are several different ways of measuring diversity, and especially the diversity or ‘distance’ between two or more populations. In general the various methods will give the same rank order of diversity, but they may give very different numerical values.

- the worse news is that some of the most widely used measures are of doubtful value. The problem is not that they are actually wrong, but that the results are ambiguous and can give a misleading impression. In particular, Wright’s FST, Nei’s GST, and similar measures (including that used by Lewontin) can seriously understate the relative importance of genetic differences between populations as compared to differences within them.

Reasons for these conclusions are given more fully in the continuation. I hesitate to go into these technical issues, because I expect to get one (or both) of two responses:

(1) That’s all nonsense!

(2) Oh, everyone knows that!

But after a good deal of reading on the subject, I don’t think it’s all nonsense. Most of the points I make can be found somewhere in the academic literature. In particular, I was pleased to find that an eminent population geneticist has made the same key point that independently occurred to me. On the other hand, these issues do not seem to be widely discussed in the literature, and someone who had read (say) the popular works of Cavalli-Sforza, Spencer Wells, and the like, or an introductory genetics textbook, would probably not be aware that there was any serious problem about measuring diversity.

Incidentally, the main problem I discuss is quite different from that raised by Anthony Edwards in his well-known paper on ‘Lewontin’s Fallacy’. I should also emphasise that I am not saying that Lewontin's 85% figure is actually misleading, just that it may be. But to understand why, read the rest…

In measuring genetic diversity we are attempting to quantify the extent of differences within or between populations. If we were measuring diversity in continuous quantitative traits such as height, there would be a clear starting point for identifying the ‘differences’ we are interested in. Given any two measurements of the same trait, we can subtract one from the other to get a raw difference or interval. These intervals are themselves quantities in the same dimension as the trait itself, and can be added, multiplied, averaged, etc., to obtain such measures of diversity as the standard deviation, the Gini coefficient, the interquartile range, or the mean absolute deviation.
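For a quantitative trait these measures can be computed directly from the raw values. The following is a minimal Python sketch using only the standard library; the height figures are made up, and the Gini coefficient is computed in one common form (mean absolute difference over all pairs, divided by twice the mean), which is an assumption rather than the only convention in use.

```python
import statistics


def diversity_measures(values):
    """Spread measures for a quantitative trait, all built from raw differences."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)                          # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(xs, n=4)          # default 'exclusive' quartile method
    iqr = q3 - q1                                      # interquartile range
    mad = statistics.fmean(abs(x - mean) for x in xs)  # mean absolute deviation
    # Gini coefficient: mean absolute difference over all ordered pairs / (2 * mean).
    mean_abs_diff = sum(abs(a - b) for a in xs for b in xs) / (n * n)
    gini = mean_abs_diff / (2 * mean)
    return {"sd": sd, "iqr": iqr, "mad": mad, "gini": gini}


heights_cm = [158, 162, 165, 170, 171, 175, 180, 184]  # made-up sample
for name, value in diversity_measures(heights_cm).items():
    print(f"{name}: {value:.3f}")
```

Every one of these is expressed in, or derived from, differences measured in centimetres, which is exactly the convenient starting point that genetic data lack.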

With a non-quantitative trait such as genetic material, the starting point is not so obvious. Given any two stretches of DNA, we can try to identify and list the differences between them. But are all differences equal? Are we interested in differences of single nucleotides, codons, functioning genes, or what? Is every difference to be given the same weight, or do we, for example, ignore non-coding regions or synonymous codons? This depends in part on the underlying motive for measuring diversity. The existing measures of diversity, such as Wright’s FST, were devised mainly to assist in reconstructing phylogenies and population history. For this purpose all genetic differences are potentially informative, and the tendency is to treat all differences as equal. A different approach might be appropriate for other purposes. The choice of units of analysis may also affect the size of measured diversity even if it does not affect the rank order of diversity in different populations: for example, diversity at the level of haplotypes will be larger than at the level of haplogroups, since each haplogroup is divided into many haplotypes.
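The effect of the level of analysis can be shown with a toy example. This Python sketch uses hypothetical haplotype frequencies grouped into two haplogroups, and measures diversity by the heterozygosity index H = 1 - Σp^2 discussed below. Pooling haplotypes into haplogroups can never raise H, because (p1 + p2)^2 is always at least p1^2 + p2^2.

```python
from collections import defaultdict


def heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p^2) for an iterable of frequencies."""
    return 1 - sum(p * p for p in freqs)


# Hypothetical haplotype frequencies, each tagged with its haplogroup.
haplotypes = {
    "A1": ("A", 0.2), "A2": ("A", 0.2), "A3": ("A", 0.1),
    "B1": ("B", 0.3), "B2": ("B", 0.2),
}

# Pool haplotype frequencies into their haplogroups.
group_freq = defaultdict(float)
for group, p in haplotypes.values():
    group_freq[group] += p

h_haplotype = heterozygosity(p for _, p in haplotypes.values())
h_haplogroup = heterozygosity(group_freq.values())
print(f"H (haplotypes):  {h_haplotype:.2f}")   # 0.78
print(f"H (haplogroups): {h_haplogroup:.2f}")  # 0.50
```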

However, I don’t want to linger on these issues, and will assume that a decision has been taken about the level and kind of genetic differences we are interested in. That still leaves some problems of measurement to be settled.

Once genetic material has been classified into a number of variant forms or ‘alleles’ (at whatever level), it would be natural to suggest that the level of diversity within a population can be measured by the number of different alleles within that population, while the diversity between two populations can be measured by the number (and/or proportion) of alleles found in one population and not the other. This could be a workable approach in comparing different species, but within the same species the problem is that most alleles are common to most populations (except perhaps for mitochondrial or Y chromosome haplotypes). Differences are more in the frequency (proportion) of different alleles than in their simple presence or absence.
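A toy illustration of why presence/absence is a blunt instrument within a species: in this Python sketch (hypothetical frequencies) two populations share every allele at a locus, so a simple count of private alleles finds no difference at all, yet the allele frequencies are very different.

```python
# Hypothetical allele frequencies at one locus in two populations.
pop1 = {"a": 0.80, "b": 0.15, "c": 0.05}
pop2 = {"a": 0.10, "b": 0.30, "c": 0.60}

# Presence/absence view: alleles found in one population but not the other.
private = set(pop1) ^ set(pop2)
print("private alleles:", sorted(private))  # [] -> no difference detected

# Frequency view: the same alleles, in very different proportions.
for allele in sorted(set(pop1) | set(pop2)):
    f1, f2 = pop1.get(allele, 0.0), pop2.get(allele, 0.0)
    print(f"{allele}: {f1:.2f} vs {f2:.2f} (difference {abs(f1 - f2):.2f})")
```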

With sufficient data on the frequency of different alleles, it is possible to calculate the probability that two genes at a given locus, selected at random from the relevant population or populations, will be either identical (homozygous) or different (heterozygous). [See note 1.] This will give us the average expected number of genetic differences between individuals, and it is plausible that this pins down in more precise terms the vague concept of ‘diversity’. Most measures of genetic diversity are therefore based on some index of heterozygosity: the probability that two genes at a given locus, selected at random from the relevant population(s), will be different. As Lewontin puts it, ‘there are various measures of the diversity of objects in a collection, all of which are equivalent to asking the probability that two objects taken at random from the collection will be of different kinds’ (Lewontin, p.120). If the frequency of a given allele in a population is p, then the probability of randomly selecting that allele twice in succession is p^2 (i.e. p-squared). [See note 2.] If we square the relevant frequency p for each allele at the same locus, the overall probability of selecting some allele twice in succession will be the sum of all the p-squareds: Σp^2. This is the expected homozygosity of the population at that locus. Since heterozygosity is just the complement of homozygosity, the expected heterozygosity at that locus is 1 - Σp^2, which I will call H. (In the special case of a two-allele system, with frequencies p and q (= 1-p), H = 2pq [see note 3].) We can then average H over a number of different loci to get an estimate of average H within the population.
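The calculation just described translates almost line for line into code. A minimal Python sketch, assuming the allele frequencies at each locus are already known and sum to 1; the three-locus data are hypothetical:

```python
def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2): probability that two genes drawn at random
    (with replacement) from the population carry different alleles."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allele frequencies must sum to 1, got {total}")
    return 1.0 - sum(p * p for p in freqs)


def mean_heterozygosity(loci):
    """Average H over several loci, each given as a list of allele frequencies."""
    return sum(expected_heterozygosity(f) for f in loci) / len(loci)


# Two-allele special case: H should equal 2pq.
p = 0.7
assert abs(expected_heterozygosity([p, 1 - p]) - 2 * p * (1 - p)) < 1e-12

# Hypothetical frequencies at three loci in one population.
loci = [[0.5, 0.5], [0.9, 0.1], [0.4, 0.3, 0.3]]
print(round(mean_heterozygosity(loci), 3))  # 0.447
```

The three loci contribute H = .5, .18 and .66, and their mean is the population-level estimate.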

Intuitively, we would expect diversity to be higher, other things being equal, when there are more alleles in the system rather than fewer, and when their frequencies are evenly spread rather than concentrated in one or a few alleles. Conversely, we would expect diversity to be low when most of the frequency is concentrated in one or a few alleles. H meets these criteria of diversity rather well, and in general the level of H seems to be a reasonable way of ranking different populations with respect to their internal genetic diversity. However, this does not guarantee that differences of diversity can be numerically measured by the difference in H. Suppose for example that a population has n alleles, with equal frequencies for each allele. H will therefore be 1-n[(1/n)^2] = 1-1/n. Consider the following values of n and the corresponding values of H (to two places of decimals):
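The values of H for equally frequent alleles follow directly from H = 1 - 1/n. This short Python sketch tabulates them for an illustrative, arbitrarily chosen set of n:

```python
def h_equal(n):
    """Expected heterozygosity for n equally frequent alleles: 1 - n*(1/n)^2 = 1 - 1/n."""
    return 1 - n * (1 / n) ** 2


for n in (2, 3, 4, 5, 10, 20, 100):
    print(f"n = {n:>3}   H = {h_equal(n):.2f}")
```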


Evidently, increasing the number of alleles from, say, 2 to 4 does not double the ‘diversity’ as measured by H (it rises only from .5 to .75), and increasing n beyond about 5 makes relatively little difference to H. Even increasing n tenfold, from 10 to 100, only raises ‘diversity’ from .90 to .99, an increase of one tenth. Since H cannot exceed 1, it is bound to be squeezed up against the ceiling when values of n are high. This seems intuitively unsatisfactory if we are looking for a numerical measure of diversity.

As well as differences in the number of alleles, differences in the relative frequency of alleles can also have intuitively unsatisfactory effects on H. Consider the following values of H (to two places of decimals) for different values of p, where p is the more common of two alleles in a two-allele system:
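The two-allele case can be tabulated the same way from H = 2pq. A small Python sketch, stepping p (the commoner allele) from .5 to .95 in steps of .05:

```python
def h_two_allele(p):
    """Expected heterozygosity 2pq for a two-allele locus, with q = 1 - p."""
    return 2 * p * (1 - p)


for i in range(50, 100, 5):
    p = i / 100
    print(f"p = {p:.2f}   H = {h_two_allele(p):.2f}")
```

Because H = 2p - 2p^2 is a downward parabola with its peak at p = .5, each equal step in p away from .5 costs more H than the one before.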


Over quite a wide range of changes in gene frequency (from p = .5 up to about p = .75) ‘diversity’ changes rather slowly, but beyond this point it falls more rapidly. Suppose we are comparing diversity in 4 populations, A, B, C, and D, where p and q are in the ratios 5:5, 7:3, 8:2 and 9:1 respectively. For these ratios, H is .5, .42, .32 and .18 respectively. By the criterion of H, we will conclude that the rank order of diversity (from greater to less) is A>B>C>D, which is intuitively reasonable, but we would also conclude that the difference in diversity between C and D (.32 - .18 = .14) is greater than the difference between A and B (.5 - .42 = .08). This does not seem intuitively right: in quantifying diversity at a population level the difference between a 5:5 split and a 7:3 split is surely at least as important as the difference between 8:2 and 9:1.
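The arithmetic for populations A to D can be checked directly. This Python sketch converts each ratio to a frequency p, computes H = 2pq, and compares the two differences:

```python
def h_two_allele(p):
    """Expected heterozygosity 2pq, with q = 1 - p."""
    return 2 * p * (1 - p)


# Populations A-D: ratios of the two alleles as given in the text.
ratios = {"A": (5, 5), "B": (7, 3), "C": (8, 2), "D": (9, 1)}
H = {name: h_two_allele(a / (a + b)) for name, (a, b) in ratios.items()}

print({k: round(v, 2) for k, v in H.items()})  # {'A': 0.5, 'B': 0.42, 'C': 0.32, 'D': 0.18}
print("A - B:", round(H["A"] - H["B"], 2))     # 0.08
print("C - D:", round(H["C"] - H["D"], 2))     # 0.14
```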

A further weakness of H as a quantitative measure of diversity is that it is almost bound to produce high values of H (between, say, .7 and .99) if there are more than 2 alleles at a locus and no single allele is predominant. H cannot be less than .5 unless the most common allele has a frequency of at least .5. If there are more than a few alleles with significant shares of the population, H can hardly be less than .7. For example, if there are 4 alleles with frequencies in the ratio 30:25:25:20, H will be about .75. With 5 alleles in the ratio 25:20:20:20:15, H will be nearly .80 (about .795; with 5 alleles H can never exceed 1 - 1/5 = .8). Such a compressed scale of measurement is likely to be inconvenient and potentially misleading. By analogy, suppose that we tried to measure climatic temperature on a new scale under which all temperatures between 0 and 60 degrees Fahrenheit had values between 0 and 90 and all temperatures above 60 degrees F were squeezed into the values between 90 and 100 on the new scale. The new scale would be unlikely to catch on!
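A quick check of the multi-allele examples, as a Python sketch that normalizes the ratios into frequencies before computing H = 1 - Σp^2. The last line illustrates the .5 bound: if no allele reaches frequency .5, then Σp^2 is at most (largest p) × Σp = largest p, which is below .5, so H exceeds .5.

```python
def heterozygosity(weights):
    """H = 1 - sum(p^2), with frequencies obtained by normalising the weights."""
    total = sum(weights)
    return 1 - sum((w / total) ** 2 for w in weights)


print(round(heterozygosity([30, 25, 25, 20]), 3))      # 0.745
print(round(heterozygosity([25, 20, 20, 20, 15]), 3))  # 0.795

# No allele at frequency >= 0.5 (here the largest is 0.45), so H must exceed 0.5.
assert heterozygosity([45, 30, 25]) > 0.5
```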

Of course, the level of heterozygosity is interesting in itself, and if we want to define diversity by heterozygosity we are free to do so, but this doesn’t necessarily capture everything in our informal concept of diversity.

If this seems a minor technical point, reflect that if H within populations is as high as .86, then diversity between populations, measured in the most common way (Nei's GST), cannot be more than .14, no matter how different the populations are from each other. (They could even be different species.)
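To see how tight this ceiling is in practice, here is a Python sketch of Nei's GST in its usual single-locus form, GST = (HT - HS)/HT, assuming populations are weighted equally: HS is the average within-population H, and HT is H computed from the pooled (averaged) allele frequencies. The populations are hypothetical: each has 7 equally common alleles (H ≈ .86), and no allele is shared between any two populations.

```python
def h(freqs):
    """Expected heterozygosity 1 - sum(p^2)."""
    return 1 - sum(p * p for p in freqs)


def nei_gst(pops):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, populations weighted equally.

    pops: list of dicts mapping allele -> frequency within each population.
    """
    alleles = set().union(*pops)
    h_s = sum(h(pop.values()) for pop in pops) / len(pops)
    pooled = [sum(pop.get(a, 0.0) for pop in pops) / len(pops) for a in alleles]
    h_t = h(pooled)
    return (h_t - h_s) / h_t


# Two populations with completely disjoint allele sets, 7 equally common alleles each.
n = 7
pop_x = {f"x{i}": 1 / n for i in range(n)}
pop_y = {f"y{i}": 1 / n for i in range(n)}
print(round(h(pop_x.values()), 3))        # 0.857: within-population H
print(round(nei_gst([pop_x, pop_y]), 3))  # 0.077: despite sharing no alleles at all

# More populations of the same kind push G_ST toward the ceiling of 1 - H_S = 1/7.
for k in (2, 10, 100):
    pops = [{f"p{j}a{i}": 1 / n for i in range(n)} for j in range(k)]
    print(k, round(nei_gst(pops), 3))
```

With k such completely distinct populations, GST = (1/7 - 1/(7k))/(1 - 1/(7k)): 1/13 ≈ .077 for two populations, creeping up toward 1/7 ≈ .14 only as k becomes large.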

But the measurement of diversity between populations is a bit more complicated than within a single population, so I will continue the analysis in a second post…

Note 1: These terms strictly apply only to genes within the same organism, but now seem also to be widely used with reference to genes in different individuals or populations. In this sense heterozygosity or homozygosity are hypothetical, referring to the probability that individuals would be hetero- or homozygous if their parental gametes were selected at random in the way specified.

Note 2: Strictly, in a finite population this is only true if we allow the same gene to be selected twice, but this does not matter unless the population is very small.
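Note 2 can be made concrete by comparing sampling with and without replacement. In this Python sketch (hypothetical population sizes, allele frequency fixed at .5), exact fractions show the gap shrinking as the population grows:

```python
from fractions import Fraction


def prob_same_allele(count, pop_size, replacement=True):
    """Probability of drawing a given allele twice in succession from pop_size genes,
    `count` of which carry that allele."""
    p = Fraction(count, pop_size)
    if replacement:
        return p * p
    return p * Fraction(count - 1, pop_size - 1)


for N in (10, 100, 10_000):
    k = N // 2  # allele frequency 0.5
    with_r = prob_same_allele(k, N)
    without_r = prob_same_allele(k, N, replacement=False)
    print(f"N = {N:>6}: with = {float(with_r):.5f}, without = {float(without_r):.5f}")
```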

Note 3: Homozygosity = p^2 + q^2, so H = 1 - p^2 - q^2. But q = 1 - p, therefore H = 1 - p^2 - (1 + p^2 - 2p) = 2p(1-p) = 2pq.

R. Lewontin: Human Diversity, 1982

Related: Lewontin debunked.

Posted by David B at 05:50 AM | | TrackBack

May 03, 2005


The rich are getting fat.

Further complicating attempts to compare income and obesity are cultural factors. Certain racial and ethnic groups positively equate a man's girth with wealth -- it's a sign of success, Drewnowski said.

"I would caution against any attempts to interpret these data to say social differences have disappeared," he said. "It just shows that obesity is a general problem and it's now affecting pretty much everybody. ... But it would be very shortsighted to stop paying attention to the people who are most vulnerable."

Yet today, the obesity remedies most often recommended for Americans in general -- eat fresh salads, go ride a bike -- are impossible for many low-income families, Drewnowski said.

Exercise can be hard in inner cities, where the streets may be too dangerous after working hours. Many grocery stores in low-income neighborhoods don't stock expensive fresh produce. And people who work two or three jobs have little time to make home-cooked meals.

Robinson agreed: "I don't want to take focus away from the serious racial and ethnic disparities in health."

But, she said, it's likely that different factors play a role in spurring obesity among the middle class than the poor. "We need to have a lot more research ... to tailor our interventions to specific populations."

1) I'd be interested in how the genetic makeup of the rich has changed in the past 30 years. Not just ethnic changes, but more general changes as well. For instance, are they as smart as they once were?

2) Who's doing the cooking? Sure, an Italian, heaps-of-pasta-on-the-table mother might make you fat. But I'd guess that a career-minded, busy, let-the-kids-scrounge, junk-food-in-the-fridge mother is worse.

3) How many people, as a percentage of the population, are fit because of conscious effort? I'd guess that the numbers must be pretty low. It's hard work. Hell, I ran 80 miles in April and gained 14 pounds because I wasn't watching my diet. It takes a great deal of conscious effort to overcome your environment. The vast majority of people just go with the flow.

Posted by Thrasymachus at 02:16 PM | | TrackBack

May 02, 2005

Adapting Minds, David Buller & Evolutionary Psychology

I wasn't going to comment on this until later, but a confluence of events has prompted me to offer (quick) opinions on the book Adapting Minds, by David J. Buller. Steve has weighed in, and now Buller's former student, Will Wilkinson, has put in his 2 cents. Like Will, I'm only halfway through the book, just starting on the "empirical chapters." Discussion about Buller's book has been prompted by a peculiar review in The Wall Street Journal by one Sharon Begley. I have pretty much digested the "theoretical" chapters, and so I was surprised by the content of Begley's review, because it fixated on what Buller implies are the secondary, empirical chapters.

Let me clarify. Buller's book is a broadside into what I have termed Evolutionary Psychology™, basically the model proposed by Leda Cosmides and John Tooby, and promoted by the likes of David Buss and Steven Pinker. This model of biologistic thinking implies a few core theoretical commitments, in particular:

  • Massive modularity.
  • A Pleistocene adaptive environment which is of overwhelming relevance to our current predispositions and biases.

The first half of Buller's book is a point-by-point blitzkrieg against these two positions. There is a lot to disagree with, and that is why Buller distinguishes Evolutionary Psychology from the broader field of evolutionary psychology, the latter consisting of "behavioral ecology," "evolutionary anthropology" and "human ethology." Though these fields often have greater scope and are less cognitively focused than Evolutionary Psychology, they are basically peddling the same product under a different brand name. Nevertheless, those who adhere to the alternative brands often do not accept the theoretical commitments of EP practitioners, which is the primary reason they distance themselves from the appellation "Evolutionary Psychologist." Even those who call themselves evolutionary social psychologists, like Geoffrey Miller of The Mating Mind fame, do not subscribe to all tenets of the EP consensus. In Miller's case, the tenet he rejects is the same one on which I have strong reservations about, and disagreements with, orthodox EP: the position that most of the "major" psychological traits will be monomorphic "human universals," in which all populations and individuals display little heritable variation.1 I believe there is likely a great deal of variation and some non-trivial interpopulational differences, not to mention the findings of behavior genetics. I also suspect that the "EEA" is untenable, and like Buller I see no reason why evolution had to stop with the Pleistocene. I will gloss over Buller's arguments against massive modularity because I need to do some more reading in the field of cognitive science before I can comment. But it needs to be emphasized that the theoretical chapters are the meat of the book.

In an expansive introduction Buller offers two major points:

  • The book is aimed at Evolutionary Psychology, not evolutionary psychology (at least primarily).
  • The empirical chapters exist in large part because one argument that EP promoters make is that, because the model facilitates strong results, one should give it the theoretical benefit of the doubt. Buller's empirical chapters are attempts to weaken that empirical support so as to buttress his theoretical case.

The take-home message is not to jump ahead to the topical chapters, because they are really just battles in the midst of a very wide-ranging war. Unfortunately, it looks like Begley found the theoretical chapters dry and skimmed over them; otherwise I cannot see how she missed Buller's repeated admonitions not to misinterpret his rather precise project.

So where do we go from here? Will suggests that perhaps his old professor's empirical objections will not necessarily pan out. I had the same thought, and today I saw this from Carl Zimmer: Cheating on the Brain, where Carl reviews a paper which offers mild neurological support for the Wason Selection Task results, which purport to show the likely existence of a content-specific "Cheater Detection Mechanism" in the mind. Carl points out, as does Buller in his book, that the experiment has been heavily criticized and does not have canonical status outside of EP circles. But here we have empirical science stepping in. Actually, I think the support is very mild at best; nevertheless, it is a proof of principle: EP does make predictions and offer testable conjectures.

Finally, I would like to end with a peculiar sociology-of-science observation. One of the main critics of the Wason Selection Task is a French anthropologist by the name of Daniel Sperber. He pops up in both Buller's book and Carl's post. So is this man the bête noire of EP? Well, in fact, Sperber credits both Tooby and Cosmides in his book Explaining Culture with having convinced him that evolutionary thinking was relevant to the study of mind and culture. If you read only Buller's book you will not be aware of this: though Sperber is a strong critic of the particulars of Tooby and Cosmides' model, he is also a fellow traveler. While he may offer powerful philosophical and experimental arguments against the Wason Selection Task, he also pumps out enormous essays in defense of massive modularity.

It's all rather Byzantine. Not only should Buller's book be read carefully, it cannot be assumed to be the last word, because even with 500 pages he cannot fully characterize the richness of thought that is emerging at the intersection of the human and evolutionary sciences. I think the final word on Buller's book, as it bears on Evolutionary Psychology, evolutionary psychology, and yes, even punditry, is that what does not kill you makes you stronger.2

1 - The EP orthodoxy appeals to the coadapted-gene-complex position. Their basic argument is that the mind is such a massively contingent organ that even minor variations in the gene profile underlying the phenotype, produced by recombination or heterozygosity, would result in failure (so, to keep recombination from breaking up the complex, the traits must be fixed at all loci). I disagree. So do many others who subscribe to the "bean bag" genetics mentality.

2 - Buller makes this explicit. One of his clearest beefs seems to be that EP and anti-EP folks often talk past each other because they do not precisely define what they mean by "Evolutionary Psychology." This makes Begley's misunderstanding somewhat egregious, and her skimming seems pretty naked from where I stand. For the anti-Gouldians out there, you might be curious to know that Buller spends a fair amount of time demolishing Gould's objections before presenting his own case.

Posted by razib at 10:03 PM | | TrackBack

The myth of the brown-eyed Baltic blonde?

In The Living Races of Man, Carleton S. Coon1 asserted offhand that the Baltic region exhibits a relatively high frequency of blondes who also happen to have brown eyes. In contrast, Ireland is characterized by dark-haired, blue-eyed individuals (especially western Ireland). The latter assertion I find plausible, though it must be admitted that the majority of northern Europeans have dark hair and blue eyes, so it would not be implausible if the intersection of the two traits exceeded 50% in many areas. But, in any case, I don't know much about the Baltic region, so I set about seeing whether I could find evidence of the brown-eyed blonde.

I found this Lithuanian modelling agency which has a facebook with a decent amount of information. I could puzzle out the terms for "blue," "green" and "brown" eye color pretty quickly, though I simply compressed the many variations in hair color into "blonde" and "dark" categories. Here is the pivot table that Excel popped out for me (I surveyed only the adult female models; someone else can check out the dudes if they are so inclined):

Eye color   Blonde   Dark   Totals
Blue            23     13       36
Brown            4     11       15
Green           11     11       22
Totals          38     35       73
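The cell counts can be turned back into a pivot table with a few lines of Python. This is a minimal sketch using plain dictionaries rather than Excel; it recomputes the row and column totals from the six cell counts (the Green row sums to 22) and the share of brown eyes within each hair-color group.

```python
# Cell counts: (eye color, hair color) -> number of adult female models.
counts = {
    ("Blue", "Blonde"): 23, ("Blue", "Dark"): 13,
    ("Brown", "Blonde"): 4, ("Brown", "Dark"): 11,
    ("Green", "Blonde"): 11, ("Green", "Dark"): 11,
}

eyes = ["Blue", "Brown", "Green"]
hair = ["Blonde", "Dark"]

row_totals = {e: sum(counts[e, h] for h in hair) for e in eyes}
col_totals = {h: sum(counts[e, h] for e in eyes) for h in hair}
grand_total = sum(counts.values())

print(row_totals)   # {'Blue': 36, 'Brown': 15, 'Green': 22}
print(col_totals)   # {'Blonde': 38, 'Dark': 35}
print(grand_total)  # 73

# Share of each hair-color group that has brown eyes.
for h in hair:
    share = counts["Brown", h] / col_totals[h]
    print(f"brown-eyed among {h.lower()}: {share:.1%}")  # 10.5% blonde, 31.4% dark
```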

I don't see a high frequency of brown-eyed blondes, and when I was going through the facebook I was actually surprised at how many very dark women the agency employed (see here). I had a hard time keeping track of the names, so perhaps they were Russian (though the woman I just linked to is Lithuanian if her name is any clue; check with Google if you don't believe me). As you can see, the blonde women in fact had the lowest number with brown eyes, an indication of either population substructure or, more likely as far as I'm concerned, a pleiotropic effect of the same alleles that code for hair and eye color (MC1R is implicated in pigmentation in general, but there are many, many other loci; one recent count suggests 120, though many of those will likely be involved in regulation of MC1R).

I had a harder time finding Irish models, and so I figured I wouldn't waste my time doing a comparison. Anyone who has Coon's book on hand might want to post what he says (what numbers he offers) in the comment thread.

1 - Coon is by the way a great last name if you want to be accused of being a racist!

Posted by razib at 06:13 PM | | TrackBack

Selfish DNA

New theory contends that long-lived, quiescent retroelements are a major driving force in human genome evolution

“Alu elements are short, 300-nucleotide-long DNA sequences capable of copying themselves, mobilizing through an RNA intermediate, and inserting into another location in the genome. Over evolutionary time, this retrotransposition activity has led to the generation of over one million copies of Alu elements in the human genome, making them the most abundant type of sequence present. Because Alu elements are so abundant, comprising approximately 10% of the total human genome, they have been thoroughly characterized in terms of their origin and sequence composition. What has remained elusive to scientists, however, are the actual mechanisms by which these elements persist and propagate over time to influence human evolution.”

“To date, the most widely accepted theory of Alu retrotransposition is called the "master gene" theory, which asserts that the majority of Alu retrotransposition activity is driven by a small number of hyperactive "master" sequences. In this model, mutations occurring in the "master" copies have rendered themselves capable of substantial propagation and persistence over time. However, prior evidence from the Ya5 subfamily indicated that at least some "master" Alu elements may persist in low-copy numbers for long periods of evolutionary time without retrotranspositional activity, suggesting that the mechanisms of Alu expansion may be much more complex. These observations led Dr. Batzer and his co-workers to examine the Yb subfamily of Alu elements, to demonstrate that the Yb subfamily has a similar evolutionary pattern to that of AluYa5, and to formulate the "stealth driver" hypothesis for the evolution of these Alu elements.

"In contrast to 'master' genes, 'stealth drivers' are not responsible for generating the majority of new Alu copies, but rather for maintaining genomic retrotransposition capacity over extended periods of time," Batzer explains. "By generating new Alu copies at a slow rate, a 'stealth driver' may occasionally spawn progeny elements that are capable of much higher retrotransposition rates. These hyperactive progeny elements may act as 'master' genes for the amplification of Alu subfamilies and are responsible for producing the majority of the subfamily members. Due to their high retrotransposition levels, however, they are likely to be rapidly purged from human populations through natural selection."”

Posted by fly at 11:25 AM | | TrackBack

May 01, 2005

What's your s?

A few weeks ago John Hawks posted something important: he suggested that the basic paradigm which population geneticists arguing for Out-of-Africa vs. !Out-of-Africa are working with is faulty (Henry Harpending has implied the same to me in emails). As Greg Cochran specified, the cruxes are the issues of selection and interbreeding: the models that are used to infer hominid admixture often assume that the populations never exchanged genes except for recent and singular hybridization events or periods. Additionally, there is often an assumption of neutrality at the genetic locus in question (on well-known loci like mtDNA).1 In other words, it seems some geneticists who have spent years breeding rat and fly lineages in a controlled environment forget that humans are generally far more promiscuous and don't have grad students and lab techs monitoring clandestine inter-lineage trysts.

How plausible is it that archaic populations did not interbreed, so that alleles were never exchanged? Since we don't have archaic populations around anymore (or so we think....), let's look at possible H. sapiens analogs. Consider lactose tolerance: there is strong evidence that the allele that confers adult lactose catabolism in many populations has a common origin in Eurasia. In short, the lactose tolerance of populations traditionally classified as "Caucasoid" derives from a common mutational event ~10,000 years ago. On the other hand, the lactose tolerance of various African populations seems to derive from alternative alleles (there is also evidence that depigmentation in Europeans and East Asians is controlled by different alleles; control-f "pigment" or "Horton"). But looking specifically at the distribution of lactose tolerance in South Asia, in northern India about 70% of the population is lactose tolerant, while in southern India about 30% is (cite). Doing a little googling for this sort of data, I stumbled onto some "Aryan" sites that suggested that this was evidence that northern Indians are more like Europeans than they are like southern Indians. That is correct, on that particular locus! My own survey of the literature suggests (no surprise) that South Asians are a genetically diverse group, and depending on the locus you analyze you will come to different conclusions, though I personally believe that the balance of the evidence (evaluated over all "informative" loci) suggests that north and south Indians have far more in common genetically with each other than they do with any populations outside of South Asia (excepting a few groups like the Parsis). The difference between north and south India as far as lactose tolerance goes might simply be due to the feasibility of pastoralism and the density of cattle, so that the selection pressures were far greater in the north than in the south.2

We obviously know a lot more about the details of population substructure among modern populations than we do about putative "archaic" groups. Unfortunately paleontology can only tell us so much (and I get the impression that a "splitting" mentality is dominant in some sectors of paleoanthropology), and a lot of what passes for background assumptions in many hypotheses is built on slim conjecture. Additionally, particular areas like Eastern Asia have a dearth of hominid remains over the past few hundred thousand years until the emergence of moderns. John points out that we might be finding so few "archaic" alleles where recombination seems not to have occurred because only those alleles with very narrowly constrained local adaptations would have remained cordoned off from the rest of humanity, and so would evince an ancient coalescence.

Which brings me to a second point: David Boxenhorn expects "local populations to be better adapted to local climatic conditions, making old-new hybrids which combine the best of both the most fit." Is this a valid assertion? Depends. Certainly I suspect that the lactose tolerance allele spread throughout Western Eurasia through such a process of deme-to-deme genetic exchange, or at least a mild form of demic diffusion in which hybridization played a large role. We have talked before about the fact that sharp phenotypic differences might persist because of selection pressures even where neutral (ancestrally informative) alleles show clinal gradients (Henry Harpending's canonical examples are the San of the Kalahari and the Rh- allele among the Basques). But obviously this doesn't always happen: when groups face each other as "collectives" there can be simple replacement of one population by another. In other words, the fitness of allele X vs. allele Y on locus A is swamped by the reality that people carrying allele X are simply being marginalized/killed off by people carrying allele Y, so there isn't a chance for the "fit" allele to remain at a high frequency in the population (consider the light-skinned Australians who live in Queensland). Now, if a small group of survivors is absorbed into the Y-carrying population and brings in allele X, then if the selection pressure is environmental the frequency of X should increase over time. If X was favored only by social/sexual selection, then it might actually be selected against!
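To make the last two sentences concrete, here is a minimal sketch (not from the original post) of deterministic genic selection on allele X after a group of survivors brings it into the Y-carrying population. The starting frequency of 5%, the selection coefficients of +0.02 and -0.02, and the 200-generation horizon are hypothetical numbers chosen only for illustration, and the model ignores drift, dominance and any further migration.

```python
def next_freq(p: float, s: float) -> float:
    """One generation of genic selection: carriers of X have relative fitness 1 + s.

    Mean fitness is p * (1 + s) + (1 - p) * 1 = 1 + p * s.
    """
    return p * (1 + s) / (1 + p * s)


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Frequencies of X from generation 0 through `generations` (infinite population)."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    p0 = 0.05  # hypothetical: absorbed survivors make up 5% of the merged gene pool
    for s, label in ((0.02, "environmental advantage"), (-0.02, "socially disfavored")):
        path = trajectory(p0, s, 200)
        print(f"{label:24s} s = {s:+.2f}: X goes from {p0:.2f} to {path[-1]:.4f}")
```

Under this model the odds of X, p/(1 - p), are multiplied by (1 + s) every generation, which is why a modest environmental advantage compounds while a socially driven disadvantage erodes X just as steadily.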

All in all it is a big mess. I suspect, as I have noted before, that one reason Out-of-Africa is so popular is that it avoids such messes. It also preserves our "common sense" need to square genetic phylogenies with ancestral phylogenies as well as with phenotypic particularities (that is, one "human" group emerged with a particular unique suite of genes and a distinctive adaptational skillset in one location at one time). It is sometimes said that genuine oral memories don't go back much further than a few centuries, and over that span it certainly seems that for most people you would observe a concordance between ancestry and appearance. Selection usually takes at least a few generations to work its magic, even for traits where there is a lot of variation in fitness within the population.3 Also, many selective sweeps, perhaps for resistance to plague, would not be visible on inspection.

But the problem isn't just with the lay audience for popular books: someone must be producing the books. Spencer Wells is no idiot (no, really, Harvard accepted him!). Neither are a host of other scientists and scholars who promote Out-of-Africa. The scientists who pushed for an early separation between the ape and human lineages were also not idiots, and they kept believing their fossils rather than the biochemical evidence, which suggested a recent divergence, until the fossils falsified them. So I have to bring up the "p" word, paradigm. Kuhn's model is abused and distorted to the point where science becomes just "another superstition." But, in reality, science is like democracy, a shitty system that happens to be better than any other out there (at least at getting an empirically based handle on the world). Over the past 10 years a swarm of studies has emerged that examine the histories of genes (or, in particular, the relationships between various alleles). Most of the time scientists have focused on the NRY and mtDNA because the assumption of neutrality and the lack of recombination made their models far simpler, and by the time a pipeline of papers was compressed into pithy popular-press articles, the history of genes had become the history of our species. On a foundational level, of course, scientists understand that human beings are simply a collection of genes, and there is no necessary reason that all these genes have tracked the exact same paths through the various individuals and subpopulations that have comprised the hominid lineage. But it would certainly be easier if they did! Over 20 years ago Richard Dawkins wrote The Selfish Gene to promote the viewpoint that the body, the individual, was just a vessel for the propagation of genes, at least from the perspective of evolution.
Ten years later mitochondrial Eve burst upon the scene, and the history of the small bits of rapidly mutating mitochondrial DNA was conflated with the history of the human species. With the glut of data produced by improved genomic sequencing techniques the blind spots in the empirical landscape are being illuminated. The familiar singular vistas offered by mitochondrial Eve are being overshadowed by bizarre counterintuitive conformations girdling the whole horizon. The roads in gene land might be narrow but the scenery escapes a simple adjective, and it is simple adjectives that sell books. John Brockman, your scientists need you!

Addendum: John states somewhat cryptically: "That would leave natural selection as apparently the only explanation, although what pattern of selection would create the excess of ancestral alleles in Africa is up for grabs. I have an idea, but I'm not sharing it just yet." Well, we know there have been selective sweeps in Eurasia that did not affect Africa to the same extent. Another factor, which I do find tenuous from the angle of functional relevance, is that Africa seems to have a high pathogen load (this has resulted in social adaptations) that might enforce some sort of frequency-dependent selection on some loci (I don't know whether, for example, the mtDNA has anything to do with pathogens in terms of fitness). If I recall correctly, the MHC loci, which are directly relevant for immune response and coordination, are more diverse and varied in Africa. Also, as noted above in the context of lactose tolerance, it seems that there were reasonable (though obviously not perfect) genetic barriers across the Sahara for much of human prehistory (the Eurasian lactose tolerance mutation likely wasn't in the genetic background of any of the numerous populations of the Sahel). If one assumes that the periodic expansions of the Eurasian ice sheet pushed back hominid populations so that there were only isolated remnants, periodic bottleneck effects (which would drive down the long-term effective population size a lot) could have homogenized the Eurasian gene pool. In contrast, Africa might have been relatively shielded from the worst of the environmental fluctuations and so been subject to fewer of the bottleneck events that drive down genetic diversity. I note that in The Real Eve Stephen Oppenheimer focuses on the Indian subcontinent as the source of all Eurasian lineages of modern humans based on his reading of the mtDNA, and the climatic explanation has the virtue of explaining this as well.
Oppenheimer is rather explicit that even the modern human populations in Eurasia "went their separate ways" during the Last Glacial Maximum, never to meet again genetically....
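The point in the addendum that periodic bottlenecks would drag down the long-term effective population size follows from a standard result: over fluctuating generations, the long-term effective size is the harmonic mean of the per-generation sizes, so a few very small generations dominate it. A minimal sketch, in which the sizes (10,000 per generation, crashing to 50 once every 100 generations in the "glacial" scenario) are purely hypothetical and each generation is treated as an ideal Wright-Fisher population:

```python
from statistics import harmonic_mean


def long_term_ne(sizes: list[int]) -> float:
    """Long-term effective size over a run of generations: the harmonic mean of N_t."""
    return harmonic_mean(sizes)


def heterozygosity_retained(sizes: list[int]) -> float:
    """Fraction of starting heterozygosity left after drift alone: prod(1 - 1/(2 N_t))."""
    h = 1.0
    for n in sizes:
        h *= 1 - 1 / (2 * n)
    return h


# Hypothetical scenarios over 1,000 generations.
shielded = [10_000] * 1_000                                        # stable population
glacial = [50 if g % 100 == 99 else 10_000 for g in range(1_000)]  # 10 crashes

for label, sizes in (("shielded", shielded), ("glacial", glacial)):
    print(f"{label:9s} mean N = {sum(sizes) / len(sizes):8.0f}  "
          f"long-term Ne = {long_term_ne(sizes):8.0f}  "
          f"H retained = {heterozygosity_retained(sizes):.3f}")
```

On these made-up numbers, ten bad generations out of a thousand pull the long-term Ne down to about 3,300 even though the average census size is about 9,900, which is the sense in which repeated glacial bottlenecks could erode Eurasian diversity more than a steadier African population would experience.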

Related: The origins of phenotypic variation? "Racial Diversity"

1 - Greg pointed out that the chance of fixation for a new advantageous allele is about 2s, where s is the selection coefficient, vs. 1/(2Ne) for a neutral marker, where Ne is the effective breeding population (this is often couched per mutational event, but in the comments Greg stated it per copy introduced, which is the same thing from the angle of the population receiving the novel allele). Also, when Ns, where N is population size and s is the selection coefficient, is much less than one (i.e., Ns << 1), change in allele frequency is determined by drift, while if Ns is much greater than one, selection is the primary operator (Hedrick 2000). As you know, the power of drift is inversely proportional to population size.
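The two regimes in footnote 1 can be checked against Kimura's diffusion approximation for the fixation probability of a single new copy of an allele. This sketch assumes additive selection (heterozygotes have advantage s, so Haldane's 2s is the large-population limit) and sets Ne equal to the census N; the population size of 10,000 and the grid of s values are arbitrary choices for illustration.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation u = (1 - exp(-4Ns p)) / (1 - exp(-4Ns)) with p = 1/(2N).

    Assumes additive selection (heterozygote advantage s) and Ne = N.
    Very negative N*s overflows exp(); this sketch only explores s >= 0.
    """
    p = 1 / (2 * n)
    if s == 0:
        return p  # neutral allele: fixation probability equals its starting frequency
    # expm1(-x) / expm1(-y) == (1 - e^-x) / (1 - e^-y), computed accurately for small x
    return math.expm1(-4 * n * s * p) / math.expm1(-4 * n * s)


N = 10_000
for s in (0.0, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2):
    u = fixation_probability(s, N)
    print(f"Ns = {N * s:7.2f}  u = {u:.6f}  1/(2N) = {1 / (2 * N):.6f}  2s = {2 * s:.6f}")
```

When Ns is well below one the result stays near 1/(2N), and when Ns is well above one it approaches 2s, matching the drift-dominated and selection-dominated regimes described above.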

2 - A common "functional" explanation for the cow veneration in South Asia is that utilizing them for milk is far more efficient than slaughtering cattle for meat. This theory has had its ups and downs, and I think the current consensus is that it is not correct.

3 - The observation that even people with the same coancestors may vary a great deal in appearance because of independent assortment (and, over the long term, recombination) is more obvious in populations where there has been recent admixture between two geographically distinct races whose phenotypes tend to be disjoint on a host of characteristics. Consider two cousins, each of whom has two black grandparents and two white grandparents, one of each on both sides. It is entirely possible that one will display a greater signature of African or non-African ancestry than the other, though by pedigree both are 1/2 African and 1/2 non-African. Obviously this sort of situation is less of a consideration in pre-modern circumstances.
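A minimal simulation of the cousin example in footnote 3, under deliberately crude assumptions that are mine rather than the post's: each parent is a first-generation hybrid carrying one African-derived and one non-African-derived copy of every autosome, the 22 autosomes are treated as equal in length, and recombination is ignored so that only independent assortment acts. The seed 2005 is arbitrary.

```python
import random
from statistics import fmean, pstdev

N_AUTOSOME_PAIRS = 22  # simplification: every autosome counts equally


def child_african_fraction(rng: random.Random) -> float:
    """Share of a child's autosomes that trace to the African grandparents.

    Each of the two hybrid parents passes on either its African-derived or its
    non-African-derived copy of each autosome with probability 1/2 (independent
    assortment, no recombination).
    """
    transmitted = 2 * N_AUTOSOME_PAIRS  # one copy of each pair from each parent
    african = sum(rng.random() < 0.5 for _ in range(transmitted))
    return african / transmitted


rng = random.Random(2005)
cousin_a, cousin_b = child_african_fraction(rng), child_african_fraction(rng)
print(f"cousin A: {cousin_a:.2f} African-derived, cousin B: {cousin_b:.2f}")

many = [child_african_fraction(rng) for _ in range(10_000)]
print(f"mean {fmean(many):.3f}, sd {pstdev(many):.3f}, "
      f"range {min(many):.2f} to {max(many):.2f}")
```

With 44 independent coin flips the standard deviation is sqrt(0.25/44), about 0.075, so cousins who differ by several percentage points of ancestry are the expected outcome even though both are exactly half and half by pedigree; real chromosomes of unequal length and recombination would change the numbers but not the basic point.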

Posted by razib at 05:53 PM