Tuesday, December 16, 2008

The Unread Fisher: Human Evolution   posted by DavidB @ 12/16/2008 04:14:00 AM

The last five chapters of R. A. Fisher's Genetical Theory of Natural Selection - about a third of the book - are devoted to human evolution. These chapters are seldom quoted and probably seldom read, even by Fisher enthusiasts. [Note 1]

There are some obvious reasons for this neglect. Much of this part of Fisher's book is concerned in a broad sense with eugenics, the very mention of which is sufficient to paralyse rational thought in some quarters. But even for those who are not scared of the e-word, there would be reasons for disregarding these chapters. The evidence on which Fisher relies is thin and out-of-date. His evidence on the heritability of human fertility, which is central to his arguments, depends entirely on studies of the British aristocracy, which is hardly a representative sample of the species. Apart from this, like many of his contemporaries in the 1930s, Fisher believed that current fertility trends were dysgenic: that Britain (and other western nations) was threatened by a decline in the genetic quality of the population. For example, the psychologist R. B. Cattell estimated in 1937 that average IQ in Britain was falling at a rate of about 1 percent per decade. The snag with the dysgenic hypothesis is that the period since the 1930s has seen a large improvement in almost all measurable aspects of human 'quality': IQ, educational achievement, height, general health, and longevity. The average man or woman in Britain today lives about 20 years longer and has an IQ about 20 points higher than in Fisher's day (by 1930 norms). It is possible to argue, like Richard Lynn [Note 2], that an underlying genetic decline has been masked by an even larger environmental improvement, but from a practical point of view pessimists like Fisher and Cattell have been refuted by events.

Nevertheless, there is much in Fisher's neglected chapters which is interesting and worth reading, and this post is intended as a brief taster....

Why Human Evolution is Special

Fisher complains that the treatment of man in general works on evolution is usually superficial, and emphasises that human evolution is interesting and unusual enough to deserve extended treatment. He points out (p.192) that any animal that has undergone profound changes in its recent evolutionary history should be of special interest to the evolutionist. He mentions some of the more obvious special features of man - big brain, exceptional social organisation, use of artificially constructed tools, symbolic communication - and concludes that 'to the non-human observer mankind would present a number of highly interesting evolutionary inquiries and would raise questions not easily to be answered only by the use of comparisons and analogies' (p.192). But the distinctive perspective he gives to human evolution is that, in contrast to most other species, natural selection in man operates mainly through differences in fertility, rather than mortality, and that it now operates (at least in 'civilized' man) with exceptional intensity (p.218, 228). He goes on to explore the interaction of social class, fertility, and sexual selection, and concludes that the combination of conditions producing an acceleration of evolutionary changes is 'peculiar to man' (p.269). Far from thinking that in modern man evolution has come to a halt, as some modern evolutionists (e.g. S. J. Gould, Steve Jones) have claimed, Fisher therefore believes that it is exceptionally rapid, and capable of producing significant changes even during recorded historical times.

The Evolution of Fertility

Fisher makes some brief but important general remarks on the evolution of fertility, which are applicable to all species, and not just to man. Despite this, his remarks have been generally overlooked. [Note 3] He argues that fertility, like any other trait, is subject to natural selection, and that the most important factor in determining optimal fertility is the amount of parental expenditure required: 'In organisms in which that degree of parental expenditure, which yields the highest proportionate probability of survival, is large compared to the resources available, the optimal fertility will be relatively low' (p.204). In 'civilized' man, the most important determinants of fertility are psychological. There are factors of temperament which determine the propensity to marry, whether marriage is early or late, and the degree of enthusiasm for children (p.210-13). But there are also social and institutional factors such as prohibitions on infanticide (p.218-21). These factors will themselves be affected by psychological influences which will vary over time (p.219), since parents who are reluctant to commit infanticide will have more children surviving, and the children will tend to inherit their parents' temperament. Fisher argues that this is responsible for the changing historical views on infanticide, and minimises the role of religious doctrine, which itself (he argues) is responsive to the general mood of the population. In a splendidly Fisherian phrase he remarks: 'It would, I believe, be a fundamental mistake to imagine that the moral attitude of any religious community is to any important extent deducible from the intellectual conceptions of their theology (however much preachers make it their business so to deduce it)' (p.222).

Man versus Social Insects

In several places (p.199-204, 271-2) Fisher compares and contrasts human and insect societies. He stresses the major difference that in insect societies reproduction is specialised in a reproductive caste, often with a single queen. An insect society therefore 'more resembles a single animal body than a human society' (p.200) and 'selection must in this case act exclusively on the reproductive insects via the prosperity of the societies from which they arise' (p.201). In the light of modern sociobiology this emphasis on the reproductive system may seem blindingly obvious, but in Fisher's time it was not, and even in the 1950s writers like A. E. Emerson still tended to neglect it. Fisher also has a most interesting comment on the origins of insect societies, suggesting that 'as soon as the young adults of any incipient social form took either to performing the preparatory labour for reproduction, or to tending the young, before they themselves had commenced to reproduce, the balance of selective advantage would have been shifted towards favouring the fertility of the foundress of the colony, and towards favouring equally the development of the organs and instincts of workers rather than of queens among her earlier, and possibly less well nourished, offspring' (p.205).

In human societies, in contrast, reproduction remains individualistic, and genetic competition within communities is always present. Fisher does not entirely dismiss the importance of inter-group selection: 'Among small independent competing tribes the elimination of tribes containing an undue proportion of the socially incompetent, and their replacement by branches of the more successful tribes, may serve materially to maintain the average standard of competence appropriate to that state of society' (p.201). But even in this state of society competition within the community is present, and becomes more important as the size of groups increases (p.201). He later points out that 'The selection of whole groups is, however, a much slower process than the selection of individuals, and in view of the length of generation in man the evolution of his higher mental faculties, and especially of the self-sacrificing element in his moral nature, would seem to require the action of group selection over an immense period' (p.264. Incidentally, this is the first use of the exact phrase 'group selection' I have noticed in the literature. Sewall Wright, around the same time, uses 'intergroup selection'.) Fisher concludes that the main force in the evolution of such qualities has been individual selection, but powerfully enhanced by the action of kinship groups and sexual selection, which in the case of man also involves decisions by kinship groups. I will discuss this further in another post.

To be continued, probably after Christmas.....

Note 1: I will give page references to the easily available Dover edition (1958). There are no relevant changes from the first edition. Among Fisher's admirers, W. D. Hamilton does in his very first published paper refer to the 'human' chapters of GTNS (see Narrow Roads of Gene Land, vol. 1, p.8), but elsewhere does not, even when (as in his essay 'Innate social aptitudes of man') they would be highly relevant.

Note 2: Lynn has written a book, Dysgenics, and various articles on this theme. He attributes the increase in average IQ (the Flynn Effect) mainly to improved nutrition. The awkwardness of his position is that his argument for dysgenic effects requires the genetic influence on individual IQ to be large, while his interpretation of the increase in IQ also requires the influence of environment to be large - larger in fact than the entire observed Flynn Effect, since this is the net result of a negative genetic trend and a positive environmental effect. This combination of requirements is not logically impossible, but it is uncomfortable.

Note 3: the modern theory of the selection of optimal fertility is usually credited to David Lack, who gathered empirical evidence for it, but the key concept of optimal parental investment is contained not only in Fisher but in various other writers. Fisher himself credited the concept to Major Leonard Darwin.


Thursday, November 27, 2008

Wright, Fisher, Haldane, and odds and ends   posted by DavidB @ 11/27/2008 06:36:00 AM

From time to time I give links to those of my old posts that may still be worth reading. Previous guides are here: 1, 2, 3, 4.

It is over two years since the last update. In that time most of my posts have been on the history of population genetics, and especially on the 'founding fathers', R. A. Fisher, J. B. S. Haldane, and Sewall Wright. I recently finished a long series of Notes on Sewall Wright, so this is a convenient time to take stock.

Most of these posts are long, and aimed not so much at day-to-day readers as at people searching for specific topics.

Notes on Sewall Wright

On Reading Wright gave an overview of the planned series of notes, and includes some general reflections on Wright's reputation.

Before continuing with the series as planned, I realised that I needed to cover an additional topic, Wright's Method of Path Analysis. This note is especially concerned to clarify the concept of a path coefficient, and the relationship between Wright's method and multiple regression.

In preparing the note on path analysis, I wanted to refer to some source containing the material on the statistical theory of correlation and regression that would be needed to understand Wright's work. I could not find a suitable source, so I decided to write it myself, using notes I have made on the subject over the years.

Notes on Correlation, Part 1 covers the general concepts of correlation and regression, and the justification for using them (which, like much in the foundations of statistics, is a moot point). Part 2 proves some key theorems on the correlation and regression of two variables, and discusses problems of interpretation. Part 3 outlines the theory of correlation and regression for more than two variables. This is particularly important for the understanding of Wright's path analysis.

After the note on Path Analysis I returned to the series as planned, with the following notes.

The measurement of kinship tries to explain Wright's approach to this, by contrasting it with the now more familiar methods of Gustave Malecot. The essential point is that Wright's kinship coefficients are in principle correlation coefficients rather than probabilities of identity (as in Malecot's system). A consequence of this is that kinship (or relatedness, or inbreeding) is relative to a specified population. The kinship between randomly selected individuals within such a population, relative to that population, is on average zero. This has implications for Hamiltonian inclusive fitness. Another implication is that Wright's kinship coefficients can be, and often are, negative (unlike Malecot's probabilities).
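The difference between the two systems can be illustrated with a small sketch. Assuming we already have Malecot-style identity probabilities f for some pairs of individuals, one common way to re-express them relative to a reference population is (f - f̄)/(1 - f̄), where f̄ is the average identity probability for random pairs in that population; this has the same form as the familiar relation F_IS = (F_IT - F_ST)/(1 - F_ST). The function name and the numbers are hypothetical, and this is not a reconstruction of Wright's own path-coefficient derivation.

```python
from statistics import mean


def relative_kinship(f, f_ref):
    """Re-express an identity probability f relative to a reference
    population whose average identity probability is f_ref."""
    return (f - f_ref) / (1 - f_ref)


# Hypothetical identity probabilities for four pairs sampled from one population
pair_identity = [0.0625, 0.0, 0.125, 0.0625]
f_ref = mean(pair_identity)  # the population average, 0.0625
relative = [relative_kinship(f, f_ref) for f in pair_identity]

print(relative)        # about [0.0, -0.067, 0.067, 0.0]: one pair is negative
print(mean(relative))  # zero (up to rounding) by construction
```

Every identity probability is non-negative, but once the population average is taken as the baseline, pairs less related than average get negative values and the mean over the population is zero.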

Wright's F-statistics. Wright devised a series of statistics known as F-statistics for measuring relationship and diversity within or between populations. The best known of these is FST, which is widely used as a measure of the genetic divergence between sub-populations of a species. My note traces the evolution of the F-statistics in Wright's work.
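As a rough illustration, here is a minimal sketch of FST for a single locus with two alleles, assuming equally weighted subpopulations and known (not estimated) allele frequencies. It computes the quantity two ways: as Wright's standardised variance of allele frequencies, var(p)/(p̄(1 - p̄)), and as the later heterozygosity-based formulation (H_T - H_S)/H_T associated with Nei; for these population parameters the two agree algebraically. The sample-size corrections used by practical estimators are deliberately omitted, and the function names are hypothetical.

```python
from statistics import mean, pvariance


def fst_variance(freqs):
    """Standardised variance of allele frequency among subpopulations.
    Undefined (division by zero) if the allele is absent or fixed overall."""
    p_bar = mean(freqs)
    return pvariance(freqs) / (p_bar * (1 - p_bar))


def fst_heterozygosity(freqs):
    """(H_T - H_S) / H_T with H = 2p(1 - p), subpopulations weighted equally."""
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-subpopulation diversity
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)               # diversity of the pooled population
    return (h_t - h_s) / h_t


freqs = [0.2, 0.8]  # hypothetical allele frequencies in two subpopulations
print(fst_variance(freqs), fst_heterozygosity(freqs))  # both about 0.36
```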

Genetic drift. This note was originally going to be called 'Inbreeding and the decline of genetic variance', but that is not a very catchy title. I try to clarify the connection between genetic drift, inbreeding, and the decline of heterozygosis (a measure of genetic diversity). The note includes a detailed commentary on Wright's proof that heterozygosis tends to decline by a proportion 1/2N per generation, where N is the number of breeding individuals.
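The 1/2N rate can be checked numerically. The sketch below assumes the simplest Wright-Fisher model (2N gene copies drawn at random from the previous generation, no mutation, migration or selection). In that model the expected value of 2p(1 - p) follows H_t = H_0(1 - 1/2N)^t, and the code compares this formula with the average over simulated replicates. The function names and parameter values are arbitrary choices for illustration.

```python
import random


def expected_heterozygosity(h0, n, t):
    """Expected heterozygosity after t generations in a population of n diploids."""
    return h0 * (1 - 1 / (2 * n)) ** t


def simulate_heterozygosity(p0, n, t, replicates=2000, seed=1):
    """Mean of 2p(1 - p) after t generations of random sampling of 2n genes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        p = p0
        for _ in range(t):
            # each of the 2n genes in the next generation copies a random parental gene
            copies = sum(rng.random() < p for _ in range(2 * n))
            p = copies / (2 * n)
        total += 2 * p * (1 - p)
    return total / replicates


n, t, p0 = 50, 20, 0.5
print(expected_heterozygosity(2 * p0 * (1 - p0), n, t))  # about 0.409
print(simulate_heterozygosity(p0, n, t))                  # close to the line above
```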

Population size. I discuss the concept of effective population size and point out that Wright overlooked an important class of cases where effective population size is much larger than the current number of breeding adults.

Migration. Migration is important to Wright's theories because even very low rates of migration suffice to prevent subpopulations of a species diverging by genetic drift. The note traces Wright's work on the subject including his famous article on 'Isolation by distance'.
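Why a very low migration rate suffices can be seen from the island-model result usually attributed to Wright. At equilibrium between drift and migration, F_ST ≈ 1/(4Nm + 1), where N is the size of each subpopulation and m the proportion of migrants per generation. The sketch below (function names hypothetical) compares that approximation with the exact equilibrium of the simple recursion F' = (1 - m)²[1/2N + (1 - 1/2N)F]. Divergence depends on Nm, the number of migrants, so one migrant per generation keeps F_ST at about 0.2, far from the value of 1 that complete divergence would give, however large N is.

```python
def fst_island_exact(n, m):
    """Equilibrium of F' = (1 - m)**2 * (1/(2n) + (1 - 1/(2n)) * F)."""
    keep = (1 - m) ** 2  # probability that neither gene came from a migrant
    return keep / (2 * n) / (1 - keep * (1 - 1 / (2 * n)))


def fst_island_approx(n, m):
    """The small-m approximation 1 / (4Nm + 1)."""
    return 1 / (4 * n * m + 1)


for n in (100, 10_000, 1_000_000):
    m = 1 / n  # one migrant per generation, whatever the subpopulation size
    print(n, round(fst_island_exact(n, m), 3), round(fst_island_approx(n, m), 3))
```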

The adaptive landscape. Wright is closely associated with the concept of the adaptive landscape, though as far as I can find Wright himself never used this term. My note especially aims to explain the concept of a selective peak, and why Wright believed that there are a multitude of distinct selective peaks, usually of different fitness. In a related post on the Adaptive Landscape: Miscellaneous points, I discussed some issues not directly concerned with Wright, such as Stuart Kauffman's NK model, the relationship between selective peaks for genotypes and for gene frequencies, and the accessibility and stability of peaks.
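For readers unfamiliar with Kauffman's NK model, here is a minimal sketch of one common variant: N loci with two alleles each, where each locus contributes a random fitness value that depends on its own allele and on the alleles at the K loci following it (wrapping around), and genotype fitness is the average contribution. A genotype counts as a selective peak if no single-locus change raises its fitness. With K = 0 there is exactly one peak; with larger K the landscape typically breaks up into many peaks, which is the feature relevant to Wright's multitude of selective peaks. The function names and seed are hypothetical.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function for a random NK landscape in which
    locus i interacts with the k loci that follow it (wrapping around)."""
    rng = random.Random(seed)
    neighbours = [[(i + j) % n for j in range(k + 1)] for i in range(n)]
    # One table per locus: a random contribution for each combination of the k + 1 relevant alleles.
    tables = [
        {states: rng.random() for states in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        return sum(
            tables[i][tuple(genotype[j] for j in neighbours[i])] for i in range(n)
        ) / n

    return fitness


def count_local_peaks(n, fitness):
    """Count genotypes that no single-locus change can improve."""
    peaks = 0
    for genotype in itertools.product((0, 1), repeat=n):
        current = fitness(genotype)
        improvable = False
        for i in range(n):
            mutant = list(genotype)
            mutant[i] = 1 - mutant[i]
            if fitness(tuple(mutant)) > current:
                improvable = True
                break
        if not improvable:
            peaks += 1
    return peaks


for k in (0, 2, 5, 9):
    print(k, count_local_peaks(10, make_nk_landscape(10, k, seed=3)))
```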

The shifting balance theory of evolution.
This final note in the series is split into two parts. Part 1 examines the origins of Wright's famous shifting balance theory, and analyses the contents of the original version of the theory, as published in 1929-31. Part 2 explores subsequent developments in the theory, some of which are very important. Notably, as early as 1932 Wright abandoned his insistence that only genetic drift in small populations could take a population away from a suboptimal selective peak, as he now accepted that environmental fluctuations could have the same effect. In my view this removed much of the rationale for Wright's emphasis on population structure in evolution, though Wright himself never fully absorbed the implications of the change, which many biologists have overlooked.

Altogether, this series of posts would come to over 100 print pages. That's very nearly a book's worth! Alas, even if there were a market for such a boring book, I don't have the time, energy, or expertise to research and write it to the necessary standards, but I hope that anyone making a serious study of Wright will find something useful in my posts.

R. A. Fisher

My various notes on R. A. Fisher are mainly attempts to correct misunderstandings of his views which I have come across from time to time.

Fisher and Wright on population size (and here). These two notes were written shortly before I started my series of notes on Sewall Wright. Fisher is sometimes thought to have believed that entire species are randomly mating single populations. As this is palpably false, it is worth examining what Fisher really thought. In my first note I show, using Fisher's publications and letters, that he believed that migration between districts was usually frequent enough to offset their divergence by genetic drift. This does not imply that species are literally random mating (if they were, migration would be irrelevant), but only that for many purposes they can be treated as if they were. In the second note I examine what Fisher says about the actual population size of species. An Addendum is here.

Fisher on epistasis. It is sometimes claimed that Fisher ignored epistatic gene effects or considered them unimportant. My post shows that Fisher took account of epistasis in a variety of ways. Two further posts produce additional evidence: here and here.

Fisher on the adaptive landscape. Following my note on Sewall Wright's adaptive landscape concept, I wrote this post on Fisher's views on the subject. Notably, he believed that environmental change, particularly in the biotic environment, made the idea of a constant landscape inapplicable.

Fisher on inclusive fitness. In this short post I draw attention to a passage by Fisher which contains a general anticipation of Hamilton's concept of inclusive fitness.

J. B. S. Haldane

I have written much less about Haldane than about Fisher and Wright. This is not because Haldane was less important or original. Haldane probably originated more of the basic results of population genetics than either of the others. But I tend to write posts mainly on issues that are obscure or controversial, whereas most of Haldane's results are clear and uncontroversial.

I have however devoted two posts to Haldane: one on Haldane's Dilemma, which examines Haldane's pioneering attempt to quantify the amount of genetic change possible by natural selection in a given period (see here for some corrections), and one on Haldane's Selection Theorem, which comments on Haldane's proof that the probability that an individual favourable mutation will be successful is approximately 2s, where s is the (small) coefficient of selection.
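The 2s result can be recovered numerically. The sketch below uses a standard branching-process version of the argument, assuming that each copy of the mutant gene leaves a Poisson-distributed number of copies, with mean 1 + s, in the next generation; the function name is hypothetical. The probability q that the mutant line eventually dies out is the smallest root of q = exp((1 + s)(q - 1)), and the survival probability 1 - q comes out close to 2s when s is small, but noticeably below it when s is large.

```python
import math


def survival_probability(s, tol=1e-15, max_iter=1_000_000):
    """Probability that a single new mutant copy is never lost, assuming
    Poisson offspring with mean 1 + s (s > 0)."""
    q = 0.0  # iterating from 0 converges to the smallest root (the extinction probability)
    for _ in range(max_iter):
        q_next = math.exp((1 + s) * (q - 1))
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1 - q


for s in (0.1, 0.01, 0.001):
    print(s, survival_probability(s), 2 * s)
```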

Odds and ends

Finally, a few posts cover other issues.

Good Point? arises from a study by the economists Samuel Preston and Cameron Campbell. If intelligence is partly inherited, and less intelligent people on average have more children, it seems to follow that the average intelligence of the population will decline from one generation to the next. Preston and Campbell use an elaborate mathematical model to show that this is not necessarily the case. My post examines the argument, using a much simpler model due to the statistician I. J. Good. Briefly, I conclude that the argument is mathematically possible but biologically unrealistic. The case illustrates the danger of using sophisticated mathematics without properly considering the underlying assumptions.

Heterosis and the Flynn Effect looked sceptically at claims that heterosis (reduced inbreeding) might explain the long term increase in IQ scores.

Origins of the British is a piece examining the evidence on the ethnic origins of the people of the British Isles, following the recent book by Stephen Oppenheimer.

Group Selection and the Wrinkly Spreader takes a look at a recent defence of group selection by E. O. and D. S. Wilson, by examining in detail an example (the 'wrinkly spreader' variant of a certain bacterium) that they claim is a good case of group selection in action. It isn't.

Ethnic Genetic Interests Revisited looks at the new edition of Frank Salter's book Ethnic Genetic Interests, which includes comments on my own critique of the first edition.

Genophilia traces the origins of the term 'genophilia', which has been wrongly attributed to Francis Galton.


Sunday, November 23, 2008

R. A. Fisher on Inclusive Fitness (again)   posted by DavidB @ 11/23/2008 04:12:00 PM

I recently posted a note on an anticipation of Hamilton's concept of inclusive fitness by R. A. Fisher in the Genetical Theory of Natural Selection.

As I pointed out, in that passage Fisher did not quantify the effect of what he called 'indirect effects of natural selection', so he did not state what we now call 'Hamilton's Rule' (though later in GTNS he came close to it in his discussion of distasteful insects).

However, I have noticed the following passage in a letter from Fisher to Leonard Darwin dated 27 June 1929, which states Hamilton's Rule for the special case of parental care:

The reproductive value at different ages must determine the extent to which parental care pays. If all ages were of equal reproductive value, a species would tend to benefit its offspring up to the point at which the offspring gains double the advantage which the parent loses, but no further. Of course immature offspring are usually worth much less, and so should be cared for only at a cheaper rate still. But if crocodiles were able to recognise their mature offspring, I suppose they would co-operate with them not only on terms of mutual advantage, but on terms of joint advantage so long as the loss of either did not exceed half the gain of the other. Hence society starts with the family. - Natural Selection, Heredity and Eugenics: Including selected correspondence of R. A. Fisher with Leonard Darwin and others, edited by J. H. Bennett (1983), p.104-5

The important qualification about the maturity of the offspring is probably also in Hamilton somewhere, but I can't immediately find it. Dawkins makes a similar point in his 'Twelve Misunderstandings of Kin Selection'.

Added: I had another skim through Hamilton's papers, but I still couldn't find a discussion of the maturity point. However, I imagine Hamilton would have said that differences of maturity should be taken into account in quantifying the 'benefit' to an offspring of a given amount of parental care. So, for example, in a species with very high infant mortality, the benefit of a given amount of resources to an immature offspring, measured by the expected number of its own future offspring, would be less (other things being equal) than to an offspring who has already reached sexual maturity. Against this, 'other things' are seldom equal, and the benefit of a given amount of resources (e.g. food) to a newborn may be much greater than to an older offspring which can already fend for itself.
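Fisher's condition can be written as a one-line version of what is now called Hamilton's Rule: help is favoured when r·b > c, with r = 1/2 for parent and offspring in an outbred diploid population. The sketch below assumes, following Fisher's remark on reproductive value, that the benefit is weighted by the reproductive value of the recipient and the cost by that of the actor. The function and parameter names, and the value of one half for an immature offspring, are hypothetical illustrations, not taken from Fisher or Hamilton.

```python
def care_is_favoured(benefit, cost, relatedness=0.5, recipient_value=1.0, actor_value=1.0):
    """Hamilton's Rule with benefit and cost weighted by reproductive value:
    help is favoured when r * b * v_recipient > c * v_actor."""
    return relatedness * benefit * recipient_value > cost * actor_value


# Equal reproductive value: the offspring must gain more than double the parent's loss.
print(care_is_favoured(benefit=2.1, cost=1.0))   # True
print(care_is_favoured(benefit=1.9, cost=1.0))   # False
# An immature offspring worth half as much needs a gain of more than four times the loss.
print(care_is_favoured(benefit=2.1, cost=1.0, recipient_value=0.5))  # False
print(care_is_favoured(benefit=4.5, cost=1.0, recipient_value=0.5))  # True
```

The same inequality covers Fisher's crocodiles: with r = 1/2, co-operation pays so long as the loss to one party does not exceed half the gain to the other.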


Sunday, November 09, 2008

Notes on Sewall Wright: The Shifting Balance Theory (Part 2)   posted by DavidB @ 11/09/2008 01:32:00 AM

Part 1 of this note dealt with Sewall Wright's Shifting Balance theory of evolution (the SBT) in its original form, as propounded between 1929 and 1931. This final part deals with subsequent developments in the theory. These include refinements and elaborations, some changes of emphasis, one major addition, and one major change of substance. In particular I will cover:

1. The role of new mutations
2. The concept of selective peaks
3. The effect of changes in environment
4. The adaptiveness of evolution
5. The process of intergroup selection
6. The three phases of the shifting balance.

I will throw in a few remarks about Fisher and Haldane as well.

NB: all page references are to Evolution: Selected Papers unless otherwise stated. Spelling and punctuation of quotations are as printed (some use American and some use British spelling). Square brackets indicate comments of my own.

1. The role of new mutations

First, a few words are necessary about the meaning of 'mutation'. In the 1930s very little was known about the physical and chemical nature of genes and therefore about the nature of changes to genes, in other words 'mutations'. In 1939 Wright gave a useful statement of the assumptions then current: 'Presumably any particular gene can arise at a single step from only certain of the others and in turn mutate only to certain ones but the latter may be capable of producing mutations which could not have arisen from the former at one step and so on through a branching network of potentially unlimited extent' (306). This implies a 'step-by-step' evolution of genes themselves. Each gene may be said to have a first appearance in time, though recurrence of the same gene at different times is not excluded. The occurrence of mutations depends on the prior existence of the genes of which they are variants, so a particular type of mutation itself has an origin in time. The opportunity for mutations of a particular type will also depend on the frequency of the relevant genes in the population. If a gene is changing in frequency, the opportunity for new mutations of that gene will also be changing. We may therefore expect the rate of specific mutations to increase or decrease over time. This may explain some otherwise obscure comments in Fisher's Genetical Theory of Natural Selection (GTNS). In several places Fisher assumes that any new mutation will initially have a low rate of occurrence, but that this rate will increase over time (see especially GTNS p.78). This assumption makes sense if Fisher held the same view as Wright on the nature of mutations.

Wright's original formulation of the SBT said little about the role of beneficial new mutations in evolution. In 'Evolution in Mendelian populations' (EMP) (1931) Wright said only that in very large populations 'there is little scope for evolution. There would be complete equilibrium under uniform conditions if the number of allelomorphs at each locus were limited. With an unlimited chain of possible gene transformations, new favorable mutations should arise from time to time and gradually displace the hitherto more favored genes but with the most extreme slowness even in terms of geologic time' (150). This negative assessment of the prospects for evolution in large undivided populations conflicted with that of Fisher in GTNS, which appeared in 1930, after EMP had been sent for printing. (A few short notes were added to take account of Fisher's work, but major changes were not possible.) Whereas Wright had concluded that large freely interbreeding populations were unfavourable to progressive evolution, Fisher believed that large populations (without strong barriers to gene flow) were favourable to evolution because of the greater scope they offered to new mutations. Fisher reinforced this in his published review of EMP, saying that 'even under static conditions, unless it is postulated that the organism is as well adapted as it could possibly be (in which case, obviously, evolutionary improvement is impossible), the equilibrium will be broken by the occurrence of any favourable mutation, of which a steady stream will doubtless occur in one or other of the very numerous individuals produced in each generation. The advantage of the large populations in picking up mutations of excessively low mutation rate seems to be overlooked [by Wright]... ' (Natural Selection, Heredity and Eugenics, p.288). Here, then, we find one of the major differences in the evolutionary theories of Wright and Fisher.

Wright elaborated and defended his position on this issue on several occasions, beginning with his own review of Fisher's GTNS in 1930. He notes that Fisher's 'scheme appears to depend on an inexhaustible flow of new favorable mutations. Dr. Fisher does not go into this matter of inexhaustibility but presumably it may be obtained by supposing that each locus is capable of an indefinitely extended series of multiple allelomorphs, each new gene becoming a potential source of genes which could not have appeared previously. The greatest difficulty seems to be in the posited favorable character of the mutations. Dr. Fisher, elsewhere presents cogent reasons as to why the great majority of all mutations should be deleterious. He shows that all mutations affecting a metrical character 'unless they possess countervailing advantages in other respects will be initially disadvantageous' [see Note 1]. He shows that in any case the greater the effect, the less the chance of being adaptive. [See Note 2] Add to this the point that mutations as a rule probably have multiple effects, and that the sign of the net selection pressure is determined by the greater effects, and it will be seen that the chances of occurrence of new mutations advantageous from the first are small indeed' (85).

There is a risk of ambiguity in this conclusion. If Wright means to say that only a small proportion of new mutations will be initially advantageous, his arguments are plausible, though not conclusive. If on the other hand he means to say that the 'chances of occurrence' of any such mutations, even in a large population, are small, the arguments are quite insufficient. It would be like confusing the probability that John Smith will die tomorrow, which is small, with the probability that someone will die tomorrow, which in a large population is virtually certain. Suppose that in a population of one billion, one in 100,000 individuals in each generation show some new mutation or other. There would then be 10,000 such new mutations in the population in each generation. Evidently, even if only a very small proportion of these mutations are advantageous, there might still be (in Fisher's terms) a 'steady stream' of them. Whether or not this is the case is an empirical matter.
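The arithmetic can be made explicit. The sketch below uses the paragraph's illustrative figures (a population of one billion, one new mutation per 100,000 individuals per generation) and adds a hypothetical fraction of advantageous mutations, 1 in 10,000, chosen only for illustration. It computes the expected number of advantageous new mutations per generation and the probability that at least one occurs: the difference between asking about 'John Smith' and asking about 'someone'.

```python
import math

population = 1_000_000_000
new_mutations = population // 100_000   # one new mutation per 100,000 individuals
advantageous_fraction = 1e-4            # hypothetical, for illustration only

expected_advantageous = new_mutations * advantageous_fraction
# P(at least one advantageous mutation) = 1 - (1 - f) ** n, computed stably with log1p
p_at_least_one = 1 - math.exp(new_mutations * math.log1p(-advantageous_fraction))

print(new_mutations)          # 10000
print(expected_advantageous)  # about 1
print(p_at_least_one)         # about 0.63
```

Although any single new mutation is very unlikely to be advantageous under this assumption, the chance that at least one advantageous mutation appears somewhere in the population in a given generation is better than even.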

Wright made similarly negative comments about new mutations on various occasions when defending the SBT:

1932: [under constant conditions] 'further evolution can only occur by the appearance of wholly new (instead of recurrent) mutations, and ones which happen to be favorable from the first. [Comment: this is valid only if 'new' means 'new under the same conditions'. Evolution might also occur through recurrence of mutations previously unfavourable but now favourable under new conditions.] Such mutations would change the character of the field [the 'adaptive landscape'] itself, increasing the elevation of the peak occupied by the species. Evolutionary progress through this mechanism is excessively slow since the chance of occurrence of such mutations is very small [comment: note the same ambiguity as in Wright's review of GTNS] and, after occurrence, the time required for attainment of sufficient frequency to be subject to selection to an appreciable extent is enormous' (165). [The last remark is puzzling. Any favourable new mutation is subject to selection from the outset, but it is at risk of being lost by random drift before it becomes safely established. It is not 'safe' until it has recurred a few hundred times. But in a large population, even with very low mutation rates this should only take a few hundred generations, which is not long in evolutionary time. This is one of Fisher's main arguments for the evolutionary advantage of large population size: see GTNS p.78. Once a mutation has reached a level of a hundred or so copies - say, a frequency of 1 in 10,000,000 in a population of a billion - the rate of advance will depend on the selective advantage of the gene. If the selective advantage is such as to double its frequency in 1,000 generations - equivalent to an advantage of rather less than 1 in 1,000 - the gene will go from first appearance to fixation (or equilibrium against back-mutation) in less than 30,000 generations. [See Note 3] This is not very long in geological time, though it would be imperceptibly slow to human observers, and until the later stages the gene would still be rare.]
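The figures in this comment can be checked with a few lines of arithmetic. The sketch below assumes the simplest (haploid) selection model, in which the ratio p/(1 - p) for a favoured gene is multiplied by 1 + s each generation, so that a rare gene doubles its frequency every ln 2 / s generations or so. It recovers the 'rather less than 1 in 1,000' and estimates how long the gene takes to rise from a frequency of 1 in 10,000,000 to one half. It does not model the initial establishment phase or the final approach to fixation, which depend on dominance and other details that the comment above leaves to Note 3. The function names are hypothetical.

```python
import math


def advantage_from_doubling_time(generations):
    """Selective advantage s such that (1 + s) ** generations == 2."""
    return 2 ** (1 / generations) - 1


def generations_between(p_start, p_end, s):
    """Generations for the odds p / (1 - p) to grow from p_start to p_end
    when they are multiplied by (1 + s) each generation."""

    def odds(p):
        return p / (1 - p)

    return math.log(odds(p_end) / odds(p_start)) / math.log1p(s)


s = advantage_from_doubling_time(1_000)
print(s)                                  # about 0.00069, i.e. under 1 in 1,000
print(generations_between(1e-7, 0.5, s))  # about 23,250 generations
```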

1939: 'there is very little chance of occurrence of wholly new alleles in a large freely interbreeding population. There is also very little chance that any new mutation will be favorable at its first occurrence and even if favorable very little chance that it will attain sufficient frequency to be subject to selection to an appreciable extent' (321) [The italics for 'large' are Wright's own. The implicit assumption seems to be that in a large population every good mutation will already have been found. But note my previous comment that the advantageousness of a mutation is relative to conditions.]

1948: 'Presumably all mutations that are likely to arise at one or two steps from the more abundant genes present in the population have been tried by natural selection and found wanting, and thus are found at negligibly low frequencies if at all. There may be very valuable mutations which could only arise through a succession of unfavourable ones but these will have very little chance of occurring' (535) [see the previous comments]

1959: 'A genetic system can take the step from one selective peak to another one only by some non-selective process. A novel mutation may do this by creating a new peak, but this must be an excessively rare event' (Tax, p.451)

Wright maintained his opposition to the importance of new mutations to the end of his career. But his arguments are always brief and unquantified. There is a recurring ambiguity, as noted above, between the probability that a given new mutation will be advantageous, and the probability that any advantageous new mutation will occur. Fisher's view (GTNS p.78) was that in large populations, of the order of a billion (which includes most plant and invertebrate animal species), such mutations would occur often enough to be important in evolution. Wright opposed this conclusion, but it is difficult to avoid the feeling that in doing so he was trying to shore up a position which he had adopted without first considering mutation. It should at once be said that Fisher was equally stubborn (and more intemperate) in defending his own positions.
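The ambiguity can be made concrete with a rough calculation. The sketch below assumes, hypothetically, that one particular mutation arises at a rate of one in a billion per gene per generation in a population of a billion, with an advantage of 1 in 1,000 once it has arisen. It also uses Haldane's well-known approximation that a single new mutant with a small advantage s escapes loss by drift with probability of about 2s. None of these figures come from Wright or Fisher; they are illustrative only.

```python
# Illustrative assumptions, not figures from Wright or Fisher
N = 1_000_000_000   # population size
u = 1e-9            # rate at which this particular mutation arises, per gene per generation
s = 0.001           # selective advantage of the mutation once it has arisen

# Expected new occurrences per generation (one gene per individual, as in a haploid model)
new_copies_per_generation = N * u

# Haldane's approximation: a single new copy escapes loss by drift with probability ~2s
p_survive_one = 2 * s


def prob_established(generations):
    """Chance that at least one occurrence has escaped loss by drift after the
    given number of generations, treating occurrences as independent trials."""
    k = new_copies_per_generation * generations
    return 1 - (1 - p_survive_one) ** k


for t in (10, 100, 500, 1000):
    print(f"{t:>5} generations: {prob_established(t):.3f}")
```

Each occurrence has only about a 1 in 500 chance of surviving, yet on these assumptions the mutation is more likely than not to have become established within 500 generations. A small probability per occurrence and a small probability of occurrence over a long period are quite different things.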

2. The concept of selective peaks

As noted in my post on Wright and the adaptive landscape, in 1932 Wright introduced the metaphor of a multidimensional field of gene combinations. I have discussed Wright's adaptive landscapes at length in that and other posts, so I will not repeat those discussions now. The point I wish to emphasize here is that the concept of selective peaks, valleys, etc., as introduced in 1932 was not just a new metaphor adopted for purposes of exposition, but an important addition of substance to the SBT.

From 1932 onwards it is a fundamental part of the SBT that there is a multiplicity of selective peaks in the field of possibilities available to a population. Many of these peaks are of different heights (fitness). Under the influence of selection alone, and under constant conditions, a population cannot move from one peak to another. Under selection a population will tend to move towards one of the peaks, usually the closest, which will seldom be the highest. It is therefore very likely that a population will be 'trapped' on an inferior peak, from which it cannot move purely by selection under constant conditions.

This aspect of the SBT is so important, and so familiar from Wright's later writings, that it is tempting to assume that in substance it was already there in the original version of the theory, even if the analogy of 'peaks' and 'valleys' was missing. In purely genetic terms, the meaning of a 'peak' in the landscape is that there is some set of gene frequencies such that any small departure from that set is opposed by selection. If there is more than one such set, there are multiple peaks. But the terminology of 'peaks', etc., is inessential. The substance of the theory could be stated quite well without it. It is therefore natural to expect some such equivalent statement in EMP, but I have not found one. It is true that, when discussing evolution in large populations in his 1929 summary, Wright does say that 'changed conditions cause a usually slight and reversible shift of the gene frequencies to new equilibrium points' (78), but in the context of his discussion in EMP (150) it appears that Wright was thinking only of a shift in the equilibrium between selection and mutation. His repeated claims that such shifts are essentially reversible would be difficult to reconcile with the concept of multiple peaks, and indeed, once Wright had clearly formulated that concept, he abandoned the claim of reversibility.
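The genetic meaning of a 'peak' can be illustrated with one of the simplest standard models that has two of them: a single locus where the heterozygote is less fit than either homozygote (underdominance). This is a textbook case, not an example used by Wright, and the fitness values are illustrative assumptions. Under selection alone the population goes to whichever fixed state lies on its side of an unstable intermediate frequency, so it can be trapped on the lower peak.

```python
# One-locus diploid model with underdominance (illustrative fitnesses):
#   AA: 1 + t   (the higher peak, fixation of A)
#   Aa: 1 - s   (the 'valley')
#   aa: 1       (the lower peak, fixation of a)


def next_freq(p, s, t):
    """Frequency of allele A after one generation of selection under random mating."""
    q = 1 - p
    w_A = p * (1 + t) + q * (1 - s)   # marginal fitness of A
    w_a = p * (1 - s) + q * 1.0       # marginal fitness of a
    w_bar = p * w_A + q * w_a         # mean fitness of the population
    return p * w_A / w_bar


def run(p, s=0.1, t=0.1, generations=2000):
    """Iterate selection alone: no drift, mutation, migration or environmental change."""
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


def unstable_point(s, t):
    """Frequency of A at which the marginal fitnesses of A and a are equal."""
    return s / (t + 2 * s)


print("unstable point:", round(unstable_point(0.1, 0.1), 4))
for start in (0.1, 0.3, 0.4, 0.9):
    print(start, "->", round(run(start), 4))
```

Starting at 0.3, just below the unstable point of 1/3, selection carries the population to fixation of a (mean fitness 1), although fixation of A (mean fitness 1.1) would be better. Any small departure from either fixed state is opposed by selection, which is exactly the genetic definition of a peak given above.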

The concept of multiple selective peaks is closely related to Wright's emphasis on epistatic fitness interactions, but this familiar feature of Wright's philosophy of evolution is also lacking from EMP. The beginnings of a new emphasis on epistasis can be found in 'Statistical theory of evolution' (1931), written after EMP but published slightly earlier. In discussing populations of intermediate size, Wright points out that 'it is the organism as a whole that is selected, not the individual genes, and a gene favored in one combination may be unfavorable in another' (95). And in subdivided populations 'exceptionally favorable combinations of genes may come to predominate in some of the subgroups' (95). But there is still, as far as I can see, no indication that even large populations may have alternative stable states, as proposed by Wright in 1932.

It is natural to wonder how Wright arrived at his 1932 conception of multiple selective peaks. It is possible that his reading of the section on 'Simple metrical characters' in GTNS had planted the seed. We know from Wright's correspondence that he was encouraged by receiving an offprint from Haldane in which the latter outlined similar ideas (Provine 275). It is also possible that Wright had privately reached his conception (without the geometrical analogy) much earlier, as Provine seems to think (Provine 275). But if Wright did indeed have the concept in mind when writing the paper which became EMP it is odd that he did not incorporate it in that work. I can only leave this as an unsolved puzzle.

3. The effect of changes in environment

As I have mentioned in previous posts (and as is also pointed out by Provine), until 1931 Wright considered that the evolutionary effects of temporary changes in environment would 'usually' or 'essentially' be reversible (78, 85, 150). But in 1932, with his paper on 'The roles of mutation, inbreeding, crossbreeding and selection in evolution', he took a new position. After introducing his concept of the multidimensional field of gene combinations, and the associated diagrams, he notes that 'the environment, living and non-living, of any species is actually in continual change. In terms of our diagram this means that certain of the high places are gradually being depressed and certain of the low places are becoming higher... Here we undoubtedly have an important evolutionary process and one which has been generally recognised. It consists largely of change without advance in adaptation. The mechanism is, however, one which shuffles the species around in the general field. Since the species will be shuffled out of low peaks more easily than high ones, it should gradually find its way to the higher general regions of the field as a whole' (167). This formulation is repeated, usually in similar words, in most of Wright's subsequent general surveys of evolutionary theory, e.g. 323, 374, 535, and 562.

It is perhaps not immediately clear (and Wright does not explain) why 'the species will be shuffled out of low peaks more easily than high ones'. Presumably it is partly because higher peaks may have stronger selection coefficients, and will therefore resist drift more strongly, but mainly because, other things being equal, higher peaks will have wider zones of attraction. A population may therefore drift further from the peak but still be pulled back towards it by selection. In geometrical terms, if two solid figures have the same shape, the taller figure will have the larger base. In genetic terms, the higher the fitness of a genotype relative to the average fitness of the population, the wider will be the range of gene frequencies within which the genes making up that genotype will be positively selected. But this is not an absolute rule. If a peak of fitness depends on very specific epistatic interactions of several genes, the peak may be high but narrow, like a spike. In this case a population may be easily jolted out of a high peak by environmental change, and never return to it. Changing environments may therefore be expected to promote mainly genes that are advantageous in a wide range of genetic combinations.
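The suggestion that higher peaks tend to have wider zones of attraction can be checked in the same kind of toy model: a single locus with underdominance (an illustrative assumption, not Wright's own example), where the aa peak has fitness 1, the heterozygote 1 - s, and the AA peak 1 + t. The AA peak attracts every frequency of A above the unstable point s / (t + 2s), so its zone of attraction widens as t, the height of that peak, increases.

```python
def basin_widths(s, t):
    """Widths, in frequency of A, of the zones of attraction of the two peaks
    in a one-locus underdominance model: AA = 1 + t, Aa = 1 - s, aa = 1.

    The boundary is where the marginal fitnesses of A and a are equal:
    p(1 + t) + q(1 - s) = p(1 - s) + q, which gives p = s / (t + 2s)."""
    boundary = s / (t + 2 * s)
    return boundary, 1 - boundary   # (zone of the aa peak, zone of the AA peak)


s = 0.1  # depth of the valley (illustrative)
for t in (0.0, 0.05, 0.1, 0.2, 0.5):
    low, high = basin_widths(s, t)
    print(f"t = {t:.2f}: aa zone {low:.3f}, AA zone {high:.3f}")
```

With equal peaks the boundary sits at a frequency of 0.5; raising the AA peak pushes the boundary towards the lower peak. A one-locus model cannot show the other case mentioned above, a high but narrow 'spike' created by epistasis among several genes.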

We are bound to ask why Wright changed his mind about the effects of environmental change. Wright himself gives no help on this point, because he never (I think) admitted that he had changed his mind. The change in 1932 goes together with Wright's formulation of the adaptive landscape concept, and in one sense goes very naturally with it. If we accept that there are multiple peaks of fitness in the landscape, and that it is largely a matter of chance which peak is most accessible to a population, then any factor which causes populations to move in a quasi-random way around the landscape could have the effect of 'shuffling' the population from one zone of attraction to another. But in another sense there is a tension between the landscape concept and environmental change, since the effect of environmental change is not so much to move the population around a fixed underlying landscape as to modify the landscape itself. As several commentators have suggested, in a changing environment the proper analogy is not so much with a solid landscape as with a choppy sea.

It is quite possible that Wright's change of mind in 1932 resulted simply from his own reflection on the issues. But he may also have been influenced by the positions already taken by Fisher and Haldane. As I mentioned in my post on Fisher and epistasis, in the section on 'Simple metrical characters' in GTNS Fisher had pointed out that metrical traits under stabilising selection could lead to multiple stable equilibrium gene frequencies, and that changes in selection coefficients due to environmental change could produce a lasting shift from one equilibrium to another. Wright had certainly read this section of GTNS, since he quotes from it in his review of the book. At that time (1930) he still thought that the effects of environmental change would usually be reversible, but he qualifies that position, saying: 'It may be granted that an irregular sequence of environmental conditions would result occasionally in irreversible changes (because of epistatic relationships), thus giving a real, if very slow, evolutionary process... ' (85). Over the next year Wright may have come to reconsider whether the process would only be 'occasional'. Haldane's The Causes of Evolution (1932, p.56) also contains a highly relevant passage: 'the change from one stable equilibrium to another may take place as the result of the isolation of a small unrepresentative group of the population, a temporary change in the environment which alters the relative viability of different types, or in several other ways...'. Unfortunately I do not know the exact dates of publication of Haldane's book and Wright's article of the same year, so it is not clear whether Wright could have seen it before writing his article. 
Wright had certainly read an article by Haldane of 1931 on 'Metastable Populations', which also discusses the theory of multiple equilibria, but this article refers only to chance fluctuations in the composition of populations, and not to environmental change, as possible reasons for a switch between alternative equilibria.

Whatever the reasons for Wright's new position on environmental fluctuation, he cannot be accused of playing down its importance. Several times he emphasised it: 'here we undoubtedly have an evolutionary process of major importance' (322), 'it can hardly be doubted that this has been one of the most important causes of evolution' (374), and 'there can be no doubt that a large part, perhaps the major portion, of evolutionary change, is of this character' (562). Nevertheless, it has often escaped the notice of later biologists, who assume that Wright continued to see genetic drift as the only way out of evolutionary stagnation.

Despite Wright's acceptance of, and even emphasis on, environmental change as a possible cause of 'peak shift', in some respects the implications of this new position were not fully assimilated into Wright's evolutionary philosophy. First, Wright might have been expected to rethink his position on the importance of population size and structure. On the face of it, a population of any size - large, small, or medium - may equally be affected by environmental change, and be equally likely to shift from one peak to another. If this is so, Wright's belief in the ineffectiveness of evolution in large populations would need to be reconsidered. I am not aware that Wright did so. Second, if environmental change is capable of upsetting the equilibrium, perhaps other factors might also do so. One such factor is migration. If different gene frequencies are able to evolve in subpopulations, through genetic drift or local selective pressures, then migration between subpopulations may upset the equilibrium in some or all of them. Wright's SBT does allow for one particular effect of migration: if one subpopulation happens to have reached a higher selective peak than others, migration from that subpopulation may shift others towards the higher peak. But my point is that any migration between subpopulations with different gene frequencies may break up the existing equilibria and give the opportunity for new, and often higher, equilibria to be attained. It therefore seems that even if new favourable mutations are too rare, and mutation pressure is too weak, shifts between equilibria might occur in three ways: genetic drift in small subpopulations, environmental changes (biotic or nonbiotic) which might in principle affect populations of any size, and migration between subpopulations of any size.

4. The adaptiveness of evolution

I can deal more briefly with this topic because it has been dealt with thoroughly by Provine, who traces the change in emphasis from nonadaptive evolution, even at the level of differences between species, in Wright's early work, to a much stronger emphasis on adaptation in the post-war writings.

The only point I would add is that even in his later writings Wright saw adaptation as occurring mainly through intergroup selection. Selection within a single population, large or small, is in Wright's view ineffective in producing continuing adaptation because any single population will soon become stuck on a suboptimal selective peak. Evolution within subpopulations leads to divergence between them, either through genetic drift or fluctuating environmental factors. Neither of these is adaptive with respect to long term trends. This is obvious in the case of genetic drift, but even selection under fluctuating environment may be regarded as a quasi-random factor. It contributes to long-term adaptation only by providing the variation between subpopulations on which intergroup selection can work: 'In this theory [the SBT], the joint effects of random drift and intrademic selection merely supply raw material for interdemic selection' (618). Some subpopulations will, by chance, have combinations of genes which have the potential to increase fitness in the species as a whole, and these are spread by intergroup (interdemic) selection. The processes which generate diversity between subpopulations may be seen as analogous to mutation in the conventional neo-Darwinian framework: each mutation may have some underlying cause, and is not strictly random in the sense that mutations in all directions are equally probable, but it is random with respect to the long-term adaptiveness of the species as a whole.

It should be evident by now that Wright's SBT is a radical departure from the neo-Darwinism of Fisher, Haldane, and most other theorists of the 'evolutionary synthesis', and it should not be surprising that it has found admirers among such rebels against the synthesis as the punctuationists and the group selectionists of the last few decades.

5. The process of intergroup selection

Despite its importance in the SBT, Wright says little about the process of intergroup (or interdemic) selection. In principle one can envisage three different ways in which groups with higher average fitness could influence the properties of the wider population:

a) one group may become extinct, and a fitter group may then expand into the unoccupied territory

b) one group may move into the territory occupied by another group and displace it without interbreeding

c) members of one group may migrate into the territory of another, and influence its gene pool by interbreeding.

I do not think that Wright ever mentions process (a). In various places he seems to favour either process (b) or (c). In 1931 he says that 'exceptionally favorable combinations of genes may come to predominate in some of the sub-groups. These may be expected to expand their range while others dwindle' (95, see also 152). Since there is no mention of interbreeding, this seems to be closest to process (b). In 1932, on the other hand, he says that successful local races 'will expand in numbers and by crossbreeding will pull the whole species toward the new position' (168). This is closer to process (c). In 1939 he combines both (b) and (c), saying successful races 'by cross breeding with other races, as well as by actual displacement of these, will pull the species as a whole toward the new position' (324). In 1940 he says that successful local races may 'tend to displace all other local strains by intergroup selection (excess migration)' (351). The word 'displace' tends to suggest process (b). Also in 1940 he refers to some groups 'supplying more than [their] share of migrants to other regions, thus grading them up to the same type' (375, see also 423). The reference to 'grading up' may seem to imply a mingling of populations and interbreeding (process (c)). There is of course no reason why both processes should not play a part, as explicitly suggested in 1939. But both face some obvious difficulties. With process (b) it is necessary to explain why there is no interbreeding between the different types. This would be surprising unless some degree of reproductive isolation - i.e. speciation - had already evolved. With process (c) the problem is to explain why interbreeding does not break up the advantageous gene combinations on which the superiority of one group is supposed to rest. 
The problem is especially severe if the successful group is initially small in relation to the whole population, as assumed at least in the original version of the SBT, with its reliance on genetic drift. This issue has been studied in several recent assessments of the SBT, the general conclusion being that the process is possible but, like the SBT as a whole, requires rather a lot of quantitative conditions to be met if it is to succeed.
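Part of the difficulty with process (c) is easy to quantify. Under random mating, the statistical association between alleles at two loci (linkage disequilibrium, D) decays by a factor (1 - r) each generation, where r is the recombination fraction between the loci. The sketch below uses this standard result, ignoring selection, to show how quickly the association between the genes making up a favourable two-locus combination is lost once the carriers interbreed; the starting value of D and the recombination fractions are illustrative assumptions.

```python
import math


def ld_after(d0, r, generations):
    """Linkage disequilibrium left after random mating with no selection,
    starting from d0, with recombination fraction r between the two loci."""
    return d0 * (1 - r) ** generations


def half_life(r):
    """Generations for the disequilibrium to fall to half its starting value."""
    return math.log(0.5) / math.log(1 - r)


# D = 0.25 is the largest possible value: both loci at frequency 0.5 and
# only the AB and ab chromosomes present
d0 = 0.25
for r in (0.5, 0.1, 0.01):
    print(f"r = {r}: half-life {half_life(r):.1f} generations, "
          f"D after 10 generations {ld_after(d0, r, 10):.4f}")
```

For unlinked genes (r = 0.5) the association is halved every generation. Selection in favour of the combination can slow this decay, but unless it is strong, a favourable combination carried by a small minority of migrants will be broken up quickly.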

As I mentioned in Part 1 of this note, 'intergroup selection' as envisaged by Wright has little to do with 'group selection' as envisaged by most of its recent advocates. Wright does not suggest that successful groups have evolved adaptations for group living, or that their members behave 'altruistically' towards each other. His claim is rather that the subdivided population structure allows some groups, by chance, to form combinations of genes that are advantageous to individual fitness. The higher mean fitness of the groups is the resultant of these individual fitness advantages.

However, in some of his later writings Wright does mention the possibility of the evolution of altruistic social traits through intergroup selection, for example: 'characters may be fixed [through random drift in small subpopulations] that are favourable to the group as a whole even though disadvantageous in individual competition' (536, see also Tax p.466). The problem, of course, is that this requires migration from other groups to be near zero if the 'altruistic' groups are to survive for more than a brief period without being undermined by freeloaders.

6. The three phases of the shifting balance

Finally, in his later writings on the SBT Wright often refers to three 'phases' of the shifting balance. Like the term 'shifting balance' itself, the 3-phase formulation seems to have been first used in the article of 1970 on 'Random drift and the shifting balance theory of evolution'. The phases are described as the 'phase of random drift', in which gene frequencies in each deme drift around the current selective peak; the 'phase of mass selection', in which a deme has drifted into the zone of attraction of a new selective peak, and moves rapidly towards it under the influence of selection; and the 'phase of interdemic selection'.

The explicit distinction between three phases seems to be new in 1970, but it is essentially a clarification of the process which had been implicit in various writings at least since 1932. I won't comment further on the substance of the three phases, which have already been discussed under various headings.


The purpose of this Note has been mainly to analyse the various aspects of the SBT in their chronological development, and not to assess its credibility. A few years ago I drew attention to some recent controversy, mainly in the journal 'Evolution', between biologists for and against the SBT. These discussions still seem to be relevant, but I note that some aspects of the SBT (or of Wright's philosophy of evolution more generally) have not been sufficiently recognised. One is the important change in 1932 when Wright recognised that environmental fluctuations, as well as genetic drift, could have lasting effects on the genetic equilibrium of a population. Despite Wright repeating this point on several occasions, it has been widely overlooked (Dobzhansky being a notable exception, and Provine a more recent one). There is some excuse for this if, as I have argued, the implications of the change were never sufficiently absorbed by Wright himself. The second point is that Wright was consistently negative towards the prospects for new favourable mutations. I have suggested that his comments involve an ambiguity between the rarity of new favourable mutations among all mutations, which is not disputed, and the rarity of occurrence of any such mutations, even in a large population and over a timescale of many generations. Wright's negative conclusions are only valid if such mutations are rare in both senses. His position implies that the differences between populations, whether closely related species or subpopulations of the same species, will arise mainly by different epistatic combinations of existing genes, rather than by the selection of new variants. This is in principle testable.

This is the last of my planned notes on Sewall Wright, and it is a relief to get to the end of the journey. I will not attempt any overall assessment at this stage, but I will probably prepare a post giving links to all the notes in the series, as well as to related notes on Fisher and Haldane.

Note 1. See GTNS p.107, but note that according to Fisher, if the effect of the mutation is small (say, no more than 1 percent of the standard deviation of the trait), even mutation rates as low as one in a million may be sufficient to overcome the initial selective disadvantage and eventually push the mutation into a frequency where it is favoured by selection.

Note 2. The reference is evidently to the section in GTNS on 'The nature of adaptation'. What Fisher shows, given his assumptions, is that:

a) other things being equal, a smaller mutation is always more likely to be advantageous than a larger one. (As Kimura pointed out much later, this is partially offset by the consideration that the size of any advantage is likely to be greater for a larger mutation, and this affects the probability that it will survive in the population. Overall, mutations with effects somewhat above the minimum size have the highest probability of survival.)

b) for any given size of mutation, the probability of being advantageous is lower the more aspects of fitness are affected by it.

Using a very schematic geometrical model, Fisher quantifies the probability that mutations of a given size will be advantageous. It is assumed in the model that the present position of the organism is at some distance from a local optimum. The probability that a mutation will be advantageous is inversely related both to the size of the mutation and to the square root of the number of dimensions of fitness affected. For very small mutations the probability is close to 1/2, declining to zero for mutations with an effect more than twice the distance between the starting point and the local optimum (this zero probability being an assumption built into the model, rather than proved by it). But note that the probabilities are not always very small, even for mutations with an effect quite substantial relative to the present distance between the organism and the optimum. Also, since the probability declines in proportion only to the square root of the number of dimensions of fitness affected, not to that number itself, the decline is not as rapid as might be feared. Contrary to some popularisations, Fisher does not claim that mutations with very large or complex effects are impossible, or even highly improbable, only that they are less likely to be advantageous than those with smaller and/or simpler effects.
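Fisher's geometric argument can be sketched numerically. The version below is a minimal sketch with my own variable names, not Fisher's notation: the organism sits at distance z from the optimum in n dimensions, a mutation moves it a distance r in a random direction, and the mutation counts as advantageous if the organism ends up closer to the optimum. The approximation usually quoted from Fisher gives the probability as 1 - Φ(r√n / d), where d = 2z is the diameter of the sphere around the optimum; the simulation checks this directly, and also shows that within the model's geometry no mutation larger than 2z can be advantageous.

```python
import math
import random


def fisher_approx(r, n, d):
    """Large-n approximation for the chance that a mutation of size r, affecting
    n dimensions, is advantageous; d is the diameter of the sphere centred on
    the optimum (twice the current distance to the optimum)."""
    x = r * math.sqrt(n) / d
    return 0.5 * math.erfc(x / math.sqrt(2))   # equals 1 - Phi(x)


def simulate(r, n, z, trials=5000, seed=1):
    """Monte Carlo estimate of the same probability. The organism sits at
    distance z from the optimum along the first axis; the mutation moves it
    a distance r in a uniformly random direction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A uniformly random direction: normalise a vector of Gaussians
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        u1 = v[0] / math.sqrt(sum(c * c for c in v))
        # New squared distance is z**2 + 2*z*r*u1 + r**2, which is smaller
        # than z**2 exactly when u1 < -r / (2*z)
        if u1 < -r / (2 * z):
            hits += 1
    return hits / trials


z = 1.0
for n in (1, 4, 25):
    for r in (0.05, 0.5, 1.0, 2.5):
        print(n, r, round(fisher_approx(r, n, 2 * z), 3), simulate(r, n, z))
```

For very small mutations the probability is close to 1/2 whatever the value of n; it falls as r√n grows relative to d. For n = 1 the approximation is poor, as one would expect of a formula derived for many dimensions.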

Note 3: Some readers may wonder how this can be reconciled with Haldane's rule of thumb that up to one mutation can go to fixation, on average, in every 300 generations - see my post on Haldane's Dilemma. I think the explanation has two parts. First, Haldane's '300 generations' estimate assumes that a gene under selection starts from a position of balance between adverse selection and mutation pressure, and then becomes favourable due to a change in environment. On this assumption the gene will already have a small but not negligible frequency in the population. Second, the '300 generations' figure does not mean that a single gene under selection goes from rarity to fixation in 300 generations, but rather that, on average, one gene could be fixed in every 300 generations. There is a difference between these two claims. Under typical selection intensities of 1 in 1000, or even 1 in 100, the process of fixation for a single initially rare gene would obviously take longer than 300 generations. Haldane's model assumes that there are a number of genes undergoing selection simultaneously or overlapping with each other. If we imagine, say, 100 genes starting the process of selection at the same time, and all taking 30,000 generations to reach fixation, the average number of genes fixed per generation over the period of 30,000 generations would be 100/30,000 = 1/300, but these would all reach fixation in a bunch at the end of the period. More realistically, if the periods of selection are overlapping in a more-or-less random way, and selection has been in progress for long enough, we would expect any period of, say, a thousand generations to see a few genes reaching fixation, with an average of about 1 per 300 generations.
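The arithmetic at the end of this note can be illustrated with a small simulation. It assumes, as in the note's example, that every selected gene takes 30,000 generations from the start of selection to fixation, and that start times are scattered at random so that on average one new episode begins every 300 generations; the number of genes and the random seed are arbitrary choices.

```python
import random

SWEEP_LENGTH = 30_000   # generations from start of selection to fixation (the note's example)
N_SWEEPS = 1_000        # number of genes selected over the period (arbitrary)
HORIZON = 300_000       # period over which episodes of selection begin
WINDOW = 1_000          # length of each observation window in generations

rng = random.Random(42)

# Integer start times scattered at random over the horizon: on average one
# new episode of selection begins every HORIZON / N_SWEEPS = 300 generations
starts = [rng.randrange(HORIZON) for _ in range(N_SWEEPS)]
fixations = [t + SWEEP_LENGTH for t in starts]

# Count fixations in consecutive 1,000-generation windows, beginning when the
# earliest episodes could have finished
n_windows = HORIZON // WINDOW
counts = [0] * n_windows
for f in fixations:
    counts[(f - SWEEP_LENGTH) // WINDOW] += 1

mean_per_window = sum(counts) / n_windows
print(f"mean fixations per {WINDOW:,} generations: {mean_per_window:.2f}")
print(f"fewest and most in any window: {min(counts)}, {max(counts)}")
```

The average is 1,000/300, about 3.3 fixations per thousand generations, i.e. one per 300 generations, even though no single gene goes from rarity to fixation in anything like 300 generations. The counts in individual windows vary around that average.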


R. A. Fisher, The Genetical Theory of Natural Selection, 1930; variorum edition, ed. J. H. Bennett, 1999.
R. A. Fisher, Natural Selection, Heredity and Eugenics: Including selected correspondence of R. A. Fisher with Leonard Darwin and others, ed. J. H. Bennett, 1983.
J. B. S. Haldane, 'Metastable populations', Proceedings of the Cambridge Philosophical Society, 27, 1931, 137-142.
J. B. S. Haldane, The Causes of Evolution, 1932; reprint ed. E. Leigh, 1990.
William B. Provine, Sewall Wright and Evolutionary Biology, 1986.
Sewall Wright, 'Physiological genetics, ecology of populations, and natural selection', in Evolution After Darwin, vol. 1, ed. Sol Tax, 1960 (Tax). Article first published in 1959.
Sewall Wright, Evolution: Selected Papers (ESP), ed. William B. Provine, 1986.
Sewall Wright, 'Random drift and the shifting balance theory of evolution', in Mathematical Topics in Population Genetics, ed. Kojima, 1970.


Thursday, October 23, 2008

Notes on Sewall Wright: The Shifting Balance Theory - Part 1   posted by DavidB @ 10/23/2008 03:52:00 AM

Finally, Sewall Wright's Shifting Balance theory of evolution. This will positively, definitely, categorically be my last note on Sewall Wright. Unless I think of something else.

For convenience I will split the note into two parts, one dealing with the theory in its original form, and the second dealing with subsequent developments.

Two catch-phrases indissolubly linked with Sewall Wright are the adaptive landscape, and the shifting balance. In preparing my note on Wright's concept of the adaptive landscape I was surprised to discover that Wright himself seldom if ever used this expression. I could not find a single example. I was therefore half-expecting that I would not find any reference to the shifting balance either - and I would have been half-right. Wright did use that term, but not, as far as I can find, until surprisingly late in his long career....

All page references are to Evolution: Selected Papers unless otherwise stated. See the References for details.

The first mention of 'the shifting balance'

Wright refers extensively to the 'shifting balance theory' in Volume 3 of his treatise Evolution and the Genetics of Populations, published in 1977, but I have not found this term in the first two volumes (1968 and 1969), or in anything else published by Wright before 1970. Nor was it used by authors such as Dobzhansky, Mayr, and Simpson, when describing Wright's ideas. The earliest use of the term I have found is in Wright's article of 1970 on 'Random drift and the shifting balance theory of evolution'. Admittedly, I have not read all of his 200-odd papers published before that year, but unless anyone can unearth an earlier use I suggest that the term was in fact coined in this article of 1970, some 50 years into Wright's career. The terminology of a theory is less important than its substance, but the absence of the term 'shifting balance' before 1970 (if I am right about this) does have two implications: first, we should not expect other authors (such as Fisher and Haldane) to have commented on the 'shifting balance theory' as such, and second, in the absence of a single label, it may not have been perceived as a single unified theory at all.

Earlier terminology

The apparent absence of the phrase 'shifting balance' before 1970 does not mean that Wright had never previously used the terms 'balance' or 'shifting', sometimes in close proximity. Wright was fond of the term 'balance', and related terms such as 'equilibrium' or 'poise', and used them for a variety of purposes, sometimes with a precise mathematical meaning, and sometimes more loosely. Here are some examples, chronologically arranged:

1931: 'The conditions favorable to progressive evolution as a process of cumulative change are neither extreme mutation, extreme selection, extreme hybridization nor any other extreme, but rather a certain balance between conditions which make for genetic homogeneity and genetic heterogeneity' (96)

1931: 'Evolution as a process of cumulative change depends on a proper balance of the conditions which... make for genetic homogeneity and genetic heterogeneity of the species' (158)

1941: 'The most general conclusion that can be drawn from the attempt to develop a mathematical theory of the simultaneous effects of all statistical processes that affect the genetic composition of populations is that in general the most favorable conditions for evolutionary advance are found when these are balanced against each other in certain ways, rather than when any one completely dominates the situation' (488)

1951: 'The general qualitative conclusion would still seem to hold that this [the evolution of culture] or any other evolutionary process depends on a continuously shifting but never obliterated state of balance between factors of persistence and change, and that the most favourable condition for this occurs when there is a finely subdivided structure in which isolation and cross-communication are kept in proper balance' (596)

1959: 'It is concluded that the most favorable conditions are those of balance: a balance among the directed processes that insures the maintenance of a high degree of heterozygosis in minor factors and a balance between the directed processes as a group and various sorts of random ones that insures extensive random drift around the equilibrium positions of the gene frequencies. All these conditions are met in the highest degree where there is a certain balance between isolation and crossbreeding within each of a large number of local populations of the species' (Tax, 470-1)

1960: 'In developing the balance theory of evolution, I was trying to arrive at a judgement of the most favorable conditions for evolution under the Mendelian mechanism' (619)

It will be noted that in the last of these passages Wright refers to the 'balance theory of evolution', and in another the 'balance between factors of persistence and change' is said to be 'continuously shifting'. Wright therefore comes very close to using the phrase 'shifting balance theory', but the fact that even in these passages he does not actually use it strengthens the suspicion that he had not yet coined the term as such.

What balance? And what shifts?

Many other uses of the terms 'balance' and 'shift' by Wright could be cited. I have quoted only those which come closest to his explicit term 'the shifting balance'. But even these examples, on a careful reading, leave it unclear what the 'balance' is that Wright sees as essential to effective evolution. Many different things are said to be 'balanced'. What exactly is a 'balance between factors of persistence and change', and is it the same as a 'balance between conditions which make for genetic homogeneity and genetic heterogeneity'? Migration, for example, is a factor usually making for genetic homogeneity, but it is also often a factor making for 'change'. So which side of the balance does it fall on?

It might be hoped that in Wright's 1970 article, or in Volume 3 of Evolution and the Genetics of Populations, where the 'shifting balance theory' is discussed at length, we would find a clear statement of the meaning of the term itself. What is the relevant balance, how does it shift, and how does Wright's theory of evolution depend on the shifting of the balance? It may be that the answers are there, but if so, I have not found them. While Wright discusses various component parts of his theory, the overarching term 'the shifting balance' is not itself defined or explained. Moreover, whatever interpretation we give to the term 'balance', it does not seem that the 'shifting' of the balance itself plays any essential part in Wright's conception of the evolutionary process. The balance between the various factors of evolution, including selection, mutation, migration, environment, genetic drift, and population structure - to list the obvious ones - might stay constant, yet the process of evolution as described by Wright could still work, if the balance of factors is right. It is not the shifting of the balance, but the existence of the right kind of balance, which according to Wright is favourable to evolutionary progress. I conclude that the 'shifting balance theory' is a convenient and memorable label, but one without a precise literal meaning in isolation.

When was the theory first published?

Even if the label 'shifting balance theory' was not adopted until 1970, the doctrines covered by that label may have been propounded earlier. Wright himself, in 1970, claimed to have first published the theory as long ago as 1929. It can be confirmed that some of the key elements of the theory were contained in Wright's great 1931 paper 'Evolution in Mendelian populations', and summarised in shorter related papers beginning in 1929. Notably, these contain several key propositions which Wright maintained consistently to the end of his life:

a) The most favourable circumstances for evolution are in large populations subdivided into many small partially isolated populations;

b) Large freely interbreeding populations are not favourable to continuing evolution;

c) Genetic drift is an important part of the evolutionary process; and

d) The differential success of subpopulations, which Wright describes as 'intergroup selection', is an important contributor to cumulative evolutionary change.

If we regard these four propositions as constituting the shifting balance theory, then it was indeed first published in 1929.

Changes to the theory

This does not mean that there were no important changes to the theory after 1929. I believe there were changes both of substance and of emphasis, which I would summarise as follows:

1. In 1932 Wright adopted the metaphor of a multidimensional field of gene combinations and fitness values, which was later described (though not by Wright) as the 'adaptive landscape'. In my view this was more than just an illustrative device. The concept of selective peaks as alternative states of stable equilibrium was a valuable addition of substance to the theory, not corresponding to anything clearly stated in the original version.

2. Whereas in 1929-31 Wright had denied that temporary changes in environmental conditions would have major evolutionary effects, in 1932 he changed his position and accepted that environmental fluctuations could 'shuffle' populations from one evolutionary position of equilibrium to another, usually higher, one.

3. As a consequence of change (2), Wright reduced his emphasis on the importance of genetic drift, which he had originally claimed as essential to long-term evolutionary progress. After 1932 genetic drift was in principle only one of several mechanisms for change. But Wright did not make it sufficiently clear that his position had changed, and did not follow through the implications of the change for his views on the importance of population structure.

4. Throughout his career Wright maintained that the evolutionary process was partly adaptive and partly non-adaptive or 'random', but the emphasis he put on these elements shifted from the non-adaptive aspect to a greater emphasis on adaptation.

5. In his later writings on the subject Wright identified three 'phases' in the shifting balance process, but these are much less clear in the earlier versions of the theory.

Some but not all of these changes have already been identified in William Provine's admirable biography of Wright. The remainder of this note will mainly be concerned with documenting the various changes.

The original version of the theory (1929-31)

The key propositions of the original version of the theory were conveniently summarised by Wright himself in a short paper of 1929, which I will quote in full:

The frequency of a given gene in the population is affected by mutation, selection, migration and chance variation. The pressure exerted by these factors (excluding chance) and the position of equilibrium between opposing pressures are easily found. Gene frequency fluctuates about this equilibrium in a distribution curve, determined by size of population and the various pressures. The mean and variability of characters, correlation between relatives and the evolution of the population, depend on these distributions. In too small a population, there is nearly complete random fixation, little variation, little effect of selection and thus a static condition, modified occasionally by chance fixation of a new mutation, leading to degeneration and extinction. In too large a freely interbreeding population, there is great variability, but such a close approach of all gene frequencies to equilibrium that there is no evolution under static conditions. Changed conditions cause a usually slight and reversible shift of the gene frequencies to new equilibrium points. With intermediate size of population, there is continual random shifting of gene frequencies and consequent alteration of all selection coefficients, leading to relatively rapid, indefinitely continuing, irreversible and large fortuitous but not degenerative changes even under static conditions. The absolute rate, however, is slow, being limited by mutation pressure. Finally, in a large but subdivided population, there is continually shifting differentiation among the local races, even under uniform static conditions, which through intergroup selection brings about indefinitely continuing, irreversible, adaptive and much more rapid evolution of the species as a whole. (78)

These propositions are all stated more fully and supported by arguments in the 1931 papers 'Statistical theory of evolution' and 'Evolution in Mendelian populations'. (Although 'Statistical theory of evolution' was published first, it seems that 'Evolution in Mendelian populations' was completed first and 'Statistical theory of evolution' written as a summary of it.) Some of them are also covered in Wright's 1930 review of Fisher's Genetical Theory of Natural Selection. Most of them are restated and defended throughout Wright's career. The arguments given by Wright to support the key propositions (each proposition is quoted first, from the 1929 article) can be summarised as follows:

In too small a population, there is nearly complete random fixation, little variation, little effect of selection and thus a static condition, modified occasionally by chance fixation of a new mutation, leading to degeneration and extinction.

For this purpose 'too small' a population is one in which 1/4N (where N is the effective population size) is much larger than selection and mutation rates. (148) In this case genetic drift will be the main factor in evolution. Most genes will soon be fixed, there will be little variation within each population, and random unadaptive changes will lead to extinction. (93, 142, 148)

In too large a freely interbreeding population, there is great variability, but such a close approach of all gene frequencies to equilibrium that there is no evolution under static conditions.

For this purpose 'too large' a population is one in which both selection and mutation rates are much larger than 1/4N. (148) In this case, genetic drift will have little effect, and gene frequencies will be determined by the balance of selection and mutation. If selection on a gene is much stronger than mutation pressure, there will be almost complete fixation at each locus and therefore no evolution under fixed conditions. (148-50) If selection is not much stronger than mutation pressure, there will be more genetic diversity, but all gene frequencies will be close to equilibrium and evolution will be very slow unless conditions change. (150) Note that these arguments tacitly assume that there are no new favourable mutations, or existing ones still under selection.

Changed conditions cause a usually slight and reversible shift of the gene frequencies to new equilibrium points.

In 'Statistical theory of evolution' Wright says that 'Changes in conditions should be followed by systematic changes in gene frequencies until all have reached the new positions of equilibrium. Return to the old conditions should be followed by return to the old equilibria' (92). No specific reason is given for this conclusion. In 'Evolution in Mendelian populations' the explanation is slightly fuller. Following a strengthening of selection, gene frequencies will change, but 'The rapid advance has been at the expense of the store of variability of the species and ultimately puts the latter in a condition in which any further change must be exceedingly slow. Moreover, the advance is of an essentially reversible type. There has been a parallel movement of all the equilibria affected and on cessation of the drastic selection, mutation pressure should (with extreme slowness) carry all equilibria back to their original positions. Practically, complete reversibility is not to be expected, and especially under changes in selection which are more complicated than can be described as alternately severe and relaxed. Nevertheless, the situation is distinctly unfavorable for a continuing evolutionary process' (150). Note that Wright does not claim the changes are always reversible, only that this is 'essentially' or 'usually' the case. But he gives no clear reasons for this position, and only a year later (1932) he abandons it. As this is one of the major developments in the theory I consider it more fully in Part 2 of this note.

With intermediate size of population, there is continual random shifting of gene frequencies and consequent alteration of all selection coefficients, leading to relatively rapid, indefinitely continuing, irreversible and large fortuitous but not degenerative changes even under static conditions. The absolute rate, however, is slow, being limited by mutation pressure.

For this purpose an intermediate size of population is one where, for many genes, the selection pressure is not much stronger than the mutation rate, and neither selection pressure nor mutation rate is much higher than 1/4N (150-1). (Since mutation rates were known by Wright not to be much higher than 1 in 100,000, setting 1/4N equal to 1/100,000 implies an effective population size of the order of 25,000.) In these circumstances genetic drift will be strong enough to cause considerable fluctuation in gene frequencies, but not to lead to rapid fixation of genes and loss of genetic diversity. Wright describes the result as 'a kaleidoscopic shifting of the average characters of the population through predominant types which practically are never repeated' (95, see also 151). But Wright emphasises that it would be a very slow process, as 'hundreds of thousands of generations are required for important evolutionary changes' (95). He mentions the effect of mutation rates as limiting the speed of change (78, 95, 151), presumably because with mutation rates not very different from the rate of genetic drift, mutation pressure tends to maintain genetic uniformity. But surely the main reason for slowness is that genetic drift itself is very slow in a population of many thousands.
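
Wright's three size categories amount to comparing the selection coefficient s and the mutation rate u with 1/4N. The following Python sketch makes that comparison explicit and checks the arithmetic behind the figure of 25,000. It is only an illustration under stated assumptions: Wright never put a number on 'much larger', so the factor of 10 used here is arbitrary, and the 'intermediate' category is simply whatever falls between the two extremes, which is looser than Wright's own definition.

```python
# Hypothetical sketch of the size categories in Wright's 1931 argument.
# MUCH is an assumed threshold for "much larger"; Wright gives no number.
MUCH = 10.0


def drift_rate(n_eff: float) -> float:
    """Wright's yardstick for the strength of genetic drift: 1/(4N)."""
    return 1.0 / (4.0 * n_eff)


def regime(n_eff: float, s: float, u: float) -> str:
    """Compare selection s and mutation rate u with 1/(4N)."""
    d = drift_rate(n_eff)
    if d > MUCH * max(s, u):
        return "too small"     # drift swamps both selection and mutation
    if min(s, u) > MUCH * d:
        return "too large"     # selection and mutation both swamp drift
    return "intermediate"      # simplification: everything in between


def size_where_drift_equals(rate: float) -> float:
    """Effective population size N at which 1/(4N) equals the given rate."""
    return 1.0 / (4.0 * rate)


print(round(size_where_drift_equals(1e-5)))  # 25000
print(regime(100, 1e-5, 1e-5))               # too small
print(regime(1e8, 1e-3, 1e-5))               # too large
print(regime(25_000, 1e-5, 1e-5))            # intermediate
```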

Finally, in a large but subdivided population, there is continually shifting differentiation among the local races, even under uniform static conditions, which through intergroup selection brings about indefinitely continuing, irreversible, adaptive and much more rapid evolution of the species as a whole.

This is the most important proposition of the shifting balance theory in its original form. Wright never abandoned his view that a large subdivided population is most favourable to evolution. The subdivisions must be small enough, and isolated enough from each other, that the subpopulations can diverge in gene frequencies (151-2). Curiously, there is an important difference between Wright's accounts in his two 1931 presentations of the theory. In 'Statistical theory of evolution' Wright mentions only 'random drift' as causing the divergence between subpopulations, with the result that there is a 'geologically rapid drifting apart of the various sub-groups, even under uniform conditions. This is a non-adaptive radiation, but, on the average, not such as to lead to appreciable deterioration' (95). In 'Evolution in Mendelian populations', on the other hand, Wright mentions both genetic drift and local variation in selection pressures, so that the result is 'a partly nonadaptive, partly adaptive radiation among the subgroups' (151). There is of course no reason why both processes should not occur at once, perhaps in different subgroups or at different loci in the same subgroups at the same time. But the difference does have implications for the final phase of the process, which is 'intergroup selection'. On this, Wright says that 'Those [subgroups] in which the most successful types are reached presumably flourish and tend to overflow their boundaries while others decline, leading to changes in the mean gene frequencies of the population as a whole' (152). But if adaptive variation among subgroups is due only to local circumstances of selection (as seems to be suggested in 'Evolution in Mendelian populations'), those types which have highest fitness in their own locality cannot be expected to succeed elsewhere. 
If on the other hand the variation among subgroups is purely due to random drift (as seems to be suggested in 'Statistical theory of evolution'), it is not obvious that they will differ significantly in fitness for genetic reasons. 'Statistical theory of evolution' does however contain a very important development or clarification of the theory: 'Exceptionally favorable combinations of genes may come to predominate in some of the subgroups. These may be expected to expand their range while others dwindle. This process of intergroup selection may be very rapid as compared with mass selection of individuals, among whom favorable combinations are broken up by the reduction-fertilization mechanism in the next generation after formation' (95). The reference to 'favorable combinations' here is the first sign of the emphasis on epistatic fitness interactions which becomes increasingly important in the later development of the theory. But in the original statement, in 1931, it comes out of the blue and unsupported by any detailed analysis.

Likewise, the concept of 'intergroup selection' is not explored in any depth, and the claim that it would be more rapid than 'mass selection of individuals' is little more than a bare assertion. The suggested advantage that 'favorable combinations' are not immediately broken up by sexual reproduction seems to require not only a high degree of genetic unity within the subgroups, but the maintenance of that unity during the process of 'intergroup selection', despite the probable intermingling of different groups. The credibility of this process has been one of the main areas for recent controversy and research on the shifting balance theory. It should incidentally be stressed (see also Provine, p.288) that 'intergroup selection' as envisaged by Wright has little to do with 'group selection' as envisaged by most of its recent advocates. Wright does not suggest that successful groups have evolved adaptations for group living, or that their members behave 'altruistically' towards each other (though his theory does not exclude this either, and he later made some comments in this direction). His claim is rather that the subdivided population structure allows some groups, by chance, to form combinations of genes that are advantageous to individual fitness. The higher mean fitness of the groups is the resultant of these individual fitness advantages.

Wright also gives mixed messages about the adaptiveness of the process. While repeatedly claiming that in the long run the process is adaptive, Wright accepted the common view of many biologists at the time that the differences between subspecies and even between species of the same genera are usually non-adaptive (154, see also Provine p.288-99), a view which would seem to require the adaptive process of 'intergroup selection' to occur mainly between different genera or even higher taxa! But in this case 'intergroup selection' between small subgroups of the same species would be irrelevant to the process. Yet in 'Evolution in Mendelian populations' Wright also suggests that intergroup selection within the species may be responsible for 'peculiar adaptations' and 'extreme perfection' (154-5), a claim which is not, I think, repeated anywhere else. Overall, the emphasis in these early writings is more on the nonadaptive than the adaptive aspects of the process.

Taking stock

Before exploring the subsequent development of the theory (in Part 2), I will try to take stock of the position reached by 1931.

Already in his summary note of 1929 Wright had stated some of the key propositions of the shifting balance theory. In the two articles of 1931 he began the task of justifying these propositions. The arguments he put forward were ingenious, stimulating, and not implausible, but far from conclusive. There were moreover a number of tensions, if not actual inconsistencies, within Wright's accounts. One of these concerned the extent to which the process was adaptive, as has been explored fully by Provine. Another is the respective roles of genetic drift and local selection, on which I have pointed out an apparent difference between the two articles of 1931. Another is the problem of migration between groups. As suggested in my earlier note on migration, Wright did not attempt to quantify the effects of migration until after he had committed himself to the importance of random drift within semi-isolated subgroups. Only then, in 1929, did he discover 'that isolation in districts must be much more nearly complete than I realized at first' for the process to work. 'Evolution in Mendelian populations' makes an attempt to remedy the deficiency (128), but further work was clearly needed.

Several important aspects of the theory in its mature form are also lacking from the original version. Notably, there is nothing clearly corresponding to Wright's later emphasis on alternative local optima - 'selective peaks' - available to populations or subpopulations. These local optima depend heavily on epistatic fitness interactions, which are hardly mentioned in the original version. In the mature theory, subpopulations 'explore' the field of possibilities under the influence of random factors (genetic drift, but also environmental fluctuations) until they wander into the zone of attraction of a new selective peak. The stage of 'exploration' is Phase 1 of the process, while the climbing of the population up a peak is Phase 2, and intergroup selection is Phase 3. In the original version of the theory there is no clear distinction between Phase 1 and Phase 2, because there is nothing to suggest that the process of 'exploration' ever stops, short of the exhaustion of genetic variation by random fixation of genes. The phrase 'continually shifting differentiation' seems inconsistent with any sharp distinction between two phases. The first signs of a new approach are to be found in 'Statistical theory of evolution', with its reference to some groups finding 'exceptionally favorable combinations of genes', implying epistatic peaks of fitness. Quite possibly this had been in Wright's mind all along, but I do not think it can be identified in anything written before 'Statistical theory', including the much more widely read 'Evolution in Mendelian populations'.

Another important omission is any serious discussion of the probability of favourable new mutations. Wright's negative assessment of the prospects for evolution in large freely interbreeding populations depends on the tacit assumption that new mutations can be neglected. Wright later developed arguments to support this position.

Overall, a careful reader of Wright's publications up to 1931, without knowledge of subsequent developments, might reasonably conclude that Wright had put forward a remarkably original, ingenious, and comprehensive theory of evolution, consistent with most of what was then believed about the observed pattern of evolution, and free of any obvious fatal defects. This in itself was a very major achievement. But the same reader might also think that the theory was sketchy and speculative, and in need of further elaboration, not to mention empirical tests. Wright himself was no doubt aware of this, and continued to develop the theory for another 50 years, as I will discuss in Part 2.

William B. Provine: Sewall Wright and Evolutionary Biology, 1986.
Sewall Wright: 'Physiological genetics, ecology of populations, and natural selection', in Evolution After Darwin, vol. 1, ed. Sol Tax, 1960 (Tax). (Article first published in 1959.)
Sewall Wright: Evolution: Selected Papers (ESP), ed. William B. Provine, 1986.
Sewall Wright: 'Random drift and the shifting balance theory of evolution', in Mathematical Topics in Population Genetics, ed. Kojima, 1970.


Sunday, October 12, 2008

Adaptive Landscapes: Miscellaneous Points   posted by DavidB @ 10/12/2008 03:23:00 AM

My earlier post on the subject discussed Sewall Wright's concept of the adaptive landscape, and another post discussed R. A. Fisher's views on the subject. Before I come to my planned note on Sewall Wright's Shifting Balance theory, there are some points about adaptive landscapes which didn't fit easily into the earlier posts.


As mentioned in the post on Wright's 'landscapes', he used two different versions of a multi-dimensional model of fitness. In one interpretation the dimensions, except for that of fitness, represent the number of alleles of different types in an individual genotype. I will call this a genotype landscape. In the other interpretation the dimensions, except for that of fitness, represent the proportion of alleles of different types in a population. I will call this a frequency landscape. Both interpretations can be called genetic landscapes.
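
The difference between the two kinds of genetic landscape can be illustrated with a toy case. The Python sketch below uses an invented haploid two-locus system: the fitness values are hypothetical, and the frequency landscape assumes linkage equilibrium (genotype frequencies are products of allele frequencies), an assumption made here purely for simplicity. The genotype landscape has one point per genotype; the frequency landscape is a continuous surface of mean fitness over the allele frequencies p and q, scanned here on a coarse grid.

```python
from itertools import product

# Invented fitnesses for a hypothetical haploid two-locus system (A/a, B/b),
# with epistasis: AB and ab are both fitter than the mixed types Ab and aB.
GENOTYPE_FITNESS = {("A", "B"): 1.0, ("A", "b"): 0.6,
                    ("a", "B"): 0.6, ("a", "b"): 0.9}
FLIP = {"A": "a", "a": "A", "B": "b", "b": "B"}


def genotype_peaks():
    """Genotype landscape: genotypes fitter than both one-locus neighbours."""
    peaks = []
    for g, w in GENOTYPE_FITNESS.items():
        neighbours = [(FLIP[g[0]], g[1]), (g[0], FLIP[g[1]])]
        if all(w > GENOTYPE_FITNESS[n] for n in neighbours):
            peaks.append(g)
    return peaks


def mean_fitness(p, q):
    """Frequency landscape: mean fitness when A has frequency p and B has
    frequency q, assuming linkage equilibrium (a simplification)."""
    f1 = {"A": p, "a": 1 - p}
    f2 = {"B": q, "b": 1 - q}
    return sum(f1[x] * f2[y] * GENOTYPE_FITNESS[(x, y)]
               for x, y in product("Aa", "Bb"))


def frequency_peaks(steps=10):
    """Grid points (p, q) with higher mean fitness than all grid neighbours."""
    pts = [i / steps for i in range(steps + 1)]
    peaks = []
    for i, p in enumerate(pts):
        for j, q in enumerate(pts):
            here = mean_fitness(p, q)
            nbrs = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= i + di <= steps and 0 <= j + dj <= steps]
            if all(here > mean_fitness(pts[a], pts[b]) for a, b in nbrs):
                peaks.append((p, q))
    return peaks


print(genotype_peaks())    # [('A', 'B'), ('a', 'b')]
print(frequency_peaks())   # [(0.0, 0.0), (1.0, 1.0)]
```

In this particular example the two landscapes have peaks in corresponding places: the genotypes AB and ab, and the frequency points at which the population is fixed for ab or for AB.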

While Wright's interpretations always have genetic dimensions, other authors have used concepts in which the dimensions of the landscape represent phenotypic or ecological variables. I will call these phenotype landscapes. Peaks in such a landscape represent optimal phenotypes or ecological niches.

In both genetic and phenotype landscapes one of the dimensions usually represents reproductive fitness, but some alternative measure of adaptation may be used. For example, if the phenotype is the shape of a fish, the measure of adaptation might be some aspect of swimming efficiency.

Some authors draw a distinction between a fitness landscape and an adaptive landscape, but the distinction is not consistently used. For example, according to Gavrilets (p.30) these terms are used to designate what I have called genotype and frequency landscapes respectively, but McGhee (p.1) uses them to designate genetic and phenotype landscapes. Most authors seem to use the terms 'adaptive landscape', 'fitness landscape', 'genetic landscape', and 'selective landscape' interchangeably, though each of them may also have other meanings. (For example, 'genetic landscape' may be used to describe the geographical distribution of genes.) Anyone searching for relevant studies should try all of these variants. I will use adaptive landscape as a general term embracing all of them.


There are at least two recent books devoted to adaptive landscapes, by Gavrilets and McGhee (see refs.). Gavrilets deals mainly with genetic landscapes, McGhee with phenotype landscapes. The book by Gavrilets has an extensive bibliography, which provides a good way into the literature on genetic landscapes. The studies I have looked at deal mainly with genotype landscapes. There seems to be comparatively little work on frequency landscapes, perhaps because the subject is less amenable to study by computer simulation.

The number of peaks in genotype landscapes

There is an extensive literature on the number of peaks in genotype landscapes, mainly based on the work of Stuart Kauffman.

To begin with, consider a model devised by Kauffman and Levin (1987). Suppose a genome has N loci. For simplicity, assume the loci are haploid and that there are 2 possible alleles at each locus. There are therefore 2^N possible different genotypes. Now, suppose that each distinct genotype has a fitness which is independent of the fitness of any other genotype. We may then represent the fitnesses by numbers chosen at random (Kauffman and Levin use the range of rational numbers between 0.0 and 1.0). For simplicity we stipulate that no two genotypes have exactly the same fitness. If we choose one of the 2^N possible genotypes at random, there are N other genotypes which can be derived from that chosen genotype by varying an allele at a single locus. We call these the neighbours of the chosen genotype. The chosen genotype is a local optimum if it has higher fitness than all of its neighbours. But by the stated assumptions the fitnesses of the N + 1 genotypes concerned are random numbers, each of which has a probability of 1/(N + 1) of being the largest in the set. There is therefore a probability of 1/(N + 1) that the chosen genotype is a local optimum. But the chosen genotype is randomly chosen from the 2^N possible genotypes, and any other genotype (by the given assumptions) would have an equal chance of being a local optimum within its own 'neighbourhood'. Since there are 2^N possible genotypes in total, the total expected number of local optima in the system is therefore (2^N)/(N + 1) [Note 1].
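
The argument can be checked by brute force for small N. The following Python sketch is an illustration, not code from Kauffman and Levin: it draws an independent uniform fitness for each of the 2^N genotypes, counts the genotypes that beat all N one-locus neighbours, and compares the average over many random landscapes with (2^N)/(N + 1). Genotypes are encoded as N-bit integers, so changing the allele at locus i is an XOR with 1 << i.

```python
import random


def count_local_optima(fitness, n):
    """Count genotypes (n-bit integers) fitter than all n one-locus neighbours."""
    return sum(
        all(fitness[g] > fitness[g ^ (1 << i)] for i in range(n))
        for g in range(2 ** n)
    )


def mean_optima_random_landscape(n, trials, seed=0):
    """Average count over independent random landscapes (Kauffman-Levin style)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Each genotype gets an independent uniform fitness; ties have
        # probability effectively zero, matching the 'no equal fitnesses' rule.
        fitness = [rng.random() for _ in range(2 ** n)]
        total += count_local_optima(fitness, n)
    return total / trials


n = 8
print(mean_optima_random_landscape(n, trials=200))  # simulated average
print(2 ** n / (n + 1))                             # expected value, about 28.4
```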

It is obvious that this number increases rapidly with increasing N. It is equally obvious that the assumption of independent fitnesses for each possible genotype is biologically unrealistic. It implies that no single locus, or combination of fewer than N loci, has any predictable effect of its own on fitness. As an extreme alternative to this, suppose that each locus makes a contribution to fitness which is independent of all other loci. In this case one of the alleles at each locus must be unambiguously fitter than the other allele, regardless of the alleles at other loci. Suppose we designate the fitter of the two alleles by an even number, and the less fit allele by an odd number. It is clear that no genotype containing an 'odd' allele can be a local optimum, because the fitness of the genotype could always be increased by substituting an even allele for the odd one. The only local optimum in the system is therefore the single genotype containing exclusively 'even' alleles, no matter how many genotypes there are in the system. This result can be extended to systems with diploid loci and/or multiple alleles at each locus, provided that one of the alleles at each locus is unambiguously fitter than all other alleles. We could also allow the fitness contribution of a locus to be affected by the alleles at other loci, provided the effect is not so great as to reverse the rank order of fitness of the alleles at each locus. This would be the case, for example, if each allele has a primary effect on one trait which makes a large difference to fitness, and a secondary effect on other traits, provided the secondary effects do not exceed the fitness difference due to the primary effect.
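
The contrast with the additive extreme is easy to confirm. In this Python sketch (again a hypothetical illustration, with per-locus contributions drawn at random), each locus adds a fixed amount to fitness that depends only on its own allele. Whichever allele happens to be fitter at a locus plays the role of the 'even' allele above, and the exhaustive search should find exactly one local optimum: the genotype carrying the fitter allele at every locus.

```python
import random


def local_optima(n, fitness):
    """Genotypes (n-bit integers) fitter than every one-locus neighbour."""
    return [g for g in range(2 ** n)
            if all(fitness(g) > fitness(g ^ (1 << i)) for i in range(n))]


def additive_case(n, seed):
    """Hypothetical additive landscape: contributions[i][a] is the fitness
    contribution of allele a (0 or 1) at locus i, whatever the other loci carry."""
    rng = random.Random(seed)
    contributions = [(rng.random(), rng.random()) for _ in range(n)]

    def fitness(g):
        return sum(contributions[i][(g >> i) & 1] for i in range(n))

    # The genotype carrying the fitter allele at every locus.
    best = sum(1 << i for i, (a0, a1) in enumerate(contributions) if a1 > a0)
    return local_optima(n, fitness), best


optima, best = additive_case(n=10, seed=1)
print(optima == [best])   # True: a single peak, however many genotypes there are
```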

Between the two extreme models, there could be a variety of systems in which the rank order of the contributions of loci to fitness is partly but not entirely independent of other loci. Kauffman has devised a framework known as the NK model. [Note 2] In the NK model there are N haploid loci, with 2 possible alleles at each locus, while the fitness contribution of each locus is affected by the alleles at K other loci as well as itself. The fitness contribution of each possible combination of alleles at each such group of K + 1 loci is a random number chosen from the interval 0.0 to 1.0. For any particular assignment of alleles to the K + 1 loci, this number determines the fitness contribution of the locus in question. The fitness of the genome as a whole, for any particular assignment of alleles to all N loci, is the average of the contributions for each locus.
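
A compact Python sketch of the NK model as just described may help. It is a hypothetical illustration, not Kauffman's code: choosing the K partners of each locus at random is only one of several possible wiring schemes, and reading each group's alleles as a binary number to index a table of random contributions is simply an implementation convenience.

```python
import random


def make_nk(n, k, seed=0):
    """Build a random NK landscape. Assumption: each locus interacts with k
    other loci chosen at random (one of several possible wiring schemes)."""
    rng = random.Random(seed)
    groups = [[i] + rng.sample([j for j in range(n) if j != i], k)
              for i in range(n)]
    # One uniform random contribution for each of the 2**(k+1) allele
    # combinations of each locus's group.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]
    return groups, tables


def nk_fitness(genotype, groups, tables):
    """Genome fitness = mean of per-locus contributions; genotype is a 0/1 tuple."""
    total = 0.0
    for i, group in enumerate(groups):
        index = 0
        for locus in group:          # read the k+1 alleles as a binary number
            index = (index << 1) | genotype[locus]
        total += tables[i][index]
    return total / len(groups)


groups, tables = make_nk(n=6, k=2, seed=42)
print(groups[0])                                    # locus 0 and its 2 partners
print(nk_fitness((0, 1, 1, 0, 1, 0), groups, tables))
```

With K = 0 each group is a single locus, and the fitness reduces to the additive case.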

The precise way in which the loci are connected to each other may vary. According to Kauffman (p.55) this usually makes little difference to the outcome. It may be useful to consider a simple special case which is not treated by Kauffman. Suppose we divide the N loci into N/(K + 1) discrete sets (assuming for simplicity that N/(K + 1) is a whole number). Let each of the K + 1 loci in each such set be 'connected' to the remaining K loci in the set. There are 2^(K + 1) possible combinations of alleles for each such set, and let each combination be assigned a fitness value randomly chosen from the interval 0.0 to 1.0. For any particular assignment of alleles to the loci, this number constitutes the fitness contribution of every locus in the set to the fitness of the genome. But each such set of K + 1 loci can be treated as a case of the Kauffman/Levin model, and has an expected number of [2^(K + 1)]/[K + 2] local optima. Since each such set of loci, by assumption, has no effect on the fitness contribution of any loci outside the set, it follows that any combination of local optima for all of the N/(K + 1) discrete sets will also be a local optimum for the entire genome, since any change at a single locus would reduce the overall fitness of the genome. Since the sets are independent, the expected number of such combinations is ([2^(K + 1)]/[K + 2])^[N/(K + 1)], and this is the expected number of local optima for the entire genome. It may be easily checked that for the value K = N - 1, where each locus is connected to every other locus in the genome, this reduces to (2^N)/(N + 1), as in the first of the extreme models, while for K = 0, where no locus is connected to any other locus, it reduces to 1, as in the other extreme model. For values of K between 1 and N - 2, the number of local optima increases with increasing K and/or N.
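
The block model is simple enough to check directly. The Python sketch below is a hypothetical illustration with invented function names. It computes the formula exactly with fractions, and for random instances it confirms by brute force that the number of genome-wide optima equals the product of the numbers of optima within each set taken on its own. The formula is an expectation over random instances, so any single instance may have more or fewer optima than it predicts.

```python
import random
from fractions import Fraction
from itertools import product


def expected_optima_blocks(n, k):
    """([2^(K + 1)]/[K + 2])^[N/(K + 1)], computed exactly."""
    assert n % (k + 1) == 0, "N/(K + 1) must be a whole number"
    return Fraction(2 ** (k + 1), k + 2) ** (n // (k + 1))


def block_tables(n, k, rng):
    """One random fitness table per discrete set of k + 1 connected loci."""
    return [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n // (k + 1))]


def genome_fitness(genotype, tables, k):
    # Every locus in a set contributes the set's value, so the genome mean
    # equals the mean of the set values.
    size = k + 1
    values = []
    for b, table in enumerate(tables):
        index = 0
        for allele in genotype[b * size:(b + 1) * size]:
            index = (index << 1) | allele
        values.append(table[index])
    return sum(values) / len(values)


def genome_optima(n, tables, k):
    """Brute-force count of genome-wide local optima."""
    count = 0
    for g in product((0, 1), repeat=n):
        f = genome_fitness(g, tables, k)
        flips = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(n))
        if all(f > genome_fitness(h, tables, k) for h in flips):
            count += 1
    return count


def product_of_set_optima(tables, k):
    """Product over sets of the local optima within each set on its own."""
    size, result = k + 1, 1
    for t in tables:
        result *= sum(all(t[c] > t[c ^ (1 << i)] for i in range(size))
                      for c in range(2 ** size))
    return result


tables = block_tables(9, 2, random.Random(7))
print(genome_optima(9, tables, 2) == product_of_set_optima(tables, 2))  # True
print(expected_optima_blocks(9, 2))        # 8
print(expected_optima_blocks(6, 5))        # 64/7, i.e. (2^6)/(6 + 1)
print(expected_optima_blocks(6, 0))        # 1
```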

In my simple example the genome is divided into non-overlapping sets of loci. But more generally in the NK model there will be overlap. For example, the sets of connected loci may be arranged cyclically, like abcde, bcdef, cdefg ......zabcd. Or the connections could be chosen at random, in which case there is a non-zero probability that the same locus will enter into more than one set of connected loci. This makes the problem of determining the number of local optima much more complicated. A given set of alleles may be a local optimum with respect to one set of connected loci, but one or more of those alleles may be sub-optimal for another set to which it belongs. In this case, changing one of those alleles will reduce fitness at some loci but increase it at others. The effect on the overall number of local optima for the genome as a whole is not intuitively obvious, and does not seem amenable to calculation by a general formula. Kauffman and others have relied on computer simulations. The most important result is that the number of local optima increases rapidly with increasing N and/or K (Kauffman p.60). This is not surprising, but it may be taken as vindicating Sewall Wright's intuition that in genotype landscapes with a lot of epistatic relations, the number of selective peaks will be very large. In general one may say that for a realistic size of genome (i.e. with thousands of loci) the number of peaks will be very large unless the value of K (averaged over the genome) is close to zero.
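For overlapping sets there is no simple formula, but a small simulation in the spirit of those Kauffman describes is easy to sketch. This is an illustration, not a reproduction of Kauffman's method: it uses the cyclic arrangement (locus i connected to the next K loci, wrapping round), a hypothetical function name, and exhaustive enumeration, so N must be small.

```python
import random
from itertools import product


def cyclic_nk_optima(n, k, rng):
    """Number of local optima in one random NK landscape whose connected sets are
    arranged cyclically: locus i is affected by loci i+1, ..., i+K (mod N)."""
    neighbours = [[(i + d) % n for d in range(k + 1)] for i in range(n)]
    tables = [{c: rng.random() for c in product((0, 1), repeat=k + 1)} for _ in range(n)]
    genotypes = list(product((0, 1), repeat=n))
    fitness = {g: sum(tables[i][tuple(g[j] for j in neighbours[i])] for i in range(n)) / n
               for g in genotypes}
    # a local optimum is fitter than all N genotypes one mutation away
    return sum(all(fitness[g[:i] + (1 - g[i],) + g[i + 1:]] < fitness[g] for i in range(n))
               for g in genotypes)


if __name__ == "__main__":
    rng = random.Random(0)
    n, trials = 8, 30
    for k in range(n):
        mean = sum(cyclic_nk_optima(n, k, rng) for _ in range(trials)) / trials
        print(f"N={n}, K={k}: mean number of local optima {mean:.1f}")
    print(f"fully random (K = N - 1) expectation: {2 ** n / (n + 1):.1f}")
```

At K = 0 there is always exactly one optimum, and at K = N - 1 the average should approach the Kauffman/Levin value 2^N/(N + 1).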

Kauffman's NK model is in many ways simplistic, but it does seem quite robust as a basis for exploring the theory of genotype landscapes. Other researchers have developed it in various ways. I don't know (or understand) this work well enough to summarise it, but I recommend the book by Gavrilets, which applies the theory to the problem of speciation. He notably claims that if a sufficient proportion of alleles are allowed to be selectively neutral, then in genotype landscapes of high dimensionality there will usually be a 'network' of ridges connecting the peaks, and along which populations can evolve without crossing fitness 'valleys'.

The number of peaks in frequency landscapes

As noted earlier, there seems to be much less work on frequency landscapes. In my post on Fisher I mentioned that in private letters Fisher argued that as the number of dimensions rises, the proportion of 'level points' which are all-round maxima will fall, and will be about 1/2^N of the total, where N is the number of dimensions. Fisher may have assumed that (a) in each dimension of gene frequencies, only about half of the level points will be maxima, and (b) the location of the maxima in each dimension is usually independent of the other dimensions. [Note 3] With these assumptions, the probability that a level point will be simultaneously maximal in all dimensions will only be about (1/2)^N, or 1 in 2^N, as suggested by Fisher. But it does not follow that the number of maxima will fall. If the number of level points in a single dimension is n, the expected number of level points in N independent dimensions would be n^N, so the expected number of all-round maxima would be (n^N)/2^N, or (n/2)^N. For any n greater than 2 this will increase with increasing N, and the larger n is, the faster it grows; for example, if n = 4, the number of maxima for N = 2, 3, 4, ... will be 4, 8, 16, ..., which rapidly becomes enormous.
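The arithmetic can be tabulated in a couple of lines of Python. This simply restates the assumptions above (n level points per dimension, independent dimensions, half of the level points in each dimension being maxima); the function name is hypothetical.

```python
def expected_maxima(n, dims):
    """Expected all-round maxima: n^N level points, a fraction (1/2)^N of them maxima."""
    return n ** dims / 2 ** dims


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, [expected_maxima(n, dims) for dims in range(1, 7)])
```

With n = 2 (one maximum and one minimum per dimension) the count stays at 1 however many dimensions there are; for n = 3 it grows as 1.5^N, and for n = 4 as 2^N.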

The validity of the two key assumptions - that about half of the level points in each dimension will be maxima, and that these will be independent of each other - is debatable. First, if we consider loci without epistasis, there are three cases. If one homozygote is superior to the other, while the heterozygote is either intermediate in fitness or equal to one of the homozygotes, then there will be one maximum and one minimum in the relevant dimension. If the heterozygote is superior to both homozygotes, there will be one maximum and two minima. If both homozygotes are superior to the heterozygote, there will be two maxima and one minimum. There are no cases in which there would be more than two minima or maxima. (If there are more than two alleles at the locus the possibilities are more complicated, but it is difficult to think of realistic scenarios in which there are more than two maxima or minima in each dimension.) For loci without epistasis the assumption that about half of the level points in each dimension will be maxima is therefore plausible as a rough average. But for loci with epistasis the key assumptions are doubtful. The assumption of independence for each dimension is no longer generally valid, as the fitness for all the interacting loci has to be considered simultaneously. For the important case of two interacting loci under selection for an intermediate phenotype (see the post on Wright) there will be two maxima, two minima, and only one saddle point. The key assumptions therefore do not hold even approximately in this case, and if it is at all common, the number of all-round maxima for the genome as a whole may be very large.
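The three one-locus cases can be checked directly from the mean fitness W(p) = p^2 w11 + 2p(1 - p) w12 + (1 - p)^2 w22, where p is the frequency of allele A1 and w11, w12, w22 are the genotype fitnesses. The sketch assumes two alleles, random mating and constant fitnesses; the function name and the fitness values are illustrative only.

```python
def level_points(w11, w12, w22):
    """Return (maxima, minima) of mean fitness W(p) on 0 <= p <= 1, where p is the
    frequency of allele A1 and w11, w12, w22 are the fitnesses of A1A1, A1A2, A2A2
    (one locus, two alleles, random mating, no epistasis)."""
    curvature = w11 - 2 * w12 + w22          # W''(p) = 2 * curvature
    maxima, minima = [], []
    # boundary p = 0: is W higher just inside?
    slope0 = w12 - w22                        # W'(0) = 2 * slope0
    if slope0 > 0 or (slope0 == 0 and curvature > 0):
        minima.append(0.0)
    elif slope0 < 0 or curvature < 0:
        maxima.append(0.0)
    # boundary p = 1: is W higher just inside (moving towards p = 0)?
    slope1 = w11 - w12                        # W'(1) = 2 * slope1
    if slope1 < 0 or (slope1 == 0 and curvature > 0):
        minima.append(1.0)
    elif slope1 > 0 or curvature < 0:
        maxima.append(1.0)
    # interior stationary point, where W'(p) = 0
    if curvature != 0:
        p_star = (w22 - w12) / curvature
        if 0 < p_star < 1:
            (maxima if curvature < 0 else minima).append(p_star)
    return maxima, minima


if __name__ == "__main__":
    cases = {
        "heterozygote intermediate": (1.0, 0.75, 0.5),
        "heterozygote equal to A2A2": (1.0, 0.5, 0.5),
        "heterozygote superior": (0.5, 1.0, 0.75),
        "heterozygote inferior": (1.0, 0.5, 0.75),
    }
    for name, w in cases.items():
        maxima, minima = level_points(*w)
        print(f"{name}: maxima {maxima}, minima {minima}")
```

The four cases give one maximum and one minimum, one maximum and one minimum, one maximum and two minima, and two maxima and one minimum respectively, as stated above.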

It has indeed been claimed (Gavrilets p.37) that the number of maxima is bound to rise with the number of dimensions. But as already discussed in connection with Kauffman's systems, there is no necessity about this: it is quite easy to conceive of a system with only one all-round maximum.

The accessibility and stability of peaks

From an evolutionary point of view, what is important is not just the number of adaptive peaks, but whether they are accessible to the population - i.e. whether the population will evolve towards them - and whether, if the population reaches them, they will be stable under disturbances such as temporary changes in the environment or influxes of migrants. For both purposes, in a frequency landscape we need to consider the 'zone of attraction' of the peaks, i.e. the range of gene frequencies within which the population will move towards the peak under the influence of natural selection. I have not found much discussion of this issue in the literature (which, as I have said, deals mainly with genotype rather than frequency landscapes), but a few general points seem clear.

First, we expect that, other things being equal, higher peaks will have wider zones of attraction. In geometrical terms, if two solid figures have the same shape, the taller figure will have the larger base. In genetic terms, the higher the fitness of a genotype relative to the average fitness of the population, the wider will be the range of gene frequencies within which the genes making up that genotype will be positively selected.
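A standard one-locus case makes the point concrete. Suppose, purely for illustration, that the homozygotes A1A1 and A2A2 have fitnesses 1 + s and 1 + t and the heterozygote has fitness 1, with random mating. There are then two peaks, at p = 1 and p = 0 (p being the frequency of A1), separated by an unstable equilibrium at p* = t/(s + t), so the zone of attraction of the A1A1 peak has width s/(s + t): the higher that peak, the wider its base. The sketch below (hypothetical function names) computes p* and iterates the usual one-generation selection recursion from either side of it.

```python
def unstable_point(s, t):
    """Boundary between the zones of attraction when A1A1 = 1 + s, A1A2 = 1, A2A2 = 1 + t."""
    return t / (s + t)


def next_p(p, s, t):
    """One generation of selection on the frequency p of allele A1 (random mating)."""
    q = 1 - p
    w11, w12, w22 = 1 + s, 1.0, 1 + t
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return p * (p * w11 + q * w12) / w_bar


def final_frequency(p, s, t, generations=2000):
    """Frequency of A1 after many generations of selection, starting from p."""
    for _ in range(generations):
        p = next_p(p, s, t)
    return p


if __name__ == "__main__":
    s, t = 0.2, 0.1                    # the A1A1 peak is the higher one
    p_star = unstable_point(s, t)      # 1/3, so the A1A1 peak attracts 2/3 of the range
    print(p_star, final_frequency(p_star + 0.05, s, t), final_frequency(p_star - 0.05, s, t))
```

Raising s while holding t fixed moves p* towards 0, widening the zone of attraction of the higher peak.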

Second, peaks will have a wider zone of attraction if their component genes have an advantage in the heterozygote as well as the homozygote state. If the optimum genotypes contain recessive homozygotes, the genotypes will be rare, and therefore will not contribute much to the fitness of their component alleles, until the relevant alleles are already frequent in the population.
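The effect of dominance on a rare favoured allele can be seen from the standard one-generation change Δp = p(1 - p)(w1 - w2)/W under random mating, where w1 and w2 are the average fitnesses of the two alleles and W is the mean fitness. The sketch compares an allele with advantage s that is fully dominant with one that is fully recessive; the function name and the value of s are illustrative.

```python
def delta_p(p, w11, w12, w22):
    """Change in frequency p of allele A1 after one generation of selection (random mating)."""
    q = 1 - p
    w1 = p * w11 + q * w12            # average fitness of A1
    w2 = p * w12 + q * w22            # average fitness of A2
    w_bar = p * w1 + q * w2           # mean fitness of the population
    return p * q * (w1 - w2) / w_bar


if __name__ == "__main__":
    s = 0.1
    for p in (0.001, 0.01, 0.1, 0.5):
        dominant = delta_p(p, 1 + s, 1 + s, 1.0)   # advantage shows in heterozygotes
        recessive = delta_p(p, 1 + s, 1.0, 1.0)    # advantage only in A1A1 homozygotes
        print(f"p={p}: dominant {dominant:.2e}, recessive {recessive:.2e}")
```

When A1 is rare, the recessive allele's change is smaller by a factor of roughly p/(1 - p), because almost all copies of a rare allele sit in heterozygotes where its advantage is hidden.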

Third, even if a peak has very high fitness, it will not have a wide zone of attraction if the high fitness depends on the epistatic combination of a large number of alleles which do not otherwise have a fitness advantage. In such a case, the advantageous combinations will not appear with significant frequency in the population until all of the component genes already have a high frequency. The peak will be like a spike with a narrow base. Such a peak will be neither easily accessible nor stable, since even if the peak is reached, any fluctuation in the landscape is liable to push the population out of the zone of attraction.
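A rough calculation shows why such a peak has a narrow base. If, for illustration only, the peak requires one particular allele at each of m loci and the loci are in linkage equilibrium, the frequency of gametes carrying the whole combination is just the product of the individual allele frequencies. The function name below is hypothetical.

```python
from math import prod


def combination_frequency(freqs):
    """Frequency of gametes carrying one specified allele at each locus,
    assuming linkage equilibrium (no association between loci)."""
    return prod(freqs)


if __name__ == "__main__":
    m = 10
    for p in (0.1, 0.5, 0.9):
        print(f"{m} loci, each allele at frequency {p}: {combination_frequency([p] * m):.3g}")
```

Even when every one of ten component alleles has reached frequency 0.5, only about one gamete in a thousand carries the full set, so the combination contributes little to the selection of its parts until they are already common.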

Finally, whether or not a peak is easily accessible to a population depends on the population's current gene frequencies. Here it should be noted that in most of the plausible scenarios for multiple fitness peaks, such as Wright's favourite example of traits under stabilising selection, some of the alleles in the optimum genotypes will (at the peak) be fixed in the population, with alternative peaks at opposite sides or corners of the landscape. (This fact tends to be obscured by illustrative diagrams, including Wright's, which usually show peaks somewhere in the middle of the landscape.) If alleles are fixed, the population can only move to another peak if new alleles are introduced by mutation or migration. These new alleles will be opposed by selection unless the environment changes so that the peak itself shifts. In order to move to another peak without migration or a change in environment, a long period of genetic drift, opposed by selection, will be required unless the population is very small. This is one of the key issues of credibility with Wright's shifting balance theory in its original form.

Note 1: Kauffman and Levin, pp.20-21. There might be a suspicion of fallacy somewhere in this argument, as the probability that a genotype is a local optimum is not independent of the probability for other genotypes. It would certainly be fallacious to conclude that there is a probability [1/(N + 1)]^[2^N] that all of the genotypes are local optima, since this is impossible. However, Kauffman and Levin's formula for the number of local optima appears to be valid.

Note 2: Kauffman p.42. Kauffman's description of the model is very concise and not ideally clear, partly because of ambiguity in his use of the terms 'gene', 'allele' and 'locus'. But I think my interpretation is consistent with what Kauffman and others say about the NK model.

Note 3: since Fisher gave no reasons for his claim, this is just speculation. He may quite possibly have had other reasons, but didn't spell them out. In his statistical work Fisher was very familiar with applications of N-dimensional geometry, so he would have had a better understanding than most people of the properties of high-dimensional landscapes.


Sergey Gavrilets, Fitness Landscapes and the Origin of Species, 2004.
Stuart Kauffman, The Origins of Order, 1993.
Stuart Kauffman and Simon Levin, 'Towards a general theory of adaptive walks on rugged landscapes', J. Theoretical Biology, 1987, 128, 11-45.
George R. McGhee, The Geometry of Evolution: Adaptive Landscapes and Theoretical Morphospaces, 2007


Wednesday, October 01, 2008

Punctuation Error?   posted by DavidB @ 10/01/2008 04:59:00 AM

Readers who lived through the Punctuated Equilibrium controversy of the 70s and 80s will recall that it petered out rather inconclusively, largely for lack of decisive empirical evidence one way or the other. The fossil record is seldom good enough to distinguish unambiguously between punctuational and gradual modes of evolution, one problem (noted already by Darwin) being that the sudden appearance of a new form in a given locality may result from migration rather than rapid evolution in the same place.

Given these difficulties, a disproportionate amount of attention was focused on a handful of examples that seemed to show good evidence either of punctuational or gradual evolution. One of the best examples on the punctuationist side of the debate was a study of molluscs in the Turkana Basin of Africa by P. G. Williamson [Note 1]. Williamson's study was criticised at the time on various grounds - for example that the changes observed might be due to environmental stress rather than genetic evolution - but the critics did not produce new evidence from the field.

That has now changed with an article [Note 2] by a Dutch team in a recent issue of the journal Evolution....

The Abstract of the article is as follows:

A running controversy in evolutionary thought was Eldredge and Gould's punctuated equilibrium model, which proposes long periods of morphological stasis interspersed with rapid bursts of dramatic evolutionary change. One of the earliest and most iconic pieces of research in support of punctuated equilibrium is the work of Williamson on the Plio-Pleistocene molluscs of the Turkana Basin. Williamson claimed to have found firm evidence for three episodes of rapid evolutionary change separated by long periods of stasis in a high-resolution sequence. Most of the discussions following this report centered on the topics of (eco)phenotypy versus genotypy and the possible presence of preservational and temporal artifacts. The debate proved inconclusive, leaving Williamson's reports as one of the empirical foundations of the paradigm of punctuated equilibrium. Here we conclusively show Williamson's original interpretations to be highly flawed. The supposed rapid bursts of punctuated evolutionary change represent artifacts resulting from the invasion of extrabasinal faunal elements in the Turkana palaeolakes during wet phases well known from elsewhere in Africa.

I have read the full article (available here), which looks convincing on this particular case (but what do I know about old African molluscs?) [Added: a more easily readable pdf version is also available. Google 'bocxlaer turkana' and you should find it.] The strongest point is that it is not just armchair criticism but based on extensive new fossil collecting. But since I specialise in armchair criticism I can hardly throw any stones.

Obviously one such case doesn't disprove punctuated equilibrium, but Williamson's study was in some ways the 'poster child' for the theory (more so than even Eldredge and Gould's own studies), so its demolition (if accepted) would be a serious blow.

Note 1: P. G. Williamson, 'Palaeontological documentation of speciation in Cenozoic molluscs from Turkana Basin', Nature, 1981, 293, pp.437-43. Also reprinted in Evolution Now, ed. John Maynard Smith, 1982. I can't find any publications by Williamson after 1990, and I believe I have read somewhere that he died at a sadly early age. My apologies if I am mistaken.

Note 2: Bert van Bocxlaer, Dirk van Damme, and Craig S. Feibel, 'Gradual versus punctuated equilibrium evolution in the Turkana Basin molluscs: evolutionary events or biological invasions?', Evolution, 2008, 62, pp.511-20.


Saturday, September 27, 2008

R. A. Fisher and the Adaptive Landscape   posted by DavidB @ 9/27/2008 06:01:00 AM

In my note on Sewall Wright's concept of the Adaptive Landscape I said that I would later discuss R. A. Fisher's views on the subject. Some commentators have claimed that Fisher held a definite view on the 'shape' of the landscape. For example, a book by Sergey Gavrilets includes a section on 'Fisher's single-peak fitness landscapes', with the claim that:

In contrast to Wright, Fisher... suggested that as the number of dimensions in a fitness landscape increases, local peaks in lower dimensions will tend to become saddle points in higher dimensions. In this case, according to Fisher, natural selection will be able to move the population without the need for genetic drift or other factors. A typical fitness landscape implied by Fisher's views has a single peak. - Gavrilets, p.36

I think this goes beyond anything that Fisher actually says about Wright's adaptive landscape. There is of course room for debate about what an author's views imply. My own interpretation is that Fisher was sceptical about the value of the landscape concept as such, because both environmental and genetic conditions were too changeable for the metaphor of a 'landscape' to be useful. For Fisher the question of the 'shape' of the landscape therefore did not arise as a major issue, and he had no need to take a firm view on it. I discuss this interpretation below the fold.


As I pointed out in my earlier note, Wright himself seldom if ever used the term 'landscape', so we should not expect to find the term in Fisher either. Wright usually referred to a 'field' of gene combinations, and a 'surface' of selective values. He used these concepts mainly to illustrate his shifting balance theory of evolution. Any comments by Fisher that are relevant to the shifting balance theory could therefore also be relevant to the landscape concept. Even with this broad scope, I can find few published comments by Fisher on the subject. The main ones are in his 1932 review of Wright's paper on 'Evolution in Mendelian Populations', reprinted in Bennett (ed.), his 1941 paper on 'Average excess and average effect of a gene substitution', his 1953 paper on 'Population genetics', and his 1958 paper on 'Polymorphism and natural selection', all available at the Fisher Archives here.

In addition to Fisher's published writings, his correspondence contains a few relevant remarks. Most of his correspondence is accessible at the Fisher Archives, and a good selection of his letters on evolution and genetics is published in Bennett (ed.). Two letters are especially relevant. In February 1931 Wright outlined his landscape concept in a letter to Fisher, quoted in Provine's biography of Wright (p.272). In a reply Fisher made some sceptical comments. Then in 1938 Fisher's colleague E. B. Ford described Wright's concept in a popular book on genetics. In a letter of 2 May 1938 to Ford, commenting on his book, Fisher gave what is probably his longest critique of the landscape concept. The letter is published in Bennett (ed.) (pp.201-2) and available at the Fisher Archives, so I will not quote it in full, but it should certainly be read by anyone interested in this issue.

From Fisher's published and unpublished writings we can extract a number of criticisms of Wright's theory.

The interpretation of the dimensions of the landscape

In his biography of Wright, William B. Provine has pointed out that Wright in various places used two different interpretations of the genetic 'dimensions' of the landscape, which in Provine's view are inconsistent (Provine, p.313). In one interpretation the dimensions represent the number of alleles of a given type in an individual genome, while in the other interpretation they represent the frequency of those alleles in a population. Provine points out that in the first interpretation there is properly speaking no continuous surface, but only a lattice of discrete points. He also argues that there is no way of validly transferring conclusions from one interpretation to the other. I believe that these criticisms are somewhat overstated, but it is interesting to find that they are both anticipated by Fisher. In his letter to Ford, Fisher comments that either Ford's description of Wright's views, or the views themselves, are confused, and points out that 'so far as individuals are concerned, there is only a discontinuous aggregate of lattice points, each having its own selective value. There is no continuum of possible values in which we might speak of peaks or maxima.' In his article of 1941, Fisher also criticises one of Wright's own accounts, remarking that Wright 'confuses the number of genotypes, e.g. 3^1000, which may be distinguished among individuals, with the continuous field of variation of gene frequencies.... the large number of genotypes gives no reason for thinking that even one peak, maximal for variations of all gene ratios should occur in this field of variation' (1941, p.378). It is surprising that no-one else seems to have picked up on the apparent confusion in Wright's accounts until Provine's book in 1986.

The number of peaks

As discussed in my earlier post, Wright believed that there are usually a very large number of local fitness maxima in the landscape. Fisher, on the other hand, believed that this was unproven. As noted above, he thought that Wright's view was partly due to confusion between optimal genotypes and optimal frequencies. There is no easy transition from the existence of multiple optima among genotypes to multiple optima among frequencies. I have suggested in my earlier post that in some circumstances (notably where the optimal genotype is homozygous at all loci, and fitness is not frequency-dependent) there can be such a transition, but this is a special case. In general Fisher was correct to regard Wright's argument as inconclusive.

Fisher makes another criticism in his letters to Wright and Ford. In the letter to Wright he says:

In one dimension a curve gives a series of alternative maxima and minima, but in two dimensions two inequalities must be satisfied for a true maximum, and I suppose that only about one fourth of the stationary points will satisfy both. Roughly I would guess that with n factors only 2^-n of the stationary points would be stable for all types of displacement, and any new mutation will have a half chance of destroying the stability. This suggests that true stability in the case of many interacting genes may be of rare occurrence, though its consequence when it does occur is especially interesting and important.

In his letter to Ford, Fisher writes:

In one dimension, as in a road, we pass over an alternative series of hills and dips, so that half of the level points are maxima. In two dimensions, in addition to peaks and bottoms we have cols [i.e. saddle points], which may be regarded as the lowest points on ridges or the highest points on valleys, the curvature of the ground being positive in one direction and negative in another, and the peaks are only about a quarter of the level spots. In n dimensions only about one in 2^n can be expected to be surrounded by lower ground in all directions.

Disregarding for a moment the important comment in the first letter about new mutations, Fisher's thinking seems to be as follows. In each dimension of gene frequencies, only about half of the level points will be maxima. Assuming that the location of the maxima in each dimension is independent of the other dimensions, the probability that a level point will be simultaneously maximal in all dimensions will only be about (1/2)^n, or 1 in 2^n.
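The independence assumption can be made concrete with a separable surface, one in which each dimension contributes its own term and there is no interaction between dimensions. In the illustrative example below (not from Fisher), f(x) = cos(x_1) + ... + cos(x_n) is restricted to one period in each dimension, so each dimension has exactly one maximum (at 0) and one minimum (at π). The Hessian at a level point is diagonal, with entries -cos(x_i), so a level point is a peak only if every entry is negative. The function name is hypothetical.

```python
import math
from itertools import product


def level_point_counts(dims):
    """Count level points and all-round maxima of f(x) = sum(cos(x_i)) on [0, 2*pi)^dims."""
    level_1d = [0.0, math.pi]               # where the derivative -sin(x) vanishes
    total = maxima = 0
    for point in product(level_1d, repeat=dims):
        total += 1
        curvatures = [-math.cos(x) for x in point]   # diagonal of the Hessian
        if all(c < 0 for c in curvatures):
            maxima += 1
    return total, maxima


if __name__ == "__main__":
    for n in range(1, 7):
        total, maxima = level_point_counts(n)
        print(f"n={n}: {total} level points, {maxima} peak(s), fraction {maxima / total}")
```

Here the fraction of level points that are peaks falls as 1/2^n, exactly as Fisher suggested, while the number of peaks stays at one. With more hills and dips per dimension the fraction would be the same, but the number of peaks would grow.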

As these are just comments in private letters, it is difficult to know how much weight we should put on them. Fisher uses the words 'roughly', 'guess', and 'about', which do not suggest a dogmatic position. The validity of the two key assumptions - that about half of the level points in each dimension will be maxima, and that these will be independent of each other - could be discussed at length. But even at best, Fisher's argument only goes to show that the proportion of the level points which are all-round maxima will fall as the number of dimensions increases (which, incidentally, Wright himself accepted, e.g. at ESP p.226). It does not follow that the number of all-round maxima will remain small. If Fisher believed that this was necessarily the case (which is not clear), he was mistaken. It is quite possible that with an increasing number of dimensions the number of level points may increase faster than the proportion of all-round maxima declines. Indeed, it has been claimed that this is generally the case, but this is also unproven. (I will discuss this more fully in a separate post.)

I have not found any definite statement by Fisher either accepting or denying the existence of multiple optima. As I pointed out in my post on Fisher's views on epistasis, he accepted that there could be alternative stable allele frequencies at particular loci. As far as I can see, Fisher would not have denied in principle the possibility of multiple optima for the genome as a whole, and indeed his 1931 letter to Wright might be interpreted as accepting them as an important if rare phenomenon. But overall I think Fisher's position should be described as deeply sceptical. Wright himself said that Fisher 'did not accept the concept of multiple selective peaks' (Wright, 1970, p.23), which is literally true, provided it is not taken as implying outright rejection either.

The mean fitness of the population

In Wright's theory, a population is expected to 'climb' up the slope of the fitness landscape under the influence of natural selection, implying that the mean fitness of the population increases. (Selection may however be offset by migration, recurrent mutation, or genetic drift.) In his publications from 1935 onwards (e.g. ESP pp.239, 366) Wright uses a formula which may be expressed as delta-q = [q(1 - q)/2W][dW/dq], where q and (1 - q) are the frequencies of two alleles, delta-q is the single-generation change in q, W is the mean fitness of the population, and dW/dq is the partial derivative of W with respect to changes in q. The formula may be interpreted as saying that the effect of selection on the frequency of a particular allele is proportional to its effect on the mean fitness of the population (as well as to the product q(1 - q) of the current allele frequencies).
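Under the assumptions Wright used for this derivation (random mating, constant genotype fitnesses, two alleles), the formula gives exactly the same change as a direct calculation of allele frequency after selection. The check below is a sketch: the function names and fitness values are illustrative, and dW/dq is written out analytically for W = q^2 w_AA + 2q(1 - q) w_AB + (1 - q)^2 w_BB.

```python
def mean_fitness(q, w_aa, w_ab, w_bb):
    """W(q) with genotypes in Hardy-Weinberg proportions; q = frequency of allele A."""
    p = 1 - q
    return q * q * w_aa + 2 * q * p * w_ab + p * p * w_bb


def delta_q_direct(q, w_aa, w_ab, w_bb):
    """One generation of selection computed directly from genotype frequencies."""
    p = 1 - q
    q_next = (q * q * w_aa + q * p * w_ab) / mean_fitness(q, w_aa, w_ab, w_bb)
    return q_next - q


def delta_q_wright(q, w_aa, w_ab, w_bb):
    """Wright's form: delta-q = [q(1 - q)/2W][dW/dq]."""
    p = 1 - q
    dw_dq = 2 * (q * w_aa + p * w_ab - q * w_ab - p * w_bb)   # derivative of W(q)
    return q * p / (2 * mean_fitness(q, w_aa, w_ab, w_bb)) * dw_dq


if __name__ == "__main__":
    w = (1.0, 0.9, 0.8)
    for q in (0.1, 0.5, 0.9):
        print(q, delta_q_direct(q, *w), delta_q_wright(q, *w))
```

The agreement rests on the genotype frequencies being in Hardy-Weinberg proportions, which is where the assumption of random mating enters.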

In his 1941 paper Fisher strongly criticised this formulation, showing by a somewhat roundabout argument that it depends on the assumption of random mating, and claiming that any attempt to relate selection pressure to mean fitness is 'foredoomed to failure just so soon as the simplifying, but unrealistic, assumption of random mating is abandoned' (p.378). Wright's derivation of his formula, e.g. at ESP p.239, does indeed assume random mating. But Fisher's objection is not just technical: 'In regard to selection theory, objection should be taken to Wright's equation principally because it represents natural selection, which in reality acts upon individuals, as though it were governed by the average condition of the species or inter-breeding group. Early selectionists, following in this respect the language of the earlier theological writers on organic adaptation, often speak of selection as directed 'for the good of the species'. In reality it is always directed to the good, as measured by descendants, of the individual. Unless individual advantage can be shown, natural selection offers no explanation of structures or instincts which appear to be beneficial to the species. Yet in Wright's equation the whole evolutionary sequence would appear to be governed by the principle of increasing the 'general good'.' (p.378) I think this is somewhat unfair to Wright, who did not ascribe any causal efficacy to the fitness of the population as such, but Fisher's statement is important as his first general criticism of 'good of the species' thinking. He makes similar criticisms in his 1953 and 1958 papers. In the 1958 edition of GTNS a section on 'The Benefit of the Species' is added, which has become highly influential on modern evolutionary thinking. Although this new section does not refer to Wright, it is plausible that Fisher's sharpening of his hostility to 'good of the species' thinking was stimulated by his objections to Wright's equation.

New Mutations

As already mentioned, in his 1931 letter to Wright, Fisher argues that 'any new mutation will have a half chance of destroying the stability' of an optimal gene frequency. He makes a similar point in his published review of Wright's 1931 paper on 'Evolution in Mendelian Populations', saying that 'even under static conditions, unless it is postulated that the organism is as well adapted as it could possibly be (in which case, obviously, evolutionary improvement is impossible), the equilibrium will be broken by the occurrence of any favourable mutation, of which a steady stream will doubtless occur in one or other of the very numerous individuals produced in each generation. The advantage of the large populations in picking up mutations of excessively low mutation rate seems to be overlooked [by Wright]'.

Their attitude towards new mutations is one of the fundamental dividing lines between Wright and Fisher. Wright repeatedly played down the importance of favourable new mutations, on the grounds that their chance of occurring would be negligible even over long periods (see e.g. ESP pp.150, 165, and 321). He seems to have believed that all possible mutations would already have occurred often enough to be selected if they were favourable, so that the possibility of improvement through new mutations would already have been exhausted. Fisher, in contrast, believed that in large populations even very low mutation rates (say, of one in a thousand million per generation) could not be neglected, and that on an evolutionary time-scale of hundreds or thousands of generations they would provide scope for continuing evolution. It may of course be thought that neither Wright nor Fisher, in the 1930s, knew enough about the nature of genes to have any good basis for their opinions.
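Fisher's point about large populations is simple arithmetic. In a diploid population of N individuals, a particular mutation arising at rate μ per gene per generation is expected to appear about 2Nμ times each generation. A new favourable mutant with a small advantage s escapes chance loss with probability of roughly 2s (Haldane's approximation), so the expected number that become established over T generations is about 2Nμ × 2s × T. The population sizes and selection coefficient below are hypothetical illustrations; μ is the rate of one in a thousand million mentioned above.

```python
def expected_new_mutations(pop_size, mu, generations=1):
    """Expected new copies of a specific mutation in a diploid population."""
    return 2 * pop_size * mu * generations


def expected_established(pop_size, mu, s, generations):
    """Rough expected number of favourable mutants that escape loss by drift,
    using Haldane's approximation that each survives with probability about 2s."""
    return expected_new_mutations(pop_size, mu, generations) * 2 * s


if __name__ == "__main__":
    mu, s, generations = 1e-9, 0.01, 1000   # illustrative values only
    for pop_size in (10_000, 1_000_000, 1_000_000_000):
        print(f"N={pop_size:,}: {expected_new_mutations(pop_size, mu):.2g} new per generation, "
              f"{expected_established(pop_size, mu, s, generations):.2g} established "
              f"in {generations} generations")
```

On these assumptions a mutation that is effectively never seen in a population of ten thousand arrives about twice per generation in a population of a thousand million, and a few dozen copies would be expected to become established within a thousand generations.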

Changing Environment

Wright's concept of the adaptive landscape is explicitly based on the assumption of constant environmental conditions. Any change in those conditions involves a change in the landscape itself. Wright was of course aware that environments could change, but he seems to have regarded the 'landscape' as having an underlying continuity of existence even if environmental fluctuations might temporarily change its shape. (I will consider Wright's views on this further in my final post on the shifting balance theory.)

Fisher, on the other hand, believed that environmental change was in one sense irreversible. In the section 'Deterioration of the Environment' in GTNS he emphasised especially the organic environment of competitors, etc:

For the majority of organisms... the physical environment may be regarded as constantly deteriorating... Probably more important than the changes in climate will be the evolutionary changes in progress in associated organisms. As each organism increases in fitness, so will its enemies and competitors increase in fitness; and this will have the same effect, perhaps in a much more important degree, in impairing the environment, from the point of view of each organism concerned. - The Genetical Theory of Natural Selection, Variorum Edition, ed. Henry Bennett, 1999 p.41-2

In his review of Wright's 'Evolution in Mendelian Populations' (reprinted in Bennett, ed.) Fisher again emphasised environmental change:

Professor Wright considers that: 'In too large a freely interbreeding population there is great variability, but such a close approximation to complete equilibrium of all gene frequencies that there is no evolution under static conditions'. He therefore argues that the subdivision of species into partially isolated local races of small size is an important condition not merely, as is obvious, for fission into distinct species, but for progressive evolution. This conclusion is much more debatable [Fisher then makes his point about the importance of new mutations even under static conditions]... Moreover, static conditions in the evolutionary sense certainly do not occur, for, apart from geological and climatological changes, the evolutionary progress of associated organisms ensures that the organic environment shall be continually changing.

In short, as several recent commentators have noted, Fisher held a 'Red Queen' conception of evolution, in which organisms have to keep constantly running just to keep up with the competition. This is quite alien to Wright's conception, in which under the influence of selection alone the organic world would soon grind to an evolutionary halt. The extent to which either of these views is correct is a matter for empirical observation. Genetic studies of living populations tend to show continual change, at least at a microevolutionary level, which might seem to support Fisher's view, whereas paleontologists often claim to observe long-term stasis in morphological traits, which might support Wright. This is of course one of the points at issue in the debate over 'punctuated equilibrium', which seems to have petered out through boredom (and the death of some key participants) rather than being resolved. A possible explanation of the apparent conflict of evidence is that traits in hard body parts may be more tightly constrained by stabilising selection than biochemical and behavioural traits. For other suggestions see Williams, Chapter 9.

J. H. Bennett (ed.), Natural Selection, Heredity and Eugenics: Including selected correspondence of R. A. Fisher with Leonard Darwin and others, 1983.
Sergey Gavrilets, Fitness Landscapes and the Origin of Species, 2004.
William B. Provine, Sewall Wright and Evolutionary Biology, 1986.
Sewall Wright, Evolution: Selected Papers (ESP), ed. William B. Provine, 1986.
George C. Williams, Natural Selection: Domains, Levels, and Challenges, 1992.
Sewall Wright, 'Random drift and the shifting balance theory of evolution', in Mathematical Topics in Population Genetics, ed. Kojima, 1970.

Tuesday, September 23, 2008

R. A. Fisher and Inclusive Fitness   posted by DavidB @ 9/23/2008 01:20:00 AM

W. D. Hamilton is rightly given the main credit for establishing the concept of inclusive fitness. He gave it its name, developed its mathematical theory, and examined a wide range of empirical evidence for it.

There had of course been occasional anticipations of inclusive fitness, going back to Darwin's treatment of neuter social insects in the Origin. Hamilton himself mentioned three such partial anticipations: by G. C. Williams, by J. B. S. Haldane, and by R. A. Fisher in his treatment of the evolution of distastefulness among insects (Hamilton, Narrow Roads of Gene Land, vol. 1, pp.49-50).

Curiously, neither Hamilton nor many other commentators seem to have noticed a more general and prominent formulation of the concept by Fisher in the Genetical Theory of Natural Selection.

In Chapter 2 of that book, on the 'Fundamental Theorem of Natural Selection', there is a section headed 'Reproductive Value', which contains the following passage (with emphasis added):

We may ask, not only about the newly born, but about persons of any chosen age, what is the present value of their future offspring; and if present value is calculated at the rate determined before [in the section on the 'Malthusian Parameter'], the question has a definite meaning - To what extent will persons of this age, on average, contribute to the ancestry of future generations? The question is one of some interest, since the direct action of Natural Selection must be proportional to this contribution. There will also, no doubt, be indirect effects in cases in which an animal favours or impedes the survival or reproduction of its relatives; as a suckling mother assists the survival of her child, as in mankind a mother past bearing may greatly promote the reproduction of her children, as a foetus and in less measure a sucking child inhibits conception, and most strikingly of all in the services of neuter insects to their queen. - The Genetical Theory of Natural Selection, Variorum Edition, ed. Henry Bennett, 1999 p.27

What Fisher here describes as 'indirect effects' may be considered a concise but very general statement of what was later defined by Hamilton as inclusive fitness. Fisher's brief remark may have been overlooked, not only because the statement is not mathematically quantified, but also because Fisher immediately goes on to say that 'such indirect effects will in very many cases be unimportant compared to the effects of personal reproduction', and he does not discuss them further. He therefore treats them essentially as a complication to be mentioned but cleared out of the way. Nevertheless, he does recognise the existence of such indirect effects (both positive and negative) and mentions several examples which have later been extensively treated by Hamilton and other sociobiologists.
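Fisher's 'present value of their future offspring' can be made concrete with a short calculation. The following is a minimal discrete-time sketch, not Fisher's own continuous formulation: the life table (survivorship `l` and fecundity `b` by age class) is invented for illustration, the growth rate `r` is found from the Euler-Lotka equation by bisection, and future births are discounted at that rate. All names are hypothetical.

```python
import math

# Hypothetical life table: survivorship l[x] (fraction surviving to age x)
# and fecundity b[x] (offspring per individual at age x). Invented values.
l = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]
b = [0.0, 0.5, 1.0, 1.0, 0.5, 0.0]  # the last age class is post-reproductive


def euler_lotka(r):
    """Discrete Euler-Lotka sum minus 1; it is zero at the Malthusian rate r."""
    return sum(math.exp(-r * x) * lx * bx for x, (lx, bx) in enumerate(zip(l, b))) - 1.0


def malthusian_rate(lo=-5.0, hi=5.0, iters=200):
    """Bisection for the root of euler_lotka (the sum decreases as r grows)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if euler_lotka(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reproductive_value(x, r):
    """Present value at age x of expected future births, discounted at rate r."""
    return sum(
        math.exp(-r * (y - x)) * (l[y] / l[x]) * b[y] for y in range(x, len(l))
    )


r = malthusian_rate()
values = [reproductive_value(x, r) for x in range(len(l))]
for x, v in enumerate(values):
    print(f"age {x}: v = {v:.3f}")
```

Under these assumptions the newborn class has value 1 by construction, and the post-reproductive class has value 0. Whatever a mother past bearing contributes through her children is exactly the kind of 'indirect effect' that this direct calculation leaves out.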

I dare say that someone somewhere has already noticed and mentioned this passage of Fisher, but as it does not seem to be widely known it will do no harm to mention it again.

Wednesday, September 17, 2008

R. A. Fisher on Population Size: Addendum   posted by DavidB @ 9/17/2008 05:16:00 AM

A while ago I posted two notes (Part 1 and Part 2) on R. A. Fisher's views on population size. I assembled some evidence from The Genetical Theory of Natural Selection suggesting that Fisher believed the population size of a species was usually between a million and a million million, with the latter figure being a realistic possibility for some species of small invertebrates.

In writing that post I could not find any more direct evidence, so I am pleased to have come across a letter from Fisher to C. Tate Regan, dated 7 February 1927, containing the following explicit statement:

The population number of 10^6 [1,000,000] parents in each generation represents a somewhat small species. I suppose most species lie between 10^6 and 10^12 [1,000,000,000,000], although some, such as some of the millipedes, certainly exceed the latter figure. The larger the population the less frequent need mutations be to maintain a given stock of segregating factors, or in other words, with the same mutation rates the larger will the variance (when equilibrium is attained) be. (Bennett, ed., p.255)

Earlier in the letter Fisher makes it clear that he is thinking about genes that are nearly neutral in their effect, so that variance is maintained by a balance between mutation and drift.

A population of a million million does seem very large, but Fisher's reference to millipedes confirms that he was thinking of small invertebrates, where very large populations are quite possible. For example, a population of a million million would only require an average density of one per square metre over an area of about a tenth the size of the United States.
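The area figure is simple arithmetic: a million million square metres is a million square kilometres. A minimal check, in which the area of the United States (about 9.8 million km²) is a rounded figure assumed here:

```python
# Fisher's upper figure: a million million (10**12) individuals.
population = 10**12
density_per_m2 = 1.0       # one individual per square metre
US_AREA_KM2 = 9.8e6        # approximate total area of the United States (assumed round figure)

area_m2 = population / density_per_m2
area_km2 = area_m2 / 1e6   # 1 km^2 = 10**6 m^2
fraction_of_us = area_km2 / US_AREA_KM2

print(f"area needed: {area_km2:,.0f} km^2")
print(f"fraction of US area: {fraction_of_us:.2f}")
```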

J. H. Bennett, ed.: Natural Selection, Heredity and Eugenics: Including selected correspondence of R. A. Fisher with Leonard Darwin and others, 1983.

Thursday, September 04, 2008

R. A. Fisher on Epistasis (yet again)   posted by DavidB @ 9/04/2008 03:33:00 AM

Having previously commented on R. A. Fisher's views on epistasis, I have noticed another relevant passage in The Genetical Theory of Natural Selection:

Each successful gene which spreads through the species, must in some measure alter the selective advantage or disadvantage of many other genes. It will thus affect the rates at which these other genes are increasing or decreasing, and so the rate of change of its own selective advantage. The general statistical consequence is that any gene which increases in numbers, whether this increase is due to a selective advantage, an increased mutation rate, or any other cause, such as a succession of favourable seasons, will so react upon the genetic constitution of the species, as to accelerate its increase of selective advantage if this is increasing, or to retard its decrease if it is decreasing. To put the matter in another way, each gene is constantly tending to create genetic situations favourable to its own survival, so that an increase in numbers due to any cause will in turn react favourably upon the selective advantage which it enjoys. - The Genetical Theory of Natural Selection, Dover edn., pp.102-3

It would be hard to find a stronger statement of the pervasive role of epistatic fitness in evolution. But I dare say the myth that Fisher 'did not believe in epistasis' will persist.

Monday, September 01, 2008

Notes on Sewall Wright: the Adaptive Landscape   posted by DavidB @ 9/01/2008 03:17:00 AM

My series of posts on the work of Sewall Wright is now approaching its (anti?)climax. The next post, on the shifting balance theory, should be the last. The present note deals with a closely related subject. Wright introduced the concept of the 'adaptive landscape' largely in order to illustrate the shifting balance theory. It does however have great interest in its own right, and there is a substantial literature on the concept of adaptive landscapes. [Note 1]

Wright's own treatment of the subject has attracted some controversy following the biography of Wright by William B. Provine. Provine pointed out that Wright used two different interpretations of the 'landscape', which in Provine's view were inconsistent with each other: 'One of Wright's two versions of the fitness surface is unintelligible, and even if one were to escape this problem and put the gene combinations on continuous axes, the two versions would be mathematically wholly incompatible and incommensurable, and there would be no way to transform one into the other' (Provine, p.313). I believe that Provine's criticisms are overstated, but he was right to point out that Wright's concept of the landscape is problematic. This note examines the issues. It is long.


The general concept of the adaptive landscape is that the genetic constitution of an individual or a population can be represented by a point in a space of many dimensions. The biological fitness associated with that genetic constitution can then be represented by a measurement along a further dimension. The fitness 'heights' of different genetic constitutions form a quasi-surface. Points or areas of high fitness can be described as 'peaks', points of low fitness as 'pits', 'troughs', etc, and more complex configurations as 'ridges', 'valleys', 'passes', etc. The genetic evolution of a population can be represented by the movement of points around the 'landscape'. Subject to certain provisos, under the influence of natural selection a population will move up the steepest available slope towards areas of higher fitness. If the population reaches a local peak - a point surrounded in all directions by lower ground - evolution will stop until circumstances change in some significant way.
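The idea of a population climbing the steepest available slope and then halting on a local peak can be illustrated with a toy frequency landscape. This is a minimal sketch under stated assumptions: two haploid loci with invented genotype fitnesses (AB = 1.0, ab = 0.9, Ab = aB = 0.5), genotype frequencies taken as products of allele frequencies, and plain steepest ascent on mean fitness rather than a real model of selection dynamics.

```python
# Toy two-locus haploid landscape (hypothetical fitness values):
# AB = 1.0, ab = 0.9, Ab = aB = 0.5. p and q are the frequencies of A and B.
# Genotype frequencies are assumed to be products of allele frequencies,
# which gives mean fitness w(p, q) = 0.9*p*q - 0.4*p - 0.4*q + 0.9.


def mean_fitness(p, q):
    return (1.0 * p * q + 0.9 * (1 - p) * (1 - q)
            + 0.5 * p * (1 - q) + 0.5 * (1 - p) * q)


def gradient(p, q):
    """Partial derivatives of mean_fitness with respect to p and q."""
    return 0.9 * q - 0.4, 0.9 * p - 0.4


def climb(p, q, step=0.05, n_steps=2000):
    """Repeatedly move up the steepest slope, keeping frequencies in [0, 1]."""
    for _ in range(n_steps):
        gp, gq = gradient(p, q)
        p = min(1.0, max(0.0, p + step * gp))
        q = min(1.0, max(0.0, q + step * gq))
    return p, q


for start in [(0.2, 0.2), (0.7, 0.7)]:
    p, q = climb(*start)
    print(start, "->", (round(p, 3), round(q, 3)), "w =", round(mean_fitness(p, q), 3))
```

Started at (0.2, 0.2) the climb ends on the lower peak at (0, 0), with mean fitness 0.9; started at (0.7, 0.7) it reaches the higher peak at (1, 1). Once on the lower peak, nothing in the procedure can move it off.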

Wright believed that in general there will be many local peaks of fitness in the landscape, often differing in height from each other. It is therefore likely that under the influence of natural selection alone, and under constant environmental conditions, a population will get 'stuck' on a peak which is not the highest in the landscape. Evolution would be quicker, and more beneficial to the species, if there were some means of shifting populations away from these suboptimal local peaks. According to the shifting balance theory in its original form, the only way of moving a population from a peak, other than a large and permanent change in environmental conditions, is by genetic drift, which enables a population to cross 'valleys' of relatively low fitness. This is most likely to occur if the species is divided into a large number of small, partially isolated, subpopulations. Some subpopulations will then by chance find themselves on higher peaks of fitness, and their greater reproductive success will result in a net gene flow into other subpopulations, raising the general fitness of the species and enabling evolution to continue. Wright later abandoned his original exclusive emphasis on genetic drift, but this change has not always been sufficiently recognised. I will deal with this more fully in the final post.

To consider the 'landscape' in more detail:


Wright's first known use of the landscape concept is in a letter of February 3, 1931 to R. A. Fisher, quoted in Provine's biography (p.272). Wright's first published account came in a short paper in 1932. Thereafter he discussed the concept in most of his general surveys of population genetics and evolutionary theory. I cannot claim to have read all of Wright's scattered papers, and I have relied heavily on the collection 'Evolution: Selected Papers' (ESP), edited by Provine with Wright's co-operation. Unfortunately, by the operation of Sod's Law, probably the best account of the 'landscape' is not included in ESP (it is in a 1960 Darwin symposium volume edited by Sol Tax, cited below as 'Tax'). Surprisingly, Wright's huge 4-volume treatise on Evolution and the Genetics of Populations (EGP) has no systematic treatment of the landscape concept, though various of its components are discussed. Finally, a special interest attaches to a paper of 1988, since this came after the publication of Provine's biography. For details see the references.


Wright himself seldom if ever uses the term 'landscape'. In fact, I have not found a single example of it. He does on one occasion (ESP p.625) use the similar term 'topography', but in general he uses two other terms: the field of gene combinations, and the surface of selective values. For convenience I will continue to use the term 'landscape', but anyone searching in Wright's own works should look for 'fields' and 'surfaces', not 'landscapes'. The popularity of the term 'landscape' probably stems from its use in George Gaylord Simpson's Tempo and Mode in Evolution (p.89) and The Major Features of Evolution (p.155), which were more widely read by biologists than Wright's own works. For the same reason, the landscape concept is often given interpretations which derive from Simpson rather than Wright, in which the 'peaks' of the landscape represent either locally optimal phenotypes, or ecological niches. These interpretations are compatible with those of Wright, but not the same as Wright's own landscape, in which the dimensions other than fitness always represent genetic rather than phenotypic variables.

The Number of Dimensions

Wright's landscape has one dimension for fitness, and others representing the genetic constitution of an individual or a population, which I will call the genetic dimensions. At least one genetic dimension is required for each distinct locus at which more than one allele is present in the population. A position along a genetic dimension represents either the number of copies of an allele (in the case of an individual) or the frequency (proportion) of that allele in the population. Since the number of genes at a locus in an individual must add up to the relevant ploidy (one for a haploid, two for a diploid, etc), and the frequencies of different alleles at a locus in a population must add up to 100%, it is only necessary to specify the number or frequencies of (n - 1) alleles at each locus, since the number or frequency of the n'th allele will then be determined as a residual. It is therefore sufficient to have (n - 1) dimensions for each locus, where n is the number of alleles in the population at that locus. The total number of genetic dimensions is the sum of the (n - 1)'s for all loci. The gene pool of any species probably has at least 1,000 loci at which there are two or more alleles present in the population. The number of genetic dimensions is therefore at least 1,000, and usually much larger.
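The dimension count is simple arithmetic: (n - 1) per locus, summed over loci, with fixed loci contributing nothing. A minimal sketch, using a hypothetical gene pool:

```python
def genetic_dimensions(alleles_per_locus):
    """Number of genetic dimensions: (n - 1) for each locus with n alleles.

    A locus fixed for a single allele (n = 1) contributes nothing.
    """
    return sum(n - 1 for n in alleles_per_locus)


# Hypothetical gene pool: 5,000 fixed loci, 1,000 two-allele loci,
# and 50 three-allele loci.
pool = [1] * 5000 + [2] * 1000 + [3] * 50
print(genetic_dimensions(pool))   # 0 + 1000*1 + 50*2 = 1100
```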

The Axes

It might be supposed that the genetic dimensions would be represented diagrammatically by Cartesian axes at right angles to each other (orthogonal axes). For loci with more than two alleles this would however have the disadvantage that the alleles would not be treated symmetrically. For example, with 3 alleles (A, B and C) represented on two orthogonal axes, if one axis represented the balance between A and B, and the other axis the balance between A and C, the balance between B and C could be inferred but would not be directly shown in the diagram. Wright therefore suggests in several places (e.g. Tax p.431-2) that the axes need not be orthogonal, so that for example in the case just mentioned the pairs A-B, A-C, and B-C could be represented by the sides of an equilateral triangle. In practice, Wright usually illustrates his concept with diagrams showing two orthogonal axes for genetic dimensions and one axis (height) for fitness, which on a flat page can be indicated either by perspective or by contours on a map.
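One way to read Wright's suggestion for a three-allele locus is as triangular (barycentric) coordinates: place the three alleles at the corners of an equilateral triangle and plot a population at the frequency-weighted average of the corners. This sketch is my own illustration, not a construction taken from Wright, and the corner positions are arbitrary choices.

```python
import math

# Corners of an equilateral triangle, one per allele (positions chosen arbitrarily).
VERTICES = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, math.sqrt(3) / 2)}


def triangle_point(freqs):
    """Map allele frequencies {allele: freq} (summing to 1) to a 2-D point.

    The point is the frequency-weighted average of the corners, so fixation
    of an allele sits at its corner, and any mixture of just two alleles
    lies on the side joining their corners.
    """
    if not math.isclose(sum(freqs.values()), 1.0):
        raise ValueError("frequencies must sum to 1")
    x = sum(f * VERTICES[a][0] for a, f in freqs.items())
    y = sum(f * VERTICES[a][1] for a, f in freqs.items())
    return x, y


print(triangle_point({"A": 1.0, "B": 0.0, "C": 0.0}))        # corner A
print(triangle_point({"A": 0.5, "B": 0.5, "C": 0.0}))        # midpoint of side A-B
print(triangle_point({"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}))  # centre of the triangle
```

All three pairwise balances (A-B, A-C, B-C) are then shown directly as positions along the three sides.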

The Number of Genotypes

The number of possible genotypes is vast. With at least 1,000 loci, even if only two positions were possible at each locus, the total number of genotypes representable in the system would be at least 2^1000. Wright himself gives a more generous estimate of 10^1000. Either way, the number is super-astronomical. As Wright points out, it is larger than the number of elementary particles in the universe. It is certainly far greater than the number of individuals in any species. It follows that most of the positions in the genetic 'space' of any actual species will be empty. Even if for most loci a single allele has a high frequency in the population, the genotypes of individuals will be very sparsely scattered over the space. Apart from clones, it is unlikely that two individuals will ever have exactly the same genotype.
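The size of these numbers is easy to check with exact integer arithmetic. A minimal sketch; the figure of 10^80 elementary particles is a commonly quoted rough estimate assumed here, and 10^12 is Fisher's upper figure for population size quoted in the earlier post.

```python
import math

LOCI = 1000
two_states = 2 ** LOCI     # only two possibilities per locus
PARTICLES = 10 ** 80       # commonly quoted rough estimate (assumption)
LARGE_SPECIES = 10 ** 12   # Fisher's upper figure for population size

print("2^1000 has", len(str(two_states)), "digits")
# With three genotypes (AA, Aa, aa) per diploid two-allele locus:
print("3^1000 is about 10 to the power", round(LOCI * math.log10(3)))
print("exceeds the particle estimate:", two_states > PARTICLES)
print("fraction of 2^1000 genotypes occupied by 10^12 individuals, at most:",
      LARGE_SPECIES / two_states)
```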

Genotypes or Frequencies?

As Provine showed clearly in his biography (pp.307-17), Wright used two different interpretations of his genetic dimensions. In one interpretation, which I will call the genotype version, a position along a genetic dimension represents the number of alleles of a certain type in an individual genotype. For example, if the dimension represents the allele pair A-a at a diploid locus, a position at one end of the axis would represent the homozygote AA, a position at the other end would represent the homozygote aa, and a position in the middle would represent the heterozygote Aa. [Note 2] The whole genotype of an individual would be represented by a single point in the many-dimensional genotype space, and the allele composition of the individual at a given locus could be 'read off' from the projection of that point onto the relevant axis. The genetic composition of a population could then be represented by a number of points, one for each member of the population, at appropriate positions in the 'space'.

In the alternative interpretation, which I will call the frequency version, a position along a genetic dimension represents the proportion of alleles of a certain type in a population. For example, if the dimension represents the allele pair A-a at a diploid locus, a position at one end of the axis would represent fixation (100% frequency) for the allele A, a position at the other end would represent fixation for the allele a, and a position in between would represent an intermediate frequency, e.g. 60% A and 40% a. The entire genetic composition of a population could be represented by a single point at an appropriate position in the 'space'. It must not be inferred that all members of the population would have the genotype represented by this point under the genotype version. In fact, unless most loci are fixed for a single allele, it is extremely unlikely that any individual in the history of the species would have exactly that genotype.
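The relation between the two versions can be shown with a toy population. A minimal sketch with a hypothetical population of four diploid individuals at two loci: each individual is a point in the genotype version, the whole population is a single point in the frequency version, and the genotype corresponding to that frequency point need not be present at all.

```python
# Genotype version: each diploid individual is a point whose coordinates are
# the number of copies (0, 1 or 2) of a chosen allele at each locus.
# Hypothetical population of four individuals at two loci:
population = [(2, 0), (0, 2), (2, 0), (0, 2)]


def frequency_point(individuals, ploidy=2):
    """Frequency version: one point for the whole population, giving the
    proportion of the chosen allele at each locus."""
    n = len(individuals)
    loci = len(individuals[0])
    return tuple(
        sum(ind[i] for ind in individuals) / (ploidy * n) for i in range(loci)
    )


freq = frequency_point(population)
print("population frequency point:", freq)
# Read as a genotype, this point is the double heterozygote (1, 1),
# which no member of this population actually has.
as_genotype = tuple(round(2 * f) for f in freq)
print(as_genotype, "present in population:", as_genotype in population)
```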

There is no doubt that Wright uses both of these interpretations. In his first known account (in the 1931 letter to Fisher) he uses only the frequency version, but in the first published account (1932) he uses only the genotype version. From 1935 onwards his publications most often use the frequency version, but the genotype version is never entirely lost, and the two interpretations may even appear in the same work. (See Note 3 for my own attempt at a chronological listing.)

But is there really any inconsistency in the two different interpretations? It is evidently quite possible for a position along an axis to represent either an allele number or an allele frequency, and there is no fundamental reason why the two interpretations should not be used at different times, or even at the same time, provided the differences between the two interpretations are properly noted. There is of course a danger that the use of two different interpretations will lead to confusion, or even to actual error if theorems or generalisations which are valid only for one interpretation are applied to the other one. I am not aware that Wright himself ever falls into definite error, but his explanations are often unclear. According to Provine (p.311), when he first pointed out the different interpretations to Wright, the latter was somewhat taken aback, and did not realise that he had been switching between them. Wright's 1988 paper, which includes a response to Provine's critique, is surprisingly insouciant about the issue, effectively taking the line: 'Why worry, it's only a diagram.'

Provine does have other criticisms, but before discussing these it will be useful to look at the remaining dimension of the landscape, that of fitness.

The Dimension of Fitness

In view of its importance Wright says surprisingly little about the nature or definition of fitness. In his first presentation of the landscape concept he says only that the entire field of gene combinations can be 'graded with respect to adaptive value under a particular set of conditions' (ESP p.162). The word 'graded' seems to imply a relative measure of fitness, which is consistent with Wright's general approach and that of many other population geneticists, including Haldane. For most purposes a relative measure is sufficient. Wright does however recognise that an absolute measure, such as Fisher's Malthusian parameter, may be useful or necessary for some purposes, for example in dealing with overlapping generations (Tax, p.433).

A more important issue is the question of the relevant 'set of conditions', on which Wright is again disappointingly vague. Clearly the fitness of a given genotype will depend in part on the environment. It appears that Wright intends fitness to be averaged over the usual range of environments in which a species finds itself. But it would be reasonable to object that conditions will be constantly changing, so that there is no such thing as an 'average' environment except at a moment in time. Even at a moment in time the environment will vary in different parts of a species' geographical range. The most important aspect of a species' environment is often not the inorganic factors (climate, etc) but the organic or biotic environment of competitors, food, predators, parasites, and pathogens. These differ fundamentally from the inorganic environment because they are themselves evolving by natural selection, sometimes in response to the species of interest. For example, a new mutation occurring among any of the pathogens affecting a species may dramatically change the fitness of all the genotypes of that species. Wright does in various places recognise that the organic and inorganic environment are liable to change, but he tends to present this as a factor leading to movement of the species around the 'landscape', when it could arguably be seen as invalidating the concept of the landscape altogether. One of the essential features of a landscape, in the ordinary sense, is that it has at least a modicum of persistence through time.

For an individual member of a species, the other members of the same species are an important part of its biotic environment. This raises the possibility that the absolute or relative fitness of different genotypes may vary according to the genetic composition of the species population. Notably, this would be the case with various forms of frequency-dependent selection, for example, if pathogens or predators attack the most common variants. I cannot find any discussion of the issue in Wright's early papers. Under the first published (1932) account, which presents only the genotype version, it seems to be assumed that each genotype can be assigned a fitness regardless of gene frequencies. In the first published account of the frequency version (1935), Wright deals mainly with certain special cases, which again seem to be independent of frequency. In two more general presentations (1939 and 1940), I still find no clear statement. Finally, in 1942 (in an article based on a lecture given in September 1941) we find an explicit assumption that 'the relative selective values of these genotypes are independent of their frequencies' (ESP p.472). It may be relevant that in 1941, in a paper referenced in Wright's 1942 article, R. A. Fisher had sharply criticised Wright's 1940 presentation. Whatever the reasons, in later discussions, notably Tax and EGP, Wright gave more attention to the issue of frequency-dependence (see especially Tax pp.443-49). Generally speaking, frequency-dependence can involve either positive or negative feedback, in the first case driving alleles to fixation, and in the second often leading to a balanced polymorphism. If the latter case is common in nature, it would tend to make the landscape concept more difficult to interpret (see further below).
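The contrast between negative and positive frequency-dependence can be sketched with a one-locus haploid model. The fitness functions below are hypothetical: allele A's fitness depends linearly on its own frequency p, and allele a has fitness 1. Under negative feedback the population settles at a polymorphism; under positive feedback it goes to fixation, and which allele wins depends on where it starts.

```python
def neutral(p):
    """Fitness of allele a: constant (hypothetical)."""
    return 1.0


def negative(p):
    """Fitness of A under negative feedback: favoured when rarer than 30%."""
    return 1.0 + 0.5 * (0.3 - p)


def positive(p):
    """Fitness of A under positive feedback: favoured when commoner than 30%."""
    return 1.0 + 0.5 * (p - 0.3)


def next_freq(p, fitness_A, fitness_a):
    """One generation of haploid selection with frequency-dependent fitness."""
    wA, wa = fitness_A(p), fitness_a(p)
    mean_w = p * wA + (1 - p) * wa
    return p * wA / mean_w


def run(p, fitness_A, fitness_a=neutral, generations=500):
    for _ in range(generations):
        p = next_freq(p, fitness_A, fitness_a)
    return p


print([round(run(p0, negative), 4) for p0 in (0.05, 0.9)])   # both settle at 0.3
print([round(run(p0, positive), 4) for p0 in (0.25, 0.35)])  # lost from 0.25, fixed from 0.35
```

Note that the fitness of genotype A here is not a fixed number: it depends on where the population currently stands, which is why such cases are awkward for a static landscape.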

Is there a fitness surface?

On many occasions Wright refers to the values of the fitness dimension as forming a 'surface'. This would normally imply at least an approximate continuity of values for fitness with respect to changes along the other dimensions. Provine has pointed out that under the genotype version, the fitness values cannot be continuous. The genotype values themselves form a lattice of discrete points, not a surface, so the associated fitness values must likewise be discontinuous.

I think this objection is somewhat overstated. First, as a matter of textual detail, Wright seldom uses the term 'surface' when he is referring to the genotype version; in particular, he does not use the term in his first (1932) published account. But on at least one occasion (in 1939, ESP p.318), he does unambiguously refer to a fitness surface with respect to genotypes; also, as Provine points out, even in the 1932 account Wright uses a diagram which seems to imply a continuous surface. Provine's criticism therefore needs to be met, but I think it is not as serious as Provine suggests. It is true that the genotype values form a lattice of points rather than a surface, but it is possible to define a 'distance' between these points by the number of gene substitutions needed to go from one point to another. We can reasonably describe some points as being closer than others. It would then also be reasonable, if not mathematically exact, to say that the associated fitness values approximate to a surface, provided that small differences in distance correspond to small differences in fitness. The real objection, it seems to me, is not that the surface is not strictly continuous, but that the necessary correspondence between fitness and distance does not exist. Genotypes which differ only in a single allele may differ widely in fitness, for example if the heterozygote at a given locus has above-average fitness, whereas the recessive homozygote is lethal. I do not see any basis for an assumption that differences in fitness correspond, even loosely, to the number of genetic differences between two genotypes.
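The distance idea, and the failure of fitness to track it, can be sketched with a hypothetical two-locus example in which each locus has a lethal recessive and heterozygote advantage. The per-locus fitness values and the multiplicative combination across loci are assumptions for illustration.

```python
# Per-locus fitness by number of copies of allele A (hypothetical values):
# aa is lethal, Aa has the highest fitness, AA is slightly less fit.
LOCUS_FITNESS = {0: 0.0, 1: 1.0, 2: 0.9}


def fitness(genotype):
    """Multiplicative fitness across loci (an assumption for illustration)."""
    w = 1.0
    for copies in genotype:
        w *= LOCUS_FITNESS[copies]
    return w


def distance(g1, g2):
    """Number of single-gene substitutions needed to turn g1 into g2."""
    return sum(abs(a - b) for a, b in zip(g1, g2))


origin = (1, 1)
for other in [(0, 1), (2, 1), (2, 2)]:
    print(other, "distance", distance(origin, other),
          "fitness", round(fitness(other), 2))
```

Here (2, 2), two substitutions away from (1, 1), is far fitter than (0, 1), which is only one substitution away.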

I suggest that the following picture is more plausible. A very large part of the 'genotype space' must correspond to zero fitness, since it would involve combinations of rare disadvantageous alleles which are unlikely ever to be combined in reality. Only a small 'corner' of the space is inhabited by actual genotypes. Most of these will have rather similar average fitness, equivalent to producing around two surviving offspring (by sexual reproduction), since, on average, this is what most genotypes actually achieve under their normal circumstances. (If they did not, the population would soon die out.) Among these mediocre genotypes there will be a scattering of super-fit types, and a larger scattering of low-fitness types. The geometrical picture is that most of the landscape would be flat, with uniformly zero fitness, rising gently up to a small inhabited plateau of mediocre fitness, in which there are numerous 'holes' corresponding to genotypes with low fitness (e.g. lethal recessives) compared to their immediate neighbours. [Note 4] There will also be scattered pimples or wrinkles of modest height representing clusters of genotypes containing advantageous genes that are still in the process of selection, and shallow depressions representing mildly disadvantageous genes. But because it contains numerous 'holes' - isolated genotypes or groups of genotypes with fitness much lower than their neighbours - the landscape is not even approximately a continuous surface.

If now we turn to the frequency version, there are better grounds for regarding the fitness surface as continuous. In the frequency version each point in the genetic space corresponds to a certain set of allele frequencies at each locus. Provided we make certain assumptions about the mating system and linkage (usually random mating and zero linkage), each array of allele frequencies will be associated with an array of all possible genotypes, each with a definite probability of occurrence. The mean fitness associated with a given point in the frequency space will therefore also be defined. As the point moves around the space, the genotype probabilities will vary continuously, and so will the average fitness, since the value of ab + cd varies continuously if a and c vary continuously, for any fixed values of b and d. It is true that in a finite population the allele frequencies cannot vary with strict mathematical continuity, since they are ultimately fractions with the population size as a denominator, but unless the population size is very small, the fitness surface will approximate to continuity.
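The continuity argument can be checked for a single diploid locus. Under random mating the genotype frequencies are p², 2pq and q², and mean fitness is the sum of each genotype frequency times the corresponding genotype fitness, so it is a polynomial in p. A minimal sketch with hypothetical genotype fitnesses (the lethal-recessive, heterozygote-advantage case mentioned above):

```python
# Genotype fitnesses at one diploid locus (hypothetical): AA, Aa, aa.
W_AA, W_Aa, W_aa = 0.9, 1.0, 0.0


def mean_fitness(p):
    """Mean fitness at frequency p of allele A, assuming random mating
    (Hardy-Weinberg genotype proportions p^2, 2pq, q^2)."""
    q = 1 - p
    return p * p * W_AA + 2 * p * q * W_Aa + q * q * W_aa


# Mean fitness changes smoothly with p, even though individual genotype
# fitnesses jump from 1.0 (Aa) to 0.0 (aa) with a single substitution.
for p in (0.5, 0.500001, 0.9, 1 / 1.1, 0.95):
    print(f"p = {p:.6f}  mean fitness = {mean_fitness(p):.6f}")
```

With these values the highest mean fitness is at p = 1/1.1, about 0.909, an interior point on a smooth curve.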

What is a fitness peak?

The idea of a fitness 'peak' is central to Wright's use of the 'landscape' concept. So what exactly is a fitness peak? Characteristically, in introducing the term (in 1932) Wright does not formally define it, and his meaning has to be inferred from what he says about it.

This is one issue where it is important to distinguish between the genotype and frequency versions of the landscape. With the genotype version, the definition of a fitness peak is relatively straightforward. If a genotype has higher fitness than any genotype which can be derived from it by substituting another allele at a single site (including e.g. substituting a homozygote for a heterozygote at a given locus), then it may be described as a local fitness peak. So far as I am aware, this is how Wright always uses the term 'peak' under the genotype version.
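The genotype-version definition translates directly into a test: a genotype is a local peak if every genotype reachable by one allele substitution is less fit. A minimal sketch for diploid loci, with genotypes recorded as copies of one allele per locus; the fitness values and their multiplicative combination are hypothetical.

```python
from itertools import product

# Hypothetical per-locus fitness by copies of allele A (0, 1 or 2).
LOCUS_FITNESS = {0: 0.0, 1: 1.0, 2: 0.9}


def fitness(genotype):
    w = 1.0
    for copies in genotype:
        w *= LOCUS_FITNESS[copies]
    return w


def neighbours(genotype, ploidy=2):
    """Genotypes reachable by substituting one allele at a single locus
    (e.g. replacing a heterozygote by a homozygote)."""
    for i, copies in enumerate(genotype):
        for step in (-1, 1):
            c = copies + step
            if 0 <= c <= ploidy:
                yield genotype[:i] + (c,) + genotype[i + 1:]


def is_local_peak(genotype):
    w = fitness(genotype)
    return all(fitness(n) < w for n in neighbours(genotype))


peaks = [g for g in product(range(3), repeat=2) if is_local_peak(g)]
print(peaks)   # [(1, 1)]
```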

Under the frequency version matters are less clear. We could, of course, stipulate that a set of frequencies is a local peak if any small frequency change at a single locus would reduce the mean fitness of the population. But this would exclude the reasonable possibility that frequencies may change slightly but simultaneously at more than one locus, which might increase mean fitness even though no single-locus change would do so. The natural definition of local fitness peak implied by these considerations is that a set of frequencies is a local fitness peak if no combination of small simultaneous frequency changes, at any number of loci, would increase mean fitness. Geometrically, this is equivalent to stipulating that a local fitness peak is immediately surrounded by downward slopes of fitness in all 'directions' in the genetic space. Probably this intuitive concept could be defined more precisely in terms of the 'principal directions' of differential geometry, but I am not aware that Wright himself ever took this approach. [Note 5] In practice, Wright deals mainly with specific cases where the intuitive meaning of a fitness peak is sufficiently clear.
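The frequency-version definition can be approximated numerically by trying small simultaneous changes at all loci at once, not just one locus at a time. A rough sketch on a hypothetical two-locus haploid landscape, assuming zero linkage disequilibrium; a finite grid of perturbations is a check, not a proof.

```python
import itertools


# Hypothetical haploid two-locus fitnesses: AB = 1.0, ab = 0.9, Ab = aB = 0.5.
# Genotype frequencies are assumed to be products of allele frequencies,
# so mean fitness depends only on p (freq. of A) and q (freq. of B).
def mean_fitness(p, q):
    return (1.0 * p * q + 0.9 * (1 - p) * (1 - q)
            + 0.5 * p * (1 - q) + 0.5 * (1 - p) * q)


def is_frequency_peak(p, q, h=1e-3, grid=10):
    """Check that no small simultaneous change in p and q raises mean fitness.

    Tries a grid of perturbations up to size h at both loci at once,
    skipping any that leave the allowed range [0, 1].
    """
    w0 = mean_fitness(p, q)
    steps = [h * k / grid for k in range(-grid, grid + 1)]
    for dp, dq in itertools.product(steps, steps):
        if dp == dq == 0:
            continue
        p1, q1 = p + dp, q + dq
        if 0 <= p1 <= 1 and 0 <= q1 <= 1 and mean_fitness(p1, q1) > w0:
            return False
    return True


for point in [(0.0, 0.0), (1.0, 1.0), (4 / 9, 4 / 9)]:
    print(point, is_frequency_peak(*point), round(mean_fitness(*point), 3))
```

(0, 0) and (1, 1) pass, at mean fitnesses 0.9 and 1.0. The point (4/9, 4/9) fails: there a change at either locus alone leaves mean fitness unchanged, but a joint increase at both loci raises it.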

How many peaks?

One of Wright's fundamental claims about the landscape is that it has numerous local peaks. Moreover, many of these have a different fitness 'height'. To give some examples (all page references to ESP), he claims that the number of peaks is 'many' (9, 483), 'enormous' (163, 370), 'large' (226), 'inconceivably great' (230), 'multiple' (318), 'innumerable' (348, 554), and even 'virtually infinite' (535). He also insists that many of these peaks will have a different selective value (see the cited or nearby pages for examples). Without these claims, the landscape concept has little interest. The basis of the claims therefore needs to be examined.

In his original 1932 presentation Wright used a simple probabilistic argument for the existence of numerous peaks. The number of possible genotypes is vast, so even if only a tiny proportion of them are local optima, the number of local optima would still be very large: 'With something like 10^1000 possibilities it may be taken as certain that there will be an enormous number of widely separated harmonious combinations. The chance that a random combination is as adaptive as those characteristic of the species may be as low as 10^-100 and still leave room for 10^800 separate peaks....(ESP p.163)'.

This is a dubious argument. It may be compared to a common argument for the existence of intelligent life elsewhere in the universe. There are around 10,000 billion billion stars in the universe, so even if the proportion of stars with planets supporting intelligent life is tiny - say, 1 in 10,000 billion - there would still be an enormous number of such stars. But consider the following counter-argument. It is plausible that the emergence and survival of intelligent life requires a moderately large number of conditions - say, at least 100 - to be met. It is also plausible that these conditions are largely independent, and individually quite improbable - say, with a probability of only 1 in 100. But with these assumptions, the probability that all of the necessary conditions are met in any given case is at most 1 in 100^100, that is, 1 in 10^200. This is vastly less than 1 in 10,000 billion billion, so rather than expecting there to be a large number of stars with planets supporting intelligent life, it would be a miracle if there are any at all. In reality, neither argument goes much further than establishing the bare possibility of the conclusion. Similarly, in the case of selective peaks, the sheer number of possible genotypes is in itself not a strong argument for the existence, rather than the bare possibility, of numerous different peaks.
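The arithmetic of both arguments can be checked exactly with rational numbers. A minimal sketch using the round figures given in the text:

```python
from fractions import Fraction

STARS = 10**22                           # "10,000 billion billion"
optimistic_rate = Fraction(1, 10**13)    # "1 in 10,000 billion"
conditions, p_each = 100, Fraction(1, 100)

# Optimistic argument: a tiny rate times a huge number of stars.
print("expected (optimistic):", STARS * optimistic_rate)

# Counter-argument: 100 independent conditions, each with probability 1/100.
p_all = p_each ** conditions             # exactly 1 / 10**200
expected = STARS * p_all
print(f"expected (counter-argument): {float(expected):.0e}")
```

The first argument expects a billion such stars; the second expects about 10^-178 of them.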

Wright does later present better arguments for the existence of multiple peaks. By far his most common example is that of a quantitative trait controlled by several loci where the selective optimum for the trait is at an intermediate value, i.e. neither the highest nor the lowest that can be produced by the various possible combinations of alleles. In this situation it is likely that the optimum intermediate value of the trait can be produced by different allele combinations. The effect of an allele on fitness (not necessarily on the quantitative trait itself) is epistatic, i.e. dependent on the combination of other genes in the genotype. Which of the relevant alleles are favoured by selection may then depend on the accident of which allele at a locus happens to be most frequent when selection begins, with all other alleles at the locus being driven to extinction. This example is used repeatedly: ESP pp.247, 310, 319, 370, 477, 626, Tax p. 450, EGP vol. 1 pp.59-60.
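The intermediate-optimum argument can be checked by brute force on a toy model. The sketch below is my own illustration, not Wright's: it assumes a haploid organism with n loci, two alleles per locus (coded 0 and 1), a trait equal to the number of 1 alleles, and a hypothetical quadratic fitness function with its maximum at an intermediate trait value. A genotype counts as a local peak if every genotype differing from it at a single locus is strictly less fit.

```python
from itertools import product

def trait(genotype):
    # Hypothetical additive trait: each allele coded 1 adds one unit.
    return sum(genotype)

def fitness(genotype, optimum, s=0.1):
    # Hypothetical stabilising selection: fitness falls with the squared
    # distance of the trait from the optimum.
    return 1.0 - s * (trait(genotype) - optimum) ** 2

def neighbours(genotype):
    # Every genotype that differs at exactly one locus.
    for i in range(len(genotype)):
        changed = list(genotype)
        changed[i] = 1 - changed[i]
        yield tuple(changed)

def local_peaks(n_loci, optimum):
    # A local peak is fitter than all of its single-locus neighbours.
    peaks = []
    for g in product((0, 1), repeat=n_loci):
        w = fitness(g, optimum)
        if all(fitness(nb, optimum) < w for nb in neighbours(g)):
            peaks.append(g)
    return peaks

peaks = local_peaks(n_loci=4, optimum=2)
print(len(peaks))                              # 6
print(sorted({fitness(g, 2) for g in peaks}))  # [1.0]
```

With four loci and an optimum of 2 there are six peaks, one for each way of choosing which two loci carry the 1 allele. Note that in this idealised case they all have exactly the same fitness.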

The theoretical possibility of multiple selective peaks in this situation has been generally recognised. As I pointed out in a post on R. A. Fisher and epistasis, it was recognised by Fisher in 1930. It was also noted by J. B. S. Haldane, who is sometimes mentioned by Wright in this context. Indeed, a diagram used repeatedly by Wright to illustrate the point (e.g. ESP pp. 310, 371) looks suspiciously like an adaptation of one used by Haldane (Causes of Evolution, p.107).

It should be noted that the example of an intermediate optimal phenotype applies to both the genotype and frequency versions of the landscape concept. Provine has claimed that the two versions are 'mathematically wholly incompatible and incommensurable, and there would be no way to transform one into the other' (Provine, p.313). Like his other criticisms, I think this one is overstated. In at least one important class of cases a local peak under the genotype version will be a local peak under the frequency version as well. This is where the local optimum genotype is homozygous at all loci (or where the organism is haploid). In this case, if all the alleles of the optimum genotype are fixed (i.e. have a frequency of 100%) in the gene pool, all genotypes produced from the gene pool will be identical, and will have the local optimum value. Any change in frequencies (including simultaneous changes in several frequencies) can then only occur by mutations, producing a small proportion of alternative alleles. Assuming random mating and zero linkage, the genotypes produced from the new gene pool will usually differ from the local optimum genotype at no more than a single locus. But by definition these are all less fit than the local optimum, so the change in frequencies will be selected against. Genotypes which differ from the local optimum at more than one locus are indeed possible, and may be fitter than the local optimum, but they will occur so rarely that they can usually be neglected. The frequency array in which all the alleles of a local optimum genotype are fixed in the population will therefore usually be a local peak under the frequency version.
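The claim that multi-locus departures from the optimum are rare can be quantified under simple assumptions. The sketch below assumes a haploid population fixed for the optimum genotype, in which mutation has introduced an alternative allele at each of n loci with the same small frequency u, independently across loci (random mating and free recombination, so linkage equilibrium). The number of loci at which a random genotype differs from the optimum is then binomial. The function name and the values of n and u are purely illustrative.

```python
from math import comb

def prob_mismatches(n_loci, u, k):
    """Probability that a random haploid genotype differs from the fixed optimum
    at exactly k loci, when each locus independently carries a mutant allele
    at frequency u."""
    return comb(n_loci, k) * u**k * (1 - u) ** (n_loci - k)

n, u = 20, 1e-3   # illustrative values only
p_one = prob_mismatches(n, u, 1)
p_two_or_more = sum(prob_mismatches(n, u, k) for k in range(2, n + 1))

print(f"one locus: {p_one:.5f}, two or more: {p_two_or_more:.6f}")
print(f"ratio: {p_two_or_more / p_one:.4f}")
```

With these numbers, genotypes differing at two or more loci are roughly a hundred times rarer than those differing at one (the ratio is close to (n - 1)u/2), so selection against the single-locus variants dominates the outcome.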

If the optimum genotype is not homozygous at all loci, I think Provine is right that there is no easy transition from the genotype version to the frequency version. For any locus that is heterozygous in the local optimum genotype, the heterozygote is most likely to be produced by a 50:50 ratio of the relevant alleles in the population. Let us suppose that the population is fixed for all the homozygous alleles in the optimum genotype, and has a 50:50 ratio for all the heterozygous alleles. Unlike the case where all loci are fixed, this frequency set will produce a multiplicity of genotypes. If there are more than a few heterozygous loci in the optimum genotype, only a small proportion of the genotypes produced from the frequency set will actually have the optimum genotype. (At any heterozygous locus a 50:50 frequency will produce 50% heterozygotes, so if there are n independent heterozygous loci the proportion of genotypes that are heterozygous at all the relevant loci will be (1/2)^n, which rapidly becomes negligible as n increases.) There is no guarantee that this frequency set will be a local fitness optimum (as defined under the frequency version), since this will depend on the fitness of numerous different genotypes, whose mean fitness may well be higher at some other nearby point in the frequency space. It all gets very complicated. If we also take account of frequency-dependent fitness, it is even messier, since there may be no such thing as a local optimum genotype that remains optimal under all frequency arrays.
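The (1/2)^n figure follows from Hardy-Weinberg proportions. Under random mating, a locus with allele frequencies p and 1 - p has a heterozygote frequency of 2p(1 - p), which is greatest at p = 0.5, where it equals one half; with n independent loci (linkage equilibrium assumed) these probabilities multiply. A minimal sketch, with function names of my own:

```python
def heterozygote_freq(p):
    # Hardy-Weinberg frequency of heterozygotes at a two-allele locus.
    return 2 * p * (1 - p)

def prop_heterozygous_at_all(n_loci, p=0.5):
    # Assumes the n loci are independent (linkage equilibrium).
    return heterozygote_freq(p) ** n_loci

# The heterozygote frequency peaks at p = 0.5 ...
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=heterozygote_freq)
print(best_p)  # 0.5

# ... but even then, few genotypes are heterozygous at every relevant locus.
for n in (1, 5, 10, 20):
    print(n, prop_heterozygous_at_all(n))
# 1 0.5
# 5 0.03125
# 10 0.0009765625
# 20 9.5367431640625e-07
```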

The case of optimum fitness of a trait with an intermediate value does however go some way towards vindicating Wright's confidence in the existence of numerous local peaks. Assuming that there are several such traits which are genetically independent of each other, and of other loci, this may lead to a very large number of local optimum genotypes. With at least two independent optima for each trait, the total number of local optimum genotypes will be at least 2^n, where n is the number of traits. This quickly leads to large numbers: over a thousand for n = 10, over a million for n = 20, over a billion for n = 30, and so on. But there is a snag. Selection for an intermediate value of a trait will, if it is successful, always produce much the same phenotype. For example, if the optimum length of a canine tooth is 1 inch, selection will tend to produce that length of tooth even if different combinations of alleles are involved. In this case there will be multiple peaks in the genetic landscape, but they will all be of much the same 'height' in the fitness dimension. This would take much of the interest out of the concept. Wright recognised this snag at least from 1935 onwards. His answer to the problem was to emphasise that most genes have multiple (pleiotropic) effects, and that the system of peaks relative to one character is therefore not independent of that relative to another (ESP pp.230, 320, etc.). In some places Wright seems to imply that the allele frequencies may be fixed at an arbitrary peak by selection for the optimal value of one trait, leaving the effects on some other trait varying and often suboptimal (e.g. ESP p.595, though he is not explicit). But this is doubtful. Suppose for example that an allele combination which determines the length of the canine teeth also affects the incisors. If two such combinations produce the same optimum length of canines, but different lengths of incisors, there will be selective pressure to bring the latter towards its own optimum. In this situation there may well be genes at other loci that are capable of modifying the trait. If necessary, new mutations could be selected (not necessarily absolutely new, but newly advantageous). It is not clear that multiple peaks differing significantly in fitness will persist for any trait. In at least one place (Tax p.450) Wright himself may recognise this possibility, but it does not seem to have dented his confidence in the existence of multiple peaks with different fitness.
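The counting here is just a product: if each trait is controlled by its own separate set of loci, any local optimum for one trait can be combined with any local optimum for another. A short sketch, assuming exactly two alternative optima per trait as in the text (the function name is hypothetical):

```python
from math import prod

def total_local_optima(optima_per_trait):
    """Number of local optimum genotypes when each trait is governed by its own,
    non-overlapping set of loci, so the alternatives combine freely."""
    return prod(optima_per_trait)

for n_traits in (10, 20, 30):
    print(n_traits, total_local_optima([2] * n_traits))
# 10 1024
# 20 1048576
# 30 1073741824
```

The count says nothing about the heights of these peaks: in this idealised case they all have the same fitness, which is exactly the snag Wright had to confront.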

Although the case of intermediate optimum traits is by far the most common reason given by Wright for the existence of multiple peaks, it is not quite the only one. He does occasionally mention the possibility of multiple peaks at a single locus with two or more alleles, if the homozygotes are fitter than the heterozygotes. He also recognises the value of Simpson's concept of phenotypic and ecological peaks, distinguishing two cases: those where different phenotypes give alternative ways of adapting to the same selective conditions, and those where they give ways of adapting to different ecological niches within the same environment (ESP p.555).
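The single-locus case is easy to simulate. The sketch below assumes a diploid, randomly mating population with one locus and two alleles, discrete generations, and illustrative fitnesses in which both homozygotes have fitness 1 and the heterozygote 0.9 (underdominance). The allele frequency p = 0.5 is then an equilibrium, but an unstable one: whichever allele starts above one half goes to fixation, so the two homozygotes act as two separate peaks. The function names are my own.

```python
def next_freq(p, w_AA=1.0, w_Aa=0.9, w_aa=1.0):
    """Frequency of allele A after one generation of selection, assuming random
    mating (Hardy-Weinberg proportions among zygotes)."""
    q = 1 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / w_bar

def run(p0, generations=500, **fitnesses):
    # Iterate the one-generation update from a starting frequency p0.
    p = p0
    for _ in range(generations):
        p = next_freq(p, **fitnesses)
    return p

print(round(run(0.45), 6), round(run(0.55), 6))  # 0.0 1.0
```

Starting just below or just above one half leads to opposite outcomes: only the initial frequency decides which peak is reached.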


Overall, it seems to me that Wright makes out a plausible case that there are likely to be multiple peaks of fitness, but the arguments are not conclusive. If the environment is changing, as it always is, the landscape itself becomes fluid. And if there is widespread genetic polymorphism and/or frequency-dependence in a population, much of Wright's original formulation is (by his own admission) not directly applicable. Provine's criticisms of the two different versions of the landscape concept seem to me overstated, but he is right to question its usefulness as a heuristic device. If several generations of biologists failed even to notice the existence of the two versions, the metaphor of the landscape can hardly be said to have encouraged clarity of thought.

The discussion so far has left some important issues untouched. What are the reasons for expecting a population to 'climb' up a fitness slope? Even if there are many fitness peaks in the landscape, are they all accessible to the population? Will a population get 'stuck' on a peak for any length of time? If so, what circumstances may shift it away from that peak? These questions all go to the heart of the shifting balance theory, so rather than discuss them now I will leave them for my intended note on the shifting balance theory. But before I get there I think it will be useful to cover two supplementary issues which are less directly concerned with Wright's own views. First, what did R. A. Fisher think about all this? And second, apart from Wright's own arguments, what other theoretical or empirical reasons are there for believing in multiple fitness peaks?

Note 1: I do not claim to be very familiar with this literature, which is often highly technical and has little to do with Wright's own formulation. See for example the book by Gavrilets and its extensive bibliography.

Note 2: Wright himself sometimes uses a notation in which only one of the two alleles at a locus is indicated, so that for example if there are three loci with alleles Aa, Bb, and Cc, the genotype AabbCc could be represented by small letters as abbc, and AABbcc as bcc, and so on. The single genotype in which there are no small letters at all is represented by +. Some of Wright's examples are very difficult to follow if these conventions are not understood.
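The convention is mechanical enough to state as a tiny function. This is only a sketch of the rule as described in this note (two letters per locus, capitals for one allele and small letters for the other); the function is hypothetical, not taken from Wright, and does not handle multiple alleles or other notations he may use.

```python
def wright_notation(genotype):
    """Keep only the small letters of a genotype written with two letters per
    locus, and write '+' when there are none."""
    small = "".join(ch for ch in genotype if ch.islower())
    return small or "+"

print(wright_notation("AabbCc"))  # abbc
print(wright_notation("AABbcc"))  # bcc
print(wright_notation("AABBCC"))  # +
```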

Note 3: 1931 (letter to Fisher): frequency; 1932 (ESP p.163): genotype; 1935 (ESP p.226): frequency; 1937 (ESP p.248): frequency; 1939 (ESP pp.310, 318): both; 1940 (ESP p.347): genotype; 1940 (ESP p.370): frequency; 1941 (ESP p.472): frequency; 1948 (ESP p.535): genotype; 1948 (ESP p.543): frequency; 1949 (ESP p.552): frequency; 1960 (Tax): both; 1977 (ESP p.9): frequency; 1980 (ESP p.626): genotype.

Note 4: Terms like 'hole' and 'wrinkle' must be understood as the n-dimensional analogues of these terms in three dimensions. A 'hole' may itself be a figure with many dimensions.

Note 5: Even in three dimensions, where the surfaces are two-dimensional, differential geometry is a tough subject. For an introduction see Aleksandrov et al., eds., chapter 7.


Works by Sewall Wright

Evolution: Selected Papers (ESP), ed. William B. Provine, 1986
Evolution and the Genetics of Populations (EGP), 4 vols., 1968-1978
'Physiological genetics, ecology of populations, and natural selection', in Evolution After Darwin, vol. 1, ed. Sol Tax, 1960 (Tax)
'Surfaces of selective value revisited', American Naturalist, 131, 1988, 115-23.

Other works

A. Aleksandrov et al., eds., Mathematics: its content, methods, and meaning, vol. 2, 1963
R. A. Fisher, 'Average excess and average effect of a gene substitution', Annals of Eugenics, 11, 1941, 53-63.
Sergey Gavrilets, Fitness Landscapes and the Origin of Species, 2004
J. B. S. Haldane, The Causes of Evolution, 1932 (reprint ed. E. Leigh, 1990)
William B. Provine, Sewall Wright and Evolutionary Biology, 1986
G. G. Simpson, Tempo and Mode in Evolution, 1944 (reprint 1984)
G. G. Simpson, The Major Features of Evolution, 1953


Sunday, July 20, 2008

Fisher on Epistasis: another Addendum   posted by DavidB @ 7/20/2008 06:12:00 AM

In my recent note on R. A. Fisher and epistasis, I mentioned that Fisher's theory of the evolution of dominance relied on the epistatic effect of 'modifier' genes. On looking again at the chapter in The Genetical Theory of Natural Selection dealing with the evolution of dominance, I see that there is a more general statement of the principle that the effect of a gene depends in part on the genetic background against which it occurs:

The fashion of speaking of a given factor, or gene substitution, as causing a given somatic change, which was prevalent among the earlier geneticists, has largely given way to a realization that the change, although genetically determined, may be influenced or governed either by the environment in which the substitution is examined, or by the other elements in the genetic composition. Cases were fairly early noticed in which a factor, B, produced an effect when a second factor, A, was represented by its recessive gene, but not when the dominant gene was present. Factor A was then said to be epistatic to factor B, or more recently B would be said to be a specific modifier of A. .... These are evidently only particular examples of the more general fact that the visible effect of a gene substitution depends both on the gene substitution itself and on the genetic complex, or organism, in which this gene substitution is made.
- The Genetical Theory of Natural Selection, page 54, variorum edition, 1999, from the first edition text of 1930. There is a slight change of wording in the second (1958) edition.


Friday, July 18, 2008

R. A. Fisher and Epistasis   posted by DavidB @ 7/18/2008 05:21:00 AM

My next note on Sewall Wright will cover the exciting subject of the adaptive landscape. As every schoolboy knows, Wright considered epistatic gene interactions very important in determining the 'peaks' of the landscape. A sharp contrast is sometimes drawn between Wright and R. A. Fisher in this respect. For example:

Fisher believed that the process of genetical evolution occurred through selection that acts on the additive effects of genes in large populations. Although Fisher formally considered gene interactions, he was also dismissive of them, likening epistatic genetic variation to nonheritable (i.e. nontransmissible) environmental variations of phenotype. In contrast, Wright believed that nonadditive, or epistatic, effects were of primary importance, particularly in subdivided populations.
- from the editors' Preface to Epistasis and the Evolutionary Process

What is said here about Wright seems broadly correct, but what is said about Fisher is seriously misleading. Before continuing with my notes on Wright, I will therefore try to clarify Fisher's views on epistasis. [Note: due to formatting problems, italics and other refinements may be omitted.]

First, it is necessary to say something about the meaning of epistasis. The term 'epistasis' itself seems to have emerged around 1917. The first use cited in the OED is from the index to the 1917 volume of the journal Genetics. Around the same time Fisher, in writing his 1918 paper on the Correlation of Relatives, coined the term 'epistacy', but this never caught on. Both terms were derived from the adjective 'epistatic'. Like much of the terminology of genetics (including the word 'genetics' itself) this was coined by William Bateson, in 1907. Bateson used it with a relatively limited meaning to describe cases where a gene at one locus masked or suppressed the action of genes at another locus. For example, genes at one locus might affect the pigmentation of an animal's fur, but a gene at another locus might suppress the production of pigment entirely, causing albinism. In this case the trait of albinism (or the gene producing it) would be called epistatic (literally 'standing over'), while the traits that were masked would be called 'hypostatic' (literally 'standing under'). This limited usage of 'epistatic' is still sometimes found in medical genetics, but in evolutionary genetics a wider usage is more common. In the wider usage, epistasis is any kind of interaction between genes at different loci. Of course, many traits are affected by genes at more than one locus, but this does not necessarily imply interaction. The meaning of 'interaction' is that the genes at different loci do not act independently. For qualitative traits, the usual test of this is that the traits of the offspring do not show the expected Mendelian ratios (which is how epistasis in Bateson's sense was originally discovered). For quantitative traits, the usual criterion is that the value of the trait is not simply the sum of the values attributable to the individual genes concerned. If it is simply the sum, the genes are often said to have a purely 'additive' effect. 
If not, the trait either shows dominance (if the interaction is between genes at the same locus) or epistasis (if at different loci).

Assuming that epistasis can be identified (which in practice is often very difficult for small effects), it may be asked how the effects of epistatic interaction on a quantitative trait can be measured. One answer to this would be to decide that where interaction is involved, the entire effect of the interacting genes should be counted as epistatic. But this seems unreasonable if the same genes would still have some effect even if there were no interaction. An ideal solution might be to find cases in which the genes concerned are not involved in any epistatic relations, and measure their effect in these circumstances, then subtract this from the effect in the case of epistasis. But if epistasis is a widespread phenomenon, it would be difficult to find these non-epistatic cases, since most genes would show some effects of interaction. In any event, a different approach is generally taken.

The usual approach to measuring the effects of epistasis is roughly as follows. Each gene is assigned a value (the 'average effect' of the gene) based on the average value for the trait concerned among those members of the population who carry that gene, expressed as a deviation from the population mean. Each genotype (gene combination) is then assigned a value based simply on the sum of these average effects. This is called the 'breeding value', since it is the part of the genetic makeup of the individual which enables the traits of its offspring to be predicted for breeding purposes. These breeding values will have a certain variance, relative to the population mean, usually called the additive genetic variance. The actual observed values will have a greater variance than this, due to the effects of environment, dominance, epistasis, and various other complications. The portion of the observed variance attributable to epistasis is estimated after the effects of environment and dominance have been subtracted. Genes with epistatic effects are not excluded from the analysis: they may contribute to the additive variance and (in a more complicated way) to the dominance variance, as well as to the specific epistatic or 'genetic interaction' variance. All this is explained more fully, and no doubt more clearly, in Falconer. For a simple worked example of my own see Note 1.
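The procedure can also be illustrated with a minimal sketch. This is a separate illustration, not the example in Note 1, and much simpler than the general treatment in Falconer. To avoid dominance altogether it assumes haploid organisms, two loci with two alleles each (coded 0 and 1), linkage equilibrium, and no environmental variance. Average effects are computed as described above, breeding values are their sums, and whatever genetic variance the breeding values leave unexplained is the epistatic (interaction) variance. The function name `decompose` is hypothetical.

```python
from itertools import product

def decompose(values, p_a, p_b):
    """Return (total, additive, epistatic) genetic variance for a haploid
    two-locus trait, assuming linkage equilibrium.

    values[(i, j)] is the trait value with allele i at locus A and allele j at
    locus B; p_a and p_b are the frequencies of allele 1 at each locus.
    """
    freq_a = {0: 1 - p_a, 1: p_a}
    freq_b = {0: 1 - p_b, 1: p_b}
    genotypes = list(product((0, 1), repeat=2))
    weight = {g: freq_a[g[0]] * freq_b[g[1]] for g in genotypes}
    mean = sum(weight[g] * values[g] for g in genotypes)

    # Average effect: mean value among carriers of the allele, minus the population mean.
    alpha_a = {i: sum(freq_b[j] * values[(i, j)] for j in (0, 1)) - mean for i in (0, 1)}
    alpha_b = {j: sum(freq_a[i] * values[(i, j)] for i in (0, 1)) - mean for j in (0, 1)}

    # Breeding values have mean zero, so their variance is their mean square.
    breeding = {g: alpha_a[g[0]] + alpha_b[g[1]] for g in genotypes}
    v_total = sum(weight[g] * (values[g] - mean) ** 2 for g in genotypes)
    v_additive = sum(weight[g] * breeding[g] ** 2 for g in genotypes)
    return v_total, v_additive, v_total - v_additive

# Pure "interaction": the trait appears only when allele 1 is present at both loci.
only_double = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

v_total, v_add, v_epi = decompose(only_double, 0.5, 0.5)
print(v_total, v_add, v_epi)       # 0.1875 0.125 0.0625

v_total, v_add, v_epi = decompose(only_double, 0.9, 0.9)
print(round(v_add / v_total, 3))   # 0.947
```

Intuitively, all the variation in this trait is due to interaction, yet at equal allele frequencies two-thirds of the variance is additive, and at frequencies of 0.9 about 95% of it is. The average effects absorb much of the interaction.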

The standard terminology is unfortunate. It cannot be stressed too strongly that 'additive' variance is not the same as the variance due to genes with purely additive effects. The additive variance takes account of the average effects of all genes, including those that may show strong dominance or epistasis. These average effects depend in part on the gene frequencies present in the population in question, and assume that all possible genotypes occur in the proportions expected under a given system of mating (usually assumed to be random). Part of the average effect is therefore due to the effects of gene interactions. Conversely, the so-called 'epistatic variance' covers only a part - usually the minority - of the effects that might intuitively be ascribed to interaction. Enthusiasts for epistasis (as in the volume already cited) sometimes complain that the standard method of apportioning variance tends to understate the effects of epistasis, and makes it difficult to detect. For example, James Cheverud comments that 'most tests for epistasis rely on the epistatic variance alone and ignore its contribution to additive and dominance variance' (p.65) and Edmund Brodie says that 'under a wide range of allele frequencies and strengths of interaction, the majority of variance produced by gene interaction is actually additive' (p.10). It would be possible in principle to use alternative measures which assign more of the observed variance to epistasis. But the standard method does have the advantage that it is possible to estimate the additive variance from the observed correlation between parents and offspring, and conversely to estimate the value of offspring from that of parents. This is particularly important if we wish to predict the effects of natural or artificial selection. Whatever we call it, the 'additive' variance is a useful concept and is not going to go away.

It is also desirable to distinguish between epistasis for fitness and for other traits of the organism. Fitness itself (whether measured simply by number of offspring or otherwise) shows epistasis if the effects on fitness of genes at different loci are not purely additive. If fitness is measured in relation to some particular trait, the fitness may show epistasis even if the trait as such does not. (And presumably vice versa, though I cannot think of a plausible scenario for this.) For example, a trait such as body size might be influenced by several genes acting purely additively in their effects on body size, but epistatically in their effect on fitness. This will often be the case if fitness is highest for some intermediate value of the trait. The fitness effects of genes tending to raise (or lower) the value of the trait will then depend crucially on the other genes they happen to be combined with. In the simplest case, if there are two haploid loci, with alleles H and L (for High and Low) at one locus, and h and l at the other, the combinations Hl and hL, which give intermediate size, may be favoured by selection, while the combinations Hh and Ll, which give high and low size respectively, are selected against. In this case the fitness is epistatic even though the direct effect of the genes on the phenotype is additive.
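The two-locus case can be checked directly. In the sketch below the size contributions (1 for H and h, 0 for L and l) and the fitness function (a hypothetical penalty of 0.2 per unit of size away from an intermediate optimum of 1) are illustrative choices of my own. As in the text, "Hh" means H at the first locus and h at the second. The 'interaction' is the usual two-locus measure: the value for Hh minus Hl minus Lh plus Ll, which is zero when the loci act additively.

```python
# Hypothetical allele contributions to size: H and h raise size, L and l do not.
SIZE = {"H": 1, "L": 0, "h": 1, "l": 0}

def size(genotype):
    # genotype is a two-letter haploid genotype, e.g. "Hl".
    return SIZE[genotype[0]] + SIZE[genotype[1]]

def fitness(genotype, optimum=1, s=0.2):
    # Hypothetical stabilising selection around an intermediate size.
    return 1 - s * abs(size(genotype) - optimum)

def interaction(f):
    # Zero when the two loci contribute additively to f.
    return f("Hh") - f("Hl") - f("Lh") + f("Ll")

print(interaction(size))     # 0
print(interaction(fitness))  # approximately -0.4
```

Size shows no interaction at all, while fitness shows a substantial one: the same genes are additive for the trait but epistatic for fitness.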

After all these preliminaries, I turn to discuss what Fisher actually said about epistasis.

Correlation of relatives

As already mentioned, Fisher's great 1918 paper on the 'Correlation of Relatives' proposed the term 'epistacy' to allow for the interaction of genes at different loci, and devised the standard method for apportioning variance. Fisher introduces his definition of 'epistacy' as follows: 'There is in dominance a certain latency. We may say that the somatic [phenotypic] effects of identical genetic changes are not additive, and for this reason the genetic similarity of relations is partly obscured in the statistical aggregate [see Note 2]. A similar deviation from the addition of superimposed effects may occur between different Mendelian factors [genes at different loci]. We may use the term Epistacy to describe such deviation, which although potentially more complicated, has similar statistical effects to dominance. If the two sexes are considered as Mendelian alternatives, the fact that other Mendelian factors affect them to different extents may be regarded as an example of epistacy. The contributions of imperfectly additive genetic factors divide themselves for statistical purposes into two parts: an additive part which reflects the genetic nature without distortion, and gives rise to the correlations which one obtains, and a residue which acts in much the same way as an arbitrary error introduced into the measurements. ' (p.404) Note that Fisher says here quite explicitly that part of the contribution of 'imperfectly additive' genes is itself additive, or as we would say, falls within the additive variance. Fisher does not say a great deal more about 'epistacy' in this paper (but see p.408-9 for the mathematical treatment of epistatic variance), and one of the contributors to the volume cited earlier claims that in his 1918 paper Fisher 'dismissed gene interactions as being of only minor importance in the evolutionary process, analogous to nonheritable modifications of the phenotype' (p.125). This goes beyond anything Fisher says. 
What he does say is that 'Throughout this work it has been necessary not to introduce any avoidable complications, and for this reason the possibilities of Epistacy have only been touched upon...' (p.432). For Fisher's specific purpose in this paper, which was to explain the correlation between relatives on Mendelian principles, and not to discuss evolutionary theory in general, his brief treatment of 'epistacy' seems sufficient. Fisher finds that with his methods the existing data on the correlation of relatives (mainly the data of Karl Pearson on humans) can be explained satisfactorily by additive variance, dominance, and assortative mating, without much influence of other factors, which by implication include epistatic variance. Fisher is more explicit about this in his 1922 paper on the Dominance Ratio, where he says that 'special causes, such as epistacy, may produce departures [from the expected correlations], which may in general be expected to be very small from the general simplicity of the results'. But before interpreting this as a general pronouncement on the insignificant role of epistasis in evolution, we should note that (a) the additive variance includes much of the effect of 'epistatic' genes, and (b) the discussion was concerned with ordinary traits such as height, and not with fitness. As emphasised earlier, there may be epistasis for fitness even if the underlying traits are purely additive.

The evolution of dominance

One of Fisher's best-known, and most controversial, theories is that of the evolution of dominance. Noting that harmful mutations are usually (though not always) recessive in their effects, Fisher sought to explain this by the action of modifier genes at other loci, which would be gradually selected to minimise the harmful effects of common recurring mutations by making them recessive. The theory has not been generally accepted, and Wright in particular opposed it, mainly on the grounds that the selective advantage of modifier genes would be so weak that it would usually be overpowered by their other, more direct, effects. Regardless of whether Fisher was right or wrong on this issue, the point to note here is that his theory depends entirely on epistatic effects! In this respect, at least, Fisher was more enthusiastic about epistasis than Wright himself.


A whole chapter of the Genetical Theory of Natural Selection is concerned with Mimicry. In discussing the underlying genetics of mimicry, Fisher emphasises the role of modifier genes, including those that act as 'switches' for other genes. For example, discussing the 'hooded' gene in rats, he says 'The gene, then, may be taken to be uninfluenced by selection, but its external effect may be influenced, apparently to any extent, by means of the selection of modifying factors' (p.185). And in discussing another case he goes on to say 'The gradual evolution of such mimetic resemblances is just what we should expect if the modifying factors, which always seem to be available in abundance, were subjected to the selection of birds or other predators' (p.185). While modifiers might in principle be purely additive in effect, they are more likely to be epistatic. This is presumably always the case with 'switch' genes.


Chapter 6 of GTNS deals with a variety of issues concerning sex, sexual selection, sex-limited traits, and speciation. Some of these could well involve epistasis - indeed, 'sex-limited' traits (those which are only manifested in one sex) do so almost by definition, if sex is genetically determined. (As mentioned in Fisher's paper on 'Correlation of Relatives', quoted above, differences between the sexes can be regarded as a case of 'epistacy'.) However, I find only one definite reference in the chapter to epistatic effects. In his discussion of speciation, Fisher points out that the adaptiveness of genes will vary in the different parts of a species's range, and says that 'In addition to those genes which are selected differentially by the contrasted environments, we must moreover add those, the selective advantage or disadvantage of which is conditioned by the genotype in which they occur, and which will therefore possess differential survival value, owing not directly to the contrast in environments, but indirectly to the genotypic contrast which these environments induce' (p.141). A difference in the selective advantage of a gene according to the genotypic background implies epistatic fitness. What Fisher is describing here is actually what is often called a 'co-adapted gene complex', much beloved of Wrightians.

The Fundamental Theorem of Natural Selection

The Fundamental Theorem of Natural Selection states that 'The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time' (GTNS p.37). The FTNS is notoriously difficult to interpret, and I do not intend to say much about it here. It is however now generally accepted, following the interpretations by George Price and A. W. F. Edwards, that when Fisher refers to 'genetic variance' he means the 'additive' genetic variance. The additive variance takes account of the average effect of genes in all the various environmental circumstances and genetic combinations in which they are found, in the proportions to be expected under a given system of mating. (See especially p.31 of GTNS, where Fisher defines 'average excess' and 'average effect'.) It therefore incorporates the effects of dominance and epistasis to the extent that these contribute to the additive value of the genes. There is no reason at all to suppose that genes with epistatic effects are excluded from the FTNS. What is excluded is only that part of the total variance that is not covered by the contribution of those genes to additive variance. This can be justified on the grounds that the non-additive variance does not predictably change gene frequencies in the next generation and therefore has little effect on evolution. As Cheverud admits, 'the rate of evolution is determined by the additive genic [sic] variance alone' (p.65).
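In the very simplest case the link between additive variance and the increase in fitness can be verified exactly. The sketch assumes a haploid population, one locus with two alleles of absolute fitness w1 and w2, and discrete generations. None of this is Fisher's own setting, since his theorem is formulated with Malthusian parameters in continuous time, so it is only the most elementary analogue. Here all the genetic variance in fitness is additive, p(1 - p)(w1 - w2)^2, and one generation of selection raises mean fitness by exactly that variance divided by the mean fitness. The function names are hypothetical.

```python
def mean_fitness(p, w1, w2):
    return p * w1 + (1 - p) * w2

def next_freq(p, w1, w2):
    # Haploid selection: each allele's share grows in proportion to its fitness.
    return p * w1 / mean_fitness(p, w1, w2)

def ftns_check(p, w1, w2):
    """Return (observed change in mean fitness, additive variance / mean fitness)."""
    w_bar = mean_fitness(p, w1, w2)
    delta_w = mean_fitness(next_freq(p, w1, w2), w1, w2) - w_bar
    v_additive = p * (1 - p) * (w1 - w2) ** 2
    return delta_w, v_additive / w_bar

for p in (0.1, 0.5, 0.9):
    observed, predicted = ftns_check(p, w1=1.2, w2=1.0)
    print(p, round(observed, 10), round(predicted, 10))
```

The two quantities agree, up to floating-point rounding, for any starting frequency.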

Selection at two loci

Before 1930 neither Fisher nor Wright had treated selection at more than one locus. As so often, the pioneer of the subject was J. B. S. Haldane, in 1926. In 1930 Fisher did however give the subject a short section in Chapter 5 of GTNS, under the heading 'Equilibrium involving two factors'. (This chapter is one of several that appear to be invisible to some readers.) The interesting situation, as Fisher recognises, is where two different combinations of alleles (e.g. AB and ab) are both favoured by selection, while the same genes are disadvantageous in other possible combinations (e.g. Ab and aB). Fitness in this case is therefore clearly epistatic. In his chapter summary Fisher says that stable equilibria may be established, but he is rather vague about the conditions for stability. But his main point is that there will be selection in favour of closer linkage between favourable gene combinations on the same chromosomes, and it is therefore a puzzle why recombination is as frequent as it is. I think this remains a problem. In any event, it is a case where Fisher clearly recognised the role of epistasis.
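The cost that recombination imposes on favourable combinations can be seen within a single generation. The sketch assumes haploid organisms, random union of gametes, and illustrative fitnesses and haplotype frequencies of my own choosing (not Fisher's). It applies the standard recombination update, in which the haplotypes AB and ab each lose r*D and Ab and aB each gain it, where D is the linkage disequilibrium and r the recombination fraction. It is only meant to show why, other things being equal, tighter linkage would be favoured; it does not model the evolution of recombination rates themselves.

```python
def recombine(x, r):
    """Haplotype frequencies after one round of recombination at rate r."""
    d = x["AB"] * x["ab"] - x["Ab"] * x["aB"]   # linkage disequilibrium
    return {"AB": x["AB"] - r * d, "ab": x["ab"] - r * d,
            "Ab": x["Ab"] + r * d, "aB": x["aB"] + r * d}

def mean_fitness(x, w):
    return sum(x[h] * w[h] for h in x)

# Illustrative values: AB and ab favoured, and already in excess (D = 0.2).
W = {"AB": 1.1, "ab": 1.1, "Ab": 1.0, "aB": 1.0}
X = {"AB": 0.45, "ab": 0.45, "Ab": 0.05, "aB": 0.05}

for r in (0.0, 0.01, 0.1, 0.5):
    print(r, round(mean_fitness(recombine(X, r), W), 5))
```

For these numbers the mean fitness of the recombined offspring is 1.09 - 0.04r, so every increase in recombination breaks up more of the favoured combinations and lowers fitness.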

Selection of metrical characters

One of the most intriguing, but difficult, sections of GTNS is the one (also in the 'invisible' Chapter 5) on 'Simple metrical characters'. (I sometimes wonder if Fisher's use of the word 'simple' was a sly joke.) The case of interest is where a quantitative character, such as the size of a tooth, is regulated by genes at more than one locus, and subject to stabilising selection in favour of an intermediate size. Egbert Leigh has described this (in his 'Afterword' to the 1990 reprint of Haldane's 'The Causes of Evolution') as 'a topic still replete with mysteries and surprises'. Fisher's account is even more tangled than most, because he attempts to explain simultaneously selection of the metrical trait itself and selection for dominance of the genes controlling it. I cannot pretend to understand everything he says on the subject, but what is clear for the present purpose is that fitness in this case is epistatic, and that there may be more than one outcome of selection, depending on the initial frequencies of the genes concerned: 'the conditions of equilibrium are always unstable. Whichever gene is at less than its equilibrium frequency will tend to be further diminished by selection' (p.121). This is precisely the situation which Wright often emphasised as leading to alternative 'selective peaks'. But unlike Wright, Fisher did not believe a species was likely to get 'stuck' permanently on a selective peak (not that Fisher had much time for the adaptive landscape anyway). Fisher believed that following any change in the optimum phenotypic value due to environmental change there would be sufficient genetic variation (in a large population) for selection to shift organisms quickly towards the new optimum. His confidence in this was based mainly on the results of artificial selection, as he referred to 'the extreme rapidity with which such measurements are modified when selection is directed to this end' (p.119). 
The effects of such changes on gene frequencies might be lasting, even if the initiating circumstances were temporary. In Fisher's analogy, which may be more illuminating to physicists than to me, 'the system resembles one in which a tensile force is capable of producing both elastic and permanent strain, and in which the permanent deformations always tend to relieve the elastic forces which are set up' (p. 125).
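Fisher's statement that the rarer gene is further diminished can be checked numerically. The sketch below is my own illustration rather than Fisher's model, and it rests on several assumptions:

- The organisms are haploid.
- The trait is the number of '+' alleles at four loci.
- Selection is quadratic and stabilising, w(z) = 1 - k(z - theta)^2, with the optimum theta set at the starting mean.
- Linkage equilibrium is imposed every generation, a simplification standing in for loose linkage.

Starting frequencies of 0.4, 0.6, 0.45 and 0.55 put the mean exactly at the optimum.

```python
from itertools import product


def select_once(p, theta, k):
    """One generation of quadratic stabilising selection, w(z) = 1 - k*(z - theta)**2,
    on a haploid trait z = number of '+' alleles. Genotype frequencies are built on
    the assumption of linkage equilibrium. Returns new '+' frequencies per locus."""
    n = len(p)
    wbar = 0.0
    weighted_carriers = [0.0] * n
    for genotype in product((0, 1), repeat=n):
        freq = 1.0
        for g, pi in zip(genotype, p):
            freq *= pi if g else 1.0 - pi
        w = 1.0 - k * (sum(genotype) - theta) ** 2
        wbar += freq * w
        for i, g in enumerate(genotype):
            if g:
                weighted_carriers[i] += freq * w
    return [c / wbar for c in weighted_carriers]


def evolve(p, theta, k=0.1, generations=3000):
    for _ in range(generations):
        p = select_once(p, theta, k)
    return p


if __name__ == "__main__":
    p0 = [0.4, 0.6, 0.45, 0.55]           # mean trait value 2.0, at the optimum
    print("after one generation:", [round(x, 4) for x in select_once(p0, 2.0, 0.1)])
    print("after 3000 generations:", [round(x, 4) for x in evolve(p0, 2.0)])
```

At every locus the rarer allele declines. The population ends up fixed for a combination that sits on the optimum. Starting from the mirror-image frequencies would fix the opposite alleles, so which outcome is reached depends on where selection starts.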

This section of GTNS raises a rather intriguing historical possibility. As Provine has noted in his biography of Wright (Provine p.285-6), there was an unexplained change in Wright's account of the 'shifting balance' theory between his exposition in 'Evolution in Mendelian Populations' (1931), and his next major account in 1932. In 1931 he had asserted that temporary changes in the environment would only have temporary effects on the gene pool, being essentially reversible. Hence his emphasis on genetic drift in small subpopulations, as the only possible means of shifting from one peak to another. In 1932, on the other hand, he accepted that environmental changes could also shift a population from one stable peak to another, so that their effects might be lasting even after the change in environment had reversed. Unfortunately Wright did not explain the reasons for his change of mind, nor did he draw attention to the change, which is really very important, since it greatly weakens Wright's argument for the importance of genetic drift in small local subpopulations. Provine speculates, plausibly enough, that Wright's correspondence with Fisher, his reading of GTNS, and Fisher's own published review of 'Evolution in Mendelian Populations', had something to do with the change. My own suggestion, to build on this, is that Fisher's discussion of metrical characters in Chapter 5 of GTNS was a particular influence. But I have no direct evidence of this, so it will probably remain a mere speculation.


The main purpose of this note has been to identify and document what R. A. Fisher himself, as opposed to the straw man 'Fisher', actually said and believed about epistasis. Readers will be able to draw their own conclusions, but I will briefly indicate my own.

a) Fisher did not deny the existence of epistasis, in the broad sense, and in some specific cases - including the evolution of dominance, selection at two loci, and quantitative (metrical) traits under stabilising selection - he gave it an important role.

b) Fisher agreed with Wright (and Haldane) that in some circumstances, including stabilising selection, there could be more than one outcome of selection in terms of the resulting gene frequencies. Unlike Wright (in 1931), but like Wright (in 1932), he believed that temporary environmental change could shift a population durably from one equilibrium set of gene frequencies to another. Fisher's treatment of the problem in GTNS may have influenced Wright's unexplained volte-face on this important issue.

c) Fisher did not believe populations were likely to get stuck on a local peak in the selective landscape, but this was not because he did not believe in epistatic effects, but because he did not believe in the validity of the selective landscape concept at all. I will probably say more about Fisher's thinking on this in another post.

d) Fisher's general concept of evolutionary change, as expressed in the Fundamental Theorem of Natural Selection, does not exclude epistatic effects. The FTNS takes account of epistasis (and dominance) precisely to the extent that they do affect the rate of evolutionary change. The FTNS is neutral with respect to the importance of epistasis: whether it is important or unimportant cannot be inferred from the theorem, which takes account of additive variance in fitness whatever its source. Unfortunately much confusion has arisen about the meaning of 'additive' and 'epistatic' variance. If it is not understood that 'additive' variance includes much of the effect of epistatic genes, while 'epistatic' variance excludes much of that effect, the scope of the FTNS will be seriously misconstrued. It would be better to call additive variance something like 'heritable variance', and to label the non-additive effects of dominance and epistasis in a way that makes clear they are only part of the total effect of gene interactions.

e) Unlike Wright, Fisher did not, at least in his published works, put any emphasis on epistasis as a major factor in evolution. It is necessary to read GTNS quite carefully (or at least to look at all the chapters!) to find the references I have gathered together here. It is an empirical matter whether epistasis plays the central role that Wright gave it. Or it might have an important role that neither Wright nor Fisher had thought of, as suggested in Kondrashov's theory of sex.

I have not dealt here with another aspect of Fisher's views, namely his rejection of the importance in evolution of large single mutations. I have no doubt that Fisher believed that evolution occurred mainly through the selection of a large number of genes with individually small effects. I have not discussed this because (a) it was not a point of disagreement between Fisher and Wright, and (b) it does not seem relevant to the issue of epistasis. As far as I can see, large mutations are no more or less likely to have epistatic effects than small ones.


After writing the above, I came across a further reference to epistasis in Fisher's correspondence. Writing to Leonard Darwin in 1928, Fisher said 'I am inclining to the idea that the main work of evolution lies in the discovery by trial of perhaps rare combinations of its existing variants, which work better than the commoner combinations. A slight increase in the number of individuals bearing such a favourable combination will then set up selection in favour of all the genes in the combination, with marked evolutionary results. Many of these genes would have been previously rare mutant types (not necessarily rare mutations) unfavourable to survival. I think of the species not as dragged along laboriously by selection like a barge in treacle, but as responding extremely sensitively whenever a perceptible selective difference is established. All simple characters, like body size, must be always very near the optimum, so much so that the average body sizes of two alternative genes must be balanced on either side of the optimum, selection always tending to eliminate the rarer because it is further from the optimum...' (Correspondence p.88). In his Introduction to the correspondence, J. H. Bennett draws attention to this letter, and remarks that 'It is interesting, and perhaps needs emphasizing, that both Fisher and Wright considered systems of interacting genes to be of critical importance in evolution. A fundamental difference in their views of the evolutionary process concerned the means by which interaction systems could be exploited' (p.47). While I agree with Bennett that Fisher took some account of 'interaction systems', in other words epistasis in the broad sense, this letter of 1928 seems a good deal more positive on the subject than anything I have noticed in his published works. I take this opportunity to say that Bennett's Introduction is one of the most useful things yet written on Fisher's work and ideas, and deserves repeated reading.

Note 1

Consider the simplest case of a haploid organism with a quantitative trait determined by genes at two loci. I assume complete genetic determination. Let the alleles in the population be A and a at one locus, and B and b at the other, each with a frequency of 50% in the population. Under random mating the four genotypes AB, Ab, aB and ab will therefore all have the frequency 25%. (In a diploid there would be nine genotypes to consider, and the possible complication of dominance, which is why I have chosen the haploid case.)

Let us suppose that the measurements of the trait for the four genotypes are as follows, where c and d are any numerical values:

AB........c + d
Ab........c
aB........c
ab........c

I have chosen these values to dramatise the situation. Intuitively, one would say that all of the variation in the trait was due to the epistatic interaction of A and B, since all other genotypes than AB have the identical value c. So let us see how the variance comes out under the standard method.

The mean value of the trait in the population is evidently .75c + .25(c + d) = c + .25d. The mean values for each gene considered separately, measured by the average value of the individuals who possess that gene, are:

A........ .5(c + d) + .5c = c + .5d
a......... c
B........ .5(c + d) + .5c = c + .5d
b........ c

Expressed as deviations from the population mean, c + .25d, these values come out as:

A........ + .25d
a......... - .25d
B........ + .25d
b........ - .25d

These are known as the 'average effects' of the genes in question.

The so-called 'breeding value' of a genotype is simply the sum of the average effects of its component genes, so for the four genotypes we have the breeding values:

AB.......... + .5d
Ab.......... 0
aB.......... 0
ab.......... - .5d

It may be noted that the combination ab has a substantial (negative) breeding value, even though there is, intuitively, no interaction between a and b. This reflects the fact that the interaction of A and B pulls up the population mean, and therefore affects the deviation values of other alleles and genotypes. The combination ab falls as far below the resulting mean as the combination AB rises above it. The symmetry is of course a consequence of the symmetry of the chosen assumptions about gene frequencies, etc.

The breeding values are already deviations from the population mean, so for the variance of breeding values (the so-called additive genetic variance) we have:

.25(.5d)^2 + .25(0)^2 + .25(0)^2 + .25(.5d)^2 = .125d^2.

It is already apparent that although the variance is intuitively entirely due to epistasis, the 'additive' variance is not zero. For comparison, we can measure the total variance of the values of the genotypes. The deviation values are as follows:

AB.......... c + d - (c + .25d) = .75d
Ab, aB, and ab.......... c - (c + .25d) = - .25d

Taking account of the proportions of the genotypes in the population we therefore have the variance of genotypic values as follows:

.25(.75d)^2 + .75(- .25d)^2 = .1875d^2

Subtracting the 'additive' variance from the total genotypic variance we find only .0625d^2 left for the 'epistatic' variance. So even where we have rigged the example to give a strong influence to epistasis, 2/3 of the resulting variance is 'additive', and only 1/3 'epistatic'!
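The arithmetic of this note is easy to mechanise, and that makes it easy to vary the gene frequencies. The sketch below uses function and variable names of my own. It follows the steps above exactly:

- average effects are the deviations of carrier means from the population mean;
- breeding values are sums of average effects;
- variances are weighted by genotype frequency.

As in the example, it assumes the two loci combine at random.

```python
def variance_components(pA, pB, c=0.0, d=1.0):
    """Additive, total and 'epistatic' genotypic variance for the haploid
    two-locus example: AB has value c + d, Ab, aB and ab have value c.
    The loci are assumed to combine at random (linkage equilibrium)."""
    value = {"AB": c + d, "Ab": c, "aB": c, "ab": c}
    allele_freq = {"A": pA, "a": 1 - pA, "B": pB, "b": 1 - pB}
    freq = {g: allele_freq[g[0]] * allele_freq[g[1]] for g in value}

    mean = sum(freq[g] * value[g] for g in value)

    def average_effect(allele, locus):
        """Mean value of carriers of the allele, as a deviation from the population mean."""
        carriers = [g for g in value if g[locus] == allele]
        carrier_freq = sum(freq[g] for g in carriers)
        return sum(freq[g] * value[g] for g in carriers) / carrier_freq - mean

    effect = {a: average_effect(a, 0) for a in "Aa"}
    effect.update({b: average_effect(b, 1) for b in "Bb"})

    # Breeding value: sum of the average effects of the genotype's two genes.
    breeding = {g: effect[g[0]] + effect[g[1]] for g in value}

    # Breeding values already have mean zero, so no further centring is needed.
    v_additive = sum(freq[g] * breeding[g] ** 2 for g in value)
    v_total = sum(freq[g] * (value[g] - mean) ** 2 for g in value)
    return v_additive, v_total, v_total - v_additive


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.1):
        va, vt, ve = variance_components(p, p)
        print(f"pA = pB = {p}: additive {va:.4f}, total {vt:.4f}, "
              f"epistatic {ve:.4f}, additive share {va / vt:.2f}")
```

With pA = pB = 0.5 and d = 1, it reproduces the figures above: .125, .1875 and .0625. With both frequencies at 0.9 the 'additive' share rises to roughly 95 per cent. With both at 0.1 it falls to roughly 18 per cent. So the split between 'additive' and 'epistatic' variance depends heavily on gene frequencies, not only on how the genes act.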

Note 2: I think that by 'genetic changes' in this sentence Fisher means not just mutations, but any gene substitution, such as may occur through the normal processes of sexual reproduction. So, for example, if at a single locus the combination aa is replaced by the combination Aa, there will be a certain measurable effect of the change. If the effect of substituting two As is twice the effect of substituting just one A, the effect is additive. Otherwise the locus shows some degree of dominance.


D. S. Falconer: Introduction to Quantitative Genetics, 3rd edn., 1989

R. A. Fisher: The Genetical Theory of Natural Selection, 1930. I have given page references to the revised Dover edition of 1958, but the quoted passages are all unchanged from the first edition. For scholarly purposes the best edition is now the Variorum edition of 1999, edited by Henry Bennett.

Fisher's papers are cited from the online copies available from the archives at Adelaide (see link on sidebar)

Natural Selection, Heredity and Eugenics: Including selected correspondence of R. A. Fisher with Leonard Darwin and others, edited by J. H. Bennett (1983). Much of the correspondence is also available online from the archives at Adelaide.

J. B. Wolf, E. D. Brodie, and M. J. Wade (eds.): Epistasis and the Evolutionary Process, 2000

William B. Provine: Sewall Wright and Evolutionary Biology, 1986. (Paperback edn. 1989)


Thursday, July 03, 2008

Notes on Sewall Wright: Migration   posted by DavidB @ 7/03/2008 10:47:00 AM

Continuing my series of notes on Sewall Wright's population genetics, I come to the subject of migration. This is important in understanding the differences between Wright and R. A. Fisher on the role of genetic drift in evolution. Fisher and Wright agreed that genetic drift would be too weak a process to be of evolutionary significance in large populations (above, say, 10,000 in effective size). [Note 1] Equally, they agreed that it would be important in small populations, provided these remained sufficiently isolated over sufficiently long periods of time. Their disagreement was over the probability that the necessary degree of isolation would occur. This depends largely on the rate of migration between populations.

Fisher's views on the subject can be pieced together from scattered remarks, as I attempted here. It seems that from an early stage - at least from his 1921 review of the 'Hagedoorn Effect' - Fisher regarded small isolated populations as unimportant in evolution. If they stayed isolated for long, they would go extinct from occasional adverse conditions (epidemic disease, drought, etc). If they did not stay isolated, the flow of migrants from outside (whether in a steady small trickle, or occasional larger floods) would be sufficient to prevent their gene frequencies from drifting far from those of the general population of their species. But so far as I know, Fisher never made any formal quantitative estimate of the amount of migration necessary to offset genetic drift.

Sewall Wright, on the other hand, did make such estimates, and developed them in published works from 1931 onwards. It is known that a first draft of Wright's major 1931 paper, 'Evolution in Mendelian Populations', was written as early as 1925. In this he already took the view that genetic drift in small semi-isolated populations was an important evolutionary factor. This might suggest that by that time he had already considered the role of migration in depth. The draft of 1925 has not survived (Provine p. 237), but it seems that in fact it did not yet contain a detailed treatment of migration. The evidence for this is from Wright's correspondence with Fisher in 1929. Wright told Fisher that 'since I wrote [in August 1929, sending a copy of his draft] I have been trying to get a clearer idea of the effect of diffusion [i.e. migration] and I see, at least, that isolation in districts must be much more nearly complete than I realized at first, to permit random fixation of strains' [Provine p.256].

This conclusion is presented more formally in 'Evolution in Mendelian Populations' (at ESP pp.127-9). Here Wright develops an equation for the distribution of gene frequencies which incorporates a term for m, the rate of migration into a small semi-isolated population from a larger population with different gene frequencies. The exact meaning of this equation is difficult to interpret [see Note 2], but Wright's own conclusion is that 'Where m [the migration rate] is less than 1/2N [with N being the effective size of the receiving population] there is a tendency toward chance fixation of one or the other allelomorph [i.e. one of the alleles at a locus where there are two alleles in the population]. Greater migration prevents such fixation. How little interchange appears necessary to hold a large population together may be seen from the consideration that m = 1/2N means an interchange of only one individual every other generation, regardless of the size of the subgroup'.
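Wright's result is usually restated as a beta distribution. The stationary density of q is proportional to q^(4Nm*qm - 1) * (1 - q)^(4Nm*(1 - qm) - 1). The sketch below takes that standard form as given; I have not checked it against every step of Wright's 1931 derivation. It also assumes qm = 1/2, the case in which the threshold is exactly m = 1/2N.

When both exponents are negative the distribution is U-shaped: subpopulations tend towards chance fixation. Otherwise it is humped around qm. The function names are my own.

```python
import math


def island_density(q, N, m, qm=0.5):
    """Beta density of gene frequency q in a subpopulation of effective size N
    receiving a fraction m of migrants per generation with allele frequency qm."""
    a = 4 * N * m * qm
    b = 4 * N * m * (1 - qm)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(q) + (b - 1) * math.log(1 - q))


def u_shaped(N, m, qm=0.5):
    """True when both exponents are negative, so density piles up near 0 and 1."""
    return 4 * N * m * qm < 1 and 4 * N * m * (1 - qm) < 1


if __name__ == "__main__":
    for N in (10, 100, 1000):
        for factor in (0.8, 1.2):          # just below / just above m = 1/2N
            m = factor / (2 * N)
            print(N, round(m, 5), "U-shaped" if u_shaped(N, m) else "humped",
                  round(island_density(0.05, N, m), 3),
                  round(island_density(0.5, N, m), 3))
```

For every N the switch from U-shaped to humped falls at the same number of migrants, Nm = 1/2.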

This conclusion has been widely restated in the population genetics literature. Unfortunately I do not know of any clear and mathematically elementary proof. (John Maynard Smith [p. 158-60] presents a proof using only basic algebra, but it combines the treatment of migration and mutation, and involves various simplifying assumptions and approximations. There are also some confusing misprints or slips of the pen.)

It may be surprising that the rate of migration sufficient to prevent populations drifting apart can be stated as a constant number of migrants, regardless of the size of the population. D. S. Falconer comments that 'This conclusion, which may at first seem paradoxical, may be understood by noting that a smaller population needs a higher rate of immigration than a larger one to be held at the same state of dispersion' [Falconer p.79]. We may put this point slightly more formally by noting that the effect of migration in offsetting drift may be expected to be proportional to the rate of migration. The rate can be expressed as n/N, where n is the number of migrants and N is the effective size of the receiving population. Since the effect of genetic drift has previously been shown to be proportional to 1/2N, we can therefore expect the migration rate required to neutralise drift to be n/N = k/2N, where k is some constant factor of proportionality. It follows that in equilibrium we will have n = k/2, where k is a constant. Of course, this does not tell us the size of k, but it is plausible that it is of the order of 1, and Wright and others show by more rigorous methods that it is.
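Falconer's point can also be checked with a recursion commonly given in textbooks. It is a standard approximation, not Wright's own derivation. Let F measure how far a subpopulation of effective size N has drifted from a large migrant pool of fixed composition. Suppose a fraction m of its genes is replaced by migrants each generation. Then:

F' = (1 - m)^2 [1/2N + (1 - 1/2N)F]

Its equilibrium is close to 1/(1 + 4Nm) when m is small. Holding the number of migrants Nm at 1/2 (one migrant every other generation) gives an F of about 1/3 whatever the value of N. Holding the migration rate fixed instead gives very different values. The function name is my own.

```python
def equilibrium_F(N, m):
    """Equilibrium of F' = (1 - m)**2 * (1/(2N) + (1 - 1/(2N)) * F)."""
    keep = (1 - m) ** 2            # chance that neither of two genes drawn is a migrant's
    return (keep / (2 * N)) / (1 - keep * (1 - 1 / (2 * N)))


if __name__ == "__main__":
    for N in (50, 500, 5000):
        m_fixed_count = 0.5 / N        # half a migrant per generation
        m_fixed_rate = 0.005           # same migration rate for every N
        print(N, round(equilibrium_F(N, m_fixed_count), 3),
              round(equilibrium_F(N, m_fixed_rate), 3))
```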

The conclusion that only around 1 migrant every other generation is sufficient to prevent sub-populations drifting apart might seem fatal to Wright's belief in the importance of genetic drift. As shown in his correspondence with Fisher, Wright does initially seem to have had his confidence shaken. But Wright (like Fisher) was not one to give up a cherished theory without a struggle. Immediately following the quoted passage from 'Evolution in Mendelian Populations', Wright continues: 'However, this estimate must be qualified by the consideration that the effective N [the population size] of the formula is in general much smaller than the actual size of the population or even than the breeding stock, and by the further consideration that qm ['m' is a subscript, indicating the frequency of the allele among the migrants] of the formula refers to the gene frequency of actual migrants and that a further factor must be included if qm is to refer to the species as a whole. Taking both of these into account, it would appear that an interchange of the order of thousands of individuals per generation between neighboring subgroups of a widely distributed species might well be insufficient to prevent a considerable random drifting apart in their genetic compositions' (ESP p.128).

Wright's first point, that effective N may be lower than the apparent size of the population, is either confused or confusing, since Wright has just proved that N, the effective size of the receiving population, is irrelevant to the number of immigrants required to neutralise drift. Perhaps Wright is thinking of the effective number of migrants, rather than of the receiving population, in which case the number who succeed in contributing to the gene pool may indeed be less than the total number. The second point is valid, but not well explained. Wright's formula contains a term mqm (with the second m a subscript), where qm is the frequency of the relevant allele among the migrants. But the underlying assumption is that this is the same as in the species generally. Wright's point (made more explicitly in later papers) is that the allele frequencies in neighbouring populations are likely to be more similar than in the species generally, so that mqm will actually be less than is assumed in the derivation of the result. To adjust for this we might stipulate that the 'effective' number of migrants is smaller than the actual number, even of those who successfully breed, just as the 'effective' population size may be smaller than the actual size. This approach is clearer in later papers, for example at ESP p.236: 'Cross breeding is, however, most likely to be with neighboring populations which differ but little in value of q. In this case the coefficient m is only a small fraction of the actual amount of change [i.e. the actual observed rate of migration]'. With this adjustment of mqm, the number of actual migrants required to neutralise drift might indeed be many more than 1 per generation.

This is valid as far as it goes, but it depends on the assumption that allele frequencies in neighbouring populations are likely to be relatively similar. This is perfectly plausible, but only because we tacitly assume that migration between neighbouring subpopulations is, or recently has been, sufficient to offset genetic drift. Wright therefore seems perilously close to sawing off the branch he is sitting on. Certainly, if the allele frequencies do drift 'considerably' apart (to use Wright's word in 'Evolution in Mendelian Populations'), the assumption of similar frequencies ceases to apply, and we can no longer rely on it. A further consideration is that on an evolutionary time scale (i.e. hundreds or thousands of generations) occasional larger influxes of migrants are almost bound to occur, and undo all the slow work of genetic drift. Even if an allele is lost or fixed in a subpopulation, it can be reintroduced at any time by migration from outside, so long as it persists somewhere in the species.

Wright continued to study the effect of migration after 1931, with his fullest treatment in the paper 'Isolation by Distance' in 1943 (ESP pp.401-425). Here Wright examines three different models for migration: the Island Model, in which migrants are derived at random from a number of semi-isolated subpopulations of the species, and therefore on average have the gene frequencies of the species as a whole; isolation by distance in a two-dimensional continuum, where the probability of cross-breeding declines with the distance between the birthplaces of the breeding individuals; and isolation by distance in a linear range such as a river-bank. Wright's conclusions from the Island Model are not very different from those in his 1931 paper based on the cruder assumption of random migration throughout the species. The conclusions from two-dimensional isolation by distance are only slightly more favourable. As he summarises it in 1943: 'It is apparent that there is a great deal of local differentiation if the random breeding unit is as small as 10, even within a territory the diameter of which is only ten times that of the unit. If the unit has an effective size of 100, differentiation becomes important only at much greater relative distances. If the effective size is 1000, there is only slight differentiation at enormous distances. If it is as large as 10,000 the situation is substantially the same as if there were panmixia [random mating] throughout any conceivable range' (ESP p.411). Only for the more special linear-range model is there substantial differentiation due to drift in populations of moderate size.

Wright's theoretical conclusions might seem to imply that genetic drift in subpopulations would seldom be a major factor in evolution. It seems to require rather special circumstances to be effective: either very small populations, populations sparsely scattered with long distances between them, populations with a narrow linear range, or organisms that are very immobile at all stages of their life cycle. Wright nevertheless continued to insist throughout his career that drift in subpopulations was an important, if not essential, feature of evolution. The uncharitable view of this would be that Wright was simply stubborn. Having taken up his position on the importance of this factor, before having considered in depth the effects of migration, he was determined to defend it, come what may. (There would be a parallel here with the equally stubborn position of Fisher on the evolution of dominance.) A more charitable view would be that Wright was trying to find an explanation of something that was generally accepted by biologists when he began his career: namely, that the observable differences between subspecies, and even between species, are usually selectively neutral. Wright himself stresses this point in 'Evolution in Mendelian Populations': 'It appears, however, that the actual differences among natural geographical races and subspecies are to a large extent of the nonadaptive sort expected from random drifting apart. An interesting example, apparently nonadaptive, is the racial distribution of the 3 allelomorphs which determine human blood groups' (ESP p.128).

In the years and decades following 'Evolution in Mendelian Populations', the opinion of biologists turned away from the consensus view of 1931 (really no more than a superficial assumption) that subspecific differences are selectively neutral. Much of the relevant research was carried out by the students and collaborators of Wright and Fisher themselves, notably E. B. Ford in England and Theodosius Dobzhansky in the USA. The general outcome was that even apparently minor subspecific differences often had some selective value. Human blood groups, for example, were found to be correlated with resistance to different diseases, though it remains unclear whether all such differences have a selective basis.

The importance of genetic drift in subpopulations is of course an empirical matter. It is quite possible that some species are 'Wrightian' and some are 'Fisherian' in this respect. The observed amount of genetic diversity between subpopulations is usually quite modest [Maynard Smith pp.160-161], suggesting that migration between them is usually sufficient to prevent them drifting far apart. There are theoretical reasons for expecting that 'Fisherian' species would be in a majority. Most species have adaptations for dispersal at some stage of their life. Plants, for example, have adaptations for spreading their seeds. Among animals, the juveniles of one or both sexes often disperse from their region of birth to find mates or territories. With a few exceptions, organisms that just stick to one spot are doomed to extinction within a fairly short period of evolutionary time, since the conditions of life seldom stay fixed for many generations. Even in species with relatively stable environments, there are theoretical reasons for expecting that a mixture of mobility and immobility would be adaptive (W. D. Hamilton, Narrow Roads of Gene Land, vol. 1, chapter 11). But it remains possible that 'Wrightian' processes are important in some cases. A particularly interesting case is the modern human species itself. After the dispersal of modern humans out of Africa, it is likely that human populations for most of the last 100,000 years were small and scattered, with little migration between different continental groups. These are good conditions for Wrightian genetic drift. Whether the observed differences in gene frequencies between continental populations are due to drift or selection remains an active area of research [see Jobling et al., passim].

Note 1. Neither Wright nor Fisher was very interested in genetic drift among genetic variants that are selectively entirely neutral, as expounded in Kimura's theory of neutral evolution at the molecular level. Fisher died before Kimura published his theory. Wright lived long enough to take account of it, and found it plausible enough with regard to neutral mutations of nucleotides, but considered it of no evolutionary interest (see Provine p.469-77).

Note 2. As I understand it, Wright's conception of the distribution of gene frequencies is broadly as follows. We assume that two populations have evolved separately, and are fixed for different alleles at one or more loci. (For simplicity it is assumed that there are no more than two alleles at each locus.) The two populations are then combined and interbreed freely. Assuming that the populations are of equal size, the frequencies of the alleles at each locus in the combined population will initially all be 50%. The combined population then evolves in isolation. As a result of random genetic drift, the allele frequencies will tend to drift away from 50%. Over a large number of loci (or over a large number of hypothetical populations) we can ask, what is the probability that an allele will have any particular frequency after any specified number of generations? The total of such probabilities over all possible allele frequencies, from 0 to 1, will of course add up to 1, and will have an approximately smooth (continuous) distribution, which (on the given assumptions) will be symmetrical around a frequency of 50%. Initially the probability distribution will be clumped closely around 50%, but as time goes on it will spread out. Eventually, some alleles will begin to be lost or fixed, with a probability of 1/2N per generation. Wright now assumes that beyond a certain number of generations the shape of the probability distribution of frequencies for the remaining alleles will be approximately constant, apart from the continuing occasional loss and fixation of alleles, which will affect all the remaining alleles equally. The problem is to find this constant distribution under various assumptions about mutation, migration, and selection. Much of Wright's work in the 1930s was devoted to this problem. I cannot claim to have followed Wright's derivations in detail, as his explanations are obscure even by his usual standards.
The problem is not just that the mathematics is advanced (though it does involve more calculus than in most of Wright's work) but that he makes various simplifying assumptions and approximations which are not self-evidently justified. I can only take it on trust that the conclusions are correct, and that if they were not (as Dobzhansky put it) 'some mathematician would have found it out'.
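For the pure-drift case, the settled state described in this note can be checked exactly. The method is to iterate the binomial sampling of 2N genes for a small population, with no mutation, migration or selection. This is my own sketch of that case only, not a reconstruction of Wright's methods. N = 10 and the starting frequency of 50% are illustrative.

After enough generations, the proportion of loci still unfixed shrinks by a factor of 1 - 1/2N each generation. The shape of the unfixed classes then hardly changes from one generation to the next. In the continuous limit that shape is flat, as Wright found for pure drift. A population as small as 10 only approximates it.

```python
from math import comb


def drift_step(dist, two_n):
    """One generation of binomial sampling of two_n genes (pure drift)."""
    new = [0.0] * (two_n + 1)
    for i, prob in enumerate(dist):
        if prob == 0.0:
            continue
        q = i / two_n
        for j in range(two_n + 1):
            new[j] += prob * comb(two_n, j) * q ** j * (1 - q) ** (two_n - j)
    return new


def unfixed_history(N, generations):
    """Start all loci at frequency 1/2; record the proportion still unfixed."""
    two_n = 2 * N
    dist = [0.0] * (two_n + 1)
    dist[N] = 1.0
    unfixed = []
    for _ in range(generations):
        dist = drift_step(dist, two_n)
        unfixed.append(sum(dist[1:two_n]))
    return dist, unfixed


if __name__ == "__main__":
    N = 10
    dist, unfixed = unfixed_history(N, 300)
    print("ratio of successive unfixed proportions:", unfixed[-1] / unfixed[-2])
    print("expected 1 - 1/2N:", 1 - 1 / (2 * N))
    interior = dist[1:2 * N]
    print("shape of unfixed classes:", [round(x / max(interior), 2) for x in interior])
```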


[Provine] William B. Provine: Sewall Wright and Evolutionary Biology, 1986.

[ESP] Sewall Wright: Evolution: Selected Papers, edited and with Introductory Materials by William B. Provine, 1986.

D. S. Falconer: Introduction to Quantitative Genetics, 3rd edn., 1989.

M. Jobling, M. Hurles, and C. Tyler-Smith: Human Evolutionary Genetics, 2004.

John Maynard Smith: Evolutionary Genetics, 1989.


Friday, June 06, 2008

Notes on Sewall Wright: Population Size   posted by DavidB @ 6/06/2008 05:30:00 AM

Continuing my series of notes on the work of Sewall Wright, I come to the question of population size. This is important in Wright's formulation of population genetics and his evolutionary theory generally. One of the major differences between Wright and R. A. Fisher is that Fisher believed that, in general, evolutionary processes could be treated as if they took place in a very large random-mating population. He did not believe, contrary to some caricatures, that species were literally random-mating across their entire range (which is obviously false), but rather that there was usually enough migration between different parts of that range that for most purposes the departures from random mating did not matter. Wright, on the other hand, believed that in many cases local populations were sufficiently isolated from each other that they could be treated as populations evolving separately. This difference of views had a major impact on Wright's and Fisher's assessment of the relative importance of selection and genetic drift.

In his treatment of genetic drift Wright showed that in the absence of mutation and migration, genetic diversity, as measured by the proportion of heterozygotes in the population, will decline at a rate of 1/2N per generation, where N is the relevant population size. The larger the size, the slower the loss of diversity. This raises the question of what the 'relevant' size of N is. As Wright explained in his great 1931 paper 'Evolution in Mendelian Populations', 'The conception is that of two random samples of gametes, N sperms and N eggs, drawn from the total gametes produced by the generation in question (N/2 males and N/2 females each with a double representation from each series of allelomorphs). Obviously N applies only to the breeding population and not to the total number of individuals of all ages' (p.111, 'Evolution: Selected Papers' (ESP). Unless otherwise stated, all citations are from this source.)
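The 1/2N rule is easy to check by simulation. The sketch below assumes an idealised population of N diploid individuals (2N genes), redrawn by binomial sampling each generation as in Wright's description. It measures diversity by the expected heterozygosity 2pq rather than by counting heterozygotes. The function names, the seed and the number of replicates are arbitrary choices. Averaged over many replicate populations, 2pq should follow H0(1 - 1/2N)^t.

```python
import random


def simulate_heterozygosity(N, generations, replicates=2000, seed=1):
    """Mean of 2pq over replicate populations of 2N genes, each starting at p = 1/2
    and drifting by binomial sampling (no mutation, migration or selection)."""
    rng = random.Random(seed)
    two_n = 2 * N
    totals = [0.0] * (generations + 1)
    for _ in range(replicates):
        p = 0.5
        totals[0] += 2 * p * (1 - p)
        for t in range(1, generations + 1):
            copies = sum(1 for _ in range(two_n) if rng.random() < p)
            p = copies / two_n
            totals[t] += 2 * p * (1 - p)
    return [h / replicates for h in totals]


def predicted_heterozygosity(h0, N, t):
    """Wright's rule: diversity falls by a fraction 1/2N each generation."""
    return h0 * (1 - 1 / (2 * N)) ** t


if __name__ == "__main__":
    N, T = 20, 20
    simulated = simulate_heterozygosity(N, T)
    for t in (0, 5, 10, 20):
        print(t, round(simulated[t], 3), round(predicted_heterozygosity(0.5, N, t), 3))
```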

Wright immediately goes on to say that this idealised model of the population is often an oversimplification. The effective size of the population is often different from the current actual number of breeding adults. If the effective size is smaller than the apparent size (the current number of breeding adults), genetic drift will be faster than expected. We may say that the effective size of the population is the size of an idealised population, meeting the criteria outlined in the quotation from p.111 given above, which would give rise to genetic drift at the same rate as actually observed. I am not sure that Wright ever formally defines effective size, but the definition I have suggested seems to be implied in various references, e.g. ESP pp.111, 157, 251, 354.

Wright repeatedly specifies three factors which tend to reduce the effective size of the population below its apparent size:

1) Different numbers of breeding males and females (ESP pp.112, 251, 299, 354, 370). The effective population size is closer to that of the rarer sex.

2) Variance in reproductive success greater than that assumed in the idealised model (ESP pp.112, 251, 300, 354, 270). Genetic drift will then be faster.

3) Occasional or cyclical reductions in population size (ESP pp.112, 157, 251, 300, 354, 370). The effect of (non-selective) reductions in population size is to take a random sample out of the gene pool. Such samples will have a variance in gene frequencies proportional to 1/n, where n is the size of the sample. The smaller the number n, the larger the variance due to 'sampling error'. If n is small relative to N (the usual population size), the effect is equivalent to concentrating many generations of slow genetic drift into a single event. In the absence of mutation and selection the effect is irreversible. A subsequent expansion of population, however large, does not reverse the loss of genetic diversity. (But note that if there is mutation and selection, an expansion of population gives an opportunity for rare advantageous mutations to appear and be selected. An expansion of population is also often associated with a relaxation of natural selection, which means that slightly disadvantageous mutations, which would normally be weeded out, may survive. This could help shift the population across a 'valley' in the adaptive landscape, if such things exist).
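
Factors (1) and (3) have simple textbook formulas, usually credited to Wright: with Nm breeding males and Nf breeding females the effective size is 4NmNf/(Nm + Nf), and when the size fluctuates over a run of generations the effective size is approximately the harmonic mean of the per-generation sizes. The Python sketch below just evaluates them; the function names are hypothetical.

```python
from statistics import harmonic_mean


def ne_sex_ratio(n_males: int, n_females: int) -> float:
    """Effective size with unequal numbers of breeding males and females: 4*Nm*Nf/(Nm+Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_fluctuating(sizes: list[float]) -> float:
    """Approximate effective size over several generations: harmonic mean of the sizes."""
    return harmonic_mean(sizes)


print(ne_sex_ratio(50, 50))  # equal sexes, Ne equals the census number: 100.0
print(ne_sex_ratio(5, 95))   # a few males do all the breeding: 19.0
print(ne_fluctuating([1000, 1000, 10, 1000, 1000]))  # one bottleneck generation
```

The last line illustrates factor (3): a single generation of 10 breeders pulls the effective size over five generations just below 50, even though four of the five generations had 1000 breeders.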

These three factors all tend to reduce the effective population size below the current observed number of adult males and females. Wright repeatedly claims that the effective size is usually less than the apparent size, for example, 'The effective size (N) of the theory may, however, differ much from the apparent size, being usually much less' (ESP p.251). So far as I know, Wright mentions only once a factor that might increase the effective number above the apparent level: on ESP p.300 he mentions that the variance in reproductive success could be less than in the idealised model, in which case the effective population number could be up to twice the apparent size. But he comments that this is improbable except in planned breeding experiments.
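
The case Wright mentions on p.300 can be illustrated with a formula commonly given in textbooks (not quoted from Wright here) for a stable population of N parents with mean family size 2 and variance V_k in the number of offspring per parent: Ne = (4N - 2)/(V_k + 2). A sketch, with an illustrative function name:

```python
def ne_offspring_variance(n: int, var_k: float) -> float:
    """Effective size of a stable population of N parents (mean family size 2)
    when the variance in number of offspring per parent is var_k."""
    return (4 * n - 2) / (var_k + 2)


n = 100
print(ne_offspring_variance(n, 2.0))   # roughly Poisson variance: 99.5, close to N
print(ne_offspring_variance(n, 0.0))   # every parent leaves exactly 2: 199.0, close to 2N
print(ne_offspring_variance(n, 10.0))  # very unequal success: far below N
```

With the roughly Poisson variance of the idealised model (V_k = 2) the effective size is close to N; with no variance at all it is 2N - 1, which is the 'up to twice' limit.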

So far so good. But so far as I am aware, Wright never mentions another factor which may raise the effective population size above the current number of breeding adults. This is where there is a large reserve of juvenile or dormant individuals with the ability to replace the current adults in the event of a population reduction. Such a reserve population would contain a greater amount of genetic diversity than the reduced number of current adults. This is probably a minor factor in the case of vertebrate animals, but could be important among some small invertebrates, where the number of eggs or larvae may be many times the current 'crop' of adults. It is even more important in the case of plants. Most species of plants produce resistant seeds, bulbs, etc., which are orders of magnitude more numerous than the mature plants. In some cases they can survive for years or decades in a dormant state. The genetic effect of sharp reductions in adult population numbers (e.g. due to drought) may therefore be much less among plants than among animals. This oversight vitiated one of Wright's own major empirical studies (see Provine p.485).

Another major complication is migration. Wright's idealised model of genetic drift assumes that the population is completely self-contained, that is, reproductively isolated from other populations. If the population is an entire biological species, this is true by definition, since a biological species is defined by reproductive isolation. But if the population is a subdivision of a species, there is in principle the possibility that genes will enter the population from outside. My next note will examine how Wright dealt with this complication.

William B. Provine: Sewall Wright and Evolutionary Biology, 1986.

Sewall Wright: Evolution: Selected Papers, edited and with Introductory Materials by William B. Provine, 1986.


Tuesday, May 20, 2008

Notes on Sewall Wright: Wright's F-statistics   posted by DavidB @ 5/20/2008 04:33:00 AM

Several of my previous notes have touched on the subject of Sewall Wright's F-statistics. The best known of these is FST, which is very widely used as a measure of the genetic divergence between sub-populations of a species. My aim in this note is to trace the evolution of the F-statistics in Wright's work.

Why F?

A preliminary question is one of terminology. What, if anything, does the letter 'F' stand for? One plausible answer is that it stands for 'fixation', since among other things the F-statistics can be used to measure the rate at which alleles tend to be 'fixed'. Wright himself in his later writings sometimes refers to F as an 'index of fixation'.

Plausible though this may be, it does not seem to be the origin of Wright's use of the letter F. This first appeared in his series of five papers on 'Systems of Mating' in 1921 (referred to below as SM1 to SM5), where he uses the letter F (in its lower-case form 'f') as a symbol for the 'correlation between uniting gametes' and as a measure of inbreeding. Although the word 'fixation' does occur in these papers, Wright does not say that 'f' stands for 'fixation'. The banal truth seems to be that by the time Wright needed a symbol to represent the correlation between uniting gametes, the letters a to e had already been allocated to other purposes, so that f was the first available letter in the alphabet.

F as correlation between uniting gametes

Wright's primary use of F (or f) is to designate the correlation between uniting gametes. The general idea of a correlation between gametes is now somewhat unfamiliar. If there are varying types of gametes in the population, uniting gametes may be said to be positively correlated if the same types tend to be paired together at mating, or negatively correlated if dissimilar types are paired. If the different alleles at a locus in the population are given notional numerical values, such as 0 and 1, a correlation coefficient for the correlation between pairs of uniting gametes can be calculated in the usual way. (For a fuller explanation see my post on Wright's measurement of kinship.) The resulting correlation coefficient is F.
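
The method of notional values can be made concrete with a short Python sketch. It assumes the matings are summarised as a table of (egg, sperm) pairs with their frequencies, scoring allele A as 1 and allele a as 0; the table format and function name are my own illustrative choices.

```python
from math import sqrt


def gamete_correlation(pair_freqs: dict[tuple[int, int], float]) -> float:
    """Pearson correlation between uniting gametes.

    pair_freqs maps (egg value, sperm value) to its frequency among unions,
    using notional values 1 for allele A and 0 for allele a.
    """
    total = sum(pair_freqs.values())
    mean_x = sum(x * f for (x, _), f in pair_freqs.items()) / total
    mean_y = sum(y * f for (_, y), f in pair_freqs.items()) / total
    cov = sum((x - mean_x) * (y - mean_y) * f for (x, y), f in pair_freqs.items()) / total
    var_x = sum((x - mean_x) ** 2 * f for (x, _), f in pair_freqs.items()) / total
    var_y = sum((y - mean_y) ** 2 * f for (_, y), f in pair_freqs.items()) / total
    return cov / sqrt(var_x * var_y)


# Random union of gametes with p = 1/2: no correlation.
random_union = {(1, 1): 0.25, (1, 0): 0.25, (0, 1): 0.25, (0, 0): 0.25}
# Like gametes uniting more often than chance would give: positive correlation.
like_with_like = {(1, 1): 0.35, (1, 0): 0.15, (0, 1): 0.15, (0, 0): 0.35}
print(gamete_correlation(random_union), gamete_correlation(like_with_like))
```

In the second table the correlation is 0.4 and the proportion of unlike pairs is 0.30, which agrees with Wright's formula (1/2)(1 - f) for heterozygosis.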

Heterozygosis and the correlation between gametes

Also in 1921 Wright points out that the correlation between uniting gametes is connected with the proportion of heterozygotes in the population. Whether an individual is heterozygous at a locus is determined by the gametes (egg and sperm) of its parents which unite to form a zygote at fertilization. If they are identical at that locus, the offspring is homozygous, otherwise it is heterozygous. The proportion of heterozygotes (the level of heterozygosis) among the offspring, over and above the level expected with random mating, can be calculated from the correlation between uniting gametes, and vice versa. In SM1 Wright calculates that the percentage of heterozygosis is (1/2)(1 - f), where f is the correlation between uniting gametes. (This is stated without full proof, but I have checked it, calculating the correlation by the method of notional values.) This formula is only valid for the special case where there are two alleles with equal proportions of 1/2 in the population, but Wright soon (in 1922) generalised it to the case of two alleles with proportions of p and q = (1 - p), in which case the formula is 2pq(1 - f). He also began to use upper-case F, rather than f, as his preferred notation.
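
A quick numerical check of the general formula, assuming the standard parametrisation of genotype frequencies at a two-allele locus in terms of F (AA = p^2 + Fpq, Aa = 2pq(1 - F), aa = q^2 + Fpq). The sketch builds the frequencies and then recovers F from them as a correlation, using the notional values A = 1, a = 0; the names are illustrative.

```python
def genotype_freqs(p: float, f: float) -> dict[str, float]:
    """Genotype frequencies at a two-allele locus when uniting gametes have correlation F."""
    q = 1.0 - p
    return {"AA": p * p + f * p * q, "Aa": 2 * p * q * (1 - f), "aa": q * q + f * p * q}


def correlation_from_genotypes(g: dict[str, float]) -> float:
    """Recover the correlation between uniting gametes (notional values A = 1, a = 0)."""
    p = g["AA"] + g["Aa"] / 2  # frequency of A among the gametes
    q = 1.0 - p
    # Covariance of the two gametes' values is P(both A) - p*p; each variance is pq.
    return (g["AA"] - p * p) / (p * q)


for p, f in [(0.5, 0.0), (0.5, 0.4), (0.8, 0.25)]:
    g = genotype_freqs(p, f)
    print(p, f, round(g["Aa"], 4), round(correlation_from_genotypes(g), 4))
```

With p = 1/2 the heterozygote frequency reduces to (1/2)(1 - F), the special case in SM1.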

F as a measure of inbreeding in a population

A positive correlation between uniting gametes can arise in two ways (apart from mere sampling error): by assortative mating between similar phenotypes, or by mating between genetic relatives, in other words by inbreeding. Wright deals with both inbreeding and assortative mating, but gives more attention to inbreeding. If assortative mating is excluded, then F can be used as a measure of the average degree of inbreeding in a population.

If the correlation between gametes is due solely to inbreeding, then the formula 2pq(1 - F) for the percentage of heterozygosis in a population can be given a simple interpretation in terms of Malecot's concept of Identity by Descent. The two genes at a locus in an individual are either Identical by Descent (IBD) from a common ancestor, or they are, by assumption, drawn randomly from the gene pool. In the first case they are certainly identical. In the second case, applying the familiar Hardy-Weinberg formula, they have a probability of (1 - 2pq) of being identical. Therefore if we interpret F as the probability that the two genes are IBD, on average for the population, the total probability that they are identical is F + (1 - F)(1 - 2pq) = 1 - 2pq(1 - F). Subtracting this from 1 to get the probability of heterozygosity we get the required formula 2pq(1 - F).
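
The identity-by-descent argument lends itself to a direct simulation. In this Python sketch (an illustration, not Malecot's or Wright's own procedure) each individual's two genes are IBD with probability F, and otherwise drawn independently from a gene pool with allele frequency p; the simulated proportion of heterozygotes should come out close to 2pq(1 - F).

```python
import random


def simulate_heterozygosity(p: float, f: float, n: int, seed: int = 1) -> float:
    """Monte Carlo version of the identity-by-descent argument.

    With probability F the two genes are copies of one ancestral gene, so the
    individual is homozygous; otherwise both genes are drawn independently from
    a gene pool in which allele A has frequency p.
    """
    rng = random.Random(seed)
    het = 0
    for _ in range(n):
        if rng.random() < f:
            continue  # IBD: the two genes are identical, so the individual is homozygous
        a = rng.random() < p
        b = rng.random() < p
        het += a != b
    return het / n


p, f = 0.3, 0.2
print(simulate_heterozygosity(p, f, 100_000), 2 * p * (1 - p) * (1 - f))
```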

F and the inbreeding of individuals

The degree of inbreeding in a class of individuals (e.g. all offspring of matings between siblings) can be derived from an analysis of the way in which they are bred. The coefficient of inbreeding then measures the correlation between any pair of alleles at the same locus in an individual belonging to that class.

The level of inbreeding in an offspring can be derived from the correlation between the uniting gametes of its parents, which in turn can be derived from the correlation between the parents themselves, in accordance with Wright's method of path analysis. The full method would involve considerations of dominance, heritability, and so on, but the coefficient of inbreeding is usually derived using a simplified method devised by Wright himself and expounded in several papers of the early 1920s (see especially paper 2 in ESP).

In the simplest case, for the offspring of half-siblings who are not themselves inbred, Wright's formula gives a coefficient of inbreeding of 1/8. This is the same as the figure derived by the methods of Malecot for the probability in this case that the two genes at a locus in the offspring are identical by descent. In Malecot's approach this result is derived from explicit assumptions about probabilities. It is assumed that each gene in an offspring has a probability of 1/2 of coming from either parent, and - very importantly - that there is an independent probability of 1/2 that the same gene is inherited by any other offspring of the same parent. This is an assumption which is usually empirically correct (with certain exceptions such as sex chromosomes), but it is not logically necessary. For example, if surviving offspring came in pairs, each member of which received genes from complementary chromosomes in the parent, such pairs of offspring would have a lower correlation with each other than the usual calculations would suggest.
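
Wright's path method for individuals is usually summarised as a sum over paths through common ancestors, F = sum of (1/2)^(n1 + n2 + 1)(1 + F_A), where n1 and n2 are the numbers of generations from each parent back to the common ancestor A, and F_A is A's own inbreeding coefficient. The sketch below evaluates that sum with exact fractions; the input format is my own, and it leaves the tracing of paths to the user.

```python
from fractions import Fraction


def inbreeding_coefficient(paths: list[tuple[int, int, Fraction]]) -> Fraction:
    """Sum over paths of (1/2)**(n1 + n2 + 1) * (1 + F_A).

    Each entry is (n1, n2, F_A): n1 and n2 are the numbers of generations from the
    sire and from the dam back to a common ancestor, and F_A is that ancestor's
    own inbreeding coefficient.
    """
    half = Fraction(1, 2)
    return sum((half ** (n1 + n2 + 1) * (1 + f_a) for n1, n2, f_a in paths), Fraction(0))


# Offspring of half-siblings: one common grandparent, one generation up on each side.
print(inbreeding_coefficient([(1, 1, Fraction(0))]))                       # 1/8
# Offspring of full siblings: two common ancestors (both grandparents).
print(inbreeding_coefficient([(1, 1, Fraction(0)), (1, 1, Fraction(0))]))  # 1/4
# Offspring of first cousins: two common great-grandparents, two generations up each side.
print(inbreeding_coefficient([(2, 2, Fraction(0)), (2, 2, Fraction(0))]))  # 1/16
```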

It is therefore worth asking what features of Wright's approach take the place of the explicit probability assumptions in Malecot's system. The first key assumption, that each gene in an offspring has a probability of 1/2 of coming from either parent, is explicitly stated as a biological assumption (with the exception of sex-linked genes) in Wright's derivation of the path coefficient between offspring and parent. The other key assumption, that there is an independent probability of 1/2 that the same gene is inherited by any other offspring, does not seem to be explicitly stated. In SM1 Wright only directly calculates the correlation between parent and offspring. All other correlations, such as those between siblings, are derived indirectly from the parent-offspring correlation by the method of path analysis. The assumption of independent probabilities for each offspring seems to be built into the general assumptions of path analysis. In a late discussion of the principles of path analysis Wright emphasised that 'The validity of the system requires that any variable that enters into the system as a common factor back of two or more dependent variables, or as an intermediary in a chain, vary as a whole. If one part of a composite variable.... is more significant in one relation than in another, the treatment of the variable as if it were a unit may lead to grossly erroneous results' (EGP vol. 1 p.300). Fortunately, the assumption appears to be consistent with the usual pattern of genetic inheritance. Apart from special cases such as sex-linked genes, or MZ twins, it seems that each surviving offspring has an equal and independent probability of receiving any given allele from the same parent. This is despite the fact that during the formation of gametes the precursor-cells of the gametes are formed in pairs with complementary alleles from different chromosomes in the parent. 
In the case of eggs, only one of the proto-eggs formed from the same parental cell usually survives. In the case of sperms, so many sperms are produced in total that the chance of two sperms derived from the same parental cell both ending up in surviving offspring is negligible.

F as a measure of inbreeding relative to a foundation stock

One of Wright's original motives in devising his F statistics was to measure the effect of continued inbreeding over a number of generations. In agricultural (and laboratory) practice it is common for animals to be bred systematically over long periods using close relatives, e.g. mating sisters with brothers, or daughters with their fathers. With such practices the level of inbreeding among the offspring rises over the generations, and the level of heterozygosis declines. Wright's F-statistics provide a convenient method of measuring this process, superior to the previous ad hoc methods. The result of a number of generations of inbreeding within an inbred line can be summarised in the average F within that line, relative to the foundation stock (the population from which the inbred line is derived). The cumulative decline of heterozygosis since the inception of the line can then be calculated using the formula 2pq(1 - F). But this should raise questions about the precise meaning of F in such a case. F is in principle always a correlation coefficient, and could if necessary be expressed in terms of the Pearson product-moment formula. This requires the mean and standard deviation of the relevant statistical population to be specified. But what is the mean in the present case? The correlation is said to be 'relative to the foundation stock', so this appears to be the relevant statistical population, but the foundation stock no longer exists, and the correlated pairs are not part of it. So what is going on? Is F a legitimate correlation coefficient at all when more than one generation is involved?
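
For continued brother-sister mating, Wright's method leads to the standard recurrence F_t = (1 + 2F_{t-1} + F_{t-2})/4 (quoted here from the general literature rather than from a specific page of SM). A sketch that tabulates F_t relative to the foundation stock, and the corresponding heterozygosis 2pq(1 - F_t), taking the founders and their children as non-inbred:

```python
from fractions import Fraction


def sib_mating_f(generations: int) -> list[Fraction]:
    """Inbreeding coefficient relative to the foundation stock under continued sib mating.

    Generations 0 and 1 (unrelated founders and their children) are non-inbred;
    thereafter F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4.
    """
    f = [Fraction(0), Fraction(0)]
    for _ in range(2, generations + 1):
        f.append((1 + 2 * f[-1] + f[-2]) / 4)
    return f[: generations + 1]


p = 0.5
for t, f_t in enumerate(sib_mating_f(8)):
    # Remaining heterozygosis relative to the foundation stock: 2pq(1 - F).
    print(t, f_t, round(2 * p * (1 - p) * (1 - float(f_t)), 4))
```

The sequence runs 0, 0, 1/4, 3/8, 1/2, ... and in the long run 1 - F shrinks by a factor of about 0.809 per generation, so heterozygosis falls by roughly 19 percent in each generation of sib mating.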

This puzzled me until I paid proper attention to page 169 of SM5. This gives the key to the mystery. Rather than just considering the correlation within a single inbred line, we must consider an indefinitely large (actual or hypothetical) ensemble of lines, all separately inbred according to the same system (e.g. sibling mating) for the same number of generations, and all derived from the same 'foundation stock'. The mean gene frequencies for the entire ensemble (or a large random sample thereof) should then be the same as in the foundation stock (in the absence of selection and mutation), but will vary within each particular inbred line according to the chance variations resulting from the reproductive process. F will therefore measure the average correlation within each such line as compared with the values of the foundation stock. Such a correlation coefficient will usually be hypothetical, since no such ensemble actually exists, but in principle it has a clear meaning consistent with the general method of correlation.
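
The ensemble interpretation can be checked by brute force. This Python sketch (sizes, seed and names are arbitrary choices of mine) founds many independent brother-sister lines from a stock with allele frequency p, breeds each line for several generations, and averages heterozygosity across the whole ensemble at each generation. The averages should track 2pq(1 - F_t) from the sib-mating recurrence, although any single line fluctuates widely.

```python
import random


def expected_heterozygosity(p: float, generations: int) -> list[float]:
    """2pq(1 - F_t) for continued sib mating, with F_0 = F_1 = 0."""
    f = [0.0, 0.0]
    while len(f) < generations + 1:
        f.append((1 + 2 * f[-1] + f[-2]) / 4)
    return [2 * p * (1 - p) * (1 - x) for x in f[: generations + 1]]


def simulate_sib_lines(p: float, generations: int, n_lines: int, seed: int = 0) -> list[float]:
    """Mean heterozygosity per generation over an ensemble of independent sib-mated lines."""
    rng = random.Random(seed)
    het = [0] * (generations + 1)
    for _ in range(n_lines):
        # Generation 0: two unrelated founders drawn from the foundation stock (True = allele A).
        pair = [[rng.random() < p, rng.random() < p] for _ in range(2)]
        for t in range(generations + 1):
            het[t] += sum(ind[0] != ind[1] for ind in pair)
            sire, dam = pair
            # The next brother and sister each get one random gene from each parent.
            pair = [[rng.choice(sire), rng.choice(dam)] for _ in range(2)]
    return [h / (2 * n_lines) for h in het]


p, gens = 0.5, 6
observed = simulate_sib_lines(p, gens, n_lines=20_000)
for t, (obs, exp) in enumerate(zip(observed, expected_heterozygosity(p, gens))):
    print(t, round(obs, 3), round(exp, 3))
```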

The story so far

The uses of F (or f) identified so far were all first described in Wright's ground-breaking 'Systems of Mating' in 1921. The different uses therefore cannot be put in a chronological sequence. Logically, however, the sequence is as follows:

a) F as the correlation between uniting gametes. This is always the fundamental conception.

b) F as a measure of average inbreeding in a population. In this sense it is closely connected to the level of heterozygosis.

c) F as a measure of inbreeding in an individual. In this sense it is closely connected to the measurement of relatedness.

d) F as a measure of continued inbreeding in a line relative to a foundation stock - see the last paragraph.

F in natural populations

As developed by Wright in 1921, the concept of F was heavily influenced by the circumstances of agricultural stock breeding, where mating is carried out in accordance with some deliberate plan. (Wright was employed in agricultural research for the US Department of Agriculture at the time - see Provine, chapter 4). The next major step was Wright's application of F to the measurement of genetic drift in natural random-mating populations. It is clear from Provine's biography that Wright first took this step around 1925, but the results were not fully published until the major paper on 'Evolution in Mendelian Populations' in 1931.

I have discussed genetic drift in a previous post, and will not repeat that discussion here. The essential point is that in any finite population, over the course of time, there will be a tendency, purely by chance, for some lines of ancestry to be relatively successful, while others dwindle and eventually die out. The result is that, in the absence of selection or mutation, fewer alleles will account for a larger proportion of genes in the population, and the level of heterozygosis will decline.

As a result of genetic drift, F tends to increase at a rate of approximately 1/2N per generation, where N is the size (strictly, the 'effective' size) of the random mating population. But F is still in principle the correlation between uniting gametes. Since the correlation between uniting gametes within a random mating population is zero, how can there be an increasing value of F?

The answer is again that F is a correlation relative to the baseline of a 'foundation stock'. Wright does not, so far as I know, explain what exactly this means in the case of a natural random mating population, but I think we can understand it by analogy with the case of inbred agricultural breeding lines. We are to imagine that from a specified generation onwards a population is allowed to evolve by random genetic drift in a large number of different hypothetical ways. Within each of the resulting hypothetical descendant populations there will be a correlation between uniting gametes relative to the entire ensemble of hypothetical outcomes. The average of these correlations is constantly increasing. It is conceivable that in some cases the actual observed value of F - the correlation between uniting gametes within an actual population relative to that in the foundation stock - would be negative, but the expected average F is always positive.
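
The same ensemble idea for a random-mating population can be sketched in Python, assuming Wright's idealised model in which each generation's 2N genes are drawn independently from the previous gene pool. Many replicate populations start from the same frequency p0; averaging 2pq over the replicates should give 2p0q0(1 - F_t), with F_t = 1 - (1 - 1/2N)^t, which is the same as F_t = 1/2N + (1 - 1/2N)F_{t-1}. Names and parameter values are illustrative.

```python
import random


def drift_ensemble(n: int, p0: float, generations: int, reps: int, seed: int = 0) -> list[float]:
    """Mean of 2pq, generation by generation, over many replicate populations of
    N diploids (2N genes) evolving by random sampling alone."""
    rng = random.Random(seed)
    genes = 2 * n
    totals = [0.0] * (generations + 1)
    for _ in range(reps):
        p = p0
        for t in range(generations + 1):
            totals[t] += 2 * p * (1 - p)
            # Next generation: 2N independent draws from the current gene pool.
            p = sum(rng.random() < p for _ in range(genes)) / genes
    return [x / reps for x in totals]


n, p0, gens = 10, 0.5, 20
sim = drift_ensemble(n, p0, gens, reps=2000)
for t in (0, 5, 10, 20):
    f_t = 1 - (1 - 1 / (2 * n)) ** t  # expected F after t generations
    print(t, round(sim[t], 3), round(2 * p0 * (1 - p0) * (1 - f_t), 3))
```

Within a single replicate 2pq can rise as well as fall, but the ensemble average declines steadily, which is the sense in which the expected F always increases.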

F in subdivided populations

If a number of subgroups of a population breed within themselves in full or partial isolation from each other, the gene frequencies within them will tend to diverge from each other as a result of selection or genetic drift. Within each such subgroup, individuals will tend to be more similar to each other than to individuals randomly selected from other subgroups or from the entire population. Within the groups, individuals will therefore be positively correlated with each other relative to the entire population.

Wright developed a system of F-statistics to analyse the structure of subdivided populations. This is one of his major contributions to population genetics after the fundamental paper EMP of 1931. The best-known of the F-statistics is FST, where S and T should ideally be subscripts, and stand for 'subpopulation' and 'total population'. The expression FST is possibly first used in a paper of 1950 (ESP p.585), but the underlying concept was first developed in a paper of 1943 on 'Isolation by Distance'. (I will cite this from the reprint in ESP, but it may be available online here. I downloaded it successfully once, but on another occasion got an error message.)

Wright considers a population subdivided into a number of subpopulations of equal size, within which mating is random, and with two alleles at a locus. He shows, by a relatively simple but ingenious proof (ESP p.403), that in this case the correlation between uniting gametes within each subpopulation, relative to the total, is equivalent to Vp/pq, where Vp is the variance of the gene frequencies of the subpopulations (i.e. the mean square of their deviations from the frequency in the total population), and p and q are the frequencies in the total population. In 1943 this correlation is simply called F, but it is in fact the measure later known as FST. Wright recommends that the square root of F could usefully be taken as a measure of the genetic divergence between populations. (Of course, the rank order will be the same whether we take F itself or its square root as the measure.) It may also be noted that Vp/pq cannot be negative, as both the numerator and denominator are necessarily positive or at least zero. In general, a correlation coefficient may be either positive or negative, but in this case F measures the correlation due to the average differences between the gene frequencies of subpopulations, regardless of sign, and these cannot be less than zero.
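
Wright's 1943 result can be written as a few lines of Python. The sketch assumes, as he did, equal-sized subpopulations and two alleles, so that the total-population frequency is the plain mean of the subpopulation frequencies; the frequencies used are made up for illustration.

```python
from statistics import fmean, pvariance


def fst_wright(sub_freqs: list[float]) -> float:
    """F = Vp / (p q) for equal-sized subpopulations at a two-allele locus.

    Vp is the mean square deviation of subpopulation frequencies from the
    total-population frequency p, which for equal sizes is the unweighted mean.
    """
    p = fmean(sub_freqs)
    return pvariance(sub_freqs, mu=p) / (p * (1 - p))


freqs = [0.2, 0.4, 0.6, 0.8]
f = fst_wright(freqs)
print(f, f ** 0.5)  # F and its square root, Wright's suggested measure of divergence
```

Here F is 0.2, and its square root, about 0.45, would be Wright's suggested measure of divergence.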

In the same 1943 paper, and in subsequent papers of the 1940s, Wright developed methods for dealing with correlations within hierarchically subdivided populations, where mating within each division may or may not be random. His terminology varied somewhat, but by 1950 he seems to have settled on the following (with IT, IS, and ST as subscripts):

FIT: inbreeding coefficient of individuals relative to the total population
FIS: inbreeding coefficient of individuals relative to the subpopulation
FST: correlation between random gametes drawn from the subpopulation relative to the total population. (If mating is in fact not random within the subpopulation, this is a hypothetical correlation.)

Wright shows that these measures are related by the equation FST = (FIT - FIS)/(1 - FIS). (For a relatively simple proof see EGP vol. 2 p.294-5, but note that the left square bracket in Equation 12.14 on that page is in the wrong place: it should be immediately before the first occurrence of qT.) It may be seen that if FIS is zero, in other words if mating within subpopulations is random, then FST = FIT. This is as it should be, since in this case the only source of correlation between individuals is the division of the population into subpopulations. FST then accounts for the entirety of the correlation within the total population, which is FIT.
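
The relationship can also be checked numerically with heterozygosities, which is the later (Nei-style) bookkeeping rather than Wright's correlation argument. The sketch assumes equal-sized subpopulations and two alleles: H_I is the observed heterozygosity, H_S the average expected heterozygosity with random mating within subpopulations, and H_T that with random mating in the total population, so that FIS = 1 - H_I/H_S, FST = 1 - H_S/H_T and FIT = 1 - H_I/H_T. The data are hypothetical.

```python
def hierarchical_f(sub_freqs: list[float], obs_het: list[float]) -> tuple[float, float, float]:
    """Return (F_IS, F_ST, F_IT) for equal-sized subpopulations at a two-allele locus.

    sub_freqs: allele frequency in each subpopulation
    obs_het:   observed proportion of heterozygotes in each subpopulation
    """
    k = len(sub_freqs)
    p_bar = sum(sub_freqs) / k
    h_i = sum(obs_het) / k                             # observed heterozygosity
    h_s = sum(2 * p * (1 - p) for p in sub_freqs) / k  # random mating within subpopulations
    h_t = 2 * p_bar * (1 - p_bar)                      # random mating in the total population
    return 1 - h_i / h_s, 1 - h_s / h_t, 1 - h_i / h_t


freqs = [0.2, 0.4, 0.6, 0.8]
# Hypothetical data: a 10% heterozygote deficit within every subpopulation (F_IS = 0.1).
obs = [2 * p * (1 - p) * 0.9 for p in freqs]
f_is, f_st, f_it = hierarchical_f(freqs, obs)
print(f_is, f_st, f_it, (f_it - f_is) / (1 - f_is))
```

The check works because H_I/H_T = (H_I/H_S)(H_S/H_T), i.e. (1 - FIT) = (1 - FIS)(1 - FST), which rearranges to Wright's equation. FST here is 0.2, which equals Vp/pq for these frequencies.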

Wright's F-statistics are still widely used or alluded to, but are seldom understood in their original sense as correlation coefficients. Inbreeding within individuals is now usually explained by means of Malecot's Identity by Descent, while FST is usually explained in a way more appropriate to Masatoshi Nei's GST. Wright's work was however clearly the inspiration and foundation for the work of these later geneticists.

A few cautions about the use of FST may be useful.

a) Wright originally intended FST to be calculated as an average over a large number of subpopulations. In theory, it would be possible to calculate it for as few as two subpopulations, in which case, if they are of equal size, FST is d^2/pq, where d is the deviation of the subpopulation frequencies from the frequency in the total population. So far as I know, Wright himself never used it in this way.

b) FST is calculated from gene frequencies on a locus-by-locus basis. It may well vary from one locus to another. To get an indication of the extent of evolutionary divergence between subpopulations, it is desirable to take the average FST over a large number of loci.

c) FST is not simply proportional to the length of time or number of generations that two subpopulations have been diverging. Other factors such as the amount of migration between them and the size of the populations are also relevant. Small populations diverge by genetic drift far more quickly than large ones.

d) Wright intended FST mainly to be used for genes that are not subject to significant natural selection. Genes that are under selection may diverge either more or less in different subpopulations than an average FST would suggest.
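
Cautions (a) and (b) can be illustrated together. For two equal-sized subpopulations with frequencies p1 and p2, the total frequency is p = (p1 + p2)/2 and each subpopulation deviates from it by d = ±(p1 - p2)/2, so FST = d^2/pq. The sketch computes this at three hypothetical loci and then takes a simple unweighted average, which is only one of several ways of combining loci.

```python
def fst_two_equal(p1: float, p2: float) -> float:
    """F_ST = d^2/(pq) for two equal-sized subpopulations, with d = (p1 - p2)/2."""
    p = (p1 + p2) / 2
    d = (p1 - p2) / 2
    return d * d / (p * (1 - p))


# Hypothetical allele frequencies at three loci in the two subpopulations.
loci = [(0.50, 0.50), (0.45, 0.65), (0.10, 0.60)]
per_locus = [fst_two_equal(a, b) for a, b in loci]
print([round(x, 3) for x in per_locus])  # varies a great deal from locus to locus
print(sum(per_locus) / len(per_locus))   # simple average across loci
```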


William B. Provine: Sewall Wright and Evolutionary Biology, 1986.

Sewall Wright: Evolution: Selected Papers, edited and with Introductory Materials by William B. Provine, 1986. (ESP)

Sewall Wright: Evolution and the genetics of populations, 4 vols., 1968-1978. (EGP)


Thursday, May 08, 2008

Notes on Sewall Wright: Genetic Drift   posted by DavidB @ 5/08/2008 06:02:00 AM

Continuing my series of notes on the work of Sewall Wright, this one deals with the subject of genetic drift. I had originally planned to call this note 'Inbreeding and the decline of genetic variance', but anyone interested in the matters covered here, and searching for them on the internet, is far more likely to search for 'genetic drift'. This is one of the subjects most closely associated with Wright, to the extent that genetic drift was formerly often known as the 'Sewall Wright Effect'. My main aim is to help people follow Wright's own derivation of his key results, and to clarify the relationship between genetic drift and inbreeding.

I will refer mainly to the papers reprinted in the collection Evolution: Selected Papers, (ESP) and especially the monumental 1931 paper on 'Evolution in Mendelian Populations', which is available online here.
Anyone interested in Wright should also read William B. Provine's biography of him. If in these notes I occasionally make critical remarks on Provine, it should not detract from the general excellence of his book. See the References for details.

In an infinitely large population, in the absence of selection and mutation, the proportions of different gene types (alleles) in the population will remain unchanged indefinitely. But real populations are never infinitely large, and gene frequencies will fluctuate to some extent by chance. As Wright put it in 1931, 'Merely by chance one or the other of the allelomorphs [alleles] may be expected to increase its frequency in a given generation and in time the proportions may drift a long way from the initial values' (ESP p.107).

The general nature of drift can be illustrated by the hackneyed example of coin tossing. If we simultaneously toss a number of 'fair' coins, and repeat the trial a large number of times, then the average proportion of heads, by the definition of a fair coin, will be 1/2, and the average number of heads per trial will be N/2, where N is the number of coins in a trial. More generally, suppose the probability of heads for each coin is always p, where p is any fraction between 0 and 1. The long-term average number of heads per trial will then be Np. But on any particular trial, purely by chance, the number of heads is likely to deviate from the average. It can be shown that the variance of the number of heads per trial is Npq, where q = 1 - p. [Note 1] If we are interested in the proportion of heads per trial (the number of heads divided by N), it can be shown that the variance of the proportion is pq/N. [Note 2] On each trial, the proportion of heads is therefore likely to deviate from the long-term average by an amount of the order of the square root of pq/N (its standard deviation).
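
The coin-tossing results are easy to check by simulation. This Python sketch (parameter values arbitrary, function name illustrative) tosses N coins with probability p of heads many times and compares the observed mean and variances with Np, Npq and pq/N.

```python
import random
from statistics import fmean, pvariance


def coin_trials(n_coins: int, p: float, n_trials: int, seed: int = 0) -> list[int]:
    """Number of heads in each of n_trials tosses of n_coins coins with P(heads) = p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n_coins)) for _ in range(n_trials)]


n, p = 20, 0.3
heads = coin_trials(n, p, 20_000)
props = [h / n for h in heads]
print(fmean(heads), n * p)                # mean number of heads vs Np
print(pvariance(heads), n * p * (1 - p))  # variance of the count vs Npq
print(pvariance(props), p * (1 - p) / n)  # variance of the proportion vs pq/N
```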

Departing now from the real behaviour of coins, let us suppose that the value of p on each trial is determined by the proportion of heads in the previous trial. The proportion of heads will then drift up and down in a 'random walk' pattern, with the size of the 'steps' being inversely related to the size of N. If N is very large, each step will be small, but if N is small the steps may be relatively large. If, by chance, the proportion of heads in a trial ever reaches 1 or 0, then p for all future trials will also be 1 or 0, and heads (or tails) will be permanently 'fixed'. With a finite N this is bound to happen sooner or later.
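
The modified coin game can be simulated directly. In this Python sketch the proportion of heads in each trial becomes p for the next, and the walk runs until heads or tails is fixed. It reports the average number of trials to fixation and how often heads is the one fixed; the latter should be close to the starting proportion, since the expected proportion does not change from one trial to the next. Names and sizes are illustrative.

```python
import random


def walk_to_fixation(n_coins: int, p0: float, rng: random.Random) -> tuple[int, bool]:
    """Toss n_coins coins repeatedly, using the last proportion of heads as the next p.

    Returns (number of trials until all heads or all tails, True if heads was fixed).
    """
    p, trials = p0, 0
    while 0.0 < p < 1.0:
        p = sum(rng.random() < p for _ in range(n_coins)) / n_coins
        trials += 1
    return trials, p == 1.0


rng = random.Random(42)
for n in (10, 40):
    runs = [walk_to_fixation(n, 0.5, rng) for _ in range(1000)]
    mean_time = sum(t for t, _ in runs) / len(runs)
    heads_fixed = sum(h for _, h in runs) / len(runs)
    print(n, round(mean_time, 1), round(heads_fixed, 3))
```

The average time to fixation grows roughly in proportion to N, which is the sense in which large populations drift slowly.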

Genes are not coins, so the analogy is not perfect. In a population of genes, the replication of each gene is not a simple matter of 'heads or tails', as each gene may have 0, 1, 2 or more descendants. Also, while the number of coins is assumed to be fixed at N, a biological population is seldom absolutely fixed in size. Nevertheless, there are important similarities. In the absence of selection, it is a matter of chance whether or not a particular gene enters an egg or sperm and then survives to reproduce again in the next generation. Suppose that there are two alleles, A and B, at a locus, with the frequencies p and q in the population. In the absence of selection and mutation, these will also be the expected frequencies in the next generation. In a population of N diploid individuals, there are 2N genes in the population at each locus. In a stable population there will still be 2N genes in the next generation. We can schematically represent reproduction as a 'trial' consisting of 2N events, each involving the random choice of a gene to enter the new generation, with probabilities of p and q for the 'outcomes' A and B at each choice. The probabilities of obtaining the various possible combinations of A's and B's are then given by the expansion of the binomial (p + q)^(2N). Wright himself uses this model of the process on several occasions, e.g. ESP p.289. While this may seem a very artificial way of viewing reproduction, it is not as unrealistic as it seems. Suppose that N diploid individuals each have the same number of offspring, the number being large, and certainly large enough to ensure that there are at least 2N copies of each allele among the population of offspring. Then select N of the offspring as 'survivors', completely at random, which is analogous to survival in a resource-limited population without natural selection. The probability of the various possible gene frequencies will then be approximately as in the schematic model (with the complication that in a finite population of offspring the probability of selecting an offspring with a given allele will be affected by the number already selected, e.g. if nearly all the alleles of a given type have, by chance, already been selected, the probability of selecting another one will be much reduced).
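
The complication in the last sentence can be seen in a small simulation. As a simplification, the sketch treats the survivors' 2N genes as drawn one at a time, without replacement, from a finite pool containing the same number of copies of each parental gene (an assumption of mine, not Wright's model; the names are illustrative). The variance of the new frequency is then pq/2N multiplied by the finite-pool factor (M - 2N)/(M - 1), where M is the pool size: close to the binomial value when the pool is large, and noticeably smaller when it is not.

```python
import random
from statistics import pvariance


def make_pool(n: int, p: float, copies: int) -> list[int]:
    """Offspring gene pool: `copies` copies of each of the 2N parental genes (1 = A, 0 = B)."""
    genes = 2 * n
    n_a = round(p * genes)
    return [1] * (n_a * copies) + [0] * ((genes - n_a) * copies)


def survivor_freq(pool: list[int], n: int, rng: random.Random) -> float:
    """Frequency of A among 2N genes drawn at random without replacement from the pool."""
    return sum(rng.sample(pool, 2 * n)) / (2 * n)


rng = random.Random(3)
n, p = 10, 0.5
results = {}
for copies in (1_000, 2):
    pool = make_pool(n, p, copies)
    draws = [survivor_freq(pool, n, rng) for _ in range(5_000)]
    m = len(pool)
    # Binomial variance pq/2N, reduced by the finite-pool factor (M - 2N)/(M - 1).
    expected = p * (1 - p) / (2 * n) * (m - 2 * n) / (m - 1)
    results[copies] = (pvariance(draws), expected)
    print(copies, round(results[copies][0], 5), round(expected, 5), p * (1 - p) / (2 * n))
```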

Nothing has so far been said about inbreeding. Moreover, the processes just described would apply not only to sexually reproducing organisms but also to asexually reproducing organisms and genetic elements, such as mitochondria and Y chromosomes, where the possibility of inbreeding does not arise. But in Wright's treatment of the subject, references to inbreeding are frequent, and the rate of genetic drift is derived by an argument which seems to depend on the existence of inbreeding. For example, on p.165 of ESP he says: 'If the population is not indefinitely large, another factor must be taken into account: the effects of accidents of sampling among those that survive and become parents in each generation and among the germ cells of these, in other words, the effects of inbreeding'. Such statements are likely to give the impression that inbreeding is fundamental to the process of genetic drift. How can this be?

The explanation is that in a sexually reproducing population a convenient measure of genetic drift is the changing proportion of homozygotes, and the existence of homozygotes is related to inbreeding. If a given allele has ultimately arisen from a single mutation, then homozygous copies of that allele can only occur in the same individual if that individual is descended from the same ancestor by at least two paths, which is by definition inbreeding. Even if the allele has more than one origin, the level of inbreeding in the population will affect the level of homozygosis. But as the example of asexual organisms shows, there is no necessary connection between genetic drift and inbreeding. R. A. Fisher, in his different approach to the subject, does not (I think) ever refer to inbreeding. Confusing the two things would be like confusing the study of heat with the study of thermometers.

It may therefore be wondered why Sewall Wright took his particular approach. The answer may be partly that his mathematical training was less advanced than Fisher's, so that he was obliged to use less mathematically sophisticated methods. This has the advantage that his work on the subject is in principle accessible to a wider range of readers. Moreover, on one important point Wright's methods got the correct result where Fisher, through neglecting a quantity which turned out not to be negligible, got the wrong result by a factor of 2 (as Wright never tired of pointing out). But I think the main reason for Wright's approach was that he first investigated genetic drift in the context of agricultural breeding, where livestock are often closely inbred. In this context one of the main concerns is to quantify the loss of genetic variation in each particular inbred strain. It was therefore natural for Wright to approach the subject by measuring the loss of heterozygosis associated with inbreeding. When he later turned to consider genetic drift in natural populations, where mating is approximately random, he continued to use the methods he had already devised for the study of inbreeding in agriculture. (I will not explore here the precise meaning of Wright's coefficients of inbreeding, the famous F-statistics, which I hope to deal with in another note.)

Wright's most important finding was that heterozygosis (the proportion of heterozygotes in the population) tends to decline at a rate of 1/2N per generation, where N is the diploid population size. (This assumes that males and females each have a population size of N/2.) Most textbooks give a simplified version of Wright's derivation of this result. Wright's own treatment, in EMP, is difficult to follow, and in view of its importance I have provided a guide in Note 3 below.

Even the simplified textbook versions are not always very clear, and I do not know of any wholly satisfactory account. Key assumptions are often not clearly stated or justified. Two relatively good accounts are those of Falconer and Maynard Smith (see Refs.). I will outline a derivation based mainly on Falconer (with some modifications).

Let us assume there is a population of N diploid individuals. Generations are separate. There is no mutation or natural selection in the period under consideration. The n'th generation is designated Gn, the previous generation Gn-1, the following generation Gn+1, and so on. The probability that the two genes at the same locus in an individual of Gn are identical is designated CIn, where CI stands for 'coefficient of inbreeding'. (For my approach here it is not necessary to specify whether the genes are identical 'by descent'.) The probability that two randomly selected genes at the same locus in two different individuals of Gn are identical is designated CKn, where CK stands for 'coefficient of kinship'.

For the simplest case, consider a population of hermaphrodites which are capable of self-fertilisation and mate completely at random, including with themselves. (This would be approximately true of some marine invertebrates which release gametes into the water.) From the assumptions of random mating and non-selection it follows that any individual in Gn is equally likely, with probability 2/N, to be a parent of any individual in Gn+1 (since in a stable population each individual will on average be a parent of 2 of the N surviving offspring). It does not follow that, if we select at random an individual in Gn+1, and then select another, there is a probability of 2/N that the second individual will have the same father (or mother) as the first. For example, if each individual in Gn produced exactly 2 surviving offspring, the probability that a second randomly selected individual in Gn+1 had the same father (or mother) as the first would only be 1/(N-1). To get a probability of 2/N we require an additional assumption, which is technically satisfied by specifying that the number of offspring for individuals follows a Poisson distribution. (This assumption is mentioned by Maynard Smith but not by Falconer.)
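
The difference between fixed and Poisson-like family sizes can be checked by a small simulation. This is a hypothetical sketch of my own, not taken from Falconer or Maynard Smith: each of the N offspring has two 'parent slots', which are filled either by giving every parent exactly two slots (fixed family size) or by drawing every slot independently and uniformly from the N parents, which makes each family size Binomial(2N, 1/N), close to Poisson with mean 2 when N is large. Self-fertilisation is allowed in both schemes.

```python
import random

def share_parent_rate(n, trials, fixed_family_size, seed=0):
    """Estimate P(a random second offspring B also has parent X of a random offspring A).

    Offspring j occupies parent slots 2j and 2j + 1. With fixed_family_size=True
    every parent fills exactly two slots; otherwise each slot is drawn uniformly
    from the n parents (family sizes approximately Poisson with mean 2).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if fixed_family_size:
            slots = [parent for parent in range(n) for _ in range(2)]
            rng.shuffle(slots)
        else:
            slots = [rng.randrange(n) for _ in range(2 * n)]
        a = rng.randrange(n)
        x = slots[2 * a + rng.randrange(2)]   # one parent of A, chosen at random
        b = rng.randrange(n - 1)              # a different offspring B
        if b >= a:
            b += 1
        if x in (slots[2 * b], slots[2 * b + 1]):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n = 20
    print(share_parent_rate(n, 20000, True), 2 / (2 * n - 1))
    print(share_parent_rate(n, 20000, False), 1 - (1 - 1 / n) ** 2)
```

With N = 20 the fixed scheme should come out near 2/(2N - 1) ≈ 0.051, close to the 1/(N-1) quoted above, while the random scheme should come out near 1 - (1 - 1/N)^2 ≈ 0.098, close to 2/N.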

With these assumptions, it follows that CIn equals CKn. In the case of CIn, we select a gene at random in Gn, and then inquire whether the other gene at the same locus in that individual is identical. In the case of CKn, we select a gene at random in Gn, and then inquire whether another randomly selected gene at the same locus in a different randomly selected individual is identical to the first gene. But in both cases each gene is a copy of a gene taken absolutely at random from all the genes in Gn-1. The probabilities of identity are therefore the same, and CIn therefore equals CKn. By the same argument it follows that any two randomly selected distinct genes at a locus in Gn have the same probability of being identical, whether they are in the same or different individuals. If we call this probability CDn, we have CDn = CIn = CKn, for any value of n. But CIn can be broken down into two component probabilities. With probability 1/2N, the two genes at a locus in the same individual are copies of the very same gene in Gn-1, in which case they are certainly identical. In all other cases, therefore with probability 1-1/2N, they are copies of two distinct genes in Gn-1, in which case there is a probability CDn-1 that they are identical. But CDn-1 = CIn-1 (since the equality CDn = CIn applies for any value of n). The total probability CIn therefore comes to CIn = 1/2N + (1 - 1/2N)CIn-1. The coefficient of inbreeding in one generation is therefore derivable from the coefficient in the previous generation by a formula involving the addition of 1/2N. It can further be shown, with a little algebraic manipulation, that heterozygosis tends to decline by a factor of (1 - 1/2N) per generation (see Falconer p.64-5 for a proof).
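
The recursion can be iterated directly and compared with a simulation of binomial sampling. A minimal sketch, assuming the hermaphrodite case with selfing (so each new gene is drawn at random, with replacement, from the 2N genes of the previous generation) and starting from CI0 = 0; the simulation tracks the expected heterozygosity 2pq rather than counting heterozygous individuals, and the function names are my own.

```python
import random

def inbreeding_by_recursion(n_diploid, generations, ci0=0.0):
    """Iterate CIn = 1/2N + (1 - 1/2N) CIn-1, starting from CI0 = ci0."""
    step = 1 / (2 * n_diploid)
    ci = [ci0]
    for _ in range(generations):
        ci.append(step + (1 - step) * ci[-1])
    return ci

def mean_heterozygosity(n_diploid, p0, generations, replicates, seed=0):
    """Average of 2pq after binomial sampling of 2N genes for some generations."""
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    total = 0.0
    for _ in range(replicates):
        count = round(p0 * two_n)
        for _ in range(generations):
            p = count / two_n
            count = sum(1 for _ in range(two_n) if rng.random() < p)
        p = count / two_n
        total += 2 * p * (1 - p)
    return total / replicates

if __name__ == "__main__":
    n, t = 10, 10
    print(1 - inbreeding_by_recursion(n, t)[-1], (1 - 1 / (2 * n)) ** t)
    print(mean_heterozygosity(n, 0.5, t, 4000), 0.5 * (1 - 1 / (2 * n)) ** t)
```

Starting from CI0 = 0, 1 - CIt equals (1 - 1/2N)^t, and the simulated average of 2pq falls by the same factor per generation.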

If self-fertilisation is excluded, two genes in the same individual cannot be copies of the very same gene in the previous generation, so the analysis needs to be pushed further back. If mating between different individuals is completely random, including siblings, then CIn = CKn-1. If mating between siblings is excluded, but otherwise random, CIn = CKn-2, and so on. But it is always possible to express the 'coefficient of inbreeding' in one generation in terms of the coefficients in previous generations, and heterozygosis always tends to decline by a factor of (1 - 1/2N) per generation (assuming equal numbers of males and females).

The above argument, like Wright's own, measures the progress of genetic drift by the decline of heterozygosis and the associated increase in the coefficient of inbreeding. It should however be clear that this is not essential. If we wanted to study genetic drift in asexual haploid replicators, such as Y chromosomes, it would be possible to modify the derivation to use only coefficients of kinship, rather than inbreeding. More fundamentally, the process of genetic drift depends not on inbreeding but on the existence of variance in reproductive success. Some genes have no descendants, some have only one, and some have more than one. Over the course of time, more and more lines of descent die out, and the surviving genes are collectively descended from fewer and fewer original ancestors. In a sexually reproducing population this also leads to increased levels of inbreeding, in a broad sense. If there were no such variance in reproductive success - if every gene had exactly the same number of surviving 'offspring' - there would be no genetic drift. Among diploids, the variance in replication of individual genes is due to two factors: the variance in the number of surviving offspring, and the random allocation of genes to gametes in the process of meiosis. Even if every diploid individual had exactly the same number of surviving offspring, there would still be variance in the replication of individual genes for the second reason. As for the variance in the number of offspring, the assumption of a Poisson distribution is probably not unreasonable in many species, but there could be departures from it in both directions (i.e. either greater or smaller variance). There might also be different variance in the two sexes. 
For example, among animals like elephant seals, the variance among females might be rather small, because all females have a low but steady rate of reproduction, whereas among males the variance would be much higher, as many males have no offspring at all, while a few have a large number. Wright takes account of some of these factors in his discussions of 'effective population size'.
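
For the simplest of these factors, unequal numbers of breeding males and females, Wright's effective population size is Ne = 4NmNf/(Nm + Nf), and the rate of loss of heterozygosis is about 1/(2Ne) = 1/(8Nm) + 1/(8Nf), the same quantity (Nm + Nf)/8NmNf that appears in Note 3. The sketch below only covers this sex-ratio case, not extra variance in offspring number within a sex, which needs a different formula; the breeding numbers in the example are hypothetical.

```python
def effective_size_sexes(n_males, n_females):
    """Wright's effective size for unequal numbers of breeding males and females."""
    return 4 * n_males * n_females / (n_males + n_females)

def loss_rate(n_males, n_females):
    """Approximate loss of heterozygosis per generation: 1/(2Ne) = 1/(8Nm) + 1/(8Nf)."""
    return 1 / (2 * effective_size_sexes(n_males, n_females))

if __name__ == "__main__":
    # Hypothetical harem-breeding population: 10 breeding males, 190 breeding females
    print(effective_size_sexes(10, 190))   # 38.0, far below the 200 actual breeders
    print(loss_rate(10, 190))
```

With equal numbers of each sex (Nm = Nf = N/2) the formula gives Ne = N, and the loss rate reduces to the 1/2N discussed above.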

This note has only dealt with a few aspects of Wright's work on genetic drift. I have tried to identify the underlying assumptions and (in Note 3) to clarify Wright's most important derivation. None of this says anything one way or the other about the actual importance of genetic drift in evolution. What should be clear is that genetic drift is a weak force except in very small populations, since its effect is inversely proportional to population size. In large populations it would be overpowered by modest rates of selection or migration. (The other factor to consider is mutation, but except in large populations this is an even weaker force than drift, as mutation rates are typically of the order of only 1/100,000 per generation.) I hope to deal with some of these issues in further notes.

Note 1: Suppose we toss a single coin K times, where K is a large number. If the probability of heads is p, the total number of heads will be approximately Kp and the average number of heads per toss will be approximately Kp/K = p. But on each particular trial (the toss of a single coin) there can only be 1 or 0 heads, so we will have approximately Kp trials with the deviation value (1 - p), and K(1 - p) trials with the deviation value (0 - p) = - p. Using the abbreviation q for (1 - p), the variance of the number of heads for trials consisting of a single coin toss is therefore [Kpq^2 + Kqp^2]/K = pq^2 + qp^2 = pq(q + p) = pq. It may seem odd to speak of the variance of the number of heads in trials where there is only one coin per trial, but in principle it is legitimate, and it enables us easily to derive the variance of the number of heads where the trials involve N coins. Since the variance of the sum of a number of independent numerical values equals the sum of the variances of the values individually, the variance of the number of heads in N independent coin tosses, each with variance pq, is simply Npq.
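
Both steps of this note can be checked exactly with rational arithmetic. A small sketch, assuming nothing beyond independent tosses with a common probability p: the first function reproduces pq^2 + qp^2 for a single toss, and the second computes the variance of the number of heads directly from the binomial probabilities.

```python
from fractions import Fraction
from math import comb

def single_toss_variance(p):
    """Variance of the number of heads (0 or 1) in one toss: p*q^2 + q*p^2."""
    q = 1 - p
    return p * (1 - p) ** 2 + q * (0 - p) ** 2

def n_toss_variance(n, p):
    """Variance of the number of heads in n tosses, from the binomial probabilities."""
    q = 1 - p
    probs = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(probs))
    return sum((k - mean) ** 2 * w for k, w in enumerate(probs))

if __name__ == "__main__":
    p = Fraction(3, 10)
    print(single_toss_variance(p))   # 21/100, i.e. pq
    print(n_toss_variance(12, p))    # 63/25, i.e. 12pq
```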

Note 2: The average proportion of heads per trial of N coin tosses, each with probability p, is in the long term p. If X is the number of heads in any particular trial of N coins (where X is a variable), the deviation values of the proportions will be of the form X/N - p = (X - Np)/N, and the variance of the proportions in K trials will be S[((X - Np)/N)^2]/K = S[(X - Np)^2]/KN^2, where S denotes summation over the K trials. But S[(X - Np)^2]/K is the variance of the number of heads, which has been proved equal to Npq, so the variance of the proportion is Npq/N^2 = pq/N.
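
The result can also be seen empirically by mimicking the K trials directly. A minimal Monte Carlo sketch, assuming a fair pseudo-random generator; as in the note, deviations are measured from the long-run mean p rather than from the sample mean.

```python
import random

def proportion_variance(n_coins, p, k_trials, seed=0):
    """Estimate S[(X/N - p)^2]/K over K simulated trials of N coin tosses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(k_trials):
        x = sum(rng.random() < p for _ in range(n_coins))   # heads in this trial
        total += (x / n_coins - p) ** 2
    return total / k_trials

if __name__ == "__main__":
    print(proportion_variance(25, 0.3, 20000), 0.3 * 0.7 / 25)   # both near 0.0084
```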

Note 3: This is a commentary on pages 108-110 of ESP, which reprints pages 107-109 of the original paper EMP (the near identity of pagination is just a coincidence). I will mainly be concerned with page 109 of ESP, where Wright derives his fundamental results for the decline of heterozygosis. In following the derivation it is necessary to refer back frequently to the definitions at the bottom of page 108.

Wright assumes that the sexes are separate (so there is no self-fertilisation) but that mating is otherwise completely random, including between siblings. He assumes that there are Nm breeding males and Nf breeding females. With random mating, he states that the proportion of matings between full siblings is 1/NmNf. This evidently assumes that there is a probability of 1/Nm that two mates have the same father, and an independent probability of 1/Nf that they have the same mother (note that m and f stand for male and female, not mother and father). This is actually a strong assumption, which ought to be clearly stated. It assumes (a) that the number of offspring of individuals follows a Poisson distribution (or something similar) and (b) that parents have male and female offspring in the same proportions as in the population generally. This is not necessarily true: for example if some parents had a strong bias towards producing male or female offspring, the probability of mating between siblings would be reduced. (Wright does discuss some of these considerations in the section on 'The Population Number' at pp.111-12 of ESP.)

Wright then gives the proportion of matings between half siblings, and between all less closely related individuals. These depend on the same assumptions as for full siblings.

He then gives a formula for M, the correlation between mates in the current generation. Note that the formula is of the form a'^2b'^2[Z], where Z is a complicated expression in square brackets. From the definitions on p.108 we have a'^2b'^2 = [1/2(1 + F')][(1 + F'')/2], so we have M = [1/2(1 + F')][(1 + F'')/2][Z]. The expression Z can be derived by Wright's method of path analysis. The first component of Z deals with the case of mating between full siblings. If we label the siblings A and B, and their parents C and D, we have two 'direct' paths, ACB and ADB, and two 'indirect' paths, ACDB and ADCB, which involve the correlation M' between mates in the previous generation. Hence the coefficient (2 + 2M') for the first component. For half siblings A and B, there is one shared parent C and two non-shared parents D and E, so there is one direct path, ACB, and the three indirect paths ADCB, ADEB, and ACEB, giving the coefficient (1 + 3M'). For unrelated mates A and B, with the non-shared parents C, D, E and G (to avoid using F, which is already in use), we have no direct paths and four indirect paths, ACGB, ACEB, ADEB, and ADGB, giving the coefficient 4M'.

Next Wright derives an expression for F, the correlation between uniting gametes in the current generation. Here we must note from p.108 that F = b^2M, and b^2 = (1 + F')/2. Using the expression M = [1/2(1 + F')][(1 + F'')/2][Z], we therefore have F = [(1 + F')/2][1/2(1 + F')][(1 + F'')/2][Z] = [(1 + F'')/8][Z]. With a little manipulation, and using the full expression for Z, this can be put in the form F = (1 + F'')[Nm + Nf - M'Nm - M'Nf + 4M'NmNf]/8NmNf. But now we should note that M' is the correlation between mates in the previous generation. We can therefore adapt the equation F = b^2M to get the corresponding equation for the previous generation, i.e. F' = b'^2M'. But b'^2 = (1 + F'')/2, so F' = [(1 + F'')/2]M', and therefore M' = 2F'/(1 + F''). Substituting 2F'/(1 + F'') for M' in the equation F = (1 + F'')[Nm + Nf - M'Nm - M'Nf + 4M'NmNf]/8NmNf, it follows by some grinding but essentially routine algebra that F = Q, where Q is the expression on the right of the second equation on page 109 (it reduces to F' + [(Nm + Nf)/8NmNf](1 - 2F' + F'')). Then using the definition of P, P', etc, in terms of F, F', etc, the third equation also follows by routine algebra.
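
The algebra can be checked with exact fractions. In this sketch the proportions of full-sib, half-sib and unrelated matings are taken from the 1/Nm and 1/Nf assumption discussed above (giving 1/NmNf, (Nm + Nf - 2)/NmNf and (Nm - 1)(Nf - 1)/NmNf), Z is assembled from the path coefficients (2 + 2M'), (1 + 3M') and 4M', and the result is compared with the reduced form F' + [(Nm + Nf)/8NmNf](1 - 2F' + F''). The function names are my own; f1 and f2 stand for F' and F''.

```python
from fractions import Fraction as Fr

def mate_class_proportions(nm, nf):
    """Proportions of full-sib, half-sib and unrelated matings under random mating."""
    full = Fr(1, nm * nf)
    half = Fr(nm + nf - 2, nm * nf)
    unrelated = Fr((nm - 1) * (nf - 1), nm * nf)
    return full, half, unrelated

def f_from_paths(nm, nf, f1, f2):
    """F = [(1 + F'')/8] Z, with M' = 2F'/(1 + F'') inserted into Z."""
    m1 = 2 * f1 / (1 + f2)
    full, half, unrelated = mate_class_proportions(nm, nf)
    z = full * (2 + 2 * m1) + half * (1 + 3 * m1) + unrelated * 4 * m1
    return (1 + f2) / 8 * z

def f_reduced(nm, nf, f1, f2):
    """F' + [(Nm + Nf)/8NmNf](1 - 2F' + F'')."""
    return f1 + Fr(nm + nf, 8 * nm * nf) * (1 - 2 * f1 + f2)

if __name__ == "__main__":
    print(f_from_paths(10, 10, Fr(1, 10), Fr(1, 20)), f_reduced(10, 10, Fr(1, 10), Fr(1, 20)))
```

Because the arithmetic is exact, any disagreement would show up as unequal fractions rather than rounding noise.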

This leaves the final death-defying leap to the fourth equation. This is not helped by the puzzling statement that we can equate P/P' to P/P''. This would imply that the proportional change per generation was not just constant but zero, and P/P'' must surely be a misprint for P'/P''. (The fact that this horrible error is not corrected or commented on in the ESP reprint leaves me wondering how closely Provine, as editor, has followed the details of Wright's text.) But even with this correction, it is far from obvious how Wright derives his fourth equation. I had given up hope of solving it until I was reading volume 2 of EGP, and found a discussion of the simpler case of random mating hermaphrodites, which fills in a few gaps in the derivation (see EGP vol 2, p.194-5). First, it confirms the suspicion that P/P'' should be P'/P''. Second, it shows (or at least hints) how the problem can be reduced to a quadratic equation. Taking these hints, we can apply them to the fourth equation on p.109. First, rearrange and simplify the third equation to get P - P'[1 - (Nm + Nf)/4NmNf] - P''(Nm + Nf)/8NmNf = 0. Then divide through by P'' to get P/P'' - (P'/P'')[1 - (Nm + Nf)/4NmNf] - (Nm + Nf)/8NmNf = 0. But by assumption P/P' = P'/P'', so P/P'' = (P'/P'')^2 = (P/P')^2. We can therefore treat the equation as a quadratic of the form ax^2 + bx + c = 0, with x = P/P', a = 1, b = -[1 - (Nm + Nf)/4NmNf] and c = -(Nm + Nf)/8NmNf. This can be solved by the standard method to get (as the larger of the two roots) P/P' = (1/2)[1 - (Nm + Nf)/4NmNf] + (1/2)[root(1 + [(Nm + Nf)/4NmNf]^2)]. This is nearly Wright's fourth equation. For the final step, we take deltaP to mean P - P', so that - deltaP/P' = - (P/P' - 1). We therefore need only subtract 1 from the expression (1/2)[1 - (Nm + Nf)/4NmNf] + (1/2)[root(1 + [(Nm + Nf)/4NmNf]^2)], and then reverse the sign, to get Wright's fourth equation.
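
A quick numerical check of the quadratic step: iterate the third equation in the rearranged form P = P'[1 - (Nm + Nf)/4NmNf] + P''(Nm + Nf)/8NmNf from P'' = P' = 1 (no inbreeding in the two founding generations, an assumption of this sketch) and watch the ratio P/P' settle on the larger root. The other root is negative and small, so its influence dies out within a few generations. Function names are my own.

```python
import math

def p_ratio_closed_form(nm, nf):
    """Larger root: (1/2)[1 - a] + (1/2)root(1 + a^2), with a = (Nm + Nf)/4NmNf."""
    a = (nm + nf) / (4 * nm * nf)
    return 0.5 * (1 - a) + 0.5 * math.sqrt(1 + a * a)

def p_ratio_by_iteration(nm, nf, generations=200):
    """Iterate P = P'(1 - 2k) + kP'', with k = (Nm + Nf)/8NmNf, and return P/P'."""
    k = (nm + nf) / (8 * nm * nf)
    p_prev2, p_prev = 1.0, 1.0          # P'' and P' in the founding generations
    for _ in range(generations):
        p = p_prev * (1 - 2 * k) + k * p_prev2
        p_prev2, p_prev = p_prev, p
    return p_prev / p_prev2

if __name__ == "__main__":
    for nm, nf in [(2, 2), (5, 20), (50, 50)]:
        print(nm, nf, p_ratio_by_iteration(nm, nf), p_ratio_closed_form(nm, nf))
```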

After this tortuous derivation, the discussion on page 110 of ESP is relatively plain sailing. The only slight puzzle is how Wright gets the approximation at the top of the page. I deduce that he uses the fact that when a is a small fraction, root(1 + a) is approximately equal to 1 + a/2. Taking [(Nm + Nf)/4NmNf]^2 as a, and grinding through the algebra, Wright's approximation can then be verified.
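
The size of the approximation can be checked numerically. Writing A = (Nm + Nf)/4NmNf, the exact rate is -deltaP/P' = (1/2)[1 + A - root(1 + A^2)]. Using root(1 + a) ≈ 1 + a/2 with a = A^2 gives A/2 - A^2/4, and dropping the A^2 term leaves A/2 = 1/(8Nm) + 1/(8Nf), the familiar form. I have not checked which of these is exactly the form printed at the top of p.110; the function names are my own.

```python
import math

def decline_exact(nm, nf):
    """-deltaP/P' = 1 - P/P', from the larger root of the quadratic."""
    a = (nm + nf) / (4 * nm * nf)
    return 0.5 * (1 + a - math.sqrt(1 + a * a))

def decline_second_order(nm, nf):
    """Using root(1 + A^2) ~ 1 + A^2/2: -deltaP/P' ~ A/2 - A^2/4."""
    a = (nm + nf) / (4 * nm * nf)
    return a / 2 - a * a / 4

def decline_first_order(nm, nf):
    """Dropping the A^2 term: A/2 = 1/(8Nm) + 1/(8Nf)."""
    return 1 / (8 * nm) + 1 / (8 * nf)

if __name__ == "__main__":
    for nm, nf in [(20, 20), (10, 90), (100, 400)]:
        print(nm, nf, decline_exact(nm, nf), decline_second_order(nm, nf), decline_first_order(nm, nf))
```

Even for populations of a few dozen breeders, the simple form 1/(8Nm) + 1/(8Nf) is within a couple of percent of the exact value.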

Overall, as often with Wright's work, I am torn between admiration for his ingenuity and frustration at his obscurity.


D. S. Falconer: Introduction to Quantitative Genetics, 3rd edn., 1989. (The 4th edn., by Falconer and Mackay (1995), appears to be the same so far as its treatment of genetic drift is concerned.)

John Maynard Smith: Evolutionary Genetics, 1989.

William B. Provine: Sewall Wright and Evolutionary Biology, 1986.

Sewall Wright: Evolution: Selected Papers, edited and with Introductory Materials by William B. Provine, 1986. (ESP)

Sewall Wright: 'Evolution in Mendelian Populations', Genetics, 16, 1931, pp.97-159. (EMP; reprinted at pp.98-160 of ESP.)

Sewall Wright: Evolution and the Genetics of Populations, 4 vols., 1968-1978. (EGP)


Friday, April 11, 2008

Notes on Sewall Wright: the Measurement of Kinship   posted by DavidB @ 4/11/2008 03:27:00 AM

Most people with an interest in genetics will be aware that Sewall Wright made major contributions to the theory of kinship or relatedness. Fewer people will have any direct knowledge of his work on the subject, and those who do consult his writings may find them difficult. The present note is intended to help those who want to tackle Wright at first hand. See also this evaluation by the geneticist W. G. Hill.

Most of Wright's key ideas on the subject were first presented in a 5-part paper on 'Systems of Mating' (SM) in 1921. All 5 parts can be found on the internet with a little searching. SM1, which is the most fundamental, is here, and SM5, which contains a relatively un-technical summary, is here.

Rather than go straight to Wright's own approach, I will begin by comparing and contrasting it with that of the French geneticist Gustave Malecot, based on the concept of Identity by Descent. Malecot first introduced his methods around 1940, and since then they have supplanted Wright's approach, to the extent that Wright's own methods have been almost forgotten. What is presented in textbooks as due to Wright is often in reality due to Malecot. The two approaches do have some similarities, and in simple cases they lead to the same quantitative results, but there are also some important differences.

Malecot and Identity by Descent

In Malecot's system two genes at the same locus, in the same or different individuals, are defined as Identical by Descent (IBD) if they are both descended from the very same individual ancestral gene, without either of them undergoing mutation in the interim. The relatedness between two individuals can be measured, roughly speaking, by calculating the probability that two genes at the same locus in the two individuals are IBD. To do this it is necessary first to identify all the distinct paths of descent connecting the two individuals through a common ancestor, and then to calculate the probability that the same gene will have descended to both individuals from that ancestor along any given path. Since all such paths of descent are mutually exclusive (though portions of them may overlap), the resulting probabilities can be added together to give the total probability that a given gene in the two individuals is IBD. To take a simple case, consider two individuals (full siblings) who have both parents in common. I assume that the parents are not related to each other or inbred. If we select a (diploid autosomal) gene at random from one sibling, there is a probability of one-half that it comes from the mother, and, if it does, a probability of one-half that the same gene has descended from the mother to the other sibling. This gives a compound probability of one-quarter that the second sibling has received a gene from the mother that is IBD to the selected gene in the first sibling. There is likewise a probability of one-quarter that the second sibling has received an IBD copy from the father. The total probability is therefore one-half, which is often called the Coefficient of Relationship or Relatedness between full siblings. If the parents are themselves related or inbred (i.e. descended from one of their own ancestors by more than one possible path), additional paths of descent need to be taken into account. 
Since there are two genes at the relevant locus in the second sibling, there is a probability of one-quarter (one-half times one-half) that a particular one of these genes, chosen at random, is IBD to the selected gene in the first sibling. This is usually known as their Coefficient of Kinship. If a male and female with a non-zero Coefficient of Kinship mate together, there is a non-zero probability that any offspring will inherit two genes that are IBD to each other. This is usually known as the offspring's Coefficient of Inbreeding, and a little consideration shows that it is equal to the Coefficient of Kinship of the parents.
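
For pedigrees more complicated than a single pair of siblings, the same probabilities can be computed with the standard recursive method for kinship coefficients. A minimal sketch, assuming a small hypothetical pedigree in which the founders are unrelated and not inbred, and in which parents are listed before their children; the recursion always works through the later-listed individual, who cannot be an ancestor of the other.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical pedigree: name -> (father, mother); founders have (None, None).
PEDIGREE = {
    "C": (None, None), "D": (None, None),   # unrelated, non-inbred founders
    "A": ("C", "D"), "B": ("C", "D"),       # full siblings
    "E": (None, None),
    "H": ("C", "E"),                        # half sibling of A and B through C
    "X": ("A", "B"),                        # offspring of a full-sib mating
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}

@lru_cache(maxsize=None)
def kinship(x, y):
    """Probability that genes drawn at random from x and from y at a locus are IBD."""
    if x is None or y is None:
        return Fraction(0)
    if x == y:
        father, mother = PEDIGREE[x]
        return Fraction(1, 2) * (1 + kinship(father, mother))
    if ORDER[x] < ORDER[y]:
        x, y = y, x                          # recurse through the later individual
    father, mother = PEDIGREE[x]
    return Fraction(1, 2) * (kinship(father, y) + kinship(mother, y))

def inbreeding(x):
    """Coefficient of inbreeding of x = coefficient of kinship of its parents."""
    father, mother = PEDIGREE[x]
    return kinship(father, mother)

if __name__ == "__main__":
    print(kinship("A", "B"), 2 * kinship("A", "B"))   # 1/4 and relatedness 1/2
    print(kinship("A", "H"), inbreeding("X"))         # 1/8 and 1/4
```

The coefficient of relationship in the sense used above is twice the coefficient of kinship, and inbreeding("X") simply asks for the kinship of X's parents.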

A point left vague in some accounts is how far back the paths of ancestry can or should be traced. There would be little point in tracing them back so far that the gene would probably have mutated along the way to one or both descendants, but with a mutation rate of only about 1 in 100,000 per generation this is not a major constraint. In practice, ancestry is seldom traced back beyond five or six generations, as the probability of Identity by Descent along any given path going beyond this is very small (less than 1 in 1,000), and the aggregate probability along all such paths will usually be much the same for all individuals in the same population.
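
The '1 in 1,000' figure follows from the familiar path-counting rule: a path through a common ancestor who is g1 generations above one individual and g2 above the other contributes (1/2)^(g1 + g2 + 1)(1 + FA) to the coefficient of kinship, where FA is the ancestor's own coefficient of inbreeding, and twice that to the coefficient of relationship. A small sketch, assuming a non-inbred ancestor by default:

```python
from fractions import Fraction

def path_contribution(g1, g2, f_ancestor=Fraction(0)):
    """Kinship contributed by one path through a common ancestor g1 and g2 generations back."""
    return Fraction(1, 2) ** (g1 + g2 + 1) * (1 + f_ancestor)

if __name__ == "__main__":
    for g in range(1, 8):
        k = path_contribution(g, g)
        print(g, k, 2 * k)   # kinship scale and relationship scale
```

Full siblings have two such paths with g1 = g2 = 1, giving 2 × 1/8 = 1/4; a path reaching six generations back on each side contributes only 1/8192 on the kinship scale and 1/4096 on the relationship scale.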

Wright and the Correlation between Relatives

None of this is directly due to Sewall Wright. He does use path diagrams similar to those of Malecot (who was inspired by Wright's work), but the quantities measured along the paths are not probabilities of Identity by Descent but path coefficients. As discussed in my note on Wright's method of Path Analysis, the correlation between two variables can be derived from the path coefficients along the paths connecting them. The measures of relationship between two individuals in Wright's system are always in principle correlation coefficients. In simple cases (no inbreeding, no dominance, no assortative mating, and so on) they are quantitatively the same as Malecot's measures, but in principle they are quite different. Three important differences should be emphasised:

a) like all correlation coefficients, Wright's measures of relationship are valid only relative to a specified statistical population. The coefficient of relationship between two individuals may well vary according to the specified population; e.g. it may be different if the specified population is an ethnic group to which the individuals belong as compared with a population comprising several ethnic groups.

b) unlike probabilities, which can never be negative, a correlation coefficient can be either positive or negative. In fact, although Wright seldom discusses negative relationships, within any specified population they are in principle as common as positive relationships.

c) relative to any specified population, the correlation between two randomly selected individuals from that population is zero (apart from sampling error). This point has sometimes been overlooked, for example in discussions of Hamilton's Rule. The 'r' in Hamilton's Rule should be a regression coefficient rather than a correlation coefficient (as Hamilton realised around 1970 - see Narrow Roads of Gene Land, vol. 1, p.179), but the same principle applies: the regression of one randomly selected individual on another randomly selected individual, relative to the population from which they are randomly selected, is approximately zero. Hamilton's Rule therefore predicts that altruistic behaviour will not be directed randomly towards all members of the relevant population, though it may be difficult to decide which population is 'relevant' for the purpose.
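
Points (a) and (c) can be illustrated with a simulation. The sketch below is my own illustration, not Wright's: two randomly mating subpopulations with hypothetical allele frequencies 0.2 and 0.8, each individual scored by its number of A alleles, and pairs of individuals drawn independently from the same subpopulation. Relative to one subpopulation the pairs are uncorrelated; relative to the pooled population the very same kind of pairs show a substantial positive correlation, because members of a pair share their subpopulation's mean.

```python
import random

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def genotype_value(rng, p):
    """Number of copies (0, 1 or 2) of allele A in a Hardy-Weinberg individual."""
    return int(rng.random() < p) + int(rng.random() < p)

def random_pairs(rng, p, n_pairs):
    """Pairs of individuals drawn independently from one randomly mating subpopulation."""
    return [(genotype_value(rng, p), genotype_value(rng, p)) for _ in range(n_pairs)]

def demo_reference_population(p1=0.2, p2=0.8, n_pairs=20000, seed=0):
    rng = random.Random(seed)
    sub1 = random_pairs(rng, p1, n_pairs)
    sub2 = random_pairs(rng, p2, n_pairs)
    # Reference population: subpopulation 1 only
    within = pearson([a for a, _ in sub1], [b for _, b in sub1])
    # Reference population: both subpopulations pooled (pairs still share a subpopulation)
    pooled_pairs = sub1 + sub2
    pooled = pearson([a for a, _ in pooled_pairs], [b for _, b in pooled_pairs])
    return within, pooled

if __name__ == "__main__":
    print(demo_reference_population())
```

With these frequencies the within-subpopulation correlation should be close to zero and the pooled correlation close to 0.36/0.68 ≈ 0.53 (between-group variance divided by total variance).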

I emphasise these points partly because Wright himself does not. They are implicit in the use of correlation coefficients, but Wright seldom explicitly mentions them. An exception is in SM5, where Wright points out that the correlation between relatives within an inbred line will be small although relative to the wider population it is large. Some more general statements are made in Wright's late work on Evolution and the Genetics of Populations (EGP). In volume 2 of that work (1969) he says that 'In a panmictic [randomly mating] population, there is no correlation between homologous genes of uniting gametes relative to the gene frequencies in the whole population. On splitting up into small lines which breed within themselves, a correlation between uniting gametes is to be expected.... The relativity referred to above has sometimes been overlooked or misinterpreted. A correlation coefficient is, of course, always relative. It is a property of the population as well as the two variables....' (pp.175-77.) Wright goes on to discuss Malecot's method of Identity by Descent. He accepts that it is a useful technique and often leads to the same results as his own, but argues that his own approach is more general and in particular that his own concept of relationship allows for negative values.

Wright is often vague about the population in which the correlations are to be measured, leaving this to be inferred from the context. Sometimes the relevant population is the entire generation to which the correlated individuals belong, sometimes it is a defined sub-population, but sometimes it seems to be a 'foundation stock' from which they are descended. This is problematic, as it seems to require a correlation between individuals relative to the means and standard deviations in a population to which they do not themselves belong. I will discuss this further in dealing with Wright's work on inbreeding and genetic diversity.

Correlations between notional values

Wright was not the first person to work on the correlation between relatives. Unknown to Wright, R. A. Fisher had already treated the subject at length, by different methods, in 1918. In fact, the subject goes back at least to 1904, when Karl Pearson considered the correlations to be expected on the hypothesis of Mendelian dominant inheritance. He found that (on certain simplified assumptions) the correlation between parent and offspring would be only one-third, rather than the correlation of about one-half usually found in empirical data on human traits. Pearson considered this a serious objection to the generality of the Mendelian theory. One of the aims of Fisher's 1918 paper was to show that, when complications such as assortative mating were taken into account, the data were consistent with widespread Mendelian dominance.

The idea of a correlation between relatives is intelligible enough when the correlation involves continuous phenotypic traits such as height, but it is more obscure when the traits are purely qualitative, or when the correlation is not between phenotypes but between gametes or genotypes. If there are varying types of gametes or genotypes (e.g. different alleles at a locus) in the population, they may be said to be positively associated if the same types tend to occur together, more often than would be expected by chance, in the same individual or in certain pairs of individuals. There are several useful measures of the 'association' of qualitative variables (see any edition of G. U. Yule's Introduction to the Theory of Statistics). However, Wright (like his predecessors) preferred to use the Pearson product-moment correlation coefficient. To obtain a Pearson correlation coefficient in the case of purely qualitative variables, such as differences between alleles, it is necessary to give the correlated items notional algebraic or numerical values. Since these are to some extent arbitrary, it might be feared that this would introduce an arbitrary element into the results, but in the cases of interest the arbitrary values cancel out and leave the correlation coefficient itself unaffected.

The procedure can be illustrated by the problem of dominance, which is treated by Wright in SM1, pages 117-8. If we assign the homozygotes AA and aa the arbitrary values 1 and 0 respectively, in the case of complete dominance of A, the heterozygote Aa will have the value 1, while in the case of zero dominance it will have the value 1/2. Each individual in the population will therefore have a pair of numerical values, under the assumptions of dominance and non-dominance respectively. For homozygotes the two values will be the same but for heterozygotes they will be different. If the frequencies of the various genotypes in the population are specified, the means and standard deviations of the numerical values can be calculated, and the covariance and the correlation coefficient between the pairs of values can then be derived in the usual way. The correlation coefficient will be unaffected if one or both variables are systematically multiplied by a positive constant or added to a constant (see Notes on Correlation, Part 2). But this entails that we would get the same correlation if we chose any other set of arbitrary values as alternatives to 0 and 1, provided the value of the heterozygote in the absence of dominance is half-way between that of the homozygotes. We can therefore obtain a quite general result for the correlation between the values of genotypes with and without dominance. (Of course, correlations could be calculated in a similar way on different assumptions about dominance, e.g. for partial dominance.) It can be shown by this method that Wright's results at the bottom of page 117 are correct, though I do not see how Wright derived his particular formulae, which are far from obvious. [As I have mentioned elsewhere, the equation p = root-uv appears to be a printing error or slip of the pen, as under Hardy-Weinberg equilibrium it should be p = 2root-uv. 
In fact, I now find that this error was listed in the printed Corrigenda to the relevant volume of Genetics but has not been corrected in the pdf copy.]
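The calculation just described can be checked with a short script. This is a minimal sketch, assuming Hardy-Weinberg proportions with an allele frequency q for A (q and the function names are my own illustrative choices, not Wright's notation). It computes the correlation between the genotypic values with and without complete dominance, and confirms that replacing 0 and 1 by another pair of homozygote values, with the non-dominant heterozygote kept half-way between them, leaves the correlation unchanged.

```python
import math

def weighted_corr(freqs, xs, ys):
    """Pearson correlation of paired values xs, ys occurring with frequencies freqs."""
    mx = sum(f * x for f, x in zip(freqs, xs))
    my = sum(f * y for f, y in zip(freqs, ys))
    cov = sum(f * (x - mx) * (y - my) for f, x, y in zip(freqs, xs, ys))
    vx = sum(f * (x - mx) ** 2 for f, x in zip(freqs, xs))
    vy = sum(f * (y - my) ** 2 for f, y in zip(freqs, ys))
    return cov / math.sqrt(vx * vy)

def dominance_corr(q, aa_value=0.0, AA_value=1.0):
    """Correlation between genotypic values without and with complete dominance of A.

    Genotypes are listed as AA, Aa, aa in Hardy-Weinberg proportions for
    allele frequency q (a hypothetical parameter for illustration).
    """
    freqs = [q * q, 2 * q * (1 - q), (1 - q) ** 2]
    no_dom = [AA_value, (AA_value + aa_value) / 2, aa_value]  # heterozygote half-way
    full_dom = [AA_value, AA_value, aa_value]                 # heterozygote like AA
    return weighted_corr(freqs, no_dom, full_dom)

q = 0.5
p = 2 * q * (1 - q)  # proportion of heterozygotes
print(dominance_corr(q), math.sqrt(1 / (1 + p)))       # both about 0.8165
print(dominance_corr(q, aa_value=3.0, AA_value=7.0))    # same correlation again
```

For q = 1/2 (so p = 1/2) the result agrees with root-[1/(1+p)], the random-mating value discussed later in this post; other values of q can be passed in to see how the correlation depends on allele frequency.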

Systems of Mating I
I will conclude this note with some comments on Wright's most important paper on the subject: the first in the series on Systems of Mating (SM1).

Here Wright uses his method of path analysis to derive the correlation between relatives. In principle the ultimate result is a correlation between phenotypes, which should take account of all environmental and genetic influences, including dominance, epistasis, assortative mating, and shared environment (if any).

While the method of path analysis has some advantages for this purpose, which Wright emphasised, it also has some disadvantages. The variability among individuals is partly due to the chance effects of genetic recombination and segregation. It is therefore necessary for the path diagrams to contain an independent variable designated as 'chance' (see the diagram in SM1, p.116), which may be formally justified but still looks odd. More importantly, the method of path analysis assumes that the effects of causal influences can be simply added together. In genetics this is not always the case, as the effects of epistasis and dominance are not purely additive. Wright therefore excludes epistasis from his model 'for the present' (p.117). He does attempt to incorporate an adjustment for the effects of dominance, but this is not entirely successful. For the time being I will assume that the method is confined to the additive effects of genes.

It is not always clear what is the relevant population for the purposes of the correlations, especially as more than one generation of individuals is often involved in the correlations. Wright seems to assume (see the beginning of SM4) that in the absence of selection the proportions of different alleles in the total population will be constant, but in a finite population this cannot be strictly true, as there will be fluctuation due to genetic drift. Perhaps Wright is assuming for the purpose that the population can be regarded as indefinitely large. In this case it is legitimate to assume that gene frequencies in the absence of selection are constant. More seriously, it is not clear whether the intended reference population is the current population of each generation, the 'foundation stock' from which they are descended, or some combination of the two. Wright's reference to 'random mating' at the top of page 119 of SM1 would not make much sense if the intended reference population is the current one (of the parents), since f' would then always be zero.

Each path of descent is built up from the links between parent and offspring, so this relationship is especially important. In Wright's analysis (pages 118-20) the direct relationship between parent and offspring can be analysed as a path with the following steps: parent's phenotype - parent's genotype - gamete (egg or sperm) - offspring's genotype - offspring's phenotype. (If the offspring's two parents have a non-zero correlation, an indirect path via the other parent also needs to be considered.) The path coefficients along the direct path from parent to offspring can be represented in the form hbah, where each h represents the correlation between phenotype and genotype, for the parent and the offspring respectively (the two values may be different). This correlation coefficient can be considered a measure of broad heritability, that is, the extent to which the individual's phenotype is determined by its genotype. Its square, h^2, measures the proportion of phenotypic variance accounted for by genetic variance. This is historically the origin of the familiar use of h^2 to represent heritability. It should however be noted that Wright's usage is not quite the same as the modern one. In modern usage h^2 usually stands for narrow or additive heritability, measured by the extent to which the offspring predictably resemble the parents, whereas Wright's h^2 corresponds more closely to broad heritability. The key equation (p.116) is h^2 + d^2 + e^2 = 1, where h stands for all aspects of genetic heredity, and e and d stand respectively for predictable effects of the environment and random fluctuations in development.

The coefficients a and b are the path coefficients representing, respectively, the contribution of the gamete (egg or sperm) to the variance in the genotype of the offspring, and the contribution of the parental genotype to the variance in the gametes. As none of these entities have a measurable phenotypic value, it is necessary to assume that they have arbitrary algebraic or numerical values, in the way discussed above. Wright's derivation of the values of a and b (SM1, pp.118-19) is particularly important, and needs to be carefully studied. Unfortunately it is not easy to follow. I would offer two tips. First, it is essential to refer frequently to the path diagram on page 116, without which the derivation would be unintelligible. Second, Wright does not explain why pG.H'' = rG.H'', which is crucial to the validity of the proof. I think it follows from the fact that the only causal path from the parental genotype to the gamete is the direct path pG.H''. [Added: having written this, I am pleased to find that Wright gives this explanation in another article.]

It should be noted that if the parents are unrelated and not inbred, a and b are both equal to root-1/2, so the product ab along the path from parent to offspring in this case equals one-half, as in Malecot's method.
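The arithmetic of tracing paths between relatives can be sketched as follows. This is a minimal illustration, assuming unrelated, non-inbred parents (so a = b = root-1/2 on every link), complete additive heritability by default, and the same h for every individual; the function names and the way paths are listed by their number of generations are my own hypothetical choices. Each parent-offspring link traversed contributes the factor ab = 1/2, and the correlation between two relatives is the sum over the paths through their common ancestors.

```python
import math

A = B = math.sqrt(0.5)  # path coefficients a and b for unrelated, non-inbred parents

def path_value(generations, h=1.0):
    """Product of coefficients along one path of descent between two relatives.

    `generations` counts the parent-offspring links on the path; the factor h
    appears once at each end (phenotype to genotype of each relative).
    """
    return h * (A * B) ** generations * h

def relatives_corr(path_lengths, h=1.0):
    """Sum of path values over the distinct paths through common ancestors."""
    return sum(path_value(n, h) for n in path_lengths)

print(relatives_corr([1]))         # parent-offspring: about 0.5
print(relatives_corr([2, 2]))      # full sibs, one path via each parent: about 0.5
print(relatives_corr([2]))         # half sibs: about 0.25
print(relatives_corr([4, 4]))      # full first cousins, two common grandparents: about 0.125
print(relatives_corr([1], h=0.8))  # parent-offspring with h = 0.8: about 0.32
```

With h = 1 these figures coincide with Malecot's coefficients of relationship; a value of h below 1 simply scales every result by h^2. The sketch ignores dominance, as the text does at this stage.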

It may perhaps be felt that Wright's derivation of the path coefficient b is a trick with smoke and mirrors. It is mathematically valid, but Wright's claim that 'in a sense, it is legitimate to reverse the arrows....' invites the response that in another sense it is not legitimate, since there is no causal influence from the gametes back to the gametocyte. This part of the proof therefore goes against the spirit if not the mathematical letter of path analysis.

At the top of page 120 Wright explains, very tersely, how correlations between relatives can be derived from the path coefficients. Again, it should be noted that in simple cases, and with perfect additive heritability, the results are the same as Malecot's. Wright then attempts to take account of dominance. As noted above, on pages 117-18 of SM1 Wright gives formulae for the correlation between genotypic values with and without dominance. In the standard case of random mating the correlation comes out at root-[1/(1+p)], where p is the proportion of heterozygotes in the population. To adjust the correlations between relatives to allow for dominance, Wright multiplies them by 1/(1+p). He does not explain the logic behind this, but I think it is that each of the two correlated relatives has a genotypic value without dominance, which is the basis for the original correlation, and that these values can each be multiplied by root-[1/(1+p)] to give a typical adjusted correlation between the values with dominance. The effect is to reduce the correlation between the individuals by the factor 1/(1+p). It may perhaps be wondered why only the two individuals at each end of the chain, and not the intermediate individuals, have their values adjusted. I think the explanation is that dominance is essentially an effect on phenotypes rather than genotypes, and in calculating the correlation between the individuals at the ends of the chain we need not take account of dominance effects on intermediate phenotypes any more than we need take account of environmental effects on them, since these do not affect the path coefficients along the chain.

Unfortunately Wright discovered, after reading Fisher's 1918 paper, that except in the case of half-siblings his own treatment of dominance effects was invalid, and in a footnote to his famous 1931 paper on 'Evolution in Mendelian Populations' he withdrew it. His original method therefore never satisfactorily covered epistasis and dominance. He later attempted to incorporate a revised treatment of dominance in his method of path analysis, but the result was very complicated. [See EGP vol. 2, pp. 435-6.] In this area Fisher's Analysis of Variance has been more generally used. The method of path diagrams remains very useful for the analysis of relationships, but the paths are now usually interpreted in Malecot's fashion as probabilities of Identity by Descent, and not as correlations.

The Problem of Negative and Zero Correlations
I emphasised earlier that in Wright's system the correlations between relatives, and therefore the measures of relatedness, can be zero or even negative. Yet it seems that Wright's actual procedures for measuring relatedness, by tracing path coefficients back through common ancestors, can only produce positive figures. For example, suppose that on average two randomly chosen members of a population have a degree of relatedness, measured by Identity by Descent within, say, the last thousand years, equivalent to that of full first cousins, i.e. a Malecot Coefficient of Relationship of one-eighth. On the face of it, if we trace back the paths of descent using Wright's methods, and work out the path coefficients, assuming complete additive heritability, the result will be a correlation of one-eighth, numerically equivalent to the Malecot coefficient. But the correlation coefficient between randomly selected members of a population, relative to that population as a whole, must be approximately zero. We therefore seem to have a contradiction.

It took me a while to see how this paradox can be resolved. I think the main explanation [see Note] is that in the usual applications of Wright's methods there is a tacit assumption that only the paths leading through common ancestors need be taken into account. All other paths can be regarded merely as background noise. For example, if we trace the paths between two full first cousins, we need only take into account the paths leading through the two grandparents they have in common, and not the other four grandparents, unless some of these lead back to other common ancestors in the fairly recent past. Ordinarily this is a reasonable approach, but it breaks down if it is applied to the kind of case referred to in the last paragraph. If we trace back the entire ancestry of two randomly chosen individuals, for some large number of generations, the ancestors will have a mixture of positive and negative correlations between them. The positive and negative correlations will (approximately) cancel out. In a complete path analysis all these correlations would need to be taken into account, even if they do not involve a direct path through a common ancestor. When properly interpreted, Wright's methods therefore do not lead to a contradiction.
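A simple identity shows why negative correlations must appear once everything is measured relative to the population itself. It is offered as a supplementary illustration, not as part of Wright's own treatment: for any N values measured as deviations from their own mean, the deviations sum to zero, so the average product over distinct pairs is exactly -1/(N-1) times the average squared deviation. The sketch checks this on arbitrary made-up numbers; the function names are hypothetical.

```python
import random

def mean_pair_product(values):
    """Average of d_i * d_j over ordered pairs i != j, where d = deviation from the mean."""
    n = len(values)
    m = sum(values) / n
    d = [v - m for v in values]
    total = sum(d[i] * d[j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def mean_square(values):
    """Average squared deviation from the mean (population variance)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

random.seed(1)
vals = [random.gauss(0, 1) for _ in range(50)]  # arbitrary illustrative values
print(mean_pair_product(vals), -mean_square(vals) / (len(vals) - 1))  # the two agree
```

If the deviations are scaled to unit variance, the average cross-product between distinct members is therefore -1/(N-1): close to zero in a large population, but necessarily negative rather than positive.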

I had originally planned to go on to consider the extension of Wright's measures of kinship to the relations between populations, such as his well-known FST statistic. But the post is already long, so I will reserve the subject for another time.

Note: I say the main explanation, because the effect of common ancestry itself may also be reduced when we take account of negative correlations. For example, in the case of cousins with two common grandparents, these two grandparents may be negatively correlated, in which case the indirect path running through both of them would have a negative value. Or a common ancestor might have a negative coefficient of inbreeding (i.e. be less inbred than average for the population), which would reduce the path coefficient from parental genotype to gamete. But as far as I can see, these factors would never be sufficient to offset the positive correlations due to common ancestry entirely. It is therefore also necessary to take account of negative correlations between non-common ancestors.


Saturday, March 15, 2008

Notes on Sewall Wright: Path Analysis   posted by DavidB @ 3/15/2008 05:46:00 AM

A long time ago I said I was planning a series of posts on the work of Sewall Wright. I am finally getting round to it.

I originally planned to write notes on the following topics:

1. The measurement of kinship.

2. Inbreeding and the decline of genetic variance.

3. Population size and migration.

4. The adaptive landscape.

5. The shifting balance theory of evolution.

I still hope to cover these topics, but I will begin with a few notes on Wright's method of Path Analysis.....

Path Analysis is Wright's main contribution to statistical theory. It is one of several methods of multivariate analysis developed between 1900 and 1930, after the basic theory of multivariate correlation and regression had been established by Karl Pearson and others in the 1890s. Other types of multivariate analysis include Factor Analysis, pioneered by the psychologist Charles Spearman in 1904; Principal Component Analysis, developed by H. Hotelling in 1933 but foreshadowed by Karl Pearson in 1901; and Analysis of Variance, due mainly to R. A. Fisher from 1918 onwards.

A bibliography of Wright's main work on Path Analysis is available here.
The three most useful items are:

1. Correlation and causation (1921)
2. The theory of path coefficients: reply to Niles's criticism (1922)
3. The method of path coefficients (1934)

Items 1 and 3 are available as pdf downloads linked to the online bibliography. Item 2 is not, but it is available here. As I mentioned in a previous post, a page is missing from the pdf file of item 1, but fortunately the most important part of the missing page (the definition of path coefficients) is quoted verbatim in item 2.

The distinctive feature of Wright's path analysis is that it introduces questions of causation into the treatment of correlation and regression between variables. Every statistics textbook makes a ritual statement that 'correlation does not imply causation', but in practice there very often is a causal relationship between correlated variables. Path Analysis provides a systematic means of investigating such relationships. As Wright several times emphasised, it does not provide a method of discovering or proving causal relationships, but if these are known or hypothesised to exist on other grounds, Path Analysis can (in principle) help quantify their relative importance.

The following comments are in no way intended as a substitute for reading Wright's own studies, which are essential. I am only aiming to provide supplementary explanations on points which Wright deals with very briefly, and sometimes obscurely. In particular, I want to clarify the relationship between Path Analysis and multivariate correlation and regression. Wright's own attitude on this seems to have changed over time. It seems that initially he was dissatisfied with what he thought of as paradoxes in the existing methods, and wanted to provide a substantially different approach. But in the course of his work he discovered that his own system was more closely related to conventional multiple regression than he had realised, and increasingly he emphasised this relationship.

In Path Analysis the investigator first devises a model, shown in a path diagram, representing the assumed direction of causal relationships among a number of variables. There will be one or more dependent variables, and one or more independent variables which are assumed to influence the former. Some variables may be intermediate links in a chain of causation. The independent variables may be either correlated or uncorrelated with each other.

Each segment of a path in the diagram is assigned a path coefficient which quantitatively measures the strength of the causal influence along that segment. The fundamental problems in understanding Path Analysis are: what exactly are these path coefficients? And how are they to be quantified? I will return to these questions shortly.

Assuming for the moment that all the path coefficients are known, the correlations between the variables can be derived from the path coefficients by a few simple rules. Briefly, the correlation between any two variables is the sum of the products of the path coefficients along each distinct path (or chain of paths) joining the two variables. For this purpose a correlation between two independent variables can be counted as a path between them. The relative importance of the causal influence of an independent variable on any given dependent variable can be measured by the square of the path coefficient between them, which Wright calls a 'coefficient of determination' (possibly the first use of this term).
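The tracing rule can be sketched for a small hypothetical system: two standardised, correlated independent variables A and C (correlation r_ac), each sending a path to X (coefficients pxa, pxc) and to Y (pya, pyc), with residual causes assumed uncorrelated with everything else, so that they contribute no chain between X and Y. All the numbers are invented. Summing the products along every chain, with the A-C correlation counted as a link, gives the same value as writing the covariance of X and Y directly in terms of the coefficients.

```python
def corr_by_tracing(px, py, r_ac):
    """Sum of path-coefficient products over every chain joining X and Y."""
    pxa, pxc = px
    pya, pyc = py
    return (pxa * pya              # X <- A -> Y
            + pxc * pyc            # X <- C -> Y
            + pxa * r_ac * pyc     # X <- A <-> C -> Y
            + pxc * r_ac * pya)    # X <- C <-> A -> Y

def corr_by_algebra(px, py, r_ac):
    """Covariance of standardised X and Y written as the quadratic form px' R py."""
    R = [[1.0, r_ac], [r_ac, 1.0]]
    return sum(px[i] * R[i][j] * py[j] for i in range(2) for j in range(2))

px, py, r_ac = (0.5, 0.3), (0.4, 0.6), 0.2  # hypothetical path coefficients
print(corr_by_tracing(px, py, r_ac), corr_by_algebra(px, py, r_ac))  # both about 0.464
```

The residual paths are not shown; they are assumed to take whatever values make X and Y have unit variance.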

The rules for operating with path coefficients are explained by Wright reasonably clearly in 'Correlation and causation' and later papers. [Note 1] The real difficulty is to understand the nature of the path coefficients themselves. Wright's verbal explanation is that 'the direct influence along a given path can be measured by the standard deviation remaining in the effect after all other possible paths of influence are eliminated, while variation of the causes back of the given path is kept as great as ever, regardless of their relations to the other variables which have been made constant.' This is defined as 'the standard deviation due to' the independent variable in question. The path coefficient itself is then defined as the ratio of this standard deviation to the total standard deviation of the dependent variable.

This definition is not ideally clear, especially for cases where the independent variables are correlated with each other. Various objections were made in a critique of Wright's theory by Henry Niles. In his 'Reply to Niles' Wright admits that 'the operations suggested by the verbal definitions could not be literally carried out in extreme cases, and the definition is therefore imperfect'. Wright points out, however, that the path coefficient can always be calculated by the methods described in his 1921 paper. In the later paper on 'The method of path coefficients' Wright offers a variant on his original definition which is perhaps a little clearer: 'Each [path coefficient] obviously measures the fraction of the standard deviation of the dependent variable (with the appropriate sign) for which the designated factor is directly responsible, in the sense of the fraction which would be found if this factor varies to the same extent as in the observed data while all others .... are constant'. The problem with both formulations, as Wright was aware, is that in the case of correlated independent variables they seem to require a counterfactual assumption. If all variables other than the dependent and independent variables of interest are held constant, but one or more of those other variables are correlated with both of the variables of interest, then both of the latter variables will have their variability reduced. By insisting that the causative variable of interest retains its full variability, Wright is therefore assuming a counterfactual condition. In order to keep the variability of the causative variable unchanged, Wright says 'the definition of [the standard deviation in X due to M] implies that not only is [the other independent variable] made constant but that there is such a readjustment among the more remote causes .... that [the standard deviation of M] is unchanged' ('Correlation and Causation', p.566). What Wright meant by 'readjustment' is unclear to me and, so far as I know, Wright never attempted to explain it. The causal relationships are what they are, and any 'readjustment' sounds like an artificial if not improper procedure.

Rather than make further efforts to decipher Wright's formulations, I think it will be more useful to approach the problem from first principles, drawing on the general theory of correlation and regression as set out in my Notes on Correlation, Parts 1, 2, and 3.

I hope to show that Wright's path coefficients can in fact be derived in a way which avoids the problems of his verbal formulations. I will assume linearity of all relationships. (Wright also in general assumes linearity, but does briefly consider the effects of departures from linearity.) It is presupposed, of course, that items represented by one variable are associated in some way with the items represented by the other variables, e.g. that the height of fathers is associated with the height of sons.

The general idea behind Wright's definition is that variation in one (independent) variable has an effect in producing variation in another (dependent) variable. Since we are assuming linearity, the size of the effect should be simply proportional to the size of the cause. This naturally suggests a connection with statistical regression. The regression of one variable on another measures the average size of the deviation in the dependent variable as a proportion of the associated deviation in the independent variable. In the case of a causal influence, it is therefore reasonable to say that a certain amount of variation in one variable is caused by or 'due to' its regression on the other. The effects caused in this way will have a calculable standard deviation, which can be taken as a measure of the total size of the causal influence.

Case 1
Let us begin with the simplest possible case. Suppose there is one dependent variable, X, and one independent variable, Y. I assume, as usual, that the variables are measured as deviations from their means, in appropriate units (not necessarily the same for both variables). Let the regression coefficient of X on Y be designated bxy. We are assuming that each unit of variation in the items of Y has a simple proportional effect on the corresponding items in X. The proportion must then be equal to bxy, since this is a measure of the proportional mean deviation in X associated with a given deviation in Y. For example, if bxy = .6, then for each deviation of 1 unit in Y there will on average be a deviation of .6 units in X. In general this need not be a causal relationship, but in the present case we are assuming that it is, and that the deviation in X is an effect 'due to' the deviation in Y. The total amount of variation in X that is due to variation in Y will of course depend on the total amount of variation in Y as well as on the regression coefficient. If we designate the standard deviation of Y as sy, the standard deviation in X that can be attributed to the causal influence of Y will be bxy.sy. [Note 2] If the total standard deviation of X is sx, the proportion of the standard deviation of X that is due to Y will therefore be bxy.sy/sx. But since bxy = rxy.sx/sy, this is equal to the correlation coefficient rxy between X and Y. We therefore find that in this simple case the path coefficient between X and Y equals the correlation coefficient between them.
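A minimal numerical check, using simulated data from an invented linear relation X = 0.6Y + noise (the coefficients, seed and helper names are hypothetical): it estimates bxy from the data, forms bxy.sy/sx, and compares the result with rxy. It also checks the result of Note 2 below, that the standard deviation of the values bxy.Y equals bxy.sy (in absolute value).

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Population standard deviation (divisor N, as in Note 2)."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def regression(x, y):
    """Regression coefficient b_xy of x (dependent) on y (independent)."""
    return cov(x, y) / sd(y) ** 2

def correlation(x, y):
    return cov(x, y) / (sd(x) * sd(y))

random.seed(0)
Y = [random.gauss(10, 2) for _ in range(1000)]    # hypothetical cause
X = [0.6 * y + random.gauss(0, 1.5) for y in Y]   # effect of Y plus other influences

b_xy = regression(X, Y)
path = b_xy * sd(Y) / sd(X)  # proportion of the sd of X due to Y
print(path, correlation(X, Y))                     # identical up to rounding
print(sd([b_xy * y for y in Y]), abs(b_xy) * sd(Y))  # Note 2: also identical
```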

Case 2
Turning to a slightly more complex case, let us suppose that Y influences X via an intermediate variable Z, and that Y is uncorrelated with any other variables in the system. Each unit deviation in Y will produce a deviation of bzy in Z, and in turn each unit deviation in Z will produce a deviation of bxz in X. The indirect influence of Y on X through Z will therefore be equal to the product bzy.bxz. Since by assumption there is no other path of influence of Y on X, the product bzy.bxz will measure the total influence of each unit deviation of Y on X. The standard deviation in X due to Y will be bzy.bxz.sy, which as a proportion of the total standard deviation in X is bzy.bxz.sy/sx. Since bzy = ryz.sz/sy and bxz = rxz.sx/sz, this is equal to ryz.rxz, which we may call the compound path coefficient between X and Y. But by the arguments of the previous paragraph, the path coefficient between Z and X will be rxz and that between Y and Z will be ryz. The product of the path coefficients between Y and Z and between Z and X is therefore ryz.rxz, which is the same as the compound path coefficient between X and Y. It may also be noted that if the sole influence of Y on X is via Z, the partial correlation coefficient between X and Y given Z should be zero, which implies rxy = ryz.rxz. The compound path coefficient between X and Y is therefore the same as rxy, the bivariate correlation between them.
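A simulation sketch of Case 2, with invented coefficients: Y influences Z, Z influences X, and there is no direct path from Y to X. Up to sampling error, the sample correlation rxy should be close to the product ryz.rxz, and the partial correlation of X and Y given Z close to zero. The seed, sample size and coefficients are arbitrary choices for illustration.

```python
import math
import random

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

random.seed(42)
N = 20000
Y = [random.gauss(0, 1) for _ in range(N)]
Z = [0.7 * y + random.gauss(0, 1) for y in Y]  # Y -> Z
X = [0.5 * z + random.gauss(0, 1) for z in Z]  # Z -> X; no direct Y -> X path

r_xy, r_yz, r_xz = corr(X, Y), corr(Y, Z), corr(X, Z)
partial_xy_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(r_xy, r_yz * r_xz)  # close to each other
print(partial_xy_z)       # close to zero
```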

Case 3
The last conclusion can also be applied to the case of a single independent variable Z which affects two dependent variables X and Y. If Z is the only reason for correlation between X and Y, the partial correlation coefficient between X and Y given Z will be zero, which implies rxy = ryz.rxz. But ryz and rxz are also the path coefficients between Y and Z and between Z and X, so the compound path coefficient between X and Y is the same as the correlation between them.

Case 4
Suppose now that we have two dependent variables, X and Y, and two independent variables, A and B, which are uncorrelated with each other. This gives us two 'paths' between X and Y. Each of these paths can be considered as an example of case 3, so that they will each give an estimate for the correlation between X and Y. The problem is, how can the two estimates be combined? Since A and B are uncorrelated, a plausible guess is that the two estimates should simply be added together. This can be proved more rigorously using the formulae for partial correlation, as is done by Wright ('Correlation and Causation', p.565). The argument can easily be extended to cases with more than two independent variables. The result is that if all the independent variables are uncorrelated with each other, the correlation between two variables is equal to the sum of the products of the correlations along all paths connecting the two variables.
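A simulation sketch of Case 4, with invented coefficients: two independent, uncorrelated causes A and B each influence both X and Y. Up to sampling error, rxy should equal rxa.rya + rxb.ryb. Setting the coefficients on B to zero reduces the sketch to Case 3, with a single common cause. The seed and numbers are arbitrary.

```python
import math
import random

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

random.seed(7)
N = 20000
A = [random.gauss(0, 1) for _ in range(N)]
B = [random.gauss(0, 1) for _ in range(N)]  # generated independently of A
X = [0.6 * a + 0.3 * b + random.gauss(0, 0.7) for a, b in zip(A, B)]
Y = [0.4 * a + 0.5 * b + random.gauss(0, 0.7) for a, b in zip(A, B)]

traced = corr(X, A) * corr(Y, A) + corr(X, B) * corr(Y, B)  # sum over the two paths
print(corr(X, Y), traced)  # close, up to sampling error
```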

Case 5
We have so far assumed that the independent variables are all uncorrelated with each other. Things get more complicated when two or more of the independent variables are correlated (including the case where two 'intermediate' variables lead back to the same independent variable, and are therefore correlated with each other). If we have dependent variables X and Y, and correlated independent variables A and C, the total correlations between X (or Y) and A will be partly attributable to A's correlation with C. [I am avoiding using B to designate variables, as I use it to designate partial regression coefficients.] If we simply added the correlations resulting from the paths X-A-Y and X-C-Y, as in case 4, the correlation between X and Y would be inflated by double-counting, and could well be greater than 1 or less than -1, which is impossible. These considerations suggest that the correlations between a dependent variable and the independent variables cannot in themselves give us the required path coefficients. But this does not tell us what the path coefficients should be, or even guarantee that any suitable measure for the purpose exists.

Drawing on the theory of multiple regression and correlation, as developed in Notes Part 3, an alternative measure does suggest itself. It was pointed out there that the partial regression coefficient of X on Y, given Z, measures the independent contribution of Y to the best estimate of X, when Z is held constant. Surprisingly, the partial regression coefficient can serve a dual purpose. When multiplied by the full deviation of the relevant independent variable, it contributes to the best estimate of the value of the dependent variable as given by a multiple regression equation. When multiplied by the residual deviation of the relevant independent variable, after subtracting the estimate derived from the regression on the other independent variable, it gives the best estimate of the residual deviation of the dependent variable. It is not intuitively obvious that the same coefficient can serve these two different purposes, but it is demonstrably the case. Wright's concern about the restricted variability of the independent variable, and the need to 'readjust the more remote causes', therefore seems unnecessary. If we take the partial regression coefficient Bxa.c (see Notes Part 3 for this notation) and multiply it by the full deviation value of A, this should itself be a suitable measure of the independent causal influence of A on X, taking account of C. The standard deviation of the effect of A on X will then be (Bxa.c)sa, which as a proportion of the total standard deviation in X will be (Bxa.c)sa/sx. But this is equivalent to the Beta weight of X on A, given C. (See Notes Part 3.) The suggested value for the path coefficient is therefore equal to the relevant Beta weight. If the variables are measured in units of their own standard deviations, as Wright recommends for most purposes, the partial regression coefficients and Beta weights will coincide.
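The 'dual purpose' of the partial regression coefficient can be checked numerically. This is a minimal sketch on simulated data with invented coefficients (the helper names are hypothetical): it computes Bxa.c from the two-predictor normal equations, shows that regressing the residual of X (after C) on the residual of A (after C) gives the same coefficient, and confirms that (Bxa.c)sa/sx equals the Beta weight computed from the correlations alone, (rxa - rxc.rac)/(1 - rac^2).

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def sd(v):
    return math.sqrt(cov(v, v))

def corr(u, v):
    return cov(u, v) / (sd(u) * sd(v))

def residual(u, v):
    """Deviation of u remaining after subtracting its regression on v."""
    b = cov(u, v) / cov(v, v)
    mu, mv = mean(u), mean(v)
    return [(a - mu) - b * (c - mv) for a, c in zip(u, v)]

def partial_regression(x, a, c):
    """B_xa.c from the normal equations of the regression of x on a and c."""
    saa, scc, sac = cov(a, a), cov(c, c), cov(a, c)
    sxa, sxc = cov(x, a), cov(x, c)
    return (sxa * scc - sxc * sac) / (saa * scc - sac ** 2)

random.seed(3)
N = 5000
C = [random.gauss(0, 1) for _ in range(N)]
A = [0.6 * c + random.gauss(0, 1) for c in C]  # A correlated with C
X = [2.0 * a + 1.0 * c + random.gauss(0, 2) for a, c in zip(A, C)]

B_xa_c = partial_regression(X, A, C)
ra, rx = residual(A, C), residual(X, C)
B_from_residuals = cov(rx, ra) / cov(ra, ra)   # same coefficient
beta = B_xa_c * sd(A) / sd(X)                  # suggested path coefficient
beta_from_corrs = (corr(X, A) - corr(X, C) * corr(A, C)) / (1 - corr(A, C) ** 2)
print(B_xa_c, B_from_residuals)  # equal up to rounding
print(beta, beta_from_corrs)     # equal up to rounding
```

The residual-on-residual step is the standard Frisch-Waugh result from regression theory, used here only as a convenient check.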

This is the same as Wright's result, but reached via the theory of multiple regression. By Wright's own account, he did not originally take this approach, and was surprised when late in his investigation of the problem he realised the close connection between path coefficients and multiple regression. (See 'Reply to Niles', p.242.) I would suggest that it would be better to explain Path Analysis from the outset as a 'causalised' version of multiple regression.

The other main question in Path Analysis is how to quantify the path coefficients. If all the correlations between the variables in the system are known (or hypothesised), then the path coefficients can be calculated by using in reverse the rules which enable the correlations to be derived from the path coefficients. (This will sometimes require simultaneous equations, but there should be enough equations to determine the unknowns.) If there are gaps in the available information, these may often be filled by imposing the condition that the 'coefficients of determination' for each variable must, if the scheme of causation is complete, collectively account for the total variance of the variable. Wright also often makes use of the principle that the correlation of a variable with itself is 1.
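Running the rules in reverse can be sketched for a hypothetical system with one dependent variable X and two correlated causes A and C, all standardised. The tracing rule gives two equations, rxa = pa + pc.rac and rxc = pc + pa.rac, which are solved for the path coefficients pa and pc. A residual path e is then fixed by the completeness condition pa^2 + pc^2 + 2.pa.pc.rac + e^2 = 1 (the middle term is the joint contribution of the two correlated causes). The correlation values are invented for illustration.

```python
import math

def solve_paths(r_xa, r_xc, r_ac):
    """Solve r_xa = pa + pc*r_ac and r_xc = pc + pa*r_ac for pa and pc."""
    det = 1 - r_ac ** 2
    pa = (r_xa - r_xc * r_ac) / det
    pc = (r_xc - r_xa * r_ac) / det
    return pa, pc

def residual_path(pa, pc, r_ac):
    """Path from residual causes, chosen so that the variance of X totals 1."""
    explained = pa ** 2 + pc ** 2 + 2 * pa * pc * r_ac
    return math.sqrt(1 - explained)

r_xa, r_xc, r_ac = 0.5, 0.4, 0.3  # hypothetical observed correlations
pa, pc = solve_paths(r_xa, r_xc, r_ac)
e = residual_path(pa, pc, r_ac)
print(pa, pc, e)  # roughly 0.418, 0.275, 0.825
```

Feeding pa and pc back through the tracing rule reproduces the original correlations, which is a useful check on any hand calculation.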

Overall, Wright's method of Path Analysis is a very impressive achievement. It is interesting to note that two of the major methods of multivariate analysis devised in the 20th century were the work of people who were only amateurs in statistics (the other example being Spearman's Factor Analysis).

Despite the scale of Wright's achievement, Path Analysis never seems to have received the same general acceptance as Fisher's Analysis of Variance. For example, it is seldom covered in general textbooks on statistical methods. It seems to have had occasional phases of fashionability in particular fields, notably in sociology, without ever quite becoming part of standard statistical practice. (Incidentally, Wright himself criticised some of the uses it was put to in the social sciences, which can hardly have encouraged would-be practitioners of the method.) Probably one reason for its unpopularity is that Wright's method requires the use of diagrams. Perhaps more important in modern times, it resists reduction to off-the-shelf computerisation. It is impossible to do Path Analysis without a human brain. But it may also be wondered whether Path Analysis has quite justified Wright's hope that it would help clarify causal relationships. Wright himself used it mainly for the narrower purpose of calculating genetic relatedness, where the nature and direction of causal influences are unambiguous. This is seldom the case in other fields. (And even in this field his methods have largely been superseded by Malecot's concept of Identity by Descent, which uses diagrams which look like an application of Path Analysis but are conceptually quite different.) It seems also that Wright was originally motivated by a belief that the existing methods of multiple regression and correlation were inadequate or paradoxical, and needed to be supplemented. But in the process of working out his method, he discovered that it was more closely related to multiple regression than he had realised at the outset. The 'added value' of Path Analysis as compared with other methods may therefore not always justify the extra effort involved in mastering and applying the technique.

Postscript: Since writing this I have found a useful explanation and evaluation of Path Analysis in an article by O. D. Duncan, 'Path Analysis: Sociological Examples', American Journal of Sociology, 72, 1966, 1-16.
Another, more technical, account is given by K. C. Land in 'Principles of Path Analysis', Sociological Methodology, 1, 1969, 3-37.
Both articles are available on JSTOR for those with access.

Note 1: One relatively obscure point is Wright's discussion of the correlation of a variable with itself, which must equal 1. Although Wright discusses this case on several occasions, I do not think he ever gives a path diagram for it, or explains how it would be drawn. I think the best way of doing it would be to insert the self-correlated variable in the diagram twice, perhaps labelled X(1) and X(2).

Note 2: For any given deviation value of Y, the associated deviation value of X will be bxy.Y. The total of the deviations due to Y will be S(bxy.Y), with the summation taken over all values of Y. Since SY is a sum of deviation values it equals zero, so S(bxy.Y) = bxy.SY also equals zero, but the sum of squares, S(bxy.Y)^2, will in general be non-zero. The standard deviation in X due to Y will therefore be root-[S((bxy.Y)^2)/N]. But root-[S((bxy.Y)^2)/N] = bxy.root-[SY^2/N]. The expression root-[SY^2/N] is the standard deviation of Y, so abbreviating this as sy we have shown that the standard deviation in X due to Y is equal to bxy.sy (strictly, the absolute value of bxy times sy, since a standard deviation cannot be negative).
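The result of Note 2 can be checked numerically. The Python sketch below uses a small invented data set (not from Wright). It computes the fitted values bxy.Y from deviation values and compares their standard deviation with bxy.sy. `abs()` is used because a standard deviation cannot be negative even when bxy is, and the function name `sd_due_to_y` is purely illustrative.

```python
import math

def sd_due_to_y(xs, ys):
    """Return (sd of the fitted values bxy*Y, bxy, sy), using divide-by-N formulas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [v - mx for v in xs]  # deviation values of X
    y = [v - my for v in ys]  # deviation values of Y
    bxy = sum(a * b for a, b in zip(x, y)) / sum(b * b for b in y)
    fitted = [bxy * b for b in y]  # the part of X 'due to' Y
    # The fitted values sum to zero, so their root mean square is their sd.
    sd_fitted = math.sqrt(sum(f * f for f in fitted) / n)
    sy = math.sqrt(sum(b * b for b in y) / n)
    return sd_fitted, bxy, sy

# Invented illustrative data.
sd_fitted, bxy, sy = sd_due_to_y([2.0, 4.0, 5.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 5.0])
print(sd_fitted, abs(bxy) * sy)  # the two numbers agree
```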

Tuesday, March 04, 2008

Origins of the British   posted by DavidB @ 3/04/2008 06:44:00 AM

Despite my long-standing interest in Celts and Anglo-Saxons, it took me a long time to get round to reading Stephen Oppenheimer's The origins of the British: a genetic detective story (2006). It is an alarmingly big book, and I had other stuff to do. When I finally read it, I found that appearances were deceptive. The book has a lot of diagrams and appendices, and the print (in the UK edition) is widely spaced, so the main text is not in reality all that long. I recommend it to anyone who is interested in the subject. This does not mean I agree with Oppenheimer's conclusions. I was going to give a summary of his claims, but I find that a webpage by Geoffrey Sampson already contains an excellent summary, which I gratefully borrow:

Overall, Oppenheimer is making the following claims about British prehistory:
1. If we forget about the very recent (post-Second-World-War) waves of immigration, then wherever we look in the British Isles, most of the ancestral bloodlines of present-day inhabitants go back to people who were already here in the Neolithic period - say, six thousand years ago. The well-known Iron Age and later 'invasions', such as the coming of the Anglo-Saxons, were more like the Norman Conquest - smallish elite groups arrived who sometimes had large cultural impacts, but never amounted to more than a tiny percentage of the subsequent bloodlines in any region.
2. The genetic division between what we think of as the 'Celtic' west and north of the British Isles and the 'English' south-east itself dates back to the Neolithic - it is not the result of late-comers expelling or killing off inhabitants in one part of the territory.
3. The Celts originated not in Central Europe as standardly believed, but in the Spain-Southern France region.
4. When the Celts came to the British Isles, they occupied only the traditionally Celtic western and northern areas; England was never inhabited by Celtic-speakers.
5. The inhabitants of England spoke a Germanic language long before the Romans arrived, and it was this language which evolved in due course into English - the invasions from the Continent at the end of the Roman period did not have much impact on the local language, except for introducing some Scandinavian influence.

Sampson also has some critical comments on Oppenheimer's claims. I will make a few comments of my own below the fold, but it is probably more useful to describe other recent work (mainly archaeological) on the transition from Roman Britain to Anglo-Saxon England ....

So far as Oppenheimer's genetic claims are concerned, the most interesting is point 2 in Sampson's summary. Unlike some authors, Oppenheimer does find a marked genetic difference between England and the 'Celtic fringe'. Whereas this would conventionally be attributed to the impact of Anglo-Saxon and Danish migration in the early middle ages, Oppenheimer believes that it goes back much further, to the Neolithic or even the Mesolithic period. Broadly, he argues that the western parts of the British Isles were originally settled mainly by people from the Iberian peninsula, using the Atlantic sea routes, while the eastern parts (i.e. most of England) were largely settled from across the North Sea, roughly from what is now Belgium. The genetic differences resulting from these different origins have persisted, and have only been marginally affected by subsequent migrations.

Now, I don't greatly care if my ancestors turn out to have been Belgian, so long as I don't have to put mayonnaise on my chips, but I am not yet convinced. In view of the importance and novelty of Oppenheimer's genetic claims, the evidence is presented surprisingly briefly. It does not appear to be based on any detailed peer-reviewed studies, but only on Oppenheimer's own unpublished analysis of haplotype data. Based on this, he considers that the 'English' haplotypes are more closely related to those of the Low Countries than to the 'Iberian' haplotypes which prevail in the 'Celtic fringe', but that the divergences from the Continental types must go back much further in time than the Anglo-Saxon migrations.

For all I know, Oppenheimer may well be right, but I would be cautious about accepting his claims until they are corroborated by other experts. Oppenheimer is not himself a geneticist by training. Neither am I, but then I am not making highly technical genetic claims based on my own research.

The boldest of Oppenheimer's claims is point 5 in Sampson's summary: that the inhabitants of eastern Britain already spoke a Germanic language long before the arrival of the Anglo-Saxons. This does have the advantage of avoiding the problem of explaining how a (supposedly) tiny number of Anglo-Saxons got the Britons to speak Old English. Otherwise, it has nothing to recommend it. There is no direct evidence that a Germanic language was spoken in Britain before the Anglo-Saxons (except for Germanic mercenaries in the Roman army), and there is a reasonable, if not overwhelming, amount of evidence from place names, etc., that a Celtic language was spoken. Most linguists also believe that Old English is far too closely related to the other Germanic languages to have diverged from them as long ago as Oppenheimer claims.

Reading Oppenheimer's book has also encouraged me to catch up on recent historical and archaeological work on the Anglo-Saxon migration period. A selection of studies is in the references below. Since there are very few documents from the period, most of the evidence is archaeological. Unfortunately, archaeologists suffer from an occupational vice of over-interpreting their evidence. It used to be the fashion to attribute every shift of style in pots (and other material remains) to the migration of peoples. Since the 1950s the fashion has swung wildly to the other extreme, and there is a strong prejudice against recognising migration at all. The conclusions that archaeologists draw from the evidence therefore need to be taken with a handful or two of salt.

So far as the archaeological facts are concerned, there seems to be a consensus that the Romano-British economy and society collapsed very completely and quickly soon after the withdrawal of the Roman army and administration early in the 5th century. Towns and villas fell into disuse, coins were no longer minted, and even pottery (that mainstay of archaeology) is very scarce. It is remarkably difficult to find any traces of the indigenous population. As Myres puts it (p.21) 'the sub-Roman Britons of the fifth and sixth centuries appear to have enjoyed - if that is the right word - a culture almost as completely devoid of durable material possessions as any culture can be.' Myres goes on to conclude that there must have been a drastic fall in population. This conclusion has not been so widely accepted. There is one strong piece of evidence against depopulation of rural areas: pollen analysis shows no widespread regeneration of woodland at the time. Across most of England the landscape would revert to woodland in a few decades if not regularly grazed or cultivated, so complete depopulation of large areas seems to be ruled out. (But I wonder how long regeneration of woodland could be prevented by grazing sheep and deer, without much human supervision?)

There has been disagreement about the date of the first significant Germanic migration to Britain. The documentary sources (Gildas, etc.) indicate the mid-5th century for this, but some historians, up to and including Myres, have believed in significant Germanic settlements in the late Roman period. This belief rests largely on the existence of metalwork and pottery from this period which appears to have been manufactured in late Roman Britain but in Germanic styles (described by Myres as 'Romano-Saxon'). This has been interpreted as made by Romano-British craftsmen for Germanic settlers. But recent archaeologists have tended to reject this interpretation, arguing that the supposedly 'Germanic' style was just a widespread late-Roman fashion. If this reinterpretation is correct, then there is little evidence for significant Germanic settlement before the mid-5th century. But the tide of archaeological opinion may turn again.

In reading the archaeological studies I was particularly looking for estimates of population numbers. Various estimates have been made for Roman Britain, but there is very little for the immediate post-Roman period. This is understandable in view of the shortage of evidence. It did however occur to me that it should be possible to form estimates of the relative numbers of Anglo-Saxon and indigenous people, based on the proportions of different types of burials. If all graves can be identified as 'Anglo-Saxon' or 'indigenous', then we can get such an estimate, admittedly subject to distortion by bias in preservation or discovery. Even if not all graves can be definitely identified, we might still get some reasonable outer limits for the estimate based on those that can be definitely identified.
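The 'outer limits' idea can be sketched in a few lines of Python. The grave counts below are invented purely for illustration and come from no real excavation. The function name `share_bounds` is hypothetical, and the sketch ignores the biases in preservation and discovery. The lower limit treats every unidentified grave as indigenous. The upper limit treats every unidentified grave as Anglo-Saxon.

```python
def share_bounds(anglo_saxon, indigenous, unidentified):
    """Outer limits for the Anglo-Saxon share of a (hypothetical) set of graves.

    Lower limit: every unidentified grave is assumed to be indigenous.
    Upper limit: every unidentified grave is assumed to be Anglo-Saxon.
    """
    total = anglo_saxon + indigenous + unidentified
    lower = anglo_saxon / total
    upper = (anglo_saxon + unidentified) / total
    return lower, upper

# Purely illustrative counts, not taken from any cemetery.
low, high = share_bounds(anglo_saxon=30, indigenous=50, unidentified=20)
print(f"Anglo-Saxon share between {low:.0%} and {high:.0%}")
```

The more graves that can be definitely identified, the narrower the gap between the two limits.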

Little work of this kind seems to have been done, but I was pleased to find that a bold attempt has been made in a series of papers by the British-based German archaeologist Heinrich Harke. (The 'a' should have an umlaut, but this will not show properly in all browsers.) Unfortunately two of the key papers are in German, which is not my favourite language, but I hope I have deciphered the gist of them.

Fifth- and sixth-century graves are of two kinds: burials and cremation urns. Cremation urns are found mainly in eastern England, and have long been recognised as distinctively Anglo-Saxon. But the majority of graves are burials. The important feature of Harke's work is that he believes he can distinguish reliably, at least in many cases, between Anglo-Saxon and indigenous graves. The main basis for this is the presence or absence of weapons (swords, seaxes, spears and shields) in the graves. Throughout eastern and central England, even as far west as Shropshire, bodies were often buried with weapons. Harke argues that this is usually a good indicator of Anglo-Saxon ethnicity, his main points being:

a) weapons are not found in late Roman or Celtic burials of the period, though they are common in Germanic areas on the Continent;

b) the weapon burials are too common to be confined to people of very high status, so they are not just markers of a social elite;

c) weapons are often found in what appear to be 'family graves', which suggest an inherited ethnic status;

d) the skeletons in weapon graves are on average a few centimetres taller than those in weaponless graves, which is consistent with an ethnic difference between Germanic and British people. The difference cannot be explained by social status, because the taller skeletons are otherwise similar in nutritional history, as shown by growth interruptions, etc.

Harke does not make what seems to me the important point that an invading military aristocracy, facing a risk of rebellion from an oppressed indigenous majority, does not usually let the subject people wander around with weapons! The impression we get from early documents such as the Laws of Ine is that the indigenous people were reduced to a servile status (wealh = Briton = slave), so the Britons in areas ruled by Anglo-Saxons would probably not have weapons. Or if they did have weapons, they would hardly waste them by burying them with the dead.

Based on the assumption that weapon burial indicates Anglo-Saxon ethnicity, Harke attempts some quantitative estimates. He notes that some communities seem to be entirely Anglo-Saxon, including the women, while others were probably only Anglo-Saxon on the male side, others were ethnically mixed (as shown by mixed cemeteries), and others were enclaves of Britons. In southern and eastern England he estimates that the proportion of Anglo-Saxons ranged from a sixth to a quarter, while in northern England it was smaller, at 10 percent or less. Except perhaps in East Anglia (the stronghold of cremation urns) they were nowhere in a majority, but Harke argues that the Anglo-Saxon minority would be large enough, combined with its social and military supremacy, to give it a linguistic and cultural dominance. After the collapse of Roman civilisation, and in the absence of a Celtic alternative in most of England, the indigenous majority would be eager to throw in their lot with the new cultural elite and blend in as completely as possible.

I find this a plausible and appealing scenario, which is consistent with the genetic evidence as interpreted by Weale et al., and which helps explain the otherwise perplexing absence of Celtic vocabulary in the English language. If the Britons were anxious to assimilate to the culturally dominant ethnic group, and 'pass' for Anglo-Saxon, they would avoid giving away their servile origins by using Celtic words. Harke's thesis will however no doubt be controversial. I have not seen much response as yet from the anti-migrationists, but they will doubtless deny that burial rites are reliable ethnic markers. Fashions in burial do change, so it is not impossible that weapon burial would spread as a new imported fashion. But Harke has at least made a constructive and ingenious start by grappling with difficulties which other archaeologists have tended to brush aside.

References (umlauts in German titles are omitted):

C. J. Arnold: Roman Britain to Anglo-Saxon England, 1984
C. J. Arnold: An archaeology of the early Anglo-Saxon kingdoms, 1988
A. S. Esmond Cleary: The ending of Roman Britain, 1989
Neil Faulkner: The decline and fall of Roman Britain, 2000
Heinrich Harke: 'Briten und Angelsachsen im nachromischen England: zum Nachweis der einheimischen Bevolkerung in den angelsachsischen Landnahmegebieten', Studien zur Sachsenforschung, 11, 1998, 87-119
Heinrich Harke: 'Sachsische Ethnizitat und archaologische Deutung im fruhmittelalterlichen England', Studien zur Sachsenforschung, 12, 1999, 109-122
Heinrich Harke: ' "Warrior graves?" The background of the Anglo-Saxon burial rite', Past and Present, 126, 1990, 22-43
Heinrich Harke: 'Kings and warriors: population and landscape from post-Roman to Norman Britain', in The peopling of Britain: the shaping of a human landscape, ed. Paul Slack and Ryk Ward, 2002.
Richard Hodges: The Anglo-Saxon achievement: archaeology and the beginnings of English society, 1989
Michael E. Jones: The end of Roman Britain, 1996
Sam Lucy: The Anglo-Saxon way of death: burial rites in early England, 2000
J. N. L. Myres: The English settlements, 1986

Saturday, February 23, 2008

Group Selection and the Wrinkly Spreader   posted by DavidB @ 2/23/2008 06:28:00 AM

A recent article by D. S. and E. O. Wilson [1] has been acclaimed by some as reviving the fortunes of group selection. It must for a time have been available on the web (since I downloaded a pdf of the published version a month or so ago), but the closest thing I can find at present is this slightly different version submitted to (and presumably rejected by) Science in 2006. [Added: I should perhaps have mentioned that the two Wilsons are not related. No kin selection here!]

As gnxp's resident critic of group selection I feel an obligation to say something about the article, but I find the task dispiriting. Much of the Wilsons' article is a re-working of issues which have been debated many times before. (See e.g. my discussion here.) The debate has been largely about the most useful way of describing and classifying the phenomena, rather than about the biological facts. Hostility to group selectionism is provoked in part by the tendency of its advocates to claim for group selection a range of phenomena that other biologists regard as more usefully described in terms of inclusive fitness (kin selection). This hostility will not be allayed by such prominent assertions as:

During evolution by natural selection, a heritable trait that increases the fitness of others in the group (or the group as a whole) at the expense of the individual possessing the trait will decline in frequency within the group.

If the 'group' contains local concentrations of relatives (as it very often will), or if the trait preferentially affects relatives, this assertion is simply not correct. Did the Wilsons not notice this, or were they deliberately loading the dice against interpretations in terms of kin selection? Another potential confusion of the issues comes later in the article, where the Wilsons discuss insect eusociality. They argue strongly that between-colony selection is important in the evolution of eusocial insects, for example in traits such as nest construction. But whoever doubted it? Once eusociality (specialisation of reproduction) has been established, of course genetic variation and selection will often be between different colonies. The difficult question is how eusociality itself becomes established. The important insights into this have come from inclusive fitness theory, not group selectionism. (See for example chapter 11 of [2].)

Rather than spend more time on arid and abstract theoretical issues, I think it will be more rewarding to focus on a single empirical case, which the Wilsons themselves offer as a good example of the benefits of a multi-level approach. It can therefore serve as a test case of the benefits of that approach. The example I have chosen is the Wrinkly Spreader...

As the Wilsons describe this case,

the "wrinkly spreader" (WS) strain of Pseudomonas fluorescens evolves in response to anoxic conditions in unmixed liquid medium, by producing a cellulosic polymer that forms a mat on the surface. The polymer is expensive to produce, which means that non-producing 'cheaters' have the highest relative fitness within the group. As they spread, the mat deteriorates and eventually sinks to the bottom. WS is maintained in the total population by between-group selection, despite its selective disadvantage within groups, exactly as envisioned by multi-level selection theory.

I have followed up the Wilsons' reference for this case, and then some other citations. [Refs. 3, 4, and 5]

The facts of the WS case (stripped of theoretical baggage) seem to be as follows.

Pseudomonas fluorescens is a rod-shaped flagellated aerobic bacterium. It is found widely in the soil and in fresh water. In nature it is normally found as a single free-moving cell. In laboratory cultures, on the other hand, it often develops mutant strains which stick together rather than living singly. One of these is the Wrinkly Spreader strain, so-called because on slides of nutrient jelly it spreads out in sheets with a distinctive wrinkly appearance. In open containers (e.g. test tubes) of nutrient fluid the WS bacteria form a mat on the surface. Within about 10 days the mat becomes too heavy and sinks to the bottom. If the supply of nutrient is adequate, the process may be repeated, with new WS mats forming and eventually sinking.

Rainey and colleagues have studied the genetics of the WS strain.[3, 4 and 5] They have found that WS bacteria produce an excess of a cellulosic polymer which causes them to stick to each other and to surfaces. A side-effect of this is that they form a scum at the liquid-air interface. (I presume this is a surface-tension effect, but the precise mechanism does not matter.) The production of the polymer uses scarce resources, so WS bacteria reproduce more slowly than non-WS bacteria in the same circumstances. However, this is offset by the advantage of being able to colonise the surface layer, with its better access to oxygen.

The description so far assumes that the mats on the surface contain only WS bacteria, usually derived from a single mutant individual. WS bacteria within the mat may however mutate in various ways which stop them overproducing the polymer, so that they revert to the ancestral phenotype. These mutants reproduce more quickly than the WS strain. They therefore tend to spread within the mats. But this weakens the structural integrity of the mats, which causes them to break up and sink more rapidly than the pure WS mats.
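To make the within-mat spread of defectors concrete, here is a minimal Python sketch using the standard one-locus haploid selection recursion. The relative growth rates (1.2 for defectors against 1.0 for WS) and the starting frequency are hypothetical, not figures from Rainey et al. The model deliberately leaves out the collapse of the mat, so it describes only the first part of the story.

```python
def cheater_frequency(p0, w_ws, w_cheat, generations):
    """Frequency of non-WS 'cheaters' within one mat, generation by generation.

    Haploid selection recursion: p' = p*w_cheat / (p*w_cheat + (1-p)*w_ws).
    The growth rates passed in are hypothetical relative fitnesses.
    """
    p = p0
    history = [p]
    for _ in range(generations):
        mean_w = p * w_cheat + (1 - p) * w_ws  # mean fitness within the mat
        p = p * w_cheat / mean_w
        history.append(p)
    return history

freqs = cheater_frequency(p0=0.01, w_ws=1.0, w_cheat=1.2, generations=30)
print(f"cheaters: {freqs[0]:.2f} at the start, {freqs[-1]:.2f} after 30 generations")
```

Under this recursion, the odds p/(1-p) of finding a cheater are multiplied by w_cheat/w_ws every generation. So even a modest growth advantage lets defectors take over a mat within a few dozen generations, if the mat lasts that long.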

So what has this to do with group selection? What are the 'groups', and where is the 'selection'?

I think it will help to divide the cycle into two stages: before and after the emergence of non-WS mutants within the mats. At the beginning of the process, there are only single bacteria. Some of these mutate to the WS form, and literally stick together. Within the broth culture as a whole, WS mutants have lower fitness than the ancestral form, but the mutation gives them characteristics which enable them to predominate in a particular part of the ecosystem, i.e. the surface layer. Rainey et al. describe this as a form of 'cooperation', in which 'cooperation is costly to individuals, but beneficial to the group'. They note that the WS individuals are closely related (since they are descended from the same mutant individual) and describe the trait as spreading by 'kin selection'. This seems to me an unnecessary interpretation. The WS individuals in the surface layer are not sacrificing any fitness for the benefit of other individuals: they are simply using resources in a way that enables them to occupy this part of the environment. In a heterogeneous environment it can be misleading to average fitness over the entire range of sub-environments. By analogy, suppose a species of sheep ranges over a variety of altitudes. At higher altitudes the climate is colder, and the sheep need thicker fleece to live there in the winter. Sheep with mutations causing them to grow thicker fleece may have lower fitness than the average sheep, because it is costly to grow thick fleece, but at high altitudes the thick-fleeced variant may predominate because it is better adapted to that particular environment. Similarly, the WS strain is better adapted to the surface layer. It is merely a coincidence that the adaptation involves the formation of 'groups'. We could imagine that instead of producing a polymer, and sticking together, the mutants produced little bubbles of gas which enabled them to float at the surface. In this case, no-one would dream of describing the process as either kin or group selection.
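The point about averaging fitness over sub-environments can be illustrated with a toy calculation. All the growth rates and habitat shares below are invented. They show only that a strain can have lower fitness averaged over the whole culture while still being the better-adapted strain in one sub-environment. This is the situation claimed here for WS at the surface, and for thick-fleeced sheep at altitude.

```python
# Hypothetical relative growth rates in two sub-environments (illustrative only).
FITNESS = {
    "broth":   {"WS": 0.9, "ancestral": 1.0},
    "surface": {"WS": 1.2, "ancestral": 0.8},
}
# Hypothetical share of the culture's cells living in each sub-environment.
SHARE = {"broth": 0.9, "surface": 0.1}

def averaged_fitness(strain):
    """Fitness averaged over the whole culture, weighted by sub-environment share."""
    return sum(SHARE[env] * FITNESS[env][strain] for env in FITNESS)

def best_strain(env):
    """The strain with the higher growth rate within a single sub-environment."""
    return max(FITNESS[env], key=FITNESS[env].get)

for strain in ("WS", "ancestral"):
    print(strain, "averaged fitness", round(averaged_fitness(strain), 2))
for env in FITNESS:
    print(env, "favours", best_strain(env))
```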

There is a more plausible case for appealing to group selection in the later stage of the process, when non-WS individuals have emerged within the WS mats. These individuals obtain the advantage of living in the surface layer without paying the cost. It is therefore reasonable to describe them as 'cheaters' or 'defectors'. They reproduce more rapidly, for a while, but in the longer term destroy the mats, to the detriment of all. According to the Wilsons, 'WS is maintained in the total population by between-group selection, despite its selective disadvantage within groups, exactly as envisioned by multi-level selection theory.' This is one possible interpretation of the facts, but it seems to me to go beyond the evidence presented by Rainey et al. We should note (as the Wilsons do not) that all surface mats collapse within a few days, whether or not they contain defectors. The regeneration of surface mats then depends on the establishment of a new population of WS individuals at the surface. These could emerge either by new mutations from the ancestral form, or from fragments of the collapsing WS mats. (It is not clear from the papers I have seen which of these usually occurs.) Either way, the Wilsons' description is incomplete. It implies that some WS 'groups' (the ones without defectors) survive indefinitely, while others fail. This is not the case. Even if a description in terms of group selection is formally valid, it does not (in my opinion) add much of value to the understanding of the phenomena. And if this is one of the best examples of group selection that its advocates can find, one cannot have much confidence in the others. (And indeed, some of the others, like the Wilsons' reference to the territorial behaviour of female lions, seem even worse. How can anyone sensibly discuss this without mentioning that the lionesses of a pride are usually closely related? [6, p. 37])

This is not to say that an account in terms of group selection will never provide useful insights into evolutionary processes. The evolution of disease organisms such as the myxoma virus (the cause of myxomatosis) seems to be one very plausible example. But the Wilsons' article does not persuade me that group selection, as distinct from inclusive fitness, is more than a minor wrinkle on the face of evolutionary theory.


[1] D. S. and E. O. Wilson: 'Rethinking the Theoretical Foundation of Sociobiology', Quarterly Review of Biology, vol. 82, no. 4, December 2007, 327-348.

[2] J. Maynard Smith and E. Szathmary: The Origins of Life: from the birth of life to the origins of language, 1999

[3] P. B. and K. Rainey: 'Evolution of cooperation and conflict in experimental bacterial populations', Nature, 425, 2003, 72-4.

[4] P. B. Rainey and M. Travisano: 'Adaptive radiation in a heterogeneous environment', Nature, 394, 1998, 69-72.

[5] A. J. Spiers et al.: 'Adaptive divergence in experimental populations of Pseudomonas fluorescens. I: Genetic and phenotypic bases of Wrinkly Spreader fitness', Genetics, 161, 2002, 33-46.

[6] G. B. Schaller: The Serengeti Lion, 1972.

Monday, November 19, 2007

Notes on Correlation: Part 2   posted by DavidB @ 11/19/2007 03:54:00 AM

Part 1 of these notes discussed the general meaning and use of the concepts of correlation and regression. The notes are intended to provide background for other posts I am planning, but if they are of any use as a general introduction to the subject, so much the better.

Part 2 discusses some problems of application and interpretation, such as circumstances that may increase or reduce correlation coefficients. I emphasise that these notes are not aimed at expert statisticians, but at the (possibly mythical) 'intelligent general reader'. I hope however that even statisticians may find a few points of interest to comment on, for example on the subjects of linearity, and the relative usefulness of correlation and regression techniques. Please politely point out any errors.

Apart from questions of interpretation, this Part contains proofs of some of the key theorems of the subject, such as the fact that a correlation coefficient cannot be greater than 1 or less than -1. There is nothing new in these proofs, but I did promise to give them, and personally I find it frustrating when an author just says 'it can be proved that...' without giving a clue how it can be proved. Readers who already know proofs of the main theorems, or are prepared to take them on trust, may prefer to go straight to the section headed 'Changes of Scale'.

Like Part 1, this Part does not deal with questions of sampling error.

Except for a few passing comments, this Part deals only with bivariate correlation and regression. I am aware that some issues, such as linearity, arise equally (if not more seriously) in the multivariate case. Part 3, if and when I get round to it, will deal with the basics of multivariate correlation and regression.


These notes avoid using special mathematical symbols, because Greek letters, subscripts, etc., may not be readable in some browsers, or even if they are readable may not be printable. The notation used will be the same as in Part 1, with the following modifications.

In Part 1, the correlation between x and y was denoted by r_xy, the covariance between x and y by cov_xy, the regression coefficient of x on y by b_xy, and the regression coefficient of y on x by b_yx. Since this Part deals only with the correlation of two variables, there will be no ambiguity if the correlation between x and y is denoted simply by r, and their covariance simply by cov. It is necessary to distinguish between the regression of x on y and the regression of y on x, and the coefficients will be denoted by bxy and byx respectively, without the underscores used in Part 1. These expressions could admittedly be confused with 'b times x times y', but I will avoid using the sequences bxy or byx in this sense.

As pointed out in Part 1, for theoretical purposes it is often convenient to assume that variables are expressed as deviations from the mean of the raw values. In this Part the variables x and y will stand for deviation values unless otherwise stated.

As previously, S stands for 'sum of', s stands for 'standard deviation of', ^2 stands for 'squared', and # stands for 'square root of'.

The derivation of the coefficients

As noted in Part 1, the Pearson regression of x on y is given by the coefficient Sxy/Sy^2, where x and y are deviation values. This is the formula which minimises the sum of the squares of the 'errors of estimate', in accordance with the Method of Least Squares. As it is the most fundamental theorem of the subject, it is worth giving a proof, using elementary calculus. (The result can be obtained without explicitly using calculus, but the explanation is then rather longer.)

We want to find a linear equation, of the form x = a + by, such that the sum of the squares of the errors of estimate, S(x - a - by)^2, is minimised.

Provided the x and y values are expressed as deviations from their means, the constant a must be zero. (If we use raw values instead of deviation values, a non-zero constant will usually be required.) The sum of squares S(x - a - by)^2 can be expanded as
Sx^2 + Na^2 + b^2(Sy^2) - 2bSxy - 2aSx + 2abSy. But the last two terms vanish, as with deviation values Sx and Sy are both zero. This leaves Na^2 as the only term involving a, and Na^2 has its lowest value (for real values of a) when a = 0. At its minimum value the expression S(x - a - by)^2 therefore reduces to S(x - by)^2.

It remains to find the value of the coefficient b for which S(x - by)^2 is minimised. This expression may be regarded as a function of b, which may be expanded as:

f(b) = Sx^2 + b^2(Sy^2) - 2bSxy

where Sx^2, Sy^2, and Sxy are quantities determined by the data.

Applying the standard techniques of differentiation, the first derivative of f(b), differentiated with respect to b, is 2bSy^2 - 2Sxy. According to the principles of elementary calculus, if the function has a minimum value, its rate of change (first derivative) at that value will be zero, so to find the minimum (if there is one) we can set the condition 2bSy^2 - 2Sxy = 0. Solving this equation for b, we get b = Sxy/Sy^2 as a unique solution. In principle, this could be a maximum or a point of inflection rather than a minimum, but the second derivative of f(b) is 2Sy^2, which is positive (unless every y value is zero), so for values of b either higher or lower than Sxy/Sy^2 the function f(b) has a higher value. Therefore b = Sxy/Sy^2 gives a unique minimum value for the sum of squares, and may be designated as bxy, the required coefficient of the regression of x on y. The best estimate of x, for a given value of y, is x = (bxy)y.
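The derivation can also be checked numerically. This Python sketch uses a small invented data set. It converts raw values to deviation values, computes b = Sxy/Sy^2, and confirms that the sum of squared errors f(b) = S(x - by)^2 is larger for values of b on either side of it. The helper names are illustrative only.

```python
def deviations(values):
    """Deviations of each value from the mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def sum_sq_errors(x, y, b):
    """f(b) = S(x - by)^2 for deviation values x and y."""
    return sum((xi - b * yi) ** 2 for xi, yi in zip(x, y))

x = deviations([2.0, 4.0, 5.0, 4.0, 5.0])
y = deviations([1.0, 2.0, 3.0, 4.0, 5.0])

sxy = sum(xi * yi for xi, yi in zip(x, y))
syy = sum(yi * yi for yi in y)
b_xy = sxy / syy  # closed-form least-squares coefficient Sxy/Sy^2

# f(b) should be higher for b on either side of Sxy/Sy^2.
for step in (0.5, 0.1, 0.01):
    assert sum_sq_errors(x, y, b_xy) < sum_sq_errors(x, y, b_xy + step)
    assert sum_sq_errors(x, y, b_xy) < sum_sq_errors(x, y, b_xy - step)
print(b_xy, sum_sq_errors(x, y, b_xy))
```

In fact f(b + h) - f(b) = h^2.Sy^2 when b = Sxy/Sy^2, which is why the checks succeed for any non-zero step.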

By similar reasoning we can derive Sxy/Sx^2 as the coefficient of the regression of y on x. The correlation coefficient r can then be derived as the mean proportional between the two regression coefficients, or in the Galtonian manner by 'rescaling' the x and y values by dividing them by sx and sy respectively, giving r = Sxy/Nsx.sy.
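A similar sketch, with made-up data and my own variable names, computes both regression coefficients and then arrives at r by the three routes just mentioned: the formula Sxy/Nsx.sy, the mean proportional of the two coefficients (taking the sign of Sxy), and Galton's rescaling to standard units. It assumes standard deviations calculated with divisor N, as in the text.

```python
import math

# Made-up illustrative data (raw values).
xs_raw = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0, 8.0, 9.0]
ys_raw = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 8.0]

N = len(xs_raw)
mx = sum(xs_raw) / N
my = sum(ys_raw) / N
x = [v - mx for v in xs_raw]  # deviation values
y = [v - my for v in ys_raw]

Sxy = sum(a * b for a, b in zip(x, y))
Sxx = sum(a * a for a in x)  # Sx^2 in the text
Syy = sum(b * b for b in y)  # Sy^2 in the text

b_xy = Sxy / Syy  # regression of x on y
b_yx = Sxy / Sxx  # regression of y on x

# Standard deviations with divisor N, matching r = Sxy/Nsx.sy.
sx = math.sqrt(Sxx / N)
sy = math.sqrt(Syy / N)

r_formula = Sxy / (N * sx * sy)
# Mean proportional (geometric mean) of the two slopes; it takes the sign of Sxy.
r_mean_prop = math.copysign(math.sqrt(b_xy * b_yx), Sxy)
# Galton's route: rescale to standard units, then average the cross-products.
r_galton = sum((a / sx) * (b / sy) for a, b in zip(x, y)) / N

print(r_formula, r_mean_prop, r_galton)
```

All three agree (about .973 for these numbers), and the product of the two regression coefficients equals r^2.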

These formulae use deviation values of x and y. If we prefer to use raw values, the appropriate formulae can be obtained by substitution. Using x and y now to designate raw values, the deviation value of x equals x - M_x, where M_x is the mean of the raw values. Similarly the deviation value of y equals y - M_y. Substituting these expressions for the deviation values of x and y in the above equation x = (bxy)y, we get the formula for raw values x = (bxy)y + M_x - (bxy)M_y. By the same methods we get y = (byx)x + M_y - (byx)M_x. These equations can be represented graphically by straight lines intercepting the axes at points determined by the constants [M_x - (bxy)M_y] and [M_y - (byx)M_x], and with slopes determined by the coefficients bxy and byx.
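The raw-value formulae can be sketched in the same way. The function raw_regression is a hypothetical helper written for illustration, not a library routine, and the data are invented.

```python
def raw_regression(dep, ind):
    """Pearson regression of dep on ind in raw values.

    Returns (intercept, slope) for dep = slope*ind + (M_dep - slope*M_ind).
    """
    n = len(dep)
    m_dep = sum(dep) / n
    m_ind = sum(ind) / n
    s_cross = sum((d - m_dep) * (i - m_ind) for d, i in zip(dep, ind))
    s_ind = sum((i - m_ind) ** 2 for i in ind)
    slope = s_cross / s_ind
    return m_dep - slope * m_ind, slope


# Made-up illustrative data (raw values).
xs_raw = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0, 8.0, 9.0]
ys_raw = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 8.0]

a_xy, b_xy = raw_regression(xs_raw, ys_raw)  # x on y: x = b_xy*y + a_xy
a_yx, b_yx = raw_regression(ys_raw, xs_raw)  # y on x: y = b_yx*x + a_yx

print(a_xy, b_xy)
print(a_yx, b_yx)
```

For these numbers the regression of x on y is x = y + 1.5, and both lines pass through the point of means (M_x, M_y) = (5.5, 4).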

The range of coefficients

For any positive value of r, expressed in the form Sxy/Nsx.sy, the regression coefficients could range from 0 to infinity, since there is no upper or lower limit on the ratios sx/sy and sy/sx. Similarly, for any negative value of r, the regression coefficients could range from 0 to minus infinity. Unless sx and sy are equal (in which case regression and correlation coincide), one regression coefficient must always be greater and the other less than r. If the regression coefficients are reciprocal to each other (e.g. 2/3 and 3/2), the correlation will be perfect (1 or -1) and there will be a single regression line.

Unlike the regression coefficients, the correlation coefficient r can only range from -1 to 1. Introductory textbooks often state this without proof, but it is a simple corollary of another fundamental theorem on correlation.

Unless the correlation is perfect (1 or -1), there will be a certain scatter of the observed values of x around the value estimated by the regression of x on y. The coefficient of regression of x on y is Sxy/Sy^2 or r(sx/sy). The estimated values of x for the corresponding values of y are therefore r(sx/sy)y, and the errors of estimate (i.e. the differences between the actual values and the estimated values) will have the form [x - r(sx/sy)y]. But these errors will themselves have a variance, which we may call Ve = [S[x - r(sx/sy)y]^2]/N. [Added: This assumes that the mean value of the errors is zero. Using deviation values of x and y this is quite easy to prove, as the mean of the errors is S[x - r(sx/sy)y]/N = (Sx - r(sx/sy)Sy)/N = (0 - 0)/N = 0.] With a little manipulation it can be shown that [S[x - r(sx/sy)y]^2]/N equals (1 - r^2)Vx. [See Note 1.] So we reach the important result that the variance of the errors of estimate of x, as estimated from the regression of x on y, is (1 - r^2) times the full variance of x. In other words, the variance of the observed x values around the estimated values is reduced by the proportion r^2 (the square of the correlation coefficient) as compared with the full variance of the x values. It is therefore often said that the correlation of x with y explains or accounts for r^2 of the variance of x. Similarly, it accounts for r^2 of the variance of y. To mark the importance of r^2 it is often known as the coefficient of determination. Unless r is 0, 1 or -1, r^2 is smaller than the absolute value of r, and proportionately it falls faster than r itself: halving r reduces r^2 to a quarter, so a correlation of .6 explains 36 percent of the variance, but a correlation of .3 explains only 9 percent. A correlation of less than (say) .3 therefore explains very little of the variance. The term 'explained' is to be understood purely in the sense just described, and does not necessarily imply a causal explanation.

The estimated values of x themselves have a variance equal to [S[(bxy)y]^2]/N = [S[r(sx/sy)y]^2]/N = [(Sy^2.r^2)Vx/Vy]/N, which can be simplified to (r^2)Vx. Therefore Vx, the total observed variance of x, can be broken down into two additive components, (r^2)Vx + (1 - r^2)Vx, representing the variance of the estimates themselves and the residual variance not accounted for by the correlation.
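The decomposition can be checked numerically. This is a minimal sketch with invented data: it computes the estimates r(sx/sy)y and the errors of estimate, then compares their variances with (r^2)Vx and (1 - r^2)Vx.

```python
import math

# Made-up illustrative data, converted to deviation values.
xs_raw = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0, 8.0, 9.0]
ys_raw = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 8.0]
N = len(xs_raw)
x = [v - sum(xs_raw) / N for v in xs_raw]
y = [v - sum(ys_raw) / N for v in ys_raw]

Vx = sum(a * a for a in x) / N
Vy = sum(b * b for b in y) / N
sx, sy = math.sqrt(Vx), math.sqrt(Vy)
r = sum(a * b for a, b in zip(x, y)) / (N * sx * sy)

estimates = [r * (sx / sy) * b for b in y]      # (bxy)y
errors = [a - e for a, e in zip(x, estimates)]  # errors of estimate

mean_err = sum(errors) / N                      # zero, as the text shows
V_est = sum(e * e for e in estimates) / N       # should equal (r^2)Vx
V_err = sum(e * e for e in errors) / N          # should equal (1 - r^2)Vx

print(V_est, V_err, V_est + V_err, Vx)
```

Up to floating-point rounding, the four printed values are 4.5, 0.25, 4.75 and 4.75 for these numbers.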

The closer the correlation (positive or negative), the more of the variance is 'explained'. If the correlation is perfect (1 or -1) then r^2 = 1 and it 'explains' all the variance of x, since there are no errors of estimation at all. If a correlation could be greater than 1 or less than -1, then the variance of the errors, (1 - r^2)Vx, would be negative. But a variance cannot be negative, so the correlation coefficient r cannot be greater than 1 or less than -1.

Changes of Scale

The value of the correlation coefficient is unchanged (except sometimes for a reversal of sign) if all the x values, or all the y values, or both, have the same constant added to them or are multiplied by the same non-zero constant. For example, if we add a constant k to all the raw x values, then the mean is also increased by k, so the deviation values, the covariance, and the standard deviation are all unchanged, and therefore the correlation coefficient r = cov/sx.sy is itself unchanged. If instead of adding k we multiply all the raw x values by k, where k is positive, then the mean, the deviation values, and the covariance are also multiplied by k. But so is the standard deviation, so the factor k cancels out of k.cov/k.sx.sy = r, leaving the correlation coefficient itself unaffected. (If k is negative, the sign of r is reversed, since the covariance changes its sign but the standard deviation does not.) Since each such operation of adding and multiplying (in the manner described) leaves r unchanged, apart from any reversal of sign, they can be repeated any number of times, and in any order, and still leave r unchanged. This can be useful for practical purposes, because it means that if a correlation coefficient is calculated for any convenient set of x and y values, it will still be valid after we add or multiply by k in the way described. Or we might at first be faced with an inconvenient set of values and then convert them to a more manageable set.
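A short sketch of these invariances, again with invented data. pearson_r is a hypothetical helper written here for illustration (Python 3.10 and later also provide statistics.correlation, which computes the same coefficient).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation cov/(sx.sy); the divisor N cancels, so it is omitted."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data (raw values).
xs = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0, 8.0, 9.0]
ys = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 8.0]

r0 = pearson_r(xs, ys)
r_shift = pearson_r([v + 10 for v in xs], ys)   # add a constant to x
r_scale = pearson_r([3 * v for v in xs], ys)    # multiply x by a positive constant
# Several operations on both variables, e.g. a shift followed by inches -> centimetres.
r_both = pearson_r([2.54 * (v - 7) for v in xs], [0.5 * v + 100 for v in ys])
r_neg = pearson_r([-2 * v for v in xs], ys)     # negative multiplier: sign reverses

print(r0, r_shift, r_scale, r_both, r_neg)
```

Apart from floating-point rounding, the first four values are identical and the last is the same number with its sign reversed.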

It also means that the value of the correlation coefficient is unaffected by a change of scale in one or both variables, for example by measuring in inches instead of centimetres. A further practical implication is that correlation coefficients may be unaffected, or only slightly affected, even by major changes in the population, provided these affect all members of the population in a similar way. For example, the correlation between the heights of fathers and sons may be unchanged even if the sons grow much taller than the fathers, provided the growth is uniform in absolute or proportionate amount. Another possible example is bias in mental tests. It is sometimes supposed that if test results show the same correlation with some external criterion in two different populations, then the test must be 'unbiased' with respect to those populations. As it stands, this inference is unjustified, because the correlations would be unchanged if all the test scores in one population were arbitrarily raised (or lowered) by the same amount, which would surely be a form of bias.

The effect of changes of scale on regression is somewhat more complicated. If we always measure the variables as deviation values, relative to their current means, then the regression coefficients will not be affected by adding constants to one or both raw variables, since the deviation values, the covariance, and the standard deviations are all unchanged, as in the case of correlation. This is not in general true if one or both of the variables are multiplied by constants. For example, if we multiply all of the y values by k, then Sxy, which is the numerator in the Pearson regression formula for bxy, will be multiplied by k, but the denominator, Sy^2, will be multiplied by k^2, so the regression coefficient as a whole will be divided by k. However, the value of the product (bxy)y will be unchanged, since one factor in the product is multiplied and the other divided by k. With deviation values the predicted value of the dependent variable is therefore not affected by a change of scale in the independent variable alone.

If we use the regression formula for raw values, the matter is further complicated. Adding constants to one or both variables will usually affect the 'intercept' of the regression lines with the axes, but not the 'slope', whereas multiplying by a constant is likely to affect both slope and intercept.
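These effects can be sketched with the same caveats as before: invented data, a hypothetical helper called regression, and an arbitrary scale factor k = 10. Here x is the dependent variable and y the independent one.

```python
def regression(dep, ind):
    """Raw-value Pearson regression of dep on ind; returns (slope, intercept)."""
    n = len(dep)
    m_dep = sum(dep) / n
    m_ind = sum(ind) / n
    slope = (sum((d - m_dep) * (i - m_ind) for d, i in zip(dep, ind))
             / sum((i - m_ind) ** 2 for i in ind))
    return slope, m_dep - slope * m_ind


# Made-up illustrative data.
xs = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0, 8.0, 9.0]
ys = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 8.0]
k = 10.0  # hypothetical change of scale

b, a = regression(xs, ys)
b_ky, a_ky = regression(xs, [k * v for v in ys])  # rescale y only
b_y5, a_y5 = regression(xs, [v + 5 for v in ys])  # shift y only
b_kx, a_kx = regression([k * v for v in xs], ys)  # rescale x only

# Deviation-value predictions (bxy)y before and after rescaling y.
my = sum(ys) / len(ys)
pred = [b * (v - my) for v in ys]
pred_ky = [b_ky * (k * v - k * my) for v in ys]

print(b, b_ky, b_y5, b_kx)  # slope: unchanged by the shift, divided or multiplied by k
print(a, a_y5, a_kx)        # intercept: altered by shifting y and by rescaling x
```

The lists pred and pred_ky agree (up to rounding), which is the point made above about the product (bxy)y.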


The above derivation of the regression and correlation coefficients assumes that the 'best estimates' of x given y, and y given x, can be expressed by equations of the form x = a + by and y = a + bx, which may be graphically represented as straight lines. For this reason they are usually known as coefficients of linear regression and correlation. [See Note 2 for this terminology.]

The question may be asked whether the assumption of linearity is justified, either in general or in any particular case.

If the correlation between the variables is perfect (1 or -1), the regressions will predict the value of the variables without error, and in a graphical representation the points representing the pairs of associated values will all fall exactly on the regression line (which in this case is the same for both variables). Here the description 'linear regression' is obviously justified. But perfect correlation is unusual, and more generally there will be some scatter of values around the regression lines. The usual criterion of linearity, adopted from Karl Pearson onwards, is that for each value (or a narrow range of values) of the independent variable, the mean of the associated values of the dependent variable (the associated 'array' of values) should fall on the regression line. By this criterion, if the mean values of all arrays fall exactly on the regression line, the regression is perfectly linear.
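Pearson's criterion can be applied directly when each value of the independent variable has several observations. In this sketch (hypothetical data and helper names), array_gaps returns, for each value of y, the mean of the associated x values minus the value given by the straight regression line of x on y.

```python
from collections import defaultdict


def regression_line(dep, ind):
    """Raw-value Pearson regression of dep on ind; returns (intercept, slope)."""
    n = len(dep)
    m_dep, m_ind = sum(dep) / n, sum(ind) / n
    slope = (sum((d - m_dep) * (i - m_ind) for d, i in zip(dep, ind))
             / sum((i - m_ind) ** 2 for i in ind))
    return m_dep - slope * m_ind, slope


def array_gaps(dep, ind):
    """For each value of ind, mean of its array of dep minus the regression line."""
    a, b = regression_line(dep, ind)
    arrays = defaultdict(list)
    for d, i in zip(dep, ind):
        arrays[i].append(d)
    return {i: sum(ds) / len(ds) - (a + b * i) for i, ds in sorted(arrays.items())}


# Hypothetical data: two x values for each y, scattered symmetrically.
ys = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
xs_linear = [2 * v + 1 + e for v, e in zip(ys, [-2, 2] * 5)]  # array means on 2y + 1
xs_curved = [v * v + e for v, e in zip(ys, [-1, 1] * 5)]      # array means on y^2

print(array_gaps(xs_linear, ys))  # every gap is zero: linear regression
print(array_gaps(xs_curved, ys))  # gaps of 2, -1, -2, -1, 2: not linear
```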

Linear or approximately linear regression, in this sense, is quite common. Notably, it occurs when the distribution of both variables is normal or approximately normal. (Strictly, when the bivariate distribution is normal. The distinction would take too long to explain here.) Francis Galton and Karl Pearson confined their original investigations to this case. Udny Yule extended the treatment of correlation and regression beyond this 'bivariate normal' case, but he considered that linear regression 'is more frequent than might be supposed, and in other cases the means of arrays lie so irregularly, owing to the paucity of the observations, that the real nature of the regression curve is not indicated and a straight line will give as good an approximation as a more elaborate curve'.

Statisticians differ in the importance they attach to linearity. Some say that if there is any significant departure from linearity, then the Pearson regression and correlation formulae are invalid and should not be used. They will give an inefficient estimate which leaves larger 'errors' than would be possible with a more sophisticated approach. Others take a more relaxed view, saying that if the non-linearity is not extreme, a linear regression is a useful approximation. Any non-zero Pearson regression will 'explain' some of the variance in the data, and give a better estimate (on average) than simply taking the mean of the dependent variable. Whether the increase in the 'errors' is a serious problem will depend in part on the purposes of the investigation. If the consequence of error in estimation is a large financial cost, or an injustice to individuals, then it is desirable to seek a more accurate formula.

If the departures from linearity are considered too large, alternatives to simple linear regression may be tried. For example a linear regression may still be obtained if we substitute a suitable function of one or both variables in place of the original values. The best known case (and perhaps the only one commonly arising in practice) is where the logarithms of the original values show a linear regression. This can arise if one of the variables grows or declines at a steady rate of 'compound interest' in relation to the other.
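As an illustration of the 'compound interest' case, the sketch below invents a series x that grows by 5 percent for each unit of y, and fits straight lines to x and to log x. The helper fit is hypothetical.

```python
import math


def fit(dep, ind):
    """Pearson regression of dep on ind: returns (intercept, slope, r)."""
    n = len(dep)
    m_dep, m_ind = sum(dep) / n, sum(ind) / n
    s_cross = sum((d - m_dep) * (i - m_ind) for d, i in zip(dep, ind))
    s_dep = sum((d - m_dep) ** 2 for d in dep)
    s_ind = sum((i - m_ind) ** 2 for i in ind)
    slope = s_cross / s_ind
    return m_dep - slope * m_ind, slope, s_cross / math.sqrt(s_dep * s_ind)


# Hypothetical data: x grows at 5 percent 'compound interest' per unit of y.
ys = list(range(21))
xs = [100 * 1.05 ** t for t in ys]

a_raw, b_raw, r_raw = fit(xs, ys)                         # straight line through raw x
a_log, b_log, r_log = fit([math.log(v) for v in xs], ys)  # straight line through log x

print(round(r_raw, 4), round(r_log, 4))      # raw fit is good but imperfect; log fit is exact
print(math.exp(a_log), math.exp(b_log) - 1)  # recovers the starting value 100 and the 5% rate
```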

Alternatively, the researcher may try fitting a curve (such as a polynomial curve of the form x = a + by + cy^2 + dy^3 + ...) to the data instead of a straight line, the aim being to pass the curve through the means of 'arrays' of the dependent variable. But there is no guarantee that any simple curve will give a good fit to the data, or that it will be any more revealing about the underlying relationships of the variables than a straight line. It should also be emphasised that, unlike with linear regression, there will not necessarily be any simple relationship between the regression of x on y and that of y on x. Each non-linear regression curve has to be separately fitted to the data. The regressions of x on y and y on x may be quite different in form.
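Curve fitting of this kind can be sketched with a quadratic, using the normal equations of least squares. The solver below is a small hand-rolled illustration, not a library routine, and the data (x exactly equal to y^2) are invented to make the asymmetry obvious: the quadratic regression of x on y fits perfectly, while the quadratic regression of y on x is a flat line.

```python
def fit_quadratic(dep, ind):
    """Least-squares fit of dep = a + b*ind + c*ind**2 via the normal equations.

    Returns (a, b, c). A small hand-rolled solver, for illustration only.
    """
    p = [sum(v ** k for v in ind) for k in range(5)]                   # S(ind^k), k = 0..4
    q = [sum(d * v ** k for d, v in zip(dep, ind)) for k in range(3)]  # S(dep.ind^k)
    m = [[p[i + j] for j in range(3)] + [q[i]] for i in range(3)]      # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for j in range(col, 4):
                m[row][j] -= factor * m[col][j]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(coef)


# Hypothetical data: x is exactly the square of y.
ys = [-3, -2, -1, 0, 1, 2, 3]
xs = [v * v for v in ys]

print(fit_quadratic(xs, ys))  # x on y: (0, 0, 1), i.e. x = y^2
print(fit_quadratic(ys, xs))  # y on x: all coefficients zero, a flat line y = 0
```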

Having fitted a curve to the data, as a non-linear regression of x on y or y on x, one may calculate how much of the variance in the dependent variable is 'explained' by the regression. But in the non-linear case there is no simple formula for this, and it will not in general be the same for both regressions. Although the term 'non-linear correlation' is sometimes used, one cannot properly speak of the correlation between two variables in the non-linear case.
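The asymmetry in the variance 'explained' shows up even in the most favourable case, where the fitted curve passes exactly through the means of the arrays. This sketch (hypothetical helper, invented data with x = y^2) computes the share of the variance of the dependent variable accounted for by the array means in each direction.

```python
from collections import defaultdict


def share_explained_by_array_means(dep, ind):
    """Fraction of the variance of dep accounted for by replacing each value
    with the mean of its array (the values of dep sharing the same ind)."""
    arrays = defaultdict(list)
    for d, i in zip(dep, ind):
        arrays[i].append(d)
    means = {i: sum(ds) / len(ds) for i, ds in arrays.items()}
    n = len(dep)
    grand = sum(dep) / n
    total = sum((d - grand) ** 2 for d in dep) / n
    fitted = sum((means[i] - grand) ** 2 for i in ind) / n
    return fitted / total


# Hypothetical data: x is exactly the square of y.
ys = [-3, -2, -1, 0, 1, 2, 3]
xs = [v * v for v in ys]

print(share_explained_by_array_means(xs, ys))  # x on y: 1.0, all the variance
print(share_explained_by_array_means(ys, xs))  # y on x: 0.0, none of it
```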

In some cases a non-linear regression formula may give a good fit to the data but still be of doubtful value. Especially in the social sciences, departures from linearity may be due to lack of homogeneity in the population, for example differences of age, sex, race, class, etc. The relationship between two variables (e.g. educational achievement and IQ) might be linear within each subgroup, but quantitatively different in each such group. The 'best fit' regression line for the whole population would then probably be non-linear, but would depend on the composition of this particular population and have no wider application. Where a population is known to be heterogeneous with respect to the variables of interest, it would be better to disaggregate the data and treat each group separately. Failing that, a straight line regression, which averages out the characteristics of the different groups, may be the most useful single indicator. It is my impression that non-linear regression and correlation are not used much in practice outside the physical sciences, where it is reasonable to expect very precise relationships between variables.

Regression versus correlation?

Regression and correlation are closely related, both mathematically and historically. Some statisticians have however contrasted the roles of regression and correlation, and see one as more useful than the other, or as having different fields of application.

In the time of Karl Pearson and his students the main emphasis was put on the correlation coefficient, which is independent of scale and gives a measure of the extent to which one variable is 'explained' by another. A reaction against this emphasis on the correlation coefficient was led by R. A. Fisher, who said: 'The idea of regression used usually to be introduced in connexion with the theory of correlation, but it is in reality a more general, and a simpler idea; moreover, the regression coefficients are of interest and scientific importance in many classes of data where the correlation coefficient, if used at all, is an artificial concept of no real utility.' (R. A. Fisher, Statistical Methods for Research Workers, 14th edition, 1970, p.129. The quoted passage goes back to the 1920s.) Cyril Burt remarked that 'A correlation coefficient is descriptive solely of the set of figures on which it is based: it cannot profess to measure a physical or objective phenomenon, as a regression coefficient or a covariance may under certain conditions claim to do' (The Factors of the Mind, 1940, p.41). The American statistician John Tukey once joked that he was a member of a 'society for the suppression of correlation coefficients - whose guiding principle is that most correlation coefficients should never be calculated'. More recently, M. G. Bulmer has said: 'It is now recognised that regression techniques are more flexible and can answer a wider range of questions than correlation techniques, which are used less frequently than they once were' (Principles of Statistics, Dover edn., p.209).

This contrast between regression and correlation may seem surprising, as the Pearson coefficients of correlation and regression differ only by a factor of scale, and can be regarded as standardised and unstandardised variants of the same statistic. If we have the information necessary to calculate one of them, we can also calculate the others, since they all involve the covariance of x and y, and the data required for calculating the covariance are sufficient also to determine the coefficients of correlation and regression. But this overlooks the fact that regression coefficients can be estimated from more limited data, without knowing the covariance in the population as a whole. As Fisher pointed out, if we want to know the expected value of x for a given value of y, it is possible to estimate the regression function (whether linear or not) by taking samples of data from a few selected parts of the range of y. Unlike the correlation coefficient, the regression estimate is unaffected by errors in the measurement of x (the dependent variable), provided these errors are unrelated to y and go equally in either direction. The correlation coefficient may also vary according to the nature of the sample (such as restriction of range), in ways that do not affect the regression coefficients so strongly. A correlation coefficient cannot be considered 'objective' unless it is based on a random or representative sample of the relevant population. However, provided this condition is met, the correlation seems to be just as much an objective characteristic of the population as the regressions. It may be argued that the regression coefficients are less likely to vary dramatically in moving from one population to another, but one would wish to see empirical evidence for this in any particular field.
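Both points can be illustrated with a constructed example, assuming measurement errors in x that are balanced within each array of y, and a sample selected on y, the independent variable. The data and the helper slope_and_r are hypothetical.

```python
import math


def slope_and_r(dep, ind):
    """Pearson regression slope of dep on ind, and the correlation r."""
    n = len(dep)
    m_dep, m_ind = sum(dep) / n, sum(ind) / n
    s_cross = sum((d - m_dep) * (i - m_ind) for d, i in zip(dep, ind))
    s_dep = sum((d - m_dep) ** 2 for d in dep)
    s_ind = sum((i - m_ind) ** 2 for i in ind)
    return s_cross / s_ind, s_cross / math.sqrt(s_dep * s_ind)


# Hypothetical data: each y from 1 to 10 appears twice, and the true x is exactly 2y.
ys = [v for v in range(1, 11) for _ in range(2)]
x_true = [2 * v for v in ys]
# Measurement error in x of +3 or -3, balanced within every array of y.
x_obs = [t + e for t, e in zip(x_true, [3, -3] * 10)]

full = slope_and_r(x_obs, ys)
# Select only the middle part of the range of y (4 <= y <= 7).
keep = [i for i, v in enumerate(ys) if 4 <= v <= 7]
restricted = slope_and_r([x_obs[i] for i in keep], [ys[i] for i in keep])

print(slope_and_r(x_true, ys))  # slope 2, r = 1 without measurement error
print(full)                     # slope still 2, r drops to about .89
print(restricted)               # slope still 2, r drops to about .60
```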

The use made of correlation and regression in practice depends on the field of study. Correlation coefficients are still very widely used in psychometrics, where the scale of measurement is often arbitrary and regression coefficients would vary with the choice of scale. In the social sciences, correlation is probably less widely used, whereas regression analysis (usually multivariate regression) is one of the main instruments of research.

Problems of interpretation

Correlation and regression raise various problems of interpretation, some of which are well known, others less so. To list some of the more important ones:

a) Restriction of range
If the sample covers only a limited part of the range of the x variable, or of the y variable, or both, the correlation will usually be weakened.

b) Aggregation of data
If a correlation is calculated between data that have been aggregated or averaged in some way, e.g. geographically, the correlations will often be higher - sometimes much higher - than if they were calculated at a less aggregated level.

Points (a) and (b) are both discussed in an earlier post here.

c) Correlation due to pooling of heterogeneous groups
If we have two population groups, which have different means for the x and y variables, then if the data from the two groups are combined there will be a correlation between x and y even if there is no correlation within each population group.
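A minimal constructed example: each hypothetical group has no correlation at all within it, but because both means are higher in the second group, the pooled data show a strong one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation cov/(sx.sy); the divisor N cancels, so it is omitted."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical group A: four points at the corners of a square, so no correlation within it.
ax, ay = [0, 0, 2, 2], [0, 2, 0, 2]
# Hypothetical group B: the same pattern, with both means shifted up by 10.
bx, by = [v + 10 for v in ax], [v + 10 for v in ay]

print(pearson_r(ax, ay), pearson_r(bx, by))  # 0.0 within each group
print(pearson_r(ax + bx, ay + by))           # about 0.96 when the groups are pooled
```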

d) Correlation due to mathematical relationships
If one of the variables is actually a part of the other (e.g. length of leg as a part of total height), we will naturally expect there to be a correlation between them. Other mathematical relationships between the variables may also give rise to correlations. For example, if the corresponding x and y values are each arrived at by dividing some data by a third variable, which has the same value for the x and y items in each pair but different values for different pairs, then a correlation will arise (sometimes known as 'index correlation') even if the initial data are uncorrelated. Karl Pearson described these as 'spurious correlations', but whether they are really to be regarded as spurious depends on the circumstances.
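Index correlation is easy to reproduce with random numbers. In this sketch the numerators for x and y are drawn independently, and each pair is then divided by its own common divisor; the distributions, the sample size and the seed are arbitrary assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation cov/(sx.sy); the divisor N cancels, so it is omitted."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
N = 10_000
raw_x = [rng.uniform(1, 2) for _ in range(N)]    # hypothetical data, independent of raw_y
raw_y = [rng.uniform(1, 2) for _ in range(N)]
common = [rng.uniform(1, 10) for _ in range(N)]  # divisor shared within each pair

x = [u / c for u, c in zip(raw_x, common)]
y = [v / c for v, c in zip(raw_y, common)]

print(round(pearson_r(raw_x, raw_y), 3))  # close to zero
print(round(pearson_r(x, y), 3))          # strongly positive, around 0.9
```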

e) Correlation between trends
If the x and y data represent quantities which vary over time, they will often show some long term trend: a tendency (on the whole) either to increase or decrease. If any two such data sets are paired, with the corresponding x and y items in the same chronological order, they will show a correlation: positive if the two trends are in the same direction, negative if they are in opposite directions. Such correlations can be very high. I once constructed two artificial data series, with 20 items of increasing size in each, and deliberately tried not to make the increases too regular, but still found a correlation between the two series of .99! Such correlations can arise regardless of the nature of the data. For example, there would doubtless be a positive correlation between prices in England from 1550 to 1600 and real incomes in Japan from 1950 to 2000 (paired with each other year by year), because there was a rising trend in both. In this case no-one is likely to suppose that there was a causal connection between the two trends, but in other cases there is a real danger. If the two variables are of such a kind that there plausibly may be a causal connection, and they are observed over the same period in the same place, there is a risk that any correlation will be taken more seriously than it should be. For example, if we measure the consumption of pornography and the incidence of rape in the same decade in the same country, there is likely to be some correlation between them. If it is positive, the puritan will say: 'Aha, pornography causes rape!'. If it is negative, the libertarian will say: 'Aha, pornography provides a safe outlet for sexual urges!' Both conclusions are unjustified, because the mere existence of a correlation between two trends, no matter how strong, is almost worthless as evidence of anything. Yule called these 'nonsense correlations'. 
He pointed out that in principle a similar problem could arise with geographical trends, such as a north-south gradient, though it was more difficult to find plausible examples. A slightly different case is correlation with wealth or income. Very many traits are correlated with economic prosperity (individual or national), so they are also likely to be correlated with each other. In this case a correlation, even a strong one, between traits is not good evidence of any direct causal connection between them. I would suggest that in the human sciences (psychology, sociology, etc) any very strong correlation (higher than, say, .9) should be viewed with suspicion, and we should examine whether some statistical technicality (such as a grouping effect) is behind it.
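The experiment with artificial rising series is easy to repeat. This sketch (random steps between 0.5 and 1.5, an arbitrary seed, hypothetical helper names) builds two unrelated series of 20 increasing values and correlates them item by item.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation cov/(sx.sy); the divisor N cancels, so it is omitted."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def rising_series(n, rng):
    """n values, each larger than the last by an irregular random step (0.5 to 1.5)."""
    total, out = 0.0, []
    for _ in range(n):
        total += rng.uniform(0.5, 1.5)
        out.append(total)
    return out


rng = random.Random(1550)
series_a = rising_series(20, rng)  # stands in for, e.g., prices over 20 years
series_b = rising_series(20, rng)  # an unrelated series that also happens to rise

print(round(pearson_r(series_a, series_b), 3))  # typically well above .9
```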

f) Correlation and causation
In every textbook the warning is given that 'correlation does not imply causation'. Up to a point this is correct: the examples of index correlations, and of correlations between trends, show that there may be correlations even when there is nothing that we would properly describe as a causal relationship. Unfortunately the textbooks seldom go on to say that correlation usually does imply a causal connection of some kind, even if it is obscure and indirect. The business of the investigator is then to formulate hypotheses to explain the connection, and to find ways of testing them. Sewall Wright's path analysis was designed for this purpose. The main problem arising is how to interpret the relations between more than two variables.

g) Regression towards the mean
The concept of regression also involves a danger of fallacies or paradoxes, which I discussed here.

Note 1: We start with the equation

(1) Ve = [S(x - r.y.sx/sy)^2]/N.

Expanding the expression in square brackets we get:

(2) Ve = (Sx^2 - 2Sxy.r.sx/sy + Sy^2.r^2.Vx/Vy)/N.

But Sx^2 = NVx, also Sxy = Nr.sx.sy, and Sy^2 = NVy, so substituting these expressions where appropriate in equation (2) we get:

Ve = (NVx - 2Nr.sx.sy.r.sx/sy + r^2.NVy.Vx/Vy)/N

= (NVx - 2r^2.NVx + r^2.NVx)/N

= (1 - r^2)Vx.

Note 2: Some confusion has arisen about the meaning of the terms 'linear' and 'non-linear' regression. Traditionally, at least until the 1970s, the term 'linear regression' was confined to cases where the regression equation can be represented graphically by a straight line (or by a plane or hyperplane in the multivariate case). For example: 'If the lines of regression are straight, the regression is said to be linear' (G. Udny Yule and M. Kendall, Introduction to the Theory of Statistics, 14th edition, 1950, p.213), and 'When the regression line with which we are concerned is straight, or, in other words, when the regression function is linear.... ' (R. A. Fisher, Statistical Methods for Research Workers, 14th edition, 1970, p.131). Many other examples could be cited. Regression that is not linear in this sense was described as 'curvilinear' (Yule, p.213) or 'non-linear' (Yule, p.255). More recently some authors have extended the term 'linear regression' to a wider class of functions, including those previously described as 'curvilinear'. Those who adopt this new usage may even accuse those (probably still the majority) who follow the traditional usage of being in error. One wonders what Fisher would have said.