Sunday, March 25, 2007

On Reading Wright   posted by DavidB @ 3/25/2007 08:49:00 AM

Last year I discussed the views of R. A. Fisher on population size, and said I would later cover Sewall Wright's views on the subject. It has taken me a while to come back to this, as I soon realised that my knowledge of Wright was too superficial for the task. On reading more of Wright's work, I think there are also several other issues worth exploring.

I aim to write notes on the following subjects:

1. The measurement of kinship.

2. Inbreeding and the decline of genetic variance.

3. Population size and migration.

4. The adaptive landscape.

5. The shifting balance theory of evolution.

It will take me some time to actually write these notes, but as a starter here are a few comments on the range of Wright's work and its influence...

Wright's Works

Where possible I will quote from the collection Sewall Wright: Selected Papers on Evolution, edited by William B. Provine, U. Chicago Press, 1986 (SPE). Provine's biography of Wright, Sewall Wright and Evolutionary Biology, 1986 (SWEB), is also useful.

SPE is a large collection, including Wright's classic 1931 paper on 'Evolution in Mendelian Populations', his 1943 paper on 'Isolation by Distance', and most of his other general writings on evolution prior to his 4-volume book Evolution and the Genetics of Populations (EGP), published between 1968 and 1978. EGP itself is a strange work, which despite its length deals with some important issues far too briefly, while others are pursued in excruciating detail.

There are two important omissions from the collection SPE. For reasons of length it does not contain Wright's 5-part 1921 paper on 'Systems of Mating', which is the foundation of all his later writings. Fortunately, all 5 parts of the paper are available free online: search Google Scholar for 'Sewall Wright', and 'Systems of mating'. Part 1, which contains most of the essential points, is here.

The other major omission from SPE is that there is no substantial piece on Wright's technique of path analysis. This is Wright's major contribution to statistical theory, and he constantly makes use of the technique. Fortunately, many of his papers on path analysis are also available online, and links are provided here. The most useful paper is the 1921 paper on 'Correlation and Causation', but I noticed that a page (p.561) is missing from the pdf file. As the missing page contains Wright's definition of a path coefficient, this is a serious loss. I have therefore consulted the original print version and transcribed the missing part of the definition at Note 1 below.

Wright's influence

As for Wright's influence, he is routinely cited (along with R. A. Fisher and J. B. S. Haldane, in various permutations) as one of the three founding fathers of population genetics. Unlike that of Fisher and Haldane, Wright's work was largely confined to genetics, with the important exception of path analysis. Within genetics, Wright was probably the most systematic of the three founding fathers, and his active career in the subject was longer. Wright remained active into the 1980s, whereas Fisher and Haldane both died in the early 1960s. On the other hand, Wright's influence may have been limited by the fact that he wrote no book on the subject (or on evolutionary theory more widely) until EGP, the first volume of which appeared in 1968. EGP itself is highly technical and very heavy going. Meanwhile, Haldane's book The Causes of Evolution (1932) was accessible both literally and metaphorically, and Fisher's Genetical Theory of Natural Selection (1930), despite its difficult mathematics, has a great deal of stimulating verbal discussion.

Fisher has a notorious reputation for obscurity, but Wright is hardly any easier to read. The difficulty lies partly in the mathematics and partly in the verbal explanations. Unlike Fisher, Wright seldom uses very advanced mathematics, but his algebra is still difficult to follow. Characteristically, he will set out a few definitions, and then say something like 'it follows that', followed by a complicated formula bearing no obvious relation to what precedes it. Sometimes a few substitutions and rearrangements will produce the desired result, but often (in my experience) repeated attempts leave the mystery unsolved. As an example, consider the equation r = (etc) on p.117 of 'Systems of mating Part 1'. This does not bear any close resemblance to the standard formula for the correlation coefficient, except for being a fraction with a square root in the denominator. By some laborious algebra I have verified that it can be derived from the standard formula, together with Wright's stated assumptions about the value of the quantities to be correlated, but I still have no clue as to how Wright himself obtained his equation. I have also checked by numerical trials that the formula in the next paragraph is correct for the case of Hardy-Weinberg equilibrium, but I have not been able to derive the formula itself algebraically [note 2]. Of course, this may be because I am not a very good mathematician (which is true) but I doubt that most biologists were any better until comparatively recently. Wright's closest collaborator, Theodosius Dobzhansky, once said:

'He has a lot of extremely abstruse, in fact almost esoteric, mathematics. Mathematics, incidentally, of a kind which I certainly do not claim to understand. I am not a mathematician at all. My way of reading Sewall Wright's papers, which I still think is perfectly defensible, is to examine the biological assumptions the man is making, and to read the conclusions he arrives at, and hope to goodness that what comes in between is correct. "Papa knows best" is a reasonable assumption, because if the mathematics were incorrect, some mathematician would have found it out' (quoted in SWEB, p.346).

Dobzhansky was far from unique. Among other examples, the geneticist Harrison Hunt complained in a letter to Wright:

I have one very serious criticism, however, to offer to all these papers. I have expended, yes wasted, an enormous amount of time upon them because as a rule too few key equations are given in the mathematical analysis. You have a marked tendency to state your assumptions very briefly and then give the end results of the mathematical analysis without giving a sufficient number of the intermediate steps to make it easy for a non-mathematical person to follow your reasoning. Upon enquiry I have found other geneticists have the same difficulties that I do. (for more examples and Wright's defence see SWEB pp.400-02).

Wright's verbal explanations also present some difficulties. The problem is not with any obscurity of the language itself (unlike Fisher's often convoluted sentences, Wright's are usually short and simple) but in the excessively concise treatment of difficult subjects. Key concepts like those of path coefficient, correlation between gametes, adaptive landscape, and fitness function are introduced in just a few words, and important assumptions are either not stated at all or stated so inconspicuously that they are easily overlooked. This is not just my impression: even W. G. Hill, an admirer of Wright, and an expert geneticist, comments that Wright's methods 'were not then and are still not easy to understand... partly because critical points were dealt with very tersely'.

The difficulty of Wright's work has probably limited its direct influence on biologists. Here it is necessary to distinguish between specialists in population genetics and other biologists. Wright is the population geneticist's population geneticist. In Crow and Kimura's textbook, for example, Wright gets more references than any other author. Many of Wright's concepts and methods, such as his kinship coefficients, the FST statistic, the effective size of a population, the treatment of migration between subdivided populations, and the proof that genetic diversity declines by approximately 1/(2n) per generation (where n is the effective size of the population), are part of the indispensable basis of population genetics, perhaps more so than any specific contribution of Fisher or Haldane.
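
To give a feel for what a decline of roughly 1/(2n) per generation means in practice, here is a minimal sketch in Python. It is only an illustration, not Wright's own derivation: it assumes an idealised population of constant effective size n, with no mutation, migration or selection, so that expected heterozygosity H follows H[t+1] = (1 - 1/(2n)) * H[t]. The function name and its parameters are hypothetical.

```python
def expected_heterozygosity(h0, n, generations):
    """Expected heterozygosity in each generation under pure drift.

    Illustrative assumptions only: constant effective size n, and no
    mutation, migration or selection, so H[t+1] = (1 - 1/(2n)) * H[t].
    Returns a list of length generations + 1, starting with h0.
    """
    if n <= 0:
        raise ValueError("effective population size must be positive")
    values = [h0]
    for _ in range(generations):
        # Each generation a fraction 1/(2n) of the remaining diversity is lost.
        values.append(values[-1] * (1.0 - 1.0 / (2.0 * n)))
    return values


if __name__ == "__main__":
    series = expected_heterozygosity(h0=0.5, n=50, generations=100)
    for t in (0, 1, 10, 100):
        print(t, round(series[t], 4))
```

With n = 50 the loss is 1 per cent per generation, so under these assumptions it takes about 69 generations for expected heterozygosity to halve. The decline is slow in any one generation but compounds steadily in small populations.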

Among general evolutionary biologists, on the other hand, Wright's influence in the last 40 years or so seems to have been limited. This is partly because the general trend in evolutionary biology, at least since George C. Williams's 1966 book Adaptation and Natural Selection, has been strongly adaptationist, whereas Wright has been seen as ambivalent, if not actively hostile, towards the effectiveness of natural selection. (As Provine pointed out in his biography (SWEB pp.289-91), Wright later tended to re-write history, playing down the extent to which his early writings were non-adaptationist.) Differences between Britain and the United States have also affected Wright's influence. In Britain most biologists learned their genetics directly or indirectly either from Fisher (notably E. B. Ford and the school of 'ecological genetics' at Oxford) or from Haldane (John Maynard Smith and his numerous students). W. D. Hamilton, the most influential of all recent theorists, was largely self-taught in genetics, but took Fisher's Genetical Theory of Natural Selection as his main inspiration. In much British writing on evolution Wright is therefore either ignored or presented only as the advocate of 'genetic drift', which seriously distorts his actual position. (An important exception is the strong school of agricultural genetics at Edinburgh, including Hill, Robertson, Falconer and others, where Wright was a visiting professor around 1950. Another partial exception is Julian Huxley, whose influential book Evolution: the Modern Synthesis summarised Wright's theories respectfully and at reasonable length.)

It might be expected that Wright would be a more powerful influence in the United States. His work is doubtless more widely read there than in Britain. But his influence on evolutionary biologists, as distinct from population geneticists, may have been weaker than expected. Major evolutionary theorists like G. G. Simpson, Ernst Mayr, and George C. Williams referred to Wright with respect, but were not enthusiasts for his shifting balance theory. Dobzhansky, Wright's closest collaborator, expounded Wright's theories very fully in his books on evolution, but his empirical studies tended to undermine some of Wright's key ideas. Wright has perhaps been most influential on those theorists, such as Gould, Lewontin, and D. S. Wilson, who may be considered heretics or rebels against the prevailing trend of Fisherian adaptationism.

As for my own assessment, for what little it is worth, in reading Wright I have realised that his achievement was truly massive. At the same time, I find it difficult to work up any great enthusiasm for his writings. This is partly due to the obscurities I have already mentioned, but also to a certain dryness and narrowness of scope. Whereas one can still read Fisher and Haldane and hope to find new insights and speculations, there is relatively little in Wright that cannot be found in more digestible form in a good textbook. Perhaps this is what every scientist should aspire to: to be absorbed into textbook nirvana.

Note 1.
Extract from p.561
"Where there is a network of causes and effects, the interrelations could be grasped best if a coefficient could be assigned to each path in the diagram designed to measure the direct influence along it. The following is an attempt to provide such a coefficient, which may be called a path coefficient.

We will start with the assumption that the direct influence along a given path can be measured by the standard deviation remaining in the effect after all other possible paths of influence are eliminated, while variation of the causes back of the given path is kept as great as ever, regardless of their relations to the other variables which have been made constant. Let X be the dependent variable or effect and A the independent variable or cause. The expression sX.A [where s stands for small sigma, and X and A are printed as subscripts] will be used for the standard deviation of X, which is found under the foregoing conditions, and may be read as the standard deviation of X due to A. In a system in which variation of X is completely determined by A, B, and C we have s.X.A = sACsX [where s stands for small sigma, and all letters except the first and third sigmas and the second X are printed as subscripts] representing the constant factors, B and C, and also the variation of A itself (sA) by subscripts to the left. The path" [/p.562]

Note 2: In the same paragraph the equation p = √(uv) is a misprint or slip of some kind, as it should clearly be p = 2√(uv). (I have checked that the error is in the original printed text, and not just a glitch in the pdf.) Whatever its origin, the error does not appear to affect the remainder of the paper, which uses the correct value.
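
The correction in Note 2 can be checked numerically with a few lines of Python. This is only a sketch under an assumption that the note itself does not spell out: that u and v stand for the frequencies of the two homozygotes and p for the frequency of heterozygotes, in a population at Hardy-Weinberg equilibrium. The function names are hypothetical.

```python
import math


def hardy_weinberg(q):
    """Genotype frequencies (AA, Aa, aa) for allele frequency q under random mating."""
    return q * q, 2.0 * q * (1.0 - q), (1.0 - q) ** 2


def heterozygotes_from_homozygotes(u, v):
    """Corrected relation p = 2 * sqrt(u * v).

    Assumes u and v are the two homozygote frequencies (an assumed
    reading of the symbols, not confirmed by the note).
    """
    return 2.0 * math.sqrt(u * v)


if __name__ == "__main__":
    for q in (0.1, 0.3, 0.5, 0.9):
        u, p, v = hardy_weinberg(q)
        corrected = heterozygotes_from_homozygotes(u, v)
        misprinted = math.sqrt(u * v)  # the form without the factor of 2
        print(q, p, corrected, misprinted)
```

On this reading the corrected form reproduces the heterozygote frequency at every allele frequency, while the form without the factor of 2 gives exactly half of it.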