Saturday, March 15, 2008

Notes on Sewall Wright: Path Analysis   posted by DavidB @ 3/15/2008 05:46:00 AM

A long time ago I said I was planning a series of posts on the work of Sewall Wright. I am finally getting round to it.

I originally planned to write notes on the following topics:

1. The measurement of kinship.

2. Inbreeding and the decline of genetic variance.

3. Population size and migration.

4. The adaptive landscape.

5. The shifting balance theory of evolution.

I still hope to cover these topics, but I will begin with a few notes on Wright's method of Path Analysis.

Path Analysis is Wright's main contribution to statistical theory. It is one of several methods of multivariate analysis developed in the first third of the 20th century, after the basic theory of multivariate correlation and regression had been established by Karl Pearson and others in the 1890s. Other types of multivariate analysis include Factor Analysis, pioneered by the psychologist Charles Spearman in 1904; Principal Component Analysis, developed by H. Hotelling in 1933 but foreshadowed by Karl Pearson in 1901; and Analysis of Variance, due mainly to R. A. Fisher from 1918 onwards.

A bibliography of Wright's main work on Path Analysis is available here.
The three most useful items are:

1. Correlation and causation (1921)
2. The theory of path coefficients: reply to Niles's criticism (1922)
3. The method of path coefficients (1934)

Items 1 and 3 are available as pdf downloads linked to the online bibliography. Item 2 is not, but it is available here. As I mentioned in a previous post, a page is missing from the pdf file of item 1, but fortunately the most important part of the missing page (the definition of path coefficients) is quoted verbatim in item 2.

The distinctive feature of Wright's path analysis is that it introduces questions of causation into the treatment of correlation and regression between variables. Every statistics textbook makes a ritual statement that 'correlation does not imply causation', but in practice there very often is a causal relationship between correlated variables. Path Analysis provides a systematic means of investigating such relationships. As Wright several times emphasised, it does not provide a method of discovering or proving causal relationships, but if these are known or hypothesised to exist on other grounds, Path Analysis can (in principle) help quantify their relative importance.

The following comments are in no way intended as a substitute for reading Wright's own studies, which are essential. I am only aiming to provide supplementary explanations on points which Wright deals with very briefly, and sometimes obscurely. In particular, I want to clarify the relationship between Path Analysis and multivariate correlation and regression. Wright's own attitude on this seems to have changed over time. It seems that initially he was dissatisfied with what he thought of as paradoxes in the existing methods, and wanted to provide a substantially different approach. But in the course of his work he discovered that his own system was more closely related to conventional multiple regression than he had realised, and increasingly he emphasised this relationship.

In Path Analysis the investigator first devises a model, shown in a path diagram, representing the assumed direction of causal relationships among a number of variables. There will be one or more dependent variables, and one or more independent variables which are assumed to influence the former. Some variables may be intermediate links in a chain of causation. The independent variables may be either correlated or uncorrelated with each other.

Each segment of a path in the diagram is assigned a path coefficient which quantitatively measures the strength of the causal influence along that segment. The fundamental problems in understanding Path Analysis are: what exactly are these path coefficients? And how are they to be quantified? I will return to these questions shortly.

Assuming for the moment that all the path coefficients are known, the correlations between the variables can be derived from the path coefficients by a few simple rules. Briefly, the correlation between any two variables is the sum of the products of the path coefficients along each distinct path (or chain of paths) joining the two variables. For this purpose a correlation between two independent variables can be counted as a path between them. The relative importance of the causal influence of an independent variable on any given dependent variable can be measured by the square of the path coefficient between them, which Wright calls a 'coefficient of determination' (possibly the first use of this term).
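As a rough illustration of these rules, the following Python sketch computes the correlations implied by a set of path coefficients. The function name, the dictionary layout and the numerical values are all assumptions made for the example, not anything taken from Wright. It uses a recursive restatement of the rule: the correlation of an effect with any earlier variable is the sum, over its direct causes, of the path coefficient times the correlation of that cause with the earlier variable. In a diagram without loops this should give the same totals as tracing each connecting path by hand.

```python
def implied_correlations(order, causes, source_corr=None):
    """Correlations implied by path coefficients among standardised variables.

    order       -- variable names, independent variables first, and every
                   cause listed before its effects (hypothetical layout)
    causes      -- {effect: {cause: path coefficient}}
    source_corr -- {frozenset({a, b}): r} for correlated independent variables
    """
    source_corr = source_corr or {}
    r = {}
    for i, v in enumerate(order):
        r[(v, v)] = 1.0  # a variable correlates perfectly with itself
        for w in order[:i]:
            if v in causes:
                # Sum over direct causes: path coefficient times the
                # correlation of that cause with the earlier variable w.
                value = sum(p * r[(c, w)] for c, p in causes[v].items())
            else:
                # Independent variables: use the given correlation, else 0.
                value = source_corr.get(frozenset((v, w)), 0.0)
            r[(v, w)] = r[(w, v)] = value
    return r


# Illustrative diagram: A -> M -> X, plus a direct path A -> X.
paths = {"M": {"A": 0.6}, "X": {"M": 0.5, "A": 0.3}}
r = implied_correlations(["A", "M", "X"], paths)

print(round(r[("X", "A")], 4))  # direct path 0.3 plus chain 0.6 * 0.5
print(round(r[("X", "M")], 4))  # direct path 0.5 plus the route via A, 0.6 * 0.3
```

In this made-up diagram the correlation between X and M (0.68) is larger than the path coefficient between them (0.5), because part of it arises from their common cause A.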

The rules for operating with path coefficients are explained by Wright reasonably clearly in 'Correlation and causation' and later papers. [Note 1] The real difficulty is to understand the nature of the path coefficients themselves. Wright's verbal explanation is that 'the direct influence along a given path can be measured by the standard deviation remaining in the effect after all other possible paths of influence are eliminated, while variation of the causes back of the given path is kept as great as ever, regardless of their relations to the other variables which have been made constant.' This is defined as 'the standard deviation due to' the independent variable in question. The path coefficient itself is then defined as the ratio of this standard deviation to the total standard deviation of the dependent variable.

This definition is not ideally clear, especially for cases where the independent variables are correlated with each other. Various objections were made in a critique of Wright's theory by Henry Niles. In his 'Reply to Niles' Wright admits that 'the operations suggested by the verbal definitions could not be literally carried out in extreme cases, and the definition is therefore imperfect'. Wright points out, however, that the path coefficient can always be calculated by the methods described in his 1921 paper. In the later paper on 'The method of path coefficients' Wright offers a variant on his original definition which is perhaps a little clearer: 'Each [path coefficient] obviously measures the fraction of the standard deviation of the dependent variable (with the appropriate sign) for which the designated factor is directly responsible, in the sense of the fraction which would be found if this factor varies to the same extent as in the observed data while all others .... are constant'.

The problem with both formulations, as Wright was aware, is that in the case of correlated independent variables they seem to require a counterfactual assumption. If all variables other than the dependent and independent variables of interest are held constant, but one or more of those other variables are correlated with both of the variables of interest, then both of the latter variables will have their variability reduced. By insisting that the causative variable of interest retains its full variability, Wright is therefore assuming a counterfactual condition. In order to keep the variability of the causative variable unchanged, Wright says 'the definition of [the standard deviation in X due to M] implies that not only is [the other independent variable] made constant but that there is such a readjustment among the more remote causes .... that [the standard deviation of M] is unchanged' ('Correlation and causation', p.566). What Wright meant by 'readjustment' is unclear to me and, so far as I know, Wright never attempted to explain it. The causal relationships are what they are, and any 'readjustment' sounds like an artificial if not improper procedure.

Rather than make further efforts to decipher Wright's formulations, I think it will be more useful to approach the problem from first principles, drawing on the general theory of correlation and regression as set out in my Notes on Correlation, Parts 1, 2, and 3.

I hope to show that Wright's path coefficients can in fact be derived in a way which avoids the problems of his verbal formulations. I will assume linearity of all relationships. (Wright also in general assumes linearity, but does briefly consider the effects of departures from linearity.) It is presupposed, of course, that items represented by one variable are associated in some way with the items represented by the other variables, e.g. that the height of fathers is associated with the height of sons.

The general idea behind Wright's definition is that variation in one (independent) variable has an effect in producing variation in another (dependent) variable. Since we are assuming linearity, the size of the effect should be simply proportional to the size of the cause. This naturally suggests a connection with statistical regression. The regression of one variable on another measures the average size of the deviation in the dependent variable as a proportion of the associated deviation in the independent variable. In the case of a causal influence, it is therefore reasonable to say that a certain amount of variation in one variable is caused by or 'due to' its regression on the other. The effects caused in this way will have a calculable standard deviation, which can be taken as a measure of the total size of the causal influence.

Case 1
Let us begin with the simplest possible case. Suppose there is one dependent variable, X, and one independent variable, Y. I assume, as usual, that the variables are measured as deviations from their means, in appropriate units (not necessarily the same for both variables). Let the regression coefficient of X on Y be designated bxy. We are assuming that each unit of variation in the items of Y has a simple proportional effect on the corresponding items in X. The proportion must then be equal to bxy, since this is a measure of the proportional mean deviation in X associated with a given deviation in Y. For example, if bxy = .6, then for each deviation of 1 unit in Y there will on average be a deviation of .6 units in X. In general this need not be a causal relationship, but in the present case we are assuming that it is, and that the deviation in X is an effect 'due to' the deviation in Y. The total amount of variation in X that is due to variation in Y will of course depend on the total amount of variation in Y as well as on the regression coefficient. If we designate the standard deviation of Y as sy, the standard deviation in X that can be attributed to the causal influence of Y will be bxy.sy. [Note 2] If the total standard deviation of X is sx, the proportion of the standard deviation of X that is due to Y will therefore be bxy.sy/sx. But since bxy = rxy.sx/sy, this is equal to rxy, the correlation coefficient between X and Y. We therefore find that in this simple case the path coefficient between X and Y equals the correlation coefficient between them.
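The argument of Case 1 can be checked numerically. The sketch below uses a small set of invented paired observations (the numbers and the helper names are assumptions for illustration only) and compares the standard deviation of the fitted deviations bxy.Y, taken as a proportion of the standard deviation of X, with the ordinary correlation coefficient. The slope in this example is positive, so no question of sign arises.

```python
import math

# Hypothetical paired observations (Y is the assumed cause, X the effect).
Y = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
X = [1.5, 2.0, 4.0, 3.5, 6.0, 7.5]


def deviations(values):
    """Express values as deviations from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def sd(devs):
    """Standard deviation of a list of deviations (divisor N)."""
    return math.sqrt(sum(d * d for d in devs) / len(devs))


y, x = deviations(Y), deviations(X)
cov_xy = sum(a * b for a, b in zip(x, y)) / len(y)

b_xy = cov_xy / sd(y) ** 2                 # regression coefficient of X on Y
sd_due_to_y = sd([b_xy * d for d in y])    # s.d. of the deviations bxy.Y
path_xy = sd_due_to_y / sd(x)              # proportion of the s.d. of X due to Y
r_xy = cov_xy / (sd(x) * sd(y))            # ordinary correlation coefficient

print(round(path_xy, 6), round(r_xy, 6))   # the two values agree
```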

Case 2
Turning to a slightly more complex case, let us suppose that Y influences X via an intermediate variable Z, and that Y is uncorrelated with any other variables in the system. Each unit deviation in Y will produce a deviation of bzy in Z, and in turn each unit deviation in Z will produce a deviation of bxz in X. The indirect influence of Y on X through Z will therefore be equal to the product bzy.bxz. Since by assumption there is no other path of influence of Y on X, the product bzy.bxz will measure the total influence of each unit deviation of Y on X. The standard deviation in X due to Y will be bzy.bxz.sy, which as a proportion of the total standard deviation in X is bzy.bxz.sy/sx. On a little examination it can be seen that this is equal to ryz.rxz (since bzy = ryz.sz/sy and bxz = rxz.sx/sz), which we may call the compound path coefficient between X and Y. But by the arguments of the previous paragraph, the path coefficient between Z and X will be rxz and that between Y and Z will be ryz. The product of the path coefficients between Y and Z and between Z and X is therefore ryz.rxz, which is the same as the compound path coefficient between X and Y. It may also be noted that if the sole influence of Y on X is via Z, the partial correlation coefficient between X and Y given Z should be zero, which implies rxy = ryz.rxz. The compound path coefficient between X and Y is therefore the same as rxy, the bivariate correlation between them.
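A small numerical check of the chain Y -> Z -> X may help. The data below are artificial: the three base columns are chosen to be exactly uncorrelated in the sample (they are rows of a Hadamard matrix), so that the assumptions of Case 2 hold exactly rather than approximately. The coefficients 0.8, 0.6, 0.5 and 1.0 and all variable names are arbitrary assumptions.

```python
import math

# Mutually orthogonal, zero-mean columns, so the model assumptions hold exactly.
y = [1, 1, 1, 1, -1, -1, -1, -1]      # independent variable Y
e1 = [1, 1, -1, -1, 1, 1, -1, -1]     # other causes of Z, unrelated to Y
e2 = [1, -1, 1, -1, 1, -1, 1, -1]     # other causes of X, unrelated to Y and Z

z = [0.8 * p + 0.6 * q for p, q in zip(y, e1)]   # Y -> Z
x = [0.5 * p + 1.0 * q for p, q in zip(z, e2)]   # Z -> X


def mean_product(u, v):
    # All columns have zero mean, so this is a variance or covariance.
    return sum(p * q for p, q in zip(u, v)) / len(u)


def corr(u, v):
    return mean_product(u, v) / math.sqrt(mean_product(u, u) * mean_product(v, v))


b_zy = mean_product(z, y) / mean_product(y, y)   # regression of Z on Y
b_xz = mean_product(x, z) / mean_product(z, z)   # regression of X on Z
s_y = math.sqrt(mean_product(y, y))
s_x = math.sqrt(mean_product(x, x))

compound = b_zy * b_xz * s_y / s_x   # bzy.bxz.sy/sx
r_yz, r_xz, r_xy = corr(y, z), corr(x, z), corr(x, y)

print(round(compound, 6), round(r_yz * r_xz, 6), round(r_xy, 6))
```

All three printed quantities coincide, as the argument in the text requires.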

Case 3
The last conclusion can also be applied to the case of a single independent variable Z which affects two dependent variables X and Y. If Z is the only reason for correlation between X and Y, the partial correlation coefficient between X and Y given Z will be zero, which implies rxy = ryz.rxz. But ryz and rxz are also the path coefficients between Y and Z and between Z and X, so the compound path coefficient between X and Y is the same as the correlation between them.
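The step from a zero partial correlation to rxy = ryz.rxz rests on the standard formula for a first-order partial correlation, whose numerator is rxy - rxz.ryz. A minimal sketch, with illustrative values of 0.6 and 0.8 that are assumptions rather than data:

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y with Z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_xz, r_yz = 0.6, 0.8      # illustrative correlations with the common cause Z
r_xy = r_xz * r_yz         # what the single-common-cause model predicts

print(partial_corr(r_xy, r_xz, r_yz))       # 0.0: Z accounts for all of rxy
print(partial_corr(0.70, r_xz, r_yz) > 0)   # True: a larger rxy leaves a residue
```

A non-zero partial correlation would therefore indicate that Z is not the only reason for the correlation between X and Y.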

Case 4
Suppose now that we have two dependent variables, X and Y, and two independent variables, A and B, which are uncorrelated with each other. This gives us two 'paths' between X and Y. Each of these paths can be considered as an example of case 3, so that they will each give an estimate for the correlation between X and Y. The problem is, how can the two estimates be combined? Since A and B are uncorrelated, a plausible guess is that the two estimates should simply be added together. This can be proved more rigorously using the formulae for partial correlation, as is done by Wright ('Correlation and Causation', p.565). The argument can easily be extended to cases with more than two independent variables. The result is that if all the independent variables are uncorrelated with each other, the correlation between two variables is equal to the sum of the products of the correlations along all paths connecting the two variables.
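The additive rule of Case 4 can be illustrated in the same way. In the sketch below A and B are exactly uncorrelated in the sample, as are the two residual columns; the weights attached to them are arbitrary assumptions. The correlation between X and Y is compared with the sum of the products of correlations along the two paths X-A-Y and X-B-Y.

```python
import math

# Mutually orthogonal, zero-mean columns (rows of a Hadamard matrix).
a = [1, 1, 1, 1, -1, -1, -1, -1]      # independent variable A
b = [1, 1, -1, -1, 1, 1, -1, -1]      # independent variable B
ex = [1, -1, 1, -1, 1, -1, 1, -1]     # causes peculiar to X
ey = [1, -1, -1, 1, 1, -1, -1, 1]     # causes peculiar to Y

x = [0.6 * p + 0.5 * q + 0.4 * r for p, q, r in zip(a, b, ex)]
y = [0.3 * p - 0.7 * q + 0.5 * r for p, q, r in zip(a, b, ey)]


def corr(u, v):
    # Columns have zero mean, so mean products are variances and covariances.
    def mp(s, t):
        return sum(p * q for p, q in zip(s, t)) / len(s)
    return mp(u, v) / math.sqrt(mp(u, u) * mp(v, v))


via_a = corr(x, a) * corr(y, a)   # contribution of the path X-A-Y
via_b = corr(x, b) * corr(y, b)   # contribution of the path X-B-Y
r_xy = corr(x, y)

print(round(via_a + via_b, 6), round(r_xy, 6))   # the two values agree
```

Note that the contribution through B is negative here, so the two paths partly cancel; the rule is a sum of signed products.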

Case 5
We have so far assumed that the independent variables are all uncorrelated with each other. Things get more complicated when two or more of the independent variables are correlated (including the case where two 'intermediate' variables lead back to the same independent variable, and are therefore correlated with each other). If we have dependent variables X and Y, and correlated independent variables A and C, the total correlations between X (or Y) and A will be partly attributable to A's correlation with C. [I am avoiding using B to designate variables, as I use it to designate partial regression coefficients.] If we simply added the correlations resulting from the paths X-A-Y and X-C-Y, as in case 4, the correlation between X and Y would be inflated by double-counting, and could well be greater than 1 or less than -1, which is impossible. These considerations suggest that the correlations between a dependent variable and the independent variables cannot in themselves give us the required path coefficients. But this does not tell us what the path coefficients should be, or even guarantee that any suitable measure for the purpose exists.
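The double-counting problem is easy to exhibit with numbers. The sketch below assumes a hypothetical linear model in standardised units, X = 0.5A + 0.5C + residual and Y = 0.5A + 0.5C + residual, with a correlation of 0.8 between A and C and residuals uncorrelated with everything else; all of these values are invented for the example. The correlations follow from ordinary covariance algebra.

```python
# Assumed direct effects in a linear model with standardised variables.
d_xa = d_xc = d_ya = d_yc = 0.5
r_ac = 0.8   # correlation between the independent variables A and C

# Total correlations of each dependent variable with each independent variable.
r_xa = d_xa + d_xc * r_ac
r_xc = d_xc + d_xa * r_ac
r_ya = d_ya + d_yc * r_ac
r_yc = d_yc + d_ya * r_ac

# Correlation between X and Y actually implied by the model.
r_xy = d_xa * d_ya + d_xc * d_yc + r_ac * (d_xa * d_yc + d_xc * d_ya)

# Naive use of the case 4 rule: add the products of the total correlations
# along X-A-Y and X-C-Y, ignoring the correlation between A and C.
naive = r_xa * r_ya + r_xc * r_yc

print(round(r_xy, 3), round(naive, 3))   # 0.9 versus an impossible 1.62
```

Each total correlation of 0.9 already includes the influence transmitted through the other independent variable, so multiplying and adding them counts the shared influence several times over.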

Drawing on the theory of multiple regression and correlation, as developed in Notes Part 3, an alternative measure does suggest itself. It was pointed out there that the partial regression coefficient of X on Y, given Z, measures the independent contribution of Y to the best estimate of X, when Z is held constant. Surprisingly, the partial regression coefficient can serve a dual purpose. When multiplied by the full deviation of the relevant independent variable, it contributes to the best estimate of the value of the dependent variable as given by a multiple regression equation. When multiplied by the residual deviation of the relevant independent variable, after subtracting the estimate derived from the regression on the other independent variable, it gives the best estimate of the residual deviation of the dependent variable. It is not intuitively obvious that the same coefficient can serve these two different purposes, but it is demonstrably the case. Wright's concern about the restricted variability of the independent variable, and the need to 'readjust the more remote causes', therefore seems unnecessary. If we take the partial regression coefficient Bxa.c (see Notes Part 3 for this notation) and multiply it by the full deviation value of A, this should itself be a suitable measure of the independent causal influence of A on X, taking account of C. The standard deviation of the effect of A on X will then be (Bxa.c)sa, which as a proportion of the total standard deviation in X will be (Bxa.c)sa/sx. But this is equivalent to the Beta weight of X on A, given C. (See Notes Part 3.) The suggested value for the path coefficient is therefore equal to the relevant Beta weight. If the variables are measured in units of their own standard deviations, as Wright recommends for most purposes, the partial regression coefficients and Beta weights will coincide.
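The dual role of the partial regression coefficient can be verified on a small artificial data set. In the sketch below the columns, the weights (0.5, 0.3, 0.4) and the correlation of 0.8 between A and C are all assumptions chosen for convenience. The coefficient of A is obtained first from the two normal equations of the multiple regression of X on A and C, and then as the slope of the residuals of X on the residuals of A after each has been regressed on C (in econometrics this equivalence is usually called the Frisch-Waugh-Lovell theorem). The Beta weight is then the same coefficient rescaled by sa/sx.

```python
import math

# Orthogonal zero-mean building blocks (rows of a Hadamard matrix).
u1 = [1, 1, 1, 1, -1, -1, -1, -1]
u2 = [1, 1, -1, -1, 1, 1, -1, -1]
e = [1, -1, 1, -1, 1, -1, 1, -1]

a = [float(v) for v in u1]
c = [0.8 * p + 0.6 * q for p, q in zip(u1, u2)]               # corr(A, C) = 0.8
x = [0.5 * p + 0.3 * q + 0.4 * r for p, q, r in zip(a, c, e)]  # X from A, C, residual


def mean_product(u, v):
    # All columns have zero mean, so this is a variance or covariance.
    return sum(p * q for p, q in zip(u, v)) / len(u)


s_aa, s_cc, s_ac = mean_product(a, a), mean_product(c, c), mean_product(a, c)
s_xa, s_xc, s_xx = mean_product(x, a), mean_product(x, c), mean_product(x, x)

# Purpose 1: partial regression coefficient from the two normal equations.
B_xa_c = (s_xa * s_cc - s_xc * s_ac) / (s_aa * s_cc - s_ac ** 2)

# Purpose 2: slope of residual X on residual A, after regressing each on C.
res_a = [p - (s_ac / s_cc) * q for p, q in zip(a, c)]
res_x = [p - (s_xc / s_cc) * q for p, q in zip(x, c)]
slope_resid = mean_product(res_x, res_a) / mean_product(res_a, res_a)

# Beta weight, the suggested path coefficient: (Bxa.c)sa/sx.
beta_xa_c = B_xa_c * math.sqrt(s_aa) / math.sqrt(s_xx)

print(round(B_xa_c, 6), round(slope_resid, 6), round(beta_xa_c, 6))
```

The first two printed values are the same coefficient reached by the two routes; the third is smaller only because X has a standard deviation different from that of A in these units.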

This is the same as Wright's result, but reached via the theory of multiple regression. By Wright's own account, he did not originally take this approach, and was surprised when late in his investigation of the problem he realised the close connection between path coefficients and multiple regression. (See 'Reply to Niles', p.242.) I would suggest that it would be better to explain Path Analysis from the outset as a 'causalised' version of multiple regression.

The other main question in Path Analysis is how to quantify the path coefficients. If all the correlations between the variables in the system are known (or hypothesised), then the path coefficients can be calculated by using in reverse the rules which enable the correlations to be derived from the path coefficients. (This will sometimes require simultaneous equations, but there should be enough equations to determine the unknowns.) If there are gaps in the available information, these may often be filled by imposing the condition that the 'coefficients of determination' for each variable must, if the scheme of causation is complete, collectively account for the total variance of the variable. Wright also often makes use of the principle that the correlation of a variable with itself is 1.
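For a diagram with two correlated causes A and C of a single effect X, 'using the rules in reverse' amounts to solving two simultaneous equations, and the condition of complete determination then fixes the residual path. The sketch below works this through with invented correlations (0.7, 0.5 and 0.4); the symbol p_xu for the residual path, representing all unmeasured causes assumed uncorrelated with A and C, is a label introduced here for illustration.

```python
import math

# Illustrative correlations for a diagram with correlated causes A and C of X.
r_xa, r_xc, r_ac = 0.7, 0.5, 0.4

# Tracing rules, read as two equations in the unknown paths p_xa and p_xc:
#   r_xa = p_xa + r_ac * p_xc
#   r_xc = p_xc + r_ac * p_xa
det = 1 - r_ac ** 2
p_xa = (r_xa - r_ac * r_xc) / det
p_xc = (r_xc - r_ac * r_xa) / det

# Complete determination: the correlation of X with itself must be 1, so the
# residual path p_xu absorbs whatever A and C jointly leave unexplained.
explained = p_xa ** 2 + p_xc ** 2 + 2 * p_xa * p_xc * r_ac
p_xu = math.sqrt(1 - explained)

print(round(p_xa, 4), round(p_xc, 4), round(p_xu, 4))
```

With more variables the same approach leads to a larger set of linear equations, which is why the text notes that simultaneous equations will sometimes be needed.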

Overall, Wright's method of Path Analysis is a very impressive achievement. It is interesting to note that two of the major methods of multivariate analysis devised in the 20th century were the work of people who were only amateurs in statistics (the other example being Spearman's Factor Analysis).

Despite the scale of Wright's achievement, Path Analysis never seems to have received the same general acceptance as Fisher's Analysis of Variance. For example, it is seldom covered in general textbooks on statistical methods. It seems to have had occasional phases of fashionability in particular fields, notably in sociology, without ever quite becoming part of standard statistical practice. (Incidentally, Wright himself criticised some of the uses it was put to in the social sciences, which can hardly have encouraged would-be practitioners of the method.) Probably one reason for its unpopularity is that Wright's method requires the use of diagrams. Perhaps more important in modern times, it resists reduction to off-the-shelf computerisation. It is impossible to do Path Analysis without a human brain. But it may also be wondered whether Path Analysis has quite justified Wright's hope that it would help clarify causal relationships. Wright himself used it mainly for the narrower purpose of calculating genetic relatedness, where the nature and direction of causal influences is unambiguous. This is seldom the case in other fields. (And even in this field his methods have largely been superseded by Malecot's concept of Identity by Descent, which uses diagrams which look like an application of Path Analysis but are conceptually quite different.) It seems also that Wright was originally motivated by a belief that the existing methods of multiple regression and correlation were inadequate or paradoxical, and needed to be supplemented. But in the process of working out his method, he discovered that it was more closely related to multiple regression than he had realised at the outset. The 'added value' of Path Analysis as compared with other methods may therefore not always justify the extra effort involved in mastering and applying the technique.

Postscript: Since writing this I have found a useful explanation and evaluation of Path Analysis in an article by O. D. Duncan, 'Path Analysis: Sociological Examples', American Journal of Sociology, 72, 1966, 1-16.
Another, more technical, account is given by K. C. Land in 'Principles of Path Analysis', Sociological Methodology, 1, 1969, 3-37.
Both articles are available on JSTOR for those with access.

Note 1: One relatively obscure point is Wright's discussion of the correlation of a variable with itself, which must equal 1. Although Wright discusses this case on several occasions, I do not think he ever gives a path diagram for it, or explains how it would be drawn. I think the best way of doing it would be to insert the self-correlated variable in the diagram twice, perhaps labelled X(1) and X(2).

Note 2: For any given deviation value of Y, the associated deviation value of X will be bxy.Y. The total of the deviations due to Y will be S(bxy.Y), with the summation taken over all values of Y. Since SY is a sum of deviations from the mean, it equals zero, and so does S(bxy.Y), but the sum of squares, S((bxy.Y)^2), will in general be non-zero. The standard deviation in X due to Y will be root-[S((bxy.Y)^2)/N]. But root-[S((bxy.Y)^2)/N] = bxy.[root-(SY^2/N)], the sign of bxy being retained. The expression in square brackets is the standard deviation of Y, so abbreviating this as sy we have shown that the standard deviation in X due to Y is equal to bxy.sy.
