August 17, 2004
Regression to the Mean and Galton’s Fallacy (not!)
In a recent discussion Abiola drew attention to ‘Galton’s Fallacy’. As I am interested in Francis Galton’s work I was curious to know more about the ‘fallacy’.
Anyway, I did some searching, with the following results…
First, it appears that the term ‘Galton’s Fallacy’ is confined to the economics profession, and is comparatively recent. Three papers in particular are frequently cited:
1992: Milton Friedman: ‘Do old fallacies ever die?’, Journal of Economic Literature, 30, 2129-32.
1993: Danny Quah: ‘Galton’s Fallacy and tests of the Convergence Hypothesis’, Scandinavian Journal of Economics, 95, 427-43.
1999: Christopher Bliss: ‘Galton’s Fallacy and economic convergence’, Oxford Economic Papers, 51, 4-14.
All three articles are concerned with the theory of comparative economic development. Some economists believe that there is a tendency for national levels of economic development to converge. To support this theory they offer evidence that countries with high levels of GDP at the start of a long period tend to have more moderate growth rates at the end of that period, while those with low initial levels of GDP have higher growth rates at the end of the period, supposedly leading to convergence over time.
As stated, this evidence is clearly inconclusive. The puzzle is not to see why it is inconclusive but why anyone should want to use it. Why mix up evidence about GDP levels and growth rates, at different points in time, in this convoluted way? Perhaps the reasoning is that high (or low) initial levels of GDP must be the result of previous high (low) growth rates over a long period. If later the countries concerned have growth rates closer to the average, this may suggest that their long-term performance has also converged towards the average - though why anyone should apply such an indirect test when a more direct test is available is still a puzzle. Why not just check whether GDP per head converges over time? If historical data are available for growth rates, they must be available for GDP!
In any event, previous high (or low) growth rates may be due to unique historic circumstances, which cannot be expected to be permanent. High (low) growth rates will therefore probably be followed by growth rates closer to the general average. In statistical terms, the tendency of high or low growth rates to be followed by more moderate rates may just be an example of ’regression towards the mean’ (see Technical Note). This does not necessarily lead to convergence, since there may be enough fluctuation around the mean (including higher or lower growth rates among those countries that were initially ’mediocre’) to maintain the same overall dispersal. To overlook this possibility would be fallacious. Since Francis Galton was the first to identify, name, and explain the phenomenon of regression towards the mean, it would be reasonable to call it ’Galton’s Fallacy’, rather in the sense that Down’s Syndrome is attributed to Dr. Langdon Down, i.e. as its discoverer.
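The statistical point can be checked with a quick simulation. The sketch below (plain Python, with arbitrary made-up parameters, not real GDP data) models each country’s log GDP as a permanent level plus a transitory shock. Countries that start in the top decile then show below-average growth, and those in the bottom decile above-average growth, yet the cross-country dispersion of GDP does not shrink at all:

```python
import random
import statistics

random.seed(42)
n = 10_000

# Each "country": log GDP = permanent level a + transitory shock e_t.
a  = [random.gauss(0, 1.0) for _ in range(n)]   # permanent differences
e1 = [random.gauss(0, 0.5) for _ in range(n)]   # transitory shock, period 1
e2 = [random.gauss(0, 0.5) for _ in range(n)]   # transitory shock, period 2

gdp1 = [ai + e for ai, e in zip(a, e1)]
gdp2 = [ai + e for ai, e in zip(a, e2)]
growth = [g2 - g1 for g1, g2 in zip(gdp1, gdp2)]

# Countries in the top and bottom deciles of initial GDP:
order = sorted(range(n), key=lambda i: gdp1[i])
bottom, top = order[:n // 10], order[-(n // 10):]

mean_growth_top = statistics.mean(growth[i] for i in top)
mean_growth_bottom = statistics.mean(growth[i] for i in bottom)

print(mean_growth_top)      # negative: initially rich countries "fall back"
print(mean_growth_bottom)   # positive: initially poor countries "catch up"
print(statistics.stdev(gdp1), statistics.stdev(gdp2))  # dispersion unchanged
```

There is regression towards the mean in growth rates here by construction, but no convergence of any kind: the spread of GDP in the second period is the same as in the first.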
Turning to the three cited articles, Milton Friedman does not actually use the term ’Galton’s Fallacy’. Friedman refers to the ’regression fallacy’, and says ’After all, the phenomenon in question is what gave regression analysis its name’. In a footnote he adds ’Galton examined the heights of fathers and sons, and found that the sons of tall fathers tended to be shorter than their fathers, i.e. regressed toward the mean; similarly, the fathers of tall sons tended to be shorter than their sons, i.e. regressed toward the mean. This is simply the well-known phenomenon that in a linear regression of x and y, the regression of y on x is flatter relative to the x axis than the regression of x on y’. It isn’t clear from this what Friedman thinks the fallacy is, but on the face of it he does not accuse Galton himself of any fallacy, he merely points out (correctly) that Galton discovered regression.
Danny Quah’s article of 1993 does use the term ‘Galton’s Fallacy’. The article cites no earlier use of the term, but refers to ‘the famous Galton’s fallacy of regression to the mean’. This might suggest that Quah believed that ‘Galton’s Fallacy’ was already an established (even ‘famous’) usage. However, I have not found any earlier example of the term. As to Galton’s role in the ‘fallacy’, Quah says that ‘Galton, in aristocratic manner, was concerned about the sons of tall fathers regressing into a pool of mediocrity along with the sons of everyone else… He could not, however, reconcile this with the population of male heights continuing to display significant cross-section dispersion…’ Quah gives no reference to Galton’s writings, and there is nothing to show that he has actually read any. In Quah’s account the fallacy is to infer (wrongly) from the fact of regression that there will be a reduction of population variance (dispersion). Although Quah does not explicitly accuse Galton of committing this fallacy, his claim that Galton could not reconcile regression to the mean with continued cross-section dispersion does point in that direction.
Christopher Bliss’s article is aimed at elucidating several different meanings of ‘Galton’s Fallacy’, and includes a section on Galton himself. He briefly describes Galton’s discovery of regression: ’Galton noted regression towards the mean for human heights. He examined a sub-sample of fathers selected for their exceptionally high stature, and noted that their sons exhibited on average statures closer to the mean stature of the general population…Galton’s Fallacy was to wrongly infer from these valid observations that a general contraction of the spread of heights in the population was taking place; a reduction in the variance of heights…. Galton’s observation caused him anxiety because he supposed that tall men might play a crucial role in winning wars, so that a reduction in their prevalence might weaken national security…’
Bliss cites no text of Galton’s to support these points, which to be frank are virtually pure fiction. Galton’s pioneering 1885 article on ‘Regression towards mediocrity in hereditary stature’ did not ‘examine a sub-sample of fathers selected for their exceptionally high stature’ (he used a sample covering the full range of heights), and he did not ‘infer that a general contraction of the spread of heights in the population was taking place’; on the contrary, one of his main aims was to explain how the distribution of heights remained constant between one generation and the next. As to tall men being needed for ‘winning wars’, Galton explicitly argued elsewhere (Hereditary Genius, pp. 144-6) that tall men were at a disadvantage in modern warfare, because they were more likely to be shot!
But did Galton in fact commit ‘Galton’s Fallacy’, if by this we mean the belief that regression towards the mean implies a reduction of variance?
So far as I am aware, he did not. He did indeed commit some fallacies connected with regression (see Technical Note), but not this one. In several places (e.g. Natural Inheritance, 1889, pp. 115-117) Galton was quite clear that regression to the mean did not entail any reduction of population variance, as the convergence of some individuals towards the mean was offset by the dispersion of others. Already in his 1885 paper he said that ‘the process comprises two opposite sets of actions, one concentrative and the other dispersive, and of such a character that they necessarily neutralise one another, and fall into a state of stable equilibrium’. Historians of statistics usually credit him with the first clear explanation of this point. It is possible that some of the modern writers who refer to ‘Galton’s Fallacy’ are more muddled on the subject of regression than Galton was, over a hundred years ago.
I also note that none of the three economists considered here seems to have read the important article on ’Regression toward the mean and the study of change’, by J. Nesselroade, S. Stigler, and P. Baltes, in Psychological Bulletin, 1980, 88, 622-37, which preceded their own work by more than ten years. In a more recent article (published after the papers by Friedman and Quah, but before that by Bliss) Stephen Stigler has discussed the history of regression theory at length. He comments: ‘It is fair to say that by 1889 Francis Galton had a clear understanding of the concept of regression… Galton’s grasp of the concepts was as firm as anyone is likely to encounter even today… The recurrence of regression fallacies is testimony to its subtlety, deceptive simplicity and, I speculate, to the wide use of the word regression to describe least squares fitting of curves, lines and surfaces. Researchers may err because they think they know about regression, yet in truth have never fully appreciated how Galton’s concept works. History suggests that this will not change soon. Galton’s achievement remains one of the most attractive triumphs in the history of statistics, but it is one that each generation must learn to appreciate anew, one that seemingly never loses its power to surprise’ (from the essay ‘Regression toward the mean’, reprinted in the book Statistics on the Table, by Stephen M. Stigler).
Technical Note

Regression towards the mean can be defined more precisely as follows. Suppose there are two sets of measurements, which we will call x’s and y’s. They may be measurements of the same kind or of different kinds. We assume that each x measurement is associated in some way with a particular y measurement. Initially the x’s and y’s are expressed in raw units of measurement, e.g. inches or kilograms. To avoid the confounding effect of possible differences in the variability of the x’s and y’s, it is convenient to divide the raw measurements (inches, etc.) by the standard deviation of each variable, so that they are expressed in units of their own variability. (I will call these ‘standardised’ values.) For example, suppose we have a set of measurements of human height, with mean 5’9” and standard deviation 2”. The mean will then have a standardised value of 69/2 = 34½, and a raw measurement of 5’8” will have a standardised value of 34.
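In code, standardisation in this sense is just division by the standard deviation (a trivial sketch, using the figures above):

```python
# Standardised values: raw measurements divided by the standard deviation.
# Figures from the text: mean height 5'9" (69 inches), sd 2 inches.
SD = 2.0

def standardise(inches: float) -> float:
    """Express a height in units of its own variability."""
    return inches / SD

print(standardise(69))  # the mean: 69/2 = 34.5
print(standardise(68))  # 5'8": 68/2 = 34.0
```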
Suppose now that we consider all the standardised values of x within a narrow range, and then examine the standardised values of the y measurements associated with those particular x’s. If the mean of these standardised y values is closer to the standardised mean of all the y’s than the mean of the associated standardised x values is to the standardised mean of all the x’s, then we may say that the y’s regress towards the mean.
Regression towards the mean is commonly found when two variables are imperfectly correlated (i.e. have a correlation other than 1 or -1). Sometimes it is implied that it is always found (e.g. Nesselroade et al say that ‘Lack of perfect correlation and regression toward the mean are essentially equivalent’), but as I pointed out here, it is possible to devise artificial examples where this is not the case. But it is certainly a very common phenomenon.
We can perhaps understand regression best by supposing that each set of variables is the result of two sets of influences, one of which is common to both variables, while the other is unique to one of the variables. Schematically, we may say that x = a + b, while y = a + c, where b and c are independent influences. (In the case of a negative correlation, we suppose that the common influence has a positive effect on one variable and a negative effect on the other.) A high value of x may be the result of high a, high b, or both. The highest values come when both a and b are high. Since the y variables share the common influence a, they will tend to be higher than average when x is high, but as the x’s are also influenced by b, while the y’s are not, the associated y’s will on average not be as high as the x’s. (High b has no tendency to be associated with high c, so the combined effect of a + b is only partly reflected in the value of a + c.) Since the situation of x and y is essentially symmetrical, the x’s associated with a narrow range of values of y will also tend to regress towards their mean. Mutatis mutandis, the same obviously applies to low as well as high values.
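This schematic model is easy to simulate. The sketch below assumes, arbitrarily, that a, b and c are independent standard normal variables, picks out the x’s lying near one standard deviation above their mean, and confirms that the associated y’s sit only about half as far above theirs in standardised units (one half being the correlation between x and y in this set-up):

```python
import random
import statistics

random.seed(1)
n = 100_000

a = [random.gauss(0, 1) for _ in range(n)]  # influence common to x and y
b = [random.gauss(0, 1) for _ in range(n)]  # influence unique to x
c = [random.gauss(0, 1) for _ in range(n)]  # influence unique to y

x = [ai + bi for ai, bi in zip(a, b)]
y = [ai + ci for ai, ci in zip(a, c)]

sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

# A narrow band of x's around +1 standard deviation:
band = [i for i in range(n) if 0.9 * sd_x <= x[i] <= 1.1 * sd_x]

mean_x_std = statistics.mean(x[i] / sd_x for i in band)  # close to 1.0
mean_y_std = statistics.mean(y[i] / sd_y for i in band)  # closer to 0

print(mean_x_std, mean_y_std)
```

By the symmetry noted above, selecting a narrow band of y’s and examining the associated x’s would show the same regression in the other direction.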
There are numerous possible fallacies and misunderstandings concerning regression towards the mean. The so-called Galton’s Fallacy (i.e. the belief that regression implies a reduction in variance) is probably rare, if it occurs at all, outside economics. Here are some of the more common errors or misunderstandings I have come across:
1. Because regression is often illustrated by biological heredity, as in Galton’s original studies, it is sometimes supposed that it is a specifically biological phenomenon. This is not the case.
2. An opposite mistake (or at least an oversimplification), is to describe it as a ’purely statistical’ phenomenon. It is statistical in the sense that it can only be formulated precisely in statistical terms, but it is not a statistical phenomenon in the sense that it is a product of statistical procedures (like sampling error). In every case of regression there are underlying real factors which can be empirically investigated.
3. Regression is sometimes misunderstood as being a causal process occurring in time. But there can be regression from later to earlier events, or between simultaneous phenomena, e.g. between the size of different parts of the same body. It has nothing essentially to do with time.
4. Another elementary error is to suppose that the regression of x on y is the inverse of the regression of y on x, so that, e.g., if fathers who are 6’ tall have sons who are on average 5’10” tall, then sons who are 5’10” tall will on average have fathers who are 6’ tall. The fallacy here is to suppose that ’sons who are 5’10” tall’ are the same set of individuals as ’sons of fathers who are 6’ tall’. But this is not true, because: (a) 5’10” is only the average height of the sons of 6-footers, most (or even all) of whom will be scattered around the average, and (b) men who are 5’10” tall are not exclusively the sons of 6-footers. In fact, most of them will be the sons of fathers who are shorter than themselves (assuming average population height of 5’9”). Galton already made this point clearly in his 1885 paper.
5. One of the most common errors is to assume that if there is regression between variables measured at time T1 and time T2, then this indicates some general underlying causal factor producing the change. There may be, but there need not. For example, if children who perform very well on an IQ or scholastic test at age 10 do less well (on average) at age 15, there need not be any general reason for the decline. There may have been a variety of accidental or temporary factors which raised performance at age 10 but no longer apply at age 15. (Similarly, those who do badly at T1 are likely to do somewhat better at T2, regardless of any special treatment they may have been given in the interim. This is a serious problem in interpreting medical and educational data, and is one reason why it is essential to use a control group.)
6. It is unsafe to study only the extremes of a distribution, and then to generalise the results of that study to the whole distribution. This is because the extremes with respect to a particular trait are likely to be the product of a variety of untypical circumstances, and the individuals concerned will show regression towards the mean in other respects. For example, in studying the brain/body ratio of men and women, it would be unsafe to rely solely on the data for men and women of equal body size, because these are at the extremes of the male and female distributions (relatively small men and relatively large women). Their brain size is likely to regress towards the mean of each sex. So if we find that men of (say) 130 pounds weight have larger brains than women of the same weight, we cannot safely infer that men in general have larger brains relative to body size than women. (Maybe they do, but other evidence is needed to confirm this.)
7. Regression towards the mean in standardised values does not necessarily imply regression in raw values. If tall men tend to have shorter sons (measured in standardised units relative to the mean height of all sons) this does not imply that the sons are shorter when measured in inches. Maybe the entire population of sons is taller than the fathers. More subtly, changes in the standard deviation also need to be taken into account. Suppose that a population of asexual organisms each have several offspring. Suppose also that the mean value of the offspring of an individual for some trait, e.g. longevity, is always the same as that of their parent, but with some scatter around the mean. The mean value of the trait in the whole population will be unchanged, and there is no regression of the offspring when measured in raw values, but there will still be regression when standardised values are used. This is because the scatter of offspring around the parental means increases the standard deviation of the trait in the total population of offspring, and reduces the standardised values compared with the raw measurements.
8. If x regresses on y, and y regresses on z, it does not follow that x regresses on z. (To give a trivial counterexample, if the x’s are identical with the z’s, they do not regress on themselves. Or suppose that the x’s and z’s are MZ twins, while the y’s are their non-twin siblings.) Nor, if we quantify the extent of regression by calculating a regression coefficient, does it follow that the regression of x on z is the sum or product of the regression of x on y and y on z. This is only true in special circumstances.
9. Regression is a property of sets of observations, not of individuals. It is strictly nonsensical to say that a given individual regresses towards the mean relative to some other individual. However, we may estimate or predict the most likely value of an individual based on the regression in a class to which the individual belongs, relative to a class to which some other individual belongs. The problem with this is that individual items (objects, people, organisms, etc), may fall into many different categories at the same time, e.g. a man may be a European, a German, a teacher, a diabetic, etc. The extent of regression towards the mean may be different for each category. To get the best estimate, we should use the regression coefficient for the narrowest category to which the individual belongs. E.g. it would be wrong to use a coefficient applicable to middle-class European males in general, if we know that the individual is a German diabetic teacher and that this subset has different regression characteristics.
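Fallacy 4 above is easy to demonstrate numerically. The sketch below draws father-son pairs from a bivariate normal distribution using the population figures from earlier (mean 5’9”, standard deviation 2”) and an assumed father-son correlation of 0.5 (a made-up figure, for illustration only): the sons of roughly 6’ fathers average well under 6’, but the fathers of sons of that average height are nowhere near 6’:

```python
import random
import statistics

random.seed(7)
n = 200_000
r = 0.5                 # assumed father-son height correlation (illustrative)
mean, sd = 69.0, 2.0    # population mean 5'9", sd 2"

fathers, sons = [], []
for _ in range(n):
    fz = random.gauss(0, 1)
    sz = r * fz + (1 - r**2) ** 0.5 * random.gauss(0, 1)
    fathers.append(mean + sd * fz)
    sons.append(mean + sd * sz)

def mean_of_partner(values, partners, lo, hi):
    """Mean partner height over pairs whose `values` entry lies in [lo, hi]."""
    return statistics.mean(p for v, p in zip(values, partners) if lo <= v <= hi)

# Sons of roughly 6' (72") fathers are shorter than their fathers on average...
sons_of_tall = mean_of_partner(fathers, sons, 71.5, 72.5)
# ...but fathers of sons of that average height are NOT roughly 6':
fathers_of_those = mean_of_partner(sons, fathers,
                                   sons_of_tall - 0.5, sons_of_tall + 0.5)

print(sons_of_tall)       # between 70 and 71 inches
print(fathers_of_those)   # well short of 72 inches
```

The asymmetry arises because men of the sons’ average height are mostly not sons of 6-footers at all, exactly as point 4 explains.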
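The asexual-organism example under point 7 can likewise be checked by simulation. In the sketch below (arbitrary parameters), each offspring’s trait value equals its parent’s plus independent scatter: offspring of high-valued parents show no regression in raw units, but because the scatter inflates the standard deviation of the offspring generation, the same offspring do regress in standardised units:

```python
import random
import statistics

random.seed(3)
n_parents, n_offspring = 5_000, 4
POP_MEAN = 50.0   # population mean of the trait (unchanged across generations)

# Asexual organisms: offspring trait = parent's trait + independent scatter.
parents = [random.gauss(POP_MEAN, 10) for _ in range(n_parents)]
parent_of = [p for p in parents for _ in range(n_offspring)]
offspring = [p + random.gauss(0, 5) for p in parent_of]

# Raw values: offspring of high-valued parents do NOT regress...
high = [(p, o) for p, o in zip(parent_of, offspring) if p > 60]
mean_parent = statistics.mean(p for p, _ in high)
mean_child = statistics.mean(o for _, o in high)
print(mean_parent, mean_child)       # nearly equal: no raw regression

# ...but the offspring generation is more spread out,
sd_parents = statistics.stdev(parents)
sd_offspring = statistics.stdev(offspring)
print(sd_parents, sd_offspring)      # the scatter inflates the sd

# so in standardised units the same offspring are closer to the mean.
parent_std = (mean_parent - POP_MEAN) / sd_parents
child_std = (mean_child - POP_MEAN) / sd_offspring
print(parent_std, child_std)         # child_std < parent_std
```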
Did Galton himself commit any of these errors? Arguably, he did commit fallacies 8 and 9. On several occasions he assumed that regressions could be multiplied together, in the manner of fallacy 8, without adequate justification. This was first pointed out by Galton’s disciple Karl Pearson, who founded the theory of multiple correlation as a generalisation of Galton’s work. Galton also arguably committed fallacy 9, because he believed that in the long term individuals regressed towards the mean of their entire ancestral population, and not to the mean of their direct ancestors. However, this is a debatable point. Galton’s belief in ‘perpetual regression’ was not a simple misunderstanding of statistics, but a consequence of his biological theory of heredity, which included a belief in ‘positions of organic stability’ to which organisms tended to revert unless a new position of stability was reached by a large and sudden ‘jump’. (In this sense Galton was a forerunner of punctuated equilibrium theory, as recognised by Stephen Jay Gould.)
The one thing I have not found anywhere in Galton’s work is ’Galton’s Fallacy’!