Sunday, September 10, 2006

Some notes on g and factor analysis: A rebuttal of The Mismeasure of Man   posted by Darth Quixote @ 11:58 PM


May I end up next to Judas Iscariot, Brutus, and Cassius in the devil's mouth at the center of hell if I ever fail to present my most honest assessment and best judgment of evidence for empirical truth.
-- Stephen Jay Gould


In this post I attempt to lay out the motivation and logic underlying factor analysis and the extraction of g from a battery of mental ability tests. Some mathematical notation will be employed; my feeling is that those readers who have at least probability and lower-division linear algebra will find this notation an aid to understanding. However, the purely verbal and conceptual level at which the exposition is pitched should be enough for all GNXP readers to get something out of it. At the end of the post I will comment on various erroneous or misleading claims put forth by Stephen Jay Gould in his long chapter on factor analysis in The Mismeasure of Man, trying as much as possible not to overlap with the voluminous criticisms that have already been made[1]. You can probably skim or skip to the material at the end of the post with tolerable loss of continuity.

According to Google Scholar, The Mismeasure of Man has been cited nearly 1,000 times and is still piling up approving references in the most prestigious scientific journals (Flint, 2006; Barres, 2006). It continues to be assigned in college courses and touted as the authoritative refutation of the London School of differential psychology. This is all in spite of the negative reviews of the book that have appeared in Science, Nature, Contemporary Education Review, and Intelligence (the latter two by the differential psychologists Arthur Jensen and John Carroll) and scathing commentaries in various outlets by the late Harvard microbiologist and geneticist Bernard Davis (e.g., here). Thus, I regard the placement of this additional protest on record as somewhat of a duty.


All tests of verbal, numerical, spatial, memory, and abstract-reasoning abilities are positively correlated. This is not because mental testers have made them so; given the failure of all attempts to create ability tests with acceptable predictive validity that are uncorrelated in large and representative samples, this positive manifold should be regarded as a fact of nature crying out for explanation.

Recall from an earlier post discussing the French cross-fostering IQ study that factor analysis models an observed mental test score as a linear combination of scores on latent factors, the scalar weights equaling the "loadings" of that particular test on the respective factors. That is,

X_i - mu_i = l_{i1} * F_1 + l_{i2} * F_2 + ... + l_{im} * F_m + epsilon_i

where X_i is the subject's observed score on the ith test, mu_i is the mean of the ith test, l_{ij} is the loading of the ith test on the jth factor, F_j is the subject's score on the jth factor, and epsilon_i is a random error term. We require m < p, where m is the number of factors and p is the number of tests, because we want to explain the various tests in terms of a smaller number of latent factors. This equation is a more general form of the expression in the previous post modeling an individual's Vocabulary subtest score as a weighted sum of the subject's scores on various factors, the weighting of each factor depending on the sensitivity of Vocabulary as a measure of that particular factor. (There is evidence that the regression of some observed indicators on latent factors is nonlinear such that the indicator is not as sensitive at higher levels of ability, but for the most part the assumption of linearity in the common factors "works.")

As there is one equation in the form above for each of p mental tests, we have p such equations that can be compactly written in matrix form as

X = mu + LF + epsilon

where X is the p-by-1 random vector of observed mental test scores, mu is the p-by-1 mean vector of the tests, L is the p-by-m "factor loading" matrix such that the entries of the ith row are the loadings of the ith test on the m factors, F is the m-by-1 random vector of factor scores, and epsilon is the p-by-1 random vector of error terms. Moreover,

E(F) = 0, Cov(F) = I,
E(epsilon) = 0, Cov(epsilon) = Psi

where Psi is a diagonal matrix (meaning that the errors are uncorrelated). The most important point to note here is that the covariance matrix of the factors is also diagonal; this says that the factors underlying the various correlated test scores are themselves uncorrelated. Thus, if the application of the model is successful, the sources of variation accounting for the data are not only fewer than the original number of variables but are independent of each other. Now, by matrix algebra,

(X - mu)(X - mu)'
= (LF + epsilon)(LF + epsilon)'
= (LF + epsilon)(F'L' + epsilon')
= LFF'L' + epsilon(LF)' + LFepsilon' + (epsilon)(epsilon')

Taking expectations, and assuming in addition that F and epsilon are uncorrelated so that the two cross terms vanish, we obtain

Cov(X)
= LIL' + 0 + 0 + E[(epsilon)(epsilon')]
= LL' + Psi

Notice also that

E[(X - mu)F']
= Cov(X,F)
= LE(FF') + E[(epsilon)F']
= L

We thus have a natural interpretation of the factor loadings: the ijth entry l_{ij} of the loading matrix L is the covariance between test i and factor j.
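The two identities just derived, Cov(X) = LL' + Psi and Cov(X,F) = L, can be checked by brute force. What follows is only a minimal simulation sketch: the three loadings, the sample size, and the random seed are made up for illustration and are not taken from any real battery, and the errors are drawn independently of the factor.

```python
import random

random.seed(0)

# Hypothetical loadings of p = 3 tests on m = 1 factor (illustrative only).
L = [0.8, 0.7, 0.6]
# Unique variances chosen so that every test has unit variance.
psi = [1 - l * l for l in L]

n = 50000
scores = []   # rows of X - mu
factors = []  # the matching values of F
for _ in range(n):
    f = random.gauss(0, 1)                    # E(F) = 0, Var(F) = 1
    x = [l * f + random.gauss(0, u ** 0.5)    # X_i - mu_i = l_i * F + epsilon_i
         for l, u in zip(L, psi)]
    scores.append(x)
    factors.append(f)

def mean_product(a, b):
    """Sample estimate of E(ab) for two series with mean zero."""
    return sum(s * t for s, t in zip(a, b)) / len(a)

cols = list(zip(*scores))
sample_cov = [[mean_product(cols[i], cols[j]) for j in range(3)] for i in range(3)]
model_cov = [[L[i] * L[j] + (psi[i] if i == j else 0.0) for j in range(3)]
             for i in range(3)]
cov_with_factor = [mean_product(cols[i], factors) for i in range(3)]

for row_s, row_m in zip(sample_cov, model_cov):
    print([round(v, 2) for v in row_s], [round(v, 2) for v in row_m])
print([round(v, 2) for v in cov_with_factor], L)
```

Up to sampling error, the sample covariance matrix should match LL' + Psi entry by entry, and the covariance of each test with the factor should match its loading.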

The primary goal of factor analysis is to estimate L. How might we go about doing this?[2] It may occur to the reader that this is actually an instance of the diagonalization problem, which consists of finding a "preferred" basis by utilizing the eigenstructure of the given operator over the vector space. We now flesh out this intuition.

Suppose that we administer three mental ability tests from different domains to a large and representative sample. If it helps, we can suppose that these tests are the SAT Verbal, the SAT Math, and a broad test of spatial-visualization.[3]

If the range of ability is unrestricted, each of these three tests will be substantially correlated with the others. For example, the SAT-V and SAT-M, two superficially quite different tests, show a correlation of ~0.65 in the entire population of SAT takers. Suppose that we plot the data points of a large and representative sample of SAT takers; we can put the SAT-V on the x-axis and the SAT-M on the y-axis. We observe, instead of a dispersed cloud, a fairly tight clustering of the data points around an axis (not yet "drawn in") that slopes upward at an angle of 45 degrees from either axis proper. If we add a z-axis to represent scores on the spatial-visualization test, then the data points cluster around the imaginary sloping axis in three dimensions.

Looking at these data, we might suspect that these three distinct tests in fact measure something in common, something more abstract than "verbal," "mathematical," or "spatial ability"--"general ability" perhaps. How might we represent this conjecture? A natural suggestion is to perform a rigid rotation of the coordinate axes so that one of them coincides with the imaginary axis that slopes upward through the heart of the data cloud. If you can visualize how the graph looks now, you will notice that a larger projection on the new "principal" axis means a tendency toward higher standing on all three tests; this conforms exactly to our conjecture that the three disparate tests measure to some extent a single latent dimension.

Is there some objective mathematical criterion that will guide this rotation so that we can do better than eyeballing graph paper? Imagine the variance of projections of the data points on the desired principal axis. As the points are spread out all along the axis, this variance will be fairly large. Now imagine jiggling the axis about so that it departs from this "ideal" orientation. It is clear that the variance becomes smaller; the projections of the data points on the axis as it moves away from the heart of the data cloud get closer and closer together.

It is now evident that our goal is to find a basis for the data space so as to maximize the variance of the projections of the data points on one of its vectors. It turns out that this basis consists of the first m eigenvectors of the correlation matrix R of the battery X of mental test scores. [We use R, which is the standardized form of Cov(X), because mental test scores are not measured in "natural" units such as seconds or meters and so their scaling is arbitrary. In any case standardization may be desirable for purposes of exploratory factor analysis because it puts all the tests on equal footing.]

Before proving this, we need some preliminary results. There exists a decomposition of any symmetric matrix called the spectral decomposition such that the matrix is the weighted sum of the orthogonal projection matrices onto its eigenspaces with each scalar weight equaling the corresponding eigenvalue. It is clear that the spectral decomposition of A can be written as P(Lambda)P' where the columns of the orthogonal matrix P are the normalized eigenvectors of A and Lambda is a diagonal matrix such that the iith entry is the ith largest eigenvalue of A. This is the form of the spectral decomposition used in what follows.

Now, let A be a symmetric p-by-p matrix such that x'Ax is always positive for a nonzero vector x, and let lambda_1 >= ... >= lambda_p be its eigenvalues with corresponding normalized eigenvectors e_1, ..., e_p. We claim that the maximum value of x'Ax over all vectors x of unit length is lambda_1 and that this maximum is attained when x = e_1. To prove this claim, let P and Lambda be the matrices defined in the description of the spectral decomposition and y = P'x. Then

x'Ax
= x'A^(1/2)A^(1/2)x
= x'P(Lambda^(1/2))P'P(Lambda^(1/2))P'x
= y'(Lambda)y
= (sum from i=1 to p)(lambda_i * (y_i)^2)
<= lambda_1 * (sum from i=1 to p)((y_i)^2)

Because P is orthogonal, (sum from i=1 to p)((y_i)^2) = y'y = x'PP'x = x'x. If we constrain x to have unit length, then x'x = 1 and the inequality above says that x'Ax is less than or equal to the largest eigenvalue of A. Now, set x = e_1. This gives y = P'e_1 = (1 0 ... 0)', which in turn gives (e_1)'A(e_1) = y'(Lambda)y = lambda_1. Thus, the claim is proved. Furthermore, it can be shown by similar reasoning that the maximum value of x'Ax where x has unit length and is orthogonal to the eigenvectors e_1, ..., e_k is lambda_{k+1} and that this value is attained when x = e_{k+1}.

All the hard work has been done[4]. From the above it follows that the coordinates of the desired principal axis, on which the projections of the data points have maximum variance, are given by the entries of the eigenvector of R corresponding to the largest eigenvalue, and that this eigenvalue is equal to the total standardized variance accounted for by this dimension. The successive factors are represented by the successive eigenvectors, each one accounting for a portion of the total standardized variance proportional to its eigenvalue. If we want to make the loadings more interpretable, we can standardize them such that (l_{ij})' = [(e_{ij})(lambda_j)^0.5]/SD(X_i), where e_{ij} is the ith entry of the jth eigenvector, is the correlation between test i and factor j.
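For readers who would rather see numbers than proofs, here is a small sketch of the result in action. The correlation matrix is hypothetical: 0.65 is the SAT-V/SAT-M figure quoted earlier, and reusing it for the two correlations with the spatial test is purely an assumption. The leading eigenvector is found by power iteration, which is a convenient stand-in and not the method anyone used historically.

```python
import math

# Hypothetical correlation matrix for three tests. Only the 0.65 between the
# first two tests comes from the post; the other two entries are assumed.
R = [[1.00, 0.65, 0.65],
     [0.65, 1.00, 0.65],
     [0.65, 0.65, 1.00]]

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def unit(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def quad_form(A, x):
    """x'Ax: the variance of the projections on direction x when A = R."""
    return sum(v * w for v, w in zip(x, matvec(A, x)))

# Power iteration: repeatedly apply R and renormalize to converge on e_1.
e1 = unit([1.0, 0.0, 0.0])
for _ in range(200):
    e1 = unit(matvec(R, e1))

lambda1 = quad_form(R, e1)                       # largest eigenvalue
loadings = [v * math.sqrt(lambda1) for v in e1]  # eigenvector entry times sqrt(eigenvalue)
share = lambda1 / len(R)                         # fraction of total standardized variance

# Some other unit-length direction, chosen arbitrarily for comparison.
other_variance = quad_form(R, unit([1.0, 0.5, -0.2]))

print(round(lambda1, 3), [round(l, 3) for l in loadings], round(share, 3))
print(round(other_variance, 3))
```

For this particular matrix the answer is known in closed form (the largest eigenvalue is 1 + 2(0.65) = 2.3, with an eigenvector whose entries are all equal), so the output is easy to check by hand. The arbitrary comparison direction gives a smaller variance, as the claim requires.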

We still have some kinks to work out. If all possible factors (called principal components in this case) are extracted from R, then the loadings of all tests on all principal components are given by the p-by-p matrix L such that LL' = R. But recall that the loading matrix L is supposed to be p-by-m where m < p; after all, dimensionality reduction is what we are after. Usually this is not a problem. If there is structure in the data, then most of the total variance can be accounted for by (far) fewer than p factors and thus L is p-by-m and LL' approximately equals R. But note that this is still not exactly the common-factor model that we initially elucidated, which is

R = LL' + Psi

In other words, we want the loading matrix to account completely for the off-diagonal elements of R (the correlations among the mental tests) and only partially for the diagonal elements (which are augmented by Psi). This is because some portion of the variance in scores on a given test is not accounted for by factors shared with other tests in the battery and instead is idiosyncratic to the test itself. Therefore we must reduce R to R_r such that the iith diagonal element of the latter matrix is no longer unity but rather the communality of the ith test. The communality of the ith test is the portion of its variance accounted for by common factors, i.e., the sum of squares of its loadings on the common factors. Unfortunately, we do not know these sums of squares because the loadings of the tests on the common factors are what we are trying to estimate in the first place. But it can be shown that the squared multiple correlation between the ith test and all of the other tests in the battery is a lower bound for the communality of the ith test. Moreover, we have the option of iteratively estimating the communalities using the loadings derived from the initial estimates until some criterion of convergence is reached. We can therefore estimate R_r by placing the squared multiple correlations in the diagonals. This matrix is then factored until LL' satisfactorily reproduces the off-diagonal elements of R.
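The iterative scheme just described can be sketched for the simplest case of a single common factor. Everything here is illustrative: the correlation matrix is manufactured from four made-up loadings, and the starting communalities are the largest correlation in each row, which I use as a simple stand-in for the squared multiple correlations (computing those would require inverting R).

```python
import math

# Hypothetical one-factor battery: off-diagonal correlations are l_i * l_j.
true_loadings = [0.8, 0.7, 0.6, 0.5]
p = len(true_loadings)
R = [[1.0 if i == j else true_loadings[i] * true_loadings[j] for j in range(p)]
     for i in range(p)]

def leading_eigenpair(A, sweeps=100):
    """Largest eigenvalue and unit eigenvector of A by power iteration."""
    x = [1.0] * len(A)
    for _ in range(sweeps):
        y = [sum(a * v for a, v in zip(row, x)) for row in A]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
    return sum(v * w for v, w in zip(x, Ax)), x

# Starting communalities: largest absolute correlation in each row
# (an assumed substitute for the squared multiple correlations).
h2 = [max(abs(R[i][j]) for j in range(p) if j != i) for i in range(p)]

for _ in range(500):
    # Reduced matrix R_r: communalities replace the ones on the diagonal.
    R_r = [[h2[i] if i == j else R[i][j] for j in range(p)] for i in range(p)]
    eigenvalue, vector = leading_eigenpair(R_r)
    loadings = [v * math.sqrt(eigenvalue) for v in vector]
    new_h2 = [l * l for l in loadings]   # communality = squared loading (one factor)
    converged = max(abs(a - b) for a, b in zip(new_h2, h2)) < 1e-10
    h2 = new_h2
    if converged:
        break

print([round(l, 3) for l in loadings])
```

Because the made-up matrix obeys a one-factor model exactly, the iteration should settle on the loadings that generated it. Real data will not be so obliging, and more than one factor requires extracting several eigenvectors at each step.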

I have applied this methodology to the standardization data for the Wechsler Intelligence Scale for Children-Revised (WISC-R). I described the contents of most of the WISC-R subtests (Information, Similarities, Arithmetic, Vocabulary, Comprehension, Picture Completion, Picture Arrangement, Block Design, Coding) in my previous post, but the standardization included some subtests not used in the French translation:

Digit Span: The subject is given a series of random digits or consonants and must repeat them back in either the same or in reverse order. The latter condition is more difficult.
Tapping Span: The examiner taps out a pattern on a series of four wooden blocks. The subject must tap out the same pattern. Difficulty increases with the length and complexity of the pattern.
Mazes: The subject must trace a correct path through a maze printed on a sheet of paper.

The loadings of the 13 subtests on the first principal factor are as follows:

Information: 0.74
Similarities: 0.73
Arithmetic: 0.63
Vocabulary: 0.77
Comprehension: 0.67
Digit Span: 0.48
Tapping Span: 0.39
Picture Completion: 0.56
Picture Arrangement: 0.54
Block Design: 0.69
Object Assembly: 0.56
Coding: 0.41
Mazes: 0.42

The eigenvalue corresponding to the first principal factor is 4.63. Thus, 4.63/13 = 36% of the total standardized variance in the subtests is accounted for by a factor on which all subtests have salient loadings. The subtests that are least sensitive to this factor involve rote memory and psychomotor skill; those that are most sensitive implicate complex cognitive processes such as learning the meanings of words, discerning commonalities in superficially distinct concepts, and mentally manipulating visual-spatial images. Here we have the notorious Spearman's g: a single dimension measured to differing extents by all complex tests of mental ability.
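The arithmetic connecting the loadings to the eigenvalue is easy to reproduce: the variance accounted for by a factor is the sum of the squared loadings on it. A quick check using the thirteen loadings listed above (which are rounded to two decimals, so the sum can differ from 4.63 in the second decimal place):

```python
# First-principal-factor loadings exactly as listed in the post.
g_loadings = {
    "Information": 0.74, "Similarities": 0.73, "Arithmetic": 0.63,
    "Vocabulary": 0.77, "Comprehension": 0.67, "Digit Span": 0.48,
    "Tapping Span": 0.39, "Picture Completion": 0.56,
    "Picture Arrangement": 0.54, "Block Design": 0.69,
    "Object Assembly": 0.56, "Coding": 0.41, "Mazes": 0.42,
}

# Variance accounted for by the factor = sum of squared loadings.
eigenvalue = sum(l * l for l in g_loadings.values())
# Each standardized subtest contributes one unit of variance to the total.
share = eigenvalue / len(g_loadings)

print(round(eigenvalue, 2), f"{share:.0%}")
```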

Now what exactly is Gould's objection to all this? First, he states repeatedly that "Spearman's g, and its attendant claim that intelligence is a single, measurable entity, provided the only promising theoretical justification that hereditarian theories of IQ have ever had." But notice that in the derivations above the genetic architecture of mental abilities never arises. Whether mental abilities are dominated by a single dimension or are dispersed among a hundred is completely orthogonal to the issue of whether these dimensions (regardless of number) are inherited or susceptible to environmental influences. This logical fallacy on Gould's part was raised by numerous commentators, including James Flynn (a dogged but not dogmatic environmentalist when it comes to the race issue), but Gould refused to amend the second edition of his book (released as a response to The Bell Curve) to meet these criticisms.

The point can be emphasized in another way. It is possible to decompose a phenotypic correlation between mental tests (of the kind that were factor-analyzed above) into genetic and environmental components, provided that the sample is large and contains two types of kinships. The g that is extracted from the matrix of genetic correlations is undoubtedly an estimate of a purely genetic factor, yet it is subject to exactly the same arguments that Gould employs to "destroy" the "chimerical" g. Gould's claims to have demolished the quantitative-genetic theory of the inheritance of mental abilities on the basis of psychometric arguments are thus nonsensical.

Incidentally, the genetic g that is extracted from test covariance matrices in genetically sensitive designs tends to resemble the purely phenotypic g quite closely. For example, Luo, Petrill, and Thompson (1994) found in a moderately large sample of MZ and DZ twins that the magnitudes of a test's respective loadings on phenotypic and genetic g show a correlation of 0.85. From the abstract of a more recent but similar study by Rijsdijk et al. (2002):

This paper addresses the question of how a genetic hierarchical model fits the Wechsler Adult Intelligence Scale (WAIS) subtests and the Raven Standard Progressive test score, collected in 194 18-year-old Dutch twin pairs.... A hierarchical model with the 3 Cohen group-factors ... and a higher-order g factor showed the best fit to the phenotypic data and to additive genetic influences (A), whereas the unique environmental source of variance (E) could be modeled by a single general factor and specifics. There was no evidence for common environmental influences. The covariation among the WAIS group factors and the covariation between the group factors and the Raven is predominantly influenced by a second-order genetic factor and strongly support the notion of a biological basis of g.

Ponder over this some more after the next point.

Gould goes on to claim that innovations in factor analysis developed by the psychometrician L.L. Thurstone (whom he dubs "the exterminating angel of Spearman's g") utterly invalidate the primacy of the dominant dimension extracted from the correlation matrix through its eigenstructure.

Here, Thurstone had his great insight. The principal ... axes of Burt and Spearman do not lie in the only position that factor axes can assume. They represent one possible solution, dictated by Spearman's a priori conviction that a single general intelligence exists.... The real vectors of mind, Thurstone reasoned, must represent independent primary abilities. Thurstone therefore calculated the [eigenvectors] and then rotated them to different positions until they lay as close as they could (while still remaining perpendicular) to actual clusters of vectors. In this rotated position, each factor axis would receive high positive projections for the few vectors clustered near it, and zero or near zero projections on all others. Thurstone referred to the result as a simple structure. He refined the factor problem as a search for a simple structure by rotating factor axes from their [eigenvector] orientations to positions maximally close to clusters of vectors[5].

To understand what Gould is talking about here, I think it helpful to shift geometric interpretations. In the above motivating account of the data's graphical appearance, each person was represented as a point in data space; that is, each coordinate axis represented a test. But it is also possible to represent each test as a point in person space; the projections of the points on a given coordinate axis represent what a single person scored on the various tests. It is best to visualize the data points as endpoints of vectors emanating from the origin. If you can picture this, it should be clear that two vectors representing highly correlated tests will have a small angle between them. Now, the dimensions extracted by factor analysis from the correlation matrix can be represented as "dotted" vectors, if you like, that also emanate from the origin. g is represented by the vector that is the "average" of the vectors representing the tests. So what Gould wants to do is to rotate the factor vectors such that each one passes through a cluster of test vectors that are more highly correlated among themselves than they are to the other tests. If the data permit it, g will then disappear; there will be no factor vector on which all tests have projections.

This can also be put algebraically. What Gould is essentially saying is that there exist an infinite number of loading matrices L such that LL' = R. This situation is analogous to the case in scalar multiplication; given a number z, there exist an infinite number of pairs x, y such that xy = z. Another way to put it is that given LL' = R such that the columns of L are (suitably scaled) eigenvectors of R, it is possible to multiply L by any orthogonal matrix T and still reproduce R perfectly because LT(LT)' = LTT'L' = LL' = R. (Recall that orthogonal matrices represent rigid rotations.)
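The indeterminacy is easy to verify numerically. In the sketch below the loadings of four tests on two factors are invented for illustration, and the rotation angle of 37 degrees is arbitrary; the point is only that the rotated loadings look quite different yet imply exactly the same matrix LL'.

```python
import math

# Hypothetical loadings of four tests on two orthogonal factors.
L = [[0.7, 0.3], [0.6, 0.4], [0.5, -0.4], [0.6, -0.3]]

def rotate(loadings, degrees):
    """Post-multiply the loading matrix by the rotation T = [[c, -s], [s, c]]."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def reproduced(loadings):
    """LL': the matrix of products implied by a loading matrix."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in loadings]
            for r1 in loadings]

rotated = rotate(L, 37.0)   # arbitrary angle

# Largest entrywise difference between LL' and (LT)(LT)'.
gap = max(abs(u - v)
          for row_u, row_v in zip(reproduced(L), reproduced(rotated))
          for u, v in zip(row_u, row_v))

print([[round(v, 3) for v in row] for row in rotated])
print(gap)   # zero up to floating-point rounding
```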

But not any rotation will do. If it is in fact the case that a battery of mental tests measures several distinct dimensions and that a single overarching dimension does not exist, then it should be possible to rotate the factors such that a test has a large projection on only one of them. In the case of three factors and nine tests, the ideal situation looks like this:

1) * / - / -
2) * / - / -
3) * / - / -
4) - / * / -
5) - / * / -
6) - / * / -
7) - / - / *
8) - / - / *
9) - / - / *

The stars represent definitely-not-zero loadings on a factor and the dashes represent loadings that are near zero. In this case tests 1-3 load saliently only on factor A, tests 4-6 load saliently only on factor B, and tests 7-9 load saliently only on factor C. Of course, no factor has salient loadings for all tests. There exist analytic criteria for rotating factors to this simple structure (or doing so as closely as possible), the most popular being Varimax. As its name implies, the Varimax solution seeks the factor orientation such that the variance of squared factor loadings is at a maximum. This means that the loadings for a given factor are driven as much as possible to either unity or zero, which of course will produce something like the schematic table above if the data permit it.
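To see the Varimax criterion doing its job when the data do permit simple structure, consider the following toy sketch. The six-test, two-factor loading matrix is invented, and instead of the usual iterative Varimax algorithm (which typically also normalizes the rows) I simply scan rotation angles in half-degree steps and keep the one with the largest variance of squared loadings. This brute-force search is an illustrative substitute that only works for two factors.

```python
import math

def rotate(loadings, degrees):
    """Post-multiply the loading matrix by the rotation T = [[c, -s], [s, c]]."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    total = 0.0
    for column in zip(*loadings):
        squares = [l * l for l in column]
        mean = sum(squares) / len(squares)
        total += sum((q - mean) ** 2 for q in squares) / len(squares)
    return total

# Hypothetical simple structure: tests 1-3 load only on A, tests 4-6 only on B.
simple = [[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
          [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]]
# Scramble it with an arbitrary rotation so every test loads on both factors.
scrambled = rotate(simple, 30.0)

# Brute-force search over rotation angles from 0 to 89.5 degrees.
angles = [k * 0.5 for k in range(180)]
best_angle = max(angles, key=lambda a: varimax_criterion(rotate(scrambled, a)))
recovered = rotate(scrambled, best_angle)

print(best_angle)
print([[round(v, 3) for v in row] for row in recovered])
```

The recovered matrix matches the original simple structure up to the order and sign of the factors, neither of which is determined by the criterion.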

Gould claims that "Thurstone dispersed g as an illusion" by means of orthogonal rotation, but this is (at best) horrendously misleading. Thurstone's earliest results were based on small and unrepresentative samples. His later studies replicated what all factorial studies based on large and representative samples have found: the correlations among mental tests are so large that they make the achievement of simple structure wholly impossible. In terms of test vectors in person space, clusters of tests are not separated by anything close enough to 90 degrees, which is necessary for the achievement of simple structure.

Take the WISC-R, the principal-factor g loadings of which were given above. The Wechsler scales are possibly the most widely used IQ battery in the world. I do not regard the WISC-R as sampling too narrow a domain of abilities: in the standardization there are 4 verbal tests, 5 spatial-visualization tests, and 4 tests of short-term memory. (Arithmetic probably also measures a numerical factor that gets assigned to error in the factor analysis because there are no other numerical tests in the WISC-R. Psychomotor ability and perceptual speed are probably present in the battery as well.) It is widely agreed that the Wechsler scales measure three lower-order factors (verbal, spatial-visualization, memory) in addition to g. If Gould is right, then it should be possible to rotate these four factors orthogonally so as to produce a good fit to simple structure. The maximum-likelihood Varimax four-factor solution of the WISC-R is as follows:

Information: 0.647 / 0.261 / 0.256 / 0.198
Similarities: 0.650 / 0.313 / 0.196 / 0.108
Arithmetic: 0.326 / 0.180 / 0.366 / 0.449
Vocabulary: 0.791 / 0.216 / 0.253 / 0.210
Comprehension: 0.641 / 0.250 / 0.125 / 0.118
Digit Span: 0.218 / -- / 0.607 / 0.131
Tapping Span: -- / 0.146 / 0.534 / --
Picture Completion: 0.307 / 0.520 / -- / 0.101
Picture Arrangement: 0.325 / 0.428 / 0.146 / --
Block Design: 0.248 / 0.713 / 0.259 / 0.125
Object Assembly: 0.198 / 0.667 / -- / --
Coding: 0.177 / 0.198 / 0.370 / --
Mazes: -- / 0.439 / 0.187 / --

Factors 1, 2, and 3 are identifiable as verbal, spatial-visualization, and short-term memory; their most salient loadings are on tests of recognizably similar content. But the overall picture is nothing like simple structure. Even the Varimax-rotated solution produces highly complex factors that influence performance on tests from quite distinct domains.
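One crude way to quantify the departure from simple structure is to count, for each subtest, how many factors it loads on saliently. The cutoff of 0.2 used below is my own assumption (there is no canonical threshold), and the entries shown as "--" in the table are treated as non-salient since their values are not reported.

```python
SALIENT = 0.2  # assumed cutoff for a "salient" loading; a judgment call

# Varimax loadings as listed above; "--" entries are recorded as None.
varimax = {
    "Information":         (0.647, 0.261, 0.256, 0.198),
    "Similarities":        (0.650, 0.313, 0.196, 0.108),
    "Arithmetic":          (0.326, 0.180, 0.366, 0.449),
    "Vocabulary":          (0.791, 0.216, 0.253, 0.210),
    "Comprehension":       (0.641, 0.250, 0.125, 0.118),
    "Digit Span":          (0.218, None, 0.607, 0.131),
    "Tapping Span":        (None, 0.146, 0.534, None),
    "Picture Completion":  (0.307, 0.520, None, 0.101),
    "Picture Arrangement": (0.325, 0.428, 0.146, None),
    "Block Design":        (0.248, 0.713, 0.259, 0.125),
    "Object Assembly":     (0.198, 0.667, None, None),
    "Coding":              (0.177, 0.198, 0.370, None),
    "Mazes":               (None, 0.439, 0.187, None),
}

def salient_count(row):
    """Number of factors on which a test's reported loading reaches the cutoff."""
    return sum(1 for l in row if l is not None and abs(l) >= SALIENT)

# Under simple structure every test would have exactly one salient loading.
complex_tests = [name for name, row in varimax.items() if salient_count(row) >= 2]

print(len(complex_tests), "of", len(varimax), "subtests load on two or more factors")
```

With this cutoff, 9 of the 13 subtests load saliently on at least two of the supposedly independent factors. A stricter or looser cutoff changes the count but not the overall impression.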

Gould later backtracks somewhat, although not in a coherent way. (Read him and see for yourself.) In order to achieve simple structure in the domain of mental abilities, he admits, the constraint that the factor axes remain orthogonal must be lifted. In this case the factors themselves become correlated; imagine a space where the coordinate axes are no longer at right angles to each other but rather all have components in the same direction. But then the correlation matrix of these first-order factors can be factored in turn, and if only one factor is sufficient to reproduce this matrix then this factor can be nothing other than g. So how does Gould weasel out of this conclusion? In his own words:

Thurstone admitted a second-order g, but he regarded it as secondary in importance to what he continued to call "primary" mental abilities. Quite apart from any psychological speculation, the basic mathematics certainly supports Thurstone's view. Second-order g (the correlation of oblique simple structure axes) rarely accounts for more than a small percentage of the total information in a matrix of tests. On the other hand, Spearman's g ... often encompasses more than half the information. The entire psychological apparatus ... of the British school depended upon the preeminence of g, not its mere presence.

It is hard to know what exactly Gould is saying here. (Probably nothing.) Recall that the representation of the WISC-R g as the first principal factor accounts for only ~35 percent of the total standardized variance, but it clearly dominates the battery as a whole. Moreover, the portion of the total standardized variance accounted for by g is not the same as the portion of the variance in Full Scale IQ that is accounted for by g; the latter is well over 50 percent. The distinction between a "small percentage" and "half the information" is almost empty of content. But this is nitpicking. The important point is that it is absolutely false that the second-order g extracted from the oblique first-order factors differs in any meaningful way from the g represented as the first principal factor. Using a variety of real and simulated data, Jensen and Weng (1994) showed that all varieties of factor analysis not expressly designed to preclude the appearance of g produce highly similar g factors. (Note that this paper was published before Gould reissued The Mismeasure of Man.) For example, the correlation and congruence coefficients between the g factors produced by principal factors and the hierarchical factor analysis pioneered by Thurstone exceed 0.95.

Jensen and Reynolds (1982) present a hierarchical factor analysis of the WISC-R data that I subjected to principal factors above. Below are the g loadings derived from the respective methods:

Information: 0.74 / 0.67
Similarities: 0.73 / 0.67
Arithmetic: 0.63 / 0.57
Vocabulary: 0.77 / 0.72
Comprehension: 0.67 / 0.60
Digit Span: 0.48 / 0.44
Tapping Span: 0.39 / 0.35
Picture Completion: 0.56 / 0.51
Picture Arrangement: 0.54 / 0.49
Block Design: 0.69 / 0.65
Object Assembly: 0.56 / 0.50
Coding: 0.41 / 0.37
Mazes: 0.42 / 0.37

The similarity between these g factors is plain to the naked eye[6]. In fact, the Spearman rank-order correlation between the two sets of loadings is 0.9959. It is true that the loadings from the higher-order g are smaller (principal factors goes "top down," assigning as much variance to g as possible and then giving the leftovers to other factors, while the hierarchical method does exactly the opposite in going "bottom up"), but to claim as Gould does that they are diminished to insubstantiality is nothing short of preposterous. The truth is that the hierarchical g accounts for 30 percent of the total standardized variance--which is about 85 percent of the variance accounted for by the first principal factor!
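Anyone who wants to check these figures can do so from the two columns of loadings above. The sketch below computes the Spearman rank-order correlation (as the Pearson correlation of average ranks, so that ties are handled), the congruence coefficient, and the ratio of the variances accounted for by the two g factors. Since the published loadings are rounded to two decimals, the results may differ slightly from figures computed on unrounded values.

```python
import math

# g loadings from the two methods, in the order listed above.
principal    = [0.74, 0.73, 0.63, 0.77, 0.67, 0.48, 0.39,
                0.56, 0.54, 0.69, 0.56, 0.41, 0.42]
hierarchical = [0.67, 0.67, 0.57, 0.72, 0.60, 0.44, 0.35,
                0.51, 0.49, 0.65, 0.50, 0.37, 0.37]

def average_ranks(values):
    """Rank from 1 (smallest), giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        stop = start
        while stop + 1 < len(order) and values[order[stop + 1]] == values[order[start]]:
            stop += 1
        for k in range(start, stop + 1):
            ranks[order[k]] = (start + stop) / 2 + 1
        start = stop + 1
    return ranks

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

spearman = pearson(average_ranks(principal), average_ranks(hierarchical))

# Congruence coefficient: cosine of the angle between the two loading vectors.
congruence = (sum(a * b for a, b in zip(principal, hierarchical))
              / math.sqrt(sum(a * a for a in principal)
                          * sum(b * b for b in hierarchical)))

# Variance accounted for by each g is the sum of its squared loadings.
hierarchical_share = sum(b * b for b in hierarchical) / len(hierarchical)
variance_ratio = sum(b * b for b in hierarchical) / sum(a * a for a in principal)

print(round(spearman, 4), round(congruence, 4))
print(f"{hierarchical_share:.0%}", f"{variance_ratio:.0%}")
```

From the rounded loadings the hierarchical g comes out at about 30 percent of the total standardized variance, and the ratio to the first principal factor lands a little above 80 percent, in line with the rough figures quoted above.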

Let's close on a constructive note. If it were discovered that every indicator of g increases as the number of + alleles at a locus increases (from 0, to 1, to 2), that would constitute extremely convincing evidence that g does in fact resemble the construct that the London School has traditionally envisioned: a single dimension influencing variation across a broad range of complex mental abilities. And in fact we do have some tentative evidence in this direction. Consider this study by Rujescu et al. (2003):

Correlations between general intelligence (g) and brain volume are about 0.40, and the correlation between g and white matter volume has been reported to be largely due to genetic factors. Establishing that the correlation between brain volumes and cognitive abilities is mediated by shared genetic factors is only the first step in unveiling the relation between them. We have recently shown that methionine at codon 129 in the prion protein is associated with white matter reduction in a group of healthy volunteers and schizophrenic patients. The present study examines the influence of the same genetic variation on psychometric cognitive performance measurements in 335 community-based healthy volunteers. The polymorphism was associated with Full Scale IQ (genotype: F=4.38, df=2/317, P=0.013; allele: F=8.04, df=1/658, P=0.005), as measured by HAWIE-R (German version of the Wechsler Adult Intelligence Scale, Revised). Genotype accounted for 2.7% of the total variability in Full Scale IQ. An exploratory analysis revealed association with several HAWIE-R subscales; the association with the Digit Symbol subtest remained significant after correction for multiple testing. In summary, we deliver evidence for an association of a common genetic variation in the prion protein gene with cognitive performance. However, independent replications are needed before firm conclusions can be drawn.

"Replications are needed" is right, most desirably with a within-family design. (In such a study a much larger sample would be required. Lynch and Walsh (1998) document instances of locus effect sizes being smaller in within-family designs than in association studies, probably because population substructure gives a spurious boost to the effect size of a locus in designs that are insensitive to such substructure.) But especially promising to my mind are the data presented in their Table 2. As reported, the homozygotes differ in IQ by 7 points and the heterozygotes are intermediate. But it seems to me that the trend from low homozygote less-than heterozygote less-than high homozygote holds across all indicators. 7 of the 11 subtests follow this pattern; in one test the low homozygote and the heterozygote are tied; none reverse it.

These results, although admittedly preliminary, are extremely provocative. I believe that they also reinforce the point that, in stark contrast to the degenerative, ad hoc, and atheoretical approaches of its opposition (Gould's lies and obfuscations, stereotype threat, etc.) the London School of differential psychology represents progressive and cumulative science.

[1] Other sharp criticisms of The Mismeasure of Man can be found in Rushton (1997), Eysenck (1998), Hamilton (2002), Bartholomew (2004), and Sesardic (2005). The book Measuring Intelligence: Facts and Fallacies by David Bartholomew, a statistician at the London School of Economics who has written a textbook on factor analysis for the Kendall's Library of Statistics series, is particularly valuable. William Hamilton (yes, that Hamilton; "he who was the greatest theoretical biologist of the twentieth century") is a staunch advocate for those scientists whom he feels Gould has defamed.

[2] What follows is actually mildly antiquated, but it should be sufficient to understand the precise nature of Gould's tomfoolery. The state of the art in estimating the parameters in the common-factor model makes use of structural equation modeling. A structural equation model is based on the following measurement equations:

eta = B(eta) + (Gamma)(xi) + zeta
Y = (Lambda_y)(eta) + epsilon
X = (Lambda_x)(xi) + delta

where eta is m-by-1, B is m-by-m, Gamma is m-by-n, xi is n-by-1, zeta is m-by-1, Y is p-by-1, Lambda_y is p-by-m, epsilon is p-by-1, X is q-by-1, Lambda_x is q-by-n, and delta is q-by-1. The quantities xi and eta are the cause and outcome variables respectively and typically are not directly measured. The quantities Y and X are linearly related to eta and xi and are directly measured. Unknowns do not necessarily have closed-form solutions and in such cases must be estimated by complex iterative computer searches.

The measurement equations contain a very deep and powerful structure; multiple regression and factor analysis fall out of it as special cases. Play around with them to confirm that this is so. Go on, try it; it's fun!

[3] Typical items in a broad spatial-visualization test might ask the subject to predict how a folded napkin pierced with a hole puncher will look when it is unfolded; to decide whether two toy blocks with various numbers and letters on their faces are the same or different (this problem is difficult because the blocks, even if they are the same, lie at different orientations and thus must be mentally rotated by the subject to bring their features into alignment); to examine a complicated Rube Goldberg machine and determine in what direction a certain gear will turn if a certain pulley is worked in a given direction; and so on. g-loaded tests that also strongly measure the spatial-visualization factor tend to show a large male advantage, which means that they are sometimes eschewed in widely used test batteries.

[4] That the eigenstructure of the correlation matrix provides the desired quantities can be also shown by the introduction of a Lagrange multiplier and some matrix calculus. Godless Capitalist (see his previous related posts here and here) assures me that this approach and the one that I employ here relying on the Spectral Theorem are trivially equivalent. I don't have the book where I saw the alternative proof with me right now. But just sitting here thinking about it, I don't see their "trivial" equivalence. Ah ... us poor sub-160-IQ idiots.

[5] My understanding is that in the early days factorists did not literally calculate eigenvectors. The computing power necessary to find the determinants of large matrices was not available back then. So factorists used reasonable approximations such as the "centroid" method.

[6] Look at Table 3 of Jensen and Reynolds (1982) to see how beautifully simple structure for the first-order factors can be achieved if the model allows for g.