Sunday, August 20, 2006
We recently reported on an in-press paper by Dickens and Flynn regarding a possible narrowing of the white-black IQ gap, particularly among children. An inquisitive reader has sent us the following email:
I was waiting for you to comment in depth on the new Flynn paper suggesting that the black-white IQ gap among children has been decreasing over time, but you haven't done so. So I thought I'd just ask you directly what your opinion is. Do you think these findings are valid? Do they represent a real increase in black IQ, and is there any evidence that this increase will be sustained through adulthood? Thanks, I'm fascinated by the subject but I'm not a statistician.
Update: The abstract from this paper by Deary, Der, and Ford (2001) is worth quoting here: The association between reaction times and psychometric intelligence test scores is a major plank of the information-processing approach to mental ability differences. An important but unavailable datum is the effect size of the correlation in the normal population. Here we describe the associations between scores on a test of general mental ability (Alice Heim 4, AH4) and reaction times using a 'Hick'-style device. The sample is 900 people aged 56 years who are broadly representative of the Scottish population. AH4 Part I total scores correlated -.31 with simple reaction time, -.49 with four-choice reaction time, and -.26 with intraindividual variability in both reaction time procedures. The correlation between AH4 scores and the difference between simple and four-choice reaction time was -.15. Separate analyses were conducted after partitioning the total group according to sex, educational level, social class grouping, and number of errors on the four-choice reaction time task. None of these factors significantly altered the effect sizes. This is the first report of reaction time and psychometric intelligence in a large, normal sample of the population. It provides a benchmark for other studies and suggests larger effect sizes than the majority of present studies, which are dominated by young student samples.
End of update. A second update is at the bottom of the post. Our tentative response comes in two parts:
1) I don't think it's likely that the decrease in the gap is real--if you look at the scatterplot, the small linear correlation is a function of two outlier points in the upper right and lower left respectively. Quantitatively, if you do a bootstrap analysis there is a very large confidence interval around that correlation value, indicating that it's uncertain that it's positive/nonzero.
2) I am less interested in psychometrics than I was at the start of the blog--I look at it as sort of like fooling around with punch cards while solid state machines are being developed. That is, the real future is the molecular analysis of intelligence. As I've blogged before, the issue with all the arguments over the Flynn effect is fundamentally that what we are currently measuring with IQ is not a ratio-scale variable and hence is not easily comparable across time and space.
Consider height, for example. It's clear that there has been a genuine, real, and sustained increase in height over the last century among all groups in industrialized countries. Yet this increase has not eliminated the ethnic gaps; the improvements in those aspects of the environment affecting height (nutrition, healthcare, etc.) have acted as a rising tide lifting all boats. We can say all these things with confidence because height is measured in meters--a physical variable--such that we can be sure that someone 1.85 meters tall in the USA in 2006 would have been 1.85 meters tall in Nigeria in 1906. Hence we have direct comparison, without accusations of statistical bias, of measurements made in different times and places.
Now suppose that rather than measuring height directly in meters, we came up with a battery of statistical tests which were generally positively correlated with height. For example, number of lifetime slam dunks, number of lifetime sexual partners (for men), self-reported number of times head has been injured by standing up too fast, proportion of physical confrontations won, etc.
We could then come up, via factor analysis of this matrix, with a statistical measure that is positively correlated with height. However, it would be vulnerable to all sorts of accusations of bias. How do we know that height is positively correlated with number of partners in both the US and Namibia, for example? And so on. Basically this statistical proxy for height is less satisfying than a direct ratio-scale measurement, primarily because the correlations of the indirect manifest indicators with height (which in our example is now a latent, unobservable construct) may not be constant in magnitude across space and time.
A similar situation obtains with IQ. We want to move towards measuring IQ in terms of a battery of physical tests, like reaction time (in seconds) and MRI measurements of brain volumes (in cubic centimeters), and so on. By putting such measurements together we will eventually arrive at a measurement--entirely derived from simple physical units rather than binary right/wrong responses on the complex measurement devices which are mental test items--which is correlated extremely highly with existing measures of IQ. At that point we discard the old IQ tests for many purposes and start using this battery instead. The key point is that since all the inputs to this measure are in physical units (seconds, cubic centimeters, etc.) we can directly compare measurements made across space and time, and hence resolve the Flynn effect.
Expect more on the narrowing of the white-black IQ gap and the issue of mental measurement in the coming weeks.
Update #2: In response to the above exchange, the following question was asked:
Should observations that are means of hundreds or thousands of observations be treated in the same way as single data points when testing the significance of a correlation?
Our own GC was kind enough to respond:
short answer: no.
long answer: the appropriate thing to do in this case would be to calculate a non-parametric bootstrap estimate of the sampling distribution of the correlation. That sounds much more complicated than it is. What you would do is this:
1) Start with your original raw data. I am assuming it is in matrix form, with N observations on P variables.
2) Sample this data, with replacement, to generate a new synthetic data set. For now, suppose that the number of samples is some fraction, say 10%, of the original size of the data N.
3) From this synthetic data set, calculate your means.
4) From your means, calculate your correlation. Store this value in an array.
5) Repeat steps 2 through 4 many times, say K=1000 times. You will now have 1000 values of correlations stored in your array.
6) Then look at the histogram for the correlation values stored in your array. This array gives you an empirical distribution for the sampling distribution of the correlation which makes no assumptions about the data (i.e. does not assume it to be normal, or even to be continuous). From this empirical distribution you can obtain (for example) a 95% confidence interval for the sample correlation.
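The steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the analysis from any paper: the raw data is assumed to be grouped (each group contributes one pair of means), resampling is done within each group so that no group ends up empty, and the function names, the `fraction` parameter, and the synthetic data generator are all made up for the example. The interval returned is the naive percentile interval, not the bias-corrected, accelerated one. Note that `fraction=0.1` follows step 2; resampling each group at its full size (`fraction=1.0`) is the more usual bootstrap choice.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_corr_of_means(groups, k=1000, fraction=0.1, seed=0):
    """Return k bootstrap replicates of the correlation between group means.

    groups:   dict mapping a group label to a list of (x, y) raw observations
              (hypothetical layout, assumed for this sketch).
    fraction: size of each resample relative to the group's size.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(k):  # step 5: repeat K times
        mean_x, mean_y = [], []
        for rows in groups.values():
            m = max(1, int(fraction * len(rows)))
            sample = rng.choices(rows, k=m)  # step 2: sample with replacement
            mean_x.append(sum(x for x, _ in sample) / m)  # step 3: means
            mean_y.append(sum(y for _, y in sample) / m)
        replicates.append(pearson(mean_x, mean_y))  # step 4: store correlation
    return replicates


def percentile_interval(values, level=0.95):
    """Naive percentile interval read off the sorted replicates (step 6)."""
    s = sorted(values)
    lo = int((1 - level) / 2 * len(s))
    hi = max(lo, int((1 + level) / 2 * len(s)) - 1)
    return s[lo], s[hi]


def make_synthetic_groups(n_groups=8, n_per_group=200, seed=1):
    """Generate made-up data: group means of y loosely track group means of x."""
    rng = random.Random(seed)
    groups = {}
    for g in range(n_groups):
        mu_x = float(g)
        mu_y = 0.5 * g + rng.gauss(0, 1)
        groups[g] = [(rng.gauss(mu_x, 3), rng.gauss(mu_y, 3))
                     for _ in range(n_per_group)]
    return groups


if __name__ == "__main__":
    reps = bootstrap_corr_of_means(make_synthetic_groups(), k=1000)
    low, high = percentile_interval(reps)
    print("95% percentile interval for the correlation of means:", low, high)
```

Because each replicate recomputes the means from resampled raw observations, the spread of the stored correlations reflects how well each mean is pinned down, which is exactly what is lost when the means are treated as single data points.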
You can of course repeat this for other measures of association (e.g. Spearman's rho, Kendall's tau, etc.) or in general to get a handle on any function of the data. Bootstrapping may seem like magic but many powerful convergence theorems have been proven (google "Edgeworth expansion bootstrap" to get the gist of why the bootstrap works). See the bit on bias-corrected, accelerated (BCa) bootstrap percentile intervals in the appendix linked here. Naive percentile confidence intervals are ok, but you should use the BCa intervals since it's the same price computationally.
cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf
R function:
http://stat.ethz.ch/R-manual/R-patched/library/boot/html/boot.ci.html
by the way, the basic rule of thumb is that if you have a lot of data, you shouldn't waste it. go nonparametric and computer intensive.
the days when we had to just settle for calculating means are over. you can now do exploratory data analysis of monster data sets in R, looking at the full multivariate distribution, rotating it, conditioning on tons of variables, and so on. See Rggobi and the lattice package for some examples, and see here:
addictedtor.free.fr/graphiques/
