A new preprint, Cultural Evolution of Genetic Heritability, is useful at least as a literature review for the uninitiated:
Behavioral genetics and cultural evolution have both revolutionized our understanding of human behavior, but largely independently of each other. Here we reconcile these two fields using a dual inheritance approach, which offers a more nuanced understanding of the interaction between genes and culture, and a resolution to several long-standing puzzles. For example, by neglecting how human environments are extensively shaped by cultural dynamics, behavioral genetic approaches systematically inflate heritability estimates and thereby overestimate the genetic basis of human behavior. A WEIRD gene problem obscures this inflation. Considering both genetic and cultural evolutionary forces, heritability scores become less a property of a trait and more a moving target that responds to cultural and social changes. Ignoring cultural evolutionary forces leads to an over-simplified model of gene-to-phenotype causality. When cumulative culture functionally overlaps with genes, genetic effects become masked, or even reversed, and the causal effect of an identified gene is confounded with features of the cultural environment, specific to a particular society at a particular time. This framework helps explain why it is easier to discover genes for deficiencies than genes for abilities. With this framework, we predict the ways in which heritability should differ between societies, between socioeconomic levels within some societies but not others, and over the life course. An integrated cultural evolutionary behavioral genetics cuts through the nature-nurture debate and elucidates controversial topics such as general intelligence.
I’m not sure the modeling here resolves as much as the abstract promises, though it does push the ball forward. In any case, a cultural evolution framework clarifies and makes more precise what was always well understood in quantitative and behavior genetics. Heritability is simply a population-level statistic that is always conditional on various environmental parameters. The heritabilities of height and intelligence are likely both higher in WEIRD environments because of cultural homogeneity. That homogeneity reduces the environmental component of variance and so increases the share of variation attributable to genetics.
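To make that last point concrete, here is a minimal sketch assuming the simplest additive decomposition, with no gene-environment covariance or interaction. The function name and the variance values are illustrative assumptions, not figures from the preprint.

```python
def heritability(genetic_var: float, environmental_var: float) -> float:
    """Share of phenotypic variance attributable to genetic variance.

    Assumes V_P = V_G + V_E, i.e. no covariance or interaction terms.
    """
    total = genetic_var + environmental_var
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return genetic_var / total


# Same genetic variance, two hypothetical environments (made-up values).
GENETIC_VAR = 1.0
h2_homogeneous = heritability(GENETIC_VAR, environmental_var=0.25)
h2_heterogeneous = heritability(GENETIC_VAR, environmental_var=1.0)

print(f"homogeneous environment:   h2 = {h2_homogeneous:.2f}")
print(f"heterogeneous environment: h2 = {h2_heterogeneous:.2f}")
```

Nothing about the genes differs between the two calls; only the environmental variance in the denominator changes, which is why heritability is a property of a population in a setting rather than of a trait.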
The authors take pains to distinguish their framework from gene-environmental interactions or gene-environment correlations. These two are widely explored in the behavior genetic literature. Rather, they suggest that cultural evolutionary pressures and characteristics over time modulate the effect size and direction of various SNPs on a trait. They suggest that cultural evolutionary modeling can help more easily explain the Flynn effect.
This preprint makes a lot more sense when you consider that the last author has written about the importance of theory in understanding and exploring scientific domains. I think the big theme of this preprint is basically to remove the environment from the domain of ad hoc noise residuals. In fact, they state this clearly, insofar as cultural variation is not simply ad hoc noise, but often exhibits directionality. In societies with more environmental variance on a trait, obviously the heritability will be lower, and vice versa. These are novel enough insights, though I’m not sure that one can say the problems were solved in their dual-inheritance modeling.
Update: I received the below from a friend who has a long critique of this preprint.
I read this yesterday and noticed some problems so I emailed the authors to tell them. I’ll repeat them here. A lot of them are pedantic but I think they were necessary to say. I’ve edited this a bit from the email. Hopefully they use the comments to improve the study. Preemptively, I want to say that I started this around 1 and finished at 2:45; by the end of writing it I had gotten bored of writing more and just wanted to get it done so it starts to skip at the end.
>These correlations are highly statistically significant (typically at least p < 5 × 10⁻⁸; Fadista et al., 2016)
The typical SNP does not have a correlation at this level. If you use a program like LDPred, you can use all available SNPs to construct a better predictor, but in a GWAS, many of these will be far below the Wellcome Trust-recommended 5e-8 significance level. It may be that you meant the typical SNP included in a polygenic risk score (PRS) but this should be clarified.
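For readers unfamiliar with the 5e-8 figure: one commonly cited rationale is a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent common-variant tests. The sketch below assumes that rationale; the test count is a conventional round number, and the p-values are invented for illustration.

```python
FAMILYWISE_ALPHA = 0.05
# Assumption: a round figure often used for independent common-variant tests.
ASSUMED_INDEPENDENT_TESTS = 1_000_000


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


GENOME_WIDE_THRESHOLD = bonferroni_threshold(
    FAMILYWISE_ALPHA, ASSUMED_INDEPENDENT_TESTS
)


def is_genome_wide_significant(p_value: float) -> bool:
    """True if a single SNP's p-value clears the genome-wide threshold."""
    return p_value < GENOME_WIDE_THRESHOLD


# Hypothetical p-values: one lead SNP and one SNP that might still be
# useful inside a polygenic score despite missing the threshold.
for label, p in [("lead SNP", 3e-9), ("sub-threshold SNP", 1e-5)]:
    print(label, is_genome_wide_significant(p))
```

The second SNP is the kind the comment above has in mind: it can contribute to a predictor built from all available SNPs while sitting far short of genome-wide significance on its own.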
>As Plomin and von Stumm (2018) put it, genome-wide polygenic scores “are an exception to the rule that correlations do not imply causation in the sense that there can be no backward causation….”
While I like this quote, genome-wide significant SNPs are not necessarily causal. A recent spate of papers has documented cases where, for instance, effects result from confounding in the form of – sometimes subtle – population stratification. There are many reasons this can happen and they put up a (not insurmountable) barrier to causal interpretations even if the associations cannot technically be reverse-caused in the sense Plomin & von Stumm are discussing.
>The last two decades have also seen a parallel revolution in cultural psychology and cultural evolution that have identified cultural correlates of our psychology and behavior… these range from [various, to] personality (Smaldino et al. 2019).
I have not reviewed the methodology of all of the studies you cited, but if they are like Smaldino et al.’s, then you will have problems legitimately linking these aspects of cognition to cultural variation. For instance, in Smaldino et al., they suppose that personality factor covariances vary with a measure of cultural complexity (this sort of variation can also be genetic and cultural complexity can be confounded with that). Note that these personality factors are not comparable by culture (their earlier work, a paper on sagepub, tested for measurement invariance – a lack of psychometric bias – and found no such support, and yet they kept hypothesizing). When testing measurement invariance, factor covariances are typically not constrained to equality by group, but if one of the biggest benefits of achieving strict factorial invariance (SFI) – equal error/reliability of the test – is to be found, you have to constrain factor covariances to equality (the reason this is typically not emphasized is that it doesn’t apply to, e.g., bifactor or higher-order models, though this criticism does apply to correlated group factor models like the Big Five). Their data source shows differential reliabilities by culture and no one, as far as I am aware, has found support for SFI for personality factors between cultures (every test I have run has failed it). The interpretation of their observations in support of an evolutionary/cultural hypothesis is, thus, possibly a statistical necessity of the factors not being comparable in a certain way. I found this to be the case in two separate datasets (Johnson’s and another one gathered via an ‘open psychometrics’ website) using their complexity measures (the data they referenced for personality factors was not full enough for multi-group confirmatory factor analysis) and the NEO-IPIP and Cattell’s 16PF over a number of countries. 
To say that they found support for cross-cultural personality variation – or for their hypotheses of interest – has not really been shown. This problem plagues all cross-cultural psychological research.
>Specifically, cultural evolution suggests that behavioral genetics overestimates the predictive power of genes. This overestimation is obscured by a WEIRD gene problem that severely restricts the range of sampled environments.
Please consult DeFries on the effect of gene-environment covariation on between and within-group heritability. If, as I presume, cultural evolution emphasizes positive gene-environment correlations across countries (e.g., ‘British genes’ covarying with ‘British culture’), then it will lead to overestimates of additive environmental (rather than genetic) variance, not overestimation of heritability. This is a matter of mathematical necessity when you have positive covariances and I believe thinking otherwise might be traced to the same variety of error that sees groups like Smaldino’s failing to notice the relationship between factor covariances and indicator reliability. On that topic, I would recommend reading Millsap’s (2011) Statistical Approaches to Measurement Invariance; he also makes an off-hand mention of this specific problem in his chapter of Maydeu-Olivares’ festschrift in honor of Rod McDonald (a late eminent psychometrician whose works are, arguably, underappreciated outside of a minor academic niche). It is unfortunate but understandable that this sort of knowledge has not trickled down to other fields yet because it is complicated, patently unintuitive, and the mathematical proofs of it published in specialty journals like Psychometrika or long books littered with equations (Millsap’s has 432 unique ones) assume a strong mathematical background and an almost unbearable desire for technical rigor that few researchers in the soft sciences will ever be able to grip. At this point, I want to repeat that I am not stating this to denigrate you, your work, or its complexity and importance, I am merely making a point that I hope receives wider recognition in the future.
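As background for the covariance argument above, the underlying identity is simply the variance of a sum: with phenotype P = G + E, Var(P) = V_G + V_E + 2·Cov(G, E). The sketch below only illustrates that identity with made-up numbers. It does not settle which variance component a particular study design will absorb the covariance term into, which is the substantive question in dispute.

```python
import math


def phenotypic_variance(
    genetic_var: float, environmental_var: float, ge_covariance: float
) -> float:
    """Variance of P = G + E, allowing G and E to covary."""
    # A covariance cannot exceed the geometric mean of the two variances.
    limit = math.sqrt(genetic_var * environmental_var)
    if abs(ge_covariance) > limit:
        raise ValueError("covariance is inconsistent with the variances")
    return genetic_var + environmental_var + 2.0 * ge_covariance


# Hypothetical values: identical V_G and V_E, with and without covariance.
V_G, V_E = 1.0, 1.0
for cov in (0.0, 0.5):
    v_p = phenotypic_variance(V_G, V_E, cov)
    print(
        f"Cov(G,E)={cov:.1f}  V_P={v_p:.1f}  "
        f"V_G/V_P={V_G / v_p:.3f}  2Cov/V_P={2 * cov / v_p:.3f}"
    )
```

With a positive covariance, a third of the total variance in this toy case belongs to neither pure component, so any estimate has to assign it somewhere, and the direction of the resulting bias depends on where.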
>This reconciliation offers new interpretations for various puzzles, such as differences in heritability between and within populations, differences in heritability across development, and general intelligence.
I am sympathetic towards theoretical reconciliations and grand theories. However, I believe they are almost always too hasty. In order for these things to work as comprehensive explanations, they must account for a variety of highly replicable empirical facts. For instance, an underacknowledged facet of the Wilson effect (the increase in heritability from childhood to adolescence) is that it varies in the manner expected if the explanation is a decline in neuroplasticity as individuals mature. To illustrate, take Brant et al. (2013), who found, in a sample of approximately 11,000 twin pairs, that the increase in heritability occurred more slowly for individuals with higher measured IQs. One substantive interpretation of this finding is that the neuronal buildup that occurs in childhood prior to adolescent cortical pruning and arborization stalls earlier and is more swiftly followed by the aforementioned pruning in individuals with lower IQs, during which developmental trajectories of change shift to relative stability as ‘development’ in the maturation sense ceases. Because you cited Plomin, I assume you have seen the replications of this with PRS and GCTA-GREML.
This is but one aspect of the empirical research conducted in this field that would need to be accounted for by such a theory. That particular example is difficult to interpret through a cultural lens, though I do not believe it is impossible. A problem with cultural explanation is that it runs afoul of things like adoption studies, in which individuals from different groups come to resemble their group of origin rather than their families in terms of intellectual attainment, as measured by IQ tests. There are various criticisms of this line of thinking dotting the literature but they usually fail to meet a psychometrically satisfactory standard of evidence, so the findings remain and they remain particularly robust, though many may not wish it so (I am agnostic with respect to empirical findings).
Regarding differences in heritability between and within populations, it is important to recognize whether they exist and, if they do, what they are. For example, I recently contributed to a meta-analysis of the question of differences in heritability between groups. At least in the United States, we failed to find satisfactory evidence that differences in heritability exist between racial groups (redacted). This is despite the Scarr-Rowe hypothesis predicting such differences in accordance with the degree of socioeconomic difference between these groups. With respect to what these differences are, if a given assessment (because you focus on it, we will say an assessment of IQ) is biased between groups, then its heritability may not be interpreted in common because the thing whose heritability is being inspected does not have common properties. For international differences, test bias is rather pronounced, so if we were to, say, assess the heritability of the full-scale score of Raven’s Progressive Matrices in the US and compare this to the heritability in Zambia, we may find that it is lower in Zambia. This may not signal anything about the heritability of the trait we intend to measure, (general) intelligence, as the difference could be due to questions having relatively greater difficulty for Zambians and thus, perhaps, having lower reliability due to increased guessing relative to the US sample. The good evidence we have for similarities or differences in the heritability of psychological traits between populations suggests there are few if any differences, but this is restricted almost entirely to comparisons within countries, and particularly, the US. It is better not to overstate the explanatory value of some theory for this possible finding since it may or may not be true and there is little evidence for it in the first place.
Please consult meta-analyses or evidence you can qualify as strong rather than singular, somewhat messy reports like Scarr-Salapatek’s (which is included in my meta-analysis!).
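The reliability point in the Raven's example can be sketched numerically. Under classical test theory, and assuming measurement error is unrelated to genotype so that all of it lands in the non-genetic variance, the heritability of an observed score equals the heritability of the true score multiplied by the test's reliability. The function name and all numbers below are hypothetical and are not estimates for any country.

```python
def observed_heritability(true_h2: float, reliability: float) -> float:
    """Heritability of an observed score given true-score heritability.

    Assumes classical test theory: observed variance = true variance +
    error variance, with error independent of genotype.
    """
    if not 0.0 <= true_h2 <= 1.0:
        raise ValueError("true_h2 must lie in [0, 1]")
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return true_h2 * reliability


# Identical true-score heritability, different test reliabilities
# (for example, more guessing in one sample lowers reliability).
TRUE_H2 = 0.6
h2_high_reliability = observed_heritability(TRUE_H2, reliability=0.9)
h2_low_reliability = observed_heritability(TRUE_H2, reliability=0.6)

print(f"reliability 0.9: observed h2 = {h2_high_reliability:.2f}")
print(f"reliability 0.6: observed h2 = {h2_low_reliability:.2f}")
```

Two samples with the same underlying heritability can therefore show different observed heritabilities purely because the instrument measures one group less reliably.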
>Culture and genes are interwoven in the construction of many behavioral traits, making separation impossible.
I cannot understand why this would be impossible. I am sure there are many designs that could achieve statistical identification for cultural and genetic effects simultaneously. There are actually several projects (in Japan, the Netherlands, Sweden, the US, and probably elsewhere) with common measures in large twin (and family) cohorts and they have been used to try to estimate the influence of culture on the variance components of some traits like personality, politics, and achievement orientation. Adoption designs with siblings, twins, or other forms of relatives can be used to explicitly achieve identification for components that could be reasonably called “culture” and “genes”. Moreover, if measurement invariance for a trait is found, we will find ourselves in a position where a cultural lack of effect can be tenable. Mapping of the effects of all genes (and other influences) is also in principle possible. In short, impossibility is too strong a statement unless you take the position that the modern sense of causal interpretation (Imbens, Pearl, etc.) does not work (this should be stated because it invalidates counterfactual inference for causality and represents an important epistemological departure from the scientific mainstream in many fields).
>This kind of coevolution is common, undermining the tenets of the Modern Synthesis that enshrine the causal supremacy of nuclear DNA
Neither of your citations presents evidence that the modern synthesis is undermined by these (sorts of) observations and it would be strange if they did. If you believe these play a larger role than qualification or, indeed, than DNA itself, you would be hard-pressed to find that evidence. The intensity of this statement should, thus, be lessened because, presently, it is not clear and it is overstated. This sort of reasoning has been dealt with amply by many, including Lynch & Walsh in their textbook.
The use of the word “common” is displeasing, as it may be true, but there is no evidence given to indicate that. The quantitative qualification of this point would be desirable, but I can understand it not being available, especially in a relevant form. For this to overturn the Modern Synthesis would suggest that these are common relative – or comparable in influence – to the influences that supposes. That is very likely not true.
>There is a fundamental asymmetry between the identification of elements that support a system and those that undermine it
You could note that some genomicSEM results seem to show that beneficial variants affect g and deleterious ones often have more specific effects.
>It is easier to identify deleterious genetic mutations than beneficial mutations, as deleterious mutations are more common and have larger effects—the space of failure is larger than the space of success.
It seems true that the majority of novel mutations are slightly deleterious (see Uricchio’s recent pub too). It does not follow that their effects are as large individually as variants with beneficial effects. The betas in the latest EA PRS are about half beneficial and half deleterious. It would be worthwhile to assess how common this finding is for other traits prior to making a statement like that when it could easily be wrong. The potential failure and success spaces are not asymmetrical even if, more often than otherwise, novel variants are deleterious unless you mean this in a different way than it is currently stated.
>But it is much harder to identify genes that are responsible for constructing a complex trait. For instance, there is no single gene for intelligence; recent analyses with samples sizes [sic] in the hundreds of thousands have detected >1000 genes linked to intelligence (Savage et al. 2018; Davies et al. 2018), with the actual number likely to be much higher.
Note the spelling error.
Lee et al. (2018) is a more recent citation with a more up-to-date PRS and a sample of 1.1 million, and Hsu has estimated the number of rare variants affecting intelligence at around 10,000 (preprint on arxiv). The statement that it is “much harder to identify genes that are responsible for… a complex trait” is by no means necessary, and as it stands it is false because the discovery of singular genes is very different from the discovery of all involved genes. Moreover, large effect variant/Mendelian/monogenic variant discovery is a continuing process and it can be complicated by, for example, extremely high frequencies making cohorts of a certain population unable to detect an effect at all, or moderation of some phenotype by linkage in a population (like glucocerebrosidase mutations in Ashkenazim). Moderating the language here is necessary.
>Indeed, we have no systematic understanding of exactly how these genes create intelligence.
This is even more broadly true than is implied here. We have a “systematic understanding” for very few genes even if we have proposed and plausible mechanisms. This includes, for example, genes “for” PKU, since we have not had the requisite (and unethical) trials required to “systematically understand” the mechanism in humans (and we do not need them to know a target for manipulation/intervention or a causal effect, though people often do not recognize this). This statement needs qualification. Perhaps mention fine-mapping efforts finding plausible mechanisms.
>A gene can be beneficial in one environment but not in another. For example, we have known for a long time that….
There is no relationship between these sentences. You can have these environmental effects relegated entirely to the non-heritable components of variance for any trait and none of these have actually shown moderation of the genetic component – in an invariant fashion – of any psychological trait of which I am aware. The closest thing to this is the finding by Giangrande et al. (2019) of minor moderation of the latent g factor’s heritability by SES (p =0.04, n =471 pairs) at age 7 and not a significant effect at ages 8 or 9, with strange effects at age 15 and no analysis at age 12. Using Naaman’s (2016) adjustment for Lindley’s paradox, moderation is nil at all ages. There was no test of invariance by SES, which would be unexpected if the variance components changed since they are expected to have different effects (i.e., A is correlated with g, C is not, and E is negatively correlated, etc.). Unfortunately, this analysis also did not yield a commonly normed score across age groups, so comparability between ages for more analysis was limited. Two analyses by Protzko (one published, one currently a preprint) have claimed to find evidence of an environmental effect on g thanks to intervention participation but neither of these has actually tested whether the intervention effects were ‘on g’ or due to other common factors or test specificities, or indeed, whether control and intervention groups even showcased different mean levels of the modeled common factors in the latter case. As far as I know, there is no really psychometrically determinate evidence for the moderation of any behavioral phenotype by, say, level of SES. But none of this – nor of your examples – actually relates to genes having contrasting effects in different environments.
>increasing nutrition (Lynn 1990; Stoch et al. 1982), improving schooling (Ceci 1991; Davis 2014; Ritchie and Tucker-Drob 2018), and removing parasites (Wieringa et al. 2011) have positive effects on general intelligence. None of this is surprising….
This is the baffling remark that prompted this email. Not a single one of these studies supports “positive effects on general intelligence”. That you say we have “known for a long time” and that this is “[not] surprising” is incredible since not a single source says what you have said. I’m going to explain this one in a few different ways and state some potentially insulting things very plainly because your citation practices at this point may say a lot about your scholarly integrity and I do not want to see scholars with good potential tarred and feathered when less charitable people notice problems.
Firstly, you have chosen to give nested citations. Lynn (1990) cites Stoch et al. (1982) and Ritchie & Tucker-Drob (2018) cite Ceci (1991); this is apparent to anyone who knows the former of those studies even if they haven’t read the latter ones. This is because Lynn completely presents the result of Stoch et al. and suggests some potential confounding (that he attempts to suggestively address with reference to twin studies) and Ritchie & Tucker-Drob are also attempting to be comprehensive, so the content Ceci covers which one would not see in the R&T-D meta-analysis is limited at best. The typical reason for making multiple citations in which (practically) the entirety of one citation is covered in another is to increase the credibility of some claim, but using nested citations cannot do this unless there is something more to another citation. With a review and a meta-analysis as the main citations, it’s hard to imagine you would find anything in the additional ones that wasn’t covered already. Another common reason is that authors have not read what they are citing. I hope neither of these applies to you and you will remove nested citations since they are unnecessary and suggestive of an attempt to fraudulently increase credibility where that is not needed. (Your citation of Wieringa et al. (2011) when the study by that title is actually Nga et al. (2011) could be damning evidence of a failure to read the study mentioned and evidence for the harrowing alternative that you copied someone else’s citation. This is made all the more likely because there are not any common pieces of citation management software – like Zotero or Mendeley’s – which pull the incorrect Wieringa et al. citation from the page the study is located at. I hope this is not the case but it is not unheard of. For example, hundreds of people copied an incorrectly formatted citation from Gould’s The Mismeasure of Man following its publication.)
If the things in the review and meta are what you say they are then they are good enough.
The larger problem is that the things in your citations are not what you say they are. Lynn (1990), Ritchie & Tucker-Drob (2018), and Wieringa et al. (2011) all did not do analyses of whether general intelligence was affected. There is much to note here. Firstly, Lynn is discussing the Flynn effect – which is neither related to g nor invariant over time (Wicherts et al., 2004; te Nijenhuis & van der Flier, 2013; so many others) – and notes that the gains over time have been less pronounced on the axis “verbal-educational” (often called “crystallized”) compared to “visuo-spatial” (often called “fluid”). In general, the former is the more g loaded test alignment. Lynn even adduces evidence from a large number of studies collectively indicating that finding replicates. Importantly, Nga et al. (2011) show the same thing! Take a look at their Table 4, and specifically the results for digit span forward and backward. Backward is the more g loaded subtest and the effect on it is smaller (with the wrong sign!)! (When your references consistently contradict you, you need to keep reading or explain the problem.) The theory offered by Woodley et al. (2016; whose senior author was James Flynn) is a far more plausible explanation for the apparent brain size effect (which can be specific or general but requires modeling to know for sure).
Regarding Nga et al., you cite it as evidence of “removing parasites” having an effect on general intelligence. This is curious because, again, looking at table 4, do you see any significant differences between what you could consider the active intervention group (fortified biscuits) and the active intervention group with a deworming agent (fortified biscuits+)? I see that, among all children, the RPM mean was lower (NS) for the biscuits+ group relative to the biscuits group, that it was insignificantly and meagerly higher for these groups among anemics, the same for digit span backward and forward, no significant difference and slightly higher for block design, and no difference except a minor one in the variances for coding. What evidence did this study provide for a deworming effect? I do not even know where you got this idea that deworming effects on any aspect of cognition are shown by this study when your citation did not find that. They even state “De-worming had no significant effect on any of the cognitive tests” (p. 337) and in their discussion section “There have been several trials of the effect of deworming on tests of cognitive function or educational achievement, but the results have been inconsistent. Thus, there is no clear evidence of an effect of worm treatment on cognitive function and education achievement. Most trials showed no treatment effect by single dose or multiple dose deworming on cognition in preschool or school children or on academic achievement, even in children with high worm loads” (p. 338). This would be better suited as a reference for nutrition, but even then, they do not find general effects and they remark “The failure of the intervention to improve all cognitive test results might be caused by differences in sensitivity of different aspects of cognition to a short-term intervention and also to the complex nature of cognition, which also requires environmental factors such as stimuli and learning inputs” (p. 338).
Was the reading of these studies not thorough enough? I know that the reading of Ritchie & Tucker-Drob (2018) was not thorough by any means. I know this for two reasons. First, they say
>Fourth, which cognitive abilities were impacted? It is important to consider whether specific skills—those described as “malleable but peripheral” by Bailey et al. (2017, p. 15)—or general abilities—such as the general g factor of intelligence—have been improved (Jensen, 1989; Protzko, 2016). The vast majority of the studies in our meta-analysis considered specific tests and not a latent g factor, so we could not reliably address this question. However, it is of important theoretical and practical interest whether the more superficial test scores or the true underlying cognitive mechanisms are subject to the education effect. In our analyses with test category as a moderator, we generally found educational effects on all broad categories measured (we did observe some differences between the test categories, but it should be noted that differential reliability of the tests might have driven some of these differences). However, further studies are needed to assess educational effects on both specific and general cognitive variables, directly comparing between the two (e.g., Ritchie, Bates, & Deary, 2015).
And secondly, their last citation in that paragraph. It leads to a study in which they assessed whether the effects of education on IQ tests were general or specific and they were found to be specific rather than general. I have recently replicated this result in an independent cohort (here: redacted). Additionally, I have found that the gains to the specific subtests themselves are not really predictive of attained SES (this is one way to get at the effect, since looking at the IQ gains is a collider for education and just reiterates that more educated people make more money without telling the reason; n ~ 4k so decently powered), nor of biomedical measures (Ritchie also studied this in 2013, finding no educational effect on the speed of processing in later life), but more importantly, I have found that the gains result from measurement invariance failing, so education acted as a specific effect and led to test bias, as expected theoretically. I will get to the reason this is expected later on in this email, but it is important because it represents another major fault in your paper. (There was a recent meta-analysis of the relationship between educational gains and general intelligence, conducted by Jan te Nijenhuis, which claimed to show no relationship, but the choice of studies was selective and the method improper. Everyone with knowledge in this field who has commented on that study has noted that it is actually the opposite of the expected result because educational effects are generally on so-called crystallized measures – which are, as noted, more g loaded – and that he needs to update his method.)
In all the studies you cited, the similarity of the effects of these variables to general intelligence by its loadings or any other criteria like a model (the proper way) was not assessed, nor was invariance. How you can treat these like they are real evidence for effects is unclear. Please fix your citations and do not make claims that something is “obvious”. You may say that it is “obvious” that there are effects on IQ (the observed score, and not, based on Nga et al., for deworming; remember that observed scores do not always represent constructs, a simple fact proven by what happens when you increase someone’s score by giving them the answers), but this suggests you think there are really effects on the ability(ies) it is intended to represent when this is not often the case. In fact, as the Ritchie citation makes clear, you can have gains that are ‘hollow’ with respect to general intelligence, which is generally (ha) considered to be the ‘important’ aspect of intelligence in that it explains most common variance and the vast majority of predictive validity (by the by, do not believe people who throw a ‘g’ measure into a regression alongside highly g saturated indicators of g as showing independent predictive validity; that’s a regression fallacy).
I urge you to fix this section since it is, quite frankly, awful.
>For the same reason, a correlation between lead exposure and IQ (Needleman and Gatsonis 1990; Wasserman et al. 1997) will not be revealed in a society where lead is not a problem.
A more up to date citation you should look at is Woodley of Menie et al. (2019). They found a null relationship between g and lead exposure effect estimates. Moreover, they suggested something important: much or all of the negative association between IQ and lead exposure may be due to confounding. By this they mean that individuals with lower levels of cognitive ability will tend to live in poorer areas, thus increasing their risk of lead exposure. In many studies, the relationship is reversed – often strongly so. This occurs in Mexico, China, Russia, the US, and so on. An explanation (as of yet not verified, but possible) is that in these areas, the confounding is reversed, for instance, due to those who are better off living in older homes and thus having lead exposure in some regions. The point that variation is needed to assess differences is well-taken, and indeed, Tsoi et al. (2016) have found marginal remaining differences (and sometimes in the wrong directions if we are positing them as an IQ difference explanation) between groups in the US and a general decline in blood lead levels. But this leads into a point I already made: what are expectations regarding heritability (both within and between groups) with gene-environment covariances? Is there any reason to suggest ‘real’ (i.e., measurement invariant) effects via a change in the effects of genes in different environments? Would this change the variances? To not change the variances creates an arithmetic definition of how bad a given causal environment must be in order to elicit a mean difference. The point here is that you are making mathematical statements that may or may not make sense for your argument. A critical observer will be able to notice this and see that you have made predictions that are only contradicted at present.
>Because malaria is known to have a negative impact on cognitive development… we would expect the gene for abnormal hemoglobin to be positively associated with intelligence in environments with a high risk of malaria.
You, I believe wrongly, conflate terms like “cognitive development” and “intelligence” (as well as IQ, general intelligence). These do not all mean the same things (again: think of the difference between observed scores and the constructs they are supposed to represent).
A disease effect like this is not likely to have an effect on general intelligence (nor an arsenic effect or much else) but, if it has an effect, it will very likely have a specific one (looking at IQ alone will not allow you to see this!) instead, and, with a limiting condition I will take note of, it will have to be very small. On this point, note that this can be an instance of a between-group effect because such effects will be important later in this email (I have already alluded to this).
>Because such environmental deprivation undermines performance on IQ tests, these genes will be identified by behavioral geneticists as genes for cognitive ability
This is not how that works. I know this is an intuitive thought, but we do, in fact, have many means of validating gene discoveries (including, commonly, assessing cross-population replication!). This is the naïve way in which that will work, but to think that criticism isn’t absorbed into and diffused throughout the field is tantamount to a rank insult. I hope this is not what you’re saying; promoting naïve arguments is clearly generally considered acceptable (I do not know why), but you can make stronger and more nuanced ones. We also have methods that are not confounded at all with non-shared environments, like LDSC. By saying this, you are also making a statement about effect sizes which can be moderated by the level of this sort of gene-environmental confounding in a population. This argument is common but quantitatively it just does not work out well.
>This explains why candidate genes often do not replicate outside the population in which they are discovered
This is flagrantly wrong in the technical sense that this may be true but it is not known, and it is only one among many competing explanations, some of which could be true for a particular candidate and others for different ones; your statement is too strong and could, at best, be “This may explain why….” More important reasons for the failure of candidate gene studies could be measurement error or bias (predictive bias could be due to, as you note, cultural differences, which may be in the form of laws or other institutions, but measurement bias also looms large), a lack of statistical power leading to overstated effect sizes and unreal effects (couple that with publication bias and you’ve got a winner), and so on. Regarding the word “population” used here, you should change that to “samples” since it can be conflated with a more technical definition of population in which, say, multiple European samples still amount to the same “population”, for which candidate genes may or may not replicate.
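The low-power point can be made concrete with a small simulation. This is a rough sketch under simplifying assumptions: each "study" estimates one effect, the estimate is drawn directly from a normal distribution around a small true effect, and only estimates passing a two-sided z test at 0.05 count as "published". All names and numbers are hypothetical.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.1, n=50, studies=2000, seed=0):
    """Return (mean of all estimates, mean of significant estimates,
    share of studies that reached significance)."""
    rng = random.Random(seed)
    se = 1.0 / n ** 0.5  # standard error of a mean with unit SD
    estimates = [rng.gauss(true_effect, se) for _ in range(studies)]
    # Keep only the estimates a significance filter would let through.
    significant = [b for b in estimates if abs(b) / se > 1.96]
    return (
        statistics.mean(estimates),
        statistics.mean(significant),
        len(significant) / studies,
    )


all_mean, sig_mean, power = simulate_winners_curse()
print(f"true effect 0.1, mean of all estimates {all_mean:.3f}")
print(f"mean of significant estimates {sig_mean:.3f}, power {power:.2f}")
```

With these settings the unfiltered estimates centre on the true effect, while the significant ones overstate it severalfold, so a replication sample of similar size would be expected to "fail" without any cultural explanation being involved.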
>This argument goes beyond simply a mismatch between genes and culture. Instead, genes can only be evaluated with respect to the cultural environment in which they are expressed
Another example of a between-group effect. I will return to this.
Fix the other instances of citation nesting unless you have a reason not to. There are too many (before and after this comment) and they are suspicious.
>Lewontin and colleagues (Lewontin 1970a; 1974; Feldman and Lewontin 1975) long ago described the fallacy of extrapolating heritability scores from one population to another.
At this point, you need to understand how this criticism is actually handled in the relevant area of psychology. Lewontin’s seed metaphor (which he did not invent; for the earliest citation, you’ll turn to the Parable of the Sower, or for an academic citation, Cooley (1897) or Thoday (1969, 1973)) as applied to constructs like general intelligence suggests a violation of measurement invariance. For this, I’ll turn to two relevant citations: Lubke et al. (2003a; published in Intelligence) and Lubke et al. (2003b; published in the British Journal of Mathematical and Statistical Psychology) – (same author ordering: Lubke, Dolan, Kelderman & Mellenbergh). I will quote the first at length:
>Consider a variation of the widely cited thought experiment provided by Lewontin (1974), in which between-group differences are in fact due to entirely different factors than individual differences within a group. The experiment is set up as follows. Seeds that vary with respect to the genetic make-up responsible for plant growth are randomly divided into two parts. Hence, there are no mean differences with respect to the genetic quality between the two parts, but there are individual differences within each part. One part is then sown in soil of high quality, whereas the other seeds are grown under poor conditions. Differences in growth are measured with variables such as height, weight, etc. Differences between groups in these variables are due to soil quality, while within-group differences are due to differences in genes. If a [measurement invariance (or MI)] model were fitted to data from such an experiment, it would be very likely rejected for the following reason. Consider between-group differences first. The outcome variables (e.g., height and weight of the plants, etc.) are related in a specific way to the soil quality, which causes the mean differences between the two parts. Say that soil quality is especially important for the height of the plant. In the model, this would correspond to a high factor loading. Now consider the within-group differences. The relation of the same outcome variables to an underlying genetic factor are very likely to be different. For instance, the genetic variation within each of the two parts may be especially pronounced with respect to weight-related genes, causing weight to be the observed variable that is most strongly related to the underlying factor. The point is that a soil quality factor would have different factor loadings than a genetic factor, which means that Eqs. (9) and (10) cannot hold simultaneously. The MI model would be rejected.
>In the second scenario, the within-factors are a subset of the between-factors. For instance, a verbal test is taken in two groups from neighborhoods that differ with respect to SES. Suppose further that the observed mean differences are partially due to differences in SES. Within groups, SES does not play a role since each of the groups is homogeneous with respect to SES. Hence, in the model for the covariances, we have only a single factor, which is interpreted in terms of verbal ability. To explain the between-group differences, we would need two factors, verbal ability and SES. This is inconsistent with the MI model because, again, in that model the matrix of factor loadings has to be the same for the mean and the covariance model. This excludes a situation in which loadings are zero in the covariance model and nonzero in the mean model.
>As a last example, consider the opposite case where the between-factors are a subset of the within-factors. For instance, an IQ test measuring three factors is administered in two groups and the groups differ only with respect to two of the factors. As mentioned above, this case is consistent with the MI model. The covariances within each group result in a three-factor model. As a consequence of fitting a three-factor model, the vector with factor means, A in Eq. (9), contains three elements. However, only two of the element corresponding to the factors with mean group differences are nonzero. The remaining element is zero. In practice, the hypothesis that an element of A is zero can be investigated by inspecting the associated standard error or by a likelihood ratio test (see below).
>In summary, the MI model is a suitable tool to investigate whether within- and between group differences are due to the same factors. The model is likely to be rejected if the two types of differences are due to entirely different factors or if there are additional factors affecting between-group differences. Testing the hypothesis that only some of the within factors explain all between differences is straightforward. Tenability of the MI model provides evidence that measurement bias is absent and that, consequently, within- and between-group differences are due to factors with the same conceptual interpretation.
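The model in the quoted passage can be written compactly. The block below is a sketch in the usual notation for a multi-group common factor model; the symbols are assumed standard notation rather than a transcription of Eqs. (9) and (10) in Lubke et al.

```latex
\begin{aligned}
\mu_g    &= \nu + \Lambda \alpha_g \\
\Sigma_g &= \Lambda \Psi_g \Lambda^{\top} + \Theta
\end{aligned}
```

Here $\mu_g$ and $\Sigma_g$ are the observed mean vector and covariance matrix in group $g$, $\nu$ the intercepts, $\Lambda$ the factor loadings, $\alpha_g$ the factor means, $\Psi_g$ the factor covariance matrix, and $\Theta$ the residual variances. The key restriction is that the same $\Lambda$ appears in both lines: whatever produces the mean differences between groups must load on the indicators in the same way as whatever produces the covariances within groups.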
Given the utility of such an observation, it’s hard to tell why this test isn’t more commonly used. It would be particularly useful in cross-cultural psychology since it would allow for saying that a difference is real and it would narrow down possible explanations. The second study I mentioned elaborates further with regards to between-group factors. This one states that SFI also implies (which is a stronger claim than a mere suggestion) weak MI (see Meredith, 1993) with respect to unmeasured variables that influence common factor means and observed scores more generally. So, for instance, if there is an influence on general intelligence that is found in one group and not another, the explanation for the mean gap need not be a homogeneous influence to cause MI to fail. With respect to explanations like falciparum, this means that unless they act in a way mimicking the mean and covariance structure in the other group, with sufficient power, it will lead to a failure of MI comparing populations with and without it (so long as it is meaningfully influential). Keep this in mind when discussing between-group influences or what amount to them.
This applies to cultural influences. If they do qualitatively and not simply quantitatively differ and they influence psychological constructs like g, then they will generate measurement bias. I presented at redacted on cross-national differences in cognitive ability and found many samples where MI was and was not tenable and these results closely mapped to results that were not conducted with methods that were as psychometrically advanced. You can find that on my redacted if you’re interested. When MI fails it is also not the case that a specific reason – like culture – has been identified. Cockcroft et al. (2015) fell prey to this logic and claimed that the WAIS-III showed Eurocentric biases after they failed to support MI in a comparison of South African and British university students. I reanalyzed their data (here, with complete data, code, and output: redacted) and found that the bias was largely in the opposite direction. I do not know if the effect was cultural, but it did not operate how they expected (and, mysteriously, did not assess).
This sort of thing clearly constrains cultural explanations severely and it can help to inform theorizing. It seems that this psychometric innovation has not spread to the study of cultural evolution yet (and in Smaldino’s/Leukaszewski’s, and presumably others’, cases, it is violated!). Notably, without MI between, say, intervention and control groups given fortified biscuits (as above) we also cannot say that the changes are really ability changes. This applies broadly.
>Because observations can only occur after the iterated cultural diffusion of traits, environments are necessarily more homogenous than would be expected in the absence of culture, and heritability estimates are necessarily inflated when using standard behavioral genetic methods.
You are suggesting here that, for example, monozygotic twins will receive more similar treatment or environmental exposures than expected due to their genetic relatedness relative to dizygotic twins, for this to inflate heritability. This is a statement that there will be equal environments assumption violations or that cultures will show considerable differences in heritability, net of test bias. Rushton (1989) and Plomin (2006) – for international differences – found few differences in heritability and Pesta et al. (2020) found no differences in heritability in the US, for IQ/cognitive ability (Hur & Bates, 2019 also failed to support the Scarr-Rowe hypothesis in Africa though this is not necessarily a vindication of anything relevant). Remember that inflation of these estimates suggests effects of an arithmetically determinate magnitude and confounding of a specific variety, with an implied effect on MI between, for example, different zygosities (not usually found). The bigger thing you are looking for is deflation of the shared environment and a spurious increase in A and E – is this observed? In cross-cultural comparisons with similar questionnaires, variance components seem similar enough even when cross-cultural means don’t really differ. For genetically relatively homogeneous populations, you will even see a reduction in D and difficulty discriminating between a model with it or shared environments. It would be better if you stated precisely what you expected to see and whether observations conformed to your theory. At present, I do not believe they do for general intelligence and I believe bias dominates other measurements; if you disagree, you have a staggering evidentiary burden to overcome.
>As an illustration of how cultural transmission inflates heritability, consider the effect of standardized education.
Note the affected components (if not in the paper – though you do this a bit further down – in your minds). If you would like proof that components are expected to have effects located on different components of cognition, you may see my meta-analytic result here redacted or the same result in https://journals.sagepub.com/doi/abs/10.1177/0956797613493292 and various other places, with many different methods. Notable is that no publication bias is really possible for this finding because the majority of the studies did not look for this relationship; I only found it by looking at those studies and assessing whether it was there myself.
Heritability estimates imply causal models. As an example, twins are proposed to vary exogenously in their degree of relatedness, making statistical identification of genetic, shared, and unshared environmental variance components possible and causal inference interpretable. Violations of this exogeneity lead to endogeneity problems just as they do in econometrics. With the method of identification in mind, your cultural compression will affect AC not just A, and not specifically A, relative to E. This is important to note because it just means systematic environments (it could also affect E, or residual means of E, but it is hard to imagine something systematic with respect to a group but not siblings; you could test for such an effect by assessing regression to the mean for some trait).
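The identification logic can be illustrated with Falconer's formulas, the simplest form of the twin decomposition. This is a minimal sketch that assumes purely additive genetic effects, equal environments for both zygosities and no assortative mating; the twin correlations used are made up for illustration.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer-style ACE estimates from MZ and DZ twin correlations.

    Under the assumed model, r_mz = A + C and r_dz = A / 2 + C,
    with A + C + E = 1 for a standardized trait.
    """
    a = 2.0 * (r_mz - r_dz)   # additive genetic variance share
    c = 2.0 * r_dz - r_mz     # shared environmental share
    e = 1.0 - r_mz            # non-shared environment plus error
    return {"A": a, "C": c, "E": e}


# Hypothetical correlations: r_mz = 0.70, r_dz = 0.45
estimates = falconer_ace(0.70, 0.45)
print({k: round(v, 2) for k, v in estimates.items()})
```

Because the three components are solved from only two correlations plus the unit-variance constraint, anything that shifts r_mz or r_dz moves more than one component at once, so a systematic environmental influence cannot be assigned to A alone.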
>Behavioral genetic samples are both culturally WEIRD and genetically WEIRD.
No criticism, I am just thankful that you note this. The WEIRD classification being treated as if it’s merely cultural has gone on too long. The explanation offered by Henrich et al. is now more than 150 years old and in its original form it was already quasi-genetic.
A problem with WEIRD, though, is that the differences proposed are generally not vetted properly (psychometrically, as in establishing MI or criterion validity, or effects of selection by the variables E, I, R, or D in other countries: quantitative variation should move countries to be more similar in the proposed ways, but does it? It should in a naïve cultural theory, but it sure doesn’t really seem to!).
>This greatly reduces the interpretability of genetic effects, as cumulative culture obscures the causal locus of phenotypic outcomes
Moderate your language here. You are proposing that it does this but this is by no means verified.
>this is not unexpected given the cultural variation that exists within a population
In truth, I do not know why this would be expected with cultural variation existing within a population either (for any particular trait). What you suggest here is an empirical result that is just not supported for cognitive ability. Horizontal and vertical cultural transfer has only received small to negative evidence for that particular trait! These parameters are statistically identifiable in many different samples when you have more family members or adoptions or whathaveyou and they do not receive ready support as explanations for familial resemblance, and as such, they are not liable to explain confounding unless we are referring to confounding in the “partialing fallacy” sense we do when making adjustments in a regression or controlling for something otherwise, which means that the confounding is not real. I am certain that you all know this already, but theoretical statements have corresponding mathematical models, empirical expectations, and graphs; some of these fly in the face of the evidence for what you’re talking about, but they may not for other traits. Much of the adjustment for population stratification done in the past few years has been horrid and has, for example, involved effectively controlling for what could be signals of selection! Do not let bad practices disorient you or your theorizing, look for consilience, etc. etc. If you controlled for pop strat via PCA (reminder: linear but selection effects and drift are not necessarily, nor must they be over certain timeframes, and the patterns of their effects can always shift, as they seem to have done for cognitive ability in the past 150 or so years), you would eliminate the signal of selection known to have occurred in Kukekova et al.’s (2018) result based on red foxes. Also: a bug in msprime led to some overestimation of pop differences in PRS validity (discovered only recently).
>but we are far from accurately representing the genetic diversity of the global population.
>but the correlation is mixed in other societies
You must have been writing this quickly because, presumably, you meant to say “but the evidence is mixed” rather than “the correlation”. You may have been discussing the dizygotic twin correlations at the basis of the Scarr-Rowe, but I don’t think that’s true and if it was then it was worded improperly.
>A meta-analysis (Tucker-Drob and Bates 2015) found the effect in a subset of US samples, but not in samples from Europe and Australia.
Later examples in the US have generally reduced the effect size. None have investigated its reality (i.e., MI), which is not expected, as you now know; latent-level investigations are limited and generally barely significant at alpha = 0.05 criteria (which is not good since Lindley’s paradox can drive them to significance spuriously; Giangrande et al., 2019, for instance, found one subgroup had a p = 0.04 effect in the Louisville twin study), and the Scarr-Rowe also takes the form of C moderation in some samples (like Hanscombe’s in the UK). SES effects on the mean of E can also appear – and probably do. These should show up in trait residual variances in an obvious, expected fashion (only Millsap has mentioned this phenomenon afaik). Latent trait measures of g are generally more heritable than observed scores (most of the Scarr-Rowe lit just uses the latter, sadly) for two major reasons: no measurement error and specific sources of variance are less heritable and more malleable (it seems). Take heed interpreting effect sizes and their meanings.
This implies MI violations in the SES case (wrt latent constructs and residual variances) and vastly increased variances. Increased dizygotic twin variance is what is observed, but we should also see the same thing in monozygotic twins if this is a cause of any real significance. As such, I do not think calling this a cause is tenable. Do you? There are qualified models where this works, but the test bias question is hard to evade. I think this theory and its variations are only popular because of a lack of mathematical education in psychology (really, I hope, since that would mean we could remedy it!).
>Learning explanation for changes across development
Simply put: no. This implies an increase in measurement bias across development for tests given to groups with different means or mean trajectories (this is not found comparing blacks and whites or between gifted/non-gifted student classes). Moreover, insofar as culture is not mimicking the covariance structure in one or another group found to show invariance, we will break MI again. For a clearer explication of theory which can have learning as an explanation, see Savi et al. (2019). These researchers (and that whole Dutch cluster really) are very careful about invariance and they have promoted establishing it in the study of group differences of all sorts for years. The way you are describing developmental change as explained by learning, culture, does not work unless you are willing to toss aside many qualitative differences in culture-as-influence. Stating a new theoretical explanation for something which plausibly accounts for one aspect but which does not otherwise hew to the evidence is not good and a fatal flaw of recent theories of intelligence like Process Overlap Theory (which wrongly predicts, e.g., far transfer; mutualism accounts for the lack of far transfer on the other hand) or the Dickens-Flynn explanation of the Flynn effect (which wrongly predicts MI for the Flynn effect and the opposite of the most commonly observed variance changes if the effects are real along with extreme gene-environmental correlation at all life stages, which the evidence, as you know, contradicts).
There is a difference between learning explaining cognitive development and related things like knowledge and the *how* for learning needs to be carefully considered since the naïve interpretation of this looks to be, strictly speaking, falsified. You can test whatever bias expectations your theory generates in the NLSY CYA like Koretz & Kim did in the ECLS-K (public data!) for Fryer & Levitt’s ideas. I plan to do this soon and to test additive conjoint measurement properties there too.
>Although estimates vary, one meta-analysis (Haworth et al. 2010) put the heritability of general intelligence at 0.41 in childhood and 0.66 in adulthood
No, it technically does not (and you make similar notes later on with the same error too!). They used sumscores (not even optimally weighted!). They underestimate heritability for reasons I have already stated. Unit-weighting them as they do exacerbates issues with specificities and error versus g. The treatment of them as g measures is only partially right, sadly.
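A back-of-the-envelope sketch shows why a sumscore can understate the heritability of g. It assumes, hypothetically, that the sumscore variance splits into three uncorrelated parts (g, test-specific variance, and measurement error) and that error contributes nothing heritable; the shares and heritabilities below are invented for illustration, not taken from Haworth et al.

```python
def composite_h2(w_g: float, h2_g: float, w_s: float, h2_s: float) -> float:
    """Heritability of a sumscore whose variance is w_g from g,
    w_s from specific abilities and the remainder from error.

    Assumes the three parts are uncorrelated and error is not heritable.
    """
    w_error = 1.0 - w_g - w_s
    if min(w_g, w_s, w_error) < 0.0:
        raise ValueError("variance shares must be non-negative and sum to <= 1")
    return w_g * h2_g + w_s * h2_s


# Hypothetical: 60% g (h2 = 0.80), 25% specifics (h2 = 0.40), 15% error
print(round(composite_h2(0.60, 0.80, 0.25, 0.40), 2))  # 0.58
```

In this toy case the sumscore heritability (0.58) sits well below the assumed heritability of g itself (0.80), and unit weighting that raises the share of specifics and error widens the gap.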
Emphasis on changes in variance components overall implies extreme constraints on theorizing and these are, as far as I can see, completely unnoted.
>Bratsberg and Rogeberg (2018) find that in Norway, the negative Flynn Effect is found within families (between siblings), thereby making it unlikely to be explained by demographic changes or immigration, and instead supporting an environmental explanation.
They seem to have exploited the differences in the heritability of that particular test between normings which Sundet documented in the 1980s. This implies, more than anything, a lack of invariance to the Flynn effect (and does not contradict a life history explanation), and thus a difficulty in making this result commensurate with anything about general intelligence. More importantly, that study could be thought of as an unverifiable example of parameter hypertuning since the data are only available to Scandinavians and there exist no similar data. That analysis failed to show that the change in question was even related to the Flynn effect as it’s generally understood; psychometric comparability was just not established, and given the cyclicality of variance component estimates for that test, why should anyone consider that to be like the Flynn effect or expectations if any other construct is affected?
>increased out-breeding or ‘hybrid vigor
Anyone seriously humoring this explanation has to come to terms with hybrid vigor and inbreeding depression being Jensen effects while the Flynn effect is not (Rushton noted this, many did not get the message). Flynn has estimated that to explain the Flynn effect in full, it would not be sufficient for everyone to have married their siblings in 1900 and subsequently outbred (plus, heterosis effects would have tapered off more than a century ago in most places and this predicts bigger declines in the Flynn effect magnitude in the US as well as smaller Flynn effects in Africa and larger in Asia, the latter two predictions being consistent with the data).
>decreasing family size
This is similarly dubious. Insofar as it is an explanation, it suggests that shared environmental effects should be like the Flynn effect and they are not. Education effects do not explain the similar magnitude effects in preschoolers and the elderly, nor the failure of an increase during education, and explanations which make substantive predictions fail to handle the lack of MI of the Flynn effect.
What is very strange is that the top explanation Pietschnig & Voracek (2015) – whom you cited – offer (that is, life history) is not something you state even though it is by far the most plausible explanation for the Flynn effect (more than anything else by miles)! Is it because its biggest advocate (Woodley of Menie) has explained and adduced evidence for how this can result from genetic change and social processes affected by it? I assume not because this is quite niche, but I am hopeful that you will humor the more likely hypotheses and at least read the meta-analysis by Pietschnig & Voracek since it has important implications for explanations. Notably, Flynn believes he has totally disqualified the nutrition explanation, so try not to appear to cite him as supportive of it. He has pretty clearly stated “As [Lynn] notes, I have offered what I consider to be crushing evidence against nutrition as a sufficient cause and against its contributing much at all in advanced nations since 1950.”
>By this account, not only is the idea of a culture-free IQ test implausible, but so too is the idea of culture-free IQ
Please make sense of this quote. As it stands, it is hard to interpret. Do you mean to say that there cannot be an unbiased test? Or that a test cannot lack cultural content? That the constructs of interest for psychometricians and substantive psychologists researching intelligence are not able to be culture free? What of MI? Importantly, I have recently found that elementary cognitive tasks measure virtually the same g as standard psychometric tests, challenging a cultural explanation for that phenomenon and helping to establish its quantitivity. This is in review at redacted. There are many ways to write this and many are wrong.
>Indeed, the largest Flynn effect can be seen on the supposedly culture-free Raven’s matrices
The psychometric properties of this test have changed over time; they have not remained constant. It is no longer a highly g loaded test and it has lost much of its spatial loading. Do not make Flynn’s mistake and think that tests can be interpreted in kind over time. They cannot. Kaufman hosted a special issue at the Journal of Psychoeducational Assessment showing this firmly and making it very clear that Flynn is sticking to a psychometrically improper interpretation of his eponymous effect and the tests used to support it. Make his mistake and become cranks or move beyond it and do good scientific work.
>and on tests for fluid IQ rather than crystallized IQ
Crystallized tests have higher g loadings which is frequently used as evidence of the irrelevance of the Flynn effect for general intelligence. I noted this above.
>When it comes to heritability, subtests of intelligence that are more culturally influenced are more heritable
There are numerous, severe misinterpretations of this study and one of them is that this implies group differences are cultural or that g is cultural. This is not the case. Please read clearly and understand (a) the effect of how subtests are coded on the result and (b) that this can say the effect goes from g to culture too (and this is more likely if we admit to qualitative differences in culture between populations and want to preserve MI which is largely a fact). You may use my meta-analytic results I have linked above to try to code the entire literature and see if the result replicates if you wish. I will eventually but I have not yet (note: the Wechsler tests show higher correlations based on what I have had coded thus far).
I initially typed much more about your model (which I disagree with) but I think all of my disagreements about it can already be absorbed from what I have written. Importantly, ask yourselves if the heritabilities of IQ, g, and education change over time. For one of these, the answer is no, and for the others, the answer is “maybe depending on the test” and “yes”. Also, if you think the real phenotype is increasing and analogize this to the Flynn effect, you are making an error. However, your finding that the variance decreases in your model is actually a better description of what happens with the Flynn effect! (Look at scores from administrations of differently normed versions of the same family of tests to the same samples.) Finally, a change in variances does not have to say anything about genes (as you must know). You can find suggestive evidence of effects on genes via the variances, in particular depending on which other ones are affected, but you need different designs for interrogating changes in effects of genes in a determinate way.
Your thesis rests crucially on several empirical effects on heritability estimates that may not be in the body of evidence considered properly. More malleable sumscores (with their error and largely useless specific variance) should not be counted for a cognitive theory: model traits and see if what you expect exists or make the theory far more speculative and exploratory-sounding.
Jensen’s (1969) article was not on the genetic “determination” of IQ unless you are being uncharitable to the man (who was himself always charitable, even to his most ardent brickbat-bearing detractors). I do not understand the desire to impugn the dead or to present arguments which they recognized and contradicted in their lifetimes (Lewontin’s in this case) as somehow damning when they are not (Jensen even formulated and provided mountains of evidence for Spearman’s hypothesis in part to counter this argument).
>Intelligence is a function of both our hardware (brain) and our software (culture)
Without sources actually showing this, just qualify the statement, for instance by saying “Intelligence may be a function….” Importantly, culture does not have a place in our bodies. It affects intelligence (if it does, which I am not taking a stance on) through the brain, so this statement is confused anyway.
The continual emphasis on variation and inability to measure effects, with respect to examples where that is not true like the parasite paper where no effect was found, is just strange and reduces your credibility. It is technically untrue to say that only residual variance is explained, when really, what’s explained ***MAY*** be residual variance, and is not likely to be given what we know about cross-cultural similarities in heritabilities, MI, etc., and we have statistical identification so we are explaining the “full effects” just in a given environment (cultural, otherwise material, or whatever). You say cumulative culture has a prior contribution to genes but again there is no evidence for this and more likely is developmental interplay, which you recognize, but contradict yourself about by using careless language.
>Intelligence may be highly heritable, but this is not an indication of its basis in the genome rather than the environment.
This is wrong. It is an indication of its basis in the genome in a particular environment (and you basically note this!). Playing absolutist cards means unassailable issues and rejecting certain definitions of “causal” by implication. Why do that when you can just be correct in total and without contradiction? You even quote Fisher talking about the type of causality that you’re implicitly rejecting down the page!
Saying “environmental effects channel offspring through different developmental trajectories…, self-organizing trajectories of environmental experience result in clear differentiation in phenotypes like exploration, sociality, play behavior, and postnatal neurogenesis” may make readers believe that simultaneous multiple birth or its antecedent, sharing the womb, leads to similarity. This is not the case; it leads to dissimilarity. At birth, thanks to chorion sharing, monozygotic twins are less similar than dizygotic twins; they become more similar afterward, so the environmental effects at this stage of development work opposite to what this suggests. The Wilson effect is named for Wilson because he discovered this effect for some anthropometric measures. Validating this, DZs who share the chorion are less similar than DZs who do not share it (objectively shared environments frequently lead to differentiation, and this has been modeled a lot). Chorion sharing acts as a primary bias and developmental insult, not a source of similarity. Martin, Boomsma & Machin (1997) or Price’s review is as clear a statement to this effect as any.
>The question is not whether genes or culture contribute more to a behavioral trait,
This is not true. Many people are interested in relative variances, and there are ways to get identification (the question can also be interesting for a lot of reasons, if not to you, then to others). I don’t know why you would subscribe to the idea that identification is impossible here. It certainly isn’t for vertical and horizontal cultural transmission, as numerous people have demonstrated (it is even possible for socioeconomic status in economics, via people like Gregory Clark)! Identification is not impossible for any scientific question; there are just standard errors and necessary designs.
>Nothing in behavioral genetics makes sense except in the light of cultural evolution.
Measurement invariance does not make sense in the light of cultural evolution and influence (unless you’re willing to really change your definition of culture).
I am sympathetic to cultural evolution and am friends with cultural evolutionists galore (and hope to present about it when COVID dies off), but the field should avoid error and incorporate psychometric advances more often.