Tonal languages, ASPM, and MCPH


The article Razib mentioned earlier on the correlation between genetics and language type has been published. Effectively, the results of the paper are contained in a single figure (right), which plots the population frequencies of the two alleles on the x and y axes. Filled squares represent non-tonal languages, and empty squares tonal languages. There's an association between allele frequencies and language type, and the magnitude of the association is an outlier when compared with other loci in the genome, suggesting that it is not simply due to population migration and history. My comments:

1. I have never seen an article so apologetic about its conclusions. About every other sentence explains what they have not shown– a gene for “speaking Chinese”, a proof of causality of the alleles, any sort of racial anything. Probably the result of unsympathetic reviewers.

2. The approach they take is one that’s likely to be widely employed as more world-wide genotype data become available– make a hypothesis that there should be a correlation between allele x and world-wide variable y, then test the correlation for a number of loci. If your allele x is an outlier, you’ve found something interesting. I’m not sure of the best way to analyse this data, nor am I sure the outlier approach is really effective. Some sort of statistical framework would be nice here, as a way of truly assessing the significance of a result like this.
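For concreteness, here is a minimal sketch of the outlier approach as described above. It assumes that for each population we have the frequency of a candidate allele, a 0/1 tonal indicator, and frequencies for a set of control SNPs standing in for the genome-wide loci. The statistic is a plain point-biserial (Pearson) correlation, which may not be what the paper used. The "significance" is simply the candidate's rank among the control loci. All names and numbers below are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 ys this is the point-biserial correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def empirical_outlier_p(candidate_freqs, control_loci, tonal):
    """Rank the candidate locus against a set of control loci.

    candidate_freqs: candidate allele frequency in each population
    control_loci:    one list of per-population frequencies per control SNP
    tonal:           1 if the population's language is tonal, else 0
    Returns (|r| for the candidate, empirical p-value).
    """
    observed = abs(pearson(candidate_freqs, tonal))
    null = [abs(pearson(locus, tonal)) for locus in control_loci]
    # Count control loci at least as extreme; the +1 terms keep p above zero.
    as_extreme = sum(1 for r in null if r >= observed)
    return observed, (as_extreme + 1) / (len(null) + 1)


# Toy, made-up numbers: six populations, three tonal and three non-tonal.
rng = random.Random(0)
tonal = [1, 1, 1, 0, 0, 0]
candidate = [0.10, 0.15, 0.20, 0.60, 0.70, 0.80]
# Hypothetical control loci with random frequencies (no real population history).
controls = [[rng.random() for _ in tonal] for _ in range(999)]

r, p = empirical_outlier_p(candidate, controls, tonal)
print(f"|r| = {r:.3f}, empirical p = {p:.3f}")
```

Random control frequencies like these carry no shared population history. With real data, the null has to come from actual loci typed in the same populations. The result then depends heavily on which loci those are (functional or not, ancestry-informative or not). That dependence is exactly why a proper statistical framework would help.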

3. As both ASPM and MCPH are polymorphic within populations, the best way to test the authors' hypothesis (as I'm sure they're well aware) is to do an association study on any of the variables they mention as being involved in slight biases towards a given language type. This would be interesting. But the authors also claim that the linguistic bias towards a language type is small and is amplified by cultural transmission, so it may be undetectable at the individual level. So can this hypothesis be falsified?

4. I'm trying to figure out the logic behind the placement of the dotted lines in the figure. It's not halfway between the min and max allele frequencies. It's not computed in any manner. Yet they make claims about the relative frequencies of each type of language in each quadrant. This seems highly questionable: the human mind is very good at detecting patterns and forming groups, even when data are random. I'm not claiming the data points here are random, simply that the positioning of the lines in the figure serves to bias our thinking. (Note that a slight move to the right of the vertical line, or a move up to 50% on the horizontal line, would put a population in a quadrant they don't think it should belong in.)
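One way to make this sensitivity concrete is to recount the quadrants under different rules for placing the lines. The sketch below treats each population as an (x frequency, y frequency, tonal?) triple and uses made-up points, not the paper's data. The mean and median rules are illustrative choices, not the authors' stated method.

```python
from collections import Counter
from statistics import mean, median


def quadrant(x, y, x_cut, y_cut):
    """Name the quadrant a point falls in; points exactly on a line go left/lower."""
    side = "right" if x > x_cut else "left"
    level = "upper" if y > y_cut else "lower"
    return f"{level}-{side}"


def tally(points, x_cut, y_cut):
    """Count (quadrant, language type) pairs for one placement of the lines.

    points: (x_freq, y_freq, is_tonal) tuples, one per population.
    """
    return Counter(
        (quadrant(x, y, x_cut, y_cut), "tonal" if is_tonal else "non-tonal")
        for x, y, is_tonal in points
    )


# Made-up points for illustration only; these are NOT the paper's data.
points = [
    (0.05, 0.10, True), (0.10, 0.20, True), (0.15, 0.30, True),
    (0.20, 0.60, False), (0.25, 0.40, True), (0.30, 0.70, False),
    (0.90, 0.80, False),
]
xs = [x for x, _, _ in points]
ys = [y for _, y, _ in points]

# Two plausible-looking rules for placing the dotted lines.
cuts = {"mean": (mean(xs), mean(ys)), "median": (median(xs), median(ys))}
results = {rule: tally(points, *cut) for rule, cut in cuts.items()}
for rule, counts in results.items():
    print(rule, sorted(counts.items()))
```

In this toy set, a single high-frequency population pulls the mean to the right. As a result, the tonal population at (0.25, 0.40) lands in the lower-left quadrant under the mean rule but in the lower-right quadrant under the median rule. The data are identical in both cases; only the rule for drawing the lines differs.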

Overall, the paper is suggestive. Maybe highly suggestive? But I’d wait for a bit more data before coming to any conclusions.

UPDATE: Mark Liberman at Language Log posts on the study, explaining more concretely the difficulties of the outlier approach. Bob Ladd, one of the authors of the study, then responds in a guest post:

Consequently, we’ve gone about as far as we can go with statistics; the only real confirmation that we are onto something will now come from experimental work demonstrating the existence of the hypothesized genetically-induced “cognitive bias” in individuals, followed by studies clarifying the neurological basis of the bias. As Daniel Nettle says in his Commentary on the print version of our paper (appearing soon), our work is really hypothesis-generating rather than hypothesis-testing.
We are now generating precise hypotheses about the nature of the bias, and hope to start testing them soon.

Now, it's certainly true, as Mark says, that our geographical correlations would mean more if they had proceeded from some experimental demonstration of some sort of genetically linked, language-related, cognitive/behavioral/perceptual difference. But given the widespread assumption (rooted in the Boasian tradition, but with a significant contemporary boost from Chomsky) that the human language faculty is absolutely uniform across the species, it's very unlikely that we would have been able to get funding to look for such a difference first. So we started by doing something we could do on our own without such support, namely testing the apparent correlation. Having done that, we hope we are now in a better position to apply for funding for the expensive part of the research.


9 Comments

  1. It looks like they just split on the average of the population frequencies, both by eyeballing and by how they describe the graph: 
     
    Open squares stand for the tonal languages, and 
    their distribution corresponds to low frequencies of ASPM-D 
    (lower than 0.29), whereas filled squares stand for nontonal 
    languages, and their distribution corresponds to high frequencies 
    of MCPH-D (higher than 0.42). 
     
    WRT your concern, note that if they’d split on the median, the y line would be pretty much the same but the X line would be around 0.35.

  2. It looks like they just split on the average of the population frequencies 
     
    if that were the case, shouldn’t the horizontal line be higher? more populations have high frequencies than low frequencies. 
     
    based on the description, which I think I misunderstood the first time around, maybe the vertical line is simply at the max of the tonal languages, and the horizontal line is placed below the min of the non-tonal frequencies. that could make more sense.

  3. sorry, i haven’t read the paper — i know, i know — but couldn’t you substitute any European AIMs and get the same result?

  4. The reasons of Ladd and Dediu for working as they did are given here
     
    Most important this: 
     
    “… given the widespread assumption (rooted in the Boasian tradition, but with a significant contemporary boost from Chomsky) that the human language faculty is absolutely uniform across the species, it’s very unlikely that we would have been able to get funding to look for such a difference first. So we started by doing something we could do on our own without such support, namely testing the apparent correlation. Having done that, we hope we are now in a better position to apply for funding for the expensive part of the research. This might seem backwards …”

  5. sorry, i haven’t read the paper — i know, i know — but couldn’t you substitute any European AIMs and get the same result? 
     
    I had a similar thought after I posted. the SNPs they use to generate the null distribution are from the ALFRED database– it would be nice to know where they came from, whether they’re functional, whether they’re AIMs, etc.

  6. The reasons of Ladd and Dediu for working as they did are given here 
     
    thanks.

  7. Interesting paper. My main objection is that non-tonal languages considerably predate the latest ASPM variant (which is estimated to have arisen c. 6,000 BP). Yes, this gene variant has less effectively penetrated East Asian populations than other populations in South Asia, the Middle East, and Europe. But is the predisposing factor the presence of tonal vs. non-tonal languages? Doesn’t the geographic distribution of ideographic vs. alphabetical writing provide a closer match?

  8. Doesn’t the geographic distribution of ideographic vs. alphabetical writing provide a closer match? 
     
    until recently most people were illiterate. selection would be weak.

  9. Peter, not to mention that tonal languages post-date this or any other development… 
     
    Since the days of Karlgren, Chinese (Sinitic) has been recognized as non-tonal at the time of the first dynasties, around 1500 B.C. Later, Haudricourt demonstrated that Chinese tones were no more than 2,000 years old.
