Thursday, January 31, 2008
It's like shark week, only better! Whet your appetite with "High-Resolution Mapping of Crossovers Reveals Extensive Variation in Fine-Scale Recombination Patterns Among Humans", then top it off with "Sequence Variants in the RNF212 Gene Associate with Genomewide Recombination Rate". Enjoy!
The figure to the left is from Signatures of Positive Selection in Genes Associated with Human Skin Pigmentation as Revealed from Analyses of Single Nucleotide Polymorphisms. I thought of this chart when considering the idea that the phenotypic races that we see around us might be relatively new; perhaps an artifact of recent human evolution. Look at "Oceania": those are Bougainville Islanders, from off the coast of Papua New Guinea. In the CEPH-HGDP populations the "South Asians" are from the much lighter-skinned northwest fringe of the subcontinent; otherwise, I suspect you would be seeing the South Asian group moving toward the location of the Bougainville Islanders. This is not a surprising finding; earlier studies implied that very dark-skinned populations tended to exhibit a "consensus sequence" due to functional constraint; there's a reason humans are dark-skinned around the equator, and there's only one way to do it. But here's an important point: Bougainville Islanders are closer to East Eurasians than they are to other world populations in terms of ancestry. In other words, the dark skin, and the genes which confer it, that gives Melanesians and Africans an affinity in appearance is not a function of relatively recent common descent, but of local adaptation. Similarly, extremely dark-skinned South Asian groups are generally closer to Europeans in terms of ancestry than light-skinned East Asians are.
This is all pretty common sense when you think about it. But with that said, skin color is a very salient trait. The skin is our biggest organ; it's a large part of what others see. Therefore, there is a natural human tendency to classify in colors. If you read the reports from Chinese delegations who were sent to investigate Cambodia they describe the natives as "black." Similarly, according to Mary Lefkowitz the ancient Greeks observed that there were the blacks of Ethiopia and those of Southern India. They also noted that both the Egyptians and North Indians were brown-skinned people ("wheat colored"). But, perhaps importantly, they often distinguished the various peoples by other characteristics (e.g., Ethiopian and Indian hair form). So on the one hand you have a nod to the importance of skin color as a criterion of perception & categorization, and on the other hand an acknowledgment that populations differ in more than color. But in the United States there are peculiar social conditions which result in problematic conflations.
As everyone knows, to be very dark-skinned in the United States was identical to being of one race for a greater part of our history. Certainly there was a small Native American population, but they could be discarded from the shaping of social norms because of their low numbers. To have dark skin was to be of African ancestry. Though there were certainly other distinguishing characteristics between those of African and European ancestry, skin color was the most visible and noticeable. It was used as the main discriminatory trait because that was all that was necessary. This still persists in our folk culture when people talk about individuals "being discriminated against because of the color of their skin." Skin color connotes a racial identity. And yet you have groups like South Asians, who overlap with African Americans in complexion, but are not really "black" as we understand it. Steve Sailer has been noting for years the implicit value system highlighted by the reality that the very dark-skinned Vijay Singh is not identified as a black golfer, while the lighter-skinned (and only 1/4 African in ancestry) Tiger Woods is. Of course it doesn't work this way all the time, and South Asians are often identified as black, at least upon first impression. But the more confusing situations can also occur because of the nature of American categorizations. So tight is the correlation of non-white and "black" in the minds of some people that really peculiar characterizations can ensue. For example, in high school I had an acquaintance who would refer to myself & a Cambodian girl as black. That was understandable, we both had brown skin. But, one day he referred to a Chinese friend of mine as black. This friend was not dark-skinned; she had a brunette white complexion (not olive).
When I queried my acquaintance about the fact that this "black" individual was probably lighter skinned than at least 1/3 of our other classmates (all of whom were white), he simply insisted that she was a "Chinese black." That was about as far as I got, obviously he couldn't express the inchoate associations within his mind between racial identity and skin color. In his world, there were whites and blacks. If someone wasn't white, that entailed that they were black.
As is rather clear from the content on this weblog we are getting a good fix on the genetics of pigmentation. Not only do we know the patterns of inheritance via classical pedigree analysis, but we now have a good grasp on which regions of the genome control world-wide variation in melanin content of the skin, eye and hair. We are even beginning to understand when selection began to occur on the loci which control this variation. We have some working hypotheses of why skin color is under functional constraint, and what sort of changes might drive adaptive evolution. But all this is sometimes harder to discuss because the typical American has so many social and psychological associations between skin color and group identity. It isn't just another trait, like bristles on the back of a Drosophila, no, it is the token of one of the most significant sociological phenomena which characterize American society today. Steve will have quite a bit to blog about into the foreseeable future.
Note: I suspect that the transposition of genomic knowledge to folk wisdom is easier in societies such as Brazil or India where extant phenotypic variation on this trait exhibits a larger range, much of it within families. Race and color are still very important issues, but the joints around which the perceptions are carved are more flexible and numerous.
Wednesday, January 30, 2008
I was doing some snooping around due to some questions about the HERC2 & eye color papers I mentioned yesterday. Guess what? Earlier this month a Danish group published a similar paper, Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression. It's Open Access, so you can read it yourself. The language is a bit more stilted and hurried than the two papers I mentioned yesterday, but they basically independently confirmed the Australian group's specific finding:
In conclusion, we have identified a conserved regulatory element within intron 86 of the HERC2 gene that is perfectly associated with the brown/blue eye color in studied individuals from Denmark, Turkey and Jordan. This element had an inhibitory effect on the OCA2 promoter activity in cell cultures, and the blue and the brown alleles were shown to bind non-identical subsets of nuclear extracts. In total, all these data strongly support a model where the blue eye color in humans is caused by homozygosity of the rs12913832*G allele.
Instead of just doing comparative analysis they actually tested the hypothesis in cell culture after performing linkage & association, and seem to have come out with what you'd expect: the SNP in intron 86 of HERC2 regulates transcription at OCA2. Their Ns were a little small compared to the other two groups, but their inclusion of Middle Eastern individuals was interesting. They imply that it's a common haplotype derived from a single mutational event, presumably recently driven up in frequency by selection. Their conjectures about location and rationale aren't convincing; I'm sure commenters here could offer many more ingenious models based on historical & geographical particulars (I know the reasons proffered overlap with some of mine, but I'm a dude on a blog). I get the impression they haven't heard of Haplotter (look at the references). All that being said, at the rate that papers are being pumped out the golden age of pigmentation genetics may not have a very long shelf life (granted, that's a good thing). By the way, the gene they say has an association with hair color, RABGGTA, has been pegged as being under negative selection.
Update: ScienceDaily has a summary up with a most retarded title.
Tuesday, January 29, 2008
Two interesting articles out in the PNAS early release feed.
Molecular insights into human daily behavior:
Human beings exhibit wide variation in their timing of daily behavior. We and others have suggested previously that such differences might arise because of alterations in the period length of the endogenous human circadian oscillator. Using dermal fibroblast cells from skin biopsies of 28 subjects of early and late chronotype (11 "larks" and 17 "owls"), we have studied the circadian period lengths of these two groups, as well as their ability to phase-shift and entrain to environmental and chemical signals. We find not only period length differences between the two classes, but also significant changes in the amplitude and phase-shifting properties of the circadian oscillator among individuals with identical "normal" period lengths. Mathematical modeling shows that these alterations could also account for the extreme behavioral phenotypes of these subjects. We conclude that human chronotype may be influenced not only by the period length of the circadian oscillator, but also by cellular components that affect its amplitude and phase. In many instances, these changes can be studied at the molecular level in primary dermal cells.
Weird. ScienceNow notes some implications:
...raises the possibility of an inexpensive and objective test of a person's "owlness" or "larkness." Such a test would be no small matter, given the prevalence of sleep disorders and the fact that many drugs, including cholesterol medications and chemotherapy, work more effectively if administered at certain points in a person's sleep/wake cycle. Pinpointing individual clock cycles could pave the way for personalized sleep and drug therapies, says Achim Kramer, a Free University chronobiologist who helped design the study.
Selectivity of Black Death mortality with respect to preexisting health:
Was the mortality associated with the deadliest known epidemic in human history, the Black Death of 1347-1351, selective with respect to preexisting health conditions ("frailty")? Many researchers have assumed that the Black Death was so virulent, and the European population so immunologically naive, that the epidemic killed indiscriminately, irrespective of age, sex, or frailty. If this were true, Black Death cemeteries would provide unbiased cross-sections of demographic and epidemiological conditions in 14th-century Europe. Using skeletal remains from medieval England and Denmark, new methods of paleodemographic age estimation, and a recent multistate model of selective mortality, we test the assumption that the mid-14th-century Black Death killed indiscriminately. Skeletons from the East Smithfield Black Death cemetery in London are compared with normal, nonepidemic cemetery samples from two medieval Danish towns (Viborg and Odense). The results suggest that the Black Death did not kill indiscriminately-that it was, in fact, selective with respect to frailty, although probably not as strongly selective as normal mortality.
We've all read Farewell to Alms, so we know the argument that quick die-offs can be good for standards of living by relieving some of the Malthusian pressure. Though if you ever took a normal medieval history course you'd probably be told about the premium on labor which emerged after the Black Death due to shortages, and its effect on the collapse of the old manorial system (I was). But this data is interesting because it confirms that the most economically productive portion of a society where muscle power was of the essence might have increased as a proportion of the population after these sorts of epidemics swept through. Perhaps these are the sorts of shocks that social systems need to shift toward another equilibrium? (I know, morbid)
Yanked out of google analytics, below the fold....
10 - Why is porn legal but prostitution illegal?
9 - IQ comparison site.
8 - Converting between IQ and SAT scores.
7 - Genetics of Hair Color (again).
6 - German penises 'too small for EU condoms'.
5 - Porno Arabica (this is due to Assman over-utilizing our search boxes!)
4 - Pigmentation variation in Europe.
3 - James Watson Tells the Inconvenient Truth: Faces the Consequences.
2 - 10 Questions for Heather Mac Donald.
1 - Intercourse and Intelligence.
Monday, January 28, 2008
A few weeks ago Kambiz of Anthropology.net was mentioning how there's very little mention of gene expression on this weblog. Fair enough, but hey, what about this? And this paper just popped into my RSS today, so check it out, Differential Allelic Expression in the Human Genome: A Robust Approach to Identify Genetic and Epigenetic Cis-Acting Mechanisms Regulating Gene Expression:
We describe a new methodology to identify individual differences in the expression of the two copies of one gene. This is achieved by comparing the mRNA level of the two alleles using a heterozygous polymorphism in the transcript as marker. We show that this approach allows an exhaustive survey of cis-acting regulation in the genome: we can identify allelic expression differences due to epigenetic mechanisms of gene regulation (e.g. imprinting or X-inactivation) as well as differences due to the presence of polymorphisms in regulatory elements. The direct comparison of the expression of both alleles nullifies possible trans-acting regulatory effects (that influence equally both alleles) and thus complements the findings from gene expression association studies. Our approach can be easily applied to any cohort of interest for a wide range of studies. It notably allows following-up association signals and testing whether a gene sitting on a particular haplotype is over- or under-expressed; or can be used for screening cancer tissues for aberrant gene expression due to newly arisen mutations or alteration of the methylation patterns.
This is a provisional paper, so one assumes there will be some revisions. In any case, cancer is important & all, but this is the kind of stuff I'm interested in (see Discussion):
...We tested 56 genes for association of differential allelic expression patterns observed with a cis-acting regulatory polymorphism using genotypes generated by the HapMap project...For 23 of these genes we identified a region statistically associated with differences in allele expression that could indicate the existence of a regulatory haplotype (i.e., a region of one chromosome likely containing the polymorphism(s) causing the differential cis-regulation). These regions are often tens of kb long, consistent with previous descriptions of the linkage disequilibrium patterns in humans....
Related: Kambiz has a post up on this with a lot more commentary, Identifying Cis-Acting Elements that regulate Human Gene Expression. Also, in Nature, Genome-wide analysis of transcript isoform variation in humans.
There are two new papers out in AJHG about eye color variation and genomics. Three Genome-wide Association Studies and a Linkage Analysis Identify HERC2 as a Human Iris Color Gene and A Single SNP in an Evolutionary Conserved Region within Intron 86 of the HERC2 Gene Determines Human Blue-Brown Eye Color. The second paper is an extension of the work of the Australian group which has been elucidating pigmentation relationships around OCA2 for several years now. The first paper is more interesting (to my mind) because it's the first genome-wide association study to focus on this region. I've extracted figure 6a out of the paper, you might recognize the map. I'm not surprised; go to Haplotter and enter in HERC2, it pops out as a region of selection near OCA2 (I first noticed it when checking for OCA2). As for the map, pretty cool huh? As the authors note there's a pretty good correlation between the frequency of the trait and the SNP of interest. The authors point to the north-south cline, but I am curious about the east-west one. Additionally, look at Bulgaria. I've been looking at Slavicization of the Balkans, and this is an interesting data point....
Related: Dienekes has a high res map up.
Note: Please be careful about taking the phenotypic clines too literally; I am given to understand that there was a little extrapolation going on here and there. And of course, standard caveats on representativeness of the samples from each region and all.
Sunday, January 27, 2008
From Geographic distribution of environmental factors influencing human skin coloration:
...The UVR [ultraviolet radiation] data recorded by satellite were combined with environmental variables and data on human skin reflectance in a geographic information system (GIS). These were then analyzed visually and statistically through exploratory data analysis, correlation analysis, principal components analysis, least-squares regression analysis, and nonlinear techniques. The main finding of this study was that the evolution of skin reflectance could be almost fully modeled as a linear effect of UVR in the autumn alone. This linear model needs only minor modification, by the introduction of terms for the maximum amount of UVR, and for summer precipitation and winter precipitation, to account for almost all the variation in skin reflectance.....
The map above was generated from the regression analysis. Apparently it has been updated as of 2007 (received the link from a friend). It does look much better than it did in the original paper (which I have read and have a PDF copy of). Do note that the selection of peoples whose reflectance values were plugged into the model obviously matters. But I still think it's interesting the sort of predictions this map produces and how it fits with our intuitions of what the distributions should be, and the knowledge of what they are. Note the equivalent latitudes in Europe and North America, or Australia.
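For readers who want to see what "modeled as a linear effect of UVR" amounts to mechanically, here is a minimal sketch of a one-predictor least-squares fit. The numbers are synthetic and invented purely for illustration; they are not the paper's data, the coefficients are not the paper's coefficients, and the names `fit_line` and `r_squared` are my own, not anything from the authors' GIS workflow.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: 50 made-up populations, with reflectance falling
# linearly as autumn UVR rises, plus some measurement noise.
rng = random.Random(0)
uvr = [rng.uniform(0, 300) for _ in range(50)]
reflectance = [70.0 - 0.15 * u + rng.gauss(0, 3) for u in uvr]

slope, intercept = fit_line(uvr, reflectance)
fit = r_squared(uvr, reflectance, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.1f} R^2={fit:.2f}")
```

The caveat about which peoples get plugged into the model shows up here too: the fitted slope is only as good as the sample of populations behind it.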
Saturday, January 26, 2008
The New York Times Magazine has a long piece about replacement of Uganda's native Ankole breed with Holsteins:
"You know, in Uganda, we have to look for survival of the fittest," Mugira said once he finished sorting out the confusion. "These ones, they are the fittest," he went on to say, gesturing toward his Holsteins. In physical terms, there was really no contest between the tough Ankoles and the fussy foreign cattle, which were always hungry and often sick. But the foreigners possessed arguably the single most important adaptive trait for livestock: they made money. Holsteins are lactating behemoths. In an African setting, a good one can produce 20 or 30 times as much milk as an Ankole.
Who could complain about over an order of magnitude increase in productivity? Well:
If the Ankole cattle are able to mount a comeback, it will be because circumstances have endowed them with a unique set of defenses, both evolutionary and political. Members of President Museveni's ethnic group populate the upper ranks of Uganda's government. Some prominent Bahima have started an organization devoted to preserving Ankoles, under the patronage of a one-eyed army general who spends his free time painting rapturous portraits of cows. One afternoon, at a pricey restaurant in Kampala, I had lunch with the organization's chairman, Samuel Mugasi. Dressed in a dapper gray suit and a French-cuffed pale blue shirt, he told me he was a civil servant and part-time rancher.
A lot of people talk as if white tourists in Third World countries are special in the way that they bemoan the passing of quaint "traditions" which they had enjoyed "experiencing," but which the "natives" were happy to get rid of. But this sort of patronizing and instrumental attitude toward the unwashed is universal; it seems to be an attitude correlated with leisured status. Indian Americans and Irish Americans who visit their ancestral "homelands" over the years complain about the destruction of the cultural traditions, i.e., poverty, which made their earlier experiences more "authentic" (luckily for Indian Americans who want to get in touch with their "roots" most of India is still living in authentic squalor and deprivation!).
But there's a serious case to be made for preservation of extant genetic variation. The question I have is this: how many individuals of various breeds do you really need to keep around so that diversity is preserved for future utilization? In other words, I understand the logic of adaptive acceleration where large Ne is critical to the production of rare positive mutations; but don't we get to a point of diminishing returns for populations where we're more interested in modal alleles which might be disjoint across breeds? That is, the genetic traits from breed A you want to preserve in case they come in handy are common in breed A, so you don't need that many of breed A around to serve as a reservoir. I just don't see why we need maximal diversity, it seems the sort of variation which is encapsulated by species richness is more important here than proportionally weighted diversity indexes.
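To put a rough number on the "you don't need that many" intuition, here is a back-of-the-envelope sketch. It assumes the retained animals are a random, unrelated sample from the breed, so that the chance of losing an allele at frequency p among n diploid individuals is (1 - p)^(2n). That is my simplifying assumption, not anything from the article, and real herds with relatedness and drift over generations would need more animals than this suggests.

```python
import math

def prob_allele_retained(freq, n_individuals):
    """Chance that at least one copy of an allele at frequency `freq`
    is present among n randomly sampled diploid individuals (2n copies)."""
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)

def individuals_needed(freq, target=0.99):
    """Smallest number of individuals whose retention probability
    reaches `target`, under the same random-sampling assumption."""
    copies = math.log(1.0 - target) / math.log(1.0 - freq)
    return math.ceil(copies / 2)

# Compare a modal allele (common in the breed) with a rare one.
for freq in (0.5, 0.1, 0.01):
    print(freq, individuals_needed(freq))
```

The contrast between the common and rare cases is the point: a handful of animals suffices to hold on to alleles that are modal within a breed, while rare variants are what demand large numbers.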
In any case, as alluded to in the article, maintaining relict populations of dying breeds like this seems like a public good which any prudent government can provide. But another issue with the article is that it doesn't seem like the author is a science writer, so he engages in the fallacy of blending genetics. For example:
...And something else is being obliterated: genes. Each time a farmer crossbreeds his Ankoles, a little of the country's stockpile of adaptive traits disappears. It isn't easy to measure genetic "dilution." What is evident, however, is that the Ankoles possess much worth saving. For instance, their horns, often seen as ornaments, actually disperse excess body heat.
I guess it's nice that he put quotes around dilution, but the rest of the article suggests to me that the author hasn't internalized that genetics is discrete, and that information isn't destroyed through cross-breeding. Rather, it seems that a good program of cross-breeding could result in superior breeds of Holstein optimally suited to the local climate. That's what happened with indigenous African lineages as they hybridized with introduced South Asian ones 2,000 years ago to produce the Ankole, according to the article! This sort of piece in a widely circulated publication such as The New York Times Magazine could have been a serious examination of agricultural and quantitative genetics, and just how much we depend on these unsexy sciences to feed the world. As it is, there's a lot of hand-waving scare-mongering....
In classic heritability studies, the variance of some phenotype Y is decomposed (in the simplest model) into the variance attributable to genetic effects, G, and the variance attributable to environment, E, such that Var(Y) = G+E. As the majority of heritability studies are done by geneticists, who are in general more interested in G than in E, the environmental variance is, to them, largely an error term. When thought of this way, it is clear that "environmental variance" can contain effects that, though not genetic, are certainly not "environmental" in any traditional sense.
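As a concrete toy version of that decomposition, the sketch below simulates a phenotype as the sum of independent genetic and "environmental" draws and recovers the fraction of variance due to G. Everything here is an illustrative assumption: the normal distributions, the variances of 3 and 1, and the function name `simulate_phenotypes` are made up, and the additivity only holds because G and E are drawn independently.

```python
import random
import statistics

def simulate_phenotypes(n, var_g, var_e, seed=0):
    """Draw n phenotypes Y = G + E with independent normal G and E."""
    rng = random.Random(seed)
    g = [rng.gauss(0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0, var_e ** 0.5) for _ in range(n)]
    y = [gi + ei for gi, ei in zip(g, e)]
    return g, e, y

# Hypothetical variances: genetic 3.0, everything else 1.0.
g, e, y = simulate_phenotypes(20000, var_g=3.0, var_e=1.0)
var_y = statistics.pvariance(y)
h2 = statistics.pvariance(g) / var_y  # share of Var(Y) attributable to G

print(f"Var(Y)={var_y:.2f}  Var(G)/Var(Y)={h2:.2f}")
```

Note that the `e` list is literally just whatever is left over after G, which is the sense in which "environment" is an error term.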
In particular, the error term must include simple stochastic noise on any part of the complex mapping from genotype to phenotype. Even at the early points in this map--the genome sequence and gene expression--there is considerable opportunity for random events to greatly affect phenotype. For lack of a better term, I'm going to call noise introduced at this level "genomic noise"; some examples follow:
1. While the genome is sometimes thought of as a constant in all cells from a given individual, that is not the case. Besides mutations, the genomes in some cell types undergo extensive remodeling during development. For example, consider the T and B cells of the immune system. During development, the genes in the immunoglobulin cluster are recombined to create the receptors presented by the cell. This recombination is stochastic-- even from an identical starting spot, the precise combination of genes obtained in independent recombinations can vary greatly. It stands to reason that this genomic noise could, in turn, propagate up to phenotypic variation, and indeed, that is the case-- if you look at identical twins who are discordant for multiple sclerosis (an autoimmune disease), you find that those early recombination events have made them less than identical.
2. Genomic noise is introduced in brain cells, as well, by the random movement of transposable elements and their effects on gene expression. The important studies (or perhaps study, singular; I can't seem to find anything other than the linked paper) here have been done in the mouse, and any phenotypic effect is highly speculative, but as the costs of sequencing drop, it will be possible to study these sorts of somatic changes on a large scale.
3. Moving up a level from genomes to gene expression, it's clear that some variation in levels of gene expression is simply stochastic. But interestingly, recent work has suggested that, though most everyone has two copies of all autosomal genes, a rather large fraction of genes (excluding imprinted ones) are only expressed from one copy, and the choice of copy to express varies from cell to cell. This opens up the possibility of cells or even entire tissues ending up effectively haploid for a given gene. So if you were to have two individuals heterozygous for some phenotypically relevant variant, they could end up with quite different phenotypes depending on the random choice of allele to express (see also G's post on the topic here).
I find these sorts of speculations entertaining, and I imagine some of these postulated effects will soon be tested. Until then, just something to keep in mind.
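The third example lends itself to a quick toy simulation. Suppose, purely as an assumption for illustration, that a tissue descends from a small number of founder cells, each of which picks one of the two alleles at random with equal probability and passes that choice to its descendants. The fraction of the tissue expressing allele A then varies a lot between genetically identical heterozygotes when the founder pool is small, and hardly at all when it is large. The 50/50 choice, the clonal inheritance, and the founder counts are all hypothetical.

```python
import random
import statistics

def fraction_expressing_a(n_founder_cells, rng):
    """Each founder cell independently picks allele A or B with equal
    probability; return the fraction of founders that picked A."""
    picks = [rng.random() < 0.5 for _ in range(n_founder_cells)]
    return sum(picks) / n_founder_cells

rng = random.Random(1)
# 1000 simulated heterozygous individuals for each founder-pool size.
few_founders = [fraction_expressing_a(4, rng) for _ in range(1000)]
many_founders = [fraction_expressing_a(400, rng) for _ in range(1000)]

spread_few = statistics.pstdev(few_founders)
spread_many = statistics.pstdev(many_founders)
print(f"spread with 4 founders: {spread_few:.3f}")
print(f"spread with 400 founders: {spread_many:.3f}")
```

With only four founders, some simulated individuals end up with a tissue that expresses just one allele, which is the "effectively haploid" scenario.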
Friday, January 25, 2008
The first genome-wide association study on human episodic memory back in 2006 showed an association between the T allele of a gene called KIBRA and better performance on certain list-learning tasks. That study contained two replications in different populations, and the outcome was independently replicated in healthy, elderly folks. Next, another group showed an association between the T allele and very late-onset Alzheimer's. There are some issues with interpreting this study that I certainly didn't think of the first time I read it. Almeida et al. point out that this could be due to 'survivorship bias' wherein the C allele carriers that were gonna get AD got it a lot earlier and left the T allele folks to provide the 'very late-onset' crowd (or at least that's how I interpret survivorship bias).
Two studies have come out in the past few months. One replicates the effect of the T allele on memory with a slightly smaller effect size than before. The second fails to find any effect at all. One experiment in this latter report was an exact replication of the 2006 memory study with a population of European origin (German vs. Swiss. That shouldn't matter, should it?). I don't know how to explain the failure to replicate, but it is duly noted. Perhaps it really really matters how well you vet your cohort. For instance (from Almeida and co again):
We did not find evidence in support of our original hypothesis that CC carriers would be at greater risk of MCI (ed: Mild Cognitive Impairment) (although we did observe a trend in that direction), nor were we able to show any evidence of an effect of the gene on the memory scores of older people with MCI. These results suggest that the effect size of the T→C polymorphism decreases with increasing impairment of episodic memory, and that the KIBRA gene plays all but a limited role after scores fall below a certain threshold, as is the case in MCI.
I don't think there is any evidence that the cohort that failed to replicate had especially bad memory, but I'm not an expert in human memory assessment. A few more molecular details below:
Kibra had an especially good tie-in to memory because in yeast two-hybrid studies it binds to PKM zeta, which is established in a completely separate literature as a key player in the maintenance of several types of memory and synaptic plasticity. The molecular situation is foggy as well though. We don't have any published assessment of the function or localization of endogenous Kibra protein in neurons. In fact, most of the molecular work has been done with an overexpressed GFP fusion protein. The group that discovered Kibra reports that it is a 125-kDa protein with specialized "WW" protein interaction domains at one end, while the group that reported the Kibra-memory association used a custom antibody to detect human Kibra protein and identified a 100-kDa truncated protein. One final issue is that the memory-SNP (and all SNPs in linkage disequilibrium) in human Kibra is intronic, which means we have no straightforward prediction as to how it might alter protein function. Papassotiropoulos et al. (2006) could not find a difference in the total amount of Kibra protein in human brain tissue with different alleles. Either we have to predict that the SNP produces an expression change that they couldn't detect or that the SNP alters splicing such that the protein sequence changes but the size doesn't.
Reza Aslan and Rod Dreher had a disagreement about the general concept of "Clash of Civilizations" on the latest bloggingheads.tv. I think people who actually read Samuel Huntington's original book would feel that the caricature of his thesis is a bit unfair; granted, such macro-scale typologies invite criticism, and there were some embarrassing factual errors. But Aslan himself is coming back with Platonic-seeming typologies (e.g., "Arab culture") while at the same time ridiculing the whole enterprise. The reality is that the human mind is geared toward these clear and distinct types, despite the fact that reality exhibits continuity. I am, for example, always surprised at the alliances of convenience which confound our expectations based on higher-level categories. For example, the Abbasid caliphs & the Carolingians engaged in communications in the interests of forging common cause against the Byzantines, prefiguring the later French alliance with the Ottomans against the Habsburgs. This is a case where it seems geographic parameters overruled the historically contingent cultural affinities between various states (during the time of Charlemagne the Latin and Greek churches were not even in schism!). The Umayyads of Spain similarly attempted to act in concert with the Byzantines against the rising Muslim powers of North Africa who were pushing into southern Italy and challenging their status as the paramount Islamic power in the western Mediterranean. And in the last case, cities such as Amalfi served for decades as federates in the North African Muslim cause against other Italian Christians, enabling the endemic depredations of the Muslims upon their co-religionists in exchange for a cut of the plunder and strategic alliance.
In China the Hui (or Dungans), the Chinese speaking Muslims, were used by the Manchus to conquer & suppress the Turkic speaking Muslims of Xinjiang toward the interests of consolidating the hold of the Chinese Empire upon these marginal regions. And in a peculiar case, rebellions of the Hui against their non-Muslim rulers predicated on religious differences tended to succeed only when Muslim preachers embedded within their sermons metaphors and analogies drawn from common Chinese (often Daoist) mythology! And yet, you often see this:
Omar, the Kurds claim, was once an inconsequential deputy to the now-deceased terrorist chieftain Abu Musab al-Zarqawi. Omar disputed this characterization. By his own telling, he accomplished prodigies of terror against the pro-American Kurdish forces in the northern provinces of Iraq. "You are worse than the Americans," he told his Kurdish interrogator. "You are the enemy of the Muslim nation. You are enemies of God." The interrogator-I will not name him here, for reasons that will become apparent in a moment-sat sturdily opposite Omar, absorbing his invective for several minutes, absentmindedly paging through a copy of the Koran.
It is true that many Islamist Arabs have an operational tendency to conflate the "Muslim nation" with the "Arab nation," but I do not think one can deny the internationalist tendencies of a particular current within Islam. Reza Aslan in the diavlog with Rod asserts the multiplicity of identities which individuals tend to have. Cultural anthropologists also tend to make this claim. It seems an obviously true claim. But the problem to me is that Aslan (and cultural anthropologists) take this complexity and use it as a cudgel against any attempt to construct general trends or patterns of relations (outside of their own preferred narratives!). There are sociological and historical analyses of the manner in which people identify; for example, middle class Bengali speaking Muslims before the partition, and under British rule, tended to coalesce around their identity as Muslims who were marginalized by the Hindu elite of Calcutta. After independence under Pakistan Bengali speaking Muslims were dominated by a non-Bengali speaking Muslim elite; whereas before they were marginalized as mussulmans, now they were marginalized as crypto-Hindu kala Bengalis. In my own family this has manifested in a generational difference; my mother noted that her parents, especially her father who was often the only Muslim physician among his colleagues (he was born in 1896), were extremely attached to the idea of Pakistan. In contrast, her own generation experienced little discrimination from Hindus, who were by that period a minority out of power, as opposed to Urdu-speaking immigrants from India ("Biharis") who would engage in attempts to assert naked dominance in public such as forcing Bengalis out of seats on a bus if all spots were already taken (and yelling loudly in Urdu, which the bus driver might not understand, when they were denied what they wanted).
Context matters. Most of us get that. Obviously we use these identities as heuristics in our day to day life (among a bunch of white Americans I suppose I'm the "brown guy," and among a bunch of non-American brown guys I'm "the American"). Rather than stopping there, people should engage in more scholarship to map out how these identities apply in particular contexts and what their long-term effects are. For example, it is trivially easy to find alliances across the religious chasm for states during the medieval period; but it might be interesting to see how much deviation from expectation based purely on real-politik there was over the centuries. I think that the sincere Christian religiosity of Louis IX of France did have geopolitical consequences which could not be inferred from pure calculation of interest. It may be that though most state-action can not be derived from civilizational adherence (after all, most conflict is intra-civilizational), the deviations from expectation can be, and those deviations might be particularly significant hinges of history.
Finally, I think that though broad social and historical studies are essential, we need to explore the psychology of identity in more detail. There is a difference between what people say, and what people do. I suspect many Syrian Muslims would avow more affinity to a South Asian Muslim than to a Christian, at least to the South Asian Muslim. But I also suspect that racial prejudice and to a lesser extent Arab chauvinism strongly shape realized choices, and in reality association with a Syrian Christian might actually be more likely (this doesn't take into account variables such as food, where local geography and culture matters a great deal). Ultimately, these questions of identity are empirical, and it would be nice if people spent less time arguing and more time collecting data and analyzing it.
1 - Do Syrian Christians, Arabs of Khuzistan in Iran and the Arabs of Morocco truly have more in common with each other than each does with an Armenian Christian of Syria, a Persian from Fars and a Berber from the Rif?
Thursday, January 24, 2008
I've been posting a fair amount on the transition from hunter-gatherer to farmer in northern Europe lately. Though I'm obviously interested in historical scholarship in and of itself, my focus on this period has been triggered by the spate of recent papers on selection within the last 10,000 years or so. It seems that the overwhelming shift of humans from hunter-gathering toward agricultural lifestyles within the period between 10,000 and 2,000 years ago had to have had a major impact on evolutionary pressures; just as fire might have had hundreds of thousands (or millions) of years in the past. The amylase and lactase persistence stories are pretty straightforward derivations of the change in lifestyle; different food inputs will result in different optimal digestive propensities. Then there are pretty obvious second order concerns; farming societies are usually characterized by more individuals per unit area because less land is needed to support one person.1 The implications of this for disease are clear from the dependence of endemic diseases on particular density thresholds. Additionally, the domestication of herbivores also likely cranked up the rate of production of new diseases as pathogens crossed the species barrier. Finally, there are more nebulous possibilities such as variation in alleles which are known to have behavioral correlates, such as DRD4, and their possible relationship to a local human ecology.
That being said, attention to details is important. The farming lifestyle in Denmark is very different from the farming lifestyle on the North China plain. In Farewell to Alms Greg Clark made a few general claims derived from assumptions which I think are pretty easy to refute. For example:
...Chinese adults, despite their very long history of settled agriculture and the variety of climate zones within China, generally lack the ability to absorb lactase, suggesting that milk was never a large part of the Chinese diet, and that by implication Chinese living standards were generally low in the preindustrial era.
Clark assumes that cattle culture is a sign of local wealth, and that the gene-culture co-evolutionary process was driven by economic parameters. The reality is of course that there are ecological considerations; the distribution of cattle culture in Africa is the clearest example, but it seems likely that it was an issue elsewhere. I'm reading A Concise Economic History of the World, and the author notes that before the Romans cattle raising and slash & burn farming were the norm in Gaul. With the spread of the Roman empire two-course rotation was introduced, but the thick clay rich soil of northern Europe did not yield easily to the Mediterranean plow. A more powerful heavy wheeled plow eventually did open up northwest Europe to intense three-course rotation, and resulted in very high population densities and the flourishing of the manorial system by the peak of the Medieval Climate Optimum. That being said, the manorial system did not spread to the Celtic Fringe or most of Scandinavia because the cereal based system was less optimal in extremely moist or cool climates; there a cattle based form of agriculture remained dominant not because of the wealth of the Irish or Norwegians, rather the local ecology placed constraints on the options they could follow.
The point of all this is that the spread of agriculture to northern Europe 8,000 years ago did not mean that the hunter-gatherer with his bow immediately became the medieval peasant on his plow, though some of my presentation comes rather close to implying this. No doubt the shift was across a continuous range, and the local configuration was subject to historical contingency and ecological constraint. I do hold that it is likely that endemic disease became much more significant with the rise of agricultural communities, but we should be cautious about projecting from the extremely productive period before the modern era, when better technology (plow, horse collar, etc.) and diversified crops combined to drive the Malthusian limit very high indeed.
With that in mind, I'd like to point to a dissertation that Paul found where there are some interesting results being reported:
I'll let you digest this, but do keep in mind that histories of populations and particular genes do not always align. A major problem in modeling the past seems to be a disregard for this distinction. Demic diffusion may not necessarily be the substantial replacement of ancestral genome content. Rather, long distance colonies from southern Europe might have brought both a new lifestyle and new genes. Most of their cultural and genetic distinctiveness might have been swamped out, but a few extremely salient elements may have remained and become dominant.
1 - There are exceptions, such as the coastal Pacific Northwest where a dense and affluent society grew up around the abundance of salmon.
Via Dienekes, a new paper, A spatial analysis of genetic structure of human populations in China reveals distinct difference between maternal and paternal lineages:
Analyses of archeological, anatomical, linguistic, and genetic data suggested consistently the presence of a significant boundary between the populations of north and south in China. However, the exact location and the strength of this boundary have remained controversial. In this study, we systematically explored the spatial genetic structure and the boundary of north-south division of human populations using mtDNA data in 91 populations and Y-chromosome data in 143 populations. Our results highlight a distinct difference between spatial genetic structures of maternal and paternal lineages. A substantial genetic differentiation between northern and southern populations is the characteristic of maternal structure, with a significant uninterrupted genetic boundary extending approximately along the Huai River and Qin Mountains north to Yangtze River. On the paternal side, however, no obvious genetic differentiation between northern and southern populations is revealed.
The simplest model here is that north Chinese Han males spread over the country and intermarried with southern females. That explains the distinction between northern and southern lineages. But I think it is important to be specific about the anthropological details which manifested on the local level. The Han are traditionally a patrilineal and patrilocal people. My understanding is that patrilineality and the "clan system" are more extreme in the south than the north. Additionally, going back to the Warring States period before the rise of the Imperial Chinese system scholar-officials would move from state to state in search of employment, power and prestige. On a larger scale there is the historical reality that several times in Chinese history Han ethnic dominance has retreated from the north China plain to a southern redoubt. The subsequent expulsion of barbarians from north China was then accompanied by the migration of long established southern lineages to northern power centers. So one might assume that these southern lineages were originally derived from the north, but after a while it might get difficult to sort out who was who (north & south). Of course the historical record might simply reflect the shifts in elites who remained in power on top of a relatively static ethnic situation in the north, while the south went through a general long term trend of sinicization which accelerated during periods of barbarian rule in the north when the gentry supplemented the local Han base. Finally, do note that south China is geographically far more fragmented than north China, and we know in other contexts this has a long term effect on mating patterns and dynamics. I am also interested in why the Mandarin dialects managed to take over southwest China (see map) but not southeast China.
Genetically this sex-based distinction seems to be confirmed by repeated studies. But, that being said, remember that in the early 1990s Cavalli-Sforza reported in The History and Geography of Human Genes that north Chinese were genetically closer to Japanese and Koreans and south Chinese to southeast Asians when looking at traditional autosomal loci. It is historically attested that groups like the Thai and Vietnamese have origins within what is now south China (the Thai still have ethnic relations within China proper). Ethnographic analysis also suggests the Cantonese, for example, preserve customs which are clearly descended from local traditions which pre-date a Han identity for the people in the region. It would be nice to have a STRUCTURE based analysis address these questions.....
Note: Most of the Overseas Chinese are from the south. Especially Fujian. The older Chinese communities in the United States tend to be Cantonese.
I'm sure you have been waiting with bated breath (not!) for the final part of these notes.
Part 1 is here and Part 2 here.
This final part deals with the basics of multivariate correlation and regression, that is, correlation and regression involving more than two variables. Multivariate correlation and regression are among the most commonly used tools in psychology and the social sciences, so it seems desirable to acquire some basic knowledge of the methods and their pitfalls.
I emphasise again that these notes are not aimed at expert statisticians, but at anyone who wants or needs a treatment of the subject using only elementary mathematics. I do not claim that there is anything original in the notes, but I believe that Part 3 brings together some material not easily found in any single source.
Most of Part 3 was written several months ago. I have tinkered with it at intervals since then, in an effort to make it more readable and intelligible, but without much success. I am posting it now (below the fold) as I doubt that I can improve on it, and I might as well get it out of the way.
I have only used non-mathematical typography, so on the remote chance that anyone wants to study the notes at leisure, they should be able to paste and save them in any WP program.
Multivariate Correlation and Regression
These notes avoid using special mathematical symbols, because Greek letters, subscripts, etc, may not be readable in some browsers, or even if they are readable may not be printable. As in previous parts, S stands for 'sum of', s stands for 'standard deviation of', ^2 stands for 'squared', and # stands for 'square root of'. All variables represent deviation values unless otherwise stated. The notation used for correlation and regression will be the same as in Part 2, with the following modifications.
In dealing with multivariate correlation and regression, the relevant variables are conventionally indicated by subscripts. For example, the partial correlation coefficient between x and y, taking account of z, would usually be indicated by the symbol 'r' for correlation followed by subscripts 'xy.z'. Since I cannot use subscripts, I will adopt the following conventions:
- the notation for bivariate (2-variable) correlation and regression coefficients will be as in Part 2.
- multivariate correlation and regression coefficients will be indicated by upper-case letters: R for correlation and B for regression, to contrast with lower-case r and b for the bivariate coefficients. It should therefore be possible to tell at a glance whether the coefficients are bivariate or multivariate.
- the letters immediately following R or B are to be interpreted as subscripts.
- to avoid ambiguity, the letters used to stand for variables other than subscripts will be shown in upper case. For example, Rxy.z.Y indicates the variable Y, multiplied by the partial correlation coefficient between X and Y, given Z.
- where a correlation or regression coefficient is squared, the 'squared' symbol ^2 will follow any 'subscripts'. E.g. Rxy.z^2 will indicate the square of the partial correlation coefficient Rxy.z.
I am aware that this notation is not easy to read, and sometimes produces ugly strings of upper-case letters, but after toying with various alternatives I find it the least-worst approach.
For (relative) simplicity I will deal mainly with the case of three variables, as this illustrates all the essential points. Suppose there are three sets of items, represented by the variables X, Y and Z, each set containing N items with numerical values. We assume that the items in each set are in a one-to-one correspondence (mapping) of some kind with the items of each other set. As with the bivariate case, there could be different ways of mapping the items onto each other. Correlation and regression are always relative to some intended system of mappings. The multivariate case does however raise a new potential complication. We can trace a mapping between any two of the variables either directly or indirectly, e.g. directly from X to Z, or indirectly from X through Y to Z. But are the resulting mappings from X to Z necessarily the same? Are the same X items always matched with the same Z items, whether directly or indirectly? It is not logically necessary that the mappings should be unique, and authors are sometimes vague on this point, but the intention is evidently that they are unique. If each item were indexed with a subscript from 1 to N, the intended correlations and regressions would always be those between items with matching subscripts. Subscripts of this kind should be taken as implied.
Aims of Multivariate Analysis
There are two main reasons for going beyond the bivariate (2-variable) case to consider systems of more than two variables. First, we may hope to get a better estimate or prediction of the value of a variable if we make use of information about more than one other related variable. Second, the relationships between variables may not be fully understood if we only consider them on a pair-by-pair basis. For example, it may be that the correlation between two variables is affected by their correlations with a third variable.
These considerations lead to two seemingly distinct problems. First, what is the best way of predicting the value of a variable given the values of two or more other variables? Second, how can we separate and quantify the role of each variable independently of the others? These two problems turn out to be closely related.
The problem of prediction
The cases of interest for analysis arise where there are at least two non-zero correlations. Suppose for example that we have variables X, Y, and Z, and non-zero correlations rxz and ryz. For simplicity suppose also that all three variables have the same standard deviations, so that the correlation coefficients are equal to the regression coefficients. With these assumptions we can predict that a deviation of A in X will on average be associated with a deviation of rxz.A in Z. Similarly a deviation of A in Y will be associated with a deviation of ryz.A in Z. But what prediction should we make if we know the deviations in both X and Y?
In two special cases we can make a plausible guess. If X and Y are themselves perfectly correlated (rxy = 1), then a deviation of A in X always corresponds to a deviation of A in Y, and vice versa (still assuming equal standard deviations). It follows from this (and the assumed uniqueness of mappings) that rxz = ryz, since corresponding pairs of XZ and YZ products will have the same value, and the covariances, correlations, and regressions derived from these products will be equal. The X and Y values will therefore each give the same prediction of Z. So far as their correlations with Z are concerned, we might regard X and Y as being merely different names for the same entity. A knowledge of the value of Y gives us no new information if we already know the value of X, and vice versa. We would therefore not expect the correlation between Y and Z to help us predict the value of Z more accurately than we can already predict from the correlation between X and Z, and it is plausible that the 'best estimate' of Z, given X and Y, is the same as the estimate based on X or Y alone.
At the other extreme, suppose the correlation between X and Y is zero. In this case a knowledge of the value of X gives us no information at all about the value of Y. It is therefore reasonable to expect that the predictive value of Y will be entirely additional to that of X, and it is plausible that the 'best estimate' of Z, given X and Y, will be the sum of the estimates based on X and Y, giving Z = rxz.X + ryz.Y.
But these are not rigorous proofs, and they give us little help in dealing with the majority of cases, where the correlation between X and Y is neither zero nor perfect. Here, we can only vaguely expect that the predicted value of Z, given X and Y, will be somewhere between the values predicted for the two extreme cases. For anything more precise, we need a more sophisticated approach. This is given by the theory of multiple regression.
The theory of linear regression and correlation based on the method of least squares (see Parts 1 and 2) can be extended to more than two variables. In the 2-variable (bivariate) case we wished to estimate the value of one variable, X, given the value of the other variable, Y. In the extension of the theory we wish to estimate the value of X given the value of two or more other variables, Y, Z, etc. The variable whose value we wish to estimate is usually called the dependent variable, and the others the independent variables, but these terms do not imply a direct causal relationship between them, or that if there is a causal relationship the causation runs from the independent to the dependent variable. 'Dependent' is merely a conventional term to designate the variable whose value we want to estimate.
The key assumption of the method is that the estimate can be made in the form X = a + bY + cZ...., where there are no terms of the form YZ, involving more than one of the independent variables, and no squares or higher powers. This is a linear multiple regression equation, and the terms b, c, etc are called partial regression coefficients. Taking X as the dependent variable (the one whose value we wish to estimate), the partial regression coefficients will be designated Bxy.z and Bxz.y, which may be read as 'the regression coefficient of X on Y, given Z' and 'the regression coefficient of X on Z, given Y'. With this notation, the multiple regression equation for the estimate is X = a + Bxy.z.Y + Bxz.y.Z. For the time being we cannot assume that the partial regression coefficients have any meaning in isolation from this equation.
To obtain the 'best' estimate, in accordance with the method of least squares (see Part 1), we need to find the values of the constant a, and the coefficients Bxy.z, and Bxz.y, for which S(X - a - Bxy.z.Y - Bxz.y.Z)^2 is at a minimum. Using the same methods as in the bivariate case (see Part 2), S(X - a - Bxy.z.Y - Bxz.y.Z)^2 can be treated as a function of each coefficient in turn, and we can derive an equation for each coefficient, in which the first derivative (strictly, partial derivative) of the function is equated to zero to find its minimum value. The only difference from the bivariate case is that the unknown value of the other coefficient appears in the resulting equation for each coefficient. We therefore do not immediately get a single solution, but a set of simultaneous equations with one equation for each coefficient. Since there are as many equations as unknowns, these equations can be solved by standard methods (determinants, etc).
As in the bivariate case, it can be shown that if all the variables are expressed as deviation values, the constant, a, must be zero. Assuming this proved, to find the coefficients Bxy.z, and Bxz.y themselves we therefore need to find the values for which S(X - Bxy.z.Y - Bxz.y.Z)^2 is at a minimum. Treating this as a function of Bxy.z and Bxz.y in turn, and equating the first derivatives to zero, we get the two equations:
SXY - Bxy.z.SY^2 - Bxz.y.SYZ = 0
SXZ - Bxz.y.SZ^2 - Bxy.z.SYZ = 0
These simultaneous equations can be solved by high school algebra to give:
Bxy.z = (SXY.SZ^2 - SXZ.SYZ)/(SY^2.SZ^2 - SYZ.SYZ)
Bxz.y = (SXZ.SY^2 - SXY.SYZ)/(SY^2.SZ^2 - SYZ.SYZ)
These coefficients can be given a more attractive form if the numerator and denominator are both divided by an appropriate factor. For example, if the first coefficient is divided through by SY^2.SZ^2, we get an expression equivalent to:
Bxy.z = (bxy - bxz.bzy)/(1 - byz.bzy)
where bxy, etc, are bivariate regression coefficients. If all the bivariate coefficients are already known, the partial regression coefficients may therefore easily be calculated. It may be noted that byz.bzy = ryz^2, so the denominator can also be expressed in the form (1 - ryz^2).
Using the same methods, Bxz.y comes out as (bxz - bxy.byz)/(1 - bzy.byz).
We have treated X as the dependent variable, but any of the other variables could also be treated as dependent, and appropriate partial regression coefficients calculated. For example, Byx.z comes out as (byx - byz.bzx)/(1 - bxz.bzx).
With the partial regression coefficients expressed in terms of bivariate regressions, the multiple regression equation for the value of X, given Y and Z, comes out as:
X = Y(bxy - bxz.bzy)/(1 - byz.bzy) + Z(bxz - bxy.byz)/(1 - bzy.byz).
This is the appropriate equation when X, Y and Z represent deviation values, as assumed here. If we want to use raw values, the partial regression coefficients will be the same, but a non-zero constant must usually be added.
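As a rough numerical check of the formulae above, the following Python sketch computes the partial regression coefficients in two ways: from the bivariate regression coefficients, and directly from the sums of products. The data values and the helper names (deviations, S, b) are invented purely for illustration and are not taken from any real study.

```python
# Hypothetical raw values, chosen only for illustration.
raw_x = [2.0, 4.0, 5.0, 4.0, 7.0, 8.0]
raw_y = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
raw_z = [3.0, 1.0, 4.0, 2.0, 2.0, 5.0]

def deviations(values):
    """Convert raw values to deviations from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def S(u, v):
    """Sum of products of two lists of deviation values (SXY, SY^2, etc.)."""
    return sum(p * q for p, q in zip(u, v))

def b(u, v):
    """Bivariate regression coefficient of u on v, e.g. bxy = SXY/SY^2."""
    return S(u, v) / S(v, v)

X, Y, Z = deviations(raw_x), deviations(raw_y), deviations(raw_z)

# Partial regression coefficients from the bivariate coefficients.
Bxy_z = (b(X, Y) - b(X, Z) * b(Z, Y)) / (1 - b(Y, Z) * b(Z, Y))
Bxz_y = (b(X, Z) - b(X, Y) * b(Y, Z)) / (1 - b(Z, Y) * b(Y, Z))

# The same coefficients directly from the sums of products.
denominator = S(Y, Y) * S(Z, Z) - S(Y, Z) * S(Y, Z)
Bxy_z_direct = (S(X, Y) * S(Z, Z) - S(X, Z) * S(Y, Z)) / denominator
Bxz_y_direct = (S(X, Z) * S(Y, Y) - S(X, Y) * S(Y, Z)) / denominator

# Left-hand sides of the two simultaneous equations; both should be ~0.
equation_1 = S(X, Y) - Bxy_z * S(Y, Y) - Bxz_y * S(Y, Z)
equation_2 = S(X, Z) - Bxz_y * S(Z, Z) - Bxy_z * S(Y, Z)

print(Bxy_z, Bxy_z_direct)
print(Bxz_y, Bxz_y_direct)
print(equation_1, equation_2)
```

The two routes should agree up to floating-point rounding, and the coefficients should satisfy the two simultaneous equations from which they were derived.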
It can be verified by inspection of these formulae that if the independent variables Y and Z are uncorrelated (ryz = byz = bzy = 0), then the partial regression coefficients Bxy.z and Bxz.y reduce to the bivariate regression coefficients bxy and bxz. In this case the multiple regression equation for X given Y and Z therefore reduces to X = bxy.Y + bxz.Z, in which the estimated value of X is the sum of the estimated values given Y and Z separately. This confirms the informal argument given earlier for this special case. (For the other special case, where ryz = 1, the analysis breaks down, as the denominators in the coefficients are zero.)
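The special case of uncorrelated independent variables can be illustrated with a very small made-up example. In the sketch below the Y and Z values are constructed so that SYZ is exactly zero; the X values are arbitrary, and all names are illustrative assumptions.

```python
# Y and Z are already deviation values (each has mean zero) and are
# constructed so that their sum of products is exactly zero.
Y = [-1.0, -1.0, 1.0, 1.0]
Z = [-1.0, 1.0, -1.0, 1.0]

# Arbitrary hypothetical raw values for X, converted to deviations.
raw_x = [1.0, 2.0, 4.0, 9.0]
mean_x = sum(raw_x) / len(raw_x)
X = [v - mean_x for v in raw_x]

def S(u, v):
    """Sum of products of two lists of deviation values."""
    return sum(p * q for p, q in zip(u, v))

def b(u, v):
    """Bivariate regression coefficient of u on v."""
    return S(u, v) / S(v, v)

# General formulae for the partial regression coefficients.
Bxy_z = (b(X, Y) - b(X, Z) * b(Z, Y)) / (1 - b(Y, Z) * b(Z, Y))
Bxz_y = (b(X, Z) - b(X, Y) * b(Y, Z)) / (1 - b(Z, Y) * b(Y, Z))

# With SYZ = 0 these should equal the bivariate coefficients bxy and bxz,
# so the estimate of X is simply bxy.Y + bxz.Z.
estimates = [b(X, Y) * y + b(X, Z) * z for y, z in zip(Y, Z)]

print(S(Y, Z))
print(Bxy_z, b(X, Y))
print(Bxz_y, b(X, Z))
print(estimates)
```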
If there are more than three variables in the analysis, the methods described above can in principle be easily extended. If there are n variables, of which one is chosen as 'dependent', we can obtain (n - 1) equations (known as the 'normal equations') to determine the (n - 1) partial regression coefficients for the independent variables. These can be expressed in terms of regression or correlation coefficients involving (n - 2) variables, which in turn can be expressed in terms of coefficients of (n - 3) variables, and so on, until coefficients for only 2 variables are reached. The complexity of the resulting formulae rapidly increases with the number of variables, and in practice, before the computer age, multiple regression and correlation were seldom taken beyond four variables. With computers, it is easy to perform multiple regression analysis for any number of variables, but the reliability of the results falls rapidly (due to sampling error and other problems) as the number of variables increases.
On examining the 'normal equations' which determine the partial regression coefficients it may also be noted that they are equivalent to equations of the form S(ye) = 0, where y is one of the independent variables and 'e' is the 'error of estimate' in the value of the dependent variable when the partial regression coefficients are all given their optimum values. But S(ye) = 0 is only true if the correlation between the independent variable and the errors of estimate is zero. This is as it should be, as the errors left over after the best estimate has been made should be random with respect to the variables used in the estimate.
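The property S(ye) = 0 can be checked numerically. The sketch below uses invented data and illustrative helper names; it computes the errors of estimate for the three-variable case and confirms that their sums of products with each independent variable vanish, apart from rounding error.

```python
# Hypothetical raw values, chosen only for illustration.
raw_x = [2.0, 4.0, 5.0, 4.0, 7.0, 8.0]
raw_y = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
raw_z = [3.0, 1.0, 4.0, 2.0, 2.0, 5.0]

def deviations(values):
    """Convert raw values to deviations from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def S(u, v):
    """Sum of products of two lists of deviation values."""
    return sum(p * q for p, q in zip(u, v))

X, Y, Z = deviations(raw_x), deviations(raw_y), deviations(raw_z)

# Partial regression coefficients from the sums of products.
denominator = S(Y, Y) * S(Z, Z) - S(Y, Z) * S(Y, Z)
Bxy_z = (S(X, Y) * S(Z, Z) - S(X, Z) * S(Y, Z)) / denominator
Bxz_y = (S(X, Z) * S(Y, Y) - S(X, Y) * S(Y, Z)) / denominator

# Errors of estimate: observed X minus the value given by the
# multiple regression equation.
errors = [x - Bxy_z * y - Bxz_y * z for x, y, z in zip(X, Y, Z)]

# Each independent variable should be uncorrelated with the errors.
sum_ye = S(Y, errors)
sum_ze = S(Z, errors)

print(sum_ye, sum_ze)
```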
The multiple regression technique essentially solves the 'problem of prediction', so far as it can be solved by a linear equation.
The other main problem is how to disentangle the influence and relative importance of each 'independent' variable. In this context 'influence' does not necessarily imply causal influence, but often the underlying aim of the analysis is to find out which variables have causal influence on each other. For example, the IQ of parents, their social class, their level of education, and so on, are all likely to be positively correlated, but which of these variables, if any, have an independent influence on the IQ of their children?
Intuitively a plausible approach to this problem is to see what happens if all the independent variables except one are held constant. If there is still a significant correlation between the dependent variable and the remaining independent variable, this cannot be accounted for by any variation in the others (which has been excluded). It is then reasonable to infer that the remaining variable has some independent influence of its own, or at least is correlated with another variable which has such an influence. (The converse inference - that if there is no correlation, then there is no independent influence - is more hazardous, as a true independent influence may have been masked or suppressed by other variables not included in the analysis.)
'Holding variables constant' can be done in various ways. Ideally, one would conduct a controlled experiment, with randomisation, control groups, and all the other refinements of experimental technique. But this is not always possible. Where we are forced to rely on existing statistical data, a substitute for experiment is to select for analysis those data for which all the variables except two happen to have the same values. This can be done literally by grouping the data into appropriate sets and calculating correlations or regressions between the two variables of interest when all the other variables have fixed values. But this is laborious, and unless the total sample is very large the resulting correlations and regressions, for each such set of values, are likely to vary from one set to another, even if the underlying relationships are constant. It is usually more convenient to 'hold variables constant' by purely statistical methods.
Suppose we have three variables, X, Y, and Z, and we wish to find the correlation or regressions between X and Y while 'holding Z constant'. For any given value of Z, average values of X and Y can be predicted from the regressions of X and Y on Z. Unless the correlations are perfect, there will be some difference between the observed and the predicted values of X and Y. These differences may be described as the residuals of X and Y given Z. The residuals of X and Y for a given value of Z may be treated as deviations from the mean values of X and Y corresponding to that value of Z. In reality, they are only deviations from the predicted means, and unless the regressions of X and Y on Z are perfectly linear (which they seldom are), some error will be introduced. But subject to this qualification, the residuals may be treated as deviation values for the purpose of calculating correlations and regressions between them. The deviation values of X, given Z, can therefore be expressed as X - bxz.Z, where X is the observed value and bxz.Z is the predicted value of X from its regression on Z. Similarly the deviation values of Y, given Z, can be expressed as Y - byz.Z. The standard deviation of the residuals of X, given Z, will be #(1- rxz^2)sx, that is, the square root of the variance in X remaining after the correlation with Z has 'explained' rxz^2 of the original variance. Similarly the standard deviation of the residuals of Y, given Z, will be #(1- ryz^2)sy.
Using the standard formula for the correlation between two variables, and treating the residuals as the relevant variables, we get a correlation expressed by S(X - bxz.Z)(Y - byz.Z)/N.sx(#(1- rxz^2))sy(#(1- ryz^2)). It can be shown that this is equivalent to (rxy - rxz.ryz)/#(1 - rxz^2)#(1 - ryz^2). [Note 1] We may call this Rxy.z, or the partial correlation between X and Y given Z.
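As a rough numerical check of this equivalence, here is a minimal Python sketch. The data values and the helper names (deviations, sum_products, correlation, slope) are made up for illustration and are not from the text. It computes the correlation between the residuals directly and compares it with the formula in terms of rxy, rxz and ryz.

```python
import math

def deviations(values):
    """Convert raw scores to deviations from their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def sum_products(a, b):
    """S(ab): sum of products of paired deviation values."""
    return sum(p * q for p, q in zip(a, b))

def correlation(a, b):
    """Bivariate correlation of two lists of deviation values."""
    return sum_products(a, b) / math.sqrt(sum_products(a, a) * sum_products(b, b))

def slope(a, b):
    """Regression coefficient of a on b, for deviation values."""
    return sum_products(a, b) / sum_products(b, b)

# Made-up illustrative data (an assumption, not from the text).
X = deviations([2, 4, 5, 7, 9, 12])
Y = deviations([1, 3, 2, 6, 5, 9])
Z = deviations([3, 1, 4, 5, 8, 7])

# Residuals of X and Y after removing their regressions on Z.
res_x = [x - slope(X, Z) * z for x, z in zip(X, Z)]
res_y = [y - slope(Y, Z) * z for y, z in zip(Y, Z)]
r_from_residuals = correlation(res_x, res_y)

# The same quantity from the three bivariate correlations.
rxy, rxz, ryz = correlation(X, Y), correlation(X, Z), correlation(Y, Z)
r_from_formula = (rxy - rxz * ryz) / (math.sqrt(1 - rxz**2) * math.sqrt(1 - ryz**2))

print(r_from_residuals, r_from_formula)
```

The two printed values should agree to within rounding error, whatever (non-degenerate) data are substituted.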
By similar methods it can be shown that the coefficient of the regression of the residuals of X on the residuals of Y is equal to (bxy - bxz.bzy)/(1 - byz.bzy). [Note 2] It will be seen that this is the same as the partial regression coefficient of X on Y, given Z, which was derived by entirely different methods in solving the 'problem of prediction'. The two problems therefore turn out to be closely connected.
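The same kind of check can be sketched for the regression coefficient. The Python below uses made-up data and hypothetical helper names. It compares three routes to Bxy.z: the regression of residuals on residuals, the formula in bivariate regression coefficients, and the least-squares solution of the two normal equations for X on Y and Z.

```python
def deviations(values):
    """Convert raw scores to deviations from their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def sum_products(a, b):
    """S(ab): sum of products of paired deviation values."""
    return sum(p * q for p, q in zip(a, b))

def slope(a, b):
    """Regression coefficient of a on b, for deviation values."""
    return sum_products(a, b) / sum_products(b, b)

# Made-up illustrative data (an assumption, not from the text).
X = deviations([2, 4, 5, 7, 9, 12])
Y = deviations([1, 3, 2, 6, 5, 9])
Z = deviations([3, 1, 4, 5, 8, 7])

bxy, bxz = slope(X, Y), slope(X, Z)
byz, bzy = slope(Y, Z), slope(Z, Y)

# Route 1: regress the residuals of X (given Z) on the residuals of Y (given Z).
res_x = [x - bxz * z for x, z in zip(X, Z)]
res_y = [y - byz * z for y, z in zip(Y, Z)]
b_residuals = slope(res_x, res_y)

# Route 2: the formula in bivariate regression coefficients.
b_formula = (bxy - bxz * bzy) / (1 - byz * bzy)

# Route 3: least squares, solving the normal equations
#   SXY = B1.SYY + B2.SYZ  and  SXZ = B1.SYZ + B2.SZZ  for B1.
sxy, sxz = sum_products(X, Y), sum_products(X, Z)
syy, szz, syz = sum_products(Y, Y), sum_products(Z, Z), sum_products(Y, Z)
b_least_squares = (sxy * szz - sxz * syz) / (syy * szz - syz**2)

print(b_residuals, b_formula, b_least_squares)
```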
It would be possible, though cumbersome, to extend the method of residuals to a system of more than three variables. For example, with the four variables W, X, Y, and Z, we could use the multiple regression equations for three variables to estimate separately the values of W and X, given Y and Z, then subtract the estimated values of W and X from their observed values to get the residuals of W and X, given Y and Z, then find the regressions and correlation between the residuals. The coefficients derived in this way should be the same as those derived by the method of least squares for a system of four variables.
The following relationships can also be demonstrated. As this requires some tedious algebra, I will not give full proofs. These relationships are presumably well known to statisticians, but I have not seen them mentioned in most textbooks, so it may be useful to bring them together.
(a) Rxy.z = Bxy.z.#(1 - ryz^2)sy/#(1 - rxz^2)sx. Just as ordinary bivariate correlation coefficients can be converted into regression coefficients, and vice versa, by multiplying them and dividing them by the appropriate standard deviations, so can partial correlation and partial regression coefficients, using standard deviations of residuals.
(b) If we take the partial regression coefficient Bxy.z, expressed in terms of the residuals of X and Y, given Z, and divide these residuals by their own standard deviations, the partial regression coefficient is converted into the partial correlation coefficient Rxy.z.
(c) After allowing for the regression of the residuals of X on the residuals of Y, the variance of the residuals of X is reduced to (1 - Rxy.z^2) of their original variance.
(d) Rxy.z is the mean proportional between the two partial regression coefficients Bxy.z and Byx.z.
These results, taken together, show that the partial regression and correlation between X and Y, given Z, have all the essential properties of bivariate regression and correlation, with the residuals of X and Y, given Z, treated as the correlated variables. This is often described as 'controlling for Z', or 'holding Z constant', or 'partialling out Z'. With certain qualifications, it may be said that if we take any particular value of Z, and consider only the values of X and Y corresponding to that value of Z, the regressions and correlation between these values of X and Y will be as expressed by the partial regression and correlation coefficients. For example, if X represents children's ability at reading, Y represents their ability at arithmetic, and Z represents their IQ, then the partial correlation between X and Y, given Z, should equal the bivariate correlation between ability at reading and arithmetic for children with the same IQ. Real examples are seldom as neat as this account suggests, and the regression and correlation between X and Y for given Z, as calculated directly from the data, are likely to vary somewhat for different values of Z. If it is of practical importance to be sure of the true correlation between X and Y for some particular value of Z, it is desirable to calculate this directly from the data.
Since the partial correlation coefficient is the same as the bivariate correlation between residuals, its value must lie between 1 and -1 (see Part 2). On trying values for the component parts of the partial correlation coefficient in the usual form (rxy - rxz.ryz)/#(1 - rxz^2)(1 - ryz^2), it may be found that some combinations of the components rxy, rxz, and ryz would lead to values outside the range 1 to -1. This shows that some combinations would be inconsistent, and cannot actually arise. This problem, and its implications, is discussed in most of the textbooks. A point which I have not seen discussed is that if either of the correlations rxz and ryz is perfect (1 or -1), then the partial correlation coefficient is indeterminate, since the denominator is zero. If we regard the partial correlation as a bivariate correlation between the residuals of X and Y for a fixed value of Z, the correlation will be indeterminate if either of the residuals has no variance, which is the case when rxz or ryz is perfect.
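To make the three cases concrete, the following Python sketch sorts a triple of correlations into consistent, inconsistent or indeterminate. The function name classify and the trial values are assumptions for illustration only.

```python
import math

def classify(rxy, rxz, ryz):
    """Classify a triple of bivariate correlations by the partial correlation Rxy.z."""
    denominator = math.sqrt((1 - rxz**2) * (1 - ryz**2))
    if denominator == 0:
        # rxz or ryz is perfect: one set of residuals has no variance.
        return "indeterminate"
    partial = (rxy - rxz * ryz) / denominator
    if -1 <= partial <= 1:
        return "consistent"
    # A value outside the range 1 to -1 means the triple cannot actually arise.
    return "inconsistent"

print(classify(0.4, 0.0, 0.7))   # an ordinary case
print(classify(0.9, 0.9, -0.9))  # formula gives a value far above 1
print(classify(0.5, 1.0, 0.5))   # rxz is perfect, so the denominator is zero
```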
The dual role of partial regression coefficients
It should be noted that the partial regression coefficients can be used in two ways. When multiplied by the full deviation of the relevant independent variable, they contribute to the best estimate of the value of the dependent variable as given by the multiple regression equation. When multiplied by the residual deviation of the relevant independent variable, controlling for the other independent variable, they give the best estimate of the residual deviation of the dependent variable.
It may seem surprising that the same coefficient can serve these two different purposes. We can however regard the partial regression coefficient as measuring the expected change in X associated with a unit change in Y, when Z is held constant. It is not unreasonable that the proportionate change (as expressed by the coefficient) might be the same even though the actual size of the change may vary according to whether full or residual deviations are being used.
In any event, it can be proved that the two uses of the partial regression coefficients are consistent. Suppose we compare the two different estimates resulting from these two uses. To estimate the full deviation of X we have:
(1).......(estimated) X = Bxy.z.Y + Bxz.y.Z
To estimate the residual deviation of X, based on the value of Y after controlling for Z, we have (estimated) X - bxz.Z = Bxy.z.(Y - byz.Z). But the only estimated quantity in this equation is X (i.e. the full deviation of X), so we can rearrange the equation to give an explicit estimate of X based on the role of the partial regression coefficient in estimating the residual deviation of X. This gives us:
(2)......(estimated) X = Bxy.z(Y - byz.Z) + bxz.Z
Equations (1) and (2) therefore give us two estimates for the full deviation of X based on the two different applications of the partial regression coefficient. These estimates appear at first sight to be different. Notably, equation (2) shows no trace of the partial regression coefficient Bxz.y. I was therefore gratified to find that the right hand sides of the two equations are in fact equivalent. [Note 3] So there is no doubt that the two different applications of the partial regression coefficients are consistent, even if it is not intuitively obvious why this is so.
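For readers who prefer numbers to algebra, here is a small Python sketch that evaluates equations (1) and (2) for every observation in a made-up data set and reports the largest gap between the two estimates. The data and helper names are assumptions, not from the text.

```python
def deviations(values):
    """Convert raw scores to deviations from their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def slope(a, b):
    """Regression coefficient of a on b, for deviation values."""
    return sum(p * q for p, q in zip(a, b)) / sum(q * q for q in b)

# Made-up illustrative data (an assumption, not from the text).
X = deviations([2, 4, 5, 7, 9, 12])
Y = deviations([1, 3, 2, 6, 5, 9])
Z = deviations([3, 1, 4, 5, 8, 7])

bxy, bxz = slope(X, Y), slope(X, Z)
byz, bzy = slope(Y, Z), slope(Z, Y)

# Partial regression coefficients in terms of bivariate ones.
Bxy_z = (bxy - bxz * bzy) / (1 - byz * bzy)
Bxz_y = (bxz - bxy * byz) / (1 - bzy * byz)

# Equation (1): the multiple regression estimate of X.
estimate_1 = [Bxy_z * y + Bxz_y * z for y, z in zip(Y, Z)]
# Equation (2): the estimate built from the residual deviation of Y.
estimate_2 = [Bxy_z * (y - byz * z) + bxz * z for y, z in zip(Y, Z)]

max_gap = max(abs(a - b) for a, b in zip(estimate_1, estimate_2))
print(max_gap)
```

The gap should be zero apart from floating-point rounding, even though Bxz_y appears only in the first estimate.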
For reasons of computational convenience, some authors introduce coefficients known as Beta coefficients or Beta weights. These are not in general the same as either the partial regression or partial correlation coefficients. They can be derived from the partial regression coefficients by dividing each variable by its own (full, not residual) standard deviation. They may therefore be considered in a sense as standardised partial regression coefficients, and it is arguable that the Beta weights provide the best way of evaluating the relative importance of the different independent variables in determining the value of the dependent variable (see further below). The Beta weight for the regression of X on Y given Z can be written as Beta_xy.z. Like the partial correlation coefficient, Beta weights can be expressed in terms of bivariate correlation coefficients. For example, Beta_xy.z is (rxy - rxz.ryz)/(1 - ryz^2). [Note 4]
Equivalences between Beta weights, partial correlation coefficients, and partial regression coefficients, can be set out as follows:
Bxy.z = (bxy - bxz.bzy)/(1 - byz.bzy) =
[rxy.sx/sy - (rxz.sx/sz)(ryz.sz/sy)]/[1 - (ryz.sy/sz)(ryz.sz/sy)] =
[(rxy - rxz.ryz)/(1 - ryz^2)].sx/sy =
Rxy.z.[sx.#(1 - rxz^2)]/[sy.#(1 - ryz^2)]
Beta_xy.z = (rxy - rxz.ryz)/(1 - ryz^2) =
Rxy.z.#(1 - rxz^2)/#(1 - ryz^2)
It may be noted that if all the standard deviations of X, Y and Z are equal (as would be the case if the standard deviations themselves are the units of measurement, and therefore by definition equal to 1), then the partial regression coefficients are equal to the beta weights.
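The relationships between the three kinds of coefficient can be checked numerically. The Python sketch below, on made-up data with hypothetical helper names, obtains Beta_xy.z in three ways: by rescaling Bxy.z with the full standard deviations, from the bivariate correlations, and by rescaling Rxy.z with the residual factors.

```python
import math

def deviations(values):
    """Convert raw scores to deviations from their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def sum_products(a, b):
    """S(ab): sum of products of paired deviation values."""
    return sum(p * q for p, q in zip(a, b))

# Made-up illustrative data (an assumption, not from the text).
X = deviations([2, 4, 5, 7, 9, 12])
Y = deviations([1, 3, 2, 6, 5, 9])
Z = deviations([3, 1, 4, 5, 8, 7])
n = len(X)

sx = math.sqrt(sum_products(X, X) / n)
sy = math.sqrt(sum_products(Y, Y) / n)
sz = math.sqrt(sum_products(Z, Z) / n)
rxy = sum_products(X, Y) / (n * sx * sy)
rxz = sum_products(X, Z) / (n * sx * sz)
ryz = sum_products(Y, Z) / (n * sy * sz)

# Bivariate regression coefficients, then the partial regression coefficient.
bxy, bxz = rxy * sx / sy, rxz * sx / sz
byz, bzy = ryz * sy / sz, ryz * sz / sy
Bxy_z = (bxy - bxz * bzy) / (1 - byz * bzy)

# Partial correlation coefficient.
Rxy_z = (rxy - rxz * ryz) / (math.sqrt(1 - rxz**2) * math.sqrt(1 - ryz**2))

beta_from_B = Bxy_z * sy / sx
beta_from_r = (rxy - rxz * ryz) / (1 - ryz**2)
beta_from_R = Rxy_z * math.sqrt(1 - rxz**2) / math.sqrt(1 - ryz**2)

print(beta_from_B, beta_from_r, beta_from_R)
```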
We have defined partial regression and correlation, and multiple regression. There is also a multiple correlation. This is a kind of correlation between the dependent variable and the totality of independent variables. In the case of two variables, we can ask how much of the variance in one variable is 'explained' by its regression on the other, and similarly in the case of more than two we can ask how much of the variance of the dependent variable is explained by its multiple regression on all the others. For two variables, the correlation explains r^2 of the original variance, so for multiple correlation we may stipulate by definition that the multiple regression explains R^2 of the variance in the dependent variable. (I will assume that the dependent variable is X, so that R^2 is the square of the multiple correlation between X and the independent variables Y and Z.) R can then be interpreted as the multiple correlation coefficient. In practice, R itself is of little interest, so results are usually expressed in terms of R^2. A bewildering variety of formulae are used to express R^2 in terms of bivariate regression or correlation coefficients. One is R^2 = (rxy^2 + rxz^2 - 2rxy.rxz.ryz)/(1 - ryz^2). Another, which superficially looks quite different but can be proved equivalent, is Beta_xy.z.rxy + Beta_xz.y.rxz, where the Betas are as described above.
As in the case of two variables, the variance of the dependent variable can be broken down into two additive components: the variance of the estimates given by the regression and the variance of residuals (the differences between the estimated and observed values). In the case of three variables (one dependent and two independent) the estimate given by the regression is Bxy.z.Y + Bxz.y.Z. Since the mean of the estimates is 0 (assuming that deviation values are used throughout), the variance of the estimates is R^2.Vx = [S(Bxy.z.Y + Bxz.y.Z)^2]/N. This can be expanded as:
R^2 = (Bxy.z^2.SY^2 + Bxz.y^2.SZ^2 + 2Bxy.z.Bxz.y.SYZ)/NVx
= (Bxy.z^2.NVy + Bxz.y^2.NVz + 2Bxy.z.Bxz.y.Nsy.sz.ryz)/NVx
= Bxy.z^2.Vy/Vx + Bxz.y^2.Vz/Vx + 2Bxy.z.Bxz.y.sy.sz.ryz/Vx
We therefore have yet another expression for the squared multiple correlation coefficient, which can be proved (after some gruesome algebra) to be equivalent to the earlier formulae.
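Rather than reproduce the algebra, a numerical comparison may be convincing enough. This Python sketch uses made-up data and hypothetical names. It computes R^2 four ways: directly as the variance of the estimates divided by Vx, from the bivariate correlations, from the Beta weights, and from the partial regression coefficients and variances.

```python
import math

def deviations(values):
    """Convert raw scores to deviations from their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def sum_products(a, b):
    """S(ab): sum of products of paired deviation values."""
    return sum(p * q for p, q in zip(a, b))

# Made-up illustrative data (an assumption, not from the text).
X = deviations([2, 4, 5, 7, 9, 12])
Y = deviations([1, 3, 2, 6, 5, 9])
Z = deviations([3, 1, 4, 5, 8, 7])
n = len(X)

Vx, Vy, Vz = sum_products(X, X) / n, sum_products(Y, Y) / n, sum_products(Z, Z) / n
sx, sy, sz = math.sqrt(Vx), math.sqrt(Vy), math.sqrt(Vz)
rxy = sum_products(X, Y) / (n * sx * sy)
rxz = sum_products(X, Z) / (n * sx * sz)
ryz = sum_products(Y, Z) / (n * sy * sz)

# Beta weights, then partial regression coefficients by rescaling.
beta_y = (rxy - rxz * ryz) / (1 - ryz**2)
beta_z = (rxz - rxy * ryz) / (1 - ryz**2)
Bxy_z, Bxz_y = beta_y * sx / sy, beta_z * sx / sz

# Direct route: variance of the regression estimates over the variance of X.
estimates = [Bxy_z * y + Bxz_y * z for y, z in zip(Y, Z)]
r2_direct = (sum(e * e for e in estimates) / n) / Vx

r2_from_r = (rxy**2 + rxz**2 - 2 * rxy * rxz * ryz) / (1 - ryz**2)
r2_from_beta = beta_y * rxy + beta_z * rxz
r2_from_B = (Bxy_z**2 * Vy / Vx + Bxz_y**2 * Vz / Vx
             + 2 * Bxy_z * Bxz_y * sy * sz * ryz / Vx)

print(r2_direct, r2_from_r, r2_from_beta, r2_from_B)
```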
This expression takes a simpler form if all variables have been measured in units of their standard deviation, so that we have sx = sy = sz = Vx = Vy = Vz = 1. R^2 is then simply:
Bxy.z^2 + Bxz.y^2 + 2Bxy.z.Bxz.y.ryz.
Since in this case the Beta weights are equal to the partial regression coefficients, we could substitute Beta's for the B's. This formula is particularly important in connection with Sewall Wright's method of path analysis.
Problems of interpretation
As was stated earlier, one of the main aims of multivariate analysis is to disentangle the independent influence of one variable when all other variables are taken into account ('held constant'). Provided all the relationships are approximately linear, it can be said that if any pair of variables have non-zero partial regression and correlation coefficients between them, then they have some relationship which is not fully explained by their relationships with the other variables considered. However, neither the partial regression nor partial correlation coefficients are ideal for quantifying the importance of the relationship. In the case of the partial correlation coefficients, a coefficient might be high but still explain little of the total variance in the dependent variable. For example, if X and Y both have a high correlation with Z, the residual variance in both X and Y will be small. The residual variance in Y might then explain most of the residual variance in X, so that the partial correlation coefficient Rxy.z would be high, but it would not be safe to infer that Y is an important influence on X overall. The partial regression coefficients may be more useful for this purpose, but the contribution of each independent variable to the overall variance of the dependent variable still depends not only on the partial regression coefficients but on the variance in each independent variable itself. The best measure of the contribution of each independent variable should therefore take account of its variance as well as the value of the partial regression coefficients.
If we consider the squared multiple correlation coefficient in the form Bxy.z^2.Vy/Vx + Bxz.y^2.Vz/Vx + 2Bxy.z.Bxz.y.sy.sz.ryz/Vx, it will be seen that the independent variables Y and Z separately account for Bxy.z^2.Vy/Vx and Bxz.y^2.Vz/Vx of the total variance. But these quantities equal the squares of the Beta weights for these variables. The Beta weights (or their squares) therefore appear to give the best indication of the relative importance of the independent variables. It will however also be seen that the third term 2Bxy.z.Bxz.y.sy.sz.ryz/Vx, which involves the correlation ryz between the independent variables, also contributes to the variance. If the correlation is high, this term may well account for much of the total. The independent variables considered separately therefore do not give the whole picture. If they are positively correlated the variance of the estimates will be greater than otherwise, as there will be more relatively high or low estimates than if they were combined by chance. (If they are negatively correlated the reverse will be the case.) These joint contributions due to the correlation of independent variables cannot properly be allocated to the variables separately.
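A short Python sketch may help to show the size of the joint term. The three correlations below are hypothetical values chosen only for illustration. With standardised variables the separate contributions are the squared Beta weights and the joint contribution is twice their product times ryz.

```python
# Hypothetical correlations (assumptions, not from the text).
rxy, rxz, ryz = 0.5, 0.4, 0.6

# Beta weights for Y and Z in the regression of X on both.
beta_y = (rxy - rxz * ryz) / (1 - ryz**2)
beta_z = (rxz - rxy * ryz) / (1 - ryz**2)

separate_y = beta_y**2              # share attributable to Y alone
separate_z = beta_z**2              # share attributable to Z alone
joint = 2 * beta_y * beta_z * ryz   # share due to the correlation of Y and Z

r_squared = (rxy**2 + rxz**2 - 2 * rxy * rxz * ryz) / (1 - ryz**2)

print(separate_y, separate_z, joint, r_squared)
```

In this made-up case the joint term is larger than the separate contribution of Z, which illustrates why the separate terms do not give the whole picture.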
Multivariate correlation and regression raise some paradoxes and difficulties of interpretation which do not arise with the simpler bivariate case. These are discussed more or less adequately in the textbooks. For example, the accuracy of estimation can often be increased by taking account of a variable which is not itself correlated with the dependent variable. Suppose the dependent variable is X and we have correlations rxy = .4, rxz = 0, and ryz = .7. Here the independent variable Y explains only .16 (i.e. .4^2) of the variance in X, while the independent variable Z, being uncorrelated with X, explains none of its variance. But if we take the squared multiple correlation coefficient in the form (rxy^2 + rxz^2 - 2rxy.rxz.ryz)/(1 - ryz^2), for the given values this equals (.16 + 0 - 0)/.51 = .31. The accuracy of the estimation of X is therefore nearly doubled as compared with the estimate based on the correlations rxy and rxz alone. This seems counterintuitive, as the correlation between Y and Z appears to give us no new information about X. The puzzle can be partly resolved by noting that the combination of all three correlations does give us new information about X, as it implies that the partial correlation between X and Z, for given Y, is negative, which we would not know from rxy and rxz alone.
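The arithmetic for this example can be reproduced in a few lines of Python. The sketch below plugs the stated values rxy = .4, rxz = 0 and ryz = .7 into the formulae given earlier and also works out both partial correlations involving X. The function name is an assumption.

```python
import math

def partial_correlation(r_ab, r_ac, r_bc):
    """Partial correlation between a and b, given c, from bivariate correlations."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac**2) * math.sqrt(1 - r_bc**2))

rxy, rxz, ryz = 0.4, 0.0, 0.7  # values stated in the text

r_squared = (rxy**2 + rxz**2 - 2 * rxy * rxz * ryz) / (1 - ryz**2)
r_xy_given_z = partial_correlation(rxy, rxz, ryz)
r_xz_given_y = partial_correlation(rxz, rxy, ryz)

print(round(r_squared, 2))      # squared multiple correlation
print(round(r_xy_given_z, 2))   # X and Y, holding Z constant
print(round(r_xz_given_y, 2))   # X and Z, holding Y constant
```

On these figures the partial correlation between X and Y, given Z, is positive, while that between X and Z, given Y, is negative, even though rxz itself is zero.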
From the expression for the partial correlation coefficient, Rxy.z = (rxy - rxz.ryz)/[#(1 - rxz^2)(1 - ryz^2)], it is evident that if rxy = rxz.ryz, then the partial correlation coefficient is zero. The product rxz.ryz may therefore be interpreted as the expected level of rxy if the correlation between X and Y is due solely to their correlations with Z. If rxz.ryz is less than rxy, then the partial correlation coefficient is positive, whereas if rxz.ryz is greater than rxy, the partial correlation coefficient is negative. In particular, if X and Y are uncorrelated with each other, but both are positively correlated with Z, then the partial correlation coefficient between X and Y must be negative. Some authors have found this paradoxical, and a reason for dissatisfaction with multivariate analysis.
I am not sure that there is any real paradox here. Suppose for example that X and Y are uncorrelated variables measured in the same units (e.g. weight), while Z is a third variable formed by adding together the values of corresponding elements from each of X and Y. Z will therefore be positively, but not perfectly, correlated with both X and Y. What then will be the partial correlation between X and Y, given Z? A little consideration should show that it will be negative. It is important to note that the partial correlation between X and Y, holding Z constant, depends on the covariance between the deviation values of X and Y at a given level of Z. The deviation values of X and Y must therefore be measured relative to the average values of X and Y for that level of Z. Suppose we consider the set of all those Z items which have the value A. For each item, A is by assumption the sum of the values of the associated X and Y items. If the mean value of the associated X items, for Z = A, is B, and that of the associated Y items is C, it is easy to see that A = B + C, since SZ = SX + SY, and we need only divide through by the number of items to get A = B + C. For any particular X item, its value will usually be greater or less than B (the mean value of the relevant X items) by some (varying) amount D, but in this case the value of the associated Y item must be less or greater than C (the mean value of the relevant Y items) by an equal and opposite amount D, since their total must be A = B + C = (B + D) + (C - D), or (B - D) + (C + D). The deviations of the associated X and Y values from their own means will therefore be of opposite signs (unless by chance they are both zero). The covariance of the X and Y values, for a given value of Z, will be negative, and as a result the partial correlation between X and Y will also be negative. 
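The argument can be tried out on a toy data set. In the Python sketch below the weights are made-up values, chosen so that X and Y are exactly uncorrelated, and Z is formed by adding them item by item. The partial correlation is then computed both from the residuals and from the formula.

```python
import math

def deviations(values):
    """Convert raw scores to deviations from their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def sum_products(a, b):
    """S(ab): sum of products of paired deviation values."""
    return sum(p * q for p, q in zip(a, b))

def correlation(a, b):
    """Bivariate correlation of two lists of deviation values."""
    return sum_products(a, b) / math.sqrt(sum_products(a, a) * sum_products(b, b))

# Made-up weights (an assumption): X and Y are uncorrelated by construction.
x_raw = [1, 1, 3, 3]
y_raw = [2, 4, 2, 4]
z_raw = [x + y for x, y in zip(x_raw, y_raw)]  # Z is the sum of X and Y

X, Y, Z = deviations(x_raw), deviations(y_raw), deviations(z_raw)
rxy, rxz, ryz = correlation(X, Y), correlation(X, Z), correlation(Y, Z)

# Residuals of X and Y after removing their regressions on Z.
bxz = sum_products(X, Z) / sum_products(Z, Z)
byz = sum_products(Y, Z) / sum_products(Z, Z)
res_x = [x - bxz * z for x, z in zip(X, Z)]
res_y = [y - byz * z for y, z in zip(Y, Z)]

r_partial = correlation(res_x, res_y)
r_formula = (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))

print(rxy, r_partial, r_formula)
```

Because Z is here an exact sum with nothing else added, the residuals of X and Y are mirror images of each other and the partial correlation comes out at -1 (apart from rounding), the extreme case of the negative value argued for above.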
This is of course a simplified example, but relationships of this general kind are probably quite common, and should not be regarded as paradoxical.
As in the bivariate case, the standard treatment of multivariate regression assumes that the best estimate of the dependent variable, given the value of the independent variables, can be expressed by a linear equation. If the true relationships of the variables are significantly non-linear, the multiple regression will explain less of the variance than would be possible using non-linear regressions. It is still 'better than nothing', but the individual partial regression and correlation coefficients may be severely distorted, and may conceivably be changed from positive to negative, or vice versa, if the true relationships are non-linear. It is therefore unsafe to draw conclusions about the relative importance of different variables from a linear regression analysis if the true relationships are non-linear. Since this is one of the main purposes for which multivariate regression is used, this is a severe drawback.
A final problem to note is that the practical validity of multiple regression and correlation analysis depends on the identification of all relevant explanatory variables. If there is an important variable left out of the analysis (a so-called 'lurking variable'), the inclusion of that variable might seriously change the results of the original analysis.
Note 1: We start with the formula for the correlation between the residuals of X and the residuals of Y:
S(X - bxz.Z)(Y - byz.Z)/N.sx(#(1- rxz^2))sy(#(1- ryz^2)).
The numerator can be expanded as SXY - SYZ.bxz - SXZ.byz + SZ^2.bxz.byz.
Noting the equivalences SXY = rxy.Nsx.sy, SYZ = ryz.Nsy.sz, SXZ = rxz.Nsx.sz, bxz = rxz.sx.sz/Vz, byz = ryz.sy.sz/Vz, and SZ^2 = NVz, the numerator can be reformulated as:
rxy.N.sx.sy - ryz.N.sy.sz.rxz.sx.sz/Vz - rxz.N.sx.sz.ryz.sy.sz/Vz + NVz.(rxz.sx.sz/Vz)(ryz.sy.sz/Vz)
which can be simplified to
rxy.N.sx.sy - rxz.ryz.N.sx.sy.
But the denominator N.sx(#(1- rxz^2))sy(#(1- ryz^2)) also contains the factor N.sx.sy, so cancelling this from numerator and denominator we get
(rxy - rxz.ryz)/#(1- rxz^2)#(1- ryz^2)
Note 2: We start with the formula for the regression of the residuals of X on the residuals of Y:
S(X - bxz.Z)(Y - byz.Z)/NVy(1- ryz^2)
The numerator can be expanded as:
SXY - SYZ.bxz - SXZ.byz + SZ^2.bxz.byz
Noting the equivalences SXY = bxy.NVy = byx.NVx, SYZ = byz.NVz = bzy.NVy, SXZ = bxz.NVz = bzx.NVx, and SZ^2 = NVz, this can be reformulated as:
bxy.NVy - bzy.NVy.bxz - bxz.NVz.bzy.NVy/NVz + NVz.bxz.bzy.NVy/NVz =
(bxy - bxz.bzy)NVy
But the denominator also contains the factor NVy, so cancelling this from numerator and denominator we get the following formula for the regression of the residuals of X on the residuals of Y:
(bxy - bxz.bzy)/(1- ryz^2) =
(bxy - bxz.bzy)/(1 - byz.bzy).
But this is the same as Bxy.z, the partial regression coefficient of X on Y, given Z.
Note 3: We wish to prove that Bxy.z(Y - byz.Z) + bxz.Z = Bxy.z.Y + Bxz.y.Z.
Expressing Bxy.z in terms of bivariate regression coefficients, we have for the left-hand-side [(bxy - bxz.bzy)/(1 - byz.bzy)]Y - byz[(bxy - bxz.bzy)/(1 - byz.bzy)]Z + bxz.Z.
Collecting together the coefficients of Z and expressing them with the common denominator (1 - byz.bzy), we have for the Z term
[[bxz.(1 - byz.bzy) - byz.(bxy - bxz.bzy)]/(1 - byz.bzy)]Z.
But the coefficient of Z simplifies to (bxz - byz.bxy)/(1 - byz.bzy), so we have for the left-hand-side as a whole [(bxy - bxz.bzy)/(1 - byz.bzy)]Y + [(bxz - byz.bxy)/(1 - byz.bzy)]Z.
On inspection this can be seen to be equivalent to Bxy.z.Y + Bxz.y.Z, which is what we wished to prove.
Note 4: We require to prove that if the variables in the partial regression coefficient Bxy.z are divided by their own standard deviations, the result can be expressed in the form (rxy - rxz.ryz)/(1 - ryz^2).
Consider the partial regression coefficient Bxy.z in the form [Sxy/NVy - (Sxz/NVz)(Szy/NVy)]/[1 - Syz.Szy/NVz.NVy]. We now divide each variable by its own standard deviation. Here we note that the variance Vx is an abbreviation for S(x^2)/N (using deviation values). If the X variable is divided by its standard deviation, the variance of the resulting transformed variable is S((x/sx)^2)/N = S(x^2)/NVx = 1. Likewise for the other variances. Dividing all the variables by their standard deviations, [Sxy/NVy - (Sxz/NVz)(Szy/NVy)]/[1 - Syz.Szy/NVz.NVy] therefore reduces to [Sxy/Nsx.sy - (Sxz/Nsx.sz)(Szy/Nsz.sy)]/[1 - (Syz/Nsy.sz)(Szy/Nsz.sy)]. On inspection this is equivalent to (rxy - rxz.ryz)/(1 - ryz^2), which is the required expression for Beta_xy.z. Proving the remaining equivalences is tedious but relatively straightforward.
When fecundity does not equal fitness: evidence of an offspring quantity versus quality trade-off in pre-industrial humans:
Remember the fecund upper classes in Farewell to Alms? In any case, one thing that I have assumed is that this sort of model might explain the success of the Neolithic lifestyle despite the decrease in average quality of life that it brought. When populations first take up farming, or migrate to a new area, they are well below the Malthusian limit. In contrast, resident hunter-gatherers, who aren't as efficient at extracting productivity per unit area, would already be at their Malthusian limit. One can imagine that a Neolithic deme would rapidly expand and demographically surpass the hunter-gatherers around them. During the initial phases of expansion there would be enough land so that all farmers might be prosperous on an absolute scale. Consider the fitness, both reproductively and physiologically, of Americans on the frontier in comparison to their European ancestors. Of course, within a few generations the land would be "filled up" and a stagnant stationary state would be reached...at which point health decreases and the social pathologies characteristic of down-trodden peasantry would manifest themselves.
Labels: Evolutionary Psychology
Wednesday, January 23, 2008
Yann points to a new paper, Cystic fibrosis and lactase persistence: a possible correlation (Open Access):
The simplest and most economical explanation is that a dairy-milk diet became established in a single area and remained restricted to that area for a period of time sufficient to allow the T and the F508del alleles to attain high values. Then, in a second phase, the population of that area exported to the rest of Europe its dairy-milk diet culture together with the two adaptive genes, that is, the adaptogen and the two genetic adaptations to it. These two alleles would have then been amplified in the recipient populations because of their adaptive value owing to the co-imported dairy milk diet.
The two models to explain the high frequency of the deleterious CF allele in Europeans are that it has a high mutational bias, or that there is a heterozygote advantage for those with one copy. Most people would say that the latter is much more likely. The authors here propose that the derived CF allele was a really kludgey adaptive response to a new cultural regime predicated on raw milk consumption. Paul has some Ireland related thoughts (as usual!). I've never seen the term "adaptogen" before. In any case, I need to think on this case more...but I do think that if human evolution has been on hyperdrive the last 10,000 years we should see many kludgey genetic responses lying around the adaptive landscape.....
Related: Lactase persistence posts. Another from Yann, Is there a fitness advantage to being a CFTR carrier?
Tuesday, January 22, 2008
Check out this post. I found it via Research Blogging. I'm going to try out the RSS when they get it up and see if it's worth it....
Labels: sex differences
The always interesting Andrew Gelman, Rich state, poor state, red-state, blue-state: it's all about the rich:
Thus, the familiar "red America, blue America" pattern, the "culture war" between red and blue states, is really something happening at the higher range of incomes.
I believe that the details of history are always driven by battles between the elites....
Update: zeil asks if this is a race effect. From Rich State, Poor State, Red State, Blue State: What's the Matter with Connecticut?:
Could the varying income effects we have shown be merely a proxy for race? This is a potentially plausible story. Perhaps the high slope in Mississippi reflects poor black Democrats and rich white Republicans, while Connecticut's flatter slope arises from its more racially homogeneous population. To test this, we replicate our analysis, dropping all African–American respondents. This reduces our key pattern by about half. For example, in a replication of Figure 5, the slopes for income remain higher in poor states than in rich states, but these slopes now go from about 0.2 to 0 rather than from 0.4 to 0.
Reading some stuff on the Neolithic transition. From Neolithic
...Neolithic populations from Europe to Sri Lanka lost an average of two inches in height, and in Japan, there was a two-to-five-year drop in the average age at death for men and a three-year drop for women....
From The Widening Harvest: The Neolithic Transition in Europe:
...This region [southern Scandinavia] is likely one of the very few places in Europe that supported a substantial indigenous population prior to the transition to agriculture...With only a few exceptions, most of the continent contains little or no indication of occupation during the period just prior to the transition to agriculture...
The stuff about the dietary deficiencies is well known. My own interest in vitamin D deficiencies is not focused on rickets; rather, I am curious about the possible relationship between the deficiency, weaker immune systems and the rise of endemic & epidemic diseases. The second book consists of a series of essays which explore the various models for the expansion of the agricultural lifestyle into Europe. Genetic data implies that around 1/4 of the total ancestry of Europeans as a whole is derived from a population signal which originated in Anatolia. That being said, this proportion varies, with far higher proportions along the southeast edge of Europe and far less in the north. The fact that much of Europe was very lightly populated prior to the rise of the farming culture probably is one reason that the genetic signal of the Anatolian cultures is so strong (elsewhere the author of the second quoted passage notes that only two Mediterranean islands were inhabited before the Neolithic).
The second researcher above implies that the spread of agriculture in Scandinavia was almost certainly due primarily to cultural diffusion, taking into account various continuities (artifacts & physical anthropology) as well as the seemingly large native hunter-gatherer population as inferred from settlement sites. It is important to note that human populations were not resident in northern Europe before 8500 B.C. because of the climatic circumstances. Additionally, I wanted to highlight the emphasis on the utilization of sea life for sources of protein, because marine organisms are relatively enriched in vitamin D. The latter does note that shellfish were less prominent in Baltic pre-Neolithic sites, so I don't want to overplay that hand, but do note that a reliance on fish for protein and a later switch to red meat is also attested for Britain.
Botanists reported the discovery of a species of palm tree in Madagascar which sprouts a giant inflorescence (images from the BBC and National Geographic). In the New York Times:
"It's spectacular," said Mijoro Rakotoarinivo of the Royal Botanical Gardens in Madagascar. "It does not flower for maybe 100 years and can be mistaken for other types of palm. But then a large shoot grows out of the top and starts to spread, a bit like a Christmas tree." Those branches then become covered in hundreds of tiny white flowers that ooze with nectar, attracting insects and birds.
The peer-reviewed paper which describes this species (Tahina spectabilis) appeared in the Botanical Journal of the Linnean Society (available by subscription from Blackwell-Synergy). Disappointingly, the supposedly newsworthy lifecycle of this species is not mentioned at all in the paper! According to the paper, Tahina spectabilis was first spotted by a local family in August 2005; at the time of this first visit, the tree had not sprouted its inflorescence, and so it was mistaken for a different species. The same family revisited in September 2006, at which time they observed the towering "Christmas tree" inflorescence. Thus, the confirmed lower bound on the flowering time of this plant is only 1 year. So why does one of the authors speculate on a 100-year flowering time when speaking to the news media? (Are there any botanists here who know?)
Regardless, this discovery is a good launching point for thinking about the evolutionary biology of sexual reproduction. Tahina spectabilis grows 4-10 meters tall - and sprouts an inflorescence that is 4 meters long! What evolutionary process could lead to such an expensive reproductive investment? How would one characterize this species in terms of r/K selection theory? Botanists have long been interested in the fitness trade-offs of different inflorescence patterns. Now, with the current tools of molecular biology, experiments in model species such as Arabidopsis promise to uncover the genetics of inflorescence. When we think about the sexual phenotypes of animals, we usually focus on mating and sociality; unfortunately, such behaviors are difficult to quantify. Plants don't have behavioral repertoires, but they do have elaborate sexual hardware. Phenotypic variation in hardware is relatively easy to quantify; one can perform time-lapse imaging of a plant over its entire development to track the spatial arrangement of the flowers, number of branch points, duration of infructescence, etc. And such variation in hardware can be directly, mechanistically related to genetic variation in the underlying developmental programs.
If agriculture, and the social and cultural revolutions triggered by this new form of extracting economic productivity from land, was a major variable in triggering recent human evolution it is important to know when it swept over a particular region. With the big ranges given for when selection pressures began to reshape a genomic region the narrower values from archaeology might be very useful. So here's another map of the spread of agriculture in Europe....
Dienekes has an old post with similar numbers. The recent results which imply that SLC24A5 (responsible for 1/3 of the European vs. African skin color difference) might have only started rising in frequency 6-12 thousand years ago would be constrained if we are to assume that changes triggered by agriculture were necessary; as farming only spread to northern Europe around 7,000 years ago.
Below the fold I cut & pasted some of the rows from the Agricultural Transition Data Set.
Syrian Arab Republic 10500
Iran, Islamic Rep. 9500
Saudi Arabia 7600
Serbia and Montenegro 7500
United Arab Emirates 7500
Egypt, Arab Rep. 7200
Bosnia and Herzegovina 7000
Czech Republic 6500
Taiwan, China 5500
United Kingdom 5500
Hong Kong, China 5000
Sri Lanka 5000
Korea, Rep. 4500
Papua New Guinea 4000
French Guiana 3600
Cote d'Ivoire 3500
Mariana Islands 3500
United States 3500
Sierra Leone 3250
Central African Republic 3000
Congo, Rep. 3000
Democratic Rep. of Congo 3000
El Salvador 3000
Gambia, The 3000
Burkina Faso 2900
Costa Rica 2500
Trinidad and Tobago 2000
South Africa 1700
Dominican Republic 1500
Monday, January 21, 2008
Since we've been talking about anthropology, I am posting this mostly to satisfy my curiosity and get something off my chest. There was a time in the past when I was a hard-core libertarian. I was at a book store recently and flipped through Radicals for Capitalism, Brian Doherty's intellectual history of the libertarian movement. I already knew most of the key players from my past readings. Now, I'm not one of the few dozen people in the world who has actually read Ludwig von Mises' Human Action (I'd be willing to bet some gold that half of these individuals who've gotten through von Mises' magnum opus are virgins!), so my libertarian nerdishness only went so far. All that being said, there was a time I would have said I favored the Austrian School of economics. This was during a period when I was busy boning up on the Krebs cycle, I wouldn't have had any clue what an indifference curve was. I was a libertarian, and the Austrian School was congenial to libertarianism, ergo, I supported the Austrian School (I knew I opposed Keynesians as well as the neoclassical models).
But I'd always had issues because I knew that the Austrian school rejected econometrics and positivism; and being steeped in experimental science I'd always viewed positivism as a Good Thing. Eventually I read Bryan Caplan's Why I am Not an Austrian Economist, the definitive smackdown of the school of thought derived from von Mises (an aside: the aspersions cast in this post are aimed primarily at the Misesian tradition, not the Hayekian; the reason for the distinction is made clear in Caplan's piece). I'd already lost my interest in libertarianism by the time I'd stumbled upon this polemic, but it confirmed my growing suspicions that Austrian economics had turned into a cult of personality. Caplan, being an economist, has some pointed technical criticisms. But over the past few years, and especially over the past months, I've been doing some reading on Google Books and elsewhere on the intellectual history of the Austrian School, and especially praxeology. What the hell is praxeology? Well, from praxeology.net:
Praxeology is the study of those aspects of human action that can be grasped a priori; in other words, it is concerned with the conceptual analysis and logical implications of preference, choice, means-end schemes, and so forth.
The "grasped a priori" part has really bothered me. I mean, I read psychology and history, I can't derive it a priori. Recently I was going over some issues in modern Middle Eastern history, and learned that King Hussein of Jordan had apparently asked Israel for permission to send a brigade to Syria to invade the Jewish state during the 1973 Yom Kippur War. Honestly, I really don't know if I could ever grasp Arab psychology a priori. The more and more I read about psychology the more I think that anyone who believes that they could develop an axiomatic system of human action from insights they grasped a priori is totally retarded (mad props to Aristotle though, he worked before the cognitive revolution). More specifically I have to wonder if they are socially retarded. I have suggested that an attraction to libertarianism is in part a function of your personality. Normal people rarely become libertarians; rather, it's an ideology driven by young non-alpha males with Roark/Galt fantasies. There are many more Justin Raimondo & Eric Garris types than Mark Cubans in hard-core libertarianism. Any survey of the biographies of von Mises or Murray Rothbard emphasizes their stubborn heterodox tendencies; but at this point I just wonder if they were social retards to whom their a priori logic was plausible because they really weren't as complicated as most humans, who engage in habitual and casual hypocrisy and contradiction. I recall reading Rothbard once explaining how one might buy and sell children in "flourishing child markets" in an anarcho-capitalist order. Even then I remember thinking, "Dude is weird...."
Now, why am I posting this? Many readers of this weblog are sympathetic to the Austrian School of economics (e.g., Mencius Moldbug). On occasion readers have even emailed me pointing to chapters in Human Action. Seeing as around 1/3 of the readers of this weblog are libertarian, that's never surprised me, and I haven't cared enough about economics to ever elaborate my distaste for the Austrian School. There are three reasons I'm going on the record now though.
1) I've developed an interest in economics as an academic discipline. In other words, I do know what an indifference curve is, or comparative statics, or business cycles. My adherence to Austrian economics seems analogous to a young man's infatuation with the prose stylings of Piers Anthony; I didn't know any better.
2) My readings in psychology and history make it very difficult for me to understand how anyone could adhere to a Misesian form of Austrianism with its commitment to praxeology. In short, I really think praxeology is a rotten foundation for any system of thought. Certainly when someone espouses Austrian economics it makes me question if they're a bit nuts.
3) That being said, I'm curious to see how GNXP readers would respond to my objections and sentiments. Your responses should go in the comments (no emails please). I'm curious for two primary reasons: I want to know a bit more about the psychologies attracted to the Misesian school, and there's a chance I'll revoke my critique and explore Austrian economics in more depth (more practically, I won't dismiss readers who espouse Austrianism as bizarros if I think I've been too harsh on Mises' work).
Sunday, January 20, 2008
Each point is an individual, and the axes are the first two principal components of "genetic space". Colors correspond to individuals of different European ancestry.
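For readers who want to see how a plot like this is built, here is a minimal sketch, assuming NumPy and a made-up genotype matrix (rows are individuals, columns are SNPs coded 0/1/2). The two simulated populations, the allele frequencies, and the function name first_two_pcs are all hypothetical illustrations, not the data or the method behind the figure.

```python
import numpy as np

def first_two_pcs(genotypes):
    """Project individuals (rows) onto the first two principal components."""
    centered = genotypes - genotypes.mean(axis=0)   # center each SNP column
    # SVD of the centered matrix; the rows of vt are the principal axes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T                       # shape: (n_individuals, 2)

# Simulated (hypothetical) data: two populations with diverged allele frequencies
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),   # population A
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),   # population B
]).astype(float)

pcs = first_two_pcs(geno)   # one (PC1, PC2) coordinate pair per individual
```

Plotting pcs[:, 0] against pcs[:, 1] and coloring by population gives the same kind of picture: individuals cluster by ancestry because the leading components capture the allele frequency differences between groups.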
Labels: Population genetics
Two posts to check out over at Half Sigma. First, he suggests that John McCain's daughter is hot. I don't have a huge N, but that looks like a good picture, and it certainly benefits from any contrast effect, if you know what I mean. But you can't discount the photo, and her mother seems relatively well preserved. If you check out the video on this page you also note Meghan McCain has a feminine voice (apparently she thinks Barack Obama is cute). Second, in John Rawls, human biodiversity, and redistribution of wealth, Half Sigma is surprisingly sympathetic to a "liberal" position (I say surprisingly because anyone who reads the blog knows of his almost visceral dislike of liberals, though it is not without foundation from where he stands). I've been making the argument that liberals could make the case that Half Sigma suggests they should be making for a while now. Of course, my own values are not the same as John Rawls', so I can't deliver the redistributionist line with any sincerity. Because of my innate empathy deficit the "original position" thought experiment has always been a stretch cognitively, and I also don't accept the max-min rule as necessarily optimal (i.e., you accept lower total summed utility to maximize the minimum value across the distribution). But I do know that some readers of this blog of Leftish inclination have always held to this position implicitly, if not fully elucidated in a formal sense....
Saturday, January 19, 2008
I was talking with a friend about Native American skin color. From the Canadian north down to Chile it seems that though there is variation these populations exhibit some sort of brownish shade. There are no black-skinned Amazonians, nor are there pink-skinned peoples on the Canadian Arctic. So what gives?
First, it seems likely that Native Americans have been "native" to the New World for only around the past 10,000 years.1 A physical anthropologist once told me that the body proportions of the natives of the Amazon are still quite "Siberian," that is, they exhibit adaptations to cold weather after all these generations. And of course time is not the only parameter; Native American populations seem to have gone through a genetic bottleneck, so they likely brought over very little standing genetic variation. So you have a relatively short period of time for selection to operate, and only a very limited range of trait values for it to operate upon.2
But this isn't persuasive to me for skin color, at least in the totality. We seem to know the genes at work now. We know that they can be selected very fast, and we know that there have been convergent dynamics across the World Island. 10,000 years is plenty of time. So perhaps the second parameter, extant genetic variation, is at work? That is, the Siberian migrants didn't bring all the genes for selection to shift them toward new adaptive optimums. For dark skin the data suggest that there is a rough consensus sequence, a constrained set of alleles across skin color genes, which produces our species' dark "Wild Type." This suite of genes probably arose when we lost our fur and became strongly pigmented to counter the negative effects of radiation, and it seems like there hasn't been any reinvention of the wheel here. Melanesian populations which are quite distant from Africans on most genes exhibit the same consensus sequences for skin color loci, by and large. I think that it is likely that the brown-skinned Siberians did lose some alleles at particular loci (that is, they were fixed for loss of function variants), so that for true blackness to reemerge there needs to be new mutations which gain the function back. And as you likely know, gain of function is far less likely than loss of function.
But that only explains why Native Americans don't get very dark. As I imply above, loss of function isn't all that hard. That's why albinos can be found in most human societies; they're an extreme mutant, but the same principle seems to be operative on many of the skin lightening genes. So why didn't Native Americans get pink? I think the fact that Siberians and Inuit are relatively brown suggests that extreme depigmentation is not always entailed by life at high latitudes. As many workers have suggested, groups like the Inuit consume marine animals which are heavily loaded with Vitamin D, the lack of which is one of the presumed selective pressures driving depigmentation. That being said, most Native American tribes did not live next to the sea. And yet the recent selection events for genes such as SLC24A5 and OCA2 strongly imply that Europeans have become very depigmented very late in prehistory, perhaps almost into historical periods!
Why? I have proposed (following many others, such as L. L. Cavalli-Sforza) that the switch to agriculture resulted in a shift in diet and nutritional intake which entailed greater endogenous Vitamin D production by necessity. But there's a problem with this model: forms of agriculture existed in the New World as well, and spread up (eventually) into what became the eastern United States. Granted, the latitude of much of this region is about where the Middle East is, but even then it seems that the natives were relatively swarthy. I discount the notion that agriculture was too recent when SLC24A5 might have had selection coefficients on the order of 0.10. Perhaps the people of the New World, at least in North America, kept a more diverse diet, supplementing their agriculture with hunting and fishing to a far greater degree than in the Old World? Additionally, one might suppose that maize was a nutritionally superior staple to wheat, barley, millet or rice (I have read that this is so). Ultimately these sorts of questions need to be addressed by a survey of the archaeological literature, as well as assessing the nutritional differences. I'll get to that at some point.
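To put a rough number on why a selection coefficient around 0.10 makes "too recent" an unconvincing objection, here is a back-of-the-envelope sketch. It assumes a simple deterministic one-locus model with constant relative fitness 1 + s for the favored allele, no drift, and a 25-year generation time; the function name, the 1% and 99% thresholds, and the generation time are my own illustrative assumptions, not figures from any paper.

```python
def generations_to_sweep(s, p_start=0.01, p_end=0.99):
    """Count generations for a favored allele to rise from p_start to p_end.

    Assumes a deterministic genic (haploid-style) model in which the favored
    allele has constant relative fitness 1 + s; genetic drift is ignored.
    """
    p, generations = p_start, 0
    while p < p_end:
        p = p * (1 + s) / (1 + s * p)   # standard one-locus selection update
        generations += 1
    return generations

gens = generations_to_sweep(0.10)
years = gens * 25   # assumed 25-year generation time
```

Under these assumptions the sweep from 1% to 99% takes on the order of a hundred generations, so a few thousand years, which fits comfortably inside the window since farming arrived. Dominance, drift, and a changing selection coefficient would all shift the number, so treat it as an order-of-magnitude check only.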
But there's one last thing I thought of: disease. I can't really explain why SLC24A5 goes so far south in India. You see frequencies as high as 25% in Tamil Nadu. Vitamin D deficiency? Certainly nutritional stress is a major issue, but one thing is for sure: South Asia is subject to a lot of disease in comparison to any other densely populated part of the world. Of the Old World civilizational hearths India was certainly the one weighed down by the greatest endemic pathogen load, in large part because it was so far south and so wet. So perhaps it was disease.
Which brings us to Native Americans. Despite the recent uproar over syphilis, the New World was relatively pathogen free for humans. Granted, with greater population densities disease would have been a major issue among the agricultural populations of the New World, but there were structural reasons why they would have been less prone to epidemic outbreaks than Old World civilizations: the relative lack of domestic animals, the non-existence of closely related species (think of ape strains of viruses), the smaller and more fragmented population networks, and of course the fact that the original migrants probably only brought a small subset of the diseases of the Old World. Empirically we know that the Native Americans died like flies when the Eurasians showed up. Their civilization simply didn't prepare them for Old World plagues. What I'm proposing here is that disease was a major driver of skin color evolution over the last 10,000 years. Or, at least, the same loci which control and modulate melanin production are critical in immune defenses.
I need to do a lot more digging for this to be anything more than a guess. But the disease angle seemed to be the last best hope in explaining why the New World was different. If they were subject to the same nutritional stress, why didn't they go down the same path as Eurasians? The reason may be that the path was being forged by the threat of disease (Vitamin D deficiency increases susceptibility to infectious agents), which was a less important parameter in the New World. Implausible as it may sound, it seems the most plausible of the various explanations to me.
Note: If you are really curious about the topic, check out the many posts on skin color on this GNXP and the other.
1 - Even if Clovis First is debunked, it seems more and more likely that there are problems with genetic studies which claim that the earliest migrations date to 20-40 thousand years BP.
2 - All things being equal the rate of adaptive evolution is proportional to the extant genetic variation. If there is no genetic variation evolution has no raw material to work with.
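A toy illustration of footnote 2, assuming the simplest one-locus model in which a favored allele at frequency p has relative fitness 1 + s. The function name delta_p is hypothetical; the point is only that the per-generation change scales with p(1 - p), the variance in allelic state, so it vanishes when there is no variation.

```python
def delta_p(p, s):
    """One-generation change in frequency of a favored allele.

    Derived from p' = p(1 + s) / (1 + s*p), so p' - p = s*p*(1 - p) / (1 + s*p).
    The p*(1 - p) factor is the genetic variance at the locus.
    """
    return s * p * (1 - p) / (1 + s * p)

no_variation = delta_p(0.0, 0.1)    # allele absent: selection has nothing to act on
low_variation = delta_p(0.02, 0.1)  # rare allele: slow response
high_variation = delta_p(0.5, 0.1)  # intermediate frequency: fastest response
```

A bottlenecked founder population sits near the first two cases at many loci, which is the sense in which limited standing variation slows adaptation.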
Friday, January 18, 2008
One of those generic de-lurking threads if you are in the mood.
I have published my own take on Jonah Goldberg's "Liberal Fascism" here. Since GNXP is not a political site (and since many here are not in sympathy with my point of view) I will not publish the whole thing here, though I am willing to discuss it in comments.
UPDATE: Thanks, everyone. I didn't expect agreement, and I found the discussions interesting. I'm sorry that they wandered into the abortion area, because abortion isn't really my issue, but that was my fault.
A site with the URL Anthropology.net puts up a post titled Fighting the mantra, "People vary more within the groups than vary between groups"? What's going on here! Since Lewontin's Fallacy is one of the axioms at the heart of much of modern anthropology (that is, when anthropologists bother to accept the validity of linear-logo-centric analytic frames), I think someone might have their card revoked soon. Be afraid! Of course, we know what type of person runs Anthropology.net....
Readers who want to dig deeper into the nature of the pretty charts in the papers that the apostate anthropologist mentions above should check out Inference of Population Structure Using Multilocus Genotype Data: Linked Loci and Correlated Allele Frequencies, an early paper introducing STRUCTURE. K's in the hiz houze, as they say.
Update: Some concern in the comments that I'm painting with too broad a brush in regards to anthropologists. Obviously since I am an avid follower of the work of people such as John Hawks, Henry Harpending and Heather Norton, who are all trained as anthropologists, I don't think it's all crap. That being said, physical anthropologists are anthropologists in the same way that a follower of Milton Friedman is a liberal. There are historical and definitional reasons to call oneself a liberal in such a case, but quite often it simply results in confusion because of other definitions in circulation (context matters here, since the older definition of liberal is still in currency in much of the rest of the world). In this case, there is Anthropology which deals in Theory and Ways of Knowing, and anthropology which is driven by data, models and analysis.
Of course, this isn't a total dichotomy. There are cultural anthropologists who attempt to work within a scientific framework, generate hypotheses, see how the data fits the inferences, and so forth. But they seem a very small minority at this point.
Labels: human biodiversity
Thursday, January 17, 2008
Found this while doing research on the web, The Devolution of Britney Spears: From Pop Star to Celebrity Trash in Less than 7 Years.
Secret Of Scottish Sheep Evolution Discovered:
...gene and one copy of the light gene are quite large and also have quite high reproductive success. Sheep with two copies of the dark gene are larger still, but have poor reproductive success. Sheep with two copies of the light gene are small, but still have quite high reproductive success. This means that the two types of dark sheep although indistinguishable visually, vary in Darwinian fitness.
The paper will be "A Localised Negative Genetic Correlation Constrains Microevolution of Coat Colour in Wild Sheep" in Science (not on the site yet). I wish I had some numbers to put on this...because I want to know the reproductive fitness of heterozygote black sheep vs. white sheep (ergo, the question mark after "balancing selection"). In any case, I wonder how it might apply to human pigmentation genes & selection. Consider the KITLG allele which results in dominance effects for light skin but recessive effects for light hair, and was subject to recent selection in Europeans. Or the effect of an OCA2 allele on eye & skin color.
Big Think is getting some press. Personally, seems a bit of weak tea next to Meaningoflife.tv and Beyond Belief. But I guess they're new, so we'll see....
Wednesday, January 16, 2008
Because they're not into you? Ha, no, I don't mean when they turn their head or whole body away. I mean when their body and head remain facing you but they move their eyes away, usually up and away but sometimes straight to the side. Imagine them giggling and saying "Wellll....." -- that look. (For those who need reminding: one example, another, and another, plus the Penelope Cruz picture in the link just below.)
To continue on a previous post that discussed a hypothesis for why humans have white eyes -- to help others detect what we're looking at, and thus thinking about? -- I suggest that this reflex of girls is an amplifier of an underlying index of phenotypic quality, namely large clear eyes.
The biologist Oren Hasson (free PDFs) has proposed a distinction between two types of animal signals: "indices" are honest indicators of underlying quality, while "amplifiers" make the index easier to perceive for the recipient. As an example, if others care about your ectoparasite load, then having plumage whose color was complementary to that of the bug would amplify your signal: even from a distance, it would be obvious who was clean and who was crawling with bugs. Your spot-free or diseased plumage is the index, while its bug-contrasting color is the amplifier.
(Say, that sounds like another reason why humans evolved lighter skin when they left Africa and adopted agriculture, which introduced a huge disease burden, especially diseases that produce and/or leave behind unpleasant, visible cues. It would be easier to spot who had "good genes" here if they had light skin. This cause is likely minor, but there could be something to it.)
In the case of girly girls, their large clear eyes are the index (this is something we pay attention to when judging the attractiveness of female faces), while the looking-away reflex is the amplifier. People with larger eyes show more white than do those with beady eyes. By looking away, nearly all of the visible eye is white, making it easier for onlookers to judge eye size. It is simply easier for us to judge the area of a solid figure (the nearly all-white eye) than the sum of the areas of scraps and bits (the white parts on either side of the iris when a person is looking straight at you).
And as always, this doesn't have to be the only reason that the behavior has evolved. One alternative is that it serves to playfully tease the male by denying him eye-contact. I find that unsatisfying (the hypothesis, not teasing), since there are many ways to break eye-contact, and the most common way to do so while flirting just happens to make it easier to see what large, dreamy eyes a person has.
Mark Kirkorian points out that Barack Obama is a Muslim apostate:
Several implications: first, Obama's has a unique opportunity - even a responsibility - to speak out on behalf of former Muslims under threat of death for converting to other faiths. Second, there are likely to be even more lunatics trying to kill him than there would be otherwise. And third, how would a President Obama be greeted by, say, the king of "Saudi" Arabia? Probably the same way a President Lieberman would be, and that could actually be a big selling point in his favor, but it's something we can't just pretend doesn't exist.
By a broad interpretation Kirkorian is correct to assert that Obama would be considered an apostate by many Muslims if the facts of his biography were to be presented before them (I am an apostate as well by the definition that is being assumed here). Additionally, his conversion to another religion is also highly problematic; non-religious individuals who nevertheless do not opt out of Islamic identity/culture and turn toward aggressive atheism or another religion are tolerated to some extent in many Muslim societies. Converts to another religion, though, are seen as a more obvious affront.
But there's a big problem with Kirkorian's inferences: they exist in a vacuum of the true distribution of empirical data and take Muslim axioms at face value. This is common among many conservative American intellectuals who wish to rebut the anodyne reassurance from the mainstream that Islam is "really a religion of peace." So fixated on countering the "Islam is peace" propaganda, conservative intellectuals don't bother to learn much about how the religion is actually practiced to compare the facts to the various inferences they make about how it would be practiced. If you look at a list of former Muslims you note several politicians, most prominently Carlos Menem, the former president of Argentina. Menem of course had good relations with the Arab world.
What gives here? We know that some apostates are threatened with death, or even killed. Context matters. Many of the attacks on apostates have other factors which serve to push Muslims to action upon their avowed axioms. The Afghan convert to Christianity, Abdul Rahman, wasn't the most mentally stable individual. In the Muslim world apostasy and blasphemy laws are often enforced or implemented opportunistically; quite often there are other reasons that principals bringing the charges have for prosecution (e.g., confiscation of property).
I do think it is important that the Mark Kirkorians of the world point out the illiberalism which is accepted within the Muslim world. But that being said, I do worry that they take their own rhetoric a bit too literally. After all, consumption of alcohol does exist within the Muslim world, to the point where a king of Saudi Arabia had to abdicate because he couldn't mask his addiction anymore. To some extent I wonder if a certain Anglo-American naivete about the relationship between word & deed is at work here; a tendency to take as concrete assertions which are embedded & expressed within the constraints of practical day to day realities. On the other hand, I also think part of the issue is that when you are outside of a culture you only see the explicit axioms which are averred and are unaware of the implicit pragmatism which defines day to day life. Finally, it is important to note that though I think that the Islamic attitude toward apostasy is not sufficient to explain the outbursts of violence and intimidation to those who leave the fold, it is necessary.
Tuesday, January 15, 2008
Jonathan Rauch has an amusing piece in Reason, The Coming American Matriarchy. To some extent it is not noteworthy, it's the sort of thing you see in the mainstream press when journalists skim over data sets for some superficial insights. I know some D.C. libertarians do read this weblog, so they should point out the silliness of the column to Rauch. Consider:
Here's the problem here: having a college degree once meant that you were part of the educational elite because a college degree was rare. Rauch admits that it is no longer rare, and likely at some point in the future the majority of young adults will be expected to have a college degree of some sort as a minimal qualification for a non-manual job. If the majority have a particular credential, it is no longer a good elite signaler. Additionally, it is a relatively well known fact across many domains of achievement males are disproportionately found at the higher ranks even if there is numerical parity. About the same number of medical school graduates are now male or female, but the latter tend to go into "low status" tracks relative to the former (e.g., surgery vs. family practice). Here is Helena Cronin:
Similarly, consider the most intellectually gifted of the USA population, an elite 1%. The difference between their bottom and top quartiles is so wide that it encompasses one-third of the entire ability range in the American population, from IQs above 137 to IQs beyond 200. And who's overwhelmingly in the top quartile? Males. Look, for instance, at the boy:girl ratios among adolescents for scores in mathematical-reasoning tests: scores of at least 500, 2:1; scores of at least 600, 4:1; scores of at least 700, 13:1.
These patterns are banal and well known to regular readers of this weblog, to the point where I don't post on these topics much. But Jonathan Rauch seems like a smart enough fellow, and Reason is heterodox enough to publish someone like Ron Bailey. There are real issues here of possible interest. For example, as the proportion of female lawyers increases I wonder if firm culture may change enough so that the billable hour system becomes a thing of the past. Such a transformation might have the outcome of diminishing the handicap that women face in making partner (because women are, on average, burdened with more expectations in family life than men). I also don't think that intelligence or its distribution are the only characteristics to consider; personality seems to be a major area of difference between the sexes that might shape their life outcomes. Finally, there are studies which suggest women tend to be much more critical of female co-workers; a powerful united sisterhood might not be a good model for the shape of future XX dominated professions. Numbers like this can stimulate some interesting projections...but imagining a 'matriarchy' really is a waste of column space.
Addendum: Terms like 'patriarchy' or 'matriarchy' make it seem like men or women in the plural dominate the other sex. This of course elides intrasexual dynamics. For example, extreme patriarchies such as the Saudi kingdom do not benefit all men at the expense of all women. Rather, usually these extremely sex differentiated systems as a matter of course crystallize and reinforce the dominance of a particular oligarchic clique (e.g., the House of Saud and their clients). Marginalized males may also be quite oppressed by the patriarchy. My own opinion is that the relative weakness of sisterhoods as opposed to brotherhoods is the main reason that patriarchy has become so common over the last 10,000 years.
Labels: Behavior Genetics
p-ter's post about a new allele for lactase persistence is a powerful testament to the reality of gene-culture coevolution. These alleles which allow for lactase persistence have almost certainly spread over the last 10,000 years, and likely within the last 5,000. The fact that multiple alleles arose which exhibit disparate geographic distributions suggests that population substructure was generated in part by physical barriers (e.g., the mountainous massif at the center of Eurasia) which prevented selection from sweeping from deme to deme. This brings me to a note which I think is important to make: the same parameters which make a region amenable to a flow of information (culture) likely results in it being subject to repeated influxes of advantageous alleles from without. In other words, the rich get richer. Along trade routes come both cultural and genetic innovations. We've been discussing recent adaptive evolution and its likely acceleration toward the present, but if you read history you'll also note that cultural change has also sped up a great deal. The society of ancient Egypt spanned over 2,000 years; obviously it was not static, but a farmer during the Old Kingdom would not have been particularly shocked by the customs & norms of the New Kingdom. In contrast, someone from 200 years in the past would be an alien among us.
Monday, January 14, 2008
Back in the days before I'd ever read any probability or population genetics, I imagine I considered, as many laymen still do, evolution as a sort of deterministic march towards some optimum. I still remember being amazed at the simple equations that show how much stochasticity is involved; how random chance and historical accident can shape the fate of genetic variants. But are there cases where the layman's instinct is correct, where we can say that evolution was deterministic? Obviously, in some sense this is impossible to prove; one can't simply rewind the clock a thousand times and watch the outcomes. But there are natural experiments that I think shed some light on the subject.
The advent of dairy cultures in various human populations around the world provides one such natural experiment. I'm writing about this because of a recent study identifying yet another allele leading to lactose tolerance, this time in a Saudi Arabian population that drinks sheep's milk. A previous study, regular readers may remember, identified three other polymorphisms leading to the phenotype in Sub-Saharan pastoralists. Along with the "European" allele, this brings the total of probable lactose-tolerance-causing mutations segregating in humans to five. Let's make some assumptions: lactose tolerance is perfectly dominant, has a selection coefficient of around 0.1, and all these mutations will continue to fixation (this last one would be almost certainly true if the selection coefficient were constant--all the alleles have escaped the stochastic phases of their trajectories--but is an open question. What is the fitness advantage today of lactose tolerance? Surely this is testable). With these assumptions, one predicts that lactose tolerance has arisen around 25 times since it became advantageous. Given that we're talking about less than ten thousand years since dairy farming, that's quite remarkable.
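One way to reconstruct the "around 25" figure, since the arithmetic isn't spelled out above: assume Haldane's classic approximation that a new beneficial mutation with heterozygote advantage s escapes loss by drift with probability of roughly 2s. That approximation and the function names below are my assumptions for illustration; the calculation actually intended may differ.

```python
def establishment_probability(s):
    """Haldane's approximation: a new beneficial mutation whose carriers have
    fitness advantage s survives stochastic loss with probability about 2s.
    (Valid for small s in a large population; assumed here, not derived.)
    """
    return 2.0 * s

def implied_origins(observed_alleles, s):
    """Number of independent mutations that must have arisen for
    `observed_alleles` of them to have survived the stochastic phase."""
    return observed_alleles / establishment_probability(s)

origins = implied_origins(observed_alleles=5, s=0.1)   # 5 / 0.2
```

With five alleles observed and s = 0.1, each new mutation survives about one time in five, so roughly 25 must have arisen. Because the allele is assumed perfectly dominant, the heterozygote carries the full advantage, which is why s goes into the formula unscaled.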
The relevant parameter here is the mutational target size--if lactose tolerance could only be caused by a change at one particular base pair in humans, it would never have arisen independently so many times. But with a mutational target so large, and a selection coefficient so strong, it becomes inevitable that any culture that developed dairy farming would eventually develop lactose tolerance. But it still seems amazing to me that it happened so quickly!
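The arithmetic behind the figure of 25 can be sketched as follows. This is a minimal sketch, assuming Haldane's classic approximation that a new beneficial mutation with heterozygous advantage s escapes stochastic loss with probability of roughly 2s; the function and variable names are illustrative and not taken from the study.

```python
def expected_origins(observed_alleles, s):
    """Estimate how many times a beneficial mutation arose,
    given how many copies survived the stochastic phase.

    Assumes each new mutation escapes early loss with
    probability ~2s (Haldane's approximation, dominant allele).
    """
    p_escape = 2 * s
    return observed_alleles / p_escape


# Five known lactose-tolerance alleles, selection coefficient of 0.1
origins = expected_origins(observed_alleles=5, s=0.1)
print(round(origins))
```

With s = 0.1 only about one new mutation in five survives, so five surviving alleles imply roughly five times as many origins. A smaller selection coefficient would push the estimate higher still.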
Ross Douthat points to a WSJ piece which profiles scholars who are attempting to reconstruct the evolution of the Koran based on new (old) archives. I assume that Islam will collapse once certain central tenets exhibit a high degree of falsification, just as Mormonism has due to new findings in archaeology and history since the early 19th century....
I know that there are many readers of this weblog who are interested in both history and genetics. I assume that those who share these interests with me were very excited by the publication of The History and Geography of Human Genes 15 years ago. L. L. Cavalli-Sforza's magnum opus ushered in a new era in the synthesis between the historical sciences & genetic methodologies. Much of it is sloppy; some of it is very insightful and sheds new light upon old questions. At my other blog I've been posting on this topic, and another ScienceBlogger, an archaeologist by training, has responded with a critique. My original posts, From where came the Slavs? & Overturning assumptions: why genes matter in history, Martin's response, Genes and Peoples, and my rebuttal, Categories are instruments; Slavs are tools (no offense about the title Steve C.!). If you comment on Martin's post, be nice, we disagree but he's a cool guy.
Sunday, January 13, 2008
Manish sent me this story about the rise and fall (at least in substance) of Gawker. Long time readers will note that Liz Spiers, one of the original contributors to GNXP, looms large.
Sign Up, Subscribe to the Feed and Learn More!
We've got 30 people signed up so far. I know more will come on board, so I'm looking forward to getting drowned in pure science in ~3 weeks.
Also, if you have a science-themed blog, post the link in the comments. Thinking about refurbishing the blogroll....
Following on the heels of HMGA2, another genome-wide scan for genes involved in height identifies a region near GDF5 and UQCC:
Identifying genetic variants that influence human height will advance our understanding of skeletal growth and development. Several rare genetic variants have been convincingly and reproducibly associated with height in mendelian syndromes, and common variants in the transcription factor gene HMGA2 are associated with variation in height in the general population. Here we report genome-wide association analyses, using genotyped and imputed markers, of 6,669 individuals from Finland and Sardinia, and follow-up analyses in an additional 28,801 individuals. We show that common variants in the osteoarthritis-associated locus GDF5-UQCC contribute to variation in height with an estimated additive effect of 0.44 cm (overall P < 10^-15). Our results indicate that there may be a link between the genetic basis of height and osteoarthritis, potentially mediated through alterations in bone growth and development.
It's worth noting that the loci currently identified as being "height genes" contribute less than 1% of the total variance in height in the population, while the total heritability of height is around 80% in developed countries.
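To see why a 0.44 cm effect translates into such a small share of the variance, here is a rough sketch using the standard additive-variance formula 2p(1-p)a^2. The allele frequency (0.5) and the height standard deviation (6.5 cm) are assumptions chosen for illustration; neither number comes from the paper.

```python
def variance_explained(effect_cm, allele_freq, trait_sd_cm):
    """Fraction of trait variance explained by one additive biallelic locus.

    Additive genetic variance at the locus is 2 * p * (1 - p) * a**2,
    where p is the allele frequency and a the per-allele effect.
    """
    additive_var = 2 * allele_freq * (1 - allele_freq) * effect_cm ** 2
    return additive_var / trait_sd_cm ** 2


# Reported per-allele effect; frequency and SD are assumed values
frac = variance_explained(effect_cm=0.44, allele_freq=0.5, trait_sd_cm=6.5)
print(f"{frac:.2%}")
```

Since p = 0.5 maximizes 2p(1-p), this is the most a locus with this effect size could explain under these assumptions, and it is still only a fraction of a percent.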
Just noticed that Nature's Oracle: A Life of W. D. Hamilton is finally out. I haven't read it yet, but will soon, once my copy arrives. If you don't know who W. D. Hamilton is, you know his work. Hamilton's early theoretical papers on the evolution of sociality (e.g., kin selection) were the root of many of the ideas presented by Richard Dawkins in The Selfish Gene, while his later ideas about the origins of sex figured in the background of Matt Ridley's The Red Queen. If you wish to familiarize yourself more directly with Hamilton's science and life, I highly recommend his collections of papers with commentary, Narrow Roads of Gene Land, Volume 1: Evolution of Social Behaviour, Narrow Roads of Gene Land, Volume 2: Evolution of Sex and Narrow Roads of Gene Land, Volume 3: Last Words (you will sometimes find cheap copies on Ebay or in used book stores). The author of Nature's Oracle, Ullica Segerstrale, is a sociologist of science (with a background in chemistry) who wrote Defenders of the Truth: The Sociobiology Debate. Hamilton is second only to E. O. Wilson in Segerstrale's narrative, so it is no surprise that she would choose to focus on him now. In the end I suppose the greatest tribute to a scientist is to remember their intellectual contributions and integrate them so thoroughly into contemporary thought that they become background assumptions; but as humans we are interested in personal narrative, and I look forward to exploring the more human (and eccentric) aspects of W. D. Hamilton's character. Though it does nothing to further understanding of science as such, it seems fitting to remember the man.
Related: Richard Dawkins' eulogy for Hamilton.
Update: I lied! Turns out that the pub date on the Amazon page is wrong and it won't come out until mid-March. Don't blog 'till you "checkout."
Saturday, January 12, 2008
Steve Sailer has the goods on Francis Crick via his private correspondence. To be short about it, Crick seemed to be of the same general opinion as James Watson regarding issues such as race & intelligence; in fact, his survey of the literature & acquaintance with the principals was clearly more thorough. Is anyone particularly surprised by this? Many of the great British scientists were heterodox fellows who generally were disinclined to bend before the winds; note W. D. Hamilton's heresies. I believe that many of these individuals whose opinions and conjectures would raise eyebrows in polite company do approach controversial questions with the same objectivity they bring to studying the inheritance of bristle number in Drosophila (to paraphrase James F. Crow). I don't think it is any surprise that both W. D. Hamilton and Francis Crick mooted the positive aspects of infanticide. No common men, they. I suspect that their moral sense was a bit deviated from the central tendency...which isn't always a bad thing so long as you don't allow them to manipulate the levers of executive power in an autocracy.
Labels: human biodiversity
I've recently pointed out a number of advances in dog genetics, in particular with regards to behavioral phenotypes. A just-published review lays out what the future has in store; these are exciting times. In particular, people familiar with Belyaev and his silver foxes will be particularly interested in this part:
Recently, a fox meiotic linkage map was constructed that covers the entire haploid set of 16 fox autosomes as well as the X chromosome. With this key resource now available, several experimental pedigrees have been generated to map the fox loci for both aggression and tameness. The research community is anxiously awaiting results of this 50 year study, which are expected within the next two years
Just a follow-up on the post where many of the comments examined the utility of social science. I happened to walk by my copy of Judgment under Uncertainty: Heuristics and Biases today. Anyone who thinks that social science doesn't uncover "surprising" findings should check out this research program; it isn't a coincidence that Daniel Kahneman won the Nobel Prize in Economics in 2002. If the median human IQ were 4 standard deviations above what it is now there might be less concern about understanding the shape of human stupidity and how it manifests itself, but as it is we don't live in that world. From Adaptive Thinking: Rationality in the Real World:
...95 out of 100 physicians estimated the probability of breast cancer after a positive mammogram to be about 75%. The inference from an observation...to a disease, or more generally, from data D to a hypothesis H, is often referred to as a "Bayesian inference," because it can be modeled by Bayes' rule...The important point is that Equation 1 [Bayes' rule -Razib] results in a probability of 7.8%, not 75% as estimated by the majority of physicians....
To think in a Bayesian manner might not be natural, but neither is it cognitively taxing with some training (the equation isn't rocket science, though repeated utilization is probably essential to its practical value). Time is finite and human cognitive capacity and aptitude place constraints upon what is in the realm of the possible. By examining common cognitive errors and biases we can alert ourselves to the fallacies which we are all prone to. For a typical patient a new diagnostic device is a salient example of the utility of engineering; but if medical doctors routinely make statistically naive inferences because of the nature of human psychology then the utility of more precise and powerful devices is sharply reduced. Changes in the emphasis of topics in the curriculum of medical schools guided by psychological science are rather distant from the minds and experiences of the average person, but they may have a significant effect on our experienced utility.
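For anyone who wants to see where the 7.8% comes from, here is a minimal sketch of the calculation. The quoted passage does not give the inputs; the values below (1% prevalence, 80% sensitivity, 9.6% false positive rate) are the ones commonly used when this mammogram problem is presented, and are assumed here rather than taken from the excerpt.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' rule: P(disease | positive test)."""
    true_pos = prior * sensitivity                  # diseased and test positive
    false_pos = (1 - prior) * false_positive_rate   # healthy and test positive
    return true_pos / (true_pos + false_pos)


# Assumed inputs for the textbook mammogram problem
p = posterior(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"{p:.1%}")
```

The intuition is that healthy women vastly outnumber sick ones, so even a modest false positive rate produces far more false alarms than true detections.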
Friday, January 11, 2008
Culture Influences Brain Function, Study Shows:
To find out, a team led by John Gabrieli, a professor at the McGovern Institute for Brain Research at MIT, asked 10 East Asians recently arrived in the United States and 10 Americans to make quick perceptual judgments while in a functional magnetic resonance imaging (fMRI) scanner--a technology that maps blood flow changes in the brain that correspond to mental operations.
As noted in the summary there have been previous behavioral/psychological studies which pointed to this difference, the book Geography of Thought surveyed that literature. I can't find the paper online yet, or on the site of Psychological Science, so the question I have is what was the ethnic make up of the "10 Americans"? In the previous psychological work it seems that Asian Americans tend to cluster with other Americans if they arrived as small children, but more with Asians if they arrived as adults (with teenagers somewhere in between). That argues for some level of plasticity in regards to this sort of cognition, or at least a critical period. If the fMRI confirmed this, that is, Asian Americans born in the United States exhibit signatures similar to European Americans, then that would suggest that this sort of cognitive function is subject primarily to cultural variation.
You might think "of course it is a function of culture," but do note the physical differences in shape across brains. And recent signatures of selection show that East Asians have derived alleles on genes which are involved in development of the structure of the brain. Finally, East Asian and European infants seem to have very different initial personalities. So I don't think the priors here are clean cut at all. Culture influences the shape of our genome and biology; LCT anyone?
Note: If there is a difference between East Asians and Europeans even before or controlling for acculturation on these sorts of tasks, I would suspect that it would be a small average difference when considered in light of the different phenotypes. I suspect that small differences in means can lead to different metastable cultural norms. Or, more clearly, small differences in the propensity of people of East Asian origin raised in Western culture to think in a particular manner may be totally dampened by the other exogenous inputs (primarily, the constant interaction with people of European origin with cultural norms at sharp variance with those dominant in East Asia). But as the frequency of East Asians increases one could imagine that a "peak shift" might occur, so that East Asian Americans in predominantly East Asian neighborhoods and socialized by those of the same ethnicity may develop a subculture more at variance with the European American norm and similar to what we see in East Asia. In other words, non-linearities due to gene-environment correlation & interaction.
Labels: human biodiversity
Via Dienekes, Eye Color, Hair Color, Blood Type, and the Rhesus Factor: Exploring Possible Genetic Links to Sexual Orientation:
The present study sought to expand the limited evidence that sexual orientation is influenced by genetic factors. This was accomplished by seeking statistical differences between heterosexuals and homosexuals for four traits that are known to be genetically determined: eye color, natural hair color, blood type, and the Rhesus factor. Using a sample of over 7,000 U.S. and Canadian college students supplemented with additional homosexual subjects obtained through internet contacts, we found no significant differences between heterosexuals and homosexuals regarding eye color or hair color. In the case of blood type and the Rh factor, however, interesting patterns emerged. Heterosexual males and females exhibited statistically identical frequencies of the A blood type, while gay men exhibited a relatively low incidence and lesbians had a relatively high incidence (p < .05). In the case of the Rh factor, unusually high proportions of homosexuals of both sexes were Rh- when compared to heterosexuals (p < .06). The findings suggest that a connection may exist between sexual orientation and genes both on chromosome 9 (where blood type is determined) and on chromosome 1 (where the Rh factor is regulated).
What do you think? It seems more plausible that the likelihood of homosexual orientation is partly conditional upon the other genetic factors or physiological parameters, rather than there being a common causal root. If, as some argue, homosexuality is due to a relatively recent pathogen then its relationship to particular blood groups may simply be a coincidence of varied immune response of the different ABO & Rh antigens. I would be curious as to the blood group status of the mothers of gay men and women; perhaps it is simply due to physiological conflict (this might be related to sibling order). Like IQ it seems highly likely that there's a biological component to the variation, but color me skeptical of any locus of large effect.
(FYI, I'm blood group A & Rh+)
Related: Number of biological older brothers predicts male homosexuality, The biology of homosexuality, He She didn't give you gay, did she?, Pinker on the gay gene, Gavrilets' models of homosexuality, Gay sheep, forbidden science? and The gay gene & other considerations.
Thursday, January 10, 2008
Andrew Gelman weighs in on whether or not social scientists "know things", in response to Robin Hanson's statement that indeed they do. I have no real comment, except to say that I'm moderately pleased to see that he classifies biology as a field that gets shit done.
Wednesday, January 09, 2008
An Actn3 knockout mouse provides mechanistic insights into the association between α-actinin-3 deficiency and human athletic performance:
A common nonsense polymorphism (R577X) in the ACTN3 gene results in complete deficiency of the fast skeletal muscle fiber protein α-actinin-3 in an estimated one billion humans worldwide. The XX null genotype is under-represented in elite sprint athletes, associated with reduced muscle strength and sprint performance in non-athletes, and is over-represented in endurance athletes, suggesting that α-actinin-3 deficiency increases muscle endurance at the cost of power generation. Here we report that muscle from Actn3 knockout mice displays reduced force generation, consistent with results from human association studies. Detailed analysis of knockout mouse muscle reveals reduced fast fiber diameter, increased activity of multiple enzymes in the aerobic metabolic pathway, altered contractile properties, and enhanced recovery from fatigue, suggesting a shift in the properties of fast fibers towards those characteristic of slow fibers. These findings provide the first mechanistic explanation for the reported associations between R577X and human athletic performance and muscle function.
Here's a report in the popular press....
Related: Run long, run short.....
Sunday, January 06, 2008
A simple, elegant paper published in Science maps a certain type of dwarfism to the PCNT gene:
Using genetic linkage analysis, we find that biallelic loss-of-function mutations in the centrosomal pericentrin (PCNT) gene on chromosome 21q22.3 cause microcephalic osteodysplastic primordial dwarfism type II (MOPD II) in 25 patients. Adults with this rare inherited condition have an average height of 100 centimeters and a brain size comparable to that of a 3-month-old baby, but are of near-normal intelligence. Absence of PCNT results in disorganized mitotic spindles and missegregation of chromosomes.
Perhaps the goal of genetics is to understand how genetic information becomes an organismal phenotype; I find this example fascinating. It appears that cells lacking pericentrin have abnormal chromosome segregation, which leads, in some fraction of cells, to arrest of the cell cycle and possibly cell death. Now as this happens from birth, the authors propose simply that fewer cells=smaller stature. It seems rather intuitive, but this is biology--intuition is not always the best guide to reality, and it's fun when things make sense.
The authors note that a number of genes involved in microcephaly (e.g., ASPM, MCPH1) are also involved in cell division. There's not much comment on this, but it does make sense that these sorts of phenotypes (small stature, small brain) would be affected by the rate at which cells divide or die.
Finally, having put together this nice paper, the authors take the well-deserved liberty of a little speculation:
There is an ongoing debate as to whether the Late Pleistocene hominid fossils from the island of Flores, Indonesia, represent a diminutive, small-brained new species, Homo floresiensis, or pathological modern humans. We note that individuals with MOPD II have several features in common with Homo floresiensis, including an adult height of 100 cm, grossly normal intelligence despite severely restricted brain size, absence of a sloping microcephalic morphology, and a number of minor morphological features including facial asymmetry, small chin, abnormal teeth, and subtle bony anomalies of the hand and wrist. Given these similarities, it is tempting to hypothesize that the Indonesian diminutive hominids were in fact humans with MOPD II. With the identification of the genetic basis of MOPD II, this hypothesis may soon be testable.
A review in the latest Medical Hypotheses discusses the evolutionary basis of Huntington's disease, a rare dominant genetic disorder affecting around 3-7 per 100,000 people of European origin. Individuals carrying a single mutant copy of the huntingtin gene (HD+) typically suffer serious neurological and physical problems that begin between ages 35 and 50 and kill them within 10 to 15 years.
From an evolutionary POV, the existence of Huntington's disease alleles seems straightforward: the disease typically hits people after their reproductive years, and due to the nature of the mutation (an expansion of a polyglutamine repeat) the incidence of new sporadic mutations is pretty high - around 5% of cases are due to new mutations.
However, it appears that another factor is in play. The review lists five studies indicating greater reproductive fitness in HD+ individuals, who apparently produce between 1.14 and 1.34 children for every child borne by unaffected sibling controls. Apparently the popular theory is that this increased fertility is due to heightened promiscuity in HD+ individuals, presumably due to some early-onset sub-clinical psychological manifestation of the disease.
The authors of this review pooh-pooh the promiscuity hypothesis, pointing out the lack of evidence that most HD+ individuals suffer any neurological alterations during their reproductive years, and also arguing that promiscuity doesn't necessarily increase reproductive fitness. Instead, they point to a single study indicating substantially decreased cancer risks in HD+ subjects. Their hypothesis is not that this reduced cancer risk itself increases reproductive output, but rather that this reflects increased immune surveillance and vigour that would be expected to increase overall health and attractiveness in young HD+ individuals.
There is lengthy speculation about a link between the huntingtin protein and p53, a well-known tumour-suppressor. In support of their argument for a general immune boost in HD+ carriers they cite increased rates of type 2 diabetes and Alzheimer's, which both have a substantial auto-immune component.
Like most adaptive Just So stories the immune-boosting hypothesis is almost certainly wrong, and both it and the promiscuity story have essentially nothing in the way of direct evidence. Still, the increased fertility rates in HD+ individuals - which seem fairly well-supported - scream out for an explanation. With luck, someone will eventually carry out some actual experiments to figure out what that explanation is.
Saturday, January 05, 2008
In the comments on a previous post, I made reference to a paper by Demuth et al. on the evolution of gene families in mammals. As this was published in PLoS One, I took a look at the annotations. One of the comments by Laurent Duret brings up a potentially major issue--the authors use a database of gene families for their analysis, but don't try to test how exhaustive the database is. He continues:
As a control for the reliability of their analyses I looked at the 49 gene families that were considered as having been lost in the human lineage ("extinctions" in their Table 2). I retrieved in the supplementary Table S2 all the gene families that contain at least one chimp sequence and one non-primate sequence but no human sequence. These 49 gene families are all represented by a single gene in chimp...Then I extracted the corresponding chimp protein from Ensembl release 41 using BioMart...The 49 chimp genes correspond to 77 proteins (some genes encode alternative splice variants). Then I downloaded all human proteins annotated in Ensembl release 41...Finally, I BLASTed the 77 chimp proteins against the human proteome (Ensembl release 41): each of these chimp proteins has a very strong match in human: average identity (at the protein level) = 99%; minimum = 86%. Thus, none of these 49 gene families has been lost in the human lineage.
I think it's fair to say that any of the numbers on gene losses/gains between species presented in this paper should be taken with a grain of salt. This is one of the advantages of the PLoS One system--critiques can be appended directly rather than floating around unpublished or getting published in a minor journal. (Of course, the modal paper published by PLoS One probably doesn't get read closely enough to generate real critiques.)
Addendum: a reader points out that overestimating the number of gene extinctions by a factor of 100% is, in fact, not overestimating it at all (a factor of 100% is a factor of 1). Perhaps Dr. Duret should have written something like "a false positive rate of 100%", but I imagine everyone got his point.
Evidence for declines in human population densities during the early Upper Paleolithic in western Europe:
In western Europe, the Middle to Upper Paleolithic (M/UP) transition, dated between ~35,000 and ~40,000 radiocarbon years, corresponded to a period of major human biological and cultural changes...New faunal data from the high-resolution record of Saint-Cesaire, France, indicate an episode of significant climatic deterioration during the early Upper Paleolithic (EUP), which also was associated with a reduction in mammalian species diversity. High correlations between ethnographic data and mammalian species diversity suggest that this shift decreased human population densities. Reliance on reindeer (Rangifer tarandus), a highly fluctuating resource, would also have promoted declines in human population densities. In this context, the possibility that a modern human expansion occurred in this region seems low. Instead, it is suggested that population bottlenecks, genetic drift, and gene flow prevailed over human population replacement as mechanisms of evolution in humans during the EUP.
I bolded words which I thought emphasized the provisional and tenuous nature of the contingent sequences of inferences being made in this paper. A summary in National Geographic where the first author is quoted is less equivocal:
Morin argues that Neandertal populations thinned out gradually as Europe's environment became harsher, with some groups going extinct.
I don't know much paleontology. Like history this is a field where theory can only take you so far, and you have a comparative advantage if you can cogitate off an empirical distribution of data that you've already internalized via years of close study. But because it is a historical science you also have to place your hypothesis in a bigger context, and the conditional and probabilistic nature of evolutionary processes makes a broader framework essential. Remember the Etruscan story? Archaeologists confronted with extremely strong genetic data (from multiple angles) simply shrugged and expressed ignorant skepticism, as opposed to changing their priors and shifting their paradigm.
When a palaeoanthropologist makes assertions in a field where I'm not very clear about the details I pay attention to the things they say which I can evaluate. Concluding that 'modern' humans (or, specifically, the descendants of Africans of ~50,000 years B.P.) only arrived during the Neolithic seems to shed an unfavorable light on a scholar's credibility; genetic data suggests that Middle Eastern (Anatolian) Neolithic lineages (e.g., haplogroup J2) are extant across Europe on the order of ~25% penetration (peer reviewed range of 20-50%). In terms of the structure of scientific theories it seems that the researcher above is making an inference based upon their own data and model, which is disputable, and dismissing a far stronger body of work which contradicts said inference (or perhaps the author is ignorant of that body of work?).
There's also the large network of causal factors. In After the Ice Steven Mithen recounts how computer simulations show that the dynamic of a joint impact of both environmental stress (e.g., climate change) and modern human predation is the best explanation for the pattern of megafaunal extinctions we see around the world in the past few thousand years. Specific data points such as the extinction of Cuban ground sloths are highly persuasive to me. Evolutionary pressures are often interspecific, intraspecific and environmental; there is no need to assume the operation of one excludes the operation of another. And yet in natural history there is this constant tendency to present "silver bullet" models with one primary causal factor. Just because a condition is necessary does not mean that it is sufficient. I have no doubt that the correlations are highly striking, but the sequence of events needs to be assessed in light of the full data set we have in terms of time and space. "Cold snaps" and extinctions are not sui generis, but rather are recurrent features of the history of our planet. Neandertals persisted for hundreds of thousands of years across at least half of Eurasia; during this period there were no doubt great fluctuations in temperature and ecological conditions. And yet they, and most other "archaic" types, disappeared in the space of several tens of thousands of years as "modern" morphologies spread across the world, while at the same time geneticists do conclude that the predominant (if not exclusive) element of our ancestry derives from the African continent within the last 100,000 years. Should we ditch this model because of a set of analyses of faunal remains? I'm skeptical, largely because I don't see the data above as falsifying the orthodoxy. Rather, it is compatible with a range of hypotheses.
As I tried to make clear in my post about cultural anthropology, science that isn't physical or mathematical is hard. It's messy. There are lots of interlocking factors, statistical variation and historical contingency. I don't think that means it is impossible, but it does mean that it requires care, humility and an attention to detail. The logical structure of math is perfect; in theory you need to know only patches to infer enormous expanses. An ecologist I knew once joked about the shock that a physicist who was moonlighting in a biostatistics class in graduate school experienced in terms of the noise that they had to confront in every experiment and analysis. In the historical and human sciences knowing a patch and generalizing from that locality seems to be a waste of time, and only justifies the critiques of extreme subjectivists. Rather, the structure of knowledge is by its nature a big picture with a riot of small details, and those details make no sense unless you familiarize yourself with the whole.
Note: The last sentence of the abstract, "it is suggested that population bottlenecks, genetic drift, and gene flow prevailed over human population replacement as mechanisms of evolution in humans during the EUP" is problematic for me...I kind of smell the tendency to use genetic drift as a deus ex machina here. Just like Judith Rich Harris pointing out the use of "interaction effects" as a get-out-of-jail-free card in much of behavior genetics in No Two Alike.
Related: At Anthropology.net.
Friday, January 04, 2008
So I read that Andrew Olmsted has died. I didn't read his blog and only vaguely knew of the name, though Gene Expression is on his blog roll.* Five years ago when I started blogging it was a relatively new medium; people shouted at each other with disembodied voices. But slowly blogging and real life have become intercalated so it doesn't seem as much of an escape or release, it's just another part of life for many of us. I thought about that when I heard that triticale had died. He wasn't a frequent commenter on this weblog, or my other one, but I have a hard time remembering when he wasn't a commenter. I suspect he goes back to the blogspot days of the summer of 2002. It's all very strange and we grow up at some point I guess. I'm sure that the adults in the readership know what I'm talking about....
* Admission, I read very few blogs (on the order of a half a dozen) with any regularity.
There's been some talk about the dynamics of genome architecture in the comments, so I thought I'd point to Mike Lynch's The origins of eukaryotic gene structure (Open Access). The paper is a reduction of the major themes presented in the book of a similar name. This is of course only Lynch's view; but by making a positive argument he implicitly sketches out objections and counterpoints to his contentions. You can find many more of his papers on his website; his ideas on genome architecture go way back (at least in terms of the "post-genomic era").
Thursday, January 03, 2008
Overcoming Bias has a post up about good comment boards. I think one's premises/ends matter on these sorts of things. What are comment boards about? What do you want to get out of them? We've run this weblog for quite a bit longer than Overcoming Bias has been in existence. I would say two specific issues crop up:
1) Stupid people. If you don't have a base of knowledge or the ability to think deeply then there is going to be a problem (a fair number of "elite" comment contributors came to this blog without much knowledge but their aptitude meant that they had no issues picking up the material over time).
2) People whose premises vary so sharply from your own that one can never have a fruitful conversation in regard to the primary issue at hand (that is, all debates devolve into explorations of alternative axioms and whether the axioms are valid or not).
A third general issue:
3) The modal comment contributor (as opposed to the value-added elite comment contributor) has little invested in the system. They don't police, and they might never reread their rant or check follow-up responses. It's like someone shitting in a swimming pool for fun. It happens. And their happiness function is maximized at the expense of the health of other human beings.
I think the comment which praised /. is spot on. I have considered implementing Slashcode, but I don't have the inclination to enter into that much work for a hobby like this. As it is, as longtime readers know, I along with the other posters keep a very close eye on the comments. We prune those with little reputation on the slightest pretense, and we allow those with more capital built up more freedom of expression (in substance and delivery).
Note: for those who wonder why we have the crappy Haloscan system, it is because it is off site. In previous years GNXP's comments were the primary reason why we overtaxed our servers.
Wednesday, January 02, 2008
Provisional paper in PLOS Genetics, Comparing Patterns of Natural Selection Across Species Using Selective Signatures. Haven't had a chance to read it closely, but bet you if they had specified their species in the title they would have gotten fewer first looks. Flies might not be that sexy, but by definition they are sexier than prokaryotes....
Now, I'm no expert on Drosophila sex, but this paper caught my eye:
Mating in many species induces a dramatic switch in female reproductive behaviour. In most insects, this switch is triggered by factors present in the male's seminal fluid. How these factors exert such profound effects in females is unknown. Here we identify a receptor for the Drosophila melanogaster sex peptide (SP, also known as Acp70A), the primary trigger of post-mating responses in this species. Females that lack the sex peptide receptor (SPR, also known as CG16752), either entirely or only in the nervous system, fail to respond to SP and continue to show virgin behaviours even after mating. SPR is expressed in the female's reproductive tract and central nervous system. The behavioural functions of SPR map to the subset of neurons that also express the fruitless gene, a key determinant of sex-specific reproductive behaviour.
In light of previous discussions on the relationship between genetic screens and "big biology", it's notable that this gene was identified in a genome-wide screen for genes involved in female post-reproductive behavior by RNAi knockdown--one of the newer tools in the modern geneticist's toolkit.
Tuesday, January 01, 2008
Evidence of still-ongoing convergence evolution of the lactase persistence T-13910 alleles in humans makes the case for a common group of lactase persistence alleles across Eurasia, and among some African populations, derived from one mutational event within the last 10,000 years. A novel polymorphism associated with lactose tolerance in Africa: multiple causes for lactase persistence? points to other possible evolutionary events. The use of the term Africa is somewhat deceptive, since these other alleles are found in groups like the Bedouin Arabs. One assumes that, with the long history of agriculture in the Middle East, these local alleles would have been relatively ancient as well. Why didn't they spread north as agriculture spread? And why didn't they spread east with agriculture? More of a question at this point.
Heads up, Just Science 2008 is on. You can sign up to contribute now! Here are those who have signed on so far. The dates are February 4th-8th, 2008. More notifications soon!