|
Saturday, May 31, 2008
Walker's World: French births soar:
The second development to note is that INED, France's National Institute of Demographic Studies, has done some detailed research and concluded that France's immigrant population is responsible for only 5 percent of the rise in the birthrate and that France's population would be rising anyway even without the immigrant population. A few points. First, even if there is convergence differentials still do matter. One thing I noted when surveying data on Mormon fertility is that though it has converged with non-Mormon fertility, the "floor" still usually remains higher than that of local non-Mormons. I'm not worried about a Mormon future of course because it is also a religion with a relatively high defection rate, but long term persistence of small differences do matter. Second, projecting to the year 2100 as many do today is very problematic. In the late 19th century some bureaucrats in the Ottoman government were relieved as the Christian Balkan provinces fell away through independence or assimilation into the Austro-Hungarian monarchy. The reason being the fact that Christians had higher fertility than Muslims; something most Muslims and Christians today would find a very peculiar worry. In After Tamerlane there is a reference to a racial triumphalist demographer writing in 1900 about the "fact" that in the year 2000 there will be 1.5 billion whites and only 400 million Han Chinese. Finally, variance matters. Note: Germany is something of an oddity in this. In most countries with low fertility, young women have their first child late, and stop at one. In Germany, women with children often have two or three. But many have none at all. Italy and Germany might both have low expectations in regards to the number of children a woman may have in her lifetime, but the shape of the distribution may matter a great deal if fertility is heritable to any extent (straight out of Genetical Theory here). Heritability need not be physiological; rather, it might be cultural and psychological propensities transmitted to the next generation. But if the data above hold one might expect German fertility to bounce back faster than Italian because a subset of the German population exhibit pro-natalist sentiments. (H/T Talk Islam) Labels: Demographics
Friday, May 30, 2008
This week, Probability.
Thursday, May 29, 2008
John Hawks, in a post on scientists who dispute the acceleration hypothesis (acceleration deniers?), makes reference to "the Stanford school of genetic orthodoxy". So what is this?
Essentially, he's referring to the current paradigm (I'm as much of a fan of hyperbole as anyone else, but paradigm is clearly the more appropriate word here) in the field of population genetics about the peopling of the world. The story goes like this: a small set of individuals from an ancestral population in Africa moved somewhere in the Middle East, and grew. Then from there, a small set of individuals moved nearby in each direction and settled. Ditto for those populations, and so on. These "serial bottlenecks" kept occurring until the entire world was populated, replacing the individuals that were there before them. The observation that solidified this paradigm comes from this paper, which showed an impressive negative correlation between distance from East Africa and genetic diversity, consistent with each population containing a subset of the diversity of the populations it came from. Since then, that sort of approach has been used in a number of similar applications, including this nice one on the peopling of the Americas. Further support for this paradigm comes from more recent work modeling human demography--it's simply not true that this out-of-Africa hypothesis is enforced like an orthodoxy. See, for example this paper entitled "Statistical evaluation of alternative models of human evolution" (lest you think that alternative models of human evolution aren't being evaluated), which concludes for a single origin of humans in Africa. This doesn't test the "serial bottleneck" model, but does address the multiregional hypothesis, which I think is the major point for Hawks. Or consider a more recent paper, which attempts (with moderate success) to infer the colonization history of the world. The results favor out-of-Africa, as well as serial bottlenecks (though theses bottleneck, it must be noted, were essentially built into their model). Now, new data may alter some of these models somewhat--David Reich and other claim here (in a News and Views article) that they see evidence for multiple waves of migration from Africa in PCA analysis, though it remains to be seen how those results hold up. I'm not sure what Hawks thinks of these papers--for all I know, they're making the multiregional hypothesis into a statistical straw man that is easily demolished, but the point remains that the consolidation of these observations into a paradigm is not entirely without reason. The statistical methods and genetic data are available to challenge it, and skeptics (I know many) are more than welcome to try their hand. Labels: Population genetics
Innovation in genetic variation research garners Jonathan Pritchard HHMI investigator appointment at University. HHMI = Howard Hughes Medical Institute. So when are we going to see him in Chicago Magazine? In any case:
In a 2006 paper, Pritchard and his colleagues described the identification of several hundred DNA regions in various human populations that show signals of selection. Included within those regions are genes that influence reproduction, olfaction and degradation of environmental toxins, skin pigmentation and skeletal development. Things that make you go hmmmmm....
Positive selection on EDAR, why East Asians & Native Americans have thick hair
posted by Razib @ 5/29/2008 01:36:00 AM
Positive Selection in East Asians for an EDAR Allele that Enhances NF-κB Activation:
Genome-wide scans for positive selection in humans provide a promising approach to establish links between genetic variants and adaptive phenotypes. From this approach, lists of hundreds of candidate genomic regions for positive selection have been assembled. These candidate regions are expected to contain variants that contribute to adaptive phenotypes, but few of these regions have been associated with phenotypic effects. Here we present evidence that a derived nonsynonymous substitution (370A) in EDAR, a gene involved in ectodermal development, was driven to high frequency in East Asia by positive selection prior to 10,000 years ago. With an in vitro transfection assay, we demonstrate that 370A enhances NF-κB activity. Our results suggest that 370A is a positively selected functional genetic variant that underlies an adaptive human phenotype. We've blogged about EDAR before; Could it be hair form?, EDAR controls hair thickness and EDAR and hair thickness. The story here is simple, before the populations ancestral to the Native Americans had left eastern Asia a mutation on the EDAR gene swept nearly to fixation among these populations. The derived SNP in particular is correlated with the thicker hair typical of East Asians and Native Americans. In other populations (Europeans, Africans, West and South Asians as well as Papuans and Melanesians) the SNP is in an ancestral state. The main twist in this study is that they used a molecular genetic technique to show that this derived state seems to upregulate the activity of NF-κB transcription factor. For the record, I'm really skeptical that this selective sweep occurred because the human populations of late Ice Age eastern Asia developed a really strong attraction to thick luxuriant hair with full body. The paper is Open Access, read the whole thing. Since the most interesting figure is either too small or too large, I've resized it appropriately and placed it below the fold. ![]() Labels: Population genetics
Tuesday, May 27, 2008
Group differences - within and between - pick a standard please!
posted by TangoMan @ 5/27/2008 01:59:00 PM
The debate over at The American Scene on Jim Manzi's article "Undetermined" is now closed to comments so I couldn't respond to one of the comments but the beauty of being a blogger is that you can use your own forum to vent your response.
The comment that I desperately wanted to respond to was left by Joe Shipman and reads as follows: One thing that is established beyond any possibility of scientific doubt, of course, is that the genetic variability in IQ within races is much larger than the variability between races; any ethnic group of nontrivial size will have plenty of smart people and plenty of dumb people, and basing, say, educational policy on group rather than individual characteristics is therefore not only unAmerican but scientifically misguided. Joe, will you join with me in advocating the complete dismantling of efforts to ameliorate the racial and gender wage gaps that exist, in that they too demonstrate that wage variability is larger within groups than between groups? I hit on this topic a few years ago: It is important to recognize that most wage inequality occurs within and not between groups. The unweighted average Gini coefficient across all race, gender, and education groups was 0.256 in 1995, over 80 percent of the total Gini. Put another way, if all groups had identical mean wage rates (for example, black male dropouts had the same average wages as white male college graduates) but wages differed within groups as they do today, nearly all the inequality in wage rates would remain. You know, if it's unAmerican and unscientific to craft social policy on observed group differences then surely the fact that the variability in Black or Hispanic incomes is greater within their groups than it is between their group and, say, Caucasians or Asians, is an unscientific and unAmerican basis upon which to craft social policy to address the between group differences in income. What's good for the goose is good for the gander, right Joe? Labels: human biodiversity
Monday, May 26, 2008
Via Reihan, Instapaper. Need to not get behind on my Tech Crunch feed....
(any other "life hacks" you know of?)
Sunday, May 25, 2008
When Histories Collide: The Development and Impact of Individualistic Capitalism
posted by Razib @ 5/25/2008 06:10:00 PM
A few weeks ago Steve mentioned Raymond Crotty's When Histories Collide: The Development and Impact of Individualistic Capitalism. When I clicked through the link the cover looked familiar; turns out that I'd seen it at my local used book store and had passed on it since I already had a backlog of economic history I was working through. But Steve's post piqued my interest, and Greg also had mentioned Crotty's lactose tolerant-centric theory of history. So I purchased it, and read it over the past week. Even including the foreword and preface this is a short work, around 300 pages, but When Histories Collide is relatively data dense and at times almost inscrutable to anyone not stepped in Irish agricultural economics & cliometrics. The text has a disjointed feel, and Crotty's son who had to edit the work after his father's death notes that he placed the Irish chapters near the end of the narrative despite the fact that they would likely have been interspersed through the text seamlessly if his father had his way. Though the author's death before the final revisions to When Histories Collide might have dampened its public reception somewhat, I have to observe that Crotty's swan song is laced with far more quantitative econometric detail than Greg Clark's Farewell to Alms. Combined with the constant algebra of factors of production and implicit references to comparative statics, I believe that When Histories Collide simply lacks the literary elegance to have had any mass market appeal. That being said, who cares about mass market appeal? I don't. When Histories Collide is larded with just the type of data which keeps you turning the page.Of course, data isn't the only item on the menu here, as an economist Crotty brings some noticeable theoretical baggage. His central thesis is that the rise of individualistic capitalism in West Central Europe1 is sui generis, as distinct from hunter-gatherer, pastoralist and Asiatic modes of production (he uses the term riverine agriculture, but it's pretty clear that most people would recognize this as Asiatic mode of production). Additionally, there are a few other types, such as the slave based individualistic capitalism of the ancient Mediterranean, and the elite capitalism of post-colonial states (e.g., Latin America). Crotty's macrohistorical model as it applies to development economics is rather straightforward: individualistic capitalism emerged in a particular place and time, and the Great Divergence is a byproduct of those conditions. These means the export of this system of economic development and productivity is going to be problematic; the societies of East Asia are a particular exception because they were not colonized and so their indigenous cultural systems were not extinguished. Rather, these societies integrated Western ideas and tools in an eclectic manner in keeping with their cultural biases and strengths. Crotty labels the East Asian Tigers as "collectivist capitalism." From what little I know of East Asian economic production I don't think this is an unfair characterization, though globalization is making these typologies less relevant when transnational companies span civilizational boundaries. Despite the editing which places the Irish material toward the end of the book it is quite clearly foreshadowed throughout the book. It is Crotty's deep case study which illustrates just how sui generis individualistic capitalism is, and how difficult, nigh, impossible, it is to export it to colonized societies which are habituated toward a different mode of production (Ireland leans towards pastoralism). Ireland, being the British nation's first large scale colony, and the longest experiment in such a relationship (lasting from the Tudor period down to 1921), is therefore ideally placed to illustrate the general dynamics. Additionally, a few particularities of Ireland such as its proximity to the colonizing country and its later assimilation into the European Economic Community bring into sharper focus the causal factors behind its deviations from the standard post-colonial narrative. There is unfortunately an awkward problem; the Celtic Tiger. Crotty died in 1994, and he was clearly writing until the end as his statistics are up to date as of 1992. But it is also obvious that a great deal of the material draws upon the author's nearly 40 years of scholarship in the field of agricultural economics, so the echo of the Ireland of 1960 looms large. From what I can tell it seems that Crotty assumed that the economic robusticity of the Ireland of the second half of the 20th century was a credit driven mirage which would ultimately founder on the lack of institutional support; and it seemed that he believed he saw it already occurring by the early 1990s. I am well aware that there is a great deal of debate about Ireland's economic growth and how to interpret the various indices. As with much social science there is plenty of revisionism which attempts to dig out more nuance from the "first look" impressions generated by something like per capita income. I'll grant that on the margins there is something to debate, but I think it's pretty clear at least over the past 15 years Crotty was just not right about Ireland and its inability to join the other nations of Europe on their level and terms. The foreword for When Histories Collide was penned by a colleague of the author in 2001; and I felt his assessment of the Irish economy since Crotty's death was telling, as he made only the most perfunctory attempts to defend his friend's doom & gloom prognostication. It seemed implicitly to be suggesting that prediction is less critical in a work of such grand scope as When Histories Collide, at least over normal human time frames, and the insights which one might gather are still worth an examination of the full structure of the argument.2 I agree with this. Because of my general skepticism of the predictions made in When Histories Collide I will avoid detailing the Georgist prescription which Crotty offers as his ultimate plan for how to ameliorate poverty. But, I want to highlight one more systematic flaw: a general tendency toward historical sloppiness. This seems to be a major feature of economic history in general; the preoccupation with macroeconomic forces tends to sweep aside details of history to the point where falsity and misrepresentation regularly creep in. John Nye's War, Wine, and Taxes: The Political Economy of Anglo-French Trade, 1689-1900 is a nice exception to this general rule, but I suspect the rather narrow purview of this work explains the tight fidelity to reality as opposed to the standard stylized historical sketches borrowed from high school textbooks. I will offer two examples which illustrates the weakness in the area of historical details which plague When Histories Collide. First, Crotty offers a model for the emergence of ancient Mediterranean civilization predicated on the synthesis of Indo-European pastoralists with indigenous Phoenician agriculturalists. What's the problem here? Phoenicians were a specific group of Semitic speaking peoples who flourished in what is today's Lebanon. As a colonizing people they do not pre-date ~ 1000 BCE. We know that Greek was spoken in Greece by ~1500 BCE; and likely Indo-European languages were extant on the northern shore of the Mediterranean well before 1500 BCE. Additionally, note that I stated on the northern shore of the Mediterranean, Phoenician colonies as it happened were almost all planted in non-Indo-European areas, and mostly on the south shore. By Phoenician Crotty actually means the diverse pre-Indo-European speaking substrate of the northern Mediterranean which the Greeks, Latins and other assorted peoples replaced and assimilated (some of these pre-Indo-European speakers remained down to the Roman period, e.g., Iberian). Of course, I will admit that to some extent the error doesn't truly undermine the structure of the author's thesis: that there was a synthesis between these two broad cultural types (whatever you may term them) which resulted in the civilization of the Mediterranean. The second argument is specifically about the economic motor of the ancient Mediterranean, and I believe it to be a more telling error. In short, Crotty argues that the creativity of ancient Mediterranean capitalism was contingent upon the ubiquity of slavery. Slavery was a normal institution before the modern period; all societies had some forms of slaves, but different societies practiced slavery to different quantitative extents. The cultures of the ancient Mediterranean, specifically Greece and Rome, as well as the more recent race-based slave societies of North America and the Caribbean, are exceptional in the centrality of slavery in terms of economic production (in many societies slaves are luxuries or a trivial demographic). Though slaves were only a minority in the Roman world (around 25% of the population) Crotty argues that their economic productivity was the engine which drove the efflorescence of ancient civilization. Slaves could not consume the fruit of their own labors, so their productivity so sequestered freed up ancient elites as surplus for leisure and warfare. The structure of the argument here is plausible, but the problem is when Crotty makes the argument about where the slaves came from: the steppe. Here Crotty his going back to his history where Indo-Europeans and Phoenicians came together to produce Mediterranean civilization; the steppe is a region where the need for labor is minimal because of the pastoral lifestyles, it is land that is limiting, and so the excess population is driven into the cauldron of civilization. Here, they are enslaved and their productivity drives Mediterranean society. The problem is that it seems to me totally implausible that the steppe had a large enough population that it could ever have supplied enough human beings to replenish the constantly dying 1/4 of the Roman Empire's population which consisted of slaves. As a point of fact it seems that Northern Europeans, especially those from beyond the limes, were the primary exogenous source of slaves. But, I also have read that the Romans bred slaves on farms in Sicily, so there was also endogenous production. Crotty's argument is that when the Roman Empire reached its natural limit and was no longer sucking in slaves through wars it naturally collapsed because of the diminishing of its economic engine. I don't believe this, it oversimplifies some real complexities of the period between Augustus (the early Empire) and Late Antiquity, when the classical state collapsed. The consensus scholarship seems to be that the Roman Empire recovered from a near collapse in the 3rd century in the 4th, and though society was reconstituted so that we could see the vaguest of outlines of the medieval system already in the post-Diocletian period, it may be only that repeated exogenous shocks in the 5th century succeeded where those of the late 2nd and 3rd failed. Instead of a historical deterministic process, what we have is historical contingency which would have lead to a fall probabilistically at some point. I've harped on the negative points of the book to this point because I don't want people to purchase this assuming that they'll get a literary tour de force on the scale of Guns, Germs and Steel; rather, When Histories Collide exhibits all the strengths and weaknesses of Farewell to Alms magnified. But what are those strengths? Here's a sample of what I think is worth reading through all the issues above for: ...Very roughly, the same pastoral resources will, in a year produce 400 gallons of milk (the yield from a mediocre cow) or 250 lbs. liveweight gain from a bullock, or ox. Consider that. The milking of cows can increase the agricultural productivity of a unit of land by an order of magnitude! (assuming that the land is not arable) No wonder lactose tolerance swept across many populations so fast! Crotty identifies three general cow-cultures: 1) The pastoralist model 2) The South Asian model, the "apotheosis of the cow" 3) The European model; in particular, the model associated with the rise of individualistic capitalism The pastoralist model is pretty straightforward; man have cow, man milk cow, man defend cow from enemy. This way of life puts a premium on land but requires little labor or capital inputs; you put the cattle out to pasture and make sure that they do their thing and protect them from predators and rustlers. Crotty presumes that this was the culture which arose among the Proto-Indo-Europeans on the steppe, and it is what exists among the Nilotic peoples of Africa, and also was the norm among the pre-modern Irish. The apotheosis of the cow doesn't need much description, everyone knows that South Asians (Hindus and affinal groups specifically) do not consume beef and hold the cow to be sacred. The standard economic explanation here is that cows are more efficient bundles of calories integrated over time through milk extraction than as a one time item for slaughter. But there's a twist to this in India which I only know because of Marvin Harris' Cows, Pigs, Wars, and Witches: The Riddles of Culture: the cows which wander the cities and countryside of India are generally cows, that is, female. Where are all the bulls? They're being used as draught animals! South Asian agriculture is obviously extremely intensive in labor, and cattle serve the role that water buffalo do in the moister regions of Asia (including the margins of South Asia). Crotty doesn't mention this, but I think that's another part of the puzzle. And it fits in which another datum which plays a role in the thesis of how and why individualistic capitalism arose: Indian cows need to have their young reared to give milk. Obviously you couldn't kill the calf which was the reason for the milk production. The final cattle culture is that of Western Central Europe; the region of northern France, the Low Countries and Western Germany characterized by 3-crop rotation and draft animals with mouldboard plough by the medieval period. I can't do justice to the detail of Crotty's argument here, and to some extent I don't think it all fits together, but there are many intriguing pieces. Lactose tolerance comes into the picture because when the Proto-Indo-Europeans brought the cow and their ability to digest milk into Central Europe they opened up the possibility for a new lifestyle. Because of the low agricultural productivity in this region in regards to cereals, using these crops as fodder for cattle was attractive. Middle Eastern cereals were simply not well adapted during the early phases to the Northern European climatic regime. Additionally, low quality or unpalatable crops like oats were sometimes the only option, and these were more productively fed to cattle to convert into more palatable nutritional items (whether as milk or meat). But there's a problem here: European winters mean that there's no pasturage to keep the cattle alive through the winter. So fodder is of the essence. Because this is relatively limited Europeans would have to kill most calves so as to maximize the feed for the adult cows. After you kill an animal, of course you eat it since to do otherwise would waste valuable protein and fat, so there is no apotheosis of the cow in a society where the cow may nevertheless be a central fixture. The production and generation of fodder for cattle by smallholders is a critical part of the story of the generation of individualistic capitalism. In short, it habituates the average person toward low time preference as an interlocking set of agricultural operations are set into motion to maintain subsistence in an ecologically marginal environment. For Crotty the "bottom up" nature of this societal shift is critical as the emphasis on capital over land or labor as the factor of production which would increase marginal product is what will lead to the takeoff of the Great Divergence thousands of years into the future. Unlike classical Mediterranean civilization Western European individualistic capitalism can weather famine, pestilience, and other exogenous shocks of God. Capital persists while slaves die. The dispersal of technological initiative through society gives it a redundant robusticity lacking in other top-down civilizations. In our modern world the power of capital in the form of technological innovation in perpetuating a world of plentitude always outrunning the Malthusian Trap is obvious, but only Central West Europe managed to hit upon that formula in the pre-modern world because of the confluence of particular ecological and cultural parameters at a particular moment in time. Crotty argues that the capital intensive post-Malthusian Developed World was not inevitable, but a contingent fact of history. Ecological constraints play a large role in explaining why Ireland is different in this model; the mildness of Ireland's maritime regime means that winter fodder is unnecessary. Transhumance and semi-pastoralism is feasible in this scenario; and, critically it is important to note that pre-modern Irish cattle were more like their South Asian cousins than continent European lineages. They only gave milk when with calf! The selection process whereby only the best milk producing lineages were kept and most calves killed in the fall did not apply to Ireland. From this ecological difference flows the great differences between the folkways of the Irish and those of peoples to the East. Speaking of which, When Histories Collide takes occasional forays into Eastern Europe, where it is explained that autocratic capitalism developed along the Slavic frontier. Here obviously land was not limited, and labor was at a premium, but capital intensive methods from the West could be introduced periodically to push the frontiner outward. By the time of the gunpowder empires the steppe ascendency in terms of arms was finally banished and a synergistic alliance of peasants (labor) and boyars (capital) swept across the land. But the fundamental distribution of techniques remained distinct from those of Western Europe, where technological innovation bubbled up from below rather than horizontally via elites. In the north, in Scandinavia, the local human capital was well equipped to leverage the horizontally transmitted suite of the Western European economic system, replicating individualistic capitalism once the technological wavefront had pushed far enough to overcome ecological hurdles. Obviously this is only a small slice of the arguments presented in Raymond Crotty's magnum opus, but it's a representative taste. Clearly I think there are some serious issues with the depth of the scholarship on the margins, but the details of history which I think are rather embarrassing in the ignorance that they bespeak is not entirely out of place in the corpus of economic history. That being said, as I noted above some of the arguments about the slave-based individualistic capitalism of the ancient Mediterrean are premised on unrealistic assumptions which derive directly from the lack of a dense network of historical priors. Crotty's analytic tools were ones of mathematical economics, and his empirical database was one of agricultural economics, in particular Irish agricultural econometrics and history. The limits to his disciplinary horizons often shows. Nevertheless, I don't believe that Crotty falsified the tables or the quantitative data he repeats, and those alone are worth perusing this book. Who knew that for most of history the per unit productivity of agriculture in China was about twice that of South Asia? Crotty did, and I didn't. I don't think that the grand theoretical arguments should be taken without some major salting and curing; as I said recent history seems to have proved him wrong in Ireland, and to a lesser extent the post-colonial world as a whole. His model of the past has great descriptive flaws and would have been better served with a more robust cliometric framework. But by & large When Histories Collide is a good complement to more polished recent works such as Farewell to Alms and The Great Divergence. Related: 10 Questions for Greg Clark, A World of Difference: Richard Lynn Maps World Intelligence, Group lifespan differences? Maybe it's agriculture and The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Do note that Amazon is telling me that those who purchased When Histories Collide also bought The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Not surprising, but shows the general slant of Crotty's macrohistory. 1 - By West Central Europe one can imagine the lands which are just to the West and East of the Rhine; northern France, the Low Countries and western Germany. 2 - Crotty also asserts that post-colonial poverty will increase in the future do the poor fit between individualistic capitalism and most societies. I think on the balance Crotty has again been proven wrong; even removing China from the equation it seems that on the whole the world has not seen economic retrogression with the possible exception of large swaths of Africa. Despite the Asian flu of '98 and rollback from the Washington Consensus, both Southeast Asia and Latin America seem to be better off than they were a generation ago. Labels: Economics
Friday, May 23, 2008
Thursday, May 22, 2008
While I'm recommending weblogs you might not have heard about, check out Mailund on the Internet. The tag line is "computer science, bioinformatics, genetics, and everything in between." For a taste, here are his posts on association metholodiges.
Labels: Genetics
Tuesday, May 20, 2008
My friend Jake Young has a post up, Contrasting Views on the Gender Disparity in Science:
Second, one of my primary arguments against innate differences in ability between men and women is that you are dealing with traits that have distributions and those distributions largely overlap. Making a statement about any individual man or woman is largely useless. The odds of a women or man selected at random being better or worse at math are not particularly different. This argument applies just as well to differences in preference. Maybe there are differences on average, but they are still distributions that overlap. The key question becomes: to what degree do those distributions overlap? How different on men's and women's preferences on average? James Crow's Unequal by nature: a geneticist's perspective on human differences is apropos here: There is actually a simple explanation that is well known to geneticists and statisticians, but not widely understood by the general public or, for that matter, by political leaders. Consider a quantitative trait that is distributed according to the normal, bell-shaped curve. IQ can serve as an example. About one person in 750 has an iq of 148 or higher. In a population with an average of about 108 rather than 100, hardly a noticeable difference, about 5 times as many will be in this high range. In a population averaging 8 points lower, there will be about 6 times fewer. A small difference of 8 points in the mean translates to severalfold differences in the extremes. Labels: human biodiversity
Several of my previous notes have touched on the subject of Sewall Wright's F-statistics. The best known of these is FST, which is very widely used as a measure of the genetic divergence between sub-populations of a species. My aim in this note is to trace the evolution of the F-statistics in Wright's work.
Why F? A preliminary question is one of terminology. What, if anything, does the letter 'F' stand for? One plausible answer is that it stands for 'fixation', since among other things the F-statistics can be used to measure the rate at which alleles tend to be 'fixed'. Wright himself in his later writings sometimes refers to F as an 'index of fixation'. Plausible though this may be, it does not seem to be the origin of Wright's use of the letter F. This first appeared in his series of papers on 'Systems of Mating' in 1921, where he uses the letter F (in its lower-case form 'f') as a symbol for the 'correlation between uniting gametes' and as a measure of inbreeding. Although the word 'fixation' does occur in these papers, Wright does not say that 'f' stands for 'fixation'. The banal truth seems to be that by the time Wright needed a symbol to represent the correlation between uniting gametes, the letters a to e had already been allocated to other purposes, so that f was the first available letter in the alphabet. F as correlation between uniting gametes Wright's primary use of F (or f) is to designate the correlation between uniting gametes. The general idea of a correlation between gametes is now somewhat unfamiliar. If there are varying types of gametes in the population, uniting gametes may be said to be positively correlated if the same types tend to be paired together at mating, or negatively correlated if dissimilar types are paired. If the different alleles at a locus in the population are given notional numerical values, such as 0 and 1, a correlation coefficient for the correlation between pairs of uniting gametes can be calculated in the usual way. (For a fuller explanation see my post on Wright's measurement of kinship.) The resulting correlation coefficient is F. Heterozygosis and the correlation between gametes Also in 1921 Wright points out that the correlation between uniting gametes is connected with the proportion of heterozygotes in the population. Whether an individual is heterozygous at a locus is determined by the gametes (egg and sperm) of its parents which unite to form a zygote at fertilization. If they are identical at that locus, the offspring is homozygous, otherwise it is heterozygous. The proportion of heterozygotes (the level of heterozygosis) among the offspring, over and above the level expected with random mating, can be calculated from the correlation between uniting gametes, and vice versa. In SM1 Wright calculates that the percentage of heterozygosis is (1/2)(1 - f), where f is the correlation between uniting gametes. (This is stated without full proof, but I have checked it, calculating the correlation by the method of notional values.) This formula is only valid for the special case where there are two alleles with equal proportions of 1/2 in the population, but Wright soon (in 1922) generalised it to the case of two alleles with proportions of p and q = (1 - p), in which case the formula is 2pq(1 - f). He also began to use upper-case F, rather than f, as his preferred notation. F as a measure of inbreeding in a population A positive correlation between uniting gametes can arise in two ways (apart from mere sampling error): by assortative mating between similar phenotypes, or by mating between genetic relatives, in other words by inbreeding. Wright deals with both inbreeding and assortative mating, but gives more attention to inbreeding. If assortative mating is excluded, then F can be used as a measure of the average degree of inbreeding in a population. If the correlation between gametes is due solely to inbreeding, then the formula 2pq(1 - F) for the percentage of heterozygosis in a population can be given a simple interpretation in terms of Malecot's concept of Identity by Descent. The two genes at a locus in an individual are either Identical by Descent (IBD) from a common ancestor, or they are, by assumption, drawn randomly from the gene pool. In the first case they are certainly identical. In the second case, applying the familiar Hardy-Weinberg formula, they have a probability of (1 - 2pq) of being identical. Therefore if we interpret F as the probability that the two genes are IBD, on average for the population, the total probability that they are identical is F + (1 - F)(1 - 2pq) = 1 - 2pq(1 - F). Subtracting this from 1 to get the probability of heterozygosity we get the required formula 2pq(1 - F). F and the inbreeding of individuals The degree of inbreeding in a class of individuals (e.g. all offspring of matings between siblings) can be derived from an analysis of the way in which they are bred. The coefficient of inbreeding then measures the correlation between any pair of alleles at the same locus in an individual belonging to that class. The level of inbreeding in an offspring can be derived from the correlation between the uniting gametes of its parents, which in turn can be derived from the correlation between the parents themselves, in accordance with Wright's method of path analysis. The full method would involve considerations of dominance, heritability, and so on, but the coefficient of inbreeding is usually derived using a simplified method devised by Wright himself and expounded in several papers of the early 1920s (see especially paper 2 in ESP). In the simplest case, for the offspring of half-siblings who are not themselves inbred, Wright's formula gives a coefficient of inbreeding of 1/8. This is the same as the figure derived by the methods of Malecot for the probability in this case that the two genes at a locus in the offspring are identical by descent. In Malecot's approach this result is derived from explicit assumptions about probabilities. It is assumed that each gene in an offspring has a probability of 1/2 of coming from either parent, and - very importantly - that there is an independent probability of 1/2 that the same gene is inherited by any other offspring of the same parent. This is an assumption which is usually empirically correct (with certain exceptions such as sex chromosomes), but it is not logically necessary. For example, if surviving offspring came in pairs, each member of which received genes from complementary chromosomes in the parent, such pairs of offspring would have a lower correlation with each other than the usual calculations would suggest. It is therefore worth asking what features of Wright's approach take the place of the explicit probability assumptions in Malecot's system. The first key assumption, that each gene in an offspring has a probability of 1/2 of coming from either parent, is explicitly stated as a biological assumption (with the exception of sex-linked genes) in Wright's derivation of the path coefficient between offspring and parent. The other key assumption, that there is an independent probability of 1/2 that the same gene is inherited by any other offspring, does not seem to be explicitly stated. In SM1 Wright only directly calculates the correlation between parent and offspring. All other correlations, such as those between siblings, are derived indirectly from the parent-offspring correlation by the method of path analysis. The assumption of independent probabilities for each offspring seems to be built into the general assumptions of path analysis. In a late discussion of the principles of path analysis Wright emphasised that 'The validity of the system requires that any variable that enters into the system as a common factor back of two or more dependent variables, or as an intermediary in a chain, vary as a whole. If one part of a composite variable.... is more significant in one relation than in another, the treatment of the variable as if it were a unit may lead to grossly erroneous results' (EGP vol. 1 p.300). Fortunately, the assumption appears to be consistent with the usual pattern of genetic inheritance. Apart from special cases such as sex-linked genes, or MZ twins, it seems that each surviving offspring has an equal and independent probability of receiving any given allele from the same parent. This is despite the fact that during the formation of gametes the precursor-cells of the gametes are formed in pairs with complementary alleles from different chromosomes in the parent. In the case of eggs, only one of the proto-eggs formed from the same parental cell usually survives. In the case of sperms, so many sperms are produced in total that the chance of two sperms derived from the same parental cell both ending up in surviving offspring is negligible. F as a measure of inbreeding relative to a foundation stock One of Wright's original motives in devising his F statistics was to measure the effect of continued inbreeding over a number of generations. In agricultural (and laboratory) practice it is common for animals to be bred systematically over long periods using close relatives, e.g. mating sisters with brothers, or daughters with their fathers. With such practices the level of inbreeding among the offspring rises over the generations, and the level of heterozygosis declines. Wright's F-statistics provide a convenient method of measuring this process, superior to the previous ad hoc methods. The result of a number of generations of inbreeding within an inbred line can be summarised in the average F within that line, relative to the foundation stock (the population from which the inbred line is derived). The cumulative decline of heterozygosis since the inception of the line can then be calculated using the formula 2pq(1 - F). But this should raise questions about the precise meaning of F in such a case. F is in principle always a correlation coefficient, and could if necessary be expressed in terms of the Pearson product-moment formula. This requires the mean and standard deviation of the relevant statistical population to be specified. But what is the mean in the present case? The correlation is said to be 'relative to the foundation stock', so this appears to be the relevant statistical population, but the foundation stock no longer exists, and the correlated pairs are not part of it. So what is going on? Is F a legitimate correlation coefficient at all when more than one generation is involved? This puzzled me until I paid proper attention to page 169 of SM5. This gives the key to the mystery. Rather than just considering the correlation within a single inbred line, we must consider an indefinitely large (actual or hypothetical) ensemble of lines, all separately inbred according to the same system (e.g. sibling mating) for the same number of generations, and all derived from the same 'foundation stock'. The mean gene frequencies for the entire ensemble (or a large random sample thereof) should then be the same as in the foundation stock (in the absence of selection and mutation), but will vary within each particular inbred line according to the chance variations resulting from the reproductive process. F will therefore measure the average correlation within each such line as compared with the values of the foundation stock. Such a correlation coefficient will usually be hypothetical, since no such ensemble actually exists, but in principle it has a clear meaning consistent with the general method of correlation. The story so far The uses of F (or f) identified so far were all first described in Wright's ground-breaking 'Systems of Mating' in 1921. The different uses therefore cannot be put in a chronological sequence. Logically, however, the sequence is as follows: a) F as the correlation between uniting gametes. This is always the fundamental conception. b) F as a measure of average inbreeding in a population. In this sense it is closely connected to the level of heterozygosis. c) F as a measure of inbreeding in an individual. In this sense it is closely connected to the measurement of relatedness. d) F as a measure of continued inbreeding in a line relative to a foundation stock - see the last paragraph. F in natural populations As developed by Wright in 1921, the concept of F was heavily influenced by the circumstances of agricultural stock breeding, where mating is carried out in accordance with some deliberate plan. (Wright was employed in agricultural research for the US Department of Agriculture at the time - see Provine, chapter 4). The next major step was Wright's application of F to the measurement of genetic drift in natural random-mating populations. It is clear from Provine's biography that Wright first took this step around 1925, but the results were not fully published until the major paper on 'Evolution in Mendelian Populations' in 1931. I have discussed genetic drift in a previous post, and will not repeat that discussion here. The essential point is that in any finite population, over the course of time, there will be a tendency, purely by chance, for some lines of ancestry to be relatively successful, while others dwindle and eventually die out. The result is that, in the absence of selection or mutation, fewer alleles will account for a larger proportion of genes in the population, and the level of heterozygosis will decline. As a result of genetic drift, F tends to increase at a rate of approximately 1/2N per generation, where N is the size (strictly, the 'effective' size) of the random mating population. But F is still in principle the correlation between uniting gametes. Since the correlation between uniting gametes within a random mating population is zero, how can there be an increasing value of F? The answer is again that F is a correlation relative to the baseline of a 'foundation stock'. Wright does not, so far as I know, explain what exactly this means in the case of a natural random mating population, but I think we can understand it by analogy with the case of inbred agricultural breeding lines. We are to imagine that from a specified generation onwards a population is allowed to evolve by random genetic drift in a large number of hypothetical different ways. Within each of the resulting hypothetical descendent populations there will be a correlation between uniting gametes relative to the entire ensemble of hypothetical outcomes. The average of these correlations is constantly increasing. It is conceivable that in some cases the actual observed value of F - the correlation between uniting gametes within an actual population relative to that in the foundation stock - would be negative, but the expected average F is always positive. F in subdivided populations If a number of subgroups of a population breed within themselves in full or partial isolation from each other, the gene frequencies within them will tend to diverge from each other as a result of selection or genetic drift. Within each such subgroup, individuals will tend to be more similar to each other than to individuals randomly selected from other subgroups or from the entire population. Within the groups, individuals will therefore be positively correlated with each other relative to the entire population. Wright developed a system of F-statistics to analyse the structure of subdivided populations. This is one of his major contributions to population genetics after the fundamental paper EMP of 1931. The best-known of the F-statistics is FST, where S and T should ideally be subscripts, and stand for 'subpopulation' and 'total population'. The expression FST is possibly first used in a paper of 1950 (ESP p.585), but the underlying concept was first developed in a paper of 1943 on 'Isolation by Distance'. (I will cite this from the reprint in ESP, but it may be available online here. I downloaded it successfully once, but on another occasion got an error message.) Wright considers a population subdivided into a number of subpopulations of equal size, within which mating is random, and with two alleles at a locus. He shows, by a relatively simple but ingenious proof (ESP p.403), that in this case the correlation between uniting gametes within each subpopulation, relative to the total, is equivalent to Vp/pq, where Vp is the variance of the gene frequencies of the subpopulations (i.e. the mean square of their deviations from the frequency in the total population), and p and q are the frequencies in the total population. In 1943 this correlation is simply called F, but it is in fact the measure later known as FST. Wright recommends that the square root of F could usefully be taken as a measure of the genetic divergence between populations. (Of course, the rank order will be the same whether we take F itself or its square root as the measure.) It may also be noted that Vp/pq cannot be negative, as both the numerator and denominator are necessarily positive or at least zero. In general, a correlation coefficient may be either positive or negative, but in this case F measures the correlation due to the average differences between the gene frequencies of subpopulations, regardless of sign, and these cannot be less than zero. In the same 1943 paper, and in subsequent papers of the 1940s, Wright developed methods for dealing with correlations within hierarchically subdivided populations, where mating within each division may or may not be random. His terminology varied somewhat, but by 1950 he seems to have settled on the following (with IT, IS, and ST as subscripts): FIT: inbreeding coefficient of individuals relative to the total population FIS: inbreeding coefficient of individuals relative to the subpopulation FST: correlation between random gametes drawn from the subpopulation relative to the total population. (If mating is in fact not random within the subpopulation, this is a hypothetical correlation.) Wright shows that these measures are related by the equation FST = (FIT - FIS)/(1 - FIS). (For a relatively simple proof see EGP vol. 2 p.294-5, but note that the left square bracket in Equation 12.14 on that page is in the wrong place: it should be immediately before the first occurrence of qT.) It may be seen that if FIS is zero, in other words if mating within subpopulations is random, then FST = FIT. This is as it should be, since in this case the only source of correlation between individuals is the division of the population into subpopulations. FST then accounts for the entirety of the correlation within the total population, which is FIT. Wright's F-statistics are still widely used or alluded to, but are seldom understood in their original sense as correlation coefficients. Inbreeding within individuals is now usually explained by means of Malecot's Identity by Descent, while FST is usually explained in a way more appropriate to Masatoshi Nei's GST. Wright's work was however clearly the inspiration and foundation for the work of these later geneticists. A few cautions about the use of FST may be useful. a) Wright originally intended FST to be calculated as an average over a large number of subpopulations. In theory, it would be possible to calculate it for as few as two subpopulations, in which case, if they are of equal size, FST is d^2/pq, where d is the deviation of the subpopulation frequencies from the frequency in the total population. So far as I know, Wright himself never used it in this way. b) FST is calculated from gene frequencies on a locus-by-locus basis. It may well vary from one locus to another. To get an indication of the extent of evolutionary divergence between subpopulations, it is desirable to take the average FST over a large number of loci. c) FST is not simply proportional to the length of time or number of generations that two subpopulations have been diverging. Other factors such as the amount of migration between them and the size of the populations are also relevant. Small populations diverge by genetic drift far more quickly than large ones. d) Wright intended FST mainly to be used for genes that are not subject to significant natural selection. Genes that are under selection may diverge either more or less in different subpopulations than an average FST would suggest. References: William B. Provine: Sewall Wright and Evolutionary Biology, 1986. Sewall Wright: Evolution: Selected Papers, edited and with Introductory Materials by William B. Provine, 1986. (ESP) Sewall Wright: Evolution and the genetics of populations, 4 vols., 1968-1978. (EGP)
Think Gene points me to a new PNAS paper, Structure of TRPV1 channel revealed by electron cryomicroscopy:
The transient receptor potential (TRP) family of ion channels participate in many signaling pathways. TRPV1 functions as a molecular integrator of noxious stimuli, including heat, low pH, and chemical ligands. Here, we report the 3D structure of full-length rat TRPV1 channel expressed in the yeast Saccharomyces cerevisiae and purified by immunoaffinity chromatography. We demonstrate that the recombinant purified TRPV1 channel retains its structural and functional integrity and is suitable for structural analysis. The 19-A structure of TRPV1 determined by using single-particle electron cryomicroscopy exhibits fourfold symmetry and comprises two distinct regions: a large open basket-like domain, likely corresponding to the cytoplasmic N- and C-terminal portions, and a more compact domain, corresponding to the transmembrane portion. The assignment of transmembrane and cytoplasmic regions was supported by fitting crystal structures of the structurally homologous Kv1.2 channel and isolated TRPV1 ankyrin repeats into the TRPV1 structure. Think Gene and Scientific Blogging have summaries of the paper. Proteins are great, but what about the genes which produce them? I went to haplotter, and check out what I found.... iHS ![]() D ![]() Screenshot of genes around TRPV1 ![]() Also check out variation around that gene. Related: Genetics of taste. Labels: Genetics
Monday, May 19, 2008
|