Tuesday, May 13, 2008
99% Genetic? Individual Differences in Executive Function Are Almost Perfectly Heritable:
The results from this approach are jaw-dropping: variance shared among each variety of executive function (inhibition, updating, and shifting) is nearly perfectly heritable: the contribution of the "A" component to those correlations is 99%. This heritable variance in the common executive function predicts nearly all of the genetic variance in the inhibition factor, consistent with the idea that those constructs are isomorphic from a heritability standpoint. Second, genetic influences on updating and shifting were roughly half due to the common executive function (43% and 44%, respectively) and half due to unique genetic influences (56% and 42%, respectively). Thus, the overall picture is that executive functions, in both their unity and diversity, are somewhere between 86 to 100% heritable.
I wonder if such high heritabilities imply many adaptive equilibria in terms of personality phenotype within all populations? (Remember the rule of thumb that the more heritable a trait is, the less fitness implication it has.)
Monday, May 12, 2008
Here. The embed is the best bet if you can view it; the download often fails (server has been slammed?). Only a moderate amount of discussion about religion; Dawkins talks a fair bit about an obscure field, evolutionary biology. Well done.
Via Accidental Blogger.
Edward Luttwak has a column up (via The Corner) pointing out that by Muslim measures Barack Obama is an apostate, and so it is permissible that he should be killed. This is true, and I think if you asked most Muslims they would accede to the principle here. But as a matter of practicality these sorts of laws aren't enacted or enforced in all circumstances without sensitivity to other parameters. Unlike Barack Obama, the former president of Argentina, Carlos Menem, converted to Roman Catholicism from Islam as an adult (there have also been African leaders who converted from Islam to Christianity, but I don't believe they visited the Arab world), and he remained on good terms with the Arab nations. If you look at the cases where apostasy is an issue, they seem to fall into two broad categories. The first involves crass material interest on the part of Muslims and marginality on the part of non-Muslims; in other words, there is a rational reason for a Muslim to use the letter of the law against the apostate or non-Muslim, and the individual being persecuted has very little recourse because of their lack of power. The second is the perception that the individual is being too vocal, and so disrupting social norms and public order. From all that I have heard, atheism is known and tolerated in the Muslim world so long as atheists remain silent; the problem is public profession of views which go against majority norms. I strongly suspect that in the case of the president of the United States most Islamic powers that be would simply ignore the letter of the law (that is, the consensus of Muslim scholars over the ages).
This does not imply that I think the attitudes of Muslims are appropriate to the modern world. Nor do I think it implies that the probability of Obama being assassinated due to his religious history is the same, all things controlled, as for someone with a less complicated past. I'm arguing simply that his "apostasy" really shouldn't be the primary predictor when we consider this issue; powerful men are simply held to different standards in our species. That is culturally invariant, and it is the biggest issue of context in this case.
Addendum: I'm going to take a moment here to make a political comment which I hope won't spawn a thread-closing tirade from readers. Conservatives often complain that liberals don't take cultural complexity into account when they're making models of societies. Additionally, they often accuse liberals of adhering to an idealized noble savage conception of non-Western peoples (e.g., I have heard some liberals argue that Obama's Muslim background will even encourage good feelings from the Islamic world!). Unfortunately, many conservatives are guilty of the same; simple models make good rhetoric and ignorance breeds supreme confidence (I've been guilty of this, you've been guilty of this). But if any individual looks to their own life, their social circle and their culture, they will see a great deal of texture, subtlety and nuance which can't be shoehorned into the avowed heuristics. Labels: Religion
Last year p-ter put up a post pointing to useful online tools such as Haplotter. One of the great things about biology today is that so much of the data from genomics is being thrown out there within reach of the plebs. And a lot of value is being added through user interfaces which smooth the connection between you and these databases. So check out NextBio; from the FAQ:
NextBio is a life science search engine that enables researchers and clinicians to access and understand the world's life sciences information. With NextBio, in just one click you can search through tens of thousands of study results with billions of data points spanning across different experimental platforms, organisms and data types. NextBio also searches across millions of publications to help you find new articles pertaining to your query. NextBio's search engine makes massive amounts of disparate biological, clinical and chemical data from public and proprietary sources searchable, regardless of data type and origin, and empowers scientists to quickly understand their own experimental results within the context of other research.
I'm sure the slick AJAX-driven search tools are a nice Web 2.0+ pitch to investors, but the substantive element is the data. There are only so many researchers with eyeballs in the world; on occasion amateur astronomers can still pick out something new amongst the constellations, and I think that sort of dynamic also holds to some extent for the mass of unprocessed data that the post-genomic era has made available to us. I really encourage readers of this weblog to poke and prod around the data piles with these new tools; Web 2.0 isn't just YouTube and Facebook.... Related: VentureBeat weighed in a few weeks ago on this company.... Labels: Genomics
Sunday, May 11, 2008
Gender Differences in the Mu Rhythm of the Human Mirror-Neuron System:
The present findings indirectly lend support to the extreme male brain theory put forward by Baron-Cohen (2005), and may cast some light on the mirror-neuron dysfunction in autism spectrum disorders. The mu rhythm in the human mirror-neuron system can be a potential biomarker of empathic mimicry. Don't know enough about this stuff to comment, but figure readers would find it of interest.... Labels: sex differences
Friday, May 09, 2008
Sandy has two posts over at Anthropology.net worth checking out: The sexiness of facial symmetry across cultures and species, and Earliest known archaeological evidence of Americans found in Monte Verde, Chile.
Thursday, May 08, 2008
Over at The Corner they are discussing an interview series with Tom Wolfe. Wolfe claimed that Charles Darwin was a plagiarist. Derb pushed back. Since they keep talking about the interview, I decided to watch. A few notes....
Wolfe says that Darwin was an obscure man who had a famous grandfather (Erasmus, I'm assuming, not Josiah Wedgwood). I don't think this is really right. Unfortunately, we can't run an experiment which deletes Charles Darwin's contribution to science, but before he became the great evolutionary thinker he was a prominent travel writer. The Voyage of the Beagle went through several editions; I'm not sure we would remember Charles Darwin today on the strength of that alone (how many popular Victorian authors do we remember now?), but he was not an obscure figure in mid-19th century England.
Then he notes that E. O. Wilson believes everything is genetically predetermined: that we have no free will and can't change our decisions. Wilson, especially during the Sociobiology years, offered up a few naive quotes; but as anyone who has wrestled with heritability knows, a simple affirmation of genetic determinism is so banal as to be trivial. Wolfe is either overreading or not communicating the nuance of his genuine thinking.
After this Wolfe goes on to make a distinction between genetic theory and neuroscience, whereby the former is literature and the latter is science. He also suggests that the three leading lights of genetic theory are totally unversed in the workings of the brain. Who are these leading lights? E. O. Wilson, Daniel Dennett and Richard Dawkins. Wolfe correctly notes that by training Dennett is a philosopher and Dawkins is an ethologist, so it is peculiar that he considers them leading lights. Wilson is more properly a field ecologist who generally leaves theoretical work to a collaborator (Robert MacArthur or Charles Lumsden, for example). Since Dennett is the co-director of the Center for Cognitive Studies at Tufts, I assume he stumbles onto neuropsychological material now and then. Obviously Wolfe has fallen into the all too common trap of conflating popularizers with eminent researchers; it's easy to do if you don't do your homework. John Maynard Smith, W. D. Hamilton and Richard Lewontin are evolutionary geneticists of note; much of Dawkins's thinking derives from the first two, Wilson was influenced by Hamilton, and finally Dennett seems clearly to have had evolution predigested for him by Dawkins.
An emphasis on the evolutionary part is critical; from what I know, molecular genetics along the biophysical margins does bleed into neuroscience quite a bit. One of the founding fathers of modern molecular genetics, Francis Crick, spent his last years focused on neuroscience. Wolfe knows this, so he really didn't mean to dismiss all genetics as literature, just evolutionary biology. I won't object too strenuously to this characterization, but I will submit that neuroscience today is too young a discipline to be taking on airs. There are many facts strewn about, but even the skeleton of a theoretical superstructure to scaffold them into a coherent whole does not seem to exist.
Finally, you can check out the second to last interview segment (the last has not been put up yet); here Wolfe claims that the emergence of language resulted in a post-evolutionary age for our species. This is false, of course; having dismissed genetic theory as literature, he obviously hasn't been keeping up with the literature! The whole line of thinking struck me as incoherent, so perhaps I'm missing something. Wolfe also makes a host of extremely disputable assertions about unique human tool use, the rationality of humans, and the lack of relation between modern status games and evolutionary genetics. In any case, I only checked it out because of the gushing in The Corner. I'm a dilettante myself, so I wasn't going into it looking to pick out errors, but these seemed worthy of correction, since many people obviously look to Tom Wolfe as an Authority and a keen observer of the world.
I'll probably check out his novels; I'm sure he makes up for his sloppy characterization of science with a sharp eye toward fluid prose.... Update: Derb weighs in again.
Continuing my series of notes on the work of Sewall Wright, this one deals with the subject of genetic drift. I had originally planned to call this note 'Inbreeding and the decline of genetic variance', but anyone interested in the matters covered here, and searching for them on the internet, is far more likely to search for 'genetic drift'. This is one of the subjects most closely associated with Wright, to the extent that genetic drift was formerly often known as the 'Sewall Wright Effect'. My main aim is to help people follow Wright's own derivation of his key results, and to clarify the relationship between genetic drift and inbreeding.
I will refer mainly to the papers reprinted in the collection Evolution: Selected Papers (ESP), and especially to the monumental 1931 paper 'Evolution in Mendelian Populations' (EMP), which is available online here. Anyone interested in Wright should also read William B. Provine's biography of him. If in these notes I occasionally make critical remarks on Provine, it should not detract from the general excellence of his book. See the References for details.
In an infinitely large population, in the absence of selection and mutation, the proportions of different gene types (alleles) in the population will remain unchanged indefinitely. But real populations are never infinitely large, and gene frequencies will fluctuate to some extent by chance. As Wright put it in 1931, 'Merely by chance one or the other of the allelomorphs [alleles] may be expected to increase its frequency in a given generation and in time the proportions may drift a long way from the initial values' (ESP, p.107).
The general nature of drift can be illustrated by the hackneyed example of coin tossing. If we simultaneously toss a number of 'fair' coins, and repeat the trial a large number of times, then the average proportion of heads, by the definition of a fair coin, will be 1/2, and the average number of heads per trial will be N/2, where N is the number of coins in a trial. More generally, suppose the probability of heads for each coin is always p, where p is any fraction between 0 and 1. The long-term average number of heads per trial will then be Np. But on any particular trial, purely by chance, the number of heads is likely to deviate from the average. It can be shown that the variance of the number of heads per trial is Npq, where q = 1 - p. [Note 1] If we are interested in the proportion of heads per trial (the number of heads divided by N), it can be shown that the variance of the proportion is pq/N. [Note 2] On each trial, the proportion of heads is therefore likely to deviate from the long-term average p by an amount of the order of the standard deviation, the square root of pq/N.
Departing now from the real behaviour of coins, let us suppose that the value of p on each trial is determined by the proportion of heads in the previous trial. The proportion of heads will then drift up and down in a 'random walk' pattern, with the size of the 'steps' being inversely related to the size of N. If N is very large, each step will be small, but if N is small the steps may be relatively large. If, by chance, the proportion of heads in a trial ever reaches 1 or 0, then p for all future trials will also be 1 or 0, and heads (or tails) will be permanently 'fixed'. This is bound to happen sooner or later.
Genes are not coins, so the analogy is not perfect. In a population of genes, the replication of each gene is not a simple matter of 'heads or tails', as each gene may have 0, 1, 2 or more descendants. Also, while the number of coins is assumed to be fixed at N, a biological population is seldom absolutely fixed in size. Nevertheless, there are important similarities. In the absence of selection, it is a matter of chance whether or not a particular gene enters an egg or sperm and then survives to reproduce again in the next generation.
Suppose that there are two alleles, A and B, at a locus, with the frequencies p and q in the population. In the absence of selection and mutation, these will also be the expected frequencies in the next generation. In a population of N diploid individuals, there are 2N genes in the population at each locus, and in a stable population there will still be 2N genes in the next generation. We can schematically represent reproduction as a 'trial' consisting of 2N events, each involving the random choice of a gene to enter the new generation, with probabilities of p and q for the 'outcomes' A and B at each choice.
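The schematic 'trial' model, with each generation's frequency becoming the next generation's p, can be simulated directly. What follows is a minimal Python sketch under the assumptions stated in the text (no selection or mutation, exactly 2N genes per generation, each drawn independently with probability p of being A); the names next_generation and drift_until_fixed are invented for this illustration and do not come from Wright or the textbooks.

```python
import random


def next_generation(p, two_n, rng):
    """One reproductive 'trial': 2N independent draws, each of which is
    allele A with probability p. Returns the new frequency of A."""
    count_a = sum(1 for _ in range(two_n) if rng.random() < p)
    return count_a / two_n


def drift_until_fixed(p0, n_diploid, rng, max_generations=100_000):
    """Let the frequency of A drift, each generation's frequency serving as
    the next generation's p, until A is lost (0) or fixed (1).
    Returns (final_frequency, generations_taken)."""
    two_n = 2 * n_diploid
    p = p0
    for generation in range(1, max_generations + 1):
        p = next_generation(p, two_n, rng)
        if p in (0.0, 1.0):  # 0 and 1 are absorbing: no further change is possible
            return p, generation
    return p, max_generations


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so that runs are repeatable
    for n in (5, 50):
        final, gens = drift_until_fixed(0.5, n, rng)
        print(f"N = {n}: frequency of A is {final} after {gens} generations")
```

Individual runs vary a great deal, but averaged over many runs the time to fixation grows roughly in proportion to N, which is the 'small steps in large populations' point made above.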
The probabilities of obtaining the various possible combinations of A's and B's are then given by the expansion of the binomial (p + q)^(2N). Wright himself uses this model of the process on several occasions, e.g. ESP p.289. While this may seem a very artificial way of viewing reproduction, it is not as unrealistic as it may appear. Suppose that N diploid individuals each have the same number of offspring, the number being large, and certainly large enough to ensure that there are at least 2N copies of each allele among the population of offspring. Then select N of the offspring as 'survivors', completely at random, which is analogous to survival in a resource-limited population without natural selection. The probabilities of the various possible gene frequencies will then be approximately as in the schematic model (with the complication that in a finite population of offspring, the probability of selecting an offspring with a given allele will be affected by the number already selected; e.g. if nearly all the alleles of a given type have, by chance, already been selected, the probability of selecting another one will be much reduced).
Nothing has so far been said about inbreeding. Moreover, the processes just described would apply not only to sexually reproducing organisms but also to asexually reproducing organisms and genetic elements, such as mitochondria and Y chromosomes, where the possibility of inbreeding does not arise. But in Wright's treatment of the subject, references to inbreeding are frequent, and the rate of genetic drift is derived by an argument which seems to depend on the existence of inbreeding. For example, on p.165 of ESP he says: 'If the population is not indefinitely large, another factor must be taken into account: the effects of accidents of sampling among those that survive and become parents in each generation and among the germ cells of these, in other words, the effects of inbreeding'.
Such statements are likely to give the impression that inbreeding is fundamental to the process of genetic drift. How can this be? The explanation is that in a sexually reproducing population a convenient measure of genetic drift is the changing proportion of homozygotes, and the existence of homozygotes is related to inbreeding. If a given allele has ultimately arisen from a single mutation, then homozygous copies of that allele can only occur in the same individual if that individual is descended from the same ancestor by at least two paths, which is by definition inbreeding. Even if the allele has more than one origin, the level of inbreeding in the population will affect the level of homozygosis. But as the example of asexual organisms shows, there is no necessary connection between genetic drift and inbreeding. R. A. Fisher, in his different approach to the subject, does not (I think) ever refer to inbreeding. Confusing the two things would be like confusing the study of heat with the study of thermometers. It may therefore be wondered why Sewall Wright took his particular approach. The answer may be partly that his mathematical training was less advanced than Fisher's, so that he was obliged to use less mathematically sophisticated methods. This has the advantage that his work on the subject is in principle accessible to a wider range of readers. Moreover, on one important point Wright's methods got the correct result where Fisher, through neglecting a quantity which turned out not to be negligible, got the wrong result by a factor of 2 (as Wright never tired of pointing out). But I think the main reason for Wright's approach was that he first investigated genetic drift in the context of agricultural breeding, where livestock are often closely inbred. In this context one of the main concerns is to quantify the loss of genetic variation in each particular inbred strain. 
It was therefore natural for Wright to approach the subject by measuring the loss of heterozygosis associated with inbreeding. When he later turned to consider genetic drift in natural populations, where mating is approximately random, he continued to use the methods he had already devised for the study of inbreeding in agriculture. (I will not explore here the precise meaning of Wright's coefficients of inbreeding, the famous F-statistics, which I hope to deal with in another note.)
Wright's most important finding was that heterozygosis (the proportion of heterozygotes in the population) tends to decline at a rate of 1/2N per generation, where N is the diploid population size. (This assumes that males and females each have a population size of N/2.) Most textbooks give a simplified version of Wright's derivation of this result. Wright's own treatment, in EMP, is difficult to follow, and in view of its importance I have provided a guide in Note 3 below. Even the simplified textbook versions are not always very clear, and I do not know of any wholly satisfactory account: key assumptions are often not clearly stated or justified. Two relatively good accounts are those of Falconer and Maynard Smith (see References). I will outline a derivation based mainly on Falconer (with some modifications).
Let us assume there is a population of N diploid individuals. Generations are discrete (non-overlapping). There is no mutation or natural selection in the period under consideration. The nth generation is designated Gn, the previous generation Gn-1, the following generation Gn+1, and so on. The probability that the two genes at the same locus in an individual of Gn are identical is designated CIn, where CI stands for 'coefficient of inbreeding'. (For my approach here it is not necessary to specify whether the genes are identical 'by descent'.)
The probability that two randomly selected genes at the same locus in two different individuals of Gn are identical is designated CKn, where CK stands for 'coefficient of kinship'.
For the simplest case, consider a population of hermaphrodites which are capable of self-fertilisation and mate completely at random, including with themselves. (This would be approximately true of some marine invertebrates which release gametes into the water.) From the assumptions of random mating and non-selection it follows that any individual in Gn is equally likely, with probability 2/N, to be a parent of any individual in Gn+1 (since in a stable population each individual will, on average, be a parent of 2 of the N surviving offspring). It does not follow that, if we select at random an individual in Gn+1 and then select another, there is a probability of 2/N that the second individual will have the same father (or mother) as the first. For example, if each individual in Gn produced exactly 2 surviving offspring, the probability that a second randomly selected individual in Gn+1 had the same father (or mother) as the first would only be 1/(N-1). To get a probability of 2/N we require an additional assumption, which is technically satisfied by specifying that the number of offspring per individual follows a Poisson distribution. (This assumption is mentioned by Maynard Smith but not by Falconer.)
With these assumptions, it follows that CIn equals CKn. In the case of CIn, we select a gene at random in Gn, and then inquire whether the other gene at the same locus in that individual is identical. In the case of CKn, we select a gene at random in Gn, and then inquire whether another randomly selected gene at the same locus in a different randomly selected individual is identical to the first gene. But in both cases each gene is a copy of a gene taken absolutely at random from all the genes in Gn-1. The probabilities of identity are therefore the same, and CIn equals CKn.
By the same argument it follows that any two randomly selected distinct genes at a locus in Gn have the same probability of being identical, whether they are in the same or different individuals. If we call this probability CDn, we have CDn = CIn = CKn, for any value of n.
But CIn can be broken down into two component probabilities. With probability 1/2N, the two genes at a locus in the same individual are copies of the very same gene in Gn-1, in which case they are certainly identical. In all other cases, therefore with probability 1 - 1/2N, they are copies of two distinct genes in Gn-1, in which case there is a probability CDn-1 that they are identical. But CDn-1 = CIn-1 (since the equality CDn = CIn applies for any value of n). The total probability CIn therefore comes to CIn = 1/2N + (1 - 1/2N)CIn-1. The coefficient of inbreeding in one generation is therefore derivable from the coefficient in the previous generation by a formula involving the addition of 1/2N. Subtracting both sides from 1 gives 1 - CIn = (1 - 1/2N)(1 - CIn-1), and since heterozygosis is proportional to 1 - CIn, it tends to decline by a factor of (1 - 1/2N) per generation (see Falconer pp.64-5 for a proof).
If self-fertilisation is excluded, two genes in the same individual cannot be copies of the very same gene in the previous generation, so the analysis needs to be pushed further back. If mating between different individuals is completely random, including siblings, then CIn = CKn-1. If mating between siblings is excluded, but otherwise random, CIn = CKn-2, and so on. But it is always possible to express the 'coefficient of inbreeding' in one generation in terms of the coefficients in previous generations, and heterozygosis always tends to decline by a factor of approximately (1 - 1/2N) per generation (assuming equal numbers of males and females). The above argument, like Wright's own, measures the progress of genetic drift by the decline of heterozygosis and the associated increase in the coefficient of inbreeding.
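The recursion for CIn can be checked against a direct simulation of the simplest case described above: N hermaphrodites mating at random, self-fertilisation included. In this sketch (an illustration of my own, not Falconer's or Wright's procedure; all function names are hypothetical) every gene in the founding generation gets its own label, so 'identical' simply means 'copy of the same founder gene'. Each offspring draws its two parents at random with replacement, which gives the roughly Poisson spread of offspring numbers the argument assumes. Averaged over many replicate populations, the simulated CIn should track the recursion.

```python
import random


def recursion_ci(n, generations):
    """CI_n = 1/2N + (1 - 1/2N) * CI_(n-1), starting from CI_0 = 0."""
    ci = [0.0]
    for _ in range(generations):
        ci.append(1 / (2 * n) + (1 - 1 / (2 * n)) * ci[-1])
    return ci


def simulate_ci(n, generations, rng):
    """Observed proportion of individuals whose two genes are copies of the
    same founder gene, generation by generation."""
    # Generation 0: 2N distinct founder genes, so no individual is 'inbred'.
    pop = [(2 * i, 2 * i + 1) for i in range(n)]
    observed = [0.0]
    for _ in range(generations):
        # Each offspring picks two parents at random (possibly the same one,
        # i.e. selfing) and receives one randomly chosen gene from each.
        pop = [(rng.choice(rng.choice(pop)), rng.choice(rng.choice(pop)))
               for _ in range(n)]
        observed.append(sum(g1 == g2 for g1, g2 in pop) / n)
    return observed


def mean_simulated_ci(n, generations, replicates, seed=1):
    """Average simulate_ci over independent replicate populations."""
    rng = random.Random(seed)
    totals = [0.0] * (generations + 1)
    for _ in range(replicates):
        for t, value in enumerate(simulate_ci(n, generations, rng)):
            totals[t] += value
    return [total / replicates for total in totals]


if __name__ == "__main__":
    N, T = 20, 10
    predicted = recursion_ci(N, T)
    simulated = mean_simulated_ci(N, T, replicates=400)
    for t in range(T + 1):
        print(f"generation {t:2d}: recursion {predicted[t]:.3f}, "
              f"simulation {simulated[t]:.3f}, "
              f"1 - CI {1 - predicted[t]:.3f}")
```

The last column is the quantity proportional to heterozygosis; it shrinks by the factor (1 - 1/2N) each generation, here 0.975.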
It should however be clear that measuring drift by heterozygosis and inbreeding is not essential. If we wanted to study genetic drift in asexual haploid replicators, such as Y chromosomes, it would be possible to modify the derivation to use only coefficients of kinship, rather than inbreeding. More fundamentally, the process of genetic drift depends not on inbreeding but on the existence of variance in reproductive success. Some genes have no descendants, some have only one, and some have more than one. Over the course of time, more and more lines of descent die out, and the surviving genes are collectively descended from fewer and fewer original ancestors. In a sexually reproducing population this also leads to increased levels of inbreeding, in a broad sense. If there were no such variance in reproductive success (if every gene had exactly the same number of surviving 'offspring'), there would be no genetic drift.
Among diploids, the variance in replication of individual genes is due to two factors: the variance in the number of surviving offspring, and the random allocation of genes to gametes in meiosis. Even if every diploid individual had exactly the same number of surviving offspring, there would still be variance in the replication of individual genes for the second reason. As for the variance in the number of offspring, the assumption of a Poisson distribution is probably not unreasonable in many species, but there could be departures from it in both directions (i.e. either greater or smaller variance). There might also be different variance in the two sexes. For example, among animals like elephant seals, the variance among females might be rather small, because all females have a low but steady rate of reproduction, whereas among males the variance would be much higher, as many males have no offspring at all, while a few have a large number.
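The point that drift depends on variance in reproductive success, rather than on inbreeding, can be illustrated with haploid replicators of the Y-chromosome kind. In this minimal sketch (function names invented here; the 'copy a randomly chosen parent' rule is an assumption standing in for roughly Poisson offspring numbers), one run gives every gene exactly one surviving copy and the other lets each new gene copy a parent chosen at random. With no variance, the number of surviving lines of descent and the frequency of a labelled 'allele A' never change; with variance, lines of descent are lost and can never return.

```python
import random


def snapshot(genes, n_founders_with_a):
    """(number of founder lineages still represented, frequency of allele A)."""
    lineages = len(set(genes))
    freq_a = sum(1 for g in genes if g < n_founders_with_a) / len(genes)
    return lineages, freq_a


def trace_lineages(n_genes, generations, rng=None):
    """Follow n_genes haploid replicators for some generations, no selection.

    rng=None: every gene leaves exactly one surviving copy, so there is no
    variance in reproductive success.
    Otherwise: each gene of the new generation copies a parent gene chosen
    at random, so a parent may leave 0, 1, 2 or more copies.
    Founders 0 .. n_genes//2 - 1 are labelled as carrying allele A."""
    genes = list(range(n_genes))  # gene i starts its own line of descent
    half = n_genes // 2
    history = [snapshot(genes, half)]
    for _ in range(generations):
        if rng is None:
            genes = list(genes)  # one copy of each gene: nothing can change
        else:
            genes = [rng.choice(genes) for _ in range(n_genes)]
        history.append(snapshot(genes, half))
    return history


if __name__ == "__main__":
    no_variance = trace_lineages(50, 100)
    with_variance = trace_lineages(50, 100, random.Random(3))
    for t in (0, 10, 50, 100):
        print(f"generation {t:3d}: equal offspring {no_variance[t]}, "
              f"random offspring {with_variance[t]}")
```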
Wright takes account of some of these sources of variance in his discussions of 'effective population size'.
This note has only dealt with a few aspects of Wright's work on genetic drift. I have tried to identify the underlying assumptions and (in Note 3) to clarify Wright's most important derivation. None of this says anything one way or the other about the actual importance of genetic drift in evolution. What should be clear is that genetic drift is a weak force except in very small populations, since its effect is inversely proportional to population size. In large populations it would be overpowered by modest rates of selection or migration. (The other factor to consider is mutation, but except in large populations this is an even weaker force than drift, as mutation rates are typically of the order of only 1/100,000 per locus per generation.) I hope to deal with some of these issues in further notes.
Note 1: Suppose we toss a single coin K times, where K is a large number. If the probability of heads is p, the total number of heads will be approximately Kp and the average number of heads per toss will be Kp/K = p. But on each particular trial (the toss of a single coin) there can only be 1 or 0 heads, so we will have Kp trials with the deviation value (1 - p), and K(1 - p) trials with the deviation value (0 - p) = -p. Using the abbreviation q for (1 - p), the variance of the number of heads for trials consisting of a single coin toss is therefore [Kpq^2 + Kqp^2]/K = pq^2 + qp^2 = pq(q + p) = pq. It may seem odd to speak of the variance of the number of heads in trials where there is only one coin per trial, but in principle it is legitimate, and it enables us easily to derive the variance of the number of heads where the trials involve N coins. Since the variance of the sum of a number of independent numerical values equals the sum of the variances of the values individually, the variance of the number of heads in N independent coin tosses, each with variance pq, is simply Npq.
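The results of Notes 1 and 2 can also be checked by brute force, by summing over the terms of the binomial expansion rather than by adding variances. This is a minimal Python sketch using exact fractions, so that no rounding error intrudes; binomial_moments is a name invented for this check and is not taken from any of the sources.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n_coins, p):
    """Exact mean and variance of the number of heads in N tosses, and the
    variance of the proportion of heads, from the terms of (p + q)^N."""
    q = 1 - p
    probs = [comb(n_coins, k) * p**k * q**(n_coins - k) for k in range(n_coins + 1)]
    mean = sum(k * prob for k, prob in enumerate(probs))
    var_count = sum((k - mean) ** 2 * prob for k, prob in enumerate(probs))
    var_prop = sum((Fraction(k, n_coins) - mean / n_coins) ** 2 * prob
                   for k, prob in enumerate(probs))
    return mean, var_count, var_prop


if __name__ == "__main__":
    mean, var_count, var_prop = binomial_moments(10, Fraction(1, 2))
    print(mean, var_count, var_prop)  # → 5 5/2 1/40
```

For ten fair coins this gives Np = 5, Npq = 5/2 and pq/N = 1/40, as the Notes predict.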
Note 2: The average proportion of heads per trial of N coin tosses, each with probability p, is in the long term p. If X is the number of heads in any particular trial of N coins (where X is a variable), the deviations of the proportions will be of the form X/N - p = (X - Np)/N, and the variance of the proportions in K trials will be S[((X - Np)/N)^2]/K, where S denotes summation over the K trials. But this equals (1/N^2)S[(X - Np)^2]/K, and S[(X - Np)^2]/K is the variance of the number of heads, which has been shown to equal Npq, so the variance of the proportion is Npq/N^2 = pq/N.
Note 3: This is a commentary on pages 108-110 of ESP, which reprint pages 107-109 of the original paper EMP (the near identity of pagination is just a coincidence). I will mainly be concerned with page 109 of ESP, where Wright derives his fundamental results for the decline of heterozygosis. In following the derivation it is necessary to refer back frequently to the definitions at the bottom of page 108. Wright assumes that the sexes are separate (so there is no self-fertilisation) but that mating is otherwise completely random, including between siblings. He assumes that there are Nm breeding males and Nf breeding females. With random mating, he states that the proportion of matings between full siblings is 1/NmNf. This evidently assumes that there is a probability of 1/Nm that two mates have the same father, and an independent probability of 1/Nf that they have the same mother (note that m and f stand for male and female, not mother and father). This is actually a strong assumption, which ought to be clearly stated. It assumes (a) that the number of offspring of individuals follows a Poisson distribution (or something similar) and (b) that parents have male and female offspring in the same proportions as in the population generally. This is not necessarily true: for example, if some parents had a strong bias towards producing male or female offspring, the probability of mating between siblings would be reduced.
(Wright does discuss some of these considerations in the section on 'The Population Number' at pp.111-12 of ESP.) Wright then gives the proportion of matings between half siblings, and between all less closely related individuals. These depend on the same assumptions as for full siblings.
He then gives a formula for M, the correlation between mates in the current generation. Note that the formula is of the form a'^2b'^2[Z], where Z is a complicated expression in square brackets. From the definitions on p.108 we have a'^2b'^2 = [1/(2(1 + F'))][(1 + F'')/2], so we have M = [1/(2(1 + F'))][(1 + F'')/2][Z]. The expression Z can be derived by Wright's method of path analysis. The first component of Z deals with the case of mating between full siblings. If we label the siblings A and B, and their parents C and D, we have two 'direct' paths, ACB and ADB, and two 'indirect' paths, ACDB and ADCB, which involve the correlation M' between mates in the previous generation. Hence the coefficient (2 + 2M') for the first component. For half siblings A and B, there is one shared parent C and two non-shared parents D and E, so there is one direct path, ACB, and three indirect paths, ADCB, ADEB, and ACEB, giving the coefficient (1 + 3M'). For unrelated mates A and B, with the non-shared parents C, D, E and G (to avoid using F, which is already in use), we have no direct paths and four indirect paths, ACGB, ACEB, ADEB, and ADGB, giving the coefficient 4M'.
Next Wright derives an expression for F, the correlation between uniting gametes in the current generation. Here we must note from p.108 that F = b^2M, and b^2 = (1 + F')/2. Using the expression M = [1/(2(1 + F'))][(1 + F'')/2][Z], we therefore have F = [(1 + F')/2][1/(2(1 + F'))][(1 + F'')/2][Z] = [(1 + F'')/8][Z]. With a little manipulation, and using the full expression for Z, this can be put in the form F = (1 + F'')[Nm + Nf - M'Nm - M'Nf + 4M'NmNf]/8NmNf.
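The bracketed expression Z can be checked by weighting each mating class's path coefficient by the frequency of that class, using exact fractions. This sketch assumes, as Wright's proportions do, a probability 1/Nm that two mates share a father and an independent probability 1/Nf that they share a mother; the function names are invented for this illustration. The weighted sum reduces to [Nm + Nf - M'Nm - M'Nf + 4M'NmNf]/NmNf (the last term being 4M'NmNf), so that F = (1 + F'')/8 times this quantity.

```python
from fractions import Fraction


def z_by_mating_class(nm, nf, m_prev):
    """Sum of path coefficients weighted by the frequency of each mating class."""
    same_father = Fraction(1, nm)
    same_mother = Fraction(1, nf)
    full_sibs = same_father * same_mother
    half_sibs = (same_father * (1 - same_mother)
                 + same_mother * (1 - same_father))
    unrelated = (1 - same_father) * (1 - same_mother)
    return (full_sibs * (2 + 2 * m_prev)     # two direct paths, two via M'
            + half_sibs * (1 + 3 * m_prev)   # one direct path, three via M'
            + unrelated * 4 * m_prev)        # four paths, all via M'


def z_closed_form(nm, nf, m_prev):
    """[Nm + Nf - M'Nm - M'Nf + 4M'NmNf] / NmNf."""
    numerator = nm + nf - m_prev * nm - m_prev * nf + 4 * m_prev * nm * nf
    return numerator / Fraction(nm * nf)


if __name__ == "__main__":
    m = Fraction(1, 20)
    print(z_by_mating_class(10, 30, m), z_closed_form(10, 30, m))  # → 49/150 49/150
```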
But now we should note that M' is the correlation between mates in the previous generation. We can therefore adapt the equation F = b^2M to get the corresponding equation for the previous generation, i.e. F' = b'^2M'. But b'^2 = (1 + F'')/2, so F' = [(1 + F'')/2]M', and therefore M' = 2F'/(1 + F''). We can substitute 2F'/(1 + F'') for M' in the equation F = (1 + F'')[Nm + Nf - M'Nm - M'Nf + 4M'NmNf]/8NmNf. Some grinding but essentially routine algebra then gives F = F'[1 - (Nm + Nf)/4NmNf] + (1 + F'')(Nm + Nf)/8NmNf, which is (in rearranged form) the expression Q on the right of the second equation on page 109. Then, using the definitions of P, P', etc. in terms of F, F', etc., the third equation also follows by routine algebra.

This leaves the final death-defying leap to the fourth equation. This is not helped by the puzzling statement that we can equate P/P' to P/P''. This would imply that the proportional change per generation was not just constant but zero, and P/P'' must surely be a misprint for P'/P''. (The fact that this horrible error is not corrected or commented on in the ESP reprint leaves me wondering how closely Provine, as editor, has followed the details of Wright's text.) But even with this correction, it is far from obvious how Wright derives his fourth equation. I had given up hope of solving it until I read volume 2 of EGP and found a discussion of the simpler case of random mating hermaphrodites, which fills in a few gaps in the derivation (see EGP vol 2, pp.194-5). First, it confirms the suspicion that P/P'' should be P'/P''. Second, it shows (or at least hints) how the problem can be reduced to a quadratic equation.

Taking these hints, we can apply them to the fourth equation on p.109. First, rearrange and simplify the third equation to get P - P'[1 - (Nm + Nf)/4NmNf] - P''(Nm + Nf)/8NmNf = 0. Then divide through by P'' to get P/P'' - (P'/P'')[1 - (Nm + Nf)/4NmNf] - (Nm + Nf)/8NmNf = 0. But by assumption P/P' = P'/P'', so P/P'' = (P/P')(P'/P'') = (P/P')^2, and P'/P'' can be replaced by P/P'.
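We can check numerically whether the ratio really settles to a constant, and to what value. The Python sketch below iterates the rearranged third equation, P = P'[1 - (Nm + Nf)/4NmNf] + P''(Nm + Nf)/8NmNf, from an arbitrary no-inbreeding start (P'' = P' = 1). It compares the resulting ratio P/P' with the larger root of the quadratic obtained in the next step. The last line compares the exact -deltaP/P' with its first-order value (Nm + Nf)/8NmNf. The shorthand k and the population sizes are illustrative assumptions, not Wright's notation.

```python
import math


def decline_ratio(nm, nf, generations=200):
    """Iterate P = P'(1 - k) + P''k/2 with k = (Nm + Nf)/4NmNf; return P/P'."""
    k = (nm + nf) / (4 * nm * nf)
    p_prev2, p_prev = 1.0, 1.0  # P'' and P' at the start (no inbreeding yet)
    for _ in range(generations):
        p = p_prev * (1 - k) + p_prev2 * k / 2
        p_prev2, p_prev = p_prev, p
    return p_prev / p_prev2


def closed_form_ratio(nm, nf):
    """Larger root of x^2 - (1 - k)x - k/2 = 0, with x = P/P'."""
    k = (nm + nf) / (4 * nm * nf)
    return 0.5 * (1 - k) + 0.5 * math.sqrt(1 + k * k)


nm, nf = 5, 10
k = (nm + nf) / (4 * nm * nf)
print("iterated P/P':   ", decline_ratio(nm, nf))
print("closed-form P/P':", closed_form_ratio(nm, nf))
print("-deltaP/P' exact:", 1 - closed_form_ratio(nm, nf), " first order k/2:", k / 2)
```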
We can therefore treat the equation as a quadratic in x = P/P', namely x^2 - [1 - (Nm + Nf)/4NmNf]x - (Nm + Nf)/8NmNf = 0. The discriminant [1 - (Nm + Nf)/4NmNf]^2 + 4(Nm + Nf)/8NmNf simplifies to 1 + [(Nm + Nf)/4NmNf]^2. Solving by the standard method therefore gives (as the larger of the two roots) P/P' = (1/2)[1 - (Nm + Nf)/4NmNf] + (1/2)[root(1 + [(Nm + Nf)/4NmNf]^2)]. This is nearly Wright's fourth equation. For the final step, we take deltaP to mean P - P', so that -deltaP/P' = -(P/P' - 1). We therefore need only subtract 1 from the expression (1/2)[1 - (Nm + Nf)/4NmNf] + (1/2)[root(1 + [(Nm + Nf)/4NmNf]^2)] and then reverse the sign to get Wright's fourth equation.

After this tortuous derivation, the discussion on page 110 of ESP is relatively plain sailing. The only slight puzzle is how Wright gets the approximation at the top of the page. I deduce that he uses the fact that when a is a small fraction, root(1 + a) is approximately equal to 1 + a/2. Taking [(Nm + Nf)/4NmNf]^2 as a and grinding through the algebra, one can then verify Wright's approximation. Overall, as often with Wright's work, I am torn between admiration for his ingenuity and frustration at his obscurity.

References:
D. S. Falconer: Introduction to Quantitative Genetics, 3rd edn., 1989. (The 4th edn., by Falconer and Mackay (1995), appears to be the same so far as its treatment of genetic drift is concerned.)
John Maynard Smith: Evolutionary Genetics, 1989.
William B. Provine: Sewall Wright and Evolutionary Biology, 1986.
Sewall Wright: Evolution: Selected Papers, edited and with Introductory Materials by William B. Provine, 1986. [ESP]
Sewall Wright: 'Evolution in Mendelian Populations', Genetics, 16, 1931, pp.97-159. (Reprinted at pp.98-160 of ESP.) [EMP]
Sewall Wright: Evolution and the Genetics of Populations, 4 vols., 1968-1978. [EGP]
Tuesday, May 06, 2008
In the comments here, rosko points me to a study on the effects of MC4R, a gene implicated in natural variation in human weight, on pathways involved in sexual function. It's well known, of course, that genetic pathways can be involved in multiple physiological processes--in particular, a signaling pathway can generate many different phenotypes depending on what the downstream target of the signal is.
The effects of MC4R stimulation in humans are, as rosko comments, kind of interesting:

Methods. Ten subjects were enrolled in a double-blind, placebo-controlled, crossover study. Melanotan II (0.025 mg/kg) and vehicle were each administered twice by subcutaneous injection; real-time RigiScan monitoring and a visual analog were used to quantify the erections during a 6-hour period. The level of sexual desire and side effects were recorded with a questionnaire.

I wondered what a "Rigiscan" is--find out here. Hypothetically, one could test whether natural variation in sexual behavior in humans is also affected by MC4R polymorphism, though I can't imagine that being a particularly fun study to carry out (one for agnostic's new series? 23andme + free time = association studies about erections). This reminds me of the MC1R story about increased pain sensitivity in redheads, in the vague sense that both involve melanocortin receptors and pleiotropy.

Labels: Genetics
Having already motivated this series, I'll provide the first example of how to put your time to more productive use than participating in the WikiProject G.I. Joe or the still more urgent WikiProject Transformers. If you get interesting results, post them on your blog and provide a link in the comments here. I'll gather up all the results after a while and summarize them in a follow-up post.
Purpose
Cultural transmission has often been described verbally as viral. Mathematical models of culture incorporate this idea by borrowing epidemic disease models from biology. The goal here is to see if data on "viral videos" support the infectious model of culture.

Prerequisites
To collect and analyze data: high school algebra, including familiarity with exponentials and logarithms. To get the theory behind the model: first-semester calculus and, preferably, an understanding of phase plane methods for studying a two-variable system of ordinary differential equations. [1] No knowledge of biology or culture is needed.

Details
Edelstein-Keshet's Mathematical Models in Biology (pp. 242-254) provides a good overview of the most basic models of epidemic diseases, and that's what I'm adapting here. In brief, we track the growth rates of two or three classes of hosts: Susceptibles (S), Infectives (I), and perhaps Recovereds (R). The names mean what you think. In the case of viral videos, you can never undo having seen the video, so we will use a very simple model with only the first two classes. If you haven't seen the video yet, you're Susceptible, while if you have seen it, you're Infective. The idea is that someone who's seen the video tells their friends about it in some way, and their friends watch it in their turn. A Susceptible is turned into an Infective at a rate b, so that this parameter measures the infectivity of the video. We ignore the part of the population that has immunity to the video -- perhaps because they are not in the target demographic group -- and only track those who could be or are infected. We also assume that on the time-scale that the video spreads, the population is constant in size, which seems realistic in this case. We can write down a system of differential equations for this picture:

dS / dt = - bSI
dI / dt = bSI

(The product SI is used as an analogy with chemistry's law of mass action for particles that knock into or interact with each other.)
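To see the two-class model in action, here is a minimal forward-Euler sketch in Python. The audience size, the single initial viewer and the value of b are made-up numbers, not data, and `simulate_si` is a hypothetical helper. It also shows numerically that S + I never changes, which is the fact used next to reduce the system to a single equation.

```python
def simulate_si(s0, i0, b, dt=0.01, steps=2000):
    """Forward-Euler integration of dS/dt = -bSI, dI/dt = bSI.

    Returns a list of (t, S, I) tuples. Euler is used for transparency only;
    a dedicated ODE solver would be more accurate for coarse time steps.
    """
    s, i = float(s0), float(i0)
    history = [(0.0, s, i)]
    for step in range(1, steps + 1):
        new_infections = b * s * i * dt  # mass-action term bSI over one step
        s -= new_infections
        i += new_infections
        history.append((step * dt, s, i))
    return history


# Hypothetical numbers: 1,000 people in the target audience, 1 initial viewer.
trajectory = simulate_si(s0=999, i0=1, b=0.001)
for t, s, i in trajectory[::500]:
    print(f"t={t:5.1f}  S={s:7.1f}  I={i:7.1f}  S+I={s + i:7.1f}")
```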
We notice that dS / dt + dI / dt = 0, which means that S + I = constant; call it N. In other words, if we know the number of Infectives, we automatically know the number of Susceptibles -- it is just N - I. Therefore, we don't have to keep a separate tally of the change in S and can eliminate the first equation. Substituting S = N - I into the second equation, our system becomes just:

dI / dt = bI(N - I)

Let's make a natural change of variables:

i = I / N
g = bN

So, i is the fraction of the population that is Infective, and the growth rate g has units of inverse time (where b had units of 1 / (people * time)). The equation now has only one variable and one parameter:

di / dt = gi(1 - i)

This is the famous logistic equation, which you may already have seen in the context of saturating population growth or the spread of a favored allele to fixation. (The analogies between these three processes are reflected in their being modeled by the same equation, which underscores the importance of formalizing your intuition.) At equilibrium, the fraction that is Infective does not change, so di / dt = 0. This happens when either i = 0 or i = 1. When i is between 0 and 1, di / dt is positive, so as long as i is not exactly 0, i will increase as time increases and will ultimately end up at i = 1. In other words, i = 0 is an unstable steady state, since a small increase will push it to i = 1, which is stable. This model may seem simplistic since it implies that every single Susceptible will eventually see the video. But that's not so unrealistic when you recall that we're only considering the population of the video's target audience -- in 1993, how many teenagers who had TVs in their homes never, ever saw that Blind Melon video with the bee girl? Some videos may have larger or smaller target audiences, i.e. larger or smaller values of the parameter N.
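The logistic equation has a well-known closed-form solution, i(t) = 1/(1 + ((1 - i0)/i0)e^(-gt)), where i0 is the fraction infected at t = 0. The Python sketch below evaluates it for made-up values of g and i0. It also checks the sign of di/dt between the two equilibria, mirroring the stability argument above. The helper names are illustrative only.

```python
import math


def logistic_fraction(t, g, i0):
    """Closed-form solution of di/dt = g*i*(1 - i) with i(0) = i0."""
    return 1.0 / (1.0 + (1.0 - i0) / i0 * math.exp(-g * t))


def growth_rate(i, g):
    """Right-hand side g*i*(1 - i): zero at i = 0 and i = 1, positive in between."""
    return g * i * (1.0 - i)


g = 0.8     # hypothetical growth rate g = bN, per day
i0 = 0.001  # hypothetical: 0.1% of the target audience has seen it at t = 0
for day in (0, 5, 10, 15, 20):
    print(f"day {day:2d}: fraction who have seen it = {logistic_fraction(day, g, i0):.4f}")

# di/dt > 0 for every i strictly between 0 and 1, so the fraction always moves
# away from i = 0 (unstable) and towards i = 1 (stable).
print([round(growth_rate(x, g), 3) for x in (0.0, 0.25, 0.5, 0.75, 1.0)])
```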
Getting and Analyzing Data
Without funding, surveying a large random sample of the target audience is impractical. Instead, we can measure a good-enough proxy: the view count for a YouTube video, tracked over time. Depending on how rapidly you think it will increase, you may want to measure it every 6 hours, or once a day. If it grows logistically, it should first accelerate, then keep increasing but decelerate until it more or less plateaus, like the picture on the Wikipedia page for the logistic function. Ideally, you want to track a video that is the only one of its kind -- if there are multiple copies of the same video, that complicates things somewhat, but you might ignore that (see Further Avenues). For example, if a YouTube celebrity is able to curb reproductions of their videos, you can simply wait until their channel puts out a new video. The infectious process would go something like, "Omigod, So-and-So just put out a new video -- you have to see it!" This would be easier if they update fairly infrequently, so that word-of-mouth transmission is the primary route of infection. Another idea is to wait for the music video of a popular song to come out. This requires that you be pretty savvy about music trends, and here the potential of multiple copies is even more serious, as fans download it and upload it themselves. Fad news items are another source, like when that retarded brat got tasered in the UCLA library. Again, multiple copies of it will probably appear.

Assuming you got something like logistic growth, here's how you estimate the parameters N and b. N is just whatever the plateau value seems to be, so you'll have to wait for the count to level off first.
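Once N has been read off the plateau, the rest of the estimation is a straight-line fit of ln((N - I)/I) against t, as described just below. With b defined per person, as in the Details section, the slope of that line is -bN (that is, -g), so b is recovered as -slope/N. Here is a minimal Python sketch of the fit. The view counts are synthetic, generated from a known logistic curve so you can see the fit recover its parameters. `fit_infectivity` is a hypothetical helper, not part of any library.

```python
import math


def fit_infectivity(times, views, n_plateau):
    """Ordinary least-squares line through ln((N - I)/I) against t.

    Returns (slope, intercept, b). The slope is -bN (= -g), so the per-person
    infectivity is b = -slope / N. Points at or above the plateau are dropped,
    because ln((N - I)/I) is undefined there.
    """
    pts = [(t, math.log((n_plateau - v) / v))
           for t, v in zip(times, views) if 0 < v < n_plateau]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept, -slope / n_plateau


# Synthetic, noise-free view counts from a logistic curve with N = 50,000,
# b = 2e-5 (so g = bN = 1 per day) and I0 = 100, sampled every 6 hours.
N, b_true, i0 = 50_000, 2e-5, 100
times = [0.25 * k for k in range(40)]
views = [N / (1 + (N - i0) / i0 * math.exp(-b_true * N * t)) for t in times]

slope, intercept, b_hat = fit_infectivity(times, views, N)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, b = {b_hat:.2e}")
```

With real view counts the points will scatter around the line, so the fitted b is an estimate. Counts that sit right at the plateau are discarded by the helper.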
It can be shown that the solution to the logistic equation can be rearranged to yield:

ln ((N - I) / I) = - bNt + ln ((N - I0) / I0)

where N is the max view count, I0 is the initial view count -- pick some small number -- I is the view count at time t, and b is the infectivity rate. So after you've got a concrete number for N and I0, you'd plot (N - I) / I on a ln scale. It will be a linear function of t, with y-intercept = ln ((N - I0) / I0) and slope = - bN (that is, - g, since g = bN). Divide minus the slope by N to see what b turns out to be. Then compare values of N and b for different videos. If N is larger for one, that means the target audience is larger (ignoring the fact that a single person may watch a given video multiple times -- that's true for any video, no matter the target audience's size). If b is larger for one, that means it's more infectious.

Further Avenues
If you really kept this going long-term, you could try to classify the videos you're tracking by category and do an analysis of variance or something to see what accounts for the variation in the target audience size and in the infectiousness of a video. Intuitively, we expect more sensational videos to have higher b -- but a hard analysis might tell us more concretely what types of things count as more sensational. We have guesses about that, but we need data to see if those guesses are right. To make the model a bit more complex, you could introduce a stochastic component to the growth equation. Right now, it is deterministic: as long as the ball gets rolling, everyone in the target audience will see the video. But when few people have seen the video, chance effects could push one ball up while letting another ball stay put. This is like when several copies of a favored allele are introduced into a population -- some will be lost by drift, while another may be propelled quickly by drift at the start, after which point the deterministic equations take over.
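One minimal way to add chance to the model is a Gillespie-style simulation of the same S-to-I process, sketched in Python below. This is a standard stochastic version of the SI model, not something specified above. Each Susceptible-Infective pair produces a new viewer at rate b, so the waiting times between new viewers are random. The audience size and b are hypothetical. Across runs, compare the time for half the audience to see the video. The spread shows how much the early, small-number phase alone can speed up or slow down otherwise identical videos.

```python
import math
import random


def time_to_half(n, b, i0=1, seed=None):
    """Gillespie simulation of the stochastic SI model.

    The total infection rate is b*S*I, so the waiting time to the next new
    viewer is exponential with that rate. Returns the time at which half of
    the audience of size n has seen the video.
    """
    rng = random.Random(seed)
    s, i, t = n - i0, i0, 0.0
    while i < n / 2:
        t += rng.expovariate(b * s * i)
        s, i = s - 1, i + 1
    return t


n, b = 1000, 0.001  # hypothetical audience size and infectivity (g = bN = 1)
deterministic = math.log(n - 1) / (b * n)  # logistic half-time when I0 = 1
runs = [time_to_half(n, b, seed=k) for k in range(10)]
print(f"deterministic half-time: {deterministic:.2f}")
print("stochastic half-times:  ", [round(x, 2) for x in runs])
```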
You would model it just like the frequency of a favored allele under the combined effects of drift and directional selection. In the context of viral videos, consider multiple copies of the same music video (again due to fans downloading the video from the official channel and uploading it to their own channel). As the target audience searches for this video, chance effects may propel one of the copies up very quickly while letting other copies languish with low view counts. In this case, the copy that ends up dominating the market increases even more rapidly than under the purely deterministic model, because it got a lucky big initial boost through sheer chance. This is what makes it somewhat inappropriate to compare a video that has only one copy with a video that has multiple copies. The winner in the latter group will appear more infectious than the former, since it increases much faster, but chance accounts for part of that faster increase.

[1] MIT's OpenCourseWare site has a great mathematics section that allows you to teach yourself or brush up on these areas. Especially useful is the course on differential equations, which has a full set of video lectures, solved problem sets, exams, and helpful Java applets. An easy-to-use phase plane applet is pplane.

Labels: do it yourself studies
Monday, May 05, 2008
Get off your ass and do this study: Introductory pep talk
posted by agnostic @ 5/05/2008 01:19:00 PM
I was recently directed to this panegyric on Wikipedia, which claims that editing Wikipedia is a better use of the cognitive surplus that might otherwise be spent watching TV. Like 99% of technology pundits, the author is so out of touch with reality that it is not worth taking him to task in depth. Instead, reading it has moved me to begin a regular column in which I propose a fairly simple study that someone could carry out to increase our understanding of the world.
In fairness, it is often tough to think of a study to do, or how you would concretely carry it out. But since my soul is a font of generosity, I'm literally giving ideas away. We only have so much time and effort to invest in a project, so I have plenty of ideas that I just don't have time to pursue in any depth. Obviously I will keep what I think are the more original or important ones for myself, but there are several reasons why pursuing seemingly unoriginal ideas is still useful:

1) It gives you good practical experience. If you've never tried to find a good dataset that would answer your question, if you've never tried to analyze and summarize the data, and if you've never interpreted these findings in context for the target audience -- well, time to start.

2) Many supposedly established findings have the smell of academic urban legends because they are based on a single study that used an unimpressive sample size and didn't take into account some obvious confounding factors. Yet once it gets cited, it takes on a life of its own, as no one reads the original but simply "knows that study X showed Y." Replication studies are crucial to figure out if we were right.

3) Most published articles aren't terribly original anyway -- "here's yet another example of natural selection at work!" Still, the more astounding the mountain of evidence becomes, the more convinced we become that we are right. There are probably diminishing returns, though: we don't need yet another study showing that cognitive abilities all correlate with each other, but how pervasive is the influence of IQ -- do smart vs. dumb people prefer different types of art?

4) More mundane studies are easier to carry out, so you're not intimidated by the prospect of hunting down a solution to a Great Big Problem. (And if you liked chasing after Great Big Problems, you'd probably already be in academia or a private institute doing that, or preparing to do so in the near future.)
5) If the original study or idea was done a while ago, improved technology may allow you to take a more in-depth look at it. For example, computers were pretty pathetic in the 1960s, and I'll bet there are scores of dusty studies that would benefit from the power of modern home computers.

6) For mathematical models, the properties of a particular model may be so well known that you couldn't hope to contribute anything new on the abstract level. However, you could provide a novel interpretation of it by showing how it also models a phenomenon that no one has applied it to before. This is especially true for fields where the experts don't have much training in modeling, and these tend to be the fields that study human beings. Sociology is a perfect example -- here's a field that assumes the primary unit of society is the group, and that groups conflict and interact, while ignoring the individual differences within each group. This isn't a slight to the field, since there are group dynamics. Sociology cries out for differential equation models, where you ignore individuals and track classes of things, and typically only two or three classes!

7) For the studies that I will propose, the data would not be hard to collect, although the process from start to finish may be laborious (hey, that's life). So, I will not suggest studies that require fancy equipment, hundreds of unpaid volunteer subjects, and so on. If it is applying a well understood mathematical model to some new phenomenon, almost all of the work will already be done. However, I realize that we do have academic readers too, or readers who have graduate student friends in need of a study to publish, so occasionally I will propose something that would require access to many volunteers.

Now, I don't have anything against editing Wikipedia or blogging per se, but let's get very real: most of it is a waste of time, which is why almost no academics do it.
There are exceptional areas of Wikipedia, and there are exceptions in the blogosphere -- well, obviously we are, and so are bloggers like Steve Sailer, Audacious Epigone, Half Sigma, Inductivist, and others who obtain and analyze data to answer a question or hunch. If the blog is just a hobby, an afterthought after real work is being done in real life (as with my personal blog), that's OK too. What I want to see die is the practice of intellectual masturbation, where you only fool your brain into thinking that fruitful work is being done. "Participation" per se is no valid criterion for success -- I can participate in an act of masturbation, perhaps even while participating with others in a circle jerk, but I've only really accomplished something when I've contributed to increasing the fertility rate. Fortunately for everyone, though, the real world offers an abundance of problems begging to be fertilized by the seed of your brain -- get in there and tear that shit up.

Labels: do it yourself studies
For many years, Grey Squirrels (an introduced North American species) have been driving out the indigenous Red Squirrel over most of mainland Britain. But now it is reported that a mutant black variety of the Grey Squirrel is threatening to displace the Greys. Apparently, the black ones have higher testosterone levels, are more aggressive, and are more attractive to the lady squirrels. (Don't worry, our White Nationalist readers, this isn't a parable. I think.)
Joking apart, the real interest of this is that it seems to be a case of a single mutation with a relatively conspicuous phenotypic effect having a strong evolutionary advantage, somewhat contrary to Darwin/Fisher orthodoxy. There is of course another example in the case of industrial melanism.