Thursday, May 08, 2008

Tendentious Tom Wolfe   posted by Razib @ 5/08/2008 10:44:00 PM

Over at The Corner they are discussing an interview series with Tom Wolfe. Wolfe claimed that Charles Darwin was a plagiarist. Derb pushed back. Since they keep talking about the interview, I decided to watch. A few notes....

Wolfe says that Darwin was an obscure man who had a famous grandfather (Erasmus, I assume, rather than Josiah Wedgwood). I don't think this is really right. Unfortunately, we can't run an experiment that deletes Charles Darwin's contribution to science, but before he became the great evolutionary thinker he was a prominent travel writer. The Voyage of the Beagle went through several editions; I'm not sure we would remember Charles Darwin today on that basis alone (how many popular Victorian authors do we remember now?), but he was not an obscure figure in mid-19th century England.

Then he says that E. O. Wilson believes everything is genetically predetermined: that we have no free will and can't change our decisions. Wilson, especially during the Sociobiology years, offered up a few naive quotes, but as anyone who has wrestled with heritability knows, a simple affirmation of genetic determinism is so banal as to be trivial. Wolfe is either overreading, or not communicating the nuance of his genuine thinking.

After this Wolfe goes on to make a distinction between genetic theory and neuroscience, whereby the former is literature and the latter is science. He also suggests that the three leading lights of genetic theory are totally unversed in the workings of the brain. Who are these leading lights? E. O. Wilson, Daniel Dennett and Richard Dawkins. Wolfe correctly notes that by training Dennett is a philosopher and Dawkins is an ethologist, so it is peculiar that he considers them leading lights. Wilson is more properly a field ecologist who generally leaves theoretical work to a collaborator (Robert MacArthur or Charles Lumsden, for example). Since Dennett is the co-director of the Center for Cognitive Studies at Tufts, I assume he stumbles onto neuropsychological material now and then. Obviously Wolfe has fallen into the all-too-common trap of conflating popularizers with eminent researchers; easy if you don't do your homework. John Maynard Smith, W. D. Hamilton and Richard Lewontin are evolutionary geneticists of note; much of Dawkins's thinking is derivative of the first two, Wilson was influenced by Hamilton, and Dennett seems clearly to have had evolution predigested for him by Dawkins. The emphasis on the evolutionary part is critical; from what I know, molecular genetics along its biophysical margins does bleed into neuroscience quite a bit. One of the founding fathers of modern molecular genetics, Francis Crick, spent his last years focused on neuroscience. Wolfe knows this, so he really didn't mean to dismiss all genetics as literature, just evolutionary biology. I won't object too strenuously to this characterization, but I will submit that neuroscience today is too young a discipline to be taking on airs. There are many facts strewn about, but even the skeleton of a theoretical superstructure to scaffold them into a coherent whole does not seem to exist.

Finally, you can check out the second-to-last interview segment (the last has not been put up yet), in which Wolfe claims that the emergence of language resulted in a post-evolutionary age for our species. This is false, of course; having dismissed genetic theory as literature, he obviously hasn't been keeping up with the literature! The whole line of thinking struck me as incoherent, so perhaps I'm missing something. Wolfe also makes a host of extremely disputable assertions about unique human tool use, the rationality of humans, and the lack of any relation between modern status games and evolutionary genetics.

In any case, I only checked it out because of the gushing in The Corner. I'm a dilettante myself so I wasn't going into it looking to pick out errors, but these seemed to be worthy of correction since obviously many people look to Tom Wolfe as an Authority and keen observer of the world. I'll probably check out his novels; I'm sure he makes up for his sloppy characterization of science with a sharp eye toward fluid prose....

Update: Derb weighs in again.


Notes on Sewall Wright: Genetic Drift   posted by DavidB @ 5/08/2008 06:02:00 AM

Continuing my series of notes on the work of Sewall Wright, this one deals with the subject of genetic drift. I had originally planned to call this note 'Inbreeding and the decline of genetic variance', but anyone interested in the matters covered here, and searching for them on the internet, is far more likely to search for 'genetic drift'. This is one of the subjects most closely associated with Wright, to the extent that genetic drift was formerly often known as the 'Sewall Wright Effect'. My main aim is to help people follow Wright's own derivation of his key results, and to clarify the relationship between genetic drift and inbreeding.



I will refer mainly to the papers reprinted in the collection Evolution: Selected Papers (ESP), and especially the monumental 1931 paper 'Evolution in Mendelian Populations' (EMP), which is available online here.
Anyone interested in Wright should also read William B. Provine's biography of him. If in these notes I occasionally make critical remarks on Provine, it should not detract from the general excellence of his book. See the References for details.

In an infinitely large population, in the absence of selection and mutation, the proportions of different gene types (alleles) in the population will remain unchanged indefinitely. But real populations are never infinitely large, and gene frequencies will fluctuate to some extent by chance. As Wright put it in 1931, 'Merely by chance one or the other of the allelomorphs [alleles] may be expected to increase its frequency in a given generation and in time the proportions may drift a long way from the initial values' (ESP, p.107.)

The general nature of drift can be illustrated by the hackneyed example of coin tossing. If we simultaneously toss a number of 'fair' coins, and repeat the trial a large number of times, then the average proportion of heads, by the definition of a fair coin, will be 1/2, and the average number of heads per trial will be N/2, where N is the number of coins in a trial. More generally, suppose the probability of heads for each coin is always p, where p is any fraction between 0 and 1. The long-term average number of heads per trial will then be Np. But on any particular trial, purely by chance, the number of heads is likely to deviate from the average. It can be shown that the variance of the number of heads per trial is Npq, where q = 1 - p. [Note 1] If we are interested in the proportion of heads per trial (the number of heads divided by N), it can be shown that the variance of the proportion is pq/N. [Note 2] On each trial, the proportion of heads is therefore likely to deviate from the long-term average by an amount whose typical size is governed by pq/N (the standard deviation of the proportion is the square root of pq/N).
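To see these formulas at work, here is a minimal Python sketch. It is my own illustration, not anything from Wright. It tosses N coins many times and compares the observed mean and variances with Np, Npq and pq/N. The function name simulate_heads and the values N = 50 and p = 0.3 are arbitrary choices.

```python
import random
import statistics

def simulate_heads(n_coins, p, n_trials, seed=1):
    """Toss n_coins coins (each heads with probability p) in each of
    n_trials trials; return the number of heads in each trial."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n_coins)) for _ in range(n_trials)]

N, p = 50, 0.3
q = 1 - p
counts = simulate_heads(N, p, n_trials=20000)
proportions = [x / N for x in counts]

# Observed values should be close to the theoretical ones.
print("mean heads:", statistics.fmean(counts), "theory:", N * p)
print("variance of heads:", statistics.pvariance(counts), "theory:", N * p * q)
print("variance of proportion:", statistics.pvariance(proportions), "theory:", p * q / N)
```

With 20,000 trials the observed variances should typically land within a few percent of Npq and pq/N.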

Departing now from the real behaviour of coins, let us suppose that the value of p on each trial is determined by the proportion of heads in the previous trial. The proportion of heads will then drift up and down in a 'random walk' pattern, with the size of the 'steps' being inversely related to the size of N. If N is very large, each step will be small, but if N is small the steps may be relatively large. If, by chance, the proportion of heads in a trial ever reaches 1 or 0, then p for all future trials will also be 0 or 1, and heads (or tails) will be permanently 'fixed'. This is very likely to happen sooner or later.
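The feedback rule is easy to simulate. This is a sketch under my own assumptions, and drift_until_fixed and its parameters are hypothetical names. Each trial uses the previous trial's proportion of heads as the new p, and the loop stops once heads or tails is fixed.

```python
import random
import statistics

def drift_until_fixed(n_coins, p0, seed=0, max_trials=100_000):
    """Toss n_coins coins repeatedly. On each trial the heads probability is
    the proportion of heads in the previous trial. Stop when that proportion
    reaches 0 or 1 (fixation). Returns (number of trials, final proportion)."""
    rng = random.Random(seed)
    p = p0
    for t in range(1, max_trials + 1):
        heads = sum(rng.random() < p for _ in range(n_coins))
        p = heads / n_coins
        if p in (0.0, 1.0):      # tails or heads permanently fixed
            return t, p
    return max_trials, p

# Hypothetical experiment: 100 runs each for a small and a larger N,
# all starting from p = 0.5.
for n in (10, 100):
    runs = [drift_until_fixed(n, 0.5, seed=s) for s in range(100)]
    print(n, "mean trials to fixation:", statistics.fmean([t for t, _ in runs]))
```

Every run should end in fixation. Fixation should typically take longer for the larger N, because each step of the random walk is smaller.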

Genes are not coins, so the analogy is not perfect. In a population of genes, the replication of each gene is not a simple matter of 'heads or tails', as each gene may have 0, 1, 2 or more descendants. Also, while the number of coins is assumed to be fixed at N, a biological population is seldom absolutely fixed in size. Nevertheless, there are important similarities. In the absence of selection, it is a matter of chance whether or not a particular gene enters an egg or sperm and then survives to reproduce again in the next generation. Suppose that there are two alleles, A and B, at a locus, with the frequencies p and q in the population. In the absence of selection and mutation, these will also be the expected frequencies in the next generation. In a population of N diploid individuals, there are 2N genes in the population at each locus. In a stable population there will still be 2N genes in the next generation. We can schematically represent reproduction as a 'trial' consisting of 2N events, each involving the random choice of a gene to enter the new generation, with probabilities of p and q for the 'outcomes' A and B at each choice. The probabilities of obtaining the various possible combinations of A's and B's are then given by the expansion of the binomial (p + q)^(2N). Wright himself uses this model of the process on several occasions, e.g. ESP p.289. While this may seem a very artificial way of viewing reproduction, it is not as unrealistic as it seems. Suppose that N diploid individuals each have the same number of offspring, the number being large, and certainly large enough to ensure that there are at least 2N copies of each allele among the population of offspring. Then select N of the offspring as 'survivors', completely at random, which is analogous to survival in a resource-limited population without natural selection.
The probability of the various possible gene frequencies will then be approximately as in the schematic model (with the complication that in a finite population of offspring the probability of selecting an offspring with a given allele will be affected by the number already selected, e.g. if nearly all the alleles of a given type have, by chance, already been selected, the probability of selecting another one will be much reduced).
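The parenthetical complication is essentially the difference between sampling with and without replacement. Here is a simplified sketch. It treats the 2N genes as drawn one by one from a finite pool of offspring genes, ignores the pairing of genes within individuals, and uses made-up pool sizes. It compares the variance of the number of A genes under the binomial model with the variance when drawing from a finite pool, which follows the hypergeometric distribution.

```python
from fractions import Fraction

def binomial_variance(n_draws, p):
    """Variance of the number of A genes when each of n_draws genes is
    chosen independently (with replacement) with probability p of being A."""
    return n_draws * p * (1 - p)

def finite_pool_variance(n_draws, pool_size, n_a_in_pool):
    """Variance of the number of A genes when n_draws genes are chosen
    without replacement from a pool containing n_a_in_pool A genes
    (the hypergeometric variance)."""
    p = Fraction(n_a_in_pool, pool_size)
    return n_draws * p * (1 - p) * Fraction(pool_size - n_draws, pool_size - 1)

two_n = 20            # 2N genes in the next generation (N = 10 diploids)
p = Fraction(1, 2)    # current frequency of allele A
for pool in (40, 200, 2000, 20000):   # hypothetical offspring gene-pool sizes
    print(pool, float(finite_pool_variance(two_n, pool, int(pool * p))),
          float(binomial_variance(two_n, p)))
```

The finite-pool variance is always somewhat smaller, and it approaches the binomial value 2Npq as the pool of offspring grows. That is why the schematic binomial model is a fair approximation when each parent has many offspring.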

Nothing has so far been said about inbreeding. Moreover, the processes just described would apply not only to sexually reproducing organisms but also to asexually reproducing organisms and genetic elements, such as mitochondria and Y chromosomes, where the possibility of inbreeding does not arise. But in Wright's treatment of the subject, references to inbreeding are frequent, and the rate of genetic drift is derived by an argument which seems to depend on the existence of inbreeding. For example, on p.165 of ESP he says: 'If the population is not indefinitely large, another factor must be taken into account: the effects of accidents of sampling among those that survive and become parents in each generation and among the germ cells of these, in other words, the effects of inbreeding'. Such statements are likely to give the impression that inbreeding is fundamental to the process of genetic drift. How can this be?

The explanation is that in a sexually reproducing population a convenient measure of genetic drift is the changing proportion of homozygotes, and the existence of homozygotes is related to inbreeding. If a given allele has ultimately arisen from a single mutation, then homozygous copies of that allele can only occur in the same individual if that individual is descended from the same ancestor by at least two paths, which is by definition inbreeding. Even if the allele has more than one origin, the level of inbreeding in the population will affect the level of homozygosis. But as the example of asexual organisms shows, there is no necessary connection between genetic drift and inbreeding. R. A. Fisher, in his different approach to the subject, does not (I think) ever refer to inbreeding. Confusing the two things would be like confusing the study of heat with the study of thermometers.

It may therefore be wondered why Sewall Wright took his particular approach. The answer may be partly that his mathematical training was less advanced than Fisher's, so that he was obliged to use less mathematically sophisticated methods. This has the advantage that his work on the subject is in principle accessible to a wider range of readers. Moreover, on one important point Wright's methods got the correct result where Fisher, through neglecting a quantity which turned out not to be negligible, got the wrong result by a factor of 2 (as Wright never tired of pointing out). But I think the main reason for Wright's approach was that he first investigated genetic drift in the context of agricultural breeding, where livestock are often closely inbred. In this context one of the main concerns is to quantify the loss of genetic variation in each particular inbred strain. It was therefore natural for Wright to approach the subject by measuring the loss of heterozygosis associated with inbreeding. When he later turned to consider genetic drift in natural populations, where mating is approximately random, he continued to use the methods he had already devised for the study of inbreeding in agriculture. (I will not now explore the precise meaning of Wright's coefficients of inbreeding, the famous F-statistics, which I hope to deal with in another note.)

Wright's most important finding was that heterozygosis (the proportion of heterozygotes in the population) tends to decline at a rate of 1/2N per generation, where N is the diploid population size. (This assumes that males and females each have a population size of N/2.) Most textbooks give a simplified version of Wright's derivation of this result. Wright's own treatment, in EMP, is difficult to follow, and in view of its importance I have provided a guide in Note 3 below.

Even the simplified textbook versions are not always very clear, and I do not know of any wholly satisfactory account. Key assumptions are often not clearly stated or justified. Two relatively good accounts are those of Falconer and Maynard Smith (see References). I will outline a derivation based mainly on Falconer (with some modifications).

Let us assume there is a population of N diploid individuals. Generations are separate. There is no mutation or natural selection in the period under consideration. The n'th generation is designated Gn, the previous generation Gn-1, the following generation Gn+1, and so on. The probability that the two genes at the same locus in an individual of Gn are identical is designated CIn, where CI stands for 'coefficient of inbreeding'. (For my approach here it is not necessary to specify whether the genes are identical 'by descent'.) The probability that two randomly selected genes at the same locus in two different individuals of Gn are identical is designated CKn, where CK stands for 'coefficient of kinship'.

For the simplest case, consider a population of hermaphrodites which are capable of self-fertilisation and mate completely at random, including with themselves. (This would be approximately true of some marine invertebrates which release gametes into the water.) From the assumptions of random mating and non-selection it follows that any individual in Gn is equally likely, with probability 2/N, to be a parent of any individual in Gn+1 (since in a stable population each individual will on average have 2 out of the N surviving offspring). It does not follow that, if we select at random an individual in Gn+1, and then select another, there is a probability of 2/N that the second individual will have the same father (or mother) as the first. For example, if each individual in Gn produced exactly 2 surviving offspring, the probability that a second randomly selected individual in Gn+1 had the same father (or mother) as the first would only be 1/(N-1). To get a probability of 2/N we require an additional assumption, which is technically satisfied by specifying that the number of offspring for individuals follows a Poisson distribution. (This assumption is mentioned by Maynard Smith but not by Falconer.)

With these assumptions, it follows that CIn equals CKn. In the case of CIn, we select a gene at random in Gn, and then inquire whether the other gene at the same locus in that individual is identical. In the case of CKn, we select a gene at random in Gn, and then inquire whether another randomly selected gene at the same locus in a different randomly selected individual is identical to the first gene. But in both cases each gene is a copy of a gene taken absolutely at random from all the genes in Gn-1. The probabilities of identity are therefore the same, and CIn therefore equals CKn. By the same argument it follows that any two randomly selected distinct genes at a locus in Gn have the same probability of being identical, whether they are in the same or different individuals. If we call this probability CDn, we have CDn = CIn = CKn, for any value of n. But CIn can be broken down into two component probabilities. With probability 1/2N, the two genes at a locus in the same individual are copies of the very same gene in Gn-1, in which case they are certainly identical. In all other cases, therefore with probability 1-1/2N, they are copies of two distinct genes in Gn-1, in which case there is a probability CDn-1 that they are identical. But CDn-1 = CIn-1 (since the equality CDn = CIn applies for any value of n). The total probability CIn therefore comes to CIn = 1/2N + (1 - 1/2N)CIn-1. The coefficient of inbreeding in one generation is therefore derivable from the coefficient in the previous generation by a formula involving the addition of 1/2N. It can further be shown, with a little algebraic manipulation, that heterozygosis tends to decline by a factor of (1 - 1/2N) per generation (see Falconer p.64-5 for a proof).
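The recurrence can be checked numerically. This is a minimal sketch, assuming an arbitrary N = 25 and CI0 = 0. It iterates CIn = 1/2N + (1 - 1/2N)CIn-1 and tracks 1 - CIn, which measures the heterozygosis remaining relative to the start. The result matches the factor (1 - 1/2N) per generation, because rearranging the recurrence gives 1 - CIn = (1 - 1/2N)(1 - CIn-1).

```python
def inbreeding_series(n_individuals, ci0=0.0, generations=50):
    """Iterate CI_n = 1/(2N) + (1 - 1/(2N)) * CI_{n-1}; return the list
    [CI_0, CI_1, ..., CI_generations]."""
    step = 1 / (2 * n_individuals)
    series = [ci0]
    for _ in range(generations):
        series.append(step + (1 - step) * series[-1])
    return series

N = 25
ci = inbreeding_series(N)
remaining = [1 - f for f in ci]   # heterozygosis relative to generation 0
for n in (1, 10, 50):
    print(n, remaining[n], (1 - 1 / (2 * N)) ** n)
```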

If self-fertilisation is excluded, two genes in the same individual cannot be copies of the very same gene in the previous generation, so the analysis needs to be pushed further back. If mating between different individuals is completely random, including siblings, then CIn = CKn-1. If mating between siblings is excluded, but otherwise random, CIn = CKn-2, and so on. But it is always possible to express the 'coefficient of inbreeding' in one generation in terms of the coefficients in previous generations, and heterozygosis always tends to decline by a factor of (1 - 1/2N) per generation (assuming equal numbers of males and females).

The above argument, like Wright's own, measures the progress of genetic drift by the decline of heterozygosis and the associated increase in the coefficient of inbreeding. It should however be clear that this is not essential. If we wanted to study genetic drift in asexual haploid replicators, such as Y chromosomes, it would be possible to modify the derivation to use only coefficients of kinship, rather than inbreeding. More fundamentally, the process of genetic drift depends not on inbreeding but on the existence of variance in reproductive success. Some genes have no descendants, some have only one, and some have more than one. Over the course of time, more and more lines of descent die out, and the surviving genes are collectively descended from fewer and fewer original ancestors. In a sexually reproducing population this also leads to increased levels of inbreeding, in a broad sense. If there were no such variance in reproductive success (if every gene had exactly the same number of surviving 'offspring') there would be no genetic drift. Among diploids, the variance in replication of individual genes is due to two factors: the variance in the number of surviving offspring, and the random allocation of genes to gametes in the process of meiosis. Even if every diploid individual had exactly the same number of surviving offspring, there would still be variance in the replication of individual genes for the second reason. As for the variance in the number of offspring, the assumption of a Poisson distribution is probably not unreasonable in many species, but there could be departures from it in both directions (i.e. either greater or smaller variance). There might also be different variance in the two sexes.
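The point that drift comes from variance in reproductive success, not from inbreeding, can be illustrated with a toy simulation of haploid genes with no sex at all. This is my own sketch, and surviving_ancestors is a hypothetical name. Each gene carries the label of its founding ancestor. With equal_offspring=True every gene leaves exactly one copy. Otherwise, each gene in the new generation copies a parent gene chosen at random, so the number of copies per parent varies roughly as a Poisson distribution with mean 1.

```python
import random

def surviving_ancestors(n_genes, generations, equal_offspring=False, seed=0):
    """Return how many of the original n_genes founder genes still have
    descendants after the given number of generations.

    equal_offspring=True: every gene leaves exactly one copy (no variance).
    equal_offspring=False: each new gene copies a randomly chosen parent
    gene, so copies per parent vary by chance."""
    rng = random.Random(seed)
    labels = list(range(n_genes))   # founder label carried by each gene
    for _ in range(generations):
        if equal_offspring:
            labels = labels[:]      # each gene copied exactly once
        else:
            labels = [rng.choice(labels) for _ in range(n_genes)]
    return len(set(labels))

print(surviving_ancestors(20, 200, equal_offspring=True))
print(surviving_ancestors(20, 200))
```

With 20 genes and 200 generations, the no-variance run keeps all 20 founder lineages. The random run should be down to a handful, very likely a single one.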
For example, among animals like elephant seals, the variance among females might be rather small, because all females have a low but steady rate of reproduction, whereas among males the variance would be much higher, as many males have no offspring at all, while a few have a large number. Wright takes account of some of these factors in his discussions of 'effective population size'.

This note has only dealt with a few aspects of Wright's work on genetic drift. I have tried to identify the underlying assumptions and (in Note 3) to clarify Wright's most important derivation. None of this says anything one way or the other about the actual importance of genetic drift in evolution. What should be clear is that genetic drift is a weak force except in very small populations, since its effect is inversely proportional to population size. In large populations it would be overpowered by modest rates of selection or migration. (The other factor to consider is mutation, but except in large populations this is an even weaker force than drift, as mutation rates are typically of the order of only 1/100,000 per gene per generation.) I hope to deal with some of these issues in further notes.


Note 1: Suppose we toss a single coin K times, where K is a large number. If the probability of heads is p, the total number of heads will be Kp and the average number of heads per toss will be Kp/K = p. But on each particular trial (the toss of a single coin) there can only be 1 or 0 heads, so we will have Kp trials with the deviation value (1 - p), and K(1 - p) trials with the deviation value (0 - p) = - p. Using the abbreviation q for (1 - p), the variance of the number of heads for trials consisting of a single coin toss is therefore [Kpq^2 + Kqp^2]/K = pq^2 + qp^2 = pq(q + p) = pq. It may seem odd to speak of the variance of the number of heads in trials where there is only one coin per trial, but in principle it is legitimate, and it enables us easily to derive the variance of the number of heads where the trials involve N coins. Since the variance of the sum of a number of independent numerical values equals the sum of the variances of the values individually, the variance of the number of heads in N independent coin tosses, each with variance pq, is simply Npq.
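Note 1's result can also be checked by brute force. This sketch uses exact fractions. It computes the variance directly from the binomial probabilities C(N, k)p^kq^(N-k), rather than by adding variances. The value p = 3/10 is an arbitrary choice.

```python
from fractions import Fraction
from math import comb

def bernoulli_variance(p):
    """Variance of the number of heads (0 or 1) in a single toss."""
    q = 1 - p
    # deviation (1 - p) with probability p, deviation (0 - p) with probability q
    return p * (1 - p) ** 2 + q * (0 - p) ** 2

def binomial_variance_exact(n, p):
    """Variance of the number of heads in n independent tosses, computed
    directly from the binomial probabilities."""
    q = 1 - p
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

p = Fraction(3, 10)
print(bernoulli_variance(p))           # 21/100, i.e. pq
print(binomial_variance_exact(8, p))   # 42/25, i.e. 8pq
```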

Note 2: The average proportion of heads per trial of N coin tosses, each with probability p, is in the long term p. If X is the number of heads in any particular trial of N coins (where X is a variable), the deviation values of the proportions will be of the form X/N - p = (X - Np)/N, and the variance of the proportions in K trials will be S[((X - Np)/N)^2]/K = (1/N^2)S[(X - Np)^2]/K, where S denotes summation over the K trials. But S[(X - Np)^2]/K is the variance of the number of heads, which has been shown to equal Npq, so the variance of the proportion is Npq/N^2 = pq/N.
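The same brute-force check works for the proportion X/N. Again, this is a sketch with exact fractions and an arbitrary p.

```python
from fractions import Fraction
from math import comb

def proportion_variance_exact(n, p):
    """Variance of X/N, the proportion of heads in N tosses, computed from
    the binomial distribution of X."""
    q = 1 - p
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    # deviation of the proportion from its mean: X/N - p = (X - Np)/N
    return sum(((k - n * p) / n) ** 2 * pk for k, pk in enumerate(probs))

p = Fraction(3, 10)
for n in (1, 5, 20):
    print(n, proportion_variance_exact(n, p), p * (1 - p) / n)
```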

Note 3: This is a commentary on pages 108-110 of ESP, which reprints pages 107-109 of the original paper EMP (the near identity of pagination is just a coincidence). I will mainly be concerned with page 109 of ESP, where Wright derives his fundamental results for the decline of heterozygosis. In following the derivation it is necessary to refer back frequently to the definitions at the bottom of page 108.

Wright assumes that the sexes are separate (so there is no self-fertilisation) but that mating is otherwise completely random, including between siblings. He assumes that there are Nm breeding males and Nf breeding females. With random mating, he states that the proportion of matings between full siblings is 1/NmNf. This evidently assumes that there is a probability of 1/Nm that two mates have the same father, and an independent probability of 1/Nf that they have the same mother (note that m and f stand for male and female, not mother and father). This is actually a strong assumption, which ought to be clearly stated. It assumes (a) that the number of offspring of individuals follows a Poisson distribution (or something similar) and (b) that parents have male and female offspring in the same proportions as in the population generally. This is not necessarily true: for example if some parents had a strong bias towards producing male or female offspring, the probability of mating between siblings would be reduced. (Wright does discuss some of these considerations in the section on 'The Population Number' at pp.111-12 of ESP.)

Wright then gives the proportion of matings between half siblings, and between all less closely related individuals. These depend on the same assumptions as for full siblings.

He then gives a formula for M, the correlation between mates in the current generation. Note that the formula is of the form a'^2b'^2[Z], where Z is a complicated expression in square brackets. From the definitions on p.108 we have a'^2b'^2 = {1/[2(1 + F')]}[(1 + F'')/2], so we have M = {1/[2(1 + F')]}[(1 + F'')/2][Z]. The expression Z can be derived by Wright's method of path analysis. The first component of Z deals with the case of mating between full siblings. If we label the siblings A and B, and their parents C and D, we have two 'direct' paths, ACB and ADB, and two 'indirect' paths, ACDB and ADCB, which involve the correlation M' between mates in the previous generation. Hence the coefficient (2 + 2M') for the first component. For half siblings A and B, there is one shared parent C and two non-shared parents D and E, so there is one direct path, ACB, and the three indirect paths ADCB, ADEB, and ACEB, giving the coefficient (1 + 3M'). For unrelated mates A and B, with the non-shared parents C, D, E and G (to avoid using F, which is already in use), we have no direct paths and four indirect paths, ACGB, ACEB, ADEB, and ADGB, giving the coefficient 4M'.

Next Wright derives an expression for F, the correlation between uniting gametes in the current generation. Here we must note from p.108 that F = b^2M, and b^2 = (1 + F')/2. Using the expression M = {1/[2(1 + F')]}[(1 + F'')/2][Z], we therefore have F = [(1 + F')/2]{1/[2(1 + F')]}[(1 + F'')/2][Z] = [(1 + F'')/8][Z]. With a little manipulation, and using the full expression for Z, this can be put in the form F = (1 + F'')[Nm + Nf - M'Nm - M'Nf + 4M'NmNf]/8NmNf. But now we should note that M' is the correlation between mates in the previous generation. We can therefore adapt the equation F = b^2M to get the corresponding equation for the previous generation, i.e. F' = b'^2M'. But b'^2 = (1 + F'')/2, so F' = [(1 + F'')/2]M', and therefore M' = 2F'/(1 + F''). Substituting 2F'/(1 + F'') for M' in the equation F = (1 + F'')[Nm + Nf - M'Nm - M'Nf + 4M'NmNf]/8NmNf, it follows by some grinding but essentially routine algebra that F = Q, where Q is the expression on the right of the second equation on page 109. Then using the definition of P, P', etc., in terms of F, F', etc., the third equation also follows by routine algebra.
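Because the algebra here is fiddly, a numerical cross-check may help. This sketch is my own reconstruction and rests on two assumptions. First, the mating-class proportions are the ones implied by the independence assumption discussed above: 1/NmNf full sibs, (Nm + Nf - 2)/NmNf half sibs, and (Nm - 1)(Nf - 1)/NmNf all others. Second, P = 1 - F. The code builds Z from those classes with the path coefficients (2 + 2M'), (1 + 3M') and 4M', and iterates F. It then checks that 1 - F agrees, generation by generation, with the recursion P = P'[1 - (Nm + Nf)/4NmNf] + P''(Nm + Nf)/8NmNf. Working the classes through, the M' terms in the numerator come out as M'(4NmNf - Nm - Nf).

```python
from fractions import Fraction

def next_F(F1, F2, Nm, Nf):
    """F for the current generation from F1 = F' and F2 = F'' (one and two
    generations back), built from the mating classes and path coefficients."""
    Nm, Nf = Fraction(Nm), Fraction(Nf)
    M1 = 2 * F1 / (1 + F2)                     # M' = 2F'/(1 + F'')
    full = 1 / (Nm * Nf)                       # proportion of full-sib matings
    half = (Nm + Nf - 2) / (Nm * Nf)           # half-sib matings
    other = (Nm - 1) * (Nf - 1) / (Nm * Nf)    # all less closely related matings
    Z = full * (2 + 2 * M1) + half * (1 + 3 * M1) + other * 4 * M1
    return (1 + F2) * Z / 8                    # F = [(1 + F'')/8][Z]

def next_P(P1, P2, Nm, Nf):
    """Third equation: P = P'[1 - (Nm+Nf)/4NmNf] + P''(Nm+Nf)/8NmNf."""
    c = Fraction(Nm + Nf, 8 * Nm * Nf)
    return P1 * (1 - 2 * c) + P2 * c

Nm, Nf = 5, 12                    # arbitrary numbers of breeding males and females
Fs = [Fraction(0), Fraction(0)]   # start with no prior inbreeding
Ps = [Fraction(1), Fraction(1)]   # corresponding P = 1 - F
for _ in range(10):
    Fs.append(next_F(Fs[-1], Fs[-2], Nm, Nf))
    Ps.append(next_P(Ps[-1], Ps[-2], Nm, Nf))
print(all(1 - f == p for f, p in zip(Fs, Ps)))   # True if the two routes agree
```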

This leaves the final death-defying leap to the fourth equation. This is not helped by the puzzling statement that we can equate P/P' to P/P''. This would imply that the proportional change per generation was not just constant but zero, and P/P'' must surely be a misprint for P'/P''. (The fact that this horrible error is not corrected or commented on in the ESP reprint leaves me wondering how closely Provine, as editor, has followed the details of Wright's text.) But even with this correction, it is far from obvious how Wright derives his fourth equation. I had given up hope of solving it until I was reading volume 2 of Evolution and the Genetics of Populations (EGP), and found a discussion of the simpler case of random mating hermaphrodites, which fills in a few gaps in the derivation (see EGP vol. 2, pp.194-5). First, it confirms the suspicion that P/P'' should be P'/P''. Second, it shows (or at least hints) how the problem can be reduced to a quadratic equation. Taking these hints, we can apply them to the fourth equation on p.109. First, rearrange and simplify the third equation to get P - P'[1 - (Nm + Nf)/4NmNf] - P''(Nm + Nf)/8NmNf = 0. Then divide through by P'' to get P/P'' - (P'/P'')[1 - (Nm + Nf)/4NmNf] - (Nm + Nf)/8NmNf = 0. But by assumption P/P' = P'/P'', so P/P'' = (P/P')(P'/P'') = (P/P')^2. We can therefore treat the equation as a quadratic of the form ax^2 + bx + c = 0, with x = P/P'. This can be solved by the standard method to get (as the larger of the two roots) P/P' = (1/2)[1 - (Nm + Nf)/4NmNf] + (1/2)root(1 + [(Nm + Nf)/4NmNf]^2). This is nearly Wright's fourth equation. For the final step, we take deltaP to mean P - P', so that -deltaP/P' = -(P/P' - 1) = 1 - P/P'. We therefore need only subtract 1 from the expression (1/2)[1 - (Nm + Nf)/4NmNf] + (1/2)root(1 + [(Nm + Nf)/4NmNf]^2), and then reverse the sign, to get Wright's fourth equation.
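The quadratic step can be checked numerically too. This sketch takes the third equation in the form P = P'[1 - (Nm + Nf)/4NmNf] + P''(Nm + Nf)/8NmNf, which is the form that yields root(1 + [(Nm + Nf)/4NmNf]^2) in the solution. Writing A = (Nm + Nf)/4NmNf, it computes the larger root of x^2 - (1 - A)x - A/2 = 0. It then compares that root with the ratio P/P' obtained by simply iterating the recursion. The function names and the values Nm = 3, Nf = 7 are my own choices.

```python
import math

def larger_root(Nm, Nf):
    """Larger root x = P/P' of x^2 - (1 - A)x - A/2 = 0, A = (Nm + Nf)/4NmNf."""
    A = (Nm + Nf) / (4 * Nm * Nf)
    return 0.5 * (1 - A) + 0.5 * math.sqrt(1 + A * A)

def iterated_ratio(Nm, Nf, generations=200):
    """Iterate P = P'(1 - A) + P''(A/2) from P'' = P' = 1 and return the
    final ratio P/P'."""
    A = (Nm + Nf) / (4 * Nm * Nf)
    p_prev2, p_prev = 1.0, 1.0
    for _ in range(generations):
        p_prev2, p_prev = p_prev, p_prev * (1 - A) + p_prev2 * A / 2
    return p_prev / p_prev2

Nm, Nf = 3, 7
x = larger_root(Nm, Nf)
print(x, iterated_ratio(Nm, Nf), 1 - x)   # 1 - x is -deltaP/P'
```

The iteration settles on the larger root quickly, because the other root is negative and much smaller in magnitude.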

After this tortuous derivation, the discussion on page 110 of ESP is relatively plain sailing. The only slight puzzle is how Wright gets the approximation at the top of the page. I deduce that he uses the fact that when a is a small fraction, root(1 + a) is approximately equal to 1 + a/2. Taking [(Nm + Nf)/4NmNf]^2 as a, and grinding through the algebra, Wright's approximation can then be verified.
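Write A = (Nm + Nf)/4NmNf. The exact rate is then -deltaP/P' = (1/2)(1 + A) - (1/2)root(1 + A^2). Replacing the square root by 1 + A^2/2 gives A/2 - A^2/4. Dropping the small A^2 term leaves A/2 = 1/8Nm + 1/8Nf, which reduces to 1/2N when Nm = Nf = N/2. I am assuming that Wright's approximation is of this leading-order kind. The sketch below just compares the three quantities for a few arbitrary population sizes.

```python
import math

def exact_rate(Nm, Nf):
    """-deltaP/P' = (1/2)(1 + A) - (1/2)sqrt(1 + A^2), A = (Nm + Nf)/4NmNf."""
    A = (Nm + Nf) / (4 * Nm * Nf)
    return 0.5 * (1 + A) - 0.5 * math.sqrt(1 + A * A)

def second_order_rate(Nm, Nf):
    """Same, with sqrt(1 + A^2) replaced by 1 + A^2/2: gives A/2 - A^2/4."""
    A = (Nm + Nf) / (4 * Nm * Nf)
    return A / 2 - A * A / 4

def leading_rate(Nm, Nf):
    """Leading term A/2, which equals 1/8Nm + 1/8Nf."""
    return 1 / (8 * Nm) + 1 / (8 * Nf)

for Nm, Nf in [(5, 5), (50, 50), (10, 200)]:
    print(Nm, Nf, exact_rate(Nm, Nf), second_order_rate(Nm, Nf), leading_rate(Nm, Nf))
```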

Overall, as often with Wright's work, I am torn between admiration for his ingenuity and frustration at his obscurity.

References:

D. S. Falconer: Introduction to Quantitative Genetics, 3rd edn., 1989. (The 4th edn., by Falconer and Mackay (1995) appears to be the same so far as its treatment of genetic drift is concerned.)

John Maynard Smith: Evolutionary Genetics, 1989.

William B. Provine: Sewall Wright and Evolutionary Biology, 1986.

Sewall Wright: Evolution: Selected Papers (ESP), edited and with introductory materials by William B. Provine, 1986.

Sewall Wright: 'Evolution in Mendelian Populations' (EMP), Genetics, 16, 1931, pp.97-159. (Reprinted at pp.98-160 of ESP.)

Sewall Wright: Evolution and the Genetics of Populations (EGP), 4 vols., 1968-1978.

Tuesday, May 06, 2008

Pleiotropy in melanocortin receptors   posted by p-ter @ 5/06/2008 09:47:00 PM

In the comments here, rosko points me to a study of the effects of MC4R, a gene implicated in natural variation in human weight, on pathways involved in sexual function. It's well known, of course, that genetic pathways can be involved in multiple physiological processes; in particular, a signaling pathway can generate many different phenotypes depending on the downstream target of the signal.

The effects of MC4R stimulation in humans are, as rosko comments, kind of interesting:
Methods. Ten subjects were enrolled in a double-blind, placebo-controlled, crossover study. Melanotan II (0.025 mg/kg) and vehicle were each administered twice by subcutaneous injection; real-time RigiScan monitoring and a visual analog were used to quantify the erections during a 6-hour period. The level of sexual desire and side effects were recorded with a questionnaire.

Results. Melanotan II initiated subjectively reported erections in 12 of 19 injections versus only 1 of 21 doses of placebo. The mean rigidity score of the responders was 6.9 on a scale of 0 to 10. The mean duration of tip rigidity greater than 80% was 45.3 minutes with Melanotan II versus 1.9 for placebo (P = 0.047). The level of sexual desire after injection was significantly higher after Melanotan II administration than after placebo. Nausea and stretching/yawning occurred more frequently with Melanotan II, and 4 of 19 injections were associated with severe nausea.
I wondered what a "Rigiscan" is--find out here. Hypothetically, one could test whether natural variation in sexual behavior in humans is also affected by MC4R polymorphism, though I can't imagine that being a particularly fun study to carry out (one for agnostic's new series? 23andme + free time = association studies about erections).

This reminds me of the MC1R story about increased pain sensitivity in redheads, in the vague sense that both involve melanocortin receptors and pleiotropy.




Get off your ass and start this project: Viral videos   posted by agnostic @ 5/06/2008 04:42:00 PM

Having already motivated this series, I'll provide the first example of how to put your time to more productive use than participating in the WikiProject G.I. Joe or the still more urgent WikiProject Transformers. If you get interesting results, post them on your blog and provide a link in the comments here. I'll gather up all the results after a while and summarize them in a follow-up post.

Purpose

Cultural transmission has often been described verbally as viral. Mathematical models of culture incorporate this idea by borrowing epidemic disease models from biology. The goal here is to see if data on "viral videos" support the infectious model of culture.

Pre-requisites

To collect and analyze data: high school algebra, including familiarity with exponentials and logarithms. To get the theory behind the model: first-semester calculus and preferably an understanding of phase plane methods to study a two-variable system of ordinary differential equations. [1] No knowledge of biology or culture is needed.

Details

Edelstein-Keshet's Mathematical Models in Biology (pp. 242-254) provides a good overview of the most basic models of epidemic diseases, and that's what I'm adapting here. In brief, we track the growth rates of two or three classes of hosts: Susceptibles (S), Infectives (I), and perhaps Recovereds (R). The names mean what you think. In the case of viral videos, you can never undo having seen the video, so we will use a very simple model, illustrated below:

[Diagram: S → I, with Susceptibles becoming Infectives at rate b]

If you haven't seen the video yet, you're Susceptible, while if you have seen it, you're Infective. The idea is that someone who's seen the video tells their friends about it in some way, and their friends watch it in their turn. A Susceptible is turned into an Infective at a rate b, so that this parameter measures the infectivity of the video.

We ignore the part of the population that has immunity to the video -- perhaps because they are not in the target demographic group -- and only track those who could be or are infected. We also assume that on the time-scale that the video spreads, the population is constant in size, which seems realistic in this case.

We can write down a system of differential equations for the above picture:

dS / dt = - bSI

dI / dt = bSI
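To see the two equations in action, here is a minimal Python sketch that steps them forward with a simple Euler scheme. The parameter values (b, N, the 10 initial viewers, the step size) are hypothetical, chosen only for illustration, and simulate_si is a made-up helper name. Because every person who leaves S enters I, the sum S + I stays fixed at N throughout the run.

```python
def simulate_si(b, N, i0, dt, steps):
    """Forward-Euler integration of dS/dt = -bSI, dI/dt = bSI.

    Returns a list of (t, S, I) tuples, one per step (including t = 0).
    """
    S, I = float(N - i0), float(i0)
    history = [(0.0, S, I)]
    for k in range(1, steps + 1):
        flow = b * S * I * dt  # people moving from S to I during this step
        S -= flow
        I += flow
        history.append((k * dt, S, I))
    return history


# Hypothetical values: a target audience of 10,000, 10 initial viewers,
# and b chosen so that b*N = 0.2 per unit time.
run = simulate_si(b=2e-5, N=10_000, i0=10, dt=0.1, steps=500)
for t, S, I in run[::100]:
    print(f"t={t:5.1f}  S={S:8.1f}  I={I:8.1f}  S+I={S + I:8.1f}")
```

The step size is kept small (b*N*dt = 0.02) so that the crude Euler update stays accurate and S never goes negative.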

(The product SI is used as an analogy with chemistry's law of mass action for particles that knock into or interact with each other.)

We notice that dS / dt + dI / dt = 0, which means that S + I = constant, call it N. In other words, if we know the number of Infectives, we automatically know the number of Susceptibles -- it is just N - I. Therefore, we don't have to keep a separate tally of the change in S and can eliminate the first equation. Substituting S = N - I into the second equation, our system becomes just:

dI / dt = bI(N - I)

Let's make a natural change of variables:

i = I / N

g = bN

So, i is the fraction of the population that is Infective, and the growth rate g has units of inverse time (where b had units of 1 / (people * time)). The equation now only has one variable and one parameter:

di / dt = gi(1 - i)

This is the famous logistic equation, which you may already have seen in the context of saturating population growth or the spread of a favored allele to fixation. (The analogies between these three processes are reflected in their being modeled by the same equation, which underscores the importance of formalizing your intuition.)
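As a check on the algebra, the sketch below compares the standard closed-form solution of the logistic equation, i(t) = 1 / (1 + ((1 - i0) / i0) e^(-gt)), against a classical fourth-order Runge-Kutta integration of di/dt = gi(1 - i). The values of g and i0 are hypothetical; any positive g and any i0 strictly between 0 and 1 should give agreement to many decimal places.

```python
import math


def logistic_closed_form(t, g, i0):
    """Exact solution of di/dt = g*i*(1 - i) with i(0) = i0."""
    return 1.0 / (1.0 + ((1.0 - i0) / i0) * math.exp(-g * t))


def logistic_rk4(g, i0, t_end, steps):
    """Integrate di/dt = g*i*(1 - i) from 0 to t_end with classical RK4."""
    def rhs(i):
        return g * i * (1.0 - i)

    h = t_end / steps
    i = i0
    for _ in range(steps):
        k1 = rhs(i)
        k2 = rhs(i + 0.5 * h * k1)
        k3 = rhs(i + 0.5 * h * k2)
        k4 = rhs(i + h * k3)
        i += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return i


g, i0 = 0.2, 0.001  # hypothetical growth rate and starting Infective fraction
for t in (0, 10, 25, 50, 100):
    exact = logistic_closed_form(t, g, i0)
    numeric = logistic_rk4(g, i0, t, 1000)
    print(f"t={t:3d}  exact={exact:.6f}  rk4={numeric:.6f}")
```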

At equilibrium, the fraction that is Infective does not change, so di / dt = 0. This happens when either i = 0 or i = 1. When i is between 0 and 1, di / dt is positive, so as long as i is not exactly 0, i will increase as time increases and will ultimately end up at i = 1. In other words, i = 0 is an unstable steady-state since a small increase will push it to i = 1, which is stable.
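The stability argument above is the usual one-variable linearization test: an equilibrium i* is stable if the derivative of the right-hand side, f'(i*) = g(1 - 2i*), is negative there, and unstable if it is positive. A minimal sketch with a hypothetical g = 0.2:

```python
def f(i, g):
    """Right-hand side of the logistic equation di/dt = g*i*(1 - i)."""
    return g * i * (1.0 - i)


def f_prime(i, g):
    """Derivative of f with respect to i: g*(1 - 2i)."""
    return g * (1.0 - 2.0 * i)


g = 0.2  # hypothetical positive growth rate
for eq in (0.0, 1.0):
    slope = f_prime(eq, g)
    kind = "unstable" if slope > 0 else "stable"
    print(f"i* = {eq}: f(i*) = {f(eq, g)}, f'(i*) = {slope:+.2f} -> {kind}")

# Between the two equilibria di/dt is positive, so i drifts toward 1.
print(all(f(x / 10, g) > 0 for x in range(1, 10)))
```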

This model may seem simplistic since it implies that every single Susceptible will eventually see the video, but that's not so unrealistic when you recall that we're only considering the population of the video's target audience -- in 1993, how many teenagers who had TVs in their homes never, ever saw that Blind Melon video with the bee girl? Some videos may have larger or smaller target audiences, i.e. larger or smaller values of the parameter N.

Getting and Analyzing Data

Without funding, surveying a large random sample of the target audience is impractical. Instead, we can measure a good enough proxy: the view count of a YouTube video, tracked over time. Depending on how rapidly you think it will increase, you may want to record it every 6 hours or once a day. If it grows logistically, it should first accelerate, then keep increasing but decelerate until it more or less plateaus, like the picture on the Wikipedia page for the logistic function.

Ideally, you want to track a video that is the only one of its kind -- if there are multiple copies of the same video, that complicates things somewhat, but you might ignore that (see Further Avenues). For example, if a YouTube celebrity is able to curb reproductions of their videos, you can simply wait until their channel puts out a new video. The infectious process would go something like, "Omigod, So-and-So just put out a new video -- you have to see it!" This would be easier if they update fairly infrequently, so that word-of-mouth transmission were the primary route of infection.

Another idea is to wait for the music video of a popular song to come out, but this requires that you be pretty savvy about music trends, and here the potential of multiple copies is even more serious, as fans download it and upload it themselves.

Fad news items are another source, like when that retarded brat got tasered in the UCLA library. Again, multiple copies of it will probably appear.

Assuming you got something like logistic growth, here's how you estimate the parameters N and b. N is just whatever the plateau value seems to be, so you'll have to wait for the view count to level off first. It can be shown that the solution to the logistic equation can be rearranged to yield:

ln ((N - I) / I) = - bNt + ln ((N - I0) / I0)

where N is the max view count, I0 is the initial view count (pick some small number), I is the view count at time t, and b is the infectivity rate; the combination bN is just the growth rate g from the rescaled equation. So once you've got concrete numbers for N and I0, plot (N - I) / I on a ln scale: it will be a linear function of t, with y-intercept = ln ((N - I0) / I0) and slope = - bN. Divide the negative of the slope by N and see what b turns out to be.
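Here is a rough sketch of the fitting step in Python, using an ordinary least-squares line through ln((N - I) / I) versus t. With I defined as above, the slope of that line works out to -bN (that is, -g), so the sketch divides by N to recover b. The data are synthetic, generated from the exact logistic solution with made-up values of N, b, and I0, not real view counts; estimate_b is a hypothetical helper name.

```python
import math


def estimate_b(times, counts, N):
    """Least-squares fit of ln((N - I) / I) = -b*N*t + c; returns (b, c).

    Points with I <= 0 or I >= N are skipped because the log is undefined there.
    """
    xs, ys = [], []
    for t, I in zip(times, counts):
        if 0 < I < N:
            xs.append(t)
            ys.append(math.log((N - I) / I))
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope / N, intercept


# Synthetic check, NOT real YouTube data: counts come from the exact solution
# with hypothetical N = 50,000 viewers, b = 4e-6 per (viewer*hour), I0 = 100.
N_true, b_true, I0 = 50_000, 4e-6, 100
times = [6 * k for k in range(20)]  # a reading every 6 hours
counts = [N_true / (1 + ((N_true - I0) / I0) * math.exp(-b_true * N_true * t))
          for t in times]
b_hat, c_hat = estimate_b(times, counts, N_true)
print(f"b = {b_hat:.3g}, intercept = {c_hat:.3f}, "
      f"ln((N - I0)/I0) = {math.log((N_true - I0) / I0):.3f}")
```

With real counts, readings at or very near the guessed plateau make ln((N - I) / I) blow up or turn noisy, so it may be wiser to fit mainly the rising part of the curve; the sketch simply drops any point with I at or above N.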

Then compare values of N and b for different videos. If N is larger for one, that means the target audience is larger (ignoring the fact that a single person may watch a given video multiple times -- that's true for any video, no matter the target audience's size). If b is larger for one, that means it's more infectious.

Further Avenues

If you really kept this going long-term, you could try to classify the videos you're tracking by category and do an analysis of variance or something to see what accounts for the variation in the target audience size and in the infectiousness of a video. Intuitively, we expect more sensational videos to have higher b -- but a hard analysis might tell us more concretely what types of things count as more sensational. We have guesses about that, but we need data to see if those guesses are right.

To make the model a bit more complex, you could introduce a stochastic component to the growth equation. Right now, it is deterministic: as long as the ball gets rolling, everyone in the target audience will see the video. But when few people have seen the video, chance effects could push one ball up while letting another ball stay put. This is like when several copies of a favored allele are introduced into a population -- some will be lost by drift, while another may be propelled quickly by drift at the start, after which point the deterministic equations take over. You would model it just like the frequency of a favored allele under the combined effects of drift and directional selection.
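One way to add the stochastic component, sketched below under my own assumptions rather than anything specified above, is a discrete-time chain-binomial version of the SI model: in each time step, every Susceptible independently becomes Infective with probability 1 - exp(-bI dt). The parameters, the seed, and the helper name stochastic_si are hypothetical. One difference from the allele case: in this bare SI setup the video can't be lost once someone has seen it, so chance shows up mainly as variation in how long the slow early phase lasts.

```python
import math
import random


def stochastic_si(b, N, i0, dt, steps, rng):
    """Chain-binomial SI model: in each step every Susceptible is
    independently infected with probability 1 - exp(-b * I * dt)."""
    I = i0
    path = [I]
    for _ in range(steps):
        p = 1.0 - math.exp(-b * I * dt)
        S = N - I
        new = sum(1 for _ in range(S) if rng.random() < p)  # binomial draw
        I += new
        path.append(I)
    return path


rng = random.Random(2008)  # fixed seed so the runs are reproducible
N, b = 1000, 0.5 / 1000    # hypothetical audience size; b*N = 0.5 per step
runs = [stochastic_si(b, N, i0=1, dt=1.0, steps=60, rng=rng) for _ in range(5)]
for path in runs:
    # First step at which half the audience has seen the video (None if never).
    t_half = next((t for t, I in enumerate(path) if I >= N / 2), None)
    print("reached N/2 at step", t_half, "final count", path[-1])
```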

In the context of viral videos, consider multiple copies of the same music video (again due to fans downloading the video from the official channel and uploading it to their own channel). As the target audience does a search for this video, chance effects may propel one of the copies up very quickly while letting other copies languish with low view counts. In this case, the copy that ends up dominating the market increases even more rapidly than under the purely deterministic model because it got a lucky big initial boost through sheer chance effects.

This is what makes it somewhat inappropriate to compare a video that has only one copy with a video that has multiple copies. The winner in the latter group will appear more infectious than the former, since its count increases much faster, but part of that faster increase is due to chance.
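To illustrate how chance alone can crown a winner among identical copies, the sketch below assumes a hypothetical rich-get-richer rule (essentially a Pólya urn): each new viewer picks a copy with probability proportional to that copy's current views plus one. Nothing distinguishes the copies, yet the views tend to split unevenly, and which copy leads can change from run to run. The rule and the helper name split_views are illustrative assumptions, not a model of how YouTube search actually ranks results.

```python
import random


def split_views(n_copies, total_views, rng):
    """Hypothetical rich-get-richer rule: each new viewer picks a copy with
    probability proportional to (its current views + 1)."""
    views = [0] * n_copies
    for _ in range(total_views):
        weights = [v + 1 for v in views]
        pick = rng.choices(range(n_copies), weights=weights)[0]
        views[pick] += 1
    return views


rng = random.Random(42)  # fixed seed for reproducibility
results = [split_views(n_copies=4, total_views=5000, rng=rng) for _ in range(3)]
for views in results:
    print(views, "leader share:", round(max(views) / sum(views), 2))
```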

[1] MIT's OpenCourseWare site has a great mathematics section that allows you to teach yourself or brush up on these areas. Especially useful is the course on differential equations, which has a full set of video lectures, solved problem sets, exams, and helpful Java applets. An easy-to-use phase plane applet is pplane.



Monday, May 05, 2008

Get off your ass and do this study: Introductory pep talk   posted by agnostic @ 5/05/2008 01:19:00 PM

I was recently directed to this panegyric on Wikipedia, which claims that editing Wikipedia is a better use of the cognitive surplus that might otherwise be spent watching TV. Like 99% of technology pundits, the author is so out of touch with reality that it is not worth taking him to task in depth. Instead, reading that has moved me to begin a regular column wherein I propose a fairly simple study for someone to carry out and increase our understanding of the world.

In fairness, it is often tough to think of a study to do, or how you would concretely carry it out. But since my soul is a font of generosity, I'm literally giving ideas away. We only have so much time and effort to invest in a project, so I have plenty of ideas that I just don't have time to pursue in any depth. Obviously I will keep what I think are the more original or important ones for myself, but there are several reasons why pursuing seemingly unoriginal ideas is still useful:

1) It gives you good practical experience. If you've never tried to find a good dataset that would answer your question, if you've never tried to analyze and summarize the data, and if you've never interpreted these findings in context for the target audience -- well, time to start.

2) Many supposedly established findings have the smell of academic urban legends because they are based on a single study that used an unimpressive sample size and didn't take into account some obvious confounding factors. Yet once it gets cited, it takes on a life of its own, as no one reads the original but simply "knows that study X showed Y." Replication studies are crucial to figure out if we were right.

3) Most published articles aren't terribly original anyway -- "here's yet another example of natural selection at work!" Still, the more astounding the mountain of evidence becomes, the more convinced we become that we are right. There are probably diminishing returns, though: we don't need yet another study showing that cognitive abilities all correlate with each other, but how pervasive is the influence of IQ -- do smart vs. dumb people prefer different types of art?

4) More mundane studies are easier to carry out, so you're not intimidated by the prospect of hunting down a solution to a Great Big Problem. (And if you liked chasing after Great Big Problems, you'd probably already be in academia or a private institute doing that, or preparing to do so in the near future.)

5) If the original study or idea was done a while ago, improved technology may allow you to take a more in-depth look at it. For example, computers were pretty pathetic in the 1960s, and I'll bet there are scores of dusty studies that would benefit from the power of modern home computers.

6) For mathematical models, the properties of a particular model may be so well known that you couldn't hope to contribute anything new on the abstract level. However, you could provide a novel interpretation of it by showing how it also models a phenomenon that no one has applied it to before. This is especially true for fields where the experts don't have much training in modeling, which tend to be the fields that focus on human beings. Sociology is a perfect example -- here's a field that assumes the primary unit of society is the group, and that groups conflict and interact, while ignoring the individual differences within each group. This isn't a slight to the field, since there are group dynamics. Sociology cries out for differential equation models, where you ignore individuals and track classes of things, and typically only two or three classes!

7) For the studies that I will propose, the data would not be hard to collect, although the process from start to finish may be laborious (hey, that's life). So, I will not suggest studies that require fancy equipment, hundreds of unpaid volunteer subjects, and so on. If it is applying a well understood mathematical model to some new phenomenon, almost all of the work will already be done. However, I realize that we do have academic readers too, or readers who have graduate student friends in need of a study to publish, so occasionally I will propose something that would require access to many volunteers.

Now, I don't have anything against editing Wikipedia or blogging per se, but let's get very real: most of it is a waste of time, which is why almost no academics do it. There are exceptional areas of Wikipedia, and there are exceptions in the blogosphere -- well, obviously we are, and so are bloggers like Steve Sailer, Audacious Epigone, Half Sigma, Inductivist, and others who obtain and analyze data to answer a question or hunch. If the blog is just a hobby, an afterthought after real work is being done in real life (as with my personal blog), that's OK too.

What I want to see die is the practice of intellectual masturbation, where you only fool your brain into thinking that fruitful work is being done. "Participation" per se is no valid criterion for success -- I can participate in an act of masturbation, perhaps even while participating with others in a circle jerk, but I've only really accomplished something when I've contributed to increasing the fertility rate. Fortunately for everyone, though, the real world offers an abundance of problems begging to be fertilized by the seed of your brain -- get in there and tear that shit up.




Squirrel Fun   posted by DavidB @ 5/05/2008 09:00:00 AM

For many years, Grey Squirrels (an introduced North American species) have been driving out the indigenous Red Squirrel over most of mainland Britain. But now it is reported that a mutant black variety of the Grey Squirrel is threatening to displace the Greys. Apparently, the black ones have higher testosterone levels, are more aggressive, and are more attractive to the lady squirrels. (Don't worry, our White Nationalist readers, this isn't a parable. I think.)

Joking apart, the real interest of this is that it seems to be a case of a single mutation with a relatively conspicuous phenotypic effect having a strong evolutionary advantage, somewhat contrary to Darwin/Fisher orthodoxy. There is of course another example in the case of industrial melanism.

Sunday, May 04, 2008

Weight and genetics   posted by p-ter @ 5/04/2008 08:20:00 PM

Two studies report this week on the association of variation near MC4R with body mass. This is the second convincingly replicated locus to be implicated in natural variation in weight, the first being FTO. There are a couple of reasons I find this association interesting.

1. Coding mutations in MC4R are known to cause severe obesity. It's to be expected that less severe mutations (the region of the genome implicated in these studies is likely regulatory) could lead to more subtle effects on body weight, but it didn't have to be that way. And this forms part of a pattern in which genes that cause Mendelian forms of a disease are also associated with its more common forms. Why is this interesting? It suggests that the candidate gene approach to finding alleles associated with disease wasn't as flawed as people thought--it's just that those studies were all severely underpowered (the new studies, by comparison, include more than 60,000 individuals).

2. One of the studies performed their association study in individuals of Indian descent. This is one of the first GWA studies to focus on a non-European population--a development that will hopefully continue. Insofar as allele frequencies vary among populations, studies of the same phenotype in different populations may get quite different results (note that studies of skin pigmentation in Europeans don't identify SLC24A5, but studies in South Asians do--the reason is that the relevant variant in the gene is fixed in Europe but at moderate frequency in India). Population genetics has always had a role in the rational choice of study population for association studies, but as all the low-hanging fruit gets taken, this role will perhaps become more pronounced.




Strange Bedfellows   posted by DavidB @ 5/04/2008 05:49:00 AM

For general amusement, see this report from the BBC. (Via John Hawks.)

Tuesday, April 29, 2008

Doping & genetic background?   posted by Razib @ 4/29/2008 09:13:00 PM

Some Athletes' Genes Help Outwit Doping Test:
The 55 men in a drug doping study in Sweden were normal and healthy. And all agreed, for the sake of science, to be injected with testosterone and then undergo the standard urine test to screen for doping with the hormone.

The results were unambiguous: the test worked for most of the men, showing that they had taken the drug. But 17 of the men tested negative. Their urine seemed fine, with no excess testosterone even though the men clearly had taken the drug.

It was, researchers say, a striking demonstration of a genetic discovery. Those 17 men can build muscles with testosterone, they respond normally to the hormone, but they are missing both copies of a gene used to convert the testosterone into a form that dissolves in urine. The result is that they may be able to take testosterone with impunity.

The gene deletion is especially common in Asian men, notes Jenny Jakobsson Schulze, a molecular geneticist at the Karolinska University Hospital in Stockholm. Dr. Schulze is the first author of the testosterone study, published recently in The Journal of Clinical Endocrinology and Metabolism.


The whole "Asian" angle wouldn't be as important from where I stand if China weren't intent on becoming an athletic superpower. Specifically, from Doping Test Results Dependent on Genotype of UGT2B17, the Major Enzyme for Testosterone Glucuronidation:
We demonstrated that a deletion polymorphism in the gene coding for UGT2B17...is strongly associated with TG levels in urine...All subjects devoid of the gene had a T/E ratio below 0.4...This polymorphism was considerably more common in a Korean Asian than in a Swedish Caucasian population, with 66.7 and 9.3 % deletion/deletion (del/del) homozygotes respectively.


They don't seem to know what SNP is causing this. If you are curious, you can check out the linkage disequilibrium around UGT2B17.




What predicts Creationism?   posted by Razib @ 4/29/2008 12:02:00 PM

Public Acceptance of Evolution



[Three images (evolutioncreation.jpg, evolutioncreation2.jpg, evolutioncreation3.jpg) not preserved]




Ben Stein is a barbarian?   posted by Razib @ 4/29/2008 10:40:00 AM

John Derbyshire has a long column excoriating Ben Stein and the Discovery Institute titled A Blood Libel on Our Civilization:
And there is science, perhaps the greatest of all our achievements, because nowhere else on earth did it appear. China, India, the Muslim world, all had fine cities and systems of law, architecture and painting, poetry and prose, religion and philosophy. None of them ever accomplished what began in northwest Europe in the later 17th century, though: a scientific revolution. Thoughtful men and women came together in learned societies to compare notes on their observations of the natural world, to test their ideas in experiments, and in reasoned argument against the ideas of others, and to publish their results in learned journals. A body of common knowledge gradually accumulated. Patterns were observed, laws discerned and stated.

...

The "intelligent design" hoax is not merely non-science, nor even merely anti-science; it is anti-civilization. It is an appeal to barbarism, to the sensibilities of those Apaches, made by people who lack the imaginative power to know the horrors of true barbarism. (A thing that cannot be said of Darwin. See Chapter X of Voyage of the Beagle.)


Via Talk Islam.

Update: John also rips David Berlinski a new one. Via Quantum Ghosts.



Sunday, April 27, 2008

The rise of Literature?   posted by Razib @ 4/27/2008 09:53:00 PM

For a few weeks I've been mulling over a "theory" about the nature of contemporary fiction. The quotes are because this is a theory in the way that normal people have theories; they don't know much and just make up plausible (to their mind) models that are ultimately grounded in a whole lot of ignorance. I really don't know much here, and I strongly suspect I'm wrong, but I can't help expressing an opinion in public, though I feel I shouldn't given my admitted ignorance. To some extent I'm putting this post up to be enlightened by readers who know a great deal more about letters (e.g., The Man Who is Thursday, who should also resize the little dog so his front page load doesn't go well north of 300 KB).

Here's the argument: contemporary mainstream fiction is very different from the storytelling of the deep past because of a demand-side shift. Women consume most fiction today, and their tastes differ, on average, from those of men. How do they differ? To be short about it, men are into plot, while women are into character. This means that modern literary fiction emphasizes psychological complexity, subtlety and finesse. In contrast, male-oriented action adventure or science fiction exhibits a tendency toward flat monochromatic characters and a reliance on interesting events and twists. Over my lifetime I've read a fair amount, but the vast majority of the fiction has been science fiction & fantasy. Many males outgrow this bias, perhaps as they become more psychologically complex and nuanced, but I haven't (though I don't read much fiction in general at this point). I know many other males who are similar; we aren't dumb, and not all of us have Asperger's. We just aren't interested in characterization or character. We are people of exotic ideas, novelty of story arc and exploration of startling landscapes. Contemporary mainstream fiction, high, middlebrow and low, does not usually satisfy these needs.

But ancient fiction (epics, myths, etc.) does fulfill these requirements. I didn't seek out fiction in any form before I was 13 or so (I was assigned books in school, of course), but I had read Bulfinch's Mythology as well as translations of the Iliad and Gilgamesh. In hindsight I suspect that my interest in these works is due to the fact that they are recognizably High Fantasy. Either they are explicit myths, or they refer to peoples and places whose lack of banality is due to their distance in time & space (obviously I have never been to the Zagros Mountains!). I have also read historical fiction that is sufficiently distant in time, e.g., the whole of Colleen McCullough's Masters of Rome series.

To some extent if you know me in person you can see that I'm not interested in the details of the characters of other human beings. I'm somewhat along the autism spectrum toward Asperger's. I'm not the type to lose myself in a story, and I'm not really interested in most horror films because I have a hard time getting scared or identifying with the characters (I can't forget it's just a movie and the people aren't real). It seems clear to me why I have a hard time being interested in mainstream fiction; not only am I not interested in the characters, but I'm just not like most of the people depicted in terms of their values or personality. I can't "relate," and I'm not interested in "relating."

If you read Isaac Asimov's autobiography, In Memory Yet Green, I think you get a sense of why his novels depict flat characters. Though Asimov seems to have been a gregarious individual, he was very narcissistic and self-involved. I don't get a sense that he was a socially sensitive soul (though he did resent the anti-Semitism he had experienced or slights from strangers). Asimov wrote something of an apologia for science fiction as a genre of ideas, but I think it reflects the set of values I've expressed above and which many science fiction oriented individuals embody: plots, not people. (If you want every stereotype of science fiction readers confirmed, check out William Sims Bainbridge's Dimensions of Science Fiction, which is based on surveys at science fiction conventions.)

For whatever reason, Our Kind of People don't become literary critics or arbiters of taste & sophistication. Science fiction & fantasy can never be Great Fiction. If a work of science fiction & fantasy is Great Fiction then by definition it is not science fiction & fantasy. Slaughterhouse-Five, Brave New World and 1984 are not science fiction. Within the science fiction ghetto, authors such as Ursula K. Le Guin and Ray Bradbury, who admit or manifest little interest in science as such and emphasize literary values and social messages (especially Le Guin for the latter), are held up as the great authors who are acceptable. In other words, authors for whom psychological exploration just happens to involve a spaceship in the background.

Why does any of this matter? For one, I think it is somewhat peculiar that many of us find fiction from the past more engaging than popular contemporary works. Apuleius's Golden Ass gets my attention; most contemporary fiction does not. I am arguing here that this is partly because in the past those who read copiously were, on average, much more like me than like the typical human. Not only were readers by and large men (usually of some means and comfort), but they were also disproportionately eggheads who were eccentric by nature. How many elite scholars such as Claudius were there who were not attracted to the public life of politics and do not appear in the annals of history? With the printing press, cheaper paper, and the rise of mass literacy,[1] things changed: the distribution of taste shifted. And so did the distribution of genres.

So am I full of crap?

Addendum: I also think there is a supply-side issue; female authors tend to produce a particular type of work. This is evident within science fiction, where female authors are underrepresented in hard science fiction. Here is something from the Wikipedia entry for The Tale of Genji:
The Tale of Genji...is a classic work of Japanese literature attributed to the Japanese noblewoman Murasaki Shikibu in the early eleventh century, around the peak of the Heian Period. It is sometimes called the world's first novel, the first modern novel, or the first novel to still be considered a classic. This issue is a matter of debate. See Stature below.

...

The Genji is also often referred to as "the first novel", though there is considerable debate over this - some of the debate involving whether Genji can even be considered a "novel". Some consider the psychological insight, complexity, and unity of the work to qualify it for "novel" status while simultaneously disqualifying earlier works. Others see these arguments as subjective and unconvincing. Related claims, perhaps in an attempt to sidestep these debates, are that Genji is the "first psychological novel", "the first novel still considered to be a classic", or other more qualified terms. It is, however, difficult to claim that it is the world's first novel without denying the claims of Daphnis and Chloe and Aethiopica in Greek, which author Longus and an unknown sophist respectively wrote, both around the third century, and in Latin, Petronius's Satyricon in the first century and Apuleius's Golden Ass in the second, as well as Kadambari in Sanskrit which author Banabhatta wrote in the seventh century. (The debate exists in Japanese as well, with comparison between the terms monogatari -- "tale" -- and shosetsu -- "novel".)


The first psychological novel? Sounds really boring (though it seems like she makes an attempt at plot, so perhaps I should check it out. I enjoyed Musashi, whose author was influenced by The Tale of Genji).

[1] I am not convinced that even the Athenian democracy was characterized by mass literacy. See Ancient Literacy.



Saturday, April 26, 2008

Gene Genie #30   posted by Razib @ 4/26/2008 11:26:00 PM

Over at my other weblog.
