Friday, November 30, 2007
A few months ago, I pointed out a paper identifying variants near the FTO gene as being involved in obesity. I noted how strikingly little was known about this gene, concluding:
So essentially, nothing is known about this gene. Thanks to this study, this is unlikely to be the case for long.
Little did I know it would only take a few months to get the ball rolling! From this week's Science:
Variants in the FTO (fat mass and obesity associated) gene are associated with increased body mass index in humans. Here, we show by bioinformatics analysis that FTO shares sequence motifs with Fe(II)- and 2-oxoglutarate–dependent oxygenases. We find that recombinant murine Fto catalyzes the Fe(II)- and 2OG-dependent demethylation of 3-methylthymine in single-stranded DNA, with concomitant production of succinate, formaldehyde, and carbon dioxide. Consistent with a potential role in nucleic acid demethylation, Fto localizes to the nucleus in transfected cells. Studies of wild-type mice indicate that Fto messenger RNA (mRNA) is most abundant in the brain, particularly in hypothalamic nuclei governing energy balance...
This is an absolutely beautiful example of the hypothesis-generating power of genome-wide association studies. Studying the genetic variation underlying a trait is simply a great way to get at the mechanism by which the trait works. This point is lost on many people: even if the "environment", however you want to define it, plays the most important role in a trait (as it may in obesity, for example), there are an infinite number of hypotheses about which environmental variables might be relevant, and once you find a correlation, it's difficult to establish causality and you get very little information about the mechanism by which the trait works (yes, eating a lot leads to increased weight in most people, but how?). In genetics, there is a finite number of hypotheses (there are many millions of genetic variants in humans, and all of them will eventually be testable), the road to establishing causality is much clearer (i.e. this genetic variant leads to increased probability of obesity; it would be difficult to argue the inverse), and you immediately have your foot in the door to study the molecules involved in the trait. Again, this is a wonderful example of all of these points. Labels: Genetics
Happy 100 years Jacques Barzun!!!! Check out Jacques Barzun Centennial for a list of resources.
Wednesday, November 28, 2007
I have previously reported on the annual education statistics in Britain (e.g. here), so I will give an update for 2006-07. Figures have just been published for performance at the GCSE examinations, taken by most children at age 16. An official press release is here. Performance by children of all ethnic groups continues to improve (as measured by examination grades). The press release highlights the fact that the gap between ethnic groups is narrowing. Actually, this is not strictly true. The 'narrowing' is specifically between children of Black (Caribbean or African) origin and the (mainly White) average. Other 'gaps' are constant or widening. Children of Pakistani origin have the lowest rate of improvement, and have now been overtaken by Black Africans.
I won't discuss the vexed question of whether improvement in examination results actually indicates any improvement in education. But I guess (unless anyone knows reasons to the contrary) that the changes in differentials between ethnic groups are real and not artificial; for example, I can't see any reason why the testing system should be biased against Pakistanis but not Bangladeshis.
Added on 1 December: In comments the point has been made that a general rising trend will tend to suppress differentials. In general this is a good point, but as I mentioned in my original post, not all of the gaps are narrowing. The pattern is more complex. Also, the (mainly White) average is still nowhere near the ceiling. It has also been suggested that the narrowing gap between Black African and Black Caribbean children on the one hand and White children on the other could be due to increasing proportions of mixed-race children. This should not be the case. The statistics classify the various mixed-race groups separately, so provided the children are correctly classified this should not be a problem. Finally, I should warn against taking these results as indicators of IQ. No doubt there is some correlation with IQ, but it can hardly be very close, as girls have much better GCSE results than boys despite similar IQ. I would suggest that before making further comments readers should consult the original statistics. Go here, click on the link marked 'EXCEL', then go to Table 8 for the GCSE figures. (If you don't have an Excel reader there are free downloads on the web.)
Tuesday, November 27, 2007
Mark Liberman has updated his post on race and IQ in response to my post. I actually wrote out a long response and deleted it--believe it or not, I have about as much of a desire to get sucked into this conversation as he does. But I strongly, strongly disagree with his claim that showing two populations have different distributions of IQ and claiming genetics plays a role is, in itself, a "racist theory". My point in the post was that the basic premises of Saletan's article (i.e. that there are aspects of "intelligence" that are socially relevant, probably have a genetic component, and differ in distribution across populations) are entirely accepted by Shalizi (ok, he "might agree" with them). This is because they're obvious in the light of evolutionary theory (allele frequencies evolve by natural selection and genetic drift, and this includes alleles involved in socially awkward phenotypes like IQ). I'm not opposed to people holding out for more evidence, but I do find it questionable to impute nefarious motives to writers for talking about the evidence that exists.
Sunday, November 25, 2007
We have already seen that female adult film stars are just average in height, while sexy celebrities are a half-sigma above average. However, consider the heights of the 2007 Miss World contestants, whose median is 68.9 in. (N = 106), a clearly significant difference from the US mean of 64.1 in. (SD = 3 in.), to say nothing of the mean height in the second- and third-world countries that most of the contestants come from. And though I can't find the original citation, many websites quote the Association of Modeling Agents as saying that female models should be at least 5'8". So, if tall women are not more attractive physically, as the first two data sets suggest, but are more glamorous or prestigious, as the latter two suggest, there is a simple account of all of the data.
First, it's worth reviewing a few key facts about tall women and successful men, which come from Jensen & Sinha's (1993) excellent review of the physical correlates of intelligence. There is a positive correlation between height and IQ of roughly 0.2; however, all of this is due to between-family differences, as there is no correlation within families. In other words, while members of tall families tend to be smart, within a given family there is no relationship between height and IQ. Therefore, we can rule out some genetic causes such as pleiotropy, where a gene has effects on more than one trait, and genetic linkage, where genes for height may lie close to genes for IQ and be pulled along with them like teammates in a game of Red Rover. Common environmental causes like nutrition are unlikely to account for the pattern, since most of the data come from first-world populations not subject to much environmental stress, and also because the height-IQ correlation holds even among those with gifted IQs, who do not inhabit slums or want for basic nutrients.
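To put the Miss World and modeling-agency figures on a common scale, here is a minimal sketch that converts them into standard deviations above the US female mean quoted above (64.1 in., SD 3 in.). It also estimates what share of US women are at least that tall, assuming height is normally distributed; that normality assumption is mine, not the post's.

```python
import math

# Figures quoted in the post, in inches: US adult female mean height and SD.
US_MEAN = 64.1
US_SD = 3.0


def z_score(height_in: float) -> float:
    """Number of US-female standard deviations a height lies above the US mean."""
    return (height_in - US_MEAN) / US_SD


def fraction_at_least(height_in: float) -> float:
    """Share of a Normal(US_MEAN, US_SD) population at or above height_in.

    Uses P(Z >= z) = 0.5 * erfc(z / sqrt(2)). Normality is an assumption here.
    """
    return 0.5 * math.erfc(z_score(height_in) / math.sqrt(2))


miss_world_median = 68.9  # 2007 Miss World contestants, N = 106
model_minimum = 68.0      # 5'8", the minimum attributed to the Association of Modeling Agents

for label, height in [("Miss World median", miss_world_median),
                      ("5'8\" model minimum", model_minimum)]:
    print(f"{label}: z = {z_score(height):.2f}, "
          f"share of US women at least this tall = {fraction_at_least(height):.3f}")
```

Under these assumptions the Miss World median sits about 1.6 SD above the US mean, a height reached by roughly 5% of US women, and a 5'8" floor would exclude about 90% of them.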
Interestingly, the height-IQ correlation is entirely due to differences in leg length, since the correlation vanishes when sitting height is used instead of standing height. The simplest explanation that Jensen & Sinha propose is that there is cross-assortative mating between female leg length and male IQ. They summarize several studies which show that tall women, no matter what economic class they are born into, tend to climb the economic ladder more easily and marry higher-status husbands. That pools tall and smart genes into the same family, but any given kid of theirs doesn't get to pick and choose which parent he gets his height or IQ genes from, which explains why height and IQ are uncorrelated within families. Moreover, this is not a pattern only among the rich and bright: at every level of IQ, the pattern holds. Jensen & Sinha suggest that men find tall women physically more attractive, and they mention the heights of Miss Universe contestants as support. But as we've seen, beauty pageant contestants and runway models are an entirely different group from adult film stars and sexy celebrities, who more accurately reflect what males find physically attractive. Therefore, to the extent that tall women are preferred as mates, it is probably so that the man can show her off as a hard-to-acquire status symbol, like a Porsche. This is an honest signal of high status since you don't have to conduct studies to know that a guy with a tall wife is far more likely to be a somebody than a nobody. That's especially true when the woman is not just tall, but taller than her mate, as shown in this gallery of famous shorter man / taller woman couples. We leave aside what makes a man high-status -- it could include wealth, power independent of wealth (as with Dennis Kucinich), and so on. Broadcasting his status in this way might allow him to attract the attention of a large number of attractive onlooking females, with whom he may then seek on-the-sly copulations.
It may also allow him to be taken more seriously by his male colleagues and inferiors, and so to rise further in status: "Hey, that guy has a 6' tall wife -- he must be a real go-getter." Both of these effects serve to increase his reproductive fitness. And importantly, parading around your tall wife is a far less vulgar signal of status than, for example, driving up in an obscenely expensive car or sporting tons of jewelry. Consequently, the man does not suffer a loss of reputation as he would with those other signals, and because it is less conspicuous, he is less likely to draw the ire of those around him. He will provoke class envy in them, for sure, but he has to be careful not to enrage or offend them either, since social politics are central to his status. Finally, because height is highly heritable, he may seek a taller wife more as a long-term wife than a short-term fling since he is concerned about the upward mobility of the children who he invests in. Mating with a tall woman will give his kids a leg up in the status competition. In the case of on-the-sly mating, he will not invest much in them later on, so he could be less worried about their social mobility -- just have a lot of them and hope some do well. Reference: Jensen, A. & S. Sinha (1993). Physical correlates of human intelligence. In P. Vernon (Ed.), Biological approaches to the study of human intelligence, pp. 139-242. Labels: babes and hunks, height, reproductive strategies, status
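Jensen & Sinha's cross-assortative-mating account predicts a height-IQ correlation between families but none within them. The toy simulation below shows how that pattern can arise; it is not their analysis, and every parameter in it (the mate correlation `r_mate`, the unit noise variance, two siblings per family) is invented for illustration. Height and IQ are inherited independently, but tall mothers pair with high-IQ fathers.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_families=20_000, r_mate=0.8, seed=1):
    """Toy cross-assortative mating model; every parameter is hypothetical.

    Within each parent, additive genetic values for height and IQ are independent
    standard normals. Mothers' height values are correlated (r_mate) with fathers'
    IQ values. Each child gets the mid-parent value for each trait plus independent
    noise (segregation plus environment, total variance 1) drawn separately per trait.
    Returns (between-family r, within-family r).
    """
    rng = random.Random(seed)
    mean_h, mean_iq, diff_h, diff_iq = [], [], [], []
    for _ in range(n_families):
        father_iq = rng.gauss(0, 1)
        mother_h = r_mate * father_iq + math.sqrt(1 - r_mate ** 2) * rng.gauss(0, 1)
        father_h, mother_iq = rng.gauss(0, 1), rng.gauss(0, 1)
        mid_h, mid_iq = (mother_h + father_h) / 2, (mother_iq + father_iq) / 2
        # Two siblings per family, each with independent height and IQ deviations.
        (h1, i1), (h2, i2) = [(mid_h + rng.gauss(0, 1), mid_iq + rng.gauss(0, 1))
                              for _ in range(2)]
        mean_h.append((h1 + h2) / 2)
        mean_iq.append((i1 + i2) / 2)
        diff_h.append(h1 - h2)
        diff_iq.append(i1 - i2)
    # Between families: correlate family means. Within families: correlate sibling differences.
    return pearson(mean_h, mean_iq), pearson(diff_h, diff_iq)


between, within = simulate()
print(f"between-family r = {between:.2f}, within-family r = {within:.2f}")
```

With these made-up settings the family-mean correlation comes out near 0.2 (its expected value here is `r_mate / 4`), while the sibling-difference correlation is near zero. Each child's height and IQ deviations from the family are drawn independently, so mate choice pools the two traits only at the family level.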
Linguist: I can use R, you can't. Thus, your motives are questionable. QED.
posted by p-ter @ 9:18 AM
Mark Liberman at Language Log (a blog which I very much enjoy, I should point out) approvingly links to Cosma Shalizi's rant against Slate for publishing a series of articles on race and IQ. His conclusion:
So to start with, you should ask yourself whether you can define and calculate the variance of a set of numbers, or the correlation between two sequences of numbers. If not, then read the (linked) Wikipedia articles -- and spend a little time playing with the concepts in the context of an interactive program like R. Once you've paid that entry fee, read Cosma's posts. (It's more fun than you might think -- I especially recommend the discussion of the heritability of zip codes, and you could go back and read the prequel about the heritability of accent.) And then go through William Saletan's articles, and decide for yourself what they mean about the abilities and motivations of the writer and his editors.
It's amazing how quickly people go from simple disagreement to armchair psychologist mode; a little perspective is in order here. Dr. Liberman assumes that Cosma concludes that heritability estimates are worthless. This is not the case. Cosma points out that estimating heritability involves making assumptions that are often incorrect, but (I feel like I've said this many times before) all models are wrong, but some are useful. And buried in his prose (which contains many important, ill-understood points about the estimation of heritability), he cites a nice paper on the heritability of IQ, which arrives at a narrow-sense heritability of ~0.34 (that is, additive genetic factors account for ~34% of the variance in IQ; see the linked post). Cosma wants to add additional parameters to this model before he makes any definitive statements, but he can't bring himself to treat IQ differently than other traits:
If you put a gun to my head and asked me to guess [whether there are genetic variants that contribute to IQ], and I couldn't tell what answer you wanted to hear, I'd say that my suspicion is that there are, mostly on the strength of analogy to other areas of biology where we know much more.
I would then - cautiously, because you have a gun to my head - suggest that you read, say, Dobzhansky on the distinction between "human equality" and "genetic identity", and ask why it is so important to you that IQ be heritable and unchangeable.
So if he had to guess, there is probably a genetic component to IQ, environment also plays a role, and human equality is not dependent on genetic identity. Seriously, read Saletan's column--these are exactly his points! Referring back to my point about the utility of incorrect models, it's worth noting that if you don't accept any of the heritability estimates proposed in humans, you're rejecting that any trait could be determined to have a genetic component before, oh, 2001. I don't think that's a good idea, and here's why: the heritability of type II diabetes was estimated at a "mere" 0.25 (using all those horribly flawed methods and, since it is a dichotomous trait, even more assumptions); now molecular studies have identified at least 9 loci involved in the disease. The heritability of type I diabetes was estimated at about 0.88; now there are 10 loci undoubtedly associated with the disease. There are other examples, and more are sure to come, but suffice it to say that heritability studies, with all their seemingly ridiculous assumptions, are not worthless. Now look to Cosma's post on g. Again, this time in the footnotes, we see something in line with Saletan's article. Referring to the observation by economist Tyler Cowen that some people he knew in a village in Mexico were smart in ways not measurable by IQ tests, he writes:
Cowen points out behaviors which call for intelligence, in the ordinary meaning of the word, and that these intelligent people would score badly on IQ tests. A reasonable counter-argument would be something like: "It's true that 'intelligence', in the ordinary sense, is a very broad and imprecise concept, and it's not surprising the tests don't capture it perfectly.
But the aspects of 'intelligence' they do capture are ones which are vastly more important for economic development than the ones displayed by Cowen's friends in San Agustin Oapan, however amiable or even admirable those traits might be in their own right." This would be a position about which one could have a rational argument. (Indeed, I might even agree with that statement, as far as it goes, as might A. R. Luria.)
So Cosma "might" agree that intelligence, as operationally defined by psychologists, is important for economic development and differs in distribution between groups. Interesting. Cosma's posts seem to follow any discussion of IQ around the "blogosphere". They're well-written, include legitimate discussion of many important issues in quantitative genetics and IQ testing (ok, I don't know much about IQ testing, but I'm assured this is the case by people who do), and come from an authority. But for whatever reason (I'm tempted to think that people don't actually read what he writes. I mean, it has, like, math and stuff), he's interpreted as saying that intelligence tests and the concept of heritability are entirely meaningless. That is not the case. Labels: Genetics, IQ, Statistics
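For readers who want to pay Liberman's "entry fee", here is a minimal sketch, in Python rather than R, of variance and correlation. It also shows narrow-sense heritability as a ratio of variances, h² = V_A / V_P. The toy phenotype is additive genetic value plus independent environmental noise with V_A = 0.34 and V_E = 0.66, chosen only to echo the ~0.34 estimate. The model deliberately ignores dominance, epistasis and gene-environment covariance, which are exactly the kinds of assumptions Cosma warns real estimates can get wrong.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance: average squared deviation from the mean (divides by n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / math.sqrt(variance(xs) * variance(ys))


# Toy model (assumed): phenotype = additive genetic value + independent environment.
rng = random.Random(42)
additive = [rng.gauss(0, math.sqrt(0.34)) for _ in range(50_000)]
phenotype = [a + rng.gauss(0, math.sqrt(0.66)) for a in additive]

h2 = variance(additive) / variance(phenotype)  # narrow-sense heritability, V_A / V_P
print(f"variance of [2, 4, 4, 4, 5, 5, 7, 9] = {variance([2, 4, 4, 4, 5, 5, 7, 9])}")
print(f"h^2 = V_A / V_P, approx {h2:.2f}")
print(f"corr(A, P)^2, approx {correlation(additive, phenotype) ** 2:.2f}")
```

In this additive toy model the squared correlation between breeding value and phenotype equals h², which is one way to see why a heritability of 0.34 is a statement about shares of variance and not about how "fixed" a trait is.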
As a follow-up on a previous post about the heights of female sex symbols picked from the pool of celebrities, which found that they're about 1/2 SD above-average, let's now look at how tall adult film stars are. They're worth examining since they are chosen almost exclusively based on how attractive they are to the average male consumer, not how elegant or confident they appear. The website of the modeling agency that hosts the more elite stars -- LA Direct Models (NSFW) -- has height data for all but a couple of their members. If anything, these data are probably biased toward taller height since everyone lies in the upward direction.
Here is the frequency distribution of this sample of 121. The mean is 64.5 in., the SD is 2.6 in., and the skewness is 0.24, which indicates it is weakly positively skewed (more of the points are bunched around the lower end). In a representative sample of the general population (see this PDF, p. 10), females aged 20-29 have a mean height of 64.1 in. Because the adult film star sample could easily be biased by a half-inch, and because the means are close enough anyway, I won't bother running a t-test. If you really want to, feel free to post it in the comments, but it's clear that the adult film stars are not taller or shorter than the population at large. Because the females are chosen only based on how physically attractive they are, this result goes against the hypothesis that long legs are in general physically attractive to men (although some men may find them sexy). There is another, non-physical reason why tall women may be preferred as mates, which I'll post about soon. Labels: babes and hunks, height, maintenance of variation
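For anyone who does want the t-test, here is a minimal one-sample version computed from the summary statistics quoted in the post (n = 121, mean 64.5 in., SD 2.6 in., reference mean 64.1 in.). It treats the 64.1 in. figure as a known population mean and ignores that survey's own sampling error, which is a simplifying assumption.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, pop_mean):
    """t statistic for H0: the sample comes from a population with mean pop_mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Summary statistics quoted in the post.
n, mean_h, sd_h = 121, 64.5, 2.6
population_mean = 64.1  # US women aged 20-29

t_raw = one_sample_t(mean_h, sd_h, n, population_mean)
# The post suggests self-reported heights may be inflated by about half an inch.
t_deflated = one_sample_t(mean_h - 0.5, sd_h, n, population_mean)
print(f"t as reported:           {t_raw:.2f}")
print(f"t with 0.5 in. removed:  {t_deflated:.2f}")
```

The reported figures give t of about 1.69 and the half-inch correction gives about -0.42. With 120 degrees of freedom the two-sided 5% critical value is about 1.98, so neither is significant, which agrees with the conclusion above.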
Friday, November 23, 2007
Reading the Bhagavad Gita I am struck (as usual) by commonalities between mystical philosophies rooted in a method of psychological introspection and meditation. For example, the tendency toward monism is marked across many traditions which emerge out of specific religious or philosophical movements. This even includes the monotheistic religions of the West, whose creeds and beliefs tend to notionally reject monism and imply the separation of a personal God from his Creation. The Perennial Philosophy emerged from this empirical observation of the relatively uniform experience of mystics, and the field of Religious Studies has been influenced by this idea, in particular through the work of Mircea Eliade. Eliade and his fellow travelers conceive of religious experience as a window into a sacred reality, distinct from the profane world. Obviously, I don't believe this. Rather, I am struck by the fact that very few mystics ever report that they have looked upon the 63 essences of the universe, or any specific deviation from the One. Instead, mystical trance seems to blur distinctions across categories as all perception melts into a unitary underlying essence, whether you call it God or the One. In contrast to mysticism, theology tends to explore a huge sample space of possibilities and configurations. Why is this? I suspect it is because theology tends to rely on explicit chains of inferences based on verbal logic, and quite often individuals may differ in their sense of what is implied by a particular proposition. In contrast, the heightened consciousness of mysticism and the sense of the One probably reflect underlying neurological realities. The One isn't the real nature of the universe; it is simply the common output the brain produces when put under the ascetic stresses or mental techniques which mystics use to change their consciousness. I am generally skeptical of neurotheology when it claims to explain religion, but I do believe it is on its way to accurately sketching out the shape of mysticism (obviously it doesn't explain religion, because I think that mysticism is simply a subset of religion, not the totality of it). Labels: Religion
Thursday, November 22, 2007
Research in the latest issue of Nature provides evidence that babies can distinguish 'helpful' from 'unhelpful' people at a very early age, before they acquire language and (presumably) before they can have learned the distinction from their own experience. The evidence comes from staged scenarios using 'nice' and 'nasty' dolls. Babies prefer the nice ones. The researchers argue that this must be an evolved adaptation for social living, which seems plausible enough. Someone should try the same experiment with chimps and other primates. To understand the evolution of morality (in my opinion) we need more good experiments and less mathematical theorising, or at least a better balance between the two.
Here is a report from today's London Times. No doubt there are others. Added: I assume, though it is not clear from the Times report, that the researchers have excluded the possibility that babies just prefer triangles to squares.
In a few recent posts I've referred to the fact that variation on the OCA2 locus can predict about 3/4 of the eye color variation in the European population. Specifically, OCA2 is probably the quasi-Mendelian locus which is the culprit behind the classical dominant/recessive pedigree inheritance patterns which geneticists have long noted. The genomic region has also been subjected to a recent selection event. Why?
One model posits that the selection is directly for blue eyes: for example, some sort of sexual selection where blue eyes are strongly preferred. There's a problem with any model which posits selection for blue eyes: at very low frequencies, selection on recessive traits is weak. That is, if alleles responsible for blue eyes are extant at a frequency of 10%, only 1% of the population will express blue eyes (this assumes random mating, a tighter correlation between the alleles and the phenotype than actually exists, and perfect dominance/recessiveness; all of these are violated, but it gets the logic across). So only 1 out of 10 blue-eye-causing alleles can be subject to selection. One way around this issue is population substructure: imagine that you have small demes drifting in all directions. A deme which drifts to a high or fixed frequency of blue eyes can then allow selection to operate strongly upon the allele responsible for this trait. This also requires specific meta-population dynamics to prevent these high-frequency demes from being swamped by gene flow from low-frequency demes. Frankly, I'm really skeptical that a continent-wide Shifting Balance process can explain the third longest haplotype in the European genome. But there's another, simpler model: the gene responsible for blue eyes is being selected for another reason. Blue eyes are simply a byproduct, and that other trait is additive in its phenotypic expression, so that even single-copy variants are subject to the power of selection. I would hazard a guess that the most boring explanation here would be skin color. I've offered below that OCA2 does track skin color variation, but I've been pretty vague about this. The data aren't always easy to find, so I've repackaged Table 5 from A Three-Single-Nucleotide Polymorphism Haplotype in Intron 1 of OCA2 Explains Most Human Eye-Color Variation.
Please note that there is a typo in the table in the paper; the correct data are in the text, so I went by that.
Obviously there are other genes at work with regard to skin color, there's probably some population substructure lurking in the data, and the association of the variants themselves with eye color isn't perfect either. That being said, this isn't the only study which notes that OCA2 has not only localized effects, but some global effects as well. Labels: Genetics
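The arithmetic behind "only 1 out of 10 blue eye causing alleles can be subject to selection", and why an additive side effect (such as one on skin color) escapes the problem, can be sketched with the standard one-locus selection recursion. This keeps the post's simplifications (random mating, a single locus, complete recessiveness in the recessive case). The selection coefficient `s = 0.05` is purely hypothetical.

```python
def exposed_fraction(q: float) -> float:
    """Share of recessive allele copies that sit in homozygotes under Hardy-Weinberg.

    Per person, homozygotes carry 2*q^2 copies out of the 2*q total, i.e. a fraction q.
    """
    return (2 * q * q) / (2 * q)


def next_q(q: float, w_het: float, w_hom: float, w_wild: float = 1.0) -> float:
    """Allele frequency after one generation of viability selection (random mating).

    w_wild, w_het, w_hom are relative fitnesses of non-carriers, heterozygotes and
    homozygotes for the allele whose frequency is q.
    """
    p = 1 - q
    w_bar = p * p * w_wild + 2 * p * q * w_het + q * q * w_hom
    return (p * q * w_het + q * q * w_hom) / w_bar


s = 0.05  # hypothetical fitness advantage of the fully expressed phenotype
for q in (0.01, 0.10):
    dq_recessive = next_q(q, w_het=1.0, w_hom=1 + s) - q       # only homozygotes benefit
    dq_additive = next_q(q, w_het=1 + s / 2, w_hom=1 + s) - q  # each copy adds s/2
    print(f"q = {q:.2f}: exposed fraction = {exposed_fraction(q):.2f}, "
          f"dq recessive = {dq_recessive:.1e}, dq additive = {dq_additive:.1e}")
```

At q = 0.10 the additive case moves about five times faster per generation (roughly 2.2e-3 versus 4.5e-4), and at q = 0.01 the gap widens to about fifty-fold. The ratio is roughly 2q, which is the post's point about weak selection on rare recessives expressed in numbers.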
Wednesday, November 21, 2007
Season of Birth and Dopamine Receptor Gene Associations with Impulsivity, Sensation Seeking and Reproductive Behaviors.
Labels: Genetics
Tuesday, November 20, 2007
Reading the links that come in to GNXP, I happened upon this post on what the author refers to as "scientific racism". This bit caught my eye:
I sat on a grant review committee recently for a national-level competition for multi-million dollar grants of an agency I won't name. The review committee was quite large, probably 25 or more scholars from around the U.S. One of the grant applications that the other reviewers (mostly from the biological sciences) rated the highest was one that proposed to look at the "genetic racial differences among Blacks and whites" to different kinds of treatment for HIV/AIDS. I rated this grant proposal among the lowest I had reviewed because of the methodology: all of the participants in the study would be sorted into the supposedly self-evident categories "Black" and "white" based on self-identification. When I raised this objection among my colleagues in the biological and health sciences, they all blinked hard, and looked at me as if I'd committed some sort of unpleasant faux pas. The chair of the committee finally acquiesced that this was a methodological flaw in the proposal, but the grant was nevertheless awarded millions of dollars.
Biomedical researchers are caught between a rock and a hard place here--none of them enjoy being referred to as scientific racists by their colleagues, I'm sure, but they're also interested in real phenomena. It's well-known that minorities are less likely to participate in biomedical research (though recent studies suggest this is not because they're less willing). From a geneticist's perspective, the discomfiting implication of this is that tests of a drug's efficacy and safety are done on a range of genetic backgrounds that are strongly biased towards the European mean. That is, drugs are accepted or rejected largely based on how well they perform in a sample of individuals of European descent. This is obviously a problem for the applicability of any results, and the NIH is indeed making minority inclusion a requirement for funding certain projects.
Clearly, none of this would be an issue if everyone responded identically to drugs (or if the correlation between drug response and race were zero). It is, however, an issue. Now, ancestry could be related to drug response through any number of mechanisms, either directly (through genetics) or indirectly (through socioeconomic status, etc). Teasing apart those influences means looking at both of them. But then you have someone like the author, who simply dismisses the correlation altogether! And who evidently has some say in the funding of these studies! It's worth pointing out that HIV progression does indeed have a genetic component and that certain alleles like CCR5-delta32, which is strongly protective against HIV infection, show marked geographic differences in frequency. A priori, looking for genetic components to differential drug response between populations seems entirely reasonable. Lastly, her main point seems to be that genetic ancestry and self-identified race might not match up. They do. I'm well aware that positions like the author's are made possible by people at the other end of the spectrum, who see races as the embodiment of some Platonic ideal. But rejecting idiocy certainly does not require one to embrace blindness! Related: Cancer and Race
Peter Frost states:
I suspect there is some incipient sex-linkage, i.e., European women may be somewhat likelier to have non-brown eyes and non-black hair. If this sex-linkage is mediated by prenatal estrogenization there may also be some impact on personality and temperament. But I really don’t know, and unfortunately there are still more questions than answers.
I've read Peter's book, Fair Women, Dark Men, and it is a great collection of data. Also, he has theorized that European color variation is a byproduct of sexual selection. So I have been primed to look for a trend where women seem to express blondism or light eye color at higher frequencies. But I just haven't found anything like that. In fact, I've found data which go in the other direction: females have a higher frequency of brown eyes! But this really clinched it for me. The source is this paper, Genetic determinants of hair, eye and skin pigmentation in Europeans. Note that women tend to score higher on skin sensitivity toward sun, which implies that they do have lighter skin. And as for hair color, well, perhaps there is a difference in how one judges blonde vs. brunette for males and females? I don't know. But I had seen eye color data like this elsewhere and just dismissed it as small N or something like that. At this point my assumption is that there isn't really the sexual dimorphism in eye color that there most definitely is in skin color. As for hair, I'm more open to this since it seems to be subject to more genes, and there could be some hormonal factor, as the tendency toward greater blondism in children and females is noted among Australian Aboriginals as well. Anyway, forget visual inspection. Here are the associations taking sex into account (from Table 4 of the supplementary info). The authors don't want to make a judgment based on these data. But I'm not religious about 0.05 P values. And it looks like there's some action on KITLG anyhow. Labels: human biodiversity
Monday, November 19, 2007
An article in The New York Times, Are Scientists Playing God? It Depends on Your Religion, surveys attitudes toward cloning and biological engineering in general. Roughly, the thesis being reported is that there is a trichotomy between post-Christian societies, traditional Christian societies and those where Eastern religions predominate. Generally I'm skeptical of these grand cultural typologies, but in this case I think there is an underlying component that explains a large part of the trend: the Roman Catholic Church has long opposed many forms of biological intervention and will no doubt oppose many forms of biological engineering which it deems unethical. Though I do not doubt the sincerity of Roman Catholic believers in their adherence to their Church's position here, I think this is a case where the elite formulation of the clergy and intellectuals has really made a significant impact on public policy. Reading about the anti-abortion movement in the United States during the early days after Roe v. Wade, it is clear that the Roman Catholics were at the forefront, and fundamentalist Protestants joined the fray quite a bit later. Similarly, eugenics laws were quite widespread in Protestant countries, but the Catholic Church put up a concerted and consistent resistance to them in nations where it was an institution which could significantly affect public policy.
There are also specific and general problems with the typology. Consider the specific:
Asia offers researchers new labs, fewer restrictions and a different view of divinity and the afterlife. In South Korea, when Hwang Woo Suk reported creating human embryonic stem cells through cloning, he did not apologize for offending religious taboos. He justified cloning by citing his Buddhist belief in recycling life through reincarnation.
Hwang Woo Suk is a convert from Christianity to Buddhism.
South Korea is a nation that is about 1/2 non-affiliated, 1/4 Buddhist and 1/4 Christian. Its ethical culture has been traditionally dominated by Confucianism, and there is a powerful substratum of indigenous shamanistic religion which suffuses the practices and outlooks of Christians & Buddhists alike. Christianity is gaining ground among the youth and in the educated segment of the population, and is the dominant religion in Seoul. The last two presidents of South Korea have been Roman Catholic, and that denomination is generally considered the most well-educated, affluent and liberal of the religious pillars in South Korean society. South Korea also sends out more Christian missionaries to the rest of the world than any other country aside from the United States. Christian fundamentalists in South Korea have even engaged in iconoclastic violence against Buddhist religious art and statuary. And yet South Koreans were also rather proud of their "cloning research." Then there is the biggest general issue with the typology:
By contrast, in the Judeo-Christian tradition, God is the master creator who gives out new souls to each individual human being and gives humans "dominion" over soul-less plants and animals. To traditional Christians who consider an embryo to be a human being with a soul, it is wrong for scientists to use cloning to create human embryos or to destroy embryos in the course of research.
I think the term Judeo-Christian is stupid. In any case, not only are there very few Jews in the world, but their attitude toward biological engineering tends to be pragmatic and consequentialist from what I can tell. There is one religious group which is left out of the typology: Islam. At about 15-20% of the world's population, this seems like a large oversight. There don't seem to be many laws about cloning in the Muslim world, but take a look at abortion laws.
Their objections to interventions might be less coherent or precise than those of Roman Catholics, but they seem to mirror them pretty well. The New York Times piece also points out that in the post-Christian world, such as Sweden, there is a fear of some sorts of biological changes due to a resurgence in a form of natural religion or spirituality. This shouldn't surprise; the decline of institutional Christianity in northern and eastern Europe has been met by a rise in a scientific materialist outlook and, even more significantly, by an unspecified monistic theism reminiscent of pre-Christian traditions. The Left-Right convergences alluded to suggest to me that the typology is too coarse and inchoate. There is a universal "Yuck" within our species, probably rooted in our cognitive hardware. Channeling these impulses culturally can be a tricky thing. For instance, the Japanese and Israelis lag far behind Americans in their acceptance or practice of organ donation, generally due to religious rationales. Obviously the Japanese and Israelis don't share a common spiritual root or background. Note: I place an emphasis on the Catholic Church as an institution affecting public policy because, for example, abortion rates of Catholics in the United States are at the national average. Moral suasion can only go so far, especially when individuals are making personal utility calculations.
Labels: Bioethics
PLOS early release, Discerning the ancestry of European Americans in genetic association studies:
We have analyzed four different genome-wide data sets involving European American samples, and demonstrated that the same two major axes of variation are consistently present in each data set. The first major axis roughly corresponds to a geographic axis of northwest-southeast European ancestry, with Ashkenazi Jewish samples tending to cluster with southeastern European ancestry; the second major axis largely distinguishes Ashkenazi Jewish ancestry from southeastern European ancestry. The whole thing is free. Nothing too surprising, just pushing the decimal places further to the right, which is always a good thing when considering something which has medical applications. Labels: Genetics
Part 1 of these notes discussed the general meaning and use of the concepts of correlation and regression. The notes are intended to provide background for other posts I am planning, but if they are of any use as a general introduction to the subject, so much the better. Part 2 discusses some problems of application and interpretation, such as circumstances that may increase or reduce correlation coefficients. I emphasise that these notes are not aimed at expert statisticians, but at the (possibly mythical) 'intelligent general reader'. I hope however that even statisticians may find a few points of interest to comment on, for example on the subjects of linearity, and the relative usefulness of correlation and regression techniques. Please politely point out any errors. Apart from questions of interpretation, this Part contains proofs of some of the key theorems of the subject, such as the fact that a correlation coefficient cannot be greater than 1 or less than -1. There is nothing new in these proofs, but I did promise to give them, and personally I find it frustrating when an author just says 'it can be proved that...' without giving a clue how it can be proved. Readers who already know proofs of the main theorems, or are prepared to take them on trust, may prefer to go straight to the section headed 'Changes of Scale'. Like Part 1, this Part does not deal with questions of sampling error. Except for a few passing comments, this Part deals only with bivariate correlation and regression. I am aware that some issues, such as linearity, arise equally (if not more seriously) in the multivariate case. Part 3, if and when I get round to it, will deal with the basics of multivariate correlation and regression.
Notation
These notes avoid using special mathematical symbols, because Greek letters, subscripts, etc, may not be readable in some browsers, or even if they are readable may not be printable.
The notation used will be the same as in Part 1, with the following modifications. In Part 1, the correlation between x and y was denoted by r_xy, the covariance between x and y by cov_xy, the regression coefficient of x on y by b_xy, and the regression coefficient of y on x by b_yx. Since this Part deals only with the correlation of two variables, there will be no ambiguity if the correlation between x and y is denoted simply by r, and their covariance simply by cov. It is necessary to distinguish between the regression of x on y and the regression of y on x, and the coefficients will be denoted by bxy and byx respectively, without the underscores used in Part 1. These expressions could admittedly be confused with 'b times x times y', but I will avoid using the sequences bxy or byx in this sense. As pointed out in Part 1, for theoretical purposes it is often convenient to assume that variables are expressed as deviations from the mean of the raw values. In this Part the variables x and y will stand for deviation values unless otherwise stated. As previously, S stands for 'sum of', s stands for 'standard deviation of', ^2 stands for 'squared', and # stands for 'square root of'.
The derivation of the coefficients
As noted in Part 1, the Pearson regression of x on y is given by the coefficient Sxy/Sy^2, where x and y are deviation values. This is the formula which minimises the sum of the squares of the 'errors of estimate', in accordance with the Method of Least Squares. As it is the most fundamental theorem of the subject, it is worth giving a proof, using elementary calculus. (The result can be obtained without explicitly using calculus, but the explanation is then rather longer.) We want to find a linear equation, of the form x = a + by, such that the sum of the squares of the errors of estimate, S(x - a - by)^2, is minimised. Provided the x and y values are expressed as deviations from their means, the constant a must be zero.
(If we use raw values instead of deviation values, a non-zero constant will usually be required.) The sum of squares S(x - a - by)^2 can be expanded as Sx^2 + Na^2 + b^2(Sy^2) - 2bSxy - 2aSx + 2abSy. But the last two terms vanish, as with deviation values Sx and Sy are both zero. This leaves Na^2 as the only term involving a, and Na^2 has its lowest value (for real values of a) when a = 0. At its minimum value the expression S(x - a - by)^2 therefore reduces to S(x - by)^2. It remains to find the value of the coefficient b for which S(x - by)^2 is minimised. This expression may be regarded as a function of b, which may be expanded as: f(b) = Sx^2 + b^2(Sy^2) - 2bSxy where Sx^2, Sy^2, and Sxy are quantities determined by the data. Applying the standard techniques of differentiation, the first derivative of f(b) with respect to b is 2bSy^2 - 2Sxy. According to the principles of elementary calculus, if the function has a minimum value, its rate of change (first derivative) at that value will be zero, so to find the minimum (if there is one) we can set the condition 2bSy^2 - 2Sxy = 0. Solving this equation for b, we get b = Sxy/Sy^2 as a unique solution. In principle, this could be a maximum or a point of inflection rather than a minimum, but the second derivative, 2Sy^2, is positive (unless all the y values are equal), so for values of b either higher or lower than Sxy/Sy^2 the function f(b) has a higher value. Therefore b = Sxy/Sy^2 gives a unique minimum value for the sum of squares, and may be designated as bxy, the required coefficient of the regression of x on y. The best estimate of x, for a given value of y, is x = (bxy)y. By similar reasoning we can derive Sxy/Sx^2 as the coefficient of the regression of y on x. The correlation coefficient r can then be derived as the mean proportional (geometric mean) between the two regression coefficients, taking the sign of Sxy, or in the Galtonian manner by 'rescaling' the x and y values by dividing them by sx and sy respectively, giving r = Sxy/Nsx.sy. These formulae use deviation values of x and y.
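As a rough numerical check of the least-squares argument above, here is a minimal Python sketch using only the standard library. The five pairs of values are made up purely for illustration, and the helper names (to_deviations, sum_sq_errors, regression_coefficients) are hypothetical, not part of these notes. It converts raw values to deviation values, computes bxy = Sxy/Sy^2, byx = Sxy/Sx^2 and r = Sxy/Nsx.sy, and confirms that nudging either b or the constant a away from (bxy, 0) only increases S(x - a - by)^2.

```python
import math

def to_deviations(values):
    """Express raw values as deviations from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def sum_sq_errors(x, y, a, b):
    """S(x - a - by)^2: sum of squared errors of estimate for x = a + by."""
    return sum((xi - a - b * yi) ** 2 for xi, yi in zip(x, y))

def regression_coefficients(x, y):
    """Return (bxy, byx, r) for deviation values x and y."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))  # Sxy
    sx2 = sum(xi * xi for xi in x)              # Sx^2
    sy2 = sum(yi * yi for yi in y)              # Sy^2
    bxy = sxy / sy2  # regression of x on y
    byx = sxy / sx2  # regression of y on x
    sx = math.sqrt(sx2 / n)  # standard deviation of x
    sy = math.sqrt(sy2 / n)  # standard deviation of y
    r = sxy / (n * sx * sy)
    return bxy, byx, r

# Made-up illustrative data, not taken from the notes
raw_x = [2.0, 4.0, 5.0, 7.0, 9.0]
raw_y = [1.0, 3.0, 2.0, 6.0, 8.0]
x, y = to_deviations(raw_x), to_deviations(raw_y)
bxy, byx, r = regression_coefficients(x, y)

# With deviation values, any move away from a = 0, b = bxy raises S(x - a - by)^2
best = sum_sq_errors(x, y, 0.0, bxy)
for delta in (-0.1, -0.01, 0.01, 0.1):
    assert sum_sq_errors(x, y, 0.0, bxy + delta) > best
    assert sum_sq_errors(x, y, delta, bxy) > best

# r^2 equals the product of the two regression coefficients (r is their mean proportional)
assert math.isclose(r * r, bxy * byx)
print(f"bxy = {bxy:.4f}, byx = {byx:.4f}, r = {r:.4f}")
```

Checking a handful of nearby values is not a proof, of course; the calculus argument is what guarantees the minimum, and the sketch merely illustrates it on one small data set.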
If we prefer to use raw values, the appropriate formulae can be obtained by substitution. Using x and y now to designate raw values, the deviation value of x equals x - M_x, where M_x is the mean of the raw values. Similarly, the deviation value of y equals y - M_y. Substituting these expressions for the deviation values of x and y in the above equation x = (bxy)y, we get the formula for raw values x = (bxy)y + M_x - (bxy)M_y. By the same methods we get y = (byx)x + M_y - (byx)M_x. These equations can be represented graphically by straight lines intercepting the axes at points determined by the constants [M_x - (bxy)M_y] and [M_y - (byx)M_x], and with slopes determined by the coefficients bxy and byx.
The range of coefficients
For any positive value of r, expressed in the form Sxy/Nsx.sy, the regression coefficients could range from 0 to infinity, since they equal r(sx/sy) and r(sy/sx), and there is no upper or lower limit on the ratios sx/sy and sy/sx. Similarly, for any negative value of r, the regression coefficients could range from 0 to minus infinity. Unless sx and sy are equal (in which case regression and correlation coincide), one regression coefficient must always be greater and the other less than r in absolute value. If the regression coefficients are reciprocal to each other (e.g. 2/3 and 3/2), the correlation will be perfect (1 or -1) and there will be a single regression line. Unlike the regression coefficients, the correlation coefficient r can only range from 1 to -1. Introductory textbooks often state this without proof, but it is a simple corollary of another fundamental theorem on correlation. Unless the correlation is perfect (1 or -1), there will be a certain scatter of the observed values of x around the value estimated by the regression of x on y. The coefficient of regression of x on y is Sxy/Sy^2 or r(sx/sy). The estimated values of x for the corresponding values of y are therefore r(sx/sy)y, and the errors of estimate (i.e.
the differences between the actual values and the estimated values) will have the form [x - r(sx/sy)y]. But these errors will themselves have a variance, which we may call Ve = [S[x - r(sx/sy)y]^2]/N. [Added: This assumes that the mean value of the errors is zero. Using deviation values of x and y this is quite easy to prove, as the mean of the errors is S[x - r(sx/sy)y]/N = (Sx - r(sx/sy)Sy)/N = (0 - 0)/N = 0.] With a little manipulation it can be shown that [S[x - r(sx/sy)y]^2]/N equals (1 - r^2)Vx. [See Note 1.] So we reach the important result that the variance of the errors of estimate of x, as estimated from the regression of x on y, is (1 - r^2) times the full variance of x. Since a variance cannot be negative, (1 - r^2) cannot be negative either (provided Vx is not zero), so r^2 cannot exceed 1 and r must lie between 1 and -1. In other words, the variance of the observed x values around the estimated values is reduced by the proportion r^2 (the square of the correlation coefficient) as compared with the full variance of the x values. It is therefore often said that the correlation of x with y explains or accounts for r^2 of the variance of x. Similarly, it accounts for r^2 of the variance of y. To mark its importance, r^2 is often known as the coefficient of determination. Since r is a fraction (unless it is 1, -1 or 0), r^2 is smaller than the absolute value of r. The proportion of variance explained therefore falls off proportionally faster than r itself (a correlation of .5 explains only .25 of the variance), and a correlation of less than (say) .3 explains very little of the variance (less than .09). The term 'explained' is to be understood purely in the sense just described, and does not necessarily imply a causal explanation. The estimated values of x themselves have a variance equal to [S[(bxy)y]^2]/N = [S[r(sx/sy)y]^2]/N = [(Sy^2.r^2)Vx/Vy]/N, which can be simplified to (r^2)Vx. Therefore Vx, the total observed variance of x, can be broken down into two additive components, (r^2)Vx + (1 - r^2)Vx, representing the variance of the estimates themselves and the residual variance not accounted for by the correlation.
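The decomposition Vx = (r^2)Vx + (1 - r^2)Vx can likewise be checked numerically. This Python sketch again uses made-up data, and it works with raw values, so it also exercises the raw-value regression x = (bxy)y + M_x - (bxy)M_y from the earlier section. Variances are computed by dividing by N, as in these notes (no sampling correction). The helper names mean and variance are illustrative assumptions, not from the original.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Variance with divisor N, as in the notes (no sampling correction)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Made-up illustrative raw data, not taken from the notes
raw_x = [2.0, 4.0, 5.0, 7.0, 9.0]
raw_y = [1.0, 3.0, 2.0, 6.0, 8.0]
n = len(raw_x)
mx, my = mean(raw_x), mean(raw_y)          # M_x, M_y
vx, vy = variance(raw_x), variance(raw_y)  # Vx, Vy
sx, sy = math.sqrt(vx), math.sqrt(vy)
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(raw_x, raw_y)) / n
r = cov / (sx * sy)
bxy = r * sx / sy                          # coefficient of regression of x on y

# Raw-value regression of x on y: x = (bxy)y + M_x - (bxy)M_y
estimates = [bxy * yi + mx - bxy * my for yi in raw_y]
errors = [xi - est for xi, est in zip(raw_x, estimates)]

v_est = variance(estimates)              # variance of the estimated values
v_err = sum(e ** 2 for e in errors) / n  # Ve = S(errors^2)/N

assert abs(mean(errors)) < 1e-12               # errors of estimate have mean zero
assert math.isclose(v_err, (1 - r ** 2) * vx)  # residual variance
assert math.isclose(v_est, r ** 2 * vx)        # variance of the estimates
assert math.isclose(v_est + v_err, vx)         # the two parts add up to Vx
assert -1 <= r <= 1
```

Many statistical packages use the sample divisor N - 1 rather than N for variances; that changes Vx and Ve by a common factor but leaves r, bxy and byx unaffected, since the divisor cancels in each ratio.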