Tuesday, May 06, 2008

Pleiotropy in melanocortin receptors   posted by p-ter @ 5/06/2008 09:47:00 PM

In the comments here, rosko points me to a study on the effects of MC4R, a gene implicated in natural variation in human weight, on pathways involved in sexual function. It's well known, of course, that genetic pathways can be involved in multiple physiological processes--in particular, a signaling pathway can generate many different phenotypes depending on what the downstream target of the signal is.

The effects of MC4R stimulation in humans are, as rosko comments, kind of interesting:
Methods. Ten subjects were enrolled in a double-blind, placebo-controlled, crossover study. Melanotan II (0.025 mg/kg) and vehicle were each administered twice by subcutaneous injection; real-time RigiScan monitoring and a visual analog scale were used to quantify the erections during a 6-hour period. The level of sexual desire and side effects were recorded with a questionnaire.

Results. Melanotan II initiated subjectively reported erections in 12 of 19 injections versus only 1 of 21 doses of placebo. The mean rigidity score of the responders was 6.9 on a scale of 0 to 10. The mean duration of tip rigidity greater than 80% was 45.3 minutes with Melanotan II versus 1.9 for placebo (P = 0.047). The level of sexual desire after injection was significantly higher after Melanotan II administration than after placebo. Nausea and stretching/yawning occurred more frequently with Melanotan II, and 4 of 19 injections were associated with severe nausea.
I wondered what a "Rigiscan" is--find out here. Hypothetically, one could test whether natural variation in sexual behavior in humans is also affected by MC4R polymorphism, though I can't imagine that being a particularly fun study to carry out (one for agnostic's new series? 23andme + free time = association studies about erections).

This reminds me of the MC1R story about increased pain sensitivity in redheads, in the vague sense that both involve melanocortin receptors and pleiotropy.



Sunday, May 04, 2008

Weight and genetics   posted by p-ter @ 5/04/2008 08:20:00 PM

Two studies report this week on the association of variation near MC4R with body mass. This is the second convincingly replicated locus to be implicated in natural variation in weight, the first being FTO. There are a couple of reasons I find this association interesting.

1. Coding mutations in MC4R are known to cause severe obesity. It's to be expected that less severe mutations (the region of the genome implicated in these studies is likely regulatory) could lead to more subtle effects on body weight, but it didn't have to be that way. And this forms part of a pattern: genes that cause Mendelian forms of a disease are often associated with more common forms as well. Why is this interesting? It suggests that the candidate gene approach to finding alleles associated with disease wasn't as flawed as people thought--it's just that those studies were all severely underpowered (the number of individuals in these studies, for comparison, tops 60,000).

2. One of the studies was carried out in individuals of Indian descent. This is one of the first GWA studies to focus on a non-European population--a development that will hopefully continue. Insofar as allele frequencies vary among populations, studies of the same phenotype in different populations may get quite different results (note that studies of skin pigmentation in Europeans don't identify SLC24A5, but studies in South Asians do--the reason is that the relevant variant in the gene is fixed in Europe but at moderate frequency in India). Population genetics has always had a role in the rational choice of study population for association studies, but as all the low-hanging fruit gets taken, this role will perhaps become more pronounced.
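To see why sample size matters so much, here is a rough Python sketch of the usual back-of-the-envelope power calculation for an additive association with a quantitative trait. It assumes the trait is in standard-deviation units, that a variant with allele frequency p and per-allele effect beta explains about 2p(1-p)beta^2 of the variance, and that the required N is roughly (z for alpha/2 + z for power)^2 divided by that variance. The allele frequency (0.25) and effect (0.05 SD) used below are hypothetical, not estimates from the MC4R papers. The same formula shows why a variant that is fixed in a population, like SLC24A5 in Europeans, explains no variance there and cannot be picked up.

```python
from statistics import NormalDist

def required_sample_size(allele_freq, effect_sd, alpha=5e-8, power=0.8):
    """Approximate N for a per-allele (additive) test on a quantitative trait.

    allele_freq: frequency of the tested allele (hypothetical values below).
    effect_sd:   change in the trait, in SD units, per copy of the allele.
    alpha:       two-sided significance threshold (5e-8 ~ genome-wide).
    """
    # Fraction of trait variance explained by the variant.
    variance_explained = 2 * allele_freq * (1 - allele_freq) * effect_sd ** 2
    if variance_explained == 0:
        return float("inf")  # a fixed (or absent) variant explains nothing
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 / variance_explained

genome_wide = required_sample_size(0.25, 0.05)
candidate_gene = required_sample_size(0.25, 0.05, alpha=0.05)
fixed_variant = required_sample_size(1.0, 0.05)
print(round(genome_wide), round(candidate_gene), fixed_variant)
```

With these hypothetical inputs the genome-wide threshold calls for tens of thousands of people, several times what the lenient single-gene threshold needs, and far more than most candidate gene studies ever typed.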



Tuesday, April 29, 2008

Doping & genetic background?   posted by Razib @ 4/29/2008 09:13:00 PM

Some Athletes' Genes Help Outwit Doping Test:
The 55 men in a drug doping study in Sweden were normal and healthy. And all agreed, for the sake of science, to be injected with testosterone and then undergo the standard urine test to screen for doping with the hormone.

The results were unambiguous: the test worked for most of the men, showing that they had taken the drug. But 17 of the men tested negative. Their urine seemed fine, with no excess testosterone even though the men clearly had taken the drug.

It was, researchers say, a striking demonstration of a genetic discovery. Those 17 men can build muscles with testosterone, they respond normally to the hormone, but they are missing both copies of a gene used to convert the testosterone into a form that dissolves in urine. The result is that they may be able to take testosterone with impunity.

The gene deletion is especially common in Asian men, notes Jenny Jakobsson Schulze, a molecular geneticist at the Karolinska University Hospital in Stockholm. Dr. Schulze is the first author of the testosterone study, published recently in The Journal of Clinical Endocrinology and Metabolism.


The whole "Asian" angle wouldn't be as important from where I stand if China weren't intent on becoming an athletic superpower. Specifically, from Doping Test Results Dependent on Genotype of UGT2B17, the Major Enzyme for Testosterone Glucuronidation:
We demonstrated that a deletion polymorphism in the gene coding for UGT2B17...is strongly associated with TG levels in urine...All subjects devoid of the gene had a T/E ratio below 0.4...This polymorphism was considerably more common in a Korean Asian than in a Swedish Caucasian population, with 66.7 and 9.3 % deletion/deletion (del/del) homozygotes respectively.


They don't seem to know what SNP is causing this. If you are curious, you can check out the linkage disequilibrium around UGT2B17.
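If one assumes Hardy-Weinberg proportions (the quoted abstract doesn't say whether the samples fit them), the del/del frequencies above imply deletion-allele frequencies of roughly 0.82 in the Korean sample and 0.30 in the Swedish one. A minimal Python sketch of that arithmetic; the "intact" label for the non-deleted allele is my own naming:

```python
from math import sqrt

def genotype_freqs_from_del_del(freq_del_del):
    """Infer allele and genotype frequencies from the del/del homozygote
    frequency, ASSUMING Hardy-Weinberg proportions."""
    q = sqrt(freq_del_del)  # frequency of the deletion allele
    p = 1.0 - q             # frequency of the intact-gene allele
    return {"deletion_allele": q, "het": 2 * p * q, "intact_homozygote": p * p}

for population, f in (("Korean", 0.667), ("Swedish", 0.093)):
    freqs = genotype_freqs_from_del_del(f)
    print(population, {k: round(v, 3) for k, v in freqs.items()})
```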



Saturday, April 26, 2008

Gene Genie #30   posted by Razib @ 4/26/2008 11:26:00 PM

Over at my other weblog.



Thursday, April 24, 2008

The genetics of adaptation in Arabidopsis   posted by p-ter @ 4/24/2008 09:34:00 PM

One of the "debates" currently occupying evolutionary biology is whether evolution occurs primarily via changes in protein-coding sequence or via changes in gene regulation (apparently it's become so heated that battles between the two camps are now fought through t-shirts).

As understanding of the genetics of adaptation advances, this debate will likely fade away--a priori, it's easy to make the case for either, and well-studied individual examples are showing that, as one might expect, evolution isn't particularly dogmatic about the sources of variation it works with.

It's these case studies that are most interesting--take, for example, a recent study on the adaptation of Arabidopsis halleri to the heavy-metal-polluted soils it now occupies. This is quite a nice example of evolution via gene regulation--the authors map the ability to tolerate heavy metals to a particular candidate gene, then identify both a change in copy number as well as changes in the promoter region of the gene that lead to high levels of expression. To complete the story, they then pop this highly expressing version of the gene back into A. thaliana (the model organism) and show they're able to recreate the crucial aspects of the adaptation in that species.



Thursday, April 17, 2008

MCPH1 & cranial volume in Chinese   posted by Razib @ 4/17/2008 01:41:00 PM

A common SNP of MCPH1 is associated with cranial volume variation in Chinese population:
Microcephaly (MCPH) genes are informative in understanding the genetics and evolution of human brain volume. MCPH1 and abnormal spindle-like MCPH associated (ASPM) are the two known MCPH causing genes that were suggested undergone recent positive selection in human populations. However, previous studies focusing only on the two tag single nucleotide polymorphisms(SNPs) of MCPH1 and ASPM failed to detect any correlation between gene polymorphisms and variations of brain volume and cognitive abilities. We conducted an association study on eight common SNPs of MCPH1 and ASPM in a Chinese population of 867 unrelated individuals. We demonstrate that a non-synonymous SNP (rs1057090, V761A in BRCA1 C-terminus (BRCT) domain) of MCPH1 other than the two known tag SNPs is significantly associated with cranial volume in Chinese males. The haplotype analysis confirmed the association of rs1057090 with cranial volume, and the homozygote males containing the derived alleles of rs1057090 have larger cranial volumes compared with those containing the ancestral alleles. No recent selection signal can be detected on this SNP, suggesting that the brain volume variation in human populations is likely neutral or under very weak selection in recent human history.

They tested for recent selection using EHH (extended haplotype homozygosity) and iHS (the integrated haplotype score). They also suggest that the derived form of rs1057090 is very ancient (the SNP has a very small window of linkage disequilibrium around it).
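For readers who haven't met EHH: among the haplotypes carrying a given allele at a core SNP, it is the probability that two chosen at random are identical over the whole stretch from the core out to some marker. A young allele sitting on a long shared haplotype keeps EHH high far from the core; an old allele, like the derived form of rs1057090 is suggested to be, has had time for recombination to break up its background, so EHH falls off quickly. A minimal Python sketch on made-up haplotypes; it only extends to one side of the core and stops short of iHS, which integrates EHH over genetic distance and standardizes within allele-frequency bins.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, end):
    """EHH over positions core..end (inclusive) for haplotypes that all carry
    the same allele at `core`: the chance two random ones match over the span."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(h[core:end + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)

def ehh_decay(haplotypes, core, allele):
    """EHH for carriers of `allele`, extending one marker at a time to the right."""
    carriers = [h for h in haplotypes if h[core] == allele]
    return [ehh(carriers, core, end) for end in range(core, len(carriers[0]))]

# Hypothetical phased haplotypes (0/1 alleles); the core SNP is position 0.
HAPLOTYPES = ["1000", "1000", "1000", "1001",   # "derived" allele 1: long shared background
              "0110", "0001", "0101", "0010"]   # "ancestral" allele 0: background broken up

print(ehh_decay(HAPLOTYPES, 0, "1"))  # decays slowly
print(ehh_decay(HAPLOTYPES, 0, "0"))  # decays quickly
```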

Related:
This is Bruce Lahn's brain on ASPM and MCPH1, Did Modern Humans Get a Brain Gene from Neandertals?, Microcephalin & ASPM and Selection "controversy".



Tuesday, April 15, 2008

European population substructure...again   posted by Razib @ 4/15/2008 04:25:00 PM

The discussion continues with regard to the relationship of various West Eurasian and North African groups (i.e., Europeans, North Africans and Near Easterners). There have been several papers published within the last few years which shed some light on these questions. We've blogged them before, and I don't think that they radically alter what you might find in History and Geography of Human Genes, but I thought I'd point to them again, with a special focus on figures of note.

European Population Substructure: Clustering of Northern and Southern Populations. Figure 4 B:


Analysis and Application of European Genetic Substructure Using 300 K SNP Information, Figure 1 B & D:


Discerning the Ancestry of European Americans in Genetic Association Studies, Figure 3 A:


Analysis of genetic variation in Ashkenazi Jews by high density SNP genotyping, Figure 9:


Since these papers are all Open Access there's really no excuse not to read them (at least the "Discussion" sections). I hope people won't go around looking for charts to "prove" whatever pet hypothesis they want to promote; the population-level classifications we generate often have only an approximate relationship to the multi-dimensional shape of human genetic variation at the finer-grained level. Note that some of these principal component charts really don't have that many individuals typed, and you may wonder about the representativeness of the samples of their putative national populations. Though these are important points, I do think we need to be cautious about our expectations with regard to the sort of information we're going to extract on the margins as N increases and the individuals typed come from every region of a nation. I suspect we'll get more oddities like the Etruscans as isolated or peculiar populations are included in these samples, and the exceptions to the broad patterns tell us a lot about the details of human history. But I doubt we'll overturn the general shape of the relationships and clinal gradients we see here.
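For anyone wondering where principal component charts like these come from, here is a minimal sketch in Python with NumPy. Genotypes are coded 0/1/2; each SNP is centred and scaled by its binomial standard deviation (one common normalization, not necessarily the one each of these papers used); and individual scores come from a singular value decomposition. The two simulated populations are hypothetical and are probably more sharply differentiated than most pairs of European populations.

```python
import numpy as np

def simulate_two_populations(n_per_pop=50, n_snps=2000, drift_sd=0.1, seed=0):
    """Toy data: two populations whose allele frequencies have drifted apart
    from a shared ancestral frequency at each SNP. Returns a genotype matrix
    (individuals x SNPs, coded 0/1/2) and a population label per individual."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_snps)
    blocks = []
    for _ in range(2):
        freqs = np.clip(ancestral + rng.normal(0.0, drift_sd, size=n_snps), 0.01, 0.99)
        blocks.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack(blocks), labels

def principal_components(genotypes, k=2):
    """Centre each SNP on its mean, scale by its binomial standard deviation,
    and return each individual's scores on the top k principal components."""
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]

genotypes, labels = simulate_two_populations()
pcs = principal_components(genotypes)
pc1 = pcs[:, 0]
print(pc1[labels == 0].mean(), pc1[labels == 1].mean())
```

The sign of each component is arbitrary, which is one reason the same populations can appear mirrored from one paper's chart to the next.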

Addendum: I somewhat played down the future surprises that these sorts of fine-grained analyses might have for us...but I do want to note that the studies will continue. That's because they aren't done for the purpose of elucidating human genetic history as such; rather, the primary rationale is to highlight substructure which might be relevant when attempting to identify disease-relevant alleles. In the medical context, then, there may be significant returns on the investment here, which I don't want to underestimate. If, for example, a particular drug's efficacy within the African American population in the United States is directly proportional to one's ancestral makeup, then identifying ancestry-informative markers is very useful.

Update: Measuring European Population Stratification with Microarray Genotype Data, Figure 1 A:



Monday, April 14, 2008

Europeans, Jews and Middle Easterners   posted by Razib @ 4/14/2008 10:26:00 PM

Greg's post about SNPs, Jews and evolutionary genetic parameters has been getting a lot of play around the blogs & forums. Most of it seems to be due to the persistent interest in the genetic relationship of Ashkenazi Jews to other European populations. This makes sense, since from the 19th century on the question of how the "Jewish race" relates to European gentiles has had some sociopolitical relevance.... But a commenter at Steve's blog pointed out that Bauchet et al. from last year had a PC chart which included Armenians, who are, I think, a good proxy for northern Middle Eastern populations in general. One interesting result from surveys of Y chromosomal lineages is the finding that Jews may have more affinities with northern Levantine & Anatolian Middle Eastern populations than with southern Levantine and Arabian ones. The non-trivial female-mediated input of Sub-Saharan ancestry into many Arab populations since the rise of Islam is far less evident in non-Arab Muslim populations (Kurds, Persians and Turks), in Middle Eastern Jews, and obviously in Ashkenazi ones. But another point is that recent work suggests that the impact of historical events (e.g., the Arab conquest) might have been more demographically significant than we had previously assumed, so Jewish affinity with northern Middle Eastern populations may reflect the fact that these groups have been less affected by exogenous genetic inputs within the last 2,000 years.

Caution about the sample sizes, of course (though I assume within the next year we'll have much better data to go on), but something to include in your list of priors when making phylogenetic background assumptions.

Note: I added geographic labels to the PC chart for clarification.

Update: Steve has another post up:

On the first two axes, Ashkenazi Jews are rather close to "Europeans" and "Russians." They are similar to Yemenites (from Southern Arabian peninsula) on the first axis, but not on the second. And they are similar to Samaritans (who currently subsist on two hilltops in Israel), good, bad or indifferent, on the second axis but not on the first. They are fairly similar to the Druze (of Lebanon and Israel) on the first two axes, but not on the third.

...

So, Ashkenazis look pretty European on this chart compared to a few Middle Eastern groups. But, as the recent graph showed, genetics has progressed to the point where Ashkenazis (at least those with four Ashkenazi grandparents) can now be reliably distinguished from other Europeans.


The Samaritans are cousins of the Jews. But:
In the past, the Samaritans are believed to have numbered several hundred thousand, but persecution and assimilation have reduced their numbers drastically. In 1919, an illustrated National Geographic report on the community stated that their numbers were less than 150.


Like the Kalash or Sardinians, the Samaritans are going to be weird outliers because of their demographic history. Inbreeding and no inward gene flow for that long will do that to you (many people in the Middle East are descended from Samaritans, of course, but very few Samaritans are descended from non-Samaritans).
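The arithmetic behind that: in an isolated population with no gene flow in, random drift erodes expected heterozygosity by a factor of (1 - 1/(2Ne)) each generation, where Ne is the effective population size. A quick Python sketch under the idealized Wright-Fisher assumptions (no mutation, no migration, no selection). The Ne values and the 100 generations are hypothetical; an effective size is usually well below a census count like the 150 or so Samaritans reported in 1919.

```python
def heterozygosity_retained(ne, generations):
    """Expected fraction of the starting heterozygosity left after `generations`
    of drift in an isolated Wright-Fisher population of effective size `ne`."""
    return (1.0 - 1.0 / (2.0 * ne)) ** generations

for ne in (10_000, 1_000, 100):
    print(ne, round(heterozygosity_retained(ne, 100), 3))
```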

The Yemenites are also a peculiar comparison point because they are geographic outliers in relation to other Middle Eastern populations with a long and distinct history. They have a large proportion of Sub-Saharan ancestry for an Arab group. An interesting historical note is that during the Islamic expansion Yemenite tribes were prominent in Iraq and Egypt, though I doubt they left a very strong genetic imprint in these regions.

The Druze are a better point of comparison, being a more mainstream Middle Eastern group. But that's only relative to the Samaritans, who are at an advanced stage of pedigree collapse, or the Yemenites, who are on the geographic margins of the Near East (it is easy to argue that before Islam Yemen was more a part of the trans-Indian Ocean world than it was of the Near East). The Druze are an esoteric ethno-religious group, long resident in the mountains of Lebanon, that has not accepted converts since 1031, so again you have a recipe for some genetic distinctiveness developing because of social norms.

All that being said...perhaps as we explore the genetics of the Middle East further we'll find that most groups exhibit these sorts of inbred tendencies because of the prevalence of consanguinity?

Addendum: Modest levels of gene flow are very good at equilibrating and mitigating the build-up of variation between groups. Islands, like Sardinia, often develop unique genetic profiles because water seems to be a powerful barrier to marriage connections. The Samaritans & Kalash have not had any inward gene flow for a very long time, in both cases partly because they are embedded among Muslims who do not generally tolerate conversion to other religions, and in the case of the Kalash also because of their geographic isolation. Some of the same issues apply to the Druze, though I suspect much more modestly (in part because Druze isolation is more recent).
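The standard way to put numbers on this is Wright's island model, in which the equilibrium F_ST among subpopulations is approximately 1/(1 + 4Ne*m), with m the fraction of each subpopulation made up of migrants every generation. It is an idealization (many equal-sized islands, neutral loci, no mutation), but it shows why even one migrant per generation (Ne*m = 1) keeps differentiation modest, while zero migration lets it run all the way. A minimal Python sketch with hypothetical Ne and m values:

```python
def island_model_fst(ne, m):
    """Approximate equilibrium F_ST under Wright's infinite-island model."""
    return 1.0 / (1.0 + 4.0 * ne * m)

for ne, m in ((100, 0.0), (100, 0.01), (100, 0.1)):
    # Ne*m is the number of migrants per generation
    print(f"Ne*m = {ne * m:g}: F_ST ~ {island_model_fst(ne, m):.3f}")
```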



Saturday, April 12, 2008

SNPs don't lie   posted by gcochran @ 4/12/2008 10:25:00 PM

There was an interesting paper in BMC Genetics back in February: "Analysis of genetic variation in Ashkenazi Jews by high density SNP genotyping." They ran 500K Affy chips on 100 Ashkenazi women and on 60 CEPH-derived HapMap (CEU) individuals. They hoped to find greater levels of linkage disequilibrium and lower haplotype complexity among the Ashkenazim, as a putatively bottlenecked population. This would simplify some forms of genetic mapping. Some earlier work had suggested that this might be the case - but that earlier work had looked either at a single chromosome or at small samples from a number of chromosomes.

The expected pattern is not there. Average LD is very similar in the two populations, although it varies from chromosome to chromosome. It's slightly smaller among the Ashkenazi at short distances, slightly greater at longer distances, but overall very similar, as you can see.
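For reference, a common measure of LD between two SNPs is r^2, the squared correlation between the alleles they carry across haplotypes: r^2 = D^2 / (pA(1-pA) pB(1-pB)), where D = pAB - pA*pB. A minimal Python sketch, assuming phased haplotypes coded 0/1; the toy haplotypes are hypothetical. Note that r^2 is undefined at a monomorphic site, which is one reason monomorphic SNPs are counted separately.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes given as
    (allele_at_locus_1, allele_at_locus_2) pairs of 0/1 values."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # freq of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n          # freq of allele 1 at locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom

print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))  # alleles always travel together
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # all combinations equally common
```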

There were somewhat _more_ haplotype blocks among the Ashkenazi sample, not fewer.
You would expect a bottlenecked population to have more monomorphic sites, but the Ashkenazi sample had noticeably fewer, 9.1 % versus 12.4 %.

Altogether, the paper concludes that "These data are more consistent with the AJ as an older, larger population than CEU. " Which means that there is no sign of any bottleneck in this data. The paper, obviously written by several people, _refers_ to several bottlenecks that have been discussed in earlier studies, but this measurement set contains thousands of times more data than those earlier studies. If there had been a bottleneck, they would have seen it, and if they don't see it, there must not have been one.

They see very significant gene frequency differences in a couple of fair-sized regions: LCT and HLA. Those differences were of course generated by selection. There are differences in smaller regions at a number of other positions, and long homozygous regions in the Ashkenazi sample average about 20% longer - so at least some of their long haplotypes are younger.

Fact: we find long haplotypes around the mutations causing common Ashkenazi diseases, on the order of one to ten Mb.

Bottlenecks affect the whole genome, but selection only affects a small fraction. Selection would not change genome-wide LD much, would not much increase the number of monomorphic sites, but it could generate long haplotypes around selected mutations.

The authors think that these differences "reflect the impact of both selection as well as genetic drift." - but there is, as far as I can tell, no evidence of drift in this data at all. Perhaps I'm missing something.

This SNP study (and others) also shows that Ashkenazim are genetically distinct from other Europeans, which allows fairly accurate identification of group membership. Almost perfectly distinct, if you look at Ashkenazim whose grandparents are all Ashkenazi (the violet dots). Obviously, there was low inward gene flow for a long time, but that has increased a lot in the last century. Distinct local selection pressures could have caused noticeable change when gene flow was that low.

Check out this figure, from a recent paper in PLOS Genetics (Tian et al., Analysis and Application of European Genetic Substructure Using 300 K SNP Information):

Henry Harpending and I came to these same conclusions several years ago, using a far smaller data set: the evidence indicated low gene flow that would allow local selection, and we found no evidence for - indeed, solid evidence against - the kind of bottleneck that would explain the observed spectrum of genetic disease among the Ashkenazim. Which leaves selection as the only explanation - but selection for what?




Friday, April 11, 2008

Notes on Sewall Wright: the Measurement of Kinship   posted by DavidB @ 4/11/2008 03:27:00 AM

Most people with an interest in genetics will be aware that Sewall Wright made major contributions to the theory of kinship or relatedness. Fewer people will have any direct knowledge of his work on the subject, and those who do consult his writings may find them difficult. The present note is intended to help those who want to tackle Wright at first hand. See also this evaluation by the geneticist W. G. Hill.


Most of Wright's key ideas on the subject were first presented in a 5-part paper on 'Systems of Mating' (SM) in 1921. All 5 parts can be found on the internet with a little searching. SM1, which is the most fundamental, is here, and SM5, which contains a relatively un-technical summary, is here.


Rather than go straight to Wright's own approach, I will begin by comparing and contrasting it with that of the French geneticist Gustave Malecot, based on the concept of Identity by Descent. Malecot first introduced his methods around 1940, and since then they have supplanted Wright's approach, to the extent that Wright's own methods have been almost forgotten. What is presented in textbooks as due to Wright is often in reality due to Malecot. The two approaches do have some similarities, and in simple cases they lead to the same quantitative results, but there are also some important differences.



Malecot and Identity by Descent

In Malecot's system two genes at the same locus, in the same or different individuals, are defined as Identical by Descent (IBD) if they are both descended from the very same individual ancestral gene, without either of them undergoing mutation in the interim. The relatedness between two individuals can be measured, roughly speaking, by calculating the probability that two genes at the same locus in the two individuals are IBD. To do this it is necessary first to identify all the distinct paths of descent connecting the two individuals through a common ancestor, and then to calculate the probability that the same gene will have descended to both individuals from that ancestor along any given path. Since all such paths of descent are mutually exclusive (though portions of them may overlap), the resulting probabilities can be added together to give the total probability that a given gene in the two individuals is IBD. To take a simple case, consider two individuals (full siblings) who have both parents in common. I assume that the parents are not related to each other or inbred. If we select a (diploid autosomal) gene at random from one sibling, there is a probability of one-half that it comes from the mother, and, if it does, a probability of one-half that the same gene has descended from the mother to the other sibling. This gives a compound probability of one-quarter that the second sibling has received a gene from the mother that is IBD to the selected gene in the first sibling. There is likewise a probability of one-quarter that the second sibling has received an IBD copy from the father. The total probability is therefore one-half, which is often called the Coefficient of Relationship or Relatedness between full siblings. If the parents are themselves related or inbred (i.e. descended from one of their own ancestors by more than one possible path), additional paths of descent need to be taken into account. 
Since there are two genes at the relevant locus in the second sibling, there is a probability of one-quarter (one-half times one-half) that a particular one of these genes, chosen at random, is IBD to the selected gene in the first sibling. This is usually known as their Coefficient of Kinship. If a male and female with a non-zero Coefficient of Kinship mate together, there is a non-zero probability that any offspring will inherit two genes that are IBD to each other. This is usually known as the offspring's Coefficient of Inbreeding, and a little consideration shows that it is equal to the Coefficient of Kinship of the parents.
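These quantities are easy to check by computer on a small pedigree. The sketch below uses the standard recursive method rather than Malecot's explicit enumeration of paths (for simple pedigrees like this one the two give the same numbers), with exact fractions. The pedigree is hypothetical: two unrelated, non-inbred grandparents, their son A and daughter B, each married to an unrelated spouse, the resulting first cousins C1 and C2, and a child O of a cousin marriage. It assumes each individual's parents are either both known or both unknown.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical pedigree: name -> (father, mother); founders have (None, None).
# Listed so that parents always come before their children.
PEDIGREE = {
    "GF": (None, None), "GM": (None, None),  # unrelated, non-inbred grandparents
    "A": ("GF", "GM"), "B": ("GF", "GM"),    # full siblings (A male, B female)
    "SA": (None, None), "SB": (None, None),  # unrelated spouses
    "C1": ("A", "SA"), "C2": ("SB", "B"),    # first cousins
    "O": ("C1", "C2"),                       # child of a first-cousin marriage
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}

@lru_cache(maxsize=None)
def kinship(x, y):
    """Probability that a gene drawn at random from x and one drawn at random
    from y, at the same locus, are identical by descent."""
    if x == y:
        father, mother = PEDIGREE[x]
        inbreeding = kinship(father, mother) if father is not None else Fraction(0)
        return Fraction(1, 2) * (1 + inbreeding)
    if ORDER[x] < ORDER[y]:
        x, y = y, x  # make x the later-listed one, so x is not an ancestor of y
    father, mother = PEDIGREE[x]
    if father is None:
        return Fraction(0)  # founders are assumed unrelated to everyone else
    return Fraction(1, 2) * (kinship(father, y) + kinship(mother, y))

print(kinship("A", "B"), 2 * kinship("A", "B"))  # kinship and relatedness of full sibs
print(kinship("C1", "C2"))                       # = inbreeding coefficient of O
```

The last line illustrates the point in the text: the inbreeding coefficient of O is just the kinship coefficient of its parents C1 and C2.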

A point left vague in some accounts is how far back the paths of ancestry can or should be traced. There would be little point in tracing them back so far that the gene would probably have mutated along the way to one or both descendants, but with a mutation rate of only about 1 in 100,000 per generation this is not a major constraint. In practice, ancestry is seldom traced back beyond five or six generations, as the probability of Identity by Descent along any given path going beyond this is very small (less than 1 in 1,000), and the aggregate probability along all such paths will usually be much the same for all individuals in the same population.
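Both numbers in that paragraph can be checked with a few lines of Python. In the usual path-counting formula, a path through a common ancestor who is n1 generations above one descendant and n2 above the other contributes (1/2)^(n1+n2+1) x (1 + F_A), where F_A is the ancestor's own inbreeding coefficient. The sketch assumes a non-inbred ancestor unless told otherwise and, as in the text, a mutation rate of 1 in 100,000 per gene per generation.

```python
from fractions import Fraction

def path_contribution(n1, n2, ancestor_inbreeding=Fraction(0)):
    """IBD probability contributed by one path through a common ancestor
    n1 generations above one descendant and n2 above the other."""
    return Fraction(1, 2) ** (n1 + n2 + 1) * (1 + ancestor_inbreeding)

def no_mutation_prob(n1, n2, rate=1e-5):
    """Chance that no mutation occurs in the n1 + n2 transmissions on the path."""
    return (1 - rate) ** (n1 + n2)

full_sibs = 2 * path_contribution(1, 1)  # two paths: via the father and via the mother
print(full_sibs)                         # the kinship coefficient of one-quarter
print(path_contribution(5, 5))           # five generations up on each side
print(no_mutation_prob(6, 6))            # mutation hardly matters at this depth
```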

Wright and the Correlation between Relatives

None of this is directly due to Sewall Wright. He does use path diagrams similar to those of Malecot (who was inspired by Wright's work), but the quantities measured along the paths are not probabilities of Identity by Descent but path coefficients. As discussed in my note on Wright's method of Path Analysis, the correlation between two variables can be derived from the path coefficients along the paths connecting them. The measures of relationship between two individuals in Wright's system are always in principle correlation coefficients. In simple cases (no inbreeding, no dominance, no assortative mating, and so on) they are quantitatively the same as Malecot's measures, but in principle they are quite different. Three important differences should be emphasised:

a) like all correlation coefficients, Wright's measures of relationship are valid only relative to a specified statistical population. The coefficient of relationship between two individuals may well vary according to the specified population; e.g. it may be different if the specified population is an ethnic group to which the individuals belong as compared with a population comprising several ethnic groups.

b) unlike probabilities, which can never be negative, a correlation coefficient can be either positive or negative. In fact, although Wright seldom discusses negative relationships, within any specified population they are in principle as common as positive relationships.

c) relative to any specified population, the correlation between two randomly selected individuals from that population is zero (apart from sampling error). This point has sometimes been overlooked, for example in discussions of Hamilton's Rule. The 'r' in Hamilton's Rule should be a regression coefficient rather than a correlation coefficient (as Hamilton realised around 1970 - see Narrow Roads of Gene Land, vol. 1, p.179), but the same principle applies: the regression of one randomly selected individual on another randomly selected individual, relative to the population from which they are randomly selected, is approximately zero. Hamilton's Rule therefore predicts that altruistic behaviour will not be directed randomly towards all members of the relevant population, though it may be difficult to decide which population is 'relevant' for the purpose.
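Points a) to c) can be illustrated with a small simulation in Python with NumPy. It assumes two hypothetical subpopulations of equal size with allele frequencies 0.2 and 0.8 at one locus, and scores each individual by allele count (0, 1 or 2). Pairs drawn at random within one subpopulation are uncorrelated relative to that subpopulation. The same kind of pairs, measured relative to the pooled population, are positively correlated (about 0.53 here, i.e. 2F/(1+F) with F = 0.09/0.25 = 0.36). Pairs drawn from different subpopulations are negatively correlated relative to the pool.

```python
import numpy as np

def pair_correlations(freq_1, freq_2, n_pairs=20_000, seed=1):
    """Allele counts (0/1/2) for random pairs of individuals in two equal-sized
    subpopulations. Returns (within, same_pooled, cross_pooled):
      within       - correlation within each subpopulation, relative to itself
      same_pooled  - same-subpopulation pairs, relative to the pooled population
      cross_pooled - pairs from different subpopulations, relative to the pool
    """
    rng = np.random.default_rng(seed)
    x1, y1 = rng.binomial(2, freq_1, n_pairs), rng.binomial(2, freq_1, n_pairs)
    x2, y2 = rng.binomial(2, freq_2, n_pairs), rng.binomial(2, freq_2, n_pairs)
    within = [np.corrcoef(x1, y1)[0, 1], np.corrcoef(x2, y2)[0, 1]]
    same_pooled = np.corrcoef(np.concatenate([x1, x2]), np.concatenate([y1, y2]))[0, 1]
    cross_pooled = np.corrcoef(np.concatenate([x1, x2]), np.concatenate([y2, y1]))[0, 1]
    return within, same_pooled, cross_pooled

within, same_pooled, cross_pooled = pair_correlations(0.2, 0.8)
print([round(r, 3) for r in within], round(same_pooled, 3), round(cross_pooled, 3))
```

The individuals and their genes are identical in every case; only the reference population changes, and with it the sign and size of the "relationship".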

I emphasise these points partly because Wright himself does not. They are implicit in the use of correlation coefficients, but Wright seldom explicitly mentions them. An exception is in SM5, where Wright points out that the correlation between relatives within an inbred line will be small although relative to the wider population it is large. Some more general statements are made in Wright's late work on Evolution and the Genetics of Populations (EGP). In volume 2 of that work (1969) he says that 'In a panmictic [randomly mating] population, there is no correlation between homologous genes of uniting gametes relative to the gene frequencies in the whole population. On splitting up into small lines which breed within themselves, a correlation between uniting gametes is to be expected.... The relativity referred to above has sometimes been overlooked or misinterpreted. A correlation coefficient is, of course, always relative. It is a property of the population as well as the two variables....' (pp.175-77.) Wright goes on to discuss Malecot's method of Identity by Descent. He accepts that it is a useful technique and often leads to the same results as his own, but argues that his own approach is more general and in particular that his own concept of relationship allows for negative values.

Wright is often vague about the population in which the correlations are to be measured, leaving this to be inferred from the context. Sometimes the relevant population is the entire generation to which the correlated individuals belong, sometimes it is a defined sub-population, but sometimes it seems to be a 'foundation stock' from which they are descended. This is problematic, as it seems to require a correlation between individuals relative to the means and standard deviations in a population to which they do not themselves belong. I will discuss this further in dealing with Wright's work on inbreeding and genetic diversity.

Correlations between notional values

Wright was not the first person to work on the correlation between relatives. Unknown to Wright, R. A. Fisher had already treated the subject at length, by different methods, in 1918. In fact, the subject goes back at least to 1904, when Karl Pearson considered the correlations to be expected on the hypothesis of Mendelian dominant inheritance. He found that (on certain simplified assumptions) the correlation between parent and offspring would be only one-third, rather than the correlation of about one-half usually found in empirical data on human traits. Pearson considered this a serious objection to the generality of the Mendelian theory. One of the aims of Fisher's 1918 paper was to show that, when complications such as assortative mating were taken into account, the data were consistent with widespread Mendelian dominance.

The idea of a correlation between relatives is intelligible enough when the correlation involves continuous phenotypic traits such as height, but it is more obscure when the traits are purely qualitative, or when the correlation is not between phenotypes but between gametes or genotypes. If there are varying types of gametes or genotypes (e.g. different alleles at a locus) in the population, they may be said to be positively associated if the same types tend to occur together, more often than would be expected by chance, in the same individual or in certain pairs of individuals. There are several useful measures of the 'association' of qualitative variables (see any edition of G. U. Yule's Introduction to the Theory of Statistics). However, Wright (like his predecessors) preferred to use the Pearson product-moment correlation coefficient. To obtain a Pearson correlation coefficient in the case of purely qualitative variables, such as differences between alleles, it is necessary to give the correlated items notional algebraic or numerical values. Since these are to some extent arbitrary, it might be feared that this would introduce an arbitrary element into the results, but in the cases of interest the arbitrary values cancel out and leave the correlation coefficient itself unaffected.

The procedure can be illustrated by the problem of dominance, which Wright treats in SM1, pages 117-18. If we assign the homozygotes AA and aa the arbitrary values 1 and 0 respectively, then with complete dominance of A the heterozygote Aa will have the value 1, while with no dominance it will have the value 1/2. Each individual in the population will therefore have a pair of numerical values, under the assumptions of dominance and non-dominance respectively. For homozygotes the two values will be the same, but for heterozygotes they will differ. If the frequencies of the various genotypes in the population are specified, the means and standard deviations of the numerical values can be calculated, and the covariance and the correlation coefficient between the pairs of values can then be derived in the usual way. The correlation coefficient is unaffected if one or both variables are systematically multiplied by a (positive) constant or have a constant added to them (see Notes on Correlation, Part 2). It follows that we would get the same correlation if we chose any other set of arbitrary values as alternatives to 0 and 1, provided the value of the heterozygote in the absence of dominance is half-way between those of the homozygotes. We can therefore obtain a quite general result for the correlation between the values of genotypes with and without dominance. (Of course, correlations could be calculated in a similar way on different assumptions about dominance, e.g. for partial dominance.) It can be shown by this method that Wright's results at the bottom of page 117 are correct, though I do not see how Wright derived his particular formulae, which are far from obvious. [As I have mentioned elsewhere, the equation p = √(uv) appears to be a printing error or slip of the pen, as under Hardy-Weinberg equilibrium it should be p = 2√(uv). In fact, I now find that this error was listed in the printed Corrigenda to the relevant volume of Genetics but has not been corrected in the pdf copy.]
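The calculation just described is easy to check numerically. The Python sketch below is a hypothetical illustration (the function names are mine, not Wright's): it codes each genotype with and without complete dominance of A, weights the classes by the genotype frequencies u, p, v of AA, Aa, aa, and computes the Pearson correlation. Changing the arbitrary homozygote values leaves the result unchanged, as argued above. Note that the simple form √(1/(1+p)), quoted later for random mating, holds exactly when the two homozygotes are equally frequent (u = v); with unequal frequencies the value differs, e.g. under random mating with a frequency of 0.1 for A it is √(1.8/1.9) ≈ 0.973 rather than √(1/1.18) ≈ 0.921.

```python
import math

def weighted_correlation(freqs, xs, ys):
    """Pearson correlation of two codings of the same genotype classes,
    weighting each class by its population frequency."""
    total = sum(freqs)
    w = [f / total for f in freqs]
    mx = sum(wi * x for wi, x in zip(w, xs))
    my = sum(wi * y for wi, y in zip(w, ys))
    cov = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    vx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    vy = sum(wi * (y - my) ** 2 for wi, y in zip(w, ys))
    return cov / math.sqrt(vx * vy)

def dominance_correlation(u, p, v, low=0.0, high=1.0):
    """Correlation between genotypic values without and with complete dominance of A.

    u, p, v: frequencies of AA, Aa, aa (need not be Hardy-Weinberg proportions).
    low, high: arbitrary values given to aa and AA (high > low).
    """
    freqs = (u, p, v)
    no_dominance = (high, (high + low) / 2, low)  # Aa half-way between homozygotes
    full_dominance = (high, high, low)            # Aa indistinguishable from AA
    return weighted_correlation(freqs, no_dominance, full_dominance)

# Random mating with equal allele frequencies: u = v = 1/4, p = 1/2
print(round(dominance_correlation(0.25, 0.5, 0.25), 4))                   # 0.8165
print(round(math.sqrt(1 / (1 + 0.5)), 4))                                 # 0.8165
# Other arbitrary homozygote values give the same correlation
print(round(dominance_correlation(0.25, 0.5, 0.25, low=10, high=30), 4))  # 0.8165
```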

Systems of Mating I
I will conclude this note with some comments on Wright's most important paper on the subject: the first in the series on Systems of Mating (SM1).

Here Wright uses his method of path analysis to derive the correlation between relatives. In principle the ultimate result is a correlation between phenotypes, which should take account of all environmental and genetic influences, including dominance, epistasis, assortative mating, and shared environment (if any).

While the method of path analysis has some advantages for this purpose, which Wright emphasised, it also has some disadvantages. The variability among individuals is partly due to the chance effects of genetic recombination and segregation. It is therefore necessary for the path diagrams to contain an independent variable designated as 'chance' (see the diagram in SM1, p.116), which may be formally justified but still looks odd. More importantly, the method of path analysis assumes that the effects of causal influences can be simply added together. In genetics this is not always the case, as the effects of epistasis and dominance are not purely additive. Wright therefore excludes epistasis from his model 'for the present' (p.117). He does attempt to incorporate an adjustment for the effects of dominance, but this is not entirely successful. For the time being I will assume that the method is confined to the additive effects of genes.

It is not always clear what the relevant population is for the purposes of the correlations, especially as the correlations often involve more than one generation of individuals. Wright seems to assume (see the beginning of SM4) that in the absence of selection the proportions of different alleles in the total population will be constant, but in a finite population this cannot be strictly true, as there will be fluctuation due to genetic drift. Perhaps Wright is assuming for this purpose that the population can be regarded as indefinitely large, in which case it is legitimate to assume that gene frequencies in the absence of selection are constant. More seriously, it is not clear whether the intended reference population is the current population of each generation, the 'foundation stock' from which they are descended, or some combination of the two. Wright's reference to 'random mating' at the top of page 119 of SM1 would not make much sense if the intended reference population were the current one (of the parents), since f' would then always be zero.

Each path of descent is built up from the links between parent and offspring, so this relationship is especially important. In Wright's analysis (pages 118-20) the direct relationship between parent and offspring can be analysed as a path with the following steps: parent's phenotype - parent's genotype - gamete (egg or sperm) - offspring's genotype - offspring's phenotype. (If the offspring's two parents have a non-zero correlation, an indirect path via the other parent also needs to be considered.) The path coefficients along the direct path from parent to offspring can be represented in the form hbah, where each h represents the correlation between phenotype and genotype, for the parent and the offspring respectively (the two values may differ). This correlation can be considered a measure of heritability, and its square, h², measures the proportion of phenotypic variance accounted for by genetic variance. This is historically the origin of the familiar use of h² to represent heritability. It should however be noted that Wright's usage is not quite the same as the modern one. In modern usage h² usually stands for narrow or additive heritability, measured by the extent to which offspring predictably resemble their parents. Wright's h² is closer to the modern concept of broad heritability, as it measures the extent to which the phenotype of an individual is determined by its genotype. The key equation (p.116) is h² + d² + e² = 1, where h stands for all aspects of genetic heredity, and e and d stand respectively for predictable effects of the environment and random fluctuations in development.

The coefficients a and b are the path coefficients representing, respectively, the contribution of the gamete (egg or sperm) to the variance in the genotype of the offspring, and the contribution of the parental genotype to the variance in the gametes. As none of these entities have a measurable phenotypic value, it is necessary to assume that they have arbitrary algebraic or numerical values, in the way discussed above. Wright's derivation of the values of a and b (SM1, pp.118-19) is particularly important, and needs to be carefully studied. Unfortunately it is not easy to follow. I would offer two tips. First, it is essential to refer frequently to the path diagram on page 116, without which the derivation would be unintelligible. Second, Wright does not explain why pG.H'' = rG.H'', which is crucial to the validity of the proof. I think it follows from the fact that the only causal path from the parental genotype to the gamete is the direct path pG.H''. [Added: having written this, I am pleased to find that Wright gives this explanation in another article.]

It should be noted that if the parents are unrelated and not inbred, a and b are both equal to √(1/2), so the product ab along the path from parent to offspring in this case equals one-half, as in Malecot's method.
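The values a = b = √(1/2) can be checked by a quick simulation of Mendelian segregation at a single locus. The sketch below is a hypothetical illustration, not Wright's derivation: it assumes Hardy-Weinberg proportions, a mate unrelated to the parent, and takes the number of A alleles as the genotypic value (1 or 0 for a gamete). The estimates should come out close to 0.707 for a and b, and close to 0.5 for the parent-offspring correlation.

```python
import math
import random

def pearson(xs, ys):
    """Ordinary Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate_transmission(n=100_000, q=0.5, seed=1):
    """Estimate b (parent genotype -> gamete), a (gamete -> offspring genotype)
    and the parent-offspring correlation for one additive locus."""
    rng = random.Random(seed)
    parent, gamete, offspring = [], [], []
    for _ in range(n):
        # parent's two alleles drawn independently (1 = A): Hardy-Weinberg proportions
        alleles = (int(rng.random() < q), int(rng.random() < q))
        transmitted = rng.choice(alleles)   # Mendelian segregation
        from_mate = int(rng.random() < q)   # gamete from an unrelated mate
        parent.append(sum(alleles))
        gamete.append(transmitted)
        offspring.append(transmitted + from_mate)
    b = pearson(parent, gamete)
    a = pearson(gamete, offspring)
    return a, b, pearson(parent, offspring)

a, b, r = simulate_transmission()
print(f"a = {a:.3f}, b = {b:.3f}, ab = {a * b:.3f}, r = {r:.3f}")
```

Changing q should not shift the estimates beyond sampling error, since the result does not depend on gene frequency.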

It may perhaps be felt that Wright's derivation of the path coefficient b is a trick with smoke and mirrors. It is mathematically valid, but Wright's claim that 'in a sense, it is legitimate to reverse the arrows....' invites the response that in another sense it is not legitimate, since there is no causal influence from the gametes back to the gametocyte. This part of the proof therefore goes against the spirit if not the mathematical letter of path analysis.

At the top of page 120 Wright explains, very tersely, how correlations between relatives can be derived from the path coefficients. Again, it should be noted that in simple cases, and with perfect additive heritability, the results are the same as Malecot's. Wright then attempts to take account of dominance. As noted above, on pages 117-18 of SM1 Wright gives formulae for the correlation between genotypic values with and without dominance. In the standard case of random mating the correlation comes out at √(1/(1+p)), where p is the proportion of heterozygotes in the population. To adjust the correlations between relatives to allow for dominance, Wright multiplies them by 1/(1+p). He does not explain the logic behind this, but I think it is that each of the two correlated relatives has a genotypic value without dominance, which is the basis for the original correlation, and that the correlation of √(1/(1+p)) between each relative's values with and without dominance enters as an extra factor at each end of the chain. The effect is to reduce the correlation between the individuals by the factor 1/(1+p). It may perhaps be wondered why only the two individuals at each end of the chain, and not the intermediate individuals, have their values adjusted. I think the explanation is that dominance is essentially an effect on phenotypes rather than genotypes, and in calculating the correlation between the individuals at the ends of the chain we need not take account of dominance effects on intermediate phenotypes any more than we need take account of environmental effects on them, since these do not affect the path coefficients along the chain.
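Under the simplest assumptions (common ancestors neither inbred nor related to each other, so each parent-offspring link contributes ab = 1/2) the correlation between relatives reduces to a sum of (1/2)^n over the paths through common ancestors, which is where the agreement with Malecot comes from. The Python sketch below is a hypothetical illustration of that arithmetic using exact fractions. It also applies Wright's original factor 1/(1+p) for dominance, which, as the next paragraph explains, he later withdrew, so that step should not be taken as a valid treatment of dominance.

```python
from fractions import Fraction

# Number of parent-offspring links on each path joining the two relatives
# through a common ancestor (one entry per path).
PATHS = {
    "parent-offspring": [1],
    "full sibs": [2, 2],       # via the father and via the mother
    "half sibs": [2],
    "first cousins": [4, 4],   # via the two shared grandparents
}

def genotypic_correlation(path_lengths):
    """Sum of (ab)^n = (1/2)^n over paths; assumes non-inbred, unrelated ancestors."""
    return sum(Fraction(1, 2) ** n for n in path_lengths)

def phenotypic_correlation(path_lengths, h2=Fraction(1), p_het=None):
    """Multiply by h^2 (one h at each end of the chain); optionally apply
    Wright's original, later withdrawn, dominance factor 1 / (1 + p)."""
    r = genotypic_correlation(path_lengths) * h2
    if p_het is not None:
        r = r / (1 + p_het)
    return r

for name, paths in PATHS.items():
    print(name, genotypic_correlation(paths))
# parent-offspring 1/2, full sibs 1/2, half sibs 1/4, first cousins 1/8
print(phenotypic_correlation(PATHS["full sibs"], p_het=Fraction(1, 2)))  # 1/3
```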

Unfortunately Wright discovered, after reading Fisher's 1918 paper, that except in the case of half-siblings his own treatment of dominance effects was invalid, and in a footnote to his famous 1931 paper on 'Evolution in Mendelian Populations' he withdrew it. His original method therefore never satisfactorily covered epistasis and dominance. He later attempted to incorporate a revised treatment of dominance in his method of path analysis, but the result was very complicated. [See EGP vol. 2, pp. 435-6.] In this area Fisher's Analysis of Variance has been more generally used. The method of path diagrams remains very useful for the analysis of relationships, but the paths are now usually interpreted in Malecot's fashion as probabilities of Identity by Descent, and not as correlations.

The Problem of Negative and Zero Correlations
I emphasised earlier that in Wright's system the correlations between relatives, and therefore the measures of relatedness, can be zero or even negative. Yet Wright's actual procedures for measuring relatedness, by tracing path coefficients back through common ancestors, seem able to produce only positive figures. For example, suppose that on average two randomly chosen members of a population have a degree of relatedness, measured by Identity by Descent within, say, the last thousand years, equivalent to that of full first cousins, i.e. a Malecot Coefficient of Relationship of one-eighth. On the face of it, if we trace back the paths of descent using Wright's methods and work out the path coefficients, assuming complete additive heritability, the result will be a correlation of one-eighth, numerically equal to the Malecot coefficient. But the correlation coefficient between randomly selected members of a population, relative to that population as a whole, must be approximately zero. We therefore seem to have a contradiction.

It took me a while to see how this paradox could be resolved. I think the main explanation [see Note] is that in the usual applications of Wright's methods there is a tacit assumption that only the paths leading through common ancestors need be taken into account. All other paths can be regarded merely as background noise. For example, if we trace the paths between two full first cousins, we need only take into account the paths leading through the two grandparents they have in common, and not the other four grandparents, unless some of these lead back to other common ancestors in the fairly recent past. Ordinarily this is a reasonable approach, but it breaks down if it is applied to the kind of case described in the last paragraph. If we trace back the entire ancestry of two randomly chosen individuals for some large number of generations, the ancestors will have a mixture of positive and negative correlations between them, and these will (approximately) cancel out. In a complete path analysis all these correlations would need to be taken into account, even if they do not involve a direct path through a common ancestor. When properly interpreted, Wright's methods therefore do not lead to a contradiction.
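A simple identity shows why the correlations must cancel once they are measured relative to the population itself. If N values are expressed as deviations from their own mean, the deviations sum to zero, so the sum of products over all ordered pairs of distinct individuals equals minus the sum of squared deviations. The Pearson correlation over all such pairs is therefore exactly -1/(N-1), close to zero for a large population, whatever the genealogy. The Python sketch below checks this numerically on hypothetical random genotypic values; it does not model ancestry at all, only the constraint that forces positive and negative correlations to balance.

```python
import math
import random

def pearson(xs, ys):
    """Ordinary Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def all_pairs_correlation(values):
    """Correlation between first and second member over every ordered
    pair (i, j) of distinct individuals in the population."""
    firsts, seconds = [], []
    for i, x in enumerate(values):
        for j, y in enumerate(values):
            if i != j:
                firsts.append(x)
                seconds.append(y)
    return pearson(firsts, seconds)

rng = random.Random(42)
# Hypothetical additive genotypic values (0, 1 or 2 copies of an allele) for 200 individuals
population = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(200)]
r = all_pairs_correlation(population)
print(round(r, 6), round(-1 / (len(population) - 1), 6))  # -0.005025 -0.005025
```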

I had originally planned to go on to consider the extension of Wright's measures of kinship to the relations between populations, such as his well-known FST statistic. But the post is already long, so I will reserve the subject for another time.


Note: I say 'the main explanation' because the effect of common ancestry itself may also be reduced when we take account of negative correlations. For example, in the case of cousins with two common grandparents, these two grandparents may be negatively correlated, in which case the indirect path running through both of them would have a negative value. Or a common ancestor might have a negative coefficient of inbreeding (i.e. be less inbred than average for the population), which would reduce the path coefficient from parental genotype to gamete. But as far as I can see, these factors would never be sufficient to offset the positive correlations due to common ancestry entirely. It is therefore also necessary to take account of negative correlations between non-common ancestors.



Monday, April 07, 2008

Age & association studies?   posted by Razib @ 4/07/2008 10:10:00 AM

When it comes to association studies, population substructure is something you have to keep in mind, but what about age? On the Replication of Genetic Associations: Timing Can Be Everything!:
The failure of researchers to replicate genetic-association findings is most commonly attributed to insufficient statistical power, population stratification, or various forms of between-study heterogeneity or environmental influences...Here, we illustrate another potential cause for nonreplications that has so far not received much attention in the literature. We illustrate that the strength of a genetic effect can vary by age, causing "age-varying associations." If not taken into account during the design and the analysis of a study, age-varying genetic associations can cause nonreplication. By using the 100K SNP scan of the Framingham Heart Study, we identified an age-varying association between a SNP in ROBO1 and obesity and hypothesized an age-gene interaction. This finding was followed up in eight independent samples comprising 13,584 individuals. The association was replicated in five of the eight studies, showing an age-dependent relationship...Furthermore, this study illustrates that it is difficult for cross-sectional study designs to detect age-varying associations. If the specifics of age- or time-varying genetic effects are not considered in the selection of both the follow-up samples and in the statistical analysis, important genetic associations may be missed.


More digestible summary at ScienceDaily.



Sunday, April 06, 2008

Swarming loci; the genetics of height   posted by Razib @ 4/06/2008 03:08:00 PM

Identification of ten loci associated with height highlights new biological pathways in human growth & Genome-wide association analysis identifies 20 loci that influence adult height. ScienceDaily has a long review.

Update: I was pretty sure Genetic Future would hit this, so I didn't say much. Well, here's what he notes:
ScienceDaily puts a positive spin on the story ("Scientists are beginning to develop a clearer picture of what makes some people stand head and shoulders above the rest"), but the real story is this: despite the massive scale of these studies, they're still only capturing less than 5% of the total variance in a trait that is almost entirely (90%) genetic. This is a powerful demonstration of the inability of current GWAS technology to access the genetic variants responsible for the vast majority of heritable variation in at least some complex traits, for reasons I have previously discussed in detail.


The bolded parts are exactly right in my estimation; of course, nearly a century of biometric analysis of human height should lead us to expect this. In contrast, 50 years ago there was pedigree-based work implying that the genetics of skin color would resolve into about half a dozen loci of large effect explaining most of the variance. That's what we see.

But I think height is important & interesting. Our species has shrunk since the last Ice Age (even modern nutrition hasn't brought us all the way back). Why? Cross-cultural evidence seems to suggest that tall men are more reproductively fit, but the fact that there is a normal range of variation within populations tells us that strong directional selection hasn't been effective over the long term. Otherwise, variation would quickly be exhausted. But it seems likely that some of the between-population differences are due to genetic differences.

Related: Why you be short or tall (well, a little bit) & Why Asians are so short (perhaps).



Thursday, April 03, 2008

pre-Clovis ancient DNA?   posted by Razib @ 4/03/2008 04:24:00 PM

DNA from Pre-Clovis Human Coprolites in Oregon, North America:
The timing of the first human migration into the Americas and its relation to the appearance of the Clovis technological complex in North America ca. 11-10.8 thousand radiocarbon years before present (14C ka B.P.) remains contentious. We establish that humans were present at Paisley 5 Mile Point Caves, south-central Oregon, by 12,300 14C yr. B.P., through recovery of human mtDNA from coprolites, directly dated by accelerator mass spectrometry. The mtDNA corresponds to Native American founding haplogroups A2 and B2. The dates of the coprolites are >1000 14C years earlier than currently accepted dates for the Clovis-complex.


ScienceDaily has a really long review of the results & their implications. Also check out A Three-Stage Colonization Model for the Peopling of the Americas.



Tuesday, March 04, 2008

Cooperation and heritability   posted by Razib @ 3/04/2008 10:47:00 PM

Apropos of pathogens and collectivism, Heritability of cooperative behavior in the trust game (Open Access):
Although laboratory experiments document cooperative behavior in humans, little is known about the extent to which individual differences in cooperativeness result from genetic and environmental variation. In this article, we report the results of two independently conceived and executed studies of monozygotic and dizygotic twins, one in Sweden and one in the United States. The results from these studies suggest that humans are endowed with genetic variation that influences the decision to invest, and to reciprocate investment, in the classic trust game. Based on these findings, we urge social scientists to take seriously the idea that differences in peer and parental socialization are not the only forces that influence variation in cooperative behavior.


I'm not holding my breath....



Sunday, March 02, 2008

Computational Biology and Evolution blog   posted by Razib @ 3/02/2008 11:22:00 PM

Evolgen points me to a new blog, Computational Biology and Evolution. Only quibble: http://bioinf.cs.auckland.ac.nz/ isn't a memorable domain....



Saturday, March 01, 2008

EDAR and hair thickness   posted by p-ter @ 3/01/2008 08:47:00 AM