After the evolutionary revolution


Image credit: Luna04

My post The paradigm is dead, long live the paradigm! expressed to some extent my befuddlement at the current state of human evolutionary genetics and paleoanthropology. After the review of the paper on possible elevated admixture with Neandertals at the dystrophin locus a friend emailed, “Remember when we thought everything would be so simple once we could finally see this stuff?” Indeed I do remember. The fact that things aren’t simple is very exhilarating, but it is also a major drag on theoretical clarity. Science is after all not a collection of facts, but it is in part facts which one can sieve through an analytic framework.

In hindsight, with the relative robustness of ancient DNA results, we can make some assessments about the role of human bias within particular heuristic frameworks over the past generation. From the mid-1980s up until 2000 it was victory after victory for the Out-of-Africa with total replacement model. The rise of mtDNA and Y chromosomal lineage studies seemed to buttress the idea of common descent from neo-Africans within the last 100,000-200,000 years for all human populations. There wasn’t much of a perturbation from this march toward paradigm ascendancy in the aughts, except that there was now also a trickle of papers which claimed to detect phylogenetic “long branches” in the human genome. The 2006 Evans et al. paper, Evidence that the adaptive allele of the brain size gene microcephalin introgressed into Homo sapiens from an archaic Homo lineage, was probably the one that made the biggest media splash. But these were inferences. Subsequent analysis of the draft Neandertal genome seems to suggest that in fact the microcephalin allele in question did not introgress.

Case closed? Obviously not. Now we’re in a different era. The Evans et al. paper may have been wrong in the specifics, but its general framework seems likely to have been validated: there are genetic lineages in the modern human genome which are not derived from the neo-Africans. But let us remember that the overwhelming majority of the human genome is neo-African. A reasonable interval for non-Africans is 90-99% neo-African. Still, a non-trivial minority has introgressed or admixed from other lineages. Out-of-Africa is mostly correct, but in some ways so is Multiregionalism. But how do we describe this? “Weighted multiregionalism”? “Mostly Out-of-Africa”? The old terms were nice because they were punchy and precise. If you look at Multiregionalism or Out-of-Africa in Wikipedia the newest results are noted, but it doesn’t seem that they’ve been integrated into the analytic narrative. Yet.

The American historical "dark matter"

1936 presidential election, blue = F.D.R.

Walter Russell Mead has a fascinating blog post up, The Birth of the Blues. In it, he traces the roots of modern American “Blue-state” liberalism back to the Puritans, the Yankees of New England. This is a plausible argument. I believe that many social-political coalitions and configurations in contemporary America do have deep historical roots. But assertions and models must be tested. It is for example absolutely correct that early New England was the redoubt of American statism. First the Federalists, and then later to a lesser extent the Whigs, took refuge in New England during the long phase of anti-government Democratic ascendancy which led up to the presidency of Abraham Lincoln. But New England statism has its limits; the map above shows that it is in Greater New England that resistance to FDR seems to have been deepest. I don’t necessarily chalk this up to “flinty Yankee” anti-government sentiment. Rather, I think we need to consider that the ideological content of social-political coalitions and configurations sometimes matters less than long persistent affinities across cultural networks and domains.

Very few Americans for example are aware today that in 1800 New England was the region with the strongest adherence in the United States to orthodox Protestant Christianity. In contrast, Deism was firmly rooted among the Southern planter aristocracy. As late as 1850, even after the Second Great Awakening transformed the religious landscape of the South, the conservative Carolina aristocrat John C. Calhoun remained a Unitarian. And it was in the South that support for Revolutionary France ran strongest, while New England favored the United Kingdom and its allies. I suspect most modern Americans would be taken aback by such affinities simply based on the substance of what New England and the American South represent in terms of ideology at any given moment.

Until a few years ago I was very ignorant of American history. And therefore I was totally innocent of many important patterns which span the generations in our nation. Scholars such as Walter Russell Mead would have impressed me with their erudition, but I didn’t have the base of data to evaluate the plausibility of their claims. In everyday discourse we often bandy about history learned when we were teenagers as if it can serve as a robust frame for the sorts of inferences we make. Alas, it cannot. There is no substitute for genuine knowledge. Albion’s Seed is a good start, but many accessible books which cover the first period of American sectionalism are filled with much relevant insight.

Notes on the future

If you’re a regular reader, you may have noticed some changes. Since I moved to Discover blogs I’ve been posting less and less here. Additionally, I’ve been putting some of my shorter, less science-oriented stuff at Brown Pundits and Secular Right. And I suspect Twitter has cannibalized some of the link aggregation function of blogging in general.

So where does this leave this website? The archives are obviously active and useful for many people. Even without any front page content this blog serves 1,000-2,000 pages per day just as a function of search engines sending traffic to old posts. That’s important. GNXP could turn into an archive site, as I always imagined it would at some point, and still play a vital role in the information ecology.

But I’m not ready to turn this into a hibernating site yet. Kevin and David are still posting, obviously. And, because of the traffic and the old links that come to this domain, GNXP has good PageRank. My main interest then is to promote science bloggers whose content should “get out there.” So I’ve been soliciting contributions from people now and then with the promise that cross-posting will boost the PageRank of their site and give them some publicity. If you have a weblog with content that I think would fit the front page of this weblog, and are interested in cross-posting, feel free to email me at contactgnxp -at- gmail.com with a link. I’ll add it to my RSS and see if it’s a good fit. If you seem a good candidate for front page privs, I’ll shoot you an email with the details about your login, etc.

Additionally, I’ve modified the column format some. At the top of the sidebar there is now a set of articles which come from an aggregation site where I curate various weblog RSS feeds (as well as some Google searches). And there’s always my pinboard and Jason’s delicious. There’s also a footer column now where you can find archives, books, etc.

I’ll probably be tweaking the format and whatnot every now and then. All things must change.

Speaking of using PageRank, the Harappa Ancestry Project now has its own domain, http://www.harappadna.org. If you’re South Asian, Iranian, Tibetan, or Burmese, please check it out.

Visualizing variation, input → output

I have noted a few times that one thing you have to be careful about in two dimensional plots which show genetic variance is that the dimensions onto which the data are projected are often generated from the data itself. So adding more data can change the spatial relationships of previous data points. In 23andMe’s global similarity advanced plot, by contrast, you are projected onto dimensions generated from the HGDP data set alone. There are some practical reasons for this. First, it’s computationally intensive to recalculate components of variance every time someone is added to the data set. Second, it isn’t as if the ethnic identity of any given individual is validated. What would you do if an alien sent in a kit and spuriously put “French” as their ancestry?
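To make the distinction concrete, here is a minimal sketch in Python with NumPy. It is not 23andMe’s actual procedure, which I have not seen; the data are random numbers standing in for genotypes, and the names (fit_pca, project, reference, newcomers) are hypothetical. The idea is only that axes fit on a fixed reference panel stay put when new people are projected onto them, while refitting on the enlarged data set moves everyone.

```python
import numpy as np

def fit_pca(panel, n_components=2):
    """Compute the mean and top principal axes from a panel (rows = individuals)."""
    mean = panel.mean(axis=0)
    # SVD of the centered matrix; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(panel - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, components):
    """Place samples on axes that were fit elsewhere, without refitting."""
    return (samples - mean) @ components.T

rng = np.random.default_rng(0)
# Toy stand-ins (assumption): 200 reference individuals and 20 newcomers, 50 markers each.
reference = rng.normal(size=(200, 50))
newcomers = rng.normal(loc=1.0, size=(20, 50))

# Fixed axes: fit once on the reference panel, then project everyone onto them.
mean, components = fit_pca(reference)
ref_coords = project(reference, mean, components)
new_coords = project(newcomers, mean, components)

# Refit axes: include the newcomers in the fit, and the old points move.
mean_refit, components_refit = fit_pca(np.vstack([reference, newcomers]))
ref_coords_refit = project(reference, mean_refit, components_refit)

print("reference points unchanged under fixed axes:",
      np.allclose(ref_coords, project(reference, mean, components)))
print("reference points unchanged after refit:",
      np.allclose(ref_coords, ref_coords_refit))
```

With fixed axes the newcomers cost one matrix multiplication each and cannot distort the map, which is the trade-off described above: cheap and robust to mislabeled kits, but blind to any structure the reference panel lacks.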

So, in reply to this comment: “Let me rephrase: is there any difference when you switch to the world-wide plot? I imagine not, or you would’ve mentioned it.” Actually, there is a slight difference. Below on the right you have a “world view,” with my position being marked with green, and on the left a “zoom in” for Central/South Asia in the HGDP data set.


Neandertal admixture, revisiting results after shaken priors

After 2010’s world-shaking revolutions in our understanding of modern human origins, the admixture of Eurasian hominins with neo-Africans, I assumed there was going to be a revisionist look at results which seemed to point to mixing between different human lineages over the past decade. Dienekes links to a case in point, a new paper in Molecular Biology and Evolution, An X-linked haplotype of Neandertal origin is present among all non-African populations. The authors revisit a genetic locus where there have been earlier suggestions of hominin admixture dating back 15 years. In particular, they focus on an intronic segment spanning exon 44 of the dystrophin gene, termed dys44. Of the haplotypes in this segment they suggest that one, B006, introgressed from a different genetic background than that of neo-Africans. The map of B006 shows the distribution of the putative “archaic” haplotype from a previous paper cited in the current one from 2003. As you can see there’s a pattern of non-African preponderance of this haplotype. So what’s dystrophin’s deal? From Wikipedia:

Dystrophin is a rod-shaped cytoplasmic protein, and a vital part of a protein complex that connects the cytoskeleton of a muscle fiber to the surrounding extracellular matrix through the cell membrane. This complex is variously known as the costamere or the dystrophin-associated protein complex. Many muscle proteins, such as α-dystrobrevin, syncoilin, synemin, sarcoglycan, dystroglycan, and sarcospan, colocalize with dystrophin at the costamere.

Dystrophin is the longest gene known on DNA level, covering 2.4 megabases (0.08% of the human genome) at locus Xp21. However, it does not encode the longest protein known in humans. The primary transcript measures about 2,400 kilobases and takes 16 hours to transcribe; the mature mRNA measures 14.0 kilobases….

Dystrophin deficiency has been definitively established as one of the root causes of the general class of myopathies collectively referred to as muscular dystrophy. The large cytosolic protein was first identified in 1987 by Louis M. Kunkel…after the 1986 discovery of the mutated gene that causes Duchenne muscular dystrophy (DMD) ….

OK, so we’ve established that this is not an obscure gene. Here’s the abstract of the new paper:


23andMe v3 chip & me

Yesterday the first batch of results from 23andMe’s v3 chip came online. Instead of 550,000 SNPs you get ~1 million. The difference is pretty clear when you look at the raw SNPs. Under Account → Browse Raw Data, I can enter LCT, and this is what I see:

I’m line #2. A sibling is line #1. Looking at this sort of stuff makes it really likely I’ll upgrade. My main rationale for not upgrading is that there are diminishing marginal returns for ancestry related stuff. Speaking of ancestry, let’s compare my sibling’s ancestry painting to my own.


Around the Web – January 24th, 2011

Participants So Far. Zack reports 10 people of South Asian ancestry have sent their raw data. His coverage seems OK, but he only has multiple samples from Punjabis. I know some people who will be sending their data in soon, and I’m going to swap my parents in for me, so Bengalis will go from N = 1 to N = 2, but please spread the word. Better coverage in eastern and southern South Asia is really needed.

Why Rich Parents Don’t Matter. Jonah Lehrer references my post When genes matter for intelligence. This is a possibility which I think needs to be more widely spread by the mainstream media: “Eliminating such inequalities in the early years of life would simply create a new kind of inequality, driven by genetics.” When people fret about the relative lack of class mobility into Ivy League universities compared to the 1960s, they might consider if the mobility of that era was simply a function of the relatively recent removal of previous discriminatory barriers. Once those barriers are gone for a few generations there’s no reason to expect that the “peak churn” would match the transition phase.

US equivalents. Comparing the aggregate GDP of American states to nations around the world.

Human Prehistory and Genetics Wiki. I don’t “do” the wiki thing myself, but in case you’re interested.


Harappa Ancestry Project, update

Last week I announced the Harappa Ancestry Project. It now has its own dedicated website, http://www.harappadna.org. Additionally, it has its own Facebook page. For Zack to get his own URL he needs about 10 more “likes,” so please like it (if you are so disposed)! Finally, from what I’ve heard the first wave of the 23andMe holiday sale results are coming online this week. Actually, one of the relatives I purchased a kit for is in processing currently, so I know that we should have a bunch of new people in the system very, very soon.

Speaking of people, last I heard Zack had gotten about a dozen responses. That’s enough to start an initial round of runs, but obviously he needs more people. More importantly, the goal here is to get better population coverage. One of the things we know intuitively and also from the most current research is the existence of a lot of within-region population variation in South Asia which is structured by community. In other words, a sample of 30 people, where you have 3 from each of 10 different communities exhibiting geographical and caste diversity, is going to be far more useful right now than 300 Jatts from Indian Haryana. Getting 300 Jatts from Haryana would be interesting in that it would give you a window into intra-communal variance, but there are diminishing returns on the inferences you could make about South Asians as a whole.

If you know someone who has done the 23andMe testing and has preponderant ancestry from South Asia, Iran, Burma, or Tibet, please forward the URL for the Harappa Ancestry Project. If you are a 23andMe member, and involved in the forums, it might be useful to post a comment thread on this project, as the people you share genes with would see it.

The genomic heritage of French Canadians


Image Credit: Anirudh Koul

One of the great things about the mass personal genomic revolution is that it allows people to have direct access to their own information. This is important for the more than 90% of the human population which has sketchy genealogical records. But even with genealogical records there are often omissions and biases in transmission of information. This is one reason that HAP, Dodecad, and Eurogenes BGA are so interesting: they combine what people already know with scientific genealogy. This intersection can often be very inferentially fruitful.

But what if you had a whole population with rich, robust conventional genealogical records? Combined with the power of the new genomics you could really crank up the level of insight. Where to find these records? A reason that Jewish genetics is so useful and interesting is that there is often a relative dearth of records when it comes to the lineages of American Ashkenazi Jews. Many American Jews even today are often sketchy about the region of the “Old Country” from which their forebears arrived. Jews have been interesting from a genetic perspective because of the relative excess of ethnically distinctive Mendelian disorders within their population. There happens to be another group in North America with the same characteristic: the French Canadians. And importantly, in the French Canadian population you do have copious genealogical records. The origins of this group lie in the 17th and 18th centuries, and the Roman Catholic Church has often been a punctilious institution when it comes to preserving records of events under its purview such as baptisms and marriages. The genealogical archives are so robust that last fall a research group input centuries of ancestry for ~2,000 French Canadians, and used it to infer patterns of genetic relationships as a function of geography, as well as long term contribution by provenance. Admixed ancestry and stratification of Quebec regional populations:

Population stratification results from unequal, nonrandom genetic contribution of ancestors and should be reflected in the underlying genealogies. In Quebec, the distribution of Mendelian diseases points to local founder effects suggesting stratification of the contemporary French Canadian gene pool. Here we characterize the population structure through the analysis of the genetic contribution of 7,798 immigrant founders identified in the genealogies of 2,221 subjects partitioned in eight regions. In all but one region, about 90% of gene pools were contributed by early French founders. In the eastern region where this contribution was 76%, we observed higher contributions of Acadians, British and American Loyalists. To detect population stratification from genealogical data, we propose an approach based on principal component analysis (PCA) of immigrant founders’ genetic contributions. This analysis was compared with a multidimensional scaling of pairwise kinship coefficients. Both methods showed evidence of a distinct identity of the northeastern and eastern regions and stratification of the regional populations correlated with geographical location along the St-Lawrence River. In addition, we observed a West-East decreasing gradient of diversity. Analysis of PC-correlated founders illustrates the differential impact of early versus latter founders consistent with specific regional genetic patterns. These results highlight the importance of considering the geographic origin of samples in the design of genetic epidemiology studies conducted in Quebec. Moreover, our results demonstrate that the study of deep ascending genealogies can accurately reveal population structure.
