Personal genomics around the web

Posted on February 10, 2011 by Razib Khan

Just some pointers. Dr. Daniel MacArthur has put up a guest post where I outline my own experience with personal genomics. Cool times that we live in. Also, Zack Ajmal has started posting higher K’s of HAP participants. He’s now in the second batch. My parents will be in the third. Lots of Tamils and Punjabis. The Khans are the only Bengalis so far. One individual to represent all of Uttar Pradesh. Here’s a list of participants so far.

Finally, I know 3-D visualization is bad form, but I went for it anyway. Below is a cube which shows the positions of Gujaratis, Chinese, Mexican Americans, and Utah whites and Tuscans from the HapMap, along with a few extra samples from friends and family. Can you tell where my parents are?

Tiger mom for some, not for others

Posted on February 10, 2011 by Razib Khan

In a rumination on the “Tiger mom” phenomenon, Andrew Gelman suggests:

…Back when I taught at Berkeley and it was considered the #1 statistics department, a lot of my tenured colleagues seemed to have the attitude that their highest achievement in life was becoming a Berkeley statistics professor. Some of them spent decades doing mediocre work, but it didn’t seem to matter to them. After all, they were Tenured at Berkeley. Now, I’m not saying Chua is like that–in writing this book, she’s certainly not coasting on her academic reputation–but I do think it’s natural for someone in her position to define her success based on where she stands in the academic pecking order (and, for that matter, a best-selling popular book will help here too) rather than on her accomplishments for their own sake.

That is an unfortunate, and frankly, scary side effect of the way meritocracy sometimes works. Some people fixate more on the proxy measures than on the underlying variable which they are intended to measure. I immediately recall two close friends who were going to graduate school at M.I.T. and Harvard at the same time, and by an unfortunate coincidence they made the same complaints about their advisors: that once these academics had reached their ultimate goal, they lost all sense of purpose, and simply decided to glide along after tenure. Status, not substantive contribution, turned out to be their ultimate motivation (one of my friends complained that his advisor had transformed himself into an extremely devoted family man after his reputation had reached its maximal value and there was no status return on labor investment!). No one could take away their positions as tenured faculty at M.I.T. and Harvard, and that was enough.

I think this is connected to this Slate piece, Mary Gates and Karen Zuckerberg Weren’t Tiger Moms: Is the Amy Chua approach bad for the American economy?:

Swedes not so homogeneous?

Posted on February 10, 2011 by Razib Khan

Credit: David Shankbone

The more I see fine-scale genomic analyses of population structure across the world, the more I believe that the “stylized” models which were in vogue in the early 2000s, and which explained how the world was re-populated after the last Ice Age (and before), were wrong in deep ways. I’m talking about the grand narratives outlined in works such as Bryan Sykes’ The Seven Daughters of Eve, the subtitle of which was “The Science That Reveals Our Genetic Ancestry.” If I had less faith in science to always ultimately right its course I’d probably become a post-modernist type who asserts that all these stories are fictions. Sykes’ model in particular seems very likely to be incorrect, now that ancient DNA has been used to elucidate past population movements in Europe. From what we can gather it looks like coarse attempts to infer past distributions from current distributions (of specific lineages and their diversity) resulted in a great deal of false clarity. We’re not talking differences on the margins, but fundamental confusions. For example, Basques were always assumed to be a viable “reference” population for descendants of European hunter-gatherers. This was one of the linchpins of older historical genetics models. It turns out that this fixed assumption may have been a false one.

Not only were our past assumptions in simple models wrong, but the real explanations may also be rather complex. It turns out that ancient DNA of the “first farmers” and their “hunter-gatherer” neighbors in Central Europe reveals a lot of discontinuity between both these groups and modern Europeans. Why? It may be that in fact there were multiple migrations, and the palimpsest is going to be a tough cookie to excavate. But there’s no need to be disheartened: the old paradigms came crashing down thanks to data.

With that in mind I’ve been particularly interested in the European fringe, the far west and north. If any hunter-gatherer descendants survive in large numbers, it will be here. This is why I’m curious as to the genetics of the Sami as well as the archaeology which tracks the spread of agriculture in Northern Europe. A new paper in PLoS ONE focuses on Sweden, Swedish Population Substructure Revealed by Genome-Wide Single Nucleotide Polymorphism Data:

My genetic odyssey

Posted on February 10, 2011 by Razib Khan

I have a guest post at Genomes Unzipped, summarizing what I’ve found via ancestry analysis over the past 6 months with the results from 23andMe. It is in many ways a brief overview of the detailed posts which you’ve seen in this space.

Counting beans the proper way

Posted on February 10, 2011 by Razib Khan

Apropos of several of my recent posts, The New York Times has an interesting article up, Counting by Race Can Throw Off Some Numbers. Basically it outlines the difficulty of enumerating different racial and ethnic groups for different purposes in a more diverse and racially mixed USA. Numbers matter when it comes to apportioning resources, and the current methods are often quite coarse (though some interest groups prefer it that way, because it bolsters their numbers). Let’s focus on the point germane to the focus of this weblog:

The National Center for Health Statistics collects vital statistics from the states to document the health of the population. When it comes to collecting birth certificate information, though, the center encounters a problem: 38 states and the District of Columbia report race data in the new and more expansive manner that allows for the recording of more than one race. But a dozen states do not, because they still use old data systems and outdated forms. As a result, the center cannot produce consistent national data for what it calls “medical and health purposes only.”
To get around that problem, the center reclassifies mixed-race births using a complex algorithm. For example, a birth to a parent who marked white, Asian and Native American would be declared just one of those races, depending on a number of variables in a probability model, like sex, age of the mother and place of birth. (Birth data is reported, in most cases, by the race of the mother.)

The medical part is disturbing to me, because I just realized I’ve been part of the problem. You see, the article doesn’t acknowledge that the category “Asian” is genetically incoherent! A friend stated what I was thinking as a good solution: everyone gets a genomic admixture analysis done, and that’s what gets entered into the medical databases. So a white Hispanic with “pure Spanish ancestry” will be counted as white for medical purposes, but counted as Hispanic for the purposes of identity politics. And black Americans who are more than 50% European in ancestry, such as Henry Louis Gates Jr., will be appropriately “weighted” when it comes to medical genetics focusing on African Americans.

Moderates are dull, liberals are smarter, conservatives are middling

Posted on February 10, 2011 by Razib Khan

Long time reader Ian comments:

A comparison with “the American public” isn’t really appropriate – to even be in the pool where you’re thinking about an academic career, you need to have a college degree. And that population, if memory serves, is far more liberal than the population at large. More realistic would be a comparison with the population of people who have graduate degrees….

About 20% of Americans self-identify as “liberal,” and about 40% as “conservative.” The General Social Survey has a variable POLVIEWS, which asks individuals to assign themselves to a position on a political spectrum, from “extremely liberal” to “extremely conservative,” like so:

1 = Extremely liberal
2 = Liberal
3 = Slightly liberal
4 = Moderate
5 = Slightly conservative
6 = Conservative
7 = Extremely conservative

So in other words, the higher the integer, the more conservative the individual. The GSS has a variable, EDUCATION, which records the highest level attained. It falls into three classes, high school, bachelor’s, and graduate degrees (I assume those who did not complete high school are omitted because they didn’t attain an education?). Additionally, it has a 10 word vocabulary test, WORDSUM, which has a 0.71 correlation with general intelligence. I combined those on the interval 0-4 (they got 0 to 4 answers correct on the test), and labeled them “dull.” Those scoring 5-8 I labeled “average.” And finally, those scoring 9 and 10 I labeled “smart” (about 20% are dull, 65% average, and 15% smart, in the total data set). Constraining the sample to the year 2000 and later, I produced the following charts:
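The binning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the collapse of the 7-point POLVIEWS scale into liberal (1-3), moderate (4), and conservative (5-7) is an assumed grouping for the sake of the example, and the respondents are made up rather than drawn from the GSS.

```python
from collections import Counter, defaultdict


def wordsum_class(score):
    """Bin a WORDSUM score (0-10 correct) into the three classes from the post."""
    if not 0 <= score <= 10:
        raise ValueError(f"WORDSUM must be between 0 and 10, got {score}")
    if score <= 4:
        return "dull"
    if score <= 8:
        return "average"
    return "smart"


def polviews_group(code):
    """Collapse the 7-point POLVIEWS code into three groups (assumed grouping)."""
    if not 1 <= code <= 7:
        raise ValueError(f"POLVIEWS must be between 1 and 7, got {code}")
    if code <= 3:
        return "liberal"
    if code == 4:
        return "moderate"
    return "conservative"


def ideology_by_class(rows):
    """Take (wordsum, polviews) pairs; return {class: {group: percent}}."""
    counts = defaultdict(Counter)
    for wordsum, polviews in rows:
        counts[wordsum_class(wordsum)][polviews_group(polviews)] += 1
    table = {}
    for cls, counter in counts.items():
        total = sum(counter.values())
        table[cls] = {group: 100.0 * n / total for group, n in counter.items()}
    return table


# Made-up respondents for illustration only; these are NOT GSS records.
sample = [(3, 4), (4, 6), (2, 4), (6, 2), (7, 5), (9, 1), (10, 2), (10, 6)]
table = ideology_by_class(sample)
for cls in ("dull", "average", "smart"):
    print(cls, table[cls])
```

With real data one would first restrict the rows to survey years 2000 and later, as in the post, and then compare the percentages across the three vocabulary classes.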

Inferring and visualizing patterns in genomic data

Posted on February 9, 2011 by Razib Khan

I’ve been playing around with ADMIXTURE and EIGENSOFT with the HapMap data set along with a few friends & family merged into it. It is interesting to see how the intuitive inferences you make from ADMIXTURE bar plots differ somewhat from PCA scatter plots. In any case, I’ve been posting some of the preliminary results on Facebook (in part because one of my friends is on Facebook and is curious about his own genetic background), and a friend who is a grad student pointed me to Structurama, which infers the best number of categories* (one can do cross-validation in ADMIXTURE). I’ve avoided STRUCTURE because it’s computationally more intensive. Any other recommendations? Specifically, something not mentioned by Dienekes or David.

Below the fold is a taste of the games my computer has been up to overnight. K = 5 ancestral populations in ADMIXTURE. HapMap Utah whites, Tuscans, Mexicans, Beijing Chinese, in that order. The last 6 bars are: my father, my mother, and then four individuals of European ancestry, Euro 1, Euro 2, Euro 3, and Euro 4. After merging files and pruning founders and thinning the markers to reduce linkage disequilibrium I was left with 120,000 SNPs. Just a note, I’ve played around with different numbers of SNPs at various K’s, and some very small differences are surprisingly consistent. My mother is always just a bit more “Asian” than my father.
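For readers who want to compare individuals on one ancestral component rather than eyeball a bar plot, here is a minimal Python sketch. It assumes the usual ADMIXTURE output layout, a .Q file with one row per individual and K whitespace-separated ancestry fractions that sum to 1. The helper names, the sample labels, and the numbers are all invented for illustration, and the column that corresponds to a given component is arbitrary from run to run, so the index has to be identified by inspecting the reference populations.

```python
def parse_q(text):
    """Parse ADMIXTURE-style Q output: one row per individual, K fractions per row."""
    rows = []
    for line in text.strip().splitlines():
        fractions = [float(value) for value in line.split()]
        # Each individual's ancestry fractions should sum to roughly 1.
        if abs(sum(fractions) - 1.0) > 1e-3:
            raise ValueError(f"fractions do not sum to 1: {line!r}")
        rows.append(fractions)
    return rows


def component_by_label(rows, labels, k_index):
    """Map each individual's label to its share of component k_index."""
    if len(rows) != len(labels):
        raise ValueError("need exactly one label per row")
    return {label: row[k_index] for label, row in zip(labels, rows)}


# Invented K = 5 fractions for two hypothetical samples; not real results.
q_text = """
0.10 0.05 0.70 0.10 0.05
0.08 0.05 0.70 0.12 0.05
"""
rows = parse_q(q_text)

# Column 3 is assumed here to be the component of interest.
shares = component_by_label(rows, ["sample_a", "sample_b"], k_index=3)
print(shares)
```

Repeating the comparison across runs with different SNP counts and values of K is one way to check whether a small difference between two individuals is consistent.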

Why race will matter after we all get our full sequences

Posted on February 9, 2011 by Razib Khan

In my post “Health care costs and ancestry”, a commenter says:

“Race” is a concept that should have died with disco. I imagine it will soon be feasible for every patient to have their genome analysis included in their medical file and the various risk and other pertinent factors explicated.

The chart to the left shows how race is a social construct. It’s a bar plot which partitions ancestry, and as you can see, the Asian children are a mix of European and Asian. How does that happen? Because in 1980 the US Census included people of South Asian origin as “Asian Americans.” In contrast, those of Middle Eastern origin remain “non-Hispanic white” (this is not totally crazy, think Ralph Nader or Marlo Thomas). But it means that an ethnic Baloch from Pakistan is “Asian,” and an ethnic Baloch from Iran is a “non-Hispanic white.”

The neo-Malthusian petro-kings

Posted on February 8, 2011 by Razib Khan

One of the major problems with natural scientists is that when they “project” into the future they often do not take into account the power of innovation to change the fundamental parameters of the game. I believe this was part of the issue at the heart of the famous Simon-Ehrlich wager. Though Julian Simon was untutored in many aspects of natural science, he did comprehend the recent economic history of the world, which has seen a break with the shackles of the iron laws of Malthus. Those laws had been operative for all of human history until the mid-19th century, when Britain started to become the first nation which was a clear exception to the pattern (some may argue that the Dutch pre-figured the English case, but this seems to be debatable).

There are two major changes which Thomas Malthus and his contemporaries (including economists such as David Ricardo) could not anticipate. First, that the rate of innovation in the 19th and 20th centuries would simply surpass anything that the world had seen before. In The Fall of Rome: And the End of Civilization Bryan Ward-Perkins reports that the pollutants which are the byproducts of industrial activity did not reach Roman levels in Britain until the 18th century! I am not naive enough to be such a partisan of the “ancients” as to suggest that Europe did not reach Roman levels of civilization in the generality until the 1700s. But, up until the Industrial Revolution occurred in Britain there were aspects of European civilization which had not yet climbed back up to the Roman scale of grandeur or virtuosity. For example, it seems that it was only in 1800 that London attained the size of the city of ancient Classical Rome (which had fallen at its nadir in the 7th century to a population of 50,000).

The second major parameter is more subtle, and perhaps even more surprising, than innovation. It’s the demographic transition. Even with higher growth rates, if population rises to “catch” up with the bigger economic “pie,” then per capita wealth remains the same. What began in the advanced nations of Western Europe in the 19th century was that the urban middle classes began to reduce their fertility, while at the same time economic productivity continued to increase. The growing pie was not matched by concomitant population increase. Ergo, greater per capita wealth.
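The arithmetic behind that paragraph can be made concrete with a small Python sketch. The function name and every number below are illustrative assumptions, not historical estimates: the point is only that per capita income grows by the factor (1 + economic growth) / (1 + population growth) each year, so it stays flat when the two rates match and rises when fertility falls behind output.

```python
def per_capita_income(output, population, output_growth, population_growth, years):
    """Project per capita income under constant annual growth rates."""
    future_output = output * (1 + output_growth) ** years
    future_population = population * (1 + population_growth) ** years
    return future_output / future_population


# Illustrative starting point: total output of 1000 units shared by 100 people.
start = 1000 / 100

# Malthusian case: population grows as fast as the economic pie.
malthusian = per_capita_income(1000, 100, 0.02, 0.02, years=50)

# Demographic transition: same economic growth, slower population growth.
transition = per_capita_income(1000, 100, 0.02, 0.005, years=50)

print(f"start: {start:.2f}")
print(f"Malthusian case after 50 years: {malthusian:.2f}")
print(f"transition case after 50 years: {transition:.2f}")
```

In the first case each person's slice is unchanged after fifty years despite a much larger economy; in the second the slice grows because the denominator lags the numerator.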

Believe it or not, the world is going through a demographic transition: life expectancy is increasing, as is per capita income (PPP). This is due to continued economic growth and a decrease in the rate of population growth. At least in the aggregate.

But different conditions hold in different locales. Below is a comparison of per capita income (PPP) for a few selected nations, as well as their population growth rates:

The academy is liberal, deal!

Posted on February 8, 2011 by Razib Khan

A new article in The New York Times, Social Scientist Sees Bias Within, profiles Jonathan Haidt’s quest to get some political diversity within social psychology. This means my post Is the Academy liberal? is getting some links again. The data within that post is just a quantitative take on what anyone knows: the academy is by and large a redoubt of political liberals. To the left you see the ratio of liberals to conservatives for selected disciplines. Haidt points out that in the American public the ratio is 1:2 in the other direction, so it would be 0.50. He goes on to say that:

“Anywhere in the world that social psychologists see women or minorities underrepresented by a factor of two or three, our minds jump to discrimination as the explanation,” said Dr. Haidt, who called himself a longtime liberal turned centrist. “But when we find out that conservatives are underrepresented among us by a factor of more than 100, suddenly everyone finds it quite easy to generate alternate explanations.”

Haidt now calls himself a “centrist,” but you define yourself in part by the distribution around you. In the general public he’d probably still be a liberal, as evidenced by the logic he’s using here. The proportionalist idea, that institutions and communities should reflect the broader society, is so common on the Left that he’s now attempting to apply the framework to ideology. But there may be many reasons not having to do with crass discrimination why different groups are differently represented in different disciplines. Consider these possibilities:

– Academics tend to be much smarter than average, and liberals may be overrepresented among the very bright. That to me could explain why education professors are more conservative, though I doubt political scientists are that much brighter than engineers!

– Liberals and conservatives have different values, so that people of similar aptitudes may choose different life paths. The standard assumption is that conservatives value the remuneration of the conventional private sector more than liberals, who may opt for the prestige and status of the academy.

– Studying social science may make you liberal, in that conservative ideas are just not correct.

– Finally, subcultures are probably subject to positive feedback loops where small initial differences may result in disproportionate attraction of various types of individuals to different groups. After the initial positive feedback loop is generated, i.e. bright liberal undergraduates know that graduate school is socially congenial to their values, while conservatives know that it is not, group conformity effects can make the politically “out” remainder more liberal or conservative than they would otherwise be (an inverse case is Wall Street, where many people from conventional liberal backgrounds may still identify as relatively liberal, but on many issues their environment has shifted their absolute viewpoints to a more right-wing position).

Not only do I think there are reasons for the skewed ratios that have nothing to do with straightforward discrimination, but I also think that, barring a Ministry of Conservative Representation enforcing quotas from on high, it’s pretty much impossible to change the basic statistics. You could, for example, simply mandate that conservatives get paid 50% more to incentivize them to become academics. But why stop here? How about more liberals in the military and corporate boardrooms?

Does this matter? I think it does. “Positive” Results Increase Down the Hierarchy of the Sciences: