GRE utility for graduate school and conditioning on the dependent variable

One of the things that seems to be popular in biological sciences right now is the push to get rid of the GRE as part of the criteria for entrance. Two of the major rationales are that it’s expensive, so it discriminates against lower socioeconomic status candidates, and that it makes it harder to recruit underrepresented minorities since on average they score lower on the GRE (many departments have either explicit or implicit GRE cut-offs).

I’m not going to litigate these issues. To be honest I believe it is a fait accompli that many departments will stop using the GRE. This will probably increase diversity in some ways. But I also suspect it will result in a greater bias toward more “polished” candidates since very high GRE scores sometimes indicate to admissions committees that applicants who are otherwise spotty or irregular may have promise.

But, I do want to enter into the record a major problem with the argument that GRE does not correlate with academic success at the graduate level (supported by research). Yes, part of the issue may simply be range restriction. But there is another issue which many biological scientists may not be familiar with.

First, right now this paper from early this year is getting a lot of attention, The Limitations of the GRE in Predicting Success in Biomedical Graduate School.

It was, of course, a political scientist who objected immediately:

This blog post is of interest for those curious, That one weird third variable problem nobody ever mentions: Conditioning on a collider. Basically, it is well known that at many universities graduate admittees exhibit a weak negative association between GRE scores and grade point averages. This was commented on as far back as the 1970s in Science, Graduate Admission Variables and Future Success:

The standard variables considered in selecting students for graduate school do not correlate well with later measures of the success or attainments of the selected students (1, 2). The low correlations have led at least one investigator (3) to propose abandoning one of these standard variables, the Graduate Record Examination (GRE). The purpose of the present report is to demonstrate that variables that are the basis for admitting students to graduate school must have low correlations with future measures of the success of these students.

What’s going on?

As noted in the paper there are some universities which are first-choices for graduate school in a field to such an extent that they will admit candidates who have very high GPAs and very high GREs. In this case, neither of the criteria will predict success because there is very little variation to generate a correlation. But, at many universities, there is a negative correlation between admittee GRE score and undergraduate GPA. That is because very few applicants will be admitted with both low GRE and GPA scores, but some will be admitted with high GRE scores and low(er) GPAs and others with higher GPAs and low(er) GREs (usually there is still a GPA and GRE floor).
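The selection effect described above can be sketched with a small simulation. This is only an illustration under assumptions that are mine, not the paper's: GRE and GPA are standardized scores with a modest positive correlation (`rho`) among applicants, and a program admits anyone whose combined score clears a hypothetical `cutoff`.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_admissions(n=20000, rho=0.3, cutoff=1.5, seed=0):
    """Return (r among applicants, r among admitted, number admitted).

    Hypothetical setup: GRE and GPA are standardized, correlated at rho
    among applicants, and admission requires GRE + GPA > cutoff.
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n):
        gre = rng.gauss(0, 1)
        # Build a GPA score that correlates with GRE at roughly rho.
        gpa = rho * gre + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        applicants.append((gre, gpa))
    # Admission conditions on a combination of the two scores.
    admitted = [(gre, gpa) for gre, gpa in applicants if gre + gpa > cutoff]
    r_applicants = pearson(*zip(*applicants))
    r_admitted = pearson(*zip(*admitted))
    return r_applicants, r_admitted, len(admitted)


if __name__ == "__main__":
    r_all, r_adm, n_adm = simulate_admissions()
    print(f"all applicants: r = {r_all:.2f}")
    print(f"admitted only ({n_adm} students): r = {r_adm:.2f}")
```

Under these assumptions the correlation is positive among all applicants and turns negative among the admitted, simply because a low score on one measure has to be offset by a high score on the other to clear the bar.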

Consider the relation:
[latexpage]
\[
R^2 = \frac{r_1^2 + r_2^2 - 2r_1r_2r}{1 - r^2}
\]

Where $R^2$ is the proportion of the variance of the variable you want to predict that GRE and GPA explain together, $r_1$ and $r_2$ are the correlations of GRE and of GPA with that variable of interest, and $r$ is the correlation between GRE and GPA.

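Basically, when the correlation $r$ between GRE and GPA is negative, you’re going to get into a situation where $r_1$ and $r_2$ individually cannot be large (since $R^2$ can never exceed 1), so neither GRE nor GPA alone is going to be able to explain a lot of the variance in what you want to predict.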
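To make the constraint concrete, here is a minimal sketch that evaluates the relation above. The helper names are mine, and the second function adds a simplifying assumption not in the paper's quoted text: that GRE and GPA are equally (and positively) predictive, $r_1 = r_2 = \rho$. In that case the formula reduces to $R^2 = 2\rho^2/(1+r)$, so requiring $R^2 \le 1$ caps $\rho$ at $\sqrt{(1+r)/2}$.

```python
def multiple_r_squared(r1, r2, r):
    """R^2 from two predictors with validities r1, r2 and intercorrelation r."""
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r) / (1 - r ** 2)


def max_equal_validity(r):
    """Largest possible rho when r1 = r2 = rho (my simplifying assumption).

    With equal validities R^2 = 2 * rho**2 / (1 + r), and R^2 <= 1
    implies rho <= sqrt((1 + r) / 2).
    """
    return ((1 + r) / 2) ** 0.5


if __name__ == "__main__":
    for r in (0.3, 0.0, -0.3, -0.5):
        cap = max_equal_validity(r)
        print(f"r = {r:+.1f}: each predictor can correlate at most "
              f"{cap:.2f} with the outcome")
```

The more negative the GRE-GPA correlation among admitted students, the lower the ceiling on how well either one can correlate with later success, even if together they predict it perfectly.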

This may seem like a nerdy issue. And it is well known to social scientists. But since the people I see talking about the GRE are academics in the biological sciences I thought I would at least highlight this nerdy issue.

As I said above, I do think GRE is going to be dropped as a requirement at many universities for graduate programs. This is going to be a natural experiment, so we’ll be able to test many hypotheses. The paper above ends like so:

…Without a study in which a sample of the applicants-rather than of the selected students is evaluated, it is impossible to tell [the validity of the criteria -RK]. Yet such a study is completely infeasible. Even if rejected applicants are monitored throughout the rest of their working careers, it is impossible to evaluate how they would have done had they been admitted, because the rejection itself constitutes an important “treatment” difference between them and the selected students. The alternative is to admit a sample of the applicant population without using the standard admission variables to select them-preferably, to select at random.

Selection may not be random, but I believe we may be able to test some hypotheses in the next generation by testing a set of students on the GRE later on, after admittance, and seeing what the future correlation is.

The postdoc salary range with cost of living (situation probably worse than reported)


Nature has an article, Pay for US postdocs varies wildly by institution. True, but as Matt Hahn, professor of biology at Indiana University in Bloomington (cost of living 93% of the USA average), observed, there isn’t any correction for cost of living. The researcher who dug through the data actually posted it online, so I decided to correct that oversight.

I took the institutions with N > 20, and looked up the cost of living in Best Places. The plot above is messy, but you can see that lots of institutions are paying a standard median salary of around $47,500, no matter the cost of living.

The correlation between cost of living and postdoc salary is 0.39. The weighted correlation is 0.48. These are pretty modest. That means you can find a really good situation, or a really bad one (also, institution reputation matters, there are some gems which pay well and have great reputations from what I can tell!).

Also, I’m pretty sure that the situation is worse than the numbers above suggest. Looking at the list of universities, I think there’s a bias for institutions in high cost of living locations not to report their salary data. Aside from UCSB the whole UC system denied the attempt to get data, and I don’t see Stanford, Columbia, or Harvard on the list.

The full table is below the fold, but adjusted for cost of living UCSB postdocs get $20,866 per year. In contrast, Michigan State, University of Maryland, Baltimore, and Wayne State University postdocs make more than $60,000 per year when you adjust. Stanford isn’t on the list, but online it says Stanford postdocs make somewhere between the low $50,000s and the low $60,000s, which seems reasonable for life sciences, though these are definitely poverty wages where the university is located. If you are in a lucrative field it can be more, and depending on your supervisor outside consulting is a possibility, though good luck living in Silicon Valley on a $100,000 yearly gross income if you have a family, as many postdocs do.
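For anyone who wants to redo this kind of correction, here is a rough sketch of the two calculations involved. The assumptions are mine: the cost of living index is scaled so that 100 is the US average (so Bloomington would be 93), and the weighted correlation weights each institution by its number of postdocs. The post does not spell out the weighting scheme, and the function names are hypothetical.

```python
def adjust_for_cost_of_living(salary, col_index):
    """Express a salary in US-average dollars (index 100 = US average)."""
    return salary / (col_index / 100.0)


def weighted_pearson(xs, ys, weights):
    """Pearson correlation where each observation carries a weight."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys)) / total
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs)) / total
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(weights, ys)) / total
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # The standard median salary at a place with a 93 index.
    print(round(adjust_for_cost_of_living(47500, 93)))

    # Made-up institutions, purely to show the call signature.
    col = [93, 120, 160, 85]
    salary = [47500, 48000, 52000, 47500]
    n_postdocs = [40, 150, 300, 25]
    print(round(weighted_pearson(col, salary, n_postdocs), 2))
```

Setting every weight to 1 gives the ordinary unweighted correlation, which makes it easy to compare the two figures on the same data.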


The Rising Waters of Human Tribal Nature


I’m excited to read Steven Pinker’s Enlightenment Now: The Case for Reason, Science, Humanism, and Progress. I’ve read every one of his books except for The Stuff of Thought, and The Blank Slate is one of my favorite books of all time. I still remember how much of a page-turner The Language Instinct was for me back in the late 1990s. But I’m most excited about Enlightenment Now because I’m looking for a little hope. At this point, I am very pessimistic as to the prospects for the Enlightenment project.

This is pretty obvious to anyone who reads me closely. I’ve been writing and discussing with people on the internet, and in private, for many years now, and have come to the conclusion most people are decent, but they’re also craven and intellectually unserious outside of their domain specificity when they are intellectual. Many of our institutions are quite corrupt, and those which are supposedly the torchbearers of the Enlightenment, such as science, are filled with people who are also blind to their own biases or dominated by those who will plainly lie to advance their professional prospects or retain esteem from colleagues.

That’s why I laughed out loud when I saw this tweet:

In psychology, much of the replication crisis was simply due to personal self-interest (more publications). But some of it was obviously political (see stereotype threat). Similarly, look at the fiasco in nutrition science. Some of it was personal, but there were also political demands from on high that there be something done. So “scholars” set some guidelines that people followed for decades, even if later they were shown to be totally ineffective. I’m not even going to get into the travesty that is modern biomedical science, with professional advancement and institutional interests combined in a deadly cocktail.

Also, I enjoy science popularizing (or did, I don’t read science books much anymore) as much as the next person, but isn’t it interesting how much of modern science confirms the mainstream elite cultural norms of ~2020? Curiously, if you read science popularizations in newspapers in 1920 they would also confirm the elite cultural norms of 1920…. But this time we’re right!

Other institutions aren’t doing better. The media is going through economic collapse, and journalists and their paymasters are reacting by pandering to their audiences. Instead of illuminating, they’re confirming. That’s what the audience wants, and I’m sure it’s more satisfying to journalists anyway. But can you blame them with the economics that are before us?

This is 2017, Nazi-pizza

Don’t get me started on Facebook or Twitter.

I was having a discussion with a reasonably prominent pundit (you would recognize the name) today who bemoaned the reality that so many journalists are now driven to sating tribal passions and generating clicks for their paymasters. He was trying to argue against my pessimism, suggesting that the fever was starting to break. We’ll see. I hope I’m wrong.

People have always been biased and subject to motivated reasoning. We’ve had our disputes whatever our ideology, whether it be conservative, moderate, or liberal. But the Enlightenment perspective of critical rationalism, which took philosophical realism seriously, meant that ultimately people who disagreed often assumed that fundamentally they were trying to converge on the same facts, the same reality. Reality existed, and you couldn’t just wish it away. Discussion might forward two individuals to a convergence!

We’re not there anymore. Whether it be Bush-era contempt for “Reality-Based Community”, or the rising crest of “Critical Theory”, the acid of subjectivism is eroding the vast edifice of aspirational realism which grew organically in the wake of the Enlightenment. This isn’t a Left vs. Right phenomenon, it’s a human dynamic, because for most of human history what is true has been determined by what the tribe dictates to be true, and what the tribe dictates to be true has often not been based on a critical evaluation of facts and theories. What the tribe dictates to be true is computationally less intensive than thinking things through yourself, and, it’s often right-enough.

The reality is that this cultural cognition and conformity has always held. It’s just that it seems that for a few centuries substantial latitude was given in public to a relative amount of heterodoxy from broad tribal visions. And it was always a work in progress. But there was a goal, and an ideal, even if we habitually failed. We failed in the direction of truth.

We live in a post-modern age now. Feelings are paramount, facts must bow before them. But the curious fact is that the post-modern age is just the pre-modern age. When I first read the Christian author Alister McGrath I literally scoffed at his contention that atheism would fail before the ascendancy of post-modernism. Ten years on I will admit that I now believe he was right and I was wrong. Though I don’t think the New Atheism failed miserably, I do think that the problems it is encountering from the cultural Left are due to its cold modernist baggage.

No truth, no liberalism. No liberalism, and democracy becomes the mob. The passions of the mob do eventually fail, and in its wake a more oligarchic and hierarchical system will emerge. We may simply be seeing the end of the liberal individualist interregnum, as history reverts to its despotic collectivist norm.

Art, the applied sciences of engineering, and many human endeavors will continue to develop in the new order. Illiberal societies, all societies until recently, can be cultured and civilized. My own preference is for the dignity of the individual and legal egalitarianism of the liberal world in which I grew up (but in which I was not born), but humans have flourished and continue to flourish in illiberal environments.

One way to think about the past century or so is that more or less the waters of human nature receded, and a great undersea world was exposed. But now human nature is rising, and that world is submerging before our eyes. But islands of the old world we grew up in will persist. We need to find each other out and cherish the values of critical inquiry as we have for thousands of years. An archipelago of learning for learning’s sake can still maintain itself in a world where our values no longer hold the leash. But like the mammals during the Mesozoic, we will have to go back into the night and the shadows. There will hopefully be oligarchic patrons who sympathize with us, and despots like Frederick the Great who give us some latitude to work. Our values will fade and diminish, but they will not disappear.* One day they may come to the fore again!

Finally, understanding that most people don’t need to be right or utter the truth, but simply need to win, has made me much more cheerful and less sour observing everyday stupidities. It is no great insight to observe that I’ve never been one who has had much esteem for the admiration of my peers. I like to do my own thing. But tribal acclamation must be the best of all things for most humans, and now I understand why they fight unfairly and stupidly with such ease and naturalness: their aim is not to be right in the eyes of nature, but to rise in the esteem of their fellow humans. That is the summum bonum.

Note: I’ll be very happy to be proven wrong in 15 years. But as it is I think by then we’ll be dealing with the final breakdown of the institutions of the republic in the wake of a Left-wing attempt to forestall the economic immiseration of the middle-class that failed.

* The main reason I hated religion as a child is the mindless boredom of attendance at services. I quickly realized I didn’t believe any of that tripe and never had. But the liberty that I have to dissent from public values may not be a liberty we always have. Private dissent may come back and become the norm as it has been for much of human history.

$9.99 to get into the Helix exome ecosystem

Will try to keep self-interested product placement to a minimum normally, but I thought I’d pass on that Helix has a $100 off sale for the next 72 hours. That means that the company I work for has a Neanderthal app on sale for $9.99. The regular price is $29.99, plus an added $80.00 for exome+ sequencing if you aren’t in the Helix database (which most people are not).

The upshot here is that the $9.99 will get you an exome+ sequence, which at some point in early 2018 you can download for $600. But if you don’t want to download it, it’s a great way to get into the ecosystem on the cheap.

I assume most of my readers know what the exome is, but it’s basically the portion of your genome which is directly translated into functional proteins. That’s ~1% of the genome, or ~30,000,000 bases. This is a major expansion on the DTC SNP-chip platforms, which are in the 500,000 to 1,000,000 SNP range.

Anyway, not sure this will be appealing to readers who need a full download of data. But if you are the type who is more interested in getting applications related to your genome, this is a pretty good deal at a sub-$10 price point.

Note: To my knowledge only ships to USA currently.

Our time in the sun

The New York Times has a story up, After the Dinosaurs’ Demise, Many Mammals Seized the Day. It’s a write-up of a new paper that is open access, Temporal niche expansion in mammals from a nocturnal ancestor after dinosaur extinction.

This research illustrates how computational power has changed evolutionary biology. There has long been an intuitive verbal model that mammals were ancestrally night-adapted creatures based on aspects of their biology, as well as the evolutionary reality that for most of the lineages’ existence they were overshadowed by dinosaurs (remember, more than half of our evolutionary history predates the Cenozoic).

But today we do more than posit models which match and predict the fossil (or genetic) data. Computationally intensive phylogenetic frameworks are tested using extant lineages to generate probabilities of given scenarios generating the data we see given particular models. Something like the Reversible-jump Markov chain Monte Carlo (which is used in this paper) could actually be done manually…if a phylogeneticist had thousands of slaves to do all the computations. Obviously, the emergence of powerful computers accessible to all really changed the game in terms of analytic power.

And yet I wonder about the sense of precision that people gain from these methods. Verbal models are necessarily vague. When you give a probability of a given hypothesis being 0.71, that gives understanding a solidity. But is it warranted? Though researchers understand all the individual moving parts of the phylogenetic framework, only a computer can really bring it all together.

It’s something to consider. This is to a great extent the future of evolutionary biology: positing models, and putting them into a calculating machine like the one Leibniz dreamed of.

Citation: Temporal Niche Expansion In Mammals From A Nocturnal Ancestor After Dinosaur Extinction
Roi Maor, Tamar Dayan, Henry Ferguson-Gow, Kate Jones

Addendum: This is stupid of me, but only after reading the above paper did I reflect that most amniotes are diurnal and that mammals are the exception. Think about it, birds. And reptiles are probably more sluggish at night.

The end of the Kingdom of Saudi Arabia

The most important thing happening in the world that is different this week from last week, from what I can tell, is that the Kingdom of Saudi Arabia is going “full Ishmael” on us. By this I mean the reference in the Hebrew Bible to Abraham’s firstborn son, Ishmael, the legendary ancestor of the Arabs: “And he will be a wild man; his hand will be against every man, and every man’s hand against him….”

What’s going on now? As you know there seems to be an internal purge going on, and a centralization of power around the Crown Prince. This, after the rollback of the power of the religious establishment.

Externally the quagmire in Yemen continues, and the Saudi state is now becoming more belligerent toward both Iran and Lebanon.

Most of you probably know the general issues about why the Saudi state is attempting to change and reform. Though petroleum will remain important for plastics and jet fuel, it is quite possible that the proportion used for gasoline will decline with the rise of electric cars. Additionally, there seem more supply-side possibilities with fracking technologies.

But perhaps the biggest factors are demographic. Over ten years ago Peter Turchin wrote a paper, Scientific Prediction in Historical Sociology: Ibn Khaldun meets Al Saud. It’s pretty useful in understanding what’s going on right now. The big issue which Turchin talks about more generally and is relevant to Saudi Arabia is elite overproduction. The Royal House is highly fecund. And all the scions demand unsustainable leisured lives….

China’s wealthiest come from only a few regions

In Kenneth Pomeranz’s The Great Divergence: China, Europe, and the Making of the Modern World Economy he argues that the difference in per capita economic wealth between Europe and China is a relatively recent phenomenon. One of the major arguments he makes is that one has to make an apples-to-apples comparison. Comparing Northwest Europe to China is not apples-to-apples, but comparing Northwest Europe to the lower Yangzi Delta region of Central China is apples-to-apples. Using this measure Europe and China are roughly comparable up until 1800.

At least that’s the argument. Others make the case for much deeper and older roots for the differences between Western Europe and the rest of the world, most articulately in Gregory Clark’s A Farewell to Alms.

I don’t have a dog in this fight and am not decided, though I follow the field somewhat closely. Rather, I’ve always been curious about differences between Chinese regions, and how they never undermine national unity. I recall reading years ago in The Age of Confucian Rule that imperial examinations to determine candidates for the bureaucracy had quotas on candidates from the southeastern province of Fujian. They were simply filling up too many slots, at the expense of northern Chinese candidates.

The tension between social and economic orientations of different regions of China cropped up periodically. Basically, the Overseas Chinese community is derived from southern regions such as Guangdong and Fujian, and the central government over the centuries attempted to stamp out these regions’ propensity toward international commerce. A figure like Howqua is typical, though he certainly would not be met with approval by stern Neo-Confucians such as Zhu Xi (also a southern Chinese born and bred).

With all this in mind, I was curious about the origins of the 20 wealthiest Chinese as of 2017. Below you see the results:

| Name | Net worth (USD) | Sources of wealth | Province | Certainty |
|---|---|---|---|---|
| Wang Wenyin | 14 billion | mining, copper products | Anhui | |
| Liu Yongxing | 6.6 billion | agribusiness | Fujian | |
| Ma Huateng | 24.9 billion | internet media | Guangdong | |
| He Xiangjian | 12.3 billion | home appliances | Guangdong | |
| Yang Huiyan | 9 billion | real estate | Guangdong | |
| Yao Zhenhua | 8.4 billion | conglomerate | Guangdong | ? |
| Zhang Zhidong | 8.4 billion | internet media | Guangdong | ? |
| Hui Ka Yan (Xu Jiayin) | 10.2 billion | real estate | Henan | |
| Lei Jun | 6.8 billion | smartphones | Hubei | |
| Liu Qiangdong | 7.7 billion | e-commerce | Jiangsu | |
| Zhang Shiping | 6.7 billion | aluminum products | Shandong | |
| Wang Wei | 15.9 billion | package delivery | Shanghai | |
| Robin Li | 13.3 billion | internet search | Shanxi | |
| Wang Jianlin | 31.3 billion | real estate | Sichuan | |
| Xu Shihui | 21.1 billion | solar power equipment | Sichuan | |
| Jack Ma | 28.3 billion | e-commerce | Zhejiang | |
| William Ding | 17.3 billion | online games | Zhejiang | |
| Zong Qinghou | 7.2 billion | beverages | Zhejiang | |
| Li Shufu | 21.1 billion | automobiles | Zhejiang | |
| Guo Guangchang | 6.3 billion | diversified | Zhejiang | |

A few of the individuals I’m not totally sure about in terms of where they were born, but I think I guessed correctly. Compare representation on the list to national population by province, and you get:

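| Province | Pop % | On list |
|---|---|---|
| Guangdong | 8% | 25% |
| Zhejiang | 4% | 25% |
| Sichuan | 8% | 10% |
| Fujian | 3% | 5% |
| Anhui | 5% | 5% |
| Henan | 7% | 5% |
| Hubei | 4% | 5% |
| Jiangsu | 6% | 5% |
| Shanghai | 2% | 5% |
| Shanxi | 3% | 5% |
| Shandong | 7% | 5% |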

Zhejiang-Jiangsu-Shanghai is the core economic region highlighted by Pomeranz. About 12% of China’s population resides in these jurisdictions, but 35%, 7 out of 20, of its 20 wealthiest individuals were born here. Guangdong, as ground zero of the new economic revolution, has clearly benefited.
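The over- and under-representation can be put on a single scale by dividing each province's share of the list by its share of the national population. The sketch below does only that, using the rounded percentages from the tables above; the ratio itself is my framing rather than a figure from any source, and a ratio above 1 means a province is over-represented among the 20 wealthiest.

```python
# (population %, % of the top-20 list), copied from the tables above.
shares = {
    "Guangdong": (8, 25),
    "Zhejiang": (4, 25),
    "Sichuan": (8, 10),
    "Fujian": (3, 5),
    "Anhui": (5, 5),
    "Henan": (7, 5),
    "Hubei": (4, 5),
    "Jiangsu": (6, 5),
    "Shanghai": (2, 5),
    "Shanxi": (3, 5),
    "Shandong": (7, 5),
}


def representation_ratio(pop_pct, list_pct):
    """Share of the list divided by share of the national population."""
    return list_pct / pop_pct


ratios = {
    province: representation_ratio(pop_pct, list_pct)
    for province, (pop_pct, list_pct) in shares.items()
}

# The core economic region highlighted by Pomeranz.
core = ("Zhejiang", "Jiangsu", "Shanghai")
core_pop = sum(shares[p][0] for p in core)
core_list = sum(shares[p][1] for p in core)
core_ratio = representation_ratio(core_pop, core_list)

if __name__ == "__main__":
    for province, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{province:<10} {ratio:.2f}x")
    print(f"Core region {core_ratio:.2f}x")
```

With only 20 people on the list, each individual moves a province by five percentage points, so the ratios for provinces with a single entry should be read loosely.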

Introducing DNAGeeks.com

Four years ago my friend David Mittelman and I wrote Rumors of the death of consumer genomics are greatly exaggerated. The context was the FDA crackdown on 23andMe. Was the industry moribund before it began? The title gives away our opinion. We were personally invested. David and I were both working for Family Tree DNA, which is part of the broader industry. But we were sincere too.

Both of us have moved on to other things. But we still stand by our original vision. And to a great extent, we think we had it right. The consumer genomics segment in DTC is now nearing 10 million individuals genotyped (Ancestry itself seems to have gone north of 5 million alone).

One of the things that we observed in the Genome Biology piece is that personal genomics was still looking for a “killer app”, like the iPhone. Since then the Helix startup has been attempting to create an ecosystem for genomics with a variety of apps. Though ancestry has driven nearly ten million sales, there still isn’t something as ubiquitous as the iPhone. We’re still searching, but I think we’ll get there. Data in search of utility….

David and I are still evangelizing in this space, and together with another friend we came up with an idea: DNAGeeks. We’re starting with t-shirts because it’s something everyone understands, but also can relay our (and your) passion about genomics. We started with “Haplotees.” Basically the most common Y and mtDNA lineages. This might seem silly to some, but it’s something a lot of people have an interest in, and it’s also a way to get ‘regular people’ interested in genetics. Genealogy isn’t scary, and it’s accessible.

We are also field-testing other ideas. If there is a demand we might roll out a GNXP t-shirt (logo only?). The website is obscure enough that it won’t make sense to a lot of people. But perhaps it will make sense to the people who you want it to make sense to!

Anyway, as they say, “keep watching this space!” We don’t know where DNAGeeks is going, but we’re aiming to have fun with genomics and make a little money too.