GRE utility for graduate school and conditioning on the dependent variable

One of the things that seems to be popular in the biological sciences right now is the push to get rid of the GRE as part of the criteria for entrance. Two of the major rationales are that it’s expensive, so it discriminates against candidates of lower socioeconomic status, and that it makes it harder to recruit underrepresented minorities, since on average they score lower on the GRE (many departments have either explicit or implicit GRE cut-offs).

I’m not going to litigate these issues. To be honest I believe it is a fait accompli that many departments will stop using the GRE. This will probably increase diversity in some ways. But I also suspect it will result in a greater bias toward more “polished” candidates since very high GRE scores sometimes indicate to admissions committees that applicants who are otherwise spotty or irregular may have promise.

But I do want to enter into the record a major problem with the argument that the GRE does not correlate with academic success at the graduate level (an argument which is supported by research). Yes, part of the issue may simply be range restriction. But there is another issue which many biological scientists may not be familiar with.

First, right now this paper from early this year is getting a lot of attention, The Limitations of the GRE in Predicting Success in Biomedical Graduate School.

It was, of course, a political scientist who objected immediately:

This blog post is of interest for those curious: That one weird third variable problem nobody ever mentions: Conditioning on a collider. Basically, it is well known that at many universities graduate admittees exhibit a weak negative association between GRE scores and grade point averages. This was commented on as far back as the 1970s in Science, in Graduate Admission Variables and Future Success:

The standard variables considered in selecting students for graduate school do not correlate well with later measures of the success or attainments of the selected students (1, 2). The low correlations have led at least one investigator (3) to propose abandoning one of these standard variables, the Graduate Record Examination (GRE). The purpose of the present report is to demonstrate that variables that are the basis for admitting students to graduate school must have low correlations with future measures of the success of these students.

What’s going on?

As noted in the paper there are some universities which are first-choices for graduate school in a field to such an extent that they will admit candidates who have very high GPAs and very high GREs. In this case, neither of the criteria will predict success because there is very little variation to generate a correlation. But, at many universities, there is a negative correlation between admittee GRE score and undergraduate GPA. That is because very few applicants will be admitted with both low GRE and GPA scores, but some will be admitted with high GRE scores and low(er) GPAs and others with higher GPAs and low(er) GREs (usually there is still a GPA and GRE floor).
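A quick simulation makes the selection effect concrete. This is only a minimal sketch, not the analysis from the paper: it assumes GRE and GPA are independent standard-normal scores in the applicant pool, and that a hypothetical committee admits anyone whose combined score clears a cutoff. The names `simulate_admissions` and `cutoff` are made up for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_admissions(n=20000, cutoff=1.5, seed=0):
    """Return (r among applicants, r among admittees, number admitted).

    Assumption: GRE and GPA are independent z-scores among applicants,
    and admission depends only on their sum exceeding `cutoff`.
    """
    rng = random.Random(seed)
    applicants = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Admitting on the sum lets a high GRE offset a low GPA and vice versa.
    admitted = [(gre, gpa) for gre, gpa in applicants if gre + gpa > cutoff]
    r_applicants = pearson(*zip(*applicants))
    r_admitted = pearson(*zip(*admitted))
    return r_applicants, r_admitted, len(admitted)


r_applicants, r_admitted, n_admitted = simulate_admissions()
print(f"applicants: r = {r_applicants:+.2f}")
print(f"admittees:  r = {r_admitted:+.2f} (n = {n_admitted})")
```

Under these assumptions the correlation is close to zero among applicants but clearly negative among admittees, even though nothing about the individual students has changed. The only thing that differs is who is left in the sample.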

Consider the relation:
[latexpage]
\[
R^2 = \frac{r_1^2 + r_2^2 - 2r_1r_2r}{1 - r^2}
\]

Where $R^2$ is the proportion of the variance of the variable you want to predict that GRE and GPA jointly explain, $r_1$ and $r_2$ are the correlations of GRE and GPA, respectively, with that variable of interest, and $r$ is the correlation between GRE and GPA.

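Basically, when $r$ is negative (and $r_1$ and $r_2$ are both positive), the last term in the numerator adds to $R^2$ rather than subtracting from it. So for any given $R^2$, $r_1^2$ and $r_2^2$ have to be small: neither GRE nor GPA on its own is going to be able to explain a lot of the variance in what you want to predict.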
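To see how the relation above behaves, here is a small sketch that plugs in numbers. The function name `multiple_r_squared` and the value 0.3 for both individual correlations are illustrative assumptions, not figures from the paper.

```python
def multiple_r_squared(r1, r2, r):
    """R^2 for two predictors, from the relation above.

    r1, r2: correlation of each predictor with the outcome.
    r:      correlation between the two predictors.
    """
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return (r1**2 + r2**2 - 2 * r1 * r2 * r) / (1 - r**2)


# Same modest individual correlations, different predictor correlations.
for r in (0.5, 0.0, -0.5):
    print(f"r = {r:+.1f}  ->  R^2 = {multiple_r_squared(0.3, 0.3, r):.2f}")
# r = +0.5  ->  R^2 = 0.12
# r = +0.0  ->  R^2 = 0.18
# r = -0.5  ->  R^2 = 0.36
```

Read in reverse, this is the point: when the two predictors are negatively correlated among admittees, a pair of predictors that jointly do a respectable job will each show only a weak correlation with the outcome.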

This may seem like a nerdy issue. And it is well known to social scientists. But since the people I see talking about the GRE are academics in the biological sciences I thought I would at least highlight this nerdy issue.

As I said above, I do think GRE is going to be dropped as a requirement at many universities for graduate programs. This is going to be a natural experiment, so we’ll be able to test many hypotheses. The paper above ends like so:

…Without a study in which a sample of the applicants-rather than of the selected students is evaluated, it is impossible to tell [the validity of the criteria -RK]. Yet such a study is completely infeasible. Even if rejected applicants are monitored throughout the rest of their working careers, it is impossible to evaluate how they would have done had they been admitted, because the rejection itself constitutes an important “treatment” difference between them and the selected students. The alternative is to admit a sample of the applicant population without using the standard admission variables to select them-preferably, to select at random.

Selection may not be random, but I believe we may be able to test some hypotheses in the next generation by giving the GRE to a set of students after they have been admitted and seeing what the future correlation is.

The GRE is useful; range restriction is a thing

The above figure is from Beyond the Threshold Hypothesis: Even Among the Gifted and Top Math/Science Graduate Students, Cognitive Abilities, Vocational Interests, and Lifestyle Preferences Matter for Career Choice, Performance, and Persistence. It shows that even at very high levels of attainment on standardized tests there are differences in life outcome based on variation in those scores. The old joke is that results on intelligence tests don’t matter beyond a certain point…that point being whatever your own position is! But these results show that mathematics SAT outcomes at age 13 can still predict a lot of things across a wide range.

From personal experience people outside of psychology are pretty unaware of the power of cognitive aptitude testing. This includes many biologists. I was reminded of the above figure as I read portions of Richard Haier’s The Neuroscience of Intelligence. If you are a biologist curious about the topic, this is a highly recommended book.

The main reason I am posting this is because a friend in academia suggested it might be useful. There has recently been a backlash against the GRE exam, with support from the highest echelons of the science media. Additionally, many researchers in public forums are expressing objections to the GRE very vocally. Naturally this has resulted in counterarguments…but respondents have to be very careful how they couch their disagreement, because they fear being accused of being racist, sexist, or classist. Such accusations might trigger social media mobs, which no one wants to be the target of (and if past experience is any guide, friends and colleagues will stand aside while the witch is virtually burned, hoping to avoid notice).

Because of the request above I finally decided to look at the two papers which are eliciting the current wave of GRE-skepticism, The Limitations of the GRE in Predicting Success in Biomedical Graduate School and Predictors of Student Productivity in Biomedical Graduate School Applications. To my eye they suffer from the same problem as all earlier criticisms: range restriction.

The issue is that if a university is using the GRE and other metrics well as filters for those admitted, then there shouldn’t be much variation left for those measures to explain (the outcome being publications or some other important metric which actually leads to the production of science, as opposed to test scores and grades). The two papers above look at those admitted to biomedical programs at UNC and Vanderbilt, while another study looked at UCSF. These are all universities with standards high enough that there are either explicit or implicit cut-off scores, so that many students are removed from the applicant pool immediately (the mean scores are well above the 50th percentile; you can see them in the papers yourself).
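Range restriction is easy to demonstrate with a toy simulation. The sketch below is built on stated assumptions rather than on the data in those papers: a test score that correlates 0.5 with a later outcome in the full applicant pool, and a hypothetical program that admits only the top 20% by score. The names `simulate_range_restriction`, `rho`, and `admit_fraction` are invented for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_range_restriction(n=20000, rho=0.5, admit_fraction=0.2, seed=0):
    """Return (r in the full pool, r among the admitted top scorers).

    Assumption: score and outcome are standard normal with true
    correlation `rho`; admission keeps the top `admit_fraction` by score.
    """
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        score = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mix score and independent noise so corr(score, outcome) = rho.
        outcome = rho * score + (1 - rho**2) ** 0.5 * noise
        pool.append((score, outcome))
    pool.sort(key=lambda pair: pair[0], reverse=True)
    admitted = pool[: int(n * admit_fraction)]
    return pearson(*zip(*pool)), pearson(*zip(*admitted))


r_pool, r_admitted = simulate_range_restriction()
print(f"full applicant pool: r = {r_pool:.2f}")
print(f"admitted top 20%:    r = {r_admitted:.2f}")
```

Under these assumptions the score predicts the outcome noticeably less well among the admitted students than in the full pool, purely because the filter has already removed most of the variation in the score.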

When I was in graduate school I was on a fellowship committee for several years, and I had access to GRE scores and grades. But I didn’t really pay much attention to them because there wasn’t that much range. And to be honest if the student was beyond their first year I didn’t look at all as time went on. In contrast, I did look really closely at the recommendations from their advisors. From talking to others on the committee this seemed typical. Once students were admitted they were judged based on how they were doing in graduate school. And how they were doing in graduate school had to do with research, not their graduate school GPA or what they scored on the GRE to get in.

As an empirical matter I do think that it is likely many universities will follow the University of Michigan in dropping the GRE as a requirement. There will be some resistance within academia, but there is a lot of reluctance to vocally defend the GRE in public, especially from younger faculty who fear the social and professional repercussions (every time a discussion pops up about the GRE I get a lot of Twitter DMs from people who believe in the utility of the GRE but don’t want to be seen defending it in public because they fear becoming the target of accusations of an -ism). My prediction is that after the GRE is gone people will simply rely on other proxies.

If the GRE is not required, but can be taken, then students who do well on the GRE will put that on their application. Sometimes strong students encounter tragedies in their undergraduate years which strongly impact their grade point averages, and very strong GREs can help show admissions committees that they can do the coursework despite their undergraduate record (I’m not positing a hypothetical, but recounting real individuals I’ve known of and seen). It seems cruel to deny these students the chance to submit their test scores. This means that those professors who believe the GRE is valid will show preference to students who take the test and have strong scores (and to be sure, many more care about the GRE when it means someone concretely joining their lab, as opposed to the abstraction of who gets admitted to the department).

More broadly, professors who are taking students will look more at proxies for GRE score, such as undergraduate institution, or the prestige of the people writing recommendation letters. That is, pedigree will matter a lot more. In some places, such as Britain, standardized testing emerged in part as a way to identify strong students from underprivileged backgrounds. These are not the type of students who would ever be able to present a prestigious letter of recommendation. This is a sort of student which still exists (often they are from non-academic backgrounds, being the first to graduate from college in their family; what they lack in polish they compensate for in aptitude, but that takes the right environment to express).

The recourse to other variables besides the GRE score will likely have mixed results at best. Consider the successful campaign to ban asking for job applicants’ criminal records. It turns out that this just increased discrimination against all young black men, because employers could no longer differentiate. In general I think removing the GRE would probably hurt graduates of less prestigious state universities the most (and of course students from East Asia, who tend to have a comparative advantage on standardized tests). I’m pretty sure we’ll see, as the experiment will be run.

Addendum: There are professors at relatively prestigious research universities who had mediocre or sub-par GRE scores. We all know them. To some extent I think many of these individuals almost take pride in the fact that they accomplished so much in science despite negative feedback due to their unimpressive test scores. But remember that we’re talking about trends and averages, not deterministic predictions. Nothing in science is guaranteed, and even if you start at Harvard with undergraduate publications (not first author, but still) in Nature you may not make it that far (I’m thinking of a friend of mine, alas, who picked the wrong lab/project and couldn’t recover).