Monday, May 11, 2009

Harlem Children's Zone   posted by dkane @ 5/11/2009 09:52:00 AM

Via Steve Sailer and Half Sigma, we have this New York Times op-ed by David Brooks on work (pdf) by Will Dobbie and Roland Fryer on the Harlem Children's Zone (HCZ). Brooks writes:

The fight against poverty produces great programs but disappointing results. You go visit an inner-city school, job-training program or community youth center and you meet incredible people doing wonderful things. Then you look at the results from the serious evaluations and you find that these inspiring places are only producing incremental gains.

That's why I was startled when I received an e-mail message from Roland Fryer, a meticulous Harvard economist. It included this sentence: "The attached study has changed my life as a scientist."

No one else seems to have linked to (read?) the study itself. Here are the key graphics:


Extremely impressive, if true.

Note, however, that there is no way (that I could find) to tell from the paper just how many observations make up the blue and red dots for 8th grade mean math scores in Fig 3A. Key paragraph:

We use two separate statistical strategies to account for the fact that students who attend HCZ schools are not likely to be a random sample. First, we exploit the fact that HCZ charter schools are required to select students by lottery when the number of applicants exceeds the number of available slots for admission. In this scenario, the treatment group is composed of students who are lottery winners and the control group consists of students who are lottery losers. This allows us to provide a set of causal estimates of the effect of being offered admission into the HCZ charter schools on a range of outcomes, including test scores, attendance, and grade completion.

Using a lottery to randomly assign students to treatment and control groups is far and away the best method for estimating causal effects. Their second statistical strategy, instrumental variables, is much less reliable. If the authors were merely reporting some regression-based estimates, few would take the results very seriously. Teasing out causal effects from a regression is very hard. That the authors do not use a propensity score approach (at least as a check against their estimates) makes me doubt their statistical chops.

Anyway, the lottery aspect is key. To their credit, the authors are upfront in admitting that:

[T]he HCZ middle school was not significantly oversubscribed in their first year of operation, and the HCZ elementary schools have never been significantly oversubscribed, making it more difficult to estimate the effect of being offered admission for these groups.

I think that the first year of operation refers to 2005, so the number of observations in the 8th grade loser category might be very low. Still, the authors report that "The effect of receiving a winning lottery number is generally larger for students in the 2006 cohort, though we only observe sixth and seventh grade scores for these students and so decided not to show it in our figures." So, I expect that the 8th grade numbers reported here are not a fluke.

If you really started with 1,000 5th graders, randomly assigned 500 to HCZ and 500 to their local (lousy) public schools and then saw these huge differences in math scores, you would have discovered just about the biggest causal effect in the history of education research.

Have Dobbie and Fryer made that discovery? I don't know. Their write-up and tables make it very hard to understand what is going on. What is the mean difference (without any "adjustments") in 8th grade math scores between students who won the lottery and those who did not? It would certainly be useful if someone were to replicate these results.
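The unadjusted comparison asked for above is simple to compute once the scores are in hand. Here is a minimal sketch in Python, assuming two plain lists of 8th grade math scores, one for lottery winners and one for lottery losers. The function name and the numbers are invented for illustration and are not taken from the paper.

```python
import math
import statistics


def unadjusted_difference(winner_scores, loser_scores):
    """Raw difference in mean scores (winners minus losers) and its standard error."""
    diff = statistics.mean(winner_scores) - statistics.mean(loser_scores)
    # Unequal-variance (Welch) standard error of a difference in means.
    se = math.sqrt(
        statistics.variance(winner_scores) / len(winner_scores)
        + statistics.variance(loser_scores) / len(loser_scores)
    )
    return diff, se


# Invented scores for illustration only; these are NOT data from the paper.
winners = [670.0, 655.0, 690.0, 662.0, 681.0]
losers = [640.0, 652.0, 631.0, 648.0]

diff, se = unadjusted_difference(winners, losers)
print(f"difference = {diff:.2f}, standard error = {se:.2f}")
```

The standard error shrinks roughly with the square root of the group sizes, which is why the number of observations behind each dot matters so much.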

The notes to Table 2 report that "Each regression controls for the gender, race, lunch status, and predetermined values of the dependent variable." How do you control for "predetermined values of the dependent variable" in a regression? I have no idea.
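One plausible reading, and it is only a guess on my part, is that "predetermined values of the dependent variable" means each student's score from before the lottery, entered as an ordinary covariate next to the lottery indicator. A minimal sketch of that kind of regression follows, using the closed-form least squares solution for two regressors. The function name, variable names, and data are hypothetical, and nothing here is confirmed as the authors' specification.

```python
def ols_two_regressors(y, t, x):
    """Least squares fit of y = a + b_t * t + b_x * x.

    y: outcome (e.g. 8th grade score), t: 1 if lottery winner else 0,
    x: baseline score measured before the lottery (assumed meaning).
    Returns (a, b_t, b_x).
    """
    n = len(y)
    my, mt, mx = sum(y) / n, sum(t) / n, sum(x) / n
    # Centered sums of squares and cross-products.
    s_tt = sum((ti - mt) ** 2 for ti in t)
    s_xx = sum((xi - mx) ** 2 for xi in x)
    s_tx = sum((ti - mt) * (xi - mx) for ti, xi in zip(t, x))
    s_ty = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    det = s_tt * s_xx - s_tx ** 2
    if det == 0:
        raise ValueError("regressors are perfectly collinear")
    b_t = (s_ty * s_xx - s_xy * s_tx) / det
    b_x = (s_xy * s_tt - s_ty * s_tx) / det
    a = my - b_t * mt - b_x * mx
    return a, b_t, b_x


# Invented data: outcome built as 100 + 20 * winner + 0.8 * baseline, no noise.
won = [1, 1, 1, 0, 0, 0]
baseline = [600.0, 620.0, 650.0, 610.0, 630.0, 640.0]
outcome = [100 + 20 * w + 0.8 * b for w, b in zip(won, baseline)]

a, b_t, b_x = ols_two_regressors(outcome, won, baseline)
print(f"intercept = {a:.3f}, lottery effect = {b_t:.3f}, baseline slope = {b_x:.3f}")
```

With a true lottery, the baseline score should be balanced across winners and losers, so a control like this mainly tightens the estimate rather than changing it.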

Summary: There are many subtle issues in any study like this one. How do you handle missing data? What about students who win the lottery but decide, for whatever reason, not to attend an HCZ school? The authors mention several of these issues, and their approach is reasonable. Still, a lot more focus on the lottery results and a lot less on the instrumental variables would have made for a stronger paper.
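On the question of lottery winners who never enroll, one textbook approach is to report the effect of being offered a slot (winners versus losers, regardless of attendance) and then scale it by the difference in enrollment rates between the two groups. Here is a minimal sketch of that calculation, assuming each student is a hypothetical (won_lottery, enrolled, score) record. The data are invented, and this is not presented as the authors' procedure.

```python
import statistics


def offer_effect_and_scaled_effect(records):
    """records: iterable of (won_lottery, enrolled, score) tuples.

    Returns (offer_effect, scaled_effect):
      offer_effect  = mean score of winners minus mean score of losers
      scaled_effect = offer_effect divided by the winner/loser gap in enrollment
    """
    records = list(records)
    winners = [r for r in records if r[0]]
    losers = [r for r in records if not r[0]]
    offer_effect = (
        statistics.mean([r[2] for r in winners])
        - statistics.mean([r[2] for r in losers])
    )
    # Share of each group that actually enrolled.
    take_up_gap = (
        statistics.mean([1.0 if r[1] else 0.0 for r in winners])
        - statistics.mean([1.0 if r[1] else 0.0 for r in losers])
    )
    if take_up_gap == 0:
        raise ValueError("winning the lottery did not change enrollment")
    return offer_effect, offer_effect / take_up_gap


# Invented records for illustration only; NOT data from the paper.
students = [
    (True, True, 680.0),
    (True, True, 670.0),
    (True, False, 640.0),  # won the lottery but did not enroll
    (True, True, 690.0),
    (False, False, 640.0),
    (False, False, 650.0),
    (False, False, 630.0),
    (False, False, 640.0),
]

offer, scaled = offer_effect_and_scaled_effect(students)
print(f"effect of the offer = {offer:.1f}, effect per enrolled student = {scaled:.1f}")
```

The first number relies only on the lottery itself. The second needs the extra assumption that winning affects scores only through enrollment.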