Monday, May 11, 2009
Via Steve Sailer and Half Sigma, we have this New York Times op-ed by David Brooks on work (pdf) by Will Dobbie and Roland Fryer on the Harlem Children's Zone (HCZ). Brooks writes:
No one else seems to have linked to (read?) the study itself. Here are the key graphics:

![Key graphics from the Dobbie and Fryer paper, including Fig 3A]()

Extremely impressive, if true. Note, however, that there is no way (that I could find) to tell from the paper just how many observations make up the blue and red dots for 8th grade mean math scores in Fig 3A. Key paragraph:
Using a lottery to randomly assign students to treatment and control groups is far and away the best method for estimating causal effects. Their second statistical strategy, instrumental variables, is much less reliable. If the authors were merely reporting some regression-based estimates, few would take the results very seriously. Teasing out causal effects from a regression is very hard. That the authors do not use a propensity score approach (at least as a check against their estimates) makes me doubt their statistical chops.

Anyway, the lottery aspect is key. To their credit, the authors are upfront in admitting that:
I think that the first year of operation refers to 2005, so the number of observations in the 8th grade loser category might be very low. Still, the authors report that "The effect of receiving a winning lottery number is generally larger for students in the 2006 cohort, though we only observe sixth and seventh grade scores for these students and so decided not to show it in our figures." So, I expect that the 8th grade numbers reported here are not a fluke.

If you really started with 1,000 5th graders, randomly assigned 500 to HCZ and 500 to their local (lousy) public schools and then saw these huge differences in math scores, you would have discovered just about the biggest causal effect in the history of education research. Have Dobbie and Fryer made that discovery? I don't know. Their write-up and tables make it very hard to understand what is going on. What is the mean difference (without any "adjustments") in 8th grade math scores between students who won the lottery and those who did not? It would certainly be useful if someone were to replicate these results.

The notes to Table 2 report that "Each regression controls for the gender, race, lunch status, and predetermined values of the dependent variable." How do you control for "predetermined values of the dependent variable" in a regression? I have no idea.

Summary: There are many subtle issues in any study like this one. How do you handle missing data? What about students who win the lottery but decide, for whatever reason, not to attend an HCZ school? The authors mention several of these issues and their approach is reasonable. Still, a lot more focus on the lottery results and a lot less on the instrumental variables would have made for a stronger paper.
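To make concrete the raw comparison asked about above, here is a minimal Python sketch of an unadjusted difference in means between lottery winners and losers, plus the usual rescaling for winners who never enroll (the Wald estimator). Everything in it is an assumption for illustration: the function names `diff_in_means` and `wald_estimate`, the toy scores, and the 80% attendance rate are all made up. None of it comes from the paper, and it is not the authors' specification.

```python
from math import sqrt
from statistics import mean, variance


def diff_in_means(winners, losers):
    """Unadjusted intent-to-treat estimate: mean(winners) - mean(losers).

    Returns (difference, standard_error), where the standard error uses
    the unequal-variance formula with sample variances.
    """
    diff = mean(winners) - mean(losers)
    se = sqrt(variance(winners) / len(winners) + variance(losers) / len(losers))
    return diff, se


def wald_estimate(itt, attend_rate_winners, attend_rate_losers):
    """Rescale the ITT by the gap in attendance rates (Wald estimator)."""
    return itt / (attend_rate_winners - attend_rate_losers)


# Hypothetical standardized math scores, NOT data from the paper.
winners = [0.9, 0.4, 1.1, 0.2, 0.7, 0.5]
losers = [0.1, -0.3, 0.4, -0.2, 0.0, 0.3]

itt, se = diff_in_means(winners, losers)

# Hypothetical attendance: 80% of winners enroll, no losers do.
effect_on_attenders = wald_estimate(itt, attend_rate_winners=0.8, attend_rate_losers=0.0)

print(f"ITT = {itt:.3f} (SE {se:.3f}); effect on attenders = {effect_on_attenders:.3f}")
```

The first number is the comparison that randomization justifies on its own, with no controls. The second answers the question about winners who do not attend, but only under the extra assumption that winning the lottery affects scores solely through attendance.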