Matt Yglesias on the enthusiasm for data mining in economics:
Betsey Stevenson and Justin Wolfers hail the way increases in computing power are opening vast new horizons of empirical economics.
I have no doubt that this is, on the whole, change for the better. But I do worry sometimes that social sciences are becoming an arena in which number crunching sometimes trumps sound analysis. Given a nice big dataset and a good computer, you can come up with any number of correlations that hold up at a 95 percent confidence interval, about 1 in 20 of which will be completely spurious. But those spurious ones might be the most interesting findings in the batch, so you end up publishing them!
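The arithmetic behind that worry is easy to check with a quick simulation. What follows is a minimal sketch under stated assumptions, not anyone's actual analysis: the sample size, the number of predictors, and the function names (`pearson_r`, `count_spurious_hits`) are all hypothetical, and the significance test uses the Fisher z approximation rather than an exact test. Every predictor is pure noise, so any "significant" correlation with the outcome is spurious by construction. Strictly speaking, what the 5 percent threshold implies is that roughly 1 in 20 tests run on noise will pass, which is why a large enough dataset always yields something publishable-looking.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_hits(n_obs=100, n_predictors=1000, seed=0):
    """Count noise predictors that pass a two-sided 5% test against a noise outcome.

    Assumption: significance is judged with the Fisher z approximation,
    z = atanh(r) * sqrt(n - 3), compared against the critical value 1.96.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        # Each predictor is generated independently of the outcome.
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(predictor, outcome)
        z = math.atanh(r) * math.sqrt(n_obs - 3)
        if abs(z) > 1.96:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 1000 pure-noise predictors look 'significant' at the 5% level")
```

With 1,000 noise predictors you should expect on the order of 50 such hits, give or take sampling variation. None of them mean anything, but each one would clear the conventional bar if reported in isolation.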
Those in genomics won’t be surprised at this caution. I think in some ways social psychology and some areas of medicine suffered from a related problem, where a massive number of studies were “mined” for confirming results. And we see this more informally all the time. In domains where I’m familiar with the literature and the distribution of ideas, it is often easy to infer exactly which Google query someone entered to fetch back the result they wanted. More worryingly, I’ve noticed the same pattern when people seek out the historian or economist who is willing to buttress their own perspective. Sometimes I know enough to see exactly how the scholars are shading their responses to satisfy their audience.
With great possibilities comes great peril. I think the era of big data is an improvement on abstruse debates about theory that can’t ultimately be resolved. But you can do a great deal of harm as well as good.