


Nevertheless, there are downsides to this process. In National Geographic, Peter Turchin says of this work: “This is a terrific data set, but they are not testing a scientific question here….” If you read the paper and the press coverage, you see lots of neat visualizations representing the patterns extracted from the data, but to a great extent they represent what we already know. Very few non-quantitative scholars would be surprised that eminent individuals tend to move from rural areas to urban ones, or that Rome and Athens were prominent magnets in 300 AD while London was in 1800 AD. There is value to be gained in formalizing this, to establish an algebra of history if you will. But this is not revolutionary; the field of cliometrics has been around for two generations. What is different is that computational methods can now be brought to bear to analyze data far more effectively. But a major temptation of this sort of cutting-edge analysis is data dredging, along with the issues that come with ascertainment bias. For example, in National Geographic the first author states:
The distance that people moved over their lifetimes has also changed “very little,” the study says, over the past eight centuries. It grew from a typical distance of 133 miles (214 kilometers) in the 14th century to 237 miles (382 kilometers) today, despite the advent of automobiles and airplanes. Schich expected that the opening of the 3,000-mile (4,828-kilometer) trip to the New World after 1492 would stretch the distance much farther.
“People in the past were not so different from us,” Schich says, noting the records include accounts of Jesuit priests who traveled to China in the 17th century. “It’s very strange to think my odds of moving a long distance are similar,” he says, with a laugh.
This is certainly a result that would surprise many, but please remember that the database is a selection of notable individuals. I would bet that the change would be far greater if you had a sampling of most of the world’s population, rather than ~150,000 extremely notable ones from the last few thousand years. Immanuel Kant aside, those people in the past who became famous often did so by migrating and getting involved in the events of the world, which entailed travel. They were atypical (consider also that every single Roman Emperor was probably functionally literate in a world where this was a minority capacity [the idea that Justin was illiterate is probably a slander]).
To make the best use of the data we need to be clear about our thinking. I do think it goes beyond just asserting that we need hypotheses, though that’s part of it. Genomics is the product of the age of big(ger) data, and it has had to deal with problems of false positives being confused for real signals because old statistical thresholds became out of date. Culturomics has a lot it could learn from the experience of biologists before 2010. With all that said, there is a body of formal theory which can move in and start to operate upon the data set. Boyd and Richerson’s The Origin and Evolution of Cultures and Cavalli-Sforza and Feldman’s Cultural Transmission and Evolution are good places to start. Recently I read Alex Mesoudi’s Cultural Evolution: How Darwinian Theory Can Explain Human Culture and Synthesize the Social Sciences, which is newer, and probably aimed at an audience that is a touch less specialist. I highly recommend it for those interested in this topic (if you have an evolutionary genetics background much of it goes fast because it is a review of basic theory).
Let me finish with a quote from The Genetic Basis of Evolutionary Change by Richard Lewontin:
For many years population genetics was an immensely rich and powerful theory with virtually no suitable facts on which to operate. It was like a complex and exquisite machine, designed to process a raw material that no one had succeeded in mining. Occasionally some unusually clever or lucky prospector would come upon a natural outcrop of high-grade ore, and part of the machinery would be started up to prove to its backers that it really would work. But for the most part the machine was left to the engineers, forever tinkering, forever making improvements, in anticipation of the day when it would be called upon to carry out full production….