Human languages show a remarkable degree of variation in the area they cover. However, the factors governing the distribution of human cultural groups such as languages are not well understood. While previous studies have examined the role of a number of environmental variables the importance of cultural factors has not been systematically addressed. Here we use a geographical information system (GIS) to integrate information about languages with environmental, ecological, and ethnographic data to test a number of hypotheses that have been proposed to explain the global distribution of languages. We show that the degree of political complexity and type of subsistence strategy exhibited by societies are important predictors of the area covered by a language. Political complexity is also strongly associated with the latitudinal gradient in language area, whereas subsistence strategy is not. We argue that a process of cultural group selection favoring more complex societies may have been important in shaping the present-day global distribution of language diversity.
Here’s the map from a figure which shows linguistic diversity, with darker areas being more diverse. If you read the authors’ paper you will note that their model explains 55% of the variance in linguistic diversity. That’s the important point; qualitatively, it is obvious that politically complex entities (or at least those which scale) precede the spread of their lingua franca. The spreads of Chinese, Latin and Arabic are three classic examples for which we have a lot of historical data. The extant Classical sources make it clear that the Roman world was peppered with a plethora of exotic dialects, only a few of which were ever recorded, since most were not written languages. Remember that languages like Finnish were oral “peasant tongues” until the past few centuries. The same is obviously true for Chinese, though I have read that the dialects of southeast China still exhibit traces of their pre-Chinese substratum.
Obviously the spread of languages along with political systems is no great revelation. Rather, I think it is important to note that there are likely other dynamics at work. Geneticists such as Marcus Feldman have suggested that the similarities between genetic and linguistic cladograms which Cavalli-Sforza noted decades ago are probably due to the fact that marriage markets extend only out to those who speak the same language. In other words, the spread of languages like Latin and Arabic obscures older genetic-linguistic structures, which are still visible in many societies where super-languages did not supersede the local dialect. A final issue that I think needs to be brought up is the contrast between the somewhat artificial lines on the map between closely related languages (e.g., Dutch-German)* and the real chasms of unintelligibility between unrelated languages (e.g., Finnish-Swedish). I am curious whether, as genetic maps become more fine-grained, particular language-related patterns will show up in the changes in allele frequencies.
* Artificial because the codification of a standard dialect as the language, e.g., Florentine to Italian, ignores the historical continuity of dialects.