Birth Months of World Cup Players


Four years ago I (and others) got into a dispute with Stephen Dubner and Steven Levitt (DL) about their claim that birth month has a meaningful impact on later success in professional sports. Their argument: kids born in January are “old” for their age group and are therefore more likely to make elite travel teams, national teams for U-17 tournaments, and the like. These athletes then benefit from better coaching and competition than kids of the same natural ability who were born later in the same calendar year. DL predicted that

If you were to examine the birth certificates of every soccer player in next month’s [2006] World Cup tournament, you would most likely find a noteworthy quirk: elite soccer players are more likely to have been born in the earlier months of the year than in the later months.


But this turned out not to be true. Yet the worst part was that Levitt, through some peculiar post-hoc reasoning, tried to claim that it was, in fact, true, that birth month mattered.

The larger debate here goes back to nature versus nurture. DL (and/or their readers) and Malcolm Gladwell (and/or his readers) want to believe that stars are made, not born. Spend enough on your kid’s coaching and he too can play in the World Cup or the NHL! The GNXP perspective would probably be that, while training obviously matters, the more incentives there are for high performance and the more open/democratic the pipeline, the more that genetics matter.

Luge stars are made but not born because so few kids have access to a luge track. Soccer stars are born but not made because millions of children around the world have extensive exposure to soccer. If they have the genetics to star, the system will find them and cultivate their talents. Without the genetics, they have no chance. The more equal the opportunities, the more that genetics matter. This is not the story that your typical American soccer parent wants to read about in the New York Times or The New Yorker.

But enough random speculation! Let’s look at the data (text file hand-collected by me from the FIFA site) for the current World Cup. I will add some code later in the comments, but here is the key chart.

If you squint, you can try to claim that players are less likely to be born later in the year than earlier. But a chi-square test of randomness gives a p-value of 16%, and a binomial test of the 72 players born in January versus the expected value of about 61 (given a total player population of 736) yields a p-value of 10%. In other words, there is little or no evidence that birth month matters to your chances of playing in the World Cup.
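The January comparison can be rechecked from the two numbers quoted above (72 January births out of 736 players). The R sketch below is only a rough check: the uniform null of 1/12 per month matches the stated expected value of about 61, while the day-weighted null (31 days out of an average 365.25-day year) is an assumption added here for comparison, and the variable names are illustrative.

```r
# Counts quoted in the post: 72 January births among 736 players.
n_players <- 736
n_january <- 72

# Expected January count if every month were equally likely.
expected_uniform <- n_players / 12

# Assumption (not in the post): weight January by its share of days,
# 31 out of an average 365.25-day year.
p_january_days <- 31 / 365.25
expected_days <- n_players * p_january_days

# Two-sided exact binomial tests under each null hypothesis.
test_uniform <- binom.test(x = n_january, n = n_players, p = 1 / 12)
test_days <- binom.test(x = n_january, n = n_players, p = p_january_days)

print(round(c(uniform = expected_uniform, day_weighted = expected_days), 1))
print(c(uniform = test_uniform$p.value, day_weighted = test_days$p.value))
```

Weighting by days moves the expected January count closer to the observed 72, so it can only weaken the case for a January effect.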

And, to the extent that you think it does (and looking at the first three months of the year does pop up as statistically significant), the much more likely explanation (exercise left for the reader) is not age cut-offs in wealthy countries with extensive junior soccer programs but birth date fraud in poor countries seeking an advantage in international U-17 and U-20 competitions.

11 Comments

  1. Here is the code for replicating the above analysis:

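    # Load the hand-collected player data (one row per player).
    x = read.csv("players.txt")
    x$dob = as.Date(x$Date.of.Birth, format = "%d/%m/%Y")
    # months() returns full month names (English locale assumed);
    # order them January..December.
    x$birth.month = ordered(months(x$dob), levels = month.name)

    # Plot the number of players born in each month.
    jpeg()
    plot(table(x$birth.month), main = "Birth Month of World Cup Players", xlab = "Month", ylab = "Number of Players", axes = FALSE)
    axis(1, at = 1:12, labels = month.abb)
    axis(2, at = seq(10, 80, 10), labels = seq(10, 80, 10))
    dev.off()

    # Chi-square test against equal counts in every month.
    chisq.test(table(x$birth.month))

    # Exact binomial test: 72 January births out of 736 players.
    binom.test(x = 72, n = 736, p = 1/12)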

  2. Does this calculation take into account the fact that more babies are born in some months and not only because some months have several more days than others?

  3. The ‘expected value’ of 61 looks suspiciously like 736 just divided by 12!

  4. JL: No.

    DavidB: Yes. I am not bothering with the details of some months having different numbers of days, with leap years and so on. None of that arcana would make any difference to the larger point. (Figuring out precisely the correct way to handle leap years is an interesting statistical question . . .)

  5. the much more likely explanation … birth date fraud in poor countries seeking an advantage in international U-17 and U-20 competitions

    Am I missing something? Wouldn’t that work in the opposite direction, with kids born in the early months being made a few months younger?

  6. A kid born in December would have his birthday moved to January (a month later) so that he was eligible for the U-17 tournament, or whatever. You make older players appear younger by moving their birthday later in time, especially from one side of the cut-off to the other.

  7. Perhaps I betray some ignorance about soccer leagues in respective countries, but in Canada the January cutoff is used. What cutoffs are used in other countries? If there are different cutoffs in different systems, you’ve made an ecological fallacy by grouping them all into the same category.

    The binomial seems an appropriate way to tackle the problem. P value in test for trend of 16%? You mean P value of 0.16, presumably. Logistic regression could also tell you the significance of birth month controlling for cutoff date and fraction of the total population born each month, but presumably such data would be hard to come by.

  8. As Ryan says, there are different cut-off points in different countries; in the UK, for instance, it would be September. There was a study of English players done a few years ago that showed they were far more likely to be born in September than in July, although I can’t find it right now. This, though, makes reference to the same trend:

    “Among the 25 most capped England football players, 11 were born between September and November, while only one, Frank Lampard, was born between June and August.”

    http://tinyurl.com/32m5bh6

    Which bears out Levitt’s idea imho.

  9. The main point of this post is that Levitt is wrong about the birth month distribution among World Cup players. Once we accept that, we can explore the reasons why he is wrong.

    One possible reason is that different countries have different cut off dates. Perhaps. Even within a single country, multiple cut off dates may be used because different leagues have different rules. (This is certainly true in the US.)

    But, for me, the more parsimonious explanation is that birth month does not matter. Until anyone provides any evidence that it does, for adult professionals, not 13 year olds, then I will stick with Occam.

  10. What if you were to run a simple Pearson coefficient? In this case there is a -0.57 correlation between the number of players born in a certain month and the numerical value of the month. For hockey, the correlation is -0.86. That is, by this measure Levitt’s conclusion appears to hold.

    Of course, I could be calculating things wrong or there may be a reason why looking at it this way isn’t appropriate.
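
The Pearson calculation described in the last comment can be sketched in R as follows. `month_trend` is a hypothetical helper name, and the counts passed to it here are toy numbers for illustration only, not the World Cup or hockey data.

```r
# Hypothetical helper: Pearson correlation between the month index (1..12)
# and the number of players born in each month.
month_trend <- function(counts) {
  stopifnot(length(counts) == 12)
  cor.test(seq_len(12), as.numeric(counts), method = "pearson")
}

# Toy counts for illustration only; these are NOT the World Cup data.
toy_counts <- c(12, 11, 12, 10, 9, 10, 8, 9, 7, 6, 7, 5)
result <- month_trend(toy_counts)

print(result$estimate)  # Pearson r between month index and count
print(result$p.value)   # test of r = 0, based on only 12 data points
```

With the data frame from the replication code, the call would be `month_trend(table(x$birth.month))`. Because the correlation is computed from just 12 monthly totals, it has 10 degrees of freedom and is a much coarser summary than the chi-square or binomial tests on all 736 players.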
