Clarification on “roots”


Jim Bender links to my post where I express some frustration with tree-based thinking in terms of human ancestry. He says, “I take that to mean that there is a great deal of mixing between the so-called races over time” [my emphasis]. Terms like “great deal” are insufficiently precise. An Fst of 0.2 is often used as a boundary between low and moderate levels of population substructure, but it is somewhat arbitrary (1 migrant per generation yields 0.2). My overall point was that beyond a few generations in the past our intuitively grounded concepts start to deviate from how gene genealogies work. 50 generations into the past I have an x number of distinct ancestors, but a y number of these ancestors show up many times at the tips of branches. In talking about populations the easiest citations are mtDNA and NRY, the female and male lineages. That causes problems in adding empirical support to overall assertions derived from an understanding of the details of how the tree plays out as you progress back in time, because mtDNA and NRY aren’t always a good match (concordance) with the rest of the genome.
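
The parenthetical figure above matches Wright's island-model approximation, Fst ≈ 1/(4Nm + 1), where Nm is the number of migrants per generation. Here is a minimal sketch, assuming that idealized model (equilibrium, neutral loci, many equal-sized subpopulations); the function name is made up for illustration, and real populations will not follow this exactly.

```python
def island_model_fst(migrants_per_generation: float) -> float:
    """Equilibrium Fst under Wright's island model: 1 / (4 * Nm + 1).

    Assumption: idealized island model at migration-drift equilibrium.
    """
    if migrants_per_generation < 0:
        raise ValueError("migrants_per_generation must be non-negative")
    return 1.0 / (4.0 * migrants_per_generation + 1.0)


# Nm = 1 gives 1 / 5 = 0.2, the boundary value mentioned in the post.
for nm in (0.25, 1, 2, 10):
    print(f"Nm = {nm}: Fst ~ {island_model_fst(nm):.3f}")
```

Because Fst falls off as 1/(4Nm + 1), even a handful of migrants per generation pushes it well below 0.2, which is one reason the cutoff is somewhat arbitrary.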


  1. “50 generations into the past I have an x number of distinct ancestors, but a y number of these ancestors show up many times at the tips of branches” 
    I suspect that many people would have surprisingly few X’s (uniques) making up most of the branches. 50 generations is a lot, but at ten to twenty generations we may find a villager somewhere whose entire ancestry comes from one village, possibly from the entire population of the fecund part of the village at the time. (Or it could well be that the ancestry is from a smaller part of the village, class dependent.) 
    Of course there is the ancestor paradox, which is not really a paradox at all. There are fewer ancestors than slots.

  2. eoin, the most conservative estimates i’ve seen of the MRCA (most recent common ancestor) of all humans put it 5,000 years ago, or about 200 generations. the more near term estimates give about 40. of course, you make assumptions, yada, yada, yada, but yeah, 50 was a LONG time for me to spit out.

  3. I know some branches of my own family tree going back 8 generations. In my case, there are no known ancestors appearing multiple times, but at 8 generations it’s starting to become statistically likely that there should be some: 
    N generations back there are 2**N slots for ancestors. 
    Before the Industrial Revolution, people tended to marry people from their own or the next village. So the pool of possible ancestors should be growing like N**2 – possibly less, because population density may be lower at earlier times. 
    N**2/2**N becomes small very rapidly! 
    A slightly more accurate model might take into account that there is some (low) proportion of marriages across large distances – so it’s a “small world” metric. 
    I suspect that it’s not realistic to take the period from the English Civil War to the Industrial Revolution (exclusive of end-points) as being statistically typical if you want to model very long time periods. (And this is the period for which “partner from same or next village” applies.) So a more refined model might take into account a low frequency of events that cause major population movement.

  4. Razib mentions F_st and other measures that compare within-population variability to between-population variability. 
    Now that we’ve got hapmap, is it possible to calculate these for some human populations? 
    - This would only tell us about the regions sampled for hapmap, not humans in general 
    - Hapmap’s sampling methodology might lead us to underestimate the within-population variability. People from the Tokyo metropolitan area whose grandparents are all from Japan might be less genetically diverse than all people in that area. (And for medical applications, the population you care about is everyone who seeks medical treatment).

  5. is it possible to calculate these for some human populations? 
    well, F_st has been around for a while on the level of genes. that’s where the famous “85% of variation is within races vs. 15% between races” comes from.
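
The slots-versus-pool argument in comment 3 (2**N ancestor slots against a pool growing like N**2) can be sketched numerically. This is only an illustration: the scale constant `pool_scale` (people reachable per unit of N**2) is a made-up parameter, not an estimate, and the function names are hypothetical. The result is a pigeonhole bound, so repeated ancestors become likely somewhat earlier than the generation it reports.

```python
def ancestor_slots(n: int) -> int:
    """Number of ancestor slots N generations back: 2**N."""
    return 2 ** n


def ancestor_pool(n: int, pool_scale: float) -> float:
    """Assumed pool of possible ancestors, growing like pool_scale * N**2."""
    return pool_scale * n ** 2


def first_forced_repeat(pool_scale: float, max_generations: int = 200):
    """First generation where slots exceed the pool, so some ancestor
    must fill more than one slot (pigeonhole). None if never reached."""
    for n in range(1, max_generations + 1):
        if ancestor_slots(n) > ancestor_pool(n, pool_scale):
            return n
    return None


# pool_scale = 100 is an arbitrary illustrative value.
for n in (4, 8, 12, 16, 20):
    ratio = ancestor_pool(n, 100) / ancestor_slots(n)
    print(f"N = {n}: slots = {ancestor_slots(n)}, pool/slots = {ratio:.4f}")

print("repeats forced from generation", first_forced_repeat(100))
```

Changing `pool_scale` by a factor of ten shifts the crossover by only a few generations, because the exponential 2**N quickly outruns any N**2 pool.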