
First, you have trees. These are pretty popular for macroevolutionary relationships, but on the population genetic scale (intraspecific, microevolutionary) you’re mostly talking about representing distances between groups in a tree format. You saw this in The History and Geography of Human Genes, where genetic distances in the form of Fst values (the proportion of genetic variation that lies between two groups rather than within them) were used as distance inputs.
A problem with trees is that they don’t model gene flow, a major dynamic on a microevolutionary scale. Also, complex relationships can get elided in tree frameworks, and as you add more and more populations you often end up with an incomprehensible fan-like topology.
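To make the "distances in, tree out" idea concrete, here is a minimal sketch. The allele frequencies and population names are made up, the Fst estimator is a simplified Hudson-style ratio that ignores sample-size corrections, and UPGMA is used only as a simple stand-in for whatever tree-building method a given study actually applies.

```python
from itertools import combinations

def hudson_fst(p1, p2):
    """Ratio-of-averages Fst from two populations' allele frequencies.

    Simplification: frequencies are treated as exact, so the usual
    sample-size corrections are left out.
    """
    between = sum((a - b) ** 2 for a, b in zip(p1, p2))
    total = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return between / total

def upgma(names, dist):
    """Build a tree (nested tuples) by repeatedly merging the closest pair."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del d[(i, j)]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical allele frequencies at five SNPs for four made-up populations.
freqs = {
    "A": [0.10, 0.20, 0.80, 0.50, 0.30],
    "B": [0.12, 0.25, 0.75, 0.55, 0.28],
    "C": [0.60, 0.70, 0.20, 0.10, 0.90],
    "D": [0.65, 0.68, 0.25, 0.15, 0.85],
}
names = list(freqs)
fst = [[0.0] * len(names) for _ in names]
for i, j in combinations(range(len(names)), 2):
    fst[i][j] = fst[j][i] = hudson_fst(freqs[names[i]], freqs[names[j]])

tree = upgma(names, fst)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

Note that nothing in this procedure can express gene flow between A and C, say. Any exchange between them just shrinks their pairwise distance and distorts the tree.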

The problem with PCA is that different types of dynamics can lead to the same result. For example, someone who is an F1 of two distinct groups occupies the same position as an individual from a population which happens to occupy a genetic position between those two groups. Additionally, constraining the variation to two dimensions can give a misleading picture of relationships. There are many dimensions, but operationally you focus on two at a time.
A paper of interest: Population Structure and Eigenanalysis.
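The F1 ambiguity is easy to reproduce with a toy simulation. The sketch below is pure Python under loud assumptions: unlinked SNPs, made-up allele frequencies, a hypothetical "intermediate" population whose frequencies sit exactly halfway between the two sources, and a bare-bones power iteration in place of a real PCA package.

```python
import random

def simulate(freqs, n, rng):
    """Draw n diploid genotypes (0, 1 or 2 allele copies per SNP)."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]

def top_pc(rows, iters=200):
    """Per-SNP means and the leading principal axis (power iteration)."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    # Sample-by-sample Gram matrix; its top eigenvector gives PC1 scores.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    u = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        u = [sum(g * x for g, x in zip(row, u)) for row in gram]
        norm = sum(x * x for x in u) ** 0.5
        u = [x / norm for x in u]
    # Convert sample scores into a unit-length axis over SNPs.
    axis = [sum(u[i] * centered[i][j] for i in range(n)) for j in range(m)]
    norm = sum(x * x for x in axis) ** 0.5
    return mean, [x / norm for x in axis]

def project(genotype, mean, axis):
    """Position of one genotype vector along the axis."""
    return sum((g - mu) * a for g, mu, a in zip(genotype, mean, axis))

rng = random.Random(1)
n_snps = 400
freq_a = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
freq_b = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
freq_mid = [(a + b) / 2 for a, b in zip(freq_a, freq_b)]

pop_a = simulate(freq_a, 20, rng)
pop_b = simulate(freq_b, 20, rng)
mean, axis = top_pc(pop_a + pop_b)

# F1: one allele drawn from population A, the other from population B.
f1 = [(rng.random() < a) + (rng.random() < b)
      for a, b in zip(freq_a, freq_b)]
# One individual from the hypothetical stabilized intermediate population.
cline = simulate(freq_mid, 1, rng)[0]

pc1_a = sum(project(g, mean, axis) for g in pop_a) / len(pop_a)
pc1_b = sum(project(g, mean, axis) for g in pop_b) / len(pop_b)
pc1_f1 = project(f1, mean, axis)
pc1_cline = project(cline, mean, axis)
print(pc1_a, pc1_b, pc1_f1, pc1_cline)
```

The two source clusters land at opposite ends of PC1, while the F1 and the intermediate-population individual both fall near the middle. From the plot alone you can't tell which is which.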
Next you have model-based clustering, introduced in Jonathan Pritchard’s Inference of Population Structure Using Multilocus Genotype Data. There are many flavors of this, but they operate under the same framework. You have a model of population dynamics, and you see how well the genotype data can be explained by the parameters of that model. Of particular interest is the assignment of each individual’s ancestry to K populations, which can be combined in different proportions to explain the variation in the data.
Unlike PCA, these model-based methods are rather good at identifying people who are first-generation mixes, as opposed to those from stabilized groups along a cline. But they also produce artifacts, because they are quite sensitive to the input data, and they lend themselves to cherry-picking.
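As a rough illustration of what "explaining genotypes by model parameters" means, here is a deliberately stripped-down sketch with K = 2. It assumes the allele frequencies of both source populations are already known (a supervised setting), that SNPs are unlinked, and it finds a single individual's ancestry fraction q by brute-force grid search. The real programs estimate frequencies and ancestry fractions jointly and use far better optimizers or MCMC. All the numbers are simulated.

```python
import math
import random

def log_likelihood(q, genotypes, freq_1, freq_2):
    """Log-likelihood of ancestry fraction q from population 1 (K = 2)."""
    total = 0.0
    for g, p1, p2 in zip(genotypes, freq_1, freq_2):
        # Expected allele frequency for this individual at this SNP.
        f = q * p1 + (1 - q) * p2
        # Binomial(2, f) log-probability, dropping the constant term.
        total += g * math.log(f) + (2 - g) * math.log(1 - f)
    return total

def estimate_q(genotypes, freq_1, freq_2, steps=100):
    """Grid search for the q in [0, 1] that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid,
               key=lambda q: log_likelihood(q, genotypes, freq_1, freq_2))

rng = random.Random(7)
n_snps = 2000
# Hypothetical source-population allele frequencies, assumed known.
freq_1 = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
freq_2 = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]

# Simulate one individual with 25% ancestry from population 1.
true_q = 0.25
genotypes = []
for p1, p2 in zip(freq_1, freq_2):
    f = true_q * p1 + (1 - true_q) * p2
    genotypes.append((rng.random() < f) + (rng.random() < f))

q_hat = estimate_q(genotypes, freq_1, freq_2)
print(q_hat)
```

The sensitivity to input data mentioned above shows up right here: change which populations supply freq_1 and freq_2, and the same individual gets a different q.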

All visualizations are deformations of reality. TreeMix attempts to mitigate this somewhat by introducing another representation, that of migration.

Related to, but not exactly the same as, local ancestry methods are haplotype-based methods. In particular, I’m thinking of FineStructure and its related methods. These leverage variation across the genome in terms of haplotypes, rather than just looking at genotypes. They also tend to benefit from phasing, for obvious reasons. FineStructure and its relatives tend to need more marker density than model-based methods, which require more marker density than PCA, which requires more marker density than tree-based methods. These haplotype-based methods let you correct and account for forces such as genetic drift, which tend to skew results in other methods.
Finally, there is the AdmixTools framework, which is good for testing very explicit hypotheses. While many of the above methods, such as TreeMix and unsupervised model-based clustering, explore an almost open-ended space of structure possibilities, the methods in AdmixTools exist in large part to test narrow, delimited models. This goes to the fact that many of these methods are complementary, and you should use them together to arrive at a robust result. For example, if you are assigning populations for TreeMix, you should use PCA and model-based clustering to make sure that the populations are clear and distinct, and that outliers are removed.
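To give a flavor of what a narrow, explicit test looks like, here is a sketch of the idea behind the three-population (f3) statistic. It is computed from made-up allele frequencies and is not the AdmixTools implementation: the real qp3Pop program also corrects for finite sample size and uses a block jackknife to get standard errors, and both are omitted here.

```python
def f3(target, source_1, source_2):
    """Mean of (c - a) * (c - b) across SNPs, from raw allele frequencies."""
    terms = [(c - a) * (c - b)
             for c, a, b in zip(target, source_1, source_2)]
    return sum(terms) / len(terms)

# Hypothetical allele frequencies at five SNPs.
pop_a = [0.10, 0.20, 0.80, 0.50, 0.30]
pop_c = [0.60, 0.70, 0.20, 0.10, 0.90]
pop_d = [0.65, 0.68, 0.25, 0.15, 0.85]

# A made-up population formed as an even mix of A and C.
mixed = [0.5 * a + 0.5 * c for a, c in zip(pop_a, pop_c)]

f3_mixed = f3(mixed, pop_a, pop_c)    # target really is a mix of the sources
f3_unmixed = f3(pop_a, pop_c, pop_d)  # target is not a mix of the sources
print(f3_mixed, f3_unmixed)
```

A negative value is evidence that the target descends from a mixture of populations related to the two sources. A positive value does not rule admixture out, since drift in the target after mixing can push the statistic back above zero.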
There’s a lot I left out, but many of the other methods are just twists on the ones above.

