10 Things About Roman History You Should Know

Since the earlier “10 Things” was quite popular, I thought I’d try my hand at another one on a topic I know rather well. This involves Roman history. Unfortunately, history is a less clear and distinct topic than evolutionary biology, so there may be some disagreement with the assertions below.

But here we go….

1) Constantine did not make Christianity the official religion of the Roman Empire. The Roman Empire did not have an established religion, at that point, in any way we could understand today. Rather, there were customary subsidies given to traditional cults, and favor shown to particular religions by particular emperors. The subsidies from the state coffers to pagan cults were cut off more than two generations after Constantine.

2) By the late Republic most of the “noble” families of Roman society were plebeian, rather than patrician, in origin. They were defined by their wealth, power, and achievements, as opposed to their blood. There were still powerful patrician lineages, such as the Julii and Claudii, but they no longer held a monopoly on the public square (Julius Caesar may have been from an old patrician line, but his mother was a Cotta, who were plebeians).

3) Most of the emperors who were “not Roman,” were thoroughly Roman. Septimius Severus, the “African emperor,” born in Libya, did come from a paternal lineage of Punic (so Phoenician) origin. But his mother descended from Italian colonists in North Africa. He was culturally a man of the Latin West.

4) At the elite level Roman culture was to some extent dual-culture, with many Latin elites cultivating aspects of Greek culture and learning. But Western (Latin) and Eastern (which usually meant Greek or Hellenized non-Greek) societies remained sharply differentiated in many ways. The first emperor who may have spoken Greek as his first language, Anastasius, reigned at the end of the 5th century. Greeks dominated philosophy, while Latins dominated rhetoric.

5) Though Latin political control collapsed in Italy in 476, the cultural and economic destruction of the Italian peninsula occurred during the East Roman reconquista of the 6th century.

6) The forms of Republican Rome persisted for centuries during the imperial period. The transformation of Roman Emperors into purely naked autocrats did not occur until after the chaos of the middle 3rd century.

7) Speaking of which, the Roman system almost collapsed during the “Crisis of the Third Century”.

8) The early “bad emperors,” such as Nero or Caligula, often caused problems for the Roman elites. But the overall institutional system persisted and was minimally impacted. In contrast, Julius Caesar would almost certainly be judged to have committed genocide in Gaul were he judged by modern standards.

9) Most of the expenditure of the Roman state went to the military.

10) Romans arguably invented Western bureaucracy. Though the Roman state was incredibly understaffed by modern standards, one consequence of the Western Empire’s fall was the collapse of tax collection in specie as opposed to kind or service.

When the gods come crashing down

Sometimes the old gods slowly fade into oblivion. Contrary to popular perception this seems likely the case for ancient paganism. The conversion of Constantine to the Christian religion began the process of a hand-off of the commanding heights of classical culture that took over a century to complete. There were punctuating moments, such as the apostasy of Julian in the 360s, or the mostly symbolic ban on public paganism by Theodosius in the 390s (the Serapeum was destroyed by a vigilante mob). But pagans in the form of the Neoplatonic school persisted into the 6th century, while elite pagans such as Marcellinus maintained power and influence deep into the second half of the 5th century.

Call this “normal” cultural evolution. Antiquity evolved from being predominantly pagan to predominantly Christian (though a small cultured pagan minority persisted even until the Islamic conquest in the Near East, such as the Sabians of Haran).

The Reformation period was different. In a single generation one thousand years of a coherent and unified Western Christian ideology collapsed, and was replaced by something very different.

Note here that I said Western Christian ideology. The reality is that Western Christianity was never as unified or coherent as Western Christians themselves envisaged themselves to be (or aspired to be). There were episodes of hostility between particular kingdoms and the Roman papacy. There were heresies such as that of the Cathars, and popular revolts with a religious tinge such as that of the Hussites. And finally, there were periods of multiple popes, which undermined the credibility of the institution of the Church in the medieval period.

But all this pales next to the magnitude and scope of the revolt against the establishment of the Western Christian church that occurred in the 1520s. Martin Luther went from being a Christian cleric within the established Church to declaring the pope the Antichrist! Previously devout peasants in Switzerland turned on the relics and churches which they had only recently venerated, and engaged in mob iconoclasm. Whereas monarchs, such as Henry IV, ultimately compromised with the clerical estate (or, submitted), Henry VIII of England managed to destroy or subordinate the institutions of the church to his own will and pleasure.

There are many theories for why the Reformation occurred when it did. Some of them are rooted in technology, in particular the printing press. Others point to the development of proto-national identities, such as the rise of German nationalism and its leveraging by Luther against his “Roman” persecutors.

These specific issues are not interesting to me. Rather, what they point out to us is that there can be cultural revolutions that occur very rapidly. One can point to the pacific post-World War II Japanese, and contrast them with the militaristic Japanese of the first half of the 20th century. Or the shift of Russia from being a conservative autocracy in the 1910s to a revolutionary society in the 1920s. But these are modern events, and moderns are liable to suggest that our own epoch is sui generis in these sorts of turnovers of values. But the Reformation shows that revolutionary changes in whole societies can occur rather rapidly even in a pre-modern context.

In other words, cultural revolution is not a derived characteristic of our species, but perhaps a very old one. The rapid expansion of the Austronesians. Or the radiation of non-African humanity. These come out of a vacuum, a cultural-demographic analog to the inflationary universe. But given enough time perhaps our species is simply subject to these sorts of explosions of creative change and innovation.

List of top 10 evolutionary biologists in history

What is your list of the top 10 evolutionary biologists in history? I’m asking because this came up in a discussion with a friend. Obviously the composition of the list will have to do with disciplinary bias and geography and history (there are Russian population geneticists from the 20th century who should be more famous who aren’t).

Here are my top 10 (with two minutes thought given):

1. Charles Darwin & Alfred Russel Wallace (I’m combining these two)
2. R. A. Fisher
3. Sewall Wright
4. J. B. S. Haldane
5. W. D. Hamilton
6. G. G. Simpson
7. John Maynard Smith
8. August Weismann
9. Motoo Kimura
10. Theodosius Dobzhansky

What’s your list? (in the comments)

How Indians are a lot like Latin Americans


Pretty much any person of Indian subcontinental origin in the United States of a certain age who isn’t very dark skinned has probably had the experience of being spoken to in Spanish at some point. When I was younger growing up in Oregon I had the experience multiple times of Spanish speakers, probably Mexican, pleading with me to interpret for them because there was no one else who seemed likely. It isn’t a genius insight to conclude I was most likely South Asian…but it wasn’t out of the question I was Mexican. This applies even more to lighter skinned South Asians. In the Central Valley of California, where there are many Sikhs from Punjab and Mexicans, this confusion occurred a lot for some Indian kids.

Of course biogeographically there isn’t that much connection between South Asia and the New World. But it isn’t crazy that Christopher Columbus labelled the peoples of the New World “Indian.” After all, they were a brown-skinned people whose features were not African, East Asian, or West Eurasian. And, it turns out genetically there is a coincidence that connects the New World and South Asia: the mixed peoples of Latin America with Amerindian and European ancestry recapitulate an admixture which resembles what occurred in South Asia thousands of years ago. It looks as if about half the ancestry of South Asians is West Eurasian and half something more like eastern Eurasians.

On principal component analysis that means that South Asian and Mexican and Peruvian samples often overlap. This is somewhat curious because the non-West Eurasian ancestors of South Asians and Amerindians diverged in ancestry on the order of 25 to 45 thousand years before the present. And the Iberian ancestry of the mixed people of the New World is almost as far from the character of South Asian West Eurasian ancestry as you can get (in the parlance of this blog, lots of EEF, less CHG, not too much ANE).

A new paper, A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals, highlights another similarity: massive bias in biogeographic ancestry by sex. More precisely, the rank order of West Eurasian ancestry in South Asia is skewed like so: Y chromosome > whole-genome > mtDNA (as is evident in the above figure).

I actually began writing about this in the late 2000s, when the fact that South Asian mtDNA was very different from West Eurasian mtDNA, and South Asian Y chromosome was mostly West Eurasian, was obvious. Then work using genome-wide data sets began to point to massive intra-Eurasian admixture between very diverged lineages. The paper is not revolutionary, but worth reading for its thoroughness and how it brings together all the lines of evidence.

Finally, no ancient DNA. That’s probably for the future, but I don’t expect any surprises.

Citation: A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals.

It doesn’t get better, blogging vs. YouTube and Twitter

Many of you know I use Twitter. It’s replaced a lot of the “link posts” I might have done in the early 2000s or so. Some have argued that Twitter cannibalized a lot of blogging, and that seems true. And that hasn’t always been for the good…there are some arguments and discussions which don’t work well on Twitter. There have been many Twitter misunderstandings which simply wouldn’t have happened in the blogging format, because the artificiality of Twitter strips context.

Until recently I didn’t much pay attention to YouTube except for movie trailers and Game of Thrones stuff. Oh, and How it Should Have Ended. Any other YouTube I probably just found via a share on Facebook and Twitter.

But of late I have been watching some YouTube channels. I was prompted partly by the fact that after the hit piece came out on me about my incredible influence on the alt-right someone emailed me to explain that in fact the most influential people on the alt-right were on YouTube, where they spread interpretations of genetics congenial to their racialist worldview. Honestly I didn’t watch these channels for very long because:

1) I don’t need genetics lectures.

2) I don’t need primers on Western history.

3) I am not concerned about white genocide, I am white genocide.

Rather, I found a channel called The Rubin Report, which had come recommended to me by my friend Sarah Haider. I agreed with the host, Dave Rubin, on most issues, and often disagreed with his guests. It made for reasonably compelling listening (I rarely watch really, but treat this stuff like a podcast). He also introduced me to a lot of different vloggers. Among the people he interviewed was someone called Roaming Millennial, an early 20s Eurasian Canadian woman with broadly center-right/classical liberal views.

I don’t mean to spotlight her, but her channel illustrates three facts:

1) Relatively short, pithy, commentaries.

2) A huge number of views.

3) Many of these vloggers are “TV-friendly” in their appearance.

Comparing traffic can be hard across years and platforms, so I’ll focus on the first and last issue. When it comes to the early generation of bloggers there are plenty who became famous on pithy quick links. But there were also long-form essayists and commenters. To give one example, Cosma Shalizi’s posts on IQ were extensively linked for many years because of their thoroughness and depth (obviously few people read everything or understood much, but the posts were there, and many at least skimmed a fair amount).

These sorts of discursive commentaries are not really possible on YouTube. From what I can tell when vloggers allow themselves to go more than 20 minutes on a single topic they start to ramble, repeat themselves, and get boring. You can’t engage in extemporaneous speaking for too long and sound like you have your shit together. The data density of blogging is potentially much higher than vlogging.

The third issue…. Many bloggers had a face for radio, and a voice for silent film. The extremely popular liberal blogger Steve Gilliard was morbidly obese, and died of illnesses related to his weight issues. But his appearance was not a big deal when he began blogging. Many of the early bloggers concealed many details of their private life, let alone their image. Similarly, a lion of the warblogging cohort, Steve Den Beste, looked to be out of central casting for “middle aged software/anime nerd.” But Den Beste became hugely influential before his retirement from blogging, which was partly triggered by health issues.

Obviously things aren’t that different. There was television in the 2000s. And many webforums existed which had a Twitter-like feel. But they are different nevertheless. Someone like Roaming Millennial could have made it on TV, but there are only so many spots for non-blondes at Fox, and in any case she speaks at a higher level of analysis than what you see in talking heads. There are many more of these vloggers than there would ever have been slots on television. This is a whole new information universe, and it’s different.

At the end of the day it makes me appreciate text, and blogging. There are newer technologies, but they aren’t better.

Adaptation is ancient: the story of Duffy

Anyone with a passing familiarity with human population genetics will know of the Duffy system, and the fact that there is a huge difference between Sub-Saharan Africans and other populations on this locus. Specifically, the classical Duffy allele exhibits a nearly disjoint distribution from Africa to non-Africa. It was naturally one of the illustrations in The Genetics of Human Populations, a classic textbook from the 1960s.

Today we know a lot more about human variation. On most alleles we don’t see such sharp distinctions. Almost certainly the detection of these very differentiated alleles early on in human genetics was partly a function of selection bias. The methods, techniques, and samples, were underpowered and limited, so only the largest differences would be visible. Today we often use single base pair variations, single nucleotide polymorphisms, and the frequency differences are much more modest on average. Ergo, the reality that only a minority of genetic variation is partitioned across geographic races.

Why is Duffy different? Obviously it could be random. Assuming you have a polymorphism, you’ll get a range of frequencies across populations, and in some cases those frequencies will map onto different geographic zones just by chance. Imagine constant mutation, and highly structured bottlenecks. You could get a sequence of derived mutations fixing in populations one after the other, just by chance.

This is probably not the case with Duffy. I’ll quote from Wikipedia:

The Duffy antigen is located on the surface of red blood cells, and is named after the patient in which it was discovered. The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The protein is also the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system.

Malaria is one of the strongest selection pressures known to humanity. The balancing selection which results in sickle-cell disease is well known even among the general public. But the likely selection pressures due to the vivax variety are less commonly talked about, partly because they don’t as a side-effect induce a serious disease. Duffy may be canonical if you are a human population geneticist, but it is of less interest more generally.

But a recent paper in PLOS GENETICS shows just how dynamic the evolutionary genetic past of our species was, through the lens of the Duffy system, Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. Here’s the author summary:

Infectious diseases have undoubtedly played an important role in ancient and modern human history. Yet, there are relatively few regions of the genome involved in resistance to pathogens that show a strong selection signal in current genome-wide searches for this kind of signal. We revisit the evolutionary history of a gene associated with resistance to the most common malaria-causing parasite, Plasmodium vivax, and show that it is one of regions of the human genome that has been under strongest selective pressure in our evolutionary history (selection coefficient: 4.3%). Our results are consistent with a complex evolutionary history of the locus involving selection on a mutation that was at a very low frequency in the ancestral African population (standing variation) and subsequent differentiation between European, Asian and African populations.

Why is it that regions of the genome subject to selection due to co-evolution with pathogens are hard to detect in relation to selection? My response would be that it’s because selection and adaptation are always happening in these regions, constantly erasing its footprints in these regions of the genome.

You may be familiar with the fact that the loci of the major histocompatibility complex (MHC) are some of the most diverse regions of the genome. That’s because negative frequency dependent selection makes it so that rare variants never go extinct, as the rarer they get the more favored they are.
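The logic of negative frequency dependent selection can be sketched with a toy model. This is only an illustration under stated assumptions, not a model of the MHC itself: a haploid population with two alleles, where each allele's fitness falls linearly with its own frequency. The function names and the penalty `s = 0.1` are hypothetical choices made for the sketch.

```python
def next_freq(p, s=0.1):
    """One generation of a toy haploid model in which each allele's
    fitness declines linearly with its own frequency (assumed form)."""
    w_a = 1.0 - s * p          # fitness of allele A, currently at frequency p
    w_b = 1.0 - s * (1.0 - p)  # fitness of allele B, currently at frequency 1 - p
    mean_w = p * w_a + (1.0 - p) * w_b
    return p * w_a / mean_w


def run(p0, generations=500, s=0.1):
    """Iterate the recursion and return the final frequency of allele A."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s)
    return p


rare = run(0.01)    # allele A starts out rare
common = run(0.99)  # allele A starts out nearly fixed
print(f"start 0.01 -> {rare:.3f}; start 0.99 -> {common:.3f}")
```

In this toy setup both starting points are pulled back toward an intermediate frequency, which is the sense in which the rare variant is protected from loss.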

Many classical and modern techniques for detecting selection require less protean dynamics in the model which they attempt to detect. Basically, many of the standard selection detection methods are looking for a simple perturbation in the pattern of variation that’s expected. A strong powerful recent sweep on a single mutation is like the spherical cow of evolutionary genetics. It happens. And it’s easy to model and detect. But it may not be nearly as important as our ability to detect these “hard sweeps” may suggest to us.

In contrast, if selection targets a larger number of independent mutations, then you get a “soft sweep,” which is harder to detect, because it is no singular event. Complexity is the enemy of detection. As a thought experiment, if you selected for height within a population you may catch some large effect alleles that would leave strong signals, but most of the dynamic would leave a polygenic footprint, distributed across innumerable genes.

The Duffy locus is somewhat in the middle. The authors distinguish between selection on standing variation (the allele frequency is higher than a single new mutation within the population) and a soft sweep, where multiple variants against different haplotypes are subject to selection. Their models and results strongly support selection on standing variation for the FY*O variant, and perhaps selection for the FY*A variant.

These selection events were very old, and very strong. Selection coefficients on the order of 4% are hard to believe in a natural environment. Curiously the coalescence times for the haplotypes of some of these alleles indicate that selection was contemporaneous with the emergence of modern humans out of Africa, ~50,000 years ago. From their sequence data analysis the different alleles have been segregating for a long time in the collective human population, and powerful sweeps fixed FY*O in both the ancestors of the Bantu and Pygmies before they diverged from each other. In contrast the Khoisan samples suggest that FY*O introgressed into their population from newcomers, while variants of FY*A are ancestral.
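To get a feel for how strong a coefficient of about 4% is, here is a minimal sketch using the textbook deterministic recursion for genic (haploid) selection. This is not the ABC approach the paper uses. The starting and ending frequencies (1% and 99%), the function name, and the weaker 1% coefficient used for contrast are all assumptions made for illustration.

```python
def generations_to_sweep(s, p_start=0.01, p_end=0.99):
    """Count generations for a favoured allele to rise from p_start to
    p_end under deterministic genic selection with advantage s."""
    p = p_start
    generations = 0
    while p < p_end:
        # One-locus recursion: the favoured allele has relative fitness 1 + s.
        p = p * (1.0 + s) / (1.0 + s * p)
        generations += 1
    return generations


strong = generations_to_sweep(0.043)  # coefficient quoted in the author summary
weak = generations_to_sweep(0.01)     # hypothetical weaker coefficient, for contrast
print(f"s = 0.043: {strong} generations; s = 0.01: {weak} generations")
```

Under these simplifying assumptions (no drift, no dominance, constant selection) the stronger coefficient completes the sweep several times faster than the weaker one, a few hundred generations rather than the better part of a thousand.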

The big picture here is that selection is ancient, that it is powerful, and it was a dynamic even before our species diversified into various lineages.

If you read the paper, and you should, it’s pretty clear that a lot of the adaptive story was suspected. It’s just with modern genomics and fancy ABC methods you can put point estimates and intervals on these hunches. But another issue, as they note in the piece, is that we have a better grasp of African population structure today than in the past, and this allows for better framing.

But it is here I have some caution to throw. At one point citing a 2012 paper the authors suggest “The KhoeSan peoples are a highly diverse set of southern African populations that diverged from all other populations approximately 100 kya.” I can tell you that some credible researchers who have access to whole genome sequences and have been looking at this question peg the divergence date closer to 200,000 years. Some of the issue here is that you need to decompose later gene flow, which will reduce the distance between populations. Easier said than done.

The genetic prehistory of the African continent is almost certainly much more complex than what is presented in the paper, largely due to lack of ancient DNA within Africa. Northern Eurasia turned out to be far more complex than had earlier been guessed…and it is likely that Northern Eurasia has had a simpler history because of its much shorter time of habitation.

If I had to guess I suspect that the ancestors of the Khoisan as we understand them were a separate and distinct group who diverged between ~100,000 and ~200,000 years ago from other extant African populations. But I suspect our clarity is very low in relation to the sort of structure which eventually resulted in the shake-out of only a few large groups of Sub-Saharan Africans aside from the Khoisan.

Citation: Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans.

1967 imagines the home and life of 1999

We’re almost two decades past 1999, but some of the things imagined in this conception of the future in the 1960s for the turn of the century are only now just coming true (e.g., electronic medical records, home health monitoring). I was surprised how well they anticipated a lot of the function of information technology, though of course as it was the 1960s there is a lot of turning of knobs.

Second, they didn’t anticipate how traditional humans can be about certain things. It turns out that instant meals have always remained a niche, rather than taking over the whole sector. Additionally, just because people could engage in ‘e-learning’ by the mid-1990s with the internet, didn’t mean that schools and their social aspect didn’t remain important. And we’ve been able to do video-conferencing for a while, but most of us prefer to take calls in the old-fashioned way, when we take calls at all (with email and various messaging services cannibalizing a lot of the function of the telephone).

Finally, they were totally unrealistic about the nature of transportation for middle class families. Yes, people travel, depending on your socioeconomic status. But an impulsive jaunt to Mexico City to go play golf just doesn’t happen.

Open Thread, 03/26/2017


Lots of tweaks and changes with regard to the blog platform recently. As they say in the start-up world we’re “iterating.” The content/substance is going to remain pretty much the same, but over time I’ll be trying to figure out different ways to deliver.

This might cause some minor issues in terms of continuity (I do have the full archives from Unz and earlier, so I’ll load them up once I’m confident we aren’t going to change platforms for a while). I did some fiddling with the permalink URLs, so if you shared anything on Facebook, I’d appreciate if you reshared again.

No matter the details, the old Gene Expression website will point to where you need to go, gnxp.com, but you can also keep track of me through razib.com as well. Also, Twitter and my permanent feed (this feed always hooks into wherever my blog is, so it’s the one you want).

Finally, I also have set up a newsletter with MailChimp. The primary reason is really that I’m worried that some day Twitter will disappear and I figured it is important to have another way to contact people who follow me. I have only sent out one notification, and the next one will probably be when I’m more settled in terms of platform tweaks.

Mostly done with Reformations: The Early Modern World, 1450-1650. I’m a big fan of Diarmaid MacCulloch’s The Reformation, and this is a somewhat different book. Reformations focuses more on intellectual history and theological details, while MacCulloch’s magisterial survey hits political, social, cultural, and theological angles in equal measures. If I had to pick the order in which to read it would definitely be The Reformation first, but Reformations is a good complement.

It’s annoying to me that journalists are pretty ignorant. I understand that that’s the deal when you are a generalist and get assigned to a diverse array of topics. But the public takes journalists seriously, so the fact that so many are so bad at what they do is frustrating. At this point I assume I’m being misled in a lot of areas where I don’t have domain knowledge.

I have a little knowledge about what happened in East Pakistan in the period of the late 1960s and early 1970s. The writer above probably doesn’t have domain knowledge. So they fit the Pakistan military’s killings in the framework of intra-Muslim conflict. Obviously there is something to this. But it is critical, in my opinion, to note that the ruling elites of West Pakistan viewed East Pakistanis as racially and culturally inferior, and that the large population of Hindus who remained in East Pakistan after partition bore a disproportionate brunt of the genocide. Foregrounding attacks on Muslims by this journalist arguably “erases” and misleads many of the readers of this piece, though I assume this is inadvertent.

On many topics my knowledge comes through “book-learning.” The conflict around 1970, and the cultural context beforehand, I know through oral history. For example, older Muslim Bengalis, such as my maternal grandfather, remained pro-Pakistan, in large part because their formative years were during the British Raj, and they retained strong memories of their religious marginalization during the time when the Hindu upper classes dominated Bengal. He was born in 1896, and recalled being the only Bengali Muslim doctor in many areas.

My parents, growing up after partition, had different memories. From what they have told me if you were a Muslim Bengali it certainly wasn’t similar to the experience of blacks in the American South, but there were events that occurred which made it clear who was on top. In Bangladesh after partition there was a community of people who migrated from India termed “Biharis” (many, but not all, were from Bihar province to the west of Bengal). As Urdu-speakers they identified more strongly with West Pakistan, and perceived themselves to be superior to the native population.

After independence they have been the subject of persecution in Bangladesh. Obviously this is bad, and my family does not have any animus toward Biharis. Many of them have assimilated and become Bengali. As most are Sunni Muslims and don’t look that different from the range of physical types among Bengalis it is not that difficult. Some of my cousins for example have a Bihari grandmother, a fact I only became aware of because despite having perfect Bengali there are some words she uses which point to an Urdu-speaking background.

But, my mother does admit during the 1960s she was witness to incidents where Biharis in Bengal behaved as if they were better and had more rights. One case which will have resonance with American readers: a Bihari man got on a bus and began shouting in Urdu for someone to get off because there were no seats left on the bus. Since the bus driver did not know Urdu someone had to be found to interpret for him, at which point a poor soul at the front of the bus was ejected and room was made for the Bihari man.

The killings of hundreds of thousands to millions of Bengalis were a bad thing. But the root causes and historical context shouldn’t be misrepresented.

RNA viruses drove adaptive introgressions between Neanderthals and modern humans. Here’s the important sentence: “Our results imply that many introgressions between Neanderthals and modern humans were adaptive.”

I got a review copy of The Neuroscience of Intelligence. We’ll see when I get to it.

So some people are still asking me about the hit piece. I think I can tell you it was mostly written before the guy ever talked to me. Second, journalist friends give me to understand that the editor of Undark is a serious person, but there is one link in there where the implication made does not follow at all from the content at the link (I rather argue the opposite of what was implied from the title).

I’m pretty sure that the journalist and the editor assumed most people would not read it (I can check the Google Analytics, very few people clicked through). If that isn’t true, they’re incompetent. Basically, it’s been a little sad because I am now concluding that the media is fine with just lying about people by implication without even the barest pretense. Meanwhile, someone like Michael Oman-Reagan is more mainstream in science than I am.

Honestly I’ve given up on the future of classical liberalism in the West. Most people are cowards and liars when push comes to shove. I don’t want to speak of this at length, as it’s a bit like a God-is-dead moment for me, but I thought I’d come clean and be frank. The Critical Theorists are right, power trumps truth. I’m not sure they’ll enjoy what’s to come in the future when objectivity is dethroned, but I think I will probably laugh as the liars scramble to lie different lies, because that is almost certain to happen.

So I have another son. He’s healthy. That’s all you can ask for. I still think now and then about the cat who died in January though.

Ancestry inference won’t tell you things you don’t care about (but could)

The figure above is from Noah Rosenberg’s relatively famous paper, Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure. The context of the publication is that it was one of the first prominent attempts to use genome-wide data on a variety of human populations (specifically, from the HGDP data set) and attempt model-based clustering. There are many details of the model, but the one that will jump out at you here is that the parameter K defines the number of putative ancestral populations you are hypothesizing. Individuals then shake out as proportions of each of the K elements. Remember, this is a model in a computer, and you select the parameters and the data. The output is not “wrong,” it’s just the output based on how you set up the program and the data you input yourself.

These sorts of computational frameworks are innocent, and may give strange results if you want to engage in mischief. For example, let’s say that you put in 200 individuals, of whom 95 are Chinese, 95 are Swedish, and 10 are Nigerian. From a variety of disciplines we know that, to a first approximation, non-Africans form a monophyletic clade in relation to Africans. In plain English, all non-Africans descend from a group of people who diverged from Africans more than 50,000 years ago. That means if you imagine two populations, the first division should be between Africans and non-Africans, to reflect this historical demography. But if you skew the sample size, then as the program looks for the maximal amount of variation in the data set it may decide that dividing between Chinese and Swedes as the two ancestral populations is the most likely model given the data.

This is not wrong as such. As the number of Africans in the data converges on zero, obviously the dividing line is between Swedes and Chinese. If you overload particular populations within the data, you may marginalize the variation you’re trying to explore, and the history you’re trying to uncover.
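To make the sample-size point concrete, here is a toy sketch in Python. It is not STRUCTURE or any real clustering program: it simulates genotypes on a made-up tree (root, then Africans versus non-Africans, then Chinese versus Swedes), and simply asks which "one population versus the other two" split leaves the least within-cluster variation. The drift value, the number of loci, the seed, and the function names are all assumptions chosen for illustration.

```python
import random

POPS = ("Nigerian", "Chinese", "Swedish")


def clamp(p, lo=0.02, hi=0.98):
    """Keep an allele frequency away from 0 and 1."""
    return max(lo, min(hi, p))


def simulate(sample_sizes, n_loci=500, drift=0.08, seed=7):
    """Simulate genotypes (0, 1 or 2 copies) on a toy tree.

    Assumed tree: root -> Nigerian, root -> non-African -> (Chinese, Swedish).
    """
    rng = random.Random(seed)
    freqs = {pop: [] for pop in POPS}
    for _ in range(n_loci):
        root = rng.uniform(0.2, 0.8)
        non_african = clamp(root + rng.gauss(0, drift))
        freqs["Nigerian"].append(clamp(root + rng.gauss(0, drift)))
        freqs["Chinese"].append(clamp(non_african + rng.gauss(0, drift)))
        freqs["Swedish"].append(clamp(non_african + rng.gauss(0, drift)))
    data = {}
    for pop in POPS:
        data[pop] = [
            [(rng.random() < p) + (rng.random() < p) for p in freqs[pop]]
            for _ in range(sample_sizes[pop])
        ]
    return data


def within_ss(cluster):
    """Sum of squared deviations from the cluster mean, over all loci."""
    n = len(cluster)
    total = 0.0
    for locus in zip(*cluster):
        mean = sum(locus) / n
        total += sum((g - mean) ** 2 for g in locus)
    return total


def best_two_way_split(data):
    """Return the population that is split off alone in the best K = 2 split."""
    scores = {}
    for lone in POPS:
        rest = [ind for pop in POPS if pop != lone for ind in data[pop]]
        scores[lone] = within_ss(data[lone]) + within_ss(rest)
    return min(scores, key=scores.get)


skewed = simulate({"Nigerian": 10, "Chinese": 95, "Swedish": 95})
balanced = simulate({"Nigerian": 65, "Chinese": 65, "Swedish": 65})

print("skewed sample, lone cluster:", best_two_way_split(skewed))
print("balanced sample, lone cluster:", best_two_way_split(balanced))
```

With the balanced sample the deepest branch should win, and the Nigerians are split off first. With the skewed sample the split should fall between the Chinese and the Swedes instead, with the handful of Nigerians lumped in with one of them, even though the underlying tree is identical in both runs.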

I’ve written all of this before. But I’m writing this in the context of the earlier post, Ancestry Inference Is Precise And Accurate(Ish). In that post I showed that consumers drive genomics firms to provide results where the grain of resolution and inference varies a lot as a function of space. That is, there is a demand that Northern Europe be divided very finely, while vast swaths of non-European continents are combined into one broad cluster.

Less than 5% Ancient North Eurasian

Another aspect though is time. These model-based admixture frameworks can implicitly traverse time as one moves up and down the number of K’s. It is always important to explain to people that the number of K’s may not correspond to real populations which all existed at the same time. Rather, they’re just explanatory instruments which illustrate phylogenetic distance between individuals. In a well-balanced data set for humans K = 2 usually separates Africans from non-Africans, and K = 3 then separates West Eurasians from other populations. Going across K’s it is easy to imagine that one is traversing successive bifurcations.

A racially mixed man, 15% ANE, 30% CHG, 25% WHG, 30% EEF

But today we know that it’s more complicated than that. Three years ago Pickrell et al. published Toward a new history and geography of human genes informed by ancient DNA, where they report the result that more powerful methods and data imply most human populations are relatively recent admixtures between extremely diverged lineages. What this means is that the origin of groups like Europeans and South Asians is very much like the origin of the mixed populations of the New World. Since then this insight has become only more powerful, as ancient DNA has shed light on massive population turnovers over the last 5,000 to 10,000 years.

These are to some extent revolutionary ideas, not well known even among the science press (which is too busy doing real journalism, i.e. the art of insinuation rather than illumination). As I indicated earlier, direct-to-consumer genomics firms use national identities in their cluster labels because these are comprehensible to people. Similarly, they can’t very well tell Northern Europeans that they are an outcome of a successive series of admixtures between diverged lineages from the late Pleistocene down to the Bronze Age. Though Northern Europeans, like South Asians, Middle Easterners, Amerindians, and likely Sub-Saharan Africans and East Asians, are complex mixes between disparate branches of humanity, today we view them as indivisible units of understanding, to make sense of the patterns we see around us.

Personal genomics firms therefore give results which are historically comprehensible. As a trivial example, the genomic data makes it rather clear that Ashkenazi Jews emerged in the last few thousand years via a process of admixture between antique Near Eastern Jews, and the peoples of Western Europe. After the initial admixture this group became an endogamous population, so that most Ashkenazi Jews share many common ancestors in the recent past with other Ashkenazi Jews. This is ideal for the clustering programs above, as Ashkenazi Jews almost always fit onto a particular K with ease. Assuming there are enough Ashkenazi Jews in your data set you will always be able to find the “Jewish cluster” as you increase the value of K.

But the selection of a K which satisfies this comprehensibility criterion is a matter of convenience, not necessity. Most people are vaguely aware that Jews emerged as a people at a particular point in history. In the case of Ashkenazi Jews they emerged rather late in history. At certain K’s Ashkenazi Jews exhibit mixed ancestral profiles, placing them between Europeans and Middle Eastern peoples. What this reflects is the earlier history of the ancestors of Ashkenazi Jews. But for most personal genomics companies this earlier history is not something that they want to address, because it doesn’t fit into the narrative that their particular consumers want to hear. People want to know if they are part-Jewish, not that they are part antique Middle Eastern and Southwest European.

Perplexment of course is not just for non-scientists. When Joe Pickrell’s TreeMix paper came out five years ago there was a strange signal of gene flow between Northern Europeans and Native Americans. There was no obvious explanation at the time…but now we know what was going on.

It turns out that Northern Europeans and Native Americans share common ancestry from Pleistocene Siberians. The relationship between Europeans and Native Americans has long been hinted at in results from other methods, but it took ancient DNA for us to conceptualize a model which would explain the patterns we were seeing.

An American with recent Amerindian (and probably African) ancestry

But in the context of the United States shared ancestry between Europeans and Native Americans is not particularly illuminating. Rather, what people want to know is if they exhibit signs of recent gene flow between these groups. In particular, many white Americans are curious if they have Native American heritage. They do not want to hear an explanation which involves the fusion of an East Asian population with Siberians that occurred 15,000 to 20,000 years ago, and then the emergence of Northern Europeans through successive amalgamations between Pleistocene, Neolithic, and Bronze Age Eurasians.

In some of the inference methods Northern Europeans, often those with Finnic ancestry or relationship to Finnic groups, may exhibit signs of ancestry from the “Native American” cluster. But this is almost always a function of circumpolar gene flow, as well as the aforementioned Pleistocene admixtures. One way to avoid this would be to simply not report proportions which are below 0.5%. That way, people with higher “Native American” fractions would receive the results, and the proportions would be high enough that it was almost certainly indicative of recent admixture, which is what people care about.
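The reporting rule is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how any particular company does it: the function name, the example proportions, and the choice to fold the hidden fractions into an "Unassigned" bucket are all my assumptions.

```python
def reportable(proportions, threshold=0.005):
    """Drop ancestry components below the threshold (0.005 = 0.5%).

    Hidden fractions are summed into a hypothetical "Unassigned" bucket
    so that the reported shares still add up to the original total.
    """
    kept = {label: share for label, share in proportions.items()
            if share >= threshold}
    hidden = sum(share for share in proportions.values() if share < threshold)
    if hidden > 0:
        kept["Unassigned"] = hidden
    return kept


# Made-up numbers: a trace "Native American" signal of the circumpolar kind.
trace = {"Northern European": 0.962, "Finnish": 0.035, "Native American": 0.003}
# Made-up numbers: a fraction large enough to suggest recent admixture.
recent = {"Northern European": 0.95, "Native American": 0.05}

print(reportable(trace))
print(reportable(recent))
```

In the first case the 0.3% "Native American" component is never shown to the customer, while in the second the 5% component passes through untouched.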

Why am I telling you this? Because many journalists who report on direct-to-consumer genomics don’t understand the science well enough to grasp what’s being sold to the consumer (frankly, most biologists don’t know this field well either, even if they might use a barplot here and there).

And, the reality is that consumers have very specific parameters of what they want in terms of geographic and temporal information. They don’t want to be told true but trivial facts (e.g., they are Northern European). But neither do they want to know things which are so novel and at such a far remove from their interpretative frameworks that they simply can’t digest them (e.g., that Northern Europeans are a recent population construction which threads together very distinct strands with divergent deep time histories). In the parlance of cognitive anthropology consumers want their infotainment the way they want their religion, minimally counterintuitive. Consume some surprise. But not too much.

Ancestry inference is precise and accurate(ish)

For about three years I consulted for Family Tree DNA. It was a great experience, and I met a lot of cool people through that connection. But perhaps the most interesting aspect was the fact that I can understand the various pressures that direct-to-consumer genomics firms face from the demand side. The science is one thing, but when you are working on a consumer facing product, other variables come into play which you are not cognizant of when you are thinking of it from a point of pure analysis. I’m pretty sure that my insights working with Family Tree DNA can generalize to the other firms as well (23andMe, Ancestry, and Genographic*).

The science behind the ancestry inference elements of the product on offer is not particularly controversial or complex, but the customer aspect of how these results are received can become an intractable nightmare. The basic theory was outlined in the year 2000 in Pritchard et al.’s Inference of Population Structure Using Multilocus Genotype Data. You have lots of data thanks to better genomic technology (e.g., 300,000 SNPs). You have computers to analyze that data. And, you have scientific models of population history and dynamics which you can test that data against. The shape of the data will determine the parameters of the model, and it is those parameters that yield “your ancestry.”
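For a rough feel of what "parameters that yield your ancestry" means, here is a heavily simplified Python sketch. It assumes two source populations whose allele frequencies are already known, and finds the ancestry fraction that makes one individual's genotypes most likely by scanning a grid. Real programs estimate the population frequencies and everyone's fractions jointly, and Pritchard et al. do so in a Bayesian framework, so treat this as a cartoon of the idea rather than their method. All names, the simulated frequencies, and the seed are assumptions.

```python
import math
import random


def log_likelihood(q, genotype, freqs_a, freqs_b):
    """Log-likelihood of the genotypes if a fraction q of ancestry is from A."""
    total = 0.0
    for g, pa, pb in zip(genotype, freqs_a, freqs_b):
        f = q * pa + (1 - q) * pb  # expected allele frequency for this person
        # Binomial likelihood for g copies out of 2 (constant term dropped).
        total += g * math.log(f) + (2 - g) * math.log(1 - f)
    return total


def estimate_fraction(genotype, freqs_a, freqs_b, steps=100):
    """Grid search for the ancestry fraction with the highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotype, freqs_a, freqs_b))


rng = random.Random(42)
n_loci = 2000
# Hypothetical reference allele frequencies for two source populations.
freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]

# Simulate one individual with 25% of their ancestry from population A.
true_q = 0.25
genotype = []
for pa, pb in zip(freqs_a, freqs_b):
    f = true_q * pa + (1 - true_q) * pb
    genotype.append((rng.random() < f) + (rng.random() < f))

estimated_q = estimate_fraction(genotype, freqs_a, freqs_b)
print("true fraction:", true_q, "estimated fraction:", estimated_q)
```

The estimate should land near the true fraction because the two made-up populations differ a lot at most loci. The smaller those frequency differences are, the flatter the likelihood becomes and the noisier the estimate gets.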

In broad sketches the results make sense for most people. It’s in the finer details that the confusions emerge. To the left you see my son’s 23andMe ancestry deconvolution. The color coding is such that you can tell that his maternal and paternal chromosomes have very different ancestry profiles (mostly Northern European and South Asian, respectively).

But his “Northern European” chromosomes also are more richly colored, with alternative segments denoting ancestry from different parts of Northern Europe. So in terms of proportions I am told my son is about 15 percent French and German, and 10 percent Scandinavian and 10 percent British and Irish. This is reasonable. On the other side he’s nearly 50 percent “broadly South Asian.” The balance is accounted for by my East Asian ancestry, which is correct, as my South Asian ethnicity is from Bengal, where there is a fair amount of East Asian ancestry (my family’s origin is on the eastern edge of Bengal itself).

And it is here that the non-scientific concerns of consumer genomics come into focus. The genetic differences and distance between various South Asian groups are far higher than those between various Northern European groups. Depending on the statistical measure you use, intra-South Asian variation is about one order of magnitude greater than intra-Northern European differences. This is due to geographic partitioning, the caste system, and differential admixture in South Asians between extremely diverged ancestral elements (about half of South Asian ancestry is very similar to Europeans and Middle Easterners, and half of it is extremely different, so how far you are from the 50 percent mark determines a lot).

Broadly South Asian

In Northern Europe there is very little genetic variation from the British Isles all the way to the Baltic. The reason for this is historical: massive population turnover in the region 4,500 years ago means that much of the genetic divergence between the groups dates to the Bronze Age. It is this genetic divergence, the variation, that is the raw material for the inferences and proportions you see in ancestry calculators. There’s just not that much raw material for Northern Europeans.

Broadly South Asian

Remember, the methods require lots of variation in the data as a raw input. You’re making the inference machine work real hard to produce a reasonably robust result if you don’t have that much variation. In contrast to the situation with Northern Europeans, with South Asians the companies are leaving raw material on the table, and just combining diverse groups together.

What’s going on here? As you might have guessed this is an economically motivated decision. Most South Asians know their general heritage due to caste and regional origins (though many Bengalis exhibit some lacunae about their East Asian ancestry). In contrast, many Americans of Northern European ancestry with an interest in genealogy are extremely curious about explicit proportional breakdowns between Northern European nationalities. The direct-to-consumer genomic firms attempt to cater to this demand as best as they can.

As I have stated many times, racial background is to various extents both biological and social. When it comes to the difference between Lithuanians and Nigerians the biological differences due to evolutionary history are straightforward, and clear and distinct. You can generate a phylogenetic history and perform a functional analysis of the differences. Additionally, you also have to note that the social differences exist, but are not straightforward. Like Lithuanians, Nigerians of Igbo background are generally Roman Catholic, while most other Nigerians are not. The linguistic differences between Nigerian languages are great enough that it is defensible to suggest that the Afro-Asiatic dialects of Hausa speakers are closer to Lithuanian in their phylogenetic history than they are to the dialects of the Yoruba.

A Lithuanian American

Contrast this to the situation where you differentiate Lithuanians from French. To any European the differences here are incredibly huge. The history of France, what was Roman Gaul, goes back 2,000 years. After the collapse of the Western Roman Empire by any measure the people who became French were at the center of European history. In contrast, Lithuanians were a marginal tribe, who did not enter Christian civilization until the late 14th century. In social-cultural terms, due to history, the differences between French and Lithuanians are extremely salient to people of French and Lithuanian ancestry. But genetically the differences are modest at best.

If a direct-to-consumer genetic testing company tells you that you are 90 percent Northern European and 10 percent West African, that is a robust result that has a clear historical genetic interpretation. The two elements of one’s ancestry have been relatively distinct for on the order of 100,000 years, with the Northern European element really just a proxy for non-Africans (though it is easy to drill down within Eurasia). In contrast, notice how 23andMe, with some of the best scientists in the business, tells people they are “French-German,” and not French or German. What the hell is a “French-German”? Someone from Alsace-Lorraine? A German descendant of Huguenots? Obviously not.

“French-German” is a cluster almost certainly because there are no clear and distinct genetic differences between French and Germans. Yes, there is a continuum of allele frequencies between these two groups, but having looked at a fair number of people of French and German background in Family Tree DNA’s database I can tell you that France and Germany have a lot of local structure even among people of indigenous ancestry. Germans from the Rhineland are quite often genetically closer to French from Normandy than they are to Germans from eastern Saxony. Some of this is due to gene flow between neighboring regions, but some of this is due to cultural fluidity as to who exactly is German. It is clear that some Germans from the eastern regions are Germanized Slavs. Some Germans from the north exhibit strong affinities to Scandinavians, while Germans from Bavaria and Austria are classically Central European (whatever that means). The average German is distinct from the average French person, but the genetic clustering of the two groups is not clear and distinct.

Remember earlier I explained that the science is predicated on aligning data and models. The cultural model of Northern Europeans is conditioned on diversity and difference which has been very salient for the past few thousand years since the rise and fall of Rome. But the evolutionary genetic history is one where there are far fewer differences. The data do not fit a model that makes much sense to the average consumer (e.g., “you descend from a mix of Bronze Age migrants from the west-central steppe of Eurasia and Mesolithic indigenous hunter-gatherers and Neolithic farmers”). What makes sense to the average American consumer are histories of nationalities, so direct-to-consumer genetic companies try to satisfy this need. Because the needs of the consumer and their cultural expectations are poorly served by the data (genetic variation) and models of population history, you have a lot of awkward kludges and strange results.

A Saxon

Imagine, for example, you want to estimate how “German” someone is. What do you use for your reference population of Germans? Looking at the data there are clearly three major clusters within Germany when you weight the numbers appropriately, with affinities to the northern French, Slavs, and Scandinavians, and various proportions in between. Your selection of your sample is going to mean that some Germans are going to be more German than other Germans. If you select an eastern German sample then western Germans whose ancestors have been speaking a Germanic language far longer than eastern Germans are going to come out as less German. Or, you could just pick all of these disparate groups…in which case, lots of Northern Europeans become “German.”

Consumers want genetic tests to reflect strong cultural memories which were forged in the fires of the rapidly protean and distinction-making process of cultural evolution. But biological and cultural evolution exhibit different modes (the latter generates huge between group differences) and tempos (those differences emerge fast). The ancestry results many people get are the outcomes of compromises to thread the needle and square the circle.

All the above is half the story. Next I’ll explain why “deep history” has to be massaged to make recent history informative and comprehensible….

* Also, I have a little historical perspective because of my friendship with the person who arguably created this sector, Spencer Wells.