The new African “multi-regionalism” & pan-Neanderthalism

Posted on July 17, 2018 by Razib Khan

We live in times when our understanding of the origin and diversification of modern humans is undergoing great change. More concretely, our understanding of what it means to be human is transforming. The terms are overused, but what has happened between 2000 and today could perhaps be called a “revolution” or “paradigm shift.”

At the end of 2010 ancient DNA made it highly likely that people outside of Sub-Saharan Africa had non-trivial Neanderthal ancestry. That is, enough ancestry that it is detectable genomically. I should also add that I think it is highly probable that a large majority of people within Sub-Saharan Africa have Neanderthal ancestry. Some of this is due to recent attenuated Eurasian back-migration (e.g., many West Africans, Nilotic people, and KhoeSan have Holocene gene-flow signals which derive from the agricultural expansions of the past 10,000 years). But, I think once deep Pleistocene genomes of African humans are sequenced we will see evidence of some Eurasian back-migration at a very ancient date (there is already some suggestive inferential evidence of this).*

Talking with a few friends this week, I realized that the famous “We are all Africans” t-shirts, which have turned into recognizable memes, should be supplemented with “We are all Neanderthals” t-shirts. So yeah, I’m now selling them on DNA Geeks. If the Richard Dawkins Foundation can make a quid on it, why not the Razib Khan et al. Foundation?

This has all been on my mind due to a review paper in Trends in Ecology and Evolution, Did Our Species Evolve in Subdivided Populations across Africa, and Why Does It Matter? (OA). If you read this blog closely you’ll see there’s not much new in it. But, it is a signpost, a marker, of the times we live in. Here’s the important bit:

Together with recent archaeological and genetic lines of evidence, these data are consistent with the view that our species originated and diversified within strongly subdivided (i.e., structured) populations, probably living across Africa, that were connected by sporadic gene flow…This concept of ‘African multiregionalism’…may also include hybridization between H. sapiens and more divergent hominins (see Glossary) living in different regions…Crucially, such population subdivisions may have been shaped and sustained by shifts in ecological boundaries…challenging the view that our species was endemic to a single region or habitat, and implying an often underacknowledged complexity to our African origins.

The first person I recall explicitly using the term “African multi-regionalism” was Aylwyn Scally, though the general framework was shaping up years before. Frankly, I was waiting for someone to use that word. If Richard Klein’s The Dawn of Human Culture, published in 2002, was the apogee of the old model, that we are all descended from a single East African tribe (a model often inchoate, and crisper in popularization than within the scientific community), this review paper heralds the emergence of a more complex and pluralistic framework. The emergence of modern humans within Africa, then, may have been a polycentric, gradual, and interactive process, not a singular explosion against the firmament of the antique savanna landscape.

By the late 2000s, even before the 2010 Neanderthal draft genome paper, it was becoming evident from genome-wide analyses of contemporary populations that the extreme bottleneck clear in non-African populations was much more modest within Africa. That opened the possibility of deep structure within the continent that predated the “Out of Africa” event. A deeper look at African hunter-gatherers indicated to many researchers that these groups diverged from other modern humans on the order of ~200,000 years ago. Recent paleontological work has confirmed this genetic insight.

Where we are today is that some people are now arguing for the overthrow of the “Out-of-Africa” idea, whether by replacing it with an “Into-Africa” model of some sort, or by resurrecting a more polycentric classical multi-regionalism (“some people” as is evident from the increased frequency of emails and Twitter messages I get in this vein). I don’t think we’re there yet, not by any measure. But it is now in the realm of very unlikely, not extremely unlikely (at least the “Into-Africa” model; it is clear that strong, overwhelming demographic pulses from somewhere singular dominate the genome of most modern humans).

* I don’t think it is all that implausible that some Neanderthal back-migration into Africa occurred at some point in the last ~500,000 years.

Open Thread, 07/17/2018

Posted on July 16, 2018 (updated July 17, 2018) by Razib Khan

History of Japan: Revised Edition. As I said, a pretty good and short history. Recommended.

CRISPR/Cas9 gene editing scissors are less accurate than we thought, but there are fixes. I know the focus is on human genetics. And rightly so. But this isn’t going to be as much of an issue in animal and plant breeding.

Patterns of speciation and parallel genetic evolution under adaptation from standing variation.

Genome-wide analysis in UK Biobank identifies over 100 QTLs associated with muscle mass variability in middle age individuals.

Amazon told me R for Everyone: Advanced Analytics and Graphics was on sale. Great. But I already own it. That being said, I can tell you it’s a pretty good book.

Genome doubling shapes the evolution and prognosis of advanced cancers.

Against Moral Equivalence. “The talking heads trafficking in examples of U.S. interference neglect to mention that the goal of American policy has always been to prop up anti-totalitarian, pro-market leaders.” I dislike the tendency of American conservatism to conflate anti-authoritarian and pro-market. The two are distinct (I’m pro-market for what it’s worth, but capitalism is amoral, even though it leads to greater human well-being).

Large randomized controlled trial finds state pre-k program has adverse effects on academic achievement.

Archaeobotanical evidence reveals the origins of bread 14,400 years ago in northeastern Jordan.

Confronting Implicit Bias in the New York Police Department. Implicit bias stuff is sketchy science. But people want solutions for social problems.

How Social Science Might Be Misunderstanding Conservatives. I got introduced to the “authoritarian personality” in college. I didn’t think much about it, but over the years it seemed pretty clearly a bit rigged. But whatever. Then I read The Dialectical Imagination: A History of the Frankfurt School and the Institute of Social Research, 1923-1950. That’s where it comes from. Enough said, right?

Tides of History is a great podcast. Now Patrick Wyman is talking about the “Hundred Years War.”

Should I post “open threads” anymore? It seems that the number of comments keeps dropping. Really, “everything” is moving to Twitter nowadays, though Twitter is a wasteland.

A “carvaka” perspective on the historicity of myth and religion. A long post on Brown Pundits by me. I was once asked why I post there and not here, and why here and not there. 45% of the readers of that weblog are from Indian IPs, versus 5% here. There is about twice as much traffic here, but much more engagement there (bounce rate of 70% here vs. 40% there).

India vs. China, genetically diverse vs. homogeneous

Posted on July 15, 2018 (updated July 16, 2018) by Razib Khan

About 36% of the world’s population are citizens of the People’s Republic of China and the Republic of India. Including the other nations of South Asia (Pakistan, Bangladesh, etc.), 43% of the world’s population lives in China or South Asia.

But, as David Reich mentions in Who We Are and How We Got Here, China is dominated by one ethnicity, the Han, while India is a constellation of ethnicities. And this is reflected in the genetics: the relative diversity of India stands in contrast to the homogeneity of China.

At the current time, the best research on population genetic variation within China is probably the preprint A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese. The authors used low-coverage sequencing of over 10,000 women to get a huge sample of variation all across China. The PCA recapitulated earlier work: genetic relatedness among the Han of China is geographically structured. The largest component of variance is north-south, but a smaller component is east-west. The north-south component explains more than 4.5 times as much variance as the east-west one.

Tutorial to run supervised admixture analyses

Posted on July 13, 2018 (updated July 17, 2020) by Razib Khan

ID	Dai	Gujrati	Lithuanians	Sardinian	Tamil
razib_23andMe	0.14	0.26	0.02	0.00	0.58
razib_ancestry	0.14	0.26	0.02	0.00	0.58
razib_ftdna	0.14	0.26	0.02	0.00	0.57
razib_daughter	0.05	0.14	0.29	0.18	0.34
razib_son	0.07	0.17	0.28	0.19	0.30
razib_son_2	0.06	0.19	0.29	0.19	0.27
razib_wife	0.00	0.07	0.55	0.38	0.00

This is a follow-up to my earlier post, Tutorial To Run PCA, Admixture, Treemix And Pairwise Fst In One Command. Hopefully, you’ll be able to run supervised admixture analysis with less hassle after reading this. Here I’m pretty much aiming for laypeople. If you are a trainee you need to write your own scripts. The main goal here is to allow people to run a lot of tests to develop an intuition for this stuff.

The above results are from a supervised admixture analysis of my family and myself. There are three replicates of me because I converted my 23andMe, Ancestry, and Family Tree DNA raw data into plink files separately. Notice that the results are consistent (they differ by at most 0.01). This emphasizes that discrepancies between DTC companies’ results are due to their analytic pipelines, not to data quality.

The results are not surprising. I’m about 14% “Dai,” reflecting East Asian admixture into Bengalis. My wife is ~0% “Dai.” My children are somewhere in between. At such a low fraction, you expect some variance among the F1 offspring.
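As a rough sanity check on the “somewhere in between” point, here is a minimal Python sketch that treats each child’s expected fraction as the simple average of the two parents’ estimates. That is an idealization: it ignores the estimation noise in the parents’ own numbers. The values are copied from the table above; the function and variable names are only illustrative.

```python
# Supervised admixture fractions copied from the family table above.
POPS = ["Dai", "Gujrati", "Lithuanians", "Sardinian", "Tamil"]
father = dict(zip(POPS, [0.14, 0.26, 0.02, 0.00, 0.58]))  # razib_23andMe row
mother = dict(zip(POPS, [0.00, 0.07, 0.55, 0.38, 0.00]))  # razib_wife row
children = {
    "razib_daughter": dict(zip(POPS, [0.05, 0.14, 0.29, 0.18, 0.34])),
    "razib_son": dict(zip(POPS, [0.07, 0.17, 0.28, 0.19, 0.30])),
    "razib_son_2": dict(zip(POPS, [0.06, 0.19, 0.29, 0.19, 0.27])),
}


def expected_child(parent1, parent2):
    """Each parent transmits half the genome, so the expected fraction is the mean."""
    return {pop: (parent1[pop] + parent2[pop]) / 2 for pop in parent1}


expected = expected_child(father, mother)
for name, observed in children.items():
    diff = observed["Dai"] - expected["Dai"]
    print(f"{name}: Dai observed {observed['Dai']:.2f}, "
          f"expected {expected['Dai']:.2f}, diff {diff:+.2f}")
```

With these numbers the expected “Dai” fraction for each child is 0.07, and the observed values (0.05, 0.07, 0.06) scatter around it, which is the kind of variance random segregation produces.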

Now below are results for three Swedes with the same reference panel:

Group	ID	Dai	Gujrati	Lithuanians	Sardinian	Tamil
Sweden	Sweden17	0.00	0.09	0.63	0.28	0.00
Sweden	Sweden18	0.00	0.08	0.62	0.31	0.00
Sweden	Sweden20	0.00	0.05	0.72	0.23	0.00

All these were run in a supervised admixture framework where I used Dai, Gujrati, Lithuanians, Sardinians, and Tamils as the reference “ancestral” populations. Another way to think about it: given the genetic variation of these input groups, what fractions does a given test individual shake out at?
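To make “what fractions does a test individual shake out at?” concrete, here is a toy maximum-likelihood estimate of one individual’s ancestry fractions by EM, in plain Python. It is a sketch under simplifying assumptions, not what ADMIXTURE does internally: it assumes unlinked SNPs and treats the reference allele frequencies as known and fixed, whereas ADMIXTURE’s supervised mode estimates them jointly from the labeled reference individuals with a faster optimizer. The frequencies and genotypes below are simulated, not real data.

```python
import random


def estimate_ancestry(genotypes, ref_freqs, n_iter=1000, tol=1e-6):
    """EM estimate of one individual's ancestry fractions.

    genotypes: allele count (0, 1 or 2) at each SNP
    ref_freqs: per SNP, the allele frequency in each of the K reference populations
    Assumes unlinked SNPs and reference frequencies known without error.
    """
    k = len(ref_freqs[0])
    q = [1.0 / k] * k  # start from equal fractions
    for _ in range(n_iter):
        totals = [0.0] * k
        for g, p in zip(genotypes, ref_freqs):
            f = sum(qj * pj for qj, pj in zip(q, p))  # admixed allele frequency
            for j in range(k):
                # expected share of the g counted alleles and the 2 - g other
                # alleles that came from population j
                totals[j] += g * q[j] * p[j] / f + (2 - g) * q[j] * (1 - p[j]) / (1 - f)
        new_q = [t / (2 * len(genotypes)) for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


# Simulated example: three made-up reference populations, true fractions 15/25/60%.
rng = random.Random(1)
true_q = [0.15, 0.25, 0.60]
ref_freqs = [[rng.uniform(0.05, 0.95) for _ in true_q] for _ in range(2000)]
genotypes = []
for p in ref_freqs:
    f = sum(qj * pj for qj, pj in zip(true_q, p))
    genotypes.append((rng.random() < f) + (rng.random() < f))  # two independent alleles

q_hat = estimate_ancestry(genotypes, ref_freqs)
print([round(x, 3) for x in q_hat])
```

On simulated data like this the estimate should land near the true fractions. With real data, the answer also depends heavily on how well the reference panels stand in for the actual source populations.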

The commands are rather simple. For my family:
bash rawFile_To_Supervised_Results.sh TestScript

For the Swedes:
bash supervisedTest.sh Sweden TestScript

The commands need to be run from within the folder ancestry_supervised/.

You can download the zip file. Decompress and put it somewhere you can find it.

Here is what the scripts do. First, imagine you have raw genotype files downloaded from 23andMe, Ancestry, and Family Tree DNA.

Download the files as usual. Rename them in an intelligible way, because the file names are going to be used to generate IDs. Above, I renamed them “razib_23andMe.txt” and such because I wanted to recognize the downstream files produced from each raw genotype. Leave the extensions as they are. Obviously, make sure they are not compressed. Then place them all in RAWINPUT/.

The script looks for the files in there. You don’t need to specify names; it will find them. In plink, the family ID and individual ID will be taken from the text before the extension in the file name. Output file names will include that text too.

Aside from the raw genotype files, you need to choose a reference file. In REFERENCEFILES/ you will see the binary pedigree/plink file Est1000HGDP, the same file as in the earlier post. It would be crazy to run supervised admixture on the dozens of populations in this file. You need to create a subset.

For the above, working inside REFERENCEFILES/, I did this:
grep "Dai\|Guj\|Lithua\|Sardi\|Tamil" Est1000HGDP.fam > ../keep.keep

Then, back in ancestry_supervised/:
./plink --bfile REFERENCEFILES/Est1000HGDP --keep keep.keep --make-bed --out REFERENCEFILES/TestScript

When the script runs, it converts the raw genotype files into plink files and puts them in INDIVPLINKFILES/. Then it takes each plink file and tests it against the reference population file. In that file every group/family ID carries the prefix AA_Ref_. This is essential: it is how the script recognizes reference populations. The .pop files are generated automatically, and the script sets the correct K by counting the unique reference populations.
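The script’s internals are not shown here, but the .pop step can be sketched in a few lines of Python. The assumption is the standard ADMIXTURE --supervised convention: a .pop file with one line per individual, in the same order as the .fam, holding a population label for reference individuals and “-” for anyone whose ancestry should be estimated. The make_pop_lines helper and the sample rows are hypothetical.

```python
REF_PREFIX = "AA_Ref_"


def make_pop_lines(fam_lines, prefix=REF_PREFIX):
    """Build ADMIXTURE --supervised .pop lines from .fam lines (same order).

    Reference individuals (family ID starting with the prefix) get their population
    label; everyone else gets "-", meaning "estimate this individual's ancestry".
    Returns the .pop lines and K, the number of distinct reference populations.
    """
    pop_lines, ref_pops = [], set()
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        family_id = fields[0]
        if family_id.startswith(prefix):
            label = family_id[len(prefix):]
            pop_lines.append(label)
            ref_pops.add(label)
        else:
            pop_lines.append("-")
    return pop_lines, len(ref_pops)


# Hypothetical rows from a merged test + reference .fam file.
fam = [
    "AA_Ref_Dai dai01 0 0 1 -9",
    "AA_Ref_Tamil tamil01 0 0 2 -9",
    "AA_Ref_Dai dai02 0 0 2 -9",
    "razib_23andMe razib_23andMe 0 0 1 -9",
]
pop_lines, k = make_pop_lines(fam)
print(pop_lines, k)
```

Written out one label per line with the same prefix as the merged .bed (say, a hypothetical merged.pop next to merged.bed), this would be run as admixture --supervised merged.bed 2, with K taken from the count returned.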

Admixture is going to be slow. I recommend you modify runadmixture.pl to pass the number-of-cores parameter (for example -j4 for four cores) so it runs multi-threaded.

When the script is done it will put the results in RESULTFILES/. They will be .csv files with strange names (they will have the original file name you provided, but there are timestamps in there so that if you run the files with a different reference and such it won’t overwrite everything). Each individual is run separately and has a separate output file (a .csv).

But this is not always convenient. Sometimes you want to test a larger batch of individuals. Perhaps you want to use the reference file I provided as a source for a population to test? For the Swedes I did this:
grep "Swede" REFERENCEFILES/Est1000HGDP.fam > keep.keep

Then:
./plink --bfile REFERENCEFILES/Est1000HGDP --keep keep.keep --make-bed --out INDIVPLINKFILES/Sweden

Please note the folder. There are modifications you can make, but the script assumes that the test files are in INDIVPLINKFILES/. The next part is important. The Swedish individuals will have AA_Ref_ prepended to the family ID on each row, since you pulled them out of Est1000HGDP. You need to remove this prefix; if you don’t, it won’t work. I edited the file with vim:
vim INDIVPLINKFILES/Sweden.fam

Any text editor will do, but the edit has to be made to the .fam file.
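If you would rather not open an editor, the same fix can be scripted. A minimal Python sketch, assuming the prefix sits only at the start of the family ID (the first column), which is where it appears in Est1000HGDP. Only the .fam needs editing because that is where the IDs live; the function names are hypothetical.

```python
from pathlib import Path

PREFIX = "AA_Ref_"


def strip_ref_prefix(fam_lines, prefix=PREFIX):
    """Remove the reference prefix from the family ID (first column) of .fam lines."""
    cleaned = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # drop blank lines
        if fields[0].startswith(prefix):
            fields[0] = fields[0][len(prefix):]
        cleaned.append(" ".join(fields))  # plink accepts space-delimited .fam files
    return cleaned


def fix_fam_file(path, prefix=PREFIX):
    """Rewrite a .fam file in place without the prefix."""
    path = Path(path)
    lines = path.read_text().splitlines()
    path.write_text("\n".join(strip_ref_prefix(lines, prefix)) + "\n")


# Usage, from ancestry_supervised/:
# fix_fam_file("INDIVPLINKFILES/Sweden.fam")
```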

After the script is done, it will put the .csv file in RESULTFILES/. It will be a single .csv with multiple rows. Each individual is still tested separately, though, so the script appends each result to the file (the individuals are output to separate plink files and merged in; you don’t need to know the details). If you have 100 individuals, it will take a long time. You may want to look at the .csv file as individuals are being added to make sure it looks right.

The convenience of these scripts is that they do some merging/flipping/cleaning for you, and they format the output so you don’t have to.

I originally developed these scripts on a Mac, but made a few small modifications to get them to work on Ubuntu. I don’t know if they still work on a Mac, but you should be able to make the modifications if not. Remember that on a Mac you will need the Mac versions of plink and admixture.

For supervised analysis, the reference populations need to make sense and be coherent. Please check the earlier tutorial and use the PCA functions to remove outliers.

Again, here is the download link for the zip file. And remember, this is only known to work on Ubuntu (though I hear it’s now easy to run Ubuntu in Windows).

Tutorial to run PCA, Admixture, Treemix and pairwise Fst in one command

Posted on July 11, 2018 (updated July 17, 2020) by Razib Khan

Today on Twitter I stated that “if the average person knew how to run PCA with plink and visualize with R they wouldn’t need to ask me anything.” What I meant by this is that the average person often asks me “Razib, is population X closer to population Y than Z?” To answer this sort of question I dig through my datasets and run a few exploratory analyses, and get back to them.

I’ve been meaning to write up and distribute a “quickstart” for a while to help people do their own analyses. So here I go.

The audience of this post is probably two-fold:

“Trainees” who are starting graduate school and want to dig into empirical data sets quickly while they’re getting a handle on things. This tutorial will probably suffice for a week. You should quickly move on to three-population and four-population tests, Eigensoft and AdmixTools, as well as fineStructure.
The larger audience is technically oriented readers who are not, and never will be, geneticists professionally.

What do you need? First, you need to be able to work in Linux or a Linux-like environment. I work both in Ubuntu and on a Mac, but this tutorial and these scripts were tested on Ubuntu. They should work OK on a Mac, but the bash scripts and such may need some modifications.

Assuming you have a Linux environment, you need to download this zip or tar.xz file. Once you open this file it should decompress a folder ancestry/.

There are a bunch of files in there. Some of them are scripts I wrote. Some of them are output files that aren’t cleaned up. Some of them are packages that you’ve heard of. Of the latter:

admixture
plink
treemix

You can find these online too, though these versions should work out of the box on Ubuntu. If you have a Mac, you need the Mac versions; just put them into the folder ancestry/ in place of the Linux ones. If you recompile them yourself, you may need some libraries installed on Ubuntu too. Check the errors and make search engines your friends.

You will need to install R (or R Studio). If you are running Mac or Ubuntu on the command line you know how to get R. If not, Google it.

I also put some data in the file. In particular, a plink set of files Est1000HGDP. These are merged from the Estonian Biocentre, HGDP, and 1000 Genomes. There are 4,899 individuals in the data, with 135,000 high-quality SNPs (very low missingness).

If you look in the “family” file you will see an important part of the structure. So do:

less Est1000HGDP.fam

You’ll see something like this:
Abhkasians abh154 0 0 1 -9
Abhkasians abh165 0 0 1 -9
Abkhazian abkhazian1_1m 0 0 2 -9
Abkhazian abkhazian5_1m 0 0 1 -9
Abkhazian abkhazian6_1m 0 0 1 -9
AfricanBarbados HG01879 0 0 0 -9
AfricanBarbados HG01880 0 0 0 -9

There are 4,899 rows, one per individual. I have used the first column to label the ethnic/group identity. The second column is the individual ID. You can ignore the last 4 columns.

There is no way you want to analyze all the different ethnic groups. Usually, you want to look at a few. For that, you can use lots of commands, but what you need is a subset of the rows above. The grep command matches and returns rows with particular patterns. It’s handy. Let’s say I want just Yoruba, British (who are in the group GreatBritain), Gujarati, Han Chinese, and Druze. The command below will work (note that Han matches HanBeijing, Han_S, Han_N, etc.).

grep "Yoruba\|Great\|Guj\|Han\|Druze" Est1000HGDP.fam > keep.txt
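One caveat with grep: it matches anywhere on the line, so a pattern like Han would also catch an individual whose ID happens to contain “Han”. A minimal Python sketch that restricts matching to the first (group) column; the helper names and the sample rows are hypothetical, and it writes whole .fam rows just as the grep does, since plink’s --keep only uses the first two columns.

```python
import re


def keep_groups(fam_lines, patterns):
    """Like the grep alternation above, but only matches against the group (first) column."""
    regex = re.compile("|".join(re.escape(p) for p in patterns))
    return [line for line in fam_lines if line.split() and regex.search(line.split()[0])]


def write_keep(path, lines):
    """Write the selected rows, one per line, for plink --keep."""
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")


# Hypothetical .fam rows; the third has "Han" inside its individual ID.
fam = [
    "Yoruba yri01 0 0 1 -9",
    "Druze druze01 0 0 2 -9",
    "Abkhazian Hanna01 0 0 2 -9",
    "HanBeijing chb01 0 0 2 -9",
]
kept = keep_groups(fam, ["Yoruba", "Great", "Guj", "Han", "Druze"])
print(kept)
# write_keep("keep.txt", kept)
```

A plain grep would also have kept the Abkhazian row; matching on the group column avoids that.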

The file keep.txt has the individuals you want. Now you put it through plink to generate a new file:

./plink --bfile Est1000HGDP --keep keep.txt --make-bed --out EstSubset

This new file has only 634 individuals. That’s more manageable. But more important is that there are far fewer groups for visualization and analysis.

As for that analysis, I have a Perl script with a bash script within it (and some system commands). Here is what they do:

1) they perform PCA to 10 dimensions
2) then they run admixture on the number of K clusters you want (unsupervised), and generate a .csv file you can look at
3) then I wrote a script to do pairwise F_st between populations, and output the data into a text file
4) finally, I create the input file necessary for the treemix package and then run treemix with the number of migrations you want

There are lots of parameters and specifications for these packages. You don’t get those unless you edit the scripts or make them more extensible (I have versions that are more flexible, but I think newbies will just get confused, so I’m keeping it simple).

Assuming you have created the plink file above, running the following command means that admixture does K = 2 and treemix does 1 migration edge (that is, -m 1). The PCA and pairwise Fst run automatically.

perl pairwise.perl EstSubset 2 1

Just walk away from your box for a while. The admixture step will take the longest. If you want to speed it up, figure out how many cores you have, edit the file makecluster.sh, and go to line 16, where admixture is called. If you have 4 cores, add -j4 as a parameter. It will speed admixture up and hog all your cores.

There is a .csv file with the admixture output, EstSubset.admix.csv. If you open it you see something like this:
Druze HGDP00603 0.550210 0.449790
Druze HGDP00604 0.569070 0.430930
Druze HGDP00605 0.562854 0.437146
Druze HGDP00606 0.555205 0.444795
GreatBritain HG00096 0.598871 0.401129
GreatBritain HG00097 0.590040 0.409960
GreatBritain HG00099 0.592654 0.407346
GreatBritain HG00100 0.590847 0.409153

Column 1 will always be the group, column 2 the individual, and all subsequent columns the ancestry fractions for each of the K clusters. Since K = 2, there are two such columns. Despite the .csv extension, the file is space-separated. You should be able to open it or process it however you want.
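If you want per-group averages rather than eyeballing rows, a small Python sketch along these lines works on the space-separated layout just described (group, individual, then one column per K). The function name is made up for this example; the sample rows are the ones shown above.

```python
def group_means(lines):
    """Average each ancestry column within each group.

    Each line: group, individual ID, then one fraction per cluster (space-separated).
    """
    totals, counts = {}, {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        group, fractions = fields[0], [float(x) for x in fields[2:]]
        if group not in totals:
            totals[group] = [0.0] * len(fractions)
            counts[group] = 0
        totals[group] = [t + x for t, x in zip(totals[group], fractions)]
        counts[group] += 1
    return {g: [t / counts[g] for t in totals[g]] for g in totals}


rows = """Druze HGDP00603 0.550210 0.449790
Druze HGDP00604 0.569070 0.430930
Druze HGDP00605 0.562854 0.437146
Druze HGDP00606 0.555205 0.444795
GreatBritain HG00096 0.598871 0.401129
GreatBritain HG00097 0.590040 0.409960
GreatBritain HG00099 0.592654 0.407346
GreatBritain HG00100 0.590847 0.409153""".splitlines()

means = group_means(rows)
for group, m in means.items():
    print(group, [round(x, 4) for x in m])

# With the real file:
# with open("EstSubset.admix.csv") as fh:
#     means = group_means(fh)
```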

You’ll also see two other files: plink.eigenval and plink.eigenvec. These are generic output files for the PCA. The .eigenvec file has the individuals along with the values for each PC. The .eigenval file shows the magnitude (eigenvalue) of each dimension. It looks like this:
68.7974
38.4125
7.16859
3.3837
2.05858
1.85725
1.73196
1.63946
1.56449
1.53666

Basically, this means that PC 1 explains nearly twice as much of the variance as PC 2 (68.8 vs. 38.4). Beyond PC 4 the values are really bunched together. You can open up the .eigenvec file as a .csv and visualize it however you like. But I gave you an R script. It’s RPCA.R.
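To put numbers on the PC 1 vs. PC 2 comparison, each eigenvalue can be divided by the sum of the ones listed. A caveat: plink only reports the number of top eigenvalues you asked for (here 10), so these are shares of the top-10 total, not of all the genetic variance. The eigenvalues are copied from above.

```python
# Eigenvalues copied from the plink.eigenval output above (PC1 first).
eigenvals = [68.7974, 38.4125, 7.16859, 3.3837, 2.05858,
             1.85725, 1.73196, 1.63946, 1.56449, 1.53666]
# To read them from disk instead (one value per line):
# with open("plink.eigenval") as fh:
#     eigenvals = [float(x) for x in fh.read().split()]

total = sum(eigenvals)
shares = [ev / total for ev in eigenvals]  # share of the top-10 total, not of all variance
for i, share in enumerate(shares, start=1):
    print(f"PC{i}: {share:.1%}")
print(f"PC1/PC2 ratio: {eigenvals[0] / eigenvals[1]:.2f}")
```

PC1/PC2 comes out at about 1.8, and PC1 alone accounts for a bit over half of the top-10 total.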

You need to install some packages. First, open R or R Studio. If you want to work from the command line, type R at the terminal. Then type:
install.packages("ggplot2")
install.packages("reshape2")
install.packages("plyr")
install.packages("ape")
install.packages("igraph")

Once those packages are installed, you can use the script:
source("RPCA.R")

Then, to generate the plot at the top of this post:
plinkPCA()

There are some useful parameters in this function. The plot to the left adds shape labels to highlight two populations. A third population I label by individual ID. This second option is important if you want to do outlier pruning, since there are mislabels, or just plain outlier individuals, in a lot of data (including this dataset). I also zoomed in.

Here’s how I did that:
plinkPCA(subVec = c("Druze","GreatBritain"),labelPlot = c("Lithuanians"),xLim=c(-0.01,0.0125),yLim=c(0.05,0.062))

To look at stuff besides PC 1 and PC 2 you can do plinkPCA(PC=c("PC3","PC6")).

I put the PCA function in the script, but to remove individuals you will want to run the PCA manually:

./plink --bfile EstSubset --pca 10

You can remove individuals manually by creating a remove file. What I like to do though is something like this:
grep "randomID27 " EstSubset.fam >> remove.txt

The double angle bracket (>>) appends to remove.txt rather than overwriting it, so you can add individuals in the terminal in one window while running PCA and visualizing with R in the other (Eigensoft has an automatic outlier removal feature). Once you have the individuals you want to remove, then:
./plink --bfile EstSubset --remove remove.txt --make-bed --out EstSubset
./plink --bfile EstSubset --pca 10
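Eigensoft’s automatic outlier removal isn’t reproduced here, but a crude first pass at spotting candidates for remove.txt can be sketched in Python: within each group (the FID column), flag individuals far from the group median on a PC, measured in units of the group’s median absolute deviation. The 5-MAD cutoff, the choice of PCs, and the IDs in the example are arbitrary assumptions. It assumes plink’s default .eigenvec layout (FID, IID, then PC values, no header) and is meant to suggest individuals to inspect, not to remove them automatically.

```python
import statistics
from collections import defaultdict


def flag_group_outliers(eigenvec_lines, pcs=(0, 1), max_mads=5.0):
    """Flag (FID, IID) pairs far from their own group's median on any chosen PC.

    eigenvec_lines: plink .eigenvec rows (FID IID PC1 PC2 ...)
    pcs: zero-based PC indices to check (0 = PC1)
    max_mads: cutoff in units of the group's median absolute deviation (arbitrary)
    """
    groups = defaultdict(list)
    for line in eigenvec_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] == "FID":  # skip blanks and a header if present
            continue
        groups[fields[0]].append((fields[0], fields[1], [float(x) for x in fields[2:]]))

    flagged = []
    for members in groups.values():
        for j in pcs:
            values = [vec[j] for _, _, vec in members]
            median = statistics.median(values)
            mad = statistics.median(abs(v - median) for v in values)
            if mad == 0:
                continue  # too little spread (or too few people) to judge
            for fid, iid, vec in members:
                if abs(vec[j] - median) > max_mads * mad and (fid, iid) not in flagged:
                    flagged.append((fid, iid))
    return flagged


# Toy rows with PC1 and PC2 only; druze07 sits far from the other Druze on PC1.
toy = [
    "Druze druze01 0.050 0.010",
    "Druze druze02 0.051 0.011",
    "Druze druze03 0.049 0.009",
    "Druze druze04 0.052 0.010",
    "Druze druze05 0.048 0.012",
    "Druze druze06 0.050 0.008",
    "Druze druze07 -0.040 0.010",
]
for fid, iid in flag_group_outliers(toy):
    print(fid, iid)  # FID IID lines, the format --remove reads
```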

Then visualize!

To make use of the pairwise F_st you need the fst.R script. If everything is set up right, all you need to do is type:
source("fst.R")

It will load the file and generate the tree. You can modify the script so you have an unrooted tree too.

The R script is what generates the FstMatrix.csv file, which has the matrix you know and love.
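For readers curious what goes into such a matrix, here is a sketch of one common pairwise estimator, Hudson’s Fst in the form given by Bhatia et al. (2013), combined across SNPs as a ratio of averages. This is an assumption for illustration: fst.R and the pairwise script may use a different estimator, so the numbers need not match FstMatrix.csv. The input frequencies are made up.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's Fst (Bhatia et al. 2013 form), ratio of averages over SNPs.

    freqs1, freqs2: sample frequency of the same allele at each SNP in each population
    n1, n2: number of sampled chromosomes (twice the number of diploid individuals)
    """
    numerator = denominator = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # squared frequency difference, minus the expected sampling-noise contribution
        numerator += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # probability that one allele drawn from each population differs
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator


# Made-up allele frequencies at three SNPs, 50 diploid individuals (100 chromosomes) each.
fst = hudson_fst([0.1, 0.5, 0.9], [0.2, 0.5, 0.6], 100, 100)
print(round(fst, 4))
```

Summing numerator and denominator separately before dividing, rather than averaging per-SNP ratios, keeps rare or low-diversity SNPs from dominating the estimate.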

So now you have the PCA, F_st and admixture. What else? Well, there’s treemix.

I set the block size to 1,000 SNPs (-k 1000) and turned on global rearrangement (-global). You can change the details in the perl script itself; look toward the bottom. I think the main utility of my script is that it generates the input files. The treemix package isn’t hard to run once you have those input files.
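The treemix input format itself is simple: a header line of population names, then one line per SNP with comma-separated allele counts for each population, gzipped. The perl script builds this for you; the sketch below only shows the shape of the file, assuming genotypes coded 0/1/2 with None for missing. The function names and toy genotypes are hypothetical.

```python
import gzip


def allele_counts(genotypes):
    """(counted allele, other allele) totals from 0/1/2 genotypes; None marks missing."""
    called = [g for g in genotypes if g is not None]
    counted = sum(called)
    return counted, 2 * len(called) - counted


def treemix_lines(pop_names, snps):
    """snps: one dict per SNP mapping population name -> list of 0/1/2 genotypes."""
    lines = [" ".join(pop_names)]
    for snp in snps:
        cells = []
        for pop in pop_names:
            a, b = allele_counts(snp[pop])
            cells.append(f"{a},{b}")
        lines.append(" ".join(cells))
    return lines


def write_treemix_input(path, pop_names, snps):
    """Write the gzipped file that treemix reads via -i."""
    with gzip.open(path, "wt") as fh:
        fh.write("\n".join(treemix_lines(pop_names, snps)) + "\n")


pops = ["Druze", "GreatBritain", "Yoruba"]
snps = [
    {"Druze": [0, 1, 2], "GreatBritain": [1, 1, None], "Yoruba": [2, 2, 2]},
    {"Druze": [0, 0, 1], "GreatBritain": [0, 1, 2], "Yoruba": [1, 0, 0]},
]
for line in treemix_lines(pops, snps):
    print(line)
```

The gzipped file would then go to treemix with -i, alongside options like the ones above (for example -k 1000 -global -m 1).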

Also, as you may know, treemix comes with R plotting functions. So run treemix with however many migration edges you want (you can have 0), and when the script is done, load R.

Then, at the R prompt:
source("src/plotting_funcs.R")
plot_tree("TreeMix")
But actually, you don’t need to do the above. I added code to pairwise.perl that generates a .png file of the treemix plot. It’s called TreeMix.TreeMix.Tree.png.

OK, so that’s it.

To review:

Download the zip or tar.xz file and decompress it. All the packages and scripts should be in there, along with a pretty big dataset of modern populations. If you are on Linux, you are good to go. If you are on a Mac, you need the Mac versions of admixture, plink, and treemix. I’ll warn you that compiling treemix can be kind of a pain. I’ve done it on Linux and Mac machines and gotten it to work, but sometimes it took time.

You need R and/or R Studio (or something like R Studio). Make sure to install the packages, or the scripts for visualizing results from PCA and pairwise F_st won’t work.*

The admixture output is already a .csv. The PCA also generates the expected output files. You may want to sort the results, so open them in a spreadsheet.

This is potentially just the start. But if you are a layperson with a nagging question and can’t wait for me, this could get you where you need to go!

* I wrote a lot of these things piecemeal and often a long time ago. It may be that not all the packages are even used. Don’t bother to tell me.

Drawing on the slate of human nature

Posted on July 11, 2018 by Razib Khan

Some of you have been reading me since 2002. Therefore, you’ve seen a lot of changes in my interests (and to a lesser extent, my life…no more cat pictures because my cats died). Whereas today I incessantly flog Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, in 2002 I would talk about Steven Pinker’s The Blank Slate: The Modern Denial of Human Nature quite a bit. The reason I don’t talk much about The Blank Slate now is that at some point in the 2000s I realized my future deep interests were going to be in population genetics, rather than behavior genetics and cognitive psychology. If you are not a specialist who follows the literature, who “reads the supplements,” you’re going to stop gaining anything more from books at a certain point.

Similarly, after I read In Gods We Trust: The Evolutionary Landscape of Religion, I read a lot of books on the cognitive anthropology of religion. Until I didn’t. Now that Harvey Whitehouse has teamed up with Peter Turchin, I suspect I’ll check in on this literature again.

But life comes at you fast. Today the broad thesis of The Blank Slate, that we are not a “blank slate,” seems so obviously correct that no one would argue with it. Rather, it is the implications of that thesis that are highly “problematic,” and social and cultural constructionism has operationally gone much further on the Left than it had in the early 2000s. To give a concrete example, you can admit that sex differences are real and significant, but you have to be very careful in mentioning or highlighting specific instances or cases where they matter.

Moving to a more controversial topic, for a long while I’ve pretty much ignored the genomic study of the normal variation of cognition. The reason is that until recently all the studies were very underpowered to detect much of anything. The sample sizes were too small in relation to the genetic architecture of the trait because of the “Fourth Law of Behavior Genetics.”

As 2018 proceeds I think we can say that we are now in new territory. On Twitter, Steve Hsu seems positively ecstatic over a paper that just came out in PNAS. His blog post, Game Over: Genomic Prediction of Social Mobility summarizes it pretty well, but you should read the open access paper.

Genetic analysis of social-class mobility in five longitudinal studies:

Genome-wide association study (GWAS) discoveries about educational attainment have raised questions about the meaning of the genetics of success. These discoveries could offer clues about biological mechanisms or, because children inherit genetics and social class from parents, education-linked genetics could be spurious correlates of socially transmitted advantages. To distinguish between these hypotheses, we studied social mobility in five cohorts from three countries. We found that people with more education-linked genetics were more successful compared with parents and siblings. We also found mothers’ education-linked genetics predicted their children’s attainment over and above the children’s own genetics, indicating an environmentally mediated genetic effect. Findings reject pure social-transmission explanations of education GWAS discoveries. Instead, genetics influences attainment directly through social mobility and indirectly through family environments.

Why does this matter? I’m assuming most of you have seen charts like the ones below, which “prove” how the game is rigged against the poor:

The problem that most behavior geneticists immediately have with these popular analyses, which now suffuse our public culture (e.g., the “representation” argument in academic science often takes as a cartoonish model that all groups will have equal representation in all fields absent discrimination; substantively, almost everyone believes this isn’t true in some way, but for the sake of argumentation it is a bullet-proof line of attack that every white male academic is going to retreat from), is that they ignore genetic confounds. This paper is an attempt to address that. Measure it. Quantify it. Characterize it.

The two most interesting results for me have to do with siblings and mothers. Unsurprisingly, siblings who have a higher predicted educational attainment score genetically tend to have higher educational attainment. As you know, siblings vary in relatedness. They vary in the segregation of alleles from their parents. Some siblings are tall. Some are short. This is due to variation in genetics across the pedigree. People within a family are related to each other, but unless you are talking Targaryens they aren’t exactly alike. Similarly, some siblings are smart and some are not so smart, because they’re “born that way.”

We knew that. Soon, I suspect, we’ll understand it genomically.

Second, we see again the importance of maternal effects and non-transmitted alleles. Mothers who have a higher predicted level of education have children with more education even if those children don’t inherit those alleles.* One natural conclusion here is mothers with a particular disposition shaped by genes are creating particular environments for their children, and those environments let them flourish even if they do not have their mother’s genetic endowments. This actually has “news you can use” implications in life choices people make in relation to their partners.

The study ends on a cautionary note. Residual population substructure can cause issues, and correcting for it can attenuate or eliminate such small, subtle signals. The sample sizes could always be bigger. And ethnically diverse panels have to come into the picture at some point.

But Razib abides. This study had a combined sample size of >20,000 individuals. Then you have the other recent paper with 270,000 individuals, Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. All well and good, but I wait for greater things. There is no shame in waiting for better things. And I prophesy that a greater sample size shall come to pass before this year turns into the new.

And you know what’s better than 1 million samples? How about 1 billion samples!

* Note that the models are controlling for a lot of background socioeconomic variables.

The coming end of 150 years of the USA as the largest economy

Posted on July 10, 2018 by Razib Khan

Most projections predict that China will be the largest economy by 2030. This got me thinking: when exactly did the USA surpass other nations? I knew it was in the 19th century, but I wasn’t sure exactly when.

GDP estimates are always somewhat dicey, and they were even more so in the past. But the above plot* is representative of what you can find online: the USA became #1 in the decade or so after the Civil War. What surprised me is that the nation it surpassed was China! Around 1880 the USA overtook China, and around 2030 China will overtake the USA. That’s 150 years of American singular economic dominance. Curiously, for a period India was #3, just as it will be in 2030 (though its GDP will be far lower than #2 USA by most estimates).

I am aware that on a per capita basis America will be the most affluent large society in the world for decades beyond the point when its economy is not the largest. My only observation is that we are living to see the end of a particular phase in world history.

One thing I wonder about: in the late 19th and early 20th centuries, America to some extent refused to take over the role of the world’s preeminent power from the United Kingdom long after it had become the most consequential economic force. To be frank, it was clear in the early 20th century that the UK was simply no longer up to the task, and arguably a great deal of suffering might have been alleviated if the United States had stepped into its natural role earlier. Now I wouldn’t be surprised if the inverse occurred in the second quarter of the 21st century: the USA, like Britain, continues to play the role of hyperpower hegemon long after it is able to carry out that role credibly. I hope I’m wrong.

* Data from Barry Ritholtz’s blog.

The hegemon and world-citizen

Posted on July 10, 2018 by Razib Khan

On occasion, I read a book…and forget its title. I usually manage to recall the title at some point. For the past five years or so I’ve been trying to recall a book I read on Asian diplomatic history written by a Korean American scholar. Today I finally recalled that book: East Asia Before the West: Five Centuries of Trade and Tribute.

The reason I’ve been trying to remember this book is that I’ve felt it told a story which is more relevant today than in the late 2000s, when the book was written and published. From the summary:

Focusing on the role of the “tribute system” in maintaining stability in East Asia and in fostering diplomatic and commercial exchange, Kang contrasts this history against the example of Europe and the East Asian states’ skirmishes with nomadic peoples to the north and west. Although China has been the unquestioned hegemon in the region, with other political units always considered secondary, the tributary order entailed military, cultural, and economic dimensions that afforded its participants immense latitude. Europe’s “Westphalian” system, on the other hand, was based on formal equality among states and balance-of-power politics, resulting in incessant interstate conflict.

Here’s my not-so-counterintuitive prediction: as China flexes its geopolitical muscles, it will revert to form in substance, forging a foreign policy predicated on hierarchical relationships between states, while maintaining outward adherence to the system of European diplomacy which crystallized between the Peace of Westphalia and the Congress of Vienna, and which emphasized equality between states. “Diplomacy with Chinese characteristics,” if you will.

The main interesting thing about Bangladeshi genetics is how East Asian Bangladeshis are

Posted on July 9, 2018 by Razib Khan

I got a question on one of my other weblogs about endogamy among Bangladeshis, as well as their relatedness to western (e.g., Iranian) and eastern (e.g., Southeast Asian) populations. Instead of talking, what do the data say? Most of you have probably seen me write about this before, but I think it might be useful to post again for Google (or Quora, as Quora seems to like my blog posts as references).

The 1000 Genomes Project sampled a whole lot of Bangladeshis in Dhaka. The figure at the top shows that the Bangladeshis overwhelmingly form a relatively tight cluster that is strongly shifted toward East Asians. There is one exception: about five individuals, several of whom were collected right after each other (their sample IDs are sequential), who show almost no East Asian shift.

Open Thread, 07/09/2018

Posted on July 9, 2018 by Razib Khan

My review of The University We Need: Reforming American Higher Education is up at National Review Online (it’s already posted to my total content feed). The book’s publication date is tomorrow.

A review can only pack in so many things. So if there is something missing that seems obvious, it’s probably something that I cut in the interest of space (e.g., the author is not a fan of the emphasis on football and such at many universities, but I didn’t touch on that in the review). The University We Need is a short book, but it’s very dense in ideas and suggestions. Unfortunately, comments on NRO and Twitter indicate many people haven’t really read the review, so they won’t read the book.

Surely one reason I enjoyed the book is that the author is someone with whom I’m coincidentally on the same wavelength. I first encountered his work nearly twenty years ago, when I read A History of the Byzantine State and Society, a ~1,000-page survey of the topic. In many ways a scholarly “core dump”, it has stood me in good stead all these years. But at the time I was totally unaware that the author, Warren Treadgold, and I shared broadly similar politics in the grand scheme of things. That is, we were intellectually oriented people who were also not on the Left.

I don’t consider myself a conservative intellectual. I’m just an intellectual who happens to be conservative because the Left terrifies me (I have real personal reasons!). Treadgold’s work similarly is not informed by him being a conservative intellectual. Rather, he’s a scholar whose views default to the Right as opposed to the center or Left given where the dominant tendency in academia lies today.

I’m currently reading A History of Japan. I think I’m getting stale and predictable. A few weeks ago I read John Keay’s Midnight’s Descendants: A History of South Asia since Partition really quickly. I need to move beyond my tendency to read long histories and lots of genetics papers.

I have a stack of books on cognitive psychology and cultural evolution I need to get through, though I think papers are probably more useful in the latter area, since I’ve read a fair number of books already on this topic (e.g., Cultural Evolution: How Darwinian Theory Can Explain Human Culture and Synthesize the Social Sciences).

Speaking of psychology, there are some really good podcasts in that field out. Part of it is that there is so much to talk about with the replication crisis. I really enjoyed Two Psychologists Four Beers, for example. Though, not surprisingly, they still sort of mischaracterize the views and issues of conservatives or non-liberals in academia…there are so few who are “out” and vocal with their politically normal colleagues that people just don’t know what’s going on in their heads, so it’s easy to mischaracterize them.

This is the week to follow the #SMBE2018 hashtag on Twitter. I assume a lot of papers are going to come out in the weeks after people present at SMBE.

Estimating recent migration and population size surfaces. This seems important. Definitely going to read.

How eliminating the ‘kill box’ turned Mosul into a meat-grinder.

Genetic analysis of social-class mobility in five longitudinal studies.

Male homosexuality and maternal immune responsivity to the Y-linked protein NLGN4Y.

Hung out with Stuart Ritchie this week. Still recommend his book, Intelligence: All That Matters.

There was some discussion on ancient DNA and archaeology on Twitter. Has ancient DNA changed everything? Or not?

First, I think it’s important to acknowledge that many of the models which have emerged from ancient DNA are resurrections of older anthropological, archaeological, and historical frameworks, which emphasize migration. But these were long dismissed within many of these fields. Like David Reich in Who We Are and How We Got Here, I believe that there was a political rationale for this. As someone who has read deeply in paleoanthropology and history for twenty years, I reject the idea that ancient DNA is not actually that revolutionary, because I remember what passed for conventional wisdom 10-20 years ago.