Identifying subpopulations within a study and inferring the intercontinental ancestry of its samples are important steps in genome-wide association studies. Two software packages are widely used to analyze population substructure: Structure and Eigenstrat. Structure assigns each individual to a population using a Bayesian method with multiple tuning parameters. It requires considerable computational time for thousands of samples and cannot produce scores that could be used as covariates. Eigenstrat uses principal component analysis to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin: the eigenvectors it generates are sample specific and therefore cannot be generalized to other individuals.
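The distinction between sample-specific eigenvectors and something reusable can be illustrated with a minimal PCA sketch. It assumes a samples × SNPs matrix of 0/1/2 allele counts with no missing data and uses a common (x − 2p)/√(2p(1−p)) standardization. The function names `pca_fit` and `project` are hypothetical, and this is neither Eigenstrat's code nor the FastPop method. The per-sample eigenvectors (columns of `U`) change whenever the sample set changes. The SNP loadings (`Vt`), together with the reference allele frequencies, can instead be applied to new individuals without refitting.

```python
import numpy as np


def standardize(G, p, scale):
    """Center allele counts by 2p and divide by the per-SNP scale."""
    return (G - 2.0 * p) / scale


def pca_fit(G, k):
    """Fit PCA on a (samples x SNPs) 0/1/2 genotype matrix.

    Returns the sample-specific eigenvectors U, singular values S,
    SNP loadings Vt, and the reference allele frequencies and scales
    needed to score other individuals later.
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                 # reference allele frequency per SNP
    scale = np.sqrt(2.0 * p * (1.0 - p))
    scale[scale == 0] = 1.0                  # avoid dividing by zero for monomorphic SNPs
    X = standardize(G, p, scale)
    # X = U @ diag(S) @ Vt. Rows of U describe only the fitted samples;
    # rows of Vt are per-SNP weights that can be reused on new genotypes.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k], p, scale


def project(G_new, Vt_k, p, scale):
    """Score new individuals on the fitted axes using the reference p and scale."""
    X_new = standardize(np.asarray(G_new, dtype=float), p, scale)
    return X_new @ Vt_k.T


# Usage: two simulated groups with very different allele frequencies (toy data).
rng = np.random.default_rng(0)
G = np.vstack([
    rng.binomial(2, 0.1, size=(50, 500)),
    rng.binomial(2, 0.9, size=(50, 500)),
])
U, S, Vt, p, scale = pca_fit(G, k=2)
scores = project(G, Vt, p, scale)
```

Projecting the fitted samples reproduces `U * S`, so the loadings carry the same axes as the eigenvectors. Unlike `U`, they can be applied to individuals outside the original study. The sign of each component is arbitrary and may flip between fits.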
Dartmouth Digital Commons Citation
Li, Yafang; Byun, Jinyoung; Cai, Guoshuai; Xiao, Xiangjun; Han, Younghun; Cornelis, Olivier; Dinulos, James E.; Dennis, Joe; Easton, Douglas; Gorlov, Ivan; Seldin, Michael F.; and Amos, Christopher I., "FastPop: A Rapid Principal Component Derived Method to Infer Intercontinental Ancestry Using Genetic Data" (2016). Dartmouth Scholarship. 564.