Probabilistic models of genetic variation in structured populations applied to global human studies.

Title: Probabilistic models of genetic variation in structured populations applied to global human studies.
Publication Type: Journal Article
Year of Publication: 2016
Authors: Hao, W, Song, M, Storey, JD
Journal: Bioinformatics
Volume: 32
Issue: 5
Pagination: 713-21
Date Published: 2016 Mar 01
ISSN: 1367-4811
Keywords: Algorithms, Genetic Variation, Genotype, Humans, Models, Statistical, Probability, Software
Abstract

MOTIVATION: Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most prominent work on this problem has focused on estimating a model of admixture proportions of ancestral populations for each individual. Here, we instead focus on modeling variation of the genotypes without requiring a higher-level admixture interpretation.

RESULTS: We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis can be utilized to estimate a general model that includes the well-known Pritchard-Stephens-Donnelly admixture model as a special case. Noting some drawbacks of this approach, we introduce a new 'logistic factor analysis' framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the Human Genome Diversity Panel and 1000 Genomes Project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions.
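The two estimators described above can be illustrated with a minimal NumPy sketch. It assumes genotypes coded 0/1/2 in an m × n matrix (SNPs by individuals), no missing values, and x_ij ~ Binomial(2, π_ij). This is not the lfa package's code. The function names `pca_allele_freqs`, `estimate_logistic_factors`, `fit_lfa_coefficients` and `differentiation_stat` are hypothetical. The way the latent factors H are estimated (an SVD of the logit of clipped PCA-based estimates) and the per-SNP deviance statistic are simplifications chosen for illustration only.

```python
import numpy as np


def pca_allele_freqs(X, d):
    """PCA-based estimate of the m x n matrix F of genotype probabilities.

    X: m SNPs x n individuals, genotypes coded 0/1/2, no missing values.
    d: model dimension (intercept plus the top d - 1 principal components).
    """
    mu = X.mean(axis=1, keepdims=True)              # per-SNP mean genotype
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    # Projection of X onto span{1, v_1, ..., v_{d-1}}: row means + rank-(d-1) fit.
    Xhat = mu + (U[:, :d - 1] * s[:d - 1]) @ Vt[:d - 1]
    return Xhat / 2                                 # NOT constrained to [0, 1]


def estimate_logistic_factors(X, d, eps=1e-3):
    """Hypothetical heuristic for H (d x n): SVD of logit of clipped PCA estimates."""
    F = np.clip(pca_allele_freqs(X, d), eps, 1 - eps)
    Z = np.log(F / (1 - F))                         # logit transform
    Z = Z - Z.mean(axis=1, keepdims=True)           # center each SNP
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return np.vstack([Vt[:d - 1], np.ones(X.shape[1])])  # last row = intercept


def fit_lfa_coefficients(X, H, n_iter=25):
    """Fit logit(pi_ij) = a_i . h_j per SNP by Newton-Raphson.

    Assumes x_ij ~ Binomial(2, pi_ij) and that every SNP is polymorphic
    (no safeguards against separation). Returns A with shape m x d.
    """
    m, d = X.shape[0], H.shape[0]
    A = np.zeros((m, d))
    for i in range(m):
        a = np.zeros(d)
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-(a @ H)))     # pi_ij for SNP i
            grad = H @ (X[i] - 2 * p)               # score of the log-likelihood
            info = (H * (2 * p * (1 - p))) @ H.T    # Fisher information
            a = a + np.linalg.solve(info, grad)
        A[i] = a
    return A


def differentiation_stat(X, H, A):
    """Per-SNP deviance of the LFA fit against one shared frequency per SNP."""
    P = 1.0 / (1.0 + np.exp(-(A @ H)))
    p0 = X.mean(axis=1, keepdims=True) / 2          # structure-free MLE

    def loglik(Q):
        # Binomial(2, q) log-likelihood summed over individuals (constant dropped).
        return (X * np.log(Q) + (2 - X) * np.log(1 - Q)).sum(axis=1)

    return 2 * (loglik(P) - loglik(p0))
```

Nothing in `pca_allele_freqs` keeps the fitted values inside [0, 1], which is why the sketch clips before taking logits. The logistic model produces probabilities in (0, 1) by construction. With `X` a float array of shape (m, n), calling `H = estimate_logistic_factors(X, d)`, then `A = fit_lfa_coefficients(X, H)`, then `differentiation_stat(X, H, A)` gives one statistic per SNP. Ranking SNPs by this statistic is one simple way to flag strongly differentiated loci. It is not necessarily the criterion used in the paper.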

AVAILABILITY AND IMPLEMENTATION: A Bioconductor R package called lfa is available at http://www.bioconductor.org/packages/release/bioc/html/lfa.html

CONTACT: jstorey@princeton.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

DOI: 10.1093/bioinformatics/btv641
Alternate Journal: Bioinformatics
PubMed ID: 26545820
PubMed Central ID: PMC4795615
Grant List: R01 HG006448 / HG / NHGRI NIH HHS / United States