John Storey

Professor of Lewis-Sigler Institute of Integrative Genomics
Director of Center for Statistics and Machine Learning
Contact
jstorey@princeton.edu
Research Area
Genetics & Genomics
Research Focus
Quantitative and functional genomics
My lab develops and applies quantitative methods in genomics. We are particularly focused on functional genomics problems involving high-dimensional data sets, such as those obtained from large-scale genotyping, gene expression monitoring, and mass spectrometry-based proteomics. Because our research deals with large amounts of noisy data, we also develop theory and methods for statistics and machine learning.
This is an especially exciting time for quantitative genomics, as many studies are underway that involve multiple types of large-scale data. For example, we are working on studies involving high-throughput measurements on mRNA expression, protein expression, metabolite levels, protein-DNA binding, chromatin structure, and DNA sequences.
The overarching goal of our research is to use multiple sources of high-throughput genomic data to understand biological regulatory networks and the molecular basis of complex traits. This involves characterizing the "wiring diagram" of the molecular biology of the cell. The ultimate goal is to build a quantitative system for understanding how the hard-wired components of a cell, such as DNA sequence and epigenetic factors, interact with the environment to determine the dynamic molecular behavior of the cell, as manifested in variables such as RNA expression, protein expression, and enzymatic activity, and eventually in complex traits.
Specific problems we are working on include:
- Inferring causal regulatory networks from studies involving high-throughput molecular profiling (e.g., RNA and protein expression) and large-scale genotyping.
- Decomposing sources of gene expression variation in complex clinical and experimental settings.
- Understanding the genetic and epigenetic determinants of the gene expression program.
- Developing quantitative approaches to providing a causal "molecular dissection" of complex traits.
- Understanding the relationship between evolutionary forces driving natural genetic variation and its effect on variation in expression levels of gene products.
- Developing new theory and methods for high-dimensional statistical inference, large-scale significance testing, and machine learning.
Publications
- Author Correction: Genome-wide real-time in vivo transcriptional dynamics during Plasmodium falciparum blood-stage development. Nat Commun. 2022;13(1):1497.
- The functional false discovery rate with applications to genomics. Biostatistics. 2021;22(1):68-81.
- Estimating FST and kinship for arbitrary population structures. PLoS Genet. 2021;17(1):e1009241.
- The optimal discovery procedure for significance analysis of general gene expression studies. Bioinformatics. 2021;37(3):367-374.
- Extending Tests of Hardy-Weinberg Equilibrium to Structured Populations. Genetics. 2019;213(3):759-770.
- A Likelihood-Free Estimator of Population Structure Bridging Admixture Models and Principal Components Analysis. Genetics. 2019;212(4):1009-1029.
- Genome-wide real-time in vivo transcriptional dynamics during Plasmodium falciparum blood-stage development. Nat Commun. 2018;9(1):2656.
- Probabilistic models of genetic variation in structured populations applied to global human studies. Bioinformatics. 2016;32(5):713-21.
- Scaling probabilistic models of genetic variation to millions of humans. Nat Genet. 2016;48(12):1587-1590.
- A nested parallel experiment demonstrates differences in intensity-dependence between RNA-seq and microarrays. Nucleic Acids Res. 2015;43(20):e131.
- Beyond the E-Value: Stratified Statistics for Protein Domain Prediction. PLoS Comput Biol. 2015;11(11):e1004509.
- Testing for genetic associations in arbitrarily structured populations. Nat Genet. 2015;47(5):550-4.
- Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics. 2015;31(4):545-54.
- subSeq: determining appropriate sequencing depth through efficient read subsampling. Bioinformatics. 2014;30(23):3424-6.
- Gene expression profiles associated with acute myocardial infarction and risk of cardiovascular death. Genome Med. 2014;6(5):40.