and the Lewis-Sigler Institute for Integrative Genomics
Computer Science Bldg, 420
Computational molecular biology
My group focuses on developing and applying computational techniques to problems in molecular biology. We are particularly interested in developing algorithms for genome-level analysis of protein structure and protein-protein interactions.
Since a genome contains a complete 'parts list' of an organism, whole-genome data makes it possible to begin addressing exhaustively the problem of determining and predicting which proteins can interact with each other. Traditionally, knowledge of protein-protein interactions has been accumulated from biochemical and genetic experiments; however, as whole-genome data accumulates, it becomes increasingly necessary to develop computational methods for predicting these interactions. Computational methods have already proven to be a useful first step for rapid genome-wide identification of putative protein function and structure, but research on the problem of computationally determining biologically relevant partners for given protein sequences is just beginning.
The difficulty of the general protein structure prediction problem precludes prediction at a detailed structural level (e.g., at the atomic level). Additionally, the constraint of genomic-level analysis favors a focus on fast, informatics-based methods. Thus, we simplify the problem of predicting protein-protein interactions in two complementary ways, one structural and the other genomic. Our structural approach has been to focus on particular structural motifs that mediate protein-protein interactions, and to develop fast computational methods both for recognizing these motifs within protein sequences and for predicting which of these sequences interact with each other. Our genomic approach has been to exploit and integrate information gleaned from whole- and cross-genome analysis. Instead of explicitly using information about protein structure, these methods exploit the following ideas: (1) if two proteins interact in one genome, their homologues in other genomes are likely to interact as well; and (2) regulatory information present in whole-genome sequence data or genome-wide expression data can be used to make predictions about protein function and protein-protein interactions.
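Idea (1) above, transferring known interactions to homologous proteins in another genome, can be illustrated with a small sketch. This is only an illustration under simple assumptions, not the group's actual method: the function name `transfer_interactions`, the dictionary-based homologue map, and all protein identifiers are hypothetical, and a real analysis would also weigh the strength of the homology evidence.

```python
from itertools import product


def transfer_interactions(known_interactions, homologs):
    """Propose candidate interactions in a target genome.

    known_interactions: iterable of (protein_a, protein_b) pairs that are
        known to interact in a source genome.
    homologs: dict mapping a source-genome protein to the set of its
        homologues in the target genome (hypothetical input format).
    Returns a set of unordered candidate pairs (frozensets).
    """
    predicted = set()
    for a, b in known_interactions:
        # Pair every homologue of a with every homologue of b.
        for x, y in product(homologs.get(a, ()), homologs.get(b, ())):
            if x != y:  # this sketch ignores self-interactions
                predicted.add(frozenset((x, y)))
    return predicted


# Hypothetical toy data; the identifiers are placeholders.
source_interactions = [("srcA", "srcB"), ("srcB", "srcC")]
homolog_map = {"srcA": {"tgtA1", "tgtA2"}, "srcB": {"tgtB"}}

candidates = transfer_interactions(source_interactions, homolog_map)
```

Here the srcA-srcB interaction yields two candidate pairs in the target genome, while srcB-srcC yields none because srcC has no listed homologue.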
Thus far, much of our work on predicting protein structure and protein-protein interactions has focused on the coiled-coil motif. The coiled coil is a common and important structural motif that mediates protein-protein interactions, and is found in proteins involved in transcription, in cell-cell and viral-cell fusion events, and in maintaining the structural integrity of cells. We have developed highly effective sequence-based methods for identifying whether a given protein sequence can take part in a coiled-coil structure, and are currently developing novel computational techniques to predict whether two coiled-coil proteins interact with each other and, if so, what the nature of this interaction is.
Song J, Singh M. (2013) From hub proteins to hub modules: the relationship between essentiality and centrality in the yeast interactome at different scales of organization. PLoS Comput Biol. 9: e1002910. PubMed
Khan Z, Bloom JS, Amini S, Singh M, Perlman DH, Caudy AA, Kruglyak L. (2012) Quantitative measurement of allele-specific protein expression in a diploid yeast hybrid by LC-MS. Mol Syst Biol. 8: 602. PubMed
Khan Z, Amini S, Bloom JS, Ruse C, Caudy AA, Kruglyak L, Singh M, Perlman DH, Tavazoie S. (2011) Accurate proteome-wide protein quantification from high-resolution 15N mass spectra. Genome Biol. 12: R122. PubMed
Persikov AV, Singh M. (2011) An expanded binding model for Cys2His2 zinc finger protein-DNA interfaces. Phys Biol. 8: 035010. PubMed
Capra JA, Paeschke K, Singh M, Zakian VA. (2010) G-quadruplex DNA sequences are evolutionarily conserved and associated with distinct genomic features in Saccharomyces cerevisiae. PLoS Comput Biol. 6: e1000861. PubMed
Jiang P, Singh M. (2010) SPICi: a fast clustering algorithm for large biological networks. Bioinformatics. 26: 1105-1111. PubMed
Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. (2009) Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol. 5: e1000585. PubMed
Song J, Singh M. (2009) How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics. 25: 3143-3150. PubMed
Khan Z, Bloom JS, Garcia BA, Singh M, Kruglyak L. (2009) Protein quantification across hundreds of experimental conditions. Proc Natl Acad Sci. 106: 15544-15548. PubMed
Bloom JS, Khan Z, Kruglyak L, Singh M, Caudy AA. (2009) Measuring differential gene expression by short read sequencing: quantitative comparison to 2-channel gene expression microarrays. BMC Genomics. 10: 221. PubMed
Khan Z, Bloom JS, Kruglyak L, Singh M. (2009) A practical algorithm for finding maximal exact matches in large sequence data sets using sparse suffix arrays. Bioinformatics 25: 1609-1616. PubMed
Yanover C, Singh M, Zaslavsky E. (2009) M are better than one: an ensemble-based motif finder and its application to regulatory element prediction. Bioinformatics. 25: 868-874. PubMed
Persikov AV, Osada R, Singh M. (2008) Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics 25: 22-29. PubMed
Banks E, Nabieva E, Chazelle B, Singh M. (2008) Organization of physical interactomes as uncovered by network schemas. PLoS Comput Biol. 4: e1000203. PubMed
Banks E, Nabieva E, Peterson R, Singh M. (2008) NetGrep: fast network schema searches in interactomes. Genome Biol. 9: R138. PubMed
Capra JA, Singh M. (2008) Characterization and prediction of residues determining protein functional specificity. Bioinformatics 24: 1473-1480. PubMed
Capra JA, Singh M (2007). Predicting functionally important residues from sequence conservation. Bioinformatics. 23: 1875-1882. PubMed
Nabieva E, Jim K, Agarwal A, Chazelle B, Singh M (2005). Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics 21, Suppl 1: i302-310. PubMed
Kingsford CL, Chazelle B, Singh M (2005). Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics 21: 1028-1036. PubMed
Osada R, Zaslavsky E, Singh M (2004). Comparative analysis of methods for representing and searching for transcription factor binding sites. Bioinformatics 20: 3516-3525. PubMed
Brooks DJ, Fresco JR, Singh M (2004). A novel method for estimating ancestral amino acid composition and its application to proteins of the Last Universal Ancestor. Bioinformatics 20: 2251-2257. PubMed
Jim K, Parmar K, Singh M, Tavazoie S (2004). A cross-genomic approach for systematic mapping of phenotypic traits to genes. Genome Res 14: 109-115. PubMed
Fong JH, Keating AE, Singh M (2004). Predicting specificity in bZIP coiled-coil protein interactions. Genome Biol 5: R11. PubMed
Brooks DJ, Fresco JR, Lesk AM, Singh M (2002). Evolution of amino acid frequencies in proteins over deep time: inferred order of introduction of amino acids into the genetic code. Mol Biol Evol 19: 1645-1655. PubMed
Malashkevich VN, Singh M, Kim PS (2001). The trimer-of-hairpins motif in membrane fusion: Visna virus. Proc Natl Acad Sci USA 98: 8502-8506. PubMed
Singh M and Kim PS (2001). Towards predicting coiled-coil protein interactions. Proceedings of the 5th Annual International Conference on Computational Molecular Biology (RECOMB), Montreal, CA: ACM. p 279–286.
Zhao X, Singh M, Malashkevich VN and Kim PS (2000). Structural characterization of the human respiratory syncytial virus fusion protein core. Proc Natl Acad Sci USA 97: 14172-14177. PubMed
Singh M, Berger B and Kim PS (1999). LearnCoil-VMF: computational evidence for coiled-coil-like motifs in many viral membrane-fusion proteins. J Mol Biol 290: 1031-1041. PubMed
Berger B and Singh M (1997). An iterative method for improved protein structural motif recognition. J Comput Biol 4: 261-273. PubMed