Improved inference of tandem domain duplications.

Title: Improved inference of tandem domain duplications.
Publication Type: Journal Article
Year of Publication: 2021
Authors: Aluru, C; Singh, M
Journal: Bioinformatics
Volume: 37
Issue: Suppl_1
Pagination: i133-i141
Date Published: 2021-07-12
ISSN: 1367-4811
Keywords: Algorithms; Evolution, Molecular; Gene Duplication; Humans; Phylogeny; Programming, Linear; Protein Domains
Abstract

MOTIVATION: Protein domain duplications are a major contributor to the functional diversification of protein families. These duplications can occur one at a time through single domain duplications, or as tandem duplications where several consecutive domains are duplicated together as part of a single evolutionary event. Existing methods for inferring domain-level evolutionary events are based on reconciling domain trees with gene trees. While some formulations consider multiple domain duplications, they do not explicitly model tandem duplications; this leads to inaccurate inference of which domains duplicated together over the course of evolution.

RESULTS: Here, we introduce a reconciliation-based framework that considers the relative positions of domains within extant sequences. We use this information to uncover tandem domain duplications within the evolutionary history of these genes. We devise an integer linear programming approach that solves our problem exactly, and a heuristic approach that works well in practice. We perform extensive simulation studies to demonstrate that our approaches can accurately uncover single and tandem domain duplications, and additionally test our approach on a well-studied orthogroup where lineage-specific domain expansions exhibit varying and complex domain duplication patterns.

AVAILABILITY AND IMPLEMENTATION: Code is available on github at https://github.com/Singh-Lab/TandemDuplications.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

DOI: 10.1093/bioinformatics/btab329
Alternate Journal: Bioinformatics
PubMed ID: 34252920
PubMed Central ID: PMC8275333
Grant List: R01 GM076275 / GM / NIGMS NIH HHS / United States
ABI-1458457 / / National Science Foundation /