Literature curation of protein interactions: measuring agreement across major public databases.

Title: Literature curation of protein interactions: measuring agreement across major public databases.
Publication Type: Journal Article
Year of Publication: 2010
Authors: Turinsky, A. L., Razick S., Turner B., Donaldson I. M., and Wodak S. J.
Journal: Database (Oxford)
Volume: 2010
Pagination: baq026
Date Published: 2010
ISSN: 1758-0463
Keywords: Analysis of Variance, Animals, Database Management Systems, Databases, Protein, Humans, Internet, Protein Interaction Domains and Motifs, Protein Interaction Mapping, Protein Isoforms, Proteins, Proteomics
Abstract

Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community. Database URL: http://wodaklab.org/iRefWeb.

DOI: 10.1093/database/baq026
Alternate Journal: Database (Oxford)
PubMed ID: 21183497
PubMed Central ID: PMC3011985
Grant List: MOP#82940 / / Canadian Institutes of Health Research / Canada