Extracting high confidence protein interactions from affinity purification data: at the crossroads.

Title: Extracting high confidence protein interactions from affinity purification data: at the crossroads.
Publication Type: Journal Article
Year of Publication: 2015
Authors: Pu, S., J. Vlasblom, A. Turinsky, E. Marcon, S. Phanse, S. Smiley Trimble, J. Olsen, J. Greenblatt, A. Emili, and S. J. Wodak
Journal: J Proteomics
Volume: 118
Pagination: 63-80
Date Published: 2015 Apr 06
ISSN: 1876-7737
Keywords: Databases, Protein; Humans; Mass Spectrometry; Saccharomyces cerevisiae; Saccharomyces cerevisiae Proteins
Abstract

Deriving protein-protein interactions (PPIs) from data generated by affinity-purification and mass spectrometry (AP-MS) techniques requires the application of scoring methods to measure the reliability of detected putative interactions. Choosing the appropriate scoring method has become a major challenge. Here we apply six popular scoring methods to the same AP-MS dataset and compare their performance. The comparison was carried out for six distinct datasets from human, fly and yeast, which focus on different biological processes and differ in their coverage of the proteome. Results show that the performance of a given scoring method may vary substantially depending on the dataset. Disturbingly, we find that the high confidence (HC) PPI networks built by applying the six scoring methods to the same raw AP-MS dataset display very poor overlap, with only 1.7-4.1% of the HC interactions present in all the networks built, respectively, from the proteome-wide human, fly or yeast datasets. Various properties of the shared versus unique interactions in each network, including biases in protein abundance, suggest that current scoring methods are able to eliminate only the most obvious contaminants, but still fail to reliably single out specific interactions from the large body of spurious associations detected in the AP-MS experiments.

BIOLOGICAL SIGNIFICANCE: The fast progress in AP-MS techniques has prompted the development of a multitude of scoring methods, which are relied upon to remove contaminants and non-specific binders. Choosing the appropriate scoring scheme for a given AP-MS dataset has become a major challenge. The comparative analysis of six of the most popular scoring methods, presented here, reveals that overall these methods do not perform as expected. Evidence is provided that this is due to three closely related issues: the high 'noise' levels of the raw AP-MS data, the limited capacity of current scoring methods to deal with such high noise levels, and the biases introduced by using Gold Standard datasets to benchmark the scoring functions and threshold the networks. For the field to move forward, all three issues will have to be addressed.

This article is part of a Special Issue entitled: Protein dynamics in health and disease. Guest Editors: Pierre Thibault and Anne-Claude Gingras.
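
The overlap figure quoted in the abstract can be illustrated with a small sketch. The abstract does not state which denominator the authors used, so the code below assumes the shared fraction is the number of interactions found in every network divided by the number found in at least one network. The function names, method labels and protein identifiers are all hypothetical, and this is not the authors' pipeline.

```python
def normalize(pairs):
    """Turn (protein_a, protein_b) pairs into an undirected edge set.

    Each edge is stored as a frozenset so that (A, B) and (B, A) are
    treated as the same interaction; self-interactions are dropped.
    """
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def shared_fraction(networks):
    """Fraction of interactions present in every network.

    ASSUMPTION: the denominator is the union of all networks. The
    abstract does not specify the denominator, so this is only one
    plausible reading.
    """
    edge_sets = [normalize(pairs) for pairs in networks.values()]
    if not edge_sets:
        return 0.0
    union = set().union(*edge_sets)
    common = set.intersection(*edge_sets)
    return len(common) / len(union) if union else 0.0


# Toy high-confidence networks from three hypothetical scoring methods.
toy_networks = {
    "method_a": [("P1", "P2"), ("P2", "P3"), ("P3", "P4")],
    "method_b": [("P2", "P1"), ("P4", "P5")],
    "method_c": [("P1", "P2"), ("P2", "P3"), ("P5", "P6")],
}

print(f"Shared by all methods: {shared_fraction(toy_networks):.1%}")
```

In this toy case only the P1-P2 interaction appears in all three networks, out of five distinct interactions overall.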

DOI: 10.1016/j.jprot.2015.03.009
Alternate Journal: J Proteomics
PubMed ID: 25782749
Grant List: MOP #82940 / Canadian Institutes of Health Research / Canada