Characterization and prediction of the binding site in DNA-binding proteins: improvement of accuracy by combining residue composition, evolutionary conservation and structural parameters.

Title: Characterization and prediction of the binding site in DNA-binding proteins: improvement of accuracy by combining residue composition, evolutionary conservation and structural parameters.
Publication Type: Journal Article
Year of Publication: 2012
Authors: Dey, S., A. Pal, M. Guharoy, S. Sonavane, and P. Chakrabarti
Journal: Nucleic Acids Res
Volume: 40
Issue: 15
Pagination: 7150-61
Date Published: 2012 Aug
ISSN: 1362-4962
Keywords: Amino Acids; Binding Sites; DNA; DNA-Binding Proteins; Evolution, Molecular; Hydrogen Bonding; Protein Conformation; Reproducibility of Results; RNA-Binding Proteins; Support Vector Machines
Abstract

We present a set of four parameters that in combination can predict DNA-binding residues on protein structures to a high degree of accuracy. These are the number of evolutionarily conserved residues (N(cons)) and their spatial clustering (ρ(e)), hydrogen bond donor capability (D(p)) and residue propensity (R(p)). We first used these parameters to characterize 130 interfaces in a set of 126 DNA-binding proteins (DBPs). We then analyzed the ability of these parameters, both individually and in combination, to distinguish the true binding region from the rest of the protein surface. R(p) shows the best performance, identifying the true interface with the top rank in 83% of cases. Importantly, we also used the unbound-bound test cases of the protein-DNA docking benchmark to test the efficacy of our method. When applied to the unbound form of the DBPs, R(p) can distinguish the true interface in 86% of cases. Finally, we applied a support vector machine (SVM) approach to recognize the interface region, using the above parameters along with the individual amino acid composition as attributes. The accuracy of prediction is 90.5% for the bound structures and 93.6% for the unbound form of the proteins.

DOI: 10.1093/nar/gks405
Alternate Journal: Nucleic Acids Res.
PubMed ID: 22641851
PubMed Central ID: PMC3424558