Relations between protein sequence and structure and their significance.

Title: Relations between protein sequence and structure and their significance.
Publication Type: Journal Article
Year of Publication: 1990
Authors: Rooman, M. J., Rodriguez J., and Wodak S. J.
Journal: J Mol Biol
Volume: 213
Issue: 2
Pagination: 337-50
Date Published: 1990 May 20
ISSN: 0022-2836
Keywords: Amino Acid Sequence, Molecular Structure, Protein Conformation, Proteins
Abstract

The relation between amino acid sequence and local structure in proteins is investigated. The local structures considered are either the four classes of secondary structure (H, E, T and C) or four classes of local conformations defined using measures of conformational similarity based on distances between Cα atoms. The classes are obtained by applying an automatic clustering procedure to short polypeptide fragments of uniform length from a database of 75 known protein structures. The thrust of our investigation consists of systematically searching the database for simple amino acid patterns of the type Gly-X-Ala-X-X-Val, where X denotes an arbitrary residue. Patterns that are nearly always associated with the same structure are retained. Finding many such associations, we then evaluate by a statistical approach how many among them are non-random and compare the results for different definitions of local structure. A similar comparison is made for the predictive value of retained associations, which is assessed using an internal test based on dividing the database into "learning" and "test" subsets. While we find that local structures defined by conformational similarity are not superior to secondary structure for prediction purposes, they help us gain insight into the factors that influence the predictive value of derived associations. A major conclusion is that the number of retained associations is in large excess over the number expected from a random correlation between sequence and structure, irrespective of how local conformation is defined. However, only a very small number of these associations can be earmarked as reliable using statistical criteria, due to the limited size of the database. We find, for instance, that the pattern Ala-Ala-X-X-Lys reliably characterizes helix, and the pattern Val-X-Val-X-X-X-Ala reliably characterizes extended structure and beta-strand. The possibility is discussed that these and other reliable associations correspond to regions of the polypeptide chain whose conformations are locally determined and that these regions may play a role in folding.
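
The pattern search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names `find_pattern` and `association`, the use of one-letter residue codes with `X` as the wildcard, the toy sequences, and the rule of labelling each match by the most frequent structure class in its window are all assumptions made for illustration. The statistical test for non-random associations and the "learning"/"test" split are not reproduced here.

```python
from collections import Counter


def find_pattern(sequence, pattern):
    """Return start indices where `pattern` matches `sequence`.

    'X' in the pattern matches any residue (assumed wildcard convention).
    """
    n, m = len(sequence), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(p == "X" or p == s for p, s in zip(pattern, sequence[i:i + m]))
    ]


def association(proteins, pattern):
    """Summarize which structure class co-occurs with a sequence pattern.

    `proteins` is a list of (sequence, structure) string pairs of equal
    length, with structure classes written as H, E, T or C per residue.
    Each match is labelled by the most frequent class in its window
    (an assumed rule, chosen only to keep the sketch short).
    Returns (dominant class, fraction of matches in that class, match count).
    """
    counts = Counter()
    for seq, struct in proteins:
        for i in find_pattern(seq, pattern):
            window = struct[i:i + len(pattern)]
            counts[Counter(window).most_common(1)[0][0]] += 1
    total = sum(counts.values())
    if total == 0:
        return None, 0.0, 0
    label, hits = counts.most_common(1)[0]
    return label, hits / total, total


# Toy data, invented for illustration only (not taken from the paper).
toy_proteins = [
    ("GSAAELKQ", "CCHHHHHC"),
    ("AAQRKV", "HHHHHC"),
    ("TAAGGKE", "CEEEECC"),
]

# Ala-Ala-X-X-Lys written in one-letter code.
label, fraction, n_matches = association(toy_proteins, "AAXXK")
print(label, round(fraction, 2), n_matches)
```

In this sketch, a pattern would be "retained" when `fraction` is close to 1 over enough matches; the threshold and the minimum number of matches are left open because the abstract does not state them.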

DOI: 10.1016/S0022-2836(05)80195-0
Alternate Journal: J. Mol. Biol.
PubMed ID: 2342111