Optimal protein structure alignments by multiple linkage clustering: application to distantly related proteins.

Title: Optimal protein structure alignments by multiple linkage clustering: application to distantly related proteins.
Publication Type: Journal Article
Year of Publication: 1995
Authors: Boutonnet N. S., Rooman M. J., Ochagavia M. E., Richelle J., and Wodak S. J.
Journal: Protein Eng
Volume: 8
Issue: 7
Pagination: 647-62
Date Published: 1995 Jul
ISSN: 0269-2139
Keywords: Algorithms; Amino Acid Sequence; Consensus Sequence; Models, Chemical; Molecular Sequence Data; Protein Conformation; Sequence Alignment
Abstract

A fully automatic procedure for aligning two protein structures is presented. Its sole measure of structural similarity is the root mean square (r.m.s.) deviation of superimposed backbone atoms (N, C alpha, C and O), and it is designed to yield solutions that are optimal with respect to this measure. In the first step, the procedure identifies segments with similar conformations in the two proteins. In the second step, a novel multiple linkage clustering algorithm identifies the segment combinations that yield optimal global structure alignments. Several structure alignments can usually be obtained for a given pair of proteins, and these are exploited here to define automatically the common structural core of a protein family. Furthermore, an automatic analysis of the clustering trees is described that enables detection of rigid-body movements between structure elements. To illustrate the performance of the procedure, we apply it to two families of distantly related proteins. The first groups the three alpha + beta proteins ubiquitin, ferredoxin and the B1-domain of protein G. Their common structure motif consists of four beta-strands and the single alpha-helix, with one strand and the helix displaced as a rigid body relative to the remaining three beta-strands. The second family consists of beta-proteins from the Greek key group, in particular actinoxanthin, the immunoglobulin variable domain and plastocyanin. Their consensus motif, composed of five beta-strands and a turn, is identified, mostly intact, in all Greek key proteins except the trypsins and, interestingly, also in three other beta-protein families: the lipocalins, the neuraminidases and the lectins. This result provides new insights into the evolutionary relationships within the very diverse group of all-beta proteins.
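
The similarity measure named in the abstract, the r.m.s. deviation of superimposed atoms, can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not say how the superposition is computed, so the sketch assumes the standard quaternion formulation of optimal rigid-body superposition (Horn's method), and the function names `superposed_rmsd`, `_centered` and `_largest_eigenvalue` are hypothetical. It takes two equally long lists of (x, y, z) coordinates of equivalent atoms, for example the N, C alpha, C and O atoms of two matched segments, and uses only the Python standard library.

```python
import math


def _centered(coords):
    """Translate a list of (x, y, z) points so that their centroid is the origin."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return [tuple(p[k] - centroid[k] for k in range(3)) for p in coords]


def _largest_eigenvalue(sym, max_sweeps=50):
    """Largest eigenvalue of a small symmetric matrix, by cyclic Jacobi rotations."""
    a = [list(row) for row in sym]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        diag = sum(a[i][i] ** 2 for i in range(n))
        if off <= 1e-28 * (diag + off):
            break  # matrix is numerically diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(coords_a, coords_b):
    """R.m.s. deviation of two equivalent atom sets after optimal rigid superposition."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    a = _centered(coords_a)
    b = _centered(coords_b)
    # Cross-covariance sums S[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue is the maximum of
    # sum(b_i . R a_i) over all proper rotations R.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(key)
    total = sum(c * c for p in a for c in p) + sum(c * c for p in b for c in p)
    # Clamp at zero to absorb rounding error when the fit is exact.
    return math.sqrt(max(0.0, (total - 2.0 * lam) / len(a)))
```

Because the minimum residual follows directly from the largest eigenvalue, the sketch never builds the rotation matrix itself. A rotated and translated copy of a structure gives a value close to zero, and only proper rotations are considered, so a mirror image is not treated as a perfect match.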

Alternate Journal: Protein Eng.
PubMed ID: 8577694