Synonymous constraint elements show a tendency to encode intrinsically disordered protein segments.

Title: Synonymous constraint elements show a tendency to encode intrinsically disordered protein segments.
Publication Type: Journal Article
Year of Publication: 2014
Authors: Macossay-Castillo, M., S. Kosol, P. Tompa, and R. Pancsa
Journal: PLoS Comput Biol
Date Published: 2014 May
Keywords: Amino Acid Sequence; Base Sequence; Computer Simulation; Humans; Intrinsically Disordered Proteins; Models, Chemical; Models, Genetic; Models, Molecular; Molecular Sequence Data; Open Reading Frames; Structure-Activity Relationship

Synonymous constraint elements (SCEs) are protein-coding genomic regions that have very low synonymous mutation rates and are believed to carry additional, overlapping functions. Thousands of such potentially multi-functional elements were recently discovered by analyzing the levels and patterns of evolutionary conservation in human coding exons. These elements provide a good opportunity to improve our understanding of how the redundant nature of the genetic code is exploited in the cell. Our premise is that the protein segments encoded by such elements might better comply with the increased functional demands if they are structurally less constrained (i.e., intrinsically disordered). To test this idea, we investigated the protein segments encoded by SCEs with computational tools to describe the underlying structural properties. In addition to SCEs, we examined the level of disorder, secondary structure, and sequence complexity of protein regions overlapping with experimentally validated splice regulatory sites. We show that multi-functional gene regions translate into protein segments that are significantly enriched in structural disorder and compositional bias, while they are depleted in secondary structure and domain annotations compared to reference segments of similar lengths. This tendency suggests that relaxed protein structural constraints provide an advantage when accommodating multiple overlapping functions in coding regions.
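
As an illustration of what a sequence-complexity comparison between protein segments can look like, the following minimal sketch scores segments by Shannon entropy of their amino acid composition. This is a simple stand-in chosen for illustration, not the tool used in the study (the abstract does not name one). The function names and the two toy sequences are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of an amino acid segment.

    Lower values indicate a more biased, lower-complexity composition.
    """
    if not segment:
        raise ValueError("segment must be non-empty")
    counts = Counter(segment.upper())
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mean_entropy(segments: list[str]) -> float:
    """Average entropy over a set of segments (e.g. a test or reference set)."""
    if not segments:
        raise ValueError("segments must be non-empty")
    return sum(shannon_entropy(s) for s in segments) / len(segments)


# Hypothetical toy sequences of equal length, for illustration only.
low_complexity = "SSSSGSSSSGSSSPSSSS"  # dominated by serine
mixed = "MKVLAAGIVGLCTREWHD"           # many distinct residues

print(shannon_entropy(low_complexity) < shannon_entropy(mixed))
```

In a real analysis, scores for the segments of interest would be compared against length-matched reference segments with an appropriate statistical test, rather than by inspecting single values.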

Alternate Journal: PLoS Comput. Biol.
PubMed ID: 24809503
PubMed Central ID: PMC4014394