Biopolymer



2       Protein Model Construction


Introduction

Proteins

Proteins are critical components of every living organism and are involved in the majority of biological processes. Within a living system proteins perform a diverse range of necessary functions, including:

The Biopolymer module contains a comprehensive set of tools for building and manipulating polypeptides and proteins. These capabilities are contained mainly in the Protein and Residue pulldowns.

Example applications include:


Tutorials

The Biopolymer module contains four Pilot tutorials related to the modeling of polypeptide and protein structures.

To start Pilot, click the mortarboard icon on the Insight II toolbar. When the Pilot interface appears, click Select and choose the Biopolymer tutorials from the list.

Select one of the following:

.

. (Lesson 4 is not relevant).

.


and click the Select button. Run the tutorial by clicking either of the two leftmost buttons at the top of the Pilot window.

These two buttons will execute a single command from the tutorial. The leftmost button executes commands without displaying the graphical user interface, while the adjacent button displays the interface containing predefined command parameters and waits for the user to click the execute button in the user interface parameter block.


Background

Basics of protein structures

Proteins are either linear polymers or aggregates of linear polymers. The building blocks of proteins are amino acids (or residues). Although many more amino acids occur in nature, 20 are commonly used in proteins. Each of these is represented by either a three-letter or a one-letter code. The names, chemical formulae, and three-letter and one-letter codes of the 20 common amino acids are shown in Figure 1.

Figure 1. The 20 common amino acids.
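The code table can be written out as a small lookup. The sketch below lists the standard three-letter and one-letter codes; the names THREE_TO_ONE and to_one_letter are chosen here for illustration and are not part of Insight II.

```python
# Standard three-letter -> one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter codes into a one-letter sequence string."""
    return "".join(THREE_TO_ONE[name.upper()] for name in residues)


print(to_one_letter(["Gly", "Ala", "Pro", "Trp"]))  # GAPW
```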

Note that each amino acid has the basic formula NH2CHRCOOH. The R group is what distinguishes one amino acid from another.  

The amino acid monomers are joined by peptide bonds (--CO--NH--). The peptide bond is almost planar and, in most cases, assumes a trans conformation. The main exception is the peptide bond preceding proline, which has a markedly higher probability of assuming a cis conformation.

The atoms --NH--Cα--CO-- that traverse the length of the polypeptide chain are called backbone atoms, while atoms attached to the backbone via the Cα are called side chain atoms.

The backbone torsion angles around the N--Cα and Cα--C bonds are denoted φ and ψ respectively. These bonds are subject to relatively free rotation (with three major rotameric states) and are important descriptors of the local protein backbone conformation. The mean force potential energy plot in φ and ψ for a single residue is called a Ramachandran plot.
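As a minimal sketch of how such a torsion angle can be computed from four atom positions, the function below uses the standard atan2 formulation with the usual sign convention (clockwise rotation, viewed along the central bond, is positive). The function name and the plain-tuple coordinate format are assumptions made for this example. For φ the four atoms would be C(i-1), N, Cα, C; for ψ they would be N, Cα, C, N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle in degrees about the p2-p3 bond."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane of p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane of p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

With this convention, four atoms in a planar zigzag (trans) give an angle of magnitude 180 degrees, and a planar U shape (cis) gives 0 degrees.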

Hierarchy of protein structure

The primary structure of the protein is the sequence of amino acids involved in protein chain formation. The secondary structure of the protein refers to the conformation of the peptide chain.

There exist stable conformations of the polypeptide chain in which the backbone dihedral angles are locally repeating. Common examples are α-helices and β-strands. An α-helix is an element of secondary structure adopting a right-handed helical conformation with approximately 3.6 residues per turn. There is a repeating pattern of hydrogen bonds from the carbonyl oxygen of residue i to the amide hydrogen of residue i+4. Less common are 3₁₀ helices, slender versions of α-helices. In a β-strand, by contrast, the amino acid residues adopt an extended rather than helical conformation.

Several β-strands can align in either parallel or antiparallel arrangements in which hydrogen bonds are formed between the CO oxygen of one strand and the NH hydrogen of another strand. These sets of strands are then said to form a β-sheet.

Each class of secondary structure element has characteristic properties. For example, helices show preferences for particular amino acids at particular positions, which are especially pronounced for helix-capping residues. β-strands also have a set of preferred residues that favor β-sheet formation. Perfect, textbook helices and strands rarely occur in proteins; for example, α-helices are often curved or bent and are often replaced at helical termini by 3₁₀ helices. β-sheets exhibit right-handed twists and can develop β-bulges. Altogether, secondary structures in proteins are closely packed, and there are a number of secondary structure packing motifs called supersecondary structures.

The tertiary structure of a protein is synonymous with the three-dimensional structure of a single peptide chain, in which the mutual orientation of all secondary structure elements, connected by loops, is specified. A number of distinctive protein folds have been identified so far. They are generally classified as α, β, α/β, and α+β.

The quaternary structure describes the manner in which two or more chains of the same or different folds are combined to form a larger system. Even a single protein can have a few alternative quaternary structures.

Below, methods for creating and modifying protein chain models are described. Secondary structure editing and loop searching methods are also described.


Methodology and implementation

Building and editing initial protein structure

Append

The Residue/Append command is used to build polypeptides by sequentially adding residues to an existing chain or to create a new peptide containing a single residue. The geometry of the addition is controlled by choosing from a set of standard geometries or by specifying a new geometry using the Phi (φ), Psi (ψ), and Omega (ω) Angle parameters. The default set of residues comprises the 20 standard amino acids, their charged side chain variants, D-proline, and the standard capping group.
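To illustrate the choice between a standard geometry and explicit angles, the sketch below tabulates commonly quoted textbook backbone dihedrals. These values are an assumption for illustration and are not necessarily the presets that Residue/Append offers; the names STANDARD_GEOMETRIES and backbone_angles are likewise hypothetical.

```python
# Commonly quoted ideal backbone dihedrals in degrees: (phi, psi, omega).
# Assumption: textbook values, not necessarily the Insight II presets.
STANDARD_GEOMETRIES = {
    "alpha_helix": (-57.0, -47.0, 180.0),
    "3_10_helix": (-49.0, -26.0, 180.0),
    "parallel_beta": (-119.0, 113.0, 180.0),
    "antiparallel_beta": (-139.0, 135.0, 180.0),
    "extended": (180.0, 180.0, 180.0),
}


def backbone_angles(n_residues, geometry="alpha_helix", custom=None):
    """Return one (phi, psi, omega) tuple per residue to be added.

    If `custom` is given it overrides the named standard geometry.
    """
    phi, psi, omega = custom if custom is not None else STANDARD_GEOMETRIES[geometry]
    for name, value in (("phi", phi), ("psi", psi), ("omega", omega)):
        if not -180.0 <= value <= 180.0:
            raise ValueError(f"{name} must lie in [-180, 180], got {value}")
    return [(phi, psi, omega)] * n_residues
```

An omega of 180 degrees corresponds to the usual trans peptide bond.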

Repeat

The Residue/Repeat command is used to create a new peptide consisting of a series of identical residues. The geometry of the final peptide can be controlled by specifying the dihedral angles between the residues.

Delete

The Residue/Delete command is used to remove the specified residue from its parent peptide or protein, or delete multiple residues from the same peptide or protein. No change is made to the geometry of the peptide or protein after the residue is deleted, which results in an unusually long bond between the two residues remaining on either side of where the original residue was. This anomaly is usually repaired by constraining portions of the peptide or protein, and allowing the distorted region to relax during an energy minimization.
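One way to locate such stretched bonds before minimization is to compare each consecutive C--N distance with a typical peptide bond length of about 1.33 angstroms. The sketch below assumes a hypothetical residue layout (a dict holding an id and N and C coordinates) and an arbitrary tolerance; neither comes from Insight II.

```python
import math

PEPTIDE_BOND_LENGTH = 1.33  # typical C-N peptide bond length, in angstroms
TOLERANCE = 0.2             # assumed margin, chosen only for illustration


def find_stretched_bonds(residues):
    """Flag consecutive residue pairs whose C-N distance is too long.

    residues: list of dicts with 'id', 'N' and 'C' keys, in chain order.
    Returns a list of (id_before, id_after, distance) tuples.
    """
    stretched = []
    for prev, nxt in zip(residues, residues[1:]):
        d = math.dist(prev["C"], nxt["N"])
        if d > PEPTIDE_BOND_LENGTH + TOLERANCE:
            stretched.append((prev["id"], nxt["id"], d))
    return stretched
```

The flagged pairs mark the region to leave unconstrained while the rest of the structure is held fixed during minimization.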

Replace

The Residue/Replace command is used to replace a residue of an existing peptide or protein with any residue from a set of defined residues known to Insight. The chirality of the residue being used for replacement can also be controlled.

When a residue is replaced, the replacement residue is first aligned to the backbone of the original residue. After the backbone is aligned, the dihedral angles in common with the residue being replaced are also aligned. Replacement can be done on heavy-atom-only or full-atom models. If the replacement is done on a full-atom model, new charges and potential function types are taken from the residue library.

Cap

The Protein/Cap command is used to change the atomic configuration of either the N or C terminus of a specified peptide or protein. This command creates either a neutral sp3 or charged sp3 N terminus, and a carboxylate (COO-) or carboxylic (COOH) C terminus. When a peptide or protein is capped, it is compared against the residue library. If a match is found, the partial charges and potential function atom types are then assigned to the new capping residue.

For the C-terminus to be cappable, the last C atom of the molecule must be attached to a Cα atom and cannot be attached to any other heavy atoms except a maximum of two oxygens. To cap the N-terminus, the first heavy atom must be a nitrogen connected to a Cα atom and to no other heavy atoms.

Rename

The Protein/Rename command is used to assign a series of new residue numbers to a specified region in a peptide or protein. Residue numbers may contain numbers or letters. This command is frequently used after splicing in a new region of a protein from another protein to realign the residue numbering.

IUPAC_Name

The Protein/IUPAC_Name command renames sidechain atoms in amino acid residues to conform with IUPAC nomenclature rules.

The command assumes that the heavy atoms' names correctly reflect the remoteness of the heavy atoms in the side chains. In other words, the heavy atoms in the side chains contain the correct modifiers (BGDEZHT). The program only renames the numerical part of the atom names. Note that many of the names are dependent on geometry of the side chains.

List

The Protein/List command is used to list detailed information about a peptide or protein, such as secondary structural features, chirality, coordinate and topology information, dihedral angles, and total energy. You may also list molecular mechanics information such as partial charges and potential function atom types.

Setting backbone conformation and backbone construction

Secondary

The Protein/Secondary command is used to change the secondary structure of a specified region of a peptide or protein. The new geometry may be selected from known geometries (such as an α-helix) or may be input by specifying the backbone dihedral angles phi, psi, and omega.

Turn

The Protein/Turn command is used to introduce standard β and γ turns in proteins. Arbitrary turns can also be introduced by specifying φ and ψ dihedral angle values.

Backbone

The Protein/Backbone command uses geometrical and statistical constraints to predict the coordinates of all backbone and Cβ atoms in a protein given only the coordinates of the Cα atoms. In addition, it predicts the coordinates of all atoms in any proline side chains. It also creates side chains for all other residues, but it makes no attempt to optimize their conformations. Instead, it creates them in arbitrary extended conformations. These arbitrary side chain conformations can subsequently be optimized using the Residue/Auto_Rotamer and Residue/Manual_Rotamer commands.

Setting protein side chain rotameric states

Manual_Rotamer

The Residue/Manual_Rotamer command places one of the rotamers appropriate for a particular amino acid type at the position of the specified residue. You can cycle through all the possible Rotamer Choices by repeating the command. If the Evaluate_Energy parameter is on, then a single-point nonbond energy calculation is performed, calculating the electrostatic and van der Waals interactions between the specified residue and the residues that are within the specified distance. If the Bump_Check parameter is on, the newly placed side chain will be tested for steric overlap against the rest of the protein. You can reject all of the choices and return to the original side chain conformation by selecting Original as the Rotamer Choice.

Auto_Rotamer

The Residue/Auto_Rotamer command finds the optimum combination of rotamer choices for a given list of moving residues. A search is performed starting at the beginning of the list and proceeding down. For each residue, the rotamer choice that produces the lowest energy is retained, and the search continues with the next moving residue. The search stops when the energy has not changed, or when the maximum number of iterations has been exceeded.
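The search described above amounts to a greedy, residue-by-residue sweep repeated until the energy stops improving. The following is a minimal sketch of that idea, not the Insight II implementation: the `energy` callable, the dictionary layout, and the choice to start from the first rotamer of each residue are all assumptions made for the example.

```python
def auto_rotamer(moving, rotamers, energy, max_iterations=10):
    """Greedy rotamer search over an ordered list of moving residues.

    moving:   residue ids, in the order they are to be searched
    rotamers: dict mapping residue id -> list of candidate rotamers
    energy:   hypothetical callable taking {residue id: rotamer} -> float
    Returns (assignment, energy_of_assignment).
    """
    assignment = {res: rotamers[res][0] for res in moving}
    best = energy(assignment)
    for _ in range(max_iterations):
        previous = best
        for res in moving:
            # Try every rotamer for this residue, keeping the others fixed.
            for choice in rotamers[res]:
                trial = dict(assignment)
                trial[res] = choice
                e = energy(trial)
                if e < best:
                    best, assignment = e, trial
        if best == previous:  # a full sweep changed nothing: converged
            break
    return assignment, best
```

Because each residue is optimized with the others held fixed, a sweep of this kind can settle in a local minimum, and its result can depend on the order of the residue list.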

Searching protein loop conformations

SearchLoop

The Protein/SearchLoop command is used to search the Brookhaven protein database for regions of proteins that meet a defined geometric criterion. This command uses an existing Cα atom distance matrix to search for regions of proteins whose Cα atom distances best fit those of the selected region of the protein being studied, while meeting the additional constraint of having the specified number of residues present between the regions of interest.

Best fit is defined as the lowest root mean squared distance value, as calculated from:

Eq. 1

The number of distances compared in the search can be calculated from (N² - N)/2, where N is the number of preflex residues plus postflex residues.
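The body of Eq. 1 does not survive in this text. A form consistent with the surrounding definitions, offered as an assumption rather than as the manual's exact notation, takes the root mean square of the differences between corresponding Cα--Cα distances in the target region and in the database fragment:

```latex
\mathrm{RMS} = \sqrt{\frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N}
  \left( d_{ij}^{\mathrm{target}} - d_{ij}^{\mathrm{db}} \right)^{2}}
```

Here d_ij is the distance between the Cα atoms of residues i and j among the N preflex and postflex residues, and the superscripts "target" and "db" are labels introduced for this sketch. The double sum runs over N(N-1)/2 = (N² - N)/2 pairs; for example, three preflex plus three postflex residues give N = 6 and 15 distances.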

The ten best models from the search are retained for further examination. A loop region is defined as that portion of the molecule which is not included in the search; its geometry is allowed to vary, and is not a criterion in the search. The residues leading up to the loop region are defined to be the preflex residues, and the residues going away from the loop region are defined as the postflex residues.
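The search can be pictured as sliding a window along each database chain, skipping the loop residues, and scoring only the preflex and postflex Cα distances. The sketch below assumes a hypothetical in-memory database (a dict of chain name to a list of Cα coordinates) and recomputes distances on the fly instead of reading a prebuilt distance matrix, so it illustrates the idea and is not the SearchLoop implementation.

```python
import heapq
import math
from itertools import combinations


def distance_list(coords):
    """All pairwise distances between the given CA coordinates, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def rms_distance(target, candidate):
    """Root mean square difference between two equally ordered distance lists."""
    diffs = [(t - c) ** 2 for t, c in zip(target, candidate)]
    return math.sqrt(sum(diffs) / len(diffs))


def search_loops(pre, post, loop_len, database, keep=10):
    """Return up to `keep` best hits as (rms, chain_name, start_index).

    pre, post: lists of CA coordinates for the preflex and postflex residues
    loop_len:  number of residues required between the two flanking regions
    database:  hypothetical dict of chain name -> list of CA coordinates
    """
    target = distance_list(pre + post)
    n_pre, n_post = len(pre), len(post)
    window = n_pre + loop_len + n_post
    hits = []
    for name, chain in database.items():
        for start in range(len(chain) - window + 1):
            # Keep only the flanking residues; the loop geometry is not scored.
            anchors = (chain[start:start + n_pre]
                       + chain[start + n_pre + loop_len:start + window])
            hits.append((rms_distance(target, distance_list(anchors)), name, start))
    return heapq.nsmallest(keep, hits)
```

Each hit's start index, together with the loop length, identifies the residues that would be spliced into the model.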

The Protein/SearchLoop command can be used to find suitable geometries for residue insertion and deletion. The results of the search are a direct result of the files used to build the distance matrix. The inclusion of all of the Brookhaven files when building the distance matrix may lead to certain searches where the result is a selection between very similar proteins, since the Brookhaven database has varying numbers of solved structures for different types of proteins.

DispLoop

The results of the Protein/SearchLoop command are displayed with the Protein/DispLoop command. The results of the search are ranked according to goodness of fit to the desired structure. Loop 1 has the best fit while Loop 10 has the poorest. In addition to the actual loop structure, additional pieces of information are presented. The RMS of the Cα atom distances is presented along with the name of the file in which the match was found, the starting sequence number for the match, and the actual loop sequence. When the matches are displayed they are superimposed on the original protein. You can control whether the superimposition is done by aligning only the two residues at the base of the flex region, or the Cα atoms along the entire pre- and post-flex regions. The RMS value of the superimposition is also presented, and varies depending on the type of superimposition selected. A match found from the search may be built into the original model using the Protein/SpliceLoop command.

More than one loop choice may be displayed at one time. In this case, the loop choice information appears in summary on the display, and is provided in full on the textport.

SpliceLoop

The Protein/SpliceLoop command is used to automatically splice in a loop choice, replacing the old loop region of a molecule with the new loop acquired through the SearchLoop command.




Last updated September 30, 1997 at 11:30AM PDT.
Copyright © 1997, Molecular Simulations, Inc. All rights reserved.