| Biopolymer |

This chapter describes how to create a protein backbone, given an α-carbon trace. Read this chapter if you plan to do protein model building. To fully understand the concepts covered here, work through the available Pilot tutorial. Descriptions of individual commands can be found using the Help/Insight_Help command in Insight II.

The need to build a backbone from Cα coordinates arises in several situations. Many files in the Brookhaven Protein Data Bank contain only coordinates for the Cα atoms, either because of low resolution in the electron density data, or because the author chose not to publish the complete structure. For some protein families, an α-carbon trace is the only structural information available. α-carbon traces may also be constructed as a preliminary step in an X-ray crystallographic project. Finally, some simplified models used in simulations of protein folding represent proteins by their α-carbon traces. The results of these simulations could be better evaluated if the simplified Cα representation could be converted into a more complete and realistic peptide backbone structure.

A Pilot tutorial demonstrates building a backbone from an α-carbon trace. To start Pilot, click the mortarboard icon on the Insight II toolbar. When the Pilot interface appears, click Select and choose the Biopolymer tutorials from the list.
Select Lesson 3 -- Building a Backbone from an Alpha-Carbon Trace and click the Select button. Run the tutorial by clicking either of the two leftmost buttons at the top of the Pilot window.
Each of these two buttons executes a single command from the tutorial. The leftmost button executes the command without displaying the graphical user interface. The adjacent button displays the interface with predefined command parameters and waits for you to click the execute button in the parameter block.

The Protein/Backbone command calculates the coordinates of the backbone atoms in a protein given only the coordinates of the Cα atoms. This command uses a new algorithm, developed at MSI, that applies geometrical and statistical constraints to find an approximate solution to this difficult problem. This section describes the theory underlying this algorithm.
The method is most easily understood by comparison with other approaches to the problem, some of which use similar constraints. One of the most powerful of these constraints is a geometrical constraint defined and identified below as the bond angle constraint. One way to categorize approaches to this problem is to distinguish between those methods that do not use the bond angle constraint and those that do. The algorithm used by the Protein/Backbone command is in the latter category.
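Methods that do not use the bond angle constraint
A conceptually simple approach in this category is to search a database of known structures for backbone fragments that fit the α-carbon trace (Holm and Sander 1991; Levitt 1992; Reid and Thornton 1989). This method is limited by its dependence upon the availability of suitable backbone templates in the database of structures.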
Another conceptually simple approach is incremental construction (first of the backbone, then of the side-chains) with repeated cycles of molecular dynamics and energy minimization (Correa 1990). This method gives the most accurate results yet published (RMS deviation in the backbone atoms of 0.19 Å for α-lytic protease), but requires dozens of hours of calculation time using a 50 MHz array processor.
A third approach in this category is the directed conformational search method of Bassolino-Klimas and Bruccoleri (1992). This method searches through tens of millions of possible backbone conformations using a directed search algorithm that evaluates candidate structures by potential energy and/or RMS deviation from the Cα atom coordinates. This method is also extremely time-consuming (fifty to one hundred CPU hours for a typical protein) and gives relatively inaccurate results.
The method of Rey and Skolnick (1993) uses some of the constraints of standard peptide bond geometry, but not the bond angle constraint. It uses statistical averages (from a database of known structures) of Cβ atom orientations relative to neighboring Cα atoms to predict the Cβ atom position for each residue. It then uses certain standard bond lengths and angles (and additional statistics from the database) to fix the positions of the amide nitrogen, carbonyl carbon, and carbonyl oxygen relative to the Cβ atom and Cα atoms. The Rey and Skolnick algorithm does not enforce planarity of the peptide bonds. It is also a strictly local algorithm: the locations of the backbone atoms of each residue are determined entirely by the coordinates of the two nearest neighbor Cα atoms. For these two reasons the method is relatively inaccurate.
Methods that use the bond angle constraint
Algorithms in this category, including the one used by the Protein/Backbone command, maintain the planarity of peptide bonds (either approximately or exactly) and rely on geometrical constraints that allow a global optimization of backbone conformation over the full length of the polypeptide chain. This approach is exemplified in the work of Purisima and Scheraga (1984), Luo et al. (1992) and Payne (1993). The fundamental constraint that underlies all these methods is illustrated in Figure 8.
Figure 8 shows two conformations of an alanine residue with its two neighboring peptide bonds and the Cα atoms of the two neighboring residues. The positions of the three Cα atoms and of the atoms in the left peptide plane are identical in both conformations. The only difference in the backbones of the two conformations is in the orientation of the right peptide plane with respect to the plane containing the three Cα atoms. This illustrates the bond angle constraint: given the positions in space of three consecutive Cα atoms, and the orientation of one of the two peptide planes between them, the orientation of the other peptide plane can have at most two distinct and calculable values.
The orientation of each peptide plane is measured as the dihedral angle between the peptide plane and the plane containing the three Cα atoms. For convenience, these dihedrals are referred to as peptide plane angles.
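Each residue has two peptide plane angles, λ and µ, on the N- and C-terminal sides, respectively, of the residue. More specifically, λi is defined as the dihedral formed by the four backbone atoms O(i-1), Cα(i-1), Cα(i), and Cα(i+1), and µi as the dihedral formed by Cα(i-1), Cα(i), Cα(i+1), and N(i+1). The angle λi is 0 when O(i-1) and Cα(i+1) are cis, and λi = µi = 0 when φi = ψi = 0.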
The bond angle constraint is so named because it is a consequence of the fixed bond angle (of about 111°) formed by the central nitrogen, Cα, and carbonyl carbon atoms of the tripeptide. The constraint was stated above in terms of the peptide plane angles λ and µ, but it can also be expressed in terms of the more commonly used conformational angles φ and ψ: given the positions of three consecutive Cα atoms and the value of either φ or ψ of the central residue, the other angle (ψ or φ) can have, at most, two distinct and calculable values. Although some methods have used the φ-ψ version of the bond angle constraint (Purisima and Scheraga 1984; Luo et al. 1992), it is more natural and convenient to formulate the problem of backbone prediction using peptide plane angles.
The mathematical formulation of the bond angle constraint was first derived in terms of φ and ψ by Nishikawa et al. (1974; their equation 13). Wako and Scheraga (1982) published another formulation (their equations A34 and A35) that relates φ and ψ to the peptide plane angles. For the latter they used a pair of symbols with subscripts 1 and 2; these are identical to λ and µ, respectively, as defined above. Payne (1993) published the bond angle constraint exclusively in terms of the peptide plane angles (his equation 5). This equation, expressed in the angles λ and µ as defined above, is as follows:
where τ is the bond angle formed by N(i), Cα(i), and C(i); θ is the virtual bond angle formed by Cα(i-1), Cα(i), and Cα(i+1); ξ is the virtual bond angle formed by Cα(i+1), Cα(i), and C(i); and η is the virtual bond angle formed by Cα(i-1), Cα(i), and N(i). With the assumption of standard geometry, the angles τ, ξ, and η are constant. The bond angle constraint therefore relates the peptide plane angles λ and µ and the angle θ formed by the virtual bonds between the three consecutive Cα atoms. If both peptide bonds are trans, the assumption of standard geometry limits the range of θ:
Eq. 1
$$\theta_{\min} \le \theta \le \theta_{\max}$$
where
Eq. 2
$$\theta_{\min} = \tau - \xi - \eta$$
and
Eq. 3
$$\theta_{\max} = \tau + \xi + \eta$$
For the all-trans non-proline case, θmin and θmax are about 75° and 146°, respectively. Given an angle θ within this range, the bond angle constraint can be solved analytically to find all possible pairs of angles λ and µ that satisfy it. Figure 9 depicts graphs of these solutions for eleven values of θ. Notice that there is only one conformation compatible with θ = θmin, namely λ = µ = 0. Similarly, only the fully extended conformation (λ = µ = 180°) is compatible with θ = θmax. For values of θ near these extremes, λ and µ are restricted to small oval-shaped curves centered on the θmin or θmax solution. Only for a limited range of θ (from about 104° to 115°) is it possible to find a valid µ for all values of λ between -180° and +180°.
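The feasibility statements above can be checked numerically. The following Python sketch is not the Protein/Backbone implementation and is not claimed to reproduce Payne's equation 5 verbatim. It solves one spherical-trigonometry form of the bond angle constraint that is consistent with the limiting cases described in this section (λ = µ = 0 at θmin, λ = µ = 180° at θmax):

cos τ = cos ξ cos η cos θ - sin η cos ξ sin θ cos λ - cos η sin ξ sin θ cos µ - sin ξ sin η cos θ cos λ cos µ + sin ξ sin η sin λ sin µ

Several things here are assumptions. The value τ = 111° comes from the text, but ξ = 20.8° (the Cα(i+1)-Cα(i)-C(i) angle) and η = 14.8° (the Cα(i-1)-Cα(i)-N(i) angle) are estimates computed from commonly quoted peptide bond lengths and angles, not values taken from this chapter. The sign of the last term depends on the handedness chosen for λ and µ, which is also assumed. The names `constraint_terms` and `solve_mu` are hypothetical.

```python
import math

TAU = math.radians(111.0)  # N-CA-C bond angle quoted in the text
XI = math.radians(20.8)    # assumed virtual angle CA(i+1)-CA(i)-C(i)
ETA = math.radians(14.8)   # assumed virtual angle CA(i-1)-CA(i)-N(i)


def constraint_terms(lam, theta, tau=TAU, xi=XI, eta=ETA):
    """Coefficients (p, q, r) of p*cos(mu) + q*sin(mu) = r; angles in radians."""
    p = -(math.cos(eta) * math.sin(xi) * math.sin(theta)
          + math.sin(eta) * math.sin(xi) * math.cos(theta) * math.cos(lam))
    q = math.sin(xi) * math.sin(eta) * math.sin(lam)  # sign: assumed handedness
    r = (math.cos(tau)
         - math.cos(xi) * math.cos(eta) * math.cos(theta)
         + math.sin(eta) * math.cos(xi) * math.sin(theta) * math.cos(lam))
    return p, q, r


def solve_mu(lam_deg, theta_deg, tol=1e-9):
    """Return the 0, 1, or 2 values of mu in degrees, each in [-180, 180)."""
    p, q, r = constraint_terms(math.radians(lam_deg), math.radians(theta_deg))
    amp = math.hypot(p, q)
    if amp == 0.0 or abs(r) > amp * (1.0 + tol):
        return []  # this lambda admits no valid mu for the given theta
    phase = math.atan2(q, p)
    delta = math.acos(max(-1.0, min(1.0, r / amp)))
    solutions = []
    for mu in (phase + delta, phase - delta):
        deg = (math.degrees(mu) + 180.0) % 360.0 - 180.0
        # Skip a root that coincides (on the circle) with one already kept.
        if all(abs((deg - s + 180.0) % 360.0 - 180.0) > 1e-4 for s in solutions):
            solutions.append(deg)
    return sorted(solutions)


if __name__ == "__main__":
    for lam in (0.0, 60.0, 120.0, 180.0):
        print(lam, solve_mu(lam, 110.0))
```

With these assumed constants, a θ of 110° yields solutions for every λ, whereas a θ of 80° yields none at λ = 90°. This matches the qualitative behavior described above, although the exact limits of the unrestricted range depend on the geometry values chosen.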
A second important constraint used by methods in this category is based on the following observation. The angles µi and λi+1 describe the orientation of the same peptide plane with respect to two different (but overlapping) triplets of Cα atoms. The two angles are related by the torsion angle γi formed by the virtual bonds between the four consecutive Cα atoms Cα(i-1) through Cα(i+2):
Eq. 4
$$\gamma_i = \mu_i + \lambda_{i+1} + 180^\circ \pmod{360^\circ}$$
This relationship, which in this discussion will be referred to as the virtual torsion constraint, was first described by Nishikawa et al. (1974; their equation 10). It provides a means by which the local bond angle constraints at each residue of the protein can be combined into a single global constraint on the entire backbone.
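The quantities that enter the virtual torsion constraint can be computed directly from Cα coordinates. The sketch below uses standard vector formulas for a bond angle and a dihedral angle. The function names are hypothetical, and the virtual torsion is written here as γ. The helper `next_lambda` assumes the constraint takes the form γi = µi + λi+1 + 180° (mod 360°), which is the form implied by the torsion cost used later in this chapter.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def virtual_bond_angle(ca_prev, ca, ca_next):
    """Angle in degrees at ca between the virtual bonds to ca_prev and ca_next."""
    u, v = _sub(ca_prev, ca), _sub(ca_next, ca)
    cosine = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def virtual_torsion(p0, p1, p2, p3):
    """Signed dihedral in degrees defined by four consecutive CA positions."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(_cross(n1, n2), b2) / _norm(b2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def next_lambda(gamma_deg, mu_deg):
    """lambda(i+1) implied by gamma(i) and mu(i), wrapped to [-180, 180)."""
    return (gamma_deg - mu_deg - 180.0 + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    trace = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    print(virtual_bond_angle(*trace[:3]), virtual_torsion(*trace))
```

Applying `next_lambda` residue by residue is what lets the local bond angle constraints be chained along the backbone.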
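These two constraints suggest a simple sequential procedure for building the backbone, in which the following steps are repeated for each residue in turn:
(1) Choose a value of λi for this residue.
(2) Solve the bond angle constraint for µi.
(3) Use Eq. 4 to calculate λi+1 for the next residue.
Such a procedure reaches a dead end whenever λ at step (2) is out of the range within which valid solutions for µ exist. Furthermore, the best solution among those that do not lead to dead ends can be found by selecting the one containing minimal steric clashes (Purisima and Scheraga, 1984) or the maximal number of φ and ψ angles observed at high frequency in the database of known structures (Luo et al. 1992).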
Algorithm used by the Protein/Backbone command
The Protein/Backbone command finds a globally optimal solution to the bond angle constraint and Eq. 4 using an optimization algorithm known as dynamic programming. This method is much faster and more robust than the algorithm outlined above. In addition, the Protein/Backbone command imposes a statistical constraint that favors conformations that occur frequently in the database of known structures. The algorithm is similar in many ways to the one independently derived by Payne (1993).
First, the distances between consecutive Cα atoms are calculated. For standard trans peptide bonds, this distance should be about 3.8 Å. If it is greater than a user-specified parameter (known as the break threshold), then there is assumed to be a covalent break in the polypeptide chain at that location. The break threshold is typically 4.5 Å. If the distance between two Cα atoms is less than another user-specified threshold (known as the cis threshold), then the peptide bond between those residues is assumed to be in the cis configuration. A typical value for the cis threshold is 3.2 Å.
Each candidate (λ, µ) pair for a residue is assigned a Ramachandran cost based on the frequency of that (λ, µ) conformation in a database of representative protein structures (Boberg et al., 1992). Those (λ, µ) pairs that occur with high frequency in the database have low costs; those that occur rarely or not at all have high costs.
A torsion cost, (γi - µi - 180° - λi+1)², is assigned to each pairing of the µ of one residue with the λ of the next residue. A nonzero torsion cost implies that there will be some local strain in the backbone. This strain manifests itself as a deviation in the angle τ from its ideal value. The dynamic programming algorithm finds the optimal solution that simultaneously minimizes the torsion and Ramachandran costs over all nonterminal residues.
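The optimization described above can be illustrated with a generic dynamic programming recursion over per-residue candidates. This is a minimal sketch under stated assumptions, not the Protein/Backbone implementation. It assumes each nonterminal residue has a non-empty, discrete list of candidate (λ, µ) pairs with a precomputed Ramachandran-style cost. The torsion cost is the squared, wrapped deviation from the virtual torsion constraint, and the two costs are simply added with a hypothetical `torsion_weight`. The virtual torsion is written as `gamma`, and all function and parameter names are illustrative.

```python
def wrap(deg):
    """Wrap an angle in degrees to [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def torsion_cost(gamma, mu, lam_next):
    """Squared deviation from gamma = mu + lam_next + 180 (degrees squared)."""
    d = wrap(gamma - mu - 180.0 - lam_next)
    return d * d


def best_path(candidates, node_cost, gammas, torsion_weight=1.0):
    """Minimize summed node and torsion costs along the chain.

    candidates[k]   list of (lam, mu) pairs for the k-th nonterminal residue
    node_cost[k][j] cost of candidates[k][j]
    gammas[k]       virtual torsion linking residue k to residue k + 1
    Returns (total_cost, list of chosen candidate indices).
    """
    best = [list(node_cost[0])]  # best[k][j]: cheapest path ending at candidate j
    back = []                    # back[k-1][j]: predecessor index for that path
    for k in range(1, len(candidates)):
        row, ptr = [], []
        for j, (lam, _mu) in enumerate(candidates[k]):
            options = [
                best[k - 1][i]
                + torsion_weight * torsion_cost(gammas[k - 1], prev_mu, lam)
                for i, (_lam, prev_mu) in enumerate(candidates[k - 1])
            ]
            i_best = min(range(len(options)), key=options.__getitem__)
            row.append(options[i_best] + node_cost[k][j])
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total, path = best[-1][j], [j]
    for ptr in reversed(back):  # trace the optimal choices back to the start
        j = ptr[j]
        path.append(j)
    path.reverse()
    return total, path


if __name__ == "__main__":
    cands = [[(0.0, 10.0), (30.0, -50.0)],
             [(-170.0, 20.0), (40.0, 60.0), (100.0, -100.0)],
             [(10.0, 0.0), (-60.0, 90.0)]]
    costs = [[0.5, 0.1], [1.0, 0.2, 0.3], [0.0, 0.4]]
    print(best_path(cands, costs, [50.0, -120.0]))
```

The running time grows linearly with chain length and quadratically with the number of candidates per residue. An exhaustive search over all combinations, by contrast, grows exponentially with chain length.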
The angles λ1 and µN-2 specify the optimal orientations of the first and last peptide planes, respectively. For all other peptide planes, the optimal orientation is calculated as the average of the orientations specified by µi and λi+1.
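Averaging two orientations needs some care because angles wrap around at ±180°. The sketch below shows one way to do it, and it rests on assumptions: the orientation implied by µi is first converted to the frame of λi+1 using the relation γi = µi + λi+1 + 180° (with the virtual torsion written as γ), and the average is then taken as the midpoint along the shorter arc between the two angles. The function names are hypothetical.

```python
def wrap(deg):
    """Wrap an angle in degrees to [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def circular_midpoint(a_deg, b_deg):
    """Midpoint of two angles along the shorter arc, in [-180, 180)."""
    return wrap(a_deg + wrap(b_deg - a_deg) / 2.0)


def plane_orientation(gamma_deg, mu_deg, lam_next_deg):
    """Average peptide plane orientation, expressed as a lambda(i+1) value."""
    lam_from_mu = wrap(gamma_deg - mu_deg - 180.0)  # mu(i) in the lambda(i+1) frame
    return circular_midpoint(lam_from_mu, lam_next_deg)


if __name__ == "__main__":
    print(plane_orientation(50.0, 10.0, -130.0))
```

A plain arithmetic mean would place the midpoint of 170° and -170° at 0°, on the opposite side of the circle from both angles; the wrapped form avoids this.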
For proline residues, the coordinates of Cβ and Cδ are determined by the peptide plane orientations. The coordinates of Cγ are calculated so as to maintain the proper Cβ-Cγ and Cγ-Cδ bond lengths with minimal pucker of the proline ring. This method gives the proline side chain maximal tolerance for torsions about the N-Cα bond (dihedral angle φ). It has the slight disadvantage that the proline ring may have less pucker than the energetically optimal conformation.
The coordinates of the input Cα atoms are preserved exactly.
The reconstruction is more accurate in α-helices and less so elsewhere. Most of the RMS deviation is attributable to occasional peptide "flips", in which the wrong one of two possible solutions to the bond angle constraint is selected. This is presumably because the algorithm ignores steric clashes of the backbone with other parts of the protein that are distant in the sequence but nearby in space.
Methodology and implementation
The implementation of the Backbone command (in the Protein pulldown) requires a protein structure that contains only Cα atoms. Therefore, the Molecule Name parameter must specify the name of a protein that contains only Cα atoms.
The command creates and positions the remaining backbone atoms for all residues. It creates and positions the entire side-chain of each proline residue. It also creates side-chains for all other residues, but it makes no attempt to optimize their conformations. Instead, it places the side-chains in arbitrary extended conformations.
The command detects breaks in the chain by the distance between consecutive Cα atoms. Where this distance exceeds the parameter Break Threshold, the command assumes there is a break in the chain at that location and constructs N- and C-termini there. The command also recognizes cis peptide bonds by the distance between consecutive Cα atoms. Where this distance is less than the Cis Threshold, the command creates a peptide bond in the cis configuration.
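The distance tests described above are straightforward to express in code. The following sketch is not the command's implementation. It classifies each pair of consecutive Cα atoms using the typical threshold values quoted in this chapter (4.5 Å for breaks, 3.2 Å for cis bonds). The function name, argument names, and the string labels are hypothetical.

```python
import math


def classify_links(ca_coords, break_threshold=4.5, cis_threshold=3.2):
    """Label each consecutive CA-CA pair as 'break', 'cis', or 'trans'.

    ca_coords is a sequence of (x, y, z) positions in angstroms, in chain order.
    """
    labels = []
    for first, second in zip(ca_coords, ca_coords[1:]):
        distance = math.dist(first, second)
        if distance > break_threshold:
            labels.append("break")  # treat as a covalent break in the chain
        elif distance < cis_threshold:
            labels.append("cis")    # peptide bond assumed cis
        else:
            labels.append("trans")  # near the standard 3.8 angstrom spacing
    return labels


if __name__ == "__main__":
    trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (6.7, 0.0, 0.0), (12.0, 0.0, 0.0)]
    print(classify_links(trace))
```

Raising the break threshold makes the classification more tolerant of noisy coordinates, at the risk of bridging a genuine gap in the chain.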