Biopolymer



5       Protein Backbone Construction

Backbone is a command in the Protein pulldown of the Biopolymer module that predicts the backbone conformation of a protein that contains only Cα atoms.

This chapter describes how to create a protein backbone, given an α-carbon trace. Read this chapter if you are planning to do protein model building.

To fully understand the concepts covered here, it is recommended that you work through the available Pilot tutorial. Descriptions of individual commands can be found using the Help/Insight_Help command in Insight II.


Introduction

The need to predict the backbone structure of a protein from its Cα coordinates arises in several situations. Many files in the Brookhaven Protein Data Bank contain only coordinates for the Cα atoms, either because of low resolution in the electron density data, or because the author chose not to publish the complete structure. For some protein families, an α-carbon trace is the only structural information available. α-carbon traces may also be constructed as a preliminary step in an X-ray crystallographic project. Finally, some simplified models used in simulations of protein folding represent proteins by their α-carbon traces. The results of these simulations could be better evaluated if the simplified Cα representation could be converted into a more complete and realistic peptide backbone structure.


Tutorial

Biopolymer lesson 3 is a Pilot tutorial script that demonstrates the procedure of constructing a protein backbone from an α-carbon trace.

To start Pilot, click the mortarboard icon on the Insight II toolbar. When the Pilot interface appears, click Select and choose the Biopolymer tutorials from the list.

Select Lesson 3 -- Building a Backbone from an Alpha-Carbon Trace and click the Select button. Run the tutorial by clicking either of the two leftmost buttons at the top of the Pilot window.

Each of these two buttons executes a single command from the tutorial. The leftmost button executes commands without displaying the graphical user interface, while the adjacent button displays the interface containing predefined command parameters and waits for the user to click the execute button in the user interface parameter block.


Theory

Most commands in the Biopolymer module are sufficiently simple in concept that they do not require a description of underlying theory. An exception is the Protein/Backbone command, which predicts the coordinates of all backbone and Cβ atoms in a protein given only the coordinates of the Cα atoms. This command uses a new algorithm that applies geometrical and statistical constraints to find an approximate solution to this difficult problem. This section describes the theory underlying this algorithm.

Using a new algorithm developed at MSI, the Protein/Backbone command can provide an approximate reconstruction of peptide backbones from their α-carbon traces. The method is most easily understood by comparison with other approaches to the problem, some of which use similar constraints. One of the most powerful of these constraints is a geometrical constraint, defined below and referred to as the bond angle constraint. One way to categorize approaches to this problem is to distinguish between those methods that do not use the bond angle constraint and those that do. The algorithm used by the Protein/Backbone command is in the latter category.

Methods that do not use the bond angle constraint

The simplest in this category is the construction of a backbone from fragments of known protein structures that have local conformations similar to portions of the α-carbon trace (Holm and Sander 1991; Levitt 1992; Reid and Thornton 1989). This method is limited by its dependence upon the availability of suitable backbone templates in the database of structures.

Another conceptually simple approach is incremental construction (first of the backbone, then of the side-chains) with repeated cycles of molecular dynamics and energy minimization (Correa 1990). This method gives the most accurate results yet published (RMS deviation in the backbone atoms of 0.19 Å for α-lytic protease), but requires dozens of hours of calculation time on a 50 MHz array processor.

A third approach in this category is the directed conformational search method of Bassolino-Klimas and Bruccoleri (1992). This method searches through tens of millions of possible backbone conformations using a directed search algorithm that evaluates candidate structures by potential energy and/or RMS deviation from the Cα atom coordinates. This method is also extremely time-consuming (fifty to one hundred CPU hours for a typical protein) and gives relatively inaccurate results.

The method of Rey and Skolnick (1993) uses some of the constraints of standard peptide bond geometry, but not the bond angle constraint. It uses statistical averages (from a database of known structures) of Cβ atom orientations relative to neighboring Cα atoms to predict the Cβ atom position for each residue. It then uses certain standard bond lengths and angles (and additional statistics from the database) to fix the positions of the amide nitrogen, carbonyl carbon, and carbonyl oxygen relative to the Cβ atom and Cα atoms. The Rey and Skolnick algorithm does not enforce planarity of the peptide bonds. It is also a strictly local algorithm: the locations of the backbone atoms of each residue are determined entirely by the coordinates of the two nearest-neighbor Cα atoms. For these two reasons the method is relatively inaccurate.

Methods that use the bond angle constraint

Algorithms in this category, including the one used by the Protein/Backbone command, maintain the planarity of peptide bonds (either approximately or exactly) and rely on geometrical constraints that allow a global optimization of backbone conformation over the full length of the polypeptide chain. This approach is exemplified in the work of Purisima and Scheraga (1984), Luo et al. (1992) and Payne (1993). The fundamental constraint that underlies all these methods is illustrated in Figure 8.

Figure 8. Conformations illustrating the bond angle constraint

Two conformations of a tripeptide that illustrate the bond angle constraint. Each conformation is shown as a stereo pair in which the middle residue (an alanine) is shown in its entirety, along with the two adjacent peptide planes and the α-carbon atoms of the two neighboring residues. In both conformations the positions of the α-carbon atoms and the orientations of the left peptide planes (angles λ) are identical. The only differences between the two are in the orientations of the right peptide planes (angles µ) and in the orientation of the side chain. A: λ = 178°, µ = 99.7°. B: λ = 178°, µ = -98.9°. In both A and B the angle formed by the three α-carbon atoms is 130.7°.

Figure 8 shows two conformations of an alanine residue with its two neighboring peptide bonds and the Cα atoms of the two neighboring residues. The positions of the three Cα atoms and of the atoms in the left peptide plane are identical in both conformations. The only difference in the backbones of the two conformations is in the orientation of the right peptide plane with respect to the plane containing the three Cα atoms. This illustrates the bond angle constraint: given the positions in space of three consecutive Cα atoms, and the orientation of one of the two peptide planes between them, the orientation of the other peptide plane can have at most two distinct and calculable values.

The orientation of each peptide plane is measured as the dihedral angle between the peptide plane and the plane containing the three Cα atoms. For convenience, these dihedrals are referred to as peptide plane angles.

Each nonterminal residue in a protein has two such angles associated with it, denoted λ and µ, on the N- and C-terminal sides of the residue, respectively. More specifically, λi is defined as the dihedral formed by the four backbone atoms

$$\mathrm{O}_{i-1},\ \mathrm{C}^{\alpha}_{i-1},\ \mathrm{C}^{\alpha}_{i},\ \mathrm{C}^{\alpha}_{i+1}$$

where i is the residue number. The angle µi is defined as the reflection of the dihedral formed by

$$\mathrm{O}_{i},\ \mathrm{C}^{\alpha}_{i},\ \mathrm{C}^{\alpha}_{i+1},\ \mathrm{C}^{\alpha}_{i-1}$$

where the reflection of an angle x (that varies between ±180°) is defined thus:

$$R(x) = \begin{cases} 180^\circ - x, & x \ge 0 \\ -180^\circ - x, & x < 0 \end{cases}$$

The angle λi is 0 when Oi-1 and Cαi+1 are eclipsed, and µi is 180° when Oi and Cαi-1 are eclipsed. Both angles range from -180° to +180°. The direction of positive rotation follows the dihedral convention specified by the IUPAC-IUB commission (IUPAC-IUB 1970). With these conventions, λi = µi = 0 when φi = ψi = 0.
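As a numerical illustration of these definitions, the following Python sketch computes a signed dihedral with the IUPAC sign convention, the reflection R(x), and the peptide plane angles from atomic coordinates. It assumes the atom orderings O(i-1), Cα(i-1), Cα(i), Cα(i+1) for λi and O(i), Cα(i), Cα(i+1), Cα(i-1) (before reflection) for µi; the function names are hypothetical and not part of Insight II.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, IUPAC sign convention."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def reflect(x):
    """Reflection R(x) of an angle x in degrees, x in [-180, 180]."""
    return 180.0 - x if x >= 0.0 else -180.0 - x


def lambda_angle(o_prev, ca_prev, ca_cur, ca_next):
    """lambda_i: dihedral O(i-1), CA(i-1), CA(i), CA(i+1)."""
    return dihedral(o_prev, ca_prev, ca_cur, ca_next)


def mu_angle(o_cur, ca_prev, ca_cur, ca_next):
    """mu_i: reflection of the dihedral O(i), CA(i), CA(i+1), CA(i-1)."""
    return reflect(dihedral(o_cur, ca_cur, ca_next, ca_prev))
```

Because every dihedral about the same Cα(i)-Cα(i+1) axis measures a rotation from one projected atom to another, these assumed definitions make τi - µi - 180° - λi+1 a multiple of 360° for any coordinates, which is a convenient self-check.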

The bond angle constraint is so named because it is a consequence of the fixed bond angle (of about 111°) formed by the central nitrogen, Cα, and carbonyl carbon atoms of the tripeptide. The constraint was stated above in terms of the peptide plane angles λ and µ, but it can also be expressed in terms of the more commonly used conformational angles φ and ψ: given the positions of three consecutive Cα atoms and the value of either φ or ψ of the central residue, the other angle (ψ or φ) can have, at most, two distinct and calculable values. Although some methods have used the φ-ψ version of the bond angle constraint (Purisima and Scheraga 1984; Luo et al. 1992), it is more natural and convenient to formulate the problem of backbone prediction using peptide plane angles.

The mathematical formulation of the bond angle constraint was first derived in terms of φ and ψ by Nishikawa et al. (1974; their equation 13). Wako and Scheraga (1982) published another formulation (their equations A34 and A35) that relates φ and ψ to the peptide plane angles. For the latter they used two symbols with the subscripts 1 and 2; these are identical to λ and µ, respectively, as defined above. Payne (1993) published the bond angle constraint exclusively in terms of the peptide plane angles (his equation 5). This equation, expressed in the angles λ and µ as defined above, is as follows:

$$\cos\varepsilon = \cos\alpha\cos\beta\cos\theta - \sin\alpha\cos\beta\sin\theta\cos\mu_i - \cos\alpha\sin\beta\sin\theta\cos\lambda_i - \sin\alpha\sin\beta\left(\cos\theta\cos\lambda_i\cos\mu_i + \sin\lambda_i\sin\mu_i\right)$$

where ε is the bond angle formed by Ni, Cαi, and the carbonyl carbon Ci; θ is the virtual bond angle formed by Cαi-1, Cαi, and Cαi+1; α is the virtual bond angle formed by Cαi+1, Cαi, and Ci; and β is the virtual bond angle formed by Cαi-1, Cαi, and Ni. With the assumption of standard geometry, the angles ε, α, and β are constant; ε is about 111°.

The above constants are for the case in which both neighboring peptide bonds are in the trans configuration. If the peptide bond between residues i-1 and i is cis, then a different value must be used for β.

If the peptide bond between residues i and i+1 is cis, then a different value must be used for α.

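Given these constants, the bond angle constraint contains only three variables: the peptide plane angles λ and µ and the angle θ formed by the virtual bonds between the three consecutive Cα atoms. If both peptide bonds are trans, the assumption of standard geometry limits the range of θ:

$$\theta_{\min} \le \theta \le \theta_{\max} \qquad \text{(Eq. 1)}$$

where

$$\theta_{\min} = \varepsilon - \alpha - \beta \qquad \text{(Eq. 2)}$$

and

$$\theta_{\max} = \varepsilon + \alpha + \beta \qquad \text{(Eq. 3)}$$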

For the all-trans non-proline case, θmin and θmax are about 75° and 146°, respectively. Given an angle θ within this range, the bond angle constraint can be solved analytically to find all possible pairs of angles λ and µ that satisfy it. Figure 9 depicts graphs of these solutions for eleven values of θ. Notice that only one conformation is compatible with θ = θmin, namely λ = µ = 0. Similarly, only the fully extended conformation (λ = µ = 180°) is compatible with θ = θmax. For values of θ near these extremes, λ and µ are restricted to small oval-shaped curves centered on the θmin or θmax solution. Only for a limited range of θ (from about 104° to 115°) is it possible to find a valid µ for every value of λ between -180° and +180°.
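The analytic solution can be sketched in Python. Under the angle definitions given earlier, the bond angle constraint can be written as cos ε = cos α cos β cos θ - sin α cos β sin θ cos µ - cos α sin β sin θ cos λ - sin α sin β (cos θ cos λ cos µ + sin λ sin µ); this explicit form is our own rearrangement and should be treated as an assumption. For fixed λ and θ it has the form A cos µ + B sin µ = C, so it has zero, one, or two solutions for µ. The constants EPS, ALPHA, and BETA are hypothetical illustrative values chosen only so that Eqs. 2 and 3 give θmin = 75° and θmax = 146°; they are not the values used by the Protein/Backbone command. The helper sample_pairs steps λ on a uniform grid of 360/256 = 1.40625°, an assumed stand-in for the 1.4° sampling mentioned in the algorithm outline.

```python
import math

# Hypothetical standard-geometry constants in degrees (illustrative only).
EPS = 110.5    # bond angle N(i)-CA(i)-C(i)
ALPHA = 20.5   # virtual angle CA(i+1)-CA(i)-C(i)
BETA = 15.0    # virtual angle CA(i-1)-CA(i)-N(i)


def wrap(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def theta_range(eps=EPS, alpha=ALPHA, beta=BETA):
    """Eqs. 2 and 3: the allowed (theta_min, theta_max) in degrees."""
    return eps - alpha - beta, eps + alpha + beta


def solve_mu(lam, theta, eps=EPS, alpha=ALPHA, beta=BETA):
    """Sorted mu values (degrees) satisfying the bond angle constraint for
    the given lambda and theta; an empty list means no solution exists."""
    r = math.radians
    sa, ca = math.sin(r(alpha)), math.cos(r(alpha))
    sb, cb = math.sin(r(beta)), math.cos(r(beta))
    st, ct = math.sin(r(theta)), math.cos(r(theta))
    sl, cl = math.sin(r(lam)), math.cos(r(lam))
    # Constraint rearranged as a*cos(mu) + b*sin(mu) = c.
    a = -sa * cb * st - sa * sb * ct * cl
    b = -sa * sb * sl
    c = math.cos(r(eps)) - ca * cb * ct + ca * sb * st * cl
    amp = math.hypot(a, b)          # a*cos(mu) + b*sin(mu) = amp*cos(mu - phase)
    if amp == 0.0 or abs(c) > amp * (1.0 + 1e-12):
        return []                   # theta is out of range for this lambda
    phase = math.atan2(b, a)
    delta = math.acos(max(-1.0, min(1.0, c / amp)))
    return sorted({round(wrap(math.degrees(phase + s * delta)), 6)
                   for s in (1.0, -1.0)})


def sample_pairs(theta, n_samples=256):
    """All (lambda, mu) solutions on a uniform lambda grid."""
    step = 360.0 / n_samples
    pairs = []
    for k in range(n_samples):
        lam = -180.0 + k * step
        pairs.extend((lam, mu) for mu in solve_mu(lam, theta))
    return pairs
```

For example, solve_mu(178.0, 130.7) returns one negative and one positive µ, mirroring points A and B of Figure 8 (the exact values depend on the assumed constants), while solve_mu(0.0, 140.0) returns an empty list because λ = 0 is compatible only with smaller θ.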

Figure 9. Analytic solutions to the bond angle constraint

Analytic solutions to the bond angle constraint for eleven values of θ ranging from θmin (75°) to θmax (146°). Because λ and µ are angles, the graph "wraps around" on itself along the left and right edges and along the top and bottom edges. The four corners of the graph represent the same point. The letters A and B indicate the two points that correspond to the two conformations shown in Figure 8.

A second important constraint used by methods in this category is based on the following observation. The angles µi and λi+1 describe the orientation of the same peptide plane with respect to two different (but overlapping) triplets of Cα atoms. The two angles are related by the torsion angle τi formed by the virtual bonds between the four consecutive Cα atoms Cαi-1 through Cαi+2:

$$\tau_i = \mu_i + \lambda_{i+1} + 180^\circ \pmod{360^\circ} \qquad \text{(Eq. 4)}$$

This relationship, which in this discussion will be referred to as the virtual torsion constraint, was first described by Nishikawa et al. (1974; their equation 10). It provides a means by which the local bond angle constraints at each residue of the protein can be combined into a single global constraint on the entire backbone.
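Because the virtual torsion constraint relates angles, it holds only modulo 360°. Written as τi = µi + λi+1 + 180° (the form consistent with the torsion cost T defined later in this section), it can be applied and checked with a few lines of Python; the helper names below are illustrative only.

```python
def wrap(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def next_lambda(tau_i, mu_i):
    """lambda_{i+1} implied by the virtual torsion constraint."""
    return wrap(tau_i - mu_i - 180.0)


def torsion_violation(tau_i, mu_i, lambda_next):
    """Signed amount (degrees) by which the constraint is violated; 0 if met."""
    return wrap(tau_i - mu_i - 180.0 - lambda_next)
```

For example, next_lambda(50.0, 100.0) returns 130.0, and torsion_violation(50.0, 100.0, 130.0) returns 0.0.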

Perhaps the simplest method for combining the bond angle and virtual torsion constraints to predict the backbone conformation from an α-carbon trace is the following:

  1. Beginning at an arbitrarily chosen nonterminal residue (the one nearest the N-terminus, for example), make an arbitrary guess at the value of λi for this residue.

  2. Use the bond angle constraint to calculate the two possible values of µi compatible with the guessed λi.

  3. For each of these two values of µi, use the virtual torsion constraint (Eq. 4) to calculate a corresponding λi+1 for the next residue.

  4. Repeat steps (2) and (3) for each value of λi+1 to obtain four possible values of λi+2. For each of these, repeat steps (2) and (3) to obtain eight possible values of λi+3, etc., until the end of the protein is reached.

This is the essence of the methods used by Purisima and Scheraga (1984) and Luo et al. (1992). The algorithm appears to suffer from the problem that it produces multiple solutions, the number of which increases exponentially with the number of residues. In practice this exponential explosion does not occur, because most of the partial solutions lead to dead ends, i.e., the λ given at step (2) is outside the range within which valid solutions for µ exist. Furthermore, the best solution among those that do not lead to dead ends can be found by selecting the one containing the fewest steric clashes (Purisima and Scheraga, 1984) or the largest number of φ and ψ angles observed at high frequency in the database of known structures (Luo et al. 1992).
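The four-step procedure above can be sketched as a breadth-first branching search. This is a minimal illustration, not the published implementations: it assumes the hypothetical constants EPS, ALPHA, and BETA and our own explicit form of the bond angle constraint, cos ε = cos α cos β cos θ - sin α cos β sin θ cos µ - cos α sin β sin θ cos λ - sin α sin β (cos θ cos λ cos µ + sin λ sin µ), and it keeps every surviving branch rather than scoring them by steric clashes or φ/ψ statistics.

```python
import math

# Hypothetical standard-geometry constants in degrees (illustrative only).
EPS, ALPHA, BETA = 110.5, 20.5, 15.0


def wrap(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def solve_mu(lam, theta):
    """Step 2: mu values (degrees) allowed by the bond angle constraint."""
    r = math.radians
    sa, ca = math.sin(r(ALPHA)), math.cos(r(ALPHA))
    sb, cb = math.sin(r(BETA)), math.cos(r(BETA))
    st, ct = math.sin(r(theta)), math.cos(r(theta))
    sl, cl = math.sin(r(lam)), math.cos(r(lam))
    a = -sa * cb * st - sa * sb * ct * cl          # coefficient of cos(mu)
    b = -sa * sb * sl                               # coefficient of sin(mu)
    c = math.cos(r(EPS)) - ca * cb * ct + ca * sb * st * cl
    amp = math.hypot(a, b)
    if amp == 0.0 or abs(c) > amp * (1.0 + 1e-12):
        return []                                   # dead end
    phase = math.atan2(b, a)
    delta = math.acos(max(-1.0, min(1.0, c / amp)))
    return sorted({round(wrap(math.degrees(phase + s * delta)), 6)
                   for s in (1.0, -1.0)})


def enumerate_paths(thetas, taus, lam0):
    """Keep every branch of steps 1-4 that reaches the last residue.

    thetas[k]: virtual bond angle of the k-th nonterminal residue.
    taus[k]:   virtual torsion linking nonterminal residues k and k + 1.
    Returns a list of complete [(lambda, mu), ...] solutions.
    """
    if len(taus) != len(thetas) - 1:
        raise ValueError("need exactly one torsion between adjacent residues")
    partial = [([], lam0)]            # (pairs so far, lambda for next residue)
    for k, theta in enumerate(thetas):
        grown = []
        for pairs, lam in partial:
            for mu in solve_mu(lam, theta):                          # step 2
                nxt = wrap(taus[k] - mu - 180.0) if k < len(taus) else None  # step 3
                grown.append((pairs + [(lam, mu)], nxt))
        partial = grown               # step 4; branches with no valid mu vanish
    return [pairs for pairs, _ in partial]
```

With θ = 110° at every residue (a value for which every λ has two valid µ under these assumed constants), no branch dies and three residues yield 2 × 2 × 2 = 8 complete solutions; with θ = 140° and λ = 0 the search dies at once, which is the dead-end pruning described above.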

Algorithm used by the Protein/Backbone command

The Protein/Backbone command finds a globally optimal solution to the bond angle constraint and Eq. 4 using an optimization algorithm known as dynamic programming. This method is much faster and more robust than the algorithm outlined above. In addition, the Protein/Backbone command imposes a statistical constraint that favors conformations that occur frequently in the database of known structures. The algorithm is similar in many ways to the one independently derived by Payne (1993).

The following outline describes in detail the algorithm used by the Protein/Backbone command to predict a backbone conformation from an α-carbon trace:

  1. The distances between consecutive Cα atoms are calculated. For standard trans peptide bonds, this distance should be about 3.8 Å. If it is greater than a user-specified parameter (known as the break threshold), then there is assumed to be a covalent break in the polypeptide chain at that location. The break threshold is typically 4.5 Å. If the distance between two Cα atoms is less than another user-specified threshold (known as the cis threshold), then the peptide bond between those residues is assumed to be in the cis configuration. A typical value for the cis threshold is 3.2 Å.

  2. For each nonterminal residue i, the virtual bond angle θi formed by Cαi-1, Cαi, and Cαi+1 and the virtual torsion angle τi formed by Cαi-1, Cαi, Cαi+1, and Cαi+2 are measured and stored.

  3. For each nonterminal residue i, a list of possible peptide plane angle pairs (λ, µ) that satisfy the bond angle constraint is calculated and stored. The points in the list are taken at discrete sampling intervals of 1.4°. The purpose of steps (4) and (5), below, is to select the single best (λ, µ) pair from each of these lists of possible angle pairs.

  4. Each angle pair in the lists constructed in step (3) has a cost value associated with it. The single best angle pair from each list must be selected such that the sum of their costs, over all nonterminal residues, is minimal. Initially the cost of each angle pair is set to its Ramachandran cost, i.e., a cost that reflects the frequency of occurrence of the given (λ, µ) conformation in a database of representative protein structures (Boberg et al., 1992). Those (λ, µ) pairs that occur with high frequency in the database have low costs; those that occur rarely or not at all have high costs.

  5. A dynamic programming algorithm is used to find the globally optimal set of angle pairs, one from each nonterminal residue. This algorithm adjusts the costs of the angle pairs by adding to each of them a torsion cost, T, defined thus:

$$T = (\tau_i - \mu_i - 180^\circ - \lambda_{i+1})^2$$

The torsion cost is zero if the virtual torsion constraint (Eq. 4) is exactly satisfied. It reflects the compatibility of the µ of one residue with the λ of the next residue. A nonzero torsion cost implies that there will be some local strain in the backbone. This strain manifests itself as a deviation of the N-Cα-C bond angle ε from its ideal value. The dynamic programming algorithm finds the optimal solution that simultaneously minimizes the torsion and Ramachandran costs over all nonterminal residues.

  6. For a protein of N residues numbered 0 to N-1, the optimal angles λ1 and µN-2 specify the optimal orientations of the first and last peptide planes, respectively. For all other peptide planes, the optimal orientation is calculated as the average of the orientations specified by µi and λi+1.

  7. Using the optimal peptide plane orientations and assuming standard bond lengths and angles, the algorithm calculates for each nonterminal residue the coordinates of the following atoms:

    amide nitrogen

    amide hydrogen (or delta carbon of proline)

    carbonyl carbon

    carbonyl oxygen

    Cβ atom (or H atom HA2 of glycine)

    H atom HA1

    Cγ atom of proline (i.e., entire side-chain of proline)

Coordinates for all atoms of the proline side chain can be calculated because the locations of both Cβ and Cδ are determined by the peptide plane orientations. The coordinates of Cγ are calculated so as to maintain the proper Cβ-Cγ and Cγ-Cδ bond lengths with minimal pucker of the proline ring. This method gives the proline side chain maximal tolerance for torsions about the N-Cα bond (dihedral angle φ). It has the slight disadvantage that the proline ring may have less pucker than the energetically optimal conformation.
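Step 2 of the outline needs nothing but the Cα coordinates. A minimal Python sketch, assuming a single unbroken chain given as a list of (x, y, z) tuples in Å; virtual_angles is a hypothetical helper, and the dihedral uses the IUPAC sign convention.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dihedral(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, _dot(n1, n2)))


def virtual_angles(ca):
    """Return (thetas, taus) for a list of CA coordinates.

    thetas[k]: angle CA(i-1)-CA(i)-CA(i+1) for residue i = k + 1.
    taus[k]:   torsion CA(i-1), CA(i), CA(i+1), CA(i+2) for the same i.
    """
    thetas, taus = [], []
    for i in range(1, len(ca) - 1):
        u, v = _sub(ca[i - 1], ca[i]), _sub(ca[i + 1], ca[i])
        cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
        thetas.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
        if i + 2 < len(ca):
            taus.append(_dihedral(ca[i - 1], ca[i], ca[i + 1], ca[i + 2]))
    return thetas, taus
```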

The algorithm also calculates coordinates for these atoms in the N-terminal residue:

carbonyl carbon

carbonyl oxygen

and for these atoms in the C-terminal residue:

amide nitrogen

amide hydrogen

carbonyl carbon (if C-terminal residue is PRO)

all side-chain atoms (if C-terminal residue is PRO)

The peptide plane angles cannot be used to determine the coordinates of any of the following:

amide nitrogen of N-terminal residue

amide hydrogen of N-terminal residue

side-chain atoms of the N-terminal residue

carbonyl oxygen of C-terminal residue

carbonyl carbon of C-terminal residue (if not PRO)

side-chain atoms of residues other than ALA, GLY and PRO

These atoms are assigned arbitrary coordinates that ensure chemically reasonable bond lengths and angles, but that do not necessarily minimize potential energy or avoid steric overlap. For the atoms of each side-chain other than GLY, ALA and PRO, the arbitrary coordinates are those of the extended conformation.

The original coordinates of the Cα atoms are preserved exactly.
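Steps 4 to 6 of the outline can be sketched as a Viterbi-style dynamic program: for each candidate pair of residue k, it keeps only the cheapest way of reaching that pair from residue k-1, so the work grows linearly with the number of residues and quadratically with the number of candidates per residue. This is an illustrative Python sketch under several assumptions, not the Protein/Backbone implementation: Ramachandran costs are supplied by the caller (the database-derived costs of Boberg et al. are not reproduced), the torsion difference is wrapped into [-180°, 180°) before squaring, both costs are added with equal weight, and the step-6 average is taken as the circular mean of λi+1 and the λi+1 implied by µi through Eq. 4.

```python
import math


def wrap(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def torsion_cost(tau_i, mu_i, lam_next):
    """T = (tau_i - mu_i - 180 - lambda_{i+1})^2, difference wrapped first."""
    return wrap(tau_i - mu_i - 180.0 - lam_next) ** 2


def best_path(candidates, taus):
    """Pick one (lam, mu, rama_cost) per nonterminal residue, minimizing the
    summed Ramachandran and torsion costs over the whole chain.

    candidates[k]: list of (lam, mu, rama_cost) tuples for residue k.
    taus[k]:       virtual torsion linking residue k to residue k + 1.
    Returns (total_cost, list of chosen candidate indices).
    """
    if not candidates or any(not c for c in candidates):
        raise ValueError("every residue needs at least one candidate pair")
    if len(taus) != len(candidates) - 1:
        raise ValueError("need one virtual torsion per adjacent residue pair")
    cost = [rama for _, _, rama in candidates[0]]
    back = []  # back[k-1][j]: best predecessor of candidate j of residue k
    for k in range(1, len(candidates)):
        new_cost, pointers = [], []
        for lam, _, rama in candidates[k]:
            best_j, best = -1, math.inf
            for j, (_, mu_prev, _) in enumerate(candidates[k - 1]):
                total = cost[j] + torsion_cost(taus[k - 1], mu_prev, lam)
                if total < best:
                    best_j, best = j, total
            new_cost.append(best + rama)
            pointers.append(best_j)
        cost = new_cost
        back.append(pointers)
    j = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[j], [j]
    for pointers in reversed(back):   # trace the optimal choices backwards
        j = pointers[j]
        path.append(j)
    return total, path[::-1]


def plane_orientation(tau_i, mu_i, lam_next):
    """Step 6 (assumed form): circular mean of lambda_{i+1} and the
    lambda_{i+1} implied by mu_i through Eq. 4, in degrees."""
    implied = math.radians(tau_i - mu_i - 180.0)
    chosen = math.radians(lam_next)
    y = math.sin(implied) + math.sin(chosen)
    x = math.cos(implied) + math.cos(chosen)
    return math.degrees(math.atan2(y, x))
```

In a two-residue example where one candidate satisfies Eq. 4 exactly but carries a very high Ramachandran cost, best_path can prefer a pair with a modest torsion cost instead, which is the trade-off the combined cost is meant to express.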

Accuracy

The accuracy of this method is similar to that of Payne (1993). For most well-refined structures, the RMS deviation between the experimentally determined coordinates of the backbone atoms and those predicted by the Protein/Backbone command is on the order of 0.5 Å. The accuracy tends to be greater in α-helices and lower elsewhere. Most of the RMS deviation is attributable to occasional peptide "flips", in which the wrong one of the two possible solutions to the bond angle constraint is selected. This is presumably because the algorithm ignores steric clashes of the backbone with other parts of the protein that are distant in the sequence but nearby in space.
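Because the original Cα coordinates are preserved exactly, a predicted backbone and the experimental structure it was derived from share a coordinate frame, so a plain RMS deviation over matched atoms, with no superposition, is a reasonable way to reproduce this kind of comparison. A minimal Python sketch, assuming the caller has already paired corresponding atoms; backbone_rmsd is a hypothetical helper.

```python
import math


def backbone_rmsd(predicted, reference):
    """RMS deviation in Å between two equally long lists of matched (x, y, z)
    coordinates; no superposition is applied."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, reference))
    return math.sqrt(total / len(predicted))
```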


Methodology and implementation

The implementation of the Backbone command (in the Protein pulldown) requires a protein structure that contains only Cα atoms. Therefore, the Molecule Name parameter must specify the name of a protein that contains only Cα atoms.

The algorithm creates and positions the carbonyl carbon, carbonyl oxygen, amide nitrogen, amide hydrogen, Hα atom, and Cβ atom for all residues. It creates and positions the entire side-chain of each proline residue. It also creates side-chains for all other residues, but it makes no attempt to optimize their conformations. Instead, it places the side-chains in arbitrary extended conformations.

By default the side-chains and all hydrogen atoms are created with their display status off, so they are not visible. If the boolean parameters Display_Side_Chains or Display_Hydrogens are toggled on, then the side-chains or hydrogens, respectively, are displayed after the command is executed. The N- and C-termini are created in the charged form.

The Backbone command recognizes breaks in the polypeptide chain by the distance between consecutive Cα atoms. Where this distance exceeds the parameter Break Threshold, the command assumes there is a break in the chain at that location and constructs N- and C-termini there. The command also recognizes cis peptide bonds by the distance between consecutive Cα atoms. Where this distance is less than the Cis Threshold, the command creates a peptide bond in the cis configuration.
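The distance rules behind the Break Threshold and Cis Threshold parameters can be sketched in Python as follows. The default values are the typical 4.5 Å and 3.2 Å quoted in the algorithm outline; classify_links and its argument names are hypothetical and do not correspond to an Insight II interface.

```python
import math


def classify_links(ca_coords, break_threshold=4.5, cis_threshold=3.2):
    """Label each consecutive CA-CA link as 'trans', 'cis', or 'break'."""
    labels = []
    for p, q in zip(ca_coords, ca_coords[1:]):
        d = math.dist(p, q)
        if d > break_threshold:
            labels.append("break")    # chain break: new termini built here
        elif d < cis_threshold:
            labels.append("cis")      # short CA-CA distance: cis peptide bond
        else:
            labels.append("trans")    # normal separation of about 3.8 Å
    return labels
```

For example, Cα atoms 3.8 Å, 2.9 Å, and 6.0 Å apart are labeled trans, cis, and break, respectively.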




Last updated September 30, 1997 at 11:31AM PDT.
Copyright © 1997, Molecular Simulations, Inc. All rights reserved.