Consensus



6 Tutorial

In this tutorial, you will learn how to set up and run a distance geometry calculation using a sequence alignment as input. The program automatically analyzes the relationships between the reference and model proteins and determines appropriate distance restraints that must be met by the model protein. When the DGII background job is launched, it generates the number of protein structures you specify, each of which satisfies the restraints.

1. Invoke Insight II and Homology

Type biosym_tutorial at the UNIX prompt.

Wait a few moments while Insight II loads.

Select Homology from the Module pulldown by picking the MSI icon with the mouse.

Even if you have not purchased the Homology product, a subset of its commands is available to Consensus.

2. Read in a series of reference proteins

The serine protease family of enzymes is well suited to illustrate the principles of homology model building. On one hand, the core structures and active sites of its members are remarkably similar to one another; on the other hand, there is quite a bit of variation at the surfaces of the molecules. Also, many members of the family have been studied extensively, and high-resolution X-ray structures are available.

Select the Restore_Folder command from the File pulldown. Pick consensus_lesson1.psv from the value-aid. Leave all the other parameters at their default values. Select Execute.

Three proteins, chymotrypsin (CHA), trypsinogen (TGN), and elastase (EST), are read in, along with their corresponding sequences. Each protein is colored magenta, except for the regions defined as structurally conserved regions (SCRs), which are colored yellow.

3. Read a sequence (model protein) file

Select Get from the Sequences pulldown. Select pka.seq from the value-aid. The Get Sequence Name parameter is automatically set to PKA. Select Execute.

The sequence of porcine kallikrein (PKA) appears in the sequence display at the bottom of the screen. The one-letter sequence codes are in lowercase to indicate that no coordinates have been assigned to the residues of the protein.

4. Align the sequence of the unknown protein to those of the reference proteins

(This step assumes that you have the Homology module as well as Consensus.)

While the sequences of the reference proteins were aligned to each other already, it still remains to relate the residues of the model protein to them. This can be done in an automatic fashion, with some adjustments. As with any homology modeling project, either using traditional model building methods or using this advanced distance geometry approach, the sequence alignment is the most critical step and must be done perfectly.

Select the Alignment/Pairwise_Sequence command. Choose Automatic as the Seq Align Mode, then Identity as the Scoring Matrix. From the value-aid, select PKA and TGN. Select Execute.

(Ignore the error message stating that the summary boxes are not correct. Simply select Done in the message window.)
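For readers who want to see what an automatic alignment with an identity scoring matrix amounts to, the following is a minimal sketch in Python. It is a generic Needleman-Wunsch global alignment, not the algorithm the Homology module is documented to use. The function name align_identity and the match, mismatch, and gap scores are assumptions chosen for illustration.

```python
def align_identity(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment of two one-letter sequences with identity scoring.

    Illustrative only: the scores are assumed values, and this is not
    the Homology module's own implementation.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    # Toy fragments, not the real PKA or TGN sequences.
    top, bottom, total = align_identity("IVGGYT", "IVGYT")
    print(top)
    print(bottom)
    print("score:", total)
```

Because several alignments can share the same best score, an automatic procedure of this kind returns only one of them, which is one reason manual adjustment is often needed afterwards.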

Although the alignment procedure yields a reasonable result, some manual adjustment is needed, because a sub-optimal residue/residue match early in the sequence can preclude a perfect alignment downstream. Therefore, perform the following steps.

In the sequence window, locate residue EST:ASP_97. Using the right mouse button, click and drag the residue to the right to insert a gap of two residues for all three reference proteins.

Similarly, insert a gap of two residues by dragging residue PKA:THR_112 two positions to the right.

In a similar manner, insert sequence gaps at the following positions:

EST   LEU 151   Drag to the right 2 residues
PKA   HIS 159   Drag to the right 2 residues
PKA   GLY 192   Drag to the right 4 residues
PKA   PRO 204   Drag to the right 1 residue

This is now the final sequence alignment used to build a structure for PKA.
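The effect of dragging a residue to the right can be pictured as inserting gap characters into one or more rows of the alignment. The sketch below, in Python, illustrates this on toy sequences. The function name drag_right, the dictionary layout, and the use of "-" as the gap character are assumptions made for this example and do not reflect how Insight II stores alignments internally.

```python
def drag_right(alignment, rows, column, width, gap="-"):
    """Insert `width` gap characters at `column` in the named rows.

    Rows that are not dragged are padded at the end, so every row of the
    returned alignment still has the same length.
    """
    shifted = {}
    for name, seq in alignment.items():
        if name in rows:
            # Everything from `column` onward moves `width` places right.
            shifted[name] = seq[:column] + gap * width + seq[column:]
        else:
            shifted[name] = seq + gap * width
    return shifted


if __name__ == "__main__":
    # Toy alignment; these are not the real protease sequences.
    toy = {"CHA": "ACDEFG", "TGN": "ACDEFG", "EST": "ACDEFG", "PKA": "ACWWDE"}
    # Open a two-residue gap in all three reference rows at column 2.
    for name, seq in drag_right(toy, {"CHA", "TGN", "EST"}, 2, 2).items():
        print(name, seq)
```

Dragging a residue in the reference rows, as in the first step above, shifts all three reference sequences together relative to PKA, whereas dragging a PKA residue shifts only the model sequence.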

5. Working with family-wide SCRs

Most often, all proteins of a family will possess the same SCRs in exactly the same locations along the peptide chain. After all, that is the very essence of what is meant by a conserved region. In areas of the reference proteins where this is found to be the case, it is best to relate the residues of the model protein to those of all the reference proteins simultaneously. In that way, the distance restraints that are calculated will reflect the overall trend for the family as a whole. They won't reflect a single or a small number of the reference proteins, making the prediction of the conformation of the model protein that much less biased and more reliable.

In the Consensus interface to DGII, there are two ways to relate the residues of the model protein to those of the reference proteins. One is through the use of summary boxes, where the conformations of all the reference proteins are examined simultaneously. That will be shown in this section. The other way is through ordinary sequence boxes. That will be illustrated in the next section.

Select the DGII_Setup command from the Consensus pulldown. Leave the Setup Operation parameter set to Residue_List, as you create and edit a list of reference protein residues associated with each model protein residue. Leave the Activation parameter at Add. Pick PKA from the value-aid, and it becomes the new Model Protein. Leave the Include_Hydrogens parameter Off so that hydrogen atom positions will not be generated by DGII. With Box Type set to Summary and All_Summary_Boxes turned Off, pick a residue in the second summary box in the sequence window. This should be the one that, for the protein EST, begins with the sequence WAHTC. The box number 36 appears in the Summary Box Num parameter, and the command automatically executes.

Now, check to see what happened.

Select List as the Activation mode, and select Execute again.

The textport pops forward, and a list of associated residues appears. It is sorted by the residues of the model protein. For each model residue, note that there are three reference protein residues. That is because a summary box, that is, an SCR that spans the entire family of reference proteins, always includes the same number of residues from every member of the family.
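The list of associated residues can be thought of as a table keyed by model residue. The following Python sketch builds such a table from the aligned columns of one box. The function names and the toy alignment are hypothetical. Residues are numbered here by simple sequential counting, which differs from the numbering Insight II displays (for example, EST:16 for the first EST residue).

```python
def residue_numbers(row, gap="-"):
    """Map each alignment column to a 1-based residue count, or None at a gap."""
    numbers, count = [], 0
    for ch in row:
        if ch == gap:
            numbers.append(None)
        else:
            count += 1
            numbers.append(count)
    return numbers


def associate(alignment, model, references, start, end):
    """Associate model residues with reference residues in columns start..end-1.

    Returns {model_residue: [(reference_name, reference_residue), ...]}.
    """
    cols = {name: residue_numbers(seq) for name, seq in alignment.items()}
    table = {}
    for c in range(start, end):
        model_res = cols[model][c]
        if model_res is None:
            continue  # no model residue in this column
        table[model_res] = [(ref, cols[ref][c]) for ref in references
                            if cols[ref][c] is not None]
    return table


if __name__ == "__main__":
    # Toy alignment; columns 3 and 4 play the role of a summary box.
    toy = {"PKA": "AC-DE", "CHA": "ACWDE", "TGN": "ACWDE", "EST": "A-WDE"}
    for res, refs in sorted(associate(toy, "PKA", ["CHA", "TGN", "EST"], 3, 5).items()):
        print("PKA", res, refs)
```

Passing a single reference name instead of all three gives the kind of one-to-one association that an ordinary sequence box produces.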

6. Working with single SCRs

Often, the model protein shares residues with one or more of the reference proteins, but not all of them. There will be no summary box in such regions of the sequence alignment. As it is nearly always better to impose some restraints on the proposed structures than none, it is possible to associate model residues with those of a single reference protein.

Select the Initialize Boxes command. With the mouse, pick the first residue in both PKA and EST. PKA:1 and EST:16 appear in the parameters. Select Execute.

A green (active) sequence box appears enclosing the two residues.

Be sure the Mode control in the sequence window is set to Box mode. Using the right mouse button, click inside the green box, hold the button down, and drag right until the box encloses residue HIS_20 in PKA. Select Freeze Boxes and pick a residue of PKA that is inside the box. The number 47 appears in the Box Num parameter. Select Execute, and the box turns red (frozen).

Now select DGII_Setup again. Change the Box Type parameter to Normal. Pick PKA as the Model Protein again. (Note that once you choose a model protein, you may not change your mind unless the list of associated residues is cleared.) Now pick a residue in PKA that lies within the box, and 47 appears in the Box Num parameter. Select Execute.

The reference residues for EST only are now associated with PKA in the single peptide segment indicated by the sequence box.

Select List as the Activation parameter and select Execute.

Again, the textport pops forward, and the list of associated residues is displayed. Note that for model residues PKA:1-20, only a single reference residue (from EST) is given.

7. Processing the whole protein at once

Adding sequence boxes, or even summary boxes, one at a time is tedious. Instead, it is possible to add all the summary boxes at once. That way, as long as you are satisfied with the finished sequence alignment, it is a simple process to associate all the related residues in a single operation.

Set the Activation parameter back to Add and the Box Type parameter back to Summary. Turn On the All_Summary_Boxes parameter and select Execute.

The SCRs will be added one at a time in sequence from N- to C-terminus.

Set the Activation parameter to List and select Execute again.

Now all the model residues have three associated reference residues except for PKA:1 and PKA:2, since these were the only two residues that extended beyond the first summary box.

Note that there exists an analogous All_Normal_Boxes parameter. This is most useful when the SCRs among the reference proteins are thought to be of unequal length and greater flexibility is needed.

8. Set the numerical parameters

In addition to the association of corresponding residues, several numerical parameters must be initialized before a distance geometry calculation can be submitted. These have to do with the way in which the upper and lower bounds for the interatomic distances are calculated. The precision is a multiplier for the observed range of a particular distance, while the tolerance modifies the bounds depending on the number of different reference proteins that are included. (See the help text for complete descriptions.)
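To make the roles of the two parameters more concrete, here is a rough Python sketch of how bounds for one interatomic distance might be derived from the reference proteins. The exact formulas used by DGII are not given in this tutorial, so the arithmetic below is an assumption: the precision scales the observed range about its midpoint, and the tolerance is divided by the number of reference proteins so that fewer references give looser bounds. The default values shown are placeholders, not the program's defaults.

```python
def distance_bounds(distances, precision=1.0, tolerance=0.5):
    """Return (lower, upper) bounds in angstroms for one interatomic distance.

    `distances` holds the same distance measured in each reference protein.
    The formula is an illustrative assumption, not the DGII implementation.
    """
    if not distances:
        raise ValueError("at least one reference distance is required")
    shortest, longest = min(distances), max(distances)
    center = (shortest + longest) / 2.0
    # Precision acts as a multiplier on the observed range.
    half_range = (longest - shortest) / 2.0 * precision
    # Assumed form: the padding shrinks as more reference proteins agree.
    padding = tolerance / len(distances)
    lower = max(0.0, center - half_range - padding)
    upper = center + half_range + padding
    return lower, upper


if __name__ == "__main__":
    # Hypothetical distances from three references, then from one reference.
    print(distance_bounds([5.0, 5.4, 5.2], precision=1.0, tolerance=0.3))
    print(distance_bounds([5.0], precision=1.0, tolerance=0.3))
```

Under these assumptions, a distance seen in only one reference protein has no observed range, so its bounds come entirely from the tolerance term.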

Again, select the DGII_Setup command. Select Parameters as the Setup Operation. Leave all the parameters at their default values and select Execute. Now Cancel.

9. Make the DGII run a batch background job

DGII calculations take quite a long time to run. Consequently, it is best to make this a background job that runs in batch mode.

Select the Setup_Bkgd_Job command from the Background_Job pulldown. Pick DGII_Run from the Background_Job parameter's value-aid. Set the Execution_Mode to Cmd_File_Only. Leave the Host set to Local. Select Execute.

10. Create the DGII input files

Select the DGII_Params command from the Consensus pulldown. Turn On the three major Boolean parameters: Smooth, Embed, and Optimize. Accept the default values for all the other parameters. Select Execute.

The DGII_Run command is automatically activated.

Here again, all of the default parameter values are appropriate for this application. Enter an arbitrary name for the Projct_Description parameter. For the DGII_Num_Structures parameter, enter the number of structures you want to generate (3 in this example). Select Execute.

This step of the calculation will take considerable time, on the order of one hour on an Indigo R3000 workstation. At the end of that time, several DGII database files are output, including PKA_DGII.car and PKA_DGII.mdf, which contain the coordinates of a hypothetical protein with the same sequence as the model PKA protein. This protein was automatically built to establish covalent restraints for the model protein. (This hypothetical protein and the final results will or will not contain hydrogen atoms, depending on whether the Include_Hydrogens parameter was turned On or Off.) Also generated is the PKA_DGII.geom file, a binary file containing the restraints for all the interatomic distances and angles obtained from the analysis of the aligned reference proteins. The file bkgd_job_pka_dgii0.csh is the shell script that controls the execution of the various images that comprise a complete DGII calculation.

11. Exit Insight II

Since the interactive part of the modeling session is complete, it is possible to leave the Insight II environment.

Select Quit from the Session pulldown and select Execute.

12. Submit the DGII Background Job

After the DGII files are set up, at the UNIX prompt type:

>	bkgd_job_pka_dgii0.csh &

The programs begin to execute. The time required is approximately 4.5 hours per requested structure on an Indigo R4000 workstation, or roughly overnight. The results are in the file PKA_DGII.arc, containing three proposed conformations for kallikrein, each consistent with the structural information obtained from the three reference proteins. Because this job had been run with the Include_Hydrogens parameter set Off, the molecules contained in PKA_DGII.arc have no hydrogen atoms. They must be added back using the Hydrogens command found in the Modify pulldown of the Biopolymer module before any further energy refinement is done using Discover.
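As a quick check on the run time quoted above, the following Python snippet multiplies the per-structure figure from the text by the number of requested structures. The function name is hypothetical, and the estimate assumes that run time scales linearly with the number of structures on the same hardware.

```python
HOURS_PER_STRUCTURE = 4.5  # figure quoted above for an Indigo R4000


def estimated_hours(num_structures, hours_per_structure=HOURS_PER_STRUCTURE):
    """Estimate total DGII run time, assuming linear scaling (an assumption)."""
    return num_structures * hours_per_structure


if __name__ == "__main__":
    print(estimated_hours(3))  # prints 13.5
```

For the three structures requested in this tutorial, the estimate is 13.5 hours, which is consistent with running the job overnight.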




Last updated October 06, 1997 at 09:20PM PDT.
Copyright © 1997, Molecular Simulations, Inc. All rights reserved.