| Biopolymer |


The database information is derived for a set of known protein structures from the Brookhaven Protein Databank and stored in the file $BIOSYM/data/biopolymer/database.dat. The same list of proteins used in loop searches is used to create this database. The database $BIOSYM/gifts/biopolymer/structure.db consists of all the pdb files released by PDB before 8/6/97.
The stored structural information is mostly residue-based and includes some information on atoms and secondary structure. Searches of this database can address many common protein structure questions.
To start Pilot, click the mortarboard icon on the Insight II toolbar. When the Pilot interface appears, click Select and then choose the Biopolymer tutorial from the list.

For each protein the stored information consists of:
Proteins to search
The search can be limited to those proteins with a resolution better than some specified value or to those with a keyword such as "hemoglobin" in their description. However, by default all proteins are searched.
A template is a fragment of consecutive residues. Each residue may be defined by residue type (or some wildcard group of residue types such as "hydrophobic"), by secondary structure type, by main chain or side chain torsions or by intra-template Cα-Cα distance. Each residue in the template may be defined as much or as little as required.
atoms and/or side chain "center"
A database search usually finds a number of hit fragments that satisfy the query.
Search parameters
The user can specify how tightly the search criteria must be adhered to by changing the search tolerance. For example, if you specify a required main chain or side chain torsion, then, by default, all structures within 30° of the required value will be retrieved. After you enter the required information using the Define_Template, IntraTmplt_Cnstrt, InterTmplt_Cnstrt and Run_Search commands, the control file for the search program is generated, and the search runs as a background job under Insight II.
The default database file used for searching is found in $BIOSYM/data/biopolymer/database.dat.
An example of the input file that represents this query is shown below:
NHIT 50
DTOL 1.0
ATOL 30.0 30.0
CTOL 50.0
STOL 30.0
DBAS ./asp_proteinase.db
WILD ./wildcard.dat
RSLN 10.0
TMPL 3 asp1
RESD ASP * * * *
RESD ANY * * * *
RESD GLY * * * *
TMPL 3 asp2
RESD ASP * * * *
RESD ANY * * * *
RESD GLY * * * *
CONS CACA asp1:1 asp2:1 5.5 7.5 0
CONS IRNG asp1:1 asp2:1 0 500 0
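The layout of the input file above can be read mechanically. The following is a minimal Python sketch, not the parser used by the search program: it assumes that each line starts with a four-letter keyword, that `TMPL` carries a residue count followed by a template name, and that the `RESD` lines after a `TMPL` line belong to that template. These readings are inferred from the example only, and the function name `parse_control_file` is hypothetical.

```python
EXAMPLE = """\
NHIT 50
DTOL 1.0
TMPL 3 asp1
RESD ASP * * * *
RESD ANY * * * *
RESD GLY * * * *
TMPL 3 asp2
RESD ASP * * * *
RESD ANY * * * *
RESD GLY * * * *
CONS CACA asp1:1 asp2:1 5.5 7.5 0
"""


def parse_control_file(text):
    """Split a control file into settings, templates and constraints.

    Field meanings are assumptions inferred from the example file.
    """
    settings, templates, constraints = {}, [], []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        key, args = fields[0], fields[1:]
        if key == "TMPL":
            # Assumed layout: TMPL <number of residues> <template name>
            templates.append({"length": int(args[0]), "name": args[1], "residues": []})
        elif key == "RESD":
            # A RESD line is attached to the most recent TMPL line
            templates[-1]["residues"].append(args)
        elif key == "CONS":
            constraints.append(args)
        else:
            # Everything else is kept as an uninterpreted global setting
            settings[key] = args
    return settings, templates, constraints


settings, templates, constraints = parse_control_file(EXAMPLE)
```

Keeping unknown keywords as uninterpreted settings avoids guessing at the meaning of fields that the example does not explain.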
Browsing database search results
The output of the Run_Search command is a list of hits stored in an Insight II table file called run_name.tab. In addition, a summary of the calculation, including query details and the hits found is included in a file run_name.log in the local directory. The Read_Search_Result command allows the list of protein structural hits to be loaded into an Insight II table by setting the Load_Type parameter to Load_Hit_List. When the Load_Type parameter is set to Load_Hit_Protein, then one or more protein structure hits may be loaded into Insight II and superimposed using the atom subsets that are then automatically defined.
For example, if the hit table contains the following entries:
Table 1
Hit    Protein  asp1:ASP  asp1:ANY  asp1:GLY  asp2:ASP  asp2:ANY  asp2:GLY
........
hit7   5pep     32:ASP    33:THR    34:GLY    215:ASP   216:THR   217:GLY
.....
hit9   4hvp     A25:ASP   A26:THR   A27:GLY   B25:ASP   B26:THR   B27:GLY
.....
then loading hits 7 and 9 will produce subsets named HIT7$PEP and HIT9$HVP respectively, and the subsets can simply be used in the Transform/Superimpose command in the viewer module to superimpose the two hits.
The two hits in the above table represent porcine pepsin and HIV protease, respectively.
ANY -- any amino acid.
ALLXGP -- all except gly, pro.
HYPHOB -- hydrophobic.
HYPHIL -- hydrophilic.
ACIDIC -- glu, asp.
BASIC -- lys, arg, his.
NEUTRAL -- of neutral pH.
AROMTC -- phe, trp, tyr.
SMALL -- gly, ala, val, ser, thr.
GLNASN -- gln, asn.
These definitions are contained in the data file $BIOSYM/data/biopolymer/wildcard.dat.
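How a wildcard group is applied when matching a residue can be sketched in Python as follows. This is an illustration only: it encodes just the groups whose members are spelled out in the list above, it does not reproduce the format of wildcard.dat, and the names `WILDCARDS` and `residue_matches` are hypothetical.

```python
# The 20 standard amino acids, by three-letter code
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# Only groups with explicitly listed members are included here
WILDCARDS = {
    "ANY": set(AMINO_ACIDS),
    "ALLXGP": AMINO_ACIDS - {"GLY", "PRO"},
    "ACIDIC": {"GLU", "ASP"},
    "BASIC": {"LYS", "ARG", "HIS"},
    "AROMTC": {"PHE", "TRP", "TYR"},
    "SMALL": {"GLY", "ALA", "VAL", "SER", "THR"},
    "GLNASN": {"GLN", "ASN"},
}


def residue_matches(residue, res_type):
    """Return True if a residue satisfies a residue type or wildcard group."""
    residue, res_type = residue.upper(), res_type.upper()
    if res_type in WILDCARDS:
        return residue in WILDCARDS[res_type]
    # Not a wildcard: require an exact residue type match
    return residue == res_type
```

A new group would be one more entry in the dictionary, which mirrors the idea of adding a definition to the data file.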
New residue type definitions can be entered into this file, or into a local copy specified by the Res_File_Type parameter in the Run_Search command. These new definitions may then be entered into the Res_Type parameter in the Define_Template command, but they do not show up in the value aid. The set of valid secondary structure types includes
H -- Folded i.e. any of next 4 types.
A -- α-helix.
3 -- 3-Turn.
4 -- 4-Turn.
5 -- 5-Turn.
T -- Turn, includes 3,4,5.
E -- Extended chain.
N -- N terminal.
C -- C terminal.
In addition, the ANY category matches any type. Each of the above types can be prefixed by NOT, and these categories are also presented in the SecStruct_Type value aid. The secondary structure types in the database are assigned using a Kabsch and Sander algorithm. An n-Turn is a backbone conformation in which a hydrogen bond is formed between the CO of residue i and the NH of residue i+n.
The CA torsion for the i-th residue is the virtual torsion defined by the positions of the Cα atoms for residues i-1 to i+2. By default this is unrestricted (value -500.0). The default tolerance for the CA virtual torsion is 50.0 degrees and can be altered by turning on the More_Parameters boolean in the Run_Search command and then modifying the CATor_Tolerance parameter. Side chain torsion restrictions can be set by turning on the Cnstrn_SideChain boolean in the Define_Template command; the default tolerance is again set by turning on the More_Parameters boolean in the Run_Search command and setting the Sch_Tolerance parameter value.
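The virtual torsion and its tolerance test can be sketched in Python as below. This is an illustrative calculation, not the code used by the search program: the sign convention of the dihedral, the wrap-around handling of angles, and the function names are assumptions. Only the 50.0 degree default tolerance and the -500.0 "unrestricted" value come from the text.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_torsion(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four consecutive CA positions."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def torsion_satisfied(value, target, tolerance=50.0, unrestricted=-500.0):
    """True if the torsion is unrestricted or within tolerance of the target."""
    if target == unrestricted:
        return True
    return angle_difference(value, target) <= tolerance
```

Comparing angles through `angle_difference` treats 170 and -170 degrees as 20 degrees apart, which is the behavior one would normally want for a periodic quantity.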
An intra-template constraint can be a Cα atom to Cα atom distance, a Cα atom to side chain center distance or a side chain center to side chain center distance. These options are available using the Constraint_Method parameter. A target distance for the constraint can be set and the distance tolerance (default 1.0 Å) is one of the additional parameters in the Run_Search command. The atoms that define the centers of the side chain for specific residues are shown below.
Definition of side chain center:
PRO -- CG
GLY -- CA
ALA -- CB
VAL -- CG1/CG2
LEU -- CD1/CD2
ILE -- CD
MET -- CE
PHE -- CZ
TRP -- NE1
SER -- OG
THR -- OG1
ASN -- OD1/ND2
GLN -- OE1/NE2
CYS -- SG
TYR -- OH
ASP -- OD1/OD2
GLU -- OE1/OE2
LYS -- NZ
ARG -- NH1/NH2
HIS -- ND1/CD2
Where / denotes the mean coordinate (midpoint) of the two atoms.
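A side chain center and a distance constraint built on it can be sketched in Python as follows. The sketch interprets "/" as the midpoint of the two named atoms, which is an assumption, and includes only a few of the residues listed above. The dictionary and function names are hypothetical; the 1.0 Å default tolerance is the one quoted in the text.

```python
import math

# A few entries from the list of side chain center atoms
SIDE_CHAIN_CENTER = {
    "GLY": ("CA",),
    "ALA": ("CB",),
    "VAL": ("CG1", "CG2"),
    "SER": ("OG",),
    "ASP": ("OD1", "OD2"),
    "LYS": ("NZ",),
}


def side_chain_center(res_name, atoms):
    """Return the center as the mean coordinate of the defining atom(s).

    `atoms` maps atom names to (x, y, z) coordinates for one residue.
    """
    coords = [atoms[name] for name in SIDE_CHAIN_CENTER[res_name]]
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


def distance_satisfied(a, b, target, tolerance=1.0):
    """True if the distance between a and b is within tolerance of target."""
    return abs(math.dist(a, b) - target) <= tolerance


asp_center = side_chain_center("ASP", {"OD1": (0.0, 0.0, 0.0), "OD2": (2.0, 4.0, 6.0)})
```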
As with intra-template constraints, the distance can be Cα to Cα, Cα to side chain center or side chain center to side chain center. In the case of inter-template constraints the Constrain parameter also contains one additional option. When Res_Separate_Range is selected, additional parameters are enabled which allow the user to select only those hits where the separation, in terms of the number of residues apart in the protein sequence, is either inside or outside a specific range. One use of this feature is to ensure that a given query finds only one example of each hit by requiring template 2 to be after template 1 in the sequence; by default, two examples of each hit would otherwise be returned.
Another application of the residue separation constraint is to specify a variable number of residues between two fixed patterns of residues. For example, to find a sequence pattern G-G-(2,5)X-G-- two consecutive gly residues followed by between two and five residues before another gly residue-- the first template is two gly residues and the second template is one gly residue. The constraint is that the first residue of the first template and the residue of the second template are between four and seven residues apart.
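The G-G-(2,5)X-G example can be checked with a short Python sketch that works on a one-letter sequence string. It only illustrates the residue separation arithmetic and is not how the database search is implemented; the function names are hypothetical.

```python
def template_starts(sequence, template):
    """Indices at which a fixed run of residues occurs in the sequence."""
    n = len(template)
    return [i for i in range(len(sequence) - n + 1) if sequence[i:i + n] == template]


def find_pairs(sequence, template1, template2, min_sep, max_sep):
    """Pairs (i, j) of template start positions whose separation j - i
    lies between min_sep and max_sep inclusive."""
    hits = []
    for i in template_starts(sequence, template1):
        for j in template_starts(sequence, template2):
            if min_sep <= j - i <= max_sep:
                hits.append((i, j))
    return hits


# Template 1 is "GG", template 2 is "G", separated by 4 to 7 residues
hits = find_pairs("AGGAAGA", "GG", "G", 4, 7)
```

In the sequence `AGGAAGA` the two residues between `GG` and the final `G` give a separation of 4, the lower end of the allowed range. A positive lower bound also keeps the second template after the first in the sequence.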
It is also often necessary to set an exclusion range of residues. For example, in studying side chain-side chain interactions you may require that the interactions be between residues remote in sequence. You can define two templates, each containing one residue of the amino acid type of interest, and set an exclusion range between the two templates. If this range is set between -5 and +5 residues, then the two residues must be at least five residues apart in the sequence.
An instance where it is necessary, but not obvious, to set a constraint is if two identical templates are defined. For example, in searching for two interacting histidine residues, you would define two templates; one for each histidine residue. In this case you also must specify an exclusion range between the two templates of at least -1 to 1 to ensure that any single histidine residue does not satisfy both templates.
In other instances, it may be desirable that one residue in a protein is simultaneously in two templates. Take, for example, a search for a structure with two α-helices which are connected by a loop of between 6 and 12 non-helical residues. To do this you would define two templates and set a residue separation constraint between them. The first template is 12 residues long with the first six residues specified to be helix, and the second six specified as not helix. The second template is 12 residues long, with the first six residues not helix and the second six helix. The constraint is that the first residue of the first template and the first residue of the second template are between 6 and 12 residues apart.
This can be illustrated by the two extreme solutions, where H = helix and N = not-helix.
First residues in the two templates separated by 6 residues:
Template 1:     H-H-H-H-H-H-N-N-N-N-N-N
Template 2:                 N-N-N-N-N-N-H-H-H-H-H-H
Fragment found: H-H-H-H-H-H-N-N-N-N-N-N-H-H-H-H-H-H
First residues in the two templates separated by 12 residues:
Template 1:     H-H-H-H-H-H-N-N-N-N-N-N
Template 2:                             N-N-N-N-N-N-H-H-H-H-H-H
Fragment found: H-H-H-H-H-H-N-N-N-N-N-N-N-N-N-N-N-N-H-H-H-H-H-H
In the case in which the first residues are separated by 6 residues, some residues in the hit structure are in both templates.
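The overlapping-template behavior can be reproduced with a small Python sketch that operates on a string of per-residue secondary structure codes. It is an illustration under simplifying assumptions: `H` in the template means "helix", `N` means "any code other than H", and the function names are hypothetical. It is not the algorithm used by the search program.

```python
TEMPLATE_1 = "HHHHHHNNNNNN"  # six helix residues, then six non-helix
TEMPLATE_2 = "NNNNNNHHHHHH"  # six non-helix residues, then six helix


def matches_at(sec_struct, template, start):
    """True if the template fits the secondary structure string at `start`."""
    window = sec_struct[start:start + len(template)]
    if len(window) < len(template):
        return False
    # 'H' must be helix; 'N' must be anything except helix
    return all((s == "H") == (t == "H") for s, t in zip(window, template))


def helix_loop_helix(sec_struct, min_sep=6, max_sep=12):
    """Pairs (i, j) of template start positions with min_sep <= j - i <= max_sep."""
    n = len(sec_struct)
    starts1 = [i for i in range(n) if matches_at(sec_struct, TEMPLATE_1, i)]
    starts2 = [j for j in range(n) if matches_at(sec_struct, TEMPLATE_2, j)]
    return [(i, j) for i in starts1 for j in starts2 if min_sep <= j - i <= max_sep]


short_loop = "H" * 6 + "E" * 6 + "H" * 6    # 6-residue loop, templates overlap
long_loop = "H" * 6 + "T" * 12 + "H" * 6    # 12-residue loop, templates abut
```

With the 6-residue loop both templates cover the same six loop residues, which is the case where one residue belongs to two templates.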
$BIOSYM/bin/biopolymer/Run_Search
The structure database search is then performed by the executable
$BIOSYM/$BIOSYM_PLATFORM/biosym_exe/template_db_search
The job can be made to run locally or remotely by setting the Background_Job/Setup_Bkgd_Job/Background_Job parameter to Run_Search and setting the Host parameter to either Local or to the hostname of a machine on the local area network. The progress of the search calculation can be monitored using the Background_Job/Completion_Status command and the job aborted using the Background_Job/Kill_Bkgd_Job command. For the supplied database.dat file a simple query of the database can be performed in a few minutes.
The Run_Search command defaults to a maximum of 50 hits. In addition to setting the torsion angle constraint tolerances, the search can be restricted to crystal structures of a given resolution by setting the Xtal_Resolution parameter to High, Medium or Low. The definitions of these terms are as follows:
Run_Search Job_Name
and the same set of files produced by running from within Insight II will be produced.
Custom database creation
$BIOSYM/bin/biopolymer/Run_Crebase
is called from Insight II and this in turn calls the executable
$BIOSYM/$BIOSYM_PLATFORM/biosym_exe/crebase
which creates the run_name.db file used in the search. By default, the Create_DB command will create a new database file, but if the Append_DB boolean is turned on, then the new data will be added to an existing run_name.db file.
Define_Template
A template is a fragment of consecutive residues. The Define_Template command allows you to either create a template by sequentially adding residues to the template or to modify an existing template. Each residue in the template may be defined by residue type (or a wildcard group of residue types such as "hydrophobic"), by secondary structure type, or by main chain or side chain torsions. Each residue in the template may be defined as much or as little as required.
The Intratmplt_Cnstrnt command defines the relationship of two residues in a template in terms of the distance between two Cα atoms, a Cα atom and side chain center, or two side chain centers.
The Intertmplt_Cnstrnt command defines a constraint between two residues in different templates in terms of the residue separation range in a protein sequence or the distance between two Cα atoms, a Cα atom and side chain center, or two side chain centers.
The Delete_Query command allows the user to delete individual templates or constraints, or to delete all elements of a query.
The List_Query command lists defined queries to the text port or to a file. The format of the list is the same as that in the command file jobname.ddb written out by the Run_Search command. Please refer to the manual for details of the format.
The Run_Search command writes out the control file for searching the database and starts the search as a background job.
The Read_Search_Results command allows you to load either a table of hit lists or a list of hit pdb files with a subset of hit residues.
The Create_DB command allows you to create a database file from a list of pdb file names.
1 The Kabsch and Sander method is used to determine secondary structure type.