Homology


Contents

Release 98.0, December 1998


1. Introduction

What is Homology?
Hardware and Installation
How to Invoke Homology
Program Environment
Homology and Insight II
Saving Homology Information with Insight II
Command Logging and Restarting
Homology and Discover
Operations

2. Theory

Background
Homology Model Building
Searching Sequence Databases With the FASTA Program
ktup Value
Scoring Local Regions
Joining Regions
Optimizing the Sequence Matching
Explicit Statistical Estimation
The Determination of Structurally Conserved Regions (SCRs)
Manual Determination of Structurally Conserved Regions
Automatic Determination of Structurally Conserved Regions
Automatic Sequence Alignment Methods
Needleman and Wunsch Algorithm for Pairwise Alignment
MSI's Pairwise Alignment Procedure
MSI's Automatic Multiple Sequence Alignment
Clustal W
Scoring Matrices
Multiple Structure Alignment
Simultaneous Superposition of Structures
Assignment of Coordinates Within a Conserved Region
Assignment of Coordinates for Loop or Variable Regions
Search Loops Command
Generate Loops Command
Side Chain Conformational Searches Using Rotamer Libraries
Refinement of the Model Using Molecular Mechanics
The Potential Energy Equation
Energy Minimization
Energy Constraints
Molecular Dynamics
Secondary Structure Prediction
The Chou-Fasman Method
The GOR II Method
Hydrophobicity Profiles
Solvent Accessible Surfaces
Definition of Solvent Accessible Surface Area
Significance of Solvent Accessible Surface Area
Solvation Module versus ProStat
Solvent Accessible Surface Area for Protein Structure Validation
Rules for Protein Validation

3. Implementation

Sequence Window
Sequence Display
Controls
Sequence Boxes
Sequence Gaps
Manipulating the Sequence Display
Scrolling Modes
Seq Mode
Box Mode

4. Command Summary

Modules
Pulldowns
Commands
Sequences pulldown
Boxes pulldown
Loops pulldown
Residue pulldown
Databases pulldown
Background_Job pulldown
Alignment pulldown
By_Residue pulldown
Refine pulldown
Consensus pulldown
Profiles_3D pulldown
ProStat pulldown
Modeler pulldown
Seqfold pulldown

5. Methodology

Step 1: Determine Which Proteins Are Related to the Model Protein
Sequence Database Searching
Motif Searching
Step 2: Determining Structurally Conserved Regions (SCRs)
Automatic Determination of SCRs
Specifying the Initial Search Zone
Optimizing the Automatic Search for SCRs
Subsets
Automatic Superimposing of Structures
Characteristics of m-boxes
Handling of Existing Boxes
Interrupting the Search
Manual Determination of SCRs
Finding Pairwise SCRs
Criteria for Evaluating Manually-Determined SCRs
Summarizing the Manually-Determined SCRs
Superimposing Reference Proteins Using Manually-Determined SCRs
Multiple Sequence Alignment as an Alternative to the Manual Method
Step 3: Sequence Alignment
Choosing a Scoring Matrix
Automatic Sequence Alignment without SCRs
Automatic Sequence Alignment with Automatically-Determined SCRs
Automatic Sequence Alignment with Manually-Determined SCRs
Pairwise Manual Sequence Alignment
Multiple Sequence Alignment
Specifying the Initial Search Zone
Specifying a Mandatory Sequence
Automatic Calculation of Pairwise Threshold
Statistical Significance and Alternate Sequence Coloring
Characteristics of m-boxes
Subsets
Automatic Superimposing of Structures
Adjusting the Sensitivity and Selectivity of the Search
Handling of Existing Boxes
Interrupting the Search
Excessive Calculation Time
Single_Search Mode
Manual Mode
Step 4: Assigning Coordinates Within the SCRs
Step 5: Building Loop or Variable Regions (VRs)
Searching for and Displaying Loops
Generating and Displaying Loops
Building Coordinates for the VRs
Step 6: Conformational Search for Side Chains Using Rotamers
Step 7: Refining the Structure with Discover
Running Discover with Homology-Built Model Structures
End Repair
Splice Repair
Energy Minimization
Molecular Dynamics
Step 8: Validating Results
Structure Checking
Residue Dihedral Angles
Secondary Structure Classification
Algorithmic Implementation
United Atom Models
Setting Atomic Radii
Definition of Computed Surface Areas and their Significance
Total Surface Area
Relative Surface Area and the Tripeptide Model
Polar and Apolar Surface Area
Limitations in Implementation
Conclusion

6. Tutorial

Introduction
Hardcopy and Pilot online tutorials
Hardcopy lessons
Lesson 4a: Finding structurally conserved regions
Lesson 4b: Building SCRs and loops
Lesson 10: Finding alternative multiple sequence alignments

A. References

B. File Formats

Introduction
Amino Acid Scoring Matrices
User Scoring Matrix Files
Sequence Alignment Command
Input Databases Command
Get Sequence, Alignment and Databases Commands

C. Glossary

D. seq_extract Utility

Output
Results Displayed on the Screen
Output Files
Execution Options

E. Matrices

Sequence Alignment Matrices
Identity Matrix
Codon Substitution Matrix
Dayhoff Evolutionary Mutation Matrix
Hydrophobicity Matrix
Input Databases Command Matrices
Identity Matrix
Codon Substitution Matrix
Dayhoff Evolutionary Mutation Matrix
Hydrophobicity Matrix

F. Hydrophobicity Scale Values

Amino Acid Values
Threshold Values

G. Sequence Databases

Protein Sequence Databases
NBRF
Swiss-Prot
DNA Sequence Databases
GenBank
EMBL Data Library

H. Clustal W Standalone

HELP 1: General help for CLUSTAL W
HELP 2: Help for multiple alignments
HELP 3: Help for pairwise alignment parameters
HELP 4: Help for multiple alignment parameters
HELP A: Help for protein gap parameters
HELP 5: Help for output format options
HELP 6: Help for profile and structure alignments
HELP B: Help for secondary structure / gap penalty masks
HELP C: Help for secondary structure / gap penalty mask output options
HELP 7: Help for phylogenetic trees
HELP 8: Help for choosing a weight matrix
HELP 9: Help for command line parameters DATA (sequences)
HELP 0: Help for tree output format options


Last updated January 06, 1999 at 05:39PM PST.
Copyright © 1998, 1999 Molecular Simulations, Inc. All rights reserved.