| Biopolymer |


The query protein can initially be in any format readable by Insight II; it is automatically translated to PDB format before the analysis.
On output, a file containing subset definitions for all possible divisions of the query structure into clusters of domains is produced. This information may be further used to visualize the domain content and in any other Insight II command that understands subset definitions.
To start Pilot, click the mortarboard icon on the Insight II toolbar. When the Pilot interface appears, click Select and then choose the Biopolymer tutorials from the list.
In addition to basic operations, use of Domain_Analysis requires an understanding of the Insight II subset definition file format. Refer to the relevant chapter in the Insight II manual.

distances, derivation of optimal cutting planes, minimization of domain surface area or grouping of structural elements. The Domain_Analysis command simply groups secondary structure elements to represent a protein structure in the form of secondary structure clusters.
First, the query structure is annotated with secondary structure on a residue-by-residue basis, i.e., each residue is assigned a helix, strand or coil state in the Kabsch and Sander fashion. Then distances between all pairs of secondary structure elements are computed. The proximity of two secondary structure elements is expressed as the average of the distances between all residues that form the two elements. Residue positions are approximated by Cα atom positions only.
Next, a clustering algorithm is applied to transform the all-to-all distance table into a dendrogram representing the structural clustering of secondary structure elements. The algorithm repeatedly finds the closest pair of secondary structure element clusters and joins them to form a node in the binary tree. The distance between two clusters is defined as the average of all distances between the secondary structure elements of one cluster and the secondary structure elements of the other cluster. In the tree, the hierarchical structure of the domain organization is expressed as the relations between ancestors and offspring; i.e., closely related secondary structure elements have more common ancestors than loosely related ones.
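The clustering step described above can be sketched in Python as plain average-linkage clustering. This is a minimal sketch under stated assumptions, not the code of the domain program: the function names, the tuple-based tree representation and the toy coordinates are hypothetical, and each secondary structure element is assumed to be given as a list of Cα coordinates.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def element_distance(a, b):
    """Average distance over all residue pairs (C-alpha coordinates) of two elements."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def cluster_distance(c1, c2, d):
    """Average of the element-to-element distances between two clusters.

    c1 and c2 are tuples of element indices; d is the all-to-all distance table.
    """
    return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))


def build_tree(elements):
    """Return the merges as (left, right, distance) tuples, closest pair first."""
    n = len(elements)
    d = [[element_distance(elements[i], elements[j]) for j in range(n)]
         for i in range(n)]
    clusters = [(i,) for i in range(n)]  # start with one cluster per element
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and join them into one tree node.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], d))
        merges.append((a, b, cluster_distance(a, b, d)))
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
    return merges


# Toy input (hypothetical): four two-residue elements forming two distant groups.
elements = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 1, 0), (1, 1, 0)],
    [(10, 0, 0), (11, 0, 0)],
    [(10, 2, 0), (11, 2, 0)],
]
merges = build_tree(elements)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

In this toy input, elements 0 and 1 are joined first, then elements 2 and 3, and the two resulting clusters are joined last at a much larger distance, which is the pattern expected for a two-domain structure.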
A single domain is defined as a cluster of closest secondary structure elements. Any number of domains (up to the number of all secondary structure elements) can be separated by performing horizontal cuts through the tree. However, it is reasonable to assume that the number of domains is much lower. The maximal number of domains is estimated by specifying a cutoff for the distance between two offspring (Dom_Diff). An additional cutoff for the increase of the average interdomain distance (Dom_Frac), specified as a fraction with respect to the immediate offspring, is also used.
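A horizontal cut through the tree can be illustrated with a short Python sketch. It assumes the tree is stored as an ordered list of merges (left cluster, right cluster, distance), closest pair first; this representation, the function name cut_tree and the example values are hypothetical. The sketch only shows how a requested number of domains maps onto the tree and does not reproduce how Dom_Diff and Dom_Frac are applied.

```python
def cut_tree(n_elements, merges, n_domains):
    """Split the tree into n_domains clusters by replaying only the earliest merges."""
    if not 1 <= n_domains <= n_elements:
        raise ValueError("n_domains must be between 1 and the number of elements")
    clusters = [(i,) for i in range(n_elements)]
    # Each merge reduces the number of clusters by one, so stopping after
    # n_elements - n_domains merges leaves exactly n_domains clusters.
    for left, right, _distance in merges[: n_elements - n_domains]:
        clusters = [c for c in clusters if c not in (left, right)] + [left + right]
    return sorted(clusters)


# Hypothetical tree for four secondary structure elements.
merges = [
    ((0,), (1,), 1.2),
    ((2,), (3,), 2.1),
    ((0, 1), (2, 3), 10.1),
]
for k in range(1, 5):
    print(k, cut_tree(4, merges, k))
```

Requesting two domains returns the clusters (0, 1) and (2, 3); requesting four returns every element as its own cluster, which is the upper limit mentioned in the text.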
The loops that connect elements within a single domain are also considered to be in the same domain. For loop regions between secondary structure elements that belong to different domains, a domain boundary is defined by minimizing the sum of distances from two consecutive Cα atoms in the loop region to each other's nearest secondary structure element. All residues that precede the minimizing pair are considered to belong to the preceding domain. Residues that follow belong to the following domain.
The Domain_Analysis command provides a graphical user interface to the domain command line program. You have to specify the name of the background job for the domain command, since it is submitted in the background. The domain executable is also available from the command line interface. The input is a PDB file and the output takes the form of an Insight II style subset definition file. The subset naming convention is as follows: for a protein named PROT, the subset PROT$DOM$n_m (n >= m) indicates the m-th domain from the n-th domain cluster. Subset names do not depend on the background job name. Therefore, loading different results overwrites those previously loaded, although results for different proteins can coexist. If you need to compare different runs, copy or rename your protein molecule object and rerun Domain_Analysis separately.
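The subset naming convention can be made concrete with a small Python sketch. It assumes that the n-th domain cluster is the division of the protein into n domains and that numbering starts at 1; both are inferred from the n >= m condition rather than stated explicitly, and the function name is hypothetical.

```python
def domain_subset_names(protein, max_dom_num):
    """Map each domain cluster n to its subset names PROT$DOM$n_m, m = 1..n."""
    return {
        n: [f"{protein}$DOM${n}_{m}" for m in range(1, n + 1)]
        for n in range(1, max_dom_num + 1)
    }


names = domain_subset_names("PROT", 3)
for n, subsets in names.items():
    print(n, subsets)
```

Under these assumptions, the division into three domains is described by the subsets PROT$DOM$3_1, PROT$DOM$3_2 and PROT$DOM$3_3. Because the job name is not part of the subset name, a second run on the same object produces the same names.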
The domain definition created by Domain_Analysis may not be perfect; however, all the necessary changes can be introduced with the Insight II subset mechanism available in the Subset pulldown (in the top Insight II menu bar). Using commands from this menu, you can calculate union, intersection and difference operations on the sets of residues given in the domain definitions. Domains can be copied, deleted and renamed, and the contents of a subset can be listed to a file or to the textport. The interaction between Domain_Analysis and the Subset pulldown is covered in the second Domain_Analysis Pilot tutorial. Since the domain definition files are human readable, they may also be modified manually.
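The set logic behind such a correction can be illustrated in Python. This sketch shows only the union, intersection and difference operations on residue numbers; it does not use or imitate Insight II commands, and the residue ranges and variable names are hypothetical.

```python
# Hypothetical two-domain definition, given as sets of residue numbers.
domain_1 = set(range(1, 121))     # residues 1-120
domain_2 = set(range(121, 251))   # residues 121-250

# Suppose residues 115-120 are judged to belong to the second domain.
moved = set(range(115, 121))

new_domain_1 = domain_1 - moved   # difference: remove the residues from domain 1
new_domain_2 = domain_2 | moved   # union: add them to domain 2

# Intersection: the corrected domains must not share any residue.
overlap = new_domain_1 & new_domain_2
print(len(new_domain_1), len(new_domain_2), len(overlap))
```

The same three operations applied to the subsets of a domain cluster keep the division complete, since no residue is lost or assigned twice.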
Protein_ID
The Protein_ID parameter specifies the Insight II object name of the protein under study. This object must exist before the command is executed, either loaded from a file format supported by Insight II or created using the Biopolymer module.
The Max_Dom_Num parameter signifies the maximum number of domains in the biggest domain cluster. The default value for Max_Dom_Num is 11. High values of Max_Dom_Num heavily affect the performance of the Domain_Load state of the Domain_Analysis command. If you set Max_Dom_Num to 0, the maximum number of domains will be estimated and only that number of domains will be computed.
Dom_Frac is the threshold for the fraction of distance-related scores gained upon merging two subdomains into a domain. It is used to estimate the number of domains. See the theory section above for more information.
Dom_Diff specifies the distance cutoff used when comparing distances between secondary structure elements. See the theory section above for more information.
Run_Name is any Insight II valid name for the background job.
The Domain_Subsets parameter interprets domain clustering information in the form of Insight II subsets. In principle, any subset definition file can be read by this command, but only those which follow the Domain_Analysis style can be used in the Domain_View state for visualization purposes.
The Domain_Clusters parameter has the form Domain_n (where n is the number of domains in the domain cluster you would like to depict). If your subset definition file contains more than the 15 choices currently provided, you can type in the value.