Effect of Sequence Order Independence and Dependence:
The ProtClass method takes the sequence order of AA residues and SSEs into account. In other words, when we compare the protein abstracts (PAs) or the discretized contact pattern (CP) feature vectors of two different proteins, we take into account the positions (topological order in the AA sequence) and the orientations (from the N-terminus to the C-terminus of the AA sequence) of the SSEs in each protein.
Taking sequence order into account helps improve the accuracy of the system, as shown in the figure below.
Although sequence order independence is useful in some applications, such as the detection of protein surface motifs (i.e., substructures of proteins only), it is generally not appropriate for classifying whole protein structures. This is because rearrangements of AAs and SSEs have occurred less frequently during evolution than insertions and deletions of AAs and SSEs.
To make our algorithm sequence order independent, we modify it by:
· Dropping the SSE sequence (S) attribute from the PA.
· Dropping the AA position of the 1st SSE (AS) and SSE position of the 1st SSE (SS) attributes from the CP feature vector.
· Dropping the AA position difference (AD) and SSE position difference (SD) attributes from the CP feature vector.
· Treating value 1 (Helix-Sheet) and value 2 (Sheet-Helix) of the contact pattern type (CT) attribute in the CP feature vector as the same.
· Restricting the range of original values of the torsion angle (W) attribute in the CP feature vector to 0 to 180 degrees (instead of −180 to +180 degrees), treating −x degrees and +x degrees as the same.
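The modifications listed above can be sketched in Python as follows. This is a minimal illustration, not the ProtClass implementation. It assumes that a PA and a CP feature vector can each be modeled as a plain dictionary keyed by the attribute abbreviations used in the text (S, AS, SS, AD, SD, CT, W). It also assumes that the torsion angle W is folded before discretization, since the text speaks of its "original value". The function names and the extra `dist` key in the usage example are hypothetical. CT codes other than 1 and 2 are not described in the text, so they pass through unchanged.

```python
from typing import Dict

# Attributes that encode sequence position (removed in the order-independent variant).
ORDER_ATTRS = ("AS", "SS", "AD", "SD")

# CT codes named in the text; any other CT codes are left as they are.
HELIX_SHEET, SHEET_HELIX = 1, 2


def make_cp_order_independent(cp: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of a (hypothetical) CP feature vector without sequence-order information."""
    # Drop AS, SS, AD and SD; keep every other attribute.
    out = {k: v for k, v in cp.items() if k not in ORDER_ATTRS}
    # Treat Sheet-Helix (2) as Helix-Sheet (1), so the orientation of the SSE pair no longer matters.
    if out.get("CT") == SHEET_HELIX:
        out["CT"] = HELIX_SHEET
    # Fold the raw torsion angle from [-180, 180] onto [0, 180], treating -x and +x as equal.
    if "W" in out:
        out["W"] = abs(out["W"])
    return out


def make_pa_order_independent(pa: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of a (hypothetical) PA without the SSE sequence (S) attribute."""
    return {k: v for k, v in pa.items() if k != "S"}


if __name__ == "__main__":
    # Two contact patterns that differ only in SSE positions, CT orientation and the sign of W.
    a = {"AS": 12, "SS": 2, "AD": 30, "SD": 3, "CT": 1, "W": 45.0, "dist": 5.1}
    b = {"AS": 40, "SS": 5, "AD": 8, "SD": 1, "CT": 2, "W": -45.0, "dist": 5.1}
    print(make_cp_order_independent(a) == make_cp_order_independent(b))
```

Running the script prints `True`. The two patterns become indistinguishable once their sequence-order attributes are dropped and their orientation-dependent values are folded together. This is exactly the information the sequence-order-dependent version of the method uses and the independent variant discards.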