Comparison of software systems for organizing protein families.

Scope: Psi-Blast, ClustalW, HMMER, SAM

Caveat: Each system is actively upgraded, often to incorporate features derived from the others. Hence, a capability cited below as lacking in some program may appear in a later version. Follow the links to the pages describing the individual programs to find the latest innovations in each system.
 

Psi-Blast

Psi-Blast is generally used to search the entire protein database for distant homologues. It is an iterative search tool whose first round is identical to BlastP. In each subsequent iteration, it builds a Position Specific Scoring Matrix (PSSM) from the sequences already found and uses it as the query for the next iteration. The PSSM holds a separate set of residue scores for each position in the alignment of the sequences already found. The scores for each position are based on the general substitution matrix (usually BLOSUM62), upweighted for residues already found at that position in the alignment. The scheme is very similar to that used in ClustalW. It follows from the development of profile methods, and is a forerunner of the Hidden Markov Models (HMMs) used by HMMER and SAM. It differs from the HMM methods in that gaps are scored as in BlastP (a fixed opening penalty plus a penalty for each extension), rather than with position-specific gap penalties learned from the sequences already aligned.
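The idea of a separate score set per alignment position can be sketched in a few lines of Python. This is a simplified illustration, not the Psi-Blast weighting scheme: it replaces the BLOSUM62-based upweighting with plain pseudocounts and assumes a uniform background frequency. The function names build_pssm and score are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score (bits)."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for col in range(len(aligned_seqs[0])):
        # Count only standard residues; gaps and unknown symbols are skipped.
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm

def score(pssm, segment):
    """Sum the position-specific scores of an ungapped segment."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

alignment = ["ACDE", "ACDF", "AC-E"]
pssm = build_pssm(alignment)
print(score(pssm, "ACDE") > score(pssm, "WWWW"))
```

Residues seen often in a column get a positive score there and unseen residues a negative one, so the same residue can score differently at different positions.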

CLUSTAL W

Clustal W is a profile-based progressive alignment system. That is, it first aligns the closest sequences to form a profile representing the different residues at each position. It then aligns sequence to profile and profile to profile, using the information in the profiles to help align distant families. Clustal W is the most commonly used algorithm after Psi-Blast, and the most commonly used to prepare seed alignments for HMMER.
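The "closest sequences first" ordering can be illustrated with a small Python sketch. It is not the Clustal W procedure: similarity here is a rough stand-in computed with difflib rather than a pairwise alignment score, and the greedy average-linkage merging is a substitute for a real guide tree. The names similarity and join_order are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Rough similarity in [0, 1]; a stand-in for a pairwise alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def join_order(seqs):
    """Return the order in which clusters of sequence names would be merged."""
    clusters = [(name,) for name in sorted(seqs)]
    order = []

    def linkage(c1, c2):
        # Average similarity over all cross-cluster pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(similarity(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        order.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return order

seqs = {
    "a": "MKVLAAGIVG",
    "b": "MKVLAAGIVA",
    "c": "MKILSAGLVG",
    "d": "WWPPHHCCYY",
}
for step in join_order(seqs):
    print(step)
```

Each merge step corresponds to one sequence-to-sequence, sequence-to-profile, or profile-to-profile alignment; the most divergent sequence joins last, when the profile already carries the most information.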

Implementation:

Characteristics:

Advantages:

Disadvantages:

Related programs:

HMMER

HMMER creates an HMM of a protein family. The HMMER model contains both position-specific residue information and position-specific gap penalties. To calculate these it uses a build cycle, in which it iteratively computes the gap penalties from the existing alignment and then realigns the sequences according to those penalties. In principle, it should be able to reach its final alignment and model starting from unaligned sequences. In practice, the process is more likely to settle on a good final model if the sequences are prealigned, most typically by ClustalW. Most typically HMMER is used as follows:
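As a side illustration of the position-specific gap penalties described above (not of the usage steps), the following Python sketch derives a per-column gap-opening penalty from an existing alignment. It is an assumption-laden toy: HMMER estimates transition probabilities between model states, whereas this only counts how often a gap starts in each column and smooths the count with a pseudocount. The name gap_open_penalties is hypothetical.

```python
import math

def gap_open_penalties(aligned_seqs, pseudocount=1.0):
    """Per-column penalty (bits) for opening a gap, from observed gap starts."""
    n = len(aligned_seqs)
    penalties = []
    for col in range(len(aligned_seqs[0])):
        # A gap "opens" in this column if the previous column was not a gap.
        opens = sum(
            1 for s in aligned_seqs
            if s[col] == "-" and (col == 0 or s[col - 1] != "-")
        )
        p_open = (opens + pseudocount) / (n + 2 * pseudocount)
        penalties.append(-math.log2(p_open))
    return penalties

alignment = ["ACDEF", "AC-EF", "AC-EF", "ACDEF"]
for col, penalty in enumerate(gap_open_penalties(alignment)):
    print(col, round(penalty, 2))
```

Columns where the family already shows gaps become cheap places to open one, while conserved columns stay expensive. Realigning with these penalties and recomputing them is the kind of cycle the paragraph above describes.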

Implementation.

Characteristics.

Advantages and Disadvantages.

SAM (Sequence Alignment and Modeling System).

SAM is algorithmically very similar to HMMER. Like HMMER, its implementation has a mode for searching against premade libraries; like Psi-Blast, it can also run a search that compiles a family starting from one sequence. The developers of SAM have focused on fold recognition, and hence SAM has many facilities for searching at higher degrees of divergence than is typical with HMMER.
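Compiling a family from one starting sequence follows an iterate-until-no-new-hits pattern, sketched below in Python. This is a toy under strong assumptions, not how SAM or Psi-Blast score hits: all sequences are taken to be equal-length and ungapped, and the model is simply the per-column residue frequencies of the current family. The names profile_score and iterative_search and the threshold value are hypothetical.

```python
from collections import Counter

def profile_score(family, candidate):
    """Mean per-column fraction of family members sharing the candidate's residue."""
    total = 0.0
    for col, aa in enumerate(candidate):
        counts = Counter(seq[col] for seq in family)
        total += counts[aa] / len(family)
    return total / len(candidate)

def iterative_search(seed, database, threshold=0.65, max_rounds=10):
    """Grow a family from one seed, rebuilding the model after every round."""
    family = [seed]
    for _ in range(max_rounds):
        hits = [
            s for s in database
            if s not in family and profile_score(family, s) >= threshold
        ]
        if not hits:
            break  # converged: the current model finds nothing new
        family.extend(hits)
    return family

seed = "MKVLAAGIVG"
database = ["MKVLSAGLVG", "MRILSAGLVG", "WWPPHHCCYY"]
print(iterative_search(seed, database))
```

In this example the second database sequence scores below the threshold against the seed alone and is only picked up once the first hit has been added to the model, which is the reason for iterating.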

Implementation

Characteristics

Advantages/ disadvantages



Last updated 3/22/2003 - Steve Hardies