CLUSTAL W
Program status
-
The bcf program suite containing clustalw seems to have disappeared.
-
Version 2.05 is installed in the suite of programs maintained by Dr. Hardies.
-
Place /home/hardies/apps in your path, and then clustalw2 will run when
you type clustalw.
Versions of Clustal are embedded in many applications and web pages,
often with many of the options and capabilities missing. All versions conduct
a progressive alignment, whereby the closest sequences are aligned first.
Remote ClustalW alignments may be computed with many, but not all, options
at http://www.ebi.ac.uk/clustalw/
Capabilities of ClustalW 2.05 include:
-
Ability to align sections separately, manually edit them, and then put
the sections together.
-
Ability to do an NJ tree after alignment (a better tree than the cladogram
generated during the alignment).
-
Can output the distance matrix or do bootstrap analysis.
-
Secondary structure information can be added to a sequence and used to
steer gap placement.
-
User defined position specific gap penalties can be manually added (but
not automatically computed).
-
Some ability to fine tune gap placement parameters.
-
Input and output in a variety of formats.
Because ClustalW incorporates a Neighbor Joining phylogenetic analysis
and a Bootstrap analysis, it can directly support the first steps of
a phylogenetic analysis. Note that for divergent sequences, it is
often necessary to reduce noise by confining the analysis to the more conserved
blocks of sequence. We expect an implementation of PAUP soon, which
has the best facilities for exploring that strategy.
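As a rough illustration of confining an analysis to better-conserved positions, the sketch below drops alignment columns that contain too many gaps. It is a crude stand-in for choosing conserved blocks, not the method used by ClustalW or PAUP. The function name `drop_gappy_columns` and the `max_gap_fraction` parameter are hypothetical, and the alignment is assumed to be a dictionary mapping labels to equal-length gapped sequences.

```python
def drop_gappy_columns(aln, max_gap_fraction=0.0):
    """Return a copy of the alignment keeping only columns with few gaps.

    aln: dict mapping label -> gapped sequence (all the same length).
    max_gap_fraction: hypothetical threshold; 0.0 keeps gap-free columns only.
    """
    labels = list(aln)
    lengths = {len(aln[lab]) for lab in labels}
    if len(lengths) > 1:
        raise ValueError("sequences are not all the same length")

    # Walk the alignment column by column and keep the acceptable columns.
    columns = zip(*(aln[lab] for lab in labels))
    kept = [col for col in columns
            if col.count("-") / len(col) <= max_gap_fraction]

    # Rebuild one sequence per label from the retained columns.
    return {lab: "".join(col[i] for col in kept)
            for i, lab in enumerate(labels)}
```

With the default threshold, any column containing a gap is discarded; a larger threshold keeps columns in which only a minority of sequences are gapped.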
See also ClustalW embedded in the SeaView editor, and in the PC packages
Vector NTI and LaserGene, which are under a site license at UTHSCSA.
The program is run on bioinf with the command clustalw. When the
program starts, a tree-structured menu appears to guide its use.
Documentation
-
Each menu page in the clustalw menu system has context-specific
help.
-
The collection
of context-specific help messages for version 2.05.
-
Thompson, J D; Higgins, D G; Gibson, T J 1994. CLUSTAL W: improving
the sensitivity of progressive multiple sequence alignment through
sequence weighting, position-specific gap penalties and weight matrix
choice. Nucl. Acids Res. 22: 4673-4680. [version on web without figures]
-
For information about some other flavors of Clustal, see Jeanmougin, F;
Thompson, J D; Gouy, M; Higgins, D G; Gibson, T J 1998. Multiple sequence
alignment with Clustal X. Trends in Biochemical Sciences 23(10): 403-405,
and citations therein.
CLUSTAL format.
-
Often indicated by .aln extension.
-
Example:
CLUSTAL W(1.60) multiple sequence alignment

Humlbpa   M---MGALARALPS-ILLALLLTSTPEALGA-NPGLVARITDKGLQYAAQEGLLALQSEL
Rablpb    M---MGTWARALLGSTLLSLLLAAAPGALGT-NPGLITRITDKGLEYAAREGLLALQRKL
Ratlbp    M---MKSATGPLLP-TLLGLLLLSIPRTQGV-NPAMVVRITDKGLEYAAKEGLLSLQREL
Humcetp   M---MLAATVLT---LALLGNAHACSKGTSH-EAGIVCRITKPALLVLNHETAKVIQTAF
Maccetp   M---MLAATVLT---LALLGNVHACSKGTSH-KAGIVCRITKPALLVLNQETAKVIQSAF
Rabcetp   -----------------------ACPKGASY-EAGIVCRITKPALLVLNQETAKVVQTAF
Humbpi    MRENMARGPCNAPRWVSLMVLVAIGTAVTAAVNPGVVVRISQKGLDYASQQGTAALQKEL
Bovbpi    M---MARGPDTARRWATLVVLAALGTAVTTT-NPGIVARITQKGLDYACQQGVLTLQKEL

Humlbpa   LRITLPDFTGDLRIPHVGRGRYEFHSLNIHSCELL
Rablpb    LEVTLPDSDGDFRIKHFGRAQYKFYSLKIPRFELL
Ratlbp    YKITLPDFSGDFKIKAVGRGQYEFHSLEIQSCQLR
Humcetp   QRASYPDITGEKAMMLLGQVKYGLHNIQISHLSIA
Maccetp   QRANYPNITGEKAMMLLGQVKYGLHNIQISHLSIA
Rabcetp   QRAGYPDVSGERAVMLLGRVKYGLHNLQISHLSIA
Humbpi    KRIKIPDYSDSFKIKHLGKGHYSFYSMDIREFQLP
Bovbpi    EKITIPNFSGNFKIKYLGKGQYSFFSMVIQGFNLP

Programs accepting clustal format input may or may not allow/enforce
the following:
-
Header line may not be required, or just a blank line may be required.
Usually the CLUSTAL keyword is required, and sometimes a specific version
number as above may be required.
-
Number of blank lines between blocks usually does not matter.
-
Many programs tolerate (and discard) number lines or other kinds of
annotation lines between the blocks, provided those lines start with
blanks in the label field.
-
Most display programs accept characters other than - within the alignment.
Other programs may be more fussy.
-
Every sequence within a particular block must invariably have the same
number of characters. Programs may allow variable white space, so that
the first characters of each sequence in a block do not need to be
vertically aligned.
-
Number of characters per block generally flexible (from one block to the
next).
-
All labels must appear in all blocks in same order.
-
No spaces allowed in labels.
-
Labels traditionally limited to 10 characters. Other programs on input
of aln files may cut longer labels to 10 characters.
-
Many programs will error-check for redundant labels, in some cases
after truncating the labels to 10 characters.
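The rules above can be sketched as a small, tolerant reader. This is a minimal illustration of one reasonable set of choices, not the parser used by ClustalW or any particular program. The names `parse_clustal` and `max_label` are hypothetical, and the sketch assumes that blocks are separated by at least one blank line.

```python
def parse_clustal(text, max_label=None):
    """Parse CLUSTAL-format text into an ordered dict of label -> sequence.

    max_label: optional label truncation length (hypothetical parameter),
    mimicking programs that cut labels to 10 characters on input.
    """
    lines = text.splitlines()
    # The header is optional here; skip it if the CLUSTAL keyword is present.
    if lines and lines[0].upper().startswith("CLUSTAL"):
        lines = lines[1:]

    order = []   # labels in the order seen in the first block
    seqs = {}    # label -> list of per-block sequence chunks
    block = []   # (label, chunk) pairs collected for the current block

    def flush():
        """Validate the current block and append it to the result."""
        if not block:
            return
        labels = [lab for lab, _ in block]
        if len(set(labels)) != len(labels):
            raise ValueError("redundant labels in block: %r" % labels)
        if len({len(chunk) for _, chunk in block}) != 1:
            raise ValueError("unequal sequence lengths within a block")
        if not order:
            order.extend(labels)
            for lab in labels:
                seqs[lab] = []
        elif labels != order:
            raise ValueError("labels must appear in every block, same order")
        for lab, chunk in block:
            seqs[lab].append(chunk)
        block.clear()

    for line in lines:
        if not line.strip():
            flush()            # any number of blank lines ends a block
            continue
        if line[0].isspace():
            continue           # annotation line (blank label field): discard
        fields = line.split()  # variable white space between the fields
        if len(fields) < 2:
            raise ValueError("expected a label and a sequence: %r" % line)
        label, chunk = fields[0], fields[1]  # a trailing coordinate is ignored
        if max_label:
            label = label[:max_label]
        block.append((label, chunk))
    flush()
    return {lab: "".join(seqs[lab]) for lab in order}
```

Calling `parse_clustal(text, max_label=10)` reproduces the stricter behavior in which two long labels that share their first 10 characters are rejected as redundant.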
Some versions of clustalw allow some options in the output.
-
May allow longer labels.
-
Put a line under bottom of block indicating conserved positions.
-
Put a coordinate after the end of each line.
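The two output options above (a conservation line under each block and a coordinate after each line) can be illustrated with a small writer. This is only a sketch under stated assumptions: the function name `format_clustal` and its parameters are hypothetical, the header carries no version number, only fully identical columns are marked (with `*`), and the coordinate is taken to be the running count of non-gap characters for that sequence. Actual ClustalW output may differ in these details.

```python
def format_clustal(aln, width=60, label_width=10, seqnos=False):
    """Write an alignment (dict of label -> gapped sequence) as CLUSTAL text."""
    labels = list(aln)
    length = len(aln[labels[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("sequences are not all the same length")

    # Label field: at least label_width wide, plus one separating blank.
    pad = max(label_width, max(len(lab) for lab in labels)) + 1
    counts = dict.fromkeys(labels, 0)  # running non-gap count per sequence
    out = ["CLUSTAL W multiple sequence alignment", ""]

    for start in range(0, length, width):
        chunks = {lab: aln[lab][start:start + width] for lab in labels}
        for lab in labels:
            chunk = chunks[lab]
            counts[lab] += len(chunk) - chunk.count("-")
            line = lab.ljust(pad) + chunk
            if seqnos:
                line += " %d" % counts[lab]  # coordinate after the line
            out.append(line)
        # Conservation line: '*' where every sequence has the same residue.
        marks = "".join(
            "*" if len(set(col)) == 1 and col[0] != "-" else " "
            for col in zip(*chunks.values())
        )
        out.append(" " * pad + marks)
        out.append("")
    return "\n".join(out)
```

Because the conservation line starts with blanks in the label field, readers that discard annotation lines will skip it.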
Last updated 4/4/2008 - Steve Hardies