Programs maintained on bcf by Steve Hardies
This page should be mirrored to the molgen/tutor directory.
Clustal W
Clustal W is the most commonly used method for multiple alignment of
divergent sequences. It uses progressive alignment, which means that closer
sequences are aligned first, and prealigned sequences are represented by
profiles during alignment. The implementation has a variety of alternatives
for improving gap placement guided by secondary structure or arbitrary user
intervention. Clustal is found in some form in many other packages (e.g.,
GCG pileup) and at various web servers, although often restricted in some
way. The interface is menu-driven, and each screen has context-specific help.
-
Status:
-
The currently installed version is 2.05
-
The number of sequences handled by Clustal W depends on the amount of memory
available in the executing computer. The bcf implementation exceeds tolerances
somewhere above 250x700 characters.
-
The SeaView alignment editor is currently out of service.
-
The clustal implementation within the site-licensed LaserGene package is
advertised as being rewritten to accept arbitrarily large alignments. We
haven't benchmarked it yet, but that would normally be expected to slow
it down.
-
The current Clustal W implementation includes the option to conduct NJ
tree or UPGMA tree analysis with bootstrap support. The documentation
suggests that NJ is slow, but it should not be slow on bcf. NJ is
generally a more accurate algorithm than UPGMA, which assumes a constant
rate of evolution across lineages.
-
A local help file is available with links to
further documentation.
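For repeated runs, the menus can be bypassed from the command line. The following is a minimal sketch, not the local procedure: the executable name clustalw2 and the file name seqs.fasta are assumptions, and the option names (-INFILE, -ALIGN, -OUTFILE, -BOOTSTRAP) are those listed by the ClustalW command-line help. The function only prints the two commands so they can be checked before being run.

```shell
#!/bin/sh
# Assumption: the executable is called clustalw2; override with CLUSTALW=...
CLUSTALW=${CLUSTALW:-clustalw2}

# clustal_plan INPUT
# Prints an alignment command followed by a bootstrap tree command.
clustal_plan() {
    input=$1                    # unaligned sequences (hypothetical file name)
    aln=${input%.*}.aln         # alignment file written by the first step
    echo "$CLUSTALW -INFILE=$input -ALIGN -OUTFILE=$aln"
    echo "$CLUSTALW -INFILE=$aln -BOOTSTRAP=1000"
}

clustal_plan seqs.fasta
```

Piping the printed lines to sh would run them; the bootstrap step reads the alignment produced by the first step.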
fastaq2phd
-
This program converts a fasta file and its associated .qual file into a
phd file.
-
It was written by James D. White, University of Oklahoma, and retrieved
from http://www.genome.ou.edu/informatics.html.
-
We use it to represent a 454 contig as a single long phd file for reassembly
with locally generated sequence reads.
-
To access, export /home/hardies/genome/bin to your path.
-
For options, type fastaq2phd
-
Typical commands for the 454 contig conversion are:
-
rename 454AllContigs.fna 454AllContigs; mkdir phd_dir; fastaq2phd -d phd_dir -p 454 454AllContigs
-
To avoid 454 contigs becoming singlets, take a small section from each
contig end and run fasta2phd on it.
-
There is a script at the OU site to harvest contig ends from a collection
of contigs.
-
Technical notes concerning 454 data incorporation.
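The contig-end step mentioned above can be sketched with standard shell and awk. This is an illustration only and is not the OU script: the function name harvest_ends, the _left/_right/_whole name suffixes, and the 500-base default are all assumptions. It writes the first and last N bases of every contig in a fasta file as separate records.

```shell
#!/bin/sh
# harvest_ends FASTA [END_LEN]
# Prints the first and last END_LEN bases of each contig as separate fasta
# records. Contigs no longer than 2 * END_LEN are printed whole.
# END_LEN defaults to 500, an arbitrary illustrative value.
harvest_ends() {
    awk -v n="${2:-500}" '
        function flush_record() {
            if (name == "") return
            len = length(seq)
            if (len <= 2 * n) {
                printf ">%s_whole\n%s\n", name, seq
            } else {
                printf ">%s_left\n%s\n", name, substr(seq, 1, n)
                printf ">%s_right\n%s\n", name, substr(seq, len - n + 1)
            }
        }
        /^>/ {
            flush_record()
            name = substr($1, 2)    # contig ID without ">" or description
            seq = ""
            next
        }
        { seq = seq $0 }            # join wrapped sequence lines
        END { flush_record() }
    ' "$1"
}

# Example (hypothetical file names):
# harvest_ends 454AllContigs.fna 500 > contig_ends.fna
```

The resulting file is the kind of input that would then be handed to fasta2phd; that program's options are not covered here.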
Phred/Phrap/Consed
Phred/Phrap and Consed are the Washington University Sequencing Center's
shotgun sequence assembly and contig display programs. They feature
probabilistic interpretation of chromatograms for quality of data, automatic
contig assembly with heuristic upweighting of high quality data, and
automated recommendation of primers for finishing. The philosophy of these
programs is to remove human inspection from the sequencing process as much
as possible and to allow residual human inspection to zero in on problem
areas as efficiently as possible. The system has a substantial learning
curve, but in testing versus more user-friendly options, we find that this
system ultimately saves significant time and expense.
Status:
-
7/2/2006 - Implemented version that can handle >64K phd files.
-
Put /home/hardies/apps/genome in $PATH
-
run phredPhrap_longreads instead of phredPhrap
-
run consed_longreads instead of consed
-
The purpose was to allow representing the 454 contract contigs as a single
phd file containing consensus sequence and consensus quality values.
-
This strategy may be used as a general way to pass forward the product
of a previous assembly.
-
See also fastaq2phd.
-
6/29/2006 - Implemented a separate version maintained by Steve Hardies
-
Put /home/hardies/apps/genome in $PATH
-
Default phredpar.dat file is in genome/lib
-
Default vector.seq file is in genome/lib/screenlibs/
-
determineReadTypes.perl is in genome/bin and implements Hardies' lab naming
conventions.
-
phredPhrap does not automatically run determineReadTypes.perl. The user
should access the phd_dir and run determineReadTypes.perl prior to running
phredPhrap. It is implemented this way to allow each user to run
their own customized version of determineReadTypes.perl.
-
Installed versions are Consed15 and phred 0.020425.c.
-
Phred now handles CEQ data.
-
consed is a symbolic link to consed_amd64_dyn.
-
phredpar.dat has an entry for ABI 310.
-
This implementation is used extensively by the Hardies lab, but mainly
just phredPhrap and Consed. Phred, addReads2Contig, and view_assembly
have been briefly tested. Other functions may require additional configuration.
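The order of operations described in the notes above can be summarized as a short sketch. The project layout (a project directory holding phd_dir and edit_dir, with phredPhrap started from edit_dir) and the function name assembly_plan are assumptions for illustration; only the PATH entry and the program names come from the notes. The function prints the commands rather than running them.

```shell
#!/bin/sh
# Directory quoted in the status notes above.
PATH="/home/hardies/apps/genome:$PATH"
export PATH

# assembly_plan PROJECT [longreads]
# Prints the commands in the order described above instead of running them.
# Assumption: PROJECT contains phd_dir and edit_dir subdirectories.
assembly_plan() {
    project=$1
    suffix=${2:+_longreads}     # any second argument selects the long-read wrapper
    echo "(cd $project/phd_dir && determineReadTypes.perl)"
    echo "(cd $project/edit_dir && phredPhrap$suffix)"
}

assembly_plan myproject
```

Calling assembly_plan myproject longreads prints phredPhrap_longreads in the second step, which is the variant needed when a 454 contig is represented as a single long phd file.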
-
7/4/2006
-
The general system copy in /share/apps/genome has been withdrawn from service
pending reconciliation of its organization with the new system backup strategy.
It is no longer operable.
-
On small projects, the view assembly button will usually complain that all
contigs have been excluded by user-selectable criteria. Dismiss that window,
look for another window behind the main consed window, and in it decrease
the number of reads and length of sequence required for view assembly to
include the contig.
-
Polyphred and autofinish have never been used at this site, and are probably
not adequately configured.
-
A local help file has been prepared.
-
Phred/Phrap/Consed Home Page.
-
Purpose of the vector.seq file (documentation).
-
Technical notes concerning the maintenance history.