NOTE: as of 7/12/2006, the local NetBlast implementation was out of service. If you wish to use this program, contact Dr. Dememler.
Use these for the following purposes:
The same nucleotide and protein databases offered at NCBI are available at UTHSCSA, and they are updated periodically directly from NCBI. The databases for RPS-Blast are NCBI's remakes of Pfam, Smart, COGS, and KOGS, plus an option to search All (which includes a few more protein families than the union of the others).
Comparison to other methods:
Note: These programs are constantly updated from NCBI. Sometimes there are small changes to the command syntax. If the suggested syntax does not work, try typing the command followed by a <space> <dash> to see a synopsis of the syntax for the currently installed version.
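The synopsis check described above can also be scripted. The following is only a minimal sketch: the helper names `synopsis_command` and `show_synopsis` are hypothetical, and it assumes the installed program prints its synopsis and exits when given a lone dash, as the note describes.

```python
import shutil
import subprocess


def synopsis_command(program):
    """Return the argument list for 'program -' as described in the note."""
    return [program, "-"]


def show_synopsis(program):
    """Return the synopsis text, or None if the program is not on the PATH."""
    if shutil.which(program) is None:
        return None
    completed = subprocess.run(
        synopsis_command(program),
        stdin=subprocess.DEVNULL,  # never wait for keyboard input
        capture_output=True,
        text=True,
        timeout=30,                # assumption: a synopsis prints quickly
    )
    # Tools differ on whether usage text goes to stdout or stderr.
    return completed.stdout + completed.stderr


if __name__ == "__main__":
    text = show_synopsis("blastall")
    print(text if text is not None else "blastall was not found on the PATH")
```

Any program name from the table below could be passed in place of `blastall`.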
| Program | Documentation file (NOTE: the documentation described below is now somewhat dated, but the files can still be read in /home/hardies/oldblast. Newer documentation is at this site.) | Summary of functions |
| blastall | README.bls | Run blastp, blastn, blastx, tblastn, or tblastx searches.
Permits
|
| blastall | README.bls | Run a psitblastn search. The key is a position-specific matrix generated by blastpgp; it is searched against a nucleotide database translated in all six frames. |
| fastacmd | README.bls |
|
| blastpgp | README.bls | Conduct Psi-Blast search in non-interactive mode
Permits:
A requirement to match a regular expression specifying a sequence motif is added to the first round of a Psi-Blast search. Note: command-line Psi-Blast requires you to specify the regular expression in a file of a particular format (see documentation), whereas NetBlast lets you just put the regular expression in a box. The syntax of regular expressions for Blast is described at http://www.ncbi.nlm.nih.gov/blast/html/PHIsyntax.html |
| seedtop | README.bls | Searches a database for matches to a regular expression specifying a sequence motif. Can also search a library of patterns against a sequence; you would have to obtain the library (from the Prosite database, for example). NetBlast only lets you do this in conjunction with a Blast search; command-line Blast lets you do it separately. |
| bl2seq | README.bls | Blast two sequences against each other. Can be done in blastp, blastn, blastx, tblastx, or tblastn modes. |
| formatdb | README.formatdb | Creates a user-defined, Blast-searchable database from a multifasta file.
To remove the redundant entries that cause formatdb to fail on a list of sequences retrieved by Entrez:
|
| megablast | README.mbl | Speeds up a blastn search between two very long nucleotide sequences at the cost of assuming near identity. Mainly used for overlapping clones. |
| blastclust | README.bcl | Organizes a database into sets of homologous sequences |
| rpsblast | README.rps | Searches a sequence against a library of protein family models (mainly derived from the Pfam and Smart databases).
Databases are in /ncbi/rpsblast, and the allowable names are Pfam, Smart, Cog, Kog, and All. The case must be matched. e.g.
Permits:
|
| impala | README.imp | Searches a protein family model (derived from blastpgp) against a database of sequences. |
| copymat, makemat | README.rps | Programs used to convert PSSMs generated by blastpgp to protein family models used by rpsblast and impala. |
| fmerge | Merges two databases (as multifasta files) with removal of redundant gi's.
It is a two-step process that adds the fasta entries from update_file to the multifasta file oldlib:
|
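The formatdb entry in the table above mentions removing redundancies from a list of sequences retrieved by Entrez before formatting. The command originally used for that step is not reproduced here, so the following is only a sketch of one possible approach, not the site's method. It assumes that "redundant" means two records sharing the same identifier (the first whitespace-delimited token of the definition line); the function name `dedupe_multifasta` is hypothetical.

```python
def dedupe_multifasta(text):
    """Keep only the first record for each FASTA identifier."""
    seen = set()
    kept = []
    keep_current = False  # lines before the first '>' header are dropped
    for line in text.splitlines():
        if line.startswith(">"):
            # Assumption: the identifier is the first token of the defline.
            fields = line[1:].split()
            ident = fields[0] if fields else ""
            keep_current = ident not in seen
            seen.add(ident)
        if keep_current:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up records for illustration only.
sample = (
    ">gi|1|ref|A| first\nACGT\n"
    ">gi|2|ref|B| second\nGGCC\n"
    ">gi|1|ref|A| first\nACGT\n"
)
result = dedupe_multifasta(sample)
print(result, end="")
```

The cleaned text can then be written to a file and passed to formatdb as usual.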
To retrieve specific sequences from the daily updated databases, use fastacmd instead of the GCG fetch function: fastacmd -s <accession number> -d $BLAST_DB/<database name> -o <output file>. If the database is protein nr, also include -pT in the parameter list to avoid confusion with the nucleotide nr database. The retrieved file will be in fasta format; use the GCG command fromfasta to convert it to GCG format, if desired.
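For scripted retrieval, the fastacmd command line above can be assembled programmatically. This is a minimal sketch that only builds the argument list and does not run anything. The helper name `build_fastacmd_args`, the accession `P12345`, and the directory `/data/blast` are placeholders for illustration, and the protein option is passed as two arguments (`-p T`) on the assumption that fastacmd treats this the same as the `-pT` form given in the text.

```python
import os
import shlex


def build_fastacmd_args(accession, database, output_file,
                        protein=False, blast_db_dir=None):
    """Return the fastacmd argument list described in the text."""
    # Fall back to the BLAST_DB environment variable used in the text.
    if blast_db_dir is None:
        blast_db_dir = os.environ.get("BLAST_DB", "")
    db_path = os.path.join(blast_db_dir, database) if blast_db_dir else database
    args = ["fastacmd", "-s", accession, "-d", db_path, "-o", output_file]
    if protein:
        # Needed for protein nr so it is not confused with nucleotide nr.
        args += ["-p", "T"]
    return args


# Placeholder values; substitute a real accession and database directory.
example = build_fastacmd_args("P12345", "nr", "P12345.fasta",
                              protein=True, blast_db_dir="/data/blast")
print(shlex.join(example))
```

On a machine where fastacmd is installed, the returned list could be handed to `subprocess.run(args, check=True)` to perform the retrieval.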
We do not at the moment have a way to make the daily updated databases
available to non-blast GCG programs like FastA. We recommend the
Blast suite in place of those programs in any case.