Databases available for BLAST searching at UTHSCSA

The Blast databases are updated nightly from NCBI.  Divisions are defined below.
With NetBlast, one can search only named divisions, although some of those are subsets of others or are unions of other divisions.  With the command line BLAST programs, one can search the union of multiple divisions with the syntax -d "div1 div2 ..."
Last updated 11/30/2004 - Steve Hardies
Changes may occur in the arrangement of the divisions based on pending changes at NCBI.
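
The union syntax described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the legacy NCBI blastall driver is the command line program in use, and the query and output file names are hypothetical. The script prints the command instead of running it, so it can be checked before use.

```shell
# Assumption: "blastall" (legacy NCBI command line BLAST) is the program in use,
# and the named divisions are installed locally.
DIVISIONS="swissprot pdbaa"            # union of two peptide divisions
QUERY="query.fasta"                    # hypothetical protein query file
OUT="query_vs_swissprot_pdbaa.txt"     # hypothetical output file

# Build the command as text; the quotes around the division list are what
# make BLAST treat the two names as a single -d argument.
CMD="blastall -p blastp -d \"$DIVISIONS\" -i $QUERY -o $OUT"
echo "$CMD"
```

Running the printed line at the shell prompt would search both divisions in one pass, rather than requiring two separate searches.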

Peptide Sequence Databases


nr
Described by NCBI as "All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF".  Protein nr contains essentially all available protein entries.  The same sequence may be present with different gi numbers as a GenBank entry, an EMBL entry, a SwissProt entry, etc.  The "nonredundant" aspect of the organization is that the actual sequence for redundant entries is represented only once, and hence searched only once.  If that sequence is matched in a BLAST search, links to all the entries corresponding to it are given.
nr.redundant
Same as nr, but one entry per sequence even if the sequence is identical to another entry.  It is not obvious that there would be any reason to burn the extra search time to search this library.  Its main utility may be for use with fastacmd to retrieve sequences with clean definition lines.
month.aa [no longer available; NCBI has dropped support for the month databases.  One can limit a search to entries from any time interval with the "limit by Entrez query" option available in NetBlast and at NCBI, but not with command line BLAST.]
A rolling 30-day lookback at new sequences released into nr.
swissprot
The last major release of the SWISS-PROT protein sequence database (no daily updates).
yeast.aa
A single curated set (NCBI's RefSeq set) of Yeast (Saccharomyces cerevisiae) protein sequences.  Compared to searching nr with "Saccharomyces cerevisiae"[orgn] in the Entrez query field, searching yeast.aa is faster, but will only show links to RefSeq entries.  The Entrez-limited search of nr may find additional yeast protein sequences that RefSeq curators chose not to include.
ecoli.aa
A single curated set (NCBI's RefSeq set) of E. coli K12 genes.  See yeast.aa for implications relative to limiting by Entrez query.
drosoph.aa mito.aa
pdbaa
Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank.
igSeqProt
Kabat's database of sequences of immunological interest
alu.a
Translations of select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences.  The database is available by anonymous FTP from ncbi.nlm.nih.gov (under the /pub/jmc/alu directory).  See "Alu alert" by Claverie and Makalowski, Nature vol. 371, page 752 (1994).
pataa
Sequences submitted in support of patent applications.
env_nr
Translations from env.nt

env_nr.redundant
Same as env_nr, but one sequence per definition line.
 
env.nt
DNA sequences obtained directly from the environment (i.e., from all organisms mixed together).

month.nt, month.htgs, month.est_human, month.est_mouse, month.est_other, month.gss
[no longer available; NCBI has dropped support for the month databases.  One can limit a search to entries from any time period with the "limit by Entrez query" option available in NetBlast and at the NCBI site, but not with command line BLAST.]
A rolling 30-day lookback on sequences entered into the respective divisions.


Last updated 3/3/2005 - Steve Hardies