The GCG package of sequence analysis programs.
GCG refers to the Wisconsin
Package Version 10.3, Genetics Computer Group (GCG), Madison, Wisc. GCG
is implemented under Linux on bioinf. This is a mostly outdated package.
One should not search the internal GCG sequence databases because they
are years out of date. The seqlab graphics interface is no longer
implemented at UTHSCSA. However, many of the command line analytical
utilities are still useful, particularly for those who are accustomed to
using this package. Both CPU time and disk space are free of
charge to university researchers and students. Apply
for a bioinf account here.
Current
program status.
User interface.
Our license for the web browser
interface SeqWeb has been discontinued. Hence, GCG must be accessed from the
Linux command line. See linux
and vi help links. To access your bioinf account, you will need
a client program on your computer that makes secure shell (ssh) and
secure copy (scp) connections. See help
on obtaining and using ssh.
Many of the GCG routines have an option to produce .png graphics files
that you can download and display on your computer. Before running
the program producing the graphics file, run the command png.
This is an alternative to running the same program under seqlab,
which would automatically export the file as an X-windows graphic.
Starting GCG.
On request, the bioinformatics center will set up your account so that
GCG starts when you log in. Alternatively:
-
Place a file in your home directory named gcg.com with the following contents
source /share/apps/gcg/gcgstartup.ksh
gcg
set +o noglob
export LD_ASSUME_KERNEL=2.4.19
-
Thereafter the command: source ~/gcg.com will start GCG.
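As a minimal sketch, the startup file described above could be created from the command line rather than in an editor. The function name write_gcg_com and its optional path argument are illustrative and not part of GCG; the four lines written are exactly those listed above.

```shell
# Write the GCG startup file with the contents listed above.
# write_gcg_com and its optional path argument are hypothetical helpers,
# not GCG commands. With no argument the file goes to ~/gcg.com.
write_gcg_com() {
    target="${1:-$HOME/gcg.com}"
    cat > "$target" <<'EOF'
source /share/apps/gcg/gcgstartup.ksh
gcg
set +o noglob
export LD_ASSUME_KERNEL=2.4.19
EOF
}

# Usage (run once, then start GCG in any later session):
#   write_gcg_com
#   source ~/gcg.com
```

The quoted heredoc delimiter keeps the shell from expanding anything inside the file while it is being written.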
Documentation.
Help for GCG can be obtained by typing genmanual or genhelp
at the command line. Help on individual functions can be obtained
by typing genhelp <function name> from the command line.
These help files are accessed by the lynx non-graphic browser by default,
so you have to work through them with arrow keys and the tab and space
keys. There is a way to redirect the system to use a better browser.
Typical method of use.
-
First get GCG running. It will announce itself with a sign-on screen
that lists numbers of databases. GCG commands are given by typing
at the ordinary linux command prompt. Ordinary linux commands are
still available.
-
Get the sequence of interest. You will usually put it into your bioinf
directory either by scp from your local machine, or by accessing a graphics
desktop on bioinf by a graphics interface (such as VNC viewer) and using
the Netscape browser to download the sequence directly from GenBank.
Get the sequence either in GenBank (a.k.a. GenPept) or fasta format.
Do not download it as an html file. Fasta format has a definition
line starting with >, followed by lines with nothing but sequence characters.
-
Convert the sequence to GCG's internal format. For fasta files, use
the GCG command fromfasta. For GenBank files use the GCG command
fromgenbank.
Beware that the output file will be named after the sequence identifier
found inside your file, not after the input file name. Output files
will be named with a .seq extension. All other GCG commands will
use the .seq files for input. The above sequence of commands handles
either protein or nucleic acid files.
-
You can edit or complement a file using the GCG sequence editor named seqed.
Some commands have to be ended with a <control d>. For example
you type a <control d> to stop editing the header in seqed. In
seqed, you type a : to get a command line. Issue the command help
to see what other commands are available.
-
To see what programs are available type genmanual from the linux
command prompt, and explore.
-
Most functions will lead you through interactively if you just type the
function name. There is usually more functionality available than
meets the eye, so use genhelp if you want to find a way to modify
the way the program works.
-
If you want graphics output, you should first give the command png.
-
Sequences can be put back to fasta format with the command tofasta.
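Before converting a downloaded file with fromfasta, it can help to confirm that the file really is fasta as described in the steps above, and not a saved html page. The following is a rough sketch only: the function name looks_like_fasta is made up for illustration, and the set of accepted sequence characters (letters, "*" and "-") is an assumption, not something GCG defines.

```shell
# Succeed if the file looks like a single fasta record: a first line
# starting with ">" followed only by lines of sequence characters.
# looks_like_fasta is a hypothetical helper; the allowed character set
# (letters, "*", "-") is an assumption.
looks_like_fasta() {
    awk '
        { sub(/\r$/, "") }                    # tolerate DOS line endings
        NR == 1 { if ($0 !~ /^>/) bad = 1; next }
        /^[[:space:]]*$/ { next }             # tolerate blank lines
        $0 !~ /^[A-Za-z*-]+$/ { bad = 1 }     # anything else is not sequence
        END { if (NR == 0 || bad) exit 1 }
    ' "$1"
}

# Usage:
#   looks_like_fasta myseq.fasta && echo "ok to run fromfasta"
```

A file that passes can then be converted with fromfasta. A saved html page fails the first-line check, which is the mistake the steps above warn against.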
Functions available in GCG.
The GCG software package is somewhat dated now. More powerful alternatives
to some of its functions are indicated below where available in the UTHSCSA
bioinformatics system.
-
General utilities to edit, complement, translate sequence, and to map restriction
sites or peptidase cleavage sites.
-
Similar functions exist in almost all packages and at many remote sites on
the web.
-
Facilities to retrieve sequences from GenBank, and to reformat them from
a variety of formats.
-
Comparable to using NCBI Entrez with a web browser, except that sequences are
automatically imported in GCG format.
-
Automated and manual alignment programs.
-
The automated alignment program, pileup, is less powerful than ClustalW,
which can be run directly, or out of the SeaView editor.
-
The SeqLab alignment editor is similar to SeaView, but SeaView has integrated
phylogenetic analysis as well as integrated ClustalW.
-
The manual alignment program, lineup, is much like MSE, but MSE can handle
more and longer sequences.
-
Blast, Fasta, and other search programs for databases.
-
The NCBI programs at NCBI and installed locally in the Blast suite are
more advanced. The internal GCG databases fall out of date and should
not be used. The GCG local blast and psiblast programs can be redirected
to search the daily updated local databases by the command line parameter
-INfile2=$BLAST_DB/<database name>
-
Profile and motif construction and search facilities.
-
HMMER itself is directly installed on bioinf.
-
A more advanced package named SAM is also installed locally.
-
Implementations of Neighbor Joining for constructing phylogenetic trees.
-
Neighbor joining runs directly out of ClustalW. PAUP, a more powerful
package for phylogenetic analysis, is available on bioinf. Phylip,
which is extremely powerful but not very convenient, is also installed.
-
A variety of algorithms for randomizing, scrambling, or randomly mutating
sequence, for use in statistical testing.
-
Sequence assembly for sequencing projects.
-
However, the capability to link to the original data is absent.
-
The LaserGene Package under site-license is recommended for sequence assembly.
Vector NTI also has a package. For very large projects, phred/phrap/consed
is available.
-
A primer picking program.
-
All of the commercial packages have comparable primer picking software.
We recommend the site-licensed Oligo6.0, because it has the most thorough
documentation.
-
General utilities for counting codons and determining codon preferences.
-
RNA and DNA secondary structure prediction.
-
General facilities for creation of displays.
-
Prediction of protein features (helical wheel, transmembrane, secretion
signals, coiled-coil regions, etc.).
-
Screening of internal repetitive sequence.
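For the blast item in the list above, the -INfile2 parameter can be assembled by a small helper so that a missing BLAST_DB setting is caught before a search is started. This is a sketch under assumptions: the function name gcg_blast_db_param is invented here, and the database name "nr" in the usage line is only an illustration of what might be installed locally.

```shell
# Print the -INfile2 parameter described above for a named database.
# gcg_blast_db_param is a hypothetical helper, not a GCG command.
# BLAST_DB is expected to be set by the local environment.
gcg_blast_db_param() {
    db_name="$1"
    if [ -z "${BLAST_DB:-}" ]; then
        echo "BLAST_DB is not set" >&2
        return 1
    fi
    printf '%s\n' "-INfile2=${BLAST_DB}/${db_name}"
}

# Usage ("nr" is an assumed database name):
#   gcg_blast_db_param nr
```

The printed text is what would be added to the GCG blast or psiblast command line. Use genhelp for the program in question to confirm how it expects its parameters.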
Troubleshooting GCG.
You may find some of the following information helpful in dealing with
some peculiarities of the local GCG implementation.
-
Reinitialization.
-
Command line GCG sometimes loses track of its initialization information
and starts to give "not found" messages for its commands or for genhelp.
-
Place a file in your home directory named gcg.com with the following contents
source /share/apps/gcg/gcgstartup.ksh
gcg
set +o noglob
export LD_ASSUME_KERNEL=2.4.19
-
Thereafter the command: source ~/gcg.com will reinitialize GCG.
-
ls fails to work with * as wild card.
-
The gcg startup file executed at startup for most bioinf users inactivates
wildcard usage.
-
To reverse this, type set +o noglob.
-
As indicated in the item above, you can put this fix in a gcg initialization
file.
-
Note: To prevent directory names from being expanded to directory contents
use the -d option.
-
For example, to find directory names starting with "p", type ls -d
p*, not ls p*.
-
The lookup command doesn't work right.
-
Instead use NetFetch to get the sequence from NCBI Entrez, or use your
browser to get it from NCBI Entrez.
-
The commands referring to PAUP are not implemented.
-
Can't find a particular database.
-
For blast and psiblast searching out of GCG, use -INfile2=$BLAST_DB/<database
name> to refer to the daily updated NCBI libraries. See notes
about GCG use in local blast suite documentation.
-
Other databases such as ProCite, Rebase, and Pfam are no longer being maintained
by distribution from Accelrys. If there are up-to-date copies being
maintained that GCG can access, then there will be a comment listed about
them on the Bioinformatics GCG status page. E-mail hardies@uthscsa.edu
if you need some particular database to be updated.
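The wildcard behaviour described in the ls item above can be reproduced in a scratch directory, which may make the effect of set +o noglob and of ls -d easier to see. The directory and file names below are made up for the demonstration.

```shell
# Demonstrate noglob and ls -d in a throwaway directory.
# All names here (proj, papers, a.seq) are illustrative only.
scratch=$(mktemp -d)
mkdir "$scratch/proj" "$scratch/papers"
touch "$scratch/proj/a.seq"
cd "$scratch"

set -o noglob            # the state the GCG startup file leaves you in
with_noglob=$(echo p*)   # the pattern is passed through unexpanded

set +o noglob            # the fix given in the text
with_glob=$(echo p*)     # the shell expands the pattern again
dirs_only=$(ls -d p*)    # -d lists the directories, not their contents

echo "noglob on:  $with_noglob"
echo "noglob off: $with_glob"
echo "ls -d p*:"
echo "$dirs_only"
```

With noglob on, the pattern reaches the command as literal text, which is why ls reports that nothing matches.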
Last update 5/22/6 - Steve Hardies