Issues that could be explored in the analysis of the VpV262 sequence:
-
Maturase gene family analysis.
-
The prolyl tRNA.
-
It decodes CCA codons. CCA is not particularly rare in Escherichia
or Vibrio, but in highly expressed E. coli genes CCG is preferred to CCA
by about 10:1. Can the same case be made for the Vibrio cholerae genome?
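-
A minimal sketch of the CCG:CCA comparison, assuming in-frame coding sequences for a chosen set of highly expressed genes (for example, ribosomal protein genes) have already been extracted as strings. The function names and toy sequences are hypothetical, and the ratio they produce is not Vibrio data.

```python
from collections import Counter

PROLINE_CODONS = ("CCA", "CCC", "CCG", "CCT")


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def proline_codon_fractions(genes):
    """Pool proline codons over all genes and return each codon's share of the total."""
    counts = Counter()
    for gene in genes:
        counts.update(c for c in codons(gene) if c in PROLINE_CODONS)
    total = sum(counts.values())
    return {c: counts[c] / total if total else 0.0 for c in PROLINE_CODONS}


# Hypothetical toy genes, not real sequences
highly_expressed = ["ATGCCGCCGCCACCGTAA", "ATGCCGCCTCCGTAA"]
fractions = proline_codon_fractions(highly_expressed)
print(f"CCG:CCA = {fractions['CCG'] / fractions['CCA']:.1f}")  # 5.0 for this toy input
```

For the real question, the ratio from the highly expressed set would be compared with the ratio over the whole genome.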
-
Is CCA codon usage elevated in VpV262 genes? How about in late vs.
early VpV262 genes? Since the tRNA gene appears to be expressed as part of
a late operon, the tRNA wouldn't be available for early genes. However, early
genes presumably aren't as heavily expressed, which will confound the
interpretation of any difference we find.
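-
One way to put a number on an early vs. late difference is a 2x2 table of CCA vs. other proline codons, sketched here with a Pearson chi-square computed by hand. The gene lists are hypothetical placeholders; with counts as small as a phage genome gives, Fisher's exact test would be the safer choice, and neither test removes the expression-level confound noted above.

```python
import math


def proline_counts(genes):
    """Return (CCA, other proline codons) pooled over in-frame coding sequences."""
    cca = other = 0
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            codon = gene[i:i + 3]
            if codon == "CCA":
                cca += 1
            elif codon in ("CCC", "CCG", "CCT"):
                other += 1
    return cca, other


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical toy gene sets standing in for early and late VpV262 genes
early = ["ATGCCACCGCCGTAA"]
late = ["ATGCCACCACCATAA"]
chi2, p = chi_square_2x2(*proline_counts(early), *proline_counts(late))
print(round(chi2, 2), round(p, 3))
```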
-
Could there be some particular gene with a very high content of CCA?
The value of having the prolyl tRNA could in principle come from aiding
the expression of a single gene.
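-
A sketch for spotting a single CCA-heavy gene: rank ORFs by CCA codons per 1000 codons, keeping the raw count alongside because a short ORF can reach a high density by chance. The ORF names and sequences are hypothetical.

```python
def cca_profile(genes):
    """Rank genes by CCA codons per 1000 codons; genes maps name -> in-frame CDS."""
    rows = []
    for name, seq in genes.items():
        seq = seq.upper()
        cods = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        n_cca = cods.count("CCA")
        density = 1000.0 * n_cca / len(cods) if cods else 0.0
        rows.append((name, round(density, 1), n_cca))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical ORF names and toy sequences
genes = {"orfX": "ATGCCACCATAA", "orfY": "ATGCCGCCGTAA"}
for name, density, n_cca in cca_profile(genes):
    print(name, density, n_cca)
```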
-
Does Roseophage SIO1 have a tRNA? NO.
-
There are tRNA genes in some other phages. Are they also for prolyl
tRNA? Is there literature containing ideas about why the tRNA genes
are there (besides the obvious rare codon compensation)?
-
I think I've read ideas about rare codons having a special impact around
the N terminus. I should try to locate that literature. Is there
an unusual concentration of CCA codons around VpV262 gene initiators?
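-
A minimal positional check, assuming each ORF is supplied as an in-frame sequence beginning at its initiator. It compares the CCA rate in the first `n_terminal` codons after the start codon with the rate in the rest of the gene; the default of 25 codons is an arbitrary, hypothetical cutoff that should be varied.

```python
def cca_rate_by_region(genes, n_terminal=25):
    """CCA codons per codon in the first n_terminal codons after the initiator vs. the rest."""
    head_cca = head_total = tail_cca = tail_total = 0
    for seq in genes:
        seq = seq.upper()
        cods = [seq[i:i + 3] for i in range(3, len(seq) - 2, 3)]  # skip the start codon
        head, tail = cods[:n_terminal], cods[n_terminal:]
        head_cca += head.count("CCA")
        head_total += len(head)
        tail_cca += tail.count("CCA")
        tail_total += len(tail)
    head_rate = head_cca / head_total if head_total else 0.0
    tail_rate = tail_cca / tail_total if tail_total else 0.0
    return head_rate, tail_rate


# Hypothetical toy gene; a real run would pass all VpV262 ORFs
print(cca_rate_by_region(["ATG" + "CCA" * 2 + "GCT" * 2 + "TAA"], n_terminal=2))
```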
-
Codon preference.
-
Applying GCG's codonfrequency and correspond functions reveals that there
is some consistent difference in codon usage between early and late genes.
On trying to pursue it further, the analysis falls apart because of a
problem with the metric used by GCG correspond. Can this be sorted
out? See codond.html.
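-
This sketch does not reproduce GCG correspond's metric. It swaps in a simple, transparent alternative: convert each gene set's codon counts to synonymous fractions (each codon's share within its amino acid) and take the mean absolute difference. Normalizing within amino acid families keeps amino acid composition differences from dominating. The gene sets and the choice of distance are assumptions for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ... GGG
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fractions(genes):
    """Each codon's share among codons for the same amino acid (None if that amino acid is absent)."""
    counts = Counter()
    for seq in genes:
        seq = seq.upper()
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in CODE)
    totals = Counter()
    for codon, n in counts.items():
        totals[CODE[codon]] += n
    return {codon: counts[codon] / totals[aa] if totals[aa] else None
            for codon, aa in CODE.items()}


def usage_distance(genes_a, genes_b):
    """Mean absolute difference in synonymous fractions, skipping stops, Met, Trp and absent amino acids."""
    fa, fb = synonymous_fractions(genes_a), synonymous_fractions(genes_b)
    diffs = [abs(fa[c] - fb[c]) for c, aa in CODE.items()
             if aa not in "MW*" and fa[c] is not None and fb[c] is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


print(usage_distance(["CCACCA"], ["CCGCCG"]))  # 0.5 for these toy inputs
```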
-
The ends.
-
We have one end located in a purine tract. I seem to remember that
T7 has its ends in a pyrimidine tract and that this is conserved in
phiYe03-12. Could our end be the complement of the T7 end?
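-
A pyrimidine tract on one strand is a purine tract on the other, so part of the question is whether we are reading the same kind of feature from the opposite strand. The sketch below reverse-complements an end sequence and scores purine content; the "T7-like" sequence is a hypothetical placeholder to be replaced with the real T7 end.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def purine_fraction(seq):
    """Fraction of A and G in a sequence."""
    seq = seq.upper()
    return sum(base in "AG" for base in seq) / len(seq) if seq else 0.0


# Hypothetical pyrimidine-tract end; substitute the real T7 end sequence
t7_like_end = "CTTTCCTCT"
rc = reverse_complement(t7_like_end)
print(rc, purine_fraction(t7_like_end), purine_fraction(rc))
```

The reverse-complemented T7 end could then be aligned against our purine-tract end.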
-
Can we identify this sequence in Roseophage?
-
Is there generally a relationship between the right and left end cut sites
in T7 and phiYe03-12? If so, can we use that to pin down where our
other end is?
-
Promoters.
-
We have reasonably placed promoter candidates to express the early and
late transcription units as if they were single operons. But T7 has
been mapped to death for transcription starts, and there are auxiliary
starts all through the units. Do the mapped T7 starts fall within
regions that GeneMark would call a frame? Thus far we have only considered
promoter candidates where no frame was predicted, because of the
high false positive rate of the prediction algorithm. Are there
certain small frame configurations that we should also consider?
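-
A sketch of the bookkeeping involved, assuming promoter candidates are given as (name, position) and predicted ORFs as (start, end) with start <= end regardless of strand. The hypothetical `min_orf_len` parameter lets predicted frames below a chosen length be treated as noncoding, which is one way to test whether small frames are hiding promoters.

```python
def overlapping_orfs(position, orfs, min_orf_len=0):
    """ORFs (start, end; 1-based inclusive) containing position, ignoring ORFs shorter than min_orf_len bp."""
    return [(s, e) for s, e in orfs
            if e - s + 1 >= min_orf_len and s <= position <= e]


def classify_candidates(candidates, orfs, min_orf_len=0):
    """Map each promoter candidate name to 'intergenic' or the list of ORFs it falls inside."""
    result = {}
    for name, pos in candidates:
        hits = overlapping_orfs(pos, orfs, min_orf_len)
        result[name] = hits if hits else "intergenic"
    return result


# Hypothetical coordinates
orfs = [(100, 400), (500, 800), (820, 880)]
candidates = [("P1", 450), ("P2", 600), ("P3", 850)]
print(classify_candidates(candidates, orfs))
print(classify_candidates(candidates, orfs, min_orf_len=150))
```

Running the same classification on the mapped T7 starts against GeneMark calls for T7 would answer the first question directly.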
-
Is there any sequence similarity in the Roseophage sequence in the area
analogous to our postulated major early and late promoters?
-
Since there is not a phage encoded RNA polymerase, the most precedented
method to hijack the host polymerase would be to encode an alternate
sigma factor.
-
The alternate sigma factor would be an early gene. Can candidates for one
be identified based on reverse searching from known sigma factors?
-
The use of an alternate sigma factor could obscure the ability of the promoter
search program to identify true phage promoters. If we accept the
two promoter candidates upstream of the prolyl tRNA as true late promoters,
is there similarity between them that could help narrow down a
definition of a VpV262-specific promoter sequence?
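-
A simple first look at similarity between two candidate promoters: list the k-mers they share and group the hits by diagonal (offset between positions). Two shared blocks on the same diagonal would suggest a conserved spacing, as in a -35/-10 pair. The sequences and k = 6 are hypothetical choices.

```python
from collections import defaultdict


def shared_kmers(a, b, k=6):
    """All (kmer, position in a, position in b) for k-mers occurring in both sequences (0-based)."""
    a, b = a.upper(), b.upper()
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    hits = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], []):
            hits.append((a[i:i + k], i, j))
    return hits


def group_by_diagonal(hits):
    """Group hits by offset i - j; hits sharing a diagonal keep the same relative spacing."""
    diagonals = defaultdict(list)
    for kmer, i, j in hits:
        diagonals[i - j].append((kmer, i, j))
    return dict(diagonals)


# Hypothetical candidate sequences
hits = shared_kmers("TTGACAATTAAT", "GGTTGACAGG")
print(hits, group_by_diagonal(hits))
```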
-
Transcription terminators.
-
How do we look for them? There are several types. There appears
to be a classic terminator (a hairpin followed by a T tract) in the late
transcription unit, but there is no obvious promoter after it to restart
transcription in that direction. Could this be an antiterminator system?
By comparison with antiterminators in other systems, could we find candidates
for its components?
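-
A brute-force sketch of the hairpin-plus-T-tract search on one strand. It only accepts perfect stems and ignores G-U pairs and free energy, so it is a rough screen; the stem, loop, gap and T-run limits are hypothetical defaults. Run it on the reverse complement as well to cover the other strand.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_terminators(seq, min_stem=6, max_stem=12, min_loop=3, max_loop=9, min_t=4, max_gap=3):
    """Scan one strand for perfect inverted repeats followed (within max_gap bases) by a T run.

    Returns tuples (stem_start, stem_len, loop_len, t_run_start), all 0-based.
    """
    seq = seq.upper()
    hits = []
    for t_start in range(len(seq) - min_t + 1):
        # only consider the first base of each T run of at least min_t
        if seq[t_start:t_start + min_t] != "T" * min_t or (t_start > 0 and seq[t_start - 1] == "T"):
            continue
        for gap in range(max_gap + 1):
            arm3_end = t_start - gap  # exclusive end of the 3' stem arm
            for stem in range(min_stem, max_stem + 1):
                for loop in range(min_loop, max_loop + 1):
                    arm5_start = arm3_end - 2 * stem - loop
                    if arm5_start < 0:
                        continue
                    arm5 = seq[arm5_start:arm5_start + stem]
                    arm3 = seq[arm3_end - stem:arm3_end]
                    if arm5 == revcomp(arm3):
                        hits.append((arm5_start, stem, loop, t_start))
    return hits


# Toy sequence: GCCGCC / GAAA loop / GGCGGC, then a T run
toy = "AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTTTT" + "AAAA"
print(find_terminators(toy))
```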
-
Packaging RNA.
-
Portal protein.
-
The one structural protein with a crystal structure is the portal protein.
A sweep of all of our ORFs with GenThreader didn't find a match to a portal
protein. Is the portal protein structure in the structural database
searched by GenThreader? If not, is it anywhere, and can we get a
threader to search our ORFs for it?
-
If the portal structure was searched by GenThreader, does GenThreader
efficiently identify known portal proteins in other phages?
-
Finally got a hit by doing a targeted PSI-BLAST of ORF H, plus its SIO1
homologue, and all T7 genes. This identified T7 gene 8, which matches
in size, predicted secondary structure, and net charge.
-
Could pursue this further by building an HMM and seeing whether it
finds more distant members of the family.
-
Capsid protein.
-
By position and by biased PSI-BLAST, the capsid protein is
orfK, which is a better match to T7 10A than to 10B.
-
Two additional T7 capsid homologues have now been found in genomes:
-
gi|22966005|gb|ZP_00013603.1| hypothetical protein [Rhodospirillum rubrum]
-
gi|26989006|ref|NP_744431.1| minor capsid protein 10 [Pseudomonas putida KT2440]
-
The P60 capsid protein appears to be missing its N terminus.
Two frameshifts extend the homology to a start codon, making the gene the
right length, but there is still no convincing RBS.
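-
A sketch of a Shine-Dalgarno scan upstream of a proposed start codon. The GGAGG core, the 4 to 12 base spacing and the one-mismatch tolerance are common rules of thumb used here as assumptions, not values from this analysis; the toy sequence is hypothetical.

```python
def find_rbs(seq, start, core="GGAGG", min_spacing=4, max_spacing=12, max_mismatch=1):
    """Search upstream of a start codon (0-based index `start`) for a Shine-Dalgarno-like core.

    spacing = number of bases between the end of the core and the start codon.
    Returns (core_start, matched_text, spacing, mismatches) tuples.
    """
    seq = seq.upper()
    hits = []
    for spacing in range(min_spacing, max_spacing + 1):
        end = start - spacing
        begin = end - len(core)
        if begin < 0:
            continue
        window = seq[begin:end]
        mismatches = sum(x != y for x, y in zip(window, core))
        if mismatches <= max_mismatch:
            hits.append((begin, window, spacing, mismatches))
    return hits


toy = "CCCGGAGGTTAAACTATGGCT"  # hypothetical: core at index 3, ATG at index 15
print(find_rbs(toy, start=15))
```

Applied to each start codon reachable by the two frameshifts, this gives a quick way to compare how convincing each candidate RBS is.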
-
Information now incorporated in orfK section of comparative
genomics page.
-
DNA polymerase.
-
GenThreader picked up a second DNA polymerase with high confidence.
Is that really true? NO: the wrong sequence was submitted as p55.1.
Repeat GenThreader for p55.1.
-
The DNA polymerase is a pol A (cellular type), not the typical phage type.
On a preliminary scan, I see no other examples in the PSI-BLAST output.
Check this through Pfam. On review: there are both type A and type B
polymerases in phages. The T7 polymerase is also A type, although its
exo domain looks more B type.
-
If the VpV262 DNA polymerase is put into a tree with the closest bacterial
polymerases, we may be able to get a sense of how much faster phage genes
diverge than cellular genes. No: it's very distant, and roughly equidistant
from clusters of different genomic polymerase genes. It's only a little
closer to the genomic polymerases than the T7 polymerase is, so it could
actually belong to a T7 clade if the phage clade has a higher divergence
rate.
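-
The "roughly equidistant" observation can be checked numerically from a multiple alignment. This sketch computes gap-excluded p-distances with a simple Poisson correction, -ln(1 - p); the correction is a swapped-in choice, and a real tree would use a phylogenetics package and a proper substitution model. Sequence names and fragments are hypothetical placeholders.

```python
import math


def p_distance(a, b):
    """Proportion of differing residues over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected distance -ln(1 - p); infinite once p reaches 1, nan if p is nan."""
    if math.isnan(p):
        return p
    return -math.log(1.0 - p) if p < 1.0 else float("inf")


def distances_from(query, aln):
    """Corrected distances from one named sequence to every other sequence in a gapped alignment."""
    return {name: poisson_distance(p_distance(aln[query], seq))
            for name, seq in aln.items() if name != query}


# Hypothetical aligned fragments; names are placeholders, not real polymerase sequences
aln = {"VpV262_pol": "ACD-F", "genomic_1": "ACEKF", "genomic_2": "ACDKW"}
print(distances_from("VpV262_pol", aln))
```

A narrow spread of distances from the VpV262 polymerase to the genomic polymerases, compared with the spread for the T7 polymerase, would quantify the statement above.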
-
Is there really no RNA polymerase gene?
-
It would have been hard to have missed one.
-
What does this do to the idea that it is a T7-like phage?
-
Can we find a sigma factor or something similar to direct transcription
to the phage promoters? A targeted PSI-BLAST against a database of all
proteins called sigma factors plus all VpV262 genes, keyed with each sigma
factor in turn, fails to find a VpV262 gene.
-
Origin of replication.
-
How do we look for that? From exploring T7, it seems that phage promoters
leading into A-rich regions would be good candidates. Is that typical of
other phage systems? That would give a potential purpose to some
promoter candidates we have noticed in potential noncoding regions but
oriented opposite to the ORFs in the region. We need a map displaying
the positions of all promoter candidates and A-rich regions.
Map (orimap.html).
The T7 primase prefers to prime its Okazaki fragments on 3' CTGGG 5'.
Mapping those sequences suggests that the ori is somewhere
between 10.5 and 21.9 kb. This is explored somewhat in primesite.html,
but needs more work.
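-
A sketch of the two map tracks discussed above: sliding-window A content and primase-site positions on both strands. 3' CTGGG 5' written 5' to 3' is GGGTC. The cumulative site skew is borrowed by analogy from GC-skew analysis and assumes that priming sites are enriched on one strand on each side of the ori; that assumption is untested here. Window sizes and thresholds are hypothetical. A-richness is counted on the top strand only, so T-rich windows mark A-rich regions on the bottom strand.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def a_rich_windows(seq, window=100, step=25, threshold=0.4):
    """1-based (start, end, A fraction) for windows whose top-strand A content reaches threshold."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        frac = seq[start:start + window].count("A") / window
        if frac >= threshold:
            hits.append((start + 1, start + window, frac))
    return hits


def primase_sites(seq, site="GGGTC"):
    """1-based positions of the site on the top strand and of its reverse complement
    (the same site on the bottom strand), allowing overlapping matches."""
    seq = seq.upper()
    rc = site[::-1].translate(COMPLEMENT)
    top = [m.start() + 1 for m in re.finditer(f"(?={site})", seq)]
    bottom = [m.start() + 1 for m in re.finditer(f"(?={rc})", seq)]
    return top, bottom


def cumulative_skew(top, bottom):
    """Running (top - bottom) site count along the genome; a turning point may hint at the ori."""
    events = sorted([(p, 1) for p in top] + [(p, -1) for p in bottom])
    total, curve = 0, []
    for pos, delta in events:
        total += delta
        curve.append((pos, total))
    return curve


# Toy inputs only
print(a_rich_windows("A" * 100 + "C" * 100, window=100, step=50, threshold=0.5))
top, bottom = primase_sites("GGGTCAAAAAGACCC")
print(top, bottom, cumulative_skew(top, bottom))
```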
-
ORF A.
-
ORF A appears to be a late insertion, in reverse orientation, at the very
beginning of the late transcription unit, with its own promoter. So far there
doesn't seem to be another promoter to restart the late transcription unit,
so ORF A may be included as antisense RNA in the major late transcript.
There are possibilities for antisense regulation here. Perhaps ORF A
is expressed early and then shut down by antisense regulation when the
late promoters come on. Are there precedents for this kind of arrangement
in other phage systems?
-
It kind of looks like a Hendrix moron.
-
Partition genes.
-
Orf41 appears to be a parB gene. It aligns with the parA interaction
domain as well as the DNA-binding domain. Could this indicate
a plasmid prophage cassette? See partition.html.
-
Endolysin.
-
Most phages have a holin and an endolysin. ORF B is the candidate
for the holin. Nothing really stands out in the BLAST searches, although
segment analysis suggests it should be among ORFs C, D and E. Can
we find a better kind of search for it?
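-
Holins are small membrane proteins with one or more transmembrane segments, so a hydropathy scan is a cheap check on ORF B and on ORFs C, D and E. This sketch uses the Kyte-Doolittle scale with the commonly quoted 19-residue window and 1.6 threshold for flagging membrane-spanning segments; it is a screen, not a topology predictor, and the test sequence is a toy.

```python
# Kyte-Doolittle hydropathy values
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy(protein, window=19):
    """Mean Kyte-Doolittle score for each window along the protein (unknown residues score 0)."""
    protein = protein.upper()
    return [sum(KD.get(a, 0.0) for a in protein[i:i + window]) / window
            for i in range(len(protein) - window + 1)]


def tm_segments(protein, window=19, threshold=1.6):
    """Merge windows scoring at or above threshold into 1-based (start, end) spans.

    Span edges are approximate because each window extends past the hydrophobic core.
    """
    segments = []
    for i, score in enumerate(hydropathy(protein, window)):
        if score >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


print(tm_segments("K" * 10 + "L" * 20 + "K" * 10))  # [(6, 35)] for this toy sequence
```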
-
T7 gene 18.5 (actually an endopeptidase) is labelled as a homologue of the
lambda Rz lysis protein. PSI-BLAST keyed from T7 gene 18.5 doesn't
go very far, but PSI-BLAST keyed from lambda Rz builds a big family
that includes T7 18.5. However, a targeted PSI-BLAST search of that
specialized database plus the VpV and SIO1 genes, keyed with lambda Rz,
doesn't hit a VpV or SIO1 gene. See endop.html.
-
I think the lambda Rz genes are pretty consistently 142-143 residues long.
Can we prefer ORF C, D or E by length and secondary structure?