Fragments of G phage recovered by random primed PCR.

Summary

This file is under construction.

Index

Sequences

G1

263 bp, nearly all ds sequence; unidentified.

>g1
 TCTTTAGCAGCTAATGGTGAAATGATACAAACTGCTTCTT
 CCAATTTTAAAACTCCTACACAGTTAAATCACATACTGGAATTTGTTAGTTGGATAATCG
 ATTTATTGAAATACATTTTATCGGGAATTGCAGGTTTAATTTGTACATTCGCTGGTTATA
 AGTGGGCTACATCTTTAGAAGGTAATGGCCAAGAAGCTGCTAAGAAAATTCTAAAGAATG
 CATTTGTAGGTGGCGTAATAGTTTGGACAGGTTCTTCAATAGC

>G1 frame +1
SLAANGEMIQTASSNFKTPTQLNHILEFVSWIIDLLKYILSGIAGLICTFAGYKWATSLEGNGQEAAKKI
LKNAFVGGVIVWTGSSI
 

ORFfinder:
    frame=+1

       1 tctttagcagctaatggtgaaatgatacaaactgcttcttccaattttaaaactcctaca
         S  L  A  A  N  G  E  M  I  Q  T  A  S  S  N  F  K  T  P  T
      61 cagttaaatcacatactggaatttgttagttggataatcgatttattgaaatacatttta
         Q  L  N  H  I  L  E  F  V  S  W  I  I  D  L  L  K  Y  I  L
     121 tcgggaattgcaggtttaatttgtacattcgctggttataagtgggctacatctttagaa
         S  G  I  A  G  L  I  C  T  F  A  G  Y  K  W  A  T  S  L  E
     181 ggtaatggccaagaagctgctaagaaaattctaaagaatgcatttgtaggtggcgtaata
         G  N  G  Q  E  A  A  K  K  I  L  K  N  A  F  V  G  G  V  I
     241 gtttggacaggttcttcaatagc 263
         V  W  T  G  S  S  I
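The frame +1 listing above is a codon-by-codon translation with the standard genetic code. Below is a minimal Python sketch of that step. It assumes the standard code (NCBI table 1) and drops an incomplete trailing codon, as the listing does. `translate` is a hypothetical helper, not part of ORFfinder.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order:
# first base varies slowest, third base fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 1) -> str:
    """Translate forward frame 1, 2 or 3; '*' marks stops, 'X' unknown codons."""
    seq = "".join(dna.split()).upper()  # drop spaces/newlines from pasted sequence
    protein = []
    # Step through complete codons only; a trailing 1-2 nt remainder is ignored.
    for i in range(frame - 1, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 40 nt of g1; the 40th base is an incomplete codon and is skipped.
print(translate("TCTTTAGCAGCTAATGGTGAAATGATACAAACTGCTTCTT"))  # SLAANGEMIQTAS
```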

Psi-Blast of frame +1 vs. nr at word size 3 or 2: nothing.
Also nothing by CD search, prodomain search, or TBLASTN.
Best match against "tailed phages" [orgn] at word size 3 or 2:

>gi|7461934|pir||T42293 hypothetical protein - phage SPP1
 gi|2764870|emb|CAA66554.1| (X97918) gene 17.5 [Bacteriophage SPP1]
          Length = 179

 Score = 24.6 bits (52), Expect = 0.53
 Identities = 11/28 (39%), Positives = 15/28 (53%)

Query: 48  CTFAGYKWATSLEGNGQEAAKKILKNAF 75
           C FAGY+   SL+  GQ  +   +  AF
Sbjct: 78  CAFAGYEQVPSLDDIGQALSDMDIDEAF 105

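The Identities and Gaps lines can be recomputed from the two aligned rows. The following is a minimal sketch with hypothetical helper names. It assumes '-' marks a gap and that BLAST truncates the printed percentage. Truncation reproduces 11/28 (39%) here and 49/145 (33%) in the G8 hit. Positives are not computed because they need a substitution matrix such as BLOSUM62.

```python
from typing import Tuple


def alignment_counts(query: str, sbjct: str) -> Tuple[int, int, int]:
    """Return (identities, gap columns, alignment length) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, gaps, len(query)


def percent(part: int, whole: int) -> int:
    # Integer percentage, truncated (assumed to match BLAST's printed value).
    return (100 * part) // whole


# G1 vs. SPP1 gene 17.5 (ungapped).
ident, gaps, length = alignment_counts("CTFAGYKWATSLEGNGQEAAKKILKNAF",
                                       "CAFAGYEQVPSLDDIGQALSDMDIDEAF")
print(f"Identities = {ident}/{length} ({percent(ident, length)}%)")  # 11/28 (39%)
```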

G4

Mostly ds; homologue of bacterial fructose-1,6-bisphosphatase.
Was this assayed for bacterial contamination?

>g4 - COMPLETE
CAAGATCCTGGTAAATTACTACCTGAGGAAGAAGAAGTTA
TGAATAAACTCCTACTTTCTTTCCAACAATCTGAGAAATTGAAACGTCACATGTCGTTCT
TGATGAAGAAAGGGAAACTTTACCTTCCATATAATGGCAACCTTCTTATTCATGGTTGTA
TCCCTGTTGATGAGAATGGAGAAATGGAATCATTCGAAATTGAAGGTGAACAACATAAAG
GACGTGACCTATTAGACGTATTTGAAAGACATGTACGTTATGCATATGACTACAAAGAAA
TCACTGATGATCTATCAACAGATTTAGTGTGGTATTTATGGACTGGTAAATATAGTTCAC
TTTTTGGAAAAAAGAGCGATGACAACACTTTGAACGTTACTTTATTGAAGATTAAGGCTT
CTCATAAAGAACCTAAGAATCCATATTATTATCTACGAGAAGATGTTGATATGATTCGCA
AAATGTTAAAAGACTTCGGTCTCAATCCAGATGAAGGAAGAATTATTAATGGACATACAC
CTGTTAAAGAAATAGATGGCGAAGACCCTATTAAAGCAGATGGTAAAATGTTAGTTATCG
ATGG
 

>G4 open frame +1
QDPGKLLPEEEEVMNKLLLSFQQSEKLKRHMSFLMKKGKLYLPYNGNLLIHGCIPVDENGEMESFEIEGE
QHKGRDLLDVFERHVRYAYDYKEITDDLSTDLVWYLWTGKYSSLFGKKSDDNTLNVTLLKIKASHKEPKN
PYYYLREDVDMIRKMLKDFGLNPDEGRIINGHTPVKEIDGEDPIKADGKMLVID
 
 

One complete open reading frame (+1):

       1 caagatcctggtaaattactacctgaggaagaagaagttatgaataaactcctactttct
         Q  D  P  G  K  L  L  P  E  E  E  E  V  M  N  K  L  L  L  S
      61 ttccaacaatctgagaaattgaaacgtcacatgtcgttcttgatgaagaaagggaaactt
         F  Q  Q  S  E  K  L  K  R  H  M  S  F  L  M  K  K  G  K  L
     121 taccttccatataatggcaaccttcttattcatggttgtatccctgttgatgagaatgga
         Y  L  P  Y  N  G  N  L  L  I  H  G  C  I  P  V  D  E  N  G
     181 gaaatggaatcattcgaaattgaaggtgaacaacataaaggacgtgacctattagacgta
         E  M  E  S  F  E  I  E  G  E  Q  H  K  G  R  D  L  L  D  V
     241 tttgaaagacatgtacgttatgcatatgactacaaagaaatcactgatgatctatcaaca
         F  E  R  H  V  R  Y  A  Y  D  Y  K  E  I  T  D  D  L  S  T
     301 gatttagtgtggtatttatggactggtaaatatagttcactttttggaaaaaagagcgat
         D  L  V  W  Y  L  W  T  G  K  Y  S  S  L  F  G  K  K  S  D
     361 gacaacactttgaacgttactttattgaagattaaggcttctcataaagaacctaagaat
         D  N  T  L  N  V  T  L  L  K  I  K  A  S  H  K  E  P  K  N
     421 ccatattattatctacgagaagatgttgatatgattcgcaaaatgttaaaagacttcggt
         P  Y  Y  Y  L  R  E  D  V  D  M  I  R  K  M  L  K  D  F  G
     481 ctcaatccagatgaaggaagaattattaatggacatacacctgttaaagaaatagatggc
         L  N  P  D  E  G  R  I  I  N  G  H  T  P  V  K  E  I  D  G
     541 gaagaccctattaaagcagatggtaaaatgttagttatcgatgg 584
         E  D  P  I  K  A  D  G  K  M  L  V  I  D

Strong matches to bacterial fructose-1,6-bisphosphatase genes:

Staphylococcus aureus       80% identity
Bacillus subtilis           65% identity
Lactococcus lactis          53% identity
Clostridium acetobutylicum  45% identity

Seems to be nonorthologous to the fructose-1,6-bisphosphatase gene in COGs.

>gi|13702466|dbj|BAB43607.1| (AP003137) fructose-bisphosphatase [Staphylococcus aureus subsp.
           aureus N315]
 gi|14248290|dbj|BAB58678.1| (AP003365) fructose-bisphosphatase [Staphylococcus aureus subsp.
           aureus Mu50]
          Length = 654

 Score =  292 bits (748), Expect = 8e-79
 Identities = 155/193 (80%), Positives = 172/193 (88%), Gaps = 1/193 (0%)

Query: 2   DPGKLLPEEEEVMNKLLLSFQQSEKLKRHMSFLMKKGKLYLPYNGNLLIHGCIPVDENGE 61
           +P +LLPEEEEVMNKLLLSFQQSEKL+RHMSFLM+KG LYLPYNGNLLIHGCIPVDENGE
Sbjct: 375 NPAELLPEEEEVMNKLLLSFQQSEKLRRHMSFLMRKGSLYLPYNGNLLIHGCIPVDENGE 434

Query: 62  MESFEIEGEQHKGRDLLDVFERHVRYAYDYKEITDDLSTDLVWYLWTGKYSSLFGKKSDD 121
           MESFEI+G  + G++LLDVFE HVR ++D KE TDDLSTDLVWYLWTGKYSSLFGK++
Sbjct: 435 MESFEIDGHTYSGQELLDVFEYHVRKSFDEKENTDDLSTDLVWYLWTGKYSSLFGKRA-M 493

Query: 122 NTLNVTLLKIKASHKEPKNPYYYLREDVDMIRKMLKDFGLNPDEGRIINGHTPVKEIDGE 181
            T     +  KASHKE KNPYY+LREDV+M+RKML DFGLNPDEGRIINGHTPVKEI+GE
Sbjct: 494 TTFERYFIADKASHKEEKNPYYHLREDVNMVRKMLSDFGLNPDEGRIINGHTPVKEINGE 553

Query: 182 DPIKADGKMLVID 194
           DPIKADGKMLVID
Sbjct: 554 DPIKADGKMLVID 566


G7

unidentified.

>G7
CCAAATCTTTAATTAAACGTTTAATCGCTCCTGGCATTTT
ACCACGATTTTTTTCATTATTTTCAGCTACTTTAGCAGCTTGAATTGTACGGCCTTCCCA
ATCTTGAGGAGTGTTACCTTTAGTGTCTCCATTTTCATCACCAGCTTGAGCTTTACCCCA
CTCATCATGGCTATCAATTGTCCCACCATTACCATTAGCTGCATTAGCATGAGCTTTTTT
AACGTAATCTTCATCTTTAATTAAGATGTCATAGATTTCTTCTGAGAACATACCTTCAAA
TTTCTTGTCATAAAGTCCACCAACAATCATTTTGAAATGTTTCTCATC

>G7 Frame -1
DEKHFKMIVGGLYDKKFEGMFSEEIYDILIKDEDYVKKAHANAANGNGGTIDSHDEWGKAQAGDENGDTK
GNTPQDWEGRTIQAAKVAENNEKNRGKMPGAIKRLIKDL

In the G phage full library, the best hit (E = 0.17) is to a relatively poorly conserved segment of T4 DNA polymerase (DJBPT4).
 
 

102 aa frame -1 is completely open:

     310 atgattgttggtggactttatgacaagaaatttgaaggtatgttc
         M  I  V  G  G  L  Y  D  K  K  F  E  G  M  F
     265 tcagaagaaatctatgacatcttaattaaagatgaagattacgtt
         S  E  E  I  Y  D  I  L  I  K  D  E  D  Y  V
     220 aaaaaagctcatgctaatgcagctaatggtaatggtgggacaatt
         K  K  A  H  A  N  A  A  N  G  N  G  G  T  I
     175 gatagccatgatgagtggggtaaagctcaagctggtgatgaaaat
         D  S  H  D  E  W  G  K  A  Q  A  G  D  E  N
     130 ggagacactaaaggtaacactcctcaagattgggaaggccgtaca
         G  D  T  K  G  N  T  P  Q  D  W  E  G  R  T
      85 attcaagctgctaaagtagctgaaaataatgaaaaaaatcgtggt
         I  Q  A  A  K  V  A  E  N  N  E  K  N  R  G
      40 aaaatgccaggagcgattaaacgtttaattaaagatttg 2
         K  M  P  G  A  I  K  R  L  I  K  D  L
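The frame -1 listings, numbered downward (here from 310), are read from the reverse complement. This sketch assumes ORFfinder's frame -1 starts at the last base of the forward sequence. That assumption is consistent with the G7 frame -1 protein beginning "DEKHFK...". The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, ignoring whitespace."""
    return "".join(dna.split()).translate(COMPLEMENT)[::-1]


def translate_minus1(dna: str) -> str:
    """Frame -1: translate the reverse complement starting at its first base."""
    rc = reverse_complement(dna).upper()
    return "".join(CODON_TABLE.get(rc[i:i + 3], "X") for i in range(0, len(rc) - 2, 3))


# Last 48 nt of the G7 forward sequence.
print(translate_minus1("TTTCTTGTCATAAAGTCCACCAACAATCATTTTGAAATGTTTCTCATC"))
# DEKHFKMIVGGLYDKK
```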
 
 

This hit to T4 DNA polymerase and other phage polymerases was found in the phage full library. It falls in a relatively unconserved segment, so the poor similarity there is plausible. Monte Carlo 10,100 assigns 92% confidence; Monte Carlo 1,100 gave 86% confidence.
Psi-Blast found it in "tailed phages" [orgn] at E = 0.05. The match could not be extended, so the fragment is classified as unidentified.
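The notes do not record which program or scoring produced the "Monte Carlo 10,100" and "1,100" confidences. One common form of such a test shuffles the query and counts how often the shuffled query scores below the real one. The sketch below assumes that form, with simple ungapped identity scoring. It is an illustration, not the procedure used above, and the function names are hypothetical.

```python
import random


def identity_score(a: str, b: str) -> int:
    """Ungapped score: number of positions with identical residues."""
    return sum(1 for x, y in zip(a, b) if x == y)


def shuffle_confidence(query: str, target: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffled queries that score strictly below the real query.

    Shuffling keeps the query's amino acid composition but destroys its order,
    so a high fraction suggests the similarity depends on residue order.
    """
    rng = random.Random(seed)
    observed = identity_score(query, target)
    residues = list(query)
    below = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if identity_score("".join(residues), target) < observed:
            below += 1
    return below / n_shuffles


# G7 hit to T4 DNA polymerase (13/36 identities, ungapped).
conf = shuffle_confidence("GGLYDKKFEGMFSEEIYDILIKDEDYVKKAHANAAN",
                          "GWMYDKHQEGIIPKEIAKVFFQRKDWKKKMFAEEMN")
print(f"{conf:.0%} of shuffles score lower")
```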

>pir||DJBPT4 DNA-directed DNA polymerase (EC 2.7.7.7) - phage T4
           Length = 898

 Score = 29.0 bits (63), Expect = 0.17
 Identities = 13/36 (36%), Positives = 20/36 (55%)

Query: 10  GGLYDKKFEGMFSEEIYDILIKDEDYVKKAHANAAN 45
           G +YDK  EG+  +EI  +  + +D+ KK  A   N
Sbjct: 457 GWMYDKHQEGIIPKEIAKVFFQRKDWKKKMFAEEMN 492

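The Expect values quoted with each hit depend on the size of the search space as well as on the bit score S'. The relation is E = m·n·2^(-S'), where m and n are the effective query and database lengths. The notes record only bit scores and E-values, so m and n below are placeholders. The formula shows that E scales linearly with database size. This is one reason E-values from the small "tailed phages" subset and from nr are not directly comparable. The function names are hypothetical.

```python
import math


def evalue_from_bits(bit_score: float, m: float, n: float) -> float:
    """E = m * n * 2**(-S'), with m, n the effective query and database lengths."""
    return m * n * 2.0 ** (-bit_score)


def bits_for_evalue(evalue: float, m: float, n: float) -> float:
    """Bit score needed to reach a given E-value in the same search space."""
    return math.log2(m * n / evalue)


# Same bit score, database twice as large: E doubles.
print(evalue_from_bits(20, 1000, 2000) / evalue_from_bits(20, 1000, 1000))  # 2.0
```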

G8


partially ds.
DNA primase

 > g8
 ATTATTGGCGAACCAGGGAGAAGATACTTACAATCAAGGG
 GAATAAATGAAGAAACAATTGAATATTGGTCTATAGGTTATGCACCTATAGGACATAAAA
 AATACACTAAACTTAGAGGTAGGATTACTTTTCCAGTATTTGATAACAATGGAAAAATAG
 TTACAATTAGTGGAAGATCAGTTTTCGATACATTAAAACCAAAATACGATATGTATCCTT
 TCTCTGCAAGAAAAATCTTATTTGGCTTATGGCAAAATAGACACGATATAAGAGATCACA
 ATAGAGCTATTATTACTGAAGGACAAATTGATGTTATTACATCATGGCAAAATGGATTAA
 AAATTGTAACTTCTACATT

>G8 Frame +1
IIGEPGRRYLQSRGINEETIEYWSIGYAPIGHKKYTKLRGRITFPVFDNNGKIVTISGRSVFDTLKPKYD
MYPFSARKILFGLWQNRHDIRDHNRAIITEGQIDVITSWQNGLKIVTST

+1 frame is completely open:

       1 attattggcgaaccagggagaagatacttacaatcaaggggaataaatgaagaaacaatt
         I  I  G  E  P  G  R  R  Y  L  Q  S  R  G  I  N  E  E  T  I
      61 gaatattggtctataggttatgcacctataggacataaaaaatacactaaacttagaggt
         E  Y  W  S  I  G  Y  A  P  I  G  H  K  K  Y  T  K  L  R  G
     121 aggattacttttccagtatttgataacaatggaaaaatagttacaattagtggaagatca
         R  I  T  F  P  V  F  D  N  N  G  K  I  V  T  I  S  G  R  S
     181 gttttcgatacattaaaaccaaaatacgatatgtatcctttctctgcaagaaaaatctta
         V  F  D  T  L  K  P  K  Y  D  M  Y  P  F  S  A  R  K  I  L
     241 tttggcttatggcaaaatagacacgatataagagatcacaatagagctattattactgaa
         F  G  L  W  Q  N  R  H  D  I  R  D  H  N  R  A  I  I  T  E
     301 ggacaaattgatgttattacatcatggcaaaatggattaaaaattgtaacttctacatt 359
         G  Q  I  D  V  I  T  S  W  Q  N  G  L  K  I  V  T  S  T
 

Matches lots of DNA primases. Closest is:

>gi|10173991|dbj|BAB05094.1| (AP001511) DNA primase [Bacillus halodurans]
          Length = 599

 Score = 85.1 bits (209), Expect = 1e-16
 Identities = 49/145 (33%), Positives = 71/145 (48%), Gaps = 35/145 (24%)

Query: 3   GEPGRRYLQSRGINEETIEYWSIGYAP------------------------------IGH 32
           G+ GR YL+ RG  +E IE++ IG+AP
Sbjct: 135 GKAGRTYLERRGFTKEQIEHFQIGFAPPHWDALTNVLAKRDVDLKKAGESGLLVERESDG 194

Query: 33  KKYTKLRGRITFPVFDNNGKIVTISGRSVFDTLKPKYDMYP----FSARKILFGLWQNRH 88
           K+Y + R R+ FP+ +  GKIV   GR++ D  KPKY   P    F   K+L+G +Q R
Sbjct: 195 KRYDRFRNRVIFPIRNGKGKIVAFGGRTLGDD-KPKYLNSPESPIFQKGKLLYGFYQARP 253

Query: 89  DIRDHNRAIITEGQIDVITSWQNGL 113
            IR  N A++ EG +DVI +W+ G+
Sbjct: 254 AIRKENEAVLFEGYVDVIAAWKAGV 278
 

There is a vague similarity to the primase/helicases of T7, T3, T4, and other phages.
There is apparently more than one family of helicases in phage and bacteria, and there is some confusion about paralogues. The big families built around dnaB, recA, and dnaG all seem to be paralogues.

Frick et al. 1998. PNAS 95:7957-62.
Guo et al. 1999. JBC 274:30303.
   T7 gene 4 has a primase 242..271, a linker region 242..271, and a helicase of the dnaB family (and distantly the recA family?) 271..end.
T7 gene 4 is expressed in two forms, 63 kDa and 56 kDa, owing to an internal translation start.
Only the 63 kDa form is active as a primase.
Residues 1..63 (the part missing from the 56 kDa form) form a zinc finger required for template recognition; it is necessary but not sufficient for primase activity.

Aravind L, Leipe DD, Koonin EV. 1998. Toprim -- a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases. NAR 26:4205-13.
  100 aa conserved domain; conserved Glu and DxD motifs.
  I think these are the E at 159 and the DMD at 207 in T7 gene 4.
DnaG-type primases, small primase-like proteins from bacteria and archaea,
bacterial and archaeal nucleases of the OLD family,
bacterial DNA repair proteins of the RecR/M family.
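Candidate DxD positions, such as the DMD at 207 in T7 gene 4, can be listed with a regular expression. This sketch uses a lookahead so that overlapping motifs (e.g. "DxDxD") are all reported. Positions are 1-based to match the residue numbering above. A motif match on its own is weak evidence for a Toprim domain; the alignment context still matters. `find_dxd` is a hypothetical helper.

```python
import re

# Lookahead so overlapping matches (e.g. "DxDxD") are all reported;
# [A-Z] excludes stop symbols ('*') from the middle position.
DXD_MOTIF = re.compile(r"(?=(D[A-Z]D))")


def find_dxd(protein: str) -> list:
    """Return (1-based position, motif) for every DxD in a protein sequence."""
    return [(m.start() + 1, m.group(1)) for m in DXD_MOTIF.finditer(protein.upper())]


print(find_dxd("ADMDQDAD"))  # [(2, 'DMD'), (4, 'DQD'), (6, 'DAD')]
```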

Further searching shows that there are many phage primase families. G8 is closest to one from the mycobacterial phages L5 and D29. In these phages the primase is split into two genes, g57 and g58; g57 overlaps G8 and the Toprim box, and g58 is more proximal.

T7, T3, and phiYeO3-12 form a tight cluster, with SP6 weakly associated.

phi-C31 (Siphoviridae), phi-105, A2, and phi adh form a cluster.

It's unclear if the clustering represents relationships among the phage or the underlying relationships among hosts that they repeatedly acquire dnaG from.


G9r

One strand only; matches the acid-soluble spore protein family.

Matches acid-soluble spore proteins and a TATA box.

>g9r
 TAGCGGATTGAGCAAAAGCAGTCATTTTCTTAGTCATGTT
 ACCACCTACATAACCATGTTGAATAGCTGGCATCCAACGTTTGTCCATGCTATCATAATT
 CTTTAAACCAATTTCTGCTGCTACCGCTTCTTTAAAGGAATTCATTTGTTGACTAATGTT
 TCCAGTTGGTAAATTCCCTTCTTGAACTGCATAAGTGTTACGGTTATTGTTTGACATATT
 GTTTTCCTCCTAAAGGTTTTTTTTATTTTCCGATGTCCACCATCGTAATGTTATTATTAC
 CAGGTTTTATCTTCTTTATTCATTATTTACCTTGATTTTTTATTATGGTTTAAA
 ATTACCA
 TTCATCATTTCCATTTTTTGCTTTATCTATCATTAGTTTAGTAACATGACCATTTT

Last 60 bp are of poor quality.

Matches are in frame -1

>G9r frame -1
MSNNNRNTYAVQEGNLPTGNISQQMNSFKEAVAAE
IGLKNYDSMDKRWMPAIQHGYVGGNMTKKMTAFAQSA

     217 atgtcaaacaataaccgtaacacttatgcagttcaagaagggaat
         M  S  N  N  N  R  N  T  Y  A  V  Q  E  G  N
     172 ttaccaactggaaacattagtcaacaaatgaattcctttaaagaa
         L  P  T  G  N  I  S  Q  Q  M  N  S  F  K  E
     127 gcggtagcagcagaaattggtttaaagaattatgatagcatggac
         A  V  A  A  E  I  G  L  K  N  Y  D  S  M  D
      82 aaacgttggatgccagctattcaacatggttatgtaggtggtaac
         K  R  W  M  P  A  I  Q  H  G  Y  V  G  G  N
      37 atgactaagaaaatgactgcttttgctcaatccgct 2
         M  T  K  K  M  T  A  F  A  Q  S  A

BLASTX against the bacterial subdivision:

>gi|134234|sp|P22066|SAS2_CLOBI SMALL, ACID-SOLUBLE SPORE PROTEIN BETA (SASP) (ASSP)
 gi|483083|pir||B61028 small acid-soluble spore protein beta - Clostridium bifermentans
 gi|226892|prf||1610173B small acid soluble protein beta [Clostridium bifermentans]
          Length = 64

 Score = 50.1 bits (118), Expect = 9e-07
 Identities = 23/46 (50%), Positives = 32/46 (69%)
 Frame = -1

Query: 145 MNSFKEAVAAEIGLKNYDSMDKRWMPAIQHGYVGGNMTKKMTAFAQ 8
           +N  K  +A E+GL NY+S+DK  + A Q+GYVGG MTKK+   A+
Sbjct: 13  LNQMKLEIANELGLSNYESVDKGNLTARQNGYVGGYMTKKLVEMAE 58

>gi|134226|sp|P22065|SAS1_CLOBI SMALL, ACID-SOLUBLE SPORE PROTEIN ALPHA (SASP) (ASSP)
 gi|482685|pir||A61028 small acid-soluble spore protein alpha - Clostridium bifermentans
 gi|226891|prf||1610173A small acid soluble protein alpha [Clostridium bifermentans]
          Length = 70

 Score = 47.8 bits (112), Expect = 4e-06
 Identities = 22/46 (47%), Positives = 30/46 (64%)
 Frame = -1

Query: 145 MNSFKEAVAAEIGLKNYDSMDKRWMPAIQHGYVGGNMTKKMTAFAQ 8
           +   K  +A E+G+ NYD+ DK  M A Q+GYVGG MTKK+   A+
Sbjct: 17  LKQMKLEIANELGISNYDTADKGNMTARQNGYVGGYMTKKLVEMAE 62

There is a family of these in each sporulating bacterial species, and they bind DNA. Some of the B. megaterium genes have been sequenced, and they do not match as well as the Clostridial ones above.


G10r

Single-stranded reading.
309 bp unidentified open frame.
 

>g10R
AATTCCAACCAATCAAGTTTATATACGTTTTCGTAAATCATTTC
TTTACTCTCCTTTGTTAGTTAACATTCTTAATAAATATCTATCATTTTCTTTCCAATAAT
TAATTTTTTCATAAATTTTATTAATAGTTTCTAACTTCTCGTCCAAAGATGAGTTTAGAG
ATAATGAAAAAAAGGGTGATTCATCAAAAAACTTTATTAAAAAATCAATAGATTCCTCTG
AGCTCGATGAGTCTAAATGAAAACATCCATTTTGGCAATTTACAATTACACCAAAATAAT
CTTCCTCAATGATTTTTATTTCTTTTGATTTGCAAATAGGACACCCATCCTTAAACCAGA
AGAGTTCTATTTT
 

 ORFfinder:

 frame=-1 open for 309 bp.

>G10r Frame -1
KIELFWFKDGCPICKSKEIKIIEEDYFGVIVNCQNGCFHLDSSSSEESIDFLIKFFDESPFFSLSLNSSL
DEKLETINKIYEKINYWKENDRYLLRMLTNKGE*RNDLRKRI*T*LVGI
 
 

     357 aaaatagaactcttctggtttaaggatgggtgtcctatttgcaaatcaaaagaaataaaa
         K  I  E  L  F  W  F  K  D  G  C  P  I  C  K  S  K  E  I  K
     297 atcattgaggaagattattttggtgtaattgtaaattgccaaaatggatgttttcattta
         I  I  E  E  D  Y  F  G  V  I  V  N  C  Q  N  G  C  F  H  L
     237 gactcatcgagctcagaggaatctattgattttttaataaagttttttgatgaatcaccc
         D  S  S  S  S  E  E  S  I  D  F  L  I  K  F  F  D  E  S  P
     177 tttttttcattatctctaaactcatctttggacgagaagttagaaactattaataaaatt
         F  F  S  L  S  L  N  S  S  L  D  E  K  L  E  T  I  N  K  I
     117 tatgaaaaaattaattattggaaagaaaatgatagatatttattaagaatgttaactaac
         Y  E  K  I  N  Y  W  K  E  N  D  R  Y  L  L  R  M  L  T  N
      57 aaaggagagtaaagaaatgatttacgaaaacgtatataaacttgattggttggaatt 1
         K  G  E  *  R  N  D  L  R  K  R  I  *  T  *  L  V  G  I
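"Open for 309 bp" corresponds to the 103 residues before the first stop in frame -1 (3 × 103 = 309). This sketch finds the longest stop-free run in a translated frame. It assumes stops are written as '*', as in the listing above. `longest_open_run` is a hypothetical helper.

```python
def longest_open_run(protein: str):
    """Return (0-based start, length in residues) of the longest stop-free run."""
    best_start, best_len = 0, 0
    pos = 0
    for segment in protein.split("*"):
        if len(segment) > best_len:
            best_start, best_len = pos, len(segment)
        pos += len(segment) + 1  # skip past the '*' that ended this segment
    return best_start, best_len


g10r_frame_minus1 = (
    "KIELFWFKDGCPICKSKEIKIIEEDYFGVIVNCQNGCFHLDSSSSEESIDFLIKFFDESPFFSLSLNSSL"
    "DEKLETINKIYEKINYWKENDRYLLRMLTNKGE*RNDLRKRI*T*LVGI"
)
start, n_res = longest_open_run(g10r_frame_minus1)
print(start, n_res, 3 * n_res)  # 0 103 309
```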

No hits for frame -1 in nr, prodomain, or prodomain CG.
Only candidate hit in microbial genomes is:

gnl|OUACGT_714|A.actin_Contig462 Actinobacillus actinomycetemcomitans unfinished fragment of complete
            genome
            Length = 6860
 
 Score = 35.6 bits (80), Expect = 0.067
 Identities = 22/63 (34%), Positives = 37/63 (57%), Gaps = 9/63 (14%)
 Frame = -2

Query: 39   HLDSSSSEESIDFLIKFFDESPFFSLSLNSSLDEKLETINKIY---------EKINYWKE 89
            H DS + E      IKFF E P F++S++ +   KL T  K +         ++IN+W E
Sbjct: 1774 HSDSINPE-----FIKFFQEFPIFNISISINKKFKLFTDEKFFSNQQISAYIKQINHWIE 1610

Query: 90   NDRYLLRMLTNK 101
            N++ + + +  K
Sbjct: 1609 NEKSIEKYIKQK 1574


G10f

264 bp open frame; unidentified.

> g10F
AATAATAGTTATTTTCCAAGGTATAAAGCGATGTGCCGTT
TTGGGAAAATTAAAGAAATATACACTCTCTTTTCTATTGATGCAATATTAGAAAACGGAC
AAGATTGGGTAGATCATCAAATAAAACATGAGTATAAAGATTGGAAGTATTATTCTAACT
TAAGAAAAGAATTACAAAATTTTATTGCACAACCAAAGTTTTGGTTTAAAAGAACAAACT
ACAAAGAAGGATTAACTAATAAATTTGTATGGATTAAAATTTATTAAAAAGGGTGCGTTT
TAAATGTATAATAAAAAAAGAAATAATTTTTTTGTGGCGAGTTACTGTTATGTAGTAATG
AGATTGAAATAAAATTAAAGAATATCAAAAGTTTGTGATTTAATGGAGATTAAAAG

Frame +1 open 264 bp:

>G10f Frame +1
NNSYFPRYKAMCRFGKIKEIYTLFSIDAILENGQDWVDHQIKHEYKDWKYYSNLRKELQNFIAQPKFWFK
RTNYKEGLTNKFVWIKIY*KGCVLNV**KKK*FFCGELLLCSNEIEIKLKNIKSL*FNGD*K

       1 aataatagttattttccaaggtataaagcgatgtgccgttttgggaaaattaaagaaata
         N  N  S  Y  F  P  R  Y  K  A  M  C  R  F  G  K  I  K  E  I
      61 tacactctcttttctattgatgcaatattagaaaacggacaagattgggtagatcatcaa
         Y  T  L  F  S  I  D  A  I  L  E  N  G  Q  D  W  V  D  H  Q
     121 ataaaacatgagtataaagattggaagtattattctaacttaagaaaagaattacaaaat
         I  K  H  E  Y  K  D  W  K  Y  Y  S  N  L  R  K  E  L  Q  N
     181 tttattgcacaaccaaagttttggtttaaaagaacaaactacaaagaaggattaactaat
         F  I  A  Q  P  K  F  W  F  K  R  T  N  Y  K  E  G  L  T  N
     241 aaatttgtatggattaaaatttattaaaaagggtgcgttttaaatgtataataaaaaaag
         K  F  V  W  I  K  I  Y  *  K  G  C  V  L  N  V  *  *  K  K
     301 aaataatttttttgtggcgagttactgttatgtagtaatgagattgaaataaaattaaag
         K  *  F  F  C  G  E  L  L  L  C  S  N  E  I  E  I  K  L  K
     361 aatatcaaaagtttgtgatttaatggagattaaaag 396
         N  I  K  S  L  *  F  N  G  D  *  K
 

Frame +1 has no match in nr, microbial genomes, tailed phages, prodomain, or CG prodomain.
No significant hit in the phage full library.


G11f


Single-stranded readings; matches T4 RNA ligase.

>G11f
AATAGAATTGTTGTAGCTTATAATGATGCAGATTTAAGAC
TAATTGGCGTGAAAGATTTAAAAACGCATCAAGATTTGTCATATGCTGAAGTTATTAAAA
TGGCAAAAGAATTAGGGTTTGCGCATACTGAATTAGAAGATATTACGTTGGAAGAAATAT
TAGAGGAAAGAGAAAAACGTGAAAACTTTGAAGGTTGGGTAGTGCGATTTTCAAATGGTT
TATACATGAAAATTAAATGTAAAGCCTATTTAGATTTGCATGGTGCTCGTTTTGGCTCTT
CAATTAAATCGGTATTTGTACTCTTAAAAGAAGAAAAATGGGACGATTTTATTTCTCTAT
TCCAGAAGAGT
ORFfinder:

frame=+1

>G11f Frame +1
NRIVVAYNDADLRLIGVKDLKTHQDLSYAEVIKMAKELGFAHTELEDITLEEILEEREKRENFEGWVVRF
SNGLYMKIKCKAYLDLHGARFGSSIKSVFVLLKEEKWDDFISLFQKS

       1 aatagaattgttgtagcttataatgatgcagatttaagactaattggcgtgaaagattta
         N  R  I  V  V  A  Y  N  D  A  D  L  R  L  I  G  V  K  D  L
      61 aaaacgcatcaagatttgtcatatgctgaagttattaaaatggcaaaagaattagggttt
         K  T  H  Q  D  L  S  Y  A  E  V  I  K  M  A  K  E  L  G  F
     121 gcgcatactgaattagaagatattacgttggaagaaatattagaggaaagagaaaaacgt
         A  H  T  E  L  E  D  I  T  L  E  E  I  L  E  E  R  E  K  R
     181 gaaaactttgaaggttgggtagtgcgattttcaaatggtttatacatgaaaattaaatgt
         E  N  F  E  G  W  V  V  R  F  S  N  G  L  Y  M  K  I  K  C
     241 aaagcctatttagatttgcatggtgctcgttttggctcttcaattaaatcggtatttgta
         K  A  Y  L  D  L  H  G  A  R  F  G  S  S  I  K  S  V  F  V
     301 ctcttaaaagaagaaaaatgggacgattttatttctctattccagaagagt 351
         L  L  K  E  E  K  W  D  D  F  I  S  L  F  Q  K  S

After position 300, the frame becomes ambiguous.

Psi-Blast vs. "tailed phages" [orgn]:

>gi|9632683|ref|NP_049839.1| RNA ligase [Enterobacteria phage T4]
 gi|133093|sp|P00971|RLIG_BPT4 RNA LIGASE
 gi|68670|pir||LQBPR4 RNA ligase (ATP) (EC 6.5.1.3) - phage T4
 gi|15367|emb|CAA25107.1| (X00365) RNA ligase [Enterobacteria phage T4]
 gi|5354307|gb|AAD42514.1|AF158101_101 (AF158101) RNA ligase [Enterobacteria phage T4]
          Length = 374

 Score = 35.0 bits (79), Expect = 9e-04
 Identities = 24/90 (26%), Positives = 41/90 (44%), Gaps = 1/90 (1%)

Query: 1   NRIVVAYNDADLRLIGVKDLKTHQDLSYAEVIKMAKELGFAHXXXXXXXXXXXXXXXXXX 60
           NRIV+AY +  + L+ V++ +T + +SY ++ K A    +
Sbjct: 165 NRIVLAYQEMKIILLNVRENETGEYISYDDIYKDATLRPYL-VERYEIDSPKWIEEAKNA 223

Query: 61  XNFEGWVVRFSNGLYMKIKCKAYLDLHGAR 90
            N EG+V    +G + KIK   Y+ LH  +
Sbjct: 224 ENIEGYVAVMKDGSHFKIKSDWYVSLHSTK 253

The remaining 30 aa didn’t significantly match anything else by BLASTP against nr or TBLASTX against the microbial database. In GeneDoc they gave a questionable further match to T4 RNA ligase, with a gap of 2 added to the ligase.

The second iteration brings out a weak match to a DNA ligase.

>gi|9634001|ref|NP_052075.1| DNA ligase [Bacteriophage phiYeO3-12]
 gi|6598992|emb|CAB63596.1| (AJ251805) DNA ligase [Bacteriophage phiYeO3-12]
          Length = 346

 Score = 27.2 bits (59), Expect = 0.18
 Identities = 11/41 (26%), Positives = 20/41 (47%), Gaps = 1/41 (2%)

Query: 44  ELEDITLEEILEEREKRENFEGWVVRFSNGLYMK-IKCKAY 83
           E+ D+     L E ++ E  EG +V+   G+Y +  K   +
Sbjct: 203 EVYDMDSLSELYEAKRAEGHEGLIVKDPQGIYKRGKKSGWW 243


G12


628 bp fragment with two open frames:

G12a:
The latter part of the insert appears to be a 293-bp unidentified open reading frame.
Nothing showed up in microbial genomes, nr, or prodomain. Longer segments of
other frames were also examined. There was a poor-quality match to T4 NDP reductase in "tailed phages" [orgn].

G12b:
Positions 1-157 are the N-terminal part of a gene annotated as similar to pyrazinamidase/nicotinamidase in the B. subtilis genome (frame -1).

>G12
TAGCGATTACAACTGTCCCACCATTACCTAAAAATTCATT
AGCTTGATCAATAATGTATGGAACAATTTCTTGTGCTGGTTTACCAGCTGTTAAACTTCC
GTTATCTGCCACAAAATCATTACTCATATCAACAATAATTAATGCTTCATTCTTCATTTT
AAAATCCCCTTTTTATTTTTTTATTTTTACCACCCAAAATTAGGTGTTGAATAAACTTGT
TGTTGCATCCTAATTACTTTACGTAAATGCTGTTTCTGGAGGACAAAGATCAGTTTCAGT
TACTCTAATTAAATATCTCCATTCACTCTTTGTAGCAGTAATACCAGTGTGGATGTCCCA
CCAATCGAATTCTACAACTTGGCAAGAAGGAGCTGTTAGAATATTTAACTCGCTTTCTCT
GTTACTTACTTCGTTAGAAATAATTTTGAAAATATCTGGTGTATAGTTTGGAACTTCTAA
ATCTAAAGGCTGATTTACAAATTTTGTTAAGCAACCTTCACCAGTTACTAATAAATGAAA
TCTTGATGGTCTTAATTCTTCTTGGATGTATAAATTTGGCGGAGCAGATTGTAATGGATT
TAAGTCTTCCCCACGTGGAGTACCATCGACGTGCCAACCAGGGATTGC

G12a:

>G12a Frame -1
AIPGWHVDGTPRGEDLNPLQSAPPNLYIQEELRPSRFHLLVTGEGCLTKFVNQPLDLEVPNYTPDIFKII
SNEVSNRESELNILTAPSCQVVEFDWWDIHTGITATKSEWRYLIRVTETDLCPPETAFT

>G12b Frame -1 continued.
MKNEALIIVDMSNDFVADNGSLTAGKPAQEIVPYIIDQANEFLGNGGTVVIA
 

Frame -1 now appears to contain the end of one gene, a promoter, and the beginning of another.

     628 gcaatccctggttggcacgtcgatggtactccacgtggggaagacttaaatccattacaa
         A  I  P  G  W  H  V  D  G  T  P  R  G  E  D  L  N  P  L  Q
     568 tctgctccgccaaatttatacatccaagaagaattaagaccatcaagatttcatttatta
         S  A  P  P  N  L  Y  I  Q  E  E  L  R  P  S  R  F  H  L  L
     508 gtaactggtgaaggttgcttaacaaaatttgtaaatcagcctttagatttagaagttcca
         V  T  G  E  G  C  L  T  K  F  V  N  Q  P  L  D  L  E  V  P
     448 aactatacaccagatattttcaaaattatttctaacgaagtaagtaacagagaaagcgag
         N  Y  T  P  D  I  F  K  I  I  S  N  E  V  S  N  R  E  S  E
     388 ttaaatattctaacagctccttcttgccaagttgtagaattcgattggtgggacatccac
         L  N  I  L  T  A  P  S  C  Q  V  V  E  F  D  W  W  D  I  H
     328 actggtattactgctacaaagagtgaatggagatatttaattagagtaactgaaactgat
         T  G  I  T  A  T  K  S  E  W  R  Y  L  I  R  V  T  E  T  D
     268 ctttgtcctccagaaacagcatttacgtaaagtaattaggatgcaacaacaagtttattc
         L  C  P  P  E  T  A  F  T  *  S  N  *  D  A  T  T  S  L  F
     208 aacacctaattttgggtggtaaaaataaaaaaataaaaaggggattttaaaatgaagaat
         N  T  *  F  W  V  V  K  I  K  K  *  K  G  D  F  K  M  K  N
     148 gaagcattaattattgttgatatgagtaatgattttgtggcagataacggaagtttaaca
         E  A  L  I  I  V  D  M  S  N  D  F  V  A  D  N  G  S  L  T
      88 gctggtaaaccagcacaagaaattgttccatacattattgatcaagctaatgaattttta
         A  G  K  P  A  Q  E  I  V  P  Y  I  I  D  Q  A  N  E  F  L
      28 ggtaatggtgggacagttgtaatcgcta 1
         G  N  G  G  T  V  V  I  A
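The gene boundary in frame -1 can be read off the translation. G12a ends just before the first stop, and G12b starts at the first Met after the last stop. This sketch performs that split. It only locates a candidate start codon and does not identify the promoter. `flank_stops` is a hypothetical helper.

```python
def flank_stops(protein: str):
    """Split a translated frame around its stop codons ('*').

    Returns (end of upstream ORF, downstream ORF from its first Met); the
    downstream part is None if there is no stop or no Met after the last stop.
    """
    if "*" not in protein:
        return protein, None
    upstream = protein[:protein.index("*")]
    tail = protein[protein.rindex("*") + 1:]
    met = tail.find("M")
    return upstream, (tail[met:] if met >= 0 else None)


# Frame -1 of G12 from position 268 onward, copied from the listing above.
g12_region = ("ETAFT*SN*DATTSLF" "NT*FWVVKIKK*KGDFKMKN"
              "EALIIVDMSNDFVADNGSLT" "AGKPAQEIVPYIIDQANEFL" "GNGGTVVIA")
upstream_end, g12b = flank_stops(g12_region)
print(upstream_end)  # ETAFT
print(g12b)          # MKNEALIIVDMSNDFVADNGSLTAGKPAQEIVPYIIDQANEFLGNGGTVVIA
```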
 

G12a: best match in "tailed phages" [orgn] is

>gi|9632790|ref|NP_049845.1| NDP reductase [Enterobacteria phage T4]
 gi|417656|sp|P32282|RIR1_BPT4 RIBONUCLEOSIDE-DIPHOSPHATE REDUCTASE ALPHA CHAIN (RIBONUCLEOTIDE
           REDUCTASE) (B1 PROTEIN)
 gi|509024|gb|AAA32527.1| (J03968) ribonucleoside diphosphate reductase [Enterobacteria phage
           T4]
 gi|5354414|gb|AAD42621.1|AF158101_208 (AF158101) NDP reductase [Enterobacteria phage T4]
          Length = 754

 Score = 25.4 bits (54), Expect = 0.84
 Identities = 24/101 (23%), Positives = 40/101 (38%), Gaps = 15/101 (14%)

Query: 12  RGEDLNPLQSAPPNLYIQEELRPSRFHLLVTGEGCLTKFVNQPLDLEVPNYTPDI----- 66
           R  +L       PN+  +       F LL+T      +   Q +D    NYTP I
Sbjct: 372 RFRELYEAAEKDPNIRKKRIKARELFELLMTERSGTARIYVQFID-NTNNYTPFIREKAP 430

Query: 67  ---------FKIISNEVSNRESELNILTAPSCQVVEFDWWD 98
                      I +N+V++ ++E+ + T  +  +  FDW D
Sbjct: 431 IRQSNLCCEIAIPTNDVNSPDAEIGLCTLSAFVLDNFDWQD 471
 

G12b matches:

emb|CAB15164| (Z99120) similar to pyrazinamidase/nicotinamidase [Bacillus
           subtilis]
           Length = 183
 
 Score = 49.2 bits (115), Expect = 3e-05
 Identities = 23/49 (46%), Positives = 33/49 (66%)

Query: 161 EALIIVDMSNDFVADNGSLTAGKPAQEIVPYIIDQANEFLGNGGTVVIA 209
           +ALI +D +NDFVA +G LT G+P + I   I++   EF+ NG  VV+A
Sbjct: 3   KALICIDYTNDFVASDGKLTCGEPGRMIEEAIVNLTKEFITNGDYVVLA 51

TBLASTX found homologues in several bacteria in the unfinished microbial
genomes section.


G14

267 bp fragment, completely open frame; RNase H.
 

 Summary: matches some unidentified frames in S. aureus and Mycoplasma,
 and an Enterococcus faecalis protein annotated as a cell wall enzyme,
 possibly a hydrolase. More broadly, it belongs to the RNase H domain family.

 >G14
 AACTACTTCCCATGATTTTAATGATTGTAATAAATTCTTT
 GCCTCATCATAATAAGGAATTATTGAATCTTTATGCGCAGAATACTTATTATTCATCTGC
 TTTTCTATTAAATTACTATCAGTATAAAAAACAACATGTTCTACATTTAGATCTATAGCG
 CACTTTAATGCTTTAATAAATGAACAATACTCTGCTTGATTATTATCTATAACCCCAAGA
 TAAAAAGATTCTTGAAGTAATACGCTGTCATTACTTTTAATAACCAT

One completely open frame:

>G14 Frame -1
MVIKSNDSVLLQESFYLGVIDNNQAEYCSFIKALKCAIDLNVEHVVFYTDSNLIEKQMNNKYSAHKDSII
PYYDEAKNLLQSLKSWEVV
 
 ORFfinder:
 
 seq4\frame-1

     267 atggttattaaaagtaatgacagcgtattacttcaagaatctttttatcttggggttata
         M  V  I  K  S  N  D  S  V  L  L  Q  E  S  F  Y  L  G  V  I
     207 gataataatcaagcagagtattgttcatttattaaagcattaaagtgcgctatagatcta
         D  N  N  Q  A  E  Y  C  S  F  I  K  A  L  K  C  A  I  D  L
     147 aatgtagaacatgttgttttttatactgatagtaatttaatagaaaagcagatgaataat
         N  V  E  H  V  V  F  Y  T  D  S  N  L  I  E  K  Q  M  N  N
      87 aagtattctgcgcataaagattcaataattccttattatgatgaggcaaagaatttatta
         K  Y  S  A  H  K  D  S  I  I  P  Y  Y  D  E  A  K  N  L  L
      27 caatcattaaaatcatgggaagtagtt 1
         Q  S  L  K  S  W  E  V  V

CD search identifies RNase H homology:

gnl|Pfam|pfam00075  rnaseH, RNase H  37.0  5e-04

Query:  12   QESFYLGVIDNNQAEYCSFIKALKCAIDLNVEHVVFYTDSNLIEKQMNN  60
Sbjct:  30   TFSKPLGATTNQRAELIALIEALEALAP---QPVNIYTDSQYVIKGITN  75

                       10        20        30        40        50        60
                 ....*....|....*....|....*....|....*....|....*....|....*....|
 consensus     1 PNA-VTVYTDGSCSGN---PGTG--GAGY-VL-WGG--------R--TFS-KPLG--ATT 39
 query         1 ------------------------------------mviksndsvllQES-FYLG--VID 21
 1RIL          6 RKR-VALFTDGACLGN---PGPG--GWAA-LLrFHA--------H--EKL-LSGGeaCTT 47
 gi 400825   603 EFA-MVFYTDGSAIKHpdvNKSH--SAGMgIA-QVQfipeykivH--QWS-IPLG--DHT 653
 gi 2352039  591 YEA-I-FYTDGSAIRSpkpNKTH--SAGMgII-QAKfepdfrivH--LWS-FPLG--DHT 640
 gi 3123539  587 NFQ-HIFYTDGSAITSp--TKEGhlNAGMgIV-Yfinkd-gnlqKqqEWS-ISLG--NHT 638
 gi 1350805  185 NKS-MNVYCDGSSFGNg--TSSS--RAGY-GA-YFEg------aPeeNIS-EPLLsgAQT 230
 gi 687799   131 PKGtLVMYTDGSYLKR---PPTS--GIGI-FVgP-G--------HelNRS-QRIRgpIQD 174
 gi 1730902    2 PTE-I--YVDGASAGN---PGPS--GIGI-FIkHEG--------KaeSFS-IPIG--VHT 41
 gi 544225     1 ---mlrIYVDAATKGN---PGES--GGGIvYLtDQS--------R--QQLhVPLG--IVS 40

                         70        80        90       100       110       120
                 ....*....|....*....|....*....|....*....|....*....|....*....|
 consensus    40 NQRAELIALIEALE-AL-----AP-------QPVN--IYTDSQYVIKGITN--L------ 76
 query        22 NNQAEYCSFIKALK-CA-----IDln----vEHVV--FYTDSNLIEKQMNNkysahkdsi 69
 1RIL         48 NNRMELKAAIEGLK-ALk----EP-------CEVD--LYTDSHYLKKAFTEgwLegw--- 90
 gi 400825   654 AQLAEIAAVEFACKkALk----IS-------GPVL--IVTDSFYVAESANKe-Lpywksn 699
 gi 2352039  641 AQYAEIAAFEFAIRrATg----IR-------GPVL--IVTDSNYVAKSYNEe-Lpywesn 686
 gi 3123539  639 AQFAEIAAFEFALKkCLp----LG-------GNIL--VVTDSNYVAKAYNEe-Ldvwasn 684
 gi 1350805  231 NNRAEIEAVSEALK-K------IWekltnekEKVNyqIKTDSEYVTK-LLNd-Rymtyd- 280
 gi 687799   175 NNYAEFIAVRTALQnALknenyRD-------QKVV--IRTDCLNVIE-ALQg-------- 216
 gi 1730902   42 NQEAEFLALIEGMKlCAt----RGy------QSVS--FRTDSDIV-ERATEl-E------ 81
 gi 544225    41 NHEAEFKVLIEALKqA------IAned--nqQTVL--LHSDSKIVVQ-TIEkn------- 82

                        130       140       150       160       170       180
                 ....*....|....*....|....*....|....*....|....*....|....*....|
 consensus    77 ------GWptKSSSKPIKN------DIWQLL---LK-KHK-VYIGWVPGH-SGIp----- 113
 query        70 ipyydeaknllqslkswevv---------------------------------------- 89
 1RIL         91 ---rkrGWr-TAEGKPVKNr-----DLWEALllaMA-PHR-VRFHFVKGH-TGH------ 132
 gi 400825   700 gflnnkKKplRHVSK-WKS-------IAECL---QL-KPD-IIIMHEKGH-QQPmttlht 745
 gi 2352039  687 gfvnnkKKtlKHISK-WKA-------IAECK---NL-KAD-IHVIHEPGHqPAEasp-ha 732
 gi 3123539  685 gfvnnrKKplKHISK-WKS-------VADLK---RL-RPD-VVVTHEPGHqKLDssp-ha 730
 gi 1350805  281 -nkkleGLpnSDLIVPLVQrfvkvkKYYELNkecFKnNGK-FQIEWVKGH-DGD------ 331
 gi 687799   217 ------TRptAFVD--VKS--------QVEFl--SKqFPKgVHFQHVYAH-AGD------ 251
 gi 1730902   82 ------MVknITFQ-PFVE------EIIRLKaafPL-----FFIKWIPGK---------- 113
 gi 544225    83 -----yAKn-EKYQ-PYLA-------EYQQL---EK-NFPlLLIKWLP-----Es----- 114

                        190
                 ....*....|....*
 consensus   114 PGNELADELAKQGAS 128
 query           ---------------
 1RIL        133 PENERVDREARRQAQ 147
 gi 400825   746 eGNNLADKLATQGSY 760
 gi 2352039  733 qGNALADKQAVSGSY 747
 gi 3123539  731 yGNNLADQLATQASF 745
 gi 1350805  332 PGNEMADFLAKKGAS 346
 gi 687799   252 PGNEMADLFAGQASS 266
 gi 1730902  114 -QNQKADLLAKEAIR 127
 gi 544225   115 -QNKAADMLARQALQ 128

Psi-Blast finds the strongest match to a rice (Oryza sativa) putative retroelement pol polyprotein similar to an Arabidopsis retroelement, although a Staph aureus protein is not far behind.

>gi|9927273|dbj|BAA96774.2| (AP002521) Similar to Arabidopsis thaliana chromosome II BAC F26H6;
            putative retroelement pol polyprotein (AC006920) [Oryza
            sativa]
 gi|9927274|dbj|BAB08213.2| (AP002539) Similar to Arabidopsis thaliana chromosome II BAC F26H6;
            putative retroelement pol polyprotein (AC006920) [Oryza
            sativa]
          Length = 2876

 Score = 53.1 bits (126), Expect = 4e-07
 Identities = 30/91 (32%), Positives = 48/91 (51%), Gaps = 2/91 (2%)

Query: 1    MVIKSNDSVLLQESFYL--GVIDNNQAEYCSFIKALKCAIDLNVEHVVFYTDSNLIEKQM 58
            +V K+    ++  SF L      NN+AEY + I  L  A+ + V  +  + DS LI +Q+
Sbjct: 2308 LVFKTPQGGVIYHSFSLLKEECSNNEAEYEALIFGLLLALSMEVRSLRAHGDSRLIIRQI 2367

Query: 59   NNKYSAHKDSIIPYYDEAKNLLQSLKSWEVV 89
            NN Y   K  ++PYY  A+ L+   +  EV+
Sbjct: 2368 NNIYEVRKPELVPYYTVARRLMDKFEHIEVI 2398

>gi|13701231|dbj|BAB42526.1| (AP003133) ORFID:SA1266~hypothetical protein, similar to cell
          wall enzyme EbsB [Staphylococcus aureus subsp. aureus
          N315]
 gi|14247204|dbj|BAB57595.1| (AP003362) hypothetical protein [Staphylococcus aureus subsp.
          aureus Mu50]
          Length = 133

 Score = 40.4 bits (93), Expect = 0.003
 Identities = 23/59 (38%), Positives = 37/59 (61%), Gaps = 1/59 (1%)

Query: 17 LGVIDNNQAEYCSFIKALKCAIDLNVEHVVFYTDSNLIEKQMNNKYSAHKDSIIPYYDE 75
          LG +DN+ AE+ + I AL+ A +LNV++ + YTDS LI   +   Y  + +   PY+D+
Sbjct: 36 LGEMDNHTAEWAACIYALEHARELNVQNALLYTDSKLIADSIEAGYVKNAN-FKPYFDQ 93

This domain is dominated by retroviral RNase Hs and cellular (including E. coli) RNase Hs. I had to slog through a lot of retroviral hits to get to an established bacterial RNase H, but the similarity wasn't much lower (31%). I followed the "cell wall" enzyme angle on the Staph aureus gene, but it didn't lead anywhere.

Didn't get a hit in "tailed phages" [orgn].


G17


G17: forward and reverse readings overlap.
1/6/2000

Unidentified; frame is open

>G17 Sch proofs
AATTTGGTGCAACAAATAACTGATATTATGGCTTCTGGAG
GACAGATGCAAATAGAATACAACGGAGAGTGGAAAACTATAGAACCATATGGTTGGAACT
CTTCCAATGCTGGAAACGTATTATTAATGTGTTATAAAGATACTGGTGAAGTTAGAAGTT
ATAGATTAGATAGAATGTCAAATGTACAATTTGATTCAAGTACTATTGATTTATCTCAAT
ATGGTTTAGAATCTGAAGATGTAGAGAATTTAGATAGTGTAGACGATGATAGTAATATTG
AAATCCCTACAATAGAAGATGATGGAAGTCAATCATTTATTGATGAACAACAAGAAATTG
AAACTCCATTTGATGATGCAATTGATGTTTTAGAGCAAATTGACGATAACTATTTAATTG
AAGATTTAAGACAAGTAGATAATACAGAATCTTTCGAACCAGTAAATGAAGAAGATTATG
ACCCTACTAATGA
 

ORFfinder:

>lcl|Sequence 1 Frame +1
NLVQQITDIMASGGQMQIEYNGEWKTIEPYGWNSSNAGNVLLMCYKDTGEVRSYRLDRMSNVQFDSSTID
LSQYGLESEDVENLDSVDDDSNIEIPTIEDDGSQSFIDEQQEIETPFDDAIDVLEQIDDNYLIEDLRQVD
NTESFEPVNEEDYDPTN

Psi-blast finds this weak match in nr:

>gi|10835405|pdb|1FEZ|A Chain A, The Crystal Structure Of Bacillus Cereus
           Phosphonoacetaldehyde Hydrolase Complexed With
           Tungstate, A Product Analog

          Length = 256

 Score = 36.2 bits (82), Expect = 0.085
 Identities = 29/104 (27%), Positives = 45/104 (42%), Gaps = 27/104 (25%)

Query: 29  PYGWNSSNAGNVLLMCYKDTGEVRSYRLDRMSNVQFDSS---------------TIDLSQ 73
           PY W          MCYK+  E+  Y ++ M  V    S                +  S+
Sbjct: 157 PYPW----------MCYKNAMELGVYPMNHMIKVGDTVSDMKEGRNAGMWTVGVILGSSE 206

Query: 74  YGLESEDVENLDSVDDDSNIEI--PTIEDDGSQSFIDEQQEIET 115
            GL  E+VEN+DSV+    IE+      ++G+   I+  QE+E+
Sbjct: 207 LGLTEEEVENMDSVELREKIEVVRNRFVENGAHFTIETMQELES 250
 

Prodomain gave a match to HIV tat, but since it's almost all acidic
amino acid matches, I don't believe it.
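One way to put a number on "almost all acidic": BLAST's middle line holds the residue letter where query and subject are identical, '+' where the pair scores positively, and a blank otherwise, so identities, positives and the acidic (D/E) share of the identities can be counted straight off it. A minimal Python sketch; midline_counts is a hypothetical helper, and it expects only the alignment columns, not the "Query: 74" prefix:

```python
# Hypothetical helper: count identities, positives and acidic identities
# on one BLAST middle line (alignment columns only, no "Query: 74" prefix).
ACIDIC = set("DE")


def midline_counts(midline: str) -> tuple:
    # A letter on the middle line means query and subject are identical there.
    identities = sum(ch.isalpha() for ch in midline)
    # '+' marks a non-identical pair with a positive substitution score.
    positives = identities + midline.count("+")
    # Identical positions whose residue is aspartate (D) or glutamate (E).
    acidic = sum(ch in ACIDIC for ch in midline)
    return identities, positives, acidic


# Middle lines of the two blocks of the G17 psi-blast alignment above.
block1 = "PY W          MCYK+  E+  Y ++ M  V    S                +  S+"
block2 = " GL  E+VEN+DSV+    IE+      ++G+   I+  QE+E+"
totals = [a + b for a, b in zip(midline_counts(block1), midline_counts(block2))]
print(totals)  # [29, 45, 7]
```

Summed over both blocks this gives 29 identities and 45 positives, matching the 29/104 and 45/104 in the header, with 7 of the 29 identities acidic.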
 

The following came up, but also has lots of acidic matches:

emb|CAA76602| (Y17045) NAD(P)H-dependent glutamate synthase [Plasmodium falciparum]
            Length = 3097
 

This gene in turn is not grouped with others in Prodomain.


G18

G18: forward and reverse readings overlap
1/18/2000
Unidentified; open frame all the way through; 433 bp

>G18
TTAATGACACGAGATAATGGTGAAGGTGGTATGGTTGATA
TTTATGTTAGGGCAGAAGATGCTGAAGAATATAAAGAAACATTCTACGTTAGTGATGAAT
ATACAACAGGAACAATACATGTAAAACCTTATGATAATATTATCCCAAAGAAACAGCCTA
TTATAATGATAGATCGGATAATAGGCAGAATTCCAAATAGTTCTTCTGAGACTGGATATG
ATGAAAGAGTATATATTAATGGTTCTAACTACAAAAAAGAAAAAGGTTCAAATAAGTATC
ATCGAGATATTCTGTGGAATTTTGCAGATGTTCCAGTTGAAGGTTTAACTGATGATGACT
TATTAGAAGCAACTGCTATTAATGTTTTAAATGTCTTACTTAAAAATATGACTTATTTAA
AAGATATTAAGTATGATATTGATTGGCAAATGA

>G18 Frame +1
LMTRDNGEGGMVDIYVRAEDAEEYKETFYVSDEYTTGTIHVKPYDNIIPKKQPIIMIDRIIGRIPNSSSE
TGYDERVYINGSNYKKEKGSNKYHRDILWNFADVPVEGLTDDDLLEATAINVLNVLLKNMTYLKDIKYDI
DWQM

Open frame 1 gave no hits in microbial genomes, nr, or prodomain.



G25

weak seq. in middle of contig.
 

>G25
AATGAACTTGGATCAGCTTTACCCGGCGCATCATTTCCTT
TTGGCCCACCTGGCGTTGGTGGCCCGCCCGGTTTATTTCCATTATTAGGGCCAGGGCTGT
CAGGCGTTGTTGGAGGTTTTGGTGCTCCACCACTAGGTCCTCCAGGTGCGCCTCCACCGC
TTGGTGCTTTAGGAGCATTAGTATCAAATACAGAACCACGTTCTTGCTCAAGATTTTTAC
GCTCTGTTTCTGGGTCAAGTTTAAGCATTGGTAATATCGTGCTCATAGAAACAAGTCCCT
TAGATTGTAATTGTTGAATAAAGTTAAGAACAGATTGATTTGAAGTTAAATCTTGTTGA
CTCCAAGAAATCTCAGGACAATTAATTGCATTTCTTTTCTATCAGCAAGTTTTCTTTGTT
CTTTTGAACTTAAGTATCTTCTTGCGACAGTGCCATTTGTAGGTTTATAAAAACCTTGT
ATTTCAGAAATTGGTCTATAAACTTTTTGTCTAATCCAAGACTCTAATCTTAAACGATAT
GACATGTATCTTCTAGCTAATGCATCCATACCAACTTGAGCATTTGAATATGTTGGACCT
TCACCGTTTAACATAGCTTGGTTAATGCCTAAACCGTTCATAAGCTCTTTTTGAATGAAA
TCAAATTCTTGGTTTAATGGTAAAATTCTACCAGCAGAACCTACATAGTCAGTTTGTAAT
GCATAGTGATATACAAGCATGAAGTT
 
 
 

Prodomain: nothing found.

>lcl|Sequence 1 Frame +1
NELGSALPGASFPFGPPGVGGPPGLFPLLGPGLSGVVGGFGAPPLGPPGAPPPLGALGALVSNTEPRSCS
RFLRSVSGSSLSIGNIVLIETSPLDCNC*IKLRTD*FEVKSC*LQEISGQLIAFLFYQQVFFVLLNLSIF
LRQCHL*VYKNLVFQKLVYKLFV*SKTLILNDMTCIF*LMHPYQLEHLNMLDLHRLT*LG*CLNRS*ALF
E*NQILGLMVKFYQQNLHSQFVMHSDIQA*S
>lcl|Sequence 2 Frame +2
MNLDQLYPAHHFLLAHLALVARPVYFHY*GQGCQALLEVLVLHH*VLQVRLHRLVL*EH*YQIQNHVLAQ
DFYALFLGQV*ALVISCS*KQVP*IVIVE*S*EQIDLKLNLVDSKKSQDN*LHFFSISKFSLFF*T*VSS
CDSAICRFIKTLYFRNWSINFLSNPRL*S*TI*HVSSS*CIHTNLSI*ICWTFTV*HSLVNA*TVHKLFL
NEIKFLV*W*NSTSRTYIVSL*CIVIYKHEV
>lcl|Sequence 3 Frame +3
*TWISFTRRIISFWPTWRWWPARFISIIRARAVRRCWRFWCSTTRSSRCASTAWCFRSISIKYRTTFLLK
IFTLCFWVKFKHW*YRAHRNKSLRL*LLNKVKNRLI*S*ILLTPRNLRTINCISFLSASFLCSFELKYLL
ATVPFVGL*KPCISEIGL*TFCLIQDSNLKRYDMYLLANASIPT*AFEYVGPSPFNIAWLMPKPFISSF*
MKSNSWFNGKILPAEPT*SVCNA**YTSMKX
>lcl|Sequence 4 Frame -1
NFMLVYHYALQTDYVGSAGRILPLNQEFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSY
RLRLESWIRQKVYRPISEIQGFYKPTNGTVARRYLSSKEQRKLADRKEMQLIVLRFLGVNKI*LQINLFL
TLFNNYNLRDLFL*ARYYQCLNLTQKQSVKILSKNVVLYLILMLLKHQAVEAHLEDLVVEHQNLQQRLTA
LALIMEINRAGHQRQVGQKEMMRRVKLIQVH
>lcl|Sequence 5 Frame -2
TSCLYITMHYKLTM*VLLVEFYH*TKNLISFKKSL*TV*ALTKLC*TVKVQHIQMLKLVWMH*LEDTCHI
V*D*SLGLDKKFIDQFLKYKVFINLQMALSQEDT*VQKNKENLLIEKKCN*LS*DFLESTRFNFKSICS*
LYSTITI*GTCFYEHDITNA*T*PRNRA*KS*ARTWFCI*Y*CS*STKRWRRTWRT*WWSTKTSNNA*QP
WP**WK*TGRATNARWAKRK*CAG*S*SKFI
>lcl|Sequence 6 Frame -3
LHACISLCITN*LCRFCW*NFTIKPRI*FHSKRAYERFRH*PSYVKR*RSNIFKCSSWYGCIS*KIHVIS
FKIRVLD*TKSL*TNF*NTRFL*TYKWHCRKKILKFKRTKKTC**KRNAINCPEISWSQQDLTSNQSVLN
FIQQLQSKGLVSMSTILPMLKLDPETERKNLEQERGSVFDTNAPKAPSGGGAPGGPSGGAPKPPTTPDSP
GPNNGNKPGGPPTPGGPKGNDAPGKADPSSX

Several possible long frames; -1 and -3 could be fused by a frameshift.
None of them hit anything in Prodomain or Prodomain CG.
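The six-frame listing above can be regenerated with a few lines of Python. A minimal sketch, assuming the standard genetic code (NCBI table 1) and that frames -1, -2, -3 start at the first, second and third base of the reverse complement, which is how the listings in this file line up. A trailing partial codon is dropped, so the last residue can differ from ORFfinder's. longest_open_stretch is a hypothetical helper for spotting the long open frames:

```python
# Sketch: six-frame translation with the standard genetic code (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    # Complement A<->T, C<->G, then reverse; other characters pass through.
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    # Whole codons only; a trailing partial codon is dropped.
    # Codons containing anything other than A/C/G/T become 'X'.
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X")
                   for p in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    # Frames +1/+2/+3 start at base 1/2/3 of the given strand,
    # frames -1/-2/-3 at base 1/2/3 of its reverse complement.
    rc = revcomp(seq)
    frames = {}
    for f in (1, 2, 3):
        frames[f"+{f}"] = translate(seq[f - 1:])
        frames[f"-{f}"] = translate(rc[f - 1:])
    return frames


def longest_open_stretch(protein: str) -> str:
    # Longest run without a stop ('*'); ties go to the first run.
    return max(protein.split("*"), key=len)


frames = six_frames("TAACTTCTGCTTCATTTGAAGGAACAT")  # last 27 nt of G26
for name, prot in frames.items():
    print(name, prot, len(longest_open_stretch(prot)))
```

On the last 27 bases of G26 it gives MFLQMKQKL for frame -1 and VPSNEAEV for frame -3, the same openings as the G26 frame -1 and -3 listings further down.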

Frame -1 gave the following weak hit by blastx against nr.
 

>gi|9630180|ref|NP_046607.1| unknown [Bacteriophage SPBc2]
 gi|7519724|pir||T12819 hypothetical protein yonE - Bacillus subtilis phage SPBc2
 gi|2634532|emb|CAB14030.1| (Z99115) yonE [Bacillus subtilis]
 gi|3025533|gb|AAC13028.1| (AF020713) unknown [Bacteriophage SPBc2]
          Length = 506

 Score = 36.6 bits (83), Expect = 0.20
 Identities = 16/70 (22%), Positives = 37/70 (52%), Gaps = 3/70 (4%)
 Frame = -1

Query: 646 EFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSYRLRLESWIRQKVYR-- 473
           +FD I  ++ +  G++ ++LNG+G  Y+ + + +D   +R       +E  + QK++
Sbjct: 337 KFDHINSDIQSAYGLSGSLLNGDGGNYATSSLNLDTFYKRIGVLMEDIEQEVYQKLFNLV 396

Query: 472 -PISEIQGFY 446
            P ++   +Y
Sbjct: 397 LPAAQKDNYY 406
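The query coordinates in this hit run downhill (646 to 473) because frame -1 is read on the reverse strand. Assuming the G25 contig is 724 bp (the sequence above is 724 bp, and GeneMark's >724 points the same way) and that frame -f starts at base f of the reverse complement, residue i of a minus-strand frame maps back to forward coordinates as in this sketch; both function names are hypothetical:

```python
def minus_residue_to_forward(i: int, frame: int, contig_len: int) -> tuple:
    """Forward-strand span (1-based) of residue i in minus-strand frame -frame.

    Returns (first, last) in reading order, so first > last: the codon is read
    right to left along the forward sequence. Assumes frame -f begins at base f
    of the reverse complement.
    """
    if frame not in (1, 2, 3) or i < 1:
        raise ValueError("frame must be 1, 2 or 3 and i must be >= 1")
    first = contig_len - (frame - 1) - 3 * (i - 1)
    return first, first - 2


def minus_span(i_start: int, i_end: int, frame: int, contig_len: int) -> tuple:
    # Forward coordinates covered by residues i_start..i_end, in reading order.
    return (minus_residue_to_forward(i_start, frame, contig_len)[0],
            minus_residue_to_forward(i_end, frame, contig_len)[1])


# G25a is frame -1 of the 724-bp G25 contig; residues 27..84 carry the YonE hit.
print(minus_span(27, 84, 1, 724))  # (646, 473)
```

That reproduces this hit's 646 to 473 span from the 27 to 84 span the same YonE hit gets when G25a is searched as a protein.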

Frame +1 is rich in prolines and glycines and pulls in all kinds of proline-
and glycine-rich proteins.

The first part matches AAC13028.1 (the SPBc2 "unknown" entry listed with yonE above) in phage/full at 0.002, and the match is visibly the same motif as the yonE hit above.  Psi-blast in phage/full started to bring in portal proteins at E < 0.05, e.g. AAD41031.1.

The second part matched CAB16752.1 gp40 at 0.006.  It was compositionally compromised but a shuffle 1,100 gave 98% confidence.

GeneMark (heuristic) favored these two frames:

Predicted genes
   Gene    Strand    LeftEnd    RightEnd       Gene     Class
    #                                         Length
    1        -          <3         248          246        1
    2        -         326        >724          399        1
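The GeneMark coordinates are 1-based and inclusive, so Gene Length is RightEnd - LeftEnd + 1, and both lengths here are whole numbers of codons. I read the < and > prefixes as marking genes that run off the end of the contig; on the minus strand a gene starts at RightEnd, so >724 would mean gene 2 (which lines up with G25a below) has no start codon inside the 724-bp contig, which fits G25a opening with NFML rather than M. A small hypothetical parser for rows of this table:

```python
def parse_genemark_row(row: str) -> dict:
    # Columns: gene number, strand, LeftEnd, RightEnd, gene length, class.
    num, strand, left, right, length, cls = row.split()
    return {
        "gene": int(num),
        "strand": strand,
        "left": int(left.lstrip("<")),
        "right": int(right.lstrip(">")),
        "left_open": left.startswith("<"),    # runs off the left end of the contig
        "right_open": right.startswith(">"),  # runs off the right end of the contig
        "length": int(length),
        "class": int(cls),
    }


rows = [
    "1        -          <3         248          246        1",
    "2        -         326        >724          399        1",
]
for row in rows:
    g = parse_genemark_row(row)
    # Inclusive coordinates: length = right - left + 1; should be whole codons.
    assert g["length"] == g["right"] - g["left"] + 1
    assert g["length"] % 3 == 0
    print(g["gene"], g["strand"], g["left"], g["right"], g["length"] // 3, "codons")
```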

>G25a Frame -1
NFMLVYHYALQTDYVGSAGRILPLNQEFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSY
RLRLESWIRQKVYRPISEIQGFYKPTNGTVARRYLSSKEQRKLADRKEMQLIVLRFLGVNKI

>G25b Frame -3
KRNAINCPEISWSQQDLTSNQSVLN
FIQQLQSKGLVSMSTILPMLKLDPETERKNLEQERGSVFDTNAPKAPSGGGAPGGPSGGAPKPPTTPDSP
GPNNGNKPGGPPTPGGPKGNDAPGKADPSS

Note: these two overlap and might be the same frame disrupted by a sequencing error in the weak middle region of the contig.

G25a is the one that finds YonE, here in bacteria only:

>gi|9630180|ref|NP_046607.1| unknown [Bacteriophage SPBc2]
 gi|7519724|pir||T12819 hypothetical protein yonE - Bacillus subtilis phage SPBc2
 gi|2634532|emb|CAB14030.1| (Z99115) yonE [Bacillus subtilis]
 gi|3025533|gb|AAC13028.1| (AF020713) unknown [Bacteriophage SPBc2]
          Length = 506

 Score = 34.7 bits (78), Expect = 0.030
 Identities = 16/70 (22%), Positives = 37/70 (52%), Gaps = 3/70 (4%)

Query: 27  EFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSYRLRLESWIRQKVYR-- 84
           +FD I  ++ +  G++ ++LNG+G  Y+ + + +D   +R       +E  + QK++
Sbjct: 337 KFDHINSDIQSAYGLSGSLLNGDGGNYATSSLNLDTFYKRIGVLMEDIEQEVYQKLFNLV 396

Query: 85  -PISEIQGFY 93
            P ++   +Y
Sbjct: 397 LPAAQKDNYY 406

G25b also finds a YonE homologue, at e=2.7; however, I think it really is the end of G25a.

>gi|15024052|gb|AAK79106.1|AE007629_12 (AE007629) Phage related protein, YonE B.subtilis homolog
           [Clostridium acetobutylicum]
          Length = 505

 Score = 30.4 bits (67), Expect = 0.66
 Identities = 18/58 (31%), Positives = 27/58 (46%)

Query: 8   PEISWSQQDLTSNQSVLNFIQQLQSKGLVSMSTILPMLKLDPETERKNLEQERGSVFD 65
           P   +   +L S +   + + QL + GL+S  T L  LK D + E+K  E E     D
Sbjct: 405 PSFKFESLNLQSEKDFRSEVMQLYTFGLLSRETTLSELKFDFKQEKKRRESENSENLD 462

So I arbitrarily fused them to make:

>G25 fused
NFMLVYHYALQTDYVGSAGRILPLNQEFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSY
RLRLESWIRQKVYRPISEIQGFYKPTNGTVARRYLSSKEQRKLADRKEMQL
PEISWSQQDLTSNQSVLN
FIQQLQSKGLVSMSTILPMLKLDPETERKNLEQERGSVFDTNAPKAPSGGGAPGGPSGGAPKPPTTPDSP
GPNNGNKPGGPPTPGGPKGNDAPGKADPSS

That allowed this match:

>gi|15024052|gb|AAK79106.1|AE007629_12 (AE007629) Phage related protein, YonE B.subtilis homolog
           [Clostridium acetobutylicum]
          Length = 505

 Score = 34.7 bits (78), Expect = 0.18
 Identities = 31/155 (20%), Positives = 65/155 (41%), Gaps = 27/155 (17%)

Query: 27  EFDFIQKELMNGLGINQAMLNGEG--PTYSNAQVGMDALARRYMSYRLRLESWIRQKVYR 84
           ++  + + +++ +GI+  ++ G G   +++ A + +  LA+R    + ++  +I   + R
Sbjct: 333 KYQTVNESILSSIGISAIVVTGNGGSGSFAQASINLSTLAKRIKDGQNKIAKFINLLLKR 392

Query: 85  PISEIQGFYKPTNGTVARRYLSSKEQRKLADRKEMQLPEISWSQQDLTSNQSVLNFIQQL 144
             S   G    ++G                      +P   +   +L S +   + + QL
Sbjct: 393 KFS---GRSSTSDG----------------------IPSFKFESLNLQSEKDFRSEVMQL 427

Query: 145 QSKGLVSMSTILPMLKLDPETERKNLEQERGSVFD 179
            + GL+S  T L  LK D + E+K  E E     D
Sbjct: 428 YTFGLLSRETTLSELKFDFKQEKKRRESENSENLD 462

However, I wasn't able to coax it any further.
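The fusion above is a plain cut-and-join: G25a is kept through ...KEMQL (dropping its last eleven residues, IVLRFLGVNKI) and G25b is picked up at PEISW... (dropping KRNAINC). A minimal sketch with a hypothetical helper that makes the junction explicit, so other cut points are easy to try:

```python
def fuse_at_motifs(upstream: str, keep_through: str,
                   downstream: str, start_at: str) -> str:
    """Keep `upstream` through the first occurrence of `keep_through`, then
    append `downstream` from the first occurrence of `start_at` onward.
    str.index raises ValueError if either motif is absent."""
    cut_up = upstream.index(keep_through) + len(keep_through)
    cut_down = downstream.index(start_at)
    return upstream[:cut_up] + downstream[cut_down:]


# Tail of G25a and head of G25b, as listed above.
g25a_tail = "RKLADRKEMQLIVLRFLGVNKI"
g25b_head = "KRNAINCPEISWSQQDLTSNQSVLN"
print(fuse_at_motifs(g25a_tail, "KEMQL", g25b_head, "PEISW"))
# RKLADRKEMQLPEISWSQQDLTSNQSVLN
```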
 



G26

Seq. is weak in the middle.
One ORF matches a small segment of an unidentified Staph gene but doesn't
seem extendable.  The other is unidentified.

>G26
CTATACTCCCTCCTAAATATCAAAAAAGTAGTACCTTCTT
CGTCGTTTTTTCTTTCTTCTTCAATAGCAACCTTACTTCTTAACTTAGCGAATGCTACCA
ATTGATCAAGTGACATATTGTATTTTTTAGCTACGCCTTCAATCGCACCAATCATGTCAG
CTAATTCAATTAAAAGCATTAAGTCTTGGCCTTGTTCTTCAGCATCATATGCTTCCTGTA
ATTCTTCTTCGATTTTTGATAATTCGCCATAAATACCTTTTTGAATTTTACGATTATGGA
ATCCAGACATGGAAACTCTCCTTTAAAAAATATTTAAAGAGAGCTGAAAATCAGCTCTTT
TAAAATACCTCTAGATTTTTTCAAATGCTACCTTTAAGATATGTTTACAAATAGCGCCAC
GGAATTGGTGATGAGGACAGTTACAATCAACAACCTTCTCAGCATCAACTTTTACTAAAT
AACCCTTGTCTTCTTCATGGTTAATAACAATGTATTCTAAACCGTCTTTACCAGAATCTA
AAATTCGGAAGCTTTCGGATTCAAATGTTTCAATTGATTTACGTAGAGCAGCTAATTTCG
TAGTACCTTTTTTTGGAGCAACTTCGCTACGGCCATCGTAAGGTTTTTTAATGTTTTCAG
TAACTTCTGCTTCATTTGAAGGAACAT
 

Picked up the following by TBLASTX in microbial genomes:

gnl|OUACGT_1280|s.aureus_Contig859 Staphylococcus aureus NCTC 8325 unfinished fragment of complete
            genome
            Length = 2663
 
 Score = 37.3 bits (75), Expect = 0.11
 Identities = 12/21 (57%), Positives = 14/21 (66%)
 Frame = -2 / +2

 
Query: 425  CNCPHHQFRGAICKHMLKVAF 363
            CNCPH   R  ICKHM+ + F
Sbjct: 2252 CNCPHADGRRVICKHMIALLF 2314

 gnl|TIGR_1280|S.aureus_4348 Staphylococcus aureus COL unfinished fragment of complete genome
            Length = 49859
 
 Score = 37.3 bits (75), Expect = 0.11
 Identities = 12/21 (57%), Positives = 14/21 (66%)
 Frame = -2 / -3

 
Query: 425  CNCPHHQFRGAICKHMLKVAF 363
            CNCPH   R  ICKHM+ + F
Sbjct: 9594 CNCPHADGRRVICKHMIALLF 9532

Frame -3 is open except for two stop codons in the region that is suspect.

     665 gttccttcaaatgaagcagaagttactgaaaacattaaaaaaccttacgatggccgtagc
         V  P  S  N  E  A  E  V  T  E  N  I  K  K  P  Y  D  G  R  S
     605 gaagttgctccaaaaaaaggtactacgaaattagctgctctacgtaaatcaattgaaaca
         E  V  A  P  K  K  G  T  T  K  L  A  A  L  R  K  S  I  E  T
     545 tttgaatccgaaagcttccgaattttagattctggtaaagacggtttagaatacattgtt
         F  E  S  E  S  F  R  I  L  D  S  G  K  D  G  L  E  Y  I  V
     485 attaaccatgaagaagacaagggttatttagtaaaagttgatgctgagaaggttgttgat
         I  N  H  E  E  D  K  G  Y  L  V  K  V  D  A  E  K  V  V  D
     425 tgtaactgtcctcatcaccaattccgtggcgctatttgtaaacatatcttaaaggtagca
         C  N  C  P  H  H  Q  F  R  G  A  I  C  K  H  I  L  K  V  A
     365 tttgaaaaaatctagaggtattttaaaagagctgattttcagctctctttaaatattttt
         F  E  K  I  *  R  Y  F  K  R  A  D  F  Q  L  S  L  N  I  F
     305 taaaggagagtttccatgtctggattccataatcgtaaaattcaaaaaggtatttatggc
         *  R  R  V  S  M  S  G  F  H  N  R  K  I  Q  K  G  I  Y  G
     245 gaattatcaaaaatcgaagaagaattacaggaagcatatgatgctgaagaacaaggccaa
         E  L  S  K  I  E  E  E  L  Q  E  A  Y  D  A  E  E  Q  G  Q
     185 gacttaatgcttttaattgaattagctgacatgattggtgcgattgaaggcgtagctaaa
         D  L  M  L  L  I  E  L  A  D  M  I  G  A  I  E  G  V  A  K
     125 aaatacaatatgtcacttgatcaattggtagcattcgctaagttaagaagtaaggttgct
         K  Y  N  M  S  L  D  Q  L  V  A  F  A  K  L  R  S  K  V  A
      65 attgaagaagaaagaaaaaacgacgaagaaggtactacttttttgatatttaggagggag
         I  E  E  E  R  K  N  D  E  E  G  T  T  F  L  I  F  R  R  E
       5 tatag 1
         Y
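Since frame -3 is read on the reverse strand, its stop codons show up in the forward sequence as TTA, CTA and TCA (the reverse complements of TAA, TAG and TGA). A minimal sketch, assuming frame -f begins at base f of the reverse complement, that lists the stops of a minus-strand frame in forward coordinates; minus_strand_stops is a hypothetical helper:

```python
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    # Complement A<->T, C<->G, then reverse.
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def minus_strand_stops(seq: str, frame: int) -> list:
    """Stop codons of minus-strand frame -frame as (left, right) forward
    coordinates, 1-based and inclusive, listed in reading order."""
    rc = revcomp(seq)
    n = len(seq)
    hits = []
    for p in range(frame - 1, n - 2, 3):
        if rc[p:p + 3] in STOPS:
            right = n - p  # rc index p (0-based) is forward position n - p (1-based)
            hits.append((right - 2, right))
    return hits


# Toy example: the reverse complement reads ATG TAA GGG TAG.
print(minus_strand_stops("CTACCCTTACAT", 1))  # [(7, 9), (1, 3)]
```

On the full 667-bp G26 sequence with frame=3 this should flag just the two stops noted above, at forward positions 351 to 353 (CTA, read as TAG) and 303 to 305 (TTA, read as TAA), i.e. the I* in the 365 row and the leading * in the 305 row of the listing.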
 
 
 

>lcl|Sequence 1 Frame +1
LYSLLNIKKVVPSSSFFLSSSIATLLLNLANATN*SSDILYFLATPSIAPIMSANSIKSIKSWPCSSASY
ASCNSSSIFDNSP*IPF*ILRLWNPDMETLL*KIFKES*KSALLKYL*IFSNATFKICLQIAPRNW**GQ
LQSTTFSASTFTK*PLSSSWLITMYSKPSLPESKIRKLSDSNVSIDLRRAANFVVPFFGATSLRPS*GFL
MFSVTSASFEGT
>lcl|Sequence 2 Frame +2
YTPS*ISKK*YLLRRFFFLLQ*QPYFLT*RMLPIDQVTYCIF*LRLQSHQSCQLIQLKALSLGLVLQHHM
LPVILLRFLIIRHKYLFEFYDYGIQTWKLSFKKYLKRAENQLF*NTSRFFQMLPLRYVYK*RHGIGDEDS
YNQQPSQHQLLLNNPCLLHG**QCILNRLYQNLKFGSFRIQMFQLIYVEQLIS*YLFLEQLRYGHRKVF*
CFQ*LLLHLKEH
>lcl|Sequence 3 Frame +3
ILPPKYQKSSTFFVVFSFFFNSNLTS*LSECYQLIK*HIVFFSYAFNRTNHVS*FN*KH*VLALFFSIIC
FL*FFFDF**FAINTFLNFTIMESRHGNSPLKNI*RELKISSFKIPLDFFKCYL*DMFTNSATELVMRTV
TINNLLSINFY*ITLVFFMVNNNVF*TVFTRI*NSEAFGFKCFN*FT*SS*FRSTFFWSNFATAIVRFFN
VFSNFCFI*RNX
>lcl|Sequence 4 Frame -1
MFLQMKQKLLKTLKNLTMAVAKLLQKKVLRN*LLYVNQLKHLNPKASEF*ILVKTV*NTLLLTMKKTRVI
**KLMLRRLLIVTVLITNSVALFVNIS*R*HLKKSRGILKELIFSSL*IFFKGEFPCLDSIIVKFKKVFM
ANYQKSKKNYRKHMMLKNKAKT*CF*LN*LT*LVRLKA*LKNTICHLINW*HSLS*EVRLLLKKKEKTTK
KVLLF*YLGGSI
>lcl|Sequence 5 Frame -2
CSFK*SRSY*KH*KTLRWP*RSCSKKRYYEISCST*IN*NI*IRKLPNFRFW*RRFRIHCY*P*RRQGLF
SKS*C*EGC*L*LSSSPIPWRYL*TYLKGSI*KNLEVF*KS*FSALFKYFLKESFHVWIP*S*NSKRYLW
RIIKNRRRITGSI*C*RTRPRLNAFN*IS*HDWCD*RRS*KIQYVT*SIGSIR*VKK*GCY*RRKKKRRR
RYYFFDI*EGV*
>lcl|Sequence 6 Frame -3
VPSNEAEVTENIKKPYDGRSEVAPKKGTTKLAALRKSIETFESESFRILDSGKDGLEYIVINHEEDKGYL
VKVDAEKVVDCNCPHHQFRGAICKHILKVAFEKI*RYFKRADFQLSLNIF*RRVSMSGFHNRKIQKGIYG
ELSKIEEELQEAYDAEEQGQDLMLLIELADMIGAIEGVAKKYNMSLDQLVAFAKLRSKVAIEEERKNDEE
GTTFLIFRREYX

The Staph sequence is:

TGAGTTAAAACATGTGACTTTTGGTTATAATAAAAAGCAGATGGTGCTACAAGATATCAA
TATTACTATACCTGATGGAGAAAATGTTGGTATTTTAGGCGAAAGTGGCTGTGGTAAAAG
TACGCTCGCTTCATTGGTTCTTGGCTTGTTTAAACCTGTTAAAGGAGAGATTTACTTAAG
TGACAATGCTGTGTTACCGATTTTCCAACACCCTTTAACTAGCTTTAACCCTGATTGGAC
GATTGAGACCTCATTAAAAGAAGCGTTATATTATTACAGAGGTCTAACTGATAATACTGC
TCAGGATCAATTATTATTACAACATTTATCTACTTTTGAGTTAAACGCGCAATTATTGAC
TAAATTACCAAGCGAAGTGAGTGGCGGACAATTACAAAGATTTAATGTCATGCGTTCGTT
ATTAGCACAGCCTCGCGTTTTAATATGTGATGAGATAACTTCAAATTTAGATGTTATAGC
TGAACAAAATGTAATCAATATATTAAAAGCGCAAACGATTACGAACTTAAATCATTTTAT
CGTTATTTCTCATGATTTATCCGTGTTACAACGCTTAGTTAATAGAATTATCGTTCTTAA
GGATGGCATGATAGTCGATGATTTTGCAATAGAGGAATTATTTAATGTTGATAGACACCC
TTATACAAAAGAATTAGTGCAAGCATTTTCATATTAGTTATTTAAGAATGCGATAATTCT
AGACTTGTTATAAAATATAGATAAATCAAGTATTTTAATCTAGACACTTATCTATTTTAT
TTTCTTTATTTAAAAATAATAATAAAAAGGAGTATCATTAATGGGATTACTTGATATTGC
AAGTATTCGTTCTATAGAAAGAGGCTTTAATTATTATCAAAGTGAATGCGTCATTAACTT
AAAATCATTTTCAGAAACGCAGCATGAGGCTGAAGTAAAGGGCAGTGGCAACAAAGTATA
TCGTTGTTATATTGATATGGAACATCCTAGAAAATCCATATGTAATTGTCCTCATGCTGA
TGGAAGACGAGTGATATGTAAACATATGATTGCATTACTCTTTACAGCTAGTCCAGAAGC
AGCAAATAAACATATAATGATGTTAAACGAAGTTGAAGAAGACTATCAATTACGCAGAAA
TATGTGGATTGATAGTCTTAAAGAAATGATTAATGATATGAGTGAAGAAGAACTCCGCGA
TGCATATTTAAACATGTTAATTGAACATGGAGAAATGGCAGAATTATTTGGATTAGATGA
AGAGGAAGAAATGTTCGAGGACGAATTTTATTAAAATAGCCCCTATCGATTGATAATGAT
TATCATTTGATAGGGGTGTTTTTATTTATATGATTTTAAGACTTTGCAAATAGCTTGTGC
ATAATAATTGATGCGTTAGACTTTATCAACTG

>Unidentified staph gene
MGLLDIA
SIRSIERGFNYYQSECVINLKSFSETQHEAEVKGSGNKVYRCYIDMEHPRKSICNCPHADGRRVICKHMI
ALLFTASPEAANKHIMMLNEVEEDYQLRRNMWIDSLKEMINDMSEEELRDAYLNMLIEHGEMAELFGLDE
EEEMFEDEFY

     821 atgggattacttgatattgcaagtattcgttctatagaaagaggc
         M  G  L  L  D  I  A  S  I  R  S  I  E  R  G
     866 tttaattattatcaaagtgaatgcgtcattaacttaaaatcattt
         F  N  Y  Y  Q  S  E  C  V  I  N  L  K  S  F
     911 tcagaaacgcagcatgaggctgaagtaaagggcagtggcaacaaa
         S  E  T  Q  H  E  A  E  V  K  G  S  G  N  K
     956 gtatatcgttgttatattgatatggaacatcctagaaaatccata
         V  Y  R  C  Y  I  D  M  E  H  P  R  K  S  I
    1001 tgtaattgtcctcatgctgatggaagacgagtgatatgtaaacat
         C  N  C  P  H  A  D  G  R  R  V  I  C  K  H
    1046 atgattgcattactctttacagctagtccagaagcagcaaataaa
         M  I  A  L  L  F  T  A  S  P  E  A  A  N  K
    1091 catataatgatgttaaacgaagttgaagaagactatcaattacgc
         H  I  M  M  L  N  E  V  E  E  D  Y  Q  L  R
    1136 agaaatatgtggattgatagtcttaaagaaatgattaatgatatg
         R  N  M  W  I  D  S  L  K  E  M  I  N  D  M
    1181 agtgaagaagaactccgcgatgcatatttaaacatgttaattgaa
         S  E  E  E  L  R  D  A  Y  L  N  M  L  I  E
    1226 catggagaaatggcagaattatttggattagatgaagaggaagaa
         H  G  E  M  A  E  L  F  G  L  D  E  E  E  E
    1271 atgttcgaggacgaattttattaa 1294
         M  F  E  D  E  F  Y  *

This Staph protein in turn matched:

emb|CAA67095.1| (X98455) SNF [Bacillus cereus]
          Length = 1064
 
 Score = 36.0 bits (81), Expect = 0.19
 Identities = 21/78 (26%), Positives = 38/78 (47%), Gaps = 3/78 (3%)

Query: 10 DIASIRSIERGFNYYQSECVINLKSFSETQHEAEVKGSGNKVYRCYIDMEHPRKSI--CN 67
          ++    S +RG  YY+S  VI +  + ET+   E    GN+ +R  ++       +  C+
Sbjct: 12 EVCGETSYKRGEAYYKSNKVI-VNYYDETKEICEATVKGNEDFRVTVEKAKKGDVVARCS 70

Query: 68 CPHADGRRVICKHMIALL 85
          CP     +  C+H+ A+L
Sbjct: 71 CPSLASFQTYCQHVAAVL 88

Fused all the frames together and found nothing in Prodomain.

GeneMark says:

Predicted genes
   Gene    Strand    LeftEnd    RightEnd       Gene     Class
    #                                         Length
    1        -          <3         290          288        1
    2        -         351        >665          315        1

>G26a
VPSNEAEVTENIKKPYDGRSEVAPKKGTTKLAALRKSIETFESESFRILDSGKDGLEYIVINHEEDKGYL
VKVDAEKVVDCNCPHHQFRGAICKHILKVAFEKI

>G26b
RRVSMSGFHNRKIQKGIYG
ELSKIEEELQEAYDAEEQGQDLMLLIELADMIGAIEGVAKKYNMSLDQLVAFAKLRSKVAIEEERKNDEE
GTTFLIFRREY