>g1
TCTTTAGCAGCTAATGGTGAAATGATACAAACTGCTTCTT
CCAATTTTAAAACTCCTACACAGTTAAATCACATACTGGAATTTGTTAGTTGGATAATCG
ATTTATTGAAATACATTTTATCGGGAATTGCAGGTTTAATTTGTACATTCGCTGGTTATA
AGTGGGCTACATCTTTAGAAGGTAATGGCCAAGAAGCTGCTAAGAAAATTCTAAAGAATG
CATTTGTAGGTGGCGTAATAGTTTGGACAGGTTCTTCAATAGC
>G1 frame +1
SLAANGEMIQTASSNFKTPTQLNHILEFVSWIIDLLKYILSGIAGLICTFAGYKWATSLEGNGQEAAKKI
LKNAFVGGVIVWTGSSI
ORFfinder:
frame=+1
1 tctttagcagctaatggtgaaatgatacaaactgcttcttccaattttaaaactcctaca
S L
A A N G E M I Q T
A S S N F K T P T
61 cagttaaatcacatactggaatttgttagttggataatcgatttattgaaatacatttta
Q L
N H I L E F V S W
I I D L L K Y I L
121 tcgggaattgcaggtttaatttgtacattcgctggttataagtgggctacatctttagaa
S G
I A G L I C T F A
G Y K W A T S L E
181 ggtaatggccaagaagctgctaagaaaattctaaagaatgcatttgtaggtggcgtaata
G N
G Q E A A K K I L
K N A F V G G V I
241 gtttggacaggttcttcaatagc 263
V W
T G S S I
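The frame translations in these notes can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard genetic code; the function names are illustrative and it is not the ORFfinder program itself. Negative frames are read from the reverse complement, as in the "Frame -1" entries further down.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, frame=1):
    """Translate one of the six frames (+1..+3 or -1..-3).

    Stops are written as '*'; codons with non-ACGT letters become 'X'.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1..+3 or -1..-3")
    seq = dna.upper()
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First line of the g1 read, frame +1.
print(translate("TCTTTAGCAGCTAATGGTGAAATGATACAAACTGCTTCTT", 1))
```

The printed peptide should agree with the start of the frame +1 translation listed for g1.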
Psi-Blast of frame +1 vs. nr at word size 3 or 2: nothing.
Also nothing by CD search, prodomain search, or TBlastn.
Best match against "tailed phages" [orgn] at word size 3 or 2:
>gi|7461934|pir||T42293 hypothetical protein - phage SPP1
gi|2764870|emb|CAA66554.1| (X97918) gene 17.5 [Bacteriophage SPP1]
Length = 179
Score = 24.6 bits (52), Expect = 0.53
Identities = 11/28 (39%), Positives = 15/28 (53%)
Query: 48 CTFAGYKWATSLEGNGQEAAKKILKNAF 75
          C FAGY+   SL+  GQ  +   +  AF
Sbjct: 78 CAFAGYEQVPSLDDIGQALSDMDIDEAF 105
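The "Identities" line of a BLAST hit can be checked directly from the two aligned rows. A minimal sketch follows; the helper name is hypothetical, and both rows are assumed to be the same length with "-" marking gaps.

```python
def identity_stats(query_row, subject_row, gap="-"):
    """Count identical columns in a pairwise alignment.

    Returns (identical, columns, percent rounded to a whole number).
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    columns = len(query_row)
    identical = sum(
        1 for q, s in zip(query_row, subject_row) if q == s and q != gap
    )
    return identical, columns, round(100 * identical / columns)


# The g1 frame +1 segment against SPP1 gene 17.5.
print(identity_stats("CTFAGYKWATSLEGNGQEAAKKILKNAF",
                     "CAFAGYEQVPSLDDIGQALSDMDIDEAF"))
```

For the SPP1 hit this gives 11 identical columns out of 28, i.e. 39%, the same figures BLAST reports.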
>g4 - COMPLETE
CAAGATCCTGGTAAATTACTACCTGAGGAAGAAGAAGTTA
TGAATAAACTCCTACTTTCTTTCCAACAATCTGAGAAATTGAAACGTCACATGTCGTTCT
TGATGAAGAAAGGGAAACTTTACCTTCCATATAATGGCAACCTTCTTATTCATGGTTGTA
TCCCTGTTGATGAGAATGGAGAAATGGAATCATTCGAAATTGAAGGTGAACAACATAAAG
GACGTGACCTATTAGACGTATTTGAAAGACATGTACGTTATGCATATGACTACAAAGAAA
TCACTGATGATCTATCAACAGATTTAGTGTGGTATTTATGGACTGGTAAATATAGTTCAC
TTTTTGGAAAAAAGAGCGATGACAACACTTTGAACGTTACTTTATTGAAGATTAAGGCTT
CTCATAAAGAACCTAAGAATCCATATTATTATCTACGAGAAGATGTTGATATGATTCGCA
AAATGTTAAAAGACTTCGGTCTCAATCCAGATGAAGGAAGAATTATTAATGGACATACAC
CTGTTAAAGAAATAGATGGCGAAGACCCTATTAAAGCAGATGGTAAAATGTTAGTTATCG
ATGG
>G4 open frame +1
QDPGKLLPEEEEVMNKLLLSFQQSEKLKRHMSFLMKKGKLYLPYNGNLLIHGCIPVDENGEMESFEIEGE
QHKGRDLLDVFERHVRYAYDYKEITDDLSTDLVWYLWTGKYSSLFGKKSDDNTLNVTLLKIKASHKEPKN
PYYYLREDVDMIRKMLKDFGLNPDEGRIINGHTPVKEIDGEDPIKADGKMLVID
1 complete open reading frame (+1):
1 caagatcctggtaaattactacctgaggaagaagaagttatgaataaactcctactttct
Q D
P G K L L P E E E
E V M N K L L L S
61 ttccaacaatctgagaaattgaaacgtcacatgtcgttcttgatgaagaaagggaaactt
F Q
Q S E K L K R H M
S F L M K K G K L
121 taccttccatataatggcaaccttcttattcatggttgtatccctgttgatgagaatgga
Y L
P Y N G N L L I H
G C I P V D E N G
181 gaaatggaatcattcgaaattgaaggtgaacaacataaaggacgtgacctattagacgta
E M
E S F E I E G E Q
H K G R D L L D V
241 tttgaaagacatgtacgttatgcatatgactacaaagaaatcactgatgatctatcaaca
F E
R H V R Y A Y D Y
K E I T D D L S T
301 gatttagtgtggtatttatggactggtaaatatagttcactttttggaaaaaagagcgat
D L
V W Y L W T G K Y
S S L F G K K S D
361 gacaacactttgaacgttactttattgaagattaaggcttctcataaagaacctaagaat
D N
T L N V T L L K I
K A S H K E P K N
421 ccatattattatctacgagaagatgttgatatgattcgcaaaatgttaaaagacttcggt
P Y
Y Y L R E D V D M
I R K M L K D F G
481 ctcaatccagatgaaggaagaattattaatggacatacacctgttaaagaaatagatggc
L N
P D E G R I I N G
H T P V K E I D G
541 gaagaccctattaaagcagatggtaaaatgttagttatcgatgg
584
E D
P I K A D G K M L
V I D
Strong matches to bacterial fructose-1,6-bisphosphatase genes:
Staphylococcus aureus 80% identity
Bacillus subtilis 65% identity
Lactococcus lactis 53% identity
Clostridium acetobutylicum 45% identity
Seems to be nonorthologous to the fructose-1,6-bisphosphatase gene in COGs.
>gi|13702466|dbj|BAB43607.1| (AP003137) fructose-bisphosphatase [Staphylococcus aureus subsp. aureus N315]
gi|14248290|dbj|BAB58678.1| (AP003365) fructose-bisphosphatase [Staphylococcus aureus subsp. aureus Mu50]
Length = 654
Score = 292 bits (748), Expect = 8e-79
Identities = 155/193 (80%), Positives = 172/193 (88%), Gaps = 1/193 (0%)
Query: 2 DPGKLLPEEEEVMNKLLLSFQQSEKLKRHMSFLMKKGKLYLPYNGNLLIHGCIPVDENGE
61
+P
+LLPEEEEVMNKLLLSFQQSEKL+RHMSFLM+KG LYLPYNGNLLIHGCIPVDENGE
Sbjct: 375 NPAELLPEEEEVMNKLLLSFQQSEKLRRHMSFLMRKGSLYLPYNGNLLIHGCIPVDENGE
434
Query: 62 MESFEIEGEQHKGRDLLDVFERHVRYAYDYKEITDDLSTDLVWYLWTGKYSSLFGKKSDD
121
MESFEI+G
+ G++LLDVFE HVR ++D KE TDDLSTDLVWYLWTGKYSSLFGK++
Sbjct: 435 MESFEIDGHTYSGQELLDVFEYHVRKSFDEKENTDDLSTDLVWYLWTGKYSSLFGKRA-M
493
Query: 122 NTLNVTLLKIKASHKEPKNPYYYLREDVDMIRKMLKDFGLNPDEGRIINGHTPVKEIDGE
181
T + KASHKE KNPYY+LREDV+M+RKML DFGLNPDEGRIINGHTPVKEI+GE
Sbjct: 494 TTFERYFIADKASHKEEKNPYYHLREDVNMVRKMLSDFGLNPDEGRIINGHTPVKEINGE
553
Query: 182 DPIKADGKMLVID 194
DPIKADGKMLVID
Sbjct: 554 DPIKADGKMLVID 566
>G7
CCAAATCTTTAATTAAACGTTTAATCGCTCCTGGCATTTT
ACCACGATTTTTTTCATTATTTTCAGCTACTTTAGCAGCTTGAATTGTACGGCCTTCCCA
ATCTTGAGGAGTGTTACCTTTAGTGTCTCCATTTTCATCACCAGCTTGAGCTTTACCCCA
CTCATCATGGCTATCAATTGTCCCACCATTACCATTAGCTGCATTAGCATGAGCTTTTTT
AACGTAATCTTCATCTTTAATTAAGATGTCATAGATTTCTTCTGAGAACATACCTTCAAA
TTTCTTGTCATAAAGTCCACCAACAATCATTTTGAAATGTTTCTCATC
>G7 Frame -1
DEKHFKMIVGGLYDKKFEGMFSEEIYDILIKDEDYVKKAHANAANGNGGTIDSHDEWGKAQAGDENGDTK
GNTPQDWEGRTIQAAKVAENNEKNRGKMPGAIKRLIKDL
In the phage full library, the best hit (Expect = 0.17) is to a relatively poorly conserved
segment of T4 DNA polymerase (DJBPT4).
102 aa frame -1 is completely open:
310 atgattgttggtggactttatgacaagaaatttgaaggtatgttc
M I
V G G L Y D K K F
E G M F
265 tcagaagaaatctatgacatcttaattaaagatgaagattacgtt
S E
E I Y D I L I K D
E D Y V
220 aaaaaagctcatgctaatgcagctaatggtaatggtgggacaatt
K K
A H A N A A N G N
G G T I
175 gatagccatgatgagtggggtaaagctcaagctggtgatgaaaat
D S
H D E W G K A Q A
G D E N
130 ggagacactaaaggtaacactcctcaagattgggaaggccgtaca
G D
T K G N T P Q D W
E G R T
85 attcaagctgctaaagtagctgaaaataatgaaaaaaatcgtggt
I Q
A A K V A E N N E
K N R G
40 aaaatgccaggagcgattaaacgtttaattaaagatttg
2
K M
P G A I K R L I K
D L
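The descending numbers at the left of a minus-frame display are forward-strand coordinates. A small sketch of the mapping, assuming frame -1 starts on the last base of the read (frame -2 one base earlier, and so on); the function name is illustrative, and 328 is the length counted from the G7 read above.

```python
def minus_frame_codon_start(seq_len, residue_index, frame=-1):
    """Forward-strand coordinate (1-based) of the first base of a codon
    read on the reverse strand.

    residue_index is 1-based within the translation of that minus frame.
    """
    if frame not in (-1, -2, -3):
        raise ValueError("frame must be -1, -2 or -3")
    if residue_index < 1:
        raise ValueError("residue_index is 1-based")
    offset = abs(frame) - 1
    return seq_len - offset - 3 * (residue_index - 1)


# G7 frame -1 translation begins DEKHFKM...; the Met is residue 7.
print(minus_frame_codon_start(328, 7))
```

With a 328 bp read, the Met at residue 7 of frame -1 lands on coordinate 310, the first number in the display above.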
This hit was found to T4 DNA polymerase and other phage polymerases
in the phage full library. This is a relatively unconserved segment,
so the poor similarity in this segment is plausible. Monte Carlo
10,100 assigns 92% confidence. Monte Carlo 1,100 gave 86% confidence.
Psi-Blast found it in "tailed phages" [orgn] at E = 0.05. The
match could not be extended. The fragment is classified as unidentified.
>pir||DJBPT4 DNA-directed DNA polymerase (EC 2.7.7.7) - phage T4
Length = 898
Score = 29.0 bits (63), Expect = 0.17
Identities = 13/36 (36%), Positives = 20/36 (55%)
Query: 10  GGLYDKKFEGMFSEEIYDILIKDEDYVKKAHANAAN 45
           G +YDK  EG+  +EI  +  + +D+ KK  A   N
Sbjct: 457 GWMYDKHQEGIIPKEIAKVFFQRKDWKKKMFAEEMN 492
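The notes do not say which program produced the Monte Carlo confidence figures or what the "10,100" and "1,100" settings mean. Purely as an illustration of the general idea, here is a sketch of a shuffle (permutation) test on the fixed, ungapped alignment above: the query segment is shuffled many times and the fraction of shuffles that score fewer identities than the real segment is reported. The scoring scheme, trial count and seed are all assumptions, so the numbers it gives are not comparable to the 92% and 86% quoted.

```python
import random


def identities(a, b):
    """Number of identical positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x == y)


def shuffle_confidence(query, subject, trials=1000, seed=0):
    """Permutation test on a fixed ungapped alignment.

    Returns (observed identities, fraction of shuffled queries that
    score strictly fewer identities than the observed alignment).
    """
    if len(query) != len(subject):
        raise ValueError("segments must have the same length")
    rng = random.Random(seed)  # seeded so the run is repeatable
    observed = identities(query, subject)
    residues = list(query)
    below = 0
    for _ in range(trials):
        rng.shuffle(residues)
        if identities(residues, subject) < observed:
            below += 1
    return observed, below / trials


obs, conf = shuffle_confidence("GGLYDKKFEGMFSEEIYDILIKDEDYVKKAHANAAN",
                               "GWMYDKHQEGIIPKEIAKVFFQRKDWKKKMFAEEMN")
print(obs, conf)
```

A real significance estimate would re-align each shuffled sequence rather than keep the columns fixed, so this version tends to be optimistic.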
partially ds.
DNA primase
> g8
ATTATTGGCGAACCAGGGAGAAGATACTTACAATCAAGGG
GAATAAATGAAGAAACAATTGAATATTGGTCTATAGGTTATGCACCTATAGGACATAAAA
AATACACTAAACTTAGAGGTAGGATTACTTTTCCAGTATTTGATAACAATGGAAAAATAG
TTACAATTAGTGGAAGATCAGTTTTCGATACATTAAAACCAAAATACGATATGTATCCTT
TCTCTGCAAGAAAAATCTTATTTGGCTTATGGCAAAATAGACACGATATAAGAGATCACA
ATAGAGCTATTATTACTGAAGGACAAATTGATGTTATTACATCATGGCAAAATGGATTAA
AAATTGTAACTTCTACATT
>G8 Frame +1
IIGEPGRRYLQSRGINEETIEYWSIGYAPIGHKKYTKLRGRITFPVFDNNGKIVTISGRSVFDTLKPKYD
MYPFSARKILFGLWQNRHDIRDHNRAIITEGQIDVITSWQNGLKIVTST
+1 frame is completely open:
1 attattggcgaaccagggagaagatacttacaatcaaggggaataaatgaagaaacaatt
I I
G E P G R R Y L Q
S R G I N E E T I
61 gaatattggtctataggttatgcacctataggacataaaaaatacactaaacttagaggt
E Y
W S I G Y A P I G
H K K Y T K L R G
121 aggattacttttccagtatttgataacaatggaaaaatagttacaattagtggaagatca
R I
T F P V F D N N G
K I V T I S G R S
181 gttttcgatacattaaaaccaaaatacgatatgtatcctttctctgcaagaaaaatctta
V F
D T L K P K Y D M
Y P F S A R K I L
241 tttggcttatggcaaaatagacacgatataagagatcacaatagagctattattactgaa
F G
L W Q N R H D I R
D H N R A I I T E
301 ggacaaattgatgttattacatcatggcaaaatggattaaaaattgtaacttctacatt
359
G Q
I D V I T S W Q N
G L K I V T S T
Matches lots of DNA primases. Closest is:
>gi|10173991|dbj|BAB05094.1| (AP001511) DNA primase [Bacillus halodurans]
Length = 599
Score = 85.1 bits (209), Expect = 1e-16
Identities = 49/145 (33%), Positives = 71/145 (48%), Gaps = 35/145 (24%)
Query: 3 GEPGRRYLQSRGINEETIEYWSIGYAP------------------------------IGH
32
G+
GR YL+ RG +E IE++ IG+AP
Sbjct: 135 GKAGRTYLERRGFTKEQIEHFQIGFAPPHWDALTNVLAKRDVDLKKAGESGLLVERESDG
194
Query: 33 KKYTKLRGRITFPVFDNNGKIVTISGRSVFDTLKPKYDMYP----FSARKILFGLWQNRH
88
K+Y
+ R R+ FP+ + GKIV GR++ D KPKY P
F K+L+G +Q R
Sbjct: 195 KRYDRFRNRVIFPIRNGKGKIVAFGGRTLGDD-KPKYLNSPESPIFQKGKLLYGFYQARP
253
Query: 89 DIRDHNRAIITEGQIDVITSWQNGL 113
IR N A++ EG +DVI +W+ G+
Sbjct: 254 AIRKENEAVLFEGYVDVIAAWKAGV 278
There is a vague similarity to primase/helicases of T7, T3, T4, and
other phages.
There is apparently more than one family of helicases in phage
and bacteria, and there is some confusion about paralogues. Big families
built around dnaB, recA, and dnaG all seem to be paralogues.
Frick et al. 1998. PNAS 95:7957-62 &
Guo et al 1999. JBC 274:30303.
T7 gene 4 has a primase 242..271, a linker region 242..271,
and a helicase of the dnaB family (and distantly recA family?) 271..end.
T7 gene 4 is expressed in two forms, 63 kDa and 56 kDa, due to an internal
start.
Only the 63 kDa form is active as primase.
Residues 1..63 (the part missing in the 56 kDa form) are a zinc finger required for template
recognition. Necessary but not sufficient for primase activity.
Aravind L, Leipe DD, Koonin EV. 1998. Toprim -- a conserved catalytic
domain in type IA and II topoisomerases, DnaG-type primases, OLD family
nucleases and RecR proteins. NAR 26: 4205-13.
Toprim is a ~100 aa conserved domain with a conserved Glu and a DxD motif.
I think these are the E at 159 and the DMD at 207 in T7 gene 4.
DnaG-type primases, small primase-like proteins from bacteria and archaea,
bacterial and archaeal nucleases of the OLD family,
bacterial DNA repair proteins of the RecR/M family.
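Candidate DxD motifs of the kind described for the Toprim domain can be located with a regular expression. This is only a positional scan, sketched under the assumption that any residue is allowed between the two aspartates; it says nothing about whether a given match is the conserved catalytic one. The function name is hypothetical.

```python
import re


def motif_positions(protein, pattern="D.D"):
    """Return 1-based start positions of every match of `pattern`.

    A lookahead is used so that overlapping matches are all reported.
    """
    regex = re.compile("(?=(%s))" % pattern)
    return [m.start() + 1 for m in regex.finditer(protein.upper())]


# Toy peptides, not real sequences.
print(motif_positions("AADMDAA"))
print(motif_positions("DADAD"))
```

Positions are relative to whatever string is passed in, so a fragment such as a single read translation has to be offset by hand before comparing with full-length coordinates like 207 in T7 gene 4.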
Upon further searching, there are many phage primase families. G8 is closest to one from mycobacterial phages L5 and D29. These are cut into two genes, g57 and g58, with g57 overlapping G8 and the Toprim box and g58 being more proximal.
T7, T3, and phi-YeO3-12 form a tight cluster, with SP6 weakly associated.
phi-C31 (Siphoviridae), phi-105, A2, and phi adh form a cluster.
It's unclear if the clustering represents relationships among the
phage or the underlying relationships among hosts that they repeatedly
acquire dnaG from.
Matches acid soluble spore proteins and a TATA box.
>g9r
TAGCGGATTGAGCAAAAGCAGTCATTTTCTTAGTCATGTT
ACCACCTACATAACCATGTTGAATAGCTGGCATCCAACGTTTGTCCATGCTATCATAATT
CTTTAAACCAATTTCTGCTGCTACCGCTTCTTTAAAGGAATTCATTTGTTGACTAATGTT
TCCAGTTGGTAAATTCCCTTCTTGAACTGCATAAGTGTTACGGTTATTGTTTGACATATT
GTTTTCCTCCTAAAGGTTTTTTTTATTTTCCGATGTCCACCATCGTAATGTTATTATTAC
CAGGTTTTATCTTCTTTATTCATTATTTACCTTGATTTTTTATTATGGTTTAAA
ATTACCA
TTCATCATTTCCATTTTTTGCTTTATCTATCATTAGTTTAGTAACATGACCATTTT
last 60 bp of poor quality.
Matches are in frame -1
>G9r frame -1
MSNNNRNTYAVQEGNLPTGNISQQMNSFKEAVAAE
IGLKNYDSMDKRWMPAIQHGYVGGNMTKKMTAFAQSA
217 atgtcaaacaataaccgtaacacttatgcagttcaagaagggaat
M S
N N N R N T Y A V
Q E G N
172 ttaccaactggaaacattagtcaacaaatgaattcctttaaagaa
L P
T G N I S Q Q M N
S F K E
127 gcggtagcagcagaaattggtttaaagaattatgatagcatggac
A V
A A E I G L K N Y
D S M D
82 aaacgttggatgccagctattcaacatggttatgtaggtggtaac
K R
W M P A I Q H G Y
V G G N
37 atgactaagaaaatgactgcttttgctcaatccgct
2
M T
K K M T A F A Q S
A
BlastX to bacterial subdivision:
>gi|134234|sp|P22066|SAS2_CLOBI SMALL, ACID-SOLUBLE SPORE PROTEIN BETA (SASP) (ASSP)
gi|483083|pir||B61028 small acid-soluble spore protein beta - Clostridium bifermentans
gi|226892|prf||1610173B small acid soluble protein beta [Clostridium bifermentans]
Length = 64
Score = 50.1 bits (118), Expect = 9e-07
Identities = 23/46 (50%), Positives = 32/46 (69%)
Frame = -1
Query: 145 MNSFKEAVAAEIGLKNYDSMDKRWMPAIQHGYVGGNMTKKMTAFAQ 8
+N
K +A E+GL NY+S+DK + A Q+GYVGG MTKK+ A+
Sbjct: 13 LNQMKLEIANELGLSNYESVDKGNLTARQNGYVGGYMTKKLVEMAE
58
>gi|134226|sp|P22065|SAS1_CLOBI SMALL, ACID-SOLUBLE SPORE PROTEIN ALPHA (SASP) (ASSP)
gi|482685|pir||A61028 small acid-soluble spore protein alpha - Clostridium bifermentans
gi|226891|prf||1610173A small acid soluble protein alpha [Clostridium bifermentans]
Length = 70
Score = 47.8 bits (112), Expect = 4e-06
Identities = 22/46 (47%), Positives = 30/46 (64%)
Frame = -1
Query: 145 MNSFKEAVAAEIGLKNYDSMDKRWMPAIQHGYVGGNMTKKMTAFAQ 8
+
K +A E+G+ NYD+ DK M A Q+GYVGG MTKK+ A+
Sbjct: 17 LKQMKLEIANELGISNYDTADKGNMTARQNGYVGGYMTKKLVEMAE
62
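In BlastX output the Query coordinates are nucleotide positions, so a range such as 145 to 8 has to be divided by three before it can be compared with the protein alignment length. A minimal sketch (the helper name is illustrative):

```python
def blastx_query_span(start, end):
    """Convert BlastX query nucleotide coordinates to a codon count.

    Returns (codons, strand), where strand is '-' when start > end.
    """
    span = abs(start - end) + 1
    if span % 3 != 0:
        raise ValueError("coordinate range is not a whole number of codons")
    strand = "-" if start > end else "+"
    return span // 3, strand


# Both SASP alignments above run from query position 145 down to 8.
print(blastx_query_span(145, 8))
```

For the SASP hits this comes to 46 codons on the minus strand, which agrees with the 46 alignment columns in the Identities lines (no gaps).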
There is a family of these in each sporulating bacterial species, and they are involved in binding the DNA. Some of the B. megaterium genes are sequenced, and do not match as well as the Clostridial ones above.
>g10R
AATTCCAACCAATCAAGTTTATATACGTTTTCGTAAATCATTTC
TTTACTCTCCTTTGTTAGTTAACATTCTTAATAAATATCTATCATTTTCTTTCCAATAAT
TAATTTTTTCATAAATTTTATTAATAGTTTCTAACTTCTCGTCCAAAGATGAGTTTAGAG
ATAATGAAAAAAAGGGTGATTCATCAAAAAACTTTATTAAAAAATCAATAGATTCCTCTG
AGCTCGATGAGTCTAAATGAAAACATCCATTTTGGCAATTTACAATTACACCAAAATAAT
CTTCCTCAATGATTTTTATTTCTTTTGATTTGCAAATAGGACACCCATCCTTAAACCAGA
AGAGTTCTATTTT
ORFfinder:
frame=-1 open for 309 bp.
>G10r Frame -1
KIELFWFKDGCPICKSKEIKIIEEDYFGVIVNCQNGCFHLDSSSSEESIDFLIKFFDESPFFSLSLNSSL
DEKLETINKIYEKINYWKENDRYLLRMLTNKGE*RNDLRKRI*T*LVGI
357 aaaatagaactcttctggtttaaggatgggtgtcctatttgcaaatcaaaagaaataaaa
K I
E L F W F K D G C
P I C K S K E I K
297 atcattgaggaagattattttggtgtaattgtaaattgccaaaatggatgttttcattta
I I
E E D Y F G V I V
N C Q N G C F H L
237 gactcatcgagctcagaggaatctattgattttttaataaagttttttgatgaatcaccc
D S
S S S E E S I D F
L I K F F D E S P
177 tttttttcattatctctaaactcatctttggacgagaagttagaaactattaataaaatt
F F
S L S L N S S L D
E K L E T I N K I
117 tatgaaaaaattaattattggaaagaaaatgatagatatttattaagaatgttaactaac
Y E
K I N Y W K E N D
R Y L L R M L T N
57 aaaggagagtaaagaaatgatttacgaaaacgtatataaacttgattggttggaatt
1
K G
E * R N D L R K R
I * T * L V G I
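The "open for 309 bp" figure can be recovered from the frame translation by looking for the longest run without a stop. A small sketch, with hypothetical helper names, using the G10r frame -1 translation copied from above:

```python
def open_segments(translation):
    """Yield (start_residue, length_aa) for each stop-free run.

    start_residue is 1-based; '*' marks a stop codon.
    """
    pos = 1
    for segment in translation.split("*"):
        if segment:
            yield pos, len(segment)
        pos += len(segment) + 1  # skip past the stop


def longest_open_bp(translation):
    """Return (start_residue, length in bp) of the longest stop-free run."""
    start, length = max(open_segments(translation),
                        key=lambda seg: seg[1], default=(0, 0))
    return start, 3 * length


G10R_FRAME_MINUS1 = (
    "KIELFWFKDGCPICKSKEIKIIEEDYFGVIVNCQNGCFHLDSSSSEESIDFLIKFFDESPFFSLSLNSSL"
    "DEKLETINKIYEKINYWKENDRYLLRMLTNKGE*RNDLRKRI*T*LVGI"
)
print(longest_open_bp(G10R_FRAME_MINUS1))
```

The first stop-free run is 103 residues, i.e. 309 bp starting at residue 1, matching the ORFfinder note.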
No hits of Frame -1 in nr or prodomain or prodomain CG
Only candidate hit in microbial genomes is:
gnl|OUACGT_714|A.actin_Contig462 Actinobacillus actinomycetemcomitans unfinished fragment of complete genome
Length = 6860
Score = 35.6 bits (80), Expect = 0.067
Identities = 22/63 (34%), Positives = 37/63 (57%), Gaps = 9/63 (14%)
Frame = -2
Query: 39 HLDSSSSEESIDFLIKFFDESPFFSLSLNSSLDEKLETINKIY---------EKINYWKE
89
H DS + E IKFF E P F++S++ + KL
T K + ++IN+W E
Sbjct: 1774 HSDSINPE-----FIKFFQEFPIFNISISINKKFKLFTDEKFFSNQQISAYIKQINHWIE
1610
Query: 90 NDRYLLRMLTNK 101
N++ + + + K
Sbjct: 1609 NEKSIEKYIKQK 1574
> g10F
AATAATAGTTATTTTCCAAGGTATAAAGCGATGTGCCGTT
TTGGGAAAATTAAAGAAATATACACTCTCTTTTCTATTGATGCAATATTAGAAAACGGAC
AAGATTGGGTAGATCATCAAATAAAACATGAGTATAAAGATTGGAAGTATTATTCTAACT
TAAGAAAAGAATTACAAAATTTTATTGCACAACCAAAGTTTTGGTTTAAAAGAACAAACT
ACAAAGAAGGATTAACTAATAAATTTGTATGGATTAAAATTTATTAAAAAGGGTGCGTTT
TAAATGTATAATAAAAAAAGAAATAATTTTTTTGTGGCGAGTTACTGTTATGTAGTAATG
AGATTGAAATAAAATTAAAGAATATCAAAAGTTTGTGATTTAATGGAGATTAAAAG
Frame +1 open 264 bp:
>G10f Frame +1
NNSYFPRYKAMCRFGKIKEIYTLFSIDAILENGQDWVDHQIKHEYKDWKYYSNLRKELQNFIAQPKFWFK
RTNYKEGLTNKFVWIKIY*KGCVLNV**KKK*FFCGELLLCSNEIEIKLKNIKSL*FNGD*K
1 aataatagttattttccaaggtataaagcgatgtgccgttttgggaaaattaaagaaata
N N
S Y F P R Y K A M
C R F G K I K E I
61 tacactctcttttctattgatgcaatattagaaaacggacaagattgggtagatcatcaa
Y T
L F S I D A I L E
N G Q D W V D H Q
121 ataaaacatgagtataaagattggaagtattattctaacttaagaaaagaattacaaaat
I K
H E Y K D W K Y Y
S N L R K E L Q N
181 tttattgcacaaccaaagttttggtttaaaagaacaaactacaaagaaggattaactaat
F I
A Q P K F W F K R
T N Y K E G L T N
241 aaatttgtatggattaaaatttattaaaaagggtgcgttttaaatgtataataaaaaaag
K F
V W I K I Y * K G
C V L N V * * K K
301 aaataatttttttgtggcgagttactgttatgtagtaatgagattgaaataaaattaaag
K *
F F C G E L L L C
S N E I E I K L K
361 aatatcaaaagtttgtgatttaatggagattaaaag
396
N I
K S L * F N G D *
K
Frame 1 has no match in nr, microbial genomes, tailed phages, or
prodomain or CG prodomain.
No sig. hit in phage full library.
single stranded readings; MATCHES T4 RNA LIGASE
>G11f
AATAGAATTGTTGTAGCTTATAATGATGCAGATTTAAGAC
TAATTGGCGTGAAAGATTTAAAAACGCATCAAGATTTGTCATATGCTGAAGTTATTAAAA
TGGCAAAAGAATTAGGGTTTGCGCATACTGAATTAGAAGATATTACGTTGGAAGAAATAT
TAGAGGAAAGAGAAAAACGTGAAAACTTTGAAGGTTGGGTAGTGCGATTTTCAAATGGTT
TATACATGAAAATTAAATGTAAAGCCTATTTAGATTTGCATGGTGCTCGTTTTGGCTCTT
CAATTAAATCGGTATTTGTACTCTTAAAAGAAGAAAAATGGGACGATTTTATTTCTCTAT
TCCAGAAGAGT
ORFfinder:
frame=+1
>G11f Frame +1
NRIVVAYNDADLRLIGVKDLKTHQDLSYAEVIKMAKELGFAHTELEDITLEEILEEREKRENFEGWVVRF
SNGLYMKIKCKAYLDLHGARFGSSIKSVFVLLKEEKWDDFISLFQKS
1 aatagaattgttgtagcttataatgatgcagatttaagactaattggcgtgaaagattta
N R
I V V A Y N D A D
L R L I G V K D L
61 aaaacgcatcaagatttgtcatatgctgaagttattaaaatggcaaaagaattagggttt
K T
H Q D L S Y A E V
I K M A K E L G F
121 gcgcatactgaattagaagatattacgttggaagaaatattagaggaaagagaaaaacgt
A H
T E L E D I T L E
E I L E E R E K R
181 gaaaactttgaaggttgggtagtgcgattttcaaatggtttatacatgaaaattaaatgt
E N
F E G W V V R F S
N G L Y M K I K C
241 aaagcctatttagatttgcatggtgctcgttttggctcttcaattaaatcggtatttgta
K A
Y L D L H G A R F
G S S I K S V F V
301 ctcttaaaagaagaaaaatgggacgattttatttctctattccagaagagt
351
L L
K E E K W D D F I
S L F Q K S
After 300, frame becomes ambiguous.
Psi-Blast vs. "tailed phages" [orgn]
>gi|9632683|ref|NP_049839.1| RNA ligase [Enterobacteria phage T4]
gi|133093|sp|P00971|RLIG_BPT4 RNA LIGASE
gi|68670|pir||LQBPR4 RNA ligase (ATP) (EC 6.5.1.3) - phage T4
gi|15367|emb|CAA25107.1| (X00365) RNA ligase [Enterobacteria phage T4]
gi|5354307|gb|AAD42514.1|AF158101_101 (AF158101) RNA ligase [Enterobacteria phage T4]
Length = 374
Score = 35.0 bits (79), Expect = 9e-04
Identities = 24/90 (26%), Positives = 41/90 (44%), Gaps = 1/90 (1%)
Query: 1 NRIVVAYNDADLRLIGVKDLKTHQDLSYAEVIKMAKELGFAHXXXXXXXXXXXXXXXXXX
60
NRIV+AY
+ + L+ V++ +T + +SY ++ K A +
Sbjct: 165 NRIVLAYQEMKIILLNVRENETGEYISYDDIYKDATLRPYL-VERYEIDSPKWIEEAKNA
223
Query: 61 XNFEGWVVRFSNGLYMKIKCKAYLDLHGAR 90
N EG+V +G + KIK Y+ LH +
Sbjct: 224 ENIEGYVAVMKDGSHFKIKSDWYVSLHSTK 253
The remaining 30 aa didn't significantly match anything else on blastp of nr or on tblastx of the microbial database. They gave a questionable further match in Genedoc to T4 RNA ligase with a gap of 2 added to the ligase.
2nd iteration brings out weak match to a DNA ligase.
>gi|9634001|ref|NP_052075.1| DNA ligase [Bacteriophage phiYeO3-12]
gi|6598992|emb|CAB63596.1| (AJ251805) DNA ligase [Bacteriophage phiYeO3-12]
Length = 346
Score = 27.2 bits (59), Expect = 0.18
Identities = 11/41 (26%), Positives = 20/41 (47%), Gaps = 1/41 (2%)
Query: 44 ELEDITLEEILEEREKRENFEGWVVRFSNGLYMK-IKCKAY 83
E+
D+ L E ++ E EG +V+ G+Y +
K +
Sbjct: 203 EVYDMDSLSELYEAKRAEGHEGLIVKDPQGIYKRGKKSGWW 243
628 bp fragment with two open frames:
G12a:
The latter part of the insert appears to be a 293 bp long unidentified
open reading frame.
Nothing showed in microbial genomes, nr, or prodomain. Longer segments of
other frames were also examined. There was a poor quality
match to T4 NDP reductase in "tailed phages" [orgn].
G12b:
1-157 is the N terminal of some gene labelled as similar to pyrazinamidase/nicotinamidase
in the B. subtilis genome (frame -1).
>G12
TAGCGATTACAACTGTCCCACCATTACCTAAAAATTCATT
AGCTTGATCAATAATGTATGGAACAATTTCTTGTGCTGGTTTACCAGCTGTTAAACTTCC
GTTATCTGCCACAAAATCATTACTCATATCAACAATAATTAATGCTTCATTCTTCATTTT
AAAATCCCCTTTTTATTTTTTTATTTTTACCACCCAAAATTAGGTGTTGAATAAACTTGT
TGTTGCATCCTAATTACTTTACGTAAATGCTGTTTCTGGAGGACAAAGATCAGTTTCAGT
TACTCTAATTAAATATCTCCATTCACTCTTTGTAGCAGTAATACCAGTGTGGATGTCCCA
CCAATCGAATTCTACAACTTGGCAAGAAGGAGCTGTTAGAATATTTAACTCGCTTTCTCT
GTTACTTACTTCGTTAGAAATAATTTTGAAAATATCTGGTGTATAGTTTGGAACTTCTAA
ATCTAAAGGCTGATTTACAAATTTTGTTAAGCAACCTTCACCAGTTACTAATAAATGAAA
TCTTGATGGTCTTAATTCTTCTTGGATGTATAAATTTGGCGGAGCAGATTGTAATGGATT
TAAGTCTTCCCCACGTGGAGTACCATCGACGTGCCAACCAGGGATTGC
G12a:
>G12a Frame -1
AIPGWHVDGTPRGEDLNPLQSAPPNLYIQEELRPSRFHLLVTGEGCLTKFVNQPLDLEVPNYTPDIFKII
SNEVSNRESELNILTAPSCQVVEFDWWDIHTGITATKSEWRYLIRVTETDLCPPETAFT
>G12b Frame -1 continued.
MKNEALIIVDMSNDFVADNGSLTAGKPAQEIVPYIIDQANEFLGNGGTVVIA
Frame -1 now seems like the end of one gene; a promoter; and the beginning of another.
628 gcaatccctggttggcacgtcgatggtactccacgtggggaagacttaaatccattacaa
A I
P G W H V D G T P
R G E D L N P L Q
568 tctgctccgccaaatttatacatccaagaagaattaagaccatcaagatttcatttatta
S A
P P N L Y I Q E E
L R P S R F H L L
508 gtaactggtgaaggttgcttaacaaaatttgtaaatcagcctttagatttagaagttcca
V T
G E G C L T K F V
N Q P L D L E V P
448 aactatacaccagatattttcaaaattatttctaacgaagtaagtaacagagaaagcgag
N Y
T P D I F K I I S
N E V S N R E S E
388 ttaaatattctaacagctccttcttgccaagttgtagaattcgattggtgggacatccac
L N
I L T A P S C Q V
V E F D W W D I H
328 actggtattactgctacaaagagtgaatggagatatttaattagagtaactgaaactgat
T G
I T A T K S E W R
Y L I R V T E T D
268 ctttgtcctccagaaacagcatttacgtaaagtaattaggatgcaacaacaagtttattc
L C
P P E T A F T * S
N * D A T T S L F
208 aacacctaattttgggtggtaaaaataaaaaaataaaaaggggattttaaaatgaagaat
N T
* F W V V K I K K
* K G D F K M K N
148 gaagcattaattattgttgatatgagtaatgattttgtggcagataacggaagtttaaca
E A
L I I V D M S N D
F V A D N G S L T
88 gctggtaaaccagcacaagaaattgttccatacattattgatcaagctaatgaattttta
A G
K P A Q E I V P Y
I I D Q A N E F L
28 ggtaatggtgggacagttgtaatcgcta
1
G N
G G T V V I A
G12a: best match in "tailed phages" [orgn] is
>gi|9632790|ref|NP_049845.1| NDP reductase [Enterobacteria phage T4]
gi|417656|sp|P32282|RIR1_BPT4 RIBONUCLEOSIDE-DIPHOSPHATE REDUCTASE ALPHA CHAIN (RIBONUCLEOTIDE REDUCTASE) (B1 PROTEIN)
gi|509024|gb|AAA32527.1| (J03968) ribonucleoside diphosphate reductase [Enterobacteria phage T4]
gi|5354414|gb|AAD42621.1|AF158101_208 (AF158101) NDP reductase [Enterobacteria phage T4]
Length = 754
Score = 25.4 bits (54), Expect = 0.84
Identities = 24/101 (23%), Positives = 40/101 (38%), Gaps = 15/101 (14%)
Query: 12 RGEDLNPLQSAPPNLYIQEELRPSRFHLLVTGEGCLTKFVNQPLDLEVPNYTPDI-----
66
R
+L PN+ +
F LL+T + Q +D
NYTP I
Sbjct: 372 RFRELYEAAEKDPNIRKKRIKARELFELLMTERSGTARIYVQFID-NTNNYTPFIREKAP
430
Query: 67 ---------FKIISNEVSNRESELNILTAPSCQVVEFDWWD 98
I +N+V++ ++E+ + T + + FDW D
Sbjct: 431 IRQSNLCCEIAIPTNDVNSPDAEIGLCTLSAFVLDNFDWQD 471
G12b matches:
emb|CAB15164| (Z99120) similar to pyrazinamidase/nicotinamidase [Bacillus subtilis]
Length = 183
Score = 49.2 bits (115), Expect = 3e-05
Identities = 23/49 (46%), Positives = 33/49 (66%)
Query: 161 EALIIVDMSNDFVADNGSLTAGKPAQEIVPYIIDQANEFLGNGGTVVIA 209
           +ALI +D +NDFVA +G LT G+P + I   I++   EF+ NG  VV+A
Sbjct: 3   KALICIDYTNDFVASDGKLTCGEPGRMIEEAIVNLTKEFITNGDYVVLA 51
TBLASTX found homologues in several bacteria in the unfinished microbial
genomes section.
Summary: matches some unidentified frames in S. aureus and mycoplasma,
and an Ent. faecalis protein indicated to be a cell wall enzyme,
possibly a hydrolase. It more broadly belongs to the
RNAse H domain family.
>G14
AACTACTTCCCATGATTTTAATGATTGTAATAAATTCTTT
GCCTCATCATAATAAGGAATTATTGAATCTTTATGCGCAGAATACTTATTATTCATCTGC
TTTTCTATTAAATTACTATCAGTATAAAAAACAACATGTTCTACATTTAGATCTATAGCG
CACTTTAATGCTTTAATAAATGAACAATACTCTGCTTGATTATTATCTATAACCCCAAGA
TAAAAAGATTCTTGAAGTAATACGCTGTCATTACTTTTAATAACCAT
One completely open frame:
>G14 Frame -1
MVIKSNDSVLLQESFYLGVIDNNQAEYCSFIKALKCAIDLNVEHVVFYTDSNLIEKQMNNKYSAHKDSII
PYYDEAKNLLQSLKSWEVV
ORFfinder:
seq4\frame-1
267 atggttattaaaagtaatgacagcgtattacttcaagaatctttttatcttggggttata
M V
I K S N D S V L L
Q E S F Y L G V I
207 gataataatcaagcagagtattgttcatttattaaagcattaaagtgcgctatagatcta
D N
N Q A E Y C S F I
K A L K C A I D L
147 aatgtagaacatgttgttttttatactgatagtaatttaatagaaaagcagatgaataat
N V
E H V V F Y T D S
N L I E K Q M N N
87 aagtattctgcgcataaagattcaataattccttattatgatgaggcaaagaatttatta
K Y
S A H K D S I I P
Y Y D E A K N L L
27 caatcattaaaatcatgggaagtagtt 1
Q S
L K S W E V V
CD search identifies RNAseH homology
gnl|Pfam|pfam00075 rnaseH, RNase H 37.0 5e-04
Query: 12 QESFYLGVIDNNQAEYCSFIKALKCAIDLNVEHVVFYTDSNLIEKQMNN
60
Sbjct: 30 TFSKPLGATTNQRAELIALIEALEALAP---QPVNIYTDSQYVIKGITN
75
10 20
30 40
50 60
....*....|....*....|....*....|....*....|....*....|....*....|
consensus 1 PNA-VTVYTDGSCSGN---PGTG--GAGY-VL-WGG--------R--TFS-KPLG--ATT
39
query 1 ------------------------------------mviksndsvllQES-FYLG--VID
21
1RIL
6 RKR-VALFTDGACLGN---PGPG--GWAA-LLrFHA--------H--EKL-LSGGeaCTT 47
gi 400825 603 EFA-MVFYTDGSAIKHpdvNKSH--SAGMgIA-QVQfipeykivH--QWS-IPLG--DHT
653
gi 2352039 591 YEA-I-FYTDGSAIRSpkpNKTH--SAGMgII-QAKfepdfrivH--LWS-FPLG--DHT
640
gi 3123539 587 NFQ-HIFYTDGSAITSp--TKEGhlNAGMgIV-Yfinkd-gnlqKqqEWS-ISLG--NHT
638
gi 1350805 185 NKS-MNVYCDGSSFGNg--TSSS--RAGY-GA-YFEg------aPeeNIS-EPLLsgAQT
230
gi 687799 131 PKGtLVMYTDGSYLKR---PPTS--GIGI-FVgP-G--------HelNRS-QRIRgpIQD
174
gi 1730902 2 PTE-I--YVDGASAGN---PGPS--GIGI-FIkHEG--------KaeSFS-IPIG--VHT
41
gi 544225 1 ---mlrIYVDAATKGN---PGES--GGGIvYLtDQS--------R--QQLhVPLG--IVS
40
70 80
90 100
110 120
....*....|....*....|....*....|....*....|....*....|....*....|
consensus 40 NQRAELIALIEALE-AL-----AP-------QPVN--IYTDSQYVIKGITN--L------
76
query 22 NNQAEYCSFIKALK-CA-----IDln----vEHVV--FYTDSNLIEKQMNNkysahkdsi
69
1RIL 48 NNRMELKAAIEGLK-ALk----EP-------CEVD--LYTDSHYLKKAFTEgwLegw---
90
gi 400825 654 AQLAEIAAVEFACKkALk----IS-------GPVL--IVTDSFYVAESANKe-Lpywksn
699
gi 2352039 641 AQYAEIAAFEFAIRrATg----IR-------GPVL--IVTDSNYVAKSYNEe-Lpywesn
686
gi 3123539 639 AQFAEIAAFEFALKkCLp----LG-------GNIL--VVTDSNYVAKAYNEe-Ldvwasn
684
gi 1350805 231 NNRAEIEAVSEALK-K------IWekltnekEKVNyqIKTDSEYVTK-LLNd-Rymtyd-
280
gi 687799 175 NNYAEFIAVRTALQnALknenyRD-------QKVV--IRTDCLNVIE-ALQg--------
216
gi 1730902 42 NQEAEFLALIEGMKlCAt----RGy------QSVS--FRTDSDIV-ERATEl-E------
81
gi 544225 41 NHEAEFKVLIEALKqA------IAned--nqQTVL--LHSDSKIVVQ-TIEkn-------
82
130 140
150 160
170 180
....*....|....*....|....*....|....*....|....*....|....*....|
consensus 77 ------GWptKSSSKPIKN------DIWQLL---LK-KHK-VYIGWVPGH-SGIp-----
113
query 70 ipyydeaknllqslkswevv----------------------------------------
89
1RIL 91 ---rkrGWr-TAEGKPVKNr-----DLWEALllaMA-PHR-VRFHFVKGH-TGH------
132
gi 400825 700 gflnnkKKplRHVSK-WKS-------IAECL---QL-KPD-IIIMHEKGH-QQPmttlht
745
gi 2352039 687 gfvnnkKKtlKHISK-WKA-------IAECK---NL-KAD-IHVIHEPGHqPAEasp-ha
732
gi 3123539 685 gfvnnrKKplKHISK-WKS-------VADLK---RL-RPD-VVVTHEPGHqKLDssp-ha
730
gi 1350805 281 -nkkleGLpnSDLIVPLVQrfvkvkKYYELNkecFKnNGK-FQIEWVKGH-DGD------
331
gi 687799 217 ------TRptAFVD--VKS--------QVEFl--SKqFPKgVHFQHVYAH-AGD------
251
gi 1730902 82 ------MVknITFQ-PFVE------EIIRLKaafPL-----FFIKWIPGK----------
113
gi 544225 83 -----yAKn-EKYQ-PYLA-------EYQQL---EK-NFPlLLIKWLP-----Es-----
114
190
....*....|....*
consensus 114 PGNELADELAKQGAS 128
query
---------------
1RIL 133 PENERVDREARRQAQ
147
gi 400825 746 eGNNLADKLATQGSY 760
gi 2352039 733 qGNALADKQAVSGSY 747
gi 3123539 731 yGNNLADQLATQASF 745
gi 1350805 332 PGNEMADFLAKKGAS 346
gi 687799 252 PGNEMADLFAGQASS 266
gi 1730902 114 -QNQKADLLAKEAIR 127
gi 544225 115 -QNKAADMLARQALQ 128
Psi-blast finds the strongest match to a rice (Oryza sativa) protein annotated as similar to an Arabidopsis retroelement pol polyprotein, although a Staph aureus protein is not far behind.
>gi|9927273|dbj|BAA96774.2| (AP002521) Similar to Arabidopsis thaliana chromosome II BAC F26H6; putative retroelement pol polyprotein (AC006920) [Oryza sativa]
gi|9927274|dbj|BAB08213.2| (AP002539) Similar to Arabidopsis thaliana chromosome II BAC F26H6; putative retroelement pol polyprotein (AC006920) [Oryza sativa]
Length = 2876
Score = 53.1 bits (126), Expect = 4e-07
Identities = 30/91 (32%), Positives = 48/91 (51%), Gaps = 2/91 (2%)
Query: 1 MVIKSNDSVLLQESFYL--GVIDNNQAEYCSFIKALKCAIDLNVEHVVFYTDSNLIEKQM
58
+V K+ ++ SF L NN+AEY
+ I L A+ + V + + DS LI +Q+
Sbjct: 2308 LVFKTPQGGVIYHSFSLLKEECSNNEAEYEALIFGLLLALSMEVRSLRAHGDSRLIIRQI
2367
Query: 59 NNKYSAHKDSIIPYYDEAKNLLQSLKSWEVV 89
NN Y K ++PYY A+ L+ + EV+
Sbjct: 2368 NNIYEVRKPELVPYYTVARRLMDKFEHIEVI 2398
>gi|13701231|dbj|BAB42526.1| (AP003133) ORFID:SA1266~hypothetical protein, similar to cell wall enzyme EbsB [Staphylococcus aureus subsp. aureus N315]
gi|14247204|dbj|BAB57595.1| (AP003362) hypothetical protein [Staphylococcus aureus subsp. aureus Mu50]
Length = 133
Score = 40.4 bits (93), Expect = 0.003
Identities = 23/59 (38%), Positives = 37/59 (61%), Gaps = 1/59 (1%)
Query: 17 LGVIDNNQAEYCSFIKALKCAIDLNVEHVVFYTDSNLIEKQMNNKYSAHKDSIIPYYDE
75
LG +DN+
AE+ + I AL+ A +LNV++ + YTDS LI + Y + +
PY+D+
Sbjct: 36 LGEMDNHTAEWAACIYALEHARELNVQNALLYTDSKLIADSIEAGYVKNAN-FKPYFDQ
93
This domain is dominated by retroviral RNAse H's and cellular (including E. coli) RNAse H's. I had to slog through a lot of retroviral hits to get to an established bacterial RNAse H, but the similarity wasn't that much less (31%). I followed the "cell wall" enzyme angle on the Staph aureus gene, but it didn't lead anywhere.
Didn't get a hit in "tailed phages" [orgn].
G17: contains forward and reverse readings that overlap.
1/6/2000
Unidentified; frame is open
>G17 Sch proofs
AATTTGGTGCAACAAATAACTGATATTATGGCTTCTGGAG
GACAGATGCAAATAGAATACAACGGAGAGTGGAAAACTATAGAACCATATGGTTGGAACT
CTTCCAATGCTGGAAACGTATTATTAATGTGTTATAAAGATACTGGTGAAGTTAGAAGTT
ATAGATTAGATAGAATGTCAAATGTACAATTTGATTCAAGTACTATTGATTTATCTCAAT
ATGGTTTAGAATCTGAAGATGTAGAGAATTTAGATAGTGTAGACGATGATAGTAATATTG
AAATCCCTACAATAGAAGATGATGGAAGTCAATCATTTATTGATGAACAACAAGAAATTG
AAACTCCATTTGATGATGCAATTGATGTTTTAGAGCAAATTGACGATAACTATTTAATTG
AAGATTTAAGACAAGTAGATAATACAGAATCTTTCGAACCAGTAAATGAAGAAGATTATG
ACCCTACTAATGA
ORFfinder:
>lcl|Sequence 1 Frame +1
NLVQQITDIMASGGQMQIEYNGEWKTIEPYGWNSSNAGNVLLMCYKDTGEVRSYRLDRMSNVQFDSSTID
LSQYGLESEDVENLDSVDDDSNIEIPTIEDDGSQSFIDEQQEIETPFDDAIDVLEQIDDNYLIEDLRQVD
NTESFEPVNEEDYDPTN
Psi-blast finds this weak match in nr:
>gi|10835405|pdb|1FEZ|A Chain A, The Crystal Structure Of Bacillus Cereus Phosphonoacetaldehyde Hydrolase Complexed With Tungstate, A Product Analog
Length = 256
Score = 36.2 bits (82), Expect = 0.085
Identities = 29/104 (27%), Positives = 45/104 (42%), Gaps = 27/104 (25%)
Query: 29 PYGWNSSNAGNVLLMCYKDTGEVRSYRLDRMSNVQFDSS---------------TIDLSQ
73
PY
W MCYK+ E+
Y ++ M V S
+ S+
Sbjct: 157 PYPW----------MCYKNAMELGVYPMNHMIKVGDTVSDMKEGRNAGMWTVGVILGSSE
206
Query: 74 YGLESEDVENLDSVDDDSNIEI--PTIEDDGSQSFIDEQQEIET 115
GL E+VEN+DSV+ IE+
++G+ I+ QE+E+
Sbjct: 207 LGLTEEEVENMDSVELREKIEVVRNRFVENGAHFTIETMQELES 250
Prodomain gave a match to HIV tat, but since it's almost all acidic
amino acid matches, I don't believe it.
The following came up, but also has lots of acidic matches:
emb|CAA76602| (Y17045) NAD(P)H-dependent glutamate synthase [Plasmodium
falciparum]
Length = 3097
This gene in turn is not grouped with others in Prodomain.
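The suspicion that these hits are driven by acidic residues rather than real homology can be put on a rough numerical footing by measuring composition. The sketch below simply reports the fraction of D and E in a sequence; the function name is illustrative and the 0.3 cutoff in the example is an arbitrary assumption, not an established threshold.

```python
def residue_fraction(protein, residues="DE"):
    """Fraction of the sequence made up of the given residue types."""
    protein = protein.upper()
    if not protein:
        return 0.0
    hits = sum(protein.count(r) for r in residues)
    return hits / len(protein)


# Second line of the G17 frame +1 translation shown above.
g17_segment = ("LSQYGLESEDVENLDSVDDDSNIEIPTIEDDGSQSFIDEQQEIETPFDDAIDVLEQIDDNYLIEDLRQVD")
fraction = residue_fraction(g17_segment)
print(round(fraction, 2), "acid-rich" if fraction > 0.3 else "unremarkable")
```

The same function can be pointed at other biased stretches by changing `residues`, for example "PG" for a proline- and glycine-rich frame.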
>G18
TTAATGACACGAGATAATGGTGAAGGTGGTATGGTTGATA
TTTATGTTAGGGCAGAAGATGCTGAAGAATATAAAGAAACATTCTACGTTAGTGATGAAT
ATACAACAGGAACAATACATGTAAAACCTTATGATAATATTATCCCAAAGAAACAGCCTA
TTATAATGATAGATCGGATAATAGGCAGAATTCCAAATAGTTCTTCTGAGACTGGATATG
ATGAAAGAGTATATATTAATGGTTCTAACTACAAAAAAGAAAAAGGTTCAAATAAGTATC
ATCGAGATATTCTGTGGAATTTTGCAGATGTTCCAGTTGAAGGTTTAACTGATGATGACT
TATTAGAAGCAACTGCTATTAATGTTTTAAATGTCTTACTTAAAAATATGACTTATTTAA
AAGATATTAAGTATGATATTGATTGGCAAATGA
>G18 Frame +1
LMTRDNGEGGMVDIYVRAEDAEEYKETFYVSDEYTTGTIHVKPYDNIIPKKQPIIMIDRIIGRIPNSSSE
TGYDERVYINGSNYKKEKGSNKYHRDILWNFADVPVEGLTDDDLLEATAINVLNVLLKNMTYLKDIKYDI
DWQM
Open frame 1 gave no hits in microbial genomes, nr, or prodomain.
weak seq. in middle of contig.
>G25
AATGAACTTGGATCAGCTTTACCCGGCGCATCATTTCCTT
TTGGCCCACCTGGCGTTGGTGGCCCGCCCGGTTTATTTCCATTATTAGGGCCAGGGCTGT
CAGGCGTTGTTGGAGGTTTTGGTGCTCCACCACTAGGTCCTCCAGGTGCGCCTCCACCGC
TTGGTGCTTTAGGAGCATTAGTATCAAATACAGAACCACGTTCTTGCTCAAGATTTTTAC
GCTCTGTTTCTGGGTCAAGTTTAAGCATTGGTAATATCGTGCTCATAGAAACAAGTCCCT
TAGATTGTAATTGTTGAATAAAGTTAAGAACAGATTGATTTGAAGTTAAATCTTGTTGA
CTCCAAGAAATCTCAGGACAATTAATTGCATTTCTTTTCTATCAGCAAGTTTTCTTTGTT
CTTTTGAACTTAAGTATCTTCTTGCGACAGTGCCATTTGTAGGTTTATAAAAACCTTGT
ATTTCAGAAATTGGTCTATAAACTTTTTGTCTAATCCAAGACTCTAATCTTAAACGATAT
GACATGTATCTTCTAGCTAATGCATCCATACCAACTTGAGCATTTGAATATGTTGGACCT
TCACCGTTTAACATAGCTTGGTTAATGCCTAAACCGTTCATAAGCTCTTTTTGAATGAAA
TCAAATTCTTGGTTTAATGGTAAAATTCTACCAGCAGAACCTACATAGTCAGTTTGTAAT
GCATAGTGATATACAAGCATGAAGTT
Prodomain: nothing found.
>lcl|Sequence 1 Frame +1
NELGSALPGASFPFGPPGVGGPPGLFPLLGPGLSGVVGGFGAPPLGPPGAPPPLGALGALVSNTEPRSCS
RFLRSVSGSSLSIGNIVLIETSPLDCNC*IKLRTD*FEVKSC*LQEISGQLIAFLFYQQVFFVLLNLSIF
LRQCHL*VYKNLVFQKLVYKLFV*SKTLILNDMTCIF*LMHPYQLEHLNMLDLHRLT*LG*CLNRS*ALF
E*NQILGLMVKFYQQNLHSQFVMHSDIQA*S
>lcl|Sequence 2 Frame +2
MNLDQLYPAHHFLLAHLALVARPVYFHY*GQGCQALLEVLVLHH*VLQVRLHRLVL*EH*YQIQNHVLAQ
DFYALFLGQV*ALVISCS*KQVP*IVIVE*S*EQIDLKLNLVDSKKSQDN*LHFFSISKFSLFF*T*VSS
CDSAICRFIKTLYFRNWSINFLSNPRL*S*TI*HVSSS*CIHTNLSI*ICWTFTV*HSLVNA*TVHKLFL
NEIKFLV*W*NSTSRTYIVSL*CIVIYKHEV
>lcl|Sequence 3 Frame +3
*TWISFTRRIISFWPTWRWWPARFISIIRARAVRRCWRFWCSTTRSSRCASTAWCFRSISIKYRTTFLLK
IFTLCFWVKFKHW*YRAHRNKSLRL*LLNKVKNRLI*S*ILLTPRNLRTINCISFLSASFLCSFELKYLL
ATVPFVGL*KPCISEIGL*TFCLIQDSNLKRYDMYLLANASIPT*AFEYVGPSPFNIAWLMPKPFISSF*
MKSNSWFNGKILPAEPT*SVCNA**YTSMKX
>lcl|Sequence 4 Frame -1
NFMLVYHYALQTDYVGSAGRILPLNQEFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSY
RLRLESWIRQKVYRPISEIQGFYKPTNGTVARRYLSSKEQRKLADRKEMQLIVLRFLGVNKI*LQINLFL
TLFNNYNLRDLFL*ARYYQCLNLTQKQSVKILSKNVVLYLILMLLKHQAVEAHLEDLVVEHQNLQQRLTA
LALIMEINRAGHQRQVGQKEMMRRVKLIQVH
>lcl|Sequence 5 Frame -2
TSCLYITMHYKLTM*VLLVEFYH*TKNLISFKKSL*TV*ALTKLC*TVKVQHIQMLKLVWMH*LEDTCHI
V*D*SLGLDKKFIDQFLKYKVFINLQMALSQEDT*VQKNKENLLIEKKCN*LS*DFLESTRFNFKSICS*
LYSTITI*GTCFYEHDITNA*T*PRNRA*KS*ARTWFCI*Y*CS*STKRWRRTWRT*WWSTKTSNNA*QP
WP**WK*TGRATNARWAKRK*CAG*S*SKFI
>lcl|Sequence 6 Frame -3
LHACISLCITN*LCRFCW*NFTIKPRI*FHSKRAYERFRH*PSYVKR*RSNIFKCSSWYGCIS*KIHVIS
FKIRVLD*TKSL*TNF*NTRFL*TYKWHCRKKILKFKRTKKTC**KRNAINCPEISWSQQDLTSNQSVLN
FIQQLQSKGLVSMSTILPMLKLDPETERKNLEQERGSVFDTNAPKAPSGGGAPGGPSGGAPKPPTTPDSP
GPNNGNKPGGPPTPGGPKGNDAPGKADPSSX
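The six translations above can be reproduced with a short script. This is a minimal sketch, not the tool that produced the listing: it assumes the standard genetic code, the function names are my own, and it simply drops a trailing partial codon instead of printing X for it as the listing above does.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame number (+1..+3, -1..-3) to its translation."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


# First line of the G25 contig, as a quick check against the listing.
start_of_g25 = "AATGAACTTGGATCAGCTTTACCCGGCGCATCATTTCCTT"
for frame, protein in sorted(six_frames(start_of_g25).items()):
    print(frame, protein)
```

Frame -1 here starts at the last base of the contig, which is the convention the listing above appears to follow.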
Several long open frames are possible. Frames -1 and -3 could be fused by
frameshifting.
None of them hit anything in prodomain or prodomain CG.
Frame -1 gave the following weak hit by blastp against nr.
>gi|9630180|ref|NP_046607.1| unknown [Bacteriophage SPBc2]
gi|7519724|pir||T12819 hypothetical protein yonE - Bacillus
subtilis phage SPBc2
gi|2634532|emb|CAB14030.1| (Z99115) yonE [Bacillus subtilis]
gi|3025533|gb|AAC13028.1| (AF020713) unknown [Bacteriophage
SPBc2]
Length = 506
Score = 36.6 bits (83), Expect = 0.20
Identities = 16/70 (22%), Positives = 37/70 (52%), Gaps = 3/70 (4%)
Frame = -1
Query: 646 EFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSYRLRLESWIRQKVYR-- 473
           +FD I  ++ +  G++ ++LNG+G  Y+ + + +D   +R       +E  + QK++
Sbjct: 337 KFDHINSDIQSAYGLSGSLLNGDGGNYATSSLNLDTFYKRIGVLMEDIEQEVYQKLFNLV 396
Query: 472 -PISEIQGFY 446
            P ++   +Y
Sbjct: 397 LPAAQKDNYY 406
Frame +1 has lots of prolines and glycines and pulls in all kinds of
proline- and glycine-rich hits.
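To put a number on the proline/glycine bias, the fraction of P and G residues can be counted directly. A minimal sketch; the function name is my own, and the input is just the first line of the frame +1 translation above.

```python
from collections import Counter


def residue_fraction(protein, residues="PG"):
    """Fraction of the given residues in a protein string.

    Stop symbols (*) and any other non-letter characters are ignored.
    """
    seq = [aa for aa in protein.upper() if aa.isalpha()]
    if not seq:
        return 0.0
    counts = Counter(seq)
    return sum(counts[r] for r in residues) / len(seq)


# First line of the G25 frame +1 translation.
first_line = (
    "NELGSALPGASFPFGPPGVGGPPGLFPLLGPGLSGVVGGFGAPPLGPPGAPPPLGALGALVSNTEPRSCS"
)
print(f"P+G fraction: {residue_fraction(first_line):.2f}")
```

A fraction this far above what is typical for proteins is the kind of bias that makes BLAST hits compositional rather than homologous, which fits the proline- and glycine-rich hits noted above.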
The first part matches AAC13028.1 (unknown) in phage/full at 0.002; the match is visibly the same motif as yonE above. PSI-BLAST in phage/full started to bring in portal proteins at < 0.05, e.g. AAD41031.1.
The second part matched CAB16752.1 gp40 at 0.006. It was compositionally compromised, but a shuffle 1,100 gave 98% confidence.
GeneMark (heuristic) favored these two frames:
Predicted genes
Gene  Strand  LeftEnd  RightEnd  Gene    Class
#                                Length
1     -       <3       248       246     1
2     -       326      >724      399     1
>G25a Frame -1
NFMLVYHYALQTDYVGSAGRILPLNQEFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSY
RLRLESWIRQKVYRPISEIQGFYKPTNGTVARRYLSSKEQRKLADRKEMQLIVLRFLGVNKI
>G25b Frame -3
KRNAINCPEISWSQQDLTSNQSVLN
FIQQLQSKGLVSMSTILPMLKLDPETERKNLEQERGSVFDTNAPKAPSGGGAPGGPSGGAPKPPTTPDSP
GPNNGNKPGGPPTPGGPKGNDAPGKADPSS
Note: these two overlap, and might be the same frame disrupted by sequencing error in the middle region of the contig.
G25a is the one that finds YonE, here in bacteria only:
>gi|9630180|ref|NP_046607.1| unknown [Bacteriophage SPBc2]
gi|7519724|pir||T12819 hypothetical protein yonE - Bacillus
subtilis phage SPBc2
gi|2634532|emb|CAB14030.1| (Z99115) yonE [Bacillus subtilis]
gi|3025533|gb|AAC13028.1| (AF020713) unknown [Bacteriophage
SPBc2]
Length = 506
Score = 34.7 bits (78), Expect = 0.030
Identities = 16/70 (22%), Positives = 37/70 (52%), Gaps = 3/70 (4%)
Query: 27  EFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSYRLRLESWIRQKVYR-- 84
           +FD I  ++ +  G++ ++LNG+G  Y+ + + +D   +R       +E  + QK++
Sbjct: 337 KFDHINSDIQSAYGLSGSLLNGDGGNYATSSLNLDTFYKRIGVLMEDIEQEVYQKLFNLV 396
Query: 85  -PISEIQGFY 93
            P ++   +Y
Sbjct: 397 LPAAQKDNYY 406
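The protein coordinates in this alignment (27 to 93) and the nucleotide coordinates in the earlier frame -1 hit (646 to 446) describe the same stretch of the contig. A small sketch of the conversion, assuming a contig length of 724 nt (the G25 sequence lines add up to 724, and GeneMark reports RightEnd >724) and assuming frame -1 starts at the last base; the function name is my own.

```python
def minus_frame_codon(aa_pos, contig_len, frame=-1):
    """Return (high, low) 1-based contig coordinates of the codon that
    encodes residue aa_pos of a minus-strand frame (-1, -2 or -3)."""
    offset = abs(frame) - 1          # frame -1 starts at the last base
    high = contig_len - offset - 3 * (aa_pos - 1)
    return high, high - 2


CONTIG_LEN = 724  # assumed length of the G25 contig

for aa in (27, 84, 85, 93):
    print(aa, minus_frame_codon(aa, CONTIG_LEN))
```

Residue 27 maps to 646..644 and residue 84 to 475..473, matching the first alignment row (646 to 473); residues 85 and 93 map to 472..470 and 448..446, matching the second row (472 to 446).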
G25b, by contrast, finds a YonE homologue at e=2.7; I think it really is the end of G25a.
>gi|15024052|gb|AAK79106.1|AE007629_12 (AE007629) Phage related
protein, YonE B.subtilis homolog [Clostridium acetobutylicum]
Length = 505
Score = 30.4 bits (67), Expect = 0.66
Identities = 18/58 (31%), Positives = 27/58 (46%)
Query: 8   PEISWSQQDLTSNQSVLNFIQQLQSKGLVSMSTILPMLKLDPETERKNLEQERGSVFD 65
           P   +   +L S +   + + QL + GL+S  T L  LK D + E+K  E E     D
Sbjct: 405 PSFKFESLNLQSEKDFRSEVMQLYTFGLLSRETTLSELKFDFKQEKKRRESENSENLD 462
So I arbitrarily fused them to make:
>G25 fused
NFMLVYHYALQTDYVGSAGRILPLNQEFDFIQKELMNGLGINQAMLNGEGPTYSNAQVGMDALARRYMSY
RLRLESWIRQKVYRPISEIQGFYKPTNGTVARRYLSSKEQRKLADRKEMQL
PEISWSQQDLTSNQSVLN
FIQQLQSKGLVSMSTILPMLKLDPETERKNLEQERGSVFDTNAPKAPSGGGAPGGPSGGAPKPPTTPDSP
GPNNGNKPGGPPTPGGPKGNDAPGKADPSS
That allowed this match:
>gi|15024052|gb|AAK79106.1|AE007629_12 (AE007629) Phage related
protein, YonE B.subtilis homolog [Clostridium acetobutylicum]
Length = 505
Score = 34.7 bits (78), Expect = 0.18
Identities = 31/155 (20%), Positives = 65/155 (41%), Gaps = 27/155 (17%)
Query: 27  EFDFIQKELMNGLGINQAMLNGEG--PTYSNAQVGMDALARRYMSYRLRLESWIRQKVYR 84
           ++  + + +++ +GI+  ++ G G   +++ A + +  LA+R    + ++  +I   + R
Sbjct: 333 KYQTVNESILSSIGISAIVVTGNGGSGSFAQASINLSTLAKRIKDGQNKIAKFINLLLKR 392
Query: 85  PISEIQGFYKPTNGTVARRYLSSKEQRKLADRKEMQLPEISWSQQDLTSNQSVLNFIQQL 144
             S   G    ++G                      +P   +   +L S +   + + QL
Sbjct: 393 KFS---GRSSTSDG----------------------IPSFKFESLNLQSEKDFRSEVMQL 427
Query: 145 QSKGLVSMSTILPMLKLDPETERKNLEQERGSVFD 179
            + GL+S  T L  LK D + E+K  E E     D
Sbjct: 428 YTFGLLSRETTLSELKFDFKQEKKRRESENSENLD 462
However, I wasn't able to coax it any further.
Sequence is weak in the middle.
One matches a small segment in an unidentified Staph gene but doesn't
seem extendable. The other is unidentified.
>G26
CTATACTCCCTCCTAAATATCAAAAAAGTAGTACCTTCTT
CGTCGTTTTTTCTTTCTTCTTCAATAGCAACCTTACTTCTTAACTTAGCGAATGCTACCA
ATTGATCAAGTGACATATTGTATTTTTTAGCTACGCCTTCAATCGCACCAATCATGTCAG
CTAATTCAATTAAAAGCATTAAGTCTTGGCCTTGTTCTTCAGCATCATATGCTTCCTGTA
ATTCTTCTTCGATTTTTGATAATTCGCCATAAATACCTTTTTGAATTTTACGATTATGGA
ATCCAGACATGGAAACTCTCCTTTAAAAAATATTTAAAGAGAGCTGAAAATCAGCTCTTT
TAAAATACCTCTAGATTTTTTCAAATGCTACCTTTAAGATATGTTTACAAATAGCGCCAC
GGAATTGGTGATGAGGACAGTTACAATCAACAACCTTCTCAGCATCAACTTTTACTAAAT
AACCCTTGTCTTCTTCATGGTTAATAACAATGTATTCTAAACCGTCTTTACCAGAATCTA
AAATTCGGAAGCTTTCGGATTCAAATGTTTCAATTGATTTACGTAGAGCAGCTAATTTCG
TAGTACCTTTTTTTGGAGCAACTTCGCTACGGCCATCGTAAGGTTTTTTAATGTTTTCAG
TAACTTCTGCTTCATTTGAAGGAACAT
Picked up following by TBLASTX in microbial genomes:
gnl|OUACGT_1280|s.aureus_Contig859 Staphylococcus aureus NCTC 8325
unfinished fragment of complete
genome
Length = 2663
Score = 37.3 bits (75), Expect = 0.11
Identities = 12/21 (57%), Positives = 14/21 (66%)
Frame = -2 / +2
Query: 425  CNCPHHQFRGAICKHMLKVAF 363
            CNCPH   R  ICKHM+ + F
Sbjct: 2252 CNCPHADGRRVICKHMIALLF 2314
gnl|TIGR_1280|S.aureus_4348 Staphylococcus aureus COL unfinished
fragment of complete genome
Length = 49859
Score = 37.3 bits (75), Expect = 0.11
Identities = 12/21 (57%), Positives = 14/21 (66%)
Frame = -2 / -3
Query: 425  CNCPHHQFRGAICKHMLKVAF 363
            CNCPH   R  ICKHM+ + F
Sbjct: 9594 CNCPHADGRRVICKHMIALLF 9532
Frame -3 is open except for two terminators in a region that is suspect.
665 gttccttcaaatgaagcagaagttactgaaaacattaaaaaaccttacgatggccgtagc
    V  P  S  N  E  A  E  V  T  E  N  I  K  K  P  Y  D  G  R  S
605 gaagttgctccaaaaaaaggtactacgaaattagctgctctacgtaaatcaattgaaaca
    E  V  A  P  K  K  G  T  T  K  L  A  A  L  R  K  S  I  E  T
545 tttgaatccgaaagcttccgaattttagattctggtaaagacggtttagaatacattgtt
    F  E  S  E  S  F  R  I  L  D  S  G  K  D  G  L  E  Y  I  V
485 attaaccatgaagaagacaagggttatttagtaaaagttgatgctgagaaggttgttgat
    I  N  H  E  E  D  K  G  Y  L  V  K  V  D  A  E  K  V  V  D
425 tgtaactgtcctcatcaccaattccgtggcgctatttgtaaacatatcttaaaggtagca
    C  N  C  P  H  H  Q  F  R  G  A  I  C  K  H  I  L  K  V  A
365 tttgaaaaaatctagaggtattttaaaagagctgattttcagctctctttaaatattttt
    F  E  K  I  *  R  Y  F  K  R  A  D  F  Q  L  S  L  N  I  F
305 taaaggagagtttccatgtctggattccataatcgtaaaattcaaaaaggtatttatggc
    *  R  R  V  S  M  S  G  F  H  N  R  K  I  Q  K  G  I  Y  G
245 gaattatcaaaaatcgaagaagaattacaggaagcatatgatgctgaagaacaaggccaa
    E  L  S  K  I  E  E  E  L  Q  E  A  Y  D  A  E  E  Q  G  Q
185 gacttaatgcttttaattgaattagctgacatgattggtgcgattgaaggcgtagctaaa
    D  L  M  L  L  I  E  L  A  D  M  I  G  A  I  E  G  V  A  K
125 aaatacaatatgtcacttgatcaattggtagcattcgctaagttaagaagtaaggttgct
    K  Y  N  M  S  L  D  Q  L  V  A  F  A  K  L  R  S  K  V  A
65 attgaagaagaaagaaaaaacgacgaagaaggtactacttttttgatatttaggagggag
   I  E  E  E  R  K  N  D  E  E  G  T  T  F  L  I  F  R  R  E
5 tatag 1
  Y
>lcl|Sequence 1 Frame +1
LYSLLNIKKVVPSSSFFLSSSIATLLLNLANATN*SSDILYFLATPSIAPIMSANSIKSIKSWPCSSASY
ASCNSSSIFDNSP*IPF*ILRLWNPDMETLL*KIFKES*KSALLKYL*IFSNATFKICLQIAPRNW**GQ
LQSTTFSASTFTK*PLSSSWLITMYSKPSLPESKIRKLSDSNVSIDLRRAANFVVPFFGATSLRPS*GFL
MFSVTSASFEGT
>lcl|Sequence 2 Frame +2
YTPS*ISKK*YLLRRFFFLLQ*QPYFLT*RMLPIDQVTYCIF*LRLQSHQSCQLIQLKALSLGLVLQHHM
LPVILLRFLIIRHKYLFEFYDYGIQTWKLSFKKYLKRAENQLF*NTSRFFQMLPLRYVYK*RHGIGDEDS
YNQQPSQHQLLLNNPCLLHG**QCILNRLYQNLKFGSFRIQMFQLIYVEQLIS*YLFLEQLRYGHRKVF*
CFQ*LLLHLKEH
>lcl|Sequence 3 Frame +3
ILPPKYQKSSTFFVVFSFFFNSNLTS*LSECYQLIK*HIVFFSYAFNRTNHVS*FN*KH*VLALFFSIIC
FL*FFFDF**FAINTFLNFTIMESRHGNSPLKNI*RELKISSFKIPLDFFKCYL*DMFTNSATELVMRTV
TINNLLSINFY*ITLVFFMVNNNVF*TVFTRI*NSEAFGFKCFN*FT*SS*FRSTFFWSNFATAIVRFFN
VFSNFCFI*RNX
>lcl|Sequence 4 Frame -1
MFLQMKQKLLKTLKNLTMAVAKLLQKKVLRN*LLYVNQLKHLNPKASEF*ILVKTV*NTLLLTMKKTRVI
**KLMLRRLLIVTVLITNSVALFVNIS*R*HLKKSRGILKELIFSSL*IFFKGEFPCLDSIIVKFKKVFM
ANYQKSKKNYRKHMMLKNKAKT*CF*LN*LT*LVRLKA*LKNTICHLINW*HSLS*EVRLLLKKKEKTTK
KVLLF*YLGGSI
>lcl|Sequence 5 Frame -2
CSFK*SRSY*KH*KTLRWP*RSCSKKRYYEISCST*IN*NI*IRKLPNFRFW*RRFRIHCY*P*RRQGLF
SKS*C*EGC*L*LSSSPIPWRYL*TYLKGSI*KNLEVF*KS*FSALFKYFLKESFHVWIP*S*NSKRYLW
RIIKNRRRITGSI*C*RTRPRLNAFN*IS*HDWCD*RRS*KIQYVT*SIGSIR*VKK*GCY*RRKKKRRR
RYYFFDI*EGV*
>lcl|Sequence 6 Frame -3
VPSNEAEVTENIKKPYDGRSEVAPKKGTTKLAALRKSIETFESESFRILDSGKDGLEYIVINHEEDKGYL
VKVDAEKVVDCNCPHHQFRGAICKHILKVAFEKI*RYFKRADFQLSLNIF*RRVSMSGFHNRKIQKGIYG
ELSKIEEELQEAYDAEEQGQDLMLLIELADMIGAIEGVAKKYNMSLDQLVAFAKLRSKVAIEEERKNDEE
GTTFLIFRREYX
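The frame -3 translation above is open except for two terminators, so the usable pieces fall out by splitting at the stop symbols and keeping the long stretches. A minimal sketch; the function name and the 50-residue cutoff are my own choices, and a trailing X (partial codon) is treated like any other residue.

```python
def open_segments(protein, min_len=50):
    """Split a translated frame at '*' and return the open stretches.

    Each result is (start, end, sequence) with 1-based, inclusive
    positions in the translated frame. Stretches shorter than
    min_len residues are skipped.
    """
    segments = []
    start = 0
    # Append a sentinel stop so the final stretch is also reported.
    for i, aa in enumerate(protein + "*"):
        if aa == "*":
            if i - start >= min_len:
                segments.append((start + 1, i, protein[start:i]))
            start = i + 1
    return segments


# Toy input: two long stretches separated by a short one.
for seg in open_segments("AAAA*BB*CCCCC", min_len=4):
    print(seg)
```

Run on the full frame -3 string, this should report one stretch before the first terminator and one after the second, with the short piece between the two stops dropped by the length cutoff.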
The Staph sequence is:
TGAGTTAAAACATGTGACTTTTGGTTATAATAAAAAGCAGATGGTGCTACAAGATATCAA
TATTACTATACCTGATGGAGAAAATGTTGGTATTTTAGGCGAAAGTGGCTGTGGTAAAAG
TACGCTCGCTTCATTGGTTCTTGGCTTGTTTAAACCTGTTAAAGGAGAGATTTACTTAAG
TGACAATGCTGTGTTACCGATTTTCCAACACCCTTTAACTAGCTTTAACCCTGATTGGAC
GATTGAGACCTCATTAAAAGAAGCGTTATATTATTACAGAGGTCTAACTGATAATACTGC
TCAGGATCAATTATTATTACAACATTTATCTACTTTTGAGTTAAACGCGCAATTATTGAC
TAAATTACCAAGCGAAGTGAGTGGCGGACAATTACAAAGATTTAATGTCATGCGTTCGTT
ATTAGCACAGCCTCGCGTTTTAATATGTGATGAGATAACTTCAAATTTAGATGTTATAGC
TGAACAAAATGTAATCAATATATTAAAAGCGCAAACGATTACGAACTTAAATCATTTTAT
CGTTATTTCTCATGATTTATCCGTGTTACAACGCTTAGTTAATAGAATTATCGTTCTTAA
GGATGGCATGATAGTCGATGATTTTGCAATAGAGGAATTATTTAATGTTGATAGACACCC
TTATACAAAAGAATTAGTGCAAGCATTTTCATATTAGTTATTTAAGAATGCGATAATTCT
AGACTTGTTATAAAATATAGATAAATCAAGTATTTTAATCTAGACACTTATCTATTTTAT
TTTCTTTATTTAAAAATAATAATAAAAAGGAGTATCATTAATGGGATTACTTGATATTGC
AAGTATTCGTTCTATAGAAAGAGGCTTTAATTATTATCAAAGTGAATGCGTCATTAACTT
AAAATCATTTTCAGAAACGCAGCATGAGGCTGAAGTAAAGGGCAGTGGCAACAAAGTATA
TCGTTGTTATATTGATATGGAACATCCTAGAAAATCCATATGTAATTGTCCTCATGCTGA
TGGAAGACGAGTGATATGTAAACATATGATTGCATTACTCTTTACAGCTAGTCCAGAAGC
AGCAAATAAACATATAATGATGTTAAACGAAGTTGAAGAAGACTATCAATTACGCAGAAA
TATGTGGATTGATAGTCTTAAAGAAATGATTAATGATATGAGTGAAGAAGAACTCCGCGA
TGCATATTTAAACATGTTAATTGAACATGGAGAAATGGCAGAATTATTTGGATTAGATGA
AGAGGAAGAAATGTTCGAGGACGAATTTTATTAAAATAGCCCCTATCGATTGATAATGAT
TATCATTTGATAGGGGTGTTTTTATTTATATGATTTTAAGACTTTGCAAATAGCTTGTGC
ATAATAATTGATGCGTTAGACTTTATCAACTG
>Unidentified staph gene
MGLLDIA
SIRSIERGFNYYQSECVINLKSFSETQHEAEVKGSGNKVYRCYIDMEHPRKSICNCPHADGRRVICKHMI
ALLFTASPEAANKHIMMLNEVEEDYQLRRNMWIDSLKEMINDMSEEELRDAYLNMLIEHGEMAELFGLDE
EEEMFEDEFY
821 atgggattacttgatattgcaagtattcgttctatagaaagaggc
    M  G  L  L  D  I  A  S  I  R  S  I  E  R  G
866 tttaattattatcaaagtgaatgcgtcattaacttaaaatcattt
    F  N  Y  Y  Q  S  E  C  V  I  N  L  K  S  F
911 tcagaaacgcagcatgaggctgaagtaaagggcagtggcaacaaa
    S  E  T  Q  H  E  A  E  V  K  G  S  G  N  K
956 gtatatcgttgttatattgatatggaacatcctagaaaatccata
    V  Y  R  C  Y  I  D  M  E  H  P  R  K  S  I
1001 tgtaattgtcctcatgctgatggaagacgagtgatatgtaaacat
     C  N  C  P  H  A  D  G  R  R  V  I  C  K  H
1046 atgattgcattactctttacagctagtccagaagcagcaaataaa
     M  I  A  L  L  F  T  A  S  P  E  A  A  N  K
1091 catataatgatgttaaacgaagttgaagaagactatcaattacgc
     H  I  M  M  L  N  E  V  E  E  D  Y  Q  L  R
1136 agaaatatgtggattgatagtcttaaagaaatgattaatgatatg
     R  N  M  W  I  D  S  L  K  E  M  I  N  D  M
1181 agtgaagaagaactccgcgatgcatatttaaacatgttaattgaa
     S  E  E  E  L  R  D  A  Y  L  N  M  L  I  E
1226 catggagaaatggcagaattatttggattagatgaagaggaagaa
     H  G  E  M  A  E  L  F  G  L  D  E  E  E  E
1271 atgttcgaggacgaattttattaa 1294
     M  F  E  D  E  F  Y  *
Which in turn matched:
emb|CAA67095.1| (X98455) SNF [Bacillus cereus]
Length = 1064
Score = 36.0 bits (81), Expect = 0.19
Identities = 21/78 (26%), Positives = 38/78 (47%), Gaps = 3/78 (3%)
Query: 10 DIASIRSIERGFNYYQSECVINLKSFSETQHEAEVKGSGNKVYRCYIDMEHPRKSI--CN 67
          ++    S +RG  YY+S  VI +  + ET+   E    GN+ +R  ++       +  C+
Sbjct: 12 EVCGETSYKRGEAYYKSNKVI-VNYYDETKEICEATVKGNEDFRVTVEKAKKGDVVARCS 70
Query: 68 CPHADGRRVICKHMIALL 85
          CP     +  C+H+ A+L
Sbjct: 71 CPSLASFQTYCQHVAAVL 88
Fused all the frames together and found nothing in prodomain.
GeneMark says:
Predicted genes
Gene  Strand  LeftEnd  RightEnd  Gene    Class
#                                Length
1     -       <3       290       288     1
2     -       351      >665      315     1
>G26a
VPSNEAEVTENIKKPYDGRSEVAPKKGTTKLAALRKSIETFESESFRILDSGKDGLEYIVINHEEDKGYL
VKVDAEKVVDCNCPHHQFRGAICKHILKVAFEKI
>G26b
RRVSMSGFHNRKIQKGIYG
ELSKIEEELQEAYDAEEQGQDLMLLIELADMIGAIEGVAKKYNMSLDQLVAFAKLRSKVAIEEERKNDEE
GTTFLIFRREY