This is the GenBank flat file for Moloney murine leukemia virus, heavily annotated internally to help in using it as a template sequence for identifying features in endogenous retroviral-like sequences:
_______________________________________________________________________
LOCUS       MLMCG        8332 bp ss-RNA             VRL       13-APR-1996
DEFINITION  Moloney murine leukemia virus, complete genome.
ACCESSION   J02255 J02256 J02257 M76668
NID         g331934
KEYWORDS    coat protein; complete genome; glycoprotein; histone; origin of
            replication; overlapping genes; polymerase; polyprotein; reverse
            transcriptase; terminal repeat; unidentified reading frame.
SOURCE      Murine leukemia virus cDNA to genomic RNA.
  ORGANISM  Murine leukemia virus
            Viruses; Retroid viruses; Retroviridae; Mammalian type C
            retroviruses; Mammalian type C viruses.
REFERENCE   1  (bases 5777 to 5890)
  AUTHORS   Van Beveren,C., van Straaten,F., Galleshaw,J.A. and Verma,I.M.
  TITLE     Nucleotide sequence of the genome of a murine sarcoma virus
  JOURNAL   Cell 27 (1 Pt 2), 97-108 (1981)
  MEDLINE   82115347
REFERENCE   2  (bases 1 to 8332)
  AUTHORS   Shinnick,T.M., Lerner,R.A. and Sutcliffe,J.G.
  TITLE     Nucleotide sequence of Moloney murine leukaemia virus
  JOURNAL   Nature 293 (5833), 543-548 (1981)
  MEDLINE   82035843
REFERENCE   3  (bases 380 to 629)
  AUTHORS   Schwartzberg,P., Colicelli,J. and Goff,S.P.
  TITLE     Deletion mutants of Moloney murine leukemia virus which lack
            glycosylated gag protein are replication competent
  JOURNAL   J. Virol. 46 (2), 538-546 (1983)
  MEDLINE   83189355
REFERENCE   4  (bases 1560 to 2559)
  AUTHORS   Miller,A.D. and Verma,I.M.
  TITLE     Two base changes restore infectivity to a noninfectious molecular
            clone of Moloney murine leukemia virus (pMLV-1)
  JOURNAL   J. Virol. 49 (1), 214-222 (1984)
  MEDLINE   84090399
REFERENCE   5  (sites)
  AUTHORS   Crawford,S. and Goff,S.P.
  TITLE     Mutations in gag proteins P12 and P15 of Moloney murine leukemia
            virus block early stages of infection
  JOURNAL   J. Virol. 49 (3), 909-917 (1984)
  MEDLINE   84138787
REFERENCE   6  (sites)
  AUTHORS   Schwartzberg,P., Colicelli,J., Gordon,M.L. and Goff,S.P.
  TITLE     Mutations in the gag gene of Moloney murine leukemia virus:
            effects on production of virions and reverse transcriptase
  JOURNAL   J. Virol. 49 (3), 918-924 (1984)
  MEDLINE   84138788
REFERENCE   7  (bases 563 to 744)
  AUTHORS   Bender,M.A., Palmer,T.D., Gelinas,R.E. and Miller,A.D.
  TITLE     Evidence that the packaging signal of Moloney murine leukemia
            virus extends into the gag region
  JOURNAL   J. Virol. 61 (5), 1639-1646 (1987)
  MEDLINE   87198894
REFERENCE   8  (sites)
  AUTHORS   Alford,R.L., Honda,S., Lawrence,C.B. and Belmont,J.W.
  TITLE     RNA secondary structure analysis of the packaging signal for
            Moloney murine leukemia virus
  JOURNAL   Virology 183 (2), 611-619 (1991)
  MEDLINE   91306444
COMMENT     [5] sites; gag region deletion mutant analysis.
            [6] sites; gag-pol region deletion mutant analysis.
            Main features of genome:
            r(1-68) - u5(69-145) - gag - pol - env - u3(7816-8264) -
            r(8265-8332). 594 bp LTR of provirus is u3-r-u5.
            Three 10 bp direct repeats at 7923, 7998 and 8073 may be
            associated with a 75 bp duplication at 7933-8007 and 8008-8082,
            relative to a known provirus sequence. The mRNAs have not yet
            been mapped.
            The gag gene product is involved in the assembly and release of
            virion particles. The gag protein is cleaved into four products
            with distinct biochemical properties after release from the cell
            surface [6].
FEATURES             Location/Qualifiers
     source          1..8332
                     /organism="Murine leukemia virus"
                     /proviral
                     /specific_host="Mus musculus"
     LTR             1..145
                     /note="5' LTR"
     mRNA            1..8332
                     /note="Mo-Mulv mRNA"
     misc_signal     146..163
                     /note="binding site for pro-tRNA primer"
     conflict        replace(617,"")
                     /citation=[7]
     CDS             621..2237
                     /note="gag polyprotein pr65"
                     /codon_start=1
                     /db_xref="PID:g331935"
                     /translation="MGQTVTTPLSLTLGHWKDVERIAHNQSVDVKKRRWVTFCSAEWP
                     <--- MA, p15
                     TFNVGWPRDGTFNRDLITQVKIKVFSPGPHGHPDQVPYIVTWEALAFDPPPWVKPFVH
                     PKPPPPLPPSAPSLPLEPPRSTPPRSSLYPALTPSLGAKPKPQVLSDSGGPLIDLLTE
                     MA,p15 -------------><---p12, ? function
                     DPPPYRDPRPPPSDRDGNGGEATPAGEAPDPSPMASRLRGRREPPVADSTTSQAFPLR
                     p12 ---><-p30;CA
                     AGGNGQLQYWPFSSSDLYNWKNNNPSFSEDPGKLTALIESVLITHQPTWDDCQQLLGT
                     LLTGEEKQRVLLEARKAVRGDDGRPTQLPNEVDAAFPLERPDWDYTTQAGRNHLVHYR
                     QLLLAGLQNAGRSPTNLAKVKGITQGPNESPSAFLERLKEAYRRYTPYDPEDPGQETN
                     -maj homol reg. CA ---
                     VSMSFIWQSAPDIGRKLERLEDLKNKTLGDLVREAEKIFNKRETPEEREERIRRETEE
                     KEERRRTEDEQKEKERDRRRHREMSKLLATVVSGQKQDRQGGERRRSQLDRDQCAYCK
                     --><-beg. of NC Zn finger motif: C C
                     EKGHWAKDCPKKPRGPRGPRPQTSLLTLDD"
                     ^ start of peptide cleaved off NC
     mat_peptide     624..1013
                     /note="protein p15"
                     /codon_start=1
     mat_peptide     1014..1265
                     /note="protein p12"
                     /codon_start=1
     mat_peptide     1266..2054
                     /note="protein p30 (major structural protein)"
                     /codon_start=1
     variation       1849
                     /note="a in infectious virus; g in noninfectious virus [4]"
     mat_peptide     2055..2222
                     /note="protein p10 (histone-like protein)"
                     /codon_start=1
     mat_peptide     2238..5834
                     /note="pol polyprotein"
                     /codon_start=1
     variation       2255
                     /note="g in infectious virus; c in noninfectious virus [4]"
     CDS             5777..7774
                     /note="env polyprotein"
                     /codon_start=1
                     /db_xref="PID:g331936"
                     /translation="MARSTLSKPLKNKVNPRGPLIPLILLMLRGVSTASPGSSPHQVY
                     NITWEVTNGDRETVWATSGNHPLWTWWPDLTPDLCMLAHHGPSYWGLEYQSPFSSPPG
                     PPCCSGGSSPGCSRDCEEPLTSLTPRCNTAWNRLKLDQTTHKSNEGFYVCPGPHRPRE
                     SKSCGGPDSFYCAYWGCETTGRAYWKPSSSWDFITVNNNLTSDQAVQVCKDNKWCNPL
                     VIRFTDAGRRVTSWTTGHYWGLRLYVSGQDPGLTFGIRLRYQNLGPRVPIGPNPVLAD
                     QQPLSKPKPVKSPSVTKPPSGTPLSPTQLPPAGTENRLLNLVDGAYQALNLTSPDKTQ
                     ECWLCLVAGPPYYEGVAVLGTYSNHTSAPANCSVASQHKLTLSEVTGQGLCIGAVPKT
                     HQALCNTTQTSSRGSYYLVAPTGTMWACSTGLTPCISTTILNLTTDYCVLVELWPRVT
                     YHSPSYVYGLFERSNRHKREPVSLTLALLLGGLTMGGIAAGIGTGTTALMATQQFQQL
                     gp70/15E cut: ^ memb. fusion seq: ^--------------------------------^
                     QAAVQDDLREVEKSISNLEKSLTSLSEVVLQNRRGLDLLFLKEGGLCAALKEECCFYA
                     DHTGLVRDSMAKLRERLNQRQKLFESTQGWFEGLFNRSPWFTTLISTIMGPLIVLLMI
                     memb spanning reg. (TM): ^-----------------------
                     LLFGPCILNRLVQFVKDRISVVQALVLTQQYHQLKPIEYEP"
                     -------------^
     mat_peptide     5876..7183
                     /note="protein gp70"
                     /codon_start=1
     mat_peptide     7184..7723
                     /note="protein p15e"
                     /codon_start=1
     mat_peptide     7724..7771
                     /note="r protein"
                     /codon_start=1
     LTR             7816..8332
                     /note="3' LTR"
BASE COUNT     2143 a   2395 c   2025 g   1769 t
ORIGIN      5' end of viral genome.
        1 gcgccagtcc tccgattgac tgagtcgccc gggtacccgt gtatccaata aaccctcttg
          ^beg. of R                          poly(A) site: AATA AA
       61 cagttgcatc cgacttgtgg tctcgctgtt ccttgggagg gtctcctctg agtgattgac
                  ^ beg. of U5
      121 tacccgtcag cgggggtctt tcatttgggg gctcgtccgg gatcgggaga cccctgccca
                       -------------> IR (see 7816), aka att; & end of LTR
                primer binding site: --------------------
      181 gggaccaccg acccaccacc gggaggtaag ctggccagca acttatctgt gtctgtccga
             env. splice donor: ----------
      241 ttgtctagtg tctatgactg attttatgcg cctgcgtcgg tactagttag ctaactagct
                                   dimerization motif (DIS): ^-- ---------?
          packaging region, dimerization motif and up to 4 hairpin loops
          probably somewhere in here before start of gag.
      301 ctgtatctgg cggacccgtg gtggaactga cgagttcgga acacccggcc gcaaccctgg
      361 gagacgtccc agggacttcg ggggccgttt ttgtggcccg acctgagtcc aaaaatcccg
      421 atcgttttgg actctttggt gcacccccct tagaggaggg atatgtggtt ctggtaggag
      481 acgagaacct aaaacagttc ccgcctccgt ctgaattttt gctttcggtt tgggaccgaa
      541 gccgcgccgc gcgtcttgtc tgctgcagca tcgttctgtg ttgtctctgt ctgactgtgt
      601 ttctgtattt gtctgagaat atgggccaga ctgttaccac tcccttaagt ttgaccttag
          <--- p15, First peptide is MA, matrix, membrane associating ----
      661 gtcactggaa agatgtcgag cggatcgctc acaaccagtc ggtagatgtc aagaagagac
      721 gttgggttac cttctgctct gcagaatggc caacctttaa cgtcggatgg ccgcgagacg
      781 gcacctttaa ccgagacctc atcacccagg ttaagatcaa ggtcttttca cctggcccgc
      841 atggacaccc agaccaggtc ccctacatcg tgacctggga agccttggct tttgaccccc
      901 ctccctgggt caagcccttt gtacacccta agcctccgcc tcctcttcct ccatccgccc
      961 cgtctctccc ccttgaacct cctcgttcga ccccgcctcg atcctccctt tatccagccc
          -----------><---p12 Beg. of p12 (perhaps no function, not conserved)
     1021 tcactccttc tctaggcgcc aaacctaaac ctcaagttct ttctgacagt ggggggccgc
     1081 tcatcgacct acttacagaa gaccccccgc cttataggga cccaagacca cccccttccg
     1141 acagggacgg aaatggtgga gaagcgaccc ctgcgggaga ggcaccggac ccctccccaa
     1201 tggcatctcg cctacgtggg agacgggagc cccctgtggc cgactccact acctcgcagg
     1261 cattccccct ccgcgcagga ggaaacggac agcttcaata ctggccgttc tcctcttctg
          ----><-------- p30 (major structural protein), CA, capsid.-------
     1321 acctttacaa ctggaaaaat aataaccctt ctttttctga agatccaggt aaactgacag
     1381 ctctgatcga gtctgttctc atcacccatc agcccacctg ggacgactgt cagcagctgt
     1441 tggggactct gctgaccgga gaagaaaaac aacgggtgct cttagaggct agaaaggcgg
     1501 tgcggggcga tgatgggcgc cccactcaac tgcccaatga agtcgatgcc gcttttcccc
     1561 tcgagcgccc agactgggat tacaccaccc aggcaggtag gaaccaccta gtccactatc
     1621 gccagttgct cctagcgggt ctccaaaacg cgggcagaag ccccaccaat ttggccaagg
     1681 taaaaggaat aacacaaggg cccaatgagt ctccctcggc cttcctagag agacttaagg
     CA maj.homol:At aAcaCaaGgg CcaAatGagT ctCccTcgGc cTtcCtaGag AgaCttAaaG
     1741 aagcctatcg caggtacact ccttatgacc ctgaggaccc agggcaagaa actaatgtgt
          aaGccTatCg cAgg
     1801 ctatgtcttt catttggcag tctgccccag acattgggag aaagttagag aggttagaag
          A in infectious virus
     1861 atttaaaaaa caagacgctt ggagatttgg ttagagaggc agaaaagatc tttaataaac
     1921 gagaaacccc ggaagaaaga gaggaacgta tcaggagaga aacagaggaa aaagaagaac
     1981 gccgtaggac agaggatgag cagaaagaga aagaaagaga tcgtaggaga catagagaga
     2041 tgagcaagct attggccact gtcgttagtg gacagaaaca ggatagacag ggaggagaac
          --------------><---- p10 (histone-like protein, NC peptide)------
     2101 gaaggaggtc ccaactcgat cgcgaccagt gtgcctactg caaagaaaag gggcactggg
          Zn finger: C ys Cy s Zn finger: His
     2161 ctaaagattg tcccaagaaa ccacgaggac ctcggggacc aagaccccag acctccctcc
          Cy s <-- beg. peptide cleaved off NC
     2221 tgaccctaga tgactaggga ggtcagggtc aggagccccc ccctgaaccc aggataaccc
          -> <-------beginning of Pol polyprotein G in infectious virus
          --- Suppressed terminator Pu tract involved in suppression
     2281 tcaaagtcgg ggggcaaccc gtcaccttcc tggtagatac tggggcccaa cactccgtgc
          protease act. site L euValAspTh rGlyAla
     2341 tgacccaaaa tcctggaccc ctaagtgata agtctgcctg ggtccaaggg gctactggag
     2401 gaaagcggta tcgctggacc acggatcgca aagtacatct agctaccggt aaggtcaccc
     2461 actctttcct ccatgtacca gactgtccct atcctctgtt aggaagagat ttgctgacta
     2521 aactaaaagc ccaaatccac tttgagggat caggagctca ggttatggga ccaatggggc
     2581 agcccctgca agtgttgacc ctaaatatag aagatgagca tcggctacat gagacctcaa
          protease--------><-- beg. of pol
     2641 aagagccaga tgtttctcta gggtccacat ggctgtctga ttttcctcag gcctgggcgg
     2701 aaaccggggg catgggactg gcagttcgcc aagctcctct gatcatacct ctgaaagcaa
     2761 cctctacccc cgtgtccata aaacaatacc ccatgtcaca agaagccaga ctggggatca
     2821 agccccacat acagagactg ttggaccagg gaatactggt accctgccag tccccctgga
     2881 acacgcccct gctacccgtt aagaaaccag ggactaatga ttataggcct gtccaggatc
          RT box A
     2941 tgagagaagt caacaagcgg gtggaagaca tccaccccac cgtgcccaac ccttacaacc
     3001 tcttgagcgg gctcccaccg tcccaccagt ggtacactgt gcttgattta aaggatgcct
          RT box B
     3061 ttttctgcct gagactccac cccaccagtc agcctctctt cgcctttgag tggagagatc
     3121 cagagatggg aatctcagga caattgacct ggaccagact cccacagggt ttcaaaaaca
          RT box C
     3181 gtcccaccct gtttgatgag gcactgcaca gagacctagc agacttccgg atccagcacc
     3241 cagacttgat cctgctacag tacgtggatg acttactgct ggccgccact tctgagctag
          RT box E
     3301 actgccaaca aggtactcgg gccctgttac aaaccctagg gaacctcggg tatcgggcct
     3361 cggccaagaa agcccaaatt tgccagaaac aggtcaagta tctggggtat cttctaaaag
     3421 agggtcagag atggctgact gaggccagaa aagagactgt gatggggcag cctactccga
     3481 agacccctcg acaactaagg gagttcctag ggacggcagg cttctgtcgc ctctggatcc
     3541 ctgggtttgc agaaatggca gcccccttgt accctctcac caaaacgggg actctgttta
          RT box F
     3601 attggggccc agaccaacaa aaggcctatc aagaaatcaa gcaagctctt ctaactgccc
          <---- --- connector to RNAse H
     3661 cagccctggg gttgccagat ttgactaagc cctttgaact ctttgtcgac gagaagcagg
     3721 gctacgccaa aggtgtccta acgcaaaaac tgggaccttg gcgtcggccg gtggcctacc
     3781 tgtccaaaaa gctagaccca gtagcagctg ggtggccccc ttgcctacgg atggtagcag
     3841 ccattgccgt actgacaaag gatgcaggca agctaaccat gggacagcca ctagtcattc
     3901 tggcccccca tgcagtagag gcactagtca aacaaccccc cgaccgctgg ctttccaacg
     3961 cccggatgac tcactatcag gccttgcttt tggacacgga ccgggtccag ttcggaccgg
     4021 tggtagccct gaacccggct acgctgctcc cactgcctga ggaagggctg caacacaact
     4081 gccttgatat cctggccgaa gcccacggaa cccgacccga cctaacggac cagccgctcc
     4141 cagacgccga ccacacctgg tacacggatg gaagcagtct cttacaagag ggacagcgta
          RNAse H motif:
     4201 aggcgggagc tgcggtgacc accgagaccg aggtaatctg ggctaaagcc ctgccagccg
     4261 ggacatccgc tcagcgggct gaactgatag cactcaccca ggccctaaag atggcagaag
          RNAse H motif
     4321 gtaagaagct aaatgtttat actgatagcc gttatgcttt tgctactgcc catatccatg
          RNAse H motif
     4381 gagaaatata cagaaggcgt gggttgctca catcagaagg caaagagatc aaaaataaag
     4441 acgagatctt ggccctacta aaagccctct ttctgcccaa aagacttagc ataatccatt
     4501 gtccaggaca tcaaaaggga cacagcgccg aggctagagg caaccggatg gctgaccaag
          RNAse motif?
     4561 cggcccgaaa ggcagccatc acagagactc cagacacctc taccctcctc atagaaaatt
          -----------------
     4621 catcacccta cacctcagaa cattttcatt acacagtgac tgatataaag gacctaacca
          ------------------ pol/int boundary.
     4681 agttgggggc catttatgat aaaacaaaga agtattgggt ctaccaagga aaacctgtga
     4741 tgcctgacca gtttactttt gaattattag actttcttca tcagctgact cacctcagct
          Zn finger: His
     4801 tctcaaaaat gaaggctctc ctagagagaa gccacagtcc ctactacatg ctgaaccggg
     4861 atcgaacact caaaaatatc actg*agacct gcaaagcttg tgcacaagtc aacgccagca
          Zn finger: C ys Cy s
     4921 agtctgccgt taaacaggga actagggtcc gcgggcatcg gcccggcact cattgggaga
          Conserved: Trp
     4981 tcgatttcac cgagataaag cccggattgt atggctataa atatcttcta gtttttatag
          Conserved: Asp
     5041 ataccttttc tggctggata gaagccttcc caaccaagaa agaaaccgcc aaggtcgtaa
          Thr Thr conserved in int?
     5101 ccaagaagct actagaggag atcttcccca ggttcggcat gcctcaggta ttgggaactg
          Asp
     5161 acaatgggcc tgccttcgtc tccaaggtga gtcagacagt ggccgatctg ttggggattg
     5221 attggaaatt acattgtgca tacagacccc aaagctcagg ccaggtagaa agaatgaata
          Glu
     5281 gaaccatcaa ggagacttta actaaattaa cgcttgcaac tggctctaga gactgggtgc
          Lys
     5341 tcctactccc cttagccctg taccgagccc gcaacacgcc gggcccccat ggcctcaccc
     5401 catatgagat cttatatggg gcacccccgc cccttgtaaa cttccctgac cctgacatga
     5461 caagagttac taacagcccc tctctccaag ctcacttaca ggctctctac ttagtccagc
          env. splice acceptor: --------
     5521 acgaagtctg gagacctctg gcggcagcct accaagaaca actggaccga ccggtggtac
     5581 ctcaccctta ccgagtcggc gacacagtgt gggtccgccg acaccagact aagaacctag
     5641 aacctcgctg gaaaggacct tacacagtcc tgctgaccac ccccaccgcc ctcaaagtag
     5701 acggcatcgc agcttggata cacgccgccc acgtgaaggc tgccgacccc gggggtggac
     5761 catcctctag actgacatgg cgcgttcaac gctctcaaaa ccccttaaaa ataaggttaa
          Met Beginning of env Ter Ter Ter
     5821 cccgcgaggc cccctaatcc ccttaattct tctgatgctc agaggggtca gtactgcttc
          for pol.pep.-->
          <-p15E------------
     7201 cctggcccta ttattgggtg gactaaccat ggggggaatt gccgctggaa taggaacagg
     7261 gactactgct ctaatggcca ctcagcaatt ccagcagctc caagccgcag tacaggatga
     7321 tctcagggag gttgaaaaat caatctctaa cctagaaaag tctctcactt ccctgtctga
     7381 agttgtccta cagaatcgaa ggggcctaga cttgttattt ctaaaagaag gagggctgtg
     7441 tgctgctcta aaagaagaat gttgcttcta tgcggaccac acaggactag tgagagacag
     7501 catggccaaa ttgagagaga ggcttaatca gagacagaaa ctgtttgagt caactcaagg
     7561 atggtttgag ggactgttta acagatcccc ttggtttacc accttgatat ctaccattat
     7621 gggacccctc attgtactcc taatgatttt gctcttcgga ccctgcattc ttaatcgatt
     7681 agtccaattt gttaaagaca ggatatcagt ggtccaggct ctagttttga ctcaacaata
          p15E -----------><---- R
     7741 tcaccagctg aagcctatag agtacgagcc atagataaaa taaaagattt tatttagtct
          Env. Ter
     7801 ccagaaaaag gggggaatga aagaccccac ctgtaggttt ggcaagctag cttaagtaac
                     PBS: <------------- IR
          ----------------------enhancer dupl. #2--------
          NF1: CGGCCCA GGGCC GRE: GAAC AGCTG
          MCREF-1: CA GGATATCTGT ets, Lvt: CA GGA Lvb: CA GGATA CBF: TGT
     8041 ggtaagcagt tcctgccccg gctcagggcc aagaacagat ggtccccaga tgcggtccag
          ---------- ---------- ---------- ->-------- ---------> this part dupl. in MLOCG
          GRE: GAACAGCT G GGT NF1: CG GCCCAGGGCC :NF1 GRE: GAACAGAT G GGTA
     8101 ccctcagcag tttctagaga accatcagat gtttccaggg tgccccaagg acctgaaatg
     8161 accctgtgcc ttatttgaac taaccaatca gttcgcttct cgcttctgtt cgcgcgcttc
          CCAAT box
     8221 tgctccccga gctcaataaa agagcccaca acccctcact cggggcgcca gtcctccgat
          TATA box LTR U3 ------><---beginning of R (overlaps 1..68)
     8281 tgactgagtc gcccgggtac ccgtgtatcc aataaaccct cttgcagttg ca
                              poly A site: AATAAA
                                              End of R region-----><-U5;see 69..145
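
The coordinates quoted in the COMMENT and FEATURES sections can be cross-checked with simple arithmetic. The Python sketch below is not part of the GenBank record. It assumes the coordinates are 1-based and inclusive, and it hard-codes them by hand rather than parsing the file; all constant and function names are illustrative. It checks three things: that U3 + R + U5 gives the 594 bp proviral LTR, that the BASE COUNT sums to the 8332 bp locus length, and that each gag mature peptide spans a whole number of codons.

```python
"""Sanity checks on coordinates quoted in the annotated MLMCG record."""

# Values copied by hand from the record (1-based, inclusive coordinates).
# All names below are illustrative; nothing here parses the flat file.
LOCUS_LENGTH = 8332
BASE_COUNT = {"a": 2143, "c": 2395, "g": 2025, "t": 1769}
LTR_PARTS = {"U3": (7816, 8264), "R": (1, 68), "U5": (69, 145)}
GAG_PEPTIDES = {
    "p15": (624, 1013),
    "p12": (1014, 1265),
    "p30": (1266, 2054),
    "p10": (2055, 2222),
}


def span_length(start, end):
    """Length in bases of a 1-based inclusive interval."""
    return end - start + 1


def ltr_length(parts):
    """Length of the proviral LTR, assembled as U3-R-U5."""
    return sum(span_length(*parts[name]) for name in ("U3", "R", "U5"))


def residues_per_peptide(peptides):
    """Map each mature peptide to its residue count.

    Raises ValueError if a span is not a whole number of codons.
    """
    counts = {}
    for name, (start, end) in peptides.items():
        bases = span_length(start, end)
        if bases % 3 != 0:
            raise ValueError(f"{name}: {bases} bases is not a multiple of 3")
        counts[name] = bases // 3
    return counts


if __name__ == "__main__":
    print("LTR length:", ltr_length(LTR_PARTS))
    print("Base total:", sum(BASE_COUNT.values()), "of", LOCUS_LENGTH)
    for name, n in residues_per_peptide(GAG_PEPTIDES).items():
        print(f"{name}: {n} residues")
```

Run as a script, this reports an LTR length of 594, a base total of 8332, and 130, 84, 263 and 56 residues for p15, p12, p30 and p10 respectively, in agreement with the figures in the record.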