Reexamining the gene arrangement of human rab4b.

This exercise illustrates how to examine for yourself the likely splice patterns of an annotated human gene in GenBank (in this case NCBI RefSeq).  It brings out that the gene as annotated is possibly in error.  Since NCBI finds and corrects errors, the entry may have changed by the time you read this document.  Regardless, the point is to show how to reassess the data concerning gene structure and form your own opinion.  The two GenBank entries are appended at the end as they stand now, so that the exercise can be reproduced from them.

The gene in question was picked because it showed up in a Blast search keyed by ras, and because it illustrates how a gene from two different species but of identical sequence is collapsed into one entry in NCBI's nr database.

Notice from this nr header (3/2005) that the SwissProt annotators think the human and dog orthologs are identical, but NCBI's RefSeq annotators think otherwise:

>gi|54792729|ref|NP_001003275.1| rab4b GTP-binding protein [Canis familiaris]
 gi|28422140|gb|AAH46927.1| RAB4B protein [Homo sapiens]
 gi|20379046|gb|AAM21083.1| small GTP binding protein RAB4B [Homo sapiens]
 gi|919|emb|CAA39800.1| rab4b [Canis familiaris]
 gi|46577635|sp|P61018|RB4B_HUMAN Ras-related protein Rab-4B
 gi|46577634|sp|P61017|RB4B_CANFA Ras-related protein Rab-4B
 gi|5640004|gb|AAD45923.1| ras-related GTP-binding protein 4b [Homo sapiens]
 gi|108108|pir||F36364 GTP-binding protein rab4b - dog

The RefSeq entry for human rab4b would appear to specify a different N terminus:

>gi|21361509|ref|NP_057238.2| ras-related GTP-binding protein 4b [Homo sapiens]
 gi|10441901|gb|AAG17228.1| unknown [Homo sapiens]
          Length = 248

 Score =  432 bits (1112), Expect = e-120
 Identities = 208/208 (100%), Positives = 208/208 (100%)

Query: 6   DFLFKFLVIGSAGTGKSCLLHQFIENKFKQDSNHTIGVEFGSRVVNVGGKTVKLQIWDTA 65
           DFLFKFLVIGSAGTGKSCLLHQFIENKFKQDSNHTIGVEFGSRVVNVGGKTVKLQIWDTA
Sbjct: 41  DFLFKFLVIGSAGTGKSCLLHQFIENKFKQDSNHTIGVEFGSRVVNVGGKTVKLQIWDTA 100

Query: 66  GQERFRSVTRSYYRGAAGALLVYDITSRETYNSLAAWLTDARTLASPNIVVILCGNKKDL 125
           GQERFRSVTRSYYRGAAGALLVYDITSRETYNSLAAWLTDARTLASPNIVVILCGNKKDL
Sbjct: 101 GQERFRSVTRSYYRGAAGALLVYDITSRETYNSLAAWLTDARTLASPNIVVILCGNKKDL 160

Query: 126 DPEREVTFLEASRFAQENELMFLETSALTGENVEEAFLKCARTILNKIDSGELDPERMGS 185
           DPEREVTFLEASRFAQENELMFLETSALTGENVEEAFLKCARTILNKIDSGELDPERMGS
Sbjct: 161 DPEREVTFLEASRFAQENELMFLETSALTGENVEEAFLKCARTILNKIDSGELDPERMGS 220

Query: 186 GIQYGDASLRQLRQPRSAQAVAPQPCGC 213
           GIQYGDASLRQLRQPRSAQAVAPQPCGC
Sbjct: 221 GIQYGDASLRQLRQPRSAQAVAPQPCGC 248

The sequence of 21361509:

        1 msvslpltvm vrerdwigih lfslylslpv gipdfgsiws dflfkflvig sagtgkscll
       61 hqfienkfkq dsnhtigvef gsrvvnvggk tvklqiwdta gqerfrsvtr syyrgaagal
      121 lvyditsret ynslaawltd artlaspniv vilcgnkkdl dperevtfle asrfaqenel
      181 mfletsaltg enveeaflkc artilnkids geldpermgs giqygdaslr qlrqprsaqa
      241 vapqpcgc

As opposed to the N terminus in the dog and SwissProt human versions:

MAETYDFLFKFLVIGSAGTGKSCLLHQFIENKFKQDSNHTIGVEFGSRVVNVGGKTVKLQ

The human MSVSL variant is annotated to come from NM_016154.2:857..1603
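A quick arithmetic check on these coordinates can be sketched in Python. The function names are my own and not part of any NCBI tool. The sketch checks that 857..1603 is a whole number of codons and encodes 248 residues (the Length = 248 reported above), and shows why the same CDS reads 58..804 in the appended NM_016154.2 record, which is displayed as REGION: 800..2007.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive GenBank-style span."""
    return end - start + 1

def protein_length(cds_start, cds_end):
    """Residues encoded by a complete CDS (the stop codon is not counted)."""
    n = span_length(cds_start, cds_end)
    if n % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return n // 3 - 1

def to_region_coords(pos, region_start):
    """Convert a full-record position to its position in a sub-region display."""
    return pos - region_start + 1

print(span_length(857, 1603))      # 747
print(protein_length(857, 1603))   # 248
print(to_region_coords(857, 800), to_region_coords(1603, 800))  # 58 804
```

The same functions give 642 bases and 213 residues for the CDS 124..765 of NM_016154.1.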

Note that NM_016154.2 is a second version.  Tracing back to NM_016154.1 reveals an entry submitted by an actual author (although the paper is unpublished), with a different splicing pattern and the MAETY N terminus.

Next, Blastn against the chromosome division, with entries limited to "homo sapiens"[orgn].

The match wasn't contiguous, and repeated sequences had to be masked to keep the search from blowing up.

NG_000008 was the genomic segment cited by the gene entry:

Searching the Features in NG_000008 revealed this position for rab4b

   CDS             join(3101..3178,3464..3578,6857..6919,7000..7154,
                     9744..9839,9927..10057,19649..19652)
                     /gene="RAB4B"

However, examining these coordinates reveals that the annotator got them wrong (the amino acid sequence they encode doesn't match).
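One cheap sanity check before translating anything is to add up the joined segments. The sketch below is illustrative only (parse_join is my own helper, not a GenBank utility). It shows that the join totals 642 bases, enough for 213 residues plus a stop. That is the size of the MAETY form, not the 747 bases needed for the 248-residue RefSeq protein. A length check says nothing about whether the encoded sequence is right, so translation is still needed to settle that.

```python
import re

def parse_join(location):
    """Return (start, end) pairs from a GenBank join(...) location string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

location = ("join(3101..3178,3464..3578,6857..6919,7000..7154,"
            "9744..9839,9927..10057,19649..19652)")

exons = parse_join(location)
cds_len = sum(end - start + 1 for start, end in exons)

# total bases, remainder after dividing into codons, residues excluding stop
print(cds_len, cds_len % 3, cds_len // 3 - 1)   # 642 0 213
```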

Searching NM_016154.2 versus NC_000019 (complete human chromosome 19) blows up unless repeats are filtered.
Generally speaking, you save yourself a lot of trouble if you run RepeatMasker first on any mammalian nucleic acid sequence before attempting Blast searches.

RepeatMasker reveals that 1-800 in NM_016154.2 is an island of repetitive sequence.

Repeat Annotations:

   SW  perc perc perc  query     position in query    matching repeat        position in  repeat
score  div. del. ins.  sequence  begin  end (left)   repeat   class/family  begin  end (left)  ID

  302  28.9  8.5  2.6  test          6  158 (1849) C  L2       LINE/L2       (378) 2936   2775   1
  784  22.7  5.0  4.6  test        161  375 (1632) +  MIR      SINE/MIR         10  226   (36)   2
  440  10.6  0.0  8.6  test        376  420 (1587) +  L1PA11   LINE/L1        6086 6127   (47)   3
 2423   7.1  0.3  0.3  test        421  730 (1277) +  AluSp    SINE/Alu          1  310    (3)   4
  440  10.6  0.0  8.6  test        731  777 (1230) +  L1PA11   LINE/L1        6128 6170    (4)   3
  784  22.7  5.0  4.6  test        778  801 (1206) +  MIR      SINE/MIR        227  250   (12)   2
  189   0.0  0.0  0.0  test       1987 2007    (0) +  (A)n     Simple_repeat     1   21    (0)   5
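If you want to remove the repeats yourself from a table like this, the intervals can be pulled out and applied as a hard mask. This is a minimal sketch that assumes the whitespace-separated column layout shown above (query begin and end in the sixth and seventh columns). The function names are my own, and the all-"a" sequence is only a placeholder standing in for the real 2007-base message.

```python
ROWS = """
  302  28.9  8.5  2.6  test          6  158 (1849) C  L2       LINE/L2       (378) 2936   2775   1
  784  22.7  5.0  4.6  test        161  375 (1632) +  MIR      SINE/MIR         10  226   (36)   2
  440  10.6  0.0  8.6  test        376  420 (1587) +  L1PA11   LINE/L1        6086 6127   (47)   3
 2423   7.1  0.3  0.3  test        421  730 (1277) +  AluSp    SINE/Alu          1  310    (3)   4
  440  10.6  0.0  8.6  test        731  777 (1230) +  L1PA11   LINE/L1        6128 6170    (4)   3
  784  22.7  5.0  4.6  test        778  801 (1206) +  MIR      SINE/MIR        227  250   (12)   2
  189   0.0  0.0  0.0  test       1987 2007    (0) +  (A)n     Simple_repeat     1   21    (0)   5
"""

def parse_repeatmasker(text):
    """Extract (begin, end) query intervals from RepeatMasker annotation rows."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        intervals.append((int(fields[5]), int(fields[6])))
    return intervals

def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based inclusive intervals."""
    merged = []
    for begin, end in sorted(intervals):
        if merged and begin <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged

def hard_mask(seq, intervals):
    """Replace every base inside the intervals with 'n'."""
    chars = list(seq)
    for begin, end in intervals:
        for i in range(begin - 1, min(end, len(chars))):
            chars[i] = "n"
    return "".join(chars)

merged = merge_intervals(parse_repeatmasker(ROWS))
print(merged)                      # [(6, 158), (161, 801), (1987, 2007)]

placeholder = "a" * 2007           # stand-in for the real sequence
masked = hard_mask(placeholder, merged)
print(masked.count("n"))           # 815
```

Merging shows that the repeats form one nearly continuous block from 161 to 801, plus the L2 fragment at 6..158 and the poly(A) tail.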

Searching NM_016154.2 versus just the first 20000 bp of NG_000008 (to cut down the number of repetitive sequence hits) gives the following approximately correct locations of the exons as actually translated to make the RefSeq protein entry.
The initiator was at 857, so no repetitive sequence was involved in the translation.

6..702        matched 2126..2822
731..1058     matched 2851..3178
1059..1175    matched 3464..3580
1173..1236    matched 6856..6919
1234..1391    matched 6997..7154
1391..1492    matched 9743..9844
1485..1618    matched 9924..10057
1616..1970    matched 19646..20000

Blast 2 sequences suggests that the 5' ends of the NM_016154.1 and NM_016154.2 messages do not overlap.

Where are the first 138 bases of NM_016154.1 in NG_000008?

Query: 20   agtaggaaggagccgttgctgtagccggagtggagcggctgccagccgaggagcaggcgc 79
            |||||||||||||||  |||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1351 agtaggaaggagccggggctgtagccggagtggagcggctgccagccgaggagcaggcgc 1410
 

Query: 80   ggccgcggcgccatattgcggccctcagcggccgcgaccgagtcatggctgagacctac 138
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1411 ggccgcggcgccatattgcggccctcagcggccgcgaccgagtcatggctgagacctac 1469

The initial 20 bases of NM_016154.1 were full of ambiguity codes, so the lack of match there is not meaningful.
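To put a number on that, here is a small sketch that counts the non-acgt symbols in the first 20 bases of the NM_016154.1 sequence appended at the end of this document. Anything other than a, c, g or t is treated as an IUPAC ambiguity code.

```python
# Bases 1..20 of NM_016154.1, copied from the appended ORIGIN block.
prefix = "astncsngthtntynchcka"

# 1-based position and symbol of every base that is not a plain a, c, g or t.
ambiguous = [(i + 1, base) for i, base in enumerate(prefix) if base not in "acgt"]

print(len(ambiguous), "of", len(prefix))   # 10 of 20
print(ambiguous)
```

Half of the positions are ambiguity codes, so the failure of this stretch to align carries no weight either way.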

NM_016154.1 is not repetitive according to RepeatMasker (except for the poly(A) tail).

Searching NM_016154.1 vs human ESTs reveals that there are human ESTs with this structure.


 

What about the NM_016154.2 arrangement?

A search of the human EST database with NM_016154.2 (with repetitive sequences [1..800] removed) shows that no ESTs have this arrangement.  There are, however, mRNA entries in the non-EST division.


 
 

Conclusion:  EST data do not support the splicing pattern given in RefSeq.  Larger RNAs of various structures reported in GenBank may be unspliced nuclear RNA or rare splicing errors from the next upstream gene.

Here's the region at the UCSC genome browser.
Here's the region at Ensembl.
 

Hunting for transcription signals.

Let's pursue this gene a bit further to explore methods for finding possible transcription signals.

Several of the ESTs are labeled as coming from a 5' cloning strategy and extend back to 1308 on NG_000008.  Let's take that as an approximate mRNA start point.
The sequence is: 1 ccggaggggg gcggaggcgg aagtggcggt gccgggcccg gggagtagga aggagccggg
Recording the sequence will assist finding exactly this place in versions of the chromosome sequence that are numbered differently.
For example, searching this string with NCBI's Blast 2 sequences against NC_000019 reveals that this proposed start site is at 45975974 in the current version of the complete human chromosome 19 sequence, and oriented in the positive direction.
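For a landmark this long, a plain exact-match search on both strands is often enough if you already have the target sequence on disk. The sketch below is my own illustration, not what Blast 2 sequences does internally. Unlike Blast, it tolerates no mismatches, so a single base difference between sequence versions will make it miss. The toy target is made up; only the landmark bases come from the sequence recorded above.

```python
_COMPLEMENT = str.maketrans("acgtn", "tgcan")

def reverse_complement(seq):
    """Reverse complement of a lower- or upper-case DNA string."""
    return seq.lower().translate(_COMPLEMENT)[::-1]

def locate(landmark, target):
    """Return (1-based start, strand) for every exact match of landmark in target."""
    landmark, target = landmark.lower(), target.lower()
    hits = []
    for strand, query in (("+", landmark), ("-", reverse_complement(landmark))):
        pos = target.find(query)
        while pos != -1:
            hits.append((pos + 1, strand))
            pos = target.find(query, pos + 1)
    return sorted(hits)

# First 20 bases of the recorded start-site sequence.
landmark = "ccggagggggcggaggcgga"

# Hypothetical target with the landmark embedded once on the plus strand.
toy_target = "ttt" + landmark + "aaa"
print(locate(landmark, toy_target))   # [(4, '+')]
```

For a minus-strand hit, the reported start is the leftmost base of the match as numbered on the plus strand.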

Reviewing the UCSC genome browser gives an estimate of the end of the 3' UTR of the upstream gene as:
TGAgctcagcctaccg
ctggccctgccgtttcccctccttggctttatgcaaatacaatcagccca
gtgcaaa

ending at 45975235

So there are only about 740 bp to consider for the transcriptional controls.
Use Entrez to retrieve NC_000019, limited to 45975235..45975974:

        1 acggctcgtc tccgtggtct ttggggtggg gtagggtagg gtggggactg tacaaatgaa
       61 atgtttctct aggttgctga atctaaccaa ttaacccgct gcctgtggta acgtcagtgg
      121 ttgctaggca gagtttcact gatgaaagcc ctgtgcagta ggagcgctcc taagcttagg
      181 tttcggacac aagcaaagga aaacctaagc agcccaacta ggggattgta gtgtcctctc
      241 tagaccagtg ggagggagcc aatcggacta cgcggaggat ataaatcgca cagtaaaggt
      301 tttcttggaa gattatctgg aaagggaggt gggaagtaac ctgcgcctat ctgcccagtt
      361 ttcctccttt tcgcctttga gaacagtaat cgctcccgcc agctcaagat cagctcctct
      421 tccagcctct ttctgtcaat cctgcccgtg cccccttctt tgcatttgta tcccctcccc
      481 caagtccgct ccgatccaat ccggagactc gactctgccc cccgtactcc agactaaatc
      541 cgttcctcgg ctcagctaag cagctctgca cgtcccttcc cacttgctca gccaatcgct
      601 gggctcggcc ccgccccttt gggcttggtc cttctcacgc caccaatccg cgcctcttgt
      661 cccgccccct gcccgcgagt ccgccaatcc cgcccttcgg aaagcgtcgc ctggtatcca
      721 gctgccggag cgggtcgcgc

Note: the AATGAAA at 55 in this segment is the only candidate I see for the poly(A) addition signal for gene MIA, narrowing the region down further.
http://www.fruitfly.org/seq_tools/promoter.html gave the underlined sequence as a predicted eukaryotic promoter.
http://genes.mit.edu/McPromoter.html didn't predict a promoter.

Turning on the gene expression and CpG island tracks in the UCSC genome browser reveals other predictors of a promoter in this area.

The distribution of ATG is marked in blue above.  The distribution of CG is marked in red.  Since human messages usually start on the first AUG, the absence of ATG in the latter half of this sequence supports the idea of a longer 5' untranslated region.  On the other hand, the CpG island in the bottom half of the sequence would suggest the possibility of a GC-rich housekeeping promoter in the bottom half.  The relative inconsistencies among EST starts may indicate that there are multiple start sites and perhaps multiple promoters for this region.
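The ATG and CG distributions, and the positions of the short motifs mentioned here, can be reproduced from the 740-base segment with a few lines of Python. This is a sketch only: the helper name is my own, positions are 1-based within the segment, and the split into halves is simply at the midpoint.

```python
import re

SEGMENT = """
        1 acggctcgtc tccgtggtct ttggggtggg gtagggtagg gtggggactg tacaaatgaa
       61 atgtttctct aggttgctga atctaaccaa ttaacccgct gcctgtggta acgtcagtgg
      121 ttgctaggca gagtttcact gatgaaagcc ctgtgcagta ggagcgctcc taagcttagg
      181 tttcggacac aagcaaagga aaacctaagc agcccaacta ggggattgta gtgtcctctc
      241 tagaccagtg ggagggagcc aatcggacta cgcggaggat ataaatcgca cagtaaaggt
      301 tttcttggaa gattatctgg aaagggaggt gggaagtaac ctgcgcctat ctgcccagtt
      361 ttcctccttt tcgcctttga gaacagtaat cgctcccgcc agctcaagat cagctcctct
      421 tccagcctct ttctgtcaat cctgcccgtg cccccttctt tgcatttgta tcccctcccc
      481 caagtccgct ccgatccaat ccggagactc gactctgccc cccgtactcc agactaaatc
      541 cgttcctcgg ctcagctaag cagctctgca cgtcccttcc cacttgctca gccaatcgct
      601 gggctcggcc ccgccccttt gggcttggtc cttctcacgc caccaatccg cgcctcttgt
      661 cccgccccct gcccgcgagt ccgccaatcc cgcccttcgg aaagcgtcgc ctggtatcca
      721 gctgccggag cgggtcgcgc
"""

# Drop the line numbers and spacing of the GenBank-style layout.
seq = re.sub(r"[^acgt]", "", SEGMENT.lower())

def positions(motif, seq):
    """1-based start of every (possibly overlapping) occurrence of motif."""
    pattern = "(?=" + re.escape(motif.lower()) + ")"
    return [m.start() + 1 for m in re.finditer(pattern, seq)]

half = len(seq) // 2
print("length", len(seq))

for motif in ("aatgaaa", "ccaat", "tataaat"):
    print(motif, positions(motif, seq))

for motif in ("atg", "cg"):
    hits = positions(motif, seq)
    first = sum(1 for p in hits if p <= half)
    print(motif, "first half:", first, "second half:", len(hits) - first)
```

Counting this way is only descriptive. As the next paragraphs argue, a motif on a single sequence is weak evidence without conservation or experimental support.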

If you click on the conservation track at UCSC, you will see the alignments of several mammalian genes through this area.  Unfortunately, it is not numbered in a very helpful way.  However, with a practiced eye you can find the various landmarks above.  The TATAAAT, presumably the core of the predicted promoter as underlined, does not appear to be conserved, although the CCAAT box to the left of it may be.  There is considerable gapping until one gets into the CG island.  Thereafter there is little gapping, high conservation, and CG's are more often conserved across the alignment.  This illustrates that the statistical properties of guessing a relevant site on a single sequence are poor, whereas guidance from cross species conservation can narrow down the possibilities and give you a better statistical chance of finding a real site.

The only conserved transcription factor binding site marked at UCSC is a YY1 site at 45976082 (CGGCGCCATATTGCGGC).  YY1 sites are usually around the start of transcription.  There are indeed a number of ESTs starting around that site, so we should probably revise our major start site to the right and regard the slightly longer ESTs as upstream starts.
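Moving between the two numbering systems is simple offset arithmetic once one base has been located in both. The sketch below anchors on the proposed start site (1308 on NG_000008, 45975974 on NC_000019). It assumes the two sequences are collinear, in the same orientation, with no insertions or deletions between the anchor and the site of interest. The constant and function names are my own.

```python
ANCHOR_NG = 1308        # proposed start site on NG_000008
ANCHOR_NC = 45975974    # the same base on NC_000019 (positive orientation)

def nc_to_ng(pos):
    """Map an NC_000019 coordinate to NG_000008 using the single anchor."""
    return pos - ANCHOR_NC + ANCHOR_NG

def ng_to_nc(pos):
    """Map an NG_000008 coordinate to NC_000019 using the single anchor."""
    return pos - ANCHOR_NG + ANCHOR_NC

yy1_nc = 45976082
print(yy1_nc - ANCHOR_NC)    # 108 bases downstream of the proposed start
print(nc_to_ng(yy1_nc))      # 1416
```

Position 1416 falls in the Sbjct 1411..1469 row of the NM_016154.1 alignment shown earlier, where the string cggcgccatattgcggc begins, so the two coordinate systems agree here.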

There is an excellent tutorial on the use of the TRANSFAC database and other resources for exploring promoters, written by Enrique Blanco, Genome BioInformatics Research Lab, IMIM.  An example result posted there illustrates the high density of false positives you can expect without some way of narrowing down the options.  Other methods of narrowing down the possibilities include biochemical footprinting and having some knowledge of which transcription factors are likely to be present in the relevant cell types.
 
 

3/4/2005 - Steve Hardies

NM_016154.2


LOCUS       NM_016154               1208 bp    mRNA    linear   PRI 18-DEC-2004
DEFINITION  Homo sapiens RAB4B, member RAS oncogene family (RAB4B), mRNA.
ACCESSION   NM_016154 REGION: 800..2007
VERSION     NM_016154.2  GI:21361508
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1208)
  AUTHORS   Strausberg,R.L., Feingold,E.A., Grouse,L.H., Derge,J.G.,
            Klausner,R.D., Collins,F.S., Wagner,L., Shenmen,C.M., Schuler,G.D.,
            Altschul,S.F., Zeeberg,B., Buetow,K.H., Schaefer,C.F., Bhat,N.K.,
            Hopkins,R.F., Jordan,H., Moore,T., Max,S.I., Wang,J., Hsieh,F.,
            Diatchenko,L., Marusina,K., Farmer,A.A., Rubin,G.M., Hong,L.,
            Stapleton,M., Soares,M.B., Bonaldo,M.F., Casavant,T.L.,
            Scheetz,T.E., Brownstein,M.J., Usdin,T.B., Toshiyuki,S.,
            Carninci,P., Prange,C., Raha,S.S., Loquellano,N.A., Peters,G.J.,
            Abramson,R.D., Mullahy,S.J., Bosak,S.A., McEwan,P.J.,
            McKernan,K.J., Malek,J.A., Gunaratne,P.H., Richards,S.,
            Worley,K.C., Hale,S., Garcia,A.M., Gay,L.J., Hulyk,S.W.,
            Villalon,D.K., Muzny,D.M., Sodergren,E.J., Lu,X., Gibbs,R.A.,
            Fahey,J., Helton,E., Ketteman,M., Madan,A., Rodrigues,S.,
            Sanchez,A., Whiting,M., Madan,A., Young,A.C., Shevchenko,Y.,
            Bouffard,G.G., Blakesley,R.W., Touchman,J.W., Green,E.D.,
            Dickson,M.C., Rodriguez,A.C., Grimwood,J., Schmutz,J., Myers,R.M.,
            Butterfield,Y.S., Krzywinski,M.I., Skalska,U., Smailus,D.E.,
            Schnerch,A., Schein,J.E., Jones,S.J. and Marra,M.A.
  TITLE     Generation and initial analysis of more than 15,000 full-length
            human and mouse cDNA sequences
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 99 (26), 16899-16903 (2002)
   PUBMED   12477932
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AF217985.1.
            On Jun 10, 2002 this sequence version replaced gi:7706672.
FEATURES             Location/Qualifiers
     source          1..1208
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="19"
                     /map="19q13.2"
     gene            <1..1208
                     /gene="RAB4B"
                     /db_xref="GeneID:53916"
     CDS             58..804
                     /gene="RAB4B"
                     /note="small GTP binding protein RAB4B"
                     /codon_start=1
                     /product="ras-related GTP-binding protein 4b"
                     /protein_id="NP_057238.2"
                     /db_xref="GI:21361509"
                     /db_xref="GeneID:53916"
                     /translation="MSVSLPLTVMVRERDWIGIHLFSLYLSLPVGIPDFGSIWSDFLF
                     KFLVIGSAGTGKSCLLHQFIENKFKQDSNHTIGVEFGSRVVNVGGKTVKLQIWDTAGQ
                     ERFRSVTRSYYRGAAGALLVYDITSRETYNSLAAWLTDARTLASPNIVVILCGNKKDL
                     DPEREVTFLEASRFAQENELMFLETSALTGENVEEAFLKCARTILNKIDSGELDPERM
                     GSGIQYGDASLRQLRQPRSAQAVAPQPCGC"
ORIGIN
        1 tggcaagtac tacataagag ttatctgtta ctgttaccta ttggtttgtt tctgcctatg
       61 tctgtttctc tccctctaac tgtgatggtt agagagagag attggattgg aatccatctt
      121 ttttccctgt atctttctct ccctgtgggt atccctgatt ttggctccat ctggtcagac
      181 ttcctcttca aattcctggt gattggcagt gcaggaactg gcaaatcatg tctccttcat
      241 cagttcattg agaataagtt caaacaggac tccaaccaca caatcggcgt ggagtttgga
      301 tcccgggtgg tcaacgtggg tgggaagact gtgaagctac agatttggga cacggctggc
      361 caggagcggt ttcggtcagt gacgcggagt tattaccgag gggcggctgg agccctgctg
      421 gtgtacgaca tcaccagccg ggagacatac aactcactgg ctgcctggct gacggatgcc
      481 cgcaccctgg ccagccccaa catcgtggtc atcctctgtg gcaacaagaa ggacctggac
      541 cctgagcggg aggtcacttt cctggaggcc tcccgctttg cccaggagaa tgagctgatg
      601 ttcctggaga ccagcgctct cacaggcgag aacgtggagg aggcgttcct caagtgtgcc
      661 cgcactatcc tcaacaagat tgactcaggc gagctagacc cggagaggat gggctctggc
      721 attcagtacg gggatgcgtc cctccgccag cttcggcagc ctcggagtgc ccaggccgtg
      781 gcccctcagc cgtgtggctg ctgagctctg tggagccagc tcacctgttc tccaggacca
      841 gccctgctgg ggcccaggcc caggctctga gaggccgtgt cctaacctgc cctggccccg
      901 gagaagctac gttgccacct gtcccccttc cctggcctgg tggggcctgg ctttggggca
      961 agactgagcc acgggggaag ggggaatccc gtacctgctg ctgcttcctc tgtcttggct
     1021 aacgtctgtc cccctgaacc cctaaccata tcccaagagc tcccaaagcc tgagaccagg
     1081 gtcatttgtc cccaactccc catctggccc tgctgttgct agtacctgtt atttattacc
     1141 tggaggcctg tccagcaccc accctacccc cataaagcat tgtttacaaa aaaaaaaaaa
     1201 aaaaaaaa
//

NM_016154.1

1:  NM_016154. Reports  ...[gi:7706672] The record has been replaced by NM_016154.2

LOCUS       NM_016154               1168 bp    mRNA    linear   PRI 24-AUG-2001
DEFINITION  Homo sapiens RAB4B, member RAS oncogene family (RAB4B), mRNA.
ACCESSION   NM_016154
VERSION     NM_016154.1  GI:7706672
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1168)
  AUTHORS   Huang,C., Wu,T., Xu,S., Gu,W., Wang,Y., Han,Z. and Chen,Z.
  TITLE     Novel genes expressed in hematopoietic stem/progenitor cells from
            myelodysplastic syndromes patient
  JOURNAL   Unpublished
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AF165522.1.
            [WARNING] On Jun 10, 2002 this sequence was replaced by a newer
            version gi:21361508.
            COMPLETENESS: full length.
FEATURES             Location/Qualifiers
     source          1..1168
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="19"
                     /map="19q13.2"
                     /cell_type="hematopoietic stem/progenitor cell"
     gene            1..1168
                     /gene="RAB4B"
                     /db_xref="GeneID:53916"
     CDS             124..765
                     /gene="RAB4B"
                     /codon_start=1
                     /product="ras-related GTP-binding protein 4b"
                     /protein_id="NP_057238.1"
                     /db_xref="GI:7706673"
                     /db_xref="GeneID:53916"
                     /translation="MAETYDFLFKFLVIGSAGTGKSCLLHQFIENKFKQDSNHTIGVE
                     FGSRVVNVGGKTVKLQIWDTAGQERFRSVTRSYYRGAAGALLVYDITSRETYNSLAAW
                     LTDARTLASPNIVVILCGNKKDLDPEREVTFLEASRFAQENELMFLETSALTGENVEE
                     AFLKCARTILNKIDSGELDPERMGSGIQYGDASLRQLRQPRSAQAVAPQPCGC"
     misc_feature    148..630
                     /gene="RAB4B"
                     /note="arf; Region: ADP-ribosylation factor family"
                     /db_xref="CDD:pfam00025"
     misc_feature    148..630
                     /gene="RAB4B"
                     /note="RAB; Region: Rab subfamily of small GTPases"
                     /db_xref="CDD:smart00175"
     misc_feature    148..609
                     /gene="RAB4B"
                     /note="RAS; Region: Ras subfamily of RAS small GTPases"
                     /db_xref="CDD:smart00173"
     misc_feature    151..663
                     /gene="RAB4B"
                     /note="ras; Region: Ras family"
                     /db_xref="CDD:pfam00071"
     misc_feature    151..498
                     /gene="RAB4B"
                     /note="ARF; Region: ARF-like small GTPases"
                     /db_xref="CDD:smart00177"
     misc_feature    151..495
                     /gene="RAB4B"
                     /note="SAR; Region: Sar1p-like members of the Ras-family
                     of small GTPases"
                     /db_xref="CDD:smart00178"
     misc_feature    160..639
                     /gene="RAB4B"
                     /note="RHO; Region: Rho (Ras homology) subfamily of
                     Ras-like small GTPases"
                     /db_xref="CDD:smart00174"
     misc_feature    163..633
                     /gene="RAB4B"
                     /note="RAN; Region: Ran (Ras-related nuclear proteins)
                     /TC4 subfamily of small GTPases"
                     /db_xref="CDD:smart00176"
     misc_feature    286..669
                     /gene="RAB4B"
                     /note="GTP_EFTU; Region: Elongation factor Tu family"
                     /db_xref="CDD:pfam00009"
ORIGIN
        1 astncsngth tntynchcka gtaggaagga gccgttgctg tagccggagt ggagcggctg
       61 ccagccgagg agcaggcgcg gccgcggcgc catattgcgg ccctcagcgg ccgcgaccga
      121 gtcatggctg agacctacga cttcctcttc aaattcctgg tgattggcag tgcaggaact
      181 ggcaaatcat gtctccttca tcagttcatt gagaataagt tcaaacagga ctccaaccac
      241 acaatcggcg tggagtttgg atcccgggtg gtcaacgtgg gtgggaagac tgtgaagcta
      301 cagatttggg acacggctgg ccaggagcgg tttcggtcag tgacgcggag ttattaccga
      361 ggggcggctg gagccctgct ggtgtacgac atcaccagcc gggagacata caactcactg
      421 gctgcctggc tgacggatgc ccgcaccctg gccagcccca acatcgtggt catcctctgt
      481 ggcaacaaga aggacctgga ccctgagcgg gaggtcactt tcctggaggc ctcccgcttt
      541 gcccaggaga atgagctgat gttcctggag accagcgctc tcacaggcga gaacgtggag
      601 gaggcgttcc tcaagtgtgc ccgcactatc ctcaacaaga ttgactcagg cgagctagac
      661 ccggagagga tgggctctgg cattcagtac ggggatgcgt ccctccgcca gcttcggcag
      721 cctcggagtg cccaggccgt ggcccctcag ccgtgtggct gctgagctct gtggagccag
      781 ctcacctgtt ctccaggacc agccctgctg gggcccaggc ccaggctctg agaggccgtg
      841 tcctaacctg ccctggcccc ggagaagcta cgttgccacc tgtccccctt ccctggcctg
      901 gtggggcctg gctttggggc aagactgagc cacgggggaa gggggaatcc cgtacctgct
      961 gctgcttcct ctgtcttggc taacgtctgt ccccctgaac ccctaaccat atcccaagag
     1021 ctcccaaagc ctgagaccag ggtcatttgt ccccaactcc ccatctggcc ctgctgttgc
     1081 tagtacctgt tatttattac ctggaggcct gtccagcacc caccctaccc ccataaagca
     1141 ttgtttacaa aaaaaaaaaa aaaaaaaa
//