        <--   m4  -----> <---- m3 ------> <-----m2 ------> <- m1 --
                        |                |                |        
        0000000001111122|0000000001111122|0000000001111122|000000001
        0112344596788801|0112344596788801|0112344596788801|011234458
        8584936650337844|8584936650337844|8584936650337844|858493665
        ____________________________________________________________
        GTCGGGTGGTAGCGGG|GTCGGGTGGTAGCGGG|GTCGGGTGGTAGCGGG|GTCGGGTGG
L1spa   ..TA............|...A............|...A...A..G.....|A...A.A..
81109   ---------.G.....|.......A..G.....|................|....A....
81111   -G.......C..G...|..T..........AT.|AGT.........G...|....A....
L1orl   AGT.........G...|.......A..G.....|...AA.......G...|....A....
81112   --------.C..G...|...A...A..G....A|AG.....A..G....A|AG......C
81113   ----------------|----------------|--------........|...A...A.
81107   --------AC......|....A...........|....A..A......T.|....A....
81106   AGT.........G...|.......A..G...T.|AG......A.......|....A.A..
81104   ----------------|--------..G...T.|AG......A.......|....A...C
81114   -----A....G..AT.|A..AA......A....|...A...AA.GA....|....A....
81108   ----------------|--------..G...T.|.A......A.......|....A....
81105   ----------------|---------.......|...A...A..G.....|....A.A..
81110   AG......A.......|.....A....G...T.|AG......A....A..|...A.A..C
5938A   ----------------|----------------|--------..G...T.|AA......C
5938B   ...AA...........|.......A..G...T.|AG......A.......|....A.A..
         <----- m8 ----> <---- m7-------> <---- m6 ------> <------ m5 ---->
                        |                |                |     
        0000000001111122|0000000001111122|0000000001111122|0000000001111122|
        0112344596788801|0112344596788801|0112344596788801|0112344596788801|
        8584936650337844|8584936650337844|8584936650337844|8584936650337844|
        ____________________________________________________________________
        GTCGGGTGGTAGCGGG|GTCGGGTGGTAGCGGG|GTCGGGTGGTAGCGGG|GTCGGGTGGTAGCGGG|
L1spa   ---------.G.....|..TA............|...A....A.G...T.|AA......A.C.....|
81109   ----------------|----------------|----------------|----------------|
81111   ----------------|----------------|----------------|----------------|
L1orl   ----------------|----------------|---------.G.....|...A...AA.G..AT.|
81112   ----------------|----------------|----------------|----------------|
81113   ----------------|----------------|----------------|----------------|
81107   ----------------|----------------|----------------|----------------|
81106   ----------------|----------------|----------------|---------.G..AT.|
81104   ----------------|----------------|----------------|----------------|
81114   ----------------|----------------|----------------|----------------|
81108   ----------------|----------------|----------------|----------------|
81105   ----------------|----------------|----------------|----------------|
81110   ----------------|---------.G...T.|AG......A.......|..........G...T.|
5938A   ----------------|----------------|----------------|----------------|
5938B   AG......A.G.....|A......AA.G...T.|AGT......C..G...|.......A..G.....|

            m9 -------->|
                01111122|
                96788801|
                50337844|
                _________
5938B           ..G...T.|


Also T at 40 in 5938Bm3, 81110m3, 81106m2
Also G at 114 in 5938Bm3, 81105m2
Also A at 55 in 81108m1 and 5938Bm1
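The alignment blocks above use a compact notation. Assuming the three digit rows under each monomer label are read top to bottom as hundreds, tens, and units, each column is one numbered position. The top sequence line is the consensus, a dot means agreement with it, a dash means the monomer is absent, and a letter is a substitution. The Python sketch below (the function names are hypothetical) turns one monomer block back into explicit substitutions under that reading:

```python
def read_positions(header_rows):
    """Read column positions written vertically, one digit per header row."""
    return [int("".join(column)) for column in zip(*header_rows)]


def decode_monomer(consensus, row, positions):
    """List (position, consensus base, observed base) for each substitution.

    '.' marks agreement with the consensus and '-' marks an absent monomer;
    both are skipped. A truncated row (e.g. m1) is decoded as far as it goes.
    """
    calls = []
    for pos, ref, obs in zip(positions, consensus, row):
        if obs in ".-":
            continue
        calls.append((pos, ref, obs))
    return calls


# Header digit rows and consensus line copied from the tables above.
HEADER = ["0000000001111122", "0112344596788801", "8584936650337844"]
CONSENSUS = "GTCGGGTGGTAGCGGG"
POSITIONS = read_positions(HEADER)

# m3 of 81110 from the first table.
print(decode_monomer(CONSENSUS, ".....A....G...T.", POSITIONS))
# → [(43, 'G', 'A'), (173, 'A', 'G'), (204, 'G', 'T')]
```

Under this reading, the T.|AG motif is a T at 204 (with the consensus base at 214) at the 3' end of one monomer, followed by an A at 8 and a G at 15 at the 5' end of the next.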
MOTIFS:

Derived from      A..G...T.|AG...

1a        |.......A..G...T.|AG......A.......|....
1b            A...AA.G..AT.|AGT......C..G...|...
1c            A....A.G...T.|AA......A
1d            A...A..G....A|AG......
1e         .......A..G.....|...AA......
1f        |...A...A..G.....|

L1spa specific:

2    .....|..TA............|...A...

Repeats within arrays:

          m9    m8   m7    m6    m5    m4    m3    m2    m1
L1spa           <     2     >        <     2     >          
81112                                        < 1d  >< 1d > 
81107                                       < 39A>< 39A>< 39A>
81106                            <   1b    > <   1a    >
81110                <   1a    > <   1a    > <   1a    >
5938B     <   1a   > <   1b    >             <   1a    >

Others with motif 1:

81114                            < ..1e    > <   1b... >
81104                                        <   1a    >
81108                                        <  ~1a    >
81111                                        <   1b    >
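The motif placements in these tables span monomer boundaries, with '|' marking where one monomer ends and the next begins. One way to check such placements mechanically is to slide a motif along an array, aligning its first '|' with each boundary in turn and counting mismatches. The Python sketch below assumes the table notation (dots in a motif are literal consensus, not wildcards, and dashed columns are ignored). `place_motif` is a hypothetical helper, and the tolerance of one mismatch and the minimum of eight compared columns are arbitrary illustrative choices. For a motif written with a leading '|' (such as 1a), the boundary reported is the one at that leading '|'.

```python
def place_motif(segments, motif, max_mismatches=1, min_compared=8):
    """Return (boundary, mismatches) for each boundary where `motif` fits.

    segments: (name, sequence) pairs ordered 5'->3', e.g. m8 ... m1.
    motif: parts separated by '|'. The first part is right-aligned to the end
    of one monomer; each later part is left-aligned to the start of the next.
    """
    parts = motif.split("|")
    hits = []
    for i in range(len(segments) - 1):
        mismatches = compared = 0
        for k, part in enumerate(parts):
            if i + k >= len(segments):
                break
            seq = segments[i + k][1]
            if k == 0:
                # Right-align the first part with the 3' end of monomer i.
                n = min(len(part), len(seq))
                pairs = zip(part[len(part) - n:], seq[len(seq) - n:])
            else:
                # Later parts start at the 5' end of the following monomers.
                pairs = zip(part, seq)
            for expected, observed in pairs:
                if expected == "-" or observed == "-":
                    continue  # absent monomer: no information
                compared += 1
                mismatches += expected != observed
        if compared >= min_compared and mismatches <= max_mismatches:
            boundary = f"{segments[i][0]}/{segments[i + 1][0]}"
            hits.append((boundary, mismatches))
    return hits


# 81110 as transcribed from the two tables (m8 is entirely absent).
ARRAY_81110 = [
    ("m8", "----------------"), ("m7", "---------.G...T."),
    ("m6", "AG......A......."), ("m5", "..........G...T."),
    ("m4", "AG......A......."), ("m3", ".....A....G...T."),
    ("m2", "AG......A....A.."), ("m1", "...A.A..C"),
]

# Central part of motif 1a, spanning the T.|AG boundary.
print(place_motif(ARRAY_81110, "..G...T.|AG......A......."))
# → [('m7/m6', 0), ('m5/m4', 0), ('m3/m2', 1)]
```

The three hits correspond to the three copies of 1a listed for 81110 under "Repeats within arrays".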
 

Other similarities between elements:

          m9   m8    m7    m6    m5    m4    m3    m2    m1
L1spa                   <1c>                      <1f>  <46A>
5938A                                                <1c>
81105                                             <1f>  <46A>

L1Orl                       <     1b    > <    1e   >
5938B     <   1a   > <    1b    > <     1e   >



Array expansions

Generally, a motif involving T.|AG tends to fall on m_odd/m_even boundaries and to alternate with a motif lacking that sequence on the m_even/m_odd boundaries. This presumably reflects the development of a T.|AG motif on the m3/m2 boundary of an early founder, followed by expansion in 2-monomer units.
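The proposed history can be illustrated with a toy model in which each monomer is reduced to its 5' feature (head) and 3' feature (tail), and a boundary motif is the tail of one monomer joined to the head of the next. The Python sketch below, whose labels and helper names are hypothetical, places T.|AG on the m3/m2 boundary of a four-monomer founder and then duplicates the m3-m2 unit in tandem, renumbering monomers from the 3' end as in the tables:

```python
from typing import NamedTuple


class Monomer(NamedTuple):
    head: str  # 5' feature of the monomer (e.g. "AG" at positions 8/15)
    tail: str  # 3' feature of the monomer (e.g. "T." at positions 204/214)


def boundaries(array):
    """Label each boundary m_n/m_(n-1), numbering monomers from the 3' end."""
    n = len(array)
    return [
        (f"m{n - i}/m{n - i - 1}", f"{array[i].tail}|{array[i + 1].head}")
        for i in range(n - 1)
    ]


def tandem_duplicate(array, start, length=2):
    """Insert a second copy of array[start:start + length] right after the first."""
    unit = array[start:start + length]
    return array[:start + length] + unit + array[start + length:]


# Hypothetical founder: m4, m3, m2, m1 listed 5'->3'; "x", "h3", "t2", "h1"
# etc. stand for features other than the T.|AG motif.
founder = [Monomer("h4", "x"), Monomer("h3", "T."), Monomer("AG", "t2"), Monomer("h1", "t1")]
expanded = tandem_duplicate(founder, start=1)  # duplicate the m3-m2 unit

for label, motif in boundaries(expanded):
    print(label, motif)
```

After one duplication, T.|AG sits on m5/m4 and m3/m2 (odd/even) and alternates with the t2|h3 junction on m4/m3 (even/odd). A second 2-monomer duplication of the same kind puts T.|AG on m7/m6, m5/m4, and m3/m2, which resembles the arrangement seen in 81110.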

There are several occasions where more recent expansions of peculiar motifs apparently occurred:

The ancestral state

Comparison to m1 and m2 of the M lineage founder suggests that, in each case, the ancestral state is the one shown on the top (consensus) line and the variants listed in the table are derived. The exception is 43A, which appears to carry no useful information. Since the variants are derived, the higher-frequency variants are presumably the older ones.
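As a minimal illustration of the frequency argument, restricted to the nine tabulated m1 columns (the additional variants noted below the tables are left out), the Python sketch below counts how many elements carry each substitution. Column indices are 0-based within the m1 block, the dictionary is a transcription of the first table, and `tally_variants` is a hypothetical helper:

```python
from collections import Counter

# m1 segments transcribed from the first table (nine tabulated columns).
M1 = {
    "L1spa": "A...A.A..", "81109": "....A....", "81111": "....A....",
    "L1orl": "....A....", "81112": "AG......C", "81113": "...A...A.",
    "81107": "....A....", "81106": "....A.A..", "81104": "....A...C",
    "81114": "....A....", "81108": "....A....", "81105": "....A.A..",
    "81110": "...A.A..C", "5938A": "AA......C", "5938B": "....A.A..",
}


def tally_variants(segments):
    """Count the segments carrying each (column, base) substitution.

    '.' (consensus) and '-' (absent) are not counted.
    """
    counts = Counter()
    for seq in segments:
        for col, base in enumerate(seq):
            if base not in ".-":
                counts[(col, base)] += 1
    return counts


counts = tally_variants(M1.values())
for (col, base), n in counts.most_common():
    print(f"column {col}: {base} in {n}/{len(M1)}")
```

The first line printed is `column 4: A in 11/15`, the most widespread m1 substitution in the table.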

It therefore seems that the T.|AG motif is ancestral to the AT.|AGT motif. Since the AT.|AGT motif appears upstream of a T.|AG motif several times, and T.|AG most commonly appears on the m3/m2 boundary, it is reasonable to propose that the AT.|AGT motif arose on the m5/m4 boundary from a copy of the T.|AG motif created by an initial duplication. AF081111 would then represent a subsequent collapse of the array that brought the AT.|AGT motif down to m3/m2.
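The proposed collapse can be pictured with the same kind of toy model: each monomer is written as a (5' feature, 3' feature) pair, a boundary motif joins the 3' feature of one monomer to the 5' feature of the next, and monomers are numbered from the 3' end. In the Python sketch below (labels and helpers are hypothetical), AT.|AGT sits on m5/m4 upstream of a T.|AG on m3/m2, and deleting the two monomers between it and m1 moves it down to m3/m2 without changing the motif itself:

```python
def find_boundary(array, tail, head):
    """Return the m_n/m_(n-1) label of the first boundary joining `tail` to `head`."""
    n = len(array)
    for i in range(n - 1):
        if array[i][1] == tail and array[i + 1][0] == head:
            return f"m{n - i}/m{n - i - 1}"
    return None


def delete_monomers(array, start, length=2):
    """Remove `length` consecutive monomers beginning at index `start` (5'->3')."""
    return array[:start] + array[start + length:]


# Hypothetical five-monomer array, each monomer written as (5' feature, 3' feature).
expanded = [("h5", "AT."), ("AGT", "t4"), ("h3", "T."), ("AG", "t2"), ("h1", "t1")]
collapsed = delete_monomers(expanded, start=2)  # lose m3 and m2

print(find_boundary(expanded, "AT.", "AGT"))   # m5/m4
print(find_boundary(collapsed, "AT.", "AGT"))  # m3/m2
```

In this model the T.|AG junction on m3/m2 is removed by the same deletion.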

If there is any phylogenetic signal:

L1spa group:

L1spa, AF081106, and AF081105 are grouped by m1-46. AF081106 shows no other similarity to these two in the promoter and has no affinity with them in the body, so m1-46 is a single-base homoplasy for AF081106. AF081105 and L1spa remain closely related through the end of the AF081105 array. Because these two grouped strongly apart from each other throughout the body, a recombination between them must be invoked somewhere between the last informative body marker (3397) and the promoter. L1spa and AF081109, which clustered strongly throughout the body, could be related in the promoter through a shift of m2 back to m1, but the case is weak. 5938A, which clusters with the TFnode1 group throughout the body, has a short promoter motif found elsewhere only in L1spa, at m6/m5, so it could be a recombinant, but again the case is weak. Hence the best candidate for an L1spa-group-specific promoter is L1spa itself, which at least appears quite different from the others. However, more examples will be needed to establish whether this assignment is meaningful.

TFnode1 group

AF081105 is removed from this group by the recombination indicated above, leaving AF081108, AF081110, AF081114, and AF081104. Of these, AF081110 and AF081104 match particularly well throughout the promoter and share the variant at 185 in m1. AF081108 also matches well and can be considered a minor variant of these two. 5938A could also be considered a match, like 81108, except that its characteristic motif 3 is shifted down from m3/m2 to m2/m1, so it is unclear whether 5938A stays with TFnode1 or is more closely related to L1spa. AF081112, AF081104, AF081110, and 5938A are grouped by the only shared variant in the otherwise constant part of m1 (185m1 C). Three of these (81104, 81110, 5938A) are also grouped with TFnode1 by the body, whereas in 81112 the shared variant is regarded as a single-base homoplasy. These three can therefore be taken as representing full-length, unrecombined TFnode1 sequences. The longest promoter array among them is that of AF081110, which repeats the same two-monomer unit three times. This may reflect a re-expansion of the promoter array specific to the TFnode1 lineage, or a private re-expansion in AF081110. However, the promoter motif characteristic of this group is also found in TFnode2 and is probably ancestral, so there is no clear indication of a TFnode1-specific promoter variant.
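The grouping by the m1 C variant can be checked directly against the first table, where this C is the only C among the tabulated m1 columns and occupies the last of them (0-based index 8). The Python sketch below, using a hypothetical `carriers` helper and a transcription of the m1 segments, lists the elements that carry it:

```python
# m1 segments transcribed from the first table (nine tabulated columns).
M1 = {
    "L1spa": "A...A.A..", "81109": "....A....", "81111": "....A....",
    "L1orl": "....A....", "81112": "AG......C", "81113": "...A...A.",
    "81107": "....A....", "81106": "....A.A..", "81104": "....A...C",
    "81114": "....A....", "81108": "....A....", "81105": "....A.A..",
    "81110": "...A.A..C", "5938A": "AA......C", "5938B": "....A.A..",
}


def carriers(segments, col, base):
    """Return, sorted, the elements whose segment has `base` at 0-based column `col`."""
    return sorted(
        name for name, seq in segments.items() if col < len(seq) and seq[col] == base
    )


print(carriers(M1, 8, "C"))  # ['5938A', '81104', '81110', '81112']
```

The result matches the four elements named above.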

TFnode2 group

AF081106 (a TFnode2 sequence) also matches AF081104/AF081110 very well through m3, but upstream it has a variant motif that is also found in L1Orl, AF081111, and AF081114. This probably reflects the arrangement ancestral to AF081110, which then lost the upstream motif while triplicating the downstream motif. AF081114 then fits in its expected position below AF081106 by retaining the upstream motif. 5938B is particularly close to L1orl, although the matching monomers are shifted by one monomer position. Of the sequences clustered as TFnode2, 5938B, AF081106, L1Orl, and AF081111 retain high affinity through retention of motif 1, although in AF081111 it has collapsed down to m2 by deletion. Thus the arrays both expand and collapse. L1Orl and AF081106 also share a variant with the L1spa group at 3430, which might be used to argue that they are upper TFnode2. Of the others that should belong to TFnode2, AF081113 has too short a promoter array to place with confidence, and what it has appears to go its own way. AF081112 seems to have collapsed something related to motif 2 down into m1 and then re-expanded it. AF081107 also seems mostly to go its own way.

In summary:

  1. There is evidence both for deletion of monomers within the promoter arrays and for repeated expansions, often by 2-monomer units.
  2. There is evidence for ancestral promoter arrays differentiating and leaving progeny that differentially lost one of the two motifs. It has not yet been possible to distinguish conversion or recombination among elements from simple paralogy, that is, descent from different monomers of an ancestral array.
  3. One additional recombinant was found, bringing the total to two TF-to-TF recombinants and one TF-to-A2 recombinant among 14 full-length sequences. Together with two long-patch gene conversions in the body, this brings the overall density of identifiable recombination points to 1 per 14,000 bp (not counting the recombination and gene conversion affecting the TF masters).
  4. The best candidates for unrecombined representatives of the 3 subgroups are: