Comparison of LINE-1 and retroviral-like repetitive sequence phylogenies in mice.

Stephen C. Hardies, Yingping Zhao, Lanxin Zhou, Christopher P.D. Jacobs, Liping Wang, N. Carol Casavant, and Rhonda Greene-Till

Dept. of Biochemistry, Univ. of Texas Health Science Center in San Antonio, TX 78284

ABSTRACT

The phylogeny of LINE-1 families in Mus musculus domesticus and M. spretus has been extended to higher resolution. The recently reported active F-type LINE-1 from the reeler locus was found to belong to the previously known L1Md4 clade in its 3' sequence. The previously defined spretus LINE-1 families coalesce with the L1Md4/L1reeler lineage nearly halfway down from its split with the active L1MdA2 subfamily. This implied that there should also be a spretus A2 clade, and that the A2 and L1Md4/L1reeler clades appear to have experienced the spretus/domesticus split at very different times. A new retroviral-like repetitive sequence, named MuERVC, has been characterized in comparison to previously known retroviral-like families. Like LINE-1, each family appears to have been derived from a different nondefective ancestor, but unlike LINE-1, the MuERVC families appear to amplify concomitantly with the loss of functional reading frames. Dispersal of MuERVC sequences appears to be more episodic than that of LINE-1 subfamilies. Examples are now known of both LINE-1 and MuERVC that recently crossed from spretus to domesticus within an introgressing haplotype. GM51847

Interspecific gene transfer monitored by LINE-1.

L1EL111 and L1C105 are two spretus-specific LINE-1s found in the Mus musculus domesticus strain C57BL/6J. They are estimated to have been derived from M. spretus within the last 50,000 yrs. They map to two different chromosomes.

L1C105 is part of a spretus-derived haplotype.

Next to L1C105 is an endogenous retrovirus-like sequence named MuERVC. Downstream, between two ancient LINE-1 inserts, we have identified a 500 bp stretch of unique sequence (the open box above). The MuERVC insert at the C105 locus predated the spretus/musculus speciation (assayed with the E PCR primer set). An identical 1140 bp MuERVC sequence was recovered from the C105 locus of both C57BL/6J and M. spretus (using the M PCR primer set). This requires that MuERVC-C105 was cotransferred with L1C105 within the last 50,000 yrs. [For any mouse sequence without conservative selection, two alleles diverge at about 2%/Myr from the time of their common ancestor. This sequence is known to have no selective constraints, since its reading frame is disrupted. So an identical sequence over 1140 bp in both the domesticus and spretus MuERVC alleles requires an exchange within the last 50,000 yrs., irrespective of any other information.]
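The bracketed argument can be made concrete with a short calculation. The sketch below uses only the 2%/Myr pairwise divergence rate and the 1140 bp length given above; treating substitutions as Poisson-distributed is an added assumption (the text does not specify a model), and the function names are hypothetical.

```python
import math

# Values taken from the text: pairwise divergence rate and length of the identical stretch.
PAIRWISE_RATE_PER_MYR = 0.02  # fraction of sites differing per Myr between two neutral alleles
LENGTH_BP = 1140


def expected_differences(t_myr: float) -> float:
    """Expected number of differences between two alleles whose common ancestor lived t_myr ago."""
    return PAIRWISE_RATE_PER_MYR * LENGTH_BP * t_myr


def prob_identical(t_myr: float) -> float:
    """Chance of zero differences, ASSUMING Poisson-distributed substitutions."""
    return math.exp(-expected_differences(t_myr))


# 0.05 Myr = 50,000 yrs; 1.0 Myr = the spretus/musculus speciation.
for t in (0.05, 0.5, 1.0):
    print(f"{t} Myr: {expected_differences(t):.2f} expected differences, "
          f"P(identical) = {prob_identical(t):.1e}")
```

Under these assumptions, a common ancestor 50,000 yrs ago leaves about 1.1 expected differences, so identity is unremarkable. A split at the 1 Mya speciation leaves about 23, which makes an identical 1140 bp stretch vanishingly unlikely.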

Alleles of unique sequence flanking MuERVC-C105 were recovered from a variety of musculus and spretus strains (using the U PCR primer set). The C57BL/6J allele is also spretus-like, although we are still looking for a perfect match.

Dendrogram of unique sequence alleles at the C105 locus. Numbers of base substitutions are indicated on branches. The division at 1 Mya is the spretus/musculus speciation (the time when most spretus and musculus homologues shared their last common ancestor). Mice marked with an asterisk map to the wrong side of the division and are hence candidates for recipients of interspecific gene transfer. SMZ retains an ancestral polymorphism. Also shown are alleles detected by length polymorphism with the E primers and their transitions (in parentheses).

An independent interspecific transfer at the C105 locus, of a domesticus allele into M. spretus, is also detected (from C57BL/KsJ to SEG and to an outbred spretus individual from Morocco); this transfer is statistically much clearer (p = 0.007). By sequence, the transferred allele is of domesticus origin, so the unusual history of C57BL/KsJ is not an issue. Unlike the classic inbred strains, the two spretus recipients should be representative of the wild population. This second transfer supports the conclusion that these interspecific transfers happened in nature.

LINE-1 Lineages in the Mouse.

Background: Subfamilies are often reported for LINE-1; however, it is often ambiguous whether they are different age groups on the same lineage or independent coexisting LINE-1 lineages. Coexisting lineages are useful to map because they reveal whether different behaviors can exist simultaneously in the host-transposon relationship.

Three different LINE-1 lineages are now characterized as they cross the musculus/spretus speciation.

Legend. Each clade shown (other than those marked with question marks) has 500-10,000 members. Clades marked with question marks are known only from 2-5 individual sequences. The number of diagnostic base substitutions assigned to each lineage is given. Triangles indicate tentative positions of peak output. The endpoints of the clades are scaled vertically according to the degree of divergence among their youngest known members. The indicated speciation time is 1 Mya.

The A2 lineage, carrying the original mouse prototypical LINE-1 named L1MdA2, is now recognized to have a sister family in spretus joining at the anticipated speciation time for musculus and spretus. L1MdA2 has a prototypical A-type promoter.

The L1MdZ lineage, previously unrecognized, similarly splits into musculus and spretus lineages at the anticipated speciation time. The promoter type in this lineage is unknown. It is unclear whether the lineage persisted after an initial surge of output.

The L1reeler/L1Md4 lineage is now seen to have shared a common ancestor about 0.5 Myr after the anticipated speciation time. This is apparently an example of gene exchange between the species that carried an active LINE-1 master and successfully populated the second species. On the spretus side, this lineage split into the previously characterized MS7024 clade and the aggressively amplifying MS475 clade. On the musculus side, this lineage became the previously known L1Md4 clade, which is now known to include L1reeler. L1reeler is a recent spontaneous insert and carries an F-type promoter.

Did the interspecific transfer establish a new stable lineage?

Horizontal transfer is often associated with transposons that tend to amplify impressively and then fail in the new host. LINE-1 is usually associated with stable maintenance in the host germline over intervals of at least 100 Myr. It is therefore of interest to see what happens after the unusual interspecific transfer of a LINE-1 lineage. So far, it seems that the new lineage became established and has persisted. There is some evidence of a preexisting lineage of the same type in M. domesticus, which may have been displaced by the invading lineage.

When did the L1reeler lineage acquire the F-type promoter?

It will be of interest to determine when the promoter switch occurred. An attractive hypothesis is that promoter alterations coincide with the generation of new lineages. This might reflect acquisition of expression in a new germline compartment as a means of escaping mechanisms that restrain the older lineage.

Are there Surges in LINE-1 Output?

L1 subfamilies are often discussed with the implication that they correspond to episodes of increased output (like SINEs). The term "master" LINE-1 is often applied to LINE-1 subfamilies, with the implication that a single locus underlies each episode of increased output. However, the fluctuations in output that have been documented last an order of magnitude longer than an individual active locus persists. A theory is developed below that treats surges of output as components of the host-transposon interaction, tied to adaptations in the LINE-1 sequence rather than to a particular locus. However, it is important to refine the observational basis of LINE-1 fluctuations in output to a finer time scale, and to sort out those that are merely driven by incidental changes in parameters such as mutation rate or host generation time.

The existence of unrecognized additional lineages can easily cause a false indication of a surge in output. We are applying intensive sampling methods to develop a more confident description of each of these lineages. See Casavant et al.

MuERVC

Background: C-type retroviruses can be considered transposons that are activated in somatic tissue and then travel through the bloodstream to establish new insertions in the germline. They are also, of course, capable of horizontal transmission. Horizontally transferring elements often amplify at first and then die out, leaving behind a population of defective inserts as a repetitive sequence family. An issue of interest is whether these elements amplify wildly and are then brought abruptly to extinction, or whether they amplify only slightly above their equilibrium condition and then fade more gradually.

MuERVC-C105 is the prototype of a new murine endogenous retrovirus-related repetitive sequence family of 50-100 copies in the mouse genome. It is derived from a C-type retrovirus (as opposed to VL30 and IAP which are derived from B-type retroviruses). The relationship of MuERVC-C105 to other C-type retroviruses and the murine endogenous retrovirus-related sequences MuRRS and MuRVY is shown below.

Density of defects in MuERVC, MuRRS, & MuRVY. The upper row of dots marks in-frame terminators; the lower row marks frameshifts.

Nonsynonymous/synonymous (N/S) ratios during the descent of both defective and nondefective C-type retroviruses. The N/S ratios reveal the following:

MuERVC is well positioned to provide observational data on a single amplification event. In contrast, MuRRS is known to have undergone a compound amplification generating families sharing diagnostic deletions of the pol gene. MuRVY is confined to the Y chromosome and may have amplified by a nonretroviral mechanism.

Multiple copies of MuERVC were recovered by PCR from M. spretus (S) and C57BL/6J (M) and sequenced. The tree is shown below, displaying only splits with bootstrap values >95.

  1. M8 is derived from a separate nondefective progenitor and is a candidate for another new MuERV family.
  2. (M1,C105,S6,S8) are allelic and are involved in the interspecific transfer discussed on the earlier panel. SB was trans-mobilized after acquiring frame defects.
  3. The N/S ratio of all branches below the radiation approaches that of random drift. There is no clear evidence of any activity after the initial radiative amplification.
  4. The average divergence back to the amplification is only 4%.
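The N/S comparisons above contrast nonsynonymous with synonymous substitutions. As a rough illustration only (not the authors' procedure), the sketch below tallies raw nonsynonymous and synonymous differences between two aligned coding sequences using the standard genetic code. It skips codons that differ at more than one position or contain gaps or ambiguous bases, and it does not normalize by the numbers of synonymous and nonsynonymous sites, as a published N/S estimate (e.g. the Nei-Gojobori method) would. The function name `count_n_s` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_n_s(seq1: str, seq2: str) -> tuple[int, int]:
    """Return raw (nonsynonymous, synonymous) counts for codons differing at exactly one position."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    n = s = 0
    usable = min(len(seq1), len(seq2)) // 3 * 3
    for i in range(0, usable, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE:
            continue  # gap or ambiguous base: skip this codon
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # identical, or several differences (ambiguous pathway): skip
        if CODE[c1] == CODE[c2]:
            s += 1
        else:
            n += 1  # includes changes to or from a stop codon
    return n, s


# ATG->ATA is Met->Ile (nonsynonymous); AAA->AAG is Lys->Lys (synonymous).
print(count_n_s("ATGAAACTG", "ATAAAGCTG"))  # (1, 1)
```

For a pseudogene drifting without constraint, the raw nonsynonymous count dominates simply because most random point changes alter the amino acid; that is why site normalization matters before comparing ratios across lineages.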

Filtering out noise produced the following tentative picture of how the amplification proceeded:

  1. There is still excessive discordancy among different positions after accounting for coincidental mutation and CG hypervariability. This suggests a substantial component of retroviral recombination among propagating sequences during the amplification.
  2. There are some residual recombination patterns in evidence wherein several positions on the left of a sequence place it differently than several positions on the right.
  3. The N/S of substitutions confined to the radiation is intermediate between that of a nondefective retrovirus and that of a pseudogene. The nondefective viruses are trans-mobilizing their own defective copies and recombining with them. The buildup of nondefective proviruses is therefore not keeping pace with the buildup of total copies.
  4. The total span of the visible portion of the amplification occurred within about 1% divergence. Since remnants can't be recovered until the copy number reaches 1/genome, this span only accounts for about the last 6 doublings. If the numbers of proviruses/genome doubled at the same pace since the original introduction into the species, there should have been a substantial period wherein the frequency of the element built up to the point where it is first observed.
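The "about 6 doublings" estimate in item 4 appears to follow from the 50-100 copy number given for MuERVC and the premise that remnants become recoverable only once copy number reaches about 1 per genome; this reading of the arithmetic is an assumption on our part. Under that assumption, the number of doublings is simply log2(final/initial). The helper name `doublings` is hypothetical.

```python
import math


def doublings(final_copies: float, initial_copies: float = 1.0) -> float:
    """Number of doublings needed to go from initial_copies to final_copies per genome."""
    return math.log2(final_copies / initial_copies)


# MuERVC copy-number range given in the text (50-100 copies per genome),
# starting from the ~1 copy/genome at which remnants first become recoverable.
for copies in (50, 100):
    print(copies, round(doublings(copies), 2))
```

This gives roughly 5.6 to 6.6 doublings, consistent with "about the last 6". Every doubling before copy number reached 1 per genome is invisible to this kind of sampling, which is the point item 4 makes about an unobserved build-up period.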

This system is amenable to detailed investigation.

Through sampling of mouse species that split off during the amplification, it should be possible to gain a level of information about this amplification event similar to that available for the LINE-1 system.

Selfish Transposons and Stability through Self-Restraint in Higher Eucaryotic Systems.

  1. Selfish Selection
    1. Axiom: Even if described as self-restrained or symbiotic, transposons will accumulate more aggressively replicating variants as much as possible.

    2. Axiom: Adaptive responses are required of the host to prevent an exponential buildup of transposons (run-away transposition).

    3. It is argued that properties such as self-restraint are not derived at this lowest level of description, but are emergent properties at a higher level of description.

  2. Insertional mutagenesis may restrain transpositional activity if sufficiently augmented.
    1. Insertional mutagenesis is diluted by diploidy. To select against the activity (as opposed to the presence) of a transposon, many of its inserts must be lethal/sterile mutations before they can segregate away from the parental locus (Charlesworth and Langley, Genetics 112:359-383 (1986)). Because of diploidy and the large amount of intergenic DNA in vertebrate genomes, apparently too few new inserts are immediately lethal/sterile for insertional mutagenesis to have a restraining effect.

    2. Creation of nonproductive inserts augments the effect of insertional mutagenesis. Mechanisms that force the transposon to make multiple inserts to get one successfully replicated copy increase the effect of insertional mutagenesis. Such mechanisms include creation of partial copies, trans-mobilization of defective templates, and epigenetic inactivation (methylation) of new copies.

    3. LINE-1 may be restrained by insertional mutagenesis. LINE-1 creates 10-20 truncated or mutated inserts per faithful copy. There is an unknown but probably small component of trans-mobilization of defective LINE-1 templates. LINE-1 is also thought to trans-mobilize the SINEs, of which there can be as many as 10x more in the genome than there are LINEs. Together, these factors may augment insertional mutagenesis for LINE-1 by more than 100x and yield a stabilized system of self-restraint.

  3. Oscillations in host-transposon interactions.
    1. Outbreaks of run-away transposition followed by host adaptations to contain them should lead to oscillations in transposon output.

    2. A history of multiple adaptations by the host is revealed by the variety of mechanisms employed in different host-transposon interactions.

      1. In Drosophila, excision and selection against transposon presence appears to maintain an equilibrium (Charlesworth & Langley, Ann. Rev. Genet. 23:251-287 (1989)).

      2. For LINE-1, there is no selection against the simple presence of LINE-1. It is proposed that selection acts instead against activity, as a result of augmented insertional mutagenesis, and that this augmentation came about as the host interfered with full-length insertions and expressed SINE RNAs as decoy templates.

      3. For endogenous retroviruses (for this purpose, a transposon that is expressed somatically and then passes through the bloodstream to reinfect the germline), trans-mobilization may be more of a factor. The host employs epigenetic modification as well as its immune system.

    3. Severe transposon epidemics are not observed. From first principles, one should expect severe oscillations between run-away transposition events and resistant host states, with considerable host mortality associated with selecting a resistant state. No such thing has ever been observed. Each of the above systems tends either toward a very similar equilibrium or toward relatively mild fluctuations.

  4. Stability and Self-Restraint as Emergent Properties
    1. Long-term evolution culls out extreme oscillations. Transposons that we see have a very long history of association with the host (although not necessarily the same host). Extreme oscillations carry the risk that either the transposon or the host will fail to adapt and be driven to extinction. Thus, over time, host/transposon interactions that have incorporated adaptations that damp the oscillations will be enriched. In this view, stability means less fluctuation, but not necessarily an equilibrium.

    2. Adaptations that damp oscillations are emergent regulatory mechanisms. Emergent properties of complex systems are properties that are not easily anticipated from the elementary interactions of the different units, but which tend to emerge as an aggregate result of many such interactions. For example, it makes no sense for the host to make a factor that causes LINE-1 to make more inserts. However, if, as a side effect of making factors that mostly inhibit LINE-1 insertion, the production of nonproductive inserts happens to increase, then the system may become stably regulated by insertional mutagenesis.

    3. Self-restraint simply means that the ways to cheat have become quite limited. Trans-repression has been criticized as a mechanism of self-restraint on the grounds that the repressed transposons will just mutate to ignore the repression. If the transposon represses itself (in cis) to avoid making too many copies and triggering some deleterious condition, then it may not be able to ignore the same repressor made by other copies of its family. It will still cheat if it can; cheating may just not be so easy. In this regard, if a transposon makes many more than 1 insert per host generation, it loses the opportunity to segregate away from its own progeny, leading to its quick elimination from the population. This effect alone may be enough to lead to widespread occurrence of cis (and hence trans) repression.

    4. Regulation with dynamic range.

      1. Trans-repression automatically increases transposon output if the host makes an adaptation that causes transposon copy number to fall. This mechanism has dynamic range: it can allow the transposon to adapt without acquiring new mutations (and the losses incumbent on selecting for those new mutations). Regulation with dynamic range may be especially prone to develop in a fluctuating selective environment.

      2. If SINEs are indeed a defense against LINEs, they would also have a kind of dynamic range. If an aggressive LINE-1 variant spreads in the population, it would cause increased output of SINEs. Since the new SINE inserts are also potentially templates for making SINE RNA, the dosage of SINE RNA is dynamically increased. More importantly, the segregation of new SINE templates into the population would be as fast or faster than the segregation of new aggressive LINE-1 copies. So the whole host population might be able to adapt without suffering serious mortality.

  5. Selfishness as an Emergent Property.
    1. In its fullest sense, selfishness of a transposon means that one can intuit its behavior in the genome by analogy to the behavior of more ordinary organisms in their environments. This analogy becomes more plausible under the assumption that the transposon is in an adaptive struggle to deal with a network of restraints placed on its replicative success.

    2. They may engage in complex regulation to maximize their replicative success under the limitations imposed upon them.

    3. They may compete with each other for limited replicative opportunities.

    4. They may adapt to exploit new niches.

      1. Does LINE-1 acquire new promoters to gain access to new germline compartments?

      2. Did retroviruses acquire virion proteins to access expression in somatic tissues and then to reinfect the germline?

      3. Is horizontal transmission the ultimate mechanism to seek new niches?

  6. Horizontal Transmission Creates a Complex System at an Even Higher Level.
    1. Horizontal transfer, by putting new host-transposon combinations together, creates a source of random perturbation to offset the accrual of stabilized systems.

    2. Damping of oscillations should also be a property of horizontally transferred elements. Although a host-transposon combination brought about by horizontal transfer may be new in some sense, the host will have seen and survived transposons of this kind before, and the transposon will have a long history of surviving (at least to a point) in other hosts. So although there may be an increased incidence of instability, one should expect both the host and the transposon to come equipped with regulatory mechanisms selected for damping oscillation and enhancing their survival under these conditions.
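Item 2.3 above combines several factors into an augmentation of insertional mutagenesis for LINE-1 of more than 100x. A back-of-envelope sketch, assuming the factors simply multiply (LINE-1 inserts per faithful copy, times SINE inserts made alongside each LINE-1 insert at the genomic SINE:LINE ratio), reproduces that order of magnitude. The multiplicative model, the function name, and its parameter names are illustrative assumptions, not the authors' math. The trans-mobilization term defaults to zero because the text describes it only as unknown but probably small.

```python
def augmentation_factor(nonproductive_per_faithful: float,
                        sine_to_line_ratio: float,
                        trans_mobilized_defective: float = 0.0) -> float:
    """Total inserts generated per faithfully replicated LINE-1 copy (hypothetical model)."""
    # The faithful copy itself, plus truncated/mutated inserts, plus any
    # trans-mobilized defective LINE-1 templates.
    line1_inserts = 1 + nonproductive_per_faithful + trans_mobilized_defective
    # Assumption: each LINE-1 insert is accompanied by SINE inserts at the
    # genomic SINE:LINE ratio, so the two factors multiply.
    return line1_inserts * (1 + sine_to_line_ratio)


# Ranges quoted in item 2.3: 10-20 nonproductive inserts per faithful copy,
# and up to 10x more SINEs than LINEs in the genome.
for nonproductive in (10, 20):
    print(nonproductive, augmentation_factor(nonproductive, 10))
```

These inputs give roughly 120x to 230x, in line with the ">100x" figure. The multiplicative form is the simplest choice; if SINE output did not scale with LINE-1 output, the two factors would add instead and the augmentation would be much smaller.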