Worm Breeder's Gazette 13(3): 17 (June 1, 1994)
These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.
Together with Richard Durbin and the rest of the informatics group in Cambridge, I am exploring new statistical methods for sensitively recognizing and classifying features in genome sequence, such as new examples of known protein families.
During the course of work aimed at assembling good data sets for testing sequence recognition techniques, I built a statistical model of reverse transcriptase (RVT) sequences using "hidden Markov models", a method for detecting and describing the conserved features of a sequence family. The model was constructed from a set of 183 RVT sequences that included examples from retroviruses, retrotransposons, group II introns, and bacterial retrons. I've found that this model recognizes a number of reverse transcriptase genes in the known C. elegans genome sequence. Most of these elements have also been recognized by the Cambridge and St. Louis informatics groups and annotated in ACeDB, and one (in cosmid C06E8) is the rte-1 element found by Youngman and Plasterk and reported previously in the Gazette.
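As a rough illustration of how a hidden Markov model assigns a score to a sequence, here is a minimal sketch of the Viterbi algorithm, one standard way to find the most probable state path through an HMM. It is not the RVT model described above, and the abstract does not say which scoring procedure was used; the function name viterbi and its arguments (dictionaries of log-probabilities) are assumptions made for this example, and the sequence is assumed to be non-empty.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a non-empty sequence.

    log_start[s], log_trans[r][s] and log_emit[s][symbol] are natural-log
    probabilities; this layout is a choice made for the sketch only.
    """
    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per position after the first
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Most probable predecessor state for s at this position
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][symbol]
            pointers[s] = p
        back.append(pointers)
    # Trace back from the best final state to recover the full path
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path
```

In a toy setting one might use two states, a "conserved" state that strongly prefers one residue and a "background" state that prefers another, and the returned path then labels each position of the sequence with the state most likely to have produced it. A model of a real sequence family has many more states, with position-specific emission probabilities estimated from the aligned family members.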
In all, six apparently full-length RVT elements were detected in about 4 Mb of sequence (in cosmids F58A4, F44E2, F40F12, C06E8, ZK1236, and C07A9). All (with the exception of C07A9, see below) are single long open reading frames. The sequences seem to cluster into at least three families: two non-LTR retrotransposon families and one probable example of an LTR retrotransposon. The rte-1 element in C06E8 is most similar to LINE non-LTR retrotransposons. The element in F44E2 is similar to gypsy-class LTR retrotransposons, although I've been unable to find convincing LTRs flanking it. All the rest of the sequences seem to group as one family of non-LTR retrotransposons whose closest non-worm relative is the T1 retrotransposon of mosquitoes.
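For readers who want to check the "single long open reading frame" property on a sequence of their own, the following is a minimal sketch that scans all six reading frames for the longest run of codons without a stop. It is an illustration only, not the procedure used for the elements above; the function names are made up for this example, and it assumes an uppercase DNA string containing only A, C, G, and T and the standard stop codons.

```python
STOPS = {"TAA", "TAG", "TGA"}          # standard stop codons (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def longest_open_frame(dna):
    """Return (length_in_codons, strand, frame) of the longest stop-free run.

    Start codons are ignored: any stretch of codons between two stops
    (or between a stop and the end of the sequence) counts as open.
    Ties keep the first run found (forward strand and lower frame first).
    """
    best = (0, "+", 0)
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            run = 0
            # Step through complete codons in this frame
            for i in range(frame, len(seq) - 2, 3):
                if seq[i:i + 3] in STOPS:
                    run = 0
                else:
                    run += 1
                    if run > best[0]:
                        best = (run, strand, frame)
    return best
```

An element whose longest open frame spans most of its length is consistent with a single uninterrupted coding region, whereas an element interrupted by an insertion would be expected to show two shorter open stretches instead.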
Five C-terminal fragments of RVT elements were also found (in C30C11, K11H3, C18H2, and two in ZC262). 5'-truncated copies of non-LTR retrotransposons have been seen in many other organisms, and are thought to arise from premature termination of an RVT elongation step in the transposon replication mechanism.
The reading frame of the C07A9 element is interrupted by a 239-base inverted repeat element flanked by a CA dinucleotide direct repeat. It seems likely that these inverted repeat elements are genetically mobile, perhaps using transposase activity provided in trans by Tc elements.
Several (if not all) of these elements are associated with a large amount of nearby genomic rearrangement, in the form of multiple duplicated segments of 100-1000 bases rearranged apparently randomly on both sides of the element, but so far I haven't gotten a satisfactory picture of what's going on. It may be that the retrotransposons tend to be found more often in junky sections of the genome simply because insertions there are less likely to be selected against. A more interesting (but probably unlikely) possibility is that the retrotransposon replication mechanism somehow results in local genomic rearrangements, in which case the rearrangements may provide clues about the replication mechanism.
Judging from the lack of retrotransposon-induced mutations in C. elegans, I suppose that none of these elements are currently active. Why, and when, did they die? What is the reason for the association with such a high level of nearby genome rearrangement? It might be interesting to look at these features in related nematode species by PCR. One might be able to date when the RVT elements (and the disrupting C07A9 inverted repeat element) inserted into their current sites, and determine whether the genomic rearrangements came before, after, or at the same time as the retrotransposon insertions.