U2, U4, and U6 snRNA Genes

Jeffrey Thomas, Erin Zucker-Aprison and Tom Blumenthal

The C. elegans genome contains about 12 U2 snRNA genes, and we have
cloned and at least partially sequenced ten of them.  Seven of the 
genes are identical in sequence or differ by only a single base pair, 
and these genes occur in three clusters of two or three U2 genes per 
lambda clone.  The three remaining U2 genes that we sequenced are not closely
linked to other U2 genes, and they differ from the majority class by 
one, five and six base pairs.  The gene sequence is 66% identical to 
the human U2 sequence and can be folded into a very similar secondary 
structure.  All sequence alterations are consistent with this 
structure, and all but one of the changes in stem regions are 
conservative with respect to base-pairing.  We have also determined 
the sequence of the 5' flanking DNA of all ten genes.  We find a 
remarkable degree of conservation from position -64 to the start
site of transcription.  There is a consensus base for nearly every 
position within this region, and most genes differ from the consensus
sequence by fewer than ten bases.  Further upstream there is little 
resemblance between the ten sequences.  Thus we believe the region 
between -64 and the cap site is likely to contain the cis-acting 
sequences involved in activation of the U2 genes.  Within this region 
are sequences that are similar to sequence elements shown to function 
in vertebrate U2, U1 and U4 gene transcription.
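
As an illustration only, the kind of comparison described above (a consensus base at each position, and the number of positions at which each gene differs from it) could be sketched as follows. The sequences here are invented placeholders, not the U2 5' flanking sequences, and the function names are hypothetical. The sketch assumes the sequences are already aligned and of equal length.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common base at each column of pre-aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def mismatches(seq, ref):
    """Count positions at which seq differs from ref."""
    return sum(a != b for a, b in zip(seq, ref))


# Made-up toy sequences, NOT the actual C. elegans flanking data.
toy_flanks = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAC",
]

cons = consensus(toy_flanks)
diffs = [mismatches(seq, cons) for seq in toy_flanks]
print(cons)
print(diffs)
```

A simple majority vote like this ignores gaps and ties; real flanking sequences would first need a proper alignment.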

We have also begun analysis of U4 and U6 snRNAs and the genes that
encode them.  We have obtained one lambda clone that contains two U4 
genes, another that contains a single U4 gene, and a third that 
contains two U6 genes.  The sequences of the two U4 genes are
identical to each other and are 75% similar to human U4.  The U6
genes are also identical in sequence and are 88% similar to their 
human homolog.  There is good evidence from other organisms that the 
U4 and U6 snRNAs interact by base-pairing, and C. elegans has
conserved complementarity between the regions implicated in this 
interaction.  In contrast to both the U4 coding regions and the U2 5'
flanking regions, the U4 5' flanking regions are highly diverged from
each other.  The only easily recognizable similarity
among all three upstream regions is a sequence from -65 to -41, which
also resembles the sequence found at this position in the worm U2 gene 
promoter.  The three U4 genes share no similarity downstream of their 
3' ends.

The U6 5' flanking regions are nearly identical to each other as far
upstream as we sequenced (to -160), presumably reflecting a recent duplication.