Worm Breeder's Gazette 8(3): 61

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

The Organization of Expressed Major Sperm Protein Genes

S. Ward, D. Burke, E. Hogan

The major sperm protein (MSP) in C. elegans is actually a family of 
small basic proteins that makes up about 15% of the protein in sperm. 
The genes encoding this family comprise a large multigene family with 
at least 30 and possibly 60 members (Burke and Ward (1983) J. Mol. 
Biol. 171:1-29; Klass et al. (1984) Mol. Cell. Biol. 4:529-537).  Among 
a set of 15 MSP cDNA clones that we sequenced, nine different 
sequences were found.  Thus many of the genes in this family must be 
expressed.  The protein coding sequences of the genes are highly 
conserved (better than 90% homology), but the 3'-untranslated regions 
have diverged.  This suggested that it might be possible to study the 
expressed members of this family by using probes homologous to just 
the 3'-untranslated portions of the genes.  We prepared five such 
probes by using an Applied Biosystems automated synthesizer to make 
oligonucleotides 20 bases long that correspond to the complement of 
the untranslated mRNA sequences.  We have made five additional probes 
corresponding to MSP cDNA sequences obtained recently from Michael 
Klass.
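
The probe design described above can be sketched in a few lines of code.  This is only an illustration, assuming the 3'-untranslated sequence is available as a string of mRNA bases; the function name antisense_probe, its parameters, and the example sequence are hypothetical and are not taken from any actual MSP cDNA.

```python
# Minimal sketch: derive a 20-base DNA oligonucleotide complementary to a
# stretch of 3'-untranslated mRNA. The example sequence is invented for
# illustration and is NOT a real MSP sequence.

# Watson-Crick partner (as a DNA base) for each RNA or DNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_probe(mrna_utr: str, start: int = 0, length: int = 20) -> str:
    """Return the DNA reverse complement of mrna_utr[start:start+length].

    The result is written 5'->3', so it can base-pair with the mRNA
    (or with the sense strand of the gene) over the chosen window.
    """
    window = mrna_utr.upper()[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the sequence")
    # Reverse the window, then swap each base for its partner.
    return "".join(COMPLEMENT[base] for base in reversed(window))


# Hypothetical 20-base stretch of untranslated mRNA.
example_utr = "AAAAACCCCCGGGGGUUUUU"
probe = antisense_probe(example_utr)
print(probe)
```

Choosing the window from the diverged untranslated region, rather than from the conserved coding sequence, is what would let such a probe distinguish one family member from another.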
Probing transfers of restriction enzyme digests of worm DNA with 
these probes shows that they each hybridize with either one or a few 
fragments which are a subset of fragments hybridizing to the coding 
region of the MSP genes.  Using the probes we have cloned the genomic 
copies of these specific MSP genes.  Of nine such genomic phage 
examined, seven of them have more than one MSP gene on the same phage. 
Some of these genes are adjacent to each other, but others are 
separated by 3-6 Kb of intervening DNA.  None of the phage appeared to 
overlap each other, so these results suggest that there is at least 
some local clustering of expressed MSP genes but that they are not all 
close together.
Just as we were about to begin examining DNA around these genes for 
additional MSP genes by walking from them in our genomic libraries, 
John Sulston described his strategy of physically mapping the whole 
genome (WBG vol 8,2).  We sent him DNA from four of the phage with MSP 
genes and he was able to identify cosmids that appeared to contain or 
overlap two of the phage.  One of these was actually a phage sent from 
David Hirsh's lab that contained the col-2 gene.  (Note the 
extraordinary sensitivity of John's method: the overlap by which he 
identified these phage was only 10 Kb.)  Mike Krause sent us this phage, 
and we found that it does indeed contain an expressed MSP gene about 
6 Kb from the col-2 gene.  Jim Kramer and Joe Cox have mapped this gene 
by restriction enzyme polymorphisms to the region near daf-14 on IV.  
They also walked in a cosmid library around this region.  To our 
disappointment, when we probed their neighboring cosmids we found no 
additional MSP genes within about 50 Kb on either side.  We are attempting to 
locate these cosmids more accurately relative to deletion break points 
nearby.
John Sulston found that another MSP-containing phage aligned with 
two overlapping cosmids of unknown location.  These cosmids were found 
to contain several additional MSP genes.  Thus in a 50 Kb region there 
are at least five MSP genes.  Two of them correspond to one of our 
cDNA sequences and two others correspond to one of Michael Klass's 
cDNA sequences, so at least four of these genes must be expressed.  If 
we can learn where these genes are in the genome it might be possible 
to eliminate them with a small deletion and so discover what happens 
to sperm with insufficient MSP.
We are currently sequencing the genomic MSP genes to see if there 
are conserved regions outside the coding sequence.  These might be 
either regulatory sites or the remnants of translocatable 
elements which could have participated in dispersing these genes 
throughout the genome.