Worm Breeder's Gazette 16(4): 22 (October 1, 2000)

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

Global Analysis of Operons Using Microarrays

Tom Blumenthal1, Donald Evans1, Chris Link1, Kyle Duke1, Stuart Kim2

1 Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine.
2 Institute for Behavioral Genetics, University of Colorado; Department of Developmental Biology, Stanford University Medical Center

The C. elegans genome contains polycistronic transcription units, or operons. This finding was based initially on the observation that genes trans-spliced to the spliced leader SL2 lie close to, and in the same orientation as, an upstream gene. It was hypothesized that the SL2 snRNP was specialized for trans-splicing at internal trans-splice sites in polycistronic pre-mRNAs. Since then, operons have been identified by numerous C. elegans labs, and the correlation between being a downstream gene in an operon and receiving SL2 has been strongly supported. However, no systematic attempt to identify all operons has been made since completion of the genomic sequence. This is at least partly because there are no criteria for identifying operons from genomic sequence alone. Since all known operons contain genes in the same orientation within 500 bp of each other (and most are much closer than that), one could create a list of all such gene arrangements and propose these as the operons. However, it is probable that some of the genes on any such potential operon list are not actually in operons, but instead are adjacent genes that are not cotranscribed. Thus, any list of possible operons should be tested by determining which genes are trans-spliced to SL2.
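
The list-building step described above (same orientation, within 500 bp) can be sketched in Python. This is only an illustration under stated assumptions: the Gene record, its field names, and the function candidate_operon_pairs are hypothetical and are not part of the analysis reported here. The sketch compares only immediately neighboring genes and skips overlapping ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical record layout, assumed for illustration only.
    name: str
    chrom: str
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def candidate_operon_pairs(genes, max_gap=500):
    """Return (upstream, downstream) name pairs for neighboring genes
    on the same strand separated by at most max_gap bp."""
    pairs = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom or left.strand != right.strand:
            continue
        gap = right.start - left.end - 1  # intergenic bases between the two genes
        if 0 <= gap <= max_gap:
            # On the "-" strand the right-hand gene is the upstream one.
            if left.strand == "+":
                pairs.append((left.name, right.name))
            else:
                pairs.append((right.name, left.name))
    return pairs
```

Pairs produced this way are candidates only; as noted above, adjacency does not show cotranscription, so each downstream gene would still need to be tested for SL2 trans-splicing.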

In an attempt to identify members of the class of genes that are trans-spliced by SL2, we used a DNA microarray containing genomic fragments corresponding to 17,805 C. elegans genes. The DNA microarray was simultaneously hybridized with a Cy3-labelled probe dependent on the SL2 sequence for its synthesis and a Cy5-labelled probe from polyA+ RNA. The SL2 probe was made by priming a second strand from oligo(dT)-primed cDNA with a primer consisting of the T7 RNA polymerase promoter followed by the SL2 sequence. This unamplified DNA was then transcribed with T7 polymerase, and the resulting RNA was used to synthesize Cy3-labelled cDNA. mRNAs that have an SL2 sequence should have a high SL2/polyA+ ratio. This ratio was found to range from 38 to 0.2 for the different genes on the microarray, with 14% of the entries showing an enrichment above 2.0-fold.

To evaluate how well this worked, we used the invaluable tool Intronerator (http://www.cse.ucsc.edu/~kent/intronerator/) to examine the genomic arrangements of genes with high or low SL2/polyA+ ratios. Intronerator graphically presents gene predictions with an alignment of cDNA clones. We scored genes as "operon+" if their potential trans-splice site was within 500 bp downstream of the 3’ end of another gene. We sometimes knew the locations of 3’ ends based on cDNAs. Also, the sites of trans-splicing and even SL2 specificity were available for some genes based on cDNAs. In other instances we only had gene predictions to go on. The "operon+" designation indicated that the gene could be in an operon.

SL2/polyA+ ratios correlated positively with the likelihood that genes were in operons. Of the genes with the highest SL2/polyA+ ratios, those with ratios >7.0, 103/113 (91%) were scored as operon+. Of those with ratios >4.0, 271/364 (74%) were scored as operon+. In contrast, only ~12% of 56 genes with ratios of about 2.0 and ~13% of 69 genes with ratios of about 1.0 were scored as operon+. We performed a similar experiment with an SL1 probe, and only 8% of the 63 genes with the highest SL1/polyA+ ratios were scored as operon+. These data strongly support, for the first time on a global level, the observation that SL2 trans-splicing does indeed imply that a gene is a downstream gene in an operon.
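
The tabulation behind these percentages can be sketched as follows. This is a minimal sketch, not the procedure actually used: the function names and the (ratio, is_operon_plus) record layout are assumptions made for illustration. The only figures taken from the text are the reported counts, which the last lines recompute.

```python
def operon_plus_fraction(records, min_ratio):
    """records: iterable of (sl2_polya_ratio, is_operon_plus) tuples
    (hypothetical layout). Returns (hits, total, fraction) for genes
    whose ratio exceeds min_ratio; fraction is None if none qualify."""
    selected = [bool(is_op) for ratio, is_op in records if ratio > min_ratio]
    if not selected:
        return 0, 0, None
    hits = sum(selected)
    return hits, len(selected), hits / len(selected)


def percent(hits, total):
    """Whole-number percentage, as quoted in the text."""
    return round(100 * hits / total)


# Recompute the percentages from the counts reported above.
print(f"ratio > 7.0: 103/113 = {percent(103, 113)}%")  # 91%
print(f"ratio > 4.0: 271/364 = {percent(271, 364)}%")  # 74%
```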

This conclusion is supported by a separate analysis. Dan Lawson (personal communication) searched for cDNAs that had at least partial SL2 sequence at the 5’ end, and we supplemented this list with genes that were known to be SL2 trans-spliced, which resulted in a list of 116 confirmed SL2-trans-spliced genes. We used a list of possible operons, based on 3’ ends < 1 kb upstream of 5’ ends (created by Alessandro Guffanti, personal communication), in addition to Intronerator, to determine that 91% of these confirmed SL2-accepting genes were downstream in operons. The 10 SL2 trans-spliced genes that did not appear to be downstream in an operon are intriguing. Either there is another way to specify SL2, or there is a mistake in the gene annotation, such as a gene missed by the prediction program.

Using RT-PCR, we tested for SL2 trans-splicing some of the genes that had high SL2/polyA+ ratios but had not been scored as operon+. None were SL2 trans-spliced, whereas two that were scored as operon+ tested positive for SL2 trans-splicing. We don’t know why some genes that are apparently not trans-spliced to SL2 had high SL2/polyA+ ratios. Perhaps they were incorrectly identified on the chip or had an internal sequence on which the SL2 oligo could prime.

We used this analysis to obtain a rough estimate of the number of operons in the genome. The 271 genes with SL2/polyA+ ratios >4 represented 244 different potential operons. Since we found that 29.4% of 68 proven SL2 trans-spliced genes reported in the literature or having SL2 cDNAs had SL2/polyA+ ratios >4, we estimated that the 244 potential operons represent about 29.4% of the operons, so there would be about 830 operons in the genome. This could be an overestimate if many of the genes scored as operon+ are fortuitously close to one another but are in fact transcribed independently. On the other hand, it could be an underestimate if genes in operons are sometimes farther than 500 bp apart.
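
The extrapolation is a single division: the number of potential operons observed, divided by the fraction of proven SL2 trans-spliced genes that passed the same ratio cutoff. A short sketch, in which the function name estimate_total is hypothetical and only the two input figures come from the text:

```python
def estimate_total(observed: int, detection_rate: float) -> float:
    """Scale an observed count up by the fraction of known positives
    that the same cutoff recovers."""
    if not 0 < detection_rate <= 1:
        raise ValueError("detection_rate must be in (0, 1]")
    return observed / detection_rate


potential_operons = 244  # potential operons among genes with ratio > 4
detection_rate = 0.294   # share of 68 proven SL2 genes with ratio > 4

print(round(estimate_total(potential_operons, detection_rate)))  # 830
```

The estimate assumes that proven SL2 trans-spliced genes pass the cutoff at the same rate as operon genes in general, in addition to the two caveats already noted above.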

The following summarizes operon characteristics developed from this analysis:

Estimated number of operons in the genome: 830

Average number of genes/operon: 3

Median distance between genes in confirmed operons: 121 bp

Possible functional relationships between genes in the same operon are often observable, but no test has been applied yet to determine whether these apparent functional associations are statistically significant.