Worm Breeder's Gazette 13(3): 19 (June 1, 1994)

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

Genome Parametrics, or Look, I can use Query and Tablemaker!

Tom Barnes

Department of Biology, McGill University, Montreal, PQ H3A 1B1, Canada

There are a number of factoids we all use with the physical map. With some 2.5 Mb of sequence in release 2-10 to play with, I wanted to see how well these estimates hold up for this 2.5% of the genome. The region analysed below corresponds to ctg377 coordinates -750 to 631, excluding -116 to -108 and -56 to -48, covering 2 491 050 bp.

First, average kb per band. The standard figure comes from dividing an estimate of mean cosmid size (40 kb) by the average number of bands per cosmid (~23; Coulson et al. 1986) to get ~1.8 kb/band. Each Hind III fragment will generate 2 bands, unless there is no Sau 3AI site in the fragment, in which case it generates only 1. Thus the average frag size should be somewhat less than 3.6 kb (2 x 1.8 kb). However, the 977 whole frags in the region span 2 483 174 bp, giving only 2.54 kb/frag. This discrepancy arises for three independent reasons. First, because some Hind III frags lack Sau 3AI sites, the average number of bands per Hind III frag is only 1.87 (1825 theoretical bands/978 real sites). Secondly, the 1825 theoretical bands correspond to only 1360 drawn bands. Most of this shortfall derives from the loss of smaller bands off the bottom of the fingerprinting gel (373 of the theoretical bands are < 70 bases). Finally, the remaining compression (1360 drawn bands/~1450 theoretically resolvable bands = ~94%, i.e. ~6% compression) is probably the net effect of sporadic errors in the digitizing process on how clones are drawn. This means that for a given clone on the physical map, the positions of its ends relative to the ends of nearby clones will not be perfectly precise. Despite all this, because the 1.8 kb/band figure is itself based on drawn bands, the kb per drawn band in this region is very close to the standard estimate: 2 491 050/1360 = 1.83.
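The band accounting above is mostly arithmetic, so it can be re-derived from the quoted counts. The following is a minimal sketch using only figures given in the text; the constant and function names are hypothetical, chosen for illustration. Note that with the round numbers 40 kb and 23 bands the standard estimate comes out at about 1.74 kb/band, which the text quotes as ~1.8.

```python
# Re-derives the band/fragment figures for the ctg377 region from the
# counts quoted in the text; no new data are introduced.

STD_COSMID_KB = 40.0         # assumed mean cosmid size (kb)
STD_BANDS_PER_COSMID = 23    # Coulson et al. 1986
REGION_BP = 2_491_050        # analysed region of ctg377
WHOLE_FRAGS = 977            # whole Hind III fragments in the region
WHOLE_FRAG_BP = 2_483_174    # bp spanned by those whole fragments
HIND3_SITES = 978
THEORETICAL_BANDS = 1825     # bands predicted from the sequence
SMALL_BANDS = 373            # theoretical bands < 70 bases (lost off the gel)
DRAWN_BANDS = 1360           # bands actually drawn in acedb


def band_accounting() -> dict:
    resolvable = THEORETICAL_BANDS - SMALL_BANDS
    return {
        # standard estimate: cosmid size divided by bands per cosmid
        "std_kb_per_band": STD_COSMID_KB / STD_BANDS_PER_COSMID,
        # upper bound if every fragment gave 2 bands of ~1.8 kb each
        "max_frag_kb": 2 * 1.8,
        # observed mean Hind III fragment size
        "frag_kb": WHOLE_FRAG_BP / WHOLE_FRAGS / 1000,
        # reason 1: fragments without a Sau 3AI site give only one band
        "bands_per_frag": THEORETICAL_BANDS / HIND3_SITES,
        # reason 2: small bands run off the bottom of the gel
        "resolvable_bands": resolvable,
        # reason 3: residual compression among resolvable bands
        "drawn_fraction": DRAWN_BANDS / resolvable,
        # kb per drawn band, comparable to the standard estimate
        "kb_per_drawn_band": REGION_BP / DRAWN_BANDS / 1000,
    }


if __name__ == "__main__":
    for name, value in band_accounting().items():
        print(f"{name:>18}: {value:.3f}")
```

Running it prints each quantity; the fragment size (2.54 kb), bands per fragment (1.87), drawn fraction (~0.94) and kb per drawn band (1.83) match the values quoted above.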

This exercise can be extended to derive other averages: 1) the 7277 canonical cosmid clones carry a total of 154 231 bands, giving 21.2 drawn bands/canonical cosmid; 2) 21.2 drawn bands/cos x 1.83 kb/drawn band = 38.8 kb/canonical cosmid. To summarize, in acedb drawn bands cover 1.83 kb on average and cosmids contain 21.2 of these on average, giving an actual mean cosmid size of 38.8 kb. In reality, however, the average length of a Hind III frag is 2.54 kb, and roughly one in 7 frags has no Sau 3AI site.
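The cosmid-level averages follow by multiplication. This sketch (names hypothetical) also recovers the fraction f of Hind III fragments without a Sau 3AI site, assuming, as the text states, that a fragment gives 2 bands when it has such a site and 1 when it does not, so that mean bands per fragment = 2 - f.

```python
# Combines the canonical-cosmid band count with the kb per drawn band
# from the ctg377 region, and back-calculates the fraction of Hind III
# fragments that lack a Sau 3AI site.

CANONICAL_COSMIDS = 7277
CANONICAL_BANDS = 154_231
KB_PER_DRAWN_BAND = 2_491_050 / 1360 / 1000   # ~1.83, from the ctg377 region
THEORETICAL_BANDS = 1825
HIND3_SITES = 978


def cosmid_summary() -> dict:
    bands_per_cosmid = CANONICAL_BANDS / CANONICAL_COSMIDS
    # 2 bands per fragment with a Sau 3AI site, 1 without:
    # mean bands per fragment = 2 - f  =>  f = 2 - mean
    frac_no_sau3ai = 2 - THEORETICAL_BANDS / HIND3_SITES
    return {
        "bands_per_cosmid": bands_per_cosmid,
        "mean_cosmid_kb": bands_per_cosmid * KB_PER_DRAWN_BAND,
        "frac_no_sau3ai": frac_no_sau3ai,
        "one_in_n": 1 / frac_no_sau3ai,
    }


if __name__ == "__main__":
    for name, value in cosmid_summary().items():
        print(f"{name:>16}: {value:.2f}")
```

With the stated figures this gives about 21.2 bands per cosmid, 21.2 x 1.83 = 38.8 kb per cosmid, and f = 0.13, i.e. roughly one fragment in 7.5 without a Sau 3AI site.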

In addition, it is possible to count Not I sites and examine their distribution across the genome. W clone inserts should all terminate in a Not I site on one side (Gibson et al. 1987b; WBG 11(1):15). Correspondingly, the physical map contains both solitary W clones and groups of W clones that cluster around a single position. The minimum number of Not I sites needed to account for the clustering of all 876 W clones is 281 (giving a 350 kb average frag size; the largest W-free zone is over 3 Mb, on IV), implying at most ~3.1 clones per site on average. To refine this estimate, we can ask two questions: do some Not I sites have no associated W clone, and are some W clones not bounded by a Not I site? Turning to the full available sequence, there are 8 Not I sites present. All of these occur near W clones or W clone clusters, accounting for 19 W clones. In another region, near dpy-5, examined by Heidi Browning and Janet Paulsen (pc), there are 9 W clones associated with two defined Not I sites. This suggests that most Not I sites in the genome (10/10 so far, accounting for 28 clones) will indeed have an associated W clone. However, one W clone in the sequenced region (W06F7) is nowhere near a Not I site, although one of its ends is near two sequences that match 7/8 bases of a Not I site. There is also one orphan W clone (W01F6) within the Not I fragment examined by Browning and Paulsen. This means that some W clones (roughly 2/30) will have been created by 'star' activity. As these will likely be solitary clones, the minimum estimate of Not I sites should be more like 281 - (876 x 2/30) = 223. Allowing for a few missed Not I sites, the total number in the genome is probably on the order of 240.
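The Not I estimate can be traced the same way. In this sketch the genome size of ~100 Mb is an assumption not stated in the text (used only to reproduce the ~350 kb average fragment size); the survey table simply restates the counts given above, and all names are hypothetical.

```python
# Minimum-site and 'star'-corrected estimates of the number of Not I sites.

GENOME_KB = 100_000        # ASSUMPTION: ~100 Mb genome (not stated in the text)
W_CLONES = 876
MIN_NOTI_SITES = 281       # minimum sites needed to explain W-clone clustering

# Regions examined: (Not I sites, W clones at those sites, orphan W clones)
SURVEYS = {
    "sequenced region": (8, 19, 1),   # orphan: W06F7
    "dpy-5 region": (2, 9, 1),        # orphan reported by Browning and Paulsen
}


def noti_estimate() -> dict:
    sites = sum(s for s, _, _ in SURVEYS.values())      # 10
    anchored = sum(w for _, w, _ in SURVEYS.values())   # 28
    orphans = sum(o for _, _, o in SURVEYS.values())    # 2
    star_fraction = orphans / (anchored + orphans)      # 2/30
    # Star-derived clones are assumed to be solitary, so each one added one
    # spurious site to the minimum count; subtract the expected number.
    star_clones = W_CLONES * star_fraction
    return {
        "mean_frag_kb": GENOME_KB / MIN_NOTI_SITES,
        "max_clones_per_site": W_CLONES / MIN_NOTI_SITES,
        "sites_checked": sites,
        "star_fraction": star_fraction,
        "adjusted_min_sites": MIN_NOTI_SITES - star_clones,
    }


if __name__ == "__main__":
    for name, value in noti_estimate().items():
        print(f"{name:>20}: {value:.2f}")
```

The adjusted minimum comes out at 281 - 58.4 = 222.6, which rounds to the 223 quoted above.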

These sites (composed entirely of G/C, 8 bases in all) are distributed strikingly uniformly over the genome, with no arm/cluster biases. Using my definition of clusters (see WBG 12(3):24 and my other abstract in this issue), the percentages of genome bands in autosomal clusters, autosomal arms and the X are 36, 45 and 19, respectively, while the percentages of inferred Not I sites are 35, 46 and 20, respectively. One easy way to see this is in the arms, where all clones except W clones thin out, so that W clones make up a larger share of the clones there. The direct corollary is that there is no simple difference in nucleotide composition between clusters and arms. Consistent with this interpretation, the G+C content of the sequenced region (37%) is essentially identical to that determined for the whole genome (36%; Sulston and Brenner 1974). Therefore the preferential recombination (WBG 12(3):24) associated with the intragenic DNA (see my other abstract in this issue) in the arms must arise from a sequence difference of a higher order than simple nucleotide composition.
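The argument that a uniform Not I distribution rules out a simple compositional difference rests on the site (GCGGCCGC) consisting entirely of G/C. As an illustration, under an assumed model of independent, identically distributed bases (an assumption made here, not the author's, and not meant to predict absolute site counts), the expected frequency of an all-G/C 8-mer scales as (GC/2)^8, so even a one-point G+C shift from 36% to 37% would raise the expected site density by roughly 25%. The sketch also checks that the band and Not I percentages differ by at most one percentage point in each compartment; names are hypothetical.

```python
# Compares the genomic distribution of bands and inferred Not I sites, and
# shows why an all-G/C site is a sensitive probe of base composition.

BAND_PCT = {"autosomal clusters": 36, "autosomal arms": 45, "X": 19}
NOTI_PCT = {"autosomal clusters": 35, "autosomal arms": 46, "X": 20}
NOTI_SITE = "GCGGCCGC"


def site_frequency(gc: float, site: str = NOTI_SITE) -> float:
    """Expected per-bp frequency of `site` in random sequence with G+C
    fraction `gc`, ASSUMING independent, identically distributed bases."""
    p = 1.0
    for base in site:
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p


def compare() -> dict:
    max_diff = max(abs(NOTI_PCT[k] - BAND_PCT[k]) for k in BAND_PCT)
    # relative change in expected Not I frequency for a 36% -> 37% G+C shift
    sensitivity = site_frequency(0.37) / site_frequency(0.36)
    return {"max_pct_point_diff": max_diff, "gc_sensitivity": sensitivity}


if __name__ == "__main__":
    print(compare())
```

Under this toy model the sensitivity ratio is about 1.25, so a genuine compositional difference between clusters and arms would be expected to skew the Not I shares well beyond the one-point differences observed.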