Worm Breeder's Gazette 11(4): 71

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

A Suggestion

Tom Barnes

There has been a disturbing tendency in recent times to have gene 
names that are overly descriptive.  We do not have an endless supply 
of three-letter tags (17,576 to be precise), whereas numbers stretch 
out into infinity.  For example, there are a number of single-member 
gene names (anc-1, caf-1, ...), which is just plain 
wasteful.  Impressive lists of dafs, dpys, lins, and uncs were not 
built up by trivially worrying about information content.  Where would 
we be today if we distinguished between daf-constitutives and daf-
defectives, muscle uncs and neural uncs, Muvs and Vuls, plain dpys and 
dosage-compensation dpys? I propose we start some rationalization.  
Firstly, all the xols, sdcs, dosage compensation dpys, hers, tras, 
sems, sogs, oocs, sers and spes can become sex.  Come to think of it, 
we can probably throw in ehas, glps, mabs, msps, pals, plgs, sels, 
sems and vits.  This would give a nice sex-1 through sex-100 or 
thereabouts, with a gene-name compaction ratio (g.c.r.) of 19.  Then 
we have a lot of enzymes, cytoskeletal components and suchlike.  These 
can be conveniently recalled by the new designator cel.  This takes 
care of ace, act, ama, ben, cad, cal, cha, deb, ges, gus, hch, kin, 
mlc, myo, nuc, phm, rpl, rpo, rrn, rrs, the rtxs, sup and sus, to 
produce cel-1 to cel-63 (gcr=22), by my reckoning.  Notice that this 
set was particularly guilty of a low average former gene multiplicity 
(f.g.m.) of about 3 versus about 5 for the sex set.  Next to go are all 
the drug resistance names (cas, kra, lan, lev, and tpa), now drg-1 to 
drg-11 (gcr=5, fgm=2).  With a clear conscience I think behavioral 
names can be fused with unc, to create the splendid exi for excitable 
cells.  This means aex, bor, che, das, deg, eat, egl, enu, exp, mec, 
osm, pbo, sns, tax, ttx and unc become exi-1 to exi-256 (gcr=16, 
fgm=16).  Now some might argue that this f.g.m. is a bit high, and 
therefore the compaction not warranted.  But in light of the high 
g.c.r., and the fact that it is far more traditional to have gene names of 
lower information content, I think we should stick with it.  Moving 
quickly now, ali, bli, dpy, lon, mor, rol, sma and sqt become shp-1 to 
shp-56 (for shape, gcr=8, fgm=7); anc, ced, ces, clr, dig, mig and ncl 
become nom-1 to nom-24 (for Nomarski, gcr=7, fgm=3); cat, flu, pup and 
srf become stn-1 to stn-15 (for staining, gcr=4, fgm=4).  Genes 
defined by cloning become dna-1 to dna-26 (from clb, col, hsp and uvt; 
gcr=4, fgm=6.5); things affecting development become dev-1 to dev-68 
(from gro, ham, lin, mab and vab; gcr=5, fgm=13); things killing worms 
are now kil-1 to kil-359 (from age, emb, let, mel, par, zyg; gcr=6, 
fgm=60, which attests to the excellent example set by let).  This leaves just him,
mei, mut, rad and rec, which become chr-1 to chr-30 (gcr=5, fgm=6).  
The astute reader will note a number of arbitrary assignments, such as 
ama into cel rather than drg, and pal into sex rather than dev.  The 
astute reader is quite right.  After all, do not drugs affect things 
in cells?  Do not things that affect sexual development merely form a 
subset of all those things which affect development?  Thus, in the 
interests of consistency, I'm forced to propose that drg and stn be 
absorbed into cel; and sex, shp, nom and kil be absorbed into dev.  
This yields cel-1 to cel-89, dev-1 to dev-544, exi-1 to exi-256, dna-1 
to dna-26 and chr-1 to chr-30.  These last two seem woefully 
underrepresented, so they may as well be joined with cel to produce 
cel-90 to cel-145.  It is sadly true that in reality, all things that 
affect development or excitable cells are really only affecting cells, 
and are all components of cells.  Hence I think it is inevitable that 
exi and dev fuse into cel to produce the final cel-1 to cel-945.  Note 
also that one can construe 'cel' to mean C. elegans as well, useful 
in case any restriction enzymes are discovered in C. elegans tissues. 
I think this is a rather flexible final arrangement, one which can 
encompass most of the likely future mutant descriptions, and of course 
one which can tolerate considerable expansion.
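
For readers who wish to check the bookkeeping, the tallies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable and function names are invented here, and f.g.m. is assumed to be the number of genes under a new name divided by its g.c.r., which is consistent with the figures quoted but is not spelled out in the text. The merged dev total (544) is taken as stated rather than recomputed, since sex-100 is itself given as "or thereabouts".

```python
# Figures as stated in the text: new name -> (highest gene number, g.c.r.).
STATED = {
    "sex": (100, 19),
    "cel": (63, 22),
    "drg": (11, 5),
    "exi": (256, 16),
    "shp": (56, 8),
    "nom": (24, 7),
    "stn": (15, 4),
    "dna": (26, 4),
    "dev": (68, 5),
    "kil": (359, 6),
    "chr": (30, 5),
}


def fgm(name: str) -> float:
    """Assumed definition: genes under the new name per former name absorbed."""
    genes, gcr = STATED[name]
    return genes / gcr


# Supply of three-letter tags: 26 letters in each of 3 positions.
three_letter_tags = 26 ** 3

# First round of mergers: drg and stn are absorbed into cel.
cel_after_drg_and_stn = STATED["cel"][0] + STATED["drg"][0] + STATED["stn"][0]

# Second round: dna and chr join cel.
cel_after_dna_and_chr = (
    cel_after_drg_and_stn + STATED["dna"][0] + STATED["chr"][0]
)

# Final round: exi and the merged dev set fuse into cel.
# The merged dev total is taken from the text as stated, not recomputed.
DEV_MERGED_AS_STATED = 544
cel_final = cel_after_dna_and_chr + STATED["exi"][0] + DEV_MERGED_AS_STATED

if __name__ == "__main__":
    print("three-letter tags:", three_letter_tags)
    print("cel after drg and stn:", cel_after_drg_and_stn)
    print("cel after dna and chr:", cel_after_dna_and_chr)
    print("final cel:", cel_final)
    for name in STATED:
        print(f"{name}: fgm = {fgm(name):.1f}")
```

Under these assumptions the intermediate and final cel ranges agree with those quoted, and the computed f.g.m. values round to the figures given for each set.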