Worm Breeder's Gazette 11(5): 12

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

Genome Project Database

Jean Thierry-Mieg and Richard Durbin

We have been writing a computer database system for worm genomic 
information as part of the St Louis/Cambridge genome project.  The 
idea is to have a flexible, mouse-driven graphical system that can 
handle the sequence data and genetic and physical maps, together with 
as much related material as is easily possible (e.g. the bibliography,
gene list, etc.).  For instance, a user can display in separate windows 
pieces of genetic or physical map, information about a gene, allele or 
clone, all the items connected with an author, or a paper.  The 
database structure is designed to enable extensive annotation (both 
structured with keywords, and unstructured with arbitrary comments) 
and internal cross-referencing.  Currently the program is called 
'acedb' (pronounced ace-dee-bee, and standing for 'A C. elegans 
database').  
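
As an illustration only, the combination of structured annotation, free
comments and cross-referencing described above might be sketched in C as
follows.  This is not the acedb data model: the names Object, Tag,
object_init, object_add_tag, object_tag and object_link, the fixed array
limits, and the list of classes are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAGS 8   /* arbitrary limits chosen for this sketch */
#define MAX_REFS 8

/* Hypothetical classes, named after the kinds of item the text mentions. */
typedef enum {
    CLASS_GENE, CLASS_ALLELE, CLASS_CLONE, CLASS_AUTHOR, CLASS_PAPER
} ObjClass;

/* Structured annotation: a keyword with an associated value. */
typedef struct {
    const char *keyword;
    const char *value;
} Tag;

typedef struct Object Object;
struct Object {
    ObjClass    cls;
    const char *name;
    Tag         tags[MAX_TAGS];   /* structured annotation */
    size_t      n_tags;
    const char *comment;          /* unstructured annotation, NULL if none */
    Object     *refs[MAX_REFS];   /* cross-references to other objects */
    size_t      n_refs;
};

/* Reset an object to an empty state with the given class and name. */
void object_init(Object *o, ObjClass cls, const char *name)
{
    assert(o != NULL);
    *o = (Object){0};
    o->cls = cls;
    o->name = name;
}

/* Attach a keyword/value pair; returns 0 on success, -1 if the table is full. */
int object_add_tag(Object *o, const char *keyword, const char *value)
{
    assert(o != NULL && keyword != NULL);
    if (o->n_tags == MAX_TAGS)
        return -1;
    o->tags[o->n_tags].keyword = keyword;
    o->tags[o->n_tags].value = value;
    o->n_tags++;
    return 0;
}

/* Look up the value stored under a keyword; returns NULL if it is absent. */
const char *object_tag(const Object *o, const char *keyword)
{
    assert(o != NULL && keyword != NULL);
    for (size_t i = 0; i < o->n_tags; i++) {
        if (strcmp(o->tags[i].keyword, keyword) == 0)
            return o->tags[i].value;
    }
    return NULL;
}

/* Cross-reference two distinct objects in both directions, so that each
   can be reached from the other.  Returns 0 on success, -1 on failure. */
int object_link(Object *a, Object *b)
{
    assert(a != NULL && b != NULL);
    if (a == b || a->n_refs == MAX_REFS || b->n_refs == MAX_REFS)
        return -1;
    a->refs[a->n_refs++] = b;
    b->refs[b->n_refs++] = a;
    return 0;
}
```

In this sketch a gene and a paper would each be an Object; calling
object_link on the pair records the reference on both sides, which is one
simple way to let a display of either item lead to the other.
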

The main purpose of the system is to provide a standardized, 
integrated environment for those assembling the genomic information.  
However, we are also putting quite a lot of effort into making it a 
nice system to query and extract information from, and will be happy 
to make read-only versions of the system available in the future to 
those in the worm community who want them, in the same way that the 
current physical map program is available.  The database will then 
have to be updated at regular intervals, via email update packages.  
Once we have a stable version we will also make source code available 
to anyone interested.  

We see our program as complementary to the Worm Community System 
(WCS) of Schatz et al. (WBG 11(3): 6).  The 'canonical' versions of 
the physical and sequence databases will be assembled and managed in 
acedb.  They will be made directly available to the WCS project, so 
that if you use WCS you will be able to see the same genomic 
information.  Schatz et al.  also plan to share community knowledge 
via annotations, and provide literature via abstracts and page images. 

The data we currently have available comprise the CGC genetic data 
and bibliography, the gene list, the physical map from Cambridge, and 
all worm sequences currently in EMBL.  We will also of course have all 
the sequence generated by the St Louis/Cambridge project (this being 
one of the original reasons for the database).  As well as displaying 
these data we are working on integrating functions to perform genetic 
map and sequence calculations.  We are able to output information in 
text or PostScript form (for laser printing).  In addition we plan to 
provide compatibility with the ASN.1 format proposed by the NCBI at 
the National Library of Medicine for genome database information 
exchange.  

We have written acedb from scratch in plain C, rather than using a 
preexisting database management system like Sybase or Oracle.  We made 
this decision because these relational systems are rather rigid about 
pre-specifying data structure, are not optimal for long linear data 
such as sequence or genomic maps, and cannot be modified or 
distributed freely.  

Acedb runs on Unix workstations under the SunView and (imminently) X 
windowing systems.  With some setup effort, Macs and PCs can be 
used as X terminals if connected by Ethernet to a Unix system running 
acedb.  We expect to have a version available for distribution before 
the next worm meeting.