Worm Breeder's Gazette 14(1): 17 (October 1, 1995)

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

The C. elegans genome sequencing project: A progress report.

The C. elegans Genome Consortium

Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri, USA and Sanger Centre, Hinxton Hall, Cambridge, UK.

A total of 22.5 megabases of sequence from 732 clones has been finished to
date by the Consortium with the following breakdown by chromosome:

II=6.7 Mb   III=7.2 Mb   IV=0.3 Mb   X=8.3 Mb
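
As a quick consistency check, the per-chromosome figures above can be summed and compared with the stated 22.5 Mb total. This is a minimal sketch; the variable names are illustrative, and Decimal is used only to avoid floating-point rounding in the comparison.

```python
from decimal import Decimal

# Finished sequence per chromosome, in megabases, as listed above.
finished_mb = {
    "II": Decimal("6.7"),
    "III": Decimal("7.2"),
    "IV": Decimal("0.3"),
    "X": Decimal("8.3"),
}

# Sum the per-chromosome figures exactly.
total_mb = sum(finished_mb.values(), Decimal("0"))

# The report states 22.5 Mb finished in total.
matches_reported_total = total_mb == Decimal("22.5")
```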

An additional 15 Mb of sequence data is in various stages of completion,
bringing the total sequence available to more than 37.5 Mb. The gene-rich
regions of chromosomes II and III are complete with the exception of some
gaps where cosmids were not available (Figure 1). While we are rescuing
these regions, we will continue to sequence the gene-rich regions of other
chromosomes. Sequencing on chromosome X is well advanced and some cosmids
are now finished on chromosome IV. We have begun library construction and
some shotgun sequencing on chromosome V and will finally move to
chromosome I. Efforts are also underway to develop strategies for
sequencing the gene-poor regions of each of the chromosomes. Currently,
approximately 46% of
all predicted genes have significant database similarities. The current
prediction for total gene number in C. elegans is 13526 (+/-500).

The Consortium provides preliminary sequence data for all clones currently
in production whether they are partially finished, finished but not yet
fully annotated, or submitted to GenBank/EMBL. Sequences for those clones
which are started but not yet submitted are provided with the caveat that
they are preliminary and often contain errors. It should also be noted
that the segment submitted from a cosmid will often not correspond to the
full insert. However, the information about the actual start and end of
the cosmid insert sequence (starting and ending positions within the
neighboring cosmids) is available in ACeDB and in the GenBank/EMBL
submissions. All sequences which have been submitted to GenBank/EMBL, or
which are finished but not yet fully annotated, are also available in ACeDB
data releases obtained via anonymous ftp from:

ncbi.nlm.nih.gov (130.14.20.1) in the USA, in repository/acedb
ftp.sanger.ac.uk in England, in pub/acedb
lirmm.lirmm.fr (193.49.104.10) in France, in genome/acedb

In ACeDB, cosmids which have not yet been manually reviewed and
fully annotated are denoted with gene predictions labeled with
COSMID_NAME.alphabetic_character. Those which have been fully annotated
are indicated by gene predictions labeled with COSMID_NAME.digit.
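
The labeling convention above can be sketched as a small classifier. This is an illustration only: the function name, the returned strings, and the example labels are hypothetical, and the pattern assumes cosmid names contain only letters and digits and that annotated predictions may carry more than one digit.

```python
import re

# COSMID_NAME.suffix, where the suffix is one letter or one or more digits.
# Assumption: cosmid names are made up of letters and digits only.
_PREDICTION = re.compile(r"^(?P<cosmid>[A-Za-z0-9]+)\.(?P<suffix>[A-Za-z]|\d+)$")


def annotation_status(label):
    """Return 'annotated' for COSMID.digit labels, 'preliminary' for COSMID.letter."""
    match = _PREDICTION.match(label)
    if match is None:
        raise ValueError(f"not a recognised gene prediction label: {label!r}")
    # A numeric suffix marks a fully annotated prediction;
    # an alphabetic suffix marks one not yet manually reviewed.
    return "annotated" if match.group("suffix").isdigit() else "preliminary"
```

For example, a hypothetical label "ZK637.a" would be reported as preliminary and "ZK637.3" as annotated.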

The ftp and web sites for St. Louis and the Sanger Centre are:

St. Louis:
ftp: genome.wustl.edu (directory:/pub/gsc1/sequence/st.louis/elegans)
www: http://genome.wustl.edu/gsc/gschmpg.html

Sanger Centre:
ftp: ftp.sanger.ac.uk (directory:/pub/C.elegans_sequences)
www: http://www.sanger.ac.uk/

For ftp'ing, log in as user "anonymous" and give a user identifier as
password. For connection to the WWW sites, MOSAIC or NETSCAPE can be used
to open the URLs listed above.
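
The anonymous ftp login described above can also be scripted. The following is a minimal sketch using Python's standard ftplib, assuming the hosts and directories listed above are reachable; the SITES table, the function name, and its parameters are illustrative and not part of any Consortium software.

```python
from ftplib import FTP

# Hosts and directories as listed above; the dictionary keys are illustrative.
SITES = {
    "st_louis": ("genome.wustl.edu", "/pub/gsc1/sequence/st.louis/elegans"),
    "sanger": ("ftp.sanger.ac.uk", "/pub/C.elegans_sequences"),
}


def list_sequence_directory(site, user_identifier, timeout=30):
    """Log in as 'anonymous' and return the file names in the site's directory."""
    host, directory = SITES[site]
    with FTP(host, timeout=timeout) as ftp:
        # User "anonymous", with a user identifier given as the password.
        ftp.login(user="anonymous", passwd=user_identifier)
        ftp.cwd(directory)
        return ftp.nlst()
```

A call such as list_sequence_directory("sanger", "user@example.org") would return the directory listing as a list of names; the address shown is a placeholder.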

A variety of information is available at one or both WWW/ftp sites,
including software used in the project, ACeDB documentation, personnel
information, cosmid sequences, lists of cosmids in map order providing
overlap information between the submitted sequences, and weekly lists of
protein and cDNA similarities for cosmids which were finished that week.
At the Sanger Centre WWW site, services are available to BLAST your
sequence against all cosmids currently in production. This service will
soon be implemented in St. Louis as well.

For further information on C. elegans gene predictions and annotation from
the sequencing projects, please contact John Spieth
(jspieth@watson.wustl.edu) or Steve Jones (sjj@sanger.ac.uk). For
information on sequencing plans or estimated completion times, please
contact Richard Wilson (rwilson@watson.wustl.edu) or Alan Coulson
(alan@sanger.ac.uk). For additional information on the distribution of
ACeDB, please contact Richard Durbin (rd@sanger.ac.uk) or Jean Thierry-
Mieg (mieg@kaa.cnrs-mop.fr). All requests for cosmid clones should be
addressed to Alan Coulson. For further information about the C. elegans
genome project including our policy statement about sharing both data and
sequencing expertise, please contact Richard Wilson.

See WBG for figure.