Worm Breeder's Gazette 14(4): 13 (October 1, 1996)

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

The C. elegans Genome Sequencing Project: Another progress report

The C. elegans Genome Consortium

Genome Sequencing Center, Washington University School of Medicine, St. Louis, MO, USA and The Sanger Centre, Hinxton Hall, Cambridge, UK

The Consortium has finished just over 52 megabases of C. elegans
genomic sequence from more than 1700 cosmids. The breakdown by
chromosome is as follows:

I. 4.5 Mb, II. 8.8 Mb, III. 7.3 Mb, IV. 7.1 Mb, V. 9.1 Mb, X. 14.9 Mb

Most of the gene-rich regions of the genome that are represented in
cosmids (~80% of the genome) are either finished or at some stage of
library construction or production.  For the remaining ~20%,
preliminary data suggest that some of this sequence may be represented
in a fosmid library (see the article "C. elegans and C. briggsae arrayed
DNA libraries" in this issue).  Techniques are being developed to
retrieve the rest of the sequence from YAC subclones.
        The finished sequence contains around 9700 predicted proteins,
47% of which have significant similarity to genes from other organisms.
The WORMPEP database contains nearly 7300 of the predicted proteins and
can be retrieved by ftp from the Sanger Centre.  Almost 300 tRNA genes
have now been identified.  Of the predicted genes, 32% match one or
more EST sequences, confirming that those genes are real.
        In addition to C. elegans, selected C. briggsae clones are being
sequenced.  To date, 8 briggsae clones (8 fosmids and 3 cosmids)
have been finished, with another 46 fosmids in various stages of
production.  As many as 20 megabases of the C. briggsae genome may
eventually be sequenced as part of the C. elegans Genome Project.
Plans are being made to include the C. briggsae sequence in future ACEDB
releases. The C. briggsae sequences are available by ftp from the Genome
Sequencing Center in St. Louis.
        All C. elegans sequence data are made available by anonymous ftp
and on the World Wide Web from the Sanger Centre and the Genome
Sequencing Center once they have completed the initial "shotgun" and
assembly phases of sequencing.  Each site provides only its own
unfinished sequence.  Both sites now offer on-line searching of the
finished and unfinished sequences produced at that site.
        We actively curate the sequence and would like to hear from you
if you determine a correct gene structure from cDNA data or think you
have found a sequence error.
        For further information on the C. elegans gene predictions and
annotations from the sequencing project contact John Spieth
(jspieth@watson.wustl.edu) or Steve Jones (sjj@sanger.ac.uk). For
information on the distribution of ACEDB contact Richard Durbin
(rd@sanger.ac.uk) or Jean Thierry-Mieg (mieg@kaa.cnrs-mop.fr). For
information on sequencing plans or estimated completion times contact
Richard Wilson (rwilson@watson.wustl.edu) or Alan Coulson
(alan@sanger.ac.uk). All requests for cosmid clones should be sent to
Alan Coulson (alan@sanger.ac.uk).

The ftp and WWW sites for St. Louis and the Sanger Centre are:
St. Louis:

ftp: genome.wustl.edu (directory /pub/gsc1/sequence/st.louis/elegans)
WWW: http://genome.wustl.edu/gsc/gschmpg.html
Sanger Centre:

ftp: ftp.sanger.ac.uk (directory /pub/databases/C.elegans_sequences)
WWW: http://www.sanger.ac.uk/

ACEDB data releases can be obtained from:
ncbi.nlm.nih.gov (130.14.20.10) in the USA, in repository/acedb
ftp.sanger.ac.uk (193.60.84.11) in the UK, in pub/acedb
lirmm.lirmm.fr (193.49.104.10) in France, in genome/acedb