Worm Breeder's Gazette 14(5): 14 (February 1, 1997)

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

The C. elegans Genome Sequencing project: Towards completion

The C. elegans Genome Sequencing Consortium

Genome Sequencing Center, Washington University School of Medicine, St. Louis, MO, USA and The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Over 61.5 megabases have now been completed from over 2050 cosmids. The breakdown by chromosome is as follows:-
I=6.7MB     II=10.5MB     III=7.7MB     IV=8.9MB     V=11.7MB     X=15.5MB

An overview of the regions where sequencing has been completed and where shotgun sequencing is still in progress is shown in figure 1. The finished sequence data is currently predicted to contain 10,300 protein-coding genes and more than 330 tRNA genes. In the autosomal arms, where cosmid coverage is poor, sequencing is being carried out by shotgun sequencing of YAC clones in conjunction with any underlying cosmid clones. The autosomal arms will be sequenced in parallel.

We are also now actively addressing the gaps in our current sequence contigs. Some are small and can be resolved using long-range PCR. Where this approach fails, fosmid clones will be isolated for these regions.

In addition to C. elegans, mapping and sequencing of C. briggsae is ongoing. So far 15,000 C. briggsae fosmid clones have been fingerprinted, and 167 fosmids are being sequenced. Over 6.3MB of C. briggsae genome sequence is already available from the St. Louis FTP site and is also searchable on-line. C. briggsae clone requests should be sent to Marco Marra (mmarra@watson.wustl.edu).

As stated previously, all C. elegans sequence data is made available once it has completed the initial shotgun and assembly phases. Currently, over 84MB of C. elegans sequence data is available and can be accessed via anonymous FTP or the World Wide Web. Both sites allow on-line searching of the C. elegans data. A new feature is that each site now carries all the unfinished data, so only one search (at either site) is required to query the complete dataset.

We are committed to actively curating and updating our sequence data and annotations. If you believe that there is a sequencing error, an incorrect gene prediction or an inappropriate annotation, we need to hear from you.

For further information on the C. elegans gene predictions and annotations from the sequencing project contact John Spieth (jspieth@watson.wustl.edu) or Steve Jones (sjj@sanger.ac.uk). For information on the distribution of the C. elegans database ACEDB contact Richard Durbin (rd@sanger.ac.uk) or Jean Thierry-Mieg (mieg@kaa.cnrs-mop.fr). For information on sequencing plans or estimated completion times contact Richard Wilson (rwilson@watson.wustl.edu) or Alan Coulson (alan@sanger.ac.uk). Requests for cosmid clones should be sent to Alan Coulson.

The FTP and WWW sites for the Sanger Centre and St. Louis are:-

Sanger Centre:
ftp: ftp.sanger.ac.uk (directory /pub/databases/C.elegans_sequences)
WWW: http://www.sanger.ac.uk

St. Louis:
ftp: genome.wustl.edu (directory /pub/gsc1/sequence/st.louis/elegans)
WWW: genome.wustl.edu/gsc/gschmpg.html

ACEDB data releases can be obtained from:
ncbi.nlm.nih.gov (130.14.20.10) in the USA, in repository/acedb
ftp.sanger.ac.uk (193.60.84.11) in the UK, in pub/acedb
lirmm.lirmm.fr (193.49.104.10) in France, in genome/acedb
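
For readers who prefer to script access to the two sequence FTP sites listed above, the following is a minimal sketch in Python using the standard ftplib module. It is an illustration only, not a tool supplied by the Consortium: the names SITES, ftp_url and list_sequence_files are hypothetical, the hosts and directories are copied from the listing above, and it assumes those servers accept anonymous logins and are still reachable at these addresses.

```python
from ftplib import FTP

# Hosts and directories copied from the site listing above. The short keys
# ("sanger", "st_louis") are hypothetical labels chosen for this sketch.
SITES = {
    "sanger": ("ftp.sanger.ac.uk", "/pub/databases/C.elegans_sequences"),
    "st_louis": ("genome.wustl.edu", "/pub/gsc1/sequence/st.louis/elegans"),
}


def ftp_url(site: str) -> str:
    """Build an ftp:// URL for one of the sites above (no network access)."""
    host, directory = SITES[site]
    return f"ftp://{host}{directory}"


def list_sequence_files(site: str, timeout: float = 30.0) -> list[str]:
    """Log in anonymously and return the names in the sequence directory.

    Assumes the server is reachable and permits anonymous access.
    """
    host, directory = SITES[site]
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # called without arguments, this logs in as "anonymous"
        ftp.cwd(directory)
        return ftp.nlst()


if __name__ == "__main__":
    # Print the locations only; call list_sequence_files() to contact a server.
    for name in SITES:
        print(ftp_url(name))
```

Calling list_sequence_files("sanger") would attempt a live connection, so it is kept out of the default run; ftp_url only formats the address and can be used to check the locations before connecting.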