Worm Breeder's Gazette 16(1): 23 (October 1, 1999)

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

WormPD: a Comprehensive Proteome Database for C. elegans.

J. D. Hogan, J. E. Lew-Smith, A. M. Fancher, M. E. Cusick, C. A. Lingner, B. P. Davis, C. Lengieza, M. Tillberg, K. J. Roberg-Perez

Proteome, Inc. 100 Cummings Center, Suite 435M, Beverly, MA 01915, USA

In July 1999 we introduced the C. elegans Proteome Database (WormPD™), accessible at http://www.proteome.com/databases, as the newest volume in our collection of model and pathogenic organism databases. WormPD organizes and presents comprehensive information for each of the nearly 19,000 proteins predicted from the recently completed C. elegans genome sequence [1]. Thanks to ongoing coordination with members of the sequencing consortium at The Sanger Centre and The Washington University Genome Sequencing Center, we are able to present the best collection of publicly available C. elegans protein sequences. We have coordinated gene name assignments and identifiers with ACeDB (one of many external databases to which we provide links) and strive to adhere to nomenclature standards through collaboration with Jonathan Hodgkin and the Caenorhabditis Genetics Center. Links to WormPD are provided by ACeDB sites at The Sanger Centre and Lincoln Stein's Lab at Cold Spring Harbor Laboratory.

Every protein in WormPD is described by a title line that best summarizes the protein's function. The information concerning each of the approximately 1100 experimentally characterized proteins is based on detailed curation of the scientific literature for C. elegans. The information is structured around a one-page-per-protein format and is presented in two ways: as a table that summarizes a variety of protein properties, including function, localization, and physical interactions, and as free-text annotations grouped by topic. In addition to summarizing experimentally derived information, we identify related proteins and provide information on protein family membership. Alignments based on the BLAST program [2] with refinement by a Smith-Waterman algorithm [3] are presented between each C. elegans protein and all other C. elegans proteins, human proteins, and proteins from other model organisms including S. cerevisiae. Such analyses, in combination with additional sources of information including our seamlessly integrated, annotated Yeast Proteome Database (YPD™), provide the basis for the information concerning the remaining uncharacterized proteins. Using such tools, our staff of curators has defined protein families and made intelligent predictions for each of the approximately 10,500 experimentally uncharacterized proteins with significant similarity to characterized proteins in C. elegans and other organisms. For those proteins that display high levels of similarity, we have further predicted properties such as biochemical function, cellular role, and subcellular localization. All predicted properties are clearly distinguished from experimentally demonstrated properties.
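As an illustration of the Smith-Waterman refinement step mentioned above, the following is a minimal sketch of local alignment scoring, not the WormPD implementation. The function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for readability; production protein alignment would normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not those used by WormPD.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero, so an
            # alignment can start anywhere in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because cell scores are clamped at zero, a conserved region scores the same whether or not it is flanked by unrelated sequence, which is the property that makes the method useful for refining candidate hits found by a faster heuristic search.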

WormPD is freely available to academic users. Corporate users should direct inquiries regarding subscription services to hfo@proteome.com. We appreciate feedback from all of our users concerning new data submission, additions, clarifications, and corrections. Any such correspondence should be sent by e-mail to wormpd@proteome.com or by mail to the address above.

1 The C. elegans Sequencing Consortium, Science 282, 2012 (1998).
2 Altschul et al., Nucleic Acids Research 25, 3389 (1997).
3 Waterman, M.S., Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman & Hall, London. (1995).