The mutagen EMS induces both point mutations and deletions. We have previously reported that whole-genome sequencing using massively parallel technology is an efficient way to identify point mutations in C. elegans mutant strains (Sarin et al., 2008). Here we report that this technique can also be used to identify deletions responsible for the phenotype in EMS-mutagenised strains.

In an EMS screen for mutants affecting the specification of the AIY interneuron, we isolated an allele, ot358, which displays a highly penetrant loss of AIY-specific cell fate markers. We mapped ot358 to a 0.6 map unit interval. To identify the molecular lesion responsible for the ot358 phenotype, we subjected ot358 to whole-genome sequencing on our Illumina Genome Analyzer II platform. Analysis of the sequence with our MAQGene software (Bigelow et al., 2009) revealed only one point variant, an intergenic point mutation, in the region to which we mapped ot358. Because EMS can also generate deletions ranging from a few base pairs to many kilobases, we next asked whether the ot358 strain might contain a deletion in the mapped region. For this purpose we used a file generated by the MAQGene software that reports any uncovered interval larger than a chosen threshold. ot358, which was sequenced at an average coverage of 11.4x, contains 5 uncovered regions of more than 100 bp in the mapping region. Three of them are also present in a different mutant independently isolated in the same screen and likely reflect either regions that are difficult to sequence or map (such as repetitive sequences) or deletions already present in the screening strain. We analyzed the two ot358-specific uncovered regions by PCR and Sanger sequencing. This showed that the smaller uncovered region (103 bp) does not correspond to a deletion but reflects mis-sampling due to the relatively low coverage, whereas the larger one corresponds to a real 1888 bp deletion. Further analysis revealed that this deletion is indeed responsible for the phenotype and removes a cis-regulatory element of a transcription factor.
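The filtering logic described above (report uncovered intervals longer than a threshold, then discard those shared with an independently isolated strain) can be sketched in a few lines of Python. This is only an illustration and not the MAQGene implementation: the per-base depth list as input, the 0-based half-open interval convention, and the function names uncovered_intervals and strain_specific are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed convention: 0-based, half-open intervals [start, end).
Interval = Tuple[int, int]


def uncovered_intervals(depth: List[int], min_len: int) -> List[Interval]:
    """Return runs of zero read depth that are longer than min_len bases.

    `depth` is a hypothetical per-base coverage list for one region.
    """
    intervals: List[Interval] = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = pos  # a new uncovered run begins here
        elif start is not None:
            if pos - start > min_len:
                intervals.append((start, pos))
            start = None
    # Close a run that extends to the end of the region.
    if start is not None and len(depth) - start > min_len:
        intervals.append((start, len(depth)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def strain_specific(candidates: List[Interval],
                    control: List[Interval]) -> List[Interval]:
    """Keep candidate intervals that overlap no uncovered interval in a
    control strain (e.g. another mutant from the same screen)."""
    return [iv for iv in candidates
            if not any(overlaps(iv, c) for c in control)]


if __name__ == "__main__":
    # Toy data, not real coverage values.
    mutant_depth = [3] * 5 + [0] * 4 + [2] * 3 + [0] * 2 + [1]
    control_gaps = [(13, 20)]
    gaps = uncovered_intervals(mutant_depth, min_len=1)
    print(strain_specific(gaps, control_gaps))
```

Intervals that survive this filter are only candidates. As in the ot358 case, they still need to be confirmed by PCR and Sanger sequencing, since a gap may reflect mis-sampling rather than a deletion.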

The success of this approach in identifying a deletion in a small mapping region prompted us to test whether the method could also be used to identify deletions at the whole-genome level. The analysis of several other genomes suggests that at 10x coverage the background of mis-sampling is low enough to successfully identify deletions of more than 500 bp at the whole-genome scale using this simple approach. At higher coverage (30x), deletions of more than 100 bp can be identified.