Todd Harris¹ and Lincoln Stein²

¹Hi-Line Informatics, LLC, Livingston, MT, ²Ontario Institute for Cancer Research, Toronto ON, Canada

Correspondence to: Todd Harris (todd@wormbase.org)

At WormBase, we are often asked whether it is possible to install and run a local version of the website. Although this is possible and well documented, we do not recommend it for three reasons: 1) The size of the databases requires substantial download time, and the download must be repeated monthly to keep the resource up to date; 2) The site is complex, requires significant time to install and configure, and is constantly evolving, so you will need to commit time to keeping your installation current; 3) The site requires a substantial compute environment and the system administration skills to keep everything running smoothly. If you still aren’t dissuaded, please see the installation notes on the WormBase Wiki.

But now, through the magic of cloud computing, you can have your own WormBase up and running in a few minutes.

Required Steps

1. Establish an account on Amazon Web Services.
2. Find and launch the WormBase Amazon Machine Image (AMI) of the version of your choice.
3. Connect to the newly launched server instance using your web browser.
4. Stop the instance when done to avoid incurring further charges.
5. Repeat steps 2-4 when a new version of WormBase is released.
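
For readers comfortable with scripting, launching and stopping an instance can also be done programmatically rather than through the web console. The following is a minimal sketch only, assuming the third-party boto3 library and already-configured AWS credentials. The region, AMI ID and instance type shown in the usage comments are placeholders, not values published by WormBase, and the helper names are illustrative.

```python
"""Sketch: launch and stop an instance from an AMI using boto3 (assumed installed)."""


def make_ec2_client(region_name):
    # boto3 is imported lazily so the helpers below can be exercised without it.
    import boto3
    return boto3.client("ec2", region_name=region_name)


def launch_instance(ec2, ami_id, instance_type):
    """Start one instance from the given AMI and return its instance ID."""
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]


def stop_instance(ec2, instance_id):
    """Stop the instance so that it no longer accrues compute charges."""
    response = ec2.stop_instances(InstanceIds=[instance_id])
    return response["StoppingInstances"][0]["CurrentState"]["Name"]


# Example usage (placeholders; look up the real AMI ID for your release):
#   ec2 = make_ec2_client("us-east-1")
#   instance_id = launch_instance(ec2, "ami-XXXXXXXX", "m1.large")
#   ... use the site in your web browser ...
#   stop_instance(ec2, instance_id)
```

Passing the client in as an argument keeps the two helpers easy to try out with a stand-in object before any real, billable instance is started.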

Intended Audience

Individual researchers or labs
Entire departments
Private research entities

Necessary Skills

For launching an instance: none beyond using a web browser
For more complicated data mining: command line expertise

Suggested Uses

Access your own WormBase: fast and private
Data mining: all databases preconfigured; includes common tools like BioPerl
Development: build new features using the WormBase web platform

Caveats

This is NOT a free service. Read the pricing details carefully.
Although we will publish a new AMI for each WormBase release, instances you have already launched will not receive bug fixes.

For more information and a detailed walkthrough of the process, please see An introduction to cloud computing for biologists.

Luisa Cochella¹, Stephane Flibotte², Jon Taylor³, Nikolaos Stefanakis¹, Gregory Minevich¹, Donald Moerman²,³ and Oliver Hobert¹

¹Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University, New York NY, ²Department of Zoology, University of British Columbia, Vancouver BC, Canada, ³Michael Smith Laboratories, University of British Columbia, Vancouver BC, Canada

Correspondence to: Oliver Hobert (or38@columbia.edu)

The copy number and the repetitive nature of transgenes are important factors to consider when trying to recapitulate the expression of a given gene as closely as possible to that of its endogenous counterpart. It has long been known that low-complexity transgenic arrays, or arrays with very high copy number, can be silenced, especially in the germline (Kelly et al., 1997). In an attempt to generate fluorescent reporters that faithfully reflect the expression pattern of the gene of interest, we and others have previously reported the use of fosmid-based transgenes (Tursun et al., 2009). Here, we analyze the composition of high-complexity arrays, some of which show robust germline expression, providing guidelines that should prove useful to other C. elegans researchers.

We created transgenic arrays by injection of the specified DNA (fosmid, plasmid or PCR product of the locus of interest) at the indicated concentrations (Figure 1). In all arrays except ntIs1 and otIs314, all component DNAs were linear. The integrated strains were generated by γ-irradiation and were outcrossed at least twice. The strains carrying the extrachromosomal fosmid arrays were generated by co-injection with the pBX plasmid, which contains the wild-type copy of the pha-1 gene, into a pha-1(e2123) mutant background (Granato et al., 1994). Genomic DNA was prepared from the transgene-containing strains and analyzed by array Comparative Genomic Hybridization (aCGH) (Maydan et al., 2007). The arrays used contain 50-mer probes tiling the 100-Mb genome of C. elegans. A segmentation algorithm was able to identify all transgene components. The log₂ of the ratio between fluorescent intensities (array-containing strain/wild type) was averaged over the region of the genome that was detected as amplified and is shown as the “Mean log₂ ratio”. From this we estimated the number of copies of each component of the transgene, shown as number of copies per chromosome for the integrated transgenes, or number of copies per array for the extrachromosomal arrays (Figure 1).
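
The text does not spell out how a mean log₂ ratio is converted into a copy number. One plausible reading is sketched here under explicit assumptions: hybridization signal scales linearly with dosage, integrated strains are homozygous, and both the test and wild-type strains carry one endogenous copy per chromosome. The function names and the `endogenous_copies` parameter are illustrative, not taken from the original analysis.

```python
def copies_per_chromosome(mean_log2_ratio):
    """Extra copies per chromosome for a homozygous integrated transgene.

    Assumes (transgene + endogenous) / endogenous = 2 ** mean_log2_ratio,
    with one endogenous copy per chromosome in both strains.
    """
    return 2.0 ** mean_log2_ratio - 1.0


def copies_per_array(mean_log2_ratio, endogenous_copies=2):
    """Extra copies carried on an extrachromosomal array, per diploid genome.

    Assumes each cell carries the array once alongside the endogenous copies;
    arrays are inherited mosaically, so this is only a rough average.
    """
    return endogenous_copies * (2.0 ** mean_log2_ratio - 1.0)
```

Under these assumptions, a mean log₂ ratio of 5 would correspond to 31 extra copies per chromosome, and a ratio of 0 to none.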

Strains containing the ntIs1 transgene were previously whole-genome sequenced in our lab (Sarin et al., 2010). We used these data to calculate the copy number of the components of the ntIs1 transgene by dividing the average sequencing depth of the transgene region by the average sequencing depth across all non-gap regions, and found 51 copies for gcy-5ᵖʳᵒᵐ::gfp (vs. 31 by aCGH) and 13 copies for lin-15 (vs. 11 by aCGH). Comparison of these numbers with those from the aCGH analysis suggests that, in general, estimation of copy number by aCGH is more accurate for log₂ ratios below +4, and that copy number is probably underestimated for log₂ ratios above +4 because these fall in the non-linear range, near saturation.
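
The depth-based calculation described above can be sketched as follows. This is a minimal illustration that assumes per-base depth values are already available as plain lists of numbers; how they are extracted from the alignments is not covered, and the function names are hypothetical.

```python
def mean_depth(depths):
    """Average sequencing depth over a list of per-base depth values."""
    if not depths:
        raise ValueError("no depth values supplied")
    return sum(depths) / len(depths)


def copy_number_from_depth(transgene_depths, genome_depths):
    """Ratio of mean depth over the transgene region to mean genome-wide depth.

    genome_depths should cover non-gap positions only. The ratio counts every
    copy of the sequence per haploid genome, the endogenous locus included;
    the source does not state whether any correction was applied for that.
    """
    genome_mean = mean_depth(genome_depths)
    if genome_mean == 0:
        raise ValueError("genome-wide mean depth is zero")
    return mean_depth(transgene_depths) / genome_mean
```

For example, a transgene region averaging 100 reads per base in a genome averaging 10 reads per base would give an estimate of 10 copies.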

For the extrachromosomal arrays of the fosmid reporters, an injection concentration of 15-50 ng/μl resulted in an average of 8 fosmid copies per array. Although the data show that injection concentration and copy number are not perfectly correlated, one could try reducing the injection concentration if lower copy numbers are desired. In our experience, even transgenes integrated at 11 copies per chromosome (22 in a homozygous animal) can still provide germline expression, as seen for otIs284 (Tursun et al., 2011).

Figures

Figure 1: Composition and aCGH analysis for 9 integrated and 6 extrachromosomal transgenic arrays.

References

Granato M, Schnabel H and Schnabel R. (1994). pha-1, a selectable marker for gene transfer in C. elegans. Nucleic Acids Res. 22, 1762-1763.

Kelly WG, Xu S, Montgomery MK and Fire A. (1997). Distinct requirements for somatic and germline expression of a generally expressed Caenorhabditis elegans gene. Genetics 146, 227-238.

Maydan JS, Flibotte S, Edgley ML, Lau J, Selzer RR, Richmond TA, Pofahl NJ, Thomas JH and Moerman DG. (2007). Efficient high-resolution deletion discovery in Caenorhabditis elegans by array comparative genomic hybridization. Genome Res. 17, 337-347.

Sarin S, Bertrand V, Bigelow H, Boyanov A, Doitsidou M, Poole RJ, Narula S and Hobert O. (2010). Analysis of multiple ethyl methanesulfonate-mutagenized Caenorhabditis elegans strains by whole-genome sequencing. Genetics 185, 417-430.

Tursun B, Cochella L, Carrera I and Hobert O. (2009). A toolkit and robust pipeline for the generation of fosmid-based reporter genes in C. elegans. PLoS One 4, e4625.

Tursun B, Patel T, Kratsios P and Hobert O. (2011). Direct conversion of C. elegans germ cells into specific neuron types. Science 331, 304-308.

The WBG

An online publication service of WormBook

