The copy number and the repetitive nature of transgenes are important factors to consider when trying to recapitulate the expression of a given gene as closely as possible to that of its endogenous counterpart. It has long been known that low-complexity transgenic arrays, or arrays with very high copy number, can be silenced, especially in the germline (Kelly et al., 1997). In an attempt to generate fluorescent reporters that faithfully reflect the expression pattern of the gene of interest, we and others have previously reported the use of fosmid-based transgenes (Tursun et al., 2009). Here, we analyze the composition of high-complexity arrays, some of which show robust germline expression, providing guidelines that should prove useful to other C. elegans researchers.

We created transgenic arrays by injecting the specified DNA (fosmid, plasmid or PCR product of the locus of interest) at the indicated concentrations (Figure 1). In all arrays except ntIs1 and otIs314, all component DNAs were linear. Integrated strains were generated by γ-irradiation and outcrossed at least twice. The strains carrying the extrachromosomal fosmid arrays were generated by co-injecting the fosmid with the pBX plasmid, which contains the wild-type copy of the pha-1 gene, into a pha-1(e2123) mutant background (Granato et al., 1994). Genomic DNA was prepared from the transgene-containing strains and analyzed by array Comparative Genomic Hybridization (aCGH) (Maydan et al., 2007). The arrays used contain 50-mer probes tiling the 100-Mb genome of C. elegans. A segmentation algorithm identified all transgene components. The log2 of the ratio between fluorescence intensities (array-containing strain/wild type) was averaged over the region of the genome detected as amplified and is shown as the “Mean log2 ratio”. From this value we estimated the number of copies of each transgene component, shown as copies per chromosome for the integrated transgenes or copies per array for the extrachromosomal arrays (Figure 1).
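The conversion from a mean log2 ratio to a copy number can be illustrated with a minimal sketch. The text does not spell out the exact formula, so the model below is an assumption rather than the calculation actually used: it assumes that the wild-type reference carries one endogenous copy of the amplified region per haploid genome, that integrated transgenes are homozygous, and that each cell carries a single extrachromosomal array. The function name and parameters are hypothetical.

```python
def copies_from_log2_ratio(mean_log2_ratio, integrated=True,
                           endogenous_copies_per_haploid=1):
    """Estimate transgene copies from a mean aCGH log2 ratio.

    ASSUMED model (not stated in the text):
      - wild type has `endogenous_copies_per_haploid` copies per haploid genome
      - integrated transgenes are homozygous -> result is copies per chromosome
      - extrachromosomal: one array per diploid cell -> result is copies per array
    """
    # Fold change of signal in the transgenic strain relative to wild type.
    fold = 2.0 ** mean_log2_ratio
    # Extra copies per haploid genome, after subtracting the endogenous locus.
    extra_per_haploid = (fold - 1.0) * endogenous_copies_per_haploid
    if integrated:
        return extra_per_haploid
    # One array contributes its copies to a diploid (two haploid) genome.
    return 2.0 * extra_per_haploid


if __name__ == "__main__":
    print(copies_from_log2_ratio(5.0))                    # integrated example
    print(copies_from_log2_ratio(1.0, integrated=False))  # extrachromosomal example
```

Under these assumptions the estimate grows exponentially with the log2 ratio, so a small compression of the signal at high ratios translates into a large error in the copy number.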

Strains containing the ntIs1 transgene have previously been whole-genome sequenced in our lab (Sarin et al., 2010). We used these data to calculate the copy number of the components of the ntIs1 transgene by dividing the average sequencing depth of the transgene region by the average sequencing depth across all non-gap regions, and found 51 copies for gcy-5prom::gfp (vs. 31 by aCGH) and 13 copies for lin-15 (vs. 11 by aCGH). Comparison of these numbers with those from the aCGH analysis suggests that, in general, copy number estimation by aCGH is more accurate for log2 ratios lower than +4, and that copy number is probably underestimated for log2 ratios higher than +4, which fall in the non-linear range near saturation.
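The depth-based estimate described above is a simple ratio of averages, which can be sketched as follows. The function name and the input format (lists of per-base depths, with gap regions already excluded from the genome-wide list) are assumptions made for illustration; the depth values in the example are invented and are not data from this study.

```python
from statistics import mean


def copy_number_from_depth(transgene_depths, genome_depths):
    """Ratio of mean depth over the transgene region to mean genome-wide depth.

    ASSUMPTION: `genome_depths` already excludes gap regions, and both inputs
    are non-empty sequences of per-base read depths.
    """
    genome_mean = mean(genome_depths)
    if genome_mean == 0:
        raise ValueError("genome-wide mean depth is zero")
    return mean(transgene_depths) / genome_mean


if __name__ == "__main__":
    # Invented depths for illustration only.
    region = [500, 520, 510]
    genome = [10, 9, 11]
    print(copy_number_from_depth(region, genome))
```

Because read depth scales roughly linearly with copy number, this estimate does not saturate in the way a hybridization signal can.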

For the extrachromosomal arrays of the fosmid reporters, an injection concentration of 15-50 ng/μl resulted in an average of 8 fosmid copies per array. Although the data show that injection concentration and copy number are not perfectly correlated, one could try reducing the injection concentration if lower copy numbers are desired. In our experience, even transgenes integrated at 11 copies per chromosome (22 in a homozygous animal) can still provide germline expression, as seen for otIs284 (Tursun et al., 2011).