A conserved set of maternal genes? Insights from a molluscan transcriptome

The early animal embryo is entirely reliant on maternal gene products for a ‘jump-start’ that transforms a transcriptionally inactive embryo into a fully functioning zygote. Despite extensive work on model species, it has not been possible to perform a comprehensive comparison of maternally-provisioned transcripts across the Bilateria because of the absence of a suitable dataset from the Lophotrochozoa. As part of an ongoing effort to identify the maternal gene that determines left-right asymmetry in snails, we have generated transcriptome data from 1 to 2-cell and ~32-cell pond snail (Lymnaea stagnalis) embryos. Here, we compare these data to maternal transcript datasets from other bilaterian metazoan groups, including representatives of the Ecdysozoa and Deuterostomia. We found that between 5 and 10% of all L. stagnalis maternal transcripts (~300-400 genes) are also present in the equivalent arthropod (Drosophila melanogaster), nematode (Caenorhabditis elegans), urochordate (Ciona intestinalis) and chordate (Homo sapiens, Mus musculus, Danio rerio) datasets. While the majority of these conserved maternal transcripts (“COMATs”) have housekeeping gene functions, they are a non-random subset of all housekeeping genes, with an overrepresentation of functions associated with nucleotide binding, protein degradation and activities associated with the cell cycle. We conclude that a conserved set of maternal transcripts and their associated functions may be a necessary starting point of early development in the Bilateria. For the wider community interested in discovering conservation of gene expression in early bilaterian development, the list of putative COMATs may be a useful resource.


Introduction
Cell division requires that genome replication and assortment are achieved while cellular function is maintained. In somatic cells, there is continuity of cytoplasm from mother to daughter, so that new nuclei take up the reins of cellular control as transcription of their genomes is resumed after division. In contrast, in the formation of a new organism the early zygote has to perform a similar feat of taking control of a new cell, but the task is made more complex because the gametic pronuclei must be reprogrammed and coordinated before transcription initiation. In animal embryos the zygotic cytoplasm, provisioned by the mother, has been found to contain all the machinery necessary to drive the first stages of embryonic development. This maternal provisioning has been demonstrated through the blocking of transcription from the zygotic genome (Baroux et al., 2008). In transcriptionally-blocked embryos, maternal products are often sufficient to drive the first rounds of cell division, and even the first phases of differentiation (Baroux et al., 2008).
The switch between maternal and zygotic control is called the maternal-zygotic transition (MZT), or the midblastula transition (MBT), and spans the period from fertilisation to the point where maternally provisioned factors are no longer sufficient to deliver normal development (Baroux et al., 2008, Stitzel and Seydoux, 2007).
The MZT is associated with the activation of the zygotic genome. In animal species where fine-scale analyses have been performed, zygotic gene activation has been modelled as two phases (Baroux et al., 2008, Tadros and). An early phase, involving a few loci, is associated with degradation of maternal proteins and mRNAs, while the second phase is much more extensive and includes genes involved in a wide range of biological processes (Schier, 2007, Tadros and). Initial, albeit limited, zygotic genome activation has been identified as early as the fertilised zygote (in the paternal pronuclei of mouse, sea urchin and the nematode Ascaris suum), and as late as the 256-cell embryo stage (in Xenopus) (Baroux et al., 2008, Wang et al., 2013). Experimental evidence indicates that the MZT is tightly regulated at multiple levels, and includes the birth of zygotic RNAs and the death of maternal RNAs (Schier, 2007, Stitzel and Seydoux, 2007). Thus, while many embryos are able to transcribe experimentally introduced DNA, the early embryonic genome is maintained in a state that is incompatible with transcription. Changes in chromatin structure, combined with a dilution of factors such as transcriptional repressors by cell division, allow for the initiation of zygotic transcription. Nonetheless, despite the complexity, it has been suggested that the MZT can be simplified into two interrelated processes: the first whereby a subset of maternal mRNAs and proteins is eliminated, and the second whereby zygotic transcription is initiated (Schier, 2007, Tadros and).
In zebrafish, maternally-provisioned products from just three genes, Nanog, Pou5f1 and SoxB1 (known for their roles in embryonic stem cell fate regulation), are sufficient to initiate the zygotic developmental program and to induce clearance of the maternal program by activating the expression of a microRNA (Lee et al., 2013, Leichsenring et al., 2013). In Xenopus, increasing nuclear to cytoplasmic ratio is believed to be the controlling element in the switch, with just four factors regulating multiple events during the transition (Collart et al., 2013). However, the generality of these findings remains unknown. Furthermore, while the regulation of RNA transcription (gene expression) has received considerable attention (primarily due to the advances in nucleic acid sequencing technologies), protein expression and turnover rates remain relatively under-studied (Stitzel and Seydoux, 2007). Our knowledge of maternal-to-zygotic transcription phenomena is also largely restricted to the dominant model animal species, with relatively few experimental studies existing for other metazoans.
Although there has been a recent upsurge in interest in the maternal control of embryonic development, especially the MZT (Benoit et al., 2009, De Renzis et al., 2007, Lee et al., 2013, Leichsenring et al., 2013), the study of maternal factors has long played an important part in the history of embryology and development, particularly in the model animal taxa Drosophila melanogaster (phylum Arthropoda from superphylum Ecdysozoa), Caenorhabditis elegans (Nematoda, Ecdysozoa), Strongylocentrotus purpuratus (Echinodermata, Deuterostomia), Mus musculus, Homo sapiens and Danio rerio (Chordata, Deuterostomia) (Gilbert, 2006). Missing from this roster of models are representatives of the superphylum Lophotrochozoa, a morphologically diverse group that includes the Mollusca and Annelida. Two annelid models, Platynereis dumerilii and Capitella teleta, are becoming well established (Dill and Seaver, 2008, Giani et al., 2011, Hui et al., 2009), but model molluscs have been developed for their potential to answer particular questions (e.g. asymmetric distribution of patterning molecules during development; Lambert and Nagy, 2002), or their association with a particular disease (e.g. schistosome-transmitting Biomphalaria; Knight et al., 2011).
As part of an ongoing effort to identify the maternal gene that determines left-right asymmetry in molluscs (Harada et al., 2004, Kuroda et al., 2009, Liu et al., 2013), we are developing Lymnaea stagnalis pond snails as a model because they are one of the few groups that exhibit genetically-tractable, natural variation in their left-right asymmetry, or chirality, and so are ideal systems in which to understand why chirality is normally invariant, yet also pathological when it does vary (Schilthuizen and Davison, 2005). In generating a maternal transcriptomic resource for this species (the chirality-determining gene is maternally expressed; Boycott and Diver, 1923, Sturtevant, 1923), we were surprised to discover that while there are general studies on the composition and regulation of maternal expression (Shen-Orr et al., 2010), there has been no comprehensive description of shared bilaterian maternal genes. One reason may be that no maternal gene resource exists for the Lophotrochozoa, Spiralia or Mollusca. Instead, previous work has described early developmental transcription in the molluscs Ilyanassa sp. (Lambert et al., 2010) and …

… 2,834:1,796 maternal-only:maternal-zygotic M. musculus versus 1,069:884 maternal-only:maternal-zygotic L. stagnalis, P < 0.0001), especially when considering COMATs (Fisher's exact test, 2,834:1,796 versus 219:261, P < 0.0001). A similar result was found in comparisons between L. stagnalis and C. elegans (Fisher's exact test, 2,794:2,285 versus 733:929 or 222:259, P < 0.0001, P < 0.0002). Similar comparisons were also made for maternal transcripts identified as being actively degraded or not degraded in the early embryo (Baugh et al., 2003, Evsikov et al., 2006), but no differences were found.
The distribution of GO annotations into functional categories revealed no obvious qualitative differences between the 1 to 2-cell and ~32-cell L. stagnalis transcriptomes (Supplementary Figure 1). A Fisher's exact test, corrected for multiple testing by controlling the false discovery rate, confirmed that no functional categories were significantly under- or overrepresented between the two libraries. In comparison, the COMAT subset was enriched for many functional categories compared with the complete L. stagnalis 1 to 2-cell transcriptome (Fig. 1; Table 5; Supplementary Table 2). In particular, GO terms associated with nucleotide metabolism and binding in general were overrepresented in the COMAT subset (Fig. 1; Table 5; Supplementary Table 2). The maternal expression of a selected set of the COMAT genes was validated in one-cell zygotes using in situ methods (Fig. 2).

Comparison with human housekeeping genes
The COMAT subset was compared to 3,802 well-characterised human housekeeping genes (Eisenberg and Levanon, 2013). All but 38 of the 481 COMAT transcripts had a significant match to this set (92%), indicating that the majority are housekeeping in function, at least in humans. In comparison, of the 4,311 L. stagnalis 1 to 2-cell transcripts that had a significant BLASTx match in the NCBI nr protein database, only 2,165 (50%) also had matches to the human housekeeping gene dataset. The conserved maternal gene dataset is therefore highly enriched for putative housekeeping genes (Fisher's exact test, 2156:4311 versus 443:481, P < 0.0001).
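The enrichment test reported above can be reproduced approximately with a short, standard-library sketch. It assumes that the two ratios describe a 2 x 2 table of transcripts with and without a housekeeping match, and that the COMAT and whole-transcriptome groups are treated as separate rows; neither assumption is stated explicitly in the text. The function name is illustrative, and the count of 2,165 is taken from the sentence above (the ratio quoted in the test reads 2156).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Assumed layout: rows = COMAT vs. all annotated 1 to 2-cell transcripts,
# columns = housekeeping match vs. no match.
p_housekeeping = fisher_exact_two_sided(443, 481 - 443, 2165, 4311 - 2165)
print(p_housekeeping < 0.0001)
```

Under these assumptions the P value falls far below the reported threshold of 0.0001, whichever of the two counts (2,165 or 2,156) is used.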
We wished to understand whether a particular subset of housekeeping genes is over-represented in the COMAT subset, or whether the genes are a random subset of all housekeeping genes. We therefore compared the GO annotations of the 3,802 human housekeeping genes against the subset of 300 human housekeeping genes (Table 6) that were found in the COMAT (a proportion of the COMATs hit the same human gene, hence fewer genes than expected). Similar GO annotations were enriched in this selected pairwise comparison compared with the COMAT as a whole (Supplementary Tables 3 and 4). At the highest level, the same first seven Molecular Functions were found in both H. sapiens housekeeping versus H. sapiens COMAT, and L. stagnalis 1 to 2-cell transcriptome versus L. stagnalis COMAT comparisons, with P < 5E-8 (Supplementary Table 4; ATP binding; GTPase activity; unfolded protein binding; protein serine/threonine kinase activity; GTP binding; threonine-type endopeptidase activity; and ATP-dependent RNA helicase activity). Similarly, the first seven terms relating to Biological Process were also found (P < 5E-8; anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic process; protein polyubiquitination; negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle; DNA damage response, signal transduction by p53 class mediator resulting in cell cycle arrest; positive regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle; antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent; and GTP catabolic process). Thus, the overall conclusion is that the COMAT generally consists of housekeeping genes, but is enriched for a particular subset, including those involved in nucleotide binding functions, protein degradation and activities associated with the cell cycle.
A final concern was that the COMATs are simply conserved genes that tend to be highly expressed, and so are more likely to be detected in non-exhaustive sequencing experiments. We therefore used the expression data of Eisenberg and Levanon (2013) to compare the read depth of these two types of gene (COMATs and non-COMATs) in human tissues. Overall, COMATs tend to be more highly expressed, but they represent a set of genes that have a large range in their quantitative gene expression (Figure 3). Thus, while the mean gene expression in the conserved data set is higher (COMAT mean log geometric gene expression = 1.08, S.E. 0.03; non-COMAT mean = 0.90, S.E. 0.008; P < 0.001), the individual variation is considerable in both datasets (S.D. 0.51 and 0.47 respectively). Thus, a lack of depth in sequencing experiments cannot wholly explain the existence of COMATs.
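As a rough consistency check on the summary statistics above, the difference between the two means can be compared with its combined standard error. The text does not say which test produced the reported P value, so the sketch below assumes a large-sample two-sided z-test on the reported means and standard errors; the function name is illustrative.

```python
from math import erfc, sqrt

def two_sided_z_test(mean_a, se_a, mean_b, se_b):
    """Large-sample z statistic and two-sided P value for a difference in means."""
    z = (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p

# Reported values: COMAT mean 1.08 (S.E. 0.03), non-COMAT mean 0.90 (S.E. 0.008).
z, p = two_sided_z_test(1.08, 0.03, 0.90, 0.008)
print(round(z, 2), p < 0.001)
```

Under this assumption the difference is about 5.8 combined standard errors, which is in line with the reported P < 0.001, while the standard deviations (0.51 and 0.47) show that the two distributions overlap extensively.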

Discussion
Much excitement has been caused by the discovery that the evolution of gene expression patterns seems to underpin the morphological hourglass pattern of both plants and animals (Kalinka et al., 2010, Meyerowitz, 2002, Quint et al., 2012). Thus, the long-standing observation that vertebrate morphology is at its most conserved during the embryonic pharyngula or phylotypic period is generally mirrored by conserved expression patterns of conserved genes at these stages (Kalinka and Tomancak, 2012, Kalinka et al., 2010). In contrast, active transcription in the early zygote is much more limited. Early animal embryos instead largely rely upon RNAs and proteins provided by the maternal gonad during oocyte maturation. This transcriptionally-quiescent period might, a priori, be considered evolutionarily constrained, as the maternally provided transcriptome is widely considered to fulfill one major role, the initiation and management of several rounds of rapid cell division. Every one of these early cell divisions is a critical event that must be faithfully completed to ensure the development of a healthy embryo (Evsikov et al., 2006). Few studies have investigated the level of conservation of maternally provided genes (Shen-Orr et al., 2010), despite their well-recognised importance in early development (Wieschaus, 1996). Indeed there are few comprehensive datasets of maternally provisioned transcripts even in well-characterised taxa, and none in the Lophotrochozoa. Improvements in sequencing technologies mean that quantitative transcriptome studies are now possible on organisms that lack genomic resources. Our work therefore provides a list of conserved maternal transcripts, or COMATs (Table 6; Supplementary Table 1), that may be useful to the wider community interested in the study of early bilaterian development.
We identified a core set of COMATs from seven representatives of the three bilaterian superphyla, spanning >600 million years of evolution (Peterson et al., 2008). These species display highly divergent modes of development (from direct to indirect, and mosaic to regulative). Since the L. stagnalis maternal transcriptome we report here is unlikely to be complete, one possibility is that our estimate of 5-10% of all maternally provisioned transcripts being conserved across the Bilateria may rise upon deeper sampling of the snail transcriptome. Conversely, the number may reduce as maternal transcriptomes from more taxa are included in the analysis.
Unsurprisingly, we found that many of these genes had nucleotide (especially ATP and GTP) binding functions, were associated with protein degradation or had activities associated with the cell cycle (Table 6). The majority of functions ascribed are probably accurately defined as housekeeping (Eisenberg and Levanon, 2013). One possibility is that some of the most conserved maternal RNAs are those that cannot be provided (solely) as proteins. Cell cycle genes may be illustrative, because some cell cycle proteins are degraded every cycle and so maternal protein alone cannot be sufficient. Finally, the fact that the ~32-cell transcriptome was neither enriched nor underrepresented for any gene ontology relative to the 1 to 2-cell transcriptome, along with a relative over-representation of maternal-zygotic transcripts that are conserved between M. musculus / C. elegans and L. stagnalis, suggests that the same transcripts are at least still present during early zygotic transcription (Supplementary Figure 1).
Given the wide variety of developmental modes and rates displayed by metazoan embryos, as well as the hourglass theory of evolution (Kalinka and Tomancak, 2012), one view is that we might expect to find relatively few deeply conserved maternal transcripts. Alternatively, as it has been documented that a relatively large fraction (between 45% and 75%) of all genes within a species' genome can be found as maternal transcripts (see references within Tadros and Lipshitz, 2009), another view is that maternal transcripts that are conserved between different organisms may be a stochastic subset of a large maternal transcriptome. Instead, our analyses suggest that there is a core and specific set of maternal transcripts that may be essential for early cell divisions, irrespective of the precise mode of development.
Both our data and the other datasets utilised in this study have obvious limitations, primarily limited sequencing coverage, so it is uncertain whether further investigation will reveal a greater or lesser proportion of conserved maternal transcripts. A related consideration is that, since the study is at best partially quantitative, we are likely to have detected those genes that are conserved and transcribed at a relatively high level across all taxa. Further studies are warranted to reveal the true nature of this conservation. Nonetheless, as we found that the conserved maternal part of a well annotated group of H. sapiens housekeeping genes is enriched for precisely the same functions (Table 6, Supplementary Table 3), we conclude that gene expression in the early development of bilaterian embryos is highly conserved. There may also be a distinct set of genes, with mostly housekeeping and nucleotide metabolic functions, that is a necessary starting point of the maternal-to-zygotic transition.
Our analyses thus suggest that the ancestral function of maternal provisioning in animal eggs is to supply the zygote with the materials with which to perform the basic cellular functions of rapid cell division in the early stages of development. The extent of the provisioning is evolutionarily labile, with species that have evolved rapid development relying more on maternal products. Addition of patterning molecules is phylogenetically contingent: as different groups and species have evolved different mechanisms of patterning the embryo and been under selection for fast patterning (as in lineage-driven, or mosaic development) or delayed patterning (as in species with regulative development), so the role of maternal factors in driving patterning has changed.

Materials and Methods

cDNA library construction
Early development in the pond snail L. stagnalis has been described in exquisite morphological and cytological detail (Raven, 1966). Although the L. stagnalis MZT has not been mapped in the same detail as in model species, transcription from zygotic nuclei was first detected in 8-cell embryos, and major transcriptional activity at the 24-cell stage (Morrill, 1982). While division cycles are not as rapid as in C. elegans or D. melanogaster, the L. stagnalis embryo does not divide for ~3 hours at the 24-cell stage, suggesting this may represent a shift from maternal to zygotic control. We thus separately sampled 1 to 2-cell and ~32-cell stage L. stagnalis embryos from a laboratory stock maintained in Nottingham, representing the maternal component and the early stages of zygotic transcription, respectively. Zygotes were manually dissected out of their egg capsules and stored in RNAlater (Ambion). As one embryo was expected to yield ~0.5 ng RNA, more than one thousand individual embryos of each type were pooled. Total RNA was then extracted using the Qiagen RNeasy Plus Micro Kit. cDNA was then synthesised and two non-normalised cDNA libraries were constructed using the MINT system (Evrogen). The libraries were then processed for sequencing on the Roche 454 FLX platform in the Edinburgh Genomics facility, University of Edinburgh. The raw data have been submitted to the European Nucleotide Archive under bioproject PRJEB7773.

Transcriptome assembly
The raw Roche 454 data were screened for MINT and sequencing adapters and trimmed of low quality base calls. The reads from each library were assembled using gsAssembler (version 2.6; also known as Newbler; 454 Life Sciences) and MIRA (Chevreux et al., 2004) separately, and then the two assemblies were assembled together using CAP3 (Huang and Madan, 1999), following the proposed best practice for transcriptome assembly from 454 data (Kumar and Blaxter, 2010) … by quality off (-CL:qc=no). CD-HIT was then used to remove redundant sequences from the merged CAP3 assemblies (Li and Godzik, 2006), running cd-hit-est with sequence identity threshold 0.98 (-c 0.98) and clustering to most similar cluster (-g 1). The assembly has been made available on afterParty (http://afterparty.bio.ed.ac.uk).

Maternal transcriptomes from other species
We identified a number of published, high-throughput, maternal transcriptome studies from Ciona intestinalis (Urochordata, Deuterostomia), Danio rerio, Mus musculus, Homo sapiens (Chordata, Deuterostomia), C. elegans (Nematoda, Ecdysozoa) and D. melanogaster (Arthropoda, Ecdysozoa). A "maternal transcript" is an mRNA that is present in the embryo before the initiation of major zygotic transcription. This does not mean that these mRNAs are not also later transcribed from the zygotic genome in the developing embryo.
We carried out a reciprocal tBLASTx comparison of the L. stagnalis 1 to 2-cell transcriptome against each of the other datasets, using a threshold expect value of 1e-10. By identifying L. stagnalis transcripts that had homologues in all of the species we identified a putative set of conserved bilaterian maternal transcripts.
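The selection step described above can be illustrated with a minimal sketch. It assumes that "reciprocal" means a significant hit (expect value at or below 1e-10) in both search directions, and that hits are available as (query, subject, expect value) tuples; the function names and the toy identifiers are hypothetical and are not taken from the actual analysis.

```python
E_VALUE_CUTOFF = 1e-10  # threshold expect value stated in the text

def significant_pairs(hits):
    """Keep (query, subject) pairs whose expect value passes the cutoff."""
    return {(query, subject) for query, subject, e_value in hits
            if e_value <= E_VALUE_CUTOFF}

def reciprocal_queries(forward_hits, reverse_hits):
    """Snail transcripts with a significant hit in both search directions."""
    forward = significant_pairs(forward_hits)
    reverse = significant_pairs(reverse_hits)
    return {snail for snail, other in forward if (other, snail) in reverse}

def conserved_transcripts(per_species):
    """Intersect the reciprocal-hit sets across every comparison species."""
    sets = [reciprocal_queries(fwd, rev) for fwd, rev in per_species.values()]
    return set.intersection(*sets) if sets else set()

# Toy data: species -> (snail-vs-species hits, species-vs-snail hits).
toy = {
    "D. melanogaster": (
        [("Ls_001", "Dm_A", 1e-50), ("Ls_002", "Dm_B", 1e-5)],
        [("Dm_A", "Ls_001", 1e-40), ("Dm_B", "Ls_002", 1e-30)],
    ),
    "C. elegans": (
        [("Ls_001", "Ce_X", 1e-20), ("Ls_003", "Ce_Y", 1e-60)],
        [("Ce_X", "Ls_001", 1e-25)],
    ),
}
comats = conserved_transcripts(toy)
print(sorted(comats))
```

In this toy example only Ls_001 is retained, because it is the only transcript with a significant hit in both directions against every species. A stricter reciprocal-best-hit criterion would shrink the set further.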

Functional annotation of transcriptome
The 1 to 2-cell and 32-cell transcriptome assemblies were annotated with gene ontology (GO) terms using Blast2GO v 2.7.0 against the NCBI non-redundant (nr) protein database, with an E-value cutoff of 1e-05. GO term distribution was quantified using the Combined Graph function of Blast2GO, with enrichment assessed using the Fisher's Exact Test function (Conesa et al., 2005).
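The false discovery rate correction applied to the per-term Fisher's exact tests can be sketched as follows. The sketch assumes the Benjamini-Hochberg procedure, which the text does not name explicitly; the function name and the example P values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-GO-term P values from separate Fisher's exact tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(x, 3) for x in adj])
```

A GO term would then be called enriched when its adjusted value falls below the chosen false discovery rate threshold.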

In situ validation of representative transcripts
We validated the maternal expression of a selection of sequences in L. stagnalis 1-cell embryos by using whole mount in situ hybridisation (WMISH). Primers were designed to amplify fragments of selected genes, which were then cloned into pGEM-T and verified by standard Sanger sequencing. Complementary riboprobes were prepared from these templates as described in Jackson et al. (2007a). The WMISH protocol we employed here for L. stagnalis is similar to previously described protocols for molluscan embryos and larvae (Jackson et al., 2006, Jackson et al., 2007b), with some important modifications (described elsewhere; in review). The colour reactions for all hybridisations (including the negative β-tubulin control) were allowed to proceed for the same length of time, and all samples were cleared in 60% glycerol and imaged under a Zeiss Axio Imager Z1 microscope. The primers used are shown in Table 1.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Fig. 2. Visualisation of maternal gene product spatial distribution in uncleaved zygotes of Lymnaea stagnalis by whole mount in situ hybridisation
Eight maternal gene products were visualised in uncleaved zygotes relative to a negative control (β-tubulin).