De novo transcriptome assembly reveals characteristics of flower sex determination of Excoecaria agallocha

MADS-box genes family may play important roles in the flower sex determination in Excoecaria agallocha. A total of 73 MADS-box genes were identified in E. agallocha. De novo transcriptome assembly analysis suggested that AP1/FUL, AP3/PI, AGL104, and SOC1 plays potential roles in E. agallocha flower sex determination. Excoecaria agallocha is a dioecious species containing both male and female individuals producing unisexual flowers. The underlying mechanisms regulating inflorescence differentiation in these species remains poorly understood. Flower development influences reproduction and breeding in E.agallocha, which contributes to ecological restoration in the intertidal zone. We performed de novo transcriptome assembly analysis on male and female flowers and leaves from E. agallocha. We identified a total of 73 MADS-box genes in the E. agallocha genome, which we grouped into five distinct classes (MIKCc, Mα, Mβ, Mγ, MIKC*) after phylogenetic comparisons with J. curcas homologs. We analyzed expression profiles and suggested AP1/FUL, AP3/PI, AGL104, and SOC1 as candidate regulators of sex determination. In addition, several EaMADS genes were significantly upregulated in flowers compared to leaves. Our study represents the first detailed, comprehensive evaluation of the regulation of MADS-box genes associated with sex determination in E. agallocha. The assembled transcriptomic profiles increase the genetic information available for this species and constitute an important contribution to functional studies of inflorescence genes. In turn, this will help exploring the molecular mechanisms underlying the evolution of E. agallocha.


Introduction
Euphorbiaceae is a family of plants with significant medical and economic relevance, containing ~ 7500 species belonging to 300 genera and 5 different subfamilies (Webster 1994). The genus Excoecaria comprises nearly 40 species that are distributed across the mangrove regions of the old-world, and encompasses dioecious plants containing fruits with a 3-lobe schizocarp (Webster 1994; Mondal et al. 2016). Excoecaria agallocha L., often referred to as "milk mangrove", has relevant medicinal properties and belongs to the few non-viviparous mangrove species distributed across East Africa, India, and Celylon to China and the Ryu-Kyu Islands (Tomlinson 2016). This plant can be found in mangrove regions

Annals of Forest Science
Handling editor: Véronique Jorge. *Correspondence: 2368581774@qq.com; zhangyingred@lingnan.edu.cn and occasionally inland. Unlike the majority of mangrove species, E. agallocha does not contain specialized aerial roots extending above the soil surface in order to ensure the underground roots are supplied with oxygen (Mondal et al. 2016;Tomlinson 2016). This dioecious mangrove species has long male inflorescence and shorter female inflorescence (Duke 2006). Previous studies on E. agallocha essential focused on the hereditary patterns, medicinal properties and ecological adaption of this plant to new environments (Chen and Ye 2014;Mondal et al. 2016;Liu et al. 2021). Hence, the several aspects behind the molecular evolution of this dioecious mangrove species remain unknown.
In flowering plants, sex determination represents a developmental process that increases fitness and survival by promoting allogamy. In the case of unisexual flowers, sex determination is achieved through selective abortion or arrest of reproductive organs (either male or female) (Hernández-Cruz et al. 2019). Recently, several studies have identified and characterized MADS-domain proteins by employing genome-wide expression analyses across woody plant species, including Jatropha curcas (Tang et al. 2020), Quercus suber (Rocheta et al. 2014), Camellia sinensis (Pan et al. 2018), Malus × domestica (Moser et al. 2020), Populus trichocarpa (Leseberg et al. 2006), and Eriobotrya japonica (Xu et al. 2021). Studies focusing on overexpression assays and mutant plants suggest MADS-box genes are essential for the specification in floral organs (Krizek and Fletcher 2005;Kater et al. 2006). Importantly, MADS-box transcription factors include a conserved DNA-binding motif (known as the MADS domain) at their N-terminus region (Bai et al. 2019). This domain is approximately 58-60 amino acids long and able to bind CArG boxes (CC[A/T]6GG) (Wei et al. 2015). An ancestral MADS-box gene duplication event separates two main lineages, type I and type II (Svensson et al. 2000). The former genes can be further subdivided into three distinct groups, specifically Mα, Mβ, Mγ, according to the genetic sequence of MADS domain and the presence of other motifs . The proteins belonging to type II plants are named MIKC and, depending on the sequence divergence at domain I, can be divided into two different subgroups, MIKC c and MIKC* (Lin et al. 2016). The number of type I genes is highly variable among flowering plants (Grimplet et al. 2016). While MIKC-type genes mostly evolved as a result of genome duplications, type I MADS-box genes mostly arose via segmental duplications. The latter play an essential role in the formation of the female gametophyte and the early development of seeds (Masiero et al. 2011).
MIKC c -type have been identified as floral genes (Rounsley et al. 1995) that are also involved in flower organogenesis. These genes can be grouped into five distinct classes: A, B, C, D, and E. Each of these classes plays different roles, in particular determining the identity of sepals (A + E), petals (A + B + E), stamens (B + C + E), carpels (C + E) and ovules (D + E). In Arabidopsis, these functional classes include genes such as APETALA1 (AP1) in class A, PISTILATA (PI) and APETALA3 (AP3) in class B, AGAMOUS (AG) in class C (Hsu 2002), SEED-STICK/ AGAMOUS-LIKE 1 (STK/AGL11) and SHAT-TERPROOF (SHP) in class D (Mendes et al. 2013), and SEPALLATA (SEP1, SEP2, SEP3, SEP4) in class E (Sun et al. 2014). Several MIKC c genes in the AG and APETALA1/FRUITFULL (AP1/FUL) subfamilies further participate in the development of fruits and seeds (Meng et al. 2019), while others are seemingly involved in regulatory networks that control flowering time and initiation. Specifically, the genes FLOWERING LOCUS C (FLC), SUPRESSOR OF OVEREXPRESSION OF CONSTANTS 1 (SOC1) and SHORT VEGETATIVE PHASE (SVP) integrating signals from different flowering time regulatory pathways to regulate flowering transition (Yuan et al. 2009;Zhang et al. 2020a;Won et al. 2021). These genes can either be positive (SOC1, AGL24) or negative regulators (FLC, SVP) of genes regulating the identity of the flower meristem Schilling et al. 2020).
We provide a reference transcriptome dataset for E. agallocha that can be used in future studies. To our knowledge, we performed an unprecedented and detailed transcriptomic analysis on MADS-box genes associated with floral development or sex determination in E. agallocha. This dataset expands the available genetic resources on E. agallocha and opens the door to future functional studies seeking to explore the mechanisms behind the molecular evolution of E. agallocha.

Plant materials
We sampled E. agallocha specimens from Zhanjiang Jinsha Bay, Guangdong, China (21.2771°N, 110.3899°E). Leaves and flowers were collected from one male and female trees separately at 9:00 am on July 20, 2020. The sampled materials were then used for transcriptomic analysis. In addition, we collected leaves and flowers from the same male and female trees, and fruits form the same female tree for real-time PCR analysis (Fig. 1). The plant materials were immersed in liquid nitrogen and stored at -80 °C before downstream analyses were performed. Each samples had three biological replicates.

RNA extraction
We extracted total RNA from leaf and flower tissues using the TRIzol ® Reagent (Plant RNA Purification Reagent) following manufacturer's instructions (Invitrogen,

Phylogenetic analysis
To identify the MADS-box from E. agallocha transcriptomes, we searched all significant MADS-box-related genes against a database composed of plant TFs (Plant-TFDB 4.0; http:// plant tfdb. gao-lab. org/). We downloaded MADS protein sequences belonging to Jatropha curcas from PlantTFDB 4.0. Sequence alignments were performed using ClustalX (1.83). The phylogenetic trees comparing E. agallocha and J. curcas MADS-box proteins were built following a Maximum likelihood approach based on the similarity between full-length amino acid sequences using IQ-TREE (Nguyen et al. 2015).

Quantitative real-time PCR analysis
We selected a total of 14 DEGs (MADS-box gene family) detected in Ea_ff vs Ea_fL, Ea_ff vs Ea_mf, Ea_fL vs Ea_ mL, and Ea_mf vs Ea_mL for quantitative real-time PCR (qRT-PCR) using male leaves, male flower buds (small), male flower buds (big), male flowers, female leaves, female flower buds (small), female flower buds (big), female flowers, and fruits. This was done in order to validate the transcriptome results. The genes were selected at random based on the changes in expression and function identified in this study (Table S1 dataset Zhou et al. 2022). A total of three biological and technical replicates were performed for each organ and gene, respectively ( Fig. 1). We performed cDNA synthesis with the Pri-meScript ® II First Strand cDNA Synthesis Kit (TaKaRa, Japan). The primers were designed using the software Primer Premier 5.0. The amplified PCR product lengths varied between 80 and 300 bp (Table S2 dataset Zhou et al. 2022). qRT-PCR was performed in a CFX96Touch Real-Time PCR Detection system (Bio-Rad, America) with SYBR Premix Ex Taq ™ II (TaKaRa, Japan). We used the 2 −ΔΔCt method to estimate the relative expression levels of the genes in different samples. The data analysis by SPSS version 13.0 (SPSS Inc., Chicago, IL, USA) was used for one-way analysis of variance (ANOVA) and post hoc multiple comparisons. We used male leaves as controls to calculate the relative expression of DEGs associated with MADS-box. Differences were considered to be significant when P < 0.05. The figures were done with GraphPad Prism 8.

Transcriptome sequencing and de novo assembly of E. agallocha
We isolated total RNA from male leaf (mL), male flower (mf ), female leaf (fL), and female flower (ff ) to build the transcriptome of E. agallocha. After performing adapters trimming and filtering low-quality bases, we obtained 6.91-11.79 Gb of sequencing data from 12 cDNA libraries (Table S3 dataset Zhou et al. 2022). The estimated error rate of the RNA-seq was ~ 0.02%. All Q30 values were higher than 90.45%, while the GC content in each sample was exceeded 42%. We obtained an assembly of 427,506 transcripts (mean sequence length 584 bp) for the E. agallocha transcriptome ( Figure S1a  . We identified total of 66,975 simple sequence repeats (SSRs) from the transcriptome of E. agallocha using the MISA software ( Figure S3 and Table S4 dataset Zhou et al. 2022). These results suggest genetic high variation in E. agallocha.

Gene functional annotation
After assembling, we blasted the unigenes against seven public databases, including NR, NT, SwissProt, KO, Pfam, GO, and KOG. This allowed us to identify gene functions and associated pathways ( Figure S4 dataset Zhou et al. 2022; Table 1 ( Figure S4b). Furthermore, the NR BLASTX matches showed that the top two species were J. curcas (26.90%) and R. communis (21.60%) ( Figure S4d dataset Zhou et al. 2022). The above results highlight the highest homology obtained in our study. We further annotated 71,565 (28.85%) unigenes using 55 Gene Ontology (GO) terms. This included 25, 20, 10 and GO terms associated with biological processes, cellular components, and molecular functions, respectively ( Figure S5 dataset Zhou et al. 2022). The main GO terms found in each of three categories cellular process (GO: 0009987) and metabolic process (GO: 0008152); cell (GO: 0005623) and cell part (GO: 0044464); and binding (GO: 0005488) and catalytic activity (GO: 0003824), respectively. To further characterize the biological functions and interaction between unigenes, these groups were then classified into different metabolic pathways with KEGG (Table S5 dataset Zhou et al. 2022). We assigned a total of 34,874 unigenes (14.06%) to 19 categories divided into five clusters, including cellular processes, environmental information processing, genetic information processing, metabolism and organismal systems ( Figure S6 dataset Zhou et al. 2022). The top 5 KEGG pathways were translation (3719 unigenes), carbohydrate metabolism (3475 unigenes), folding and degradation (2749 unigenes), overview (2622 unigenes), and amino acid metabolism (2199 unigenes) (Table S5 dataset Zhou et al. 2022). These observations demonstrate these samples were characterized by active cell development and differentiation.

Sexuality-specific genes between female and male of E. agallocha
We identified specific DEGs in female and male E. agallocha plants based on the sexuality-specific unigenes present in leaf and flower samples (p value < 0.05 and |log2(fold change)| > 5). The results were presented in Fig. 2. We found a total of 4440 and 12,812 DEGs upregulated in the Ea_ff vs Ea_fL and Ea_mf vs Ea_mL group, respectively; while 4471 and 9655 DEGs were respectively down-regulated in the same groups. We uncovered 7923 DEGs between female and male flowers, of which 3370 were upregulated and 4553 were downregulated in the Ea_ff vs Ea_mf group. A further 9396 DEGs existed between female and male leaves, of which 4846 were upregulated and 4550 were downregulated in the Ea_fL vs Ea_mL group. We functionally annotated the E. agallocha female and male transcriptomes by performing a GO analysis on DEGs. Our results (Fig. 3) showed that under the biological process, cellular component and molecular function categories, the most significantly enriched terms were associated with the regulation of nitrogen compound metabolic process (GO:0051171), transcription factor complex (GO:0005667) and nuclear/ DNA-directed RNA polymerase complex (GO:0055029/GO:0000428) in flowers (Ea_ff vs Ea_mf group); metabolic process (GO: 0008152), transcription factor complex (GO:0005667) and catalytic activity (GO: 0003824) in females (Ea_ff vs Ea_ fL group); metabolic process (GO: 0008152), membrane To identify which unigenes were significantly associated with either metabolic or signal transduction pathways, we used the DEGs to query the KEGG database and compared the results to the whole-transcriptome data. This pathway enrichment analysis showed that those unigenes were associated with 116, 115, 124, and 119 pathways in the Ea_ff vs Ea_mf, Ea_ff vs Ea_fL, Ea_ mf vs Ea_mL, and Ea_fLvs Ea_mL groups, respectively (Table S6 dataset Zhou et al. 2022). The estimated top-3 significant pathways were "plant-pathogen interaction" (ko04626), "fatty acid elongation" (ko00062) and "sesquiterpenoid and triterpenoid biosynthesis" (ko00909) in flowers (Ea_ff vs Ea_mf group) (Fig. 4a); "starch and sucrose metabolism" (ko00500), "photosynthesis" (ko00195) and "phenylpropanoid biosynthesis" (ko00940) in females (Ea_ff vs Ea_fL group) (Fig. 4b); "starch and sucrose metabolism" (ko00500) , "plant hormone signal transduction" (ko04075) and "phenylpropanoid biosynthesis" (ko00940) in males (Ea_mf vs Ea_mL group) (Fig. 4c); and "starch and sucrose metabolism" (ko00500), "plant hormone signal transduction" (ko04075) and "plant-pathogen interaction" (ko04626) in leaves (Ea_fLvs Ea_mL group) (Fig. 4d).

MADS-box transcription factors in E. agallocha
Using the method described above, we obtained 73 unique MADS-box genes from E. agallocha and divided these genes into several branches (Fig. 5). At the same time, we compared the results of MADS-box protein domains in E. agallocha and J. curcas and performed phylogenetic analysis. EaMADS-box classification was based on a previous study (Tang et al. 2020). Specifically, we identified a total of 9, 7, 4, and 3 genes with Mα, Mβ, Mγ, and Mδ MIKC*-type genes. The remaining 50 MADS-box proteins belonged to the MIKC c subgroup (Fig. 5). Notably, we unveiled 7 MIKC c clades in E. agallocha belonging to the MIKC c subgroup class in J. curcas. These 7 MIKC c clades included B, SVP and SOC1 clades with 4 EaMADS-box sequences; the A clade included 3 EaMADS-box sequences; the E and C/D clades included 2 EaMADS-box sequences; and the B clade contained a single EaMADS-box sequence. We identified 17, 38, 61, and 11 MADS-box related DEGs in the Ea_ff vs Ea_mf, Ea_ff vs Ea_fL, Ea_mf vs Ea_mL, and Ea_fL vs Ea_mL groups, respectively  Table S1 dataset Zhou et al. 2022). In flowers (Ea_ff vs Ea_mf group), we found a total of 3 DEGs with Mα, 1 with Mγ, 1 with MIKC*, and 11 with the MIKC c subgroup. In females (Ea_ff vs Ea_fL group), we identified 3 genes with Mα, 2 with Mβ, 1 with Mγ, 2 with MIKC*, and 33 with the MIKC c subgroup. In males (Ea_mf vs Ea_mL group), we uncovered 6 genes with Mα, 6 with Mβ, 2 with Mγ, 3 with MIKC*, and 39 with the MIKC c subgroup. Finally, we found 3 genes with Mα, 1 with Mβ,1 with Mγ, 6 with the MIKC c subgroup in leaves (Ea_fL vs Ea_mL group). Consequently, the phylogenetic and DEGs analyses showed that different kinds of female and male E. agallocha organs contained MADS proteins in each putative functional group.
However, the functional relevance of these predicted genes needs to be further determined.

Discussion
This study provides the first de novo transcriptome assembly for the flower development of a dioecious mangrove. The uncovered unigenes were functionally annotated using seven public databases. The species distribution of the NR BLASTx matches revealed E. agallocha is closely related to J. curcas species. In fact, both J. curcas and E. agallocha belong to Euphorbiaceae (Satyan et al. 2009;Abdelgadir and Van Staden 2013). We identified a large number of sex-specific unigenes significantly associated with development and other biological processes. GO analysis revealed a large number of genes associated with various transcription factor complexes were enriched in females, "membrane" was significantly abundant in males, which affecting sex determination of E. agallocha. KEGG pathway analysis showed much of the identified DEGs are involved in starch and sucrose metabolism in the floral and leaf tissues of E. agallocha. We found a higher number of DEGs involved in the ribosome and plant hormone signal transduction pathway affecting male flowers. Plant hormones can regulate the transition from vegetative development to the first inflorescence phase of reproductive development (Xing et al. 2015). In addition, phytohormones play an essential role in sex determination, and the constitutive expression of MADS-box genes leads to precocious flower differentiation in A. thaliana (Tanurdzic 2004;Amasino 2010). Accordingly, the present study screened additional DEGs associated with sex determination between female and male flowers in E. agallocha. This approach identified 73 DEGs belonging to the MADS-box gene family, and demonstrate this family plays is a key regulator of floral development in E. agallocha. The nine harvested samples allowed further identification of the expression profiles during flower and fruit development. A total of 14 candidate genes were chosen for qRT-PCR analysis, which showed they have distinct expression profiles in different organs.
We analyzed the phylogeny of MADS-box protein families by searching for orthologous genes involved in flower development in E. agallocha. This allowed us to validate the identity of the different subfamilies of these genes. We performed the phylogenetic analysis on MADS-box protein sequences from J. curcas. The MADS-box proteins regulate floral organogenesis and are essential for flower development and sex determination (De Bodt et al. 2003;Pan et al. 2018;Bar-Lev et al. 2021). These proteins can be subdivided into two lineages, the M-type (type I) and MIKC (type II). The former can be further grouped into Mα, Mβ, and Mγ subgroups based on the respective sequence similarities between their MADS-box regions (Pan et al. 2018). We annotated 73 transcripts in the E. agallocha transcriptome that aligned to the MADS-box family and/or contained the MADS domain. Our phylogenetic analysis identified several orthologous genes between E. agallocha and J.curcas in almost all MADS-box protein clades ( Fig. 5; Table S1 dataset Zhou et al. 2022). Similar to other plants, E. agallocha also contains a higher number of Mα and MIKC c than Mγ or MIKC* genes ( Fig. 5; Table S1 dataset Zhou et al. 2022). The distribution of MADS-box subfamily genes in E. agallocha is similar to those observed genes in most plants, indicating this gene family is evolutionarily conserved (Tian et al. 2015;Wei et al. 2015;Ma et al. 2017). Several studies showed the MIKC c gene family plays a central role in flower development and the regulation of flowering time (Yuan et al. 2009;Wang et al. 2015;Zhang et al. 2020b). Interestingly, we found 50 MIKC c in E. agallocha ( Fig. 5; Table S1 dataset Zhou et al. 2022). This might indicate that there may be a complex regulatory mechanism for flowering in E. agallocha. Moreover, a reduction in the number of subfamilies, including Mβ, Mγ, and MIKC*, indicates some of these families may gradually disappear during evolution.
In MIKC c group, we found class A, B, C, D, and E genes in E. agallocha. Of these, those that are significantly upregulated in male or female flowers did not show significant differences in expression between flowers and leaves. The A-class genes, AP1/FUL (Ea51397. c4.g1, Ea65400.c0.g1, and Ea71403.c0.g1) were upregulated in the male flower. In addition, the B-class genes AP3/PI (Ea49623.c0.g1 and Ea60496.c2.g1) were also upregulated in the male flower and may influence male organ development. Similar results have been proposed in Quercus suber (Rocheta et al. 2014). As reported in the Norway spruce, B-type MADS-box genes are active in male organ primordia (Sundstrom and Engstrom 2002). C and D-class genes were significantly upregulated in male and female flowers in E. agallocha. STK (Ea56657. c2.g1) was only expressed in floral organs and is likely involved in flower organ identity (Soltis 2002). AGL104 (Ea52637.c2.g2) was upregulated in the female flowers compared to male flowers. In Arabidopsis, AGL104 is involved in pollen maturation and tube growth (Adamczyk and Fernandez 2009). The E-class genes, SEP2/ AGL4 (Ea80597.c1.g2 and Ea70652.c0.g1) were significantly up-regulated in female and male flowers. SEP2/ AGL4 is required for the specification of floral organs (Wang et al. 2016). However, these genes did not show significant differences in expression between female and male flowers. Accordingly, we hypothesize E-class genes might not be necessary for sex determination. In apples, SUPRESSOR OF CONSTANS 1 (SOC1) regulates the induction of the flower bud (Xing et al. 2015). SOC1 (Ea47429.c0.g1, Ea78911.c0.g3, Ea69574.c0.g1 and Ea82297.c2.g2) was significantly down-regulated in floral Page 11 of 12 Zhou et al. Annals of Forest Science (2022) 79:36 organs compared to leaves, but significantly expressed in flower buds. The SVP gene represses floral transition in B. juncea (Li et al. 2019), and is highly expressed in both male and female leaves in E. agallocha. It would be very interesting to evaluate whether this gene may be similarly inhibited in E. agallocha, restricting leaf development. We propose AP1/FUL, AP3/PI, AGL104, and SOC1 should be considered candidate regulators of sex determination in E. agallocha.

Conclusion
We report the transcriptome of E. agallocha. The data provides important information for understanding the evolutionary history of mangrove. The high number of transcript sequences can be used to discover novel genes, specifically those associated with flower development in E. agallocha. A total of 7923 DEGs were identified between female and male flowers, 73 of which were MADS-box genes (50 belonged to the MIKC c subgroup).
Our results show that AP1/FUL, AP3/PI, AGL104, and SOC1 genes play an important role during the formation of floral organs in E. agallocha.