Skip to main content

De novo transcriptome assembly reveals characteristics of flower sex determination of Excoecaria agallocha

Abstract

Key message

MADS-box genes family may play important roles in the flower sex determination in Excoecaria agallocha. A total of 73 MADS-box genes were identified in E. agallocha. De novo transcriptome assembly analysis suggested that AP1/FUL, AP3/PI, AGL104, and SOC1 plays potential roles in E. agallocha flower sex determination.

Context

Excoecaria agallocha is a dioecious species containing both male and female individuals producing unisexual flowers. The underlying mechanisms regulating inflorescence differentiation in these species remains poorly understood.

Aims

Flower development influences reproduction and breeding in E.agallocha, which contributes to ecological restoration in the intertidal zone.

Methods

We performed de novo transcriptome assembly analysis on male and female flowers and leaves from E. agallocha.

Results

We identified a total of 73 MADS-box genes in the E. agallocha genome, which we grouped into five distinct classes (MIKCc, Mα, Mβ, Mγ, MIKC*) after phylogenetic comparisons with J. curcas homologs. We analyzed expression profiles and suggested AP1/FUL, AP3/PI, AGL104, and SOC1 as candidate regulators of sex determination. In addition, several EaMADS genes were significantly upregulated in flowers compared to leaves.

Conclusion

Our study represents the first detailed, comprehensive evaluation of the regulation of MADS-box genes associated with sex determination in E. agallocha. The assembled transcriptomic profiles increase the genetic information available for this species and constitute an important contribution to functional studies of inflorescence genes. In turn, this will help exploring the molecular mechanisms underlying the evolution of E. agallocha.

1 Introduction

Euphorbiaceae is a family of plants with significant medical and economic relevance, containing ~ 7500 species belonging to 300 genera and 5 different subfamilies (Webster 1994). The genus Excoecaria comprises nearly 40 species that are distributed across the mangrove regions of the old-world, and encompasses dioecious plants containing fruits with a 3-lobe schizocarp (Webster 1994; Mondal et al. 2016). Excoecaria agallocha L., often referred to as “milk mangrove”, has relevant medicinal properties and belongs to the few non-viviparous mangrove species distributed across East Africa, India, and Celylon to China and the Ryu-Kyu Islands (Tomlinson 2016). This plant can be found in mangrove regions and occasionally inland. Unlike the majority of mangrove species, E. agallocha does not contain specialized aerial roots extending above the soil surface in order to ensure the underground roots are supplied with oxygen (Mondal et al. 2016; Tomlinson 2016). This dioecious mangrove species has long male inflorescence and shorter female inflorescence (Duke 2006). Previous studies on E. agallocha essential focused on the hereditary patterns, medicinal properties and ecological adaption of this plant to new environments (Chen and Ye 2014; Mondal et al. 2016; Liu et al. 2021). Hence, the several aspects behind the molecular evolution of this dioecious mangrove species remain unknown.

In flowering plants, sex determination represents a developmental process that increases fitness and survival by promoting allogamy. In the case of unisexual flowers, sex determination is achieved through selective abortion or arrest of reproductive organs (either male or female) (Hernández-Cruz et al. 2019). Recently, several studies have identified and characterized MADS-domain proteins by employing genome-wide expression analyses across woody plant species, including Jatropha curcas (Tang et al. 2020), Quercus suber (Rocheta et al. 2014), Camellia sinensis (Pan et al. 2018), Malus × domestica (Moser et al. 2020), Populus trichocarpa (Leseberg et al. 2006), and Eriobotrya japonica (Xu et al. 2021). Studies focusing on overexpression assays and mutant plants suggest MADS-box genes are essential for the specification in floral organs (Krizek and Fletcher 2005; Kater et al. 2006). Importantly, MADS-box transcription factors include a conserved DNA-binding motif (known as the MADS domain) at their N-terminus region (Bai et al. 2019). This domain is approximately 58–60 amino acids long and able to bind CArG boxes (CC[A/T]6GG) (Wei et al. 2015). An ancestral MADS-box gene duplication event separates two main lineages, type I and type II (Svensson et al. 2000). The former genes can be further subdivided into three distinct groups, specifically Mα, Mβ, Mγ, according to the genetic sequence of MADS domain and the presence of other motifs (Zhang et al. 2021). The proteins belonging to type II plants are named MIKC and, depending on the sequence divergence at domain I, can be divided into two different subgroups, MIKCc and MIKC*(Lin et al. 2016). The number of type I genes is highly variable among flowering plants (Grimplet et al. 2016). While MIKC-type genes mostly evolved as a result of genome duplications, type I MADS-box genes mostly arose via segmental duplications. The latter play an essential role in the formation of the female gametophyte and the early development of seeds (Masiero et al. 2011).

MIKCc-type have been identified as floral genes (Rounsley et al. 1995) that are also involved in flower organogenesis. These genes can be grouped into five distinct classes: A, B, C, D, and E. Each of these classes plays different roles, in particular determining the identity of sepals (A + E), petals (A + B + E), stamens (B + C + E), carpels (C + E) and ovules (D + E). In Arabidopsis, these functional classes include genes such as APETALA1 (AP1) in class A, PISTILATA (PI) and APETALA3 (AP3) in class B, AGAMOUS (AG) in class C (Hsu 2002), SEEDSTICK/ AGAMOUS-LIKE 1 (STK/AGL11) and SHATTERPROOF (SHP) in class D (Mendes et al. 2013), and SEPALLATA (SEP1, SEP2, SEP3, SEP4) in class E (Sun et al. 2014). Several MIKCc genes in the AG and APETALA1/FRUITFULL (AP1/FUL) subfamilies further participate in the development of fruits and seeds (Meng et al. 2019), while others are seemingly involved in regulatory networks that control flowering time and initiation. Specifically, the genes FLOWERING LOCUS C (FLC), SUPRESSOR OF OVEREXPRESSION OF CONSTANTS 1 (SOC1) and SHORT VEGETATIVE PHASE (SVP) integrating signals from different flowering time regulatory pathways to regulate flowering transition (Yuan et al. 2009; Zhang et al. 2020a; Won et al. 2021). These genes can either be positive (SOC1, AGL24) or negative regulators (FLC, SVP) of genes regulating the identity of the flower meristem (Wang et al. 2015; Schilling et al. 2020).

We provide a reference transcriptome dataset for E. agallocha that can be used in future studies. To our knowledge, we performed an unprecedented and detailed transcriptomic analysis on MADS-box genes associated with floral development or sex determination in E. agallocha. This dataset expands the available genetic resources on E. agallocha and opens the door to future functional studies seeking to explore the mechanisms behind the molecular evolution of E. agallocha.

2 Material and methods

2.1 Plant materials

We sampled E. agallocha specimens from Zhanjiang Jinsha Bay, Guangdong, China (21.2771°N, 110.3899°E). Leaves and flowers were collected from one male and female trees separately at 9:00 am on July 20, 2020. The sampled materials were then used for transcriptomic analysis. In addition, we collected leaves and flowers from the same male and female trees, and fruits form the same female tree for real-time PCR analysis (Fig. 1). The plant materials were immersed in liquid nitrogen and stored at – 80 °C before downstream analyses were performed. Each samples had three biological replicates.

Fig. 1
figure 1

Plant material used for qRT-PCR analysis. Note: 1 Male leaf. 2 Male flower bud (small). 3 Male flower bud (big), 4 Male flower. 5 Female leaf. 6 Female flower bud (small). 7 Female flower bud (big). 8 Female flower. 9 Fruit

2.2 RNA extraction

We extracted total RNA from leaf and flower tissues using the TRIzol® Reagent (Plant RNA Purification Reagent) following manufacturer’s instructions (Invitrogen, Carlsbad, CA, USA). Genomic DNA was removed using DNase I (TaKara Bio, Inc., USA). The integrity and purity of the RNA were measured using the 2100 Bioanalyser (Agilent Technologies, Inc., Santa Clara CA, USA). RNA was quantified using the ND-2000 (NanoDrop Thermo Scientific, Wilmington, DE, USA). Only high-quality RNA samples (OD260/280 = 1.8 ~ 2.2, OD260/230 ≥ 2.0, RIN ≥ 8.0, 28S:18S ≥ 1.0, > 1 μg) were used to construct the sequencing library. Three biological replicates were performed for each sample.

2.3 Library preparation and Illumina NovaSeq 6000 sequencing

We performed the RNA purification, reverse transcription, library construction and sequencing experiments at the Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China). Each of these analyses followed manufacturer’s instructions (Illumina, San Diego, CA). The RNA-seq transcriptome libraries (i.e., Ea_ff1, Ea_ff2, Ea_ff3, Ea_fL1, Ea_fL2, Ea_fL3, Ea_mf1, Ea_mf2, Ea_mf3, Ea_mL1, Ea_mL2, and Ea_mL3) were prepared using the Illumina sample preparation kit TruSeqTM RNA (Illumina, San Diego, CA). Three biological replicates were performed for each sample. We obtained purified poly(A) mRNA using oligo-dT-attached magnetic beads and a fragmentation buffer. We then used these short fragments as templates and synthesized double-stranded cDNA with SuperScript double-stranded cDNA synthesis kit (Invitrogen, CA) and random hexamer primers (Illumina). We performed end-repair on the synthesized cDNA, followed by phosphorylation and the addition of ‘A’ bases as per Illumina’s library construction protocol. The libraries we selected for cDNA target fragments of 200–300 bp using 2% Low Range Ultra Agarose and PCR amplification (15 cycles) with Phusion DNA polymerase (New England Biolabs, Boston, MA). After quantification by TBS380, we sequenced 12 RNAseq libraries on an Illumina NovaSeq 6000 sequencer (Illumina, San Diego, CA) using one lane and 2 × 150 bp paired-end reads. Correlation test was performed to verify the reliability of the repeated experiments. The raw reads generated in this study were deposited in the NCBI database under accession number PRJNA760657. See also Table S7 in dataset Zhou et al. 2022.

2.4 De novo assembly and annotation

The paired-end reads were trimmed and filtered for quality controls using SeqPrep (https://github.com/jstjohn/SeqPrep) and Sickle (https://github.com/najoshi/sickle) with default parameters. We used Trinity (http://trinityrnaseq. sourceforge.net/) to perform de novo assembly on the clean data (Ea_ff1, Ea_ff2, Ea_ff3, Ea_fL1, Ea_fL2, Ea_fL3, Ea_mf1, Ea_mf2, Ea_mf3, Ea_mL1, Ea_mL2, and Ea_mL3) (Grabherr et al. 2011) The assembled transcripts were functionally annotated against the NCBI protein nonredundant (NR), NCBI nucleotide sequence database (NT), SwissProt, Pfam, and Clusters of orthologous groups for eukaryotic complete genomes (KOG) databases using BLASTX with an E-value cut-off 1.0 × 10−5. In addition, the BLAST2GO (http://www.blast2go.com/b2ghome) (Conesa et al. 2005) program was used to obtain GO annotations of uniquely assembled transcripts to describe the associated biological processes, molecular functions and cellular components. Metabolic pathway analysis was performed using KEGG (http://www.genome.jp/kegg/) (Ogata et al. 1999).

2.5 Differential expression analysis and functional enrichment

The clean data of each sample was mapped to the assembled reference transcriptome using Jatropha curcas L. (Seesangboon et al. 2018). The abundance of genes was calculated using RSEM (http://deweylab.biostat.wisc.edu/rsem/) (Li and Dewey 2011) and normalized using the transcripts per million reads (TPM) method (Table S8 dataset Zhou et al. 2022). Differential expression analysis was implemented using EdgeR (Robinson et al. 2010) that genes with |log2FC|>1 and Q value < = 0.05 were considered to be significantly differentially expressed (DEGs). GO and KEGG enrichment analyzes were performed on DEGs to identify relevant pathways and functions using Goatools (https://github.com/tanghaibao/Goatools) and KOBAS (http://kobas.cbi.pku.edu.cn/home.do), respectively (Xie et al. 2011).

2.6 Phylogenetic analysis

To identify the MADS-box from E. agallocha transcriptomes, we searched all significant MADS-box-related genes against a database composed of plant TFs (PlantTFDB 4.0; http://planttfdb.gao-lab.org/). We downloaded MADS protein sequences belonging to Jatropha curcas from PlantTFDB 4.0. Sequence alignments were performed using ClustalX (1.83). The phylogenetic trees comparing E. agallocha and J. curcas MADS-box proteins were built following a Maximum likelihood approach based on the similarity between full-length amino acid sequences using IQ-TREE (Nguyen et al. 2015).

2.7 Quantitative real-time PCR analysis

We selected a total of 14 DEGs (MADS-box gene family) detected in Ea_ff vs Ea_fL, Ea_ff vs Ea_mf, Ea_fL vs Ea_mL, and Ea_mf vs Ea_mL for quantitative real-time PCR (qRT-PCR) using male leaves, male flower buds (small), male flower buds (big), male flowers, female leaves, female flower buds (small), female flower buds (big), female flowers, and fruits. This was done in order to validate the transcriptome results. The genes were selected at random based on the changes in expression and function identified in this study (Table S1 dataset Zhou et al. 2022). A total of three biological and technical replicates were performed for each organ and gene, respectively (Fig. 1). We performed cDNA synthesis with the PrimeScript® II First Strand cDNA Synthesis Kit (TaKaRa, Japan). The primers were designed using the software Primer Premier 5.0. The amplified PCR product lengths varied between 80 and 300 bp (Table S2 dataset Zhou et al. 2022). qRT–PCR was performed in a CFX96Touch Real-Time PCR Detection system (Bio-Rad, America) with SYBR Premix Ex Taq™ II (TaKaRa, Japan). We used the 2−ΔΔCt method to estimate the relative expression levels of the genes in different samples. The data analysis by SPSS version 13.0 (SPSS Inc., Chicago, IL, USA) was used for one-way analysis of variance (ANOVA) and post hoc multiple comparisons. We used male leaves as controls to calculate the relative expression of DEGs associated with MADS-box. Differences were considered to be significant when P < 0.05. The figures were done with GraphPad Prism 8.

3 Results

3.1 Transcriptome sequencing and de novo assembly of E.agallocha

We isolated total RNA from male leaf (mL), male flower (mf), female leaf (fL), and female flower (ff) to build the transcriptome of E. agallocha. After performing adapters trimming and filtering low-quality bases, we obtained 6.91–11.79 Gb of sequencing data from 12 cDNA libraries (Table S3 dataset Zhou et al. 2022). The estimated error rate of the RNA-seq was ~ 0.02%. All Q30 values were higher than 90.45%, while the GC content in each sample was exceeded 42%. We obtained an assembly of 427,506 transcripts (mean sequence length 584 bp) for the E. agallocha transcriptome (Figure S1a dataset Zhou et al. 2022). We chose the longest transcript in each gene as the candidate unigene, and obtained a set composed by a total of 152,412 unigenes (Figure S1b dataset Zhou et al. 2022). Unigene length ranged from 201 to 16,447 bp (with a mean of 575 bp and N50 = 738 bp). The correlation of gene expression levels between samples indicates a high degree of similarity of expression patterns between samples (Figure S2 dataset Zhou et al. 2022). We identified total of 66,975 simple sequence repeats (SSRs) from the transcriptome of E. agallocha using the MISA software (Figure S3 and Table S4 dataset Zhou et al. 2022). These results suggest genetic high variation in E. agallocha.

3.2 Gene functional annotation

After assembling, we blasted the unigenes against seven public databases, including NR, NT, SwissProt, KO, Pfam, GO, and KOG. This allowed us to identify gene functions and associated pathways (Figure S4 dataset Zhou et al. 2022; Table 1). The highest annotation was found with NR (102,678; 41.40%), followed by Pfam (66,309; 26.73%) and KOG (24,375; 9.82%). The E-value distribution of the top-scoring BLASTX hits against the NR database showed that 25.5% of the mapped sequences exhibited high homology (< 1e−45). In contrast, the E-values for 73.80% of the homologous sequences ranged between 1e−5 and 1e−45 (Figure S3a dataset Zhou et al. 2022). In addition, 41.10% of the mapped sequences has a similarity higher than 80%, while 16.20% showed a similarity lower than 60% (Figure S4b). Furthermore, the NR BLASTX matches showed that the top two species were J. curcas (26.90%) and R. communis (21.60%) (Figure S4d dataset Zhou et al. 2022). The above results highlight the highest homology obtained in our study.

Table 1 Summary of the annotation about the E. agallocha transcriptome

We further annotated 71,565 (28.85%) unigenes using 55 Gene Ontology (GO) terms. This included 25, 20, 10 and GO terms associated with biological processes, cellular components, and molecular functions, respectively (Figure S5 dataset Zhou et al. 2022). The main GO terms found in each of three categories cellular process (GO: 0009987) and metabolic process (GO: 0008152); cell (GO: 0005623) and cell part (GO: 0044464); and binding (GO: 0005488) and catalytic activity (GO: 0003824), respectively. To further characterize the biological functions and interaction between unigenes, these groups were then classified into different metabolic pathways with KEGG (Table S5 dataset Zhou et al. 2022). We assigned a total of 34,874 unigenes (14.06%) to 19 categories divided into five clusters, including cellular processes, environmental information processing, genetic information processing, metabolism and organismal systems (Figure S6 dataset Zhou et al. 2022). The top 5 KEGG pathways were translation (3719 unigenes), carbohydrate metabolism (3475 unigenes), folding and degradation (2749 unigenes), overview (2622 unigenes), and amino acid metabolism (2199 unigenes) (Table S5 dataset Zhou et al. 2022). These observations demonstrate these samples were characterized by active cell development and differentiation.

3.3 Sexuality-specific genes between female and male of E. agallocha

We identified specific DEGs in female and male E. agallocha plants based on the sexuality-specific unigenes present in leaf and flower samples (p value < 0.05 and |log2(fold change)| > 5). The results were presented in Fig. 2. We found a total of 4440 and 12,812 DEGs upregulated in the Ea_ff vs Ea_fL and Ea_mf vs Ea_mL group, respectively; while 4471 and 9655 DEGs were respectively down-regulated in the same groups. We uncovered 7923 DEGs between female and male flowers, of which 3370 were upregulated and 4553 were downregulated in the Ea_ff vs Ea_mf group. A further 9396 DEGs existed between female and male leaves, of which 4846 were upregulated and 4550 were downregulated in the Ea_fL vs Ea_mL group.

Fig. 2
figure 2

Differentially expression gene (DEGs) analysis in E. agallocha

We functionally annotated the E. agallocha female and male transcriptomes by performing a GO analysis on DEGs. Our results (Fig. 3) showed that under the biological process, cellular component and molecular function categories, the most significantly enriched terms were associated with the regulation of nitrogen compound metabolic process (GO:0051171), transcription factor complex (GO:0005667) and nuclear/ DNA-directed RNA polymerase complex (GO:0055029/GO:0000428) in flowers (Ea_ff vs Ea_mf group); metabolic process (GO: 0008152), transcription factor complex (GO:0005667) and catalytic activity (GO: 0003824) in females (Ea_ff vs Ea_fL group); metabolic process (GO: 0008152), membrane (GO:0016020) and catalytic activity (GO: 0003824) in males (Ea_mf vs Ea_mL group); and DNA metabolic process (GO:0006259), sarcoplasm and sarcoplasmic reticulum (GO:0016528 and GO:0016529) and ADP binding (GO:0043531) in leaves (Ea_fL vs Ea_mL group).

Fig. 3
figure 3

GO functional enrichment map of differentially gene expression between female and male of E.agallocha in different comparison groups. a Ea_ff vs Ea_mf group. b Ea_ff vs Ea_fLgroup. c Ea_mf vs Ea_mL group. d Ea_fL vs Ea_mL group

To identify which unigenes were significantly associated with either metabolic or signal transduction pathways, we used the DEGs to query the KEGG database and compared the results to the whole-transcriptome data. This pathway enrichment analysis showed that those unigenes were associated with 116, 115, 124, and 119 pathways in the Ea_ff vs Ea_mf, Ea_ff vs Ea_fL, Ea_mf vs Ea_mL, and Ea_fLvs Ea_mL groups, respectively (Table S6 dataset Zhou et al. 2022). The estimated top-3 significant pathways were “plant-pathogen interaction” (ko04626), “fatty acid elongation” (ko00062) and “sesquiterpenoid and triterpenoid biosynthesis” (ko00909) in flowers (Ea_ff vs Ea_mf group) (Fig. 4a); “starch and sucrose metabolism” (ko00500), “photosynthesis” (ko00195) and “phenylpropanoid biosynthesis” (ko00940) in females (Ea_ff vs Ea_fL group) (Fig. 4b); “starch and sucrose metabolism” (ko00500) , “plant hormone signal transduction” (ko04075) and “phenylpropanoid biosynthesis” (ko00940) in males (Ea_mf vs Ea_mL group) (Fig. 4c); and “starch and sucrose metabolism” (ko00500), “plant hormone signal transduction” (ko04075) and “plant-pathogen interaction” (ko04626) in leaves (Ea_fLvs Ea_mL group) (Fig. 4d).

Fig. 4
figure 4

KEGG Pathway enrichment diagram of differentially gene expression between female and male of E.agallocha in different comparison groups. a Ea_ff vs Ea_mf group. b Ea_ff vs Ea_fL group. c Ea_mf vs Ea_mL group. d Ea_fL vs Ea_mL group

3.4 MADS-box transcription factors in E. agallocha

Using the method described above, we obtained 73 unique MADS-box genes from E. agallocha and divided these genes into several branches (Fig. 5). At the same time, we compared the results of MADS-box protein domains in E. agallocha and J. curcas and performed phylogenetic analysis. EaMADS-box classification was based on a previous study (Tang et al. 2020). Specifically, we identified a total of 9, 7, 4, and 3 genes with Mα, Mβ, Mγ, and Mδ MIKC*-type genes. The remaining 50 MADS-box proteins belonged to the MIKCc subgroup (Fig. 5). Notably, we unveiled 7 MIKCc clades in E. agallocha belonging to the MIKCc subgroup class in J. curcas. These 7 MIKCc clades included B, SVP and SOC1 clades with 4 EaMADS-box sequences; the A clade included 3 EaMADS-box sequences; the E and C/D clades included 2 EaMADS-box sequences; and the B clade contained a single EaMADS-box sequence. We identified 17, 38, 61, and 11 MADS-box related DEGs in the Ea_ff vs Ea_mf, Ea_ff vs Ea_fL, Ea_mf vs Ea_mL, and Ea_fL vs Ea_mL groups, respectively (Fig. 6; Table S1 dataset Zhou et al. 2022). In flowers (Ea_ff vs Ea_mf group), we found a total of 3 DEGs with Mα, 1 with Mγ, 1 with MIKC*, and 11 with the MIKCc subgroup. In females (Ea_ff vs Ea_fL group), we identified 3 genes with Mα, 2 with Mβ, 1 with Mγ, 2 with MIKC*, and 33 with the MIKCc subgroup. In males (Ea_mf vs Ea_mL group), we uncovered 6 genes with Mα, 6 with Mβ, 2 with Mγ, 3 with MIKC*, and 39 with the MIKCc subgroup. Finally, we found 3 genes with Mα, 1 with Mβ,1 with Mγ, 6 with the MIKCc subgroup in leaves (Ea_fL vs Ea_mL group). Consequently, the phylogenetic and DEGs analyses showed that different kinds of female and male E. agallocha organs contained MADS proteins in each putative functional group. However, the functional relevance of these predicted genes needs to be further determined.

Fig. 5
figure 5

Phylogenetic analysis (inner circle) for J. curcas and E.agallocha MADS-box proteins, and heatmap analysis (outer circle) of MADS gene family in the E.agallocha of Ea_ff, Ea_fL, Ea_mf, and Ea_mL

Fig. 6
figure 6

qRT-PCR was used to analyze the expression profiles of MADS-box genes in E. agallocha. a Ea49623.c0.g1. b Ea52637.c2.g1. c Ea58933.c1.g2. d Ea60496.c2.g1. e Ea66029.c1.g4. f Ea69574.c0.g1. g Ea73380.c1.g3. h Ea73650.c1.g5. i Ea73657.c0.g3. j Ea75556.c1.g1. k Ea78824.c3.g2. l Ea78865.c1.g2. m Ea79919.c2.g1. n Ea81288.c3.g1. Note: The actin gene as the internal control for gene expression analysis

3.5 Sexuality-specific expression patterns of MADS-box genes in E. agallocha

Because no MADS-box TFs had been previously identified in E. agallocha, we chose 14 EaMADS-box genes belonging to each of the MADS-box subgroups and evaluated the expression patterns of these genes (Fig. 5). Our results showed differential gene expression in 14 MADS-box genes in E. agallocha (Fig. 6; Figure S7 dataset Zhou et al. 2022). The pattern of RNA-seq and qRT-PCR were almost the same, specifically when comparing leaves to flowers (Figure S7 dataset Zhou et al. 2022). This indicated that 4 EaMADS-box genes were highly expressed in the male flower and flower bud tissues. These include Ea49623.c0. g1, Ea60496.c2.g1, Ea73380.c1.g3 and Ea78865.c1.g2. Importantly, Ea69574.c0.g1, Ea73657.c0.g3, and Ea75556.c1.g1 were also highly expressed in the female leaves, Ea52637.c2.g1, Ea58933.c1.g2, and Ea73650.c1.g5 in fruits, Ea78824.c3.g2 in male flowers, and Ea66029.c1.g4 in male leaves. The floral expression profiles of EaMADS-box genes provide relevant information for investigating flower development in E. agallocha. Taken together, these results suggest the transcriptome data agreed with the observed expression patterns of most E. agallocha genes.

4 Discussion

This study provides the first de novo transcriptome assembly for the flower development of a dioecious mangrove. The uncovered unigenes were functionally annotated using seven public databases. The species distribution of the NR BLASTx matches revealed E. agallocha is closely related to J. curcas species. In fact, both J. curcas and E. agallocha belong to Euphorbiaceae (Satyan et al. 2009; Abdelgadir and Van Staden 2013).

We identified a large number of sex-specific unigenes significantly associated with development and other biological processes. GO analysis revealed a large number of genes associated with various transcription factor complexes were enriched in females, “membrane” was significantly abundant in males, which affecting sex determination of E. agallocha. KEGG pathway analysis showed much of the identified DEGs are involved in starch and sucrose metabolism in the floral and leaf tissues of E. agallocha. We found a higher number of DEGs involved in the ribosome and plant hormone signal transduction pathway affecting male flowers. Plant hormones can regulate the transition from vegetative development to the first inflorescence phase of reproductive development (Xing et al. 2015). In addition, phytohormones play an essential role in sex determination, and the constitutive expression of MADS-box genes leads to precocious flower differentiation in A. thaliana (Tanurdzic 2004; Amasino 2010). Accordingly, the present study screened additional DEGs associated with sex determination between female and male flowers in E. agallocha. This approach identified 73 DEGs belonging to the MADS-box gene family, and demonstrate this family plays is a key regulator of floral development in E. agallocha. The nine harvested samples allowed further identification of the expression profiles during flower and fruit development. A total of 14 candidate genes were chosen for qRT-PCR analysis, which showed they have distinct expression profiles in different organs.

We analyzed the phylogeny of MADS-box protein families by searching for orthologous genes involved in flower development in E. agallocha. This allowed us to validate the identity of the different subfamilies of these genes. We performed the phylogenetic analysis on MADS-box protein sequences from J. curcas. The MADS-box proteins regulate floral organogenesis and are essential for flower development and sex determination (De Bodt et al. 2003; Pan et al. 2018; Bar-Lev et al. 2021). These proteins can be subdivided into two lineages, the M-type (type I) and MIKC (type II). The former can be further grouped into Mα, Mβ, and Mγ subgroups based on the respective sequence similarities between their MADS-box regions (Pan et al. 2018). We annotated 73 transcripts in the E. agallocha transcriptome that aligned to the MADS-box family and/or contained the MADS domain. Our phylogenetic analysis identified several orthologous genes between E. agallocha and J.curcas in almost all MADS-box protein clades (Fig. 5; Table S1 dataset Zhou et al. 2022). Similar to other plants, E. agallocha also contains a higher number of Mα and MIKCc than Mγ or MIKC* genes (Fig. 5; Table S1 dataset Zhou et al. 2022). The distribution of MADS-box subfamily genes in E. agallocha is similar to those observed genes in most plants, indicating this gene family is evolutionarily conserved (Tian et al. 2015; Wei et al. 2015; Ma et al. 2017). Several studies showed the MIKCc gene family plays a central role in flower development and the regulation of flowering time (Yuan et al. 2009; Wang et al. 2015; Zhang et al. 2020b). Interestingly, we found 50 MIKCc in E. agallocha (Fig. 5; Table S1 dataset Zhou et al. 2022). This might indicate that there may be a complex regulatory mechanism for flowering in E. agallocha. Moreover, a reduction in the number of subfamilies, including Mβ, Mγ, and MIKC*, indicates some of these families may gradually disappear during evolution.

In MIKCc group, we found class A, B, C, D, and E genes in E. agallocha. Of these, those that are significantly upregulated in male or female flowers did not show significant differences in expression between flowers and leaves. The A-class genes, AP1/FUL (Ea51397.c4.g1, Ea65400.c0.g1, and Ea71403.c0.g1) were upregulated in the male flower. In addition, the B-class genes AP3/PI (Ea49623.c0.g1 and Ea60496.c2.g1) were also upregulated in the male flower and may influence male organ development. Similar results have been proposed in Quercus suber (Rocheta et al. 2014). As reported in the Norway spruce, B-type MADS-box genes are active in male organ primordia (Sundstrom and Engstrom 2002). C and D-class genes were significantly upregulated in male and female flowers in E. agallocha. STK (Ea56657.c2.g1) was only expressed in floral organs and is likely involved in flower organ identity (Soltis 2002). AGL104 (Ea52637.c2.g2) was upregulated in the female flowers compared to male flowers. In Arabidopsis, AGL104 is involved in pollen maturation and tube growth (Adamczyk and Fernandez 2009). The E-class genes, SEP2/AGL4 (Ea80597.c1.g2 and Ea70652.c0.g1) were significantly up-regulated in female and male flowers. SEP2/AGL4 is required for the specification of floral organs (Wang et al. 2016). However, these genes did not show significant differences in expression between female and male flowers. Accordingly, we hypothesize E-class genes might not be necessary for sex determination. In apples, SUPRESSOR OF CONSTANS 1 (SOC1) regulates the induction of the flower bud (Xing et al. 2015). SOC1 (Ea47429.c0.g1, Ea78911.c0.g3, Ea69574.c0.g1 and Ea82297.c2.g2) was significantly down-regulated in floral organs compared to leaves, but significantly expressed in flower buds. The SVP gene represses floral transition in B. juncea (Li et al. 2019), and is highly expressed in both male and female leaves in E. agallocha. It would be very interesting to evaluate whether this gene may be similarly inhibited in E. agallocha, restricting leaf development. We propose AP1/FUL, AP3/PI, AGL104, and SOC1 should be considered candidate regulators of sex determination in E. agallocha.

5 Conclusion

We report the transcriptome of E. agallocha. The data provides important information for understanding the evolutionary history of mangrove. The high number of transcript sequences can be used to discover novel genes, specifically those associated with flower development in E. agallocha. A total of 7923 DEGs were identified between female and male flowers, 73 of which were MADS-box genes (50 belonged to the MIKCc subgroup). Our results show that AP1/FUL, AP3/PI, AGL104, and SOC1 genes play an important role during the formation of floral organs in E. agallocha.

Availability of data and materials

The datasets generated during the current study are available in the NCBI database: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA760657. Supplementary dataset is also available at: https://doi.org/10.6084/m9.figshare.20362563.v5

References

Download references

Code availability

Not applicable

Funding

This study was funded by the National Natural Science Foundation of China (No. 41776148 and No. 31360173), Basic and Applied Basic Research Fund of Guangdong Province (No. 2019A1515110138), and School-level Talents Project of Lingnan Normal University (No. ZL2003 and No. ZL2032).

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization: Ying Zhang, Yan Zhou; methodology: Lulu Hao, Lexiang Huang; formal analysis and investigation: Xiaoming Tang, DanTing Zhou; writing—original draft preparation: Yan Zhou; writing—review and editing: Yan Zhou, Liyun Wang; funding acquisition: Ying Zhang, Liyun Wang; resources: Ying Zhang; supervision: Liyun Wang. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Li Yun Wang or Ying Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

All authors gave their informed consent to this publication and its content.

Competing interests

The authors declare that they have no competing interests.

Additional information

Handling editor: Véronique Jorge.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhou, Y., Hao, L., Huang, L. et al. De novo transcriptome assembly reveals characteristics of flower sex determination of Excoecaria agallocha. Annals of Forest Science 79, 36 (2022). https://doi.org/10.1186/s13595-022-01156-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s13595-022-01156-6

Keywords