Whole-genome screening for near-diagnostic genetic markers for four western European white oak species identification

Kremer, Antoine; Delcamp, Adline; Lesur, Isabelle; Wagner, Stefanie; Rellstab, Christian; Guichoux, Erwan; Leroy, Thibault

doi:10.1186/s13595-024-01236-9

Research Paper
Open access
Published: 21 May 2024

Annals of Forest Science volume 81, Article number: 21 (2024)

Abstract

Key message

Mining genome-wide DNA sequences enabled the discovery of near-diagnostic markers for species assignment in four European white oaks (Quercus petraea (Matt.) Liebl., Quercus pubescens Willd., Quercus pyrenaica Willd., and Quercus robur L.) despite their low interspecific differentiation. Near-diagnostic markers are almost fully fixed in one species and absent in the three others. As a result, only a handful of markers are needed for species identification, making this genetic assay a very promising operational taxonomic assignment procedure in research and forestry.

Context

Identifying species in the European white oak complex has been a long-standing concern in taxonomy, evolution, forest research, and management. Quercus petraea (Matt.) Liebl., Q. robur L., Q. pubescens Willd., and Q. pyrenaica Willd. are part of this species complex in western temperate Europe and hybridize in mixed stands, challenging species identification.

Aims

Our aim was to identify near-diagnostic single-nucleotide polymorphisms (SNPs) for each of the four species that are suitable for routine use and rapid diagnosis in research and applied forestry.

Methods

We first scanned existing whole-genome and target-capture data sets in a reduced number of samples (training set) to identify candidate diagnostic SNPs, i.e., genomic positions characterized by a reference allele in one species and by the alternative allele in all other species. Allele frequencies of the candidate SNPs were then explored in a larger, range-wide sample of populations in each species (validation step).

Results

We found a subset of 38 SNPs (10 for Q. petraea, 7 for Q. pubescens, 9 for Q. pyrenaica, and 12 for Q. robur) that showed near-diagnostic features across their species distribution ranges with Q. pyrenaica and Q. pubescens exhibiting the highest (0.876) and lowest (0.747) diagnosticity, respectively.

Conclusions

We provide a new, efficient, and reliable molecular tool for the identification of the species Q. petraea, Q. robur, Q. pubescens, and Q. pyrenaica, which can be used as a routine tool in forest research and management. This study highlights the resolution offered by whole-genome sequencing data to design near-diagnostic marker sets for taxonomic assignment, even for species complexes with relatively low differentiation.

1 Introduction

Identifying species in the European white oak complex has been a long-standing concern in evolutionary biology as well as in forest research and management. According to the latest taxonomic classification, there are about 15 oak species in Europe, which form the subsection of the Roburoids within the Quercus section (white oak section) (Denk et al. 2017; Hipp et al. 2020). Within the continent, however, species richness varies, with higher species diversity in the Mediterranean region and in eastern Europe compared to other areas (Camus 1938; Le Hardy de Beaulieu and Lamant, 2006). In western temperate Europe, four white oak species occur north of the Pyrenees and Alps (Q. petraea (Matt.) Liebl., Q. robur L., Q. pubescens Willd., and Q. pyrenaica Willd.). Co-occurrence of all four species in the same forest is rare. The few reported cases indicate extensive gene flow and admixture between all four species, leading to considerable morphological variations and uncertainties when it comes to taxonomic classification based on morphological characteristics (Lepais et al. 2013; Leroy et al. 2017; Viscosi et al. 2009). The co-occurrence of the three species Q. petraea, Q. robur, and Q. pubescens is more common, especially in the southern parts of the temperate range, for which hybridization and morphological variation are well documented (Dupouey and Badeau 1993; Grandjean and Sigaud 1987; Macejovsky et al. 2020; Rellstab et al. 2016). Finally, forests with co-occurrences of two species and interspecific admixture have also raised questions about species classification. This is especially true for co-occurrences of Q. petraea-Q. robur (Bacilieri et al. 1995; Jurksiene and Baliuckas 2014; Kelleher et al. 2005; Kremer et al. 2002; Yucedag and Gailing 2013) but also for Q. petraea-Q. pyrenaica (Lopez de Heredia et al. 2009), and Q. petraea-Q. pubescens (Bruschi et al. 2000; Reutimann et al. 2020, 2023).
This brief overview of species admixture and problems of taxonomic classification based on morphological characteristics highlights the pressing need for a time- and cost-efficient molecular tool for reliable species assignment within European white oaks for use in forest science and management.

In response to this challenge, molecular tools have been continuously improved, and a number of species marker kits have been developed and applied during the last decade (Guichoux et al. 2011; Neophytou 2014; Reutimann et al. 2020; Degen et al. 2021; Schroeder and Kersten 2023). These methods have set new milestones for the delimitation of oak species, but their validity has been constrained by some biological and technical limitations. From a biological point of view, the markers used in the kits are still shared between the species, although interspecific differentiation of the selected markers was higher than in earlier studies. From a technical point of view, the genomic resources explored for selecting the marker candidates were very limited until recently. Using previously published genome-wide data and genome scans targeting genomic positions that maximize differentiation between populations of Q. robur, Q. petraea, Q. pubescens, and Q. pyrenaica, we overcame these limitations and designed a new single-nucleotide (SNP) marker set for range-wide species identification in European white oaks.

Earlier genome scans for species differences showed that interspecific differentiation (F_ST) followed an L-shaped distribution, suggesting that there might be highly differentiated markers at an extremely low frequency within the genome (Reutimann et al. 2020; Scotti-Saintagne et al. 2004). Recent analysis of nucleotide diversity in genes underlying species barriers between European white oaks confirmed these expectations (Leroy et al. 2020b). Our approach built on these results by launching a systematic search for so-called species “diagnostic” SNPs within existing genome-wide resources. Ideally, a diagnostic SNP carries a diagnostic allele that is fully fixed in a given species, with the alternate allele fixed in the other species. Earlier surveys (Scotti-Saintagne et al. 2004; Reutimann et al. 2020; Lesur et al. 2018) in European white oaks indicated that such ideal cases rarely exist. However, some markers exhibit species frequency profiles close to the ideal case (so-called near-diagnostic SNPs, for example, an allele with a frequency larger than 0.9 in the target species, and alternate allele frequency larger than 0.9 in all other species) (Schroeder and Kersten 2023). Only a few such markers would then be enough to assign trees to the correct species using appropriate analytical approaches. For example, Reutimann et al. (2020) showed that five SNPs were enough for correctly classifying 95% of Q. robur reference trees, although the single SNPs were far from being diagnostic.

In this study, we explored pool-sequenced whole-genome libraries of natural populations of four white oak species (Q. petraea, Q. pubescens, Q. pyrenaica, and Q. robur) (Leroy et al. 2020b) and genome-wide capture-based sequences of Q. petraea and Q. robur (Lesur et al. 2018) to identify near-diagnostic SNPs for each of the four species. We describe the approaches and methods used to discover near-diagnostic SNPs and explore the stability of diagnosticity over the distribution range of the four species.

Our main goal was to identify and validate a new set of near-diagnostic SNPs that can be used in the development of an efficient and cost-effective molecular tool for forest research and management. To this end, we focused on the variation of near-diagnostic SNPs across species, between populations within each species and between SNPs. We finally addressed the evolutionary drivers that may have contributed to the maintenance and/or modification of diagnosticity within the genome and throughout the distribution range of the four species.

2 Material and methods

2.1 Discovery of near-diagnostic markers

The discovery of near-diagnostic SNPs was conducted by scanning oak genomic data that have been generated in earlier studies assessing genomic diversity and differentiation in the four sympatric white oak species (Quercus petraea, Q. pubescens, Q. pyrenaica, Q. robur; Leroy et al. 2017 and 2020b, Lesur et al. 2018).

2.1.1 Discovery of near-diagnostic SNPs in whole genome pool-sequenced (pool-seq) resources

Pool sequencing

In Leroy et al. (2020b), we used leaf and bud samples from up to 20 adult trees of the four species coming from four different forests located at most 200 km from each other in south-west France (Table 4 in Appendix). The sampled stands were of mixed oak composition (generally two or three species) and of natural origin. DNA extracts were pooled in equimolar amounts to obtain a single pool for each species. Libraries were then sequenced on 9 to 10 lanes for each of the four species (1 pool per species) on an Illumina HiSeq 2000 sequencing platform (see Leroy et al. 2020b for details). In this study, to reduce the computation load, we only used two lanes per pool from SRA, namely ERR2215923 and ERR2215924, ERR2215937 and ERR2215938, ERR2215909 and ERR2215910, and ERR2215916 and ERR2215917 for Q. pubescens, Q. petraea, Q. pyrenaica, and Q. robur, respectively. Raw reads were then trimmed using Trimmomatic (v. 0.33, Bolger et al. 2014). Bases at the start and at the end of each read were removed if their quality fell below 3 (LEADING:3, TRAILING:3). Then, sliding-window trimming was applied using SLIDINGWINDOW:4:15. Starting at the 5′ end, the scan cuts the read once the average quality within the 4-bp window falls below 15. Because multiple bases are considered, a single low-quality base below the threshold of 15 will not cause the removal of high-quality data later in the read. Following these steps, only reads of at least 50 bp were kept for the analysis (MINLEN:50).
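The trimming logic described above can be illustrated with a short sketch. It is a simplified re-implementation for illustration only, not Trimmomatic's actual code: the function name `trim_read` and its return convention are hypothetical, and the real tool may differ in details such as how the cut position is refined within the failing window.

```python
def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, min_len=50):
    """Return the (start, end) slice of retained bases, or None if the read is dropped.

    quals: list of per-base Phred quality scores (5' to 3').
    Simplified illustration of LEADING, TRAILING, SLIDINGWINDOW and MINLEN.
    """
    start, end = 0, len(quals)
    # LEADING / TRAILING: strip bases below the threshold from both ends
    while start < end and quals[start] < leading:
        start += 1
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan from the 5' end and cut at the first window
    # whose average quality drops below min_avg
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # MINLEN: drop reads that are shorter than min_len after trimming
    if end - start < min_len:
        return None
    return start, end
```

Because the window average is used, a read such as thirty bases of quality 30, one base of quality 10, and thirty more bases of quality 30 is retained in full by this sketch.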

Mapping and SNP calling

Data from two sequencing lanes per species (from up to 10 lanes per species in Leroy et al. 2020b) were then mapped against the oak haplome assembly (“PM1N,” Plomion et al. 2018) using BWA-MEM (Li 2013). PCR duplicates were removed using Picard v. 1.140 (http://broadinstitute.github.io/picard/). Samtools v.1.1 (Li 2011) and PoPoolation2 v. 1.201 (Kofler et al. 2011) were then used to call biallelic SNPs with at least 10 reads of alternate alleles and a depth between 50 and 2000 × at each position. To ensure a reasonably low rate of false positives due to Illumina sequencing errors, all SNPs with a MAF lower than 0.05 were discarded. A total of 24,345,915 SNPs were identified and then screened for their diagnostic value (see next paragraph).
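As a minimal sketch of the filtering thresholds listed above, the following function checks one biallelic position against the depth, alternate-read, and MAF criteria. It is not the Samtools or PoPoolation2 implementation; the function name is hypothetical, and the sketch assumes a single pair of read counts per position, whereas the text does not state whether the thresholds were applied per pool or across pools.

```python
def passes_snp_filters(ref_count, alt_count,
                       min_alt=10, min_depth=50, max_depth=2000, min_maf=0.05):
    """Apply the depth, alternate-read and MAF thresholds to one position.

    ref_count, alt_count: read counts of the reference and alternate allele
    (assumption: one set of counts per position).
    """
    depth = ref_count + alt_count
    # Depth must lie between 50x and 2000x
    if not (min_depth <= depth <= max_depth):
        return False
    # At least 10 reads must support the alternate allele
    if alt_count < min_alt:
        return False
    # Minor allele frequency must reach 0.05
    maf = min(ref_count, alt_count) / depth
    return maf >= min_maf
```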

Genome scan for near-diagnostic SNPs

Allele frequencies were computed from the SNP-frequency-diff.pl script of PoPoolation2. SNPs exhibiting a high difference in allele frequency (Δp > 0.9 between the focal species and all other species) were then selected. All candidate near-diagnostic SNPs with a coverage lower than 80 in the four populations were discarded, in order to ensure that the high Δp was not associated with inaccurate allele frequency estimation in low coverage regions. Despite the relatively limited linkage disequilibrium in oaks (Coq-Etchegaray et al. 2023) even in species barrier regions (Leroy et al. 2020b), the relatively high nucleotide diversity in oaks (Plomion et al. 2018; Saleh et al. 2022) allows several neighboring SNPs to be identified by this screening. We therefore selected the best SNPs per identified region considering the constraints associated with the SNP design (see below).
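The Δp screen can be sketched as follows. This is an illustrative reading of the criteria, not the script used in the study: the function name and the dictionary layout are hypothetical, and the sketch assumes that every one of the four pools must reach a coverage of 80 and that Δp must exceed 0.9 against each of the other species individually.

```python
def focal_species(freqs, coverage, min_delta=0.9, min_cov=80):
    """Return the species for which the SNP is a near-diagnostic candidate, or None.

    freqs:    dict species -> frequency of the same allele in that pool
    coverage: dict species -> read depth at the position in that pool
    """
    # Assumption: all pools must reach the minimum coverage
    if any(coverage[s] < min_cov for s in freqs):
        return None
    for focal, p in freqs.items():
        # Delta-p must exceed the threshold against every other species
        if all(abs(p - q) > min_delta
               for other, q in freqs.items() if other != focal):
            return focal
    return None
```

For instance, an allele at frequency 0.97 in Q. petraea and at most 0.03 in the three other pools, with all coverages at 100, would be flagged for Q. petraea under these assumptions.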

2.1.2 Discovery of near-diagnostic SNPs in sequence-captured (seq-cap) genomic resources

In addition to the pool-seq resources, we mined a separate genome-wide resource that came from a sequence capture experiment of Q. robur and Q. petraea aiming at calling SNPs for inferring genomic relatedness among trees (Lesur et al. 2018). Here, the discovery population consisted of a far larger panel (245 adult trees in total) equally distributed between Q. petraea and Q. robur growing in the Petite Charnie forest located in the western part of France (Table 5 in Appendix). We used the capture data to complement the pool-seq data and to ensure a higher diagnosticity of the markers for this specific pair, given the larger panel of Q. robur and Q. petraea samples available in the capture data. The capture-based assay consisted of sequencing 2.9 Mb (15,623 target regions) on an Ion Proton System (Thermo Fisher Scientific, Waltham, MA, USA) covering both genic and intergenic regions and resulted in the calling of more than 190,000 SNPs with a coverage of more than 10 × (Lesur et al. 2018). The study provided allele frequencies of each SNP, and we screened the total set of SNPs for their differentiation between Q. petraea and Q. robur by ranking their F_ST values to complete the discovery panel. Although limited to two species (Q. petraea and Q. robur), this data set also corresponded to a genome-wide exploration of species differentiation implemented on a larger population sample (Lesur et al. 2018). It was therefore selected for this study, pending its relevance for selecting near-diagnostic markers for the remaining two species (Q. pubescens and Q. pyrenaica), which is investigated in this study.
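Ranking SNPs by their differentiation between two species can be sketched as below. The estimator used by Lesur et al. (2018) is not specified here, so the sketch assumes the simple variance-based F_ST for a biallelic locus with two equally weighted species; the function names and SNP labels are hypothetical.

```python
def fst_two_species(p1, p2):
    """Variance-based F_ST for one biallelic SNP and two equally weighted species."""
    p_bar = (p1 + p2) / 2
    if p_bar in (0.0, 1.0):
        # Locus fixed for the same allele in both species: no differentiation
        return 0.0
    # Var(p) / (p_bar * (1 - p_bar)), with Var(p) = (p1 - p2)^2 / 4
    return (p1 - p2) ** 2 / (4 * p_bar * (1 - p_bar))


def rank_by_fst(freqs):
    """freqs: dict SNP name -> (p_petraea, p_robur). Returns names, highest F_ST first."""
    return sorted(freqs, key=lambda snp: fst_two_species(*freqs[snp]), reverse=True)
```

The top of such a ranking would then provide the candidates added to the discovery panel.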

2.2 Training and validation of near-diagnostic SNPs

2.2.1 Training populations

The candidate diagnostic SNPs of the discovery panel were first tested on a limited number of oak individuals (training set), randomly sampled in 7 to 14 populations per species (in total 19 to 48 sampled trees per species, Table 6 in Appendix). The training experiment was conducted over two sessions that took place during two periods (training 1 and training 2, Table 6 in Appendix) with different samples (but from the same geographic range). The two sessions differed only in the samples included, the choice of which was constrained by the availability of the material. The objective of the training step was to check whether the candidate SNPs exhibited near-diagnostic frequency profiles in natural populations originating mainly from the area of the discovery panel. The training step also included quality controls and repeatability assessments of the genotyping assay (see results paragraph 3.3.1).

2.2.2 Validation populations

Given that the discovery and training of diagnostic SNPs were implemented on a limited number of trees originating mostly from the western part of the distribution of the four species, we included a round of validation by increasing the sample sizes of the training populations and enlarging the collection of populations, studying the SNP diagnosticity across a larger part of the species’ natural distribution (Fig. 1). Additionally, the validation step also aimed at reducing the number of SNPs, while still maintaining overall multilocus diagnosticity, in order to produce a low-cost and easy-to-use screening tool for operational forestry. In total, 24 populations of Q. petraea, 10 of Q. pubescens, 6 of Q. pyrenaica, and 19 of Q. robur were part of the validation set, representing in total 977 trees (Fig. 1 and Table 6 in Appendix). All samples were collected in natural populations; their taxonomic status was assessed by the local collectors based on leaf morphology and was not checked after receipt. A few misclassifications were detected after the molecular assay (Arbalan, Killarney, Dalkeith). Sampled populations were in most cases of mixed oak composition. Some of the populations were used in earlier large-scale genetic surveys (Gerber et al. 2014; Kremer et al. 2002); others were purposely collected for this study.

2.2.3 Genotyping assay

Medium-throughput SNP genotyping assays were implemented on single tree DNA extracts using the MassARRAY® technology (Agena Bioscience, San Diego, CA, USA). The assay design, using the MassARRAY Assay Designer version 4.0.0.2, was performed on candidate SNPs from pool-seq and seq-cap resources. Nine multiplexes, for a total of 359 SNPs (eight 40-plex and one 39-plex), were designed for identifying the best markers. Genotyping was performed using iPLEX Gold chemistry following Ellis and Ong (2017) on a MassARRAY System CPM384 (Agena Bioscience) at the PGTB platform (https://doi.org/10.15454/1.5572396583599417E12). Data analysis was carried out using MassARRAY Typer Analyzer 4.0.4 (Agena Bioscience). After genotyping, we excluded all markers for which there was evidence that the candidate SNP identified during the discovery step was not recovered, for example, when the SNP exhibited fixation across the four species at the same allele. We also discarded loci with weak (magnitude < 5) or ambiguous signal (i.e., displaying more clusters than expected or unclear cluster delineation) and loci with more than 20% missing data. Following this selection process, 61 SNPs (in two multiplexes) were selected on the basis of their near-diagnostic value and their compatibility in one multiplex kit for subsequent genotyping on all the samples.
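Two of the exclusion criteria above (more than 20% missing data, and fixation of the same allele in all samples) lend themselves to a short sketch. The function name and the genotype encoding are hypothetical, and the signal-magnitude and cluster-quality checks are not modeled because they depend on the output of the genotyping software.

```python
def keep_locus(genotypes, max_missing=0.20):
    """Decide whether a locus passes two of the post-genotyping checks.

    genotypes: one call per tree, e.g. "AA", "AG", "GG", or None when missing
    (hypothetical encoding chosen for this sketch).
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    # Discard loci with more than 20% missing data
    if n == 0 or (n - len(called)) / n > max_missing:
        return False
    # Discard loci fixed for the same allele in all genotyped trees
    alleles = {allele for g in called for allele in g}
    return len(alleles) > 1
```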

2.2.4 Diagnosticity of candidate SNPs

Standard genetic statistics (allele frequencies, diversity statistics, differentiation, and fixation indices) were estimated using GENEPOP (Raymond and Rousset 1995) and ADEGENET software (Jombart 2008). Genotypic arrays of sampled trees in training and validation populations are available at https://doi.org/10.57745/0JYLZU (Kremer et al. 2024a).

We defined a metric of species diagnostic accuracy, which we coined “diagnosticity” index (D) to screen SNP alleles for their ability to be close to full diagnosticity.

Full diagnosticity requires two properties: fixation of the diagnostic allele in the target species and fixation of the alternate allele in the remaining species. These two properties are included in the metric D. Considering a set of n species, diagnosticity of an allele for species x (D_x) regarding the remaining (n-1) species could be expressed as follows:

$${D}_{x}={p}_{x}-\frac{1}{n-1}\sum\limits_{j=1}^{n-1}{p}_{j}$$

where p_x is the frequency of the candidate diagnostic allele in the target species x and p_j the frequency of the same allele in the alternate species j. D_x amounts to the difference of allelic frequencies between species x and the remaining (n-1) species. D_x is equivalent to the mean Gregorius genetic distance between species x and the three other species for a diallelic locus (Gregorius 1984).

D_x has two components, which account for the two properties of diagnosticity.

p_x: The higher the p_x, the closer the near-diagnostic allele to fixation in the target species.
$\frac{1}{n-1}\sum_{j=1}^{n-1}{p}_{j}$: The lower the mean value of p_j, the closer the alternate allele to fixation in the remaining (n-1) species.

D_x is more appropriate for practical diagnostic assessments than the traditional differentiation metric F_ST when more than two species are involved (see Gregorius and Roberts (1986) for a comparison of D and F_ST). To illustrate the discrepancy between D and F_ST regarding diagnosticity, consider the case of four species with frequency profiles (p₁ = 1, p₂ = 1, p₃ = 0, p₄ = 0). Addressing diagnosticity for species 1, F_ST would yield 1, while D₁ would yield 0.67. D₁ accounts for the lack of frequency difference between species 1 and 2, while F_ST does not.
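The four-species example can be checked numerically with a minimal sketch. The function `diagnosticity` follows the definition of D_x given above; `fst_biallelic` assumes the variance-based definition of F_ST with equally weighted species, which is one common choice and may not be the estimator used elsewhere in this study. Both function names are hypothetical.

```python
def diagnosticity(p_target, p_others):
    """D_x: allele frequency in the target species minus its mean in the other species."""
    return p_target - sum(p_others) / len(p_others)


def fst_biallelic(freqs):
    """Variance-based F_ST among equally weighted species for one biallelic locus.

    Undefined (division by zero) when the allele is fixed or absent in all species.
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    variance = sum((p - p_bar) ** 2 for p in freqs) / n
    return variance / (p_bar * (1 - p_bar))


profile = [1.0, 1.0, 0.0, 0.0]                 # p1, p2, p3, p4
d1 = diagnosticity(profile[0], profile[1:])    # 1 - (1 + 0 + 0) / 3
fst = fst_biallelic(profile)                   # 0.25 / (0.5 * 0.5)
print(round(d1, 2), fst)                       # 0.67 1.0
```

With a fully diagnostic profile such as (1, 0, 0, 0), D₁ reaches 1.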

By extension of the definition of a diagnostic allele, a near-diagnostic SNP is a SNP bearing near-diagnostic alleles, and diagnosticity of a species (or a population of that species) refers to the mean value of all near-diagnostic SNPs assessed for that species or population. Diagnosticity of candidate SNPs is estimated in the training and validation populations.

2.2.5 Multilocus species clustering

To validate the selected near-diagnostic SNPs for a multilocus species assignment procedure, we implemented an empirical clustering approach using principal component analysis, free of any underlying evolutionary assumptions (ADEGENET, Jombart 2008). This method makes it possible to check the ability of the near-diagnostic SNPs to visually discriminate the four species.
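The principle of the PCA-based check can be sketched in a few lines. The analysis in this study was run with ADEGENET in R; the pure-Python power iteration below is a swapped-in, minimal substitute that extracts only the first principal component from a matrix of allele dosages. The function name, the dosage encoding (0, 1, or 2 copies of the near-diagnostic allele), and the starting vector are assumptions made for illustration.

```python
def first_principal_component(rows, n_iter=100):
    """Return (loadings, scores) of the first principal component.

    rows: one list per tree with the dosage (0, 1, 2) of each near-diagnostic allele.
    Uses power iteration on X^T X, where X is the column-centered dosage matrix.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Non-uniform starting vector; it must not be orthogonal to the leading axis
    v = [float(j + 1) for j in range(m)]
    norm = sum(c * c for c in v) ** 0.5
    v = [c / norm for c in v]
    for _ in range(n_iter):
        scores = [sum(x[j] * v[j] for j in range(m)) for x in X]            # X v
        w = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    scores = [sum(x[j] * v[j] for j in range(m)) for x in X]
    return v, scores
```

In a toy data set with two trees homozygous for the markers of one species, two trees homozygous for the markers of another, and one tree heterozygous at all loci, the two species fall at opposite ends of the first axis and the heterozygous tree lies midway, which mirrors the intermediate position expected for first-generation hybrids.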

3 Results

3.1 Discovery of near-diagnostic SNPs

Altogether, we recovered 61 candidate near-diagnostic alleles, 49 originating from the pool-seq study, and 12 from the seq-cap analysis (Table 7 in Appendix). The candidate SNPs are distributed over all chromosomes (except chromosome 4), and their number ranges from 1 (chromosomes 3, 9, and 12) to 17 (chromosome 2, Fig. 2). In a few cases, near-diagnostic markers of a given species clustered in pairs in a few spots (mainly for Q. robur on chromosomes 2, 5, and 6, Fig. 2). In such cases, one marker of the pair was discarded during the validation step. Near-diagnostic markers are distributed over six chromosomes for Q. petraea, Q. pubescens, and Q. pyrenaica and over eight chromosomes for Q. robur. As indicated by their location on the chromosomes, the minimum physical distance between near-diagnostic SNPs located on the same chromosome was 17 kb (Table 7 in Appendix). All except two SNPs are located on scaffolds that are anchored on the pseudo-chromosome assembly of the oak genome as shown in Fig. 2.

3.2 Diagnosticity of candidate SNPs in the training set

The 61 candidate near-diagnostic SNPs exhibited allele frequency profiles close to the requisite properties of a diagnostic SNP but did not entirely fulfill the criteria of full diagnosticity (Fig. 2, Fig. 6 in Appendix). D values indeed varied between 0.283 and 0.963. Most of the near-diagnostic SNPs (92%, 56/61) exhibited D scores greater than 0.50 (mean value 0.758). Among the 61 SNPs, 16 are candidate near-diagnostic SNPs of Q. petraea, 11 of Q. pubescens, 12 of Q. pyrenaica, and 22 of Q. robur.

Diagnosticity scores were higher in the pool-seq uncovered set (D = 0.771) than in the seq-cap uncovered set (D = 0.704).

Concerning the near-diagnostic SNPs identified with the pool-seq data, diagnosticity was highest for Q. pyrenaica (0.897) and Q. robur (0.780) and lower in Q. petraea (0.736) and Q. pubescens (0.657). Deviations from full diagnosticity in the two latter species are associated with different patterns (Fig. 6 in Appendix):

• Lower diagnosticity in Q. petraea was mostly related to the sharing of the near-diagnostic allele with the other species, especially with Q. pubescens.
• Lower diagnosticity for Q. pubescens was mainly due to three SNPs (Sc0000170_630013, Sc0000192_329301, and Sc0000482_334917) that showed substantial deviation from fixation within Q. pubescens (frequency being respectively 0.468, 0.587, 0.283), while the alternate alleles were fixed in the three other species.

Concerning the seq-cap uncovered SNPs, we selected 12 SNPs that exhibited the highest species differentiation in the Petite Charnie population. As expected, all 12 SNPs showed strong frequency differences between Q. petraea and Q. robur in our training panel. Eight out of the 12 SNPs exhibited allele frequency differences among the four species consistent with diagnosticity requirements for four species, with the near-diagnostic marker being almost fixed in the reference diagnostic species and present at very low frequencies in all the three remaining species (Fig. 6 in Appendix). The four remaining candidate SNPs exhibited near-diagnostic alleles being almost fixed, not only in one but in two species:

• Sc0000040_1694351 in Q. petraea and Q. pubescens
• Sc0000481_366275 in Q. robur and Q. pyrenaica
• Sc0000546_456229 in Q. robur and Q. pyrenaica
• Sc0000598_295142 in Q. robur and Q. pyrenaica

3.3 Validation of the near-diagnostic SNPs

3.3.1 Screening of near-diagnostic SNPs

The validation step aimed at verifying the diagnosticity of the candidate SNPs on a larger geographic scale while at the same time optimizing the assay by selecting the best SNPs according to various genetic and technical criteria. We thus attempted to optimize the MassARRAY® genotyping assays by reducing the number of near-diagnostic SNPs and combining them in one final assay, without compromising species assignment or reducing diagnosticity. Indeed, given the frequency profiles of near-diagnostic alleles we observed in the training set (Fig. 6 in Appendix), the required number of near-diagnostic SNPs for species assignment can be limited to a handful of markers (Reutimann et al. 2020). We aimed at selecting about 10 near-diagnostic SNPs per species for the final design of the operational assay. The following criteria were applied (Table 7 in Appendix):

• Repeatability and clarity of the cluster delimitation on the scatter plots
• Diagnosticity of SNPs
• A nearly equal number of near-diagnostic SNPs per species

Combining the remaining SNPs within one or two multiplex sets resulted in amplification incompatibilities among SNPs, which led us to discard additional SNPs. Finally, a total of 10 near-diagnostic SNPs were selected for Q. petraea, 7 for Q. pubescens, 9 for Q. pyrenaica, and 12 for Q. robur (Table 8 in Appendix).

3.3.2 Allele frequency profiles of near-diagnostic SNPs in the validation populations

Overall, the average diagnosticity of the 38 near-diagnostic SNPs was slightly higher in the validation than in the training populations, with the exception of Q. pyrenaica (Fig. 3, Fig. 6 in Appendix): 0.784 (validation) vs 0.715 (training) in Q. petraea, 0.747 vs 0.690 in Q. pubescens, 0.876 vs 0.897 in Q. pyrenaica, and 0.841 vs 0.758 in Q. robur. The lower diagnosticity of Q. pyrenaica in the validation set (vs the training set) was due to SNP Sc0000307_852597, which exhibited contrasting values between the training (0.753) and validation set (0.546) (Table 8 in Appendix).

However, the validation populations provided the opportunity to explore the stability of the allele frequency profiles across geographic regions and thus addressed the maintenance of diagnosticity of individual SNPs across the distribution of the four species. Most near-diagnostic SNPs exhibited larger genetic differentiation between populations within a given species than usually found (Scotti-Saintagne et al. 2004) in oak species (Tables 1, 2, 3). Mean intraspecific F_ST values of near-diagnostic SNPs amounted to 0.104, 0.192, 0.042, and 0.104 for Q. petraea, Q. pubescens, Q. pyrenaica, and Q. robur, respectively. Furthermore, F_ST values within a species exhibited large variation among SNPs. For example, F_ST values of near-diagnostic SNPs of Q. petraea between Q. petraea populations varied between 0.012 and 0.252. Quercus pyrenaica is an exception to these general rules, as the mean F_ST (0.042) is much lower than for the three other species and the range of variation reduced (− 0.022 to 0.142, data not shown).

Table 1 Frequencies and differentiation of near-diagnostic alleles of Q. petraea in Q. petraea populations

Table 2 Frequencies and differentiation of near-diagnostic alleles of Q. pubescens in Q. pubescens populations

Table 3 Frequencies and differentiation of near-diagnostic alleles of Q. robur in Q. robur populations

3.3.3 Allele frequency profiles of near-diagnostic SNPs in Q. petraea populations

We examined the geographic distribution of near-diagnostic alleles between populations within a given species. To illustrate the results, we selected populations that are representative of the variation observed among all populations. We first selected a few widely distributed populations that exhibited allele frequencies at all SNPs close to the expected diagnosticity (“ED populations”: Tronçais, Lappwald, and Bézange), and added all the populations that deviated from the ED population frequency profiles, which we called populations with diverging diagnosticity (“DD populations”). The DD populations included three extreme southern populations (Pomieri and Aspromonte in Italy, Montejo in Spain) and one population from the northern distribution edge (Killarney). All the remaining Q. petraea populations exhibited frequency profiles similar to the selected ED populations and are not shown in Table 1 and in Fig. 4. While the ED populations exhibited almost full fixation in all near-diagnostic SNPs, the DD populations showed substantial polymorphism (i.e., lower diagnosticity) at a few SNPs in Pomieri and Aspromonte (Sc0000043_1651618, Sc0000135_261350, Sc0000274_909817), and moderate polymorphism distributed among more SNPs in Killarney and Montejo.

Additionally, we examined the occurrences of near-diagnostic alleles of the other three species in Q. petraea populations (Fig. 4). The DD Q. petraea populations exhibited elevated frequencies of near-diagnostic alleles of Q. pubescens (Pomieri and Aspromonte) or Q. robur (Killarney and Montejo). Thus, they deviated also in respect to the expected frequencies of near-diagnostic alleles of the three other species, contrary to the ED populations (Fig. 4).

3.3.4 Allele frequency profiles of near-diagnostic SNPs in Q. pubescens, Q. robur, and Q. pyrenaica populations

To illustrate the intraspecific differentiation of near-diagnostic SNPs in the other three species, we followed the same procedure as for Q. petraea. We selected for each species two sets of populations: a subset of populations exemplifying the pattern close to full fixation of near-diagnostic alleles at all SNPs (ED populations) and the set of populations that exhibited deviations from this trend (DD populations).

In the case of Q. pubescens, the DD populations (Switzerland and Ventoux) were located at the central northern edge of distribution. These deviations were not evenly distributed across the seven near-diagnostic SNPs of Q. pubescens but restricted to the same loci in the two populations (Table 2). The two populations Switzerland and Ventoux exhibited also higher frequencies of Q. petraea near-diagnostic alleles, in comparison to the two ED populations (Fig. 7 in Appendix).

In the case of Q. robur, there were also two DD populations located at the southwestern (Pedro) and northwestern margin of the distribution (Roudsea) (Table 3). These two populations also comprised larger frequencies of near-diagnostic alleles of other white oak species (Q. pubescens and Q. pyrenaica in the case of Pedro; Q. petraea in the case of Roudsea) (Fig. 8 in Appendix). Finally, in Q. pyrenaica, all populations behaved as ED populations (data not shown), i.e., all Q. pyrenaica populations exhibited frequency profiles similar to those shown for Q. pyrenaica in Fig. 3 and Table 8 in Appendix.

3.4 Multilocus structure of near-diagnostic SNPs

We used a principal component analysis (PCA) in the validation populations to assess and illustrate species differentiation (Fig. 5). We added 13 samples of known first-generation hybrid origin to the species samples. Ten samples resulted from controlled interspecific crosses, and three came from parentage analysis conducted in a mixed Q. petraea-Q. robur stand (Truffaut et al. 2017). A combination of the first three components allowed the four species to be visually differentiated. While principal component 1 differentiated mainly Q. petraea and Q. robur (Fig. 5a), component 3 distinguished Q. pyrenaica from the three other species (Fig. 5b), and the biplot of components 2 and 3 provided the best visual separation between Q. pubescens and Q. petraea (Fig. 5c).

These multilocus representations showed that a small number of samples are located at intermediate positions, especially between Q. petraea and Q. robur (Fig. 5a) and between Q. petraea and Q. pubescens (Fig. 5c). These regions of the PCA are also occupied by known interspecific hybrids, suggesting that these samples, although identified as pure species in the field, represent either hybrids or introgressed forms. These intermediate positions are also preferentially occupied by trees belonging to diverging populations, as shown by the targeted PCA on the two pairs of species sharing intermediate samples: Q. petraea and Q. pubescens (Fig. 9 in Appendix) and Q. petraea and Q. robur (Fig. 10 in Appendix).
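The kind of multilocus PCA described in this section can be sketched on simulated data. The example below is only an illustration under stated assumptions: the allele frequencies (0.90 in the target species, 0.05 elsewhere), the number of SNPs per species, and the sample sizes are invented, the helper names `sample_pure` and `sample_f1` are hypothetical, and the PCA is computed with plain NumPy rather than with the software used for Fig. 5.

```python
import numpy as np

rng = np.random.default_rng(42)
species = ["petraea", "pubescens", "pyrenaica", "robur"]
n_per_species, snps_per_species = 30, 10
n_snps = snps_per_species * len(species)

# Hypothetical allele frequencies: each block of 10 SNPs is near-fixed (0.90)
# in its target species and rare (0.05) in the three other species.
freqs = {sp: np.full(n_snps, 0.05) for sp in species}
for i, sp in enumerate(species):
    freqs[sp][i * snps_per_species:(i + 1) * snps_per_species] = 0.90


def sample_pure(p, n):
    """Diploid genotypes coded as 0, 1 or 2 copies of the near-diagnostic allele."""
    return rng.binomial(2, p, size=(n, p.size))


def sample_f1(p_mother, p_father, n):
    """F1 hybrids receive one gamete from each parental species."""
    maternal = rng.binomial(1, p_mother, size=(n, p_mother.size))
    paternal = rng.binomial(1, p_father, size=(n, p_father.size))
    return maternal + paternal


blocks = [sample_pure(freqs[sp], n_per_species) for sp in species]
blocks.append(sample_f1(freqs["petraea"], freqs["robur"], 13))
labels = np.array(
    [sp for sp in species for _ in range(n_per_species)] + ["F1"] * 13
)
genotypes = np.vstack(blocks).astype(float)

# PCA via singular value decomposition of the column-centred genotype matrix.
centred = genotypes - genotypes.mean(axis=0)
u, s, _ = np.linalg.svd(centred, full_matrices=False)
scores = u[:, :3] * s[:3]  # sample coordinates on the first three components

# Compare the position of the simulated F1 hybrids with their parental species.
centroids = {lab: scores[labels == lab].mean(axis=0) for lab in np.unique(labels)}
midpoint = (centroids["petraea"] + centroids["robur"]) / 2
d_f1 = np.linalg.norm(centroids["F1"] - midpoint)
d_parents = np.linalg.norm(centroids["petraea"] - centroids["robur"])
print("F1 offset from parental midpoint:", round(float(d_f1), 2))
print("petraea-robur centroid distance:", round(float(d_parents), 2))
```

Because PCA is a linear projection, simulated F1 hybrids are expected to fall close to the midpoint between their parental species, which mirrors the intermediate positions occupied by the known hybrids in Fig. 5.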

4 Discussion

We explored large-scale existing genomic resources in four European white oaks of the subsection Roburoid (Q. petraea, Q. pubescens, Q. pyrenaica, Q. robur) to screen their genomes for near-diagnostic SNPs that could be used for molecular fingerprinting (species and hybrid identification) in forest research and operational forestry, for example for wood or seed traceability in the wood chain and in forest nurseries. Despite the widely reported low interspecific genetic differentiation among European white oak species, we were able to identify a subset of SNPs that exhibited near-diagnostic features across their species’ distribution ranges. Moreover, multivariate analysis showed that these markers can be used for reliable hybrid detection and accurate quantification of admixture levels. However, diagnosticity varied substantially among species, among populations within species, and among SNPs. In the following, we discuss these variations in relation to the known evolutionary history and genetic interactions among and within the four species.

4.1 Variation of diagnosticity among species

Diagnosticity was highest in Q. pyrenaica (0.876) and lowest in Q. pubescens (0.747) with Q. robur and Q. petraea showing intermediate values. Near-diagnostic SNPs are likely located in genomic regions that exhibit larger divergence and/or regions less permeable to interspecific gene flow. The range of diagnosticity among the four species may therefore reflect the variation of divergence time and/or the variation of the intensity of gene flow during the ongoing interglacial period.
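To make such values concrete, the sketch below computes a species-level score from allele frequencies. It assumes, for illustration only, that the diagnosticity of a SNP is the mean absolute difference in the frequency of the near-diagnostic allele between the target species and each of the other species (for a biallelic locus, Gregorius' genetic distance reduces to this absolute difference). The function name `diagnosticity` and all frequencies are hypothetical and are not taken from the study.

```python
import numpy as np


def diagnosticity(freq, target):
    """Per-SNP mean absolute frequency difference between the target species
    and each other species (assumed definition, for illustration only)."""
    others = [sp for sp in freq if sp != target]
    diffs = [np.abs(freq[target] - freq[sp]) for sp in others]
    return np.mean(diffs, axis=0)


# Invented frequencies of the Q. petraea near-diagnostic allele at two SNPs.
freq = {
    "petraea": np.array([0.95, 0.80]),
    "pubescens": np.array([0.15, 0.30]),
    "pyrenaica": np.array([0.05, 0.05]),
    "robur": np.array([0.05, 0.10]),
}

per_snp = diagnosticity(freq, "petraea")
species_score = float(per_snp.mean())
print("per-SNP diagnosticity:", np.round(per_snp, 3))
print("mean diagnosticity:", round(species_score, 3))
```

In this toy example, the second SNP scores lower mainly because its allele is shared with Q. pubescens at an appreciable frequency, which is the kind of sharing discussed below for the Q. petraea and Q. pubescens pair.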

It is striking that higher and lower diagnosticity were observed for the species that showed the older (Q. pyrenaica, Q. robur) and more recent (Q. petraea, Q. pubescens) divergence, respectively (Leroy et al. 2017). Fixation of near-diagnostic SNPs in species with large population sizes, as in oaks, requires long time periods. Consequently, lower diagnosticity is likely associated with species that diverged more recently. This is illustrated by Q. pubescens, which shows lower diagnosticity due to the higher sharing of near-diagnostic alleles with Q. petraea than with the other two species (Fig. 3 and Fig. 6 in Appendix). Diagnosticity may in addition depend on the variation of effective population size (Ne) among species and along divergence, for which we currently lack any estimates. Our results may therefore be revisited in the light of future evidence of Ne differences.

Regarding gene flow, we showed earlier that the four species came into contact only recently, during the late last glacial maximum, after being isolated for most of their earlier history (Leroy et al. 2020b, 2017), resulting in gene flow among species. While interfertility among the four species has been shown experimentally by controlled crosses (Lepais et al. 2013), hybridization in natura has also been observed among the four species in rare mixed forests where all four species co-occur (Lepais and Gerber 2011; Lepais et al. 2009). Interspecific matings of Q. pyrenaica in controlled crosses with the remaining three species were quite successful; however, occurrences of natural hybridization were less frequent due to the very late flowering of Q. pyrenaica in comparison to the three other species (Lepais and Gerber 2011; Lepais et al. 2013). Furthermore, Q. pyrenaica is mainly distributed in southwestern Europe, where the other three species are only present in scattered forests, leading, for example, to reported but rare hybridization with Q. petraea (Valbuena-Carabana et al. 2005) and Q. robur (Moracho et al. 2016). Altogether, phenological prezygotic barriers and the limited overlap in distribution with the other three species may have contributed to reduced genetic exchanges between Q. pyrenaica and the other three species and thus account for the high diagnosticity of the SNPs of Q. pyrenaica.

In contrast to Q. pyrenaica, no reproductive barriers were observed in Q. pubescens when crosses were made with Q. petraea as female parent, as interspecific crosses were as successful as intraspecific crosses (Lepais et al. 2013). Reduced barriers between these two species were corroborated by frequent admixture detected in genetic surveys conducted in mixed stands of Q. pubescens and Q. petraea (Alberto et al. 2010; Neophytou 2014; Reutimann et al. 2023). As a result, near-diagnostic SNPs of Q. pubescens and Q. petraea were more frequently shared between the two species (Fig. 3 and Fig. 6 in Appendix), thus contributing to reduced diagnosticity.

Finally, interspecific gene exchanges involving Q. robur were mainly investigated with regard to Q. petraea. Uneven gene flow has been repeatedly observed in mixed stands with limited pollination from Q. robur to Q. petraea (Bacilieri et al. 1996; Lagache et al. 2013; Lepais et al. 2013), with a few exceptions in stands of unbalanced mixtures (Gerber et al. 2014). Uneven and unidirectional gene exchanges between these two species may have resulted in higher diagnosticity of Q. robur in comparison to Q. petraea.

4.2 Variation of diagnosticity among populations

There are striking differences in the species diagnosticity of the markers among populations within species (Tables 1, 2, and 3). In populations of Q. petraea, Q. pubescens, and Q. robur located in the central part of their distributions, high levels of diagnosticity (mean values of SNP diagnosticity of the population) could be observed, while in populations located at the margins of the distributions, southern as well as northern, lower diagnosticity was found. We further showed that populations located at the edges of the distribution are characterized by higher frequencies of near-diagnostic alleles of the other three congeneric species, suggesting extensive genetic exchanges (Fig. 4, Figs. 7 and 8 in Appendix). More frequent interspecific gene flow at the northern edge of the distribution has been shown earlier in the case of Q. petraea and Q. robur (Beatty et al. 2016; Jensen et al. 2009; Gerber et al. 2014) and has been interpreted as a driver of the succession dynamics at the northern colonization front of the two species (Kremer and Hipp 2020; Petit et al. 2003). In our study, the sessile oak population Killarney (Fig. 4) and the pedunculate oak population Roudsea (Fig. 8 in Appendix) are typical examples illustrating interspecific gene flow between the two species. Similar observations of more frequent hybridization were made in the case of Q. petraea and Q. pubescens at the northern edge of the distribution of Q. pubescens (Neophytou et al. 2015; Reutimann et al. 2020), which may also have contributed to the expansion of Q. pubescens.

In populations located at the southern edge of the distribution (Pomieri, Aspromonte, and Montejo for Q. petraea, Fig. 1 and Fig. 4), the lower diagnosticity may have resulted from more ancient genetic exchanges with Q. pubescens and Q. robur, not excluding the potential role of genetic drift in isolated populations. Indeed, the two Italian populations (Pomieri and Aspromonte) in Sicily and Calabria consist today of almost pure isolated stands, where Q. pubescens is extremely rare, if not absent (Bagnato et al. 2012; Modica 2001), while our results indicated introgression of Q. pubescens into Q. petraea (Fig. 4). Similarly, the sessile oak population Montejo, in Central Spain, is introgressed by Q. robur (Fig. 4), although the latter species is absent there today and contemporary hybridization has rather been detected with Q. pyrenaica (Valbuena-Carabana et al. 2005). Finally, a similar scenario holds for the pedunculate oak population Pedro, which is located at the extreme southern edge of the distribution of Q. robur (Fig. 1; Table 3, Fig. 8 in Appendix). Hybridization has been observed with Q. pyrenaica, which is today the most frequent species in the area (Moracho et al. 2016), and is confirmed by our results revealing the presence of Q. pyrenaica near-diagnostic alleles in the Q. robur population (Fig. 8 in Appendix). However, introgression by Q. pubescens is even more pronounced in our data despite the present-day absence of Q. pubescens in Extremadura (Fig. 8 in Appendix). To sum up, when comparing our results with previous investigations on interspecific gene flow, recent and/or ancient gene exchanges have reduced diagnosticity in the so-called diverging populations, which are located at the northern or southern margins of the distribution.

4.3 Variation of diagnosticity among SNPs

Frequency profiles of near-diagnostic alleles differed markedly across SNPs in diverging populations. There were cases where lack of diagnosticity affected mainly the same limited number of loci in a given species (Aspromonte and Pomieri in Q. petraea, Table 1; Pedro in Q. robur, Table 3; and to a lesser extent Switzerland and Ventoux in Q. pubescens, Table 2). In the remaining diverging populations (Killarney for Q. petraea, Table 1, and Roudsea for Q. robur, Table 3), reduced diagnosticity is more evenly distributed across most if not all loci. This contrasting distribution of diagnosticity across loci likely correlates with the timing of hybridization and introgression among the congeneric species. Recent gene exchanges, such as first-generation hybridization and subsequent backcrosses, will indistinctly impact all loci during the early phase of secondary contact among species and result in reduced diagnosticity of alleles in sympatric species. Such a scenario may hold for the two northern Q. petraea (Killarney) and Q. robur (Roudsea) populations.

Continuous gene exchanges over multiple generations may ultimately result in heterogeneous genomic landscapes, shaped by variable permeability to gene flow along the chromosomes due to the presence of prezygotic or postzygotic barriers and the heterogeneous recombination landscape. This scenario leads ultimately to the maintenance of near-diagnostic loci in genomic regions impermeable to gene flow, while the remaining part of the genome will become poorly differentiated. While this scenario was supported by ABC simulations (Leroy et al. 2020b, 2017), our results further suggest that the genomic distribution of near-diagnostic loci is environment dependent. It is striking that a very limited number of near-diagnostic alleles discovered in western populations of Q. petraea show poor diagnosticity in the southern populations Pomieri and Aspromonte (Table 1). Our results further indicated that this low diagnosticity may be due to more interspecific gene flow with Q. pubescens, which suggests preferential introgression in specific genomic regions (whether adaptive or not), resulting ultimately in a heterogeneous genomic distribution of near-diagnostic SNPs, especially in marginal parts of the range. In a recent paper, we showed that introgressed regions between Q. robur and Q. petraea may be more frequent at higher altitudes (Leroy et al. 2020a), while in another case study in two Asian oak species, the authors found that the genomic landscape of introgression changed in different ecological settings (Fu et al. 2022). A similar picture holds for the diverging southern Q. robur population Pedro, where diagnosticity is substantially reduced at a few near-diagnostic SNPs in comparison to other Q. robur populations (Table 3), most likely due to introgression by Q. pubescens and Q. pyrenaica (Fig. 8 in Appendix).

Anecdotally, the diverging status of Aspromonte, Pomieri, and Pedro echoes the taxonomic subspecies status that has been assigned to the Sicilian and Calabrian Q. petraea populations (Q. petraea ssp. austrothyrrenica, Bagnato et al. 2012; Lupini et al. 2019; Merlino et al. 2014) and to the extreme southern Spanish Q. robur populations (Q. robur ssp. estremadurensis, Vazquez-Pardo et al. 2009).

5 Conclusions and outlook

Here, we showed that near-diagnostic marker development for species identification is feasible despite few species barriers, extensive secondary contact, and, consequently, frequent hybridization and introgression. Recently, we demonstrated that the set of near-diagnostic markers resolved species assignment on fossil and archeological oak wood remains, where anatomical features do not allow the four deciduous species to be discriminated (Wagner et al. 2024). With the steadily increasing availability of whole genomes in non-model species including oaks (Lazic et al. 2021), the search for near-diagnostic markers could be extended to the whole Roburoid subsection, facilitating white oak species assignment throughout Europe, beyond the subset of four species that we considered here. The near-diagnostic SNPs for the four white oak species could be used in forest research and management not only for reliable and affordable species assignment but also to identify admixed individuals and accurately quantify admixture levels in natural populations (Reutimann et al. 2020). Because the presented alleles are often almost fixed for the target species, these SNPs also allow the identification of hybrid state (F1, F2, backcrosses, later generation hybrids, etc.) with methods like NEWHYBRIDS (Anderson 2008) and altogether help to understand the importance of hybridization and introgression in evolutionary processes. Together with the prospect of emerging field-based genotyping techniques (Urban et al. 2021), such near-diagnostic markers would even allow fast fingerprinting in situ to support decision-making by forest managers and scientists.
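Why nearly fixed alleles help to separate hybrid classes can be illustrated with two simple summary statistics. The sketch below is a deliberately simplified heuristic and not the NEWHYBRIDS model: it assumes loci that are fully fixed for alternative alleles in two species A and B, codes each genotype as the number of species-A alleles (0, 1 or 2), and assigns an individual to the class whose expected hybrid index and interspecific heterozygosity are closest. The names `EXPECTED`, `hybrid_summary`, and `nearest_class` are hypothetical.

```python
import numpy as np

# Expected (hybrid index, interspecific heterozygosity) for each class,
# assuming loci fully fixed for alternative alleles in species A and B.
EXPECTED = {
    "pure A": (1.0, 0.0),
    "pure B": (0.0, 0.0),
    "F1": (0.5, 1.0),
    "F2": (0.5, 0.5),
    "backcross to A": (0.75, 0.5),
    "backcross to B": (0.25, 0.5),
}


def hybrid_summary(genotype):
    """Return (hybrid index, heterozygosity) for genotypes coded 0/1/2."""
    g = np.asarray(genotype)
    hybrid_index = g.mean() / 2      # proportion of species-A alleles
    heterozygosity = np.mean(g == 1)  # proportion of heterozygous loci
    return float(hybrid_index), float(heterozygosity)


def nearest_class(genotype):
    """Assign the class with the closest expected summary statistics."""
    h, het = hybrid_summary(genotype)
    return min(
        EXPECTED,
        key=lambda c: (EXPECTED[c][0] - h) ** 2 + (EXPECTED[c][1] - het) ** 2,
    )


# Invented eight-locus genotypes for three individuals.
print(nearest_class([1, 1, 1, 1, 1, 1, 1, 1]))  # heterozygous at every locus
print(nearest_class([2, 1, 2, 1, 2, 1, 1, 2]))  # half of the loci heterozygous
print(nearest_class([2, 2, 2, 2, 2, 2, 2, 2]))  # species-A alleles only
```

With only a handful of markers that are nearly, rather than fully, fixed, these classes overlap, which is why probabilistic methods such as NEWHYBRIDS are preferable in practice.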

Availability of data and materials

The data are accessible at the INRAE data repository via the following DOI: https://doi.org/10.57745/0JYLZU.

References

Alberto F, Niort J, Derory J, Lepais O, Vitalis R, Galop D et al (2010) Population differentiation of sessile oak at the altitudinal front of migration in the French Pyrenees. Mol Ecol 19(13):2626–2639. https://doi.org/10.1111/j.1365-294x.2010.04631.x
Anderson EC (2008) Bayesian inference of species hybrids using multilocus dominant genetic markers. Phil Trans R Soc B 363:2841–2850. https://doi.org/10.1098/rstb.2008.0043
Bacilieri R, Ducousso A, Petit RJ, Kremer A (1996) Mating system and asymmetric hybridization in a mixed stand of European oaks. Evolution 50(2):900–908. https://doi.org/10.1111/j.1558-5646.1996.tb03898.x
Bacilieri R, Ducousso A, Kremer A (1995). Genetic, morphological, ecological, and phenological differentiation between Quercus petraea (Matt) Liebl and Quercus robur L in a mixed stand of northwest of France. Silvae Genetica 44(1): 1–10. https://www.thuenen.de/media/institute/fg/PDF/Silvae_Genetica/1995/Vol._44_Heft_1/44_1_1.pdf
Bagnato S, Merlino A, Mercurio R, Solano F, Scarfo F, Spampinato G (2012). Le basi conoscitive per il restauro forestale: il caso di Bosco Pomieri (Parco Regionale delle Madonie, Sicilia). Forest@ 9: 8–19. https://doi.org/10.3832/efor0679-008
Beatty GE, Montgomery WI, Spaans F, Tosh DG, Provan J (2016) Pure species in a continuum of genetic and morphological variation: sympatric oaks at the edge of their range. Ann Bot 117:541–549. https://doi.org/10.1093/aob/mcw002
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. https://doi.org/10.1093/bioinformatics/btu170
Bruschi P, Vendramin GG, Bussotti F, Grossoni P (2000). Morphological and molecular differentiation between Quercus petraea (Matt.) Liebl. and Quercus pubescens Willd. (Fagaceae) in Northern and Central Italy. Annals of Botany 85(3): 325–333. https://doi.org/10.1006/anbo.1999.1046
Camus A (1938). Les chênes. Monographie du genre Quercus. Tome II. Genre Quercus. Sous-genre Euquercus. Paul Lechevalier: Paris 830
Caudullo G, Welk E, San-Miguel-Ayanz J (2017) Chorological maps for the main European woody species. Data Brief 12:662–666. https://doi.org/10.1016/j.dib.2017.05.007
Coq-Etchegaray D, Bernillon S, Le-Provost G, Kremer A, Ducousso A, Lalanne C, Bonne F, Moing A, Plomion C, Brachi B (2023) Extensive variation of leaf specialized metabolite production in sessile oak (Quercus petraea) populations is to a large extent genetically determined but not locally adaptive. Preprint at bioRxiv. https://doi.org/10.1101/2023.04.07.536008
Degen B, Blanc-Jolivet C, Bakhtina S, Ianbaev R, Yanbaev Y, Mader M, Nurnberg S, Schröder H (2021) Applying targeted genotyping by sequencing with a new set of nuclear and plastid SNP and indel loci for Quercus robur and Quercus petraea. Conserv Genet Resour 13:345–347. https://doi.org/10.1007/s12686-021-01207-6
Denk T, Grimm GW, Manos PS, Deng M, Hipp AL (2017) An updated infrageneric classification of the oaks: review of previous taxonomic schemes and synthesis of evolutionary patterns. In: Gil-Pelegrin E, Peguero-Pina JJ, Sancho-Knapik D (eds) Oaks physiological ecology. Exploring the functional diversity of the genus Quercus L. Springer, pp 13–38
Dupouey JL, Badeau V (1993) Morphological variability of oaks (Quercus robur L, Quercus petraea (Matt) Liebl, Quercus pubescens Willd) in northeastern France: preliminary results. Ann for Sci 50:35s–40s. https://doi.org/10.1051/forest:19930702
Ellis JA, Ong B (2017). The MassARRAY® System for targeted SNP genotyping. In: White S, Cantsilieris S (eds) Genotyping. Methods in Molecular Biology, vol 1492. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6442-0_5
Fu R, Zhu Y, Liu Y, Feng Y, Lu R, Li Y et al (2022) Genome-wide analyses of introgression between two sympatric Asian oak species. Nature Ecology & Evolution 6:924–935. https://doi.org/10.1038/s41559-022-01754-7
Gerber S, Chadoeuf J, Gugerli F, Lascoux M, Buiteveld J, Cottrell J et al (2014). High rates of gene flow by pollen and seed in oak populations across Europe. Plos One 9(1). https://doi.org/10.1371/journal.pone.0085130
Grandjean G, Sigaud P (1987) Contribution à la taxonomie et à l’écologie des chênes du Berry. Ann for Sci 44:35–66. https://doi.org/10.1051/forest:19870104
Gregorius HR (1984) A unique genetic distance. Biometric J 26:1–14
Gregorius HR, Roberts JH (1986) Measuring genetic differentiation in subpopulations. Theor Appl Genet 71:826–834. https://doi.org/10.1007/BF00276425
Guichoux E, Lagache L, Wagner S, Leger P, Petit RJ (2011). Two highly validated multiplexes (12-plex and 8-plex) for species delimitation and parentage analysis in oaks (Quercus spp.). Molecular Ecology Resources 11(3): 578–585. https://doi.org/10.1111/j.1755-0998.2011.02983.x
Hipp AL, Manos PS, Hahn M, Avishai M, Bodénès C, Cavender-Bares J et al (2020) The genomic landscape of the global oak phylogeny. New Phytol 226:1198–1212. https://doi.org/10.1111/nph.16162
Jensen J, Larsen A, Nielsen LR, Cottrell J (2009) Hybridization between Quercus robur and Q. petraea in a mixed oak stand in Denmark. Annals of Forest Science 66: 706. https://doi.org/10.1051/forest/2009058
Jombart T (2008) Adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24:1403–1405. https://doi.org/10.1093/bioinformatics/btn129
Jurksiene G, Baliuckas V (2014). Leaf morphological variation of sessile oak (Quercus petraea (Matt.) Liebl.) and pedunculate oak (Quercus robur L.) in Lithuania. In: Treija S and Skujeniece S (eds) Research for Rural Development 2014, Vol 2, pp 63–69. https://hdl.handle.net/20.500.12259/87903
Kelleher CT, Hodkinson TR, Douglas GC, Kelly DL (2005). Species distinction in Irish populations of Quercus petraea and Q.robur: morphological versus molecular analyses. Annals of Botany 96(7): 1237–1246. https://doi.org/10.1093/aob/mci275
Kofler R, Pandey RV, Schlotterer C (2011) POPOOLATION2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27:3435–3436. https://doi.org/10.1093/bioinformatics/btr589
Kremer A, Hipp AL (2020) Oaks: an evolutionary success story. New Phytol 226:987–1011. https://doi.org/10.1111/nph.16274
Kremer A, Dupouey JL, Deans JD, Cottrell J, Csaikl U, Finkeldey R et al (2002) Leaf morphological differentiation between Quercus robur and Quercus petraea is stable across western European mixed oak stands. Ann for Sci 59(7):777–787. https://doi.org/10.1051/forest:2002065
Kremer A, Delcamp A, Lesur I, Wagner S, Rellstab C, Guichoux E, Leroy T (2024a) Genotypic data of near-diagnostic genetic markers for four western European white oak species identification. [dataset]. Recherche Data Gouv, V2. https://doi.org/10.57745/0JYLZU
Kremer A, Delcamp A, Lesur I, Wagner S, Rellstab C, Guichoux E, Leroy T, Anonymous A, Curtu AL, Westergren M, Dreyer E (2024b) Reviews of “Whole-genome screening for near-diagnostic genetic markers for four western European white oak species identification”. [Open Peer Review Report]. HAL https://doi.org/10.17180/73QH-D818.
Lagache L, Klein EK, Guichoux E, Petit RJ (2013) Fine-scale environmental control of hybridization in oaks. Mol Ecol 22(2):423–436. https://doi.org/10.1111/mec.12121
Lazic D, Hipp AL, Carlson JE, Gailing O (2021) Use of genomic resources to assess adaptive divergence and introgression in oaks. Forests 12:690. https://doi.org/10.3390/f12060690
Le Hardy de Beaulieu A, Lamant T (2006). Guide illustré des chênes. Tome 1. Editions du 8ième: Paris 688
Lepais O, Gerber S (2011) Reproductive patterns shape introgression dynamics and species succession within the European white oak species complex. Evolution 65(1):156–170. https://doi.org/10.1111/j.1558-5646.2010.01101.x
Lepais O, Petit RJ, Guichoux E, Lavabre JE, Alberto F, Kremer A et al (2009) Species relative abundance and direction of introgression in oaks. Mol Ecol 18(10):2228–2242. https://doi.org/10.1111/j.1365-294x.2009.04137.x
Lepais O, Roussel G, Hubert F, Kremer A, Gerber S (2013) Strength and variability of postmating reproductive isolating barriers between four European white oak species. Tree Genet Genomes 9(3):841–853. https://doi.org/10.1007/s11295-013-0602-3
Leroy T, Roux C, Villate L, Bodenes C, Romiguier J, Paiva JAP et al (2017) Extensive recent secondary contacts between four European white oak species. New Phytol 214(2):865–878. https://doi.org/10.1111/nph.14413
Leroy T, Louvet JM, Lalanne C, Le Provost G, Labadie K, Aury JM et al (2020a) Adaptive introgression as a driver of local adaptation to climate in European white oaks. New Phytol 226:1171–1182. https://doi.org/10.1111/nph.16095
Leroy T, Rougemont Q, Dupouey JL, Bodénès C, Lalanne C, Belser C et al (2020b) Massive postglacial gene flow between European white oaks uncovered genes underlying species barriers. New Phytol 226:1183–1197. https://doi.org/10.1111/nph.16039
Lesur I, Alexandre H, Boury C, Chancerel E, Plomion C, Kremer A (2018) Development of target sequence capture and estimation of genomic relatedness in a mixed oak stand. Front Plant Sci 9:996. https://doi.org/10.3389/fpls.2018.00996
Li H (2011) Improving SNP discovery by base alignment quality. Bioinformatics 27:1157–1158. https://doi.org/10.1093/bioinformatics/btr076
Li H (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arxiv. https://arxiv.org/abs/1303.3997 (2013)
Lopez de Heredia U, Valbuena-Carabana M, Cordoba M, Gil L (2009). Variation components in leaf morphology of recruits of two hybridising oaks Q. petraea (Matt.) Liebl. and Q. pyrenaica Willd. at small spatial scale. Eur J For Res 128(6): 543–554. https://doi.org/10.1007/s10342-009-0302-6
Lupini A, Aci M, Mauceri A, Luzzi G, Bagnato S, Menguzzato G et al (2019) Genetic diversity in old populations of sessile oak from Calabria assessed by nuclear and chloroplast SSR. J Mt Sci 16:1111–1120. https://doi.org/10.1007/s11629-018-5335-1
Macejovsky V, Schmidtova J, Hrivnak M, Krajmerova D, Sarvasova I, Gomory D (2020). Interspecific differentiation and gene exchange among the Slovak Quercus sect. Quercus populations. Dendrobiology 83: 20–29. https://doi.org/10.12657/denbio.083.002
Merlino A, Baliva M, Di Filippo A, Piovesan G, Solano F (2014). Analisi strutturali e dendroecologiche su popolamenti di Quercus petraea subsp. austrothyrrenica Brullo, Guarino e Siracusa nel parco regionale delle Madonie (Sicilia). Second International Congress of Silviculture, pp 183–189. https://doi.org/10.4129/2cis-am-ana
Modica G (2001). La rovere (Quercus petraea (Matt.) Liebl.) in Aspromonte. Monti e Boschi 3/4: 13–18. https://www.academia.edu/28363486/La_rovere_Quercus_petraea_Matt_Liebl_in_Aspromonte
Moracho E, Moreno G, Jordano P, Hampe A (2016) Unusually limited pollen dispersal and connectivity of pedunculate oak (Quercus robur) refugial populations at the species southern range margin. Mol Ecol 25(14):3319–3331. https://doi.org/10.1111/mec.13692
Neophytou C (2014) Bayesian clustering analyses for genetic assignment and study of hybridization in oaks: effects of asymmetric phylogenies and asymmetric sampling schemes. Tree Genet Genomes 10(2):273–285. https://doi.org/10.1007/s11295-013-0680-2
Neophytou C, Gartner SM, Vargas-Gaete R, Michiels HG (2015). Genetic variation of Central European oaks: shaped by evolutionary factors and human intervention? Tree Genetics & Genomes 11(4). https://doi.org/10.1007/s11295-015-0905-7
Petit RJ, Bodenes C, Ducousso A, Roussel G, Kremer A (2003) Hybridization as a mechanism of invasion in oaks. New Phytol 161(1):151–164. https://doi.org/10.1046/j.1469-8137.2003.00944.x
Plomion C, Aury JM, Amselem J, Leroy T, Murat F, Duplessis S et al (2018) Oak genome reveals facets of long lifespan. Nature Plants 4(7):440–452. https://doi.org/10.1038/s41477-018-0172-3
Raymond M, Rousset F (1995) GENEPOP(version 1.2): population genetics software for exact tests and ecumenicism. J Hered 86:248–249. https://doi.org/10.1093/oxfordjournals.jhered.a111573
Rellstab C, Buhler A, Graf R, Folly C, Gugerli F (2016). Using joint multivariate analyses of leaf morphology and molecular-genetic markers for taxon identification in three hybridizing European white oak species (Quercus spp.). Annals of Forest Science 73(3): 669–679. https://doi.org/10.1007/s13595-016-0552-7
Reutimann O, Gugerli F, Rellstab C (2020) A species-discriminatory single-nucleotide polymorphism set reveals maintenance of species integrity in hybridizing European white oaks (Quercus spp.) despite high levels of admixture. Ann Bot 125:663–676. https://doi.org/10.1093/aob/mcaa001
Reutimann O, Dauphin B, Baltensweiler A, Gugerli F, Kremer A, Rellstab C (2023). Abiotic factors predict taxonomic composition and genetic admixture in populations of hybridizing white oak species (Quercus sect. Quercus) on a regional scale. Tree Genetics & Genomes 19: 22. https://doi.org/10.1007/s11295-023-01598-7
Saleh D, Chen J, Leple JC, Leroy T, Truffaut L, Dencausse B et al (2022) Genome-wide evolutionary response of European oaks during the Anthropocene. Evolution Letters 6(1):4–20. https://doi.org/10.1002/evl3.269
Schroeder H, Kersten B (2023). A small set of nuclear markers for reliable differentiation of the two closely related oak species Quercus robur and Q.petraea. Plants 12 (3), 566. https://doi.org/10.3390/plants12030566
Scotti-Saintagne C, Mariette S, Porth I, Goicoechea PG, Barreneche T, Bodenes K et al (2004). Genome scanning for interspecific differentiation between two closely related oak species Quercus robur L. and Q petraea (Matt.) Lieb. Genetics 168(3): 1615–1626. https://doi.org/10.1534/genetics.104.026849
Truffaut L, Chancerel E, Ducousso A, Dupouey JL, Badeau V, Ehrenmann F et al (2017) Fine-scale species distribution changes in a mixed oak stand over two successive generations. New Phytol 215(1):126–139. https://doi.org/10.1111/nph.14561
Urban L, Holzer A, Jotautas Baronas J, Hall MB, Braeuninger-Weimer P, Scherm MJ, Kunz DJ, Perera SN, Martin-Herranz DE, Tipper ET, Salter SJ, Stamnitz MR (2021) Freshwater monitoring by nanopore sequencing. eLife 10:e61504. https://doi.org/10.7554/eLife.61504
Valbuena-Carabana M, Gonzalez-Martinez S, Sork V, Collada C, Soto A, Goicoechea PG et al (2005) Gene flow and hybridisation in a mixed oak forest (Quercus pyrenaica Willd. and Quercus petraea (Matt.) Liebl.) in Central Spain. Heredity 95:457–465. https://doi.org/10.1038/sj.hdy.6800752
Vazquez-Pardo F, Rincon-Hercules S, Gutierrez-Esteban M, Garcia-Alonso M, Marquez-Garcia F, Ramos-Maqueda S et al. (2009). Congreso Forestal Espanol. Montes y sociedad: Saber que hacer. Leon SECFJdCy (ed.). Sociedad Espanola de Ciencias Forestales: Avila, pp 3–13
Viscosi V, Lepais O, Gerber S, Fortini P (2009) Leaf morphological analyses in four European oak species (Quercus) and their hybrids: a comparison of traditional and geometric morphometric methods. Plant Biosystems 143(3):564–574. https://doi.org/10.1080/11263500902723129
Wagner S, Seguin-Orlando A, Leplé JC, Leroy T, Lalanne C, Aury JM, Poirier S, Wincker P, Plomion C, Kremer A, Orlando L (2024) Tracking population structure and phenology through time using ancient genomes from waterlogged white oak. Mol Ecol 33(3):1–17. https://doi.org/10.1111/mec.16859
Yucedag C, Gailing O (2013). Morphological and genetic variation within and among four Quercus petraea and Q. robur natural populations. Turk J Bot 37(4): 619–629. https://doi.org/10.3906/bot-1205-18

Acknowledgements

We thank the colleagues who contributed to the collection of material for this study: Dalibor Ballian (Bosnia and Herzegovina), María Valbuena Carabaña and Luis Gil (Spain), and Giovanni Giuseppe Vendramin (Italy). We extend our appreciation to partners of the former EU-supported FAIROAK and OAKFLOW projects, and of the EVOLTREE Network of Excellence, who collected material included in this study. The MassArray genotyping was performed at the PGTB (doi:10.15454/1.5572396583599417E12) with the help of Laure Dubois, Céline Lalanne, and Marie Massot. We are grateful to François Ehrenmann for his contributions to the figures of the manuscript.

Funding

This research was supported by the European Research Council through an Advanced Grant (project TREEPEACE no. FP7-339728), by an ANR (Agence Nationale de la Recherche) Grant (project GENOAK 2022, #BSV6-009–02), and by the EVOLTREE Opportunity call (project OakID2).

Author information

Erwan Guichoux and Thibault Leroy contributed equally to the research.

Authors and Affiliations

UMR BIOGECO, INRAE, Université de Bordeaux, 69 Route d’Arcachon, Cestas, 33612, France
Antoine Kremer & Thibault Leroy
UMR BIOGECO, INRAE, PGTB, Université de Bordeaux, 69 Route d’Arcachon, Cestas, 33612, France
Adline Delcamp & Erwan Guichoux
Helix Venture, Merignac, 33700, France
Isabelle Lesur
UMR CAGT, CNRS, Université Paul Sabatier, 37 Allées Jules Guesde, Toulouse, 31000, France
Stefanie Wagner
CNRGV Plant Genomic Center, INRAE, 24 Chemin Borde Rouge, Castanet-Tolosan, 31320, France
Stefanie Wagner
Swiss Federal Research Institute WSL, Birmensdorf, 8903, Switzerland
Christian Rellstab
GenPhySE, INRAE, ENVT, Université de Toulouse, Castanet-Tolosan, France
Thibault Leroy

Contributions

Conception of the study, TL and AK; sampling and collection of material, AK, CR, and TL; discovery of near-diagnostic markers in pool-sequenced resources, TL; discovery of near-diagnostic markers in sequence captured resources, IL; design of multiplexes and genotyping of natural populations, EG and AD; data analysis, AK, TL, and SW; and writing of the manuscript, AK, EG, and TL. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Antoine Kremer.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors gave their informed consent to this publication and its content.

Competing interests

The authors declare that they have no competing interests.

Additional information

Handling editor: Marjana Westergren.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Reference to preprint server

The preprint version of this article is available on the bioRxiv server at https://www.biorxiv.org/content/10.1101/2023.11.29.568959v1.

This paper has undergone an Optional Open Peer Review. Read the Peer review report (Kremer et al. 2024b) at https://doi.org/10.17180/73QH-D818.

Appendix

Table 4 Discovery samples of the whole genome pool-sequenced resources

Table 5 Discovery samples of the sequence captured genomic resources

Table 6 Geographic origins of training and validation samples

Table 7 Genetic and genomic features of near-diagnostic SNPs

Table 8 Overall frequencies of near-diagnostic alleles in the validation populations

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kremer, A., Delcamp, A., Lesur, I. et al. Whole-genome screening for near-diagnostic genetic markers for four western European white oak species identification. Annals of Forest Science 81, 21 (2024). https://doi.org/10.1186/s13595-024-01236-9

Received: 28 November 2023
Accepted: 03 May 2024
Published: 21 May 2024
DOI: https://doi.org/10.1186/s13595-024-01236-9
