Provisioning forest and conservation science with high-resolution maps of potential distribution of major European tree species under climate change

Chakraborty, Debojyoti; Móricz, Norbert; Rasztovits, Ervin; Dobor, Laura; Schueler, Silvio

doi:10.1007/s13595-021-01029-4

Annals of Forest Science

Table 2 Description of the dataset according to the ODMAP protocol

ODMAP elements	Contents
Overview
Authorship	Authors: Debojyoti Chakraborty, Norbert Móricz, Ervin Rasztovits, Laura Dobor, Silvio Schueler Contact email: debojyoti.chakraborty@bfw.gv.at Title: DOI:
Model objective	SDM objective: forecast/transfer. Target output: probability of occurrence of target tree species
Taxon	Seven tree species of Europe: Abies alba, Fagus sylvatica, Larix decidua, Picea abies, Pinus sylvestris, Quercus petraea, Quercus robur
Location	Europe
Scale of analysis	Spatial extent (Lon/Lat): Longitude: −32.65000°E to 69.44167°E; Latitude: 30.877982°N to 71.57893°N. Spatial resolution: 30 arcsec. Temporal resolution: We modeled historic climate (1961–1990) and three future time frames, each an average over its period (2041–2060, 2061–2080, and 2081–2100). Predictions were made for two Representative Concentration Pathways, RCP 4.5 and RCP 8.5
Biodiversity data overview	Observation type: standardized monitoring. Response data type: presence/absence data
Type of predictors	Climatic
Conceptual model/hypotheses	A large body of scientific studies indicates that climate is one of the major drivers of the distribution of tree species at the continental scale. We exploited this correlation between species’ current occurrence and climate to develop SDMs that predict the potential distribution of the target tree species
Assumptions	We assumed that species are at pseudo-equilibrium with the environment. The presence/absence data (Mauri et al. 2017) used in this study come largely from national forest inventories, in which tree individuals below a certain diameter at breast height are not recorded. We assumed that this data collection procedure did not bias our occurrence data. Since our occurrence dataset covers the whole current distribution of the target species, which represents both the current and the likely future climate of Europe, we assumed that the species retain their niches across space and time and that the current occurrence~climate correlation remains stable when the models are projected to future climate
SDM algorithms	Algorithms: We selected 10 modeling algorithms: GLM (Generalized Linear Models), GAM (Generalized Additive Models), GBM (Generalized Boosted regression Models), CTA (Classification Tree Analysis), ANN (Artificial Neural Networks), SRE (Surface Range Envelope or BIOCLIM), FDA (Flexible Discriminant Analysis), MARS (Multivariate Adaptive Regression Spline), RF (Random Forest for classification and regression), and MAXENT.Tsuruoka. These algorithms were implemented through the ensemble modeling platform biomod2 (Thuiller et al. 2016). Model complexity: The individual models were run using the standard default settings of biomod2, which are designed to balance model complexity and overfitting. Ensembles: The predictions of the individual model algorithms were ensembled through biomod2 (Thuiller et al. 2016)
Model workflow	The model workflow includes the following: 1. Data cleaning and generation of pseudo-absences 2. Finding the best climate variables to fit the models 3. Model running through the biomod2 platform 4. Ensemble prediction 5. Generation of the maps as gridded 30 arcsec rasters
Software	Software: All analyses were conducted using R version 3.3.2 (R Core Team 2016). Packages used: biomod2 (Thuiller et al. 2016), Random Forest (Breiman 2001). Data availability: Presence-absence data are available from Mauri et al. (2017). Climate data are available from Chakraborty D, Dobor L, A, Hlásny T, Schueler S (2020) High-resolution gridded climate data for Europe based on bias-corrected EURO-CORDEX: the ECLIPS-2.0 dataset (Zenodo: https://doi.org/10.5281/zenodo.3952159)
Data
Biodiversity data	Taxon names: Abies alba, Fagus sylvatica, Larix decidua, Picea abies, Pinus sylvestris, Quercus petraea, Quercus robur. Ecological level: Species-level. Data source: Species presence-absence data were obtained from the EU-Forest dataset (Mauri et al. 2017). The dataset harmonizes European tree occurrences from National Forest Inventories (NFI), Forest Focus (Hiederer et al. 2011), and Biosoil datasets (Houston Durrant et al. 2011). Most of the data (96%) come from the NFI, while 4% are contributed by Forest Focus (Hiederer et al. 2011) and Biosoil (Houston Durrant et al. 2011). Sampling design: The background data included in EU-Forest (Mauri et al. 2017) varied in their sampling intensity and design. These data were harmonized and aggregated to a spatial resolution of 1 square kilometer, in line with an INSPIRE-compliant 1-km × 1-km grid. Sample size: The dataset includes a total of 1,000,525 occurrence records at a spatial resolution of 1 × 1 km (Mauri et al. 2017). Data filtering: From the EU-Forest dataset we obtained 412,2881 occurrence records for the seven target species. Presence-absence data: In our case, the geographic locations of the target species in the EU-Forest dataset were assumed to be true presences, while the remaining locations, where only other species occur, were assumed to be absence locations. To ensure that the absence locations are not only climatically dissimilar but also geographically distant from the observed presence locations, we generated pseudo-absences according to Senay et al. (2013). This is a three-step approach: (i) specifying a geographical extent outside the observed presences, (ii) environmental profiling of the absences outside this geographic extent, and (iii) k-means clustering of the environmental profiles and selecting random samples within each cluster. In our case, a 2-degree buffer was found to be optimal following Senay et al. (2013).
The absence locations outside this geographic extent were classified into 10–15 (depending on species) environmentally dissimilar clusters with the k-means clustering algorithm. The number of clusters for each species was determined from a plot of the total within-cluster sum of squares (WSS) against the number of clusters. The number of pseudo-absence locations was further reduced by randomly selecting a sample of locations defined by the 95% confidence interval from each of the clusters. This approach was used to generate pseudo-absences for all seven species
Data partitioning	The occurrence dataset for each target species was partitioned by splitting into 75% for model training and 25% for model evaluation
Environmental predictors	Predictor variables: Environmental predictors were 80 biologically relevant climate variables comprising annual, seasonal, and monthly variables. From this list of 80 variables, a small subset of potential predictor variables was selected for each target species during the variable selection process. Data sources: Spatial resolution of predictor data: 30 arcsec, which is roughly equivalent to 1 × 1 km or finer depending on latitude. Temporal resolution of predictor variables: Historic climate (1961–1990) and three future time frames, each an average over its period (2041–2060, 2061–2080, and 2081–2100), for two Representative Concentration Pathways (RCP 4.5 and RCP 8.5) were used for the SDM predictions. Geographic projection: WGS 84 (EPSG: 4326)
Model
Variable selection and multicollinearity	From the list of potential predictor variables (Table 2 in Appendix), those that explain most of the variation in the observed presences and absences of each species were selected with a recursive feature elimination (RFE) approach implemented within the Random Forest algorithm (Breiman 2001). Within the RFE approach, variables were eliminated iteratively, starting from the full set of potential predictors (Table 2 in Appendix) and retaining only those variables that reduce the mean square error over random permutations of the same variable. Variables that were linearly correlated with other variables and had a variance inflation factor (VIF) > 5, as suggested by Booth et al. (1994), were identified, and the ones with the lower value of the Akaike Information Criterion (AIC) (Akaike 1974) were retained for further model development. This subset of uncorrelated climate variables (Table 3 in Appendix) was used as predictor variables for developing the ensemble species distribution models
Model settings	The models were run with the default settings of biomod2 (Thuiller et al. 2016)
Model estimates	The models estimated the median ensemble probability of species occurrence and the associated model uncertainty, represented by the coefficient of variation
Model ensemble	Predicted probabilities from the individual models for each target species were ensembled as a consensus model, which combined the median probability over the models selected with a true skill statistic threshold (TSS > 0.7) (Allouche et al. 2006; Coetzee et al. 2009)
Threshold selection	A true skill statistic threshold (TSS > 0.7), a commonly used threshold for SDMs (Allouche et al. 2006; Coetzee et al. 2009), was used
Assessment
Model performance statistics	For each individual model run, as well as for the final ensemble model of each target species, the model evaluation statistics were recorded. These statistics were the true skill statistic (TSS), the area under the relative operating characteristic (ROC) curve, model sensitivity (the ability of the model to predict true presences), and model specificity (the ability of the model to predict true absences). TSS takes into account both omission and commission errors, ranges from −1 to +1, and, unlike kappa, is not affected by prevalence (Allouche et al. 2006). TSS values ranging from 0.2 to 0.5 were considered poor, from 0.6 to 0.8 useful, and values larger than 0.8 good to excellent (e.g. Coetzee et al. 2009). Prediction accuracy is considered to be similar to random for ROC values lower than 0.5; poor for values in the range 0.5–0.7; fair in the range 0.7–0.9; and excellent for values greater than 0.9 (Pontius and Parmentier 2014)
Prediction
Prediction output	Predicted probabilities from the individual models for each target species were ensembled as a consensus model, which combined the median probability over the models selected with a true skill statistic threshold (TSS > 0.7) (Allouche et al. 2006; Coetzee et al. 2009). The median was chosen because it is known to be less sensitive to outliers than the mean. The estimated ensemble model predictions are presented as GeoTIFF rasters
Uncertainty quantification	Model uncertainty was estimated in terms of coefficient of variation (CV) among the predictions of the individual models. The estimated CVs are also presented as GeoTIFF rasters where each cell corresponds to a CV value whereby higher and lower CV values indicate higher and lower uncertainty respectively in the ensemble model
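
The evaluation, ensemble, and uncertainty rows of the table can be illustrated with a minimal sketch. It is not the authors' biomod2 workflow (which ran in R); it is a plain-Python illustration under stated assumptions: TSS is computed as sensitivity + specificity − 1, models are kept when TSS > 0.7, and each grid cell receives the median and the coefficient of variation (CV) of the kept models' probabilities. The function names, the confusion-matrix counts, the per-cell probabilities, and the use of the population standard deviation for the CV are all hypothetical choices made for the example.

```python
from statistics import mean, median, pstdev


def tss(tp, fn, tn, fp):
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # share of true presences predicted as presences
    specificity = tn / (tn + fp)  # share of true absences predicted as absences
    return sensitivity + specificity - 1


def ensemble(predictions, scores, threshold=0.7):
    """Per-cell median and CV over the models whose TSS exceeds `threshold`.

    predictions: dict of model name -> list of per-cell probabilities
    scores:      dict of model name -> TSS value
    """
    kept = [name for name, score in scores.items() if score > threshold]
    if not kept:
        raise ValueError("no model passes the TSS threshold")
    # Regroup the values so that each tuple holds one cell's predictions.
    cells = list(zip(*(predictions[name] for name in kept)))
    med = [median(cell) for cell in cells]
    cv = []
    for cell in cells:
        m = mean(cell)
        # Assumption: CV = population SD / mean; undefined when the mean is 0.
        cv.append(pstdev(cell) / m if m > 0 else float("nan"))
    return kept, med, cv


# Hypothetical confusion-matrix counts (tp, fn, tn, fp) for four models.
confusion = {
    "GLM": (85, 15, 90, 10),
    "RF": (95, 5, 92, 8),
    "GBM": (90, 10, 88, 12),
    "SRE": (60, 40, 70, 30),
}
scores = {name: tss(*counts) for name, counts in confusion.items()}

# Hypothetical occurrence probabilities for three grid cells.
predictions = {
    "GLM": [0.2, 0.8, 0.5],
    "RF": [0.4, 0.9, 0.5],
    "GBM": [0.3, 0.7, 0.5],
    "SRE": [0.9, 0.1, 0.5],
}

kept, med, cv = ensemble(predictions, scores)
print(kept, med, cv)
```

In this toy input the SRE model falls below the threshold and is excluded, so its outlying predictions do not influence the median, and the third cell, where all kept models agree, has a CV of zero.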

ISSN: 1297-966X
