Skip to main content
  • Research Paper
  • Open access
  • Published:

Mapping stem volume in fast-growing eucalypt plantations: integrating spectral, textural, and temporal remote sensing information with forest inventories and spatial models

Abstract

Key message

Accurate volume estimation in Eucalyptus plantation stands was achieved by a linear model using SPOT and Landsat multispectral imagery, specifically texture indices and pixel-scale NDVI time integrals, which reflect the local plantation growth history. Spatial modelling techniques such as Kriging with External Drift and Generalized Additive Model slightly improved predictions by accounting for spatial correlation of volume between sample points.

Context

Forest inventories are widely used to estimate stand production. To capture the inherent spatial variability within stands, spatial modelling techniques such as Kriging with External Drift (KED) and the generalized additive model (GAM) have emerged. These models incorporate information on spatial correlation and auxiliary variables that can be obtained from satellite imagery.

Aims

Our study explored the use of reflectance data from SPOT and Landsat multispectral imagery. We focused on texture indices and temporal integration of vegetation indices as auxiliary variables in KED and GAM to predict stem volume of fast-growing Eucalyptus sp. plantations in Brazil.

Methods

The components extracted from the high-resolution SPOT-6 image included spectral band values, band ratio metrics, key vegetation indices (NDVI, SAVI, and ARVI), texture measurements, and indices derived from texture analysis. Additionally, we included the accumulated NDVI time series acquired from Landsat 5, 7, and 8 satellites between the planting date and the forest inventory measurement date.

Results

The best linear model of stem volume using remotely sensed predictors gave an R-squared value of 0.95 and a Root Mean Square Error (RMSE) of 12.44 m3/ha. The R-squared increased to 0.96 and the RMSE decreased to 10.6 m3/ha when the same predictors were included as auxiliary variables in the KED and GAM spatial models.

Conclusion

The linear model using remotely sensed predictors contributed most to volume prediction, but the addition of spatial coordinates in the KED and GAM spatial models improved local volume predictions.

1 Introduction

Short rotation forest plantations are expanding to meet the demand for paper, cellulose, industrial wood (such as particle board) and coal for industry (e.g., iron and steel). In Brazil, there were 9.55 million hectares of tree plantations in 2020, with the majority being eucalypt plantations, covering 7.47 million hectares (IBA 2021). The development of the forestry sector necessitates sustainable management and information about existing forest resources. Stakeholders require monitoring of forest resources for wood production assessment, harvest prevision, sanitary control, carbon accounting, and evaluation of biotic and abiotic damages.

Forest inventories and field surveys remain the primary methods for monitoring forest plantations. Forest inventories involve field measurements of tree diameters and heights of small plots following a plantation sampling design. These measurements are then used in allometric relationships to predict the volume of individual trees, which is subsequently summed for all trees in the plot and divided by the plot area to estimate the volume per unit area. Stand-level values are typically obtained by averaging the plot-level values within the stand. The method assumed that all plots are independent and equally weighted, and therefore do not consider the spatial dependence of field data (Viana et al. 2012; Castillo-Santiago et al. 2013). Additionally, despite the assumption of homogeneity of planted forests, Castillo-Santiago et al. (2013) showed that it was necessary to consider the natural variations that occur in the structure of vegetation. Spatial variability within and between plantation stands, particularly in fast-growing species such as Eucalyptus in Brazil, can be influenced by various factors, including different soil conditions, topography, planting dates, genetic materials, management, and past events such as diseases and storms (Marsden et al. 2010a, b; Le Maire et al. 2011a, b, c; Zhou et al. 2013; le Maire et al. 2019). Thus, although the assumption of independence between inventory variables within and between stands is commonly used, knowledge of the spatial variations within a forest (both within and between stands) can improve the prediction of the variable of interest (Zhou et al. 2013).

Various statistical techniques are used to model spatial variations found in the sampled data (Propastin 2012). Geostatistical techniques facilitate the characterization of the spatial variability present in a data structure, including the spatial correlation between sampling units. This information can then be used to improve the prediction of variables of interest at unsampled locations, thereby enhancing the accuracy of stand-scale or regional-scale averages. Numerous studies have used geostatistical interpolation to account for spatial dependence between forest inventory plots. This method falls within the model-based inferential framework described by Ståhl et al. (2016). Geostatistical interpolation was used to predict variables such as volume, biomass, and diameter increments, as well as to examine the spatial continuity of the parameter of interest (Tesfamichael et al. 2009; Viana et al. 2012; Galeana-Pizaña et al. 2014; Scolforo et al. 2016; Silveira et al. 2019).

Correlations may exist between the forest inventory variables, such as tree height and volume, and information extracted from satellite images. This allows for a direct, spatially explicit prediction of these variables at the resolution of the satellite image (Lu 2006; Avitabile et al. 2012; Castillo-Santiago et al. 2013). Satellite images have been widely used to map structural forest parameters, such as above-ground volume and biomass (Zhang et al. 2014; Zhao et al. 2016; Lochhead et al. 2018; Souza et al. 2019). More specifically, some studies used image texture (Avitabile et al. 2012; Rodriguez-Galiano et al. 2012; Dube and Mutanga 2015), image time series (Le Maire et al. 2011a, b, c; Boisvenue et al. 2016), or other types of sensors such as radar (Baghdadi et al. 2015; Zhao et al. 2016).

Geostatistical interpolation techniques, such as Kriging with External Drift (KED, also called Universal Kriging), use ancillary data, including variables derived from satellite imagery, to explain some of the spatial variability observed in the data and complement the interpolation process (Goovaerts 1997). KED combines the strengths of both kriging and regression methods. Kriging models the spatial patterns of the variable of interest, while multivariate regression models the influence of external predictors. This combination allows for more accurate prediction of the variable of interest at unsampled locations, especially when the multivariate regression model provides a precise prediction of the variable of interest. KED is preferred over classical co-kriging when a strong linear relationship exists between the primary and secondary variables (Wackernagel 2003).

The generalized additive model (GAM) is an extension of the generalized linear model. It adds a smoothing function is added to the explanatory variable to make the relationship between the response variable and the auxiliary variables more flexible (Hastie and Tibshirani 1986). The shape of this relationship does not need to be known beforehand, but it can be estimated through smoothing curves from the dataset. The smoothing functions take into account the geographic coordinates. Compared to KED it handles any non-linear relationships between the auxiliary variables and the spatial values being predicted. GAMs are semi-parametric and multi-dimensional and generally use penalized splines for the smoothing function. When developing ecological models, some studies use GAM to calculate wood stock, forest characteristics, biomass, and other related factors (Ahrens et al. 2020; Alberdi et al. 2020; Fassnacht et al. 2021).

Geostatistical models that incorporate spatial auxiliary variables require known variables at each location of model application. Satellite multispectral images can provide such information, for example, through spectral vegetation indices, which are mathematical formulations that combine reflectance of several spectral bands. Vegetation indices are typically highly correlated with green vegetation within the pixels, while minimizing the contributions of the soil and sun angle (Lu 2006; Cutler et al. 2012). However, vegetation indices often have limited direct spatial correlation with forest inventory variables, indicating the need to consider alternative co-variables to improve the performance of geostatistical models. This is especially true for short rotation plantation forests such as Eucalyptus plantations in Brazil. These forests are temporally dynamic, with high growth rates averaging 40 m3 ha−1 year−1. They reach canopy height of over 20 m when they are harvested at approximately 7 years old. The aim of this study is to investigate the potential use of image texture computed on very high-resolution satellite images, and historical information about the forest plantation since planting date obtained from satellite image time series, as co-variables in geostatistical models.

KED and GAM geostatistical models were calibrated to predict stem wood volume (m3 ha−1) in Eucalyptus plantations. We added several co-variables computed from satellite images, including band reflectances, simple band ratios, vegetation indices, texture indices, and more complex variables computed from the integration of time series of vegetation indices. The objective was to provide more accurate predictions of wood volume within and between stands. The specific objectives were (a) to assess the direct correlation between variables derived from satellite imagery and stem volume; (b) to evaluate their effectiveness as auxiliary variables in fitting KED and the GAM geostatistical models for volume prediction; and (c) to compare the volumes predicted by the linear, KED, and GAM models with the volumes observed on randomly selected forest inventory plots not used in model calibration.

2 Material and methods

2.1 Study site and forest inventory data

The study was conducted in Eucalyptus sp. stands located in the southern region of Mato Grosso do Sul state, near the city of Ribas do Rio Padro. The area has an average altitude of 374 m and a tropical climate (Aw type according to the Köppen classification) characterized by a dry season in the winter with an average precipitation of 1241 mm and an average temperature of 24.2 °C.

The forest inventory data was collected in August 2018 from 211 circular plots of 500 m2, systematically distributed on a pre-defined grid across the landscape, with one plot every ~ 722 m in the study area. All plots were geo-referenced using GPS measurements from the central point of each plot. In the study area, all stands were less than 4 years old in August 2018 (Table 1). The inventory consisted of measuring the circumference at breast height (CBH—1.3 m above the ground level) for all trees in the circular inventory plot and the tree height of a central sub-sample of 25% of the trees in the plot. Dominant height was calculated as the mean of the five trees with the largest CBH. Unmeasured tree heights were predicted using the Curtis hypsometric model. Stem volumes of the trees were then predicted by age class using a company-calibrated Schöepfer Fifth-Degree Polynomial volume projection equation specific to the genetic material and calibrated in the same region based on destructive measurements of trees. These equations are generally very precise in these clonal plantations (Scolforo et al. 2019). The plot volume was calculated by adding the volumes of each tree in the plot. The age information was incorporated as a covariable in the linear models (see section “2.4”).

Table 1 Summary of planting characteristics and descriptive statistics of average volume data per plot in the study site 

2.2 Image processing

A satellite image from the commercial observation satellite SPOT-6 was acquired in August 2018, during the same period as the forest inventory. The SPOT-6 satellite images have four broadbands (Blue, Green, Red, and Near Infrared) at a spatial resolution of 6 m and a panchromatic band at 1.5 m resolution. The radiometric resolution is 12 bits per pixel. The broadbands were merged with the panchromatic band using a pansharpenning algorithm, resulting in a multispectral image of 1.5 m spatial resolution. The top-of-atmosphere reflectance was used directly in the analysis since only one image was used in this study. There were no clouds in the area of interest. The final image was georeferenced based on ground control points.

A time series of images from Landsat satellites was used, covering the period from January 2000 to August 2018. The time series included 30 m resolution multispectral images from Landsat-5 Thematic Mapper (TM), Landsat-7 Enhanced Thematic Mapper Plus (ETM +), and Landsat-8 Operational Land Imager (OLI). The atmospherically corrected Surface Reflectance images were produced by the United States Geological Survey (USGS). They include information on eventual sensor failure, cloud and cloud shadow masks. To eliminate residual clouds or cloud shadows, the masks were expanded with a supplementary buffer of 150 m (5 pixels). Reflectances from three different Landsat sensors (5, 7, and 8) were inter-calibrated following Roy et al. (2016). The Landsat Surface Reflectance product was obtained from Google Earth Engine (Collection 1 Tier 1 dataset).

The SPOT-6 and Landsat images were resampled to the same resolution of 7.5 m to facilitate post-treatments. This resolution was chosen as it can be easily obtained from the SPOT-6 1.5 m resolution image and from the Landsat 30 m resolution image. Additionally, this resolution is also compatible with the size of an inventory plot (~ 3 × 3 pixels), and speeds up the texture computation on the SPOT-6 image. The resampling method used was a simple nearest neighbor algorithm.

Five sets of variables were extracted and calculated from SPOT-6 and/or Landsat images. These sets include (i) reflectance of spectral bands, (ii) spectral band ratios, (iii) vegetation indices, (iv) textural indices, and (v) integrals of vegetation indices of satellite time series. Table 2 provides a list of the variables obtained and their descriptions.

Table 2 Description of the 86 variables computed and extracted from the SPOT-6 and Landsat 5, 7, and 8 satellites images 

The SPOT-6 image was used to compute the four spectral bands, twelve simple ratios of these bands, and three commonly used vegetation indices: NDVI (Tucker 1979), SAVI (Huete 1988), and ARVI (Kaufman and Tanré 1996).

Additionally, textural indices were computed using the gray level co-occurrence matrix (GLCM) proposed by Haralick et al. (1973), which is a widely used method for extracting texture from images. Four second-order texture measurements were obtained, for each spectral band: contrast (CON), correlation (COR), energy (ENE), also known as second angular momentum, and homogeneity (HOM). These measures are commonly used and considered relevant in remote sensing studies (Rodriguez-Galiano et al. 2012). The GLCM was calculated on the resampled 7.5 m resolution SPOT-6 image, using a 3 × 3 pixel window. This calculation was performed in the four angles of orientation of the neighboring pixel (0°, 45°, 90° and 135°) with an omnidirectional approach to cover the various types of spatial relationships among the pixels. This window definition was also used in the study by Sarker and Nichol (2011). Finally, the average of the values computed in the four directions was determined. The spectral and textural indices were combined to generate indices (Table 2).

The NDVI was calculated for each pixel in the Landsat image time series. The NDVI time series for each Landsat pixel were smoothed over time, using a cubic spline function. This smoothing technique eliminates small variations that occur on a short time scale, which do not reflect changes in the canopy, such as residual atmospheric effects (le Maire et al. 2011a, b, c). The NDVI time series provided several variables: the minimum and maximum values during the rotation, and the integrals of the NDVI smoothed curve between a starting date (e.g., planting date) and an ending date (e.g., inventory date) (Table 2). These variables were selected based on previous studies by Marsden et al. (2010a, b) and le Maire et al. (2011a, b, c). The image components were extracted using R (R Core Team 2021) and the following packages: 'rgdal', 'maptools', 'raster', 'sp' and 'glcm'.

2.3 Linear models for stem volume prediction and variable selection

First, we develop a multilinear regression model to predict stem volume directly from the spatial variables listed in Table 2. Due to the large number of variables (86) and possible multicollinearity, variable selection was necessary to obtain a simpler linear regression model for stem volume prediction. Different regression methods, all including variable selection, were tested (Kuhn 2008): forward stepwise selection (mod_fw), backward stepwise selection (mod_bw), sequential replacement (mod_sw), Sparse Partial Least Squares (mod_spls), Spike and Slab Lasso (mod_sl), Ridge (mod_rd), Lasso (mod_ls), and Elastic Net (mod_en). These methods have different theoretical basis, which may lead to different selected variables and model accuracies, underlining the importance to test a range of models. To evaluate the predictive performance of all tested models to predict plot-level stem volume, the plots were split into two partitions, 80% for training and 20% for validation. The Root Mean Square Error (RMSE), the Adjusted Coefficient of Determination (R2 adj), and the corrected Akaike Information Criterion (AICc) were computed to compare the final models. Statistical analyses were conducted using R and the following packages: ‘leaps’, ‘gstat’, ‘spls’, ‘pastecs’, ‘spikeslab’, ‘caret’, and ‘measures’.

2.4 Kriging with External Drift (KED) geostatistical model

Kriging with External Drift predicts the variable Z at an unsampled point s by combining a deterministic function μ(s) which indicates the underlying drift, and a stochastic residual component ε(s) (Goovaerts 1997; Wackernagel 2003).

$$Z\left(s\right)=\mu\left(s\right)+\varepsilon\left(s\right)$$
(1)

In KED, μ(s) is modeled by a function of auxiliary variables:

$$\mu \left(s\right)=\sum\nolimits_{l=0}^{L}{a}_{l}{f}_{l}\left(X\left(s\right)\right)$$
(2)

where \({a}_{l}\) are coefficients to be estimated from the data, \({f}_{l}\) is the function of auxiliary variables describing the drift (\({f}_{0}\) is the constant function with value of 1), X(s) is the vector of auxiliary variables at location s, and L + 1 is the number of functions used in modelling the drift.

Given n observed values, (s1),…,Z(sn), at sample points, s1,s2,…,sn, we predict Z*(s0) at an unsampled location s0 as:

$${Z}^{*}\left({s}_{0}\right)=\sum\nolimits_{i=1}^{n}{\omega }_{i}Z({s}_{i})$$
(3)

where ωi is the kriging weight assigned to Z(si). These weights are computed by minimizing the estimation error variance subject to the unbiased constraint. This is solved using the Lagrange multiplier method, leading to the KED system of equations:

$$\left\{\begin{array}{ll}{\sum }_{j=1}^{n}{\omega }_{j}C\left({s}_{i}-{s}_{j}\right)+{\sum }_{l=0}^{L}{\uplambda }_{l}{f}_{l}\left(X\left({s}_{i}\right)\right)=C\left({s}_{i}-{s}_{0}\right), & \quad for\ i=1,\dots ,n\\ {\sum }_{j=1}^{n}{\omega }_{j}=1, \\ {\sum }_{j=1}^{n}{\omega }_{j}{f}_{l}\left(X\left({s}_{i}\right)\right)={f}_{l}\left(X\left({s}_{0}\right)\right), & for\ l=\text{0,1},\dots ,L\end{array}\right.$$
(4)

where \(C\left(h\right)\) is the covariance function of the variable Z in function of the distance h linking any two points s and s + h. In Eq. 4, h is written as the difference between two points, e.g., \(({s}_{i}-{s}_{j})\) (Goovaerts 1997; Wackernagel 2003). λl is a Lagrange multiplier. In the present study, we use the external drift variables X(s) that were selected from Table 2 (see section “2.4”).

The covariance function \(C\left(h\right)\) was obtained from the experimental variogram by fitting a variogram model. The variogram model is a smooth function that is fitted on the experimental variogram obtained from the data. Most variogram models are characterized by three parameters: the "nugget", reflecting measurement error and unresolved spatial variability; the "sill", representing the spatial variation of the data explained by the variogram model; and the "range", denoting the distance over which the samples exhibit spatial correlation. We tested common variogram functions including exponential (Exp), spherical (Sph), Martén (Mat), Gaussian (Gau) and Martén M. Stein’s parameterized (Ste) (mathematical formulations in Stein 1999). The five models tested were fitted to the data using Levenberg–Marquardt fitting with a weighted least square method, and we selected the one with minimum RMSE value. Spatial predictions of stem volume were then made across the entire area at pixel scale by first solving the Eq. 4 for ω and \(\uplambda\) through linear algebra. The 'gstat' R package was used for these computations.

2.5 Generalized additive model (GAM) with spatial coordinates

Local predictions were performed using GAM with spatial coordinates. The spatial relationship is represented by the coordinate smoothing function, and non-stationary effects are modeled by interactions between the smoothing function and the auxiliary variables (Nussbaum et al. 2018). The estimate of the local mean stem volume \({\widehat{\mu }}_{s}\) at location s follows the equation:

$$g\left({\widehat{\mu }}_{s}\right)={\mu }_{0}+{\beta }_{1}{X}_{s}^{1}+\dots +{\beta }_{k}{X}_{s}^{k}+f\left({x}_{s},{y}_{s}\right)+\varepsilon$$
(5)

where \({\mu }_{0}\) is the model intercept, \({X}_{s}^{1}\),…,\({X}_{s}^{k}\) are local auxiliary variables selected previously (see “2.4”), \({\beta }_{1},\dots ,{\beta }_{k}\) are the parameters to be estimated, f is a bivariate smoothing function relating the covariates \({x}_{s}\) (longitude) and \({y}_{s}\) (latitude) to the response variable, g is an increasing monotonic link function (identity in our case), and \(\varepsilon\) is the identically distributed random error. We used the ‘gam’ function of the R package ‘mgcv’, where the smoothing function f is a penalized regression spline, and where the smoothness criteria of this function is internally optimized using the Generalized Cross Validation (GCV) criterion (Wood 2017). Note that we did not use co-kriging of the residuals in this approach, as spatial location is incorporated in the model through the local smoothing term. Therefore, while our GAM approach may not be considered as a geostatistical technique strictly speaking, it explicitly includes location in the predictions, representing landscape features not captured by other auxiliary variables in the model.

2.6 Analysis of the stem volume predictions generated by the models

To assess the predictive performance of the linear model, KED and GAM, fivefold cross-validation was used. The statistics used to assess the predictive capacity of the models were the same as those applied in the method of variable selection, i.e., the Root Mean Square Error (RMSE) and the Adjusted Coefficient of Determination (R2adj). For the KED model and GAM, we also computed the part of the variance explained by the auxiliary variables and the variance explained by the spatial features of the model.

3 Results

3.1 Landsat reflectance, vegetation indices, NDVI time series, and texture indices

Pixel-scale interpolated and smoothed Landsat NDVI time-series were obtained. Figure 1 presents an example of a single pixel that underwent conversion from a pasture to a eucalypt plantation in February 2014 and was 4.5 years old at the time of inventory in August 2018. The time-series clearly illustrates the transition from low NDVI values in the pasture stage to high and stable NDVI values in the eucalypt stage. Seasonal variation is evident during the eucalypt stage, resulting from the contrast between rainy and drought seasons (Fig. 1).

Fig. 1
figure 1

Pixel-scale NDVI measured by 3 Landsat sensors (blue: Landsat TM 5; Red: Landsat 7 ETM + ; Green: Landsat 8 OLI) after pre-treatments described in the main text. The area was a pasture and was planted with Eucalyptus in February 2014 (dashed gray line). The black line shows the cubic spline smoothing curve

The dataset, described in Table 1, was analyzed using the Spearman correlation method. A correlation matrix was generated (Fig. 4 in Appendix) to identify high correlations, both positive and negative, between spectral bands, vegetation indices, texture, NDVI time series integrals, and stand age. This highlights the need for a variable selection technique to prevent multicollinearity in subsequent linear models. A Spearman correlation was also conducted between stem volume and each variable in Table 1. The results showed that most variables were significantly correlated with stem volume, either positively or negatively. However, there was large variability in the correlation coefficients (Fig. 5 in Appendix). Significant relationships were found between stem volume and tree age (correlation coefficient of 0.83), some NDVI integrals of Landsat images (INT_0_DM, INT_0_DS and INT_1_DM) with values ranging between 0.86 and 0.89, and SPOT direct spectral bands B1, B2, and B3 with negative relationships of − 0.74 to − 0.77. However, some vegetation indices, such as the Landsat minimum and maximum NDVI values after 2 years, were not significantly correlated with stem volume.

3.2 Selection of variables

To minimize the impact of multicollinearity in linear regression models, we applied variable selection through different algorithms listed in Section “2.4”. Each final model contained between 3 and 7 variables (Table 3).

Table 3 Statistics of the multilinear models for stem volume prediction (m3 ha−1) obtained from different variable selection methods (see Section “2.4”)

The models used variables computed from NDVI time series integrals. However, variables such as AGE and B1 to B3 reflectances, which showed high correlation with stem volume (Fig. 5 in Appendix), were not included. Instead, ratios between bands were found in nearly all models, with RAT_B3_B2 (which had a direct correlation coefficient of 0.52 with stem volume) being present in half of the tested models. None of the models selected the NDVI, SAVI, and ARVI vegetation indices. The models included two texture indices: MUL_B1_HOM_B1 (which had a direct correlation of -0.47 with wood volume) in mod_spls and RAT_B2_CON_B2 (which had a correlation coefficient of 0.19) in mod_en. The best model was obtained using the Lasso regression method mod_ls, with a RMSE of 14.03 m3 ha−1, an R2 of 0.94 and an AIC of 1399.54 in the calibration data. This model was also the best in the validation dataset, with an RMSE of12.44 m3 ha−1 and an R2 of 0.95. The mod_ls used three simple ratio variables and three NDVI integrals variables (Table 4). The variables INT_0_DM was found to be highly significant in the model and was also correlated with AGE (Fig. 4 in Appendix). These variables highlight the importance of spectral information, age, and NDVI fluctuations during the stand growth. The regression and the dispersion of the residuals of the validation data for this model are shown in Fig. 2.

Table 4 Coefficients and significance levels for variables of the linear model mod_ls and the GAM model 
Fig. 2
figure 2

Dispersion and residual graphs between the predicted and measured stem volume values (left) and Residual Plot (right), obtained on the validation dataset for the Linear, KED, and GAM models, after the mod_ls variable selection method (See Table 3). The regression line with a 95% confidence interval, the RMSE (m3.ha−1), and R-squared are displayed in the left graph

3.3 Analysis of stem volume prediction generated by KED and GAM

The variables selected with mod_ls were implemented as external drift in the KED algorithm. The spatial dependence structure was obtained from the residuals variogram. The best fit of the variogram was with Martén M. Stein’s model (Ste) after testing the RMSE of the variogram fit with different model types. The range value was 2750 m, the nugget was 92.5 m3.ha−1, and the sill was 179 m3.ha−1. After calibration, the KED was applied to the independent validation dataset, resulting in an R2 of 0.96 and an RMSE of 10.62 m3.ha−1. The auxiliary variables explained approximately 94% of the variance, while the spatial correlation term explained 5%.

The GAM model was also applied to the validation dataset and produced similar results to the KED, with an R2 of 0.96 and an RMSE of 10.56 m3.ha−1. In this case, the auxiliary variables accounted for approximately 98% of the variance, while the spatial correlation term accounted for only 1.2%. The coefficients were similar to those in the linear model mod_ls, with only a slight impact on the significance of some variables (Table 4).

3.4 Application of the linear, KED, and GAM models

The Linear, KED, and GAM models used the same set of six variables derived from SPOT-6 and Landsat 5, 7, and 8 images. The KED and GAM models, which incorporated the spatial structure of the data, provided the most accurate predictions in the independent dataset (Fig. 2). The stem volume map was produced by applying the models at a pixel-scale resolution of 7.5 m. This process is illustrated for GAM in a small area in Fig. 3, and a comparison with the linear model and KED is shown in Fig. 6 in Appendix. These comparisons reveal only minor visual differences between the models. Figure 3 shows significant changes in stem volume between stands, primarily due to differences in management (such as age), as well as within-stand variations.

Fig. 3
figure 3

Example of a map of distribution of the estimated volume (m3 ha−1) obtained from the GAM model, predicted for August 2018. No data values in white corresponds to other land use around the stand, or to gaps within the stand previously masked

4 Discussion

4.1 Optimal variable selection for predicting wood volume in linear models

The vegetation indices had a positive and moderate correlation with stem volume, which is consistent with Dos Reis et al. (2018). NDVI and SAVI, in particular, showed correlation coefficients of R = 0.49 and R = 0.47, respectively. It is important to note that these indices are primarily associated with the green leaf surface within a pixel. During the initial months of the plantation, when the canopy has not yet fully developed, the green leaf surface may be correlated with wood biomass or volume. However, the correlation between NDVI and canopy tends to decrease over time. This is due to the saturation of NDVI at high values as the canopy matures and closes. This phenomenon has been observed in various plantations after canopy closure (Le Maire et al. 2011a, b, c; Avitabile et al. 2012; Souza et al. 2019). After canopy closure, wood volume continues to increase while leaf area stabilizes or even decreases (le Maire et al. 2019). This is observed in the NDVI time series example shown in Fig. 1.

Texture measurements have been explored as an alternative to vegetation indices for improved discrimination in forest structure (Sarker and Nichol 2011; Dube and Mutanga 2015). However, in the present study, they only showed a moderate correlation with the stem volume. The final models included a combination of texture indices and reflectance bands (Table 3), but not in the best model, mod_ls. Castillo-Santiago et al. (2013) found moderate correlations between texture and biomass, which varied depending on the spectral region. However, Lu (2006) study showed that texture alone may not be sufficient to establish relationships with wood volume estimates. This is primarily because these relationships tend to be site-specific.

In the context of short-rotation planted forests, it is possible to analyze the stand’s history using time-series data that extends from the planting date. The variables derived from the Landsat time-series data included the minimum and maximum NDVI values within stand age intervals, or the integration of NDVI over these age intervals. Stand age was included as a distinct variable, obtained directly from the stand manager rather than from remote-sensing. Interestingly, the planting date itself can be obtained from the time-series data (Le Maire et al., 2014). Stand age had a high correlation with stem volume (R = 0.83), but was not included in the variable selection process due to its substantial correlation with the integral NDVI variables.

Marsden et al. (2010a, b) and Le Maire et al. (2011a, b, c) used NDVI time-series data derived from MODIS satellite imagery (at a 250-m pixel resolution) to calculate NDVI integrals at the stand level, spanning from planting date to inventory date. They found a strong correlation with stand-scale biomass. This study confirmed the direct correlation, with a R value of 0.89 between volume and INT_O_DM. The correlation between MODIS data integrals and volume estimates was initially established at the stand scale. However, it remained high at the local scale, even with the spatial resolution of Landsat reduced to 30 m and volume estimates based on local inventory plots of 500 m2. Despite the less frequent data acquisition frequency of Landsat in comparison to MODIS, the strength of this relationship was not diminished.

4.2 KED and GAM geostatistical models for stem volume mapping

The spatial prediction of the stem volume was slightly improved with KED or GAM, compared to the linear model alone. This suggested the presence of spatial patterns within the data that were not adequately accounted for by the remotely sensed data alone. The proportion of variance explained by auxiliary variables was high (94%) compared to that explained by spatial correlation for both KED and GAM. This is consistent with the relatively minor improvement compared to the linear model. Although KED slightly outperformed GAM in capturing spatial variations, it is worth noting that the KED method required more computational resources and the selection of a variogram model. Therefore, we chose to use GAM to produce the final stem volume map, as shown in Fig. 3.

Previous studies have shown that KED is effective in improving volume predictions and spatial representations. For example, Lochhead et al. (2018) found that KED was the most accurate method for predicting various forest attributes in Canada’s boreal forest among the tested nonlinear models. Viana et al. (2012) found limited utility of kriging in Pinus stands due to low data spatial correlation. However, they emphasized its potential in forests with more sampling units, higher spatial continuity, and by testing diverse auxiliary variables. Other studies have highlighted the advantages of the GAM spatial model in volume or biomass predictions. Alberdi et al. (2020) found that GAM could explain a greater variability in the data and outperformed other methods in estimating wood stock, especially in areas lacking wood stock information. Maack et al. (2016) showed that GAM has potential for large-scale wood volume interpolations by combining inventory data, outperforming models like Random Forest in this context.

The study assessed model uncertainties by testing on an independent subset of data. However, further work is required to improve the analysis of the uncertainty associated with model predictions, and its propagation across scales. Estimating uncertainty is complex, particularly when several models are combined in a hierarchical manner (Saarela et al. 2016), such as a model to predict individual tree volume (including uncertainty on the tree measurements), a model to predict plot scale volume (including uncertainty from the inventory measurements), and a model to interpolate spatially the plot-scale volume (including uncertainty from the auxiliary data). Saarela et al. (2020) proposed a hierarchical model-based approach that considers multiple sources of uncertainty at two different scales: the tree-level biomass model and models linking the plot-level biomass to auxiliary variables (in their case coming from LiDAR). This approach allowed for the attribution of the uncertainty to the different models, and the generation of uncertainty maps and uncertainty estimates for regional averages. A similar approach could be used to test the hypothesis that model uncertainty in tree-level and plot-level volume predictions only marginally contributes to the final uncertainty on these clonal plantations.

4.3 Model genericity and future directions for improving high-resolution mapping of forest volume

The models were created for eucalypt plantations in a specific pedoclimatic context. The spatial models developed inherently rely on local calibration with inventory data, but the method can be applied to other regions and other plantations. The linear model used to link stem volume to satellite-derived features in the spatial models is specifically designed for short-rotation plantations with dynamic growth. The most important variables are linked to NDVI integrals since planting date. Our study found a robust correlation, resulting in a low RMSE of 12.44 m3/ha. Marsden et al. (2010a, b) obtained an RMSE of 24 m3/ha for Eucalyptus plantations, which was almost double of our result. This difference may be due to greater stand variability across a larger area, encompassing diverse pedoclimatic conditions and growth histories. It is worth noting that our linear model’s additional features may have improved the predictions compared to their model. Stand age alone could replace NDVI integrals, as discussed before (Dube et al. 2015; Dos Reis et al. 2019). However, this solution reduced the model accuracy, as shown in the present study, and the intra-stand spatial variability of volume relies therefore only on other spatial features of the models (spectral reflectance ratios, etc.). The locally developed approach in our study likely applies to various eucalypt growth conditions in Brazil, with precision subject to local variations. Further exploration is necessary to confirm its applicability in different locations, especially in stands that have experienced damages resulting in local volume heterogeneity.

To improve the precision of high-resolution forest volume mapping, several strategies, and further research may be considered: (i) Incorporating additional information on the stand, such as planting density, management, genotype composition, or environmental variables (e.g. Stape and Alcarde Alvares 2023) to improve volume prediction in the linear model. This can provide more accurate volume prediction combined with the variables coming from satellite images; (ii) Leveraging other types of remote sensing data, such as higher-resolution Sentinel 2 NDVI data (10 m), and radar and lidar, to enhance the volume prediction. These data sources provide different information on the forest stand structure and complement the optical satellite data; (iii) Adding more plots in diverse locations to improve and validate the volume maps.

The method developed in this study has various applications, primarily improving stand volume precision by addressing intra-stand variability that may not be captured by measurements of only a few inventory plots. This method is cost-efficient and scalable to cover thousands of hectares, in contrast to more precise but expensive and local methods based on Lidar acquisition, for instance. The prediction of stem volume, which can be obtained for instance at the first inventory around 2 years of age, can guide the selection of a few additional temporary plots. The method of complementing permanent plots with temporary plots based on satellite analysis was found to be efficient for the National Forest Inventory (Räty and Kangas 2019). In rapidly expanding plantations, this can ensure that inventory plots are more representative of the entire study area. This improves the volume prediction accuracy across different stand ages, particularly at the pre-harvest inventory date.

5 Conclusion

The study proposed a novel approach that combined spectral, textural, and temporal features extracted from both SPOT-6 and Landsat 5, 7, and 8 satellite images, along with spatial models and forest inventory data, to map the stem volume of fast-growing plantations. Out of the 86 remotely sensed variables, only six were selected in a linear model calibrated with the Lasso algorithm. These key variables include integrals of NDVI time-series from planting date to measurement date or to 2 years of age, integral between 2 years of age and the date of maximum NDVI, and spectral reflectance ratios between bands 1 and 2, 2 and 4, and 3 and 2. The linear model alone yielded precise results, with R2 of ~ 0.95 and RMSE of ~ 12.44. The same variables were used as auxiliary variables in KED and GAM models. They provided slight improvements in prediction accuracy, with R2 of ~ 0.96 and RMSE of ~ 10.6 m3.ha−1, and captured finer spatial details, enhancing the robustness and precision of the model’s results. These spatial models were applied in the forest area, distinguishing regions with different stand ages and estimating the within-stand variability of wood volume.

Availability of data and materials

The data cannot be made available in a public repository due to proprietary issues. Data sharing agreement can be set up upon request.

References

Download references

Acknowledgements

We thank the Kotab Group and in particular Carlos Juliano Santos for collecting forest inventory data and collaborators Emily Tsiemi Shinzato and Esthevan Augusto Goes Gasparoto for processing the images. We thank the two anonymous reviewers for their thorough evaluation of the paper and relevant remarks that helped greatly in improving the article.

Code availability

The custom code and/or software application generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001, through a scholarship granted to Lívia Lanzi Aló.

French “Programme National de Télédétection Spatiale” (PNTS) project EucInv.

Author information

Authors and Affiliations

Authors

Contributions

LLA: Conceptualization, Methodology, Formal analysis, Investigation, Data Curation, Writing—Original Draft, Visualization.  GM: Conceptualization, Methodology, Formal analysis, Resources, Writing—Original Draft, Visualization, Supervision, Project administration, Funding acquisition.  CRT: Formal analysis, Writing—Review & Editing.  TSM: Resources, Writing—Review & Editing.  RCP: Writing—Review & Editing.  JRSP: Conceptualization, Methodology, Formal analysis, Resources, Writing—Review & Editing, Supervision, Project administration, Funding acquisition.  The authors read and approved the final manuscript.

Corresponding author

Correspondence to Guerric le Maire.

Ethics declarations

Ethics approval consent to participate

Not applicable.

Consent for publication

All authors gave their informed consent to this publication and its content.

Competing interests

The authors declare that they have no competing interests.

Additional information

Handling editor: Matteo Vizzari

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

figure 4

Fig. 4 Spearman correlation matrix for 62 of the 86 variables listed in Table 2, derived from 211 forest inventories of Eucalyptus stands across various age classes. To enhance interpretability, the matrix displays the absolute values of the correlations. Texture variables from B1 and B2, which showed strong correlations with those from B3, were excluded. The variables are arranged in order to place those with high correlations adjacent to each other.

figure 5

Fig. 5 Spearman correlation between each of the 86 variables listed in Table 2 and the stem volume. Significant correlation are in red, with plus symbols for positive correlation (p<0.05) and dots for negative correlations. Variables are ordered by type of variables, as in Table 2.

figure 6

Fig. 6 Example of a map of distribution of the estimated volume (m³ ha-1) obtained from the linear and KED models (see Fig. 3 for GAM), predicted for August 2018. No data values in white corresponds to other land use around the stand, or to gaps within the stand previously masked.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Aló, L.L., le Maire, G., Thiersch, C.R. et al. Mapping stem volume in fast-growing eucalypt plantations: integrating spectral, textural, and temporal remote sensing information with forest inventories and spatial models. Annals of Forest Science 81, 43 (2024). https://doi.org/10.1186/s13595-024-01255-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s13595-024-01255-6

Keywords