Research Paper

Combining low-density LiDAR and satellite images to discriminate species in mixed Mediterranean forest

Abstract

Key message

Using a combination of remote sensing data, Pinus pinaster Ait. and Pinus pinea L. were distinguished at individual tree level in mixed Mediterranean stands with over 95% accuracy. This approach can easily be applied over large areas, enhancing the economic value of non-wood forest products (stone pine nuts and resin) and helping forest managers to predict their production accurately.

Context

The discrimination of tree species at individual tree level in mixed Mediterranean forests using remote sensing is a field of growing importance. In these stands, the ability to predict the quality and quantity of non-wood forest products is particularly important because the two species produce very different goods.

Aims

To assess the potential of using low-density airborne LiDAR data combined with high-resolution Pleiades images to discriminate two different pine species in mixed Mediterranean forest (Pinus pinea L. and Pinus pinaster Ait.) at individual tree level.

Methods

A Random Forest model was trained using plots from the pure stand dataset to determine which LiDAR and satellite variables best discriminate between the two species. The model was then validated by classifying individual trees in an independent set of pure and mixed stands.

Results

The model combining LiDAR and Pleiades data provided greater accuracy (83.3% and 63% in pure and mixed validation stands, respectively) than models using only one type of covariate.

Conclusion

The tool developed, based on automatic crown delineation, allows two very similar species in mixed Mediterranean conifer forest to be discriminated using spatially continuous surface information: Pleiades images and open-source LiDAR data. This approach can easily be applied over large areas, enhancing the economic value of non-wood forest products and helping forest managers to predict production accurately.

1 Introduction

Forest management is currently undergoing important changes and facing new challenges with the development and use of new tools based on remote sensing technology. Field data have traditionally been gathered manually, but the application of new technology in forest inventories has revolutionized this field and reduced the associated costs (Barrett et al. 2016). These new technologies have great potential for evaluating forest characteristics, given that canopy height, one of the key variables in forest resource evaluation, is a function of species composition, site quality, and age; hence, these data can be used for land cover classification, habitat mapping, and forest management (Dubayah and Drake 2000). Additionally, forest parameters related to tree height and stem volume can be estimated at both plot and individual tree level using information from remote sensing systems (Popescu et al. 2003; Hollaus et al. 2009; Tonolli et al. 2011; Ruiz et al. 2016; Castaño-Díaz et al. 2017; Arias-Rodil et al. 2018).

A growing body of evidence suggests that mixed-species forests provide a greater number of ecological and socio-economic goods and services than monospecific ones, even in Mediterranean ecosystems (Barba et al. 2013; Riofrío et al. 2017). Consequently, the management of these forests should be based on a comprehensive assessment of their wide diversity and multifunctionality, as well as on accurate species classification. Furthermore, when these stands are managed at individual tree level rather than at stand level, species discrimination is essential to inventory the different resources. Remote sensing techniques offer the advantage that such questions can be addressed at large scales in mixed forests, including at landscape level.

The high economic and social importance of non-wood forest products (NWFP) is one of the main factors that distinguish Mediterranean forests from other temperate forests (Calama et al. 2008). The production of stone pine nuts from Pinus pinea L. and resin from Pinus pinaster Ait. in the Mediterranean forests of the Northern Plateau of the Iberian Peninsula has made an important contribution to rural activity since ancient times (Lopez-García 1980; Nanos et al. 2000). In many stone pine forests, pine nuts are more profitable than timber, with prices exceeding 100 €/kg, which in turn favors the NWFP processing industry. Resin tapping, for its part, is currently undergoing a resurgence. It was very important until the end of the 1970s, when rising labor costs and international competition from Chinese natural resins tapped from Pinus massoniana Lamb. made this traditional activity unprofitable (Soliño et al. 2018). In recent years, the global financial crisis, which raised rural unemployment in Spain, together with the drop in Chinese resin exports and an increasing demand for high-quality secondary products for cosmetics, pharmacy, and healthcare, has made resin tapping profitable again, and many abandoned stands are being tapped once more (Rodríguez-García et al. 2015; Soliño et al. 2018). Pinus pinea L. and Pinus pinaster Ait. share autoecological conditions (Alonso et al. 2012), so pine nuts and resin are both harvested in Mediterranean mixed pine forests. Hence, accurate assessment of nut and resin yields is particularly important, not only for forest management decision-making but also for the associated industrial processes and the evaluation of market prices.

Classification of tree species assisted by remote sensing data is motivated by a wide variety of applications for the sustainable management of wood resources and ecosystem services (European Environmental Agency 2006; Fassnacht et al. 2016). Knowing the distribution of tree species in mixed stands is therefore important, because different species produce very different NWFP and thus determine forest management. Several studies have used remote sensing data, namely Light Detection and Ranging (LiDAR) and multispectral images, for stand classification (Korpela et al. 2010; Ørka et al. 2012; Ballanti et al. 2016), for both species groups and individual tree species (Suratno et al. 2009a; Deng et al. 2016). Distinguishing between two species of the same genus is especially difficult, since the differences between them are often subtle. Species that differ in physical and/or spectral properties should be clearly recorded by the sensors (Vauhkonen et al. 2014; Fassnacht et al. 2016). However, when the species studied are similar, slight differences in crown structure (e.g., the umbrella-shaped crown of P. pinea versus the apical crown of P. pinaster) and stem slenderness could be key to discriminating between them, while spectral information related to chlorophyll content may also help.

The combined use of LiDAR and satellite images entails a considerably more complex analysis than using either source separately. Furthermore, when the forest studied contains several species, each with its own structural characteristics, the challenge is even greater (Vauhkonen et al. 2014). In such stands, where species must be correctly classified, the modeling process (crown individualization) becomes more complex if low point density (< 1 point per m2) LiDAR data are used (Suratno et al. 2009b; Dalponte et al. 2012). Remote sensing images have been widely used for tree species classification to obtain spatially detailed species information (Xie et al. 2008; Dalponte et al. 2012). New satellite images with high spatial resolution, such as the 50-cm images provided by the Pleiades constellation, are now available and have enabled improved analyses. In addition, Pleiades images have recently been used in studies of forest biomass modeling and forest structure mapping through image texture analysis (Beguet et al. 2014; Rougier and Anne 2014; Maack et al. 2015).

Current estimation and forecasting of cone and resin production are based on individual tree-level models (Nanos et al. 2001; Moreno-Fernández et al. 2013; Calama et al. 2016), so sound estimates require a prior correct assessment of tree-level features such as species and size. Individual tree delineation from LiDAR data can be used to characterize the individual structure of many tree species (Popescu et al. 2003; Valbuena et al. 2016a). Individual crown delineation allows tree-level variables to be obtained from LiDAR and included in species classification algorithms. These tree-level LiDAR variables might therefore also be used to predict individual tree NWFP production with currently existing models, which treat the tree as an individual entity (Popescu 2007). Furthermore, species discrimination can often be achieved satisfactorily using satellite data. Hence, the operational process in pine stands should consist of first delineating crowns using LiDAR data, then applying classification models to identify species and extract tree-level attributes, and finally applying pre-existing production models for each NWFP. The integration of these methods should thus make it possible to assess NWFP over large areas (Lee et al. 2016).

The aim of our study is to analyze the potential of using low-density LiDAR data combined with high-resolution Pleiades images to discriminate Mediterranean pine species in pure and mixed Pinus pinaster Ait. and Pinus pinea L. forests, based on an automatic crown delineation algorithm. The practical motivation is that distinguishing these two species at tree scale will allow us to apply pre-existing tree-level models for the different non-wood forest products (resin or pine nuts) (e.g., Calama et al. 2016; Nanos et al. 2004). For this purpose, (i) in a first phase, a Random Forest model was trained with a dataset collected in pure stands of both species to determine which LiDAR and satellite variables provide the best discrimination between groups, and (ii) the Random Forest model was then validated by classifying individual trees in an independent set of mixed and pure stands. Our hypotheses are that (i) combining LiDAR and Pleiades data to discriminate species in mixed Mediterranean conifer forests should be more efficient than using each technique separately, (ii) this combination should provide an efficient tool to discriminate pine species in mixed Mediterranean forests at individual tree level, and (iii) the results for Pinus pinea L. should be more accurate than those for Pinus pinaster Ait., because the former usually grows in open stands.

2 Material and methods

2.1 Study area

Forested land accounts for almost a third of the total land area of Spain (18.27 million ha). Pure forests of Pinus pinea L. cover 401,701 ha, while Pinus pinaster Ait. forests make up 1,131,901 ha (MAGRAMA 2012). The study area is situated in the Northern Plateau of the Iberian Peninsula (Fig. 1), a region with 65,275 ha of pure Pinus pinea L. forests, 271,000 ha of pure Pinus pinaster Ait. forests, and 53,000 ha of mixed Pinus pinea L. and Pinus pinaster Ait. forests (López et al. 2009). The Duero river basin lies on the Northern Plateau, at 700–800 m a.s.l., and has a homogeneous Mediterranean continental climate characterized by low average annual rainfall (450 mm), strong summer drought (summer rainfall < 50 mm), and low average annual temperature (12 °C). Soils have a very high sand content (> 90%) and very low water holding capacity (WHC < 100 mm), except in the upper areas, where limestone soils with a high proportion of clay and silt (> 40–50%) reach WHC values > 250 mm.

Fig. 1

Study area. Locations of training and validation sets

The plots within the study area were grouped into three forest types: pure stands of Pinus pinea L., pure stands of Pinus pinaster Ait., and mixed stands, in which the less represented species accounted for at least 35% of the total number of trees. All stands considered were at the mature development stage, when regeneration is quite poor.

2.2 Field data

In this study, we used a subset of the network of permanent plots installed by the Forest Research Centre of the Spanish National Institute for Agricultural Research (INIA-CIFOR) in the province of Valladolid. Mixed and pure Pinus pinaster Ait. plots were installed between summer 2015 and spring 2016. Pure even-aged Pinus pinea L. plots were installed in 1996 in cooperation with the regional forest service and have been regularly monitored since then (the most recent inventory was carried out in 2016). Plots were selected to cover the whole range of site conditions, stand stocking, and age identified within the region. All plots were located in public mature forests, and at the time of installation the stand must not have been altered (by thinning, final cutting, or pruning) for at least the previous 5 years.

The plots were circular, with a variable radius. Pure plots included at least 20 trees, and mixed plots included at least 10 trees of the less represented species. Plot center coordinates were recorded using a high-precision submeter Garmin GPS receiver (Garmin International Inc., Missouri, USA), with an accuracy better than 1 m. The distance and heading of each tree from the plot center were measured using a VERTEX IV distance meter (Haglöf, Sweden) and a compass, and the (x, y) coordinates of each tree were computed from these polar coordinates. Trees were then numbered and measured. Tree measurements included species, diameter at breast height (1.3 m) (DBH, cm), total height (Height, m), crown base height (Live_Branch, m), and crown radius (Crown_Radio, m). Each tree measured during the fieldwork was then plotted on a map. Within the whole network, pure Pinus pinaster Ait. plots and mixed P. pinaster–P. pinea plots were the least represented species compositions. Thus, all available plots from pure Pinus pinaster Ait. stands and mixed stands were included in the sample. In a second phase, we selected all available pure Pinus pinea L. plots located in the vicinity of the pure P. pinaster and mixed plots. In total, 27 plots were selected for the present study (Fig. 1): nine pure Pinus pinaster Ait. plots, six mixed plots, and 12 pure Pinus pinea L. plots. Of these, seven plots in pure Pinus pinea L. stands and five plots in pure Pinus pinaster Ait. stands were randomly selected and used for training the model. The remaining 15 plots (five in pure Pinus pinea L. stands, four in pure Pinus pinaster Ait. stands, and six in mixed stands) were used to validate the model. All individual trees measured inside each selected plot were included in the sample (Blázquez-Casado et al. 2019).
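
The conversion from field polar measurements to planar tree coordinates can be sketched as follows. This is a minimal illustration, assuming that headings are compass azimuths measured clockwise from north, that any magnetic declination correction has already been applied, and that the plot center is expressed in a projected coordinate system in metres; the function name `polar_to_cartesian` is hypothetical.

```python
import math


def polar_to_cartesian(distance_m, azimuth_deg, center_x=0.0, center_y=0.0):
    """Convert a distance and compass azimuth measured from the plot center
    into planar (x, y) coordinates in the same units as the center.

    Assumption: azimuth is measured clockwise from north, so the east (x)
    offset uses sin and the north (y) offset uses cos.
    """
    theta = math.radians(azimuth_deg)
    x = center_x + distance_m * math.sin(theta)
    y = center_y + distance_m * math.cos(theta)
    return x, y


# Hypothetical trees: one 10 m due east of a local origin, one 5 m due north
# of a plot center at (100, 200).
east_tree = polar_to_cartesian(10.0, 90.0)
north_tree = polar_to_cartesian(5.0, 0.0, center_x=100.0, center_y=200.0)
```

Using sine for the east offset and cosine for the north offset is what distinguishes the compass convention from the mathematical one (counter-clockwise from the x axis).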

2.3 Remote sensing data

Low-density LiDAR data for the study area were provided by the National Aerial Orthophoto Program (PNOA). Point clouds were captured in 2010 with a Leica ALS60 airborne laser scanner (Heerbrugg, Switzerland), with a mean density of 0.5 points per m2 and a vertical root mean square error of 20 cm. The classified and colored point clouds were downloaded as .laz files covering 2 × 2 km tiles.

Pleiades is a constellation of very-high-resolution imaging satellites of the National Centre for Space Studies (CNES, French Government). It was developed within the Optical and Radar Federated Earth Observation (ORFEO) program and initiated in 2004, and the first satellite was launched in 2011 (Tinel et al. 2012). Pleiades 1A and Pleiades 1B operate as a constellation in the same orbit, phased 180° apart. The images captured by these identical twin satellites are transferred, orthorectified, and geometrically corrected by the Spanish National Remote Sensing Plan (PNT). Images were captured in June–July 2014, and each covers an area of 400 km2. Pleiades images comprise four multispectral bands with a spatial resolution of 2 m: blue (430–550 nm), green (490–610 nm), red (600–720 nm), and infrared (750–950 nm). There is also a panchromatic band (480–830 nm) with a spatial resolution of 0.5 m.

The time lag between remote sensing data acquisition (LiDAR in 2010, Pleiades in 2014) and fieldwork is not expected to increase uncertainty substantially, since large changes over such short periods are unlikely in these mature stands, particularly in the absence of disturbance. The low productivity of these inland Mediterranean stands (mean annual volume increment of about 1 m3 ha−1 year−1) supports this assumption. In addition, because the field data came from a network of permanent plots, we can confirm that stand structure (number of stems per ha, stand composition) changed little over the last few years.

2.4 Remotely sensed data processing

2.4.1 LiDAR data processing

LiDAR point files (.laz format) covering the field plots were compiled and height-normalized by subtracting the elevation of the corresponding ground-class points. Points below 3 m were removed, as this was considered the lowest height at which bushes and trees could be distinguished. Points above 50 m were considered errors, as there are no trees taller than this in the study area, and were also removed. The tree delineation algorithm is that of Valbuena et al. (2016a), written as an SQL function for PostgreSQL 9.2 with the PostGIS 2.0 extension, which provides the database manager with geometric and other GIS functions. It was applied to each normalized point cloud in .shp format. The algorithm works directly on the LiDAR point cloud, searching for relative maxima. Once a relative maximum has been located, the next lower LiDAR return is found and the distance between the two is calculated. Based on this distance, the algorithm determines whether the point is a new apex or belongs to the first crown, thus allowing individual tree crowns to be delineated. To delineate each crown, the algorithm divides the point cloud into layers 0.5 m deep and, moving downward, analyzes the distance from each point to the crown polygons already delineated in each layer (Valbuena et al. 2016a). This process delineates the horizontal projection of each crown as a polygon enclosing all the points that fall on each tree, identifying individual tree crowns and drawing them in vector format (Fig. 2).
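
The height filters and the general idea of top-down crown assignment can be illustrated with a deliberately simplified sketch. It is not the Valbuena et al. (2016a) SQL/PostGIS algorithm: it assumes that each return already carries the interpolated ground elevation beneath it, replaces the layer-by-layer polygon analysis with a single horizontal-distance rule, and uses a hypothetical 2.5 m threshold (`max_dist`). All function names are hypothetical.

```python
import math


def normalize_and_filter(points, min_h=3.0, max_h=50.0):
    """Height-normalize returns and keep those between min_h and max_h.

    points: iterable of (x, y, z, ground_z), where ground_z is the terrain
    elevation under the return (assumed interpolated from ground-class points).
    """
    kept = []
    for x, y, z, ground_z in points:
        h = z - ground_z  # height above ground
        if min_h <= h <= max_h:
            kept.append((x, y, h))
    return kept


def assign_to_apexes(points, max_dist=2.5):
    """Toy top-down crown assignment (NOT the Valbuena et al. 2016a algorithm).

    Returns are visited from highest to lowest. Each return joins the tree whose
    apex is horizontally closest, if within max_dist (hypothetical, in m);
    otherwise it becomes the apex of a new tree.
    """
    apexes = []  # (x, y) of each tree apex, in order of discovery
    labels = []  # tree index for each return, in visiting order
    ordered = sorted(points, key=lambda p: p[2], reverse=True)
    for x, y, _h in ordered:
        best, best_d = None, max_dist
        for i, (ax, ay) in enumerate(apexes):
            d = math.hypot(x - ax, y - ay)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            apexes.append((x, y))
            best = len(apexes) - 1
        labels.append(best)
    return ordered, labels


# Hypothetical returns: (x, y, z, ground_z), ground at 100 m a.s.l.
raw = [(0, 0, 110, 100), (0.5, 0, 108, 100), (10, 0, 112, 100),
       (10.4, 0.3, 109, 100), (5, 5, 101, 100), (20, 20, 170, 100)]
filtered = normalize_and_filter(raw)
ordered, labels = assign_to_apexes(filtered)
```

Here the 1 m return (shrub level) and the 70 m return (above the 50 m ceiling) are dropped, and the four remaining returns are split into two trees.
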
The result is a .shp layer containing a polygon for each tree crown and an associated attribute table with the following data for each crown: maximum crown height (Max_Height, m), minimum crown height (Min_Height, m), crown area (Crown_Area, m2), and crown height (Crown_Height, m), equal to the difference between Max_Height and Min_Height. Density (Density, trees per ha) was calculated from the polygon centroids using the six-tree sampling method (Prodan 1968).
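
The per-crown attributes and the six-tree density can be sketched as below. Attribute names follow the text. The way Density is derived from centroids is an assumption, since only the method (Prodan 1968) is named. Here, r6 is taken as the distance from a crown centroid to the sixth-nearest other centroid, and the commonly cited form of the estimator is used, in which the sixth tree counts as half (N = 10 000 × 5.5/(π r6²) trees per ha).

```python
import math


def shoelace_area(polygon):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    s = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def crown_attributes(polygon, heights):
    """Per-crown attributes named as in the text, from the crown polygon and
    the normalized heights of the returns assigned to it."""
    max_h, min_h = max(heights), min(heights)
    return {
        "Max_Height": max_h,
        "Min_Height": min_h,
        "Crown_Height": max_h - min_h,
        "Crown_Area": shoelace_area(polygon),
    }


def six_tree_density(centroid, other_centroids):
    """Six-tree density estimate (trees per ha) around one crown centroid.

    Assumptions: r6 is the distance to the sixth-nearest *other* centroid and
    the sixth tree counts as half, i.e. N = 10000 * 5.5 / (pi * r6**2).
    """
    dists = sorted(math.dist(centroid, c) for c in other_centroids)
    if len(dists) < 6:
        raise ValueError("need at least six neighbouring crowns")
    r6 = dists[5]
    return 10000.0 * 5.5 / (math.pi * r6 ** 2)


# Hypothetical 4 m x 4 m crown with returns between 8 and 12 m.
attrs = crown_attributes([(0, 0), (4, 0), (4, 4), (0, 4)], [12.0, 9.5, 8.0])
# Hypothetical neighbours at 1, 2, ..., 6 m along the x axis, so r6 = 6 m.
dens = six_tree_density((0.0, 0.0), [(float(d), 0.0) for d in range(1, 7)])
```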

Fig. 2

Crown delineation in a mixed stand using the Valbuena et al. (2016a) algorithm

To characterize the structural differences between Pinus pinea L. and Pinus pinaster Ait., three ratios were derived from these variables. The first is the heights relation (HR), calculated from maximum and minimum height as (Max_Height − Min_Height)/Max_Height. The second is the area and height relation (AHR), calculated as the quotient between crown area and maximum tree height (Crown_Area/Max_Height), which reflects the characteristic crown outline of each species. The third is the crown and height relation (CHR), calculated from maximum height and crown base height as (Max_Height − Crown_Height)/Max_Height. It expresses the relationship between tree height and natural pruning height, the latter generally being greater in Pinus pinea L. than in Pinus pinaster Ait. Thus, six LiDAR variables were used in modeling: Crown_Area, Crown_Height, Density, HR, AHR, and CHR (Tables 1 and 2).
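
The three ratios can be written directly from their definitions. One consequence is worth keeping in mind when reading the variable importance results. Because Crown_Height is defined as Max_Height − Min_Height, CHR as written reduces to Min_Height/Max_Height, so HR + CHR = 1 for every crown and the two ratios carry the same information. The sketch below makes this explicit using hypothetical input values. The function name is hypothetical.

```python
def structural_ratios(max_h, min_h, crown_area):
    """HR, AHR and CHR exactly as defined in the text.

    Crown_Height is Max_Height - Min_Height, so CHR reduces to
    Min_Height / Max_Height and HR + CHR == 1 for every crown.
    """
    crown_h = max_h - min_h              # Crown_Height (m)
    hr = (max_h - min_h) / max_h         # heights relation
    ahr = crown_area / max_h             # area and height relation (m2 per m)
    chr_ = (max_h - crown_h) / max_h     # crown and height relation
    return {"HR": hr, "AHR": ahr, "CHR": chr_}


# Hypothetical crown: 12 m tall, lowest return at 3 m, 24 m2 crown area.
ratios = structural_ratios(max_h=12.0, min_h=3.0, crown_area=24.0)
```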

Table 1 Descriptive statistics of the training dataset. Pp_p: Pinus pinea L. individual trees in pure stand plots, Pt_p: Pinus pinaster Ait. individual trees in pure stand plots
Table 2 Descriptive statistics of validation dataset. Pp_p: Pinus pinea L. individual trees in pure stand plots, Pt_p: Pinus pinaster Ait. individual trees in pure stand plots, Pp_m: Pinus pinea L. individual trees in mixed stand plots, Pt_m: Pinus pinaster Ait. individual trees in mixed stand plots

2.4.2 Tree crown selection

Following individual tree crown delineation, a subset of trees with clearly identified crowns that could be matched with trees in the plot field inventory was selected for the analysis. Training data selected in pure stands are more reliable than data from mixed stands, because high-precision tree location is critical at individual tree level. Once the delineation algorithm had generated the crowns, the crown polygons and the field-measured tree positions were matched one-to-one automatically using the QGIS intersect tool (Quantum GIS Development Team 2017). Crowns containing more than one measured tree were discarded to ensure that each crown corresponded to a single measured tree. As a result, 95.5% and 81.0% of the measured trees were matched with delineated crowns in pure P. pinea and P. pinaster stands, respectively. In mixed stands, by contrast, the number of delineated crowns was 126.0% of the number of field-measured trees, owing to over-delineation of crowns. This resulted in 42 Pinus pinea L. trees and 76 Pinus pinaster Ait. trees from the pure forest plots being used to train the model (Table 1). For the validation process, 82 Pinus pinea L. trees (36 from pure plots and 46 from mixed plots) and 71 Pinus pinaster Ait. trees (36 from pure plots and 35 from mixed plots) were selected (Table 2).
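
The one-to-one matching rule can be sketched in plain Python as a stand-in for the QGIS intersect step. This is an illustration under stated assumptions: field trees are represented as points, crowns as simple polygons, and a standard ray-casting point-in-polygon test is used. The identifiers are hypothetical. As in the text, crowns containing more than one tree are discarded. Crowns containing no tree are simply left unmatched.

```python
def point_in_polygon(pt, polygon):
    """Ray-casting test: is point (x, y) inside the polygon [(x, y), ...]?"""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def match_one_to_one(crowns, trees):
    """crowns: {crown_id: polygon}; trees: {tree_id: (x, y)}.

    Returns {crown_id: tree_id}, keeping only crowns that contain exactly one
    field tree.
    """
    matches = {}
    for cid, poly in crowns.items():
        inside = [tid for tid, xy in trees.items() if point_in_polygon(xy, poly)]
        if len(inside) == 1:
            matches[cid] = inside[0]
    return matches


# Hypothetical crowns and field trees (plot coordinates in m).
crowns = {"c1": [(0, 0), (4, 0), (4, 4), (0, 4)],
          "c2": [(10, 0), (14, 0), (14, 4), (10, 4)],
          "c3": [(20, 0), (24, 0), (24, 4), (20, 4)]}
trees = {"t1": (2, 2), "t2": (11, 1), "t3": (13, 3), "t4": (30, 30)}
matches = match_one_to_one(crowns, trees)
match_rate = len(matches) / len(trees)  # share of field trees matched
```

Crown c2 holds two trees and is dropped, c3 holds none, and only c1 is kept, giving a match rate of 25% for this toy case.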

2.4.3 Pleiades data processing

First, 0.5-m multispectral images were created by pansharpening the lower-resolution (2 m) multispectral imagery with the high-resolution (0.5 m) panchromatic band. Variables derived from the Pleiades images were then estimated for each object obtained from the individual crown delineation. Each crown was assigned the mean value of all pixels within it for the blue, green, red, and infrared bands, using the QGIS Zonal Statistics Plugin.
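
The two steps can be sketched with NumPy. The text does not state which pansharpening algorithm was used, so a Brovey-type ratio transform with nearest-neighbour upsampling (factor 4, from 2 m to 0.5 m) is shown purely as one common example. Crown masks would in practice come from rasterizing the delineated polygons, which is assumed here to have been done already. All names and values are hypothetical.

```python
import numpy as np


def brovey_pansharpen(ms, pan, factor=4):
    """Brovey-type pansharpening (one possible method; an assumption here).

    ms:  (bands, rows, cols) multispectral array at coarse resolution.
    pan: (rows * factor, cols * factor) panchromatic array.
    Each upsampled band is multiplied by pan / mean(upsampled bands).
    """
    up = np.repeat(np.repeat(ms, factor, axis=1), factor, axis=2).astype(float)
    intensity = up.mean(axis=0)
    ratio = np.divide(pan, intensity, out=np.zeros_like(intensity),
                      where=intensity != 0)
    return up * ratio


def crown_band_means(image, crown_mask):
    """Mean of each band over the pixels of one crown (boolean mask)."""
    return [float(band[crown_mask].mean()) for band in image]


# Hypothetical 2 x 2 multispectral tile (blue, green, red, infrared) and a
# uniform 8 x 8 panchromatic tile.
ms = np.stack([np.full((2, 2), v) for v in (100, 200, 300, 400)])
pan = np.full((8, 8), 500.0)
sharp = brovey_pansharpen(ms, pan)
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True  # hypothetical rasterized crown
means = crown_band_means(sharp, mask)
```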

In addition, the image bands were used to calculate two spectral indices: the Normalized Difference Vegetation Index (NDVI) (Rouse et al. 1973) and the Enhanced Vegetation Index (EVI) (Liu and Huete 1995). NDVI was selected because it is the most commonly used vegetation indicator and is widely applied to assess the quantity, quality, and developmental stage of vegetation. It was calculated from the Pleiades bands as NDVI = (Infrared − Red)/(Infrared + Red), where Infrared and Red are the recorded Pleiades values in the infrared and red bands, respectively. EVI is an optimized vegetation index designed to enhance the vegetation signal, with improved sensitivity and reduced influence of environmental factors such as atmospheric conditions and soil background. It was calculated as EVI = 2.5 × (Infrared − Red)/(Infrared + 2.4 × Red + 1). Statistics for the training and validation datasets are also shown in Tables 1 and 2. The six satellite variables finally available for modeling were the blue, green, red, and infrared bands and the NDVI and EVI indices.
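
Both indices are simple per-crown functions of the band means. As written, the EVI formula is the two-band form (often referred to as EVI2), which omits the blue band used by the standard three-band EVI. Because of the additive constant 1, it also presupposes reflectance values on a 0–1 scale. The sketch below applies the formulas exactly as stated, to hypothetical reflectance values. The function names are hypothetical.

```python
def ndvi(infrared, red):
    """NDVI = (Infrared - Red) / (Infrared + Red)."""
    return (infrared - red) / (infrared + red)


def evi_as_written(infrared, red):
    """EVI as written in the text: 2.5 (Infrared - Red) / (Infrared + 2.4 Red + 1).

    This two-band form assumes reflectances scaled to 0-1; applied to raw
    digital numbers, the constant 1 becomes negligible and values differ.
    """
    return 2.5 * (infrared - red) / (infrared + 2.4 * red + 1.0)


# Hypothetical crown-mean reflectances for a vigorous pine crown.
v_ndvi = ndvi(0.40, 0.05)
v_evi = evi_as_written(0.40, 0.05)
```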

2.5 Classification technique

The Random Forest classifier (Breiman et al. 2001) is a machine learning method based on an ensemble of decision trees. It can be used to explore complex relationships numerically and graphically and tends to improve predictive accuracy (Valbuena et al. 2016b; Vega Isuhuaylas et al. 2018). Because it aggregates many decision trees, it has a lower risk of overfitting and reduces the chance of relying on a single poorly performing classifier. A large number of classification trees are grown, each on a bootstrap sample of the training data and with a random subset of predictors considered at each node. The final class is the most common classification across trees. In this study, Random Forest was used to classify species from LiDAR and satellite data at individual tree level.

R software (R Core Team 2016) was used to fit the model with the randomForest package, as is common practice in forest science (Maschler et al. 2018; Shi et al. 2018). Three parameters need to be tuned: (i) mtry, the number of variables randomly sampled as candidates at each split; (ii) ntree, the number of trees grown, each on a bootstrap replicate of the training data; and (iii) nodesize, the minimum size of terminal nodes. To identify the parameter values leading to the best species discrimination, the model was optimized on the out-of-bag (OOB) error estimate. This is the classification error rate in which each training observation is predicted only by the trees whose bootstrap samples did not include it. In addition, Random Forest can estimate the importance of each variable as the mean decrease in accuracy, defined as the increase in OOB error when the values of that variable are randomly permuted (Breiman et al. 2001). Higher values indicate greater importance of the variable for the classification.
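
A toy forest written from scratch illustrates how bootstrap sampling, random predictor subsets (mtry), the OOB error, and permutation-based importance fit together. This is not the R randomForest package used in the study. Several simplifications apply:

* Each tree is a single split (a stump), so drawing mtry candidate variables per tree is equivalent to drawing them at its only node.
* nodesize is not modeled.
* The importance step permutes a variable across all rows before re-scoring OOB predictions, rather than permuting within each tree's OOB sample.

The data are synthetic and the function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Best single split (feature, threshold, left_class, right_class) by
    misclassification count, searching only the candidate features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [c for row, c in zip(X, y) if row[f] <= t]
            right = [c for row, c in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lc = Counter(left).most_common(1)[0][0]
            rc = Counter(right).most_common(1)[0][0]
            err = sum(c != lc for c in left) + sum(c != rc for c in right)
            if best is None or err < best[0]:
                best = (err, f, t, lc, rc)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], 0.0, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, lc, rc = stump
    return lc if row[f] <= t else rc


def fit_forest(X, y, ntree=60, mtry=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        oob = sorted(set(range(n)) - set(idx))       # out-of-bag rows
        feats = rng.sample(range(p), mtry)           # mtry candidate variables
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        forest.append((stump, oob))
    return forest


def oob_error(forest, X, y, permute=None, seed=1):
    """Majority-vote OOB error rate; optionally shuffle one column first."""
    Xp = [list(row) for row in X]
    if permute is not None:
        col = [row[permute] for row in Xp]
        random.Random(seed).shuffle(col)
        for row, v in zip(Xp, col):
            row[permute] = v
    votes = [Counter() for _ in X]
    for stump, oob in forest:
        for i in oob:
            votes[i][predict_stump(stump, Xp[i])] += 1
    scored = [i for i, v in enumerate(votes) if v]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)


# Synthetic data: variable 0 separates the classes, variable 1 is noise.
rng = random.Random(42)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(120)]
y = ["P. pinea" if row[0] < 5 else "P. pinaster" for row in X]
forest = fit_forest(X, y)
base = oob_error(forest, X, y)
# Mean decrease in accuracy = permuted OOB error - baseline OOB error.
importance = [oob_error(forest, X, y, permute=j) - base for j in range(2)]
```

Permuting the informative variable breaks the link between that variable and the class, so the OOB error rises sharply. Permuting the noise variable leaves it nearly unchanged, which is the behavior the mean decrease in accuracy is designed to capture.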

Three training models were developed. First, the “LiDAR model” was constructed using only LiDAR variables. Then the “Spectral model” was built using only satellite variables. Finally, all available variables were incorporated into the so-called “Complete model”. Each model was built using the same approach, searching for the best combination of the three Random Forest parameters, and was evaluated on its out-of-bag error estimate. The training model with the greatest predictive capacity was then validated in terms of classification accuracy on three independent validation datasets: (i) individual trees in pure Pinus pinea L. stand plots, (ii) individual trees in pure Pinus pinaster Ait. stand plots, and (iii) individual trees in mixed Pinus pinea L. and Pinus pinaster Ait. stand plots. Validation consisted of three steps. The first was building confusion matrices, which cross-tabulate observed and predicted species. The second was estimating overall accuracy (OA) as (number of correctly classified Pinus pinea L. + number of correctly classified Pinus pinaster Ait.)/total number of trees used in the validation. The third was calculating the area under the receiver operating characteristic (ROC) curve (AUC) to quantify the discriminative ability of the model's predictions (Zipkin et al. 2012). The ROC curve plots the sensitivity of a diagnostic test against one minus its specificity as the cut-off for a positive test is varied (Everitt 2006). Following Hosmer and Lemeshow (2000), AUC values can be interpreted as follows: less than 50% indicates no discrimination, 50–70% poor discrimination, 70–80% acceptable discrimination, and more than 80% excellent discrimination.
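
The three validation measures can be sketched in plain Python. The AUC is computed with the rank (Mann–Whitney) formulation, i.e., the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one; this equals the area under the empirical ROC curve. The class boundaries in `discrimination_class` follow the thresholds listed above. How exact boundary values are handled is an assumption. The usage example rebuilds the pure-stand overall accuracy from the per-species counts implied by the reported percentages: 26 of 36 P. pinea and 34 of 36 P. pinaster trees correctly classified. The score lists passed to `auc` are hypothetical, as are all function names.

```python
def confusion_and_oa(observed, predicted, classes):
    """Confusion matrix as {(observed, predicted): count} plus overall accuracy."""
    cm = {(o, p): 0 for o in classes for p in classes}
    for o, p in zip(observed, predicted):
        cm[(o, p)] += 1
    oa = sum(cm[(c, c)] for c in classes) / len(observed)
    return cm, oa


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic (ties = 0.5)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def discrimination_class(auc_pct):
    """Interpretation thresholds as listed in the text (boundaries assumed)."""
    if auc_pct < 50:
        return "no discrimination"
    if auc_pct < 70:
        return "poor"
    if auc_pct <= 80:
        return "acceptable"
    return "excellent"


classes = ["P. pinea", "P. pinaster"]
observed = ["P. pinea"] * 36 + ["P. pinaster"] * 36
predicted = (["P. pinea"] * 26 + ["P. pinaster"] * 10
             + ["P. pinaster"] * 34 + ["P. pinea"] * 2)
cm, oa = confusion_and_oa(observed, predicted, classes)
example_auc = auc([0.9, 0.8, 0.4], [0.3, 0.4, 0.1])
```

With these counts, OA is 60/72 ≈ 0.833, matching the 83.3% reported for the pure validation stands.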

3 Results

3.1 Training model selection

To achieve the best discrimination between Pinus pinea L. and Pinus pinaster Ait., three training models were developed using data from individual trees in pure stand plots. The “Spectral model” was less accurate than the “LiDAR model” (Table 3), indicating that species discrimination based on LiDAR variables is better. However, the “Complete model”, which combines LiDAR and satellite variables, provided the lowest out-of-bag error estimate.

Table 3 Accuracy of the three training models in classifying Pinus pinea L. and Pinus pinaster Ait. individual trees in pure stands

The model combining LiDAR and Pleiades data was therefore selected, since it provided the greatest accuracy. The “Complete model” has high predictive and explanatory power, correctly classifying 93.22% of cases (i.e., an OOB error rate of 6.78%). In other words, a high proportion of Pinus pinea L. and Pinus pinaster Ait. trees were correctly identified. Figure 3 shows the relative importance of each variable in the selected model, as provided by Random Forest, revealing which LiDAR and satellite variables are most useful for discriminating between species. In decreasing order of importance, the variables were Crown_Height, AHR (area and height relation), NDVI index, EVI index, infrared band, green band, Crown_Area, blue band, red band, Density, HR (heights relation), and CHR (crown and height relation). The two most important variables, Crown_Height and AHR, are closely related to differences in crown and stem shape between the two species.

Fig. 3

Mean estimated variable importance in the Random Forest “Complete model”. The maximum value corresponds to the main discriminating variable and the rest are presented in relation to this score

3.2 Validation model

The validation process was carried out on an independent sample of pure and mixed plots. Table 4 shows the confusion matrix obtained for the selected “Complete model” on the pure validation datasets. The model correctly classified 72.2% of Pinus pinea L. trees and 94.4% of Pinus pinaster Ait. trees. The overall accuracy in pure stands was 83.3%, intermediate between the training accuracy and the accuracy of the validation on mixed datasets (Table 5). The results indicate that Pinus pinea L. is well discriminated, although underestimated. Pinus pinea L. trees were sometimes classified as Pinus pinaster Ait., whereas the opposite occurred only rarely. Hence, the “Complete model” achieves a high-confidence classification of both Pinus pinea L. and Pinus pinaster Ait. trees. The area under the curve (AUC), 89.6% (Fig. 4a), also indicates that the classifier discriminates well between the two species. Thus, the model applied in pure stands showed excellent discrimination.

Table 4 Confusion matrix obtained from the validation of pure datasets
Table 5 Confusion matrix obtained from the validation of individual trees in mixed stand datasets
Fig. 4

ROC curve of the pure and mixed dataset. a Pure stand dataset. b Mixed stand dataset. AUC area under the curve

Accuracy results for the selected model on the mixed validation dataset are presented in Table 5. Compared with field data, 63.0% of individual trees were correctly classified, which corresponds to moderate accuracy. For Pinus pinaster Ait., 74.3% of trees were correctly classified, while for Pinus pinea L. the figure was 54.3%. The AUC value for mixed stands was 63.4% (Fig. 4b), which falls in the poor discrimination range (50–70%). The model therefore had lower discrimination power in mixed stands than in pure stands.

4 Discussion

LiDAR data provide a widely accepted tool for characterizing forest structure and attributes such as total biomass or volume (Lim et al. 2003; Næsset 2004; Miura and Jones 2010; Nguyen et al. 2016). In addition, LiDAR information has been extensively used to identify individual tree species (Holmgren and Persson 2004; Zhang et al. 2016; Dechesne et al. 2017). Similarly, satellite images have also been used for tree crown delineation (Wang 2010; Lin et al. 2011) and species classification (Leckie et al. 2005; Ballanti et al. 2016). Finally, the combined use of multispectral and LiDAR data has been evaluated as a tool for classifying individual trees (Holmgren et al. 2008; Suratno et al. 2009a).

The statistical analyses were carried out with the Random Forest algorithm, although other methods, such as linear discriminant analysis, could have been used. Linear discriminant analysis was in fact tested, and its results were similar to those obtained with Random Forest. Random Forest was retained because it combines the predictions of many decision trees, which makes the analysis more robust and lowers the risk of overfitting.
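The robustness argument can be made explicit with a standard variance result for averaged predictors (stated here as a general identity, not as part of the study's analysis). Assume each of the B trees T_b(x) in the forest has prediction variance σ² and that any two trees have pairwise correlation ρ; then

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
= \frac{1}{B^2}\left[B\,\sigma^2 + B(B-1)\,\rho\,\sigma^2\right]
= \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .
$$

The second term shrinks as more trees are added, and Random Forest lowers the first by considering a random subset of candidate variables at each split, which decorrelates the trees. The identity concerns variance rather than bias, so it is compatible with linear discriminant analysis giving similar accuracy in this case.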

As hypothesized, the Random Forest model that only includes LiDAR variables has greater discriminative power than the model with only spectral variables, because the structural differences between the two pine species are more evident than their spectral differences. However, the model combining information from both sources is even more accurate. This finding agrees with Dalponte et al. (2012), who obtained higher accuracy with models combining spectral and LiDAR information than with models based on either source alone. These authors delineated crowns manually and used spectral images together with low- and high-density LiDAR data to classify tree species in the Alps. They discriminated between six species, including evergreen species (Abies alba Mill., Picea abies (L.) H. Karst., Pinus mugo Turra, and Pinus sylvestris L.) and deciduous species (Fagus sylvatica L. and Larix decidua Mill.), and found that the evergreen species displayed similar structural and spectral responses, except for Pinus mugo Turra, whose structure is clearly different. The deciduous species, however, were well discriminated, owing to their leafless state in winter and their spectral intensity at the start of the growing season. The mean classification accuracy among the evergreen species was 58%, slightly lower than the 63.0% obtained for mixed stands in this study, because the differences in structural variables between Pinus pinea L. and Pinus pinaster Ait. are greater than their spectral differences. Hence, structural variables have greater discriminative power than spectral variables in the classifier model.

Including all variables in the complete model may be suboptimal, since some of them are probably redundant. However, machine learning algorithms such as Random Forest can handle correlated variables without any normalization, and the relationships between variables need not be linear. The importance of structural variables in discriminating between the species is related to the marked differences in individual allometry between them. Pinus pinea L. usually shows a flattened, short, and wide umbrella-like crown with a long clean stem. In contrast, Pinus pinaster Ait. in pure stands usually presents an apical crown shape with a lower crown base height (Live_Branch), which hides the ground from view. Crown height (Max_Height − Min_Height) is the variable the model considers most important in these mature stands, and it is usually greater in Pinus pinaster Ait. than in Pinus pinea L. The second most important variable is the area-to-height ratio (Crown_Area/Max_Height), which reflects the combined weight of the two structural characteristics used to compute it for each species; high values of this ratio and of crown area (Crown_Area) are indicative of Pinus pinea L. The third most important LiDAR variable is crown area (Crown_Area), although its importance is only moderate. This variable is not density-dependent, because these stands have very low canopy cover with very little crown overlap. In addition, the spectral variables selected for the model (NDVI, EVI, and the near-infrared band) give higher values in Pinus pinea L. than in Pinus pinaster Ait. stands. This may be linked both to the greater plasticity of Pinus pinea L., which often displaces Pinus pinaster Ait. (Bravo-Oviedo et al. 2010), and to the generalized decay (associated with discoloration and defoliation) observed in Pinus pinaster Ait. forests within the region (Prieto-Recio et al. 2015).
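The crown-level variables discussed above can be illustrated with a minimal sketch. None of the following is stated in the paper; all of it is an assumption for illustration: each delineated crown is a polygon of (x, y) vertices in metres, LiDAR returns carry heights normalized above ground, returns below a hypothetical 2 m threshold are treated as ground or understory, and pixels carry surface reflectance in the blue, red, and near-infrared bands. Min_Height is taken as the lowest remaining return in the crown, and EVI uses the common coefficients G = 2.5, C1 = 6, C2 = 7.5, L = 1; the study's own definitions may differ. Only pixels whose centres fall inside the crown polygon are used, so that information from outside the crown is not mixed in.

```python
# HYPOTHETICAL sketch of crown-level LiDAR and spectral variables for one crown.
# Assumptions (not from the paper): polygon vertices in metres, LiDAR heights
# normalized above ground, pixel values are surface reflectances.

MIN_CROWN_Z = 2.0  # hypothetical threshold (m) to drop ground/understory returns


def polygon_area(poly):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def inside(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    result = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        # Toggle whenever a horizontal ray from (x, y) crosses edge (j, i).
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            result = not result
        j = i
    return result


def crown_variables(poly, returns, pixels):
    """returns: (x, y, z) tuples; pixels: (x, y, blue, red, nir) at pixel centres.

    Assumes at least one return above MIN_CROWN_Z and one pixel inside the crown.
    """
    heights = [z for x, y, z in returns if z >= MIN_CROWN_Z and inside(x, y, poly)]
    # Keep only pixels whose centre is inside the crown (no background mixing).
    crown_px = [p for p in pixels if inside(p[0], p[1], poly)]
    max_h, min_h = max(heights), min(heights)
    area = polygon_area(poly)
    ndvi = [(nir - red) / (nir + red) for _, _, _, red, nir in crown_px]
    evi = [2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
           for _, _, blue, red, nir in crown_px]
    return {
        "Max_Height": max_h,
        "Min_Height": min_h,
        "Crown_Height": max_h - min_h,      # Max_Height - Min_Height
        "Crown_Area": area,
        "Area_Height_Ratio": area / max_h,  # Crown_Area / Max_Height
        "NDVI": sum(ndvi) / len(ndvi),
        "EVI": sum(evi) / len(evi),
    }


# Toy example: a 10 m x 10 m square crown (illustrative values only).
crown = [(0, 0), (10, 0), (10, 10), (0, 10)]
lidar = [(5, 5, 14.0), (3, 7, 9.5), (6, 2, 11.0), (5, 5, 0.1), (20, 20, 18.0)]
image = [(5, 5, 0.04, 0.05, 0.35), (15, 15, 0.05, 0.08, 0.20)]
print(crown_variables(crown, lidar, image))
```

In the toy example the ground return and the return outside the polygon are discarded, giving a crown height of 4.5 m, a crown area of 100 m², an area-to-height ratio of about 7.14, an NDVI of about 0.75, and an EVI of about 0.56 from the single pixel inside the crown.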

The existence of mixed stands of Pinus pinaster Ait. and Pinus pinea L. indicates that both species have a similar ecological strategy, and species mixture may be desirable for increasing and diversifying productivity. However, the current conditions in these stands are likely to lead to future dominance of Pinus pinea L. over Pinus pinaster Ait., given the greater competition tolerance of the former (Ledo et al. 2014). The spectral responses of the two species could be more similar in mixed stands than in pure stands. This uniformity might reflect the fact that, when growing together, both pines become more alike, which would indicate ecological competition between them. Furthermore, tree allometry depends on whether a tree grows in a pure or a mixed stand (Forrester et al. 2017). However, changes in crown structure in mixed stands cannot be attributed to species composition alone: differences in density and canopy structure, which may be partly due to interspecific interactions, can also modify structural characteristics (Riofrío et al. 2017).

As expected, the selected model achieves greater accuracy on the pure stand validation set, because it was trained only with pure stands of a single species, as shown in Tables 1 and 2. Therefore, the average accuracy when validating on pure stands is higher than on mixed stands. In addition, accuracy is greater for Pinus pinaster Ait. Nevertheless, in pure stands, when the model classifies a tree as Pinus pinea L., it is usually correct (only 5.6% of Pinus pinaster Ait. trees are classified as Pinus pinea L.). Predictions of Pinus pinaster Ait. are less reliable, since 27.8% of Pinus pinea L. trees are misclassified as Pinus pinaster Ait. In mixed stands, however, Pinus pinea L. trees are underestimated while Pinus pinaster Ait. trees are overestimated by around 34%. This may be due to the wider variability in the allometric variables of Pinus pinea L. individuals growing in mixed stands, whereas Pinus pinaster Ait. individuals tend to maintain their pyramidal crown structure. A decrease in accuracy between training and validation is to be expected, since the validation sample is independent of the training data and includes plots from mixed stands, where tree allometry differs and a larger loss of accuracy is therefore expected. These results may be improved in future research by using hyperspectral images and higher-resolution LiDAR data (Holmgren et al. 2008; Dalponte et al. 2012). However, such data are not often available. Aerial photographs could be useful, but they are not always taken in the infrared channel, and they are also highly dependent on the orography. From an NWFP management planning point of view, the main interest is to correctly estimate the percentage of each species at the stand level. Thus, in mixed stands, classification errors for one species could be offset by errors in the opposite direction for the other species.
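The compensation argument can be quantified. If the per-class rates from the mixed validation (54.3% for Pinus pinea L., 74.1% for Pinus pinaster Ait.) are assumed to apply unchanged in other stands, a strong simplification, the expected share of trees labelled Pinus pinea L. is a linear function of the true share, and the two error types cancel exactly only at one particular mixture. The sketch below, with hypothetical function names, shows this together with the corresponding inversion, known in epidemiology as the Rogan–Gladen correction.

```python
# HYPOTHETICAL sketch: how per-tree errors propagate to stand-level species shares.
# Uses the mixed-stand rates reported in the text and assumes they hold unchanged
# in any stand, which is a strong simplification.

RECALL_PINEA = 0.543     # share of P. pinea trees classified as P. pinea
RECALL_PINASTER = 0.741  # share of P. pinaster trees classified as P. pinaster
FALSE_PINEA = 1 - RECALL_PINASTER  # P. pinaster trees wrongly labelled P. pinea


def predicted_pinea_share(true_share):
    """Expected share of trees labelled P. pinea when the true share is true_share."""
    return true_share * RECALL_PINEA + (1 - true_share) * FALSE_PINEA


def break_even_share():
    """True P. pinea share at which the two error types cancel exactly."""
    # Solve p = p * RECALL_PINEA + (1 - p) * FALSE_PINEA for p.
    return FALSE_PINEA / (1 - RECALL_PINEA + FALSE_PINEA)


def corrected_share(predicted_share):
    """Invert predicted_pinea_share (Rogan-Gladen-type correction)."""
    return (predicted_share - FALSE_PINEA) / (RECALL_PINEA + RECALL_PINASTER - 1)


for p in (0.2, 0.5, 0.8):
    print(f"true {p:.0%} P. pinea -> labelled {predicted_pinea_share(p):.1%}")
print(f"errors cancel when P. pinea is about {break_even_share():.1%} of the trees")
```

Under this assumption, a stand with 20% Pinus pinea L. would be labelled about 31.6% Pinus pinea L., one with 80% about 48.6%, and the errors cancel only near 36%. Because the correction divides by RECALL_PINEA + RECALL_PINASTER - 1 (here 0.284), any uncertainty in the rates is amplified roughly 3.5-fold, so the correction is only as good as the transferability of the validation rates.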

The main strength of this study is the application of an automatic crown delineation tool, together with continuous spatial information from satellite images and open source low-density LiDAR data, to distinguish two very similar species at tree level in a mixed conifer forest. The low density of the LiDAR point cloud is not a limiting factor in these open stands with wide tree crowns, since each crown is very likely to be hit by several LiDAR pulses. Moreover, the results remain good even though the training data were collected in pure stands, where no species labeling errors are expected, and the model was then validated in both pure and mixed stands. Taken together, these results suggest that the tool could help avoid traditional stem-by-stem inventories. The main limitation may be the Pleiades images, which currently cannot easily be replaced by open source multispectral satellite images, given the lower spatial resolution of the latter. Working at a high level of detail, such as the individual tree level, requires careful spatial co-registration of the different sources of information, in order to avoid combining information from inside and outside the delineated crown.
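The claim that wide crowns are hit several times even by a sparse point cloud can be checked with a simple model: if returns are spread as a homogeneous Poisson process with density λ (points per m²), the number of returns on a crown of area A is Poisson-distributed with mean λA. The densities and crown areas below are hypothetical illustrations, not values reported in this section, and real scan patterns and strip overlap make returns less uniform than the model assumes.

```python
import math

# HYPOTHETICAL check of how many LiDAR returns a crown receives, assuming returns
# follow a homogeneous Poisson process (uniform density, independent points).


def prob_at_least(k, density, area):
    """P(N >= k) for N ~ Poisson(density * area)."""
    mean = density * area
    # Sum P(N = i) for i < k, then take the complement.
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


for density in (0.5, 1.0):           # hypothetical point densities, points per m^2
    for area in (10.0, 30.0, 60.0):  # hypothetical crown areas, m^2
        p = prob_at_least(3, density, area)
        print(f"density {density} pts/m2, crown {area:.0f} m2: "
              f"mean {density * area:.0f} returns, P(>=3) = {p:.3f}")
```

Even at 0.5 points per m², a 10 m² crown has an expected 5 returns and a probability of about 0.875 of receiving at least three; for larger crowns the probability approaches 1.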

5 Conclusion

The species classification tool developed, together with an algorithm for automatic crown delineation, allows the discrimination of two very similar species in mixed Mediterranean conifer forests using continuous spatial information at the earth’s surface: Pleiades images and open source LiDAR data. The method presented here should be valid for regional or landscape-scale assessment of resources computed at tree level (such as NWFP), enhancing the economic value of the forest products and providing support for forest management decision-making.

Data availability

A dataset generated with the coordinates and the species of each tree measured is available in the FigShare repository (Blázquez-Casado et al. 2019) at https://doi.org/10.6084/m9.figshare.7951166.v2

References


Acknowledgments

The authors wish to thank the Forest Service of Valladolid province for their continuous help in plot installation, maintenance, and data collection. Similarly, the authors wish to express their gratitude to the PNT program for providing us with Pleiades images and to Dr. Fernando Pérez-Cabello for his advice on interpreting the Pleiades images. The authors also wish to thank Dr. Iñigo Lizarralde, Dr. Rafael Alonso, and Dr. Beatriz Águeda for their general assistance and Adam Collins for English language advice.

Funding

The research of Ángela Blázquez-Casado was funded by a contract from the Ministerio de Economía, Industria y Competitividad, Spanish Government (DI-14-06953). This study was also financed through the project AGL2014-51964 FORMIXING (Ministerio de Economía, Industria y Competitividad, Spanish Government) and the CC16–095 PROPINEA agreement between INIA, ITACYL, and the Diputación de Valladolid.

Author information

Corresponding author

Correspondence to Ángela Blázquez-Casado.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Handling Editor: Barry Alan Gardiner

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributions of the co-authors

Conceptualization: Francisco Rodríguez; data curation: Rafael Calama, Manuel Valbuena, and Marta Vergarechea; data analysis: Ángela Blázquez-Casado; writing (original draft): Ángela Blázquez-Casado; writing (review and editing): Rafael Calama, Ángela Blázquez-Casado, and Francisco Rodríguez; supervising the work: Francisco Rodríguez.

This article is part of the topical collection on Mediterranean pines

About this article

Cite this article

Blázquez-Casado, Á., Calama, R., Valbuena, M. et al. Combining low-density LiDAR and satellite images to discriminate species in mixed Mediterranean forest. Annals of Forest Science 76, 57 (2019). https://doi.org/10.1007/s13595-019-0835-x

  • DOI: https://doi.org/10.1007/s13595-019-0835-x

Keywords