Effectiveness of the spectral area index created by three algorithms for tree species recognition

Abstract

Key message

Tree species identification in two images (Luoyang and Hohhot, China) shows that polygon area indices extracted with the specific band-constrained polygon relative area (algorithm 3) improve tree species classification accuracy more effectively than indices extracted with a constant polygon relative area constraint (algorithm 2) or without an area constraint (algorithm 1). The accuracy obtained with algorithm 3 was ~ 13% higher than that of the other algorithms for WorldView-3 and ~ 2% higher for WorldView-2, whereas algorithms 1 and 2 yielded almost equal accuracies on each data set.

Context

Solving the problem of tree species identification by remote sensing technology is an international issue. Exploring the improvement of tree species recognition accuracy through multiple methods is currently widely attempted. A previous study has indicated that mining the differential information of various tree species in images using area differences of the polygons formed by tree species spectral curves and creating the polygon area index can improve tree species recognition accuracy. However, this study only created two such indices. Thus, a general model was developed to extract more potential polygon area indices and help tree species classification. However, the improvement of this model using a constant and a specific band to constrain the relative area of polygons still needs to be fully studied.

Aims

To obtain new algorithms for extracting polygon area indices that can mine the differential information of tree species and determine the index that is the most effective for tree species classification.

Methods

Three formulations of polygon area indices were created: one without an area constraint, one constraining the relative area of polygons with a constant and one constraining it with a specific band. Polygon area indices were extracted from WorldView-3 and WorldView-2 imagery with the three algorithms and combined with textures and spectral bands to form three feature sets. Random forest was used to classify the images and rank the importance of features in the feature sets, and the effectiveness of the polygon area indices extracted by each algorithm in tree species recognition was analysed according to their performance in the classifications.

Results

The proportion of polygon area index in the optimal feature sets ranged from 36.4 to 63.1%. The polygon area indices extracted with constant constrained polygon relative area and those without area constraint have minimal effect on tree species classification accuracy. Meanwhile, the polygon area indices extracted by the algorithm of specific band-constrained polygon relative area could remarkably improve tree species recognition accuracy (compared with spectral bands, WorldView-3 and WorldView-2 improved by 9.69% and 4.19%, respectively).

Conclusion

The experiments confirmed that polygon area indices are beneficial for tree species classification, and polygon area indices extracted by specific band-constrained polygon relative area play an important role in tree species identification.

1 Introduction

Different tree species can be identified from remote sensing images, which will substantially improve the technical level of forest resource investigation and monitoring. However, a large number of current studies on tree species identification showed that the accuracy of tree species identification using remote sensing images remains relatively low and has not reached the application level. Factors such as image spatial/spectral/temporal resolutions, forest environmental backgrounds, shadow and shade effects, different tree species overlapping effects and the insufficient spectral differences of different tree species affect the accuracy of tree species recognition (Pu 2021). These indicate that the identification of tree species based on remote sensing technology is a complex problem, which may take a long-term and continuous exploration to obtain solutions. Amongst these limiting factors, compensating for the insufficient spectral difference information of tree species is a key problem to be considered; therefore, mining the difference signals of different tree species from multi-band images to improve the accuracy of tree species recognition can be of great importance in solving the problem of tree species identification.

In the research on remote sensing recognition of tree species, scholars have conducted the following studies on the use of appropriate remote sensing data sources, the selection of imaging time phase of images and the application of image features and classification methods, which provide some valuable references for follow-up research. The main data used for tree species identification include IKONOS, QuickBird, WorldView-2 (satellite name), WorldView-3 (satellite name), radar data, elevation data and airborne unmanned aerial vehicle multi-spectral and hyperspectral data (e.g. Immitzer et al. 2012; Naidoo et al. 2012; Kamal et al. 2015; Wang et al. 2016; Åkerblom et al. 2017; Torabzadeh et al. 2019). Studies have shown that hyperspectral and high spatial resolution data and data containing the height information of trees are beneficial to the identification of tree species (e.g. Richards and Jia 2008; Zhang et al. 2016).

Considering the use of imaging time phases of remote sensing data, studies based on single-phase data (e.g. Li et al. 2015; Liu et al. 2015; Liu and An 2019) and multi-phase data (e.g. Pu et al. 2018; Hamraz et al. 2019; Masemola et al. 2019; Shi et al. 2020) are available. However, which type of phase data should be selected needs to be considered for research purposes. Single-phase data sources fail to highlight the phenological information of tree species in the image. Multi-temporal data can capture the phenological changes in different tree species, which can significantly ameliorate the recognition accuracy of tree species and compensate for the lack of single-phase data (e.g. Li et al. 2015; Han et al. 2019; Immitzer et al. 2019). Therefore, to obtain higher classification accuracy for applications, multi-phase data should be selected; to compare algorithm performance and feature importance, single-phase data should be used.

Scholars have attempted to test the capability of image features, such as spectral bands, spectral indices, texture features and digital surface models, in tree species identification (e.g. Wang et al. 2016; Åkerblom et al. 2017; Yu et al. 2017). Some scholars have also created new features that are beneficial to tree species identification and strive to increase the differentiated information of tree species in images from different perspectives (e.g. Zhou et al. 2011; Liu and An 2020a). Texture features and digital surface models have more potential than other features in tree species recognition, and the adoption of new features can also improve the accuracy of tree species recognition. Thus, the use of multi-source data and a large number of features from remote sensing images is helpful in tree species recognition (e.g. Cross et al. 2019; Apostol et al. 2020).

Considering classifiers applied in tree species recognition, maximum likelihood classifier, support vector machine and random forest are widely used (e.g. Li et al. 2015; Lin et al. 2015; Pu et al. 2018; Modzelewska et al. 2020). With the further development of deep learning technology, an increasing number of scholars have attempted to use this method in tree species recognition to improve its recognition results (Kemal et al. 2019; Niu et al. 2019; Shi et al. 2019; Zhang et al. 2019; Zhong et al. 2019; Zhang et al. 2020). Amongst these methods for tree species identification, the maximum likelihood classifier has the advantages of fast speed and high accuracy and has an excellent performance in the classification with the input of a few features. However, with the increase in feature dimension, the Hughes phenomenon (i.e. the performance of the classifier worsens with an increase in the number of features involved in image classification) (Hughes 1968) will appear, which is not conducive to the comparative analysis of which feature sets are useful in image classification (e.g. Ghosh and Joshi 2014). Support vector machine can achieve high accuracy and is insensitive to the increase in feature dimension; however, training the models in image classification using this approach is time-consuming. The accuracy of deep learning for tree species recognition is also relatively high but requires a long training time. Random forest is insensitive to feature dimension; it has a fast training speed and high classification accuracy and can rank the importance of features. A high-dimensional data set has many features with substantially rich information. Therefore, selecting the maximum likelihood classifier is unsuitable, and other methods may be used. 
In multi-spectral data, the maximum likelihood classifier can be used when the number of extracted image features is relatively small, whilst random forest may be a superior choice when the number of extracted features is relatively large and requires minimal processing time.

The literature review reveals that the effective recognition of tree species is related to the types of remote sensing data sources, imaging time, spectral and spatial resolutions, the use of multiple types of features and the use of suitable classifiers (Pu 2021). Therefore, the research on any of the above-mentioned aspects is meaningful for tree species identification. However, only considering these aspects is insufficient to solve the tree species identification problem because they fail to address the problem of deeply mining the differential signals of different tree species from remote sensing images. A previous study revealed that the triangle area index and polygon area index of tree species spectral curves have great potential for tree species recognition (when the polygon area indices are combined with texture features, the accuracy of tree species classification can be improved by more than 9% compared to only using texture features) (Liu and An 2020a). This finding indicates that the way of creating valuable spectral indices is an important form of deeply mining the difference signals of different tree species in remote sensing images; however, such research is relatively lacking. In the previous study (Liu and An 2020a), only two polygon area indices, namely PAI578 (constructed by red, near-infrared 1 and near-infrared 2 bands) and PAI678 (constructed by red-edge, near-infrared 1 and near-infrared 2 bands), were created, and three problems were neglected: (1) as polygon area indices were extracted only based on three specific bands, the effect of polygon area indices extracted with more or less than three bands in tree species identification is unclear; (2) there is no relative area constraint for extracting polygon area indices, so it is unclear if the classification accuracy can be improved after area constraint; and (3) the general model for polygon area index extraction has not been obtained (Liu and An 2020a). 
Therefore, mining of the differentiated information of different tree species was limited. The possible creation of a general model and few improved models of polygon area indices that are useful for tree species classification must be further explored.

Creating spectral indices based on the difference in polygon area formed by the spectral curves of different tree species could increase the differentiated information of various tree species in the image, and these features can be actively applied for tree species classification. Here, each of these features is named the polygon area index, referring to the polygon area formed by the intersection of spectral reflectance curves (within a certain wavelength range) of different objects and some line segments in the coordinate system, with the wavelength and reflectance as the horizontal and vertical axes, respectively. Each polygon area index calculated using the given expression is an image layer that can increase the difference information of different objects in the image. When they are combined with each other or with other types of image features (e.g. spectral bands, texture features, and digital surface models) for classification, the classification accuracy can be improved.

Based on the area enclosed by the tree species spectral curves and the coordinate axes, a general model (algorithm 1) of the polygon area index can be created from the spectral curves of tree species under full-band coverage, and this model can be improved by constraining the polygon relative area with a constant (algorithm 2) or a specific band (algorithm 3). The effectiveness of the polygon area indices extracted by the three algorithms for tree species classification is tested in this study. WorldView-3 and WorldView-2 images of two different regions were used to answer the following questions: (1) Do the polygon area indices extracted from other band combinations also play an important role in tree species identification (this addresses the first problem neglected in our previous research, as the general model can extract polygon area indices from any combination of bands)? (2) Can extracting polygon area indices under a constraint on the relative area of polygons improve recognition accuracy? (3) Which form of constraint on the relative area of polygons, a constant or a specific band, yields polygon area indices that are more useful for tree species identification?

2 Materials and methods

2.1 Study area

The Sui and Tang Dynasties City Ruins Botanical Garden in Luoyang City, Henan Province, China, was used as the experimental area (Fig. 1). Luoyang is located in the transition zone from the southern edge of the warm temperate zone to the north subtropical zone. This area has a warm temperate continental and subtropical monsoon climate. The four seasons are distinct and the climate is pleasant, with an annual average temperature of approximately 15 °C and an annual rainfall of approximately 630 mm. The botanical garden covers an area of 1.91 km2, of which the green area is 1.30 km2. The garden has more than 1000 kinds of plants, 1000 species of trees and shrubs (more than 1.3 million individuals in total), more than 200 kinds of aquatic plants and 0.5 km2 of grass plants. In the garden, nearly 20 tree species are distributed over a large area with large numbers and are used as image classification objects with good representativeness. An area of Hohhot City, Inner Mongolia, China, which was selected in the previous study (Liu and An 2020a), was used as a test area to assist this experiment in further testing the reliability of the results obtained in Luoyang.

Fig. 1

Image of WorldView-3 (RGB 532 combination) in the study area with sample distribution of different tree species. T1–T16 represent the serial numbers of the 16 different tree species/bamboo to be classified. The dots of different colours in the image represent the collection locations of the corresponding tree species and grass samples

2.2 Data and preprocessing

The main data used in this study were WorldView-3 imagery of Luoyang City acquired on October 27, 2017 (Liu 2022). The WorldView-3 data were purchased through the China Centre for Resources Satellite Data and Application. The data contained one panchromatic band and eight multi-spectral bands. The detailed spatial resolution and wavelength parameters of the WorldView-3 panchromatic and multi-spectral bands are shown in Table 1.

Table 1 Panchromatic and multi-spectral band spatial resolution and wavelength parameters of WorldView-3 imagery

The WorldView-3 image had a longitude range of 112° 26′ 3.06″–112° 26′ 58.33″ E and a latitude range of 34° 37′ 49.51″–34° 38′ 35.48″ N. This image covered 1.94 km2 of actual ground area. Preprocessing, such as calibration (i.e. the absCalFactor in the .imd file is identified for WorldView radiance calibration), fusion (i.e. Gram–Schmidt spectral sharpening) and atmospheric correction (i.e. fast line-of-sight atmospheric analysis of spectral hypercubes), was performed using the ENVI 5.4 software after the image was purchased. The WorldView-3 image of the study area after preprocessing is shown in Fig. 1. A WorldView-2 image of Hohhot (Liu 2022) was used as a verification image for the WorldView-3 results. The basic parameters of this auxiliary data and its preprocessing were described in a previous study (Liu and An 2020a).

2.3 Tree species survey and sample collection

An image of the study area was printed for a field survey (conducted in August 2020), which showed that 16 tree species/bamboo within the area could be used for classification because they were more abundant than the others. During the survey, the tree species/bamboo in the field that could be observed in the image were outlined, and their names were recorded. Once sufficient samples were judged to have been surveyed for each tree species/bamboo, the surveyed samples were marked on the corresponding digital image in the laboratory using the region of interest tool of ENVI 5.4. The main methods of tree species/bamboo investigation and sample labelling are the same as those adopted in previous research (Liu and An 2020a). All collected tree/bamboo samples were used as validation samples, and 5% of the total samples were randomly selected as training samples. The surveyed tree species/bamboo and grass distribution and their sample selection are shown in Fig. 1 and Table 2.

Table 2 Family, scientific names and numbers of pixels of the training and validation samples for tree species/bamboo classification

2.4 Construction of non-vegetation mask

Many buildings with blue roofs are found in the Luoyang WorldView-3 imagery. A previous study (Liu and An 2020b) has shown that the normalised difference vegetation index (NDVI) threshold method cannot effectively distinguish vegetation from blue ground objects in 8-band WorldView-2 imagery (the band setting is consistent with WorldView-3). Therefore, Liu and An (2020b) created the blue object spectral index (BOSI) to address this problem, and the algorithm is shown in Formula (1). This expression enhances the information on blue ground objects, which can then be extracted from the image through the threshold method. In the current study, mask-1 was built from the BOSI layer using the threshold range 0.005–0.439895, following the aforementioned research. The NDVI was then masked with mask-1 to obtain the masked NDVI, and the threshold range 0.2–1 was applied to the masked NDVI to obtain mask-2, which masks out non-vegetation parts. In tree species classification, mask-2 was used to cover the non-vegetation parts to avoid their interference with the image classification.

$$BOSI=\left(2\times {b}_8-3\times {b}_6-{b}_5-2\times {b}_4+{b}_3+5\times {b}_2+{b}_1\right)\times \left[{b}_2-\left({b}_1+{b}_3\right)/2\right],$$
(1)

where b1, b2, b3, b4, b5, b6 and b8 represent the reflectance of pixels in coastal blue, blue, green, yellow, red, red-edge and near-infrared 2 bands, respectively.
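The two-step masking described above can be sketched per pixel as follows. This is a minimal illustration, not the ENVI workflow used in the study. It assumes that pixels whose BOSI value falls inside the quoted range are the blue objects removed by mask-1, and that NDVI is computed from the near-infrared 1 (b7) and red (b5) bands; the text does not name the NDVI bands, so that choice is an assumption. The function names are hypothetical.

```python
# Thresholds quoted in the text.
BOSI_RANGE = (0.005, 0.439895)   # range used to build mask-1
NDVI_RANGE = (0.2, 1.0)          # range used to build mask-2


def bosi(b):
    """Blue object spectral index of Formula (1); b holds reflectances b1..b8."""
    b1, b2, b3, b4, b5, b6, _b7, b8 = b
    spectral_term = 2 * b8 - 3 * b6 - b5 - 2 * b4 + b3 + 5 * b2 + b1
    blue_peak = b2 - (b1 + b3) / 2
    return spectral_term * blue_peak


def ndvi(b):
    """NDVI from b7 (near-infrared 1) and b5 (red); the band choice is an assumption."""
    b5, b7 = b[4], b[6]
    return (b7 - b5) / (b7 + b5) if (b7 + b5) != 0 else 0.0


def is_vegetation(b):
    """Apply mask-1 (drop blue objects), then mask-2 (keep NDVI within range)."""
    if BOSI_RANGE[0] <= bosi(b) <= BOSI_RANGE[1]:
        return False  # treated as a blue ground object
    return NDVI_RANGE[0] <= ndvi(b) <= NDVI_RANGE[1]


# Illustrative reflectances only, not values from the study.
canopy_pixel = (0.05, 0.06, 0.08, 0.07, 0.06, 0.20, 0.40, 0.42)
blue_roof_pixel = (0.20, 0.35, 0.25, 0.15, 0.12, 0.12, 0.13, 0.14)
print(is_vegetation(canopy_pixel), is_vegetation(blue_roof_pixel))
```

Applying mask-1 before the NDVI threshold mirrors the order given in the text, so a blue object is excluded regardless of its NDVI value.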

2.5 Spectral area index creation

In the Luoyang WorldView-3 imagery, the training samples from 16 tree species/bamboo were used to fit the average spectral curves, and the results are shown in Fig. 2A. The figure reveals that the areas of the polygons formed by the spectral curves of different tree species and the horizontal and vertical axes differ. Thus, various tree species can be distinguished by using the different polygonal areas of their spectral curves. However, algorithms must be created to derive image features that characterise these differences in the polygonal areas of the spectral curves of various tree species.

Fig. 2

Polygons with different area sizes formed by spectral curves of different tree species/bamboo. A Average reflectance spectral curves of the various tree species/bamboo in WorldView-3. Differences are found in the polygonal area sizes of various tree species/bamboo, and these polygons are formed by the spectral curves and the horizontal and vertical axes. B Creation process of polygon area index and relative area constraint described by the graphic illustration. The tree species T7 and T10 were taken with the largest difference in the polygon area in A as an illustration example

As shown in Fig. 2B, the polygon formed by the spectral curve of tree species T7 and the horizontal and vertical axes is A–B–C–D–E–F–G–H–I–J–A, whilst that formed by the spectral curve of tree species T10 and the horizontal and vertical axes is a–b–c–d–e–f–g–h–I–J–a. The area of polygon A–B–C–D–E–F–G–H–I–J–A is the sum of the areas of seven trapezoids (A–B–2–1, B–C–3–2, C–D–4–3, D–E–5–4, E–F–6–5, F–G–7–6 and G–H–8–7), and the area of polygon a–b–c–d–e–f–g–h–I–J–a is likewise the sum of the areas of seven trapezoids (a–b–2–1, b–c–3–2, c–d–4–3, d–e–5–4, e–f–6–5, f–g–7–6 and g–h–8–7). Therefore, the polygon area index (abbreviated as PAI) representing the polygon area of different tree species can be calculated using Formula (2). This calculation is the first form of the polygon area index extraction algorithm.

Furthermore, the area of rectangle J–j–i–I–J is subtracted from each polygon. The length of this rectangle is the difference between the centre wavelengths of the end (near-infrared 2) and start (coastal blue) bands, and its height is the lowest average reflectance of any tree species in any band (here, tree species T7 in the red band). The polygons of tree species T7 and T10 then become A–B–C–D–E–F–G–H–i–j–A and a–b–c–d–e–f–g–h–i–j–a, and the relative area difference between the two polygons increases. When the area of J–j–i–I–J is calculated using a constant as the height of the rectangle (the average reflectance of T7 in the red band), the second extraction algorithm of the polygon area index is obtained, as presented in Formula (3). When the area of J–j–i–I–J is calculated using a specific band, whose pixel values vary, as the height of the rectangle (the band from which the above constant is derived), the third extraction algorithm of the polygon area index is obtained, as shown in Formula (4).

$${\textrm{PAI}}_1=0.5\times \sum \limits_{\alpha =1}^{N-1}\left({b}_{\alpha }+{b}_{\beta}\right)\times \Delta {\lambda}_{\alpha \beta},$$
(2)
$${\textrm{PAI}}_2=0.5\times \sum \limits_{\alpha =1}^{N-1}\left({b}_{\alpha }+{b}_{\beta }-2m\right)\times \Delta {\lambda}_{\alpha \beta},$$
(3)
$${\textrm{PAI}}_3=0.5\times \sum \limits_{\alpha =1}^{N-1}\left({b}_{\alpha }+{b}_{\beta }-2{b}_{\gamma}\right)\times \Delta {\lambda}_{\alpha \beta},$$
(4)

where PAI1, PAI2 and PAI3 represent the polygon area indices extracted by the three algorithms; N is the number of bands of remote sensing data; α = 1, 2, 3,..., N-1; β = α + 1; bα is the spectral reflectance of α band; bβ is the spectral reflectance of α + 1 band; the constant m is the mean value of the minimum reflectance of a certain tree species in N bands, and the specific bγ is the band with minimum average reflectance obtained from one type of tree species amongst all tree species in the βth band amongst the N bands; Δλαβ is the difference between the centre wavelength of the β-band and the α-band.
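Formulas (2) to (4) can be sketched for a single pixel as below. The sketch assumes that an index built on a band sub-range simply restricts the sum to the consecutive bands from a start band to an end band, which is how the band combinations in the text are read here. The centre wavelengths in CENTRE_NM are assumed nominal values in nanometres and should be replaced with those of Table 1; the function names are hypothetical.

```python
# Assumed nominal centre wavelengths (nm) for the eight bands; use Table 1 values in practice.
CENTRE_NM = (425, 480, 545, 605, 660, 725, 833, 950)


def pai1(b, start=1, end=8, wl=CENTRE_NM):
    """Formula (2): sum of trapezoid areas from band `start` to band `end` (1-based)."""
    return 0.5 * sum(
        (b[a - 1] + b[a]) * (wl[a] - wl[a - 1]) for a in range(start, end)
    )


def pai2(b, m, start=1, end=8, wl=CENTRE_NM):
    """Formula (3): as Formula (2), with the constant m subtracted from each band value."""
    return 0.5 * sum(
        (b[a - 1] + b[a] - 2 * m) * (wl[a] - wl[a - 1]) for a in range(start, end)
    )


def pai3(b, gamma, start=1, end=8, wl=CENTRE_NM):
    """Formula (4): as Formula (3), but the height is the pixel's own value in band gamma."""
    return pai2(b, b[gamma - 1], start, end, wl)


# Illustrative reflectances only, not values from the study.
pixel = (0.05, 0.06, 0.08, 0.07, 0.06, 0.20, 0.40, 0.42)
print(pai1(pixel), pai2(pixel, m=0.06768), pai3(pixel, gamma=5))
```

Because each trapezoid loses a strip of height m, PAI2 equals PAI1 minus m times the wavelength span of the chosen bands, which is the rectangle subtraction described for Fig. 2B. The functions use only arithmetic on the band values, so passing NumPy band arrays instead of floats would yield whole index layers.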

Starting from band 1 of WorldView-3, the polygon area indices from band 1 to each of bands 2, 3, 4, 5, 6, 7 and 8 were calculated, followed by those from band 2 to each of bands 3, 4, 5, 6, 7 and 8, and so on up to the polygon area index between bands 7 and 8. A total of 28 polygon area indices were thus obtained for each algorithm. The minimum average reflectance values of the 16 different tree species/bamboo on the 8 bands of WorldView-3 are 0.07398, 0.07569, 0.07906, 0.07672, 0.06768, 0.13462, 0.18017 and 0.19777. The values of constant m and specific bγ derived from these minima for the different polygon area index calculations are shown in Table 3. For WorldView-2 data, the minimum average reflectance values of the eight different tree species on the eight bands are 0.10745, 0.10943, 0.10217, 0.09349, 0.08853, 0.13191, 0.18390 and 0.17123. The corresponding values of constant m and specific bγ are shown in Table 7 in Appendix.

Table 3 Constant m and specific bγ of polygon area indices extracted by different band combinations when the second and third algorithms were implemented on the basis of WorldView-3
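The enumeration of the 28 band ranges and one plausible way of choosing the constraint for each range can be sketched as follows. The selection rule is an assumption made for illustration: for a given band range, m is taken as the smallest of the per-band minimum average reflectances inside that range, and bγ as the band where that minimum occurs. This matches the full-range example in the text (T7 in the red band), but the actual values of Table 3 are not reproduced here and should take precedence. The function names are hypothetical.

```python
from itertools import combinations

# Per-band minimum average reflectance of the 16 tree species/bamboo (WorldView-3, from the text).
MIN_MEAN_REFLECTANCE = (
    0.07398, 0.07569, 0.07906, 0.07672, 0.06768, 0.13462, 0.18017, 0.19777,
)


def band_ranges(n_bands=8):
    """All (start, end) band pairs with start < end, using 1-based band numbers."""
    return list(combinations(range(1, n_bands + 1), 2))


def constraint_for(start, end, minima=MIN_MEAN_REFLECTANCE):
    """Assumed rule: return (gamma, m) for the band with the smallest minimum in the range."""
    gamma = min(range(start, end + 1), key=lambda band: minima[band - 1])
    return gamma, minima[gamma - 1]


ranges = band_ranges()
print(len(ranges))            # number of polygon area indices per algorithm
print(constraint_for(1, 8))   # constraint for the full-band index
```

With eight bands, the number of start and end pairs is 8 × 7 / 2 = 28, which agrees with the count stated in the text.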

2.6 Spectral index importance measures

The polygon area index feature sets extracted by the three algorithms were combined with the same texture and spectral features to construct three feature sets 1, 2 and 3 to measure whether the newly created polygon area index features play an important role in tree species recognition. The random forest algorithm (Breiman 2001) in the EnMAP-Box software (all parameters were default) (Van der Linden et al. 2015) was used in the feature sets to classify imagery, analyse the importance of polygon area index features and obtain the best feature sets 1, 2 and 3 in tree–grass classification.

A texture feature is an image feature that reflects the spatial distribution attribute of pixels. This feature describes the repeated local patterns and their arrangement rules in the image, usually demonstrating the characteristics of local irregularity and macroscopic regularity and is often used in image classification and scene recognition. For the texture feature set construction, the previous research (e.g. Liu et al. 2015; Shi et al. 2020) was used as a reference, and the mean (MEA), which plays an important role in tree species identification, was selected to construct the texture feature set. The feature MEA was extracted based on the second order of the grey-level co-occurrence matrix using ENVI 5.4. Texture features (based on a 3 × 3 window) were extracted from eight bands of WorldView-3 and WorldView-2 data, and the constructed texture feature set has eight features. The spectral feature set was constructed with eight bands, which were used for texture feature extraction (including eight features).
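To illustrate what a co-occurrence mean measures, the sketch below computes the grey-level co-occurrence matrix mean for one small window. It is not ENVI's implementation: the number of grey levels, the pixel offset and the prior quantisation of the window are all assumptions chosen for illustration, and the function name is hypothetical.

```python
def glcm_mean(window, levels=8, offset=(0, 1)):
    """Mean of the co-occurrence matrix of `window`.

    window: 2D list of integers already quantised to 0..levels-1.
    offset: (row shift, column shift) between the two pixels of each pair.
    """
    rows, cols = len(window), len(window[0])
    d_row, d_col = offset
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the window")
    # Mean = sum over (i, j) of i * P(i, j), where P = counts / total.
    return sum(
        i * counts[i][j] for i in range(levels) for j in range(levels)
    ) / total


# Illustrative 3 x 3 window of quantised grey levels.
ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(glcm_mean(ramp))
```

Sliding such a window over each of the eight bands would give one mean texture layer per band, which corresponds to the eight texture features described above.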

2.7 Classification result evaluation

After the tree species–grass classification with all the constructed feature sets, the validation samples were used to test all the classification results and generate the corresponding confusion matrices for accuracy verification. The overall accuracy, kappa coefficient, and producer and user accuracies calculated from the confusion matrices, together with curve diagrams and spider web graphs, were used to compare and analyse the classification results. Producer accuracy is the probability that a pixel in the classification image is put into class x given the ground truth class is x. User accuracy is the probability that the ground truth class is x given a pixel is put into class x in the classification image. In addition, the polygon area indices extracted from Hohhot WorldView-2 data by the three algorithms were used to test the results of the Luoyang region, and the consistency of the results of the two regions was further analysed.
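The accuracy measures named above can be computed from a confusion matrix as in the following sketch. It assumes that rows hold the ground truth classes and columns the assigned classes; the matrix orientation used in Table 6 is not stated here, so that convention is an assumption, as is the function name.

```python
def accuracy_report(cm):
    """Overall accuracy, kappa, producer and user accuracies.

    cm[i][j] is the number of pixels of ground truth class i assigned to class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = [cm[i][i] for i in range(n)]
    truth_totals = [sum(cm[i]) for i in range(n)]
    assigned_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    overall = sum(correct) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(truth_totals[i] * assigned_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    producer = [correct[i] / truth_totals[i] for i in range(n)]
    user = [correct[i] / assigned_totals[i] for i in range(n)]
    return overall, kappa, producer, user


# Illustrative two-class matrix, not data from the study.
overall, kappa, producer, user = accuracy_report([[40, 10], [5, 45]])
print(overall, kappa, producer, user)
```

The sketch does not guard against classes with no ground truth or no assigned pixels, for which producer or user accuracy is undefined.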

3 Results and analyses

3.1 Importance of polygon area index in the selected feature set

In the feature sets comprising spectral bands, texture features and polygon area indices, the importance ranking of polygon area index features extracted by the three algorithms in tree species classification is shown in Table 4 (from WorldView-3) and Table 8 in Appendix (from WorldView-2).

Table 4 Importance ranking of polygon area index features extracted by three algorithms in tree species classification based on feature sets and the results from WorldView-3

As shown in Table 4 and Table 8 in Appendix, in the feature sets of WorldView-3 and WorldView-2, many polygon area index features ranked near the top in importance regardless of the algorithm used. More polygon area index features extracted by the third algorithm ranked at the top than those extracted by the first and second algorithms, and fewer of them ranked at the bottom. In addition, the two tables reveal that the top-ranked polygon area indices were mostly derived from spectral curves spanning two or three bands, whereas the lower-ranked polygon area indices were mostly derived from spectral curves spanning more than three bands.

3.2 Optimal feature set selection

Figure 3 shows the change in the overall accuracy in the presence of 15–30 important features involved in the classification of the tree species based on three WorldView-3 feature sets. The results derived from WorldView-2 are shown in Figure 6 in Appendix.

Fig. 3

Relationship between the number of participating important features and the overall accuracy of tree species identification. According to the importance values of different features in Table 4, the classification results of all the top 15–30 features were selected for analysis

Figure 3 shows that the overall accuracy first increased as the lower-ranked features were eliminated from the feature sets and then began to decline once further features were eliminated. The subset with which each of the three feature sets achieved its highest accuracy in tree species classification was taken as its optimal feature set. Feature sets 1, 2 and 3 obtained the highest overall accuracy when 22, 27 and 19 features were involved in tree species classification, respectively. The overall accuracy of their optimal feature set classification was 67.1652%, 66.9151% and 75.5630%, respectively.
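The selection of an optimal feature set from an importance ranking can be sketched as a search over the number of top-ranked features. In the sketch, evaluate is a hypothetical callback that trains the classifier on a feature subset and returns its overall accuracy; it stands in for the random forest runs and is not part of any library. The default range of 15 to 30 features follows Fig. 3.

```python
def best_top_k(ranked_features, evaluate, k_min=15, k_max=30):
    """Return (k, accuracy) for the top-k subset with the highest accuracy.

    ranked_features: feature names ordered from most to least important.
    evaluate: hypothetical callback mapping a feature list to overall accuracy.
    """
    best_k, best_accuracy = None, float("-inf")
    for k in range(k_min, min(k_max, len(ranked_features)) + 1):
        accuracy = evaluate(ranked_features[:k])
        if accuracy > best_accuracy:
            best_k, best_accuracy = k, accuracy
    return best_k, best_accuracy


# Toy stand-in for a classifier run: accuracy peaks when 19 features are used.
def toy_evaluate(features):
    return 0.75 - 0.01 * abs(len(features) - 19)


# 28 polygon area indices, 8 textures and 8 bands, with hypothetical names.
ranked = [f"feature_{i}" for i in range(1, 45)]
best_k, best_accuracy = best_top_k(ranked, toy_evaluate)
print(best_k, best_accuracy)
```

Ties are resolved in favour of the smaller subset because only a strictly higher accuracy replaces the current best.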

Considering WorldView-2 (Figure 6 in Appendix), in the three feature sets, the relationship between feature elimination and overall accuracy change was consistent with that of WorldView-3. In the presence of 27, 27 and 16 features involved in the feature sets of 1, 2 and 3, respectively, the highest overall accuracy was achieved, which was 79.3214%, 79.1481% and 80.9683%, respectively.

According to Table 4 and Fig. 3, eight (polygon area indices accounting for 36.40%), 11 (40.74%) and 12 (63.12%) polygon area indices extracted by algorithms 1, 2 and 3, respectively, were involved in the optimal feature sets. For WorldView-2, the optimal feature sets involved 11 (40.74%), 11 (40.74%) and 10 (62.50%) polygon area indices extracted by algorithms 1, 2 and 3, respectively (Table 8 and Figure 6 in Appendix). Both WorldView-3 and WorldView-2 reveal that the proportion of polygon area indices extracted by algorithm 3 in the optimal feature sets was substantially higher than that of algorithms 1 and 2. The classification accuracies of tree species by optimal feature sets 1 and 2 differed only slightly from each other and were far lower than that of optimal feature set 3.

3.3 Tree species identification based on different feature sets

The spectral band, texture and polygon area index feature sets of WorldView-3 and WorldView-2, together with the combined feature sets 1, 2 and 3, were used to identify tree species. The overall accuracy and kappa coefficient of tree species identification are shown in Table 5.

Table 5 Overall accuracy and kappa coefficient of tree species identification based on different feature sets

Table 5 shows that for WorldView-3, the overall accuracy of tree species identification by polygon area index feature sets constructed by algorithms 1 (57.8338%) or 2 (57.8853%) was far lower than that by the spectral band (65.8745%) or texture feature (64.8040%) sets. However, the classification accuracy of the polygon area index feature set (70.6091%) constructed by algorithm 3 was substantially higher than that of the spectral band or texture feature set for tree species identification. Using feature sets 1 and 2 to classify the tree species, the overall accuracy was higher than using polygon area index feature sets but did not exceed the classification accuracy using spectral bands. Feature set 3 had an accuracy of 74.6850% for tree species classification, which was approximately a 4% improvement over the polygon area index feature set used.

For WorldView-2, the results of the polygon area index feature sets and the combined feature sets, relative to the spectral bands and texture features, differed slightly from those for WorldView-3. However, the two data sets agreed on the main result: the polygon area indices extracted by algorithm 3 have a clear advantage in tree species classification over those extracted by the two other algorithms.

3.4 Individual tree species classification effectiveness of typical data set

Spider web graphs of the producer and user accuracies obtained with the best feature sets 1, 2 and 3, the 28 polygon area index features and the 8 bands of WorldView-3 are shown in Fig. 4A and B, respectively.

Fig. 4 Spider web graphs of the representative feature set classification results. A Spider web graph of producer accuracies. B Spider web graph of user accuracies

Figure 4 shows that the producer accuracy of grass and the user accuracy of tree species T7 obtained with the best feature set 3 were lower than those obtained with the best feature set 2. For every other tree species–grass class, the best feature set 3 was more accurate than the other feature sets. When the best feature sets 1 and 2 and the 8 bands were used to classify tree species–grass, the producer and user accuracies were all remarkably close. These results were less accurate than those obtained with the 28 polygon area index features constructed on the basis of algorithm 3.

3.5 Optimal classification result analysis

Using the best feature set (based on WorldView-3, optimised from feature set 3) to classify tree species–grass, the produced confusion matrix is shown in Table 6, and the representative classification maps are shown in Fig. 5.

Table 6 Confusion matrix of the optimal classification result

Fig. 5 Representative classification result maps used to compare visually and analyse the classification results. A Classification result of the optimal feature set, obtained by extracting polygon area index features based on algorithm 3. B Classification result of the 8-band WorldView-3 data

Table 6 shows that the producer accuracies of tree species–grass identification using the best feature set 3 varied from 44.43% (T3) to 96.30% (T7), and 11 tree species–grass classes had a producer accuracy exceeding 70%. The user accuracies of the 17 tree species–grass classes varied from 58.11% (T3) to 92.44% (T2), and 12 classes had a user accuracy exceeding 70%. The producer and user accuracies of tree species T3 differed considerably from each other and were far lower than those of the other tree species. For the other tree species–grass classes, the difference between producer and user accuracy was relatively small.
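Producer and user accuracies such as those in Table 6 follow directly from the confusion matrix. The sketch below assumes that rows hold the classified labels and columns hold the reference labels. The layout of Table 6 itself is not shown here, so this orientation, the function name and the toy matrix are all assumptions.

```python
def producer_and_user_accuracy(matrix):
    """Per-class producer and user accuracies (in %) from a confusion matrix.

    Assumed layout: rows = classified labels, columns = reference labels.
    """
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]        # pixels assigned to each class
    col_totals = [sum(col) for col in zip(*matrix)]  # reference pixels of each class
    producer = [100.0 * matrix[i][i] / col_totals[i] for i in range(n)]
    user = [100.0 * matrix[i][i] / row_totals[i] for i in range(n)]
    return producer, user


# Toy three-class confusion matrix (hypothetical values).
toy_matrix = [[50, 5, 0],
              [10, 30, 10],
              [0, 5, 40]]
producer, user = producer_and_user_accuracy(toy_matrix)
```

Producer accuracy is the complement of the omission error and user accuracy is the complement of the commission error, so a large gap between the two for one class indicates that the two error types are unbalanced for that class.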

Considering the distribution of tree species in the garden together with Fig. 5A, tree species T1, T2, T5, T7, T8 and grass were effectively identified. However, the recognition of tree species T3, T6, T11, T12, T13 and T14 was less satisfactory. A comparison of Fig. 5A and B and of the two confusion matrices shows that, except for T7 (whose user accuracy was slightly lower than that obtained with the 8 bands), the classification results of all tree species–grass classes using the optimal feature set were better than those of the 8-band WorldView-3 data.

4 Discussion

Spectral bands and texture features of remote sensing data are widely used in tree species classification (e.g. Immitzer et al. 2012; Kamal et al. 2015; Åkerblom et al. 2017; Shi et al. 2020), and texture features, which can be extracted from all remote sensing imagery, are considered to play a crucial role (e.g. Liu et al. 2015; Ferreira et al. 2019; Wang et al. 2020). Compared with spectral bands and texture features, many of the polygon area indices created in this study ranked high in importance value in tree species classification, indicating that they also play a positive role in tree species identification (Table 4 and Table 8 in Appendix). This result is consistent with the conclusion of a previous study that PAI578 and PAI678 have important roles in tree species classification (Liu and An 2020a). In the optimal feature sets of tree species classification, polygon area index features accounted for a large proportion, ranging from 36.4 to 63.1% (Table 4 and Table 8 in Appendix, Fig. 3 and Figure 6 in Appendix). These polygon area indices were identified as important indices for tree species identification in WorldView-2 and WorldView-3. In addition, when the optimal feature sets constructed by the three algorithms were used to classify tree species, the overall accuracy was higher than that obtained with the spectral band set or the texture feature set alone (Table 5). These results further show that the polygon area index features play an important role in tree species identification (Table 4 and Table 8 in Appendix, Fig. 3 and Figure 6 in Appendix). Using traditional vegetation indices to identify tree species fails to substantially improve classification accuracy (Liu 2016), whereas the polygon area index created by algorithm 3 in this study markedly enhanced the classification accuracy of tree species. This finding indicates that features designed to capture spectral differences amongst tree species are more effective than ratio and normalised vegetation indices designed to enhance the vegetation signal.

Many polygon area indices extracted from other band combinations (e.g. bands 1 and 2, 2 and 3, 3 and 4, 4 and 5) have high importance values in the optimal feature sets of tree species classification (Table 5 and Table 8 in Appendix, Fig. 3 and Figure 6 in Appendix), indicating that, in addition to band combinations 578 and 678, polygon area indices extracted from other band combinations play an essential role in tree species identification. Polygon area indices extracted with a constant as the height of the rectangle that constrains the relative polygon area (algorithm 2) differed minimally in tree species recognition from those extracted without a relative area constraint (algorithm 1) (Table 5). However, polygon area indices extracted by constraining the relative areas of polygons with a specific band (algorithm 3) were more beneficial for tree species identification than those without area constraints (Table 5, Fig. 3 and Figure 6 in Appendix). This shows that whether constraining the relative area of polygons improves tree species classification accuracy depends on the form of the constraint. A constant constraint did not effectively improve the identification accuracy of tree species, whereas a specific band constraint did. Polygon area indices extracted with a specific band as the constraint were therefore more beneficial for tree species identification than those extracted with a constant as the constraint. In this study, PAI1 6 to 8 extracted on the basis of algorithm 1 was equivalent to PAI678 in the previous study (Liu and An 2020a). Without a relative area constraint, PAI1 6 to 8 ranked 31st in the feature sets in the classifications of both WorldView-3 and WorldView-2. With the relative area constraint of algorithm 3, PAI3 6 to 8 ranked 8th and 10th in the feature sets, respectively. The polygon relative area constraint method of algorithm 3 thus markedly improved the ranking of the polygon area index relative to the unconstrained index. This finding shows that the new algorithm 3 proposed in this study is superior to that presented in the previous study.

Polygon area indices derived from spectral curves spanning three or fewer bands performed better in tree species classification than those spanning more than three bands (Table 4 and Table 8 in Appendix). A possible reason is that, over a small number of bands, the spectral curves of the same tree species vary little, so the area of the polygons they form is relatively fixed. In contrast, the spectral curves of different tree species differ, so the areas of the polygons they form differ as well, which is conducive to the identification of different tree species. Over a larger number of bands, the variability of the spectral curves increases even for the same tree species, and the area of the polygons formed varies accordingly. In addition, the area differences of polygons amongst various tree species may offset each other because of multiple local differences. For example, the polygonal area of tree species T15 was larger than that of tree species T10 amongst bands 4 to 6 but smaller amongst bands 6 to 8 (Fig. 2A). Thus, the polygonal area difference between the two tree species amongst bands 4 to 8 may be small (Table 4 and Table 8 in Appendix). Therefore, polygon area indices obtained over a larger number of bands may not be conducive to tree species classification.
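The offsetting effect described above can be illustrated numerically. The sketch below assumes that the polygon area is the trapezoidal area under the spectral curve with unit spacing between bands, which may differ from the formulation defined in the Methods. The two spectra are hypothetical and are not the values of T15 and T10.

```python
def polygon_area(values):
    """Trapezoidal area under a spectral curve, assuming unit band spacing."""
    return sum((a + b) / 2.0 for a, b in zip(values, values[1:]))


# Hypothetical values for bands 4, 5, 6, 7 and 8 of two species.
species_a = [100, 160, 120, 300, 330]
species_b = [100, 140, 120, 320, 330]

# Bands 4 to 6 are the first three values; bands 6 to 8 are the last three.
diff_4_to_6 = polygon_area(species_a[:3]) - polygon_area(species_b[:3])
diff_6_to_8 = polygon_area(species_a[2:]) - polygon_area(species_b[2:])
diff_4_to_8 = polygon_area(species_a) - polygon_area(species_b)
```

Under this assumption the area over bands 4 to 8 is the sum of the areas over bands 4 to 6 and 6 to 8, so local differences of opposite sign cancel in the wider index.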

Previous studies have shown that reducing the spectral variability within classes and increasing it between classes both contribute to the classification accuracy of imagery (e.g. Pu et al. 2018; Han et al. 2019; Immitzer et al. 2019). The same category of tree species–grass in the image comprises multiple pixels; the DN/reflectance values of these pixels are not identical but fluctuate within a certain range. The DN/reflectance values of different types of tree species–grass show certain differences. When a constant is used as the height of the rectangle, the same rectangle area is subtracted from all pixels of one layer of the image, and the differences in DN/reflectance value amongst tree species–grass do not substantially change. However, when a specific band is used as the height of the rectangle, the height varies from pixel to pixel, and so does the rectangle area that is subtracted. This may reduce the variability of the DN/reflectance value within the same tree species–grass class and increase the variability between different classes. Thus, using a specific band as the height of the rectangle is better than using a constant when extracting polygon area indices for tree species identification.
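The contrast between a constant and a band-specific rectangle height can be illustrated as follows. This is a sketch under stated assumptions, not the formulation of algorithms 2 and 3. The polygon area is taken as the trapezoidal area under the spectral curve with unit band spacing, and the rectangle area as its height multiplied by the number of band intervals. The function names and all pixel values are hypothetical.

```python
def curve_area(values):
    """Trapezoidal area under a spectral curve, assuming unit band spacing."""
    return sum((a + b) / 2.0 for a, b in zip(values, values[1:]))


def constrained_area(values, height):
    """Curve area minus a rectangle of the given height (assumed form)."""
    width = len(values) - 1  # number of band intervals
    return curve_area(values) - height * width


# Hypothetical values of two pixels from different classes over three bands.
pixel_a = [120, 300, 340]
pixel_b = [100, 260, 330]

# No constraint: plain difference of the two polygon areas.
diff_unconstrained = curve_area(pixel_a) - curve_area(pixel_b)

# Constant height m: the same rectangle is removed from both pixels.
m = 50
diff_constant = constrained_area(pixel_a, m) - constrained_area(pixel_b, m)

# Band-specific height: each pixel uses its own value in a reference band.
ref_band_a, ref_band_b = 80, 110
diff_band = constrained_area(pixel_a, ref_band_a) - constrained_area(pixel_b, ref_band_b)
```

With a constant height the difference between the two pixels is unchanged, which matches the reasoning above. With a band-specific height the difference changes because each pixel has a different rectangle removed. In this toy case it grows, but the direction and size of the change depend on the values chosen.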

Herein, several key points related to the progress of the polygon area index for tree species identification were clarified by analysing the performance of the polygon area index in tree species classification. Compared with previous studies (e.g. Zhou et al. 2011; Liu and An 2020a), more potential polygon area indices that are useful for tree species classification can be extracted using the polygon area index extraction models developed in this study. Moreover, the polygon area indices obtained by algorithm 3 were more robust in tree species classification than those extracted in previous studies (e.g. Zhou et al. 2011; Liu and An 2020a) and by algorithms 1 and 2. The polygon area indices extracted by algorithm 3 can greatly improve the classification accuracy of tree species. The important polygon area indices that can drive tree species recognition in WorldView-3 and WorldView-2 were identified from all the extracted polygon area index features. The importance of the polygon area index in tree species identification is related to the number and combination form of bands. Clarifying these points provides basic information for applying the polygon area index in prospective tree species classification studies.

5 Conclusions

This study tested the effectiveness of the polygon area indices extracted by the three algorithms in tree species classification using WorldView-3 and WorldView-2 data from different regions. The results demonstrated that the polygon area indices extracted using the three algorithms could improve the recognition accuracy of tree species. Compared with spectral bands in WorldView-3 (WorldView-2), combining the useful polygon area indices extracted using algorithms 1, 2 and 3 can increase the overall accuracy of tree species classification by approximately 1% (2%), 1% (2%) and 10% (4%), respectively. The overall accuracy improvement was thus small when the polygon area indices extracted using algorithms 1 and 2 were used, whereas the polygon area indices extracted using algorithm 3 can greatly improve the overall accuracy of tree species identification. Given the performance of the three algorithms, the extraction of useful polygon area indices using algorithm 3 should be considered an important procedure in future studies of tree species classification.

Availability of data and materials

The data sets generated during and/or analysed during the current study are available at https://doi.org/10.5281/zenodo.7089686.

Acknowledgements

We thank the editors and the anonymous reviewers.

Code availability

The code (written with ENVI/IDL) used in the current study is available at https://github.com/Liu066/PAI-extraction.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 32001250) and the Natural Science Foundation of Henan Province, China (Grant No. 202300410293).

Author information

Contributions

HL completed the field investigation, the experiments, and the writing and revision of the manuscript. The author read and approved the final manuscript.

Corresponding author

Correspondence to Huaipeng Liu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

The author gave informed consent to this publication and its content.

Competing interests

The author declares that there are no competing interests.

Additional information

Handling editor: Alexia Stokes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 7 Constant m and specific bγ of polygon area indices extracted by different band combinations when the second and third algorithms were implemented on the basis of WorldView-2
Table 8 Importance ranking of polygon area index features extracted by three algorithms in tree species classification based on feature sets and the results from WorldView-2
Fig. 6 Relationship between the number of participating important features and the overall accuracy of tree species identification, based on WorldView-2. According to the importance values of different features in Table 8 in Appendix, the classification results of all the top 15–30 features were selected for analysis

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, H. Effectiveness of the spectral area index created by three algorithms for tree species recognition. Annals of Forest Science 80, 17 (2023). https://doi.org/10.1186/s13595-023-01184-w

  • DOI: https://doi.org/10.1186/s13595-023-01184-w

Keywords

  • WorldView-3 and WorldView-2 data
  • Polygon area index
  • Tree species classification
  • Algorithm effectiveness evaluation
  • Random forest