Effect of sample size on the estimation of forest inventory attributes using airborne LiDAR data in large-scale subtropical areas

Sample size (number of plots) may significantly affect the accuracy of forest attribute estimations using airborne LiDAR data in large-scale subtropical areas. In general, the accuracy of all models improves with increasing sample size. However, the improvement in estimation accuracy varies across forest attributes and forest types. Overall, a larger sample size is required to estimate the stand volume (VOL), while a smaller sample size is required to estimate the mean diameter at breast height (DBH). Broad-leaved forests require a smaller sample size than Chinese fir forests. Sample size is an essential factor affecting the cost of LiDAR-assisted forest resource inventory. Therefore, investigating the minimum sample size required to achieve acceptable accuracy for airborne LiDAR-based forest attribute estimation can help improve cost efficiency and optimize technical schemes. The aim was to assess the optimal sample size to estimate the VOL, basal area, mean height, and DBH in stands dominated by Cunninghamia lanceolata, Pinus massoniana, Eucalyptus spp., and other broad-leaved species in a large subtropical area using airborne LiDAR data. Statistical analyses were performed on the differences in LiDAR metrics between different sample sizes and the total number of plots, as well as on the field-measured attributes. The relative root mean square error (rRMSE) and the determination coefficient (R²) of multiplicative power models with different sample sizes were compared. The logistic regression between the coefficient of variation of the rRMSE and the sample size was established, and the minimum sample size was determined using a threshold of less than 10% for the coefficient of variation. As the sample sizes increased, we found a decrease in the mean rRMSE and an increase in the mean R², as well as a decrease in the standard deviation of the LiDAR metrics and field-measured attributes.
Sample sizes for Chinese fir, pine, eucalyptus, and broad-leaved forests should be over 110, 80, 85, and 60, respectively, in a practical airborne LiDAR-based forest inventory. The accuracy of all forest attribute estimations improved as the sample size increased across all forest types, which could be attributed to the decreasing variations of both LiDAR metrics and field-measured attributes.


Introduction
Airborne laser scanning (ALS), also known as airborne light detection and ranging (LiDAR), has been used for operational forest inventories in Nordic countries since 2002 (Naesset 2002, 2004; Maltamo and Packalen 2014). Since then, the reliability of using ALS data to estimate large-scale forest inventory attributes, such as mean stem diameter or diameter at breast height (DBH), mean tree height (H), basal area (BA), stand volume (VOL), and aboveground biomass (AGB), has been verified in many other countries (Turner et al. 2011; White et al. 2013; Novo-Fernández et al. 2019). ALS-based forest attribute estimations and mappings typically utilize an area-based approach (ABA) via a two-stage procedure (Naesset and Bjerknes 2001; Naesset 2002; Jensen et al. 2006; Maltamo et al. 2006; Thomas et al. 2006). Field plot measurements are critical, but they are expensive, time-consuming, and labor-intensive (Luo et al. 2013; Dube et al. 2017; Jarron et al. 2020). Therefore, optimizing field plot measurement schemes has been a challenge for ALS-based large-scale forest resource inventory and monitoring (Junttila et al. 2013; Fassnacht et al. 2014).
The cost of field plot measurements is mainly affected by three factors: the number of plots (sample size), plot size, and plot positioning. Using regression models to estimate the DBH, BA, and VOL, Gobakken and Naesset (2008) found that, in most cases, models built on large plots (300-400 m²) had a lower relative root mean square error (rRMSE) and a higher determination coefficient (R²) than models built on small plots (200 m²), indicating improved model accuracy. Several other studies have supported this conclusion (Adnan et al. 2017; Naesset et al. 2011; Watt and Watt 2013; Hernández-Stefanoni et al. 2018; Zolkos et al. 2013). Ruiz et al. (2014) suggested a minimum plot size of 500-600 m² for estimating the VOL, AGB, and BA, while Lombardi et al. (2015) recommended a minimum plot area of 500 m² for evaluating the forest indicators. Regardless of plot size and forest conditions, plot positional errors of up to 1 m had minor effects on prediction accuracy (Gobakken and Naesset 2009). The sample size is the most critical factor affecting the cost of field measurement. Using Monte Carlo random selection, Gobakken and Naesset (2008) found that the estimation accuracy of forest attributes showed only a minor decrease when the sample size was reduced by 75% or 50% from the original numbers of 50, 34, and 48. The results of Silva et al. (2017) showed that modeling with 63 plots could achieve accuracy comparable to that of traditional forest inventory methods, and the model accuracy gradually improved as the sample size increased. Strunk et al. (2012) suggested that the precision of mean estimates improves in proportion to √n as the sample size (n) increases and that it is not appropriate to extrapolate estimates from confidence intervals and the model's RMSE if the sample size is less than 75. In their study on the growing stock volume estimation of pine-dominated forests in Poland, Stereńczak et al. (2018) concluded that the model accuracy stabilized only when the number of sample plots reached 300 or more.
The sample size required for forest attribute estimation depends on various factors, including the extent of the study area, sample selection, complexity of the forest structure, modeling methodology, and sampling scheme. Generally, larger study areas require more sample plots (Xu et al. 2018; Ioki et al. 2010). Ensuring adequate coverage of the geospatial and feature spaces of the variables with sample plots can improve model performance (Junttila et al. 2013). Some researchers believed that modeling techniques had a larger impact on model accuracy than sampling methods. However, the results varied greatly. For example, Yang et al. (2019) suggested that the random forest imputation model was most effective with small sample sizes (fewer than 50 plots), while da Silva et al. (2020) reported that the ordinary least-squares method required fewer sample plots than the random forest (RF) method. The required sample size varies with the sampling method, and many previous studies have focused on this topic. Maltamo et al. (2011) found that LiDAR-assisted selection of field plots could improve the estimation accuracy of nonparametric models. Several studies have shown that using ALS data as prior information to stratify the forest before selecting samples could help reduce the required sample size while ensuring high accuracy (Hawbaker et al. 2009; Maltamo et al. 2011; Grafström and Ringvall 2013). Following this approach, the sample size in the operational Norwegian forest resource inventory was about 50 per stratum (Naesset 2015). However, the number of sample plots in most existing studies is small, typically fewer than 100 plots per stratum, which is insufficient to demonstrate the effect of sample size on the estimation accuracy of forest inventory attributes.
Heavy rainfall is common in tropical and subtropical regions, where frequent rains and fog can render airborne LiDAR data acquisition and field measurements difficult. In addition, trees in these areas generally grow rapidly, with fast-growing eucalyptus forest plantations, for example, showing annual height growth of about 5-8 m. Using LiDAR to stratify these forests before conducting a field campaign in a large-scale region can result in a long time interval between LiDAR data acquisition and field measurements, which may prevent LiDAR point clouds from accurately depicting the three-dimensional (3D) structures of the forest canopy and affect the regression relationships between LiDAR-derived metrics and field-measured attributes. Therefore, in practical applications, both airborne LiDAR data acquisition and field plot measurements are generally carried out simultaneously.
This study examines the impact of sample size on the accuracy of forest attribute estimation using airborne LiDAR data. We analyzed 1003 field plots across four forest types in a large subtropical study area to achieve the following specific objectives: (1) assess how sample size influences the accuracy of different forest attributes across different forest types through multivariate power models, (2) evaluate the effects of sample size on the airborne LiDAR-derived metrics and forest attributes at the plot level, and (3) assess the minimum sample size required for airborne LiDAR data to support large-scale subtropical forest resource inventories. Throughout this work, we use the terms "sample size" and "number of sample plots" interchangeably to refer to the number of field plots used to calibrate the model.

Study site
The study site covered the entire Guangxi Zhuang Autonomous Region in South China, spanning an area of 237.6 × 10³ km² and extending from 104°28′ to 112°04′E and from 20°54′ to 26°24′N (Fig. 1). In this study, the airborne LiDAR data acquisitions and field plot measurements were carried out separately over 3 years in three regions because of the financial allocations: the Nanning region (22.1 × 10³ km²), the eastern region (128.4 × 10³ km²), and the western region (87.1 × 10³ km²).
The study area is bisected by the Tropic of Cancer and lies in a subtropical monsoon climate zone, with an annual average temperature of 16.5-23.1°C and an average annual rainfall of 1080-2760 mm. The rainy season occurs from April to September, accounting for 70-85% of annual rainfall. The terrain is high in the northwest and low in the southeast; high-elevation mountains surround the area, while low-elevation river valleys, plains, coastal platforms, and hills lie in the center.
The study area covers three vegetation zones, namely, north tropical, south subtropical, and middle subtropical, from south to north. Seasonal rainforests, monsoon evergreen broad-leaved forests, and typical evergreen broad-leaved forests are the representative forest vegetation types in the study area. Coniferous, bamboo, and karst shrub forests can also be found in each vegetation zone. According to the 5th forest management inventory of Guangxi (2017-2020), Chinese fir forests (Cunninghamia lanceolata (Lamb.) Hook), Masson pine forests (approximately 90% Pinus massoniana Lamb., with the remainder being P. elliottii Engelmann and P. yunnanensis Franch), Eucalyptus plantations (mainly Eucalyptus urophylla S. T. Blake and E. grandis × urophylla), and broad-leaved forests (comprising a large number of tree species) account for 16.5%, 17.5%, 24.8%, and 41.2% of the study area, respectively. Most Chinese fir plantations are pure, even-aged stands, although there are also mixed forests with Masson pine or broad-leaved trees. Most of the Masson pine forests are natural forests, of which approximately 60% are pure, even-aged stands, and the remainder are mixed forests with broad-leaved trees. The broad-leaved forests are mainly natural mixed forests. Further details on the characteristics of the study area are provided by Li et al. (2023).

Field plot data
The field plots in the Nanning, eastern, and western regions were measured from October 2016 to January 2017, November 2018 to May 2019, and August 2019 to January 2020, respectively. The forests in the study area were categorized into four types according to the dominant tree species and species groups: Chinese fir, Masson pine, eucalyptus, and broad-leaved forests. A total of 1003 rectangular plots with a size of 30 m × 20 m were distributed in clusters over the study area, and each was subdivided into four sub-plots measuring 15 m × 10 m. All live trees with a DBH (1.3 m above the ground) ≥ 5 cm within each sub-plot were measured and recorded. Tree height was measured using a Vertex™ IV hypsometer (Haglöf, Långsele, Västernorrland, Sweden) for three average trees and the tallest tree in each sub-plot. The VOL was calculated using provincial species-specific allometric equations (Liao and Huang 1986), with BA and mean height as predictors. Table 1 provides the summary statistics for the 1003 field plots. Comprehensive details regarding plot installation, configuration, measurement procedures, and positioning are provided in Li et al. (2023).

LiDAR data
The LiDAR data were collected in Nanning and the eastern and western regions from October 2016 to April 2017, October 2018 to October 2019, and August 2019 to January 2020, respectively. The Riegl VQ-1560 and the Riegl VQ-1560i laser scanning systems (Riegl Laser Measurement Systems GmbH, Horn, Austria) were used to collect LiDAR data in all three regions to the same standards. The final average point density was 5.54 (± 2.14) points m⁻². The LiDAR survey flights, sensor parameters, and point cloud preprocessing method are described in detail in Li et al. (2023).
As in most previous studies (da Silva et al. 2020; White et al. 2017; Asner et al. 2012; Chen et al. 2012; Treitz et al. 2012; Tojal et al. 2019; Nilsson et al. 2017), this study used all echoes to extract 13 LiDAR-derived metrics. These metrics include the mean height of the point clouds (Hmean), the standard deviation and the coefficient of variation of the point height distribution (Hstdev and Hcv), the 95th height percentile (hp95), canopy closure (CC), the 50th and 75th density percentiles (dp50 and dp75), the mean of the leaf area density profile (LADmean) with its standard deviation (LADstdev) and coefficient of variation (LADcv) (Bouvier et al. 2015), and the mean of the vertical foliage profile (VFPmean) with its standard deviation (VFPstdev) and coefficient of variation (VFPcv) (Knapp et al. 2020). They can be categorized into three groups, namely the height-, density-, and vertical structure-variable groups, each of which depicts a different 3D structural aspect of the forest canopy. The forest attributes and LiDAR metrics of the field plots are available online in the ScienceDB repository (Li 2023).

Statistical analysis
Based on the total number of field plots for each forest type, we created a series of datasets with varying sample sizes using repeated random sampling. For large-scale airborne LiDAR-based forest attribute estimation, we assumed that at least 30 field plots were required. We therefore started with a sample size of 30 and increased it in steps of five (30, 35, …) until we reached the total number of plots. To account for the large random errors that can occur with random sampling, we performed 50 iterations of random sampling for each sample size, resulting in 50 sub-datasets for each sample size. For each of the 13 LiDAR metrics and four forest attributes (DBH, H, BA, and VOL), we calculated the mean and standard deviation at each sample size. We also examined the variation in these means and standard deviations across different sample sizes using the variation index VR_x of Eq. (1), where VR_x is the variation of the mean or standard deviation of the LiDAR metric or forest attribute, and x_max and x_min are the maximum and minimum of the mean or standard deviation of the LiDAR metric or forest attribute, respectively, over the different sample sizes for a forest type. In addition, a two-tailed t test was used to assess the statistical significance of the differences in the means of the LiDAR metrics or forest attributes between different numbers of plots and the total number of plots.
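The resampling scheme described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the study's code: the function names are hypothetical, sampling without replacement is an assumption, and because the exact form of Eq. (1) is not reproduced in this text, `variation_range` assumes the range is expressed relative to the maximum.

```python
import random
import statistics

def sample_sizes(total, start=30, step=5):
    """Sample sizes 30, 35, ... with the total number of plots always included last."""
    sizes = list(range(start, total, step))
    sizes.append(total)
    return sizes

def resample_stats(values, n, repeats=50, seed=0):
    """Average of the sub-sample means and SDs over `repeats` random draws of size n."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(repeats):
        sub = rng.sample(values, n)  # ASSUMPTION: drawn without replacement
        means.append(statistics.mean(sub))
        sds.append(statistics.stdev(sub))
    return statistics.mean(means), statistics.mean(sds)

def variation_range(xs):
    """ASSUMED form of VR_x: (x_max - x_min) / x_max, in percent."""
    return 100.0 * (max(xs) - min(xs)) / max(xs)

# Synthetic stand-in for one LiDAR metric measured on 255 plots.
_rng = random.Random(42)
metric = [_rng.gauss(13.0, 3.0) for _ in range(255)]
stats_by_size = {n: resample_stats(metric, n) for n in sample_sizes(len(metric))}
vr_mean = variation_range([m for m, _ in stats_by_size.values()])
vr_sd = variation_range([s for _, s in stats_by_size.values()])
```

Comparing `vr_mean` with `vr_sd` for a metric shows how much more the spread of a sub-sample depends on sample size than its mean does.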

Model calibration and validation
Over the past two decades, numerous studies have been conducted on various forest types and forest attributes (e.g., H, DBH, BA, VOL, and AGB), resulting in various estimation models (Zolkos et al. 2013; Latifi et al. 2015). These models include parametric regression and nonparametric approaches, with the primary goal of optimizing prediction accuracy by maximizing explained variability (e.g., R²), minimizing prediction error (e.g., RMSE), and reducing systematic bias for specific forest attributes, forest types, and study sites (Zolkos et al. 2013; Naesset et al. 2004; Hudak et al. 2008; Penner et al. 2013; White et al. 2017). In this study, our objective was to investigate the impact of sample size on estimation accuracy using simple and transparent models. Therefore, we focused on parametric models, specifically multivariate multiplicative power models, which are known for their flexibility (Hollaus et al. 2009).
Using three groups of 13 LiDAR-derived metrics mentioned earlier, and a rule-based exhaustive combination approach as described by Li et al. (2023), we obtained a total of 86 formulations of the multiplicative power model, each consisting of 2-5 variables, to facilitate the estimations of the DBH, H, BA, and VOL.
To identify the optimal model formulation for predicting forest attributes, we randomly selected 70% of the sample plots from each forest type to calibrate the model, while the remaining 30% were used for validation. Model calibration and validation were performed using the Gauss-Newton algorithm in Python (version 3.7). We evaluated the 86 model formulations and selected the best one based on the lowest rRMSE and largest R² values on the validation dataset. The optimal model formulation is presented in Table 2 for all forest attributes and types.
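As an illustration of the calibration step, the following is a minimal sketch of fitting a multiplicative power model, y = a0 · x1^a1 · x2^a2 · …, with Gauss-Newton iterations using only the Python standard library. The helper names, the log-log starting values, and the undamped update are assumptions made for this sketch; the study's own implementation is not reproduced here.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def normal_eq(J, r):
    """Least-squares solution of J d = r via the normal equations J'J d = J'r."""
    p = len(J[0])
    JtJ = [[sum(row[i] * row[j] for row in J) for j in range(p)] for i in range(p)]
    Jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(p)]
    return solve(JtJ, Jtr)

def predict(params, x):
    """Multiplicative power model: a0 * x1**a1 * x2**a2 * ..."""
    y = params[0]
    for a, xi in zip(params[1:], x):
        y *= xi ** a
    return y

def fit_power_model(X, y, iterations=50, tol=1e-10):
    """Gauss-Newton fit; all predictors and responses must be positive."""
    # ASSUMPTION: starting values come from least squares on the log-log form.
    J0 = [[1.0] + [math.log(v) for v in x] for x in X]
    beta = normal_eq(J0, [math.log(v) for v in y])
    params = [math.exp(beta[0])] + beta[1:]
    for _ in range(iterations):
        pred = [predict(params, x) for x in X]
        resid = [yi - pi for yi, pi in zip(y, pred)]
        # Jacobian: d/da0 = pred / a0, d/dak = pred * ln(xk)
        J = [[p / params[0]] + [p * math.log(v) for v in x] for p, x in zip(pred, X)]
        step = normal_eq(J, resid)
        params = [a + d for a, d in zip(params, step)]
        if max(abs(d) for d in step) < tol:
            break
    return params
```

In practice a damped variant (for example Levenberg-Marquardt) is usually more robust when the starting values are poor; the plain update is kept here only to mirror the algorithm named in the text.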
To evaluate the robustness of the best model, different numbers of plots were used for model calibration, and the model was validated using the leave-one-out cross-validation (LOOCV) approach with R² and rRMSE statistics. To reduce the random errors, we performed 50 repetitions for each sample size.
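A compact sketch of LOOCV with the two accuracy statistics follows. It assumes the common definitions (rRMSE as the RMSE divided by the observed mean, in percent; R² as one minus the ratio of residual to total sum of squares), and for brevity it uses a hypothetical single-predictor power model fitted on logarithms rather than the multivariate Gauss-Newton fit used in the study.

```python
import math
import statistics

def rrmse(obs, pred):
    """Relative RMSE, in percent of the observed mean (assumed definition)."""
    mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)
    return 100.0 * math.sqrt(mse) / statistics.mean(obs)

def r2(obs, pred):
    """Coefficient of determination: 1 - SSE / SST (assumed definition)."""
    mean = statistics.mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean) ** 2 for o in obs)
    return 1.0 - sse / sst

def fit_loglog(x, y):
    """Fit y = a0 * x**a1 by least squares on logarithms (a simplification)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = statistics.mean(lx), statistics.mean(ly)
    a1 = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a0 = math.exp(my - a1 * mx)
    return a0, a1

def loocv(x, y):
    """Leave each plot out once, refit on the rest, and predict the held-out plot."""
    preds = []
    for i in range(len(x)):
        a0, a1 = fit_loglog(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a0 * x[i] ** a1)
    return rrmse(y, preds), r2(y, preds)
```

Calling `loocv` on each of the 50 random sub-samples of a given size yields the distribution of rRMSE and R² whose mean and spread are compared across sample sizes.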

Determination of minimum and maximum sample size
For a forest attribute of a forest type, the coefficient of variation of the rRMSE (CV_rRMSE) of the predictive model was calculated from the 50 repeated model calibrations for each sample size. We found that the relationship between CV_rRMSE and the sample size was best fitted by the logistic regression model of Eq. (2), where n is the sample size and a0, a1, and a2 are the model parameters. As the sample size increases, CV_rRMSE tends to decrease.
Table 2 Best model formulations for estimating four forest attributes across four forest types using the total number of plots (a0, a1, a2, …, and a5 are model parameters)
As the sample sizes for all four forest types in this study are large enough, we believe that the model accuracy obtained with the total number of plots is the highest accuracy achievable in the study area. Accordingly, if the CV_rRMSE of an estimation model is less than 10%, the model performance has essentially stabilized, and we consider the corresponding sample size to be the minimum sample size required for estimating this attribute.
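The stabilization criterion can be illustrated with a short sketch that computes the coefficient of variation of the rRMSE over repeated calibrations and then looks up the first sample size at which it stays below the threshold. Note the simplification: the study reads the threshold off a fitted logistic curve, whereas this sketch, with hypothetical function names and made-up CV values, works directly on the raw values.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def first_stable_size(cv_by_size, threshold=10.0):
    """Smallest sample size from which the CV stays below the threshold
    for that size and every larger one; None if the largest size fails."""
    stable = None
    for n in sorted(cv_by_size, reverse=True):
        if cv_by_size[n] < threshold:
            stable = n
        else:
            break
    return stable

# Hypothetical CV values (percent) keyed by sample size, for illustration only.
example_cv = {30: 18.0, 35: 12.0, 40: 9.5, 45: 10.5, 50: 9.0, 55: 8.0}
minimum_size = first_stable_size(example_cv, threshold=10.0)
```

Requiring the CV to stay below the threshold for all larger sizes avoids picking a sample size that dips under 10% only by chance, which is one reason a smooth fitted curve is a natural choice for this step.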

Influence of sample size on estimation accuracy
The results showed that the mean rRMSE of the estimation models with a sample size of 30 was clearly higher than that of the models with the total number of plots, while the opposite was true for the mean R² (Table 3). This was true for all attributes of all forest types. Table 3 also shows that the rRMSEs of the four attribute estimation models for broad-leaved forests were considerably larger than those for the other forest types, while the R² values were correspondingly smaller.
As the sample size increased from 30 to the total number of plots, the mean rRMSE of the VOL estimation of the Chinese fir forests decreased by 7.89%, and the mean R² increased by 16.89%. Similarly, for the Eucalyptus forest plantations, the mean rRMSE decreased by 13.86%, and the mean R² increased by 13.40%, as shown in Table 3. These results demonstrated that the effects of the sample sizes on estimation model performance varied across forest types and attributes. Furthermore, as the sample size increased from 30 to the total number of plots, the mean rRMSE of all models decreased while the mean R² increased for all forest types, as illustrated in Fig. 2. These trends indicated that the accuracy of all models improved as the sample size increased.
When the sample size was small, there was a larger variation in rRMSE and R² among the 50 repetitions of model calibration and validation. The mean rRMSE was also large, while the mean R² was small. However, as the sample sizes increased, the variations in rRMSE and R² decreased, and the mean rRMSE decreased while the mean R² increased (Fig. 3). For instance, in the VOL estimation of the Chinese fir forests, with a sample size of 30, the mean rRMSE and R² were 22.04% and 0.756, respectively, ranging from 14.27% to 32.63% and from 0.419 to 0.886, respectively. With a sample size of 255, the mean rRMSE and R² were 19.69% and 0.824, respectively, ranging from 18.92% to 19.93% and from 0.816 to 0.839, respectively. For all models, the coefficients of variation of the rRMSE and R² decreased with increasing sample size.
The rRMSE was used as a criterion to evaluate the model performance during the model validation procedure. From the 50 repeated model calibrations for each sample size, the ten best and ten worst models were identified based on their rRMSE values. We found two interesting phenomena in the performance of the ten best and ten worst models (Fig. 3). Firstly, as the sample size increased, the rRMSE of the ten best models showed an increasing trend, and the R² showed a decreasing trend; that is, the accuracy of the ten best models tended to decrease. However, their accuracy was consistently higher than that of all 50 models, and the performance of the ten worst models was the opposite of that of the ten best. Secondly, the accuracy of the ten best models with small sample sizes was always higher than that of all 50 models with large sample sizes.
The analysis of the variations in the target variables (field-measured attributes) and LiDAR variables (metrics) among the ten best and ten worst models revealed several trends. Firstly, the mean standard deviations (SDs) of the target variables in the ten best models were consistently lower than those of all 50 models for all sample sizes, while the ten worst models showed the opposite result. Secondly, for most sample sizes, the ten best models had at least one LiDAR variable with a mean SD lower than that of all 50 models, while the ten worst models showed the opposite trend. Lastly, as the sample size increased, the SDs of both the target variable and the LiDAR metric in the ten best and ten worst models tended to approach those of all 50 models (Fig. 4).
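The comparison of the ten best and ten worst repetitions amounts to ranking the 50 calibrations by rRMSE and contrasting the spread of the target variable in each group. A minimal sketch, assuming each repetition is stored as a dictionary with hypothetical keys `rrmse` and `target_sd`, and using synthetic values only:

```python
import statistics

def split_best_worst(runs, k=10):
    """Return the k runs with the lowest rRMSE and the k runs with the highest."""
    ordered = sorted(runs, key=lambda r: r["rrmse"])
    return ordered[:k], ordered[-k:]

def mean_target_sd(runs):
    """Mean standard deviation of the target variable over a group of runs."""
    return statistics.mean([r["target_sd"] for r in runs])

# Synthetic runs in which a larger target SD goes with a larger rRMSE.
runs = [{"rrmse": 15.0 + 0.2 * i, "target_sd": 40.0 + i} for i in range(50)]
best, worst = split_best_worst(runs)
summary = (mean_target_sd(best), mean_target_sd(runs), mean_target_sd(worst))
```

With real repetitions, the same three numbers computed per sample size show whether the best-performing sub-samples are simply the ones with less variable plots.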

Influence of sample size on the LiDAR variables and forest attributes
The means of the LiDAR metrics were consistent across different sample sizes (30, 35, …, the total number of plots), with the variations (VR_mean) ranging from less than 1.0% to a maximum of 4.81%, found for LADmean in Chinese fir forests (Table 4). For example, in the Chinese fir forests, the mean hp95 varied by only 0.91% among the 39 sample sizes, with the maximum and minimum means being 12.89 and 12.78, respectively. Variations in the eucalyptus and broad-leaved forests were slightly smaller than those in the Chinese fir and Masson pine forests. The paired t tests also confirmed that there were no statistically significant differences (p > 0.05) in the means of all LiDAR-derived metrics between different sample sizes and the total number of plots in all forest types. The standard deviations (SDs) of the LiDAR metrics did not vary much among the different numbers of plots in the four forest types. However, their variations were larger than those of the means. Overall, small variations in the SDs of LiDAR metrics were found in the eucalyptus forests, followed by the broad-leaved and Masson pine forests, and a large variation was found in the Chinese fir forests (Table 4). As the sample size increased, the SDs of all 13 LiDAR metrics showed decreasing trends, and their relative differences (ΔSD) between the different sample sizes and the total number of plots also showed decreasing trends in all four forest types (Fig. 5). The fastest decreases were found in the eucalyptus and broad-leaved forests; once the sample sizes reached 110 and 115, respectively, ΔSD was less than 1% for most LiDAR metrics. In contrast, in the Masson pine and Chinese fir forests, ΔSD was less than 1% only when the sample sizes reached 145 and 160, respectively.
The means of the field-measured attributes among the different sample sizes were also close to each other in all forest types, with their variations not exceeding 1.5%, and there were no statistically significant differences (p > 0.05) between the means of forest attributes of the different sample sizes and those of the total number of plots in all forest types. The variations of the means and the standard deviations of the forest attributes across the different sample sizes were similar to those of the LiDAR metrics, but their variations were significantly smaller than those of the latter. As the sample size increased, the relative differences in the SDs of forest attributes between the different sample sizes and the total number of plots in all forest types also decreased rapidly (Fig. 6).
The trends in the LiDAR-derived metrics and field-measured attributes with changes in sample size can be summarized as follows. Firstly, the means of LiDAR metrics and forest attributes were close to each other among different sample sizes, with no statistically significant differences (p > 0.05) observed between the different sample sizes and the total number of plots. While their standard deviations were also close to each other, the variations in the standard deviations were clearly larger than those in the means. Secondly, the variations in the means and standard deviations of the LiDAR metrics in eucalyptus and broad-leaved forests were slightly lower than those in Chinese fir and Masson pine forests. Lastly, as the sample sizes increased, the relative differences in the means and standard deviations of the LiDAR metrics and field-measured attributes between different sample sizes and the total number of plots decreased rapidly. However, the decreasing trends varied across forest types.

The minimum and maximum sample sizes required for forest attribute estimation
The logistic regression was used to calibrate the relationship between the coefficient of variation of the rRMSE (CV_rRMSE) of forest attribute estimation models and the sample size. The R² of the calibrated logistic regressions ranged from 0.943 to 0.991, with all rRMSEs below 15%, as shown in Fig. 7. These logistic regressions show that the CV_rRMSE of all forest attribute estimation models decreases with increasing sample size.
Based on the logistic regression models shown in Fig. 7, we were able to determine the appropriate sample size for forest attribute estimation for a given forest type. Specifically, we set the CV_rRMSE to either 10% or 5%, which gave the minimum sample size required to achieve acceptable accuracy and the maximum sample size required to achieve stable accuracy, respectively. The resulting sample sizes varied across forest attributes and forest types. See Table 5 for the minimum and maximum sample sizes required for different forest attributes across all four forest types.
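Reading the two thresholds off a fitted curve can be sketched as a simple search along the 5-plot grid of sample sizes. The curve form and parameter values below are purely hypothetical placeholders chosen so that the curve decreases with n; they are not Eq. (2) or the fitted values of this study, and the function names are likewise illustrative.

```python
import math

def logistic_cv(n, a0, a1, a2):
    """HYPOTHETICAL logistic-type decay of CV_rRMSE (percent) with sample size n."""
    return a0 / (1.0 + a1 * math.exp(a2 * n))

def size_for_threshold(curve, threshold, start=30, step=5, limit=1000):
    """First sample size on the grid at which the curve falls below the threshold."""
    n = start
    while n <= limit:
        if curve(n) < threshold:
            return n
        n += step
    return None

def curve(n):
    # Made-up parameters, for illustration only.
    return logistic_cv(n, a0=60.0, a1=0.5, a2=0.03)

minimum_size = size_for_threshold(curve, threshold=10.0)  # acceptable accuracy
maximum_size = size_for_threshold(curve, threshold=5.0)   # stable accuracy
```

Any monotonically decreasing fitted function can be passed as `curve`, so the same search applies whatever the exact parameterization of the logistic model.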
The minimum sample size varied significantly across different forest attributes and forest types. The VOL estimation required the largest minimum sample size, ranging from 55 to 110, and the DBH estimation required the smallest minimum sample size, ranging from 50 to 70. For the estimation of all four forest attributes, the largest minimum sample sizes were required for the Chinese fir forest, ranging from 60 to 110, and the smallest for the broad-leaved forest, ranging from 55 to 70. If four forest attributes were estimated simultaneously, the minimum sample sizes required for fir, pine, eucalyptus, and broad-leaved forests were 110, 80, 85, and 70, respectively (Table 5). The maximum sample sizes also varied considerably across different forest attributes and forest types; however, their ranges of variation were smaller than those in the minimum sample sizes.
Most differences in rRMSEs between the models calibrated by the minimum sample size and the total number of plots were less than 5%, with a maximum difference of 6.12%. The differences between the rRMSEs of the models calibrated by the maximum sample sizes and those calibrated by the total number of plots were slight, mostly less than 1%, with a maximum difference of not more than 3%. These results indicated that increasing the number of sample plots beyond the maximum sample size would have minimal impact on model accuracy.

Discussion
The cost and efficiency of field plot measurements, as well as the accuracy of forest attribute estimation, are crucial factors that need to be taken into account in airborne LiDAR forest applications. In this study, we have investigated the effect of sample size on the performance of forest attribute estimation models and determined the minimum sample sizes required to estimate four forest attributes for four forest types in a large subtropical region. No previous study has been reported that combines such a large study area, such a complex forest context, such a large number of sample plots, and so many forest attributes. The findings of this study have broad reference value for optimizing technology schemes for airborne LiDAR forest applications.
Fig. 6 The relative differences of the standard deviations (ΔSD, %) of field-measured attributes between the different sample sizes and the total number of plots showed decreasing trends in all four forest types: Chinese fir (a), Masson pine (b), eucalyptus (c), and broad-leaved (d) forests

Why sample size affects the estimation accuracy
Some previous studies have demonstrated that increasing the sample size can improve the accuracy of airborne LiDAR-based forest attribute estimations (Gobakken and Naesset 2008; Junttila et al. 2013), which is supported by our study. However, to the best of our knowledge, few studies have addressed the mechanism by which sample size affects estimation accuracy.
This study revealed that if sample plots are installed based on unreliable historical forest inventory data (which is a common method in large-scale airborne LiDAR applications) and the sample size is small, the uncertainty in the relationship between field-measured attributes and LiDAR variables in the sample plots is high. This results in a large variation in the model accuracy (rRMSE) and the explanatory power (R²) of the model variables, with a large mean rRMSE and a small mean R², indicating low model accuracy. As the sample size increases, the sample plots become more representative of the population, and the uncertainty in the relationship between field-measured attributes and LiDAR variables gradually decreases. The variations of rRMSE and R² gradually decrease, the mean rRMSE decreases, the mean R² increases, and the model accuracy improves (Fig. 3). We found that the improvement in the accuracy of forest attribute estimation with increasing sample size was due to a gradual decrease in the standard deviation of the target attributes and LiDAR variables. This finding also explains why LiDAR-based stratified selection of plots can reduce the sample size, and it is instructive for other plot selection methods, such as target sampling, and for forest attribute estimation following LiDAR-based stand stratification. Our study suggests that high model accuracy can be achieved even with small sample sizes (e.g., 30) if the sample plots are selected properly (Fig. 3). Thus, model accuracy is closely related to sample selection (Maltamo et al. 2011), and increasing the sample size is not always beneficial for improving model accuracy. The findings of this study provide useful guidance for the efficient and accurate use of LiDAR data for forest attribute estimation.

Heterogeneity of the 3D canopy structure and the effect of sample size
The effect of sample size on the accuracy of forest attribute estimation varies by forest type. In broad-leaved forests, increasing the sample size led to a rapid decrease in rRMSE and an increase in R² for all attribute estimations, indicating a rapid improvement in model accuracy. In Chinese fir forests, the changes in rRMSE and R² were slow with increasing sample size, leading to a slow improvement in model accuracy. The changes in rRMSE and R² for the four attribute estimations in Masson pine and eucalyptus forests fell between those in Chinese fir and broad-leaved forests (Fig. 2). As a result, the minimum sample size required for forest attribute estimation was smallest in broad-leaved forests, largest in Chinese fir forests, and intermediate for Masson pine and eucalyptus forests (Table 5). Differences in the 3D canopy structure due to differences in the biophysical characteristics of different forest types are likely to be the main contributors to these results. Excluding the shrub and herbaceous layers, the tree layers of Chinese fir, Masson pine, eucalyptus, and broad-leaved forests in this study have three, four, three, and five classes of vertical forest structures, respectively (Zhou and Li 2023). Two classes of vertical forest structures (VFS) (essentially equal to single- and multi-storied forests) were obtained for all forest types following a systematic clustering analysis using vertical structure parameters. The proportions of these two VFS classes in Chinese fir, Masson pine, eucalyptus, and broad-leaved forests are 48.6% and 51.4%, 57.3% and 42.7%, 78.8% and 21.2%, and 16.7% and 83.3%, respectively (unpublished manuscript by Li et al.). Multi-storied forests predominate in broad-leaved forests, whereas single- and multi-storied forests are almost equally represented in Chinese fir forests. Chinese fir forests have the highest heterogeneity and the greatest variation in forest structure, and they require the maximum number of sample plots to accurately represent the population. Broad-leaved forests have the lowest heterogeneity and the least variation in forest structure, and a small number of sample plots are likely to be representative of the population. Broad-leaved forests nonetheless have low accuracy in estimating forest attributes: they include a large number of tree species with large differences in stem- and stand-form factors and forest structure, and the statistical relationship between the LiDAR variables and the measured attributes is relatively weak. The heterogeneities in the vertical forest structure of Masson pine and eucalyptus forests are intermediate between those of Chinese fir and broad-leaved forests, as is the minimum sample size required.
The effect of sample size on estimation accuracy varies across different forest attributes (Fig. 7). The minimum sample size required is the largest for the VOL estimation in the Chinese fir forest, followed by that for the Masson pine and eucalyptus forests. In contrast, the DBH, H, and BA estimations required relatively smaller minimum sample sizes, which were similar to each other (Table 5). Moreover, the accuracy of the VOL estimation was generally lower than that of the DBH, H, and BA estimations (Table 3). We hypothesize that this might be attributed to the variation in forest attributes. The VOL, which is related to stand height and density, is a two-dimensional variable with the largest coefficient of variation, while the DBH, H, and BA are one-dimensional variables with small coefficients of variation (Table 1). It is evident that a large number of sample plots are required to accurately represent a population with highly variable forest attributes. However, the sample sizes required for the different forest attribute estimations in broad-leaved forests do not comply with the above assumptions. Therefore, additional studies are needed to examine the effect of sample size on the accuracy of different forest attribute estimations.

Determining the minimum and maximum sample sizes
The minimum sample size required for accurate estimation of forest attributes varies significantly depending on various factors, including the forest attribute itself, the forest type or tree species, the extent of the study area, the complexity of the forest context, the modeling methods, and the sampling strategies. Studies by da Silva et al. (2020) and Stereńczak et al. (2018) suggested that a minimum of 63 and 300 field plots were necessary for VOL estimations in eucalyptus plantations and Scots pine-dominant forests, respectively. However, existing studies for airborne LiDAR-based forest attribute estimation rarely exceed 200 field plots (Fassnacht et al. 2014).
Airborne LiDAR forest attribute estimations need to meet an acceptable level of accuracy or bias, as highlighted by previous studies (Jakubowski et al. 2013; Montagnoli et al. 2015). Some studies define acceptable accuracy as an rRMSE or prediction error below a given value (da Silva et al. 2020; Silva et al. 2017; Lin et al. 2016). However, the present study found that the estimation accuracy varied with forest attributes and forest types (Table 3 and Fig. 2), making it difficult to establish a single criterion for acceptable accuracy across different forest attributes and forest types. Table 3 and Fig. 2 also demonstrated that the larger the sample size, the higher the estimation accuracy. As the number of sample plots in this study was large enough, we believe that the accuracy of the forest attribute estimations obtained by using all sample plots was the highest or close to the highest. Based on this consideration, we consider it scientific and reasonable to determine the minimum and maximum sample sizes based on the coefficient of variation of the rRMSE of the estimation model. Therefore, this study presented the minimum number of sample plots required for operationalizing ALS-based large-scale forest inventory in subtropical regions, and these results are expected to provide a reference for developing technical schemes.
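The selection rule based on the coefficient of variation of the rRMSE can be sketched as follows. This is a simplified illustration and not the procedure used in this study: the study fits a logistic curve to the coefficient of variation before applying the threshold, whereas the sketch applies the 10% threshold directly to the empirical values. The function names and the rRMSE values are hypothetical and serve only to make the example runnable.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) of a list of rRMSE values."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def minimum_sample_size(rrmse_by_n, threshold=10.0):
    """Smallest sample size from which the CV of the rRMSE stays below
    the threshold for that size and all larger sizes; None if there is none."""
    sizes = sorted(rrmse_by_n)
    answer = None
    # Walk from the largest size downwards and stop at the first violation.
    for n in reversed(sizes):
        if cv_percent(rrmse_by_n[n]) >= threshold:
            break
        answer = n
    return answer


# Illustrative rRMSE values (%) of repeated models per sample size (made up).
rrmse_by_n = {
    30: [18.0, 25.0, 31.0, 22.0, 27.0],
    60: [20.0, 24.0, 27.0, 22.0, 25.0],
    90: [21.0, 23.0, 25.0, 22.0, 24.0],
    120: [21.5, 22.5, 23.5, 22.0, 23.0],
}
print(minimum_sample_size(rrmse_by_n))
```

Requiring the threshold to hold for all larger sizes avoids selecting a sample size at which the coefficient of variation dips below 10% only by chance.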
Using a rule-based exhaustive approach, we developed multiplicative power models for estimating four forest attributes across four forest types based on 13 LiDAR-derived metrics. However, there are numerous other models for estimating forest attributes using LiDAR data, and additional modeling tests are necessary to understand the impact of sample size on estimation accuracy. Additionally, a stratified approach based on species and maturity class (e.g., young and mature forests) is an effective means of enhancing accuracy (Hauglin et al. 2021), and exploring the influence of sample size on estimation accuracy following such stratification would also be worthwhile.
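For readers unfamiliar with multiplicative power models, the following minimal sketch fits a single-predictor version, y = a·x^b, by ordinary least squares on log-transformed values and reports the rRMSE and R² on held-out data. It is an illustration under stated assumptions, not the modeling procedure of this study: the models in the study combine several LiDAR-derived metrics selected by a rule-based exhaustive search, whereas the sketch uses one synthetic predictor. The function names, the synthetic data, and the fitting method are hypothetical choices.

```python
import math
import random


def fit_power_model(x, y):
    """Fit y = a * x**b by least squares on log(x) and log(y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


def rrmse_and_r2(x, y, a, b):
    """Relative RMSE (% of the observed mean) and R2 of the fitted model."""
    pred = [a * v ** b for v in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((o - p) ** 2 for o, p in zip(y, pred))
    ss_tot = sum((o - mean_y) ** 2 for o in y)
    rrmse = math.sqrt(ss_res / len(y)) / mean_y * 100.0
    return rrmse, 1.0 - ss_res / ss_tot


rng = random.Random(1)
# Synthetic data: a height-like predictor and a volume-like response with
# multiplicative noise (assumed true model: y = 2.0 * x**1.5).
x = [rng.uniform(5.0, 25.0) for _ in range(400)]
y = [2.0 * v ** 1.5 * math.exp(rng.gauss(0.0, 0.2)) for v in x]

a, b = fit_power_model(x[:200], y[:200])            # training plots
rrmse, r2 = rrmse_and_r2(x[200:], y[200:], a, b)    # validation plots
print(f"a={a:.2f}, b={b:.2f}, rRMSE={rrmse:.1f}%, R2={r2:.2f}")
```

Fitting in log space is convenient but can bias back-transformed predictions; the sketch applies no bias correction.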

Conclusions
In this study, we investigated the impact of the number of field plots on the accuracy of airborne LiDAR-based estimations of forest inventory attributes. Subsequently, we determined the minimum sample size required to estimate the four forest attributes across four forest types in a large subtropical region. As the sample size increased, the estimation accuracy improved for all forest attributes and forest types. We preliminarily confirmed that this could be attributed to a decrease in the standard deviation of the target attributes and LiDAR variables with increasing sample size. We observed that the sample size had a variable impact on the estimation accuracy for different forest attributes and forest types, and the minimum sample size varied significantly across forest attributes and forest types. We believe that heterogeneity in the forest structure and variability in the field-measured attributes were the main factors affecting the required minimum sample size. The higher the heterogeneity of the vertical forest structure (e.g., Chinese fir forests), the larger the minimum sample size required; conversely, the lower the heterogeneity (e.g., broad-leaved forests), the smaller the minimum sample size required. The smaller the variation in field-measured attributes (e.g., DBH), the smaller the minimum sample size needed; conversely, the greater the variation (e.g., VOL), the larger the minimum sample size needed.
The findings of this study will be useful in optimizing the technical schemes to improve the cost-effectiveness of operational airborne LiDAR-assisted forest resource inventory in large subtropical areas. However, the conclusions of this study still need to be validated with other predictive models, such as machine learning models. Additionally, the minimum sample size required needs to be further studied when using LiDAR data for assisted sampling. Similarly, if stratification-based estimations are applied using tree species and age classes, vertical structure, etc., additional research will be required to determine the minimum sample size.

This study is part of the project "the Fifth Forest Management Inventory of Guangxi Zhuang Autonomous Region (5th FMI-GX, 2017-2020)", China. The airborne LiDAR data acquisition and preprocessing and the field data measurement were funded by the Finance Department of the Guangxi Zhuang Autonomous Region. The authors would like to express their sincere gratitude to Chengling Yang and Yao Liang from the Guangxi Forest Inventory and Planning Institute (FIPI-GX) and the 120 field crews who worked on the field measurements. The authors also thank Guangxi 3D Remote Sensing Engineering Technology Co., Ltd.; Feiyan Aero Remote Sensing Technology Co., Ltd.; Zhongke Remote Sensing Technology Group Co., Ltd.; and Guangzhou Jiantong Surveying and Mapping Geographic Information Technology Co., Ltd., which were responsible for the acquisition and preprocessing of the airborne LiDAR data utilized herein. We acknowledge the anonymous reviewers and the editor for their insightful suggestions.

Fig. 1 Study area and distribution of the field plots. a Geographic location of the study area in China, b distribution of plot clusters in three regions in the study area, and c locations of field plots in a cluster

Fig. 4 displays the trends in the SDs of the measured VOL and the five LiDAR variables used for the VOL estimation in the Masson pine forest across the ten best and ten worst models and all 50 models. According to Figs. 3 and 4, we can infer that the SDs of the target variables and LiDAR variables in the samples are critical factors in determining the model accuracy. If the SDs are small, the accuracy is high; otherwise, the accuracy is low.

Fig. 2 The trends of the mean rRMSE and mean R² of the estimation models of four forest attributes with the increase of sample size: Chinese fir (a), Masson pine (b), eucalyptus (c), and broad-leaved (d) forests

Fig. 3 Boxplots of rRMSE (a) and R² (b) of all 50 models and the ten best and ten worst models of the VOL estimation of the Masson pine forest, showing decreasing variation in rRMSE and R² with increasing sample size

Fig. 4 Trends of the standard deviations (SDs) for the measured VOL and LiDAR-derived metrics used to estimate VOL of the Masson pine forest as the sample size increased in all 50 models and the ten best and ten worst models: measured VOL (a), Hmean (b), Hcv (c), CC (d), dp50 (e), and LADmean (f)

Fig. The relative differences of the standard deviations (ΔSD, %) of 13 LiDAR-derived metrics between the different sample sizes and the total number of plots showed decreasing trends in all four forest types: Chinese fir (a), Masson pine (b), eucalyptus (c), and broad-leaved (d) forests

Fig. 7 Logistic regression curves showing the relationships between the CV of the rRMSE of the forest attribute estimation models and the sample sizes for four forest types: Chinese fir (a), Masson pine (b), eucalyptus (c), and broad-leaved (d) forests

Table 3 Mean rRMSE and R² of the four attribute estimations for the four forest types on the validation dataset with a sample size of 30 and the total number of plots, respectively

Table 4 Variation in the mean (VRmean, %) and standard deviation (VRSD, %) of LiDAR-derived metrics among the different numbers of plots in four types of forests

Forest type Statistic hp95 Hmean Hstdev Hcv CC dp50 dp75 LADmean LADsdev LADcv VFPmean VFPsdev VFPcv
larger than those observed for the means. The variations of the SD (VRSD) of most LiDAR metrics were less than 5.0%; the largest variation, 16.72%, was found for LADmean in Chinese fir forests.

Table 5 Minimum (Min) and maximum (Max) numbers of plots required for estimating four forest attributes across four forest types to achieve acceptable and stable accuracy, respectively