Considering all species in each dataset together, the clumping factor accounted for 70% and 76%, respectively, of the reduction of deviance in datasets 1 and 2. This indicated that dispersal limitation was the major determining factor of species distribution at the scale of the study presented here.
There were, however, differences between species and datasets in the relative contribution of the factors tested. The lower contribution of the canopy disturbance factor in dataset 2 was probably a consequence of the absence of pioneer species from that dataset. Notably, a number of pioneer species in dataset 1 (Table 1) appear more constrained by the degree of canopy disturbance than by dispersal limitation (Cecropia obtusa Trécul, Goupia glabra Aubl., Inga cayennensis Sagot ex Benth., Inga umbellifera (Vahl) Steud., Loreya mespiloides Miq., ...).
Consistent with previous findings (Condit et al. 2000), most species showed a significant degree of spatial correlation. OddClump, which captures this spatial dependency, appears to be consistent between the two datasets despite their fairly different spatial sampling patterns (Pearson’s correlation coefficient r = 0.81; P < 0.001; Bartlett Chi square test, n = 13). For only one species (Carapa procera DC.) was spatial correlation detected in dataset 2 but not in dataset 1.
This large effect of spatial correlation may be related to the size of stems targeted in this study, and it will also depend on the elementary quadrat size (5 × 5 m in the present study). Spatial correlation is expected to be stronger for saplings than for trees, as population thinning gradually weakens the pattern originating from the seed dispersal kernels of individual trees. Hence, the relative importance of spatial correlation and habitat specialisation is likely to differ between saplings and adults. It will also be affected by the overall degree of disturbance.
Introducing a clumping factor in the logistic model reduced the proportion of “significant” species-habitat associations from 74% to 42%, thereby strongly limiting the number of spurious associations. In one case of very restricted distribution (Duguetia yeshidan Sandwith, a treelet found in two plots only), introducing this clumping factor revealed a sensitivity to canopy openness (Table 1) that was otherwise missed (data not shown).
Spatial aggregation of conspecific stems may occur at different spatial scales and may directly reflect dispersal limitation, such as clumping of saplings around a mother tree, or larger aggregates known to occur for many tropical species (Traissac 2003). Aggregation may, however, also be induced by latent environmental factors. Statistically significant clumping can also be a consequence of imprecise or ill-coded predictors of the environmental factors included in the model. For instance, C. obtusa is considered to be an extremely efficient disperser: its seeds are dispersed by bats and may survive for many years in the soil until a canopy gap occurs, at which point they germinate. They are extremely abundant and widespread (de Foresta and Prévost 1986). We found that the distribution of C. obtusa was largely determined by canopy disturbance (Table 1), as expected, but that the species still showed a significant degree of clumping in our model. Since the species tends to be clumped in canopy gaps, this spatial correlation may be a consequence of unmapped natural gaps, which were not included in the model. The presence of a group of stems not associated with a recorded disturbance zone will increase the deviance captured by the clumping factor. Mapping and coding of disturbance may also be prone to errors, even in the logged-over areas. Consequently, spatial correlation might also have been inflated as a consequence of inaccurate canopy disturbance mapping.
The final rate of positive species × habitat association was 20% in dataset 1 and 67% in dataset 2. The latter was characterised by a higher sampling pressure and a stronger spatial coherence (four large square plots covering 25 ha in total included in a 40 ha compact area, Fig. 1 in the Electronic supplementary material in Molino and Sabatier 2001). The true percentage of species-habitat association in dataset 1 is most probably much larger than the percentage reported here.
For a given area sampled, the statistical power of the test, i.e. the probability of detecting an existing species × habitat association, will increase with species abundance. This is reflected by the highly significant negative rank correlation between species abundance and PEnv (Spearman’s rho = −0.401; P < 0.001) in dataset 1. It may be true that more common species are more frequently associated with particular habitats, though there is no definite ecological argument why this should be the case. By including species with as few as 40 stems tallied in 5 ha in our analysis, we accepted a high risk of not rejecting H0 even though it was false (type II error).
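The dependence of power on abundance can be illustrated with a deliberately simplified sketch. It is not the logistic model with clumping factor used in this study: it assumes independent stems (which the clumping results above show is unrealistic) and a one-sided exact binomial test of whether a species occurs in a habitat more often than the habitat's share of the sampled area. The habitat share (30%), the true share of stems in that habitat (40%) and the function names are hypothetical values chosen for illustration only.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

def critical_count(n, p_null, alpha):
    """Smallest stem count k such that P(X >= k | H0) <= alpha."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += binom_pmf(k, n, p_null)
        if tail > alpha:
            return k + 1
    return 0

def power(n, p_null, p_true, alpha=0.05):
    """Probability of rejecting H0 when the true habitat share is p_true."""
    return upper_tail(critical_count(n, p_null, alpha), n, p_true)

# Hypothetical setting: the habitat covers 30% of the area,
# but 40% of the species' stems truly occur in it.
HABITAT_SHARE = 0.30
TRUE_SHARE = 0.40
for n_stems in (40, 160, 640):
    print(n_stems, round(power(n_stems, HABITAT_SHARE, TRUE_SHARE), 3))
```

Under these assumptions the detection probability rises steeply between 40 and 640 stems, which is the qualitative point made above; the absolute values should not be read as estimates of power in the present study.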
Everything else being equal, the power of the statistical test will also increase with the area sampled. Higher sampling pressure therefore also contributed to the higher discovery rate in dataset 2. Positive associations detected in dataset 1 were consistently detected in dataset 2 for the 14 species common to the two datasets (columns PSoil and PDist in Tables 1 and 2). However, the lower sampling intensity in dataset 1 did limit the power of the analysis, and half of the associations detected in dataset 2 were missed in dataset 1 (Tables 1 and 2): only five of the ten species found to be preferentially associated with a particular habitat in dataset 2 were also detected in dataset 1. For some species, only one significant environmental factor was detected in dataset 1 while both were found to be significant when using dataset 2 (e.g. Tachigali melinonii (Harms) Zarucchi & Herend.).
Finally, as we applied a multiple test correction to maintain a family-wise type I error rate of 5%, we further increased the type II error rate, and more drastically so for dataset 1, in which more species were tested.
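The cost of such a correction can be sketched as follows. The text does not state which procedure was applied, so Holm's step-down method is used here purely as an assumed example of a correction that controls the family-wise error rate; the function name and the P values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns one reject/keep flag per test,
    controlling the family-wise type I error rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest P value is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger P values are retained as well
    return reject

# Hypothetical P values: the same test (P = 0.03) is significant in a
# family of two tests but not in a family of four.
print(holm_reject([0.001, 0.03]))
print(holm_reject([0.001, 0.04, 0.03, 0.2]))
```

The more tests in the family, the smaller the threshold each P value has to pass, which is why the loss of power is expected to be greater in the dataset with more species tested.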
As mentioned above, the accuracy of the spatial predictors, as well as their precision, might have been limiting. The large range of dbh and the rather long time since logging might have blurred the floristic changes that have occurred following logging.
Despite the limitations discussed above, this study clearly indicates that ecological determinism applies to the sapling distribution of a large proportion of species in the Paracou forest.
In addition to having different optimum environmental settings, species clearly exhibited a range of niche breadths, as they were more or less strongly affected by the environmental factors examined here; some abundant species were apparently unaffected by either canopy openness or drainage regime.
Similar conclusions have been reached in a series of studies in the tropics. A study conducted in the Barro Colorado Island 50 ha plot (Harms et al. 2001) found that 64% of the 171 most abundant species showed a significant association (either positive or negative) with a particular habitat. In that study, all stems above 1 cm dbh were censused. The rate of species-habitat association detected was remarkably high given that habitats were rather crudely defined as a combination of slope and elevation class (in addition to a small patch of secondary forest, stream-sides and a swamp, which were treated as separate habitats). This high detection rate, however, does not include a correction for multiple testing. No evidence was found that rarer and commoner species differ in the degree to which they are associated with habitats.
John et al. (2007), using data collected in three neotropical forest plots in Colombia (La Planada, 25 ha), Ecuador (Yasuni, 25 ha), and again Panama (Barro Colorado Island, 50 ha), also showed that the spatial distributions of 29% to 40% of tree species were strongly associated with the distributions of soil nutrients at all three sites.
Webb and Peart (2000) sampled trees (in 28 plots totalling 4.5 ha) and seedlings (<1 cm diameter) in 28 sub-plots totalling 0.1 ha across a study area of 150 ha of rain forest in Borneo. Combining light (assessed by hemispherical photographs) and physiographic habitat (defined as topographic position with associated moisture and soil regime), they found that 20 of the 45 abundant species were associated with at least one habitat factor as either adults or seedlings. They also reported stronger physiographic habitat specificity at the tree stage than at the seedling stage and, less expectedly, a fairly low degree of consistency between adult and seedling physiographic habitat associations.
The species-habitat associations detected in the present study (as in the studies mentioned above) may be biased if unmapped latent environmental factors significantly affect species distributions, or as a consequence of errors in the mapping of the spatial predictors. A first test of the robustness of our study is provided by comparing the results of both datasets for the shared species (Figs. 3 and 4). A few discrepancies between the two datasets indicate that the precise strength of some associations may be poorly estimated. Notably, the abundance threshold of eight stems per ha that was used with dataset 1 may have been too permissive. However, the overall consistency between the two datasets is good inasmuch as one is interested in positive or negative associations with environmental factors and in clumping intensity. A second test of the robustness of the species-habitat associations was conducted by comparing the species ranking along ecological gradients obtained in the present study with previously published independent studies.
To assess the validity of the plant-soil associations detected, we compared our results with the findings of Pélissier et al. (2002) at a nearby site (Piste de Saint Elie Research station). That study ranked 118 tree species along a main gradient of tolerance to prolonged water saturation. It used a detailed soil classification in which nine units were distinguished, including well-drained soils (deep vertical drainage unit, DVD). Although simpler and reflecting a shorter gradient (DVD are rare at Paracou and absent from our dataset), our 5-unit classification is directly superimposable on this 9-unit classification. For each of our datasets, and for the species they respectively share with Pélissier et al.’s study (2002), we compared the ordering along Pélissier et al.’s gradient of tolerance with that obtained with oddAlt values in our study. OddAlt values reflect species’ attraction to (or repulsion from) the soils with the best drainage in Paracou (Alt). Among the 38 species in dataset 1 that appear in the Piste de St Elie dataset, ten are sensitive to soil type (PSoil < 0.05 and PTorus < 0.05). Their ranking according to oddAlt was significantly correlated with their ordering in Pélissier et al.’s analysis (Spearman’s rho = 0.71, P = 0.02).
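The rank comparison described above can be sketched in a few lines. The species values are not given in this section, so the two lists below are hypothetical placeholders (an odds-ratio-like score and a position along a tolerance gradient), and the helper names are illustrative. Spearman's rho is computed as the Pearson correlation of the ranks, with tied values receiving their average rank.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for five species (not data from this study).
odd_score = [0.6, 0.9, 1.1, 1.4, 2.0]
gradient_position = [2, 1, 4, 3, 5]
print(round(spearman_rho(odd_score, gradient_position), 2))
```

Because only ranks enter the calculation, the comparison does not require the two studies to measure drainage preference on the same scale, only to order the shared species in a comparable way.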
As regards plant-canopy openness relationships, we compared our results with the list of pioneer and heliophilic tree species found in Paracou (Molino and Sabatier 2001). Among the 28 species of dataset 1 that appeared to be affected by the proximity to past logging disturbance (PDist < 0.05), those that were positively linked (OddDist < 1), and only those, appeared in Molino and Sabatier’s list (2001). Meanwhile, in dataset 2, the only two species that appear very significantly positively affected by the proximity to past logging disturbance (PDist < 10⁻⁵, OddDist = 0.83 and 0.86) were also previously classified as pioneer or heliophilic (Molino and Sabatier 2001).
Overall, our results regarding species sensitivity to canopy disturbance and drainage restriction are largely consistent with previously published observations. Minor discrepancies exist, which can result from one or more of the following sources. Firstly, the strength of species-habitat association is likely to change with development stage (Webb and Peart 2000). This may explain some of the observed differences in preferred drainage regime with Pélissier et al.’s study (2002), which dealt exclusively with stems >10 cm dbh. Secondly, most other studies do not jointly analyse the two environmental gradients considered here, which potentially leads to a biased estimate of the single gradient considered, as those gradients are usually not independent. For instance, poorly drained, seasonally flooded bottomlands have been shown to have a higher turnover rate (Madelaine et al. 2007) and a more open canopy (Vincent et al. 2010) than the surrounding forest. In our dataset, the reverse may be true since canopy disturbance was artificial (bottomlands with superficial hydromorphy may have been less accessible and hence have a lower level of canopy disturbance than more accessible parts). In any case, taking both factors into account as we did should have improved the estimate of the effect of each factor. Thirdly, differences in the environmental gradient covered by the different studies may affect their results. Fourthly, as suggested earlier, some species may be ill-sorted owing to low abundance. As a rule of thumb, the more abundant the species (Table 1), the more reliable its ecological profile is likely to be.