WEIGHTED FUNCTIONS in the k-NN ESTIMATES of GROWING STOCK in HIGH FOREST in BOSNIA Ponderske funkcije u k-nn procjenama drvne zalihe u visokim šumama Bosne

Over recent decades, continuing research has clarified the possibilities for forest resource estimation based on terrestrial measurement and remote sensing. The non-parametric k-NN method is used most often, integrating local estimates from terrestrial measurement with spectral Landsat data. In this paper, the weighting functions of k-NN related to value differences and distances were examined for high forest at the Konjuh site in Bosnia. Weighting the Euclidean distance did not increase estimation efficiency. Percentage RMSEs of growing stock at the pixel level were higher for weighted estimates. Classified volume estimates at the aggregated level achieved a moderate level of agreement with volumes from the intensive regular forest inventory. Agreement among the k-NN volume estimates themselves was almost perfect regardless of the weighting function. The results favour unweighted estimates, as reported in several other studies.


INTRODUCTION -Uvod
Multisource forest inventory at the global, regional, local and small-scale level has become imperative in modern forest governance and management. Recent advances in forest resource assessment have produced estimations and classifications based on all available information and data. Sampling methods, field measurements, remote sensing and geoinformation systems have been examined continually. Their possibilities are reflected in forest resource estimation and mapping on a large scale (GALLAUN ). A number of studies have confirmed acceptable results from the non-parametric estimation method (k-NN) for forest inventory over large areas (FRANCO-LOPEZ , MCROBERTS ET AL., 2002; HAPPANEN ET AL., 2004; NILSSON, 1997; REESE ET AL., 2003). The k-NN estimation was introduced by KILKKI AND PAIVINEN (1987) and has been adapted for the Finnish national forest inventory (MS-NFI) since 1990. The MS-NFI was characterized as an image-aided multisource method. Over the most recent inventories the method was enhanced by improving the k-NN estimation procedures, with the aim of delivering both large-area and small-area estimates and maps (GJERTSEN & ERIKSEN, 2004; KOUKAL, 2004; KANGAS & MALTAMO, 2006). Basic k-NN estimation is based on a heuristic model, and the estimation is not design-based. The estimates were based on local estimations of forest attributes and the corresponding pixel values in multidimensional space. At first, the reference sample plot method was developed, in which k was one. In that application it was not possible to control the weight of an individual plot (at the population level) in the estimation procedure (KANGAS & MALTAMO, 2006). Several papers have reported improvements to the methodological approach related to weighting functions. Initially, k-NN estimates at the pixel level were based on unweighted and inverse distance-weighted equations. Later, an optimization method based on a genetic algorithm was used to find the variable weights. This method was named ik-NN (improved k-NN method) (TOMPPO & HALME, 2004).
Alternatives to the k-NN methods have been developed and examined (Stümer& 2010). Recent advances and emerging issues in nearest-neighbour techniques have been reviewed for four topic areas: (1) distance metrics, (2) optimization, (3) diagnostic tools, and (4) inference (MCROBERTS, 2011). Past studies in Bosnia produced acceptable k-NN estimates of growing stock for some planning unit levels (the high forest with natural regeneration, the narrow forest categories and some management classes) (ČABARAVDIĆ, 2007). In those studies, k-NN estimates based on weighted Euclidean, Mahalanobis and modified Mahalanobis distance (MD) in spectral space were compared. Further research questions could concern growing stock estimation with different weighting functions applied to the Euclidean distance, either numerically or by more advanced procedures. The present paper deals with the application of weighting functions related to Euclidean distance and value difference in the non-parametric k-NN method in a test forest inventory carried out in the Konjuh Kladanj area in B&H. The final aim is to set up an optimal k-NN model based on weighted Euclidean distance to spatialize the growing stock on the basis of Landsat 7 ETM+ imagery, and to determine whether applying weighting functions would increase estimation efficiency.
Azra Čabaravdić, Dieter R. Pelz, Gherardo Chirici, Christian Kutzer, Ernada Ćatić, Hamid Delić

MATERIALS AND RESEARCH METHODS -Materijal i metod
The test site for this research is located in north-east Bosnia and represents high forest with natural regeneration.
The total land area of Bosnia and Herzegovina (B&H) is approximately 51,500 square kilometers, of which about half is forested. Broadleaf trees dominate two-thirds of the forested landscape. Sixty percent is high forest, mainly mixed broadleaf-conifer stands or pure stands of oaks, beech, spruce, pines or fir. In the Federation of Bosnia and Herzegovina, the forested area is 1,135,931 ha, of which 56.3% is high forest with natural regeneration. Mixed beech, fir and spruce forests in central Bosnia are regarded as forests of the highest productivity and of high structural and ecological stability. The test area chosen for the present research is an administrative forest management unit, the Forest Economic Management Unit "Konjuh" Kladanj, with about 26,000 ha of forests. High forest with natural regeneration dominates, covering about 19,000 ha. For the purposes of this research, a pilot forest inventory was carried out in the test area. Ground sampling followed a single-stage systematic cluster sampling design. Based on the UTM georeferenced reference system, a 2 km sampling grid was generated to locate the clusters (Figure 2).

Figure 2. Sampling plan Slika 2. Plan uzorka
Plots of 25 m in radius were aggregated into square-shaped clusters (Figure 3). On the grid, each cluster consisted of 8 sample plots arranged in a square, with plots placed 400 m apart from one another.
A first selection of forest clusters was carried out on the basis of mine-free land. After this selection, 3 clusters were classified as non-accessible and excluded from further analysis.
A total of 253 sample plots remained, of which 235 were selected for the experimental test because they belong to the high forest with natural regeneration. The field work for acquiring forest inventory data was conducted in summer 2003. The recorded information and data were used to describe the sample plots and to determine local estimates of the most important forest attributes (growing stock, number of stems and growth) and their confidence intervals. Satellite data came from the Landsat 7 ETM+ scene 187-029, acquired in August 2000, with 7 multispectral bands at 30 m geometric resolution. Spectral channels 1-5 and 7 were used here. Each of the 235 sampling units was inventoried in the field; the spectral values of the corresponding pixels were then extracted and recorded.

The k-NN method
The k-Nearest Neighbors (k-NN) classifier is one of the most widely known and used non-parametric classification procedures; it was introduced into combined forest inventories by KATILA & TOMPPO (2001) and FRANCO-LOPEZ ET AL. (2001). The method has since been investigated intensively and adjusted in many studies, confirming its applicability in estimation procedures that combine forest and spectral data. A detailed description of the method used in this paper is given in STÜMER (2004) and KUTZER (2008): "With the k-NN method are obtained forest attributes estimates calculating the average value of the k neighbour samples. The neighbours are weighted by the distance value, which describes the spectral similarity. Each pixel contains spectral information for each channel in the form of a digital value. The spectral difference between two pixels is defined with the use of a metrics. A common distance value is the Euclidean distance, d_(i),p, which has to be calculated from the target pixel p, to every sample pixel i, for which a terrestrial observation is available. If x_1 and x_2 are the characteristic vectors of two pixels whose similarity have to be tested, the Euclidean distance d(x_1, x_2) between them is
$$d(x_1, x_2) = \left( \sum_{j=1}^{N} (x_{1j} - x_{2j})^{2} \right)^{1/2} \qquad (5.1)$$
where N is the number of the spectral components (e.g. used channels). A generalized Euclidean distance is described as the Minkowski-r-distance (BORTZ, 1993). By replacing the exponent 2 (respectively ½) with r (respectively 1/r), a generalization of the Euclidean distance equation (5.1) is the result:
$$d_r(x_1, x_2) = \left( \sum_{j=1}^{N} |x_{1j} - x_{2j}|^{r} \right)^{1/r}$$
(BORTZ, 1993), of which the distance of two points results from the sum of the attribute differences.
By variation of the metric coefficient, attribute differences become weighted differently. With r = 1, all attribute differences are weighted equally, irrespectively of their term. With r = 2, bigger differences get a stronger loading compared to smaller differences and so on.
Also, the variability of the spectral information varies within the channels. To consider channels with a wide variability in the reflection and to weight their influence on the differentiation of attribute classes of an attribute, the parameter a_j for the weighting of the channels has been introduced (FRANCO-LOPEZ ET AL., 2001): If the parameter a_j for j = 1,…N is chosen equal to 1, all channels have the same weight when calculating the distance. However, each channel can adequately be linked by a weighting a_j.
The k = 1 to k = n nearest spectral neighbours, i.e. pixels with corresponding terrestrial observations, are used for the following analyses: pixels which meet the stipulation $d_{(i),p} \le d_{(k),p}$ in the spectral feature space, where d_(k),p is the distance of the k-nearest neighbours and n is the number of available pixels with corresponding terrestrial data. All pixels with distances in the spectral feature space greater than d_(k),p of the observed pixel p are ignored. With k = 1, only the pixel with the lowest spectral difference is used for the further calculations. The higher the values for k that are used, the more pixels with corresponding terrestrial records affect the estimation of the target value, which contains no terrestrial records. The use of k samples means that the random scatter, caused by signal errors, may be narrowed.
The distance values do only represent the differences between the spectral information of two pixels. To integrate the attribute values from the terrestrial observations, which are allocated to the k-nearest pixels, for further calculations, they have to be weighted according to their spectral distance. Hence a weighting w_(i),p is calculated for each extracted pixel: The more similar the spectral information is, the higher is the weighting and, therefore, the influence on the attribute value, which has to be calculated.
MALTAMO & KANGAS (1998) have modified (5.5) to determine the pixel weighting where k describes the number of nearest neighbours and t influences the weighting of the distance. The bigger t is chosen, the bigger is the weighting of a pixel with a narrow spectral distance. The sum of all weightings w′_(t),p is always 1. The attribute value of a pixel sought-after is calculated with help of the attribute values, derived from terrestrial sampling, brought into correlation with the corresponding weighted spectral data of the k-nearest pixels: where m_(i),p are the terrestrially recorded values of i = 1,…k pixels, which are located nearest to pixel p in the spectral space. The process is repeated for every pixel and results in intensive computations, depending on the resolution of the sensor and the size of the inventory area (STÜMER, 2004).
By variation of the variables k, r, t, and a_j, influence on the estimator may be exerted. Calculations with varying values for the parameters k, r, and t are made. For an easier comparability, the value 1 was chosen for the variables a_1, a_2, …, a_j." In this paper the optimal settings for growing stock estimation were examined and the overall accuracies of the k-NN estimations were calculated.
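The estimation procedure quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study. The function names are hypothetical, and three choices are assumptions that may differ from the exact forms in STÜMER (2004): the channel weights a_j act as multipliers on each channel term, the neighbour weights are normalized inverse distances raised to the power t, and a small constant eps guards against division by zero.

```python
def minkowski_distance(x1, x2, r=2.0, a=None):
    """Minkowski-r distance between two spectral vectors.

    a holds optional channel weights a_j (all 1 when omitted). Using a_j
    as a multiplier on each |difference|**r term is an assumption.
    """
    if a is None:
        a = [1.0] * len(x1)
    total = sum(aj * abs(u - v) ** r for aj, u, v in zip(a, x1, x2))
    return total ** (1.0 / r)


def knn_estimate(target, references, k=5, r=2.0, t=1.0, a=None, eps=1e-9):
    """Estimate an attribute for one target pixel.

    references is a list of (spectral_vector, field_value) pairs, one per
    sample plot. The weights are normalized inverse distances to the
    power t (assumed form); eps avoids division by zero.
    """
    # Distance from the target pixel to every reference pixel
    pairs = [(minkowski_distance(target, x, r, a), m) for x, m in references]
    # Keep only the k spectrally nearest reference pixels
    nearest = sorted(pairs, key=lambda pair: pair[0])[:k]
    raw = [1.0 / (d + eps) ** t for d, _ in nearest]
    total = sum(raw)
    weights = [w / total for w in raw]  # weights sum to 1
    return sum(w * m for w, (_, m) in zip(weights, nearest))
```

In this sketch, t = 0 gives every neighbour the same weight, so the estimate reduces to the plain mean of the k nearest field values. Larger t shifts the weight towards the spectrally closest plots.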
The results were evaluated at the pixel level by calculating the Bias (B), Mean Square Error (MSE), Root Mean Square Error (RMSE) and percentage Root Mean Square Error (RMSE%). Bias (B) is the mean difference between estimated and measured values:
$$B = \frac{1}{n} \sum_{i=1}^{n} (\hat{\mu}_i - \mu_i)$$
where $\hat{\mu}_i$ is an estimated value, $\mu_i$ the field-measured value (local estimate) and n the number of observations.
The mean square error (MSE) is calculated from the variance $\sigma^2$ of the variable and the bias:
$$\mathrm{MSE} = \sigma^2 + B^2$$
The root mean square error (RMSE) is the square root of the MSE:
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
The relative RMSE was calculated as
$$\mathrm{RMSE\%} = 100 \cdot \frac{\mathrm{RMSE}}{\bar{\mu}}$$
where $\bar{\mu}$ is the mean of the variable estimates.
The relative RMSE% values for characteristic configurations were compared and discussed.
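The error measures described above can be computed as in the following sketch. It is illustrative only and rests on two assumptions not confirmed by the text: the variance is taken over the estimation errors with divisor n (so that MSE equals the mean squared error exactly), and the relative RMSE is expressed against the mean of the field-measured values. The function name `error_metrics` is hypothetical.

```python
import math


def error_metrics(estimated, measured):
    """Return bias, MSE, RMSE and RMSE% for paired estimates and field values."""
    n = len(measured)
    errors = [e - m for e, m in zip(estimated, measured)]
    bias = sum(errors) / n
    # Variance of the errors around the bias, divisor n (assumption)
    variance = sum((err - bias) ** 2 for err in errors) / n
    mse = variance + bias ** 2
    rmse = math.sqrt(mse)
    # Relative RMSE against the mean field-measured value (assumption)
    mean_measured = sum(measured) / n
    rmse_pct = 100.0 * rmse / mean_measured
    return {"bias": bias, "mse": mse, "rmse": rmse, "rmse_pct": rmse_pct}
```

With the divisor n, the decomposition into variance and squared bias reproduces the ordinary mean of the squared errors, so either route gives the same RMSE.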
In addition, FRANCO-LOPEZ ET AL. (2001) found it informative to build confusion matrices for continuous attributes by grouping them into attribute classes and to calculate Kappa indices as a measure of agreement. A value of 1 indicates perfect agreement, while a value of 0 implies that agreement is no better than chance. Here, confusion matrices were analysed for growing stock classified into classes of 100 m³ ha⁻¹, and Kappa values were calculated (CONGALTON, 1991). The Kappa values are interpreted as follows: < 0, poor agreement; 0.00 to 0.20, slight agreement; 0.20 to 0.40, fair agreement; 0.40 to 0.60, moderate agreement; 0.60 to 0.80, substantial agreement; and 0.80 to 1.00, almost perfect agreement.
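The classification and agreement step can be sketched as follows, using the standard definition of Cohen's Kappa (observed agreement corrected for chance agreement). The function names are hypothetical, and the treatment of values that fall exactly on a class or scale boundary is an assumption, since the text leaves the boundaries open.

```python
from collections import Counter


def volume_class(volume, width=100.0):
    """Assign a growing stock value (m3/ha) to a class of the given width."""
    return int(volume // width)


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of class labels.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both sequences contain one and the same single class.
    """
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Verbal scale quoted in the text; upper bounds inclusive (assumption)."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Two maps would be compared by passing the class labels of the same pixels, in the same order, to `cohens_kappa`.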

RESULTS AND DISCUSSION -Rezultati i diskusija
The estimation efficiency depends on the natural variability and on appropriate statistical methods. In the k-NN family of methods it is important to examine the influence of model parameters (number of neighbours, weighting functions, other possibilities). This paper explores whether weighting functions related to value differences and distances affect the estimation efficiency of total growing stock in high forest in Bosnia.
First, the growing stock %RMSE values as a function of the number of nearest neighbours k were examined for different combinations of value-difference and distance weights (Fig. 3). The RMSEs showed typical behaviour: a sharp decrease in error for the first few neighbours (k<3-5), a moderate decrease for k=5-7, and stabilization beyond 7 neighbours for all combinations. The highest errors were obtained for t=5, while the lowest values were obtained for r=1 and t=1.

Dependence of the relative estimation error (%) on k, r and t (Zavisnost relativne greške (%) procjene od k, r i t)
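A parameter comparison of this kind can be sketched as a small grid search. The text does not state how the pixel-level errors were obtained, so the sketch assumes leave-one-out cross-validation over the sample plots: each plot is estimated from all the others. The function names, the normalized inverse-distance weighting with exponent t, and the constant eps are likewise assumptions made for illustration.

```python
import math


def knn_predict(target, refs, k, r, t, eps=1e-9):
    """Weighted k-NN estimate from (spectral_vector, field_value) pairs."""
    def dist(x):
        return sum(abs(u - v) ** r for u, v in zip(target, x)) ** (1.0 / r)

    # Sort by distance (ties broken by field value) and keep the k nearest
    nearest = sorted((dist(x), m) for x, m in refs)[:k]
    raw = [1.0 / (d + eps) ** t for d, _ in nearest]
    return sum(w * m for w, (_, m) in zip(raw, nearest)) / sum(raw)


def loo_rmse_pct(plots, k, r, t):
    """Leave-one-out RMSE as a percentage of the mean field value."""
    squared = 0.0
    for i, (x, m) in enumerate(plots):
        refs = plots[:i] + plots[i + 1:]  # every plot except the target
        squared += (knn_predict(x, refs, k, r, t) - m) ** 2
    n = len(plots)
    mean_m = sum(m for _, m in plots) / n
    return 100.0 * math.sqrt(squared / n) / mean_m


def grid_search(plots, ks, rs, ts):
    """Evaluate every (k, r, t) combination and return the best one."""
    results = {(k, r, t): loo_rmse_pct(plots, k, r, t)
               for k in ks for r in rs for t in ts}
    best = min(results, key=results.get)
    return best, results
```

The `results` dictionary holds one RMSE% per configuration, so fixing two parameters and varying the third gives curves of the kind discussed in this section.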
The mean estimates at the pixel level are overestimated for all combinations of k, r and t. Percentage RMSEs were in the interval of 54 to 57%. Errors were affected by the t weights, with higher biases and RMSEs for higher distance weights (Table 1).
Table 1, surviving row: 20 | -11.90 | 192.60 | 192.96 | 56.22
The RMSEs as a function of the value-difference weight r (constant k=5, t=2) and of the distance weight t (constant k=5, r=2) are presented in Figures 4a and 4b. The minimal RMSEs are obtained for r=1 and t=1, respectively. The %RMSE as a function of r started at its minimum, then increased sharply, giving importance to closer value differences, and thereafter varied moderately (Fig. 4a). The distance weight function for constant k and r gave the minimal %RMSE for t equal to 1 (Fig. 4b). These results are consistent with Fig. 3, where the lowest %RMSE was achieved for r=1 and t=1. FRANCO-LOPEZ found that it was best to use no distance weights (equal weights). Consistent results related to distance weights were obtained for basal area (r=2) in STÜMER (2004). ÖZSAKABAŞI (2008) compared four weight functions for Euclidean distance (inverse, inverse square distance, stars and fraction) with unweighted estimates; the inverse-distance-based estimates gave the lowest error values. Findings related to the t curve are published in MCROBERTS (2011), identifying the best combinations of feature variables. That author confirmed that equal neighbour weighting produced predictions comparable to more complex schemes when estimating mean forest stem volume per unit area for small areas using a combination of forest inventory observations and Landsat Thematic Mapper (TM) imagery.
A comparison of %RMSE for different k-NN configurations employing Euclidean and Mahalanobis distances is given in Figure 5. The results show consistent %RMSE behaviour, with the lowest values for unweighted Euclidean distance, while Mahalanobis distances gave higher %RMSEs. Further, estimates for high forest were generated and mapped for different configurations using weighting functions. The growing stock estimates were classified into volume classes of 100 m³ ha⁻¹ and Kappa values were determined. The classified thematic maps generated by the different k-NN configurations using weighting functions were compared with the map obtained from the regular forest inventory in high forest at the forest management enterprise level (FEI). Kappa values ranged from 0.53 to 0.56, with values of about 0.53 for weighted Euclidean distances and 0.55 for Mahalanobis distance (Table 2). The agreement between k-NN and FEI estimates can therefore be described as moderate. Notably, agreement among the weighted Euclidean-based estimates was almost perfect. Estimates based on the Mahalanobis distance also reached substantial to almost perfect agreement with the weighted Euclidean-based estimates. Both were in moderate agreement with the FEI estimates, as mentioned earlier (Table 3).

CONCLUSION -Zaključci
Different weighting functions related to value differences and distances in k-NN estimates of growing stock were analysed at the pixel and aggregated levels. The results show that weighting functions affected the estimates only slightly, both at the pixel level and at the high forest level. The differences obtained clarified the effects of weighting functions and identified possible gains in estimation efficiency. The best results at the pixel level were obtained for unweighted value difference and distance. These results are consistent with those reported in FRANCO-LOPEZ and MCROBERTS (2011). Some studies found better results using weighting functions, for example ÖZSAKABAŞI (2008). It is known that specific forest and ecological conditions influence the results in particular cases. The agreement between classified k-NN estimates for different weighting functions and estimates from the regular forest inventory is rated as moderate. The weighting functions did not show a significant influence on the agreement level of the classified estimates. Further possibilities for improvement are reported in RATU & KANGAS (2012), where the general mean is adjusted with the realized values of neighbouring observations (measured with Euclidean distance) and weighted by the correlation of the errors (or variogram) as a function of distance.