Implementation Example of the Expert system for Decision Support on Android platform based on a specific Dataset

ABSTRACT


Introduction
The main task of the Covertype Data Set classification problem is to predict the forest cover type from cartographic variables alone, without any other data. The independent variables are derived from data originally obtained from the US Geological Survey (USGS) and the US Forest Service (USFS). The study covers four wilderness areas located in the Roosevelt National Forest in northern Colorado. Basic information on these four regions: 1. Rawah (region 1); 2. Neota (region 2), which probably has the highest altitude; 3. Comanche Peak (region 3), which has a lower altitude than region 2; 4. Cache la Poudre (region 4), which has the lowest altitude.
As for the tree species in these areas: in Neota, spruce/fir (type 1) is the most common, while in Rawah and Comanche Peak lodgepole pine (type 2) is the most abundant main species, followed by spruce/fir and aspen (type 5). Cache la Poudre has ponderosa pine (type 3), Douglas fir (type 6), and cottonwood/willow (type 4). Rawah and Comanche Peak tend to be more typical of the overall data set than Neota or Cache la Poudre, because of their diversity of tree species and the range of values of predictor variables such as altitude. Cache la Poudre is more distinct from the others because of its lower altitude and its species composition.
In addition to the four regions, twelve cartographic measures (independent variables) and seven major forest cover types (the dependent variable) are included. The data set has 581,012 instances, of which the first 11,340 records form the training subset, the next 3,780 records form the validation subset, and the last 565,892 records form the test subset [1]. National park managers responsible for developing ecosystem management strategies require basic descriptive information, including inventories of forested land, to support their decision-making. One way of obtaining this information is through predictive models.
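The consecutive split described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function and variable names are illustrative assumptions, and only the record counts come from the text.

```python
# Record counts as reported in the text; all names here are illustrative.
TOTAL = 581_012
N_TRAIN, N_VALID, N_TEST = 11_340, 3_780, 565_892


def split_ranges(n_train, n_valid, n_test):
    """Return half-open (start, stop) record ranges for three consecutive subsets."""
    train = (0, n_train)
    valid = (n_train, n_train + n_valid)
    test = (n_train + n_valid, n_train + n_valid + n_test)
    return train, valid, test


train, valid, test = split_ranges(N_TRAIN, N_VALID, N_TEST)

# The three subsets must cover the whole data set exactly once.
assert test[1] == TOTAL

# Share of each subset in the full data set, as a fraction of TOTAL.
shares = {
    name: (stop - start) / TOTAL
    for name, (start, stop) in (("train", train), ("valid", valid), ("test", test))
}
```

The shares make it visible that the test subset dominates the data set, which is worth keeping in mind when comparing accuracies obtained under other evaluation schemes.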
In papers [2,3,4], two predictive models were examined: a neural network model and a traditional statistical model based on discriminant analysis. The overall objectives of these studies were to develop the two predictive models [5] and to compare and evaluate their accuracy in classifying forest cover types in undisturbed forests. Several subsets of the variables were tested to determine the best predictive model [6,7,8,9]. For each subset of the twelve cartographic variables examined in the studies, the classification results indicate that the neural network approach outperforms the traditional discriminant analysis method in predicting forest cover type. The final neural network model was more accurate in classification (70.58%) than the linear discriminant analysis model (58.38%). These results are supported by thirty additional networks trained from randomly selected initial weights, whose overall mean classification accuracy is 70.52%. National park managers can therefore use an alternative method for predicting forest cover type that is superior to the traditional method and adequate to support their decision-making on ecosystem management strategy.
Chart 2. Distribution after applying the unsupervised instance filter "Resample" with a 10% sample size

Learning outcomes and learned rules
As can be seen from the previous section of this paper, after filtering with the unsupervised instance filter Resample, the number of instances is reduced to 58,101, which is 10% of the total number of samples in the entire dataset. Applying the PART method with default parameters to the preprocessed dataset yields 1760 learned rules. Accuracy is checked by 10-fold cross-validation, and 84.29% of instances are correctly classified. However, this number of rules is far too high to be translated manually into a knowledge base for the expert system, so we prune the tree by changing the parameters of the PART method:
• Increasing the parameter M (the minimum number of instances per rule) reduces the tree, and with it the number of rules. Since the data set is relatively large, we take M = 1000.
• Reducing the value of C (confidence factor) produces more aggressive pruning. We take C = 0.15 (the default is 0.25).
With these parameters we get 143 learned rules and an accuracy of 75.36%, which is still too many rules for manual translation into the knowledge base. We therefore continue adjusting the parameters to reach a number of rules that can reasonably be translated by hand. After several attempts with different values of M and C, we arrived at an optimal solution: for M = 400 and C = 0.15 we obtain a tree of 28 rules and an accuracy of 71.07%.
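The trade-off described above, between the number of rules and the classification accuracy, can be summarized as picking the most accurate run whose rule count fits a manual-translation budget. The sketch below is a Python illustration, not the procedure actually run in WEKA: the `PartRun` record, the helper names and the budget of 30 rules are assumptions, while the (M, C, rules, accuracy) figures are the ones reported in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PartRun:
    min_instances: Optional[int]  # parameter M; None means the default (not stated in the text)
    confidence: float             # parameter C
    n_rules: int                  # number of learned rules
    accuracy_pct: float           # 10-fold cross-validation accuracy, in percent


# Runs reported in the text.
RUNS: List[PartRun] = [
    PartRun(None, 0.25, 1760, 84.29),
    PartRun(1000, 0.15, 143, 75.36),
    PartRun(400, 0.15, 28, 71.07),
]


def resample_size(n_instances: int, percent: int) -> int:
    """Size of a percentage subsample, rounded down to a whole instance."""
    return n_instances * percent // 100


def best_run(runs: List[PartRun], max_rules: int) -> Optional[PartRun]:
    """Most accurate run with at most max_rules rules, or None if none fits."""
    feasible = [r for r in runs if r.n_rules <= max_rules]
    if not feasible:
        return None
    return max(feasible, key=lambda r: r.accuracy_pct)


# Hypothetical budget: at most 30 rules can be translated by hand.
chosen = best_run(RUNS, max_rules=30)
```

With a budget of 30 rules, only the M = 400, C = 0.15 run qualifies, at the cost of roughly 13 percentage points of accuracy relative to the default run.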

Obtained learning rules
Using the chosen method for learning production rules, PART from the WEKA data mining system [10], a set of production rules was learned inductively. The accuracy and comprehensibility of the learned knowledge were tuned with the M and C parameters of the selected learning method.
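A PART model is an ordered decision list: rules are tried from top to bottom, the first rule whose condition holds assigns the class, and a final default applies when none matches. The Python sketch below illustrates only this evaluation order. The three rules and their thresholds are hypothetical and are not the rules learned in this paper; the attribute names follow the public Covertype description and are likewise an assumption about how the data is labeled.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, float]
Rule = Tuple[Callable[[Instance], bool], int]  # (condition, cover type 1..7)

# Hypothetical rules: thresholds are invented for illustration only.
RULES: List[Rule] = [
    (lambda x: x["Elevation"] > 3300, 1),
    (lambda x: x["Elevation"] > 2700
        and x["Horizontal_Distance_To_Roadways"] > 1000, 2),
    (lambda x: x["Elevation"] <= 2400, 3),
]

# Hypothetical default class, used when no rule fires.
DEFAULT_CLASS = 2


def classify(instance: Instance,
             rules: List[Rule] = RULES,
             default: int = DEFAULT_CLASS) -> int:
    """Return the class of the first rule whose condition matches."""
    for condition, cover_type in rules:
        if condition(instance):
            return cover_type
    return default
```

Because the order matters, a knowledge base built from such a list has to preserve the rule sequence, or make each rule's condition explicitly exclude the rules above it.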

Operation of the Expert system on the Android platform and in a Web environment
The e2gRuleWriter tool was used to create the covtype.kb file, which contains a total of 28 rules. The learned rule set was built into the knowledge base of the e2gDroid expert system. The system's user interface in Serbian was created using the Expertise2Go translate directive. The operation of the small expert system is demonstrated below on selected test examples.
The first test case was run on the Android platform (Figure 1) and the second in a web (HTML) environment (Figure 2).

Conclusion
An efficient way of creating a small expert decision support system for the Android platform has been shown, one that requires no serious programming in Java. The knowledge base of the system for the given area of expertise was generated by inductive learning from examples in the WEKA data mining system, and the system was implemented using the Expertise2Go and e2gDroid Lite expert system shells for mobile devices.
Based on the given application area and a set of training examples, specifically the Covertype Data Set classification problem, a decision support system was developed.

Figure 1. Parts of the application customized for the Android platform
Figure 2.

Table 1. Basic characteristics of the Covertype Data Set classification problem

Table 4. PART decision list (only the first 17 rules)