
Discussion

Unbalanced datasets like the one described here are common in real-world applications. In machine vision tasks in particular, the goal is often to filter out unimportant information while important information must not be discarded. This leads to asymmetric errors: discarding an important item is far more costly than keeping an unimportant one.

A common way to deal with such data is to balance the classes during preprocessing: either the number of minority cases is increased or the number of majority cases is reduced. This makes it possible to apply standard classifiers. We treated the data in both ways and applied several classifiers, namely decision trees, KNN, and minimal distance classifiers. The results are comparable to those of NEFCLASS. However, balancing offers only an indirect way to control how the given asymmetries are handled.
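The two balancing strategies mentioned above can be sketched as follows. This is only a minimal illustration using random resampling; the function names, the class labels, and the choice of random duplication and random removal are assumptions made for the example, not the preprocessing actually used in the experiments.

```python
import random
from collections import Counter


def group_by_class(samples, labels):
    """Collect the samples of each class in a dict: label -> list of samples."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    return by_class


def balance_by_undersampling(samples, labels, seed=0):
    """Reduce every class to the size of the smallest class (random removal)."""
    rng = random.Random(seed)
    by_class = group_by_class(samples, labels)
    target = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        # Draw `target` distinct samples from each class.
        balanced.extend((s, label) for s in rng.sample(members, target))
    rng.shuffle(balanced)
    return balanced


def balance_by_oversampling(samples, labels, seed=0):
    """Grow every class to the size of the largest class (random duplication)."""
    rng = random.Random(seed)
    by_class = group_by_class(samples, labels)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        # Keep all original samples and add random duplicates as needed.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend((s, label) for s in members + extra)
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 10 relevant cases and 90 irrelevant ones.
samples = list(range(100))
labels = ["relevant"] * 10 + ["irrelevant"] * 90

under = balance_by_undersampling(samples, labels)
over = balance_by_oversampling(samples, labels)
print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

Either variant yields equally sized classes, but neither lets the user state how much worse one kind of error is than the other.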

The best result we got from the original NEFCLASS trained on the balanced dataset was an average of detection and reduction rate (instead of and with the new version). The reduction rate is quite good, whereas the detection rate is nearly useless. Obviously, there is a tradeoff between these two rates. The problem is that balancing lets the user influence the behavior of the classifier only indirectly. Specifying misclassification costs, in contrast, is straightforward and gives the user more control over the learning result.
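To illustrate how explicitly specified misclassification costs shift this tradeoff, the following sketch applies a generic minimum-expected-cost decision rule. It is not the modified NEFCLASS learning algorithm; the function name, the class labels, the cost values, and the class scores are all hypothetical and chosen only for the example.

```python
def min_cost_class(scores, costs):
    """Return the class with the lowest expected misclassification cost.

    scores: dict mapping each class to its estimated probability.
    costs:  nested dict, costs[true_class][predicted_class].
    """
    def expected_cost(predicted):
        return sum(p * costs[true][predicted] for true, p in scores.items())

    return min(scores, key=expected_cost)


# Symmetric costs: both kinds of error are equally bad.
symmetric = {
    "relevant": {"relevant": 0, "irrelevant": 1},
    "irrelevant": {"relevant": 1, "irrelevant": 0},
}

# Asymmetric costs (assumed values): discarding a relevant case is
# ten times as costly as keeping an irrelevant one.
asymmetric = {
    "relevant": {"relevant": 0, "irrelevant": 10},
    "irrelevant": {"relevant": 1, "irrelevant": 0},
}

# A case that the classifier considers probably irrelevant.
scores = {"relevant": 0.2, "irrelevant": 0.8}

print(min_cost_class(scores, symmetric))   # irrelevant
print(min_cost_class(scores, asymmetric))  # relevant
```

With symmetric costs the doubtful case is discarded; with the higher cost for missed relevant cases it is kept, which raises the detection rate at the expense of the reduction rate.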

Another reason for the significantly better results of the modified NEFCLASS is certainly the new learning algorithm, which allows much more stable fine-tuning on noisy data. In our domain, the modifications made NEFCLASS a promising approach.

Aljoscha Klose
Mon Nov 29 17:03:10 MET 1999