Unbalanced datasets like the one described here are common in real-world applications. In machine vision tasks in particular, the goal is often to filter out unimportant information, whereas important information must not be discarded. This leads to asymmetric errors: wrongly discarding important information is far worse than retaining unimportant information.
A common way to deal with such data is to balance the classes in the preprocessing stage. To this end, either the number of minority cases is increased or the number of majority cases is reduced. This makes it possible to apply standard classifiers. We treated the data in both ways and applied several classifiers, including decision trees, KNN, and minimal distance classifiers. The results are comparable to those of NEFCLASS. However, balancing offers only an indirect way to control how the given asymmetries are handled.
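The two balancing strategies mentioned above can be sketched as follows. This is a minimal illustration using plain random resampling, not necessarily the procedure used in our experiments; the function name `rebalance` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def rebalance(samples, labels, mode="over", seed=0):
    """Return a class-balanced copy of (samples, labels).

    mode="over":  draw extra minority cases with replacement until every
                  class is as large as the largest class.
    mode="under": randomly keep only as many cases per class as the
                  smallest class contains.
    (Illustrative sketch; names and defaults are assumptions.)
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    sizes = [len(xs) for xs in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if mode == "over":
            # Keep all original cases and add random duplicates.
            chosen = xs + rng.choices(xs, k=target - len(xs))
        else:
            # Keep a random subset without replacement.
            chosen = rng.sample(xs, target)
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y
```

Either variant changes the class proportions seen by the learner, but it does not state how much worse one kind of error is than the other; that relation is only implied by the resampling ratio.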
The best result we got from the original NEFCLASS learned on the
balanced dataset was an average of
detection and
reduction rate (instead of
and
with the new version).
The reduction rate is fairly good, whereas the detection rate is too low
to be useful. Obviously there is a tradeoff between these two rates. With
balancing, the behavior of the classifier can only be influenced
indirectly. Specifying misclassification costs, by contrast, is
straightforward and gives the user more control over the learning
result.
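To illustrate why explicit costs give more direct control, the following sketch applies a generic minimum-expected-cost decision rule. It is not the mechanism by which the modified NEFCLASS incorporates costs during learning; the function `min_cost_class`, the cost values, and the class probabilities are all assumptions chosen for illustration.

```python
def min_cost_class(probs, cost):
    """Pick the class with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[j][k] : cost of predicting class k when the true class is j
    (Generic decision rule; hypothetical helper, not NEFCLASS itself.)
    """
    n = len(probs)
    expected = [sum(probs[j] * cost[j][k] for j in range(n)) for k in range(n)]
    return min(range(n), key=expected.__getitem__)


# Class 0 = unimportant, class 1 = important (assumed encoding).
probs = [0.8, 0.2]

symmetric = [[0, 1],
             [1, 0]]
asymmetric = [[0, 1],    # keeping an unimportant case costs 1
              [10, 0]]   # discarding an important case costs 10

print(min_cost_class(probs, symmetric))   # 0: expected costs are 0.2 vs 0.8
print(min_cost_class(probs, asymmetric))  # 1: expected costs are 2.0 vs 0.8
```

With the asymmetric cost matrix, the same class estimate leads to the cautious decision, and the user can shift the tradeoff between detection and reduction simply by changing a single cost entry.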
Another reason for the significantly better results of the modified NEFCLASS is certainly the new learning algorithm, which allows much more stable fine-tuning on noisy data. In our domain, these modifications made NEFCLASS a promising approach.