The original back-propagation-like algorithm processes the patterns one by one. In datasets with high dimensionality and poorly separated classes this often causes problems, because changes to the rule base derived from one pattern can affect the classification of the other patterns in unpredictable ways. The learning algorithm was therefore changed to generation-based learning, which provides much more reliable information about which changes to the rule base are suited to lower the total error. Furthermore, the back-propagation algorithm was replaced by one that allows easier use of the cost matrix. We implemented a kind of evolution strategy known as the (1+1)-strategy. This algorithm is one of the earliest and probably one of the simplest evolution strategies in the literature, with a population of only one individual and exactly one offspring per generation. In practice, this means that random changes (mutation) are applied to the fuzzy sets and evaluated on the learning data. The changes are kept if the offspring's rule base performs better than its parent's; otherwise the unmodified rule base is retained (selection). As in the original back-propagation algorithm, constraints are imposed on the changes to preserve the semantics of the fuzzy rules.
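The mutation-and-selection loop described above can be sketched as follows. This is a minimal, generic (1+1)-strategy over a real-valued parameter vector; the parameter representation and error function are illustrative assumptions, not the actual NEFCLASS fuzzy-set encoding, and the constraint handling that preserves rule semantics is omitted.

```python
import random

def one_plus_one_es(params, error_fn, sigma=0.05, generations=2000):
    """Minimize error_fn over a list of real parameters with a (1+1)-strategy.

    Population size one: the single parent produces one mutated offspring
    per generation, and the better of the two survives.
    """
    parent = list(params)
    parent_err = error_fn(parent)
    for _ in range(generations):
        # mutation: perturb every parameter with small Gaussian noise
        child = [p + random.gauss(0.0, sigma) for p in parent]
        child_err = error_fn(child)
        # selection: keep the offspring only if it is no worse than the parent,
        # so the total error can never increase
        if child_err <= parent_err:
            parent, parent_err = child, child_err
    return parent, parent_err
```

As a usage illustration, minimizing a simple quadratic error from the start point `[2.0, -1.5]` drives the parameters toward zero; in the real system `error_fn` would be the cost-based error of the rule base on the training data.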
The confusion matrix can be used directly by the learning procedure if an appropriate error measure is specified. We implemented two different measures, which are analogous extensions of the misclassification rate and the error rate in the original NEFCLASS system. The former is calculated from the crisp classifications and reflects the borders between the fuzzy rules, whereas the latter uses the activations and reflects the ambiguity and position of the fuzzy sets in relation to the classes. The misclassification rate is extended to an estimate of the classification costs by summing, over all patterns, the cost matrix entries indexed by the actual class of a pattern and the decision of NEFCLASS for that pattern.
The main aim of the learning phase is to minimize this error measure. The measure depends on the crisp classification and not directly on the activations of the rules. If the crisp classification does not change during learning, e.g. because changes to the rule base are small or lie in sparsely covered regions, the error does not change either and thus gives no feedback on whether the change is desirable. To direct the learning in these situations the second measure is used, which directly uses the activations of the output layer for each class c and prefers unambiguous classifications.
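One plausible form of such an activation-based measure, following the error rate of the original NEFCLASS system, is the squared difference between the output activations and the 0/1 target vector; this sketch is an assumption based on the description above, not the paper's exact definition.

```python
def activation_error(activations, target_class):
    """Squared deviation of output activations from the 0/1 target vector.

    A pattern of the target class should activate that output near 1 and all
    others near 0; low values therefore indicate a confident, unambiguous
    classification, even when the crisp decision is unchanged.
    """
    return sum((1.0 - a if c == target_class else a) ** 2
               for c, a in enumerate(activations))
```

Because this measure varies continuously with the activations, small mutations of the fuzzy sets still produce a measurable change in error, giving the selection step the feedback that the crisp measure withholds.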
Intuitively the first measure adjusts the borders between classes, and the second measure fits (the centers of) the fuzzy sets to the data.
It is worth mentioning that the original NEFCLASS learning algorithm cannot guarantee to minimize either of the (unmodified) error measures. Its heuristic learning rules try to decrease the error on a per-pattern basis, which in some cases can even increase the total error. Like most nonlinear optimization algorithms, the new algorithm also cannot guarantee to reach a global minimum, but it does ensure that the total error never increases and thus settles at least in a local minimum. The need for more sophisticated search algorithms is not so apparent, as the second learning phase of NEFCLASS only fine-tunes the fuzzy sets. This may explain why the relatively simple algorithm produces quite satisfactory results.