The original back-propagation-like algorithm processes the patterns one
by one. In datasets with high dimensionality and poorly separated
classes this often causes problems, because changes to the rule base
derived from one pattern can affect the classification of other
patterns in unpredictable ways. The learning algorithm was therefore
changed to generation learning, which provides much more reliable
information on which changes to the rule base are suited to lower the
total error. Furthermore, the back-propagation algorithm was replaced
by one that allows easier use of the cost matrix. We implemented a
simple evolution strategy known as the (1+1)-strategy. This algorithm
is one of the earliest and probably one of the simplest evolution
strategies described in the literature, with a population of only one
individual and exactly one offspring per generation. In practice this
means that random changes (*mutation*) are applied to the fuzzy sets
and tested on the learning data. The changes are kept if the
offspring's rule base performs better than its parent's; otherwise the
unmodified rule base is retained (*selection*). As in the original
back-propagation algorithm, constraints are imposed on the changes to
preserve the semantics of the fuzzy rules.
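The mutation/selection loop described above can be sketched as follows. This is only an illustration of the (1+1)-strategy itself, not the NEFCLASS implementation: the function names, the surrogate error function, and the Gaussian mutation step are our assumptions, and the semantic constraints on the fuzzy sets are omitted.

```python
import random

def one_plus_one_es(parent, mutate, error, generations=2000, seed=0):
    """(1+1) evolution strategy: a population of one individual and exactly
    one offspring per generation. The mutated offspring replaces its parent
    only if it achieves a lower total error on the learning data; otherwise
    the unmodified parent is kept (selection)."""
    rng = random.Random(seed)
    best, best_err = parent, error(parent)
    for _ in range(generations):
        child = mutate(best, rng)            # mutation: random changes
        child_err = error(child)
        if child_err < best_err:             # selection: keep the better one
            best, best_err = child, child_err
    return best, best_err

# Illustrative stand-in for tuning fuzzy-set parameters: move three
# "centres" towards targets under a squared-error surrogate (this is NOT
# the actual NEFCLASS error measure).
target = [0.2, 0.5, 0.8]
error = lambda c: sum((a - b) ** 2 for a, b in zip(c, target))
mutate = lambda c, rng: [x + rng.gauss(0, 0.05) for x in c]

best, best_err = one_plus_one_es([0.0, 0.0, 0.0], mutate, error)
```

Because a mutation is only ever accepted when it lowers the error, the total error is non-increasing over the generations, which is exactly the monotonicity property discussed below.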

The confusion matrix can be used directly by the learning procedure if an appropriate error measure is specified. We implemented two different measures, which are analogous extensions of the misclassification rate and the error rate in the original NEFCLASS system. The former is calculated from the crisp classifications and reflects the borders between the fuzzy rules, whereas the latter uses the activations and reflects the ambiguity and position of the fuzzy sets in relation to the classes. The misclassification rate is extended to an estimate of the classification costs by summing up the cost matrix elements given by the actual class of a pattern and the decision of NEFCLASS for this pattern:
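In symbols this sum can be written as follows (a sketch in our own notation: $\mathcal{L}$ denotes the set of learning patterns, $\mathrm{class}(p)$ the actual class of pattern $p$, $\mathrm{out}(p)$ the crisp class NEFCLASS decides on, and $c(i,j)$ the cost-matrix entry for deciding class $j$ when the actual class is $i$):

```latex
E_{\mathrm{cost}} \;=\; \sum_{p \in \mathcal{L}} c\bigl(\mathrm{class}(p),\, \mathrm{out}(p)\bigr)
```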

The main aim of the learning phase is to minimize this error
measure. The measure depends on the crisp classification and not
directly on the activations of the rules. If the crisp classification
does not change during learning, e.g. when changes to the rule base
are small or lie in sparsely covered regions, the error does not change
either and thus gives no feedback on whether the change is desirable. To
direct the learning in these situations the second measure is used,
which directly uses the activations of the output layer for class *c*
and prefers unambiguous classifications. The exact definition is

Intuitively the first measure adjusts the borders between classes, and the second measure fits (the centers of) the fuzzy sets to the data.
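The two measures can be sketched in Python on toy data. Everything here is our illustration: the function names and data are made up, and in particular the quadratic form chosen for the activation-based measure is only one plausible reading of "prefers unambiguous classifications", not the system's actual definition.

```python
def cost_misclassification(patterns, predict, cost):
    """First measure: sum the cost-matrix entries indexed by a pattern's
    actual class and the crisp decision for it. Constant as long as the
    crisp classifications do not change."""
    return sum(cost[actual][predict(x)] for x, actual in patterns)

def activation_error(patterns, activations, n_classes):
    """Second measure (assumed form): a squared-error term over the output
    activations that is minimal for unambiguous, correct classifications
    (activation 1 for the actual class, 0 for all others)."""
    err = 0.0
    for x, actual in patterns:
        a = activations(x)
        for c in range(n_classes):
            t = 1.0 if c == actual else 0.0
            err += (t - a[c]) ** 2
    return err

# Toy data: (input, actual class); the third pattern is misclassified.
patterns = [(0, 0), (1, 1), (2, 0)]
cost = [[0, 5], [1, 0]]                       # cost[actual][decided]
predict = lambda x: {0: 0, 1: 1, 2: 1}[x]     # hypothetical crisp decisions
acts = lambda x: {0: [0.9, 0.1], 1: [0.2, 0.8], 2: [0.4, 0.6]}[x]

mis_cost = cost_misclassification(patterns, predict, cost)
act_err = activation_error(patterns, acts, 2)
```

Note how the third pattern dominates both measures: it contributes the single cost-matrix entry of 5 to the first measure, and its ambiguous activations (0.4 vs. 0.6) contribute most of the second measure even though only the crisp decision is wrong.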

It may be worth mentioning that the original NEFCLASS learning algorithm cannot guarantee to minimize either of the (unmodified) error measures. Its heuristic learning rules try to decrease the error on a per-pattern basis, which in some cases can even increase the total error. The new algorithm, like most nonlinear optimization algorithms, cannot guarantee to reach a global minimum either, but it ensures that the total error never increases and thus settles at least in a local minimum. The need for more sophisticated search algorithms is not pressing here, as the second learning phase of NEFCLASS only fine-tunes the fuzzy sets. This may explain why the relatively simple algorithm produces quite satisfactory results.

Mon Nov 29 17:03:10 MET 1999