The initial learning phase usually creates more rules than necessary. A smaller rule base is easier to understand and often performs better on unseen data. To obtain more compact rule bases, pruning techniques have been integrated into the NEFCLASS system.
A basic set of the most important pruning techniques from NEFCLASS has been modified to incorporate the misclassification cost matrix. These are input pruning, rule merging and rule evaluation, which are normally applied in that order, possibly repeatedly. Rule merging operates on the logical level and does not use the underlying data, so no changes were necessary for it.
Input pruning tries to find a discriminant subset of the inputs. Deleting inputs projects several rules onto one, so the rule base shrinks to a smaller number of merged rules. The resulting loss of accuracy was originally estimated using a measure based on minimum description length. This has been replaced by an estimate of the misclassification costs using the given matrix. First, it is determined which cases of the data will be classified by each merged rule. The consequent of that rule is then chosen to minimize the costs under the misclassification cost matrix. The increase in the sum of the costs over the data serves as the measure for deciding which inputs may be deleted.
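The cost-based estimate for input pruning can be illustrated with a small sketch. It is not the NEFCLASS implementation: it assumes a crisp simplification in which every case is assigned to exactly one rule antecedent (a tuple of linguistic term indices, one per input), and a cost matrix indexed as `cost[true_class][predicted_class]`. The function names `rule_base_cost` and `pruning_cost_increase` are hypothetical.

```python
from collections import defaultdict


def rule_base_cost(antecedents, labels, cost, keep):
    """Total misclassification cost after projecting all rules onto the inputs in `keep`.

    Assumption: each case is covered by exactly one antecedent (crisp simplification).
    """
    # Cases whose antecedents coincide on the kept inputs fall under one merged rule.
    groups = defaultdict(list)
    for antecedent, true_class in zip(antecedents, labels):
        key = tuple(antecedent[i] for i in keep)
        groups[key].append(true_class)

    n_classes = len(cost)
    total = 0.0
    for true_classes in groups.values():
        # Choose the consequent that minimizes the summed cost over the covered cases.
        total += min(
            sum(cost[y][c] for y in true_classes) for c in range(n_classes)
        )
    return total


def pruning_cost_increase(antecedents, labels, cost, drop):
    """Increase in total cost caused by deleting the input with index `drop`."""
    all_inputs = list(range(len(antecedents[0])))
    keep = [i for i in all_inputs if i != drop]
    before = rule_base_cost(antecedents, labels, cost, all_inputs)
    after = rule_base_cost(antecedents, labels, cost, keep)
    return after - before


# Illustrative data (made up): two inputs, two classes; only input 1 separates the classes.
example_cost = [[0, 1],   # true class 0: predicting class 1 costs 1
                [5, 0]]   # true class 1: predicting class 0 costs 5
example_antecedents = [(0, 0), (0, 1), (1, 0), (1, 1)]
example_labels = [0, 1, 0, 1]

increase_drop_0 = pruning_cost_increase(example_antecedents, example_labels, example_cost, drop=0)
increase_drop_1 = pruning_cost_increase(example_antecedents, example_labels, example_cost, drop=1)
```

In this toy data, deleting input 0 leaves the costs unchanged, whereas deleting input 1 merges cases of both classes under one rule and raises the costs, so input 0 would be the candidate for deletion.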
Rule evaluation is normally used as a final clean-up. As fuzzy rules partially overlap, some rules may be superfluous and can be deleted from the rule base.
To find a minimal set of rules that covers all data points, a performance measure is specified. According to this measure, a fraction of the rules is chosen as the new rule base. The implemented performance measure determines for every rule the increase in costs that would result from deleting that single rule. To this end, every case in the data is propagated through the network and the two highest rule activations are determined. As NEFCLASS uses min-max inference, only deletion of the rule with the highest activation changes the (crisp) classification result, and thus the change in classification costs (if the rule with the second-highest activation were used instead) is added to that rule's performance. The more a rule contributes to classification, the higher its aggregated performance will be. Low performance can result from rare activations of rules that are too specific, or from rules that lie between classes; such rules should be removed.
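The performance measure described above can be sketched as follows. This is a minimal illustration rather than the NEFCLASS code: it assumes the rule activations per case are already available as a list of lists, that the cost matrix is indexed as `cost[true_class][predicted_class]`, that the rule base has at least two rules, and that the runner-up rule classifies the case even if its activation is very low. The function name `rule_performance` is hypothetical.

```python
def rule_performance(activations, rule_class, labels, cost):
    """Aggregate, per rule, the cost increase that deleting that rule would cause.

    activations[k][r] is the activation of rule r for case k (assumed precomputed).
    rule_class[r] is the consequent class of rule r.
    """
    n_rules = len(rule_class)
    performance = [0.0] * n_rules
    for case_activations, true_class in zip(activations, labels):
        # Rank the rules by activation and take the two strongest.
        ranked = sorted(range(n_rules), key=lambda r: case_activations[r], reverse=True)
        best, second = ranked[0], ranked[1]
        # Only deleting the winning rule changes the crisp classification:
        # the case would then be classified by the runner-up rule.
        cost_with_rule = cost[true_class][rule_class[best]]
        cost_without_rule = cost[true_class][rule_class[second]]
        performance[best] += cost_without_rule - cost_with_rule
    return performance


# Illustrative data (made up): three rules, two classes, four cases.
example_cost = [[0, 1],
                [5, 0]]
example_rule_class = [0, 1, 1]
example_activations = [
    [0.9, 0.2, 0.1],  # true class 0, won by rule 0 (correct)
    [0.3, 0.8, 0.6],  # true class 1, won by rule 1, runner-up agrees
    [0.4, 0.1, 0.7],  # true class 1, won by rule 2, runner-up would be wrong
    [0.6, 0.5, 0.0],  # true class 1, won by rule 0 (wrong), runner-up is right
]
example_labels = [0, 1, 1, 1]

example_performance = rule_performance(
    example_activations, example_rule_class, example_labels, example_cost
)
```

With these numbers, rule 2 obtains the highest performance because its removal would cause an expensive error, rule 1 scores zero because the runner-up gives the same class, and rule 0 ends up negative because the costly misclassification it causes outweighs its correct one. Under the measure described above, rules 0 and 1 would be the first candidates for removal.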