Mining Fuzzy Rules from
Quantitative Data
Based on the AprioriTid
Algorithm
Tzung-Pei Hong1, Chan-Sheng Kuo2, Sheng-Chai Chi2 and Shyue-Liang Wang1
1Department of Information Management, I-Shou
University
2Graduate
School of Management Science, I-Shou University
Kaohsiung, 84008, Taiwan, R.O.C.
{tphong, m861008m, scchi,
slwang}@csa500.isu.edu.tw
ABSTRACT
Most
of conventional data mining algorithms identify the relation among transactions
with binary values. Transactions with quantitative values are, however,
commonly seen in real world applications. This paper thus attempts to propose a
new data-mining algorithm to enhance the capability of exploring interesting
knowledge from the transactions with quantitative values. The proposed
algorithm integrates the fuzzy set concepts and the AprioriTid mining algorithm
to find interesting fuzzy association rules from given transaction data. The
database needs to be scanned only in the first pass to calculate the support of
the items. Experiments on students' grades in I-Shou University are also made
to verify the performance of the proposed algorithm.
Keywords
Data mining, fuzzy set, association rule, quantitative
value
Recently, fuzzy
set theory is more and more frequently used in intelligent systems, because of
its simplicity and similarity to human reasoning [14]. Several fuzzy learning
algorithms for inducing rules from a set of given data were designed and caused
good effects on specific domains [5, 6, 8, 9, 12, 13, 17]. Some strategies
based on decision tree [7] were proposed in [15, 18, 19]. Wang et al. also
proposed a fuzzy version space learning strategy for managing vague information
[17].
This
paper integrates the fuzzy-set concepts and the AprioriTid mining algorithm to
find interesting itemsets and fuzzy association rules from transaction data
with quantitative values. A fuzzy data mining algorithm is proposed, which is
specially capable of transforming quantitative values in transactions into
linguistic terms, then filtering them, and finding association rules by modifying
the AprioriTid mining algorithm [4] .
In the past,
Agrawal and his co-workers proposed several mining algorithms based on the
concept of large itemsets to find association rules from transactions [1, 2, 3,
4]. They decomposed the mining process into two phases. In the first phase,
candidate itemsets are generated and counted by scanning the transactions. If
the number of an itemset appearing in the transactions is larger than a
pre-defined threshold value (called minimum support), the itemset is thought of
as a large itemset. Itemsets with only one item are first processed. The large
itemsets with one item are then combined to form candidate itemsets of two
items. This process is repeated until all large itemsets are found. In the
second phase, the desired association rules are induced from the large itemsets
found in the first phase. All the possible combination ways of association
rules for each large itemset are formed, and the ones with their calculated confidence
values larger than a predefined threshold (called minimum confidence) are
output as desired association rules.
In
addition to proposing methods for mining association rules from transactions of
binary values, Agrawal et al also proposed a method [16] to mine association
rules from those with quantitative and categorical attributes. Their proposed
method first determines the number of partitions for each quantitative
attribute, and then maps all possible values of each attribute into a set of
consecutive integers. It then finds the large itemsets whose support values are
greater than the user-specified minimum support. These large itemsets are then
processed to generate association rules, and the interesting rules are output
from the viewpoint of users.
In this section,
the fuzzy concepts are used in the AprioriTid data-mining algorithm to discover
useful association rules from quantitative values. The proposed fuzzy mining
algorithm first transforms each quantitative value into a fuzzy set with
linguistic terms using membership functions. The algorithm then calculates the
scalar cardinality of each linguistic term on all the transaction data using
temporary sets. The mining process based on fuzzy counts is then performed to
find fuzzy association rules. The detail of the proposed mining algorithm is
described as follows.
The Algorithm:
INPUT: A body of n transaction data, each with m attribute values, a set of membership
functions, a predefined minimum support value
, and a predefined confidence value
.
OUTPUT: A set of fuzzy associate rules.
STEP 1: For each
transaction data D(i), i=1 to n, and for each attribute Aj, j=1 to m, transfer the quantitative value
into a fuzzy
set
represented as
using the given
membership functions, where Rjk
is the k-th fuzzy region of attribute
Aj,
is
¡¦s fuzzy membership value in region Rjk, and l (=
) is the number of fuzzy regions for Aj.
STEP 2: Build a temporary
set
and store the
pairs (
) in the set
, ![]()
.
STEP 3: For each region Rjk kept in
, calculate its scalar cardinality on the transactions using
:
countjk
.
STEP 4: For each Rjk kept in
, j=1 to m,
, check whether its countjk
is larger than or equal to the predefined minimum support value
. If Rjk
satisfies the above condition, put it in the set of large 1-itemsets (L1). That is:
L1 =
.
STEP 5: Set r=1,
where r is used to represent the
number of items kept in the current large itemsets.
STEP 6: Generate the candidate set Cr+1 from Lr. Restated, the algorithm
joins Lr and Lr under the condition that
each Rjk, j=1 to m, 1
k![]()
, doesn¡¦t belong to the same attribute AJ and r-1 items in the two itemsets are the same and the other one is
different. It then keeps in Cr+1
the itemsets which have all their sub-itemsets of r items existing in Lr.
STEP 7: Build an empty temporary set
.
STEP 8: For each newly formed (r+1)-itemset s with items
in Cr+1, do the following
substeps:
(a) For each
transaction data D(i),
calculate its fuzzy value on s as
using
, where
is the fuzzy membership value of D(i) in region sj.
If the minimum operator is used for the intersection, then
.
(b) Store the pairs (s,
),
in
.
(c) Set counts =
using
.
(d) If counts is larger than or
equal to the predefined minimum support value
, put s in Lr+1.
STEP 9: IF Lr+1
is null, then do the next step; otherwise, set r=r+1 and repeat STEPs 6
to 9.
STEP 10: For each large q-itemset s with items
, q
2, construct the associate rules by the following substeps:
(a) Form each possible
association rule as follows:
, k=1 to q.
(b) Calculate the
confidence value of the above association rule as:
.
STEP 11: Output the rules
with their confidence values larger than or equal to the predefined confidence
threshold
.
After
STEP 11, the rules constructed are output and can act as the meta-knowledge for
the given transactions.
The students¡¦
score data in the Department of Information Management in I-Shou University,
Taiwan, were used to show the feasibility of the proposed mining algorithm.
Totally 260 transactions were included in the data set. Each transaction
consisted of the scores that a student had got. Execution of the mining algorithm
was implemented on a Pentium-PC.
Experiments were made to compare the accuracy of the
proposed fuzzy mining algorithm and the crisp-partition mining method in which
the possible values of each attribute were partitioned in a crisp fashion and a
traditional mining algorithm was used to mine association rules. The comparison
of accuracy for the minimum support value set at 40 is shown in Figure 1.

Figure 1:
The comparison of the accuracy of two mining algorithms.
From Figure 1, it
is easily seen that the accuracy of the proposed fuzzy mining algorithm was
higher than that of the crisp partition method for various minimum confidence
values.
In this paper, we have proposed a generalized data-mining algorithm to
process transaction data with quantitative values and discover interesting
patterns among them. Experimental results on the students¡¦ scores in the
Department of Information Management in I-Shou University, Taiwan, shows the
feasibility of the proposed mining algorithm.
Although the proposed method works well in data
mining for quantitative values, it is just a beginning. There is still much
work to be done in this field. Our method assumes that the membership functions
are known in advance. In [10, 11], some fuzzy learning methods were proposed to
automatically derive the membership functions. In the future, we will attempt
to dynamically adjust the membership functions in the proposed mining algorithm
to avoid the bottleneck of the acquisition of membership functions. We will
also attempt to design different data-mining models for different problem
domains.
[1] Agrawal, R. Imielinksi, T.
and Swami, A. Mining association rules between sets of items in large database.
The 1993 ACM SIGMOD Conference, Washington
DC, USA, (1993).
[2] Agrawal, R.
Imielinksi, T. and Swami, A. Database mining: a performance perspective. IEEE
Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, (1993), 914-925.
[3]
Agrawal, R. Srikant, R.
and Vu, Q. Mining association rules with item constraints. The Third
International Conference on Knowledge Discovery in Databases and Data Mining,
Newport Beach, California, (August 1997).
[4] Agrawal, R.
and Srikant, R. Fast algorithm for mining association rules. The International
Conference on Very Large Data Bases, (1994), 487-499.
[5] Blishun, A. F.
Fuzzy learning models in expert systems. Fuzzy Sets and Systems, Vol. 22, (1987), 57-70.
[6] Campos, L. M.
and Moral, S. Learning rules for a fuzzy inference model. Fuzzy Sets and
Systems, Vol. 59, (1993), 247-257.
[7] Clair, C. Liu, C. and
Pissinou, N. Attribute weighting: a method of applying domain knowledge in the
decision tree process. The Seventh International
Conference on Information and Knowledge Management, (1998), 259-266.
[8] Delgado, M.
and Gonzalez, A. An inductive learning procedure to identify fuzzy systems.
Fuzzy Sets and Systems, Vol. 55, (1993), 121-132.
[9] Gonzalez, A. A learning
methodology in uncertain and imprecise environments. International Journal of
Intelligent Systems, Vol. 10, (1995), 57-371.
[10]
Hong, T. P. and Chen, J. B. Finding relevant attributes and membership
functions. Fuzzy Sets and Systems,
Vol.103, No. 3, (1999), 389-404.
[11]
Hong, T. P. and Chen, J. B. Processing individual fuzzy attributes for fuzzy rule
induction. accepted and to appear in Fuzzy Sets and Systems.
[12]
Hong, T. P. and Lee, C. Y. Induction of fuzzy rules and membership functions
from training examples. Fuzzy Sets and Systems, Vol. 84, (1996), 33-47.
[13] Hong, T. P.
and Tseng, S. S. A generalized version space learning algorithm for noisy and
uncertain data. IEEE Transactions on Knowledge and Data Engineering, Vol. 9,
No. 2, (1997), 336-340.
[14] Kandel, A.
Fuzzy Expert Systems, CRC Press, Boca Raton, (1992), 8-19.
[15] Quinlan, J.
R. Decision tree as probabilistic classifier. The Fourth International Machine
Learning Workshop, Morgan Kaufmann, San Mateo, CA, (1987), 31-37.
[16] Srikant, R. and Agrawal,
R. Mining quantitative association rules in large relational tables. The 1996
ACM SIGMOD International Conference on Management of Data, Monreal, Canada,
(June 1996), 1-12.
[17] Wang, C. H. Hong, T. P.
and Tseng, S. S. Inductive learning from fuzzy examples. The fifth IEEE
International Conference on Fuzzy Systems, New Orleans, (1996), 13-18.
[18] Wang, C. H. Liu, J. F.
Hong, T. P. and Tseng, S. S. FILSMR:A fuzzy inductive learning strategy for
modular rules. The Sixth IEEE International Conference on Fuzzy Systems,
(1997), 1289-1294.
[19] Yuan, Y.
Shaw, M. J. Induction of fuzzy decision trees. Fuzzy Sets and Systems, 69, (1995), 125-139.