ABSTRACT

Most of conventional data mining algorithms identify the relation among transactions with binary values. Transactions with quantitative values are, however, commonly seen in real world applications. This paper thus attempts to propose a new data-mining algorithm to enhance the capability of exploring interesting knowledge from the transactions with quantitative values. The proposed algorithm integrates the fuzzy set concepts and the AprioriTid mining algorithm to find interesting fuzzy association rules from given transaction data. The database needs to be scanned only in the first pass to calculate the support of the items. Experiments on students' grades in I-Shou University are also made to verify the performance of the proposed algorithm.

Keywords

Data mining, fuzzy set, association rule, quantitative value

1. INTRODUCTION

Recently, fuzzy set theory is more and more frequently used in intelligent systems, because of its simplicity and similarity to human reasoning [14]. Several fuzzy learning algorithms for inducing rules from a set of given data were designed and caused good effects on specific domains [5, 6, 8, 9, 12, 13, 17]. Some strategies based on decision tree [7] were proposed in [15, 18, 19]. Wang et al. also proposed a fuzzy version space learning strategy for managing vague information [17].

This paper integrates the fuzzy-set concepts and the AprioriTid mining algorithm to find interesting itemsets and fuzzy association rules from transaction data with quantitative values. A fuzzy data mining algorithm is proposed, which is specially capable of transforming quantitative values in transactions into linguistic terms, then filtering them, and finding association rules by modifying the AprioriTid mining algorithm [4] .

2. REVIEW OF RELATED ALGORITHMS

In the past, Agrawal and his co-workers proposed several mining algorithms based on the concept of large itemsets to find association rules from transactions [1, 2, 3, 4]. They decomposed the mining process into two phases. In the first phase, candidate itemsets are generated and counted by scanning the transactions. If the number of an itemset appearing in the transactions is larger than a pre-defined threshold value (called minimum support), the itemset is thought of as a large itemset. Itemsets with only one item are first processed. The large itemsets with one item are then combined to form candidate itemsets of two items. This process is repeated until all large itemsets are found. In the second phase, the desired association rules are induced from the large itemsets found in the first phase. All the possible combination ways of association rules for each large itemset are formed, and the ones with their calculated confidence values larger than a predefined threshold (called minimum confidence) are output as desired association rules.

In addition to proposing methods for mining association rules from transactions of binary values, Agrawal et al also proposed a method [16] to mine association rules from those with quantitative and categorical attributes. Their proposed method first determines the number of partitions for each quantitative attribute, and then maps all possible values of each attribute into a set of consecutive integers. It then finds the large itemsets whose support values are greater than the user-specified minimum support. These large itemsets are then processed to generate association rules, and the interesting rules are output from the viewpoint of users.

3. FUZZY DATA-MINING ALGORITHM

In this section, the fuzzy concepts are used in the AprioriTid data-mining algorithm to discover useful association rules from quantitative values. The proposed fuzzy mining algorithm first transforms each quantitative value into a fuzzy set with linguistic terms using membership functions. The algorithm then calculates the scalar cardinality of each linguistic term on all the transaction data using temporary sets. The mining process based on fuzzy counts is then performed to find fuzzy association rules. The detail of the proposed mining algorithm is described as follows.

The Algorithm:

INPUT: A body of n transaction data, each with m attribute values, a set of membership functions, a predefined minimum support value, and a predefined confidence value .

OUTPUT: A set of fuzzy associate rules.

STEP 1: For each transaction data D⁽ⁱ⁾, i=1 to n, and for each attribute A_j, j=1 to m, transfer the quantitative value into a fuzzy set represented as using the given membership functions, where R_jk is the k-th fuzzy region of attribute A_j, is ’s fuzzy membership value in region R_jk, and l (=) is the number of fuzzy regions for A_j.

STEP 2: Build a temporary set and store the pairs () in the set , .

STEP 3: For each region R_jk kept in , calculate its scalar cardinality on the transactions using :

count_jk .

STEP 4: For each R_jk kept in , j=1 to m, , check whether its count_jk is larger than or equal to the predefined minimum support value. If R_jk satisfies the above condition, put it in the set of large 1-itemsets (L₁). That is:

L₁ = .

STEP 5: Set r=1, where r is used to represent the number of items kept in the current large itemsets.

STEP 6: Generate the candidate set C_r+1 from L_r. Restated, the algorithm joins L_r and L_r under the condition that each R_jk, j=1 to m, 1k, doesn’t belong to the same attribute A_J and r-1 items in the two itemsets are the same and the other one is different. It then keeps in C_r+1 the itemsets which have all their sub-itemsets of r items existing in L_r.

STEP 7: Build an empty temporary set .

STEP 8: For each newly formed (r+1)-itemset s with items in C_r+1, do the following substeps:

(a) For each transaction data D⁽ⁱ⁾, calculate its fuzzy value on s asusing, where is the fuzzy membership value of D⁽ⁱ⁾ in region s_j. If the minimum operator is used for the intersection, then.

(b) Store the pairs (s, ), in .

(d) If count_s is larger than or equal to the predefined minimum support value, put s in L_r+1.

STEP 9: IF L_r+1 is null, then do the next step; otherwise, set r=r+1 and repeat STEPs 6 to 9.

STEP 10: For each large q-itemset s with items, q2, construct the associate rules by the following substeps:

(a) Form each possible association rule as follows:

, k=1 to q.

(b) Calculate the confidence value of the above association rule as:

STEP 11: Output the rules with their confidence values larger than or equal to the predefined confidence threshold .

After STEP 11, the rules constructed are output and can act as the meta-knowledge for the given transactions.

4. EXPERIMENTAL RESULTS

The students’ score data in the Department of Information Management in I-Shou University, Taiwan, were used to show the feasibility of the proposed mining algorithm. Totally 260 transactions were included in the data set. Each transaction consisted of the scores that a student had got. Execution of the mining algorithm was implemented on a Pentium-PC.

Experiments were made to compare the accuracy of the proposed fuzzy mining algorithm and the crisp-partition mining method in which the possible values of each attribute were partitioned in a crisp fashion and a traditional mining algorithm was used to mine association rules. The comparison of accuracy for the minimum support value set at 40 is shown in Figure 1.

Figure 1: The comparison of the accuracy of two mining algorithms.

From Figure 1, it is easily seen that the accuracy of the proposed fuzzy mining algorithm was higher than that of the crisp partition method for various minimum confidence values.

5. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a generalized data-mining algorithm to process transaction data with quantitative values and discover interesting patterns among them. Experimental results on the students’ scores in the Department of Information Management in I-Shou University, Taiwan, shows the feasibility of the proposed mining algorithm.

Although the proposed method works well in data mining for quantitative values, it is just a beginning. There is still much work to be done in this field. Our method assumes that the membership functions are known in advance. In [10, 11], some fuzzy learning methods were proposed to automatically derive the membership functions. In the future, we will attempt to dynamically adjust the membership functions in the proposed mining algorithm to avoid the bottleneck of the acquisition of membership functions. We will also attempt to design different data-mining models for different problem domains.

6. REFERENCES

[1] Agrawal, R. Imielinksi, T. and Swami, A. Mining association rules between sets of items in large database. The 1993 ACM SIGMOD Conference, Washington DC, USA, (1993).

[2] Agrawal, R. Imielinksi, T. and Swami, A. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, (1993), 914-925.

[3] Agrawal, R. Srikant, R. and Vu, Q. Mining association rules with item constraints. The Third International Conference on Knowledge Discovery in Databases and Data Mining, Newport Beach, California, (August 1997).

[4] Agrawal, R. and Srikant, R. Fast algorithm for mining association rules. The International Conference on Very Large Data Bases, (1994), 487-499.

[5] Blishun, A. F. Fuzzy learning models in expert systems. Fuzzy Sets and Systems, Vol. 22, (1987), 57-70.

[6] Campos, L. M. and Moral, S. Learning rules for a fuzzy inference model. Fuzzy Sets and Systems, Vol. 59, (1993), 247-257.

[7] Clair, C. Liu, C. and Pissinou, N. Attribute weighting: a method of applying domain knowledge in the decision tree process. The Seventh International Conference on Information and Knowledge Management, (1998), 259-266.

[8] Delgado, M. and Gonzalez, A. An inductive learning procedure to identify fuzzy systems. Fuzzy Sets and Systems, Vol. 55, (1993), 121-132.

[9] Gonzalez, A. A learning methodology in uncertain and imprecise environments. International Journal of Intelligent Systems, Vol. 10, (1995), 57-371.

[10] Hong, T. P. and Chen, J. B. Finding relevant attributes and membership functions. Fuzzy Sets and Systems, Vol.103, No. 3, (1999), 389-404.

[11] Hong, T. P. and Chen, J. B. Processing individual fuzzy attributes for fuzzy rule induction. accepted and to appear in Fuzzy Sets and Systems.

[12] Hong, T. P. and Lee, C. Y. Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets and Systems, Vol. 84, (1996), 33-47.

[13] Hong, T. P. and Tseng, S. S. A generalized version space learning algorithm for noisy and uncertain data. IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 2, (1997), 336-340.

[14] Kandel, A. Fuzzy Expert Systems, CRC Press, Boca Raton, (1992), 8-19.

[15] Quinlan, J. R. Decision tree as probabilistic classifier. The Fourth International Machine Learning Workshop, Morgan Kaufmann, San Mateo, CA, (1987), 31-37.

[16] Srikant, R. and Agrawal, R. Mining quantitative association rules in large relational tables. The 1996 ACM SIGMOD International Conference on Management of Data, Monreal, Canada, (June 1996), 1-12.

[17] Wang, C. H. Hong, T. P. and Tseng, S. S. Inductive learning from fuzzy examples. The fifth IEEE International Conference on Fuzzy Systems, New Orleans, (1996), 13-18.

[18] Wang, C. H. Liu, J. F. Hong, T. P. and Tseng, S. S. FILSMR:A fuzzy inductive learning strategy for modular rules. The Sixth IEEE International Conference on Fuzzy Systems, (1997), 1289-1294.