Mining Functional Dependencies from Fuzzy Relational Databases ShyueLiang Wang, JennShing Tsai and TzungPei Hong Department of Information Management, IShou University Kaohsiung, 84008, Taiwan, ROC +88676563711 {slwang, m873202m, tphong}@csa500.isu.edu.tw ¡@ 
ABSTRACT Keywords ¡@ 
1. INTRODUCTION Fuzzy relational databases to accommodate imprecise, uncertain or incomplete information have been studied extensively in recent years [1,3,5,12,18,19,21]. To introduce fuzziness into the relational data model, two major approaches have been proposed: model through similarity/proximity relations [5,12,18] and model through possibility distribution [1,3,12,19,21]. A number of related issues such as functional dependencies, security, implementation considerations and others have also been investigated. Functional dependency is a major issue in the design of relational databases. For a number of reasons, (mainly the risk for inconsistency when updates are performed), redundancy is not convenient in a database and dependencies have emerged since they are capable of capturing some semantics of the data strongly connected with the occurrence of redundancy in a databases. Informally, a FD X Y is a property valid on a relation, stating that tuples with the same value on a set X of attributes have the same value on a set Y of attributes. In fact, it is possible to reduce (and sometimes to remove) this form of redundancy from a database if its schema is chosen (or designed) appropriately. Many extensions of FD on fuzzy databases have been proposed: the fuzzy functional dependency framework of Raju and Majumdar [16], the probabilistic dependency framework of Haux and Eckert [9], and cluster dependency framework of Saharia and Barron [17], etc. In [4], Bosc etc. discuss problems with previous extensions of FD's in the presence of imprecise data. A semantic based approach is then proposed, where no threshold appears and a usual FD is extended in the sense that the functional feature is preserved both at the level of representations and values. The discoveries of functional dependencies from relations have received considerable interests recently. Several methods of mining FDs on crisp relational data model have been presented by Huhtala [10], Mannila [11], Missaoui & Godin [13], Savnik & Flach [14], Schlimmer [15], Bell & Blockhausen [2], etc. However, approaches on the discovery of FD's on fuzzy relational databases have not been discussed. In this work, we propose a mining technique based on levelwise search for discovering FDs defined by [4] on fuzzy relational databases. The rest of our paper is organized as follows. In section 2, we review several extensions of FD on fuzzy relational databases. In section 3, we introduce an approach of calculating the dependency between attributes in a possibilitybased data model. In section 4, we present a mining algorithm for discovering all possible nontrivial and minimal FD's in a possibilitybased database. Finally, a brief conclusion is given in section 5. ¡@ 
2. EXTENSIONS OF FD ON FUZZY
DATABASES Integrity constraints on data in database management systems reflect properties of real world. There are various types of integrity constraints among which a particular type came out along with relational data model, namely functional dependency. The interest for FD is motivated by the fact that redundancy can be captured by FDs. In fact, it is possible to reduce (or remove) the redundancies from a database if its schema is chosen properly. In extending relational databases with fuzzy set theory, many extensions of FD have been proposed: the fuzzy functional dependency framework of Raju and Majumdar [16], the probabilistic dependency framework of Haux and Eckert [9], and cluster dependency framework of Saharia and Barron [17], etc. However, it should be noticed that: (1) the term fuzzy functional dependency (FFD) is unclear since it applies to very different contexts and the term FFD is not really informative, (2) the semantics and objectives of the proposed definitions remain somewhat unclear, especially with respect to the concern of redundancy. In the following, we review two families of approaches that attribute values are imperfectly known (possibilitybased) and examine the possible meaning of the notion of FD. The classification is based on the nature of comparison between two illknown values. In one approach the comparison is made in terms of representations, i.e., the result is a degree to which the two underlying fuzzy sets are equal. For example, in [4], the definition of a socalled FFD X~>Y is defined as : where denotes RescherGaines implication (a b = 1 if , 0 otherwise) and µ_{EQ}(a, b) = min_{u} 1  . The interpretation of this FFD is: "the more resemblant the Xrepresentations, the more resemblant the Yrepresentations should be". However, this type of dependency remains essentially syntactic. In particular, the FFD may contain tuples whose Xrepresentations share some more or less possible values while they do not share a single value in the Yrepresentations. For example, if t_{1}.X={1/French, 0.8/English}, t_{2}.X={1/Chinese, 0.6/English}, t_{1}.Y=100 and t_{2}.Y=200, we have µ_{EQ}(t_{1}.X, t_{2}.X)= 0 = µ_{EQ}(t_{1}.Y, t_{2}.Y), and then the implication is satisfied whereas tuples may have the same Xvalue and totally different values on Y. In the other approach the comparison is made in terms of values, i.e., its result is degree of possibility of the equality between two illknown values. For example, in [6], the definition of a FFD is defined as : where denotes Godel implication (a b = 1 if , b otherwise) and µ_{EQ}(a, b) = Sup_{u} min(). The interpretation of this FFD is: "when two tuples have the same value (or representation) on X, they should have the same value (or representation) on Y due to the if part of the definition". In addition, the decomposition theorem [7] is valid. However, the critical threshold value is undecided and remains arbitrary to the database design. To overcome the problems above, Bosc etc [4] proposed an FD where no threshold appears and a usual FD is extended in that the functional feature is preserved at both the levels of representations and values. The FD is defined as : where denotes RescherGaines implication (a b = 1 if , 0 otherwise) and µ_{EQ}(a, b) = Sup_{u} min(). The interpretation of such an FD is: "tuples which have the same representation on X should have the same representation on Y and tuples are at least as possibly equal on Y than they are possibly equal on X". The decomposition theorem is also valid on this FD. In the following sections, we will present an approach for mining this type of FD on possibilitybased fuzzy relational databases. ¡@ 
3. CALCULATION OF DEPENDENCIES In this section, we will describe a method for the determination of dependency based on equation (1) between a pair of attributes XA_{R}, where X R and A_{R}R for a given relational schema R. For cases that A_{R}R, they can be reduced to current situation. Notice that the domains of attributes in R contain possibilitybased values. Given a relation r on a relational schema , where , are fuzzy sets, a functional dependency holds over r can be defined as: (1) where denotes RescherGaines implication (a b = 1 if , 0 otherwise) and µ_{EQ}(a, b) = Sup_{u} min(). For a
possibilitybased fuzzy relation as Table 1, it can be checked that the FD Size
Value
holds according to formula above. Table 1 A Possibilitybased Fuzzy Relation The determination of the functional dependency can be calculated as follows:
In the following, we will use Table 1 for illustration. In step I, we calculate for attributes X = Size
and A_{R} = Value and the results are shown as follows. Since for every i, j, the else part of the dependency in (1) is true. In step III, since , we have to check whether t_{1}.Size is equal to t_{2}.Size and t_{1}.Value is equal to t_{2}.Value. As t_{1}.Size t_{2}.Size, the condition part of (1) is not satisfied. Similarly, since, we check equality between t_{3}.Size and t_{4}.Size as well as t_{3}.Value and t_{4}.Value. As this t_{3}.Size = t_{4}.Size and t_{3}.Value = t_{4}.Value, the If part is satisfied. Therefore, the FD Size Value holds. Consequently, the fuzzy relation in Table 1 can be decomposed into two relations AS(A#, Size), SV(Size, Value), and some redundancy has been removed. For cases that X2, i.e., X contains more than one attribute, we let and check in step II. ¡@ 
4. MINING ALGORITHM In this section, we describe a levelwise algorithm for searching all possible minimal nontrivial functional dependencies in a given relation r defined on relational schema R. A functional dependency X A_{R} is minimal (in R) if A_{R} is not functional dependent on any proper subset of X, i.e., if Y A_{R} does not hold in r for any Y X. The dependency X A_{R} is trivial if A_{R}X. Our task is to find the following: given a relation r, find all minimal nontrivial functional dependencies that hold in r. Since for a given pair of attributes X and A_{R}, the computation of the FD X A_{R} is given in section 3, we will emphasize on finding the possible left hand side and corresponding right hand side candidates in this section. The collection of all possible lefthand sides of dependencies is the collection of all attribute sets. They constitute a set containment lattice such as in Figure 2 [10,11]. The levelwise algorithm starts the search from the singleton sets, and work its way through the lattice level by level until the minimal dependencies that hold are found. The level is the collection of attribute sets of size in a set containment lattice. For example, L_{0} = {Ø }, and L_{1} = {{A}, {B}, {C}, {D}} in Figure. 2. They contain all the possible left hand side candidates of a potential FD. During this levelwise search, false dependencies are eliminated as early as possible with some pruning techniques.
The possible righthand sides consist of single attribute that can be obtained with a single breadthfirst or levelwise pass through the lattice. For example, scanning the edges between nodes {A} and {{AB}, {AC}, {AD}}, we can obtain A's right hand side candidates as B, C, and D, respectively. In addition, there is a onetoone correspondence between the edges of the lattice and the nontrivial dependencies: the edge between sets X and X {A_{R}} is viewed as a nontrivial dependency X A_{R}. For example, the edge between nodes {A} and {AB} denotes the possible dependency of A B and the edge between nodes {AB} and {ABC} denotes the possible dependency of AB C, etc. To find all valid minimal nontrivial dependencies, we search the set containment lattice in a levelwise manner. The validity test is to determine that an FD exists for a given pair of attributes. This can be done by the method proposed in section 3. To test minimality of X\{A_{R}} A_{R}, we need to know whether Y\{A_{R}} A_{R}, holds for some proper subset Y of X. The information is stored in the set C^{+}(X), the righthand side candidates of X [10], where C^{+}(X) = { A R  for all B X, X \ {A,B} B does not hold}. The levelwise search starts with L_{1} = {{A}  A R}, and compute L_{2} from L_{1}, and L_{3} from L_{2}, and so on. According to these, we obtain the following algorithm where the pruning and next level generation procedures can be found in Huhtala [10]. Algorithm: levelwise search of functional dependencies.
The efficiency of the levelwise algorithm is based on reducing the computation of each level by using results from previous levels. We do not need to compute the more or less equality from scratch for every set of attributes. In fact, the more or less equality of any level in the lattice can be obtained from level L_{1}. In addition, if we find that X A holds, then Y A is not minimal for any superset Y of X. Thus we can delete all supersets of X from consideration. ¡@ 
5. CONCLUSION We have presented an approach for finding all possible minimal nontrivial functional dependencies from possibilitybased fuzzy relational databases. Many extensions of functional dependency on fuzzy databases have been studied. An extension that considers both representations and semantics of functional dependency defined by Bosc et al is adopted here. However only conceptual viewpoints are given without algorithms of discovering the FD's from the database in prior research. A levelwise searching algorithm based on Mannila et al is proposed here as an effective technique of discovering all possible FD's. As the dependencies discovered are both representational and semantic, database schema design and query optimization can be directly benefited. Comparisons with other mining techniques of FD's and applications to other types of fuzzy databases should be carefully looked into. ¡@ 
6. REFERENCES
