Copyright ACM, 2000

Mining Functional Dependencies from Fuzzy Relational Databases

Shyue-Liang Wang, Jenn-Shing Tsai and Tzung-Pei Hong

Department of Information Management, I-Shou University

Kaohsiung, 84008, Taiwan, ROC

+886-7-6563711

{slwang, m873202m, tphong}@csa500.isu.edu.tw

@

ABSTRACT
In this work, we present a method for mining functional dependencies from possibility-based fuzzy relational databases.  In addition to similarity-based fuzzy data model, possibility-based fuzzy data models have been proposed to represent imprecise, uncertain, and incomplete information.  Research on generalizing the notion of functional dependencies (FD) into that of fuzzy FD's on fuzzy databases has been undertaken in recent years.  However, their emphases are on the conceptual viewpoints and no algorithms are given.  A level-wise mining technique is adopted here for the search of all possible nontrivial minimal functional dependencies defined by Bosc [4].  As the dependencies discovered are both representational and semantic, database schema design and query optimization can be directly benefited.

Keywords
Data mining, functional dependency, fuzzy database

@

1. INTRODUCTION
Fuzzy relational databases to accommodate imprecise, uncertain or incomplete information have been studied extensively in recent years [1,3,5,12,18,19,21].   To introduce fuzziness into the relational data model, two major approaches have been proposed: model through similarity/proximity relations [5,12,18] and model through possibility distribution [1,3,12,19,21].  A number of related issues such as functional dependencies, security, implementation considerations and others have also been investigated.

Functional dependency is a major issue in the design of relational databasesFor a number of reasons, (mainly the risk for inconsistency when updates are performed), redundancy is not convenient in a database and dependencies have emerged since they are capable of capturing some semantics of the data strongly connected with the occurrence of redundancy in a databases.   Informally, a FD X Y is a property valid on a relation, stating that tuples with the same value on a set X of attributes have the same value on a set Y of attributes.  In fact, it is possible to reduce (and sometimes to remove) this form of redundancy from a database if its schema is chosen (or designed) appropriately.

Many extensions of FD on fuzzy databases have been proposed: the fuzzy functional dependency framework of Raju and Majumdar [16], the probabilistic dependency framework of Haux and Eckert [9], and cluster dependency framework of Saharia and Barron [17], etc.  In [4], Bosc etc. discuss problems with previous extensions of FD's in the presence of imprecise data. A semantic based approach is then proposed, where no threshold appears and a usual FD is extended in the sense that the functional feature is preserved both at the level of representations and values.

The discoveries of functional dependencies from relations have received considerable interests recently. Several methods of mining FDs on crisp relational data model have been presented by Huhtala [10], Mannila [11], Missaoui & Godin [13], Savnik & Flach [14], Schlimmer [15], Bell & Blockhausen [2], etc. However, approaches on the discovery of FD's on fuzzy relational databases have not been discussed.  In this work, we propose a mining technique based on level-wise search for discovering FDs defined by [4] on fuzzy relational databases.

The rest of our paper is organized as follows.  In section 2, we review several extensions of FD on fuzzy relational databases.  In section 3, we introduce an approach of calculating the dependency between attributes in a possibility-based data model.  In section 4, we present a mining algorithm for discovering all possible non-trivial and minimal FD's in a possibility-based database.   Finally, a brief conclusion is given in section 5.

@

2. EXTENSIONS OF FD ON FUZZY DATABASES
Integrity constraints on data in database management systems reflect properties of real world.  There are various types of integrity constraints among which a particular type came out along with relational data model, namely functional dependency.  The interest for FD is motivated by the fact that redundancy can be captured by FDs.  In fact, it is possible to reduce (or remove) the redundancies from a database if its schema is chosen properly.

In extending relational databases with fuzzy set theory, many extensions of FD have been proposed: the fuzzy functional dependency framework of Raju and Majumdar [16], the probabilistic dependency framework of Haux and Eckert [9], and cluster dependency framework of Saharia and Barron [17], etc. However, it should be noticed that: (1) the term fuzzy functional dependency (FFD) is unclear since it applies to very different contexts and the term FFD is not really informative, (2) the semantics and objectives of the proposed definitions remain somewhat unclear, especially with respect to the concern of redundancy.

In the following, we review two families of approaches that attribute values are imperfectly known (possibility-based) and examine the possible meaning of the notion of FD.  The classification is based on the nature of comparison between two ill-known values.  In one approach the comparison is made in terms of representations, i.e., the result is a degree to which the two underlying fuzzy sets are equal.  For example, in [4], the definition of a so-called FFD X~>Y is defined as :

where denotes Rescher-Gaines implication (a b = 1 if , 0 otherwise) and µEQ(a, b) = minu 1 - ||.

The interpretation of this FFD is: "the more resemblant the X-representations, the more resemblant the Y-representations should be".  However, this type of dependency remains essentially syntactic.  In particular, the FFD may contain tuples whose X-representations share some more or less possible values while they do not share a single value in the Y-representations.  For example, if t1.X={1/French, 0.8/English}, t2.X={1/Chinese, 0.6/English}, t1.Y=100 and t2.Y=200, we have µEQ(t1.X, t2.X)= 0 = µEQ(t1.Y, t2.Y), and then the implication is satisfied whereas tuples may have the same X-value and totally different values on Y.

In the other approach the comparison is made in terms of values, i.e., its result is degree of possibility of the equality between two ill-known values.  For example, in [6], the definition of a FFD is defined as :

where denotes Godel implication (a b = 1 if , b otherwise) and µEQ(a, b) = Supu min().

The interpretation of this FFD is: "when two tuples have the same value (or representation) on X, they should have the same value (or representation) on Y due to the if part of the definition".   In addition, the decomposition theorem [7] is valid.  However, the critical threshold value is undecided and remains arbitrary to the database design.

To overcome the problems above, Bosc etc [4] proposed an FD where no threshold appears and a usual FD is extended in that the functional feature is preserved at both the levels of representations and values. The FD is defined as :

where denotes Rescher-Gaines implication (a b = 1 if , 0 otherwise) and µEQ(a, b) = Supu min().

The interpretation of such an FD is: "tuples which have the same representation on X should have the same representation on Y and tuples are at least as possibly equal on Y than they are possibly equal on X".  The decomposition theorem is also valid on this FD.

In the following sections, we will present an approach for mining this type of FD on possibility-based fuzzy relational databases.

@

3. CALCULATION OF DEPENDENCIES
In this section, we will describe a method for the determination of dependency based on equation (1) between a pair of attributes XAR, where X R and ARR for a given relational schema R.  For cases that ARR, they can be reduced to current situation.   Notice that the domains of attributes in R contain possibility-based values.

Given a relation r on a relational schema , where , are fuzzy sets, a functional dependency holds over r can be defined as:

(1)

where denotes Rescher-Gaines implication (a b = 1 if , 0 otherwise) and µEQ(a, b) = Supu min().

For a possibility-based fuzzy relation as Table 1, it can be checked that the FD Size Value holds according to formula above.

          Table 1 A Possibility-based Fuzzy Relation

The determination of the functional dependency can be calculated as follows:

  1. For every attribute Ai in X and AR, calculate between every two tuples i and j in r,
  2. Check for every i, j in r. If true, then the else part of the dependency in (1) is    satisfied,
  3. If step (2) is true and = 1 =, then check .   If true, then the functional dependency in (1) holds.

In the following, we will use Table 1 for illustration.

In step I, we calculate for attributes X = Size and AR = Value and the results are shown as follows.


In step II, we check by taking the difference between the two attributes as follows.

wpeD.jpg (4133 bytes)

Since for every i, j, the else part of the dependency in (1) is true.

In step III, since , we have to check whether t1.Size is equal to t2.Size and t1.Value is equal to t2.Value. As t1.Size t2.Size, the condition part of (1) is not satisfied. Similarly, since, we check equality between t3.Size and t4.Size as well as t3.Value and t4.Value. As this t3.Size = t4.Size and t3.Value = t4.Value, the If part is satisfied. Therefore, the FD Size Value holds.

Consequently, the fuzzy relation in Table 1 can be decomposed into two relations AS(A#, Size), SV(Size, Value), and some redundancy has been removed.

wpeE.jpg (11324 bytes)

For cases that |X|2, i.e., X contains more than one attribute, we let and check in step II.

@

4. MINING ALGORITHM
In this section, we describe a level-wise algorithm for searching all possible minimal non-trivial functional dependencies in a given relation r defined on relational schema R.

A functional dependency X AR is minimal (in R) if AR is not functional dependent on any proper subset of X, i.e., if Y AR does not hold in r for any Y X.  The dependency X AR is trivial if ARX.  Our task is to find the following: given a relation r, find all minimal non-trivial functional dependencies that hold in r.   Since for a given pair of attributes X and AR, the computation of the FD X AR is given in section 3, we will emphasize on finding the possible left hand side and corresponding right hand side candidates in this section.

The collection of all possible left-hand sides of dependencies is the collection of all attribute sets.   They constitute a set containment lattice such as in Figure 2 [10,11].  The level-wise algorithm starts the search from the singleton sets, and work its way through the lattice level by level until the minimal dependencies that hold are found.  The level is the collection of attribute sets of size in a set containment lattice. For example, L0 = {Ø }, and L1 = {{A}, {B}, {C}, {D}} in Figure. 2.  They contain all the possible left hand side candidates of a potential FD. During this level-wise search, false dependencies are eliminated as early as possible with some pruning techniques.

index8.jpg (8658 bytes)

Fig. 2 The set containment lattice for {A,B,C,D} representing the search space of all possible left-hand sides

The possible right-hand sides consist of single attribute that can be obtained with a single breadth-first or level-wise pass through the lattice.  For example, scanning the edges between nodes {A} and {{AB}, {AC}, {AD}}, we can obtain A's right hand side candidates as B, C, and D, respectively.  In addition, there is a one-to-one correspondence between the edges of the lattice and the non-trivial dependencies: the edge between sets X and X {AR} is viewed as a non-trivial dependency X AR.  For example, the edge between nodes {A} and {AB} denotes the possible dependency of A B and the edge between nodes {AB} and {ABC} denotes the possible dependency of AB C, etc.

To find all valid minimal non-trivial dependencies, we search the set containment lattice in a level-wise manner.   The validity test is to determine that an FD exists for a given pair of attributes.   This can be done by the method proposed in section 3.  To test minimality of X\{AR} AR, we need to know whether Y\{AR} AR, holds for some proper subset Y of X.  The information is stored in the set C+(X), the right-hand side candidates of X [10], where C+(X) = { A R | for all B X, X \ {A,B} B does not hold}.  The level-wise search starts with L1 = {{A} | A R}, and compute L2 from L1, and L3 from L2, and so on. According to these, we obtain the following algorithm where the pruning and next level generation procedures can be found in Huhtala [10].

Algorithm: level-wise search of functional dependencies.

  1. L0 :={Ø }
  2. C+(Ø ) := r
  3. L1 :={{A} | A R}
  4. := 1
  5. while Ø
  6. for very X, COMPUTE-RHS-CANDIDATES C+(X)
  7. PRUNE( )
  8. COMPUTE-DEPENDENCIES( )
  9. L+1 := GENERATE-NEXT-LEVEL( )
  10. := + 1

The efficiency of the level-wise algorithm is based on reducing the computation of each level by using results from previous levels.  We do not need to compute the more or less equality from scratch for every set of attributes.  In fact, the more or less equality of any level in the lattice can be obtained from level L1.   In addition, if we find that X A holds, then Y A is not minimal for any superset Y of X. Thus we can delete all supersets of X from consideration.

@

5. CONCLUSION
We have presented an approach for finding all possible minimal nontrivial functional dependencies from possibility-based fuzzy relational databases.   Many extensions of functional dependency on fuzzy databases have been studied. An extension that considers both representations and semantics of functional dependency defined by Bosc et al is adopted here.  However only conceptual viewpoints are given without algorithms of discovering the FD's from the database in prior research.  A level-wise searching algorithm based on Mannila et al is proposed here as an effective technique of discovering all possible FD's.  As the dependencies discovered are both representational and semantic, database schema design and query optimization can be directly benefited.  Comparisons with other mining techniques of FD's and applications to other types of fuzzy databases should be carefully looked into.

@

6. REFERENCES
  1. M. Anvari, and G.F. Rose, "Fuzzy relational databases", in Bezdek, Ed., Analysis of Fuzzy Information, Vol. II (CRC Press, Boca Raton, FL, 1987).
  2. S. Bell and P. Brockhausen, "Discovery of data dependencies in relational databases", Tech. Rep. LS-8, Report-14, University of Dortmund, Apr. 1995.
  3. P. Bosc, and O. Pivert, "Fuzzy querying in conventional databases", In Fuzzy Logic for the Management of Uncertainty, Zadeh, L. and Kacprzyk, J. Eds, John Wiley, New York, 1992, 645-671.
  4. P. Bosc, L. Lietard and O. Pivert, "Functional Dependencies Revisited Under Graduality and Imprecision", NAFIPS, 1997, 57-62.
  5. B.P. Buckles and F.E. Petry, "A fuzzy representation of data for relational databases", Fuzzy Sets and Systems, 7, 1982, 213-226.
  6. G.Q. Chen, "Fuzzy functional dependencies and a series of design issues of fuzzy relational databases", in Fuzziness in Database Management Systems, P. Bosc, J. Kacprzyk, eds, Physica Verlag, 166-185, 1995.
  7. C.J. Date, An Introduction to Database Systems, sixth edition, Addison-Wesley, 1995.
  8. W.J. Frawley, G. Piatesky-Shapiro, and C.J. Matheus, "Knowledge Discovery in Databases: An Overview", Knowledge Discovery in Databases, G. Piateskey-Shapiro and W.J. Frawley, eds, pp 1-27, AAAI/MIT Press, 1991.
  9. R. Haux and U. Eckert, "Nondeterministic dependencies in relations: an extension of the concept of functional dependency", Information Systems, 10 (2), 1985, 139-148.
  10. Y. Huhtala, J Karkkainen, P. Porkka, and H. Toivonen, "Efficient discovery of functional and approximate dependencies using partitions", Proceedings of IEEE International Conference on Data Engineering, 1998, 392-410.
  11. H. Mannila and H. Toivonen, "Levelwise search and borders of theories in knowledge discovery", Data Mining and Knowledge Discovery, 1 (3), 1997, 241-258.
  12. J.M. Medina, M.A. Vila, J.C. Cubero, and O. Pons, "Towards the implementation of a generalized fuzzy relational database model", Fuzzy Sets and Systems, 75, 1995, 273-289.
  13. R. Missaoui, R. Godin, "Search for concepts and dependencies in databases", in Rough Sets, Fuzzy Sets and Knowledge Discovery, W.P. Ziarko, editor, Springer-Verlag, 1994.
  14. I. Savnik, P. Flach, "Bottom-up induction of functional dependencies from relations", In Knowledge Discovery in Databases Workshop, G. Piatetsky-Shapiro, editor, paper from the 1993 AAAI Workshop, 174-185.
  15. J.C. Schlimmer, "Using learned dependencies to automatically construct sufficient and sensible editing views", In Knowledge Discovery in Databases Workshop, G. Piatetsky-Shapiro, editor, paper from the 1993 AAAI Workshop, 186-196.
  16. K.V.S.V.N. Raju and A.K. Majumdar, "Fuzzy functional dependencies and lossless join decomposition of fuzzy relational database systems", ACM Transactions on Database Systems, 13 (2), 1988, 129-166.
  17. A.N. Saharia, T.M. Barron, "Approximate dependencies in database systems", Decision Support Systems, 13, 1995, 335-347.
  18. S. Shenoi, and A. Melton, "Proximity relations in the fuzzy relational database model", Fuzzy Sets and System, 31, 1989, 285-296.
  19. M. Umano, Freedom-O: A fuzzy database system, in Gupta-Sanchez, Ed., Fuzzy Information and Decision Processes (North-Holland, Amsterdam, 1982).
  20. L.A. Zadeh, "Similarity relations and fuzzy orderings", Info. Sci., vol 3, no. 1, Mar. 1971, 177-200.
  21. M. Zemankova-Leech and A. Kandel, Fuzzy Relational Databases - A Key to Expert Systems, Verlag TUV Rheinland, Cologne, 1985.

Copyright 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and or fee.
SAC 2000 March 19-21 Como, Italy
(c) 2000 ACM 1-58113-239-5/00/003>...>$5.00