Static Detection of Sources of Dynamic Anomalies in a Network of Referential Integrity Restrictions

The data manipulations considered are insertions, deletions, or updates of one or several tuples. A manipulation involves only one kind of operation, refers to a single relation, and involves a set of tuples that does not change during its execution. The specified constraints are verified after each single-tuple operation. The manipulation succeeds if all of its single-tuple operations are carried out; otherwise it fails and is revoked. The two examples above exhibit more than one anomalous behavior. In the following sections, simplified subschemes of them are used to illustrate the different problems.
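The all-or-nothing behavior of a manipulation under immediate verification can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `run_manipulation`, the two-relation state, and the helpers `insert_child` and `child_refs_ok` are hypothetical names introduced here, and the only constraint modelled is a plain restricted reference.

```python
import copy

def run_manipulation(state, tuples, apply_one, is_consistent):
    """Apply one single-tuple operation per tuple; revoke everything on failure."""
    working = copy.deepcopy(state)      # work on a copy so a failure leaves `state` intact
    for t in tuples:
        apply_one(working, t)
        if not is_consistent(working):  # immediate mode: verify after each single tuple
            return state, False         # the whole manipulation is revoked
    return working, True

# Hypothetical state: child tuples are (id, parent_key).
state = {"parent": {("a",)}, "child": set()}

def insert_child(db, t):
    db["child"].add(t)

def child_refs_ok(db):
    parent_keys = {p[0] for p in db["parent"]}
    return all(c[1] in parent_keys for c in db["child"])

ok_state, ok = run_manipulation(state, [(1, "a"), (2, "a")], insert_child, child_refs_ok)
bad_state, bad = run_manipulation(state, [(3, "a"), (4, "d")], insert_child, child_refs_ok)
```

In the second call the tuple (3, "a") is acceptable on its own, but the manipulation as a whole is revoked because (4, "d") references a nonexistent parent.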

3.1. Problems in Insert Operations

The effect of insertions is examined in this section.

Example 1-1: Consider the rir’s and the database state depicted in Figure 1, excluding in this case I₁ and I₂. The operation I inserts the tuple (5 4 4 d x) in the relation r₁. I may cause the insertion (via I₅ and I₆) of the tuple (d) in the relation r₃, or the insert operation may be blocked (via I₃ and I₄). The result of I depends on the order in which the rir’s that involve R₁ are enforced: (i) if I₅ is considered first, the tuple (d x) is inserted in r₄; this triggers the enforcement of I₆, which causes the insertion of the tuple (d) in r₃, and then I₃ causes the insertion of the tuple (4 d) in r₂; or (ii) if I₃ is considered first, the tuple (4 d) is inserted in r₂, and this triggers the enforcement of I₄, which blocks the insert operation since the tuple (d) does not exist in the relation r₃.

Example 2-1: Consider the example in Figure 2 and the corresponding state of the database r = {r₁, r₂}. If a₁: r and a₂: c and the operation I inserts the tuple (5 6 6 v) in the relation r₁, it may trigger the insertion (via I₂) of the tuple (6 λ) in r₂, or the insert operation may be blocked (via I₁). The result of I depends on the order in which the rir’s that involve R₁ are applied: (i) if I₂ is enforced first, the tuple (5 6 6 v) is inserted in r₁ and the tuple (6 λ) in r₂; or (ii) if I₁ is considered first, I is blocked since there is no value 6 for PERSON-ID in R₂. If a₁: n and a₂: c, and the foreign keys PROD# and DIR# associated with R₁ are required not to hold null values, the set-null modality for the insert operation becomes restricted, showing a behavior similar to the previous one.

3.2. Problems in Delete Operations

A deletion D may trigger the typical actions defined by the strategy, either triggering other operations or, on the contrary, blocking the manipulation. The outcome of D may be unpredictable when enforcing the rir's involved in D triggers updates or deletions of tuples that could in turn block D if another path of G were considered first.

Example 1-2: Consider Example 1, excluding in this case I₁ and I₂. Suppose that D involves the tuple (a) of relation r₃. The tuple (4 a) of r₂ can block D via I₄, while D can trigger the deletion of that tuple via I₆, I₅, and I₁. The result of D depends on the order in which the rir’s that involve R₃ are enforced: (i) if I₆ is verified first, then (a x) is deleted from r₄; then I₅ is enforced, triggering the deletion of the tuples (1 4 x a) and (4 λ x a) of r₁; finally I₁ causes the deletion of the tuple (4 a) of r₂; on the other hand, (ii) if I₃ is enforced first, D is blocked by the tuple (4 a) of r₂.

Example 2-2: Consider Example 2, with b₁: r, b₂: c, and r = {r₁’, r₂}. If D involves the tuple (2 yy) of r₂, the tuple (3 2 2 s) of r₁’ may block D via I₁, or D may trigger the deletion of this tuple via I₂. The outcome of the operation depends on the order in which the rir’s involving R₂ are enforced: (i) if I₂ is considered first, (3 2 2 s) and (2 1 2 q) are deleted from r₁’; or (ii) if I₁ is considered first, D is blocked by the tuple (3 2 2 s) of r₁’.

The undesirable effects illustrated above become more severe when foreign key attributes overlap [Clair98].

3.3. Problems in Updates

The study of all possible updates may be divided into three cases: i) left update: the foreign key FK_i of R_i is updated, and FK_i ∩ K_i = ∅; ii) right update: the primary key K_j of R_j is updated, and FK_j ∩ K_j = ∅; iii) both-side update: the primary key K_j of R_j is updated, and FK_j ∩ K_j ≠ ∅.
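The three cases can be told apart mechanically from the updated attributes, the primary key, and the foreign key of the relation. The following is a small sketch under one reading of the case definitions; the function name `classify_update` and the attribute names in the usage lines are hypothetical.

```python
def classify_update(updated, primary_key, foreign_key):
    """Classify an update as 'left', 'right', 'both-side', or 'none'."""
    updated, pk, fk = set(updated), set(primary_key), set(foreign_key)
    touches_pk = bool(updated & pk)
    touches_fk = bool(updated & fk)
    if touches_pk and (fk & pk):   # key updated and FK ∩ K ≠ ∅
        return "both-side"
    if touches_pk:                 # key updated and FK ∩ K = ∅
        return "right"
    if touches_fk:                 # only foreign key attributes updated
        return "left"
    return "none"                  # neither key nor foreign key is affected

# Hypothetical attribute names, for illustration only.
left = classify_update({"ref"}, {"id"}, {"ref"})
right = classify_update({"id"}, {"id"}, {"ref"})
both = classify_update({"id"}, {"id", "ref"}, {"ref"})
```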

3.3.1. Right Updates

Only the update of values belonging to primary keys in the relations R_j will be taken into account. Let Ru be the update of one or more tuples of one relation. Ru may promote actions like those seen for deletions: i) the update of tuples referencing tuples involved in Ru via rir’s with Cascades modality; or ii) the update of foreign key values in tuples referencing tuples involved in Ru via rir’s with Nullified modality. On the other hand, if one tuple t references one tuple involved in Ru via one rir with a Restricted modality, t blocks the execution of Ru.

Example 1-3: Consider Example 1, excluding I₁ and I₂. Suppose that Ru changes the tuple (a) to (d) in r₃. The result of this update depends on the order in which the rir’s involving R₃ are enforced: (i) if I₆ is enforced first, the tuple (a x) in r₄ is modified to (d x); this leads to the enforcement of I₅, which results in a failed attempt to change the value of the attribute PROJ# in (1 4 4 a x) and (4 λ λ a x) of r₁ to (d), the failure being due to the conflict of the new value with I₃; (ii) if I₄ is enforced first, the tuple (4 a) in r₂ is modified to (4 d), leading to the enforcement of I₃, which in turn assigns null values to the attributes DIR# and PROJ# in the tuple (1 4 4 a x) of r₁; then, as a result of enforcing I₆, (a x) in r₄ is modified to (d x), and enforcing I₅ would cause the value of the attribute PROJ# in (4 - - a x) of the relation r₁ to be changed to (d), while (1 4 - - x) in r₁ no longer holds any reference to r₄.

3.3.2. Left Updates

Only updates of values belonging to foreign keys will be taken into account. Let Lu be the update of one or more tuples of R_i. Lu may trigger actions such as: i) the insertion of the tuples referenced by the tuples involved in Lu via rir’s with Cascades modality; or ii) the update of foreign key values in tuples involved in Lu that reference nonexistent tuples in a relation linked via a rir with Nullified modality. Conversely, if a tuple t references a nonexistent tuple in a relation referenced via a rir with Restricted modality, t blocks the execution of Lu. The problematic cases in left updates are the same as those for insertions.

3.3.3. Both-Side Updates

Only updates of values belonging to both primary keys and foreign keys, in a relation that takes part in two rir’s as the right side and the left side respectively, will be taken into account. The problematic cases for both-side update operations are the same as those detected for left and right updates.

4. STATIC DETECTION OF ANOMALIES

In Section 3, different anomalies in the manipulation of data have been examined.

In this section the mechanisms needed to detect the potential presence of anomalies are detailed, following the approaches of [Markowitz94] and [Casanova89]. Safeness conditions must ensure:

1 - Data manipulations must produce only one result regardless of the order in which the rir’s are enforced and the order in which the tuples are accessed.

2 - A data manipulation must map a consistent database state r to another consistent database state r’; this is in accordance with the immediate-mode verification used in this article.

With regard to integrity verification, the present study assumes the immediate mode. Deferred mode, on the other hand, is a well-known, advantageous, and sometimes mandatory strategy for integrity maintenance. However, a full understanding of the immediate mode is required for the analysis of deferred mode, which is under development in this project.

The relations in which anomalies may appear during the execution of operations over the database can be determined through safeness conditions. To accomplish that, the following sets of relations will be defined: C(R_i), R(R_i), N(R_i), CDir(R_i), RDir(R_i), and NDir(R_i). These sets are formed by elements (R_j, FK) including R_i, where R_j ∈ R, and FK is one foreign key associated with R_j.

4.1. Insert Operations

The sets needed to detect sources of inconsistencies are:

· CDir(R_i) contains elements (R_j,FK), where R_j is a relation connected in G to R_i by one edge corresponding to one rir with Cascades modality for insertions.

· C(R_i) contains elements (R_j,FK), where one relation R_t of C(R_i) or CDir(R_i) is connected in G to R_j by an edge corresponding to a rir with Cascades modality for insertions, of the form R_t[FK]<<R_j[K_j]:(c,b,m_i,m_d), where FK ⊆ K_t and K_t is the primary key of R_t;

· NDir(R_i) contains elements (R_j,FK), where R_j is one relation connected in G to R_i by one edge corresponding to a rir with Nullified modality for insertions and, for every X ⊆ Y, there does not exist any nna R_i: X ≠ λ.

· RDir(R_i) contains elements of the form (R_j,FK), where R_j is one relation connected in G to R_i by one arc corresponding to one of the following: (1) one rir R_i[FK]<<R_j[K_j]:(r,b,m_i,m_d); (2) one rir R_i[FK]<<R_j[K_j]:(n,b,m_i,m_d), where there is at least one X ⊆ FK with one nna of the form R_i: X ≠ λ.

· R(R_i) contains elements of the form (R_j,FK), where one relation R_t of C(R_i) or CDir(R_i) is connected in G to R_j by one edge corresponding to: (1) one rir R_t[FK]<<R_j[K_j]:(r,b,m_i,m_d) with FK ⊆ K_t, where K_t is the primary key of R_t; (2) one rir R_t[FK]<<R_j[K_j]:(n,b,m_i,m_d) with FK ⊆ K_t, where K_t is the primary key of R_t and there exists X ⊆ FK with at least one nna R_t: X ≠ λ.

Note that in delete or right update operations, the relation that enforces the rir by Nullified does not undergo any modification of its attributes, since it propagates the effect of the operation to the left relation. On the contrary, insertions or left updates by Nullified set their own involved attributes to null, voiding any reference to another relation. This is why the definition of the sets RDir, CDir and NDir is needed. The relational scheme R is said to be safe in insert operations iff for every relation R_i of R:

I1) there is no relation R_j of R belonging to C(R_i) or CDir(R_i) and to R(R_i) or RDir(R_i) at the same time;

I2) there is no pair of elements (R_j,Y) and (R_k,Y’) belonging to CDir(R_i) and NDir(R_i) respectively, where: (i) R_j=R_k or (ii) Y and Y’ overlap;

I3) there is no relation R_j of R belonging to C(R_i) and NDir(R_i) at the same time;

I4) there is no pair of elements (R_j,Y) and (R_k,Y’) belonging to NDir(R_i), where Y and Y’ strictly overlap;

I5) there is no pair of elements (R_j,Y) and (R_k,Y’) belonging to RDir(R_i) and NDir(R_i) respectively, where Y and Y’ overlap.

Example: The relational scheme of Example 1-1 of Section 3.1 does not satisfy I1, since the relation R₃ is involved in the sets R(R₁) and C(R₁); therefore R₁ is a possible source of unpredictable results during inserts.

Safeness conditions may be used to locate the affected relations during insert operations: i) if I1 is not satisfied, the affected relation is the R_j that is involved in C(R_i) or CDir(R_i) and in R(R_i) or RDir(R_i) at the same time; ii) if I3 is not satisfied, the affected relation is the R_j that is involved in C(R_i) and NDir(R_i) at the same time; iii) if I2, I4 or I5 is not satisfied, the place where different results may appear is R_i.

Proposition: For every relation R_i of R, for every database state r associated with R, and for every insertion I involving one or more tuples of the relation r_i of r associated with the relation R_i, I maps r into a unique state of the database iff R satisfies the previous conditions. The proof of this proposition is given in [Rivero99].
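The construction of the insertion sets and the check of condition I1 can be sketched as follows. This is an illustrative reading of the definitions, not the authors' algorithm. The edge list encodes Example 1-1 as described in Section 3.1 (I₃ and I₅ cascade from R₁, I₆ cascades from R₄, I₄ is restricted from R₂); the foreign key labels `fk_I3` to `fk_I6` are placeholders, and the `fk_in_src_key` flags (FK ⊆ K_t) are assumptions, since Figure 1 is not reproduced here.

```python
from collections import namedtuple

# One rir R_src[fk] << R_dst[K_dst]; only the insert modality is modelled.
Rir = namedtuple("Rir", "src fk dst mode fk_in_src_key has_nna")

def blocks(e):
    # Restricted, or Nullified on attributes protected by a non-null assertion.
    return e.mode == "r" or (e.mode == "n" and e.has_nna)

def insert_sets(rirs, ri):
    cdir, rdir, ndir, c, r = set(), set(), set(), set(), set()
    for e in rirs:                          # edges leaving R_i directly
        if e.src != ri:
            continue
        if e.mode == "c":
            cdir.add((e.dst, e.fk))
        elif blocks(e):
            rdir.add((e.dst, e.fk))
        else:
            ndir.add((e.dst, e.fk))
    frontier, seen = [rel for rel, _ in cdir], set()
    while frontier:                         # follow cascaded insertions
        rt = frontier.pop()
        if rt in seen:
            continue
        seen.add(rt)
        for e in rirs:
            if e.src != rt or not e.fk_in_src_key:
                continue
            if e.mode == "c":
                c.add((e.dst, e.fk))
                frontier.append(e.dst)
            elif blocks(e):
                r.add((e.dst, e.fk))
    return {"CDir": cdir, "C": c, "RDir": rdir, "R": r, "NDir": ndir}

def i1_violations(s):
    rels = lambda *names: {rel for n in names for rel, _ in s[n]}
    return rels("C", "CDir") & rels("R", "RDir")

EXAMPLE_1_1 = [
    Rir("R1", "fk_I3", "R2", "c", False, False),
    Rir("R2", "fk_I4", "R3", "r", True, False),
    Rir("R1", "fk_I5", "R4", "c", False, False),
    Rir("R4", "fk_I6", "R3", "c", True, False),
]
sets_r1 = insert_sets(EXAMPLE_1_1, "R1")
```

Under these assumptions, `i1_violations(sets_r1)` reports R₃, which agrees with the example above: R₃ appears in both C(R₁) and R(R₁).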

4.2. Delete Operations

For delete operations, the sets that help to detect conflicting nodes are:

· C(R_i) contains the element (R_i, -), and elements (R_j, FK), where R_j is a relation connected to R_i in G by a directed path formed by edges corresponding to rir’s with Cascades mode for deletions, such that the first edge is labeled FK:(a,c,m_i,m_d).

· N(R_i) contains elements (R_j, FK), where R_j is a relation connected to R_m of C(R_i) in G by an edge corresponding to a rir R_j[FK]<<R_m[K_m]:(a,n,m_i,m_d), and for each X ⊆ FK, X is allowed to have null values.

· R(R_i) contains elements (R_j, FK), where R_j is a relation connected to R_m of C(R_i) in G by an edge corresponding to: (1) a rir with Restricted mode and labeled with the foreign key FK; (2) a rir R_j[FK]<<R_m[K_m]:(a,n,m_i,m_d), such that there exists X ⊆ FK that may not take null values.

By including in the set R(R_i) those operations that may be rejected because they try to nullify attributes associated with a nna, the additional sets defined by Markowitz (1994) are no longer required [Rivero98].

The relational schema R is safe for delete operations iff for each R_i in R:

D1) there is no relation R_j of R involved in both C(R_i) and R(R_i); this condition avoids the deletion of tuples that would block the operation if another path were followed;

D2) there is no pair of elements (R_j, Y) and (R_j, Y’) involved in C(R_i) and N(R_i) respectively, such that Y and Y’ overlap. This prevents tuples from being either updated or deleted depending on which path of rir’s is enforced first;

D3) there is no pair of elements (R_j, Y) and (R_j, Y’) belonging to R(R_i) and N(R_i) respectively, such that Y and Y’ overlap. This condition prevents the tuples affected by a delete operation from being either modified or left unaltered (blocking the operation), depending on the order in which the restrictions are verified;

D4) there is no pair of elements (R_j, Y) and (R_j, Y’) belonging to the set N(R_i), such that Y and Y’ strictly overlap. In this way, different results are avoided when updates with the Nullified strategy are performed.

Other combinations of sets either are symmetric to those presented above or do not produce anomalies. The relational schema of Example 1 is unpredictable since R₂ is involved in R(R_i) and C(R_i). R₃ is a possible source of unpredictable results when a delete operation is performed, since it does not satisfy D1. The relational schema of Example 2 does not satisfy D1 because R₁ is involved in both R(R₂) and C(R₂); thus R₂ is a source of potential anomalies.

The safety conditions establish that the source of anomalies is the relation R_i, and that the place where anomalies occur is: the R_j that is involved in C(R_i) and R(R_i), when D1 is not satisfied; the R_j that is involved in C(R_i) and N(R_i) with the elements (R_j,Y) and (R_j,Y’) respectively, such that Y and Y’ overlap, when D2 is not satisfied; the R_j that is involved in R(R_i) and N(R_i) with the elements (R_j,Y) and (R_j,Y’) respectively, such that Y and Y’ overlap, when D3 is not satisfied; and the R_j that is involved in N(R_i) with the elements (R_j,Y) and (R_j,Y’), such that Y and Y’ strictly overlap, when D4 is not satisfied. The proof of the necessity and sufficiency of these conditions is sketched in Markowitz (1994).
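The delete sets and condition D1 can be sketched in the same spirit. The code below is an illustrative reading of the definitions of C, N and R for deletions, applied to Example 2 with b₁: r and b₂: c (I₁ restricted, I₂ cascades, both from R₁ to R₂). The foreign key labels `fk_I1` and `fk_I2` are placeholders, and the tuple layout `(src, fk, dst, delete_mode, nullable)` is an assumption of this sketch.

```python
def delete_sets(rirs, ri):
    """rirs: iterable of (src, fk, dst, delete_mode, nullable) for R_src[fk] << R_dst."""
    c = {(ri, None)}                 # C(R_i) always contains (R_i, -)
    reached, frontier = {ri}, [ri]
    while frontier:                  # walk cascade edges against their direction
        rm = frontier.pop()
        for src, fk, dst, mode, _nullable in rirs:
            if dst == rm and mode == "c":
                c.add((src, fk))
                if src not in reached:
                    reached.add(src)
                    frontier.append(src)
    n, r = set(), set()
    for src, fk, dst, mode, nullable in rirs:
        if dst not in reached:       # only edges pointing into a relation of C(R_i)
            continue
        if mode == "r" or (mode == "n" and not nullable):
            r.add((src, fk))
        elif mode == "n":
            n.add((src, fk))
    return c, n, r

def d1_violations(c, r):
    return {rel for rel, _ in c} & {rel for rel, _ in r}

EXAMPLE_2 = [
    ("R1", "fk_I1", "R2", "r", False),
    ("R1", "fk_I2", "R2", "c", False),
]
c_r2, n_r2, r_r2 = delete_sets(EXAMPLE_2, "R2")
```

Here `d1_violations(c_r2, r_r2)` yields R₁, consistent with the statement that R₁ is involved in both R(R₂) and C(R₂). Conditions D2 to D4 additionally need the overlap test on foreign keys, which this sketch leaves out.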

4.3. Update Operations

In the same way as previous operations, sets of relations are built in order to find sources of potential anomalies.

4.3.1. Right Updates

In this case the following sets must be built:

· C(R_i) has the element (R_i, -), and elements (R_j, FK), such that there exists an element (R_k, S_k) belonging to C(R_i), where R_j is a relation of R linked to R_k in G by an edge corresponding to a rir R_j[FK]<<R_k[K_k]:(a,b,m_i,c) with a Cascades option for updates, and: (a) S_k = ∅ or (b) S_k ∩ K_k ≠ ∅.

· N(R_i) contains elements (R_j,FK), such that there exists an element (R_k,S_k) belonging to C(R_i), where R_j is linked to R_k in G by an edge corresponding to a rir R_j[FK]<<R_k[K_k]:(a,b,m_i,n) such that for each X ⊆ Y, there does not exist any rnn R_j: X ≠ λ, and: (a) S_k = ∅ or (b) S_k ∩ K_k ≠ ∅.

· R(R_i) is formed by elements (R_j, FK), such that there exists an element (R_k,S_k) belonging to C(R_i), where R_j is a relation of R connected to R_k in G by an edge corresponding to: (1) a rir R_j[FK]<<R_k[K_k]:(a,b,m_i,r) such that: (a) S_k = ∅ or (b) S_k ∩ K_k ≠ ∅; (2) a rir R_j[FK]<<R_k[K_k]:(a,b,m_i,n) such that there exists X ⊆ FK associated with a rnn R_j: X ≠ λ, and: (a) S_k = ∅ or (b) S_k ∩ K_k ≠ ∅.

It may be stated that a relational schema R is safe under right updates iff for each relation R_i of R:

U_r1) there is not any relation R_j of R, such that there exists a pair of elements (R_j,Y) and (R_j,Y’) belonging to C(R_i) and R(R_i) respectively and Y and Y' overlap;

U_r2) there is not any relation R_j of R, such that there exists a pair of elements (R_j,Y) and (R_j,Y’) belonging to C(R_i) and N(R_i) respectively and Y overlaps Y’;

U_r3) there is not any relation R_j of R, such that there exists a pair of elements (R_j,Y) and (R_j,Y’) belonging to R(R_i) and N(R_i) respectively and Y overlaps Y’;

U_r4) there is not any relation R_j of R, such that there exists a pair of elements (R_j,Y) and (R_j,Y’) belonging to the set N(R_i), and Y and Y’ strictly overlap.

Using these four conditions, the sources of anomalies when updates to the right-side relation are performed may be determined. In all cases, the source is the relation R_i. If U_r1 is not satisfied, the irregularity will occur in a R_j contained in both C(R_i) and R(R_i) with the elements (R_j,Y) and (R_j,Y’) respectively, where Y and Y’ overlap. Analogously, when U_r2 is not satisfied, the anomalies will be located in a R_j involved in both C(R_i) and N(R_i) with the elements (R_j,Y) and (R_j,Y’) respectively, where Y overlaps Y’. The same situation occurs when U_r3 is not satisfied. If U_r4 is violated, the place of anomalies will be a R_j contained in N(R_i) with the elements (R_j,Y) and (R_j,Y’), where Y and Y’ strictly overlap.

4.3.2. Left Updates

The unsafe cases for referential integrity when left updates are performed are the same as those studied for insertions, if their modalities agree. The composition of the sets changes because the first edge must be considered with the update option, while the following ones must be treated as edges corresponding to insertions. The safety conditions are the same.

4.3.3. Both-Side Updates

In this case, the problematic cases are the same as those studied under left and right updates, so their analysis follows from what was indicated for those operations.

5. GENERATION OF RULES

For each of the relations that appeared as potential sources of anomalies during the static analysis of the schema, rules were built. These rules make it possible to determine whether anomalies are present in a given database state. Such rules are expressed as serial combinations using the logical operators not and and, together with the composition operator o. A serial combination from a specific relation is an expression representing a directed path in G that links nodes with an incidence degree (deletions and right updates) or divergence degree (insertions and left updates) equal to 1.

The partial divergence (incidence) degree of a vertex, pdd (pid), is defined as the number of significant edges that leave (reach) it. A significant edge is defined according to the operation: i) for deletions and left updates, every edge is significant; ii) for insertions, a significant edge is one representing a rir R_i[W]<<R_j[K_j]:(a,b,m_i,m_d) with W ⊆ K_i; iii) for right updates, a significant edge is one representing a rir R_i[W]<<R_j[K_j]:(a,b,m_i,m_d) with W ∩ K_i ≠ ∅.

In the restriction graph, seven types of nodes are distinguished according to their partial incidence or divergence degrees: 1) source node (pid = 0); 2) sink node (pdd = 0); 3) unifier node (pid ≥ 2 and pdd = 1); 4) branch node (pdd ≥ 2 and pid = 1); 5) passing node (pid = 1 and pdd = 1); 6) multiple node (pid ≥ 2 and pdd ≥ 2); 7) isolated node (pid = 0 and pdd = 0).
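The node typology can be computed directly from the significant edges. The sketch below assumes the edge list has already been filtered down to significant edges for the operation at hand, and that an edge for R_i[W]<<R_j[K_j] is oriented from R_i to R_j. The function names are hypothetical, and the isolated case is tested first because it also matches the source and sink conditions. The sample edges are I₃ to I₆ of Example 1 as described in Section 3.1.

```python
from collections import Counter

def partial_degrees(significant_edges):
    """Return (pid, pdd) counters for a list of (src, dst) significant edges."""
    pid, pdd = Counter(), Counter()
    for src, dst in significant_edges:
        pdd[src] += 1   # edge leaves src
        pid[dst] += 1   # edge reaches dst
    return pid, pdd

def classify_node(pid, pdd):
    if pid == 0 and pdd == 0:
        return "isolated"
    if pid == 0:
        return "source"
    if pdd == 0:
        return "sink"
    if pid >= 2 and pdd >= 2:
        return "multiple"
    if pid >= 2:
        return "unifier"    # pdd == 1 here
    if pdd >= 2:
        return "branch"     # pid == 1 here
    return "passing"        # pid == 1 and pdd == 1

edges = [("R1", "R2"), ("R2", "R3"), ("R1", "R4"), ("R4", "R3")]
pid, pdd = partial_degrees(edges)
kinds = {n: classify_node(pid[n], pdd[n]) for n in ("R1", "R2", "R3", "R4")}
```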

5.1. Insertion Rules

In order to build the rules corresponding to potentially anomalous insert operations, the following serial combinations must be considered:

· C⁺(I_j) ≡ a directed path of G from a non-sink node (R_i) to a non-source node (R_jn), where the edges of the path represent the following rir’s: R_i[FK_i]<<R_j1[K_j1]:(c,b,m_i,m_d); R_j1[FK₁]<<R_j2[K_j2]:(c,b,m_i,m_d); ...; R_jn‑1[FK_n‑1]<<R_jn[K_jn]:(c,b,m_i,m_d); with FK₁ ⊆ K_j1; FK₂ ⊆ K_j2; ...; FK_n‑1 ⊆ K_jn‑1, and the first edge in the path is I_j.

· N(I_j) ≡ a directed path in G, consisting of a single edge, that corresponds to the rir I_j: R_i[FK]<<R_m[K_m]:(n,b,m_i,m_d), such that for each X ⊆ FK, X is allowed to be null.

· R(I_j) ≡ a directed path in G, consisting of a single edge, that corresponds to: (1) the rir I_j: R_i[FK]<<R_m[K_m]:(r,b,m_i,m_d); or (2) the rir I_j: R_i[FK]<<R_m[K_m]:(n,b,m_i,m_d) such that there exists X ⊆ FK restricted by a rnn.

· C_r(I_j) ≡ a directed path in G leaving a non-sink node (R_i) and reaching a non-source node (R_j), where the edges correspond to the following rir’s: (1) R_i[FK_i]<<R_j1[K_j1]:(c,b,m_i,m_d); R_j1[FK₁]<<R_j2[K_j2]:(c,b,m_i,m_d); ...; R_jn‑1[FK_n‑1]<<R_jn[K_jn]:(c,b,m_i,m_d); R_jn[FK_n]<<R_j[K_j]:(r,b,m_i,m_d); with FK₁ ⊆ K_j1; FK₂ ⊆ K_j2; ...; FK_n ⊆ K_jn, where the first edge corresponds to I_j; or (2) R_i[FK_i]<<R_j1[K_j1]:(c,b,m_i,m_d); R_j1[FK₁]<<R_j2[K_j2]:(c,b,m_i,m_d); ...; R_jn‑1[FK_n‑1]<<R_jn[K_jn]:(c,b,m_i,m_d); R_jn[FK_n]<<R_j[K_j]:(n,b,m_i,m_d); with FK₁ ⊆ K_j1; FK₂ ⊆ K_j2; ...; FK_n ⊆ K_jn, where the first edge corresponds to I_j and there exists X ⊆ FK_n that is not allowed to have null values.

Algorithm: For an insertion operation over a table that is a source of anomalies, a set of trees is built in order to support rule generation, applying the following algorithm:

1. Set the table that is the source of anomalies as the root of the tree. Set its direct descendants in the graph as its children in the tree (the number of children of the root node will be equal to the partial divergence degree of the node in the graph).

2. In each of the branches of the tree (each internal node has only one child), combine serially all sequences of nodes with a partial divergence degree equal to 1, until reaching either a node whose incoming edge has an option other than Cascades or a node with a partial divergence degree not equal to 1.

For each tree, a rule is built. Each branch of a tree is a serial combination, since each internal node has a unique child. The branches are assembled by means of the and operator. If a branch represents a serial combination C_r or R, it is preceded by the not operator. If a branch ends in a node that is the root of another tree, the serial combination of that branch (C⁺) is composed (o) with the rule corresponding to that relation. The paths are considered in such a way that the same anomaly is never treated more than once. If a relation has no associated rule, it is because it never produces anomalies when insert operations are performed on it.
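The two steps of the algorithm and the assembly of a rule can be sketched for a single tree as follows. This is a simplified reading, not the authors' procedure: it ignores the conditions on foreign keys and non-null assertions, does not compose a branch with the rule of another tree, and treats the negated branches as those ending in a Restricted edge. The function `insertion_rule` and the textual form of its result are assumptions of this sketch. The input encodes I₃ to I₆ of Example 1-1 as described in Section 3.1.

```python
def insertion_rule(rirs, root):
    """rirs: dict name -> (src, dst, insert_mode), insert_mode in {'c', 'r', 'n'}."""
    outgoing = {}
    for name, (src, _dst, _mode) in rirs.items():
        outgoing.setdefault(src, []).append(name)
    terms = []
    for first in outgoing.get(root, []):        # step 1: one branch per child of the root
        _src, node, mode = rirs[first]
        if mode == "r":
            terms.append(f"(not R({first}))")
            continue
        if mode == "n":
            terms.append(f"N({first})")
            continue
        blocked, seen = False, {root}
        # step 2: follow nodes of divergence degree 1 while the edges cascade
        while node not in seen and len(outgoing.get(node, [])) == 1:
            seen.add(node)
            _s, nxt, nxt_mode = rirs[outgoing[node][0]]
            if nxt_mode == "c":
                node = nxt
            else:
                blocked = nxt_mode == "r"       # cascade path closed by a restricted edge
                break
        terms.append(f"(not Cr({first}))" if blocked else f"C+({first})")
    return " and ".join(terms)

EXAMPLE_1_1 = {
    "I3": ("R1", "R2", "c"),
    "I4": ("R2", "R3", "r"),
    "I5": ("R1", "R4", "c"),
    "I6": ("R4", "R3", "c"),
}
rule_r1 = insertion_rule(EXAMPLE_1_1, "R1")
```

For this input the sketch produces the string "(not Cr(I3)) and C+(I5)". That string is the output of this sketch, not a rule stated in the text, although it has the same shape as the delete rule given for R₃ in Section 5.2.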

5.2. Delete Rules

The analysis of the different paths in the restriction graph is analogous to that presented for insertions, but in this case the graph is traversed in the reverse direction. The algorithm is likewise similar to the one described in the previous section.

Example: The rule for the source of anomalies R₃, according to the graph of Figure 1, is Rule R₃: C⁺(I₆) and (not R(I₄)). Figure 3 shows its construction. Evaluating a rule requires knowledge of the database state. The mechanisms that perform the evaluation should be refined in order to obtain an acceptable level of efficiency; otherwise the proposed strategy will not be applicable.

Jorge H. Doorn

Daniel Loureiro

Keywords: integrity restrictions, referential integrity, database updates, anomalous updates.


Copyright 2000 ACM