Static Detection of Sources of Dynamic Anomalies in a Network of
Referential Integrity Restrictions
|
Laura C.
Rivero ISISTAN
- UNCentro LINTI - U.N.La Plata B.A.
Argentina Phone: +54 2293 440363 lrivero@exa.unicen.edu.ar |
Jorge H. Doorn
ISISTAN -
UNCentro, B.A.
Argentina Phone: +54 2293 440363 jdoorn@exa.unicen.edu.ar |
Daniel Loureiro
UNCentro, B.A. Argentina Phone: +54 2293 440363 |
ABSTRACT
Under
certain circumstances, basic operations over tables in a relational database,
where integrity restrictions such as referential and null restrictions have
been specified, may produce unpredictable results, not detectable by means of a
static analysis of the schema. When the design includes redundancies or when
the set of restrictions is contradictory it is easy to detect and prevent
future errors, but there are situations that require a dynamic analysis. In
this paper, the properties of networks of referencial integrity restrictions
that contain irregularities are analyzed, and the anomalies that may appear
when data actualization in such environment is done are studied in order to
define criteria and develop an algorithm to generate rules for proper handling
of inconsistencies.
1. INTRODUCTION
Semantic data control ensures the maintenance of database consistency,
by rejecting update transactions that lead to inconsistent states or by
activating specific actions on the database state to compensate the effect of
the previous transaction. Unpredictable results may be obtained in a relational
particular database state, when basic operations are applied. This
inconvenience is produced by redundancies in the schema design, by
contradictory referential integrity restrictions or simply because the designer
established complex restrictions having tuple dependent semantic. From a
procedural point of view, it means that the result may be unpredictable because
it is affected by the order in which individual restrictions are applied or by
the order in which the constraints are enforced.
Those problems are well-known and there are research
reports in relation with some aspects of them (see e.g. [Markowitz90],
[Markowitz91], [Casanova88], [Rivero96]) but the formalization of conditions
and the implementation of mechanisms for the automatic prevention of those
anomalies is a recent investigation field. In [Markowitz94] the uncertainties
produced by specific data manipulation are described, especially in delete
operations. In [Casanova89] special cases of updates propagation are described.
This research paper may be divided in three well-defined parts. The
first is developed in sections 3 and 4 and is devoted to refine Markowitz study
extending it to all operations. The second is dedicated to the specification of
algorithms able to detect potential sources of anomalies in a static way (Section
5) [Rivero98]. The last part (Section 6.) presents an algorithm that
automatically generates rules that allows the integrity verification.
2. RELATIONAL CONCEPTS
A relational schema is R º<R, D>, with R={R1, R2, ..., Rm} and D={FD, ID, NR} set of relations, FD and ID set of functional dependencies (fd) and inclusion
dependencies (id) respectively and NR
set of null restrictions (null constraint and null not allowed). A database
state for R is denoted by r ={r1, r2,
..., rm}; sch(Ri) represents
the set of attributes of Ri, Ki stands for a candidate key over Ri; and FK represents a foreign
key for Ri. A database state r associated with R is
consistent if it satisfies all restrictions in D. Two attribute sets overlap iff
they share attributes (YÇZ¹Æ), and strictly overlap iff they overlap but they are
not equal ((YÇZ¹Æ) Ù (Y¹Z)).
A functional dependency (fd) over a set of attributes U is
one expression of the form X®Y, where X,YÍU. If RieR, fd’s over Ri
will be indicated by Ri:X ® Y.
A null constraint (nna) may be expressed by Ri:Li ¹l. It is satisfied by Ri iff
for every tuple t of Ri the subtuple t.Li has only not null values.
There is at least one nna in Ri, that is Ri:
Ki ¹l.
One inclusion dependency (id) in R is an expression Ri[X]<<Rj[Z]:(a,b,mi,md), where Ri
and Rj are relation names (possibly the same); X,ZÍsch(Rj) are compatible attributes; a, b, mi and md are the strategies to perform inserts, deletes and updates over the
left and right side respectively, and the strategies may be c (Cascades), r (Restricted) or n (Nullifies).
All combinations are studied in this article, however inserts and updates over
the left side are generally done using a y mi specified with modality ‘r’. If Z is the
primary key of the relation Rj, then it is a key-based-id and X
constitutes a foreign key for Ri; this sort of id’s are named
referencial integrity restrictions (rir’s).
The referential integrity directed graph G=(V,H), associated with R. may be defined with V=R and
H = {(Ri,Rj,L:(a,b,mi,md)) /Ri[L]<<Rj[Kj]:[a,b,mi,md] ÎID}. H
is composed by elements (Ri,Rj,
L: (a, b, mi, md)), where the edge goes from Ri
to Rj
and L:(a,b,mi, md) is the label of the edge.
3. CONFLICTIVE MANIPULATIONS
In order to characterize the problem that may arise when some updates
are performed over data constrained by rir's the examples in Figures 1 and
2 will be used.
Example
1:
|
·
Relations (keys are underlined) (R1) Employee ( NBR-E,
NBR-S, NBR-M, NBR-D, NBR-P ) (R2)
Manager ( NBR-M, NBR-P ) (R3)
Project ( NBR-P) (R4)
Department ( NBR-D, NBR-P ) ·
Null restrictions (N1)
Employee: NBR-E ¹ l (N2)
Manager: (NBR-M, NBR-P) ¹ l
|
(N3)
Project: NBR-P ¹ l (N4)
Department: (NBR-D, NBR-P) ¹ l ·
Referential Integrity Restrictions (I1)
Manager[NBR-M]<<Employee[NBR-E]:(c,c,c,c) (I2)
Employee[NBR-S]<<Employee[NBR-E]:(n,r,n,r) (I3) Employee[NBR-D,
NBR-P]<<Manager[NBR-M, NBR-P]:(c,c,c,n) (I4)
Manager[NBR-P]<<Project[NBR-P]:(r,r,c,c) (I5)
Employee[NBR-P, NBR-D]<<Department[NBR-P, NBR-D]:(c,c,c,c) (I6) Department[NBR-P]<<Project[NBR-P]:(c,c,r,c) |
Figure
1: Restriction Graph, State and Restrictions for Example 1 (adapted from
[Markowitz94])
Example 2:
|
·
Relations (keys are underlined) (R1)
Films ( FILM#, PROD#, DIR#, Subject ) (R2)
Staff ( PERS#, NAME ) ·
Nulls not allowed Restrictions (N1)
Films: FILM# ¹ l |
(N2) Staff: PERS# ¹ l ·
Referential Integrity Restrictions (I1)
Films[PROD#]<<Staff[PERS#]:(a1,b1,mi1,md1) (I2)
Films[DIR#]<<Staff[PERS#]:(a2,b2,mi2,md2) |

Figure 2: Restrictions Graph, State and Restrictions for Example 2.
The data manipulations
considered are insertions, deletions or updates of one or several tuples; a manipulation
involves only one kind of operation, it refers to an unique relation and
entails a set of tuples that do not change during the execution of the
manipulation. The specified constraints will be verified after each single
tuple operation. The manipulation will be successful if all of its single tuple
operations are carried out, otherwise it fails (is revoked). The above two
examples present more than just one anomalous behavior. In the following
sections, simplified subschemes of them are used to illustrate different
problems.
3.1. Problems in Insert
Operations
The effect of insertions is examined in this section.
Example 1-1: Consider the rir's and the database state
depicted in Figure 1, excluding in this case I1 and I2.
The operation I inserts the tuple (5 4 4 d x) in the relation r1. I may cause the insertion (via I5 and
I6) of the tuple (d) in
the relation r3, or it may block the insert
operation (via I3 and I4). The result of I depends
on the order in which the rir’s that involve R1 are
enforced: (i) if I5 is first considered, the tuple (d x) is inserted in r4, this triggers the enforcing of I6, which provokes the
insertion of the tuple (d) in r3,
then I3 will cause the insertion of the tuple (4 d) in r4; or (ii) if I3 is taken
into account in the first place, the tuple (4 d) is inserted in r2, and this triggers the enforcing of
I4 which blocks the insert operation since the tuple (d) does not exist in the relation r3.
Example 2-1: Consider the example in Figure 2 and the corresponding state of the
database r = {r1,r2}. If a1: r and a2: c and the operation I inserts the tuple (5 6 6 v) in the relation r1, it may
trigger the insertion (via I2) of the tuple (6 l) in r2, or the insert operation may
be blocked (via I1). The result of I depends on the order in which the rir’s that involve a R1
are applied: (i) if I2 is enforced in first place then the
tuple (5 6 6 v) is inserted in r1
and the tuple (6 l) in r2; or (ii) if I1 is first taken into
account, I is blocked since there is
not a value 6 for PERSON-ID in R2. If a1: n
and a2: c and it is required that PROD# and DIR#
foreign keys associated with R1 can not hold null values, the set null
modality for the insert operation becomes restricted, showing a behavior
similar to the previous.
3.2. Problems in Delete
Operations
A deletion D may trigger the
typical actions defined by the strategy, triggering other operations or, by the
contrary, blocking the manipulation. The outcome of D, may be unpredictable when the enforcement of the rir's
promoted by D implies the trigger of
updates or deletions of tuples, that in turn can blockade D if another path of G
is first considered.
Example 1-2: Consider the Example 1,
excluding in this case I1 and I2. Suppose that
D
involves the tuple (a) of
relation r3. The tuple (4 a)
of r2 can block D via I4,
while D can trigger the deletion of
that tuple via I6, I5, and I1. The result of D depends on the order of the
enforcement of the rir’s that involve R3: (i) if I6 is
verified in the first time, then (a x)
is deleted from r4, then I5 is enforced, triggering the
deletion of tuples (1 4 x a) and (4 l x a) of r1; finally I1
promotes the suppression of tuple (4 a)
of r2; on the other side (ii) if I3 is first enforced, D is blocked by tuple (4 a) of r2.
Example 2-2: Consider Example 2; with b1: r, b2: c; and r={r1’, r2}.
If D involves the tuple (2 yy) of r2,
the tuple (3 2 2 s) of r1’ may block D via I1, or may trigger the deletion of this tuple via
I2. The outcome of the operation depends on the order of rir’s
involving R2 enforcement:
(i) If I2 is taken into account in first place (3 2 2 s) and (2 1 2
q) are deleted from r1’; or (ii) if I1 is first
considered, D is blocked by the tuple
(3 2 2 s) of r1’.
The
undesirable effects illustrated above increase when there is overlapping of the
foreign key attributes [Clair98].
3.3. Problems in Updates
The study of all possible updates may be divided in three cases: i) left update: the foreign key FKi
of Ri is updated, and FKiÇKi=Æ; ii) right update: the
primary key Kj of Rj is updated, and FKjÇKj=Æ; iii) both sides update: the primary key Kj of Rj
is updated, and FKjÇKj¹Æ.
3.3.1. Right Updates
Only the update of values belonging to primary keys in the relations Rj
will be taken into account. Let Ru be
the update of one or more tuples of one relation. Ru may promote actions like those seen for deletions: i) the update
of tuples referencing tuples involved in Ru
via rir’s
with Cascades modality; or ii) the
update of foreign key values in tuples referencing tuples involved in Ru via rir’s with Nullified modality. On the other hand,
if one tuple t references one tuple involved in Ru via one rir with a Restricted modality, t blocks the execution of Ru.
Example 1-3: Consider the Example 1, excluding I1 and I2.
Suppose that Ru changes the tuple (a) by (d) in r3. The result of this update depends on the order
in which the rir’s involving R3 are enforced: (i) if I6 is enforced in first place,
the tuple (a x) in r4 is
modified to (d x), this leads to the
enforcement of I5, which results in a failed attempt to modify the
value of the attribute PROJ# in (1 4 4 a
x) and (4 l l a x) of r1 to (d), where
the failure is due to the conflict of the new value by I3; (ii) if I4
is enforced in first place, the tuple (4
a) in r2 is modified to (4
d), leading to the enforcement of I3 which in turns assigns null
values to the attributes DIR# and PROJ# in the tuple (1 4 4 a x) of r1; then as a result
of enforcing I6, (a x) in r4 is modified to (d x), and enforcing I5 would
provoke to modify the value of the attribute PROJ# in (4 - - a x) of the relation r1 to (d), and (1 4 - - x) in r1 does not hold any reference to r4.
3.3.2. Left Updates
Only updates of values belonging to foreign keys will be taken into
account. Let Lu be the update of one
or more tuples of Ri. Lu may promote
actions such as: i) the insertion of the tuples referenced by the tuples
involved in Lu via rir’s
with Cascades modality; or ii) the
update of foreign key values in tuples involved in Lu referencing non existing tuples in a relation linked via one rir
with Nullified modality. On the
opposite if one tuple t references a non existing tuple in a relation
referenced via one rir with a Restricted
modality, t blocks the execution of Lu.
Problematic cases in left updates are the same that those for insertions.
3.3.3. Both-Side Updates
Only the update of values belonging to both, primary keys and foreign
keys in a relation belonging to two rir’s as the right side and the left
side respectively, will be taken into account. The problematic cases for
both-side update operations are the same that those detected for left and right
updates.
4. STATIC DETECTION OF
ANOMALIES
In section 3, different anomalies in the
manipulation of data, has been examined.
In
this section the mechanisms needed to detect the potential presence of
anomalies will be detailed. This will be done adhering to the [Markowitz94] and
[Casanova89] approaches. Safeness conditions must ensure:
1 - Data manipulations must produce only one result regardless the order
in which the rir’s are enforced and the order in which the tuples are
accessed.
2 - A data manipulation, must map a consistent database state r to another consistent database state r’, this is in concordance with the immediate mode verification used in
this article.
The present study is driven by the immediate mode in what refers to
integrity verification. On the other hand, deferred mode is an advantageous and
even a mandatory well-known strategy for integrity maintenance. However, the
full understanding of the immediate mode is required for the analysis of deferred
mode that is under development in this project.
The relations in whom the anomalies may appear during the execution of
operations over the database can be determined through safeness conditions. To
accomplish that, the following sets of relations will be defined: C(Ri),
R(Ri), N(Ri), CDir(Ri), RDir(Ri),
and NDir(Ri). These sets are formed by elements (Rj, FK) including Ri,
where Rj Î R, and FK is one foreign key associated with Rj.
4.1. Insert Operations
The sets needed to detect sources of inconsistencies are:
·
CDir(Ri) contains elements (Rj,FK),
where Rj is a relation connected in G to Ri by one edge corresponding to one rir
with Cascades modality for
insertions.
·
C(Ri) contains elements (Rj,FK), where one relation
Rt of C(Ri) or CDir(Ri) is connected in G to Rj by an edge
corresponding to a rir with a Cascades modality
for insertions, of the form Rt[FK]<<Rj[Kj]:(c,b,mi,md), where: FKÍKt,
and Kt is primary key of Rt;
·
NDir(Ri) contains elements (Rj,FK),
where Rj is one relation connected in G to Ri by one edge corresponding to a rir
with Nullified modality for
insertions and for every XÍY, there does not exist any nna
Ri:X¹l.
·
RDir(Ri) contains elements of the form (Rj,FK),
where Rj is one relation connected in G to Ri by one arc corresponding to one of the
followings: (1) one rir Ri[FK]<<Rj[Kj]:(r,b,mi,md); (2) one rir Ri[FK]<<Rj[Kj]
:(n,b,mi,md), where there are at least one XÍFK,
with one nna of the form Ri:X¹l.
·
R(Ri) contains elements of the form (Rj,FK), where one relation
Rt of C(Ri) or CDir(Ri) is connected in
G to Rj by one edge corresponding to: (1) one rir Rt[FK]
<<Rj[Kj]:(r,b,mi,md) and FKÍKt, where Kt is the
primary key of Rt; (2) one rir Rt[FK]<<Rj[Kj]:(n,b,mi,md), where FKÍKt,
where Kt is the primary key of Rt and there exists XÍFK, with at least one nna Rt:X¹l.
Note that in delete or right update operations, the relation that
enforces the rir by Nullified does
not suffer any modification in its attributes since it propagates the operation's
effect to the left relation. By the contrary, insertions or left updates by Nullified, sets null their own involved
attributes, voiding any reference to another relation. This is why the
definition of the sets RDir, CDir and NDir
is needed. It may be said that the relational scheme R is safe in insert operations
iff for every relation Ri
of R:
I1) there is not any relation Rj of R belonging to C(Ri) or CDir(Ri)
and R(Ri)
or RDir(Ri)
at the same time;
I2) there is not any pair of elements (Rj,Y) and (Rk,Y’)
belonging to CDir(Ri) and NDir(Ri) respectively,
where: (i) Rj=Rk or (ii) Y and Y’ overlaps;
I3) there is not any relation Rj
of R belonging to C(Ri)
and NDir(Ri) at the same time;
I4) there is not exist any pair of
elements (Rj,Y) and (Rk,Y’) belonging to NDir(Ri)
respectively, where Y and Y’ strictly overlaps;
I5) there is not any pair of elements (Rj,Y) and (Rk,Y’)
belonging to RDir(Ri) and NDir(Ri) respectively,
and Y and Y’ overlaps.
Example: The relational scheme of the Example 1-1 of section 3.1 does not
satisfy I1, since the relation R3 is involved in the sets R(R1)
and C(R1),
then R1 is a possible
source of unpredictable results during inserts.
Safe
conditions may be used to place the affected relations during insert operations:
i) if I1 is not satisfied, the
affected relation is Rj which is involved in C(Ri) or CDir(Ri))
and R(Ri)
or RDir(Ri)
at the same time; ii) if I3 is not
satisfied, the affected relation is Rj which is involved in C(Ri)
and NDir(Ri)
at the same time; iii) if I2, I4 or I5 are not satisfied, the place where different results may appear
is Ri.
Proposition: For every relation Ri
of R, for every database state r associated with R, and for every
insertion I involving one or more
tuples of the relation ri of r associated with the relation Ri, I maps r into an unique state of the
database iff R satisfies the previous
conditions. In [Rivero99] the proof of that proposition is depicted.
4.2. Delete Operations
For delete operations, sets that help to detect conflictive nodes are:
·
C(Ri) contains the element (Ri, -), and elements (Rj, FK), where Rj
is a relation connected to Ri
in G for an oriented path
formed by edges corresponding to rir’s with Cascades mode for deletions, such that the first edge is labeled
FK:(a,c,mi,md).
·
N(Ri) contains elements (Rj,
FK), where Rj is a relation connected to Rm of C(Ri)
in G by an edge corresponding to a rir
Rj[FK]<<Rm[Km]:(a, n, mi, md), and for each X Í FK, X is allowed to have null
values.
·
R(Ri) contains elements (Rj,
FK), where Rj is a relation connected to Rm of C(Ri)
in G by an edge
corresponding to: (1) a rir with Restricted mode and labeled with the foreign key FK; (2) a rir
Rj[FK]<<Rm[Km]:(a, n, mi, md), such that there exists XÍFK, and may not take null values.
By involving those operations that may be rejected because they try to
nullify attributes associated to a nna to the set R(Ri), the
sets and
defined by Markowitz (1994), are no longer required [Rivero98].
The relational schema R is
sure iff for each Ri
in R:
D1) there is not any relation Rj
of R involved in both C(Ri)
and R(Ri);
this condition avoids the deletion of tuples that block the operation when
another path is followed;
D2) there is not any pair of elements (Rj, Y) and (Rj,
Y’) involved in C(Ri) and N(Ri) respectively, such
that Y and Y’ overlap. It avoids the
updating or deletion according to the path of rir’s that is enforced in
the first place;
D3) there is not any pair of elements (Rj, Y) and (Rj,
Y’) belonging to R(Ri) and N(Ri) respectively, such
that Y and Y’ overlap. This condition prevents that the tuples affected by a
delete operation were modified or stay unaltered blocking the operation,
according to the order in which the restrictions are verified;
D4) there is not any pair of elements (Rj, Y) and (Rj,
Y’) belonging to the set N(Ri), such that Y and Y’
strictly overlap. In such a way different results when updates with Nullifies strategy are performed, are
avoided.
Other combinations of sets either are symmetric to those previously
exposed or do not produce anomalies. The relational schema of Example 1 is
unpredictable since R2 is
involved in R(Ri) and C(Ri). R3 is a possible source of
unpredictable results when a delete operation is performed since it not
satisfies D1. The relational schema
of Example 2 does not satisfies D1 because
R1 is involved in both R(R2)
and C(R2); in such a way, R2 is a source of potential anomalies.
Safety conditions permit to establish that the source of anomalies is
the relation Ri and the places where anomalies occur is: the Rj
that is involved in C(Ri) and R(Ri), when D1 is not satisfied; the Rj
that is involved in C(Ri) and N(Ri) with the elements
(Rj,Y) and (Rj,Y’) respectively in such a way that Y and
Y’ overlap, when D2 is not
satisfied; the Rj that is involved in R(Ri) and N(Ri)
with the elements (Rj,Y) and (Rj,Y’) respectively in such
a way that Y and Y’ overlap, if D3 is
not satisfied; the Rj that is involved in N(Ri) with the
elements (Rj,Y) and (Rj,Y’) respectively in such a way
that Y and Y’ strictly overlap if D4
is not satisfied. In Markowitz (1994) the proof of necessity and sufficiency of
that conditions, is sketched.
4.3. Update
Operations
In the same way as previous operations, sets of
relations are built in order to find sources of potential anomalies.
4.3.1. Right Updates
In this case the following sets must
be built:
·
C(Ri) has the element (Ri,
-), and elements (Rj, FK),
such that there exists an element (Rk, Sk) belonging to C(Ri),
where Rj is a relation of R
linked to Rk in G by an
edge corresponding to a rir Rj[FK]<<Rk[Kk]:(a,b,mi,c) with a Cascades
option for updates, and: (a) Sk = Æ or
(b) Sk Ç Kk ¹ Æ.
·
N(Ri) contains elements (Rj,FK), such that there
exists an element (Rk,Sk) belonging to C(Ri),
where Rj is linked to Rk in G by an edge corresponding to a rir Rj[FK]<<Rk[Kk]:(a,b,mi,n) such that for each XÍY, there does not exist any rnn Rj:X¹l and: (a) Sk = Æ or (b) Sk Ç Kk ¹ Æ.
·
R(Ri) is formed by elements (Rj, FK), such that there
exists an element (Rk,Sk) belonging to C(Ri),
where Rj is a R's
relation connected to Rk in G
by an edge corresponding to: (1) a rir Rj[FK] << Rk[Kk]:
(a, b,mi, b) such that: (a) Sk = Æ or
(b) Sk Ç Kk ¹ Æ; (2) a rir Rj[FK] << Rk[Kk]: (a,b,mi,n) such that there exists X Í FK, associated to a rnn Rj:X¹l and: (a) Sk = Æ; or (b) Sk Ç Kk ¹ Æ.
It may be stated that a relational schema R is safe under right updates iff for each relation Ri
of R:
Ur1) there is not any relation Rj of R, such that there exists a pair of elements (Rj,Y) and
(Rj,Y’) belonging to C(Ri) and R(Ri)
respectively and Y and Y' overlap;
Ur2) there is not any relation Rj of R, such that there exists a pair of elements (Rj,Y) and
(Rj,Y’) belonging to C(Ri) and N(Ri)
respectively and Y overlaps Y’;
Ur3) there is not any relation Rj of R, such that there exists a pair of elements (Rj,Y) and
(Rj,Y’) belonging to R(Ri) and N(Ri)
respectively and Y overlaps Y’;
Ur4) there is not any relation Rj of R, such that there exists a pair of elements (Rj,Y) and
(Rj,Y’) belonging to the set N(Ri), and Y and Y’
strictly overlap.
Using those four conditions the sources of anomalies when updates to the
right side relation are performed, may be determined. For all cases, the source
is the relation Ri. If Ur1,
is not satisfied the irregularity will occur in a Rj contained in
both C(Ri)
and R(Ri)
with the elements (Rj,Y) and (Rj,Y’) respectively, where
Y and Y’ overlap. Analogously, when Ur2
is not satisfied the anomalies will be placed in a Rj involved
in both C(Ri) and N(Ri) with the elements
(Rj,Y) and (Rj,Y’) respectively, where Y overlaps Y’. The
same situation occurs when Ur3
is not attained. If Ur4,
is violated the place of anomalies will be Rj contained in N(Ri)
with the elements (Rj,Y) and (Rj,Y’) respectively where Y
and Y’ strictly overlap.
4.3.2. Left Updates
Insecure cases for referential
integrity when left updates are performed are the same as those studied for
insertions, if their modalities agree. The composition of the set changes
because the first edge must be considered with the update option but the
following ones must be seen as the ones corresponding to insertions. Safety
conditions are the same.
4.3.3. Both-Side Updates
In this case, problematic cases are the same as those
studied under left and right updates, then their analysis may be summarized
according to what was indicated for those operations.
5. Generation OF RULES
For each one of the relations that appeared as
potential sources of anomalies during the static analysis of the schema, rules
were built. They will permit to determine if anomalies are present in a given
database state. Such rules are expressed as serial combinations using the
logical operations not, and and the composition operator o. A serial combination from a specific
relation, is an expression representing a directed path in G, and links nodes with an incidence degree (delete and right
updates) or divergence degree (insertion and left updates) equal to 1.
The partial divergence (incidence) degree (pdd and pid
respectively) of a vertex is defined as the number of significant edges that
leave (reach) it. A significant edge is
defined according to the operation: i)
for deletions and left updates, every edge is a significant one; ii) for
insertions a significant edge is one representing a rir Ri[W]<<Rj[Kj]:(a, b,mi, md), with WÍKi; iii) for right updates a
significant edge is one representing a rir Ri[W]<<Rj[Kj]:(a,b,mi, md), with WÇKi ¹Æ.
In the restriction graph, seven types of nodes will be
distinguished according to their partial incidence or divergence degrees: 1) source node (pid=0); 2) sink node (pdd=0); 3) unifier node (pid³2 and pdd=1); branch node (pdd³2 and pid=1); passing node (pid=1 and pdd=1); multiple node (pid³2 and pdd³2); isolated
node (pid=0 and pdd=0).
5.1. Insertion Rules
In order to build the rules corresponding to potentially anomalous insert
operations, the following serial combinations must be considered:
·
C+ (Ij) º representing a G's directed path from a non-sink node
(Ri ) to a non-source node (Rjn), where the edges of the
path represent the following rir’s: Ri[FKi]<<Rj1[Kj1]:
(c,b,mi,md); Rj1[FK1]<<Rj2[Kj2]:(c,b,mi,md); ...; Rjn‑1[FKn‑1]<<
Rjn[Kjn]:(c,b,mi,md); with FK1ÍKj1; FK2 Í Kj2; ..;FKn-1 Í Kjn-1 and the first edge in the path is Ij.
·
N (Ij) º representing a directed path in G, composed by an only edge, that
corresponds to the rir Ij:Ri[FK] <<Rm[Km]:(n,b,mi,md), such that for each XÍFK, X is allowed to be null.
·
R (Ij) º representing a directed path in G, composed by a unique edge that
corresponds to: (1) the rir Ij:Ri[FK] <<Rm[Km]:(r,b,mi,md); or (2) the rir Ij:Ri[FK]<<Rm[Km]:(n,b, mi,md) such that XÍFK exists and X is restricted by a rnn.
·
Cr (Ij) º representing a directed
path in G leaving from a non-sink
node (Ri) and reaching a non-source node (Rj), where the
edges correspond to the following rir’s: (1) Ri[FKi]<<Rj1[Kj1]:(c,b,mi,md);Rj1[FK1]<<Rj2[Kj2]:(c,b,mi,md);....;Rjn‑1[FKn‑1]<<Rjn[Kjn]:(c,b,mi,md);Rjn[FKn]<<Rj[Kjj]:(r,b,mi,md); with FK1ÍKj1; FK2 Í Kj2;...;
FKn Í Kjn and the first edge
corresponds to Ij or (2) Ri[FKi]<<Rj1[Kj1]:(c,b,mi,md); Rj1[FK1]<<Rj2[Kj2]:(c,b,mi,md);...; Rjn-1[FKn-1]<<Rjn[Kjn]
:(c,b,mi,md); Rjn[FKn]<<Rj[Kj]:(n,b,mi,md); with FK1ÍKj1; FK2 Í Kj2;...;
FKn Í Kjn, where the first edge is associated to Ij; X Í FKn exists and X is not allowed to
have null values.
Algorithm: For an insertion operation
over a table that is a source of anomalies, a set of trees will be built in
order to support the rules generation, applying the following algorithm:
1.
Set the source of anomalies table
as the root of the tree. Set its straight descendents in the graph as
their children in the tree (the number
of children of the root node will be equal to the partial divergence degree of
the node in the graph).
2.
In each one of the branches of the tree (each of the internal nodes have
only one child), combine serially all sequences of nodes with a partial
divergence degree equal to 1, until a node with an ancestor reaching it with an
option not equal to Cascades or a
node with a partial divergence degree not equal to 1 is reached.
For each
tree, a rule is built. Each one of the branches of the trees will be a serial
combination since each internal node has a unique child. They will be assembled
by means of the and operator. If the
branch represents a serial combination Cr o R it will be preceded by the not
operator. If a branch ends in a node that is the root of another tree, the
serial combination of that branch (C+) is composed (o) with the rule corresponding to that
relation. Each one of the paths is considered in such a way that the treatment
of the same anomaly twice or more times is avoided. If a relation has no
associated rule, it is because it never produces anomalies when insert
operations are performed in it.
5.2. Delete Rules
The analysis of the different paths
in the restriction graph is analogous to that exposed for insertions, but in
this case the graph is scanned in the reverse direction. Besides, the algorithm
is quite similar to the one already depicted in the previous section.
Example: The rule for the source of anomalies
R3, according to the graph of Figure 1, is Rule R3:
C+(I6) and (not R(I4)). Figure 3 shows
their construction. For the evaluation of a rule the knowledge of the database
state, is essential. The mechanisms that perform the evaluation should be
refined in order to obtain an acceptable level of efficiency; on the contrary
the proposed strategy will not be applicable.

Figure 3: Rule for R3:
C+(I6) and (not R(I4))
5.3. Update Rules
Regarding
update operations, different situations may occur: i) in case of right updates,
the rules and the algorithm are similar to the one for deletions, taking into
account the overlapping of the attributes; ii) in case of left updates, the
algorithm is quite similar to the one for insertions, but in the first step,
the update strategy must be considered. From that point, this operation becomes
an insertion. For that reason, if in the generated tree there is a leaf reached
by the root, with a left update strategy 'Cascades',
then the serial combination of that branch (C+) will be composed (o) with the insertion rule of the
relation represented by that node (Clair, 1998); iii) in case of both sides
updates, rules and algorithms for one-side updates may be combined: Rule Ri: Ruleleft (Ri) and Ruleright (Ri)
6. GENERAL
PROCEDURE
In order to analyze the schema, the algorithm of Figure 4 must be
followed:
|
(Variables G:Graph; Vi: Vertex; Problem:
Boolean; N, C, R, NDir, Cdir, RDir: ListSure; Rule: String ) Init_Graph(G) Call Built_Graph(G) Foreach Vertex (Vi) in G Do Call FillSetDelete(G, Vi, N, C, R) Problem = False
Call AnalyzeSetDelete(N, C, R, Problem) If Problem Then
Call BuildRuleDelete(G, Vi, Rule) Call InsertRule (Vi, “d”, Rule) End If Call FillSetUpdateRight(G, Vi, N, C, R) Problem = False Call AnalyzeSetUpdateRight(N, C, R, Problem) If Problem Then
Call BuildRuleUpdateRight(G, Vi, Rule) Call InsertRule(Vi, “ur”, Rule) End If |
Call FillSetInsert(G, Vi, NDir, CDir, RDir, C, R) Problem = False Call
AnalyzeSetInsert(NDir, CDir, RDir, C, R, Problem) If Problem Then
Call BuildRuleInsert(G, Vi, Rule) Call InsertRule(Vi, “i”, Rule) End If Call FillSetUpdateLeft(G, Vi, NDir, CDir, RDir, C, R) Problem = False Call AnalyzeSetUpdateLeft (NDir, CDir, RDir, C, R, Problem) If Problem Then
Call BuildRuleUpdateLeft(G, Vi, Rule) Call InsertRule(Vi, “ul”, Rule) End If End Foreach End Main |
Figure 4: General Procedure
7. CONCLUSIONS
AND FUTURE WORK.
The
effects of basic operations over relations in a conceptual schema with
referential integrity constraints and null restrictions were studied. A minor
simplification of the static analysis of deletions, developed by Markowitz
(1994), was made. The analysis was extended to all update operations, in
despite of the fact that insertions and updates over the left-hand side
relation are generally performed with a Restricted
modality.
In
order to determine if a conceptual schema is sure with respect to all basic
operations, algorithms related with each one of them are presented, extending
by this way current research. A software tool was designed and implemented
(DepuSem), for the analysis of the rir and nna’s graph.
Important
points of symmetry have been detected: i) problematic nodes for insertions and left updates are
the same when the options are the same (obviously, the generated rules will be
the same); ii) problematic nodes related to right updates are also problematic
with respect to delete operations whenever their options are the same; iii) on
the contrary, unsure nodes for delete operations are not inevitably unsure with
respect to right updates, even if they have the same options. This is true
because the propagation of right update operations requires the overlapping of
primary and foreign keys.
The design of strategies
for the efficient evaluation of the rules must be faced since the proposed
monitoring process may become inapplicable if it slows down when the number of
tables and relationships increases.
8. REFERENCES
[1] Casanova, M. et al.. Optimization of Relational schemes containing inclusion dependencies. Proceedings of 15 VLDB Conference.
Amsterdam, pp.315-325. (1989)
[2]
Casanova, M.; Tucherman, L.: Enforcing
Inclusion Dependencies and Referential Integrity”. Proceedings 14th. VLDB Conference. L.A. Calif. (1988).
[3] Clair, D.; Loureiro, D. y
Vizcaino, M., Depurador Semántico de una Red de Restricciones. Graduate Thesis. Facultad Ciencias
Exactas. Universidad Nacional del Centro. BA.
Argentina. (1998)
[4]
Markowitz, V., Safe Referential
Integrity and Null Constraint Structures in Relational Databases. Personal
communication. (1994)
[5] Markowitz, V.: Referential Integrity Revisited: An Object-Oriented
Perspective. Proccedings of the 16th VLDB
Conference. Brisbane. Australia. (1990)
[6] Markowitz, V.: Safe Referential Integrity Structures in Relational
Databases. Proceedings 17th. VLDB
Conference. Barcelona, España (1991).
[7] Rivero, L, Doorn, J and
Loureiro, D., Semantic Integrity Constraint Network Validation. Research report RR 1999-001 ISISTAN. Facultad Ciencias Exactas.
Universidad Nacional del Centro. BA. Argentina. (1999)
[8] Rivero, L. and Doorn, J., Semantic Integrity in a Relational Database:
Properties and Rules. In Proc. XI
International Conference of Systems Engineering, ICSE-96, Las Vegas,
170-175 (1996)
[9]
Rivero,
L. et al., Un Analizador de una Red de Restricciones Semánticas. In Actas de
Intercon'98. IEEE Perú. Tacna. Perú. (1998).