Static Detection of Sources of Dynamic Anomalies in a Network of
Referential Integrity Restrictions
Laura C.
Rivero ISISTAN
- UNCentro LINTI - U.N.La Plata B.A.
Argentina Phone: +54 2293 440363 lrivero@exa.unicen.edu.ar |
Jorge H. Doorn
ISISTAN -
UNCentro, B.A.
Argentina Phone: +54 2293 440363 jdoorn@exa.unicen.edu.ar |
Daniel Loureiro
UNCentro, B.A. Argentina Phone: +54 2293 440363 |
ABSTRACT
Under
certain circumstances, basic operations over tables in a relational database,
where integrity restrictions such as referential and null restrictions have
been specified, may produce unpredictable results, not detectable by means of a
static analysis of the schema. When the design includes redundancies or when
the set of restrictions is contradictory it is easy to detect and prevent
future errors, but there are situations that require a dynamic analysis. In
this paper, the properties of networks of referencial integrity restrictions
that contain irregularities are analyzed, and the anomalies that may appear
when data actualization in such environment is done are studied in order to
define criteria and develop an algorithm to generate rules for proper handling
of inconsistencies.
1. INTRODUCTION
Semantic data control ensures the maintenance of database consistency,
by rejecting update transactions that lead to inconsistent states or by
activating specific actions on the database state to compensate the effect of
the previous transaction. Unpredictable results may be obtained in a relational
particular database state, when basic operations are applied. This
inconvenience is produced by redundancies in the schema design, by
contradictory referential integrity restrictions or simply because the designer
established complex restrictions having tuple dependent semantic. From a
procedural point of view, it means that the result may be unpredictable because
it is affected by the order in which individual restrictions are applied or by
the order in which the constraints are enforced.
Those problems are well-known and there are research
reports in relation with some aspects of them (see e.g. [Markowitz90],
[Markowitz91], [Casanova88], [Rivero96]) but the formalization of conditions
and the implementation of mechanisms for the automatic prevention of those
anomalies is a recent investigation field. In [Markowitz94] the uncertainties
produced by specific data manipulation are described, especially in delete
operations. In [Casanova89] special cases of updates propagation are described.
This research paper may be divided in three well-defined parts. The
first is developed in sections 3 and 4 and is devoted to refine Markowitz study
extending it to all operations. The second is dedicated to the specification of
algorithms able to detect potential sources of anomalies in a static way (Section
5) [Rivero98]. The last part (Section 6.) presents an algorithm that
automatically generates rules that allows the integrity verification.
2. RELATIONAL CONCEPTS
A relational schema is R º<R, D>, with R={R_{1}, R_{2}, ..., R_{m}} and D={FD, ID, NR} set of relations, FD and ID set of functional dependencies (fd) and inclusion
dependencies (id) respectively and NR
set of null restrictions (null constraint and null not allowed). A database
state for R is denoted by r ={r_{1}, r_{2},
..., r_{m}}; sch(R_{i}) represents
the set of attributes of R_{i}, K_{i} stands for a candidate key over Ri; and FK represents a foreign
key for R_{i}. A database state r associated with R is
consistent if it satisfies all restrictions in D. Two attribute sets overlap iff
they share attributes (YÇZ¹Æ), and strictly overlap iff they overlap but they are
not equal ((YÇZ¹Æ) Ù (Y¹Z)).
A functional dependency (fd) over a set of attributes U is
one expression of the form X®Y, where X,YÍU. If R_{i}eR, fd’s over R_{i}
will be indicated by R_{i}:X ® Y.
A null constraint (nna) may be expressed by R_{i}:L_{i }¹l. It is satisfied by R_{i} iff
for every tuple t of R_{i} the subtuple t.Li has only not null values.
There is at least one nna in R_{i}, that is R_{i}:
K_{i} ¹l.
One inclusion dependency (id) in R is an expression R_{i}[X]<<R_{j}[Z]:(a,b,m_{i},m_{d}), where R_{i}
and R_{j} are relation names (possibly the same); X,ZÍsch(R_{j}) are compatible attributes; a, b, m_{i} and m_{d} are the strategies to perform inserts, deletes and updates over the
left and right side respectively, and the strategies may be c (Cascades), r (Restricted) or n (Nullifies).
All combinations are studied in this article, however inserts and updates over
the left side are generally done using a y m_{i }specified with modality ‘r’. If Z is the
primary key of the relation R_{j}, then it is a key-based-id and X
constitutes a foreign key for R_{i}; this sort of id’s are named
referencial integrity restrictions (rir’s).
The referential integrity directed graph G=(V,H), associated with R. may be defined with V=R and
H = {(R_{i},R_{j},L:(a,b,m_{i},m_{d})) /R_{i}[L]<<R_{j}[K_{j}]:[a,b,m_{i},m_{d}] ÎID}. H
is composed by elements (R_{i},R_{j},
L: (a, b, m_{i}, m_{d})), where the edge goes from R_{i}
to R_{j}
and L:(a,b,m_{i}, m_{d}) is the label of the edge.
3. CONFLICTIVE MANIPULATIONS
In order to characterize the problem that may arise when some updates
are performed over data constrained by rir's the examples in Figures 1 and
2 will be used.
Example
1:
·
Relations (keys are underlined) (R_{1}) Employee ( NBR-E,
NBR-S, NBR-M, NBR-D, NBR-P ) (R_{2})
Manager ( NBR-M, NBR-P ) (R_{3})
Project ( NBR-P) (R_{4})
Department ( NBR-D, NBR-P ) ·
Null restrictions (N_{1})
Employee: NBR-E ¹ l (N_{2})
Manager: (NBR-M, NBR-P) ¹ l |
(N_{3})
Project: NBR-P ¹ l (N_{4})
Department: (NBR-D, NBR-P) ¹ l ·
Referential Integrity Restrictions (I_{1})
Manager[NBR-M]<<Employee[NBR-E]:(c,c,c,c) (I_{2})
Employee[NBR-S]<<Employee[NBR-E]:(n,r,n,r) (I_{3}) Employee[NBR-D,
NBR-P]<<Manager[NBR-M, NBR-P]:(c,c,c,n) (I_{4})
Manager[NBR-P]<<Project[NBR-P]:(r,r,c,c) (I_{5})
Employee[NBR-P, NBR-D]<<Department[NBR-P, NBR-D]:(c,c,c,c) (I_{6}) Department[NBR-P]<<Project[NBR-P]:(c,c,r,c) |
Figure
1: Restriction Graph, State and Restrictions for Example 1 (adapted from
[Markowitz94])
Example 2:
·
Relations (keys are underlined) (R_{1})
Films ( FILM#, PROD#, DIR#, Subject ) (R_{2})
Staff ( PERS#, NAME ) ·
Nulls not allowed Restrictions (N_{1})
Films: FILM# ¹ l |
(N_{2}) Staff: PERS# ¹ l ·
Referential Integrity Restrictions (I_{1})
Films[PROD#]<<Staff[PERS#]:(a_{1},b_{1},m_{i}_{1},m_{d}_{1}) (I_{2})
Films[DIR#]<<Staff[PERS#]:(a_{2},b_{2},m_{i}_{2},m_{d}_{2}) |
Figure 2: Restrictions Graph, State and Restrictions for Example 2.
The data manipulations
considered are insertions, deletions or updates of one or several tuples; a manipulation
involves only one kind of operation, it refers to an unique relation and
entails a set of tuples that do not change during the execution of the
manipulation. The specified constraints will be verified after each single
tuple operation. The manipulation will be successful if all of its single tuple
operations are carried out, otherwise it fails (is revoked). The above two
examples present more than just one anomalous behavior. In the following
sections, simplified subschemes of them are used to illustrate different
problems.
3.1. Problems in Insert
Operations
The effect of insertions is examined in this section.
Example 1-1: Consider the rir's and the database state
depicted in Figure 1, excluding in this case I_{1 }and_{ }I_{2}.
The operation I inserts the tuple (5 4 4 d x) in the relation r_{1}. I may cause the insertion (via I_{5} and
I_{6}) of the tuple (d) in
the relation r_{3}, or it may block the insert
operation (via I_{3 }and I_{4}). The result of I depends
on the order in which the rir’s that involve R_{1} are
enforced: (i) if I_{5 }is first considered, the tuple (d x) is inserted in r_{4}, this triggers the enforcing of I_{6}, which provokes the
insertion of the tuple (d) in r_{3},
then I_{3 }will cause the insertion of the tuple (4 d) in r_{4};_{ }or (ii) if I_{3} is taken
into account in the first place, the tuple (4 d) is inserted in r_{2}, and this triggers the enforcing of
I_{4} which blocks the insert operation since the tuple (d) does not exist in the relation r_{3}.
Example 2-1: Consider the example in Figure 2 and the corresponding state of the
database r = {r_{1},r_{2}}. If a_{1}: r and a_{2}: c and the operation I inserts the tuple (5 6 6 v) in the relation r_{1}, it may
trigger the insertion (via I_{2}) of the tuple (6 l) in r_{2}, or the insert operation may
be blocked (via I_{1}). The result of I depends on the order in which the rir’s that involve a R_{1
}are applied: (i) if I_{2 }is enforced in first place then the
tuple (5 6 6 v) is inserted in r_{1}
and the tuple (6 l) in r_{2}; or (ii) if I_{1 }is first taken into
account, I is blocked since there is
not a value 6 for PERSON-ID in R_{2}. If a_{1}: n
and a_{2}: c and it is required that PROD# and DIR#
foreign keys associated with R_{1} can not hold null values, the set null
modality for the insert operation becomes restricted, showing a behavior
similar to the previous.
3.2. Problems in Delete
Operations
A deletion D may trigger the
typical actions defined by the strategy, triggering other operations or, by the
contrary, blocking the manipulation. The outcome of D, may be unpredictable when the enforcement of the rir's
promoted by D implies the trigger of
updates or deletions of tuples, that in turn can blockade D if another path of G
is first considered.
Example 1-2: Consider the Example 1,
excluding in this case I_{1 }and_{ }I_{2}. Suppose that
D
involves the tuple (a) of
relation r_{3}. The tuple (4 a)
of r_{2} can block D via I_{4},
while D can trigger the deletion of
that tuple via I_{6}, I_{5}, and I_{1}. The result of D depends on the order of the
enforcement of the rir’s that involve R_{3}: (i) if I_{6 }is
verified in the first time, then (a x)
is deleted from r_{4}, then I_{5} is enforced, triggering the
deletion of tuples (1 4 x a) and (4 l x a) of r_{1}; finally I_{1}
promotes the suppression of tuple (4 a)
of r_{2}; on the other side (ii) if I_{3 }is first enforced, D is blocked by tuple (4 a) of r_{2}.
Example 2-2: Consider Example 2; with b_{1}: r, b_{2}: c; and r={r_{1}’, r_{2}}.
If D involves the tuple (2 yy) of r_{2},
the tuple (3 2 2 s) of r_{1}’ may block D via I_{1}, or may trigger the deletion of this tuple via
I_{2}. The outcome of the operation depends on the order of rir’s
involving R_{2} enforcement:
(i) If I_{2 }is taken into account in first place (3 2 2 s) and (2 1 2
q) are deleted from r_{1}’; or (ii) if I_{1 }is first
considered, D is blocked by the tuple
(3 2 2 s) of r_{1}’.
The
undesirable effects illustrated above increase when there is overlapping of the
foreign key attributes [Clair98].
3.3. Problems in Updates
The study of all possible updates may be divided in three cases: i) left update: the foreign key FK_{i}
of R_{i} is updated, and FK_{i}ÇK_{i}=Æ; ii) right update: the
primary key K_{j} of R_{j} is updated, and FK_{j}ÇK_{j}=Æ; iii) both sides update: the primary key K_{j} of R_{j}
is updated, and FK_{j}ÇK_{j}¹Æ.
3.3.1. Right Updates
Only the update of values belonging to primary keys in the relations R_{j}
will be taken into account. Let Ru be
the update of one or more tuples of one relation. Ru may promote actions like those seen for deletions: i) the update
of tuples referencing tuples involved in Ru
via rir’s
with Cascades modality; or ii) the
update of foreign key values in tuples referencing tuples involved in Ru via rir’s with Nullified modality. On the other hand,
if one tuple t references one tuple involved in Ru via one rir with a Restricted modality, t blocks the execution of Ru.
Example 1-3: Consider the Example 1, excluding I_{1} and I_{2}.
Suppose that Ru changes the tuple (a) by (d) in r_{3}. The result of this update depends on the order
in which the rir’s involving R_{3} are enforced: (i) if I_{6} is enforced in first place,
the tuple (a x) in r_{4 }is
modified to (d x), this leads to the
enforcement of I_{5}, which results in a failed attempt to modify the
value of the attribute PROJ# in (1 4 4 a
x) and (4 l l a x) of r_{1} to (d), where
the failure is due to the conflict of the new value by I_{3}; (ii) if I_{4}
is enforced in first place, the tuple (4
a) in r_{2 }is modified to (4
d), leading to the enforcement of I_{3} which in turns assigns null
values to the attributes DIR# and PROJ# in the tuple (1 4 4 a x) of r_{1}; then as a result
of enforcing I_{6}, (a x) in r_{4} is modified to (d x), and enforcing I_{5} would
provoke to modify the value of the attribute PROJ# in (4 - - a x) of the relation r_{1} to (d), and (1 4 - - x) in r_{1} does not hold any reference to r_{4}.
3.3.2. Left Updates
Only updates of values belonging to foreign keys will be taken into
account. Let Lu be the update of one
or more tuples of Ri. Lu may promote
actions such as: i) the insertion of the tuples referenced by the tuples
involved in Lu via rir’s
with Cascades modality; or ii) the
update of foreign key values in tuples involved in Lu referencing non existing tuples in a relation linked via one rir
with Nullified modality. On the
opposite if one tuple t references a non existing tuple in a relation
referenced via one rir with a Restricted
modality, t blocks the execution of Lu.
Problematic cases in left updates are the same that those for insertions.
3.3.3. Both-Side Updates
Only the update of values belonging to both, primary keys and foreign
keys in a relation belonging to two rir’s as the right side and the left
side respectively, will be taken into account. The problematic cases for
both-side update operations are the same that those detected for left and right
updates.
4. STATIC DETECTION OF
ANOMALIES
In section 3, different anomalies in the
manipulation of data, has been examined.
In
this section the mechanisms needed to detect the potential presence of
anomalies will be detailed. This will be done adhering to the [Markowitz94] and
[Casanova89] approaches. Safeness conditions must ensure:
1 - Data manipulations must produce only one result regardless the order
in which the rir’s are enforced and the order in which the tuples are
accessed.
2 - A data manipulation, must map a consistent database state r to another consistent database state r’, this is in concordance with the immediate mode verification used in
this article.
The present study is driven by the immediate mode in what refers to
integrity verification. On the other hand, deferred mode is an advantageous and
even a mandatory well-known strategy for integrity maintenance. However, the
full understanding of the immediate mode is required for the analysis of deferred
mode that is under development in this project.
The relations in whom the anomalies may appear during the execution of
operations over the database can be determined through safeness conditions. To
accomplish that, the following sets of relations will be defined: C(R_{i}),
R(R_{i}), N(R_{i}), CDir(R_{i}), RDir(R_{i}),
and NDir(R_{i}). These sets are formed by elements (R_{j}, FK) including R_{i},
where R_{j} Î R, and FK is one foreign key associated with R_{j}.
4.1. Insert Operations
The sets needed to detect sources of inconsistencies are:
·
CDir(R_{i}) contains elements (R_{j},FK),
where R_{j} is a relation connected in G to R_{i} by one edge corresponding to one rir
with Cascades modality for
insertions.
·
C(R_{i}) contains elements (R_{j},FK), where one relation
R_{t} of C(R_{i}) or CDir(R_{i}) is connected in G to R_{j} by an edge
corresponding to a rir with a Cascades modality
for insertions, of the form R_{t}[FK]<<R_{j}[K_{j}]:(c,b,m_{i},m_{d}), where: FKÍK_{t},
and K_{t }is primary key of R_{t};
·
NDir(R_{i}) contains elements (R_{j},FK),
where R_{j} is one relation connected in G to R_{i} by one edge corresponding to a rir
with Nullified modality for
insertions and for every XÍY, there does not exist any nna
R_{i}:X¹l.
·
RDir(R_{i}) contains elements of the form (R_{j},FK),
where R_{j} is one relation connected in G to R_{i} by one arc corresponding to one of the
followings: (1) one rir R_{i}[FK]<<R_{j}[K_{j}]:(r,b,m_{i},m_{d}); (2) one rir R_{i}[FK]<<R_{j}[K_{j}]
:(n,b,m_{i},m_{d}), where there are at least one XÍFK,
with one nna of the form R_{i}:X¹l.
·
R(R_{i}) contains elements of the form (R_{j},FK), where one relation
R_{t} of C(R_{i}) or CDir(R_{i}) is connected in
G to R_{j} by one edge corresponding to: (1) one rir R_{t}[FK]
<<R_{j}[K_{j}]:(r,b,m_{i},m_{d}) and FKÍK_{t}, where K_{t }is the
primary key of R_{t}; (2) one rir R_{t}[FK]<<R_{j}[K_{j}]:(n,b,m_{i},m_{d}), where FKÍK_{t},
where K_{t }is the primary key of R_{t} and there exists XÍFK, with at least one nna R_{t}:X¹l.
Note that in delete or right update operations, the relation that
enforces the rir by Nullified does
not suffer any modification in its attributes since it propagates the operation's
effect to the left relation. By the contrary, insertions or left updates by Nullified, sets null their own involved
attributes, voiding any reference to another relation. This is why the
definition of the sets RDir, CDir and NDir
is needed. It may be said that the relational scheme R is safe in insert operations
iff for every relation R_{i}
of R:
I1) there is not any relation R_{j} of R belonging to C(R_{i}) or CDir(R_{i})
and R(R_{i})
or RDir(R_{i})
at the same time;
I2) there is not any pair of elements (R_{j},Y) and (R_{k},Y’)
belonging to CDir(R_{i}) and NDir(R_{i}) respectively,
where: (i) R_{j}=R_{k} or (ii) Y and Y’ overlaps;
I3) there is not any relation R_{j}
of R belonging to C(R_{i})
and NDir(R_{i}) at the same time;
I4) there is not exist any pair of
elements (R_{j},Y) and (R_{k},Y’) belonging to NDir(R_{i})
respectively, where Y and Y’ strictly overlaps;
I5) there is not any pair of elements (R_{j},Y) and (R_{k},Y’)
belonging to RDir(R_{i}) and NDir(R_{i}) respectively,
and Y and Y’ overlaps.
Example: The relational scheme of the Example 1-1 of section 3.1 does not
satisfy I1, since the relation R_{3} is involved in the sets R(R_{1})
and C(R_{1}),
then R_{1} is a possible
source of unpredictable results during inserts.
Safe
conditions may be used to place the affected relations during insert operations:
i) if I1 is not satisfied, the
affected relation is R_{j} which is involved in C(R_{i}) or CDir(R_{i}))
and R(R_{i})
or RDir(R_{i})
at the same time; ii) if I3 is not
satisfied, the affected relation is R_{j} which is involved in C(R_{i})
and NDir(R_{i})
at the same time; iii) if I2, I4 or I5 are not satisfied, the place where different results may appear
is R_{i}.
Proposition: For every relation R_{i}
of R, for every database state r associated with R, and for every
insertion I involving one or more
tuples of the relation r_{i} of r associated with the relation R_{i}, I maps r into an unique state of the
database iff R satisfies the previous
conditions. In [Rivero99] the proof of that proposition is depicted.
4.2. Delete Operations
For delete operations, sets that help to detect conflictive nodes are:
·
C(R_{i}) contains the element (R_{i}, -), and elements (R_{j}, FK), where R_{j}
is a relation connected to R_{i}
in G for an oriented path
formed by edges corresponding to rir’s with Cascades mode for deletions, such that the first edge is labeled
FK:(a,c,m_{i},m_{d}).
·
N(R_{i}) contains elements (R_{j},
FK), where R_{j} is a relation connected to R_{m} of C(R_{i})
in G by an edge corresponding to a rir
R_{j}[FK]<<R_{m}[K_{m}]:(a, n, m_{i}, m_{d}), and for each X Í FK, X is allowed to have null
values.
·
R(R_{i}) contains elements (R_{j},
FK), where R_{j} is a relation connected to R_{m} of C(R_{i})
in G by an edge
corresponding to: (1) a rir with Restricted mode and labeled with the foreign key FK; (2) a rir
R_{j}[FK]<<R_{m}[K_{m}]:(a, n, m_{i}, m_{d}), such that there exists XÍFK, and may not take null values.
By involving those operations that may be rejected because they try to
nullify attributes associated to a nna to the set R(R_{i}), the
sets _{}and _{}defined by Markowitz (1994), are no longer required [Rivero98].
The relational schema R is
sure iff for each R_{i}
in R:
D1) there is not any relation R_{j}
of R involved in both C(R_{i})
and R(R_{i});
this condition avoids the deletion of tuples that block the operation when
another path is followed;
D2) there is not any pair of elements (R_{j}, Y) and (R_{j},
Y’) involved in C(R_{i}) and N(R_{i}) respectively, such
that Y and Y’ overlap. It avoids the
updating or deletion according to the path of rir’s that is enforced in
the first place;
D3) there is not any pair of elements (R_{j}, Y) and (R_{j},
Y’) belonging to R(R_{i}) and N(R_{i}) respectively, such
that Y and Y’ overlap. This condition prevents that the tuples affected by a
delete operation were modified or stay unaltered blocking the operation,
according to the order in which the restrictions are verified;
D4) there is not any pair of elements (R_{j}, Y) and (R_{j},
Y’) belonging to the set N(R_{i}), such that Y and Y’
strictly overlap. In such a way different results when updates with Nullifies strategy are performed, are
avoided.
Other combinations of sets either are symmetric to those previously
exposed or do not produce anomalies. The relational schema of Example 1 is
unpredictable since R_{2} is
involved in R(R_{i}) and C(R_{i}). R_{3} is a possible source of
unpredictable results when a delete operation is performed since it not
satisfies D1. The relational schema
of Example 2 does not satisfies D1 because
R_{1} is involved in both R(R_{2})
and C(R_{2}); in such a way, R_{2} is a source of potential anomalies.
Safety conditions permit to establish that the source of anomalies is
the relation R_{i} and the places where anomalies occur is: the R_{j}
that is involved in C(R_{i}) and R(R_{i}), when D1 is not satisfied; the R_{j}
that is involved in C(R_{i}) and N(R_{i}) with the elements
(R_{j},Y) and (R_{j},Y’) respectively in such a way that Y and
Y’ overlap, when D2 is not
satisfied; the R_{j} that is involved in R(R_{i}) and N(R_{i})
with the elements (R_{j},Y) and (R_{j},Y’) respectively in such
a way that Y and Y’ overlap, if D3 is
not satisfied; the R_{j} that is involved in N(R_{i}) with the
elements (R_{j},Y) and (R_{j},Y’) respectively in such a way
that Y and Y’ strictly overlap if D4
is not satisfied. In Markowitz (1994) the proof of necessity and sufficiency of
that conditions, is sketched.
4.3. Update
Operations
In the same way as previous operations, sets of
relations are built in order to find sources of potential anomalies.
4.3.1. Right Updates
In this case the following sets must
be built:
·
C(R_{i}) has the element (R_{i},
-), and elements (R_{j}, FK),
such that there exists an element (R_{k}, S_{k}) belonging to C(R_{i}),
where R_{j} is a relation of R
linked to R_{k} in G by an
edge corresponding to a rir R_{j}[FK]<<R_{k}[K_{k}]:(a,b,m_{i},c) with a Cascades
option for updates, and: (a) S_{k }= Æ or
(b) S_{k }Ç K_{k }¹ Æ.
·
N(R_{i}) contains elements (R_{j},FK), such that there
exists an element (R_{k},S_{k}) belonging to C(R_{i}),
where R_{j} is linked to R_{k} in G by an edge corresponding to a rir R_{j}[FK]<<R_{k}[K_{k}]:(a,b,m_{i},n) such that for each XÍY, there does not exist any rnn R_{j}:X¹l and: (a) S_{k }= Æ or (b) S_{k }Ç K_{k }¹ Æ.
·
R(R_{i}) is formed by elements (R_{j}, FK), such that there
exists an element (R_{k},S_{k}) belonging to C(R_{i}),
where R_{j} is a R's
relation connected to R_{k} in G
by an edge corresponding to: (1) a rir R_{j}[FK] << R_{k}[K_{k}]:
(a, b,m_{i}, b) such that: (a) S_{k }= Æ or
(b) S_{k }Ç K_{k }¹ Æ; (2) a rir R_{j}[FK] << R_{k}[K_{k}]: (a,b,m_{i},n) such that there exists X Í FK, associated to a rnn R_{j}:X¹l and: (a) S_{k }= Æ; or (b) S_{k }Ç K_{k }¹ Æ.
It may be stated that a relational schema R is safe under right updates iff for each relation R_{i}
of R:
U_{r}1) there is not any relation R_{j} of R, such that there exists a pair of elements (R_{j},Y) and
(R_{j},Y’) belonging to C(R_{i}) and R(R_{i})
respectively and Y and Y' overlap;
U_{r}2) there is not any relation R_{j} of R, such that there exists a pair of elements (R_{j},Y) and
(R_{j},Y’) belonging to C(R_{i}) and N(R_{i})
respectively and Y overlaps Y’;
U_{r}3) there is not any relation R_{j} of R, such that there exists a pair of elements (R_{j},Y) and
(R_{j},Y’) belonging to R(R_{i}) and N(R_{i})
respectively and Y overlaps Y’;
U_{r}4) there is not any relation R_{j} of R, such that there exists a pair of elements (R_{j},Y) and
(R_{j},Y’) belonging to the set N(R_{i}), and Y and Y’
strictly overlap.
Using those four conditions the sources of anomalies when updates to the
right side relation are performed, may be determined. For all cases, the source
is the relation R_{i}. If U_{r}1,
is not satisfied the irregularity will occur in a R_{j} contained in
both C(R_{i})
and R(R_{i})
with the elements (R_{j},Y) and (R_{j},Y’) respectively, where
Y and Y’ overlap. Analogously, when U_{r}2
is not satisfied the anomalies will be placed in a R_{j} involved
in both C(R_{i}) and N(R_{i}) with the elements
(R_{j},Y) and (R_{j},Y’) respectively, where Y overlaps Y’. The
same situation occurs when U_{r}3
is not attained. If U_{r}4,
is violated the place of anomalies will be R_{j} contained in N(R_{i})
with the elements (R_{j},Y) and (R_{j},Y’) respectively where Y
and Y’ strictly overlap.
4.3.2. Left Updates
Insecure cases for referential
integrity when left updates are performed are the same as those studied for
insertions, if their modalities agree. The composition of the set changes
because the first edge must be considered with the update option but the
following ones must be seen as the ones corresponding to insertions. Safety
conditions are the same.
4.3.3. Both-Side Updates
In this case, problematic cases are the same as those
studied under left and right updates, then their analysis may be summarized
according to what was indicated for those operations.
5. Generation OF RULES
For each one of the relations that appeared as
potential sources of anomalies during the static analysis of the schema, rules
were built. They will permit to determine if anomalies are present in a given
database state. Such rules are expressed as serial combinations using the
logical operations not, and and the composition operator o. A serial combination from a specific
relation, is an expression representing a directed path in G, and links nodes with an incidence degree (delete and right
updates) or divergence degree (insertion and left updates) equal to 1.
The partial divergence (incidence) degree (pdd and pid
respectively) of a vertex is defined as the number of significant edges that
leave (reach) it. A significant edge is
defined according to the operation: i)
for deletions and left updates, every edge is a significant one; ii) for
insertions a significant edge is one representing a rir R_{i}[W]<<R_{j}[K_{j}]:(a, b,m_{i}, m_{d}), with WÍK_{i}; iii) for right updates a
significant edge is one representing a rir R_{i}[W]<<R_{j}[K_{j}]:(a,b,m_{i}, m_{d}), with WÇK_{i} ¹Æ.
In the restriction graph, seven types of nodes will be
distinguished according to their partial incidence or divergence degrees: 1) source node (pid=0); 2) sink node (pdd=0); 3) unifier node (pid³2 and pdd=1); branch node (pdd³2 and pid=1); passing node (pid=1 and pdd=1); multiple node (pid³2 and pdd³2); isolated
node (pid=0 and pdd=0).
5.1. Insertion Rules
In order to build the rules corresponding to potentially anomalous insert
operations, the following serial combinations must be considered:
·
C^{+ }(I_{j}) º representing a G's directed path from a non-sink node
(R_{i} ) to a non-source node (R_{jn}), where the edges of the
path represent the following rir’s: R_{i}[FK_{i}]<<R_{j1}[K_{j1}]:
(c,b,m_{i},m_{d}); R_{j1}[FK_{1}]<<R_{j2}[K_{j2}]:(c,b,m_{i},m_{d}); ...; R_{jn‑1}[FK_{n‑1}]<<
R_{jn}[K_{jn}]:(c,b,m_{i},m_{d}); with FK_{1}ÍK_{j1;} FK_{2} Í K_{j2; }.._{;}FK_{n-1 }Í K_{jn-1} and the first edge in the path is I_{j}.
·
N (I_{j}) º representing a directed path in G, composed by an only edge, that
corresponds to the rir I_{j}:R_{i}[FK] <<R_{m}[K_{m}]:(n,b,m_{i},m_{d}), such that for each XÍFK, X is allowed to be null.
·
R (I_{j}) º representing a directed path in G, composed by a unique edge that
corresponds to: (1) the rir I_{j}:R_{i}[FK] <<R_{m}[K_{m}]:(r,b,m_{i},m_{d}); or (2) the rir I_{j}:R_{i}[FK]<<R_{m}[K_{m}]:(n,b, m_{i},m_{d}) such that XÍFK exists and X is restricted by a rnn.
·
C_{r }(I_{j}) º representing a directed
path in G leaving from a non-sink
node (R_{i}) and reaching a non-source node (R_{j}), where the
edges correspond to the following rir’s: (1) R_{i}[FK_{i}]<<R_{j1}[K_{j1}]:(c,b,m_{i},m_{d});R_{j1}[FK1]<<R_{j2}[K_{j2}]:(c,b,m_{i},m_{d});....;R_{jn‑1}[FK_{n‑1}]<<R_{jn}[K_{jn}]:(c,b,m_{i},m_{d});R_{jn}[FK_{n}]<<R_{j}[K_{jj}]:(r,b,m_{i},m_{d}); with FK_{1}ÍK_{j1;} FK_{2} Í K_{j2;...;
}FK_{n} Í K_{jn} and the first edge
corresponds to I_{j} or (2) R_{i}[FK_{i}]<<R_{j1}[K_{j1}]:(c,b,m_{i},m_{d}); R_{j1}[FK1]<<R_{j2}[K_{j2}]:(c,b,m_{i},m_{d});...; R_{jn-1}[FK_{n-1}]<<R_{jn}[K_{jn}]
:(c,b,m_{i},m_{d}); R_{jn}[FK_{n}]<<R_{j}[K_{j}]:(n,b,m_{i},m_{d}); with FK_{1}ÍK_{j1}; FK_{2} Í K_{j2};...;_{
}FK_{n} Í K_{jn}, where the first edge is associated to I_{j;} X Í FK_{n} exists and X is not allowed to
have null values.
Algorithm: For an insertion operation
over a table that is a source of anomalies, a set of trees will be built in
order to support the rules generation, applying the following algorithm:
1.
Set the source of anomalies table
as the root of the tree. Set its straight descendents in the graph as
their children in the tree (the number
of children of the root node will be equal to the partial divergence degree of
the node in the graph).
2.
In each one of the branches of the tree (each of the internal nodes have
only one child), combine serially all sequences of nodes with a partial
divergence degree equal to 1, until a node with an ancestor reaching it with an
option not equal to Cascades or a
node with a partial divergence degree not equal to 1 is reached.
For each
tree, a rule is built. Each one of the branches of the trees will be a serial
combination since each internal node has a unique child. They will be assembled
by means of the and operator. If the
branch represents a serial combination C_{r} o R it will be preceded by the not
operator. If a branch ends in a node that is the root of another tree, the
serial combination of that branch (C^{+}) is composed (o) with the rule corresponding to that
relation. Each one of the paths is considered in such a way that the treatment
of the same anomaly twice or more times is avoided. If a relation has no
associated rule, it is because it never produces anomalies when insert
operations are performed in it.
5.2. Delete Rules
The analysis of the different paths
in the restriction graph is analogous to that exposed for insertions, but in
this case the graph is scanned in the reverse direction. Besides, the algorithm
is quite similar to the one already depicted in the previous section.
Example: The rule for the source of anomalies
R_{3},_{ }according to the graph of Figure 1, is Rule R_{3}:
C^{+}(I_{6}) and (not R(I_{4})). Figure 3 shows
their construction. For the evaluation of a rule the knowledge of the database
state, is essential. The mechanisms that perform the evaluation should be
refined in order to obtain an acceptable level of efficiency; on the contrary
the proposed strategy will not be applicable.
Figure 3: Rule for R_{3}:
C^{+}(I_{6}) and (not R(I_{4}))
5.3. Update Rules
Regarding
update operations, different situations may occur: i) in case of right updates,
the rules and the algorithm are similar to the one for deletions, taking into
account the overlapping of the attributes; ii) in case of left updates, the
algorithm is quite similar to the one for insertions, but in the first step,
the update strategy must be considered. From that point, this operation becomes
an insertion. For that reason, if in the generated tree there is a leaf reached
by the root, with a left update strategy 'Cascades',
then the serial combination of that branch (C^{+}) will be composed (o) with the insertion rule of the
relation represented by that node (Clair, 1998); iii) in case of both sides
updates, rules and algorithms for one-side updates may be combined: Rule R_{i}: Rule_{left} (R_{i}) and Rule_{right} (R_{i})
6. GENERAL
PROCEDURE
In order to analyze the schema, the algorithm of Figure 4 must be
followed:
(Variables G:Graph; Vi: Vertex; Problem:
Boolean; N, C, R, NDir, Cdir, RDir: ListSure; Rule: String ) Init_Graph(G) Call Built_Graph(G) Foreach Vertex (Vi) in G Do Call FillSetDelete(G, Vi, N, C, R) Problem = False
Call AnalyzeSetDelete(N, C, R, Problem) If Problem Then
Call BuildRuleDelete(G, Vi, Rule) Call InsertRule (Vi, “d”, Rule) End If Call FillSetUpdateRight(G, Vi, N, C, R) Problem = False Call AnalyzeSetUpdateRight(N, C, R, Problem) If Problem Then
Call BuildRuleUpdateRight(G, Vi, Rule) Call InsertRule(Vi, “ur”, Rule) End If |
Call FillSetInsert(G, Vi, NDir, CDir, RDir, C, R) Problem = False Call
AnalyzeSetInsert(NDir, CDir, RDir, C, R, Problem) If Problem Then
Call BuildRuleInsert(G, Vi, Rule) Call InsertRule(Vi, “i”, Rule) End If Call FillSetUpdateLeft(G, Vi, NDir, CDir, RDir, C, R) Problem = False Call AnalyzeSetUpdateLeft (NDir, CDir, RDir, C, R, Problem) If Problem Then
Call BuildRuleUpdateLeft(G, Vi, Rule) Call InsertRule(Vi, “ul”, Rule) End If End Foreach End Main |
Figure 4: General Procedure
7. CONCLUSIONS
AND FUTURE WORK.
The
effects of basic operations over relations in a conceptual schema with
referential integrity constraints and null restrictions were studied. A minor
simplification of the static analysis of deletions, developed by Markowitz
(1994), was made. The analysis was extended to all update operations, in
despite of the fact that insertions and updates over the left-hand side
relation are generally performed with a Restricted
modality.
In
order to determine if a conceptual schema is sure with respect to all basic
operations, algorithms related with each one of them are presented, extending
by this way current research. A software tool was designed and implemented
(DepuSem), for the analysis of the rir and nna’s graph.
Important
points of symmetry have been detected: i) problematic nodes for insertions and left updates are
the same when the options are the same (obviously, the generated rules will be
the same); ii) problematic nodes related to right updates are also problematic
with respect to delete operations whenever their options are the same; iii) on
the contrary, unsure nodes for delete operations are not inevitably unsure with
respect to right updates, even if they have the same options. This is true
because the propagation of right update operations requires the overlapping of
primary and foreign keys.
The design of strategies
for the efficient evaluation of the rules must be faced since the proposed
monitoring process may become inapplicable if it slows down when the number of
tables and relationships increases.
8. REFERENCES
[1] Casanova, M. et al.. Optimization of Relational schemes containing inclusion dependencies. Proceedings of 15 VLDB Conference.
Amsterdam, pp.315-325. (1989)
[2]
Casanova, M.; Tucherman, L.: Enforcing
Inclusion Dependencies and Referential Integrity”. Proceedings 14th. VLDB Conference. L.A. Calif. (1988).
[3] Clair, D.; Loureiro, D. y
Vizcaino, M., Depurador Semántico de una Red de Restricciones. Graduate Thesis. Facultad Ciencias
Exactas. Universidad Nacional del Centro. BA.
Argentina. (1998)
[4]
Markowitz, V., Safe Referential
Integrity and Null Constraint Structures in Relational Databases. Personal
communication. (1994)
[5] Markowitz, V.: Referential Integrity Revisited: An Object-Oriented
Perspective. Proccedings of the 16th VLDB
Conference. Brisbane. Australia. (1990)
[6] Markowitz, V.: Safe Referential Integrity Structures in Relational
Databases. Proceedings 17th. VLDB
Conference. Barcelona, España (1991).
[7] Rivero, L, Doorn, J and
Loureiro, D., Semantic Integrity Constraint Network Validation. Research report RR 1999-001 ISISTAN. Facultad Ciencias Exactas.
Universidad Nacional del Centro. BA. Argentina. (1999)
[8] Rivero, L. and Doorn, J., Semantic Integrity in a Relational Database:
Properties and Rules. In Proc. XI
International Conference of Systems Engineering, ICSE-96, Las Vegas,
170-175 (1996)
[9]
Rivero,
L. et al., Un Analizador de una Red de Restricciones Semánticas. In Actas de
Intercon'98. IEEE Perú. Tacna. Perú. (1998).