Copyright ACM, 2000

The GOLD Definition Language (GDL): An Object Oriented Formal Specification Language For Multidimensional Databases

 

 

Juan Trujillo, Manuel Palomar, Jaime Gómez

Research Group of Logic Programming and Information Systems

Dept. of Languages and Information Systems. University of Alicante

Apto. Correos 99. E-03071. 03690, Alicante. Spain

e-mail: {jtrujillo, mpalomar, jgomez}@dlsi.ua.es

ABSTRACT

The GOLD Definition Language (GDL) is an Object Oriented (OO) formal specification language for the modeling of multidimensional databases. The OO multidimensional data model called GOLD is based on the OO paradigm, which allows us to consider key issues in multidimensional modeling that are hardly considered by other multidimensional models such as derived measures, derived dimension attributes, the additivity of measures and multiple classification hierarchies. In this paper, we define the GDL for the GOLD model and demonstrate that its power of expression enables us to consider all the peculiarities of multidimensional modeling.

Keywords

Data warehouses, multidimensional databases, OLAP, conceptual modeling, Object-Orientation

1. INTRODUCTION

Multidimensional databases (MDB), in conjunction with data warehouses (DW) and OLAP analysis, are becoming more and more important. OLTP systems impose different requirements than OLAP systems, and different data models and implementation methods are therefore required for each type of system. The Entity-Relationship (E/R) model is commonly used to represent an OLTP application at the conceptual level. This model, however, cannot adequately represent multidimensional (MD) data [16]. For this reason, a popular alternative for conceptualizing the data in a DW is the multidimensional view of data (cube or hypercube) [17].

Nevertheless, there is no formal multidimensional data model that is commonly accepted for OLAP applications. Many multidimensional data models have been proposed recently, such as [1], [2], [4], [9], [11], [14] and [15] (a comprehensive review can be found in [3]). Most of them, however, focus either on an OLAP query language, such as [2], [11] and [15], or on a presumption about a subsequent implementation, such as [11] or [15]. Furthermore, they all neglect certain key issues in multidimensional modeling, such as derived measures, derived dimension attributes and the additivity of measures.

The GOLD model ([5], [7]), however, has demonstrated that the application of the OO paradigm allows us to consider key issues in multidimensional modeling that are hardly considered by other models, such as derived measures, derived dimension attributes, the additivity of measures and multiple classification hierarchies.

In this paper, we propose an OO definition language (GDL) for the GOLD model. This language is an extension of the OASIS formal specification language (an OO formal specification language developed by the Research Group of Logic Programming and Software Engineering at the Technical University of Valencia [13], [12]) that allows us to specify the peculiarities associated with MDB.

This paper is organized as follows. In Section 2 we introduce the classical multidimensional model by means of an example that is used throughout the paper. Section 3 summarizes the GOLD model and provides its main definitions. Section 4 defines the GOLD Definition Language (GDL) and demonstrates its power of expression by applying it to the example in Section 2. Finally, Section 5 draws some conclusions and sketches further research issues.

2. THE CLASSICAL MULTIDIMENSIONAL MODEL THROUGH AN EXAMPLE

In this model, a fact is an item of interest for an enterprise and is described through a set of attributes called measures or fact attributes, which are contained in the cells or points of the data cube. This set of measures is based on a set of dimensions that determine the granularity adopted for representing facts (i.e., the context in which facts are to be analyzed). Dimensions are in turn characterized by attributes called dimension attributes. Another relevant feature of the model is the classification hierarchy defined on attributes along dimensions, which determines how fact instances may be meaningfully aggregated and selected for the decision-making process. Furthermore, multiple classification hierarchies and alternative path hierarchies are also relevant in MD modeling. A final relevant feature is the concept of additivity of measures along dimensions. A measure is additive along a dimension if the sum operator can be used to aggregate attribute values along all hierarchies defined in that dimension.

Let us consider from now on a very simple example in which the fact is the sale of products in a large store chain, and the measures are number of clients, product_price, quantity_sold and total_price (total_price is a derived measure obtained by the formula total_price = product_price * quantity_sold). The dimensions considered for analyzing the data are product, store and date. The measure number of clients (estimated by counting the number of purchase receipts for a given product) is not additive along the product dimension, because a receipt that lists several products would be counted once per product, which leads to semantic errors.
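The additivity problem in this example can be made concrete with a small sketch. The receipts below are invented for illustration (they are not data from the paper), and Python is used only as a convenient notation. Summing the per-product client counts over the product dimension counts a receipt once for every product it lists, so the total overstates the number of clients, whereas the derived measure total_price can be summed safely.

```python
# Hypothetical purchase receipts: receipt id -> list of (product, product_price, quantity_sold).
receipts = {
    "r1": [("milk", 2, 3), ("bread", 1, 2)],
    "r2": [("milk", 2, 1)],
    "r3": [("bread", 1, 4), ("eggs", 3, 1)],
}

total_price = {}          # derived measure, accumulated per product
clients_per_product = {}  # number of clients, estimated as receipts per product

for receipt_id, lines in receipts.items():
    for product, product_price, quantity_sold in lines:
        # Derivation rule from the example: total_price = product_price * quantity_sold.
        total_price[product] = total_price.get(product, 0) + product_price * quantity_sold
        # Each receipt that lists the product counts as one client for that product.
        clients_per_product[product] = clients_per_product.get(product, 0) + 1

# Aggregating the client count along the product dimension with SUM ...
summed_over_products = sum(clients_per_product.values())
# ... does not match the real number of receipts, because r1 and r3 are counted twice.
distinct_receipts = len(receipts)

print(summed_over_products, distinct_receipts)
```

With these receipts the summed count is 5 although only 3 receipts exist, which is the kind of semantic error the text refers to.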

On the other hand, and according to S. Chaudhuri and U. Dayal (1997) in [17], the set of operations generally applied to the MD view of data are as follows: roll-up (increasing the level of aggregation) and drill-down (decreasing the level of aggregation) along one or more dimension hierarchies, slice-dice (selection and projection) and pivoting (re-orienting the MD view of data).
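As an informal illustration of two of these operations, the sketch below applies a roll-up and a slice to a tiny cube held in a Python dictionary. The cell values, the city-to-province mapping and the function names are assumptions made for this example only; they are not part of the GOLD model or of [17].

```python
from collections import defaultdict

# Hypothetical cube cells: (product, city, month) -> quantity_sold.
cells = {
    ("milk", "Alicante", "2000-01"): 10,
    ("milk", "Elche", "2000-01"): 4,
    ("milk", "Valencia", "2000-01"): 7,
    ("bread", "Alicante", "2000-02"): 5,
}

# Assumed classification hierarchy on the store dimension: city -> province.
city_to_province = {"Alicante": "Alicante", "Elche": "Alicante", "Valencia": "Valencia"}


def roll_up_city_to_province(cube):
    """Roll-up: increase the level of aggregation from city to province using SUM."""
    result = defaultdict(int)
    for (product, city, month), quantity in cube.items():
        result[(product, city_to_province[city], month)] += quantity
    return dict(result)


def slice_product(cube, product):
    """Slice: select only the cells that belong to one product."""
    return {key: quantity for key, quantity in cube.items() if key[0] == product}


by_province = roll_up_city_to_province(cells)
milk_only = slice_product(cells, "milk")
```

Drill-down is the inverse navigation: it returns from by_province to the finer-grained cells, which is only possible because the detailed data are still available.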

3. THE GOLD MODEL

In this section we summarize the basic concepts of the GOLD model ([5], [6], [7] and [8]) to help the reader understand the concepts and definitions handled in the following section. In the GOLD model, fact classes (FC) are specified as composite classes in an aggregation relation where dimension classes (DC) are the components. Cube classes (CC), to which OLAP operations are applied to allow us to accomplish a subsequent data analysis, are then defined from these DC and FC, based on user requirements.

Unlike other models presented so far, the GOLD model considers derived measures and derived dimension attributes by defining complex predicates that use both arithmetic operations and relational grouping functions. With respect to the additivity of fact attributes (only considered in [10]), the GOLD model provides Aggregation Patterns (AP) to represent it. A fact attribute is additive if the SUM operator can be applied along all dimensions, semi-additive if SUM cannot be applied along one or more dimensions, and non-additive if SUM can be applied along no dimension.
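The three additivity categories can be sketched as a small classification function. This is an illustrative reading of the definitions above, not code from the GOLD model; the function name and the sets passed to it are assumptions, with the dimension names taken from the example in Section 2.

```python
def classify_additivity(additive_along, all_dimensions):
    """Classify a fact attribute by the set of dimensions along which SUM is valid."""
    additive_along = set(additive_along)
    all_dimensions = set(all_dimensions)
    if additive_along == all_dimensions:
        return "additive"        # SUM applies along every dimension
    if additive_along:
        return "semi-additive"   # SUM applies along some dimensions only
    return "non-additive"        # SUM applies along no dimension


dimensions = {"product", "store", "date"}

# quantity_sold is assumed to be summable along every dimension.
quantity_sold_kind = classify_additivity({"product", "store", "date"}, dimensions)
# number of clients is not additive along product (Section 2).
n_of_clients_kind = classify_additivity({"store", "date"}, dimensions)
```

Under these assumptions quantity_sold is classified as additive and the number of clients as semi-additive.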

Finally, relationships between dimension attributes are represented by a directed, acyclic and weakly connected graph in which each edge represents a -to-one relationship between attributes. Figure 1 shows the directed acyclic graphs of the dimensions defined in the example of Section 2. We can therefore distinguish between Attribute Roll-up Relation Paths (ARRP) and Attribute Classification Paths (ACP), depending on whether or not a classification hierarchy is defined on the attributes along the path.

In addition, users query the basic database schema through cube classes. These cube classes encapsulate not only data but also the operations allowed on the objects of the class. Thus, OLAP operations (roll-up, drill-down, slice-dice and pivoting, plus the extended combine and divide [8]) are considered public methods of cube classes.

 

Figure 1. Directed acyclic graphs of dimensions

 

4. THE GOLD DEFINITION LANGUAGE (GDL)

In the GDL, the specification of a class is a description of the structure and behavior of the objects belonging to that class. Each class specification begins with the word class and consists of a number of sections or paragraphs, according to the formal specification of the GOLD model. The GDL is based on OASIS ([13], [12]), an OO formal specification language for open information systems developed by the Research Group of Logic Programming and Software Engineering at the Technical University of Valencia. Starting from the OASIS class template, we define new sections to represent the GOLD model, which lets us express all the peculiarities of multidimensional modeling in a natural way. In this sense, GOLD class templates may be seen as multidimensional extensions of OASIS class templates, and the GDL as an extension of OASIS. Before introducing the GDL, we therefore briefly introduce the OASIS formal specification language.

4.1. Introduction To The OASIS Formal Specification Language

An OASIS object can be viewed as a cell with a state and a set of services. The set of services is the object's interface among objects, which allows other objects to access the state. Object evolution is characterized in terms of changes of states. Events that represent atomic changes of state can be grouped into transactions. In this sense, an object can be defined as an observable object in terms of its changes of state. For this purpose, a class definition is enriched with the specification of the process attached to the class in terms of events and transactions.

On the other hand, objects are grouped into classes, which represent a collection of objects sharing the same template. The template must allow for the declaration of an identification mechanism, the signature of the class including attributes and methods, and finally, a set of formulae of different kinds to cover the rest of class properties, such as dynamic constraints, triggers and preconditions. The sections that will be used in the GDL are as follows:

Finally, OASIS deals with complexity by introducing aggregation and inheritance operators. A more detailed description of the OASIS formal language can be found in [13], and the complete description of its semantics and formal foundations, in [12].

4.2. Representing The GOLD Model In The GOLD Definition Language (GDL)

Having presented the foundations of the OASIS model as the starting point of the GDL, in Table 1 we map the structures and concepts defined in the GOLD model into the sections of the GDL class templates.

As can be observed in Table 1 and in Figure 2, we define a single class template for both dimension and fact classes. This allows us to consider all structures defined in the GOLD model, although some sections are used in just one type of class (Table 1).

 

The GOLD model                   | The GOLD Definition Language (GDL)
---------------------------------+---------------------------------------------
Dimension class                  | Class template
Key attribute                    | Identification
Dimension attributes             | Constant_attributes
Derived dimension attributes     | Derived_Attributes
Derivation predicates            | Derivation
Directed acyclic graphs (ag)     |
ARRP (Roll-up paths)             | Attribute_Roll-up_Relation_Paths
ACP                              | Attribute_Classification_Paths
Events                           | Events
---------------------------------+---------------------------------------------
Fact class                       | Complex class as an aggregation of n classes
Key attribute                    | Identification
Fact attributes                  | Constant_attributes
Derived fact attributes          | Derived_Attributes
Derivation predicates            | Derivation
Aggregation Patterns             | Static constraints
Events                           | Events

Table 1. Representing the GOLD model in the GOLD Definition Language (GDL)

 

Three sections of the GDL class template deserve special mention: Attribute_Roll-up_Relation_paths (ARRP), Attribute_Classification_Paths (ACP) and static constraints. The two path sections define the two kinds of paths within the graph, i.e. ARRP and ACP (see Section 3). They are treated as process algebra expressions describing the operations that may be applied from one attribute to another to accomplish a subsequent data analysis.

 

Class class name

Identification

Identification correspondencies;

Constant_attributes

Attribute name: attribute type;

Derived_attributes

Attribute name: attribute type

Attribute_Roll-up_Relation_paths

Attribute name = A list of [Roll-up | Drill-down] attribute name

Attribute_Classification_Paths

Attribute name = A list of [Combine | Divide] attribute name

private_events declaration of variables

event name (list of variables) new;

event name (list of variables) delete;

constraints declaration of variables

static

[constant_attribute | derived_attribute] [Σ | f | c ] a list of class_names | ‘ALL’

derivation declaration of variables

derivation formula

end class

Figure 2. The class template for the GOLD Definition Language (GDL)

 

Like other authors, such as R. Kimball in [16], we believe that both rolling-up and drilling-down are also possible even though there is not an explicit classification hierarchy defined on attributes. For this reason, we define in [8] two more OLAP operations, combine and divide, to increase and decrease the level of aggregation, respectively, when there is not any classification hierarchy defined on attributes.

We therefore define four reserved words to represent the relationships between dimension attributes: Roll-up and Drill-down are used along ARRP paths, whereas Combine and Divide are used along ACP paths. Note that Drill-down and Divide are necessary to express that leaf nodes can only decrease the level of aggregation.

A last consideration must be made with regard to the Aggregation Pattern (AP) structure defined in the GOLD model. Aggregation Patterns can be seen as static constraints on fact attributes, in the sense that they describe which fact attributes can be aggregated along which dimensions. We have therefore redefined the former section of static constraints of OASIS to allow us to express these AP.

4.2. Dimension Classes

To validate the power of expression provided by the GDL, the application of the previous GDL class template (Figure 2) to the product dimension class (example in section 2) can be seen in Figure 3. It can be observed that the two sections based on the process algebra are powerful enough to represent the graph that defines relationships between dimension attributes. Furthermore, complex classification hierarchies such as the alternative path hierarchy in the product dimension are also represented, as we can aggregate data from the type attribute to either the marketing group one or the family one by means of the application of the Roll-up operation (see section Attribute_Roll-up_Relation_Paths in Figure 3). Moreover, our ARRP and ACP enable us to consider multiple classification hierarchies, such as the one in the store dimension (see example in Section 2).

As the aim of this paper is to demonstrate the power of expression of the GDL, and for the sake of simplicity, we will merely show how the GDL deals with the multiple classification hierarchy in the store dimension in Figure 4, in which it can be observed that data can be aggregated to the community attribute, by means of the Roll-up operation, either through the sale area attribute or through the path City, Province and Community.

Class product

Identification

By_cod_product : (KA)

Constant_attributes

KA, Colour, Group, Department, Manager, Mark_group, Type, Supplier, Family, Brand: string;

Weight: float; Quantity: nat;

Derived_attributes

transport_cost : float;

Attribute_Roll-up_Relation_paths

KA= Roll-up brand | Roll-up type

Brand = Drill-down KA

type= Roll-up Mark_group | Roll-up family | Drill-down KA

Mark_group = Drill-down type

family= Roll-up group | Drill-down type

group = Roll-up department | Drill-down family

department = Drill-down group

Attribute_Classification_Paths

KA= Combine colour | Combine weight | Combine quantity | Combine supplier | Combine transport_cost

colour= Divide KA

weight= Divide KA

quantity = Divide KA

supplier = Divide KA

transport_cost= Divide KA

Department = Combine manager

Manager = Divide department

private_events

new_product new; del_product delete;

derivation

transport_cost = quantity * weight

end_class

Figure 3. The class template for product dimension class
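The roll-up section of Figure 3 can be read as a directed graph, and two properties stated in Section 3 can then be checked mechanically. The following is a minimal sketch, assuming the edges are transcribed by hand from the figure with attribute names lower-cased; the helper names edges and is_acyclic are hypothetical and not part of the GDL.

```python
# Roll-up edges transcribed from Attribute_Roll-up_Relation_paths in Figure 3.
roll_up = {
    "ka": ["brand", "type"],
    "type": ["mark_group", "family"],
    "family": ["group"],
    "group": ["department"],
}

# Drill-down edges transcribed from the same section.
drill_down = {
    "brand": ["ka"],
    "type": ["ka"],
    "mark_group": ["type"],
    "family": ["type"],
    "group": ["family"],
    "department": ["group"],
}


def edges(relation):
    """Flatten an adjacency dictionary into a set of (source, target) pairs."""
    return {(src, dst) for src, targets in relation.items() for dst in targets}


def is_acyclic(relation):
    """Depth-first search that reports whether the directed graph has no cycle."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False  # reached a node that is still on the current path
        visiting.add(node)
        ok = all(visit(nxt) for nxt in relation.get(node, []))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in relation)


# Every Roll-up edge should have a matching Drill-down edge in the opposite direction.
inverse_consistent = edges(drill_down) == {(dst, src) for src, dst in edges(roll_up)}
acyclic = is_acyclic(roll_up)
```

For the product dimension both checks succeed, and the alternative path hierarchy shows up as the two roll-up targets of type (mark_group and family).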

 

Class store

.....more sections.....

Attribute_Roll-up_Relation_paths

KA= Roll-up city | Roll-up sale area

sale area = Drill-down KA | Roll-up community

city = Roll-up province | Drill-down KA

province = Roll-up community | Drill-down city

community = Drill-down province | Drill-down sale area

Attribute_Classification_Paths

KA= Combine phone | Combine address

phone= Divide KA

address= Divide KA

.....more sections.....

end_class

Figure 4. The class template for store dimension class
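The multiple classification hierarchy of Figure 4 can be made explicit by enumerating every roll-up path from the key attribute to community. The sketch below is illustrative only: the edges are transcribed by hand from the figure, and the function roll_up_paths is a hypothetical helper rather than part of the GDL.

```python
# Roll-up edges transcribed from Attribute_Roll-up_Relation_paths in Figure 4.
roll_up = {
    "KA": ["city", "sale area"],
    "sale area": ["community"],
    "city": ["province"],
    "province": ["community"],
}


def roll_up_paths(graph, start, goal):
    """Return every roll-up path from start to goal (the graph is assumed acyclic)."""
    if start == goal:
        return [[start]]
    paths = []
    for nxt in graph.get(start, []):
        for tail in roll_up_paths(graph, nxt, goal):
            paths.append([start] + tail)
    return paths


paths_to_community = roll_up_paths(roll_up, "KA", "community")
for path in paths_to_community:
    print(" -> ".join(path))
```

Two paths are found, one through city and province and one through sale area, which corresponds to the two ways of aggregating data to the community attribute described above.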

 

4.3. Fact Classes

In MD modeling, and therefore in the GOLD model, a fact class (FC) denotes a many-to-many relationship between n dimension classes (DC). The GOLD model, therefore, defines a FC as a composite class in an aggregation relation in which n DC are the components. The concept of aggregation of classes is taken into account by all OO formal specification languages, by providing a specific operator to construct complex classes from basic classes as an aggregation relation. In an aggregation relation, a complex class (composite class) is built from classes called component classes. Different OO formal specification languages, however, consider aggregation in different ways. We use the operator provided by OASIS for aggregation and provide some constraints to allow us to represent the FC of the GOLD model in the GDL.

In transactional systems, and therefore in OO databases, there are several ways to construct a complex class from other classes as an aggregation relation. In MD modeling, however, fact classes have a single way of being constructed. Because of this, we mention only the different concepts and reserved words of the syntax of the aggregation operator in OASIS that are specifically relevant to multidimensional modeling, and therefore, to represent FC in the GDL.

Therefore, the syntax of the operator that the GDL provides for constructing fact classes is as follows:

COMPLEX CLASS class_name AGGREGATION OF {class_name ([RELATIONAL] [STATIC] [NODISJOINT] [FLEXIBLE] [UNIVALUED] [NOT NULL]) }

 

COMPLEX CLASS sales_product AGGREGATION OF product, store, time

Identification

By_cod_sales (KA)

Constant_attributes

KA : string; product_price, qty_sold, n_of_clients: nat;

Derived_attributes

total_price: nat;

private_events

new_product new, del_product delete;

constraints static

product_price Σ ALL, qty_sold Σ ALL, n_of_clients c product, n_of_clients Σ {time, store}, total_price Σ ALL

derivation

total_price= (sales_product.product_price * sales_product.qty_sold)

end_class

Figure 5. The class template for sales_product fact class

 

Figure 5 shows the application of the GDL class template to the construction of the sales_product fact class. Note that some sections used for dimension classes are not employed here, and that the static constraints section is used only in the definition of complex classes. Furthermore, the complex class construction operator takes the above-mentioned reserved words by default.
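The static constraints of Figure 5 can be thought of as a lookup table that tells a query processor whether SUM may be used for a given fact attribute and dimension. The sketch below rests on two assumptions that the figure itself does not spell out: the symbol written before ALL is read as "SUM allowed", and the symbol c is read as "SUM not allowed", a reading suggested by the remark in Section 2 that the number of clients is not additive along product. The names patterns and sum_allowed are hypothetical.

```python
DIMENSIONS = {"product", "store", "time"}

# Aggregation patterns transcribed from Figure 5 as
# (fact attribute, SUM allowed under the assumed reading, dimensions covered).
patterns = [
    ("product_price", True, "ALL"),
    ("qty_sold", True, "ALL"),
    ("n_of_clients", False, {"product"}),
    ("n_of_clients", True, {"time", "store"}),
    ("total_price", True, "ALL"),
]


def sum_allowed(attribute, dimension):
    """Look up whether SUM may aggregate the attribute along the dimension."""
    for name, allowed, dims in patterns:
        covered = DIMENSIONS if dims == "ALL" else dims
        if name == attribute and dimension in covered:
            return allowed
    # No pattern mentions this attribute and dimension pair.
    raise KeyError((attribute, dimension))


print(sum_allowed("n_of_clients", "product"))  # False under the assumed reading
print(sum_allowed("n_of_clients", "store"))    # True under the assumed reading
```

A roll-up method of a cube class could consult such a table before aggregating, and refuse or switch operator when the pattern forbids SUM.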

5. CONCLUSIONS

It has been demonstrated that the GOLD model considers all key issues in MD modeling, some of which are scarcely considered by other models, such as derived measures, derived dimension attributes, the additivity of measures and multiple classification hierarchies.

In this paper, we have presented the GOLD Definition Language (GDL) as an extension of the OASIS formal specification language to represent all concepts of the GOLD model. Based on the OASIS class template, we define new sections that enable us to model MD databases. We have demonstrated that the expressiveness of the GDL enables us to consider all the peculiarities of MD conceptual modeling. The advantage of both the GOLD model and the GDL is the improvement of the expressiveness of other proposals presented up to now, since our OO approach takes advantage of existing formal specification languages.

Further studies will concentrate on providing a graphical notation for conceptual modeling as well as the automatic generation of GDL templates from this graphical notation.

6. REFERENCES

 

[1] A. Datta and H. Thomas, "A Conceptual Model and an Algebra for On-Line Analytical Processing in Data Warehouse". Workshop on Information Technologies and Systems, Atlanta, 1997.

[2] C. Li and X. Wang, "A Data Model for Supporting On-Line Analytical Processing". Proc. of the Intl. Conf. on Information and Knowledge Management (CIKM’96), Rockville (Maryland), USA, Nov. 1996, pp. 81-88.

[3] C. Sapia, M. Blaschka, G. Höfling, and B. Dinter, "An Overview of Multidimensional Data Models for OLAP". Technical Report, http://www.forwiss.tu-muenchen.de/~system42/.

[4] J. Gray, A. Bosworth, A. Layman and H. Pirahesh, "Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab and Sub Totals". Data Mining and Knowledge Discovery Journal, vol. 1, no. 1, 1997.

[5] J. Trujillo and M. Palomar, "An Object Oriented Approach to Multidimensional Database Conceptual Modeling (OOMD)". In Proc. of the ACM 1st International Workshop on Data Warehousing and OLAP (DOLAP’98), Washington D.C., USA, Nov. 1998, pp. 16-21.

[6] J. Trujillo and M. Palomar, "An Object Oriented Approach to Multidimensional Databases & OLAP operations". Journal of Computer Science and Information Management, 1999. (To appear).

[7] J. Trujillo, "The GOLD model: An Object Oriented multidimensional data model for multidimensional databases". 9th ECOOP Workshop for PhD Students in Object Oriented Systems (PhDOOS '99), Lisbon, Portugal, June 1999.

[8] J. Trujillo, M. Palomar and J. Gómez, "Detecting patterns and OLAP operations in the GOLD model". In Proc. of the ACM 2nd International Workshop on Data Warehousing and OLAP (DOLAP’99), Kansas City, Missouri, USA, Nov. 1999.

[9] L. Cabibbo and R. Torlone, "A Logical Approach to Multidimensional Databases". Lecture Notes in Computer Science, number 1377, in Proc. of the 6th Intl. Conf. on Extending Database Technology (EDBT’98), Valencia, Spain, March 1998, pp. 183-197.

[10] M. Golfarelli and S. Rizzi, "A Methodological Framework for Data Warehouse Design". In Proc. of the ACM 1st International Workshop on Data Warehousing and OLAP (DOLAP’98), Washington D.C., USA, Nov. 1998, pp. 3-9.

[11] M. Gyssens and L. Lakshmanan, "A Foundation for Multi-Dimensional Databases". Proc. of the 23rd Intl. Conf. on Very Large Data Bases (VLDB’97), Athens, Greece, August 1997, pp. 106-115.

[12] O. Pastor and I. Ramos, "OASIS 2.1.1: A Class-Definition Language to Model Information Systems Using an Object-Oriented Approach". Servicio de Publicaciones, Universidad Politécnica de Valencia, 3rd edition, 1995.

[13] O. Pastor, F. Hayes, and S. Bear, "OASIS: An Object Oriented Specification Language". In P. Loucopoulos, editor, Advanced Information Systems Engineering, volume 593 of Lecture Notes in Computer Science, pages 348-363. Springer-Verlag, 1992.

[14] P. Vassiliadis, "Modeling Multidimensional Databases, cube and cube operations". In Proc. of the 10th Intl. Conf. on Statistical and Scientific Databases (SSDBM’98), Capri, 1998.

[15] R. Agrawal, A. Gupta and S. Sarawagi, "Modeling Multidimensional Databases". Proc. of the 13th Intl. Conf. on Data Engineering (ICDE’97), Birmingham, U.K., April 1997, pp. 232-243.

[16] R. Kimball, "The Data Warehouse Toolkit". John Wiley, 1996.

[17] S. Chaudhuri and U. Dayal, "An Overview of Data Warehousing and OLAP Technology". ACM SIGMOD Record, vol. 26, no. 1, March 1997.

[18] W. Lehner, "Modelling Large Scale OLAP Scenarios". Lecture Notes in Computer Science, number 1377, in Proc. of the 6th Intl. Conf. on Extending Database Technology (EDBT’98), Valencia, Spain, March 1998, pp. 153-167.

 

Copyright 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC 2000 March 19-21 Como, Italy
(c) 2000 ACM 1-58113-239-5/00/003...$5.00