INTRODUCTION

Fault tolerance is becoming increasingly important for distributed systems as they are more widely used [5]. Several techniques have been proposed to provide fault tolerance in distributed systems, two of the best known are transactions and group communication. Transactions [9] have been proposed to guarantee data consistency in the presence of failures and concurrent accesses; they provide atomicity and order requirements for a set of operations. On the other hand, group communication [1] or multicast has been proposed as a building block for reliable distributed systems, providing atomicity and ordering guarantees for a single operation (message). A transaction is a sequence of operations that fulfills the following properties, known as ACID properties: In the group communication model, a server consists of a set of processes called group. Clients multicast requests to server groups. The multicast can provide different levels of consistency. Reliable multicast guarantees that a request is received either by all the correct members of the group or by none of them. FIFO multicast ensures that messages sent from a sender to a receiver are delivered in the same order that they were sent. Totally ordered multicast guarantees that all the messages multicast to the same group are delivered in the same order in all the members of the group. High availability is another desirable property of fault-tolerant systems and it is achieved by means of replication. Primary-backup and active (also known as synchronous) replication are two approaches for replication. In the former approach a replica is distinguished as primary and it processes all the requests; that replica checkpoints its state from time to time to the backup replicas. If the primary fails a backup replica can take over. The latter approach is more uniform in the sense that all the replicas execute the same set of invocations. That is, no replica is distinguished and no action must be taken in case a replica fails as all the replicas process all the requests and have the same state. Active replication is more appropriate in applications which require uninterrupted services, with minimum overhead during failures since there is no election of a primary or protocol to ensure that backups are consistent. The correctness criterion for replication in transactional environments is called 1-copy serializability [4]. This correctness criterion consists in that the replicated server must behave as there were no replication at all from an external point of view. The main difficulty when active replication is used is how to keep the state of all the replicas synchronized. Totally ordered reliable multicast can be used to guarantee the consistency of a set of replicas and at the same time to reduce the probability of deadlocks [3]. If all the replicas receive and process the same set of requests in the same order, that is, replicas behave as state machines [18] (they are deterministic), the state will be the same in all the replicas. This holds if the servers are not multithreaded. However, single-threaded replicas cannot be always used, for instance, in a transactional setting is not admissible to process transactions sequentially. In this paper we propose a new approach to provide multithreaded (and thus concurrent) replicated servers based on active replication in a transactional framework using group communication. In the proposed approach replicas are multithreaded and their determinism is achieved by means of a deterministic scheduling of the requests. A replicated server is implemented as a replicated group of processes. Clients multicast requests to servers using a totally ordered reliable multicast. This kind of multicast ensures the proper synchronization of external events in all the replicas. This approach allows interleaving the execution of concurrent transactions within a server without losing replica determinism. This scheduling algorithm has been used in TransLib [10], an adaptable object-oriented library to program distributed transactional systems. TransLib can be used as an stand-alone transactional library, but it is also used as run-time support for Transactional Drago [16] a transactional language that extends Ada 95 [2] with group communication and transactions. The paper is structured as follows, next section introduces client/server interaction mechanism used in our approach, multithreaded rendezvous, that it is necessary to understand the context of the scheduling algorithm. Section 3 describes the scheduler structures and the scheduling algorithm. Then we make a comparison with related work in section 4 and finally we present our conclusions.


1999-11-17