SAC 2000: Electronic Proceedings

Parallel and Distributed Computing Track

Track chairs:Pradip Srimani, Colorado State University
Mahmoud Omari, Mu'tah University
The following papers submitted to the Parallel and Distributed Computing Track in SAC 2000 are available on-line:
Application-level Virtual Memory Management in Real-Time Multiprocessor Systems
 
Felicia IonescuUniversity "Politehnica" of Bucharest
Abstract

The paper is dealing with the problem of virtual memory management in multiprocessor systems for real-time visualization of large virtual scenes. A virtual scene (or synthetic environment) is an integrated set of data elements that describe a defined geographical region. Large virtual scenes, including extended geographical regions, can be represented as a number of virtual segments, which usually, require a memory space much greater than physical memory of the image generator. The virtual memory scheme implemented by the operating system based on page swapping on page fault interrupt is not acceptable in real-time image generation and have to be replaced with an application-level memory management scheme. The physical memory is dynamically allocated to different segments of the virtual scene, depending on the relative position of the segment towards the observer position. We do not expect a memory miss interrupt in order to swap virtual segments, and we must prevent a memory miss situation by repeatedly execution of a mapping function, which allows that a virtual segment is present in physical memory when it is needed. The representation and visualization of large synthetic environments was studied and experimented on a Silicon Graphics multiprocessor, Onyx, with four MIPS R10000 processors and Infinite Reality graphic accelerator, under IRIX 6.4 operating system.

Full paper
Effects of Communication Characteristics on Task Mapping Quality on a 2-D Mesh with Wormhole Routing
 
Soo-Young Lee Auburn University
Abstract

Given that computational load is well balanced, task mapping is mainly concerned with reducing communication overhead. How much communication time can be reduced by optimizing allocation of tasks on a multiprocessor system would be dependent on several factors. In this study, dependency of improvement by task mapping on communication pattern is investigated on a mesh with wormhole routing. Communication pattern may be described by a set of parameters such as the total number of messages, the number of sources, the number of destinations, and distribution of message size. Through extensive simulation, it has been shown that the communication pattern has a significant effect on reduction in communication overhead that can be achieved by task mapping. Effects of individual communication parameters have been analyzed.

Full paper
Task Scheduling using a Block Dependency DAG for Block-Oriented Sparse Cholesky Factorization
 
Heejo LeePOSTECH
Jong KimPOSTECH
Sung Je HongPOSTECH
Sunggu LeePOSTECH
Abstract

Block-oriented sparse Cholesky factorization decomposes a sparse matrix into rectangular sub-blocks; each block can then be handled as a computational unit in order to increase data reuse in a hierarchical memory system. Also, the factorization method increases the degree of concurrency with the reduction of communication volumes so that it performs more efficiently on a distributed-memory multiprocessor system than the customary column-oriented factorization method. But until now, mapping of blocks to processors has been designed for load balance with restricted communication patterns. In this paper, we represent tasks using a block dependency DAG that shows the execution behavior of block sparse Cholesky factorization in a distributed-memory system. Since the characteristics of tasks for the block Cholesky factorization are different from those of the conventional parallel task model, we propose a new task scheduling algorithm using a block dependency DAG. The proposed algorithm consists of two stages: early-start clustering and affined cluster mapping. The early-start clustering stage is used to cluster tasks with preserving the earliest start time of a task without limiting parallelism. After task clustering, the affined cluster mapping stage allocates clusters to processors considering both communication cost and load balance. Experimental results on the Fujitsu parallel system show that the proposed task scheduling approach outperforms other processor mapping methods.

Full paper
Concurrent Transactional Replicated Servers
 
Ricardo Jimenez-PerisUniversidad Politecnica de Madrid
Marta Patiņo-MartinezUniversidad Politecnica de Madrid
Sergio ArevaloUniversidad Rey Juan Carlos
Abstract

Transactions have been proposed to provide data consistency in the presence of failures and concurrent accesses. Replication of servers increases the availability of data and service. When active (or synchronous) replication is used all the requests to a replicated server are processed by all the replicas of the server. Replicas execute independently, for this reason maintaining the consistency of the replicas state is a hard problem. The replicas must exhibit a deterministic behavior to keep the consistency, which it is usually achieved with a sequential single-threaded code (that is, by means of non-concurrent replicas). However, single-threaded code is not always feasible, in particular, servers processing transactions sequentially are not acceptable in a transactional setting. In this paper a new approach is presented to allow concurrent active replicated servers in a transactional distributed framework. In our approach replicas of a server belong to a group and clients communicate with the replicas with a totally ordered and atomic multicast. Determinism of multithreaded replicated servers is achieved through a deterministic scheduling.

Full paper
A Lightweight Causal Logging Scheme for Recoverable Distributed Shared Memory
 
Taesoon ParkSejong University
Heon Y. YeomSeoul National University
Abstract

This paper presents a new causal logging scheme for lazy release consistent distributed shared memory systems. For the efficient implementation of causal logging, data structures and operations supported by the lazy release consistency memory model are utilized. Also, unlike the previous scheme which logs the vector clock for each synchronization operation, the proposed scheme adds the minimum information to recreate the corresponding vector clock, into the existing write notice structures. As a result, the additional information carried in each message becomes less than two integers for each synchronization interval, and hence, fault-tolerance can be achieved with very little overhead. Moreover, the size of the additional information is independent of the number of processes in the system, which means that the new scheme can be very effective for the large size systems. To evaluate the performance of the proposed scheme, the logging protocols have been implemented on top of the CVM(Coherent Virtual Machine), and the experimental results show that the proposed scheme achieves 35% - 66% of reduction in the log size.

Full paper
Scatter and Gather Operations on an Asynchronous Communication Model
 
Susumu ShibusawaIbaraki University
Hiroyuki MakinoIbaraki University
Shigeki NimiyaIbaraki University
Jun-ichi HattaIbaraki University
Abstract

Collective communication such as scatter, gather, broadcast, and total exchange plays an important role on the performance of parallel and distributed processing. So far, several parallel collective communication algorithms have been studied in synchronous communication models, but it is necessary to design and analyze parallel algorithms using a more realistic communication model. This paper introduces an asynchronous communication model between processors, and evaluates the execution time of operations which include scatter, gather and some computations for data initially located at one processor. The results show that the execution time in sending data in a single direction is less than that of dual directions for asynchronous communication. This paper also introduces the logarithmic correction of the bisection communication tree, and evaluates the communication time of several instances.

Full paper
Page generated: 17:59 14 Feb 2000 by R.Inder@ed.ac.uk