|Application-level Virtual Memory Management in Real-Time Multiprocessor Systems|
|Felicia Ionescu||University "Politehnica" of Bucharest
The paper is dealing with the problem of virtual memory management in multiprocessor systems for real-time visualization of large virtual scenes. A virtual scene (or synthetic environment) is an integrated set of data elements that describe a defined geographical region. Large virtual scenes, including extended geographical regions, can be represented as a number of virtual segments, which usually, require a memory space much greater than physical memory of the image generator. The virtual memory scheme implemented by the operating system based on page swapping on page fault interrupt is not acceptable in real-time image generation and have to be replaced with an application-level memory management scheme. The physical memory is dynamically allocated to different segments of the virtual scene, depending on the relative position of the segment towards the observer position. We do not expect a memory miss interrupt in order to swap virtual segments, and we must prevent a memory miss situation by repeatedly execution of a mapping function, which allows that a virtual segment is present in physical memory when it is needed. The representation and visualization of large synthetic environments was studied and experimented on a Silicon Graphics multiprocessor, Onyx, with four MIPS R10000 processors and Infinite Reality graphic accelerator, under IRIX 6.4 operating system.
|Effects of Communication Characteristics on Task Mapping Quality on a 2-D Mesh with Wormhole Routing|
|Soo-Young Lee ||Auburn University
Given that computational load is well balanced, task mapping is mainly concerned with reducing communication overhead. How much communication time can be reduced by optimizing allocation of tasks on a multiprocessor system would
be dependent on several factors. In this study, dependency of improvement by task mapping on communication pattern is investigated on a mesh with wormhole routing. Communication pattern may be described by a set of parameters
such as the total number of messages, the number of sources, the number of destinations, and distribution
of message size. Through extensive simulation, it has been shown that the communication pattern has a significant effect on reduction in communication overhead that can be achieved by task mapping. Effects of individual communication parameters have been analyzed.
|Task Scheduling using a Block Dependency DAG for Block-Oriented Sparse Cholesky Factorization|
|Sung Je Hong||POSTECH
Block-oriented sparse Cholesky factorization decomposes
a sparse matrix into rectangular sub-blocks;
each block can then be handled as a computational unit in order to
increase data reuse in a hierarchical memory system.
Also, the factorization method increases the degree of concurrency
with the reduction of communication volumes so that
it performs more efficiently on a distributed-memory multiprocessor system
than the customary column-oriented factorization method.
But until now, mapping of blocks to processors has been designed
for load balance with restricted communication patterns.
In this paper, we represent tasks using a block dependency DAG
that shows the execution behavior of
block sparse Cholesky factorization in a distributed-memory system.
Since the characteristics of tasks for the block Cholesky factorization
are different from those of the conventional parallel task model,
we propose a new task scheduling algorithm using
a block dependency DAG.
The proposed algorithm consists of two stages:
early-start clustering and affined cluster mapping.
The early-start clustering stage is used to cluster tasks
with preserving the earliest start time of a task
without limiting parallelism. After task clustering,
the affined cluster mapping stage allocates
clusters to processors considering both communication cost
and load balance.
Experimental results on the Fujitsu parallel system
show that the proposed task scheduling approach
outperforms other processor mapping methods.
|Concurrent Transactional Replicated Servers|
|Ricardo Jimenez-Peris||Universidad Politecnica de Madrid
|Marta Patiņo-Martinez||Universidad Politecnica de Madrid
|Sergio Arevalo||Universidad Rey Juan Carlos
Transactions have been proposed to provide data consistency in the presence of failures and concurrent accesses.
Replication of servers increases the availability of data and service.
When active (or synchronous) replication is used all the requests to a replicated server are processed by all the replicas of the server.
Replicas execute independently, for this reason maintaining the consistency of the replicas state is a hard problem.
The replicas must exhibit a deterministic behavior to keep the consistency, which it is usually achieved with a sequential single-threaded code (that is, by means of non-concurrent replicas).
However, single-threaded code is not always feasible, in particular, servers processing transactions sequentially are not acceptable in a transactional setting.
In this paper a new approach is presented to allow concurrent active replicated servers in a transactional distributed framework.
In our approach replicas of a server belong to a group and clients communicate with the replicas with a totally ordered and atomic multicast.
Determinism of multithreaded replicated servers is achieved through a deterministic scheduling.
|A Lightweight Causal Logging Scheme for Recoverable Distributed Shared Memory|
|Taesoon Park||Sejong University
|Heon Y. Yeom||Seoul National University
This paper presents a new causal logging scheme for lazy release consistent
distributed shared memory systems. For the efficient implementation of causal
logging, data structures and operations supported by the lazy release
consistency memory model are utilized. Also, unlike the previous scheme which
logs the vector clock for each synchronization operation, the proposed scheme
adds the minimum information to recreate the corresponding vector clock, into
the existing write notice structures. As a result, the additional information carried
in each message becomes less than two integers for each synchronization
interval, and hence, fault-tolerance can be achieved with very little overhead.
Moreover, the size of the additional information is independent of the number of
processes in the system, which means that the new scheme can be very effective
for the large size systems. To evaluate the performance of the proposed scheme,
the logging protocols have been implemented on top of the CVM(Coherent Virtual
Machine), and the experimental results show that the proposed scheme achieves
35% - 66% of reduction in the log size.
|Scatter and Gather Operations on an Asynchronous Communication Model|
|Susumu Shibusawa||Ibaraki University
|Hiroyuki Makino||Ibaraki University
|Shigeki Nimiya||Ibaraki University
|Jun-ichi Hatta||Ibaraki University
Collective communication such as scatter, gather, broadcast, and total exchange plays an important role on the performance of parallel and distributed processing. So far, several parallel collective communication algorithms have been studied in synchronous communication models, but it is necessary to design and analyze parallel algorithms using a more realistic communication model. This paper introduces an asynchronous communication model between processors, and evaluates the execution time of operations which include scatter, gather and some computations for data initially located at one processor. The results show that the execution time in sending data in a single direction is less than that of dual directions for asynchronous communication. This paper also introduces the logarithmic correction of the bisection communication tree, and evaluates the communication time of several instances.