A memory-layout oriented run-time technique for locality optimization
Y. Yan, X. Zhang and Z. Zhang
Proceedings of the 1998 International Conference on Parallel Processing
(ICPP'98), August 1998, pp. 189-196.
Abstract
--------
Exploiting locality at run-time complements compiler-based
optimization for applications with dynamic memory access
patterns. This paper proposes a memory-layout
oriented approach to exploit cache locality for parallel
loops at run-time on Symmetric Multi-Processor (SMP) systems.
Guided by application-dependent hints and the targeted cache
architecture, it reorganizes and partitions a parallel loop
by shrinking and partitioning the memory access space of
the loop at run-time. In the generated task partitions,
data sharing among partitions is minimized and data reuse within
a partition is maximized. The tasks in the partitions are
scheduled in an adaptive, locality-preserving way that trades
off load balance against locality to achieve balanced execution
and minimize application execution time.
Based on simulation and measurement, we show that our run-time approach
achieves performance comparable to compiler optimizations
for two applications whose load balance and cache locality can be
well optimized by tiling and other program transformations.
Our experimental results also show that our approach
significantly improves memory performance for applications
with dynamic memory access patterns. Programs of this type are usually
hard for compilers to optimize.
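
The general idea of partitioning a loop by its memory access space and
then scheduling the partitions adaptively can be illustrated with a small
sketch. This is not the authors' algorithm. It assumes a hypothetical hint
giving one address per loop iteration and a hypothetical block size
standing in for the cache parameters, and all function names are invented
for illustration.

```python
from collections import defaultdict, deque


def partition_by_memory_block(accesses, block_size):
    """Group loop iterations by the memory block they touch.

    accesses[i] is the hinted address touched by iteration i
    (a hypothetical hint format, assumed for this sketch).
    """
    groups = defaultdict(list)
    for iteration, address in enumerate(accesses):
        groups[address // block_size].append(iteration)
    # Order partitions by block number so neighbouring blocks stay adjacent.
    return [groups[block] for block in sorted(groups)]


def assign_to_processors(partitions, num_procs):
    """Give each processor a contiguous run of partitions."""
    queues = [deque() for _ in range(num_procs)]
    for index, part in enumerate(partitions):
        queues[index * num_procs // len(partitions)].append(part)
    return queues


def steal(queues, idle_proc):
    """Move one whole partition from the most loaded queue to an idle one.

    Whole partitions are moved so that the iterations sharing a memory
    block still run on the same processor.
    """
    victim = max(range(len(queues)), key=lambda p: len(queues[p]))
    if victim != idle_proc and queues[victim]:
        queues[idle_proc].append(queues[victim].pop())
        return True
    return False
```

In this sketch, iterations that fall in the same block end up in one
partition, which stands in for data reuse within a partition, and
rebalancing moves whole partitions only, which stands in for trading load
balance against locality.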