Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang and P. Sadayappan
In Proceedings of the 14th International Symposium on High-Performance Computer Architecture, Salt Lake City, UT, February 16-20, 2008
Cache partitioning and sharing is critical to the effective utilization of multicore processors. However, almost all existing studies have been evaluated by simulation that often has several limitations, such as excessive simulation time, absence of OS activities and proneness to simulation inaccuracy. To address these issues, we have taken an efficient software approach to supporting both static and dynamic cache partitioning in OS through memory address mapping. We have comprehensively evaluated several representative cache partitioning schemes with different optimization objectives, including performance, fairness, and quality of service (QoS). Our software approach makes it possible to run the SPEC CPU2006 benchmark suite to completion. Besides confirming important conclusions from previous work, we are able to gain several insights from whole-program executions, which are infeasible from simulation. For example, giving up some cache space in one program to help another one may improve the performance of both programs for certain workloads due to reduced contention for memory bandwidth. Our evaluation of previously proposed fairness metrics is also significantly different from a simulation-based study.
The contributions of this study are threefold. (1) To the best of our knowledge, this is a highly comprehensive execution- and measurement-based study on multicore cache partitioning. This paper not only confirms important conclusions from simulation-based studies, but also provides new insights into dynamic behaviors and interaction effects. (2) Our approach provides a unique and efficient option for evaluating multicore cache partitioning. The implemented software layer can be used as a tool in multicore performance evaluation and hardware design. (3) The proposed schemes can be further refined for OS kernels to improve performance.
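As an illustration of how cache partitioning "through memory address mapping" can work in principle, the sketch below assumes a page-coloring scheme: physical pages that map to the same group of cache sets share a "color", and an OS can confine a program to part of a shared cache by giving it only pages of certain colors. The abstract does not spell out the mechanism, so the technique, the function names, and the cache parameters (4 MiB, 16-way, 4 KiB pages) are all assumptions chosen for illustration, not values taken from the paper.

```python
def num_page_colors(cache_bytes, associativity, page_bytes):
    """Number of distinct page colors in a physically indexed cache.

    One cache way spans cache_bytes / associativity bytes; each page-sized
    slice of that span is one color.
    """
    way_bytes = cache_bytes // associativity
    return way_bytes // page_bytes


def page_color(phys_addr, cache_bytes, associativity, page_bytes):
    """Color of the physical page that contains phys_addr."""
    colors = num_page_colors(cache_bytes, associativity, page_bytes)
    page_frame_number = phys_addr // page_bytes
    return page_frame_number % colors


def split_colors(total_colors, counts):
    """Assign disjoint, contiguous color ranges to programs.

    counts[i] is the number of colors granted to program i
    (a hypothetical static partition).
    """
    if sum(counts) > total_colors:
        raise ValueError("requested more colors than the cache provides")
    ranges, start = [], 0
    for count in counts:
        ranges.append(range(start, start + count))
        start += count
    return ranges


# Hypothetical shared cache: 4 MiB, 16-way set associative, 4 KiB pages.
CACHE_BYTES, WAYS, PAGE_BYTES = 4 * 1024 * 1024, 16, 4096

colors = num_page_colors(CACHE_BYTES, WAYS, PAGE_BYTES)
partition = split_colors(colors, [48, 16])  # two programs, unequal shares
```

Under these assumptions, a page allocator would hand a program only those physical pages whose `page_color` falls in its assigned range; shrinking one program's range and growing another's is what moving cache space between programs would amount to in such a scheme.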