Sword: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs

IPDPS 2018 screenshot

Abstract

The detection and elimination of data races in large-scale OpenMP programs is of critical importance. Unfortunately, today's state-of-the-art OpenMP race checkers suffer from high memory overheads and/or miss races. In this paper, we present Sword, a data race detector that significantly improves upon these limitations. Sword limits the application slowdown and memory usage by utilizing only a bounded, user-adjustable memory buffer to collect targeted memory accesses. When the buffer fills up, the accesses are compressed and flushed to a file system for later offline analysis. Sword builds on an operational semantics that formally captures the notion of concurrent accesses within OpenMP regions. An offline race checker that is driven by these semantic rules allows Sword to improve upon happens-before techniques that are known to mask races. To make its offline analysis highly efficient and scalable, Sword employs effective self-balancing interval-tree-based algorithms. Our experimental results demonstrate that Sword is capable of detecting races even within programs that use over 90% of the memory on each compute node. Further, our evaluation shows that it matches or exceeds the best available dynamic OpenMP race checker in detection capability while remaining efficient in execution time.

Citation

Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Ignacio Laguna, Greg L. Lee, Dong H. Ahn
Sword: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs
Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), 845--854, doi:10.1109/IPDPS.2018.00094, 2018.

BibTeX

@inproceedings{2018_ipdps_agrlla,
  title = {Sword: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs},
  author = {Simone Atzeni and Ganesh Gopalakrishnan and Zvonimir Rakamaric and Ignacio Laguna and Greg L. Lee and Dong H. Ahn},
  booktitle = {Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
  publisher = {IEEE Computer Society},
  pages = {845--854},
  doi = {10.1109/IPDPS.2018.00094},
  year = {2018}
}

Acknowledgements

This work was performed under the auspices of the U.S. Department of Energy by LLNL under Contract DE-AC52-07NA27344 (LLNL-CONF740324), NSF OAC 1535032, and NSF CCF 1704715.