PRUNERS: Providing Reproducibility for Uncovering Non-Deterministic Errors in Runs on Supercomputers

Abstract

Large scientific simulations must be able to achieve the full-system potential of supercomputers. When they tap into high-performance features, however, a phenomenon known as non-determinism may be introduced in their program execution, which significantly hampers application development. PRUNERS is a new toolset to detect and remedy non-deterministic bugs and errors in large parallel applications. To show the capabilities of PRUNERS for large application development, we also demonstrate its early usage on real-world production applications.

Citation

Kento Sato, Ignacio Laguna, Gregory L. Lee, Martin Schulz, Christopher M. Chambreau, Simone Atzeni, Michael Bentley, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Geof Sawaya, Joachim Protze, Dong H. Ahn
PRUNERS: Providing Reproducibility for Uncovering Non-Deterministic Errors in Runs on Supercomputers
International Journal of High Performance Computing Applications (IJHPCA), 33(5): 777–783, doi:10.1177/1094342019834621, 2019.

BibTeX

@article{2019_ijhpca_sllscabgrspa,
  title = {PRUNERS: Providing Reproducibility for Uncovering Non-Deterministic Errors in Runs on Supercomputers},
  author = {Kento Sato and Ignacio Laguna and Gregory L. Lee and Martin Schulz and Christopher M. Chambreau and Simone Atzeni and Michael Bentley and Ganesh Gopalakrishnan and Zvonimir Rakamaric and Geof Sawaya and Joachim Protze and Dong H. Ahn},
  journal = {International Journal of High Performance Computing Applications (IJHPCA)},
  volume = {33},
  publisher = {SAGE},
  pages = {777--783},
  doi = {10.1177/1094342019834621},
  number = {5},
  month = {sep},
  year = {2019}
}

Acknowledgements

This work was performed under the auspices of the US Department of Energy by LLNL under contract DE-AC52-07NA27344 (LLNL-JRNL-747183).