PRUNERS: Providing Reproducibility for Uncovering Non-Deterministic Errors in Runs on Supercomputers

Abstract

Large scientific simulations must be able to achieve the full-system potential of supercomputers. When they tap into high-performance features, however, a phenomenon known as non-determinism may be introduced in their program execution, which significantly hampers application development. PRUNERS is a new toolset to detect and remedy non-deterministic bugs and errors in large parallel applications. To show the capabilities of PRUNERS for large application development, we also demonstrate its early use on real-world production applications.

Citation

Kento Sato, Ignacio Laguna, Gregory L. Lee, Martin Schulz, Christopher M. Chambreau, Dong H. Ahn, Simone Atzeni, Michael Bentley, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Geof Sawaya, Joachim Protze
PRUNERS: Providing Reproducibility for Uncovering Non-Deterministic Errors in Runs on Supercomputers
Computational Reproducibility at Exascale Workshop (CRE), 2017.

BibTeX

@inproceedings{2017_cre_sllscaabgrsp,
  title = {PRUNERS: Providing Reproducibility for Uncovering Non-Deterministic Errors in Runs on Supercomputers},
  author = {Kento Sato and Ignacio Laguna and Gregory L. Lee and Martin Schulz and Christopher M. Chambreau and Dong H. Ahn and Simone Atzeni and Michael Bentley and Ganesh Gopalakrishnan and Zvonimir Rakamaric and Geof Sawaya and Joachim Protze},
  booktitle = {Computational Reproducibility at Exascale Workshop (CRE)},
  note = {Extended abstract},
  year = {2017}
}

Acknowledgements

This work was performed under the auspices of the U.S. Department of Energy by LLNL under contract DE-AC52-07NA27344 (LLNL-CONF-737603).