Towards Formal Approaches to System Resilience

PRDC 2013 screenshot

Abstract

Technology scaling and techniques such as dynamic voltage/frequency scaling are predicted to increase the number of transient faults in future processors. Error detectors implemented in hardware are often energy inefficient, as they are always on. While software-level error detection can augment hardware-level detectors, creating detectors in software that are highly effective remains a challenge. In this paper, we first present a new LLVM-level fault injector called KULFI that helps simulate faults occurring within CPU state elements in a versatile manner. Second, using KULFI, we study the behavior of a family of well-known and simple algorithms under error injection. (We choose a family of sorting algorithms for this study.) We then propose a promising way to interpret our empirical results using a formal model that builds on the idea of predicate state transition diagrams. After introducing the basic abstraction underlying our predicate transition diagrams, we draw connections to the level of resilience empirically observed during fault injection studies. Building on the observed connections, we develop a simple, and yet effective, predicate-abstraction-based fault detector. While in its initial stages, ours is believed to be the first study that offers a formal way to interpret and compare fault injection results obtained from algorithms from within one family. Given the absolutely unpredictable nature of what a fault can do to a computation in general, our approach may help designers choose amongst a class of algorithms one that behaves most resilient of all.

Citation

Vishal Chandra Sharma, Arvind Haran, Zvonimir Rakamaric, Ganesh Gopalakrishnan
Towards Formal Approaches to System Resilience
Proceedings of the 19th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC), 41--50, doi:10.1109/PRDC.2013.14, 2013.

BibTeX

@inproceedings{2013_prdc_shrg,
  title = {Towards Formal Approaches to System Resilience},
  author = {Vishal Chandra Sharma and Arvind Haran and Zvonimir Rakamaric and Ganesh Gopalakrishnan},
  booktitle = {Proceedings of the 19th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC)},
  publisher = {IEEE Computer Society},
  pages = {41--50},
  doi = {10.1109/PRDC.2013.14},
  year = {2013}
}

Acknowledgements

Supported in part by NSF Award CCF 1255776, SRC Contract 2013- TJ-2426, and Scientific Discovery through Advanced Computing (SciDAC) program funded by U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (and Basic Energy Sciences/Biological and Environmental Research/High Energy Physics/Fusion Energy Sciences/Nuclear Physics). We would like to thank Pedro Diniz, Prabhakar Kudva, Shuvendu Lahiri, and Karthik Pattabiraman for their insights and feedback on the early drafts of this paper. We are also grateful to Sui Chen for trying out early versions of KULFI and providing us with detailed feedback that helped us to improve it.