FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation


Abstract

We present FailAmp, a novel LLVM program transformation algorithm that makes programs employing structured index calculations more robust against soft errors. Without FailAmp, an error in one offset can go undetected; with FailAmp, all subsequent offsets are relativized, i.e., computed relative to the previous one, so they build on the faulty offset and carry its error forward. FailAmp can exploit ISAs such as ARM to further reduce overheads. We verify correctness properties of FailAmp using an SMT solver, and present a thorough evaluation using many high-performance computing (HPC) benchmarks under a fault injection campaign. FailAmp provides full soft-error detection for address calculation while incurring an average overhead of around 5%.
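The core idea of relativization is that a fault in one offset corrupts every later offset instead of staying isolated. The toy model below illustrates this. It is a hypothetical sketch under stated assumptions, not FailAmp's LLVM transformation. Offsets are plain integers. The fault is a single bit flip (bit 4, an arbitrary choice) in one computed offset. Detection is modeled as one end-of-loop comparison against the expected final offset. The names `absolute`, `relativized`, `flip`, and `N` are invented for this example.

```python
"""Toy model: relativized address generation carries a soft error forward.

Hypothetical illustration only. Offsets are plain integers, the fault model is
a single bit flip in one computed offset, and this is NOT FailAmp's actual
LLVM transformation.
"""

N = 8  # number of loop iterations (assumed for the example)


def flip(off: int, i: int, fault_iter: int, bit: int = 4) -> int:
    """Flip one bit of the offset produced at iteration `fault_iter`."""
    return off ^ (1 << bit) if i == fault_iter else off


def absolute(base: int, stride: int, fault_iter: int) -> tuple[int, int]:
    """Each offset is recomputed from the base as base + i * stride."""
    wrong = 0
    for i in range(N):
        off = flip(base + i * stride, i, fault_iter)
        wrong += int(off != base + i * stride)
    end = base + N * stride  # recomputed from scratch, so the fault leaves no trace
    return wrong, end


def relativized(base: int, stride: int, fault_iter: int) -> tuple[int, int]:
    """Each offset builds on the previous (possibly faulty) one."""
    wrong = 0
    off = base
    for i in range(N):
        off = flip(off, i, fault_iter)
        wrong += int(off != base + i * stride)
        off += stride  # any error is carried into every later offset
    return wrong, off


if __name__ == "__main__":
    base, stride, fault_iter = 0, 8, 3
    expected_end = base + N * stride
    for name, scheme in [("absolute", absolute), ("relativized", relativized)]:
        wrong, end = scheme(base, stride, fault_iter)
        status = "passes" if end == expected_end else "FAILS"
        print(f"{name:12} {wrong} wrong offset(s), end check {status}")
```

With a fault at iteration 3, the absolute scheme has one wrong offset and a final offset that still matches, so the end check passes. The relativized scheme has five wrong offsets (iterations 3 through 7) and a final offset of 48 instead of 64, so the end check fails. In this toy model, every later offset inherits the same nonzero error, which is why a single comparison at the end is enough to expose it.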

Citation

Ian Briggs, Arnab Das, Marek Baranowski, Vishal Sharma, Sriram Krishnamoorthy, Zvonimir Rakamaric, Ganesh Gopalakrishnan
FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation
ACM Transactions on Architecture and Code Optimization (TACO), 16(4), 2019. doi:10.1145/3369381

BibTeX

@article{2019_taco_bdbskrg,
  title = {FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation},
  author = {Ian Briggs and Arnab Das and Marek Baranowski and Vishal Sharma and Sriram Krishnamoorthy and Zvonimir Rakamaric and Ganesh Gopalakrishnan},
  journal = {ACM Transactions on Architecture and Code Optimization (TACO)},
  volume = {16},
  publisher = {ACM},
  doi = {10.1145/3369381},
  number = {4},
  month = {dec},
  issue_date = {January 2020},
  year = {2019}
}

Acknowledgements

This research was supported in part by NSF Awards 1817073 and 1704715, and DOE Contract DE-SC0014096. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number 66905. Pacific Northwest National Laboratory is operated by Battelle for DOE under Contract DE-AC05-76RL01830.