26-28 Jun 2019 Bordeaux (France)
Fault-Tolerance Strategies for HPC Platforms
Thomas Herault 1
1 : University of Tennessee

The NSF SMURFS project explores the impact of faults and failures, fault mitigation strategies, and emerging technologies by providing new analytical and component models for predicting fault-tolerant application behavior at scale. In this talk, I will present recent results from case studies developed in the context of SMURFS, focusing on resource sharing, node provisioning, and resilience strategies.


