Adaptive Communication-Induced Checkpointing Protocols with Domino-Effect Freedom
Jichiang Tsai, Chi-Yi Lin* and Sy-Yen Kuo*
Department of Electrical Engineering, National Chung Hsing University, Taichung, 402 Taiwan
*Department of Electrical Engineering, National Taiwan University, Taipei, 106 Taiwan
The domino effect is an important problem for checkpointing and rollback recovery in distributed systems. Communication-induced checkpointing is one way to prevent it. Most existing protocols of this kind focus on guaranteeing that every checkpoint is part of a consistent global checkpoint, which may incur high run-time overhead because of a possibly excessive number of extra forced checkpoints. In this paper, we propose several adaptive communication-induced checkpointing protocols that are free of the domino effect. These protocols allow a flexible tradeoff between the cost of checkpoint coordination and the rollback distance. Only a specific set of checkpoints, rather than every checkpoint, needs to be part of a consistent global checkpoint. The overhead analysis shows that this generalization can significantly reduce the number of extra forced checkpoints.
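To make the notion of a "forced checkpoint" concrete, the following is a minimal sketch of a generic index-based communication-induced checkpointing rule. It is not one of the adaptive protocols proposed in this paper, and the class and method names (`Process`, `basic_checkpoint`, `receive`, `forced_count`) are hypothetical. Each process keeps an index that it increments at every basic checkpoint and piggybacks on every message. When a message arrives carrying a larger index than the receiver's own, the receiver adopts that index and takes a forced checkpoint before delivering the message.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Illustrative process following a generic index-based CIC rule (hypothetical)."""
    pid: int
    index: int = 0  # index of the latest local checkpoint
    # (index, kind) pairs; every process starts with an initial checkpoint
    checkpoints: list = field(default_factory=lambda: [(0, "initial")])

    def basic_checkpoint(self) -> None:
        # Basic (independently chosen) checkpoint: advance the local index.
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self) -> int:
        # Piggyback the current index on the outgoing message.
        return self.index

    def receive(self, piggybacked: int) -> None:
        # Forced checkpoint: the sender is "ahead", so catch up to its
        # index and checkpoint before the message is delivered.
        if piggybacked > self.index:
            self.index = piggybacked
            self.checkpoints.append((self.index, "forced"))
        # (message delivery to the application would happen here)

    def forced_count(self) -> int:
        # Number of extra checkpoints induced by communication.
        return sum(1 for _, kind in self.checkpoints if kind == "forced")


p0, p1 = Process(0), Process(1)
p0.basic_checkpoint()   # p0 index -> 1
p1.receive(p0.send())   # 1 > 0: p1 takes a forced checkpoint with index 1
p1.receive(p0.send())   # 1 == 1: no forced checkpoint
p1.basic_checkpoint()   # p1 index -> 2
p0.receive(p1.send())   # 2 > 1: p0 takes a forced checkpoint with index 2
print(p0.forced_count(), p1.forced_count())  # 1 1
```

In index-based schemes of this kind, checkpoints are grouped by index into consistent global checkpoints, which is what rules out the domino effect. The price is that every receipt of a larger index forces an extra checkpoint, which is the kind of overhead the protocols described above aim to reduce.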