



Journal of Information Science and Engineering, Vol. 20 No. 5, pp. 885-901


Adaptive Communication-Induced Checkpointing Protocols with Domino-Effect Freedom


Jichiang Tsai, Chi-Yi Lin* and Sy-Yen Kuo*
Department of Electrical Engineering 
National Chung Hsing University 
Taichung, 402 Taiwan 
*Department of Electrical Engineering 
National Taiwan University 
Taipei, 106 Taiwan


    The domino effect is an important problem for checkpointing and rollback recovery in distributed systems. Communication-induced checkpointing is one way of preventing the domino effect. Most existing protocols of this kind guarantee that every checkpoint is part of some consistent global checkpoint, which may incur high run-time overhead because of a possibly excessive number of extra forced checkpoints. In this paper, we propose several adaptive communication-induced checkpointing protocols that are free of the domino effect. These protocols allow a flexible tradeoff between the cost of checkpoint coordination and the rollback distance. Only a specific set of checkpoints, rather than every checkpoint, needs to be part of a consistent global checkpoint. The overhead analysis shows that this generalization can significantly reduce the number of extra forced checkpoints.
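To make "forced checkpoints" concrete, the sketch below shows a classic index-based communication-induced checkpointing rule (the style often attributed to Briatico, Ciuffoletti and Simoncini). It is NOT the adaptive protocol proposed in this paper; it is the kind of baseline in which every checkpoint belongs to a consistent global checkpoint. Each process piggybacks its checkpoint index on outgoing messages and takes a forced checkpoint before delivering any message that carries a larger index. The class `Process` and the functions `basic_checkpoint`, `send`, `receive` and `recovery_line` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process running an index-based CIC rule (illustrative only)."""
    pid: int
    sn: int = 0                                       # current checkpoint index
    checkpoints: list = field(default_factory=list)   # (index, kind) tuples
    forced: int = 0                                   # number of forced checkpoints taken

    def __post_init__(self):
        # Every process starts with an initial checkpoint of index 0.
        self.checkpoints.append((0, "initial"))

    def basic_checkpoint(self):
        # Independently scheduled checkpoint: advance the index.
        self.sn += 1
        self.checkpoints.append((self.sn, "basic"))

    def send(self):
        # The current index is piggybacked on every application message.
        return self.sn

    def receive(self, piggyback):
        # If the sender is "ahead", checkpoint BEFORE delivering the message,
        # so the message can never become an orphan in recovery line `piggyback`.
        if piggyback > self.sn:
            self.sn = piggyback
            self.checkpoints.append((self.sn, "forced"))
            self.forced += 1
        # ... the message would be delivered to the application here ...


def recovery_line(processes, i):
    """For each process, pick its first checkpoint with index >= i (None if absent)."""
    line = []
    for p in processes:
        line.append(next((c for c in p.checkpoints if c[0] >= i), None))
    return line


if __name__ == "__main__":
    p0, p1 = Process(0), Process(1)
    p0.basic_checkpoint()          # p0 reaches index 1
    p1.receive(p0.send())          # p1 is behind, so it takes a forced checkpoint
    p0.receive(p1.send())          # same index, no forced checkpoint
    print(recovery_line([p0, p1], 1), p0.forced, p1.forced)
```

The forced checkpoint taken by `p1` is exactly the extra cost the abstract refers to: under this baseline rule, every index gap seen on a received message triggers one. The adaptive protocols described here relax the requirement that every checkpoint join a consistent global checkpoint, trading some rollback distance for fewer such forced checkpoints.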


Keywords: distributed systems, domino effect, communication-induced checkpointing, fault tolerance, rollback recovery
