JISE




Journal of Information Science and Engineering, Vol. 20 No. 2, pp. 379-390


Management of Fault Tolerance Information for Coordinated Checkpointing Protocol without Sympathetic Rollbacks


Kwang Sik Chung, YoungJun Lee*, HeoChang Yu** and WonGyu Lee**
Department of Computer Science 
University College London 
Gower Street, WC1E 6BT, London 
*Department of Computer Education 
Korea National University of Education 
Chungbuk, 363-791 Korea 
**Department of Computer Science Education 
Korea University 
Seoul, 136-701 Korea


    This paper presents the conditions for an extended global recovery line in coordinated checkpointing protocols, together with a new garbage collection protocol for checkpoints and message logs that avoids the sympathetic rollbacks caused by lost messages. Previous work on garbage collection in coordinated checkpointing protocols assumed that the communication channel never loses in-transit messages, and therefore deleted every checkpoint except the last one on each process. However, even when a coordinated checkpointing protocol is built on a reliable communication protocol such as TCP, in-transit messages are lost when a failure occurs, and these lost messages lead to sympathetic rollbacks of the faulty processes or of related processes. Fault tolerance information therefore needs management methods that store and delete coordinated checkpoints and a light message log in a way that avoids sympathetic rollback. In this paper, we define the extended global recovery line conditions for garbage collection of checkpoints and of the message logs kept for lost messages, and present a new garbage collection algorithm that operates within the extended global recovery line. The proposed algorithm piggybacks process information on each message, so no additional messages are needed for garbage collection or for the extended global recovery line. Because it relies on checkpoint information piggybacked on communication messages, the proposed garbage collection algorithm is called 'the lazy garbage collection algorithm'.
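
    The piggybacking idea can be illustrated with a minimal sketch. It is not the algorithm from the paper: the sender-side log, the per-channel sequence numbers, the field name `ckpt_covers_seq`, and the purely local `checkpoint()` call (which stands in for a coordinated checkpoint) are all assumptions made for illustration. Each outgoing message carries the highest sequence number that the sender's latest checkpoint already covers, and the receiver uses that value to prune its own log lazily, without any dedicated garbage collection message.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: int
    receiver: int
    seq: int      # per-channel sequence number (assumed, not from the paper)
    payload: str
    # Piggybacked info (hypothetical field): highest seq from `receiver`
    # that the sender's latest checkpoint already reflects.
    ckpt_covers_seq: int


@dataclass
class Process:
    pid: int
    next_seq: dict = field(default_factory=dict)   # peer -> next seq to send
    recv_seq: dict = field(default_factory=dict)   # peer -> highest seq received
    ckpt_recv: dict = field(default_factory=dict)  # recv_seq as of last checkpoint
    send_log: dict = field(default_factory=dict)   # peer -> {seq: payload}

    def send(self, peer: int, payload: str) -> Message:
        seq = self.next_seq.get(peer, 0)
        self.next_seq[peer] = seq + 1
        # Log the message so it can be replayed if it is lost by a failure.
        self.send_log.setdefault(peer, {})[seq] = payload
        return Message(self.pid, peer, seq, payload,
                       self.ckpt_recv.get(peer, -1))

    def receive(self, msg: Message) -> None:
        peer = msg.sender
        self.recv_seq[peer] = max(self.recv_seq.get(peer, -1), msg.seq)
        # Lazy garbage collection: drop logged messages that the peer's
        # checkpoint already covers; they can no longer become lost.
        log = self.send_log.get(peer, {})
        for seq in [s for s in log if s <= msg.ckpt_covers_seq]:
            del log[seq]

    def checkpoint(self) -> None:
        # Stand-in for a coordinated checkpoint: record what was received.
        self.ckpt_recv = dict(self.recv_seq)


p0, p1 = Process(0), Process(1)
m0, m1 = p0.send(1, "a"), p0.send(1, "b")
p1.receive(m0)
p1.checkpoint()                 # checkpoint covers m0 only
p1.receive(m1)                  # m1 arrives after the checkpoint
p0.receive(p1.send(0, "ack"))   # the reply piggybacks ckpt_covers_seq = 0
print(sorted(p0.send_log[1]))   # seq 1 stays logged; seq 0 was collected
```

    In this sketch the message received after the checkpoint stays in the sender's log, because a rollback of the receiver to that checkpoint would turn it into a lost message that has to be replayed.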


Keywords: coordinated checkpointing protocol, message log, garbage collection, sympathetic rollback
