JISE


Journal of Information Science and Engineering, Vol. 30 No. 4, pp. 1167-1186


Fault-Tolerance Implementation in Typical Distributed Stream Processing Systems


WUHONG CHEN AND JICHIANG TSAI
Department of Electrical Engineering
National Chung Hsing University
Taichung, 402 Taiwan



    Typical training simulation systems adopt distributed network architectures composed of personal computers because of cost, extensibility, and maintenance considerations. In such a design, a failure or error in any single computer during operation can easily disrupt the functions of the entire system. Adopting appropriate fault-tolerance mechanisms, so that the system can maintain its normal operation and functions when a subsystem computer behaves abnormally, is therefore an important consideration in training simulation system design. Because a firearms training simulation system transmits and processes substantial amounts of streaming data, it can be regarded as a typical distributed stream processing system. In this paper, we examined fault-tolerance mechanism designs and techniques for typical distributed stream processing systems and applied them to a typical firearms training simulation system to increase its operational reliability and availability. We implemented the fault-tolerance mechanism using the transparent checkpoint method. The results of single-machine fault-tolerance tests and multi-machine synchronized fault-tolerance tests indicate that the checkpoint establishment time and rollback recovery time can satisfy the system's operational requirements.


Keywords: distributed stream processing, fault-tolerance, checkpoint, rollback recovery, high availability
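The checkpoint-and-rollback idea summarized in the abstract can be illustrated with a minimal application-level sketch. This is not the authors' transparent checkpointing, which captures process state without changes to the application. Here the operator explicitly snapshots its state and input offset every `interval` records. After a simulated failure, it restores the last snapshot and replays input from the saved offset. The names `run_with_checkpoints`, `process`, `interval`, and `fail_at` are hypothetical and serve only as illustration.

```python
import copy
from dataclasses import dataclass


@dataclass
class Checkpoint:
    offset: int   # index of the next input record to process
    state: dict   # snapshot of the operator state at that offset


def process(state: dict, record: str) -> None:
    """Hypothetical stateful stream operator: keeps a running count per key."""
    state[record] = state.get(record, 0) + 1


def run_with_checkpoints(records, interval, fail_at=None):
    """Process `records`, establishing a checkpoint every `interval` records.

    If `fail_at` is given, one failure is simulated just before the record at
    that index is processed. The operator then rolls back to the last
    checkpoint and replays input from the saved offset.
    Returns (final_state, number_of_rollbacks).
    """
    state: dict = {}
    ckpt = Checkpoint(offset=0, state=copy.deepcopy(state))
    rollbacks = 0
    i = 0
    while i < len(records):
        if fail_at is not None and i == fail_at and rollbacks == 0:
            # Simulated crash: live state is discarded and replaced by the snapshot.
            state = copy.deepcopy(ckpt.state)
            i = ckpt.offset  # rewind the input stream to the checkpointed position
            rollbacks += 1
            continue
        process(state, records[i])
        i += 1
        if i % interval == 0:
            # Establish a checkpoint; deep-copy so later updates do not alter it.
            ckpt = Checkpoint(offset=i, state=copy.deepcopy(state))
    return state, rollbacks


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "b", "a", "a"]
    print(run_with_checkpoints(data, interval=3))
    print(run_with_checkpoints(data, interval=3, fail_at=5))
```

The recovered result matches a failure-free run because every record after the last checkpoint is replayed. A shorter `interval` reduces replay work after a failure but establishes snapshots more often. This is a simplified view of the balance between checkpoint establishment time and rollback recovery time that the abstract reports on.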
