JISE


Journal of Information Science and Engineering, Vol. 19 No. 3, pp. 503-516


Robust TCP Connections for Fault Tolerant Computing


Richard Ekwall, Peter Urban and Andre Schiper
Ecole Polytechnique Federale de Lausanne (EPFL) 
School of Computer and Communication Sciences 
Distributed Systems Laboratory 
CH-1015 Lausanne, Switzerland 
E-mail: {nilsrichard.ekwall, peter.urban, andre.schiper}@epfl.ch


    When processes on two different machines communicate, they most often do so over TCP. While TCP suits a wide range of applications, it has shortcomings in some areas, one of which is fault-tolerant distributed computing. For some of these applications, TCP does not handle link failures adequately: it breaks the connection if connectivity is lost for some duration (typically minutes), which is sometimes undesirable. This paper proposes robust TCP connections as a solution to the problem of broken TCP connections. It presents a session layer protocol on top of TCP that ensures reconnection and provides exactly-once delivery for all transmitted data. A prototype has been implemented as a Java library; it incurs less than 10% overhead compared with plain TCP sockets on the most important performance figures.
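
The abstract does not spell out how reconnection and exactly-once delivery are achieved. As an illustration only, the following is a minimal in-memory sketch of one common way a session layer can provide this: sequence numbers on every message, a sender-side buffer of unacknowledged messages, and a resend after reconnection. The class and method names (`RobustSessionSketch`, `Frame`, `Sender`, `Receiver`, `resendAfterReconnect`) are hypothetical, no real sockets are used, and this should not be read as the protocol or the library API described in the paper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only: names and mechanism are assumptions, not the paper's protocol.
final class RobustSessionSketch {

    /** A payload tagged with a session-level sequence number (hypothetical format). */
    static final class Frame {
        final long seq;
        final String payload;

        Frame(long seq, String payload) {
            this.seq = seq;
            this.payload = payload;
        }
    }

    /** Sender side: keeps every frame buffered until the receiver confirms it. */
    static final class Sender {
        private long nextSeq = 0;
        private final Deque<Frame> unacked = new ArrayDeque<>();

        Frame send(String payload) {
            Frame f = new Frame(nextSeq++, payload);
            unacked.addLast(f);
            return f;
        }

        /** Drop all buffered frames with seq < nextExpected (confirmed by the receiver). */
        void acknowledge(long nextExpected) {
            while (!unacked.isEmpty() && unacked.peekFirst().seq < nextExpected) {
                unacked.removeFirst();
            }
        }

        /** After a reconnect, return every frame the receiver has not yet confirmed. */
        List<Frame> resendAfterReconnect(long nextExpected) {
            acknowledge(nextExpected);
            return new ArrayList<>(unacked);
        }
    }

    /** Receiver side: delivers each sequence number at most once, in order. */
    static final class Receiver {
        private long nextExpected = 0;
        private final List<String> delivered = new ArrayList<>();

        /** Deliver the frame if it is the next one expected; ignore it otherwise. */
        long receive(Frame f) {
            if (f.seq == nextExpected) {
                delivered.add(f.payload);
                nextExpected++;
            }
            return nextExpected;
        }

        long nextExpected() {
            return nextExpected;
        }

        List<String> delivered() {
            return new ArrayList<>(delivered);
        }
    }

    /** Simulates a connection break after two of three messages were received. */
    static List<String> demo() {
        Sender s = new Sender();
        Receiver r = new Receiver();
        Frame a = s.send("a");
        Frame b = s.send("b");
        s.send("c");                    // "c" is lost when the connection breaks

        s.acknowledge(r.receive(a));    // "a" arrives and is acknowledged
        r.receive(b);                   // "b" arrives, but its acknowledgement is lost

        // Reconnect: the receiver reports what it expects next, the sender resends the rest.
        for (Frame f : s.resendAfterReconnect(r.nextExpected())) {
            r.receive(f);
        }
        r.receive(b);                   // a late duplicate of "b" is ignored
        return r.delivered();
    }
}
```

In this sketch, `RobustSessionSketch.demo()` returns the payloads in the order the receiver delivered them. Each payload appears once even though "c" was lost with the connection and "b" arrived twice. A real implementation over TCP sockets would also need to detect the broken connection, re-establish it, and bound the sender's buffer.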


Keywords: session layer protocol, TCP, performance, fault-tolerant distributed computing, quasi-reliable channels, Java
