Technical Report Number
PCS-TR91-169
Replicating data objects has been suggested as a means of increasing the performance of a distributed database system in a network subject to link and site failures. Since a network may partition as a consequence of such failures, a data object may become unavailable from a given site for some period of time. In this paper we study the duration of failure, which we define as the length of time that an object remains unavailable from a particular site once it has become unavailable from that site. We show that, for networks composed of highly reliable components, replication does not substantially reduce the duration of failure. We model a network as a collection of sites and links, each failing and recovering independently according to a Poisson process. Using this model, we demonstrate via simulation that the duration of failure incurred using a non-replicated data object is nearly as short as that incurred using a replicated object and a replication control protocol, including an unrealizable protocol that is optimal with respect to availability. We then examine analytically a simplified system in which the sites but not the links are subject to failure. We prove that if each site operates with probability p, then the optimal replication protocol, Available Copies [5,26], reduces the duration of failure by at most a factor of (1-p)/(1+p). Lastly, we present bounds for general systems, those in which both the sites and the communications between the sites may fail. We prove, for example, that if sites are 95% reliable and communications failures are sufficiently short (communications being either infallible or satisfying a function specified in the paper), then replication can improve the duration of failure by at most 2.7% of that experienced using a single copy. These results show that replication has only a small effect on the duration of failure in present-day partitionable networks composed of realistically reliable components.
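To give a feel for the size of the site-only bound, the following minimal sketch evaluates (1-p)/(1+p) for a few site reliabilities. It assumes the bound is read as the largest fractional reduction in the duration of failure relative to a single copy, which is an interpretation of the abstract's wording. The function name `max_reduction_factor` and the sample values of p are illustrative and do not come from the report.

```python
def max_reduction_factor(p: float) -> float:
    """Evaluate the site-only bound (1 - p) / (1 + p) quoted in the abstract.

    p is the probability that a site is operating. The name and interface
    are hypothetical, chosen only for this illustration.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return (1.0 - p) / (1.0 + p)


if __name__ == "__main__":
    # Sample reliabilities chosen for illustration; 0.95 matches the
    # site reliability used in the abstract's example.
    for p in (0.90, 0.95, 0.99):
        print(f"p = {p:.2f}: bound = {max_reduction_factor(p):.4f}")
```

Under this reading, p = 0.95 gives 0.05/1.95, or roughly 2.6%, which is of the same order as the 2.7% figure stated for general systems.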
Dartmouth Digital Commons Citation
Johnson, Donald B. and Raab, Larry, "Effects of Replication on the Duration of Failure in Distributed Databases" (1991). Computer Science Technical Report PCS-TR91-169. https://digitalcommons.dartmouth.edu/cs_tr/65