Document Type

Technical Report

Publication Date


Technical Report Number



Over the past decade, a pair of instructions called load-linked (LL) and store-conditional (SC) has emerged as the most suitable synchronization primitive for the design of lock-free algorithms. However, current architectures do not support these instructions; instead, they support either compare-and-swap (CAS) (e.g., UltraSPARC, Itanium) or restricted versions of LL/SC, denoted RLL/RSC (e.g., POWER4, MIPS, Alpha). Thus, there is a gap between what algorithm designers want (namely, LL/SC) and what multiprocessors actually support (namely, CAS or RLL/RSC). To bridge this gap, a flurry of algorithms that implement LL/SC from CAS have appeared in the literature. The two most recent algorithms are due to Doherty, Herlihy, Luchangco, and Moir (2004) and Michael (2004). To implement M LL/SC objects shared by N processes, Doherty et al.'s algorithm uses only O(N + M) space, but is only non-blocking and not wait-free. Michael's algorithm, on the other hand, is wait-free, but uses O(N^2 + M) space. The main drawback of his algorithm is the time complexity of the SC operation: although the expected amortized running time of SC is only O(1), the worst-case running time of SC is O(N^2). The algorithm in this paper overcomes this drawback. Specifically, we design a wait-free algorithm that matches the O(N^2 + M) space complexity of Michael's algorithm while guaranteeing O(1) worst-case running time for both LL and SC operations.
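To make the LL/SC semantics and the idea of building LL/SC from CAS concrete, the following is a minimal sketch. It is not the algorithm of this report, nor that of Doherty et al. or Michael. It uses the simple textbook approach of pairing each value with a version tag, and it assumes the tag never wraps around, which a fixed-width hardware word cannot guarantee. The names CASCell and LLSCObject are hypothetical, and CAS is simulated with a lock because Python has no hardware CAS.

```python
import threading


class CASCell:
    """One shared word supporting compare-and-swap.

    Assumption: the lock only stands in for the atomicity that a
    hardware CAS instruction would provide.
    """

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        # Atomically replace the content with `new` iff it equals `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


class LLSCObject:
    """LL/SC object built from CAS using a (value, tag) pair.

    Assumption: tags are unbounded integers, so a tag is never reused.
    """

    def __init__(self, initial, num_processes):
        self._cell = CASCell((initial, 0))
        # Per-process snapshot taken by the most recent LL (None = no link).
        self._link = [None] * num_processes

    def ll(self, pid):
        snapshot = self._cell.read()
        self._link[pid] = snapshot
        return snapshot[0]

    def sc(self, pid, new_value):
        snapshot = self._link[pid]
        if snapshot is None:
            return False  # SC without a preceding LL fails
        self._link[pid] = None
        value_tag = (new_value, snapshot[1] + 1)
        # Succeeds only if no successful SC occurred since this process's LL.
        return self._cell.cas(snapshot, value_tag)


# Usage: two processes link the same object; only the first SC succeeds.
x = LLSCObject(10, num_processes=2)
v0 = x.ll(0)
v1 = x.ll(1)
ok1 = x.sc(1, v1 + 1)
ok0 = x.sc(0, v0 + 5)
final = x.ll(0)
```

In this sketch the SC by process 1 succeeds and the later SC by process 0 fails, because the tag changed after process 0's LL. Both operations take a constant number of steps, but only because the sketch assumes an unbounded tag stored in the same word as the value.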