Date of Award
3-12-2004
Document Type
Thesis (Ph.D.)
Department or Program
Department of Computer Science
First Advisor
Thomas H. Cormen
Abstract
Sorting very large datasets is a key subroutine in almost any application that is built on top of a large database. Two ways to sort out-of-core data dominate the literature: merging-based algorithms and partitioning-based algorithms. Within these two paradigms, all existing programs that sort out-of-core data on a cluster rely on assumptions about the distribution of the input. We propose a third way of out-of-core sorting: oblivious algorithms. In all, we have developed six programs that sort out-of-core data on a cluster. The first three programs, based entirely on Leighton's columnsort algorithm, are restricted in the maximum problem size that they can sort. The other three programs relax this restriction; two of them are based on our original algorithmic extensions to columnsort. We present experimental results showing that our algorithms perform well. To the best of our knowledge, the programs presented in this thesis are the first to sort out-of-core data on a cluster without making any simplifying assumptions about the distribution of the data to be sorted.
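To make "oblivious" and the problem-size restriction concrete, here is a minimal in-memory, single-process sketch of Leighton's eight-step columnsort as it is commonly described. It is not the thesis's out-of-core cluster code; the function name `columnsort` and its parameters are hypothetical. The N = rs values are viewed as an r x s matrix, and every step is either a sort of individual columns or a fixed permutation that depends only on r and s, never on the keys. Leighton's condition, as usually stated, is that s divides r and r >= 2(s-1)^2; this sketch additionally assumes r is even so that the shift in step 6 is exactly r/2, and assumes numeric keys so that `math.inf` can serve as padding. Presumably the size limit in the abstract comes from this condition: if the column height r is fixed by available memory, then s <= 1 + sqrt(r/2), so N = rs can grow only to roughly r^(3/2)/sqrt(2).

```python
import math


def columnsort(values, r, s):
    """Sort r*s numbers with an in-memory sketch of Leighton's columnsort.

    The data is viewed as an r x s matrix stored column by column:
    column j holds values[j*r:(j+1)*r]. Apart from sorts within a column,
    all data movement is a fixed permutation that depends only on r and s.
    """
    n = len(values)
    if n != r * s:
        raise ValueError("need exactly r*s values")
    # Leighton's size restriction; evenness of r is an extra assumption of
    # this sketch so that the shift in step 6 is exactly r/2.
    if r % s != 0 or r < 2 * (s - 1) ** 2 or r % 2 != 0:
        raise ValueError("requires s | r, r >= 2(s-1)^2, and even r")

    # Step 1: sort each column.
    cols = [sorted(values[j * r:(j + 1) * r]) for j in range(s)]

    # Step 2 ("transpose"): read entries in column-major order and lay
    # them down in row-major order, so entry k lands in row k // s,
    # column k % s.
    flat = [x for col in cols for x in col]
    cols = [[flat[i * s + j] for i in range(r)] for j in range(s)]

    # Step 3: sort each column.
    cols = [sorted(col) for col in cols]

    # Step 4 ("untranspose"): inverse of step 2; read row-major,
    # write column-major.
    flat = [cols[j][i] for i in range(r) for j in range(s)]
    cols = [flat[j * r:(j + 1) * r] for j in range(s)]

    # Step 5: sort each column.
    cols = [sorted(col) for col in cols]

    # Step 6 ("shift"): move every entry down by r/2 positions in
    # column-major order, padding with -inf at the front and +inf at the
    # back, which yields s + 1 columns.
    half = r // 2
    flat = [x for col in cols for x in col]
    padded = [-math.inf] * half + flat + [math.inf] * half

    # Step 7: sort each of the s + 1 columns.
    shifted = [sorted(padded[j * r:(j + 1) * r]) for j in range(s + 1)]

    # Step 8 ("unshift"): drop the padding; the result is sorted in
    # column-major order.
    flat = [x for col in shifted for x in col]
    return flat[half:half + n]


data = list(range(15, -1, -1))
print(columnsort(data, r=8, s=2) == sorted(data))  # True
```

In an out-of-core setting one would expect each column to be a chunk small enough to sort in memory, with steps 2, 4, 6, and 8 becoming I/O and communication whose pattern is known in advance regardless of the input distribution; this sketch only illustrates the in-memory logic.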
Recommended Citation
Chaudhry, Geeta, "Parallel Out-of-Core Sorting: The Third Way" (2004). Dartmouth College Ph.D Dissertations. 7.
https://digitalcommons.dartmouth.edu/dissertations/7
Comments
Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2004-517.