Date of Award


Document Type

Thesis (Undergraduate)


Department of Computer Science

First Advisor

Thomas Cormen


Increasingly, modern computing problems, including many scientific and business applications, require huge amounts of data to be examined, modified, and stored. Parallel computers can decrease the time needed to operate on such large data sets by allowing computations to be performed on many pieces of data at once. For example, the DECmpp machine used in our research has 2048 processors in its parallel processor array. The DECmpp can read data into each of these processors, perform a computation in parallel on all of it, and write the data out again, theoretically decreasing the execution time by a factor of 2048 over the time required by one of its processors.

Often, the computations that occur after the data is in the processors involve rearranging, or permuting, the data within the array of parallel processors. Information moves between processors by means of a network connecting them. Communication through the network can be very expensive, especially if there are many collisions (simultaneous contention for the same network resource) between items of data moving from one processor to another. When a program performs hundreds or even thousands of these permutations during its execution, a bottleneck can occur, impeding the overall performance of the program. Effective algorithms that decrease the time required to permute the data within a parallel computer can yield a significant speed increase when running programs with large data sets.

Cormen has designed algorithms to improve performance when the data movement is defined by certain classes of permutations. This thesis examines the performance of one of these classes, the bit-matrix-multiply/complement (BMMC) permutation, when implemented on the DECmpp. Although Cormen's algorithm was designed for parallel disk systems, this thesis adapts it to permutations of data residing in the memory of the parallel processors.
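As an illustration of what a BMMC permutation computes, the following minimal sketch maps an n-bit source address x to the target address y = Ax XOR c, with the matrix-vector product taken over GF(2). The function name `bmmc_target`, the row-bitmask encoding of the matrix, and the bit-reversal example are assumptions made for this sketch and are not taken from the thesis. The mapping is a permutation only when A is nonsingular over GF(2).

```python
def bmmc_target(x, A, c, n):
    """Return y = A x XOR c over GF(2) for an n-bit source address x.

    A is given as a list of n integers (hypothetical encoding):
    bit j of A[i] is the matrix entry in row i, column j.
    c is the n-bit complement vector, packed into an integer.
    """
    y = 0
    for i in range(n):
        # Bit i of A x is the parity of the bitwise AND of row i with x.
        bit = bin(A[i] & x).count("1") & 1
        y |= bit << i
    return y ^ c


# Example: 3-bit bit reversal, a BMMC permutation with c = 0.
# Row i of the matrix selects source bit (n - 1 - i).
n = 3
A_rev = [1 << (n - 1 - i) for i in range(n)]
perm = [bmmc_target(x, A_rev, 0, n) for x in range(2 ** n)]
```

With these inputs, `perm` is `[0, 4, 2, 6, 1, 5, 3, 7]`: the item at address 1 (binary 001) moves to address 4 (binary 100), and so on. Setting `c` to a nonzero value additionally complements the corresponding bits of every target address.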
The DECmpp network follows the model of an Extended Delta Network (EDN). One characteristic of an EDN is that it has a set of input and output ports to the network, each of which can carry only one item of data at a time. If more than one item needs to travel over a given port, a collision occurs, and the data must access the port serially, which slows down the entire operation. Cormen's algorithm reduces these collisions by computing a schedule for sending the data over the network. For small data sets, it is not worthwhile to perform the extra operations to generate such a schedule, because the overhead of computing the schedule outweighs the time gained by preventing collisions at the network ports. As the size of the data set increases, eliminating collisions becomes more and more valuable. On the DECmpp, when the data permutation involves more than 128 elements per processor, our algorithm outperforms the naive, obvious method for permuting in the parallel processor array.
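The effect of port collisions can be sketched with a toy model. This is not the DECmpp's actual router or Cormen's scheduling algorithm; the port layout (`procs_per_port` consecutive processors sharing one port), the cost measure, and all function names are assumptions made for illustration. Each round, a chosen set of processors sends one item each, and items aimed at the same output port are charged one serial step apiece.

```python
from collections import Counter


def steps(targets, procs_per_port, schedule):
    """Count serial steps in a toy port model (an assumption of this sketch).

    targets[p] is the destination processor of the item held by processor p.
    schedule is a list of rounds; each round lists the processors that send.
    """
    total = 0
    for round_senders in schedule:
        # Tally how many items in this round head for each output port.
        dest_ports = Counter(targets[p] // procs_per_port for p in round_senders)
        # Items sharing an output port must pass through it one at a time.
        total += max(dest_ports.values())
    return total


def naive_schedule(num_procs, procs_per_port):
    """Round r sends the r-th processor behind every input port."""
    return [list(range(r, num_procs, procs_per_port))
            for r in range(procs_per_port)]


# 8 processors, 2 per port, permuting by 3-bit bit reversal.
targets = [0, 4, 2, 6, 1, 5, 3, 7]
naive = steps(targets, 2, naive_schedule(8, 2))
scheduled = steps(targets, 2, [[0, 2, 5, 7], [1, 3, 4, 6]])
```

In this small case the naive order costs 4 steps, because each round sends two items to the same output port, while the hand-picked reordering costs 2, since every round uses each input and output port exactly once. Finding such an order takes extra work, which is consistent with the observation that scheduling pays off only once the data set is large enough.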


Originally posted in the Dartmouth College Computer Science Technical Report Series, number PCS-TR94-224.