Date of Award
Department of Computer Science
In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.
First we propose an algorithm that (ε, δ)-estimates the number of simplices using O(m1+1/k / T) bits of space. In addition, we prove that no constant-pass streaming algorithm can (ε, δ)- approximate the number of simplices using less than O( m 1+1/k / T ) bits of space. Thus we resolve the space complexity of the simplex counting problem by providing an algorithm that matches the lower bound.
Second, we examine the triangle counting question –a hypergraph where k = 2. We develop and analyze an almost optimal O (n+m 3/2 / T) triangle-counting algorithm based on ideas introduced in [KMPT12]. The proposed algorithm is subsequently used to establish a method for uniformly sampling triangles in a graph stream using O(m 3/2 / T) bits of space, which beats the state-of-the-art O(mn / T) algorithm given by [PTTW13]
Haris, Themistoklis, "Counting and Sampling Small Structures in Graph and Hypergraph Data Streams" (2021). Dartmouth College Undergraduate Theses. 230.