#### Date of Award

Summer 6-6-2021

#### Document Type

Thesis (Undergraduate)

#### Department

Department of Computer Science

#### First Advisor

Amit Chakrabarti

#### Abstract

In this thesis, we explore the problem of approximating the number of elementary substructures called **simplices** in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.

First, we propose an algorithm that (ε, δ)-estimates the number of simplices using O(m^{1+1/k} / T) bits of space, where m is the number of hyperedges and T is the number of simplices. In addition, we prove that any constant-pass streaming algorithm that (ε, δ)-approximates the number of simplices must use Ω(m^{1+1/k} / T) bits of space. Thus we resolve the space complexity of the simplex counting problem by providing an algorithm that matches the lower bound.

Second, we examine the triangle counting question, which is the special case k = 2. We develop and analyze an almost optimal O(n + m^{3/2} / T) triangle-counting algorithm based on ideas introduced in [KMPT12]. The proposed algorithm is subsequently used to establish a method for uniformly sampling triangles in a graph stream using O(m^{3/2} / T) bits of space, which beats the state-of-the-art O(mn / T) algorithm given by [PTTW13].
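To make the data stream setting concrete, the following is a minimal sketch of a one-pass triangle-count estimator. It is not the algorithm developed in the thesis and does not achieve the space bounds stated above. It uses plain independent edge sampling, a simpler and well-known technique, chosen only for illustration. The function name `estimate_triangles`, the sampling probability `p`, and the `seed` parameter are all assumptions introduced here, and the stream is assumed to describe a simple graph in which each edge appears exactly once.

```python
import random
from collections import defaultdict
from itertools import combinations


def estimate_triangles(edge_stream, p, seed=None):
    """Estimate the triangle count of a graph given as a stream of edges.

    Illustrative sketch only (NOT the thesis algorithm): each edge is kept
    independently with probability p, triangles are counted among the kept
    edges, and the count is rescaled by 1 / p**3, since a triangle survives
    only if all three of its edges are kept.

    Assumes a simple graph whose edges each appear once in the stream.
    """
    rng = random.Random(seed)
    adj = defaultdict(set)  # adjacency sets of the sampled subgraph
    sampled_triangles = 0

    for u, v in edge_stream:
        if u == v:
            continue  # ignore self-loops
        if rng.random() >= p:
            continue  # edge not sampled, so it is never stored
        if v in adj[u]:
            continue  # defensive: skip an edge already stored
        # Each common neighbour of u and v in the sample closes one triangle.
        sampled_triangles += len(adj[u] & adj[v])
        adj[u].add(v)
        adj[v].add(u)

    return sampled_triangles / p**3


if __name__ == "__main__":
    # The complete graph on 4 vertices, streamed edge by edge.
    k4_edges = list(combinations(range(4), 2))
    # With p = 1.0 every edge is kept, so the result is the exact count.
    print(estimate_triangles(k4_edges, p=1.0))
```

With `p = 1.0` the sketch stores the whole graph and returns the exact count. Smaller values of `p` reduce the expected number of stored edges to about `p * m`, at the cost of higher variance in the estimate.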

#### Recommended Citation

Haris, Themistoklis, "Counting and Sampling Small Structures in Graph and Hypergraph Data Streams" (2021). *Dartmouth College Undergraduate Theses*. 230.

https://digitalcommons.dartmouth.edu/senior_theses/230

#### Included in

Databases and Information Systems Commons, Data Science Commons, OS and Networks Commons, Theory and Algorithms Commons