Date of Award

6-1-2018

Document Type

Thesis (Undergraduate)

Department or Program

Department of Computer Science

First Advisor

Charles Palmer

Abstract

Scientific data sets have grown rapidly in recent years, outpacing the growth in memory and network bandwidth. The resulting I/O bottleneck has made it increasingly difficult for scientists to read and search output data sets for features of interest. In this paper, we present the next generation of EMPRESS, a scalable metadata management service that offers the following solution: users can "tag" features of interest and search these tags without having to read in the associated data sets. EMPRESS provides, in essence, a digital scientific notebook where scientists can write down observations and highlight interesting results, along with an efficient way to search these annotations. EMPRESS also provides storage-system-independent physical metadata, giving users a portable way to read both metadata and the associated data. EMPRESS offers scalability through two deployment modes: "local," which runs on the compute nodes, and "dedicated," which uses a set of dedicated, shared-nothing servers. EMPRESS also provides robust fault tolerance and transaction management, which are crucial to supporting workflows.

Comments

Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2018-846.
