Date of Award
6-1-2020
Document Type
Thesis (Undergraduate)
Department or Program
Department of Computer Science
First Advisor
Sean W. Smith
Abstract
Invalid input often leads to unexpected behavior in a program and is behind a plethora of known and unknown vulnerabilities. To prevent improper input from being processed, the input needs to be validated before the rest of the program executes. Formal language theory facilitates the definition and recognition of proper inputs. We focus on the problem of defining valid input after the program has already been written. We construct a parser that infers the structure of inputs that avoid vulnerabilities, whereas existing work focuses on inferring the structure of the input the program anticipates. We present a tool that takes the program as input and constructs an input language by symbolically executing the program on symbolic arguments. This differs from existing work, which tracks the execution of concrete inputs to infer a grammar. We test our tool on programs with known vulnerabilities, including programs in GNU Coreutils, and we demonstrate how the parser catches known invalid inputs. We conclude that the synthesis of a complete parser cannot be entirely automated, due to limitations of symbolic execution tools and issues of computability. A more comprehensive parser must additionally be informed by examples and counterexamples of the input language.
Recommended Citation
Xiao, Linda, "Automatic Generation of Input Grammars Using Symbolic Execution" (2020). Dartmouth College Undergraduate Theses. 163.
https://digitalcommons.dartmouth.edu/senior_theses/163
Comments
Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2020-898.