"Automatic Generation of Input Grammars Using Symbolic Execution" by Linda Xiao

Dartmouth College Undergraduate Theses

Title

Automatic Generation of Input Grammars Using Symbolic Execution

Author

Linda Xiao, Dartmouth College

Date of Award

6-1-2020

Document Type

Thesis (Undergraduate)

Department or Program

Department of Computer Science

First Advisor

Sean W. Smith

Abstract

Invalid input often leads to unexpected behavior in a program and underlies a plethora of known and unknown vulnerabilities. To prevent improper input from being processed, the input needs to be validated before the rest of the program executes. Formal language theory facilitates the definition and recognition of proper inputs. We focus on the problem of defining valid input after the program has already been written. We construct a parser that infers the structure of inputs that avoid vulnerabilities, whereas existing work focuses on inferring the structure of the input the program anticipates. We present a tool that, given a program as input, constructs an input language by running symbolic execution on symbolic arguments. This differs from existing work, which tracks the execution of concrete inputs to infer a grammar. We test our tool on programs with known vulnerabilities, including programs in GNU Coreutils, and we demonstrate how the parser catches known invalid inputs. We conclude that the synthesis of a complete parser cannot be fully automated, due to limitations of symbolic execution tools and issues of computability. A more comprehensive parser must additionally be informed by examples and counterexamples of the input language.
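The idea of validating input against a formal language before the rest of the program runs can be illustrated with a small recognizer. This is a sketch only: it is not the thesis's tool and does not use symbolic execution. The input language, its grammar, and the names `recognize`, `process`, and `guarded_main` are hypothetical, loosely modeled on a comma-separated list of numeric ranges. Because this toy language is regular, a regular expression is enough to recognize it.

```python
import re

# Hypothetical input language (illustrative only, not from the thesis):
#   list  ::= range ("," range)*
#   range ::= num | num "-" | num "-" num | "-" num
#   num   ::= [1-9][0-9]*
NUM = r"[1-9][0-9]*"
RANGE = rf"(?:{NUM}(?:-(?:{NUM})?)?|-{NUM})"
LIST = re.compile(rf"{RANGE}(?:,{RANGE})*")


def recognize(arg):
    """Return True iff the whole string belongs to the input language."""
    return LIST.fullmatch(arg) is not None


def process(arg):
    # Hypothetical stand-in for the rest of the program;
    # it is only reached once the input has been accepted.
    return arg.split(",")


def guarded_main(arg):
    # Recognize first, then process: reject anything outside the language.
    if not recognize(arg):
        raise ValueError(f"rejected input: {arg!r}")
    return process(arg)
```

Here `guarded_main("1,3-5")` returns `['1', '3-5']`, while `guarded_main("1,,3")` raises `ValueError` before `process` ever runs. An input language inferred from a real program may well need a more expressive grammar than this regular one.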

Comments

Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2020-898.

Recommended Citation

Xiao, Linda, "Automatic Generation of Input Grammars Using Symbolic Execution" (2020). Dartmouth College Undergraduate Theses. 163.
https://digitalcommons.dartmouth.edu/senior_theses/163
