Date of Award


Document Type

Thesis (Ph.D.)


Department of Computer Science

First Advisor

Sean W. Smith

Second Advisor

Sergey Bratus


Any useful computer system performs communication and any communication must be parsed before it is computed upon. Given their importance, one might expect parsers to receive a significant share of attention from the security community. This is, however, not the case: bugs in parsers continue to account for a surprising portion of reported and exploited vulnerabilities. In this thesis, I propose a methodology for supporting the development of software that depends on parsers---such as anything connected to the Internet---to safely support any reasonably designed protocol: data structures to describe protocol messages; validation routines that check that data received from the wire conforms to the rules of the protocol; systems that allow a defender to inject arbitrary, crafted input so as to explore the effectiveness of the parser; and systems that allow for the observation of the parser code while it is being explored. Then, I describe principled method of producing parsers that automatically generates the myriad parser-related software from a description of the protocol. This has many significant benefits: it makes implementing parsers simpler, easier, and faster; it reduces the trusted computing base to the description of the protocol and the program that compiles the description to runnable code; and it allows for easier formal verification of the generated code. I demonstrate the merits of the proposed methodology by creating a description of the USB protocol using a domain-specific language (DSL) embedded in Haskell and integrating it with the FreeBSD operating system. Using the industry-standard umap test-suite, I measure the performance and efficacy of the generated parser. I show that it is stable, that it is effective at protecting a system from both accidentally and maliciously malformed input, and that it does not incur unreasonable overhead.


Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2016-803.