Date of Award

Spring 5-15-2024

Document Type

Thesis (Master's)

Department or Program

Computer Science

First Advisor

Soroush Vosoughi

Second Advisor

Peter Chin

Third Advisor

SouYoung Jin


Human natural language communication frequently relies on extra-linguistic information to fill in gaps in the linguistic signal left by semantic underspecification, that is, the omission of details that can be inferred from prior knowledge or other modalities. Underspecification is particularly common in conversations between acquaintances, since these interlocutors share context. It is a key and beneficial feature of natural language that improves efficiency, although it can cause communication to fail if it is not resolved correctly. For language models to communicate effectively and in a human-like fashion, they must learn to recognize and use underspecified language. This thesis argues that proficiency with underspecification and ambiguity is a critical component of a pragmatic approach to machine learning systems. It then surveys recent work to clarify the current state of research into the impact of these phenomena in machine learning. Finally, it suggests potential strategies for quantifying and enhancing machine comprehension of underspecification, offering directions for future research to improve the communicative fluency of models.

Available for download on Thursday, May 15, 2025