Date of Award

Spring 5-7-2021

Document Type

Thesis (Ph.D.)

Department

Department of Computer Science

First Advisor

Daniel Rockmore

Abstract

Written text is one of the major ways that humans communicate their thoughts. A single thought can be expressed through many different combinations of words, and the writer must choose which combination to use. We call the idea being communicated the content of the message, and the particular words chosen to express it the style. The same content expressed in a different style may reveal something useful about the author of the text (e.g., the author's identity), may be easier for different audiences to understand, or may evoke different emotions in the reader.

In this work, we explore how writing style can be used to make inferences about the author, and we demonstrate applications where these techniques uncover interesting results. We supplement the analytic approach with a synthetic approach and consider the problem of generating text that matches the style of a target author. To this end, we find and curate suitable parallel datasets of the same content written in different styles. To the extent possible, these datasets are made publicly available. Next, we demonstrate the performance of machine translation systems on this data. Finally, we show settings in which modifications to existing machine translation architectures can improve results and even perform style transfer in an unsupervised setting.
