Date of Award


Document Type

Thesis (Undergraduate)

Department or Program

Department of Computer Science

First Advisor

Soroush Vosoughi


My research focuses on predicting the wittiness of a cartoon caption using multi-modal deep learning models. Deep learning is now commonly used for image captioning, a task in which the machine has to understand both natural language and visual images. However, instead of aiming to describe a real-world scene accurately, my research seeks to train computers to learn the humor found in both natural language and visual images. Cartoons are an artistic medium that is supposed to deliver visual humor, and their captions are likewise supposed to add to the fun. I therefore chose cartoon captions as a way to see whether deep learning models can, to some extent, learn human humor. I used the New Yorker's Cartoon Captioning Contests as the dataset to train a multi-modal model that predicts a cartoon's funniness. The model did not beat the benchmark in accuracy on the classification task, but the work ruled out some unsuccessful approaches and sets up future study of this topic.


Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2020-893.