Date of Award
Department of Computer Science
Large pre-trained language models (PLMs) such as BERT and XLNet have revolutionized the field of natural language processing (NLP). Because they are pre-trained through unsupervised tasks, there is a natural curiosity as to what linguistic knowledge these models have learned from unlabeled data alone. Fortunately, these models’ architectures are based on self-attention mechanisms, which are naturally interpretable. As such, there is a growing body of work that uses attention to gain insight into what linguistic knowledge these models possess. Most attention-focused studies use BERT as their subject, and consequently the field is sometimes referred to as BERTology. However, despite surpassing BERT on a large number of NLP tasks, XLNet has yet to receive the same level of attention (pun intended). Additionally, there is interest in the field in how these pre-trained models change when fine-tuned for supervised tasks. This paper details many different attention-based interpretability analyses and performs each on BERT, XLNet, and a version of XLNet fine-tuned for a Twitter hate-speech-spreader detection task. The purpose of doing so is (1) to provide a comprehensive summary of the current state of BERTology, (2) to be the first to perform many of these in-depth analyses on XLNet, and (3) to study how PLMs’ attention patterns change over fine-tuning. I find that most of the linguistic phenomena identified in the attention patterns of BERT are also present in those of XLNet to similar extents. Further, I show that much about the internal organization and function of PLMs, and how they change over fine-tuning, can be understood through attention.
Signorelli, Steven J. Jr, "Interpreting Attention-Based Models for Natural Language Processing" (2021). Dartmouth College Undergraduate Theses. 223.