Date of Award

Spring 5-15-2022

Document Type

Thesis (Master's)

Department or Program

Computer Science

First Advisor

Soroush Vosoughi

Second Advisor

Bo Zhu

Third Advisor

Hsien-Chih Chang

Abstract

We are living in the age of information, where it is increasingly easy to share ideas, news, and content with an ever-larger audience. The growing scope and volume of shared data raises a question: how can we determine whether what we are reading promotes a stereotype? Previous work has applied transformer-based models to this domain with impressive performance, but few studies interpret the behavior of attention heads in this task. Our work explores the feature encoding and extraction behaviors of attention heads in transformer-based language models on stereotype detection tasks. We focus our investigation on three stereotype datasets, CrowS-Pairs, StereoSet, and BUG, and employ two probing mechanisms, gradient probing and leave-one-out probing, to investigate the importance of different attention heads on different subsets of each dataset. Our findings can be leveraged in downstream applications to boost performance on these tasks and to further understanding of the roles played by different attention heads.
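As a rough illustration of the leave-one-out probing idea mentioned above, the sketch below zeroes out one attention head at a time and measures how much the model's output shifts. This is a minimal sketch, not the thesis's actual method: it uses a tiny randomly initialized BERT (so it runs without downloads), a mean-pooled hidden state as a stand-in output, and the `head_mask` argument of Hugging Face's `transformers` models; the thesis's real models, datasets, and scoring are not reproduced here.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny randomly initialized model so the sketch runs without downloads.
config = BertConfig(hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=4, intermediate_size=64,
                    vocab_size=100)
model = BertModel(config)
model.eval()

# A dummy input sequence of token ids (illustrative only).
inputs = torch.randint(0, config.vocab_size, (1, 8))

def pooled_output(head_mask):
    """Mean-pooled final hidden state, with selected heads masked out."""
    with torch.no_grad():
        out = model(input_ids=inputs, head_mask=head_mask)
    return out.last_hidden_state.mean()

# Baseline: all heads active.
full_mask = torch.ones(config.num_hidden_layers, config.num_attention_heads)
baseline = pooled_output(full_mask)

# Leave one head out at a time; the output shift is a crude importance score.
importance = torch.zeros_like(full_mask)
for layer in range(config.num_hidden_layers):
    for head in range(config.num_attention_heads):
        mask = full_mask.clone()
        mask[layer, head] = 0.0
        importance[layer, head] = (pooled_output(mask) - baseline).abs()
```

In a real probing study the scalar output would instead be a task loss or prediction on a stereotype example, so that `importance[layer, head]` reflects how much each head contributes to that prediction.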
