Date of Award
Spring 5-15-2022
Document Type
Thesis (Master's)
Department or Program
Computer Science
First Advisor
Soroush Vosoughi
Second Advisor
Bo Zhu
Third Advisor
Hsien-Chih Chang
Abstract
We are living in the age of information, where it has become increasingly easy to share ideas, news, and content with an ever-larger audience. The growing scope and volume of shared data raise a natural question: how can we determine whether what we are reading promotes a stereotype? Previous work has applied transformer-based models to this domain with impressive performance, but few studies have interpreted the behavior of attention heads in this task. Our work explores the feature encoding and extraction behaviors of attention heads in transformer-based language models on stereotype detection tasks. We focus our investigation on three stereotype datasets (CrowS-Pairs, StereoSet, and BUG) and employ two probing mechanisms, gradient probing and leave-one-out probing, to investigate the importance of different attention heads on different subsets of each dataset. Our findings can be leveraged in downstream applications to boost performance on these tasks and to further our understanding of the roles of different attention heads.
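The leave-one-out probing mentioned above can be illustrated with a minimal toy sketch (not the thesis code): ablate one "head" at a time from a simplified multi-head layer and record how much a chosen score drops. Here each head is just a projection matrix and the score is an arbitrary norm; these are illustrative stand-ins, not BERT's actual attention mechanism.

```python
import numpy as np

def multi_head_output(x, heads):
    # Toy multi-head layer: each head is a projection; head outputs are summed.
    return sum(x @ W for W in heads)

def leave_one_out_importance(x, heads, score):
    """Importance of each head = drop in score when that head alone is removed."""
    base = score(multi_head_output(x, heads))
    importances = []
    for i in range(len(heads)):
        ablated = heads[:i] + heads[i + 1:]       # remove head i, keep the rest
        importances.append(base - score(multi_head_output(x, ablated)))
    return importances

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # 4 tokens, 8-dim embeddings
heads = [rng.normal(size=(8, 8)) for _ in range(3)]
score = lambda out: float(np.linalg.norm(out))    # placeholder task metric
imp = leave_one_out_importance(x, heads, score)
print(imp)                                        # one importance value per head
```

In the thesis setting, the heads would be BERT's attention heads and the score a stereotype-detection metric on a dataset subset; the loop structure is the same.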
Recommended Citation
Hajjar, Joseph H., "The Behaviors of BERT Attention Heads in Stereotype Detection" (2022). Dartmouth College Master’s Theses. 171.
https://digitalcommons.dartmouth.edu/masters_theses/171