Date of Award
Spring 6-4-2023
Document Type
Thesis (Undergraduate)
Department
Computer Science
First Advisor
Soroush Vosoughi
Abstract
This thesis describes a variety of approaches to examining how language models encode stereotypes (understanding stereotypes from a model point of view), debiasing language models, and using language models to understand how stereotypes affect conversations (understanding stereotypes from a conversational point of view). We present a novel approach to textual-clue analysis that makes language models more interpretable, combining an understanding of which stereotypes the internal structures of language models encoded during pre-training (via attention-based analysis) with an understanding of which textual clues are most relevant for models trained to detect stereotypes (via SHAP-based analysis). We find that different pre-trained language models may encode different stereotypes for a given minority group, and that different attention heads within the same pre-trained model may also encode different stereotypes. We propose debiasing techniques that make the outputs of generative language models less stereotypical, with promising results (p = 0.008). We then examine the nuanced effects of stereotypes on conversations between speakers of different dialects (Standard American English and African American Vernacular English), finding significant differences in content when speakers of different dialects interact versus when speakers of the same dialect interact. Further study of the robustness of these techniques, along with new techniques, could deepen our understanding of how language models encode stereotypes and how stereotypes affect conversations, moving us toward less biased language models and bridging gaps in critical information between different groups of people.
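As a rough illustration of the attention-based analysis mentioned in the abstract, the sketch below extracts per-head attention weights from a pre-trained encoder with the Hugging Face transformers library and scores how strongly each head links a group mention to an attribute word. The model name, probe sentence, and token choices are illustrative assumptions, not the thesis's actual experimental setup.

```python
# A minimal sketch of attention-based stereotype probing, assuming the
# Hugging Face transformers library. The model, probe sentence, and token
# choices below are illustrative placeholders, not the thesis's setup.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any encoder that exposes attentions
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

# Hypothetical probe pairing a group mention ("nurse") with an attribute ("kind").
sentence = "The nurse was kind."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple of num_layers tensors,
# each shaped (batch, heads, seq_len, seq_len).
attn = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)

group_idx = tokens.index("nurse")
attribute_idx = tokens.index("kind")

# Attention mass flowing from the group token to the attribute token in each
# head; heads with unusually high scores here are candidates for encoding
# the group-attribute association.
head_scores = attn[:, :, group_idx, attribute_idx]  # (layers, heads)
layer, head = divmod(head_scores.argmax().item(), head_scores.shape[1])
print(f"Strongest group-to-attribute attention: layer {layer}, head {head} "
      f"(weight {head_scores[layer, head].item():.3f})")
```

Comparing these per-head scores across probe sentences for different groups is one way to surface the finding noted above, that individual attention heads within the same pre-trained model may encode different stereotypes.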
Recommended Citation
Wang, Brian C., "Stereotypes and Language Models: Understanding how language models encode stereotypes, debiasing language models, and examining how stereotypes affect conversations" (2023). Computer Science Senior Theses. 8.
https://digitalcommons.dartmouth.edu/cs_senior_theses/8