Author ORCID Identifier
Date of Award
Spring 5-15-2025
Document Type
Thesis (Ph.D.)
Department or Program
Computer Science
First Advisor
Soroush Vosoughi
Second Advisor
Saeed Hassanpour
Third Advisor
Adam Breuer
Abstract
This thesis investigates critical aspects of responsible artificial intelligence (AI): model interpretability, bias detection and mitigation, and moral alignment in large language models (LLMs). These dimensions are pivotal to deploying transparent, fair, and ethical AI systems, and by addressing them we hope to foster the trust and understanding necessary for wider AI adoption.
We begin by surveying the existing landscape of interpretability metrics and critically assessing the effectiveness of interpretability methods designed to generate reliable explanations. Building on this evaluation, we introduce novel model architectures and frameworks explicitly developed to enhance the interpretability of LLMs. On the front of bias mitigation, we adapt established bias benchmarks to focus specifically on racial and LGBTQ+ biases in healthcare contexts. Our evaluations reveal substantial biases embedded in multiple LLM architectures and highlight the nuanced effects of debiasing strategies, which incur minimal performance trade-offs on general NLP tasks but notable impacts in specialized biomedical applications. Turning to the moral dimension of AI, we conduct extensive experiments assessing how LLMs align with established normative ethical frameworks. These investigations reveal systematic patterns in moral reasoning, as well as significant inconsistencies driven by scenario framing.
Collectively, our contributions offer a cohesive approach to responsible AI, integrating interpretability, bias reduction, and moral alignment strategies. The insights and practical tools provided by this thesis contribute meaningfully to the development of AI systems that are transparent, equitable, and ethically consistent, establishing a foundation for responsible AI deployment.
Recommended Citation
Xie, Sean, "EXPLORING THE FACETS OF RESPONSIBLE AI: INTERPRETABILITY, BIASES, AND MORALITY OF LARGE LANGUAGE MODELS" (2025). Dartmouth College Ph.D. Dissertations. 344.
https://digitalcommons.dartmouth.edu/dissertations/344
