Author ORCID Identifier

https://orcid.org/0009-0000-3338-1418

Date of Award

Spring 5-15-2025

Document Type

Thesis (Ph.D.)

Department or Program

Computer Science

First Advisor

Soroush Vosoughi

Second Advisor

Saeed Hassanpour

Third Advisor

Adam Breuer

Abstract

This thesis investigates critical aspects of responsible artificial intelligence (AI) — specifically model interpretability, bias detection and mitigation, and moral alignment in large language models (LLMs) — due to their pivotal role in the deployment of transparent, fair, and ethical AI systems. By addressing these dimensions of responsible AI, we hope to foster the increased trust and understanding necessary for wider AI adoption.

We begin by surveying the existing landscape of interpretability metrics and critically assessing the effectiveness of interpretability methods designed to generate reliable explanations. Building on this evaluation, we introduce novel model architectures and frameworks explicitly developed to enhance the interpretability of LLMs. For bias mitigation, we adapt established bias benchmarks to focus specifically on racial and LGBTQ+ biases within healthcare contexts. Our evaluations demonstrate substantial biases embedded in multiple LLM architectures and highlight the nuanced effects of debiasing strategies, showing minimal performance trade-offs for general NLP tasks but notable impacts in specialized biomedical applications. In exploring the moral dimension of AI, we conduct extensive experiments assessing how LLMs align with established normative ethical frameworks. These investigations reveal systematic patterns in moral reasoning and significant inconsistencies influenced by scenario framing.

Collectively, our contributions offer a cohesive approach to responsible AI, effectively integrating interpretability, bias reduction, and moral alignment strategies. The insights and practical tools provided by this thesis contribute meaningfully to the development of AI systems that are transparent, equitable, and ethically consistent, establishing a foundation for responsible AI deployment.

Available for download on Friday, May 15, 2026
