Date of Award

Winter 3-17-2026

Document Type

Thesis (Master's)

Department or Program

Computer Science

First Advisor

Sarah Preum

Abstract

Large language models (LLMs) are increasingly used as sources of health advice, raising urgent concerns about reliability, safety, and personalization. While prior work has studied conflicts in health advice, these efforts typically ignore patient context and treat contradictions as universal. In this thesis, we propose the task of contextualized divergence detection, which frames divergence not only as strict contradiction but also as context-dependent mismatch. To support this task, we introduce a novel dataset of 10,545 samples that combines user profiles, consumer medication guideline documents, and advice statements, with divergences synthetically generated across four categories: direct, conditional, temporal, and sub-typical. We benchmark a set of LLM-based inference approaches, including prompt-based inference, retrieval-augmented generation (RAG), supervised fine-tuning, and agent-based reasoning. A rough sketch of the task's input and output structure follows.
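
The sketch below illustrates one plausible reading of a dataset record: given a user profile, a guideline excerpt, and an advice statement, predict which divergence category (if any) applies. The field names, label strings, and example values are illustrative assumptions, not the dataset's actual schema.

    from dataclasses import dataclass
    from enum import Enum

    class DivergenceType(Enum):
        # The four divergence categories named in the abstract,
        # plus a no-divergence label (an assumption for illustration).
        NONE = "none"                 # advice is consistent with the guideline for this user
        DIRECT = "direct"             # outright contradiction of the guideline
        CONDITIONAL = "conditional"   # conflicts only under this user's conditions
        TEMPORAL = "temporal"         # conflicts with timing or duration constraints
        SUB_TYPICAL = "sub_typical"   # conflicts for this user's atypical subgroup

    @dataclass
    class Sample:
        """One instance of contextualized divergence detection:
        does the advice diverge from the guideline, given this user's context?"""
        user_profile: str      # e.g., age, conditions, current medications
        guideline: str         # consumer medication guideline excerpt
        advice: str            # the advice statement under scrutiny
        label: DivergenceType  # gold divergence category

    # Hypothetical example: the advice is harmless for many users but
    # diverges conditionally given this profile.
    example = Sample(
        user_profile="68-year-old with chronic kidney disease, taking lisinopril",
        guideline="Ibuprofen should be avoided by patients with kidney disease.",
        advice="Take ibuprofen as needed for joint pain.",
        label=DivergenceType.CONDITIONAL,
    )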

We observe that relevance-filtered RAG achieves the highest accuracy and robustness for large models, while agent-based reasoning improves interpretability at the cost of accuracy, and supervised fine-tuning is essential for smaller models. A qualitative error analysis highlights recurring challenges such as hedging language, temporal mismatches, and profile-advisory inconsistencies. This work provides the first dataset and systematic evaluation of contextualized divergence detection in health advice, paving the way toward safer, more personalized interactions with LLMs.
