Date of Award

Spring 6-9-2026

Document Type

Thesis (Undergraduate)

Department

Cognitive Science

First Advisor

Steven Frankland

Second Advisor

Taylor Webb

Abstract

Large language models successfully solve classic theory-of-mind tasks adapted from cognitive science, but the mechanisms underlying this ability remain unclear. Do they deploy circuitry specialized for reasoning about other minds, memorize common false-belief vignette structures and outputs, or rely on more domain-general computations? We investigate this question in Qwen2.5-14B-Instruct using causal mediation and representational similarity analyses across matched prompt sets that systematically vary an agent’s beliefs, the state of the physical world, and the correct answer. We identify a population of mid-layer attention heads that tracks divergence between an initial representation and the current state of the world. These heads are causally relevant regardless of whether the initial representation is an agent’s belief or a photograph, despite the photograph condition involving no agent or perspective-taking. A distinct population of later-layer heads retrieves the answer token. The two populations are functionally dissociable yet combine compositionally. Together, these findings argue against both mentalizing-specific and memorization accounts, suggesting instead that LLMs might solve classic false-belief problems by reusing a domain-general mechanism for detecting divergence between representations and reality.

Available for download on Thursday, June 08, 2028
