Date of Award
Spring 6-9-2026
Document Type
Thesis (Undergraduate)
Department
Cognitive Science
First Advisor
Steven Frankland
Second Advisor
Taylor Webb
Abstract
Large language models successfully solve classic theory-of-mind tasks adapted from cognitive science, but the mechanisms underlying this ability remain unclear. Do they deploy circuitry specialized for reasoning about other minds, memorize common false-belief vignette structures and outputs, or rely on more domain-general computations? We investigate this question in Qwen2.5-14B-Instruct using causal mediation and representational similarity analyses across matched prompt sets that systematically vary an agent’s beliefs, the state of the physical world, and the correct answer. We identify a population of mid-layer attention heads that tracks divergence between an initial representation and the current state of the world. These heads are causally relevant regardless of whether the initial representation is an agent’s belief or a photograph, despite the photograph condition involving no agent or perspective-taking. A distinct population of later-layer heads retrieves the answer token. The two populations are functionally dissociable yet combine compositionally. Together, these findings argue against both mentalizing-specific and memorization accounts, suggesting instead that LLMs might solve classic false-belief problems by reusing a domain-general mechanism for detecting divergence between representations and reality.
Recommended Citation
Sahin, Idil K., "A Mechanistic Investigation of Theory of Mind in Large Language Models" (2026). Cognitive Science Senior Theses. 11.
https://digitalcommons.dartmouth.edu/cognitive-science_senior_theses/11
