Date of Award

Spring 6-9-2026

Document Type

Thesis (Undergraduate)

Department

Cognitive Science

First Advisor

Steven Frankland

Second Advisor

Taylor Webb

Abstract

Large language models successfully solve classic theory-of-mind tasks adapted from cognitive science, but the mechanisms underlying this ability remain unclear. Do they deploy circuitry specialized for reasoning about other minds, memorize common false-belief vignette structures and outputs, or rely on more domain-general computations? We investigate this question in Qwen2.5-14B-Instruct using causal mediation and representational similarity analyses across matched prompt sets that systematically vary an agent’s beliefs, the state of the physical world, and the correct answer. We identify a population of mid-layer attention heads that tracks divergence between an initial representation and the current state of the world. These heads are causally relevant regardless of whether the initial representation is an agent’s belief or a photograph, despite the photograph condition involving no agent or perspective-taking. A distinct population of later-layer heads retrieves the answer token. The two populations are functionally dissociable yet combine compositionally. Together, these findings argue against both mentalizing-specific and memorization accounts, suggesting instead that LLMs might solve classic false-belief problems by reusing a domain-general mechanism for detecting divergence between representations and reality.

Recommended Citation

Sahin, Idil K., "A Mechanistic Investigation of Theory of Mind in Large Language Models" (2026). Cognitive Science Senior Theses. 11.
https://digitalcommons.dartmouth.edu/cognitive-science_senior_theses/11

Cognitive Science Senior Theses

A Mechanistic Investigation of Theory of Mind in Large Language Models
