Date of Award

Spring 6-6-2026

Document Type

Thesis (Undergraduate)

Department

Cognitive Science

First Advisor

Jonathan S. Phillips

Abstract

While Multimodal Large Language Models (MLLMs) increasingly mediate visual culture, little is known about how MLLMs themselves evaluate aesthetic value and how those evaluations shift under different beliefs about authorship. This thesis investigates how three commercial MLLMs (ChatGPT, Claude, and Gemini) appraise visual artworks across six dimensions: liking, beauty, profundity, worth, narrativity, and intentionality. Each model evaluated every image in a balanced dataset of 100 artworks, half human-generated and half AI-generated, with representational and abstract works equally represented, under four provenance conditions: unspecified origin, inferred origin, assumed AI origin, and assumed human origin. Results show that representational visual content was the strongest and most consistent predictor of high aesthetic ratings. Works framed or inferred as human-generated received higher ratings, particularly on higher-order dimensions tied to agency, meaning, and cultural value, such as worth, intentionality, and profundity. In the origin-inference condition, models classified provenance above chance but showed asymmetric errors, misidentifying AI-generated works as human more often than human-generated works as AI. Model-specific profiles further revealed distinct evaluative tendencies: ChatGPT was the most stimulus-driven, Claude showed the strongest human-default classification bias, and Gemini was the most sensitive to authorship framing. Together, these findings suggest that MLLM aesthetic appraisal is neither purely visual nor neutral, but reflects a hybrid evaluative regime shaped by perceptual structure, provenance cues, and culturally learned assumptions about human creativity.
