Author ORCID Identifier
https://orcid.org/0009-0008-5167-9048
Date of Award
Spring 5-20-2026
Document Type
Thesis (Master's)
Department or Program
Computer Science
First Advisor
Soroush Vosoughi
Abstract
Vision Language Models (VLMs) have become a popular tool for everyday use in many domains. However, these models exhibit notable flaws, including poor visual acuity, weak geometric reasoning, and internalized biases. In this thesis, we focus on improving model performance on visual acuity and geometric reasoning. We study tasks in which the VLM is asked, for example, to check whether two circles are touching or to count the number of shapes in an image. We contribute a schema-based framework for improving model performance on these tasks. This framework comprises three key components: (1) a proof-of-concept pipeline adaptable for inference, (2) a Reinforcement Learning framework designed to optimize the schemas, and (3) a model-based verification and refinement pipeline that iteratively improves schema quality. We evaluate this pipeline across closed-source and open-source models, showing significant performance boosts as well as indications of the concept’s efficacy. We also make our code available at https://github.com/carlosguealv/vlms_symb_project.
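To make the example tasks concrete, the following is a minimal sketch of how a symbolic description of an image could answer them once shapes have been extracted. It is not taken from the thesis or its repository: the `Circle` record, the `circles_touch` and `count_shapes` helpers, and the pixel tolerance `tol` are all hypothetical names and assumptions chosen for illustration, and the actual schema format may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """Hypothetical schema entry: a circle given by center and radius in pixels."""
    cx: float
    cy: float
    r: float


def circles_touch(a: Circle, b: Circle, tol: float = 1.0) -> bool:
    """Return True if the two circles touch or overlap, within `tol` pixels.

    The tolerance is an assumption meant to absorb small extraction errors.
    """
    center_distance = math.hypot(a.cx - b.cx, a.cy - b.cy)
    return center_distance <= a.r + b.r + tol


def count_shapes(schema: list) -> int:
    """Count the shape entries in a (hypothetical) list-based schema."""
    return len(schema)


# Two circles of radius 10 whose centers are 20 pixels apart touch at one point.
schema = [Circle(0.0, 0.0, 10.0), Circle(20.0, 0.0, 10.0)]
touching = circles_touch(schema[0], schema[1])
shape_count = count_shapes(schema)
```

Under these assumptions, the geometric question reduces to comparing the distance between centers with the sum of the radii, so the answer no longer depends on the model's visual acuity once the schema is correct.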
Recommended Citation
Guerrero Alvarez, Carlos, "MAKING VLMS LESS BLIND: A NEURO-SYMBOLIC APPROACH" (2026). Dartmouth College Master’s Theses. 311.
https://digitalcommons.dartmouth.edu/masters_theses/311
