Date of Award

Spring 5-18-2026

Document Type

Thesis (Ph.D.)

Department or Program

Mathematics

First Advisor

Daniel Rockmore

Abstract

This thesis develops a geometric perspective on high-dimensional representations, motivated by applications to language. Rather than treating representations solely as inputs to predictive models, we view them as structured objects whose geometry encodes meaningful information. In particular, we argue that such representations exhibit organization at multiple scales: at a global level, metric and clustering structure capture relationships such as genre, authorship, and discourse; at a local level, geometric quantities such as intrinsic dimension and curvature describe how these relationships vary across the space.

To study these phenomena, we combine empirical analysis with theoretical development. On the empirical side, we examine representations derived from literary corpora and legal texts, showing that they form coherent geometric landscapes and support tasks such as clustering, comparison, and authorship attribution. These studies illustrate how similarity-based structure can be used not only to describe collections of texts, but also to infer relationships between them.

On the theoretical side, we develop a probabilistic framework for estimating local geometric structure in noisy high-dimensional settings. By modeling tangent space estimates as random variables and analyzing the induced distributions of curvature-related observables, we recast curvature estimation as a problem of statistical inference. This approach explains the bias in naive estimators and provides a principled method for recovering local geometric information.

Together, these results point to a unified view in which representation spaces encode rich geometric structure. This perspective provides a framework for interpreting high-dimensional data and opens new directions for studying language and other complex systems through geometry.
