Date of Award

Fall 9-29-2021

Document Type

Thesis (Ph.D.)

Department or Program

Biochemistry and Cell Biology

First Advisor

Gevorg Grigoryan


Proteins play crucial roles in a variety of biological processes. While we know that a protein's amino acid sequence determines its structure, which in turn determines its function, we do not know why particular sequences fold into particular structures. My work focuses on discretized geometric descriptions of protein structure, conceptualizing native structure space as composed of mostly discrete, geometrically defined fragments, to better understand why particular sequence elements correspond to particular structure elements. I apply this discretized geometric approach at multiple levels of protein structure, from conceptualizing contacts between residues as interactions between discrete structural elements to treating protein structures as assemblies of discrete fragments. My earlier work focused on better understanding inter-residue contacts and estimating their energies statistically. By scoring structures with energies derived from a stricter notion of contact, I show that native protein structures can be identified from a set of decoy structures more often than with energies derived from traditional definitions of contact, and I discuss the implications for evaluating predictions that rely on structurally defined contacts for validation. Demonstrating how useful simple geometric descriptors of structure can be, I then show that these energies identify native structures on par with well-validated, detailed, atomistic energy functions. Moving to a higher level of structure, in my later work I demonstrate that discretized, geometrically defined structural fragments make good objects for the interactive assembly of protein backbones, and I present a software application that lets users do so.
Finally, I use these fragments to generate structure-conditioned statistical energies, generalizing the classic idea of contact energies by incorporating specific structural context, enabling these energies to reflect the interaction geometries they come from. These structure-conditioned energies contain more information about native sequence preferences, correlate more highly with experimentally determined energies, and show that pairwise sequence preferences are tightly coupled to their structural context. Considered jointly, these projects highlight the degree to which protein structures and the interactions they comprise can be understood as geometric elements coming together in finely tuned ways.
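
For readers unfamiliar with the classic idea of statistical contact energies mentioned above, the following is a minimal sketch, assuming a simple inverse-Boltzmann (log-odds) formulation. The abstract does not specify the formulation used in the thesis, so the function name `contact_energies`, the pseudocount, the composition-based reference state, and the toy data are all hypothetical and for illustration only.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def contact_energies(contacts, composition, kT=1.0, pseudocount=1.0):
    """Estimate pairwise contact energies as negative log-odds.

    contacts: list of (aa1, aa2) residue-type pairs observed in contact.
    composition: dict mapping residue type -> count in the data set.
    The reference state here (random pairing by composition) is an
    assumption of this sketch, not the method described in the thesis.
    """
    # Contacts are unordered, so canonicalize each pair by sorting.
    observed = Counter(tuple(sorted(pair)) for pair in contacts)
    total_observed = sum(observed.values())
    total_residues = sum(composition.values())
    types = sorted(composition)
    n_pair_types = len(types) * (len(types) + 1) // 2

    energies = {}
    for a, b in combinations_with_replacement(types, 2):
        freq_a = composition[a] / total_residues
        freq_b = composition[b] / total_residues
        # Unordered pairs of distinct types can form in two ways.
        expected = freq_a * freq_b * (1 if a == b else 2)
        # Pseudocounts keep unobserved pairs from giving log(0).
        p_observed = (observed[(a, b)] + pseudocount) / (
            total_observed + pseudocount * n_pair_types
        )
        energies[(a, b)] = -kT * math.log(p_observed / expected)
    return energies


# Toy data: leucine-leucine contacts are over-represented.
toy_contacts = [("L", "L")] * 8 + [("L", "K")] + [("K", "K")]
toy_composition = {"L": 50, "K": 50}
toy_energies = contact_energies(toy_contacts, toy_composition)
for pair, energy in sorted(toy_energies.items()):
    print(pair, round(energy, 3))
```

In this toy case the over-represented pair receives a negative (favorable) energy and the under-represented pairs receive positive ones. A structure-conditioned variant would, in the same spirit, gather the counts separately for each structural context rather than pooling all contacts.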