Date of Award

Spring 3-2-2023

Document Type

Thesis (Ph.D.)

Department or Program

Computer Science

First Advisor

Chris Bailey-Kellogg


Protein interactions drive diverse processes essential to living organisms, and thus numerous biomedical applications center on understanding, predicting, and designing how proteins recognize their partners. While unfortunately the number of interactions of interest still vastly exceeds the capabilities of experimental determination methods, computational methods promise to fill the gap. My thesis pursues the development and application of computational methods for several protein interaction prediction and design tasks. First, to improve protein-glycan interaction specificity prediction, I developed GlyBERT, which learns biologically relevant glycan representations encapsulating the components most important for glycan recognition within their structures. GlyBERT encodes glycans with a branched biochemical language and employs an attention-based deep language model to embed the correlation between local and global structural contexts. This approach enables the development of predictive models from limited data, supporting applications such as lectin binding prediction. Second, to improve protein-protein interaction prediction, I developed a unified geometric deep neural network, ‘PInet’ (Protein Interface Network), which leverages the best properties of both data- and physics-driven methods, learning and utilizing models capturing both geometrical and physicochemical molecular surface complementarity. In addition to obtaining state-of-the-art performance in predicting protein-protein interactions, PInet can serve as the backbone for other protein-protein interaction modeling tasks such as binding affinity prediction. Finally, I turned from ii prediction to design, addressing two important tasks in the context of antibodyantigen recognition. The first problem is to redesign a given antigen to evade antibody recognition, e.g., to help biotherapeutics avoid pre-existing immunity or to focus vaccine responses on key portions of an antigen. The second problem is to design a panel of variants of a given antigen to use as “bait” in experimental identification of antibodies that recognize different parts of the antigen, e.g., to support classification of immune responses or to help select among different antibody candidates. I developed a geometry-based algorithm to generate variants to address these design problems, seeking to maximize utility subject to experimental constraints. During the design process, the algorithm accounts for and balances the effects of candidate mutations on antibody recognition and on antigen stability. In retrospective case studies, the algorithm demonstrated promising precision, recall, and robustness of finding good designs. This work represents the first algorithm to systematically design antigen variants for characterization and evasion of polyclonal antibody responses.