Author ORCID Identifier

https://orcid.org/0000-0002-2546-6347

Date of Award

Summer 7-26-2024

Document Type

Thesis (Ph.D.)

Department or Program

Biological Sciences

First Advisor

Gevorg Grigoryan

Second Advisor

Margaret E. Ackerman

Abstract

The complexes that antibodies form with their binding partners, or antigens, are especially valuable to predict and modify because of the unique role antibodies play in the immune system. Yet they present significant challenges to computational methods of both modelling and design. Antibodies are unlike most other proteins in that they are composed of a scaffold region, a highly conserved structure that is largely the same between antibodies of the same class, and Complementarity-Determining Regions (CDRs), which consist of hypervariable loops that largely determine their binding motifs. Additionally, antibodies and their antigens do not co-evolve to bind each other. Thus the challenges antibody-antigen complexes present are threefold. Modern machine-learning methods for modelling protein complexes rely on both co-evolution signals from Multiple Sequence Alignments (MSAs) and overall-homologous structural matches to predict general protein binding motifs, and neither is very useful for predicting antibody-antigen interactions. Meanwhile, the binding motifs are determined solely by the highly unstructured CDR loops, which have unique sequence preferences.

In this thesis, the modelling and redesign of antibody-antigen complexes are approached with particular sensitivity to these issues. A novel score is developed to quantify the significance of antibody-antigen models, which considers the restricted space of reasonable binding motifs involving CDR-antigen contacts. This metric, along with the DockQ score, is then used to benchmark antibody-antigen models produced by six diverse methods. The benchmark reveals that methods relying on MSAs and homologous matches, like AlphaFold-Multimer and RoseTTAFold, perform far worse on antibody-antigen complexes than they do on general protein complexes. Moreover, analysis of the interaction motifs shows that high-quality AlphaFold-Multimer models have much more common interaction motifs than low-quality models. This suggests that limited interfacial geometry data in AlphaFold-Multimer’s training set is limiting its performance, but it also allows for better discrimination between low- and high-quality AlphaFold-Multimer models via a novel confidence score based on the commonness of the interaction motifs. Finally, statistical analysis of the interface between an antibody and Venezuelan Equine Encephalitis Virus is used to redesign that antibody, relying solely on sequence preferences of similarly structured, loop-heavy interaction motifs from general proteins. The suggested mutations were combined with other mutations in a collaborator’s directed library, which generated a variant that bound 60 times more strongly than the original.
