Lisa Oh

Date of Award


Document Type

Thesis (Master's)

Department or Program

Department of Computer Science

First Advisor

Chris Bailey-Kellogg

Second Advisor

Gevorg Grigoryan

Third Advisor

Saeed Hassanpour


Computational methods for predicting binding interfaces between antigens and antibodies (epitopes and paratopes) are faster and cheaper than traditional experimental structure determination methods. A sufficiently reliable computational predictor that scales to large sets of available antibody sequence data could thus inform and expedite many biomedical pursuits, such as better understanding immune responses to vaccination and natural infection and developing better drugs and vaccines. However, current state-of-the-art predictors produce discontiguous predictions, e.g., predicting the epitope in many different spots on an antigen, even though actual epitopes typically comprise a single localized region. We seek to produce contiguous predicted epitopes by accounting for long-range spatial relationships between residues. We therefore build a novel Graph Convolution Network (GCN) that performs graph convolutions at multiple resolutions so as to represent and constrain long-range spatial dependencies. In evaluation on a standard epitope prediction benchmark, the multi-resolution approach yields a substantial improvement over a previous state-of-the-art GCN predictor: half of the test cases increase in AUC-PR by an average of 0.15, while the other half decrease by only 0.05. We further introduce a clustering algorithm that takes advantage of the contiguity yielded by our model, grouping the raw predictions into a small set of discrete potential epitopes. We show that within the top 3 clusters, 73% of test cases contain a cluster covering most of the actual epitope. This demonstrates that contiguous predictions can usefully guide experimental methods by yielding a small set of reasonable hypotheses for further investigation.
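The clustering step and the top-3 evaluation can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, so this sketch substitutes a simple stand-in rather than the thesis's method. Residues whose predicted score is at least a hypothetical `threshold` are linked when their coordinates lie within a hypothetical `radius` (in Å) of each other. Each connected group becomes a candidate epitope, and candidates are ranked by total predicted score. A test case counts as a hit if one of the top `k` clusters covers more than a hypothetical `min_coverage` fraction of the true epitope, taking "most" to mean more than half. The function names and toy coordinates are illustrative only.

```python
import math
from typing import List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def cluster_predictions(scores: Sequence[float], coords: Sequence[Coord],
                        threshold: float = 0.5, radius: float = 10.0) -> List[Set[int]]:
    """Hypothetical stand-in: group high-scoring residues into spatially
    connected clusters (single linkage within `radius`), ranked by total score."""
    unvisited = {i for i, s in enumerate(scores) if s >= threshold}
    clusters: List[Set[int]] = []
    while unvisited:
        # Flood-fill from an arbitrary seed, absorbing every candidate within radius.
        seed = unvisited.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(coords[i], coords[j]) <= radius}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(cluster)
    # Highest total score first; break ties by smallest residue index for determinism.
    clusters.sort(key=lambda c: (-sum(scores[i] for i in c), min(c)))
    return clusters


def top_k_hit(clusters: List[Set[int]], true_epitope: Set[int],
              k: int = 3, min_coverage: float = 0.5) -> bool:
    """True if any of the top-k clusters covers more than `min_coverage`
    of the true epitope residues (assumed reading of "most")."""
    return any(len(c & true_epitope) / len(true_epitope) > min_coverage
               for c in clusters[:k])


# Toy example: six residues along a line, with made-up scores.
coords = [(0, 0, 0), (5, 0, 0), (8, 0, 0), (40, 0, 0), (44, 0, 0), (100, 0, 0)]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.2]
clusters = cluster_predictions(scores, coords)
print(clusters)                        # [{0, 1, 2}, {3, 4}]
print(top_k_hit(clusters, {1, 2, 3}))  # True: cluster {0, 1, 2} covers 2 of 3 residues
```

Reporting a handful of ranked clusters, rather than a per-residue score map, is what lets a prediction be read as a small set of testable hypotheses; a real implementation would likely tune the threshold and linkage rule on validation data.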