Date of Award


Document Type

Thesis (Ph.D.)

Department or Program

Department of Computer Science

First Advisor

Chris Bailey-Kellogg


The study of three-dimensional protein structures produces insights into protein function at the molecular level. Graphs provide a natural representation of protein structures and associated experimental data, and enable the development of graph algorithms to analyze the structures and data. This thesis develops such graph representations and algorithms for two novel applications: structure-based NMR resonance assignment and disulfide cross-link experiment planning for protein fold determination. The first application seeks to identify correspondences between spectral peaks in NMR data and backbone atoms in a structure (from x-ray crystallography or homology modeling), by computing correspondences between a contact graph representing the structure and an analogous but very noisy and ambiguous graph representing the data. The assignment then supports further NMR studies of protein dynamics and protein-ligand interactions. A hierarchical grow-and-match algorithm was developed for smaller assignment problems, ensuring completeness of assignment, while a random graph approach was developed for larger problems, provably determining unique matches in polynomial time with high probability. Test results show that our algorithms are robust to typical levels of structural variation, noise, and missings, and achieve very good overall assignment accuracy. The second application aims to rapidly determine the overall organization of secondary structure elements of a target protein by probing it with a set of planned disulfide cross-links. A set of informative pairs of secondary structure elements is selected from graphs representing topologies of predicted structure models. For each pair in this ``fingerprint'', a set of informative disulfide probes is selected from graphs representing residue proximity in the models. Information-theoretic planning algorithms were developed to maximize information gain while minimizing experimental complexity, and Bayes error plan assessment frameworks were developed to characterize the probability of making correct decisions given experimental data. Evaluation of the approach on a number of structure prediction case studies shows that the optimized plans have low risk of error while testing only a very small portion of the quadratic number of possible cross-link candidates.


Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2010-675.