Date of Award


Document Type

Thesis (Ph.D.)


Department of Computer Science

First Advisor

Chris Bailey-Kellogg


To investigate protein structure and improve protein function most effectively, experiments must be planned carefully. The combinatorial number of possible experiment plans demands effective criteria and efficient algorithms for choosing one that is, in some sense, optimal. This thesis addresses experiment planning challenges in two significant applications.

The first part of this thesis develops an integrated computational-experimental approach for rapidly discriminating among predicted protein structure models by quantifying their consistency with relatively cheap and easy experiments (cross-linking, and site-directed mutagenesis followed by stability measurement). To extract the most information from noisy and sparse experimental data, rigorous Bayesian frameworks have been developed to analyze the information content. Efficient algorithms have been developed to choose the most informative, least expensive, and most robust experiments. The effectiveness of this approach has been demonstrated using existing experimental data as well as simulations, and it has been applied to discriminate among predicted structure models of the pTfa chaperone protein from bacteriophage lambda.

The second part of this thesis seeks to choose optimal breakpoint locations for protein engineering by site-directed recombination. To increase the likelihood of obtaining folded and functional hybrids in protein recombination, it is necessary to retain the evolutionary relationships among amino acids that determine protein stability and functionality. A probabilistic hypergraph model has been developed to represent these relationships, with edge weights representing their statistical significance as derived from a database and a protein family. The model has been validated by showing that it distinguishes functional hybrids from non-functional ones in existing experimental data. Choosing the breakpoint locations that minimize the total perturbation to these relationships is proved to be NP-hard in general, but exact and approximate algorithms have been developed for a number of important cases.


Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2008-614.