Date of Award


Document Type

Thesis (Ph.D.)


Department of Computer Science

First Advisor

Chris Bailey-Kellogg


Site-directed protein recombination produces improved and novel protein variants by recombining sequence fragments from parent proteins. The resulting hybrids accumulate multiple mutations that have been evolutionarily accepted together. Subsequent screening or selection identifies hybrids with desirable characteristics. To increase the "hit rate" of good variants, this thesis develops experiment planning algorithms to optimize protein recombination experiments. First, to improve the frequency of generating novel hybrids, a metric is developed to assess the diversity among hybrids and parent proteins. Dynamic programming algorithms are then created to optimize the selection of breakpoint locations according to this metric. Second, the trade-off between diversity and stability in recombination experiment planning is studied, recognizing that diversity requires changes from the parent proteins, and that such changes may also disrupt residue interactions necessary for protein stability. Accordingly, methods based on dynamic programming are developed to optimize diversity and stability jointly, finding optimal breakpoints such that no other experiment plan has better performance in both aspects simultaneously. Third, to support recombination of proteins with heterogeneous structures and to focus on functionally important regions, a general framework for protein fragment swapping is developed. By differentiating source and target parents, and the swappable regions within them, fragment swapping enables asymmetric, selective site-directed recombination. Two applications of protein fragment swapping are studied. In one application, which aims to generate hybrids inheriting functionalities from both source and target proteins, a method based on integer programming selects optimal swapping fragments to maximize the predicted stability and activity of hybrids in the resulting library.
In another application, human source protein fragments are swapped into a therapeutic exogenous target protein to minimize the occurrence of peptides that trigger an immune response. A dynamic programming method is developed to optimize fragment selection for both humanity and functionality, resulting in therapeutically active variants with decreased immunogenicity.
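
The combined optimization of diversity and stability described in the abstract amounts to finding non-dominated (Pareto-optimal) experiment plans. The sketch below is only an illustration of that idea, not the thesis's dynamic programming method: it filters a small, made-up list of candidate plans by brute force. The plan labels, the scores, and the convention that higher is better for both criteria are assumptions, and it uses the standard dominance definition (at least as good on both criteria, strictly better on one).

```python
from typing import List, Tuple

# Hypothetical plan record: (label, diversity score, stability score).
# Both scores are assumed to be "higher is better" for this illustration.
Plan = Tuple[str, float, float]


def dominates(a: Plan, b: Plan) -> bool:
    """Return True if plan a is at least as good as b on both scores
    and strictly better on at least one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep only the plans that no other plan dominates (brute force)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Made-up scores, purely for illustration.
plans = [
    ("A", 0.9, 0.2),
    ("B", 0.6, 0.6),
    ("C", 0.5, 0.5),  # dominated by B on both criteria
    ("D", 0.2, 0.9),
]
front = pareto_front(plans)
print([label for label, _, _ in front])
```

Each plan that survives the filter represents a different trade-off between the two criteria; choosing among them is left to the experimenter.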


Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2010-672.