Document Type

Technical Report

Identifying residue coupling relationships within a protein family can provide important insights into intrinsic molecular processes, and has significant applications in modeling structure and dynamics, understanding function, and designing new or modified proteins. We present the first algorithm to infer an undirected graphical model representing residue coupling in protein families. Such a model serves as a compact description of the joint amino acid distribution, and can be used for predictive (will this newly designed protein be folded and functional?), diagnostic (why is this protein not stable or functional?), and abductive (what if I attempt to graft features of one protein family onto another?) reasoning. Unlike current correlated-mutation algorithms, which focus on assessing dependence and can therefore conflate direct and indirect relationships, our algorithm focuses on assessing independence, which modularizes variation and thus enables efficient reasoning of the kinds described above. Further, our algorithm can readily incorporate, as priors, hypotheses regarding possible underlying mechanistic or energetic explanations for coupling. The resulting approach constitutes a powerful and discriminating mechanism for identifying residue coupling from protein sequences and structures. Results on the G protein-coupled receptor (GPCR) and PDZ domain families demonstrate the ability of our approach to effectively uncover and exploit models of residue coupling.
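
The distinction between assessing dependence and assessing independence can be illustrated with a small sketch. This is not the algorithm of the report. It is a toy example that assumes conditional mutual information as the independence measure, and the alignment, the column names, and the helper functions are all hypothetical. Three alignment columns are built so that columns A and C each depend on column B but not on each other. A pairwise dependence score flags A and C as coupled, while conditioning on B shows that the coupling is indirect.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z): MI within each stratum of Z, weighted by P(Z=z)."""
    n = len(zs)
    total = 0.0
    for z, count in Counter(zs).items():
        idx = [i for i, v in enumerate(zs) if v == z]
        total += (count / n) * mutual_information(
            [xs[i] for i in idx], [ys[i] for i in idx]
        )
    return total


def toy_alignment():
    """Hypothetical 3-column alignment in which A and C depend only on B.

    Within each value of B, the residues at A and C are drawn
    independently with a 9:1 preference, so A and C are conditionally
    independent given B by construction.
    """
    rows = []
    for b, a_major, c_major in (("A", "D", "K"), ("L", "E", "R")):
        a_minor = "E" if a_major == "D" else "D"
        c_minor = "R" if c_major == "K" else "K"
        for a, wa in ((a_major, 9), (a_minor, 1)):
            for c, wc in ((c_major, 9), (c_minor, 1)):
                rows.extend([(a, b, c)] * (wa * wc))
    return rows


rows = toy_alignment()
col_a, col_b, col_c = zip(*rows)

# Pairwise dependence: A and C look coupled.
mi_ac = mutual_information(col_a, col_c)
# Conditional independence: the A-C signal vanishes once B is known.
cmi_ac_given_b = conditional_mutual_information(col_a, col_c, col_b)

print(f"I(A;C)   = {mi_ac:.4f}")
print(f"I(A;C|B) = {cmi_ac_given_b:.4f}")
```

In an undirected graphical model of this toy family, the edges A-B and B-C would be retained and the edge A-C omitted, which is the sense in which independence assessments separate direct from indirect relationships.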