Document Type

Technical Report

Publication Date


Technical Report Number



Recognition of a protein's fold provides valuable information about its function. While many sequence-based homology prediction methods exist, an important challenge remains: two highly dissimilar sequences can have similar folds --- how can we detect this rapidly, in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure for detecting 3D-structural homologies from sparse, unassigned protein NMR data. Our method identifies the 3D-structural models in a protein structural database whose geometries best fit the unassigned experimental NMR data. It does not use sequence information and is thus not limited by sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or sequence homology. The algorithm runs in O(pnk3) time, where p is the number of proteins in the database, n is the number of residues in the target protein, and k is the resolution of a rotation search. The method requires only uniform 15N-labelling of the protein and processes unassigned 1H-15N residual dipolar couplings, which can be acquired in a couple of hours. Our experiments on NMR data from 5 different proteins demonstrate that the method identifies closely related protein folds, despite low-sequence homology between the target protein and the computed model.


A revised and expanded version of this TR will appear as a refereed paper at the IEEE Computer Society Bioinformatics Conference , Stanford, California (August, 2003).