Dartmouth Scholarship

Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum

Ryan H. Lilien, Dartmouth College
Hany Farid, Dartmouth College
Bruce R. Donald, Dartmouth College

Document Type

Article

Publication Date

7-5-2004

Publication Title

Journal of Computational Biology

Department

Department of Computer Science

Additional Department

Geisel School of Medicine

Abstract

We have developed an algorithm called Q5 for probabilistic classification of healthy vs. disease whole serum samples using mass spectrometry. The algorithm employs Principal Components Analysis (PCA) followed by Linear Discriminant Analysis (LDA) on whole spectrum Surface-Enhanced Laser Desorption/Ionization Time of Flight (SELDI-TOF) Mass Spectrometry (MS) data, and is demonstrated on four real datasets from complete, complex SELDI spectra of human blood serum.

Q5 is a closed-form, exact solution to the problem of classification of complete mass spectra of a complex protein mixture. Q5 employs a novel probabilistic classification algorithm built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient; it is non-iterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum. Replicate experiments of different training/testing splits of each dataset are employed to verify robustness of the algorithm. The probabilistic classification method achieves excellent performance. We achieve sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum complex sample spectral classification techniques, and can provide clues as to the molecular identities of differentially-expressed proteins and peptides.
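
The PCA-then-LDA pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' Q5 implementation: the function names (`fit_pca_lda`, `predict_proba`), the synthetic "spectra", the number of retained components, and the probabilistic step (equal priors with a one-dimensional Gaussian fitted to each class along the discriminant) are all assumptions made for illustration. It shows only the general idea of a non-iterative, closed-form discriminant computed in a dimension-reduced space.

```python
import numpy as np

def fit_pca_lda(X, y, n_components):
    """Fit PCA, then a two-class Fisher discriminant. X: (samples, features); y: 0/1 labels."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[:n_components].T
    Z = Xc @ W_pca
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Within-class scatter matrix in the reduced space.
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    # Closed-form Fisher direction: solve Sw w = (m1 - m0); no iteration needed.
    w = np.linalg.solve(Sw, m1 - m0)
    w /= np.linalg.norm(w)
    p0, p1 = Z0 @ w, Z1 @ w
    return {
        "mean": mean, "W_pca": W_pca, "w": w,
        "mu": np.array([p0.mean(), p1.mean()]),
        "sigma": np.array([p0.std(ddof=1), p1.std(ddof=1)]),
    }

def predict_proba(model, X):
    """P(class 1 | x), ASSUMING equal priors and 1-D Gaussian class densities."""
    t = (X - model["mean"]) @ model["W_pca"] @ model["w"]
    mu, sigma = model["mu"], model["sigma"]
    log_lik = -0.5 * ((t[:, None] - mu) / sigma) ** 2 - np.log(sigma)
    log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
    lik = np.exp(log_lik)
    return lik[:, 1] / lik.sum(axis=1)

# Synthetic stand-in for spectra (hypothetical data, not from the paper):
# 80 samples x 200 intensity bins; class 1 has elevated intensity in bins 50-59.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 200))
X[y == 1, 50:60] += 3.0

# One random training/testing split.
perm = rng.permutation(80)
train, test = perm[:60], perm[60:]

model = fit_pca_lda(X[train], y[train], n_components=10)
proba = predict_proba(model, X[test])
accuracy = np.mean((proba > 0.5).astype(int) == y[test])
print("held-out accuracy:", accuracy)
```

In this sketch the number of retained components must stay well below the number of training samples so that the within-class scatter matrix remains invertible. Repeating the split with different random permutations would mimic the replicate training/testing experiments mentioned in the abstract.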

DOI

10.1089/106652703322756159

Comments

Listed in the Dartmouth College Computer Science Technical Report Series as TR2002-434.

Original Citation

Ryan H. Lilien, Hany Farid, and Bruce R. Donald. Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum. Journal of Computational Biology. Dec 2003. 925-946. http://doi.org/10.1089/106652703322756159

Dartmouth Digital Commons Citation

Lilien, Ryan H.; Farid, Hany; and Donald, Bruce R., "Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum" (2004). Dartmouth Scholarship. 4043.
https://digitalcommons.dartmouth.edu/facoa/4043

Included in

Chemistry Commons, Computer Sciences Commons, Medicine and Health Sciences Commons
