Date of Award
6-1-1999
Document Type
Thesis (Undergraduate)
Department or Program
Department of Computer Science
First Advisor
Javed Aslam
Abstract
The need for a more effective similarity measure is growing as a result of the astonishing amount of information being placed online. Most existing similarity measures are defined by empirically derived formulas and cannot easily be extended to new applications. We present a pairwise document similarity measure based on information theory, along with corpus-dependent and corpus-independent applications of this measure. When ranked against existing similarity measures over TREC FBIS data, our corpus-dependent information-theoretic similarity measure ranked first.
Recommended Citation
Isaacs, Jeffrey D., "Investigating Measures for Pairwise Document Similarity" (1999). Dartmouth College Undergraduate Theses. 201.
https://digitalcommons.dartmouth.edu/senior_theses/201
Comments
Originally posted in the Dartmouth College Computer Science Technical Report Series, number PCS-TR99-357.