Date of Award
5-1-2003
Document Type
Thesis (Undergraduate)
Department or Program
Department of Computer Science
First Advisor
Jay Aslam
Abstract
Evaluating retrieval systems, such as those submitted to the annual TREC competition, usually requires a large number of documents to be read and judged for relevance to query topics. Test collections are far too big to be judged exhaustively, so only a subset of documents is selected to form the judgment "pool." The selection method that TREC uses produces pools that are still quite large. Research has indicated that it is possible to rank the retrieval systems correctly using substantially smaller pools. This paper introduces an active learning algorithm whose goal is to reach the correct rankings using the smallest possible number of relevance judgments. It adds one document to the pool at a time, always trying to select the document with the highest information gain. Several variants of this algorithm are described, each improving on the one before. Results from experiments are included for comparison with the traditional TREC pooling method. The best version of the algorithm reliably outperforms the traditional method, although the size of its improvement varies.
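The abstract contrasts TREC's fixed pooling with a pool that grows one judgment at a time. The following is a minimal sketch under stated assumptions. The traditional method is taken to be depth-k pooling, meaning the union of each submitted run's top k documents. The incremental loop treats the selection criterion as a pluggable, hypothetical `score` function and does not reproduce the thesis's actual information-gain computation. All function and parameter names (`depth_k_pool`, `greedy_pool`, `score`, `judge`) are illustrative only.

```python
from typing import Callable, Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Depth-k pooling: the union of the top-k documents from every run."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def greedy_pool(
    runs: Dict[str, List[str]],
    budget: int,
    score: Callable[[str, Dict[str, bool]], float],
    judge: Callable[[str], bool],
) -> Dict[str, bool]:
    """Grow the pool one document at a time.

    At each step, judge the unjudged candidate with the highest `score`.
    `score` is a hypothetical stand-in for an information-gain estimate.
    It receives the judgments made so far, so it may adapt as the pool grows.
    `judge` stands in for a human assessor and returns True if the
    document is relevant.
    """
    candidates = {doc for ranking in runs.values() for doc in ranking}
    judged: Dict[str, bool] = {}
    for _ in range(min(budget, len(candidates))):
        # Sort first so that ties are broken deterministically by document id.
        unjudged = [doc for doc in sorted(candidates) if doc not in judged]
        best = max(unjudged, key=lambda doc: score(doc, judged))
        judged[best] = judge(best)
    return judged
```

As a usage example with two toy runs, the hypothetical score below simply sums reciprocal ranks across runs and ignores the judgments collected so far. A real active-learning criterion would use those judgments to choose the next document.

```python
runs = {"A": ["d1", "d2", "d3"], "B": ["d2", "d4", "d1"]}
score = lambda d, judged: sum(1.0 / (r.index(d) + 1) for r in runs.values() if d in r)
judged = greedy_pool(runs, budget=2, score=score, judge=lambda d: d == "d2")
# judged == {"d2": True, "d1": False}
```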
Recommended Citation
Torrey, Lisa A., "An Active Learning Approach to Efficiently Ranking Retrieval Engines" (2003). Dartmouth College Undergraduate Theses. 28.
https://digitalcommons.dartmouth.edu/senior_theses/28
Comments
Originally posted in the Dartmouth College Computer Science Technical Report Series, number TR2003-449.