Department of Computer Science
The metasearch problem is to optimally merge the ranked lists output by an arbitrary number of search systems into one ranked list. In this work: (1) We show that metasearch improves not only on the raw performance of the input search engines but also on their consistency from query to query. (2) We demonstrate experimentally that simply weighting input systems by their average performance can dramatically improve fusion results. (3) We show that score normalization is an important component of a metasearch engine, and that dependence upon statistical outliers appears to be the problem with the standard technique. (4) We propose a Bayesian model for metasearch that outperforms the best input system on average and has performance competitive with standard techniques. (5) We introduce the use of Social Choice Theory to the metasearch problem, modeling metasearch as a democratic election. We adapt a positional voting algorithm, the Borda Count, to create a metasearch algorithm, achieving reasonable performance. (6) We propose a metasearch model adapted from a majoritarian voting procedure, the Condorcet algorithm. The resulting algorithm is the best-performing algorithm in a number of situations. (7) We propose three upper bounds for the problem, each bounding a different class of algorithms. We present experimental results for each algorithm using two types of experiments on each of four data sets.
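To illustrate the positional-voting idea in point (5) and the system weighting in point (2), the following is a minimal sketch of a Borda-style fusion, not the dissertation's actual implementation. The function name `borda_fuse`, the `weights` parameter, and the rule that unranked documents split the leftover points evenly are assumptions made for this example. Each input list is assumed to contain no duplicate documents.

```python
from collections import defaultdict


def borda_fuse(ranked_lists, weights=None):
    """Merge ranked lists with a Borda-style positional vote (illustrative sketch)."""
    if weights is None:
        # Unweighted vote: every input system counts equally.
        weights = [1.0] * len(ranked_lists)

    # The candidate pool is every document returned by any system.
    candidates = set()
    for ranking in ranked_lists:
        candidates.update(ranking)
    c = len(candidates)

    scores = defaultdict(float)
    for ranking, w in zip(ranked_lists, weights):
        # Ranked documents receive c, c-1, ... points from top to bottom.
        for position, doc in enumerate(ranking):
            scores[doc] += w * (c - position)
        # Assumption: documents this system did not rank split the
        # remaining points (c - len(ranking), ..., 1) evenly.
        unranked = candidates - set(ranking)
        if unranked:
            leftover = sum(range(1, c - len(ranking) + 1))
            share = leftover / len(unranked)
            for doc in unranked:
                scores[doc] += w * share

    # Sort by descending score; break ties by document id for determinism.
    return sorted(candidates, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    systems = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
    print(borda_fuse(systems))
    # Hypothetical weights, e.g. each system's average precision on training queries.
    print(borda_fuse(systems, weights=[0.5, 0.3, 0.2]))
```

Passing per-system weights turns the same routine into a weighted vote, which is one simple way to reflect differences in average system performance.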
Montague, Mark H., "Metasearch: Data Fusion for Document Retrieval" (2002). Dartmouth College Ph.D Dissertations. 3.