Date of Award

6-8-1999

Document Type

Thesis (Undergraduate)

Department or Program

Department of Computer Science

First Advisor

Javed A. Aslam

Abstract

The problems of word sense disambiguation and document indexing for information retrieval have been extensively studied. It has been argued that indexing by disambiguated meanings, rather than by word stems, should improve information retrieval results. We present a new corpus-based algorithm for performing word sense disambiguation. The algorithm does not need to train on many senses of each word; instead, it uses the probability that certain concepts will occur together. We then use that algorithm to index several corpora of documents. Our indexing algorithm does not generally outperform the traditional stem-based tf.idf model.

Comments

Originally posted in the Dartmouth College Computer Science Technical Report Series, number PCS-TR99-352.
