Document Type

Technical Report

Publication Date

10-1-1997

Technical Report Number

PCS-TR97-324

Abstract

We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms partition a document collection into a number of clusters that is naturally induced by the collection. We show a lower bound on the accuracy of the clusters produced by these algorithms. We use the random graph model to show that both star algorithms produce correct clusters in time Θ(V + E). Finally, we provide data from extensive experiments.

Comments

Submitted to the 1998 SIGIR Conference.
