Document Type

Technical Report

Publication Date

6-7-1993

Technical Report Number

PCS-TR93-192

Abstract

We present a system for recognizing off-line cursive English text, guided in part by global characteristics of the handwriting. We introduce a new method for finding letter boundaries, based on minimizing a heuristic cost function. The function is evaluated at each point along the baseline of the word to find the best possible segmentation points. The algorithm aims to find all the actual letter boundaries and as few additional ones as possible. After size and slant normalization, the segments are classified by a feedforward neural network with one hidden layer. The word recognition algorithm identifies the segmentation points that are likely to be extraneous and generates all possible final segmentations of the word by either keeping or removing each of them. Interpreting the outputs of the neural network as posterior probabilities of letters, it then finds the word that maximizes the probability of having produced the image, over a lexicon of 30,000 words and over all possible final segmentations. We compared two hypotheses for estimating the likelihood of words in the lexicon and found that using a Hidden Markov Model of English is significantly less successful than assuming independence among the letters of a word. In our initial test with multiple writers, 61% of the words were recognized correctly.
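
The word search described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation: the function names, the `doubtful` set of possibly extraneous boundaries, the `posteriors` callable standing in for the neural network, and the toy numbers are all assumptions made for illustration. Under the independence hypothesis, a word's score is taken here as the product of the per-segment letter posteriors, computed in log space.

```python
from itertools import combinations
from math import log


def final_segmentations(boundaries, doubtful):
    """Yield each boundary list obtained by keeping or removing every doubtful point."""
    candidates = [b for b in boundaries if b in doubtful]
    for r in range(len(candidates) + 1):
        for removed in combinations(candidates, r):
            yield [b for b in boundaries if b not in removed]


def best_word(boundaries, doubtful, posteriors, lexicon, floor=1e-9):
    """Return (word, log_score) maximizing the product of letter posteriors.

    `posteriors(segment)` is a hypothetical stand-in for the classifier: it maps
    a (start, end) segment to a dict of letter -> posterior probability.
    """
    best = (None, float("-inf"))
    for cuts in final_segmentations(boundaries, doubtful):
        segments = list(zip(cuts, cuts[1:]))
        for word in lexicon:
            if len(word) != len(segments):
                continue  # one letter per segment in this sketch
            score = sum(
                log(max(posteriors(seg).get(ch, 0.0), floor))
                for seg, ch in zip(segments, word)
            )
            if score > best[1]:
                best = (word, score)
    return best


# Toy data (invented): boundary 20 may be extraneous, so segment (10, 30)
# is either one letter or the two letters (10, 20) and (20, 30).
toy_table = {
    (0, 10): {"a": 0.9, "o": 0.1},
    (10, 20): {"n": 0.6, "r": 0.4},
    (20, 30): {"i": 0.5, "l": 0.5},
    (10, 30): {"m": 0.8, "n": 0.2},
}
word, score = best_word(
    boundaries=[0, 10, 20, 30],
    doubtful={20},
    posteriors=lambda seg: toy_table[seg],
    lexicon=["am", "an", "ani", "on"],
)
print(word, round(score, 3))
```

With k doubtful boundaries the sketch enumerates 2^k final segmentations, so it is only practical when few points are flagged as possibly extraneous.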
