AI Lunch, July 21st, 1995

Deterministic Part-Of-Speech Tagging with Finite State Transducers

Yves Schabes

Mitsubishi Electric Research Laboratories

Boston, MA, USA

Joint work with Emmanuel Roche (MERL).

A method for inferring a finite-state transducer is presented and applied to the problem of part-of-speech disambiguation.

Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches. We present a finite-state tagger, inspired by the rule-based tagger, that operates in optimal time, in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the inferred rules found in the tagger as a non-deterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices.
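
As a rough illustration of what following a single path in a deterministic transducer can look like, here is a minimal Python sketch. It is not the tagger described in the talk: the rule (change "vbd" to "vbn" when the next tag is "by"), the tag names, the table layout, and the function name apply_transducer are all assumptions made for the example, and the transducer is written by hand rather than obtained by determinization. The point it shows is that a rule with right-hand context can still be applied in one left-to-right pass if output is delayed by one step.

```python
# Toy deterministic transducer for one hypothetical contextual rule:
#   change tag "vbd" to "vbn" when the next tag is "by".
# State 0: nothing pending. State 1: a "vbd" was read but not yet emitted.

OTHER = "<other>"  # wildcard: any input tag without an explicit transition
CUR = "<cur>"      # placeholder: "emit the input tag that was just read"

# (state, input tag) -> (next state, list of tags to emit)
TRANSITIONS = {
    (0, "vbd"): (1, []),             # hold the tag until the next one is seen
    (0, OTHER): (0, [CUR]),          # copy the input unchanged
    (1, "by"): (0, ["vbn", "by"]),   # rule fires: pending vbd becomes vbn
    (1, "vbd"): (1, ["vbd"]),        # flush the old vbd, hold the new one
    (1, OTHER): (0, ["vbd", CUR]),   # rule does not fire: flush unchanged
}

# Tags emitted when the input ends in a given state.
FINAL_OUTPUT = {0: [], 1: ["vbd"]}


def apply_transducer(tags):
    """Rewrite a tag sequence in one pass, one table lookup per input tag."""
    state = 0
    out = []
    for tag in tags:
        key = (state, tag) if (state, tag) in TRANSITIONS else (state, OTHER)
        state, emitted = TRANSITIONS[key]
        out.extend(tag if t == CUR else t for t in emitted)
    out.extend(FINAL_OUTPUT[state])
    return out


if __name__ == "__main__":
    print(apply_transducer(["np", "vbd", "by", "np"]))
```

Each input tag triggers exactly one transition, so the running time is linear in the sentence length and does not depend on how many rules were compiled into the table, which is the property the abstract refers to as optimal time.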