AI Lunch, July 21st, 1995
Deterministic Part-of-Speech Tagging with Finite-State Transducers
Yves Schabes
Mitsubishi Electric Research Laboratories
Boston, MA, USA
Joint work with Emmanuel Roche (MERL).
A method for inferring a finite-state transducer is presented and applied to
the problem of part-of-speech disambiguation.
Stochastic approaches to natural language processing have often been
preferred to rule-based approaches because of their robustness and their
automatic training capabilities. This was the case for part-of-speech tagging
until Brill showed how state-of-the-art part-of-speech tagging can be achieved
with a rule-based tagger by inferring rules from a training corpus. However,
current implementations of the rule-based tagger run more slowly than previous
approaches. We present a finite-state tagger, inspired by the rule-based
tagger, that operates in optimal time: the time needed to assign tags to a
sentence is the time required to follow a single path in a deterministic
finite-state machine. This result is achieved by encoding the application of
the rules inferred by the rule-based tagger as a non-deterministic finite-state
transducer and then turning it into an equivalent deterministic transducer.
The resulting deterministic transducer yields a part-of-speech tagger whose
speed is dominated by the access time of mass-storage devices.
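To make the speed contrast concrete, here is a minimal Python sketch built on two hypothetical toy rules in the style of Brill's contextual rules: change NN to VB after TO, and change VBN to VBD before DT. `apply_rules_naively` makes one full pass over the sentence per rule, so its cost grows with the number of rules. `apply_transducer` follows a deterministic transducer that was written by hand for these two rules. It is not derived by the automatic construction described above, but it illustrates what that construction produces: exactly one table lookup per input tag, however many rules were compiled in. Because the second rule inspects the following tag, the transducer holds a VBN back (state `P`) and emits it only once the next tag, or the end of the sentence, settles the decision. All names (`RULES`, `DELTA`, `FINAL`, the state labels, and the `"*"`/`"_"` conventions) are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy contextual rules: (from_tag, to_tag, context side, context tag).
RULES = [
    ("NN", "VB", "prev", "TO"),    # e.g. "to/TO run/NN" -> "run/VB"
    ("VBN", "VBD", "next", "DT"),  # e.g. "walked/VBN the/DT" -> "walked/VBD"
]


def apply_rules_naively(tags: Sequence[str]) -> List[str]:
    """Apply each rule in order with a full left-to-right pass over the sentence.

    Work grows with (number of rules) x (sentence length)."""
    out = list(tags)
    for from_tag, to_tag, side, ctx in RULES:
        for i, tag in enumerate(out):
            if tag != from_tag:
                continue
            j = i - 1 if side == "prev" else i + 1
            if 0 <= j < len(out) and out[j] == ctx:
                out[i] = to_tag
    return out


State = str
Output = Tuple[str, ...]

# Hand-built deterministic transducer equivalent to RULES (assumption: written by
# hand, not obtained by determinizing a non-deterministic transducer).
# States: "S" = neutral, "T" = previous tag was TO,
#         "P" = a VBN is held back until the next tag is known.
# Each entry maps an input tag to (next state, tags emitted).
# "*" abbreviates one identical transition for every tag not listed in that state;
# "_" in an output stands for the input tag itself.
DELTA: Dict[State, Dict[str, Tuple[State, Output]]] = {
    "S": {"TO": ("T", ("TO",)), "VBN": ("P", ()), "*": ("S", ("_",))},
    "T": {"NN": ("S", ("VB",)), "TO": ("T", ("TO",)), "VBN": ("P", ()),
          "*": ("S", ("_",))},
    "P": {"DT": ("S", ("VBD", "DT")), "VBN": ("P", ("VBN",)),
          "TO": ("T", ("VBN", "TO")), "*": ("S", ("VBN", "_"))},
}

# Output flushed at the end of the sentence: a pending VBN had no following DT.
FINAL: Dict[State, Output] = {"S": (), "T": (), "P": ("VBN",)}


def apply_transducer(tags: Sequence[str]) -> List[str]:
    """Rewrite tags by following a single deterministic path: one lookup per tag."""
    state: State = "S"
    out: List[str] = []
    for tag in tags:
        table = DELTA[state]
        state, emitted = table.get(tag, table["*"])
        out.extend(tag if sym == "_" else sym for sym in emitted)
    out.extend(FINAL[state])
    return out
```

For example, `apply_transducer(["PRP", "VBN", "DT", "NN"])` returns `['PRP', 'VBD', 'DT', 'NN']`, the same result as `apply_rules_naively` on that input. Adding more rules would enlarge the transducer's state set, but each tag would still cost a single transition.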