Tech Report CS-99-12

A Maximum-Entropy-Inspired Parser

Eugene Charniak

August 1999


We present a new parser that produces Penn tree-bank style parse trees and achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less, when trained and tested on the previously established "standard" sections of the Wall Street Journal tree-bank. This represents a 15% decrease in error rate over the best single-parser results on this corpus. The major technical innovation in this parser is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that let us successfully test and combine many different conditioning events. We also present some partial results showing the effects of different conditioning information, including a surprising 2% improvement due to guessing the lexical head's pre-terminal before guessing the lexical head.
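
As a rough illustration of how an error-rate decrease relates to precision/recall figures, the sketch below treats error as 100 minus the average precision/recall score. The function names are hypothetical, and applying the 15% figure to the length-40 result is an assumption, since the abstract does not say which test set the comparison uses. The baseline it computes is only what the stated numbers would imply, not a reported result.

```python
def relative_error_reduction(baseline_score: float, new_score: float) -> float:
    """Fractional decrease in error, where error = 100 - score (scores in percent)."""
    baseline_error = 100.0 - baseline_score
    new_error = 100.0 - new_score
    return (baseline_error - new_error) / baseline_error


def implied_baseline_score(new_score: float, reduction: float) -> float:
    """Baseline score (percent) implied by a new score and a fractional error reduction."""
    new_error = 100.0 - new_score
    # new_error = baseline_error * (1 - reduction), solved for baseline_error
    baseline_error = new_error / (1.0 - reduction)
    return 100.0 - baseline_error


# Assumption: the 15% decrease is measured against the 90.1% (length <= 40) figure.
baseline = implied_baseline_score(90.1, 0.15)
print(f"implied baseline score: {baseline:.2f}%")
print(f"round-trip reduction:   {relative_error_reduction(baseline, 90.1):.2%}")
```

Under these assumptions the implied baseline is roughly 88.35%, so a gain of under two points in score corresponds to a 15% relative drop in error.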
