Tech Report CS-95-29

A Statistical Syntactic Disambiguation Program and What it Learns

Murat Ersan and Eugene Charniak

October 1995

Abstract:

We describe a program that uses statistical information on word usage to perform syntactic disambiguation, and show that the use of this information significantly improves performance. The bulk of the paper, however, attempts to answer the question: what did the program learn that would account for this improvement? We show that the program has learned many linguistically recognized forms of lexical information, particularly verb case frames and prepositional preferences for nouns and adjectives. We also show that, viewed simply as a learner of lexical information, the program is a success, performing slightly better than hand-crafted learning programs for the same tasks.
