Tech Report CS-07-04

A Generative Discourse-New Model for Text Coherence

Micha Elsner and Eugene Charniak

May 2007


Recent models of document coherence have focused on the referents of noun phrases (NPs), ignoring their syntax. However, syntax depends on discourse function: NPs that introduce new entities are often more complex. We develop a generative model of NP syntax that captures this difference. It can be used to model discourse coherence in the Wall Street Journal; combining it with the local coherence model of Elsner ('07) yields substantial improvements. Our model is also competitive with previous systems on the discourse-new detection task, with performance comparable to Uryupina ('03).

(complete text in pdf)