Thesis Proposal


"Cross-Document Coreference Resolution for Entities and Events"

Christopher Tanner

Wednesday, May 16, 2018 at 4:00 P.M.

Room 368 (CIT 3rd Floor)

Coreference resolution is a fundamental natural language processing (NLP) problem: it attempts to determine which mentions in a text refer to the same underlying discourse object. It also serves as an essential component of many other core NLP tasks, including information extraction, question answering, and document summarization. However, decades of research have primarily focused on resolving entities (people, locations, organizations), with significantly less attention given to events (the actions performed). In addition, coreference systems almost always rely on third-party software to first determine which exact pieces of text (i.e., "mentions") to resolve, so mention detection and coreference resolution have remained disjoint lines of research.

This proposal outlines research that aims to improve coreference resolution by taking a comprehensive approach. First, we develop a novel state-of-the-art event-based system that (1) uses significantly fewer lexical features than most existing systems and (2) overcomes pitfalls of the commonly used clustering approaches. Next, we plan to jointly model both entities and events, uniquely leveraging each model to benefit the other. Last, we propose merging the mention detection and coreference lines of research, with the idea that accurately performing coreference on a given set of candidate mentions could improve mention detection, and vice versa.

Host: Professor Eugene Charniak