Unsupervised learning of name structure from coreference data

Eugene Charniak

We present two methods for learning the structure of person names from unlabeled data. The first simply uses a few implicit constraints governing this structure to gain a toe-hold on the problem (e.g., descriptors come before first names, which come before middle names). The second model also uses possible coreference information. We found that coreference constraints on names improve the performance of the models from 92.6% to 97.0%. We are interested in this problem in its own right, but also as a possible way to improve named entity recognition (by recognizing the structure of different kinds of names) and as a way to improve noun-phrase coreference determination.