

Brown CS

Thesis Defense

"Unsupervised Bayesian Lexicalized Dependency Grammar Induction"

Will Headden

Tuesday, November 24, 2009 at 9:00 A.M.

Lubrano Conference Room (CIT 4th Floor)

This dissertation investigates learning dependency grammars for statistical natural language parsing from corpora without parse tree annotations. Most successful work in unsupervised dependency grammar induction has assumed that the input consists of sequences of parts of speech, ignoring the words themselves and using extremely simple probabilistic models. Supervised parsing, however, has long shown the value of more sophisticated models that use lexical features. These models require probability distributions with complex conditioning information, and such distributions must be smoothed to avoid data sparsity.

In this work we explore several dependency grammars that use smoothing and lexical features. We compare a variety of smoothing regimens and find that smoothing helps even unlexicalized models such as the Dependency Model with Valence.

Furthermore, adding lexical features yields the highest-accuracy dependency induction to date on the Penn Treebank WSJ10 corpus.

Host: Mark Johnson


Page Owner: Webmaster Last Modified: Mon Oct 26 13:32:32 2009