Self-training

Download self-training data.

The DATA/ directory is an alternate parser data directory, trained on WSJ and NANC data using self-training. The WSJ training data is given a relative weight of 5 and combined with approximately 1,750k sentences from NANC (1,765,736 sentences total). On section 23 of the Penn Treebank, it achieves an f-score of 92.1% with the reranking parser. For more details, please see:

Make sure you have a sufficiently recent release of the BLLIP reranking parser from here; older releases cannot handle the larger vocabulary.
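To illustrate what a relative weight of 5 for WSJ means for a count-based model, here is a minimal sketch. It assumes that weighting is applied by scaling every WSJ event count by 5, which is equivalent to including the WSJ data five times. The helper `weighted_counts` and the toy rule strings are hypothetical and are not part of the BLLIP parser.

```python
from collections import Counter
from typing import Iterable


def weighted_counts(corpora: Iterable[tuple[list[str], int]]) -> Counter:
    """Combine event counts from several corpora, scaling each by its weight.

    Hypothetical helper: multiplying a corpus's counts by weight w has the
    same effect as adding that corpus to the training data w times.
    """
    totals: Counter = Counter()
    for events, weight in corpora:
        for event, count in Counter(events).items():
            totals[event] += weight * count
    return totals


# Toy "events" standing in for rule occurrences extracted from parse trees.
wsj_events = ["NP -> DT NN", "NP -> DT NN", "S -> NP VP"]
nanc_events = ["NP -> DT NN", "S -> NP VP", "S -> NP VP"]

# WSJ gets weight 5, self-trained NANC data gets weight 1.
counts = weighted_counts([(wsj_events, 5), (nanc_events, 1)])
print(counts["NP -> DT NN"])  # 5*2 + 1*1 = 11
print(counts["S -> NP VP"])   # 5*1 + 1*2 = 7
```

Upweighting the hand-annotated WSJ data keeps it influential even though the automatically parsed NANC data is far larger.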
