The DATA/ directory is an alternate data directory containing a parsing model trained on WSJ and 266,664 randomly collected biomedical abstracts from PubMed. Using the standard WSJ-trained reranker (included with the BLLIP reranking parser), this model achieves an f-score of 84.3% on the GENIA treebank beta 2 test set. A paper describing this dataset is in preparation.
Make sure you have a sufficiently recent release of the BLLIP reranking parser from here; older releases cannot handle the larger vocabulary.