Edit Detection and Parsing for Transcribed Speech
Eugene Charniak and Mark Johnson
We present a simple architecture for parsing transcribed speech in
which an edited-word detector first removes edited words from the
sentence string, and a standard statistical parser trained on
transcribed speech then parses the remaining words. The edit detector
achieves a misclassification rate on edited words of 2.2%, compared
with an error rate of 5.9% for the NULL model, which marks every word
as not edited. To evaluate our parsing results we introduce a new evaluation
metric, designed to make the evaluation of a parse tree
relatively indifferent to the exact tree position of EDITED nodes. By
this metric the parser achieves 85.3% precision and 86.5% recall.
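
The two-stage architecture and the NULL-model baseline described above can be illustrated with a minimal sketch. It assumes that the detector returns one boolean flag per word and that the parser is any callable over a word list. All function names and the example utterance are hypothetical and chosen only for illustration; they do not reproduce the detector, parser, or data used in the paper.

```python
from typing import Callable, List, Sequence


def remove_edited(words: Sequence[str], edited: Sequence[bool]) -> List[str]:
    """Drop every word whose flag marks it as edited."""
    if len(words) != len(edited):
        raise ValueError("need exactly one edited flag per word")
    return [w for w, is_edited in zip(words, edited) if not is_edited]


def null_model(words: Sequence[str]) -> List[bool]:
    """Baseline that marks every word as not edited."""
    return [False] * len(words)


def misclassification_rate(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Fraction of words whose predicted flag differs from the gold flag."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    return sum(g != p for g, p in zip(gold, predicted)) / len(gold)


def parse_transcript(
    words: Sequence[str],
    detector: Callable[[Sequence[str]], Sequence[bool]],
    parser: Callable[[List[str]], object],
) -> object:
    """Stage 1: detect and remove edited words. Stage 2: parse what remains."""
    return parser(remove_edited(words, detector(words)))


# Invented utterance: the speaker abandons "to Boston" and restarts.
words = "I want a flight to Boston uh to Denver".split()
gold = [False, False, False, False, True, True, False, False, False]

# Stand-ins for a trained detector and parser (assumptions, not real models).
oracle_detector = lambda ws: gold
toy_parser = lambda ws: " ".join(ws)

cleaned = parse_transcript(words, oracle_detector, toy_parser)
baseline_error = misclassification_rate(gold, null_model(words))
```

In this toy case the NULL model is wrong on exactly the two edited words, so its error rate equals the proportion of edited words in the input. By the same reasoning, the 5.9% figure reported for the NULL model corresponds to the share of words marked as edited in the evaluation data.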