Eugene Charniak is interested in programming computers to understand language so that they can perform tasks such as answering questions and holding a conversation. This is far beyond our current capabilities, so research proceeds by dividing the problem into manageable subparts. Prof. Charniak's area of research is called "statistical language learning." He and his students write programs that collect statistical information about language from large amounts of text, then apply those statistics to new examples. For example, much of his recent research has been on statistical models of syntactic parsing: identifying parts of speech and learning the rules of sentence formation, an exercise akin to the sentence diagramming that most of us did in school. Most researchers believe that parsing is a small but important step toward true language understanding.
Prof. Charniak and his students have also been looking at statistics-based programs for determining the referents of pronouns. For example, in the sentence, "After Helen cleaned the piano she played some Brahms," the program would be trained to figure out that "she" refers to "Helen" (and not "the piano"). The programs use statistics that relate the probability of a particular referent to factors such as how far back it is in the text and the typical gender of the antecedent phrase ("Helen" vs. "the piano"). His motivation is primarily theoretical: an effort to learn how language understanding is possible. However, the research also has many applications, including automatic language translation, computer telephone operators, and web search engines that answer questions.
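To make the pronoun example concrete, here is a minimal sketch of how distance and gender statistics might be combined to pick a referent. It is an illustration only, not Prof. Charniak's actual model: the function names, the tables, and every probability in them are hypothetical values invented for this example, where a real system would estimate them from large amounts of text.

```python
# Illustrative only: every probability below is invented for this sketch,
# not estimated from a corpus and not taken from any published model.

# Hypothetical P(gender | phrase): how often each phrase is referred to
# with a feminine, masculine, or neuter pronoun.
GENDER_PROB = {
    "Helen": {"feminine": 0.90, "masculine": 0.05, "neuter": 0.05},
    "the piano": {"feminine": 0.02, "masculine": 0.02, "neuter": 0.96},
}

# Hypothetical P(distance): chance that the referent is the 1st, 2nd, or 3rd
# most recent candidate phrase before the pronoun.
DISTANCE_PROB = {1: 0.6, 2: 0.3, 3: 0.1}

PRONOUN_GENDER = {"she": "feminine", "he": "masculine", "it": "neuter"}


def score_candidates(pronoun, candidates):
    """Score candidates (listed most recent first) and normalize to sum to 1."""
    gender = PRONOUN_GENDER[pronoun]
    raw = {}
    for rank, phrase in enumerate(candidates, start=1):
        # Multiply the two factors, treating them as independent (an assumption).
        distance = DISTANCE_PROB.get(rank, 0.01)
        raw[phrase] = distance * GENDER_PROB[phrase][gender]
    total = sum(raw.values())
    return {phrase: value / total for phrase, value in raw.items()}


def resolve(pronoun, candidates):
    """Return the candidate with the highest score."""
    scores = score_candidates(pronoun, candidates)
    return max(scores, key=scores.get)


# "After Helen cleaned the piano she played some Brahms."
# Candidates are listed most recent first, so "the piano" precedes "Helen".
print(resolve("she", ["the piano", "Helen"]))
```

With these made-up numbers, "the piano" is closer to the pronoun, but its very low probability of being feminine outweighs that advantage, so the sketch selects "Helen".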