"Language Translation as Codebreaking"
Kevin Knight, USC/Information Sciences Institute & Department of Computer Science
Thursday, December 6, 2001 at 4:00 P.M.
CIT, Room 165
It is possible to view natural language translation (e.g., Japanese to English) as a decoding problem. When we look at a Japanese sentence, we imagine it was originally written in English plaintext, but subsequently that plaintext was encoded into Japanese. The exact encoding procedure is still a mystery to science, but we can begin to approximate it by collecting statistics over large documents that have been translated by hand. For some language pairs, these documents now reach into the tens of millions of words.
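The abstract gives no formula for this decoding view. One common way to make it concrete, assumed here purely for illustration, is to pick the English sentence e that maximizes P(e) * P(j | e), where P(e) scores how plausible e is as English plaintext and P(j | e) scores how likely e is to be encoded as the observed Japanese j. The sketch below uses tiny hand-written tables; the sentences, the probabilities, and the names LANGUAGE_MODEL, CHANNEL_MODEL, and decode are all hypothetical. In practice such numbers would be estimated from the hand-translated documents mentioned above.

```python
# Hypothetical toy probabilities, invented for illustration only.
# LANGUAGE_MODEL[e] stands in for P(e): how plausible e is as English plaintext.
LANGUAGE_MODEL = {
    "the dog sleeps": 0.60,
    "dog the sleeps": 0.05,
    "the cat sleeps": 0.35,
}

# CHANNEL_MODEL[(j, e)] stands in for P(j | e): how likely e is to be
# "encoded" as the Japanese string j. Pairs not listed are treated as 0.
CHANNEL_MODEL = {
    ("inu ga nemuru", "the dog sleeps"): 0.50,
    ("inu ga nemuru", "dog the sleeps"): 0.50,
    ("inu ga nemuru", "the cat sleeps"): 0.01,
}


def decode(japanese):
    """Return the English candidate e that maximizes P(e) * P(japanese | e)."""
    def score(english):
        return LANGUAGE_MODEL[english] * CHANNEL_MODEL.get((japanese, english), 0.0)

    return max(LANGUAGE_MODEL, key=score)


if __name__ == "__main__":
    print(decode("inu ga nemuru"))
```

In this toy setup the channel table cannot tell "the dog sleeps" from "dog the sleeps", so the language model's preference for fluent English decides between them.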
This talk will start with the (simpler) problem of translating names and loanwords, which occur frequently in newspaper text. For example, if we see "anjiranaito" in Japanese, we must guess that the correct translation into English is "Angela Knight." (In such cases, the Japanese string really is an encoding of some original English string!) It is possible to get good empirical results on this task, and the computations are tractable.
I will also discuss sentence translation algorithms. Here the computational problems are much more severe, and decoders are only able to examine a fraction of the search space of possible translations. I will present new decoding algorithms and compare their behaviors in terms of time and output quality.
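The abstract does not name the decoding algorithms to be presented. As a familiar stand-in, the sketch below contrasts exhaustive search with beam search on a made-up three-word example, to illustrate the trade-off between time and output quality when only part of the search space is examined. The word options, scores, and the names OPTIONS, BIGRAM_BONUS, exhaustive_decode, and beam_decode are all hypothetical.

```python
# Hypothetical translation options for each source position:
# (English word, log-probability). All numbers are made up.
OPTIONS = [
    [("a", -0.2), ("the", -0.3)],
    [("dog", -0.5), ("hound", -1.5)],
    [("sleeps", -0.4), ("naps", -1.0)],
]

# Hypothetical bonus for a fluent word pair, standing in for a language model.
BIGRAM_BONUS = {("the", "dog"): 0.5}


def extend(hypothesis, option):
    """Append one word to a partial translation and update its score."""
    words, total = hypothesis
    word, logp = option
    bonus = BIGRAM_BONUS.get((words[-1], word), 0.0) if words else 0.0
    return (words + (word,), total + logp + bonus)


def exhaustive_decode(options):
    """Score every combination; return (best hypothesis, hypotheses examined)."""
    hypotheses = [((), 0.0)]
    examined = 0
    for column in options:
        hypotheses = [extend(h, o) for h in hypotheses for o in column]
        examined += len(hypotheses)
    return max(hypotheses, key=lambda h: h[1]), examined


def beam_decode(options, beam_width):
    """Keep only the beam_width best partial translations at each step."""
    hypotheses = [((), 0.0)]
    examined = 0
    for column in options:
        expanded = [extend(h, o) for h in hypotheses for o in column]
        examined += len(expanded)
        expanded.sort(key=lambda h: h[1], reverse=True)
        hypotheses = expanded[:beam_width]
    return hypotheses[0], examined


if __name__ == "__main__":
    print("exhaustive:", exhaustive_decode(OPTIONS))
    for width in (1, 2):
        print("beam width", width, ":", beam_decode(OPTIONS, width))
```

With a beam width of 1 the search commits to "a" too early and never sees the bonus for "the dog", so it examines fewer hypotheses but returns a lower-scoring translation. A width of 2 recovers the exhaustive answer in this toy case while still examining fewer hypotheses.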
Kevin Knight is a Project Leader at the USC Information Sciences Institute (ISI) and a Research Assistant Professor in the Computer Science Department at USC. He received a Ph.D. in computer science from Carnegie Mellon University in 1991 and, before that, a bachelor's degree from Harvard University. Dr. Knight's research interests include natural language analysis and generation, machine translation, statistical modeling, and large-scale applications of artificial intelligence. He serves on the editorial boards of Computational Linguistics and the Journal of Artificial Intelligence Research. In 2000 he received an AAAI Outstanding Paper Award, and in 2001 he was the recipient of an ACL Best Paper Award. Most recently, he was Program Chair for the North American ACL conference held in Pittsburgh in June 2001, and he is now area coordinator for machine translation research in the translingual TIDES program of the Defense Advanced Research Projects Agency (DARPA).
Host: Eugene Charniak