The 33rd IPP Symposium

Translingual Information Processing

Salim Roukos, Manager, Natural Language Technologies Department

IBM T.J. Watson Research Center

Searching unstructured information, which is still largely text but increasingly includes image, audio, and video content, is fast becoming a daily activity for many people. This content is also increasingly multilingual: for example, non-English speakers became the majority of online users in the summer of 2001, and their share continues to grow. To help users find answers to their information needs regardless of the original language of the relevant content, we at IBM Research have a number of projects that handle multilingual content, ranging from machine translation and information extraction to topic detection and tracking. In this talk, we will present an overview of our work on statistical machine translation and information extraction. We will also demonstrate a cross-lingual search engine that searches foreign-language content using English queries.