The Institute of Computational Linguistics investigates the integration of cross-sentence features into Statistical Machine Translation (SMT) systems to improve the quality of their translations. This research is motivated by the fact that state-of-the-art Machine Translation (MT) systems operate at the sentence level and do not propagate information across sentences. The sentences of a document are connected, however, and ignoring this discourse structure leads to incorrect translations.
Focus of the research project
- Discourse entities (NPs, pronouns) and connectives.
- Automatically building domain-specific semantic networks (i.e., ontologies) to improve lexical cohesion in MT.
- Building and annotating corpora for discourse-aware MT.
- Integrating cross-sentence features in SMT.
- Improving lexical choice and consistency across the document in SMT.
- Performing experiments with static and dynamic semantic models to inform lexical choice for expressions referring to discourse entities.
The project is funded as a Sinergia project by the Swiss National Science Foundation and started in August 2013.