MODERN: Modeling Discourse Entities and Relations for Coherent Machine Translation

The Institute of Computational Linguistics investigates the integration of cross-sentence features into Statistical Machine Translation (SMT) systems to improve translation quality. The research is motivated by the fact that state-of-the-art Machine Translation (MT) systems operate at the sentence level and do not propagate information from one sentence to the next. The sentences of a document, however, are connected to one another, and this unawareness of discourse leads to incorrect translations.

The MODERN project is led by the Idiap Research Institute in collaboration with the Utrecht Institute of Linguistics OTS and the University of Geneva, with each partner tackling different discourse-related problems.

Focus of the research project

  • Discourse entities (NPs, pronouns) and connectives.
  • Automatically building domain-specific semantic networks (i.e., ontologies) to improve lexical cohesion in MT.
  • Building and annotating corpora for discourse-aware MT.
  • Integrating cross-sentence features in SMT.
  • Improving lexical choice and consistency across the document in SMT.
  • Performing experiments with static and dynamic semantic models to inform lexical choice for expressions referring to discourse entities.

Project head:


The project is funded as a Sinergia project by the Swiss National Science Foundation and started in August 2013.