MODERN: Modeling Discourse Entities and Relations for Coherent Machine Translation

The Institute of Computational Linguistics investigates the integration of cross-sentence features into Statistical Machine Translation (SMT) systems to improve the quality of their translations. This research is motivated by the fact that state-of-the-art Machine Translation (MT) systems operate at the sentence level and do not propagate information across sentences. The sentences of a document are connected, however, and this unawareness of discourse leads to incorrect translations.

The MODERN project is led by the Idiap Research Institute and carried out in collaboration with the Utrecht Institute of Linguistics OTS and the University of Geneva, with each partner tackling different discourse-related problems.

Focus of the research project

  • Discourse entities (NPs, pronouns) and connectives.
  • Automatically building domain-specific semantic networks (i.e., ontologies) to improve lexical cohesion in MT.
  • Building and annotating corpora for discourse-aware MT.
  • Integrating cross-sentence features in SMT.
  • Improving lexical choice and consistency across the document in SMT.
  • Performing experiments with static and dynamic semantic models to inform lexical choice for expressions referring to discourse entities.

Publications

  1. Mascarell, Laura (2017). Crossing Sentence Boundaries in Machine Translation. University of Zurich, Faculty of Arts.
  2. Rios, Annette; Mascarell, Laura; Sennrich, Rico (2017). Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In: Second Conference on Machine Translation, Copenhagen, Denmark, 7 September 2017 - 8 September 2017.
  3. Mascarell, Laura (2017). Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation. In: Discourse in Machine Translation (DiscoMT), Copenhagen, Denmark, 8 September 2017.
  4. Pu, Xiao; Mascarell, Laura; Popescu-Belis, Andrei (2017). Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues. In: European Chapter of the Association for Computational Linguistics (EACL), Valencia, Spain, 3 April 2017 - 7 April 2017.
  5. Mascarell, Laura; Fishel, Mark; Korchagina, Natalia; Volk, Martin (2014). Enforcing Consistent Translation of German Compound Coreferences. In: Konvens, Hildesheim, Germany, 8 October 2014 - 10 October 2014.

Project head:

Researchers:

The project was funded by the Swiss National Science Foundation as a Sinergia project and ran from August 2013 to September 2017.