The Institute of Computational Linguistics investigates the integration of cross-sentence features into Statistical Machine Translation (SMT) systems to improve translation quality. The research is motivated by the fact that state-of-the-art Machine Translation (MT) systems operate at the sentence level and do not propagate information across sentences. The sentences of a document are connected, however, and ignoring this discourse structure leads to incorrect translations.
Focus of the research project
- Discourse entities (NPs, pronouns) and connectives.
- Automatically building domain-specific semantic networks (i.e., ontologies) to improve lexical cohesion in MT.
- Building and annotating corpora for discourse-aware MT.
- Integrating cross-sentence features in SMT.
- Improving lexical choice and consistency across the document in SMT.
- Performing experiments with static and dynamic semantic models to inform lexical choice for expressions referring to discourse entities.
Publications
- Mascarell, Laura (2017). Crossing Sentence Boundaries in Machine Translation. University of Zurich, Faculty of Arts.
- Rios, Annette; Mascarell, Laura; Sennrich, Rico (2017). Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In: Second Conference on Machine Translation, Copenhagen, Denmark, 7 September 2017 - 8 September 2017.
- Mascarell, Laura (2017). Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation. In: Discourse in Machine Translation (DiscoMT), Copenhagen, Denmark, 8 September 2017.
- Pu, Xiao; Mascarell, Laura; Popescu-Belis, Andrei (2017). Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues. In: European Chapter of the Association for Computational Linguistics (EACL), Valencia, Spain, 3 April 2017 - 7 April 2017.
- Mascarell, Laura; Fishel, Mark; Korchagina, Natalia; Volk, Martin (2014). Enforcing Consistent Translation of German Compound Coreferences. In: Konvens, Hildesheim, Germany, 8 October 2014 - 10 October 2014.
The project was funded as a Sinergia project by the Swiss National Science Foundation and ran from August 2013 to September 2017.