Software and data


Nematus - an attention-based encoder-decoder model for neural machine translation

subword-nmt - scripts for subword segmentation that we use for neural machine translation, including byte-pair encoding (BPE)

ParZu - The Zurich Dependency Parser for German (online demo)

Zmorge - Zurich Morphological Lexicon for German

Bleualign - an MT-based sentence alignment tool

clevertagger - morphologically informed POS-tagging

wmt2014-scripts - scripts and configuration files to (partially) reproduce systems submitted to the WMT 2014/5 shared translation tasks for English-German


Human parity evaluation - human judgements collected to evaluate whether NMT has achieved human parity in document-level evaluation.

ContraPro, a large-scale test set for the evaluation of context-aware pronoun translation in neural machine translation.

WMT 2017 systems - pre-trained neural models and training scripts for the WMT 2017 shared translation task.

ContraWSD, a test set for NMT evaluation of word sense disambiguation.

code docstring corpus, a parallel corpus of Python functions and documentation strings.

LingEval97, a test set of contrastive translation pairs for NMT evaluation.

WMT 2016 systems - pre-trained neural models for the WMT 2016 shared translation task.

WMT 2016 backtranslations - synthetic parallel data (back-translated monolingual data) used at WMT 2016.

WMT 2016 factors - linguistically annotated data sets (for factored neural MT).

WMT 2015 German treebank - dependency parses (produced with ParZu) of the WMT 2015 training data.