Software and data
Software
Nematus - an attention-based encoder-decoder model for neural machine translation
subword-nmt - subword segmentation scripts for neural machine translation, including byte-pair encoding (BPE)
Zmorge - Zurich Morphological Lexicon for German
clevertagger - morphologically informed POS-tagging
Bleualign - an MT-based sentence alignment tool
ParZu - The Zurich Dependency Parser for German (online demo)
Data
x-stance, a multilingual multi-target dataset for stance detection.
ContraPro, a large-scale test set for the evaluation of context-aware pronoun translation in neural machine translation.
WMT 2017 systems, pre-trained neural models and training scripts for the WMT 2017 shared translation task.
ContraWSD, a test set for NMT evaluation of word sense disambiguation.
code docstring corpus, a parallel corpus of Python functions and documentation strings.
LingEval97, a test set of contrastive translation pairs for NMT evaluation.
WMT 2016 systems, pre-trained neural models for the WMT 2016 shared translation task.
WMT 2016 backtranslations, synthetic parallel data (back-translated monolingual data) used at WMT 2016.
WMT 2016 factors, linguistically annotated data sets (for factored neural MT).
WMT 2015 German treebank, dependency parses (with ParZu) of the WMT 2015 training data.