Machine Translation for Romansh Idioms

Machine translation systems for Romansh ↔ German currently focus on the standardized written variety Rumantsch Grischun. In this project, we expand the range of available systems to include the five idioms of Romansh (Sursilvan, Sutsilvan, Surmiran, Puter, Vallader).

Suitable data is essential for enabling this technology. An important element of the project is therefore collecting parallel training data, as well as creating reference translations and translation-quality ratings for all five idioms.

To achieve these goals, we are collaborating closely with Lia Rumantscha and RTR, and are working with institutions such as PHGR to make idiom-specific text data available to the research community.

Group leader

Rico Sennrich

Project manager

Jannis Vamvas

Research Assistants

Angela Heldstab, Dominic Fischer, Zdeněk Šnajdr

Earlier team members

Sina Ahmadi, Zachary Hopton, Anna Rutkiewicz

Publications

Jannis Vamvas, Ignacio Pérez Prat, Not Battesta Soliva, and 14 others. 2025. Expanding the WMT24++ Benchmark with Rumantsch Grischun, Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader. In Proceedings of the Tenth Conference on Machine Translation (WMT 2025), pages 1028–1047, Suzhou, China. Association for Computational Linguistics. [cite] [data] [code]

Zachary Hopton, Jannis Vamvas, Andrin Büchler, Anna Rutkiewicz, Rico Cathomas, and Rico Sennrich. 2025. The Mediomatix Corpus: Parallel Data for Romansh Idioms via Comparable Schoolbooks. Pre-print. [cite] [data] [code]

Apertus Team. 2025. Apertus: Democratizing Open and Compliant LLMs for Global Language Environments. Technical Report. [cite] [model]

The Project in the News

The New Yorker: “Valley of Babel,” by Simon Akam. December 8, 2025 issue. https://www.newyorker.com/magazine/2025/12/08/a-very-big-fight-over-a-very-small-language

Blick Online: “Schweizer KI Apertus im Test: Wer ist Bundesrat Vinterti Monic?”, by Thomas Benkö and Tobias Bolzern. September 2, 2025. https://www.blick.ch/digital/schweizer-ki-apertus-im-test-wer-ist-bundesrat-vinterti-monic-id21193504.html

  • “The language competence [of Apertus] was examined in even greater detail by the University of Zurich: there, researchers used Apertus to translate more than 1,000 texts into Rumantsch Grischun and the five idioms Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader. The material ranged from newspaper articles to science fiction to YouTube tutorials. The result: Apertus performs better than other open models such as Llama or GPT-OSS, but falls far short of the level of human translators.”