Jannis Vamvas

Jannis Vamvas, Dr.

Senior Academic Associate
Text Technology

Room: AND 2.42

My research focuses on deep learning for natural language processing (NLP). I am interested in systems that use data in multiple languages and in how their quality can be evaluated.

Highlights from my research:

PhD thesis on Model-Based Evaluation of Multilinguality: Summary, Published thesis
SwissBERT, the multilingual language model for Switzerland: Explainer, Model weights

Questions that intrigue me:

How can maximum-quality text be generated with language models?
How can large language models be adapted to local data?
Since language models estimate word probabilities – what are creative ways of using those probabilities? Ideas we developed previously include: Contrastive Conditioning, Translation Cross-Likelihood, Omission Error Detection and Translation Direction Detection

Thesis Supervision

The next semester in which I can supervise new Bachelor's or Master's theses is Fall 2026. Please reach out early via email and include a list of 2–3 topic ideas. This will help me understand your research interests and give you the best possible advice.

Short CV

Since January 2024: Academic associate at the Department of Computational Linguistics
- Collaborator InvestigaDiff project
- Project manager Machine Translation for Romansh Idioms
April 2023 – December 2023: Postdoctoral researcher, MUTAMUR project
2019 – March 2023: PhD student at the Department of Computational Linguistics, supervised by Rico Sennrich, Lena A. Jäger and Martin Volk
Summer 2022: Applied Science Internship with Amazon AI Translate, Berlin
2018–2019: Research internship at Munich Re (NLP for Reinsurance Development)
2018–2019: Graduate teaching assistant for Prof. Dr. Hinrich Schütze, CIS Munich
2017–2019: M.Sc. in Computational Linguistics (major) and Computer Science (minor) at LMU Munich
2015–2017: Full-Stack Web Developer at Arteria GmbH, Basel
2011–2015: B.A. in Computer Science and Philosophy from the University of Basel

Publications

Michelle Wastl, Jannis Vamvas and Rico Sennrich. 2026. SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics, San Diego, California, United States. Association for Computational Linguistics. [cite] [data] [code]

Apertus Team. 2026. Apertus: Democratizing Open and Compliant LLMs for Global Language Environments. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics, San Diego, California, United States. Association for Computational Linguistics. [cite] [model]

Charlotte Model, Sina Ahmadi and Jannis Vamvas. 2026. Robust Language Identification for Romansh Varieties. In Proceedings of the 11th edition of the Swiss Text Analytics Conference, Zurich, Switzerland. Association for Computational Linguistics. [cite] [code]

Dominic P. Fischer, Zachary Hopton and Jannis Vamvas. 2026. RUMLEM: A Dictionary-Based Lemmatizer for Romansh. In Proceedings of the 11th edition of the Swiss Text Analytics Conference, Zurich, Switzerland. Association for Computational Linguistics. [cite] [code] [demo]

Jannis Vamvas, Ignacio Pérez Prat, Angela Heldstab, Dominic P. Fischer, Sina Ahmadi and Rico Sennrich. 2026. Translation Asymmetry in LLMs as a Data Augmentation Factor: A Case Study for 6 Romansh Language Varieties. Pre-print. [cite] [data] [model] [code]

Jannis Vamvas, Ignacio Pérez Prat, Not Battesta Soliva, and 14 others. 2025. Expanding the WMT24++ Benchmark with Rumantsch Grischun, Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader. In Proceedings of the Tenth Conference on Machine Translation (WMT 2025), pages 1028–1047, Suzhou, China. Association for Computational Linguistics. [cite] [data] [code]

Hanxu Hu, Jannis Vamvas and Rico Sennrich. 2025. Source-primed Multi-turn Conversation Helps Large Language Models Translate Documents. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23702–23712, Suzhou, China. Association for Computational Linguistics. [cite] [code]

Zachary Hopton, Jannis Vamvas, Andrin Büchler, Anna Rutkiewicz, Rico Cathomas, and Rico Sennrich. 2026. The Mediomatix Corpus: Parallel Data for Romansh Language Varieties via Comparable Schoolbooks. In Findings of the Association for Computational Linguistics: EACL 2026, pages 290–306, Rabat, Morocco. Association for Computational Linguistics. [cite] [data] [code]

Patrick Haller, Jannis Vamvas, Rico Sennrich, and Lena Ann Jäger. 2025. Leveraging In-Context Learning for Political Bias Testing of LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 24718–24738, Vienna, Austria. Association for Computational Linguistics. [cite] [code]

Michelle Wastl, Jannis Vamvas, and Rico Sennrich. 2025. Machine Translation Models are Zero-Shot Detectors of Translation Direction. In Findings of the Association for Computational Linguistics: ACL 2025, pages 1054–1074, Vienna, Austria. Association for Computational Linguistics. [cite] [code] [demo]

Michelle Wastl, Jannis Vamvas, Selena Calleri and Rico Sennrich. 2025. 20min-XD: A Comparable Corpus of Swiss News Articles. In Proceedings of the 10th edition of the Swiss Text Analytics Conference, pages 1–10, Winterthur, Switzerland. Association for Computational Linguistics. [cite] [code] [data] ★ best paper award

Patrick Haller, Jannis Vamvas and Lena A. Jäger. 2024. Yes, no, maybe? Revisiting language models' response stability under paraphrasing for the assessment of political leaning. In First Conference on Language Modeling, Philadelphia. [cite] [code]

Jannis Vamvas and Rico Sennrich. 2024. Linear-time Minimum Bayes Risk Decoding with Reference Aggregation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 790–801, Bangkok, Thailand. Association for Computational Linguistics. [cite] [code]

Juri Grosjean and Jannis Vamvas. 2024. Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents. In Proceedings of the 9th edition of the Swiss Text Analytics Conference, pages 41–49, Chur, Switzerland. Association for Computational Linguistics. [cite] [code] [model] ★ best scientific paper award

Anastassia Shaitarova, Nikolaj Bauer, Jannis Vamvas, and Martin Volk. 2024. Tracing Linguistic Footprints of ChatGPT Across Tasks, Domains and Personas in English and German. In Proceedings of the 9th edition of the Swiss Text Analytics Conference, pages 102–112, Chur, Switzerland. Association for Computational Linguistics. [cite] [code]

Jannis Vamvas, Noëmi Aepli and Rico Sennrich. 2024. Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect. In Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024), pages 16–23, St Julians, Malta. Association for Computational Linguistics. [cite] [code] [model] [blog]

Rico Sennrich, Jannis Vamvas and Alireza Mohammadshahi. 2024. Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 21–33, St. Julian’s, Malta. Association for Computational Linguistics. [cite] [code]

Alireza Mohammadshahi, Jannis Vamvas and Rico Sennrich. 2024. Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models. In Proceedings of the Fifth Workshop on Insights from Negative Results in NLP, pages 169–180, Mexico City, Mexico. Association for Computational Linguistics. [cite] [code]

Jannis Vamvas and Rico Sennrich. 2023. Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13543–13552, Singapore. Association for Computational Linguistics. [cite] [code] [model] [data] [demo]

Jannis Vamvas, Tobias Domhan, Sony Trenous, Rico Sennrich and Eva Hasler. 2023. Trained MT Metrics Learn to Cope with Machine-translated References. In Proceedings of the Eighth Conference on Machine Translation, pages 983–995, Singapore. Association for Computational Linguistics. [cite] [code]

Jannis Vamvas. 2023. Model-based Evaluation of Multilinguality. Ph.D. thesis, University of Zurich. [cite] [blog] ★ EAMT 2024 highly commended thesis

Jannis Vamvas, Johannes Graën and Rico Sennrich. 2023. SwissBERT: The Multilingual Language Model for Switzerland. In Proceedings of the 8th edition of the Swiss Text Analytics Conference, pages 54–69, Neuchatel, Switzerland. Association for Computational Linguistics. [cite] [code] [model] [data] [blog]

Jannis Vamvas and Rico Sennrich. 2022. NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 198–213, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. [cite] [code] [blog]

Jannis Vamvas and Rico Sennrich. 2022. As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 490–500, Dublin, Ireland. Association for Computational Linguistics. [cite] [code] [blog]

Renate Hauser, Jannis Vamvas, Sarah Ebling and Martin Volk. 2022. A Multilingual Simplified Language News Corpus. In Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference, pages 25–30, Marseille, France. European Language Resources Association. [cite] [data]

Jannis Vamvas and Rico Sennrich. 2021. On the Limits of Minimal Pairs in Contrastive Evaluation. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 58–68, Punta Cana, Dominican Republic. Association for Computational Linguistics. [cite] [code] [blog] ★ best paper award

Jannis Vamvas and Rico Sennrich. 2021. Contrastive Conditioning for Assessing Disambiguation in MT: A Case Study of Distilled Bias. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10246–10265, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. [cite] [code] [blog, blog]

Jannis Vamvas and Rico Sennrich. 2020. X-Stance: A Multilingual Multi-Target Dataset for Stance Detection. In Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS), Zurich, Switzerland. [cite] [code] [data] [talk] [blog] ★ best video award

Teaching

Spring 2026	Lecturer CAS Generative AI
Spring 2026	Lecturer Text Generation with Language Models
Spring 2026	Lecturer Mathematical Foundations of Computational Linguistics
Fall 2025	Lecturer CAS Generative AI
Fall 2025	Lecturer Large Language Models
Fall 2025	Lecturer Language Technology and Web Applications
Spring 2025	Lecturer Text Generation with Language Models
Spring 2025	Lecturer Mathematical Foundations of Computational Linguistics
Fall 2024	Lecturer CAS Generative AI
Fall 2024	Lecturer Large Language Models
Fall 2024	Lecturer Language Technology and Web Applications
Spring 2024	Lecturer Text Generation with Language Models
Spring 2024	Co-lecturer Mathematical Foundations of Computational Linguistics
Fall 2023	Organizer Colloquium Computational Linguistics
Fall 2022	Lecturer Language Technology and Web Applications
Fall 2021	Co-instructor Ethical Aspects of NLP
Fall 2021	Lecturer Language Technology and Web Applications
Spring 2021	Co-lecturer Programming Techniques in Computational Linguistics 2
Fall 2020	Lecturer Language Technology and Web Applications
Spring 2020	Co-lecturer Programming Techniques in Computational Linguistics 2
