Resources
Datasets and resources created by members of the group.
SignCLIP: Connecting Text and Sign Language by Contrastive Learning
We present SignCLIP, which re-purposes CLIP (Contrastive Language-Image Pretraining) to project spoken language text and sign language videos, two classes of natural languages of distinct modalities, into the same space. SignCLIP is an efficient method of learning useful visual representations for sign language processing from large-scale, multilingual video-text pairs.
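SignCLIP builds on CLIP's contrastive objective. The sketch below illustrates that kind of objective: a CLIP-style symmetric contrastive (InfoNCE) loss computed over a batch of already-encoded text and video embeddings. It is not SignCLIP's actual code, encoders, or training setup. The function names, the toy two-dimensional embeddings, and the temperature of 0.07 are assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits, index):
    # Numerically stable log-softmax, evaluated at a single position.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[index] - log_sum


def clip_style_loss(text_embs, video_embs, temperature=0.07):
    """Symmetric contrastive loss (illustrative sketch, hypothetical helper).

    text_embs[i] and video_embs[i] are assumed to be a matching text-video pair;
    every other combination in the batch acts as a negative example.
    """
    texts = [l2_normalize(t) for t in text_embs]
    videos = [l2_normalize(v) for v in video_embs]
    n = len(texts)
    # logits[i][j]: temperature-scaled cosine similarity of text i and video j
    logits = [[dot(t, v) / temperature for v in videos] for t in texts]
    # Text-to-video direction: row i should assign the highest score to video i.
    loss_t2v = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Video-to-text direction: column j should assign the highest score to text j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_v2t = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return (loss_t2v + loss_v2t) / 2


# Toy batch: each video embedding points roughly in the same direction as its text.
texts = [[1.0, 0.0], [0.0, 1.0]]
videos = [[0.9, 0.1], [0.1, 0.9]]
aligned = clip_style_loss(texts, videos)
shuffled = clip_style_loss(texts, list(reversed(videos)))
print(aligned < shuffled)
```

Minimizing such a loss pulls matching text and video embeddings together in the shared space and pushes non-matching pairs apart, which is why the correctly paired toy batch yields a lower loss than the shuffled one.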
SwissADT – An Audio Description Translation System for Swiss Languages
We present SwissADT, the first audio description translation system implemented for three main Swiss languages and English. By collecting well-crafted AD data augmented with video clips in German, French, Italian, and English, and leveraging the power of Large Language Models (LLMs), we aim to enhance information accessibility for diverse language populations in Switzerland by automatically translating AD scripts to the desired Swiss language.
We believe that combining human expertise with the generation power of LLMs can further enhance the performance of ADT systems, ultimately benefiting a larger multilingual target population.
Sign Language Processing Demo on Google Colab
This Colab notebook was originally created as an exercise for the course Artificial Intelligence for Language Accessibility (521j538a) and demonstrates some of our work:
https://colab.research.google.com/drive/1VaUIdrLRWiaNGb_4z8kSl6B1ZH4HdNic#scrollTo=yw7P4wTBSA6g
SwissSLi: the Multi-parallel Sign Language Corpus for Switzerland
Zifan Jiang, Anne Göhring, Amit Moryossef, Rico Sennrich, Sarah Ebling
The first sign language corpus containing parallel data for all three Swiss sign languages: Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), and Italian Sign Language of Switzerland (LIS-CH).
The data underlying this corpus originates from television programs in three spoken languages: German, French, and Italian.
Most of the programs have been translated into sign language by deaf translators, resulting in a unique dataset that is up to six-way multi-parallel across spoken and sign languages.
Paper
SwissSLi: The Multi-parallel Sign Language Corpus for Switzerland (2024)
Source code
Available on SWISSUbase.
Workshop on Machine Translation Shared Task on Sign Language Translation (WMT-SLT '23) Data
The second edition of the WMT shared task on sign language translation.
A novel task requiring the processing of visual information (video frames, human pose estimation) beyond the standard text-to-text machine translation paradigm, with training data for Swiss German Sign Language (DSGS) / German (DE), French Sign Language (LSF) / French, and Italian Sign Language (LIS) / Italian.
Source code
Usage examples, documentation, and code available on WMT-SLT.
Data
Training data
Zifan Jiang, Mathias Müller, Sarah Ebling, Amit Moryossef, Robin Ribback (2023)
Available upon request on SWISSUbase.
Further information on https://www.wmt-slt.com/data.
Workshop on Machine Translation Shared Task on Sign Language Translation (WMT-SLT '22) Data
A novel task requiring the processing of visual information (video frames, human pose estimation) beyond the standard text-to-text machine translation paradigm, with training data for Swiss German Sign Language (DSGS) and German (DE).
Source code
Usage examples, documentation, and code available on WMT-SLT and GitHub.
Data
Training data
Mathias Müller, Sarah Ebling, Necati Cihan Camgöz, Zifan Jiang, Alessia Battisti, Amit Moryossef, Annette Rios, Richard Bowden, Ryan Wong
Available upon request on Zenodo.
Videos and subtitles
Mathias Müller, Sarah Ebling, Necati Cihan Camgöz, Zifan Jiang, Alessia Battisti, Katja Tissi, Sandra Sidler-Miserez, Regula Perrollaz, Michèle Berger, Sabine Reinhard, Amit Moryossef, Annette Rios, Richard Bowden, Ryan Wong, Robin Ribback, Severine Schori
Available upon request on Zenodo.
Poses
Mathias Müller, Sarah Ebling, Necati Cihan Camgöz, Zifan Jiang, Alessia Battisti, Katja Tissi, Sandra Sidler-Miserez, Regula Perrollaz, Michèle Berger, Sabine Reinhard, Amit Moryossef, Annette Rios, Richard Bowden, Ryan Wong
Available upon request on Zenodo.
Okra: Mobile App for Conducting Reading Comprehension Experiments
Andreas Säuberli
Mobile (Android/iOS) app enabling participation in cloze tests, lexical decision tasks, multiple-choice reading comprehension, n-back working memory tasks, picture naming, and reaction time tests in English, German, French, and Italian.
Source code
Available on GitHub.
Paper
Sentence Alignments from the Austria Press Agency Corpus
Nicolas Spring, Annette Rios, Sarah Ebling
Sentence alignments, extracted with LHA (Nikolov and Hahnloser, 2019), between original standard German texts and their simplified versions at CEFR levels A2 and B1, taken from Austria Press Agency (Austria Presse Agentur, APA) news items published between August 2018 and April 2021.
Data
Available upon request on Zenodo.
Paper
Exploring German Multi-Level Text Simplification (2021)
Source code
Available on GitHub.
Corpus for Automatic Readability Assessment and Text Simplification of German
Alessia Battisti, Dominik Pfütze, Andreas Säuberli, Marek Kostrzewa, Sarah Ebling
A corpus of parallel and monolingual-only (simplified German) data compiled from web sources, with additional information on text structure, typography, and images.
Paper
A Corpus for Automatic Readability Assessment and Text Simplification of German (2020)
Source code
Available upon request on Zenodo.
SATEF: Sentence Alignment Tools Evaluation Framework
SMILE Swiss German Sign Language Dataset
Sarah Ebling, Necati Cihan Camgöz, Penny Boyes Braem, Katja Tissi, Sandra Sidler-Miserez, Stephanie Stoll, Simon Hadfield, Tobias Haug, Richard Bowden, Sandrine Tornay, Marzieh Razavi, Mathew Magimai Doss
A large-scale dataset of videotaped repeated productions of the 100 items of a vocabulary test, with associated transcriptions and annotations, collected from 11 adult L1 signers and 19 adult L2 learners of DSGS.
Paper
SMILE Swiss German Sign Language Dataset (2018)
Source code
Available upon request on Zenodo.
“20 Minuten” Data for Document-level Text Simplification
Annette Rios, Nicolas Spring, Tannon Kew, Marek Kostrzewa, Andreas Säuberli, Mathias Müller, Sarah Ebling
Dataset of full articles (in German) from the Swiss news magazine '20 Minuten' paired with simplified summaries.
Data
Scripts and instructions for downloading the data available on GitHub.
Paper
20 Minuten: A Multi-task News Summarisation Dataset for German (2023)
A New Dataset and Efficient Baselines for Document-level Text Simplification in German (2021)