Datasets and Resources

Datasets and resources created by members of the group.

Sign Language Processing Demo on Google Colab

This Colab notebook was originally created as an exercise for the course Artificial Intelligence for Language Accessibility (521j538a) and demonstrates some of our work:

https://colab.research.google.com/drive/1VaUIdrLRWiaNGb_4z8kSl6B1ZH4HdNic#scrollTo=yw7P4wTBSA6g

Workshop on Machine Translation Shared Task on Sign Language Translation (WMT-SLT '23) Data

The second edition of the WMT shared task on sign language translation.

https://www.wmt-slt.com

Novel task that requires processing visual information (video frames, human pose estimation), going beyond the standard paradigm of text-to-text machine translation. Training data is provided for Swiss German Sign Language (DSGS) / German (DE), French Sign Language (LSF) / French, and Italian Sign Language (LIS) / Italian.

Source code

Usage examples, documentation, and code available on WMT-SLT.

Data

Training data

Zifan Jiang, Mathias Müller, Sarah Ebling, Amit Moryossef, Robin Ribback (2023)

Available upon request on SwissUbase.

Further information on https://www.wmt-slt.com/data.

Workshop on Machine Translation Shared Task on Sign Language Translation (WMT-SLT '22) Data

https://www.wmt-slt.com/

Novel task that requires processing visual information (video frames, human pose estimation), going beyond the standard paradigm of text-to-text machine translation. Training data is provided for Swiss German Sign Language (DSGS) and German (DE).

Source code

Usage examples, documentation, and code available on WMT-SLT and GitHub.

Data

Training data

Mathias Müller, Sarah Ebling, Necati Cihan Camgöz, Zifan Jiang, Alessia Battisti, Amit Moryossef, Annette Rios, Richard Bowden, Ryan Wong

Available upon request on Zenodo.

Videos and subtitles

Mathias Müller, Sarah Ebling, Necati Cihan Camgöz, Zifan Jiang, Alessia Battisti, Katja Tissi, Sandra Sidler-Miserez, Regula Perrollaz, Michèle Berger, Sabine Reinhard, Amit Moryossef, Annette Rios, Richard Bowden, Ryan Wong, Robin Ribback, Severine Schori

Available upon request on Zenodo.

Poses

Mathias Müller, Sarah Ebling, Necati Cihan Camgöz, Zifan Jiang, Alessia Battisti, Katja Tissi, Sandra Sidler-Miserez, Regula Perrollaz, Michèle Berger, Sabine Reinhard, Amit Moryossef, Annette Rios, Richard Bowden, Ryan Wong

Available upon request on Zenodo.

Okra: Mobile App for Conducting Reading Comprehension Experiments

Andreas Säuberli

Mobile (Android/iOS) app enabling participation in cloze tests, lexical decision tasks, multiple-choice reading comprehension, n-back working memory tasks, picture naming, and reaction time tests in English, German, French, and Italian.

Source code

Available on GitHub.

Paper

Enabling text comprehensibility assessment for people with intellectual disabilities using a mobile application (2023)

Sentence Alignments from the Austria Press Agency Corpus

Nicolas Spring, Annette Rios, Sarah Ebling

Sentence alignments extracted with LHA (Nikolov and Hahnloser, 2019), linking simplified texts at CEFR levels A2 and B1 to the original standard German text of Austria Press Agency (Austria Presse Agentur, APA) news items from between August 2018 and April 2021.

Data

Available upon request on Zenodo.

Paper

Exploring German Multi-Level Text Simplification (2021)

Source code

Available on GitHub.

Corpus for Automatic Readability Assessment and Text Simplification of German

Alessia Battisti, Dominik Pfütze, Andreas Säuberli, Marek Kostrzewa, Sarah Ebling

Corpus of parallel and monolingual-only (simplified German) data compiled from web sources, with additional information on text structure, typography, and images.

Paper

A Corpus for Automatic Readability Assessment and Text Simplification of German (2020)

Source code

Available upon request on Zenodo.

SATEF: Sentence Alignment Tools Evaluation Framework

Marek Kostrzewa

Source code

Available on GitHub.

SMILE Swiss German Sign Language Dataset

Sarah Ebling, Necati Cihan Camgöz, Penny Boyes Braem, Katja Tissi, Sandra Sidler-Miserez, Stephanie Stoll, Simon Hadfield, Tobias Haug, Richard Bowden, Sandrine Tornay, Marzieh Razavi, Mathew Magimai Doss

Large-scale dataset containing videotaped repeated productions of 100 items of a vocabulary test with associated transcriptions and annotations, consisting of data from 11 adult L1 signers and 19 adult L2 learners of DSGS.

Paper

SMILE Swiss German Sign Language Dataset (2018)

Source code

Available upon request on Zenodo.

“20 Minuten” Data for Document-level Text Simplification

Annette Rios, Nicolas Spring, Tannon Kew, Marek Kostrzewa, Andreas Säuberli, Mathias Müller, Sarah Ebling

Dataset of full articles (in German) from the Swiss news magazine “20 Minuten” paired with simplified summaries.

Data

Scripts and instructions for downloading the data available on GitHub.

Paper

20 Minuten: A Multi-task News Summarisation Dataset for German (2023)

A New Dataset and Efficient Baselines for Document-level Text Simplification in German (2021)

Department of Computational Linguistics Language, Technology and Accessibility
