Colloquium FS 2023: Reports from current research at the institute, bachelor and master theses, programming projects, guest lectures
Time & Location: every 2-3 weeks on Tuesdays from 10:15 am to 12:00 pm, BIN-2.A.10 (map)
Online participation is also possible via the MS Teams team "CL Colloquium".
|Tuesday, 07.03.2023||Jannis Vamvas||Challenges in Language-adaptive Pre-training|
|Amit Moryossef||Real-time Multilingual Sign Language Processing|
|Zifan Jiang||Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting|
|Omar Sanseviero||Building ML in an Open and Collaborative Way with the Hugging Face ecosystem|
|Chiara Tschirner||"Lesen im Blick": Developing an Eyetracking-Based Screening for Dyslexia: Current stage of the project and changes in experiment design|
|Chantal Amrhein||Gender-fair Rewriting|
|Janis Goldzycher||Natural Language Inference for Hate Speech Detection|
|Alessia Battisti||Analyzing L1 and L2 Person Identification from Pose Estimates in Swiss German Sign Language|
|Lena Bolliger||Synthesizing scanpaths on texts|
|Noëmi Aepli||Evaluation for dialect generation|
Jannis Vamvas: Challenges in Language-adaptive Pre-training
Switzerland's multilingual language situation calls for multilingual NLP tools. We present an ongoing project that involves training a masked language model in all the national languages of Switzerland. A main challenge of the project is learning multilingual representations from a highly imbalanced training corpus, with many more texts in German or French than in Italian or Romansh. Another challenge is building on existing pre-trained models in a modular way. We adopt a recently proposed approach involving language adapters and show that the resulting model performs well on tasks such as Named Entity Recognition or German–Romansh alignment. We also highlight some limitations of our approach.
Amit Moryossef: Real-time Multilingual Sign Language Processing
Zifan Jiang: Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting
Omar Sanseviero: Building ML in an Open and Collaborative Way with the Hugging Face ecosystem
Bio. Omar Sanseviero is a lead machine learning engineer at Hugging Face, where he works at the intersection of open source, community, and product. Omar leads multiple ML teams that work on topics such as ML for Art, Developer Advocacy Engineering, ML Partnerships, Mobile ML, and ML for Healthcare. Previously, Omar worked at Google on Google Assistant and TensorFlow.
Chiara Tschirner: Developing an Eyetracking-Based Screening for Dyslexia: Current stage of the project and changes in experiment design
Chantal Amrhein: Gender-fair Rewriting
Natural language generation models reproduce and often amplify the biases present in their training data. Previous research explored using sequence-to-sequence rewriting models to transform biased model outputs (or original texts) into more gender-fair language by creating pseudo training data through linguistic rules. However, this approach is not practical for languages with more complex morphology than English. In this presentation, I argue that creating training data in the reverse direction, i.e., starting from gender-fair text, is easier for morphologically complex languages, and I show that this matches the performance of state-of-the-art rewriting models for English. To eliminate the rule-based nature of data creation, I present an approach that uses machine translation models to create gender-biased text from real gender-fair text via round-trip translation. Our approach allowed us to train a rewriting model for German without the need for elaborate handcrafted rules. The outputs of this model were preferred over biased text in a human evaluation campaign.
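The data-creation idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation. The function name `round_trip_pairs`, the choice of English as pivot language, the two toy "translators", and the decision to drop unchanged sentences are all assumptions made for the example. In practice the stand-ins would be replaced by real machine translation systems.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]


def round_trip_pairs(
    fair_sentences: List[str],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Tuple[str, str]]:
    """Build (source, target) training pairs for a rewriting model.

    source: round-trip translation of a gender-fair sentence (possibly biased)
    target: the original gender-fair sentence
    """
    pairs = []
    for fair in fair_sentences:
        round_tripped = from_pivot(to_pivot(fair))
        # Assumption of this sketch: keep only pairs where the round trip
        # changed the text, since identical pairs show no rewriting.
        if round_tripped != fair:
            pairs.append((round_tripped, fair))
    return pairs


# Hypothetical toy stand-ins for real MT systems (illustration only).
def toy_de_to_en(text: str) -> str:
    return text.replace("Die Studierenden", "The students").replace("lernen", "learn")


def toy_en_to_de(text: str) -> str:
    # The gender-neutral English phrase comes back as a masculine generic.
    return text.replace("The students", "Die Studenten").replace("learn", "lernen")


pairs = round_trip_pairs(
    ["Die Studierenden lernen.", "Wir lernen."],
    toy_de_to_en,
    toy_en_to_de,
)
for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```

In this toy run only the first sentence yields a pair, because the second one survives the round trip unchanged. Whether such unchanged sentences should be kept as training examples is a design choice the abstract does not specify.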
Janis Goldzycher: Natural Language Inference for Hate Speech Detection
Alessia Battisti: Analyzing L1 and L2 Person Identification from Pose Estimates in Swiss German Sign Language
After a brief update on the SMILE-II project, I present a study on sign language user identification from pose estimation. Data from the SMILE-II project were also used for this study. In recent years, there has been growing interest in automatic sign language processing (SLP) because of its potential to greatly impact access to information and communication for sign language users. Anonymizing sign language data is not straightforward, as masking, for example, parts of a signer’s face means omitting important information from a sign utterance. For some time, it was anecdotally assumed that the output of skeletal pose estimation techniques applied to sign language yielded anonymized data. This assumption of anonymity has been challenged in recent work. The study I present extends this work by investigating the anonymity of skeletal representations.
Lena Bolliger: Synthesizing scanpaths on texts
Noëmi Aepli: Evaluation for dialect generation