Institut für Computerlinguistik (Department of Computational Linguistics)

Text Technology/Digital Linguistics colloquium FS 2023

Colloquium FS 2023: reports from current research at the institute, bachelor's and master's theses, programming projects, and guest lectures

Time & Location: every 2-3 weeks on Tuesdays from 10:15 am to 12:00 pm, room BIN-2.A.10

Online participation is also possible via the MS Teams team "CL Colloquium".

Colloquium Schedule

Date | Speaker | Topic
Tuesday, 07.03.2023 | Jannis Vamvas | Challenges in Language-adaptive Pre-training
Tuesday, 21.03.2023 | Amit Moryossef | Real-time Multilingual Sign Language Processing
Tuesday, 21.03.2023 | Zifan Jiang | Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting
Tuesday, 04.04.2023 | Omar Sanseviero | Building ML in an Open and Collaborative Way with the Hugging Face ecosystem
Tuesday, 02.05.2023 | Chiara Tschirner | "Lesen im Blick": Developing an Eyetracking-Based Screening for Dyslexia: Current stage of the project and changes in experiment design
Tuesday, 02.05.2023 | Chantal Amrhein | Gender-fair Rewriting
Tuesday, 16.05.2023 | Janis Goldzycher | Natural Language Inference for Hate Speech Detection
Tuesday, 16.05.2023 | Alessia Battisti | Analyzing L1 and L2 Person Identification from Pose Estimates in Swiss German Sign Language
Tuesday, 30.05.2023 | Lena Bolliger | Synthesizing scanpaths on texts
Tuesday, 30.05.2023 | Noëmi Aepli | Evaluation for dialect generation

Abstracts

Jannis Vamvas: Challenges in Language-adaptive Pre-training

The multilingual language situation of Switzerland calls for multilingual NLP tools. We present an ongoing project that involves training a masked language model in all the national languages of Switzerland. A main challenge of the project is learning multilingual representations from a highly imbalanced training corpus, which contains many more texts in German or French than in Italian or Romansh. Another challenge is building on existing pre-trained models in a modular way. We adopt a recently proposed approach involving language adapters and show that the resulting model performs well on tasks such as named entity recognition and German–Romansh alignment. We also highlight some limitations of our approach.

Amit Moryossef: Real-time Multilingual Sign Language Processing

Zifan Jiang: Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting

Omar Sanseviero: Building ML in an Open and Collaborative Way with the Hugging Face ecosystem

Bio. Omar Sanseviero is a lead machine learning engineer at Hugging Face, where he works at the intersection of open source, community, and product. Omar leads multiple ML teams that work on topics such as ML for Art, Developer Advocacy Engineering, ML Partnerships, Mobile ML, and ML for Healthcare. Previously, Omar worked at Google on Google Assistant and TensorFlow.

Chiara Tschirner: Developing an Eyetracking-Based Screening for Dyslexia: Current stage of the project and changes in experiment design

Chantal Amrhein: Gender-fair Rewriting

Natural language generation models reproduce and often amplify the biases present in their training data. Previous research explored using sequence-to-sequence rewriting models to transform biased model outputs (or original texts) into more gender-fair language by creating pseudo training data through linguistic rules. However, this approach is not practical for languages with more complex morphology than English. In this presentation, I argue that creating training data in the reverse direction, i.e., starting from gender-fair text, is easier for morphologically complex languages, and I show that this approach matches the performance of state-of-the-art rewriting models for English. To eliminate the rule-based nature of data creation, I present an approach that uses machine translation models to create gender-biased text from real gender-fair text via round-trip translation. This approach allowed us to train a rewriting model for German without the need for elaborate handcrafted rules. In a human evaluation campaign, the outputs of this model were preferred over the biased text.

Janis Goldzycher: Natural Language Inference for Hate Speech Detection

Alessia Battisti: Analyzing L1 and L2 Person Identification from Pose Estimates in Swiss German Sign Language

After a brief update on the SMILE-II project, I present a study on identifying sign language users from pose estimates. The study also used data from the SMILE-II project. In recent years, there has been growing interest in automatic sign language processing (SLP) because of its potential to greatly improve access to information and communication for sign language users. Anonymizing sign language data is not straightforward: masking parts of a signer's face, for example, means omitting important information from a sign utterance. For some time, it was anecdotally assumed that applying skeletal pose estimation techniques to sign language yielded anonymized data. This assumption of anonymity has been challenged in recent work. The study I present extends this work by further investigating the anonymity of skeletal representations.

Lena Bolliger: Synthesizing scanpaths on texts

Noëmi Aepli: Evaluation for dialect generation