Time & Location: every 2 weeks on Tuesdays from 10:15 am to 12:00 pm in room AND 3.46.
Please note that the room has changed from previous semesters.
Online participation is also possible via the MS Teams team "CL Colloquium".
The creation of suitable reading comprehension materials for reading skill assessment tests in various frameworks (such as SAT, PISA, TOEFL or our own group's MultiplEYE project) is both time- and resource-intensive. It often involves multiple iterations of question creation, quality control and field testing to reach a satisfactory result. In this talk, I will first explore some ways in which LLMs could be used to create and evaluate reading comprehension questions for reading skill assessment tests. I will then outline what a model specialized for this task could look like, and how it could be integrated into the question creation process in order to speed it up and reduce its resource intensity.
The MultiplEYE COST action officially started about a year ago. Its goal is to enable the collection of multilingual eye-tracking-while-reading data that can serve as a basis for studying human language processing in psycholinguistic research, or for evaluating and improving machine language processing in machine learning research. I will present intermediate results and challenges of the action. These range from the creation of a parallel stimulus corpus in more than 20 languages, including carefully created comprehension questions, to the development of software tools for the experimental presentation and for the preprocessing pipeline of the eye-tracking data. One of the big challenges is the coordination between languages and labs, mostly across Europe but also internationally.
It is important to investigate the behavior of machine translation (MT) metrics when facing different error types, particularly accuracy errors, as those can lead to dangerous outcomes, e.g., in legal or medical contexts. Last year, we developed ACES, a Translation Accuracy Challenge Set with 68 phenomena, ranging from simple word/character-level perturbations to more complex errors. In this talk, I will outline how the scores assigned by a wide range of MT metrics when evaluated on ACES can be standardized onto a common scale. By doing so, we can compare the scores given to both correct and incorrect translations, revealing the metrics' sensitivity to various types of accuracy errors. As part of our research into the behavior of MT metrics across various phenomena, I will also discuss our work on annotating error spans in ACES. These spans can be used to develop more interpretable MT metrics that predict error spans rather than a single sentence-level score.
Preclinical neuroscientists currently lack a comprehensive resource that can assist them in making well-informed decisions when planning animal experiments. The drug development process is hindered because a considerable amount of potentially relevant evidence is scattered throughout the literature without systematic curation. The goal of my PhD project is to provide scientists with centralized access to this information, enabling them to optimize their research planning and ultimately reduce the number of experimental animals needed. In this talk, I will outline the methods that we are planning to use to achieve this objective. Furthermore, I will present ongoing work on Named Entity Recognition of drug and disease names in clinical trial registries.
LLMs have been shown to be extremely useful generic assistants and can be applied directly to an astonishing array of tasks. But in an enterprise setting, one of the most important and useful tasks is to answer questions grounded in company-specific knowledge. This requires LLMs to be customised or updated with external knowledge, while still retaining the useful general-purpose abilities that we've come to appreciate from applications like ChatGPT. Furthermore, existing open-weight LLMs are predominantly trained on English, and their multilingual abilities are largely under-explored, or at least undocumented. In this talk, I will present results from a recent internship project, in which we investigated custom "chat" LLMs that could potentially serve German-speaking companies in a Swiss setting. Specifically, I will discuss ways of integrating company-specific knowledge and how we can adapt publicly available, English-centric LLMs for a German-language use case.
Mouse Tracking for Reading (MoTR) is a new naturalistic incremental processing measurement tool that simulates eye-tracking. MoTR runs in the browser, enabling cheaper data collection as well as collection in places where no eye-tracking equipment is available. In a MoTR trial, participants are presented with text that is blurred, except for a small in-focus region around the tip of the mouse. Participants move the mouse over the text, bringing individual words into focus in order to read. Mouse movement is recorded and can be analyzed similarly to eye-tracking data. We implement MoTR experiments in Magpie and validate the method in two suites of experiments. First, we record MoTR data for the Provo Corpus, for which eye-tracking data exists. We find strong correlations between eye-tracking and MoTR reading times (RTs), ranging from 0.67 to 0.78. In an analysis similar to Smith and Levy (2013), we find a linear effect of by-word surprisal (estimated from GPT-2) on MoTR RTs. Second, we conduct a cross-methodological replication of three experiments from Witzel et al. (2012) and Boyce et al. (2020) that test preference for high vs. low attachment. MoTR RTs replicate previous self-paced reading results and novelly reveal how regressions are implicated in the processing of these phenomena.
The purpose of this talk is to present the SNF project ProPoSaL (Prototypes and Parts-of-Speech across Languages), which I have been part of since joining this department. In brief, the goal of the project is to investigate the existence of adjectives as a prototypical category across languages from the perspectives of NLP, neurolinguistics and theoretical linguistics. As far as the NLP portion of the project is concerned, the broad idea is to analyze word embeddings in order to verify whether the (non-)existence of adjectives in a certain language, as hypothesized by theoretical linguistics, manifests itself in a given model. I will first elaborate on the project before proceeding to illustrate the part pertaining to NLP and the relevant methods, which for my role in the project include topological data analysis (TDA). Specifically, I will give a hands-on overview of the main concepts and tools of TDA that I am using in the project.