Colloquium Schedule, Spring Semester 2020

Colloquium, Spring Semester 2020: reports on current research at the institute, Bachelor's and Master's theses, programming projects, guest talks

Time & place: every two weeks on Tuesdays from 10:15 to 12:00, BIN-2.A.10 (map)

Organizer: Sarah Ebling


Speakers & Topic



***CANCELLED*** Jan Niehues, University of Maastricht: "(Simultaneous) Speech Translation: Challenges and Techniques"



Exchange meeting between "Phonetics and Speech Sciences" and Computational Linguistics:

  • Elisa Pellegrino, Thayabaran Kathiresan, Sarah Ebling: "The influence of speech therapy on the vocal accuracy of the hearing impaired: human and machine processing"
  • Marie-Anne Morand, Noëmi Aepli, Nathalie Giroud: "Swiss German & Multimodal Dialect Recognition"



François Portet, Grenoble Institute of Technology: "Towards End-to-End Spoken Language Understanding: Application to Voice Command in Smart Homes"


  • Johannes Graën, University of Gothenburg & Pompeu Fabra University (Barcelona): "Using Corpora for Language Learning"
  • Markus Göckeritz: "Eager Machine Translation"


Mathias Müller: "Distill Noisy Parallel Corpora instead of Filtering them"

Learning translation systems requires parallel data sets. Data crawled from the web is usually big enough, but neural machine translation is susceptible to various types of noise found in web-crawled parallel corpora. Therefore, corpus filtering has become a standard procedure in machine translation. The two most popular approaches to corpus filtering currently are Dual Conditional Cross-entropy Filtering and Neighbor Search with multilingual sentence embeddings. I propose to use neither of them, and instead distill the source side of the noisy corpus with a system trained on clean, trustworthy data.
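The distillation idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: `distill_corpus` and `toy_teacher` are hypothetical names, the dictionary-based teacher merely stands in for a translation system trained on clean data, and dropping pairs with an empty source is an added assumption.

```python
from typing import Callable, Iterable, List, Tuple


def distill_corpus(
    noisy_pairs: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Keep each source sentence and replace its noisy target
    with the teacher's translation of that source."""
    distilled = []
    for source, _noisy_target in noisy_pairs:
        source = source.strip()
        if not source:
            # Assumption: pairs without any source text are dropped.
            continue
        distilled.append((source, translate(source)))
    return distilled


def toy_teacher(sentence: str) -> str:
    """Hypothetical stand-in for a system trained on clean, trusted data."""
    lexicon = {"guten": "good", "morgen": "morning", "danke": "thanks"}
    return " ".join(lexicon.get(tok.lower(), tok) for tok in sentence.split())


# Invented toy data: the targets are typical web-crawl noise.
noisy = [
    ("Guten Morgen", "Click here to subscribe"),
    ("Danke", "thanks a lot!!! <br>"),
    ("   ", "orphan target"),
]
distilled = distill_corpus(noisy, toy_teacher)
```

In this sketch the noisy target side is never inspected, which is the contrast with filtering: no pair is scored or discarded on the basis of its target.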


Jan Niehues, University of Maastricht: "(Simultaneous) Speech Translation: Challenges and Techniques"

In today’s globalized world, we have the opportunity to communicate with people all over the world. However, the language barrier often still poses a challenge and prevents communication. Machines that automatically translate speech from one language into another are a long-standing dream of humankind.
In this presentation, we will start with an overview of the different use cases and difficulties of speech translation. We will continue with a review of state-of-the-art methods for building speech translation systems. We will first review the cascaded approach to spoken language translation, which chains an automatic speech recognition system and a machine translation system, and highlight the challenges that arise when combining the two. Secondly, we will discuss end-to-end speech translation, which has attracted rising research interest with the success of neural models in both areas. In the final part of the lecture, we will highlight several challenges of simultaneous speech translation (latency, sentence segmentation and stream decoding) and present techniques that address them.


Chantal Amrhein: "Why There Is More to Explore in Multilingual Machine Translation"

Multilingual machine translation has become ever more popular over the last few years. A single model can be used to translate between multiple language pairs, which facilitates deploying and maintaining machine translation systems. Recent advances in NMT allow training massively multilingual models that support more than 100 languages. While this is desirable in many respects, multilingual machine translation comes with its own challenges. In this talk, I will present my current research on two open questions: How can a pretrained multilingual NMT model be extended to new languages that use a different script? And how can target language generation in zero-shot translation be improved, i.e., how can we ensure that the model produces the correct target language when translating between language pairs that were not seen during training?


Jannis Vamvas: "Towards Zero-Shot Transfer to Languages with Minimal Monolingual Data"

Cross-lingual transfer has been greatly advanced by multilingual language model pre-training. But there are many languages for which large-scale pre-training has never been done because too little data is available, even monolingual data. We are currently studying the transfer of an English parser to Amharic, Bambara, Maltese and Wolof to test the limits of multilingual pre-training. In our talk, we discuss preliminary findings, which indicate that a subword vocabulary shared across all languages is still the best approach to such a challenge. Furthermore, a multilingually pre-trained model seems to provide a good basis for transfer even if it has never seen the target language in pre-training.


Elisa Pellegrino, Thayabaran Kathiresan, Sarah Ebling: "The influence of speech therapy on the vocal accuracy of the hearing impaired: human and machine processing"

Congenital deafness and hearing loss drastically affect people’s social interactions and quality of life. For deaf or hard-of-hearing young children, the effect is even more dramatic, as the normal development of language skills and the overall pattern of speech production are typically disrupted. Logopaedic interventions usually focus on the articulation of individual sounds, words or short utterances to improve the vocal accuracy and speech intelligibility of deaf and hearing-impaired people.
The aim of this collaborative project between Speech and Text Scientists at Zurich University and the Zentrum für Gehör und Sprache Zürich is to test the effect of speech therapy interventions on the acoustics of Swiss German vowels (F1-F2) produced by hearing-impaired children or teenagers, compared to gender- and age-matched normal-hearing peers. Vowel perception experiments with human listeners (e.g. two/four-alternative forced choice, AX/AXB discrimination tests) and with computers (e.g. vowel classification), as well as speech-to-text conversion experiments on vowel stimuli recorded before the start, during and at the end of the logopaedic sessions, will provide information about the participants' progress towards vocal accuracy.


Marie-Anne Morand, Noëmi Aepli, Nathalie Giroud: "Swiss German & Multimodal Dialect Recognition"

Joining forces across phonetics, neuroscience and text processing at our institute, we are interested in making use of multimodal data on Swiss German dialects. We present initial ideas and two data sets: 1) salient attributes in text vs. speech vs. brain signals, and 2) an analysis of text and speech data of a small dialect.


Lukas Meier: "The environmental discourse from a computational linguistic point of view"

This Master's thesis examines the environmental discourse in the German Bundestag. The aim was to investigate this discourse, in particular its development over time, using three computational linguistic methods: keyword analysis, collocation analysis, and topic modelling. To this end, the plenary minutes from the end of 1998 to July 2019 were analysed. The speeches recorded in the plenary minutes were compiled into a corpus and split into a thematic primary corpus and a reference corpus using a seed-word-based approach. The transcripts were then analysed: the discussion during the study period focused on energy and climate issues, and temporally limited sub-discourses could also be identified. Finally, topic modelling made it possible to locate the high and low points of some discourses more precisely in time.
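As a rough illustration of the seed-word split and the keyword step described above, the following sketch separates speeches into a primary and a reference corpus and ranks primary-corpus tokens by a smoothed log ratio of relative frequencies. The function names, the whitespace tokenisation, the choice of log ratio as keyness measure, and the toy speeches are all assumptions made for this example; the thesis may well have used different measures and preprocessing.

```python
import math
from collections import Counter


def split_by_seed_words(speeches, seed_words):
    """Speeches containing at least one seed word go to the primary
    corpus; all others go to the reference corpus."""
    seeds = {w.lower() for w in seed_words}
    primary, reference = [], []
    for speech in speeches:
        tokens = speech.lower().split()  # assumption: whitespace tokens
        (primary if seeds.intersection(tokens) else reference).append(tokens)
    return primary, reference


def log_ratio_keywords(primary, reference, smoothing=0.5):
    """Rank primary-corpus tokens by log2 of their smoothed relative
    frequency in the primary corpus versus the reference corpus."""
    p_counts = Counter(t for doc in primary for t in doc)
    r_counts = Counter(t for doc in reference for t in doc)
    p_total = sum(p_counts.values())
    r_total = sum(r_counts.values())
    scores = {}
    for tok, count in p_counts.items():
        p_rel = (count + smoothing) / (p_total + smoothing)
        r_rel = (r_counts[tok] + smoothing) / (r_total + smoothing)
        scores[tok] = math.log2(p_rel / r_rel)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy speeches, not taken from the plenary minutes.
speeches = [
    "klimaschutz braucht erneuerbare energie",
    "der haushalt braucht mehr geld",
    "energie und klimaschutz kosten geld",
]
primary, reference = split_by_seed_words(speeches, ["klimaschutz"])
keywords = log_ratio_keywords(primary, reference)
```

Tokens with a positive score are relatively more frequent in the primary corpus than in the reference corpus; tokens that are common in both end up near or below zero.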


François Portet: "Towards End-to-End Spoken Language Understanding: Application to Voice Command in Smart Homes"

In this talk, I will give an overview of an approach to voice command in smart homes, focusing on the Spoken Language Understanding (SLU) part. I will start by introducing the domain of Spoken Language Understanding, with a focus on the latest developments brought by the Deep Learning approach and its development for the French language. I will then introduce the research carried out in the nationally funded Vocadom project (http://vocadom.imag.fr), whose aim is to build a distant-voice-controlled home automation system that adapts to the user and can be used in real conditions (noise, presence of several people). In particular, I will detail how the voice command has been designed to fit the needs of older adults and people with sight impairment. The talk will then present some recent SLU experiments for voice command, comparing a state-of-the-art pipeline approach with a deep-network-based End-to-End model, and will provide some analyses of the End-to-End model's behavior. Finally, I will discuss the challenges that are still to be addressed in this domain.


François Portet (Ph.D’05) is Maître de Conférences (Associate Professor) at UGA. Previously, he was a postdoctoral researcher at Aberdeen University, Scotland (2006-2008) and a visiting researcher at the Mobile Computing Laboratory, Osaka University, Japan (2019). His research interests lie at the crossroads of NLP, Speech Processing and Ambient Intelligence. He has published about 120 peer-reviewed papers in these areas and is currently the coordinator of the nationally funded Vocadom project (http://vocadom.imag.fr).


Markus Göckeritz: "Eager Machine Translation"

Eager machine translation is a recent approach to simultaneous machine translation introduced by [Press and Smith, 2018]. To improve translation quality, [Press and Smith, 2018] use beam search. With beam search, however, the final translation can only be returned once the entire source sentence has been processed. This effectively makes eager translation a non-simultaneous process.
We apply sequence-level knowledge distillation to the eager translation model and show that this eliminates the need for beam search in eager translation. Furthermore, we report a significant improvement in both translation quality and translation speed. Our model outperforms the original eager translation model by more than 2 BLEU and translates at almost twice the speed.
We confirm that the distilled data set is more deterministic, more parallel, and more monotonic than the original training data, but show that the increase in determinism, parallelism, and monotonicity in the training data does not fully explain the superior translation quality.
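One way to make the notion of monotonicity concrete is to look at word alignments and count how often the aligned source position moves forward as the target sentence advances. The sketch below is only an illustration under that assumption: the `monotonicity` function and the toy alignments are hypothetical, and the abstract does not state which measure was actually used.

```python
from typing import List, Tuple


def monotonicity(alignment: List[Tuple[int, int]]) -> float:
    """Share of consecutive alignment links (ordered by target position)
    whose source position does not move backwards.
    Each link is a (source_index, target_index) pair."""
    ordered = sorted(alignment, key=lambda link: (link[1], link[0]))
    source_positions = [src for src, _tgt in ordered]
    if len(source_positions) < 2:
        return 1.0  # a single link (or none) is trivially monotonic
    in_order = sum(
        1 for a, b in zip(source_positions, source_positions[1:]) if a <= b
    )
    return in_order / (len(source_positions) - 1)


# Invented toy alignments: one fully monotonic, one with reordering.
monotone = [(0, 0), (1, 1), (2, 2), (3, 3)]
reordered = [(0, 0), (3, 1), (1, 2), (2, 3)]

score_monotone = monotonicity(monotone)
score_reordered = monotonicity(reordered)
```

A score of 1.0 means the target can be produced left to right without ever waiting for a later source word, which is the property an eager, simultaneous model benefits from.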