Colloquium Schedule, Spring Semester 2020

Colloquium Spring Semester 2020: reports on current research at the institute, Bachelor's and Master's theses, programming projects, guest lectures

Time & place: every two weeks on Tuesdays, 10:15 a.m. to 12:00 noon, BIN-2.A.10 (map)

Organizer: Sarah Ebling


Speakers & Topics



***CANCELLED*** Jan Niehues, University of Maastricht: "(Simultaneous) Speech Translation: Challenges and Techniques"



Exchange meeting between "Phonetics and Speech Sciences" and Computational Linguistics:

  • Elisa Pellegrino, Thayabaran Kathiresan, Sarah Ebling: "The influence of speech therapy on the vocal accuracy of the hearing impaired: human and machine processing"
  • Marie-Anne Morand, Noëmi Aepli, Nathalie Giroud: "Swiss German & Multimodal Dialect Recognition"


  • Phillip Ströbel: "A Transfer Learning Approach for Clustering Articles in Historical Periodicals"
  • Lukas Meier


François Portet, Grenoble Institute of Technology: "Towards End-to-End Spoken Language Understanding: Application to Voice Command in Smart Homes"


  • Johannes Graën, University of Gothenburg & Pompeu Fabra University (Barcelona): "Using Corpora for Language Learning"
  • Markus Göckeritz: "Eager Machine Translation"


Mathias Müller: "Distill Noisy Parallel Corpora instead of Filtering them"

Training translation systems requires parallel data. Corpora crawled from the web are usually large enough, but neural machine translation is susceptible to the various types of noise found in web-crawled parallel corpora. Corpus filtering has therefore become a standard procedure in machine translation. The two most popular filtering approaches at present are dual conditional cross-entropy filtering and nearest-neighbor search with multilingual sentence embeddings. I propose to use neither of them and instead to distill the source side of the noisy corpus with a system trained on clean, trustworthy data.
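The proposed distillation step can be pictured as a simple corpus transformation (a hedged sketch only; `translate_clean` stands in for a model trained on trustworthy data and is not part of the talk):

```python
# Sketch: "distilling" a noisy parallel corpus by re-translating the
# source side with a trusted model, instead of filtering pairs out.

def distill_corpus(noisy_pairs, translate_clean):
    """Keep every source sentence, but replace its noisy target with a
    translation produced by the trustworthy model."""
    return [(src, translate_clean(src)) for src, _ in noisy_pairs]

# Toy stand-in for a system trained on clean data:
toy_model = {"hallo welt": "hello world", "guten tag": "good day"}

noisy = [("hallo welt", "helo wrld!!"), ("guten tag", "click here now")]
distilled = distill_corpus(noisy, lambda s: toy_model[s])
print(distilled)
```

Unlike filtering, no sentence pair is discarded; only the target side changes.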


Jan Niehues, University of Maastricht: "(Simultaneous) Speech Translation: Challenges and Techniques"

In today’s globalized world, we have the opportunity to communicate with people all over the world. However, the language barrier often still poses a challenge and prevents communication. Machines that automatically translate speech from one language into another are a long-standing dream of humankind.
In this presentation, we will start with an overview of the different use cases and difficulties of speech translation. We will then review state-of-the-art methods for building speech translation systems. We will begin with the traditional approach to spoken language translation, a cascade of an automatic speech recognition system and a machine translation system, and highlight the challenges of combining the two. Secondly, we will discuss end-to-end speech translation, which has attracted rising research interest with the success of neural models in both areas. In the final part of the lecture, we will highlight several challenges of simultaneous speech translation (latency, sentence segmentation, and stream decoding) and present techniques that address them.


Chantal Amrhein: "Why There Is More to Explore in Multilingual Machine Translation"

Multilingual machine translation has become ever more popular over the last few years. A single model can translate between multiple language pairs, which facilitates deploying and maintaining machine translation systems. Recent advances in NMT allow training massively multilingual models that support more than 100 languages. While this is desirable in many respects, multilingual machine translation comes with its own challenges. In this talk, I will present my current research on two open questions: How can a pretrained multilingual NMT model be extended to new languages that use a different script? And how can target-language generation in zero-shot translation be improved, i.e., how can the model be made to produce the correct target language when translating between language pairs that were not seen during training?


Jannis Vamvas: "Towards Zero-Shot Transfer to Languages with Minimal Monolingual Data"

Cross-lingual transfer has been greatly advanced by multilingual language model pre-training. But there are many languages for which large-scale pre-training has never been done because too little data are available – even monolingual data. We are currently studying the transfer of an English parser into Amharic, Bambara, Maltese and Wolof to test the limits of multilingual pre-training. In our talk we discuss preliminary findings which indicate that a subword vocabulary shared across all languages is still the best approach to such a challenge. Furthermore, a multilingually pre-trained model seems to provide a good basis for transfer even if it has never seen the target language in pre-training.
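The idea of a subword vocabulary shared across all languages can be illustrated with a single BPE-style merge step, where pair statistics are pooled over every language rather than computed per language (an illustrative sketch, not the actual pre-training pipeline, which would use a full subword toolkit):

```python
from collections import Counter

def most_frequent_pair(corpora):
    """One BPE-style merge step: symbol-pair counts are pooled across
    ALL languages, so frequent subwords end up shared in one vocabulary.
    (Illustrative only; real pipelines use e.g. SentencePiece.)"""
    pairs = Counter()
    for corpus in corpora:          # one word list per language
        for word in corpus:
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

# Two tiny toy "languages" that share the character pair ('a', 'a'):
print(most_frequent_pair([["aab", "aac"], ["aad"]]))
```

Because the counts are pooled, a pair frequent in several low-resource languages can still win a merge, which is one intuition behind sharing the vocabulary.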


Sandra Schwab, Gerold Schneider: "Learner language"

We aim to develop an assessment tool that evaluates learner language in terms of fluency, pronunciation, lexicon, grammar, and vocabulary size. For this, we use an oral corpus created at the ELCF (University of Geneva). The corpus comprises picture descriptions in French produced by 11 Russian learners of French (B1-B2) and by 11 native speakers of French. All productions were first transcribed manually. We then annotated and coded, for the non-native productions only, errors of pronunciation, grammar, and lexicon, and automatically calculated speech rate. The next step is to automatically estimate vocabulary size from the transcripts of the productions. Finally, we will correlate the learners' different error types (pronunciation, lexicon, grammar) with vocabulary size and fluency to determine which features are most representative of a learner's level.
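The planned correlation analysis boils down to computing, for instance, Pearson's r between per-learner error counts and vocabulary size (a minimal sketch; the numbers below are invented placeholders, not project data):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-learner values (NOT measurements from the corpus):
pronunciation_errors = [12, 9, 7, 4, 3]
vocabulary_size = [800, 950, 1100, 1400, 1500]
print(round(pearson(pronunciation_errors, vocabulary_size), 3))
```

A strongly negative r here would suggest that learners with larger vocabularies make fewer pronunciation errors, the kind of relationship the project wants to test.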


Elisa Pellegrino, Thayabaran Kathiresan, Sarah Ebling: "The influence of speech therapy on the vocal accuracy of the hearing impaired: human and machine processing"

Congenital deafness and hearing loss drastically affect people’s social interactions and quality of life. When it comes to deaf or hard-of-hearing younger children, this is even more dramatic as the normal development of language skills and the overall pattern of speech production are typically disrupted. Logopaedic interventions usually focus on the articulation of individual sounds, words or short utterances to improve the vocal accuracy and speech intelligibility of the deaf/hearing impaired.
The aim of this collaborative project between speech and text scientists at the University of Zurich and the Zentrum für Gehör und Sprache Zürich is to test the effect of speech therapy interventions on the acoustics of Swiss German vowels (F1-F2) produced by hearing-impaired children and teenagers, compared to gender- and age-matched normal-hearing peers. Vowel perception experiments with human listeners (e.g. two/four-alternative forced choice, AX/AXB discrimination tests) and machines (e.g. vowel classification), as well as speech-to-text conversion experiments on vowel stimuli recorded before the start, during, and at the end of logopaedic sessions, will inform us about the participants' progress towards vocal accuracy.
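In its simplest form, the machine-side vowel classification could assign each measured token to the nearest vowel centroid in the F1-F2 plane (an illustrative sketch; the centroid values below are rough, generic formant figures, not project measurements):

```python
def classify_vowel(f1, f2, centroids):
    """Nearest-centroid classification in the F1-F2 plane,
    using squared Euclidean distance."""
    return min(centroids,
               key=lambda v: (f1 - centroids[v][0]) ** 2
                           + (f2 - centroids[v][1]) ** 2)

# Rough, generic formant centroids in Hz (invented for illustration):
centroids = {"i": (300, 2300), "a": (750, 1300), "u": (320, 800)}

print(classify_vowel(310, 2200, centroids))
```

Comparing such classifications before and after therapy would be one way to quantify progress in vocal accuracy.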


Marie-Anne Morand, Noëmi Aepli, Nathalie Giroud: "Swiss German & Multimodal Dialect Recognition"

Joining forces of phonetics, neuroscience and text processing at our institute, we're interested in making use of multimodal data of Swiss German dialects. We're presenting initial ideas and two data sets: 1) salient attributes in text vs. speech vs. brain signals 2) analysis of text & speech data of a small dialect.


Phillip Ströbel: "A Transfer Learning Approach for Clustering Articles in Historical Periodicals"

e-periodica is a collection of periodicals from Switzerland, covering a wide range of topics and a time period of over 200 years. During the digitisation process, each periodical was assigned one or several generic labels (Dewey Decimal Classification), which can be used to filter results from a keyword search. For example, a user can filter the search results for the German word Schule (English: school) by broad categories such as "History & geography" or "Education". Finer-grained filtering, however, is not possible. The Schweizer Wirtschaftsarchiv (SVA), on the other hand, has collected about 500k news articles about the economy and economy-related topics and labeled these texts according to the Standard-Thesaurus Wirtschaft, i.e., every article received one or more detailed categories, such as "Textilindustrie → Bekleidung → Distanzhandel" (going from general to increasingly specific). The aim is to transfer the labels from the SVA's newspaper articles to the periodicals of e-periodica, thus allowing more precise filtering. This experiment examines how information from a neural classifier trained on the SVA articles can be used to cluster articles from periodicals in e-periodica, and it showcases how a transfer learning approach differs from traditional clustering methods and topic modeling.
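As a toy stand-in for the transfer step, labels can be propagated from SVA articles to e-periodica articles via nearest neighbours in a shared representation space (all document names, vectors, and labels below are hypothetical; the actual experiment uses a neural classifier):

```python
def transfer_labels(labeled, unlabeled, embed):
    """Assign each unlabeled document the label of its nearest
    labeled neighbour in a shared embedding space."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return {doc: labeled[min(labeled,
                             key=lambda d: sq_dist(embed(doc), embed(d)))]
            for doc in unlabeled}

# Toy 2-d "embeddings" and labels (all values invented):
vecs = {"sva_1": (0.0, 0.0), "sva_2": (10.0, 10.0),
        "ep_1": (1.0, 1.0), "ep_2": (9.0, 9.0)}
labels = {"sva_1": "Textilindustrie", "sva_2": "Bildung"}

print(transfer_labels(labels, ["ep_1", "ep_2"], vecs.get))
```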


François Portet, Grenoble Institute of Technology: "Towards End-to-End Spoken Language Understanding: Application to Voice Command in Smart Homes"

In this talk, I will give an overview of an approach to voice command in smart homes, focusing on the spoken language understanding (SLU) part. I will start by introducing the domain of spoken language understanding, with a focus on the latest developments brought by deep learning and their application to the French language. I will then introduce the research carried out in the nationally funded Vocadom project (http://vocadom.imag.fr), whose aim is to build a distant voice-controlled home automation system that adapts to the user and can be used in real conditions (noise, presence of several people). In particular, I will detail how the voice command has been designed to fit the needs of older adults and people with sight impairment. The talk will then present some recent SLU experiments for voice command, comparing a state-of-the-art pipeline approach with a deep-network-based end-to-end model, and will provide some analyses of the end-to-end model's behavior. Finally, I will discuss the challenges that remain to be addressed in this domain.


François Portet (PhD '05) is Maître de Conférences (Associate Professor) at UGA. Previously, he was a postdoctoral researcher at the University of Aberdeen, Scotland (2006-2008) and a visiting researcher at the Mobile Computing Laboratory, Osaka University, Japan (2019). His research interests lie at the crossroads of NLP, speech processing, and ambient intelligence. He has published about 120 peer-reviewed papers in these areas and is currently the coordinator of the nationally funded Vocadom project (http://vocadom.imag.fr).


Markus Göckeritz: "Eager Machine Translation"

Eager machine translation is a recent approach to simultaneous machine translation introduced by Press and Smith (2018). To improve translation quality, Press and Smith use beam search. With beam search, however, the final translation can only be returned once the entire source sentence has been processed, which effectively makes eager translation a non-simultaneous process.
We applied sequence-level knowledge distillation to the eager translation model and show that this eliminates the need for beam search in eager translation. We furthermore report a significant improvement in both translation quality and translation speed: our model outperforms the original eager translation model by more than 2 BLEU and translates at almost twice the speed.
We confirm that the distilled data set is more deterministic, more parallel, and more monotonic than the original training data, but show that this increase in determinism, parallelism, and monotonicity does not fully explain the superior translation quality.
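Monotonicity of a parallel data set can be quantified, for instance, as the fraction of crossing word-alignment links (a hedged sketch of one possible proxy, not necessarily the measure used in the talk):

```python
def crossing_rate(alignment):
    """Fraction of alignment-link pairs that cross: 0.0 for a fully
    monotonic alignment, 1.0 for a fully reversed one."""
    crossings = total = 0
    for i, (s1, t1) in enumerate(alignment):
        for s2, t2 in alignment[i + 1:]:
            total += 1
            if (s1 - s2) * (t1 - t2) < 0:
                crossings += 1
    return crossings / total if total else 0.0

print(crossing_rate([(0, 0), (1, 1), (2, 2)]))  # monotonic: 0.0
print(crossing_rate([(0, 2), (1, 1), (2, 0)]))  # reversed: 1.0
```

A lower crossing rate on the distilled data would reflect the increased monotonicity reported above.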