Colloquium Schedule, Fall Semester 2019 (HS 2019)

Colloquium HS 2019: Reports on current research at the institute, Bachelor's and Master's theses, programming projects, and guest talks

Time & place: every two weeks, Tuesdays from 10:15 to 12:00, room BIN-2.A.10 (map)

Organizer: Simon Clematide


Speakers & Topics


Lenz Furrer (PhD CL UZH): Sequence Tagging for Concept Recognition 

Peter Makarov (PhD CL UZH):  Semi-supervised Historical Text Normalization

Mon/Tue, 7–8 October, in KOL G 217

Various job talks for the new professorship in Digital Linguistics at the Institute of Computational Linguistics:


Janis Goldzycher (BA CL UZH): Taxonomy Learning without Labeled Data: Building on TaxoGen

Ximena Gutierrez-Vasques (PhD UNAM Mexico): Measuring Language Complexity


Tatyana Ruzsics (PhD UZH): Multilevel Text Normalization with Sequence-to-Sequence Networks and Multisource Learning

Nikola Nikolov (PhD ETH/UZH): Abstractive Document Summarization without Parallel Data


Felix Morger (Språkbanken, University of Gothenburg, Sweden): A Review of Machine Learning Interpretability in Natural Language Processing

Jan Deriu (PhD UZH/ZHAW): A Benchmark for Lifelong Machine Learning for Question Answering over Structured Data

Tue, 26 November

Lonneke van der Plas (University of Malta): Analysing compounds and predicting their emergence over time

Thu, 28 November, 17:15

Raphael Winkelmann/Christoph Draxler: BAS Tools for the Processing of Spoken Language (ifi colloquium)

Fabio Rinaldi (UZH): 15 Years of Biomedical Text Mining


Lenz Furrer: Sequence Tagging for Concept Recognition

As our submission to the CRAFT shared task 2019, we present two neural approaches to concept recognition. We propose two different systems for joint named entity recognition (NER) and normalization (NEN), both of which model the task as a sequence labeling problem. Our first system is a BiLSTM network with two separate outputs for NER and NEN trained from scratch, whereas the second system is an instance of BioBERT fine-tuned on the concept-recognition task. We exploit two different strategies for extending concept coverage: ontology pretraining and back-off with a dictionary lookup.
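
The dictionary back-off mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation; function names and the concept ID are made up for the example. The idea: when the neural normalization (NEN) output yields no concept ID for a recognized entity span, fall back to an exact-match lookup in an ontology-derived dictionary.

```python
def normalize(span, neural_prediction, dictionary):
    """Return a concept ID for a recognized entity span, or None.

    neural_prediction: concept ID from the neural NEN output (or None).
    dictionary: mapping from lowercased surface forms to concept IDs.
    """
    if neural_prediction is not None:
        return neural_prediction            # trust the neural NEN output first
    return dictionary.get(span.lower())     # back-off: exact dictionary lookup


# Toy dictionary with an illustrative (not real) concept ID:
ontology_dict = {"interleukin-6": "CONCEPT:0001"}
```

A lookup such as `normalize("Interleukin-6", None, ontology_dict)` then recovers a concept ID even when the neural head abstains.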

Peter Makarov: Semi-supervised Historical Text Normalization

Text normalization is the task of mapping non-standard texts (informal, dialectal, historical) into modern standard language. In this talk, I report on ongoing work on semi-supervised training of generative neural models for normalization of historical data. In contrast to most prior work, which treats this problem as character-level transduction of isolated words, we use sentential context to obtain training signal. This leads to considerably more data-efficient training.

Janis Goldzycher: Taxonomy Learning without Labeled Data: Building on TaxoGen

Taxonomy learning is of great interest for automated knowledge acquisition: taxonomies are not only a popular way to represent knowledge, they also enable deductive reasoning and constitute an important step towards ontology learning. Taxonomies are built from hypernym relations, and most current methods need labeled data to extract such relations. TaxoGen is a method for the unsupervised learning of topical taxonomies that uses distributional semantics and a recursive, adaptive clustering process. I will talk about reimplementing TaxoGen, testing it with different embedding and clustering techniques, and introducing a new label score.
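
The recursive clustering at the heart of TaxoGen can be sketched schematically. This is a simplified illustration, not the TaxoGen algorithm itself: the real system uses adaptive spherical clustering over word embeddings, whereas here the embedding and clustering functions are toy stand-ins passed in as parameters.

```python
def build_taxonomy(terms, embed, cluster, max_depth=2, min_size=2):
    """Recursively split a set of terms into subtopic nodes (TaxoGen-style sketch).

    terms:   list of term strings at this node.
    embed:   mapping (or function-like lookup) from term to embedding.
    cluster: callable partitioning terms into groups using the embeddings.
    """
    if max_depth == 0 or len(terms) < min_size:
        return {"terms": terms, "children": []}      # leaf node
    children = []
    for subset in cluster(terms, embed):
        if len(subset) < len(terms):                 # only recurse on proper splits
            children.append(
                build_taxonomy(subset, embed, cluster, max_depth - 1, min_size)
            )
    return {"terms": terms, "children": children}


def toy_cluster(terms, embed):
    """Toy 'clustering': partition by the sign of a 1-D embedding."""
    left = [t for t in terms if embed[t] < 0]
    right = [t for t in terms if embed[t] >= 0]
    return [g for g in (left, right) if g]
```

With 1-D toy embeddings such as `{"cat": -2.0, "dog": -1.0, "car": 1.0, "bus": 2.0}`, the root node splits into an animal subtopic and a vehicle subtopic, mirroring how TaxoGen grows a topic hierarchy top-down.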

Ximena Gutierrez-Vasques: Measuring Language Complexity

Conceptualizing and quantifying linguistic complexity is not an easy task: many quantitative and qualitative dimensions must be taken into account. In particular, the world's languages differ in their word-production processes, so the amount of semantic and grammatical information encoded at the word level may vary significantly from language to language. It is important to quantify this morphological richness and how it varies with linguistic typology. This presentation summarizes some of the approaches presented at the Interactive Workshop on Measuring Language Complexity (IWMLC 2019).
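
Two standard corpus-based proxies for the morphological richness discussed above are the type-token ratio and the Shannon entropy over word forms; languages with richer morphology tend to produce more distinct word forms per running word. A minimal sketch (these are generic measures, not necessarily the ones presented at IWMLC 2019):

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens: higher values suggest
    more productive word formation (or a smaller sample -- TTR is length-sensitive)."""
    return len(set(tokens)) / len(tokens)


def word_entropy(tokens):
    """Shannon entropy (in bits) of the word-form distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Because both measures depend on sample size, cross-linguistic comparisons are usually made on parallel corpora of equal length.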

Ximena Gutierrez-Vasques holds a PhD in Computational Linguistics from the National Autonomous University of Mexico (UNAM). She is currently a postdoctoral researcher at the University of Zurich (URPP Language and Space). Her research interests comprise NLP for low-resource languages, quantitative linguistics, and machine translation.

Lonneke van der Plas: Analysing compounds and predicting their emergence over time

Compounding can be defined as the formation of a new lexeme by adjoining two or more lexemes. This word-formation process is productive; as a consequence, compounds are a common word type, but many occur with very low token counts. This creates challenges for NLP tools and raises questions about the processes that underlie the generation of novel compounds over time.

Lonneke van der Plas is a senior lecturer in Human Language Technology at the University of Malta. Before that, she was a junior professor at the Institute for Natural Language Processing (IMS), University of Stuttgart, where she led a research group in the framework of the SFB collaborative research center 732. She did a post-doc at the University of Geneva and earned her PhD from the University of Groningen.