Machine Translation for Downstream NLP Applications

Supervisors: Chantal Amrhein (and Ann-Sophie Gnehm for data from SMM)

Introduction

The Swiss Job Market Monitor extracts information from job advertisements to monitor current developments on the job market. Since most of the data and resources for the information extraction are in German, it would be desirable to translate job opening texts from English, French and Italian into German before processing them.

Depending on the outcome, the project can be extended into a BA or MA thesis in which you would build a domain-specific translation system to further improve performance on the downstream tasks.

Aim and Purpose

The goal of this programming project is to test whether the output of commercial MT systems can be used in various downstream NLP applications. For this project, you would need to:

  • evaluate the usefulness of existing commercial translation systems: to this end, you will integrate automatically translated job opening texts into our downstream information extraction pipeline and check the accuracy of the extraction.
  • analyse the most common problems that commercial translation systems have in this domain.
  • test whether simple adaptations of the translation output improve the performance on the downstream tasks.
  • potentially integrate translation into the current workflow for information extraction.
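
As a rough illustration of the evaluation step in the first bullet, the sketch below compares the items extracted from a machine-translated text with a gold-standard set and reports precision, recall and F1. The function name and the example items are hypothetical; the actual extraction pipeline, its output format and the metric used in the project may differ.

```python
from typing import Iterable, Tuple


def precision_recall_f1(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for extracted items (hypothetical helper)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: skills extracted from a translated advertisement
# versus the skills annotated for the same advertisement.
extracted_from_translation = ["Python", "SQL", "Teamarbeit"]
gold_annotation = ["Python", "SQL", "Java", "Deutsch"]

p, r, f1 = precision_recall_f1(extracted_from_translation, gold_annotation)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Running the same comparison on original German texts and on translated texts would give a first impression of how much extraction quality is lost through translation.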


For this project, we provide a multilingual corpus of job opening texts, a subset of which has already been machine- or human-translated, as well as domain-specific multilingual gazetteers.
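
One possible "simple adaptation" of the translation output is to normalise terms with the help of a gazetteer. The sketch below is only an illustration under stated assumptions: it assumes that a gazetteer can be represented as a mapping from a variant that an MT system might produce to the preferred German term, and both the function name and the entries are invented. The format of the gazetteers provided for the project may be different.

```python
import re
from typing import Dict


def apply_gazetteer(text: str, gazetteer: Dict[str, str]) -> str:
    """Replace known variants in MT output with preferred terms (hypothetical helper)."""
    if not gazetteer:
        return text
    # Try longer variants first so that multi-word entries win over their substrings.
    variants = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(v) for v in variants) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {variant.lower(): term for variant, term in gazetteer.items()}
    return pattern.sub(lambda match: lookup[match.group(1).lower()], text)


# Invented entries: variant produced by an MT system -> preferred term.
example_gazetteer = {
    "Software Ingenieur": "Softwareingenieur",
    "Projekt Manager": "Projektleiter",
}

mt_output = "Wir suchen einen Software Ingenieur und einen projekt manager."
print(apply_gazetteer(mt_output, example_gazetteer))
```

Matching is case-insensitive and restricted to whole words, so an entry is not replaced when it only occurs inside a longer word.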

Requirements

  • knowledge of machine translation (e.g. having taken the course "Machine Translation" or "Advanced Techniques of Machine Translation")
  • programming skills in Python