Natural Language Processing for Low-Resource Language Variations



Goal: The goal of the LORELAI project (LOw-REsource natural LAnguage processIng) is to develop methods and architectures that advance natural language processing (NLP) for low-resource languages.

Funding: This project is funded by the Swiss National Science Foundation (Doc.CH grant) and runs from 2020 to 2024.

Researchers: Noëmi Aepli's PhD project is supervised by Rico Sennrich and co-supervised by Yves Scherrer (University of Helsinki).

Abstract: In my research project, I work on NLP for low-resource language varieties. The project addresses several issues that state-of-the-art NLP systems struggle with when dealing with any language other than the 23 standard languages (such as English, Chinese, and Spanish). For most of the roughly 7,000 known languages on our planet, the available data does not suffice to build NLP systems. The original purpose of working on NLP problems was to build systems that break down language barriers and enable people to access important information written in a different language. This is especially important for regions where minority languages are the primary languages spoken. Hence, one goal is to develop more data-efficient methods that can cope with less training data. Furthermore, non-standard languages feature high variability, for example in spelling, which poses problems for any system based on statistics. Reducing this variability is therefore essential to mitigate data sparsity. We plan to address this by finding a normalized representation for dialectal variations.