Linguistic data annotation in dialectal texts using LLMs

Supervisor: Jan Brasser

Summary

Automatically annotating texts with linguistic information (e.g. PoS tags) is challenging when the text is written in a dialect without a standard orthography, such as Swiss German, since traditional methods are not equipped to handle this kind of input.

Recently, LLMs have improved considerably at understanding dialectal writing. This suggests that they could be used to create linguistically annotated corpora of dialectal writing.

The students investigate how LLMs could be leveraged for dialectal text annotation. 
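
As a rough illustration of what such an annotation setup might look like, the following minimal sketch builds a PoS tagging prompt for a tokenized sentence and validates a model's answer. Everything in it is an assumption made for illustration: the tag inventory (the Universal Dependencies UPOS tags), the tab-separated output format, the function names `build_prompt` and `parse_response`, and the hand-written example response. It deliberately does not call any LLM API; the actual approach is left open by the project.

```python
from typing import List, Tuple

# Assumption: the Universal Dependencies UPOS inventory is used as the tag set.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def build_prompt(tokens: List[str]) -> str:
    """Build a prompt asking for one 'token<TAB>TAG' line per input token."""
    tag_list = ", ".join(sorted(UPOS_TAGS))
    token_lines = "\n".join(tokens)
    return (
        "Tag each token of the following Swiss German sentence with one "
        f"of these part-of-speech tags: {tag_list}.\n"
        "Answer with exactly one line per token in the form token<TAB>TAG.\n\n"
        f"{token_lines}"
    )


def parse_response(tokens: List[str], response: str) -> List[Tuple[str, str]]:
    """Turn the model's answer into (token, tag) pairs, rejecting malformed output."""
    lines = [line for line in response.splitlines() if line.strip()]
    if len(lines) != len(tokens):
        raise ValueError(f"expected {len(tokens)} lines, got {len(lines)}")

    pairs = []
    for token, line in zip(tokens, lines):
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"malformed line: {line!r}")
        answered_token, tag = parts[0].strip(), parts[1].strip().upper()
        # The model must not silently normalize or drop dialectal spellings.
        if answered_token != token:
            raise ValueError(f"token mismatch: {answered_token!r} != {token!r}")
        if tag not in UPOS_TAGS:
            raise ValueError(f"unknown tag: {tag!r}")
        pairs.append((answered_token, tag))
    return pairs


# Hypothetical example: a tokenized sentence and a hand-written stand-in
# for a model response (not real model output).
tokens = ["Ich", "gang", "hei", "."]
mock_response = "Ich\tPRON\ngang\tVERB\nhei\tADV\n.\tPUNCT"
annotated = parse_response(tokens, mock_response)
```

Validating the answer against the input tokens and a closed tag set is one way to catch the cases where a model rewrites a dialectal spelling or invents a label, both of which would corrupt an annotated corpus.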

Requirements

Python, PCL2