Motivated by research questions from the humanities, the automatic processing of historical texts is an emerging subfield of NLP that deals with a set of unique problems.
The main issues of the automatic processing of historical texts are:
The Department of Computational Linguistics, together with the Swiss Law Sources Foundation (SLSF), addresses the above-mentioned problems by investigating methods for extracting temporal expressions from historical texts.
Purpose: to enrich the database of historical names and places of Switzerland (based on the texts of the SLSF) with additional temporal information extracted from the source texts.
As a research institution that publishes sources of old law up to 1798, the Swiss Law Sources Foundation provided us with about 30 volumes of digitized texts in German, French, Italian, Romansh, or Latin, depending on the canton of origin and the time of creation.
The project was funded by the Swiss Law Sources Foundation. It started at the beginning of 2014 and was completed in 2019.
For the experiments in this project, we created a gold standard of temporal annotations. The corpus contains 50 historical legal articles in Early New High German, comprising about 34,000 tokens. It was annotated with a subset of the TimeML markup language for temporal annotation and is available here.
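To give an impression of what such annotations can look like, the following is a minimal sketch of reading TimeML-style `TIMEX3` elements with Python's standard library. It rests on assumptions: the sample sentence, its date normalization, and the helper name `extract_timex3` are invented for illustration and are not taken from the corpus, and the subset of TimeML actually used in the gold standard may differ from the attributes shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-style snippet, invented for illustration only
# (not an excerpt from the gold standard corpus).
SAMPLE = (
    "<TimeML>Geben am "
    '<TIMEX3 tid="t1" type="DATE" value="1525-06-24">'
    "sant Johans tag anno 1525</TIMEX3>"
    " zu Zürich.</TimeML>"
)


def extract_timex3(xml_text):
    """Return one dict per TIMEX3 element found in the annotated text.

    Each dict holds the element's id, its type, its normalized value,
    and the surface text that was annotated.
    """
    root = ET.fromstring(xml_text)
    return [
        {
            "tid": el.get("tid"),
            "type": el.get("type"),
            "value": el.get("value"),
            "text": "".join(el.itertext()),
        }
        for el in root.iter("TIMEX3")
    ]


if __name__ == "__main__":
    for annotation in extract_timex3(SAMPLE):
        print(annotation)
```

Records of this shape pair a historical surface form with a normalized date, which is the kind of temporal information that could be attached to database entries for names and places.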