Since September 2017, I have been collaborating in the impresso project. impresso stands for Integrated Monitoring of Historical Press Corpora. During the three-year project phase, which is financed by an SNSF Sinergia grant, the DHLAB at the EPFL, the C2DH at the University of Luxembourg, and our department will work on text mining of historical newspapers.
My main contribution to this undertaking, which will (hopefully) end in a dissertation, will comprise lexical semantic indexing of texts, as well as topic modeling of historical newspaper articles. Since our newspaper collection covers multiple languages, and only some of the data is available as parallel corpora, my focus will lie on cross-lingual topic modeling. More concretely, I will concentrate on transfer learning, that is, making knowledge gained from topic models in one language available in other languages.
Right now I am ...
... working on a state-of-the-art paper on natural language processing for historical newspapers.
... working on topic models for the impresso corpus.
... experimenting with dynamic topic modeling using NMF.
If you want to know more about what is going on right now, you might be interested in my blog.
... there is nothing yet.
Digital Humanities 2019 - Presentation at Conference, 8 - 12 July. Title: Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images. Slides.
Vom Diarium zum Digitarium - Invited Talk at Workshop, 24 - 25 April. Slides (in German)
Sinergia: Kick-Off Workshop at the EPFL, 24 - 25 October.
Digital Humanities Austria 2017 - Invited Talk at a workshop about "building bridges", 4 - 6 December. Slides