As of September 2017, I will be a PhD student in the impresso project. impresso stands for Integrated Monitoring of Historical Press Corpora. Over a three-year project phase financed by an SNSF Sinergia grant, the DHLAB at EPFL, the C2DH at the University of Luxembourg, and our institute will work on text mining of historical newspapers.
My main contribution to this undertaking will comprise lexical semantic indexing of texts, as well as topic modeling of historical newspaper articles. My aim is to tailor state-of-the-art deep learning algorithms specifically for this task.
Right now I am ...
... working on improving OCR for historical newspaper texts.
... working on a state-of-the-art paper on natural language processing for historical newspapers.
... getting text out of 200 years of NZZ newspapers.
If you want to know more about what is going on right now, you might be interested in my blog.
Vom Diarium zum Digitarium - Invited Talk at Workshop, 24 - 25 April.
Sinergia: Kick-Off Workshop at the EPFL, 24 - 25 October.
Digital Humanities Austria 2017 - Invited Talk at a workshop about "building bridges", 4 - 6 December.
Plamada, Magdalena; Linder, Gion; Ströbel, Phillip; Volk, Martin (2015). Pre-reordering of Translation of Non-fictional Subtitles. In: The 18th Annual Conference of the European Association for Machine Translation (EAMT 2015), Antalya, Turkey, May 2015.
Volk, Martin; Amrhein, Chantal; Aepli, Noëmi; Müller, Mathias; Ströbel, Phillip (2016). Building a Parallel Corpus on the World's Oldest Banking Magazine. In: KONVENS, Bochum, 19 September 2016 - 21 September 2016.
Volk, Martin; Clematide, Simon; Graën, Johannes; Ströbel, Phillip (2016). Bi-particle adverbs, PoS-tagging and the recognition of German separable prefix verbs. In: KONVENS 2016, Bochum, 19 September 2016 - 21 September 2016.