As of September 2017, I will be a PhD student engaged in the IMPRESSO project. IMPRESSO is an acronym for Integrated Monitoring of Historical Press Corpora. During a three-year project phase, which is financed by an SNSF Sinergia grant, the DHLAB from the EPFL, the C2DH from the University of Luxembourg, and our institute will work on text mining of historical newspapers.
My main contribution to this undertaking will comprise lexical semantic indexing of texts, as well as topic modeling of historical newspaper articles. My aim is to tailor state-of-the-art deep learning algorithms specifically for this task.
Right now I am ...
... working on improving OCR for historical newspaper texts.
... working on a paper surveying the state of the art in natural language processing for historical newspapers.
Sinergia: Kick-Off Workshop at the EPFL, 24-25 October.
Plamada, Magdalena; Linder, Gion; Ströbel, Phillip; Volk, Martin (2015). Pre-reordering of Translation of Non-fictional Subtitles. In: The 18th Annual Conference of the European Association for Machine Translation (EAMT 2015), Antalya, Turkey, May 2015.
Volk, Martin; Amrhein, Chantal; Aepli, Noëmi; Müller, Mathias; Ströbel, Phillip (2016). Building a Parallel Corpus on the World's Oldest Banking Magazine. In: KONVENS, Bochum, 19 September 2016 - 21 September 2016.
Volk, Martin; Clematide, Simon; Graën, Johannes; Ströbel, Phillip (2016). Bi-particle adverbs, PoS-tagging and the recognition of German separable prefix verbs. In: KONVENS 2016, Bochum, 19 September 2016 - 21 September 2016.