As of September 2017, I am a PhD student in the impresso project. impresso stands for Integrated Monitoring of Historical Press Corpora. During the three-year project, which is funded by an SNSF Sinergia grant, the DHLAB at EPFL, the C2DH at the University of Luxembourg, and our institute will work on text mining of historical newspapers.
My main contributions to this undertaking will be lexical semantic indexing of texts and topic modeling of historical newspaper articles. Our newspaper collection spans multiple languages, and only part of the data is available as parallel corpora, so I will focus on cross-lingual topic modeling. More concretely, I will concentrate on transfer learning, that is, making knowledge gained from topic models in one language available in other languages.
Right now I am ...
... working on improving OCR for historical newspaper texts.
... working on a paper surveying the state of the art in natural language processing for historical newspapers.
... extracting the text from 200 years of NZZ newspapers.
... working on topic models for the Federal Gazette, a parallel corpus. I am also testing whether clustering word embeddings can yield topic-like distributions.
... working on a contribution for COMHUM 2018.
If you want to know more about what is going on right now, you might be interested in my blog.
... there is nothing yet.
Plamada, Magdalena; Linder, Gion; Ströbel, Phillip; Volk, Martin (2015). Pre-reordering of Translation of Non-fictional Subtitles. In: The 18th Annual Conference of the European Association for Machine Translation (EAMT 2015), Antalya, Turkey, May 2015.
Volk, Martin; Amrhein, Chantal; Aepli, Noëmi; Müller, Mathias; Ströbel, Phillip (2016). Building a Parallel Corpus on the World's Oldest Banking Magazine. In: KONVENS, Bochum, 19 September 2016 - 21 September 2016.
Volk, Martin; Clematide, Simon; Graën, Johannes; Ströbel, Phillip (2016). Bi-particle adverbs, PoS-tagging and the recognition of German separable prefix verbs. In: KONVENS 2016, Bochum, 19 September 2016 - 21 September 2016.