Institut für Computerlinguistik

Phillip Ströbel

Phillip Benjamin Ströbel, Dr.

  • Postdoctoral researcher
+41 44 635 71 32
Andreasstrasse 15, 8050 Zürich

Want to discuss an idea? Arrange an appointment on my Calendly.

Social media

Connect with me on Twitter, Mastodon, or LinkedIn.

Memberships and affiliations

Member of the DSI Community Libraries.

Current project(s)

Please also have a look at the Data page for datasets, models, and code from current and past projects.

  • In September 2023, Innosuisse approved our application for an Innovation Cheque project with Locomot GmbH. This project aims to make photo archives more accessible to the general public. We will use digitised index cards, a photo, a building, and a card index combined with large language models to retell the history of Davos.
  • In January 2021, I joined the Bullinger Digital project, which is dedicated to digitising the correspondence of the Swiss reformer Heinrich Bullinger. The project aims to apply Handwritten Text Recognition (HTR) to about 3,000 of the more than 12,000 letters that Bullinger wrote to and received from many colleagues of his time from all over Europe. The project is kindly funded by the Hasler Foundation, among others, and also involves a partnership with Andreas Fischer from the Department of Informatics at the University of Fribourg and Tobias Hodel from the Digital Humanities group at the University of Bern. We are currently (2023) seeking funding for a second project phase, which is planned to start in 2024.

Past projects

  • From September 2017, I collaborated in the impresso project. impresso stands for Integrated Monitoring of Historical Press Corpora. During a three-year project phase, financed by an SNSF Sinergia grant, the DHLAB at the EPFL, the C2DH at the University of Luxembourg, and our department worked on text mining of historical newspapers. My main contributions to this undertaking were the lexical semantic indexing of texts and topic modelling of historical newspaper articles.
  • From 2015 to 2017, I was responsible for the Text+Berg pipeline. The Text+Berg corpus consists of the yearbooks from the Swiss Alpine Club (SAC) and is a big multilingual collection of texts with mountaineering as their main topic.
  • From 2014 to 2016, I helped compile the Credit Suisse Corpus, which features text in multiple languages not only from web news and the available PDF files, but also from scans of the world's oldest banking magazine, dating back to 1895.
  • From 2015 to 2016, I was a research assistant at the URPP Language and Space and helped build the ArchiMob corpus consisting of recorded speech from people who have lived through the Second World War in Switzerland.

Right now I am ...

... starting a collaboration with the Chair of Systems Design at ETH and the Heidelberger Akademie der Wissenschaften to track the spread of ideas in correspondence data during the Reformation.

... starting a collaboration with Patrick J. Burns from NYU about Latin language models.

Upcoming Events

Past Events

CAIDAS Workshop in Würzburg, February 6-8. Presentation entitled: Bullinger Digital - Texterkennung in einem reformatorischen Briefwechselkorpus.
Bullinger Digital: 500 Jahre Bullingerbriefwechsel in Zurich, February 24. Presentation entitled: Bullinger Digital 2.0.
DHd 2023 in Trier, March 13-17. Presentation entitled: Bullingers Briefwechsel zugänglich machen: Stand der Handschriftenerkennung.
DaSCHCon on Digital Editions and Interoperability in Bern, March 24. Participation.
PhD Defense, May 26, Zürich. Passed.
June 9: Invited talk at the "Text Recognition and Cultural Heritage" workshop organised by DIZH about the state of the art of handwritten text recognition in the project Bullinger Digital at the Zurich State Archives.
August 25: Participation in the ADAPDA Workshop at ICDAR in San José, USA, with a paper about the adaptability of TrOCR to historical handwriting.
Transkriptionen zeitgemäss mit Transkribus. October 4, Zentralbibliothek Zürich. Workshop organiser.
COMHUM 2022 in Lausanne, June 9-10. Presentation entitled: Transformer-based HTR for Historical Documents.

LREC 2022 in Marseille, June 21-23. Presentation entitled: Evaluation of HTR models without Ground Truth Material.

DARIAH-CH Study Day in Mendrisio, October 20. Presentation entitled: Bullinger Digital – The Transformation and Expansion of an Analogue Edition into the Digital Age.

Einführung in Theorie und Praxis der OCR mit neuronalen Netzwerken in Zurich, October 4. Workshop organiser.

LREC 2020 in Marseille (cancelled due to the Corona Pandemic). Presentation at conference. Paper title: How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR.

Digital Humanities 2019 - Presentation at Conference, July 8-12. Title: Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images. Slides.

Vom Diarium zum Digitarium - Invited Talk at Workshop, April 24-25. Slides (in German).


Sinergia: Kick-Off Workshop at the EPFL, October 24-25.
Digital Humanities Austria 2017 - Invited Talk at a workshop about "Building Bridges", December 4-6. Slides.



FS 2016 Teaching Assistant in Einführung in die Multilinguale Textanalyse, with Martin Volk
SS 2017 Teaching Assistant in Semantische Rollen und relationale Fakten, with Simon Clematide
SS 2018 Teaching Assistant in Sentimentanalyse und Media Monitoring, with Simon Clematide
FS 2018 Teaching Assistant in Deep Learning in der Sprachtechnologie, with Simon Clematide
FS 2018 Teaching Assistant in Automated Media Content Analysis, with Gerold Schneider
FS 2019 Teaching Assistant in Machine Learning for Natural Language Processing 1, with Simon Clematide
HS 2021 Online teaching of the course Einführung in die Computerlinguistik at the University of Innsbruck
FS 2023 Teaching of the course Creation and Annotation of Linguistic Resources with George Yong



Semester | Student | Thesis Type | Status | Topic (Thesis Title)
Spring 2023 | Elina Stüssi | BA | ongoing | Part-of-Speech Tagging for Early Modern Latin Correspondence
Autumn 2023 | Nikolaj Bauer | MA | ongoing | TBD
Autumn 2023 | Olga Shpakova | | ongoing | Dokumentbasierter LLM-gestützter Chatbot als Rechtsassistenz: Entwicklung einer chat-basierten API für den Zugriff auf juristische Datenbank.
Autumn 2023 | Yung-Hsin Chen | MA | ongoing | Improvements in the Adaptation of TrOCR Models for Non-English OCR/HTR


Review Activities

I served as a reviewer/on a PC for the following conferences/workshops:


ZORA Publication List

Twitter @phillipstroebel