VatPub: Investigating Chronological Semantic Shift in Religious Documents using Large Language Models

Abstract

This project examines how religious concepts evolve in meaning over time and investigates whether
large language models (LLMs) can reliably detect these chronological semantic shifts. Using a
curated corpus of historical and contemporary religious documents, the student will develop
methods for text alignment, semantic comparison across periods, and quantitative tracking of
lexical drift.

Tasks include designing prompt-based and embedding-based analyses, applying multilingual machine
translation, evaluating model performance against linguistic baselines, and visualizing diachronic
semantic trends in an interpretable manner. The expected outcome is a systematic assessment of
how well modern language models capture long-term conceptual change in religious discourse,
accompanied by an analytical report and a prototype tool for exploring semantic evolution.

Problem Statement and Research Goals

Religious concepts such as “community,” “authority,” or “spirit” often shift in nuance as
historical, political, and cultural contexts evolve. Despite substantial work in diachronic
linguistics, it remains unclear whether modern LLMs can accurately trace such chronological
semantic changes across heterogeneous religious documents. This project aims to investigate how
meanings drift over time, to what extent language models can detect and quantify these shifts,
and how their behavior compares with established linguistic baselines.

The research will pursue three core goals: (i) construct and align a temporally segmented
multilingual corpus of religious texts spanning multiple eras; (ii) develop and evaluate LLM
methods, such as in-context prompting, retrieval-augmented generation, and semantic modeling,
for identifying and characterizing semantic drift; and (iii) produce interpretable analyses and
visualizations that reveal how selected concepts evolve across periods. The overarching objective
is to deliver a systematic and empirically grounded assessment of the capabilities and
limitations of modern LLMs in modeling long-term semantic evolution in religious discourse.

Studies of semantic change have long relied on diachronic distributional methods, including
temporal word embeddings and frequency-based indicators of lexical drift [1, 2, 3]. Early
computational approaches demonstrated that semantic trajectories can be modeled by aligning
embeddings across time [4, 5]. Subsequent work introduced probabilistic and dynamic embeddings
to capture smooth temporal transitions [6]. With contextual models, researchers began exploring
semantic shift using BERT-based contextualized embeddings [7] and transformer-driven clustering
analyses [8]. More recently, researchers have turned to generative language models as
interpreters of meaning, using them to generate roughly equivalent sense definitions for word
occurrences that share the same meaning across different contexts [9, 10].

In parallel, digital humanities research has examined conceptual evolution in religious and
philosophical corpora, though typically with manual annotation or classical NLP techniques [11].
Recent studies have started evaluating whether LLMs can detect diachronic trends [12], but
questions remain about their temporal robustness and domain sensitivity [13]. This project builds
directly on these strands by applying both classical and LLM-based methods to semantic evolution
in religious texts.
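
As a rough illustration of the classical, count-based end of the diachronic distributional
methods mentioned above, the sketch below compares the contexts of one target word in two time
slices. It is not the method of any cited paper: the function names (context_vector,
cosine_distance), the window size, and the toy sentences are assumptions made up for this
example. Because count vectors from both periods are indexed by the same vocabulary, no
embedding alignment step is needed here, unlike with dense embeddings trained separately per
period.

```python
import math
from collections import Counter


def context_vector(sentences, target, window=2):
    """Count words co-occurring with `target` within +/- `window` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # target unseen in one period: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Toy, invented sentences standing in for two time slices of a corpus.
period_early = [
    "the spirit of the lord descends upon the faithful",
    "the holy spirit guides the church",
]
period_late = [
    "a spirit of dialogue guides the community",
    "the spirit of cooperation shapes the council",
]

early = context_vector(period_early, "spirit")
late = context_vector(period_late, "spirit")
drift = cosine_distance(early, late)
print(f"cosine distance for 'spirit': {drift:.3f}")
```

A larger distance suggests that the word keeps different company in the two periods. On a real
corpus, such a score would serve only as a simple baseline against which embedding-based and
LLM-based estimates can be compared.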

Qualifications

We are looking for one or two highly motivated, outstanding master’s students who
• are passionate about cutting-edge NLP research, especially the theory and applications of
large language models;
• are enrolled at a European university (students at Swiss universities such as UZH are
preferred) with a major in computer science, computational linguistics, digital humanities,
or a related field;
• have hands-on coding experience with Python;
• have a basic understanding of language modeling and in-context learning;
• are familiar with common deep learning frameworks and inference libraries such as PyTorch
and vLLM.

We will host the selected student(s) at the Department of Computational Linguistics, University
of Zurich.

Application

Please send your CV and your most recent academic transcript to yingqiang.gao@cl.uzh.ch
(with fabian.winiger@uzh.ch and francesco.periti@kuleuven.be in cc).

Deadline: December 31st, 2025, 23:59 Anywhere on Earth (AoE).
Start date: February 1st, 2026, or upon agreement.
We will email you about further organizational steps if we consider you a match.

References

[1] Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski, and Erik Velldal. Diachronic Word
Embeddings and Semantic Shifts: A Survey. In Emily M. Bender, Leon Derczynski, and
Pierre Isabelle, editors, Proceedings of the 27th International Conference on Computational
Linguistics, pages 1384–1397, Santa Fe, New Mexico, USA, August 2018. Association for
Computational Linguistics.
[2] Nina Tahmasebi, Lars Borin, and Adam Jatowt. Survey of Computational Approaches to
Lexical Semantic Change Detection. Computational approaches to semantic change, 6(1),
2021.
[3] Francesco Periti and Stefano Montanelli. Lexical Semantic Change through Large Language
Models: A Survey. ACM Comput. Surv., 56(11), June 2024. ISSN 0360-0300.
[4] William L Hamilton, Jure Leskovec, and Dan Jurafsky. Diachronic Word Embeddings Reveal
Statistical Laws of Semantic Change. In Proceedings of the 54th Annual Meeting of
the Association for Computational Linguistics (Volume 1: Long Papers), pages 1489–1501,
2016.
[5] Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. Statistically Significant
Detection of Linguistic Change. In Proceedings of the 24th International Conference on World
Wide Web, pages 625–635, 2015.
[6] Maja Rudolph and David Blei. Dynamic Embeddings for Language Evolution. In Proceedings
of the 2018 World Wide Web Conference, pages 1003–1011, 2018.
[7] Matej Martinc, Petra Kralj Novak, and Senja Pollak. Leveraging Contextual Embeddings for
Detecting Diachronic Semantic Shift. In Proceedings of the Twelfth Language Resources and
Evaluation Conference, pages 4811–4819, 2020.
[8] Mario Giulianelli, Marco Del Tredici, and Raquel Fernández. Analysing Lexical Semantic
Change with Contextualised Word Representations. In Proceedings of the 58th Annual
Meeting of the Association for Computational Linguistics, pages 3960–3973, 2020.
[9] Mario Giulianelli, Iris Luden, Raquel Fernández, and Andrey Kutuzov. Interpretable Word
Sense Representations via Definition Generation: The Case of Semantic Change Analysis.
In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st
Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
pages 3130–3148, Toronto, Canada, July 2023. Association for Computational Linguistics.
[10] Francesco Periti, David Alfter, and Nina Tahmasebi. Automatically Generated Definitions
and their Utility for Modeling Word Meaning. In Yaser Al-Onaizan, Mohit Bansal, and
Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural
Language Processing, pages 14008–14026, Miami, Florida, USA, November 2024. Association
for Computational Linguistics.
[11] Rens Bod. A New History of the Humanities: The Search for Principles and Patterns from
Antiquity to the Present. OUP UK, 2013.
[12] Yingqiang Gao, Fabian Winiger, Patrick Montjourides, Anastassia Shaitarova, Nianlong Gu,
Simon Peng-Keller, and Gerold Schneider. SpiritRAG: A Q&A System for Religion and
Spirituality in the United Nations Archive. In Proceedings of the 2025 Conference on Empirical
Methods in Natural Language Processing: System Demonstrations, pages 26–41, 2025.
[13] Elisabeth Fittschen, Sabrina Li, Tom Lippincott, Leshem Choshen, and Craig Messner.
Pretraining Language Models for Diachronic Linguistic Change Discovery. arXiv preprint
arXiv:2504.05523, 2025.

Additional Information

Supervisor: Yingqiang Gao