Metric Space Magnitude for Analysis of Language Models
Supervisors: Marius Huber, Juri Opitz, Michelle Wastl
Summary
Metric space magnitude (usually referred to simply as “magnitude”) is a geometric tool that, like entropy, measures the diversity of a point cloud. In NLP, it has been used, for example, to “fingerprint” language models by analyzing their latent spaces.
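To make the notion concrete, the following is a minimal sketch of how the magnitude of a finite point cloud can be computed. It uses the standard definition for finite metric spaces: form the similarity matrix Z with entries Z_ij = exp(-t * d(x_i, x_j)), solve Z w = 1 for the weighting w, and sum the entries of w. The function name `magnitude`, the scale parameter `t`, and the choice of the Euclidean metric are assumptions made for illustration and are not prescribed by the project; the sketch also assumes the points are distinct, since duplicate points make Z singular.

```python
import numpy as np


def magnitude(points, t=1.0):
    """Magnitude of a finite point cloud at scale t (illustrative sketch).

    Assumes the Euclidean metric and pairwise distinct points.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances d(x_i, x_j) via broadcasting.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Similarity matrix Z_ij = exp(-t * d_ij).
    Z = np.exp(-t * dists)
    # The weighting w solves Z w = 1; the magnitude is the sum of its entries.
    w = np.linalg.solve(Z, np.ones(len(points)))
    return float(w.sum())


# Example: random vectors standing in for latent representations.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(50, 8))
for t in (0.01, 1.0, 100.0):
    print(t, magnitude(cloud, t=t))
```

For very small t the whole cloud looks like a single point and the magnitude is close to 1; for very large t all points appear far apart and the magnitude approaches the number of points. In between, magnitude can be read as an "effective number of points" at scale t, which is what makes it usable as a diversity measure.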
This project is intended for MSc students. Its purpose is first to develop the necessary understanding of magnitude as a tool and then to investigate the value of magnitude for NLP by applying it to problems such as:
- translation (direction) detection;
- uniformity/alignment tradeoff in contrastive learning;
- explainability of language models, e.g., the effects of training and/or fine-tuning on the magnitude of latent spaces;
- ...
Resources
Requirements
Python, NLP, “Mathematical Foundations of Computational Linguistics 2” course or equivalent