Evaluation of factual consistency in simplified texts
Supervisor: Prof. Dr. Sarah Ebling
Introduction
Text simplification is the process of making texts easier to read and understand for a wide audience. One of the biggest challenges in automatic text simplification is information preservation: all information present in the original text should still be present in the simplified version, and no information should be added (hallucinated). To detect such errors, we would like a system that can predict whether the information content of the original text is preserved in the simplification. Previous work has combined question generation and question answering to achieve this for automatic text summarization (Wang et al., 2020). The goal of this project is to apply and test this approach for text simplification.
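To illustrate the QG/QA idea, here is a minimal sketch of the comparison step: questions (assumed to have been generated from the original text, e.g. with a Hugging Face question-generation model) are answered against both texts, and agreement is measured with SQuAD-style token-level F1. The `answer_fn` stand-in, the toy answer tables, and the F1 matching are illustrative assumptions, not a prescribed design.

```python
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """SQuAD-style token-level F1 between two answer strings."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    if not pred_toks or not gold_toks:
        return float(pred_toks == gold_toks)
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def consistency_score(questions, answer_fn, original, simplified) -> float:
    """Average answer agreement: each question is asked against both texts.

    answer_fn(question, text) would normally be a QA model; here it is
    a placeholder supplied by the caller.
    """
    scores = [token_f1(answer_fn(q, simplified), answer_fn(q, original))
              for q in questions]
    return sum(scores) / len(scores) if scores else 0.0

# Toy usage: answers looked up from hand-written tables instead of a QA model.
orig_answers = {"Who discovered penicillin?": "Alexander Fleming"}
simp_answers = {"Who discovered penicillin?": "Fleming"}

score = consistency_score(
    list(orig_answers),
    lambda q, table: table[q],  # "table" stands in for running QA on a text
    orig_answers,
    simp_answers,
)
# A partial token overlap ("Fleming" vs. "Alexander Fleming") yields a
# score between 0 and 1.
```

In a real system, `answer_fn` would call a QA model on the raw original and simplified texts, and a low score would flag a simplification as potentially losing or hallucinating information.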
Goals
Depending on the scope, the project may involve:
- Finding or developing question generation and question answering models (possibly large language models) to build a system that evaluates factual consistency
- Finding or creating a test set containing annotations of information preservation for simplified texts
- Evaluating the prediction accuracy of the system
- Applying the system to evaluate information preservation in existing text simplification models
Requirements
- Programming (Python)
- Basic knowledge of generative language models
- Ideally: experience with Hugging Face models
Literature
- Chen et al. (2017): A semantic QA-based approach for text summarization evaluation
- Wang et al. (2020): Asking and answering questions to evaluate the factual consistency of summaries
- Devaraj et al. (2022): Evaluating factuality in text simplification
- Ebling et al. (2022): Automatic text simplification for German