Evaluation of factual consistency in simplified texts
Supervisor: Prof. Dr. Sarah Ebling
Introduction
Text simplification is the process of making texts easier to read and understand for a wide audience. One of the biggest challenges in automatic text simplification is information preservation: all information present in the original text should still be present in the simplified version, and no information should be added (hallucinated). To detect such errors, we would like a system that can predict whether the information content of the original text is preserved in the simplification. Previous work has combined question generation and question answering to achieve this for automatic text summarization (Wang et al., 2020). The goal of this project is to apply and test this approach for text simplification.
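To illustrate the QG/QA idea, here is a minimal sketch of the comparison step: questions (assumed to have been generated from the original text, e.g. with a Hugging Face question-generation model) are answered against both texts, and agreement is measured with SQuAD-style token-level F1. The `answer_fn` stand-in, the toy answer tables, and the F1 matching are illustrative assumptions, not a prescribed design.

```python
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """SQuAD-style token-level F1 between two answer strings."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    if not pred_toks or not gold_toks:
        return float(pred_toks == gold_toks)
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def consistency_score(questions, answer_fn, original, simplified) -> float:
    """Average answer agreement: each question is asked against both texts.

    answer_fn(question, text) would normally be a QA model; here it is
    a placeholder supplied by the caller.
    """
    scores = [token_f1(answer_fn(q, simplified), answer_fn(q, original))
              for q in questions]
    return sum(scores) / len(scores) if scores else 0.0

# Toy usage: answers looked up from hand-written tables instead of a QA model.
orig_answers = {"Who discovered penicillin?": "Alexander Fleming"}
simp_answers = {"Who discovered penicillin?": "Fleming"}

score = consistency_score(
    list(orig_answers),
    lambda q, table: table[q],  # "table" stands in for running QA on a text
    orig_answers,
    simp_answers,
)
# A partial token overlap ("Fleming" vs. "Alexander Fleming") yields a
# score between 0 and 1.
```

In a real system, `answer_fn` would call a QA model on the raw original and simplified texts, and a low score would flag a simplification as potentially losing or hallucinating information.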
Goals
Depending on the scope, the project may involve:
- Finding or developing question generation and question answering models (possibly large language models) to build a system that evaluates factual consistency
- Finding or creating a test set containing annotations of information preservation for simplified texts
- Evaluating the prediction accuracy of the system
- Applying the system to evaluate information preservation in existing text simplification models
Requirements
- Programming (Python)
- Basic knowledge of generative language models
- Ideally: experience with Hugging Face models
Literature
- Chen et al. (2017): A semantic QA-based approach for text summarization evaluation
- Wang et al. (2020): Asking and answering questions to evaluate the factual consistency of summaries
- Devaraj et al. (2022): Evaluating factuality in text simplification
- Ebling et al. (2022): Automatic text simplification for German