Supervisor(s): Prof. Dr. Sarah Ebling, Yingqiang Gao, Dr. Nianlong Gu
Controllable Text Generation (CTG) refers to the ability to guide or influence the output of a language model to meet specific requirements or constraints. When integrated with Automatic Text Simplification (ATS), these constraints typically include compression ratio (the length of the simplified text divided by the length of the original text), lexical complexity, semantic similarity, and syntactic richness. By utilizing parallel complex-simple text pairs, ATS systems can be trained to generate simplifications that adhere to these specified requirements.
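As an illustration of the compression ratio defined above, here is a minimal Python sketch. The function name `compression_ratio` and the `unit` parameter are hypothetical, and the choice of measuring length in characters or whitespace-separated words is an assumption, since the description does not fix a unit.

```python
def compression_ratio(original: str, simplified: str, unit: str = "char") -> float:
    """Length of the simplified text divided by the length of the original text.

    `unit` is an illustrative assumption: "char" counts characters,
    "word" counts whitespace-separated tokens.
    """
    if unit == "char":
        n_original, n_simplified = len(original), len(simplified)
    elif unit == "word":
        n_original, n_simplified = len(original.split()), len(simplified.split())
    else:
        raise ValueError(f"unknown unit: {unit!r}")

    if n_original == 0:
        raise ValueError("original text must not be empty")
    return n_simplified / n_original


if __name__ == "__main__":
    complex_text = "The committee elected to postpone the decision until further notice."
    simple_text = "The committee delayed the decision."
    print(compression_ratio(complex_text, simple_text, unit="word"))
```

A ratio below 1 means the simplification is shorter than the original. A controllable ATS system could take such a target value as one of its constraints.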
Despite the advancements made in controllable ATS systems, an important aspect has been overlooked: the text type. Texts of different types should be simplified differently because they differ in formality and conventions. For instance, legal and scientific texts have unique lexical preferences and syntactic structures that differ significantly from those of news articles.
A significant challenge is that controllable ATS systems are typically trained to simplify text of a single specific type. This approach leads to two main issues:
Consequently, multiple distinct ATS systems have to be developed, which complicates access to simplified information. This project aims to address these limitations by developing a unified ATS system that is responsive to various input text types, thereby enhancing accessibility to diverse textual information.
The primary objective of this project is to develop a type-sensitive ATS system that can effectively distinguish between different text types and generate type-specific text simplifications. In pursuit of this goal, the project will explore several key research questions:
By addressing these questions, this project aims to advance the capabilities of current ATS systems and open new avenues for enhancing information accessibility.
This project has the following further objectives:
This project requires background knowledge in:
We are looking for highly motivated students majoring in computer science, data science, mathematics, or electrical engineering. Please send your CV and transcript to yingqiang.gao@uzh.ch (cc Prof. Dr. Sarah Ebling, ebling@cl.uzh.ch). This project will be co-supervised by Dr. Nianlong Gu (nianlong.gu@uzh.ch) at the Linguistics Research Infrastructure (LiRI) of the University of Zurich.