Multi-Agent System for Generating and Evaluating Award Criteria for Swiss Public Procurement
Abstract
Award criteria for public procurement are notoriously difficult to gather and formalize. Two challenges sit at the heart of the problem. First, expertise is scattered across a wide range of public sectors, each with its own terminology, regulatory culture, and decision logic. As a result, no single human expert can easily provide a comprehensive and consistent view. Second, there is no established gold standard for evaluating procurement criteria, which makes it hard to benchmark quality, ensure comparability, or build automated tools that generalize across domains. This project explores methods to systematically extract, structure, and evaluate award criteria using data-driven approaches. It aims to support procurement professionals, increase transparency, and contribute to a more efficient and accountable public sector.
Problem Statement and Research Goals
Award criteria in public procurement remain difficult to obtain and standardize because expertise is fragmented across many public sectors, and no gold evaluation benchmark exists. Individual experts can only cover narrow slices of the procurement landscape, while criteria embedded in lengthy policy documents are often ambiguous, inconsistent, or domain-specific. Traditional single-model or rule-based extraction approaches struggle in this environment: without a shared reference standard or unified expert view, they cannot reliably judge completeness, relevance, or quality. This project investigates whether a multi-agent system can overcome these limitations by simulating a panel of diverse procurement experts. Different agents will assume complementary roles (for example, extraction, critique, refinement, and validation) and will collaborate or debate to converge on high-quality award criteria. The goal is to design and evaluate an agent architecture that can (1) extract criteria from heterogeneous documents, (2) reason collectively about their correctness and usefulness in the absence of a gold standard, and (3) produce structured, domain-agnostic outputs that support procurement officers. A successful system would demonstrate how coordinated agent interaction can approximate expert consensus where real human consensus is difficult to obtain.
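The extraction-critique-refinement-validation loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of the control flow only: each role is a plain deterministic function (e.g., the critique agent merely flags exact duplicates), whereas in the actual system each role would wrap a large language model call. All function names and heuristics here are illustrative assumptions, not part of the project specification.

```python
# Hypothetical sketch of the agent roles (extraction, critique,
# refinement, validation). Real agents would be LLM-backed; here each
# role is a simple function so the coordination pattern is visible.

def extract(document: str) -> list[str]:
    """Extraction agent: pull candidate criteria (toy: bulleted lines)."""
    return [line.strip().lstrip("- ").strip()
            for line in document.splitlines()
            if line.strip().startswith("-")]

def critique(criteria: list[str]) -> list[str]:
    """Critique agent: flag issues (toy: exact duplicates)."""
    issues, seen = [], set()
    for c in criteria:
        if c.lower() in seen:
            issues.append(f"duplicate: {c}")
        seen.add(c.lower())
    return issues

def refine(criteria: list[str], issues: list[str]) -> list[str]:
    """Refinement agent: resolve flagged issues (toy: deduplicate)."""
    out, seen = [], set()
    for c in criteria:
        if c.lower() not in seen:
            out.append(c)
        seen.add(c.lower())
    return out

def validate(criteria: list[str]) -> bool:
    """Validation agent: accept only non-empty, non-trivial criteria."""
    return bool(criteria) and all(len(c) > 3 for c in criteria)

def pipeline(document: str) -> list[str]:
    """Coordinate the agents with a bounded critique/refinement loop."""
    criteria = extract(document)
    for _ in range(3):  # bounded "debate" rounds to guarantee termination
        issues = critique(criteria)
        if not issues:
            break
        criteria = refine(criteria, issues)
    assert validate(criteria), "validation agent rejected the result"
    return criteria
```

The bounded loop stands in for the agents' debate: critique and refinement alternate until no issues remain (or a round limit is hit), after which a separate validator acts as a final gatekeeper.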
Qualifications
We are looking for one highly motivated and outstanding master’s student who
• is passionate about cutting-edge NLP research, especially theories and applications of large
language models in multi-agent settings;
• majors in computer science, computational linguistics, digital humanities, or a related field;
• has hands-on coding experience with Python;
• has a basic understanding of language modeling and in-context learning;
• is familiar with common deep learning frameworks and inference libraries such as PyTorch and vLLM.
We will host the student at the Linguistic Research Infrastructure, University of Zurich.