T0183


Human-in-the-loop evaluations: myth-busting and setting realistic expectations on how AI can be used in evaluations.  
Participants:
Paul Jasper (Oxford Policy Management)
Steve Powell (Causal Map Ltd)
Kecia Bertermann (Itad)
Matthew McConnachie (NIRAS)
Format:
Roundtable discussion
Mode:
Presenting in-person
Sector:
Private sector / Commercial

Short Abstract

This round-table will share practical examples of AI in evaluation, debunk myths, address risks, and discuss realistic expectations on efficiency gains, showing how AI is currently used and how it may reshape evaluators' roles and workflows in the future.

Description

Amid the hype and gloom that dominate discussions about the usefulness and problems of AI, it can be difficult to identify meaningful, practical ways in which AI is already changing and supporting our day-to-day work. This holds true for discussions about AI in evaluations. On the one hand, people worry that AI’s increasing capabilities might replace evaluators. On the other, concerns about AI’s biases and hallucinations may lead practitioners to conclude that there is no value in engaging with it for evaluation purposes. Meanwhile, evaluation commissioners expect significant efficiency gains from AI use in evaluations, while continuing to worry about fundamentals such as data privacy and leakage. This ‘myth-busting’ round-table discussion aims to cut through the noise and provide practical, hands-on insights into what it means to do ‘AI-assisted’ evaluations right now. We will bring together four experienced AI and evaluation practitioners to share experiences and examples of how AI can already add value to an evaluation workflow, how efficiency expectations in this context need to be managed, how risks can be addressed, and where, in their personal view, the ‘AI in evaluation’ voyage is going.

The round table will consist of:

· Matthew McConnachie (NIRAS) – Principal Consultant at NIRAS. Matthew will share insights on how recent developments in agentic AI help him and his team implement evaluations.

· Paul Jasper (Oxford Policy Management) – Principal Consultant and Data Innovation Lead at OPM. Paul will share examples of how AI assists with specific tasks in evaluation workflows at OPM – enabling ‘AI-assisted evaluations’ while keeping human evaluators firmly in the loop.

· Kecia Bertermann (Itad) – Associate Partner at Itad. Kecia will share examples of pairing AI specialists with subject-matter evaluators, so that domain experts shape the questions and frameworks while the AI experts translate those into effective tool configurations.

· Steve Powell (Causal Map Ltd.). Steve will share examples of how a “verifiable AI” approach to causal mapping has helped answer evaluation questions at scale.

Attendees at this session will get a sense of the practical use cases for AI in evaluations right now. The presentations and discussion will identify specific steps in an evaluation workflow where investing in AI can be considered good value for money. This will be complemented by personal assessments of where technological developments might change this picture, i.e. where advancements in AI can be expected to change how it is used in an evaluation. The insights shared will help the audience understand what reasonable expectations about AI in evaluations are, and what is and is not currently possible when it comes to assisting human evaluators with AI. Participants will also get a sense of how AI might change the role of evaluators and the tasks they carry out on a regular basis.