T0254


AI in Evaluation: Learn Tools and Workflows from UN Case Studies with >90% Validated Accuracy 
Contributor:
James Goh
Send message to Contributor
Format:
Poster
Mode:
Presenting in-person
Sector:
Private sector / Commercial

Short Abstract

Get guided, hands-on practice with AI tools and workflows used in UN case studies where AI achieved >90% validated accuracy. Activities include AI analysis of interviews, reports, and survey responses, as well as practice with features such as AI avatar interviewers, visualisations, and chatbots.

Description

As evaluation teams face growing volumes of qualitative data, tighter timelines, and rising expectations for timely learning and use, AI is increasingly positioned as part of everyday evaluative practice. Yet many evaluators remain rightly cautious: How accurate is AI compared to human analysts? Where does it genuinely add value? And how can it be used ethically, transparently, and without reinforcing bias or hallucinations?

This interactive workshop addresses these questions through real-world UN evaluation case studies, where AI methods were systematically benchmarked against human evaluators and independently validated at >90% accuracy. Rather than focusing on theory or speculative futures, the session emphasises practical workflows, governance approaches, and hands-on application that evaluators can immediately translate into their own work.

Participants will explore three applied case studies drawn from UN evaluations:

1) AI Interview Transcript Analysis (UNHCR):

AI was used to analyse 50 qualitative interview transcripts, generating thematic, subgroup, and segment-specific insights aligned with evaluation questions. Results were benchmarked against human coding and validation processes, demonstrating how AI can support rigorous qualitative analysis while dramatically reducing time and cost.

2) AI Avatar Interviewers (UNESCO):

AI avatars were deployed to conduct 50 interviews in two days, enabling multilingual, culturally sensitive data collection at scale. This case illustrates how AI can expand reach to under-represented groups, reduce interviewer burden, and support more inclusive and adaptive evaluation designs.

3) AI Document and Survey Analysis (UNICEF):

AI analysed over 700 management responses and survey entries across 160 evaluation reports in five languages, identifying cross-cutting barriers, enablers, and patterns that would have been impractical to detect manually. The case demonstrates how AI can support synthesis, learning, and utilisation across portfolios.

Beyond showcasing results, the workshop focuses on how these outcomes were achieved responsibly. Participants will learn how human-AI benchmarking was conducted, how hallucination risks were mitigated, and how ethical safeguards (human-in-the-loop review, bias checks, and transparent documentation) were embedded into evaluation workflows.

A core feature of the session is hands-on participation. All attendees will receive complimentary access to the AI tools used in the case studies and be guided through live exercises. Participants may work with their own evaluation data or with provided sample interviews, reports, and survey responses. Activities include:

- Analysing qualitative data and survey responses using AI-assisted workflows

- Creating AI avatar interviewers tailored to specific evaluation contexts

- Generating visualisations and dashboards for sensemaking and communication

- Interacting with an AI chatbot to query evaluation findings

By the end of the session, participants will leave with a clear understanding of where AI meaningfully strengthens evaluation practice, how to apply it ethically, and how it can help bridge the persistent gap between evidence generation and action. The workshop directly contributes to building evaluation cultures that value learning, timeliness, inclusion, and responsible innovation, aligning with the conference theme of “Bridging the Gap: Evaluation to Action.”