T0193


Harnessing AI for Education Evaluation: Insights from Commissioning and Practice 
Contributors:
James Merewood (RAND Europe)
Maria Pomoni (Education Endowment Foundation)
Format:
Poster
Mode:
Presenting in-person
Sector:
Nonprofit / charity

Short Abstract

RAND Europe (RE) and the Education Endowment Foundation (EEF) propose a joint presentation on AI in education evaluation. EEF will share their priorities for using AI in evaluation, and RE will present insights from developing an AI assessment marking tool. We will discuss scaling, ethics, and lessons learned.

Description

The role of AI in the evaluation of education programmes is rapidly expanding, offering opportunities to enhance efficiency, accuracy, and insight throughout the research process. Education systems are increasingly seeking innovative solutions to help improve learning outcomes; however, those commissioning and evaluating education interventions face the challenge of providing timely, actionable evidence while maintaining methodological rigour. AI technologies, particularly Generative AI, present new possibilities for addressing these challenges, especially by automating the routine tasks required throughout the evaluation process.

We propose a joint presentation by the EEF and RE, exploring AI’s role in evaluation from two complementary perspectives: commissioning and implementation. This talk would fit well within Theme 4: Evaluation in Action, as it directly explores the considerations required to integrate AI tools into live evaluations.

EEF will share their emerging interest in the application of AI within education evaluation, highlighting strategic priorities for integrating these technologies into evidence generation. This includes considerations around cost-effectiveness, scalability, and methodological integrity. EEF will also reflect on the implications for commissioners in ensuring that AI-driven approaches align with ethical standards and maintain transparency in decision-making.

RE will present insights from their current Writing Roots evaluation, an English writing intervention commissioned by EEF. Within this evaluation, RE is piloting an innovative AI-driven tool designed to mark handwritten assessments produced by children responding to the Writing Assessment Measure prompt. Marking this type of assessment is traditionally resource-intensive and time-consuming. The AI tool RE has developed interprets and scores handwritten assessments, and outputs scores for each script in a format which can be directly analysed. This tool aims to reduce evaluator burden while maintaining reliability and validity. We will share lessons learned from the development process, validation results, and practical challenges encountered in integrating AI into a live evaluation project. These include technical and operational issues, such as ensuring fairness and avoiding bias in automated scoring, as well as ethical considerations and the information provided to participants.

Together, EEF and RE will discuss broader implications for scaling AI-enabled approaches in educational evaluation. Key themes will include ethical considerations, such as data privacy and transparency, alongside reflections on how AI can support evaluators in delivering timely insights for policy and practice. Our presentation will also consider the future research agenda, exploring what evidence is needed to build confidence in AI-driven evaluation methods and how commissioners and evaluators can collaborate to ensure these tools are deployed responsibly.