Accepted papers
Session 1
Paper short abstract
The K+ programme is King’s College London’s flagship outreach activity to support progression into higher education. We demonstrate how evaluation activities, ranging from pre-post surveys to multiple RCTs, have shaped and influenced the programme, and how evaluation is embedded within it.
Paper long abstract
The K+ programme is King’s College London’s flagship outreach activity to support progression into higher education (HE) for sixth-form students from under-represented backgrounds (e.g. non-selective state schools or first in family to progress to HE). The two-year programme consists of events and activities to equip students with the confidence, knowledge and skills to succeed at university. Approximately 600 students join each year, so around 1,200 students are enrolled in the programme at any one time, organised into nine pathways based on the subjects they wish to study.
Students begin their first year with a welcome induction event, also attended by parents and carers, to launch the programme, followed by a non-residential spring or summer school offering academic lectures, careers experiences and bespoke skills-based workshops. Their second year focuses on the skills needed to succeed in their A-levels and on their transition to university, including support with UCAS applications. Alongside the core K+ programme, additional intervention components are delivered to support students of Black and mixed heritage and LGBTQ+ students, and to raise attainment.
The K+ programme has undergone several rounds of revision throughout its lifecycle and has been subject to a range of evaluation activity, from pre-post surveys, to examinations of individual components, to two RCTs currently underway (one in collaboration with TASO) aiming to demonstrate the causal impact of the programme on student progression to HE.
We demonstrate how the K+ programme has been shaped and informed by evaluation throughout its lifecycle and how this body of evaluative work has been built upon, including the emergence of longer-term findings. This covers multiple evaluation designs, from pre-post surveys, process evaluations and participant focus groups to causal designs including two RCTs. It also includes the use of validated measures such as the Access and Success Questionnaire (ASQ) and the Academic Behavioural Confidence – Revised (ABC-R) scales; the ABC-R is a recently validated measure developed jointly by the Social Mobility and Widening Participation team and the Institute of Psychiatry, Psychology & Neuroscience at King’s.
We will showcase how evaluation activities are embedded throughout regular delivery, including the use of pre-post surveys within programme activities to permit consistent monitoring of impact alongside open-ended student feedback. In addition to showcasing evaluation activities, we will bring together perspectives from those delivering the programme on the practicalities of embedding evaluation and of engaging with evaluation to effect change.
Paper short abstract
How do you establish a new impact system and build an evaluation culture across a global non-profit with a lean team? This is a case study on co-creating an M&E framework at a global food awareness charity.
Paper long abstract
A robust evaluation culture is a goal for many charities, but the path to achieving it can be unclear. This is particularly true for small teams in large, international organisations. This presentation details a practical case study of a newly appointed Head of Impact tasked with building an impact system from scratch at ProVeg International, a food awareness organisation operating in 15 countries with ~250 staff.
With a lean team of 2.6 FTE, the core challenge was not just to design a technical framework but to embed an evaluative culture that empowers programme teams to value, use, and generate evidence. This case study addresses the administrative burden and fragmentation of a legacy system based on inconsistent Google Sheets. The session will detail the co-creation process, which began with 31 stakeholder interviews, and outlines a new system using technology (Google Forms, Zapier, and Google BigQuery) to establish a "single source of truth." We leveraged automation to free up time for busy project managers, providing a streamlined system that replaces manual data collection wherever possible.
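As an illustration of the "single source of truth" pattern, the sketch below shows the kind of ingestion step such a pipeline automates: batches of form responses appended to one central BigQuery table. This is a minimal sketch only; the system described above wires this up with Zapier rather than custom code, and the project, table and field names here are entirely hypothetical.

```python
from google.cloud import bigquery

# Hypothetical destination table; the live pipeline described in the abstract
# uses Zapier for this wiring rather than custom code.
TABLE_ID = "proveg-impact.monitoring.programme_outputs"

client = bigquery.Client()  # authenticates via application-default credentials

def ingest_form_responses(responses: list[dict]) -> None:
    """Append a batch of form responses to the central impact table."""
    rows = [
        {
            "submitted_at": r["timestamp"],   # hypothetical response fields
            "country_team": r["country"],
            "metric": r["metric"],
            "value": r["value"],
        }
        for r in responses
    ]
    errors = client.insert_rows_json(TABLE_ID, rows)  # streaming insert
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

Centralising every programme team's data in one queryable table, rather than scattered spreadsheets, is what makes the cross-country comparison of unified variables described below feasible.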
This approach establishes best practices for measuring impact in 'hard to measure' areas like corporate engagement, policy advocacy, and public campaigns. We have also moved beyond simple output metrics to establish unified variables that allow us to compare impact across our diverse country teams. The system includes an option for "deep dive" case studies that use methods like process tracing to uncover why projects succeed. In parallel, we developed "scenario models" to estimate impact against core metrics like CO2 emissions averted and animal consumption avoided. This approach provides a model for how other organisations with limited resources can create a unified system that serves multiple purposes—from strategic planning and fundraising to communications and partnerships—shifting from top-down reporting to a bottom-up culture of reflection and learning.
Paper short abstract
This study critically evaluates Botswana’s Scholarship Programme through a Tswanacentric Capabilities lens, exploring whether recipients truly flourish and attain Seriti. It blends critical realist and human development approaches to reframe education policy evaluation around indigenous values.
Paper long abstract
This study emerges from a deeply personal and critical reflection on my own journey as a recipient of Botswana’s Top Achievers Scholarship Programme (TASP). Despite graduating with a master’s degree from a globally ranked institution under the full sponsorship of the Government of Botswana, I found myself grappling with the sense that I was not ‘flourishing’. While the lack of meaningful employment was certainly a significant factor, I sensed that my lack of flourishing was shaped by deeper, less tangible issues. I reimagine flourishing not merely as an economic or professional outcome, but as a condition for possessing Seriti, the moral and spiritual weight that defines one’s dignity, purpose, and standing in the community. Without flourishing, one’s Seriti is diminished. Conversations with fellow TASP recipients revealed similar experiences—a disconnect not just between our educational investments and the opportunities to apply our skills, but between the paths we chose and the sense of purpose we hoped to find.
We were selected in our late teens and handed a rare and extraordinary gift: the freedom to study anything, anywhere in the world, fully funded by our government. While the openness of opportunity offered by TASP can be empowering, it also introduces a paradox of choice that may complicate graduates’ ability to navigate toward a life of meaning. When post-graduation outcomes fall short of expectations, heightened reflexivity may lead some to perceive themselves as lacking Seriti.
This study investigates whether Botswana’s TASP has enabled its recipients to truly flourish—and, in doing so, attain Seriti. To understand this, we engage in a critical realist evaluation of the programme, incorporating a tracer study of scholarship recipients, and then extend the analysis to a broader human development evaluation.
Crucially, this study builds on Nussbaum’s emphasis on protecting those capabilities whose absence would render a life not worthy of human dignity (Nussbaum, 2011, p. 15). Seriti is more than a linguistic equivalent of dignity; it reflects a deeply rooted cosmological understanding of what it means for individuals to live a life they value in Tswana culture.
Situated within a Critical Realist ontology, this study rejects reductionist explanations and seeks to uncover the real but often unobservable mechanisms within higher education finance that shape graduates’ capacity to flourish—understood here as the attainment of Seriti. Methodologically, it employs a tracer study design, combining regression analysis with qualitative interviews to uncover causal effects and the deeper structural and cultural mechanisms that condition them.
My positionality as a TASP graduate now undertaking a PhD grounds this inquiry in both critical reflection and insider understanding. It enables me to engage empathetically and thoughtfully with other recipients. Ultimately, we seek to marry traditional education policy evaluation with a more expansive, human development–centered and culturally contextualised approach, recognising the importance of indigenous values and lived experiences in shaping meaningful educational outcomes.
Paper short abstract
Our participatory evaluation of Into the Light uses creative methods and systemic mapping to embed learning in cultural practice. This session explores how inclusive evaluation cultures can inform policy, challenge conventions, support regeneration, and inspire sector-wide change.
Paper long abstract
This presentation shares insights from the evaluation of Into the Light, an ambitious, place-based cultural programme designed to support cultural regeneration, talent development, and inclusive participation across County Durham over three years. Grounded in Socio-Cultural Historical Activity Theory (SCHAT), the ongoing evaluation adopts a participatory, arts-based approach to embed learning and reflection within everyday cultural practice.
Work Package 1 focused on co-developing an inclusive evaluation framework and toolkit, shaped by a review of UK and European place-based programmes. This framework supports capability-building among cultural practitioners through co-designed tools, inclusive principles, and creative methods that embed evaluation into everyday cultural practice.
Work Packages 2 and 3 apply creative methods, including Photovoice, LEGO® Serious Play, philosophical dialogues, and community storytelling, to surface lived experience and identify contradictions within the programme. These tensions are reframed as opportunities for transformation, supporting iterative learning and collaborative sensemaking.
A key strand of the evaluation is the development of a Community of Practice across County Durham, bringing together freelance creative practitioners, cultural organisations, educators, and community partners. This network is designed to support peer learning, reflective practice, and shared inquiry, enabling practitioners to engage with evaluation not as a separate activity but as an embedded part of cultural development. Through co-created resources, thematic workshops, and collaborative storytelling, the Community of Practice fosters a culture of openness, experimentation, and mutual support. It also provides a platform for surfacing diverse perspectives and amplifying voices that are often underrepresented in formal evaluation and policy processes.
The evaluation challenges conventional hierarchies by positioning participants as co-researchers and valuing narrative, visual, and performative data alongside traditional evidence. It also demonstrates how creative inquiry and arts-based methods can deepen understanding in complex systems, offering cultural organisations, governing bodies, regional and national policy actors, cultural funders, and strategic decision-makers new ways to surface insight, foster collaboration, and amplify community voice.
Embedded within Durham University’s Policy Hub, the evaluation contributes to cultural policy by generating inclusive, context-sensitive insights. It informs strategies for regeneration, workforce development, and place-shaping, demonstrating how embedded evaluation cultures can support both practice and policy. The approach offers a transferable model for other regions and sectors seeking to embed evaluation in complex, creative, and community-led contexts.
This session offers an original contribution to the field by showcasing how evaluation can be rigorous, relational, and responsive, supporting transformation across complex systems through creative and participatory approaches. The session will also reflect on how this approach can be adapted for other regions and sectors, contributing to wider learning in cultural evaluation, public policy, and community-led development. It invites dialogue on how creative methods can support inclusive decision-making and long-term cultural change.
Paper short abstract
Integrating evaluation into decision-making presents challenges. This presentation shares strategies for engaging stakeholders, building evaluation allies, and expanding into new areas. Gain practical insights to foster a culture where evidence is valued and used effectively.
Paper long abstract
Integrating evaluation into daily decision-making processes poses a significant challenge for large organisations, and National Highways is no exception. This presentation delves into the intricate task of establishing a clear vision for evaluation, particularly in the face of influential stakeholders who may question its relevance or resist its adoption. Drawing from recent experiences in developing National Highways’ evaluation strategy for their upcoming five-year funding period, we will explore how evaluation leaders can maintain resilience and adaptability while upholding professional integrity and fostering motivation and ambition within their teams.
The session will outline effective strategies for understanding and engaging with challenging stakeholders and making a compelling case for the importance of evaluation. We will discuss how to navigate internal resistance while addressing external demands for evaluation evidence, as well as identifying and managing the reputational risks associated with such pressures. Additionally, we will focus on identifying and nurturing evaluation allies, individuals who can help build momentum and legitimacy throughout the organisation.
Special emphasis will be placed on expanding evaluation into new and emerging areas where it is difficult to show evidence of its impact. Demonstrating value in these contexts requires strategic thinking, creativity, and persistence. The presentation will also address how to tackle resource challenges posed by stakeholders by enhancing evaluation efficiency and improving the usefulness and usability of evaluation evidence.
Paper long abstract
Our approach to evaluation culture is to get everyone involved. We will do the same in this session, engaging conference participants using tools that we have found effective in our own practice. We will discuss concrete approaches to embedding evaluative cultures, presenting our experiences and the challenges we have faced in instilling evaluation culture from within an organisation. We encourage others to share methods that have worked for them, as well as obstacles they have encountered.
Imperial is a world-leading university for science, technology, engineering, medicine and business (STEMB), with a wide portfolio of outreach and public engagement work. As in-house evaluators in this setting, we will share tools we have used to build trust, relevance, and agency into an evaluation culture that values and benefits from different perspectives.
We will cover a range of scenarios and starting points, from working with colleagues who are confident using surveys but don’t feel a sense of ownership of their evaluation, to others who aren’t sold on the value of evaluation at all. We will discuss how we have approached each of these as professional evaluators and the tools that have helped us build these cultures.
One of the key methods that has helped unify our evaluative approaches is the co-creation of shared outcomes. These outcomes provided essential buy-in from stakeholders and developed a shared language and sense of purpose, anchoring the benefits and need for evaluation beyond data collection. We will touch on how our shared outcomes helped us navigate organisational changes and set us up to protect programmes and communicate our shared purpose across an organisation of over 8,000 members of staff and 22,000 students.
With a show-don’t-tell approach, we will demonstrate some of the simple tools we have found effective for encouraging engagement, generating discussion around the challenges and opportunities of building evaluation culture. We will candidly share challenges we have faced with data availability, as well as with over-collection of data that is not used to its full potential.
Central to this session will be engaging conference participants in discussions of what has worked well (and not so well), using some of the tools we have implemented with colleagues at Imperial. We will create a safe space to grapple with challenges, discuss opportunities, and scaffold key take-aways for conference participants.
Paper short abstract
Involving young people in decision-making can benefit the community and improve local services. This case study assesses the influence of youth engagement on local government decision-making by examining local leaders’ perspectives on how it translates to actionable insights.
Paper long abstract
A recent review undertaken by the UK government identified that young people wanted to be involved in decision-making processes. Involving young people in decision-making can benefit the young people themselves and the wider community by shaping and improving local services. However, little is known about the policy impact of youth engagement. Understanding the mechanisms through which youth engagement translates to actionable insights for decision-makers could help local governments strengthen youth engagement to shape local policy decisions. This case study, involving document analysis together with interviews and a focus group with leaders in a local authority in England, describes how youth engagement can inform decision-making and what factors reinforce or weaken these processes. The study found that even where a range of youth engagement activities is supported, the absence of strategic corporate commitment can result in an approach that is fragmented, without adequate resource to ensure insights reach the relevant decision-making forums. Services and policies are more likely to change where the pathway from insights to service provision is short, for example in Children’s services, where service providers directly seek insights from the young people they support. However, outcomes were not routinely fed back to young people, and their input was not consistently acknowledged in relevant strategies; this may limit ongoing engagement. Creating a broader organisational culture that values youth engagement requires leaders willing to challenge the status quo and demand consideration of young people’s perspectives. This could involve adopting a systematic approach to embed youth engagement into key decision-making structures within the local authority.
Paper short abstract
All evaluations require good governance and adaptation, but these take on new meanings and importance in long-term evaluations of new interventions. In this session, the commissioner and evaluator reflect on how to build effective evaluation cultures in lengthy and novel evaluations.
Paper long abstract
Commissioning Better Outcomes (CBO) was funded by The National Lottery Community Fund. It operated from 2013 to 2024, with a mission to support the development of more social outcomes contracts in England. It made up to £40m available to pay for a proportion of outcomes payments for social outcomes contracts (SOCs, previously known as social impact bonds (SIBs)) commissioned locally (i.e. by local authorities, clinical commissioning groups, police and crime commissioners etc.; hereafter referred to as ‘commissioners’). Alongside the CBO programme, The National Lottery Community Fund commissioned Ecorys and ATQ Consultants to evaluate the CBO and to explore the ‘SOC Effect’. Running from 2013 to 2025, the evaluation aimed to explore the advantages and disadvantages of commissioning via a social outcomes contract; the challenges in developing social outcomes contracts and how they can be overcome; and the extent to which CBO met its aim of growing the market for social outcomes contracts.
At the time of commissioning the evaluation, SOCs were a very new mechanism, with limited examples of how they had been evaluated previously. Furthermore, the evaluation was over a very long timescale – 12 years. All evaluations require good governance, strong working relationships and adaptation, but these take on new meanings and importance in an evaluation of such novelty and duration. This session highlights the key learnings of how to develop an effective evaluation culture that stands the test of time, drawing on both the commissioner (The National Lottery Community Fund) and evaluator (Ecorys) perspectives. In particular, it encourages stakeholders to be cognisant of, and embrace, the Forming, Storming, Norming, Performing process that takes place in any new team.
Paper long abstract
Evaluation policies play a critical role in shaping evaluation practice and outcomes. However, their development and theoretical foundations have received limited scholarly attention. Such research is important as it reveals how earlier policies inform subsequent policymaking and how policy can serve as a bridge between theory and practice by embedding theoretical concepts into organizational requirements (Klein and Marmor, 2008; Christie and Lemire, 2019).
This study traces the evolution of federal evaluation policy in Canada from 1977 to 2016, analyzing six evaluation policies using Al Hudib and Cousins’ (2022) ten-component taxonomy. Findings reveal both continuity and incremental change in policy content, with certain expectations for evaluation practice persisting over time while others have shifted in scope, emphasis, and language. By linking these patterns to broader theoretical and historical influences, the study demonstrates how evaluation policy functions as an instrument that reflects, reinforces and institutionalizes prevailing evaluation theories. In particular, findings highlight how successive policy iterations embed conceptual and methodological assumptions that shape evaluative action and institutional norms, effectively bridging the gap between theory, policy, and practice. Understanding these dynamics provides insights for policymakers and evaluators seeking to design policies that better support effective, theory-informed evaluation practice and contribute to the ongoing strengthening of results-based governance.
Paper short abstract
Mercy Corps’ GIRL-H evaluation embedded learning within adolescent girl programming in six countries in East and West Africa. By applying participatory methods, a learning agenda and iterative reflection cycles, the evaluation enhanced adaptive learning, supported inclusion, and program development.
Paper long abstract
This paper presents how the Mercy Corps GIRL-H programme integrated learning and programme development through a deliberately cultivated evaluation culture. GIRL-H provides tailored interventions for adolescent girls and young women to gain skills and transition on pathways to formal education, economic opportunities, and civic engagement. GIRL-H has operated in Kenya, Nigeria, Tanzania, Uganda, South Sudan and Sudan since 2020.
The multi-country GIRL-H evaluation, published in 2024, informed programme adaptation. The evaluation employed participatory qualitative tools, notably the River of Life, enabling adolescents to narrate their own journeys using locally accessible materials. This co-creative method surfaced insights on programme relevance within peer networks and communities. The collected data were analysed using MAXQDA AI Assist to draw out trends and common patterns across the dataset, and the findings were used to reflect on contributions and constraints during participatory analysis. Informed by data from regular review meetings, monitoring visits and learning sessions, the programme made significant adaptations, such as in financial inclusion and social and behaviour change communication (SBCC). Crucially, learning sessions brought together mentors, enumerators, and programme participants to co-interpret results and guide course corrections in real time. The paper includes reflections on power dynamics, inclusion (e.g. whose voices were heard), and ethical tensions in conducting and using the evaluation.
Evaluation findings affirmed the importance of mental health and psychosocial support and SBCC to address harmful gender norms, while noting that more time and resources are required for meaningful norms change in communities. Based on participatory interpretation, the evaluation influenced mid-course adjustments and shaped partner decisions about scaling these components. This case contributes to evaluative practice by demonstrating how embedding routine evidence-based learning informed by programme monitoring and learning data into programme management processes and decision-making can shift an organisation toward being reflexively evaluative and how participatory methods enrich both uptake and ownership.
Paper short abstract
In 2019, UK lung cancer survival rates hadn't improved in 50 years. NHS England initiated a targeted screening programme for early detection. Ipsos and the Strategy Unit evaluated this, providing real-world delivery insights which helped inform the UK’s decision for national roll-out.
Paper long abstract
The UK has historically lagged behind comparable countries in cancer survival rates, emphasising the need for earlier diagnosis. In 2019, the NHS set a target to increase early-stage (1 and 2) cancer diagnoses from half to three-quarters by 2028. Lung cancer accounted for 21% of cancer deaths, making it the most common cause of cancer death in the UK, with late-stage diagnosis being a critical issue.
NHS England's Targeted Lung Health Check (TLHC) programme (2019-2024) was initiated to enable earlier lung cancer diagnosis in real-world settings, following positive results from several small-scale trials and pilots.
Ipsos, working with our data partners the NHS Strategy Unit, was commissioned to conduct a process, impact and economic evaluation of the programme. The main objective of the evaluation was to assess whether the encouraging results shown in earlier trials were replicated when the programme was delivered in real world NHS settings. The main outcome of interest was to assess whether there was a shift in cancer staging at diagnosis. We were also tasked with exploring how effectively the programme was delivered, what participants thought of the health checks, and to provide advice on how the programme should be rolled out in future.
During the lifetime of the programme, the UK National Screening Committee recommended that a national lung cancer screening programme should be initiated, and NHS England is now working on national roll-out.
The evaluation showed that 1.22 million invitations were sent, with an overall 44% uptake rate, leading to 324,000 Lung Health Checks and 163,000 CT scans. A total of 2,748 participants received a lung cancer diagnosis, representing a 1.7% conversion rate from initial CT scan. Approximately 75% of these cancers were diagnosed at stages 1 or 2, meeting key benchmarks for early detection. Furthermore, 2,056 other cancers were diagnosed and the programme identified incidental findings in three-quarters of CT scans.
The robust quantitative impact evaluation – which used a Propensity Score Matching and Difference-in-Differences methodology – estimated that an additional 781 lung cancers were diagnosed at stage 1 or 2 that would otherwise have been diagnosed at a later stage or not diagnosed at all. The programme also enabled the detection of an additional 341 lung cancers at stage 3 or 4.
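For readers unfamiliar with the approach, the sketch below shows the general shape of a Propensity Score Matching plus Difference-in-Differences analysis in Python. It is a minimal illustration under assumed data only, not the evaluation's actual model: the dataset, covariates and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Hypothetical panel: one row per area and period ("pre"/"post"), with a
# treated flag for programme areas and baseline covariates.
df = pd.read_csv("areas.csv")

# 1) Propensity scores: model the probability of being a programme area.
covs = ["deprivation_index", "smoking_rate", "age_65_plus_share"]
baseline = df[df["period"] == "pre"].copy()
ps = LogisticRegression(max_iter=1000).fit(baseline[covs], baseline["treated"])
baseline["pscore"] = ps.predict_proba(baseline[covs])[:, 1]

# 2) Match each treated area to its nearest comparison area on the score.
treated = baseline[baseline["treated"] == 1]
control = baseline[baseline["treated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched = pd.concat([treated["area_id"], control.iloc[idx.ravel()]["area_id"]])

# 3) Difference-in-differences on the matched panel: the treated:post
#    interaction is the estimated effect on early-stage diagnoses.
panel = df[df["area_id"].isin(matched)].copy()
panel["post"] = (panel["period"] == "post").astype(int)
did = smf.ols("early_stage_diagnoses ~ treated * post", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["area_id"]}
)
print(did.params["treated:post"])  # the DiD estimate
```

The matching step constructs a comparison group that resembles programme areas on observed characteristics, so the subsequent DiD interaction isolates the change attributable to the programme rather than to pre-existing differences.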
While no immediate impacts were seen on lung cancer mortality rates, this aligns with clinical expectations within the timeframes of the evaluation. Programme challenges included lower participation rates in deprived areas and among ethnic minorities, despite projects reporting that they delivered engagement strategies to address this. However, the programme as a whole was delivered in some of the most deprived areas in England, and these areas have therefore disproportionately benefitted. High delivery costs, largely due to staffing, highlighted the complexity and resource demands of implementation, a finding supported by testimonies from various projects.
Insights from the evaluation have been instrumental in shaping the national rollout strategy. NHS England is integrating findings to optimise programme delivery, addressing engagement disparities, driving overall uptake and focusing on engaging the most high-risk individuals.
Paper long abstract
Developing theories of change for influencing and diplomacy interventions is knotty and hard, due to their propensity for multiple futures, their penchant for potential-building interventions that have no causal pathways (yet), and their frequent integration with programmatic interventions.
Developing useful and useable theories of change for influencing is challenging, but not impossible. In this talk we will discuss the features of these types of theories of change that are distinctive, and how to take account of these when delivering theory-based evaluation.
We will start the session by outlining common features of influencing theories of change, and how this impacts evaluation considerations. We will also outline the importance of articulating influencing theories of change well, and their power as a communication and sensemaking tool, as well as an analytical one.
We will then showcase two examples of theories of change for influencing, and how we were able to structure a theory-based evaluation around them. These examples will cover (i) a reconstructed portfolio-level influencing theory of change and (ii) an influencing intervention that is integrated with a programme, where the theory was designed with the support of the evaluator at the start of the intervention. We will engage the audience on these two scenarios, using a voting system to see if they can identify the challenges and opportunities that arose in each scenario, as well as how these theories were used to catalyse action at different points of the evaluation.
We will then workshop a scenario with the audience, where the presenters represent an intervention designer and an evaluator respectively. The scenario will be an intervention designed by a think tank to influence policymakers to improve equity legislation through research provision, and the evaluator must design a theory-based evaluation. The presenters will roleplay a theory of change discussion typical of influencing interventions, and the audience will be invited to join in and support the poor evaluator who is suffering in the discussion. Any audience members who may have attended our potential (to be finalised) UKES training on influence and diplomacy monitoring and evaluation will be invited to participate first due to greater familiarity with the subject matter (though noting this in no way overlaps with our proposed training content). If we have a large group, we will pivot to two breakout groups and draw on additional Integrity colleagues to roleplay the discussion.
Relevance to the theme: this is relevant to theme 3, ‘Communicating evaluation for action’. Well-articulated and easily accessible theories of change are essential for theory-based evaluation: the better the theory of change, the more actionable and understandable these evaluations are. Our work is centred on the articulation of useful theories of change in influencing evaluations, and on using behavioural insights and scenario planning methods to support their creation. We will also aim to cover how evaluators communicate these theories and their use in evaluation processes, as well as how to use them in sensemaking processes.
Paper long abstract
I am excited to submit a poster presentation to showcase our approach and findings from the evaluation of the Arts and Humanities Research Council's (AHRC) Follow on Fund (FoF) scheme and how they have informed the new evolution of the programme.
For over a decade, the AHRC FoF scheme has supported researchers in transforming arts and humanities insights into tangible change across knowledge exchange, skills development, commercialisation, policy engagement, and public life. As the scheme reached its fifteen-year mark, AHRC commissioned an independent evaluation to explore its effectiveness, relevance, and future direction.
This evaluation took place against a backdrop of increasing expectations for publicly funded research to demonstrate impact beyond academia. While this imperative spans all disciplines, what constitutes ‘impact’ (and how it unfolds) differs markedly between fields. In the arts and humanities, pathways to impact are often non-linear, relational, and co-produced, contrasting with the more structured trajectories typical of science and innovation funding (such as TRLs). The evaluation sought to reflect these distinctive pathways while recognising the growing policy interest in the economic and societal value of arts and humanities research.
To capture the richness of ten years’ worth of evidence, we took an exploratory approach, co-developing and iterating a Theory of Change with AHRC and using Outcome Harvesting to understand and analyse the full body of evidence. This approach allowed the team to identify, substantiate, and analyse hundreds of outcomes (and trace these back to the AHRC FoF scheme), capturing nuanced examples of how arts and humanities research creates social, cultural, and economic value.
The evaluation confirmed that FoF is a valued and effective part of the funding landscape. Between 2015 and 2024, FoF awards leveraged £193 million in further funding, outperforming comparator schemes. Furthermore, many FoF awards were referenced in the 2021 Research Excellence Framework (REF) impact case studies, underlining the scheme’s significant contribution to research impact and reach across the UK.
Yet, the evaluation also surfaced a clear message: there remains untapped potential, particularly in supporting projects to drive economic impact, commercialisation, and bold new pathways to impact. These insights came at a pivotal time, as the arts and humanities sector (and the funding system more broadly) grapples with how to demonstrate value and relevance in a changing innovation landscape.
The poster will showcase how Outcome Harvesting and iterative Theory of Change development can be used to generate actionable insights even late in a programme’s lifecycle. It will share practical lessons on how evaluative learning can inform funding redesign, demonstrating that it is never too late to evaluate, reflect, and adapt. Ultimately, it argues that embracing discipline-sensitive definitions of impact and adaptive evaluation methods is essential to supporting the full potential of arts and humanities research to contribute to the UK’s economic, cultural, and societal wellbeing.
Paper short abstract
WYCA’s new Outcomes Framework embeds evaluation into strategy, translating complex evidence into actionable insights that can guide regional investment and strengthen feedback loops. This work has solidified a focus on outcomes and will help promote a culture of evidence-based decision-making.
Paper long abstract
The West Yorkshire Combined Authority (WYCA) is a Mayoral Strategic Authority with responsibilities across transport, culture, skills, housing, and economic development. In recent years, our Evaluation Team has worked to embed robust monitoring and evaluation practices across an organisation undergoing rapid change and under pressure to deliver at pace. We have had to balance the urgency of tackling entrenched inequalities with the need for evidence-based decision-making and learning from past interventions.
As our ambitions expand—particularly in areas such as mass transit, bus franchising, and home retrofit—while funding remains constrained, the need to prioritise investment around strategic outcomes has become increasingly important. Equally, to support learning and accountability, we must distil complex evaluation evidence from a multibillion-pound investment portfolio into clear, actionable insights for senior leaders and elected members.
In response, throughout 2025, the Evaluation Team has led the development of an overarching Outcomes Framework. Grounded in WYCA’s Local Growth Plan and other strategies such as the Local Transport Plan, the framework identifies key outputs and outcomes across policy areas, supported by both intervention-level and regional metrics. This enables us to track progress at a regional level while assessing the contribution of individual programmes.
The framework is summarised in a series of one-page logic models. Whilst these are deceptively simple, we developed them through extensive stakeholder engagement, negotiation of competing priorities, and alignment with national devolution targets.
We showcase the development of an outcomes framework as a practical strategy for generating and disseminating evaluation evidence in ways that meaningfully inform strategic decision-making, by gaining buy-in from organisational leadership up front. By using theory of change models, evaluation practitioners can provide strategic clarity and lead conversations around organisational outcomes and objectives. As the outcomes framework becomes established, it will help embed robust and proportionate approaches to evaluation within a complex, multi-stakeholder public sector organisation. This case study highlights the critical role of sustained communication and relationship-building with a range of stakeholders in ensuring that evaluation has real-world impact and can inform better policy-making and delivery.
We will share our approach to developing this framework through iterative stakeholder engagement and explore its implications for improving evaluation design, evidence use, and strategic learning. We argue that a shared set of outcomes and metrics can strengthen the ROAMEF cycle, support continuous improvement, and embed evaluation more deeply into regional strategy and decision-making, offering applicable insights for attendees working across diverse evaluation contexts.
Paper short abstract
Whole systems approaches and realist evaluation are positioned as antidotes to reductionist methods, due to their preoccupation with understanding the role of multiple layers of context and causal forces. Here, we communicate findings about how practitioners value and use them to inform their work.
Paper long abstract
Pressing healthcare issues and health inequalities are recognised as complex problems that are irreducible to their constituent parts. Evaluation approaches suited to complexity, including whole systems approaches and realist evaluation, have burgeoning credibility in their ability to account for learning and innovation across complex issues. However, their deployment is often fraught with challenges, and understanding of how stakeholders become engaged in these approaches and integrate cycles of learning is lacking. Questions exist surrounding how and in what ways stakeholders react to this “participation” in complexity-congruent evaluation and how this evidence is valued and used. The aim of this research was to understand how large-scale transformation of whole systems realist practice and evaluation occurs, for whom, and in what circumstances.
The National Evaluation and Learning Partnership, commissioned by Sport England, has worked collaboratively with a wide range of place partnerships engaged in whole system place-based approaches to tackle physical inactivity. The team has supported places to explore how Place Partnerships can build capacity to undertake appropriate evaluation. A focus of the work has been to substantially raise capability in whole systems realist evaluation. Drawing upon a bricolage of participatory evaluation methods, this approach has worked with places to appreciate the importance of complexity and the conditions for change, and then enable them to operationalise realist-informed evaluation methods. In this paper we reflect on the findings from 11 realist interviews with stakeholders who have been engaged in this place partnership journey, to explore ideas on how capability and consciousness may develop to inform everyday decision-making and delivery.
Emerging results verify that places initially require an increased recognition of the need to accept uncertainty and alternative evaluation approaches. A prominent feature was the need for a senior leader who advocates for, supports, and facilitates change by “feeding the beast” of traditional ways of thinking whilst highlighting the need for broader ways of capturing impact. Another resource influencing change was the presence of a credible external voice who “fights the corner” of innovative ways of thinking. Findings indicate that once places understand complexity, they become ready to alter practices. The influence of funder expectations, engrained beliefs about evaluation as a performance metric, and the role of shared social spaces for knowledge exchange were prominent. Commissioned activities and external frameworks can be persuasive due to the competitive landscape, meaning organisations will conform to meet funders’ requirements. However, in other instances, without enforced expectations, some used the approach to embellish their work. The evolution of places towards being reflexive with cycles of learning was not as discrete; it was often complicated by the various levels of the system and by trying to influence multiple varying agendas. Often, this cross-boundary work required “translation”, which many within the system found alien.
Sustainable uptake of whole systems place-based realist work is influenced by historical practices of evaluation, enduring beliefs about practice, the funding landscape, the provision of external support and social spaces, the wider stakeholder belief system, and the interplay of senior and middle management in discursive ways.
Paper short abstract
Explores how feminist evaluation strengthens climate governance in Kenya, Nigeria, and Pakistan. Highlights participatory learning, gender gaps, and proposes practical principles to embed accountability, inclusion, and gender-sensitive evidence use in policy.
Paper long abstract
Embedding Feminist Evaluation Cultures in Climate Governance: Insights from Kenya, Nigeria & Pakistan
Aligned with the UK Evaluation Society 2026 Conference theme “Bridging the Gap: Evaluation to Action,” this paper explores how feminist evaluation cultures can transform climate governance systems in Kenya, Nigeria, and Pakistan. Climate adaptation frameworks increasingly commit to gender equity, yet evaluation practice still prioritises technical indicators over lived experience, learning, and accountability.
This study examined how evaluation systems recognise or marginalise women’s climate knowledge and agency. Using a systematic qualitative analysis of climate policies, M&E frameworks, and evaluation reports, the research analysed how evaluation culture shapes equity and learning. Findings reveal that Kenya’s decentralised structures foster participatory learning and feedback, while Nigeria’s centralised, externally driven evaluation limits gender accountability. Pakistan’s dryland agriculture context illustrates risks where climate-smart frameworks and trade systems exacerbate women’s unpaid labour and water burdens when gender-sensitive evaluation is absent.
The paper proposes feminist evaluation principles that strengthen local learning cultures and promote relational accountability, inclusion, and epistemic justice. These principles offer evaluators practical guidance for embedding gender-sensitive evaluation approaches that ensure women’s knowledge and experiences inform climate action. By bridging evaluative insight and policy change, feminist evaluation cultures can help realise climate justice in the Global South.
Keywords: feminist evaluation; climate governance; participatory learning; evaluation capacity; gender equity; Global South.
Speaker Bio
Cynthia Jebichii KERING is a Gender and Development scholar and evaluation researcher completing her Master’s degree at Keele University, United Kingdom. Her work focuses on feminist evaluation, climate governance, and gender-responsive public policy in Africa. She has researched comparative gender-responsive climate adaptation in Kenya and Nigeria and explored climate-smart agriculture and women’s economic precarity in Pakistan. Her research advances participatory and equity-driven evaluation approaches that centre women’s lived knowledge in climate decision-making systems.
Paper short abstract
Evaluability assessments (EAs) are tools that can help evidence impact and bridge the evaluation-action gap. We conducted four EAs with organisations undertaking dog population management (DPM) to strengthen their monitoring and evaluation capacity and to advocate for humane DPM globally.
Paper long abstract
Background
Evaluability assessment (EA) is a quick and useful tool that can support organisations facing challenges in demonstrating impact. Recent applications of EA have supported evaluation planning and the improvement of monitoring and evaluation (M&E) systems (Hamilton-West et al., 2019). Within the field of dog population management (DPM), numerous organisations around the world conduct passionate and intensive work to humanely manage dogs, yet they lack the necessary M&E knowledge and tools to evidence the impact of their work (Hiby et al., 2017). Animal welfare organisations such as the International Companion Animal Management Coalition (ICAM) are working to overcome these challenges by investing in research and methodological expertise to support charities and local governments carrying out DPM to increase their M&E capacity. The aim of this research was to demonstrate how EAs can bridge the evaluation-action gap by increasing M&E support to organisations carrying out DPM, helping them learn how to evidence their impacts. In doing so, successful case studies may be used to champion humane DPM globally.
Methods
An M&E team, formed through a partnership between ICAM and the University of Glasgow, comprised evaluation scientists, DPM experts and epidemiologists with expertise in quantitative methods. The team worked collaboratively to provide direct support to a selected group of organisations implementing DPM. We conducted four EAs with organisations located in Thailand, Sri Lanka, Georgia, and India. For each organisation, the EA process comprised three participatory workshops (one online, two in person) to meet with stakeholders, co-develop a theory of change, prioritise outcomes, and identify key performance indicators, data availability and data needs. The process for each culminated in a clear and actionable set of M&E recommendations co-developed with the local organisation. After recommendations were identified, data experts worked intensively and collaboratively with the organisations to share, analyse and interpret data to showcase the impacts of their DPM activities.
Results
The four organisations that participated in the EAs had varying levels of M&E capacity. Three were collecting data on their DPM efforts, with basic analysis, interpretation and reporting, while one had a strong track record of publications. The M&E team was able to provide direct support to each organisation, and a bespoke plan was co-developed with each to strengthen their M&E capacity going forward. Specific actions varied across organisations and included providing input for improving data collection tools, data cleaning, data analysis, data visualisation and interpretation, with the ultimate aim of publication of results. In some cases, the organisations adapted their practices for more effective data capture.
Conclusions
We conclude that evaluability assessments can work towards bridging the evaluation-action gap within DPM by supporting organisations to increase their M&E capacity, and in turn facilitate operational decision-making towards evidencing impact. This strengthens the evidence base for successful DPM approaches, which may be used to advocate for humane DPM globally.
Paper short abstract
ReAct embeds evaluation into test-and-learn approaches, aligning efforts across the employment sector. Through co-produced insights and adaptive methods, it has shaped employer engagement, recruitment, and participant support, translating evaluation into action.
Paper long abstract
The Get Britain Working white paper highlights the need for systems change, supported by test and learn and adaptation. This session will explore how the ReAct Partnership* has integrated evaluative thinking into test-and-learn environments across the employment sector. The ReAct Partnership is an industry-led, active collaboration to support a continuous improvement community in the Restart programme through action research, shared and iterative learning, and the development of applied, evidence-based resources.
At the heart of ReAct is a commitment to co-produced evaluation, funded and overseen by the Restart Prime Providers. Practitioners, policymakers, and other stakeholders are actively involved in shaping evaluation questions, interpreting findings, and driving change. This approach ensures that evaluation is relevant, grounded in context, and more likely to influence decisions.
The session will highlight three case examples where ReAct has contributed to positive outcomes and influenced action. First, in shaping how organisations engage employers. Second, in workforce recruitment and development, where evaluation insights prompted a redesign of recruitment processes to attract candidates from a wider range of backgrounds. Third, in shaping participant support, ReAct developed targeted resources such as carers webinars and top tips sheets to improve engagement and outcomes.
The session will reflect on how this change was achieved through action evaluation, including how evaluation is resourced, when and how evaluation questions are agreed, and how findings are shared. The session will also reflect on challenges and lessons for the evaluation community, including funding for non-traditional evaluation activity across organisations, building trust for collaborative evaluation, creating impact with the right audiences and ensuring timely evaluation insights.
*The ReAct Partnership is co-funded by the eight ‘prime providers’ for the Restart programme — FedCap Employment, AKG, G4S, Ingeus, Maximus, Reed, Seetec and Serco — and is being managed by the Institute of Employment Studies (IES), working alongside the Institute for Employability Professionals (IEP) and the Employment Related Services Association (ERSA).
Paper short abstract
We examine the intersection of evaluation policy and ECB as a foundation for strengthening evaluation culture within the Canadian federal government by showing how evaluation policies can operate as ECB strategies, and how additional strategies can be leveraged to enhance policy implementation.
Paper long abstract
Evaluation policy and evaluation capacity are two critical influences on evaluation practice. However, their relationship, including the fact that policy may be considered an evaluation capacity building (ECB) strategy in some contexts, remains relatively unexplored in the academic literature. Understanding how these two areas intersect might uncover how evaluation systems operate in practice, as well as how certain factors facilitate and hinder policy uptake and implementation.
Evaluation policy is often defined as the rules and principles that guide an organization’s decisions and actions when planning, designing, conducting, reporting, or using evaluations within specific organizational, cultural and/or political contexts (Al Hudib & Cousins, 2022; Christie & Lemire, 2019; Trochim, 2009). Consequently, such policies play a pivotal role in shaping evaluation practice. Like evaluation policy, ECB is context-dependent, offering a range of strategies intended to facilitate and sustain quality evaluations (Bourgeois et al., 2013; Stockdill et al., 2002). ECB strategies may target individuals (e.g., training, technical assistance) or organizations (e.g., building data systems, designating evaluation champions, allocating resources) (Labin et al., 2012; Preskill & Boyle, 2008). Multi-level approaches are typically required because strategies implemented at one level often reinforce those implemented at another (LaMarre et al., 2020). For instance, organizational resources are often needed to support individual training opportunities.
There are several ways in which ECB strategies may intersect with evaluation policy. First and foremost, evaluation policy can be an ECB strategy that builds a common language, improves institutional knowledge, and establishes a long-term vision for evaluation practice (Sutter et al., 2024, p. 537). ECB can also serve as a bridge between policy and practice by developing the capacity of individuals responsible for interpreting and implementing evaluation policy, which helps them recognize and understand key policy requirements. Such strategies may include training, embedding policy language in key organizational documents, and communications materials (e.g., newsletters) (Fierro et al., 2022). Conversely, evaluation policy can drive ECB, as policy requirements guide capacity building strategies and signal where organizations must strengthen their capacity to meet policy expectations (Al Hudib & Cousins, 2022). Together, these perspectives position ECB as a mediating mechanism that enables evaluation policy to move beyond its role as a written directive to one that actively shapes practice. Even the most robust evaluation policies risk remaining aspirational without the necessary individual and organizational capacity to translate policy expectations into effective practice.
Our presentation examines the intersection of evaluation policy and ECB as a foundation for strengthening evaluation culture within the Canadian federal government. Drawing on interviews with federal policymakers, evaluation leaders, and scholars, we will share findings on how federal evaluation policies operate as ECB strategies, how individual and organizational capacities, or the lack thereof, affect the implementation of evaluation policy, and how additional ECB strategies can be leveraged to enhance policy uptake and implementation. The results from this study illustrate the relationship between evaluation policy and ECB, and how this relationship can be leveraged to create environments where evaluation is embedded in organizational culture to support evidence-informed decision-making, continuous learning, and ongoing improvement.
Paper long abstract
My company, Nexus Evaluation Ltd, had the pleasure of working with an organisation that works to improve the working conditions of people in global supply chains. For years, they have been collecting micro stories: a few lines to a paragraph or two, on the type of conversations, narratives and exchanges emerging from their factory visits and convenings. These narratives include quotes from factory workers, main questions raised during meetings, stories shared and field observations.
The organisation has now collected over 4,000 micro stories, and this number is growing as we speak. They had spent considerable time coding each story against a set of themes and categories, ranging from organisational values to specific human rights issues, and they engaged Nexus to lead a new type of analysis of all these stories, albeit with a very limited budget and timeframe.
We quickly realised that each story offered but a glimpse of very complex systems and challenging lived experiences, and that together they told us something more than the sum of their parts. Given all this, we were keen to use a mix of approaches as follows:
1) Systems thinking, which included pattern and trend identification and systems mapping.
2) Feminist and gender-transformative approaches to address country-specific and emergent global issues.
3) Strategic and organisational design principles, to add more value. This meant carefully crafting questions that guided a couple of facilitated sense-making discussions. The questions aimed to inform new ways of working and strategic direction, and to improve organisational capabilities and potential for impact.
4) A decolonial and humanising approach to storytelling.
I will describe in more detail how we put the above into practice, and share the findings and recommendations; a small illustrative sketch of the pattern-identification step follows below.
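To make the pattern-and-trend step concrete, here is a minimal Python sketch of how theme co-occurrence across a large set of coded micro stories might be tallied. The records, theme labels and field names are invented for illustration; this is not the organisation's actual coding scheme or our production pipeline.

    from collections import Counter
    from itertools import combinations

    # Invented micro-story records, each already coded against themes.
    stories = [
        {"id": 1, "year": 2021, "themes": {"worker voice", "overtime"}},
        {"id": 2, "year": 2022, "themes": {"worker voice", "grievance channels"}},
        {"id": 3, "year": 2022, "themes": {"overtime", "grievance channels"}},
    ]

    # Trend: how often each theme appears, by year.
    trend = Counter((s["year"], t) for s in stories for t in s["themes"])

    # Pattern: which pairs of themes appear in the same story.
    pairs = Counter(p for s in stories
                    for p in combinations(sorted(s["themes"]), 2))

    for (year, theme), n in sorted(trend.items()):
        print(f"{year} {theme}: {n}")
    for (a, b), n in pairs.most_common():
        print(f"{a} + {b}: {n} stories")

In practice, tallies like these only surface candidate patterns; the facilitated sense-making discussions described above are where such patterns are interpreted.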
Paper short abstract
Drawing on Foundations’ toolkit and their Changemakers programme, Foundations and Cordis Bright share insights into how local evidence-based leadership bridges the longstanding gap between evaluation and action in children’s services.
Paper long abstract
There is a longstanding gap between what the evidence tells us improves outcomes and what is available for children and families locally. This session will explore how to close that gap by using a suite of tools developed by Foundations and partners to support local areas to move from understanding evidence to embedding it into practice.
Using Foundations’ Changemakers programme as a case example, we will demonstrate how Foundations supports evidence-informed decision making at the local level. The toolkit brings together two key resources: Practice Guides, offering evidence-based recommendations for commissioning and delivering family support, and the Guidebook, which summarises tested interventions that put these practices into action. Together, these provide a practical starting point for local authorities (LAs) seeking to make evidence-informed choices.
Despite a strong evidence base for parenting interventions, the interventions we know to be effective have yet to reach scale. The Changemakers programme, funded by Foundations in partnership with the Department for Education and the Youth Endowment Fund, was designed to address this challenge. It empowered LAs to bridge the gap between evidence and practice by appointing ‘Local Evidence Leaders’ to champion evidence use, embed evidence-based interventions, and build capacity for evaluation-informed decision-making.
Cordis Bright conducted an implementation and process evaluation following the programme from inception to completion (2024–2026). Using mixed methods, they explored how the model operated in different contexts, what supported or constrained implementation, and how dedicated local evidence leadership can influence system-wide change.
In this session, Foundations and Cordis Bright will reflect on what it takes to embed evaluation and evidence use in local systems. We will share findings on enablers of and barriers to evidence leadership, from leadership commitment and organisational readiness to practical supports such as time, networks, and peer learning. The session will conclude with interactive discussion, encouraging reflection on how What Works Centres and evaluators can collaborate to make evaluation useful, used and usable.
Aligned with the theme “Bridging the Gap: evaluation to action,” this session provides practical insights for What Works Centres, policymakers and evaluators seeking to move from isolated evidence use toward sustainable evidence-led cultures and explores what it really takes to make evidence-based leadership stick.
Paper short abstract
This session shows how structured products like evidence maps and digital summaries help turn evaluations into decisions. With examples from youth, employment, and climate policy, it highlights how design and communication make evidence easier for policymakers to use.
Paper long abstract
Evaluation is intended to serve two functions: accountability and lesson learning. The lessons from an evaluation can go beyond the intervention being evaluated to be applicable to other interventions in other settings. One channel for transferring evaluation findings to other settings is summarising them in systematic reviews. But, like evaluation reports, systematic reviews often remain unread and are not accessible to decision-makers. Traditional outputs like academic papers or policy briefs fail to connect with decision-makers. These formats often assume too much time, background knowledge, or interest from the audience. As a result, evidence ends up being underused, no matter how strong it is.
Knowledge brokering, or knowledge translation, has emerged as a means of getting evidence into use. This presentation shares examples of a specific form of evidence product, called Evidence-Based Decision-Making Products, based on systematic reviews of evidence from evaluations. These products include interactive toolkits, visual platforms, and digital summaries based on systematic reviews and evidence and gap maps.
In this presentation, we share insights from three approaches:
1. Evidence toolkits
a. Youth Endowment Fund Toolkit (UK) – A platform that presents evidence on what works to reduce youth violence. Each approach is rated for its impact, strength of evidence, and cost. The toolkit has been used by government agencies, local councils, and even the Prime Minister’s Office to shape funding decisions and policy strategies.
b. Youth Employment Evidence Platform (Sub-Saharan Africa) – A collaboration with the European Commission to help guide investments in youth employment. The platform includes a meta-analysis across ten interventions, plain-language summaries, and policy-relevant metrics. It supports planning by showing what works, where, and at what cost.
2. Evidence Q&A (Global) – The CGIAR Evidence portal is designed to support gender-responsive climate and agriculture policies. It organizes complex evidence into a simple question-and-answer format, helping users explore topics like how women adapt to climate change or how gender affects access to resources.
3. Evidence summary evidence and gap maps – Evidence and Gap Maps (EGMs) are of growing interest, being used to identify what evidence exists in particular policy areas. These maps show what evidence exists, not what it says. We have worked on two projects in which the maps do contain cell-wise evidence summaries: Child Protection Research, and Conflict and Atrocity Prevention.
Across all these cases, the core idea is the same: communication is not just the last step in evaluation, it is part of the design. We involve stakeholders early, ask what they need, and build tools around their questions. We use plain language, strong visuals, clear structure, and digital formats that are easy to navigate. We also build in features like filtering, comparisons, and implementation guidance to help people move from knowing to doing.
These efforts have already led to tangible results: budget decisions tied to toolkit ratings, local governments revising programs based on evidence, and greater awareness among international donors of where their money can make the biggest difference.
Paper short abstract
We report learnings from the deployment of realist and participatory approaches, with limited resources and time constraints, to evaluate a multi-faceted intervention in rural Ethiopia to improve health outcomes of podoconiosis patients, improve prevention, care-seeking and access, and reduce stigma.
Paper long abstract
Podoconiosis (endemic non-filarial elephantiasis) is a non-infectious disease caused by long-term exposure of bare feet to red clay soil derived from volcanic rock. Since 2023, Malaria Consortium has been implementing a project called “Happy Feet: Strengthening Community-based Podoconiosis Prevention and Control in Ethiopia”. The project involves a community-based, innovative intervention package, including training and support for health providers to improve access and quality of morbidity management, disability prevention and psychological support services; community messaging campaigns (billboards, radio messages and community events) to improve preventative and care-seeking behaviours, and reduce stigma against patients; and distribution of customised shoes to aid physical recovery.
To evaluate this multi-faceted intervention, and provide usable evidence of what worked, for whom and in which contexts, we will employ aspects of realist and participatory approaches: adapted Ripple Effects Mapping will first be undertaken with providers, community members and patients to understand anticipated and unanticipated outcomes, and to challenge and add to the existing theoretical mechanisms and pathways to these outcomes. These theories and hypotheses will then be further tested with quantitative surveys carried out at households and health centres, allowing for analysis by gender and other factors. Finally, a participatory feedback and reflection event with local and national stakeholders will be held, following all data capture and preliminary analysis, to feed into final conclusions.
Both participatory and realist approaches have challenges, as they require expertise and time to implement, and they have also not been widely used in African settings. In this session we will report on lessons learned from implementing this theory-driven evaluation in a rural Ethiopian setting, with limited resources and time constraints. The lessons will be recorded systematically and prospectively throughout the evaluation (November 2025-March 2026) through individual, team and participant reflections.
Paper short abstract
This presentation explores the role of a collaborative PhD partnership in bridging the gap between academia, policy, and practice to deliver innovative, impactful skills research, influence policy, and develop a new generation of researchers in Scotland.
Paper long abstract
This presentation will describe how Skills Development Scotland (SDS) and the Scottish Graduate School of Social Science (SGSSS) have formed a collaborative PhD research partnership spanning 13 years. The main aim of this partnership is to connect PhD students, universities, practitioners, policymakers, and stakeholders, so that academic research can more directly inform policy and practical action in Scotland’s skills landscape.
The partnership is designed to close the gap between academic theory and real-world policy by encouraging innovative, impactful research on skills issues. It serves as a useful example of how to achieve research impact, involve stakeholders in research, and share knowledge in practice.
A key feature of this partnership is that collaboration and impact are built in from the start. The programme doesn’t just look at the quality and relevance of research produced on skills policy—it also examines how well and in what ways those research findings are shared. The partnership uses a range of events and outputs to make sure research outcomes reach both policymakers and practitioners.
We will share lessons on some of the challenges of our approach. These challenges include involving multiple stakeholders on a continual basis over the lengthy period of a PhD and making sure that complex research produced by PhD students is turned into clear, practical insights for people outside academia. The programme tackles these issues by using a variety of communication methods, such as student-led seminars and events, to make sure knowledge is shared widely and effectively.
The partnership also pays close attention to diversity, equality, and inclusion. It brings together a wide range of people—students, academics, practitioners, and stakeholders—ensuring that many voices and experiences are included in the research process and in the evaluation of the programme itself.
The presentation will highlight several practical results from the partnership. These include evidence that the research has influenced policy and practice in Scotland, increased the employment prospects of PhD students by giving them real-world policy experience, and developed a model for collaborative research partnerships.
Another major strength of the partnership is the transfer of innovative research methods from academia into practice. These include advanced approaches such as utilising AI in the research process, innovative methods like photo-elicitation, and working directly with young people to co-produce research. These methods have brought fresh perspectives and real innovation to benefit everyday professional practice in SDS and in skills policy research more broadly.
The presentation will highlight that by promoting knowledge exchange and supporting student development, the collaboration has become a model of good practice, showing how partnerships between academia and the public sector can lead to meaningful, impactful research that shapes policy and practice. Finally, we will highlight recent developments in the programme that demonstrate our commitment to continuous improvement, for example through our use of AI and innovative research methods.
Paper short abstract
We applied a realist evaluation approach to test the programme assumption that individual researchers can be institutional change agents in African universities. The resultant evidence catalysed practice-relevant dialogue among stakeholders and highlighted the need to strengthen the research ecosystem.
Paper long abstract
In the global health space, health research capacity strengthening (HRCS) has been deemed a strategic way of fostering [health] research equity, especially in low and middle-income countries (LMICs). While the majority of HRCS initiatives focus on developing a critical mass of individual researchers, evidence on the effectiveness of the ‘individuals as agents of institutional change’ model remains underdeveloped. We conducted a realist evaluation to examine how and why research partnerships under the ‘Developing Excellence in Leadership, Training and Science in Africa’ (DELTAS Africa) programme – an initiative delivered through a global North-South research partnership – strengthen the health research capacity of African universities. Two cases representing unique research consortia were studied using realist-informed qualitative methods to test an initial programme theory (IPT). We conducted realist interviews with African principal investigators (PIs), collaborators, research support staff, PhD researchers and postdoctoral fellows, and programme-level staff. Retroductive theorising guided the testing of the IPT through the Context-Mechanism-Outcome (CMO) configuration framework. Through theoretical abstraction, we refined the IPT using CMOs from the case theories.
Multiple mechanisms (e.g., empowerment, inspiration, sense of agency, vulnerability) were triggered to generate varied research capacity outcomes for individual researchers and their institutions across the two cases. Findings show that the research partnerships provided researchers with access to research resources and opportunities, triggering an empowerment, motivation and inspiration mechanism that resulted in short-term outcomes such as improved research outputs (e.g., increased publications and funding), enhanced technical and soft research skills, and researchers’ career growth in a context where there was buy-in and support from university leadership. A sense of agency mechanism was activated to generate medium-term outcomes, such as improved supervisory capacities in research departments and the establishment of research hubs, in a context where the university research environment was conducive, with researchers spending more time on research than on teaching activities. Even when researchers were empowered with the appropriate skills to mobilise research funding through grant writing, they were often frustrated and rendered vulnerable in contexts where the environment was less supportive, for example through poor remuneration, a lack of protected time for research, and deprioritised funding by national governments.
The evidence challenges the use of individuals as change agents as an HRCS model and argues that the institutions within which the individuals are based should have at least minimally supportive research systems in place. Shared with the programme stakeholders, the evidence catalysed discussions about the need to extend beyond individual-level research capacity to sustainably address systemic challenges and weaknesses, thereby building a conducive research environment that retains individual talent and enables research to thrive.
Paper short abstract
Nourish is a long-term, flexible intervention to improve food environments in schools. This presentation explores how adaptive evaluation was used over five years to develop and shape the Nourish delivery model, as well as to demonstrate impact and wider applicable learning in a compelling way.
Paper long abstract
Nourish is a long-term, flexible intervention designed to improve school food environments in individual schools. This presentation explores how adaptive evaluation was used to develop and shape the Nourish delivery model, as well as to demonstrate impact and wider applicable learning in a compelling way.
The Nourish programme supports schools to adopt a whole school approach to food. This approach, recommended by both the World Health Organisation and the UK’s School Food Plan, promotes nutritious food across the school day - from the classroom to the dining room - while engaging the whole school community.
Delivered over five years in Southwark and Lambeth, the programme evolved iteratively, shaped by continuous feedback from both the evaluation and the frontline team.
We will focus on how adaptive evaluation can:
- Rise to the challenge of evaluating a programme with no fixed delivery model at the outset
- Fully explore the nuance which makes evaluation learning more practically applicable to a broader range of audiences
- Strengthen relationships between evaluation and frontline teams, and magnify the value of an iterative process
- Help longer-term programmes adapt to changing policy climates
We share the strategies that helped build strong relationships between the evaluation and delivery teams and how we supported the delivery team to work iteratively and reflectively.
This session will showcase how the adaptive approach shaped not only programme delivery but also future iterations of the work, including new strands of the programme in secondary and special schools. It will also demonstrate how this approach supported School Food Matters’ wider policy and campaigning work around improving school food, including the government’s roll out of universal breakfast provision.
Paper short abstract
This workshop will introduce EvalC3, a free open-source online tool designed to help evaluators model and test multiple causal configurations using cross-case data, supported by within-case inquiries.
Paper long abstract
Evaluating complex interventions presents unique challenges, particularly when it comes to understanding causal pathways and communicating findings in ways that support learning and action.
This workshop will introduce EvalC3, a free open-source online tool designed to help evaluators model and test multiple causal configurations using cross-case data, supported by within-case inquiries.
Participants will explore how EvalC3 can be used to identify, explore and visualise plausible pathways to change. The session will demonstrate how EvalC3, together with collective sense-making, and within-case examples, helps evaluators and practitioners embrace complexity in their analysis to inform ongoing decision making.
The workshop will include a live interactive demonstration of the software. This will be further illuminated with a real-world example of how it has been used in an ongoing evaluation of Sport England’s investment into place-based systemic approaches to tackle physical activity inequalities. The example will walk through the steps of using EvalC3, explaining the circumstances in which local actions have encouraged communities to lead on initiatives which support them to be physically active.
This session is ideal for evaluators, researchers, and practitioners working in complex systems who are seeking practical tools to strengthen their evaluation practice and better support change. No prior experience with EvalC3 is required.
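For readers new to configurational analysis, the following standalone Python sketch illustrates the kind of cross-case test EvalC3 supports: scoring a candidate configuration of conditions against case outcomes using confusion-matrix measures. This is a conceptual illustration with invented cases and condition names, not EvalC3's actual interface or code.

    # Conceptual sketch: test one candidate causal configuration against
    # cross-case data. Cases, conditions and outcomes are invented.
    cases = [
        {"community_led": 1, "funding": 1, "outcome": 1},
        {"community_led": 1, "funding": 0, "outcome": 1},
        {"community_led": 0, "funding": 1, "outcome": 0},
        {"community_led": 0, "funding": 0, "outcome": 0},
        {"community_led": 1, "funding": 1, "outcome": 0},
    ]
    model = {"community_led": 1}  # configuration predicted to produce the outcome

    def predicts(case):
        # The model "fires" when every condition in the configuration holds.
        return all(case[c] == v for c, v in model.items())

    tp = sum(predicts(c) and c["outcome"] == 1 for c in cases)
    fp = sum(predicts(c) and c["outcome"] == 0 for c in cases)
    fn = sum(not predicts(c) and c["outcome"] == 1 for c in cases)
    tn = sum(not predicts(c) and c["outcome"] == 0 for c in cases)

    accuracy = (tp + tn) / len(cases)  # share of cases classified correctly
    coverage = tp / (tp + fn)          # share of positive-outcome cases captured
    print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
    print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")

In this toy data, the false-positive case (configuration present but outcome absent) is exactly the sort of case the workshop's within-case inquiries would then examine in depth.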
Paper short abstract
We assess whether evaluating the construction phase of major scientific infrastructure is worthwhile. Using the National Satellite Test Facility, we show early spillovers to UK firms, offering policy-relevant insights for innovation and industrial capability building.
Paper long abstract
Major research infrastructures represent cornerstone public investments intended to enhance national innovation capacity, stimulate industry engagement, and attract global R&D. Yet evaluation often begins only once facilities become operational. This study asks whether evaluating the construction phase itself can provide policy-relevant evidence on early impacts and capability building.
The case examined is the National Satellite Test Facility (NSTF), a £100 million ISCF-funded investment completed in 2023 to provide nationally accessible satellite and payload testing capabilities. The NSTF enables UK firms of all sizes to compete internationally by offering co-located, world-class testing environments at a single site.
In line with the UKRI Industrial Strategy Challenge Fund (ISCF) objectives of increasing R&D investment, multidisciplinary research, collaboration, and overseas investment, the NSTF evaluation framework was designed to capture both construction and operational impacts. The construction-phase evaluation examined:
- Direct, indirect and induced economic impacts
- Knowledge and skill development within the UK supply chain
- Market advantage and learning among contractors
- New jobs, collaborations, and technological progress
- Procurement impacts and UK content
- Public awareness and outreach benefits
Drawing on interviews with contractors and suppliers, we identify early, measurable spillovers arising from highly technical and specialised construction activities. Firms reported that involvement in NSTF led directly to new technical competencies, enhanced reputations, and follow-on contracts in the UK and overseas. These findings reflect well-established evidence from international infrastructure evaluations (e.g., Florio et al., 2018; CERN studies) showing how participation in scientific construction projects stimulates industrial learning and productivity gains.
The analysis demonstrates that construction-phase evaluation is not merely about cost tracking—it can illuminate pathways of innovation diffusion and capability growth that inform policy and programme design. Early identification of spillover effects provides actionable intelligence for policymakers on how large-scale capital projects contribute to national R&D capacity, supply-chain resilience, and skills development long before the facility becomes operational.
This presentation will:
- Outline the NSTF’s role within the UK space and innovation ecosystem;
- Present the evaluation design applied to the construction phase;
- Discuss empirical evidence of short-term impacts and knowledge spillovers; and
- Reflect on implications for policy and evaluation practice, particularly for public investment in scientific infrastructure.
By demonstrating the policy value of early-phase evaluation, this work contributes to Theme 1 of the UK Evaluation Society Conference, showing how evidence from infrastructure construction can directly inform future investment decisions, strengthen industrial strategy, and embed evaluation across the full lifecycle of major R&D programmes.
Paper short abstract
Sport England wants more people to play sport and be active. Join Ipsos, NPC, Sport England and representatives from their ‘System Partners’ to learn how a groundbreaking evaluation is helping drive Sport England’s investment into 137 organisations across the sport and physical activity sector.
Paper long abstract
Sport England’s 'Uniting the Movement' strategy exemplifies a transformative approach to address inequalities in sport and physical activity by investing in over 137 'System Partners'. This bold investment approach is designed to catalyse system change over the long term and on a broad scale. Here, evaluation is not just a measure of progress, but a dynamic process that drives action.
Ipsos will present our 'Learning & Knowledge Exchange' model, prioritising timely, utilisation-focused insights that are shared through clear reporting and visual storytelling. This model supports partners' ability to swiftly adapt based on insights, translating complex evaluations into actionable strategies.
NPC will share more about their Capability & Capacity building offer, which ensures that partners develop confidence around evaluation and learning techniques, are empowered to implement them effectively, and are supported to understand systems change. This offer is about building a shared understanding that transforms evaluation findings into practical applications.
Representatives from System Partner organisations will bring valuable insights into how integrating the 'Learning & Knowledge Exchange' model with Capability & Capacity support fosters actionable change. Their testimonies will highlight how evaluations have been pivotal in refining their approach and realising strategic goals.
Concluding the presentation, Sport England will underscore the essential role of this approach to drive change alongside their Theory of Change. The holistic integration of evaluation, learning, and action exemplifies a sustainable, impactful approach to achieving system change towards the Uniting the Movement vision.
Please note that a separate abstract from Kev Harris (Hartpury University) has been submitted relating to the National Evaluation and Learning Partnership, a separate but overlapping Sport England investment into ‘Place’. We are in touch with Kev and the team and would be delighted to work with them to ensure that, if selected, our presentations complement one another.
Paper short abstract
A case study about embedding participatory and theory-based evaluation in a research team. The example used contribution analysis and co-produced tools to build an evaluation culture that demonstrates value while avoiding M&E becoming a tick-box activity.
Paper long abstract
As institutional budgets tighten and non-income-generating activities face increasing scrutiny, evaluation has become a crucial means not only of improving programmes and projects but also of demonstrating their value and relevance. This paper explores how evaluation practices were introduced and embedded within a research team, referred to here as English Language Research, which had not previously been expected to evidence the impact or value of its work in such a systematic way. Introducing evaluation in this context required a sensitive and participatory approach that recognised both the autonomy of research practice and the need for accountability.
The paper presents a case study of an evaluator joining the team as a member of staff to establish monitoring and evaluation (M&E) practices that were both practical and theoretically informed. Drawing on participatory and co-production principles, the approach aimed to integrate evaluation into the team’s existing culture of inquiry, positioning it as a tool for learning and reflection rather than as an external audit mechanism. The process sought not only to demonstrate outcomes but also offer learning opportunities for the team.
Evaluating the impact of a research team working across a large and complex organisation presented distinct challenges. Activities such as dissemination, relationship-building, and collaboration often contributed indirectly to outcomes, making attribution difficult. Furthermore, there was initial concern that introducing a monitoring culture might reduce research activity to a ‘tick-box exercise’ or fail to recognise the value of exploratory, developmental work.
To address these challenges, a participatory evaluation framework was developed, engaging the team at every stage. Evaluation tools were co-created and refined through consultation, including the use of collaborative digital platforms (such as Padlet) that allowed members to share feedback, build collective insights and co-construct an evolving picture of outcomes. Regular team meetings were used to share findings and invite reflection on the M&E process, embedding evaluation within the team’s ongoing practices rather than positioning it as a separate requirement.
Alongside this participatory process, a contribution analysis approach was applied to explore the team’s influence within the wider organisational system. Contribution analysis provides a structured, theory-informed method for testing whether the evidence reasonably supports a hypothesised chain of outcomes. Combined with a collaboratively developed theory of change, this enabled the team to articulate how their research, dissemination and partnership activities contributed to longer-term institutional outcomes, even where direct attribution was not possible. However, aligning anecdotal and qualitative insights with structured evidence remains an area for further development. The next stage will involve developing case studies to explore how different areas of contribution interconnect within a broader picture of institutional impact.
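As a purely illustrative aside (not the team's actual tooling), the bookkeeping behind such a test can be pictured in a few lines of Python: rate the evidence supporting each link in the hypothesised chain of outcomes, then flag the weakest link as the priority for further evidence gathering. The links and ratings below are invented.

    # Invented example: rate the evidence behind each link in a
    # hypothesised chain of outcomes and flag the weakest link.
    chain = [
        ("research outputs shared", "practitioners aware of findings", "strong"),
        ("practitioners aware of findings", "guidance updated", "moderate"),
        ("guidance updated", "institutional outcomes improve", "weak"),
    ]
    strength = {"strong": 3, "moderate": 2, "weak": 1}

    for cause, effect, rating in chain:
        print(f"{cause} -> {effect}: evidence {rating}")

    weakest = min(chain, key=lambda link: strength[link[2]])
    print(f"priority for further evidence: {weakest[0]} -> {weakest[1]}")

The substance of contribution analysis lies, of course, in the qualitative judgement behind each rating; a structure like this only makes those judgements, and their gaps, visible.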
The paper concludes by reflecting on how participatory and theory-based approaches can be combined to build evaluative capacity, foster ownership of evidence and support research teams to demonstrate value in ways that are meaningful, proportionate, and aligned with academic practice.
Paper short abstract
This presentation explores the equity and power dynamics in co-creating a Theory of Change for a systems change programme. It shares how evaluations can navigate equity-related challenges while embedding reflection, inclusion and usability in participatory evaluation practice.
Paper long abstract
Participatory approaches have been increasingly promoted as ways to ensure diverse voices are heard in evaluation and to build a shared understanding across stakeholders. Yet in practice, they often present evaluators with challenges and difficult trade-offs - balancing divergent perspectives and reconciling conflicting priorities while still delivering evaluation responsibilities within resource constraints.
This presentation reflects on what this “equity knot” (Gates et al. 2024) looks like in practice, drawing on our experience of co-designing a Theory of Change (ToC) for a place-based systems change programme aiming to improve education, employment, and training (EET) outcomes for South Asian young people, particularly those from Pakistani and Bangladeshi backgrounds who face persistent barriers to good-quality work.
As action researchers embedded in the programme, we worked closely with a wide range of stakeholders, including young people, youth ambassadors, local partners, employers, communities, and funders, to co-create a ToC that valued and reflected multiple perspectives. While the process sought to ensure transparency and equity, it surfaced significant tensions around which forms of knowledge (e.g. practitioner, lived, research) most strongly shaped the ToC, whose ideas were prioritised or left out, and how divergent views on success could be brought together without losing evaluation focus or feasibility.
We documented these tensions by keeping a reflective learning log, tracking key decision-making points, rationales, and trade-offs throughout the development of the ToC. This helped embed evaluative thinking and reflection into the programme’s ongoing learning and delivery to address systemic barriers influencing local youth employment.
Through the presentation, we will invite the audience to reflect on the “equity knots” in their own evaluation practice and to see equity not as an endpoint or a tick-box item, but as a continuous process of negotiation, reflection, and adaptation – an integral part of embedding evaluative learning in complex systems change initiatives.
Paper short abstract
This poster highlights a case study from public engagement with research at festivals to demonstrate how evaluation findings can be shared to increase understanding and drive practical action.
Paper long abstract
This poster highlights a case study from public engagement with research at festivals to demonstrate how evaluation findings can be shared to increase understanding and drive practical action in the university sector and beyond.
Delivering interactive festival activities is a popular way for researchers to engage with a wide range of people. Many researchers and engagement professionals believe there is great value in such engagement, not just for festival-goers, but for researchers too. However, there is little published data backing this up. This case study highlights the benefits, challenges, and learning gained from 8 years of the FUTURES Festival. Our extensive evaluation evidence identifies the ingredients and support which ensure researchers can deliver high-quality public engagement; demonstrates the skills they gain from taking research to festivals; and suggests how to articulate, enable and reward researchers’ development.
High-quality public engagement helps researchers to understand, increase, and demonstrate the impact of their research outside of universities. It encourages academics to involve the public in their work, and consequently creates a more accessible research culture, generating meaningful, impactful research. Festivals in particular help researchers engage with communities of place and interest, inclusively sharing and situating their research with relevance, democratising knowledge and including diverse voices. Our soon-to-be-published journal article shows how researchers and engagement professionals can use this evidence themselves when organising festival-style events or advocating for them in their own institutions, to ensure maximum value for all concerned.
The FUTURES Festival of Discovery has been bringing research to life across the South West of England since 2018. The extensive free programme of public events exploring the worlds of science, culture and research has been funded by UKRI and the European Commission and delivered by a consortium of the Universities of Bath, Bath Spa, Bristol, Exeter and Plymouth. For more details see: https://futuresnight.co.uk/about/
Paper short abstract
A story of how a culture of continuous reflection and regular evaluation can embed multiple principles in research and related projects in ways which maximise their synergy and tackle their tension.
Paper long abstract
We report on an integrated framework for embedding principles of sustainability, inclusion and co-production in research and related practices. Applicable to policy, practice and academic settings, the ACCESS Guiding Principles Framework fosters a culture of continuous reflection and evaluation, creating pathways for learning and change.
There are increasing pressures on researchers working in all settings to explicitly underpin their professional practice with fundamental principles relating to people and the environment. Some of these pressures are internal – driven by our own values. Others are external – expected by our institutions, funders or partners.
The ACCESS Network*, which foregrounds the critical importance of social sciences for addressing environmental challenges, set out in 2022 to underpin all of its work with three fundamental principles: environmental sustainability (ES), equality, diversity and inclusion (EDI) and knowledge co-production (KCP). However, the team quickly realised these principles bump into one another, moving dynamically between synergy and tension in different contexts, and requiring frequent deliberation. This was particularly evident because ACCESS has such a wide remit encompassing delivery of training and networking events, flexible fund management and policy-facing environmental social science research.
To address this, the separate principles were reimagined as an integrated framework within which users are encouraged to continuously reflect on how sustainability, inclusion and co-production intersect. In this approach, which recognises the dynamic and context-specific nature of value-driven research and related work, reflective prompts replace fixed rules as the key tools for practice. And, where possible, partners and participants are invited to join the conversation through informal or formal channels. This supports open, evidence-based and thoughtful decision-making, rendering any trade-offs, compromises, prioritisations or innovations amongst the principles conscious and visible.
An overarching evaluation of the Guiding Principles Framework, based on 26 interviews with users and workshops with 65 members of the wider ACCESS Network, has uncovered stories of unexpected synergies between principles, as well as sticky situations where principles have seemed impossible to reconcile. In both of these circumstances, users of the Framework highlight the value of continuous reflection and regular evaluation. These practices create spaces and evidence for transparent and shared deliberation with partners and participants about what works and what can be improved, paving the way for sustainable, inclusive and co-productive change.
Paper short abstract
What happens when organisations embrace a “learning first” approach in place of results-oriented programming? In this session, Westminster Foundation for Democracy (WFD) will share its “hypothesis testing” approach – a real-time learning framework ideally suited to working with uncertainty.
Paper long abstract
Digital technologies are reshaping how societies communicate, govern, and engage with power – but democratic actors are often playing catch-up. The risks are clear: unchecked misinformation, exclusion by design, manipulative AI use, and civic spaces under pressure. But the opportunities are real too: stronger political inclusion, more responsive institutions, and new tools for accountability.
But realising these benefits is hard – especially when evidence of “what works” might be hard to find. So how do we learn what actually works? And how do we do that before scaling up unproven ideas?
This session explores what happens when a democracy support organisation embraces a “learning first” approach in place of traditional results-oriented programming. Join Westminster Foundation for Democracy (WFD) to learn about hypothesis testing – a practical learning framework for generating real-time programme- and portfolio-level insights when working with high levels of uncertainty.
WFD’s “Democratic Resilience in a Digital World” programme was a one-year pilot programme designed not to deliver big results, but to generate lessons that WFD – and the wider democracy community – could use. It included:
1. Testing digital tools and interventions through pilot projects in Kenya, Bosnia & Herzegovina, and Sri Lanka;
2. Real-time learning through structured knowledge exchange and reflection between pilot projects and other organisational work on digital democracy;
3. Purposeful research to build WFD’s evidence base on promising digital approaches for democracy support.
We’ll share more about the key hypotheses and questions that underpinned the programme’s learning approach: Can digital tools support more inclusive governance? Can AI tools be used effectively to enhance public participation? What combination of human and AI inputs does it take to build a public interest Wiki on election candidates?
We’ll share how WFD’s hypothesis testing approach helped to surface honest answers to these questions in complex, fast-moving contexts – and what that means for others trying to deliver quality programming in the face of high levels of uncertainty. You’ll hear about what worked, what didn’t, and how intentional learning created space for more adaptive, resilient programming. We’ll also share details of how this approach helped to generate relevant portfolio-level learning to help inform the design of future programmes.
This session is especially relevant for:
• Programme managers and implementers seeking ways to generate practical and useful evidence of what’s working, what’s not, and why
• Grant managers seeking learning frameworks capable of delivering relevant programme- and portfolio-level insights
• Civic tech or democracy support innovators looking to better understand change
• Policymakers and donors looking for adaptive, evidence-driven approaches
• Researchers and technologists interested in co-creating with democracy actors
• Anyone seeking smarter, more humble ways to navigate digital transformation
Come ready to challenge assumptions, ask questions, and take away practical ideas to apply in your own work.
For more information, please see:
Alex Scales, Seyi Akiwowo, Adrienne Joy and Charlotte Egan, 2025. Using digital technology for democratic resilience, transformation and impact – learning paper. Westminster Foundation for Democracy. June 2025. Available online here: https://www.wfd.org/what-we-do/resources/using-digital-technology-democratic-resilience-transformation-and-impact
Paper short abstract
We piloted a collaborative qualitative approach to explore student experience of a new, complex master's unit in its first year of delivery, identifying strengths and areas for improvement. We supported students to contribute to data analysis, and they surfaced unexpected insights for the next iteration of the unit.
Paper long abstract
Introduction:
Fashion Practices for Social Change is an elective unit embedded within the Master's programme at London College of Fashion, UAL. Designed by Unit Leader, Dr Mazzarella, the unit delivery entails a combination of taught lectures, seminars and group project work responding to live creative briefs set by external partners. Students are asked to consider key principles and concepts relating to climate, racial, and social justice, and embed relevant practices into their work.
Evaluation approach:
In the academic year 2024-2025, we piloted a collaborative evaluation approach to explore how students experienced the unit during its first year of delivery. We drew on Ward et al.’s (2021) work on embedded research in health care settings to pilot a collaborative approach to embedding a culture of evaluation in universities delivering creative education. Our key aim in this evaluation was to identify strengths and weaknesses in the content and delivery of the unit to provide timely constructive feedback that would enable effective improvements in the next academic cycle and beyond. Through conversations between the Evaluation Lead (Dr Thompson) and the Unit Leader, we agreed on a qualitative approach that would sit alongside the delivery of the unit while causing minimal disruption. The core methodology combined observations with one-to-one semi-structured interviews with staff, students and external partners who were involved in setting live creative briefs.
Key adaptations:
The crucial adaptations within our approach concerned data analysis and feedback. We employed a small team of UAL students who had held student advocate roles relating to climate, racial and social justice to contribute to the analysis and interpretation of the data from the interviews. They did this through a guided series of thematic analysis meetings supported by the Evaluation Lead. This analysis was then curated onto a Miro board and fed back to the Unit Leader via a one-to-one meeting at the point at which he began planning for the next academic cycle, with later meetings scheduled with other members of the delivery team. The Unit Leader and Evaluation Lead agreed on an action plan for writing up the findings, in which the latter would write a first draft, and the former would then layer in his team’s reflections and correct key details around unit development and delivery through iterative discussions.
Key learnings:
• Student researchers identified several themes as being important to student experience that did not come to the immediate attention of the Evaluation Lead, enriching the analysis and feedback.
• The timing of feedback allowed the Unit Leader to have access to, and reflect on, the key messages in time to embed relevant changes into the curriculum for the planned unit delivery in the next academic year.
• This collaborative approach, that centred staff and student voice, was felt to be supportive and constructive.
Conclusion:
This unit and collaborative evaluation demonstrate how staff and student voice can be embedded in the curriculum and support student learning, through producing timely and constructive feedback resulting in effective iterative curriculum change.
Paper short abstract
Traditional dissemination often lacks accountability and follow-through. Evidence-to-Action workshops use co-production to create evidence-based actions to improve interventions. We use a grassroots sports case study to demonstrate how these workshops drive accountability in evidence-based policy.
Paper long abstract
Effective dissemination moves beyond typical one-way presentations and instead uses active engagement with stakeholders. Evidence-to-Action workshops aim to bridge the gap between evaluation findings and practical action. These workshops provide an opportunity to discuss key evaluation findings, explore their implications, and co-produce action-orientated recommendations with stakeholders to improve policy and practice.
Whilst traditional dissemination (a presentation followed by a Q&A) can raise awareness of evaluation evidence, it often fails to secure ownership or follow-through for improving policy and practice, with findings often 'sitting on the shelf'. Failing to change policy and practice using evidence does not achieve true value for money. Studies suggest limited effectiveness of passive methods on their own, and point to greater impact where activities involve two-way engagement. Active dissemination using co-production delivers practical benefits: for example, more efficient translation of evidence into actions, stronger stakeholder buy-in and accountability for using evidence, identification of context-specific adaptations, and an improved culture for making evidence-based decisions. Evidence-to-Action workshops increase the likelihood that findings will be operationalised rather than archived.
After an evidence-based summary of the evaluation, participants break into facilitated, mixed-stakeholder groups to test the implications of key findings for policy and delivery, identify barriers and enablers, and co-produce time-bound, evidence-based recommendations that stakeholders can integrate into decision-making. The outcome is a short implementation plan with assigned action leads and deadlines.
We will use DCMS' Multi-Sport Grassroots Facilities (MSGF) programme as a case study to demonstrate how Evidence-to-Action workshops can actively translate findings into policy and practice. The MSGF programme allocates funding for the improvement of multi-sport grassroots facilities across the four Home Nations. This aims to boost activity levels and sports participation amongst local communities. The programme focuses on delivering projects in areas where there are under-represented groups and higher levels of deprivation to ensure physical activity is accessible to all, regardless of background or location. We will discuss our lessons learned from delivering an Evidence-to-Action workshop with key programme stakeholders, highlighting how this has resulted in tangible improvements to programme delivery and offering reflections on how this approach could be applied elsewhere in DCMS and across government.
Evidence-to-Action workshops aim not merely to inform, but to catalyse change - turning evaluation findings into owned, implementable policy and practice.
Paper short abstract
This session shares learning from a developmental evaluation of the £2m Tech for Better Care Programme, showing how theory-based approaches and real-time evidence informed programme adaptation, strengthened local evaluation capacity, and advanced innovative evaluation practice.
Paper long abstract
This presentation will share learning on the role of a developmental evaluation approach in informing programmatic changes and decision-making within the Tech for Better Care Programme.
The Tech for Better Care Programme is a £2 million innovation programme exploring the potential for using digital technology to enable proactive and relational care at home and in the community. The programme adopted a ‘test and adapt’ approach, whereby funding was provided to teams to develop, test and pilot innovative approaches to tech-enabled service change between October 2023 and March 2026. This was an innovative programme design developed at the Health Foundation, which positioned evidence-based iteration at the core of its way of working.
During the programme, a process and impact evaluation was undertaken to capture the programme process and experience, as well as the impact of the local interventions implemented. Specifically, a developmental evaluation was chosen to enable iterative development of the funded programme and local interventions in real-time using evaluation evidence and learning. This was underpinned by the Contribution Analysis theory-based evaluation methodology, which focused on testing the validity of and strength of support for eight core programme hypotheses in the Theory of Change. Data triangulation was also a characteristic of the evaluation methodology, with local project impact and learning data (e.g., on user experience and outcomes) combined with workshop and interview data collected at the programme level to generate findings. Evaluation activities also involved working closely with local implementing teams who were conducting local evaluations to feed into decision-making at the intervention and programme level. Thus, the evaluation also sought to directly encourage the development of evaluation practice and evidence use in local teams.
Our presentation will begin with a concise outline of the Tech for Better Care Programme, including its Theory of Change and evaluation approach. Thereafter, we will focus on the key learning obtained by the programme funder, evaluation team and local teams on the programme. This will allow attendees to learn about:
- How to apply theory-based evaluation approaches to support iterative developmental programmes;
- The role of different programme actors in effectively using evaluation to bring about programme change;
- The opportunities and challenges inherent in an iterative developmental programme;
- Practical tips for effectively embedding evaluation at different levels of decision-making.
Therefore, the session offers broad appeal to the evaluation community, but most notably to those interested in developmental evaluation, contribution analysis, and the use of evaluation and evidence in the development of digital intervention in healthcare.
Paper short abstract
How do teams become places where evidence is valued and used? Drawing on Pause & Reflect practice across 20+ humanitarian and development programs, this session shares practical lessons on the behaviours, conditions, and relationships that build real evaluation cultures.
Paper long abstract
What truly enables teams to value and use evidence in their everyday decisions? In fast-paced humanitarian and development settings, MEL systems often generate data, yet the cultural, relational, and organisational conditions required for evidence use are far less understood. This session offers practice-based insights from Mercy Corps’ experience designing and facilitating structured Pause & Reflect processes across more than twenty programs in Africa, the Middle East, and Asia—spanning emergency food security, cash assistance, protection, resilience, and market systems development.
Rather than framing evidence use as a technical gap, this work positions it as a cultural one. Through cross-functional reflection sessions—supported by learning questions, participatory dialogue, consolidated data sets, and SOAR analysis—teams begin to establish the norms, habits, and relationships that allow evidence to inform everyday decisions. While the USAID-funded Pause & Reflect toolkit provides a helpful structure, this session focuses on what enables the approach to work rather than on the tool itself.
Three insights consistently emerge across humanitarian and development programmes.
First, evidence is used when teams have protected spaces for sensemaking. Staff in emergency responses often move from one urgent priority to the next, with little room to interpret data collectively. When teams pause—away from immediate delivery pressures—they can identify trends, challenge assumptions, reflect on participant feedback, and generate shared interpretations. This strengthens both learning and decision ownership.
Second, evidence use increases when power dynamics are intentionally disrupted. In many teams, hierarchical routines shape whose interpretation is accepted and whose evidence counts. Creating inclusive, participatory spaces where diverse staff voices, local partners, and community insights are elevated proved essential. This redistribution of interpretive authority strengthens localisation and builds environments where evidence is collectively valued.
Third, evidence becomes actionable when learning is tied to feasible adaptation. Teams engaged more deeply with evidence when reflections led to clear next steps—adjusting transfer values, refining accountability mechanisms, improving market monitoring tools, or strengthening targeting approaches. Learning that remains abstract rarely shifts behaviour; learning that leads to adaptation does.
The session will also highlight challenges: building psychological safety in politically sensitive environments, addressing imperfect or fragmented data, sustaining learning amidst staff turnover, and balancing structured reflection with delivery demands. Examples will illustrate how similar enabling conditions—shared purpose, inclusive dialogue, and structured reflection—support evidence use across both humanitarian and development contexts.
Participants will leave with a nuanced understanding of what helps create environments where evidence is genuinely valued: collective reflection rituals, inclusive sensemaking, reduced hierarchy in evidence interpretation, and practical links to action. This session offers evaluators, practitioners, and programme leaders insights for embedding evaluation into everyday work, regardless of context.
Paper short abstract
We will introduce two digital tools recently developed by the Centre for Transforming Access and Outcomes in Higher Education (TASO) and share how the Theory of Change Builder and the Higher Education Evaluation Library support higher education institutions to embed evaluation.
Paper long abstract
In the UK, inequalities persist between who accesses, succeeds at, and successfully progresses from higher education. Higher education institutions run a variety of interventions aimed at addressing these inequalities, often targeted at people from socioeconomic backgrounds that are underrepresented in higher education. These interventions range from information sessions on applying to university, to wellbeing and academic skills support provision once at university, to career guidance supporting the progression from university into employment or further study. As a government What Works Centre, our role at the Centre for Transforming Access and Outcomes in Higher Education (TASO) is to support efforts to evaluate the impact and implementation of these interventions. We do this by commissioning evaluations and supporting the higher education sector to run their own evaluations, in line with requirements set by the higher education regulator.
A key driver of higher education institutions evaluating their own interventions is to embed a culture of evaluation across teams and departments to ensure that the effectiveness of all interventions is assessed and this evidence is used to inform practice. Similarly, across the higher education sector, we encourage sharing evaluation findings to collectively develop a better understanding of what works to address inequalities in higher education. However, in practice, the individuals tasked with evaluation often lack adequate resources to evaluate interventions and make evidence-informed decisions.
In response to these challenges, TASO has developed two freely available digital tools. These tools support two key points in the evaluation process: developing a theory of change (ToC) and disseminating evaluation findings.
The ToC Builder is an online tool that walks the user through creating a ToC for their intervention. It supports those with little prior evaluation experience by including guidance and examples from the higher education sector at each step of the way. The tool produces a ready-to-export ToC in diagram and narrative format, compliant with accessibility standards.
The Higher Education Evaluation Library (HEEL, launching in spring 2026) is a freely accessible, searchable database of evaluations focused on interventions supporting access, success and progression in higher education. The HEEL will enable knowledge exchange, foster collaboration, and support the dissemination of evaluation evidence on what works to reduce inequalities in higher education. It will also help identify trends and gaps in evaluation practice across the sector.
Both digital tools build on existing TASO resources, including a framework for coding interventions, and make planning evaluations and reporting findings more interactive and accessible to non-specialists. In this session, we will introduce the ToC Builder and the HEEL, including how we developed them with input from prospective users, and how they embed evaluation in higher education providers. We will explore both the practical and ideological aspects of developing digital tools for evaluation, and how digital tools can be used to expand evaluation capacity and impact in response to sector needs.
Paper short abstract
‘Problems’ do not exist independently of our knowledge of them, but instead take shape through our efforts to study and evaluate them. This poster presents findings from a qualitative study that used visual metaphor to explore the implications of this epistemic insight for evaluation practice.
Paper long abstract
Despite significant research, intervention, and evaluation to inform policy and practice responses to health inequalities, improvements have been slow to follow, with some metrics even suggesting that health inequalities are widening. While the reasons for this are multiple and complex, there is increasing recognition that the ways in which inequalities get framed for action might be contributing to the challenge.
Scholarly insights from fields such as the sociology of social problems, and from novel Foucauldian-inspired approaches to policy and discourse analysis, have demonstrated the importance of attending to the forces that give shape to complex problems in policy and practice, and how these forces can open up or close down the scope of possibility for action.
Inspired by these insights and their practical application in the fields of health, early childhood, and youth justice, I created a resource of visual metaphors that is designed to assist practitioners in undertaking this form of critical analysis, and to reflect on how dominant approaches to evaluation may inadvertently lead to inequalities being framed in narrow and limiting ways. I undertook extensive engagement and data collection with people working to implement and evaluate action on inequalities across the health system in England to explore their perspectives on the role and value of this kind of creative resource in their work.
On the whole, health system actors were positive about the visual resource and appreciated the ways in which it distilled complex and often abstract or theoretical ideas into a digestible narrative with supporting imagery in the form of visual metaphors. They offered constructive critique on aspects of the resource that could be further developed and clarified. However, they also expressed a degree of pessimism about the extent to which institutions can be reshaped and felt additional tools and resources would be required to help operationalise the insights presented. While the resource does offer a useful tool for collective reflection and dialogue, more prescriptive guidance is needed on the ‘how’ of realising the deep institutional change required to engage with and value alternative approaches to evaluation.
Paper short abstract
Most evaluations end with a report. The best evaluations end with action. Understanding Attendance worked with 400 schools and 300,000 pupils to discover what bridges that gap. This session shares four transferable strategies for designing evaluations where action is built in from the start.
Paper long abstract
Most evaluations end with a report. The best evaluations end with action. This presentation explores what sits in between—the deliberate infrastructure needed to turn findings into change at scale.
When almost 20% of pupils in England are persistently absent, schools desperately need insights they can act on—not just more data to monitor. This presentation shares lessons from Understanding Attendance, a national action research project led by ImpactEd Evaluation spanning over 400 schools, 10,000 parents, 300,000 pupils and three academic years, demonstrating how evaluation can be designed for action from the outset.
Traditional attendance evaluation tracks and compares absence rates between schools or pupil groups. Understanding Attendance took a different approach: what if we evaluated social, emotional and behavioural factors schools can actually influence? Utilising existing validated measures, our own data and learning, and working with school leaders, we designed a diagnostic exploring sense of belonging, relationships, attitudes towards attendance, and practicalities such as routine and sleep, alongside attendance data. The question wasn't just "who's absent?" but "what's driving absence for pupils in your specific context, and what can you do about it?"
This presentation shares four critical innovations that helped bridge evaluation into action, with practical implications for evaluators across sectors.
First: Make benchmarking meaningful. Our initial national benchmarks seemed helpful, but context-sensitive comparison—by time of year, pupil characteristics, and attendance distribution—dramatically increased actionability. Evaluators will explore how granular benchmarking makes comparative data genuinely relevant; a minimal illustrative sketch follows the fourth point below.
Second: Align timings of findings and decision-making. Rather than end-of-year reports, we built iterative data windows aligned with schools' natural planning cycles with automated reporting enabling quick turnaround. Autumn insights inform spring interventions; summer data shapes next year's strategy. Building stakeholder decision cycles into evaluation design from the start increases genuine use of findings.
Third: Create spaces for peer learning, not just individual reports. Our work at Trust level, as well as half-termly community webinars and research insight sessions, brings schools together to explore emerging findings, hear from sector speakers, and discuss challenges with peers. When one school shares how they're building belonging, fifty others gain practical ideas. Evaluators can play a convening role, not just a reporting role—creating communities where stakeholders learn together rather than reading alone.
Fourth: Differentiate insights for different users. Senior leaders need strategic overview; attendance leads need diagnostic detail; SENDCos and PP-leads need subgroup-specific benchmarking; classroom teachers need pupil-level insights. Hear how we created layered reporting for multiple audiences from a single dataset, ensuring findings reach beyond the commissioning stakeholder.
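Picking up the first point, the following is a minimal sketch of what context-sensitive benchmarking can look like in code. It is illustrative only, not ImpactEd's actual tooling: the column names, the grouping dimensions (term and year group), and the 1-5 belonging scale are assumptions made for the example.

```python
import pandas as pd

# Hypothetical pupil survey data, one row per response; all names and
# values are invented for illustration.
df = pd.DataFrame({
    "school":     ["A", "A", "B", "B", "C", "C"],
    "term":       ["autumn", "spring", "autumn", "spring", "autumn", "spring"],
    "year_group": [7, 9, 7, 9, 7, 9],
    "belonging":  [3.8, 3.1, 4.0, 3.4, 3.5, 2.9],  # assumed 1-5 survey scale
})

# A single national average hides context, so compute one benchmark per
# context cell and compare each school with like-for-like peers.
benchmark = (
    df.groupby(["term", "year_group"])["belonging"]
      .mean()
      .rename("benchmark_belonging")
      .reset_index()
)

# Attach the matching benchmark to every response and report the gap.
scored = df.merge(benchmark, on=["term", "year_group"])
scored["gap"] = scored["belonging"] - scored["benchmark_belonging"]
print(scored[["school", "term", "year_group", "belonging", "gap"]])
```

The same pattern extends to any further dimension (for example, attendance band), at the cost of needing enough responses per cell for each benchmark to stay stable.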
The presentation also addresses important tensions: balancing rigorous methodology with accessible reporting for non-technical audiences; supporting individual schools while maintaining research integrity across the cohort; and sustaining engagement when findings reveal uncomfortable truths about systemic barriers schools cannot easily address.
Attendees will leave with practical strategies for designing evaluations that bridge insight to action, whether working with schools, charities, government, or other complex environments. The presentation draws on case studies showing successes (interventions directly shaped by diagnostic findings) and honest reflections on where the evaluation-to-action bridge still needs strengthening.
Paper short abstract
A visual approach to evaluating England’s marine plans, using radial diagrams built from evaluation triangles to communicate progress and interactions across policy areas, making marine plan monitoring and evaluation more accessible, engaging, and actionable for planners and policymakers.
Paper long abstract
Marine plans are a central part of how England manages the sustainable use of its seas, balancing environmental, economic and social priorities across different marine sectors. Each plan contains a set of policies and objectives designed to guide decision-making and deliver multiple outcomes, from supporting blue growth to protecting marine ecosystems and enhancing community wellbeing. To assess their effectiveness, the Marine Management Organisation (MMO) monitors data on policy implementation and environmental and socio-economic indicators. However, this data is not yet systematically evaluated or presented in an accessible way, limiting understanding of whether plan policies are achieving their intended objectives, and of how progress in one policy area may influence outcomes in others.
To address this challenge, ICF developed a contribution analysis framework for marine plans through an MMO-commissioned project, providing a structured way to assess how plan policies contribute to outcomes across the complex marine and coastal system. Building on this, a joint ICF/MMO CECAN Fellowship research project has been exploring how to organise marine plan monitoring data to enable more systems-based evaluation and more effective communication of findings. Central to this work is the development of a visual approach that helps represent the progress of policies and objectives in a clear, engaging, and holistic way.
Inspired by established visual frameworks such as Planetary Boundaries (Rockström et al., 2009) and Doughnut Economics (Raworth, 2018), radial diagrams built from evaluation triangles show how each marine plan policy is performing relative to its intended outcomes and acceptable system limits. This visualisation makes it easier for planners, policymakers, and stakeholders to understand how progress is distributed across different outcomes, where synergies or trade-offs may exist, and which areas may require adaptive management.
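To make the general idea concrete, here is a minimal Python sketch of a radial progress diagram. It is not the ICF/MMO tool itself: the policy areas and scores are invented, simple polar bar segments stand in for the evaluation triangles, and a dashed outer ring marks an assumed acceptable system limit.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical policy areas and progress scores (fraction of the intended
# outcome achieved); values are invented for illustration.
policies = ["Fishing", "Energy", "Heritage", "Recreation", "Ports", "Ecology"]
scores = np.array([0.8, 0.55, 0.35, 0.6, 0.45, 0.7])

angles = np.linspace(0, 2 * np.pi, len(policies), endpoint=False)
width = 2 * np.pi / len(policies) * 0.9  # slim gaps between segments

ax = plt.subplot(projection="polar")
ax.bar(angles, scores, width=width, alpha=0.6)

# Dashed outer ring standing in for the acceptable system limit.
theta = np.linspace(0, 2 * np.pi, 200)
ax.plot(theta, np.ones_like(theta), "r--", label="acceptable system limit")

ax.set_xticks(angles)
ax.set_xticklabels(policies)
ax.set_yticks([])  # hide radial ticks for a cleaner, at-a-glance read
ax.set_ylim(0, 1.2)
ax.legend(loc="lower right")
ax.set_title("Illustrative radial progress diagram")
plt.show()
```

Read this way, segments falling well inside the dashed ring flag policy areas that may need adaptive management, mirroring the at-a-glance logic of the Planetary Boundaries and Doughnut visuals.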
This approach directly supports the conference theme “Communicating Evaluation for Action” by translating complex evaluation findings into intuitive visual narratives that promote shared understanding and dialogue. The evaluation triangles and radial diagrams are scalable, meaning they can be applied at the level of individual policies, plan objectives, or even across multiple marine plans. This scalability enhances the accessibility of evaluation findings, making contribution analysis more transparent, easier to interpret, and more actionable for policymakers and delivery partners.
In the joint presentation, delivered by Dr Rachel Holtby (ICF) and Victor Owoyomi (MMO), we will share: the methodological steps for linking monitoring data to the visual triangles; examples of how visual tools help clarify progress and interdependencies across marine policy areas; reflections on how the approach facilitates faster analysis and more engaging communication; and insights on how this technique could be adapted to other policy domains facing similar systems challenges.
Ultimately, this work aims to stimulate discussion around how evaluators can use visual tools to communicate complexity effectively, promote adaptive learning, and strengthen collaboration between evaluators, analysts, and decision-makers. By demonstrating the potential of evaluation triangles, we invite participants to consider how similar techniques might enhance evaluation reporting and action across diverse policy areas.
Paper short abstract
This study outlines an innovative qualitative approach using 'Stories of Change' to evaluate the learning of place partnerships in implementing a whole systems approach.
Paper long abstract
Over the past four years, the National Evaluation and Learning Partnership has supported place partnerships across England to evaluate their whole systems approach to tackling physical inactivity. This process often involves a change in leadership strategy, grappling with complexity, and advocating for how and why changes occur, which can encourage organisational change. Place Partnership Leaders (PPLs) must often navigate personal and organisational barriers, such as staff adapting to new responsibilities and willingness to change. This study explores what and how PPLs learn about evaluating their work in a whole systems environment, using hindsight to write a Story of Change as a ‘Letter To Your Younger Self’. We invited 10 PPLs to write their story, providing guiding prompts such as “How did you promote the new vision?” and “How did you try to motivate partners (staff, partners, and residents) to adopt or react to the new approach?” These letters were then followed up with semi-structured interviews. We analysed the data using a dialogical narrative analysis informed by realist theories of change and underpinned by transformational and servant leadership concepts. Our results identify composite theory-driven narrative profiles (i.e., how and why leaders made decisions, and what approach worked and why) characterised by applied suggestions. These profiles could be used as a reflective learning tool for emergent PPLs seeking to implement a whole systems approach effectively.
Paper short abstract
The session shows how an internal evaluation model delivers timely and iterative insights to improve an SEL-based sports program for children, using collaborative sensemaking and tailored communication. It highlights ways to reach diverse audiences and make findings truly actionable.
Paper long abstract
At Fundación Luksic, the Evaluation Department has developed an internal evaluation approach inspired by utilisation-focused principles, aiming to produce insights that are practical, timely, and genuinely useful for program improvement. Our evaluation cycle includes design, implementation, results, and—when feasible—impact assessments, each offering evidence at key stages of a program’s development. These evaluations are conducted collaboratively with the Foundation’s implementation teams, fostering shared learning and strengthening the translation of evidence into concrete improvements.
We draw on the case of a Socio-Emotional Learning (SEL)-based sports program that has been running for the past two years to illustrate how an internal evaluation model can generate early and iterative insights, enabling informed decision-making.
The program seeks to nurture social and emotional skills in children aged 6 to 13 through formative after-school sports workshops inspired by the SEL framework, complemented by a positive parenting program for caregivers.
Since the program’s inception, the Evaluation Department has supported implementation through a series of implementation and results evaluations. During its first two years, these assessments informed several important design and delivery decisions. Data collection combined different methods—interviews, direct observation, surveys, and analysis of administrative records—applied to different actors, including sport instructors, caregivers, children, and program staff. We have been particularly challenged to innovate with participatory methods adapted for children.
Evaluating the sports program has also offered opportunities to refine how we communicate findings and adapt them to different audiences. Internally, results are shared through workshop-style sessions that encourage implementers to reflect on the evidence, discuss its implications, and agree on the most relevant and feasible areas for improvement. Externally, insights from the evaluation—particularly those gathered from work with participating children in 2024—were presented to other NGOs working in child development and sports as part of the 2025 Evaluation Week organized by the Global Evaluation Initiative. This experience broadened our reach and highlighted the value of clear reporting, visual storytelling, and strategic framing in making evaluation findings accessible and actionable.
Paper short abstract
We demonstrate how ripple effects mapping, as a participatory approach to evaluation, delivers rich and insightful data about a local authority research training programme. Findings reveal intended and unintended impacts on individual skills and organisational research activity.
Paper long abstract
Background
Local authorities regularly make decisions that impact the determinants of health and, subsequently, health inequalities. To ensure that policy and service decisions are optimal, it is important that local authorities access and use the most contemporary and wide-ranging evidence. In response, Blackpool Researching Together co-designed and delivered a research training programme to help build local authority and voluntary sector staff capabilities to use and conduct research. The aim of the current study was to evaluate the intended and unintended outcomes of the research training programme.
Methods
We used ripple effects mapping (REM) as a participatory qualitative approach to evaluation. This interactive approach encouraged previous research training participants and facilitators to reflect on how the programme had influenced individual skill development and contributed to changes in research activities and outputs in their respective organisations. All participants and programme facilitators from cohorts 1 and 2 were invited to map outcomes. Two workshops were held in December 2025, capturing 12 months of post-programme outcomes for cohort 1 and six months for cohort 2; the two cohorts' workshops were held separately. Small-group guided discussion and flip charts were used to capture participant insights. Discussions focused on links between actions and impacts, the most and least significant outcomes, and who was impacted. Workshops focused on sense-making, gathering both individual and collaborative insights, and resulted in two co-created visual maps. The maps depict a timeline capturing all of the outcomes reported by participants.
Results
Although formal results are not yet available, informal anecdotes gathered prior to the study suggested that the research training programme was successful in improving participants’ confidence in research and had positive impacts on career trajectories. At the conference, we will present the results of the REM workshops, including a summary of the intended and unintended outcomes of the programme, and the co-produced visual maps. We will discuss the evaluation’s implications for practice, turning evidence into action.
Conclusion
This presentation will demonstrate how participatory and reflective approaches to evaluation, such as ripple effects mapping, can deliver rich and insightful data that can be translated into practice. We will show how participatory methods capture nuanced impact that may otherwise be missed using traditional methods. We will also highlight how REM workshops help to break down traditional power dynamics in research.
Paper short abstract
Robustly assessing how government funding achieves policy goals remains a challenge. This paper synthesises our experience of using Contribution Analysis (CA) in energy and environment evaluations. Drawing on practitioner insights, we explore the challenges and lessons learned through using CA.
Paper long abstract
This paper synthesises learnings from recent evaluations in the energy space that have applied Contribution Analysis (CA) to assess the contribution of UK government interventions towards net-zero policy goals across projects, programmes, and market mechanisms. For example, this includes evaluations of innovation funding schemes (such as Heat Pump Ready), retrofit funding schemes (such as the Social Housing Decarbonisation Fund), and market interventions (such as the Capacity Market). Drawing on practical experience in delivering evaluations with a public policy consultancy, we explore how Contribution Analysis, often combined with Process Tracing (PT) and/or informed by the work of Delahais and Toulemonde, has been delivered in live evaluation contexts.
Our synthesis highlights methodological challenges encountered when applying CA in live policy environments. Specifically, we critically assess the use of CA as a tool to assess the contribution of interventions when evaluation projects specify the use of CA but it is not wholly fit for purpose: 1) complex and interlinked multi-programme evaluations, 2) evaluations with poor-quality data sources, 3) evaluations using CA in parallel with programme delivery, where there is insufficient time for contribution to become clear, and 4) smaller-scale evaluations without the resources to collect sufficient data to exploit CA’s evaluative power.
We examine how we as evaluators have navigated these challenges to produce credible conclusions on contribution stories that have informed decision-making, and how this has refined our approach to theory-based evaluation to ensure impact for policymakers. The analysis identifies lessons from our experience on how best to utilise Contribution Analysis, Process Tracing and the work of Delahais and Toulemonde to develop and test compelling theory-based contribution narratives: namely, whether programme objectives are met, whether market mechanisms adequately incentivise participants, and whether project teams successfully deliver innovation through their grant-funded projects.
By combining the power of Contribution Analysis, Process Tracing, and the approach of Delahais and Toulemonde, our frameworks have evolved to allow us to deliver two elements in parallel: robust assessments of whether an intervention is necessary and/or sufficient to achieve its objectives, and clear commentary on the strength of the evidence used to form those judgements.
Beyond our methodological insights, this paper also demonstrates how CA can bridge the gap between evidence and action in the energy sector by clarifying the role of government funding in achieving policy outcomes. We identify where CA has, and has not, effectively leveraged insights for future policy design and the reasons for these successes or failures.
We argue that CA’s structured approach to causal inference strengthens accountability. By linking the use of CA with PT and elements of the work of Delahais and Toulemonde, programmes can be robustly assessed against their original intention, rather than their actual out-turn.
By sharing practitioner perspectives and cross-case lessons centred on the energy sector, this session will contribute to the conference themes of influencing policy and programme change, building evaluation cultures, and communicating evaluation for action.
Paper short abstract
This session explores how Participatory Video Most Significant Change (PVMSC) builds evaluation cultures by co-producing evidence with adolescents, embedding reflection and learning in decision-making, and using ethical, user-led storytelling to value diverse voices.
Paper long abstract
Healthy Cities for Adolescents (HCA) is Fondation Botnar’s flagship initiative, managed by Ecorys, to create cities that are fit for adolescents. Now in its second phase (2022–2026), HCA operates in six countries, supporting projects that address adolescent health and wellbeing in diverse urban contexts.
This session introduces an innovative, participatory, and user-centred method used in HCA to enhance learning and reflection. Drawing on implementation experience, we will demonstrate the transformative potential of this approach, share a practical example, and reflect on lessons learned from both evaluator and implementor (InsightShare) perspectives on fostering an evaluation culture.
Aligned with Theme 2, the session will present PVMSC, a method combining Participatory Video (PV) with the story-based Most Significant Change (MSC) technique. Grounded in equitable evaluation and participatory action research, this approach enables adolescents to film, edit, and share their own stories of change, taking the lead in identifying and analysing what matters most to them. Through visual storytelling, they generate evidence for local action, learning, and advocacy.
Our experience with PVMSC illustrates its value, relevance and feasibility in complex programmes.
1. Ethical storytelling: PVMSC shifts away from extractive practices in which external actors (often in the Global North) interpret service user data. Instead, it co-produces evidence that centres adolescents in MEL, redefining what counts as evidence from externally set indicators to locally defined significance.
2. Adolescent agency: It exemplifies evaluation as empowerment, with adolescents acting as co-evaluators and co-communicators, generating evidence in their own words through creative expression.
3. Capacity strengthening: The process builds lasting skills among grantee organisations and young people in digital media, storytelling, participatory research, MEL, civic engagement, and facilitation.
4. Local ownership: Participatory analysis, collaborative reflection, and community film screenings become platforms for local sense-making, dialogue, and advocacy that inform ongoing learning and policy action.
The session will showcase examples of PVMSC in action and share lessons on creating environments where evidence is valued and used for local action beyond traditional reporting and accountability.
Please note, we are aware of Social Development Direct’s abstract submission on YET, and confirm that this is a different approach and that there is no overlap between the sessions’ content.
Paper short abstract
In recent years, evaluation quality has improved, but evaluation directors still lament the limited use of evaluation results. This presentation considers one of the main reasons this gap persists: our collective failure to read the organisational environment and influence key decision makers.
Paper long abstract
We don't know how much taxpayer funding goes to the evaluation of international development and humanitarian programmes, but it is comfortably over US$100 million a year. Some of this investment is wasted, but we don't know how much: 20%? 30%? In any scenario, it’s millions of USD/GBP.
We’ve been asking for a long time why some evaluations catalyse change while others gather dust on the shelf. In response, in the last 25 years we have made real progress in the professionalisation of evaluation including the development of norms, standards, methods, ethics and in the communication of results. We’ve become better at engaging stakeholders before we complete evaluation reports.
Overall, evaluation quality has gone up. Why, then, do evaluation directors still lament the limited uptake of evaluation results by their organisations and the managers most concerned? It appears that the gap between technically sound evaluation and genuine uptake of evaluation results remains frustratingly wide.
In this brief presentation, we will talk about one of the main reasons this gap persists: our collective failure to properly read the political and organisational environment around each evaluation and to influence the key individuals who decide if and how evaluation results are used.
Certainly, this requires skill, and because we cannot control the outcome of these interactions, there are no guarantees of success. However, we can learn from experience.
In this session, we will cover a few key points concerning how to influence, using practical examples:
• Why high quality in evaluation removes reasons for not using evaluations but, by itself, cannot drive utilisation.
• The importance of shaping organisational connections and fitting evaluations into organisational decision-making
• Understanding managers’ perspectives: how evaluations interact with clients’ decision-making in light of the incentives, opportunities and risks they see in evaluation
• Building trust and credibility in the evaluation with clients through listening, impartiality and competence
• Allowing stakeholders to debate results and agree actions in the evaluation while maintaining the integrity of the evaluation process.
• Closing out the evaluation; managing the critical, high-risk transition from evaluation completion to organisational action
Some evaluation directors still think that influencing is not the business of evaluators and evaluation managers: ‘Deliver good evaluations and leave the rest to management’, they say. This presentation will discuss why this is a mistake and why learning how to influence the environment around any evaluation is critical to delivering on the conference theme of ‘Bridging the Gap: Evaluation into Action’.
Paper short abstract
We present two main EEF approaches to longitudinal analysis: (1) routine tracking via the EEF Archive, and (2) pre-specified longitudinal analysis built into evaluation design when delayed or sustained impact is plausible. We also discuss and compare the advantages and limitations of the two approaches.
Paper long abstract
Longitudinal analysis is a key component of the Education Endowment Foundation’s (EEF) mission to understand whether the effects of educational interventions persist, diminish, or emerge over time. By providing evidence on the durability of outcomes beyond the initial evaluation period, longitudinal analysis informs educational practice, supports evidence-based improvements, and guides decisions about regranting funding to programmes.
This presentation aims to introduce the two main approaches EEF uses for longitudinal analysis. The first is routine tracking through the EEF Archive, which leverages National Pupil Database data and analysis by Durham University. The second is pre-specified longitudinal analysis built into evaluation design when delayed or sustained impact is plausible. We also aim to compare the advantages and limitations of these approaches, considering factors such as cost, data completeness, theoretical interpretation, and the burden on schools. A further objective is to demonstrate how learning from challenges can refine educational approaches and how EEF’s evidence can help policymakers and funders decide which programmes to scale, adapt, or regrant. EEF is currently revising its approach to longitudinal analysis to consider how the analysis can best support its overall mission. Flexible models are being considered, including longitudinal analysis of all archived projects, incorporating longitudinal outcomes into original trial designs, or focusing on programmes in the funding pipeline, to maximise the practical value and impact of longitudinal evidence for education. By connecting these insights to the conference theme, we aim to show how evaluation can drive real-world change and improve outcomes for learners and the education sector.
Attendees will gain practical insights into two viable approaches to longitudinal analysis that are highly relevant to the educational sector. They will learn how EEF has applied these methods to strengthen its strategy and mission, ensuring evidence-based improvements and sustained impact. The session will provide an opportunity to explore the EEF's Archive and understand EEF’s longitudinal research methodology in depth. Attendees from evaluation teams in the charity sector, other What Works Centres, and consultancy evaluation teams will leave with a clearer understanding of how longitudinal analysis can inform decision-making and enhance evaluation practice. There will also be time for questions and discussion to support knowledge sharing and application.
EEF uses longitudinal analysis to identify which programmes are worth scaling, adapting, or regranting. This ensures that the education sector can make more evidence-informed decisions when investing time and resources in interventions that are more likely to deliver a positive, lasting impact.
EEF’s mission is to break the link between family income and educational achievement by supporting the education sector to transform outcomes for socio-economically disadvantaged children and young people. To achieve this, EEF, through its wide-ranging work and longitudinal follow-up research, enables practitioners to focus on evidence and what works best in practice, ensuring that early years providers, schools, and colleges have access to accurate, accessible, and actionable evidence to improve teaching and learning.
Paper short abstract
This session explores Uganda's 20-year journey to build a government monitoring and evaluation system. We analyse the fragile interplay of politics, administration, and donor influence, share critical lessons, and offer insights for any country navigating the "unsteady pulse" of evidence.
Paper long abstract
Building a sustainable national M&E system is a complex, political endeavour, not merely a technical exercise. This session revisits and updates a seminal 2016 study (DOI:10.1057/9781137376374_10) on the supply and demand for evaluation in Uganda's public sector. We trace the two-decade arc of this effort, from the ambitious creation of a cross-Government results system (GAPR) and a Government Evaluation Facility (GEF), through their challenges and questions of sustainability, set against the backdrop of National Development Plans and shifting political and donor dynamics.
The session is structured to move from analysis to actionable insights. We begin by framing the issue using the supply-demand framework, explaining how the equilibrium between the production of evidence and the political will to use it has shifted over time. Co-presenters will then provide an updated analysis, detailing "what happened next" and describing the "new equilibrium" shaped by non-state actors and conditional political demand.
The core of the session lies in critical reflections and a facilitated discussion. We will distil hard-won lessons about the fragility of institutionalisation and the double-edged sword of donor support. We then engage the audience with provocative questions on key dilemmas: supplying evidence in shrinking political spaces, achieving genuine government ownership, marketing evidence effectively, and re-imagining the future of government-led evaluation facilities. This session is essential for evaluators, policymakers, and M&E champions committed to making evidence matter in the real world.
Paper short abstract
Meet-the-Author session on Ukraine’s emerging national evaluation system, showing how evaluation architectures, capacities and incentives shape real policy and programme decisions in fragile, crisis-affected contexts, and what this means for influencing change elsewhere.
Paper long abstract
This Meet the Author session will explore how a national evaluation system can support – and sometimes struggle to support – evidence-informed policy and programme change in a highly fragile, rapidly evolving context. Drawing on the discussion paper Strengthening Evidence-Based Decision Making in Fragile and Conflict-Affected States: Insights from Ukraine’s National Evaluation System (DEval Discussion Paper 3/2025), the session uses Ukraine as a case study to examine how evaluation infrastructure, incentives and capacities shape real-world decisions on recovery, reconstruction and EU accession.
The paper finds that monitoring and evaluation of public policy in Ukraine is widely referenced in regulations and strategies, yet actual evaluation practice remains patchy and weakly institutionalised. Fragmented responsibilities, limited political demand for evaluation, and the absence of a coherent legal framework constrain the systematic use of evidence. At the same time, war-related recovery and reconstruction, and the EU accession process, have created powerful external pressures and opportunities to build stronger evaluation systems. Civil society organisations, evaluation associations and international partners are emerging as important actors, piloting practices and norms that can influence how public institutions generate and use evaluative knowledge.
In the session, the author will briefly present the paper’s conceptual framing and qualitative case study methodology, and then focus on the entry points it identifies for strengthening policy and programme change through evaluation. These include: clarifying mandates and co-ordination structures for government-led evaluation; using EU accession and reconstruction funding as levers for institutionalising evaluation; investing in evaluation capacity development across state and non-state actors; and fostering a culture that values learning alongside accountability. The session will highlight the tensions between urgent decision-making in crisis and the longer-term, systemic work of building a national evaluation system.
A facilitated discussion will invite participants to interrogate the transferability of these insights beyond Ukraine: How can evaluators and policymakers in other fragile or politically contested settings use windows of opportunity (e.g. reform processes, external funding, crises) to embed evaluation more deeply? What kinds of adaptive approaches and partnerships help ensure that evaluation findings travel into policy and programme decisions, rather than remaining at the margins?
The session will be of interest to evaluators, commissioners, policymakers and funders concerned with Theme 1 of the conference. Participants will leave with a richer understanding of how national evaluation architectures interact with politics, conflict and reform – and with practical ideas for leveraging evaluation to influence policy and programme change in their own contexts.
Paper short abstract
Through strategic design, the interim evaluation of DSIT's 5G Innovation Regions programme provided actionable insights for ongoing adaptive design. The evaluation shows how theory-based methods can be used to drive programme improvement, inform policy, and accelerate tech adoption.
Paper long abstract
Too often, evaluation insights arrive too late in programme delivery to meaningfully influence performance and subsequent impact. This presentation addresses this critical challenge by sharing learnings from KPMG’s interim process and impact evaluations of the Department for Science, Innovation and Technology’s (DSIT) 5G Innovation Regions (5GIR) programme. This programme aims to support places across the UK in adopting advanced wireless technologies, accelerating commercial investment in 5G, and fostering the 5G ecosystem, ultimately driving economic growth.
The core of this presentation focuses on the importance of strategic timing of evaluation components and effective application of theory-based evaluation methods. We will illustrate how these methods and approaches were used to identify what has and hasn’t worked well, and the specific actions needed to either enhance or secure intended outcomes. Crucially, we will also detail how these findings were effectively communicated to relevant stakeholders in a timely way to enable tangible change within the programme.
Through this, the presentation offers tangible examples of how evaluations can be designed to provide timely and actionable insights that actively support adaptive programme design and ultimately improve programme outcomes, particularly for novel and innovative programmes. We will highlight specific lessons from the interim evaluation of 5GIR that may be relevant for other similar programmes, especially those involving technological projects. These findings include insights into novel funding mechanisms, appropriate programme timelines, and the mechanisms needed to effectively drive technology adoption. Finally, we will provide evidence of the policy impact of these findings, demonstrating how lessons learned from the evaluation have directly informed subsequent decision-making by DSIT. This showcases a practical approach to embedding evaluation as a dynamic tool for continuous improvement and strategic adaptation, as well as demonstrating progress towards achieving ultimate impacts.
Paper short abstract
We adapted the UK’s ICF KPI 15 from a static metric into a dynamic tool that tracks signals of transformational change. Applied under FCDO’s ARCAN programme, we demonstrate our experience of how this can enable actionable insights and future-aware decisions to accelerate systemic transformation.
Paper long abstract
Transformational change is central not only to climate and nature agendas but also to sectors where systemic shifts are critical. Yet measuring this change meaningfully, and using that evidence to shape decisions, remains an evaluative frontier. The UK’s International Climate Finance (ICF) Key Performance Indicator (KPI) 15 was designed to assess the likelihood of transformational change, but its original design was too static, linear and output-focused to capture the realities of complex, adaptive programmes or to inform forward-looking decisions. The urgency of the climate and nature crisis, and the growing scale of international climate finance, demands tools that do more than monitor progress: they must actively guide strategies and policies towards transformative action.
Drawing on our experience as the Monitoring, Evaluation and Learning (MEL) Unit of the FCDO’s Africa Regional Climate and Nature (ARCAN) programme, this session will share how we developed, tested, and implemented an adjusted KPI 15 methodology, the lessons learned from doing so, and what this means for future applications across complex portfolios. With the endorsement of ARCAN’s management team, our goal was not only to adapt the tool but to shift our team’s mindset towards transformational thinking, encouraging both evaluators and policymakers to move beyond compliance and adopt a reflective, future-oriented approach that asks “So what?” and “Now what?” in order to turn evidence into action.
At its centre is a Signals of Transformational Change framework, adapted from the Climate Investment Funds, which recognises that transformation unfolds in stages. Evaluators identify no, early, interim and advanced signals across criteria such as capacity, incentives and scalability to track cumulative progress and surface emerging momentum. We trialled additional dimensions such as gender equality and synergies, embedded sustainability as a cross-cutting judgement, and introduced a dual scoring system linking likelihood of change with strength of evidence. By moving KPI 15 from a static monitoring metric to an evaluative, future-aware sense-making tool, we aimed to strengthen evaluative insight, enable more nuanced judgements about where and how change is emerging, and better connect evidence to real learning and action.
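The staged signals and dual scoring can be sketched in a few lines of code. This is a hedged illustration rather than the ARCAN MEL Unit's implementation: the criterion names, the 0-3 likelihood mapping, and the 1-3 evidence scale are assumptions; the one design point taken from the text is that likelihood of change and strength of evidence are reported side by side rather than blended into a single number.

```python
from dataclasses import dataclass

# Ordered signal stages, following the staged logic described above.
STAGES = ["no signal", "early", "interim", "advanced"]

@dataclass
class CriterionAssessment:
    criterion: str          # e.g. "capacity", "incentives", "scalability"
    stage: str              # one of STAGES
    evidence_strength: int  # 1 (weak) to 3 (strong); hypothetical scale

def report(assessments):
    """Print likelihood-of-change and strength-of-evidence side by side."""
    for a in assessments:
        likelihood = STAGES.index(a.stage)  # 0..3, from the stage reached
        print(f"{a.criterion:12s} likelihood={likelihood}/3 "
              f"evidence={a.evidence_strength}/3")

report([
    CriterionAssessment("capacity", "interim", 3),
    CriterionAssessment("incentives", "early", 1),
    CriterionAssessment("scalability", "advanced", 1),  # weakly evidenced claim stays visible
])
```

Keeping the two scores separate is what lets a reader ask "So what?" of an advanced signal that rests on thin evidence.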
Our session will begin with an introduction to transformational change concepts and the ICF KPI 15 methodology. We will then briefly introduce the ARCAN portfolio and the deficiencies of the existing ICF KPI 15 framework in our context. Next, we will provide a practical walkthrough of our adaptations and examples of their application. This includes a set of guiding principles we developed that encourage critical and contextual thinking, such as questioning the significance of change, recognising weak or negative signals, and examining how change connects across levels (local to regional). Finally, we will share joint lessons learned by our team and FCDO, including challenges, along with forward-looking insights for evaluators, managers of complex portfolios (including the UK’s ICF portfolio team), and policymakers. While our focus is on climate and nature, these insights are relevant to other sectoral evaluators and MEL professionals seeking approaches to measurement that help programmes and policies move towards transformational change.
Paper short abstract
Building an evaluation culture in a national volunteer-led widening participation charity. A case study of how participatory, reflective practices drive learning, inclusion, and evidence-based decision-making in widening access to medical education.
Paper long abstract
In2MedSchool is a UK-registered charity founded in 2020 to widen participation in medicine by connecting aspiring medical students from disadvantaged backgrounds with volunteer mentors—doctors and medical students—across more than 100 schools. With over 3,000 mentors and 2,000 mentees, the organisation has achieved remarkable reach but faced the challenge of sustaining evidence-based practice without paid evaluators or formal research infrastructure.
This presentation explores how In2MedSchool embedded evaluation as a collective learning process rather than a compliance exercise. Using participatory and mixed-methods evaluation, including annual mentee and mentor surveys, regional focus groups, and feedback loops with schools, the organisation developed a “learning culture” that informs every level of decision-making—from safeguarding training to programme design.
The session will share findings from three years of practice:
• Quantitative impact: 78% of mentees reported at least one medical-school offer, compared with 40% nationally.
• Qualitative insight: mentors and mentees describe increased confidence, belonging, and professional fulfilment.
• Cultural learning: regular “evaluation huddles” and trustee learning meetings embed reflection in governance.
Lessons include the importance of simple, iterative methods that build ownership; co-producing evaluation questions with stakeholders; and prioritising learning over reporting. The approach demonstrates how small organisations can democratise evaluation and turn it into a driver of inclusion, transparency, and strategic improvement.
By illustrating how volunteer communities can create meaningful, ethical evaluation cultures, this session contributes to Theme 2: Building Evaluation Cultures, offering a replicable model for evidence-led, community-based educational change.
Paper short abstract
A panel session of evaluators working in different strata of evaluation in programmes designed to promote the development of innovative place-based interventions. Each will discuss how their work is shaped by their stratum and how this corresponds to each of this year’s conference subthemes.
Paper long abstract
It is not uncommon for there to be different strata of evaluation in programmes designed to promote the development of innovative place-based policy and practice through inclusive collaboration between different stakeholders and communities. Each stratum has a specific role, responding to each of this year’s conference subthemes in its own way. One example is the Local Policy Innovation Partnerships (LPIP) programme, funded by the Economic and Social Research Council (ESRC), the Arts and Humanities Research Council (AHRC), and Innovate UK. The programme includes four local partnerships based in each of the four nations and a national coordinating hub. Its aim is to create a step change in the quality and impact of the evidence created by universities and their local place partners to support place-based policy and practice innovation.
In this round table discussion we bring together evaluators involved in the Local Policy Innovation Partnerships working in different roles including:
• National independent evaluator responsible for developing and implementing the overarching evaluation framework and programme assessment
• Experienced evaluator based in the Strategic Coordinating Hub for LPIPs, whose role is to support the development of evaluation capabilities and understanding of the distinctiveness of place-based evaluation.
• LPIP leader and evaluator who is closely involved in building capability within local policymaking, co-producing evidence with communities and service users, and using participatory and user-centred methods that support reflection and learning.
Panel members will provide their reflections on:
• The importance of building trust, both professionally and methodologically, and how to measure it as a key intermediate outcome and potential indicator of future sustainability
• Trust is an enabling condition to explore policy and programme effectiveness. Trust between partners, trust between evaluators and participants, and trust within communities shaped:
o Data quality
o Engagement levels
o Partnership stability
o The credibility of findings
• Alignment of, and choices around, the perspective and approach adopted, depending on where you are positioned as an evaluator
• The role of evaluation in supporting adaptive programme management and learning in place-based innovation programmes
• The necessity of contextualised evaluation for place-based systems?
• Evaluators as learning partners enabling sensemaking, rather than as auditors?
• The need to build evaluative capacity as a core output. Evaluation activity should be:
o A capacity building exercise
o A route to strengthen strategic clarity
o A way to improve day to day decision making
o A tool for building internal cultures of reflection
• How to overcome challenges related to data quality, infrastructure gaps, and the limits of measurement. In place-based work focused on sub-regions, evaluators often have to:
o Work creatively with incomplete data
o Build new baseline measures
o Use qualitative insights to compensate for gaps
o Advocate for long term infrastructure strengthening
• Supporting and engendering evaluative thinking when engaging with different communities and stakeholders, including preparing to evidence the impact of co-produced initiatives.
The format will be for each panel member to give a brief presentation of no more than 3 minutes, to maximise time for a chaired discussion.
Paper short abstract
PAICE (Policy and Implementation for Climate & Health Equity) explores how evidence on climate action, health, and health equity can be translated into UK policy and practice. We outline a replicable model for integrating evaluation into complex, multi-stakeholder research-policy initiatives.
Paper long abstract
The PAICE project (Policy and Implementation for Climate & Health Equity) explores how evidence on climate action, health, and health equity can be translated into UK policy and practice. Addressing these systemic links requires approaches that integrate diverse disciplines and stakeholder knowledge. PAICE adopts a transdisciplinary research framework as its guiding theory for project design, delivery, and evaluation.
PAICE brings together researchers in systems thinking, modelling, epidemiology, building physics, and members of the Climate Change Committee and regional government (the Greater London Authority). A dedicated workstream has led the evaluation approach of the project by developing a monitoring, evaluation and learning plan (MELP). This plan aims to apply evaluation principles to derive criteria and indicators with which to evaluate across four project phases: formation, formulation, investigation and translation. Across these phases, transdisciplinary research processes, outputs and outcomes are evaluated using participatory qualitative and quantitative methods.
This poster presents:
• Evaluation principles and criteria adopted to evaluate process, outputs and outcomes
• Suggested methods for monitoring progress and facilitating reflexive learning
• Alignment with the governing program theory, including the intended action model and anticipated project impacts.
• Impact pathways for research and policy practice
Emerging insights include:
• Challenges and opportunities of mid-term evaluations
• Lessons from working with resource-constrained societal partners
• Strategies for fostering discipline-specific learning within a climate-health context, including community engagement and systems thinking.
Few UK climate-health research projects embed evaluation activities into projects from the beginning. We hope that the MELP offers a replicable model for integrating evaluation into complex, multi-stakeholder research-policy initiatives. By embedding evaluation in a transdisciplinary framework, PAICE demonstrates how adaptive, participatory approaches can strengthen evidence translation and inform policy in complex, uncertain domains.
Paper short abstract
We used contribution analysis and a 9-stage reform value chain to assess how PLANE shaped seven education reforms across five Nigerian states, identifying insider-led and institution-embedded pathways that moved policies from drafting to budgeted implementation.
Paper long abstract
Background & alignment to UKES themes. This study examines how an FCDO-funded programme (PLANE) influenced systemic education reforms in Nigeria and the conditions under which influence translated into adoption, budget execution and early institutionalisation—squarely addressing UKES Theme 1 (policy influence) and, through utilisation-focused design and iterative insight sharing, Theme 3 (communicating evaluation for action).
Methods. We applied contribution analysis as the primary approach, structured around a 9-stage reform value chain (from gap analysis to sustained results). Evidence combined document review and 82 key informant interviews (17 PLANE staff; 65 stakeholders, of whom 9 were women), coded in MAXQDA against a pre-specified analytical framework. Reforms spanned seven processes across federal and state levels: Teacher Recruitment/Deployment/Replacement (Jigawa, Kano), Education Quality Assurance law (Jigawa), Girls’ Education Policy (Kano), Domestication of the National Policy on Almajiri (Kaduna), UBEC/Intervention Fund law reform (federal), School Safety policy (Jigawa), and TaRL sustainability (Borno, Yobe).
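As a minimal sketch of how progress along such a value chain might be recorded and summarised, consider the following. The abstract names only the first stage (gap analysis) and the last (sustained results), so the intermediate stage labels below are assumptions; the Kano movement echoes the stage 4 to 6 result reported later, while the Kaduna numbers are invented for illustration.

```python
# Hypothetical labels for the 9-stage reform value chain; only stages 1
# and 9 are named in the study, the rest are invented for illustration.
VALUE_CHAIN = [
    "gap analysis", "drafting", "consultation", "adoption",
    "budgeting", "budget execution", "governance activated",
    "early institutionalisation", "sustained results",
]

# (before, after) stage positions per reform; Kano mirrors the reported
# stage 4 -> 6 movement, the Kaduna figures are purely illustrative.
reforms = {
    "Girls' Education Policy (Kano)": (4, 6),
    "Almajiri domestication (Kaduna)": (1, 4),
}

for reform, (before, after) in reforms.items():
    moved = after - before
    verdict = ("advanced" if moved > 0
               else "no regression" if moved == 0 else "regressed")
    print(f"{reform}: stage {before} -> {after} "
          f"({verdict}, now at '{VALUE_CHAIN[after - 1]}')")
```

Recording each reform as a position on a shared rubric is what makes cross-case claims such as "no supported reform regressed" auditable rather than impressionistic.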
Systemic pathways (what worked). Evidence shows PLANE’s influence operated through four mutually reinforcing pathways:
• Insider-led brokerage: mobilising respected government technocrats and trusted intermediaries as champions to navigate ideological sensitivities (e.g., reframing gender language to “inclusive access”), keeping opponents engaged and approvals moving.
• Multi-tier convening and coalitioning: from governor-level dialogues to Technical Working Groups and civil society coalitions (e.g., K-SAFE), enabling consensus-building and de-politicised problem-solving.
• Legal-institutional embedding with budgets: creating units, mandates and budget codes prior to full passage—e.g., a Girls’ Education Unit and NGN 402.6m 2025 allocations across MDAs in Kano—so policies did not stall at adoption.
• Peer learning and vertical alignment: brokering federal–state linkages (e.g., UBEC reform dialogue) and cross-state diffusion (e.g., Jigawa TRDR influencing Kano/national; Kaduna’s Almajiri domestication inspiring neighbours).
Results (what changed). Across cases, no supported reform regressed; several advanced multiple stages in 2025. Examples include Kano Girls’ Education Policy (stage 4→6) with early operational uptake; and Kaduna’s Almajiri domestication moving from stage 1 to a multilateral implementation platform (ROOSC). Teacher reforms were associated with tangible staffing actions (e.g., 2,400 recruits in Jigawa; >23,000 volunteer teachers absorbed in Kano, 2023–2025). These movements reflect budget execution and governance arrangements activated (stages 6–7), not just paper progress.
What didn’t work (and why). Reform pace was constrained by elite religious/political sensitivities (notably on girls’ education), political turnover, under-attention to the non-formal sector, and MEL capacity gaps that limited feedback loops—pointing to where influence needs earlier elite engagement, broader actor inclusion, and embedded MEL to sustain momentum.
Contributions to evaluative practice. Methodologically, combining contribution analysis with a staged reform rubric offered a transparent, theory-linked account of influence and decision-relevant evidence for adaptive management—translating directly into approvals, budget lines, and units within ministries.
Implications. To turn influence into durable results, evaluative practice should: (i) plan for elite-sensitivity management and multi-party continuity pre-approval; (ii) institutionalise capacity and learning platforms; and (iii) integrate policy-tracking MEL into government performance systems from the outset.
Paper short abstract
Learn how formative evaluation shaped Fleming Fund design, adaptation and policy influence amid global uncertainty. Gain practical lessons on embedding evaluation for impact, balancing timeliness with rigour, and strategies for influencing policy—ideal for evaluators seeking actionable insights.
Paper long abstract
Introduction
This panel will share lessons from using formative evaluation to strengthen Fleming Fund programme outcomes and influence policy at global, regional, and national levels.
Context
The Fleming Fund was established in 2017 to strengthen Antimicrobial Resistance (AMR) surveillance as a key pillar in global efforts to tackle AMR. Through a portfolio of country-, regional- and global-level grants, the Fund has generated country-level analyses of AMR and shared them with decision makers to influence national and global policy and regulation.
Itad has been the Fund’s independent evaluation partner since 2017, delivering a range of evaluation products from the start-up of the Fund. These have informed substantial programme change (including securing a second phase of support for the Fund, with an evolving focus based on experience in phase 1) and policy and regulation change at national and global level, including at the UN High Level Meeting on AMR in September 2024.
The Fund has been implemented throughout a period of significant uncertainty. The programme has adapted to respond to the challenges of COVID-19, Brexit, changes of UK government, multiple short-term spending reviews and associated replanning exercises. Uncertainty looks set to influence the design and implementation of ODA programmes for the foreseeable future, particularly in the context of recent and ongoing US and UK cuts to ODA. The Fund’s incorporation of evaluation from the design stage and its use of evaluation outputs for timely support to key decisions offers valuable lessons to any evaluators or decision makers seeking effective, sustainable ODA programmes.
Objectives
The panel will show how DHSC and Itad structured and used evaluation to guide programme design, adaptation and decision-making in a complex, multi-stakeholder context.
Plan for panel
Two speakers will present for 10 minutes each:
• Milena von und zer Muhlen (DHSC) will provide an overview of how DHSC structured the evaluation to maximise value and relevance, including how evaluative thinking was used to inform programme design and maximise effectiveness, for example by incorporating evidence on best practice in policy influencing and agenda setting.
• Jon Cooper (Itad) will outline how the evaluation adapted its approach to respond to DHSC’s changing needs and to uncertainty, whilst maintaining methodologically robust evaluative insights.
Both will discuss challenges and strategies for maximising effectiveness, including:
• DHSC adapted evaluation timeframes and deliverables to ensure timely, relevant insights.
• A supportive culture for learning and evidence use is critical.
• Rigid portfolio and contract systems hinder adaptation.
• Multiple tailored mechanisms are needed to engage decision-makers; findings must be simple yet substantive, and striking the balance is not easy.
• Policy influence and sustainability require time, strategic action, and political awareness—evidence alone is insufficient.
• Sustainability must be embedded from the outset, not left to later stages.
Q&A facilitated by Tim Shorten, with potential questions such as:
1. Which evaluation design features were more or less influential for DHSC decisions?
2. How did the Fund engage national decision-makers, and to what extent were evaluation findings necessary or sufficient?
3. How to balance timeliness vs robustness, pragmatism vs perfection?
Paper short abstract
We present two evaluation case studies which show how AI‑assisted causal coding can turn large volumes of interviews and reports into theory-driven or theory-free causal visuals with traceable evidence. We share workflows, accuracy checks and design choices to make the maps useful in evaluations.
Paper long abstract
Evaluators often struggle to process and communicate thick qualitative evidence quickly and convincingly. Causal mapping offers a concise visual language - outcomes, drivers and intermediate steps linked into a causal map with supporting quotes - but building reliable causal maps at scale used to require weeks of manual coding. This talk shows how AI‑assisted coding accelerates coding and synthesis while making sure that every visual element is traceable to verbatim text in context.
We present two recent evaluations:
Case A – Using causal mapping and contribution analysis for the final evaluation of a large multi-country programme (Dena). We will explain how hundreds of interview transcripts and internal reports were uploaded for causal coding with a “verifiable AI” technique, and how the causal mapping fed into the Contribution Analysis step.
We address two challenges:
How to agree on a vocabulary for the common causal elements across the programme components: what to do when the language in the Theory of Change itself is ambiguous and terminology varies across contexts? We will explain how analysts validated suggestions and managed the merging of terms (e.g., “coalition-building”/“alliance work”).
How to narrate the story of change that the maps show in an accessible way.
Case B – Making more sense of masses of Outcome Harvest data (Alastair).
This was a multi-country, multi-year project, with large amounts of outcome harvest data from 692 individual sources.
Both evaluations involved highly sensitive data, and partners were understandably concerned about automated processing procedures. In this case, we were able to secure approval even from partners who were initially hesitant, mainly through the use of automated offline anonymisation of data before further processing.
Even more than Case A, this was a very complex programme with many partners: each country had its own Theory of Change (ToC) as well as a global programme ToC, with different outcomes for different countries, programmes and donors, plus learning questions and hypotheses that the client wanted checked against the data. They were finding it hard to grasp the big picture. Causal mapping enabled them to triangulate with the other methods the team were using to evaluate the programme, and to articulate causal chains clearly at different levels. Feedback from the client was very positive.
Across both cases we will demonstrate: (1) a reproducible workflow from corpus → verifiable coding by AI → iterative refinement of labels → application of standard algorithms to answer evaluation questions → maps/tables; (2) validation; (3) supervised use of AI to create accessible text summaries of the data contained in the maps.
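To make step (1) of this workflow concrete, the sketch below shows one way AI-coded causal links could be stored with verbatim provenance and aggregated into map edges, so that every link can be opened to reveal its supporting quotes. The data structures, labels and quotes are invented for illustration; this is not the tooling used in these evaluations.

```python
# Hypothetical sketch: storing causal links with verbatim provenance,
# then aggregating them into the edges of a causal map.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CausalLink:
    cause: str       # normalised causal-factor label
    effect: str      # normalised outcome label
    quote: str       # verbatim text supporting the link
    source_id: str   # interview or report the quote came from

links = [
    CausalLink("coalition-building", "policy adoption",
               "the alliance work kept the bill moving", "interview_12"),
    CausalLink("coalition-building", "policy adoption",
               "coalition pressure got it onto the agenda", "report_03"),
]

# Each edge keeps every quote behind it, so any node or link in the
# rendered map can be traced back to its evidence in context.
edges = defaultdict(list)
for link in links:
    edges[(link.cause, link.effect)].append((link.source_id, link.quote))

for (cause, effect), evidence in edges.items():
    print(f"{cause} -> {effect}  [{len(evidence)} sources]")
    for source, quote in evidence:
        print(f"  {source}: {quote}")
```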
Why Theme 3? Because the product is not the map or the algorithm - the aim is to strengthen shared understanding. Visuals with transparent provenance (every node/link opens the quotes behind it), are intended to help communicate complex findings in a way that promotes discussion. We will close with a compact checklist of do’s and don’ts to help others pilot AI‑assisted causal mapping responsibly.
Paper short abstract
A theory of change (ToC) in Social Return on Investment (SROI) is a useful tool to identify, measure and value programme impact. Gaps exist in how ToCs have been developed. The development of a robust ToC is essential in SROI evaluation, with effective stakeholder engagement central to this.
Paper long abstract
A theory of change (ToC) is critical for understanding the relationship between the activities, inputs, outputs and outcomes of programmes. In Social Return on Investment (SROI), ToC has been advocated as a useful process to assist in the identification and subsequent measurement and valuation of programme activities, outcomes and impact. Despite its utility within SROI, gaps exist in how theories of change have been developed, as reporting tends to be brief and scant, and stakeholder engagement methods are seldom clearly documented and integrated in the reporting of SROI analyses. We used ToC in an SROI evaluation of two programmes delivered by a football foundation aimed at young people. In this study, focus group discussions and semi-structured interviews were used to collect data to inform the development of the ToC and to ensure all activities and outcomes were captured from the perspective of all stakeholders. Data collection involved four focus groups of ten participants each and fifteen interviews with delivery staff; organisation and service leads; and funders and partners. Focus groups and interviews were audio recorded and transcribed, and reflexive thematic analysis was used to develop codes and themes. The analysis identified similar themes across the two programmes, including social skills, friendship, health and wellbeing, personal development, participation, lifestyle changes and programme structure, with a few unique to each programme. Stakeholder engagement was found to enhance the process of theory of change development by prioritising the input of those involved in the participation, delivery and support of the programmes under evaluation. We conclude that the development of a robust ToC is essential in any SROI evaluation and that effective stakeholder engagement is central to this. The ToC will also act as a tool to illustrate how the programmes create change, assess their effectiveness and communicate this to stakeholders.
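As an editorial illustration of the valuation step that follows ToC development in an SROI analysis, the sketch below applies the standard adjustments described in Social Value UK guidance (deadweight, attribution, drop-off, discounting; 3.5% is the HM Treasury Green Book discount rate) before comparing social value with investment. All figures are invented and are not results from the programmes above.

```python
# Illustrative SROI arithmetic with invented numbers.
def present_value(gross_value, deadweight, attribution, drop_off, years, rate):
    """Discounted value of one outcome, net of deadweight and attribution."""
    annual = gross_value * (1 - deadweight) * (1 - attribution)
    total = 0.0
    for year in range(1, years + 1):
        total += annual / (1 + rate) ** year
        annual *= (1 - drop_off)  # benefit decays in later years
    return total

investment = 100_000  # hypothetical programme cost
outcomes = [
    # (gross value, deadweight, attribution, annual drop-off, duration in years)
    (60_000, 0.20, 0.25, 0.10, 3),  # e.g. improved wellbeing
    (40_000, 0.30, 0.30, 0.15, 2),  # e.g. increased participation
]
social_value = sum(present_value(v, dw, at, do, yrs, rate=0.035)
                   for v, dw, at, do, yrs in outcomes)
print(f"SROI ratio = {social_value / investment:.2f} : 1")
```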
Paper short abstract
This abstract features an evaluation of Barnardo’s social prescribing service (Cumbria LINK) and how it prompted key refinements in programme delivery. The findings highlight how a learning partnership approach, including relational practice and iterative feedback loops, helped refine the programme.
Paper long abstract
The Barnardo’s LINK social prescribing programme was established in the Northwest of England to address a critical gap in support for children and young people (CYP) experiencing social, emotional, and mental health difficulties (that fell below clinical thresholds). Designed within a social prescribing model, the programme positioned social support and community engagement as therapeutic mechanisms to improve wellbeing.
The programme was independently evaluated by Edge Hill University over three years, generating evidence that measured impact and refined programme delivery. The evaluation adopted a learning partnership approach (alongside a process and impact evaluation) that embedded iterative reflection between researchers and LINK practitioners. This design transformed the role of evaluation from a retrospective assessment into a live process of co-inquiry and service improvement. Through multiple methods (e.g., qualitative interviews, outcome monitoring, and development of a Theory of Change), the evaluation gathered insights into how relational practice and adaptability were key drivers of success. In turn, these insights catalysed a series of refinements to programme delivery and system integration (particularly in health, education and social care).
The evaluation’s impact was evident in how the learning partnership refined programme delivery. Evaluation findings that highlighted inconsistent referral pathways prompted the development of clearer guidance to help families better understand the offer. At the organisational level, evaluation insights informed a shift from an initial pilot to a mature, system-embedded programme, encouraging efforts to improve visibility (across systems such as social care). Findings from evaluation activities also highlighted practical challenges faced by families (e.g., difficulties with transport), which led to programme refinements aimed at making the service easier to access. In addition, the evaluation team identified key data inefficiencies within the programme’s existing monitoring systems, highlighting gaps that limited the ability to evidence outcomes. These insights re-shaped data collection processes to capture more meaningful evidence, strengthening the programme’s capacity to demonstrate robust impact to commissioners.
The evaluation process also revealed barriers that hindered the translation of evaluation into decision-making (e.g., workforce capacity pressures and constraints in funding), whilst also highlighting how an adaptive, learning partnership-based approach could overcome these hurdles. By maintaining regular dialogue between researchers and LINK practitioners, evaluation findings were mobilised in real time, promoting a culture of reflection and shared ownership of change. Ultimately, the evaluation of the programme did not simply describe what worked or outline its impact; it became a crucial mechanism for programme refinement. This case illustrates how evaluation, when relational and adaptive, can bridge the gap between evidence and action, not as an endpoint, but as an evolving process of learning.
Paper short abstract
This session presents an initiative to co-design an evidence-informed framework for evaluating sustainability interventions, drawing on a UK–Brazil project focused on building and empowering communities of evidence.
Paper long abstract
Municipalities in Brazil face persistent challenges in managing sustainability interventions due to limited resources, complex policy interdependencies, and competing political and administrative demands. These conditions frequently result in fragmented planning, inefficient implementation, and inadequate evaluation practices. Addressing such challenges requires structured evaluation frameworks that help municipalities design, implement, and assess sustainability interventions systematically, while balancing short-term constraints with long-term environmental goals.
Our project responds to this need by co-designing an evidence-informed evaluation framework (EF) that strengthens municipal capacity to integrate evidence throughout the policy cycle. The initiative focuses on sustainability interventions in solid waste management within municipalities in the State of São Paulo—one of Brazil’s 27 federal units and responsible for roughly one-third of the national GDP. Solid waste management is a critical policy area where behavioural change, stakeholder engagement, and cross-sector coordination are essential for effective and sustainable results.
The co-designed EF is grounded in behavioural change principles and in empowering communities of evidence. It recognises contextual factors shaping local decision-making and provides municipalities with practical tools to plan, implement, measure, and refine sustainability interventions. By integrating evidence use into routine processes, the framework supports improved governance, enhances transparency, and enables adaptive learning. In doing so, it contributes to advancing the Sustainable Development Goals at the municipal level.
This initiative emerges from a collaboration between the University of Portsmouth (UK) and the University of São Paulo (Brazil), with strong emphasis on knowledge mobilisation across international partners. The project also includes exchanges between researchers and municipal practitioners, enabling co-production of tools that are both context-sensitive and operationally feasible for local governments.
Through this session, we will share the framework, key components of the toolkits, and reflections from the co-design process. We will also discuss how building communities of evidence can strengthen municipal evaluation culture and contribute to more effective sustainability outcomes.
Paper short abstract
This evaluation of the Hertfordshire Suicide Prevention Pathway combined the Implementation Research Logic Model and Theoretical Domains Framework within a developmental evaluation to enable real-time adaptations and improved adoption and implementation across acute healthcare settings.
Paper long abstract
Background
We present an evaluation of the early implementation of a novel healthcare pathway and use of a developmental evaluation approach to influence pathway development and improvements in implementation.
In 2024, Hertfordshire Partnership University NHS Foundation Trust, in collaboration with Hertfordshire Mental Health, Learning Disability and Autism Health Care Partnership, launched the Hertfordshire Suicide Prevention Pathway (HSPP). Based on scientific evidence, the HSPP aims to enhance early identification, safety planning, and continuity of care across services. The HSPP was developed in response to the local and international evidence base relating to rising suicide rates and to acute healthcare settings as critical intervention points for suicide prevention.
The evaluation employed the Implementation Research Logic Model (IRLM) and Theoretical Domains Framework (TDF) to explore both organisational and behavioural determinants to understand how the pathway was adopted, adapted, and embedded across a multi-agency system.
Methods
A developmental evaluation approach was used to support real-time learning and adaptation between April 2024 and February 2025. Qualitative data were gathered through stakeholder workshops and individual conversations. Purposive sampling included clinicians and senior leaders from acute and mental health services. The IRLM guided the mapping of determinants, strategies, mechanisms, and outcomes, while the TDF informed topic guides and coding, enabling analysis of behavioural factors such as knowledge, skills, beliefs, and motivation. Thematic analysis was applied to transcripts and detailed notes.
Results
Implementation was iterative and adaptive. The IRLM helped to identify key strategies, including face-to-face and simulation-based training, e-learning modules, promotional materials, leadership-led communication and IT system improvements. The evidence-based structure of the intervention was well received, although some training content and terminology were less relevant. Workload pressures and inconsistent understanding created barriers to adoption. At the individual level, leadership at both senior and team levels acted as a key enabler, while confidence varied according to clinical experience. Emotional factors, e.g. fear of making mistakes, also influenced uptake. In terms of process, promotional activities, IT optimisation, and flexible training formats helped support engagement.
Mechanisms of change included strengthened shared language around suicide prevention, increased staff confidence following simulation training, and improved visibility of the pathway in high-risk settings. Early outcomes included increasing numbers of staff trained and referrals to the pathway, greater awareness among acute teams, and improvements in signposting to external services.
The TDF highlighted variability in staff knowledge, confidence, motivation, and emotional resilience. Whilst variability in staff understanding and engagement remained a challenge, changes to training improved confidence, and closing communication gaps and improving IT integration enhanced adoption and acceptability.
The combined IRLM-TDF analysis and regular feedback workshops to share emerging findings enabled real-time adaptations: targeted communication strategies, expanded training formats, and improved electronic system integration.
Conclusion
Early implementation of the HSPP reflects strong organisational commitment, iterative adaptation, and growing cross-sector engagement. Integrating IRLM-TDF provided a robust, theory-driven framework to identify actionable improvements and behavioural drivers. The developmental evaluation approach ensured findings were rapidly translated into implementation. This combined methodology offers a transferable model for evaluating complex, multi-agency health service innovations that require quick translation of findings into actions.
Paper short abstract
Lessons from the ICO’s impact reporting of an ambitious and fast-paced project focused on advertising cookies. We will showcase a theory-based approach that can be deployed in any situation to generate real-time actionable insights from evaluation findings as well as post-project reporting.
Paper long abstract
How do we make insights from regulatory impact measurement as appealing and easy to digest as a chocolate chip cookie? The Information Commissioner’s Office (ICO) initiated the ‘Cookie banner project’ to improve compliance in the online advertising industry, analysing the cookie banners of the 1,000 most popular websites in the UK. We, the ICO’s Impact and Evaluation team, were tasked with writing a recipe (dropping the cookie metaphor now…) for measuring and reporting on the project’s impact.
The session will walk you through our approach from collaborative theory of change design to influencing decision-makers. We will cover the lessons we learned and the tools we implemented to provide actionable insights that can be applied by any evaluation team, including:
- Combining tools to enable visual storytelling and clear reporting: where audiences have varying degrees of experience, it is important to have a varied toolkit ready for engaging with them. We will show how tools like interactive whiteboards, dashboards, and data entry platforms can be combined with an organisation’s existing board reporting arrangements to catalyse the use of insights.
- Delivering real-time insights early on to gain buy-in: winning project delivery team members over early on pays dividends when it comes to drawing on their time for evidence collection later. Our approach involved delivering small quick wins to colleagues at all levels, providing them with insights and time-saving tools to improve buy-in. This included reporting tools, data input tools, automated processes, feedback mechanisms and advice and guidance on case-making and communications. It was often more about demonstrating how common evaluation tools could also be drawn on to inform project delivery than it was about designing whole new processes and products to meet their needs.
- Theory of change (ToC) for the masses: Getting colleagues to engage with and own the project’s ToC requires involvement throughout the process. At inception, we set up design workshops using whiteboards so that colleagues could take a hands-on approach to shaping the ToC. We then layered a real-time dashboard on top of the ToC to bring the theory-based evaluation to life, demonstrating how activities, outputs and outcomes were being delivered as the project progressed. This, coupled with our organisation-wide theory of change training initiative, brought about a step-change in outcomes and impact-based decision-making for the project.
We would also love to hear from you if you have experience using any of the tools and approaches we cover during the session, or any alternatives. We welcome engagement during the Q&A and after the session so that we can learn from your experiences.
Paper short abstract
A feasibility study of Restart, a multi-agency domestic abuse programme, showed how evaluators can add value even without full recruitment by supporting system learning, strengthening programme design and improving conditions for future evaluation.
Paper long abstract
Feasibility studies rarely unfold as intended, particularly for complex domestic abuse programmes operating within dynamic multi-agency systems. This presentation uses the evaluation of Restart, delivered by the Drive Partnership and Cranstoun, to show how meaningful insights and value can still be generated even when formal study recruitment is not achieved.
Restart takes a multi-agency, whole-family approach to hold perpetrators accountable for change, prevent escalation of risk, and help (ex-)partner and child victim-survivors remain safe and together at home. It brings together professionals from Children’s Social Care, housing, and domestic abuse sector services to identify, change, and disrupt patterns of harmful behaviour at an early stage.
Our role evolved during the study from testing feasibility for a traditional impact design to acting as a learning partner, focused on understanding what was helping or hindering delivery, strengthening programme design, and identifying the conditions needed for robust future evaluation. Working closely with delivery partners, we explored where Restart sat within local systems, what was enabling or constraining implementation, and which model adaptations were needed before progressing to an outcomes-focused study. Key insights from our study included:
• Mission drift around Restart’s aims and adaptations to its model, highlighting the need to refine the Theory of Change, clarify the programme’s vision, codify its core delivery elements, and consider extending the intervention’s timeframe.
• Engagement and retention barriers, including practitioner hesitancy around thresholds, competing priorities and uncertainty about Restart’s place within local pathways. Inconsistent understanding of eligibility criteria also affected how the target cohort was communicated to referrers.
• Challenges in embedding evaluation processes within delivery, and barriers to recruiting participants to the research study.
• System-level enablers and barriers, especially the influence of strategic priorities, leadership visibility and alignment across CSC, Early Help, Housing and VAWG teams.
Rather than treating these issues as limitations, we used them as catalysts for programme and system learning. We brought together insights from multiple stakeholders to help partners understand the wider conditions shaping delivery. Our final report set out what is, and is not, currently feasible to evaluate, alongside clear recommendations for strengthening programme clarity, referral pathways, data systems and practitioner confidence in using validated tools.
While formal funding for Restart has now ended, the feasibility study has laid strong foundations for future collaboration between evaluators and practitioners. Partners identified several areas that would benefit from further development, including:
refining programme aims and mechanisms through further Theory of Change work;
strengthening eligibility and informed consent processes;
improving data monitoring systems to support future evaluability;
piloting outcomes measurement tools; and
further embedding meaningful Expert by Experience involvement in programme design and practitioner support.
This case study illustrates how evaluators can add significant value, even when recruitment to research is limited, by taking an adaptive approach that prioritises system learning, programme development and evaluability. It highlights the importance of evaluation as a developmental tool for shaping policy and practice within complex social systems.
Paper short abstract
The West Midlands Combined Authority and Sport England are investing in new Policy Officer posts to lead on the development of a Learning Evaluation and Evidence Plan to capture the impact of Sport England’s investment in whole-systems and place-based approaches.
Paper long abstract
The West Midlands Combined Authority and Sport England have taken innovative steps towards reducing physical inactivity in the West Midlands by investing in new Policy Officer - Health Inequalities (Monitoring and Evaluation) posts to lead on the development of a Learning Evaluation and Evidence Plan (LEEP) to capture the impact of Sport England’s investment in whole-systems and place-based approaches. The LEEP uses realist evaluation methods to enable places to listen to communities, understand local priorities and identify where investment is needed most.
While the Sport and Physical Activity sector has historically measured success with traditional monitoring mechanisms like programme participation numbers and budget profiles, the West Midlands LEEP pulls focus towards identifying the commonalities, strengths, weaknesses and failures that influence the conditions needed in a place to shift physical activity levels and enable sustained behaviour change in place-based contexts. This approach aligns with and supports several key UK Government initiatives including the WMCA’s West Midlands On The Move strategic framework, the Get Active strategy, and Office for Health Improvement and Disparities (OHID) objectives, which embed physical activity into public health workforce practice.
This work is currently being applied to a range of projects including the Birmingham 2022 Commonwealth Games legacy work funded via a WMCA and Sport England MoU grant and Sport England’s Place Expansion Partnerships to extend the LEEP approach in four Commonwealth Active Communities. The Commonwealth Active Communities were co-created with local people to address physical inactivity levels through a £4 million investment from Sport England into the West Midlands.
The Policy Officers are responsible for guiding local staff through evaluation and learning processes, unpicking the complexity of place-based working by uncovering how, why and for whom approaches to reducing physical inactivity are working in a place, and building a knowledge base of actionable insights that can inform future projects, making them more effective, sustainable and replicable.
This work is supported by academic partners from the National Evaluation and Learning Partnership, also funded via the WMCA and Sport England MoU grant, who validate and support the evaluation process and coach the Policy Officers through providing guidance and resources on realist evaluation. By working closely with community partners, local authorities, universities, and system stakeholders, the LEEP ensures that learning from place-based approaches inform region-wide projects, processes, and strategies.
This new approach to evaluation not only provides the Commonwealth Active Communities with dedicated time and capacity to embed learning processes in work to reduce physical inactivity; it also demonstrates to organisations like the West Midlands Combined Authority that adopting a realist evaluation approach can generate the evidence, learning and insight needed to create and embed conditions that will enable policy formation and system-wide change.
Paper short abstract
Presentation to discuss DBT's new Digital Evaluator role, which applies multidisciplinary research methods to evaluate digital interventions and enable smarter decisions. We will share lessons, best practices and case studies to showcase how the role was developed and the benefits it has achieved.
Paper long abstract
The UK public sector spends over £26bn annually on digital technology (gov.uk, 2025). While the Magenta Book underlines the importance of evaluating government interventions, the digital aspect of public service delivery has thus far been under-evaluated. Addressing this gap in best practice, the Department for Business and Trade is the forerunner in establishing a team of evaluators at the heart of digital services. This approach enables smart policy-making and embeds data-driven insights as a core value, shaping behaviours and decision-making across the organisation. Analysts in the department are embedded within digital teams, providing insightful and impactful evaluation to shape Government’s digital landscape.
Following the success of this initiative, DBT led a cross-government team, including members of the Evaluation Task Force and Government Economic and Social Research, to formally develop and launch a GDS "digital evaluator" role. Combining elements of social, statistical and economic research, the role codifies the unique skills required to effectively evaluate digital projects and tools. Compared to traditional evaluators, the digital evaluator is embedded in Agile teams to provide continuous insight that informs ongoing improvement and enables measurement of the impacts and value for money of digital tools. The digital evaluator integrates ROAMEF principles with the product cycle, as illustrated by DBT’s Digital Evaluation Strategy, enabling smart, reactive policy delivery in a fast-paced environment.
This cultural shift will be demonstrated through case studies; notable success stories include the team’s evaluation of AI tools as well as public-facing digital services. By working closely with product teams, communications and senior stakeholders, DBT’s digital evaluators have conducted comprehensive evaluations of two AI tools to understand their impacts, risks and the attitudes of colleagues. These evaluations have been crucial in enabling senior leaders to make informed decisions on the future of AI across the department.
For this presentation, we will explain how the digital evaluator works in digital teams, the key capabilities it covers and how it fosters a culture of evaluation in digital and technology settings – as well as the challenges specific to working within digital delivery. We will then cover case studies from our team, discussing how we have applied various evaluation methods to shape decisions and achieve stronger outcomes. To conclude, we will share lessons learned and practical advice for evaluators seeking to establish similar teams in their own organisation.
Paper short abstract
AI adoption in public services is growing fast. The homelessness sector needs capacity to both use these tools and evaluate them rigorously. We share early insights from two randomised trials testing predictive machine learning and generative AI interventions that aim to reduce homelessness.
Paper long abstract
The use of AI is rapidly expanding across public services, but evidence struggles to keep pace with adoption. In homelessness services, where AI tools hold promise, we risk scaling interventions that seem promising without testing their impacts. As with other interventions, AI tools should be robustly tested to understand their impact on the outcomes we care about and identify any unintended consequences.
The Centre for Homelessness Impact is conducting two complementary randomised controlled trials, one funded through MHCLG's Test & Learn programme - the first globally to invest in robust evidence of homelessness intervention impact - and one funded through the Cabinet Office’s Evaluation Accelerator Fund:
Trial 1: Predictive machine learning for upstream prevention (4 Local Authorities, ~2,000 households)
Testing whether machine learning models can identify households at risk of homelessness, and whether proactive phone calls to at-risk households reduce homelessness. Building on promising pilots, this addresses important questions about data quality, practical applications of predictive models, and scalability across local authorities with varying levels of data maturity. This trial is part of the groundbreaking £15m Test & Learn and Systems-wide Evaluation Programme, the first of its kind in the world.
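To illustrate the kind of model Trial 1 tests (the trial's actual features, data and algorithm are not described here), a minimal risk-scoring sketch on synthetic data could look like the following; every feature and coefficient is a hypothetical stand-in.

```python
# Hypothetical sketch: score households for homelessness risk and flag
# the highest-risk decile for proactive outreach. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2_000
X = np.column_stack([
    rng.poisson(1.5, n),    # months of rent arrears (invented feature)
    rng.poisson(0.3, n),    # prior homelessness presentations (invented)
    rng.integers(0, 2, n),  # housing-related benefit flag (invented)
])
# Synthetic outcome loosely driven by the features above
logit = 0.6 * X[:, 0] + 1.2 * X[:, 1] + 0.4 * X[:, 2] - 3.0
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

risk = model.predict_proba(X_test)[:, 1]
threshold = np.quantile(risk, 0.9)
# In a trial, flagged households would be randomised between
# proactive phone calls and business as usual.
print(f"{(risk >= threshold).sum()} households flagged for outreach")
```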
Trial 2: Generative AI for housing advice (Southwark Council, ~9,000 households)
Evaluating an AI chatbot providing personalised housing advice from trusted sources (Shelter, Citizens Advice, government guidance). Unlike general-purpose AI tools like ChatGPT, this chatbot is specifically designed to assess someone's housing situation, offer tailored advice, and draft letters to landlords or councils. The intervention addresses a crucial gap - people often don't seek help until crisis point and can find advice difficult to access - by proactively reaching out to at-risk households and offering 24/7 accessible guidance before households reach statutory thresholds. This trial is funded by the Cabinet Office's Evaluation Accelerator Fund.
This presentation offers methodological insights and implementation learning from setting up trials to evaluate the use of AI. We share learning on:
Embedding rigorous evaluation in fast-moving tech contexts: Pre-registration protocols, ethical oversight, and adaptive designs that balance flexibility and methodological rigour.
Navigating data governance: Practical lessons from data-sharing agreements and concerns around algorithmic decision-making across multiple partners and local authorities.
Building organisational capacity: Understanding variation in data maturity and implications for scaling data-driven approaches.
Addressing ethical dimensions: Considering questions of algorithmic fairness and consent within the context of randomised trials.
These trials show how evaluation needs to keep pace with technology, and that successful adoption requires simultaneously building technical capacity and addressing ethical concerns. Overcoming these challenges builds strong evaluation practices that can test innovations while generating robust evidence to inform decision-making.
This directly addresses exploring "fast-emerging areas such as AI and new ways of working". By sharing learning from these groundbreaking evaluations, we support evaluators and policymakers considering: How do we test AI tools? What conditions enable adoption? How do we ensure these technologies serve vulnerable populations?
In a sector where neither AI applications nor rigorous trials are yet commonplace, these evaluations are building both the capacity and acceptance needed for evidence-based innovation.
Paper short abstract
Attendees will gain insights into evaluating complex, place-based systemic approaches to reducing physical activity inequalities. The session covers innovative methods, evidence on what drives change, and how findings shaped national policy and £250m investment.
Paper long abstract
Background and aims
Sport England have, over several strategy cycles, invested in place-based systemic approaches to tackle physical activity inequalities. Place-based systemic approaches are, by nature, complex interventions. They have multiple interacting parts, are based on local characteristics and aim to influence local conditions for physical activity, as opposed to delivering programmes alone. To support the evaluation of this investment, Sport England commissioned a National Evaluation and Learning Partnership with two aims: to build capacity for evaluation and learning across “Places” and to generate evidence about what meaningfully changes states in systems towards a narrowing of physical activity inequalities, for whom, in what circumstances and why?
Method
The evaluation is developmental, participatory and longitudinal. It uses mixed methods and is underpinned by realist evaluation and a set-theoretic modelling approach, configurational comparative analysis, supported by EvalC3 software. This presentation draws on data from documents (n=48), workshops (n=24) and online evidence submissions (n=150). Evaluation outputs are orientated to support a variety of stakeholders to learn and adapt their approach in real time.
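For readers unfamiliar with the set-theoretic measures that configurational comparative analysis reports (in EvalC3 and similar tools), the sketch below computes consistency and coverage for one candidate configuration on invented case data; it illustrates the underlying logic only, not this evaluation's actual models.

```python
# Invented cases: (leadership present, funding present, outcome achieved)
cases = [
    (True,  True,  True),
    (True,  True,  True),
    (True,  False, False),
    (False, True,  False),
    (True,  True,  False),
    (False, False, False),
    (True,  True,  True),
]

# Candidate configuration: leadership AND funding
config  = [lead and fund for lead, fund, _ in cases]
outcome = [o for *_, o in cases]

overlap = sum(c and o for c, o in zip(config, outcome))
consistency = overlap / sum(config)   # of configured cases, share with outcome
coverage    = overlap / sum(outcome)  # of outcome cases, share explained

print(f"consistency = {consistency:.2f}, coverage = {coverage:.2f}")
```

High consistency suggests the configuration is close to sufficient for the outcome; high coverage suggests it accounts for most instances of it.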
Results
Findings will illuminate not just what has changed, but how change happens. We will highlight specific findings relating to the foundational work to operationalise place-based systemic approaches which has impacted on local and national decision making. This includes capacity building for place-based systemic approaches, generation and sharing of insight about underlying barriers to physical activity and organisational processes that enable or limit systemic ways of working.
Influence
The evaluation has supported the case for a further £250m investment to scale place-based systemic approaches from 12 to 90 places in England, and to formalise the way of evaluating and learning from them. More specific contributions include informing the strategy for expanding the work, including investment guidance and support to places in understanding and developing their place-based systemic approach and the capacity to do it. Within Sport England, the evaluation has influenced understanding of how change happens and therefore how to organise the delivery of place expansion as a ‘programme’ of investment, reframing accountability and performance in ways that are in line with complexity.
Paper short abstract
Demonstrated through humanitarian programmes in Sudan and Somalia, this presentation covers MEL Systems Reviews, a document-based methodology that assesses portfolio-level MEL strengths and weaknesses and supports evidence-informed decision-making for more effective programming.
Paper long abstract
The Sudan Independent Monitoring and Analysis Programme (SIMAP) and Humanitarian and Health Evaluation, Learning and Monitoring in Somalia (HHELMS) are two multi-year, whole country humanitarian portfolios of the UK’s Foreign, Commonwealth and Development Office. Oxford Policy Management, the Monitoring, Evaluation and Learning partner for these two programmes, has supported FCDO through a variety of approaches on these two programmes, including through TPM, learning strategy and events, research support, and strategic programme design. One tool used on both projects is a MEL Systems Review.
A MEL Systems Review is a document-based review of programme material. It uses a series of co-developed definitions of different MEL elements (e.g., a theory of change or a reporting strategy) and a rubric to assess the strength and integration of these different elements, as they are reflected in programme material. This process helps to identify which parts of a MEL system are strongest in a particular programme, and which could benefit from strengthening. The tool is an entry point for identifying and designing, in collaboration with implementing partners and donors, tailored technical tools to strengthen overall MEL systems.
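To illustrate how such a rubric might surface priorities (the elements and scale below are hypothetical, not OPM's actual rubric), a minimal scoring sketch:

```python
# Hypothetical rubric scoring for a MEL Systems Review.
SCALE = {0: "absent", 1: "emerging", 2: "established", 3: "integrated"}

scores = {
    "theory of change":    3,
    "indicator framework": 1,
    "reporting strategy":  2,
    "learning processes":  1,
}

# List elements weakest-first, then flag candidates for strengthening.
for element, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{element:20s} {score}/3 ({SCALE[score]})")

weakest = [e for e, s in scores.items() if s <= 1]
print("priority for strengthening:", ", ".join(weakest))
```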
Throughout our work, we have found that both implementing partners and FCDO have used evidence produced by the MEL Systems Reviews for systems strengthening, and that this evidence has influenced decisions and outcomes at both the partner and funder level. One example is around future programme design: using findings from the MEL Systems Review, an iteration of one of the programmes supported by OPM will now include more systematic learning. Another is around methodological training: FCDO has now adopted an indicator strength testing approach (produced as a by-product of MEL Systems Reviews) for FCDO internal training.
We have found that MEL Systems Reviews have been highly influential and very positively received by FCDO; so much so that a new technical workstream dedicated exclusively to acting on the findings from the MEL Systems Review has been created for one of the programmes. For partners, this work has also increased collaboration between the IPs and MEL partners, ensuring that MEL support work is co-developed and ultimately owned by implementing partners.
With both SIMAP and HHELMS operating in conflict affected states, the MEL Systems Reviews have been important in taking stock of what systems are in place at the donor and implementing partner level. They have also been important in identifying how targeted tweaks to MEL systems can have the biggest impact across monitoring, evaluation, and learning.
Our proposed session will address the factors that affect the use or non-use of evidence produced from MEL Systems Reviews. It will also speak to some of the complexities of supporting whole country portfolios of humanitarian work in fragile and conflict affected states, and how tailored tools can best support both the donor and the implementing partners.
Paper short abstract
Learning journeys embed insights into policy and practice through co-production, participatory design, and systems thinking. This session explores engagement strategies, inclusion and use of technology to turn evidence into actionable insights in multiple learning partnerships.
Paper long abstract
Embedding evaluative evidence into everyday decision-making and programme delivery requires approaches that foster reflection, ownership, and actionable insights. Drawing on the Institute of Development Studies’ experience with Accompanied Learning processes alongside partners such as FCDO, IDRC, GIZ, WFP, and AHRC, this paper explores how co-produced evidence and participatory methods create environments where evidence is valued and used. Through co-design workshops, facilitated reflection spaces, and iterative feedback loops, these approaches integrate systems thinking and user-centred design to strengthen capability within policymaking and delivery.
Our work demonstrates how Learning Journeys—structured, collaborative processes—help organisations identify key learning questions, surface tacit knowledge, and synthesise evidence for real-time decisions. Examples include participatory online workshops with the AHRC Disability-Inclusive Development Network to address power dynamics and equity in funding, and feedback-driven processes with IDRC CORE to amplify Southern-led policy voices during global crises. In contrast, FCDO’s K4D Learning Journeys generally build upon rapid evidence reviews and then use facilitated spaces to explore the practical application of the existing explicit knowledge.
Technology and digital platforms have enabled global participation, while raising ethical questions around accessibility, bias, and consent. Emerging AI tools offer new opportunities for synthesis and sensemaking, but require careful attention to transparency and equity to avoid reinforcing existing power imbalances. By working at the interface of evaluation, communication, and technology, these approaches transform technical findings into actionable insights that influence policy and practice. Ultimately, embedding evaluative thinking through participatory and co-produced methods can catalyse organisational learning, strengthen systems, and ensure diverse voices shape decisions.
Paper short abstract
Evaluating a mega-portfolio of interventions? We’ve been there. We unpack how we tackled ill-suited criteria, abstraction, data access hurdles, and non-evaluator audiences from our work evaluating cyber portfolios, sparking debate on what credible evaluation really means at the mega-portfolio scale.
Paper long abstract
We are delivering a Portfolio Monitoring, Evaluation, and Learning programme focused on a portfolio of cyber interventions. This 'portfolio' is in fact a portfolio in name only, as it houses multiple sub-portfolios, all of which have multiple programmes that house projects. This is challenging evaluatively, as our role involves portfolio-level evaluations and reviews. These pieces of work must cut across a wide range of interventions, delivery bodies, and actors, all operating at different levels of society.
Delivering useful and actionable evaluation at this level of abstraction (i.e. cross-portfolio) is difficult, and cyber and security sector programming brings acute information access restrictions. Furthermore, security-sector evaluation commissioners often have no MEL or evaluation background, and require very different evaluation products and decision-making support to translate evaluation into action.
In this session we aim to share our experience and spark discussion with others evaluating mega-portfolios or otherwise evaluating the security sector. We will focus on two evaluative reviews we recently conducted, focused on coherence and on Gender Equality and Social Inclusion (GESI). We will speak to how we created analytical frameworks and evaluative practice that were practical, defensible, and useful despite operating at a mega-portfolio level. We will also speak to how we delivered these reviews and created useful and actionable findings to overcome the barriers stated above. We hope to show how these solutions might translate into others’ contexts.
In running this session, we will outline our context, the barriers, and how we overcame them. We then wish to spark discussion with the audience on important questions facing evaluators in our position:
• Can and should government take an OECD-DAC approach to evaluation and reviews in these thematics, or when operating at a mega-portfolio level?
• How do we defensibly but flexibly assess security sector topics at a portfolio level without over-engineering new criteria that face the same problems?
• How do we define evidence, success, and credibility in these types of reviews and evaluations that operate at a mega-portfolio level? Do you think we got it right?
• How do you evaluate for non-evaluation clients given the above questions?
Relevance to the theme: this is relevant to ‘building evaluation cultures’ as our journey focuses not just on methods and approaches alone, but how evaluative practice is designed to a unique programming culture. We speak to how we created something useful and were a part of fostering a culture of commissioning, participating in, and using evaluation (as well as what went less well here).
Paper short abstract
How Natural England combined evaluation insights, futures thinking, and staff co-design to create its new Science, Evidence and Analysis Framework—embedding evidence at the heart of decisions for nature recovery. Learn what worked, what didn’t, and lessons for others.
Paper long abstract
This session will share how Natural England used evaluation insights, futures thinking, and staff co-design to create its new Science, Evidence and Analysis Framework (SEAF), a practical step toward embedding science, evidence, and evaluation at the heart of decision-making for people and nature.
Evaluation of our previous science strategy revealed significant challenges: fragmented governance, stretched capacity, and a culture that struggled to turn learning into action. At the same time, horizon scanning highlighted a future of accelerating environmental change and uncertainty, raising critical questions about how we could truly become evidence-led. By combining evaluation findings with foresight, we identified priority areas for investment and transformation.
Key elements of our approach:
• Developmental Evaluation in Action – Using “What? So what? Now what?” cycles to feed real-time insights into design decisions.
• Blending Foresight with Evidence – How horizon scanning and scenario planning shaped priorities and governance structures.
• From Theory to Tools – Translating evaluation into practical solutions like the Scientific Hive for evidence access and the Evidence Buddy network for capability building.
• Staff co-design – Involving staff as part of the process, understanding the nature of the problem and co-designing solutions.
The new SEAF provides a blueprint for how we use science and evidence to deliver nature recovery at scale through five themes: Data Science and Digital Innovations, Building Strategic Science Partnerships, Growing Scientific Capability, Science Communication and Impact, and Learning What Works.
We will share what worked well, what didn’t, and the lessons learned along the way. For other public sector organisations, this case offers a replicable approach: combining evaluation and futures thinking to design strategies that are both evidence-informed and resilient.
Paper short abstract
RAND Europe (RE) and the Education Endowment Foundation propose a joint presentation on AI in education evaluation. EEF will share their priorities for using AI in evaluation, RE will present insights from developing an AI assessment marking tool. We discuss scaling, ethics, and lessons learned.
Paper long abstract
The role of AI in the evaluation of education programmes is rapidly expanding, offering opportunities to enhance efficiency, accuracy, and insight throughout the research process. Education systems are increasingly seeking innovative solutions to help improve learning outcomes; however, those commissioning and evaluating education interventions face the challenge of providing timely, actionable evidence while maintaining methodological rigour. AI technologies, particularly Generative AI, present new possibilities for addressing these challenges, particularly around the automation of routine tasks which are required throughout the evaluation process.
We propose a joint presentation by the EEF and RE, exploring AI’s role in evaluation from two complementary perspectives: commissioning and implementation. This talk would fit well within Theme 4: Evaluation in Action, as it directly explores the considerations required to integrate AI tools into live evaluations.
EEF will share their emerging interest in the application of AI within education evaluation, highlighting strategic priorities for integrating these technologies into evidence generation. This includes considerations around cost-effectiveness, scalability, and methodological integrity. EEF will also reflect on the implications for commissioners in ensuring that AI-driven approaches align with ethical standards and maintain transparency in decision-making.
RE will present insights from their current Writing Roots evaluation, an English writing intervention commissioned by EEF. Within this evaluation, RAND Europe is piloting an innovative AI-driven tool designed to mark handwritten assessments produced by children responding to the Writing Assessment Measure prompt. Marking this type of assessment has traditionally been resource-intensive and time-consuming. The AI tool RE has developed interprets and scores handwritten assessments, and outputs a score for each script in a format that can be analysed directly. This tool aims to reduce evaluator burden while maintaining reliability and validity. We will share lessons learned from the development process, validation results, and practical challenges encountered in integrating AI into a live evaluation project. These include technical and operational issues, such as ensuring fairness and avoiding bias in automated scoring, as well as ethical considerations and the information provided to participants.
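To give a sense of how such a marking pipeline can be orchestrated (the recognition and scoring calls below are placeholders, not RAND Europe's actual tool), a hedged sketch:

```python
# Hypothetical marking pipeline: transcribe each scanned script, score it,
# and flag low-confidence scripts for human review.
import csv
from pathlib import Path

def transcribe(image_path: Path) -> str:
    """Placeholder for a handwriting-recognition model call (hypothetical)."""
    return "transcribed pupil response"  # stand-in output

def score_against_rubric(text: str) -> tuple[int, float]:
    """Placeholder returning (rubric score, model confidence) (hypothetical)."""
    return 12, 0.85  # stand-in values

def mark_scripts(script_dir: Path, out_csv: Path, review_threshold: float = 0.8) -> None:
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["script", "score", "confidence", "needs_human_review"])
        for image in sorted(script_dir.glob("*.png")):
            text = transcribe(image)
            score, confidence = score_against_rubric(text)
            # Routing low-confidence scripts to a human marker is one way to
            # protect reliability while still reducing marking burden.
            writer.writerow([image.name, score, confidence,
                             confidence < review_threshold])

mark_scripts(Path("scripts"), Path("scores.csv"))
```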
Together, EEF and RE will discuss broader implications for scaling AI-enabled approaches in educational evaluation. Key themes will include ethical considerations, such as data privacy and transparency, alongside reflections about how AI can support evaluators in delivering timely insights for policy and practice. Our presentation will also consider the future research agenda and explore questions around the evidence needed to build confidence in AI-driven evaluation methods, and how commissioners and evaluators can collaborate to ensure these tools are deployed responsibly.
Paper short abstract
The presentation demonstrates how a rigorous impact and process evaluation can generate robust evidence and drive real-time improvements. Oxford MeasurEd will present how they designed an evaluation for learning, while Right to Play will show how findings are already shaping their programme.
Paper long abstract
Enhancing Quality and Inclusive Education (EQIE) 2.0 programme is a multi-country initiative delivered by Right to Play and funded by NORAD. It aims to improve foundational literacy and socio-emotional learning (SEL) through play-based pedagogy in Ethiopia, Tanzania, Lebanon and Palestine. In Ethiopia, the programme involves in-service teacher training and training for the head teachers and District Education Officials who will support teachers to change their practice.
Right to Play have commissioned Oxford MeasurEd to deliver an independent evaluation of EQIE 2.0 in Ethiopia and to support their internal monitoring and learning throughout the five-year programme. This presentation focuses on how the evaluation design combines generating rigorous evidence and supporting real-time programme improvement.
Oxford MeasurEd will present how they have designed a robust efficacy trial to assess the programme’s impact on literacy and SEL outcomes, along with a mixed-methods process evaluation to provide timely insights into whether and how the programme is working and classroom practice is changing. This integrated design enables the evaluation to produce credible impact evidence for funders and policymakers while continuously informing programme delivery and adaptation.
Right to Play will present how baseline findings have guided adjustments in design, including refining teacher training content, adapting coaching strategies and prioritising what to monitor. They will discuss how – supported by Oxford MeasurEd – they have embedded a culture of “evidence into action”, ensuring that learning and reflection are integrated throughout the intervention. RTP will also share practical insights from the baseline that have supported adaptations to design.
The presentation will demonstrate how robust evaluation can function as both an accountability mechanism and a driver of adaptive practice. By combining a randomised trial with embedded feedback loops, EQIE 2.0 offers a model for how evaluation partnerships can generate actionable evidence, strengthen learning systems, and contribute to sustained improvements in education quality.
Paper short abstract
What can policy makers and evaluators learn from each other? An examination of six evaluations and research projects in the same sector, for the same client, demonstrating how the policy/evaluator approach can evolve over time.
Paper long abstract
A co-presented panel discussion between evaluators at Verian and the Ministry of Housing, Communities and Local Government (MHCLG), looking at policy evaluation through six collaborative evaluation and research projects on the private rented sector. The session will cover:
• How does evaluation of policies differ from evaluation of programmes or projects? Policy evaluation demands an even greater focus on long-term impact and on strategy over operations, as well as recognition of the political dimension.
• How does the evolution of evaluation and research design deliver evidence to support housing policy in the private rented sector? This will draw on six projects that Verian has delivered, or is currently delivering, for MHCLG, influencing how policy is evaluated and supporting decisions to improve implementation.
• When evaluating policy, why we need to think about evaluation as an ongoing process (integrated with monitoring) rather than a series of set-piece snapshots.
Paper short abstract
In this session, we discuss the criteria for classifying a city as “smart” and propose an integrated evaluation of smart city rankings that reveals technocentric limits and improves sustainability outcomes aligned with the UN 2030 Agenda, demonstrating this through a case study in Brazil.
Paper long abstract
Brazil presents a highly unequal urban landscape marked by deep regional disparities, heterogeneous levels of digital infrastructure, and long-standing inequities in access to public services. In this context, smart city initiatives have expanded rapidly and gained visibility through rankings that reward digitalization, innovation ecosystems and technological sophistication. However, these rankings often influence policy priorities by signaling prestige and competitiveness, even though they may not reflect the social and environmental realities of most municipalities. This creates a unique environment to investigate how evaluation frameworks can reinforce—or challenge—policy agendas in complex and unequal urban systems typical of the Global South.
The SDGs offer a comprehensive and normative framework for evaluating urban development by integrating social justice, environmental protection, economic resilience and inclusive governance. However, despite their global adoption, the extent to which SDG principles are incorporated into local policy instruments varies widely. In Brazil, many municipalities face difficulties aligning technological innovation with social and environmental priorities. SDGs related to health, education, gender equality, climate action, inequality reduction and sustainable urban development provide a robust lens to assess whether the “smartness” promoted by rankings effectively contributes to broad-based well-being. By grounding the evaluation in the SDGs, this study positions sustainability not as an optional component of urban intelligence, but as its ethical and developmental foundation.
The proposal compares indicators from the Connected Smart Cities ranking (Urban Systems, 2024) with municipal performance on all 17 SDGs using the Sustainable Development Index of Cities (IDSC). Through this evaluation perspective, we identify that being ranked as a smart city does not guarantee superior SDG performance, particularly in social SDGs such as health, education, gender equality, and inequality reduction. Several non-ranked municipalities outperform ranked ones in these domains. Only SDGs related to innovation, infrastructure and environmental management show partial alignment with smart city indicators.
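As a hedged illustration of the kind of ranked-versus-non-ranked comparison described here (this is not the study’s actual pipeline; the dataset, column names and SDG selection are assumptions for the sketch):

```python
# Illustrative sketch: do municipalities in the smart city ranking score
# higher on social SDGs? Column names ("in_csc_ranking" as a boolean flag,
# "idsc_sdg3", ...) and the merged CSV are hypothetical.
import pandas as pd
from scipy.stats import mannwhitneyu

df = pd.read_csv("municipalities.csv")  # hypothetical merged CSC + IDSC dataset

social_sdgs = ["idsc_sdg3", "idsc_sdg4", "idsc_sdg5", "idsc_sdg10"]  # health, education, gender, inequality
for col in social_sdgs:
    ranked = df.loc[df["in_csc_ranking"], col].dropna()
    unranked = df.loc[~df["in_csc_ranking"], col].dropna()
    stat, p = mannwhitneyu(ranked, unranked, alternative="two-sided")
    print(f"{col}: ranked median={ranked.median():.1f}, "
          f"non-ranked median={unranked.median():.1f}, p={p:.3f}")
```

A non-significant or negative gap on these indicators would be consistent with the finding that the “smart” label does not guarantee superior SDG performance.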
These results reveal structural weaknesses in the evaluative models used to guide public policies in Brazil. Current rankings are dominated by infrastructure and technocentric indicators, which provide an incomplete basis for policymaking. From an evaluation-use perspective, the findings highlight three barriers:
(1) misaligned incentives created by reputational rankings;
(2) contextual disparities that undermine cross-municipal comparability; and
(3) fragmentation between evaluation domains that results in weak or misleading policy signals.
Despite these barriers, the evaluation also identifies opportunities for policy improvement. By exposing inconsistencies between the “smart” label and actual SDG performance, the study supports the development of adaptive, integrative evaluation tools that align technological innovation with social and environmental goals. The proposed model, inspired by the SDG “wedding cake,” integrates urban intelligence indicators with sustainability outcomes, offering municipalities a path to revise priorities and strengthen public policy coherence.
Overall, the work argues that evaluations capable of influencing policy must go beyond technology-based performance measures and incorporate multidimensional, territorially informed perspectives that reflect the complexity of urban systems. Such approaches enable more just, sustainable and evidence-based urban policymaking.
Paper short abstract
This session shares early insights from the Refugee Employability Programme evaluation, showing how evidence is driving adaptive delivery and policy learning to improve refugee integration outcomes.
Paper long abstract
Background and aims
The Refugee Employability Programme (REP) was a major Home Office initiative supporting refugees in England to integrate and progress towards sustainable employment. The independent evaluation, led by Ipsos UK with RAND Europe and Renaisi, examined how programme design, local partnerships, and delivery models influenced employability and integration outcomes. This session shares interim findings and explores how evaluation evidence informed real-time programme learning and policy adaptation.
Methods and approach
Using mixed-methods, theory-based and quasi-experimental designs (QED), the evaluation combines process and impact analysis, fieldwork across multiple regions, and a review of monitoring data. Interviews with delivery partners, local authorities, and refugees illuminate variations in delivery, coordination mechanisms, and contextual challenges. To estimate programme impacts, the evaluation applies a QED using HMRC administrative data. This approach enables the assessment of employment entry and progression patterns over time, providing credible counterfactual evidence on the programme’s contribution to labour-market integration.
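The abstract does not name the estimator, but as a minimal sketch of one common QED of this kind, assuming a person-quarter employment panel derived from administrative records (all variable names are hypothetical), a difference-in-differences specification could look like:

```python
# Minimal difference-in-differences sketch on an assumed person-quarter panel.
# "in_employment" (0/1), "rep_participant" (treated group), "post" (after
# programme start) and "person_id" are illustrative names, not the actual data.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("employment_panel.csv")  # hypothetical extract

model = smf.ols("in_employment ~ rep_participant * post", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["person_id"]}  # cluster by person
)
print(model.params["rep_participant:post"])  # DiD estimate of programme impact
```

The actual evaluation may use matching or other counterfactual strategies; the point of the sketch is simply how administrative panel data supports a before/after, treated/comparison contrast.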
Key findings and insights
Early findings highlight that strong partnerships, co-location of employability and integration services, and flexible employer engagement underpin success. Data-sharing barriers and regional labour market differences create challenges. The evaluation demonstrates how participatory, data-informed learning loops can support adaptive implementation and continuous improvement.
Conclusions and implications / potential for impact
This evaluation has the potential to generate actionable insights that will support the development of future refugee employment programmes. By triangulating multi-region implementation findings with robust QED evidence on employment outcomes, the study can identify which delivery models, partnership arrangements and support components are most effective in enabling sustained labour-market integration. It also has the potential to illuminate system-level barriers - such as coordination challenges, caseload pressures and access to wider services - that shape programme performance, thereby providing a stronger evidence base for enhancing delivery coherence and operational resilience. In doing so, the evaluation can inform future policy decisions on scaling, targeting and commissioning, ensuring that forthcoming programmes build on what works and are better equipped to address the structural and contextual challenges faced by refugees entering the UK labour market. This session will feature partners’ reflections on how evidence informed the REP’s delivery and broader policy on refugee employability. It offers practical insights into bridging evaluation and action within complex, multi-stakeholder programmes.
Paper short abstract
This presentation uses principles of utilization-focused evaluation to examine two real-world evaluations of programmes addressing serious youth violence in England. Drawing on these cases, we identify practices that meaningfully involve stakeholders throughout all stages of the evaluation process.
Paper long abstract
While evaluations, especially impact evaluations, are now standard practice in the delivery of social programmes, many are characterised by the limited engagement of the programme actors (members of organisations or programmes that are being evaluated). Rarely, and only in specific evaluation models, such as participatory approaches, is co-production treated as fundamental to the evaluation process. Whether in impact, outcome, or process evaluations, there is minimal consideration of how programme actors actually engage with the evaluation itself, its outputs, and the implementation of findings. Yet meaningful engagement throughout the evaluation cycle signals trust in the data, relevance of insights, and practical utility - hallmarks of evaluation that creates genuine value.
This presentation critically examines two real-world evaluations of programmes addressing serious youth violence in England, exploring two fundamental questions: ‘What makes an evaluation design and its implementation truly engaging?’ and ‘What value does this engagement add to the relationship between evaluators and those involved in delivery, and ultimately to the successful implementation of evaluation outputs?’
We argue that engagement, measured through active participation in design, data collection, sense-making discussions, and uptake into organisational decision-making, should be recognised as a core indicator of evaluation success, not merely a desirable by-product. Drawing on the theoretical framework of utilization-focused evaluation, which emphasises making evaluations useful and relevant to stakeholders (Patton, 2008), and on the principles of bottom-up evaluation, we also demonstrate how such processes create opportunities to embed evaluative thinking and an evaluative mindset among diverse stakeholders, ultimately cultivating sustainable evaluation cultures that remain helpful even after the evaluation is completed.
The presentation offers practical insights into designing for an engagement-based evaluation delivery from inception, including participatory approaches to evaluation design, co-creation of evaluation questions, collaborative data collection processes, and structured mechanisms for ongoing dialogue around emerging findings, including their implementation. We also consider how evaluators navigate the tension between engaging programme stakeholders on one hand and maintaining independence on the other, examining how this balance influences evaluator-stakeholder relationships in practice. We conclude by challenging the field to expand evaluation success metrics beyond methodological rigour and timely delivery to include the quality and depth of stakeholder engagement at critical junctures.
Paper short abstract
This panel explores how embedded learning partnerships shift evaluation from rigid frameworks to adaptive, trust-based approaches — supporting participatory design, real-time decision-making, and inclusive learning cultures across donor and grantee organisations.
Paper long abstract
Evaluation is widely promoted as essential for accountability and learning. However, in practice, traditional approaches—often rigid, judgement-oriented, and externally driven—tend to reinforce a culture of compliance rather than curiosity. Many funders are now deliberately shifting towards a learning-oriented view of evaluation: seeing it not as a mechanism for judgement, but as a strategic opportunity to strengthen programmes. This shift is prompting greater demand for more embedded evaluation roles, where evaluators work as learning partners. This closer relationship helps funders and partners more meaningfully interpret evidence and make real-time, evidence-based decisions that support programme adaptation for greater impact. Such roles create the conditions for staff to engage openly with feedback and to view evaluation as a supportive, adaptive learning process rather than an accountability exercise that occurs at the end of the programme.
In response to this shift, Triple Line has been working with Porticus since 2023 as an embedded Learning Partner. This collaboration spans Porticus’ global education portfolio, engaging both programme staff and grantee partners to co-design programmes, facilitate learning processes and foster an organisational ethos of continuous learning, reflection and adaptation. The partnership aims to move all players (Porticus, Triple Line, and grantee partners) beyond compliance-driven monitoring and evaluation, to cultivate a culture of inquiry grounded in trust, curiosity and shared purpose.
Drawing on their experience, speakers from Triple Line and Porticus will share practical insights into how embedded learning practices are being integrated into day-to-day work at both programme and organisational levels. The session will explore participatory programme and MEL framework co-creation, iterative reflection tools, and strategies for embedding evidence-informed decision-making within complex systems. It will also examine the challenges of cultivating a learning culture within donor agencies and across grantee organisations—including navigating power dynamics and enabling genuine collaboration.
Importantly, the panel will include short recorded contributions from two grantee partner organisations working on the ground. These voices will highlight how embedded evaluative activities have supported their own learning and real-time decision making, and the challenges they have encountered along the way.
We argue that cultivating a learning culture demands a fundamental shift in how evaluation is conceived and practiced. Rather than focusing solely on metrics and outcomes, evaluative processes must enable collaborative sense-making, support emergent learning and be responsive to context. This approach not only strengthens programme effectiveness but also contributes to more equitable and inclusive development practice.
Paper short abstract
This presentation discusses the design of the Behaviour Hubs programme evaluation, outlining the opportunities and challenges encountered at each step. The innovative design combined Realist Evaluation and Qualitative Comparative Analysis (QCA) with a survey designed specifically for QCA analysis.
Paper long abstract
This presentation discusses the design of the Behaviour Hubs programme evaluation. The Behaviour Hubs programme was launched to support schools and Multi-Academy Trusts (MATs) in improving pupil behaviour. The programme encouraged 'lead' schools and MATs with exemplary behaviour cultures to collaborate closely with 'partner' schools seeking to improve their pupil behaviour. Its objectives were to ensure that more teachers felt supported by senior leaders in managing misbehaviour, and understood and consistently applied their school's behaviour policy, ultimately leading to fewer incidents of disruptive behaviour.
The programme, which supported over 650 schools, was built on centrally organised bespoke resources and a taskforce of behaviour advisers, delivering customised specialist training, networking events and open days, and encouraging the building of relationships between schools.
The evaluation aims were to: a) determine whether the programme had met its strategic objectives and achieved its projected outcomes for schools, staff, and pupils; b) understand how and why the intervention did (or did not) meet its objectives; and c) investigate the change mechanisms triggered by the programme that produced the observed outcomes and impacts, examining variation across different schools and respondent groups.
The combination of Realist Evaluation and Qualitative Comparative Analysis (QCA) was considered the most appropriate design because of its focus on change mechanisms and contextual variation, as well as its ability to generalise findings to medium and large numbers of cases (the survey received responses from 105 of the 650+ participating schools).
The design was innovative because while there are relatively few examples of QCA applications to large N datasets and survey data, there are almost none in evaluation. The presentation outlines the opportunities and challenges encountered, going through each step, from model specification following exploratory case study work, to the design of a bespoke QCA survey to obtain a dataset of consistently comparable cases, through to calibration, running the QCA algorithms, and interpreting and presenting the findings.
It shows the kind of causal patterns QCA is able to discover, their fit to the impact evaluation questions, and the transparency and repeatability of analysis procedures.
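As a hedged illustration of the calibration step mentioned above, assuming Ragin’s direct method of transforming raw survey scores into fuzzy-set membership (the 1–10 scale and the three anchors below are invented for the sketch, not those used in the evaluation):

```python
# Direct calibration: raw scores become fuzzy-set membership (0-1) using three
# anchors, with log-odds of +/-3 at the full membership / full non-membership
# thresholds, so the crossover point maps to membership 0.5.
import math

def calibrate(x, full_non, crossover, full_in):
    """Return fuzzy-set membership of x given three calibration anchors."""
    if x >= crossover:
        scalar = 3.0 * (x - crossover) / (full_in - crossover)
    else:
        scalar = -3.0 * (crossover - x) / (crossover - full_non)
    return math.exp(scalar) / (1 + math.exp(scalar))

# e.g. a 1-10 survey scale: 2 = fully out, 5.5 = crossover, 9 = fully in
for raw in [2, 4, 5.5, 7, 9]:
    print(raw, round(calibrate(raw, 2, 5.5, 9), 3))
```

Calibrated memberships like these are what the QCA algorithms then minimise into the configurational patterns the presentation describes.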
Paper short abstract
Systems change takes more than strategy, it needs a culture of learning, reflection and adaptation. We’ll share how we’re collectively building an embedded evaluation culture in a multi-year programme, explore what we’ve learned, and hear from partners who’ve been part of the journey.
Paper long abstract
Connected Futures is a place-based systems change programme designed to transform the journey from education to employment for young people facing exclusion and disadvantage. Tackling complex, cross-cutting challenges, the programme supports the development of locally tailored approaches to youth employment by placing young people at the heart of the system, from schools and employers to housing, health, and care.
We’re excited to share what we’ve learnt from Connected Futures. As we continue to deliver this programme, we remain acutely aware of the challenges inherent in this approach, particularly the tension between empowering local partnerships to lead learning and responding to expectations for evidence that can influence wider policy. We’ll explore these trade-offs, including navigating power dynamics, working within fixed funding structures while supporting adaptive learning, and sustaining engagement across complex systems. Partners from the programme will join us to share their experiences of integrating evaluation into their systems change activity and decision making.
We’ve cultivated an evaluation culture from the outset, adopting a developmental evaluation approach to support real-time learning in a dynamic systems environment, including rapid feedback loops and iterative sensemaking. This enabled partnerships to adapt to emergent conditions and shifting priorities. A dedicated learning partner provided strategic oversight, building capacity across local partnerships to use evidence, challenge assumptions, and strengthen coherence across diverse workstreams.
As partnerships deepened their understanding of local barriers and systemic opportunities, they began testing different approaches and looking for early signs of traction with stakeholders across the system. To support this, we commissioned embedded action researchers in each local area to facilitate reflective practice, capture emergent insights, and strengthen local learning loops.
Commissioning was intentionally collaborative and co-designed with local areas to build trust, leverage local knowledge, and foster a sense of joint ownership over learning and adaptation. Together, the learning partner and action researchers helped shift evaluation from a reporting function to a shared practice of inquiry, prioritising participation, building capacity, strengthening relationships, and driving more coherent and impactful systems change. Theories of change were co-created with local partnerships as evolving tools for sensemaking, alignment, and adaptation.
We’re still learning what it takes to embed an evaluation culture within complex local systems. This session shares what’s worked, what’s been hard, and what we’re still figuring out, with voices from those who’ve lived and shaped the work.
Paper short abstract
Participatory research exploring how services collaborate to support people facing Multiple Disadvantage, fostering reflection, shared learning, and evidence-informed decision-making across government and third-sector partners.
Paper long abstract
This presentation shares insights and lessons learned from a national research programme exploring how services can work together more effectively to support people experiencing Multiple Disadvantage. Across multiple workstreams, we delivered participatory research activities designed to foster learning, reflection, and collaborative decision-making across central government, local authorities, and third-sector partners.
Key activities and workstreams included:
• Workstream 1: Systems mapping and theory of change workshops to align stakeholders around shared goals.
• Workstream 2: Co-designing thematic research priorities and local area knowledge products to ensure evidence generation meets practical needs.
• Workstream 3: Collaborative development of beneficiary monitoring and identification approaches.
• Workstream 4: Value-for-Investment frameworks to support understanding of service outcomes.
These activities embedded reflective practice, collaborative inquiry, and the use of evidence into everyday decision-making.
The presentation will highlight:
• How participatory research fosters shared understanding and alignment across services, drawing on lessons learned from Workstreams 1–4 above.
• Practical lessons from designing and coordinating multiple interlinked workshops and engagement sessions across complex systems.
• How responsive, collaborative client relationships enable iterative learning, adaptation, and actionable insights.
By reflecting on these experiences, the session demonstrates how participatory research can strengthen multi-agency collaboration, embed learning into decision-making, and support evidence-informed approaches to complex social issues like Multiple Disadvantage.
Paper short abstract
Fast, practical workshop: build a causal map from interviews in the Causal Map app. Learn manual coding, option to try AI-assisted suggestions. Leave with an understanding of what causal mapping can offer as an approach, an ethics/quality checklist and basic knowledge of how to use the app.
Paper long abstract
Evaluators need to turn rich qualitative material into visuals to communicate findings and assist with evaluative judgements about pertinent drivers and outcomes. In this 40-minute, laptop-open workshop we guide participants through creating a credible causal map from interview excerpts using the Causal Map app, combining manual and optional AI-assisted coding. Causal Map is free to use for manual coding and public projects.
What we’ll do:
• 00–05: Why causal mapping? Quick examples of causal links and how maps support shared sense-making.
• 05–10: Interface tour and setup. Load a small, anonymised dataset (provided), skim transcripts, and discuss coding rules such as “no link without a quote”.
• 10–20: Manual first. Participants manually code 3–5 links; we merge near-synonyms.
• 20–30: Optional AI-assist. Instruct the AI to continue the coding (or continue with manual coding). We review the output, refine the instructions, regenerate suggestions, and then accept, modify, or reject individual causal links.
• 30–35: Make it communicable. Create a filtered map for a target audience to answer a specific question, add automatic narrative summaries, and export and share with clickable evidence.
• 35–40: Debrief: limits, mitigations, and next steps.
Learning outcomes: how to…
- Build a causal map that enables tracing from every node/link to the original quotes (see the sketch after this list).
- Optionally, use AI-assisted coding safely with human oversight and clear acceptance rules.
- Produce narrative vignettes and filtered causal maps which support specific research questions and evaluative judgements.
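A minimal sketch of the data structure behind these outcomes, assuming a simple link-with-quote representation (this is illustrative, not the Causal Map app’s internal model):

```python
# Every causal link carries its source quote, so the "no link without a quote"
# rule holds by construction and provenance stays traceable.
from dataclasses import dataclass

@dataclass
class CausalLink:
    cause: str
    effect: str
    quote: str       # verbatim evidence, required for every link
    source_id: str   # interview/transcript identifier

links = [
    CausalLink("training attended", "confidence", "The course made me braver...", "INT-03"),
    CausalLink("confidence", "applied for job", "...so I finally applied.", "INT-03"),
]

def filtered_map(links, focus):
    """Return only links touching a focus factor, e.g. for a target audience."""
    return [l for l in links if focus in (l.cause, l.effect)]

for link in filtered_map(links, "confidence"):
    print(f'{link.cause} -> {link.effect}  [{link.source_id}: "{link.quote}"]')
```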
Who it’s for:
Evaluators, analysts, and commissioners who work with qualitative data, including interviews and reports, and who need fast, transparent synthesis of causal links in the text. No prior mapping experience required.
What to bring / setup:
Laptop + browser. We provide sample data (anonymised) and a one-page crib sheet. Participants will also find out how to upload their own material.
Why UKES Theme 3?
The focus is communication: turning transcripts into visuals with transparent provenance supports evaluative judgements and means evaluators can answer evaluation-relevant questions in an easily communicable way, maintaining rigour and traceability.
Paper short abstract
How can we ethically and robustly evaluate domestic abuse recovery services for children? This roundtable brings together evaluators, delivery partners, and lived experience experts to explore barriers, solutions, and lessons from two pioneering UK pilot RCTs.
Paper long abstract
Domestic abuse affects one in five children in England. The consequences can be profound and enduring, from poor mental and physical wellbeing to difficulties with building healthy relationships in the future. Only 29% of parents seeking support for their child(ren) are able to access it. Robust evidence on what works to improve outcomes for children affected by domestic abuse is lacking because domestic abuse recovery services for children remain under-evaluated. Without evidence on what shifts the dial on outcomes, policymakers and funders lack the confidence to sustainably invest in services that could transform the lives of many children and young people if delivered at scale. At the same time, evaluating these services is complex and requires thoughtful consideration of the ethical concerns around some methodologies and appropriately mitigating these concerns in the evaluation design.
This roundtable will explore how impact evaluation can be done ethically and effectively in this complex policy area, drawing on pioneering randomised controlled trials (RCTs) of two recovery programmes: 1) Bounce Back 4 Kids, a trauma-informed programme for children aged 3 to 11 years and their non-abusive parents, and 2) WeMatter, an online group recovery programme for children and young people aged 8 to 17 years.
The discussion will bring together a small group of evaluators, programme facilitators, and lived experience experts across the two projects to share practical lessons and discuss key questions such as:
- How can evaluators and service providers collaborate effectively to maintain programme quality, participant wellbeing, and methodological rigour?
- What practical advice does the panel have for ensuring service providers, evaluators and those with lived experience work in genuine partnership?
- How can we balance methodological rigour with ethical concerns in evaluations involving vulnerable children and families?
- What strategies help to overcome barriers to recruitment, retention, and resource constraints?
- What role do evaluators and commissioners play in supporting service providers to build their evaluation capacity?
- How might evidence help make the case to secure sustainable funding and why is this important?
At the time of the conference, both projects will be well into the delivery of the full-scale trial, offering a unique opportunity to reflect on early learning and the transition between pilot and full-scale phases. Attendees will leave with insights into collaborative and iterative evaluation approaches, ethical design, and strategies for embedding evaluation cultures in under-evaluated policy areas. This session will demonstrate how generating evidence on what works, for whom, and in what context can shape policy and funding decisions, ensuring more children can receive the support they need.
(Note to abstract reviewers: We have not confirmed exactly who from the project teams will participate, as it has been a very busy time for them wrapping up the pilots of these evaluations this month. But there is a lot of interest across both project teams in being involved in this roundtable discussion. We are also proposing an independent academic expert to moderate the discussion, who is from neither Foundations nor any of our partnering organisations.)
Paper short abstract
Evaluation consulting in the impact sector presents a paradox, as evaluators must serve as both objective 'outsiders' and collaborative co-designers. This paper advocates for learning partnerships in education and youth services to mediate this tension.
Paper long abstract
Evaluation consulting in the social impact sector embodies a paradox. Independent evaluators, influenced in part by the culture and ethos of private management consultancies such as the 'Big Three' and in part by the need to ensure objectivity, are encouraged to assume the role of the 'other' or 'outsider' in client-consultant relationships. Paradoxically, this very distance can foster transactional dynamics that undermine the collaborative conditions necessary for meaningful evaluation of social programmes. Evaluators systematically evaluate organisational practice but rarely face scrutiny of their own methodological assumptions, positional power, or contextual understanding. This one-way accountability becomes increasingly problematic as policy demands for evidence-based practice intensify. Without critical reflection on these consulting models, we risk institutionalising transactional rather than transformative approaches to evaluation.
This presentation draws on a critical literature review (Grant and Booth, 2009) of academic and grey literature to examine prevailing evaluation consulting models in the UK social impact sector. Anchored in Gaventa's Power Cube framework (Gaventa, 2006) and Blyde's consultant-client relationship typology (Blyde, 2008), it addresses two core questions: (a) What are the limitations and systemic risks of standard evaluation consulting arrangements, particularly regarding accountability gaps and epistemic asymmetries? and (b) Can “learning partnerships” offer a transformative alternative, redistributing power, embedding mutual accountability, and prioritising organisational learning alongside evaluative judgement?
I argue that learning partnerships, characterised by transparent negotiation of evaluator positionality and explicit capacity-building commitments, can address the fundamental power imbalances inherent in traditional consulting relationships. These partnerships are especially promising in sectors where power dynamics critically shape service quality, such as education, social care, and youth services. The presentation concludes by exploring why learning partnerships remain rare, despite their theoretical appeal, and examining the structural barriers to their design and implementation. It then proposes practical suggestions for evaluation commissioners and practitioners seeking to operationalise more equitable evaluation consulting approaches.
Paper short abstract
Automating Indicator Targets vs Actuals reporting through real-time MEL Technologies enables live tracking and analysis of program performance. This approach supports timely reflection, collaborative learning, and evidence-informed decisions throughout the program cycle.
Paper long abstract
Evaluation findings often arrive too late to influence decision-making, limiting their utility for adaptive management. Manual data collection and reporting introduce delays and inconsistencies, creating a disconnect between technical findings and actionable insights that undermines evaluation goals of supporting learning and improving effectiveness.
To address this, Mercy Corps is rolling out MEL Technologies globally, including CommCare for case management and offline data collection, Microsoft Azure for cloud data engineering and storage, and Power BI for interactive analysis and visualization. A key component of this rollout is a standardized approach to automating Indicator Targets vs Actuals reporting across all Mercy Corps country offices, enabling MEL champions to design and deploy dashboards that transform raw data into timely, actionable insights. As a result, country teams are well equipped to monitor performance in real time, identify gaps, and take corrective action long before final reports and evaluations are produced.
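For readers unfamiliar with the mechanics, here is a minimal sketch of the core Targets vs Actuals computation such dashboards automate. The real pipeline runs through CommCare, Azure and Power BI; the CSV inputs, column names and 80% threshold below are illustrative assumptions only:

```python
# Aggregate monitoring actuals, join to targets, and flag off-track indicators.
import pandas as pd

monitoring = pd.read_csv("indicator_actuals.csv")   # hypothetical extract
targets = pd.read_csv("indicator_targets.csv")      # hypothetical extract

actuals = monitoring.groupby(["indicator_id", "quarter"], as_index=False)["value"].sum()
report = actuals.merge(targets, on=["indicator_id", "quarter"],
                       suffixes=("_actual", "_target"))
report["achievement_pct"] = 100 * report["value_actual"] / report["value_target"]
report["off_track"] = report["achievement_pct"] < 80  # illustrative threshold

print(report.sort_values("achievement_pct").head())  # surfaces gaps for corrective action
```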
The approach aligns with utilization-focused evaluation principles, emphasizing practical application, stakeholder engagement, and actionable evidence. Automated dashboards are designed not only to display performance metrics but also to support interpretation and reflection across diverse audiences. Through visual storytelling, interactive features, and co-created analytics, program teams and decision-makers can collectively understand trends, co-design solutions, and embed learning throughout the program cycle.
A key strength of Mercy Corps’ approach is its capacity-building pathway, which moves from basic to advanced automation techniques. Recent MEL Tech trainings (delivered in English, French, and Spanish) reached over 130 MEL champions across 30+ countries in Latin America, Africa, the Middle East, Asia, and Eastern Europe, equipping them to develop automated data engineering processes and dashboards that establish consistent Indicator Targets vs Actuals analysis, as well as to conduct country portfolio-level analysis, organizational outcome measurement, and automated participant counts. This structured training approach ensures consistency, promotes shared understanding, and enables scaling of automated MEL Tech systems across diverse programs and countries.
This session will share Mercy Corps’ experience in standardizing Indicator Targets vs Actuals automation, highlighting technical design, training approach, and lessons learned. Participants will explore how combining MEL technologies, capacity building, and utilization-focused evaluation can transform evaluation from a static reporting exercise into a dynamic, collaborative practice and turn data into actionable insights that drive real-time program improvement and learning across multiple contexts.
Paper short abstract
Recognising that many academic studies focus on a narrow band of outputs, this in-depth study uses a participatory approach to co-create and develop a new set of dimensions through which to view and evaluate collaborative capital building and value co-creation in RDI contexts.
Paper long abstract
As well as aligning to ‘Bridging the Gap: evaluation to action’, this paper also relates to the conference theme of 'Building evaluation cultures'. Through an in-depth study of a successful university-based cooperative research centre (a longitudinal study over a 10-year period conducted as a part-time PhD), this paper unpacks the lived experiences of a range of stakeholders (including academics and a diverse range of industry participants, from SMEs to large industry primes) involved in collaborative Research, Development & Innovation (RDI). The partners share a long-term goal of achieving a paradigm shift in the way pharmaceuticals are manufactured, from current batch manufacturing to more efficient and sustainable continuous manufacturing technology, systems and processes. In this collaborative RDI context, the co-production of evidence was generated using participatory and user-centred methods, supporting reflection and learning as an integral part of the evolution of the technology, products, processes and the evolving partnerships across the innovation ecosystem.
This area of advanced manufacturing is critically important to the UK economy (one of the key sectors in the UK Industrial Strategy), and the study of a successful case demonstrates how effective monitoring and evaluation has been embedded in such a way as to ensure a focus on delivering impacts from the outset. Having an agreed shared goal is considered critical to maintaining focus and driving interaction, creativity and collaboration amongst partners.
This study has highlighted a broader range of important metrics being used in the monitoring, evaluation and management of collaborative RDI, and this paper demonstrates how this approach has enabled evaluation to play a key role in everyday decision-making and delivery, contributing to a culture of entrepreneurial action and the evolution of the new technology, systems and processes.
Paper short abstract
This session explores developmental evaluation (DE) as a tool for driving systemic change at the intersection of climate, social justice, and development. Through case studies, we show how DE enables real-time learning, adaptation, and policy influence in complex, multi-stakeholder contexts.
Paper long abstract
Systemic change in climate and development programming requires adaptive, learning-oriented approaches that go beyond traditional evaluation models. This panel presentation explores developmental evaluation (DE) as a tool for influencing policy and programme change at the nexus of the climate crises, social justice and development. DE emphasizes real-time learning, iterative adaptation, and stakeholder engagement—critical elements for navigating complexity and uncertainty. Drawing on two case studies, we illustrate how DE has informed strategic shifts and strengthened resilience in diverse contexts:
• Ford Foundation’s BUILD Programme: A global initiative to enhance the institutional capacity of social justice organizations. The evaluation demonstrated how DE can support long-term systems change by embedding learning into organizational strengthening strategies.
• Climate Ambition Support Alliance (CASA): An ongoing evaluation of a multi-country programme aimed at accelerating climate ambition in vulnerable regions. Here, DE facilitates adaptive management and policy engagement in response to evolving climate and geopolitical crises.
The session will highlight practical insights on:
• How DE fosters systemic change by influencing programme design and policy dialogue.
• Lessons for applying DE in both domestic and international development contexts.
• Challenges and opportunities in integrating DE within complex, multi-stakeholder initiatives.
Participants will gain actionable strategies for leveraging evaluation as a driver of systemic change, particularly in programmes operating at the intersection of climate, crisis, and development.
Paper short abstract
How can evaluators use third-party monitoring (TPM) evidence without treating this as “just M&E”? I argue TPM deserves its own evidence space distinct from Evaluation and M&E data. My talk draws on humanitarian TPM in Myanmar to offer shared terms and practical design tips for ethical, credible use.
Paper long abstract
Access constraints, remote management, and duty-of-care risks have made third-party monitoring (TPM) a defining feature of development and humanitarian delivery in many contexts. Yet evaluators often inherit TPM datasets late in the cycle, misread verification of delivery, quality and use as routine monitoring, or discount TPM evidence because its findings, methods, and governance do not map neatly onto evaluation practice.
This presentation argues that Evaluation, Monitoring & Evaluation (M&E), and TPM overlap—but none is a subset of the others. Each is shaped by different purposes and incentives: evaluators are commissioned to make defensible claims about merit, worth, and contribution; M&E systems prioritise performance reporting; and TPM is organised around independent verification, risk management, and operational accountability. When evaluators treat TPM as “just M&E”, they miss TPM’s distinctive evidentiary value, which puts their findings at risk of either overconfidence (e.g., treating verification as impact) or unwarranted scepticism (e.g., discarding useful findings).
Crucially, any evaluator of humanitarian and development assistance will eventually confront diversion and fraud, waste and abuse (FWA). These are not rare exceptions; they are predictable risks. Rather than treating FWA as taboo or as “audit-only” topics, this talk offers practical, ethical ways to detect, test, and communicate possible diversion/FWA without turning evaluation into an investigation or putting people at risk: asking questions and making observations that identify warning signs without prompting accusations; cross-checking claims across sources (including patterns in micro-narratives); being clear about where the data came from and how it was handled; agreeing in advance what counts as a serious concern; and reporting uncertainty calmly and proportionately.
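As a hedged sketch of one cross-checking idea from this talk, consider comparing partner-reported deliveries with TPM-verified observations per site and flagging discrepancies above a pre-agreed threshold as risk signals (the site names, figures and 15% threshold are all illustrative assumptions):

```python
# Compare reported vs verified delivery counts per site; gaps above the
# pre-agreed threshold become risk signals for follow-up, not accusations.
reported = {"site_a": 1200, "site_b": 800, "site_c": 950}   # partner reports
verified = {"site_a": 1150, "site_b": 510, "site_c": 930}   # TPM spot checks

THRESHOLD = 0.15  # agreed in advance: what counts as a serious concern

for site, r in reported.items():
    v = verified[site]
    gap = (r - v) / r
    status = "RISK SIGNAL" if gap > THRESHOLD else "within tolerance"
    print(f"{site}: reported={r}, verified={v}, gap={gap:.0%} -> {status}")
```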
Using a case example of TPM of humanitarian assistance across Myanmar, I show how embracing the TPM paradigm opens pathways of inquiry that conventional evaluation designs underuse: (1) evidence about implementation fidelity, who was “reached” and who was not; (2) deliberate use of “negative evidence” (non-delivery, substitution, or obstruction) to test causal claims; and (3) fast feedback loops that can inform decisions before a final report. The case example draws upon short, structured stories from participants (micro-narratives, in the spirit of SenseMaker) to complement checklists and numbers, and to help explain context and unintended effects. Used well, these stories strengthen cross-checking across sources and help surface issues that people may not name directly.
The talk sits squarely within Theme 2: how evaluation can be embedded into everyday decision-making, learning, and delivery; what helps create environments where evidence is valued and used; and how ethical considerations and power dynamics shape whose voices are heard. I address practical safeguards for voice, safety, and bias when “independence” is contractual and access is uneven, including managing gatekeeper influence and being transparent about who collects, controls, and interprets the information.
I conclude with a shared vocabulary, e.g., verification, validation, triangulation, fidelity, reach, risk signals, and evaluative claims, to make TPM data more interpretable, more comparable across time and areas, easier to use responsibly, and better able to support evaluative reasoning.
Paper short abstract
Get guided, hands-on practice on AI tools and workflows used in UN case studies where AI achieved >90% validated accuracy. Activities include AI analysis of interviews, reports, and survey responses, as well as practice with features like AI avatar interviewers, visualizations and chatbots.
Paper long abstract
As evaluation teams face growing volumes of qualitative data, tighter timelines, and rising expectations for timely learning and use, AI is increasingly positioned as part of everyday evaluative practice. Yet many evaluators remain rightly cautious: How accurate is AI compared to human analysts? Where does it genuinely add value? And how can it be used ethically, transparently, and without reinforcing bias or hallucinations?
This interactive workshop addresses these questions through real-world UN evaluation case studies, where AI methods were systematically benchmarked against human evaluators and independently validated at >90% accuracy. Rather than focusing on theory or speculative futures, the session emphasises practical workflows, governance approaches, and hands-on application that evaluators can immediately translate into their own work.
Participants will explore three applied case studies drawn from UN evaluations:
1) AI Interview Transcript Analysis (UNHCR):
AI was used to analyse 50 qualitative interview transcripts, generating thematic, subgroup, and segment-specific insights aligned with evaluation questions. Results were benchmarked against human coding and validation processes, demonstrating how AI can support rigorous qualitative analysis while dramatically reducing time and cost.
2) AI Avatar Interviewers (UNESCO):
AI avatars were deployed to conduct 50 interviews in two days, enabling multilingual, culturally sensitive data collection at scale. This case illustrates how AI can expand reach to under-represented groups, reduce interviewer burden, and support more inclusive and adaptive evaluation designs.
3) AI Document and Survey Analysis (UNICEF):
AI analysed over 700 management responses and survey entries across 160 evaluation reports in five languages, identifying cross-cutting barriers, enablers, and patterns that would have been impractical to detect manually. The case demonstrates how AI can support synthesis, learning, and utilisation across portfolios.
Beyond showcasing results, the workshop focuses on how these outcomes were achieved responsibly. Participants will learn how human-AI benchmarking was conducted, how hallucination risks were mitigated, and how ethical safeguards, such as human-in-the-loop review, bias checks, and transparent documentation, were embedded into evaluation workflows.
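As a hedged illustration of the benchmarking step, assuming AI and human analysts coded the same excerpts (the labels and data below are invented; the case studies’ validation protocols may differ), percent agreement and Cohen’s kappa can be computed as:

```python
# Benchmark AI coding against human coding on the same items: raw agreement
# plus Cohen's kappa, which corrects for agreement expected by chance.
human = ["protection", "livelihoods", "shelter", "protection", "livelihoods"]
ai    = ["protection", "livelihoods", "protection", "protection", "livelihoods"]

agreement = sum(h == a for h, a in zip(human, ai)) / len(human)

labels = set(human) | set(ai)
p_e = sum(
    (human.count(l) / len(human)) * (ai.count(l) / len(ai))
    for l in labels
)  # chance agreement
kappa = (agreement - p_e) / (1 - p_e)
print(f"agreement={agreement:.0%}, kappa={kappa:.2f}")
```

Reporting kappa alongside raw agreement is one common way to make a “>90% validated accuracy” claim auditable.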
A core feature of the session is hands-on participation. All attendees will be provided with complimentary access to the AI tools used in the case studies and guided through live exercises. Participants will have the option to work with their own evaluation data or with provided sample interviews, reports, and survey responses. Activities include:
- Analysing qualitative data and survey responses using AI-assisted workflows
- Creating AI avatar interviewers tailored to specific evaluation contexts
- Generating visualisations and dashboards for sensemaking and communication
- Interacting with an AI chatbot to query evaluation findings
By the end of the session, participants will leave with a clear understanding of where AI meaningfully strengthens evaluation practice, how to apply it ethically, and how it can help bridge the persistent gap between evidence generation and action. The workshop directly contributes to building evaluation cultures that value learning, timeliness, inclusion, and responsible innovation, aligning with the conference theme of “Bridging the Gap: Evaluation to Action.”
Paper short abstract
This research piece reviewed the impacts of Gender Equality and Social Inclusion in Foreign, Commonwealth and Development Office programming in Somalia. Evidence was provided to inform adaptive programming in a complex context and under policy shifts of reduced Overseas Development Assistance.
Paper long abstract
The Equalities research piece provides lessons on Gender Equality and Social Inclusion (GESI) mainstreaming to inform an adaptive approach to programming within the context of reduced Overseas Development Assistance (ODA) in Somalia. Its findings and recommendations provide lessons on the use of evaluative review to adapt programmes based on evidence of what works, in the face of barriers and an uncertain political context and policy environment.
This research was delivered under the Foreign, Commonwealth and Development Office’s (FCDO) Somalia Monitoring Programme III (SMP III). SMP III builds FCDO understanding of development needs in Somalia by providing actionable learning to improve the design and delivery of programmes. UK ODA allocations to Somalia have fluctuated substantially in recent years; these shifts affect spending and programmatic activities targeted towards women and other marginalised groups. In 2026, the UK will continue to reduce the aid budget, reaching 0.3% of gross national income by 2027/28 (having reduced it from 0.7% to 0.5% in 2021). The Equalities research addresses urgent questions about who is reached by programmes and how Equality outcomes can be sustained as budgets reduce.
The Equalities piece was completed over four months. The objectives of the research were: 1) to understand the extent to which Equality was a consideration in the design of programmes, and the extent of reporting against Equity; 2) to understand how the application of the GESI Strategy advanced or maintained Equality expectations, and what lessons can be embedded into future programming; and 3) to understand the potential impact of reduced funding to FCDO programming in Somalia on Equality.
The team analysed the Equality considerations in design documents and their contextual grounding, the extent to which Programme Results Frameworks and log frames included and measured Equality indicators, data disaggregation against the nine protected characteristics, and the extent to which programme Value for Money frameworks captured Equality data. The team also conducted key informant interviews (KIIs) with key programme staff from the FCDO. The team sought to identify the Equality gains achieved, how these can be sustained after programme closure, and, crucially, how Equality is viewed in the Somali context. The outcomes of this research 1) built FCDO understanding of the current achievements of GESI mainstreaming, and 2) developed recommendations, in collaboration with the FCDO, for future programme design and adaptation in the face of reduced ODA funding.
The Equalities research offers a case study of how the SMP III team used evaluative review to generate evidence-based, actionable insights for adapting programmes. It also provides lessons on how Equalities can continue to be monitored in the face of increasingly challenging contexts and barriers to GESI programming.
Paper short abstract
This presentation demonstrates how scaling assessments enable evaluators to influence policy and programme change by assessing the viability, costs, adaptations and risks of expanding proven interventions, using practical tools and Tetra Tech case studies.
Paper long abstract
Scaling assessments are an underused but powerful approach for closing the gap between evidence and large-scale change. This presentation explains how systematic scaling assessments can influence policy and programme change by providing decision makers with clear, practical judgments about whether, how and under what conditions proven interventions can be expanded to benefit many more people.
Drawing on Tetra Tech International Development’s experience across a range of sectors, including child safety, parenting, food security, disease prevention, biodiversity preservation and water and sanitation, we will set out a pragmatic framework for assessment. Our framework examines both intrinsic features of the model and the external systems that determine whether replication or expansion is feasible. Key dimensions include credibility, observability of results, adaptability to new contexts, affordability at scale, incentives and capabilities of adopting organisations, and the policy and budget environment. The approach uses structured checklists and scoring tools to surface strengths, identify critical risks and prioritise information gaps that need to be filled before a full scale-up is attempted.
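As a hedged sketch of what such a scoring tool might look like (the dimension names follow the abstract, but the weights, 1–5 scale and risk threshold are illustrative assumptions, not Tetra Tech’s actual rubric):

```python
# Weighted scorecard over the scaling dimensions named above; low-scoring
# dimensions are surfaced as critical risks rather than hidden in the total.
WEIGHTS = {
    "credibility": 0.20,
    "observability_of_results": 0.15,
    "adaptability": 0.15,
    "affordability_at_scale": 0.20,
    "adopter_incentives_capabilities": 0.15,
    "policy_budget_environment": 0.15,
}

def assess(scores, risk_threshold=2):
    """scores: dimension -> 1 (weak) to 5 (strong). Returns total and red flags."""
    total = sum(WEIGHTS[d] * s for d, s in scores.items())
    risks = [d for d, s in scores.items() if s <= risk_threshold]
    return total, risks

total, risks = assess({
    "credibility": 4, "observability_of_results": 3, "adaptability": 2,
    "affordability_at_scale": 3, "adopter_incentives_capabilities": 2,
    "policy_budget_environment": 4,
})
print(f"weighted score {total:.1f}/5; critical risks: {risks}")
```

Keeping the red flags separate from the weighted total reflects the diagnostic, non-binary spirit of the assessments described below.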
I will show how assessment outputs can be translated into actionable guidance for policy makers and programme managers. Typical products include a concise articulation of scaling challenges, sequenced adaptations, monitoring priorities and a staged piloting plan. These outputs are deliberately diagnostic rather than prescriptive. They support governance decisions by clarifying trade-offs between fidelity and reach, and by specifying the evidence and implementation conditions required to preserve impact at scale.
The presentation will feature two or three short case studies from Tetra Tech practice. Each case will illustrate how assessments influenced decisions about organisational priorities, new delivery partners, and alternative, lower-cost delivery models. Participants will learn practical methods for integrating scaling assessments into evaluation portfolios so that evaluations move beyond measuring effect to shaping action. I will discuss timing and sequencing to ensure assessments inform policy windows, and how to present risk-balanced, politically savvy recommendations. I will also address common pitfalls, including over-reliance on pilot success without context analysis, and treating scalability as a binary judgement rather than a process.
By the end of the session attendees will be able to explain what a scaling assessment is, why it matters for influencing policy and programme change, and how to design one that delivers concise, credible advice for decision makers. In resource constrained times, funders and governments must be able to distinguish between programmes that are merely beneficial and those that can be transformative at scale. Scaling assessments are a practical evaluation tool to make that distinction and to increase the likelihood that proven interventions will be successfully expanded to produce sustained, population level impact.