- Convenor: Nuria Castell (NILU)
- Format: Panel
Short Abstract
This workshop explores how AI and machine learning can validate citizen-generated data for scientific and policy use at local, national and global levels. Participants are invited to share experiences, tools and methods aimed at increasing data credibility, uptake and impact.
Description
Citizen-generated data (CGD) has growing potential to support sustainability, inform public health, and influence policy, but only if it meets credibility standards. This workshop focuses on technical validation of CGD, with an emphasis on the use of artificial intelligence (AI) and machine learning (ML).
Drawing from the Horizon Europe projects More4Nature and CitiObs, we will present two use cases:
• In deforestation monitoring, AI and ML are used to automatically validate geotagged photographs submitted by citizens in Cambodia (a sketch of such checks appears after this list).
• In air quality monitoring, ML techniques detect outliers and calibrate sensor data, which is then used to generate validated air quality maps at the European scale (a second sketch below illustrates these two steps).
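To make the deforestation use case concrete, a hypothetical sketch of such checks: the geotag must fall inside the monitored region, and an image classifier (assumed already fine-tuned on forest/cleared imagery) must accept the photo. The model file, region polygon and threshold are illustrative assumptions, not the projects' actual pipeline:

```python
# Hypothetical sketch: validate a citizen-submitted, geotagged photo by
# (1) checking the location lies inside the monitored region and
# (2) scoring the image with an assumed pre-trained binary classifier.
import torch
from PIL import Image
from shapely.geometry import Point, Polygon
from torchvision import transforms

# Rough bounding polygon for the monitored region (illustrative only).
REGION = Polygon([(102.0, 10.0), (108.0, 10.0), (108.0, 15.0), (102.0, 15.0)])
# Hypothetical fine-tuned forest/cleared classifier saved as a full model.
model = torch.load("forest_classifier.pt", weights_only=False)
model.eval()
prep = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def validate_photo(path: str, lat: float, lon: float) -> bool:
    """Accept a submission only if its geotag is plausible and the
    classifier agrees the image shows the target land cover."""
    if not REGION.contains(Point(lon, lat)):   # geotag plausibility check
        return False
    with torch.no_grad():
        logit = model(prep(Image.open(path).convert("RGB")).unsqueeze(0))
    return logit.sigmoid().item() > 0.5        # hypothetical acceptance threshold
```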
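And a minimal sketch of the air quality steps, outlier screening followed by calibration against a co-located reference instrument; the column names, features and model choices are hypothetical stand-ins, not the projects' actual methods:

```python
# Sketch only: screen raw low-cost PM2.5 readings for outliers, then
# calibrate the cleaned signal against a co-located reference.
# Assumes a gap-free hourly DataFrame with hypothetical column names.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression

def validate_and_calibrate(df: pd.DataFrame) -> pd.DataFrame:
    """df columns: 'pm25_raw' (low-cost sensor), 'rh' (relative
    humidity), 'pm25_ref' (reference station, may contain gaps)."""
    # 1. Outlier detection on the joint signal/humidity distribution,
    #    since low-cost PM sensors are strongly humidity-sensitive.
    iso = IsolationForest(contamination=0.01, random_state=0)
    df["outlier"] = iso.fit_predict(df[["pm25_raw", "rh"]]) == -1
    ok = ~df["outlier"]
    # 2. Calibration: regress the reference on the cleaned sensor signal.
    train = df[ok].dropna(subset=["pm25_ref"])
    model = LinearRegression().fit(train[["pm25_raw", "rh"]], train["pm25_ref"])
    df.loc[ok, "pm25_cal"] = model.predict(df.loc[ok, ["pm25_raw", "rh"]])
    return df
```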
These examples show how CGD can become analysis-ready data, used for validating remote sensing products, enhancing models, and informing national and global decision-making.
The session invites participants to share their own technical approaches to CGD validation, especially those that apply automation, data fusion, or novel quality control frameworks. We aim to explore common challenges, successful strategies, and opportunities for collaboration.
The workshop will be an open discussion on making CGD scientifically robust and policy-relevant, increasing openness and harmonization in validation procedures to facilitate CGD uptake in national and global datasets such as Copernicus in-situ monitoring or Global Forest Watch.
Accepted papers
Short Abstract
This paper will present how Arter.dk, Denmark’s national biodiversity portal, integrates AI and machine learning to validate citizen-contributed species observations. We explore approaches to improve data credibility, including image recognition, data-driven validation, and expert-assisted validation.
Abstract
Citizen-generated biodiversity data holds immense promise for research, conservation and policy-making. However, its credibility depends on reliable validation processes that experts can trust. This paper explores how Arter.dk, Denmark’s national biodiversity portal, is integrating AI and machine learning to support the validation of species observations, while acknowledging the need for careful, transparent adaptation to expert workflows and for including species experts in the co-creation of new, more automated validation processes.
While image recognition has become a common entry point for AI in biodiversity monitoring, we argue that AI must go beyond image classification to address challenges such as spatial anomalies and seasonal inconsistencies. At Arter.dk, we are experimenting with data-driven and AI-based methods to detect outliers in location, time and other dimensions, both at the time of observation and during the validation process.
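As an illustration only (not Arter.dk's production code), such checks might look like the following sketch, where the history arrays and thresholds are hypothetical:

```python
# Illustrative sketch: flag a new species observation whose location or
# date is unusual relative to that species' accepted records.
import numpy as np

def flag_observation(lat, lon, day_of_year, history):
    """history: dict of NumPy arrays 'lat', 'lon', 'doy' for records
    of the same species that have already passed validation."""
    flags = []
    # Spatial check: distance (degrees, as a rough proxy) from the
    # centroid of prior records, scaled by their spread.
    d = np.hypot(lat - history["lat"].mean(), lon - history["lon"].mean())
    spread = np.hypot(history["lat"].std(), history["lon"].std())
    if d > 3 * max(spread, 0.1):
        flags.append("spatial_outlier")
    # Seasonal check: circular day-of-year distance to prior records.
    delta = np.abs(day_of_year - history["doy"])
    circ = np.minimum(delta, 365 - delta)
    if np.mean(circ <= 21) < 0.05:  # <5% of records within three weeks
        flags.append("seasonal_outlier")
    return flags or ["pass"]
```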
The adoption of these tools is not purely technical: it is also cultural. Experts need to understand, trust, and gradually integrate AI and data-driven outputs into their validation routines. We discuss how co-designing the workflow and allowing human-in-the-loop validation are essential steps toward building confidence in automated systems.
This paper contributes to the workshop’s goal of making citizen-generated data scientifically robust and policy-relevant. By sharing lessons from the Danish biodiversity portal, we invite discussion on how to harmonize validation standards across platforms.
Short Abstract
Citizen science biodiversity data are vital but unevenly distributed across time. Using records of six Iberian tree species, we show observations increase on weekends and mild spring days, while extreme weather reduces activity. Recognizing these patterns improves ecological research and monitoring.
Abstract
Citizen science biodiversity data have become increasingly important for ecological research, conservation, and long-term monitoring. However, the flow of records is not uniform: some days accumulate many observations, while others contribute very few. This temporal variability, influenced by environmental and social factors, can introduce biases that complicate the use of citizen science data for studying population trends. In this study, we investigated which factors influence the number of observations submitted by citizen scientists throughout the year. We focused on six tree species native to the Iberian Peninsula that maintain a relatively constant appearance across seasons, thus minimizing phenological cues as drivers of recording effort. Observation data were obtained from the iNaturalist/BioDiversity4All platform. We then examined how variables such as day of the week, month, public holidays, temperature, rainfall, wind, and snow affect recording activity. Our results show clear patterns: citizen scientists are more active on weekends, during spring, and under mild weather conditions. In contrast, very hot or cold days, as well as days with heavy rain or strong winds, are associated with fewer records. Public holidays and snowfall appear to exert little influence. By identifying these patterns, we can better account for potential biases and enhance the reliability of citizen science datasets. Recognizing that recording activity follows predictable social and seasonal dynamics allows for more accurate interpretation and application of citizen-generated biodiversity data in ecological research and monitoring programs.
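As a sketch of the kind of model behind such an analysis (the paper's exact specification may differ), a Poisson regression of daily record counts on calendar and weather covariates, run here on synthetic data:

```python
# Sketch: Poisson GLM of daily observation counts. Weekend and
# mild-temperature effects enter positively, rain negatively; a
# quadratic temperature term captures "mild weather is best".
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy stand-in for the real dataset: one row per day.
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", "2023-12-31", freq="D")
daily = pd.DataFrame({
    "weekday": dates.dayofweek,
    "month": dates.month,
    "temp_c": 15 + 10 * np.sin(2 * np.pi * (dates.dayofyear - 100) / 365),
    "rain_mm": rng.exponential(2, len(dates)),
})
rate = np.exp(0.3 * (daily["weekday"] >= 5) - 0.05 * daily["rain_mm"]
              - 0.002 * (daily["temp_c"] - 20) ** 2 + 2)
daily["n_obs"] = rng.poisson(rate)

model = smf.glm("n_obs ~ C(weekday) + C(month) + temp_c + I(temp_c**2) + rain_mm",
                data=daily, family=sm.families.Poisson()).fit()
print(model.summary())
```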
Short Abstract
We present a circular optimal transport framework to validate and integrate seasonal biodiversity data across platforms. Applied to 250+ bird species, our method enables the merging of citizen-generated data, allowing robust, scalable cross-platform integration for global and local biodiversity efforts.
Abstract
Citizen science platforms like iNaturalist and eBird engage distinct communities in biodiversity monitoring, generating large biodiversity datasets. Differences in observer profiles, sampling intensity, and reporting behavior across platforms can lead to platform-specific biases, which may challenge the consistency and validity of citizen-generated data (CGD) and hinder its integration across platforms for scientific and policy applications. To address this, we propose a statistical validation framework that assesses the compatibility and mergeability of temporal biodiversity distributions across platforms. Specifically, we develop a novel method using circular optimal transport to test the statistical equivalence of seasonal observation patterns within species. Applying this framework to over 250 bird species in Northern California and Nevada across two sample years (2019 and 2022), we find that the large majority (>97%) of species exhibit statistically mergeable seasonality patterns between eBird and iNaturalist, with post-hoc expert validation supporting our results. Our method provides a quantitative basis for integrating CGD sources and flagging inconsistencies that may require further expert investigation or explanation. This validation approach supports more robust data fusion across platforms and informs best practices for large-scale biodiversity monitoring. Our findings demonstrate the potential for citizen science platforms of all sizes to contribute to broad-scale analyses and support global and local integration efforts, such as remote sensing validation and biodiversity trend monitoring.
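The equivalence test built on this distance is the paper's contribution and is not reproduced here; the sketch below only computes the underlying circular 1-Wasserstein distance between two day-of-year histograms, using the known closed form for one-dimensional distributions on a circle:

```python
# Sketch: circular 1-Wasserstein distance between two seasonal
# (day-of-year) histograms. On a circle, W1 equals the integral of the
# absolute CDF difference after subtracting its median (the optimal
# rotation offset).
import numpy as np

def circular_w1(doy_a, doy_b, n_bins=365):
    """doy_a, doy_b: arrays of day-of-year observations (1..365)."""
    bins = np.arange(n_bins + 1)
    p, _ = np.histogram(np.asarray(doy_a) - 1, bins=bins)
    q, _ = np.histogram(np.asarray(doy_b) - 1, bins=bins)
    p = p / p.sum()
    q = q / q.sum()
    diff = np.cumsum(p - q)               # CDF difference around the circle
    diff -= np.median(diff)               # optimal rotation offset
    return np.sum(np.abs(diff)) / n_bins  # distance as a fraction of the year

# Example: two samples of the same bimodal (spring/autumn) pattern
# should yield a small distance.
rng = np.random.default_rng(1)
a = rng.choice([100, 280], 500) + rng.normal(0, 10, 500)
b = rng.choice([100, 280], 500) + rng.normal(0, 10, 500)
print(circular_w1(a % 365 + 1, b % 365 + 1))
```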
Short Abstract
We introduce FILTER, a collection of algorithms and recommendations for quality control and correction of citizen-collected environmental data. It increases the reliability of measurements obtained from stationary and mobile low-cost sensors operating under different environmental conditions.
Abstract
The growing adoption of low-cost sensors (LCSs) has opened new opportunities for participatory air quality and environmental noise monitoring. However, the resulting data streams often vary in quality due to differences in device performance, environmental conditions and deployment settings.
To address these challenges, we present FILTER (Framework for Improving Low-cost Technology Effectiveness and Reliability), a collection of algorithms for quality control and correction of citizen-collected environmental data using advanced statistical and machine learning methods. FILTER statistically evaluates these datasets and assigns both overall and individual quality flags, providing an additional measure of data reliability and trustworthiness. Originally designed for stationary PM2.5 measurements (Hassani et al., 2025), FILTER has been expanded to include modules for mobile and wearable LCSs (m-FILTER) as well as environmental noise measurements (n-FILTER).
All FILTER versions follow a common processing pipeline. The initial steps apply basic statistical tests that check the physical consistency of the collected measurements and their temporal stability, and identify potential outliers based on both historical trends and spatial comparison with neighboring LCSs. More advanced steps evaluate the relative and absolute performance of LCSs against higher-quality instruments and reference stations. All algorithms are designed to be user- and case-specific, allowing easy adaptation to diverse monitoring contexts and study objectives.
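As an illustration of that flag-assigning pattern, not the FILTER code itself (see Hassani et al., 2025 for the actual algorithms), a sketch with hypothetical thresholds and column names:

```python
# Sketch: assign individual and overall quality flags to one sensor's
# hourly series using range, stuck-value, historical and spatial tests.
import pandas as pd

def basic_qc_flags(sensor: pd.Series, neighbors: pd.DataFrame) -> pd.DataFrame:
    """sensor: hourly PM2.5 from one LCS; neighbors: same time index,
    one column per nearby LCS."""
    flags = pd.DataFrame(index=sensor.index)
    # Physical consistency: values outside a plausible range.
    flags["range"] = ~sensor.between(0, 1000)
    # Temporal stability: constant ("stuck") readings over six hours.
    flags["stuck"] = sensor.rolling(6).std() == 0
    # Historical outliers: robust z-score against the sensor's own past week.
    z = (sensor - sensor.rolling(24 * 7).median()) / (
        sensor.rolling(24 * 7).std() + 1e-9)
    flags["temporal_outlier"] = z.abs() > 4
    # Spatial comparison: disagreement with the neighbor median.
    spread = neighbors.std(axis=1) + 1e-9
    flags["spatial_outlier"] = (
        (sensor - neighbors.median(axis=1)).abs() / spread > 3)
    # Overall flag: any individual test failing marks the hour suspect.
    flags["overall"] = flags.any(axis=1)
    return flags
```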
Short Abstract
Deep Time’s human-in-the-loop model combines spatial AI and citizen mapping to validate 70,000 habitat polygons across 5000 km² of UK landscape. Citizen–AI fusion achieved 88% accuracy against machine-learning datasets, showing that people remain essential validators in AI-driven environmental science.
Abstract
Deep Time shows how human-in-the-loop AI can transform citizen-generated data into scientifically robust, policy-ready evidence. Developed by DigVentures, the platform integrates spatial AI baselines with collective human interpretation, enabling citizens to refine and validate Earth Observation data across 5,300 km² of UK landscape.
Machine-learning habitat maps from Living England provided the initial training layer; citizens then improved these outputs through Deep Time’s participatory GIS and online learning system. Each citizen-drawn polygon was automatically cross-checked against AI predictions and scored for fidelity, accuracy, completeness, and recency via a live QA dashboard. Results showed 88% concordance with ML outputs, and 60% of grids surpassed machine-only accuracy in complex habitats such as peatlands and coastal zones.
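Deep Time's fidelity and accuracy rubric is its own; one common way to quantify polygon concordance, shown here only as a sketch, is intersection-over-union (IoU):

```python
# Sketch: concordance between a citizen-drawn polygon and an
# AI-predicted polygon as intersection-over-union (1.0 = identical).
from shapely.geometry import Polygon

def iou(citizen: Polygon, predicted: Polygon) -> float:
    """Area of overlap divided by area of union."""
    if not citizen.intersects(predicted):
        return 0.0
    return citizen.intersection(predicted).area / citizen.union(predicted).area

# Example: two 10x10 squares offset by one unit overlap at IoU ~ 0.68.
a = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
b = Polygon([(1, 1), (11, 1), (11, 11), (1, 11)])
print(round(iou(a, b), 2))
```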
This human-in-the-loop approach closes known AI gaps—limited training data, weak contextual reasoning, and low trust—by embedding citizens as co-creators of validated datasets. Mission Leaders and partner ecologists oversee tiered review cycles, creating analysis-ready outputs for Natural England, National Landscapes, and Wildlife Trusts. By combining machine efficiency with human judgement and local knowledge, Deep Time builds datasets that are both technically credible and socially legitimate.
The project demonstrates that AI and citizen science are not competing paradigms but complementary systems. Linked through a human-centred validation loop, they jointly enhance data quality, equity, and impact—offering a replicable framework for distributed environmental monitoring aligned with European data-validation standards.
https://digventures.com/projects/deep-time/