Accepted Paper
Short Abstract
We present a circular optimal transport framework to validate and integrate seasonal biodiversity data across platforms. Applied to 250+ bird species, our method supports merging citizen-generated data, enabling robust, scalable cross-platform integration for global and local biodiversity efforts.
Abstract
Citizen science platforms such as iNaturalist and eBird engage distinct communities in biodiversity monitoring and generate large datasets. Differences in observer profiles, sampling intensity, and reporting behavior across platforms can introduce platform-specific biases that may undermine the consistency and validity of citizen-generated data (CGD) and hinder its integration across platforms for scientific and policy applications. To address this, we propose a statistical validation framework that assesses the compatibility and mergeability of temporal biodiversity distributions across platforms. Specifically, we develop a novel method that uses circular optimal transport to test the statistical equivalence of seasonal observation patterns within species. Applying this framework to over 250 bird species in Northern California and Nevada across two sample years (2019 and 2022), we find that the large majority (>97%) of species exhibit statistically mergeable seasonality patterns between eBird and iNaturalist, with post-hoc expert validation supporting our results. Our method provides a quantitative basis for integrating CGD sources and for flagging inconsistencies that may require further expert investigation or explanation. This validation approach supports more robust data fusion across platforms and informs best practices for large-scale biodiversity monitoring. Our findings demonstrate the potential for citizen science platforms of all sizes to contribute to broad-scale analyses and to support global and local integration efforts, such as remote sensing validation and biodiversity trend monitoring.
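To make the idea of comparing seasonal patterns on a circle concrete, the following is a minimal sketch, not the authors' implementation. It assumes observations are binned by calendar month (12 equally spaced bins on a circle, so December is adjacent to January) and computes the circular 1-Wasserstein distance between two platforms' histograms using the standard closed form for equally spaced bins. The function names `circular_w1` and `permutation_pvalue`, the monthly binning, and the label-permutation test are all illustrative assumptions. Note that a permutation test checks for a difference between platforms; a large p-value indicates only that no difference was detected, which is weaker than the equivalence test the abstract describes.

```python
import numpy as np


def circular_w1(p, q):
    """Circular 1-Wasserstein distance between two histograms defined on
    equally spaced bins around a circle. The result is in units of bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize counts to probability mass
    q = q / q.sum()
    # Cumulative mass imbalance across each edge between adjacent bins.
    d = np.cumsum(p - q)
    # On a circle the flow is defined up to a constant circulation;
    # the cost-minimizing constant is the median of d.
    return float(np.abs(d - np.median(d)).sum())


def permutation_pvalue(bins_a, bins_b, n_bins=12, n_perm=2000, seed=0):
    """Hypothetical helper: permutation test on platform labels.
    bins_a, bins_b are integer bin indices (0..n_bins-1), one per record."""
    rng = np.random.default_rng(seed)
    bins_a = np.asarray(bins_a, dtype=int)
    bins_b = np.asarray(bins_b, dtype=int)
    pooled = np.concatenate([bins_a, bins_b])
    n_a = len(bins_a)

    def hist(x):
        return np.bincount(x, minlength=n_bins)

    observed = circular_w1(hist(bins_a), hist(bins_b))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle platform labels
        if circular_w1(hist(perm[:n_a]), hist(perm[n_a:])) >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)
```

The circular distance matters for seasonality because a species observed mostly in December on one platform and mostly in January on the other is only one month apart, whereas a linear distance over the calendar year would treat the two patterns as eleven months apart.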
Validation of distributed citizen science data for integrated global use