Accepted Paper:

Never blindly trust a pipeline: coordinating communities and their tools in data processing pipelines  
Kathryne Metcalf (University of California, San Diego)

Short abstract:

Through a study of a replication debate, I examine how data processing can both enforce and invisibilize critical epistemic decisions. This project examines both open data and open source processing tools as sites for epistemic contestation, particularly across diverse forms of expertise.

Long abstract:

Critical research on scientific data processing has often focused on the human discernment involved in cleaning, labeling, and otherwise caring for data. For open datasets, data processing is understood to include the data transformations that “everyone” (all imagined research users) would agree are necessary. However, the epistemic values involved in processing decisions are often rooted in a community of practice which shares a specific understanding of its object of study, an understanding which may be distinct from that of other research communities. Here, I examine the role of data processing in human microbiome research through a recent replication debate over a high-profile study. While numerous groups have found fault with its findings, the argument hinges not on issues with the data or analytic method but on the data processing pipeline: the open dataset used in the paper was processed with a different set of assumptions than those of the researchers, whose own processing tools are alleged to have introduced (rather than identified) their key finding. Drawing on this debate, as well as interviews with bioinformatics software developers, I show that open source data processing tools can both enforce and invisibilize critical decisions, demonstrating the limits of openness without an awareness of tacit epistemic norms. I also reflect on what this means for diverse research communities who share large data resources, typically produced by investigators in the Global North.

Traditional Open Panel P095
Interrogating openness and equity in the data-centric life sciences
  Session 2, Tuesday 16 July 2024