- Convenors:
- Jamie Wong (Harvard University)
- Wanheng Hu (Cornell University)
- Discussant:
- Danah Boyd (Microsoft Research)
- Format:
- Traditional Open Panel
Short Abstract:
This panel seeks to investigate “data supply chains.” We invite contributions that help clarify the practices and techniques, and the assemblage of networks and channels – formal and informal, legal and illegal, regional and global – that enable the commodification and economization of digital data.
Long Abstract:
Data, like any commodity, do not come already commodified. While "raw data" may be an "oxymoron" (Gitelman 2013), the "rawness" of data is, nonetheless, relative to those who deal with digital data and the specific contexts where data are produced, traded, or consumed. To manufacture data as "commodity" and as "product" involves many stages, transformations, and negotiations, requiring skillful sourcing and combination of materials, quality control, and marketing. Important recent research has begun to unveil the hidden labors behind data-driven technologies and businesses (e.g. Gray and Suri 2019). Yet, little is understood about the configuration of networks and channels – formal and informal, legal and illegal – that enable the commodification and economization (Çalışkan and Callon 2009) of data. This panel seeks to further clarify the regional and global operations of “data supply chains” (Spanaki et al. 2018).
We invite papers that offer insights to ground speculative rhetorics and debates, especially those pertaining to the AI industry, about data and their value – economic or otherwise – using real-world examples. Possible perspectives include but are not limited to: What actors, practices, techniques, and technologies comprise the infrastructure necessary for data brokerage? How do data brokers make digital information fungible to be sold at set prices? What are the conventions of pricing data at different levels of “rawness,” and how do these vary between different domains and industries? What are the marketing practices and rhetorics around “valuable” data? How are understandings of data’s value construed differently along different stages of the “data supply chain”? What do real-world cases tell us about the “interoperability” of data (Ribes 2017) across different models and domains and its impact on the data market?
Accepted papers:
Session 1
Timothy Monteath (Warwick University)
Short abstract:
Identifying a company consistently across time and datasets is a highly complex task; by examining how this information becomes standardised, this research explores the ‘data supply chain’ of financial information and how ‘raw’ public information becomes ‘refined’ into a highly valuable product.
Long abstract:
Financial data is a commodity. Like all data, it does not come fully formed but must be shaped into a form that is of use and value to financial market professionals. This paper presents research on this ‘data supply chain’ by exploring how companies come to be identified in financial databases. By interrogating how proprietary company identifiers, such as BLEI (Bloomberg) and RIC (LSEG), and emergent ‘open’ standard identifiers, such as GLEI, create standardisation and how they break down, this research seeks to trace the supply chain of financial information from ‘raw’ to ‘refined’. STS researchers are no strangers to interrogating finance and its data; however, this research has primarily been concerned with questions of speed, immediacy, and performativity. By contrast, identifying a company consistently is a seemingly mundane issue. Indeed, the identity of a company is ostensibly public information. Yet, as companies merge, rebrand, expand, and develop ever more complex financial structures (often across multiple countries), identifying a single consistent company from public information can be a difficult and time-consuming task. This is a particular issue for the use of ML and AI in finance, as these models require consistent entity identification to be successfully trained. By exploring issues of standardisation and interoperability in this domain, the research seeks to speak to and contribute towards understandings of commodification – and its intersection with ‘openness’ – across data supply chains.
Tongyu Wu (Zhejiang University)
Long abstract:
Data annotation, often unseen yet vital for data-centric technologies, is the focus of this investigation (Gray and Suri, 2019). Drawing on three years of ethnographic research and 154 interviews within China's annotation industry, this study adds to the debate on data work reintermediation. It responds to the emerging trend of reintermediation, as identified by Graham's team, which emphasizes the role of intermediary institutions in balancing algorithmic control and labor, challenging previous literature favoring disintermediation and its technological advantages (Graham & Lehdonvirta, 2017).
The concept of Complementary Organizations to Algorithms (COTAs) is introduced, drawing on human-computer interaction research. Supported by economists and sociologists such as Autor (2015) and Shestakofsky (2017), it highlights a move towards complementarity rather than substitution between humans and computers. The study examines how COTAs mitigate the shortcomings of computerization and algorithms by providing organizational resources for China's data supply chain and data annotation industry. Additionally, it illustrates that local governments, NGOs, and vocational training institutions can act as COTAs. For instance, the Guizhou local government functioned as a COTA, creating mechanisms to stabilize fluctuations in demand for annotations from tech hubs and adopting a "people-optimization" approach to aid "product-optimization" in the data annotation ecosystem.
Data collection involved three years of fieldwork across seven data annotation centers in China and semi-structured interviews with stakeholders from major tech firms such as Alibaba, TikTok, Tencent, and Baidu, as well as data bureaus and exchange centers. This comprehensive method uncovers the complex interplay between human labor, technology, and the evolving data annotation landscape, highlighting COTAs' role in connecting these areas.
Sine Zambach (Copenhagen Business School)
Long abstract:
When working with data science and AI, data is a crucial resource. It is necessary to demonstrate robust results, and there is competition in both academia and industry to obtain the most and best data as quickly as possible. Negotiating the supply of data is critical to the practice of data science.
This study investigates data negotiation through the lens of 'data diplomacy'. We conducted interviews with professionals from various organizations who work with data to explore this concept. Diplomacy refers to the practice of building relationships between organizations or within an organization, with data as the central resource.
Initial findings indicate that certain data negotiation skills share similarities with traditional diplomatic virtues. It is important to establish trust and consider the concerns and feelings of other parties when transferring data. Additionally, those who possess data express concerns about potential misuse and misinterpretation. It is worth noting that strict regulations within an organization can lead to tension and improper data handling.
The ability to understand and apply these principles is a crucial skill for data workers, as data plays an increasingly central role in a wide range of fields, from industry to government.
Jessica Ogden (University of Bristol)
Long abstract:
The proliferation and commodification of new and emergent forms of data has been a key area of interest within the digital social sciences. Previous debates have focused on the ways that online platforms and technologies are implicated in the datafication of everyday life, as well as social science claims to expertise in the realm of so-called ‘big data’. Whereas studies of datafication have heavily focused on corporate-owned social media and communication platforms, this paper turns its attention to the role that large-scale open access web archives are playing in the circulation and commodification of web data. The paper conceptualises the sociotechnical significance of web archives through the lens of Thrift’s (2005) concept of ‘knowing capitalism’. The paper explores how web archives are fundamentally premised on the mass accumulation of web content over time, and outlines the ‘value chains’ that organisations (such as the Internet Archive, CommonCrawl and others) enact through the collection, maintenance and transformation of the Web into stable data archives. Example use-cases demonstrate how these archives embody and generate diverse forms of (social, cultural, economic and political) value when deployed online. This analysis enables a broader interrogation of web archives beyond repositories for web-based research data (as they are frequently framed), towards critical sites for examining both the power and future-making capabilities of historical web data. The paper concludes by mapping a research agenda for the study of web archival use to further understand these data infrastructures and their place in the digital economy.
Daniel Kryger (University of Washington, Seattle Campus)
Long abstract:
Since 2014, the UN humanitarian response in Jordan has used the Vulnerability Assessment Framework (VAF) initiative to produce data monitoring the situation in Jordan, enable comparisons with other humanitarian emergencies globally, and determine the eligibility and prioritization of refugees for assistance. Making this data, and making it interoperable across these different domains, involves an overlapping set of multinational, governmental, non-profit, and private-sector actors. Through a material-semiotic methodological approach inspired by Latour's Actor-Network Theory, I interview relevant staff and review the VAF's bureaucratic literature in order to trace this data production pipeline and how these competing priorities are navigated. My preliminary findings point to a flexible operationalization of 'uncertainty' which enables the building of consensus across various technical and political domains. This empirical work has the potential to contribute to the academic literature as a case study on data production and algorithmic governance.
Matteo Tarantino (Università Cattolica del Sacro Cuore di Milano)
Long abstract:
In an era marked by information superabundance and fragmentation, environmental communication faces both unprecedented challenges and transformative opportunities. This presentation explores how different social actors, particularly non-governmental organizations (NGOs), act as data brokers by harnessing web scraping techniques to aggregate dispersed data and pursue sustainability agendas. From the perspective of software studies, the paper investigates two cases: one regarding air quality data and the other regarding water quality data. The methodology draws on code studies, ethnography, and interviews. The first case study concerns an Italian citizen association scraping dispersed air quality data from various online sources to supplement its own DIY monitoring network. The second case study focuses on the failure of another Italian citizen initiative to aggregate water quality data through web scraping. Both cases showcase the techno-social webs that propel and limit such approaches in fragmented information landscapes. We highlight how web scraping proves instrumental – often the only solution – in providing vital aggregation of dispersed environmental data, enabling NGOs to craft targeted and impactful communication strategies and to foster public awareness and engagement. At the same time, we identify material and immaterial costs and challenges associated with web scraping, including ethical considerations, data accuracy issues, and legal implications. This new role of NGOs as environmental information brokers therefore emerges as risky, expensive, and marked by significant tradeoffs.
Akshita Sivakumar (University of California, Davis)
Long abstract:
Environmental justice and climate justice activists routinely participate in state-led processes to determine energy futures. In environmental governance, digital tools like computer models and sensors are routinely employed in consensus-based, deliberative democratic processes to ensure just outcomes. These tools coordinate the values of the state, social movements, and the market, and help anticipate and implement programs and policies. They rely on data from empirical sources, statistical models, and machine learning algorithms. STS and critical data studies have recently acknowledged the role of difference and dissensus in participatory processes through theories of agonism (Crooks and Currie 2021). However, the methods and effectiveness of how social movement actors might maintain these differences to ensure just outcomes in state-led data practices remain to be studied. This paper traces data supply chains to amplify the role of difference and dissensus in data practices for environmental governance through a close reading of California’s Low Carbon Fuel Standard (LCFS). I draw on two years of extensive fieldwork with environmental justice activists participating in California's decarbonization program, decolonial theories on worlding (Spivak 1985, de la Cadena & Blaser 2018), and the theory of agonism (Mouffe 1999). I propose a conceptual and methodological framework called Agonistic Arrangements to maintain dissensus amongst various social groups and deepen reflexivity in data supply chains. Such interventions can bridge the gap between deliberative democracy and agonism in data-intensive environmental governance practices. These findings have implications for participatory governance across domains.
Catharina Dietrich (Goethe University Frankfurt) Janine Hagemeister (Goethe University Frankfurt)
Long abstract:
Urban traffic infrastructure matters for a variety of actors: the municipal administration, private enterprises, and most numerously, citizens. Just as manifold as the relations between these actors are the traffic data they generate for different purposes at different locations, and the evolving supply chains.
This presentation is based on an ongoing research project on data politics in the proclaimed “sustainable traffic transformation” (Verkehrswende) of Frankfurt, Germany. We use the concept of data journeys (Bates et al. 2016) to trace how data is produced, processed, and shared within the complex networks at play.
Many actors occupy ambivalent roles in this entanglement: citizens are simultaneously customers, objects of datafication, data consumers, and sometimes even data manufacturers, while the municipality acts as both a key data provider and a customer. In addition, relevant legal regulations and political interests contribute to a mixture of commodification logics and practices. As a public actor within private markets, the municipality finds itself in various double binds.
We will illustrate the diversity of data trading practices with two examples. While the data purchases involved in the development of a new municipal traffic model follow traditional logics of marketization, other constellations elude those logics. In the case of pedestrian counters installed in Frankfurt, the agreements are based on an exchange of various services and mutual data provision instead of monetary payments. Attending to these diverse forms of bargaining allows us to investigate the multifaceted practices of data valorization involved along different stages of the supply chains.