- Convenors:
- Jamie Wong (Harvard University)
- Wanheng Hu (Cornell University)
- Discussant:
- Danah Boyd (Microsoft Research)
- Format:
- Traditional Open Panel
Short Abstract:
This panel seeks to investigate “data supply chains.” We invite contributions that help clarify the practices and techniques, and the assemblage of networks and channels – formal and informal, legal and illegal, regional and global – that enable the commodification and economization of digital data.
Long Abstract:
Data, like any commodity, do not come already commodified. While "raw data" may be an "oxymoron" (Gitelman 2013), the "rawness" of data is, nonetheless, relative to those who deal with digital data and the specific contexts where data are produced, traded, or consumed. To manufacture data as "commodity" and as "product" involves many stages, transformations, and negotiations, requiring skillful sourcing and combination of materials, quality control, and marketing. Important recent research has begun to unveil the hidden labors behind data-driven technologies and businesses (e.g. Gray and Suri 2019). Yet, little is understood about the configuration of networks and channels – formal and informal, legal and illegal – that enable the commodification and economization (Çalışkan and Callon 2009) of data. This panel seeks to further clarify the regional and global operations of “data supply chains” (Spanaki et al. 2018).
We invite papers that offer insights to ground speculative rhetorics and debates, especially those pertaining to the AI industry, about data and their value – economic or otherwise – using real-world examples. Possible perspectives include but are not limited to: What actors, practices, techniques, and technologies comprise the infrastructure necessary for data brokerage? How do data brokers make digital information fungible to be sold at set prices? What are the conventions of pricing data at different levels of “rawness,” and how do these vary between different domains and industries? What are the marketing practices and rhetorics around “valuable” data? How are understandings of data’s value construed differently along different stages of the “data supply chain”? What do real-world cases tell us about the “interoperability” of data (Ribes 2017) across different models and domains and its impact on the data market?
Accepted papers:
Session 1
Timothy Monteath (Warwick University)
Short abstract:
Identifying a company consistently across time and datasets is a highly complex task; by examining how this information becomes standardised, this research explores the ‘data supply chain’ of financial information and how ‘raw’ public information becomes ‘refined’ into a highly valuable product.
Long abstract:
Financial data is a commodity. Like all data, it does not come fully formed but must be shaped into a form that is of use and value to financial market professionals. This paper presents research on this ‘data supply chain’ by exploring how companies come to be identified in financial databases. By interrogating how proprietary company identifiers, such as BLEI (Bloomberg) and RIC (LSEG), and emergent ‘open’ standard identifiers, such as GLEI, create standardisation and how they break down, this research seeks to trace the supply chain of financial information from ‘raw’ to ‘refined’. STS researchers are no strangers to interrogating finance and its data; however, this research has primarily been concerned with questions of speed, immediacy, and performativity. By contrast, identifying a company consistently is a seemingly mundane issue. Indeed, the identity of a company is ostensibly public information. Yet, as companies merge, rebrand, expand, and develop ever more complex financial structures (often across multiple countries), identifying a single consistent company from public information can be a difficult and time-consuming task. This is a particular issue for the use of ML and AI in finance, as these models require consistent entity identification to be successfully trained. By exploring issues of standardisation and interoperability in this domain, the research seeks to speak to and contribute towards understandings of commodification – and its intersection with ‘openness’ – across data supply chains.
Tongyu Wu (Zhejiang University)
Long abstract:
Data annotation, often unseen yet vital for data-centric technologies, is the focus of this investigation (Gray and Suri, 2019). Drawing on three years of ethnographic research and 154 interviews within China's annotation industry, this study adds to the debate on data work reintermediation. It responds to the emerging trend of reintermediation, as identified by Graham's team, which emphasizes the role of intermediary institutions in balancing algorithmic control and labor, challenging previous literature favoring disintermediation and its technological advantages (Graham & Lehdonvirta, 2017).
The concept of Complementary Organizations to Algorithms (COTAs) is introduced, drawing on human-computer interaction research. Supported by economists and sociologists such as Autor (2015) and Shestakofsky (2017), it highlights a move towards complementarity rather than substitution between humans and computers. The study examines how COTAs mitigate the shortcomings of computerization and algorithms by providing organizational resources for China's data supply chain and data annotation industry. Additionally, it illustrates that local governments, NGOs, and vocational training institutions can act as COTAs. For instance, the Guizhou local government functioned as a COTA, creating mechanisms to stabilize fluctuations in demand for annotations from tech hubs and adopting a "people-optimization" approach to aid "product-optimization" in the data annotation ecosystem.
Data collection involved three years of fieldwork across seven data annotation centers in China and semi-structured interviews with stakeholders from major tech firms such as Alibaba, TikTok, Tencent, and Baidu, as well as data bureaus and exchange centers. This comprehensive method uncovers the complex interplay between human labor, technology, and the evolving data annotation landscape, highlighting COTAs' role in connecting these areas.
Sine Zambach (Copenhagen Business School)
Long abstract:
When working with data science and AI, data is a crucial resource. It is necessary to demonstrate robust results, and there is competition in both academia and industry to obtain the most and best data as quickly as possible. Negotiating the supply of data is critical to the practice of data science.
This study investigates data negotiation through the lens of 'data diplomacy'. We conducted interviews with professionals from various organizations who work with data to explore this concept. Diplomacy refers to the practice of building relationships between organizations or within an organization, with data as the central resource.
Initial findings indicate that certain data negotiation skills share similarities with traditional diplomatic virtues. It is important to establish trust and consider the concerns and feelings of other parties when transferring data. Additionally, those who possess data express concerns about potential misuse and misinterpretation. It is worth noting that strict regulations within an organization can lead to tension and improper data handling.
The ability to understand and apply these principles is a crucial skill for data workers, as data plays an increasingly central role in a wide range of fields, from industry to government.
Jessica Ogden (University of Bristol)
Long abstract:
The proliferation and commodification of new and emergent forms of data has been a key area of interest within the digital social sciences. Previous debates have focused on the ways that online platforms and technologies are implicated in the datafication of everyday life, as well as social science claims to expertise in the realm of so-called ‘big data’. Whereas studies of datafication have heavily focused on corporate-owned social media and communication platforms, this paper turns its attention to the role that large-scale open access web archives are playing in the circulation and commodification of web data. The paper conceptualises the sociotechnical significance of web archives through the lens of Thrift’s (2005) concept of ‘knowing capitalism’. The paper explores how web archives are fundamentally premised on the mass accumulation of web content over time, and outlines the ‘value chains’ that organisations (such as the Internet Archive, CommonCrawl and others) enact through the collection, maintenance and transformation of the Web into stable data archives. Example use-cases demonstrate how these archives embody and generate diverse forms of (social, cultural, economic and political) value when deployed online. This analysis enables a broader interrogation of web archives beyond repositories for web-based research data (as they are frequently framed), towards critical sites for examining both the power and future-making capabilities of historical web data. The paper concludes by mapping a research agenda for the study of web archival use to further understand these data infrastructures and their place in the digital economy.
Daniel Kryger (University of Washington, Seattle Campus)
Long abstract:
Since 2014, the UN humanitarian response in Jordan has used the Vulnerability Assessment Framework (VAF) initiative to produce data monitoring the situation in Jordan, enable comparisons with other humanitarian emergencies globally, and determine the eligibility and prioritization of refugees for assistance. Making this data, and making it interoperable across these different domains, involves an overlapping set of multinational, governmental, non-profit, and private-sector actors. Through a material-semiotic methodological approach inspired by Latour's Actor-Network Theory, I interview relevant staff and review the VAF's bureaucratic literature in order to trace this data production pipeline and how these competing priorities are navigated. My preliminary findings point to a flexible operationalization of 'uncertainty' which enables the building of consensus across various technical and political domains. This empirical work has the potential to contribute to the academic literature as a case study on data production and algorithmic governance.
Matteo Tarantino (Università Cattolica del Sacro Cuore di Milano)
Long abstract:
In an era marked by information superabundance and fragmentation, environmental communication faces both unprecedented challenges and transformative opportunities. This presentation explores how different social actors, particularly non-governmental organizations (NGOs), act as data brokers by harnessing web scraping techniques to aggregate dispersed data and pursue sustainability agendas. From the perspective of software studies, the paper investigates two cases: one regarding air quality data and the other regarding water quality data. The methodology draws on code studies, ethnography, and interviews. The first case study concerns an Italian citizen association scraping dispersed air quality data from various online sources to supplement its own DIY monitoring network. The second case study focuses on the failure of another Italian citizen initiative to aggregate water quality data through web scraping. Both cases showcase the techno-social webs that propel and limit such approaches in fragmented information landscapes. We highlight how web scraping proves instrumental – often the only solution – in providing vital aggregation of dispersed environmental data, enabling NGOs to craft targeted and impactful communication strategies and to foster public awareness and engagement. At the same time, we identify material and immaterial costs and challenges associated with web scraping, including ethical considerations, data accuracy issues, and legal implications. This new role of NGOs as environmental information brokers therefore emerges as risky, expensive, and marked by significant tradeoffs.
Akshita Sivakumar (University of California, Davis)
Long abstract:
Environmental justice and climate justice activists routinely participate in state-led processes to determine energy futures. In environmental governance, digital tools like computer models and sensors are routinely employed in consensus-based, deliberative democratic processes to ensure just outcomes. These tools coordinate the values of the state, social movements, and the market, and help anticipate and implement programs and policies. They rely on data from empirical sources, statistical models, and machine learning algorithms. STS and critical data studies have recently acknowledged the role of difference and dissensus in participatory processes through theories of agonism (Crooks and Currie 2021). However, the methods and effectiveness of how social movement actors might maintain these differences to ensure just outcomes in state-led data practices remain to be studied. This paper traces data supply chains to amplify the role of difference and dissensus in data practices for environmental governance through a close reading of California’s Low Carbon Fuel Standard (LCFS). I draw on two years of extensive fieldwork with environmental justice activists participating in California's decarbonization program, decolonial theories on worlding (Spivak 1985, de la Cadena & Blaser 2018), and the theory of agonism (Mouffe 1999). I propose a conceptual and methodological framework called Agonistic Arrangements to maintain dissensus amongst various social groups and deepen reflexivity in data supply chains. Such interventions can bridge the gap between deliberative democracy and agonism in data-intensive environmental governance practices. These findings have implications for participatory governance across domains.
Catharina Dietrich (Goethe University Frankfurt) Janine Hagemeister (Goethe University Frankfurt)
Long abstract:
Urban traffic infrastructure matters for a variety of actors: the municipal administration, private enterprises, and most numerously, citizens. Just as manifold as the relations between these actors are the traffic data they generate for different purposes at different locations, and the evolving supply chains.
This presentation is based on an ongoing research project on data politics in the proclaimed “sustainable traffic transformation” (Verkehrswende) of Frankfurt, Germany. We use the concept of data journeys (Bates et al. 2016) to trace how data is produced, processed, and shared within the complex networks at play.
Many actors occupy ambivalent roles in this entanglement: citizens are simultaneously customers, objects of datafication, data consumers, and sometimes even data manufacturers, while the municipality acts as both a key data provider and a customer. In addition, relevant legal regulations and political interests contribute to a mixture of commodification logics and practices. As a public actor within private markets, the municipality finds itself in various double binds.
We will illustrate the diversity of data trading practices with two examples. While the data purchases involved in the development of a new municipal traffic model follow traditional logics of marketization, other constellations elude those logics. In the case of pedestrian counters installed in Frankfurt, the agreements are based on an exchange of various services and mutual data provision instead of monetary payments. Attending to these diverse forms of bargaining allows us to investigate the multifaceted practices of data valorization involved along different stages of the supply chains.