T002: The Lives and Deaths of Data

Click the star to add/remove an item to/from your individual schedule.
You need to be logged in to avail of this functionality. Log in

T002

The Lives and Deaths of Data

Convenors:: Sabina Leonelli (Technical University of Munich)
Brian Rappert (University of Exeter)
Send message to Convenors

Stream:: Tracks

Location:: 122

Sessions:: Thursday 1 September, 14:00-15:45, 16:00-17:45, 18:00-19:45, 19:30-21:15, Friday 2 September, 9:00-10:45
Time zone: Europe/Madrid

Short Abstract

This track investigates the relational constitution of data: how stages in the life of data articulate to one another and the challenges involved in storing, moving, classifying, manipulating and interpreting them.

Long Abstract

This session explores the collectivities emerging through data collection, dissemination, assemblage and analysis. Analysing the ways in which information becomes taken as given things, the manner in which data and their varying contexts of use are co-constituted, and the means by which utility is invested and divested in them provides a platform to explore and challenge the powers attributed to "Big" and "Open" data by governments, lobby groups and institutions around the world. With its long standing attention to the conditions of knowledge production, STS scholarship is well positioned to reflect on the value(s) attributed to data under a variety of different circumstances, how such attribution changes in time, and what this indicates about the properties of the objects being identified and used as 'data' and of the communities involved in such identification and use. Questions to be addressed include: What would it mean to speak of the birth of data? How do they develop, especially when they are used for a variety of purposes by different stakeholders? Do they ever cease to be data, and how can we conceptualize situations in which data are dismissed, forgotten, erased, lost or regarded as obsolete? This session will be organised as a set of individual presentations encompassing several different aspects and areas of data use. We aim to allocate between 15 and 20 minutes per paper, and to allocate chairs who can also work as discussants, helping to bring the content of the papers together.

SESSIONS: 5/5/5/5/4

Accepted papers

Session 1 Thursday 1 September, 2016, 14:00-15:45

Journeys and Deaths of Scientific Data

Sabina Leonelli (Technical University of Munich)

A new life abroad: the portability of racialized data.

Andrew Smart (Bath Spa University) Kate Weiner (University of Sheffield) Catherine Will (University of Sussex)

Key Issues in Social Studies of Disclosure Control

Andrew Turner (University of Bristol) Madeleine Murtagh (Newcastle University) Paul Burton (University of Bristol)

Creating Infrastructures: The Rise and Imaginary of Microfilm (1920-1950)

Estelle Blaschke (University of Lausanne)

Data birth, transformation and use in complex systems sciences

Fabrizio Li Vigni (Centre National de la Recherche Scientifique)

When are data? Reflections on the making (possible) of research data

Jutta Haider (Lund University ) Sara Kjellberg (Lund University)

How not to get scooped

Goetz Hoeppe (University of Waterloo)

The Limits to Data Sharing in Low-resourced Research Environments

Brian Rappert (University of Exeter) Louise Bezuidenhout (University of Leiden)

The shaping of an e-research infrastructure: drawings as equipped data

Dominique Vinck (Lausanne University) Pierre-Nicolas Oberhauser (University of Lausanne)

Send message to Authors

Paper short abstract

The paper accounts for a research project aimed at studying children’s representations of supernatural agents. The project team is going through the set-up and analysis of thousands of drawings made by children from various regions of the world. Our study documents the process of equipping these data.

Paper long abstract

Digitization of human products leads to data and databases that open new opportunities for researchers in the humanities and social sciences. However, the process is not straightforward. The path that leads from "raw" material to usable data is a tortuous and complex one. The shaping of digital data appears to be a key stage during which individuals and teams reconsider not only the objects under study and their research goals but also their competences, tools and relationships to other disciplines (e.g. IT experts).

The paper is based on a two-year participant-observation inside an interdisciplinary research project aimed at studying children's representations of supernatural agents. The project gathers scholars and practitioners from developmental and cultural psychology, social psychology, religious studies and computer science. Together, they are going through the set-up and analysis of thousands of drawings made by children from various regions of the world (Brazil, Iran, Japan, Romania, Russia, Switzerland, etc.). Our paper explores the emergence of the research collective and the way collaboration is achieved (or endangered) between team members. It documents the constitution of the data (digitized drawings and metadata), the database and the analytical tools. It shows the emergence of an e-research infrastructure and the way choices of various orders are progressively blackboxed into data, IT tools, research protocols, organization and skills. Central to our argument is the notion of data equipment, which we use to describe the process of adding various entities to data in order to enable their circulation and use.

Linkage, Exploration and Gatekeeping: The Role of Information Security In Biomedical Data Journeys

Niccolò Tempini (University of Exeter) Sabina Leonelli (Technical University of Munich)

Send message to Authors

Paper short abstract

The paper explores how information security strategies and solutions affect the trajectories and directions of data journeys and data-intensive discovery, on the basis of the ethnographic study of two linkage infrastructures for biomedical and environmental data based in the UK.

Paper long abstract

Whether data consigned to databases are accessible and useable depends partly on the strategies employed by database developers and curators to keep data alive as potential evidence and valuable commodities. As well-documented by STS scholars and Open Science advocates, those strategies are informed and constrained by many factors, ranging from financial and human resources to available materials, skills, expertise, policies, incentives and institutional locations. In this paper, we explore one crucial factor affecting the inclusion, accessibility and re-usability of data in databases, which has received little attention within STS so far: the management of information security strategies and policies, and its embedding in the material, social and regulatory landscapes of research. We show that security concerns and solutions exert a strong influence on the trajectories and outcomes of data sharing efforts, and the results can be at odds with the emphasis on exploratory research typical of open and big data science. To this aim, we build on an ethnographic study of two data linkage infrastructures in the biomedical research domain: the Secure Anonymised Information Link (SAIL), a databank based in Wales that aims to facilitate appropriate re-use of routine health data generated through public services and of otherwise unavailable datasets generated by scientific projects; and the Medical & Environmental Data Mash-up Infrastructure (MEDMI), which brings together researchers from the Universities of Exeter and Bristol, the London School of Hygiene & Tropical Medicine, the MET Office and Public Health England to link and analyse complex meteorological, environmental and epidemiological data.

Epistemic Data Cultures

Clifford Tatum (Leiden University) Alex Rushforth (Leiden University) Sarah de Rijcke (Leiden University)

Send message to Authors

Paper short abstract

In this study we shift focus from concerns of open data to a stratified account of data sharing practices. Through in-depth case studies, the aim of this approach is to develop a better understanding of established data practices as a means to inform the challenges and opportunities of the Open Data movement.

Paper long abstract

Studies of open data often focus on the status and potential of making data publicly available for reuse by academic actors situated outside of the local context in which they were produced or by public actors not directly associated with academic research (Borgman 2012). This formulation of open data imagines the widest practical range of potential (re)users and invokes significant effort to prepare data for use by unknown others. Often overlooked in this approach is the assessment of data practices that occur in fields with a tradition of data sharing that would not be considered 'open data'.

In this study, we shift the focus from concerns of public access to a stratified account of data sharing practices. We expand the conceptualization of openness to include epistemic concerns, such as: facilitating discussion about the practicalities of making data reusable, confronting concerns about transparency and validity, foregrounding concerns about globalization of research, and drawing attention to the commodification of data (Leonelli 2013:7).

To achieve this, we investigate data sharing practices within three fields: Soil Science, Human Genetics, and Digital Humanities. Empirically, we draw on interviews of key actors involved with data collection, analysis, and deposition. With this disciplinary mix, we expect to find new and emerging roles associated with data, and multiple configurations of data sharing within each of the selected cases. The aim of this approach is to develop a better understanding of established data practices as a means to inform the challenges and opportunities of the Open Data movement.

The role of samples in the "birth" of data

Gregor Halfmann (University of Exeter)

Data and natural history: Do museums dream of digital insects?

Tahani Nadim (Ruhr University Bochum)

Send message to Author

Paper short abstract

Based on ethnographic work at the Natural History Museum Berlin, I attend to questions of loss in the mass-digitization of natural history collections. Combining the sociology of data and infrastructural studies, I query the nature of digital specimens and the hopes and promises pinned on them.

Paper long abstract

Big natural history museums have begun digitizing their collections on an industrial scale. Digital production lines are turning molluscs, pressed plants, microscopic slides and millions of insects into data objects. The National Museum of Natural History Paris has digitized most of its herbarium; the Natural History Museum London has just begun mass-digitizing their 80 million specimens; the Naturalis Biodiversity Center in Leiden has already made more than 30 million specimens digitally available; and the Natural History Museum Berlin has recently completed a pilot project mass-digitizing 10,000 insect drawers. The ambition driving these efforts and iterated in national and international roadmaps is capacious: increase accessibility of collections, rationalise collection management, aid preservation , facilitate monitoring and conservation, allow for discoveries "born from the data", address societal needs and interests. Driven by the prospect of irrecoverable loss and decay and the promises of globally coordinated data-intensive biodiversity science, the production of digital specimens has thus emerged as a key response to institutional, environmental and political pressures. Yet, production of digital specimens is characterized by a distinct register of losses and absences (What gets digitized when, by whom? How is it made intelligible, for whom?). In this presentation I wish to attend to questions of loss by examining data and digitization practices at the Museum für Naturkunde Berlin. Based on ethnographic work at the museum, my analysis combines insights from the sociology of data and infrastructural studies to problematise the nature of digital specimens and the hopes and promises pinned on them.

Data friction and the power dynamics of meteorological data infrastructures

Jo Bates (University of Sheffield) Paula Goodale (University of Sheffield) Yuwei Lin (University of Roehampton)

Send message to Authors

Paper short abstract

We compare three cases in which people are engaged in efforts to reduce and/or maintain “friction” in the movement of meteorological data between different sites, and explore the role of data friction in the emerging power dynamics of meteorological data infrastructures.

Paper long abstract

Whilst data can be mobile between different sites of data generation, processing and use, they do not often 'flow' easily. As data move they experience "friction" (Edwards, 2010) which slows down or blocks their movement. These frictions are significant to the ways in which data and their complex socio-material contexts are co-constituted. It is therefore important to observe sites of potential, blocked and lack of movement, and think critically about the unseen "conflicts, disagreements, inexact or unruly processes" (Edwards et al, 2011) shaping data movements in a seemingly harmonious aggregated data infrastructure.

Through development of the "data journeys" methodology, we identified sites within the UK's meteorological data infrastructure where people are working to reduce and/or maintain data "friction" for different ends. Here, we discuss findings from three such sites. Firstly, the struggles of archivists, climate scientists and citizen scientists working on the Old Weather Project to recover 'lost' data from the logs of historical ships and move them into the ICOADS dataset. Secondly, policy developments aimed at "opening" meteorological data for commercial re-use to spur innovation in the weather derivatives industry. Thirdly, the maintenance of "data friction" through the commercialisation of data in an effort to sustain the physical infrastructure of Sheffield Weston Park weather station - one of the oldest stations in the UK - in the context of deep public spending cuts. The paper contributes to understanding about how power dynamics shape the movement of data through knowledge infrastructures, and demonstrates a new methodology for capturing such insight.

New energy data in the making: meaning, value and governance

Mette Kragh-Furbo (University of Liverpool) Gordon Walker (Lancaster University)

Hidden Cooperative Specialization in a High Energy Physics experiment

Emiko Adachi (RIKEN) Yasunobu Yasunobu (Japan Advanced Institute of Science and Technology (JAIST))

Data Analysis and the Perceived Value of Data

Jessey Wright (University of Western Ontario)

Send message to Author

Paper short abstract

Data often requires significant analysis to be used as evidence. Distinguishing between perceived and actual value, I use the interpretation of a meta-analysis of neuroimaging data to show that the intuition about an analysis technique determines the perceived value of data.

Paper long abstract

Faced with 'big', or otherwise complex data sets, scientists use analysis techniques to isolate data patterns that are relevant to their research. In this paper I show how the perceived value of data is, in part, determined by the methods available for probing the content of data, and the intuitive understanding of what the patterns isolated by those methods are about. This value can change over time as new techniques are developed, and as the conceptual understanding of existing techniques changes. To demonstrate this I review a recent debate over the interpretation of meta-analyses provided by NeuroSynth, an online database that correlates brain activation coordinates and terms used in neuroimaging publications. Neuroimaging data are subject to significant processing and analysis in order to isolate patterns in the data that can be used as evidence. The specific patterns isolated, and their interpretation, depends on a conception of the phenomena under investigation and what patterns are regarded as evidence. The claim prompting the debate is that patterns isolated by NeuroSynth's 'reverse inference' analysis can support claims about the selectivity of brain regions for cognitive functions. The disagreement is between the published authors and the database developer, and rests on a different intuition, or understanding, of what NeuroSynth's automated analyses are about. I show that an intuitive understanding of analysis techniques determines the perceived value of data, which can be distinct from its actual value. I conclude by situating this in the context of philosophical discussions about conceptual practices in data-intensive science.

Molecular Tumor Boards: data interpretation in the age of sequencing

Alberto Cambrosio (McGill University) Pascale Bourret (Aix-Marseille Université / SESSTIM) Sylvain Besle (INSERM)

Send message to Authors

Paper short abstract

Based on the comparative analysis of the activities of Molecular Tumor Boards in North America and Europe, the paper explores the co-production of data and their interpretation within these collective forums devoted to the discussion of the results of the genomic analysis of patient tumors.

Paper long abstract

The adoption of high-throughput technologies in oncology has led, among other things, to the establishment of a new kind of institutions, referred to generically as Molecular Tumor Boards. MTBs provide a forum for clinicians, molecular biologists, and bioinformatics specialists to discuss the results of the genomic analysis of patient tumors, and make therapeutic recommendations on that basis. To analyze sequencing and gene expression data, MTBs resort to a heterogeneous set of evidential resources, including a number of genomic databases, publications, clinical trial results, previous experience with other patients, and basic knowledge about mutations and genetic pathways, all to be related to the singular clinical trajectory of individual patients. While individual MTBs share a common purpose of providing an "informed" data interpretation, the means to reaching that goal differ from one MTB to the other, from the actual composition of the MTBs, to the extent to which molecular results are taken for granted or questioned, the resort or not to prioritization algorithms, the extent to which they are followed, and pragmatic considerations such as access to specific drugs. Based on the comparative analysis of the activities of several MTBs in North America and Europe, the paper explores the co-production of data and their interpretation within these institutions., emphasizing their situated aspects, and in particular how the definition of what counts as relevant data is not only an input for data interpretation, but also the outcome of interpretative practices grounded in the definition of what may count as actionable molecular alterations.

Spurious Categories: A study of data-model symbiosis in the Human Brain Project

Christine Aicardi (King's College London) Tara Mahfoud (University of Essex)

Preconditions, Procedures and Potentials: Data in Post-Genomic Cancer Research

Imme Petersen (Technical University Braunschweig) Regine Kollek (University of Hamburg)

The Life and Death of Big data in Education

Assunta Viteritti (University Sapienza) Orazio Giancola (Sapienza - Università di Roma)

Data Phantoms: The Uncanny Lives of Data Assets

Mary Ebeling (Drexel University)

(Re)making data: A case study of the Data Documentation Initiative (DDI)

Judit Gárdos (Hungarian Academy of Sciences) Natasha Mauthner (University of Aberdeen)

Send message to Authors

Paper short abstract

Using the Data Documentation Initiative as a case study, this paper explores how data archiving classification systems and standards are (re)making data, and the social sciences more generally, in historically- and culturally-specific ways.

Paper long abstract

Over the past two decades the archiving of research data within the social and the natural sciences has increasingly become subject to regulation. Research funding organizations, Universities, and academic journals are institutionalizing data archiving as a normative practice while many data archives are implementing standardized classification systems for archiving and sharing data. One example is the Data Documentation Initiative (DDI) international metadata standard for statistical and social science data. DDI comprises DDI-Codebook used for describing data at the archiving stage of the research process, and DDI-Lifecycle which conceptualizes the entire research lifecycle in terms of data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving. DDI constitutes itself as a neutral and passive classification system, which enables comprehensive description of data for discovery and analysis, and allows effective data sharing. Drawing on STS literature which challenges both the taken for granted-ness, and assumed innocence, of classification systems (Foucault, 1970; Derrida, 1994; Ritvo, 1997; Bowker & Star, 1999; Waterton, 2002; Bowker, 2005; Sommerlund, 2006), our paper approaches the DDI as an object of study in order to explore how DDI embeds and enacts a historically- and culturally-specific conception of the nature of 'data', and social science more generally. Following Barad (2007) and Derrida (1994), and building on our existing work in this area (Mauthner and Gardos, 2015, Mauthner, 2016), we further investigate how the DDI materializes power through a dual process of embodying a specific conceptualization of data (and social science), and naturalizing this 'privileged topology' (Derrida 1994: 3).

Beyond the deluge. Data and its invisible work.

Jerome Denis (Center for the sociology of innovation - Mines Paris - PSL)