T113: Critical data studies

Convenors: Laura Noren (New York University)
Stuart Geiger (UC-Berkeley)
Gretchen Gano (University of California Berkeley)
Massimo Mazzotti (University of California, Berkeley)
Charlotte Mazel-Cabasse (University of California, Berkeley)
Brittany Fiore-Gartland (University of Washington)

Stream: Tracks

Location: 116

Sessions: Saturday 3 September, 9:00-10:45, 11:00-12:45, 12:30-14:15, 14:00-15:45
Time zone: Europe/Madrid

Short Abstract:

We invite papers investigating data-driven techniques in academic research and analytic industries and the consequences of implementing data-driven products and processes. Papers utilizing computational methods or ethnography with theorization of technology, social power, or politics are encouraged.

Long Abstract:

Computational methods with large datasets are becoming more common across disciplines in academia (including social sciences) and analytic industries, but the sprawling and ambiguous boundaries of "big data" make it difficult to research. In this track we investigate the relationship between theories, instruments, methods and practices in data science research and implementation. How are such practices transforming the processes of knowledge creation and validation, as well as our understanding of empiricism and the scientific method?

Beyond case studies, we invite connective explorations of emerging theory, machinery, methods, and practices. Papers may examine data collection instruments, software, inscription devices, packages, algorithms and their interaction in sociotechnical systems used to produce, analyse, share, and validate knowledge. Looking at the way these knowledges are objectified, classified, imagined and contested, the aim is to reflect critically on the maturing practices of quantification and their historical, social, cultural, political, ideological, economic, scientific and ecological impacts.

We welcome papers tackling a variety of questions and case studies such as:

- What does it mean to study quantification (including big data) as myth, narrative, ideology, discourse, and power?

- How is instrumentation being used to connect data and theory?

- How well do we understand which domains are being reshaped by these techniques, and what are the consequences of their adoption in those domains and beyond? Is data science linking up to domains that have previously been distinct or dividing fields that had been unified?

SESSIONS: 5/5/4/4

Accepted papers:

Session 1 Saturday 3 September, 2016, 9:00-10:45

Scientific Open Data: Questions of Labor and Public Benefit

Irene Pasquetto (University of Maryland) Ashley E. Sands (UCLA)

It's the context, stupid: Reproducibility as a scientific communication problem

Brittany Fiore-Gartland (University of Washington) Anissa Tanweer (University of Washington)

Paper short abstract:

Context in data-intensive research is often seen as something that can be captured with metadata to extend reproducibility. Based on varied ways “context” is marshalled in reproducibility practice, we argue for a nuanced view of context and reframing of reproducibility as a communication problem.

Paper long abstract:

Reproducibility has long been considered integral to scientific research and increasingly must be adapted to highly computational, data-intensive practices. Central to reproducibility is the sharing of data across varied settings. Many scholars note that reproducible research necessitates thorough documentation and communication of the context in which scientific data and code are generated and transformed. Yet there has been some pushback against the generic use of the term context (Nicolini, 2012); for, as Seaver puts it, "the nice thing about context is everyone has it" (2015). Dourish (2004) articulates two approaches to context: representational and interactional. The representational perspective sees context as stable, delineable information; in terms of reproducibility, this is the sort of context that can be captured and communicated with metadata, such as location, time, and size. An interactional perspective, on the other hand, views context not as static information but as a relational and dynamic property arising from activity; something that is much harder to capture and convey using metadata or any other technological fix. In two years of ethnographic research with scientists negotiating reproducibility in their own data-intensive work, we found "context" being marshalled in multiple ways to mean different things within scientific practice and discourses of reproducibility advocates. Finding gaps in perspectives on context across stakeholders, we reframe reproducibility as a scientific communication problem, a move that recognizes the limits of representational context for the purpose of reproducible research and underscores the importance of developing cultures and practices for conveying interactional context.

Condensing Data into Images, Uncovering "the Higgs"

Martina Merz (Alpen-Adria-University Klagenfurt)

Paper short abstract:

In data-intensive sciences such as particle physics images become essential sites for evidential exploration and debate through procedures of black-boxing, synthesis, and contrasting. This paper addresses the challenges of data analysis using as an example the Higgs search at the LHC (CERN).

Paper long abstract:

Contemporary experimental particle physics is amongst the most data-intensive sciences and thus provides an interesting test case for critical data studies. Approximately 30 petabytes of data produced at CERN's Large Hadron Collider (LHC) annually need to be controlled and processed in multiple ways before physicists are ready to claim novel results: data are filtered, stored, distributed, analyzed, reconstructed, synthesized, etc., involving collaborations of 3000 scientists and heavily distributed work. Adopting a science-as-practice approach, this paper focuses on the associated challenges of data analysis using as an example the recent Higgs search at the LHC, based on a long-term qualitative study. In particle physics, data analysis relies on statistical reasoning. Physicists thus use a variety of standard and advanced statistical tools and procedures. I will emphasize that, and show how, the computational practice of data analysis is inextricably tied to the production and use of specific visual representations. These "statistical images" constitute "the Higgs" (or its absence) in the sense of making it "observable" and intelligible. The paper puts forward two main theses: (1) that images are constitutive of the prime analysis results due to the direct visual grasp of the data that they afford within large-scale collaborations and (2) that data analysis decisively relies on the computational and pictorial juxtaposition of "real" and "simulated data", based on multiple models of different kinds. In data-intensive sciences such as particle physics images thus become essential sites for evidential exploration and debate through procedures of black-boxing, synthesis, and contrasting.

Data Pedagogy: Learning to Make Sense of Algorithmic Numbers

Samir Passi (Cornell University)

Big Data or Big Codata? Flows in Historical and Contemporary Data Practices

Michael Castelle (University of Warwick)

Paper short abstract:

This paper develops an empirical distinction between the aspects of “volume” and “velocity” currently conflated in theorizations of “big data”. The contrasting concept of “big codata” emphasizes streaming flows of events, contrasting data science practice with traditional social-scientific methodology.

Paper long abstract:

Presently existing theorizations of "big data" practices conflate observed aspects of both "volume" and "velocity" (Kitchin 2014). The practical management of these two qualities, however, has a comparably disjunct, if interwoven, computational history: on one side, the use of large (relational and non-relational) database systems, and on the other, the handling of real-time flows (the world of dataflow languages, stream and event processing, and message queues). While the commercial data practices of the late 20th century were predicated on an assumption of comparably static archival (the site-specific "mining" of data "warehouses"), much of the novelty and value of contemporary "big data" sociotechnics is in fact predicated on harnessing and processing vast flows of events generated by the conceptually-centralized/physically-distributed datastores of Google, Facebook, LinkedIn, etc. These latter processes, which I refer to as "big codata", have their origins in IBM's mainframe updating of teletype message switching, were adapted for Wall Street trading firms in the 1980s, and have a contemporary manifestation in distributed "streaming" databases and message queues like Kafka and StormMQ, in which one differentially "subscribes" to brokered event streams for real-time visualization and analysis. Through ethnographic interviews with data science practitioners in various commercial startup and academic environments, I will contrast these technologies and techniques with those of traditional social-scientific methods, which may begin with empirically observed and transcribed "codata", but typically subject the resultant inert "dataset" to a far less real-time sequence of material and textual transformations (Latour 1987).

Talking to Non-Experts about Data: Translating and Synthesizing Modeling Data in Design Teams

Gina Neff (University of Cambridge)

Emerging Practices of Data-Driven Accountability in Healthcare: Individual Attribution of C-Sections

Kathleen Pine (Arizona State University)

Paper short abstract:

Through ethnographic research on obstetrical care, I describe a change in scale from performance measurement of hospitals to individual clinicians, and attendant dilemmas related to data quality management and tradeoffs between professional discretion and accountability.

Paper long abstract:

This paper examines the implementation and consequences of data science in a specific domain: evaluation and regulation of healthcare delivery. Recent iterations of data-driven management expand the dimensions along which organizations are evaluated and utilize a growing array of non-financial measures to audit performance (i.e. adherence to best practices). Abstract values such as "quality" and "effectiveness" are operationalized through design and implementation of certain performance measurements: it is not just outcomes that demonstrate the quality of service provision, but the particular practices engaged in during service delivery.

Recent years have seen the growth of a controversial new form of data-driven accountability in healthcare: application of performance measurements to the work of individual clinicians. Fine-grained performance measurements of individual providers were once far too resource intensive to undertake, but expanded digital capacities have made provider-level analyses feasible. Such measurements are being deployed as part of larger efforts to move from "volume-based" to "value-based" or "pay for performance" payment models.

Evaluating individual providers and deploying pay for performance at the individual (rather than the organizational) level is a controversial idea. Critics argue that the measurements reflect a tiny sliver of any clinician's "quality," and that such algorithmic management schemes will lead professionals to focus on only a small number of measured activities. Despite these and other concerns, such measurements are on the horizon. I will discuss early ethnographic findings on implementation of provider-level cesarean section measurements, describing tensions between professional discretion and accountability and rising stakes of data quality in healthcare.

The (in)credibility of data science methods to non-experts

Daan Kolkman (Utrecht University)

Paper short abstract:

This paper explores the quantification practices through which models and algorithms are created, maintained and contested. It draws on data collected in the analytical industry and government in the UK and the Netherlands to illustrate how non-experts evaluate the credibility of highly technical objects.

Paper long abstract:

The rapid development and dissemination of data science methods, tools and libraries allows for the development of ever more intricate models and algorithms. Such digital objects are simultaneously the vehicle and outcome of quantification practices and may embody a particular world-view with associated norms and values. More often than not, a set of specific technical skills is required to create, use or interpret these digital objects. As a result, the mechanics of the model or algorithm may be virtually incomprehensible to non-experts.

This is of consequence for the process of knowledge creation because it may introduce power asymmetries and because successful implementation of models and algorithms in an organizational context requires that all those involved have faith in the model or algorithm. This paper contributes to the sociology of quantification by exploring the practices through which non-experts ascertain the quality and credibility of digital objects as myths or fictions. By considering digital objects as myths or fictions, the codified nature of these objects comes into focus.

This permits the illustration of the practices through which experts and non-experts develop, maintain, question or contest such myths. The paper draws on fieldwork conducted in government and analytic industry in the form of interviews, observations and documents to illustrate and contrast the practices which are available to non-experts and experts in bringing about the credibility or incredibility of such myths or fictions. It presents a detailed account of how digital objects become embedded in the organisations that use them.

Big data and the mythology of algorithms

Howard Rosenbaum (Indiana University)

Paper short abstract:

Big data relies on algorithms, which are typically presented as objective and unbiased. They are not. As they become more deeply entangled in our lives, it is important to understand the implications of the roles they are playing. This paper critically analyzes this mythology of algorithms.

Paper long abstract:

There are no big data without algorithms. Algorithms are sociotechnical constructions and reflect the social, cultural, technical and other values embedded in their contexts of design, development, and use. The utopian "mythology" (boyd and Crawford 2011) about big data rests, in part, on the depiction of algorithms as objective and unbiased tools operating quietly in the background. As reliable technical participants in the routines of life, their impartiality provides legitimacy for the results of their work. This becomes more significant as algorithms become more deeply entangled in our online and offline lives, where we generate the data they analyze. They create "algorithmic identities," profiles of us based on our digital traces that are "shadow bodies," emphasizing some aspects and ignoring others (Gillespie 2012). They are powerful tools that use these identities to dynamically shape the information flows on which we depend in response to our actions and decisions made by their owners.

Because this perspective tends to dominate the discourse about big data, thereby shaping public and scientific understandings of the phenomenon, it is necessary to subject it to critical review as an instance of critical data studies. This paper interrogates algorithms as human constructions and products of choices that have a range of consequences for their users and owners; issues explored include:

The epistemological implications of big data algorithms

The impacts of these algorithms in our social and organizational lives

The extent to which they encode power and the ways in which this power is exercised

The possibility of algorithmic accountability

Infrastructuring data analysis in Digital methods with digital data and tools

Klara Benda (IT University of Copenhagen)

Paper short abstract:

The presentation draws on ethnographic research to describe data practices of appropriating the web for social research in Digital methods as layers of infrastructuring. The web is mediated by community infrastructures to support iterative assembling of the local infrastructure of a knowledge space.

Paper long abstract:

The Digital methods approach seeks the strategic appropriation of digital resources on the web for social research. I apply grounded theory to theorize how data practices in Digital methods are entangled with the web as a socio-technical phenomenon. My account draws on public sources of Digital methods and ethnographic research on semester-long student projects based on observations, interviews and project reports. It is inspired by Hutchins's call for understanding how people "create their cognitive powers by creating the environments in which they exercise those powers". The analysis draws on the lens of infrastructuring to show that making environments for creativity in Digital methods is a distributed process, which takes place on local and community levels with distinct temporalities. Digital methods is predicated on creating its local knowledge space for social analysis by pulling together digital data and tools from the web, and this quick local infrastructuring is supported by layers of slower community infrastructures which mediate the digital resources of the web for a Digital methods style analysis by means of translation and curation. Overall, the socially distributed, infrastructural style of data practice is made possible by the web as a socio-technical phenomenon predicated on openness, sharing and reuse. On the web, new digital resources are readily available to be incorporated into the local knowledge space, making way for an iterative, exploratory style of analysis, which oscillates between infrastructuring and inhabiting a local knowledge space. The web also serves as a socio-technical platform for community practices of infrastructuring.

"An afternoon hack" Enabling data driven scientific computing in the open

Charlotte Mazel-Cabasse (University of California, Berkeley)

Playing with educational data: the Learning Analytics Report Card (LARC)

Jeremy Knox (The University of Edinburgh)

Paper short abstract:

The field of ‘learning analytics’ is gaining significant traction in education, often driven by uncritical government and corporate research agendas. This paper describes the ‘LARC’: an interdisciplinary project investigating critical and student-focused educational data analysis.

Paper long abstract:

Education has become an important site for computational data analysis, and the burgeoning field of 'learning analytics' is gaining significant traction, motivated by the proliferation of online courses and large enrolment numbers. However, while this 'big data' and its analysis continue to be hyped across academic, government and corporate research agendas, critical and interdisciplinary approaches to educational data analysis are in short supply. Driven by narrow disciplinary areas in computer science, learning analytics is not only 'blackboxed', in other words subject to a propensity to 'focus only on its inputs and outputs and not on its internal complexity' (Latour 1999, p304), but also abstracted and distanced from the activities of education itself. This methodological estrangement may be particularly problematic in an educational context where the fostering of critical awareness is valued. The first half of this paper will describe three ways in which we can understand this 'distancing', and how it is implicated in enactments of power within the material conditions of education: the institutional surveilling of student activity; the mythologizing of empirical objectivity; and the privileging of prediction. The second half of the paper will describe the development of a small scale and experimental learning analytics project undertaken at the University of Edinburgh that sought to explore some of these issues. Entitled the Learning Analytics Report Card (LARC), the project investigated playful ways of offering student choice in the analytics process, and the fostering of critical awareness of issues related to data analysis in education.

Data science / science studies

Cathryn Carson (University of California, Berkeley)

Critical Information Practice

Yanni Loukissas (Georgia Institute of Technology) Matt Ratto (University of Toronto) Gabby Resch (University of Toronto)

Actor-Network VS Network Analysis VS Digital Networks: Are We Talking About the Same Networks?

Tommaso Venturini (University of Geneva) Mathieu Jacomy (Aalborg University) Anders Kristian Munk (Technical University of Denmark)

The Navigators

Nick Seaver (Tufts University)

Paper short abstract:

Data scientists construct and navigate data spaces. Where critical data studies has focused on flaws in these spaces' construction, this paper examines their navigation. Studies of navigation illuminate key features of data science, particularly the interrelation of maps, spaces, plans, and action.

Paper long abstract:

Data scientists summon space into existence. Through gestures in the air, visualizations on screen, and loops in code, they locate data in spaces amenable to navigation. Typically, these spaces embody a Euro-American common sense: things near each other are similar to each other. This principle is evident in the work of algorithmic recommendation, for instance, where users are imagined to navigate a landscape composed of items arranged by similarity. If you like this hill, you might like the adjacent valley. Yet the topographies conceived by data scientists also pose challenges to this spatial common sense. They are constantly reconfigured by new data and the whims of their minders, subject to dramatic tectonic shifts, and they can be more than 3-dimensional. In highly dimensional spaces, data scientists encounter the "curse of dimensionality," by which human intuitions about distance fail as dimensions accumulate. Work in critical data studies has conventionally focused on the biases that shape these spaces. In this paper, I propose that critical data studies should not only attend to how representative data spaces are, but also to the techniques data scientists use to navigate them. Drawing on fieldwork with the developers of algorithmic music recommender systems, I describe a set of navigational practices that negotiate with the shifting, biased topographies of data space. Recalling a classic archetype from STS and anthropology, these practices complicate the image of the data scientist as rationalizing, European map-maker, resembling more closely the situated interactions of the ideal-typical Micronesian navigator.
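
Illustrative note: the "curse of dimensionality" mentioned in the abstract above can be made concrete with a small numerical experiment. The sketch below is not drawn from the paper; it assumes points sampled uniformly from a unit hypercube, and the function name `relative_contrast` and its parameters are hypothetical, chosen only for this illustration. It measures how much farther the farthest point is than the nearest one, relative to the nearest, and this gap tends to shrink as dimensions are added.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for Euclidean distances
    from one random query point to n_points random points, all drawn
    uniformly from the unit hypercube of the given dimension."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # Print the contrast for a few dimensionalities (illustrative choices).
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast is large in two dimensions and much smaller in a thousand, which is one way to see why "near" and "far" become hard to tell apart in highly dimensional data spaces.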