Author: Michael Castelle (University of Chicago)
Paper short abstract:
This paper develops an empirical distinction between the aspects of “volume” and “velocity” currently conflated in theorizations of “big data”. The contrasting concept of “big codata” emphasizes streaming flows of events and sets data science practice against traditional social-scientific methodology.
Paper long abstract:
Presently existing theorizations of "big data" practices conflate observed aspects of both "volume" and "velocity" (Kitchin 2014). The practical management of these two qualities, however, has a comparatively disjunct, if interwoven, computational history: on one side, the use of large (relational and non-relational) database systems, and on the other, the handling of real-time flows (the world of dataflow languages, stream and event processing, and message queues). While the commercial data practices of the late 20th century were predicated on an assumption of comparatively static archives (the site-specific "mining" of data "warehouses"), much of the novelty and value of contemporary "big data" sociotechnics is in fact predicated on harnessing and processing the vast flows of events generated by the conceptually centralized but physically distributed datastores of Google, Facebook, LinkedIn, etc. These latter processes, which I refer to as "big codata", have their origins in IBM's mainframe updating of teletype message switching, were adapted for Wall Street trading firms in the 1980s, and have a contemporary manifestation in distributed "streaming" databases and message queues like Kafka and StormMQ, in which one differentially "subscribes" to brokered event streams for real-time visualization and analysis. Through ethnographic interviews with data science practitioners in various commercial startup and academic environments, I will contrast these technologies and techniques with those of traditional social-scientific methods, which may begin with empirically observed and transcribed "codata" but typically subject the resultant inert "dataset" to a far less real-time sequence of material and textual transformations (Latour 1987).
Critical data studies