Accepted Paper:



Nadine Levin (UCLA)

Paper short abstract:

This paper considers the history of the data analysis algorithms in the metabolomics software XCMS Online, a cloud-based platform developed in 2012 for the analysis of mass spectrometry data. These algorithms form the backbone of 21st-century big data analytics, but their history dates back to the 1970s.

Paper long abstract:

Over the last decade, the size of post-genomic datasets has grown exponentially, making it harder to translate data into biological knowledge. Metabolomics, the "omics" study of metabolism, typifies these challenges because metabolism is complex and, unlike genes, changes in response to diet, environment, and disease. To cope with these challenges, researchers have developed a range of in-house software tools that aid in data standardization, analysis, and organization.

Drawing on 18 months of ethnographic fieldwork with metabolomics researchers, this paper discusses XCMS Online, a cloud-based software platform used for mass spectrometry data analysis. XCMS Online was developed in 2012 at the Scripps Research Institute, building on an open-source R project that began in 2006. This paper considers the history of the multivariate statistical algorithms encapsulated within XCMS Online, which enable researchers to parse the complexity of metabolic data. I show how multivariate statistics such as Principal Components Analysis, which now form the backbone of many of the algorithms used in "machine learning" and "big data analytics," trace their origins in metabolomics to the hybrid field of "chemometrics" in the 1970s.

The paper argues that multivariate statistics enable metabolomics researchers to envision metabolism as a complex problem space. It further argues that, more recently, researchers have reconsidered the value of "simple" univariate statistics in their attempts to make sense of metabolic complexity. Overall, this paper contributes to STS by examining both the material practices underlying so-called "big data" and the social and historical forces that have shaped the technical practices of data-intensive science.

Panel T158
Soft Focus: How Software Reshaped Technical Vision and Practice