- Convenors:
- Chrys Vilvang (Concordia University)
Gabriel Pereira (University of Amsterdam)
Bruno Moreschi (Collegium Helveticum ETHZ)
Aikaterini Mniestri (London School of Economics and Political Science)
- Format:
- Combined Format Open Panel
- Location:
- Agora 4, main building
- Sessions:
- Friday 19 July
Time zone: Europe/Amsterdam
Short Abstract:
This panel looks at the theoretical and practical aspects of algorithmic image processing, exploring the data techniques that train and enable machine learning and computer vision. How can these sociotechnical processes be reimagined to foster more radical ways of seeing the world through machines?
Long Abstract:
An age-old adage says that a picture is worth a thousand words. Although this has taken the meaning that an image can hold much information, it also reminds us that images are multifaceted and may contain within them multiple interpretations, practices, and subjective perceptions.
This panel engages with the way images have become a constitutive part of algorithmic processing systems today, particularly as they are variously used to constitute training data sets for machine learning. It builds upon much recent STS work that has sought to understand (and transform) the relations between images and algorithms, particularly within "critical data set studies" (Thylstrup), "ways of machine seeing" (Azar et al), or even "platform seeing" (Mackenzie & Munster).
The panel deals critically with the way images are organized, tagged, curated, and otherwise made to work within algorithmic pipelines, and the sociotechnical processes that they enable. Questions may include: How do image data sets constitute computer vision? How do image tracking algorithms define and represent minoritized bodies? What are other, more critical ways that data sets could be constituted? What human practices (beyond the images themselves) are not being highlighted in computer vision? How does, or could, fake or synthetic data enable alternative data sets?
This Combined Format Open Panel welcomes academic paper presentations, but also encourages scholars, artists, and activists to experiment with other forms of knowledge expression, particularly artistic and practice-based methodologies. These can be shown as, e.g., video essays, net art, short workshops, interactive modes of presentations, etc. Please include details on how your contribution would be best performed and we'll work to manage the different needs of selected contributors. We are open to academic research, but welcome more artistic and experimental formats, especially those that "think outside the box".
Accepted contributions:
Session 1 Friday 19 July, 2024
Short abstract:
This talk is informed by ethnographic fieldwork within the machine learning and computer vision community and explores processes and justifications ("referential chains") in neural image generation and processing to tease out what constitutes robust knowledge in this field and beyond.
Long abstract:
This talk draws on ethnographic fieldwork within the machine learning and computer vision community to investigate the processes and justifications ("referential chains") employed by technical actors in the creation of computer-generated images. It compares advanced models such as Transformer/Diffusion models (e.g., SORA) and Neural Radiance Fields (NeRFs). These models are mediators that engage with the 'reality' inscribed in digital files in distinct ways, producing varied representations of visual data. SORA uses 'patches' within a probabilistic framework to process and nest semantically related scenes. NeRFs use 'rays' to construct connections with spatially bound scenes, allowing for precise three-dimensional reconstructions. By comparing these two techniques, this talk aims to unpack the criteria for success in visual AI to advance our understanding of what constitutes robust knowledge in the epistemic culture of the computer vision community and beyond.
Short abstract:
This participatory presentation will explore the efficacy of computer vision as a tool for navigating content in personal photo libraries. Volunteers from the audience will be invited to interact with their personal devices and critically reflect upon the impact of AI in shaping their memories.
Long abstract:
Computer vision is increasingly embedded in the apps and platforms many users employ to store and navigate their personal photo libraries, but does the use of this technology preconfigure our relationship with the past? While AI may purport to solve the growing issues of storage and retrieval associated with abundant digital archives, it also reimagines the agency of the user through the logics of computation. Algorithms trained to identify people, objects, and other types of content in photographs are already remarkably precise, but these capabilities may not be aligned with the subjectivities and nuances through which personal photographs are imbued with meaning. Apple, Google, and Facebook each offer AI-enhanced ‘Memories’ features to automatically curate and resurface photographs, yet the algorithmic foundations that underlie these technologies are rarely considered for their role in shaping the actual memories of their users. This research project critically interrogates the premises upon which computer vision algorithms are trained to recognize specific content in personal photo libraries and repackage it in the form of ‘Memories’. Creative technical experiments and visual research methods are explored as ways to assess the possibilities, boundaries, and limitations of computer vision as a technology for mobilizing photographic memories. Attendees will be invited to participate in a series of guided interactions with their personal on-device photo libraries to facilitate a space for critical reflection and dialogue.
Short abstract:
This contribution draws on an ethnography of a computer science lab in Romania and argues that the stories, or fables (Haraway 2016), told both within the lab and about it are key to understanding algorithmic decision-making and its inherent societal and political consequences.
Long abstract:
With the increased role of machine learning in security applications, questions about the interpretability of AI are gaining relevance in both computer and social sciences. This contribution draws on an ethnography of a computer science lab in Romania, where software engineers work on the interpretability of image recognition algorithms. It argues that the stories, or fables (Haraway 2016), told both within and about the lab are key to understanding algorithmic decision-making and its inherent societal and political consequences. In the lab, tinkering with deep neural network models, introducing additional layers into the learning process, and creating “visualisations”, such as heat maps, is not only a technical process but also, at every stage, relies on and incorporates storytelling. Computer scientists often use stories and metaphors, through which fabulation is entangled with the material and technical practices of “making”. In addition, fabulation is entangled with the practices of knowledge production about such practices within STS. Through an innovative methodological approach, this paper draws on a collaboration and co-laboration between a social scientist and a computer scientist, and mobilises multimodal methods to argue that different modes of storytelling might enhance our understanding of “black boxed” image recognition algorithms and their societal and political consequences.
Short abstract:
As ML & AI encroach into education and research practices in the natural sciences, near-complete digitization and automation in analytical processes serve to distance us from alternative ways of knowing. We explore teaching interventions that foster situated and embodied thinking in data science.
Long abstract:
As machine learning and artificial intelligence encroach further into education and research practices in the natural sciences, near-complete digitization and automation in analytical processes serve to distance us from alternative ways of knowing. For example, students no longer peer through the eyepiece of a telescope to ponder the night sky; instead, they munge “downstream data” processed and decontextualized by computers. Data pipelines replace complex sensorial field notes and train students to be consumers of data products rather than foragers for natural phenomena. During this talk we ask, how can educators collaborate across disciplines to design and implement a “green crossing” between the power of data sets and the pleasures (or pain points) of the field? As educators, how do we privilege student learning over machine learning?
During this multimedia-rich presentation we explore two teaching interventions meant to foster divergent, situated, and embodied thinking in data science students. We begin our talk with a story of fieldwork unfolding on a small island in the Atlantic Ocean. Here we show how real-world observations of wildlife can not only lead to unpredictable research findings but also spark unanticipated artistic endeavors. Then, we visit the forested edges of a heavily used freeway on the west coast of the US, situating students of machine learning in the landscapes of wildlife crossings. What, we ask them to consider, can an embodied researcher accomplish that a fieldcam alone cannot? Why should we study the world through our full sensory apparatus?
Short abstract:
This panel will present an artistic approach to visualize the relationships between images within the invisible part of a dataset - a part that has not been identified by the Google API. How can art point to narratives and landscapes that have been erased - and to those that have yet to be created?
Long abstract:
Denise Scott Brown and Robert Venturi, influential architects in the field of urban planning, stated in "Learning from Las Vegas" that "learning from the existing landscape is, for the architect, a way of being revolutionary". Such learning is possible through numerous tools: drawing, photography, data collection. But perhaps the most effective of these tools, as both architects suggest, is the gaze.
This panel will therefore present an artistic project that aims to visualize potential relationships between portions given as invisible images - the result of algorithmic censorship. The project used images shared on Google Maps - a map and image visualization service - as a dataset to 1) discuss how machines are changing the nature of vision (Azar et al) and, therefore, our knowledge of the constructed-scape and 2) understand, through a disobedient stance, which landscapes we are failing to see and which stories we are failing to tell. We are perhaps facing the most paradoxical situation of potentially creating the richest and most plural visual culture in history through access to the media and "being plunged into the limbo of the uniformity of the gaze" (Beiguelman). However, training sets are increasingly part of our cities' infrastructure and therefore have the "power to shape the world in their own images" (Crawford & Paglen). Which potential landscapes (and narratives) are we failing to be agents of in this process?
Short abstract:
How do image tracking algorithms represent minoritized bodies? I have used situational mapping to capture the misappropriation of trans bodies across different internet locations, as those resurface through reverse search algorithms. What we see through the software becomes the focus of critique.
Long abstract:
It is common practice to post images of one’s body on social media. For some, this constitutes a routine, a naturalized aspect of networked social existence. For others, the digital publication of images of their bodies represents a contentious practice, a call for solidarity. In particular, trans-identifying content creators post images of their bodies to their followers to normalize the diversity of the trans experience. Their images invite viewers to embrace the presentation of different gender expressions and, potentially, to embrace their own non-normative physique in a society that is otherwise dominated by strict stereotypes of gender presentation. However, these images are downloaded, re-uploaded and misappropriated by various actors, who are not necessarily sympathetic to or mindful of the original creator’s intentions. I used reverse-image-searching tools to investigate the misappropriation of images of trans bodies across the web. This method yields an algorithmically curated mapping of the locations where these images resurface online. Using situational mapping, I captured the misappropriation of these images by third-party actors so as to draw attention to the double bind of the online representation of trans bodies. On one hand, trans content creators use social media platforms to achieve the broadest possible visibility. On the other hand, third parties take advantage of the public nature of social media platforms to misappropriate images of trans bodies for their own ends. Ultimately, this paper encourages readers to mind the numerous ways in which the non-normative body is treated online and to question how the attention economy affects different bodies.
Short abstract:
A mix of talk and screening of excerpts from the experimental feature film Acapulco, directed by Bruno Moreschi during 2023/2024 at the Collegium Helveticum, in Zurich. The film tests different methodologies to expand the possibilities of using datasets that train Computer Vision.
Long abstract:
The emergence of computer vision coincides with the dissemination of Large Scale Vision Datasets (LSVDs). In these datasets everything is intricate: millions of amateur images taken without consent from social media; low resolution files; a strong presence of scenes related to US culture, etc. All this is organized through precarious labor carried out by thousands of anonymous microworkers on platforms such as Amazon Mechanical Turk. Challenged by such opacity, I decided not to argue that these datasets are simple black boxes, and instead to act in a purposeful and collaborative way. I created a long-term methodological practice for a deeper understanding of these images. During the pandemic and social isolation, I began to invite people directly or indirectly related to images, technology and/or arts to receive 3 postcards by mail with the images – now in print. With images now materialized and more individualized than in the LSVDs, I established countless exchanges and conversations with the “seers”. The result was more than 40 hours of recordings and other types of feedback received (such as drawings, texts, sounds, etc.). This is valuable material for understanding these images in depth and in a non-normative/commercial way. This research culminated in the film Acapulco. In my presentation I intend to show excerpts from the film in a kind of experimental lecture.
Short abstract:
What can we (un)learn from adopting the oppressive gaze of algorithmic violence? This reflection will take shape as a live video-essay, exploring practice-based work that uses algorithmic surveillance images to question algorithmic surveillance itself.
Long abstract:
Computer vision algorithms both emerge from and give support to surveillance infrastructures: on one hand these systems are most often born out of the data scraped from the open web, organized as data sets (cf. Denton et al., 2021); on the other hand, these algorithms are becoming the backbone of algorithmic surveillance in supermarkets, streets, and war zones (cf. Bellanova et al., 2021).
In parallel, much STS scholarship has aimed at making visible the digital infrastructures that undergird our everyday life (e.g. Parks, 2015; Blanchette, 2012). Practice-based research has questioned the assumptions embedded in infrastructures, ranging from "datawalking" the smart city (van Es & de Lange, 2020) to drawing an anatomy of AI (Crawford & Joler, 2019).
Building on these points, my questions are: How may we conceptualize practice-based engagement with computer vision as a surveillance infrastructure? How may we engage with algorithmic surveillance itself as a way of questioning algorithmic surveillance? What conceptual and ethical issues are born from seeing like algorithmic surveillance? Finally, what can we (un)learn from adopting the oppressive gaze of algorithmic violence?
This reflection will take shape as a live video-essay (a talk with short video segments). I will reference art/scholarship on STS/surveillance which uses the surveillance apparatus, such as: Theo Anthony's movie "All Light, Everywhere" (2021), made with police body cams; and Manu Luksch's "Faceless" (2007), entirely made using London's CCTV. In parallel, I will reflect on my in-progress work on the colonial history of Automated License Plate Recognition.
Short abstract:
Offering a comparison between two training datasets, this paper considers the role of ‘interestingness’ as an empirical quality sought after by machine vision researchers. In such cases, the search for interestingness leads researchers to design elaborate ways to define, categorize, and quantify it.
Long abstract:
In 2012, the KITTI Vision Benchmark Suite was launched, a training dataset used to compare real-world benchmarks useful for the development of autonomous vehicles. Funded through a collaboration between the Karlsruhe Institute of Technology (KIT) in Germany and the Toyota Technological Institute at Chicago (TTI-C) in the USA – hence KIT-TI – the Vision Benchmark Suite provided the foundation for the early ‘benchmark era’ of autonomous driving in the 2010s. Seven years later in 2019, Google/Alphabet’s autonomous vehicle division launched the Waymo Open Dataset, indebted to KITTI and other such open-source benchmark projects, establishing a new ‘incrementalist’ phase of autonomous vehicle development. Tied to annual iterations of their Open Dataset Challenges, Waymo published updates to the dataset in 2021 and 2022, adding unrivalled 'domain diversity' to their offering. Together, both dataset and challenge constitute Waymo’s vision to ‘platformize’ autonomous driving, mobilizing open data initiatives and logics as the basis for commercial development, locking prospective users into their plug-and-play machine learning (ML) stack. Offering a comparison between these two training datasets, representative of different phases in the development of autonomous vehicles, this paper considers the role of ‘interestingness’ as an empirical quality sought after by machine vision researchers in the compilation of such training datasets. In these cases, the search for interestingness leads researchers to design and test ever-more elaborate ways to define the kinds of scenes, situations and scenarios captured in the training datasets themselves, resulting in the quantification of interestingness as an increasing degree of interaction between agents.