Informed by ethnographic fieldwork within the machine learning and computer vision community, this talk explores the processes and justifications ("referential chains") behind neural image generation and processing to tease out what constitutes robust knowledge in this field and beyond.
Paper long abstract:
This talk draws on ethnographic fieldwork within the machine learning and computer vision community to investigate the processes and justifications ("referential chains") employed by technical actors in the creation of computer-generated images. It compares two advanced model families: diffusion transformers (e.g., Sora) and Neural Radiance Fields (NeRFs). These models are mediators that engage with the 'reality' inscribed in digital files in distinct ways, producing varied representations of visual data. Sora processes visual data as 'patches' within a probabilistic framework, composing and nesting semantically related scenes. NeRFs cast 'rays' to establish connections with spatially bound scenes, allowing for precise three-dimensional reconstructions. By comparing these two techniques, the talk unpacks the criteria for success in visual AI in order to advance our understanding of what constitutes robust knowledge in the epistemic culture of the computer vision community and beyond.
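To make the contrast concrete for readers outside computer vision, the minimal Python sketch below illustrates the two units of analysis named above: cutting an image into patch tokens (the kind of unit a diffusion transformer such as Sora models probabilistically) and generating per-pixel camera rays (the unit a NeRF integrates over to reconstruct a spatially bound 3D scene). The function names, patch size, and camera parameters here are illustrative assumptions, not part of either system's actual implementation.

```python
import numpy as np

def to_patches(image, patch=16):
    """Illustrative 'patch' view: split an image into non-overlapping
    patch tokens, the kind of unit a diffusion transformer models."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)
    return patches  # shape: (num_patches, patch, patch, channels)

def to_rays(height, width, focal, cam_to_world):
    """Illustrative 'ray' view: one camera ray per pixel, the unit a
    NeRF samples along to reconstruct a 3D scene."""
    i, j = np.meshgrid(np.arange(width), np.arange(height), indexing="xy")
    dirs = np.stack([(i - width * 0.5) / focal,
                     -(j - height * 0.5) / focal,
                     -np.ones_like(i, dtype=float)], axis=-1)   # camera-space directions
    rays_d = dirs @ cam_to_world[:3, :3].T                      # rotate into world space
    rays_o = np.broadcast_to(cam_to_world[:3, 3], rays_d.shape) # shared camera origin
    return rays_o, rays_d

if __name__ == "__main__":
    img = np.random.rand(64, 64, 3)
    print(to_patches(img).shape)                       # (16, 16, 16, 3)
    o, d = to_rays(64, 64, focal=50.0, cam_to_world=np.eye(4))
    print(o.shape, d.shape)                            # (64, 64, 3) (64, 64, 3)
```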