Accepted Paper

Tapping objects into existence: The relevance of non-vocalized sounds for voice agents  
Damien RUDAZ (University of Copenhagen), Sara Merlino (University of Copenhagen), Brian Due, Joel Wester (University of Copenhagen)

Paper short abstract

After examining a recurring practice among blind individuals—using the sound of tapping on objects as both a summons and a spatial resource for co-present blind participants—we explore the situated relevance for multimodal voice agents of perceiving and responding to more than vocalized sounds.

Paper long abstract

Although paralinguistic features like pitch and speed are being explored in research on voice-based conversational agents, the broader soundscape of a spoken interaction is so far not taken into account by these devices as they generate their responses. Yet, human talk-in-interaction commonly indexes co-occurring non-vocalized sounds—such as the rumble of traffic, or the revealing noise an object makes when struck with one’s hand. We illustrate this emergent relevance of non-vocalized sounds by detailing a routinized practice produced by blind individuals: tapping on objects both as a summons and as a resource available to co-present blind participants in locating the object segmented and made relevant by this tapping. Then, turning to episodes of interaction between blind participants and multimodal voice agents, we examine how the design of these agents prevents users from efficiently mobilizing this repertoire of “tapping” methods they originally developed from and for human interaction. We show how users remedy this difficulty in securing a common referent through a step-by-step upgrading of their indexical practices. Building on this analysis, we shed light on a philosophy of language embedded in the data streams conversational agents are designed to process and respond to. Specifically, we argue that these agents enact a definition of language as stemming exclusively from the human vocal apparatus—and, for most current voice agents, as articulated speech.

Traditional Open Panel P136
Outlasting 'disruption': Empirical perspectives on practical reasoning with AI
  Session 1