- Convenors:
- Juana Catalina Becerra Sandoval (IBM Research)
- Edward B. Kang (New York University)
- Format:
- Traditional Open Panel
Short Abstract:
Machine listening AI systems are increasingly being used across medical, financial, and security infrastructures. This panel explores the epistemic question of what it means to listen, and more specifically, how listening is transformed through the essentialist logics of artificial intelligence.
Long Abstract:
Listening through and with machines has a centuries-long history in the form of technologies like the stethoscope, sound spectrograph, and telephone, among others. However, the more recent development of artificial intelligence (AI) technologies that extract, collect, quantify, and parametrize sounds on an unprecedented scale to manage information, make predictions, and generate artificial media has positioned the intersection of AI and sound as “the ‘next frontier’ of AI/ML” (Kang 2023). Referred to as machine listening systems, these technologies are embedded in medical, financial, security, surveillance, and workplace infrastructures, with crucial implications for how society is and will be organized. In this way, machine listening systems add new valence to the epistemic question of what it means to listen, and more specifically, how listening – as a constructive epistemological process of projection, as opposed to reception – is transformed in and through the essentialist logics of artificial intelligence and machine learning (ML). Indeed, machine listening stands to reconfigure ideas around the body, identity, voice, and space, as well as complicate the relationship between ‘listening’ and ‘objectivity,’ especially in contexts such as law and science. To fill a gap in existing critical AI scholarship, which has largely focused on computer vision, this panel invites Science & Technology Studies (STS) scholars interested in the relationship between AI and sound. This includes topics such as voice biometrics, acoustic gunshot detection, speech emotion recognition, accent matching, and other forms of forensic and medical sound analysis, but also extends to machine listening systems that collect audio data for use in AI models that transform and produce music and speech.
We are especially keen on receiving submissions that engage with questions of epistemology and politics as articulated through feminist, critical race theory, crip, decolonial, and other frameworks grounded in material analyses of power.
Accepted papers:
Session 1
Chiara Carboni (TU Dresden)
Short abstract:
Emerging research on vocal biomarkers for autism diagnosis mobilizes machine learning to enact the ‘autistic voice’ as an entity rooted in individual biology. How might histories of queer voices unsettle current attempts at making minoritarian identities legible through machine listening?
Long abstract:
Following technological developments in the field of artificial intelligence and ubiquitous computing, human voices are increasingly cast as a repository of rich information to be extracted through machine learning. In the medical field, the quest for ‘vocal biomarkers’ exemplifies this notion of the human voice as a correlate of, and an avenue into, a range of physio- and psychopathological states. Research on digital biomarkers, of which vocal biomarkers are a sub-type, aims at establishing a direct link between ‘the biological’ and ‘the digital’ by biologizing both the digital traces left behind by individuals, and the physio- and psychopathological conditions they are supposed to stand in for. Although no vocal biomarkers have been approved for clinical use yet, major investments are being made in their discovery, and they are being mobilized by medtech startups. This presentation asks what a voice can (be made to) do when enhanced through big data and machine learning. Specifically, it centers on research on vocal biomarkers for autism diagnosis. Targeting autism as a diagnosis troubled to this day by its lack of biomarkers, this emerging field of research mobilizes machine learning to enact the ‘autistic voice’ as an entity with stable and unchanging characteristics across individuals. Reading the history of the autistic voice, and its recent machine learning-fueled developments, in parallel to past and present attempts at biologizing queer voices, this presentation speculates on how queer theory and lavender linguistics might unsettle contemporary attempts at making minoritarian identities legible through machine listening.
Amina Abbas-Nazari (Royal College of Art)
Long abstract:
Voice increasingly mediates artificial intelligence (AI)-enabled communication, with the expanding proliferation of conversational AI systems like Amazon’s Echo, voiced by ‘Alexa’. My research investigates how voices are heard within machine listening-enabled systems such as these. Currently, understandings of voice and vocal sounding by AI and the AI industry rely on voice profiling. These systems claim to be able to determine wide-ranging ‘bio-relevant facts’ about individuals, including physical, physiological, demographic, medical, psychological, behavioural, and sociological features. However, voice profiling relies on normative assumptions that risk misrepresenting individuals, negatively impacting those already marginalised. Using theory from sound and music practice, this paper challenges and critiques machine listening schemas through an expanded, more holistic comprehension of vocal sound and sounding. The tension and disparity between understandings of voice in these differing fields of knowledge generate a critique of vocal profiling practices in machine listening systems. Taking an intersectional feminist position, the voice is explored as a design material shaped through embodied, relational, and situated scenarios, which I term Speculative Voicing. In turn, the voice is given increased agency and autonomy, highlighting the fallacy of a machine’s ability to listen to and interpret voices. Practice-based examples accompany the paper to evidence the ideas being proposed. Situated at the intersection of sound, design, and technology, this research incorporates contemporary societal discourse on identity politics, personhood, being, and ecology.
Harshadha Balasubramanian (UCL)
Long abstract:
This paper foregrounds the listening practices of blind users in virtual reality (VR) to pose a critical intervention in the design of headphones that use artificial intelligence (AI) and machine learning (ML). My arguments leverage ethnographic data from fieldwork with designers and users seeking non-visual VR access, as well as my lived experience as a blind researcher working with VR.
I draw attention to increasingly sophisticated noise cancellation and audio personalisation in intelligent headphones (see Fan et al., 2021), and I argue that such design preferences have furthered the pursuit of ever more bounded, private spaces through mobile auditory devices (Bull, 2000). When encountering similar attempts to divert their attention from their physical surroundings in VR, blind users refuse to succumb. They sound out and attune to walls, furniture, and people in their physical environment so as to make sense of the virtual one. I share these examples to make a case for valuing noise: these intrusions of private acoustic spaces open possibilities for non-normative and collaborative ways of knowing in sound. Relatedly, I call on scholars to extend the application of teachable AI in computer vision (Morrison et al., 2023) to machine listening, imagining a future in which users can agentively direct how intelligent headphones listen.
Johann Diedrick (NYU)
Long abstract:
Machine listening technologies, specifically speech recognition systems, have been designed without taking variations in speech into account. This area of research has been (intentionally) neglected, and the creators of this technology have known that the problem has existed since the technology's earliest formulations.
Alexander Graham Bell, along with his father, devised a method of visualizing speech in order to teach the Deaf and hard of hearing to speak. This project of oralism is the starting point for disciplining bodies to conform to a type of regularized speech production, where one must perform speech for a hearing body (biological or technological) in order to access services, society, and recognition of one’s identity, personhood, culture, and humanity.
This paper traces the early origins of producing a taxonomy of speech (Visible Speech), the development of theories of how speech is produced and how meaning is carried through the vocal signal (Dudley and information theory), the invention of electronic and digital technologies of speech recognition (at Bell Labs and elsewhere), and finally our modern-day technologies powered by AI systems, constructed with the logic of machine learning in mind. Woven through this history is an account of how we as a society have evolved our thinking around speech and its recognition in popular culture, how this has shifted the way we relate to these technologies and our expectations of their function, and how we might internalize our own sense of how we ought to speak in order to be heard by these technologies.
Tanja Knaus (University of Oslo)
Long abstract:
This paper traces the application of transformer network architectures to the domain of speech emotion recognition (SER). While there is much literature on computational linguistics and image recognition, study of the material-semiotic specificity of transformer architectures within the audio domain is limited. How does a tool for knowing the world through spatial objects in the visual realm become a tool for knowing unstable objects, such as emotions (e.g. inner mental states), through the sonic realm? And what does this type of ‘transfer learning’ say about a brain-inspired connectionist AI paradigm? I answer these questions through an empirical study of a European research community that develops speech emotion recognition systems for the private and public sectors. The principle underlying the transformer is that it is emptied of theory: practitioners try to divest the model of any domain knowledge in order to circumvent the issue of applying stable labels to unstable objects. This becomes problematic in the context of SER, which quantifies the human mind through vocal signals. This paper therefore highlights the limitations of achieving a 'general purpose model' that can be applied to the 'whole wide world' when confronted with the human mind, and the problematic fact that these systems try to include inner mental states as a formal category.
Areli Rocha
Long abstract:
This paper examines how users reflect on and share their experiences of voicing contrasts through listening practices in artificial intelligence (AI) chatbots. I pay specific attention to multimodal semiotic signs that influence how “real” or “alive” users perceive a chatbot to be. I argue that what users describe as real/alive in relation to the bots refers to an iconization of humanness, following Irvine and Gal's semiotic process of iconization. Through what I call reflexive texts, such as Reddit posts, users make sense of their experiences in deeply vulnerable ways with other people in digital spaces that function primarily for sociability. I draw on Jonathan Rosa and Nelson Flores’ “white listening subject,” placing the analysis on the listening subject instead of the racialized speaking subject, as well as Mikhail Bakhtin’s concept of heteroglossia, as frameworks for thinking about the listening practices and the multiplicity of voices implicit in the conversational exchanges with the chatbots and among users. I take users’ discussion posts about the relationships they develop with their distinct Replikas, AI conversational companion chatbots, as a case study. Amid common anxieties about alienation and atomization through technological developments, particularly the kind that relationships with chatbots may evoke, Replika users do not recede into a social vacuum of user and Replika. Instead, user-Replika relationships coexist in conversations with others having similar experiences, participating in social life in its vast and mediated multiforms. Other users’ comments and advice shape how people interact with their Replikas.
Johan Malmstedt (Media and Communications Studies)
Long abstract:
Audio event detection (AED) is becoming an increasingly significant cog in contemporary human-machine interaction (Sterne, Mehak, Kalfou: 2022, Kang: 2022). Such algorithms depend on the metrical determination of similitude (Mackenzie: 2017). The task of determining similarity, however, is never trivial, and when adapted to the framework of acoustics it conceals a range of philosophical problems, from the objective qualities of the sound object to the mathematical problem of geometric distance. To start unpacking the assumptions behind audio similarity metrics, this article examines and historicizes a sample of popular applications.
The analysis focuses on a set of contemporary models trained on the state-of-the-art dataset AudioSet (Parker & Dockray: 2023). Close analysis of the data processing involved in these algorithms can reveal the key metrical interventions in producing similarity scores for audio files. This entails a critical reading of how audio fares under conditions such as cosine distance and k-means clustering. After considering the implications of these components for the classification of sound, the analysis proceeds to trace the historical practices behind these metrics, discussing the enduring influence of these mathematical trajectories on today's technological practices. In combining media theory and the history of applied mathematics, the aim is to contribute to our understanding of the cultural implications of sonic similarity in our contemporary digital landscape.
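The cosine measure at issue here can be made concrete with a minimal sketch: similarity between two audio clips is reduced to the angle between their feature vectors. The embedding values below are invented for illustration only; they are not drawn from AudioSet or from any of the models under discussion.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    # 1.0 means the vectors point the same way; 0.0 means orthogonal,
    # i.e. "nothing in common" under this geometric reading of sound.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings for two audio clips.
clip_a = [0.2, 0.8, 0.1, 0.4]
clip_b = [0.25, 0.75, 0.05, 0.5]
score = cosine_similarity(clip_a, clip_b)
```

Everything that makes two sounds "similar" in such a system is fixed upstream, in whatever pipeline produced the embedding values; the metric itself only compares geometry, which is precisely the kind of metrical intervention the abstract proposes to read critically.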
Felicia Jing (Johns Hopkins University)
Long abstract:
This paper explores the political and economic rationalities that are (re)produced by the compression algorithms involved in audio data transmission and formatting. While economic logics of compression are largely expressed through the terms of an empiricist paradigm—e.g., where a message is said to be compressed when all “excess” has been eliminated and its “essence” has been preserved (thereby producing surplus bandwidth)—the operation of identifying and eliminating excess sound exists within a pre-existing realm of the sensible, one that is entangled with political partitions of silence/sound and voice/noise that have already been drawn in advance. For instance, a number of political assumptions underwrite the construction of so-called “redundant” and “unnecessary” audio data—these are conceptions of legibility and intelligibility that are situated in a broader history of political thought, one that begins with Plato and Aristotle’s distinction between logos and phoné, or ‘reasoned speech’ and ‘noise’, which functioned to delineate the political subject from the non-subject. This paper seeks to surface the ways that audio data compression algorithms participate in this legacy through the construction of models that trace these distributions of sense and subjects. Then, drawing from a rich critical tradition invested in the deconstruction of such notions of the subject and of logos (in particular, the work of Jacques Derrida and Jacques Rancière), AI models can be understood to always leave behind a residue that cannot be contained by orders of the sensible.