Accepted Paper:
Short abstract:
This paper traces the application of transformer network architectures to the domain of speech emotion recognition (SER). I aim to highlight the limitations of achieving a 'general purpose model' that can be applied to the 'whole wide world' when confronted with the human mind.
Long abstract:
This paper traces the application of transformer network architectures to the domain of speech emotion recognition (SER). While there is much literature on computational linguistics and image recognition, the study of the material-semiotic specificity of transformer architectures within the audio domain is limited. How does a tool for knowing the world through spatial objects in the visual realm become a tool for knowing unstable objects, such as emotions (e.g. inner mental states), through the sonic realm? And what does this type of ‘transfer learning’ say about a brain-inspired connectionist AI paradigm? I answer these questions through an empirical study of a European research community that develops speech emotion recognition systems for the private and public sectors. The principle that underlies the transformer is that it is emptied of theory. In other words, practitioners try to strip the model of any domain knowledge to circumvent the issue of applying stable labels to unstable objects. This becomes problematic in the context of SER, which quantifies the human mind through vocal signals. This paper, therefore, highlights the limitations of achieving a 'general purpose model' that can be applied to the 'whole wide world' when confronted with the human mind, and the problematic fact that these systems try to include inner mental states as a formal category.
Machine listening: dissonance and transformation
Session 1 Wednesday 17 July, 2024, -