Musical Similarity: The Case of an Impossible Ground Truth

Conference

EASST2026

Krakow, Poland

7 – 11 Sep 2026

Accepted Contribution

Allison Jerzak (University of California, Berkeley)

Short abstract

I use a historical case study to probe how researchers established a ground truth for musical similarity, a contested and subjective concept. Their strategies gave researchers evaluating other cultural domains a proof of concept. I argue that history plays a vital role in algorithmic explainability.

Long abstract

Ground truths provide the basis for evaluating computational systems. In the musical case, however, evaluation has proven an abiding challenge. “Subjective evaluations are somewhat unreliable…objective evaluation is also problematic, because of the choice of a ground truth to compare the measure to,” observed Sony Music researchers Jean-Julien Aucouturier and François Pachet in a 2004 survey of musical querying methods. Moreover, they readily admitted that the very concepts they were trying to evaluate (usually musical similarity or genre) were “ill-defined,” not readily measurable, and unconducive to consensus (Aucouturier and Pachet, 2003; Pampalk, 2003). Nevertheless, researchers found ways to claim model validation.

This paper examines several experiments by Aucouturier and Pachet (2000, 2002, 2004) on computing musical similarity for music recommendation. This historical case is instructive: early computational systems were smaller and often supervised, and evaluation standards were still developing, which led Aucouturier and Pachet to discuss their intuitions explicitly. They evaluated their system by comparing “similar” song pairs generated by their signal processing algorithm against those songs’ textual metadata. The problem: most returned pairs reinforced existing cultural knowledge. The optimal results were the “interesting” ones, in which the algorithm perceived similarity between unexpected songs. Yet the distinction between an “interesting” result and an incorrect one was never articulated: researchers relied on their own judgement. Ultimately, I show that a historical inquiry into musical evaluation demonstrates how imbricated aesthetic and technical questions became on the early internet, a legacy that can still be excavated today. In doing so, I offer a new methodological approach to algorithmic explainability: history-as-method.

Combined Format Open Panel CF14
Ground truths and the epistemology of AI
Session 3
