- Convenors: Anna Weichselbraun (University of Vienna), Michael Castelle (University of Warwick), Siri Lamoureaux (University of Siegen)
- Chair: Siri Lamoureaux (University of Siegen)
- Format: Traditional Open Panel
- Location: HG-05A24
- Sessions: Friday 19 July, -, -
Time zone: Europe/Amsterdam
Short Abstract:
This panel convenes linguistic anthropology and STS to consider how large language models (LLMs) emerge from language-ideological and material-semiotic practices. We invite papers that contribute to better understanding the constitutive work of LLMs at multiple stages and via multiple stakeholders.
Long Abstract:
This panel brings together linguistic anthropology and STS to consider how large language models (LLMs) are transforming both the language sciences (linguistics, computational linguistics, NLP) and technosocial practices. With the release of OpenAI’s ChatGPT in 2022, discussions have focused on what its uncannily human-like generated text means for politics, education, knowledge production, authorship, sociality, care, etc. Although social scientists have contributed to critiques of LLMs concerning, e.g., the reproduction of biases in datasets, minority representation, and energy consumption, much of this work now takes place in the computing sciences and/or industry. Few of these internalist critiques, however, center language as a form of social and cultural action, the purview of linguistic anthropology. This panel addresses this gap, encouraging both linguistic anthropologists and scholars of the language sciences to interrogate the construction, development, and imaginaries surrounding LLMs.
Linguistic anthropology challenges Enlightenment notions of “language” and “representation” prevalent in the computational and social sciences, and instead emphasizes the situated, pragmatic and indexical functions of language. While interested in the technological mediation of language, it has largely overlooked the transformation of the concept of language by computational linguists, scholars in NLP, and the designers of programming “languages”. In turn, posthumanist trends in STS have favored a “material-semiotics” set in opposition to language-as-representation — a contrast actively dissolved by LLMs. Both fields could contribute to new understandings of LLMs.
These papers contribute to understanding the constitutive work of LLMs on questions of (1) language policy and governance (e.g., how do policymakers understand LLMs? What language ideologies motivate the efforts to police LLMs?); (2) R&D in practice (what are the implicit or explicit language ideologies of individuals, professions, or companies developing large language models?); and (3) users and implementation (with what expectations do users encounter and interact with LLMs? What do these reveal about language practices?).
Accepted papers:
Session 1: Friday 19 July, 2024, -
Short abstract:
This paper provides an overview of the fundamental relevance of the field of linguistic anthropology to the understanding and/or critique of Large Language Models, clarifying aspects of current debates about LLMs within NLP research as well as among those in the social sciences and humanities.
Long abstract:
At the core of the controversy surrounding Large Language Models (LLMs) are, on the one hand, their implicit rejection of influential theories in mainstream linguistics and cognitive science, and on the other, their unconscious adoption of interactional paradigms — such as the overt dialogicality of the “instruction-tuned” ChatGPT — championed more frequently in the humanities and social sciences as fundamental to sense-making. Indeed, many computer scientists in contemporary NLP do not find it necessary to concern themselves with the wide variety of past or present theories of language and learning. However, a prominent and arguably misguided assumption within the AI research community holds that applying increasing scale to these models’ training data, training time, and/or architectural size is likely to lead to the achievement of superhuman intelligence; this perspective, like many individualist approaches to cognition, necessarily downplays the role of indexical embodiment, social interaction, and contextually reflexive cultural practices in already-sociotechnical human communication. I will argue that a better understanding of the field known as linguistic anthropology can illuminate both the current and future successes and failure modes of LLMs, and can help social scientists and humanists avoid some common but misguided avenues of critique. From century-old works of American anthropological linguistics to the more contemporary insights of Michael Silverstein’s theories of pragmatics and metapragmatics, this resolutely empirical — but semiotically and ethnographically well-grounded — school of thought provides surprising insights into both the intriguing strengths and fundamental limits of these computational artifacts.
Short abstract:
This paper studies the language ideologies of AI research through ethnographic fieldwork in San Francisco. Here, LLM-based conversational agents invoke the ideologies of the interview society. These ideologies inform LLMs’ perceived capacities and authority yet pose certain hazards for their use.
Long abstract:
This paper examines the language ideologies motivating generative AI development in the San Francisco Bay Area. To do so, it offers two case studies from ongoing ethnographic fieldwork among AI researchers in the region. The first case comes from the subfield of AI Safety, which seeks to “align” AI models with so-called human values. Here, researchers employ LLMs as “conversational agents” that supposedly discern human interlocutors’ underlying values through deliberative interaction. The second case comes from researchers’ use of LLMs in their personal lives. It describes efforts to fine-tune models like OpenAI’s ChatGPT with transcripts from discussions about topics like AI’s societal implications, creating purported conversational experts on those topics.
Both cases exemplify the use of LLMs as technologies that collect conversational data for interpretation through abduction. In such an application, LLMs employ the interactional and epistemic techniques of the “interview society” as described by Atkinson, Silverman, and, later, Briggs. Here, interviews offer privileged and authoritative access to knowledge that is otherwise hidden—especially knowledge about persons. To do so, interviews invoke Liberal and Romantic language ideologies about public reason, inner expression, and authenticity. In conversational agents, these ideologies now inform the perceived capacities and authority of LLMs. Yet interviews are always partial and positioned, posing hazards for LLM applications. These hazards are already faced by social researchers engaging in interview methodologies. By approaching interviews as an interactional form common to LLMs and social researchers alike, this paper also raises important reflexive questions for scholars of AI.
Short abstract:
This paper draws on ethnographic research with language workers involved in producing LLMs and other AI technologies in Amman, Jordan to analyze metaphors—and the language ideologies animating them—as crucial material-semiotic practices for rendering Arabic an object of technological advancement.
Long abstract:
In May 2023, OpenAI CEO Sam Altman spoke at the Xpand Technology Conference in Amman, Jordan about artificial intelligence and large language models (LLMs). In his opening remarks, tech mogul and moderator Fouad Jeryes emphasized the significance of the event happening in Jordan: “We remain a powerhouse here for tech…we are the creators of the overwhelming majority of [Arabic] content on the Internet.”
Discourses about Jordan’s inordinate production of digital content have circulated for over a decade but have taken on greater importance with the rise of LLMs and other AI-enabled technologies built on massive language corpora. Today, language workers in Jordan’s tech sector—a historically Anglocentric industry—not only accumulate socio-economic capital through their Arabic competencies; through everyday labor and discursive practices, they craft Arabic into a data-rich language of technological advancement.
This paper draws on a year of ethnographic research with language workers who help build and maintain LLMs in Amman’s tech sector—e.g., annotators, proofreaders, lexicographers—to understand the constitutive work of metaphor in constructing language technologies. Drawing on semi-structured interviews and participant observation, it analyzes the language ideologies that animate these metaphors and the mobilization of metaphor—successful or not—to translate technical concepts grounded in Anglocentric assumptions of how language ought to work. Bringing together STS and anthropological scholarship on metaphor (DeLoughrey 2013), language ideologies (Bauman and Briggs 2003), and language work (Orr 1996), this paper centers metaphor as a crucial material-semiotic practice for producing complex sociotechnical systems like LLMs, especially in linguistic contexts and political economies outside the Global North.
Short abstract:
This paper examines research on language and culture in computational linguistics in order to understand and theorize the field’s critique of itself. It further aims to characterize the language ideological assumptions that motivate the construction and application of the benchmark tests used to evaluate LLMs.
Long abstract:
How do the computational linguists and computer scientists who develop LLMs understand language and culture? In this paper, we examine research on language and culture in the field of LLMs to understand how the field critiques itself.
We postulate that the first round of critique, aimed at supervised machine learning classifiers, was the discovery of “bias,” and that the response to this discovery was “balancing the training sets” (Garrido-Muñoz et al. 2021, Shah et al. 2020). In the current era of machine learning, the critique is aimed at unsupervised models reinforced with human feedback, which present emergent qualities: while the mode of interaction is predetermined, the range of outputs is not. Previous NLP benchmarks that measure a model’s use of formal attributes and how closely it can imitate natural language, such as GLUE (which measures a model’s ability to answer questions, detect sentiment, infer, and perform other generalizable tasks) and MAUVE (which measures how close machine-generated text is to human language), are no longer sufficient. Broader access to LLMs has made testing on human benchmarks such as the SAT and the bar exam, and on cultural alignment tests such as the Hofstede Culture Survey (Yong Cao et al. 2023), popular for probing the quality of models (both for marketing purposes and in research papers).
This paper examines standardized testing as a benchmark for machines in order to (1) explore the underlying language ideological assumptions of LLM developers, which (2) inform how they understand and critique the production of meaning in synthetic text.
Short abstract:
The paper explores how historical opposition between deep structure and surface statistics in linguistics has organised understanding of the relationship between language and meaning. As LLMs today struggle to align with human norms, revisiting these debates can clarify the aims of machine training.
Long abstract:
Large Language Models produce sequences learned as statistical patterns from large corpora. In order not to reproduce corpus biases, models must, after initial training, be aligned with human values, preferencing certain continuations over others. This supplementary process can be viewed as the superimposition of normative structure onto a statistical model. We examine one practice of this structuration in how ChatGPT4 redacts and interprets fragments of Joyce’s Ulysses, a text that deliberately contravenes literary norms. We demonstrate that, although the model observes the form of the text, its idiosyncrasies and ‘literariness’ are smoothed over in the model’s rearticulation. We then situate this alignment problem historically, revisiting earlier postwar linguistic debates which counterposed two views of meaning: as discrete structures, and as continuous probability distributions. We discuss the largely occluded work of the Moscow Linguistic School, which sought to reconcile this opposition by studying language as a communicative system whose elements are both coordinated relationally (as structuralism argued) and occur with differential frequency, according to extra-linguistic social norms (as speech act and information theory suggested). Our attention to the Moscow School and later related arguments by Searle and Kristeva casts the problem of alignment in a new light: as one involving attention to the social structuration of linguistic practice, including the structuration of anomalies that, like the Joycean text, exist in defiance of expressive conventions. These debates around the communicative orientation toward language can help explain some of the contemporary behaviours and interdependencies that take place between users and LLMs.
Short abstract:
We use tools and insights from ethnomethodology and conversation analysis to document and understand how people come to treat LLM-based interactive interfaces as ‘knowledgeable’ or even ‘intelligent’, and how they iteratively refine prompts to coax text generators towards desired responses.
Long abstract:
The unprecedented spread of large language models provides us with what is possibly the greatest natural experiment in human sense-making since the sociological breaching experiments of Garfinkel. Garfinkel studied how people in interaction respond to things like preset phrases presented according to a randomized metric; responses specifically designed to conceal a lack of understanding; and statements that blatantly contradict the evidence before their own eyes. He found that people are willing to go to great lengths to provide a commonsense interpretation of the talk they were exposed to. This work revealed that people bring practical methods for sense-making to just about any interactively presented material. Against this background, it is unsurprising that people have been quite impressed by large language models that generate statistically plausible continuations, fine-tuned to conform to human ratings of ‘helpfulness’ and ‘authoritativeness’.
Here we plan to bring the analytical tools of ethnomethodology and conversation analysis to bear on the study of how people make sense of interactive interfaces, particularly those of text-based language models. We present early results of an observational, qualitative, sequential analysis of records of human-LLM interactions. This represents a fresh take in an area where automated metrics and large-scale quantitative analyses reign supreme. We aim to document how people come to treat text-based interfaces as ‘knowledgeable’ or even ‘intelligent’, and how they iteratively refine prompts to coax text generators towards desired responses.
Short abstract:
Unlike other kinds of speakers for whom verbal disfluencies are seen as evidence of being morally or cognitively deficient, chatbots seem to project a sense of subjective depth for some users because of - and not in spite of - the fact that they are extremely flawed interactional partners.
Long abstract:
Large language models excel in the reproduction of genres and generic texts (Gershon 2023), yet frequently produce semantically incoherent ones. This asymmetry between the chatbots’ incredible facility for syntax, genre, and combination and their lack of human-like semantics has created a recurring dynamic in which that partial incoherence seems to encourage some users to search for hidden personas underneath or within the chatbot. That is, the bots’ facility with structural cotextuality (Silverstein 1997) coupled with their lack of contextually-deployed sense categories has left some users searching for a “real” speaking voice. This asymmetry in capacities is exploited in what people refer to as jailbreaking chatbots, in which users try to prompt the chatbots into revealing a “truer” persona by circumventing some of the safeguards built into the bots by their engineers. The search for subjectivity in one’s speaking partner has a much longer history than the few years or months that people have been interacting with chatbots (see Peters 1999). Using arguments from religious studies (Johnson 2021), I argue that the search for a true chatbot persona is just the technologized form of a much broader search for a hidden agent, as when participants in religious events try to uncover the soul inside a subject or the god possessing a speaker. The trick with chatbots is how quickly and easily they seem to project that sense of subjective depth because of - and not in spite of - the fact that they are extremely flawed interactional partners.
Short abstract:
This paper outlines the language ideological work necessary to produce and understand something as “consensus,” both for the developers of LLM-based platforms for promoting democratic participation and for their users.
Long abstract:
Democracy is among the latest domains of AI experimentation. Both developers and policymakers see AI as a way not only to scale up citizen participation, but also to “improve” democracy by finding consensus within even seemingly polarized opinions. One commonly used tool is the open-source platform Polis, created by The Computational Democracy Project, an American non-profit organization. Participants in projects using Polis can submit their own comments in response to an open-ended prompt and rank others’ comments. Although Polis’s algorithms are currently language-agnostic, developers have begun to try to integrate LLMs, with the hope, among other things, of being able to find consensus more quickly by generating statements that the system suspects a majority of participants will agree with and by predicting how participants would vote on comments they have not seen.
This paper outlines the language ideological work necessary to produce and understand something as “consensus.” I ask what consensus is for the individuals and organizations behind Polis and similar platforms as they move towards adopting LLMs, and what the expectations are of organizations in the Netherlands experimenting with these tools. Preliminary research suggests that Polis’s promise of being able to find consensus where humans cannot is undermined because what it produces is not then recognized as consensus by its users.