From human values to collective values: what are we aligning AI with?

Accepted Paper:

David Moats (University of Helsinki), Minna Ruckenstein (University of Helsinki)

Short abstract:

This paper takes an empirical approach to the term ‘value alignment,’ which refers to attempts to align AI with human values. We analyse what understanding of values underlies these initiatives and contrast it with understandings of values from anthropology, STS and valuation studies.

Long abstract:

The term ‘value alignment,’ which refers to attempts to design Artificial Intelligence systems that are aligned with ‘human values,’ is becoming more popular as leading AI companies like OpenAI and Anthropic attempt to reassure the public of the safety of their models. But what do AI proponents mean by ‘values’?

This paper takes an empirical approach to the term ‘value alignment’ and asks what understanding of values is present in these attempts to make AI safe and trustworthy. Based on an analysis of press releases, blog posts and academic papers, we argue that those who use the term ‘value alignment’ are generally more concerned with existential risks in the future than with evident harms in the present.

From these initiatives we extract an understanding of values we call the ‘secret function,’ which sees values as held individually, driving action yet mysterious to the humans who hold them. Values are seen as nothing more than (or translatable into) a ‘utility function’ or a ‘target variable’ to be optimised for.

We contrast this understanding with work from anthropology, valuation studies and pragmatist philosophy, which sees values as collectively formed and negotiated over time, as resources for binding groups together and policing their boundaries. What would it mean if AI were aligned with this more social understanding of values?

Traditional Open Panel P287
Beyond value alignment: invoking, negotiating and implementing values in algorithmic systems
Session 1 Tuesday 16 July, 2024, 16:00-17:30
