Accepted Contribution

Expertise as Proxy: Stabilizing Uncertainty in Reinforcement Learning with Human Feedback  
Ishaan Pota (New York University)


Short abstract

Through interviews with expert data labelers and a socio-technical analysis of RLHF, this paper shows how expertise stabilizes uncertainty and legitimizes ground truths in non-convergent domains like the humanities. It also reveals the opacity of labor platforms when assigning expert credentials.

Long abstract

This paper examines the recent practice of hiring “experts” – defined by model developers and labor platforms as holders of Master’s and PhD degrees in relevant domains – to enhance model performance in the Reinforcement Learning with Human Feedback (RLHF) phase of LLM development. It combines a socio-technical analysis of the RLHF process with in-depth interviews with “expert” workers across different fields on platforms like SurgeAI and Outlier to understand how model developers conceive of expertise, and the assumptions underlying the technical infrastructure that attempts to “encode” expertise.

Performance improvements in RLHF rely on human reviewers converging on a single solution to a problem. While this works in STEM domains, where there is usually one correct answer, in fields like the social sciences and humanities that depend on debate, expertise is narrowed to mean fact retention rather than nuanced engagement. This epistemic weakness reaches its limits when “expert” logic is applied to creative fields, reducing creativity to credentials and foreclosing the potential for radical uncertainty.

Despite these limitations, the insistence on using “experts” across fields shows that expertise acts as a brittle legitimizing category for ground truths rather than a step toward model improvement. The contingent foundations of this ground truth are compounded by the erratic behavior of labor platforms, where the title of “expert” is granted opaquely and revoked arbitrarily, further undermining the epistemic foundation of the process. This paper highlights the limits of expertise in RLHF and raises questions about how best to encode knowledge in domains defined by uncertainty.

Combined Format Open Panel CB186
Ground truths and the epistemology of AI
  Session 2