Accepted paper:

Exploring the bias in de-biasing

Authors:

Doris Allhutter (Austrian Academy of Sciences)

Paper short abstract:

My paper analyzes practices of de-biasing in machine learning and natural language processing. It investigates the concept of bias that different de-biasing methods are based on and shows how differing ideas of gender bias and racial bias suggest solutions that vary widely in complexity.

Paper long abstract:

In the past two years, researchers in machine learning and natural language processing have put much effort into finding ways of removing gender bias and racial bias produced by classification learning algorithms and word embeddings (e.g. Berendt & Preibusch 2017; Bolukbasi et al. 2016; Caliskan et al. 2017). Computer scientists experiment with different solutions and contexts and are gaining deeper insight into how profoundly data (text, language and societal discourse) are gendered and entangled with racist stereotypes. However, when exploring methods for de-biasing, computer scientists are also actively taking part in the co-production of meanings surrounding social categories such as gender and race. From a technological point of view, de-biasing is not a trivial task, and the research done to avoid amplifying "human-like biases" seems to be a driver for improving machine learning as a whole. Yet the problem is by and large diagnosed as residing in society, with technical solutions expected to correct human shortcomings. My paper analyzes practices of de-biasing in machine learning and natural language processing. It investigates the concept of bias that different de-biasing methods are based on and shows how differing ideas of gender bias and racial bias suggest solutions that vary widely in complexity.

Berendt & Preibusch (2017). In Big Data 5(2), 135-152.
Bolukbasi et al. (2016). In Advances in Neural Information Processing Systems, 4349-4357.
Caliskan et al. (2017). In Science 356(6334), 183-186.

panel A27
The power of correlation and the promises of auto-management. On the epistemological and societal dimension of data-based algorithms