Demystifying Machine Learning: looking critically at your dataset
Presentation
We might think of machine learning as an objective and politically neutral tool, but in truth algorithmic training has a huge influence on matters of social, gender and racial justice. Consider, for instance, these four disparate but telling examples.
Case 1. In March 2020, a group from Duke University in North Carolina published a paper [1] presenting a new approach to the super-resolution problem, i.e. the long-standing problem of increasing the resolution of a blurry image. Denis Malimonov, coder and populariser of machine learning in art, used their findings to develop Face-Depixelizer [2], which takes a blurry image and outputs a high-resolution version of it. Malimonov shared the code and advertised it on Twitter. The user @Chicken3gg then replied with a screenshot of a pixelated Barack Obama next to its high-resolution version: a Barack Obama with white skin and light eyes [3]. The first African-American president of the United States had been whitewashed by an artificial intelligence algorithm, a sort of digital skin-lightening cream. And just as those creams are harmful to health [4], machine learning here appears to be racist in its output.
Case 2. In 2014, Amazon’s engineers attempted to build an automated tool for pre-selecting applicants’ CVs [5]. Only a year later they discovered that their experimental tool was sexist: it was not rating applicants for software developer roles in a gender-neutral way, but penalised candidates whose CVs contained the word “women”. In 2016, researchers denounced [6] word embeddings, a technique widely used to help machines understand human text, finding that the machine was sexist and perpetuated binary gender stereotypes. Although the embedding established a (lexical) connection between ‘queen’ and ‘female’, it also connected the word ‘woman’ to ‘nurse’ and ‘man’ to ‘doctor’.
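To see how such associations can be surfaced in practice, here is a minimal sketch of probing a pretrained word embedding with vector arithmetic; the gensim library and the word2vec-google-news-300 vectors are assumptions chosen for illustration, not the exact setup used by the authors of [6].

```python
# Probing gender associations in pretrained word embeddings.
# Illustrative sketch only; the model name and library are assumptions,
# not the exact setup used by the authors of [6].
import gensim.downloader as api

# Load pretrained word vectors (large download on first use).
vectors = api.load("word2vec-google-news-300")

# Classic analogy arithmetic: vec("king") - vec("man") + vec("woman") ~ vec("queen")
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# The same arithmetic can surface learnt stereotypes:
# "doctor" - "man" + "woman" may rank "nurse" highly.
print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=3))
```

The point is that the embedding has learnt these associations purely from the statistics of the text it was trained on.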
Case 3. In 2016, Microsoft launched Tay, a bot powered by machine learning that was designed to learn how to engage in conversations by tweeting with other users. People could tweet Tay and she would tweet back. The same day Tay went online, a controversial group of users started interacting with her and feeding her politically incorrect text. Tay was designed to learn from human interactions, and she quickly began releasing sexist, racist, sexually explicit and even neo-Nazi tweets; she even tweeted in support of Trump. Microsoft apologised [7] and Tay tweeted that she was going to sleep since she needed a rest. She has stayed offline ever since. All of this happened in less than 24 hours.
Case 4. In the same year, ProPublica analysed the outcomes of Northpointe’s commercial algorithm for assessing a criminal defendant’s likelihood of re-offending [8]. By 2016, judges, probation and parole officers throughout the United States were increasingly using this type of risk-assessment algorithm. ProPublica compared the predictions from COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) with how people actually behaved over a two-year period and found that the algorithm was racist against African-American defendants: black defendants who did not re-offend were nearly twice as likely as white defendants to be misclassified as higher risk.
Quite a few things seem to have gone wrong when machine learning algorithms were asked to make decisions about humans. But why was that, and how can it be avoided?
Ingredients
Machine: computer.
Algorithm: a finite sequence of unambiguous, computer-implementable instructions for solving a specific task.
Data: information on objects, persons or events, collected through observation. It may already be in numerical form or be converted into it (e.g. an image or a video can be processed by a machine once translated into a multidimensional numerical array; see the short sketch after this list).
Machine Learning: the field of study that gives computers the ability to learn without being explicitly programmed [9] and to improve automatically through experience [10]. It is a powerful approach since it learns from data and solves complicated tasks without requiring an explicit solution, which makes it especially useful when the solution is unknown, too difficult, or too long to be implemented by a human.
Artificial Intelligence: the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.
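To make the Data ingredient concrete, here is a tiny, hypothetical sketch (the file name is invented) of how a photo becomes a numerical array that a machine can process:

```python
# An image is just numbers to a machine: a 3-dimensional array
# of height x width x colour channels. The file name is hypothetical.
from PIL import Image
import numpy as np

img = np.array(Image.open("photo.jpg"))   # e.g. shape (480, 640, 3)
print(img.shape, img.dtype)               # rows, columns, RGB channels; uint8 values 0-255
```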
Preparation
Let’s look at the keyword machine learning (ML). An ML model is designed by a human, who chooses its skeleton. Before seeing data, the model is neutral, uninformed, uninstructed. During the training phase, data passes through the model and the model’s adjustable parameters are continuously updated until it can give the desired outputs. The model has learnt. After training, anything the model knows comes from the data we gave it, which makes clear how crucial the training data is. The trained model can mix together parts (features) learnt from the data to create the best output it can, but it cannot in any way generate new features.
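As a minimal sketch of that training loop, entirely hypothetical and unrelated to any system discussed here, consider fitting a straight line to a handful of points with gradient descent: the parameters start out uninformed and end up encoding only what the training data contained.

```python
# Minimal illustration of "learning": a toy model whose parameters are
# adjusted until its outputs match the training data as closely as possible.
# Purely hypothetical example, unrelated to any system discussed above.
import numpy as np

# Training data: the ONLY source of knowledge for the model.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8])   # roughly y = 2x + 1

w, b = 0.0, 0.0          # adjustable parameters, "neutral" before training
learning_rate = 0.01

for step in range(2000):
    pred = w * x + b                      # model output
    error = pred - y                      # how wrong the model currently is
    # Nudge the parameters to reduce the error (gradient descent).
    w -= learning_rate * 2 * np.mean(error * x)
    b -= learning_rate * 2 * np.mean(error)

print(f"learnt parameters: w={w:.2f}, b={b:.2f}")
# Everything the model "knows" (w and b) was derived from x and y above.
```

If the training points had encoded a different relationship, the model would have learnt that instead: it has no way of knowing anything beyond its data.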
Nowadays ML models have become astonishingly good at mimicking human writing [11] or generating images of realistic people [12]. The word realistic is key: it means similar to the images given as a training set. On the other hand, the images generated by the model StyleGAN [12] will hardly ever be of black women, because only a small percentage of the images used for training were of black women [13]. It becomes clear, then, that Microsoft’s bot Tay was not neo-Nazi; it was the data that turned her into one. Similarly, Face-Depixelizer was not whitewashing Obama; its training set was made up mostly of white men, and it created a high-resolution image accordingly. We humans are contaminating machine learning models with our own biases and discriminatory views. Most of the people writing code are white cisgender straight men, and it is largely left to them to create unbiased, politically correct, non-discriminating datasets. Moreover, learning from data at best means learning from the present, and in many cases it means learning from the past. For us westerners, both our present and our past are mostly white, racist, sexist, straight and cisgendered. As we are mostly the ones writing ML models, there is precious little chance that they could be any different unless we take direct action.
Nevertheless, a few solutions have been proposed, along with tools to measure models’ biases [14]. How to make your model fair is definitely not the hottest topic in ML, and often only gender bias is addressed, but the results could be promising [15]. Measuring biases is undoubtedly an important step towards comparing models and building alternative solutions. Building awareness is just as important, however, since ML is already widely used in software that most people interact with every day. Treating it as a black box can give a false impression of impartiality, which can then be used to justify discrimination with meritocratic arguments.
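As an illustration of what measuring bias can mean in practice (this is only one of many possible checks, not the specific tool cited in [14]), one can compare error rates between groups, in the spirit of the ProPublica analysis of COMPAS [8]; all numbers below are invented.

```python
# Comparing false positive rates across two groups: a simple group-fairness
# check in the spirit of the ProPublica COMPAS analysis [8].
# All data below is invented for illustration.
import numpy as np

def false_positive_rate(y_true, y_pred):
    """Fraction of actual negatives that the model wrongly flags as positive."""
    negatives = (y_true == 0)
    return np.mean(y_pred[negatives] == 1)

# Hypothetical outcomes (1 = re-offended) and predictions (1 = "high risk").
y_true_a = np.array([0, 0, 0, 0, 1, 1, 0, 0])   # group A
y_pred_a = np.array([1, 0, 1, 1, 1, 1, 0, 1])
y_true_b = np.array([0, 0, 0, 0, 1, 1, 0, 0])   # group B
y_pred_b = np.array([0, 0, 1, 0, 1, 1, 0, 0])

fpr_a = false_positive_rate(y_true_a, y_pred_a)
fpr_b = false_positive_rate(y_true_b, y_pred_b)
print(f"FPR group A: {fpr_a:.2f}, FPR group B: {fpr_b:.2f}")
# A large gap means people who did not re-offend in one group are flagged
# "high risk" far more often than in the other -- the disparity found in [8].
```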
What is certain is that technologies with such a great impact on our everyday lives should not reproduce our own biases. Countermeasures are valuable, but the source of these biases remains. And as long as the people who write the code and build the datasets belong to the dominant system, there is no chance of our models being socially just. For this reason, it is imperative that we become aware that we are reproducing discrimination and relationships of domination, and that we act to show our ML models not the world we live in, but the one we want to build.
References
[2] https://github.com/tg-bomze/Face-Depixelizer
[3] https://twitter.com/Chicken3gg/status/1274314622447820801
[6] https://arxiv.org/pdf/1607.06520.pdf
[7] https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/
[8] https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[9] https://doi.org/10.1016/B978-0-12-809715-1.00012-2
[10] http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbook.html
[11] https://arxiv.org/pdf/2005.14165.pdf
[12] https://arxiv.org/pdf/1912.04958.pdf