By Steve Kilpatrick
Co-Founder & Director
Artificial Intelligence & Machine Learning
Machine bias occurs when a machine learning process draws erroneous conclusions due to limitations in its data set. A data set can introduce machine bias when human interpretation and cognitive assessment have influenced it, causing it to reflect human biases. It can also introduce bias when problems with the collection or quality of the data lead to improper conclusions during the machine learning process. In this article, I discuss what machine bias looks like and how we can go about preventing and mitigating it.
In the past few years, words like machine learning and artificial intelligence have become ubiquitous in the media. Companies are scrambling to see how AI fits into their business models. Increasingly, consumers are also embracing AI in their homes and lives with the rise of virtual assistants and automation. But the algorithms behind machine intelligence are only as good as we make them, and often due to limitations with a data set, there is human bias in AI systems.
It’s tempting to think of machine intelligence as impartial and deterministic. However, that couldn’t be further from the truth, and there are all kinds of ethical questions surrounding AI and myriad ways a machine learning algorithm can be influenced. From the individual biases of the programmers who write the algorithms to the societal biases present in the datasets AIs are trained on, bias in machine learning is a real and present concern for companies developing AI solutions.
In order for AI to play a fair role in our society, we need to be careful about the ways we create AI and how it’s deployed. There have already been many notable examples of machine learning algorithms learning biased lessons from their creators and datasets. Thankfully, there’s also an emerging field of machine ethics working on solutions and best practices for creating and deploying ethical AI.
How AI Becomes Biased
Researchers have identified over 180 different biases that affect human judgment and decision making. Every day, our biases play a role in how we interact with the world. However, if we’re going to offer algorithms the chance to automate or augment our decision making, those algorithms will have to do better. Bias in AI could diminish trust between humans and machine intelligence when the AI comes up short.
IBM Research outlines two main ways artificial intelligence inherits biases. First, AI is software, and it can have errors and dependencies. When an underlying tool or algorithm under the AI’s hood has flaws, the AI will inherit those flaws. This includes flaws in the user experience and the ways humans interact with the AI.
Second, most AI relies on historical data to learn its task. Recognising speech, images, or relevant search results is something AI learns over the course of thousands or millions of data points that have been pre-labelled with the correct answer. Unfortunately, these data points must come from the real world and the real world is biased. It’s not uncommon for datasets to favor white men to the exclusion of women and minorities, for instance. In addition, richer individuals, cities, and countries tend to have more data to work with than poor areas. Any preferences in the dataset will eventually become part of the AI’s preferences. Algorithms are only as smart as the data you feed them.
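The mechanics of this are easy to see with toy numbers (the figures below are purely illustrative, not drawn from any real dataset): a model fitted to skewed historical hiring data will simply reproduce the skew.

```python
# Illustrative only: 150 male and 150 female applicants in the
# historical data, with very different past hiring rates.
history = [("m", 1)] * 90 + [("m", 0)] * 60 + [("f", 1)] * 10 + [("f", 0)] * 140

def historical_hire_rate(records, group):
    """Fraction of applicants in `group` who were hired."""
    labels = [hired for g, hired in records if g == group]
    return sum(labels) / len(labels)

male_rate = historical_hire_rate(history, "m")    # 90 / 150 = 0.6
female_rate = historical_hire_rate(history, "f")  # 10 / 150 ≈ 0.07
# A model trained to minimise overall error on this data learns the gap
# as a "correct" pattern and carries it into future predictions.
```

Nothing about the training process needs to be malicious for this to happen; matching the data is exactly what the algorithm is designed to do.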
Still, we don’t know everything about how AI inherits its biases. Many machine learning techniques operate as a black box. It’s difficult to tell exactly how the algorithm works and arrives at its conclusion. This is especially true for neural networks where the algorithm dynamically adjusts the weighting of various factors in making its final predictions. It can be difficult to say why an algorithm has weighted various factors the way it has. In the face of such uncertainty, neural nets require rigorous and thorough testing in a variety of scenarios before we can be confident they’ll produce a fair result. Even with thorough testing, algorithms shouldn’t be trusted as the ultimate source of truth. Especially in social and policy making contexts, AI should be thought of only as a tool that helps humans make the final decisions.
What Machine Learning Bias Looks Like
Bias in machine learning can take many forms. It doesn’t necessarily have to fall along the lines of divisions among people. Any time an AI prefers a wrong course of action, that’s a sign of bias. So, when Tesla’s autopilot feature doesn’t notice a white truck ahead against the backdrop of a very sunny sky, that’s an algorithmic bias toward keeping the car cruising in cases of uncertainty, instead of braking.
Of course, algorithms that respond differently based on race, colour, gender, age, physical ability, or sexual orientation are more insidious. These types of biases are ethical problems in our society at large and AI should help to reduce them, not exacerbate them. Nevertheless, examples of such biases abound.
Recently, facial recognition technology has been in the news for poor racial inclusivity. In one study, the software recognised lighter-skinned males with an error rate of less than 1%, while for darker-skinned women the error rate was nearly 35%. Similarly, another algorithm tasked with recommending candidates with leadership potential was less likely to recommend women based on anonymised work history data.
More subtle still are the algorithms that influence what advertisements we’re shown and products we’re suggested. As advertisers use more personal information to tailor their offerings, biases can widen gaps between groups of people. For instance, a study by AdFisher found that men were six times more likely than women to be shown advertisements for high-paying jobs. Biases like these, and even more subtle biases with less profound implications, are compounded over time. In the long run, machine biases may influence the way society views classes of people just by influencing the decisions and options available to those people.
In a major story that has become the canonical example of bias in production AI, ProPublica found that COMPAS, a software that helps judges in the United States make sentencing and bail decisions, was racially biased. The findings about COMPAS were subtle, because when the algorithm was correct, it made fair decisions about which ex-convicts would go on to re-offend. However, when the algorithm was wrong, people of colour were twice as likely to be falsely labelled “likely to re-offend.” COMPAS was widely used across the country to guide the criminal sentencing of real people, some of whom were probably mischaracterised by the algorithm.
In a less sinister way, consumers may also have experienced machine bias in their everyday technology. For example, in 2012, during the early days of mainstream NLP-based AI assistants, it was widely reported that Siri, Apple’s AI assistant, could not understand Scottish accents. This was likely down to a lack of diversity of accents in the early data sets used to train the NLP algorithm.
Preventing & Mitigating AI Bias
Addressing bias in AI is a serious challenge. The data that we use to train AI has to be carefully analysed for biases ahead of time and cleaned to represent a fair sample of the population. That analysis and cleaning takes time and expertise. The MIT Technology Review has found that some experts believe algorithmic bias is already prevalent across industries, and that there isn’t much will or business incentive to identify or correct it.
A report from Booz Allen suggests governments can do a lot to address this issue on behalf of citizens. Privacy regulations already place restrictions on the types of data companies can collect from customers. Further regulations and guidelines around the ethics, safety, and control of AI are needed. In addition, governments are in a unique position to help create publicly available anonymised datasets that are more reflective of populations as a whole.
Companies can’t rely on the government to be solely responsible for AI ethics, though. Corporations have an obligation to test their AIs for biases and identify problems with algorithms before they go into production.
How to prevent machine bias
1. Use a representative dataset
Feeding your algorithm representative data is the single most important step in preventing bias in machine learning.
Capturing all the different groups in a dataset can be challenging, not least because you will need to segment the data to be sure each group is properly represented and managed. Failing to do so could create a PR disaster, as seen in the Apple Siri example above.
When you have insufficient data about one group, you can weight the groups against each other to increase the under-represented group’s importance during training. But be aware that this can introduce new, unexpected biases.
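A common form of such weighting is inverse-frequency weighting, where each sample’s weight is scaled so every group contributes equally to training. A minimal sketch (the function name and group labels are illustrative, not from any particular library):

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Weight each sample inversely to the size of its group, so
    under-represented groups count more during training."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    # Each group's total weight becomes n / k, balancing the groups.
    return [n / (k * counts[g]) for g in groups]

# A toy dataset where group "b" is under-represented 3:1.
groups = ["a", "a", "a", "b"]
weights = inverse_frequency_weights(groups)
# The single "b" sample now carries as much total weight as all
# three "a" samples combined.
```

Most training frameworks accept weights like these as a per-sample parameter; the caveat from above still applies, since up-weighting a tiny group also amplifies any noise in that group’s data.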
2. Choose the right model
Every AI algorithm is unique, and there is no single model that avoids bias. There are, however, frameworks that can be used to measure bias at various stages of the build.
It is important to utilise the best possible model for the problem at hand. Consider the limitations to each model and potential bias issues. For example, supervised learning may prove more accurate but can be subject to human biases and cognitive limitations. Unsupervised learning can be quicker to train but may be more prone to errors if the dataset has inconsistencies.
Finding out too late that you chose the wrong model, without having adjusted for it from the beginning, can cause much bigger issues later on.
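One simple measurement such frameworks provide is the disparate impact ratio, the selection rate of one group relative to another. A hedged sketch of the idea, using made-up predictions and group labels:

```python
def selection_rates(predictions, groups):
    """Positive-prediction rate for each group."""
    rates = {}
    for g in set(groups):
        picks = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(picks) / len(picks)
    return rates

def disparate_impact(predictions, groups, privileged, unprivileged):
    """Ratio of the unprivileged group's selection rate to the
    privileged group's; values below 0.8 are a common warning
    threshold (the 'four-fifths rule')."""
    rates = selection_rates(predictions, groups)
    return rates[unprivileged] / rates[privileged]

# Hypothetical model output: 1 = recommended for leadership.
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
ratio = disparate_impact(preds, groups, privileged="m", unprivileged="f")
# ratio = 0.25 / 0.75, well below 0.8, so worth investigating.
```

Checks like this can be run at each stage of the build, regardless of which model you ultimately choose.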
3. Monitor and review
Testing is paramount to the success of any application of machine learning. What’s important, however, is to do as much real-world testing as possible, on real-world datasets. Successful results in controlled test environments can create a false belief that the algorithm is fool-proof. If your algorithm fails to account for an unforeseen variable in the real world, it could be the designers who are blamed, especially if it has negative consequences for the recipients.
Find out how much bias exists in your algorithm, and when you discover unexpected biases, ensure they are explained and resolved.
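In practice, ongoing review often means tracking error rates per group on real-world data and watching for gaps. A minimal sketch (the function and the monitoring batch below are hypothetical):

```python
def error_rates_by_group(y_true, y_pred, groups):
    """Per-group error rate; a persistent gap between groups in
    production data is a signal to re-examine the model or the data."""
    errors, totals = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        errors[g] = errors.get(g, 0) + int(t != p)
    return {g: errors[g] / totals[g] for g in totals}

# Hypothetical monitoring batch: the model errs far more on group "b".
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = error_rates_by_group(y_true, y_pred, groups)
# rates == {"a": 0.0, "b": 0.5}
```

Aggregate accuracy alone would hide this: the model is 75% accurate overall, yet fails half the time for one group, which is exactly the pattern seen in the facial recognition example above.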
In the end, it’s good for business
Amazon was forced to stop using its application-screening algorithm in 2018 after it was found to be prejudiced against women during the selection process. This is a perfect example of AI bias in the real world.
Such testing is in a company’s best interests, not just from a public relations perspective but also for building consumer trust that its services and products are fair. In the long run, fair AI is good business practice and can counteract negative biases that humans have developed over time.
Most importantly, it’s the right thing to do for a fairer, more equitable future.
Is your company building machine learning algorithms? Are you applying machine learning to a real-world problem? We work with some of the largest companies on the planet, as well as SMEs and start-ups, building products and services that will change the world we know tomorrow. They trust us to find them exactly the right data talent for their projects. Feel free to reach out to me if you would like to discuss your resourcing challenges. You can contact me directly on [email protected].