Fairness and bias in AI
Dr. Stephen W. Thomas, Assistant Professor and Executive Director, Analytics and AI Ecosystem at the Smith School of Business at Queen’s University, discussed fairness and bias in AI at Big Data and AI Toronto 2020. He teaches natural language processing, machine learning, and big data across Smith’s academic and executive education programs. His current research focuses on AI fairness and AI for good.
Dr. Thomas began by sharing success stories of artificial intelligence across various fields: from self-driving cars and personal assistants to medical imaging and inventory robots, AI is transforming the way we lead our lives.
He explained that “at its core, AI is all about models making predictions”. The models are mathematical formulas that take numbers as input and produce a prediction as output. For example, a model could predict whether a loan applicant is likely to pay back the money they borrow, using data such as income level and credit score.
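As a minimal sketch of such a model, a weighted sum of inputs compared to a cutoff captures the idea; the weights and threshold below are invented for illustration and do not come from the talk or any real lending system:

```python
def predict_repayment(income: float, credit_score: float) -> bool:
    """Toy loan model: a weighted sum of the inputs compared to a
    threshold. Weights and threshold are made up for illustration."""
    score = 0.004 * income + 0.5 * credit_score
    return score > 450  # predict "will repay" if the score clears the bar

# A hypothetical applicant with a $60,000 income and a 700 credit score
print(predict_repayment(60_000, 700))  # True
```

A machine learning algorithm’s job is essentially to find good values for weights like these from historical data, rather than having a human choose them.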
Automation makes AI easy to scale, finely tuned equations make it accurate, and its predictions are consistent, unswayed by moods or preferences. This is why it succeeds at predicting credit scores, fighting fraud in real time, driving cars, and more.
However, he pointed out that “it’s not all rosy”.
Bias in artificial intelligence
It has come to light that AI predictions can be biased. One example is COMPAS, a decision support tool “used by U.S. courts to assess the likelihood of a defendant becoming a recidivist”. An investigation of the algorithm found that African Americans were twice as likely as Caucasians to receive a false-positive result. This means that “it is keeping blacks in jail twice as often as it should compared to whites”.
Unfortunately, this is not an isolated case. A study of commercial facial recognition systems in the United States revealed that they could not differentiate Asian or Black faces as reliably as white faces. False positives caused by this shortcoming could have “serious repercussions”, in Thomas’ words.
Another algorithm, used in crime monitoring to predict where crime will occur, would frequently direct police to low-income minority neighborhoods even when the crime rate there was low. The city of Pittsburgh has struggled with similar issues: it used an algorithm to predict which children’s families are at risk of abuse, but the algorithm was shown to be biased against low-income families, predicting abuse at a much higher rate than it should.
There are many more examples, such as Apple’s credit card giving female applicants credit limits up to 20 times lower than men’s despite identical application data, or Amazon’s AI-based resume screening tool rejecting far more women than men.
The data is biased
The model that makes predictions comes from a machine learning algorithm: the algorithm is given a large amount of historical data and learns the equation that makes the best predictions. Dr. Thomas explained that the algorithm by itself cannot be biased, as it only tries to maximize an objective mathematical function, and all data reaches it as 0s and 1s. There is no way for it to know which number represents gender or race.
The bias comes from the data itself. Attributes such as race, religion, color, and gender, defined as protected attributes by the U.S. and Canadian governments, are the most prone to bias. According to Thomas, bias creeps into the model in three ways:
Historical bias: although the data correctly captures what happened in the world, what happened is itself biased because humans are biased. For example, managers have historically hired more men than women, and because police have historically patrolled minority neighborhoods more often than they should, the algorithm learns that this is what it should do in the future.
Representation bias: when data contains more samples from one subgroup than another, there is a mismatch in representation. For example, the data given to the facial recognition models contained a lot of Caucasian faces but relatively few Black or Asian faces.
Measurement bias: when it is difficult to measure what you really want to measure, you use a proxy variable, and that proxy is often not as good as expected. For instance, you might want to know how many crimes a person has committed in the past; since you don’t know that number, you instead count how many times that person has been arrested. But the number of arrests is a poor proxy for the number of crimes committed, because minorities get arrested far more often than they should.
How do we determine if predictions are fair?
Dr. Thomas pointed out that before we attempt to fix bias in AI and make it fair, we need to understand how to define a fair AI. What exactly is ‘fair’?
One way to check whether a prediction is fair is to verify that the model gives the same prediction for a particular person whether or not a protected attribute is included in the input. Another way to measure fairness is to ensure that the model gives the same positivity rate across different sub-groups. For instance, if it predicts that 20 out of 30 men will repay their loans, it should also predict that 20 out of 30 women will repay theirs.
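The positivity-rate check can be sketched in a few lines of Python; the predictions and group labels below are invented for illustration:

```python
from collections import defaultdict

def positivity_rates(predictions, groups):
    """Fraction of positive (e.g. 'will repay') predictions per subgroup.
    Equal rates across subgroups is one common fairness criterion,
    often called demographic or statistical parity."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred)
    return {g: positives[g] / totals[g] for g in totals}

# 20 of 30 men and 20 of 30 women predicted to repay -> equal rates
preds = [1] * 20 + [0] * 10 + [1] * 20 + [0] * 10
groups = ["men"] * 30 + ["women"] * 30
rates = positivity_rates(preds, groups)
print(rates)  # both subgroups at the same rate, 20/30
```

In practice this check would be run on a held-out dataset, and a small tolerance is usually allowed rather than requiring exactly equal rates.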
Now, how do we build a fair model?
The most obvious approach is to simply remove the protected attributes. With no data about race or gender, what could go wrong?
He surprised us all by saying that this doesn’t work. Historically, subgroups such as women or Black people have had lower credit scores and salaries because of societal bias. So, without knowing the context, an algorithm might rank a creditworthy woman much lower than a man who never intends to repay, simply because he has a higher salary. Historical bias has spread much deeper into our data than the protected attributes themselves, and an algorithm needs to factor in the gender or race of its subjects to counteract that bias. Withholding that information only causes more bias.
Another essential consideration is that protected attributes genuinely must be taken into account in fields like medicine, yet should play no role at all in hiring decisions. So it is definitely a bad idea to use the mere presence or absence of protected attributes in the data as the means to fight bias.
Towards responsible AI
There are some best practices that we can follow to ensure the development of Responsible AI.
With respect to the data
Dr. Thomas suggested that representation bias can be avoided by ensuring that the data provided is balanced, i.e., has an equal number of training samples for all protected subgroups.
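One simple way to achieve this balance, sketched here with made-up records, is to downsample every subgroup to the size of the smallest one (upsampling the smaller subgroups is a common alternative):

```python
import random

def balance_by_downsampling(records, group_key, seed=0):
    """Downsample each subgroup to the size of the smallest one,
    so every protected subgroup contributes equally many samples."""
    by_group = {}
    for rec in records:
        by_group.setdefault(rec[group_key], []).append(rec)
    n = min(len(members) for members in by_group.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for members in by_group.values():
        balanced.extend(rng.sample(members, n))
    return balanced

# A hypothetical, heavily skewed face dataset
faces = ([{"group": "white"}] * 900
         + [{"group": "black"}] * 50
         + [{"group": "asian"}] * 50)
balanced = balance_by_downsampling(faces, "group")
# Each subgroup now contributes 50 samples (150 total)
```

The trade-off is that downsampling discards data from the larger subgroups; collecting more samples from the under-represented subgroups is preferable when possible.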
To get rid of measurement bias, we need to remove proxy variables. Even in situations where it is relatively difficult to get actual statistics, we must obtain them to achieve accurate predictions.
With respect to the algorithm
Forcing the predictions to be equal for all subgroups would also reduce the bias. “This is a little bit on the technical side, but you can basically add constraints to a model you make that it has to have the same metric for each subgroup”, Thomas said. This is the best way to deal with historical bias.
We can also have the algorithm set a different threshold for different sub-groups, making it easier or harder for a subgroup to receive a favorable prediction. Such a technique helps offset the historical disadvantage one subgroup has relative to another.
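This post-processing idea can be sketched as follows; the scores, group names, and threshold values are invented for illustration:

```python
def predict_with_group_thresholds(score: float, group: str,
                                  thresholds: dict) -> bool:
    """Compare a model's score against a subgroup-specific
    threshold instead of one global cutoff."""
    return score >= thresholds[group]

# Hypothetical thresholds: a lower bar for the historically
# disadvantaged subgroup offsets bias baked into the scores.
thresholds = {"group_a": 0.60, "group_b": 0.50}
print(predict_with_group_thresholds(0.55, "group_a", thresholds))  # False
print(predict_with_group_thresholds(0.55, "group_b", thresholds))  # True
```

In practice the per-group thresholds would be tuned on validation data until a chosen fairness metric, such as the positivity rate, is equal across subgroups.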
Dr. Thomas shared one last thought with the audience. “You need to ensure fairness in your AI models. Don’t assume that the algorithm is just going to do it properly, that the data is probably clean enough, or that it’s not going to matter that much. Everyone needs to solve this very important problem by taking active measures to assess whether your model’s predictions are fair and to fix it using some of these techniques.”
Interested in learning more about fairness and bias in AI? Join us for free at Big Data and AI Toronto on October 13-14, 2021. Learn more here.