How Discrimination & Bias Show Up in Data Analytics

on October 19th, 2016
Big data

One of the many hopes for humanity, as we allow computers and algorithms to handle more and more of our processing tasks, is that machines will be blind to race, sex, age, and other factors that frequently lead to bias and prejudice. For instance, if computers take over our HR practices, the habit of selecting male candidates over female ones, or of offering men higher pay, will be done with, right? Similarly, if we hand analysis over to computers, we’ll stop seeing biases based on color or sex or other factors, because computers are blind, right?

Unfortunately, what we’re seeing is the opposite. Big data merely takes human prejudices and compounds them, so that the final analytical results reflect the worst of our biases and unfairness. Confused? There are reasons for this phenomenon.

Human Prejudices Become Data Used for Analytics

It’s important to understand where big data comes from in order to understand how it reaches biased analytical conclusions. Take the hiring example mentioned a moment ago. Company A has historically hired more male job candidates than female ones. As a result, more of its high performers have been men than women, simply because more men were doing the jobs. So when the company turns its hiring practices over to an automated HR system, the program hires more males, because the data “says” that males are the high performers on the job.
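To make that mechanism concrete, here is a minimal sketch in Python (synthetic data, a hypothetical skill score, and a scikit-learn logistic regression; every number below is invented for illustration) of how a model trained on historically biased hiring decisions ends up preferring male candidates even when ability is identical by construction:

```python
# Minimal sketch: a model trained on biased historical hiring decisions
# reproduces that bias. All data here is synthetic and for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Candidates: gender (1 = male, 0 = female) and a skill score drawn from the
# same distribution for both groups, so by construction there is no ability gap.
gender = rng.integers(0, 2, size=n)
skill = rng.normal(0, 1, size=n)

# Historical hiring decisions: skill matters, but being male adds a large
# bonus, mimicking years of biased human choices.
hired = (skill + 1.5 * gender + rng.normal(0, 1, size=n)) > 1.0

# Train the "automated HR system" on those historical outcomes,
# with gender available as a feature.
X = np.column_stack([gender, skill])
model = LogisticRegression().fit(X, hired)

# Score two equally skilled candidates who differ only in gender.
print("P(hire | male,   skill=0.5):", model.predict_proba([[1, 0.5]])[0, 1])
print("P(hire | female, skill=0.5):", model.predict_proba([[0, 0.5]])[0, 1])
# The model recommends the male candidate far more often. The bias sits in
# the historical labels, not in anything about the candidates themselves.
```

A similar result typically appears even if gender itself is dropped but a correlated proxy feature remains; the bias lives in the labels the model is asked to imitate.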

Something very similar happened when St. George’s Hospital Medical School handed its student admissions over to big data. Since the school had historically admitted more white men than women or non-white applicants, the algorithm simply did more of the same. Machine learning just learned to carry on the unfair, baseless practices of human prejudice.

This phenomenon has been observed across the board in big data analytics and machine learning. Search algorithms, for example, have learned to show more results related to arrest records and background checks when the name in the search query is common among minority groups. Some credit card companies have even begun denying accounts to people who “like” certain things on Facebook that are typically liked by minority groups, on the underlying assumption that minorities are less likely to be creditworthy.

Again and again these prejudices show up, because society has inadvertently programmed them into the data. The data comes from biased individuals and institutions, and so do the algorithms. The results therefore showcase that bias, sometimes even more prominently than humans themselves do.

Analysts Tend to Stop Querying Once the Data Gives the Answer They Expect

There’s another phenomenon contributing to the prejudices and biases in big data, and that’s the tendency for researchers to stop querying the data when they get the answer they’re expecting.

Again, it’s important to understand how big data analytics actually works to see why. Most of us picture genius data scientists developing brilliant algorithms, running a simple query on the data, and tada! The answer pops out. That isn’t what happens. Big data science is mostly a lot of tweaking and twiddling to get the algorithms to deliver any meaningful insight from the data sets. Depending on how good the algorithm is, how good the data is, and how skilled the data scientist is at making sense of the whole thing, a query may or may not deliver a reasonable result.

After tweaking over and over and over again, when the query finally spits out what the researcher has been looking for, the tendency is to take the result and run with it. So, if the big data analytics concludes that a black inmate up for parole is 86 percent likely to commit another crime, whereas the white inmate before the same board is only 26 percent likely to reoffend, the researcher is likely to concur with the results and submit those recommendations to the parole board. If the answer makes sense to the researcher, (s)he simply stops looking.
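Here is a small sketch of that stopping behavior, again with entirely made-up data: the two groups below are drawn from the same distribution, so any “gap” between them is pure noise, yet an analyst who keeps re-slicing the data and stops at the first confirming result will almost always find the gap they expected.

```python
# Minimal sketch: tweak the analysis repeatedly and stop as soon as the
# result matches expectations. All data here is synthetic noise.
import numpy as np

rng = np.random.default_rng(42)

# Two groups with identical outcome distributions: no real difference exists.
group_a = rng.normal(0, 1, size=200)
group_b = rng.normal(0, 1, size=200)

expected_gap = 0.15  # the difference the analyst "knows" should be there

for attempt in range(1, 101):
    # Each "tweak" is a different arbitrary subsampling choice.
    sample_a = rng.choice(group_a, size=50, replace=False)
    sample_b = rng.choice(group_b, size=50, replace=False)
    gap = sample_a.mean() - sample_b.mean()

    if gap >= expected_gap:
        print(f"Attempt {attempt}: gap = {gap:.2f}. Matches expectation, stop and report.")
        break
else:
    print("No confirming result after 100 attempts.")
```

Run enough variations and noise alone will eventually produce the expected answer; the guard against this is deciding in advance how the analysis will be judged, not stopping when the output looks right.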

How can we get prejudice and unfairness out of our data, so that people of differing colors, sexes, races, and ages are finally free from bias? Well, as programmers have said since the dawn of the computer age: garbage in, garbage out. The only way to free big data and analytics from prejudice is to free society from it.

Isn’t it time, already?

For more information, insight, and inspiration on the wide world of big data, follow us on Twitter!