Is Open-Source Data Science the Key to Unbiased AI?

Unbiased AI can help with major decision processes. Here’s how open-source data science can make that possible.

Last Updated: February 10, 2023

Artificial intelligence (AI) has become so enmeshed in daily life that its presence is often invisible. AI technology drafts legal contracts, assesses job applicants, approves loan applications, detects financial fraud, and screens healthcare patients; the list of applications is endless. But none of that makes AI unbiased. Sam Babic, chief innovation officer at Hyland, discusses how open-source data science could help.

Like the humans who performed these tasks before the technology was introduced, AI may execute them with a biased perspective.

Here’s why: AI algorithms “learn” by analyzing training datasets for predictable patterns and rules. But human biases are frequently baked into those datasets, even when engineers don’t intend to discriminate. And these biases, such as a dataset drawn from a single or unrepresentative demographic, are usually not overt, making them challenging to correct. The result may be mortgage approval systems that deny one race over another, facial recognition tools that fail to identify people of color, and image generators that only display images of white men when asked to depict a CEO.
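To make that concrete, here is a minimal sketch of how an unrepresentative training set can yield a model that performs well for the dominant group and poorly for everyone else. Everything here is synthetic: the groups, features, and labels are assumptions for illustration, not data from any real system.

```python
# Synthetic demonstration: 95% of training examples come from group A,
# so the learned decision boundary fits group A and fails group B.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Each group's features follow a different distribution; the "true"
    # label rule is also shifted, so a one-size-fits-all boundary misfires.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 3))
    y = (X.sum(axis=1) + rng.normal(scale=0.5, size=n) > shift * 3).astype(int)
    return X, y

XA, yA = make_group(950, shift=0.0)   # heavily represented group
XB, yB = make_group(50, shift=2.0)    # underrepresented group
model = LogisticRegression().fit(np.vstack([XA, XB]), np.hstack([yA, yB]))

# Evaluate on balanced, held-out samples from each group.
XA_t, yA_t = make_group(500, shift=0.0)
XB_t, yB_t = make_group(500, shift=2.0)
print("accuracy on group A:", model.score(XA_t, yA_t))
print("accuracy on group B:", model.score(XB_t, yB_t))
```

On a typical run, accuracy for the underrepresented group hovers near chance while the majority group scores far higher, even though nothing in the code mentions group membership explicitly.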

As companies increasingly rely on AI tools to automate, streamline, and speed up routine business functions, minimizing bias in AI has never been more important. One solution is open-source data science, built on the work of a global community of contributors, enabling solution providers to introduce less biased AI tools with speed, governance, and transparency.

How AI Learns to Mimic Common Human Biases

Many AI algorithms are rooted in probability and statistics. They consist of a set of programmed rules and computations that determine how the AI will execute specific tasks based on the data that’s fed into the system. Bias occurs when the AI incorrectly predicts an outcome, such as the ability to pay back a loan, based on traits like race, gender, or socioeconomic status.
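One simple way to surface this kind of bias is to compare a model’s positive-outcome rate across groups, often called the demographic parity gap. The numbers below are fabricated stand-ins for a loan model’s predictions, purely for illustration:

```python
# Hypothetical check: does a loan model approve one group far more often?
import numpy as np

rng = np.random.default_rng(1)
group = rng.choice(["A", "B"], size=1000, p=[0.7, 0.3])
# Stand-in for model output; imagine these came from a trained approver.
approved = np.where(group == "A",
                    rng.random(1000) < 0.62,   # group A approved ~62% of the time
                    rng.random(1000) < 0.41)   # group B approved ~41% of the time

rate_a = approved[group == "A"].mean()
rate_b = approved[group == "B"].mean()
print(f"approval rate A: {rate_a:.2f}, B: {rate_b:.2f}")
print(f"demographic parity gap: {abs(rate_a - rate_b):.2f}")
```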

Engineers train AI on a curated set of data, called training data, that is cleaned before it’s fed into the system. Training data can be skewed in ways that benefit or hinder certain groups, even when it’s clean and accurate. For example, data fed into an AI-based recruiting platform may teach the algorithm that most people hired for management roles hold a bachelor’s degree. So the AI solution begins to screen out all applicants without four-year degrees, many of whom may be from low-income or marginalized communities.
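A toy version of that recruiting scenario shows the mechanism. The `skill` and `degree` features and the historical hiring rule are synthetic assumptions; the point is that a model trained on credential-driven hiring labels learns to screen on the credential, not the skill:

```python
# Synthetic data: past recruiters hired mostly degree holders regardless
# of skill, so the model learns the degree flag as the deciding feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000
skill = rng.normal(size=n)                    # the job-relevant signal
degree = (rng.random(n) < 0.6).astype(int)    # 1 = holds a four-year degree
hired = ((degree == 1) & (rng.random(n) < 0.5)).astype(int)

X = np.column_stack([skill, degree])
model = LogisticRegression().fit(X, hired)

# Two equally skilled candidates, one with a degree and one without:
with_degree, without_degree = [1.5, 1], [1.5, 0]
print(model.predict_proba([with_degree, without_degree])[:, 1])
```

The degree holder gets a meaningful hiring probability while the equally skilled non-degree candidate is effectively rejected, reproducing the screening behavior described above.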

The error occurred not because the original data was inaccurate but because human recruiters have historically prioritized college degrees in hiring, even for roles in which higher education doesn’t impact job performance. The engineers may not realize this bias is inherent in the training data and unknowingly launch machine learning models that haven’t been exposed to a representative or diverse demographic.

In other instances, the AI model “drifts” over time as it’s exposed to real-world data. Perhaps the training data and resulting algorithms are truly unbiased, and the AI treats all job applicants equally at launch. But over time, it learns that candidates who use words like “leader” and “proactive” on their resume are more likely to be hired, so it begins automatically rejecting applications from women, who may be less likely to describe themselves as leaders.

This is even more likely to occur if the applicant pool has historically skewed toward a certain gender or demographic, such as men. As the AI processes more applications from men than from women, it will “drift” toward prioritizing traits associated with men.
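Here is a sketch of that drift, again on synthetic data: an online model keeps updating on a stream dominated by one group’s “profile,” and its acceptance rate for applicants outside that profile sinks. The features, batch sizes, and labels are illustrative assumptions:

```python
# Illustrative drift: an online classifier updated on a majority-skewed
# stream stops accepting applicants who don't match the majority pattern.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(3)
model = SGDClassifier(loss="log_loss", random_state=0)
# Seed the model with random data so it starts out roughly unbiased.
model.partial_fit(rng.normal(size=(100, 2)),
                  rng.integers(0, 2, size=100), classes=[0, 1])

for batch in range(5):
    # 90% of each incoming batch matches the majority "profile"
    # (feature 0 high, e.g. "leader"-style wording) and gets hired.
    majority = rng.normal(loc=[2.0, 0.0], size=(90, 2))
    minority = rng.normal(loc=[0.0, 0.0], size=(10, 2))
    X = np.vstack([majority, minority])
    y = np.array([1] * 90 + [0] * 10)
    model.partial_fit(X, y)

    # Probe with minority-style applicants; their acceptance rate decays.
    probe = rng.normal(loc=[0.0, 0.0], size=(200, 2))
    print(f"batch {batch}: minority acceptance rate "
          f"{model.predict(probe).mean():.2f}")
```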

3 Reasons to Use Open-Source Data Science in AI Development

Open-source data science offers one of the most promising models for minimizing AI bias because it enables collaboration, trust, and transparency. In closed or proprietary systems, the engineer has complete control — and complete responsibility — over how the model behaves. But in an open system, engineers benefit from the perspectives, insights, and contributions of others working on similar problems. 

Consider the following advantages of open-source AI when it comes to minimizing biased decision-making:

1. Unified code base: Open-source AI funnels resources toward more complex problems, like remediating bias in data sets. For example, a public university may use AI to determine which students receive scholarships, while a bank uses AI to approve small business loan applications. Even though these organizations aren’t competitors, the AI tools they deploy to assess financial needs are similar.

In an open system, both organizations can license an existing AI resource and contribute back to that resource so that outcomes improve for everyone. If the university determines that a given model excludes specific groups from scholarships and corrects it, the other licensees — and the people they serve — benefit.

2. Increased transparency and governance: In an open model, there is transparency about the data and approach used to train AI models and algorithms. That transparency enables any contributor to analyze whether the model has drifted over time and suggest corrective measures (see the monitoring sketch after this list). And with many contributors reviewing the same code and data, it is harder to compromise the product, whether intentionally or not.

3. Faster improvements: Bias manifests in code in disparate ways. Small internal teams aren’t necessarily well-positioned to spot and eliminate bias in all its forms, especially as AI models drift over time. Relying on open source dramatically expands the pool of people working on the project, which reduces the likelihood that bias will go unchecked while increasing the speed at which the model improves. Contributors issue corrections and improvements on an ongoing basis, decreasing the negative real-world impacts of biased AI decision-making.
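As one concrete example of the transparency check described in item 2 above, a contributor might periodically compare per-group outcome rates against a published tolerance and flag drift. The threshold, the `parity_gap` helper, and the monthly snapshots below are all hypothetical:

```python
# Hypothetical open-governance check: flag drift in per-group outcome rates.
import numpy as np

TOLERANCE = 0.10  # assumed threshold; a real project would set this openly

def parity_gap(preds: np.ndarray, groups: np.ndarray) -> float:
    """Absolute difference in positive-outcome rate between groups."""
    rates = [preds[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

# Synthetic monthly snapshots of (predictions, group labels).
rng = np.random.default_rng(4)
for month in range(1, 4):
    groups = rng.choice(["A", "B"], size=500)
    drift = 0.05 * month  # pretend group B's approval rate erodes each month
    preds = np.where(groups == "A",
                     rng.random(500) < 0.60,
                     rng.random(500) < 0.60 - drift).astype(int)
    gap = parity_gap(preds, groups)
    flag = "DRIFT ALERT" if gap > TOLERANCE else "ok"
    print(f"month {month}: parity gap {gap:.2f} -> {flag}")
```

Because both the data pipeline and a check like this would be public in an open model, any licensee could run it, reproduce a flagged result, and contribute a fix.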

Across industries, we’re growing increasingly reliant on AI decision-making. Organizations that deploy AI are morally and legally obligated to ensure their clients, customers, and users don’t experience discrimination due to biased algorithms. In fact, it’s conceivable that AI tools can be trained to be less biased decision-makers than their human counterparts, and an open, transparent model can help serve that goal.

Are you leveraging open-source data science to improve AI development? Share with us on Facebook, Twitter, and LinkedIn.

Sam Babic
Sam Babic is Chief Innovation Officer at Hyland, where he is responsible for driving enterprise innovation by exploring business opportunities and emerging technologies to expand the company’s product portfolio and accelerate delivery of differentiated solutions to its global customers.