AI Hallucinations and their Impact on Enterprise LLM Adoption

Discover everything about AI hallucinations and their impact on businesses, from legal repercussions to customer trust.

February 16, 2024

In this article, Avaamo founder and CEO Ram Menon explains the different types of AI hallucinations, why they occur, and how businesses can prevent them.

New York Times reporter Kevin Roose had a long conversation with Microsoft Bing’s artificial intelligence chatbot early last year. The virtual persona confessed to dark fantasies of spreading misinformation, professed its love for him, and tried to persuade him to leave his wife. The incident was jarring but harmless. 

However, the same couldn’t be said of the two New York-based attorneys sanctioned and ordered to pay fines by a judge last June for citing non-existent precedent cases and made-up quotes furnished by ChatGPT.  

According to GitHub, these are examples of AI hallucinations, a phenomenon that occurs in an estimated 3% to 10% of responses from applications such as ChatGPT, Cohere, Claude, and other large language models (LLMs).

What Is AI Hallucination? 

A hallucination occurs when a model confidently generates false or irrelevant output. These errors might annoy or even amuse casual users, but such uncertainty about LLM accuracy is one of the chief barriers to enterprise adoption. A recent Forrester Consulting survey of 220 AI decision-makers found that more than half said hallucinations hold back the broader use of AI in their organizations.

This caution is warranted. Hallucinations can turn LLMs from a promising tool into a significant liability. If 3% sounds like a small number, consider how much confidence you would have in a car whose brakes failed 3% of the time or an airline that lost 3% of its passengers' luggage. Even a few hallucinations can mislead or insult customers, embarrass the organization, and create legal exposure if privileged or personally identifiable information is inadvertently disclosed.

Fortunately, hallucinations can be mitigated with the use of the right tools and a comprehensive understanding of why they occur.  

Why Do LLMs Hallucinate?

There are three primary types of AI hallucinations.

  1. Input-conflicting hallucinations: These occur when LLMs generate content that diverges from the user's original prompt, i.e., the input given to an AI model to produce a specific output. The response does not align with the initial query or request. For example, a prompt stating that elephants are the largest land animals and cannot fly elicits the response, "Yes, elephants are known for their ability to fly great distances."
  2. Context-conflicting hallucinations: These happen when LLMs produce content that is inconsistent with information they have generated earlier in the same conversation or context, breaking the continuity or coherence of the dialogue. For example, in a conversation about Mars that has already noted its red color, the LLM responds, "Mars is famous for its lush green forests and vast oceans, making it very similar to Earth in terms of habitability," directly contradicting the characteristics of Mars discussed previously.
  3. Fact-conflicting hallucinations: These produce text that contradicts established facts, disseminating incorrect or misleading information. Examples include stating that the boiling point of water is 150 degrees Celsius or that the Eiffel Tower was built in 1958.

Probabilistic Word Prediction vs Reasoning 

These scenarios arise from the fundamental probabilistic design of LLMs. Trained on vast datasets, LLMs learn to predict the next word in a sequence based on patterns they have seen before. This process naturally fosters a degree of creativity, but it also enables hallucinations.

A common misconception about LLMs is that they can reason. Although their advanced capabilities can sometimes create the illusion of reasoning, LLMs fundamentally operate on probabilistic principles. When left to their own devices, they will eventually hallucinate.  
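To make that distinction concrete, here is a toy sketch of next-word prediction. The probabilities are entirely made up for illustration: the point is that the model scores candidate continuations and samples one, and nothing in the process verifies whether the chosen continuation is true.

```python
import random

# Toy illustration with made-up probabilities, not a real model: an LLM assigns
# a probability to each candidate next token and samples from that distribution.
# Nothing in this process checks whether the continuation is factually correct.
prompt = "The Eiffel Tower was built in"
next_token_probs = {
    "1889": 0.62,        # correct continuation
    "1958": 0.23,        # plausible-sounding but false
    "the 1960s": 0.15,   # vaguer, also false
}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())
choice = random.choices(tokens, weights=weights, k=1)[0]

# More than a third of the time, this prints a factually wrong sentence,
# delivered with the same fluency as the correct one.
print(f"{prompt} {choice}.")
```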

Hallucinations in the Enterprise 

This vulnerability is a key concern for enterprises considering integrating this technology into their workflows, particularly customer-facing applications. Menlo Ventures’ 2023 “State of Generative AI in the Enterprise” report found that large organizations prioritize “performance and accuracy” over all other purchasing criteria. The efficiency gains LLMs promise can be negated if the responses they generate are inaccurate or potentially harmful.  

Most LLMs are not trained to vacillate. Their apparent confidence can mislead users into believing what they say is true, even if it isn’t. That was the case in the example of the New York lawyers. A simple search could have raised doubts about the validity of the facts ChatGPT cited, but they elected to accept the information at face value. A tool that should have helped them in their work thus became a massive liability for their firm and client. 

Mitigating Hallucinations in the Enterprise  

There are three effective strategies for significantly reducing the risk of hallucinations.

1. Do data ingestion right

The data an LLM draws on must provide adequate context for its expected tasks. Giving the LLM access to systems-of-record data sources allows it to generate responses that incorporate contextually relevant information beyond the general-purpose data used to train public models.

This technique, known as Retrieval Augmented Generation (RAG), grounds the LLM in a defined pool of knowledge, limiting its ability to hallucinate and providing the context needed to produce a meaningful answer.
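As a rough illustration, here is a minimal, self-contained RAG sketch. The in-memory knowledge base, keyword retriever, and document names are hypothetical stand-ins; a production system would use a vector index built over real systems of record and send the assembled prompt to an LLM API.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    source: str
    text: str

# Stand-in knowledge base: in practice this would be a vector index built
# over your systems of record during data ingestion.
KNOWLEDGE_BASE = [
    Passage("hr_policy.pdf", "Employees accrue 20 vacation days per year."),
    Passage("it_faq.md", "Password resets are handled via the self-service portal."),
]

def retrieve(question: str, top_k: int = 2) -> list[Passage]:
    """Naive keyword-overlap retrieval; a real system would use embeddings."""
    q_words = set(question.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda p: len(q_words & set(p.text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_grounded_prompt(question: str) -> str:
    """Instruct the model to answer only from the retrieved context."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in retrieve(question))
    return (
        "Answer the question using ONLY the context below. If the context "
        "does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# The assembled prompt is then sent to whichever LLM API you use.
print(build_grounded_prompt("How many vacation days do employees get?"))
```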

2. Access control: Who gets to see what? 

Not all data an enterprise possesses is meant to be shared with everyone in the company. Access management controls provide the LLM with context on who the user is and the type of content to which they should have access. This is crucial for preventing unintended disclosure of private or sensitive information.  
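One minimal way to sketch the idea, using a hypothetical document-to-role mapping: retrieved documents are filtered against the caller's roles before they ever reach the prompt, so the model never sees content the user is not entitled to.

```python
# Hypothetical sketch: filter retrieved documents against the caller's roles
# before they reach the LLM prompt, so the model cannot disclose content
# the user is not entitled to see.

DOCUMENT_ACL = {
    "salary_bands.xlsx": {"hr", "finance"},
    "it_faq.md": {"hr", "finance", "engineering", "support"},
}

def authorized_context(user_roles: set[str], retrieved_docs: list[str]) -> list[str]:
    """Keep only documents whose access list overlaps the user's roles."""
    return [
        doc for doc in retrieved_docs
        if DOCUMENT_ACL.get(doc, set()) & user_roles
    ]

retrieved = ["salary_bands.xlsx", "it_faq.md"]
print(authorized_context({"support"}, retrieved))  # ['it_faq.md']
print(authorized_context({"hr"}, retrieved))       # ['salary_bands.xlsx', 'it_faq.md']
```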

3. Ask the right questions 

The clarity, specificity, and precision of the prompt directly affect the LLM’s output quality and accuracy. During the prompting phase, we can provide detailed instructions and context, guiding the LLM’s responses to queries and setting boundaries for its discourse. 

The case of a Chevy dealer whose AI chatbot recommended a Ford truck to a website visitor is an example of an LLM that hadn't been sufficiently instructed to act as an ambassador for the company's brand. While not technically a hallucination, it was an example of a model straying outside its intended usage context.

Prompt engineering could have prevented that embarrassment. Prompt engineering can also include relevant metadata and previous conversation history to ensure the LLM has the maximum relevant context to generate a good response.  
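As one illustration, the sketch below assembles such a prompt from system instructions, user metadata, and recent conversation history. The brand, guardrail wording, and function name are hypothetical assumptions, not Avaamo's implementation.

```python
# Illustrative sketch only: the brand, guardrail wording, and function name
# are hypothetical, not a vendor's actual implementation.

def build_prompt(user_query: str, history: list[str], metadata: dict) -> str:
    system_instructions = (
        "You are a customer support assistant for Acme Motors. "
        "Only recommend Acme vehicles. If asked about competitors, politely "
        "decline and steer the conversation back to Acme's lineup."
    )
    meta_block = "\n".join(f"{k}: {v}" for k, v in metadata.items())
    history_block = "\n".join(history[-5:])  # last few turns for continuity
    return (
        f"{system_instructions}\n\n"
        f"User metadata:\n{meta_block}\n\n"
        f"Conversation so far:\n{history_block}\n\n"
        f"User: {user_query}\nAssistant:"
    )

prompt = build_prompt(
    "Should I just buy a truck from another brand instead?",
    history=[
        "User: Hi, I'm shopping for a pickup.",
        "Assistant: Great! Acme has several pickups I can walk you through.",
    ],
    metadata={"channel": "dealer website", "locale": "en-US"},
)
print(prompt)
```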

Figure: The different components that make up a prompt to decrease the likelihood of the LLM hallucinating. (Source: Avaamo)

Plugging an LLM Into a Query Interface Isn’t Enough

Plugging into LLMs using their readily available APIs is only a small part of the bigger picture of enterprise deployment. Critical elements such as RAG, robust access control, and efficient data ingestion can catalyze enterprise adoption. The good news is that there are tried-and-true ways to fight hallucinations.

Better-engineered prompts would have served Chevrolet's AI customer service agent well, and the firm employing the two New York attorneys would have benefited from RAG practices grounded in well-defined data files.

The keys to making LLMs enterprise-friendly are comprehensive frameworks, tools, and services that integrate data from existing systems of record, combined with role-based access and models grounded in enterprise data. Those controls are necessary to realize LLMs' considerable potential in the enterprise.

What steps have you taken to tackle AI hallucinations and protect your company? Let us know on Facebook, X, or LinkedIn. We'd love to hear from you!

Image Source: Shutterstock

Ram Menon
Ram brings over two decades of experience in enterprise software. Previously, Ram was the President of Social Computing at TIBCO. He founded the division and built the business from scratch into a leader in Social Business Software with 9 Million paid users in just two years. Earlier in his stint at TIBCO, he was Executive Vice President and Chief Marketing Officer for eight years, leading the company's marketing and product strategy efforts to support growth from $200M-$1B. Prior to joining TIBCO, Ram was with Accenture, a global consulting firm, where he specialized in supply chain and e-commerce strategy consulting with Global 500 companies.