Generative AI Translation: Proceed with Caution

Understand the challenges and solutions in achieving accurate AI-driven, multilingual business communication.

November 24, 2023

Heather Shoemaker, founder of Language I/O, discusses the complexities and solutions to achieving seamless multilingual communication in this in-depth analysis.

Artificial intelligence has made remarkable strides in natural language processing (NLP). In this era of technological evolution, the boundaries of machine translation are being pushed to unprecedented levels, and the rise of generative AI has only added to that push.

The language translation technology industry is booming and shows no sign of slowing. The global language translation software market was valued at $10.81 billion in 2022 and is expected to skyrocket to $35.93 billion by 2030.

Enter NLP-based language translation platforms like ChatGPT, Google Translate, and Microsoft Translator, all large language models (LLMs): computer programs trained on huge amounts of publicly available data. These sophisticated programs can “understand” human language patterns and the intent or meaning behind the language. While some hail these cutting-edge tools as a panacea for all business problems, generative AI solutions still aren’t quite ready to meet businesses’ complete language translation needs.

Questions have also arisen about whether businesses can trust generative AI for accurate language translation and whether this technology is secure. In the wake of these questions, the best course of action is to proceed cautiously.

Is Generative AI Reliable?

So here’s the problem. Gen AI is great at quickly generating content (and coding and translating, too), and training it on specific data yields the most accurate and useful responses. Unfortunately, gen AI often lacks the context to produce the best results because it hasn’t been trained on industry- or business-specific data. Just as a general LLM, such as ChatGPT, can’t accurately answer questions about a company’s proprietary content it was never trained on, a general LLM or an untrained, AI-powered translation platform such as Google Translate cannot accurately translate content for a domain it was never trained on. In both cases, the AI lacks the needed context.

Although businesses benefit from investing in real-time translation technology in lieu of hiring additional multilingual employees, the tool/tech needs the proper training. Customer satisfaction increases when the right tech is in place to help current team members communicate effortlessly with customers regardless of their language. 

The number of independent machine translation services available has increased sixfold since 2017. Despite this notable uptick, generative AI translation models remain under development. They are known for unreliability, hallucinations, and generic responses based on general data, especially when asked to tackle complex or nuanced texts. Generative AI works best with well-constructed inputs, but business settings rarely supply them: people with different backgrounds and varying familiarity with language technology use chatbots to request information or ask for help in real time, and internal teams use them as a quick way to access data. Some traits of real-time communication that can trip up translations include:

  • Misspelled words: Text-based customer communication is often rife with typos, and AI’s attempts to translate these spelling errors can result in mistranslation or a complete failure to translate. These translation errors lead to confused customer service reps and frustrated customers, damaging the brand’s reputation.
  • Regional expressions and ambiguous terms: The words found in colloquial expressions and in industry- and brand-specific terminology often have multiple meanings, and generative AI is likely to translate those words literally instead of deciphering the idiom or niche vocabulary word.
  • Multilingual inputs: If a Spanish-speaking customer drops an English word into their generative AI input, the technology may lack the sophistication to parse that it is in a different language, “confusing” the technology and causing it to mistranslate or skip translating the word entirely.
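One common way to reduce literal mistranslation of brand- and industry-specific terms is to mask them before translation and restore an approved rendering afterward. The following is a minimal sketch of that idea, not a description of any vendor’s implementation. The glossary entries, the function names, and the `translate` callable are all hypothetical stand-ins, and the sketch assumes the translation engine passes the placeholder tokens through unchanged, which real engines do not always do.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical glossary: source term -> approved target-language rendering.
GLOSSARY_EN_ES: Dict[str, str] = {
    "Power Play": "Power Play",  # assumed brand name, kept as is
    "ticket": "solicitud",       # a support ticket, not an event ticket
}


def protect_terms(text: str, glossary: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Swap glossary terms for opaque placeholders before translation."""
    slots: Dict[str, str] = {}
    # Longest terms first so multiword terms are masked before their parts.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"__TERM{i}__"
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        text, count = pattern.subn(token, text)
        if count:
            slots[token] = glossary[term]
    return text, slots


def restore_terms(text: str, slots: Dict[str, str]) -> str:
    """Put the approved target-language terms back in place of placeholders."""
    for token, target in slots.items():
        text = text.replace(token, target)
    return text


def translate_with_glossary(
    text: str,
    translate: Callable[[str], str],  # stand-in for any translation engine
    glossary: Dict[str, str],
) -> str:
    masked, slots = protect_terms(text, glossary)
    return restore_terms(translate(masked), slots)
```

In practice, `translate` would wrap whichever engine the business uses. The masking step only addresses known terminology; misspellings and mixed-language inputs need separate handling.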

There are plenty of pathways leading to sub-par generative AI translation outputs. Without contextualizing technology and training employees to use it and feed it the correct inputs, organizations can’t trust generative AI translations will achieve the caliber needed for success in a customer service or business environment. 

Generative AI Ethics

The generative AI boom saw exponential growth in the space, but policies and protections associated with AI still need to catch up with the technology. For example, while 86% of organizations adopting AI say it’s critical to have guidelines about its ethical usage, only 6% have implemented policies outlining responsible use. This policy gap leaves plenty of space for potential pitfalls when using generative AI tools, including:

  • Biased outputs and false information. Generative AI tools are trained on information from the public internet, and much of the internet’s data is subjective. Generative AI can’t differentiate between prejudiced and objective observations, running the risk of outputting information that is either wrong or biased. ChatGPT’s fine print states, “ChatGPT may produce inaccurate information about people, places, or facts.” A failure to implement appropriate AI oversight leaves organizations open to the potential for inaccurate and damaging outputs with disastrous consequences.
  • Security and privacy concerns. Passing sensitive data, including personal information, through LLMs raises multiple security questions. Is that data being stored? If so, how? What security measures are in place? For example, data submitted outside the API may be used for training and can later leak in a response to another client. Data stored for reasons other than training is also vulnerable to cyber attacks or data breaches.
    According to Salesforce data, 61% of surveyed employees use or plan to use generative AI at work; however, almost 60% don’t know how to ensure security while using the technology, further illustrating the need for usage guidelines. Policy transparency and informed consent are critical to ensure ethical data usage. Though nationwide data regulations may be on the horizon, each organization must prioritize data privacy by implementing robust protections.
  • Inaccessible information. Generative AI pulls from the public internet, but what about the data beyond its reach? It can’t access gated content, which requires authentication (filling out a form and entering a password). This gated knowledge often includes proprietary company information, so customers asking generative AI company-specific questions aren’t likely to get the most accurate answers because the tool lacks access to the company resources that would provide the information they seek. 

As generative AI usage continues to grow, future iterations of these LLMs will likely solve at least some of these problems. Still, until then, organizations must implement responsible use policies. 

The Language Limitations of LLMs

Most well-known LLMs are trained on data in English or Chinese. As technology continues to influence the reframing of work, education, art, business, and more, the more than 6 billion people worldwide who speak 7,000 other languages are at risk of being left out. For example, Meta warned that its updated LLM released in July would work best with queries in English because most of its training data was in that language, saying, “the model may not be suitable for use in other languages.”

For organizations that want to facilitate multilingual communication with global customer bases, this language gap further illustrates the shortcomings of generative AI tools. To achieve the best real-time communications, the smartest organizations invest in contextualizing technology. For generative AI platforms, this involves some form of domain adaptation such as prompt engineering, RAG (retrieval augmented generation), or fine-tuning. 

However, to ensure a generative AI platform can accurately answer questions in multiple languages as well as translate between languages for a specific business, this domain adaptation has to occur not just in the base language but across all the languages the company supports. Gartner found that companies find the process of training AI in just one language more difficult than they expected. Further, according to Artificial Solutions, when faced with the task of duplicating that training across all supported languages, companies are abandoning the effort. Companies are in dire need of a solution that automates the multilingual domain adaptation on their behalf, such as those provided by Language I/O.

That effort is worthwhile, however, because implementing this technology can help properly translate previously problematic language like misspellings, jargon, or slang. Organizations should prioritize this contextualization to avoid incoherent conversations and, ultimately, dissatisfied customers.
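To make the multilingual requirement concrete, here is a minimal sketch of the lightest form of domain adaptation mentioned above, prompt engineering, repeated for every supported language. Everything in it is an illustrative assumption: the per-language glossaries, the prompt wording, the domain label, and the function names are hypothetical, and the sketch only builds prompt text without calling any particular model.

```python
from typing import Dict

# Hypothetical per-language glossaries: source term -> required translation.
GLOSSARIES: Dict[str, Dict[str, str]] = {
    "es": {"ticket": "solicitud"},
    "de": {"ticket": "Anfrage"},
}


def build_translation_prompt(
    text: str, target_lang: str, glossary: Dict[str, str], domain: str
) -> str:
    """Assemble a translation prompt that carries domain context and terminology."""
    lines = [
        f"You are translating customer-support messages for the {domain} domain.",
        f"Translate the message into the language with code '{target_lang}'.",
        "Use these required term translations:",
    ]
    lines += [f"- {src} -> {tgt}" for src, tgt in sorted(glossary.items())]
    lines += ["Message:", text]
    return "\n".join(lines)


def prompts_for_all_languages(
    text: str, domain: str, glossaries: Dict[str, Dict[str, str]]
) -> Dict[str, str]:
    """Repeat the same adaptation once per supported language."""
    return {
        lang: build_translation_prompt(text, lang, glossary, domain)
        for lang, glossary in glossaries.items()
    }
```

Even in this toy form, each added language needs its own curated glossary, which hints at why duplicating the work across every supported language becomes costly.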

Even though LLM-based technologies are popular, they can’t yet produce the most accurate business translations. Utilizing contextualizing technology, such as that provided by Language I/O, alongside generative AI tools, can help achieve top-notch translations. Investing in this type of technology maximizes existing headcount, shortens wait times, increases availability to 24/7, and supports more world languages, saving money and resources while driving customer satisfaction, employee inclusivity, and overall business success.

How can businesses overcome the hurdles in generative AI translation? Let us know on Facebook, X, and LinkedIn. We’d love to hear from you!

Heather Shoemaker
Heather is the mastermind behind Language I/O’s core technology, which eliminates the need for expensive and time-consuming neural machine translation engine training by dynamically selecting the NMT engine that will best translate a given piece of content and imposing company-specific terminology onto any of the many NMT engines integrated into the Language I/O cloud solution. Prior to co-founding Language I/O, Heather was well-known for globalizing code for Fortune 500s. She was also the senior director of Product Management and Globalization for eCollege, which was acquired by Pearson Education during her tenure. While at Pearson/eCollege, Heather and her team built a next-generation, online college education platform, which was launched globally. Heather holds a Master of Science from the University of Colorado at Boulder College of Engineering as well as a Bachelor of Arts in Latin American Studies from the University of Washington in Seattle. She has lived in various parts of the United States and Mexico and speaks English and Spanish.