Leveraging Generative AI Safely and Responsibly in the Coming Year

Ensure AI trust and reliability with thorough testing. Mitigate risks for responsible and effective generative AI usage.

December 8, 2023

In navigating the AI landscape, Adonis Celestine of Applause emphasizes responsible generative AI use. Learn key testing strategies to ensure reliability and ethical deployment in the fast-evolving tech environment.

This week, Gartner released its Top 10 Strategic Technology Trends for 2024. Not surprisingly, AI dominated the list, starting with #1, AI Trust, Risk and Security Management. Gartner states, “A comprehensive AI trust, risk, security management program helps you integrate much-needed governance upfront, and proactively ensure AI systems are compliant, fair, reliable and protect data privacy.”

Because many view generative AI as a way to make businesses more cost-effective and efficient, enterprises have rushed to find ways to leverage the technology. Some have even attempted to replace human staff with AI-driven chatbots, with disastrous results. In the race to gain a competitive advantage, testing and validating training data have often been left behind.

The consequences of hastily adopting AI without proper evaluation include deploying unreliable, inaccurate, or unsafe systems. These can produce incorrect or biased results, spreading misinformation or harming users. With AI influencing everything from healthcare decisions to loan approvals, careful evaluation becomes more important each day.

Improperly evaluated AI solutions can also compromise privacy and security by mishandling or inappropriately using sensitive data, raising legal and regulatory risks, undermining user trust, and ultimately causing organizations to miss opportunities for improvement.

A Closer Look at Hallucinations

One interesting problem we’ve seen surface over the years is the tendency for generative AI systems to produce incorrect content, even when prompted correctly. This phenomenon, termed a “hallucination,” can be quite troubling and even harmful, depending on the circumstances and the use case for the generated content. Hallucinations most often occur when the training data behind the model is too flawed or incomplete to answer the query appropriately, yet the model generates text anyway. This is what happened to an attorney who used ChatGPT to draft a motion in a case. He asked it to cite relevant cases, and the filing ended up littered with mistakes, citing irrelevant and non-existent cases as precedents. What’s troubling about this tendency is that the content looks accurate most of the time. So, as users, we develop a sense of security and a level of trust in the generated content and expect it to be correct. Consider the scary possibilities around that in healthcare or other vital services.
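
One practical mitigation is to check generated claims against a trusted source of record before anyone relies on them. The sketch below illustrates the idea for the legal-citation scenario; every function name and data entry in it is hypothetical, and in practice the lookup would query a real citation database rather than a hard-coded set.

```python
# Hypothetical sketch: flag generative-model citations that cannot be found in
# a trusted source of record before a human accepts them. Names and data here
# are illustrative, not a real database or API.

# Stand-in for a trusted index (e.g., a court-opinion database).
KNOWN_CASES = {
    "Miranda v. Arizona, 384 U.S. 436 (1966)",
    "Marbury v. Madison, 5 U.S. 137 (1803)",
}

def generated_citations() -> list[str]:
    """Stand-in for citations returned by a generative model."""
    return [
        "Miranda v. Arizona, 384 U.S. 436 (1966)",     # present in the index
        "Doe v. Example Corp., 123 F.4th 456 (2021)",  # not in the index
    ]

def verify(citations: list[str]) -> dict[str, bool]:
    """Mark each citation as found or not found in the trusted index."""
    return {c: c in KNOWN_CASES for c in citations}

for citation, found in verify(generated_citations()).items():
    status = "verified" if found else "UNVERIFIED - needs human review"
    print(f"{citation}: {status}")
```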

To mitigate these risks, it is crucial to thoroughly test and evaluate AI systems before releasing them. Here are some ways to get started.

1. Benchmarking

Companies should benchmark different AI models and compare their performance on the specific tasks or industries they care about. AI benchmarking lets businesses evaluate and compare the performance of different models, systems, or algorithms, which helps them determine the most suitable model for their needs and ensures they are using the best available technology.

It is not only the type of AI model but also the versions and iterations of a model that need to be evaluated. For example, GPT-4 (March 2023) could identify prime numbers with 97.6% accuracy, while GPT-4 (June 2023) failed (2.4% accuracy) and ignored the chain-of-thought prompt that encourages the LLM to explain its reasoning. AI models are continuously tuned and retrained, which can improve performance on some tasks while degrading it badly on others. Benchmarking is therefore a continuous activity that needs to be performed at regular intervals.

Benchmarking can provide insights into the strengths and weaknesses of AI models, allowing businesses to identify areas for improvement and optimization. It can also help businesses ensure accountability and quality control in AI development and use by setting performance standards and benchmarks. Ultimately, AI benchmarking aids businesses in selecting the most effective and efficient AI solutions for their specific use cases.
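
As a rough illustration of what continuous benchmarking can look like in practice, the sketch below scores hypothetical model versions against one labeled evaluation set so their results are directly comparable. The model-version strings, the ask_model() client, and the evaluation items are assumptions for the example, not a specific vendor’s API.

```python
# Minimal benchmarking sketch: score competing models (or versions of the same
# model) on one labeled evaluation set so results are directly comparable.
# ask_model() is a placeholder for whatever client your provider exposes.

EVAL_SET = [
    {"prompt": "Is 17077 a prime number? Answer yes or no.", "expected": "yes"},
    {"prompt": "Is 20019 a prime number? Answer yes or no.", "expected": "no"},
    # ...more labeled cases covering the behaviors that matter to your business
]

def ask_model(model_version: str, prompt: str) -> str:
    """Placeholder: send the prompt to the model under test and return its answer."""
    raise NotImplementedError("wire this to your model client")

def accuracy(model_version: str) -> float:
    """Fraction of evaluation cases the model answers as expected."""
    correct = 0
    for case in EVAL_SET:
        answer = ask_model(model_version, case["prompt"]).strip().lower()
        correct += answer.startswith(case["expected"])
    return correct / len(EVAL_SET)

def benchmark(versions: list[str]) -> None:
    """Re-run at each model upgrade (and on a schedule) to catch regressions."""
    for version in versions:
        print(f"{version}: {accuracy(version):.1%} on {len(EVAL_SET)} cases")

# benchmark(["model-2024-03", "model-2024-06"])
```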

2. Certification and Accountability

Companies should consider obtaining certifications or approvals for their AI systems. This assures users and stakeholders that the AI has undergone proper testing and meets certain standards, ensuring responsible use and mitigating risks. 

Certifications aid accountability by providing a standardized measure of performance, adherence to ethical guidelines, and assurance that an AI system meets specific standards and requirements. They demonstrate that an AI solution has undergone testing, evaluation, and validation for reliability, safety, and compliance with relevant standards, and they help establish benchmarks and guidelines that AI developers and users can refer to for quality, safety, and ethics. By obtaining a certification, organizations demonstrate their commitment to responsible AI development and deployment, and users can be confident that the AI systems they rely on have undergone rigorous testing and evaluation.

The first step in certification is to classify which AI programs need certification and which do not. For example, AI models deployed in critical applications, such as in the medical industry, should undergo certification before use. Transparency, traceability, verifiability, and accountability are some of the factors that should be considered in certification. Though no widely acknowledged certification bodies exist in this space yet, laws and regulations are fast catching up.
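
As a starting point for that classification step, here is a minimal sketch that routes hypothetical AI systems toward certification based on domain and impact. The risk tiers, fields, and examples are illustrative assumptions, not drawn from any particular law or certification scheme.

```python
# Illustrative sketch: triage which AI systems must go through certification
# before use. Domains, fields, and the rule are assumptions for the example.

from dataclasses import dataclass

HIGH_RISK_DOMAINS = {"healthcare", "finance", "hiring", "legal"}

@dataclass
class AISystem:
    name: str
    domain: str                # e.g., "healthcare", "marketing"
    affects_individuals: bool  # output directly affects a person's rights, money, or safety

def needs_certification(system: AISystem) -> bool:
    """Critical domains or direct impact on individuals routes a system to certification."""
    return system.domain in HIGH_RISK_DOMAINS or system.affects_individuals

systems = [
    AISystem("diagnosis-assistant", "healthcare", affects_individuals=True),
    AISystem("ad-copy-generator", "marketing", affects_individuals=False),
]

for s in systems:
    verdict = "certify before use" if needs_certification(s) else "standard review"
    print(f"{s.name}: {verdict}")
```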

3. Responsible AI Use

Responsible use of generative AI refers to using the technology in a way that considers ethical implications and safeguards against potential risks. It ensures that AI models are properly benchmarked, evaluated, and tested before deployment while meeting industry standards and regulations. 

Companies should adopt cautious measures and use a checklist to ensure responsible AI use. This includes considering ethical guidelines, training data quality, and potential limitations of AI systems. By taking these precautions, companies can minimize potential negative impacts and promote responsible deployment of AI.
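
One lightweight way to make such a checklist enforceable is to encode it as a release gate that blocks deployment until every item is addressed. The sketch below assumes an illustrative set of checklist items; a real checklist would reflect your own policies, industry, and legal obligations.

```python
# Sketch of a responsible-AI checklist encoded as a release gate: nothing ships
# until every item is answered. The items are illustrative, not an
# authoritative or complete standard.

CHECKLIST = {
    "ethical_guidelines_reviewed": False,     # reviewed against the company's AI ethics policy
    "training_data_quality_assessed": False,  # provenance, coverage, and known gaps documented
    "bias_testing_completed": False,          # outputs checked across demographics and regions
    "limitations_documented": False,          # known failure modes disclosed to users
    "privacy_and_security_reviewed": False,   # handling of sensitive data signed off
}

def release_gate(checks: dict[str, bool]) -> bool:
    """Block deployment until every checklist item has been completed."""
    missing = [item for item, done in checks.items() if not done]
    if missing:
        print("Deployment blocked. Outstanding items:", ", ".join(missing))
        return False
    print("All responsible-AI checks passed; clear to deploy.")
    return True

release_gate(CHECKLIST)
```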

Responsible usage entails addressing issues such as bias, explainability, and privacy and being transparent about the capabilities and limitations of AI systems. 

However, testing for bias is complex and nuanced. It takes insights from real users to assess the generated content’s quality, uniqueness, and relevance and to confirm it meets the desired objectives. Real users validate the accuracy, coherence, and appropriateness of generative natural language processing (NLP) models, ensuring they interact effectively with users and provide valuable, personalized insights. By leveraging a diverse community of real users, enterprises can check that generated content or recommendations are unbiased across demographics, cultures, and geographical regions.
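
A programmatic probe can complement that human review. The sketch below, a minimal example built on assumed placeholders, runs one prompt template across demographic and regional variants and compares a score per group; ask_model() and score_response() stand in for your own model client and evaluation method, whether automated or based on real-user ratings.

```python
# Hypothetical bias-probe harness: send the same prompt template across
# demographic and regional variants and compare a quality score per group.
# ask_model() and score_response() are placeholders for your own tooling.

TEMPLATE = "Write a short loan-approval letter for {name}, a customer based in {region}."

VARIANTS = [
    {"name": "Amara Okafor", "region": "Lagos"},
    {"name": "Erik Johansson", "region": "Stockholm"},
    {"name": "Mei Lin", "region": "Singapore"},
]

def ask_model(prompt: str) -> str:
    """Placeholder: call the generative model under test."""
    raise NotImplementedError("wire this to your model client")

def score_response(text: str) -> float:
    """Placeholder: rate tone/quality, e.g., via a rubric or real-user panel."""
    raise NotImplementedError("plug in your scoring method")

def probe() -> dict[str, float]:
    """One score per variant; large gaps between groups warrant investigation."""
    return {v["name"]: score_response(ask_model(TEMPLATE.format(**v))) for v in VARIANTS}
```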

Committing to testing and improving the quality of outputs can help mitigate reputation-damaging situations and ensure great customer experiences that win over your customers’ trust and continued loyalty to your brand.

How are you testing the reliability of your generative AI? Why is responsible AI crucial for your business? Let us know on Facebook, X, and LinkedIn. We’d love to hear from you!

Adonis Celestine

Senior Director & Automation Practice Leader, Applause

Adonis Celestine is the Director of Automation Testing with Applause. With nearly two decades of software testing experience, Adonis is an automation expert, QA thought leader, published author and regular speaker at leading software testing conferences. Quality engineering is a passion for Adonis, and he believes that true quality is a collective perspective of customer experiences, and that the natural evolution of QA is shaped by data and machine intelligence.