AWS Sets the Stage for Generative AI Models & Clean Rooms
AWS’s Clean Rooms updates usher in a new era for generative AI models, fostering collaboration through automated governance layers.
At its re:Invent conference in Las Vegas on Nov. 30, Amazon Web Services (AWS) announced new capabilities that let customers share machine learning models or access a cloud-hosted industry model via AWS Clean Rooms.
AWS announced Clean Rooms one year ago and made it generally available in March as a data collaboration initiative for industries, allowing organizations to share data through a governance layer. It is now adding machine learning models to the mix with AWS Clean Rooms ML. Organizations can train a model with data that would normally be too sensitive to share with another organization, sanitizing it to prevent sensitive information from leaking while still finding mutual benefit in predictive insights. Clean Rooms ML is a preview release available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Seoul, Singapore, Sydney, Tokyo) and Europe (Frankfurt, Ireland, London) regions.
Organizations are already keen to use generative AI models specific to their industry. In our Future of IT survey, we asked IT leaders about their interest in the different varieties of generative AI. Industry-specific models drew the most interest, along with text-based models such as ChatGPT (which has added multi-modal support since we conducted the survey). These were followed by code-based models, interfaces, speech and audio, and then visual media. We received 518 responses from IT leaders worldwide, with our survey conducted from June to August 2023.
Adam Solomon, head of business development, AWS Data Collaboration Apps, teed up the general use case in a chalk talk at re:Invent.
“I have data, you have data, we want to collaborate across our datasets and perform some type of analysis, but I don’t want to reveal the granular contents around your data to me,” he explains. “AWS Clean Rooms can help with such analysis.”
Going further, AWS announced its plans to release a healthcare model, the first of many models to be supported in 2024. It’s a strategy that follows AWS’s cloud services approach, with industry-specific cloud services made available and often supported by a flagship customer in the category. Current customers of Clean Rooms ML include credit data firm Experian, The Weather Company, marketing platform Bridge, and consumer purchase insights provider Affinity Solutions.
IT Leaders Expect Industry-specific Models to Create More Business Value
According to the results of Info-Tech Research Group’s Future of IT survey, conducted online between June and August of 2023, IT leaders rated industry-specific generative AI models as the most interesting, tied with text-based models such as ChatGPT (the survey was run before ChatGPT was updated to include multi-modal support). The third-most interesting type of generative AI was code, followed by interfaces.
Image source: Info-Tech Research Group.
The concept of shared data stores extends back to pre-digital eras when companies would keep physical records on-site and allow their review by regulators and partners for reasons including compliance and merger and acquisition considerations. Professionals had to sign non-disclosure agreements about using the data found therein. In the digital age, one primary use of shared data stores is to get better customer insights and create advertising markets. For example, since 2022, Google has offered its Publisher Advertiser Identity Reconciliation (PAIR) solution to reconcile first-party customer data between publishers and advertisers.
Solutions more akin to AWS Clean Rooms can be found in SoftwareReviews’ Analytical Data Store quadrant. For example, Snowflake is a dedicated cloud data platform used by customers pursuing a multi-cloud strategy and offers a Global Data Clean Room for its customers to collaborate. Cloudera’s open data lakehouse supports data sharing between customers, using AWS as its backend system.
AWS Architecture to Facilitate Cross-organizational Data and Model Collaboration on Clean Rooms
Image source: AWS
AWS says Clean Rooms offers multiple layers of privacy protection. A company starts by isolating the data it wants to share, then selects who it wants to collaborate with, determines what data those partners are allowed to analyze, and finally decides what protections are applied to the outputs. AWS provides encryption for all data stored in Clean Rooms but says it is ultimately up to its customers not to include data from end customers who have not consented to sharing their data with a third-party provider.
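To make those layers concrete, here is a minimal Python sketch of the idea. It is a toy model only: the names ALLOWED_COLLABORATORS, ALLOWED_COLUMNS, MIN_GROUP_SIZE and overlap_counts are hypothetical and are not part of the AWS Clean Rooms API, and the minimum group size is just one assumed example of an output protection.

```python
from collections import defaultdict

# Hypothetical toy model of the layers described above; none of these
# names come from the AWS Clean Rooms API.
ALLOWED_COLLABORATORS = {"retailer", "advertiser"}  # who may run an analysis
ALLOWED_COLUMNS = {"region"}                        # what may be analyzed
MIN_GROUP_SIZE = 3                                  # protection on outputs


def overlap_counts(requester, own_rows, partner_rows, group_by):
    """Count customers present in both datasets, grouped by one column.

    Rows are dicts with a "customer_id" key. Only aggregate counts leave
    this function, and groups smaller than MIN_GROUP_SIZE are suppressed.
    """
    if requester not in ALLOWED_COLLABORATORS:
        raise PermissionError(f"{requester} is not part of this collaboration")
    if group_by not in ALLOWED_COLUMNS:
        raise PermissionError(f"column {group_by!r} is not allowed for analysis")

    partner_ids = {row["customer_id"] for row in partner_rows}
    counts = defaultdict(int)
    for row in own_rows:
        if row["customer_id"] in partner_ids:
            counts[row[group_by]] += 1

    # Drop small groups so that no individual can be singled out.
    return {group: n for group, n in counts.items() if n >= MIN_GROUP_SIZE}


# Made-up example data: customers 0-4 are in "east", 5-7 in "west".
retailer_rows = [
    {"customer_id": i, "region": "east" if i < 5 else "west"} for i in range(8)
]
advertiser_rows = [{"customer_id": i} for i in (0, 1, 2, 3, 6)]

result = overlap_counts("retailer", retailer_rows, advertiser_rows, "region")
print(result)  # {'east': 4}
```

Four east-region customers appear in both datasets, so that count is returned. The single overlapping west-region customer is suppressed, which is the kind of output protection that keeps granular records from being inferred.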
Once the architecture layers are in place, the shared data is available through a persistent, subscription-based offering or via a consumption-based API.
New Choices Await CIOs Charting Their AI Roadmap
Data is the fuel for machine learning models, including new generative AI models. Over the next year, organizations will need to create a data strategy to harvest their own proprietary data to customize and pre-train foundation models and seek other external data sources to complement that effort. Working with other organizations on creating a shared model for a specific purpose is one path to creating value. Consider what has already been done in the airline industry, where an airline-owned data broker creates custom offers personalized to each customer, boosting sales for airlines that share their shopping data.
Where to build these data alliances is the question. AWS offers a native way to connect all of its customers and is announcing its intent to take the lead in creating industry collaborations. That could give AWS a first-mover advantage if it pools the best data sets and trains the best industry models.
A downside for adopters will be vendor lock-in. Maybe you can take your data out of AWS, but you won’t be able to move the data or models inside of a Clean Room data store. Perhaps it can be integrated through a middleware layer, but this adds complexity. This will have CIOs who are pursuing multi-cloud strategies thinking carefully about where they want to source this capability and whether it should be from a multi-cloud gateway.
Sharing data directly should only be considered by organizations with mature data governance in place.