How to Scale Generative AI Without Hurting the Bottom Line

Managing inference costs and reducing the carbon footprint are among the biggest challenges for organizations scaling generative AI deployments.

June 7, 2023

Generative AI

Generative AI models offer impressive outputs but demand enormous computing power and carry a massive carbon footprint. Managing their inference costs is a significant barrier for organizations seeking to adopt generative AI on a larger scale, says Yonatan Geifman, Ph.D., CEO of Deci.

The rapid advancement of generative AI has enabled the creation of visually stunning imagery and captivating texts. However, the high operational costs associated with scaling generative AI models – from the enormous computing power required to the resulting carbon footprint – pose a significant challenge for organizations and hinder widespread adoption. This article explores the unique characteristics of generative AI and the need to address inference costs for successful deployment and scalability.


The Challenge of Scaling Generative AI

New applications of generative AI models are bringing AI much closer to the consumer. But as helpful and entertaining as tools like Midjourney and ChatGPT are, they come at a significantly high operational cost – including the exorbitant capital needed to finance their demanding computing power and the resulting massive carbon footprint.

As is often the case early in the lifecycle of transformative technologies, these costs are proving to be a large barrier for organizations looking to deploy and scale generative AI, particularly as usage – and with it, inference volume – continues to grow. Several unique characteristics of generative AI models further highlight the challenge of scaling their inference cost:

1. Generative AI inference is more costly

Generative AI models are larger and more complex than “traditional” AI models. These new models are typically more general-purpose and must encode broad world knowledge, such as language and visual representations. Since they model such rich “knowledge,” they must be significantly larger than traditional discriminative models trained on a specific, narrow task.

In addition, generative AI models’ inference process is more complex than that of non-generative models. To generate one new sample, the model performs several inference iterations. For example, in text generation, the text is generated word by word, and each word the model produces requires a separate inference call. The combination of extremely large models and the iterative nature of the inference process results in significantly higher demand for compute power and overall inference costs.
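To make the iterative cost concrete, here is a minimal Python sketch of word-by-word generation. The forward_pass function is a hypothetical stand-in for a real model’s forward pass; the point is simply that every generated word triggers its own full inference call.

```python
import random

# Toy stand-in for a large generative model: each call to forward_pass()
# represents one full, expensive pass over the network (hypothetical).
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "<eos>"]

def forward_pass(context):
    """Hypothetical single inference call; returns the next word."""
    return random.choice(VOCAB)

def generate(prompt, max_new_words=50):
    words = list(prompt)
    calls = 0
    for _ in range(max_new_words):
        next_word = forward_pass(words)  # one inference call per generated word
        calls += 1
        if next_word == "<eos>":         # model signals it is done
            break
        words.append(next_word)
    print(f"{calls} inference calls to generate {len(words) - len(prompt)} words")
    return words

generate(["once", "upon", "a", "time"])
```

Generating a 500-word article with a model like this means roughly 500 separate passes over a network with billions of parameters – which is why the per-sample cost dwarfs that of a single-shot discriminative model.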

2. The cost per inference call is not fixed

In generative AI, the generated content (which depends on the user’s prompt) varies in size, so the cost per generation is not constant. For example, the cost of generating a 500-word article differs from that of generating a new image. This variability makes it difficult to estimate and predict the cost of running and scaling these models.
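A simple back-of-the-envelope model illustrates the point. The rates below are made-up placeholders, not real provider prices; the takeaway is that per-request cost scales with output size rather than being a flat fee.

```python
# Illustrative only: per-request cost depends on how much content is
# generated. These rates are assumed placeholders, not real prices.
COST_PER_OUTPUT_WORD = 0.00003  # assumed $/word
COST_PER_IMAGE = 0.02           # assumed $/image

def text_generation_cost(num_words: int) -> float:
    return num_words * COST_PER_OUTPUT_WORD

print(f"500-word article: ${text_generation_cost(500):.4f}")
print(f"50-word answer:   ${text_generation_cost(50):.5f}")
print(f"One image:        ${COST_PER_IMAGE:.2f}")
```

Because prompts (and therefore outputs) vary freely from request to request, the only reliable way to budget is per unit of generated content, not per call.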

3. The bigger the model, the higher the cost

Since every inference call requires substantial compute, maintaining high availability of the service means companies usually need to over-provision compute power to support unpredictable peaks in demand with low latency. This, in turn, drives infrastructure costs even higher.
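The arithmetic of over-provisioning is easy to sketch. Assuming illustrative (not benchmarked) figures for load, throughput, and GPU pricing, capacity sized for the worst-case burst sits mostly idle at average load:

```python
# All numbers are illustrative assumptions, not real benchmarks or prices.
AVG_LOAD = 40            # requests/second on a typical day
PEAK_LOAD = 200          # requests/second during an unpredictable burst
GPU_THROUGHPUT = 5       # requests/second one accelerator serves at low latency
GPU_HOURLY_COST = 2.50   # assumed $/GPU-hour

gpus_needed_avg = AVG_LOAD / GPU_THROUGHPUT    # 8 GPUs cover the average
gpus_provisioned = PEAK_LOAD / GPU_THROUGHPUT  # 40 GPUs to absorb the peak

monthly_cost = gpus_provisioned * GPU_HOURLY_COST * 24 * 30
avg_utilization = AVG_LOAD / PEAK_LOAD

print(f"Provisioned {gpus_provisioned:.0f} GPUs, though "
      f"{gpus_needed_avg:.0f} would cover the average load")
print(f"Monthly cost: ${monthly_cost:,.0f} "
      f"at {avg_utilization:.0%} average utilization")
```

Under these assumptions, the service pays for five times the capacity it uses on an average day – and the larger the model, the more expensive each of those idle accelerators becomes.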


Controlling Inference Cost Is Key to Product Profitability

Consider search: estimates suggest that if ChatGPT were inserted hastily into Google’s existing search business, there would be a $36 billion reduction in operating income – a major financial blow.

Meanwhile, conservative estimates place the cost OpenAI pays to run ChatGPT at around $100,000 daily – approximately $3 million monthly. But as many users have flocked to the service since its launch, analysts believe it could cost OpenAI $40 million per month to process the huge volume of prompts.

Mitigating inference costs is crucial to scaling generative AI models effectively in business applications.

But it’s not only the financial costs that should concern us – the carbon footprint of running large generative AI models is also substantial. Research has shown that the information and communications technology (ICT) sector accounts for approximately 2% of global CO2 emissions, and the infrastructure used to train and deploy machine learning models is a significant factor. As these models grow larger – the clear trend in generative AI – their carbon emissions grow with them.

Mitigating these costs and this carbon footprint is crucial to scaling generative AI effectively and safely – and it all starts with reducing the inflated size of the models themselves.
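Model compression techniques – pruning, distillation, quantization, and neural architecture search – all target this inflated size. As one illustration (a generic, widely used technique, not Deci’s specific method), PyTorch’s post-training dynamic quantization stores Linear-layer weights in int8 instead of float32, shrinking the serialized model roughly fourfold:

```python
import io

import torch
import torch.nn as nn

# Toy model standing in for a much larger network.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# Post-training dynamic quantization: Linear weights stored as int8.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Approximate serialized size of a model in megabytes."""
    buffer = io.BytesIO()
    torch.save(m.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6

print(f"fp32 model: {size_mb(model):.1f} MB")
print(f"int8 model: {size_mb(quantized):.1f} MB")  # roughly 4x smaller
```

Smaller weights mean less memory traffic and cheaper hardware per inference call; production-grade generative models typically combine several such techniques rather than relying on quantization alone.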


Small But Mighty

Running one all-encompassing generative AI model is far too expensive for organizations to scale effectively within their operations. Instead, it is in an organization’s best interest to onboard smaller, more specialized AI models optimized for specific applications.

The difference between the role of a baseball coach and his star pitcher is a useful analogy. The coach may have the greater wealth of expertise for managing an entire 28-person team throughout a game, but he would be far less suited to throwing a 90-mph fastball to strike out a batter. Similarly, while large, general-purpose models may be impressive in generating a wide range of outputs, there may be better solutions for specialized use cases.

Unleashing Generative AI’s Full Potential

Despite the groundbreaking potential large generative AI models have already demonstrated, their burdensome cost is a key barrier preventing organizations of all sizes from leveraging them to the fullest.

Enterprises using generative AI already know the pain points of scaling such tech. That’s why they are thinking about the cost of production from day one. 

Leaders and developers in the industry have a unique opportunity to develop AI models that incur lower computational, fiscal, and environmental costs. Only then can we make generative AI accessible and affordable enough to keep driving innovation and progress in the field.

What steps have you taken to scale the power of generative AI and transform your business? Please share your thoughts with us on Facebook, Twitter, and LinkedIn. We’d love to hear from you!

Image Source: Shutterstock


Yonatan Geifman
Yonatan Geifman is co-founder and CEO of Deci, the deep learning company building the next generation of AI. He co-founded Deci after completing his PhD in computer science at the Technion-Israel Institute of Technology. His research focused on making Deep Neural Networks (DNNs) more applicable for mission-critical tasks. Yonatan was also a member of Google AI’s MorphNet team where he worked on developing an automated machine learning algorithm based on neural network pruning. His research has been published and presented at leading conferences across the globe, including the Conference on Neural Information Processing Systems (NeurIPS) and the International Conference on Learning Representations (ICLR).