ZDNET’s key takeaways
- The CNCF is bullish about cloud-native computing working hand in glove with AI.
- AI inference is the technology that will make hundreds of billions for cloud-native companies.
- New kinds of AI-first clouds, such as neoclouds, are already appearing.
At KubeCon North America 2025 in Atlanta, leaders of the Cloud Native Computing Foundation (CNCF) predicted an enormous surge in cloud-native computing, driven by the explosive growth of AI inference workloads. How much growth? They’re predicting hundreds of billions of dollars in spending over the next 18 months.
AI inference is the process by which a trained large language model (LLM) applies what it has learned to new data to make predictions, decisions, or classifications. In practical terms, the process goes like this: after a model is trained, say, the new GPT-5.1, it is used during the inference phase, where it analyzes data (like a new image) and produces an output (identifying what’s in the image) without being explicitly programmed for each fresh image. These inference workloads bridge the gap between LLMs and the AI chatbots and agents that rely on them.
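To make the idea concrete, here is a minimal sketch of the inference step in Python. It assumes the Hugging Face transformers library and a small, already-trained open-source sentiment model rather than a giant LLM; the library, model name, and task are illustrative choices, not anything specified by the CNCF, and text classification stands in for the image example above simply because it needs no image file.

```python
# Minimal illustration of inference: a model that has already been trained
# is handed new data and produces predictions, with no task-specific code.
# Assumes the Hugging Face `transformers` package is installed; the model
# below is one example of a small, fine-tuned open-source model.
from transformers import pipeline

# Load a small pre-trained sentiment model (the expensive training already happened).
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Inference: apply the trained model to input it has never seen before.
new_inputs = [
    "The new release is fast and the rollout went smoothly.",
    "Latency spiked again after the last deploy.",
]
for text, prediction in zip(new_inputs, classifier(new_inputs)):
    print(f"{text!r} -> {prediction['label']} ({prediction['score']:.2f})")
```

The point is the division of labor: all of the costly learning happens up front during training, and inference is just the trained model answering on fresh input.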
Also: Kubernetes, cloud-native computing’s engine, is getting turbocharged for AI
CNCF Executive Director Jonathan Bryce explained in a KubeCon press conference that AI inference is “a stage where you take that model, you serve the model, and you answer questions, you make predictions, you feed it into systems to take that intelligence and connect it out to the world.” He emphasized that inference involves transforming a trained AI model into a service that can respond to new questions or situations.
Making an LLM is mind-bogglingly expensive. According to Bryce, Sam Altman, OpenAI’s CEO, has said that GPT-5 training runs may cost up to a billion dollars. Fortunately, said Bryce, most companies don’t need to build massive LLMs, nor should they even try. Instead, they should use “hundreds of smaller, fine-tuned, open-source models for specific tasks, such as sentiment analysis, code gen, and contract review.” Additionally, they should use inference to maximize the benefits of their LLMs and smaller models.
Bryce continued that there are dozens of inference engines. In particular, a new wave of cloud-native inference engines is emerging. These engines include KServe, NVIDIA NIM, Parasail.io, AIBrix, and llm-d. What they all have in common is that these platforms deploy, manage, and scale AI in production using containers and Kubernetes.
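Bryce’s description of inference as “serving the model” comes down to a familiar pattern: put a trained model behind an endpoint that answers requests. Below is a toy sketch of that pattern in Python with FastAPI, chosen purely for illustration; it is not how KServe, NIM, AIBrix, or llm-d are built, and the framework, route name, and model are assumptions. What the cloud-native engines add is everything a toy server lacks: containerized rollout, autoscaling, GPU scheduling, and model versioning on Kubernetes.

```python
# Toy "model as a service" sketch: a trained model behind an HTTP endpoint.
# Assumes `fastapi`, `uvicorn`, and `transformers` are installed; the engines
# named above deliver this pattern at production scale on Kubernetes.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the trained model once at startup, then reuse it for every request.
classifier = pipeline("sentiment-analysis")

class Query(BaseModel):
    text: str

@app.post("/v1/predict")
def predict(query: Query) -> dict:
    # Inference on demand: the model scores input it has never seen before.
    result = classifier(query.text)[0]
    return {"label": result["label"], "score": float(result["score"])}

# Run locally with: uvicorn inference_service:app --port 8080
```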
Also: Why even a US tech giant is launching ‘sovereign support’ for Europe now
According to the CNCF, these specialized inference models offer users multiple benefits. These include:
- Cost-effectiveness: Vastly cheaper to operate and fine-tune.
- Performance: Faster and often more accurate for a specific domain.
- Cheaper hardware: They do not require the largest, latest, and most scarce GPUs for inference work.
- Security and privacy: They can be self-hosted, on-prem, or in the cloud.
Cloud-native computing and AI inference converge at the point where AI is no longer a separate track from cloud-native computing. Instead, AI workloads, particularly inference tasks, are fueling a new era in which intelligent applications require scalable, reliable infrastructure.
That era is unfolding because, said Bryce, “AI is moving from a few ‘Training supercomputers’ to widespread ‘Enterprise Inference.’ This is fundamentally a cloud-native problem. You, the platform engineers, are the ones who will build the open-source platforms that unlock enterprise AI.”
Also: Coding with AI? My top 5 tips for vetting its output – and staying out of trouble
“Cloud native and AI-native development are merging, and it’s really an incredible place we’re in right now,” said CNCF CTO Chris Aniszczyk. The data backs up this opinion. For example, Google has reported that its internal inference jobs have processed 1.33 quadrillion tokens per month recently, up from 980 trillion just months before.
Indeed, there’s a new kind of cloud, known as the neocloud, dedicated to AI. Neoclouds focus almost exclusively on delivering GPU-as-a-Service (GPUaaS), bare-metal performance, and infrastructure explicitly optimized for AI training and, crucially, inference.
Aniszczyk added that cloud-native projects, especially Kubernetes, are adapting to serve inference workloads at scale: “Kubernetes is obviously one of the leading examples as of the last release … the dynamic resource allocation feature enables GPU and TPU hardware abstraction in a Kubernetes context.”
To better meet the demand, the CNCF announced the Certified Kubernetes AI Conformance Program, which aims to make AI workloads as portable and reliable as traditional cloud-native applications.
Also: Enterprises are not prepared for a world of malicious AI agents
“As AI moves into production, teams need a consistent infrastructure they can rely on,” Aniszczyk stated during his keynote. “This initiative will create shared guardrails to ensure AI workloads behave predictably across environments. It builds on the same community-driven standards process we’ve used with Kubernetes to help bring consistency as AI adoption scales.”
What all this effort means for business is that AI inference spending on cloud-native infrastructure and services will reach into the hundreds of billions within the next 18 months. That investment will come, CNCF leaders predict, as enterprises race to stand up reliable, cost-effective AI services. They’re not the only ones seeing this trend. Dominic Wilde, SVP at Kubernetes distribution company Mirantis, said in an interview that Inference-as-a-Service cloud offerings will appear soon.
I think these experts are right. There is a natural synergy between AI and cloud-native computing. This connection, in turn, means businesses that can make the best use of the pairing can expect to profit whether they offer cloud-native/AI services or use them to enhance their own business plans.
