Enterprises Lead The Next Phase Of AI By Squeezing More From Existing GPUs

The Data Wire - News Team

June 12, 2026

Bijit Ghosh, board chair at Atlas Cloud, made the case for a control plane that coordinates compute, memory, and data over the systems enterprises already run.

When companies moved AI agents into real production work, the agents quickly overwhelmed infrastructure that was never built for them. A single employee could have five or 10 agents running at once, each one firing off far more requests than a person ever would. Servers tuned for steady, human-paced traffic buckled, response times slipped, and the GPU bill climbed. By the time most teams noticed, the systems running beneath the model had become the bottleneck.

It was the kind of breakdown Bijit Ghosh had spent much of his career watching unfold. Chair of the Board of Directors at Atlas Cloud, a model-agnostic AI cloud built for high-trust industries, he had formed his view of enterprise AI over two decades in financial services. An advisor to Kubex, a platform for optimizing Kubernetes and GPU workloads, he had previously led AI, cloud, and data platform engineering as a managing director at Wells Fargo, served as a global cloud and AI/ML CTO at Deutsche Bank, and helped run payments technology at BNY Mellon.

From that vantage point, he kept returning to one conclusion about where the answer actually lay. "The real problem is getting better yield and efficiency out of the infrastructure you already own," Ghosh said.

Paying for idle GPUs

For years, the hard part of AI was training, the work of gathering enough compute to build a model in the first place. That has changed. The bigger cost now is inference, the day-to-day work of running models in production, and agents run it constantly, often several calls deep for a single task. That load looks nothing like what most infrastructure was built to handle, and the gap shows up first in the economics. Companies wait months for new GPUs and pay a premium for the ones they secure, yet those machines run far below capacity, with average utilization across enterprise fleets in the single-digit percentages. The shortage is genuine, but distribution matters just as much. Unable to route enough work to the GPUs they have, firms pay for compute that sits largely idle.

That allocation problem comes down to scheduling. Older systems hand out compute in fixed blocks, an approach that works when demand stays steady and a job either runs or it does not. Agent traffic behaves the opposite way, spiking and dropping minute to minute, and fixed scheduling cannot keep pace. "Traditional systems like legacy applications and relational databases such as Postgres rely on static Kubernetes or bare-metal scheduling. As you move to the top of the stack where we need everything in real time, our traffic patterns are changing because now we have dedicated agents running," Ghosh said. The goal was to release compute as fast as the agents consumed it.
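The gap between fixed and demand-driven allocation can be shown with a little arithmetic. The sketch below is illustrative only: the demand figures are invented, and the "elastic" case assumes an idealized scheduler that matches demand exactly each minute, which no real system does.

```python
def utilization(demand, capacity):
    """Fraction of allocated capacity actually used over the trace."""
    used = sum(min(d, c) for d, c in zip(demand, capacity))
    return used / sum(capacity)

# Hypothetical per-minute GPU demand for a bursty agent workload.
demand = [1, 1, 8, 1, 1, 1, 8, 1]

# Fixed scheduling: reserve enough for the peak and hold it the whole time.
fixed = [max(demand)] * len(demand)

# Elastic scheduling (idealized): allocation tracks demand minute by minute.
elastic = list(demand)

fixed_util = utilization(demand, fixed)      # 22 used out of 64 reserved
elastic_util = utilization(demand, elastic)  # 22 used out of 22 reserved

print(f"fixed: {fixed_util:.0%}, elastic: {elastic_util:.0%}")
```

In this toy trace the fixed reservation is used about a third of the time, because capacity sized for the spikes sits idle between them.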

The context tax

Allocating compute well goes only so far. A large share of the spend covers work the system has already done. Anyone who runs an AI coding assistant hard has seen the pattern. A few hours into heavy use, the tool warns that the token allowance is almost gone.

The remedy sat one layer above the model. "Every time an engineer or a product team opens a tool like Codex or Claude Code, it starts the interaction from scratch. The cache logic has to change. With KV-cache reuse, we absorb context across sessions so it isn't reevaluated from the beginning, and we can route to the model with the lowest cost per token," Ghosh said.
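A toy model can make the two ideas in that quote concrete. It is a sketch under loud assumptions, not Atlas Cloud's implementation: real KV-cache reuse happens inside the inference server, while `PrefixCache` here only counts how many tokens would need fresh processing. The model names and prices are hypothetical.

```python
class PrefixCache:
    """Toy stand-in for KV-cache reuse: remembers token prefixes already processed."""

    def __init__(self):
        self._seen = set()

    def tokens_to_compute(self, tokens):
        # Find the longest prefix of this request that was processed before.
        cached = 0
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self._seen:
                cached = i
                break
        # Record every prefix of this request so later sessions can reuse it.
        for i in range(1, len(tokens) + 1):
            self._seen.add(tuple(tokens[:i]))
        return len(tokens) - cached


def cheapest_model(price_per_token):
    """Pick the model with the lowest cost per token (prices are hypothetical)."""
    return min(price_per_token, key=price_per_token.get)


cache = PrefixCache()
shared_context = ["repo", "README", "style-guide"]

first = cache.tokens_to_compute(shared_context + ["fix", "bug"])
second = cache.tokens_to_compute(shared_context + ["add", "test"])

choice = cheapest_model({"model-a": 3.0, "model-b": 0.5, "model-c": 1.2})
print(first, second, choice)
```

The second session shares its opening context with the first, so only the new tokens at the end count as fresh work.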

Caching lets the system hold onto context instead of recomputing it for every request. The harder problem is getting the right context to an agent in the first place, and that grows tougher when one job is split across several agents. "The whole economy of context comes down to one thing. Better context produces a better answer. In our SDLC, separate agents handle PRD reviews, writing code, and testing, and each works across a different context. As the work changes in the system of record, we have to refresh that context to keep them current," Ghosh said.
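One simple way to picture that refresh is a version counter on the system of record. The sketch below is an assumption for illustration, not a description of Ghosh's setup: `SystemOfRecord` and `AgentContext` are hypothetical names, and each agent refreshes its own slice of context only when the record has changed.

```python
class SystemOfRecord:
    """Hypothetical store that bumps a version number on every change."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def update(self, key, value):
        self.data[key] = value
        self.version += 1


class AgentContext:
    """Hypothetical per-agent context: a snapshot of only the keys it needs."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = keys
        self.seen_version = -1
        self.snapshot = {}

    def is_stale(self, record):
        return self.seen_version != record.version

    def refresh(self, record):
        self.snapshot = {k: record.data.get(k) for k in self.keys}
        self.seen_version = record.version


record = SystemOfRecord()
record.update("prd", "v1 requirements")

tester = AgentContext("tester", ["prd"])
tester.refresh(record)
stale_before = tester.is_stale(record)

record.update("prd", "v2 requirements")  # the work changes in the system of record
stale_after = tester.is_stale(record)
if stale_after:
    tester.refresh(record)
```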

Postgres holds the line

That system of record was usually Postgres, the relational database that quietly ran a large share of enterprise workloads and stored a company's most sensitive data. Decades of hardening had made it reliable and exact, the reason so many companies built on it. The trouble was that agents had started asking it to do a different job. "As you're scaling AI workloads that need more semantic search, embeddings, long-context memory, and real-time retrieval with agent reasoning, Postgres cannot extend to that functionality. That's the bottleneck," Ghosh said.

The reflex was to solve it with a migration, swapping the old database for a newer platform built for AI. Ghosh pushed back on the idea. Tearing out a working system carried enormous risk, and most of what made data useful to an agent did not require moving it at all. "As long as you have the data product defined, it supersedes what is needed as part of your layer stack. So rather than completely restacking and moving Postgres to a shiny lakehouse, we need to find a hybrid model where Postgres and unstructured data are working cohesively, simultaneously, as a runtime layer," Ghosh said.

The better path left the database in place and built around it, adding a layer that handled semantic search and retrieval while the original system kept doing what it did well. The data stayed trustworthy and became usable for agents at the same time, without betting the business on a rebuild.
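That arrangement can be sketched in a few lines. Everything here is a stand-in chosen so the example runs on its own: an in-memory SQLite table plays the role of Postgres, the sample rows are invented, and the "embedding" is a plain bag-of-words count where a real retrieval layer would use a learned model.

```python
import math
import sqlite3
from collections import Counter

# System of record: an in-memory SQLite table stands in for Postgres here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, body TEXT)")
db.executemany(
    "INSERT INTO tickets (id, body) VALUES (?, ?)",
    [
        (1, "payment failed with card declined error"),
        (2, "cannot reset account password"),
        (3, "refund for duplicate payment charge"),
    ],
)


def embed(text):
    """Toy embedding: word counts (an assumption, not a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Retrieval layer: derived from the records but kept outside the table.
index = {rid: embed(body) for rid, body in db.execute("SELECT id, body FROM tickets")}


def search(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda rid: cosine(q, index[rid]), reverse=True)
    # Return the authoritative rows from the system of record, looked up by id.
    return [
        db.execute("SELECT id, body FROM tickets WHERE id = ?", (rid,)).fetchone()
        for rid in ranked[:k]
    ]


best = search("duplicate payment refund")[0]
row_count = db.execute("SELECT COUNT(*) FROM tickets").fetchone()[0]
```

The table is never altered or moved. The side index only ranks candidates, and the final answer is still read from the original rows.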

The case for a control plane

Each of those fixes solved one piece of the problem. Tying them together called for something new. Ghosh called it a control plane, a single tier that sits over the whole arrangement and separates the systems that store the truth from the systems that act on it. "The good architecture should be a distributed architecture where we have a system of record, but at the same time a system of action. We have an AI data layer where we are doing constant retrieval, and we have a harness where the mechanics of combining memory, inference, optimization, and orchestration with your agents all work in a cohesive form," Ghosh said.

The point is to take the moment-to-moment decisions off human operators. Instead of engineers adjusting each part by hand, the system makes those calls itself, keeping data, compute, and context aligned as the work shifts underneath it. The control plane is becoming a permanent tier of the stack rather than a temporary fix that folds back into the database over time. As agents multiply and the cost of running them keeps rising, the alternatives are thin. "This control plane logic is already conceptualized in the industry, and it's becoming the platform for coordinating the model, Postgres, context, routing, data locality, economics, governance, and agents. Instead of humans tuning all of it by hand, the control plane bakes everything together," Ghosh said.