For the past two years, the conversation around AI has focused almost exclusively on training.
To build massive large language models (LLMs), companies poured resources into raw compute. This training phase favored hyperscale deployments in remote regions where land was cheap, power was abundant, and latency didn’t matter.
But as 2026 approaches, that phase is ending.
The models are largely built. Now the race is about deploying them. And that shift changes the conversation from training to inference.
Inference is where AI meets the real world: executing trades, personalizing media, analyzing healthcare data, and responding to users in real time. Unlike training, inference is continuous, latency-sensitive, and tightly coupled to business outcomes. As a result, it is reshaping not just how data centers are designed, but where they must exist, especially in dense urban markets like New York City.
AI’s Move to the City
The infrastructure that built the models is not the infrastructure required to use them.
AI training is a batch process and can be done anywhere. Inference is a real-time process that demands proximity. Milliseconds matter when models are embedded in financial systems, healthcare platforms, media delivery, or customer-facing applications.
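To make the proximity argument concrete, here is a minimal back-of-envelope sketch in Python. The figures are illustrative assumptions rather than measurements: light in fiber covers roughly 200 km per millisecond, and real routes run longer than straight-line distance.

```python
# Rough latency budget sketch. The figures below are illustrative
# assumptions, not measurements: light in fiber travels at roughly
# 200,000 km/s, and real routes are rarely straight lines.

FIBER_SPEED_KM_PER_MS = 200.0   # ~200 km of fiber per millisecond
ROUTE_OVERHEAD = 1.5            # assumed detour factor for real fiber paths

def round_trip_ms(distance_km: float) -> float:
    """Estimate round-trip propagation delay over fiber, ignoring
    switching, queuing, and model execution time."""
    one_way = (distance_km * ROUTE_OVERHEAD) / FIBER_SPEED_KM_PER_MS
    return 2 * one_way

# A user in Manhattan reaching a metro facility (~10 km) vs. a remote
# region (~800 km): propagation alone changes the budget dramatically.
for label, km in [("metro colocation", 10), ("remote hyperscale", 800)]:
    print(f"{label}: ~{round_trip_ms(km):.2f} ms round trip before any compute")
```

Propagation delay alone, before any switching or model execution, can consume most of a single-digit-millisecond budget when the facility sits hundreds of kilometers from the user.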
The need to execute complex logic as close to the end user as possible is driving a migration of AI workloads back toward the edge (specifically into dense urban hubs like Brooklyn). For CIOs and infrastructure leaders, this means the days of treating location as an afterthought are over.
Why 100 kW Is Becoming the Baseline
As AI workloads return to the city, they are bringing hyperscale-level power demands with them.
In traditional colocation environments, a standard cabinet might draw between 4 and 8 kW. That model breaks down quickly when inference clusters are built around advanced GPUs running continuously, not intermittently. Even facilities considered high density often top out well below what modern AI hardware requires. At DataVerge, we are actively preparing for customer deployments ranging from 100 kW to 1 MW, often delivered on compressed timelines.
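A rough sketch of the arithmetic, using assumed figures (a nominal 10 kW eight-GPU inference server, ten servers per rack, and a modest overhead factor; actual draw varies by vendor and configuration), shows how quickly a single inference rack outgrows a legacy cabinet:

```python
# Back-of-envelope rack power sketch. All figures are assumptions for
# illustration; actual server and GPU power draw varies by vendor and SKU.

LEGACY_CABINET_KW = 8          # upper end of a traditional colo cabinet
SERVER_KW = 10.0               # assumed draw of one 8-GPU inference server
SERVERS_PER_RACK = 10          # assumed dense configuration
OVERHEAD = 1.1                 # assumed 10% for fans, power conversion, etc.

rack_kw = SERVER_KW * SERVERS_PER_RACK * OVERHEAD
print(f"Estimated inference rack draw: ~{rack_kw:.0f} kW")
print(f"Equivalent legacy cabinets:    ~{rack_kw / LEGACY_CABINET_KW:.0f}")
```

Under those assumptions, one inference rack draws on the order of 110 kW, the equivalent of roughly fourteen traditional cabinets.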
This isn’t simply an incremental increase in power consumption. It is a physical transformation inside the data center. Sustained high-density inference generates immense heat at the rack level. Facilities that were not purpose-built for 50 kW and beyond per cabinet, with localized cooling and heat capture at the source, cannot support this next phase, regardless of their marketing claims.
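The thermal side follows directly from the electrical side, since essentially every watt a rack draws must be rejected as heat. Using standard conversions (1 kW ≈ 3,412 BTU/hr; one ton of refrigeration = 12,000 BTU/hr) and the assumed rack figures from the sketch above:

```python
# Cooling load sketch: essentially every watt an inference rack draws
# must be removed as heat. The conversions are standard; the rack power
# figures are assumptions carried over from the example above.

BTU_PER_HR_PER_KW = 3412     # standard conversion
BTU_PER_HR_PER_TON = 12000   # one ton of refrigeration

def cooling_tons(rack_kw: float) -> float:
    """Tons of cooling needed to reject a rack's sustained heat output."""
    return rack_kw * BTU_PER_HR_PER_KW / BTU_PER_HR_PER_TON

for kw in (8, 50, 100):
    print(f"{kw:>3} kW rack -> ~{cooling_tons(kw):.1f} tons of continuous cooling")
```

A 100 kW rack needs roughly 28 tons of continuous cooling, an order of magnitude more than a legacy cabinet, which is why that heat has to be captured at the source rather than pushed into the room.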
The Power Constraint That Will Define 2026
The most significant constraint facing AI infrastructure in 2026 will not be silicon availability, but power delivery at the facility level.
AI inference workloads introduce sustained, high-density demand into environments that were never designed for it. Even when cooling and rack design are sufficient, the grid itself becomes the bottleneck: AI adoption is accelerating faster than utility infrastructure can expand. Grid upgrades operate on three- to five-year timelines, governed by regulatory processes and physical construction. That pace is fundamentally misaligned with how quickly AI workloads are moving into production.
For urban data centers, this creates a hard ceiling on growth. Facilities that rely exclusively on the public grid will struggle to support the power densities now required for modern AI inference, regardless of available space or network connectivity.
To overcome this limitation, forward-looking data centers are turning to modular power strategies. On-site solutions such as fuel cells and microgrids allow facilities to supplement grid capacity and scale power incrementally. This approach enables data centers to support high-density deployments today while longer-term utility expansions remain in progress.
Speed as the Real Competitive Advantage
Companies deploying AI inference are no longer operating on exploratory timelines. Increasingly, infrastructure decisions are driven by urgency. That means infrastructure partners are being evaluated not only on technical capability, but on readiness: available space, cooling systems designed for high density, and power that can be delivered immediately.
Models that sit idle don’t create value. Companies that delay deployment risk losing ground to competitors who can operationalize faster. As a result, traditional infrastructure timelines (18 months for new capacity, phased rollouts tied to future upgrades, etc.) are no longer acceptable. We increasingly see requests for large, high-density environments that customers need by the end of the quarter.
Looking Ahead
For infrastructure leaders, the key question is no longer simply how much space is required. It is whether existing facilities are prepared for the density, proximity, and speed that AI inference demands.
At DataVerge, we are building for that reality right here in New York.