AI costs begin to bite as agents may increase token demand by 24 times, says Goldman…

Microsoft and Uber are both considering refining their AI strategies as costs mount.

Major tech companies are struggling to justify the skyrocketing cost of heavy AI usage, with even firms like Microsoft and Uber looking at changes to how they use AI. Following the recent viral post from Uber CTO Praveen Neppalli Naga that the company had blown through its entire 2026 AI budget in just a few months, Uber’s operations chief, Andrew Macdonald, said that token usage didn’t seem to correlate directly with useful consumer features.

Microsoft began revoking its developers’ access to the Claude Code programming assistant earlier this month, with plans to move them over to the internal Copilot CLI tool by June 30. Although that has been framed as consolidating its teams onto the tools it’s developing, the deadline also falls right at the end of Microsoft’s fiscal year, suggesting the move may also have been meant to cut costs before the new fiscal year begins.

Making matters worse, Goldman Sachs estimates that agentic AI could increase token use by over 24 times in just the next few years. There appears to be a growing disconnect between AI needs, AI wants, and the reality of what companies can actually afford as costs mount.

We’ve been hearing reports for months about how companies and CEOs are struggling to find the tangible benefit of heavy AI deployment. Uber appears to be the latest AI-boosting company to have this come-to-Jesus moment, following the CTO’s explosive claims of annual budgets being wiped out in mere months. In an interview with Business Insider, Andrew Macdonald lamented that there just wasn’t a clear correlation between the money Uber was investing in AI use and real consumer feature development.

Having talked to the senior engineers, he said there was no link between higher token usage and a proportional increase in consumer features with real benefits for their customers. Although he admitted more code was being shipped, it “was very hard to draw a line” between that and improvements in the software.

Meanwhile, after opening up Claude Code subscriptions to its workers in December last year, Microsoft is now clawing that access back in what many see as a financial move as much as a consolidation. Microsoft also recently announced that Copilot on GitHub would switch to token-based billing, as the cost of running the tool ballooned earlier this year.

A major reason for this is the explosive growth in agentic AI use. These agents can eat up more than 1,000 times the tokens of a single AI chatbot.

Nvidia CEO Jensen Huang famously said in March this year that if an Nvidia engineer on $500,000 a year wasn’t using at least $250,000 of tokens in that same period, he’d be alarmed. This isn’t a rare sentiment, though. Many company CEOs are now bragging about the extent of their AI use, as if that alone equates to performance increases.

As Business Insider reports, Airbnb’s CEO proudly told investors that 60% of the company’s code was now AI-generated. Chime claimed it was shipping 84% AI code earlier this year, and even Google is claiming 50% of its code is AI-generated (though crucially, always checked by a human engineer).

Yet these numbers sound very similar to those of Uber. The CTO’s shocking report of the runaway budget claimed that over 80% of Uber software engineers were using agentic AI, and that over 60% of the code was AI-generated. Even then, it’s not worth the cost.

And those costs can be extreme if the guardrails are removed. OpenClaw creator and now OpenAI employee, Peter Steinberger, recently announced his team of three people had spent over $1.3 million in tokens in a single month running a suite of agentic AI tools.

This very much reinforces the idea that the cost of AI is rising above that of the workers it’s supposed to be replacing. That makes many of the layoffs laid at the feet of AI efficiency and productivity increasingly shaky, unless these companies are simply racing to the bottom.
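As a rough back-of-the-envelope check on that comparison, the sketch below uses only the figures quoted above. Two things in it are assumptions for illustration and not claims from the reporting: that the $1.3 million was split evenly across the three people, and that the same monthly rate would continue for a full year. The function name is hypothetical.

```python
# Figures quoted in the article (all in USD).
TEAM_MONTHLY_TOKEN_SPEND = 1_300_000  # "over $1.3 million" in a single month
TEAM_SIZE = 3                         # Steinberger's team of three
ENGINEER_SALARY = 500_000             # Huang's example engineer, per year
HUANG_TOKEN_FLOOR = 250_000           # Huang's minimum expected token spend, per year


def annualized_spend_per_person(monthly_team_spend: float, team_size: int) -> float:
    """Annual token spend per person.

    ASSUMPTION: spend is split evenly and the monthly rate holds for 12 months.
    """
    return monthly_team_spend * 12 / team_size


per_person_year = annualized_spend_per_person(TEAM_MONTHLY_TOKEN_SPEND, TEAM_SIZE)
salary_multiple = per_person_year / ENGINEER_SALARY
floor_multiple = per_person_year / HUANG_TOKEN_FLOOR

print(f"Annualized token spend per person: ${per_person_year:,.0f}")
print(f"Multiple of a $500,000 salary: {salary_multiple:.1f}x")
print(f"Multiple of Huang's $250,000 floor: {floor_multiple:.1f}x")
```

Under those assumptions, each person’s token spend would run to several times the salary of the highly paid engineer in Huang’s example, which is the gap the paragraph above describes.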

Or at least racing to new hardware. Goldman Sachs’ recent AI agent report suggests that the massive efficiency gains coming from next-generation inferencing chips would make AI use so much cheaper that investment can continue unabated, and profit should follow, with AI agents increasing the revenue at AI companies enormously.

Nvidia will talk up its Vera Rubin platform at Computex and will officially launch it later this year. It improves AI performance by several times over, uses a new process node, and will reportedly offer as much as 10 times the performance per watt, making it dramatically more efficient than its predecessors.

Such huge gains would give the AI companies that first deploy these cards an enormous advantage over the companies still running Blackwell hardware, and even more so over older Hopper designs. But over 50% of the data center projects announced with Blackwell hardware in mind have been cancelled or delayed, and of those that do complete in the next year, just how keen are the developers going to be to replace those GPUs after they’ve barely gotten started?

In late 2025, Google, Oracle, and Microsoft all adjusted their hardware plans in the opposite direction entirely, suggesting they would run existing hardware for six years before replacing it. That seems impossible to square with ambitious AI plans and hardware leaps every year.

The reality is that, even as some token costs are falling, the explosion in the number of tokens agentic AI requires cannot be offset by hardware efficiency gains that are many years away from effective deployment, if they ever reach the scale needed to catch up with this ramp-up in AI demand.
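To make that arithmetic concrete, here is a minimal sketch that sets Goldman’s 24-times token estimate against the reported 10-times performance-per-watt figure for Vera Rubin. It rests on loud assumptions that are ours and not the source’s: performance per watt is treated as a stand-in for tokens served per unit of energy, older hardware is treated as the 1x baseline, and the 50% deployment share is a purely hypothetical illustration. The function name is also hypothetical.

```python
def net_energy_multiplier(demand_multiplier: float,
                          efficiency_gain: float,
                          fraction_upgraded: float) -> float:
    """Relative energy needed to serve the larger token demand.

    ASSUMPTION: energy per token falls by `efficiency_gain` on upgraded
    hardware and is unchanged (1x) on the rest of the fleet.
    """
    if not 0.0 <= fraction_upgraded <= 1.0:
        raise ValueError("fraction_upgraded must be between 0 and 1")
    blended_energy_per_token = (
        fraction_upgraded / efficiency_gain + (1.0 - fraction_upgraded)
    )
    return demand_multiplier * blended_energy_per_token


DEMAND_MULTIPLIER = 24.0  # Goldman Sachs estimate quoted above ("over 24 times")
EFFICIENCY_GAIN = 10.0    # reported Vera Rubin figure ("as much as 10 times")

best_case = net_energy_multiplier(DEMAND_MULTIPLIER, EFFICIENCY_GAIN, 1.0)
half_fleet = net_energy_multiplier(DEMAND_MULTIPLIER, EFFICIENCY_GAIN, 0.5)  # hypothetical share

print(f"Entire fleet upgraded: {best_case:.1f}x today's energy")
print(f"Half the fleet upgraded: {half_fleet:.1f}x today's energy")
```

Even in the best case, where every GPU is replaced overnight, the 24-times demand figure outruns the 10-times efficiency figure by more than a factor of two, and the gap widens quickly as the upgraded share shrinks.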

That means in the short term, even major companies like Microsoft and Uber are restructuring their use of AI to figure out how to continue using it at scale without nuking their budgets in the process. If those companies can’t figure out how to afford it, it’s increasingly difficult to imagine how the rest of us will be able to.

And if usage drops because of rising costs, the AI companies are never going to find the short-term profit they need to offset the enormous infrastructure spending they’re still trying to justify.

Jon Martindale is a contributing writer for Tom’s Hardware. For the past 20 years, he’s been writing about PC components, emerging technologies, and the latest software advances. His deep and broad journalistic experience gives him unique insights into the most exciting technology trends of today and tomorrow.