Tokenmaxxing breaks the AI budget — and the CFO walks in
Uber and ServiceNow burned through their 2026 AI coding budgets in four months; finance chiefs now sit between the engineer and the model.
For a decade, enterprise software was bought by the seat: a fixed monthly fee per user, predictable to the cent. Generative AI broke that model. Modern coding assistants and agents charge by the “token” — roughly a fragment of a word — consumed by each prompt, each reasoning step, each tool call. The more an engineer uses the system, the more it costs. A single agentic workflow can chew through tens of thousands of tokens to answer what feels like one question. “Tokenmaxxing” is the new shorthand: a culture in which staff race to consume more tokens as a badge of being AI-native. The result is that AI spend behaves less like a software licence and more like an electricity bill in a heatwave — variable, surprising, and increasingly the business of the CFO.
On the last day of April, at the Sentro Filipino Cultural Center in San Francisco, Uber’s Chief Technology Officer Praveen Neppalli Naga took the StrictlyVC stage opposite TechCrunch editor Connie Loizos. The room expected an Uber-at-scale talk. What it got, three minutes in, was a confession. Uber’s 5,000 engineers had run through the company’s entire 2026 AI coding budget by April. Four months. “We blew through our budget,” Naga said, describing the moment shortly after Uber opened the door to agentic coding tools late last year. Adoption of Claude Code inside engineering had jumped from 32% to 84% of the workforce in a single quarter. Power users were spending $500 to $2,000 a month on tokens; Naga himself, by his own account, burned roughly $1,200 in a two-hour demo. Not by accident: Uber had stopped throttling.
Two weeks later, ServiceNow’s Chief Information Officer Kellie Romack told reporters her company had done the same thing. The pattern, she said, was “a really hard problem.” The headlines that followed wrote themselves: the canaries in the AI coal mine were singing in chorus. By the time Azeem Azhar’s Exponential View ran its May 18 dispatch under the headline “The cost of tokenmaxxing,” the term had escaped the engineering Slack channels of San Francisco and entered the boardroom vocabulary.
What is unusual is not that engineers want more AI. It is the shape of the bill. The average enterprise AI invoice in the United States has grown from roughly $63,000 a month in 2024 to $85,500 in 2025, a 36% jump in twelve months, according to CloudZero’s State of AI Costs benchmark. The share of organisations planning to spend more than $100,000 a month on AI tools more than doubled from 20% to 45% in the same window. For comparison, the average enterprise AI bill is now larger than the entire annual cloud bill at a typical German Mittelstand firm five years ago.
The line item that did not exist in 2022 is now competing for budget with data centres, salaries and physical security. More remarkable still is who is showing up to the meeting. “It is frustrating that I have no idea what we’re going to spend on AI this quarter,” one Fortune 500 CFO told the analyst Ed Zitron earlier this year. “My business units have no forecast of what they are going to use.” The finance organisation, accustomed to negotiating per-seat SaaS contracts in three-year cycles, now finds itself trying to govern an input that behaves like jet fuel — priced daily, burned unevenly, and central to whatever the engineers do next. The CFO is the new gatekeeper of AI.
Start with the variance. CloudZero’s benchmark of large US enterprises puts the average monthly AI bill at $85,521 in 2025, up 36% year over year. But the average hides the tail. A survey by Benchmarkit and Mavvrik, cited in CIO Magazine, found that 85% of organisations misestimate their AI costs by more than 10%, and nearly a quarter are off by 50% or more. Boards have noticed: more than four in five CIOs and CTOs say their directors are now actively questioning the AI line. Seventy-one percent of leaders told the same survey they plan to raise AI investment in 2026 even as scrutiny tightens, a contradiction the CFO has been handed to resolve.
The Uber case study makes the mechanics visible. Per-engineer AI spend at Uber averages $150–$250 a month. Heavy users, the so-called tokenmaxxers, sit between $500 and $2,000. Multiply by 5,000 engineers and even the low end of the average ($150 a month) lands at $9 million a year for one tool; at the top of the heavy-user range ($2,000 a month for every engineer) it runs to $120 million. Uber’s engineering chief reports that internal AI agents now author one in nine production-ready code changes, up from less than one percent a few months earlier. The productivity story is real. So is the invoice.
The spend curve also breaks the old SaaS economics on both sides. Anthropic, OpenAI and Google have cut headline per-token prices by roughly 80% from 2025 to 2026, according to cross-provider pricing data compiled by Finout and CloudZero. Caching can shave 90% off repeat inputs; batch processing knocks another 50% off. Yet aggregate enterprise spend keeps rising. The reason is Jevons’ paradox in real time: cheaper tokens unlock heavier workflows (agentic chains, sub-agents, tool use) that consume 10 to 40 times more tokens per user interaction than a single prompt did in 2024.
Anthropic has restructured enterprise billing accordingly, replacing its old Premium and Standard tiers with role-based pricing ($20 per seat for Claude Code, $10 for Claude.ai) and quietly eliminating the 10–15% volume discounts large customers relied on.
The DACH picture is sharper than many in Berlin or Munich would like. The April 2026 Bitkom KI-Studie found that 33% of German companies running AI in production are over their original business case, a rate Digital Chiefs called the moment when “the back-of-the-envelope math from the pilot phase stops holding up.” Forty-one percent of surveyed firms have AI in productive use; nineteen percent already cite AI as justification for headcount reductions. The Bitkom authors warn that 2026 will be remembered either as the year DACH boards built real AI governance, or as the year the first management boards were dismissed for letting AI spend run unsupervised.
China is no exception. Domestic labs that boast a structural cost advantage (cheaper inference, open weights, leaner architectures) are nonetheless reporting double-digit monthly growth in token consumption from their enterprise customers, according to Exponential View’s May 2026 compute-crunch dispatch. The conclusion is the same on both sides of the Pacific: efficiency gains are being recycled into more usage, not lower bills.
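The arithmetic behind these figures is simple enough to check directly. The sketch below is illustrative only: the function names are hypothetical, the Uber spending bands and the 80% price cut are the figures quoted above, and the whole-workforce multiplication assumes every one of the 5,000 engineers spends at the stated monthly rate, which is the same simplification the headline totals make.

```python
def annual_spend(monthly_usd_per_engineer: float, engineers: int) -> float:
    """Annualised spend if every engineer spends the given monthly amount."""
    return monthly_usd_per_engineer * engineers * 12


def net_bill_multiplier(price_cut: float, usage_multiplier: float) -> float:
    """Change in the bill when unit price falls but token volume rises.

    price_cut is a fraction (0.8 means tokens are 80% cheaper);
    usage_multiplier is how many times more tokens are consumed.
    """
    return (1.0 - price_cut) * usage_multiplier


if __name__ == "__main__":
    engineers = 5_000  # headcount quoted in the article
    bands = [
        ("average, low", 150),
        ("average, high", 250),
        ("heavy user, low", 500),
        ("heavy user, high", 2_000),
    ]
    for label, monthly in bands:
        print(f"{label:>17}: ${annual_spend(monthly, engineers):,.0f} a year")

    # Roughly 80% cheaper tokens, 10x to 40x more tokens per interaction.
    for usage in (10, 40):
        print(f"{usage}x tokens at 80% lower price: "
              f"{net_bill_multiplier(0.8, usage):.1f}x the bill")
```

With the quoted figures, the second loop shows the bill per interaction growing two to eight times despite the price cut, which is the Jevons effect in numbers.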
The strategic implication for senior leaders is straightforward, if uncomfortable. AI is no longer an IT cost centre; it is closer to a raw material, and a volatile one. That has three operating consequences.
First, the CFO must move upstream. The traditional pattern (IT procures, finance audits later) is incompatible with a variable-cost input that can swing by an order of magnitude in a quarter. The FinOps Foundation’s 2026 framework now lists “maximising the value of tokens” as a first-class capability alongside cloud cost management, and 90% of FinOps practitioners are being asked to extend their remit to SaaS and AI.
Second, business units, not central IT, should own their token budgets. The Exponential View thesis, articulated by Azhar in March, is that treating tokens as an IT line item is the organisational error of the moment: “They are a productive input, as fundamental to knowledge work as electricity or office space.” Engineering, marketing and customer service should each carry their own meter.
Third, governance must shift from gatekeeping to telemetry. Per-team dashboards, per-workflow attribution and hard alerts beat quarterly committee reviews. The catch: most enterprises lack the instrumentation to do any of this today. Uber and ServiceNow are the canaries because they are large, public and instrumented enough to notice. The Mittelstand is unlikely to notice until the invoice arrives.
The second-order question is contractual. Most existing AI vendor agreements were negotiated in 2024 against per-seat or low-volume API patterns; they do not contain the consumption caps, alert thresholds, or committed-use discounts that a serious FinOps team would now insist on. SAP procurement leaders inside DAX40 firms confirm informally that their AI cost lines are growing 8 to 15% month over month, faster than any other category in the IT plan, and that few of the underlying contracts even surface monthly burn against budget.
A practical 2026 control set looks like this: token budgets owned by business unit, instrumented to dashboards refreshed daily; vendor contracts re-papered with rate cards, caps, and audit rights before the next renewal cycle; and a quarterly board-level review of AI unit economics, treated with the same seriousness as the cloud-cost review board most large enterprises stood up in 2018.
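One way to picture the first control in that list is a per-team meter with an alert threshold and a simple run-rate projection. This is a minimal sketch under stated assumptions, not a reference design: `TokenBudget`, `months_until_exhausted`, the 80% alert threshold and the example figures are all hypothetical, and a real deployment would be fed from vendor usage exports rather than hand-entered costs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBudget:
    """Monthly token budget for one business unit (hypothetical structure)."""
    team: str
    monthly_budget_usd: float
    alert_threshold: float = 0.8  # assumed: warn at 80% of budget
    spend_by_workflow: dict = field(default_factory=dict)

    def record(self, workflow: str, cost_usd: float) -> None:
        """Attribute a cost to a workflow so spend can be broken down later."""
        self.spend_by_workflow[workflow] = (
            self.spend_by_workflow.get(workflow, 0.0) + cost_usd
        )

    @property
    def spent_usd(self) -> float:
        """Total spend recorded so far across all workflows."""
        return sum(self.spend_by_workflow.values())

    def status(self) -> str:
        """Return 'ok', 'alert' (threshold crossed) or 'over' (budget exceeded)."""
        used = self.spent_usd / self.monthly_budget_usd
        if used > 1.0:
            return "over"
        if used >= self.alert_threshold:
            return "alert"
        return "ok"


def months_until_exhausted(annual_budget_usd: float,
                           first_month_usd: float,
                           monthly_growth: float) -> Optional[int]:
    """Month (1-12) in which cumulative spend reaches the annual budget.

    Assumes spend compounds at a constant month-over-month rate.
    Returns None if the budget is not reached within twelve months.
    """
    cumulative, spend = 0.0, first_month_usd
    for month in range(1, 13):
        cumulative += spend
        if cumulative >= annual_budget_usd:
            return month
        spend *= 1.0 + monthly_growth
    return None


if __name__ == "__main__":
    budget = TokenBudget(team="engineering", monthly_budget_usd=10_000)
    budget.record("code-review-agent", 5_000)
    budget.record("migration-agent", 3_500)
    print(budget.team, budget.status(), budget.spend_by_workflow)
    print(months_until_exhausted(1_200_000, 100_000, 0.15))
```

For a plan built on a flat $100,000 a month, 15% month-over-month growth (the upper end of the range reported for DAX40 cost lines) uses up the annual budget in month eight; at 8% it lasts until month nine.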
For DAX40 CIOs, the Uber and ServiceNow disclosures are a free warning shot. The cheapest learning is borrowed. Three immediate moves: instrument token consumption per team and per workflow before scaling any agentic pilot beyond a hundred users; renegotiate vendor contracts to include consumption caps, alerts and a committed-use discount in writing, since Anthropic’s elimination of volume tiers is unlikely to be the last quiet pricing change; and seat a senior finance partner inside the AI programme, not adjacent to it. The Bitkom 33% overrun figure suggests the German market is roughly twelve months behind the US wave — still time to govern, but not much. Treat 2026 as the year token budgets become as boardroom-visible as cloud spend became in 2018.
Regulators have not yet woken up to AI cost volatility, but they will. Two pressure points are already visible. The EU AI Act’s general-purpose-AI obligations require deployers to monitor systemic risk, and uncontrolled spend on third-party model APIs is plausibly a financial-stability risk for listed deployers — a topic Germany’s BaFin is reportedly raising informally with DAX issuers. Separately, audit standard-setters (IDW in Germany, IAASB internationally) are circulating early guidance on disclosing AI usage costs in management reports, on the basis that material variability deserves narrative discussion. Expect the first formal enforcement actions to land not on model bias but on missing or misleading disclosure of AI operating cost trajectories. CFOs of listed firms should pre-empt this with a clean cost-attribution methodology in this year’s annual report.
For venture-backed AI startups, tokenmaxxing is a double-edged sword. Heavier customer usage drives top-line revenue — OpenAI, Anthropic and the inference providers are all guiding to record growth on the back of agentic workloads. But the same dynamic forces a structural rethink of pricing. Per-seat subscriptions are dying; consumption pricing, with margin protection clauses, is the new default. Sierra’s $950 million round in early May, at a valuation that prices it as enterprise AI’s consumption-billing standard-bearer, is the market signalling where it expects margin to accrue. Ed Zitron’s subprime-AI critique — that current pricing is “far from stable and even further from profitable” — still cuts. Founders pitching to DAX procurement should expect questions on cost-per-outcome, not feature lists. The companies that survive 2027 will be the ones whose own gross margin does not depend on tokenmaxxing continuing forever.
Sources
- [1] The cost of tokenmaxxing, Exponential View (Monday Data, May 18 2026)
- [2] Uber CTO Praveen Neppalli Naga at StrictlyVC SF, April 30 2026
- [3] Uber Exhausts Full AI Coding Budget in Four Months as Usage Explodes
- [4] ServiceNow exhausts full-year Anthropic AI coding budget early, Laura Bratton
- [5] AI cost overruns are adding up, with major implications for CIOs, CIO Magazine
- [6] The State of AI Costs 2025, CloudZero
- [7] AI’s Economics Don’t Make Sense, Ed Zitron, Where’s Your Ed At
- [8] Anthropic shifts enterprise billing to token-based pricing, IT Brief
- [9] AI Cost Observability: Measuring and Justifying Token Spend in 2026, Vantage
- [10] State of FinOps 2026 Report, FinOps Foundation
- [11] KI-Cost-Overruns 2026: Was die 33-Prozent-Rate für DACH-C-Level bedeutet, Digital Chiefs
- [12] Jensen’s OpenClaw thesis, Exponential View