Daily AI Briefing · Wednesday, 27 May 2026

01 / 04 · Markets & FinOps

7 min read

Uber Torched Its 2026 AI Budget by April. Now the COO Is on the Record.

A publicly listed flagship admits, in plain English, that it cannot draw a line between Claude Code spend and shipped consumer value, landing the week DAX40 CIOs reopen Q3 reforecasts.

·01 Primer

Uber pays a vendor — Anthropic — every time its engineers ask an AI assistant called Claude Code to write or review software. The price is per token, roughly per syllable of text the model reads or writes. In February, about a third of Uber engineers used the tool. By March, it was 84%. The bill ballooned. By April, the entire 2026 budget Uber had set aside for AI coding tools was gone. On May 25–26, in a podcast interview tied to a Bloomberg conference appearance, Uber's chief operating officer admitted he cannot prove the spending produced 25% more useful features for riders or drivers. That admission, from a profitable, publicly listed company, is the first time a flagship enterprise has put AI return-on-investment doubt openly on the record. CIOs in Frankfurt, Munich and Zurich are paying attention.

·02 What Happened

It was the line about heads exploding that travelled fastest. Sitting in a Rapid Response podcast studio during the conference circuit around Bloomberg's late-May tech week, Andrew Macdonald, Uber's president and chief operating officer, recounted the moment his chief technology officer, Praveen Neppalli Naga, walked into a leadership review in April and announced that the entire 2026 budget for Claude Code, Cursor and related AI coding tools was, as Naga put it to The Information, “blown away already.” Macdonald called it a “head-exploding moment.” Then he did something corporate executives almost never do in public: he questioned, out loud, whether the spending had been worth it.

“I think maybe implicitly there is more that is getting shipped,” Macdonald told host Bob Safian, “but it's very hard to draw a line between one of those stats and, ‘Okay, now we're actually producing 25% more useful consumer features.’” He went further, telling the audience that Uber would now have to begin “talking about token consumption and the associated cost versus headcount.” Translated: the FinOps team and HR are about to sit at the same table.

The disclosure has a backstory. In late 2025, Uber rolled out an internal leaderboard ranking engineering teams by total AI tool usage. The leaderboard worked exactly as designed. Adoption of Claude Code jumped from about 32% of Uber's roughly 5,000 engineers in early February to 63% by the end of that month, and to 84% by March, according to The Information's reporting on Naga's internal memo. About 70% of code committed at Uber is now AI-assisted; roughly 11% of live backend updates ship without a human author in the loop. Per-engineer bills landed in a $150–$250 monthly range, with heavy users routinely clocking $500–$2,000. Multiply that across thousands of seats and the math, as Ed Zitron has been arguing for two years, simply does not flatter the user.

CEO Dara Khosrowshahi confirmed the squeeze on the Q1 earnings call on 6 May: Uber is slowing hiring to absorb the AI line item. The historical rhyme is the early-2010s cloud era, when CFOs at Netflix and Capital One opened their first seven-figure AWS bills and discovered that elastic infrastructure was, in fact, elastic in both directions. The difference: cloud overage stories took two years to leak. This one took six weeks. And it came from the C-suite itself.
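The "multiply that across thousands of seats" step can be made concrete with a few lines of arithmetic. The sketch below uses only the figures quoted in the reporting above (roughly 5,000 engineers, 84% adoption, the $150–$250 and $500–$2,000 monthly ranges). The function name is hypothetical, and the "all-heavy" band is a deliberate upper bound that assumes every active seat behaves like a heavy user, which the reporting does not claim.

```python
# Illustrative arithmetic only; inputs are the ranges quoted in the text.
ENGINEERS = 5_000      # "roughly 5,000 engineers"
ADOPTION_PCT = 84      # Claude Code adoption by March


def monthly_spend(seats: int, low_usd: int, high_usd: int) -> tuple[int, int]:
    """Return the (low, high) monthly bill in USD for a number of active seats."""
    return seats * low_usd, seats * high_usd


# Integer maths avoids floating-point rounding on the seat count.
active_seats = ENGINEERS * ADOPTION_PCT // 100

typical = monthly_spend(active_seats, 150, 250)
# Upper bound only: assumes every active seat is a heavy user.
heavy = monthly_spend(active_seats, 500, 2_000)

print(f"active seats:   {active_seats:,}")
print(f"typical band:   ${typical[0]:,} to ${typical[1]:,} per month")
print(f"all-heavy band: ${heavy[0]:,} to ${heavy[1]:,} per month")
```

The real bill sits somewhere between the two bands, depending on how many seats are heavy users, which is exactly the distribution a contract-level invoice hides.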

·03 The Numbers

Strip out the rhetoric and the arithmetic is brutal. Uber does not publish its AI-tooling line item, but the company's total 2026 R&D budget is roughly $3.4 billion, against which industry reporting suggests the dedicated AI coding allocation sat in the low hundreds of millions, a sum exhausted in four months. Anthropic's Claude Code, in its enterprise tier, charges per token: input tokens at roughly $3 per million, output tokens at $15 per million, with Opus-class models several times more. A senior engineer running an agentic workflow that reads a 200,000-token codebase, drafts patches, runs tests and iterates can burn three to five million tokens in a single afternoon. At $500 a month and those rates, an engineer is consuming the rough equivalent of 33 million output tokens or about 167 million input tokens, on the order of one mid-size monorepo every working day. At $2,000, the heavy-user ceiling Uber's CTO flagged, that engineer alone costs the company more in AI tokens than a junior developer in Wrocław costs in salary.

Compare that with the unit economics that Microsoft, Anthropic's nominal competitor and partner, encountered with its own pilot. Per Fortune's 22 May report, Microsoft's Experiences & Devices division paused internal Claude Code use after watching the team's entire annual AI budget evaporate within months; the publicly available numbers suggest some engineers' token costs exceeded their fully loaded salaries. That is the line (AI seat-cost above human seat-cost) that historically marked the moment enterprise software repriced. The on-prem-to-SaaS shift had it. The cloud-overage shock had it. Token-based AI has now had it, in public, from two of the world's most profitable software-adjacent companies in the same fortnight.

The second number worth lingering on is the productivity claim. Uber's leadership has said internally that 70% of committed code is AI-assisted and 11% of backend changes ship agentically. Yet Macdonald's own framing (“very hard to draw a line” to 25% more useful consumer features) concedes that the throughput gains have not translated into observable product velocity that a board can underwrite. The Pragmatic Engineer's recent FinOps coverage flagged the same gap: developer-survey self-reports of “30–40% faster” coexist with shipping-cadence dashboards that look flat. Either the measurement is wrong, or the leverage is going somewhere (bug-fix backlog, code churn, refactors) that does not show up in the revenue line.

Naga's quoted line lands the point with the precision only a CTO can manage: “I'm back to the drawing board because the budget I thought I would need is blown away already.” That is not a complaint about pricing. It is a confession about forecasting. And forecasting is what every CIO reading this in Stuttgart will be doing again in the next ten weeks.
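The per-token arithmetic in this section is easy to check by hand. The sketch below converts a monthly bill into token volumes at the quoted rates of roughly $3 per million input tokens and $15 per million output tokens, and prices a single agentic session. The function names are hypothetical, and the 4-million-input / 0.5-million-output split is an assumed mix for illustration, not a figure from the reporting.

```python
INPUT_USD_PER_MILLION = 3.0    # quoted input-token rate (approximate)
OUTPUT_USD_PER_MILLION = 15.0  # quoted output-token rate


def tokens_for_budget(budget_usd: float, usd_per_million: float) -> float:
    """How many tokens a budget buys if spent entirely at one rate."""
    return budget_usd / usd_per_million * 1_000_000


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one session with the given token counts."""
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000


for budget in (500, 2_000):
    as_input = tokens_for_budget(budget, INPUT_USD_PER_MILLION)
    as_output = tokens_for_budget(budget, OUTPUT_USD_PER_MILLION)
    print(f"${budget:,}/month = {as_input / 1e6:.0f}M input tokens "
          f"or {as_output / 1e6:.0f}M output tokens")

# Assumed mix for one afternoon: 4M tokens read, 0.5M tokens written.
print(f"one agentic afternoon: ${session_cost(4_000_000, 500_000):.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, the same dollar figure can describe very different workloads, which is one reason per-engineer instrumentation matters more than the headline bill.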

Three Perspectives · What this story means for different readers

For DAX40 CIOs heading into Q3 reforecasts, Uber is the first public, named, audited datapoint they can cite to their CFOs without sounding like a sceptic. The action items write themselves: instrument token spend at the engineer and team level, not the contract level; renegotiate Anthropic and OpenAI commitments with consumption caps and rate-limit clauses; align HR planning with FinOps so that “AI replaces headcount” is modelled as a swap, not a saving. Expect at least one Frankfurt-listed insurer or carmaker to quietly pull a planned 2026 expansion of agentic coding pilots within the quarter, citing Uber by name in the board pack. The competitive read is harsher: those who instrument early will discover their own heavy users — and decide whether to throttle, charge back, or celebrate them.

There is, as yet, no specific European rule on AI-tooling cost disclosure, but the dominoes are visible. The CSRD's double-materiality framework already obliges large EU-listed companies to disclose material operational risks; if AI tooling becomes a line item that meaningfully shifts EBIT, auditors will start asking. The EU AI Act's general-purpose-model transparency obligations, in force since August 2025, push providers — including Anthropic and OpenAI — to publish more on training and inference economics, which in turn arms enterprise buyers with negotiating leverage. BaFin has begun informal soundings with German banks on third-party AI vendor concentration. Uber's confession will accelerate that conversation: regulators do not love a single-vendor dependency on a US lab whose pricing model is opaque, metered, and capable of consuming a full-year budget in 120 days.

For founders, Uber's admission is a green light, not a warning. The biggest near-term venture opportunity in enterprise software is not another coding assistant — it is the FinOps layer that sits between the assistant and the CFO. Expect a wave of seed and Series A rounds in 2026 H2 for token-observability, prompt-caching proxies, model-routing gateways, and chargeback platforms purpose-built for AI consumption. Vellum, Helicone, Langfuse and Portkey already raised on this thesis; Bessemer's recent State of the Cloud noted that AI cost-management is the fastest-growing FinOps subcategory. The bear case: if Anthropic and OpenAI compress prices aggressively in the next twelve months — which Dario Amodei has hinted at — the entire FinOps stack becomes a feature, not a category. The bull case: enterprises will pay for visibility regardless of headline token prices, because they have just watched what happens when they don't have it.


02 / 04 · Law & Governance

8 min read

Ten Minutes to Strip a Frontier Model

An open-source tool called Heretic peels safety alignment off Meta's Llama and Google's Gemma in under ten minutes, and lands on European supervisors' desks ten weeks before they get the power to fine.

·01 Primer

Open-weight models — Meta's Llama 3.3, Google's Gemma 3, Mistral, DeepSeek, Alibaba's Qwen — ship with safety alignment baked into their refusal behaviour. A May 25 Financial Times investigation showed that a free GitHub tool called Heretic, written by German developer Philipp Emanuel Weidmann, removes that alignment in under ten minutes on an ordinary laptop, using a technique called abliteration that surgically deletes the refusal directions inside the model. The decensored systems then explain how to disperse chlorine gas in a crowded room, calculate lethal ricin doses, generate credit-card-stealing malware and write child sexual abuse material. The investigation lands ten weeks before the EU AI Act's General-Purpose AI enforcement powers activate on August 2, 2026, exposing every open-weight provider to fines of up to three percent of global turnover.

·02 What Happened

On a commodity laptop, in a Financial Times newsroom, a journalist downloaded Meta's Llama 3.3 from Hugging Face, pointed Weidmann's open-source script at it, and waited. Less than ten minutes later, the model that had refused to discuss ricin toxicity calmly produced a per-kilogram lethal-dose calculation. The same procedure on Google's Gemma 3 yielded step-by-step protocols for dispersing chlorine gas through a crowded indoor space, working code to skim credit card numbers, and stories depicting child sexual abuse. The investigation, conducted jointly with AI safety firm Alice, was published on May 25 and immediately replicated by the Irish Times, eWeek, Futurism and Techstrong. The protagonist is the tool's author. Weidmann, who publishes under the handle p-e-w, told the FT his software has been used to create more than 3,500 decensored model variants since its release last year, and that modified systems built with Heretic have been downloaded 13 million times. Heretic is licensed under AGPLv3, runs without GPU acceleration, and uses an Optuna-powered Tree-structured Parzen Estimator to find abliteration parameters that minimise both refusals and KL divergence from the original weights — meaning the model loses its safety reflex without losing measurable intelligence. “More than half of all abliterated models ever published on Hugging Face,” the project's documentation states, were made with Heretic. The corporate responses were revealing. Google told the FT that “abliteration is a known technical challenge facing all open models” and that Gemma undergoes “rigorous internal safety evaluations prior to launch to help prevent these kinds of troubling examples.” Meta declined to comment. Neither company disputed the FT's reproduction. Alice CEO Noam Schwartz delivered the line that travelled: “The genie is out of the bottle. 
Things that look like sci-fi are no longer sci-fi and we need as a society to prepare accordingly.” The framing matters because Alice, formerly ActiveFence, sells trust-and-safety tooling to the same model providers whose products it just helped strip. The investigation is not an academic provocation. It is a vendor demonstration with a regulatory addressee.

The pivot is where this story stops being a content-moderation problem and becomes a liability problem. Abliteration is not a jailbreak; it is a permanent modification of the released weights. There is no system prompt, no rate limit, no API key, and no telemetry to revoke. Once a provider publishes weights, every downstream user, including whoever is behind the 13 million downloads of Heretic-modified variants, operates outside any safety perimeter the provider can technically reach. European supervisors will read that sentence the same way the FT did.

·03 Timeline & Context

The EU AI Act's General-Purpose AI obligations entered into application on August 2, 2025. For ten months they have existed without teeth: the Commission's enforcement powers, including the ability to issue administrative fines of up to fifteen million euros or three percent of worldwide annual turnover, switch on August 2, 2026, sixty-nine days after the FT investigation was published. Models placed on the market before August 2, 2025 have until August 2, 2027 to comply. Llama 3.3 and Gemma 3 are squarely in the post-2025 cohort.

The Act offers a narrow open-source exemption: providers releasing weights under a free and open licence are excused from most GPAI obligations, unless the model is classified as a “GPAI model with systemic risk.” That classification, triggered by high-impact capabilities measured roughly in training compute, removes the exemption entirely and requires model evaluations, adversarial testing, serious-incident reporting and cybersecurity assurance. Llama 3.3 70B and Gemma 3 27B are credible candidates for systemic-risk designation; the AI Office's Guidelines for providers of general-purpose AI models, published in 2025, make clear the open-source carve-out collapses the moment a model crosses that threshold.

The historical comparison is Napster. In 1999, a single Northeastern University undergraduate's file-sharing client shifted the burden of copyright enforcement from distribution choke-points to end-user behaviour, and the recording industry spent a decade litigating a problem that had already escaped the legal frame. Heretic is the Napster moment for model alignment: the safety contract was assumed to live at the provider, but the technical reality is that it lives nowhere once weights are public. Unlike Napster, however, the rights-holders are also the regulators, and they have just been handed, on paper, the power to fine.

The encyclical context tightens the screw. Pope Leo XIV's May 15 encyclical Magnifica Humanitas framed unaligned AI as a moral question for the operators who deploy it, not just the providers who build it. That moral framing has now collided with a technical demonstration that providers cannot guarantee alignment downstream. Brussels, Berlin and Paris read both documents in the same week.

Google's defensive posture is informative. Its April 2026 release of AMS, the Activation-based Model Scanner for open-weight LLM safety verification, was already an admission that post-distribution drift is the open problem. Meta's silence is louder. Yann LeCun's long-running argument that open weights are a safety benefit because “models can be audited” now has to absorb the case where the audit tool and the bypass tool are the same code. Mistral's Arthur Mensch, who has called existential-risk discourse a “diversion” used by closed labs to entrench monopoly, has not yet responded publicly to the FT's findings. He will be asked.

Three Perspectives · What this story means for different readers

For DAX40 AI programmes, the immediate question is not whether to use open-weight models — most already do, often via Hugging Face mirrors or Vertex AI Model Garden — but how to evidence safety controls when the upstream provider has none. Procurement and model-risk teams should expect their second-line functions to demand attestations that any open-weight deployment runs only weights pulled from cryptographically verified provider repositories, with hashes logged at ingestion. Any fine-tune or model card that does not match a known-good hash is now a regulatory artefact. Downstream deployers in regulated sectors — finance, pharma, critical infrastructure — should also treat their own deployment as the de facto safety perimeter: input filtering, output classification, and continuous red-teaming move from optional to standard of care.
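The "hashes logged at ingestion" control can be sketched in a few lines. The example below is a minimal illustration, assuming the deployer keeps a manifest that maps weight-file names to known-good SHA-256 digests obtained from the provider; the function names, the logger name and the manifest layout are hypothetical, and a real pipeline would also need to authenticate the manifest itself.

```python
import hashlib
import hmac
import logging
from pathlib import Path

logger = logging.getLogger("weight_ingestion")  # hypothetical logger name


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly multi-gigabyte) weight file through SHA-256."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, known_good: dict[str, str]) -> bool:
    """Compare a file's digest with the manifest entry and log the result.

    `known_good` maps file names to expected hex digests (assumed format).
    Files missing from the manifest are treated as unverified.
    """
    actual = sha256_of(path)
    expected = known_good.get(path.name)
    ok = expected is not None and hmac.compare_digest(actual, expected)
    logger.info("ingest file=%s sha256=%s verified=%s", path.name, actual, ok)
    return ok
```

A hash check of this kind only shows that the bytes match what the provider published; it says nothing about whether those published weights are themselves safe, which is why the filtering and red-teaming controls above still apply.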

Supervisors at the AI Office, BfDI, CNIL and AEPD will read the FT investigation as a stress-test of Article 55's systemic-risk regime. The legal question is whether publishing weights known to be trivially decensorable constitutes a “reasonably foreseeable negative effect on public health, safety, public security or fundamental rights” — the statutory trigger for systemic-risk classification. If the answer is yes, the open-source exemption collapses for Llama 3.3 and Gemma 3, and Meta and Google owe the full suite of evaluation, mitigation and incident-reporting obligations from August 2. Expect the first formal information request under Article 91 within weeks of enforcement powers activating. A test case is now politically necessary.

The open-weight thesis sold to LPs since 2023 (that European AI sovereignty rides on Mistral, on Aleph Alpha, on a permissive licensing posture) is suddenly carrying a regulatory tail risk that closed-API competitors do not. Founders building on Llama or Gemma derivatives should price compliance overhead into their next round: model evaluations, third-party red-teaming, and an incident-response function are no longer optional line items. Conversely, a clear opening exists for safety-tooling startups: Alice, Lakera, their German equivalents, and any vendor selling abliteration detection, weight provenance, or post-deployment guardrails. The FT investigation is, in effect, a sales document for that category, and the August 2 deadline is the close date.


03 / 04 · Markets & FinOps

8 min read

The Atlantic Splits: US Hires, Germany Doesn't

Pragmatic Engineer data crystallises a 2026 talent geography in which American employers absorb the AI engineering wave while German and French postings keep sliding.

·01 Primer

On 26 May, Gergely Orosz and Jessica Salmon published the most-anticipated annual labour read in enterprise tech: The Pragmatic Engineer's State of the Software Engineering Job Market 2026. Drawing on TrueUp, Workforce.ai and Indeed/FRED data, the deepdive shows software vacancies rising in the US and UK, flat in Canada, and falling in Germany and France. AI/ML skills now appear in 42% of postings, up from 8% in 2022, per LinkedIn's 2025 Workforce Report. Apple, Amazon and IBM lead hiring; Meta has dropped out of the top 20 after cutting 8,000 staff on 20 May. The data lands the same week Uber's CTO admitted the company torched its 2026 AI budget in four months as Claude Code adoption among its 5,000 engineers jumped from 32% to 84%.

·02 What Happened

Tuesday morning Berlin time, Gergely Orosz published the post that every DAX40 talent chief read before lunch. The Pragmatic Engineer's deepdive opens with what its author calls “encouraging growth signs” in dev job listings over twelve months — then pivots, four charts in, to a map that is uncomfortable reading in Frankfurt, Munich and Paris. “The US and the UK are the only two countries where vacancies are up; Canada is flat, while Germany and France have seen declines,” Orosz writes, citing Indeed data routed through the St Louis Fed's FRED series. “US-headquartered companies are hiring more devs, mostly in the US and some in the UK, whereas European-headquartered companies are more cautious about recruitment.” The scene matters because of who reads Orosz. The Pragmatic Engineer newsletter sits on the laptops of CTOs at every Big Tech firm and most ambitious European scale-ups; it functions as the de-facto sell-side equity research of engineering hiring. When its annual state-of-the-market piece lands, talent committees treat it as a calibration document, not commentary. The 2026 edition is co-authored with Jessica Salmon, who joined the operation as a dedicated industry analyst — itself a tell that Orosz is treating talent data as a recurring institutional product. Salmon and Orosz spent two months synthesising three previously unpublished datasets: TrueUp, which scans every open role at top-paying Big Tech, scaleups and unicorns; Workforce.ai (built by Live Data Technologies), which validates 1M+ job changes monthly across 300M+ profiles; and Indeed's country-level posting indices. The headline finding for an enterprise audience is not the absolute level but the geographic split. The US line on the FRED chart curves upward through 2026; the UK's tracks it; Canada is flat; the German and French lines slope down. 
TrueUp's ranking of “top” employers by software-engineering openings reads as a roll call of US headquarters: Apple, Amazon, IBM, Accenture, Google, Tesla, Cadence, HPE, SpaceX. Stripe, Atlassian and Shopify hire faster than most of Big Tech. New entrants to the top 20 include three US hardware companies (Micron, Qualcomm, AMD) retooling around silicon and on-device AI. Dropouts: Meta and Oracle, both following mass layoffs. Meta let 8,000 people go on 20 May, the third redundancy round in three years; Oracle announced up to 30,000 cuts in March.

The narrative pivot is sharper still when set against the week's other tech-labour story. On the same Tuesday, Fortune reported that Uber, by the account of CTO Praveen Neppalli Naga and COO Andrew Macdonald, had blown through its entire 2026 AI coding-tool budget in four months, after Claude Code adoption among its 5,000 engineers rose from 32% in February to 84% in March, with individual users billing $500–$2,000 in tokens per month. “It's very hard to draw a line between one of those stats and, ‘Okay, now we're actually producing 25% more useful consumer features,’” Macdonald said. Read together, the Pragmatic Engineer geography and the Uber FinOps shock describe the same machine: AI-native productivity is being capitalised on American payrolls and American token bills.

·03 The Numbers

Start with the headline percentage every recruiter in DACH will quote this week. LinkedIn's 2025 Workforce Report, picked up across the 2026 commentary, puts AI/ML skills in 42% of software job descriptions, against 8% in 2022: a roughly five-fold rise in the share of postings that demand at least nominal fluency in model training, RAG pipelines, evals or agent orchestration. The entry-level cohort tells the other half of the story: postings for junior roles remain 28% below the 2022 peak according to the Indeed/FRED-derived figures reproduced across multiple 2026 analyses, with the overall US software posting index still 45% below its mid-2022 high.

TrueUp's top-20 ranking now reads: Apple (1), Amazon (2), IBM (3), then a cluster of Accenture, Google, Tesla, Cadence, HPE and SpaceX. Google's engineering openings are up 62% year-on-year. Apple's software headcount has grown 10% in two years, Google's 5%; Microsoft is down 1.1%, Amazon down 1.3%. The mid-tier outpaces all of them: Stripe +29% over two years, Atlassian +23%, Shopify +36%. Among the fastest-growing hirers, Workforce.ai records Ramp at +94%, Wiz at +84%, Datadog at +68%, Rippling at +55%, Figma at +41% and Netflix at +37%.

Meta is the cautionary case. Headcount grew nearly 20% in the two preceding years; on 20 May the company began cutting 8,000 roles (roughly 10% of staff), cancelled 6,000 open positions and signalled further cuts in H2. NPR and TheNextWeb report that Meta's 2026 capex guide of around $125 billion is a 73% rise on 2025's $72.2 billion, with most of it directed at AI infrastructure; Chief People Officer Janelle Gale told staff that up to 7,000 redeployed engineers would move into Applied AI, Agent Transformation Accelerator XFN and Central Analytics pods. Internally, Meta projects AI will write four times the code its humans write in 2026.

The European numbers are colder. Indeed/FRED records the German and French software-posting indices declining over twelve months. Bitkom's most recent count puts unfilled IT roles around 109,000, but, critically, the volume of open positions itself dropped 26.2% in 2024 year-on-year and has only stabilised in 2025–early 2026, not recovered. Roughly 6% of German firms surveyed laid off IT staff in the past twelve months; 14% expect to in the next twelve. Computer-science enrolment fell for a second consecutive year, to 72,075 starts, almost 6,000 below 2019. Germany is short on bodies and short on postings simultaneously, the diagnostic of an economy where demand has moved abroad while supply contracts at home.

The historical rhyme is the 2010–2015 mobile-app developer wave, when iOS and Android skills concentrated in a Silicon Valley salary band that European employers chose not to match, losing a decade of mobile-native product leadership in consequence. The 2026 AI-engineering wave is running the same play, faster, on token-denominated compensation.

Three Perspectives · What this story means for different readers

For DAX40 CTOs and consultancy talent leads, the Pragmatic Engineer split is the empirical version of what their recruiters have been signalling since Q4 2025: senior engineers with credible AI/ML resumes now anchor to a US compensation curve denominated in dollars and refresh grants, not a Stuttgart or Munich band denominated in euros and 13th-month payments. With 42% of postings naming AI/ML skills, the practical move is to treat AI fluency as a baseline screen on every software requisition — not a separate “AI engineer” ladder — and to redirect retention budgets from senior IC count toward fewer, more expensive principal-grade hires with explicit token-budget authority. The Uber FinOps blow-up is the warning shot: enterprises that hand Claude Code seats to entire engineering orgs without FinOps guardrails will produce the same chart Macdonald flagged — spend up, useful features ambiguous.

For Brussels and Berlin policymakers, the Orosz data is the cleanest evidence yet that the EU AI Act's compliance overhead is colliding with a softer demand environment for European tech employers. The German computer-science enrolment decline — two consecutive years, 6,000 fewer starts than 2019 — closes the supply tap precisely as US firms widen their demand pipe. Expect renewed pressure on the Skilled Immigration Act, on the Blue Card threshold, and on the EU's slow-moving AI Talent Pool initiative. The deeper question for regulators is whether the AI Act's general-purpose model obligations, due to bite in stages through 2026, are accelerating the relocation of model-adjacent engineering work to the US and UK. Bitkom will use this report; so will the FDP and the CDU's digital wings.

European seed and Series A founders read the same chart and reach an opposite conclusion: a softer DACH posting environment is the best hiring tape they have seen since 2021. Senior engineers laid off from Meta, Atlassian (10% cut this month) and Snap (16% in April) are circulating in Berlin and Amsterdam at compensation expectations that finally clear seed-stage budgets. The fastest-growing US hirers in the Pragmatic Engineer data (Ramp, Wiz, Datadog, Rippling, Figma) also signal where European founders should aim: fintech rails, security, observability, design tooling. Index, Accel London and HV Capital partners have been telling LPs for two quarters that AI-native European observability and security plays are the cleanest 2026 trade; this dataset is the slide they will paste into the next LP letter.


04 / 04 · Enterprise & Architecture

8 min read

Compliance Becomes the Killer AI Vertical — Ten Weeks to Capex Line

a16z names enterprise compliance the next vertical AI battleground as Auditoria ships Governed Autonomy, Vanta's compliance agent goes live, and the EU AI Act August enforcement clock forces every DAX board to budget.

·01 Primer

Compliance — the unglamorous machinery that files suspicious-activity reports, runs SOX controls, evidences SOC 2 audits and now documents AI risk — has quietly become the largest white-collar labour pool in regulated industries. The US Bureau of Labor Statistics counts roughly 400,000 compliance officers and projects 33,300 openings every year through 2034. One global bank has 30,000 of 210,000 employees working only on financial-crime checks. Until early 2026 most CIOs treated AI in this domain with suspicion: regulators required human review, auditors mistrusted model output, and hallucinations were career-ending. On May 26 two events collapsed that resistance simultaneously: Andreessen Horowitz published a fintech thesis declaring compliance the killer vertical for AI, and Auditoria.AI launched a framework called Governed Autonomy that gives CFOs a governance grammar to let agents act without per-step approval.

·02 What Happened

At the Gartner CFO Symposium in National Harbor on May 26, Rohit Gupta walked on stage with a sentence that has been quoted in every Office-of-the-CFO Slack channel since. “Human-in-the-loop was how the industry learned to trust AI. It is not how the enterprise will ultimately run on it,” the Auditoria.AI co-founder told the audience. “If every invoice or approval still needs a human to validate the system, AI is just sitting on top of the old operating model. The bottleneck shifts from doing the work to approving it.”

The product behind the line is Governed Autonomy, an operating framework that pushes oversight upstream into policy design rather than transaction-level review. The enterprise defines what agents may do, under which conditions, with which entitlements, and with what audit trail; the agents execute inside those rails on accounts payable, accounts receivable and FP&A workflows without waiting for a human to click approve. KPMG's global head of AI labs, Swami Chandrasekaran, endorsed the framework on stage as “how trust gets engineered into the operating model.” Hours later, a16z's fintech team published “Everything, Everywhere is Compliance” on a16z.news, arguing that the same vertical-AI logic that ate accounting in 2025 will eat compliance in 2026–2030.

The thesis caps a sequence that has been building for months. In November 2025 a16z led a $21 million Series A into Sphere, an AI-native cross-border tax compliance engine whose TRAM model ingests global tax law in real time. BaFin published ICT-risk guidance for AI in December 2025 placing the use of AI inside DORA's perimeter. On March 19 Vanta unveiled three agents (a Compliance Agent that runs the evidence lifecycle, a TPRM agent that cuts vendor reviews by 81 percent, and a Customer Trust agent for inbound questionnaires) and disclosed $300 million ARR with 16,000 customers by April. On May 13 BaFin signalled it would expand inspections specifically for AI risk. The Financial Times then dropped its May 25 investigation showing that Meta's Llama 3.3 and Google's Gemma 3 could be stripped of guardrails in under ten minutes using a GitHub tool called Heretic, a finding that, for any Chief Risk Officer reading the EU AI Act's August 2 GPAI enforcement deadline, turned the open-source debate into a board item.

The historical parallel is Sarbanes-Oxley in 2002. After Enron and WorldCom, Congress passed SOX in a 423–3 House vote, the Big Four divested consulting units, and an entire compliance industry was built on top of internal-control attestation. Two decades of constant rule expansion (Basel III, MiFID II, GDPR, DORA, the AI Act) produced the 400,000-person workforce that AI vendors now propose to instrument. The May 26 announcements are best understood as the moment the venture-capital pattern recognised the opening.

·03Architecture

Three architectural patterns are crystallising across the new compliance-tech stack, and DACH platform teams should expect all three on their procurement shortlist by Q3. The first is the audit-trail-as-control-plane pattern that Vanta has industrialised. Every agent action emits an identity-bound, immutable evidence event: who acted, under which policy version, against which control, with which input. The evidence graph then drives both human dashboards and downstream attestations against SOC 2, ISO 27001, HIPAA, PCI and the AI Act's Article 12 logging obligations. The compliance program is not a quarterly campaign but a continuously evaluated state machine; the auditor consumes a stream rather than a snapshot. Vanta's March release exposed natural-language queries over that graph (“has the auditor flagged anything?”) and reported an 81 percent reduction in vendor security-review time. The second is Governed Autonomy itself, which is best read as policy-as-code reborn for agents. Auditoria's framework binds three layers: identity propagation (each agent inherits the entitlement set of the operator or service principal it acts on behalf of), runtime authorisation (an explicit allow/deny on each external action against a configurable rule store), and controller-grade audit logging (every step writes to an append-only ledger the controller can replay). Critically, the framework is designed to interoperate with Workday's Agent System of Record, ServiceNow's AI Control Tower, Microsoft's governance services and OpenAI's governance frameworks, acknowledging that no single vendor will own the agent runtime in a Workday plus SAP plus Oracle plus Coupa plus ServiceNow enterprise.
The third pattern is regulation-ingest-as-data. Sphere's TRAM engine continuously ingests tax law across 100+ jurisdictions and re-classifies products against current rules; a16z partner Angela Strange's “regulation becomes code” thesis envisages the same primitive for banking and insurance regulation, where rule sets span tens of thousands of pages and SBA documentation alone exceeds a thousand pages. For DAX architecture leads, the integration question is concrete: where in your reference model does the policy store live, who owns it (Risk, Legal, IT, or a new joint function), how does it propagate to agent runtimes embedded in S/4HANA, Workday Financials and a fast-growing portfolio of point-solution agents, and how do you reconcile the BaFin DORA log requirement with the EU AI Act's Article 12 record-keeping rule and US-side SOX 404? The architectural choice that matters most in 2026 is not which model you license, but which control plane writes the evidence stream and which policy store agents read at runtime. Get that decision wrong and you will rebuild it in 2028 when the EU AI Office issues its first enforcement order; get it right and the marginal cost of new regulation collapses from a project to a config change.
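
To make the first two patterns concrete, here is a minimal Python sketch of a runtime allow/deny check that writes every decision to a hash-chained, append-only evidence log. It is an illustration under stated assumptions, not Auditoria's or Vanta's implementation: the names Policy, EvidenceLedger and authorise, the field names, the "ap_clerk" entitlement and the 10,000 approval limit are all hypothetical, and a production ledger would sit in tamper-evident storage rather than an in-memory list.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: entitlement -> {action: maximum amount}."""
    version: str
    allowed: dict


@dataclass
class EvidenceLedger:
    """Append-only log; each entry carries the previous entry's hash."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so the same body always hashes the same.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, event: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"prev_hash": prev_hash, **event}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Replay the chain; any edited or reordered entry breaks it."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


def authorise(policy, ledger, agent_id, on_behalf_of, entitlement, action, amount):
    """Allow or deny one agent action and record the decision either way."""
    limit = policy.allowed.get(entitlement, {}).get(action)
    allowed = limit is not None and amount <= limit
    ledger.append({
        "agent_id": agent_id,
        "on_behalf_of": on_behalf_of,      # identity the agent acts for
        "entitlement": entitlement,
        "policy_version": policy.version,  # which rule set made the call
        "action": action,
        "amount": amount,
        "decision": "allow" if allowed else "deny",
    })
    return allowed


# Hypothetical example: an accounts-payable agent with a 10,000 approval limit.
policy = Policy(version="2026-05-v1", allowed={"ap_clerk": {"approve_invoice": 10_000}})
ledger = EvidenceLedger()
ok_small = authorise(policy, ledger, "agent-7", "j.doe", "ap_clerk", "approve_invoice", 4_200)
ok_large = authorise(policy, ledger, "agent-7", "j.doe", "ap_clerk", "approve_invoice", 50_000)
print(ok_small, ok_large, ledger.verify())  # True False True
```

The design point is that denials are logged as carefully as approvals, and that each entry names the policy version that produced the decision, so an auditor replaying the chain can see which rule set was in force at the time.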

·04The DACH View

Three months ago a senior BaFin official told a closed Frankfurt roundtable that the regulator would treat AI “as part of ICT risk under DORA, full stop”. The position was already set out in BaFin's December 2025 ICT-AI guidance and was reinforced on May 13, 2026, when the watchdog said it would expand inspections specifically for AI risk. That single sentence is more consequential for DAX compliance budgets than any vendor demo. It means an AI agent that books a journal entry at Allianz, screens a wire at Commerzbank or routes a vendor invoice at BASF must produce the same evidence package an ICT control would, with the additional AI-Act overlay once August 2 lands. Deutsche Bank's TCS-built compliance digital assistant is the visible tip; behind it is a much larger silent retooling in DAX risk and legal departments where headcount cannot scale with regulation but evidence requirements can only grow. The DACH peculiarity is language: BaFin, BfDI and state DPAs expect German-language documentation, which immediately disqualifies US vendors that cannot localise policy descriptions and audit narratives. Expect Workday, SAP and Microsoft to win the platform fight in the DACH mid-market on language and existing footprint, with Auditoria, Vanta and Sphere arriving via channel partners. The strategic question for a German Managing Director advising DAX clients is whether to commit to a single platform control plane in 2026 (SAP Build/Joule, Workday Agent SoR, or ServiceNow AI Control Tower) or to wait for the AI Office's 2027 enforcement decisions to reveal which evidence formats actually survive audit. The cost of waiting is twelve months of agent sprawl without a policy store; the cost of committing early is a one-vendor lock-in priced before the category settles.

Three Perspectives

What this story means for different readers

For DAX CIOs and CFOs the practical question is governance ownership. Auditoria's framing, in which oversight moves upstream into policy design, implies a new operating function that sits between Risk, Legal and IT, with budget authority over what agents may do across SAP, Workday, Oracle, Coupa and ServiceNow. That function does not yet exist on most German org charts. Where it does, it lives inside the second line of defence and is staffed by lawyers rather than engineers. Mark McDonald of Finance Next put the stakes plainly: teams that adapt their governance approach now will scale agents quickly; teams that focus on short-term wins will end up with a fragmented AI landscape still dependent on people for transactional integrity. The 2026 capex line is therefore as much about hiring policy-engineering talent as about licensing platforms.

On August 2, 2026 the EU AI Act's GPAI enforcement powers take effect: the Commission gains powers to demand documentation, run model evaluations, order mitigations and impose fines of up to €15 million or 3 percent of global revenue, whichever is higher. Compliance officers must now evidence both AI-Act Article 12 logging and DORA ICT-risk controls in parallel. BaFin's May 13 announcement of expanded AI-focused inspections turns the abstract risk into supervisory reality for German banks and insurers. The Financial Times' May 25 demonstration that Llama 3.3 and Gemma 3 guardrails fall in under ten minutes with a public GitHub tool eliminates the residual defence that open-source safety claims can substitute for downstream controls. Boards should expect the regulator question of 2026 to be not whether you use AI, but whether your evidence stream would survive a no-notice DORA inspection.

The funding pattern looks like 2018–2022 HR-tech and procurement-tech reborn for the agent era. Vanta crossed $300 million ARR, having been valued at $4.15 billion in July 2025; a16z led Sphere's $21 million Series A in November 2025; Auditoria reported triple-digit revenue growth and threefold quarterly bookings in its March fiscal-year close. Drata, Hadrian and a long tail of vertical compliance-AI startups are now raising on the thesis a16z formalised on May 26. The competitive risk is platform absorption: Workday Agent System of Record, ServiceNow AI Control Tower and Microsoft Purview each want to own the audit log and policy store as part of their existing seat licences. The startups that survive will be those whose evidence graph is hard to replicate, whether through multi-jurisdictional regulation ingest (Sphere), cross-system identity propagation (Auditoria), or auditor-network distribution (Vanta), rather than another wrapper over GPT-class models.


Ethan Mollick, "Choosing to Stay Human" (One Useful Thing, May 26, 2026)

Mollick's argument: AI writing has saturated social media, opinion pages and even prize-winning fiction, but the bigger danger is cognitive surrender, meaning defaulting to AI without thinking. He cites a Wharton/BCG study of 758 consultants in which those who used GPT-4 outperformed peers on tasks suited to AI, yet did markedly worse than non-AI peers on a task designed to expose the model's errors, because they accepted plausible-sounding wrong answers. A Turkish high-school study shows ChatGPT used as an answer machine hurt learning, while a Taipei study shows AI used as a personalised tutor produced gains equivalent to six to nine months of extra schooling. Why this matters: for enterprise leaders, the framing reorients AI adoption away from raw productivity metrics toward designing system-level constraints (tutor modes, verification rituals, explanation requirements) that prevent skill atrophy in the consultants, analysts and engineers who will spend the next decade validating AI output rather than producing it themselves.


Jack Clark, "Import AI 458: Reckoning with the future" / 2026 Cosmos HAI Lecture, Oxford (May 26, 2026)

Anthropic's co-founder and policy lead publishes the essay version of his Oxford Cosmos lecture, arguing that continued AI progress forces a binary choice: explore the future, or retreat from the present. His concrete claim: Opus 4.6 has so changed work inside Anthropic that humans are migrating to a verification layer atop pyramids of agents, with single researchers running teams of nine synthetic agents on alignment research; he predicts autonomous companies generating tens of millions in revenue by late 2027 and AI designing its own successor by December 2028. Why this matters: the essay is the most concrete public account to date of how a frontier lab is restructuring hiring (more early-career LLM-natives, more very senior people, more interdisciplinary hires), code production (majority machine-written) and team design (smaller teams, more ambitious targets). It is a template enterprises will be pressed to copy and a regulatory artefact policymakers will cite when AGI-timeline arguments next surface.
