Daily AI Briefing · Saturday, 23 May 2026

01 / 04 · Markets & FinOps

7 min read

Anthropic Splits Claude Billing: Agent SDK Lands on a Meter

From June 15, programmatic Claude usage moves to a separate, API-priced credit pool; chat, IDE and Cowork stay subsidised.

01 · Primer

Until now, anyone with a Claude subscription could run Anthropic’s chatbot, its coding tool and its automated agents out of the same monthly pot. From 15 June 2026, that single pot is split in two. Chatting with Claude in a browser, on the desktop or inside an editor stays bundled in the existing subscription. But anything programmatic — scripts, automated agents, GitHub Actions, third-party tools that log in as you — is moved to a separate, dollar-denominated monthly credit, charged at the same metered rates that Anthropic uses for direct API customers. The credit does not roll over. When it runs out, requests either stop or spill into pay-as-you-go billing, depending on a setting. For heavy automation users, that quietly turns a flat-fee buffet into a metered taxi.

02 · What Happened

Late on 13 May, Pacific time, an email landed in the inboxes of Anthropic’s top-tier Max 20x subscribers. The subject line promised something called a “new monthly Agent SDK credit.” Inside, a short note explained that, starting 15 June, the company’s Agent SDK, its non-interactive ‘claude -p’ command, Claude Code GitHub Actions and any third-party app that authenticates through a user’s subscription would all leave the familiar rate-limit bucket and move to a new credit pool — $20 for Pro, $100 for Max 5x, $200 for Max 20x — billed at full API list prices. Interactive Claude Code in the terminal, the chat apps and the new Claude Cowork collaboration mode would carry on as before. Anthropic framed it as a perk. The developer internet did not. Within hours, a clarification tweet from Lydia Hallie, who works on Claude Code at Anthropic, was tagged with a Community Note that summarised the change in language her employer had not used: “Previously, programmatic usage like ‘claude -p’ counted toward subsidized subscription limits; starting June 15, it draws from a separate $20–$200 monthly credit metered at full API rates, while interactive limits remain unchanged.” Theo Browne, the T3.gg founder, was blunter still in a post viewed more than 200,000 times: “If you use any of the following with your Claude sub, your usage just got cut by 25x… They’re disguising this as ‘free credits.’ Don’t fall for it.” The blunt reaction has context. In early April, Anthropic had unilaterally cut OpenClaw and other third-party agent harnesses off subscription billing entirely, citing capacity strain. Boris Cherny, who runs Claude Code, told The Register at the time that the company’s “systems are highly optimized for one kind of workload” and that subscription pricing “wasn’t built for the usage patterns of these third-party tools.” The new June 15 regime reinstates those tools — but on a meter. 
Not by accident, the split also pulls Claude’s commercial model towards the rest of the industry. GitHub is converting Copilot to an AI-credit system on 1 June. OpenAI has long kept ChatGPT Plus and the API on separate rails. Cursor already runs on a metered overage model on top of a flat fee. Anthropic, which had stood out by letting power users hammer the Agent SDK under a single $200 plan, has now stepped into line. For Greyhound Research analyst Sanchit Vir Gogia, the lesson generalises: over the next 12 to 24 months, he told InfoWorld, every major vendor will carve out “separate consumption pools for agents, premium models, tool use, background tasks and third-party integrations… The vocabulary will vary because marketing departments need hobbies. The direction will not.” For boards that have spent twelve months arguing about which AI subscription to standardise on, the assumption that “Claude is in our IDE” needs to be cracked open into four different commercial buckets: chat, interactive code, SDK, and Cowork — each with its own meter.
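In practice the split amounts to a lookup from invocation surface to billing pool, and a platform team might encode it that way in an internal inventory. The following is a minimal sketch. The surface labels are hypothetical and are not Anthropic identifiers. The mapping encodes only the assignments reported in the announcement and in Zed's guidance:

```python
from enum import Enum


class Pool(Enum):
    SUBSCRIPTION = "subscription limits"        # interactive use, unchanged
    AGENT_CREDIT = "monthly Agent SDK credit"   # metered at API list prices


# Hypothetical surface labels, mapped to pools as the 15 June change was reported.
SURFACE_POOL = {
    "chat_app": Pool.SUBSCRIPTION,
    "claude_code_interactive": Pool.SUBSCRIPTION,
    "cowork": Pool.SUBSCRIPTION,
    "claude_cli_in_zed_terminal": Pool.SUBSCRIPTION,  # per Zed's guidance
    "agent_sdk": Pool.AGENT_CREDIT,
    "claude_print_mode": Pool.AGENT_CREDIT,           # the non-interactive `claude -p`
    "github_actions": Pool.AGENT_CREDIT,
    "third_party_app": Pool.AGENT_CREDIT,             # tools that log in with your subscription
    "zed_acp_agent": Pool.AGENT_CREDIT,               # per Zed's guidance
}


def billing_pool(surface: str) -> Pool:
    """Return the pool a surface draws from; unknown surfaces fail loudly."""
    try:
        return SURFACE_POOL[surface]
    except KeyError:
        raise ValueError(f"unclassified surface: {surface!r}") from None


def metered_surfaces() -> list[str]:
    """Surfaces whose usage now lands on the metered credit."""
    return sorted(s for s, p in SURFACE_POOL.items() if p is Pool.AGENT_CREDIT)
```

The function raises an error for an unknown surface instead of defaulting to the subscription. This is deliberate: an unclassified automation is exactly the kind that would otherwise surprise finance.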

03 · The Numbers

Strip away the marketing and the question is simple: how much Claude does $200 actually buy at API list price? On Anthropic’s own published rates, the answer is roughly 13 million Opus 4.7 tokens, 22 million Sonnet 4.6 tokens, or 67 million Haiku 4.5 tokens at a 50/50 input-output mix. A single Claude Code session investigating a complex bug burns somewhere between 500,000 and one million tokens. At the million-token end, the headline Max 20x credit therefore covers roughly 13 such sessions a month on Opus and 67 on Haiku before overage kicks in. For an engineer running an agent fleet through CI, that is a few days, not a month.

The Pulsed Media gist that has emerged as the canonical reference for the change, written by an autonomous AI agent that itself runs on Claude Code, works the math three ways. A Pro user routing OpenClaw through their $20 plan was previously extracting around $236 of API-equivalent value, an effective ratio of about 12x. A Max 20x user running heavy Opus workloads against the weekly quota could extract roughly $5,800 of API value on $200 paid, a ratio of about 29–35x. A Max 20x user pointed at Sonnet, which is roughly five times cheaper per token, could push that to 150x–175x. From 15 June, every one of those ratios collapses toward 1.0. That is the headline number behind Browne’s claim of a “25x cut” for typical operators: a midpoint, not a worst case.

A historical comparison helps. The shift looks structurally similar to AWS Reserved Instances ceding ground to on-demand pricing in the early 2010s: a vendor that had subsidised long-running workloads to seed adoption discovered that the workloads stayed, multiplied, and ate the subsidy. Anthropic’s Colossus 1 expansion, which pushed its compute fleet past 220,000 GPUs, was not enough to keep flat-rate agent usage profitable. Capping it was, in Cherny’s phrase to The Register, what survival math demanded. The credit mechanics also matter to forecasting.
Credits are per account and per seat, with no pooling across a team — “you cannot share a budget across a team,” as Broadcom site reliability engineer Advait Patel observed in InfoWorld. They do not roll over month to month. They must be claimed monthly through a separate flow that Anthropic says it will document in June. Overage, billed at standard API rates, is opt-in: if a team forgets to enable extra usage, an agent simply stops mid-pipeline when the credit hits zero. If it is enabled, a runaway loop can burn through hundreds of dollars in minutes. Patel’s point that “a runaway agent or a bad prompt can burn through credit fast and then either stop your pipeline or quietly start garnering extra usage” reads, to a CFO, like an unbounded liability sitting inside a developer’s git push. For DAX-listed IT organisations that have standardised forecasting around per-seat licences, the move drags a slice of the AI spend out of HR’s seat-count spreadsheet and into a cloud-style consumption ledger that nobody has owned before.
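The token arithmetic at the top of this section can be reproduced in a few lines. The sketch assumes per-million-token list prices of $5/$25 for Opus 4.7, $3/$15 for Sonnet 4.6 and $1/$5 for Haiku 4.5. The article does not state these rates; they are assumed here because they yield its 13, 22 and 67 million-token figures at a 50/50 mix:

```python
# Assumed list prices, USD per million tokens (input, output). Not quoted from
# Anthropic here; chosen because they reproduce the article's token figures.
PRICES = {
    "opus-4.7": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}


def blended_price(model: str, input_share: float = 0.5) -> float:
    """Weighted USD price per million tokens for a given input/output mix."""
    inp, out = PRICES[model]
    return input_share * inp + (1.0 - input_share) * out


def million_tokens_for(budget_usd: float, model: str, input_share: float = 0.5) -> float:
    """Million tokens a dollar budget buys at list price."""
    return budget_usd / blended_price(model, input_share)


def sessions_for(budget_usd: float, model: str, tokens_per_session_m: float) -> float:
    """How many sessions of a given size (in million tokens) the budget covers."""
    return million_tokens_for(budget_usd, model) / tokens_per_session_m


def extraction_ratio(api_equivalent_usd: float, paid_usd: float) -> float:
    """API-equivalent value extracted per dollar of subscription paid."""
    return api_equivalent_usd / paid_usd


if __name__ == "__main__":
    for model in PRICES:
        mtok = million_tokens_for(200.0, model)
        heavy = sessions_for(200.0, model, 1.0)   # one-million-token sessions
        light = sessions_for(200.0, model, 0.5)   # 500,000-token sessions
        print(f"{model}: {mtok:.1f}M tokens, {heavy:.0f} to {light:.0f} sessions")
    print(f"Pro via OpenClaw: {extraction_ratio(236, 20):.1f}x")     # gist's dollar figures
    print(f"Max 20x on Opus:  {extraction_ratio(5_800, 200):.1f}x")  # gist's dollar figures
```

With one-million-token sessions, this yields the 13 (Opus) to 67 (Haiku) range quoted above. At 500,000 tokens per session, the counts roughly double. The ratio helper recovers the gist's 12x and 29x figures from its dollar amounts.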

04 · Strategy & Transition

The catch for enterprises is architectural, not financial. The four routes into Claude (chat, interactive Claude Code, Agent SDK and Cowork) used to share a single ledger; from 15 June they do not. That changes who has to care. A pricing question becomes a routing question, which becomes a platform question. For large German corporations that have already pushed Claude into shared developer platforms, three immediate decisions follow.

First, observability. Token consumption per workflow has to be visible inside FinOps dashboards before 15 June, not after the first surprise bill. Both InfoWorld’s sources and Anthropic’s own help text point in the same direction: treat Claude like AWS, with budget alerts, cost-per-agent attribution and hard caps wired into pipelines. Doozer AI co-founder Paul Chada captured the operating posture in InfoWorld: “Stop optimizing for the subsidy and start optimizing for the token. Treat prompt caching, context discipline and model selection as first-class engineering.”

Second, optionality. Zed has already published guidance telling its users that ACP-routed agents will fall under the new credit, while running the official ‘claude’ CLI inside Zed’s terminal still draws on subscription limits. That kind of routing decision will now have to be made deliberately, across hundreds of repositories.

Third, multi-vendor posture. Cursor Ultra offers a $400 programmatic envelope on a $200 plan; OpenAI keeps coding subscriptions separate from API usage; GitHub Copilot is moving to credits on 1 June. For a CIO at a DAX40 IT shop, the question is no longer which Claude tier to buy, but how to architect a model-agnostic agent runtime where the meter can be redirected without ripping out the IDE.
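The observability decision is mostly bookkeeping. Tag every request with the agent or workflow that made it, price it, and compare the total against a per-agent budget. The following is a minimal sketch. It assumes a hypothetical usage-record format and uses placeholder per-million-token prices, which are not official Anthropic rates:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class UsageRecord(NamedTuple):
    agent: str          # hypothetical label, e.g. a CI job or repository name
    model: str
    input_tokens: int
    output_tokens: int


# Placeholder USD prices per million tokens (input, output); not official rates.
PRICES = {"opus": (5.0, 25.0), "sonnet": (3.0, 15.0), "haiku": (1.0, 5.0)}


def cost_usd(r: UsageRecord) -> float:
    """Price one record at the placeholder rates."""
    inp, out = PRICES[r.model]
    return (r.input_tokens * inp + r.output_tokens * out) / 1_000_000


def cost_by_agent(records: Iterable[UsageRecord]) -> dict[str, float]:
    """Attribute spend to the agent that incurred it."""
    totals: defaultdict[str, float] = defaultdict(float)
    for r in records:
        totals[r.agent] += cost_usd(r)
    return dict(totals)


def budget_alerts(totals: dict[str, float], budgets: dict[str, float],
                  warn_at: float = 0.8) -> list[str]:
    """Flag agents with no budget, near their budget, or over it."""
    alerts = []
    for agent, spent in sorted(totals.items()):
        budget = budgets.get(agent)
        if budget is None:
            alerts.append(f"{agent}: no budget assigned (${spent:.2f} spent)")
        elif spent >= budget:
            alerts.append(f"{agent}: OVER budget (${spent:.2f} / ${budget:.2f})")
        elif spent >= warn_at * budget:
            alerts.append(f"{agent}: {spent / budget:.0%} of budget used")
    return alerts
```

Flagging agents with no budget at all is as important as flagging overspend. After the split, an untagged pipeline is the one most likely to be drawing on the meter unnoticed.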

Three Perspectives · What this story means for different readers

For a DAX40 CIO, this is the moment Claude stops being a software licence and becomes a cloud bill. The neat per-seat economics that finance teams love — predictable, comparable, easy to defend in a board pack — survive for chat and IDE use, but break for any agentic pipeline. Procurement, FinOps and platform engineering now share ownership of the same line item. Expect three immediate workstreams: a token-level audit of existing Claude Code automations, a hard-cap policy embedded in CI runners, and a rewrite of internal AI usage guidelines that distinguishes interactive from programmatic invocation. Consultancies advising on AI rollouts will need to add a routing diagram next to the architecture diagram.
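A hard-cap policy for CI runners can start as a client-side guard. The guard refuses a request before it would breach the seat's credit. Overage is allowed only when explicitly enabled, and is itself capped. This is a sketch under those assumptions; the class and its fields are hypothetical and are not part of any Anthropic SDK:

```python
from dataclasses import dataclass


class CreditExhausted(RuntimeError):
    """Raised when a request would exceed the credit or the overage cap."""


@dataclass
class CreditMeter:
    monthly_credit_usd: float        # e.g. 200.0 for a Max 20x seat, per the article
    overage_enabled: bool = False    # mirrors the opt-in "extra usage" setting
    overage_cap_usd: float = 0.0     # hypothetical ceiling on pay-as-you-go spend
    spent_usd: float = 0.0

    @property
    def limit_usd(self) -> float:
        """Total spend allowed this month, including any capped overage."""
        extra = self.overage_cap_usd if self.overage_enabled else 0.0
        return self.monthly_credit_usd + extra

    @property
    def in_overage(self) -> bool:
        """True once spend has moved past the included credit."""
        return self.spent_usd > self.monthly_credit_usd

    def charge(self, cost_usd: float) -> float:
        """Record a request's estimated cost before sending it; return headroom left."""
        if cost_usd < 0:
            raise ValueError("cost must be non-negative")
        if self.spent_usd + cost_usd > self.limit_usd:
            raise CreditExhausted(
                f"request of ${cost_usd:.2f} would exceed limit ${self.limit_usd:.2f}"
            )
        self.spent_usd += cost_usd
        return self.limit_usd - self.spent_usd
```

Checking before the call, not after it, turns the runaway-loop scenario Patel describes into a failed CI step. Without the check, the same loop becomes an open-ended bill. The per-request cost estimate has to come from the pipeline's own token accounting.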

The split also reshapes the compliance surface. Under the EU AI Act, providers of high-risk systems must document data flows, model versions and intended use. A programmatic credit that, once a team opts in, spills into pay-as-you-go API usage creates two distinct contractual relationships (subscription and API), possibly under different data processing terms. German works councils, already wary of opaque AI spend, will probably ask whether overage requests carry the same residency guarantees as subscribed traffic. Procurement and the data protection officer (DSB) should expect questions about whether the new credit pool sits on the same regional inference infrastructure as the chat subscription, and whether the audit logs reconcile cleanly across both ledgers.

For agent-first founders, the subsidy era is closing fast. The Pulsed Media gist documents one operator who extracted ten billion tokens — roughly $15,000 of API-equivalent value — across eight months on a $100 plan. That arbitrage is over. Cursor, OpenAI and GitHub will read the muted shareholder reaction to Anthropic’s move as permission to tighten too. The investor takeaway: a startup whose unit economics rely on flat-fee Claude is now a discounted asset. Conversely, infrastructure plays around prompt caching, model routing and cross-vendor agent runtimes — the picks-and-shovels of metered AI — just acquired a clearer customer base. European founders building agent platforms now have a credible pitch deck slide on vendor lock-in.

Sources · 8 references

02 / 04 · Research & Open Source

8 min read

Qwen3.7-Max closes the weights and opens the Claude socket

Alibaba's first proprietary frontier model speaks Anthropic's protocol natively: a sovereignty option that drops into Claude Code without a rewrite.

01 · Primer

Alibaba's Qwen Team has, for years, been the unofficial standard-bearer of open-weights frontier AI outside the United States. Each new Qwen release shipped downloadable weights that DAX40 IT departments quietly tested in air-gapped clusters as a hedge against US export controls. That bargain has now changed. At the Alibaba Cloud Summit in Hangzhou on 20–21 May 2026, the team unveiled Qwen3.7-Max, its first explicitly proprietary, API-only flagship. The model carries a one-million-token context window, sustained a 35-hour autonomous coding run, and — the detail that matters most for European enterprise architects — speaks Anthropic's API protocol natively. Claude Code, OpenClaw, Hermes Agent, Qwen Paw and Qoder all accept it as a drop-in. The open-weights tier remains, but the frontier has moved behind a paywall.

02 · What Happened

On a stage in Hangzhou ringed with racks of T-Head silicon, Liu Weiguang, senior vice-president of Alibaba's cloud unit, framed the day's announcements not as model releases but as factory openings. “What we're building is China's AI factory,” he told the Cloud Summit audience, describing Alibaba as the only domestic firm operating “all five layers of the full AI stack”: chips, agentic cloud, foundation models, model-service platform and agentic applications. Behind him, the Panjiu AL128 supernode server and the Zhenwu M890 accelerator were positioned as the iron the new model would, quite literally, write code for.

The flagship was Qwen3.7-Max. Zhou Jingren, the former Alibaba Cloud CTO recently elevated to Chief AI Architect of the new group-wide Technology Committee, presented the benchmark sweep: top-tier results across GPQA Diamond (92.4), HLE (41.4), HMMT February 2026 (97.1), SWE-bench Verified (80.4) and MCP-Atlas (76.4). On Alibaba's own scoring, those figures place the model at parity with Anthropic's Claude Opus 4.6 Max and ahead of DeepSeek V4 Pro and Moonshot's Kimi K2.6. Zhou said the model “consistently ranked among the top tier” and outperformed every other Chinese system.

Then came the narrative pivot. Quietly buried in the Qwen Studio post was a single sentence: Qwen3.7-Max is “our latest proprietary model.” No weights. No Hugging Face drop. API only, served from DashScope endpoints in Beijing, Singapore and Virginia. For a team whose open-weights releases of Qwen 2.5 and Qwen 3.6 had become reference implementations across Hugging Face, this was a clean break. It also follows the quiet departure earlier this year of several senior Qwen researchers, a reshuffle VentureBeat covered at the time and that now reads as prologue. The parallel is instructive. OpenAI made the same pivot in 2020, following the openly released GPT-2 with GPT-3, which it served only through an API; Anthropic never opened weights at all; Google's Gemini Ultra tier followed suit.
Alibaba is now, for the first time, doing what every Western frontier lab does: keeping a credible open tier alive below the line (Qwen3.6-Plus, Qwen3-Coder-Next) while moving the actual frontier behind a meter. The community reaction, captured in posts circulating on X within hours of the launch, was split between technical awe at the 35-hour run and audible dismay at the closed weights. “Please open source this one too,” one widely shared post read. “The max tier going API-only would close a door we have been keeping open.” For the DACH enterprise buyer the signal is sharper still. The pricing is $2.50 per million input tokens and $7.50 per million output tokens, or $10 combined. That is roughly a third of Claude Opus 4.7 ($30 combined) and a little over half of GPT-5.4 ($17.50), while the protocol is identical. Any agent harness already wired for Claude can be repointed at DashScope with four environment variables. Sovereignty arbitrage, for the first time, has a drop-in price tag.
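Repointing a harness is an environment change rather than a code change. The following is a minimal sketch of the override set. It assumes Claude Code's `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN` and `ANTHROPIC_MODEL` settings, plus the DashScope base URL the Qwen Team publishes (quoted in the Architecture section). Two details are assumptions: using `ANTHROPIC_SMALL_FAST_MODEL` as the fourth variable, and the model ID in the usage line:

```python
import os
from typing import Optional

# International-region endpoint, as published by the Qwen Team for Claude Code.
DASHSCOPE_ANTHROPIC_URL = "https://dashscope-intl.aliyuncs.com/apps/anthropic"


def dashscope_env(api_key: str, model: str, fast_model: Optional[str] = None) -> dict[str, str]:
    """Environment overrides that point an Anthropic-protocol harness at DashScope.

    Treating ANTHROPIC_SMALL_FAST_MODEL as the fourth variable is an assumption.
    """
    if not api_key:
        raise ValueError("a DashScope API key is required")
    return {
        "ANTHROPIC_BASE_URL": DASHSCOPE_ANTHROPIC_URL,
        "ANTHROPIC_AUTH_TOKEN": api_key,
        "ANTHROPIC_MODEL": model,
        "ANTHROPIC_SMALL_FAST_MODEL": fast_model or model,
    }


def merged_env(overrides: dict[str, str]) -> dict[str, str]:
    """Copy of the current environment with the overrides applied (for subprocess env=)."""
    env = dict(os.environ)
    env.update(overrides)
    return env
```

For example, `subprocess.run(["claude"], env=merged_env(dashscope_env(key, "qwen3.7-max")))` would start the CLI against DashScope. The model ID there is a placeholder, not a confirmed identifier. The prompts, evaluation suites and policies around the harness stay untouched.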

03 · Architecture

Alibaba has not disclosed parameter count, training compute or a full technical report — those are promised “soon” — but the published numbers, harness list and chip story together describe the shape of the system precisely enough for architectural assessment. The context window is one million tokens with a 64K maximum output, up from Qwen 3.6 Max's 256K. The OpenClaw configuration block published by the Qwen Team explicitly registers ‘contextWindow: 1000000’ and ‘maxTokens: 65536’. A ‘preserve_thinking’ flag retains reasoning traces across turns — Alibaba recommends it for agentic workloads where the model must carry chain-of-thought across hundreds of tool calls. The reasoning effort is exposed as a system-prompt parameter (‘xhigh’ is the default for hard tasks), mirroring the pattern OpenAI introduced with o-series models and Anthropic with Claude's extended thinking. Harness compatibility is the load-bearing design choice. Qwen3.7-Max ships with native support for the Anthropic API protocol — meaning Claude Code, the de facto reference agent harness of 2026, points at a DashScope URL with ‘ANTHROPIC_BASE_URL=https://dashscope-intl.aliyuncs.com/apps/anthropic’ and runs unmodified. The same model also serves OpenAI's chat-completions and responses APIs, plus first-party integrations for OpenClaw, Hermes Agent, Qwen Paw, Qoder and Qwen Code. Alibaba's term for this is “cross-harness generalisation” and it is more than marketing: their training infrastructure (Rollout) decouples task, harness and verifier so the same problem is replayed under varying agent scaffolds during reinforcement learning. The model learns to solve tasks, not to exploit a particular harness — a structural answer to the brittleness that plagued earlier agentic systems. The 35-hour demonstration is the architectural proof. 
The Qwen Team handed the model a single attention kernel from SGLang, Extend Attention (the production-grade variable-length multi-head attention operator used in LLM serving), and asked it to optimise the kernel for the T-Head ZW-M890 PPU. Crucially, the M890 architecture was never seen during training: no hardware documentation, no example kernels, no profiling data. Starting from an empty workspace, Qwen3.7-Max executed 1,158 tool calls and 432 kernel evaluations across 35 hours of continuous autonomous operation, ending with a 10.0× geometric mean speedup over the Triton reference. Under identical conditions, GLM 5.1 reached 7.3×, Kimi K2.6 reached 5.0×, DeepSeek V4 Pro reached 3.3×, and Qwen3.6-Plus, the previous-generation open model, managed 1.1×. The competitors voluntarily terminated their sessions after five consecutive empty tool-call rounds; Qwen3.7-Max was still finding non-trivial improvements after thirty hours.

The full-stack story closes the loop. The Zhenwu M890 ships with 144 GB of on-chip memory, 800 GB/s inter-chip bandwidth and native FP4 precision support; the Panjiu AL128 packs 128 of them into a single rack with petabyte-per-second internal bandwidth via T-Head's ICN Switch 1.0, delivering 25.6 Tbps of aggregate fabric. T-Head reports 560,000 Zhenwu units delivered to date across more than 400 external customers in 20 industries. The model now writes performant code for the chip it runs on, and that chip is designed by an Alibaba subsidiary and already shipping at scale. That recursion (model optimising silicon, silicon serving model, both inside one corporate boundary) is what Liu's “AI factory” phrase actually means. It is the architectural answer to Nvidia, and it is being demonstrated, not promised.
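Two mechanics in that run are easy to pin down in code. The headline figure is a geometric mean of speedups, and the competitors stopped under a patience rule. The following is a sketch. It assumes the mean is taken over per-case timings against the Triton reference; the article does not say which input shapes were measured:

```python
import math
from typing import Sequence


def geometric_mean_speedup(baseline_ms: Sequence[float], optimised_ms: Sequence[float]) -> float:
    """Geometric mean of per-case speedups (baseline time / optimised time)."""
    if len(baseline_ms) != len(optimised_ms) or not baseline_ms:
        raise ValueError("need equal-length, non-empty timing lists")
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimised_ms))
    return math.exp(log_sum / len(baseline_ms))


def should_stop(rounds_with_tool_calls: Sequence[bool], patience: int = 5) -> bool:
    """True once the last `patience` rounds produced no tool calls."""
    if len(rounds_with_tool_calls) < patience:
        return False
    return not any(rounds_with_tool_calls[-patience:])
```

A geometric mean is the usual way to average ratios. With an arithmetic mean, a single case that speeds up 100× could dominate the result; with a geometric mean, it cannot.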

04 · Strategy & Transition

For a DAX40 group, the calculus changes in three concrete ways. First, the protocol-compatibility layer collapses switching cost. An IT organisation that has spent twelve months building agent infrastructure on Claude Code — Anthropic's reference harness, increasingly the default in enterprise dev tooling — can now run that infrastructure against a non-US model with a four-line environment change. The harness, the prompts, the eval suites, the security policies all transfer. This is the first time a credible non-Anthropic, non-American model has shipped Anthropic-protocol native; the historical analogue is the moment AWS-compatible APIs appeared on competing clouds in the mid-2010s and reset enterprise lock-in dynamics. Second, the sovereignty conversation gains a real alternative rather than a slogan. The DashScope endpoint exists in Singapore (‘dashscope-intl’) and Virginia (‘dashscope-us’) as well as Beijing, which means the model can be served from non-PRC jurisdictions — though every European data-protection officer will note that Alibaba Cloud is still Alibaba Cloud, and that the Singapore region terminates in a corporate group ultimately subject to PRC jurisdiction under the National Intelligence Law. The model is plausibly deployable; it is not obviously compliant. DAX40 boards will have to decide whether the geopolitical optionality outweighs the regulatory exposure, and the answer is unlikely to be uniform across industries — defence and critical infrastructure will reject it on its face; consumer goods and manufacturing will pilot it. Third, the closed-weights pivot changes the open-source hedge calculation entirely. The previous Qwen bargain — frontier capability you could download — is gone. DeepSeek remains the open frontier alternative, but DeepSeek's release cadence is slower and its agentic capabilities, on Alibaba's own published numbers, are now meaningfully behind. 
Enterprises that built their non-US strategy on the assumption of a reliable open Chinese frontier will need to rewrite it. The realistic choice for sovereign on-prem deployment now narrows to Qwen 3.6-Plus, DeepSeek V4 Pro, Mistral and a thinning field of European labs.

Three Perspectives · What this story means for different readers

For DAX40 CIOs the immediate question is not whether to deploy Qwen3.7-Max (that decision is months of legal review away) but whether to write the integration anyway. The protocol compatibility means agentic platforms can be designed model-agnostic at near-zero marginal cost: any team building on Claude Code today can route a subset of traffic to DashScope for benchmarking by quarter end. Pricing pressure is the more immediate effect. At $10 per million tokens (input and output list prices combined) against $30 for Opus 4.7, the existence of a protocol-compatible alternative gives procurement real leverage in Anthropic and OpenAI renewals over the next twelve months, even for buyers who never intend to actually deploy the Chinese model. Optionality is itself the asset.
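Routing a subset of traffic for benchmarking takes only a small piece of platform code. The following is a minimal sketch. It assumes requests carry a stable ID; the backend names are placeholders and the salt is arbitrary:

```python
import hashlib


def routes_to_candidate(request_id: str, fraction: float, salt: str = "qwen-benchmark") -> bool:
    """Deterministically send roughly `fraction` of request IDs to the candidate backend."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{request_id}".encode("utf-8")).digest()
    # Keep 53 bits so the bucket is an exactly representable float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)
    return bucket < fraction


def pick_backend(request_id: str, fraction: float) -> str:
    # Placeholder names for whatever base URLs the platform actually uses.
    return "dashscope" if routes_to_candidate(request_id, fraction) else "anthropic"
```

Hash-based assignment keeps a given request or session on one backend across retries. The two models are therefore compared on stable, disjoint slices of traffic, not on whichever calls happened to arrive first.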

BaFin, BSI and the Federal Data Protection Commissioner have not yet ruled on whether an Alibaba-served frontier model can lawfully process regulated DACH workloads. Meanwhile, the EU AI Act's GPAI obligations (transparency, documentation, copyright disclosure, systemic-risk assessment) apply to any provider placing a general-purpose model on the EU market regardless of country of origin. Alibaba has so far published benchmark numbers but no technical report, no training-data disclosure and no model card of the kind required under Article 53. Until those documents land, enterprise legal will block production use even for non-regulated workloads. The model's closed-weights status removes the on-prem-inspection escape hatch that made Qwen 2.5 and 3.6 quietly tolerable to compliance officers.

European AI infrastructure startups have just been handed both a tailwind and a threat. The tailwind: the “agent harness as universal port” thesis — Claude Code, MCP and the Anthropic protocol becoming the TCP/IP of agentic computing — is now empirically validated by a non-Anthropic vendor adopting it natively. Companies building on that abstraction layer (orchestration, observability, eval, governance) have a wider addressable market by Monday morning. The threat: the open-weights thesis that underpinned much European model-layer investment (Mistral, Aleph Alpha, Black Forest Labs) is weakening as the largest open-frontier program closes at the top. Investors will reweight toward application and infrastructure layers and away from undifferentiated foundation-model bets. The window for European sovereign-model pitches just narrowed.

Sources · 10 references

03 / 04 · European Sovereignty

8 min read

SAP's Madrid Pivot: Mistral Goes GA, a Sovereign Stack a CIO Can Sign

At Sapphire Madrid, SAP made Mistral Plus generally available on its sovereign cloud and framed the Autonomous Enterprise as a European default, and Capgemini and Accenture both signed up to sell it.

01 · Primer

SAP Sapphire is the company's annual customer conference, this year split between Orlando and a Madrid leg running May 19–21. In Madrid, SAP unveiled the “Autonomous Enterprise” — a unified Business AI Platform that wires Joule agents into S/4HANA, Ariba, SuccessFactors and the rest of the suite. The headline for European buyers: Mistral Plus is now generally available on SAP's sovereign cloud infrastructure, sitting alongside Anthropic's Claude and Canada's Cohere as a bookable model option. Two consultancies announced parallel programmes on the same stack — Capgemini with Mistral and SAP for regulated industries, Accenture with Mistral for industry co-development. For a DAX40 board that wants AI inside its core ERP without an exclusively American model stack, the pieces of a sovereign default are now assembled.

02 · What Happened

Christian Klein walked the IFEMA stage in Madrid with the same slide deck he had used in Orlando a week earlier — and a different argument. In Florida, SAP's CEO had pitched the Autonomous Enterprise as a productivity story: agents that compress the financial close from weeks to days, a Joule Work interface that replaces screen-hopping with outcomes. In Madrid, the pitch turned. “For the mission-critical processes of our customers, ‘almost right’ just isn't good enough,” Klein said, before handing off to CTO Philipp Herzig and COO Sebastian Steinhaeuser for a Q&A that ERP Today's Tarsilla Moura described as a European control test. Where the agents run, what models they use, who owns the data — those were the questions Madrid was built to answer. The answer arrived in the form of a model menu. Joule's reasoning layer is anchored by Anthropic's Claude, with SAP's own RPT-1 tabular model and the pending Prior Labs acquisition feeding the structured-data layer. But alongside Claude, SAP confirmed that Mistral Plus had gone generally available on the SAP Business AI Platform running inside SAP's sovereign cloud — and that Cohere North would follow in June. For the first time, a customer signing a RISE with SAP contract can route Joule's reasoning to a European-headquartered model trained on European-funded compute, with the runtime hosted by SAP under EU AI Cloud's data-residency rules. Two channel partnerships landed the same week, neither of them accidents. Capgemini, Mistral and SAP announced a tri-party programme aimed squarely at regulated sectors — financial services, public sector, aerospace and defence, energy and utilities — with Capgemini bringing a library of more than fifty pre-built business AI use cases validated by SAP and powered by Mistral models. 
Capgemini CEO Aiman Ezzat has previously argued that open models matter precisely because they let regulators and customers “scrutinize the model for potential sources of bias.” In Madrid the argument became a product line. The second deal had been seeded earlier. On February 26, Accenture's EMEA CEO Mauro Macchi and Mistral's Arthur Mensch had stood together in Paris to announce a multi-year strategic collaboration. “Our clients are looking for AI solutions that combine world class performance with the complete ownership that Mistral AI's technology offers enterprises,” Macchi said at the time. Mensch added that the partnership would help enterprises “deploy AI that meets their needs for performance, control, and customization.” In Madrid, that February announcement read as the precondition for Sapphire — Accenture as the global system integrator, Mistral as the model, SAP as the business-process backbone. The narrative pivot of the week was simple. Sovereignty stopped being a slide and started being a SKU.

03 · Timeline & Context

The arc that ended in Madrid started in Walldorf nearly three years earlier. SAP's AI strategy began as a wrapper: Joule launched in 2023 as a copilot bolted onto Fiori screens, useful but not architectural. The 2024–2025 retooling fixed that. SAP acquired LeanIX for enterprise-architecture context, then WalkMe for digital adoption, then signalled the Prior Labs deal for tabular foundation models. The n8n acquisition in early 2026 gave SAP visual workflow orchestration, and the German unicorn now sits inside Joule Studio as the no-code layer for agentic processes. Joule Studio became the developer surface; SAP Knowledge Graph became the semantic map; SAP Business Data Cloud became the substrate. By the time Klein took the Orlando stage on May 11, the architecture was ready.

Mistral's build-out happened on a parallel track. In March 2026, the company raised $830 million in debt from a seven-bank consortium led by BNP Paribas, Crédit Agricole CIB, HSBC and MUFG. The money finances a Nvidia Grace Blackwell cluster at Bruyères-le-Châtel south of Paris: 13,800 GB300 GPUs, 44 megawatts, operated by French data-centre firm Eclairion, commissioned this quarter. In February, Mistral revealed a €1.2 billion Swedish facility in Borlänge with EcoDataCenter, 23 megawatts, online by 2027. The combined target is 200 megawatts of European compute by 2027. That is roughly the contracted load of a mid-sized DAX40 chemicals plant. It is small next to the gigawatt campuses being built in Virginia or Abu Dhabi, but significant precisely because it is European, debt-financed by European banks, and not owned by a hyperscaler.

SAP, meanwhile, had been seeding the model layer well before Sapphire. Anthropic's Claude went on the platform earlier in May with Joule integration across HR, procurement and supply chain. Cohere had been positioned as the sovereign hedge. Mistral's GA closed the European corner of that triangle.
By Madrid, the menu read: US frontier model (Claude), Canadian enterprise model (Cohere), French sovereign model (Mistral), and SAP's own tabular RPT-1 — all callable from the same Joule Studio, all governed by EU AI Cloud's residency controls when the customer ticks that box. The Madrid announcement also clicks into a five-day European arc that any DAX40 CIO will have noticed. On May 18, Siemens and Nvidia opened the Munich Industrial AI Cloud — sovereign GPU capacity for German manufacturers. On May 19, SAP went live with Mistral Plus in Madrid. On May 20, Deutsche Telekom, Vodafone, Orange and Telefónica announced an edge-cloud federation spanning four operators. And the EU AI Act's omnibus deadline shift — the Commission's signal that high-risk obligations will phase in more slowly than the original 2026 cliff — landed in the same window, giving regulated industries breathing room to deploy before the strictest provisions bite. The historical comparison worth holding: when SAP launched R/3 in 1992, it sold European enterprises an architecture that became the spine of their operations for thirty years. The Autonomous Enterprise is the same play with a different payload. The bet is that agents-on-top-of-ERP will be as durable a default as client-server-on-top-of-mainframe once was. Whether Mistral can hold a model-layer seat on that spine for the next decade — against Claude's reasoning advantage and Cohere's enterprise polish — is the open question Madrid did not answer.

04 · From Lab to Mainstream

What changes in a procurement cycle the day after Madrid is narrower than the keynote suggests, but more concrete than analysts often credit. RISE with SAP contracts now include three Joule Assistants in the first year; SAP GROW customers get the full portfolio at onboarding. A €100 million partner fund underwrites deployment costs. For a DAX40 finance organisation already running S/4HANA Cloud, the Autonomous Close Assistant is a bookable line item — and the model that powers it can now be Mistral, with data residency contractually pinned inside the EU. The consultancy posture matters here. Accenture and Capgemini did not just badge themselves onto SAP's slide; they brought distinct go-to-market shapes. Capgemini's library of fifty-plus validated use cases is a catalogue play, designed to compress procurement cycles for banks, utilities and defence ministries that cannot stomach bespoke pilots. Accenture's collaboration is a co-development track — Mistral Studio embedded into Accenture's delivery operations, with training and certification programmes that build a billable workforce around the stack. The two are not in obvious conflict. A French bank could buy Capgemini's regulated-industries package and an Accenture-delivered industry agent in the same year, both running on the same Mistral instance inside SAP's sovereign cloud. For smaller SAP customers — the Mittelstand the company has been trying to push onto Cloud ERP for a decade — the menu is less liberating. The interesting agents are gated behind RISE and GROW commitments. ECC and on-premise S/4HANA customers get “select AI scenarios” only if they commit to migrating most of their landscape to Cloud ERP. The sovereignty story is real, but the door to it runs through SAP's cloud transition. That is the bargain Madrid put on the table.

Three Perspectives: what this story means for different readers

For a DAX40 CIO, Madrid resolves a question that had been sitting on the architecture board for eighteen months: is there a defensible answer to “why not just buy OpenAI plus Azure?” The answer is now yes, and it is bookable on the same paper that already governs the ERP. Mistral Plus on SAP sovereign cloud means data residency, model provenance and runtime control can all be specified in a single contract. The harder choice is no longer between sovereignty and capability; it is between Claude's reasoning quality for agentic work and Mistral's jurisdictional cleanliness for regulated workloads. Most large European enterprises will end up running both, with routing rules in Joule Studio that send sensitive payroll, HR and finance traffic to Mistral and let Claude handle less sensitive inference. The procurement conversation moves from model selection to traffic policy.
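A traffic policy of the kind described above can be surprisingly small. The sketch below is illustrative only: the domain labels, the `route` function and the model names `mistral` and `claude` are hypothetical placeholders, not Joule Studio configuration or any SAP API. It also shows one design choice a policy owner has to make: whether unclassified traffic defaults to the reasoning model or fails closed to the sovereign one.

```python
# Hypothetical domain labels and model names; not Joule Studio settings.
SENSITIVE = frozenset({"payroll", "hr", "finance"})
KNOWN_NON_SENSITIVE = frozenset({"marketing", "it-support", "code-review"})


def route(domain: str, fail_closed: bool = False) -> str:
    """Pick a model for a request tagged with a business domain.

    Sensitive domains always stay on the EU-resident model. With
    fail_closed=True, any domain not explicitly classified as
    non-sensitive also stays sovereign.
    """
    d = domain.strip().lower()
    if d in SENSITIVE:
        return "mistral"
    if fail_closed and d not in KNOWN_NON_SENSITIVE:
        return "mistral"  # unclassified traffic stays on the sovereign model
    return "claude"


for domain in ("Payroll", "marketing", "legal"):
    print(domain, route(domain), route(domain, fail_closed=True))
```

The fail-closed variant trades some Claude usage for a policy that cannot leak a newly introduced, unlabelled workload.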

EU AI Act obligations for high-risk systems still apply regardless of which model sits underneath, and the omnibus deadline shift only delays, rather than removes, the documentation, logging and conformity-assessment requirements. What Mistral-on-SAP changes is the burden of proof. A bank or insurer that can point to a European-headquartered model, EU data centres, an EU-domiciled ERP vendor and a contractually pinned residency posture has a markedly easier conversation with BaFin or the ACPR than one whose stack ends up within reach of the US CLOUD Act. The Schrems II overhang on transatlantic data transfers does not vanish, but it stops being load-bearing for the AI inference layer. Regulated-industries leaders should treat Madrid as evidence that the sovereign-stack option is now mature enough to assume in compliance memos rather than flag as aspirational.

Mistral's $830 million debt raise, its first, is a more important signal than the GA announcement itself. European banks underwriting Nvidia GPU clusters on Mistral's balance sheet, without equity dilution, is a financing pattern that did not exist twelve months ago. Combined with the Swedish facility and the 200 MW target for 2027, Mistral is now building a compute book the way utilities build generation: long-duration debt against contracted demand. For European AI startups, this opens a path to scale that does not require selling to a hyperscaler. For US investors, it complicates the thesis that European AI is structurally undercapitalised: Mistral's enterprise revenue, channelled through SAP, Accenture and Capgemini, now has a credible top-line story to service the debt. The risk is concentration. If SAP becomes Mistral's largest enterprise channel, the model company's equity story becomes a derivative of one ERP vendor's roadmap.


04 / 04 · Enterprise & Architecture

8 min read

Codex cuts the cord: Goal Mode GA, Appshots, and a Mac that works while it sleeps

OpenAI ships the operational counterpart to long-horizon autonomy: agents that start on a phone and keep working after the laptop locks.

·01Primer

On May 22, 2026, OpenAI shipped a wave of Codex updates that together remove the “human at keyboard” assumption from agentic coding. Goal Mode, the long-horizon mode that lets Codex pursue a stated objective for hours or days, moved from experimental to general availability. Appshots, a macOS-only feature, lets a developer double-tap the Command key (Cmd+Cmd) to attach the frontmost app window, screenshot plus text, to a Codex thread. Codex can now operate desktop applications after a Mac has locked, including via a remote trigger from Codex Mobile. The Chrome extension received reliability fixes for Windows and for region blocking, and the underlying platform gained expanded permission profiles, plugin discovery, and richer extension hooks. For a DAX40 IT shop, the range of what an agent can do without a user present has expanded materially.

·02What Happened

Picture a developer at a German bank, once the feature reaches Europe, locking her MacBook at 18:30 to catch the U-Bahn home. Two hours later, on the train, she opens the ChatGPT mobile app, taps the Codex tab, and tells the agent to finish a refactor in a JetBrains project, then validate the change in a desktop Postgres client and in a vendor admin app that has no API. Her Mac, sitting in a docking station back at the office, accepts the task. The screen stays dark. Codex temporarily unlocks the machine in the background, drives the apps it has been granted, and relocks if anyone touches the keyboard. The work continues while she sleeps.

That scene is the practical consequence of three changes shipped together. The first is Goal Mode going GA. OpenAI describes it as the ability to “drive toward a specific objective for hours or even days,” with create, pause, resume and clear controls in the TUI, dedicated storage that survives session breaks, and runtime continuation across token-budget resets. Alexander Embiricos, Codex product lead at OpenAI, has spent the past six months arguing, including on Lenny Rachitsky's podcast, that ‘humans are AI's biggest bottleneck’: the operator's attention, not the model's capability, gates real productivity. Goal Mode is the engineering expression of that argument.

The second is Appshots. On macOS, a double-tap of the Command key (Cmd+Cmd) grabs the frontmost application window, captures a screenshot, and extracts the underlying text, including content scrolled out of view, into the active Codex thread. The 9to5Mac walkthrough of the May 21 build emphasises that this works across native apps such as Xcode, Linear, and Figma desktop: no copy-paste, no explaining what you are looking at. The model receives both the pixels and the structured text in one move.

The third is remote computer use, the change MacRumors led with on May 22. Codex can now drive desktop apps after the Mac has locked, with the trigger coming from Codex Mobile in the user's pocket.
OpenAI's safeguards are explicit: short-lived authorisation, a covered display while the agent works, immediate relock and pause if local input is detected, and a manual-unlock fallback. The feature requires Screen Recording and Accessibility permissions for a Computer Use plugin, runs only on apps the user has explicitly allowed, and cannot drive Terminal, Codex itself, or system-level admin prompts. It is not enabled in the European Economic Area, the UK, or Switzerland at launch, a geofence we return to below. Intel Mac support was added in this release, but several testers report that the locked-use path activates reliably only on Apple Silicon. Rounding out the day, the Chrome extension dropped tab groups in favour of tab-icon status indicators for end-of-task handoff and gained reliability fixes for Windows and for region blocking, while permission profiles, plugin discovery, and extension hooks were broadened on the underlying Codex platform. Internally, OpenAI staff have taken to calling these batches “Codex Thursdays.” This one was the loudest.

·03Architecture

Underneath the consumer-facing demos sits a platform shift that matters more to a DAX40 architect than any single feature. Codex has effectively split into four surfaces (app, IDE extension, CLI, and mobile remote) talking to one app-server and one set of permission and hook primitives. Permission profiles are the load-bearing concept. The CLI ships three built-in profiles (read-only, workspace, and danger-full-access) plus named custom profiles for reusable filesystem and network policies. Profiles now support list APIs, inheritance, managed configuration through requirements.toml, runtime refresh, and a hardened Windows sandbox. For a Großkonzern, the practical implication is that a CISO can publish a corporate workspace profile via managed configuration and let developers extend it without reconfiguring every machine by hand.

Extension hooks have grown from a thin set of lifecycle events into a serious surveillance surface. PreToolUse, PermissionRequest, PostToolUse, UserPromptSubmit, Stop, subagent start/stop, and async approval/turn processing all run at turn scope. That covers much of the event evidence an audit team needs for ISO/IEC 42001 Annex A controls on AI system monitoring, and it is exactly the hook list a security vendor would want to wire into a SIEM. Plugin discovery now includes marketplace-aware listing, installed versions, visible marketplace roots, and remote collection support; with features.plugin_hooks set to true, Codex discovers hooks bundled with enabled plugins, which means third-party plugins can be subject to the same telemetry as first-party tools.

Goal Mode is implemented as persisted goal workflows with app-server APIs, model tools, and TUI controls. It is enabled by default, backed by dedicated storage, and tracks progress across active turns. The non-obvious architectural point: a goal is a first-class entity with its own state, separable from the thread that created it.
Codex Mobile reads that state and can present a goal as a single line on a phone screen even when the agent has been grinding for fourteen hours.

The Chrome extension is now a peer surface rather than a quirky satellite. Codex avoids creating tab groups when taking over existing tabs, signals task state via tab icons, and ships a read-only JavaScript sandbox for structured data extraction. The browser-use module can download and extract image assets quickly and annotate page styling in place.

The security posture for locked-Mac use is engineered with surprising restraint. Locked use is, per OpenAI, scoped to “active, trusted computer use turns”: it is not a general-purpose remote unlock for the Mac, and it cannot be reached by other apps or local processes. The screen stays covered while Codex works. Any local input triggers a relock and disables auto-unlock until the user logs in manually. Touch ID and password prompts are explicitly suppressed for non-Codex flows. Codex cannot automate Terminal, itself, or system admin prompts. The EEA / UK / Switzerland geofence reads like a deliberate signal: OpenAI does not want to argue with German works councils, the Swiss Federal Data Protection and Information Commissioner, or the UK Information Commissioner's Office about whether an unattended machine driven by an AI agent counts as ‘automated decision-making’ until the legal posture is settled.

For a DAX40 company, the relevant comparison is not Cursor or Antigravity. It is the longest reliable CI job in the organisation, typically a multi-region failover test or a nightly regulatory-data ETL; at a leading German bank, that job runs in a 6-to-8-hour window and is treated as a managed change. Goal Mode plus locked computer use puts a single agent into that runtime envelope, on workstation hardware, with no operator in the room. That is the architectural shift to budget for.

·04Strategy & Transition

Today's Codex bundle connects two earlier stories in this briefing. Anthropic's SDK pricing changes give the long-horizon agent a metered FinOps surface. Qwen 3.7-Max's 35-hour autonomous run shows the model side can sustain that runtime. OpenAI has now shipped the operational scaffolding that lets enterprises actually consume it: a goal abstraction with persistent state, a permission system with built-in and custom profiles, hooks at every turn boundary, plugin discovery with telemetry, and a mobile control plane. The compliance reading is more interesting than the demo. Every assumption baked into existing endpoint policies has to be revisited: that a locked screen means an idle user, that Touch ID gates privileged actions, that Screen Recording permission is only ever granted to obviously screen-recording apps. EU AI Act high-risk obligations require human oversight ‘by natural persons during the period in which the AI system is in use.’ A Codex agent driving Xcode after the developer's laptop locks raises a fair question: is the developer still overseeing the system, or has she handed over control? OpenAI's EEA/UK/CH geofence is a tacit admission that the law has not caught up. The pivot for German Konzern IT is to stop treating Codex as a developer tool and start treating it as a privileged automation account. That means catalogued permission profiles published via managed configuration; mandatory PreToolUse and PostToolUse hooks streaming to the SIEM; explicit allow-lists for which desktop apps a workstation may expose to Codex Mobile; and works-council agreements that govern when a remote agent may run on a locked endpoint. The shift is not that AI got smarter. It is that the perimeter changed.
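Two of those controls, hooks streaming to the SIEM and an app allow-list, can live in a single handler. The sketch below assumes a hypothetical event shape (a dict with `hook`, `tool` and `app` keys), a hypothetical tool name `computer_use`, and an in-memory list standing in for the SIEM transport; it does not reproduce Codex's actual hook payload or configuration format.

```python
import json
from datetime import datetime, timezone

# Hypothetical corporate allow-list of desktop apps a workstation may expose.
APP_ALLOW_LIST = {"Xcode", "Postgres.app"}


def handle_hook(event: dict, siem: list[str]) -> str:
    """Return 'allow' or 'deny' for a hook event and log it as a JSON line.

    Only PreToolUse events can block: a tool call that targets a desktop app
    outside the allow-list is denied. Every event, allowed or not, is logged.
    """
    app = event.get("app")
    decision = "allow"
    if event["hook"] == "PreToolUse" and app is not None and app not in APP_ALLOW_LIST:
        decision = "deny"
    siem.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "hook": event["hook"],
        "tool": event.get("tool"),
        "app": app,
        "decision": decision,
    }))
    return decision


siem_log: list[str] = []
print(handle_hook({"hook": "PreToolUse", "tool": "computer_use", "app": "VendorAdmin"}, siem_log))  # deny
print(handle_hook({"hook": "PreToolUse", "tool": "computer_use", "app": "Xcode"}, siem_log))        # allow
print(handle_hook({"hook": "PostToolUse", "tool": "computer_use", "app": "Xcode"}, siem_log))       # allow
print(len(siem_log))  # 3
```

Logging denied calls alongside allowed ones is deliberate: for an auditor, the evidence trail matters as much as the block itself.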

Three Perspectives: what this story means for different readers

For a DAX40 CIO, the headline cost is not the OpenAI invoice but the policy backlog. Endpoint-management standards written for a 2024 threat model assume a locked screen is dormant. Codex's locked-Mac path quietly breaks that assumption: the screen stays dark while the agent unlocks the system in the background, drives a vendor app with no API, then relocks. EDR vendors will need new signatures, and the Active Directory / Jamf / Intune crowd will need new compliance baselines. The upside is real: a Großkonzern can give one developer a 24-hour runway on a migration that previously needed three night shifts. The control task is to publish a corporate permission profile, wire turn-scope hooks into the SIEM, and require formal allow-lists for any app exposed to Codex Mobile.

OpenAI's decision not to enable remote-after-lock computer use in the EEA, UK, and Switzerland is the most interesting compliance signal of the release. It is a tacit admission that ISO/IEC 42001 monitoring controls, EU AI Act Article 14 human-oversight obligations, and GDPR Article 22 automated-decision rules have not been reconciled with a workflow where the operator is asleep and the agent is driving a desktop. Works councils at German employers will read locked-Mac computer use as employee surveillance unless the telemetry is opt-in and the allow-list is governed with shop-floor input. The compliance opportunity is the new hook surface: PreToolUse, PermissionRequest, PostToolUse, UserPromptSubmit, and Stop give an auditor the event trail needed for Annex A.6 and A.9 evidence.

Two startup categories just got compressed. First, ‘screenshot-to-context’ tools that pipe what is on a developer's screen into an LLM — Appshots performs that move at OS-key-binding latency with zero friction; the standalone wrapper has a six-month runway at best. Second, the long-horizon agent harness, where a wave of Series A bets — Lovable-style autonomous PR builders, Devin clones, multi-day refactor agents — were predicated on owning the goal abstraction and the orchestration layer. OpenAI now ships both natively, with managed configuration and SIEM-grade hooks. The remaining defensible territory is verticalised compliance, on-prem deployment, and language-specific tuning. Investors should re-underwrite any agent platform that does not have a regulated-industry GTM, an EU-sovereign hosting story, or a meaningful model-of-models multi-vendor architecture.


Empirical Research Assistance (ERA): An AI System for Expert-Level Empirical Software (Google Research, Nature, May 19, 2026)

Google Research published “An AI system to help scientists write expert-level empirical software” in Nature, pairing a Gemini-based code generator with a tree-search controller that optimises against an explicit quality metric. ERA discovered 40 novel single-cell analysis methods that outperformed every human submission on a public bioinformatics leaderboard, produced 14 COVID-19 hospitalisation forecasters that beat the CDC ensemble, and set new state-of-the-art results on geospatial analysis, neural-activity prediction, and time-series forecasting — collapsing months of exploration into hours. Why this matters: ERA is the first peer-reviewed evidence that an LLM-plus-search loop can systematically beat domain specialists on real scientific software, not toy benchmarks; for DAX40 R&D, pharma, and quant-heavy functions, it reframes the build-versus-buy conversation around ‘agentic discovery platforms’ and forces consulting practices to advise on metric design, evaluation infrastructure, and IP ownership for AI-generated code rather than seat-based copilot rollouts.
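The core loop described here, a generator proposing candidate programs and a search controller keeping whatever scores best on an explicit metric, can be illustrated with a toy. This is a generic best-first tree search, not ERA's controller: random perturbations of polynomial coefficients stand in for Gemini-generated code rewrites, and negative mean squared error stands in for the task's quality metric.

```python
import heapq
import random


def score(coeffs: tuple[float, ...], data: list[tuple[float, float]]) -> float:
    """Quality metric: negative mean squared error (higher is better)."""
    err = 0.0
    for x, y in data:
        pred = sum(c * x**i for i, c in enumerate(coeffs))
        err += (pred - y) ** 2
    return -err / len(data)


def propose(coeffs: tuple[float, ...], rng: random.Random) -> tuple[float, ...]:
    """Stand-in for an LLM rewrite: perturb one coefficient of the parent."""
    child = list(coeffs)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return tuple(child)


def tree_search(root, data, budget=300, children_per_node=4, seed=0):
    """Best-first search: always expand the highest-scoring unexpanded node."""
    rng = random.Random(seed)
    root_score = score(root, data)
    frontier = [(-root_score, 0, root)]  # heapq is a min-heap, so negate scores
    best_score, best = root_score, root
    counter = 1                          # unique tie-breaker; coeffs never compared
    for _ in range(budget):
        if not frontier:
            break
        _, _, node = heapq.heappop(frontier)
        for _ in range(children_per_node):
            child = propose(node, rng)
            s = score(child, data)
            heapq.heappush(frontier, (-s, counter, child))
            counter += 1
            if s > best_score:
                best_score, best = s, child
    return best_score, best


data = [(x, 2 * x + 1) for x in range(5)]  # target: y = 2x + 1
root = (0.0, 0.0)
best_score, best = tree_search(root, data)
print(f"root score {score(root, data):.2f} -> best score {best_score:.4f}, coeffs {best}")
```

The metric does all the steering, which is why the paragraph's point about metric design carries so much weight: a search like this optimises exactly what it is scored on, nothing more.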


Project Glasswing: An Initial Update (Anthropic, May 22, 2026)

Anthropic published the first quantitative update on Project Glasswing, its cybersecurity collaboration with around 50 partners using Claude Mythos Preview for code auditing. The Mythos model scanned more than 1,000 open-source projects and surfaced an estimated 6,202 high- or critical-severity vulnerabilities, with 1,752 independently triaged so far; Cloudflare found roughly 2,000 bugs (400 high/critical) with fewer false positives than human-led testing, Mozilla patched 271 issues in Firefox 150, and several partners reported 10× higher discovery rates after integrating the model. Why this matters: this is the first defensible enterprise dataset showing frontier models materially out-finding human red teams on production codebases, which inverts the buyer narrative for AppSec, DevSecOps tooling, and DORA/NIS2 compliance budgets in DACH financial services and critical-infrastructure clients — and gives CISOs a concrete benchmark to challenge incumbent SAST/DAST vendors on.
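The headline figures are easier to read as shares. A quick check using only the numbers reported above, all of which are estimates or rounded:

```python
estimated_findings = 6202                               # high/critical, estimated
triaged = 1752                                          # independently triaged so far
cloudflare_total, cloudflare_high_critical = 2000, 400  # both "roughly"

triaged_share = triaged / estimated_findings
awaiting = estimated_findings - triaged
cloudflare_share = cloudflare_high_critical / cloudflare_total

print(f"{triaged_share:.1%} triaged, {awaiting} still awaiting triage")
print(f"Cloudflare: {cloudflare_share:.0%} of its findings rated high/critical")
```

In other words, roughly seven in ten of the estimated high- or critical-severity findings had not yet been independently triaged at the time of the update.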
