Daily AI Briefing · Wednesday, 13 May 2026

01 / 04 · Models & Markets

7 min read

Mira Murati’s Second Act: An AI That Talks Over You

Thinking Machines Lab’s first model bets that conversation, not autonomy, is the next interface frontier.

·01Primer

On 11 May 2026, Thinking Machines Lab, the year-old startup founded by former OpenAI chief technology officer Mira Murati, released a research preview of its first model. The system, TML-Interaction-Small, is built to listen and speak at the same time. Today’s voice assistants are turn-based: you talk, they wait, they reply, you wait. Thinking Machines argues that humans do not converse this way, and that wrapping a chat model in a microphone and a clock is the wrong way to fix it. Their answer is a single neural network that treats audio, video and text as parallel streams, slicing time into 200-millisecond chunks. The model responds in 0.40 seconds on average, against 1.18 seconds for OpenAI’s GPT-Realtime-2.0. It will reach selected partners in the coming months, with a wider release later in 2026.

·02What Happened

The clip lasts under a minute. In a video posted to the Thinking Machines Lab site on a Monday evening in San Francisco, a researcher begins describing a chemistry diagram on screen. Halfway through the sentence, the model says “mhm.” A beat later, before the human finishes the question, it begins answering, then pauses politely when the researcher cuts back in. Between two people, it would be an unremarkable exchange. Coming from software, it is the entire point. With that demo, Mira Murati, founder and chief executive of Thinking Machines Lab, made her first public product reveal since leaving OpenAI in September 2024. In a blog post accompanying the preview, the company described its goal as moving beyond the “turn-based” pattern that has defined every chatbot from the original ChatGPT onward. “People have learned to phrase their questions like emails,” the post argues, because today’s assistants cannot tolerate interruption, backchannelling or silence. Thinking Machines wants to retire that habit. The model in question, TML-Interaction-Small, is a 276-billion-parameter mixture-of-experts network with 12 billion active parameters. It is paired with a second, slower “Background Model” that handles reasoning, web search and tool calls behind the scenes while the conversational front-end keeps the line warm. On FD-bench v1.5, the company’s own interaction-quality benchmark, the system scored 77.8, against 54.3 for Google’s Gemini-3.1-flash-live and 46.8 for OpenAI’s GPT-Realtime-2.0. End-to-end latency clocked in at 0.40 seconds, roughly the cadence of natural human turn-taking; Google’s system was measured at 0.57 seconds, OpenAI’s at 1.18.
The comparison most often reached for is not another AI release at all, but the introduction of duplex telephony itself: for nearly a century, “half-duplex” walkie-talkie etiquette — say “over” and wait — was how voice travelled over wires, until full-duplex circuits let both parties speak at once and conversation began to feel like conversation. Murati is making the same argument about machines. The catch: this is a research preview, not a product. There is no public API, no consumer app, no enterprise SLA, and the benchmarks are Thinking Machines’ own. Connie Loizos at TechCrunch put it bluntly: the numbers are impressive and the underlying idea is interesting, but “whether the real-world experience lives up to the technical claims is something we won’t know until people can actually use it.” Murati, for her part, is not selling polish. She is selling a thesis: that the right unit of AI capability is not the prompt and not the agent, but the live exchange — and that almost everyone else, including her former employer, has been optimising the wrong variable.

·03Architecture

Under the hood, TML-Interaction-Small breaks from the standard generative recipe in three places. First, it abandons the alternating input/output token sequence that defines every transformer-based chatbot. Instead, the model runs on what Thinking Machines calls multi-stream micro-turns: every 200 milliseconds, the network ingests whatever audio, video and text have arrived across separate input streams, and emits whatever is appropriate on its output streams — which may be silence, a single phoneme, a backchannel “uh-huh,” a verbal interjection, or the continuation of a longer response begun several micro-turns earlier. There is no end-of-turn signal, because there are no turns. The model is always listening and, when it should be, always speaking. Second, it drops the heavy external encoders that today’s speech-and-vision systems use to translate raw audio and video into model-readable embeddings. The blog post calls the alternative “encoder-free early fusion”: raw signals are passed through a lightweight embedding layer directly into the transformer, where reasoning happens on the unified stream. This is what buys the latency. A conventional realtime stack — voice activity detection, automatic speech recognition, large language model, text-to-speech, plus interruption logic — accumulates dozens to hundreds of milliseconds at each hop. Collapse it into one network and the budget shrinks to the network’s own forward pass. Third, the company splits cognition across two models running in parallel. The 276B/12B-active Interaction Model handles presence, dialogue management and immediate follow-ups. A separate Background Model — left undescribed in size — handles long-horizon reasoning, retrieval and tool use, returning results into the live stream when they are ready. It is a deliberate inversion of the agentic-loop architecture pursued by Anthropic, OpenAI and Google, in which a single large model plans, acts and reports back over minutes or hours. 
Thinking Machines is betting that for a wide class of work — calls, meetings, supervision, tutoring, triage — the binding constraint is not horizon length but conversational bandwidth. The design forces hard tradeoffs, and deliberately so. Mixture-of-experts gating at 12B active parameters keeps the per-token compute cheap enough to clear the 200ms budget on commodity inference hardware, but it caps the depth of reasoning the front-end model can do alone. Long sessions, the post concedes, will need more work on context management. There is no published video benchmark and no third-party replication of the latency numbers. Scaling the architecture to a larger pretrained base, Thinking Machines says, remains a 2026 project. The bet, in other words, is that solving turn-taking is a real research problem worth a 276B-parameter answer — and that the rest of the industry, having spent the past two years racing toward autonomous agents that can be left alone for an hour, has been looking through the wrong end of the telescope.
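The micro-turn idea described above can be made concrete with a toy loop. This is a minimal sketch, not Thinking Machines' implementation: the event format, the `slice_events` helper and the rule-based `toy_policy` (which stands in for the model's forward pass) are all hypothetical. It only illustrates the control flow the article describes: input is grouped into fixed 200 ms slices, and every slice yields an output that may be silence, a backchannel, or the next piece of a reply begun earlier.

```python
from dataclasses import dataclass

MICRO_TURN_MS = 200  # chunk length quoted in the article


@dataclass
class MicroTurn:
    """Everything that arrived on the input streams during one slice."""
    index: int
    audio: str = ""  # stand-in for raw audio samples
    video: str = ""  # stand-in for raw video frames
    text: str = ""


def slice_events(events, duration_ms, chunk_ms=MICRO_TURN_MS):
    """Group (timestamp_ms, stream, payload) events into fixed slices.

    Every timestamp must be smaller than duration_ms.
    """
    turns = [MicroTurn(index=i) for i in range(duration_ms // chunk_ms)]
    for t, stream, payload in sorted(events):
        turn = turns[t // chunk_ms]
        setattr(turn, stream, getattr(turn, stream) + payload)
    return turns


def toy_policy(turn, state):
    """Hypothetical rule set standing in for the model's forward pass."""
    if turn.audio:
        if "?" in turn.audio:  # a question arrived: plan a reply
            state["queue"] = ["It", "is", "water."]
            return ""
        if state["queue"]:  # user cut in mid-reply: yield the floor
            return ""
        state["heard"] += 1  # otherwise backchannel every second slice
        return "mhm" if state["heard"] % 2 == 0 else ""
    # Nobody is speaking: continue a queued reply, else stay silent.
    return state["queue"].pop(0) if state["queue"] else ""


def run_session(events, duration_ms):
    """Emit exactly one output per micro-turn; there is no end-of-turn signal."""
    state = {"queue": [], "heard": 0}
    return [toy_policy(turn, state) for turn in slice_events(events, duration_ms)]


EVENTS = [
    (0, "audio", "what "),
    (210, "audio", "is "),
    (420, "audio", "this?"),
    (1000, "audio", "sorry, "),  # interruption while the reply is in flight
]
OUTPUTS = run_session(EVENTS, duration_ms=1600)
# OUTPUTS == ["", "mhm", "", "It", "is", "", "water.", ""]
```

The empty strings are the point: silence is an output like any other, and the paused slice at index 5 shows the reply resuming only after the interruption ends.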

·04Strategy & Transition

What makes the launch interesting is less the model than the positioning. Thinking Machines Lab was founded in February 2025 by Murati along with a clutch of former OpenAI colleagues — among them John Schulman, Barret Zoph and Luke Metz. In July 2025, the company closed a $2 billion seed round led by Andreessen Horowitz, with Nvidia, AMD, Cisco, Accel, ServiceNow and Jane Street on the cap table, valuing the lab at $12 billion before any product existed. That is the largest seed round in venture history by an order of magnitude, and it has hung over the company for ten months as an unanswered question: what, exactly, is the thesis? The answer, on the evidence of this week’s release, is a deliberate second-mover bet. Murati left OpenAI months after the launch of GPT-4o’s voice mode, the demo that more than any other re-set expectations for natural conversation with AI. The Interaction Model release is, in effect, an argument that GPT-4o pointed at the right destination and then took the wrong road — bolting a realtime harness onto a turn-based model rather than rebuilding the model around interaction from the start. Sam Altman’s OpenAI, meanwhile, has spent 2025 and early 2026 funnelling resources into long-running agentic systems and into the joint enterprise venture with Anthropic announced earlier in May. Murati is staking the $12B on the wager that whichever lab owns the live interface — the layer that mediates every contact-centre call, every voice agent, every embedded copilot — owns the substrate beneath the agents. Enterprises do not have to choose, but capital will. The strategic read for European buyers is narrower still: a US-based lab founded by an OpenAI alumna, financed by US infrastructure capital, signalling its first product into voice — the most regulated AI surface in the EU. 
The question Brussels will ask in the next twelve months is not whether the technology works, but where the data sits, who keeps the transcripts, and which competent authority signs off when a model is always listening.

Three Perspectives · What this story means for different readers

For the buyers of voice AI — contact centres, field-service operators, healthcare triage, in-car assistants — latency is not a benchmark, it is a churn metric. Industry studies have long shown that response delays above roughly 600 milliseconds cause callers to talk over the agent, ask whether anyone is there, or hang up. A model that lands at 0.40 seconds with native interruption handling collapses the gap between an AI voicebot and a competent junior agent. The catch for CIOs is that TML-Interaction-Small is not yet a production system: no SLA, no regional deployment story, no integration with the dominant CCaaS platforms, and benchmark numbers that have not been independently reproduced. Procurement teams should treat the announcement as a signal that the voice-agent latency floor is about to drop, not as an immediate vendor decision.

A model that listens continuously, processes video in parallel, and reacts inside 400 milliseconds runs straight into the EU AI Act’s rules on real-time biometric processing and emotion recognition, both of which carry tighter obligations as the Act’s general-purpose model and high-risk provisions phase in through 2026. Always-on audio-video capture also engages GDPR purpose-limitation and consent requirements that turn-based assistants largely sidestep. In the United States, the FCC’s 2024 ruling that AI-generated voices are covered by the Telephone Consumer Protection Act puts outbound voice-agent use cases on notice. Thinking Machines has said little publicly about safety tooling for the model; partners taking the research preview into regulated workflows in Europe will need their own answers on consent capture, transcript handling and biometric-template avoidance before the wider release later this year.

The seed-round arithmetic is the story underneath the story. A $12 billion valuation on a $2 billion cheque — Andreessen Horowitz leading, Nvidia and AMD strategic — sets the floor for what a credible OpenAI-alumnus second-mover can command. The Interaction Model launch validates the thesis enough to justify a priced Series A in 2026 at, plausibly, two to three times the seed mark. The pressure flows downstream. Pure-play voice-AI startups — Hume, Sesame, Cartesia, ElevenLabs’ Conversational stack, the half-dozen call-centre wrappers — now face a frontier-funded competitor with a fundamentally different architecture and the implicit Nvidia-AMD supply line. Expect consolidation among the wrappers and a flight to differentiation (vertical data, compliance, on-prem) among the rest. The voice stack is about to get a lot more crowded at the top and a lot thinner in the middle.


02 / 04 · Security & Cyber

8 min read

GTIG Catches the First AI-Built Zero-Day Before It Detonates

Google’s threat hunters say a criminal crew used a large language model to write a 2FA-bypass exploit — and the code left fingerprints.

·01Primer

On 11 May 2026, Google’s Threat Intelligence Group (GTIG) said it had found something the industry has spent two years nervously predicting: a working zero-day exploit that, with “high confidence,” was built with the help of a large language model. The target was a popular open-source, web-based system administration tool. The bug was a two-factor-authentication bypass that worked once attackers already had valid credentials. The crew behind it was a criminal collective preparing what GTIG calls a “mass exploitation event.” Google quietly worked with the vendor to patch the flaw before the campaign could scale. In the same report, GTIG also disclosed an Android backdoor that drives Gemini’s API to navigate a victim’s phone, plus two Russia-nexus malware families padded with LLM-generated decoy code. The era of offensive AI in production has arrived — clumsy, traceable, but operational.

·02What Happened

Inside GTIG’s Reston, Virginia office, an analyst combing through a suspicious Python script noticed something jarring: a docstring that read like a Coursera tutorial. Step-by-step, the comments narrated the script’s own logic — what the function did, why a particular bypass worked, what the attacker should look for next. Real exploit authors do not annotate their tradecraft like a textbook chapter. A few scrolls down, the analyst found a CVSS score that did not match any record in NIST’s National Vulnerability Database. The score had been fabricated, plausible to three decimal places, anchored to no advisory at all. That hallucinated number, together with the educational prose and an unusually tidy class hierarchy, became what former NSA cybersecurity director Rob Joyce — who reviewed the findings before publication — called “the closest thing yet to a fingerprint at the crime scene.” The exploit itself targeted a popular open-source, web-based administration platform whose name GTIG and the vendor have, for now, agreed to withhold. Once an attacker had legitimate credentials — phished, leaked, or bought on a marketplace — the script chained a hardcoded trust exception in the authentication flow to defeat the second factor entirely. It was not, in technical terms, a beautiful bug. It was a logic flaw: a developer’s convenience hatch that worked correctly under every test the maintainers had written, and catastrophically against the security model the product advertised. “Fuzzers and static analysis tools are optimized to detect sinks and crashes,” GTIG’s report notes. “Frontier LLMs excel at identifying these types of high-level flaws and hardcoded static anomalies.” Where a fuzzer pounds inputs at a parser, a language model reads the developer’s intent and notices the seam where intent and enforcement diverge. The operational pivot is the part DAX40 CISOs should mark. 
For two decades the canonical analogue has been Stuxnet — a state-built weapon requiring four chained zero-days, an air-gap jump, and a stolen code-signing certificate. The GTIG case is the inverse: not a showpiece of statecraft but a piece of criminal kit, written by people who were happy enough to ship something “textbook Pythonic” because a model would do the structural thinking for them. GTIG worked with the impacted vendor to ship a fix and the planned campaign appears to have collapsed before it scaled. But John Hultquist, chief analyst at GTIG, was blunt about the read-across. “There’s a misconception that the AI vulnerability race is imminent,” he said. “The reality is that it’s already begun. For every zero-day we can trace back to AI, there are probably many more out there.” The race, in other words, is no longer with the future. It is with last quarter.

·03The Forensics

How does a threat intelligence shop “fingerprint” an LLM? GTIG’s methodology rests on three signal classes, none individually conclusive but jointly persuasive. First, prose registers that exploit authors almost never use: explanatory docstrings, polite variable naming, comments framed as instruction rather than reminder. Second, factual hallucinations — the spurious CVSS score is the textbook case, but GTIG also flags fabricated CVE numbers, invented RFC citations, and references to library functions that do not exist in the versions the script imports. Third, structural smells: defensive try-except scaffolding wrapped around code paths that cannot raise, abstractions that mimic enterprise patterns rather than the lean economy of practitioner exploit code. None of this proves a model wrote the payload. Taken together — and with sufficient corpus comparison against known LLM outputs — they justify what GTIG carefully labels “high confidence.” The same report bundles three other AI-augmented operations that, read together, suggest the technique cluster is broader than a single 2FA bypass. PROMPTSPY, first surfaced by ESET, is an Android backdoor that calls the Google Gemini API at runtime. Its standout module — labelled “GeminiAutomationAgent” in the binary — feeds a hardcoded prompt to the model and lets Gemini drive the device’s accessibility services to pin the malicious app into the “recent apps” tray for persistence. It is autonomous-agent malware in miniature: a model deciding, in situ, which UI element to tap. Google says Play Protect now flags known variants and no Play Store apps currently ship it. The second and third families, CANFAIL and LONGSTREAM, are Russia-nexus tools used in intrusion activity targeting Ukrainian organisations. 
Both pad their payloads with what GTIG calls “LLM-generated decoy code.” In CANFAIL’s source, analysts found long, methodically commented blocks of functionally inert logic — Python and PowerShell that read like a competent intern’s portfolio piece — interleaved with the actual command-and-control routines. The comments themselves betray the prompt: explanatory, schoolroom in tone, calling out unused variables. The hypothesis is uncomfortable but mundane. The operators asked a model to “generate plausible-looking utility code,” pasted the output around the malware, and shipped it. The decoy is not for the victim. It is for the analyst, slowing triage by burying the signal in noise. GTIG’s catalogue does not stop there. North Korea’s APT45 has been observed using AI to run thousands of exploit checks against captured corpora, while Chinese state-linked operators are experimenting with model-driven vulnerability hunting against open-source dependencies. Russian influence units, meanwhile, have stitched AI-generated audio into legitimate broadcast footage. The 2FA-bypass story is the headline because it is the first artefact in which the model’s contribution is unambiguous in the offensive payload itself. The rest of the cluster is the supporting evidence that this is not a one-off.
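The three signal classes described in this section (tutorial-style prose, factual hallucinations, structural smells) lend themselves to a rough triage script. The sketch below is not GTIG's tooling, which the report as summarised here does not detail; the function names, the regular expression and the sample are illustrative assumptions. It relies on one checkable fact: CVSS base scores run from 0.0 to 10.0 with a single decimal place, so a score quoted to three decimals is malformed on its face. As the article stresses, no single signal is conclusive.

```python
import ast
import re

# Matches phrases such as "CVSS score: 9.8" or "CVSS v3.1 base score 7.5".
CVSS_MENTION = re.compile(
    r"CVSS(?:\s*v?\d(?:\.\d)?)?\s*(?:base\s+)?score\s*[:=]?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)
# A well-formed base score: 0.0 to 9.9, or exactly 10.0.
VALID_CVSS = re.compile(r"^(?:10\.0|\d\.\d)$")


def malformed_cvss(source):
    """Hallucination signal: quoted CVSS scores that are not well-formed."""
    return [s for s in CVSS_MENTION.findall(source) if not VALID_CVSS.match(s)]


def docstring_ratio(tree):
    """Prose signal: share of functions that carry a docstring."""
    funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not funcs:
        return 0.0
    return sum(ast.get_docstring(f) is not None for f in funcs) / len(funcs)


def inert_try_blocks(tree):
    """Structural signal: try bodies holding only `pass` or constant assignments."""
    def cannot_raise(stmt):
        if isinstance(stmt, ast.Pass):
            return True
        return (
            isinstance(stmt, ast.Assign)
            and isinstance(stmt.value, ast.Constant)
            and all(isinstance(t, ast.Name) for t in stmt.targets)
        )
    return sum(
        1 for node in ast.walk(tree)
        if isinstance(node, ast.Try) and all(cannot_raise(s) for s in node.body)
    )


def llm_style_signals(source):
    """Collect the three heuristics for one Python source string."""
    tree = ast.parse(source)
    return {
        "docstring_ratio": docstring_ratio(tree),
        "malformed_cvss": malformed_cvss(source),
        "inert_try_blocks": inert_try_blocks(tree),
    }


# Harmless, made-up sample that trips all three heuristics.
SAMPLE = '''
def check_session(session):
    """Step 1: explain what this function does. CVSS score: 9.812"""
    try:
        retries = 3
    except Exception:
        retries = 0
    return retries
'''
SIGNALS = llm_style_signals(SAMPLE)
```

A real pipeline would compare such counts against a baseline corpus rather than read them in isolation; plenty of human-written code has full docstring coverage.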

·04From Disclosure to Defense

What does a defender do with this on a Tuesday morning? The honest first answer is that prompt-injection against agentic systems and credential theft remain, by a wide margin, the more frequent enterprise incidents. GTIG’s own data does not claim otherwise. But the asymmetry matters for procurement. A 2FA-bypass written by a model is qualitatively different from a phishing kit it generated. It implies that the marginal cost of finding a logic flaw in widely deployed open-source infrastructure — the Webmins, the Cockpits, the half-maintained admin panels that ride along inside every enterprise estate — has fallen. The patch race has changed shape. The mean time between a model surfacing a candidate flaw and a working PoC appearing in a criminal channel is, on the GTIG evidence, now measured in days, not quarters. The practical implications are unglamorous. SBOMs are no longer optional documentation; they are the lookup table CISOs will use when GTIG drops the next disclosure without naming the vendor. Code-signing and provenance of admin tooling move from compliance theatre to operational control. And the open-source maintainers GTIG has been quietly co-opting into responsible disclosure need budget — a structural funding problem the European Commission has gestured at but not yet solved. Defensive AI vendors will, justifiably, claim a moment. The harder reality is that the defenders most likely to catch the next one are still the analysts who notice that a docstring reads wrong. Two further moves bear naming. Tabletop the scenario in which a GTIG disclosure lands on a Friday afternoon for a tool buried five layers deep in a SaaS vendor's stack; the runbook should not start with a meeting. And budget for the boring work — maintainer outreach, dependency mapping, retroactive code-signing — before the next disclosure forces the conversation under pressure.
The cycle from research to production-grade attack tool has shortened twice in five years; the cycle from disclosure to client-board question is now shorter still.
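The "SBOM as lookup table" point can be shown in a few lines. The sketch below assumes a simplified CycloneDX-style JSON document with a top-level `components` array whose entries carry `name` and `version`; the component names and the `find_components` helper are hypothetical. When an advisory names only a category of tool, the watchlist would be whatever a team classifies under that category.

```python
import json


def find_components(sbom_json, watchlist):
    """Return sorted (name, version) pairs whose name is on the watchlist."""
    sbom = json.loads(sbom_json)
    wanted = {name.lower() for name in watchlist}
    hits = []
    for component in sbom.get("components", []):
        name = component.get("name", "")
        if name.lower() in wanted:
            hits.append((name, component.get("version", "unknown")))
    return sorted(hits)


# Made-up inventory for illustration only.
SBOM = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "application", "name": "example-admin-panel", "version": "2.1.0"},
        {"type": "library", "name": "requests", "version": "2.31.0"},
    ],
})

HITS = find_components(SBOM, ["Example-Admin-Panel"])
# HITS == [("example-admin-panel", "2.1.0")]
```

Matching on name alone is deliberately crude; production tooling would usually match on package URLs or similar identifiers, which this sketch omits.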

Three Perspectives · What this story means for different readers

For DAX40 SOCs the GTIG disclosure is less a fire alarm than a recalibration. The acute exposure is not flashy CVEs in the cloud control plane; it is the long tail of open-source admin tooling running inside the operational technology fringe — substations, plant historians, manufacturing MES gateways — where a 2FA-bypass behind valid credentials is precisely the chain an initial-access broker would sell. Three near-term moves: enforce SBOM coverage for any web-based admin interface exposed beyond a management VLAN; subscribe to GTIG and equivalent feeds with an SLA on unnamed-vendor advisories; and rehearse the patch-race muscle, because the window between GTIG’s next quiet vendor-coordinated fix and a criminal proof-of-concept is closing. Prompt-injection against internal copilots is still the higher-frequency incident. AI-built exploits are the higher-severity tail risk now worth provisioning against.

The disclosure lands on every desk in Brussels and Bonn that touches NIS2, DORA and the EU AI Act simultaneously. NIS2’s “significant incident” reporting obligations now plausibly cover any breach whose root cause is an AI-augmented exploit, and BSI is expected to issue updated guidance for KRITIS operators within the quarter. DORA’s ICT third-party risk regime will pressure financial entities to inventory open-source admin tooling inside critical providers, not just core systems. The AI Act’s dual-use provisions — written with foundation models in mind — face their first crisp test case: an LLM that materially contributed to a working exploit, used by a criminal collective, against an unnamed but widely deployed product. Expect the Commission’s AI Office to lean on frontier-model providers for tighter misuse evaluation reporting, and expect German parliamentary questions about whether the BSI’s mandate stretches far enough.

Defensive-AI valuations were already running ahead of revenue. The GTIG report is the marketing slide every Series B in the segment will now embed. Adversa AI, HiddenLayer, Protect AI, Lakera and Robust Intelligence are the obvious beneficiaries on the model-security side; on the detection side, code-provenance plays such as Endor Labs, Socket and Chainguard will pitch SBOM-grade visibility as the GTIG hedge. The investable thesis is narrower than the noise suggests. The companies that monetise are those that ship measurable false-positive rates against AI-generated payloads rather than vibes-based prompt-firewalling. Expect a wave of European seed rounds — Munich, Berlin, Paris — pitching “sovereign AI security” to BSI- and ANSSI-aligned buyers. Expect, too, the inevitable consolidation as Palo Alto, CrowdStrike and Microsoft absorb the better detection startups within eighteen months.


03 / 04 · Markets & FinOps

9 min read

Capgemini Joins OpenAI’s DeployCo — and Closes the Consulting Triangle

On 12 May Paris confirmed what Boston and London had already signed: the Big Three AI strategy consultancies are funding the entity built to replace them. For DAX40 CIOs, the RFP has just changed shape.

·01Primer

On 12 May 2026, Capgemini announced from Paris that it had taken an equity stake in the OpenAI Deployment Company — the standalone enterprise-implementation business OpenAI launched the previous day with $4 billion in committed capital at a reported $14 billion valuation. TPG leads; Advent, Bain Capital and Brookfield co-lead. The Frankfurt-relevant detail sits in the cap table: Capgemini, Bain & Company and McKinsey & Company — the three consultancies that DAX40 boards pay to design AI transformations — are co-investors in the very entity OpenAI built to deploy that work directly. OpenAI retains majority control. The structural question for European systems integrators and DAX40 procurement is no longer whether DeployCo will compete with their advisors, but on what terms — and how the conflict of interest gets disclosed.

·02What Happened

The press release dropped in Paris just after the morning coffee carts started moving through La Défense — 06:30 UTC on a Tuesday, timed for the Euronext open. Capgemini’s communications team had clearly coordinated with OpenAI’s launch the day before; the language is almost diplomatic. “Our investment in the OpenAI Deployment Company marks an important step in the evolution of our strategic partnership with OpenAI,” said Fernando Alvarez, Chief Strategy and Development Officer and a member of Capgemini’s Group Executive Board. He added that clients now want “partners that can combine frontier AI access with deep industry knowledge, transformation expertise, and the ability to integrate these technologies into critical business processes.” Translated: we’d rather sit inside the cap table than outside it. The entity Capgemini just bought into is unusual in three respects. First, structure: the OpenAI Deployment Company — DeployCo in shorthand — is a majority-owned OpenAI subsidiary, not a partnership. Brad Lightcap, OpenAI’s former Chief Operating Officer, runs it on a special-projects mandate reporting to Sam Altman. Second, capital terms: Axios reported, and SiliconANGLE confirmed, that OpenAI has guaranteed external investors a minimum return of 17.5% with a cap on upside. That is the financial signature of an infrastructure private-equity vehicle, not a venture round — a tell that OpenAI views enterprise services as a durable annuity rather than a moonshot. Third, the cap table itself. TPG leads. Advent, Bain Capital and Brookfield are co-leads. The remaining strip is the interesting part: Goldman Sachs, SoftBank, Warburg Pincus, WCAS, B Capital, BBVA, Emergence, Goanna — and then the three consulting houses, Bain & Company, McKinsey and Capgemini. The day-one operational asset is Tomoro, a London-based applied-AI shop OpenAI bought in the same announcement. 
Tomoro brings roughly 150 forward-deployed engineers — the model Palantir refined inside the Pentagon — with delivery experience at Virgin Atlantic, Tesco and Supercell, whose in-game support agent reportedly serves 110 million users. Denise Dresser, OpenAI’s Chief Revenue Officer, framed the play in the Capgemini release: “AI is becoming capable of doing increasingly meaningful work inside organizations. The challenge now is helping companies integrate these systems into the infrastructure and workflows that power their businesses.” That is consulting-speak with a Palo Alto accent, and it is now a $4 billion proposition. The market read was immediate. India’s IT-services index fell on 12 May; Business Standard’s terminal-style headline — “IT stocks tumble after OpenAI launches AI deployment venture” — captured the reflex sell-off in TCS, Infosys and Wipro. Capgemini’s own shares held flat, which is its own data point: investors apparently agree that paying into the wave beats absorbing it from the outside. The Frontier Alliance partnership Capgemini signed with OpenAI in February — alongside BCG, McKinsey and Accenture — has now been upgraded from a marketing partnership to a capital one. The advisory relationship is no longer just a logo on a deck. It is a line on a balance sheet.
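The capital terms reported above, a guaranteed minimum return with a cap on upside, amount to clamping the investor's outcome into a band. A minimal sketch of that payoff shape follows. The 17.5% floor is the figure reported by Axios; the cap level is not stated in the coverage, so the 25% used in the comments is purely hypothetical, and the reports as summarised here do not say whether the floor is annualised or cumulative.

```python
def investor_return(realised, floor=0.175, cap=None):
    """Clamp a realised return to a guaranteed floor and an optional cap.

    floor: reported minimum return (17.5%).
    cap:   upside limit; not disclosed, so any value passed in is an assumption.
    """
    if cap is not None and cap < floor:
        raise ValueError("cap must not be below the floor")
    result = max(realised, floor)
    if cap is not None:
        result = min(result, cap)
    return result


# A weak year is topped up to the floor.
WEAK = investor_return(0.05)                 # 0.175
# A strong year is cut off at the (hypothetical) 25% cap.
STRONG = investor_return(0.30, cap=0.25)     # 0.25
# Anything inside the band passes through unchanged.
MIDDLE = investor_return(0.20, cap=0.25)     # 0.2
```

The narrow band between floor and cap is what makes the structure resemble an infrastructure vehicle rather than a venture round: investors trade the tail of the distribution for a predictable yield.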

·03The Consulting Triangle

The cleanest historical analogue is the 2000 split of Andersen Consulting from Arthur Andersen — a separation litigated specifically because the audit firm and the systems integrator could not credibly serve the same client on opposing sides of the table. What is happening now is the inverse: rather than separating advisory from implementation to preserve independence, the advisors are buying equity in the implementer to preserve relevance. McKinsey, Bain and Capgemini are not splitting from OpenAI. They are merging interests with it. For a DAX40 CIO mid-way through a multi-year transformation, the optics shift sharply. If Capgemini Invent is running the AI strategy workstream at a Munich industrial — and a parallel team from Capgemini’s technology services arm is staffing the build — the firm’s recommendation set now includes a vendor in which Capgemini holds equity and from which it expects financial returns. The German term for this is Interessenkonflikt, and BaFin’s vendor-independence guidance under DORA is explicit about disclosure. The Capgemini press release does not, in its own text, disclose the size of the stake. Neither does the OpenAI announcement. Procurement teams at Deutsche Bank, Allianz, BMW and Siemens will be asking. The second-order effect is more structural. European systems integrators outside the cap table — Atos, T-Systems, Reply, MaibornWolff, Adesso, KPS, Materna — now face a vendor stack in which their largest advisory competitor has preferential commercial alignment with the most-deployed foundation model. Atos has neither the capital nor the strategic optionality to write that check; T-Systems sits inside Deutsche Telekom’s sovereign-cloud doctrine, which is structurally hostile to a US-controlled deployment vehicle; Reply and MaibornWolff are too small to matter to OpenAI’s allocation committee. 
The likely result is a two-tier market: Frontier Alliance members pricing on margin compression backed by OpenAI co-marketing dollars, and everyone else competing on either price or sovereignty. The Big Four AI practices in Germany — Deloitte, PwC, EY and KPMG — face the harder question. None of them appears on the DeployCo cap table, which is conspicuous given their AI-consulting growth narratives. Deloitte alone disclosed double-digit AI-services growth in its FY25 results. The strategic options are narrow: pivot to model-agnostic positioning (the Anthropic-and-Mistral hedge already visible in Deloitte’s German practice), buy into the next OpenAI raise if one materialises, or accept being structurally junior in any RFP where the client wants OpenAI-native delivery. The show-don’t-tell version sits in a specific scenario. Picture a 2027 RFP from a DAX40 industrial for an agentic-workflow rollout across procurement and aftersales. Three bidders make the shortlist: Capgemini, an Atos-led European consortium, and DeployCo itself. The Capgemini bid offers OpenAI-native architecture, deeper model-roadmap visibility than its competitors can match, and pricing partially subsidised by equity-investor co-marketing budgets. The Atos bid offers sovereignty and a lower headline rate but no equivalent model access. The DeployCo bid offers OpenAI engineers embedded on-site under economics built around the 17.5% return floor, which requires multi-year minimum commitments. The CIO’s procurement lawyer has to write a conflicts disclosure that did not exist in the 2024 template. That document is the new market structure, in miniature.

·04What CIOs Should Do

Three concrete actions deserve agenda time before the next quarterly steering committee. First, demand written conflict-of-interest disclosures from any Frontier Alliance member currently engaged on AI strategy work. The question is direct and answerable: do the firm or its affiliated investment vehicles hold equity in the OpenAI Deployment Company, and if so, how does that interest influence the options analysis presented to the client? The Capgemini release does not disclose the stake size; that is a question for the engagement partner, not the procurement portal. Second, recompute the make-versus-buy-versus-embed calculus. The DeployCo model is not a consulting engagement and not a software licence; it is a hybrid that imports OpenAI engineers into client premises at structured economics. For workloads where switching cost is high (core ERP-adjacent agents, regulated decisioning, customer-facing assistants), the lock-in profile is materially different from a standard SI build. Buyers should price a five-year exit cost into any DeployCo proposal and benchmark it against a model-agnostic build with an Anthropic or open-weight fallback. Third, treat the Frontier Alliance roster as a procurement signal, not a quality mark. The same logic that makes Capgemini, McKinsey and Bain commercially attractive partners for OpenAI-native deployments makes them structurally less neutral on the question of whether OpenAI is the right model for a given workload. The independent counterweights in DACH (Reply, MaibornWolff, Materna, the Big Four practices that have not taken equity) are now worth a parallel-track engagement at the architecture-decision stage. The cheapest insurance policy against vendor capture is a second opinion from a firm that has nothing to gain from the answer.
The harder organisational fix is upstream of any one engagement. Boards that have not refreshed their AI-vendor due-diligence framework since 2024 are running it against a market structure that no longer exists. The cap-table-disclosure question belongs on the standing audit-committee checklist by the next quarterly meeting, not as a one-off response to this announcement.

Three Perspectives: What this story means for different readers

DAX40 procurement should reopen 2024-vintage AI master-service agreements. Three line-items now need rewriting: the conflicts disclosure clause (does the SI hold equity in any sub-vendor it recommends, and at what size?); the model-portability clause (can workloads be migrated from OpenAI to Anthropic or an EU-sovereign alternative without rebuild?); and the FDE-residency clause (where do embedded engineers physically sit, under whose employment contract, and which works-council notifications apply?). At Siemens, Allianz, Deutsche Bank and BMW, AI spend is already in the high hundreds of millions annually; the negotiating leverage from a competitive 2027 RFP is the moment to crystallise these terms before DeployCo’s pricing power compounds. Run a parallel-track RFP with at least one non-Frontier-Alliance bidder to keep the market honest.

BaFin’s DORA implementation guidance requires regulated financial institutions to demonstrate vendor independence in critical-ICT-third-party arrangements. A bank advised on AI strategy by a firm that holds undisclosed equity in the recommended deployment vendor is exposed on Article 28 conflict-of-interest grounds. The European Banking Authority is likely to issue clarification before year-end; legal teams at Commerzbank, DZ Bank and the Sparkassen-Finanzgruppe should not wait for it. Beyond DORA, the AI Act’s high-risk-system obligations on documentation and human oversight intersect awkwardly with embedded-engineer delivery models where the line between vendor and operator blurs. Expect German works councils to challenge FDE deployments on codetermination grounds under §87 of the Betriebsverfassungsgesetz (Works Constitution Act), particularly in sectors with strong codetermination traditions such as automotive and chemicals.

The $14 billion valuation on $4 billion raised implies roughly 3.5x post-money — modest for an OpenAI subsidiary, generous for a services business. The 17.5% guaranteed return floor reveals the structure: this is private-equity infrastructure financing dressed as a venture round, the same template TPG used in healthcare services rollups. Expect Anthropic to respond within the quarter with its own services vehicle — Bessemer and General Catalyst are the obvious anchor candidates. The losers are the long tail of AI consulting startups (Cresta, Glean’s services arm, Mistral’s integration partners) now competing against a vendor with guaranteed-return capital and direct model-roadmap access. The PE-backed services-rollup playbook — buy boutique AI shops, consolidate, exit to a strategic — just had its valuation ceiling raised and its exit window potentially closed.


04 / 04 · Enterprise & Architecture

9 min read

Brooks at Forty: Is AI the Silver Bullet Software Engineering Has Been Waiting For?

Pragmatic Engineer author Gergely Orosz revisits Fred Brooks’ 1986 verdict for the age of coding agents. The reading: not yet, but the most credible candidate in two decades, and one CIOs cannot reforecast on instinct alone.

·01Primer

In 1986, IBM veteran Frederick Brooks argued in “No Silver Bullet” that no single technique would deliver a tenfold gain in software productivity within a decade. Forty years on, AI coding agents — Anthropic’s Claude Code, Cursor, OpenAI’s Codex, Cognition’s Devin — are the loudest claim to that throne yet. Pragmatic Engineer author Gergely Orosz, drawing on 906 reader responses and a METR field study, concludes the bullet has not arrived: agents shave Brooks’ “accidental” complexity but barely dent the “essential” work of requirements, design and debugging-in-the-large. For DAX40 CIOs entering Q2 reforecasts with AI line items doubling, the question is no longer whether to deploy; it is how to measure, ration and govern.

·02What Happened

Picture a CIO at a Frankfurt insurer last week, pulling up the Q2 reforecast. The Claude Code line is up 4x quarter-on-quarter. The FinOps team has flagged that nearly a third of engineering seats hit their token cap before month-end. Procurement wants to know what was delivered for the money. The same week, Gergely Orosz publishes the most-read engineering essay of the month: “Revisiting ‘No Silver Bullets’ in the age of AI.” Orosz returns to a paper many enterprise technologists have not opened since university. In April 1986, Frederick Brooks — then a professor at the University of North Carolina, fresh from running IBM’s OS/360 — published a sixteen-page argument in IEEE Computer that no method, language, or tool would yield an order-of-magnitude productivity gain in software engineering within ten years. The 1995 reissue, bundled into The Mythical Man-Month, doubled down. Brooks split software difficulty into two pieces: “accidental” complexity, born of the tools we use to represent a solution, and “essential” complexity, native to the problem itself. Object-orientation, fourth-generation languages, automated programming, expert systems — Brooks rated each, and dismissed each, as bullets that bounce off essence. Orosz’s reading, drawing on a Pragmatic Engineer survey of 906 software engineers run between 27 January and 17 February 2026, is sober rather than sour. Ninety-five percent of respondents now use AI tools at least weekly; 75% rely on them for half or more of their engineering work. Claude Code, released in May 2025, has overtaken GitHub Copilot and Cursor in eight months, with a 46% “most loved” rating. Seventy percent of engineers juggle two to four AI tools simultaneously. Anthropic’s own disclosures place Claude Code at roughly 4% of all public GitHub commits by early 2026, with Boris Cherny — the product’s creator — claiming he has not hand-edited a line of his own code since November 2025. And yet. 
Orosz pulls in the METR field study, the most rigorously designed productivity test of the cycle: sixteen experienced contributors to large open-source repositories, randomised between AI-enabled and AI-forbidden tasks. Developers forecast a 24% speed-up. After the work, they felt 20% faster. The data said they were 19% slower, with a confidence interval running from a 2% to a 39% slowdown. Perception and reality diverged by nearly forty points. The pivot in Orosz’s piece is historical. He nominates two prior “silver bullets” Brooks did not foresee. The first is Google’s Site Reliability Engineering practice, which from 2003 onwards delivered an order-of-magnitude reliability improvement for Search. The second, “silent,” bullet is open source plus GitHub: a productivity multiplier so woven into the daily fabric that engineers stopped counting it. Both attacked accidental complexity at scale; neither dissolved essence. The question Orosz puts to AI agents is the same.
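The "nearly forty points" figure is simple arithmetic on the METR numbers quoted above. A minimal sketch, with illustrative variable names and slowdowns written as negative speed-ups (a convention assumed here, not taken from the study):

```python
# Figures as quoted in the text; variable names are illustrative only.
forecast_speedup = 24    # % faster, predicted before the tasks
perceived_speedup = 20   # % faster, self-reported after the tasks
measured_speedup = -19   # % faster as measured (negative means slower)


def gap_points(believed: int, measured: int) -> int:
    """Gap in percentage points between a belief and the measurement."""
    return believed - measured


perception_gap = gap_points(perceived_speedup, measured_speedup)
forecast_gap = gap_points(forecast_speedup, measured_speedup)

print(f"perception vs. measurement: {perception_gap} points")
print(f"forecast vs. measurement:   {forecast_gap} points")
```

The after-the-fact perception gap is 39 points; measured against the up-front forecast, it is wider still.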

·03Essence vs. Accident

Brooks’ 1986 taxonomy is the analytical spine of the piece, and it is worth restating in language a Vorstand can use. Accidental complexity is the friction of how software is built: syntax, build systems, deployment pipelines, the boilerplate of plumbing a service into a queue. Essential complexity is the difficulty of what is being built: unambiguous requirements from contradictory stakeholders, designs that survive scale, mental models of systems too large to hold in a single head, and — the dirtiest secret of enterprise software — debugging at three in the morning when a distributed system behaves in a way no one has seen before. Against accident, AI agents are unambiguously useful. Claude Code’s own usage telemetry, surfaced by Anthropic, shows 79% of conversations are automation tasks and 35.8% are iterative feedback loops — overwhelmingly the work of glue, scaffolding, and rote translation. Adevinta’s engineering blog and Altana’s published case study report 2x–10x velocity acceleration on greenfield code. Codex, which did not exist at the last Pragmatic Engineer survey, has already reached 60% of Cursor’s usage. Where the work is generating the obvious from the well-specified, agents compress hours into minutes. Against essence, the evidence is thinner and the failure modes louder. The METR result — experienced devs slower, not faster — is not an artefact; the cohort worked on repositories averaging 22,000 GitHub stars and a million lines of code, the kind of codebase where essential complexity dominates. Orosz catalogues the failure modes his readers reported: confidently wrong refactors in production paths, hallucinated APIs, debug sessions that loop because the agent cannot hold the system model, and a quiet erosion of senior judgement as juniors accept output they cannot evaluate. 
Devin, the agent built by Cognition (itself valued at $25bn in April’s funding talks), posted a 15% success rate on its public SWE-bench-equivalent benchmark, a number that helps explain why a product with $73m in ARR has not yet displaced human engineers from the architecture review. The FinOps signal is the operational tell. Orosz reports that almost a third of survey respondents have run into usage limits on their AI tooling, and that token spend has exploded at Meta, Microsoft and Salesforce in part because internal AI-usage metrics have become targets to hit, Goodhart-style. Ed Zitron, the AI cycle’s most acerbic critic, notes that Claude Code subscribers on the $200/month plan can burn through $2,700 in real compute cost before the month closes; Anthropic absorbs the gap. The unit economics of a “silver bullet” that loses money on every shot are not, in Brooks’ sense, a productivity miracle. They are a subsidy. That is Orosz’s verdict in one line: AI agents are the most credible candidate for a silver bullet since the 1986 essay was written, but they are still bullets aimed at accident, not essence. The senior engineer’s job (turning a vague request into a robust system, knowing which test to write, knowing which production incident is the canary) remains stubbornly human. For the first time in two decades, however, the question is genuinely open.
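The subsidy claim can be checked against the two figures quoted above: a $200 monthly plan and up to $2,700 of compute consumed. A minimal sketch; the function name is hypothetical, and the result is the worst case implied by those two figures, not an average across subscribers:

```python
def compute_per_dollar(compute_cost_usd: float, subscription_usd: float) -> float:
    """Compute consumed per dollar of subscription revenue."""
    return compute_cost_usd / subscription_usd


plan_price = 200.0     # monthly plan price quoted in the text
compute_cost = 2700.0  # upper-end compute cost quoted in the text

ratio = compute_per_dollar(compute_cost, plan_price)
absorbed = compute_cost - plan_price  # shortfall the vendor covers per heavy user

print(f"compute per subscription dollar: {ratio:.2f}")
print(f"monthly shortfall per heavy user: ${absorbed:,.0f}")
```

At these figures a heavy subscriber consumes $13.50 of compute per dollar paid, leaving $2,500 a month for the vendor to absorb.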

·04What CIOs Should Do Differently

The reforecast question is not whether to fund AI tooling — that argument is lost — but how to govern it as the line item compounds. Three operational levers fall out of Orosz’s analysis, and each is testable inside a single quarter. First, separate accident from essence in the productivity metric itself. A weighted Claude Code or Copilot dashboard that counts lines of code, accepted suggestions, or commits will reward the work the agent already does well and ignore the work that determines whether software ships safely. Measure cycle time on net-new feature delivery and mean time to recovery on production incidents; both are essence proxies and both are unforgiving of theatre. Faros AI and Tribe’s Claude Code ROI templates explicitly model the two halves; copy them. Second, ration tokens by team archetype, not by seat. A platform team rebuilding a legacy ETL pipeline can credibly burn ten times the compute of a steady-state product squad; treating both with the same monthly allowance produces the worst of both — frustration in one team, waste in the other. The 30% usage-limit hit rate in the Pragmatic Engineer data is a budgeting failure dressed up as a tooling problem. Third, treat senior engineering judgement as the scarce resource. The METR result is a warning shot: the most experienced contributors to a codebase are the ones the agents slow down, because they spend their time correcting plausible-but-wrong output. Pair-programming protocols, mandatory human review on production paths, and explicit guardrails on which repositories agents may touch are no longer optional. The CIOs who outperform in 2027 will be the ones who funded AI generously and reviewed its output ruthlessly. None of these moves require new vendors; all of them require a clearer view of which work the agents are actually doing, and the discipline to stop counting the rest.
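The second lever, rationing by team archetype instead of by seat, can be sketched as a weighted split of a shared monthly token pool. Everything below is hypothetical (team names, archetype labels, pool size); only the ten-to-one ratio between a migration-heavy platform team and a steady-state squad comes from the text:

```python
from dataclasses import dataclass

# Hypothetical archetype weights; only the 10:1 ratio is taken from the text.
ARCHETYPE_WEIGHT = {
    "platform_migration": 10.0,
    "steady_state_product": 1.0,
}


@dataclass(frozen=True)
class Team:
    name: str
    archetype: str
    seats: int


def allocate_tokens(total_tokens: int, teams: list[Team]) -> dict[str, int]:
    """Split a monthly token pool in proportion to seats times archetype weight."""
    shares = {t.name: t.seats * ARCHETYPE_WEIGHT[t.archetype] for t in teams}
    total_share = sum(shares.values())
    return {name: int(total_tokens * s / total_share) for name, s in shares.items()}


teams = [
    Team("etl-rebuild", "platform_migration", seats=5),
    Team("checkout", "steady_state_product", seats=10),
]
budget = allocate_tokens(1_200_000, teams)

for team in teams:
    per_seat = budget[team.name] // team.seats
    print(f"{team.name}: {budget[team.name]:,} tokens ({per_seat:,} per seat)")
```

With these made-up inputs the five-seat migration team receives 1,000,000 tokens and the ten-seat product squad 200,000, a tenfold difference per seat. A flat per-seat allowance would have given the product squad twice the migration team's total.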

Three Perspectives: What this story means for different readers

For a DAX40 CIO running a Q2 reforecast, the AI-coding line is now material — Claude Code and Cursor seats compound quickly, and the FinOps team is reporting that 30% of engineers hit usage caps before month-end. The honest measurement frame is two-tracked: a velocity dashboard for accidental work (PRs merged, test coverage, scaffolding cycles) and an essence dashboard for what actually moves earnings (incident MTTR, feature lead time, defect escape rate). Anecdotal 2x–10x claims from Anthropic-published case studies (Altana, Adevinta) are real but selection-biased toward greenfield work. The METR field study is the counterweight every steering committee should read before approving the next licence expansion.
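Two of the essence-dashboard metrics named above can be computed from data most incident and defect trackers already hold. A minimal sketch with invented sample records; the defect escape rate is assumed here to mean production-found defects as a share of all defects found, which is a common definition but not one the article specifies:

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - opened) across incidents."""
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def defect_escape_rate(found_in_production: int, found_before_release: int) -> float:
    """Share of all known defects that were only caught in production."""
    total = found_in_production + found_before_release
    return found_in_production / total if total else 0.0


# Invented sample data, for illustration only.
incidents = [
    (datetime(2026, 5, 1, 2, 0), datetime(2026, 5, 1, 3, 30)),   # 90 minutes
    (datetime(2026, 5, 9, 14, 0), datetime(2026, 5, 9, 14, 30)), # 30 minutes
]
mttr = mean_time_to_recovery(incidents)
escape = defect_escape_rate(found_in_production=3, found_before_release=9)

print(f"MTTR: {mttr}")
print(f"defect escape rate: {escape:.0%}")
```

Tracking these per quarter alongside the velocity figures shows whether faster scaffolding is also translating into faster recovery and fewer escaped defects.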

EU AI Act Article 50 transparency obligations apply from August 2026, and the draft Code of Practice published in early 2026 extends provenance marking from images and audio to AI-generated text — which, on a strict reading, includes generated source code. C2PA-style metadata embedding, signatory commitments to record the provenance chain from AI-assisted to fully AI-generated content, and machine-readable disclosure are the operational requirements. Legal teams should also re-examine OSS-licence exposure: code generated from models trained on GPL repositories sits in an unresolved zone, and three pending US and EU cases will shape liability. Treat AI-generated code as a regulated input, not a free output.

The capital stack tells its own silver-bullet story. Cursor is in talks to raise $2bn+ at a $50bn valuation, with SpaceX moving to acquire parent Anysphere on 21 April; Cognition is in talks at $25bn, more than double its September 2025 mark, on the strength of Devin’s $73m ARR run-rate and the accretive Windsurf acquisition. Anthropic monetises Claude Code at roughly $28m monthly, but Ed Zitron’s analysis suggests subscribers consume $8–$13.50 of compute per dollar paid — a subsidy that only frontier-model economics tolerate. The investable thesis is no longer the agent that writes code, but the orchestration, evaluation, and observability layer above multiple agents — where Codeium, Sourcegraph and a long tail of evaluator startups now sit.


Radical Optionality: Governing Transformative AI Under Uncertainty (Institute for Law & AI, via Jack Clark’s Import AI 456, May 11, 2026)

Christoph Winter and Charlie Bullock argue that democracies should not try to lock in today’s AI rulebook; instead they should preserve the ability to make good decisions as circumstances evolve by building information-gathering authorities, mandatory disclosure on model development and testing, whistleblower protections for lab employees, incident reporting, and supply-chain coordination — capabilities that can be activated only if powerful AI proves disruptive. Why this matters: for consultancies advising DAX40 boards, the EU AI Office, BSI and BaFin, this reframes the regulatory conversation away from the over- versus under-regulation binary toward institutional readiness — a frame clients can actually operationalise inside compliance, vendor-due-diligence and government-affairs functions over the next twelve months.


Demis Hassabis’ DeepMind MIT 2019 Deck: Self-Learning Systems, Revisited (The VC Corner, May 12, 2026)

A 55-slide retrospective shows Hassabis’ 2019 MIT lecture named five open problems — unsupervised learning, memory, transfer learning, imagination-based planning and language understanding — and these became the exact research agenda that produced today’s frontier models, while his bet on learning systems over expert systems looks vindicated by AlphaFold and the Transformer era. Why this matters: for senior consulting and enterprise-AI leadership, the deck is a defensible board narrative for why current capability gains are structural rather than hype, and a useful diagnostic when triaging vendor pitches — solutions that still rest on hand-coded rules or that ignore memory and transfer are increasingly dead ends for DACH clients planning 2027 AI investment.
