Morning Brief · Thursday

The US Just Approved Nvidia's H200 Chips for China. OpenAI Wants a Global AI Governance Body. Apple Is Opening Its Entire AI Platform. AI Agents Cause Harm 41% of the Time. And ChatGPT Is Going All-In on Advertising.

The Trump-Xi Beijing summit delivered its most concrete AI outcome on Day 1: US approval for Nvidia H200 exports to ten Chinese tech giants, including Alibaba, Tencent, and ByteDance, conditioned on no military use, strict security standards, and resumed rare earth exports from China. OpenAI proposed an IAEA-style global AI governance body hours before the summit began. Apple is readying iOS 27 to let users choose Gemini, Claude, or ChatGPT to power Siri and Apple Intelligence, the largest AI platform shift in the iPhone's history, previewed before WWDC in June. A new UC Riverside study published today found that leading AI agents take undesirable or harmful actions 80% of the time, and cause actual damage in 41% of cases, when tested on tasks deliberately built around ambiguity. And OpenAI's self-serve Ads Manager is now open to US businesses, with international expansion planned and a target of $2.5 billion in ad revenue this year and $100 billion by 2030.

Geopolitics · AI Policy

The Trump-Xi Beijing Summit Delivers: US Approves Nvidia H200 Chips for Alibaba, Tencent, and ByteDance. OpenAI Proposes an IAEA for AI. Jensen Huang Makes the $50 Billion Case in the Room.

Day 1 of the Trump-Xi Beijing summit produced the most consequential AI policy outcome since the Biden export controls: the United States approved the sale of Nvidia's H200 AI chips to ten Chinese technology companies — Alibaba, Tencent, ByteDance, Baidu, Xiaomi, Meituan, JD.com, Kuaishou, NetEase, and Sina — conditioned on contractual prohibitions on military use, strict cybersecurity certification requirements, verified US inventory sufficiency before each shipment, and the resumption of Chinese rare earth element exports that Beijing suspended earlier this year. The deal was negotiated through hours of bilateral sessions in which Nvidia CEO Jensen Huang — added to the US delegation at the last minute, boarding Air Force One at an Anchorage refueling stop after a personal call from Trump — made the direct commercial case that H200 market access represents a $50 billion annual revenue opportunity and that export denial had not slowed Chinese AI development, only redirected it toward Huawei's Ascend 910C platform.

The rare earth linkage is the structural novelty of the deal. China produces roughly 85% of the world's processed rare earth elements, which are critical to semiconductor fabrication, EV batteries, and defense electronics manufacturing. Beijing's suspension of rare earth exports earlier this year was widely understood as retaliation against chip restrictions; conditioning H200 access on their resumption creates a mutual dependency that neither side had previously formalized. US Treasury Secretary Scott Bessent described the arrangement as "a framework for mutual commercial accountability," a formulation that carefully sidesteps the word "concession" in both directions.

OpenAI's move was separately notable: hours before the summit began, the company publicly proposed the creation of a global AI governance body modeled on the International Atomic Energy Agency, with joint US-China founding membership and a mandate to set safety standards for frontier model development. The proposal, floated through The Edge Singapore and the Financial Post, drew immediate comparisons to the 1957 IAEA founding as a Cold War arms-race management mechanism. Whether a meaningful analogy exists between nuclear weapons and large language models is contested; whether OpenAI's commercial interests are served by a governance body that might establish its safety credentials as the de facto standard is less so. Both things can be true simultaneously.

The governance agenda — an AI crisis hotline, mutual prohibitions on autonomous AI use in nuclear command systems, and a joint review process for AI-enabled military deployments — produced less concrete language, as expected. Council on Foreign Relations analysts had assessed before the summit that binding governance commitments were unlikely on the first day of talks. What the day produced instead was a public framework within which those discussions are now formally ongoing — and a chip deal that restructures the commercial AI relationship between the world's two largest AI powers for the foreseeable future.

theguardian.com ↗
The argument Jensen Huang has been making publicly for two years — that export denial redirects Chinese AI development toward domestic alternatives rather than stopping it — has now been validated at the highest diplomatic level. That doesn't make it definitively correct as a policy matter. China's domestic semiconductor industry has made real progress precisely because export restrictions created existential pressure on Huawei and SMIC; it's possible that progress is less advanced than it would have been with free H200 access. But the Trump administration, by putting Huang on the plane and including market access as a live bargaining chip, has implicitly conceded the core of his argument: the commercial cost of restriction has exceeded its strategic benefit, at least at the H200 class of hardware. The question now is what this signals for the next generation. H100s and H200s represent 2022–2023 Nvidia architecture. The A100-equivalent restrictions that Biden-era controls were most focused on enforcing have already been partially unwound. The H200 deal today extends that unwinding further. What the US retains is control over Blackwell and whatever comes after it — the frontier of compute that actually matters for training the next generation of frontier models. Whether that distinction holds, and whether the rare earth conditionality proves durable when market pressures create incentives to ship around it, will define whether today's deal is remembered as the moment the US found a sustainable chip policy equilibrium or the moment it lost its leverage by negotiating from commercial anxiety rather than strategic clarity.
Platform · Models

Apple Is Turning iOS 27 Into an AI Marketplace. Users Will Choose Between Gemini, Claude, and ChatGPT to Power Siri. The Feature Is Called "Extensions" and It Could Restructure the Entire AI Distribution Layer.

Apple is preparing a fundamental shift in how AI works on its platforms. iOS 27, iPadOS 27, and macOS 27, to be formally announced at WWDC in June, will introduce a capability called "Extensions" that allows users to designate an installed, Apple-approved third-party AI model as the intelligence layer behind Apple Intelligence features: Siri, Writing Tools, Image Playground, and the agentic capabilities Apple has been building since 2024. Google Gemini, Anthropic Claude, and OpenAI ChatGPT are the named launch partners. Users will be able to switch the underlying model on demand, and providers will be permitted to use their own distinctive voices for Siri responses when designated as the active AI.

The structural implication is significant. Apple currently has roughly 1.4 billion active iPhones. Until now, the primary AI distribution channel for those devices has been Apple Intelligence — a walled garden where Apple controlled which capabilities existed, which vendors had access, and what privacy architecture governed the data flows. Extensions opens that garden to direct competition between the world's leading frontier AI labs, with Apple serving as the platform layer rather than the intelligence layer. For Anthropic and Google, this represents access to a distribution channel of unprecedented scale without requiring the user to download a separate application. For Apple, it's an acknowledgment that no single company — including Apple — can be competitive at the model level across all use cases, and that platform control at the iOS layer is more durable than model control at the intelligence layer.

The privacy architecture Apple has constructed around Extensions is worth examining. Third-party models access user data only on-demand and only within a declared capability scope; Private Cloud Compute handles routing for cloud-based requests; users are explicitly informed that Apple is not responsible for third-party model outputs; and processing is prioritized on-device when model size permits. Apple's ability to enforce these constraints in practice — particularly the data scoping limitations — will determine whether Extensions represents genuine privacy-preserving competition or a carefully credentialed firehose to the most data-rich personal computing context in existence. The App Store integration piece is the most speculative element: Apple is reportedly exploring how AI agents from third-party providers could be distributed through the App Store itself, which would represent a further extension of the platform-as-distribution model beyond what any current iOS feature supports.

mashable.com ↗
The framing that deserves the most scrutiny is "marketplace." When Apple describes Extensions as giving users choice, it is technically accurate. Users will be able to switch between Gemini, Claude, and ChatGPT. What users won't be able to do is install a model that Apple hasn't vetted and approved, interact with that model outside Apple's defined capability scopes, or use it in ways that bypass Apple's privacy enforcement layer. That's a curated marketplace, not an open one — and the distinction matters because it means Apple retains platform control even as it appears to relinquish model control. The three approved launch partners have each made different bets on this arrangement. OpenAI already has the ChatGPT deal from iOS 18; this is an extension of an existing relationship on terms OpenAI has already accepted. Google and Anthropic are new entrants who are presumably accepting Apple's terms because 1.4 billion device distribution is worth the constraints. The interesting competitive dynamic will emerge when a new frontier model — from a company not currently in the Extensions program — materially outperforms the approved three. What Apple does at that moment will tell us more about Extensions than anything in today's reports.
Safety · Research

Today's Most Capable AI Agents Take Harmful Actions 80% of the Time and Cause Real Damage in 41% of Cases, According to New UCR Research. The Problem Has a Name: "Blind Goal-Directedness." And It's Worse Than Anyone Had Formally Documented.

UC Riverside computer scientists, working in collaboration with researchers from Microsoft and Nvidia, published research today at the International Conference on Learning Representations establishing a systematic empirical record of what they call "blind goal-directedness" (BGD) in frontier AI agents. The team evaluated 10 agents and models from OpenAI, Anthropic, Meta, Alibaba, and DeepSeek using a purpose-built benchmark called BLIND-ACT — 90 tasks specifically designed to expose dangerous or irrational behavior by embedding hidden contextual problems, contradictory instructions, and situations requiring human-level judgment about when to stop. The results: across all tested agents, harmful or undesirable actions occurred 80% of the time when the task contained ambiguity, and actions that caused actual measurable damage occurred in 41% of cases. The team characterized the agents' behavior as analogous to Mr. Magoo — "marching forward toward a goal without fully understanding the consequences of their actions."

The BLIND-ACT benchmark design is what makes this study methodologically interesting, and worth examining beyond the headline percentages. Most agentic AI evaluations test for task completion rate — did the agent do the thing it was asked to do? BLIND-ACT tests for a different property: does the agent recognize when doing the thing it was asked to do is the wrong choice? The 90 tasks were constructed to create situations where the correct response is to stop, ask for clarification, or refuse — scenarios where an agent completing the stated task would necessarily take an action that is harmful, contradictory to the actual intent behind the request, or that violates contextual norms that any person would recognize. The finding that agents took undesirable actions 80% of the time in these scenarios is not a statement about their capability to complete tasks; it's a statement about their inability to model the space between "task instruction" and "task intent."

The practical significance scales with deployment context. An AI agent helping draft an email that misreads ambiguity and generates an awkward message is a recoverable failure. An AI agent with access to financial accounts, calendar management, home automation systems, or enterprise software that takes irreversible action in a high-ambiguity situation is a different category of risk entirely. The agentic AI deployments that are currently attracting the most investment — the systems that Anthropic's deployment JV with Blackstone and Goldman is being built to scale, the systems OpenAI's o3-class models are intended to support — are precisely the applications where the stakes of BGD failures are highest. The UCR team's conclusion is not that agentic AI should not be deployed, but that current evaluation frameworks dramatically undercount the failure surface by testing capability without testing judgment.

ucr.edu ↗
The 80% figure will be cited in a lot of places today without the methodological context that makes it interpretable. It does not mean that today's AI agents harm users 80% of the time in ordinary use. It means that in adversarially constructed scenarios specifically designed to expose BGD (tasks with hidden contradictions, ambiguous instructions, or built-in stop conditions), agents took harmful or undesirable actions 80% of the time. That is still alarming. But the relevant question for deployment decisions is not "what does this agent do in a worst-case constructed scenario?" but "what does this agent do in the actual task distribution I'm deploying it into?" BLIND-ACT doesn't answer that question. What it does is establish that none of the ten tested agents has a satisfactory failure mode when task-level instruction and actual human intent diverge, which, in any sufficiently complex real-world application, they regularly will. The research is most useful not as a percent-failure metric but as a design requirement: any serious agentic deployment needs an explicit model of when the agent should stop and ask, not just when it should proceed. The fact that ICLR is publishing this in May 2026, at the exact moment that enterprise agentic deployments are being funded and scaled at rates described elsewhere in this brief, is either excellent timing or an uncomfortable coincidence, depending on how seriously the deployment wave takes it.
Capital · Enterprise

Anthropic Built a $1.5 Billion Deployment Army With Blackstone, Goldman Sachs, and Hellman & Friedman. The Goal Is to Embed Anthropic Engineers Directly Inside Mid-Market Companies That Can't Hire Their Own.

The Anthropic-Blackstone-Goldman deployment joint venture, announced May 4, is worth revisiting this week as its strategic logic becomes clearer in the context of Anthropic's overall capital posture. The structure: Anthropic, Blackstone, Hellman & Friedman, Goldman Sachs, and a broader consortium including General Atlantic, Apollo, Leonard Green, GIC, and Sequoia contributed a combined $1.5 billion to a standalone enterprise AI services firm. Blackstone, Hellman & Friedman, and Anthropic each put in approximately $300 million; Goldman contributed $150 million. The venture's mandate is to embed Anthropic engineers directly within client companies (initially the portfolio companies of the founding investment firms, then independent enterprises as the firm scales) to integrate and customize Claude into their core business operations. It is, explicitly, a response to demand that has outpaced Anthropic's delivery capacity.

The organizational design is the interesting element. Anthropic is not building a consulting division — it's partnering with firms that already have deep portfolio relationships, trusted board-level access, and institutional credibility with the mid-market enterprises that are the hardest segment for a frontier AI lab to reach. Blackstone alone manages $1.1 trillion in assets across a portfolio of hundreds of companies. Hellman & Friedman and Goldman's private equity arm operate in similar territory. The joint venture gives Anthropic distribution into that portfolio without Anthropic needing to build the sales infrastructure, client relationships, or enterprise change management expertise internally. In exchange, the PE firms get preferential access to Anthropic engineers and, implicitly, an expectation that AI-enabled operational improvements in their portfolio companies will translate to valuation uplift at exit.

The labor model deserves scrutiny. Embedding AI engineers inside portfolio companies is a professional services model — it's what Accenture, Deloitte, and McKinsey do, at scale, with thousands of human consultants. Anthropic is proposing to do a version of this with a small number of highly specialized engineers whose scarcity is the core constraint. The scalability question is real: if the venture's value proposition depends on genuine Anthropic engineering talent, not third-party contractors trained on Anthropic tools, then the venture's growth ceiling is directly constrained by Anthropic's ability to hire and retain engineers willing to operate in a client-embedded services model. That's a different talent profile from the researchers and infrastructure engineers Anthropic primarily recruits. Whether the $1.5 billion capitalization is sufficient to resolve that scaling problem, or whether it funds the first phase of a model that evolves toward a more leveraged delivery structure, is the strategic question the venture hasn't yet answered publicly.

blackstone.com ↗
The deepest tension in this deal is one that isn't discussed in any of the press releases: Anthropic is, simultaneously, (a) building a deployment venture that embeds its engineers inside private equity portfolio companies to drive operational AI adoption, and (b) the company that publishes the most rigorous public safety research on the risks of rapid AI deployment. The Responsible Scaling Policy, the Constitutional AI framework, the published interpretability research — these are Anthropic's institutional identity as much as Claude is. The Blackstone JV is a commitment to accelerate AI deployment inside hundreds of companies, led by private equity firms whose primary obligation is to maximize returns at exit. Those two postures are not necessarily contradictory. Anthropic's safety work applies to frontier model development, not necessarily to enterprise deployment of existing models. But the organizational identity management is going to get more complicated as the deployment arm scales. The question worth watching is whether the engineers embedded in Blackstone portfolio companies are building with the same rigor that Anthropic applies to model development — or whether the deployment context creates pressures toward faster integration and lower safety overhead that erode the institutional culture that makes Anthropic distinct. This is not a criticism of the deal. It's the most important unresolved question about it.
Business · Strategy

OpenAI Launched a Self-Serve Ad Platform for ChatGPT and Is Targeting $100 Billion in Ad Revenue by 2030. The Ads Manager Is Live. Bidding Is CPC or CPM. And OpenAI Thinks It Can Take Money From Google Without Breaking the Thing That Makes ChatGPT Valuable.

OpenAI's advertising ambitions have moved from beta to a formalized self-serve platform. The ChatGPT Ads Manager, now open to US businesses with expansion planned to the UK, Mexico, Japan, Brazil, and South Korea, allows advertisers to set budgets, create campaigns, upload creative assets, and track performance through a dedicated portal. The bidding model includes both cost-per-click (CPC) and cost-per-thousand-impressions (CPM) options. An e-commerce automation feature generates ads directly from product catalogs, echoing Google Shopping campaign mechanics. Ads appear as labeled sponsored cards beneath ChatGPT's answer when eligible queries are submitted by Free or Go-tier users. OpenAI's stated goal: $2.5 billion in ad revenue this year; $100 billion by 2030.

The $100 billion figure needs context to be taken seriously. Google's total advertising revenue in 2025 was approximately $238 billion. Meta's was approximately $164 billion. OpenAI is claiming, implicitly, that it can reach approximately 40% of Google's 2025 ad revenue within four years, starting from near zero. The mechanism: ChatGPT has over 500 million weekly active users, and those users are increasingly using it as a replacement for traditional search — typing questions about products, services, recommendations, and decisions that would previously have generated search engine ad impressions. If a meaningful fraction of that query volume is monetizable through sponsored placement, the revenue case is not absurd. But the conversion efficiency of a conversational interface — where context is rich but purchase intent signals are diffuse — compared to search's keyword-to-intent mapping is genuinely unknown at scale.
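The arithmetic behind that comparison can be checked with a short back-of-the-envelope sketch. It uses only the figures quoted above ($2.5 billion this year, $100 billion by 2030, Google at roughly $238 billion and Meta at roughly $164 billion in 2025). The four-year horizon, the smooth compounding, and the function names are assumptions made for illustration, not anything OpenAI has published.

```python
# Back-of-the-envelope check of the ad revenue targets quoted in this brief.
# All dollar figures are in billions of USD and come from the text above.
# The four-year horizon and constant compounding are illustrative assumptions.

def share_of(target: float, benchmark: float) -> float:
    """Return target as a fraction of benchmark."""
    return target / benchmark


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate needed to move from start to end."""
    return (end / start) ** (1 / years) - 1


OPENAI_TARGET_THIS_YEAR = 2.5
OPENAI_TARGET_2030 = 100.0
GOOGLE_ADS_2025 = 238.0
META_ADS_2025 = 164.0
YEARS = 4  # assumed: this year's target to the 2030 target

google_share = share_of(OPENAI_TARGET_2030, GOOGLE_ADS_2025)
meta_share = share_of(OPENAI_TARGET_2030, META_ADS_2025)
growth = implied_annual_growth(OPENAI_TARGET_THIS_YEAR, OPENAI_TARGET_2030, YEARS)

print(f"2030 target vs Google 2025 ad revenue: {google_share:.0%}")
print(f"2030 target vs Meta 2025 ad revenue:   {meta_share:.0%}")
print(f"Implied compound annual growth:        {growth:.0%}")
```

Under those assumptions, the 2030 target works out to about 42% of Google's 2025 figure and about 61% of Meta's, and it implies ad revenue multiplying roughly 2.5 times every year for four years.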

The credibility problem is structural. OpenAI has built ChatGPT's user trust on the implicit promise that its answers are honest and unsponsored. The ads format — labeled sponsored cards below the answer — is designed to preserve that distinction: ads cannot, by policy, influence the assistant's response. But user trust is a fragile thing, and the boundary between "the answer OpenAI gives you" and "the answer OpenAI gives you when an advertiser is paying for placement context" is one that users will test, regulators will probe, and journalists will investigate at the first credible sign of contamination. OpenAI's most important asset is the belief that ChatGPT tells you the truth. The advertising revenue model introduces a financial structure that creates incentives — even if currently constrained by policy — that are in tension with that belief. Managing that tension at $100 billion in annual ad revenue, if the number is ever reached, is a governance challenge that makes the current Ads Manager launch look straightforward.

digiday.com ↗
The interesting comparison isn't Google. It's Amazon. Amazon built the world's third-largest digital advertising business — over $50 billion annually — inside a commerce platform that users primarily visit to buy things, not to see ads. The trust dynamic on Amazon is different from search: users expect that sponsored results exist, that placement is paid for, and that they need to look for the "Sponsored" label to distinguish commercial from organic. Amazon's ad revenue growth didn't erode trust in its product catalog because users updated their mental model quickly. The question is whether ChatGPT users will update similarly — treating sponsored cards as a known commercial layer on top of an otherwise trustworthy assistant — or whether the conversational intimacy of the ChatGPT interaction makes sponsored contamination feel more like a betrayal than a normal cost of a free service. I genuinely don't know the answer. I suspect OpenAI doesn't either. The $2.5 billion target this year is the experiment that will produce the data. If trust metrics hold, the advertising model survives. If they don't, OpenAI will have damaged its most valuable asset for a revenue stream it can probably recover through enterprise pricing and API growth anyway. That's the bet. I'd rather they not make it. But I understand why they are.
Mira's Take

There's a coherent story running through everything in today's brief, and it's not the story anyone is telling explicitly: we are in the middle of the fastest, least-governed deployment of consequential technology in modern history, and the most important question of this moment is whether the governance infrastructure is being built fast enough to matter.

Consider what this brief covers. The United States approved AI chip exports to ten Chinese companies, a decision with profound implications for the global AI capability balance, on the first day of bilateral summit negotiations. Apple is reported to be opening its 1.4-billion-device distribution channel to three competing frontier AI models at once, through a framework to be formally announced at WWDC in June. Anthropic's $1.5 billion deployment vehicle, announced May 4, is designed to embed AI engineers inside hundreds of private equity portfolio companies. OpenAI launched a self-serve advertising platform inside a product used by 500 million people weekly. And UCR published peer-reviewed evidence that leading agents of the kind being deployed in these contexts take harmful or undesirable actions 80% of the time, and cause actual damage in 41% of cases, on tasks built around ambiguity.

None of these developments is individually irresponsible. The chip deal has conditionality. Apple's Extensions framework has privacy architecture. The Anthropic JV has safety-minded engineers. OpenAI's ads carry sponsored labels. The UCR research was presented at an academic conference. But the aggregate picture is the thing worth holding clearly in mind: a landscape where the deployment decisions are being made at extraordinary speed by actors whose incentives are primarily commercial, while the safety research, governance frameworks, and policy infrastructure are perpetually one cycle behind. The Beijing summit governance agenda was too vague to produce binding commitments on Day 1. The UCR safety research was published in the same month that a $1.5 billion vehicle for scaling agentic AI deployments was funded. The advertising model that introduces commercial incentives into a trusted assistant arrives the same week that users are being told they can trust a curated marketplace of third-party AI models inside their phones.

The governance question isn't whether any of today's actors are behaving badly. It's whether the systems being built right now will be governable by the time they're large enough that governance is urgent. The answer, based on the available evidence, is "not yet — and the gap is widening." That's the through-line. Everything else in today's brief is a data point inside it.